track history more precisely in documentation for individual source files allow shared-* for API functions (requires tweaking dispatch) speedups for more architectures speedups for more microarchitectures consider faster PRF for the keyed hash giving the nonce merge subroutines in source to the extent possible scan for and remove any unused functions and files restructure for more merging at object-code level sort object files (for, e.g., improved cache utilization) optionally allow post-installation patching of current cpu as an exceptional cpuid (based on benchmarks and, with more CPU time, full functionality tests) dispatch: eliminate, e.g., avx2 if avx is higher priority speed up dispatch cpuid tests (lazy evaluation, merging cpuid calls) powbatch: adjust MAXTODO considering time vs. memory usage powbatch: use inversion tree, and try parallel mults nPbatch: use larger batches to allow faster powbatch nPbatch: use 0-padding of leftovers when this is faster than individual nP multiscalar: adjust MAXTODO multiscalar: vectorize across inputs nG etc.: vectorize across points being added batch versions of all other operations vectorize all other parallelizable operations verify constbranch, constindex full functional verification