speedups for more architectures
speedups for more microarchitectures
merge tables across implementations/compilers
consider automatically eliminating rarely-used implementations
merge subroutines in source to the extent possible
compile each asm subroutine with only one compiler
after compilation, merge object files that are essentially repeated
sort object files (for, e.g., improved cache utilization)
automatically build+use table of known exceptional cpuids that want other implementations
improve automatic order of implementations to reduce number of exceptional cpuids
optionally allow post-installation patching of current cpu as another exceptional cpuid (based on benchmarks and, with more CPU time, full functionality tests)
dispatch: eliminate, e.g., avx2 if avx is higher priority
speed up dispatch cpuid tests (lazy evaluation, merging cpuid calls)
randombytes: support getrandom, getentropy
verify constbranch, constindex
full functional verification
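
A rough sketch of the "dispatch: eliminate, e.g., avx2 if avx is higher priority" item, assuming dispatch picks the first implementation in priority order whose required CPU features are all present. Under that assumption, an implementation can never be selected if a higher-priority one needs only features that its own requirements already guarantee. The names IMPLIES, closure, and prune are hypothetical, and the implication table is illustrative, not the library's actual feature model.

```python
# Hypothetical feature implications: a CPU reporting the key is assumed
# to also report every feature in the value (illustrative, not exhaustive).
IMPLIES = {
    "avx2": {"avx"},
}

def closure(features):
    """Return the given features plus everything they transitively imply."""
    seen = set()
    stack = list(features)
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(IMPLIES.get(f, ()))
    return seen

def prune(ordered):
    """ordered: list of (name, required_features), highest priority first.

    Drop every entry that can never be selected: whenever its requirements
    are met, some earlier kept entry's requirements are met as well.
    """
    kept = []
    for name, req in ordered:
        guaranteed = closure(req)
        if any(set(earlier) <= guaranteed for _, earlier in kept):
            continue  # shadowed by a higher-priority implementation
        kept.append((name, req))
    return kept

# avx ranked above avx2: avx2 is unreachable and is eliminated.
avx_first = prune([("avx", {"avx"}), ("avx2", {"avx2"}), ("ref", set())])

# avx2 ranked above avx: every implementation remains reachable.
avx2_first = prune([("avx2", {"avx2"}), ("avx", {"avx"}), ("ref", set())])
```

Comparing against the kept entries alone is enough here, because anything that shadowed a dropped entry also shadows whatever that dropped entry would have shadowed.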