Binary Vector Search at 350GB/s Using ARM NEON

MarekDlugos 18 hours ago

re: optimization for 1024b vectors, do you pad shorter ones or fall back to a more general kernel?

marekgalovic 18 hours ago

We project the original vectors so that their width matches one of our optimized kernels. This generally gives us better recall than simple padding, since all bits are utilized.
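
For anyone wondering what "project instead of pad" could look like: here is a minimal sketch, not the authors' actual method, since the comment doesn't say which projection is used. It assumes float input vectors and swaps in a random-hyperplane (SimHash-style) sign projection. The names `KERNEL_BITS`, `make_projection`, `project_to_bits`, and `hamming` are made up for illustration.

```python
import random

# Assumed kernel width, taken from the 1024b figure in the question above.
KERNEL_BITS = 1024

def make_projection(in_dim, out_bits=KERNEL_BITS, seed=0):
    """Build out_bits random Gaussian hyperplanes of dimension in_dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_bits)]

def project_to_bits(vec, planes):
    """Map a float vector to a len(planes)-bit code: bit i is the sign of <planes[i], vec>."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(p * x for p, x in zip(plane, vec)) >= 0.0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between two codes stored as Python ints."""
    return bin(a ^ b).count("1")

# Usage: a 300-dim vector ends up as a full 1024-bit code, with no zero-padded bits.
planes = make_projection(in_dim=300)
v = [random.Random(1).uniform(-1.0, 1.0) for _ in range(300)]
code = project_to_bits(v, planes)
```

With padding, a shorter vector would leave part of the 1024-bit code as constant zeros that add nothing to the distance. With a projection like the one above, every output bit depends on the input, which fits the "all bits are utilized" point.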