USearch

USearch is a compact, high-performance similarity search engine for vectors, with text search planned. Developed by Unum Cloud, it aims to be a faster and more efficient alternative to established libraries such as FAISS, built on a streamlined, single-file implementation.

Key Features

Performance and Efficiency

  • 10x faster HNSW implementation compared to FAISS.
  • SIMD-optimized built-in metrics, plus user-defined metrics compiled via JIT.
  • Hardware-agnostic support for half-precision (f16) and quarter-precision (i8) data types.
  • Can serve large indexes directly from disk without loading them into RAM.
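To make the f16 and i8 data types concrete, the sketch below downcasts a vector to IEEE 754 half precision with Python's `struct` module and quantizes it to 8-bit integers using a per-vector scale. The helper names and the scaling scheme are hypothetical illustrations, not USearch's internal encoding.

```python
import struct

def to_f16_bytes(vector):
    """Downcast floats to IEEE 754 half precision: 2 bytes per component."""
    return struct.pack(f"<{len(vector)}e", *vector)

def from_f16_bytes(buffer):
    """Decode half-precision bytes back into Python floats."""
    return list(struct.unpack(f"<{len(buffer) // 2}e", buffer))

def quantize_i8(vector):
    """Hypothetical scheme: map each component into [-127, 127] relative to the
    largest magnitude, returning the integer codes plus the scale to undo it."""
    scale = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero for all-zero vectors
    codes = [round(x / scale * 127) for x in vector]
    return codes, scale

def dequantize_i8(codes, scale):
    """Approximately reconstruct the original floats from i8 codes."""
    return [c / 127 * scale for c in codes]

vector = [0.12, -0.5, 0.9, 0.33]
f32_size = len(struct.pack(f"<{len(vector)}f", *vector))
f16_size = len(to_f16_bytes(vector))
i8_size = len(quantize_i8(vector)[0])  # one byte per code when stored as int8
print(f32_size, f16_size, i8_size)  # 16 8 4
```

Each halving of precision halves storage; the price is rounding error, which this i8 scheme bounds at half a quantization step (scale / 254) per component.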

Versatility

  • Multi-language support: Available for C++, Python, JavaScript, Java, Rust, and more.
  • Cross-platform compatibility: Works on Linux, macOS, Windows, iOS, and WebAssembly.
  • Wide application range: Supports spatial, binary, probabilistic, and user-defined metrics, making it suitable for diverse applications from genomics to chemistry.
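For readers unfamiliar with the metric families named above, here is one textbook example of each: cosine distance (spatial), Hamming distance (binary), and Jensen-Shannon divergence (probabilistic). These are plain reference definitions written for illustration; USearch ships its own optimized implementations, and the exact set of metrics it exposes should be checked in its documentation.

```python
import math

def cosine_distance(a, b):
    """Spatial metric: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def hamming_distance(a: int, b: int) -> int:
    """Binary metric: count of differing bits between two bit-packed vectors."""
    return bin(a ^ b).count("1")

def jensen_shannon(p, q):
    """Probabilistic metric: symmetric divergence between two discrete
    distributions, in nats; bounded above by ln(2)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms where a == 0 contribute nothing
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(hamming_distance(0b1010, 0b0110))         # 2
print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))   # ln(2), about 0.693
```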

Integration and Usability

  • Simple API: Designed for ease of use while providing extensive customization options.
  • Extensible: Supports custom metric definitions and flexible search configurations.
  • Memory efficiency: Downcasting and quantization (for example, to f16 or i8) reduce the memory footprint.
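The key idea behind user-defined metrics is that the search procedure only needs a function that scores two vectors. The sketch below shows that decoupling with an exact, brute-force k-nearest-neighbor search that accepts any distance callable. The function names are hypothetical, and USearch itself searches an HNSW graph rather than scanning every vector.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]

def search(vectors: Dict[int, Sequence[float]], query: Sequence[float],
           k: int, metric: Metric) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbor search that works with any distance callable.
    Returns (key, distance) pairs from closest to farthest."""
    scored = ((metric(vector, query), key) for key, vector in vectors.items())
    return [(key, distance) for distance, key in heapq.nsmallest(k, scored)]

def manhattan(a, b):
    """A user-defined metric: L1 (taxicab) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Another user-defined metric: the largest per-dimension difference."""
    return max(abs(x - y) for x, y in zip(a, b))

vectors = {1: [0, 0], 2: [1, 1], 3: [5, 5]}
print(search(vectors, [1, 2], k=2, metric=manhattan))  # [(2, 1), (1, 3)]
print(search(vectors, [1, 2], k=2, metric=chebyshev))  # [(2, 1), (1, 2)]
```

Swapping the metric changes the distances and possibly the ranking, while the search code stays untouched.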

Technical Insights

USearch leverages several advanced techniques to achieve its performance and flexibility:

  • Horner's method for polynomial approximations, outperforming GCC 12 by 119x.
  • Masked loads with Arm SVE and x86 AVX-512, eliminating tail loops for efficiency.
  • Custom bithacks for operations like sqrt, bypassing standard library dependencies.
  • Native bindings for each supported language, avoiding the overhead of general-purpose binding libraries.
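Two of these techniques are easy to illustrate outside SIMD code. Horner's method evaluates a degree-n polynomial with n multiply-adds and no explicit powers, which is why it suits polynomial approximations of functions such as `exp`. The second function is the widely known `0x5F3759DF` inverse-square-root bithack, shown only as an example of the genre; it is an assumption that USearch's bithacks resemble it.

```python
import math
import struct

def horner(coefficients, x):
    """Evaluate a polynomial whose coefficients run from the highest degree
    down to the constant term, using one multiply-add per coefficient."""
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result

# Degree-5 Taylor polynomial of exp(x) around 0: x^5/120 + x^4/24 + ... + 1
EXP_COEFFICIENTS = [1 / 120, 1 / 24, 1 / 6, 1 / 2, 1.0, 1.0]

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive 32-bit floats by bit manipulation,
    then refine the guess with one Newton-Raphson step."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]  # reinterpret float bits as uint32
    i = 0x5F3759DF - (i >> 1)                         # magic-constant initial estimate
    y = struct.unpack("<f", struct.pack("<I", i))[0]  # reinterpret the bits as a float
    return y * (1.5 - 0.5 * x * y * y)                # one Newton-Raphson iteration

print(horner(EXP_COEFFICIENTS, 0.1), math.exp(0.1))
print(fast_inv_sqrt(4.0), 1 / math.sqrt(4.0))
```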

Comparison with FAISS

While both USearch and FAISS use the HNSW algorithm, USearch distinguishes itself through:

  • Faster indexing times for high-dimensional data.
  • A more maintainable codebase with significantly fewer lines of code.
  • Support for user-defined metrics and more ID types, including uint40_t.
  • Fewer dependencies, enhancing portability and deployment.
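A uint40_t key occupies 5 bytes instead of the 8 a 64-bit key needs, while still addressing about 1.1 trillion entries. The Python sketch below packs and unpacks such keys; the helper names are hypothetical and say nothing about how USearch lays out its memory.

```python
MAX_UINT40 = (1 << 40) - 1  # largest 40-bit value: 1,099,511,627,775

def pack_uint40(key: int) -> bytes:
    """Encode a non-negative key in exactly 5 little-endian bytes."""
    if not 0 <= key <= MAX_UINT40:
        raise OverflowError(f"{key} does not fit in 40 bits")
    return key.to_bytes(5, "little")

def unpack_uint40(data: bytes) -> int:
    """Decode a 5-byte little-endian key."""
    return int.from_bytes(data, "little")

ids = [0, 42, MAX_UINT40]
packed = b"".join(pack_uint40(i) for i in ids)
print(len(packed))  # 15, versus 24 bytes for three uint64 keys
print([unpack_uint40(packed[n:n + 5]) for n in range(0, len(packed), 5)])
```

Across one billion stored IDs that is 5 GB versus 8 GB; presumably the saving matters most where IDs are repeated many times, such as in graph neighbor lists, though that is an assumption about where the benefit lands.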

Performance Benchmarks

USearch shows substantial performance improvements over FAISS, especially for high-dimensional vectors. For example, it indexes 100 million 96-dimensional vectors about 10x faster than FAISS. Its lighter bindings also make it easier to deploy, particularly in resource-constrained environments.
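To put the benchmark's scale in perspective, the arithmetic below computes how much memory the raw vectors alone occupy for the f32, f16, and i8 types. It deliberately ignores the HNSW graph, keys, and any other index overhead, so real index sizes will be larger.

```python
NUM_VECTORS = 100_000_000  # vectors in the benchmark described above
DIMENSIONS = 96
BYTES_PER_SCALAR = {"f32": 4, "f16": 2, "i8": 1}

def raw_vector_bytes(dtype: str) -> int:
    """Bytes for the vectors alone: no HNSW graph, keys, or other overhead."""
    return NUM_VECTORS * DIMENSIONS * BYTES_PER_SCALAR[dtype]

for dtype in BYTES_PER_SCALAR:
    print(f"{dtype}: {raw_vector_bytes(dtype) / 1e9:.1f} GB")
# f32: 38.4 GB
# f16: 19.2 GB
# i8: 9.6 GB
```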

Applications

Multi-Modal Semantic Search

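USearch can be paired with multi-modal AI models such as UForm to enable text-to-image search. Because such models embed texts and images into a shared vector space, the embedding of a text query can be searched directly against an index of image embeddings, enabling semantic search across diverse datasets.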
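The following sketch shows why a shared embedding space makes cross-modal search work: once text and image embeddings live in the same space, ranking images for a text query is ordinary nearest-neighbor search. The 3-dimensional embeddings and file names are made up for illustration and stand in for the output of a model such as UForm; a real system would query a USearch index instead of scanning a dictionary.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical image embeddings; real models produce hundreds of dimensions.
image_embeddings = {
    "cat.jpg": normalize([0.9, 0.1, 0.0]),
    "beach.jpg": normalize([0.0, 0.2, 0.95]),
    "dog.jpg": normalize([0.7, 0.6, 0.1]),
}

def text_to_image(text_embedding, k=2):
    """Return the k image names most similar to a text embedding."""
    q = normalize(text_embedding)
    scores = {name: sum(a * b for a, b in zip(q, emb)) for name, emb in image_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical embedding of a query such as "a photo of a pet"
print(text_to_image([0.8, 0.3, 0.05]))  # ['cat.jpg', 'dog.jpg']
```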

Molecular Search with RDKit

USearch supports binary similarity metrics such as the Tanimoto coefficient, which makes it well suited to chemistry applications. Combined with RDKit, which computes molecular fingerprints, it enables efficient fingerprint search even across very large datasets.
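The Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below computes it on Python integers used as bitsets and ranks a tiny, hypothetical library of 8-bit fingerprints; in practice RDKit would generate much longer fingerprints and USearch would index them.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-packed fingerprints:
    popcount(A AND B) / popcount(A OR B)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return bin(a & b).count("1") / union

# Hypothetical 8-bit fingerprints; real RDKit fingerprints are often 1024 or 2048 bits.
query = 0b1011_0110
library = {"mol_a": 0b1011_0100, "mol_b": 0b0100_1001, "mol_c": 0b1111_0110}

ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)  # ['mol_c', 'mol_a', 'mol_b']
```

A search index would typically use the Tanimoto distance, 1 minus this similarity, so that smaller values mean closer molecules.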

GIS Applications on iOS

With bindings for Objective-C and Swift, USearch can power real-time geospatial search in mobile applications, using latitude and longitude coordinates.
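A natural metric for latitude and longitude is the great-circle (haversine) distance. The sketch below assumes that metric and finds the nearest of a few hypothetical points of interest by brute force. It models the Earth as a sphere of radius 6371 km, so distances are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against rounding pushing the argument just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

# Hypothetical points of interest (approximate coordinates)
places = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Big Ben": (51.5007, -0.1246),
    "Colosseum": (41.8902, 12.4922),
}
user = (48.8566, 2.3522)  # central Paris
nearest = min(places, key=lambda name: haversine_km(user, places[name]))
print(nearest, round(haversine_km(user, places[nearest]), 1), "km")
```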

Clustering and Joins

USearch provides near-real-time clustering capabilities, suitable for datasets ranging from tens to millions of entries. It also supports various types of joins, including fuzzy and semantic joins, enabling advanced data matching and integration tasks.
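A fuzzy join matches each row of one table to its most similar row in another and keeps only matches above a threshold. The sketch below does this by brute force with the standard library's `difflib` string similarity; it substitutes string matching for vector search, whereas a USearch-based join would embed the rows and run nearest-neighbor queries. The function names and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_join(left, right, threshold=0.75):
    """Match each string in `left` to its most similar string in `right`,
    keeping only pairs whose similarity reaches `threshold`."""
    pairs = []
    for l in left:
        best = max(right, key=lambda r: similarity(l, r))
        score = similarity(l, best)
        if score >= threshold:
            pairs.append((l, best, score))
    return pairs

left = ["unum cloud", "Microsoft Corp", "Google"]
right = ["Unum Cloud", "Microsoft Corporation", "Alphabet Inc"]
print(fuzzy_join(left, right))
```

Note that "Google" finds no match: a string-level fuzzy join cannot link it to "Alphabet Inc", which is exactly the gap a semantic join over embeddings aims to close.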

Integrations

USearch has been integrated with several major platforms and libraries, including:

  • GPTCache: Python-based caching for AI models.
  • LangChain: Provides bindings for both Python and JavaScript.
  • ClickHouse: High-performance columnar database integration.
  • Microsoft Semantic Kernel: Available for Python and C#.
  • LanternDB: Available in C++ and Rust.

Future Developments

Planned work includes extending USearch to text-based search, broadening the range of domains it can serve.

For more technical details, see the USearch GitHub repository, which hosts the full documentation and benchmarks.

Similar Projects