Awesome Vector Database 
A curated list of awesome works related to high-dimensional structure/vector search and databases
Services
- Google Vector Search (Vertex AI)
- Pinecone
- Weaviate [Beginner Guide]
- Vespa
- txtai
- marqo
- cohere
- vectara
Libraries & Engines
Multidimensional data / Vectors
- Faiss
- Typesense
- Qdrant
- annoy
- NGT
- pgvector
- Chroma
- LlamaIndex
- vectordb
- jvector
- RAFT
- Vald
- Voyager
- tinyvector
Texts
Others
- SimSIMD: Efficient Alternative to scipy.spatial.distance and numpy.inner
Benchmarks & Databases
📚 Books
- Foundations of Multidimensional and Metric Data Structures
- Introduction to Information Retrieval
- Deep Learning for Search
Conferences & Workshops
- VLDB
- Tutorial:
- Image Retrieval in the Wild (CVPR20) [Video]
- Haystack
- Neural Search In Action
- ACM MM 2020: Effective and Efficient: Toward Open-world Instance Re-identification
- Retrieval Augmented Generation and Vespa [Slides]
Courses
- Long Term Memory in AI - Vector Search and Databases (COS 495 - Princeton) [Class Notes]
Publications
Survey
- Pan, James Jie, Jianguo Wang, and Guoliang Li. “Survey of Vector Database Management Systems.” arXiv preprint arXiv:2310.14021 (2023). [Paper]
- Nearest neighbor search: the old, the new, and the impossible. Andoni, Alexandr. [Paper]
Quantization
Source: A survey of product quantization.
- PQ: Product quantization for nearest neighbor search. Jégou, Hervé, Matthijs Douze, and Cordelia Schmid. [Paper, Code, Julia Code, nanopq]
- k-selection on GPU: Billion-scale similarity search with GPUs. Johnson, Jeff, Matthijs Douze, and Hervé Jégou [Paper, Code]
- A survey of product quantization. Matsui, Yusuke, Yusuke Uchida, Hervé Jégou, and Shin’ichi Satoh [Paper]
- OPQ: Optimized Product Quantization. Ge, Tiezheng, Kaiming He, Qifa Ke, and Jian Sun [Homepage, Paper, Code, nanopq]
- Quicker ADC: Unlocking the hidden potential of product quantization with SIMD. André, Fabien, Anne-Marie Kermarrec, and Nicolas Le Scouarnec [Paper, Code]
- ScaNN: Accelerating Large-Scale Inference with Anisotropic Vector Quantization. Guo, Ruiqi, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar [Paper, Python/C++ Inference, Julia Training/Inference]
- The inverted multi-index. Babenko, Artem, and Victor Lempitsky [Paper, Code]
- Are We There Yet? Product Quantization and its Hardware Acceleration. Fernandez-Marques, Javier, Ahmed F. AbouElhamayed, Nicholas D. Lane, and Mohamed S. Abdelfattah. [Paper]
- LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval. Li, Chaofan, Zheng Liu, Shitao Xiao, Yingxia Shao, Defu Lian, and Zhao Cao. [Paper, Code]
Graph-based Methods
- A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Wang, Mengzhao, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. [Paper, Code]
- HNSW: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. Malkov, Yu A., and Dmitry A. Yashunin. [Paper, Code, Rust Version]
- Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis. Dobson, Magdalen, Zheqi Shen, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun. [Paper]
- FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search. Chen, Patrick, Wei-Cheng Chang, Jyun-Yu Jiang, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh [Paper, Video]
- NSG: Navigating Spread-out Graph For Approximate Nearest Neighbor Search. Fu, Cong, Chao Xiang, Changxu Wang, and Deng Cai. [Paper, Code]
- EFANNA: Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. Fu, Cong, and Deng Cai. [Paper, Code]
Hashing
- Awesome Papers on Learning to Hash
- A survey on learning to hash. Wang, Jingdong, Ting Zhang, Nicu Sebe, and Heng Tao Shen [Paper]
- A survey on deep hashing methods. Luo, Xiao, Haixin Wang, Daqing Wu, Chong Chen, Minghua Deng, Jianqiang Huang, and Xian-Sheng Hua. [Paper]
- Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. Gong, Yunchao, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin [Paper, Python code, Matlab code]
Evaluation & Metrics
- Which BM25 do you mean? A large-scale reproducibility study of scoring variants. Kamphuis, Chris, Arjen P. de Vries, Leonid Boytsov, and Jimmy Lin [Paper]
📰 Articles & Talks
- What is a Vector Database? [Article]
- Vector databases (Part 1): What makes each one different?
- eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine [Article]
- Computer Vision Meetup: Computer Vision Applications at Scale with Vector Databases [Video]
- How to choose your vector database in 2023?
- Do we really need a specialized vector database?
- Vector database is not a separate database category
- Vector Databases: A First-Principles Approach