A fast and light weight vector database implemented in modern C++20 with support for Approximate Nearest Neighbor (ANN) search algorithms. Suitable for medium size projects on embedded systems.
- Unified Database Architecture:
- Single
VectorDatabaseclass works with all index types - Clean, simple API via
IVectorDatabase::create(config) - Consistent threading model using
std::shared_mutex
- Single
- Multiple Index Types (all through unified VectorDatabase):
- Flat: Brute-force exact search - best for <1K vectors, 100% recall
- HNSW: Hierarchical Navigable Small World graphs - O(log N), 95-99% recall
- IVF: Inverted File Index - memory-efficient, fast construction, 85-98% recall
- Thread-Safe Operations:
- Concurrent reads: Multiple threads can search simultaneously
- Exclusive writes: Inserts/removes are properly serialized
- Modern C++20: Concepts, spans, smart pointers, RAII
- Flexible Distance Metrics: L2 (Euclidean), Cosine, Dot Product
- Persistence: Save and load database to/from disk
- C++20 compatible compiler (GCC 11+, Clang 14+)
- GNU Make or CMake 3.20+
- Git (for automatic dependency cloning)
The following dependencies are automatically handled by the build scripts:
- Google Test v1.15.2 (github.com/google/googletest)
- Automatically cloned to
external/googletestor fetched by CMake
- Automatically cloned to
git clone https://github.com/Alexk-195/lynx_vector_db.git
cd lynx_vector_db
# Build with setup.sh (recommended - handles all dependencies automatically)
./setup.sh
# OR build with CMake (dependencies auto-detected or fetched)
mkdir build && cd build
cmake ..
cmake --build ../setup.sh # Build release version
./setup.sh debug # Build debug version
./setup.sh clean # Clean build artifacts
./setup.sh rebuild # Clean and rebuild
./setup.sh test # Run unit tests
./setup.sh install # Install to system (requires sudo)
# Quality Assurance and Validation (Ticket #2072)
./setup.sh coverage # Generate code coverage report
./setup.sh tsan # Run ThreadSanitizer (data race detection)
./setup.sh asan # Run AddressSanitizer (memory error detection)
./setup.sh ubsan # Run UndefinedBehaviorSanitizer + AddressSanitizer
./setup.sh valgrind # Run Valgrind memory leak check (requires valgrind)
./setup.sh clang-tidy # Run clang-tidy static analysis (requires clang-tidy)
./setup.sh benchmark # Run performance benchmarks onlymake # Build release
make debug # Build debug
make clean # Clean build
make run # Build and run
make info # Show build configuration
# Note: To run tests, use ./setup.sh test or CMake (see Testing section below)# Configure (dependencies auto-detected)
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
# Build
cmake --build . --parallel
# Run tests (see Testing section below for more options)
ctest # Run all tests
make test # Alternative way to run all tests
make check # Run tests with verbose output
# Install
sudo cmake --install .For examples see src/main_minimal.cpp and src/main.cpp and src/compare_indices.cpp
# Using setup.sh (recommended - automatically configures CMake)
./setup.sh test
# Or using CMake directly
cd build
ctest # Run all testscd build
# Run all tests
ctest
# Run all tests with output
ctest --output-on-failure
# Run all tests with verbose output
ctest -V
# Run all tests quietly
ctest -Qcd build
# Run tests matching a pattern (regex)
ctest -R "ThreadingTest" # All threading tests
ctest -R "IVFDatabaseTest" # All IVF database tests
ctest -R "HNSW" # All HNSW-related tests
# Run specific test by exact name
ctest -R "^IVFDatabaseTest.BatchInsertAndSearch$"
# Run multiple patterns
ctest -R "Threading|IVF" # Threading OR IVF tests
# Exclude tests matching a pattern
ctest -E "Benchmark" # Exclude benchmark tests
# Combine include and exclude
ctest -R "Threading" -E "Stress" # Threading tests except stress testscd build
# Verbose output for specific tests
ctest -R "ThreadingTest" -V
# Show only failures
ctest -R "IVFDatabaseTest" --output-on-failure
# Rerun only failed tests
ctest --rerun-failed
# Stop on first failure
ctest --stop-on-failurecd build
# Run tests multiple times to check for flakiness
ctest -R "StatisticsConsistency" --repeat until-pass:5
# Run a specific test 10 times
for i in {1..10}; do
echo "=== Run $i ==="
ctest -R "ConcurrentWrites" --output-on-failure || break
done| Category | Pattern | Description |
|---|---|---|
| Threading Tests | ctest -R "ThreadingTest" |
Multi-threaded safety tests |
| IVF Tests | ctest -R "IVFDatabaseTest|IVFIndexTest" |
IVF index tests |
| HNSW Tests | ctest -R "HNSWTest|HNSWDatabaseTest" |
HNSW index tests |
| Flat Tests | ctest -R "FlatIndexTest" |
Flat/brute-force index tests |
| Distance Tests | ctest -R "DistanceMetricsTest" |
Distance metric tests |
| Persistence Tests | ctest -R "Persistence" |
Save/load functionality tests |
| Benchmark Tests | ctest -R "Benchmark" |
Performance benchmarks |
cd build
# Run the test binary directly with Google Test filters
./bin/lynx_tests --gtest_filter="ThreadingTest.*"
./bin/lynx_tests --gtest_filter="*IVF*"
./bin/lynx_tests --gtest_list_tests # List all tests
./bin/lynx_tests --gtest_filter="ThreadingTest.*" --gtest_repeat=10| Index Type | Best For | Query Speed | Memory | Construction | Recall |
|---|---|---|---|---|---|
| Flat | Small datasets (<1K), exact search required | Slow (O(N)) | Low | Instant | 100% |
| HNSW | High-performance ANN, low latency critical | Fast (O(log N)) | High | Slow | 95-99% |
| IVF | Memory-constrained, fast construction needed | Medium | Low | Fast | 85-98% |
n_clusters (number of clusters):
- Rule of thumb:
sqrt(N)where N is the number of vectors - Example: For 10,000 vectors, use ~100 clusters
- Larger values: Faster search but may require higher n_probe for good recall
- Smaller values: Slower search but better recall with low n_probe
n_probe (clusters to search):
- Controls the speed/recall tradeoff
- Start with 10 and adjust based on your needs:
- n_probe=1: Fastest queries, ~60-70% recall
- n_probe=10: Balanced, ~90-95% recall
- n_probe=sqrt(n_clusters): High recall, ~98%+
- Higher values = better recall but slower queries
Example tuning process:
// Start with default configuration
config.ivf_params.n_clusters = 100; // For ~10K vectors
config.ivf_params.n_probe = 10;
// If recall is too low, increase n_probe
params.n_probe = 20; // Better recall, slower
// If queries are too slow, decrease n_probe
params.n_probe = 5; // Faster, lower recall
// For datasets with natural clusters, fewer clusters may work better
config.ivf_params.n_clusters = 50; // Try fewer clustersLynx provides comprehensive validation tools through setup.sh for ensuring code quality, thread safety, memory safety, and performance. These tools are essential for production-ready code and are part of the quality assurance process documented in Ticket #2072.
| Tool | Command | Purpose | When to Use |
|---|---|---|---|
| Code Coverage | ./setup.sh coverage |
Measure test coverage, identify untested code paths | Before releases, to ensure comprehensive testing |
| ThreadSanitizer | ./setup.sh tsan |
Detect data races and threading issues | After threading changes, before production |
| AddressSanitizer | ./setup.sh asan |
Detect memory errors (buffer overflows, use-after-free) | During development, before commits |
| UBSanitizer | ./setup.sh ubsan |
Detect undefined behavior + memory errors | For comprehensive safety checks |
| Valgrind | ./setup.sh valgrind |
Thorough memory leak detection | Final validation, investigating memory issues |
| clang-tidy | ./setup.sh clang-tidy |
Static code analysis, best practices | Code review, quality improvements |
| Benchmarks | ./setup.sh benchmark |
Performance testing | After optimizations, regression testing |
Code Coverage (./setup.sh coverage):
- Generates HTML coverage report in
build-coverage/coverage_report/ - Target: >85% overall, >95% on critical paths
- Results saved to
tickets/2072_coverage_report.txt
ThreadSanitizer (./setup.sh tsan):
- Detects data races, deadlocks, and thread safety issues
- Essential for multi-threaded code validation
- Results saved to
tickets/2072_tsan_results.txt ⚠️ Note: Slower execution, ~2-5x overhead
AddressSanitizer (./setup.sh asan):
- Finds buffer overflows, use-after-free, memory corruption
- Fast execution, low overhead (~2x)
- Results saved to
tickets/2072_asan_results.txt - Recommended for regular development use
UndefinedBehaviorSanitizer (./setup.sh ubsan):
- Combines AddressSanitizer + undefined behavior detection
- Catches integer overflows, null pointer dereferences, etc.
- Results saved to
tickets/2072_ubsan_results.txt - Most comprehensive sanitizer check
Valgrind (./setup.sh valgrind):
- Definitive memory leak detection
- Slower execution (~10-50x overhead)
- Results saved to
tickets/2072_valgrind_results.txt - Requires:
sudo apt-get install valgrind(Ubuntu/Debian)
clang-tidy (./setup.sh clang-tidy):
- Static analysis for code quality
- Enforces modern C++ best practices
- Results saved to
tickets/2072_clang_tidy_results.txt - Requires:
sudo apt-get install clang-tidy(Ubuntu/Debian)
Benchmarks (./setup.sh benchmark):
- Runs performance benchmark tests only
- Measures search latency, memory usage, throughput
- Results saved to
tickets/2072_benchmark_results.txt - Uses Release build for accurate performance metrics
# 1. Quick validation (during development)
./setup.sh asan # Check for memory errors
# 2. Comprehensive validation (before commits)
./setup.sh coverage # Ensure good test coverage
./setup.sh tsan # Check threading safety
./setup.sh asan # Check memory safety
# 3. Release validation (before production)
./setup.sh coverage # Final coverage check
./setup.sh tsan # Thread safety validation
./setup.sh ubsan # Comprehensive sanitizer check
./setup.sh benchmark # Performance validation
./setup.sh clang-tidy # Code quality review
# 4. Deep investigation (if issues found)
./setup.sh valgrind # Thorough memory leak checkBased on Ticket #2070 quality assurance standards:
- ✅ Code Coverage: >85% overall, >95% on critical paths
- ✅ ThreadSanitizer: Clean (no race conditions)
- ✅ AddressSanitizer: Clean (no memory errors)
- ✅ UBSanitizer: Clean (no undefined behavior)
- ✅ Valgrind: Zero memory leaks in application code
- ✅ clang-tidy: No critical warnings
- ✅ Benchmarks: No performance regression
All validation tools save their results to the tickets/ directory:
tickets/2072_coverage_report.txt- Coverage analysistickets/2072_tsan_results.txt- ThreadSanitizer outputtickets/2072_asan_results.txt- AddressSanitizer outputtickets/2072_ubsan_results.txt- UBSanitizer outputtickets/2072_valgrind_results.txt- Valgrind outputtickets/2072_clang_tidy_results.txt- Static analysistickets/2072_benchmark_results.txt- Performance benchmarks
lynx_vector_db/
├── Makefile # Build configuration
├── setup.sh # Build script
├── CLAUDE.md # AI assistant instructions
├── README.md # This file
├── LICENSE # MIT License
├── doc/
│ └── research.md # ANN algorithm research and other files
├── tickets/
│ ├── README.md # Ticketing system documentation
│ └── done/ # Completed tickets
├── src/
│ ├── include/
│ │ └── lynx/
│ │ └── lynx.h # Public interface
│ ├── lib/
│ │ └── ... # Implementation files
│ ├── main_minimal.cpp # Example of minimal database executable
│ └── main.cpp # Larger example of database usage
└── tests/
└── ... # Unit tests
Lynx uses a clean, layered architecture with a unified database implementation:
-
API Layer: Pure virtual interface (
IVectorDatabase) in src/include/lynx/lynx.h- Provides factory method:
IVectorDatabase::create(config) - All operations (insert, search, remove, persistence)
- Thread-safe by design
- Provides factory method:
-
Database Layer: Unified
VectorDatabaseimplementation- Single class that works with all index types
- Thread-safe using
std::shared_mutex:- Concurrent reads (multiple searches simultaneously)
- Exclusive writes (inserts/removes serialized)
- Manages vector storage and metadata
- Delegates search to the appropriate index
-
Index Layer: Pluggable index implementations
FlatIndex- Brute-force exact search (O(N))HNSWIndex- Hierarchical graphs for fast ANN (O(log N))IVFIndex- Inverted file with k-means clustering
┌─────────────────────────────────────────────────────────┐
│ User Application │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ IVectorDatabase (Interface) │
│ - create() - insert() - search() │
│ - remove() - batch_insert() │
│ - get() - save()/load() │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ VectorDatabase (Unified Implementation) │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Thread Safety: std::shared_mutex │ │
│ │ - Concurrent reads (search, get, contains) │ │
│ │ - Exclusive writes (insert, remove) │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Vector Storage │ │
│ │ - std::unordered_map<id, VectorRecord> │ │
│ │ - Metadata management │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Index Selection (via Config) │ │
│ │ - IndexType::Flat → FlatIndex │ │
│ │ - IndexType::HNSW → HNSWIndex │ │
│ │ - IndexType::IVF → IVFIndex │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────┬───────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────────┐ ┌───────────┐ ┌──────────┐
│ FlatIndex │ │ HNSWIndex │ │ IVFIndex │
├─────────────┤ ├───────────┤ ├──────────┤
│ Exact │ │ Graph- │ │ Cluster- │
│ search │ │ based ANN │ │ based ANN│
│ O(N) │ │ O(log N) │ │ O(k) │
└─────────────┘ └───────────┘ └──────────┘
All operations are thread-safe using std::shared_mutex:
// Multiple threads can search concurrently
std::thread t1([&]() { db->search(query1, k); });
std::thread t2([&]() { db->search(query2, k); }); // Concurrent with t1
// Writes are exclusive
std::thread t3([&]() { db->insert(record); }); // Blocks during write
t1.join(); t2.join(); t3.join();Performance characteristics:
- Low to medium concurrency: Excellent performance with
std::shared_mutex - Embedded systems: Low overhead, simple threading model
- High concurrency: Scales well for most use cases
- doc/MIGRATION_GUIDE.md - Migration guide for unified VectorDatabase
- doc/research.md - ANN algorithm research
- tests/README.md - Infos about unit testing
- tickets/README.md - File-based ticketing system
- Read CLAUDE.md for coding guidelines
- Review open tickets in tickets/ for tasks to work on
- Follow the code style conventions
- Add tests for new functionality
MIT License - see LICENSE for details.