Python bindings for ruvector #73

@adrianco

Description

I had a discussion with Claude about what this could look like. Let me know whether this looks worth doing, and whether I should have a go at building it and opening a pull request.

RuVector Python Bindings Implementation Specification

Overview

This document provides a complete specification for implementing Python bindings for RuVector, a distributed vector database with graph query support. The bindings should be contributed to the main ruvector repository at https://github.com/ruvnet/ruvector.

Goals

  1. Create native Python bindings using PyO3 and Maturin
  2. Reach API parity with the existing Node.js bindings (@ruvector/core, @ruvector/graph-node, @ruvector/gnn)
  3. Publish to PyPI as ruvector
  4. Follow ruvector's existing code style and contribution patterns

Repository Context

Before starting, clone and examine the ruvector repository:

git clone https://github.com/ruvnet/ruvector.git
cd ruvector

Key directories to study:

  • crates/ruvector-core/ - Core vector database engine
  • crates/ruvector-graph/ - Graph database with Cypher support
  • crates/ruvector-gnn/ - Graph Neural Network layers
  • crates/ruvector-node/ - Node.js bindings (reference implementation)
  • npm/packages/ - npm package structure (reference for Python package structure)

Project Structure

Create the following structure within the ruvector repository:

crates/
└── ruvector-python/
    ├── Cargo.toml
    ├── pyproject.toml
    ├── README.md
    ├── LICENSE                    # MIT (same as main repo)
    ├── src/
    │   ├── lib.rs                 # Main module entry point
    │   ├── vector_db.rs           # VectorDB bindings
    │   ├── graph_db.rs            # GraphDB bindings  
    │   ├── gnn.rs                 # GNN layer bindings
    │   ├── compression.rs         # Tensor compression bindings
    │   ├── types.rs               # Shared type definitions
    │   └── errors.rs              # Error handling
    ├── python/
    │   └── ruvector/
    │       ├── __init__.py        # Re-export from native module
    │       ├── py.typed           # PEP 561 marker
    │       └── _types.pyi         # Type stubs
    ├── tests/
    │   ├── test_vector_db.py
    │   ├── test_graph_db.py
    │   ├── test_gnn.py
    │   └── conftest.py
    └── examples/
        ├── basic_vector_search.py
        ├── graph_queries.py
        ├── rag_pipeline.py
        └── knowledge_graph.py

Dependencies

Cargo.toml

[package]
name = "ruvector-python"
version = "0.1.0"
edition = "2021"
authors = ["RuVector Contributors"]
license = "MIT"
description = "Python bindings for RuVector - a distributed vector database that learns"
repository = "https://github.com/ruvnet/ruvector"
keywords = ["vector-database", "graph-database", "machine-learning", "python", "pyo3"]
categories = ["database", "science", "api-bindings"]

[lib]
name = "ruvector"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.22", features = ["extension-module", "abi3-py38"] }
ruvector-core = { path = "../ruvector-core" }
ruvector-graph = { path = "../ruvector-graph" }
ruvector-gnn = { path = "../ruvector-gnn" }
ruvector-collections = { path = "../ruvector-collections" }
ruvector-filter = { path = "../ruvector-filter" }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "1.0"
numpy = "0.22"  # For efficient array handling

[build-dependencies]
pyo3-build-config = "0.22"

[features]
default = []
# Enable GNN features (optional, adds dependencies)
gnn = []
# Enable distributed features
distributed = ["ruvector-core/distributed"]

pyproject.toml

[build-system]
requires = ["maturin>=1.4,<2.0"]
build-backend = "maturin"

[project]
name = "ruvector"
version = "0.1.0"
description = "A distributed vector database that learns - Python bindings"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.8"
authors = [
    { name = "RuVector Contributors" }
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Rust",
    "Topic :: Database",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]
keywords = ["vector-database", "embeddings", "graph-database", "machine-learning", "rag"]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-asyncio>=0.21",
    "numpy>=1.20",
    "sentence-transformers>=2.0",  # For testing with real embeddings
]

[project.urls]
Homepage = "https://github.com/ruvnet/ruvector"
Documentation = "https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-python"
Repository = "https://github.com/ruvnet/ruvector"
Issues = "https://github.com/ruvnet/ruvector/issues"

[tool.maturin]
features = ["pyo3/extension-module"]
python-source = "python"
module-name = "ruvector._ruvector"
strip = true

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"

Implementation Details

src/lib.rs

use pyo3::prelude::*;

mod vector_db;
mod graph_db;
mod gnn;
mod compression;
mod types;
mod errors;

use vector_db::PyVectorDB;
use graph_db::PyGraphDB;
use gnn::{PyGNNLayer, PyRuvectorLayer};
use compression::{compress, decompress};
use types::{PySearchResult, PyNode, PyEdge};

/// RuVector: A distributed vector database that learns
#[pymodule]
fn _ruvector(m: &Bound<'_, PyModule>) -> PyResult<()> {
    // Core classes
    m.add_class::<PyVectorDB>()?;
    m.add_class::<PyGraphDB>()?;
    
    // GNN classes
    m.add_class::<PyGNNLayer>()?;
    m.add_class::<PyRuvectorLayer>()?;
    
    // Result types
    m.add_class::<PySearchResult>()?;
    m.add_class::<PyNode>()?;
    m.add_class::<PyEdge>()?;
    
    // Utility functions
    m.add_function(wrap_pyfunction!(compress, m)?)?;
    m.add_function(wrap_pyfunction!(decompress, m)?)?;
    
    // Version info
    m.add("__version__", env!("CARGO_PKG_VERSION"))?;
    
    Ok(())
}

src/errors.rs

use pyo3::exceptions::{PyIOError, PyValueError, PyRuntimeError};
use pyo3::prelude::*;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuVectorError {
    #[error("Database error: {0}")]
    Database(String),
    
    #[error("Invalid dimension: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },
    
    #[error("Cypher query error: {0}")]
    CypherError(String),
    
    #[error("Serialization error: {0}")]
    SerializationError(String),
    
    #[error("Not found: {0}")]
    NotFound(String),
    
    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

impl From<RuVectorError> for PyErr {
    fn from(err: RuVectorError) -> PyErr {
        match err {
            RuVectorError::Database(msg) => PyRuntimeError::new_err(msg),
            RuVectorError::DimensionMismatch { expected, got } => {
                PyValueError::new_err(format!(
                    "Dimension mismatch: expected {}, got {}", expected, got
                ))
            }
            RuVectorError::CypherError(msg) => PyValueError::new_err(msg),
            RuVectorError::SerializationError(msg) => PyValueError::new_err(msg),
            RuVectorError::NotFound(msg) => PyValueError::new_err(msg),
            RuVectorError::IoError(e) => PyIOError::new_err(e.to_string()),
        }
    }
}

pub type Result<T> = std::result::Result<T, RuVectorError>;
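
Seen from Python, the `From<RuVectorError> for PyErr` conversion above decides which built-in exception a caller would catch. The snippet below restates that mapping in plain Python as a rough sketch; `EXCEPTION_FOR_VARIANT` and `raise_for` are illustrative names and not part of the proposed package.

```python
# Restates the match arms in errors.rs; the names here are illustrative only.
EXCEPTION_FOR_VARIANT = {
    "Database": RuntimeError,
    "DimensionMismatch": ValueError,
    "CypherError": ValueError,
    "SerializationError": ValueError,
    "NotFound": ValueError,
    "IoError": OSError,  # PyIOError surfaces as OSError (IOError is an alias)
}


def raise_for(variant: str, message: str) -> None:
    """Raise the built-in exception that the given variant would map to."""
    raise EXCEPTION_FOR_VARIANT[variant](message)


try:
    raise_for("DimensionMismatch", "Dimension mismatch: expected 384, got 3")
except ValueError as exc:
    caught = str(exc)
```

Since four of the six variants land on `ValueError`, callers that need to tell them apart would have to inspect the message text.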

src/types.rs

use pyo3::prelude::*;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[pyclass]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PySearchResult {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub distance: f32,
    #[pyo3(get)]
    pub metadata: Option<HashMap<String, String>>,
    #[pyo3(get)]
    pub vector: Option<Vec<f32>>,
}

#[pymethods]
impl PySearchResult {
    fn __repr__(&self) -> String {
        format!("SearchResult(id='{}', distance={:.4})", self.id, self.distance)
    }
    
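    /// Convert to a dict; `metadata` is included only when present.
    fn to_dict(&self, py: Python<'_>) -> HashMap<String, PyObject> {
        // The GIL is already held inside a #[pymethods] call, so take the
        // `py` token as an argument instead of re-acquiring it.
        let mut dict = HashMap::new();
        dict.insert("id".to_string(), self.id.clone().into_py(py));
        dict.insert("distance".to_string(), self.distance.into_py(py));
        if let Some(ref meta) = self.metadata {
            dict.insert("metadata".to_string(), meta.clone().into_py(py));
        }
        dict
    }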
}

#[pyclass]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PyNode {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub labels: Vec<String>,
    #[pyo3(get)]
    pub properties: HashMap<String, String>,
}

#[pymethods]
impl PyNode {
    fn __repr__(&self) -> String {
        format!("Node(id='{}', labels={:?})", self.id, self.labels)
    }
}

#[pyclass]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PyEdge {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub edge_type: String,
    #[pyo3(get)]
    pub source: String,
    #[pyo3(get)]
    pub target: String,
    #[pyo3(get)]
    pub properties: HashMap<String, String>,
}

#[pymethods]
impl PyEdge {
    fn __repr__(&self) -> String {
        format!("Edge({} -[{}]-> {})", self.source, self.edge_type, self.target)
    }
}
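
To make the intended Python-side shape of `SearchResult` concrete, here is a pure-Python stand-in that mirrors the fields, `__repr__` and `to_dict` of the Rust struct above. It is only a sketch for discussion; `SearchResultModel` is a hypothetical name and the real class would come from the native module.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SearchResultModel:
    """Pure-Python stand-in for the proposed SearchResult (hypothetical name)."""

    id: str
    distance: float
    metadata: Optional[Dict[str, str]] = None
    vector: Optional[List[float]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Mirrors the Rust to_dict: "metadata" appears only when set,
        # and "vector" is never included.
        out: Dict[str, Any] = {"id": self.id, "distance": self.distance}
        if self.metadata is not None:
            out["metadata"] = dict(self.metadata)
        return out

    def __repr__(self) -> str:
        # Same layout as the Rust format string, four decimal places.
        return f"SearchResult(id='{self.id}', distance={self.distance:.4f})"


hit = SearchResultModel(id="doc1", distance=0.125, metadata={"source": "wiki"})
```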

src/vector_db.rs

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList};
use numpy::PyReadonlyArray1;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use ruvector_core::{VectorDB, DbOptions, VectorEntry, SearchQuery};
use crate::errors::RuVectorError;
use crate::types::PySearchResult;

#[pyclass(name = "VectorDB")]
pub struct PyVectorDB {
    inner: Arc<RwLock<VectorDB>>,
    dimensions: usize,
}

#[pymethods]
impl PyVectorDB {
    /// Create a new VectorDB instance
    /// 
    /// Args:
    ///     dimensions: The dimensionality of vectors to store
    ///     path: Optional path for persistent storage. If None, uses in-memory storage.
    ///     distance_metric: Distance metric to use ('cosine', 'euclidean', 'dot')
    ///     
    /// Example:
    ///     db = VectorDB(dimensions=384, path="./vectors.db")
    #[new]
    #[pyo3(signature = (dimensions, path=None, distance_metric="cosine"))]
    fn new(dimensions: usize, path: Option<&str>, distance_metric: &str) -> PyResult<Self> {
        let mut opts = DbOptions::default();
        opts.dimensions = dimensions as u32;
        opts.distance_metric = distance_metric.to_string();
        
        if let Some(p) = path {
            opts.storage_path = p.to_string();
        }
        
        let db = VectorDB::new(opts)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(db)),
            dimensions,
        })
    }
    
    /// Insert a vector into the database
    /// 
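    /// Args:
    ///     id: Unique identifier for the vector
    ///     vector: 1-D NumPy array of dtype float32 (PyReadonlyArray1<f32> does
    ///         not accept plain Python lists)
    ///     metadata: Optional metadata dictionary; entries whose key or value
    ///         is not a str are silently skipped
    ///     
    /// Example:
    ///     db.insert("doc1", np.array([0.1, 0.2, 0.3], dtype=np.float32),
    ///               metadata={"source": "wiki"})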
    #[pyo3(signature = (id, vector, metadata=None))]
    fn insert(
        &self,
        id: &str,
        vector: PyReadonlyArray1<f32>,
        metadata: Option<&Bound<'_, PyDict>>,
    ) -> PyResult<()> {
        let vec = vector.as_slice()?;
        
        if vec.len() != self.dimensions {
            return Err(RuVectorError::DimensionMismatch {
                expected: self.dimensions,
                got: vec.len(),
            }.into());
        }
        
        let meta = metadata.map(|d| {
            d.iter()
                .filter_map(|(k, v)| {
                    let key = k.extract::<String>().ok()?;
                    let val = v.extract::<String>().ok()?;
                    Some((key, val))
                })
                .collect::<HashMap<_, _>>()
        });
        
        let entry = VectorEntry {
            id: Some(id.to_string()),
            vector: vec.to_vec(),
            metadata: meta.map(|m| serde_json::to_string(&m).unwrap()),
        };
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.insert(entry)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(())
    }
    
    /// Insert multiple vectors in batch
    /// 
    /// Args:
    ///     entries: List of (id, vector, metadata) tuples
    ///     
    /// Example:
    ///     db.insert_batch([
    ///         ("doc1", [0.1, 0.2], {"type": "article"}),
    ///         ("doc2", [0.3, 0.4], {"type": "blog"}),
    ///     ])
    fn insert_batch(&self, entries: &Bound<'_, PyList>) -> PyResult<usize> {
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let mut count = 0;
        for item in entries.iter() {
            let tuple = item.extract::<(String, Vec<f32>, Option<HashMap<String, String>>)>()?;
            let (id, vector, metadata) = tuple;
            
            if vector.len() != self.dimensions {
                return Err(RuVectorError::DimensionMismatch {
                    expected: self.dimensions,
                    got: vector.len(),
                }.into());
            }
            
            let entry = VectorEntry {
                id: Some(id),
                vector,
                metadata: metadata.map(|m| serde_json::to_string(&m).unwrap()),
            };
            
            db.insert(entry)
                .map_err(|e| RuVectorError::Database(e.to_string()))?;
            count += 1;
        }
        
        Ok(count)
    }
    
    /// Search for similar vectors
    /// 
    /// Args:
    ///     query: Query vector as a 1-D NumPy array of dtype float32
    ///     k: Number of results to return
    ///     filter: Optional metadata filter (accepted but currently ignored)
    ///     include_vectors: Whether to include vectors in results
    ///     
    /// Returns:
    ///     List of SearchResult objects
    ///     
    /// Example:
    ///     results = db.search(np.array([0.1, 0.2, 0.3], dtype=np.float32), k=10)
    ///     for r in results:
    ///         print(f"{r.id}: {r.distance}")
    #[pyo3(signature = (query, k=10, filter=None, include_vectors=false))]
    fn search(
        &self,
        query: PyReadonlyArray1<f32>,
        k: usize,
        filter: Option<&Bound<'_, PyDict>>,
        include_vectors: bool,
    ) -> PyResult<Vec<PySearchResult>> {
        let vec = query.as_slice()?;
        
        if vec.len() != self.dimensions {
            return Err(RuVectorError::DimensionMismatch {
                expected: self.dimensions,
                got: vec.len(),
            }.into());
        }
        
        let search_query = SearchQuery {
            vector: vec.to_vec(),
            k,
            filter: None, // TODO: Implement filter parsing
            include_vectors,
        };
        
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let results = db.search(&search_query)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(results.into_iter().map(|r| PySearchResult {
            id: r.id,
            distance: r.distance,
            metadata: r.metadata.and_then(|m| serde_json::from_str(&m).ok()),
            vector: if include_vectors { Some(r.vector) } else { None },
        }).collect())
    }
    
    /// Get a vector by ID
    /// 
    /// Args:
    ///     id: The vector ID to retrieve
    ///     
    /// Returns:
    ///     SearchResult or None if not found
    fn get(&self, id: &str) -> PyResult<Option<PySearchResult>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        match db.get(id) {
            Ok(Some(entry)) => Ok(Some(PySearchResult {
                id: entry.id.unwrap_or_default(),
                distance: 0.0,
                metadata: entry.metadata.and_then(|m| serde_json::from_str(&m).ok()),
                vector: Some(entry.vector),
            })),
            Ok(None) => Ok(None),
            Err(e) => Err(RuVectorError::Database(e.to_string()).into()),
        }
    }
    
    /// Delete a vector by ID
    /// 
    /// Args:
    ///     id: The vector ID to delete
    ///     
    /// Returns:
    ///     True if deleted, False if not found
    fn delete(&self, id: &str) -> PyResult<bool> {
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        // Convert into PyErr so the expression matches the PyResult return type.
        db.delete(id)
            .map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    /// Get the number of vectors in the database
    fn __len__(&self) -> PyResult<usize> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.len().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    /// Get database statistics
    fn stats(&self) -> PyResult<HashMap<String, PyObject>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        Python::with_gil(|py| {
            let mut stats = HashMap::new();
            stats.insert("dimensions".to_string(), self.dimensions.into_py(py));
            stats.insert("count".to_string(), db.len().unwrap_or(0).into_py(py));
            Ok(stats)
        })
    }
    
    /// Sync data to disk (for persistent storage)
    fn sync(&self) -> PyResult<()> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.sync().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    fn __repr__(&self) -> String {
        let count = self.inner.read()
            .map(|db| db.len().unwrap_or(0))
            .unwrap_or(0);
        format!("VectorDB(dimensions={}, count={})", self.dimensions, count)
    }
}
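
For the test suite it may help to have a slow but obviously correct model to compare the native `VectorDB` against. The sketch below is one such brute-force model in plain Python. `ReferenceVectorDB` is a hypothetical helper, and it assumes cosine *distance* means 1 minus cosine similarity, which the real engine may define differently.

```python
import math
from typing import Dict, List, Optional, Tuple


def _cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ReferenceVectorDB:
    """Brute-force, in-memory model of the proposed VectorDB surface."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions
        self._rows: Dict[str, Tuple[List[float], Optional[Dict[str, str]]]] = {}

    def _check(self, vector: List[float]) -> None:
        # Same message shape as the DimensionMismatch arm in errors.rs.
        if len(vector) != self.dimensions:
            raise ValueError(
                f"Dimension mismatch: expected {self.dimensions}, got {len(vector)}"
            )

    def insert(self, id: str, vector: List[float],
               metadata: Optional[Dict[str, str]] = None) -> None:
        self._check(vector)
        self._rows[id] = (list(vector), metadata)

    def search(self, query: List[float], k: int = 10) -> List[Tuple[str, float]]:
        """Return up to k (id, distance) pairs, nearest first."""
        self._check(query)
        scored = [(id_, _cosine_distance(query, vec))
                  for id_, (vec, _) in self._rows.items()]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]

    def delete(self, id: str) -> bool:
        """True if the id existed and was removed, False otherwise."""
        return self._rows.pop(id, None) is not None

    def __len__(self) -> int:
        return len(self._rows)


db = ReferenceVectorDB(dimensions=2)
db.insert("doc1", [1.0, 0.0], {"type": "article"})
db.insert("doc2", [0.0, 1.0], {"type": "blog"})
nearest = db.search([1.0, 0.1], k=1)
```

A test in tests/test_vector_db.py could then insert the same vectors into both implementations and compare the returned ids.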

src/graph_db.rs

use pyo3::prelude::*;
use pyo3::types::PyDict;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use ruvector_graph::{GraphDB, NodeBuilder, EdgeBuilder};
use crate::errors::RuVectorError;
use crate::types::{PyNode, PyEdge};

#[pyclass(name = "GraphDB")]
pub struct PyGraphDB {
    inner: Arc<RwLock<GraphDB>>,
}

#[pymethods]
impl PyGraphDB {
    /// Create a new GraphDB instance
    /// 
    /// Args:
    ///     path: Optional path for persistent storage
    ///     
    /// Example:
    ///     graph = GraphDB(path="./graph.db")
    #[new]
    #[pyo3(signature = (path=None))]
    fn new(path: Option<&str>) -> PyResult<Self> {
        let db = if let Some(p) = path {
            GraphDB::with_path(p)
        } else {
            GraphDB::new()
        }.map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(db)),
        })
    }
    
    /// Execute a Cypher query
    /// 
    /// Args:
    ///     cypher: Cypher query string
    ///     
    /// Returns:
    ///     Query results as a list of dictionaries
    ///     
    /// Example:
    ///     results = graph.execute("MATCH (n:Person) RETURN n.name")
    fn execute(&self, cypher: &str) -> PyResult<PyObject> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let result = db.execute(cypher)
            .map_err(|e| RuVectorError::CypherError(e.to_string()))?;
        
        Python::with_gil(|py| {
            // Convert result to Python object
            let json_str = serde_json::to_string(&result)
                .map_err(|e| RuVectorError::SerializationError(e.to_string()))?;
            
            let json_module = py.import_bound("json")?;
            let parsed = json_module.call_method1("loads", (json_str,))?;
            Ok(parsed.into())
        })
    }
    
    /// Create a node
    /// 
    /// Args:
    ///     id: Unique node identifier
    ///     labels: List of node labels
    ///     properties: Node properties dictionary
    ///     
    /// Example:
    ///     graph.create_node("person1", ["Person"], {"name": "Alice", "age": "30"})
    #[pyo3(signature = (id, labels=None, properties=None))]
    fn create_node(
        &self,
        id: &str,
        labels: Option<Vec<String>>,
        properties: Option<&Bound<'_, PyDict>>,
    ) -> PyResult<PyNode> {
        let mut builder = NodeBuilder::new(id);
        
        // Borrow `labels` here; it is consumed later when building PyNode.
        if let Some(lbls) = &labels {
            for label in lbls {
                builder = builder.label(label);
            }
        }
        
        if let Some(props) = properties {
            for (key, value) in props.iter() {
                let k: String = key.extract()?;
                let v: String = value.extract()?;
                builder = builder.property(&k, v);
            }
        }
        
        let node = builder.build();
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.create_node(node.clone())
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyNode {
            id: id.to_string(),
            labels: labels.unwrap_or_default(),
            properties: properties
                .map(|p| {
                    p.iter()
                        .filter_map(|(k, v)| {
                            Some((k.extract::<String>().ok()?, v.extract::<String>().ok()?))
                        })
                        .collect()
                })
                .unwrap_or_default(),
        })
    }
    
    /// Create an edge between nodes
    /// 
    /// Args:
    ///     source: Source node ID
    ///     target: Target node ID
    ///     edge_type: Relationship type
    ///     properties: Edge properties dictionary
    ///     
    /// Example:
    ///     graph.create_edge("person1", "person2", "KNOWS", {"since": "2020"})
    #[pyo3(signature = (source, target, edge_type, properties=None))]
    fn create_edge(
        &self,
        source: &str,
        target: &str,
        edge_type: &str,
        properties: Option<&Bound<'_, PyDict>>,
    ) -> PyResult<PyEdge> {
        let mut builder = EdgeBuilder::new(source, target, edge_type);
        
        if let Some(props) = properties {
            for (key, value) in props.iter() {
                let k: String = key.extract()?;
                let v: String = value.extract()?;
                builder = builder.property(&k, v);
            }
        }
        
        let edge = builder.build();
        let edge_id = edge.id.clone();
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.create_edge(edge)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyEdge {
            id: edge_id,
            edge_type: edge_type.to_string(),
            source: source.to_string(),
            target: target.to_string(),
            properties: properties
                .map(|p| {
                    p.iter()
                        .filter_map(|(k, v)| {
                            Some((k.extract::<String>().ok()?, v.extract::<String>().ok()?))
                        })
                        .collect()
                })
                .unwrap_or_default(),
        })
    }
    
    /// Get a node by ID
    fn get_node(&self, id: &str) -> PyResult<Option<PyNode>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        match db.get_node(id) {
            Ok(Some(node)) => Ok(Some(PyNode {
                id: node.id,
                labels: node.labels,
                properties: node.properties,
            })),
            Ok(None) => Ok(None),
            Err(e) => Err(RuVectorError::Database(e.to_string()).into()),
        }
    }
    
    /// Delete a node by ID
    fn delete_node(&self, id: &str) -> PyResult<bool> {
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        // Convert into PyErr so the expression matches the PyResult return type.
        db.delete_node(id)
            .map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    /// Sync data to disk
    fn sync(&self) -> PyResult<()> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.sync().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    fn __repr__(&self) -> String {
        "GraphDB()".to_string()
    }
}
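
Because `create_node` and `create_edge` extract every property value as a `String`, a non-string value such as an `int` would raise rather than be stored. A caller could normalise properties first, roughly as sketched below; `stringify_properties` is a hypothetical helper and not part of the proposed API.

```python
from typing import Any, Dict


def stringify_properties(properties: Dict[str, Any]) -> Dict[str, str]:
    """Coerce property values to str so they pass the String extraction."""
    out: Dict[str, str] = {}
    for key, value in properties.items():
        if not isinstance(key, str):
            raise TypeError(f"property keys must be str, got {type(key).__name__}")
        if isinstance(value, bool):
            # Lower-case booleans; this spelling is an assumption about what
            # is convenient for later Cypher or JSON use.
            out[key] = "true" if value else "false"
        else:
            out[key] = str(value)
    return out


props = stringify_properties({"name": "Alice", "age": 30, "active": True})
# `props` could then be passed as the `properties` argument of create_node.
```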

src/gnn.rs

use pyo3::prelude::*;
use numpy::{PyArray1, PyReadonlyArray1, PyReadonlyArray2};
use std::sync::{Arc, RwLock};

use ruvector_gnn::{GNNLayer, RuvectorLayer, LayerConfig};
use crate::errors::RuVectorError;

#[pyclass(name = "GNNLayer")]
pub struct PyGNNLayer {
    inner: Arc<RwLock<GNNLayer>>,
    input_dim: usize,
    output_dim: usize,
}

#[pymethods]
impl PyGNNLayer {
    /// Create a new GNN layer
    /// 
    /// Args:
    ///     input_dim: Input feature dimension
    ///     output_dim: Output feature dimension
    ///     heads: Number of attention heads
    ///     dropout: Dropout rate
    ///     
    /// Example:
    ///     layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
    #[new]
    #[pyo3(signature = (input_dim, output_dim, heads=4, dropout=0.1))]
    fn new(input_dim: usize, output_dim: usize, heads: usize, dropout: f32) -> PyResult<Self> {
        let config = LayerConfig {
            input_dim,
            output_dim,
            heads,
            dropout,
        };
        
        let layer = GNNLayer::new(config)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(layer)),
            input_dim,
            output_dim,
        })
    }
    
    /// Forward pass through the GNN layer
    /// 
    /// Args:
    ///     query: Query features (1D array)
    ///     neighbors: Neighbor features (2D array: n_neighbors x feature_dim)
    ///     weights: Edge weights (1D array: n_neighbors)
    ///     
    /// Returns:
    ///     Enhanced query features (1D array)
    fn forward<'py>(
        &self,
        py: Python<'py>,
        query: PyReadonlyArray1<f32>,
        neighbors: PyReadonlyArray2<f32>,
        weights: PyReadonlyArray1<f32>,
    ) -> PyResult<Bound<'py, PyArray1<f32>>> {
        let query_vec = query.as_slice()?;
        let neighbors_data = neighbors.as_array();
        let weights_vec = weights.as_slice()?;
        
        // Convert neighbors to Vec<Vec<f32>>
        let neighbors_vec: Vec<Vec<f32>> = neighbors_data
            .rows()
            .into_iter()
            .map(|row| row.to_vec())
            .collect();
        
        let layer = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let result = layer.forward(query_vec, &neighbors_vec, weights_vec)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyArray1::from_vec_bound(py, result))
    }
    
    fn __repr__(&self) -> String {
        format!("GNNLayer(input_dim={}, output_dim={})", self.input_dim, self.output_dim)
    }
}

#[pyclass(name = "RuvectorLayer")]
pub struct PyRuvectorLayer {
    inner: Arc<RwLock<RuvectorLayer>>,
}

#[pymethods]
impl PyRuvectorLayer {
    /// Create a RuvectorLayer (GNN + attention for vector search enhancement)
    /// 
    /// Args:
    ///     input_dim: Input dimension
    ///     output_dim: Output dimension
    ///     heads: Number of attention heads
    ///     dropout: Dropout rate
    #[new]
    #[pyo3(signature = (input_dim, output_dim, heads=4, dropout=0.1))]
    fn new(input_dim: usize, output_dim: usize, heads: usize, dropout: f32) -> PyResult<Self> {
        let layer = RuvectorLayer::new(input_dim, output_dim, heads, dropout)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(layer)),
        })
    }
    
    /// Apply differentiable search enhancement
    fn enhance_search<'py>(
        &self,
        py: Python<'py>,
        query: PyReadonlyArray1<f32>,
        candidates: PyReadonlyArray2<f32>,
    ) -> PyResult<Bound<'py, PyArray1<f32>>> {
        let query_vec = query.as_slice()?;
        let candidates_data = candidates.as_array();
        
        let candidates_vec: Vec<Vec<f32>> = candidates_data
            .rows()
            .into_iter()
            .map(|row| row.to_vec())
            .collect();
        
        let layer = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let scores = layer.enhance_search(query_vec, &candidates_vec)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyArray1::from_vec_bound(py, scores))
    }
}
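
The `forward` docstring above fixes three array shapes: a 1-D query, an `n_neighbors x feature_dim` neighbor matrix, and one weight per neighbor. A small pre-flight check on the Python side might look like the sketch below. `check_forward_shapes` is a hypothetical helper, and it assumes `feature_dim` equals the layer's `input_dim`, which the spec does not state explicitly.

```python
from typing import Sequence


def check_forward_shapes(
    query: Sequence[float],
    neighbors: Sequence[Sequence[float]],
    weights: Sequence[float],
    input_dim: int,
) -> int:
    """Validate the documented shapes and return the number of neighbors."""
    if len(query) != input_dim:
        raise ValueError(f"query has length {len(query)}, expected {input_dim}")
    for i, row in enumerate(neighbors):
        if len(row) != input_dim:
            raise ValueError(
                f"neighbor {i} has length {len(row)}, expected {input_dim}"
            )
    if len(weights) != len(neighbors):
        raise ValueError(
            f"got {len(weights)} weights for {len(neighbors)} neighbors"
        )
    return len(neighbors)


n = check_forward_shapes(
    query=[0.1, 0.2, 0.3],
    neighbors=[[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]],
    weights=[0.5, 0.5],
    input_dim=3,
)
```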

src/compression.rs

use pyo3::prelude::*;
use numpy::{PyArray1, PyReadonlyArray1};

use ruvector_gnn::compression::{TensorCompressor, CompressionLevel};
use crate::errors::RuVectorError;

/// Compress a vector using adaptive quantization
/// 
/// Args:
///     vector: Input vector as numpy array
///     ratio: Compression ratio (0.0-1.0). Lower = more compression.
///     
/// Returns:
///     Compressed vector as numpy array
///     
/// Example:
///     compressed = ruvector.compress(embedding, ratio=0.3)  # ~8x compression
#[pyfunction]
#[pyo3(signature = (vector, ratio=0.5))]
pub fn compress<'py>(
    py: Python<'py>,
    vector: PyReadonlyArray1<f32>,
    ratio: f32,
) -> PyResult<Bound<'py, PyArray1<f32>>> {
    let vec = vector.as_slice()?;
    
    let level = if ratio < 0.1 {
        CompressionLevel::Binary      // 32x
    } else if ratio < 0.25 {
        CompressionLevel::PQ4         // 16x
    } else if ratio < 0.5 {
        CompressionLevel::PQ8         // 8x
    } else if ratio < 0.75 {
        CompressionLevel::Float16     // 2x
    } else {
        CompressionLevel::None
    };
    
    let compressor = TensorCompressor::new(level);
    let compressed = compressor.compress(vec)
        .map_err(|e| RuVectorError::Database(e.to_string()))?;
    
    Ok(PyArray1::from_vec_bound(py, compressed))
}

/// Decompress a vector
/// 
/// Args:
///     vector: Compressed vector as numpy array
///     original_dim: Original vector dimension
///     
/// Returns:
///     Decompressed vector as numpy array
#[pyfunction]
#[pyo3(signature = (vector, original_dim=None))]
pub fn decompress<'py>(
    py: Python<'py>,
    vector: PyReadonlyArray1<f32>,
    original_dim: Option<usize>,
) -> PyResult<Bound<'py, PyArray1<f32>>> {
    let vec = vector.as_slice()?;
    
    let compressor = TensorCompressor::default();
    let decompressed = compressor.decompress(vec, original_dim)
        .map_err(|e| RuVectorError::Database(e.to_string()))?;
    
    Ok(PyArray1::from_vec_bound(py, decompressed))
}
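
The thresholds in `compress` are easy to misread, so here is the same `ratio` to level mapping restated as a plain Python function. It only mirrors the branch order shown above and says nothing about how `TensorCompressor` behaves internally; `compression_level` is an illustrative name.

```python
def compression_level(ratio: float) -> str:
    """Return the level name that `compress` above would select for `ratio`."""
    if ratio < 0.1:
        return "Binary"   # 32x
    if ratio < 0.25:
        return "PQ4"      # 16x
    if ratio < 0.5:
        return "PQ8"      # 8x
    if ratio < 0.75:
        return "Float16"  # 2x
    return "None"


levels = {r: compression_level(r) for r in (0.05, 0.3, 0.5, 0.9)}
```

Note that the comparisons are strict, so the default `ratio=0.5` selects `Float16` rather than `PQ8`.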

python/ruvector/__init__.py

"""
RuVector: A distributed vector database that learns.

Store embeddings, query with Cypher, scale horizontally with Raft consensus,
and let the index improve itself through Graph Neural Networks.

Example:
    >>> from ruvector import VectorDB, GraphDB
    >>> 
    >>> # Vector search
    >>> db = VectorDB(dimensions=384, path="./vectors.db")
    >>> db.insert("doc1", embedding, metadata={"source": "wiki"})
    >>> results = db.search(query_embedding, k=10)
    >>> 
    >>> # Graph queries
    >>> graph = GraphDB(path="./graph.db")
    >>> graph.execute("CREATE (a:Person {name: 'Alice'})")
    >>> graph.execute("MATCH (p:Person) RETURN p.name")
"""

from ._ruvector import (
    # Core classes
    VectorDB,
    GraphDB,
    
    # GNN classes
    GNNLayer,
    RuvectorLayer,
    
    # Result types
    SearchResult,
    Node,
    Edge,
    
    # Utility functions
    compress,
    decompress,
    
    # Version
    __version__,
)

__all__ = [
    # Core
    "VectorDB",
    "GraphDB",
    
    # GNN
    "GNNLayer",
    "RuvectorLayer",
    
    # Types
    "SearchResult",
    "Node",
    "Edge",
    
    # Utils
    "compress",
    "decompress",
    
    # Meta
    "__version__",
]

python/ruvector/_types.pyi (Type Stubs)

"""Type stubs for ruvector native module."""

from typing import Dict, List, Optional, Any, Sequence, Union
import numpy as np
import numpy.typing as npt

class SearchResult:
    id: str
    distance: float
    metadata: Optional[Dict[str, str]]
    vector: Optional[List[float]]
    
    def to_dict(self) -> Dict[str, Any]: ...

class Node:
    id: str
    labels: List[str]
    properties: Dict[str, str]

class Edge:
    id: str
    edge_type: str
    source: str
    target: str
    properties: Dict[str, str]

class VectorDB:
    def __init__(
        self,
        dimensions: int,
        path: Optional[str] = None,
        distance_metric: str = "cosine",
    ) -> None: ...
    
    def insert(
        self,
        id: str,
        vector: Union[List[float], npt.NDArray[np.float32]],
        metadata: Optional[Dict[str, str]] = None,
    ) -> None: ...
    
    def insert_batch(
        self,
        entries: List[tuple[str, List[float], Optional[Dict[str, str]]]],
    ) -> int: ...
    
    def search(
        self,
        query: Union[List[float], npt.NDArray[np.float32]],
        k: int = 10,
        filter: Optional[Dict[str, Any]] = None,
        include_vectors: bool = False,
    ) -> List[SearchResult]: ...
    
    def get(self, id: str) -> Optional[SearchResult]: ...
    def delete(self, id: str) -> bool: ...
    def sync(self) -> None: ...
    def stats(self) -> Dict[str, Any]: ...
    def __len__(self) -> int: ...

class GraphDB:
    def __init__(self, path: Optional[str] = None) -> None: ...
    
    def execute(self, cypher: str) -> Any: ...
    
    def create_node(
        self,
        id: str,
        labels: Optional[List[str]] = None,
        properties: Optional[Dict[str, str]] = None,
    ) -> Node: ...
    
    def create_edge(
        self,
        source: str,
        target: str,
        edge_type: str,
        properties: Optional[Dict[str, str]] = None,
    ) -> Edge: ...
    
    def get_node(self, id: str) -> Optional[Node]: ...
    def delete_node(self, id: str) -> bool: ...
    def sync(self) -> None: ...

class GNNLayer:
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        heads: int = 4,
        dropout: float = 0.1,
    ) -> None: ...
    
    def forward(
        self,
        query: npt.NDArray[np.float32],
        neighbors: npt.NDArray[np.float32],
        weights: npt.NDArray[np.float32],
    ) -> npt.NDArray[np.float32]: ...

class RuvectorLayer:
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        heads: int = 4,
        dropout: float = 0.1,
    ) -> None: ...
    
    def enhance_search(
        self,
        query: npt.NDArray[np.float32],
        candidates: npt.NDArray[np.float32],
    ) -> npt.NDArray[np.float32]: ...

def compress(
    vector: npt.NDArray[np.float32],
    ratio: float = 0.5,
) -> npt.NDArray[np.float32]: ...

def decompress(
    vector: npt.NDArray[np.float32],
    original_dim: Optional[int] = None,
) -> npt.NDArray[np.float32]: ...

__version__: str

Tests

tests/conftest.py

import pytest
import numpy as np
import tempfile

@pytest.fixture
def temp_dir():
    """Create a temporary directory for database files."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield tmpdir

@pytest.fixture
def sample_vectors():
    """Generate sample vectors for testing."""
    np.random.seed(42)
    return {
        "dimensions": 128,
        "vectors": [
            ("doc1", np.random.rand(128).astype(np.float32)),
            ("doc2", np.random.rand(128).astype(np.float32)),
            ("doc3", np.random.rand(128).astype(np.float32)),
        ],
        "query": np.random.rand(128).astype(np.float32),
    }

tests/test_vector_db.py

import pytest
import numpy as np
from ruvector import VectorDB

class TestVectorDB:
    def test_create_in_memory(self):
        db = VectorDB(dimensions=128)
        assert len(db) == 0
    
    def test_create_persistent(self, temp_dir):
        path = f"{temp_dir}/vectors.db"
        db = VectorDB(dimensions=128, path=path)
        assert len(db) == 0
    
    def test_insert_and_search(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        # Insert vectors
        for id, vector in sample_vectors["vectors"]:
            db.insert(id, vector)
        
        assert len(db) == 3
        
        # Search
        results = db.search(sample_vectors["query"], k=2)
        assert len(results) == 2
        assert all(hasattr(r, "id") for r in results)
        assert all(hasattr(r, "distance") for r in results)
    
    def test_insert_with_metadata(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        id, vector = sample_vectors["vectors"][0]
        db.insert(id, vector, metadata={"source": "test", "type": "document"})
        
        result = db.get(id)
        assert result is not None
        assert result.metadata["source"] == "test"
    
    def test_dimension_mismatch(self):
        db = VectorDB(dimensions=128)
        wrong_dim = np.random.rand(256).astype(np.float32)
        
        # Keep only the call under test inside the raises block
        with pytest.raises(ValueError, match="Dimension mismatch"):
            db.insert("test", wrong_dim)
    
    def test_delete(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        id, vector = sample_vectors["vectors"][0]
        db.insert(id, vector)
        assert len(db) == 1
        
        deleted = db.delete(id)
        assert deleted
        assert len(db) == 0
    
    def test_batch_insert(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        entries = [
            (id, vector.tolist(), {"index": str(i)})
            for i, (id, vector) in enumerate(sample_vectors["vectors"])
        ]
        
        count = db.insert_batch(entries)
        assert count == 3
        assert len(db) == 3
    
    def test_persistence(self, temp_dir, sample_vectors):
        path = f"{temp_dir}/vectors.db"
        
        # Create and populate
        db1 = VectorDB(dimensions=sample_vectors["dimensions"], path=path)
        for id, vector in sample_vectors["vectors"]:
            db1.insert(id, vector)
        db1.sync()
        
        # Reopen
        db2 = VectorDB(dimensions=sample_vectors["dimensions"], path=path)
        assert len(db2) == 3
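`test_insert_and_search` only checks that results carry `id` and `distance` attributes. A brute-force reference could also let the tests check ranking. The sketch below assumes the `"cosine"` metric reports `1 - cosine similarity` and that vectors are non-zero. Both are assumptions about ruvector, and `cosine_distance` and `brute_force_search` are hypothetical test helpers written with the standard library only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(
    vectors: Dict[str, Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every stored vector and return the k closest as (id, distance)."""
    scored = [(vid, cosine_distance(vec, query)) for vid, vec in vectors.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    vectors = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [1.0, 1.0]}
    for vid, dist in brute_force_search(vectors, [1.0, 0.1], k=2):
        print(f"{vid}: {dist:.4f}")
```

In a real test, the ids returned here could be compared with `[r.id for r in db.search(query, k=2)]`. Because HNSW is approximate, an exact match is only a reasonable expectation on small datasets like the three-vector fixture.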

tests/test_graph_db.py

import pytest
from ruvector import GraphDB

class TestGraphDB:
    def test_create_in_memory(self):
        graph = GraphDB()
        assert graph is not None
    
    def test_create_node(self):
        graph = GraphDB()
        
        node = graph.create_node(
            "person1",
            labels=["Person"],
            properties={"name": "Alice", "age": "30"}
        )
        
        assert node.id == "person1"
        assert "Person" in node.labels
        assert node.properties["name"] == "Alice"
    
    def test_create_edge(self):
        graph = GraphDB()
        
        graph.create_node("person1", ["Person"], {"name": "Alice"})
        graph.create_node("person2", ["Person"], {"name": "Bob"})
        
        edge = graph.create_edge(
            "person1", "person2", "KNOWS",
            properties={"since": "2020"}
        )
        
        assert edge.source == "person1"
        assert edge.target == "person2"
        assert edge.edge_type == "KNOWS"
    
    def test_cypher_create(self):
        graph = GraphDB()
        
        graph.execute("CREATE (a:Person {name: 'Alice'})")
        graph.execute("CREATE (b:Person {name: 'Bob'})")
        graph.execute("MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) CREATE (a)-[:KNOWS]->(b)")
        
        result = graph.execute("MATCH (p:Person) RETURN p.name")
        assert len(result) == 2
    
    def test_cypher_match(self):
        graph = GraphDB()
        
        graph.create_node("alice", ["Person"], {"name": "Alice"})
        graph.create_node("bob", ["Person"], {"name": "Bob"})
        graph.create_edge("alice", "bob", "KNOWS")
        
        result = graph.execute("""
            MATCH (a:Person)-[:KNOWS]->(b:Person)
            RETURN a.name, b.name
        """)
        
        assert len(result) == 1
    
    def test_get_node(self):
        graph = GraphDB()
        
        graph.create_node("test", ["Label"], {"key": "value"})
        
        node = graph.get_node("test")
        assert node is not None
        assert node.id == "test"
        
        missing = graph.get_node("nonexistent")
        assert missing is None
    
    def test_delete_node(self):
        graph = GraphDB()
        
        graph.create_node("test", ["Label"])
        assert graph.get_node("test") is not None
        
        deleted = graph.delete_node("test")
        assert deleted
        assert graph.get_node("test") is None

tests/test_gnn.py

import pytest
import numpy as np
from ruvector import GNNLayer, RuvectorLayer, compress, decompress

class TestGNNLayer:
    def test_create_layer(self):
        layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
        assert layer is not None
    
    def test_forward(self):
        layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
        
        query = np.random.rand(128).astype(np.float32)
        neighbors = np.random.rand(5, 128).astype(np.float32)
        weights = np.ones(5, dtype=np.float32)
        
        output = layer.forward(query, neighbors, weights)
        assert output.shape == (256,)

class TestRuvectorLayer:
    def test_enhance_search(self):
        layer = RuvectorLayer(input_dim=128, output_dim=128, heads=4)
        
        query = np.random.rand(128).astype(np.float32)
        candidates = np.random.rand(10, 128).astype(np.float32)
        
        scores = layer.enhance_search(query, candidates)
        assert scores.shape == (10,)

class TestCompression:
    def test_compress_decompress(self):
        original = np.random.rand(128).astype(np.float32)
        
        compressed = compress(original, ratio=0.5)
        decompressed = decompress(compressed, original_dim=128)
        
        # Should be close but not exact due to lossy compression
        assert decompressed.shape == original.shape
    
    def test_compression_levels(self):
        original = np.random.rand(128).astype(np.float32)
        
        # Different compression levels
        for ratio in [0.05, 0.2, 0.4, 0.6, 0.9]:
            compressed = compress(original, ratio=ratio)
            assert compressed is not None
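The comment in `test_compress_decompress` says the round trip should be close but not exact, yet the test only asserts the shape. Here is a sketch of how a tolerance could be derived for the Float16 level, which is what `ratio=0.5` selects. It uses the standard library's half-precision `struct` format as a stand-in. That `TensorCompressor` rounds the same way is an assumption, and `float16_round_trip` is a hypothetical helper.

```python
import random
import struct
from typing import List, Sequence


def float16_round_trip(values: Sequence[float]) -> List[float]:
    """Round each value through IEEE 754 half precision and back."""
    fmt = f"<{len(values)}e"  # 'e' is the half-precision format code
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


# For inputs in [0, 1) the largest spacing between adjacent half-precision
# values is 2**-11, so round-to-nearest is off by at most half of that.
TOLERANCE = 2.0 ** -12

random.seed(42)
original = [random.random() for _ in range(128)]
restored = float16_round_trip(original)
max_error = max(abs(a - b) for a, b in zip(original, restored))

assert len(restored) == len(original)
assert max_error <= TOLERANCE
print(f"max abs error: {max_error:.2e} (tolerance {TOLERANCE:.2e})")
```

With the real bindings, the equivalent assertion would be something like `np.allclose(decompressed, original, atol=2**-12)`. The quantized levels (PQ8, PQ4, Binary) would need looser, empirically chosen bounds.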

Examples

examples/basic_vector_search.py

"""Basic vector search example with RuVector."""

import numpy as np
from ruvector import VectorDB

def main():
    # Create a vector database
    db = VectorDB(dimensions=384, path="./demo_vectors.db")
    
    # Generate some sample embeddings (in practice, use a real embedding model)
    np.random.seed(42)
    documents = [
        ("doc1", "Introduction to machine learning"),
        ("doc2", "Deep learning with neural networks"),
        ("doc3", "Natural language processing basics"),
        ("doc4", "Computer vision fundamentals"),
        ("doc5", "Reinforcement learning tutorial"),
    ]
    
    # Insert documents with their embeddings
    for doc_id, text in documents:
        # Simulate embedding generation
        embedding = np.random.rand(384).astype(np.float32)
        db.insert(doc_id, embedding, metadata={"text": text})
    
    print(f"Inserted {len(db)} documents")
    
    # Search for similar documents
    query = np.random.rand(384).astype(np.float32)
    results = db.search(query, k=3)
    
    print("\nSearch results:")
    for result in results:
        print(f"  {result.id}: distance={result.distance:.4f}")
        if result.metadata:
            print(f"    text: {result.metadata.get('text', 'N/A')}")

if __name__ == "__main__":
    main()

examples/knowledge_graph.py

"""Knowledge graph example combining vectors and graph queries."""

import numpy as np
from ruvector import VectorDB, GraphDB

def main():
    # Initialize both databases
    vectors = VectorDB(dimensions=384, path="./kg_vectors.db")
    graph = GraphDB(path="./kg_graph.db")
    
    # Create some entities
    entities = [
        ("python", "Language", {"name": "Python", "type": "programming"}),
        ("rust", "Language", {"name": "Rust", "type": "programming"}),
        ("ml", "Topic", {"name": "Machine Learning"}),
        ("webdev", "Topic", {"name": "Web Development"}),
    ]
    
    np.random.seed(42)
    for entity_id, label, props in entities:
        # Create graph node
        graph.create_node(entity_id, [label], props)
        
        # Create embedding and store in vector DB
        embedding = np.random.rand(384).astype(np.float32)
        vectors.insert(entity_id, embedding, metadata=props)
    
    # Create relationships
    graph.create_edge("python", "ml", "USED_FOR")
    graph.create_edge("python", "webdev", "USED_FOR")
    graph.create_edge("rust", "webdev", "USED_FOR")
    
    # Query: Find what Python is used for
    result = graph.execute("""
        MATCH (lang:Language {name: 'Python'})-[:USED_FOR]->(topic:Topic)
        RETURN topic.name
    """)
    print("Python is used for:", result)
    
    # Semantic search: Find entities similar to a query
    query = np.random.rand(384).astype(np.float32)
    similar = vectors.search(query, k=2)
    print("\nMost similar entities:")
    for r in similar:
        print(f"  {r.id}: {r.distance:.4f}")

if __name__ == "__main__":
    main()

CI/CD Configuration

.github/workflows/python.yml (add to existing workflows)

name: Python Bindings

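on:
  push:
    branches: [main]
    # Without a tags filter, tag pushes never trigger this workflow,
    # so the publish job's python-v* condition could never be true.
    tags: ['python-v*']
    paths:
      - 'crates/ruvector-python/**'
  pull_request:
    paths:
      - 'crates/ruvector-python/**'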

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      
      - name: Install maturin
        run: pip install maturin pytest numpy
      
      - name: Build and test
        working-directory: crates/ruvector-python
        run: |
          # maturin develop needs an active virtualenv, which setup-python does not create
          pip install .
          pytest tests/ -v

  build-wheels:
    needs: test
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: crates/ruvector-python
          args: --release --out dist
          manylinux: auto
      
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: crates/ruvector-python/dist/*.whl

  publish:
    needs: build-wheels
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/python-v')
    
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: wheels-*
          merge-multiple: true
          path: dist
      
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
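The publish job is gated on refs that start with `refs/tags/python-v`. A release script could use the same prefix to confirm that the tag and the package version agree before uploading. This is only a sketch: treating the rest of the tag as the version string is an assumption, and `version_from_ref` and `check_release` are hypothetical helpers.

```python
from typing import Optional

# Same prefix the workflow's startsWith() condition tests for.
TAG_PREFIX = "refs/tags/python-v"


def version_from_ref(ref: str) -> Optional[str]:
    """Return the version encoded in a release tag ref, or None for other refs."""
    if not ref.startswith(TAG_PREFIX):
        return None
    return ref[len(TAG_PREFIX):]


def check_release(ref: str, package_version: str) -> bool:
    """True only when the ref is a release tag whose version matches the package."""
    tagged = version_from_ref(ref)
    return tagged is not None and tagged == package_version


if __name__ == "__main__":
    print(check_release("refs/tags/python-v0.1.0", "0.1.0"))
    print(check_release("refs/heads/main", "0.1.0"))
```

In CI, the ref would come from the `GITHUB_REF` environment variable and the package version from `ruvector.__version__`.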

README.md

# RuVector Python Bindings

Python bindings for [RuVector](https://github.com/ruvnet/ruvector), a distributed vector database that learns.

## Installation

```bash
pip install ruvector
```

## Quick Start

```python
from ruvector import VectorDB, GraphDB
import numpy as np

# Vector search
db = VectorDB(dimensions=384, path="./vectors.db")

# Insert vectors
embedding = np.random.rand(384).astype(np.float32)
db.insert("doc1", embedding, metadata={"source": "wiki"})

# Search
query = np.random.rand(384).astype(np.float32)
results = db.search(query, k=10)
for r in results:
    print(f"{r.id}: {r.distance}")

# Graph queries with Cypher
graph = GraphDB(path="./graph.db")
graph.execute("CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})")
result = graph.execute("MATCH (p:Person) RETURN p.name")
```

## Features

- Vector Search: HNSW index, <0.5ms latency, SIMD acceleration
- Graph Queries: Neo4j-style Cypher syntax
- GNN Enhancement: Neural network layers that improve search over time
- Compression: 2-32x memory reduction with adaptive quantization
- Persistence: Optional file-based storage with crash recovery

## API Reference

See the [full documentation](https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-python).

## License

MIT


Contribution Checklist

Before submitting the PR:

1. [ ] All tests pass: `pytest tests/ -v`
2. [ ] Code is formatted: `cargo fmt`
3. [ ] No clippy warnings: `cargo clippy`
4. [ ] Type stubs are complete and accurate
5. [ ] Examples work correctly
6. [ ] README is updated
7. [ ] CHANGELOG entry added
8. [ ] Version matches main ruvector version

PR Description Template

## Python Bindings for RuVector

This PR adds native Python bindings for RuVector using PyO3 and Maturin.

### Features
- `VectorDB`: Vector storage and similarity search
- `GraphDB`: Cypher query support
- `GNNLayer`/`RuvectorLayer`: GNN enhancement layers
- `compress`/`decompress`: Tensor compression utilities

### API Parity
Matches the Node.js bindings (`@ruvector/core`, `@ruvector/graph-node`, `@ruvector/gnn`) with Pythonic adaptations.

### Testing
- Unit tests for all public APIs
- Integration tests with numpy arrays
- Tested on Python 3.8-3.12, Linux/macOS/Windows

### Documentation
- Type stubs (`.pyi`) for IDE support
- Docstrings with examples
- README with quickstart guide

Closes #XX (if there's a related issue)

Notes for Implementation

  1. Study the Node.js bindings first: Look at crates/ruvector-node/src/lib.rs for patterns
  2. Match the API signatures: Keep function names and parameters consistent with Node.js
  3. Use numpy for arrays: Python users expect numpy array support
  4. Thread safety: Use Arc<RwLock<>> for the inner database handles
  5. Error messages: Make them Pythonic and helpful
  6. Type hints: Complete .pyi files are essential for Python IDE support
  7. Test on all platforms: macOS, Linux, Windows all behave differently
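To illustrate the note on error messages: `test_dimension_mismatch` expects a `ValueError` whose message contains "Dimension mismatch". The pure-Python sketch below shows what a helpful message could look like. The exact wording after that phrase is an assumption, and `check_dimensions` is a hypothetical stand-in for a check that would live on the Rust side of the bindings.

```python
from typing import Sequence


def check_dimensions(expected: int, vector: Sequence[float]) -> None:
    """Raise ValueError when the vector length differs from the database dimension."""
    actual = len(vector)
    if actual != expected:
        # State both numbers so the caller can see what went wrong.
        raise ValueError(
            f"Dimension mismatch: expected {expected}, got {actual}"
        )


if __name__ == "__main__":
    try:
        check_dimensions(128, [0.0] * 256)
    except ValueError as exc:
        print(exc)
```

Because `pytest.raises(match=...)` does a regex search on the message, any text after the "Dimension mismatch" prefix keeps the existing test passing.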
