Python bindings for ruvector

I had a discussion with Claude about what this could look like. Let me know whether this seems worth doing, and whether I should have a go at building it and opening a pull request.



# RuVector Python Bindings Implementation Specification

## Overview

This document specifies a proposed implementation of Python bindings for RuVector, a distributed vector database with graph query support. The bindings would be contributed to the main ruvector repository at https://github.com/ruvnet/ruvector.

## Goals

1. Create native Python bindings using PyO3 and Maturin
2. Achieve API parity with the existing Node.js bindings (`@ruvector/core`, `@ruvector/graph-node`, `@ruvector/gnn`)
3. Publish to PyPI as `ruvector`
4. Follow ruvector's existing code style and contribution patterns

## Repository Context

Before starting, clone and examine the ruvector repository:

```bash
git clone https://github.com/ruvnet/ruvector.git
cd ruvector
```

Key directories to study:
- `crates/ruvector-core/` - Core vector database engine
- `crates/ruvector-graph/` - Graph database with Cypher support
- `crates/ruvector-gnn/` - Graph Neural Network layers
- `crates/ruvector-node/` - Node.js bindings (reference implementation)
- `npm/packages/` - npm package structure (reference for Python package structure)

## Project Structure

Create the following structure within the ruvector repository:

```
crates/
└── ruvector-python/
    ├── Cargo.toml
    ├── pyproject.toml
    ├── README.md
    ├── LICENSE                    # MIT (same as main repo)
    ├── src/
    │   ├── lib.rs                 # Main module entry point
    │   ├── vector_db.rs           # VectorDB bindings
    │   ├── graph_db.rs            # GraphDB bindings  
    │   ├── gnn.rs                 # GNN layer bindings
    │   ├── compression.rs         # Tensor compression bindings
    │   ├── types.rs               # Shared type definitions
    │   └── errors.rs              # Error handling
    ├── python/
    │   └── ruvector/
    │       ├── __init__.py        # Re-export from native module
    │       ├── py.typed           # PEP 561 marker
    │       └── _types.pyi         # Type stubs
    ├── tests/
    │   ├── test_vector_db.py
    │   ├── test_graph_db.py
    │   ├── test_gnn.py
    │   └── conftest.py
    └── examples/
        ├── basic_vector_search.py
        ├── graph_queries.py
        ├── rag_pipeline.py
        └── knowledge_graph.py
```

## Dependencies

### Cargo.toml

```toml
[package]
name = "ruvector-python"
version = "0.1.0"
edition = "2021"
authors = ["RuVector Contributors"]
license = "MIT"
description = "Python bindings for RuVector - a distributed vector database that learns"
repository = "https://github.com/ruvnet/ruvector"
keywords = ["vector-database", "graph-database", "machine-learning", "python", "pyo3"]
categories = ["database", "science", "api-bindings"]

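[lib]
# Kept in sync with the #[pymodule] function name and the last segment of
# tool.maturin.module-name ("ruvector._ruvector")
name = "_ruvector"
crate-type = ["cdylib"]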

[dependencies]
pyo3 = { version = "0.22", features = ["extension-module", "abi3-py38"] }
ruvector-core = { path = "../ruvector-core" }
ruvector-graph = { path = "../ruvector-graph" }
ruvector-gnn = { path = "../ruvector-gnn" }
ruvector-collections = { path = "../ruvector-collections" }
ruvector-filter = { path = "../ruvector-filter" }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "1.0"
numpy = "0.22"  # For efficient array handling

[build-dependencies]
pyo3-build-config = "0.22"

[features]
default = []
# Reserved for making GNN support optional; ruvector-gnn is currently a
# required dependency, so this feature has no effect yet
gnn = []
# Enable distributed features
distributed = ["ruvector-core/distributed"]
```

### pyproject.toml

```toml
[build-system]
requires = ["maturin>=1.4,<2.0"]
build-backend = "maturin"

[project]
name = "ruvector"
version = "0.1.0"
description = "A distributed vector database that learns - Python bindings"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.8"
authors = [
    { name = "RuVector Contributors" }
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Rust",
    "Topic :: Database",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]
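keywords = ["vector-database", "embeddings", "graph-database", "machine-learning", "rag"]
dependencies = [
    "numpy>=1.20",  # needed at runtime: the native module accepts and returns numpy arrays
]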

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-asyncio>=0.21",
    "numpy>=1.20",
    "sentence-transformers>=2.0",  # For testing with real embeddings
]

[project.urls]
Homepage = "https://github.com/ruvnet/ruvector"
Documentation = "https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-python"
Repository = "https://github.com/ruvnet/ruvector"
Issues = "https://github.com/ruvnet/ruvector/issues"

[tool.maturin]
features = ["pyo3/extension-module"]
python-source = "python"
module-name = "ruvector._ruvector"
strip = true

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
```
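Since `sentence-transformers` is a heavy dev dependency, most tests can probably run on synthetic vectors instead. A minimal sketch of a helper that `tests/conftest.py` could wrap in fixtures, assuming unit-length random vectors are an acceptable stand-in for real embeddings (`make_vectors` is a hypothetical name, not part of the proposed API):

```python
import math
import random
from typing import List


def make_vectors(n: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Return n deterministic, unit-length vectors of length dim.

    Hypothetical test helper: a fixed seed keeps test runs reproducible.
    """
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against an all-zero draw
        vectors.append([x / norm for x in v])
    return vectors


# In conftest.py this could be exposed as a pytest fixture, e.g.:
#
#   @pytest.fixture
#   def vectors():
#       return make_vectors(100, 384)
```

Unit-normalising the vectors keeps cosine, dot and Euclidean rankings consistent with each other, which makes cross-metric tests easier to reason about.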

## Implementation Details

### src/lib.rs

```rust
use pyo3::prelude::*;

mod vector_db;
mod graph_db;
mod gnn;
mod compression;
mod types;
mod errors;

use vector_db::PyVectorDB;
use graph_db::PyGraphDB;
use gnn::{PyGNNLayer, PyRuvectorLayer};
use compression::{compress, decompress};
use types::{PySearchResult, PyNode, PyEdge};

/// RuVector: A distributed vector database that learns
#[pymodule]
fn _ruvector(m: &Bound<'_, PyModule>) -> PyResult<()> {
    // Core classes
    m.add_class::<PyVectorDB>()?;
    m.add_class::<PyGraphDB>()?;
    
    // GNN classes
    m.add_class::<PyGNNLayer>()?;
    m.add_class::<PyRuvectorLayer>()?;
    
    // Result types
    m.add_class::<PySearchResult>()?;
    m.add_class::<PyNode>()?;
    m.add_class::<PyEdge>()?;
    
    // Utility functions
    m.add_function(wrap_pyfunction!(compress, m)?)?;
    m.add_function(wrap_pyfunction!(decompress, m)?)?;
    
    // Version info
    m.add("__version__", env!("CARGO_PKG_VERSION"))?;
    
    Ok(())
}
```

### src/errors.rs

```rust
use pyo3::exceptions::{PyIOError, PyValueError, PyRuntimeError};
use pyo3::prelude::*;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuVectorError {
    #[error("Database error: {0}")]
    Database(String),
    
    #[error("Invalid dimension: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },
    
    #[error("Cypher query error: {0}")]
    CypherError(String),
    
    #[error("Serialization error: {0}")]
    SerializationError(String),
    
    #[error("Not found: {0}")]
    NotFound(String),
    
    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

impl From<RuVectorError> for PyErr {
    fn from(err: RuVectorError) -> PyErr {
        match err {
            RuVectorError::Database(msg) => PyRuntimeError::new_err(msg),
            RuVectorError::DimensionMismatch { expected, got } => {
                PyValueError::new_err(format!(
                    "Dimension mismatch: expected {}, got {}", expected, got
                ))
            }
            RuVectorError::CypherError(msg) => PyValueError::new_err(msg),
            RuVectorError::SerializationError(msg) => PyValueError::new_err(msg),
            RuVectorError::NotFound(msg) => pyo3::exceptions::PyKeyError::new_err(msg),
            RuVectorError::IoError(e) => PyIOError::new_err(e.to_string()),
        }
    }
}

pub type Result<T> = std::result::Result<T, RuVectorError>;
```

### src/types.rs

```rust
use pyo3::prelude::*;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[pyclass(name = "SearchResult")]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PySearchResult {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub distance: f32,
    #[pyo3(get)]
    pub metadata: Option<HashMap<String, String>>,
    #[pyo3(get)]
    pub vector: Option<Vec<f32>>,
}

#[pymethods]
impl PySearchResult {
    fn __repr__(&self) -> String {
        format!("SearchResult(id='{}', distance={:.4})", self.id, self.distance)
    }
    
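    fn to_dict(&self) -> HashMap<String, PyObject> {
        Python::with_gil(|py| {
            let mut dict = HashMap::new();
            dict.insert("id".to_string(), self.id.clone().into_py(py));
            dict.insert("distance".to_string(), self.distance.into_py(py));
            if let Some(ref meta) = self.metadata {
                dict.insert("metadata".to_string(), meta.clone().into_py(py));
            }
            // Include the vector when the result carries one (include_vectors=True or get()).
            if let Some(ref vector) = self.vector {
                dict.insert("vector".to_string(), vector.clone().into_py(py));
            }
            dict
        })
    }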
}

#[pyclass(name = "Node")]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PyNode {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub labels: Vec<String>,
    #[pyo3(get)]
    pub properties: HashMap<String, String>,
}

#[pymethods]
impl PyNode {
    fn __repr__(&self) -> String {
        format!("Node(id='{}', labels={:?})", self.id, self.labels)
    }
}

#[pyclass(name = "Edge")]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PyEdge {
    #[pyo3(get)]
    pub id: String,
    #[pyo3(get)]
    pub edge_type: String,
    #[pyo3(get)]
    pub source: String,
    #[pyo3(get)]
    pub target: String,
    #[pyo3(get)]
    pub properties: HashMap<String, String>,
}

#[pymethods]
impl PyEdge {
    fn __repr__(&self) -> String {
        format!("Edge({} -[{}]-> {})", self.source, self.edge_type, self.target)
    }
}
```

### src/vector_db.rs

```rust
use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList};
use numpy::{PyArray1, PyReadonlyArray1};
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use ruvector_core::{VectorDB, DbOptions, VectorEntry, SearchQuery};
use crate::errors::{RuVectorError, Result};
use crate::types::PySearchResult;

#[pyclass(name = "VectorDB")]
pub struct PyVectorDB {
    inner: Arc<RwLock<VectorDB>>,
    dimensions: usize,
}

#[pymethods]
impl PyVectorDB {
    /// Create a new VectorDB instance
    /// 
    /// Args:
    ///     dimensions: The dimensionality of vectors to store
    ///     path: Optional path for persistent storage. If None, uses in-memory storage.
    ///     distance_metric: Distance metric to use ('cosine', 'euclidean', 'dot')
    ///     
    /// Example:
    ///     db = VectorDB(dimensions=384, path="./vectors.db")
    #[new]
    #[pyo3(signature = (dimensions, path=None, distance_metric="cosine"))]
    fn new(dimensions: usize, path: Option<&str>, distance_metric: &str) -> PyResult<Self> {
        let mut opts = DbOptions::default();
        opts.dimensions = dimensions as u32;
        opts.distance_metric = distance_metric.to_string();
        
        if let Some(p) = path {
            opts.storage_path = p.to_string();
        }
        
        let db = VectorDB::new(opts)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(db)),
            dimensions,
        })
    }
    
    /// Insert a vector into the database
    /// 
    /// Args:
    ///     id: Unique identifier for the vector
    ///     vector: The vector as a list of floats or a 1-D numpy array
    ///     metadata: Optional metadata dictionary with string keys and values
    ///     
    /// Example:
    ///     db.insert("doc1", [0.1, 0.2, 0.3], metadata={"source": "wiki"})
    #[pyo3(signature = (id, vector, metadata=None))]
    fn insert(
        &self,
        id: &str,
        // Vec<f32> accepts Python lists and numpy arrays of any float dtype (at the
        // cost of an element-wise copy); PyReadonlyArray1<f32> would reject lists
        // and the default float64 arrays.
        vector: Vec<f32>,
        // Extracting into a HashMap raises TypeError for non-string keys or values
        // instead of silently dropping them.
        metadata: Option<HashMap<String, String>>,
    ) -> PyResult<()> {
        if vector.len() != self.dimensions {
            return Err(RuVectorError::DimensionMismatch {
                expected: self.dimensions,
                got: vector.len(),
            }.into());
        }
        
        let entry = VectorEntry {
            id: Some(id.to_string()),
            vector,
            metadata: metadata.map(|m| serde_json::to_string(&m).unwrap()),
        };
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.insert(entry)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(())
    }
    
    /// Insert multiple vectors in batch
    /// 
    /// Args:
    ///     entries: List of (id, vector, metadata) tuples
    ///     
    /// Example:
    ///     db.insert_batch([
    ///         ("doc1", [0.1, 0.2], {"type": "article"}),
    ///         ("doc2", [0.3, 0.4], {"type": "blog"}),
    ///     ])
    fn insert_batch(&self, entries: &Bound<'_, PyList>) -> PyResult<usize> {
        // Extract and validate every tuple before taking the write lock, so a
        // malformed tuple or wrong-sized vector rejects the whole batch instead
        // of leaving it partially inserted.
        let mut prepared = Vec::with_capacity(entries.len());
        for item in entries.iter() {
            let (id, vector, metadata) =
                item.extract::<(String, Vec<f32>, Option<HashMap<String, String>>)>()?;
            
            if vector.len() != self.dimensions {
                return Err(RuVectorError::DimensionMismatch {
                    expected: self.dimensions,
                    got: vector.len(),
                }.into());
            }
            
            prepared.push(VectorEntry {
                id: Some(id),
                vector,
                metadata: metadata.map(|m| serde_json::to_string(&m).unwrap()),
            });
        }
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let count = prepared.len();
        for entry in prepared {
            db.insert(entry)
                .map_err(|e| RuVectorError::Database(e.to_string()))?;
        }
        
        Ok(count)
    }
    
    /// Search for similar vectors
    /// 
    /// Args:
    ///     query: Query vector as list or numpy array
    ///     k: Number of results to return
    ///     filter: Optional metadata filter (not yet implemented; passing one
    ///         raises NotImplementedError)
    ///     include_vectors: Whether to include vectors in results
    ///     
    /// Returns:
    ///     List of SearchResult objects
    ///     
    /// Example:
    ///     results = db.search([0.1, 0.2, 0.3], k=10)
    ///     for r in results:
    ///         print(f"{r.id}: {r.distance}")
    #[pyo3(signature = (query, k=10, filter=None, include_vectors=false))]
    fn search(
        &self,
        // Vec<f32> accepts Python lists and numpy arrays of any float dtype.
        query: Vec<f32>,
        k: usize,
        filter: Option<&Bound<'_, PyDict>>,
        include_vectors: bool,
    ) -> PyResult<Vec<PySearchResult>> {
        // Fail loudly instead of silently returning unfiltered results.
        if filter.is_some() {
            return Err(pyo3::exceptions::PyNotImplementedError::new_err(
                "metadata filters are not yet supported",
            ));
        }
        
        if query.len() != self.dimensions {
            return Err(RuVectorError::DimensionMismatch {
                expected: self.dimensions,
                got: query.len(),
            }.into());
        }
        
        let search_query = SearchQuery {
            vector: query,
            k,
            filter: None, // TODO: translate `filter` once filter parsing is implemented
            include_vectors,
        };
        
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let results = db.search(&search_query)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(results.into_iter().map(|r| PySearchResult {
            id: r.id,
            distance: r.distance,
            metadata: r.metadata.and_then(|m| serde_json::from_str(&m).ok()),
            vector: if include_vectors { Some(r.vector) } else { None },
        }).collect())
    }
    
    /// Get a vector by ID
    /// 
    /// Args:
    ///     id: The vector ID to retrieve
    ///     
    /// Returns:
    ///     SearchResult or None if not found
    fn get(&self, id: &str) -> PyResult<Option<PySearchResult>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        match db.get(id) {
            Ok(Some(entry)) => Ok(Some(PySearchResult {
                id: entry.id.unwrap_or_default(),
                distance: 0.0,
                metadata: entry.metadata.and_then(|m| serde_json::from_str(&m).ok()),
                vector: Some(entry.vector),
            })),
            Ok(None) => Ok(None),
            Err(e) => Err(RuVectorError::Database(e.to_string()).into()),
        }
    }
    
    /// Delete a vector by ID
    /// 
    /// Args:
    ///     id: The vector ID to delete
    ///     
    /// Returns:
    ///     True if deleted, False if not found
    fn delete(&self, id: &str) -> PyResult<bool> {
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
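        // `?` converts RuVectorError into PyErr; returning the mapped Result
        // directly would not match the PyResult<bool> return type.
        let deleted = db.delete(id)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        Ok(deleted)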
    }
    
    /// Get the number of vectors in the database
    fn __len__(&self) -> PyResult<usize> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.len().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    /// Get database statistics
    fn stats(&self) -> PyResult<HashMap<String, PyObject>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        Python::with_gil(|py| {
            let mut stats = HashMap::new();
            stats.insert("dimensions".to_string(), self.dimensions.into_py(py));
            stats.insert("count".to_string(), db.len().unwrap_or(0).into_py(py));
            Ok(stats)
        })
    }
    
    /// Sync data to disk (for persistent storage)
    fn sync(&self) -> PyResult<()> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.sync().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    fn __repr__(&self) -> String {
        let count = self.inner.read()
            .map(|db| db.len().unwrap_or(0))
            .unwrap_or(0);
        format!("VectorDB(dimensions={}, count={})", self.dimensions, count)
    }
}
```
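A brute-force reference model is a cheap oracle for `tests/test_vector_db.py`: on small collections, the native `search` should rank the same IDs first. A minimal pure-Python sketch, assuming these distance conventions (cosine distance as `1 - cosine similarity`, negated dot product for `'dot'` so that smaller is always closer); the conventions and the `ReferenceVectorDB` name are assumptions to check against `ruvector-core`, not its documented behaviour:

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, so that smaller still means more similar."""
    return -sum(x * y for x, y in zip(a, b))


DISTANCES = {"cosine": cosine_distance, "euclidean": euclidean_distance, "dot": negative_dot}


class ReferenceVectorDB:
    """Hypothetical exact-search oracle mirroring the proposed VectorDB surface."""

    def __init__(self, dimensions: int, distance_metric: str = "cosine") -> None:
        if distance_metric not in DISTANCES:
            raise ValueError(f"unknown distance metric: {distance_metric!r}")
        self.dimensions = dimensions
        self._distance = DISTANCES[distance_metric]
        self._rows: Dict[str, Tuple[List[float], Optional[Dict[str, str]]]] = {}

    def _check(self, vector: Sequence[float]) -> None:
        if len(vector) != self.dimensions:
            raise ValueError(
                f"Dimension mismatch: expected {self.dimensions}, got {len(vector)}"
            )

    def insert(self, id: str, vector: Sequence[float],
               metadata: Optional[Dict[str, str]] = None) -> None:
        self._check(vector)
        self._rows[id] = (list(vector), metadata)

    def search(self, query: Sequence[float], k: int = 10) -> List[Tuple[str, float]]:
        """Return up to k (id, distance) pairs, nearest first; ties broken by id."""
        self._check(query)
        scored = [(key, self._distance(query, vec)) for key, (vec, _) in self._rows.items()]
        scored.sort(key=lambda pair: (pair[1], pair[0]))
        return scored[:k]

    def __len__(self) -> int:
        return len(self._rows)
```

Exact distance equality with the native results is not a reasonable expectation: the bindings store `f32`, and if the core index is approximate (for example HNSW), recall may drop below 100% on larger collections. Comparing top-k ID overlap, with a tolerance on distances, is likely more robust.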

### src/graph_db.rs

```rust
use pyo3::prelude::*;
use pyo3::types::PyDict;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use ruvector_graph::{GraphDB, NodeBuilder, EdgeBuilder};
use crate::errors::{RuVectorError, Result};
use crate::types::{PyNode, PyEdge};

#[pyclass(name = "GraphDB")]
pub struct PyGraphDB {
    inner: Arc<RwLock<GraphDB>>,
}

#[pymethods]
impl PyGraphDB {
    /// Create a new GraphDB instance
    /// 
    /// Args:
    ///     path: Optional path for persistent storage
    ///     
    /// Example:
    ///     graph = GraphDB(path="./graph.db")
    #[new]
    #[pyo3(signature = (path=None))]
    fn new(path: Option<&str>) -> PyResult<Self> {
        let db = if let Some(p) = path {
            GraphDB::with_path(p)
        } else {
            GraphDB::new()
        }.map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(db)),
        })
    }
    
    /// Execute a Cypher query
    /// 
    /// Args:
    ///     cypher: Cypher query string
    ///     
    /// Returns:
    ///     Query results as a list of dictionaries
    ///     
    /// Example:
    ///     results = graph.execute("MATCH (n:Person) RETURN n.name")
    fn execute(&self, cypher: &str) -> PyResult<PyObject> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let result = db.execute(cypher)
            .map_err(|e| RuVectorError::CypherError(e.to_string()))?;
        
        Python::with_gil(|py| {
            // Convert result to Python object
            let json_str = serde_json::to_string(&result)
                .map_err(|e| RuVectorError::SerializationError(e.to_string()))?;
            
            let json_module = py.import_bound("json")?;
            let parsed = json_module.call_method1("loads", (json_str,))?;
            Ok(parsed.into())
        })
    }
    
    /// Create a node
    /// 
    /// Args:
    ///     id: Unique node identifier
    ///     labels: List of node labels
    ///     properties: Node properties dictionary
    ///     
    /// Example:
    ///     graph.create_node("person1", ["Person"], {"name": "Alice", "age": "30"})
    #[pyo3(signature = (id, labels=None, properties=None))]
    fn create_node(
        &self,
        id: &str,
        labels: Option<Vec<String>>,
        // Extracting into a HashMap raises TypeError for non-string keys or values.
        properties: Option<HashMap<String, String>>,
    ) -> PyResult<PyNode> {
        // Resolve defaults once; both values are reused for the returned PyNode,
        // so they must not be moved into the builder.
        let labels = labels.unwrap_or_default();
        let properties = properties.unwrap_or_default();
        
        let mut builder = NodeBuilder::new(id);
        for label in &labels {
            builder = builder.label(label);
        }
        for (key, value) in &properties {
            builder = builder.property(key, value.clone());
        }
        
        let node = builder.build();
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.create_node(node)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyNode {
            id: id.to_string(),
            labels,
            properties,
        })
    }
    
    /// Create an edge between nodes
    /// 
    /// Args:
    ///     source: Source node ID
    ///     target: Target node ID
    ///     edge_type: Relationship type
    ///     properties: Edge properties dictionary
    ///     
    /// Example:
    ///     graph.create_edge("person1", "person2", "KNOWS", {"since": "2020"})
    #[pyo3(signature = (source, target, edge_type, properties=None))]
    fn create_edge(
        &self,
        source: &str,
        target: &str,
        edge_type: &str,
        // Extracting into a HashMap raises TypeError for non-string keys or values.
        properties: Option<HashMap<String, String>>,
    ) -> PyResult<PyEdge> {
        // Extract the properties once and reuse them for the returned PyEdge.
        let properties = properties.unwrap_or_default();
        
        let mut builder = EdgeBuilder::new(source, target, edge_type);
        for (key, value) in &properties {
            builder = builder.property(key, value.clone());
        }
        
        let edge = builder.build();
        let edge_id = edge.id.clone();
        
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.create_edge(edge)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyEdge {
            id: edge_id,
            edge_type: edge_type.to_string(),
            source: source.to_string(),
            target: target.to_string(),
            properties,
        })
    }
    
    /// Get a node by ID
    fn get_node(&self, id: &str) -> PyResult<Option<PyNode>> {
        let db = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        match db.get_node(id) {
            Ok(Some(node)) => Ok(Some(PyNode {
                id: node.id,
                labels: node.labels,
                properties: node.properties,
            })),
            Ok(None) => Ok(None),
            Err(e) => Err(RuVectorError::Database(e.to_string()).into()),
        }
    }
    
    /// Delete a node by ID
    fn delete_node(&self, id: &str) -> PyResult<bool> {
        let mut db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
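        // `?` converts RuVectorError into PyErr to match the PyResult<bool> return type.
        let deleted = db.delete_node(id)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        Ok(deleted)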
    }
    
    /// Sync data to disk
    fn sync(&self) -> PyResult<()> {
        let db = self.inner.write().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        db.sync().map_err(|e| RuVectorError::Database(e.to_string()).into())
    }
    
    fn __repr__(&self) -> String {
        "GraphDB()".to_string()
    }
}
```
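As drafted, vector metadata and node/edge properties are string-to-string maps (`HashMap<String, String>` on the Rust side), so callers holding ints, floats or booleans need to convert them first. A minimal sketch of a client-side helper, assuming JSON encoding is an acceptable representation for non-string values; `stringify_properties` is hypothetical and not part of the proposed API:

```python
import json
from typing import Any, Dict, Mapping


def stringify_properties(props: Mapping[str, Any]) -> Dict[str, str]:
    """Convert a property/metadata mapping into the str -> str form the bindings expect.

    Strings pass through unchanged; everything else is JSON-encoded so the
    original type can be recovered later with json.loads.
    """
    out: Dict[str, str] = {}
    for key, value in props.items():
        if not isinstance(key, str):
            raise TypeError(f"property keys must be str, got {type(key).__name__}")
        out[key] = value if isinstance(value, str) else json.dumps(value)
    return out


# Usage with the proposed binding:
#   graph.create_node("person1", ["Person"],
#                     stringify_properties({"name": "Alice", "age": 30}))
```

This encoding is lossy for strings that happen to look like JSON: the string `"30"` and the integer `30` both end up stored as `"30"`. Typed property values in the bindings would avoid that ambiguity.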

### src/gnn.rs

```rust
use pyo3::prelude::*;
use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2};
use std::sync::{Arc, RwLock};

use ruvector_gnn::{GNNLayer, RuvectorLayer, LayerConfig};
use crate::errors::RuVectorError;

#[pyclass(name = "GNNLayer")]
pub struct PyGNNLayer {
    inner: Arc<RwLock<GNNLayer>>,
    input_dim: usize,
    output_dim: usize,
}

#[pymethods]
impl PyGNNLayer {
    /// Create a new GNN layer
    /// 
    /// Args:
    ///     input_dim: Input feature dimension
    ///     output_dim: Output feature dimension
    ///     heads: Number of attention heads
    ///     dropout: Dropout rate
    ///     
    /// Example:
    ///     layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
    #[new]
    #[pyo3(signature = (input_dim, output_dim, heads=4, dropout=0.1))]
    fn new(input_dim: usize, output_dim: usize, heads: usize, dropout: f32) -> PyResult<Self> {
        let config = LayerConfig {
            input_dim,
            output_dim,
            heads,
            dropout,
        };
        
        let layer = GNNLayer::new(config)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(layer)),
            input_dim,
            output_dim,
        })
    }
    
    /// Forward pass through the GNN layer
    /// 
    /// Args:
    ///     query: Query features (1D array)
    ///     neighbors: Neighbor features (2D array: n_neighbors x feature_dim)
    ///     weights: Edge weights (1D array: n_neighbors)
    ///     
    /// Returns:
    ///     Enhanced query features (1D array)
    fn forward<'py>(
        &self,
        py: Python<'py>,
        query: PyReadonlyArray1<f32>,
        neighbors: PyReadonlyArray2<f32>,
        weights: PyReadonlyArray1<f32>,
    ) -> PyResult<Bound<'py, PyArray1<f32>>> {
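        let query_vec = query.as_slice()?;
        let neighbors_data = neighbors.as_array();
        let weights_vec = weights.as_slice()?;
        
        // Validate shapes up front so callers get a ValueError that names the
        // mismatch, rather than an error from deeper inside ruvector-gnn.
        if query_vec.len() != self.input_dim {
            return Err(RuVectorError::DimensionMismatch {
                expected: self.input_dim,
                got: query_vec.len(),
            }.into());
        }
        if weights_vec.len() != neighbors_data.nrows() {
            return Err(pyo3::exceptions::PyValueError::new_err(format!(
                "expected one weight per neighbor: got {} weights for {} neighbors",
                weights_vec.len(),
                neighbors_data.nrows()
            )));
        }
        
        // Convert neighbors to Vec<Vec<f32>>
        let neighbors_vec: Vec<Vec<f32>> = neighbors_data
            .rows()
            .into_iter()
            .map(|row| row.to_vec())
            .collect();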
        
        let layer = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let result = layer.forward(query_vec, &neighbors_vec, weights_vec)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyArray1::from_vec_bound(py, result))
    }
    
    fn __repr__(&self) -> String {
        format!("GNNLayer(input_dim={}, output_dim={})", self.input_dim, self.output_dim)
    }
}

#[pyclass(name = "RuvectorLayer")]
pub struct PyRuvectorLayer {
    inner: Arc<RwLock<RuvectorLayer>>,
}

#[pymethods]
impl PyRuvectorLayer {
    /// Create a RuvectorLayer (GNN + attention for vector search enhancement)
    /// 
    /// Args:
    ///     input_dim: Input dimension
    ///     output_dim: Output dimension
    ///     heads: Number of attention heads
    ///     dropout: Dropout rate
    #[new]
    #[pyo3(signature = (input_dim, output_dim, heads=4, dropout=0.1))]
    fn new(input_dim: usize, output_dim: usize, heads: usize, dropout: f32) -> PyResult<Self> {
        let layer = RuvectorLayer::new(input_dim, output_dim, heads, dropout)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(Self {
            inner: Arc::new(RwLock::new(layer)),
        })
    }
    
    /// Apply differentiable search enhancement
    fn enhance_search<'py>(
        &self,
        py: Python<'py>,
        query: PyReadonlyArray1<f32>,
        candidates: PyReadonlyArray2<f32>,
    ) -> PyResult<Bound<'py, PyArray1<f32>>> {
        let query_vec = query.as_slice()?;
        let candidates_data = candidates.as_array();
        
        let candidates_vec: Vec<Vec<f32>> = candidates_data
            .rows()
            .into_iter()
            .map(|row| row.to_vec())
            .collect();
        
        let layer = self.inner.read().map_err(|e| {
            RuVectorError::Database(format!("Lock error: {}", e))
        })?;
        
        let scores = layer.enhance_search(query_vec, &candidates_vec)
            .map_err(|e| RuVectorError::Database(e.to_string()))?;
        
        Ok(PyArray1::from_vec_bound(py, scores))
    }
}
```
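The `forward` docstring implies a shape contract: query `(d,)`, neighbors `(n_neighbors, d)`, weights `(n_neighbors,)`. A pure-Python version of that check is handy in `tests/test_gnn.py` for generating valid and deliberately invalid inputs. This is a sketch assuming each neighbor row has the same width as the query (`input_dim`), which the docstring suggests but does not state; `check_forward_shapes` is a hypothetical helper:

```python
from typing import Sequence


def check_forward_shapes(
    query: Sequence[float],
    neighbors: Sequence[Sequence[float]],
    weights: Sequence[float],
    input_dim: int,
) -> int:
    """Validate GNNLayer.forward inputs and return the number of neighbors.

    Works on plain lists or numpy arrays (len() of a 2-D array is its row count).
    Raises ValueError describing the first mismatch found.
    """
    if len(query) != input_dim:
        raise ValueError(f"query has {len(query)} features, expected {input_dim}")
    for i, row in enumerate(neighbors):
        if len(row) != input_dim:
            raise ValueError(f"neighbor {i} has {len(row)} features, expected {input_dim}")
    if len(weights) != len(neighbors):
        raise ValueError(
            f"got {len(weights)} edge weights for {len(neighbors)} neighbors"
        )
    return len(neighbors)
```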

### src/compression.rs

```rust
use pyo3::prelude::*;
use numpy::{PyArray1, PyReadonlyArray1};

use ruvector_gnn::compression::{TensorCompressor, CompressionLevel};
use crate::errors::RuVectorError;

/// Compress a vector using adaptive quantization
/// 
/// Args:
///     vector: Input vector as numpy array
///     ratio: Compression ratio (0.0-1.0). Lower = more compression.
///     
/// Returns:
///     Compressed vector as numpy array
///     
/// Example:
///     compressed = ruvector.compress(embedding, ratio=0.3)  # ~8x compression
#[pyfunction]
#[pyo3(signature = (vector, ratio=0.5))]
pub fn compress<'py>(
    py: Python<'py>,
    vector: PyReadonlyArray1<f32>,
    ratio: f32,
) -> PyResult<Bound<'py, PyArray1<f32>>> {
    let vec = vector.as_slice()?;
    
    let level = if ratio < 0.1 {
        CompressionLevel::Binary      // 32x
    } else if ratio < 0.25 {
        CompressionLevel::PQ4         // 16x
    } else if ratio < 0.5 {
        CompressionLevel::PQ8         // 8x
    } else if ratio < 0.75 {
        CompressionLevel::Float16     // 2x
    } else {
        CompressionLevel::None
    };
    
    let compressor = TensorCompressor::new(level);
    let compressed = compressor.compress(vec)
        .map_err(|e| RuVectorError::Database(e.to_string()))?;
    
    Ok(PyArray1::from_vec_bound(py, compressed))
}

/// Decompress a vector
/// 
/// Args:
///     vector: Compressed vector as numpy array
///     original_dim: Original vector dimension
///     
/// Returns:
///     Decompressed vector as numpy array
#[pyfunction]
#[pyo3(signature = (vector, original_dim=None))]
pub fn decompress<'py>(
    py: Python<'py>,
    vector: PyReadonlyArray1<f32>,
    original_dim: Option<usize>,
) -> PyResult<Bound<'py, PyArray1<f32>>> {
    let vec = vector.as_slice()?;
    
    let compressor = TensorCompressor::default();
    let decompressed = compressor.decompress(vec, original_dim)
        .map_err(|e| RuVectorError::Database(e.to_string()))?;
    
    Ok(PyArray1::from_vec_bound(py, decompressed))
}
```
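The `ratio` thresholds in `compress` are easy to get wrong at the boundaries, so it helps to pin them down in a table-driven test. The sketch below mirrors the Rust `if`/`else` chain and, for the `Float16` level only, illustrates the underlying idea with the standard library's half-precision `struct` format (`'e'`). The level names and nominal factors are copied from the spec; the float16 round trip shows half-precision storage in general, not `ruvector-gnn`'s actual encoding:

```python
import struct
from typing import List, Tuple

# (exclusive upper bound on ratio, level name, nominal size reduction), mirroring compress()
_LEVELS: List[Tuple[float, str, int]] = [
    (0.10, "Binary", 32),
    (0.25, "PQ4", 16),
    (0.50, "PQ8", 8),
    (0.75, "Float16", 2),
]


def compression_level(ratio: float) -> Tuple[str, int]:
    """Return the (level, nominal reduction) that compress() would pick for ratio."""
    for upper, name, factor in _LEVELS:
        if ratio < upper:
            return name, factor
    return "None", 1


def float16_roundtrip(vector: List[float]) -> Tuple[bytes, List[float]]:
    """Pack a vector as little-endian IEEE 754 half precision, then unpack it."""
    packed = struct.pack(f"<{len(vector)}e", *vector)
    return packed, list(struct.unpack(f"<{len(vector)}e", packed))
```

Half precision takes 2 bytes per element versus 4 for `f32`, which is where the `// 2x` comment comes from; the binary and PQ levels need sign bits or codebooks and are not shown. Note also that, as drafted, `decompress` builds `TensorCompressor::default()` and so has no way of knowing which level `compress` chose; a reliable round trip probably requires recording the level alongside the payload.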

### python/ruvector/__init__.py

```python
"""
RuVector: A distributed vector database that learns.

Store embeddings, query with Cypher, scale horizontally with Raft consensus,
and let the index improve itself through Graph Neural Networks.

Example:
    >>> from ruvector import VectorDB, GraphDB
    >>> 
    >>> # Vector search
    >>> db = VectorDB(dimensions=384, path="./vectors.db")
    >>> db.insert("doc1", embedding, metadata={"source": "wiki"})
    >>> results = db.search(query_embedding, k=10)
    >>> 
    >>> # Graph queries
    >>> graph = GraphDB(path="./graph.db")
    >>> graph.execute("CREATE (a:Person {name: 'Alice'})")
    >>> graph.execute("MATCH (p:Person) RETURN p.name")
"""

from ._ruvector import (
    # Core classes
    VectorDB,
    GraphDB,
    
    # GNN classes
    GNNLayer,
    RuvectorLayer,
    
    # Result types
    SearchResult,
    Node,
    Edge,
    
    # Utility functions
    compress,
    decompress,
    
    # Version
    __version__,
)

__all__ = [
    # Core
    "VectorDB",
    "GraphDB",
    
    # GNN
    "GNNLayer",
    "RuvectorLayer",
    
    # Types
    "SearchResult",
    "Node",
    "Edge",
    
    # Utils
    "compress",
    "decompress",
    
    # Meta
    "__version__",
]
```
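`__init__.py` names every export twice, once in the import list and once in `__all__`, so the two lists can drift apart. A small helper could be called from the test suite to catch that. `missing_exports` is a hypothetical name. Running `missing_exports(ruvector)` would list any name in `__all__` that the package does not actually define.

```python
import types
from typing import List


def missing_exports(module: types.ModuleType) -> List[str]:
    """Return the names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]
```

In `tests/`, the check could read `import ruvector; assert missing_exports(ruvector) == []`.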

### python/ruvector/_ruvector.pyi (Type Stubs)

```python
"""Type stubs for ruvector native module."""

from typing import Dict, List, Optional, Any, Sequence, Union
import numpy as np
import numpy.typing as npt

class SearchResult:
    id: str
    distance: float
    metadata: Optional[Dict[str, str]]
    vector: Optional[List[float]]
    
    def to_dict(self) -> Dict[str, Any]: ...

class Node:
    id: str
    labels: List[str]
    properties: Dict[str, str]

class Edge:
    id: str
    edge_type: str
    source: str
    target: str
    properties: Dict[str, str]

class VectorDB:
    def __init__(
        self,
        dimensions: int,
        path: Optional[str] = None,
        distance_metric: str = "cosine",
    ) -> None: ...
    
    def insert(
        self,
        id: str,
        vector: Union[List[float], npt.NDArray[np.float32]],
        metadata: Optional[Dict[str, str]] = None,
    ) -> None: ...
    
    def insert_batch(
        self,
        entries: List[tuple[str, List[float], Optional[Dict[str, str]]]],
    ) -> int: ...
    
    def search(
        self,
        query: Union[List[float], npt.NDArray[np.float32]],
        k: int = 10,
        filter: Optional[Dict[str, Any]] = None,
        include_vectors: bool = False,
    ) -> List[SearchResult]: ...
    
    def get(self, id: str) -> Optional[SearchResult]: ...
    def delete(self, id: str) -> bool: ...
    def sync(self) -> None: ...
    def stats(self) -> Dict[str, Any]: ...
    def __len__(self) -> int: ...

class GraphDB:
    def __init__(self, path: Optional[str] = None) -> None: ...
    
    def execute(self, cypher: str) -> Any: ...
    
    def create_node(
        self,
        id: str,
        labels: Optional[List[str]] = None,
        properties: Optional[Dict[str, str]] = None,
    ) -> Node: ...
    
    def create_edge(
        self,
        source: str,
        target: str,
        edge_type: str,
        properties: Optional[Dict[str, str]] = None,
    ) -> Edge: ...
    
    def get_node(self, id: str) -> Optional[Node]: ...
    def delete_node(self, id: str) -> bool: ...
    def sync(self) -> None: ...

class GNNLayer:
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        heads: int = 4,
        dropout: float = 0.1,
    ) -> None: ...
    
    def forward(
        self,
        query: npt.NDArray[np.float32],
        neighbors: npt.NDArray[np.float32],
        weights: npt.NDArray[np.float32],
    ) -> npt.NDArray[np.float32]: ...

class RuvectorLayer:
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        heads: int = 4,
        dropout: float = 0.1,
    ) -> None: ...
    
    def enhance_search(
        self,
        query: npt.NDArray[np.float32],
        candidates: npt.NDArray[np.float32],
    ) -> npt.NDArray[np.float32]: ...

def compress(
    vector: npt.NDArray[np.float32],
    ratio: float = 0.5,
) -> npt.NDArray[np.float32]: ...

def decompress(
    vector: npt.NDArray[np.float32],
    original_dim: Optional[int] = None,
) -> npt.NDArray[np.float32]: ...

__version__: str
```
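The stubs describe the API shape but leave semantics such as result ordering open. A brute-force reference written against the same shape can serve as a test oracle, for example to measure the recall of the HNSW-backed `VectorDB`. This is a hypothetical sketch. It assumes `distance_metric="cosine"` means 1 minus cosine similarity, with results sorted by ascending distance. It also implements only part of the stub: there is no `filter`, `get`, `sync`, or persistence.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


@dataclass
class RefResult:
    id: str
    distance: float
    metadata: Optional[Dict[str, str]] = None


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as orthogonal (distance 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class ReferenceVectorDB:
    """Exact, in-memory stand-in shaped like the VectorDB stub (no index, no persistence)."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions
        self._rows: Dict[str, Tuple[List[float], Optional[Dict[str, str]]]] = {}

    def _check(self, vector: Sequence[float]) -> List[float]:
        vec = [float(x) for x in vector]  # accepts lists or numpy arrays
        if len(vec) != self.dimensions:
            raise ValueError(f"Dimension mismatch: expected {self.dimensions}, got {len(vec)}")
        return vec

    def insert(self, id: str, vector: Sequence[float],
               metadata: Optional[Dict[str, str]] = None) -> None:
        self._rows[id] = (self._check(vector), metadata)

    def search(self, query: Sequence[float], k: int = 10) -> List[RefResult]:
        q = self._check(query)
        results = [RefResult(id, cosine_distance(q, vec), meta)
                   for id, (vec, meta) in self._rows.items()]
        results.sort(key=lambda r: r.distance)  # smallest distance first
        return results[:k]

    def delete(self, id: str) -> bool:
        return self._rows.pop(id, None) is not None

    def __len__(self) -> int:
        return len(self._rows)


def recall_at_k(approx_ids: Iterable[str], exact_ids: Iterable[str]) -> float:
    """Fraction of the exact top-k ids that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact) if exact else 1.0
```

In a test, both databases would be filled with the same vectors and compared. One way is `recall_at_k([r.id for r in db.search(q, k=10)], [r.id for r in ref.search(q, k=10)])`.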

## Tests

### tests/conftest.py

```python
import pytest
import numpy as np
import tempfile

@pytest.fixture
def temp_dir():
    """Create a temporary directory for database files."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield tmpdir

@pytest.fixture
def sample_vectors():
    """Generate sample vectors for testing."""
    np.random.seed(42)
    return {
        "dimensions": 128,
        "vectors": [
            ("doc1", np.random.rand(128).astype(np.float32)),
            ("doc2", np.random.rand(128).astype(np.float32)),
            ("doc3", np.random.rand(128).astype(np.float32)),
        ],
        "query": np.random.rand(128).astype(np.float32),
    }
```

### tests/test_vector_db.py

```python
import pytest
import numpy as np
from ruvector import VectorDB

class TestVectorDB:
    def test_create_in_memory(self):
        db = VectorDB(dimensions=128)
        assert len(db) == 0
    
    def test_create_persistent(self, temp_dir):
        path = f"{temp_dir}/vectors.db"
        db = VectorDB(dimensions=128, path=path)
        assert len(db) == 0
    
    def test_insert_and_search(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        # Insert vectors
        for id, vector in sample_vectors["vectors"]:
            db.insert(id, vector)
        
        assert len(db) == 3
        
        # Search
        results = db.search(sample_vectors["query"], k=2)
        assert len(results) == 2
        assert all(hasattr(r, "id") for r in results)
        assert all(hasattr(r, "distance") for r in results)
    
    def test_insert_with_metadata(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        id, vector = sample_vectors["vectors"][0]
        db.insert(id, vector, metadata={"source": "test", "type": "document"})
        
        result = db.get(id)
        assert result is not None
        assert result.metadata["source"] == "test"
    
    def test_dimension_mismatch(self):
        db = VectorDB(dimensions=128)
        
        with pytest.raises(ValueError, match="Dimension mismatch"):
            wrong_dim = np.random.rand(256).astype(np.float32)
            db.insert("test", wrong_dim)
    
    def test_delete(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        id, vector = sample_vectors["vectors"][0]
        db.insert(id, vector)
        assert len(db) == 1
        
        deleted = db.delete(id)
        assert deleted
        assert len(db) == 0
    
    def test_batch_insert(self, sample_vectors):
        db = VectorDB(dimensions=sample_vectors["dimensions"])
        
        entries = [
            (id, vector.tolist(), {"index": str(i)})
            for i, (id, vector) in enumerate(sample_vectors["vectors"])
        ]
        
        count = db.insert_batch(entries)
        assert count == 3
        assert len(db) == 3
    
    def test_persistence(self, temp_dir, sample_vectors):
        path = f"{temp_dir}/vectors.db"
        
        # Create and populate
        db1 = VectorDB(dimensions=sample_vectors["dimensions"], path=path)
        for id, vector in sample_vectors["vectors"]:
            db1.insert(id, vector)
        db1.sync()
        del db1  # drop the handle (and any file lock) before reopening the same path
        
        # Reopen
        db2 = VectorDB(dimensions=sample_vectors["dimensions"], path=path)
        assert len(db2) == 3
```
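The stub types batch metadata as `Dict[str, str]`, which is why `test_batch_insert` passes `str(i)` rather than `i`. If one of many entries is malformed, the batch path also needs to report which entry failed. The sketch below shows a hypothetical Python-side pre-check that a wrapper could run before handing the list to the native `insert_batch`. The function name and error wording are assumptions. The wording was chosen so that a `match="Dimension mismatch"` pattern would still apply.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Entry = Tuple[str, Sequence[float], Optional[Dict[str, str]]]


def validate_batch(entries: List[Entry], dimensions: int) -> None:
    """Raise ValueError naming the first malformed entry; return None if all are valid."""
    for i, entry in enumerate(entries):
        if len(entry) != 3:
            raise ValueError(
                f"entry {i}: expected (id, vector, metadata), got {len(entry)} fields"
            )
        entry_id, vector, metadata = entry
        if not isinstance(entry_id, str):
            raise ValueError(f"entry {i}: id must be str, got {type(entry_id).__name__}")
        if len(vector) != dimensions:
            raise ValueError(
                f"entry {i}: Dimension mismatch: expected {dimensions}, got {len(vector)}"
            )
        if metadata is not None and not all(
            isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()
        ):
            raise ValueError(f"entry {i}: metadata keys and values must be str")
```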

### tests/test_graph_db.py

```python
import pytest
from ruvector import GraphDB

class TestGraphDB:
    def test_create_in_memory(self):
        graph = GraphDB()
        assert graph is not None
    
    def test_create_node(self):
        graph = GraphDB()
        
        node = graph.create_node(
            "person1",
            labels=["Person"],
            properties={"name": "Alice", "age": "30"}
        )
        
        assert node.id == "person1"
        assert "Person" in node.labels
        assert node.properties["name"] == "Alice"
    
    def test_create_edge(self):
        graph = GraphDB()
        
        graph.create_node("person1", ["Person"], {"name": "Alice"})
        graph.create_node("person2", ["Person"], {"name": "Bob"})
        
        edge = graph.create_edge(
            "person1", "person2", "KNOWS",
            properties={"since": "2020"}
        )
        
        assert edge.source == "person1"
        assert edge.target == "person2"
        assert edge.edge_type == "KNOWS"
    
    def test_cypher_create(self):
        graph = GraphDB()
        
        graph.execute("CREATE (a:Person {name: 'Alice'})")
        graph.execute("CREATE (b:Person {name: 'Bob'})")
        graph.execute("MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) CREATE (a)-[:KNOWS]->(b)")
        
        result = graph.execute("MATCH (p:Person) RETURN p.name")
        assert len(result) == 2
    
    def test_cypher_match(self):
        graph = GraphDB()
        
        graph.create_node("alice", ["Person"], {"name": "Alice"})
        graph.create_node("bob", ["Person"], {"name": "Bob"})
        graph.create_edge("alice", "bob", "KNOWS")
        
        result = graph.execute("""
            MATCH (a:Person)-[:KNOWS]->(b:Person)
            RETURN a.name, b.name
        """)
        
        assert len(result) == 1
    
    def test_get_node(self):
        graph = GraphDB()
        
        graph.create_node("test", ["Label"], {"key": "value"})
        
        node = graph.get_node("test")
        assert node is not None
        assert node.id == "test"
        
        missing = graph.get_node("nonexistent")
        assert missing is None
    
    def test_delete_node(self):
        graph = GraphDB()
        
        graph.create_node("test", ["Label"])
        assert graph.get_node("test") is not None
        
        deleted = graph.delete_node("test")
        assert deleted
        assert graph.get_node("test") is None
```
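Assertions like `len(result) == 1` in `test_cypher_match` depend on Cypher's row semantics: each directed edge matching the pattern yields one row. A brute-force oracle over plain Python data can compute the expected rows independently of the engine. `match_pattern` is a hypothetical helper. It handles only the single-hop pattern `MATCH (a:L1)-[:T]->(b:L2) RETURN a.name, b.name` and nothing else from Cypher.

```python
from typing import Dict, List, Optional, Set, Tuple


def match_pattern(
    labels: Dict[str, Set[str]],
    props: Dict[str, Dict[str, str]],
    edges: List[Tuple[str, str, str]],  # (source_id, edge_type, target_id)
    src_label: str,
    edge_type: str,
    dst_label: str,
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Expected rows for MATCH (a:src_label)-[:edge_type]->(b:dst_label) RETURN a.name, b.name."""
    rows = []
    for source, etype, target in edges:
        if (
            etype == edge_type
            and src_label in labels.get(source, set())
            and dst_label in labels.get(target, set())
        ):
            rows.append((props.get(source, {}).get("name"), props.get(target, {}).get("name")))
    return rows
```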

### tests/test_gnn.py

```python
import pytest
import numpy as np
from ruvector import GNNLayer, RuvectorLayer, compress, decompress

class TestGNNLayer:
    def test_create_layer(self):
        layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
        assert layer is not None
    
    def test_forward(self):
        layer = GNNLayer(input_dim=128, output_dim=256, heads=4)
        
        query = np.random.rand(128).astype(np.float32)
        neighbors = np.random.rand(5, 128).astype(np.float32)
        weights = np.ones(5, dtype=np.float32)
        
        output = layer.forward(query, neighbors, weights)
        assert output.shape == (256,)

class TestRuvectorLayer:
    def test_enhance_search(self):
        layer = RuvectorLayer(input_dim=128, output_dim=128, heads=4)
        
        query = np.random.rand(128).astype(np.float32)
        candidates = np.random.rand(10, 128).astype(np.float32)
        
        scores = layer.enhance_search(query, candidates)
        assert scores.shape == (10,)

class TestCompression:
    def test_compress_decompress(self):
        original = np.random.rand(128).astype(np.float32)
        
        compressed = compress(original, ratio=0.5)
        decompressed = decompress(compressed, original_dim=128)
        
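        # Lossy compression: same shape, values close but not exact
        # (ratio=0.5 selects Float16, so the error should be small)
        assert decompressed.shape == original.shape
        assert np.allclose(decompressed, original, atol=1e-3)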
    
    def test_compression_levels(self):
        original = np.random.rand(128).astype(np.float32)
        
        # One ratio per branch of the threshold chain: Binary, PQ4, PQ8, Float16, None
        for ratio in [0.05, 0.2, 0.4, 0.6, 0.9]:
            compressed = compress(original, ratio=ratio)
            assert compressed is not None
```
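Under the ratio mapping in `compress`, `ratio=0.5` selects `Float16`. A tolerance for the round-trip test can therefore be estimated without the native module. Python's `struct` module supports IEEE 754 half precision through the `'e'` format. This sketch assumes the Rust `Float16` path rounds to nearest in the same way, which has not been checked against `TensorCompressor`. It also skips the intermediate float32 step. For values in [0, 1), the worst-case rounding error is 2^-12 (about 2.4e-4), so a tolerance on the order of 1e-3 would be a safe choice for an `np.allclose` check. The sketch says nothing about the PQ or binary levels.

```python
import random
import struct
from typing import List


def float16_roundtrip(values: List[float]) -> List[float]:
    """Pack values as IEEE 754 half precision (struct format 'e') and unpack them again."""
    fmt = f"<{len(values)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


random.seed(42)
original = [random.random() for _ in range(128)]  # same [0, 1) range as np.random.rand
restored = float16_roundtrip(original)
max_err = max(abs(a - b) for a, b in zip(original, restored))

# 128 float32 values take 512 bytes; as float16 they take 256 (the nominal 2x)
print(f"float32 bytes: {struct.calcsize('<128f')}, float16 bytes: {struct.calcsize('<128e')}")
print(f"max abs error: {max_err:.2e}")
```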

## Examples

### examples/basic_vector_search.py

```python
"""Basic vector search example with RuVector."""

import numpy as np
from ruvector import VectorDB

def main():
    # Create a vector database
    db = VectorDB(dimensions=384, path="./demo_vectors.db")
    
    # Generate some sample embeddings (in practice, use a real embedding model)
    np.random.seed(42)
    documents = [
        ("doc1", "Introduction to machine learning"),
        ("doc2", "Deep learning with neural networks"),
        ("doc3", "Natural language processing basics"),
        ("doc4", "Computer vision fundamentals"),
        ("doc5", "Reinforcement learning tutorial"),
    ]
    
    # Insert documents with their embeddings
    for doc_id, text in documents:
        # Simulate embedding generation
        embedding = np.random.rand(384).astype(np.float32)
        db.insert(doc_id, embedding, metadata={"text": text})
    
    print(f"Inserted {len(db)} documents")
    
    # Search for similar documents
    query = np.random.rand(384).astype(np.float32)
    results = db.search(query, k=3)
    
    print("\nSearch results:")
    for result in results:
        print(f"  {result.id}: distance={result.distance:.4f}")
        if result.metadata:
            print(f"    text: {result.metadata.get('text', 'N/A')}")

if __name__ == "__main__":
    main()
```
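The example inserts random vectors, so its search results are arbitrary and change whenever the seed changes. For a demo whose vectors at least depend on the text, a feature-hashing ("hashing trick") bag of words can stand in for a real model. `toy_embed` is a hypothetical helper, not part of RuVector. Its vectors only reflect shared words, not meaning.

```python
import hashlib
import math
from typing import List


def toy_embed(text: str, dim: int = 384) -> List[float]:
    """Deterministic, L2-normalized hashing-trick embedding; NOT semantic."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0          # signed hashing reduces bias
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

In the example, the random embedding line would become `embedding = np.asarray(toy_embed(text), dtype=np.float32)`. With that change, documents that share words such as "learning" tend to land closer under cosine distance.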

### examples/knowledge_graph.py

```python
"""Knowledge graph example combining vectors and graph queries."""

import numpy as np
from ruvector import VectorDB, GraphDB

def main():
    # Initialize both databases
    vectors = VectorDB(dimensions=384, path="./kg_vectors.db")
    graph = GraphDB(path="./kg_graph.db")
    
    # Create some entities
    entities = [
        ("python", "Language", {"name": "Python", "type": "programming"}),
        ("rust", "Language", {"name": "Rust", "type": "programming"}),
        ("ml", "Topic", {"name": "Machine Learning"}),
        ("webdev", "Topic", {"name": "Web Development"}),
    ]
    
    np.random.seed(42)
    for entity_id, label, props in entities:
        # Create graph node
        graph.create_node(entity_id, [label], props)
        
        # Create embedding and store in vector DB
        embedding = np.random.rand(384).astype(np.float32)
        vectors.insert(entity_id, embedding, metadata=props)
    
    # Create relationships
    graph.create_edge("python", "ml", "USED_FOR")
    graph.create_edge("python", "webdev", "USED_FOR")
    graph.create_edge("rust", "webdev", "USED_FOR")
    
    # Query: Find what Python is used for
    result = graph.execute("""
        MATCH (lang:Language {name: 'Python'})-[:USED_FOR]->(topic:Topic)
        RETURN topic.name
    """)
    print("Python is used for:", result)
    
    # Semantic search: Find entities similar to a query
    query = np.random.rand(384).astype(np.float32)
    similar = vectors.search(query, k=2)
    print("\nMost similar entities:")
    for r in similar:
        print(f"  {r.id}: {r.distance:.4f}")

if __name__ == "__main__":
    main()
```
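The example runs the vector search and the graph query side by side. A common way to combine them is to treat the vector hits as entry points and then follow graph edges from each hit. The spec does not say whether Cypher can match on the `id` passed to `create_node`. This sketch therefore keeps the edge list in Python, mirroring the `create_edge` calls. `expand_hits` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def expand_hits(
    hits: List[Tuple[str, float]],      # (id, distance) pairs from vector search
    edges: List[Tuple[str, str, str]],  # (source_id, edge_type, target_id)
    edge_type: str,
) -> Dict[str, List[str]]:
    """Map each vector-search hit to the targets of its outgoing edges of one type."""
    neighbours: Dict[str, List[str]] = {hit_id: [] for hit_id, _ in hits}
    for source, etype, target in edges:
        if etype == edge_type and source in neighbours:
            neighbours[source].append(target)
    return neighbours
```

In `main()`, the hits would come from `[(r.id, r.distance) for r in similar]`.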

## CI/CD Configuration

### .github/workflows/python.yml (add to existing workflows)

```yaml
name: Python Bindings

on:
  push:
    branches: [main]
    tags: ['python-v*']  # release tags; path filters are not applied to tag pushes
    paths:
      - 'crates/ruvector-python/**'
  pull_request:
    paths:
      - 'crates/ruvector-python/**'

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      
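      - name: Install build and test dependencies
        run: pip install maturin pytest numpy
      
      - name: Build and test
        working-directory: crates/ruvector-python
        # `maturin develop` requires an activated virtualenv, which setup-python
        # does not create; `pip install .` builds through the maturin backend
        # declared in pyproject.toml instead
        run: |
          pip install .
          pytest tests/ -v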

  build-wheels:
    needs: test
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: crates/ruvector-python
          args: --release --out dist --find-interpreter
          manylinux: auto
      
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: crates/ruvector-python/dist/*.whl

  publish:
    needs: build-wheels
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/python-v')
    
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: wheels-*
          merge-multiple: true
          path: dist
      
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
```

## README.md

````markdown
# RuVector Python Bindings

Python bindings for [RuVector](https://github.com/ruvnet/ruvector), a distributed vector database that learns.

## Installation

```bash
pip install ruvector
```

## Quick Start

```python
from ruvector import VectorDB, GraphDB
import numpy as np

# Vector search
db = VectorDB(dimensions=384, path="./vectors.db")

# Insert vectors
embedding = np.random.rand(384).astype(np.float32)
db.insert("doc1", embedding, metadata={"source": "wiki"})

# Search
query = np.random.rand(384).astype(np.float32)
results = db.search(query, k=10)
for r in results:
    print(f"{r.id}: {r.distance}")

# Graph queries with Cypher
graph = GraphDB(path="./graph.db")
graph.execute("CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})")
result = graph.execute("MATCH (p:Person) RETURN p.name")
```

## Features

- **Vector Search**: HNSW index, <0.5ms latency, SIMD acceleration
- **Graph Queries**: Neo4j-style Cypher syntax
- **GNN Enhancement**: Neural network layers that improve search over time
- **Compression**: 2-32x memory reduction with adaptive quantization
- **Persistence**: Optional file-based storage with crash recovery

## API Reference

See the [full documentation](https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-python).

## License

MIT
````

## Contribution Checklist

Before submitting the PR:

1. [ ] All tests pass: `pytest tests/ -v`
2. [ ] Code is formatted: `cargo fmt`
3. [ ] No clippy warnings: `cargo clippy`
4. [ ] Type stubs are complete and accurate
5. [ ] Examples work correctly
6. [ ] README is updated
7. [ ] CHANGELOG entry added
8. [ ] Version matches main ruvector version

## PR Description Template

```markdown
## Python Bindings for RuVector

This PR adds native Python bindings for RuVector using PyO3 and Maturin.

### Features
- `VectorDB`: Vector storage and similarity search
- `GraphDB`: Cypher query support
- `GNNLayer`/`RuvectorLayer`: GNN enhancement layers
- `compress`/`decompress`: Tensor compression utilities

### API Parity
Matches the Node.js bindings (`@ruvector/core`, `@ruvector/graph-node`, `@ruvector/gnn`) with Pythonic adaptations.

### Testing
- Unit tests for all public APIs
- Integration tests with numpy arrays
- Tested on Python 3.8-3.12, Linux/macOS/Windows

### Documentation
- Type stubs (`.pyi`) for IDE support
- Docstrings with examples
- README with quickstart guide

Closes #XX (if there's a related issue)
```

## Notes for Implementation

1. **Study the Node.js bindings first**: Look at `crates/ruvector-node/src/lib.rs` for patterns
2. **Match the API signatures**: Keep function names and parameters consistent with Node.js
3. **Use numpy for arrays**: Python users expect numpy array support
4. **Thread safety**: Use `Arc<RwLock<>>` for the inner database handles
5. **Error messages**: Make them Pythonic and helpful
6. **Type hints**: Complete `.pyi` files are essential for Python IDE support
7. **Test on all platforms**: file paths, file locking, and native toolchains differ across macOS, Linux, and Windows
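
Note 2 asks for API names consistent with Node.js, while PEP 8 expects snake_case in Python. A parity check therefore has to translate names before comparing them. The sketch below assumes the Node.js bindings use camelCase method names. The method lists in the usage are illustrative, not the actual `@ruvector/core` surface. In practice they could be collected from `dir(ruvector.VectorDB)` and the Node package's TypeScript declarations. Acronym runs such as `HTTPServer` are not split by this simple rule.

```python
import re
from typing import Iterable, List, Set

# Zero-width match between a lowercase letter/digit and a following uppercase letter
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """insertBatch -> insert_batch; names that are already snake_case are unchanged."""
    return _BOUNDARY.sub("_", name).lower()


def parity_gaps(node_methods: Iterable[str], python_methods: Iterable[str]) -> List[str]:
    """Node.js method names that have no snake_case counterpart on the Python side."""
    py: Set[str] = set(python_methods)
    return sorted(m for m in node_methods if camel_to_snake(m) not in py)
```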

Python bindings for ruvector #73
