RFC: Deep Integration of ruvector-attention into ruvector-gnn
Executive Summary
This RFC proposes a comprehensive integration of ruvector-attention capabilities into ruvector-gnn, combining advanced attention mechanisms with graph neural network operations to create a state-of-the-art hierarchical vector search system.
Current State Analysis
ruvector-attention (Standalone Crate)
The attention crate provides a rich set of attention mechanisms:
| Component | Description | Status |
|---|---|---|
| Scaled Dot-Product | Standard transformer attention | ✅ Complete |
| Multi-Head Attention | Parallel attention heads | ✅ Complete |
| Edge-Featured Attention (GATv2) | Graph attention with edge features | ✅ Complete |
| Hyperbolic Attention | Poincaré ball model attention | ✅ Complete |
| Dual-Space Attention | Euclidean + Hyperbolic combined | ✅ Complete |
| Flash Attention | Memory-efficient O(block_size) | ✅ Complete |
| Linear Attention | O(n) complexity attention | ✅ Complete |
| MoE Attention | Mixture of Experts routing | ✅ Complete |
| GraphRoPE | Rotary Position Embeddings for graphs | ✅ Complete |
ruvector-gnn (Current Implementation)
The GNN crate has its own attention implementation:
| Component | Description | Limitation |
|---|---|---|
| MultiHeadAttention | Basic MHA in `layer.rs` | Duplicates attention crate functionality |
| RuvectorLayer | GNN layer with attention | Uses internal MHA only |
| differentiable_search | Soft attention search | Basic cosine similarity only |
| hierarchical_forward | Layer-wise processing | No geometric attention support |
Integration Benefits
1. Hyperbolic Space for Hierarchical Search 🌐
HNSW graphs are inherently hierarchical. Hyperbolic geometry captures tree-like structures naturally: available volume grows exponentially toward the boundary of the space, matching the exponential fan-out of a hierarchy.
Current: Euclidean cosine similarity only
Proposed: Poincaré distance for hierarchical relationships
- Better parent-child similarity
- Improved layer-wise search
- ~15-30% better recall on hierarchical data
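For reference, the core distance computation is small. Below is a minimal sketch of the unit-ball Poincaré distance, assuming inputs already lie inside the ball; the function name and signature are illustrative, not the existing ruvector-attention API.

```rust
/// Poincaré (unit-ball) distance; assumes ||u||, ||v|| < 1.
/// Illustrative sketch only, not the crate's actual API.
fn poincare_distance(u: &[f32], v: &[f32]) -> f32 {
    let sq_norm = |x: &[f32]| x.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = ((1.0 - sq_norm(u)) * (1.0 - sq_norm(v))).max(1e-9);
    // Clamp so acosh's argument stays >= 1 under floating-point error.
    (1.0 + 2.0 * diff_sq / denom).max(1.0).acosh()
}

fn main() {
    // Points near the origin behave almost Euclidean; points near the boundary
    // are exponentially far apart, which matches HNSW's layered hierarchy.
    let parent = vec![0.1_f32, 0.0];
    let child = vec![0.6_f32, 0.3];
    println!("poincare distance = {}", poincare_distance(&parent, &child));
}
```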
2. Edge-Featured Graph Attention (GATv2) 🔗
Current GNN layers ignore edge features. GATv2 integration enables:
```rust
// Current: Simple weighted aggregation
let aggregated = self.aggregate_messages(&neighbor_msgs, edge_weights);

// Proposed: Edge-featured attention
let edge_features = compute_edge_features(node, neighbors);
let aggregated = edge_attention.compute_with_edges(
    &node_msg, &neighbor_msgs, &neighbor_msgs, &edge_features
);
```

Benefits:
- Distance-aware attention (neighbor weights adapt dynamically to distance)
- Edge type features (HNSW layer level, connection type)
- ~20% improvement in GNN message passing quality
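To make the `compute_edge_features` call above concrete, one hypothetical feature layout for HNSW edges is sketched below. None of these names exist in ruvector-gnn today, and a real design would likely draw on stored edge metadata rather than recomputing it.

```rust
/// Hypothetical per-edge features for an HNSW neighbor; illustrative only.
struct EdgeFeature {
    distance: f32,       // geometric distance between the two endpoints
    layer_level: f32,    // HNSW layer the edge lives on, normalized to [0, 1]
    is_cross_layer: f32, // 1.0 for links that skip layers, else 0.0
}

/// One possible shape for the `compute_edge_features` helper used above.
fn compute_edge_features(
    node: &[f32],
    neighbors: &[Vec<f32>],
    layer: usize,
    max_layer: usize,
) -> Vec<EdgeFeature> {
    neighbors
        .iter()
        .map(|nb| {
            let distance = node
                .iter()
                .zip(nb)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>()
                .sqrt();
            EdgeFeature {
                distance,
                layer_level: layer as f32 / max_layer.max(1) as f32,
                is_cross_layer: 0.0, // a real graph would derive this from edge metadata
            }
        })
        .collect()
}
```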
3. Flash Attention for Large-Scale Search ⚡
Memory-efficient attention for searching across millions of vectors:
Standard MHA: O(n²) memory for attention matrix
Flash Attention: O(block_size) memory with tiled computation
For n=1M vectors:
- Standard: ~4TB memory (impossible)
- Flash: ~64MB memory (feasible)
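The memory difference comes from never materializing the full score set: Flash-style attention streams over candidate blocks with an online softmax, keeping only a running max, running sum, and an accumulator. A minimal single-query sketch of that idea follows; the names are illustrative, not the crate's implementation.

```rust
/// Streaming (online-softmax) attention pooling over candidate blocks.
/// Only one block of scores lives in memory at a time; the output is the
/// softmax-weighted sum of candidates. Illustrative sketch only.
fn streaming_attention_pool(query: &[f32], candidates: &[Vec<f32>], block: usize) -> Vec<f32> {
    let scale = 1.0 / (query.len() as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY;
    let mut running_sum = 0.0_f32;
    let mut acc = vec![0.0_f32; query.len()]; // un-normalized weighted sum

    for chunk in candidates.chunks(block) {
        // Scores for this block only: O(block) memory instead of O(n).
        let scores: Vec<f32> = chunk
            .iter()
            .map(|c| query.iter().zip(c).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        let block_max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(block_max);

        // Rescale previous partial results to the new running max.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        acc.iter_mut().for_each(|a| *a *= correction);

        for (c, s) in chunk.iter().zip(&scores) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(c) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }
    acc.iter().map(|a| a / running_sum.max(1e-12)).collect()
}
```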
4. Mixture of Experts for Adaptive Search 🎯
Different query types benefit from different attention mechanisms:
| Query Type | Best Expert | Why |
|---|---|---|
| Hierarchical data | HyperbolicExpert | Tree-like structure |
| Dense clusters | StandardExpert | Euclidean similarity |
| Sparse features | LinearExpert | Low-rank attention |
MoE routing automatically selects optimal attention per query.
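As a strawman for discussion, a top-1 router could look like the sketch below, with hand-written heuristics standing in for the learned gating network; the enum, thresholds, and scoring are assumptions, not the existing MoE implementation.

```rust
/// Toy top-1 expert router for search queries; illustrative only.
#[derive(Debug, Clone, Copy)]
enum SearchExpert {
    Standard,   // Euclidean similarity, dense clusters
    Hyperbolic, // hierarchical, tree-like data
    Linear,     // sparse, high-dimensional features
}

fn route_query(query: &[f32]) -> SearchExpert {
    let dim = query.len().max(1) as f32;
    let nonzero = query.iter().filter(|x| x.abs() > 1e-6).count() as f32;
    let sparsity = 1.0 - nonzero / dim;
    let norm = query.iter().map(|x| x * x).sum::<f32>().sqrt();

    // A learned gating network would replace these hand-written rules.
    if sparsity > 0.8 {
        SearchExpert::Linear
    } else if norm < 0.5 {
        // Small-norm queries sit near the Poincaré-ball origin, where the
        // hyperbolic expert handles hierarchy best (heuristic assumption).
        SearchExpert::Hyperbolic
    } else {
        SearchExpert::Standard
    }
}
```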
5. Dual-Space Attention for Hybrid Search 🔄
Combine Euclidean (local structure) and Hyperbolic (global hierarchy):
```rust
// Proposed integration in differentiable_search
pub fn differentiable_search_dual_space(
    query: &[f32],
    candidates: &[Vec<f32>],
    k: usize,
    dual_space_config: DualSpaceConfig,
) -> (Vec<usize>, Vec<f32>) {
    let dual_attn = DualSpaceAttention::new(dual_space_config);

    // Combine both geometric perspectives
    let (euc_scores, hyp_scores) = dual_attn.get_space_contributions(&query, &candidates);

    // Adaptive weighting based on query characteristics
    let combined_scores = adaptive_combine(euc_scores, hyp_scores, &query);

    // ...
}
```

Proposed Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ruvector-gnn (Enhanced) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ RuvectorLayer │───▶│ AttentionBackend │◀─┐ │
│ │ (GNN Layer) │ │ (Pluggable) │ │ │
│ └─────────────────┘ └─────────────────┘ │ │
│ │ │ │
│ ┌──────────────────────┼──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ Training │ │ ruvector-attention│ │
│ │ (EWC, Adam) │ │ (imported) │ │
│ └──────────────┘ └──────────────────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │Hyperbolic │ │ Edge-Featured │ │ Flash │ │
│ │ Attention │ │ GAT (GATv2) │ │ Attention │ │
│ └───────────┘ └────────────────┘ └──────────────┘ │
│ │ │ │ │
│ └──────────────────────┼──────────────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ MoE Routing │ │
│ │ (Adaptive) │ │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Implementation Roadmap
Phase 1: Foundation (Week 1-2)
- Add `ruvector-attention` as a dependency of `ruvector-gnn`
- Create `AttentionBackend` trait for pluggable attention (sketched below)
- Refactor `RuvectorLayer` to use trait-based attention
- Add feature flags: `hyperbolic`, `flash`, `moe`, `full-attention`
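A rough sketch of what the `AttentionBackend` trait could look like is below; the method names, signatures, and default aggregation are assumptions meant to seed discussion, not a committed design.

```rust
/// Pluggable attention backend for RuvectorLayer (proposed, illustrative).
pub trait AttentionBackend: Send + Sync {
    /// Attention weights of `query` over `keys`, one weight per key.
    fn attend(&self, query: &[f32], keys: &[Vec<f32>]) -> Vec<f32>;

    /// Weighted aggregation of `values`; backends may override this.
    fn aggregate(&self, weights: &[f32], values: &[Vec<f32>]) -> Vec<f32> {
        let dim = values.first().map_or(0, |v| v.len());
        let mut out = vec![0.0_f32; dim];
        for (w, v) in weights.iter().zip(values) {
            for (o, x) in out.iter_mut().zip(v) {
                *o += w * x;
            }
        }
        out
    }
}
```

`RuvectorLayer` would then hold a `Box<dyn AttentionBackend>`, keeping the layer agnostic about whether the backend is Euclidean, hyperbolic, Flash, or MoE-routed.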
Phase 2: Core Integration (Week 3-4)
- Integrate `EdgeFeaturedAttention` into GNN message passing
- Replace internal MHA with `ruvector-attention::MultiHeadAttention`
- Add `DualSpaceAttention` option for `hierarchical_forward`
- Implement `HyperbolicGNNLayer` variant
Phase 3: Search Enhancement (Week 5-6)
- Create `differentiable_search_hyperbolic` function
- Add Flash attention for large candidate sets
- Implement MoE-based adaptive search
- Benchmark against current implementation
Phase 4: NAPI Bindings (Week 7-8)
- Expose new attention modes via NAPI
- Update `Float32Array` interfaces for all new functions
- Create JavaScript examples and documentation
- Performance optimization for Node.js
Phase 5: Production Hardening (Week 9-10)
- Comprehensive benchmarking suite
- Memory profiling and optimization
- SIMD acceleration for attention kernels
- Documentation and migration guide
API Design
Rust API
```rust
// New attention-aware GNN layer
pub struct RuvectorLayerV2 {
    attention: Box<dyn AttentionBackend>,
    gru: GRUCell,
    norm: LayerNorm,
}

impl RuvectorLayerV2 {
    pub fn with_hyperbolic(config: HyperbolicConfig) -> Self;
    pub fn with_edge_featured(config: EdgeFeaturedConfig) -> Self;
    pub fn with_dual_space(config: DualSpaceConfig) -> Self;
    pub fn with_moe(config: MoEConfig) -> Self;
}

// Enhanced search
pub fn hierarchical_forward_v2(
    query: &[f32],
    layer_embeddings: &[Vec<Vec<f32>>],
    gnn_layers: &[RuvectorLayerV2],
    search_config: SearchConfig,
) -> SearchResult;

pub struct SearchConfig {
    pub attention_mode: AttentionMode,
    pub hyperbolic_curvature: Option<f32>,
    pub temperature: f32,
    pub use_flash: bool,
}
```
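For illustration, calling the proposed API might look like this; the `AttentionMode` variant, config defaults, and field values are placeholders and may change during review.

```rust
// Hypothetical usage of the proposed V2 API (not yet implemented).
fn search_example(query: &[f32], layer_embeddings: &[Vec<Vec<f32>>]) -> SearchResult {
    let layers = vec![
        RuvectorLayerV2::with_hyperbolic(HyperbolicConfig::default()),
        RuvectorLayerV2::with_edge_featured(EdgeFeaturedConfig::default()),
    ];
    let config = SearchConfig {
        attention_mode: AttentionMode::DualSpace, // placeholder variant name
        hyperbolic_curvature: Some(1.0),
        temperature: 0.07,
        use_flash: true,
    };
    hierarchical_forward_v2(query, layer_embeddings, &layers, config)
}
```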
NAPI/JavaScript API
```typescript
// New exports from @ruvector/gnn
export interface AttentionConfig {
  mode: 'standard' | 'hyperbolic' | 'dual_space' | 'edge_featured' | 'moe';
  curvature?: number;       // For hyperbolic
  euclideanWeight?: number; // For dual_space
  numExperts?: number;      // For MoE
  topK?: number;            // For MoE
}

export function hierarchicalForwardV2(
  query: Float32Array,
  layerEmbeddings: Float32Array[][],
  layers: string[],
  config: AttentionConfig
): Float32Array;

export function differentiableSearchHyperbolic(
  query: Float32Array,
  candidates: Float32Array[],
  k: number,
  curvature: number
): { indices: number[]; weights: Float32Array };
```

Performance Expectations
| Metric | Current | With Attention Integration |
|---|---|---|
| Recall@10 (hierarchical) | ~85% | ~92-95% |
| Memory (1M vectors) | Baseline | -40% (Flash) |
| Latency (search) | Baseline | -15% (optimized paths) |
| GNN accuracy | Baseline | +20% (edge features) |
Compatibility
- Backward Compatible: All existing APIs preserved
- Opt-in Features: New features behind feature flags
- Migration Path: Gradual adoption via `V2`-suffixed functions
Open Questions
- Default attention mode: Should we default to `dual_space` for the best general performance?
- Curvature learning: Should curvature be learnable or fixed?
- MoE overhead: Is the routing overhead worth it for smaller datasets?
- WASM support: Which attention modes should be WASM-compatible?
References
- Graph Attention Networks (Veličković et al., 2017)
- GATv2 (Brody et al., 2021)
- Hyperbolic Neural Networks (Ganea et al., 2018)
- Flash Attention (Dao et al., 2022)
- Mixture of Experts (Shazeer et al., 2017)
Related Issues
- #35: NAPI-RS type conversion errors in @ruvector/attention and @ruvector/gnn (fixed by #36: use Float32Array for NAPI bindings)
- #37: Security: 10 npm vulnerabilities detected in dependencies (fixed)
Requesting feedback on:
- Priority ordering of phases
- Additional attention mechanisms to consider
- Specific use cases to optimize for
- NAPI binding design preferences
/cc @ruvnet