RFC: Deep Integration of ruvector-attention into ruvector-gnn #38

Executive Summary

This RFC proposes a comprehensive integration of ruvector-attention capabilities into ruvector-gnn, combining advanced attention mechanisms with graph neural network operations to create a state-of-the-art hierarchical vector search system.

Current State Analysis

ruvector-attention (Standalone Crate)

The attention crate provides a rich set of attention mechanisms:

| Component | Description | Status |
|---|---|---|
| Scaled Dot-Product | Standard transformer attention | ✅ Complete |
| Multi-Head Attention | Parallel attention heads | ✅ Complete |
| Edge-Featured Attention (GATv2) | Graph attention with edge features | ✅ Complete |
| Hyperbolic Attention | Poincaré ball model attention | ✅ Complete |
| Dual-Space Attention | Euclidean + hyperbolic combined | ✅ Complete |
| Flash Attention | Memory-efficient, O(block_size) memory | ✅ Complete |
| Linear Attention | O(n)-complexity attention | ✅ Complete |
| MoE Attention | Mixture-of-Experts routing | ✅ Complete |
| GraphRoPE | Rotary position embeddings for graphs | ✅ Complete |

ruvector-gnn (Current Implementation)

The GNN crate has its own attention implementation:

| Component | Description | Limitation |
|---|---|---|
| MultiHeadAttention | Basic MHA in layer.rs | Duplicates attention-crate functionality |
| RuvectorLayer | GNN layer with attention | Uses the internal MHA only |
| differentiable_search | Soft-attention search | Basic cosine similarity only |
| hierarchical_forward | Layer-wise processing | No geometric attention support |

Integration Benefits

1. Hyperbolic Space for Hierarchical Search 🌐

HNSW graphs are inherently hierarchical, and hyperbolic geometry is a natural fit for tree-like structures: available space grows exponentially toward the boundary of the Poincaré ball, mirroring the exponential fan-out of a hierarchy.

Current: Euclidean cosine similarity only
Proposed: Poincaré distance for hierarchical relationships
         - Better parent-child similarity
         - Improved layer-wise search
         - ~15-30% better recall on hierarchical data
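For reference, the Poincaré-ball distance the hyperbolic path would score against looks roughly like the sketch below; the function names and the temperature-based scoring are illustrative, not the crate's current API:

// Poincaré-ball distance between points with ||u||, ||v|| < 1:
// d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
fn poincare_distance(u: &[f32], v: &[f32]) -> f32 {
    let sq_norm = |x: &[f32]| x.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - sq_norm(u)).max(1e-6) * (1.0 - sq_norm(v)).max(1e-6);
    let x = 1.0 + 2.0 * diff_sq / denom;
    // arcosh(x) = ln(x + sqrt(x^2 - 1)) for x >= 1
    (x + (x * x - 1.0).max(0.0).sqrt()).ln()
}

// Attention-style score: smaller hyperbolic distance => larger weight.
fn hyperbolic_score(query: &[f32], candidate: &[f32], temperature: f32) -> f32 {
    (-poincare_distance(query, candidate) / temperature).exp()
}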

2. Edge-Featured Graph Attention (GATv2) 🔗

Current GNN layers ignore edge features. GATv2 integration enables:

// Current: Simple weighted aggregation
let aggregated = self.aggregate_messages(&neighbor_msgs, edge_weights);

// Proposed: Edge-featured attention
let edge_features = compute_edge_features(node, neighbors);
let aggregated = edge_attention.compute_with_edges(
    &node_msg, &neighbor_msgs, &neighbor_msgs, &edge_features
);

Benefits:

  • Distance-aware attention (neighbor weights adapt to edge distance instead of being fixed)
  • Edge type features such as HNSW layer level and connection type (see the sketch after this list)
  • ~20% improvement in GNN message passing quality
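A rough sketch of what such edge features could look like for HNSW edges; the struct and feature layout below are hypothetical, chosen only to illustrate combining geometric distance with structural metadata:

// Hypothetical per-edge features for HNSW graph attention; the layout
// (distance, similarity transform, normalized layer level, connection flag)
// is an assumption, not the crate's current format.
struct HnswEdge {
    distance: f32,       // neighbor distance stored at index-construction time
    layer_level: u8,     // HNSW layer on which this edge lives
    bidirectional: bool, // connection type: mutual vs. one-way link
}

fn edge_feature_vector(edge: &HnswEdge, max_layer: u8) -> Vec<f32> {
    vec![
        edge.distance,                                     // raw distance; attention can learn a scale
        (-edge.distance).exp(),                            // similarity-like transform in (0, 1]
        edge.layer_level as f32 / max_layer.max(1) as f32, // layer level normalized to [0, 1]
        if edge.bidirectional { 1.0 } else { 0.0 },        // edge type flag
    ]
}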

3. Flash Attention for Large-Scale Search

Memory-efficient attention for searching across millions of vectors:

Standard MHA: O(n²) memory for attention matrix
Flash Attention: O(block_size) memory with tiled computation

For n=1M vectors:
- Standard: ~4TB memory (impossible)
- Flash: ~64MB memory (feasible)
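A minimal single-query sketch of the tiled computation: candidates are scored block by block while a running max, running sum, and weighted accumulator are maintained (online softmax), so only one block of scores is ever materialized. Names and the block layout are illustrative, not the crate's kernel:

fn flash_attention_single_query(query: &[f32], candidates: &[Vec<f32>], block_size: usize) -> Vec<f32> {
    let dim = query.len();
    let scale = 1.0 / (dim as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY;
    let mut running_sum = 0.0_f32;
    let mut acc = vec![0.0_f32; dim];

    for block in candidates.chunks(block_size.max(1)) {
        // Scores for this block only; the full n-length score vector is never materialized.
        let scores: Vec<f32> = block
            .iter()
            .map(|c| query.iter().zip(c).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        let block_max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(block_max);

        // Rescale the previous accumulator and sum to the new running max.
        let rescale = (running_max - new_max).exp();
        running_sum *= rescale;
        for a in acc.iter_mut() {
            *a *= rescale;
        }

        // Fold this block into the accumulator.
        for (c, s) in block.iter().zip(scores.iter()) {
            let w = (*s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(c.iter()) {
                *a += w * *x;
            }
        }
        running_max = new_max;
    }

    acc.iter().map(|a| a / running_sum.max(1e-12)).collect()
}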

4. Mixture of Experts for Adaptive Search 🎯

Different query types benefit from different attention mechanisms:

| Query Type | Best Expert | Why |
|---|---|---|
| Hierarchical data | HyperbolicExpert | Tree-like structure |
| Dense clusters | StandardExpert | Euclidean similarity |
| Sparse features | LinearExpert | Low-rank attention |

MoE routing automatically selects optimal attention per query.
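For illustration, a toy routing step might look like the following; the expert names mirror the table above, and the gating heuristic is a placeholder for what would in practice be a learned softmax gate:

#[derive(Debug, Clone, Copy, PartialEq)]
enum AttentionExpert {
    Hyperbolic, // hierarchical / tree-like data
    Standard,   // dense Euclidean clusters
    Linear,     // sparse or very high-dimensional features
}

fn route_query(query: &[f32]) -> AttentionExpert {
    let norm = query.iter().map(|x| x * x).sum::<f32>().sqrt();
    let sparsity = query.iter().filter(|x| x.abs() < 1e-6).count() as f32
        / query.len().max(1) as f32;

    // Placeholder gate: dispatch to the expert most likely to score this query well.
    if sparsity > 0.5 {
        AttentionExpert::Linear
    } else if norm < 0.9 {
        // Points well inside the unit ball suggest hyperbolic embeddings.
        AttentionExpert::Hyperbolic
    } else {
        AttentionExpert::Standard
    }
}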

5. Dual-Space Attention for Hybrid Search 🔄

Combine Euclidean (local structure) and Hyperbolic (global hierarchy):

// Proposed integration in differentiable_search
pub fn differentiable_search_dual_space(
    query: &[f32],
    candidates: &[Vec<f32>],
    k: usize,
    dual_space_config: DualSpaceConfig,
) -> (Vec<usize>, Vec<f32>) {
    let dual_attn = DualSpaceAttention::new(dual_space_config);
    
    // Combine both geometric perspectives
    let (euc_scores, hyp_scores) = dual_attn.get_space_contributions(query, candidates);

    // Adaptive weighting based on query characteristics
    let combined_scores = adaptive_combine(euc_scores, hyp_scores, query);
    // ...
}
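One possible shape for the `adaptive_combine` helper referenced above; using the query norm as a proxy for how hierarchical the query is, is an assumption made purely for illustration:

// Blend Euclidean and hyperbolic scores with a query-dependent weight.
fn adaptive_combine(euc_scores: Vec<f32>, hyp_scores: Vec<f32>, query: &[f32]) -> Vec<f32> {
    let norm = query.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Queries deep inside the ball lean on hyperbolic scores; near the boundary, Euclidean.
    let hyp_weight = (1.0 - norm).clamp(0.0, 1.0);
    euc_scores
        .iter()
        .zip(hyp_scores.iter())
        .map(|(e, h)| (1.0 - hyp_weight) * *e + hyp_weight * *h)
        .collect()
}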

Proposed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     ruvector-gnn (Enhanced)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐                    │
│  │  RuvectorLayer  │───▶│ AttentionBackend │◀─┐                │
│  │  (GNN Layer)    │    │   (Pluggable)    │  │                │
│  └─────────────────┘    └─────────────────┘  │                │
│                                │              │                │
│         ┌──────────────────────┼──────────────┘                │
│         │                      │                               │
│         ▼                      ▼                               │
│  ┌──────────────┐    ┌──────────────────┐                      │
│  │  Training    │    │ ruvector-attention│                      │
│  │  (EWC, Adam) │    │    (imported)     │                      │
│  └──────────────┘    └──────────────────┘                      │
│                              │                                 │
│       ┌──────────────────────┼──────────────────────┐         │
│       │                      │                      │         │
│       ▼                      ▼                      ▼         │
│ ┌───────────┐    ┌────────────────┐    ┌──────────────┐       │
│ │Hyperbolic │    │ Edge-Featured  │    │    Flash     │       │
│ │ Attention │    │  GAT (GATv2)   │    │  Attention   │       │
│ └───────────┘    └────────────────┘    └──────────────┘       │
│       │                      │                      │         │
│       └──────────────────────┼──────────────────────┘         │
│                              ▼                                 │
│                    ┌──────────────────┐                        │
│                    │  MoE Routing     │                        │
│                    │  (Adaptive)      │                        │
│                    └──────────────────┘                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Implementation Roadmap

Phase 1: Foundation (Week 1-2)

  • Add ruvector-attention as dependency to ruvector-gnn
  • Create AttentionBackend trait for pluggable attention
  • Refactor RuvectorLayer to use trait-based attention
  • Add feature flags: hyperbolic, flash, moe, full-attention

Phase 2: Core Integration (Week 3-4)

  • Integrate EdgeFeaturedAttention into GNN message passing
  • Replace internal MHA with ruvector-attention::MultiHeadAttention
  • Add DualSpaceAttention option for hierarchical_forward
  • Implement HyperbolicGNNLayer variant

Phase 3: Search Enhancement (Week 5-6)

  • Create differentiable_search_hyperbolic function
  • Add Flash attention for large candidate sets
  • Implement MoE-based adaptive search
  • Benchmark against current implementation

Phase 4: NAPI Bindings (Week 7-8)

  • Expose new attention modes via NAPI
  • Update Float32Array interfaces for all new functions
  • Create JavaScript examples and documentation
  • Performance optimization for Node.js

Phase 5: Production Hardening (Week 9-10)

  • Comprehensive benchmarking suite
  • Memory profiling and optimization
  • SIMD acceleration for attention kernels
  • Documentation and migration guide

API Design

Rust API

// New attention-aware GNN layer
pub struct RuvectorLayerV2 {
    attention: Box<dyn AttentionBackend>,
    gru: GRUCell,
    norm: LayerNorm,
}

impl RuvectorLayerV2 {
    pub fn with_hyperbolic(config: HyperbolicConfig) -> Self;
    pub fn with_edge_featured(config: EdgeFeaturedConfig) -> Self;
    pub fn with_dual_space(config: DualSpaceConfig) -> Self;
    pub fn with_moe(config: MoEConfig) -> Self;
}

// Enhanced search
pub fn hierarchical_forward_v2(
    query: &[f32],
    layer_embeddings: &[Vec<Vec<f32>>],
    gnn_layers: &[RuvectorLayerV2],
    search_config: SearchConfig,
) -> SearchResult;

pub struct SearchConfig {
    pub attention_mode: AttentionMode,
    pub hyperbolic_curvature: Option<f32>,
    pub temperature: f32,
    pub use_flash: bool,
}
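A minimal sketch of what the AttentionBackend trait used by RuvectorLayerV2 could expose; the exact method set is an open design question, and these signatures are assumptions rather than a finalized interface:

// Minimal surface a GNN layer needs from a pluggable attention backend:
// compute attention weights, then aggregate neighbor values with them.
pub trait AttentionBackend: Send + Sync {
    /// Softmax-normalized attention weights of `query` over `keys`.
    fn attend(&self, query: &[f32], keys: &[Vec<f32>]) -> Vec<f32>;

    /// Weighted aggregation of `values` using the weights from `attend`.
    fn aggregate(&self, query: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
        let weights = self.attend(query, keys);
        let dim = values.first().map_or(0, |v| v.len());
        let mut out = vec![0.0_f32; dim];
        for (w, v) in weights.iter().zip(values.iter()) {
            for (o, x) in out.iter_mut().zip(v.iter()) {
                *o += w * x;
            }
        }
        out
    }
}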

NAPI/JavaScript API

// New exports from @ruvector/gnn
export interface AttentionConfig {
  mode: 'standard' | 'hyperbolic' | 'dual_space' | 'edge_featured' | 'moe';
  curvature?: number;        // For hyperbolic
  euclideanWeight?: number;  // For dual_space
  numExperts?: number;       // For MoE
  topK?: number;             // For MoE
}

export function hierarchicalForwardV2(
  query: Float32Array,
  layerEmbeddings: Float32Array[][],
  layers: string[],
  config: AttentionConfig
): Float32Array;

export function differentiableSearchHyperbolic(
  query: Float32Array,
  candidates: Float32Array[],
  k: number,
  curvature: number
): { indices: number[]; weights: Float32Array };

Performance Expectations

| Metric | Current | With Attention Integration |
|---|---|---|
| Recall@10 (hierarchical) | ~85% | ~92-95% |
| Memory (1M vectors) | Baseline | -40% (Flash) |
| Search latency | Baseline | -15% (optimized paths) |
| GNN accuracy | Baseline | +20% (edge features) |

Compatibility

  • Backward Compatible: All existing APIs preserved
  • Opt-in Features: New features behind feature flags
  • Migration Path: Gradual adoption via V2 suffix functions

Open Questions

  1. Default attention mode: Should we default to dual_space for best general performance?
  2. Curvature learning: Should curvature be learnable or fixed?
  3. MoE overhead: Is the routing overhead worth it for smaller datasets?
  4. WASM support: Which attention modes should be WASM-compatible?


Requesting feedback on:

  1. Priority ordering of phases
  2. Additional attention mechanisms to consider
  3. Specific use cases to optimize for
  4. NAPI binding design preferences

/cc @ruvnet
