RFC: Deep Integration of ruvector-attention into ruvector-gnn
Executive Summary
This RFC proposes a comprehensive integration of ruvector-attention capabilities into ruvector-gnn, combining advanced attention mechanisms with graph neural network operations to create a state-of-the-art hierarchical vector search system.
Current State Analysis
ruvector-attention (Standalone Crate)
The attention crate provides a rich set of attention mechanisms:
| Component | Description | Status |
|---|---|---|
| Scaled Dot-Product | Standard transformer attention | ✅ Complete |
| Multi-Head Attention | Parallel attention heads | ✅ Complete |
| Edge-Featured Attention (GATv2) | Graph attention with edge features | ✅ Complete |
| Hyperbolic Attention | Poincaré ball model attention | ✅ Complete |
| Dual-Space Attention | Euclidean + Hyperbolic combined | ✅ Complete |
| Flash Attention | Memory-efficient O(block_size) | ✅ Complete |
| Linear Attention | O(n) complexity attention | ✅ Complete |
| MoE Attention | Mixture of Experts routing | ✅ Complete |
| GraphRoPE | Rotary Position Embeddings for graphs | ✅ Complete |
ruvector-gnn (Current Implementation)
The GNN crate has its own attention implementation:
| Component | Description | Limitation |
|---|---|---|
| MultiHeadAttention | Basic MHA in `layer.rs` | Duplicates attention crate functionality |
| RuvectorLayer | GNN layer with attention | Uses internal MHA only |
| differentiable_search | Soft attention search | Basic cosine similarity only |
| hierarchical_forward | Layer-wise processing | No geometric attention support |
Integration Benefits
1. Hyperbolic Space for Hierarchical Search 🌐
HNSW graphs are inherently hierarchical. Hyperbolic geometry captures tree-like structures naturally: available volume grows exponentially toward the boundary of the space, matching the exponential fan-out of a hierarchy.
Current: Euclidean cosine similarity only
Proposed: Poincaré distance for hierarchical relationships
- Better parent-child similarity
- Improved layer-wise search
- ~15-30% better recall on hierarchical data
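For reference, the core distance computation is small. Below is a minimal sketch of the unit-ball Poincaré distance, assuming inputs already lie inside the ball; the function name and signature are illustrative, not the existing ruvector-attention API.

```rust
/// Poincaré (unit-ball) distance; assumes ||u||, ||v|| < 1.
/// Illustrative sketch only, not the crate's actual API.
fn poincare_distance(u: &[f32], v: &[f32]) -> f32 {
    let sq_norm = |x: &[f32]| x.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = ((1.0 - sq_norm(u)) * (1.0 - sq_norm(v))).max(1e-9);
    // Clamp so acosh's argument stays >= 1 under floating-point error.
    (1.0 + 2.0 * diff_sq / denom).max(1.0).acosh()
}

fn main() {
    // Points near the origin behave almost Euclidean; points near the boundary
    // are exponentially far apart, which matches HNSW's layered hierarchy.
    let parent = vec![0.1_f32, 0.0];
    let child = vec![0.6_f32, 0.3];
    println!("poincare distance = {}", poincare_distance(&parent, &child));
}
```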
2. Edge-Featured Graph Attention (GATv2) 🔗
Current GNN layers ignore edge features. GATv2 integration enables:
```rust
// Current: Simple weighted aggregation
let aggregated = self.aggregate_messages(&neighbor_msgs, edge_weights);

// Proposed: Edge-featured attention
let edge_features = compute_edge_features(node, neighbors);
let aggregated = edge_attention.compute_with_edges(
    &node_msg, &neighbor_msgs, &neighbor_msgs, &edge_features
);
```

Benefits:
- Distance-aware attention (neighbor weights adapt dynamically to distance)
- Edge type features (HNSW layer level, connection type)
- ~20% improvement in GNN message passing quality
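To make the `compute_edge_features` call above concrete, one hypothetical feature layout for HNSW edges is sketched below. None of these names exist in ruvector-gnn today, and a real design would likely draw on stored edge metadata rather than recomputing it.

```rust
/// Hypothetical per-edge features for an HNSW neighbor; illustrative only.
struct EdgeFeature {
    distance: f32,       // geometric distance between the two endpoints
    layer_level: f32,    // HNSW layer the edge lives on, normalized to [0, 1]
    is_cross_layer: f32, // 1.0 for links that skip layers, else 0.0
}

/// One possible shape for the `compute_edge_features` helper used above.
fn compute_edge_features(
    node: &[f32],
    neighbors: &[Vec<f32>],
    layer: usize,
    max_layer: usize,
) -> Vec<EdgeFeature> {
    neighbors
        .iter()
        .map(|nb| {
            let distance = node
                .iter()
                .zip(nb)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>()
                .sqrt();
            EdgeFeature {
                distance,
                layer_level: layer as f32 / max_layer.max(1) as f32,
                is_cross_layer: 0.0, // a real graph would derive this from edge metadata
            }
        })
        .collect()
}
```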
3. Flash Attention for Large-Scale Search ⚡
Memory-efficient attention for searching across millions of vectors:
Standard MHA: O(n²) memory for attention matrix
Flash Attention: O(block_size) memory with tiled computation
For n=1M vectors:
- Standard: ~4TB memory (impossible)
- Flash: ~64MB memory (feasible)
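The memory difference comes from never materializing the full score set: Flash-style attention streams over candidate blocks with an online softmax, keeping only a running max, running sum, and an accumulator. A minimal single-query sketch of that idea follows; the names are illustrative, not the crate's implementation.

```rust
/// Streaming (online-softmax) attention pooling over candidate blocks.
/// Only one block of scores lives in memory at a time; the output is the
/// softmax-weighted sum of candidates. Illustrative sketch only.
fn streaming_attention_pool(query: &[f32], candidates: &[Vec<f32>], block: usize) -> Vec<f32> {
    let scale = 1.0 / (query.len() as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY;
    let mut running_sum = 0.0_f32;
    let mut acc = vec![0.0_f32; query.len()]; // un-normalized weighted sum

    for chunk in candidates.chunks(block) {
        // Scores for this block only: O(block) memory instead of O(n).
        let scores: Vec<f32> = chunk
            .iter()
            .map(|c| query.iter().zip(c).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        let block_max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(block_max);

        // Rescale previous partial results to the new running max.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        acc.iter_mut().for_each(|a| *a *= correction);

        for (c, s) in chunk.iter().zip(&scores) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(c) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }
    acc.iter().map(|a| a / running_sum.max(1e-12)).collect()
}
```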
4. Mixture of Experts for Adaptive Search 🎯
Different query types benefit from different attention mechanisms:
| Query Type | Best Expert | Why |
|---|---|---|
| Hierarchical data | HyperbolicExpert | Tree-like structure |
| Dense clusters | StandardExpert | Euclidean similarity |
| Sparse features | LinearExpert | Low-rank attention |
MoE routing automatically selects optimal attention per query.
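As a strawman for discussion, a top-1 router could look like the sketch below, with hand-written heuristics standing in for the learned gating network; the enum, thresholds, and scoring are assumptions, not the existing MoE implementation.

```rust
/// Toy top-1 expert router for search queries; illustrative only.
#[derive(Debug, Clone, Copy)]
enum SearchExpert {
    Standard,   // Euclidean similarity, dense clusters
    Hyperbolic, // hierarchical, tree-like data
    Linear,     // sparse, high-dimensional features
}

fn route_query(query: &[f32]) -> SearchExpert {
    let dim = query.len().max(1) as f32;
    let nonzero = query.iter().filter(|x| x.abs() > 1e-6).count() as f32;
    let sparsity = 1.0 - nonzero / dim;
    let norm = query.iter().map(|x| x * x).sum::<f32>().sqrt();

    // A learned gating network would replace these hand-written rules.
    if sparsity > 0.8 {
        SearchExpert::Linear
    } else if norm < 0.5 {
        // Small-norm queries sit near the Poincaré-ball origin, where the
        // hyperbolic expert handles hierarchy best (heuristic assumption).
        SearchExpert::Hyperbolic
    } else {
        SearchExpert::Standard
    }
}
```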
5. Dual-Space Attention for Hybrid Search 🔄
Combine Euclidean (local structure) and Hyperbolic (global hierarchy):
```rust
// Proposed integration in differentiable_search
pub fn differentiable_search_dual_space(
    query: &[f32],
    candidates: &[Vec<f32>],
    k: usize,
    dual_space_config: DualSpaceConfig,
) -> (Vec<usize>, Vec<f32>) {
    let dual_attn = DualSpaceAttention::new(dual_space_config);

    // Combine both geometric perspectives
    let (euc_scores, hyp_scores) = dual_attn.get_space_contributions(&query, &candidates);

    // Adaptive weighting based on query characteristics
    let combined_scores = adaptive_combine(euc_scores, hyp_scores, &query);

    // ...
}
```

Proposed Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ruvector-gnn (Enhanced) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ RuvectorLayer │───▶│ AttentionBackend │◀─┐ │
│ │ (GNN Layer) │ │ (Pluggable) │ │ │
│ └─────────────────┘ └─────────────────┘ │ │
│ │ │ │
│ ┌──────────────────────┼──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ Training │ │ ruvector-attention│ │
│ │ (EWC, Adam) │ │ (imported) │ │
│ └──────────────┘ └──────────────────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │Hyperbolic │ │ Edge-Featured │ │ Flash │ │
│ │ Attention │ │ GAT (GATv2) │ │ Attention │ │
│ └───────────┘ └────────────────┘ └──────────────┘ │
│ │ │ │ │
│ └──────────────────────┼──────────────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ MoE Routing │ │
│ │ (Adaptive) │ │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Implementation Roadmap
Phase 1: Foundation (Week 1-2)
- Add `ruvector-attention` as a dependency of `ruvector-gnn`
- Create `AttentionBackend` trait for pluggable attention (sketched below)
- Refactor `RuvectorLayer` to use trait-based attention
- Add feature flags: `hyperbolic`, `flash`, `moe`, `full-attention`
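A rough sketch of what the `AttentionBackend` trait could look like is below; the method names, signatures, and default aggregation are assumptions meant to seed discussion, not a committed design.

```rust
/// Pluggable attention backend for RuvectorLayer (proposed, illustrative).
pub trait AttentionBackend: Send + Sync {
    /// Attention weights of `query` over `keys`, one weight per key.
    fn attend(&self, query: &[f32], keys: &[Vec<f32>]) -> Vec<f32>;

    /// Weighted aggregation of `values`; backends may override this.
    fn aggregate(&self, weights: &[f32], values: &[Vec<f32>]) -> Vec<f32> {
        let dim = values.first().map_or(0, |v| v.len());
        let mut out = vec![0.0_f32; dim];
        for (w, v) in weights.iter().zip(values) {
            for (o, x) in out.iter_mut().zip(v) {
                *o += w * x;
            }
        }
        out
    }
}
```

`RuvectorLayer` would then hold a `Box<dyn AttentionBackend>`, keeping the layer agnostic about whether the backend is Euclidean, hyperbolic, Flash, or MoE-routed.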
Phase 2: Core Integration (Week 3-4)
- Integrate `EdgeFeaturedAttention` into GNN message passing
- Replace internal MHA with `ruvector-attention::MultiHeadAttention`
- Add `DualSpaceAttention` option for `hierarchical_forward`
- Implement `HyperbolicGNNLayer` variant
Phase 3: Search Enhancement (Week 5-6)
- Create `differentiable_search_hyperbolic` function
- Add Flash attention for large candidate sets
- Implement MoE-based adaptive search
- Benchmark against current implementation
Phase 4: NAPI Bindings (Week 7-8)
- Expose new attention modes via NAPI
- Update `Float32Array` interfaces for all new functions
- Create JavaScript examples and documentation
- Performance optimization for Node.js
Phase 5: Production Hardening (Week 9-10)
- Comprehensive benchmarking suite
- Memory profiling and optimization
- SIMD acceleration for attention kernels
- Documentation and migration guide
API Design
Rust API
```rust
// New attention-aware GNN layer
pub struct RuvectorLayerV2 {
    attention: Box<dyn AttentionBackend>,
    gru: GRUCell,
    norm: LayerNorm,
}

impl RuvectorLayerV2 {
    pub fn with_hyperbolic(config: HyperbolicConfig) -> Self;
    pub fn with_edge_featured(config: EdgeFeaturedConfig) -> Self;
    pub fn with_dual_space(config: DualSpaceConfig) -> Self;
    pub fn with_moe(config: MoEConfig) -> Self;
}

// Enhanced search
pub fn hierarchical_forward_v2(
    query: &[f32],
    layer_embeddings: &[Vec<Vec<f32>>],
    gnn_layers: &[RuvectorLayerV2],
    search_config: SearchConfig,
) -> SearchResult;

pub struct SearchConfig {
    pub attention_mode: AttentionMode,
    pub hyperbolic_curvature: Option<f32>,
    pub temperature: f32,
    pub use_flash: bool,
}
```
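For illustration, calling the proposed API might look like this; the `AttentionMode` variant, config defaults, and field values are placeholders and may change during review.

```rust
// Hypothetical usage of the proposed V2 API (not yet implemented).
fn search_example(query: &[f32], layer_embeddings: &[Vec<Vec<f32>>]) -> SearchResult {
    let layers = vec![
        RuvectorLayerV2::with_hyperbolic(HyperbolicConfig::default()),
        RuvectorLayerV2::with_edge_featured(EdgeFeaturedConfig::default()),
    ];
    let config = SearchConfig {
        attention_mode: AttentionMode::DualSpace, // placeholder variant name
        hyperbolic_curvature: Some(1.0),
        temperature: 0.07,
        use_flash: true,
    };
    hierarchical_forward_v2(query, layer_embeddings, &layers, config)
}
```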
NAPI/JavaScript API
```typescript
// New exports from @ruvector/gnn
export interface AttentionConfig {
  mode: 'standard' | 'hyperbolic' | 'dual_space' | 'edge_featured' | 'moe';
  curvature?: number;       // For hyperbolic
  euclideanWeight?: number; // For dual_space
  numExperts?: number;      // For MoE
  topK?: number;            // For MoE
}

export function hierarchicalForwardV2(
  query: Float32Array,
  layerEmbeddings: Float32Array[][],
  layers: string[],
  config: AttentionConfig
): Float32Array;

export function differentiableSearchHyperbolic(
  query: Float32Array,
  candidates: Float32Array[],
  k: number,
  curvature: number
): { indices: number[]; weights: Float32Array };
```

Performance Expectations
| Metric | Current | With Attention Integration |
|---|---|---|
| Recall@10 (hierarchical) | ~85% | ~92-95% |
| Memory (1M vectors) | Baseline | -40% (Flash) |
| Latency (search) | Baseline | -15% (optimized paths) |
| GNN accuracy | Baseline | +20% (edge features) |
Compatibility
- Backward Compatible: All existing APIs preserved
- Opt-in Features: New features behind feature flags
- Migration Path: Gradual adoption via `V2`-suffixed functions
Open Questions
- Default attention mode: Should we default to `dual_space` for the best general performance?
- Curvature learning: Should curvature be learnable or fixed?
- MoE overhead: Is the routing overhead worth it for smaller datasets?
- WASM support: Which attention modes should be WASM-compatible?
References
- Graph Attention Networks (Veličković et al., 2017)
- GATv2 (Brody et al., 2021)
- Hyperbolic Neural Networks (Ganea et al., 2018)
- Flash Attention (Dao et al., 2022)
- Mixture of Experts (Shazeer et al., 2017)
Related Issues
- #35: NAPI-RS type conversion errors in @ruvector/attention and @ruvector/gnn (fixed by #36: use Float32Array for NAPI bindings)
- #37: Security: 10 npm vulnerabilities detected in dependencies (fixed)
Requesting feedback on:
- Priority ordering of phases
- Additional attention mechanisms to consider
- Specific use cases to optimize for
- NAPI binding design preferences
/cc @ruvnet