A PyTorch implementation for sign language detection and recognition using Spatio-Temporal Graph Convolutional Networks (ST-GCN). The model incorporates signer-aware attention and adversarial training to reduce signer bias, and supports both multi-class sign classification and binary signing detection.
- ST-GCN Backbone: Spatio-temporal graph convolutional layers for modeling sign language gestures
- Bias-Aware Attention: Attention mechanism conditioned on signer embeddings to adapt to different signing styles
- Adversarial Signer Head: Optional domain-adversarial training via a gradient reversal layer (sketched after this list) to reduce signer-specific bias
- Supervised Contrastive Learning: Optional InfoNCE-based contrastive loss for better feature learning
- Flexible Landmark Support: Works with pose, hands, and face landmarks (e.g., from MediaPipe or OpenPose)
- Pseudo-Signer Clustering: K-means clustering of pose statistics to create pseudo signer IDs for unsupervised signer diversity
- Binary & Multi-Class: Supports both binary classification (signing vs. not-signing) and multi-class sign recognition
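The adversarial signer head relies on the standard gradient reversal trick from domain-adversarial training (Ganin & Lempitsky, 2015): the layer is the identity on the forward pass and negates (and scales) gradients on the backward pass, so the backbone learns features the signer head cannot exploit. A minimal PyTorch sketch of such a layer (the repo's own implementation may differ in detail):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; gradients are scaled by -lambda on the way back."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # The reversed gradient pushes the backbone to remove signer information
        # while the signer head itself still learns to predict signer identity.
        return -ctx.lam * grad_out, None

def grad_reverse(x: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    return GradReverse.apply(x, lam)
```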
The model consists of several key components, which compose as in the sketch after this list:
- ST-GCN Backbone: Processes spatio-temporal keypoint sequences using graph convolutions
- Signer Encoder: Encodes per-window pose statistics into signer embeddings
- Bias-Aware Attention: Temporally attends to features while being conditioned on signer style
- Temporal Aggregator: BiGRU-based aggregation over time
- Classifier Head: Final classification layer
- Signer Head (optional): Adversarial head for signer identification to encourage signer-invariant features
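Schematically, the components fit together as follows. This is a sketch, not the literal code in model.py; the attribute names, shapes, and the `grad_reverse` helper from the sketch above are assumptions:

```python
import torch

def forward_sketch(model, keypoints, pose_stats, lambda_grl=0.5):
    # keypoints: (B, T, V, C) keypoint windows; pose_stats: (B, S) per-window statistics
    feats = model.backbone(keypoints)                # ST-GCN features, (B, T, D)
    signer_emb = model.signer_encoder(pose_stats)    # signer style embedding, (B, E)
    feats = model.bias_attention(feats, signer_emb)  # attention conditioned on signer style
    pooled = model.temporal_aggregator(feats)        # BiGRU aggregation over time, (B, H)
    logits = model.classifier(pooled)                # sign class / signing-vs-not logits
    # Optional adversarial branch: gradient reversal before the signer head.
    signer_logits = model.signer_head(grad_reverse(pooled, lambda_grl))
    return logits, signer_logits
```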
```bash
pip install -r requirements.txt
```

Core dependencies:

- PyTorch >= 2.1.0
- NumPy >= 1.23
- scikit-learn >= 1.3
- matplotlib >= 3.7
- tqdm >= 4.65
Optional (for landmark extraction):
- MediaPipe >= 0.10.0
- OpenCV >= 4.7.0
The model expects:
- Landmark files: `.npz` files containing keypoint sequences. Each file should contain arrays for:
  - `pose`: `(T, 33, C)` - body pose landmarks
  - `left_hand`: `(T, 21, C)` - left hand landmarks (optional)
  - `right_hand`: `(T, 21, C)` - right hand landmarks (optional)
  - `face`: `(T, 478, C)` - face landmarks (optional)
- Groundtruth file: a text file with one `<video_id> <frame_idx> <label>` entry per line, where `s` means signing and `n` means not signing. Example:

```
video001 0 s
video001 25 s
video001 50 n
```
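For reference, a compatible `.npz` file can be produced with the optional MediaPipe dependency. The helper below is hypothetical (the repo may ship its own extractor); it writes `pose`, `left_hand`, and `right_hand` arrays in the layout above, with missing detections zero-filled:

```python
import cv2
import numpy as np
import mediapipe as mp

def extract_landmarks(video_path, out_path, coords=2):
    pose, lhand, rhand = [], [], []
    holistic = mp.solutions.holistic.Holistic()
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        res = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        def to_array(lms, n_points):
            # Zero-fill missing detections so every frame has a fixed shape.
            if lms is None:
                return np.zeros((n_points, coords), dtype=np.float32)
            return np.array([[p.x, p.y, p.z][:coords] for p in lms.landmark],
                            dtype=np.float32)

        pose.append(to_array(res.pose_landmarks, 33))
        lhand.append(to_array(res.left_hand_landmarks, 21))
        rhand.append(to_array(res.right_hand_landmarks, 21))
    cap.release()
    holistic.close()
    np.savez(out_path, pose=np.stack(pose),
             left_hand=np.stack(lhand), right_hand=np.stack(rhand))

extract_landmarks("video001.mp4", "video001.npz")
```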
Multi-class sign recognition:

```bash
python -m train_landmarks \
    --data /path/to/landmarks/folder \
    --batch 32 \
    --epochs 50 \
    --lr 1e-3 \
    --num-classes 100 \
    --window 25 \
    --stride 1 \
    --include-pose \
    --include-hands \
    --save-best \
    --log-csv \
    --out runs/signgcn
```

Binary signing detection (signing vs. not-signing):

```bash
python -m train_landmarks \
    --data /path/to/landmarks/folder \
    --binary \
    --signing-labels "sign,signing,gesture" \
    --batch 32 \
    --epochs 50 \
    --best-metric pr_auc \
    --out runs/signgcn_binary
```

Pseudo-signer training uses K-means clustering on pose statistics to create pseudo signer IDs for adversarial training:
```bash
python -m train_landmarks \
    --data /path/to/landmarks/folder \
    --use-pseudo-signers \
    --num-pseudo-signers 8 \
    --signer-loss-weight 0.5 \
    --batch 32 \
    --epochs 50 \
    --out runs/signgcn_pseudo
```
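Conceptually, the pseudo-signer IDs come from clustering cheap per-window pose statistics. A scikit-learn sketch (the statistics used here are illustrative; the trainer may compute different ones):

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_signer_ids(windows, n_clusters=8):
    """windows: (N, T, V, C) pose windows -> (N,) pseudo signer labels."""
    # Style descriptor: per-keypoint mean and std over time, flattened.
    mean = windows.mean(axis=1).reshape(len(windows), -1)
    std = windows.std(axis=1).reshape(len(windows), -1)
    stats = np.concatenate([mean, std], axis=1)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(stats)
```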
Supervised contrastive training adds an InfoNCE-based contrastive loss:

```bash
python -m train_landmarks \
    --data /path/to/landmarks/folder \
    --use-supcon \
    --supcon-weight 0.1 \
    --supcon-temp 0.07 \
    --batch 32 \
    --epochs 50 \
    --out runs/signgcn_supcon
```
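This loss family treats same-label windows in a batch as positives. A compact sketch of such a supervised InfoNCE loss (losses.py may differ in details such as multi-view batches):

```python
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, temp=0.07):
    """feats: (B, D) embeddings, labels: (B,) class labels."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / temp                                   # (B, B) cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Log-softmax over all non-self samples, averaged over each anchor's positives.
    denom = torch.logsumexp(sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    log_prob = sim - denom
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()  # anchors without positives are skipped
```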
To train with fixed video-level splits:

```bash
# First, create splits
python -m build_video_splits \
    --data /path/to/landmarks/folder \
    --window 25 \
    --stride 1 \
    --train 0.8 \
    --val 0.1 \
    --test 0.1 \
    --out /path/to/landmarks/folder/splits.json
```
```bash
# Then train with splits
python -m train_landmarks \
    --data /path/to/landmarks/folder \
    --splits-json /path/to/landmarks/folder/splits.json \
    --batch 32 \
    --epochs 50 \
    --out runs/signgcn_splits
```

Common arguments:

- `--data`: Path to the folder containing `.npz` files and the groundtruth file
- `--window`: Temporal window size in frames (default: 25; see the slicing sketch after this list)
- `--stride`: Sliding-window stride (default: 1)
- `--coords`: Number of coordinates per keypoint (default: 2, i.e., x, y)
- `--include-pose`: Include body pose landmarks
- `--include-hands`: Include hand landmarks
- `--include-face`: Include face landmarks
- `--batch`: Batch size
- `--epochs`: Number of training epochs
- `--lr`: Learning rate
- `--num-classes`: Number of sign classes (for multi-class training)
- `--binary`: Train a binary classifier (signing vs. not-signing)
- `--use-pseudo-signers`: Enable pseudo-signer clustering
- `--use-supcon`: Enable supervised contrastive learning
- `--save-best`: Save the best model checkpoint
- `--best-metric`: Metric for best-model selection (`acc`, `loss`, `pr_auc`, `f1`)
- `--log-csv`: Log metrics to a CSV file
- `--out`: Output directory for checkpoints and logs
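To make `--window` and `--stride` concrete: training samples are overlapping slices of each keypoint sequence. A sketch of that slicing (the dataset class in datasets/landmarks_npz.py may handle edge cases differently):

```python
import numpy as np

def slice_windows(seq, window=25, stride=1):
    """seq: (T, V, C) keypoints -> (N, window, V, C) overlapping windows."""
    if len(seq) < window:
        # Sequences shorter than the window yield no samples in this sketch.
        return np.empty((0, window) + seq.shape[1:], dtype=seq.dtype)
    starts = range(0, len(seq) - window + 1, stride)
    return np.stack([seq[s:s + window] for s in starts])
```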
Visualize signer embeddings with t-SNE or PCA:
```bash
python -m visualize_signer_embeddings \
    --data /path/to/landmarks/folder \
    --ckpt runs/signgcn/best.pt \
    --window 25 \
    --include-pose \
    --include-hands \
    --reduce tsne \
    --out embeddings_plot.png \
    --save-csv embeddings.csv
```
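To reduce and plot saved embeddings yourself, the equivalent scikit-learn/matplotlib steps look like this. The CSV layout is an assumption (embedding columns followed by a numeric signer ID); adjust to whatever `--save-csv` actually writes:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumed layout: all but the last column are the embedding, last column is a signer ID.
data = np.loadtxt("embeddings.csv", delimiter=",", skiprows=1)
emb, signer = data[:, :-1], data[:, -1].astype(int)

xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)
plt.scatter(xy[:, 0], xy[:, 1], c=signer, s=5, cmap="tab10")
plt.title("Signer embeddings (t-SNE)")
plt.savefig("embeddings_tsne.png", dpi=150)
```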
Create train/val/test splits based on videos:

```bash
python -m build_video_splits \
    --data /path/to/landmarks/folder \
    --window 25 \
    --stride 1 \
    --s-label S \
    --rule any \
    --train 0.8 \
    --val 0.1 \
    --test 0.1 \
    --out splits.json
```
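Splitting at the video level keeps overlapping windows from one video out of both train and test. A sketch of that grouping logic (illustrative; build_video_splits.py also applies the `--s-label`/`--rule` filtering):

```python
import json
import random

def split_videos(video_ids, train=0.8, val=0.1, seed=0):
    """Assign whole videos to splits so windows from one video never leak across them."""
    vids = sorted(set(video_ids))
    random.Random(seed).shuffle(vids)
    n_train, n_val = int(train * len(vids)), int(val * len(vids))
    splits = {
        "train": vids[:n_train],
        "val": vids[n_train:n_train + n_val],
        "test": vids[n_train + n_val:],
    }
    with open("splits.json", "w") as f:
        json.dump(splits, f, indent=2)
    return splits
```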
Train all model variations on a small subset of data for quick testing:

```bash
python train_all_variations.py \
    --data /path/to/landmarks/folder \
    --max-files 3 \
    --epochs 2 \
    --batch 8 \
    --base-out runs/test_variations
```

This script will train:
- Basic training
- Binary classification
- Pseudo-signer clustering
- Supervised contrastive learning
- FiLM attention (sketched after this list)
- MI minimization
- Combined variations
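FiLM (feature-wise linear modulation) conditions features on the signer embedding by predicting a per-channel scale and shift. A minimal sketch of the idea (the module in modules/attention.py may differ):

```python
import torch.nn as nn

class FiLM(nn.Module):
    """Scale and shift temporal features with parameters predicted from a conditioning vector."""

    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats, cond):
        # feats: (B, T, D) temporal features; cond: (B, C), e.g. a signer embedding
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)
```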
Options:
- `--max-files`: Limit the number of `.npz` files to use (default: 3)
- `--epochs`: Number of epochs per variation (default: 2)
- `--batch`: Batch size (default: 8)
- `--base-out`: Base output directory (default: `runs/test_variations`)
- `--skip-*`: Skip specific variations (e.g., `--skip-binary`, `--skip-pseudo`)
The model can be customized through various hyperparameters (an example instantiation follows this list):
- `stgcn_channels`: Hidden dimensions for the ST-GCN layers (default: `(64, 128, 128)`)
- `stgcn_kernel`: Temporal kernel size (default: `3`)
- `stgcn_dilations`: Dilation rates for temporal convolutions (default: `(1, 2, 3)`)
- `temporal_hidden`: Hidden dimension of the temporal aggregator (default: `256`)
- `signer_emb_dim`: Dimension of the signer embedding (default: `64`)
- `attn_heads`: Number of attention heads (default: `4`)
- `lambda_grl`: Gradient-reversal weight for adversarial training (default: `0.5`)
- `dropout`: Dropout rate (default: `0.1`)
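For example, a hypothetical instantiation (the exact constructor signature of SignSTGCNModel in model.py may differ; the argument names mirror the list above):

```python
from model import SignSTGCNModel

model = SignSTGCNModel(
    stgcn_channels=(64, 128, 128),
    stgcn_kernel=3,
    stgcn_dilations=(1, 2, 3),
    temporal_hidden=256,
    signer_emb_dim=64,
    attn_heads=4,
    lambda_grl=0.5,
    dropout=0.1,
)
```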
Training produces:
- `best.pt`: Best model checkpoint (if `--save-best` is used)
- `last.pt`: Last-epoch checkpoint
- `metrics.csv`: Per-epoch training metrics (if `--log-csv` is used)
Checkpoint format:
```python
{
    'model': model.state_dict(),
    'epoch': epoch_number,
    'val_acc': validation_accuracy,
    # ... other metadata
}
```
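Restoring a checkpoint for evaluation then looks like this (a sketch, with the model constructed as above):

```python
import torch

ckpt = torch.load("runs/signgcn/best.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
print(f"epoch {ckpt['epoch']}, val_acc {ckpt['val_acc']:.3f}")
```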
Project structure:

```
SignSTGCN/
├── train_landmarks.py              # Main training script
├── train_all_variations.py         # Train all variations on a small data subset
├── signGCN.py                      # Model entry point / example usage
├── model.py                        # SignSTGCNModel implementation
├── losses.py                       # Loss functions
├── build_video_splits.py           # Video split generation
├── visualize_signer_embeddings.py  # Embedding visualization
├── test_film_attention.py          # Test script for FiLM attention
├── datasets/
│   └── landmarks_npz.py            # NPZ landmark dataset
├── layers/
│   └── graph.py                    # ST-GCN backbone
├── modules/
│   ├── signer.py                   # Signer encoder
│   ├── attention.py                # Bias-aware attention
│   ├── temporal.py                 # Temporal aggregator
│   └── heads.py                    # Classification and signer heads
└── utils/
    └── graph.py                    # Graph utilities
```
If you use this code in your research, please cite:
```bibtex
@software{signstgcn,
  title={SignSTGCN: Sign Language Detection with Spatio-Temporal Graph Convolutional Networks},
  author={Cyrine Chaabani},
  year={2024}
}
```

This project is licensed under the Apache License, Version 2.0; see the LICENSE file for details.
This implementation builds upon ST-GCN architectures and incorporates techniques for reducing signer bias in sign language recognition models.