Feature Store Benchmark

Setup

Run:

pip install -r requirements.txt
export PYTHONPATH='.'

Check the config.yml file and make sure the required files and directories exist.
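This check can be automated with a short script. The sketch below is only illustrative: it assumes config.yml is a (possibly nested) mapping and treats any string value whose key contains "dir", "path" or "file" as a filesystem path. That naming convention, the helper names, and the use of PyYAML for loading are assumptions, not something the repository specifies.

```python
import os
from typing import Any, Iterator, List, Tuple

# Assumption: keys containing one of these substrings hold filesystem paths.
PATH_HINTS = ("dir", "path", "file")


def iter_path_entries(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for string values whose key looks path-like."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from iter_path_entries(value, dotted)
        elif isinstance(value, str) and any(h in str(key).lower() for h in PATH_HINTS):
            yield dotted, value


def missing_paths(config: dict) -> List[Tuple[str, str]]:
    """Return the (key, path) pairs whose path does not exist on disk."""
    return [(k, p) for k, p in iter_path_entries(config) if not os.path.exists(p)]


def load_config(path: str = "config.yml") -> dict:
    """Load the YAML config. PyYAML is imported lazily and assumed installed."""
    import yaml  # third-party: pip install pyyaml

    with open(path) as f:
        return yaml.safe_load(f) or {}


# Typical use (hypothetical):
#     for key, path in missing_paths(load_config()):
#         print(f"missing: {key} -> {path}")
```

If the actual config uses different key names, adjust PATH_HINTS or list the keys to check explicitly.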

(Optional) Set up Terraform:

terraform apply;
terraform output -json > config.json;
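The file written by `terraform output -json` wraps each output in an object with "sensitive", "type" and "value" fields. A minimal sketch for reading it from Python follows; the function names are hypothetical, and the output names depend on the Terraform configuration in use.

```python
import json
from typing import Any, Dict


def terraform_outputs(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map each Terraform output name to its plain value.

    Only the "value" field of each output object is kept.
    """
    return {name: entry["value"] for name, entry in raw.items()}


def load_terraform_config(path: str = "config.json") -> Dict[str, Any]:
    """Read config.json as produced by `terraform output -json`."""
    with open(path) as f:
        return terraform_outputs(json.load(f))
```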

Experiments

First, update config.yml to point to the correct directories.

recsys (ALS)

Train a model with python workloads/recsys/train_als.py. Make sure the split/dataset is set to what you want.

# --resume: resume from the previous checkpoint
python workloads/recsys/als_train \
    --split 0.5 \
    --dataset "ml-1m" \
    --workers 12 \
    --resume [True/False] \
    --download_dataset [True/False]

Run streaming inference/updates. Make sure you have the right dataset set in the script.

# --download_dataset: download the existing model/data
python workloads/recsys/stream_als.py \
    --split 0.5 \
    --dataset "ml-1m" \
    --workers 12 \
    --download_dataset [True/False]

Evaluate the results in nb/als-plots.ipynb.

anomaly detection (STL)

Run streaming inference/updates.

python workloads/stl/stream_simulation.py

Repository structure

experiment_name/
    notebooks/
    data/
    preprocessing/
    simulation/
    ralf/
        client.py
        server.py
    download_data.sh

Dataset Structure

Event stream data: events_<NUM_KEYS>_<TIME_INTERVAL_MS>_<NUM_ROWS>.csv

event_id (unique id)
key_id
ts (millisecond timestamp since interval start)
value
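As an illustration of the event stream format, the following sketch parses the filename pattern and reads the rows into typed records. It assumes the CSV has a header row with exactly the column names listed above and that value is numeric; neither is stated explicitly here, and the class and function names are hypothetical.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    key_id: str
    ts: int       # milliseconds since the start of the interval
    value: float  # assumed numeric


_NAME_RE = re.compile(r"^events_(\d+)_(\d+)_(\d+)\.csv$")


def parse_events_filename(name: str) -> Optional[Tuple[int, ...]]:
    """Return (num_keys, time_interval_ms, num_rows), or None if no match."""
    m = _NAME_RE.match(name)
    return tuple(int(g) for g in m.groups()) if m else None


def read_events(lines: Iterable[str]) -> List[Event]:
    """Parse event rows from an open file or any iterable of CSV lines."""
    return [
        Event(r["event_id"], r["key_id"], int(r["ts"]), float(r["value"]))
        for r in csv.DictReader(lines)
    ]
```

read_events accepts an open file handle, so `read_events(open(path, newline=""))` works for files on disk.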

Query stream data: queries_<NUM_KEYS>_<TIME_INTERVAL_MS>_<NUM_ROWS>.csv

query_id (unique id)
key_id (queried key)
ts

Optimal feature data: features_<NUM_KEYS>_<TIME_INTERVAL_MS>_<NUM_ROWS>.csv

key_id
ts (ranges from 0 to <TIME_INTERVAL_MS>)
feature (optimal pre-computed feature value at ts)
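One way to use the optimal feature data is to look up, for a given key and query time, the feature value in effect at that time. The sketch below assumes "in effect" means the row with the largest ts at or before the query time, and that feature is a single number; both are assumptions, and FeatureIndex is a hypothetical helper.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class FeatureIndex:
    """Per-key sorted index over (key_id, ts, feature) rows."""

    def __init__(self, rows: Iterable[Tuple[str, int, float]]) -> None:
        by_key: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
        for key_id, ts, feature in rows:
            by_key[key_id].append((ts, feature))
        self._ts: Dict[str, List[int]] = {}
        self._features: Dict[str, List[float]] = {}
        for key_id, pairs in by_key.items():
            pairs.sort(key=lambda p: p[0])  # order each key's rows by timestamp
            self._ts[key_id] = [p[0] for p in pairs]
            self._features[key_id] = [p[1] for p in pairs]

    def lookup(self, key_id: str, ts: int) -> Optional[float]:
        """Return the feature at the latest timestamp <= ts, or None."""
        times = self._ts.get(key_id)
        if not times:
            return None
        i = bisect.bisect_right(times, ts) - 1
        return self._features[key_id][i] if i >= 0 else None
```

Binary search keeps each lookup logarithmic in the number of rows per key, which matters when replaying a long query stream.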

Optimal prediction data: predictions_<NUM_KEYS>_<TIME_INTERVAL_MS>_<NUM_ROWS>.csv

query_id (corresponds to prediction query)
key_id
prediction (optimal prediction result)

Experiment Output

Experiments should output actual_features_<...>.csv and actual_predictions_<...>.csv files, which can be compared against the pre-generated optimal feature/prediction data to evaluate performance.
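A comparison of this kind could look like the sketch below, which joins actual and optimal predictions on query_id and reports the mean absolute error. The choice of metric, the assumption that the actual predictions file shares the query_id and prediction columns (with a header row), and the function names are all illustrative rather than taken from the repository.

```python
import csv
import io
from typing import Dict, Iterable


def read_predictions(lines: Iterable[str]) -> Dict[str, float]:
    """Map query_id to prediction from CSV lines that include a header row."""
    return {r["query_id"]: float(r["prediction"]) for r in csv.DictReader(lines)}


def mean_absolute_error(actual: Dict[str, float], optimal: Dict[str, float]) -> float:
    """Average |actual - optimal| over the query_ids present in both inputs."""
    shared = actual.keys() & optimal.keys()
    if not shared:
        raise ValueError("no query_id in common")
    return sum(abs(actual[q] - optimal[q]) for q in shared) / len(shared)
```

Queries that appear in only one of the two files are ignored here; depending on the experiment, it may be more appropriate to count them as errors.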

