QuantData

QuantData is a collection of Jupyter notebooks for automating quantitative factor research. It uses the OpenAI API to interpret factor descriptions (even from annotated images), generate Python implementations, validate them against minute-level market data, and evaluate the resulting factors before combining them into multi-factor portfolios.

Key Features

LLM-driven factor ideation: Capture factor definitions from prompts or screenshots, turn them into runnable code, and iterate automatically when errors occur.
Batch factor computation and testing: Generate factor values from minute data, merge with daily aggregates, neutralize by market value, and produce ranked deciles with backtests for each factor notebook.
Multi-factor ranking: Combine individual factor ranks into a single feather dataset ready for downstream portfolio construction.

Repository Layout

API_factor_process.ipynb – Loads API credentials from .env, instantiates the OpenAI client, and defines the four-step workflow (step1 – step4) that analyzes factor images, proposes code, retries on failure, and surfaces output paths.
factor_test/ – Folder of per-factor notebooks (e.g., factor_109.ipynb) that compute factor values, export monthly results, merge them with daily bars, neutralize, rank into deciles, and backtest both long-only and long-short implementations.
factor_combine_rank.ipynb – Merges multiple factor rank files with the daily price table to build a consolidated rank_combine.feather dataset for multi-factor selection.

Prerequisites

Install the Python stack used across the notebooks:

Workflow & helpers: openai, streamlit, python-dotenv, pathlib, time, base64, os
Data & analytics: pandas, numpy, statsmodels, matplotlib

Configuration

Create a .env file in the repository root with your API token, e.g.:
```
API_KEY=your_openai_like_service_key
```

The notebooks read this file and build the OpenAI client with the configured base_url and key.

Review and adjust the Windows-style data paths (e.g., F:\QuantData...) to match your local storage before running the notebooks.

Data Expectations

Minute-level inputs live under F:\QuantData\minute_data and are named like
YYYYMM_oneminute.feather for batch processing.
Generated factor outputs are saved to
F:\QuantData\factor_result (per factor) and
F:\QuantData\factor_result_allmonth<factor_name> (aggregated) for later merging.
Daily price data is expected at
F:\QuantData\AShareEODPrices_allmonth.feather
for alignment, neutralization, and ranking.
Quantile ranks for each factor are written to
F:\QuantData\factor_rank\rank_<id>.feather,
and combined ranks land in
F:\QuantData\factor_rank\rank_combine.feather.

Typical Workflow

Describe a factor:
Supply an explanatory image or prompt to API_factor_process.ipynb;
step1 and step2 analyze the factor and draft code, while step3 executes and auto-regenerates until it runs successfully.
Use step4 to surface the output file path.
Materialize factor values:
Execute the relevant notebook in factor_test/ to batch-process monthly minute files via generate_factor_files, producing per-factor feather outputs.
Integrate and evaluate:
Merge factor results with daily bars, neutralize by log market value, bucket into deciles, and backtest the spread between top and bottom groups.
Build multi-factor ranks:
After individual factors are ranked, run factor_combine_rank.ipynb to join them with the master price table and export the consolidated ranking dataset.

Tips

Keep an eye on API usage: the notebooks expose commented alternatives for different OpenAI-compatible endpoints if you need to switch providers.
Regeneration logic persists new code and error logs to disk (factor_pycode, factor_code_string), making it easier to inspect failed iterations and rerun only the necessary pieces.
The backtest cells plot cumulative net values for each decile plus a long-short series; rerun them after adjusting factors or rebalance periods to visualize performance shifts.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
factor_test		factor_test
._API_factor_process.ipynb		._API_factor_process.ipynb
._factor_combine_rank.ipynb		._factor_combine_rank.ipynb
._factor_test		._factor_test
._template_testback.ipynb		._template_testback.ipynb
API_factor_process.ipynb		API_factor_process.ipynb
README.md		README.md
factor_combine_rank.ipynb		factor_combine_rank.ipynb
template_testback.ipynb		template_testback.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

QuantData

Key Features

Repository Layout

Prerequisites

Configuration

Data Expectations

Typical Workflow

Tips

About

Uh oh!

Releases

Packages

Languages

CheeseLee888/QuantData

Folders and files

Latest commit

History

Repository files navigation

QuantData

Key Features

Repository Layout

Prerequisites

Configuration

Data Expectations

Typical Workflow

Tips

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages