
Options-Deep: Advanced Stock Analysis Platform

Python PostgreSQL SQLAlchemy License

A comprehensive Python-based stock analysis platform for multi-source data ingestion, normalization, and algorithmic analysis.

Features • Quick Start • Architecture • Documentation • Contributing


🎯 Overview

Options-Deep is a modular, extensible platform designed for sophisticated stock market analysis. It provides a unified interface for ingesting data from multiple financial data sources, performing advanced transformations, and running analytical algorithms on normalized datasets.

Key Capabilities

  • Multi-Source Data Ingestion: Seamlessly integrate data from NASDAQ, Yahoo Finance, Alpha Vantage, and more
  • Intelligent Data Normalization: Automatic data cleaning, validation, and standardization across sources
  • Advanced Analytics Engine: Built-in technical indicators, fundamental analysis, and machine learning algorithms
  • Enterprise-Grade Database: PostgreSQL with SQLAlchemy ORM for reliable data persistence
  • Extensible Architecture: Plugin-based design for easy addition of new data sources and algorithms

✨ Features

📈 Data Sources

  • NASDAQ Screener - CSV import with company fundamentals
  • Yahoo Finance - Real-time quotes and historical data
  • Polygon.io - Professional-grade market data
  • CSV/JSON Files - Custom data import capabilities
  • Extensible Framework - Add new sources with simple wrapper pattern

🔧 Data Processing

  • Automatic Data Validation - Ensures data quality and consistency
  • Symbol Normalization - Handles different ticker formats across exchanges
  • Timezone Management - UTC normalization for global markets (a sketch of both steps follows this list)
  • Duplicate Detection - Intelligent handling of overlapping data sources
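
As a concrete illustration of the normalization steps above, the helpers below are a minimal sketch; the function names are hypothetical, not the pipeline's actual API:

from datetime import datetime, timezone

# Hypothetical helpers; the pipeline's real normalization modules may differ.

def normalize_symbol(symbol: str) -> str:
    """Normalize ticker formats across exchanges (e.g. 'brk.b' -> 'BRK-B')."""
    return symbol.strip().upper().replace(".", "-")

def normalize_timestamp(ts: datetime) -> datetime:
    """Coerce a timestamp to UTC; naive timestamps are assumed to already be UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)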

๐Ÿ—๏ธ Database Architecture

  • Companies Table - Core business information and fundamentals
  • Tickers Table - Symbol mapping and metadata
  • Ticker History - Time-series price and volume data
  • Migration System - Alembic-powered database versioning
  • Multi-Database Support - Separate schemas for equities and algorithms

🧮 Analysis Capabilities

  • Technical Indicators - Moving averages, RSI, MACD, Bollinger Bands
  • Fundamental Analysis - P/E ratios, market cap analysis, sector comparisons
  • Machine Learning - Price prediction models and clustering analysis
  • Custom Algorithms - Framework for building proprietary analysis tools (see the sketch after this list)
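
The custom-algorithm framework boils down to a plugin interface; the sketch below shows one plausible shape (class and method names are illustrative, not the project's actual API):

from abc import ABC, abstractmethod
from typing import Any, Dict, List

class Algorithm(ABC):
    """Hypothetical plugin base class for custom analysis tools."""

    name: str

    @abstractmethod
    def run(self, price_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Compute a result from normalized price rows."""

class SMA20(Algorithm):
    """20-day simple moving average over daily closes."""

    name = "sma_20"

    def run(self, price_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        closes = [row["close"] for row in price_data[-20:]]
        return {"sma_20": sum(closes) / len(closes)}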

🚀 Quick Start

Prerequisites

  • Python 3.9+ with pip
  • PostgreSQL 12+
  • Git for version control
  • Virtual Environment (recommended: venv, conda, or pyenv)

Installation

  1. Clone the Repository

    git clone https://github.com/your-username/options-deep.git
    cd options-deep
  2. Set Up Python Environment

    # Create virtual environment
    python -m venv venv
    
    # Activate environment
    source venv/bin/activate  # Linux/Mac
    # or
    venv\Scripts\activate     # Windows
    
    # Install dependencies
    pip install -r requirements.txt
  3. Configure Environment

    # Copy environment template
    cp .env.example .env
    
    # Edit configuration (see Configuration section below)
    nano .env
  4. Set Up Database

    # Set database password environment variable
    export OPTIONS_DEEP_DATA_WAREHOUSE_PASSWORD=your_password
    
    # Run database migrations
    cd src/database/equities
    alembic upgrade head
  5. Verify Installation

    # Run example company ingestion
    python examples/company_pipeline_usage.py
    
    # Sync NASDAQ screener data
    python src/cmd/nasdaq_screener_sync/main.py

First Steps

  1. Load Sample Data

    from src.pipelines.companies.new_company_pipeline import CompanyPipeline
    from src.data_sources.nasdaq.screener import load_screener_file
    from src.config.configuration import CONFIG
    
    # Load companies from NASDAQ screener
    companies = load_screener_file("data/nasdaq_screener_companies.json")
    
    # Process through pipeline (constructor takes the equities DB config)
    pipeline = CompanyPipeline(CONFIG.get_equities_config())
    results = pipeline.process_companies(companies)
  2. Query Your Data

    from src.repos.equities.companies.company_repository import CompanyRepository
    from src.config.configuration import CONFIG
    
    # Get database configuration
    db_config = CONFIG.get_equities_config()
    
    # Initialize repository
    repo = CompanyRepository(db_config)
    
    # Find companies by sector
    tech_companies = repo.find_by_sector("Technology")
    print(f"Found {len(tech_companies)} technology companies")

๐Ÿ›๏ธ Architecture

System Design

┌─────────────────┐    ┌───────────────────┐    ┌──────────────────┐
│   Data Sources  │────│   Data Pipeline   │────│    Database      │
│                 │    │                   │    │                  │
│ • NASDAQ        │    │ • Validation      │    │ • Companies      │
│ • Yahoo Finance │    │ • Normalization   │    │ • Tickers        │
│ • Polygon       │    │ • Transformation  │    │ • Price History  │
│ • CSV/JSON      │    │ • Deduplication   │    │ • Analysis       │
└─────────────────┘    └───────────────────┘    └──────────────────┘
                                 │
                       ┌───────────────────┐
                       │  Analysis Engine  │
                       │                   │
                       │ • Technical       │
                       │ • Fundamental     │
                       │ • ML Models       │
                       │ • Custom Algos    │
                       └───────────────────┘

Project Structure

options-deep/
├── src/
│   ├── cmd/                     # Command-line applications
│   │   └── nasdaq_screener_sync/ # NASDAQ data synchronization
│   ├── config/                  # Configuration management
│   │   ├── models/              # Configuration data models
│   │   └── environment_configs/ # Environment-specific settings
│   ├── data_sources/            # Data ingestion layer
│   │   ├── base/                # Abstract base classes
│   │   ├── models/              # Data models (Company, Ticker)
│   │   ├── nasdaq/              # NASDAQ data provider
│   │   └── yahoo_finance/       # Yahoo Finance integration
│   ├── database/                # Database layer
│   │   ├── equities/            # Equities database schema
│   │   │   ├── tables/          # SQLAlchemy models
│   │   │   └── migrations/      # Alembic migrations
│   │   └── algorithms/          # Algorithms database schema
│   ├── repos/                   # Repository pattern implementations
│   │   └── equities/            # Equity data repositories
│   └── pipelines/               # Data processing pipelines
├── tests/                       # Comprehensive test suite
├── data/                        # Sample data files
├── examples/                    # Usage examples
└── scripts/                     # Utility scripts

Key Components

🔌 Data Source Abstraction

# All data sources implement a common interface
from abc import ABC, abstractmethod
from typing import List

from src.data_sources.models.company import Company  # module path assumed from the structure above

class CompanyDataSource(ABC):
    @abstractmethod
    def fetch_companies(self) -> List[Company]:
        """Fetch company data from the source."""

    @abstractmethod
    def validate_connection(self) -> bool:
        """Test data source connectivity."""
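
A new source only needs to implement this interface. As an illustration of the wrapper pattern, a hypothetical source backed by a local CSV export (the constructor argument and Company field mapping are assumptions):

import csv
from pathlib import Path
from typing import List

class CSVCompanySource(CompanyDataSource):
    """Illustrative wrapper; assumes CSV headers match Company's fields."""

    def __init__(self, file_path: str) -> None:
        self.file_path = Path(file_path)

    def fetch_companies(self) -> List[Company]:
        with self.file_path.open(newline="") as f:
            return [Company(**row) for row in csv.DictReader(f)]

    def validate_connection(self) -> bool:
        return self.file_path.exists()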

🗄️ Repository Pattern

# Clean separation of data access logic
from typing import List, Optional

class CompanyRepository:
    def create_company(self, company: Company) -> Company:
        """Create a new company record."""
        ...

    def find_by_symbol(self, symbol: str) -> Optional[Company]:
        """Find a company by ticker symbol."""
        ...

    def find_by_sector(self, sector: str) -> List[Company]:
        """Find all companies in a sector."""
        ...

⚙️ Configuration Management

# Environment-aware configuration system
from sqlalchemy import create_engine

from src.config.configuration import CONFIG

# Get database configuration for the current environment
db_config = CONFIG.get_equities_config()

# Automatic environment detection: local, dev, prod
engine = create_engine(db_config.connection_string())

📊 Database Schema

Core Tables

Companies

CREATE TABLE companies (
    id SERIAL PRIMARY KEY,
    company_name VARCHAR(500) NOT NULL,
    exchange VARCHAR(20) NOT NULL,
    sector VARCHAR(100),
    industry VARCHAR(200),
    country VARCHAR(100),
    market_cap BIGINT,
    description TEXT,
    active BOOLEAN DEFAULT TRUE,
    source VARCHAR(50) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- PostgreSQL has no inline INDEX column option; indexes are created separately
CREATE INDEX idx_companies_sector ON companies (sector);
CREATE INDEX idx_companies_industry ON companies (industry);
CREATE INDEX idx_companies_active ON companies (active);

Tickers

CREATE TABLE tickers (
    id SERIAL PRIMARY KEY,
    symbol VARCHAR(20) NOT NULL UNIQUE,  -- the UNIQUE constraint creates its own index
    company_id INTEGER REFERENCES companies(id),
    primary_ticker BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Ticker History

CREATE TABLE ticker_history (
    id SERIAL PRIMARY KEY,
    ticker_id INTEGER REFERENCES tickers(id),
    company_id INTEGER REFERENCES companies(id),
    price_date DATE NOT NULL,
    open_price DECIMAL(10,2),
    high_price DECIMAL(10,2),
    low_price DECIMAL(10,2),
    close_price DECIMAL(10,2),
    volume BIGINT,
    source VARCHAR(50),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(ticker_id, price_date)
);
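
The SQLAlchemy models under src/database/equities/tables presumably mirror these definitions. As a rough sketch (not the repository's actual model code), the tickers table maps to something like:

from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Ticker(Base):
    """Illustrative ORM mapping for the tickers table above."""

    __tablename__ = "tickers"

    id = Column(Integer, primary_key=True)
    symbol = Column(String(20), nullable=False, unique=True)
    company_id = Column(Integer, ForeignKey("companies.id"))
    primary_ticker = Column(Boolean, default=False)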

⚡ Performance & Scaling

Database Optimizations

  • Strategic Indexing - Optimized indexes on frequently queried columns
  • Connection Pooling - SQLAlchemy connection pool management (configuration sketch after this list)
  • Query Optimization - Efficient joins and bulk operations
  • Partitioning Ready - Architecture supports table partitioning for large datasets
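
Connection pooling is standard SQLAlchemy engine configuration; a typical setup looks like the sketch below, reusing db_config from the configuration example above (the pool sizes are illustrative, not the project's tuned values):

from sqlalchemy import create_engine

engine = create_engine(
    db_config.connection_string(),
    pool_size=10,        # persistent connections kept open
    max_overflow=20,     # extra connections allowed under burst load
    pool_pre_ping=True,  # validate connections before handing them out
)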

Data Processing

  • Batch Operations - Bulk inserts for high-throughput data loading (see the sketch after this list)
  • Async Support - Asynchronous data fetching capabilities
  • Memory Efficient - Streaming processing for large datasets
  • Caching Layer - Built-in caching for frequently accessed data
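
For batch operations, SQLAlchemy can send one executemany-style INSERT instead of a round trip per row. A minimal sketch, reusing the engine from the pooling example (the reflected table and row values are illustrative):

from sqlalchemy import MetaData, Table, insert

metadata = MetaData()
companies = Table("companies", metadata, autoload_with=engine)  # reflect the existing table

rows = [
    {"company_name": "Acme Corp", "exchange": "NASDAQ", "source": "csv"},
    {"company_name": "Globex", "exchange": "NYSE", "source": "csv"},
]

with engine.begin() as conn:
    conn.execute(insert(companies), rows)  # single batched INSERT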

🔧 Configuration

Environment Variables

Create a .env file in the project root:

# Environment (local, dev, prod)
ENVIRONMENT=local

# Database Password
OPTIONS_DEEP_DATA_WAREHOUSE_PASSWORD=your_secure_password

# API Keys (optional)
YAHOO_FINANCE_API_KEY=your_yahoo_key
POLYGON_API_KEY=your_polygon_key
ALPHA_VANTAGE_API_KEY=your_alpha_vantage_key

# Logging
LOG_LEVEL=INFO

Database Configuration

The system supports multiple environments with different database configurations:

// src/config/environment_configs/local.json
{
    "databases": {
        "equities": {
            "host": "localhost",
            "port": 5432,
            "database": "options_deep_equities_local",
            "username": "e-user"
        },
        "algorithm": {
            "host": "localhost", 
            "port": 5432,
            "database": "options_deep_algorithm_local",
            "username": "e-user"
        }
    }
}

🧪 Testing

Test Suite Structure

tests/
├── unit/                    # Fast, isolated unit tests
│   ├── test_data_sources/   # Data source testing
│   ├── test_models/         # Model validation tests
│   └── test_repositories/   # Repository logic tests
├── integration/             # Database integration tests
│   ├── test_pipelines/      # End-to-end pipeline tests
│   └── test_database/       # Database operation tests
└── fixtures/                # Test data and mocks
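
A unit test in this layout typically exercises one parser or repository in isolation. For example, a hedged sketch for the screener loader (the CSV header format here is an assumption, not the actual NASDAQ export schema):

def test_load_screener_file_parses_rows(tmp_path):
    from src.data_sources.nasdaq.screener import load_screener_file

    # Hypothetical minimal fixture; real screener exports have more columns.
    csv_file = tmp_path / "screener.csv"
    csv_file.write_text("Symbol,Name\nAAPL,Apple Inc.\n")

    companies = load_screener_file(str(csv_file))

    assert len(companies) == 1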

Running Tests

# Install test dependencies
pip install -r requirements-dev.txt

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test category
pytest tests/unit/           # Unit tests only
pytest tests/integration/    # Integration tests only

# Run with database (requires test database setup)
pytest tests/integration/ --db

Test Database Setup

# Create test database
createdb options_deep_test

# Set test environment
export ENVIRONMENT=test
export OPTIONS_DEEP_DATA_WAREHOUSE_PASSWORD=test_password

# Run migrations
cd src/database/equities
alembic upgrade head

📈 Usage Examples

1. Company Data Ingestion

from src.data_sources.nasdaq.screener import load_screener_file
from src.pipelines.companies.new_company_pipeline import CompanyPipeline
from src.config.configuration import CONFIG

# Load NASDAQ screener data
companies = load_screener_file("data/nasdaq_screener_8_3_2025.csv")

# Process through pipeline
db_config = CONFIG.get_equities_config() 
pipeline = CompanyPipeline(db_config)

# Batch process companies
results = pipeline.process_companies(companies)
print(f"Processed {len(results.successful)} companies successfully")

2. Querying Company Data

from src.repos.equities.companies.company_repository import CompanyRepository
from src.config.configuration import CONFIG

# Initialize repository
repo = CompanyRepository(CONFIG.get_equities_config())

# Find companies by criteria
apple = repo.find_by_symbol("AAPL")
tech_companies = repo.find_by_sector("Technology")
large_caps = repo.find_by_market_cap_range(10_000_000_000, None)

# Get company statistics
stats = repo.get_sector_statistics()
print(f"Technology sector has {stats['Technology']['count']} companies")

3. Multi-Source Data Integration

from src.data_sources.factory import DataSourceFactory

# Create different data source providers
nasdaq_source = DataSourceFactory.create("nasdaq_screener", 
                                         file_path="data/screener.csv")
yahoo_source = DataSourceFactory.create("yahoo_finance",
                                       api_key="your_key")

# Fetch and merge data
nasdaq_data = nasdaq_source.fetch_companies()
yahoo_data = yahoo_source.fetch_historical_prices(["AAPL", "GOOGL"])

# Process through unified pipeline
pipeline.merge_and_process([nasdaq_data, yahoo_data])

4. Custom Analysis Pipeline

from datetime import datetime

from src.algorithms.technical.moving_averages import SMACalculator
from src.algorithms.fundamental.valuation import PERatioAnalyzer

# Create analysis pipeline
sma_calc = SMACalculator(period=20)
pe_analyzer = PERatioAnalyzer()

# Run analysis on company data
for company in tech_companies:
    # Technical analysis
    price_data = repo.get_price_history(company.ticker.symbol)
    sma_result = sma_calc.calculate(price_data)
    
    # Fundamental analysis
    pe_result = pe_analyzer.analyze(company)
    
    # Store results (results_repo: an analysis-results repository initialized elsewhere)
    results_repo.save_analysis_result({
        'company_id': company.id,
        'sma_20': sma_result.current_value,
        'pe_ratio': pe_result.pe_ratio,
        'analysis_date': datetime.now()
    })

🛠️ Development

Setting Up Development Environment

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run code formatting
black src/ tests/
isort src/ tests/

# Run type checking
mypy src/

# Run linting
flake8 src/ tests/

Code Standards

  • Type Annotations: Mandatory for all functions and variables
  • Docstrings: Google-style docstrings for all public functions (see the example after this list)
  • Error Handling: Comprehensive exception handling with custom exception types
  • Testing: Minimum 90% code coverage required
  • Formatting: Black code formatter with 88-character line limit
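
For reference, a function that satisfies these standards (RepositoryError is a hypothetical example of the custom exception types mentioned above):

from typing import Optional

def find_by_symbol(symbol: str) -> Optional["Company"]:
    """Find a company by ticker symbol.

    Args:
        symbol: Ticker symbol, e.g. "AAPL".

    Returns:
        The matching Company, or None if no record exists.

    Raises:
        RepositoryError: If the database is unreachable.
    """
    ...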

Contributing Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Implement your changes with tests
  4. Run the full test suite (pytest)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Database Migrations

# Create new migration
cd src/database/equities
alembic revision --autogenerate -m "Description of changes"

# Review generated migration file
# Edit migration file if needed

# Apply migration
alembic upgrade head

# Rollback migration (if needed)
alembic downgrade -1
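
An autogenerated migration is a plain Python file under migrations/versions/. A trimmed example of what one might contain (the revision IDs and column change are illustrative):

"""add description column to companies

Revision ID: abc123
Revises: def456
"""
from alembic import op
import sqlalchemy as sa

revision = "abc123"
down_revision = "def456"

def upgrade() -> None:
    op.add_column("companies", sa.Column("description", sa.Text(), nullable=True))

def downgrade() -> None:
    op.drop_column("companies", "description")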

📚 Documentation

Additional Resources

API Documentation

Generate API documentation:

# Install documentation dependencies
pip install sphinx sphinx-autodoc-typehints

# Generate docs
cd docs/
make html

# View documentation
open _build/html/index.html

🚨 Troubleshooting

Common Issues

Database Connection Problems

# Check PostgreSQL is running
pg_ctl status

# Verify database exists
psql -l | grep options_deep

# Test connection
psql -h localhost -U e-user -d options_deep_equities_local

Environment Configuration Issues

# Verify environment variables
echo $ENVIRONMENT
echo $OPTIONS_DEEP_DATA_WAREHOUSE_PASSWORD

# Check configuration file exists
ls src/config/environment_configs/$ENVIRONMENT.json

# Validate JSON configuration
python -c "import json; print(json.load(open('src/config/environment_configs/local.json')))"

Data Import Problems

# Check data file format
head -5 data/nasdaq_screener_companies.json

# Validate data structure
python -c "
from src.data_sources.nasdaq.screener import load_screener_file
companies = load_screener_file('your_file.csv')
print(f'Loaded {len(companies)} companies')
"

🔮 Roadmap

Phase 1: Core Infrastructure ✅

  • Multi-source data ingestion framework
  • Database schema and migrations
  • Configuration management system
  • Repository pattern implementation

Phase 2: Advanced Analytics 🚧

  • Technical indicator library
  • Fundamental analysis algorithms
  • Machine learning price prediction models
  • Real-time data streaming support

Phase 3: API & Web Interface 📋

  • REST API development
  • Web dashboard for data visualization
  • Authentication and user management
  • Portfolio tracking capabilities

Phase 4: Performance & Scale 📋

  • Data pipeline optimization
  • Caching layer implementation
  • Horizontal scaling support
  • Advanced monitoring and alerting

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your input is valuable.

Ways to Contribute

  • ๐Ÿ› Bug Reports - Found an issue? Open a detailed bug report
  • ๐Ÿ’ก Feature Requests - Have an idea? We'd love to hear it
  • ๐Ÿ“ Documentation - Help improve our docs and examples
  • ๐Ÿ”ง Code Contributions - Submit pull requests for fixes and features
  • ๐Ÿงช Testing - Help expand our test coverage
  • ๐Ÿ“Š Data Sources - Add support for new financial data providers

Development Setup

See our Developer Guide for detailed setup instructions and coding standards.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💬 Support

  • Documentation: Start with this README and Developer Guide
  • Issues: Open an issue on GitHub for bugs and feature requests
  • Discussions: Use GitHub Discussions for questions and general discussion
  • Email: Contact the maintainers at [your-email@domain.com]

Built with ❤️ for the financial analysis community

⭐ Star this repo if you find it helpful!
