cbwinslow/opendiscourse

OpenDiscourse

OpenDiscourse is a platform for ingesting, processing, and analyzing media intelligence and government documents. It pairs a document-processing pipeline with semantic search and retrieval-augmented generation (RAG), exposed through a versioned REST API.

Note: The repository structure was recently reorganized for better maintainability. See REORGANIZATION_SUMMARY.md for details about the new structure and migration guide.

Key Features

  • 🔍 Semantic Search: Vector-based similarity search with natural language query processing
  • 📄 Document Processing: Multi-format document ingestion (PDF, DOC, TXT, etc.)
  • 🏛️ Government Data: Automated GovInfo API integration and legislative document processing
  • 🤖 RAG Capabilities: Question-answering over document corpus with contextual response generation
  • 🔗 Entity Extraction: Named entity recognition and relationship mapping
  • 📊 Analytics: Document analytics and usage insights
  • 🚀 Scalable: Kubernetes-ready deployment with horizontal scaling
  • 🔒 Secure: Enterprise-grade security with comprehensive input validation
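
To make the "vector-based similarity search" feature concrete, here is a minimal, dependency-free sketch of ranking documents by cosine similarity. It illustrates the general technique only and is not taken from the OpenDiscourse codebase: the function names, document IDs, and three-dimensional toy vectors are all hypothetical. In a real deployment the ranking would presumably run inside PostgreSQL via the pgvector extension listed under Prerequisites.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical); real ones come from an embedding model.
corpus = {
    "bill-text": [0.9, 0.1, 0.0],
    "press-release": [0.1, 0.9, 0.0],
    "hearing-transcript": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print([doc_id for doc_id, _ in results])
```

A brute-force scan like this is linear in the corpus size; it is useful for reasoning about results, not for production-scale search.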

Quick Start

Prerequisites

  • Python 3.13+
  • PostgreSQL 14+ with pgvector extension
  • Docker (optional, for containerized deployment)
  • Node.js 18+ (for frontend development)

Installation

  1. Clone the repository:

    git clone <repo_url>
    cd opendiscourse
  2. Set up Python environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements-dev.txt
  3. Configure environment:

    cp config/environments/.env.example config/environments/.env
    # Edit .env with your database credentials and API keys
  4. Initialize database:

    python scripts/setup/init_db.py
  5. Run the application:

    python -m opendiscourse

Project Structure

opendiscourse/
├── config/                   # Configuration files
│   └── environments/         # Environment-specific settings
│       ├── development.toml  # Development settings
│       ├── production.toml   # Production settings
│       └── testing.toml      # Testing settings
├── data/                     # Data files (not version controlled)
│   ├── raw/                  # Raw data
│   └── processed/            # Processed data
├── docs/                     # Documentation
│   ├── api/                  # API documentation
│   └── guides/               # Development guides
├── infrastructure/           # Infrastructure as code
│   ├── database/             # Database configurations
│   ├── middleware/           # Middleware configurations
│   └── networking/           # Network configurations
├── opendiscourse/            # Main Python package
│   ├── api/                  # API endpoints
│   │   ├── v1/               # API version 1
│   │   └── v2/               # API version 2
│   ├── core/                 # Core functionality
│   ├── db/                   # Database models and migrations
│   ├── services/             # Business logic services
│   │   ├── scraping/         # Web scraping services
│   │   ├── search/           # Search functionality
│   │   └── storage/          # Data storage services
│   └── utils/                # Utility functions
├── scripts/                  # Utility scripts
│   ├── checks/               # System health checks
│   ├── database/             # Database maintenance
│   ├── deployment/           # Deployment scripts
│   └── setup/                # Setup and installation
└── tests/                    # Test suite
    ├── integration/          # Integration tests
    └── unit/                 # Unit tests
        ├── data/             # Test data
        └── mocks/            # Test mocks

Development Setup

  1. Clone the repository

  2. Install dependencies:

    pip install -r requirements-dev.txt
  3. Set up environment variables:

    cp config/environments/.env.example config/environments/.env
    # Edit the .env file with your configuration
  4. Run the development server:

    python -m opendiscourse

API Documentation

Core Endpoints

Document Management

  • POST /api/v1/documents/ - Upload and process documents
  • GET /api/v1/documents/{id} - Retrieve document by ID
  • GET /api/v1/documents/ - List documents with filtering
  • DELETE /api/v1/documents/{id} - Delete document

Search and RAG

  • POST /api/v1/search/semantic - Semantic search across documents
  • POST /api/v1/search/vector - Vector similarity search
  • POST /api/v1/rag/query - RAG-based question answering
  • GET /api/v1/rag/history - Query history

Government Data

  • POST /api/v1/govdata/ingest - Ingest government documents
  • GET /api/v1/govdata/sources - List available data sources
  • POST /api/v1/govdata/scrape - Trigger data scraping

API Authentication

All API endpoints require authentication. Send your API key as a bearer token in the Authorization header:

curl -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     https://api.opendiscourse.com/v1/documents/

For detailed API documentation, visit /docs when running the server.
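
The same authenticated call can be prepared from Python using only the standard library. This is a sketch under stated assumptions: `BASE_URL` points at a local development server (the port matches the sample configuration, the host is a guess), `build_request` is a hypothetical helper, and the `query` field in the payload is illustrative since the README does not document request bodies. Check /docs on a running server for the actual schema.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local dev server; adjust to wherever the API is actually hosted.
BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical payload shape for the semantic search endpoint.
req = build_request(
    "POST",
    "/api/v1/search/semantic",
    "YOUR_API_KEY",
    {"query": "appropriations bills"},
)
print(req.get_method(), req.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

Building the request separately from sending it keeps the header logic testable without a live server.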

Deployment

Development Deployment

Use the provided Docker setup for quick development deployment:

# Using the development script
./run-dev.sh

# Or manually with Docker Compose
docker-compose up -d

The application will be available at:

Production Deployment

For production deployment with Kubernetes:

# Deploy to Kubernetes cluster
./deploy/deploy.sh

# Or deploy manually
kubectl apply -f k8s/

See README-DEPLOYMENT.md for comprehensive production deployment guides.

Environment Configuration

Configure the application using environment-specific TOML files:

# config/environments/production.toml
[database]
url = "postgresql://user:pass@host:5432/opendiscourse"
pool_size = 20
echo = false

[api]
host = "0.0.0.0"
port = 8000
workers = 4

[security]
secret_key = "your-secret-key"
api_key_header = "X-API-Key"

[features]
rag_enabled = true
vector_search = true
government_data = true

Testing

Run the complete test suite:

# Run all tests
pytest tests/

# Run with coverage
pytest --cov=opendiscourse tests/

# Run specific test categories
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only

# Run performance tests
pytest tests/performance/

Test Configuration

Configure the testing environment:

# Set test database URL
export TEST_DATABASE_URL="postgresql://test:test@localhost:5432/opendiscourse_test"

# Run tests with custom settings
pytest --env=testing tests/
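
One way test code might pick up `TEST_DATABASE_URL` is sketched below. The helper name, the fallback URL (copied from the export above), and the safety check that the database name ends in `_test` are all assumptions for illustration, not behavior confirmed by the project.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Fallback mirrors the export shown above.
DEFAULT_TEST_URL = "postgresql://test:test@localhost:5432/opendiscourse_test"


def resolve_test_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the test database URL, refusing anything that is not a *_test database."""
    env = os.environ if env is None else env
    url = env.get("TEST_DATABASE_URL", DEFAULT_TEST_URL)
    db_name = urlsplit(url).path.lstrip("/")
    # Hypothetical guard: avoid pointing the test suite at a real database by accident.
    if not db_name.endswith("_test"):
        raise ValueError(f"refusing to run tests against database {db_name!r}")
    return url


print(resolve_test_database_url({}))
```

Accepting the environment as a parameter makes the helper easy to exercise without mutating `os.environ`.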

Development Workflow

Code Quality Standards

Before submitting changes, ensure code quality:

# Format code
black opendiscourse/ tests/

# Type checking
mypy opendiscourse/

# Linting
flake8 opendiscourse/ tests/

# Import sorting
isort opendiscourse/ tests/

# Run all quality checks
./scripts/quality-check.sh

Pre-commit Hooks

Install pre-commit hooks for automatic code quality checks:

pre-commit install

Contributing

We welcome contributions! Please follow these guidelines:

Getting Started

  1. Fork the repository
  2. Create a feature branch:
    git checkout -b feature/your-feature-name
  3. Set up development environment:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements-dev.txt

Development Process

  1. Make your changes following the coding standards
  2. Add tests for new functionality
  3. Update documentation as needed
  4. Run the test suite:
    pytest tests/
  5. Check code quality:
    ./scripts/quality-check.sh

Submitting Changes

  1. Commit your changes:
    git add .
    git commit -m "feat: add new feature description"
  2. Push to your fork:
    git push origin feature/your-feature-name
  3. Create a Pull Request with:
    • Clear description of changes
    • Reference to related issues
    • Screenshots if applicable
    • Test results

Contribution Guidelines

  • Code Style: Follow PEP 8 and use Black formatter
  • Type Hints: All Python code must include type hints
  • Documentation: Update relevant documentation
  • Tests: Maintain or improve test coverage (minimum 80%)
  • Commit Messages: Use conventional commit format
  • Issue Tracking: Link PRs to relevant issues
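
For the commit message guideline, a small check like the following can catch malformed subjects before pushing. It is a sketch, not project tooling: the allowed type list follows the common Angular-style convention (the Conventional Commits specification itself only defines `feat` and `fix`), and `is_conventional` is a hypothetical helper.

```python
import re

# Assumption: commonly used commit types; the project may allow a different set.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor",
                "perf", "test", "build", "ci", "chore", "revert")

# type, optional "(scope)", optional "!" for breaking changes, then ": description"
SUBJECT_PATTERN = re.compile(
    r"^(?:" + "|".join(COMMIT_TYPES) + r")(?:\([\w./-]+\))?!?: \S.*$"
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the conventional format."""
    return SUBJECT_PATTERN.match(subject) is not None


for subject in ("feat: add new feature description", "Added some stuff"):
    print(subject, "->", is_conventional(subject))
```

A check of this kind could be wired into a commit-msg hook alongside the pre-commit hooks mentioned earlier.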

Types of Contributions

  • 🐛 Bug Fixes: Fix existing functionality
  • ✨ Features: Add new functionality
  • 📚 Documentation: Improve documentation
  • 🔧 Maintenance: Code refactoring, dependency updates
  • 🧪 Testing: Improve test coverage
  • 🚀 Performance: Optimize existing functionality

Support and Community

  • Issues: Report bugs and request features via GitHub Issues
  • Discussions: Join community discussions in GitHub Discussions
  • Documentation: Comprehensive docs available in the /docs directory
  • Examples: Check the /examples directory for usage examples

Roadmap

See PROJECT_PLAN.md for the detailed development roadmap and feature planning.

Upcoming Features

  • v1.1.0: Enhanced document processing workflows
  • v1.2.0: NVIDIA NIM integration for production RAG
  • v1.3.0: Enhanced React frontend
  • v2.0.0: Microservices architecture and production scaling

Security

For security concerns, please email security@opendiscourse.com instead of opening a public issue.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


Maintained by: Development Team
Last Updated: June 26, 2025
Version: 1.0.1
