OpenDiscourse is a comprehensive platform for analyzing and processing both media intelligence and government documents. The platform combines advanced data processing capabilities with modern web technologies to deliver intelligent document analysis and retrieval-augmented generation (RAG) capabilities.
Note: The repository structure was recently reorganized for better maintainability. See REORGANIZATION_SUMMARY.md for details about the new structure and migration guide.
- Semantic Search: Vector-based similarity search with natural language query processing
- Document Processing: Multi-format document ingestion (PDF, DOC, TXT, etc.)
- Government Data: Automated GovInfo API integration and legislative document processing
- RAG Capabilities: Question answering over the document corpus with contextual response generation
- Entity Extraction: Named entity recognition and relationship mapping
- Analytics: Document analytics and usage insights
- Scalable: Kubernetes-ready deployment with horizontal scaling
- Secure: Enterprise-grade security with comprehensive input validation
- Python 3.13+
- PostgreSQL 14+ with pgvector extension
- Docker (optional, for containerized deployment)
- Node.js 18+ (for frontend development)
- Clone the repository:

  ```bash
  git clone <repo_url>
  cd opendiscourse
  ```

- Set up the Python environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r requirements-dev.txt
  ```

- Configure the environment:

  ```bash
  cp config/environments/.env.example config/environments/.env
  # Edit .env with your database credentials and API keys
  ```

- Initialize the database:

  ```bash
  python scripts/setup/init_db.py
  ```

- Run the application:

  ```bash
  python -m opendiscourse
  ```
```text
opendiscourse/
├── config/                  # Configuration files
│   └── environments/        # Environment-specific settings
│       ├── development.toml # Development settings
│       ├── production.toml  # Production settings
│       └── testing.toml     # Testing settings
├── data/                    # Data files (not version controlled)
│   ├── raw/                 # Raw data
│   └── processed/           # Processed data
├── docs/                    # Documentation
│   ├── api/                 # API documentation
│   └── guides/              # Development guides
├── infrastructure/          # Infrastructure as code
│   ├── database/            # Database configurations
│   ├── middleware/          # Middleware configurations
│   └── networking/          # Network configurations
├── opendiscourse/           # Main Python package
│   ├── api/                 # API endpoints
│   │   ├── v1/              # API version 1
│   │   └── v2/              # API version 2
│   ├── core/                # Core functionality
│   ├── db/                  # Database models and migrations
│   ├── services/            # Business logic services
│   │   ├── scraping/        # Web scraping services
│   │   ├── search/          # Search functionality
│   │   └── storage/         # Data storage services
│   └── utils/               # Utility functions
├── scripts/                 # Utility scripts
│   ├── checks/              # System health checks
│   ├── database/            # Database maintenance
│   ├── deployment/          # Deployment scripts
│   └── setup/               # Setup and installation
└── tests/                   # Test suite
    ├── integration/         # Integration tests
    ├── unit/                # Unit tests
    ├── data/                # Test data
    └── mocks/               # Test mocks
```
- Clone the repository

- Install dependencies:

  ```bash
  pip install -r requirements-dev.txt
  ```

- Set up environment variables:

  ```bash
  cp config/environments/.env.example config/environments/.env
  # Edit the .env file with your configuration
  ```

- Run the development server:

  ```bash
  python -m opendiscourse
  ```
- `POST /api/v1/documents/` - Upload and process documents
- `GET /api/v1/documents/{id}` - Retrieve document by ID
- `GET /api/v1/documents/` - List documents with filtering
- `DELETE /api/v1/documents/{id}` - Delete document
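For example, a document upload and lookup might look like this from Python (a minimal sketch using the `requests` library against the local development URL; the multipart field name `file` and the `id` field in the response are assumptions, not part of the documented contract):

```python
import requests

API_URL = "http://localhost:3000/api/v1"          # local development URL (see Docker setup below)
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Upload a PDF for processing (the multipart field name "file" is an assumption).
with open("report.pdf", "rb") as fh:
    resp = requests.post(
        f"{API_URL}/documents/",
        headers=HEADERS,
        files={"file": ("report.pdf", fh, "application/pdf")},
    )
resp.raise_for_status()
document = resp.json()

# Retrieve the same document by ID (assumes the response contains an "id" field).
doc_id = document["id"]
print(requests.get(f"{API_URL}/documents/{doc_id}", headers=HEADERS).json())
```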
- `POST /api/v1/search/semantic` - Semantic search across documents
- `POST /api/v1/search/vector` - Vector similarity search
- `POST /api/v1/rag/query` - RAG-based question answering
- `GET /api/v1/rag/history` - Query history
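A sketch of how the search and RAG endpoints might be called; the JSON field names (`query`, `top_k`, `question`, `results`, `answer`) are illustrative assumptions rather than the documented schema:

```python
import requests

API_URL = "http://localhost:3000/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Semantic search: a natural-language query against the document corpus.
search = requests.post(
    f"{API_URL}/search/semantic",
    headers=HEADERS,
    json={"query": "appropriations for renewable energy", "top_k": 5},
)
search.raise_for_status()
for hit in search.json().get("results", []):
    print(hit)

# RAG query: question answering with contextual response generation.
rag = requests.post(
    f"{API_URL}/rag/query",
    headers=HEADERS,
    json={"question": "What did the committee conclude about grid funding?"},
)
rag.raise_for_status()
print(rag.json().get("answer"))
```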
- `POST /api/v1/govdata/ingest` - Ingest government documents
- `GET /api/v1/govdata/sources` - List available data sources
- `POST /api/v1/govdata/scrape` - Trigger data scraping
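Government-data ingestion could be triggered the same way (again a sketch; the `source`/`collection` payload fields and the `govinfo`/`BILLS` values are assumptions used only for illustration):

```python
import requests

API_URL = "http://localhost:3000/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# List the government data sources the platform knows about.
sources = requests.get(f"{API_URL}/govdata/sources", headers=HEADERS)
sources.raise_for_status()
print(sources.json())

# Trigger ingestion for one source (payload shape is an assumption).
ingest = requests.post(
    f"{API_URL}/govdata/ingest",
    headers=HEADERS,
    json={"source": "govinfo", "collection": "BILLS"},
)
ingest.raise_for_status()
print(ingest.json())
```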
All API endpoints require authentication. Include your API key in the header:
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     https://api.opendiscourse.com/v1/documents/
```

For detailed API documentation, visit /docs when running the server.
Use the provided Docker setup for quick development deployment:
```bash
# Using the development script
./run-dev.sh

# Or manually with Docker Compose
docker-compose up -d
```

The application will be available at:

- Web Interface: http://localhost:3000
- API: http://localhost:3000/api
- Documentation: http://localhost:3000/docs
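A quick smoke test (a sketch only) to confirm the locally mapped endpoints above respond:

```python
import requests

# Confirm the local development stack started by docker-compose is reachable.
for name, url in {
    "web interface": "http://localhost:3000",
    "api docs": "http://localhost:3000/docs",
}.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")
```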
For production deployment with Kubernetes:
```bash
# Deploy to Kubernetes cluster
./deploy/deploy.sh

# Or deploy manually
kubectl apply -f k8s/
```

See README-DEPLOYMENT.md for comprehensive production deployment guides.
Configure the application using environment-specific TOML files:
```toml
# config/environments/production.toml
[database]
url = "postgresql://user:pass@host:5432/opendiscourse"
pool_size = 20
echo = false

[api]
host = "0.0.0.0"
port = 8000
workers = 4

[security]
secret_key = "your-secret-key"
api_key_header = "X-API-Key"

[features]
rag_enabled = true
vector_search = true
government_data = true
```
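As a sketch of how such a file might be consumed, Python 3.11+ ships `tomllib` in the standard library; the loader below is illustrative and not the project's actual configuration module:

```python
import tomllib
from pathlib import Path

def load_config(environment: str = "production") -> dict:
    """Load an environment-specific TOML file like the example above."""
    path = Path("config/environments") / f"{environment}.toml"
    with path.open("rb") as fh:  # tomllib requires a binary file handle
        return tomllib.load(fh)

config = load_config("development")
print(config["api"]["port"], config["features"]["rag_enabled"])
```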
Run the complete test suite:

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest --cov=opendiscourse tests/

# Run specific test categories
pytest tests/unit/        # Unit tests only
pytest tests/integration/ # Integration tests only

# Run performance tests
pytest tests/performance/
```

Configure the testing environment:
```bash
# Set test database URL
export TEST_DATABASE_URL="postgresql://test:test@localhost:5432/opendiscourse_test"

# Run tests with custom settings
pytest --env=testing tests/
```
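One way a test suite might pick up that variable is a session-scoped fixture; the `conftest.py` below is a hypothetical sketch, not necessarily how the project wires it up:

```python
# tests/conftest.py (illustrative sketch)
import os
import pytest

@pytest.fixture(scope="session")
def test_database_url() -> str:
    """Expose TEST_DATABASE_URL to tests, with a local default."""
    return os.environ.get(
        "TEST_DATABASE_URL",
        "postgresql://test:test@localhost:5432/opendiscourse_test",
    )
```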
Before submitting changes, ensure code quality:

```bash
# Format code
black opendiscourse/ tests/

# Type checking
mypy opendiscourse/

# Linting
flake8 opendiscourse/ tests/

# Import sorting
isort opendiscourse/ tests/

# Run all quality checks
./scripts/quality-check.sh
```

Install pre-commit hooks for automatic code quality checks:

```bash
pre-commit install
```

We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch:

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Set up the development environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements-dev.txt
  ```

- Make your changes following the coding standards
- Add tests for new functionality
- Update documentation as needed
- Run the test suite:

  ```bash
  pytest tests/
  ```

- Check code quality:

  ```bash
  ./scripts/quality-check.sh
  ```

- Commit your changes:

  ```bash
  git add .
  git commit -m "feat: add new feature description"
  ```

- Push to your fork:

  ```bash
  git push origin feature/your-feature-name
  ```

- Create a Pull Request with:
  - Clear description of changes
  - Reference to related issues
  - Screenshots if applicable
  - Test results
- Code Style: Follow PEP 8 and use Black formatter
- Type Hints: All Python code must include type hints
- Documentation: Update relevant documentation
- Tests: Maintain or improve test coverage (minimum 80%)
- Commit Messages: Use conventional commit format
- Issue Tracking: Link PRs to relevant issues
- Bug Fixes: Fix existing functionality
- Features: Add new functionality
- Documentation: Improve documentation
- Maintenance: Code refactoring, dependency updates
- Testing: Improve test coverage
- Performance: Optimize existing functionality
- Issues: Report bugs and request features via GitHub Issues
- Discussions: Join community discussions in GitHub Discussions
- Documentation: Comprehensive docs available in the /docs directory
- Examples: Check the /examples directory for usage examples
See PROJECT_PLAN.md for detailed development roadmap and feature planning.
- v1.1.0: Enhanced document processing workflows
- v1.2.0: NVIDIA NIM integration for production RAG
- v1.3.0: Enhanced React frontend
- v2.0.0: Microservices architecture and production scaling
For security concerns, please email security@opendiscourse.com instead of opening a public issue.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI for the API framework
- PostgreSQL with pgvector for vector search
- Sentence Transformers for embeddings generation
- React for the frontend interface
Maintained by: Development Team
Last Updated: June 26, 2025
Version: 1.0.1