OpenDiscourse is a comprehensive platform for analyzing and processing both media intelligence and government documents. The platform combines advanced data processing capabilities with modern web technologies to deliver intelligent document analysis and retrieval-augmented generation (RAG) capabilities.
Note: The repository structure was recently reorganized for better maintainability. See REORGANIZATION_SUMMARY.md for details about the new structure and migration guide.
- Semantic Search: Vector-based similarity search with natural language query processing
- Document Processing: Multi-format document ingestion (PDF, DOC, TXT, etc.)
- Government Data: Automated GovInfo API integration and legislative document processing
- RAG Capabilities: Question answering over the document corpus with contextual response generation
- Entity Extraction: Named entity recognition and relationship mapping
- Analytics: Document analytics and usage insights
- Scalable: Kubernetes-ready deployment with horizontal scaling
- Secure: Enterprise-grade security with comprehensive input validation
- Python 3.13+
- PostgreSQL 14+ with pgvector extension
- Docker (optional, for containerized deployment)
- Node.js 18+ (for frontend development)
- Clone the repository:

  ```bash
  git clone <repo_url>
  cd opendiscourse
  ```

- Set up the Python environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r requirements-dev.txt
  ```

- Configure the environment:

  ```bash
  cp config/environments/.env.example config/environments/.env
  # Edit .env with your database credentials and API keys
  ```

- Initialize the database:

  ```bash
  python scripts/setup/init_db.py
  ```

- Run the application:

  ```bash
  python -m opendiscourse
  ```
```text
opendiscourse/
├── config/                  # Configuration files
│   └── environments/        # Environment-specific settings
│       ├── development.toml # Development settings
│       ├── production.toml  # Production settings
│       └── testing.toml     # Testing settings
├── data/                    # Data files (not version controlled)
│   ├── raw/                 # Raw data
│   └── processed/           # Processed data
├── docs/                    # Documentation
│   ├── api/                 # API documentation
│   └── guides/              # Development guides
├── infrastructure/          # Infrastructure as code
│   ├── database/            # Database configurations
│   ├── middleware/          # Middleware configurations
│   └── networking/          # Network configurations
├── opendiscourse/           # Main Python package
│   ├── api/                 # API endpoints
│   │   ├── v1/              # API version 1
│   │   └── v2/              # API version 2
│   ├── core/                # Core functionality
│   ├── db/                  # Database models and migrations
│   ├── services/            # Business logic services
│   │   ├── scraping/        # Web scraping services
│   │   ├── search/          # Search functionality
│   │   └── storage/         # Data storage services
│   └── utils/               # Utility functions
├── scripts/                 # Utility scripts
│   ├── checks/              # System health checks
│   ├── database/            # Database maintenance
│   ├── deployment/          # Deployment scripts
│   └── setup/               # Setup and installation
└── tests/                   # Test suite
    ├── integration/         # Integration tests
    ├── unit/                # Unit tests
    ├── data/                # Test data
    └── mocks/               # Test mocks
```
- Clone the repository

- Install dependencies:

  ```bash
  pip install -r requirements-dev.txt
  ```

- Set up environment variables:

  ```bash
  cp config/environments/.env.example config/environments/.env
  # Edit the .env file with your configuration
  ```

- Run the development server:

  ```bash
  python -m opendiscourse
  ```
- `POST /api/v1/documents/` - Upload and process documents
- `GET /api/v1/documents/{id}` - Retrieve document by ID
- `GET /api/v1/documents/` - List documents with filtering
- `DELETE /api/v1/documents/{id}` - Delete document
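For example, a document upload and lookup might look like this from Python (a minimal sketch using the `requests` library against the local development URL; the multipart field name `file` and the `id` field in the response are assumptions, not part of the documented contract):

```python
import requests

API_URL = "http://localhost:3000/api/v1"          # local development URL (see Docker setup below)
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Upload a PDF for processing (the multipart field name "file" is an assumption).
with open("report.pdf", "rb") as fh:
    resp = requests.post(
        f"{API_URL}/documents/",
        headers=HEADERS,
        files={"file": ("report.pdf", fh, "application/pdf")},
    )
resp.raise_for_status()
document = resp.json()

# Retrieve the same document by ID (assumes the response contains an "id" field).
doc_id = document["id"]
print(requests.get(f"{API_URL}/documents/{doc_id}", headers=HEADERS).json())
```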
- `POST /api/v1/search/semantic` - Semantic search across documents
- `POST /api/v1/search/vector` - Vector similarity search
- `POST /api/v1/rag/query` - RAG-based question answering
- `GET /api/v1/rag/history` - Query history
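A sketch of how the search and RAG endpoints might be called; the JSON field names (`query`, `top_k`, `question`, `results`, `answer`) are illustrative assumptions rather than the documented schema:

```python
import requests

API_URL = "http://localhost:3000/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Semantic search: a natural-language query against the document corpus.
search = requests.post(
    f"{API_URL}/search/semantic",
    headers=HEADERS,
    json={"query": "appropriations for renewable energy", "top_k": 5},
)
search.raise_for_status()
for hit in search.json().get("results", []):
    print(hit)

# RAG query: question answering with contextual response generation.
rag = requests.post(
    f"{API_URL}/rag/query",
    headers=HEADERS,
    json={"question": "What did the committee conclude about grid funding?"},
)
rag.raise_for_status()
print(rag.json().get("answer"))
```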
- `POST /api/v1/govdata/ingest` - Ingest government documents
- `GET /api/v1/govdata/sources` - List available data sources
- `POST /api/v1/govdata/scrape` - Trigger data scraping
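Government-data ingestion could be triggered the same way (again a sketch; the `source`/`collection` payload fields and the `govinfo`/`BILLS` values are assumptions used only for illustration):

```python
import requests

API_URL = "http://localhost:3000/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# List the government data sources the platform knows about.
sources = requests.get(f"{API_URL}/govdata/sources", headers=HEADERS)
sources.raise_for_status()
print(sources.json())

# Trigger ingestion for one source (payload shape is an assumption).
ingest = requests.post(
    f"{API_URL}/govdata/ingest",
    headers=HEADERS,
    json={"source": "govinfo", "collection": "BILLS"},
)
ingest.raise_for_status()
print(ingest.json())
```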
All API endpoints require authentication. Include your API key in the header:
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     https://api.opendiscourse.com/v1/documents/
```

For detailed API documentation, visit /docs when running the server.
Use the provided Docker setup for quick development deployment:
```bash
# Using the development script
./run-dev.sh

# Or manually with Docker Compose
docker-compose up -d
```

The application will be available at:

- Web Interface: http://localhost:3000
- API: http://localhost:3000/api
- Documentation: http://localhost:3000/docs
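A quick smoke test (a sketch only) to confirm the locally mapped endpoints above respond:

```python
import requests

# Confirm the local development stack started by docker-compose is reachable.
for name, url in {
    "web interface": "http://localhost:3000",
    "api docs": "http://localhost:3000/docs",
}.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")
```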
For production deployment with Kubernetes:
```bash
# Deploy to Kubernetes cluster
./deploy/deploy.sh

# Or deploy manually
kubectl apply -f k8s/
```

See README-DEPLOYMENT.md for comprehensive production deployment guides.
Configure the application using environment-specific TOML files:
```toml
# config/environments/production.toml
[database]
url = "postgresql://user:pass@host:5432/opendiscourse"
pool_size = 20
echo = false

[api]
host = "0.0.0.0"
port = 8000
workers = 4

[security]
secret_key = "your-secret-key"
api_key_header = "X-API-Key"

[features]
rag_enabled = true
vector_search = true
government_data = true
```
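As a sketch of how such a file might be consumed, Python 3.11+ ships `tomllib` in the standard library; the loader below is illustrative and not the project's actual configuration module:

```python
import tomllib
from pathlib import Path

def load_config(environment: str = "production") -> dict:
    """Load an environment-specific TOML file like the example above."""
    path = Path("config/environments") / f"{environment}.toml"
    with path.open("rb") as fh:  # tomllib requires a binary file handle
        return tomllib.load(fh)

config = load_config("development")
print(config["api"]["port"], config["features"]["rag_enabled"])
```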
Run the complete test suite:

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest --cov=opendiscourse tests/

# Run specific test categories
pytest tests/unit/        # Unit tests only
pytest tests/integration/ # Integration tests only

# Run performance tests
pytest tests/performance/
```

Configure the testing environment:
```bash
# Set test database URL
export TEST_DATABASE_URL="postgresql://test:test@localhost:5432/opendiscourse_test"

# Run tests with custom settings
pytest --env=testing tests/
```
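One way a test suite might pick up that variable is a session-scoped fixture; the `conftest.py` below is a hypothetical sketch, not necessarily how the project wires it up:

```python
# tests/conftest.py (illustrative sketch)
import os
import pytest

@pytest.fixture(scope="session")
def test_database_url() -> str:
    """Expose TEST_DATABASE_URL to tests, with a local default."""
    return os.environ.get(
        "TEST_DATABASE_URL",
        "postgresql://test:test@localhost:5432/opendiscourse_test",
    )
```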
Before submitting changes, ensure code quality:

```bash
# Format code
black opendiscourse/ tests/

# Type checking
mypy opendiscourse/

# Linting
flake8 opendiscourse/ tests/

# Import sorting
isort opendiscourse/ tests/

# Run all quality checks
./scripts/quality-check.sh
```

Install pre-commit hooks for automatic code quality checks:

```bash
pre-commit install
```

We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch:

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Set up the development environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements-dev.txt
  ```

- Make your changes following the coding standards
- Add tests for new functionality
- Update documentation as needed
- Run the test suite:

  ```bash
  pytest tests/
  ```

- Check code quality:

  ```bash
  ./scripts/quality-check.sh
  ```

- Commit your changes:

  ```bash
  git add .
  git commit -m "feat: add new feature description"
  ```

- Push to your fork:

  ```bash
  git push origin feature/your-feature-name
  ```

- Create a Pull Request with:
  - Clear description of changes
  - Reference to related issues
  - Screenshots if applicable
  - Test results
- Code Style: Follow PEP 8 and use Black formatter
- Type Hints: All Python code must include type hints
- Documentation: Update relevant documentation
- Tests: Maintain or improve test coverage (minimum 80%)
- Commit Messages: Use conventional commit format
- Issue Tracking: Link PRs to relevant issues
- Bug Fixes: Fix existing functionality
- Features: Add new functionality
- Documentation: Improve documentation
- Maintenance: Code refactoring, dependency updates
- Testing: Improve test coverage
- Performance: Optimize existing functionality
- Issues: Report bugs and request features via GitHub Issues
- Discussions: Join community discussions in GitHub Discussions
- Documentation: Comprehensive docs available in the /docs directory
- Examples: Check the /examples directory for usage examples
See PROJECT_PLAN.md for detailed development roadmap and feature planning.
- v1.1.0: Enhanced document processing workflows
- v1.2.0: NVIDIA NIM integration for production RAG
- v1.3.0: Enhanced React frontend
- v2.0.0: Microservices architecture and production scaling
For security concerns, please email security@opendiscourse.com instead of opening a public issue.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI for the API framework
- PostgreSQL with pgvector for vector search
- Sentence Transformers for embeddings generation
- React for the frontend interface
Maintained by: Development Team
Last Updated: June 26, 2025
Version: 1.0.1