This repository contains a high-performance, intelligent RAG (Retrieval-Augmented Generation) system, developed as a submission for the HackRx 6.0 competition. While the project placed 42nd in a highly competitive field, the resulting architecture is a robust, production-ready application that showcases advanced AI engineering and DevOps practices.
The system is designed to process complex, multi-modal documents from a URL and answer a series of questions with high accuracy and low latency; it is deployed on AWS via a fully automated CI/CD pipeline.
- Multi-Modal Document Processing: Capable of intelligently parsing complex documents, including PDFs, DOCX, PPTX (with image OCR), XLSX, and even recursively scanning ZIP archives to find and process the most relevant file.
- Hybrid-Cloud AI Strategy: Leverages a "best-of-breed" approach, using Amazon Titan Embeddings V2 for high-throughput, accurate semantic search and Google Gemini for its state-of-the-art reasoning and content generation capabilities.
- Dynamic Three-Tier Processing Engine: The system intelligently analyzes each incoming request and routes it to the most efficient processing tier, optimizing for both speed and accuracy.
- Advanced RAG Techniques: Implements a sophisticated RAG pipeline for large documents, including dynamic question classification and hypothetical document generation (HyDE) to improve retrieval accuracy on complex questions.
- Fully Asynchronous & Parallelized: Built on `asyncio`, the entire pipeline is non-blocking. Document ingestion (chunk embedding) and question answering are performed in parallel, ensuring maximum performance (see the sketch after this list).
- Automated CI/CD Pipeline on AWS: The application is containerized with Docker and automatically tested and deployed to AWS App Runner using GitHub Actions on every push to the `main` branch.
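
To make the concurrency model concrete, here is a minimal sketch of the parallel question-answering fan-out. The `answer_one` coroutine is a hypothetical stand-in; the repo's actual function names and pipeline steps will differ:

```python
import asyncio

async def answer_one(question: str) -> str:
    # Stand-in for the real pipeline step: embed the question, query the
    # vector store, then call the LLM. Simulated here with a short sleep.
    await asyncio.sleep(0.1)  # placeholder for network-bound work
    return f"answer to: {question}"

async def answer_all(questions: list[str]) -> list[str]:
    # Fan out: all questions run concurrently, so wall-clock time is
    # roughly that of the slowest single question, not the sum of all.
    return await asyncio.gather(*(answer_one(q) for q in questions))

if __name__ == "__main__":
    print(asyncio.run(answer_all(["What is covered?", "What is excluded?"])))
```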
This system is more than a simple script; it's a resilient, scalable service. The two key architectural patterns are its processing engine and its model resilience strategy.
To ensure optimal performance, the system does not use a one-size-fits-all approach. It dynamically selects one of three tiers based on the input:
- Tier 1: Agentic Path (for API tasks)
  - Trigger: The system detects that a request is not a document query but a direct instruction (e.g., "go to this URL and find the token").
  - Action: Bypasses the RAG pipeline entirely and uses Gemini's Tool Calling feature to directly interact with web resources and solve the task.
- Tier 2: Full-Context Path (for small documents)
  - Trigger: A new document is ingested and found to be under a size threshold (e.g., 50,000 characters).
  - Action: Skips the expensive RAG process (chunking, embedding, vector search). Instead, it loads the entire document text into the LLM's context window for a single, comprehensive Q&A call. This is faster and more accurate for smaller files.
- Tier 3: High-Accuracy RAG Pipeline (for large documents)
  - Trigger: The document is too large for the context window.
  - Action: Engages the full RAG pipeline, including parallelized embedding with AWS Titan, upsert to Pinecone, and parallelized question answering with Google Gemini.
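
As a rough illustration of this routing logic, here is a sketch in Python. The 50,000-character threshold comes from the description above; the function names and the instruction-detection heuristic are assumptions for illustration only:

```python
FULL_CONTEXT_LIMIT = 50_000  # character threshold quoted above

def looks_like_instruction(questions: list[str]) -> bool:
    """Hypothetical heuristic; the real system classifies requests
    with more than a simple prefix check."""
    verbs = ("go to", "fetch", "visit", "call")
    return any(q.lower().startswith(verbs) for q in questions)

def select_tier(questions: list[str], document_text: str | None) -> str:
    # Tier 1: a direct instruction (or no document at all) goes to the
    # agentic, tool-calling path and skips RAG entirely.
    if document_text is None or looks_like_instruction(questions):
        return "agentic"
    # Tier 2: small documents fit in the context window, so all
    # questions are answered in one full-context LLM call.
    if len(document_text) <= FULL_CONTEXT_LIMIT:
        return "full_context"
    # Tier 3: everything else takes the full RAG pipeline
    # (chunk -> embed -> Pinecone upsert -> retrieve -> generate).
    return "rag"
```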
To ensure the service is robust against temporary API failures, the answer generation step is designed with a fallback mechanism:
- Primary Model: All generation requests are first sent to the primary LLM (e.g., `gemini-2.5-flash-lite`).
- Fallback Model: If the primary model returns a server-side error (such as a `500`), the system automatically retries the request with a secondary, highly reliable model (e.g., `gemini-2.5-flash`). This makes the application resilient to transient issues with a specific model API.
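
In code, this fallback reduces to a try/except around the generation call. A minimal sketch using the `google-generativeai` client; the repo's actual wrapper, error handling, and retry policy may differ:

```python
import google.generativeai as genai
from google.api_core import exceptions as gexc

PRIMARY_MODEL = "gemini-2.5-flash-lite"
FALLBACK_MODEL = "gemini-2.5-flash"

def generate_with_fallback(prompt: str) -> str:
    # Assumes genai.configure(api_key=...) was called at startup.
    try:
        # First attempt: the fast, low-cost primary model.
        return genai.GenerativeModel(PRIMARY_MODEL).generate_content(prompt).text
    except gexc.InternalServerError:
        # Primary returned a 5xx: retry once on the more reliable fallback.
        return genai.GenerativeModel(FALLBACK_MODEL).generate_content(prompt).text
```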
| Category | Technology |
|---|---|
| Backend | FastAPI, Uvicorn |
| AI / ML | Google Gemini, AWS Titan Embeddings V2, Pinecone |
| Data Processing | Pydantic, PyPDF, Docx, Pytesseract (OCR) |
| DevOps | Docker, GitHub Actions, AWS App Runner, Amazon ECR |
- Clone the repository:

  ```bash
  git clone https://github.com/B4K2/HackRx6.0.git
  cd HackRx6.0
  ```
- Create a virtual environment and install dependencies:
  - This project uses `pyproject.toml` for dependency management.

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install .
  ```
- Configure Environment Variables:
  - Create a `.env` file in the project root by copying the example:

    ```bash
    cp .env.example .env
    ```
  - Fill in the required API keys and settings in the `.env` file.
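
  The authoritative list of keys lives in `.env.example`; based on the stack above, the file plausibly contains entries along these lines (variable names are illustrative assumptions, not taken from the repo):

  ```bash
  # Hypothetical variable names -- check .env.example for the real ones.
  GOOGLE_API_KEY=your-gemini-api-key
  PINECONE_API_KEY=your-pinecone-api-key
  AWS_ACCESS_KEY_ID=your-aws-access-key
  AWS_SECRET_ACCESS_KEY=your-aws-secret-key
  AWS_REGION=us-east-1
  ```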
- Run the application:

  ```bash
  uvicorn app.main:app --reload
  ```

  The API will be available at `http://127.0.0.1:8000`.
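
For a quick smoke test, a request against the running service might look like the following. Note that the endpoint path, auth header, and payload shape here are assumptions based on the HackRx task format, not confirmed from the repo; check `app/main.py` for the real route:

```python
import requests

# Endpoint and payload are illustrative assumptions.
resp = requests.post(
    "http://127.0.0.1:8000/hackrx/run",
    headers={"Authorization": "Bearer <your-team-token>"},
    json={
        "documents": "https://example.com/policy.pdf",
        "questions": ["What is the grace period for premium payment?"],
    },
    timeout=120,
)
print(resp.json())
```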
This project is configured with a complete CI/CD pipeline using GitHub Actions. The workflow is defined in `.github/workflows/deploy.yml` and performs the following steps on every push to the `main` branch:
- Run Tests: Installs dependencies and runs `pytest` to ensure the application is healthy.
- Build Docker Image: Builds a new, clean container image of the application.
- Push to ECR: Tags the image with the commit SHA and pushes it to a private Amazon ECR repository.
- Deploy to App Runner: Triggers a new deployment on the AWS App Runner service, updating the application to the latest version.
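
For orientation, a workflow implementing those four steps typically looks something like the skeleton below. The actual `deploy.yml` in the repo is authoritative; the job layout, action versions, and image name here are assumptions:

```yaml
# Illustrative skeleton only -- see .github/workflows/deploy.yml for the real file.
name: Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # 1. Run tests
      - run: |
          pip install .
          pytest
      # 2-3. Build the image and push it to ECR, tagged with the commit SHA
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/hackrx:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/hackrx:${{ github.sha }}
      # 4. Trigger the App Runner deployment (e.g., via
      #    awslabs/amazon-app-runner-deploy or the AWS CLI)
```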