
HackRx 6.0: A Production-Grade Intelligent RAG System


This repository contains a high-performance, intelligent RAG (Retrieval-Augmented Generation) system, developed as a submission for the HackRx 6.0 competition. While the project placed 42nd in a highly competitive field, the resulting architecture is a robust, production-ready application that showcases advanced AI engineering and DevOps practices.

The system is designed to process complex, multi-modal documents from a URL, answer a series of questions with high accuracy and low latency, and is deployed on AWS via a fully automated CI/CD pipeline.

✨ Core Features

  • Multi-Modal Document Processing: Capable of intelligently parsing complex documents, including PDFs, DOCX, PPTX (with image OCR), XLSX, and even recursively scanning ZIP archives to find and process the most relevant file.
  • Hybrid-Cloud AI Strategy: Leverages a "best-of-breed" approach, using Amazon Titan Embeddings V2 for high-throughput, accurate semantic search and Google Gemini for its state-of-the-art reasoning and content generation capabilities.
  • Dynamic Three-Tier Processing Engine: The system analyzes each incoming request and routes it to the most efficient of three processing tiers, optimizing for both speed and accuracy.
  • Advanced RAG Techniques: Implements a sophisticated RAG pipeline for large documents, including dynamic question classification and hypothetical query generation (HyDE) to improve retrieval accuracy on complex questions.
  • Fully Asynchronous & Parallelized: Built on asyncio, the entire pipeline is non-blocking. Document ingestion (chunk embedding) and question answering are performed in parallel, ensuring maximum performance.
  • Automated CI/CD Pipeline on AWS: The application is containerized with Docker and automatically tested and deployed to AWS App Runner using GitHub Actions on every push to the main branch.
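The multi-modal processing described above amounts to dispatching each downloaded file to a format-specific parser, with ZIP archives handled recursively. The sketch below illustrates the idea; all parser names and the archive "relevance" heuristic are illustrative placeholders, not the project's actual code:

```python
import io
import zipfile
from pathlib import Path

# Hypothetical stand-ins for the real parsers (PyPDF, python-docx,
# python-pptx + Pytesseract OCR, openpyxl, ...).
def parse_pdf(data: bytes) -> str: return "pdf text"
def parse_docx(data: bytes) -> str: return "docx text"
def parse_pptx(data: bytes) -> str: return "pptx text (with OCR on images)"
def parse_xlsx(data: bytes) -> str: return "xlsx text"

PARSERS = {
    ".pdf": parse_pdf,
    ".docx": parse_docx,
    ".pptx": parse_pptx,
    ".xlsx": parse_xlsx,
}

def extract_text(filename: str, data: bytes) -> str:
    """Route a downloaded file to the parser for its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".zip":
        # Recursively scan the archive and process the most relevant member.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            candidates = [n for n in zf.namelist()
                          if Path(n).suffix.lower() in PARSERS]
            if not candidates:
                raise ValueError("no supported file found in archive")
            best = candidates[0]  # placeholder relevance heuristic
            return extract_text(best, zf.read(best))
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"unsupported file type: {suffix}")
    return parser(data)
```

Because the ZIP branch re-enters `extract_text`, nested archives are handled by the same dispatch table as top-level files.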

๐Ÿ›๏ธ Architectural Deep Dive

This system is more than a simple script; it's a resilient, scalable service. The two key architectural patterns are its processing engine and its model resilience strategy.

1. The Three-Tiered Processing Architecture

To ensure optimal performance, the system does not use a one-size-fits-all approach. It dynamically selects one of three tiers based on the input:

  • Tier 1: Agentic Path (for API tasks)

    • Trigger: The system detects if a request is not a document query but a direct instruction (e.g., "go to this URL and find the token").
    • Action: Bypasses the RAG pipeline entirely and uses Gemini's Tool Calling feature to directly interact with web resources and solve the task.
  • Tier 2: Full-Context Path (for small documents)

    • Trigger: A new document is ingested and found to be under a size threshold (e.g., 50,000 characters).
    • Action: Skips the expensive RAG process (chunking, embedding, vector search). Instead, it loads the entire document text into the LLM's context window for a single, comprehensive Q&A call. This is faster and more accurate for smaller files.
  • Tier 3: High-Accuracy RAG Pipeline (for large documents)

    • Trigger: The document is too large for the context window.
    • Action: Engages the full RAG pipeline, including parallelized embedding with AWS Titan, upsert to Pinecone, and parallelized question answering with Google Gemini.
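The tier selection above can be sketched as a single routing function. This is a minimal illustration, assuming hypothetical stage functions and instruction heuristic; the project's actual function names and agent-detection logic are not shown here:

```python
import asyncio

# Illustrative stubs for the pipeline stages; the real implementations
# call Gemini, AWS Titan, and Pinecone.
def looks_like_instruction(questions):
    return any("go to" in q.lower() for q in questions)
async def run_agent(questions):
    return ["agent answer"] * len(questions)
async def download_and_parse(url):
    return "document text " * 10
async def answer_with_full_context(text, questions):
    return ["full-context answer"] * len(questions)
async def ingest_into_vector_store(text):
    pass
async def answer_with_rag(questions):
    return ["rag answer"] * len(questions)

FULL_CONTEXT_LIMIT = 50_000  # characters, matching the Tier 2 threshold above

async def handle_request(document_url, questions):
    # Tier 1: a direct instruction bypasses the RAG pipeline (agentic path).
    if looks_like_instruction(questions):
        return await run_agent(questions)
    text = await download_and_parse(document_url)
    # Tier 2: small document -> one full-context Q&A call.
    if len(text) <= FULL_CONTEXT_LIMIT:
        return await answer_with_full_context(text, questions)
    # Tier 3: large document -> embed, upsert to Pinecone, retrieve per question.
    await ingest_into_vector_store(text)
    return await answer_with_rag(questions)
```

Keeping the routing decision in one place means each tier can be tested and tuned (e.g., the size threshold) independently of the others.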

2. Dual Model Strategy & Resilience

To ensure the service is robust against temporary API failures, the answer generation step is designed with a fallback mechanism:

  • Primary Model: All generation requests are first sent to the primary LLM (e.g., gemini-2.5-flash-lite).
  • Fallback Model: If the primary model returns a server-side error (e.g., an HTTP 500), the system automatically retries the request with a secondary, highly reliable model (e.g., gemini-2.5-flash). This makes the application resilient to transient issues with a specific model API.
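The fallback pattern is a try/retry wrapper around the generation call. A minimal sketch, using a simulated Gemini call and error type rather than the real SDK:

```python
import asyncio

PRIMARY_MODEL = "gemini-2.5-flash-lite"
FALLBACK_MODEL = "gemini-2.5-flash"

class ServerError(Exception):
    """Stands in for a 5xx error from the model API."""

async def generate(prompt: str, model: str) -> str:
    # Placeholder for the real Gemini call; here the primary model
    # always fails, to exercise the fallback path.
    if model == PRIMARY_MODEL:
        raise ServerError("500 from primary")
    return f"answer from {model}"

async def generate_with_fallback(prompt: str) -> str:
    try:
        return await generate(prompt, PRIMARY_MODEL)
    except ServerError:
        # Transient server-side failure: retry once on the fallback model.
        return await generate(prompt, FALLBACK_MODEL)
```

Only server-side errors trigger the retry; client-side errors (bad prompts, auth failures) would fail the same way on either model, so retrying them would just add latency.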

๐Ÿ› ๏ธ Tech Stack

| Category        | Technology                                          |
| --------------- | --------------------------------------------------- |
| Backend         | FastAPI, Uvicorn                                    |
| AI / ML         | Google Gemini, AWS Titan Embeddings V2, Pinecone    |
| Data Processing | Pydantic, PyPDF, python-docx, Pytesseract (OCR)     |
| DevOps          | Docker, GitHub Actions, AWS App Runner, Amazon ECR  |

🚀 Setup and Local Development

  1. Clone the repository:

     ```shell
     git clone https://github.com/B4K2/HackRx6.0.git
     cd HackRx6.0
     ```

  2. Create a virtual environment and install dependencies (this project uses pyproject.toml for dependency management):

     ```shell
     python -m venv .venv
     source .venv/bin/activate
     pip install .
     ```

  3. Configure environment variables: create a .env file in the project root by copying the example, then fill in the required API keys and settings:

     ```shell
     cp .env.example .env
     ```

  4. Run the application:

     ```shell
     uvicorn app.main:app --reload
     ```

     The API will be available at http://127.0.0.1:8000.

🔄 CI/CD Pipeline

This project is configured with a complete CI/CD pipeline using GitHub Actions. The workflow is defined in .github/workflows/deploy.yml and performs the following steps on every push to the main branch:

  1. Run Tests: Installs dependencies and runs pytest to ensure the application is healthy.
  2. Build Docker Image: Builds a new, clean container image of the application.
  3. Push to ECR: Tags the image with the commit SHA and pushes it to a private Amazon ECR repository.
  4. Deploy to App Runner: Triggers a new deployment on the AWS App Runner service, updating the application to the latest version.
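The four steps above map onto a workflow roughly like the following. This is an illustrative sketch, not the project's actual .github/workflows/deploy.yml; the ECR repository name, Python version, and secret names are assumptions:

```yaml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  test-build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # 1. Run tests
      - run: |
          pip install .
          pytest
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      # 2 & 3. Build the image, tag with the commit SHA, push to ECR
      - run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/hackrx:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/hackrx:${{ github.sha }}
      # 4. Trigger a new App Runner deployment
      - run: aws apprunner start-deployment --service-arn ${{ secrets.APP_RUNNER_SERVICE_ARN }}
```

Tagging images with the commit SHA (rather than `latest`) makes every deployment traceable to the exact commit that produced it.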
