# ChatWithDocs

A RAG-based application that allows you to upload documents, chat with them using natural language, and generate intelligent summaries.
## Prerequisites

- Python 3.9+ (recommended: Python 3.11)

## Quick Start

1. Clone the repository:

   ```bash
   git clone <your-repo-url>
   cd ChatWithDocs
   ```
2. Make the run script executable:

   ```bash
   chmod +x run.sh
   ```
3. Run the application:

   ```bash
   ./run.sh
   ```
That's it! The script will:
- ✅ Create a virtual environment
- ✅ Install all dependencies
- ✅ Start the backend API server
- ✅ Launch the Streamlit frontend
- ✅ Open your browser automatically
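For reference, here is a minimal sketch of what such a startup script can look like. It is assembled from the manual-setup commands below for illustration only; the `run.sh` shipped with the repository is the source of truth:

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- the run.sh in the repo is authoritative.
set -e

# Create the virtual environment on first run, then activate it
[ -d venv ] || python -m venv venv
source venv/bin/activate

# Install backend and frontend dependencies
pip install -r backend/requirements.txt
pip install -r frontend/requirements.txt

# Start the FastAPI backend in the background, then launch Streamlit
# (Streamlit opens a browser tab automatically by default)
(cd backend && python main.py) &
(cd frontend && streamlit run app.py)
```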
## Manual Setup

If you prefer to set up manually or encounter issues with the script:
```bash
# Create virtual environment
python -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install backend dependencies
cd backend
pip install -r requirements.txt

# Install frontend dependencies
cd ../frontend
pip install -r requirements.txt
```

Then start each service in its own terminal.

**Terminal 1 - Backend:**

```bash
cd backend
python main.py
```

**Terminal 2 - Frontend:**

```bash
cd frontend
streamlit run app.py
```

## LLM Configuration

Choose one of the following options:

### Option 1: OpenAI
1. Get an API key from OpenAI
2. In the Streamlit interface:
   - Select "openai" as provider
   - Choose your model (gpt-4, gpt-3.5-turbo)
   - Enter your API key
   - Click "Configure LLM"
### Option 2: Ollama (local)

1. Install Ollama:

   ```bash
   # macOS
   brew install ollama

   # Linux
   curl -fsSL https://ollama.ai/install.sh | sh

   # Windows: download the installer from https://ollama.ai
   ```
2. Start Ollama and pull a model:

   ```bash
   # Start the Ollama service
   ollama serve

   # In another terminal, pull a model
   ollama pull llama3   # or mistral, phi3, codellama
   ```
3. Configure in Streamlit:
   - Select "ollama" as provider
   - Choose your model (llama3, mistral, etc.)
   - Click "Configure LLM"
## Usage

### Upload Documents

1. Go to the Upload tab
2. Drag & drop or select files (PDF, DOCX, TXT)
3. Click "Upload and Process"
4. Wait for processing to complete
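Uploads can also be scripted against the backend API. The route and form-field name below are illustrative assumptions, not confirmed endpoints; the generated docs at http://localhost:8000/docs list the real ones:

```bash
# Hypothetical route and field name -- verify against http://localhost:8000/docs
curl -X POST http://localhost:8000/upload \
  -F "file=@my_document.pdf"
```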
### Chat with Documents

1. Go to the Chat tab
2. Select a document from the sidebar
3. Ask questions like:
   - "What is this document about?"
   - "What are the main conclusions?"
   - "Explain the methodology used"
   - "Find information about X"
### Generate Summaries

1. Go to the Summary tab
2. Select document and summary type:
   - General: Main points for a general audience
   - Executive: Business-focused insights
   - Technical: Detailed technical summary
   - Bullet Points: Easy-to-scan format
3. Choose length (100-1000 words)
4. Click "Generate Summary"
### View Analytics

1. Go to the Analytics tab
2. View document statistics
3. Check system health
4. Monitor chat history
## Access Points

Once running, access the application at:
- Frontend (Streamlit): http://localhost:8501
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
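A quick way to confirm the backend is up from the command line, using the same health endpoint referenced in the troubleshooting section below:

```bash
# Check backend health (endpoint from the troubleshooting section)
curl http://localhost:8000/health
```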
## Troubleshooting

### Dependency errors

```bash
# Ensure you're in the virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install -r backend/requirements.txt
pip install -r frontend/requirements.txt
```

### Port already in use

```bash
# Kill existing processes
lsof -ti:8000 | xargs kill -9  # Backend
lsof -ti:8501 | xargs kill -9  # Frontend

# Or use a different port
streamlit run app.py --server.port 8502
```

### Connection issues

- Check if backend is running: http://localhost:8000/health
- Ensure both services are running
- Check firewall settings
### Corrupted data

```bash
# Clear vector database if corrupted
rm -rf backend/chroma_db/
rm backend/app_database.db

# Restart application
./run.sh
```

### LLM issues

- OpenAI: Verify the API key is valid and has credits
- Ollama: Ensure the Ollama service is running (`ollama serve`)
- Check the configuration in the Streamlit sidebar
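To verify Ollama is actually reachable, list the models it has pulled; `ollama list` is part of the standard Ollama CLI, and the REST call below uses the default base URL from the Configuration section:

```bash
# List models available to Ollama locally
ollama list

# Or query the Ollama REST API directly
curl http://localhost:11434/api/tags
```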
### Debug mode

Enable debug information in the Streamlit sidebar:
- Check "Show Debug Info"
- Click "Debug Backend Storage" to see document status
- Click "Debug Vector Store" to check embeddings
### Performance tips

- For large documents: Increase chunk size in settings
- For better results: Use OpenAI models (gpt-4)
- For privacy: Use local Ollama models
- For speed: Use smaller models (gpt-3.5-turbo)
## Project Structure

```
ChatWithDocs/
├── run.sh               # Main startup script
├── backend/
│   ├── main.py          # FastAPI server
│   ├── database.py      # SQLite database
│   ├── requirements.txt # Python dependencies
│   ├── services/        # Business logic
│   └── models/          # Data models
├── frontend/
│   ├── app.py           # Streamlit interface
│   └── requirements.txt # Frontend dependencies
├── chroma_db/           # Vector database (auto-created)
├── uploads/             # Temporary file storage
└── .gitignore           # Git ignore rules
```
## Configuration

### Environment Variables

Create a `.env` file in the backend directory:
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434

# Database Configuration
DATABASE_URL=sqlite:///app_database.db

# Vector Store Configuration
CHROMA_PERSIST_DIRECTORY=./chroma_db
```

### Advanced Settings

Edit configuration in the service files:

- Chunk size: `services/document_processor.py`
- Similarity threshold: `services/chat_service.py`
- Model parameters: `services/llm_service.py`
## Features

- ✅ Multi-format support: PDF, Word, TXT files
- ✅ Intelligent chat: RAG-based document interaction
- ✅ Smart summaries: Multiple summary styles
- ✅ Dual LLM support: OpenAI API + local Ollama
- ✅ Vector search: Semantic similarity matching
- ✅ Conversation memory: Multi-turn chat history
- ✅ Source attribution: See which parts of documents were used
- ✅ Real-time processing: Instant document analysis
- ✅ Privacy options: Local-only processing with Ollama