Text2test is a web app that generates downloadable questions and answers from PDF textbooks uploaded by users, helping students prepare for exams You can upload any PDF book or document, and let AI create questions to improve your learning experience.
You can try the app for free at this link: Text2Test
- Chapter-Based Questions: Extract table of contents, select specific chapters, and generate targeted questions from chosen sections
- Topic-Based Questions: Input keywords or topics to generate questions from relevant content across the entire document
- Automatic text extraction with PyMuPDF
- Intelligent page numbering correction
- Table of contents detection and parsing
- Chapter boundary identification
- PDF preview and inspection tools
- LLM: Gemma2-12B-IT-4QAT model hosted on RunPod via Ollama
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2) for semantic search
- Vector Database: ChromaDB for efficient content retrieval
- Smart chunking with configurable overlap for context preservation
- Generate downloadable Word documents (.docx) with all questions and answers organised by chapter or topic
- Streamlit
- PyMuPDF (fitz): PDF text extraction and page analysis
- ChromaDB: Vector database for semantic search and retrieval
- SentenceTransformers: Text embeddings for content similarity
- Python-docx: Word document generation
- Gemma2-12B-IT: Large language model for question generation
- Ollama: Model serving framework
- RunPod: GPU cloud infrastructure
- Docker: Containerized deployment
- Intelligent text chunking with sentence-level overlap
- Table of contents extraction and cleaning
- Chapter boundary detection
- Content preprocessing and optimization
text2test/
├── app/ # Main application
│ ├── main.py # Entry point and navigation
│ ├── pages/ # Multi-page interface
│ │ ├── 1_chapter_questions.py
│ │ ├── 2_topic_questions.py
│ │ └── 3_inspect_pdf.py
│ ├── backend/ # Core processing modules
│ │ ├── raw_text_processing.py # PDF extraction
│ │ ├── chromadb_utils.py # Vector database
│ │ ├── text_processing.py # Content chunking
│ │ ├── runpod_client.py # AI model interface
│ │ └── messages_templates.py # LLM prompts
│ ├── chromadb_model/ # Local embedding model
│ └── utils/ # Helper functions
- Upload PDF: Users upload their study material in PDF format
- Text Extraction: PyMuPDF extracts and processes text, identifying chapters and structure
- Content Indexing: Text is chunked and embedded using SentenceTransformers, stored in ChromaDB
- Question Generation:
- Chapter Mode: Extract TOC, let users select chapters, generate questions from specific content
- Topic Mode: Use semantic search to find relevant passages, generate focused questions
- AI Processing: Gemma2-12B-IT generates contextual questions and detailed answers
- Export: Download formatted questions as Word documents