Multi Scrapper AI is an advanced, Streamlit-powered content intelligence tool that can scrape, parse, summarize, and analyze data from:
✔ YouTube Videos
✔ PDF Documents
✔ Websites/HTML Pages
It uses Gemini 2.0 Flash and Ollama-based LLM parsing to help users extract insights, answer questions, and create summaries. It is built for students, researchers, developers, and general users who want quick insights from long content without reading it manually.
YouTube Videos:
✔ Extracts transcripts using the YouTube Transcript API
✔ Detects all available transcript languages automatically
✔ Summarizes using Gemini 2.0 Flash
✔ Produces clean, structured output with subheadings
✔ Shows a thumbnail preview and a Download Summary option
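The YouTube Transcript API looks up transcripts by video ID rather than by full URL, so a pasted link has to be reduced to its ID first. The following is a minimal sketch of that step using only the standard library. The function name `extract_video_id` and the set of URL shapes it handles are assumptions for illustration and are not taken from `main.py`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL shape, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in {"youtube.com", "m.youtube.com"}:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Shorts and embeds: /shorts/<id> or /embed/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"shorts", "embed"}:
            return parts[1]

    return None
```

Returning `None` for unrecognized links lets the UI show a clear "invalid link" message instead of passing a bad ID to the transcript lookup.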
PDF Documents:
✔ Extracts text using PDFMiner / PyPDF2
✔ High-accuracy extraction, even from complex PDFs
✔ Shows the full extracted text in an expandable view
✔ Supports deep question answering with Ollama
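As a rough illustration of the PDF step, the sketch below reads a file page by page with PyPDF2's `PdfReader` and joins the non-empty pages. It assumes PyPDF2 2.x or later is installed. The helper names `join_page_text` and `extract_pdf_text` are hypothetical, and the project's actual extraction code may differ (for example by using PDFMiner instead).

```python
def join_page_text(reader) -> str:
    """Concatenate text from any reader exposing `.pages` with `extract_text()`."""
    chunks = []
    for page in reader.pages:
        # extract_text() can come back empty, e.g. for scanned image-only pages
        text = page.extract_text() or ""
        if text.strip():
            chunks.append(text.strip())
    return "\n\n".join(chunks)


def extract_pdf_text(path: str) -> str:
    """Open a PDF from disk and return its text (hypothetical helper)."""
    # Imported lazily so the joining logic above works without PyPDF2 installed
    from PyPDF2 import PdfReader

    return join_page_text(PdfReader(path))
```

Keeping the joining logic separate from the file handling makes it easy to swap in a different PDF library that exposes pages in a similar way.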
Websites/HTML Pages:
✔ Scrapes webpage content using requests and BeautifulSoup
✔ Extracts only relevant content
✔ Cleans, splits, and prepares DOM text
✔ AI parsing using Ollama models
✔ Useful for SEO research, content extraction, competitor analysis, etc.
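The "cleans, splits, and prepares" step can be sketched as follows: strip blank lines from the scraped text, then cut it into fixed-size chunks so each piece fits in a model prompt. This is an illustrative sketch only. The function names and the default `max_chars` of 6000 are assumptions, not values taken from `scrape.py`.

```python
from typing import List


def clean_text(raw: str) -> str:
    """Trim each line and drop empty ones."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def split_text(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Fixed-size splitting is the simplest option; it can cut a sentence in half, so a splitter that breaks on paragraph boundaries may give better results for question answering.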
Gemini 2.0 Flash handles YouTube transcript summarization and fast, accurate content understanding.
The Ollama LLM handles website content parsing, PDF question answering, and custom extraction tasks.
Core stack: Python, Streamlit, Requests, BeautifulSoup, YoutubeTranscriptApi, PDFMiner / PyPDF2, Langcodes
AI models: Gemini 2.0 Flash (Google Generative AI), Ollama Models (Local/Hosted)
multi-scrapper-ai/
├── main.py            # Streamlit application (UI + logic)
├── scrape.py          # Web scraping utilities
├── parse.py           # Ollama-based content parsing
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
git clone https://github.com/A4xPraddy/MultiScrapperGenAi.git
pip install -r requirements.txt
streamlit run main.py
Provide a YouTube link, a PDF file, or a website URL.
The app fetches the transcript, extracts the PDF text, or scrapes the website body.
The AI models then generate structured summaries, extract custom information, answer user questions, and convert raw text into knowledge.
The output consists of summaries, parsed results, and extracted content.
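The workflow above starts by deciding which of the three source types the user supplied. The sketch below shows one way such routing could look using only the standard library. The function `detect_source` and its return labels are hypothetical; the actual app may instead rely on separate Streamlit inputs for each source type.

```python
from urllib.parse import urlparse


def detect_source(user_input: str) -> str:
    """Classify input as 'youtube', 'pdf', or 'website' (hypothetical helper)."""
    value = user_input.strip()
    lowered = value.lower()
    is_url = lowered.startswith(("http://", "https://"))

    # A local file path ending in .pdf is treated as a PDF upload
    if lowered.endswith(".pdf") and not is_url:
        return "pdf"

    host = (urlparse(value).hostname or "").lower()
    if host in {"youtu.be", "youtube.com"} or host.endswith(".youtube.com"):
        return "youtube"

    # Any other http(s) URL with a host is scraped as a regular web page
    if is_url and host:
        return "website"

    raise ValueError(f"Unrecognized input: {user_input!r}")
```

Each label can then be mapped to the matching extractor (transcript fetch, PDF text extraction, or web scraping) before the text is handed to the model.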
✔ Students preparing notes
✔ Researchers analyzing multiple sources
✔ Content writers or SEO analysts
✔ Journalists fact-checking content
✔ Developers building AI-assisted tools
✔ Anyone wanting quick understanding of long content