This script scrapes Valorant guide clips from tracker.gg and saves them in a structured JSON format.
- Scrapes all pages of Valorant guide clips from tracker.gg
- Automatically converts map names to map IDs
- Automatically converts agent names to agent IDs
- Automatically converts team names to team IDs
- Removes map, team, and agent tags from the tags list (see the sketch after this list)
- Saves the data in a structured JSON format
- Multiple scraping methods to bypass anti-scraping measures
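The name-to-ID conversion amounts to a tag lookup. Here is a minimal sketch, assuming a static lookup table; the mapping values and the `convert_tags` helper are illustrative, not taken from the scraper itself:

```python
# Hypothetical sketch of the tag-to-ID conversion; the real mappings and
# function names in the scraper may differ. Team tags work the same way.
MAP_IDS = {"Ascent": "ascent", "Bind": "bind"}    # placeholder values
AGENT_IDS = {"Jett": "jett", "Sage": "sage"}      # placeholder values

def convert_tags(tags):
    """Pull map/agent tags out of the tag list and return their IDs."""
    map_id = agent_id = None
    remaining = []
    for tag in tags:
        if tag in MAP_IDS:
            map_id = MAP_IDS[tag]        # becomes the clip's mapID
        elif tag in AGENT_IDS:
            agent_id = AGENT_IDS[tag]    # becomes the clip's agentID
        else:
            remaining.append(tag)        # everything else stays a plain tag
    return map_id, agent_id, remaining
```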
- Install the required dependencies:

```
pip install -r requirements.txt
```

Run the combined scraper, which will automatically try different methods:

```
python combined_scraper.py
```

This script will:
- Try the API-based method first (most reliable)
- If that fails, try the CloudScraper method
- If that fails, try the Selenium method
- Use the successful method to scrape all pages (see the fallback sketch after this list)
- Save the results to `output/tracker_clips.json`
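The fallback chain can be pictured as a simple loop. This is a minimal sketch, assuming each scraper module exposes a `scrape()` entry point; the exact structure of `combined_scraper.py` may differ:

```python
# Minimal sketch of the fallback chain; the function names inside each
# module are assumptions about how combined_scraper.py is organized.
import api_scraper
import tracker_scraper
import selenium_scraper

def scrape_with_fallback():
    methods = [
        ("API", api_scraper),               # most reliable, tried first
        ("CloudScraper", tracker_scraper),  # bypasses Cloudflare, parses HTML
        ("Selenium", selenium_scraper),     # slowest, real-browser fallback
    ]
    for name, module in methods:
        try:
            return module.scrape()          # first method that succeeds wins
        except Exception as exc:
            print(f"{name} method failed: {exc}")
    raise RuntimeError("All scraping methods failed")
```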
If you want to use a specific scraping method:
```
python api_scraper.py
python tracker_scraper.py
python selenium_scraper.py
```

The script outputs a JSON file with the following structure:
```json
[
  {
    "title": "Clip Title",
    "description": "Clip Description",
    "tags": ["Tag1", "Tag2"],
    "mapID": "MapID",
    "teamID": "TeamID",
    "agentID": "AgentID",
    "videoURL": "VideoURL",
    "thumbnailURL": "ThumbnailURL",
    "author": "Author",
    "sourceURL": "SourceURL"
  },
  ...
]
```

Note: Fields that are not available or empty will not be included in the output.
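Dropping empty fields corresponds to a simple filtering step before serialization, along these lines:

```python
# Keys with empty or missing values are filtered out before writing the JSON.
import json

clip = {
    "title": "Clip Title",
    "description": "",       # empty -> dropped
    "tags": ["Tag1", "Tag2"],
    "teamID": None,           # missing -> dropped
}
cleaned = {key: value for key, value in clip.items() if value}
print(json.dumps(cleaned))    # {"title": "Clip Title", "tags": ["Tag1", "Tag2"]}
```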
API scraper (`api_scraper.py`):

- Directly accesses the tracker.gg API
- Most reliable method
- Fastest performance
- Least likely to be blocked
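A minimal sketch of the API approach using `requests`; the endpoint URL and query parameter are hypothetical placeholders, not tracker.gg's documented API:

```python
import requests

# The endpoint URL and parameters below are placeholders; the real ones are
# found by watching the network requests the tracker.gg page makes.
API_URL = "https://api.tracker.gg/api/v1/valorant/clips"  # hypothetical

def fetch_page(page: int) -> dict:
    response = requests.get(
        API_URL,
        params={"page": page},
        headers={"User-Agent": "Mozilla/5.0"},  # browser-like UA reduces blocks
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```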
CloudScraper scraper (`tracker_scraper.py`):

- Uses a specialized library to bypass Cloudflare protection
- Parses HTML content
- Medium reliability
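A minimal sketch of the CloudScraper approach; the page URL pattern and CSS selector are assumptions about the site's markup:

```python
import cloudscraper
from bs4 import BeautifulSoup

# create_scraper() returns a requests-like session that transparently
# solves Cloudflare's anti-bot challenge.
scraper = cloudscraper.create_scraper()

def fetch_clip_titles(page: int) -> list[str]:
    url = f"https://tracker.gg/valorant/guides?page={page}"  # assumed URL pattern
    html = scraper.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    # ".clip-card .title" is a hypothetical selector; inspect the real markup.
    return [node.get_text(strip=True) for node in soup.select(".clip-card .title")]
```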
Selenium scraper (`selenium_scraper.py`):

- Uses a real browser to render the page
- Most resource-intensive
- Handles JavaScript-rendered content
- Slowest but most thorough
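A minimal sketch of the Selenium approach with headless Chrome (Selenium 4 style; explicit waits are omitted for brevity):

```python
from selenium import webdriver

def fetch_rendered_page(url: str, headless: bool = True) -> str:
    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")  # Chrome without a visible window
    driver = webdriver.Chrome(options=options)  # Selenium 4 manages the driver
    try:
        driver.get(url)
        # In practice, a WebDriverWait for the clip elements would go here.
        return driver.page_source  # HTML after JavaScript has executed
    finally:
        driver.quit()
```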
If you encounter issues with the scraper:
- Check the HTML and JSON files saved in the `output` directory to see what the scraper is actually receiving
- Try running the Selenium scraper with `headless=False` to see the browser in action (see the sketch after this list)
- Adjust the delay between requests if you're getting rate limited
- Make sure you have the latest Chrome browser installed for Selenium
- If all methods fail, the website structure may have changed - check for updates to this scraper
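For example, the debugging tweaks above might look like this; `SeleniumScraper` and the `delay` argument are hypothetical names, to be matched against the actual code:

```python
# Hypothetical names -- match them to the actual class in selenium_scraper.py.
from selenium_scraper import SeleniumScraper

scraper = SeleniumScraper(
    headless=False,  # watch the browser work, per the tip above
    delay=5.0,       # hypothetical knob: seconds to wait between requests
)
scraper.scrape()
```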
- To limit the number of pages to scrape, modify the `scrape()` function call in the script:

```python
scraper.scrape(max_pages=5)  # Limit to 5 pages
```

- To start scraping from a specific page, modify the `scrape()` function call:

```python
scraper.scrape(start_page=3)  # Start from page 3
```