A Chrome extension that captures screen regions, performs OCR text extraction, and generates AI-powered 3-line summaries.
- Screen Region Selection: Interactive overlay for selecting capture areas on web pages
- OCR Text Extraction: Supports OCR.Space API and Tesseract.js (local processing)
- AI Summarization: Generates concise 3-line summaries using OpenAI GPT or Anthropic Claude
- Copy to Clipboard: One-click copying of summaries
- macOS-inspired UI: Clean, modern interface with system-adaptive colors
- Cross-device Sync: Settings and API keys sync across devices
- Clone the repository:
  git clone https://github.com/aucus/context-capture.git
  cd context-capture
- Install dependencies:
  npm install
- Build the extension:
  npm run build
- Load in Chrome:
  - Open chrome://extensions/
  - Enable "Developer mode"
  - Click "Load unpacked"
  - Select the dist/ folder
npm run build
npm run package
This creates a dist.zip file ready for Chrome Web Store submission.
- OCR Service:
  - OCR.Space: Get a free API key from OCR.Space (500 calls/day)
  - Tesseract.js: No API key required (local processing)
- LLM Service:
  - OpenAI: Get an API key from OpenAI Platform
  - Anthropic: Get an API key from Anthropic Console
Open the extension popup and click "Settings" to configure:
- OCR service selection
- LLM service selection
- API keys
- Theme preferences
- Start Capture: Click the extension icon or use the popup "Capture Region" button
- Select Area: Drag to select the region containing text
- Process: The extension will automatically:
  - Capture the selected region
  - Extract text via OCR
  - Generate a 3-line AI summary
- Copy: Click "Copy to Clipboard" to copy the summary
src/
├── manifest.json # Extension manifest
├── content-script/ # Web page interaction
│ ├── selector.ts # Region selection logic
│ ├── ui.ts # Results popup and styling
│ ├── capture.ts # Screen capture coordination
│ └── content-script.ts # Main content script
├── background/ # Service worker
│ ├── ocr.ts # OCR API integration
│ ├── llm.ts # LLM API integration
│ ├── storage.ts # Settings management
│ └── background.ts # Main background script
├── popup/ # Extension popup
│ ├── popup.html
│ ├── popup.ts
│ └── popup.css
├── options/ # Settings page
│ ├── options.html
│ ├── options.ts
│ └── options.css
└── shared/ # Common utilities
  ├── types.ts # TypeScript interfaces
  ├── constants.ts # API endpoints and limits
  └── utils.ts # Helper functions
- npm run build: Production build
- npm run dev: Development build with watch mode
- npm test: Run unit tests
- npm run lint: Run ESLint
- npm run type-check: Run TypeScript checks
- npm run package: Create distribution zip
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run specific test file
npm test -- storage.test.ts
- Endpoint: https://api.ocr.space/parse/image
- Free Tier: 500 calls/day
- Features: Multiple language support, orientation detection
- Local Processing: No API calls required
- Features: Offline processing, no rate limits
- Performance: Slower than cloud APIs
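As a rough illustration of handling a reply from the OCR.Space endpoint listed above, the sketch below validates a response before the text is passed on for summarization. The function name `extractOcrText` is hypothetical, and the field names (`IsErroredOnProcessing`, `ErrorMessage`, `ParsedResults`, `ParsedText`) are assumed from the public OCR.Space documentation rather than taken from this project's `ocr.ts`.

```typescript
// Subset of the OCR.Space JSON response this sketch relies on.
// Field names are an assumption based on the public API documentation.
interface OcrSpaceResponse {
  IsErroredOnProcessing?: boolean;
  ErrorMessage?: string | string[];
  ParsedResults?: Array<{ ParsedText?: string }>;
}

// Hypothetical helper: returns the recognized text or throws a descriptive error.
function extractOcrText(response: unknown): string {
  if (typeof response !== "object" || response === null) {
    throw new Error("OCR response is not an object");
  }
  const data = response as OcrSpaceResponse;

  if (data.IsErroredOnProcessing) {
    // The error detail may arrive as one string or as a list of strings.
    const detail = Array.isArray(data.ErrorMessage)
      ? data.ErrorMessage.join("; ")
      : data.ErrorMessage;
    throw new Error(detail || "OCR processing failed");
  }

  // Join the text of every parsed result and strip surrounding whitespace.
  const text = (data.ParsedResults ?? [])
    .map((result) => result.ParsedText ?? "")
    .join("\n")
    .trim();

  if (text.length === 0) {
    throw new Error("OCR returned no text");
  }
  return text;
}
```

A caller could catch the thrown error and show the "Text recognition failed, please try again" message to the user.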
- Model: gpt-3.5-turbo
- Prompt: "Summarize the following text in exactly 3 lines, focusing on key points:"
- Token Limit: 4096 tokens
- Model: claude-3-haiku-20240307
- Features: Fast, cost-effective summarization
- Token Limit: 4096 tokens
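To make the two LLM configurations above concrete, here is a minimal sketch of how the request for each provider might be assembled. The helper names and the `HttpRequest` shape are hypothetical. Reusing the same prompt for Anthropic and the `max_tokens: 256` value are assumptions made for illustration, not settings confirmed by this project.

```typescript
// Prompt quoted from the OpenAI configuration above.
const SUMMARY_PROMPT =
  "Summarize the following text in exactly 3 lines, focusing on key points:";

// Hypothetical plain description of an HTTP POST, ready to hand to fetch().
interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a Chat Completions request for the model listed above.
function buildOpenAIRequest(text: string, apiKey: string): HttpRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: `${SUMMARY_PROMPT}\n\n${text}` }],
    }),
  };
}

// Builds a Messages API request for the model listed above.
function buildAnthropicRequest(text: string, apiKey: string): HttpRequest {
  return {
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-3-haiku-20240307",
      max_tokens: 256, // assumed cap; a 3-line summary needs far fewer tokens
      messages: [{ role: "user", content: `${SUMMARY_PROMPT}\n\n${text}` }],
    }),
  };
}
```

Keeping request construction separate from the network call makes each provider easy to unit test without real API keys.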
- API keys stored securely in Chrome extension storage
- Content scripts never directly access API keys
- All API calls routed through background service worker
- No sensitive data logged in content script context
- Content Security Policy implemented
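The key-isolation rules above can be sketched as a background-side handler: the content script sends only the extracted text, and the key is looked up and used inside the service worker. All names here (`SummarizeRequest`, `BackgroundDeps`, `handleSummarize`) are hypothetical, and the dependencies are injected so the sketch runs outside a browser. This is one possible shape, not the project's actual `background.ts`.

```typescript
// Message a content script might send. It carries no API key.
interface SummarizeRequest {
  type: "SUMMARIZE";
  text: string;
}

// Reply returned to the content script. It never includes the key.
interface SummarizeResponse {
  ok: boolean;
  summary?: string;
  error?: string;
}

// Injected dependencies (assumption): in the extension these would wrap
// Chrome extension storage and the LLM HTTP call.
interface BackgroundDeps {
  loadApiKey: () => Promise<string | undefined>;
  summarize: (text: string, apiKey: string) => Promise<string>;
}

async function handleSummarize(
  request: SummarizeRequest,
  deps: BackgroundDeps,
): Promise<SummarizeResponse> {
  // The key is read and used only on the background side.
  const apiKey = await deps.loadApiKey();
  if (!apiKey) {
    return { ok: false, error: "Please check your API key in settings" };
  }
  try {
    const summary = await deps.summarize(request.text, apiKey);
    return { ok: true, summary };
  } catch {
    return {
      ok: false,
      error: "Summary generation failed, check network and API key",
    };
  }
}
```

In an extension, a handler like this would typically be registered with `chrome.runtime.onMessage.addListener` in the service worker and invoked from the content script with `chrome.runtime.sendMessage`.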
- No Selection: "Please select an area to capture"
- OCR Failure: "Text recognition failed, please try again"
- LLM Failure: "Summary generation failed, check network and API key"
- Invalid API Key: "Please check your API key in settings"
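One way to keep the messages above consistent across the popup and content script is a single lookup table keyed by an error code. The code names and the `classifyLlmStatus` helper are hypothetical, and treating HTTP 401 as "invalid key" is an assumption about how the LLM providers report rejected credentials.

```typescript
// Hypothetical error codes, one per user-facing message listed above.
type CaptureErrorCode =
  | "NO_SELECTION"
  | "OCR_FAILURE"
  | "LLM_FAILURE"
  | "INVALID_API_KEY";

const ERROR_MESSAGES: Record<CaptureErrorCode, string> = {
  NO_SELECTION: "Please select an area to capture",
  OCR_FAILURE: "Text recognition failed, please try again",
  LLM_FAILURE: "Summary generation failed, check network and API key",
  INVALID_API_KEY: "Please check your API key in settings",
};

// Returns the message shown to the user for a given error code.
function messageFor(code: CaptureErrorCode): string {
  return ERROR_MESSAGES[code];
}

// Maps a failed LLM HTTP status to an error code.
// Assumption: a 401 response means the provider rejected the API key.
function classifyLlmStatus(status: number): CaptureErrorCode {
  return status === 401 ? "INVALID_API_KEY" : "LLM_FAILURE";
}
```

Because `ERROR_MESSAGES` is typed as a `Record` over the union, the compiler flags any code that is added without a matching message.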
- Comprehensive API response validation
- Network timeout handling (30 seconds)
- Retry logic with exponential backoff
- Graceful fallback mechanisms
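The timeout and retry behavior listed above could be sketched as a small generic wrapper. The helper names and the defaults of 3 retries with a 500 ms base delay are assumptions; only the 30-second timeout comes from the list above.

```typescript
// Hypothetical options; only the 30-second timeout is taken from the README.
interface RetryOptions {
  retries: number; // extra attempts after the first failure
  baseDelayMs: number; // delay before the first retry; doubles each time
  timeoutMs: number; // per-attempt timeout
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Rejects if `task` does not settle within `ms` milliseconds.
// Note: the underlying task is not cancelled; a real fetch call would
// also pass an AbortSignal so the request itself is aborted.
async function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Runs `operation`, retrying failures with delays of base, 2x base, 4x base, ...
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions = { retries: 3, baseDelayMs: 500, timeoutMs: 30000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await withTimeout(operation(), options.timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < options.retries) {
        await sleep(options.baseDelayMs * Math.pow(2, attempt));
      }
    }
  }
  throw lastError;
}
```

A production version would likely skip retries for errors that cannot succeed on a second attempt, such as a rejected API key.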
- Image compression for OCR (max 1024px, JPEG 0.8 quality)
- Efficient region cropping using Canvas API
- Minimal DOM modifications
- Proper cleanup of event listeners
- Optimized bundle size
- Efficient use of Chrome tabs.captureVisibleTab API
- Memory-conscious image processing
- API rate limit handling
- Automatic cleanup of temporary resources
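The geometry behind the cropping and 1024px compression steps above can be sketched with two pure functions. The names `fitWithin` and `toDevicePixels` are hypothetical, and the sketch assumes the captured screenshot is at device-pixel resolution while the user's selection is measured in CSS pixels.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Rect extends Size {
  x: number;
  y: number;
}

// Scales a size down so its longer side is at most `maxSide`.
// Sizes that already fit are returned unchanged (no upscaling).
function fitWithin(size: Size, maxSide: number = 1024): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { width: size.width, height: size.height };
  }
  const scale = maxSide / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

// Converts a selection rectangle from CSS pixels to screenshot pixels.
// Assumption: the screenshot is scaled by the page's devicePixelRatio.
function toDevicePixels(rect: Rect, devicePixelRatio: number): Rect {
  return {
    x: Math.round(rect.x * devicePixelRatio),
    y: Math.round(rect.y * devicePixelRatio),
    width: Math.round(rect.width * devicePixelRatio),
    height: Math.round(rect.height * devicePixelRatio),
  };
}
```

With these values, the crop and resize can happen in a single canvas `drawImage` call using the source rectangle from `toDevicePixels` and the destination size from `fitWithin`, followed by JPEG encoding at 0.8 quality.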
- Chrome 88+ (Manifest V3)
- Edge 88+ (Chromium-based)
- Other Chromium-based browsers
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Run tests: npm test
- Commit your changes: git commit -am 'Add feature'
- Push to the branch: git push origin feature-name
- Submit a pull request
MIT License - see LICENSE file for details.
This extension:
- Only processes images you explicitly select
- Does not store personal data, and transmits only the selected image and its extracted text to the OCR and LLM services you configure
- Uses your configured API keys for OCR and LLM services
- Syncs settings across your devices (optional)
For issues and feature requests, please use the GitHub Issues page.
- Initial release
- Screen region selection
- OCR text extraction (OCR.Space + Tesseract.js)
- AI summarization (OpenAI + Anthropic)
- Settings management
- Cross-device sync