CodeSnapAI addresses the "context explosion vs. information loss" paradox in modern software engineering. We compress large codebases into compact semantic snapshots while preserving 95%+ of debugging-critical information, so AI-assisted tools can work with projects that would otherwise not fit in an LLM's context.
Core Innovation: Transform 5MB+ codebases into <200KB semantic representations that LLMs can actually understand and act upon.
Modern software development faces four critical bottlenecks:
| Challenge | Current State | CodeSnapAI Solution |
|---|---|---|
| Context Overload | Large codebases contain millions of details, overwhelming AI debuggers and human developers | Intelligent semantic compression with risk-weighted preservation |
| Semantic Loss | Traditional code summarization loses critical dependency relationships and error patterns | Multi-dimensional semantic tagging system maintaining architectural integrity |
| Governance Fragmentation | Complexity detection tools (SonarQube, Codacy) report issues but require manual remediation | Automated end-to-end workflow: scan → AI-generated patches → validation → deployment |
| Multi-Language Chaos | Each language requires separate toolchains and analysis frameworks | Unified semantic abstraction layer across Go, Java, C/C++, Rust, Python |
🚀 20:1 Compression Ratio - Industry-leading semantic snapshot technology
🎯 95%+ Information Retention - Preserves all debugging-critical relationships
🔄 Closed-Loop Automation - From issue detection to validated patch deployment
🌐 Universal Language Support - Unified analysis across 5+ major languages
⚡ Sub-30s Analysis - Process 100K LOC projects in under 30 seconds
🔓 Open Source & Extensible - Plugin architecture for custom rules and languages
- Unified AST Parsing: Leverages tree-sitter for Go, Java, C/C++, Rust, Python, Shell
- Deep Semantic Extraction:
  - Function signatures, call graphs, dependency trees
  - Complexity metrics (cyclomatic, cognitive, nesting depth)
  - Error handling patterns (panic/error wrapping/exceptions)
  - Concurrency primitives (goroutines, async/await, channels)
  - Database/network operation markers
- Incremental Analysis: File-level hashing for efficient change detection
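The file-level hashing mentioned above can be sketched as follows. This is a minimal illustration, not CodeSnapAI's actual implementation: the function names `hash_file` and `changed_files`, the choice of SHA-256, and the `*.py` glob are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(
    root: Path, previous: dict[str, str], pattern: str = "*.py"
) -> tuple[list[str], dict[str, str]]:
    """Compare current hashes with a previous run.

    Returns the paths that are new or modified, plus the new hash map
    to persist for the next run. Both names are hypothetical.
    """
    current = {
        str(p.relative_to(root)): hash_file(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    changed = [name for name, h in current.items() if previous.get(name) != h]
    return changed, current
```

Only the files returned in `changed` would need to be re-parsed; everything else can reuse the results cached from the previous run.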
- Advanced Compression Strategies:
  - Package-level aggregation with representative sampling
  - Critical path extraction (high-call-count functions prioritized)
  - Semantic clustering by functional tags
  - Risk-weighted pruning (high-risk modules preserved verbatim)
- Multiple Output Formats: YAML (human-readable), JSON (API), Binary (performance)
- Rich Metadata: Project structure, dependency graphs, risk heatmaps, git context
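To make risk-weighted pruning and critical path extraction concrete, here is a rough sketch under stated assumptions. `FunctionInfo`, `prune`, the 0.0 to 1.0 risk scale, and the default threshold are hypothetical and chosen only for illustration; the real snapshot generator may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical record for one analyzed function."""
    name: str
    signature: str
    body: str
    call_count: int
    risk: float  # assumed scale: 0.0 (safe) to 1.0 (risky)


def prune(functions, risk_threshold=0.7, max_summaries=2):
    """Keep high-risk functions verbatim; keep only signatures of the most-called rest."""
    kept = []
    low_risk = []
    for fn in functions:
        if fn.risk >= risk_threshold:
            # Risk-weighted pruning: risky code is preserved in full.
            kept.append({"name": fn.name, "detail": "verbatim",
                         "text": fn.signature + "\n" + fn.body})
        else:
            low_risk.append(fn)
    # Critical-path heuristic: frequently called functions survive as signatures.
    low_risk.sort(key=lambda fn: fn.call_count, reverse=True)
    for fn in low_risk[:max_summaries]:
        kept.append({"name": fn.name, "detail": "signature", "text": fn.signature})
    return kept
```

Functions that are neither risky nor frequently called are dropped entirely in this sketch, which is where most of the size reduction would come from.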
- Multi-Dimensional Risk Model:
  - Complexity score (weighted McCabe + Cognitive Complexity)
  - Error pattern analysis (unsafe operations, missing handlers)
  - Test coverage penalties for critical paths
  - Transitive dependency vulnerability propagation
  - Change frequency from git history (instability indicators)
- Configurable Thresholds: Custom scoring rules per project type
- Actionable Reports: Drill-down capabilities with root cause analysis
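One plausible way to combine the five risk dimensions is a weighted sum. The sketch below assumes each dimension has already been normalized to the range 0 to 1 and uses the weights from the sample configuration in this README (0.3, 0.25, 0.2, 0.15, 0.1). The function name `risk_score` and the normalization step are assumptions, not the library's documented behavior.

```python
WEIGHTS = {
    "complexity": 0.30,
    "error_pattern": 0.25,
    "test_coverage": 0.20,
    "dependency": 0.15,
    "change_frequency": 0.10,
}


def risk_score(factors: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-dimension scores in [0, 1] into a single 0-100 risk score."""
    total_weight = sum(weights.values())
    weighted = sum(
        # Clamp each factor to [0, 1]; missing factors count as 0.
        weights[name] * min(max(factors.get(name, 0.0), 0.0), 1.0)
        for name in weights
    )
    # Dividing by the total keeps the result on a 0-100 scale
    # even if custom weights do not sum to exactly 1.
    return round(100 * weighted / total_weight, 1)
```

For example, a module that maxes out only the complexity dimension would score 30.0 under these weights.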
- Automated Issue Detection:
  - Cyclomatic complexity > 10 (configurable)
  - Cognitive complexity > 15
  - Nesting depth > 4
  - Function length > 50 LOC
  - Parameter count > 5
  - Code duplication > 3%
- LLM-Powered Refactoring:
  - Context-enriched prompt generation
  - Structured JSON output validation
  - Multi-turn conversation support
- Patch Management Pipeline:
  - Syntax validation via language parsers
  - Automated test execution (pre/post patching)
  - Git-based rollback mechanism
  - Optional approval workflows
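As an illustration of threshold-based issue detection, the sketch below checks three of the listed limits (cyclomatic complexity, function length, parameter count) for Python source. It uses the standard-library `ast` module rather than tree-sitter, and the complexity count is a common approximation of McCabe's metric, so treat it as a simplified stand-in. `cyclomatic` and `find_violations` are hypothetical names.

```python
import ast

THRESHOLDS = {"cyclomatic": 10, "function_length": 50, "parameter_count": 5}
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + branch points + extra boolean operands."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra decision points.
            score += len(node.values) - 1
    return score


def find_violations(
    source: str, thresholds: dict[str, int] = THRESHOLDS
) -> list[tuple[str, str, int]]:
    """Return (function, metric, value) for every metric above its threshold."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        metrics = {
            "cyclomatic": cyclomatic(node),
            "function_length": node.end_lineno - node.lineno + 1,
            # Counts named parameters only; *args and **kwargs are ignored.
            "parameter_count": len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs),
        }
        for metric, value in metrics.items():
            if value > thresholds[metric]:
                violations.append((node.name, metric, value))
    return violations
```

Each reported tuple could then serve as the starting point for a refactoring prompt or a CI failure message.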
- Natural Language Queries:
  - "Why did TestUserLogin fail?" → Full call chain localization
  - "Show high-risk modules" → Ranked list with justifications
  - "Explain function ProcessPayment" → Semantic summary + dependencies
- Debugger Integration: Compatible with pdb, gdb, lldb, delve
- Real-Time Navigation: Semantic search across codebase
- Python Parser: Fixed nested async function extraction, Python 3.10+ match statement support, enhanced error recovery
- Go Parser: Added Go 1.18+ generics support with type constraints, improved struct tag parsing
- Java Parser: Enhanced annotation parsing for nested annotations, record class support, lambda expression filtering
- 97.5% Test Coverage: 100+ real-world code samples with ground truth validation
- Performance Optimized: Analyze 1000 LOC in <500ms (40% faster than previous version)
- Error Recovery: Robust partial AST parsing on syntax errors
- Semantic Extraction: >95% accuracy against hand-annotated ground truth
- CI Integration: Automated GitHub Actions workflow with coverage reporting
- Type Safety: Full Pydantic model validation for all AST nodes
- Python 3.10 or higher
- Git (for repository analysis features)
pip install codesage
git clone https://github.com/turtacn/CodeSnapAI.git
cd CodeSnapAI
poetry install
- Initialize Configuration:
poetry run codesage config init --interactive
- Analyze Your Code:
# Auto-detect languages (Python, Go, Java, Shell)
poetry run codesage scan ./your-project --language auto
- Create a Snapshot:
poetry run codesage snapshot create ./your-project
You can run CodeSnapAI using Docker without installing dependencies locally.
# Build the image
docker build -t codesage .
# Run a scan
docker run -v $(pwd):/workspace codesage scan .
# Analyze a Go microservice project
codesage snapshot ./my-go-service -o snapshot.yaml
# Output: snapshot.yaml (compressed semantic representation)
codesage analyze snapshot.yaml
# Output example:
# Project: my-go-service (Go 1.21)
# Total Functions: 342
# High-Risk Modules: 12 (see details below)
# Top Complexity Hotspots:
# - handlers/auth.go::ValidateToken (Cyclomatic: 18, Cognitive: 24)
# - services/payment.go::ProcessRefund (Cyclomatic: 15, Cognitive: 21)
codesage debug snapshot.yaml TestUserRegistration
# Output:
# Test Failure Localization:
# Root Cause: handlers/user.go::RegisterUser, Line 45
# Call Chain: RegisterUser → ValidateEmail → CheckDuplicates
# Risk Factors: Missing error handling for database timeout (Line 52)
# Suggested Fix: Wrap db.Query with context.WithTimeout
# Scan for complexity violations
codesage scan ./my-go-service --threshold cyclomatic=10 cognitive=15
# Auto-generate refactoring with LLM
codesage govern scan_results.json --llm claude-3-5-sonnet --apply
# Output:
# Detected 8 violations
# Generated 8 refactoring patches
# Validation: 7/8 passed tests (1 requires manual review)
# Applied patches to: handlers/auth.go, services/payment.go, ...
CodeSnapAI includes a web-based console for visualizing analysis results, reports, and governance plans.
Launch the Console:
codesage web-console
This starts a local web server (default: http://127.0.0.1:8080) where you can browse the project dashboard, file details, and governance tasks.
You can use the codesage report command to generate reports and enforce CI policies.
# Generate reports
codesage report \
--input /path/to/snapshot.yaml \
--out-json /path/to/report.json \
--out-md /path/to/report.md \
--out-junit /path/to/report.junit.xml
# Enforce CI policy
codesage report \
--input /path/to/snapshot.yaml \
--ci-policy-strict
You can easily integrate CodeSnapAI into your GitHub Actions workflow using our official action.
# .github/workflows/codesnap_audit.yml
name: CodeSnapAI Security Audit
on: [pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      checks: write
    steps:
      - uses: actions/checkout@v4
      - name: Run CodeSnapAI
        uses: turtacn/CodeSnapAI@main # Replace with tagged version in production
        with:
          target: "."
          language: "python"
          fail_on_high: "true"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
from codesage import SemanticAnalyzer, SnapshotGenerator, RiskScorer
# Initialize analyzer
analyzer = SemanticAnalyzer(language='go')
analysis = analyzer.analyze_directory('./my-service')
# Generate snapshot
generator = SnapshotGenerator(compression_ratio=20)
snapshot = generator.create(analysis)
snapshot.save('snapshot.yaml')
# Risk scoring
scorer = RiskScorer()
risks = scorer.score(analysis)
print(f"High-risk modules: {len(risks.high_risk)}")
for module in risks.high_risk:
    print(f" {module.path}: {module.score}/100")
    print(f" Reasons: {', '.join(module.risk_factors)}")
from codesage.plugins import LanguagePlugin
class KotlinPlugin(LanguagePlugin):
    def get_tree_sitter_grammar(self):
        return 'tree-sitter-kotlin'
    def extract_semantic_tags(self, node):
        # Custom semantic extraction logic
        if node.type == 'coroutine_declaration':
            return ['async', 'concurrency']
        return []
# Register plugin
from codesage import PluginRegistry
PluginRegistry.register('kotlin', KotlinPlugin())
# Watch mode for continuous analysis
codesage watch ./src --alert-on "complexity>15"
# Terminal output (with color-coded alerts):
# ⚠️ ALERT: handlers/auth.go::ValidateToken
# Cognitive Complexity increased: 12 → 17 (+5)
# Recommendation: Extract validation logic to separate function
GIF Demo: docs/demos/complexity-monitoring.gif
# Interactive refactoring session
codesage refactor ./services/payment.go --interactive
# LLM Conversation:
# 🤖 I've identified 3 complexity issues. Let's start with ProcessRefund:
# Current Cyclomatic Complexity: 18
# Suggested approach: Extract retry logic and error handling
#
# 👤 Focus on the retry logic first
# 🤖 Generated patch: [shows diff]
# Tests: ✅ All 12 tests pass
# Apply this change? (y/n)
GIF Demo: docs/demos/interactive-refactoring.gif
# Analyze multiple projects
codesage dashboard --repos "service-a,service-b,service-c" --port 8080
# Opens web UI showing:
# - Cross-project complexity trends
# - Shared high-risk patterns
# - Dependency vulnerability heatmap
GIF Demo: docs/demos/multi-repo-dashboard.gif
version: "1.0"
# Language settings
languages:
  - go
  - python
# Compression settings
snapshot:
  compression_ratio: 20
  preserve_patterns:
    - ".*_test.go$" # Keep all test files
    - "main.go$" # Keep entry points
# Complexity thresholds
thresholds:
  cyclomatic_complexity: 10
  cognitive_complexity: 15
  nesting_depth: 4
  function_length: 50
  parameter_count: 5
  duplication_rate: 0.03
# Risk scoring weights
risk_scoring:
  complexity_weight: 0.3
  error_pattern_weight: 0.25
  test_coverage_weight: 0.2
  dependency_weight: 0.15
  change_frequency_weight: 0.1
# LLM integration
llm:
  provider: anthropic # or openai, local
  model: claude-3-5-sonnet-20241022
  temperature: 0.2
  max_tokens: 4096
- Architecture Overview - System design and component details
- API Reference - Python library documentation
- Plugin Development - Create custom language analyzers
- Performance Tuning - Optimization strategies for large codebases
- Governance Workflows - Best practices for automated refactoring
We welcome contributions from the community! CodeSnapAI is built on the principle that better code analysis tools benefit everyone.
- Fork the Repository:
git clone https://github.com/turtacn/CodeSnapAI.git
cd CodeSnapAI
- Create a Feature Branch:
git checkout -b feature/your-amazing-feature
- Make Your Changes:
  - Follow our Code Style Guide
  - Add tests for new features
  - Update documentation
- Run Tests:
pytest tests/ --cov=codesage
- Submit a Pull Request:
  - Use our PR template
  - Link related issues
- 🌐 Language Support: Add parsers for new languages (Scala, Swift, etc.)
- 📊 Metrics: Implement new complexity or quality metrics
- 🤖 LLM Integrations: Add support for new AI models
- 📝 Documentation: Improve guides and examples
- 🐛 Bug Fixes: Help us squash bugs
See CONTRIBUTING.md for detailed guidelines.
CodeSnapAI is released under the Apache License 2.0.
Copyright 2024 CodeSnapAI Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
CodeSnapAI builds upon the excellent work of:
- tree-sitter - Incremental parsing system
- Anthropic Claude - Advanced language model capabilities
- FastAPI - Modern API framework
Special thanks to all contributors who make this project possible.
- 💬 Discussions: GitHub Discussions
- 🐛 Bug Reports: Issue Tracker
- 📧 Email: codesnapai@example.com
- 🐦 Twitter: @CodeSnapAI
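- AI debugging assistants: Tools such as ChatDBG and Debug-gym already integrate AI with traditional debuggers (pdb/gdb/lldb), supporting interactive debugging and root cause analysis.
- Code complexity tools: Commercial tools such as Codacy, SonarQube, and NDepend provide multi-dimensional analysis, including cyclomatic and cognitive complexity.
- General-purpose AI code assistants: Workik, GitHub Copilot, and similar tools provide context-aware error detection and fix suggestions.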

