Shiksha Copilot is a transformative educational tool developed under the VELLM (Universal Empowerment with Large Language Models) initiative by Microsoft Research India. It is specifically designed to assist educators in streamlining and enriching the process of lesson planning and content creation. Shiksha Copilot acts as an intelligent assistant, empowering teachers to craft meaningful, engaging, and curriculum-aligned learning experiences tailored to the specific needs of their classrooms.
By allowing educators to select the exact curriculum, grade level, subject, and chapter they plan to teach, Shiksha Copilot provides a customized experience that aligns directly with their instructional goals. The system leverages advanced LLMs to generate a variety of pedagogically sound educational artifacts including:
- Detailed lesson plans
- Real-world examples that contextualize abstract concepts
- Analogies that bridge new ideas with familiar experiences
- Hands-on activities that promote experiential learning
- Formative and summative assessments for effective evaluation
Once teachers review and validate the generated content, the system automatically compiles it into user-friendly formats such as Microsoft Word (DOCX), PowerPoint (PPT), and student-friendly handouts. Furthermore, teachers can generate comprehensive question banks covering multiple chapters in alignment with standard educational blueprint formats. The system also features a conversational AI interface, enabling users to interact with Shiksha Copilot via natural language queries, ask textbook-related questions, or generate customized educational materials on the fly.
Please find additional details in Shiksha-Copilot User FAQ
---
config:
theme: base
themeVariables:
background: transparent
primaryTextColor: '#1f2328'
primaryColor: '#e8f1ff'
secondaryColor: '#fff1e5'
lineColor: '#6e7781'
clusterBkg: '#f6f8fa'
clusterBorder: '#d0d7de'
edgeLabelBackground: '#ffffff'
flowchart:
curve: basis
diagramPadding: 8
nodeSpacing: 35
rankSpacing: 40
---
flowchart LR
subgraph OFFLINE["🔄 Offline Processing"]
direction TB
A("📘 Curriculum Textbooks")
A2("🌐 External Open-Source Material")
B("🧭 Shiksha Ingestion")
B2("👨🏫 Human Curators")
end
subgraph F["📚 Knowledge Base"]
direction LR
C("🧠 Vector Datastore")
D("🕸️ Graph Datastore")
E("📄 Document Datastore")
F1("☁️ Azure Blob Store")
end
subgraph s1["🖥️ Frontend"]
G("Shiksha Website")
end
subgraph H["🧩 FastAPI"]
I("💬 Lesson Chat")
J("📝 Question Paper")
K("🎓 Edu Chat")
end
subgraph M["⚙️ Durable Functions"]
N("🛠️ Lesson Plan Generation")
end
subgraph ONLINE["🌐 Online User Experience"]
s1
H
M
end
L("🔎 Bing Search API")
A --> B
A2 --> B
B --> B2
B2 --> F
G -- SYNC --> H
G -. ASYNC .-> M
I --> F
J --> F
K --> L
M --> F
C:::store
D:::store
E:::store
F1:::store
classDef store fill:#eef6ff,stroke:#1f6feb,stroke-width:1px,color:#1f2328
classDef ghost fill:transparent,stroke:transparent,color:transparent
style s1 fill:#e8f8f0,stroke:#2da44e,stroke-width:1px
style H fill:#fff1e5,stroke:#bf8700,stroke-width:1px
style M fill:#f5f0ff,stroke:#8250df,stroke-width:1px
style F fill:#f6f8fa,stroke:#8b949e,stroke-width:1px
style OFFLINE fill:#fff5f5,stroke:#e5534b,stroke-width:1px
style ONLINE fill:#f0fff4,stroke:#2da44e,stroke-width:1px
- Multi-Source Content Ingestion: Processes curriculum textbooks and external open-source educational materials through an intelligent ingestion pipeline with human curator oversight for quality assurance.
- Comprehensive Knowledge Base: Maintains curriculum content across vector, graph, and document datastores enabling rich contextual retrieval and cross-referenced learning materials.
- Curriculum-Aligned Content Creation: Automatically generates lesson content strictly grounded in the curated knowledge base, enhancing relevance and coherence with educational standards and learning objectives.
- Interactive Pedagogical Tools: Supports the generation of multiple forms of educational content including interactive activities, analogies, real-world examples, and hands-on exercises to cater to diverse learner needs and teaching contexts.
- Low-Resource Classroom Support: Designed to empower educators in resource-constrained environments by providing comprehensive lesson planning tools, reducing preparation time, and ensuring access to quality educational materials regardless of infrastructure limitations.
- Dual API Architecture: Features both synchronous FastAPI services for real-time interactions and asynchronous durable functions for complex lesson plan orchestration.
- Interactive Conversational Features:
- Lesson Chat: Context-aware discussions about specific curriculum topics
- Question Paper Generator: Structured assessment creation aligned with educational blueprints
- Edu Chat: General educational queries enhanced with external web search capabilities
- Full-Stack Web Application: Complete frontend and backend implementation for seamless user experience and robust server-side processing.
- Deliverable Generation: Converts validated lesson plans into readily usable classroom materials such as DOCX documents, PowerPoint slides, and student handouts.
- Modular Architecture: Offers independently usable components including:
- Ingestion Pipeline with multiple processing engines (MinerU, SmolDocLing, OlmOCR)
- LLM Task Queue (Deprecated)
- Retrieval-Augmented Generation (RAG) Wrapper
- Translation Model Training and Evaluation Toolkit
- Empower educators across different regions and teaching contexts to create personalized, effective, and inclusive learning experiences.
- Accelerate the lesson planning workflow while maintaining high educational quality.
- Enhance the teacher’s role by freeing up time spent on content generation and enabling more focus on teaching delivery and student engagement.
- Provide researchers and developers access to modular and reusable components of the system to further innovation in education technology.
- Shiksha Copilot is not intended for commercial deployment or mission-critical applications without extensive validation.
- It should not be used in domains that require high-stakes decision-making such as healthcare, law enforcement, or finance, where inaccuracies can lead to serious consequences.
- It should also not be applied in any highly regulated environment unless the components have undergone necessary compliance checks and validations.
- The system is designed exclusively for teacher use with human oversight and is not intended for direct student interaction.
Shiksha Copilot follows a modular architecture with clear separation between offline content processing and online user services:
- Curriculum Ingestion - Processes textbooks and external educational materials into structured knowledge
- Ingestion Pipeline Components - Core text extraction and processing engines
- MinerU Pipeline - Advanced document processing
- SmolDocLing Pipeline - Lightweight document processing
- OlmOCR Pipeline - OCR-based text extraction
- Shiksha Website Frontend - React-based user interface
- Shiksha Website Backend - Server-side application logic
- Shiksha API Services - FastAPI endpoints for real-time interactions
- Durable Functions - Orchestrated lesson plan generation workflows
- RAG Wrapper - Retrieval-augmented generation interface
- LLM Queue -- Deprecated
- Translation Toolkit:
Instructions for backend and frontend deployment of the entire Shiksha Copilot system, as well as individual modules, are provided in dedicated setup guides within the repository.
- We conducted extensive qualitative evaluations involving internal user testing teams and collaborated with educational partners such as the Sikshana Foundation to assess usability and utility in real-world classroom scenarios.
- Azure Content Filtering Integration: The system integrates Azure Content Filtering services to moderate both user inputs and generated outputs, ensuring educational appropriateness, factual correctness, and respectful communication across all interactions.
- Human-in-the-Loop Quality Assurance: The system incorporates human curators in the content ingestion pipeline to ensure educational quality and accuracy before materials enter the knowledge base.
- Multi-Layer Content Filtering: Beyond Azure services, the system employs additional content validation mechanisms and meta prompts to guide the language model toward generating only education-relevant and unbiased content.
- Structured Knowledge Representation: Content is stored across multiple datastore types (vector, graph, document) enabling comprehensive retrieval and cross-validation of educational materials.
- Ongoing system performance is tracked via anonymous telemetry available to the developers of the application.
- Experimental Nature: Shiksha Copilot is a research prototype and has not been extensively tested for production use. All usage should be under human supervision.
- Content Dependency: The quality of generated lessons is directly dependent on the curated knowledge base, which requires ongoing maintenance and updates by human curators.
- Language Support: Primarily tested on English-language inputs and outputs; support for other languages is experimental and should be used cautiously.
- Model Limitations: Outputs from large language models may occasionally include hallucinated facts, speculative content, or biased information. It is crucial that educators carefully review and validate the content before classroom use.
- Content Accuracy: The AI-generated content may contain occasional errors or inaccuracies. Users of Shiksha Copilot should thoroughly validate all content before incorporating it into their classroom teaching materials.
- Model Dependency: The quality and reliability of outputs are inherently tied to the underlying model. Shiksha Copilot currently utilizes the GPT-4o model.
- Infrastructure Requirements: The system requires multiple components (knowledge base, API services, web application) to be properly deployed and maintained for full functionality.
- Security: Developers deploying the tool in open environments must implement/use appropriate security mechanisms like Azure's content moderation.
- Choose model configurations that best suit your application's context—larger models for general-purpose generation or smaller fine-tuned models for domain-specific tasks.
- Leverage platforms like Azure OpenAI (AOAI) that incorporate state-of-the-art safety features and Responsible AI (RAI) policies. Learn more from:
- Ensure ethical and legal sourcing of datasets—obtain proper consent, anonymize personally identifiable data, and secure usage rights for all media or textual content.
- Review privacy, consent, and data usage policies for all components interacting with Shiksha Copilot, including storage and retrieval systems.
- Comply with local and international data protection regulations (e.g., GDPR, FERPA) when deploying this tool in real-world settings.
This repository may contain references to Microsoft trademarks, products, or services. Use of Microsoft trademarks must follow the official Trademark & Brand Guidelines. Unauthorized or misleading use of trademarks, including those of third parties, is prohibited.
Shiksha Copilot is developed with a strong commitment to privacy-by-design principles and ethical considerations in educational technology:
- All user interactions with Shiksha Copilot are treated with strict confidentiality and data minimization practices.
- No personally identifiable information (PII) is stored in the system, and none is required for any of its features to work.
- No educational content generated by AI or edited by teachers is stored in our systems.
- Anonymous telemetry collects only system performance metrics and does not track individual user behavior or content.
- Transparency: We clearly communicate the capabilities and limitations of the system to users.
- Fairness: Content generation algorithms are regularly audited for potential biases across different cultural contexts, subjects, and pedagogical approaches.
- Inclusivity: The system is designed to support diverse learning needs and perspectives, with ongoing improvements to enhance accessibility.
- Agency and Oversight: Teachers maintain complete editorial control over generated content, reinforcing their critical role in educational decision-making.
- Content is generated with age-appropriateness in mind from the beginning, rather than applying filtering post-generation. Users can implement their own additional filters if necessary.
- The system incorporates safeguards against generating harmful, misleading, or culturally insensitive content.
- Any research conducted using anonymized data follows established ethical guidelines for educational research.
- If deployed in research contexts, proper informed consent is obtained from all participants, with clear explanation of data usage.
- Research findings are transparently reported, including both positive outcomes and limitations.
- An internal ethics review process evaluates potential applications and deployment contexts.
- Regular audits assess the system's adherence to these ethical principles and identify areas for improvement.
- We actively seek input from educational experts, ethicists, and stakeholders to continuously refine our approach.
By prioritizing these principles, we aim to ensure that Shiksha Copilot serves as a responsible tool that empowers educators while respecting their autonomy, protecting user privacy, and advancing equitable educational outcomes.
We welcome feedback and collaboration from our audience. If you have suggestions, questions, or observe unexpected/offensive behavior in our technology, please contact us at kchourasia@microsoft.com. If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Microsoft and any contributors grant you a license to the code in this repository under the MIT License. For more details, see the LICENSE file.
Microsoft, Windows, Microsoft Azure, and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
Privacy information can be found at https://go.microsoft.com/fwlink/?LinkId=521839
Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel, or otherwise.
