USFM New Experimental Importer #471
base: main
Implemented a new experimental way to import USFM files (new button in the importers webview) that uses the rebuilding logic of the IDML and other importers I implemented. It still needs to be fully tested, but this first iteration should be mostly functional with most USFM files. (Important: users cannot merge cells; merging will break the rebuilding logic.)
Hope this works!
Walkthrough
A new USFM (Unified Standard Format Markers) import and export system is introduced, comprising webview-based importer components (form, parser, validators, cell aligner, inline marker converter) and a server-side round-trip exporter. Export routing is extended to classify and process USFM files. A file save handler is added to the NewSourceUploader provider.
Sequence Diagrams
sequenceDiagram
participant User as User/Webview
participant Form as UsfmImporterForm
participant Validator as Validator
participant Parser as USFM Parser
participant Aligner as Cell Aligner
participant Notebook as Notebook Builder
User->>Form: Select USFM files
Form->>Validator: validateFile()
Validator->>Validator: Check extension & markers
Validator-->>Form: FileValidationResult
rect rgb(220, 245, 220)
Note over Parser,Notebook: Parse & Build Notebooks
Form->>Parser: parseFile()
Parser->>Parser: Read & parse USFM<br/>Extract metadata
Parser-->>Form: ParsedUsfmDocument
Form->>Notebook: Create source notebook
Form->>Notebook: Create codex notebook
Notebook-->>Form: NotebookPair
end
alt Translation Import
rect rgb(245, 230, 220)
Note over Form,Aligner: Align Content (if intent=target)
Form->>Aligner: usfmCellAligner()
Aligner->>Aligner: Multi-tier matching<br/>(label, ID, verse)
Aligner-->>Form: AlignedCell[]
end
Form->>User: Show alignment preview
User->>Form: Confirm alignment
end
Form->>Form: handleImportCompletion()
Form-->>User: Success notification
sequenceDiagram
participant Handler as exportHandler
participant USFM as USFM Exporter
participant FileSystem as Filesystem
participant Progress as VS Code Progress
Handler->>Progress: Show progress UI
rect rgb(220, 245, 220)
Note over Handler,FileSystem: Process Each USFM File
loop For each file in filesToExport
Handler->>Handler: Read Codex notebook
Handler->>Handler: Determine source USFM<br/>(from metadata/fallback)
Handler->>USFM: Load original USFM
Handler->>Handler: Extract cell content & mappings
Handler->>USFM: exportUsfmRoundtrip()
USFM->>USFM: Build line mappings<br/>Extract translations
USFM->>USFM: Merge translations<br/>Preserve structure
USFM-->>Handler: Updated USFM content
Handler->>FileSystem: Write timestamped file
Handler->>Progress: Update per-file progress
end
end
Handler-->>Handler: Error handling & logging
Progress-->>Handler: Complete
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/exportHandler/exportHandler.ts (1)
1447-1450: Update the supported types list in warning message.
The warning message lists "Supported types: DOCX, IDML, Biblica, PDF" but this is now outdated. USFM, OBS, and TMS are also supported (and PDF is commented out).
 vscode.window.showWarningMessage(
-    `The following files were skipped (unsupported or coming soon):\n${unsupportedList}\n\nSupported types: DOCX, IDML, Biblica, PDF`,
+    `The following files were skipped (unsupported or coming soon):\n${unsupportedList}\n\nSupported types: DOCX, IDML, Biblica, OBS, TMS, USFM`,
     { modal: false }
 );
🧹 Nitpick comments (13)
src/test/suite/integration/project-healing.test.ts (1)
363-365: Type `this` for Mocha context to avoid `noImplicitThis` surprises
If your test TS config enables `noImplicitThis`, consider:
-test("Healing preserves .project directory structure via merge", async function () {
+test("Healing preserves .project directory structure via merge", async function (this: Mocha.Context) {
-    this.timeout(10000); // Increase timeout for file operations
+    this.timeout(10_000); // Increase timeout for file operations
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx (1)
5-13: `supportedExtensions` uppercase entries are dead (extension matching lowercases)
Given `getImporterByExtension()` lowercases the filename extension, `"SFM"`/`"USFM"` will never be matched:
-    supportedExtensions: ["usfm", "sfm", "SFM", "USFM"],
+    supportedExtensions: ["usfm", "sfm"],
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx (2)
42-42: Consider adding proper type annotation for `targetCells`.
The `targetCells` state is typed as `any[]`. Consider using a more specific type from the plugin types (e.g., the cell type from `NotebookPair`) for better type safety.
177-186: Consider removing arbitrary delay before completion.
The 2-second `setTimeout` delay before calling `handleImportCompletion` appears to be for UI feedback, but it could cause issues if the user navigates away or if an error occurs during this window. Consider using a more controlled approach or removing the delay entirely.
-    setTimeout(async () => {
-        try {
-            // For multi-file imports, pass all notebook pairs for batch import
-            const notebooks =
-                notebookPairs.length === 1 ? notebookPairs[0] : notebookPairs;
-            await handleImportCompletion(notebooks, props);
-        } catch (err) {
-            setError(err instanceof Error ? err.message : "Failed to complete import");
-        }
-    }, 2000);
+    try {
+        // For multi-file imports, pass all notebook pairs for batch import
+        const notebooks =
+            notebookPairs.length === 1 ? notebookPairs[0] : notebookPairs;
+        await handleImportCompletion(notebooks, props);
+    } catch (err) {
+        setError(err instanceof Error ? err.message : "Failed to complete import");
+    }
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts (1)
1365-1371: Handle edge case in file extension extraction.
If `fileName` has no extension, `fileName.split('.').pop()` returns the entire filename, not an empty string or `'*'`. This could create an unexpected filter entry.
 const saveUri = await vscode.window.showSaveDialog({
     defaultUri,
     saveLabel: 'Save',
     filters: mime ? {
         'All Files': ['*'],
-        [mime]: [fileName.split('.').pop() || '*']
+        [mime]: [path.extname(fileName).slice(1) || '*']
     } : undefined
 });
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts (2)
10-81: Refactor to eliminate code duplication.
The `convertUsfmInlineMarkersInText` helper function (lines 10-81) is nearly identical to the inline marker processing logic in `convertUsfmInlineMarkersToHtml` (lines 159-228). Consider extracting the shared parsing logic into a single reusable function to improve maintainability.
144-156: Footnote replacement logic is sound but could reuse regex.
The reverse-order replacement correctly preserves string positions. However, `footnoteRegex2` (line 147) duplicates the pattern from `footnoteRegex` (line 92). Consider reusing the same regex with `lastIndex` reset, or extract to a constant.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts (2)
155-159: Type cast bypasses type checking.
The `as any` cast on the metadata object (line 159) bypasses TypeScript's type checking. If `createProcessedCell` has a defined metadata type, consider aligning the `cellMetadata` object with that type or updating the type definition.
461-462: TODO: Footnote extraction not implemented.
The `footnoteCount` is hardcoded to 0 with a TODO comment. While footnotes are converted to HTML inline (via `convertUsfmInlineMarkersToHtml`), the count and extraction for metadata purposes are not implemented. Consider tracking this as a follow-up task.
Would you like me to open an issue to track implementing footnote extraction and counting?
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (1)
101-106: Extensive debug logging may impact performance.
The `console.log` statements (lines 101-106 and others throughout) are helpful for debugging but could impact performance with large USFM files. Consider conditionally enabling these logs based on a debug flag or using a proper logging framework with log levels.
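One way to gate the logging mentioned above is a small logger factory that returns a no-op when debugging is off. This is a minimal sketch, not the PR's code: `createDebugLogger`, the `[USFM Export]` prefix, and the injectable `sink` are all hypothetical names chosen for illustration.

```typescript
// Hypothetical debug-gated logger; names and prefix are assumptions, not part of the PR.
type LogFn = (...args: unknown[]) => void;

function createDebugLogger(enabled: boolean, sink: LogFn = console.log): LogFn {
    // When disabled, hand back a no-op so call sites stay unchanged and cheap.
    if (!enabled) {
        return () => {};
    }
    return (...args: unknown[]) => sink("[USFM Export]", ...args);
}

// Usage: capture messages in an array instead of printing, to show the gating.
const captured: unknown[][] = [];
const quiet = createDebugLogger(false, (...args) => captured.push(args));
const verbose = createDebugLogger(true, (...args) => captured.push(args));

quiet("this message is dropped");
verbose("line mappings built:", 42);
```

Arguments are still evaluated at the call site even when the logger is disabled, so expensive string building would need its own guard.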
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts (2)
21-22: Minor redundancy in extension list.
The `SUPPORTED_EXTENSIONS` array includes both lowercase and uppercase variants (`'sfm'`, `'SFM'`, `'usfm'`, `'USFM'`), but `validateFileExtension` already performs case-insensitive matching. This redundancy is harmless but could be simplified.
-const SUPPORTED_EXTENSIONS = ['usfm', 'sfm', 'SFM', 'USFM'];
+const SUPPORTED_EXTENSIONS = ['usfm', 'sfm'];
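Case-insensitive matching of the kind described above can be sketched as follows. `hasSupportedExtension` is a hypothetical stand-in; the real `validateFileExtension` in `workflowHelpers.ts` is not shown in this review and may differ.

```typescript
// Lowercase-only list, as the review suggests.
const SUPPORTED_EXTENSIONS: readonly string[] = ["usfm", "sfm"];

// Hypothetical helper, not the project's validateFileExtension.
function hasSupportedExtension(fileName: string, supported: readonly string[]): boolean {
    const dot = fileName.lastIndexOf(".");
    // No dot, or only a leading dot, means there is no extension to compare.
    if (dot <= 0) {
        return false;
    }
    // Lowercasing the candidate makes uppercase list entries unnecessary.
    const ext = fileName.slice(dot + 1).toLowerCase();
    return supported.includes(ext);
}

const results = ["GEN.usfm", "MAT.SFM", "notes.txt", "README"].map((name) =>
    hasSupportedExtension(name, SUPPORTED_EXTENSIONS)
);
// results is [true, true, false, false]
```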
99-99: Potential ID collision with `Date.now()`.
Using `Date.now()` for IDs could cause collisions if multiple files are imported within the same millisecond. Consider using a UUID or combining with a random component for uniqueness.
-    id: `usfm-experimental-source-${Date.now()}`,
+    id: `usfm-experimental-source-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`,
src/exportHandler/exportHandler.ts (1)
997-1008: Consider using a more specific type instead of `any`.
The `cellData` is typed as `any`, which loses type safety. Consider defining a proper interface for the cell data structure being built.
-    const codexCells = codexNotebook.cells.map(cell => {
-        const cellData: any = {
+    interface ExportCell {
+        kind: number;
+        value: string;
+        metadata: any;
+        id?: string;
+    }
+    const codexCells = codexNotebook.cells.map(cell => {
+        const cellData: ExportCell = {
             kind: cell.kind,
             value: cell.value,
             metadata: cell.metadata,
         };
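For the `Date.now()` ID nitpick on `index.ts` (line 99) above, a UUID is an alternative to appending a random suffix. This is a minimal sketch assuming a Node runtime; `makeNotebookId` is a hypothetical helper, and webview code would need the browser's `crypto.randomUUID()` (where available) instead of the `node:crypto` import.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: the prefix mirrors the ID shown in the review comment.
function makeNotebookId(prefix: string): string {
    // A random UUID does not depend on the clock, so same-millisecond imports do not clash.
    return `${prefix}-${randomUUID()}`;
}

// Generate many IDs in a tight loop, where Date.now() alone would repeat.
const ids = new Set<string>();
for (let i = 0; i < 1000; i++) {
    ids.add(makeNotebookId("usfm-experimental-source"));
}
```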
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/exportHandler/exportHandler.ts (3 hunks)
src/projectManager/projectExportView.ts (1 hunks)
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts (3 hunks)
src/test/suite/integration/project-healing.test.ts (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx (4 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts (1 hunks)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/audio-file-organization.mdc)
**/*.{ts,tsx}: When programmatically adding audio files, use zero-padding of 3 digits for chapter and verse numbers (e.g., chapter 1 becomes '001', verse 25 becomes '025')
Audio file paths must be validated before conversion to webview URIs, and audio files must be restricted to the workspace .project/attachments/ directory for security
Audio file paths should be converted to webview-compatible URIs using webview.asWebviewUri() for frontend integration
Directory scanning for audio files should look for files matching the pattern {BOOK}{CCC}{VVV}.* in the .project/attachments/{BOOK}/ directory
Audio buttons should only appear in the webview when valid audio files are found and successfully loaded for a cell
Audio elements should be created on-demand to minimize memory usage, with only one audio file playing at a time per cell
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{js,ts,html,htm}
📄 CodeRabbit inference engine (.cursor/rules/audio-recording-permissions.mdc)
Always check browser support for `navigator.mediaDevices` and `getUserMedia` API before attempting to access microphone
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{js,ts}
📄 CodeRabbit inference engine (.cursor/rules/audio-recording-permissions.mdc)
**/*.{js,ts}: Properly clean up media streams by calling `track.stop()` on all tracks after recording is complete to release microphone access
Use `MediaRecorder` API with event handlers (`ondataavailable`, `onstart`, `onstop`) to manage recording state and collect audio data
Revoke object URLs created with `URL.createObjectURL()` using `URL.revokeObjectURL()` to prevent memory leaks when audio data is no longer needed
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (.cursor/rules/migrating-webviews-to-shadcn.mdc)
Use relative paths instead of import aliases for ShadCN component imports (e.g., `../components/ui/button` rather than aliased paths)
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{tsx,ts}?(@(component|page))
📄 CodeRabbit inference engine (.cursor/rules/shadcn-cell-editor.mdc)
Use "use client" directive at the top of React components in a Vite + React + TypeScript VSCode webview environment
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{tsx,ts}
📄 CodeRabbit inference engine (.cursor/rules/shadcn-cell-editor.mdc)
**/*.{tsx,ts}: Import ShadCN UI components from '@/components/ui/' path alias for Button, Card, Tabs, Textarea, Progress, Separator, and Tooltip components
Import icons from 'lucide-react' library for UI icons
Define component prop interfaces using TypeScript interface syntax with descriptive prop names
Use React.useState hook for state management with proper type annotations where necessary
Use async/await for asynchronous operations like API calls and file operations
Structure TabsContent components with proper border and padding classes for consistent styling
Use aria-label attributes on icon buttons for accessibility
Use conditional rendering with ternary operators and helper functions (like getTranscriptionAreaContent()) for complex UI states
Files:
src/projectManager/projectExportView.ts
src/test/suite/integration/project-healing.test.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts
src/exportHandler/exportHandler.ts
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
**/*.{jsx,tsx}
📄 CodeRabbit inference engine (.cursor/rules/migrating-webviews-to-shadcn.mdc)
**/*.{jsx,tsx}: Migrate `VSCodeButton` to `Button` from `../components/ui/button` with appearance mapping: `appearance="icon"` → `variant="outline"`, `appearance="secondary"` → `variant="secondary"`, `appearance="primary"` or no appearance → `variant="default"`
Remove VSCode-specific props like `appearance` when migrating to ShadCN components, and use the `variant` prop instead
Migrate `VSCodeBadge` to `Badge` from `../components/ui/badge`
Migrate `VSCodeCard` to `Card, CardContent, CardHeader` etc. from `../components/ui/card`
Use `cn()` utility from `../lib/utils` for conditional className merging in ShadCN components
Preserve accessibility attributes (aria-labels, titles, etc.) when migrating from VSCode to ShadCN components
Files:
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
🧠 Learnings (4)
📚 Learning: 2025-12-12T00:01:22.734Z
Learnt from: CR
Repo: genesis-ai-dev/codex-editor PR: 0
File: .cursor/rules/migrating-webviews-to-shadcn.mdc:0-0
Timestamp: 2025-12-12T00:01:22.734Z
Learning: Applies to **/*.{jsx,tsx} : Migrate `VSCodeCard` to `Card, CardContent, CardHeader` etc. from `../components/ui/card`
Applied to files:
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
📚 Learning: 2025-12-12T00:01:35.322Z
Learnt from: CR
Repo: genesis-ai-dev/codex-editor PR: 0
File: .cursor/rules/types.mdc:0-0
Timestamp: 2025-12-12T00:01:35.322Z
Learning: Applies to types/index.d.ts : When passing messages between webviews and providers, update the correct type in index.d.ts
Applied to files:
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts
📚 Learning: 2025-12-12T00:01:31.825Z
Learnt from: CR
Repo: genesis-ai-dev/codex-editor PR: 0
File: .cursor/rules/shadcn-cell-editor.mdc:0-0
Timestamp: 2025-12-12T00:01:31.825Z
Learning: Applies to **/*.{tsx,ts} : Import icons from 'lucide-react' library for UI icons
Applied to files:
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
📚 Learning: 2025-12-12T00:01:31.825Z
Learnt from: CR
Repo: genesis-ai-dev/codex-editor PR: 0
File: .cursor/rules/shadcn-cell-editor.mdc:0-0
Timestamp: 2025-12-12T00:01:31.825Z
Learning: Applies to **/*.{tsx,ts} : Import ShadCN UI components from '@/components/ui/' path alias for Button, Card, Tabs, Textarea, Progress, Separator, and Tooltip components
Applied to files:
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
🧬 Code graph analysis (8)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx (4)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts (3)
    usfmExperimentalImporter (174-192), validateFile (27-65), parseFile (73-172)
  webviews/codex-webviews/src/NewSourceUploader/types/plugin.ts (4)
    ImporterComponentProps (204-251), AlignedCell (35-42), ImportedContent (23-30), CellAligner (47-51)
  webviews/codex-webviews/src/NewSourceUploader/types/common.ts (2)
    ImportProgress (34-38), NotebookPair (29-32)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts (1)
    usfmCellAligner (14-210)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx (1)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx (1)
    UsfmImporterForm (30-408)
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts (2)
  webviews/codex-webviews/src/NewSourceUploader/types/plugin.ts (1)
    SaveFileMessage (353-358)
  src/test/manual-structure-test.js (2)
    vscode (6-6), workspaceFolder (12-12)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts (4)
  webviews/codex-webviews/src/NewSourceUploader/types/common.ts (5)
    FileValidationResult (49-56), ProgressCallback (81-81), ImportResult (40-47), ProcessedNotebook (17-27), NotebookPair (29-32)
  webviews/codex-webviews/src/NewSourceUploader/utils/workflowHelpers.ts (2)
    validateFileExtension (78-84), createProgress (12-20)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts (1)
    parseUsfmFile (42-466)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (1)
    exportUsfmRoundtrip (26-392)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (1)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts (1)
    htmlInlineToUsfm (232-444)
webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx (1)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.tsx (1)
    usfmExperimentalImporterPlugin (5-13)
src/exportHandler/exportHandler.ts (2)
  src/test/manual-structure-test.js (2)
    vscode (6-6), originalFileUri (57-57)
  webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (1)
    exportUsfmRoundtrip (26-392)
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts (2)
  webviews/codex-webviews/src/NewSourceUploader/types/plugin.ts (3)
    CellAligner (47-51), ImportedContent (23-30), AlignedCell (35-42)
  src/utils/editMapUtils.ts (1)
    cellLabel (48-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Node 20 • macos-latest
- GitHub Check: Node 20 • windows-latest
- GitHub Check: Node 20 • ubuntu-latest
🔇 Additional comments (17)
src/projectManager/projectExportView.ts (1)
337-345: UI update for Rebuild Export formats looks consistent
Nice touch adding USFM to the “original format” description and the visible tag list.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx (2)
48-73: LGTM!
The `handleFileSelect` callback is well-implemented with proper error handling for file previews and sensible limits on preview size.
247-407: LGTM!
The JSX rendering uses ShadCN components correctly with proper accessibility attributes (labels, aria associations). The UI provides good feedback during processing and error states.
src/providers/NewSourceUploader/NewSourceUploaderProvider.ts (1)
320-326: LGTM!
The new `saveFile` command handler follows the established pattern in the message router, with proper type casting and delegation to the handler method.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmInlineMapper.ts (2)
337-438: Regex fallback handles nested structures iteratively with recursion.
The fallback regex approach uses iteration with a maximum limit (20 iterations) to handle nested tags progressively. However, the recursive calls to `htmlInlineToUsfm` within the iteration (lines 394, 402, 428) could potentially cause deep recursion if content is malformed. This is mitigated by the `maxIterations` limit on the outer loop, but consider adding a depth parameter for safety with deeply nested malicious input.
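A depth parameter of the kind suggested above might look like the following. This is a simplified sketch, not the PR's `htmlInlineToUsfm`: the function name, the two-entry tag table, the default limit of 10, and the strip-tags fallback are assumptions made for illustration.

```typescript
// Hypothetical tag table; \bd and \it are USFM character markers for bold and italic.
const TAG_TO_MARKER: Record<string, string> = { b: "bd", i: "it" };

function inlineHtmlToUsfmSketch(html: string, maxDepth = 10, depth = 0): string {
    // Past the limit, stop recursing and strip any remaining tags instead of descending further.
    if (depth >= maxDepth) {
        return html.replace(/<[^>]*>/g, "");
    }
    // Convert each <b>…</b> or <i>…</i> pair, recursing into its content with depth + 1.
    return html.replace(
        /<(b|i)>([\s\S]*?)<\/\1>/g,
        (_match: string, tag: string, inner: string) => {
            const marker = TAG_TO_MARKER[tag];
            return `\\${marker} ${inlineHtmlToUsfmSketch(inner, maxDepth, depth + 1)}\\${marker}*`;
        }
    );
}

const nested = inlineHtmlToUsfmSketch("<b>bold <i>both</i></b>");
// nested is "\bd bold \it both\it*\bd*"
const capped = inlineHtmlToUsfmSketch("<b>a <i>b</i></b>", 1);
// capped is "\bd a b\bd*" because the inner call hits the depth limit
```

The explicit counter bounds the call stack independently of any iteration limit on an outer loop.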
232-335: LGTM!
The DOMParser-based approach for HTML→USFM conversion is well-implemented with proper handling of footnotes, inline markers, and nested structures. The fallback to regex ensures compatibility in Node.js contexts.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmParser.ts (3)
12-32: LGTM!
The `ParsedUsfmDocument` interface is well-structured with all necessary fields for round-trip export support. The `lineMappings` type provides good structure for tracking source line to cell relationships.
42-74: LGTM!
The function setup properly initializes all tracking state for parsing, including multi-line verse handling with break tags. The `versesOnly` parameter enables target import mode that skips non-verse content.
166-432: LGTM!
The main parsing loop correctly handles the USFM format including:
- Multi-line verses with break tags (`\li1`, `\q1`, `\b`)
- Header content assigned to chapter 1
- Continuation lines without markers
- The `versesOnly` mode for target imports
The logic preserves structure metadata for round-trip export.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts (3)
10-34: LGTM!
The `LineMapping` interface is consistent with the parser's output, and the function signature with overloaded parameter handling provides good backward compatibility for existing importers.
230-351: Complex multi-line verse handling.
The logic for mapping `<br>`-separated translation parts back to USFM break lines is necessarily complex to support round-trip export. The implementation correctly:
- Preserves original break markers (`\li1`, `\q1`, `\b`)
- Uses metadata to track break tag order
- Handles cases where translation has more/fewer parts than original
This is a critical piece for maintaining USFM structure during round-trip.
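The `<br>`-to-break-line mapping described above can be illustrated with a small sketch. It is not the exporter's actual logic: `mergeTranslationParts`, the policy of folding surplus parts into the last line, and the policy of leaving a bare marker when a part is missing are all assumptions.

```typescript
// Assumed shape: one verse line followed by break-marker lines (\q1, \li1, \b) in original order.
function mergeTranslationParts(
    verseMarker: string,
    breakMarkers: string[],
    translationHtml: string
): string[] {
    const parts = translationHtml.split(/<br\s*\/?>/i).map((p) => p.trim());
    const slots = breakMarkers.length + 1; // the verse line plus one line per break marker

    // Assumption: surplus parts are folded into the last available line.
    if (parts.length > slots) {
        const extra = parts.splice(slots - 1).join(" ");
        parts.push(extra);
    }

    const lines = [`${verseMarker} ${parts[0] ?? ""}`.trimEnd()];
    breakMarkers.forEach((marker, i) => {
        // Assumption: a missing part leaves the original marker in place with no text.
        const text = parts[i + 1] ?? "";
        lines.push(text ? `${marker} ${text}` : marker);
    });
    return lines;
}

const more = mergeTranslationParts("\\v 1", ["\\q1", "\\q2"], "one<br>two<br/>three<br>four");
// more is ["\v 1 one", "\q1 two", "\q2 three four"]
const fewer = mergeTranslationParts("\\v 1", ["\\q1", "\\b"], "one<br>two");
// fewer is ["\v 1 one", "\q1 two", "\b"]
```

In both cases the original markers keep their order, which is the property the review highlights for round-trip export.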
389-391: LGTM!
The export function correctly reconstructs USFM content by preserving markers and structure while substituting translated text. The summary logging provides useful metrics for debugging import/export cycles.
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/index.ts (3)
27-65: LGTM!
The validation function has appropriate checks for extension, file size, and basic USFM marker presence. Error handling is in place for file read failures.
73-172: LGTM!
The parsing logic correctly reads the file, delegates to the USFM parser, and constructs properly structured notebook pairs with metadata for round-trip export support. Style cells are preserved in the codex notebook while text cells are cleared for translation.
174-192: LGTM!
The plugin definition properly implements the `ImporterPlugin` interface with round-trip export support. The dynamic import for the exporter and the fallback path for imports without `lineMappings` are well-handled.
src/exportHandler/exportHandler.ts (2)
856-1054: LGTM overall - robust USFM round-trip export implementation.The function handles multiple fallback paths for locating the original USFM file, supports both lineMappings-based and fallback export modes, and includes comprehensive debug logging. The error handling per-file allows partial success when exporting multiple files.
1409-1423: LGTM!
The USFM export block follows the established pattern for other format exports in `exportCodexContentAsRebuild`, with proper error handling and progress reporting.
const exportFolder = vscode.Uri.file(userSelectedPath);
Missing directory creation for export folder.
Unlike other export functions in this file (e.g., exportCodexContentAsIdmlRoundtrip, exportCodexContentAsDocxRoundtrip), this function does not call vscode.workspace.fs.createDirectory(exportFolder). This could cause the export to fail if the directory doesn't exist.
 const exportFolder = vscode.Uri.file(userSelectedPath);
+await vscode.workspace.fs.createDirectory(exportFolder);
🤖 Prompt for AI Agents
In src/exportHandler/exportHandler.ts around lines 871 to 872, the exportFolder
Uri is created but the directory is not ensured to exist; add a call to await
vscode.workspace.fs.createDirectory(exportFolder) (using the same pattern as
exportCodexContentAsIdmlRoundtrip/exportCodexContentAsDocxRoundtrip) immediately
after constructing exportFolder so the folder is created before any write
operations, and propagate or handle any errors as the surrounding function does
for other export flows.
} else if (
    corpusMarker === 'usfm' ||
    importerType === 'usfm-experimental' ||
    importerType === 'usfm' ||
    // Also check for NT/OT corpus markers with USFM file extensions (Bible books imported as USFM)
    ((corpusMarker === 'NT' || corpusMarker === 'OT') &&
        originalFileName &&
        (originalFileName.endsWith('.usfm') || originalFileName.endsWith('.sfm') || originalFileName.endsWith('.USFM') || originalFileName.endsWith('.SFM'))) ||
    (originalFileName && (originalFileName.endsWith('.usfm') || originalFileName.endsWith('.sfm') || originalFileName.endsWith('.USFM') || originalFileName.endsWith('.SFM')))
) {
    // USFM files use the USFM round-trip exporter
    filesByType['usfm'] = filesByType['usfm'] || [];
    filesByType['usfm'].push(filePath);
🧩 Analysis chain
🏁 Script executed:
# First, let's check the directory structure and find importer files
fd -t f "importer" webviews/codex-webviews/src/NewSourceUploader/ --type f
Repository: genesis-ai-dev/codex-editor
Length of output: 1666
🏁 Script executed:
# Search for .usfm/.sfm references in importers
rg -n "\.usfm|\.sfm" webviews/codex-webviews/src/NewSourceUploader/ -t ts
Repository: genesis-ai-dev/codex-editor
Length of output: 2302
🏁 Script executed:
# Get the broader context around lines 1277-1289 in exportHandler.ts
cat -n src/exportHandler/exportHandler.ts | sed -n '1260,1310p'
Repository: genesis-ai-dev/codex-editor
Length of output: 3876
🏁 Script executed:
# Check how corpusMarker and importerType are set in metadata
rg -n "corpusMarker\s*=" src/exportHandler/ --type ts -B 2 -A 2
Repository: genesis-ai-dev/codex-editor
Length of output: 10483
🏁 Script executed:
# Check Paratext importer to see how it handles USFM files and metadata
head -100 webviews/codex-webviews/src/NewSourceUploader/importers/paratext/index.ts
Repository: genesis-ai-dev/codex-editor
Length of output: 203
🏁 Script executed:
# Look for where file metadata is created/stored
rg -n "importerType|corpusMarker" src/exportHandler/exportHandler.ts --type ts -B 1 -A 1 | head -80
Repository: genesis-ai-dev/codex-editor
Length of output: 5654
🏁 Script executed:
# Find Paratext importer files
find webviews/codex-webviews/src/NewSourceUploader/importers/paratext -type f -name "*.ts" -o -name "*.tsx"
Repository: genesis-ai-dev/codex-editor
Length of output: 310
🏁 Script executed:
# Search for where corpusMarker is assigned (set, not just read)
rg -n "corpusMarker\s*:" --type ts -B 2 -A 2 | head -150
Repository: genesis-ai-dev/codex-editor
Length of output: 13353
🏁 Script executed:
# Check for importerType assignment in metadata
rg -n "importerType\s*:" --type ts -B 2 -A 2 | head -150
Repository: genesis-ai-dev/codex-editor
Length of output: 15770
🏁 Script executed:
# Check Paratext importer file
cat webviews/codex-webviews/src/NewSourceUploader/importers/paratext/index.tsx
Repository: genesis-ai-dev/codex-editor
Length of output: 861
🏁 Script executed:
# Get the full Paratext parser to see metadata assignments
cat webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts | head -300
Repository: genesis-ai-dev/codex-editor
Length of output: 11910
🏁 Script executed:
# Search for corpusMarker and importerType assignments in paratext parser
rg -n "corpusMarker|importerType" webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts -B 2 -A 2
Repository: genesis-ai-dev/codex-editor
Length of output: 53
🏁 Script executed:
# Search for the paratextImporter function and its return/metadata setup
rg -n "paratextImporter|notebookPair|createNotebookPair" webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts -B 2 -A 5
Repository: genesis-ai-dev/codex-editor
Length of output: 2849
🏁 Script executed:
# Get the end of the parser file where the main export function likely is
tail -200 webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts
Repository: genesis-ai-dev/codex-editor
Length of output: 9081
🏁 Script executed:
# Find and examine createNotebookPair function
rg -n "export.*createNotebookPair|function createNotebookPair" webviews/codex-webviews/src/NewSourceUploader/importers/common/usfmUtils.ts -A 50 | head -100
Repository: genesis-ai-dev/codex-editor
Length of output: 1725
🏁 Script executed:
# Check what originalFileName is set to in Paratext importer
rg -n "originalFileName" webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts -B 2 -A 2
Repository: genesis-ai-dev/codex-editor
Length of output: 53
🏁 Script executed:
# Also check usfmUtils to see what getCorpusMarkerForBook returns
rg -n "getCorpusMarkerForBook" webviews/codex-webviews/src/NewSourceUploader/importers/common/usfmUtils.ts -A 15 | head -50
Repository: genesis-ai-dev/codex-editor
Length of output: 1372
🏁 Script executed:
# Find getCorpusMarkerForBook in corpusUtils
rg -n "export.*getCorpusMarkerForBook|function getCorpusMarkerForBook" webviews/codex-webviews/src/utils/corpusUtils.ts -A 20
Repository: genesis-ai-dev/codex-editor
Length of output: 159
🏁 Script executed:
# Also check what originalFileName is set to in createNotebookPair - it's baseName
# Let's verify baseName for Paratext is the book code by looking at the Paratext parser call again
cat webviews/codex-webviews/src/NewSourceUploader/importers/paratext/parser.ts | sed -n '437,465p'
Repository: genesis-ai-dev/codex-editor
Length of output: 1613
🏁 Script executed:
# Search for getCorpusMarkerForBook across the entire codebase
rg -n "getCorpusMarkerForBook" webviews/ src/ --type ts -A 10 | head -100
Repository: genesis-ai-dev/codex-editor
Length of output: 10410
🏁 Script executed:
# Find and examine getCorpusMarkerForBook in sharedUtils
cat sharedUtils/corpusUtils.ts | head -200
Repository: genesis-ai-dev/codex-editor
Length of output: 3202
🏁 Script executed:
# Check if there's a Paratext exporter in exportHandler
rg -n "paratext|Paratext" src/exportHandler/exportHandler.ts -B 2 -A 2
Repository: genesis-ai-dev/codex-editor
Length of output: 1544
🏁 Script executed:
# Check what file types are actually supported in rebuild export
rg -n "filesByType\[" src/exportHandler/exportHandler.ts | grep -A 1 -B 1 "1277\|1289\|1290"
Repository: genesis-ai-dev/codex-editor
Length of output: 365
🏁 Script executed:
# Check USFM importer to see what originalFileName it sets
rg -n "originalFileName|createNotebookPair" webviews/codex-webviews/src/NewSourceUploader/importers/usfm/index.ts -B 2 -A 2 | head -60
Repository: genesis-ai-dev/codex-editor
Length of output: 547
🏁 Script executed:
# Also check if there are any other places where files get .usfm extension in originalFileName
rg -n "originalFileName.*usfm|\.usfm.*originalFileName" webviews/ src/ --type ts
Repository: genesis-ai-dev/codex-editor
Length of output: 980
🏁 Script executed:
# Check all places where originalFileName is set or modified
rg -n "originalFileName\s*=" webviews/codex-webviews/src/NewSourceUploader/importers/ --type ts -B 2 -A 2 | head -150
Repository: genesis-ai-dev/codex-editor
Length of output: 1010
Line 1285 serves as a valid fallback but relies on filename extension as its sole criterion.
The standalone originalFileName check at line 1285 is intentionally broad to catch edge cases where files lack corpusMarker or importerType metadata. However, it assumes that any file with a .usfm/.sfm extension in originalFileName should route to the USFM exporter, regardless of which importer created it.
While this works for USFM-imported files (which strip extensions from originalFileName in createNotebookPair), and Paratext files won't match since they use book codes without extensions, the condition could theoretically misroute files from importers that preserve extensions in originalFileName (e.g., Biblica). If such an importer ever produces a file with .usfm in its originalFileName, it would incorrectly route to the USFM exporter instead of its proper handler.
Consider whether this fallback should require additional validation (e.g., checking importerType is not set to a non-USFM value) to prevent unintended file routing.
🤖 Prompt for AI Agents
In src/exportHandler/exportHandler.ts around lines 1277 to 1289, the fallback
condition at line 1285 routes any file whose originalFileName ends with
.usfm/.sfm to the USFM exporter even when importerType might indicate a
different importer; narrow this fallback by adding an extra check so it only
applies when importerType is absent/empty or explicitly one of the USFM
importers (e.g., 'usfm' or 'usfm-experimental') — alternatively, implement a
negative check that excludes known non-USFM importers (blacklist) before routing
to the USFM exporter; update the conditional accordingly and add a short comment
explaining the reason for the stricter fallback.
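The stricter fallback described above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the `FileMetadata` shape and the `shouldUseUsfmExporter` and `hasUsfmExtension` helpers are hypothetical names, the NT/OT clause is folded into the extension check for brevity, and the case-insensitive regex is slightly broader than the four explicit suffixes in the original condition.

```typescript
// Hypothetical metadata shape; the field names mirror those in the review comment.
interface FileMetadata {
    corpusMarker?: string;
    importerType?: string;
    originalFileName?: string;
}

// Importer types that are known to produce USFM round-trip data.
const USFM_IMPORTER_TYPES = ["usfm", "usfm-experimental"];

// True when the file name ends in .usfm or .sfm (any letter case).
function hasUsfmExtension(fileName?: string): boolean {
    return typeof fileName === "string" && /\.(usfm|sfm)$/i.test(fileName);
}

function shouldUseUsfmExporter(meta: FileMetadata): boolean {
    if (meta.corpusMarker === "usfm") {
        return true;
    }
    if (meta.importerType !== undefined && USFM_IMPORTER_TYPES.indexOf(meta.importerType) !== -1) {
        return true;
    }
    // Extension-only fallback: apply it only when no importer has claimed the file,
    // so a non-USFM importer that keeps ".usfm" in originalFileName is not misrouted.
    const importerUnknown = meta.importerType === undefined || meta.importerType === "";
    return importerUnknown && hasUsfmExtension(meta.originalFileName);
}
```

With this shape, a file tagged with another importer type keeps its own export route even if its original file name ends in .usfm.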
| // Show save dialog | ||
| const workspaceFolder = vscode.workspace.workspaceFolders?.[0]; | ||
| const defaultUri = workspaceFolder | ||
| ? vscode.Uri.joinPath(workspaceFolder.uri, fileName) | ||
| : undefined; |
Sanitize fileName to prevent path traversal in default URI.
The fileName from the webview message is used directly in constructing defaultUri. While the VS Code save dialog ultimately controls where the file is saved, a malicious or malformed fileName containing path components (e.g., ../../../etc/passwd) could set an unexpected default location, potentially confusing users or exposing directory structure.
+ // Sanitize fileName to prevent path traversal
+ const sanitizedFileName = path.basename(fileName);
+
// Show save dialog
const workspaceFolder = vscode.workspace.workspaceFolders?.[0];
const defaultUri = workspaceFolder
- ? vscode.Uri.joinPath(workspaceFolder.uri, fileName)
+ ? vscode.Uri.joinPath(workspaceFolder.uri, sanitizedFileName)
: undefined;
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/providers/NewSourceUploader/NewSourceUploaderProvider.ts around lines
1356 to 1360, the fileName from the webview is used directly to build defaultUri
which allows path traversal; sanitize the incoming fileName before joining it to
the workspace path by extracting only a safe basename (strip any path separators
and parent references like "../" or backslashes), reject or replace unsafe
characters, enforce a reasonable max length, and fallback to a fixed safe
defaultName if validation fails; then pass that sanitized name to
vscode.Uri.joinPath so the save dialog default cannot point outside the intended
workspace.
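One way the sanitization steps listed above might look, written without Node's path module so it also behaves the same for backslash-separated input. The function name `sanitizeFileName`, the fallback name, and the length limit are assumptions made for illustration only.

```typescript
// Hypothetical defaults; the real provider may prefer different values.
const FALLBACK_FILE_NAME = "export.usfm";
const MAX_FILE_NAME_LENGTH = 255;

function sanitizeFileName(input: unknown): string {
    if (typeof input !== "string") {
        return FALLBACK_FILE_NAME;
    }
    // Keep only the last path segment, treating both "/" and "\" as separators.
    const segments = input.split(/[\\/]/);
    const base = segments[segments.length - 1];
    // Drop control characters and characters that are invalid on common file systems.
    const cleaned = base.replace(/[\u0000-\u001f<>:"|?*]/g, "").trim();
    const unsafe =
        cleaned === "" ||
        cleaned === "." ||
        cleaned === ".." ||
        cleaned.length > MAX_FILE_NAME_LENGTH;
    return unsafe ? FALLBACK_FILE_NAME : cleaned;
}
```

The sanitized name would then be the value passed to vscode.Uri.joinPath, so the dialog's default location stays inside the workspace folder.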
| import { usfmImporterPlugin } from "./usfm/index.tsx"; // Original USFM importer | ||
| import { usfmExperimentalImporterPlugin } from "./usfm/experimental/index.tsx"; // Experimental round-trip importer (standalone with headers in chapter 1) | ||
| import { ebibleDownloadImporterPlugin } from "./ebibleCorpus/index.tsx"; |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, let's find and examine the registry.tsx file
fd -name "registry.tsx" webviews/codex-webviews/src/NewSourceUploader/importers
# Then examine its contents
cat webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
Repository: genesis-ai-dev/codex-editor
Length of output: 301
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the registry.tsx file
find webviews/codex-webviews/src/NewSourceUploader/importers -name "registry.tsx" -type f
# Read the file
cat webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
Repository: genesis-ai-dev/codex-editor
Length of output: 9256
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for plugin ID definitions across all importer plugins
rg 'id:\s*["\']' webviews/codex-webviews/src/NewSourceUploader/importers/ --type ts --type tsx -A 0 | head -50
Repository: genesis-ai-dev/codex-editor
Length of output: 139
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for plugin ID definitions across all importer plugins
rg 'id:' webviews/codex-webviews/src/NewSourceUploader/importers/ --type ts --type tsx -B 1 -A 1
Repository: genesis-ai-dev/codex-editor
Length of output: 98
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for plugin ID definitions across all importer plugins
rg 'id:' webviews/codex-webviews/src/NewSourceUploader/importers/ -B 1 -A 1 | head -100
Repository: genesis-ai-dev/codex-editor
Length of output: 9136
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find all index.tsx files in importer directories and extract their IDs
find webviews/codex-webviews/src/NewSourceUploader/importers -name "index.tsx" -type f | while read file; do
echo "=== $file ==="
rg 'id:\s*"[^"]*"' "$file" | head -1
done
Repository: genesis-ai-dev/codex-editor
Length of output: 347
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Extract all plugin IDs from index.tsx files
echo "Extracting all plugin IDs..."
find webviews/codex-webviews/src/NewSourceUploader/importers -name "index.tsx" -type f | sort | while read file; do
id=$(rg 'id:\s*"[^"]*"' "$file" | head -1 | sed 's/.*id:\s*"\([^"]*\)".*/\1/')
if [ -n "$id" ]; then
echo "$id|$file"
fi
done | sort > /tmp/plugin_ids.txt
echo "=== All Plugin IDs ==="
cat /tmp/plugin_ids.txt
echo ""
echo "=== Checking for Duplicates ==="
cut -d'|' -f1 /tmp/plugin_ids.txt | sort | uniq -d | while read dup_id; do
echo "DUPLICATE: $dup_id"
grep "^$dup_id|" /tmp/plugin_ids.txt
done
Repository: genesis-ai-dev/codex-editor
Length of output: 99
🏁 Script executed:
#!/bin/bash
# List all index.tsx files
echo "=== Found index.tsx files ==="
find webviews/codex-webviews/src/NewSourceUploader/importers -name "index.tsx" -type f | sort
echo ""
echo "=== Extracting plugin IDs ==="
# Extract IDs with proper error handling
for file in $(find webviews/codex-webviews/src/NewSourceUploader/importers -name "index.tsx" -type f | sort); do
result=$(rg 'id:\s*"[^"]*"' "$file" 2>/dev/null || true)
if [ -n "$result" ]; then
id=$(echo "$result" | head -1 | sed 's/.*id:\s*"\([^"]*\)".*/\1/')
echo "$id | $file"
else
echo "NO_ID_FOUND | $file"
fi
done
Repository: genesis-ai-dev/codex-editor
Length of output: 3254
🏁 Script executed:
#!/bin/bash
echo "=== Checking biblica-swapper plugin ==="
find webviews/codex-webviews/src/NewSourceUploader/importers -path "*biblica*" -name "*.tsx" | head -10
echo ""
echo "=== Contents of biblica/index.tsx ==="
cat webviews/codex-webviews/src/NewSourceUploader/importers/biblica/index.tsx | head -40
echo ""
echo "=== Contents of indesign/index.tsx ==="
cat webviews/codex-webviews/src/NewSourceUploader/importers/indesign/index.tsx | head -40
echo ""
echo "=== Contents of pdf/index.tsx ==="
cat webviews/codex-webviews/src/NewSourceUploader/importers/pdf/index.tsx | head -40
Repository: genesis-ai-dev/codex-editor
Length of output: 2305
🏁 Script executed:
#!/bin/bash
echo "=== Searching for biblicaSwapperImporterPlugin ==="
rg 'biblicaSwapperImporterPlugin' webviews/codex-webviews/src/NewSourceUploader/importers/ -l
echo ""
echo "=== Checking if biblica-swapper directory exists ==="
ls -la webviews/codex-webviews/src/NewSourceUploader/importers/ | grep biblica
echo ""
echo "=== Contents of audio2/index.tsx ==="
cat webviews/codex-webviews/src/NewSourceUploader/importers/audio2/index.tsx | head -50
Repository: genesis-ai-dev/codex-editor
Length of output: 1139
🏁 Script executed:
#!/bin/bash
echo "=== Searching for biblica-swapper anywhere in codebase ==="
find webviews/codex-webviews -type d -name "*biblica*"
echo ""
echo "=== Checking registry.tsx line 30 (biblicaSwapperImporterPlugin import) ==="
sed -n '25,35p' webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
echo ""
echo "=== Checking all audio directories ==="
find webviews/codex-webviews/src/NewSourceUploader/importers -type d -name "audio*"
echo ""
echo "=== Checking contents of audio directory ==="
ls -la webviews/codex-webviews/src/NewSourceUploader/importers/audio/
echo ""
echo "=== Checking contents of audio2 directory ==="
ls -la webviews/codex-webviews/src/NewSourceUploader/importers/audio2/
Repository: genesis-ai-dev/codex-editor
Length of output: 1655
🏁 Script executed:
#!/bin/bash
echo "=== Contents of audio/index.tsx ==="
cat webviews/codex-webviews/src/NewSourceUploader/importers/audio/index.tsx
echo ""
echo "=== Checking if both audio plugins are imported in registry ==="
rg 'audioImporterPlugin|audio2' webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx
Repository: genesis-ai-dev/codex-editor
Length of output: 1143
Remove broken import and fix duplicate plugin IDs
The registry has two critical issues:
- Missing module: biblicaSwapperImporterPlugin, imported from ./biblica-swapper/index.tsx, does not exist. Remove this import and its registry entry (lines 30, 158-163), or create the missing module.
- Duplicate plugin ID: Both audio/index.tsx and audio2/index.tsx export plugins with id: "audio", which breaks getImporterById() lookups. Either remove the unused audio2 directory or rename its plugin ID to something unique.
🤖 Prompt for AI Agents
In webviews/codex-webviews/src/NewSourceUploader/importers/registry.tsx lines
19-21 (and also review lines ~30 and 158-163), remove the broken import and
registry entry for biblicaSwapperImporterPlugin (import from
./biblica-swapper/index.tsx does not exist) by deleting its import and any place
it is added to the registry, or alternatively create the missing
./biblica-swapper/index.tsx module that exports the plugin; also fix the
duplicate plugin ID collision between audio/index.tsx and audio2/index.tsx by
either removing the unused audio2 importer or changing its exported plugin id to
a unique value (and update any references) so getImporterById() lookups are
unambiguous.
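A duplicate-ID collision like the one described above can be caught early with a small check over the registered plugins. The sketch below is illustrative only: `ImporterPluginLike` and `findDuplicateIds` are hypothetical names, and the real plugin type in the registry carries more fields than shown here.

```typescript
// Hypothetical minimal plugin shape; only the id matters for this check.
interface ImporterPluginLike {
    id: string;
    name: string;
}

// Returns every id that appears more than once, in first-collision order.
function findDuplicateIds(plugins: ImporterPluginLike[]): string[] {
    const seen = new Set<string>();
    const duplicates: string[] = [];
    for (const plugin of plugins) {
        if (seen.has(plugin.id) && duplicates.indexOf(plugin.id) === -1) {
            duplicates.push(plugin.id);
        }
        seen.add(plugin.id);
    }
    return duplicates;
}

// Example registry contents; the names are placeholders.
const exampleRegistry: ImporterPluginLike[] = [
    { id: "audio", name: "Audio" },
    { id: "audio", name: "Audio (second copy)" },
    { id: "usfm-experimental", name: "USFM Experimental" },
];
const duplicateIds = findDuplicateIds(exampleRegistry);
```

Running a check of this kind in a unit test, or once at registry construction, would surface the collision before a lookup by id silently returns the wrong plugin.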
| // Track which target cells have been matched | ||
| const matchedTargetCells = new Set<any>(); | ||
|
|
||
| // Process each imported content item | ||
| // Only match verses to existing target cells - don't create new cells | ||
| for (const importedItem of importedContent) { | ||
| if (!importedItem.content.trim()) { | ||
| continue; // Skip empty content | ||
| } | ||
|
|
||
| const importedId = importedItem.id; | ||
| let matchedCell: any | null = null; | ||
| let alignmentMethod: AlignedCell['alignmentMethod'] = 'custom'; | ||
| let confidence = 0.0; | ||
|
|
||
| // Strategy 1: PRIORITIZE cellLabel matching (most reliable for verse matching) | ||
| // Check both importedItem.cellLabel and importedItem.metadata?.cellLabel | ||
| const cellLabel = importedItem.cellLabel || (importedItem as any).metadata?.cellLabel; | ||
| if (cellLabel) { | ||
| const labelStr = String(cellLabel).trim(); | ||
| const normalizedLabel = labelStr.toUpperCase(); | ||
|
|
||
| if (targetCellsByLabel.has(labelStr)) { | ||
| matchedCell = targetCellsByLabel.get(labelStr); | ||
| alignmentMethod = 'custom'; | ||
| confidence = 0.95; // High confidence for label matching | ||
| labelMatches++; | ||
| } else if (targetCellsByLabel.has(normalizedLabel)) { | ||
| matchedCell = targetCellsByLabel.get(normalizedLabel); | ||
| alignmentMethod = 'custom'; | ||
| confidence = 0.95; // High confidence for label matching | ||
| labelMatches++; | ||
| } | ||
| } | ||
|
|
||
| // Strategy 2: Try exact ID match (fallback) | ||
| // Try both original case and uppercase | ||
| if (!matchedCell && importedId) { | ||
| const normalizedId = String(importedId).trim().toUpperCase(); | ||
| const originalId = String(importedId).trim(); | ||
|
|
||
| if (targetCellsById.has(originalId)) { | ||
| matchedCell = targetCellsById.get(originalId); | ||
| alignmentMethod = 'exact-id'; | ||
| confidence = 1.0; | ||
| exactMatches++; | ||
| } else if (targetCellsById.has(normalizedId)) { | ||
| matchedCell = targetCellsById.get(normalizedId); | ||
| alignmentMethod = 'exact-id'; | ||
| confidence = 1.0; | ||
| exactMatches++; | ||
| } | ||
| } | ||
|
|
||
| // Strategy 3: Try verse reference matching (for verses) - last resort | ||
| // First try with book code for precise matching, then fallback to chapter:verse | ||
| if (!matchedCell && importedId) { | ||
| // Match pattern: book code (2+ chars), space(s), chapter number, colon, verse number | ||
| const verseMatch = String(importedId).match(/^([A-Z0-9]{2,})\s+(\d+):(\d+[a-z]?)$/i); | ||
| if (verseMatch) { | ||
| const [, bookCode, chapter, verse] = verseMatch; | ||
| const normalizedBookCode = bookCode.toUpperCase(); | ||
| // Try matching with normalized book code first (more precise) | ||
| const verseRefWithBook = `${normalizedBookCode} ${chapter}:${verse}`; | ||
| if (targetVersesByRef.has(verseRefWithBook)) { | ||
| matchedCell = targetVersesByRef.get(verseRefWithBook); | ||
| alignmentMethod = 'custom'; | ||
| confidence = 0.9; // High confidence for book-specific verse matching | ||
| verseMatches++; | ||
| } else { | ||
| // Fallback to chapter:verse matching (in case book codes differ slightly) | ||
| const verseRef = `${chapter}:${verse}`; | ||
| if (targetVersesByRef.has(verseRef)) { | ||
| matchedCell = targetVersesByRef.get(verseRef); | ||
| alignmentMethod = 'custom'; | ||
| confidence = 0.85; // Medium-high confidence for verse matching | ||
| verseMatches++; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Only add aligned cell if we found a match | ||
| // Skip unmatched verses - don't create new cells for them | ||
| if (matchedCell) { | ||
| matchedTargetCells.add(matchedCell); | ||
| alignedCells.push({ | ||
| notebookCell: matchedCell, | ||
| importedContent: importedItem, | ||
| alignmentMethod, | ||
| confidence, | ||
| }); | ||
| } else { | ||
| // No match found - skip this verse (don't create new cells) | ||
| // Log for debugging but don't add to alignedCells | ||
| console.warn(`[USFM Aligner] No match found for verse: ${importedId || 'unknown'}`); | ||
| unmatched++; | ||
| } | ||
| } | ||
|
|
||
| // IMPORTANT: Preserve all existing target cells that weren't matched | ||
| // This ensures preface cells (chapter 0), headers, and other non-verse cells are kept | ||
| for (const targetCell of targetCells) { | ||
| if (!matchedTargetCells.has(targetCell)) { | ||
| // This cell wasn't matched - preserve it with its original content | ||
| alignedCells.push({ | ||
| notebookCell: targetCell, | ||
| importedContent: { | ||
| id: (targetCell.metadata?.id || targetCell.id) || '', | ||
| content: targetCell.value || targetCell.content || '', | ||
| cellLabel: targetCell.metadata?.cellLabel, | ||
| metadata: targetCell.metadata || {}, | ||
| }, | ||
| alignmentMethod: 'custom', // Preserved existing cell | ||
| confidence: 1.0, | ||
| }); | ||
| } | ||
| } |
Return aligned cells in targetCells order (current logic can reorder everything)
Right now you push() matched cells in importedContent order and then append all preserved target cells, which can reorder the final aligned list. If downstream applies updates sequentially or UI previews are order-dependent, this will be painful.
Suggested shape: collect matches into a Map<targetCell, alignedPayload>, then iterate targetCells once and emit either the matched alignment or the preserved cell (keeps stable order and guarantees one entry per target cell).
🤖 Prompt for AI Agents
In
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
around lines 74-191, the current logic pushes matched alignedCells in
importedContent order and then appends preserved target cells, which reorders
entries; change this to build a Map keyed by targetCell -> alignedPayload when
matching (instead of pushing directly), track unmatched count separately, then
after processing importedContent iterate the original targetCells array in order
and push either the mapped matched payload or a preserved payload for that
targetCell (ensuring exactly one output per target cell and stable ordering);
remove the final loop that filters unmatched targetCells into alignedCells and
instead use the single ordered iteration over targetCells to emit results.
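The suggested shape (collect matches in a Map, then emit once per target cell) might look like the sketch below. It is a simplified illustration under stated assumptions: only exact-ID matching is shown, and the `TargetCell`, `ImportedItem`, and `AlignedEntry` types and the `alignInTargetOrder` function are hypothetical stand-ins for the aligner's real types.

```typescript
// Hypothetical, simplified shapes for illustration.
interface TargetCell { id: string; value: string; }
interface ImportedItem { id: string; content: string; }
interface AlignedEntry {
    notebookCell: TargetCell;
    importedContent: ImportedItem;
    alignmentMethod: "exact-id" | "custom";
    confidence: number;
}

function alignInTargetOrder(
    targetCells: TargetCell[],
    imported: ImportedItem[]
): { aligned: AlignedEntry[]; unmatched: number } {
    const targetById = new Map<string, TargetCell>();
    targetCells.forEach((cell) => targetById.set(cell.id, cell));

    // Record matches keyed by target cell instead of pushing them immediately.
    const matches = new Map<TargetCell, AlignedEntry>();
    let unmatched = 0;
    for (const item of imported) {
        if (!item.content.trim()) {
            continue; // Skip empty content, as the original loop does.
        }
        const cell = targetById.get(item.id);
        if (cell) {
            matches.set(cell, {
                notebookCell: cell,
                importedContent: item,
                alignmentMethod: "exact-id",
                confidence: 1.0,
            });
        } else {
            unmatched++;
        }
    }

    // Single ordered pass: one entry per target cell, in notebook order.
    const aligned = targetCells.map((cell): AlignedEntry =>
        matches.get(cell) ?? {
            notebookCell: cell,
            importedContent: { id: cell.id, content: cell.value },
            alignmentMethod: "custom", // Preserved existing cell
            confidence: 1.0,
        }
    );
    return { aligned, unmatched };
}
```

Because the output is produced by iterating targetCells, the result has exactly one entry per target cell regardless of the order in which imported verses arrive.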
| alignedCells.push({ | ||
| notebookCell: targetCell, | ||
| importedContent: { | ||
| id: (targetCell.metadata?.id || targetCell.id) || '', | ||
| content: targetCell.value || targetCell.content || '', | ||
| cellLabel: targetCell.metadata?.cellLabel, | ||
| metadata: targetCell.metadata || {}, | ||
| }, | ||
| alignmentMethod: 'custom', // Preserved existing cell | ||
| confidence: 1.0, | ||
| }); |
Use ?? (not ||) when preserving existing cell content
targetCell.value || targetCell.content || '' will drop legitimate empty strings. Safer:
- content: targetCell.value || targetCell.content || '',
+ content: (targetCell.value ?? targetCell.content ?? ''),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| alignedCells.push({ | |
| notebookCell: targetCell, | |
| importedContent: { | |
| id: (targetCell.metadata?.id || targetCell.id) || '', | |
| content: targetCell.value || targetCell.content || '', | |
| cellLabel: targetCell.metadata?.cellLabel, | |
| metadata: targetCell.metadata || {}, | |
| }, | |
| alignmentMethod: 'custom', // Preserved existing cell | |
| confidence: 1.0, | |
| }); | |
| alignedCells.push({ | |
| notebookCell: targetCell, | |
| importedContent: { | |
| id: (targetCell.metadata?.id || targetCell.id) || '', | |
| content: (targetCell.value ?? targetCell.content ?? ''), | |
| cellLabel: targetCell.metadata?.cellLabel, | |
| metadata: targetCell.metadata || {}, | |
| }, | |
| alignmentMethod: 'custom', // Preserved existing cell | |
| confidence: 1.0, | |
| }); |
🤖 Prompt for AI Agents
In
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmCellAligner.ts
around lines 179 to 189, the code uses the logical OR operator to choose content
(targetCell.value || targetCell.content || ''), which will treat legitimate
empty strings as falsy and replace them; change those fallbacks to use the
nullish coalescing operator (??) so that only null/undefined are replaced (e.g.,
targetCell.value ?? targetCell.content ?? ''), and apply the same change to any
other fields here that should allow empty strings (such as id if intended),
ensuring metadata remains unchanged.
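The difference between the two operators is easiest to see on a cell whose value is an intentionally empty string. The following is a small standalone sketch; `CellLike` and the two helper functions are hypothetical names used only for the comparison.

```typescript
// Hypothetical cell shape with optional value and content fields.
interface CellLike {
    value?: string | null;
    content?: string | null;
}

// Logical OR treats "" as missing and falls through to the next operand.
function contentWithOr(cell: CellLike): string {
    return cell.value || cell.content || "";
}

// Nullish coalescing falls through only on null or undefined.
function contentWithNullish(cell: CellLike): string {
    return cell.value ?? cell.content ?? "";
}

// A cell the user deliberately cleared, with stale text left in `content`.
const clearedCell: CellLike = { value: "", content: "stale text" };
const viaOr = contentWithOr(clearedCell);           // picks up the stale text
const viaNullish = contentWithNullish(clearedCell); // keeps the empty string
```

Both helpers behave identically when value is null or undefined; they diverge only for falsy-but-present values such as the empty string.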
| } else { | ||
| // No translation found - keep original | ||
| updatedLines.push(mapping.originalLine || lines[i]); | ||
| if (mapping.cellId && cellTranslations.has(mapping.cellId)) { | ||
| console.warn(`[USFM Export] No translation found for cellId: ${mapping.cellId}`); | ||
| } | ||
| } |
Warning condition appears to be unreachable.
The condition on line 384 checks cellTranslations.has(mapping.cellId) in the "no translation found" else branch. However, if cellTranslations.has(mapping.cellId) is true, the translation should have been retrieved on line 172. This warning will never trigger as written.
} else {
// No translation found - keep original
updatedLines.push(mapping.originalLine || lines[i]);
- if (mapping.cellId && cellTranslations.has(mapping.cellId)) {
- console.warn(`[USFM Export] No translation found for cellId: ${mapping.cellId}`);
+ if (mapping.cellId) {
+ // Log that we expected a translation but didn't find one
+ console.log(`[USFM Export] No translation found for cellId: ${mapping.cellId}`);
}
}
📝 Committable suggestion
| } else { | |
| // No translation found - keep original | |
| updatedLines.push(mapping.originalLine || lines[i]); | |
| if (mapping.cellId && cellTranslations.has(mapping.cellId)) { | |
| console.warn(`[USFM Export] No translation found for cellId: ${mapping.cellId}`); | |
| } | |
| } | |
| } else { | |
| // No translation found - keep original | |
| updatedLines.push(mapping.originalLine || lines[i]); | |
| if (mapping.cellId) { | |
| // Log that we expected a translation but didn't find one | |
| console.log(`[USFM Export] No translation found for cellId: ${mapping.cellId}`); | |
| } | |
| } |
🤖 Prompt for AI Agents
In
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/usfmExporter.ts
around lines 381 to 387, the warning checks cellTranslations.has(mapping.cellId)
inside the "no translation found" else branch which is unreachable; change the
condition to warn when a mapping.cellId exists but the cellTranslations map does
NOT contain it (i.e., invert the check to !cellTranslations.has(mapping.cellId))
or remove the redundant warning entirely—ensure the warning only fires when
there is a cellId and its translation is missing.
| @@ -0,0 +1,409 @@ | |||
| import React, { useState, useCallback } from "react"; | |||
🛠️ Refactor suggestion | 🟠 Major
Missing "use client" directive.
Per coding guidelines for TSX files in a Vite + React + TypeScript VSCode webview environment, add the "use client" directive at the top of the file.
+"use client";
+
import React, { useState, useCallback } from "react";
📝 Committable suggestion
| import React, { useState, useCallback } from "react"; | |
| "use client"; | |
| import React, { useState, useCallback } from "react"; |
🤖 Prompt for AI Agents
In
webviews/codex-webviews/src/NewSourceUploader/importers/usfm/experimental/UsfmImporterForm.tsx
around line 1, the file is missing the required "use client" directive for TSX
files in our Vite + React + TypeScript webview environment; add a single line
containing "use client" (exactly that string with quotes) at the very top of the
file before any imports to enable client-side semantics.
| console.log(`[USFM Export] LineMappings with cellId: ${lineMappings.filter(m => m.cellId && m.cellId !== '').length}`); | ||
| } else { | ||
| // First param is codexCells (backward compatibility - old experimental exporter) | ||
| cells = lineMappingsOrCells as Array<{ kind: number; value: string; metadata: any; }>; |
Empty lineMappings array causes codexCells to be ignored
When lineMappings is an empty array [], the export fails to use the provided codexCells translations. In exportHandler.ts, the check if (lineMappings) passes for an empty array (truthy in JavaScript), calling exportUsfmRoundtrip with three arguments. However, in exportUsfmRoundtrip, the condition lineMappingsOrCells.length > 0 fails for an empty array, causing the function to incorrectly treat the empty array as codexCells and ignore the actual third parameter. This results in no translations being applied to the exported file.
Additional Locations (1)
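One possible way to avoid the ambiguity is to decide which calling convention is in use from whether the second array argument was supplied, rather than from the length of the first. The sketch below is only an illustration: `LineMapping`, `CodexCell`, and `resolveExportInputs` are hypothetical, and the real exportUsfmRoundtrip signature has further parameters that are not shown in the review.

```typescript
// Hypothetical shapes; the real line-mapping type has more fields.
interface LineMapping { cellId: string; originalLine: string; }
interface CodexCell { kind: number; value: string; metadata: Record<string, unknown>; }

// Decide which overload is in use by checking whether codexCells was passed,
// so an empty lineMappings array is still treated as line mappings.
function resolveExportInputs(
    lineMappingsOrCells: LineMapping[] | CodexCell[],
    codexCells?: CodexCell[]
): { lineMappings: LineMapping[]; cells: CodexCell[] } {
    if (codexCells !== undefined) {
        return { lineMappings: lineMappingsOrCells as LineMapping[], cells: codexCells };
    }
    // Legacy two-argument form: the first parameter is the codex cells.
    return { lineMappings: [], cells: lineMappingsOrCells as CodexCell[] };
}

const sampleCell: CodexCell = { kind: 2, value: "translated verse", metadata: {} };
const newForm = resolveExportInputs([], [sampleCell]);
const legacyForm = resolveExportInputs([sampleCell]);
```

On the caller side, a check such as `lineMappings && lineMappings.length > 0` would be an alternative, since an empty array is truthy in JavaScript.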
| const cellChapter = seenFirstChapter ? currentChapter : 1; | ||
|
|
||
| // Handle verse markers specially - collect multi-line verses | ||
| if (marker === 'v' || marker.startsWith('v')) { |
Marker check incorrectly matches \va and \vp as verses
The condition marker === 'v' || marker.startsWith('v') incorrectly matches USFM markers \va (alternate verse number) and \vp (published verse character) as regular verse markers. These markers have different formats and semantics than \v. When these markers appear on their own lines, they would be incorrectly parsed as verses, potentially corrupting the cell structure and breaking round-trip export for USFM files using these markers.
Additional Locations (1)
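A stricter check could match the \v marker exactly and require whitespace before the verse number, which excludes \va and \vp. The sketch below works on a whole line rather than a pre-split marker; `parseVerseLine` is a hypothetical helper, and the accepted number format (digits, optional letter suffix, optional range) is an assumption about what the importer needs to handle.

```typescript
// Parses a line that starts with a plain \v marker.
// Returns null for anything else, including \va and \vp.
function parseVerseLine(line: string): { verseNumber: string; text: string } | null {
    // \v, whitespace, a verse number such as 1, 1a, or 1-2, then optional text.
    const match = /^\\v\s+(\d+[a-z]?(?:-\d+[a-z]?)?)(?:\s+(.*))?$/.exec(line);
    if (!match) {
        return null;
    }
    return { verseNumber: match[1], text: match[2] ?? "" };
}

const plainVerse = parseVerseLine("\\v 1 In the beginning");
const verseRange = parseVerseLine("\\v 3-4 Combined verses");
const alternateVerse = parseVerseLine("\\va 3\\va*");
const publishedVerse = parseVerseLine("\\vp 1b\\vp*");
```

If the parser already isolates the marker name, the equivalent fix is simply comparing it with "v" and dropping the startsWith("v") clause.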
| currentVerse = { | ||
| verseNumber, | ||
| verseText: verseText ? [verseText] : [], | ||
| breakTags: [''], |
Array misalignment when verse line has no text
When a verse line has no text (e.g., \v 1 followed by \li1 text), the verseText and breakTags arrays become misaligned. The initialization sets verseText: [] when text is empty, but breakTags: [''] always starts with one element. When continuation lines are added, both arrays receive a push(), but they remain off by one. In finishCurrentVerse(), the loop iterates over verseText.length, so the continuation line's text at index 0 incorrectly pairs with the empty break tag at index 0 instead of the actual break tag at index 1. This causes missing <br> tags in HTML output and incorrect round-trip export for USFM files with poetry or list structures.
Implemented a new experimental way to import USFM files (new button in the importers webview) that uses the rebuilding logic of the IDML and other importers I implemented. It still needs to be fully tested, but this first iteration should be mostly functional with most USFM files. (Important: users cannot merge cells; doing so will break it.)
Summary by CodeRabbit
Release Notes
New Features
Tests
Note
Experimental USFM round-trip support
- usfm-experimental importer (parser, inline mapper, cell aligner, exporter) enabling verse-only target imports and precise round-trip rebuild using structureMetadata.lineMappings
Rebuild Export integration
- exportCodexContentAsUsfmRoundtrip with progress, error handling, and timestamped filenames
Webview and provider enhancements
- saveFile message and handleSaveFile in NewSourceUploaderProvider to save base64 payloads via VS Code save dialog
- finalizeAudioImport notebook write in async handler
Other
- biblica-swapper importer
Written by Cursor Bugbot for commit 96ad76c. This will update automatically on new commits.