@epsjunior epsjunior commented Aug 14, 2025

Enhance MDX to Markdown processing for better AI readability

Overview

This PR improves the existing "Copy page" feature by implementing advanced MDX-to-Markdown conversion and replacing blob URLs with actual .md files for better AI assistant integration.

Changes

🔧 MDX Processing Script (scripts/process-mdx-to-md.js)

  • NEW: Advanced conversion from MDX to clean Markdown
  • Removes: Import statements and JSX noise
  • Converts:
    • <CustomCard> → - **[Title](URL)**: Description
    • <Image> / <img> → ![alt](absolute_url)
    • <Callout> → Blockquotes with emojis
    • <div>, <br> → Clean text formatting
  • URL Conversion: All relative URLs → absolute (https://docs.genlayer.com/...)
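The import removal and URL conversion rules above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the script's actual code: the names `BASE_URL`, `makeAbsoluteUrlSketch`, and `stripImports` are hypothetical, and only the base URL https://docs.genlayer.com comes from the description.

```javascript
// Hypothetical sketch: helper names and exact rules are assumptions, not the PR's code.
const BASE_URL = 'https://docs.genlayer.com';

// Root-relative paths get the base URL prepended; URLs that already carry a
// scheme, in-page anchors, and anything else are returned unchanged.
function makeAbsoluteUrlSketch(url) {
  if (/^[a-z][a-z0-9+.-]*:/i.test(url)) return url; // https:, mailto:, ...
  if (url.startsWith('/') && !url.startsWith('//')) return BASE_URL + url;
  return url;
}

// Drop single-line `import ... from '...'` statements, which carry no
// meaning in plain Markdown.
function stripImports(mdx) {
  return mdx
    .split('\n')
    .filter((line) => !/^import\s.+from\s+['"][^'"]+['"];?\s*$/.test(line))
    .join('\n');
}

// 'some-package' is a placeholder module name.
const sample = "import { Callout } from 'some-package'\n\n# Title\n";
console.log(JSON.stringify(stripImports(sample)));
console.log(makeAbsoluteUrlSketch('/developers'));
console.log(makeAbsoluteUrlSketch('#setup'));
```

Note that a line-based filter like this would also remove matching import lines inside fenced code samples, so a real implementation may need to skip code fences.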

📁 Build Process (package.json)

  • Updated: sync-mdx script to use new processing instead of simple copy
  • Integration: Added to dev and build pipelines

🎯 Component Updates (components/copy-page.tsx)

  • "View as Markdown": Now opens actual .md files instead of blob URLs
  • AI Integration: ChatGPT/Claude now receive markdown file URLs instead of doc page URLs
  • Copy Function: Uses processed .md files for consistent content

🧹 File Structure

  • Generated: Clean .md files in public/pages/ (gitignored)
  • Removed: Blob URL generation and cleanup logic

Before vs After

Before:

// Blob URLs: blob:http://localhost:3000/abc123...
// AI gets: "read this docs page https://docs.genlayer.com/page"

After:

// Real URLs: https://docs.genlayer.com/pages/path/file.md  
// AI gets: "read this markdown file https://docs.genlayer.com/pages/path/file.md"

Benefits

  • Better AI assistance: Clean markdown without JSX noise
  • Shareable URLs: Real .md files instead of temporary blobs
  • Improved readability: Professional markdown formatting
  • Consistent content: Same processed files for copy/view/AI

Testing

  • MDX components properly converted to markdown
  • All URLs converted to absolute paths
  • AI platforms receive working markdown file URLs
  • "View as Markdown" opens clean .md files in browser
  • Build process includes .md generation

Summary by CodeRabbit

  • New Features

    • Copy/view/share actions now use pre-generated Markdown files and open MD URLs for AI sharing and viewing.
    • Route-aware prefetching of Markdown content for faster copy and AI workflows.
  • Improvements

    • Improved error handling and fallbacks when fetching Markdown content.
    • Link and path sanitization for consistent Markdown resolution.
  • Documentation

    • Automated conversion of MDX docs into normalized Markdown with absolute links and cleaner formatting.
  • Chores

    • Added ignore rule for generated public docs and added a sync script integrated into dev/build pipelines.


netlify bot commented Aug 14, 2025

Deploy Preview for genlayer-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 507956f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/genlayer-docs/deploys/68a5265e7bdde700080ac3a1 |
| 😎 Deploy Preview | https://deploy-preview-278--genlayer-docs.netlify.app |


coderabbitai bot commented Aug 14, 2025

Walkthrough

Switches CopyPage from DOM-derived Markdown to using pre-generated Markdown files in public/pages (route-based resolution and prefetch), adds a Node script to convert pages/**/*.mdx → public/pages/**/*.md, wires that script into dev/build, and ignores the generated public/pages directory.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Ignore rules: /.gitignore | Adds public/pages to ignored paths. |
| CopyPage MD source switch: components/copy-page.tsx | Replaces DOM-based Markdown generation with route-aware resolution: uses useRouter and sanitized path mapping (root → /pages/index.md), prefetches Markdown into prefetchedContent, copies prefetched content to clipboard, opens MD URL for viewing, and sends MD file URL in AI prompts; adds fetch error handling. |
| MDX → Markdown pipeline & scripts: /package.json, scripts/process-mdx-to-md.js | Adds scripts/process-mdx-to-md.js to convert pages/**/*.mdx → public/pages/**/*.md (preserves structure, normalizes relative links to absolute with base URL, strips imports and MDX wrappers, converts components/links/images/callouts/tabs to plain Markdown), adds sync-mdx npm script, and runs it in dev and build after docs generation and before Next steps. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CopyPage
  participant Router
  participant StaticMD as Static MD (public/pages)
  participant Clipboard
  participant Browser

  User->>CopyPage: Click "Copy as Markdown"
  CopyPage->>Router: read router.asPath
  CopyPage->>StaticMD: GET /pages{sanitizedPath}.md
  StaticMD-->>CopyPage: 200 + MD content / 404
  alt Success
    CopyPage->>Clipboard: write(prefetchedContent)
  else Failure
    CopyPage->>CopyPage: console.error("Prefetch failed, will use fallback:")
  end

  User->>CopyPage: Click "View as Markdown"
  CopyPage->>Browser: open(new tab, origin + /pages{sanitizedPath}.md)

  User->>CopyPage: Click "Open in AI"
  CopyPage->>Browser: open(AI URL with markdown file URL in prompt)
sequenceDiagram
  autonumber
  participant DevScript as Dev/Build Script
  participant DocsGen as generate-full-docs.js
  participant SyncMDX as sync-mdx (process-mdx-to-md.js)
  participant Next as Next.js (dev/build)

  DevScript->>DocsGen: run docs generation
  DocsGen-->>DevScript: done
  DevScript->>SyncMDX: run sync-mdx (process .mdx -> public/pages/.md)
  SyncMDX-->>DevScript: MD files written
  DevScript->>Next: continue to next dev/build step

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mpaya5
  • danielrc888

Poem

I nibble MDX and hop with pride,
Turn pages to markdown, burrowed inside.
I copy, I open, I send with a grin,
Static crumbs saved for browsers to spin.
🥕


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (2)
components/copy-page.tsx (1)

136-136: Fix capitalization inconsistency in "View as MarkDown"

The text "View as MarkDown" has inconsistent capitalization. It should be "View as Markdown" to match standard conventions.

-                <span className={styles.dropdownTitle}>View as MarkDown</span>
+                <span className={styles.dropdownTitle}>View as Markdown</span>
scripts/process-mdx-to-md.js (1)

67-82: Image component regex patterns have redundant logic

Lines 67-73 and 75-82 handle Image components with and without alt attributes, but the logic could be simplified into a single regex that handles both cases.

-  // Convert Image components to markdown images (with alt)
-  processed = processed.replace(
-    /<Image[^>]*\s+src="([^"]*)"[^>]*\s+alt="([^"]*)"[^>]*\/?>/g,
-    (match, src, alt) => {
-      const absoluteUrl = makeAbsoluteUrl(src);
-      return `![${alt}](${absoluteUrl})`;
-    }
-  );
-
-  // Convert Image components to markdown images (without alt)
+  // Convert Image components to markdown images
   processed = processed.replace(
-    /<Image[^>]*\s+src="([^"]*)"[^>]*\/?>/g,
-    (match, src) => {
-      const absoluteUrl = makeAbsoluteUrl(src);
-      return `![Image](${absoluteUrl})`;
+    /<Image([^>]*)\/?>/g,
+    (match, attrs) => {
+      const srcMatch = attrs.match(/src="([^"]*)"/);
+      const altMatch = attrs.match(/alt="([^"]*)"/);
+      
+      if (srcMatch) {
+        const src = srcMatch[1];
+        const alt = altMatch ? altMatch[1] : 'Image';
+        const absoluteUrl = makeAbsoluteUrl(src);
+        return `![${alt}](${absoluteUrl})`;
+      }
+      return match; // Return unchanged if src is missing
     }
   );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 02cb46b and 8f248be.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • components/copy-page.tsx (3 hunks)
  • package.json (1 hunks)
  • scripts/process-mdx-to-md.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
scripts/**/*

📄 CodeRabbit Inference Engine (CLAUDE.md)

Scripts for build-time generation are located in the scripts/ directory

Files:

  • scripts/process-mdx-to-md.js
components/**/*

📄 CodeRabbit Inference Engine (CLAUDE.md)

components/**/*: Custom components go in /components/ directory
Follow existing patterns for icons and cards when developing components

Files:

  • components/copy-page.tsx
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : Create .mdx file in appropriate pages/ subdirectory when adding new pages
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : All content is in MDX format supporting React components
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : All content is in MDX format supporting React components

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : Create .mdx file in appropriate pages/ subdirectory when adding new pages

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to scripts/generate-full-docs.js : Build process includes automatic generation of full documentation concatenation (scripts/generate-full-docs.js) and sitemap generation (scripts/generate-sitemap-xml.js)

Applied to files:

  • scripts/process-mdx-to-md.js
  • package.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - genlayer-docs
  • GitHub Check: Header rules - genlayer-docs
  • GitHub Check: Pages changed - genlayer-docs
🔇 Additional comments (2)
.gitignore (1)

6-6: LGTM!

The addition of public/pages to .gitignore is appropriate since this directory contains generated files from the MDX-to-Markdown conversion process.

package.json (1)

14-14: LGTM! Clean addition of the sync-mdx script

The new sync-mdx script is well-integrated into the build pipeline and follows the existing naming conventions.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
components/copy-page.tsx (3)

51-64: Provide user feedback when copying empty content

When prefetchedContent is null (due to prefetch failure), the function silently copies an empty string without informing the user.

Add proper error handling:

 const copyPageAsMarkdown = async () => {
   try {
+    if (!prefetchedContent) {
+      alert('Markdown content not available. Please ensure the build process has completed.');
+      setIsOpen(false);
+      return;
+    }
-    await navigator.clipboard.writeText(prefetchedContent || '');
+    await navigator.clipboard.writeText(prefetchedContent);
     
     // Show success feedback
     setIsCopied(true);
     setTimeout(() => {
       setIsCopied(false);
     }, 2000);
   } catch (error) {
     console.error('Failed to copy page:', error);
+    alert('Failed to copy page content to clipboard.');
   }
   setIsOpen(false);
 };

30-49: Add better error handling for prefetch failures

The prefetch silently logs errors without notifying the user that the Markdown content might not be available. This could lead to confusion when users try to copy or view content that failed to load.

Consider adding a state to track prefetch failures and show appropriate feedback:

 const [prefetchedContent, setPrefetchedContent] = useState<string | null>(null);
+const [prefetchError, setPrefetchError] = useState<boolean>(false);
 const dropdownRef = useRef<HTMLDivElement>(null);
 const router = useRouter();

 // ... existing code ...

 useEffect(() => {
   const prefetchContent = async () => {
     try {
       const currentPath = router.asPath;
       const cleanPath = currentPath.split('?')[0].split('#')[0];
       const mdUrl = cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
       
       const response = await fetch(mdUrl);
       if (response.ok) {
         const content = await response.text();
         setPrefetchedContent(content);
+        setPrefetchError(false);
+      } else {
+        setPrefetchError(true);
+        console.error(`Failed to fetch Markdown: ${response.status} ${response.statusText}`);
       }
     } catch (error) {
-      console.log('Prefetch failed, will use fallback:', error);
+      setPrefetchError(true);
+      console.error('Prefetch failed:', error);
     }
   };

   prefetchContent();
 }, [router.asPath]);

66-76: Verify Markdown file exists before opening

The function directly opens a URL without checking if the file exists, which could lead to 404 errors.

Verify file availability first:

-const viewAsMarkdown = () => {
+const viewAsMarkdown = async () => {
   const currentPath = router.asPath;
   
   // Remove query params and hash from path
   const cleanPath = currentPath.split('?')[0].split('#')[0];
   
   // Open the .md file directly (no blob needed!)
   const mdUrl = cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
-  window.open(mdUrl, '_blank');
+  
+  try {
+    const response = await fetch(mdUrl, { method: 'HEAD' });
+    if (response.ok) {
+      window.open(mdUrl, '_blank');
+    } else {
+      alert('Markdown file not available. Please ensure the build process has completed.');
+    }
+  } catch (error) {
+    console.error('Failed to verify Markdown file:', error);
+    alert('Failed to verify Markdown file availability.');
+  }
+  
   setIsOpen(false);
 };
🧹 Nitpick comments (2)
components/copy-page.tsx (2)

78-96: Consider URL validation for AI platform integration

The function constructs URLs without validating that the Markdown file exists, potentially sending AI platforms to non-existent resources.

Consider checking file availability before opening the AI platform:

-const openInAI = (platform: 'chatgpt' | 'claude') => {
+const openInAI = async (platform: 'chatgpt' | 'claude') => {
   const currentPath = router.asPath;
   const cleanPath = currentPath.split('?')[0].split('#')[0];
   
   // Use the .md file URL instead of the docs page URL
   const mdUrl = cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
   const fullMdUrl = `${window.location.origin}${mdUrl}`;
   
+  // Optionally verify the file exists
+  try {
+    const response = await fetch(mdUrl, { method: 'HEAD' });
+    if (!response.ok) {
+      console.warn('Markdown file may not be available for AI platform');
+    }
+  } catch (error) {
+    console.warn('Could not verify Markdown file availability');
+  }
+  
   const prompt = `I'm building with GenLayer - can you read this markdown file ${fullMdUrl} so I can ask you questions about it?`;
   const encodedPrompt = encodeURIComponent(prompt);

35-36: Extract path cleaning logic to reduce duplication

The path cleaning logic is duplicated across multiple functions.

Consider extracting this into a utility function:

+const getCleanPath = (path: string): string => {
+  return path.split('?')[0].split('#')[0];
+};
+
+const getMdUrl = (path: string): string => {
+  const cleanPath = getCleanPath(path);
+  return cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
+};

 // Then update the functions:
 useEffect(() => {
   const prefetchContent = async () => {
     try {
-      const currentPath = router.asPath;
-      const cleanPath = currentPath.split('?')[0].split('#')[0];
-      const mdUrl = cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
+      const mdUrl = getMdUrl(router.asPath);
       
       const response = await fetch(mdUrl);

Also applies to: 70-70, 80-80
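To make the path mapping concrete, here is a standalone plain-JavaScript port of the two suggested helpers with a few sample routes. The sample routes are hypothetical. The last call illustrates an edge case worth checking: whether `router.asPath` can ever end in a slash depends on the site's routing configuration, which is not shown in this PR.

```javascript
// Plain-JavaScript port of the suggested helpers (type annotations removed).
const getCleanPath = (path) => path.split('?')[0].split('#')[0];

const getMdUrl = (path) => {
  const cleanPath = getCleanPath(path);
  return cleanPath === '/' ? '/pages/index.md' : `/pages${cleanPath}.md`;
};

console.log(getMdUrl('/'));                    // /pages/index.md
console.log(getMdUrl('/developers?x=1#top'));  // /pages/developers.md
console.log(getMdUrl('/#intro'));              // /pages/index.md

// Edge case: a trailing slash is not normalized by this logic.
console.log(getMdUrl('/developers/'));         // /pages/developers/.md
```

If trailing slashes can occur, one option would be to strip them in `getCleanPath` before building the URL.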

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 8f248be and ff642f8.

📒 Files selected for processing (1)
  • components/copy-page.tsx (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/**/*

📄 CodeRabbit Inference Engine (CLAUDE.md)

components/**/*: Custom components go in /components/ directory
Follow existing patterns for icons and cards when developing components

Files:

  • components/copy-page.tsx
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : Create .mdx file in appropriate pages/ subdirectory when adding new pages
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : All content is in MDX format supporting React components

Applied to files:

  • components/copy-page.tsx

@epsjunior
Contributor Author

Additional updates in this PR

  • MDX Processing Script enhancements:
    • Convert Tabs + Tabs.Tab into ### headings with tab content.
    • Strip non-content wrappers: Cards, Bleed, Fragment (keep inner content).
    • Normalize list indentation and remove redundant blank lines.
    • Fix URL handling to avoid double-encoding:
      • Preserve # anchors.
      • Encode once via encodeURI in the markdown pass.
      • Leave JSX image src as-is; skip images in the link conversion pass.
      • Result: paths with spaces become %20 (e.g., Diagram%20MAIN.jpg), not %2520.
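The double-encoding problem is easy to reproduce with the built-in `encodeURI`. The path below is a hypothetical example; the point is only that a second encoding pass turns the `%` of `%20` into `%25`.

```javascript
// Hypothetical image path containing a space.
const rawPath = '/images/Diagram MAIN.jpg';

const encodedOnce = encodeURI(rawPath);
const encodedTwice = encodeURI(encodedOnce);

console.log(encodedOnce);  // /images/Diagram%20MAIN.jpg
console.log(encodedTwice); // /images/Diagram%2520MAIN.jpg

// encodeURI leaves '#' alone, so anchors survive a single pass.
console.log(encodeURI('/page name#section')); // /page%20name#section
```

This is why the URL should pass through exactly one encoding step, whichever conversion pass handles it.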

Benefits (additions)

  • Accurate tabbed content: Tab sections render as clean Markdown headings.
  • Robust image URLs: Images with spaces resolve correctly.

Testing (additions)

  • Tabs content becomes proper ### sections in public/pages/developers.md
  • Image with space in name becomes %20 (e.g., public/pages/understand-genlayer-protocol.md)
  • Wrapper components (Cards, Bleed, Fragment) are removed without losing content
  • Lists are properly formatted without stray indentation

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
scripts/process-mdx-to-md.js (3)

78-89: Blockquote only prefixes the first line; subsequent lines fall out of the quote

Multiline Callout content should have '>' applied to every line for consistent rendering across Markdown engines.

-      const prefix = type === 'warning' ? '⚠️ ' : type === 'info' ? 'ℹ️ ' : '';
-      return `> ${prefix}${cleanContent}`;
+      const prefix = type === 'warning' ? '⚠️ ' : type === 'info' ? 'ℹ️ ' : '';
+      const lines = cleanContent.split('\n');
+      if (lines.length > 0 && prefix) lines[0] = `${prefix}${lines[0]}`;
+      return lines.map(l => `> ${l}`).join('\n');
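As a standalone illustration of the per-line prefixing suggested above, the sketch below wraps the logic in a hypothetical `toBlockquote` helper (the name and signature are assumptions, not part of the script). It also emits a bare `>` for empty lines so that paragraphs inside the callout stay within one quote.

```javascript
// Hypothetical helper: prefix every line of a callout body with '>'.
function toBlockquote(content, type) {
  const prefix = type === 'warning' ? '⚠️ ' : type === 'info' ? 'ℹ️ ' : '';
  const lines = content.trim().split('\n');
  lines[0] = prefix + lines[0]; // the emoji goes on the first line only
  return lines.map((line) => (line === '' ? '>' : `> ${line}`)).join('\n');
}

console.log(toBlockquote('First line\n\nSecond line', 'warning'));
console.log(toBlockquote('Only line'));
```

With a multiline body, every output line starts with `>`, so the quote does not end after the first line.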

5-24: Non-root-relative URLs detected—extend makeAbsoluteUrl to handle them

The ripgrep check shows many links using paths like “../…” or “./…” which won’t be prefixed by baseUrl and will remain relative, breaking absolute-URL guarantees. Please update makeAbsoluteUrl (or call-site logic) to resolve these against the document’s root or source directory.

Examples of affected files:

  • pages/developers/decentralized-applications/genlayer-js.mdx: lines 52–54
  • pages/developers/intelligent-contracts/tooling-setup.mdx: line 105
  • pages/developers/intelligent-contracts/deploying/network-configuration.mdx: lines 29–30

Suggested action:
• In makeAbsoluteUrl, detect other relative forms (e.g. ./…, ../…) and convert via something like:

const resolved = new URL(url, baseUrl + currentDocPath + '/').href; // .href is already percent-encoded
return resolved; // do not pass it through encodeURI again, or %20 becomes %2520

• Or introduce a helper that knows each MDX file’s directory to resolve relative links before encoding.
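One way to sketch such a helper is with the built-in `URL` constructor. Everything here is an assumption for illustration: the name `resolveDocLink`, the `docPath` parameter, and the sample routes are hypothetical, and links are resolved against the page URL itself (as a browser would), which may differ from how the MDX sources intend relative links to resolve.

```javascript
const BASE_URL = 'https://docs.genlayer.com';

// docPath: route of the page that contains the link (hypothetical parameter).
function resolveDocLink(href, docPath) {
  if (href.startsWith('#')) return href; // keep in-page anchors as-is
  // The URL constructor handles './', '../', root-relative, and absolute
  // forms, and percent-encodes the result exactly once.
  return new URL(href, BASE_URL + docPath).href;
}

const page = '/developers/contracts/intro'; // hypothetical route

console.log(resolveDocLink('../tools/setup', page));
// https://docs.genlayer.com/developers/tools/setup
console.log(resolveDocLink('./deploy', page));
// https://docs.genlayer.com/developers/contracts/deploy
console.log(resolveDocLink('/api', page));
// https://docs.genlayer.com/api
console.log(resolveDocLink('img/Diagram MAIN.jpg', page));
// https://docs.genlayer.com/developers/contracts/img/Diagram%20MAIN.jpg
```

Because `.href` is already encoded, the result should not be passed through `encodeURI` afterwards.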


172-188: Switch to AST-based normalization for Markdown links/images

The current regex-based approach in scripts/process-mdx-to-md.js (lines 172–188) will:

  • Break URLs containing parentheses (e.g. Wikipedia links)
  • Still run inside code fences or inline code, corrupting examples

Our scan found at least one occurrence in pages/index.mdx that will be mangled. To ensure correctness and avoid side-effects, replace the regex with an AST-based transform that only visits real link/image nodes.

Please update scripts/process-mdx-to-md.js as follows:

@@ scripts/process-mdx-to-md.js
-  // Convert regular markdown images to absolute URLs
-  processed = processed.replace(
-    /!\[([^\]]*)\]\(([^)]*)\)/g,
-    (match, alt, src) => {
-      const absoluteUrl = makeAbsoluteUrl(src);
-      return `![${alt}](${absoluteUrl})`;
-    }
-  );
-
-  // Convert regular markdown links to absolute URLs (skip images)
-  processed = processed.replace(
-    /(^|[^!])\[([^\]]*)\]\(([^)]*)\)/gm,
-    (match, prefix, text, href) => {
-      const absoluteUrl = makeAbsoluteUrl(href);
-      return `${prefix}[${text}](${absoluteUrl})`;
-    }
-  );
+  // Prefer AST-based normalization for markdown links/images to handle parentheses and avoid code fences
+  processed = normalizeMarkdownLinksAndImages(processed, makeAbsoluteUrl);

Add this helper below (or in a shared utils file):

const { unified } = require('unified');
const remarkParse = require('remark-parse');
const remarkStringify = require('remark-stringify');
const remarkGfm = require('remark-gfm');
const remarkMdx = require('remark-mdx');
const { visit } = require('unist-util-visit');

function normalizeMarkdownLinksAndImages(md, absolutize) {
  const processor = unified()
    .use(remarkParse)
    .use(remarkGfm)
    .use(remarkMdx);

  const tree = processor.parse(md);

  visit(tree, (node) => {
    if ((node.type === 'link' || node.type === 'image') && node.url) {
      node.url = absolutize(node.url);
    }
  });

  return unified()
    .use(remarkStringify, { bullet: '-', fences: true })
    .use(remarkGfm)
    .use(remarkMdx)
    .stringify(tree);
}

This change will:

  • Correctly handle URLs with nested parentheses
  • Skip fenced/inline code blocks
  • Only modify genuine link and image nodes
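The parentheses problem can be reproduced without any dependencies. The sketch below runs the link regex from the diff above against a sample link, then compares it with a hypothetical variant (`balancedRe`, an assumption of this example) that tolerates one level of nested parentheses. Neither regex is aware of code spans, which is the part only a parser-based approach addresses.

```javascript
// Link regex as quoted in the review above.
const naiveRe = /(^|[^!])\[([^\]]*)\]\(([^)]*)\)/gm;

// Hypothetical variant allowing one level of nested parentheses in the URL.
const balancedRe = /(^|[^!])\[([^\]]*)\]\(((?:[^()]|\([^()]*\))*)\)/gm;

const md = 'See [Rust](https://en.wikipedia.org/wiki/Rust_(programming_language)) for details.';

const naiveHrefs = Array.from(md.matchAll(naiveRe)).map((m) => m[3]);
const balancedHrefs = Array.from(md.matchAll(balancedRe)).map((m) => m[3]);

console.log(naiveHrefs[0]);    // https://en.wikipedia.org/wiki/Rust_(programming_language
console.log(balancedHrefs[0]); // https://en.wikipedia.org/wiki/Rust_(programming_language)

// Both regexes still match inside inline code, so examples would be rewritten too.
const inlineCode = 'Write `[text](/path)` to create a link.';
console.log(Array.from(inlineCode.matchAll(balancedRe)).length); // 1
```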
🧹 Nitpick comments (5)
scripts/process-mdx-to-md.js (5)

69-76: Anchor tag conversion misses nested content and line breaks

([^<]*) only matches on one line and fails when the anchor contains nested tags or breaks. Use a non-greedy dot-all and strip tags from the label.

-  processed = processed.replace(
-    /<a\s+href="([^"]*)"[^>]*>([^<]*)<\/a>/g,
-    (match, href, text) => {
-      const absoluteUrl = makeAbsoluteUrl(href);
-      return `[${text}](${absoluteUrl})`;
-    }
-  );
+  processed = processed.replace(
+    /<a\s+[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/g,
+    (match, href, inner) => {
+      const absoluteUrl = makeAbsoluteUrl(href);
+      const text = inner.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
+      return `[${text}](${absoluteUrl})`;
+    }
+  );

95-100: Tabs items parsing is fragile; splitting on commas breaks for titles containing commas

Extract quoted strings instead of splitting by comma.

-      const tabTitles = itemsRaw
-        .split(',')
-        .map(s => s.trim())
-        .map(s => s.replace(/^["']|["']$/g, ''))
-        .filter(Boolean);
+      const tabTitles = Array.from(itemsRaw.matchAll(/(['"])(.*?)\1/g))
+        .map(m => m[2])
+        .filter(Boolean);
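A quick comparison of the two approaches on a title that contains a comma shows the difference. The `itemsRaw` value is a hypothetical example of the text inside an items array.

```javascript
// Hypothetical contents of a Tabs items array.
const itemsRaw = `"Install, then run", 'Configure'`;

// Original approach: split on commas, then strip surrounding quotes.
const bySplit = itemsRaw
  .split(',')
  .map((s) => s.trim())
  .map((s) => s.replace(/^["']|["']$/g, ''))
  .filter(Boolean);

// Suggested approach: extract each quoted string as a whole.
const byMatch = Array.from(itemsRaw.matchAll(/(['"])(.*?)\1/g))
  .map((m) => m[2])
  .filter(Boolean);

console.log(bySplit); // [ 'Install', 'then run', 'Configure' ]
console.log(byMatch); // [ 'Install, then run', 'Configure' ]
```

With the split approach the tab count no longer matches the number of tab bodies, so headings would be paired with the wrong content.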

148-170: Comment vs behavior mismatch: image src is later absolutized

The comment says "leave src as-is to avoid double-encoding", but the later “regular markdown images” pass reprocesses these URLs through makeAbsoluteUrl. Either remove the comment or consolidate logic so images are normalized once.

-  // Convert Image components to markdown images (with alt) - leave src as-is to avoid double-encoding
+  // Convert Image components to markdown images (with alt)

203-205: List indentation normalization may flatten valid nested lists

Blindly removing leading indentation before "- " will change list nesting. Consider skipping this step or scoping it (e.g., only for lines produced by the component conversions).
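The effect on nesting can be shown with a small sketch. The regex below is an assumption standing in for the script's normalization step, which is not quoted in this review; it only illustrates the general behavior of stripping leading whitespace before a list marker.

```javascript
// Hypothetical stand-in for the normalization step (not the script's actual regex).
const normalizeIndent = (md) => md.replace(/^[ \t]+- /gm, '- ');

const nested = '- parent\n  - child\n    - grandchild\n';
console.log(normalizeIndent(nested));
// - parent
// - child
// - grandchild
```

All three items end up at the same level, so the hierarchy in the source list is lost.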


209-246: Consider pruning stale generated files to avoid serving removed pages

If an MDX file is deleted/renamed, its old .md will linger in public/pages. Optional: compute the target set and remove stale files after generation.

Example helper (add after processing):

function pruneStalePublicMd(pagesDir, publicPagesDir) {
  const expected = new Set();
  (function walk(src, dst) {
    for (const item of fs.readdirSync(src)) {
      const sp = path.join(src, item);
      const st = fs.statSync(sp);
      if (st.isDirectory()) walk(sp, path.join(dst, item));
      else if (item.endsWith('.mdx')) expected.add(path.join(dst, item.replace(/\.mdx$/, '.md')));
    }
  })(pagesDir, publicPagesDir);

  (function walkPrune(dir) {
    for (const item of fs.readdirSync(dir)) {
      const p = path.join(dir, item);
      const st = fs.statSync(p);
      if (st.isDirectory()) walkPrune(p);
      else if (item.endsWith('.md') && !expected.has(p)) fs.rmSync(p);
    }
  })(publicPagesDir);
}

Then call: pruneStalePublicMd(pagesDir, publicPagesDir);

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 6e71beb and 0acde6d.

📒 Files selected for processing (1)
  • scripts/process-mdx-to-md.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
scripts/**/*

📄 CodeRabbit Inference Engine (CLAUDE.md)

Scripts for build-time generation are located in the scripts/ directory

Files:

  • scripts/process-mdx-to-md.js
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : Create .mdx file in appropriate pages/ subdirectory when adding new pages
Learnt from: epsjunior
PR: genlayerlabs/genlayer-docs#278
File: components/copy-page.tsx:0-0
Timestamp: 2025-08-19T21:48:24.895Z
Learning: In the genlayer-docs project, markdown files are generated during the build process via `npm run sync-mdx`. If markdown generation fails, the entire deployment process fails, ensuring that markdown files will always exist in production environments.
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : All content is in MDX format supporting React components
📚 Learning: 2025-08-19T21:48:24.895Z
Learnt from: epsjunior
PR: genlayerlabs/genlayer-docs#278
File: components/copy-page.tsx:0-0
Timestamp: 2025-08-19T21:48:24.895Z
Learning: In the genlayer-docs project, markdown files are generated during the build process via `npm run sync-mdx`. If markdown generation fails, the entire deployment process fails, ensuring that markdown files will always exist in production environments.

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-08-19T21:44:11.533Z
Learnt from: epsjunior
PR: genlayerlabs/genlayer-docs#278
File: scripts/process-mdx-to-md.js:8-16
Timestamp: 2025-08-19T21:44:11.533Z
Learning: In the genlayer-docs project, the `makeAbsoluteUrl` function in scripts/process-mdx-to-md.js uses a simple approach that handles only absolute URLs, root-relative URLs, and preserves other formats as-is. The project doesn't use protocol-relative URLs, path traversal sequences, or complex URL patterns, so comprehensive URL validation is not needed.

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : All content is in MDX format supporting React components

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to pages/**/*.mdx : Create .mdx file in appropriate pages/ subdirectory when adding new pages

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-07-24T07:56:58.272Z
Learnt from: CR
PR: genlayerlabs/genlayer-docs#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-24T07:56:58.272Z
Learning: Applies to scripts/generate-full-docs.js : Build process includes automatic generation of full documentation concatenation (scripts/generate-full-docs.js) and sitemap generation (scripts/generate-sitemap-xml.js)

Applied to files:

  • scripts/process-mdx-to-md.js
📚 Learning: 2025-08-19T22:01:32.272Z
Learnt from: epsjunior
PR: genlayerlabs/genlayer-docs#278
File: scripts/process-mdx-to-md.js:150-184
Timestamp: 2025-08-19T22:01:32.272Z
Learning: In the genlayer-docs project build process, epsjunior prefers to let file operations in build scripts fail fast with natural Node.js errors rather than adding graceful error handling, to ensure build integrity and prevent incomplete builds from proceeding.

Applied to files:

  • scripts/process-mdx-to-md.js

@epsjunior epsjunior requested a review from cristiam86 October 1, 2025 16:25