diff --git a/Readme.md b/Readme.md index 583c513..b63534b 100644 --- a/Readme.md +++ b/Readme.md @@ -2,8 +2,17 @@ ## Overview -Simple and straight forward Python utility that converts a Markdown file (`.md`) to a Microsoft Word document (`.docx`). It supports multiple Markdown elements, including headings, bold and italic text, both unordered and ordered lists, and many more. +A simple, straightforward Python utility that converts Markdown files (`.md`) to Microsoft Word documents (`.docx`) and vice versa. It supports multiple Markdown elements, including headings, bold and italic text, unordered and ordered lists, and more. +## Word to Markdown Conversion Example: +#### Input .docx file: +![image](https://github.com/user-attachments/assets/2891ebdf-ff36-4fd5-af2f-b35413264b06) + +#### Output .md file: +![image](https://github.com/user-attachments/assets/e46c096b-762e-4f0c-a0ab-f81c3069a533) + + +## Markdown to Word Conversion Example: #### Input .md file: ![image](https://github.com/user-attachments/assets/c2325e52-05a7-4e11-8f28-4eeb3d8c06f5) @@ -13,18 +22,22 @@ Simple and straight forward Python utility that converts a Markdown file (`.md`) ## Features -- Converts Markdown headers (`#`, `##`, `###`) to Word document headings. -- Supports bold and italic text formatting. -- Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists. -- Handles paragraphs with mixed content. +- Bi-directional conversion between Markdown and Word documents +- Handles code snippets in various programming languages (Python, Ruby, and more) found in Word documents
+- Converts Markdown headers (`#`, `##`, `###`) to Word document headings and back +- Supports bold and italic text formatting +- Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists +- Handles paragraphs with mixed content +- Preserves document structure during conversion ## Prerequisites You need to have Python installed on your system along with the following libraries: -- `markdown` for converting Markdown to HTML. -- `python-docx` for creating and editing Word documents. -- `beautifulsoup4` for parsing HTML. +- `markdown` for converting Markdown to HTML +- `python-docx` for creating and editing Word documents +- `beautifulsoup4` for parsing HTML +- `mammoth` for converting Word to HTML Sure, let's enhance your instructions for clarity and completeness: @@ -74,7 +87,33 @@ This code will create a file named `amazon_case_study.docx`, which is the conver --- -This should make it easier to understand and follow the steps. Let me know if you need any more help or further enhancements! +#### For Converting Word to Markdown +Use the `word_to_markdown()` function to convert your Word document to Markdown: + +```python +word_to_markdown(word_file, markdown_file) +``` + +- `word_file`: The path to the Word document you want to convert +- `markdown_file`: The desired path and name for the output Markdown file + + +Here's a complete example: + +```python +from md2docx_python.src.docx2md_python import word_to_markdown + +# Define the paths to your files +word_file = "sample_files/test_document.docx" +markdown_file = "sample_files/test_document_output.md" + +# Convert the Word document to a Markdown file +word_to_markdown(word_file, markdown_file) +``` + +This code will create a file named `test_document_output.md`, which is the conversion of `test_document.docx` to the Markdown format. + +--- ## Why this repo and not others ? @@ -108,6 +147,11 @@ Here are some reasons why this repo might be considered better or more suitable ### 8. 
**Privacy** - If you are working in a corporate firm and you want to convert your markdown files to word and you use a online tool to do it then there are chances that they will store your file which can cause to a vital information leak of your company. With use of this repo you can easily do the conversion in your own system. +### 9. **Bi-directional Conversion** + - **Complete Workflow**: Convert documents in both directions, allowing for round-trip document processing + - **Format Preservation**: Maintains formatting and structure when converting between formats + - **Flexibility**: Easily switch between Markdown and Word formats based on your needs + ### Comparison to Other Scripts - **Feature Set**: Some scripts may lack comprehensive support for Markdown features or may not handle lists and text formatting well. - **Performance**: Depending on the implementation, performance might vary. This script is designed to be efficient for typical Markdown files. diff --git a/build/lib/md2docx_python/src/docx2md_python.py b/build/lib/md2docx_python/src/docx2md_python.py new file mode 100644 index 0000000..d5162fd --- /dev/null +++ b/build/lib/md2docx_python/src/docx2md_python.py @@ -0,0 +1,95 @@ +from docx import Document +import re + + +def word_to_markdown(word_file, markdown_file): + """ + Convert a Word document to Markdown format + + Args: + word_file (str): Path to the input Word document + markdown_file (str): Path to the output Markdown file + """ + # Open the Word document + doc = Document(word_file) + markdown_content = [] + + for paragraph in doc.paragraphs: + # Skip empty paragraphs + if not paragraph.text.strip(): + continue + + # Get paragraph style + style = paragraph.style.name.lower() + + # Handle code blocks + if style.startswith("code block") or style.startswith("source code"): + markdown_content.append(f"```\n{paragraph.text.strip()}\n```\n\n") + continue + + # Handle headings + if style.startswith("heading"): + level = style[-1] # Get heading 
level from style name + markdown_content.append(f"{'#' * int(level)} {paragraph.text.strip()}\n") + continue + + # Handle lists + if style.startswith("list bullet"): + markdown_content.append(f"* {paragraph.text.strip()}\n") + continue + if style.startswith("list number"): + markdown_content.append(f"1. {paragraph.text.strip()}\n") + continue + + # Handle regular paragraphs with formatting + formatted_text = "" + for run in paragraph.runs: + text = run.text + if text.strip(): + # Handle inline code (typically monospace font) + if run.font.name in [ + "Consolas", + "Courier New", + "Monaco", + ] or style.startswith("code"): + if "\n" in text: + text = f"```\n{text}\n```" + else: + text = f"`{text}`" + # Apply both bold and italic (checked first so the branch is reachable) + elif run.bold and run.italic: + text = f"***{text}***" + # Apply bold + elif run.bold: + text = f"**{text}**" + # Apply italic + elif run.italic: + text = f"*{text}*" + formatted_text += text + + if formatted_text: + markdown_content.append(f"{formatted_text}\n") + + # Add an extra newline after paragraphs + markdown_content.append("\n") + + # Write to markdown file + with open(markdown_file, "w", encoding="utf-8") as f: + f.writelines(markdown_content) + + +def clean_markdown_text(text): + """ + Clean and normalize markdown text + + Args: + text (str): Text to clean + + Returns: + str: Cleaned text + """ + # Collapse runs of spaces and tabs (newlines are kept for the next step) + text = re.sub(r"[ \t]+", " ", text) + # Remove multiple newlines + text = re.sub(r"\n\s*\n\s*\n", "\n\n", text) + return text.strip() diff --git a/dist/md2docx_python-1.0.0-py3-none-any.whl b/dist/md2docx_python-1.0.0-py3-none-any.whl new file mode 100644 index 0000000..482a23f Binary files /dev/null and b/dist/md2docx_python-1.0.0-py3-none-any.whl differ diff --git a/md2docx_python.egg-info/PKG-INFO b/md2docx_python.egg-info/PKG-INFO index ded2d20..6da7dbe 100644 --- a/md2docx_python.egg-info/PKG-INFO +++ b/md2docx_python.egg-info/PKG-INFO @@ -1,138 +1,177 @@ Metadata-Version: 2.1 Name: md2docx-python -Version:
0.3.2 +Version: 1.0.0 Summary: Markdown to Word Converter. - Simple and straight forward Python utility - that converts a Markdown file (`.md`) to a Microsoft - Word document (`.docx`). It supports multiple Markdown - elements, including headings, bold and italic text, - both unordered and ordered lists and many more. Home-page: https://github.com/shloktech/md2docx-python Author: Shlok Tadilkar Author-email: shloktadilkar@gmail.com License: MIT -Description: # Markdown to Word Converter - - ## Overview - - Simple and straight forward Python utility that converts a Markdown file (`.md`) to a Microsoft Word document (`.docx`). It supports basic Markdown elements, including headings, bold and italic text, and both unordered and ordered lists. - - #### Input .md file: - ![image](https://github.com/user-attachments/assets/c2325e52-05a7-4e11-8f28-4eeb3d8c06f5) - - #### Output .docx file: - ![image](https://github.com/user-attachments/assets/3e48a9dd-8fe3-43cc-8246-164c58e95179) - - - ## Features - - - Converts Markdown headers (`#`, `##`, `###`) to Word document headings. - - Supports bold and italic text formatting. - - Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists. - - Handles paragraphs with mixed content. - - ## Prerequisites - - You need to have Python installed on your system along with the following libraries: - - - `markdown` for converting Markdown to HTML. - - `python-docx` for creating and editing Word documents. - - `beautifulsoup4` for parsing HTML. - - Sure, let's enhance your instructions for clarity and completeness: - - --- - - ### How to Convert Markdown to Word Using `md2docx-python` - - #### Step 1: Install the Required Library - First, you need to install the `md2docx-python` library using pip. 
Open your terminal and run the following command: - - ```bash - pip install md2docx-python - ``` - - #### Step 2: Import the Library in Your Code - To use the library, import it into your Python code with the following line: - - ```python - from md2docx_python.src.md2docx_python import markdown_to_word - ``` - - #### Step 3: Convert Markdown to Word - Call the `markdown_to_word()` function to convert your Markdown file to a Word document. Here's the syntax: - - ```python - markdown_to_word(markdown_file, word_file) - ``` - - - `markdown_file`: The path to the Markdown file you want to convert. - - `word_file`: The desired path and name for the output Word document. - - #### Step 4: Sample Code - Here's a complete example to illustrate how it works: - - ```python - from md2docx_python.src.md2docx_python import markdown_to_word - - # Define the paths to your files - markdown_file = "sample_files/amazon_case_study.md" - word_file = "sample_files/amazon_case_study.docx" - - # Convert the Markdown file to a Word document - markdown_to_word(markdown_file, word_file) - ``` - - This code will create a file named `amazon_case_study.docx`, which is the conversion of `amazon_case_study.md` to the Word format. - - --- - - This should make it easier to understand and follow the steps. Let me know if you need any more help or further enhancements! - - ## Why this repo and not others ? - - Here are some reasons why this repo might be considered better or more suitable for certain use cases compared to other scripts available on the internet: - - ### 1. **Comprehensive Markdown Support** - - **Header Levels**: The script supports multiple header levels (`h1`, `h2`, `h3`), which is important for properly structuring the document. - - **Bold and Italic Text**: It handles bold (`**`) and italic (`*`) text, providing more accurate formatting in the Word document. - - ### 2. 
**Proper List Formatting** - - **Unordered and Ordered Lists**: The script correctly formats both unordered (`*`, `-`) and ordered lists (`1.`, `2.`) in the Word document. This ensures that lists appear as expected without additional line breaks or formatting issues. - - ### 3. **Use of Well-Supported Libraries** - - **Markdown to HTML Conversion**: Utilizes the `markdown` library, which is a widely used and reliable tool for converting Markdown to HTML. - - **HTML Parsing and Word Document Creation**: Employs `BeautifulSoup` for parsing HTML and `python-docx` for creating Word documents, both of which are robust and well-maintained libraries. - - ### 4. **Simplicity and Readability** - - **Clear Code Structure**: The script is designed to be straightforward and easy to understand, making it accessible for users who may want to customize or extend it. - - **Basic Markdown Elements**: Focuses on the most commonly used Markdown elements, ensuring compatibility with a wide range of Markdown files without unnecessary complexity. - - ### 5. **Customizability** - - **Easy to Modify**: Users can easily adjust the script to handle additional Markdown features or customize the output format based on their specific needs. - - **Example Usage**: Provides a clear example of how to use the script, making it easy for users to adapt it for their own files. - - ### 6. **Minimal Dependencies** - - **Lightweight and Focused**: The script relies on only a few libraries, which reduces potential conflicts and keeps the script lightweight. - - ### 7. **Handles Basic HTML Tags** - - **Text Formatting**: Properly handles bold and italic text by interpreting HTML tags (`strong`, `em`), ensuring that formatting is preserved when converting to Word. - - ### 8. 
**Privacy** - - If you are working in a corporate firm and you want to convert your markdown files to word and you use a online tool to do it then there are chances that they will store your file which can cause to a vital information leak of your company. With use of this repo you can easily do the conversion in your own system. - - ### Comparison to Other Scripts - - **Feature Set**: Some scripts may lack comprehensive support for Markdown features or may not handle lists and text formatting well. - - **Performance**: Depending on the implementation, performance might vary. This script is designed to be efficient for typical Markdown files. - - **User-Friendliness**: The clear and concise code in this script may make it more user-friendly and easier to modify compared to more complex alternatives. - - Overall, this script provides a balanced combination of functionality, simplicity, and ease of use, which can be advantageous for many users looking to convert Markdown files to Word documents. - - For any queries please start a discussion I will be happy to answer your queries :) - -Platform: UNKNOWN Classifier: License :: OSI Approved :: MIT License Classifier: Programming Language :: Python :: 3.9 Classifier: Operating System :: OS Independent Requires-Python: >=3.9.0 Description-Content-Type: text/markdown +License-File: LICENSE + +# Markdown to Word Converter + +## Overview + +Simple and straight forward Python utility that converts Markdown files (`.md`) to Microsoft Word documents (`.docx`) and vice versa. It supports multiple Markdown elements, including headings, bold and italic text, both unordered and ordered lists, and many more. 
+ +## Word to Markdown Conversion Example: +#### Input .docx file: +![image](https://github.com/user-attachments/assets/2891ebdf-ff36-4fd5-af2f-b35413264b06) + +#### Output .md file: +![image](https://github.com/user-attachments/assets/e46c096b-762e-4f0c-a0ab-f81c3069a533) + + +## Markdown to Word Conversion Example: +#### Input .md file: +![image](https://github.com/user-attachments/assets/c2325e52-05a7-4e11-8f28-4eeb3d8c06f5) + +#### Output .docx file: +![image](https://github.com/user-attachments/assets/3e48a9dd-8fe3-43cc-8246-164c58e95179) + + +## Features + +- Bi-directional conversion between Markdown and Word documents +- Handles various programming languages code given in word doc like python, ruby and more. +- Converts Markdown headers (`#`, `##`, `###`) to Word document headings and back +- Supports bold and italic text formatting +- Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists +- Handles paragraphs with mixed content +- Preserves document structure during conversion + +## Prerequisites + +You need to have Python installed on your system along with the following libraries: + +- `markdown` for converting Markdown to HTML +- `python-docx` for creating and editing Word documents +- `beautifulsoup4` for parsing HTML +- `mammoth` for converting Word to HTML + +Sure, let's enhance your instructions for clarity and completeness: + +--- + +### How to Convert Markdown to Word Using `md2docx-python` + +#### Step 1: Install the Required Library +First, you need to install the `md2docx-python` library using pip. Open your terminal and run the following command: + +```bash +pip install md2docx-python +``` + +#### Step 2: Import the Library in Your Code +To use the library, import it into your Python code with the following line: + +```python +from md2docx_python.src.md2docx_python import markdown_to_word +``` + +#### Step 3: Convert Markdown to Word +Call the `markdown_to_word()` function to convert your Markdown file to a Word document. 
Here's the syntax: + +```python +markdown_to_word(markdown_file, word_file) +``` + +- `markdown_file`: The path to the Markdown file you want to convert. +- `word_file`: The desired path and name for the output Word document. + +#### Step 4: Sample Code +Here's a complete example to illustrate how it works: + +```python +from md2docx_python.src.md2docx_python import markdown_to_word + +# Define the paths to your files +markdown_file = "sample_files/amazon_case_study.md" +word_file = "sample_files/amazon_case_study.docx" + +# Convert the Markdown file to a Word document +markdown_to_word(markdown_file, word_file) +``` + +This code will create a file named `amazon_case_study.docx`, which is the conversion of `amazon_case_study.md` to the Word format. + +--- + +#### For Converting Word to Markdown +Use the `word_to_markdown()` function to convert your Word document to Markdown: + +```python +word_to_markdown(word_file, markdown_file) +``` + +- `word_file`: The path to the Word document you want to convert +- `markdown_file`: The desired path and name for the output Markdown file + + +Here's a complete example: + +```python +from md2docx_python.src.docx2md_python import word_to_markdown + +# Define the paths to your files +word_file = "sample_files/test_document.docx" +markdown_file = "sample_files/test_document_output.md" + +# Convert the Word document to a Markdown file +word_to_markdown(word_file, markdown_file) +``` + +This code will create a file named `test_document_output.md`, which is the conversion of `test_document.docx` to the Markdown format. + +--- + +## Why this repo and not others ? + +Here are some reasons why this repo might be considered better or more suitable for certain use cases compared to other scripts available on the internet: + +### 1. **Comprehensive Markdown Support** + - **Header Levels**: The script supports multiple header levels (`h1`, `h2`, `h3`), which is important for properly structuring the document. 
+ - **Bold and Italic Text**: It handles bold (`**`) and italic (`*`) text, providing more accurate formatting in the Word document. + +### 2. **Proper List Formatting** + - **Unordered and Ordered Lists**: The script correctly formats both unordered (`*`, `-`) and ordered lists (`1.`, `2.`) in the Word document. This ensures that lists appear as expected without additional line breaks or formatting issues. + +### 3. **Use of Well-Supported Libraries** + - **Markdown to HTML Conversion**: Utilizes the `markdown` library, which is a widely used and reliable tool for converting Markdown to HTML. + - **HTML Parsing and Word Document Creation**: Employs `BeautifulSoup` for parsing HTML and `python-docx` for creating Word documents, both of which are robust and well-maintained libraries. + +### 4. **Simplicity and Readability** + - **Clear Code Structure**: The script is designed to be straightforward and easy to understand, making it accessible for users who may want to customize or extend it. + - **Basic Markdown Elements**: Focuses on the most commonly used Markdown elements, ensuring compatibility with a wide range of Markdown files without unnecessary complexity. + +### 5. **Customizability** + - **Easy to Modify**: Users can easily adjust the script to handle additional Markdown features or customize the output format based on their specific needs. + - **Example Usage**: Provides a clear example of how to use the script, making it easy for users to adapt it for their own files. + +### 6. **Minimal Dependencies** + - **Lightweight and Focused**: The script relies on only a few libraries, which reduces potential conflicts and keeps the script lightweight. + +### 7. **Handles Basic HTML Tags** + - **Text Formatting**: Properly handles bold and italic text by interpreting HTML tags (`strong`, `em`), ensuring that formatting is preserved when converting to Word. + +### 8. 
**Privacy** + - If you are working in a corporate firm and you want to convert your markdown files to word and you use a online tool to do it then there are chances that they will store your file which can cause to a vital information leak of your company. With use of this repo you can easily do the conversion in your own system. + +### 9. **Bi-directional Conversion** + - **Complete Workflow**: Convert documents in both directions, allowing for round-trip document processing + - **Format Preservation**: Maintains formatting and structure when converting between formats + - **Flexibility**: Easily switch between Markdown and Word formats based on your needs + +### Comparison to Other Scripts +- **Feature Set**: Some scripts may lack comprehensive support for Markdown features or may not handle lists and text formatting well. +- **Performance**: Depending on the implementation, performance might vary. This script is designed to be efficient for typical Markdown files. +- **User-Friendliness**: The clear and concise code in this script may make it more user-friendly and easier to modify compared to more complex alternatives. + +Overall, this script provides a balanced combination of functionality, simplicity, and ease of use, which can be advantageous for many users looking to convert Markdown files to Word documents. 
+ +For any queries please start a discussion I will be happy to answer your queries :) diff --git a/md2docx_python.egg-info/SOURCES.txt b/md2docx_python.egg-info/SOURCES.txt index eedc5c4..75d7a06 100644 --- a/md2docx_python.egg-info/SOURCES.txt +++ b/md2docx_python.egg-info/SOURCES.txt @@ -1,3 +1,4 @@ +LICENSE setup.py md2docx_python/__init__.py md2docx_python.egg-info/PKG-INFO @@ -6,4 +7,6 @@ md2docx_python.egg-info/dependency_links.txt md2docx_python.egg-info/requires.txt md2docx_python.egg-info/top_level.txt md2docx_python/src/__init__.py -md2docx_python/src/md2docx_python.py \ No newline at end of file +md2docx_python/src/docx2md_python.py +md2docx_python/src/md2docx_python.py +tests/test_markdown_to_word_converter.py \ No newline at end of file diff --git a/md2docx_python.egg-info/requires.txt b/md2docx_python.egg-info/requires.txt index 8346960..5157833 100644 --- a/md2docx_python.egg-info/requires.txt +++ b/md2docx_python.egg-info/requires.txt @@ -1,3 +1,3 @@ -markdown -python-docx -beautifulsoup4 +markdown>=3.7 +python-docx>=1.1.2 +beautifulsoup4>=4.13.3 diff --git a/md2docx_python/src/amazon_case_study.docx b/md2docx_python/src/amazon_case_study.docx new file mode 100644 index 0000000..1e0b09b Binary files /dev/null and b/md2docx_python/src/amazon_case_study.docx differ diff --git a/md2docx_python/src/docx2md_python.py b/md2docx_python/src/docx2md_python.py new file mode 100644 index 0000000..d5162fd --- /dev/null +++ b/md2docx_python/src/docx2md_python.py @@ -0,0 +1,95 @@ +from docx import Document +import re + + +def word_to_markdown(word_file, markdown_file): + """ + Convert a Word document to Markdown format + + Args: + word_file (str): Path to the input Word document + markdown_file (str): Path to the output Markdown file + """ + # Open the Word document + doc = Document(word_file) + markdown_content = [] + + for paragraph in doc.paragraphs: + # Skip empty paragraphs + if not paragraph.text.strip(): + continue + + # Get paragraph style + style = 
paragraph.style.name.lower() + + # Handle code blocks + if style.startswith("code block") or style.startswith("source code"): + markdown_content.append(f"```\n{paragraph.text.strip()}\n```\n\n") + continue + + # Handle headings + if style.startswith("heading"): + level = style[-1] # Get heading level from style name + markdown_content.append(f"{'#' * int(level)} {paragraph.text.strip()}\n") + continue + + # Handle lists + if style.startswith("list bullet"): + markdown_content.append(f"* {paragraph.text.strip()}\n") + continue + if style.startswith("list number"): + markdown_content.append(f"1. {paragraph.text.strip()}\n") + continue + + # Handle regular paragraphs with formatting + formatted_text = "" + for run in paragraph.runs: + text = run.text + if text.strip(): + # Handle inline code (typically monospace font) + if run.font.name in [ + "Consolas", + "Courier New", + "Monaco", + ] or style.startswith("code"): + if "\n" in text: + text = f"```\n{text}\n```" + else: + text = f"`{text}`" + # Apply both bold and italic (checked first so the branch is reachable) + elif run.bold and run.italic: + text = f"***{text}***" + # Apply bold + elif run.bold: + text = f"**{text}**" + # Apply italic + elif run.italic: + text = f"*{text}*" + formatted_text += text + + if formatted_text: + markdown_content.append(f"{formatted_text}\n") + + # Add an extra newline after paragraphs + markdown_content.append("\n") + + # Write to markdown file + with open(markdown_file, "w", encoding="utf-8") as f: + f.writelines(markdown_content) + + +def clean_markdown_text(text): + """ + Clean and normalize markdown text + + Args: + text (str): Text to clean + + Returns: + str: Cleaned text + """ + # Collapse runs of spaces and tabs (newlines are kept for the next step) + text = re.sub(r"[ \t]+", " ", text) + # Remove multiple newlines + text = re.sub(r"\n\s*\n\s*\n", "\n\n", text) + return text.strip() diff --git a/md2docx_python/src/helpers/word_file_creator.py b/md2docx_python/src/helpers/word_file_creator.py new file mode 100644 index 0000000..139bb3c --- /dev/null +++
b/md2docx_python/src/helpers/word_file_creator.py @@ -0,0 +1,58 @@ + +from docx import Document +from docx.shared import Pt +from docx.enum.text import WD_ALIGN_PARAGRAPH + + +def create_test_document(output_path): + """Create a test Word document with various formatting""" + doc = Document() + + # Add a title + doc.add_heading("Test Document", level=1) + + # Add some regular paragraphs + doc.add_paragraph("This is a regular paragraph with some text.") + + # Add formatted text + p = doc.add_paragraph() + p.add_run("This paragraph has ") + p.add_run("bold text, ").bold = True + p.add_run("italic text, ").italic = True + run = p.add_run("and bold-italic text.") + run.bold = True + run.italic = True + + # Add different heading levels + doc.add_heading("Heading Level 2", level=2) + doc.add_heading("Heading Level 3", level=3) + + # Add bullet points + doc.add_paragraph("First bullet point", style="List Bullet") + doc.add_paragraph("Second bullet point", style="List Bullet") + + # Add numbered list + doc.add_paragraph("First numbered item", style="List Number") + doc.add_paragraph("Second numbered item", style="List Number") + + # Add a code block + code_block = doc.add_paragraph() + code_block.style = doc.styles["Normal"] + code_run = code_block.add_run( + """def hello_world(): + print("Hello, World!") + return True""" + ) + code_run.font.name = "Consolas" + + # Add inline code + p = doc.add_paragraph("Here is some inline code: ") + code = p.add_run('print("Hello")') + code.font.name = "Courier New" + + # Save the document + doc.save(output_path) + + +if __name__ == "__main__": + create_test_document("sample_files/test_document.docx") diff --git a/md2docx_python/tests/create_test_doc.py b/md2docx_python/tests/create_test_doc.py new file mode 100644 index 0000000..9afacf1 --- /dev/null +++ b/md2docx_python/tests/create_test_doc.py @@ -0,0 +1,53 @@ +from docx import Document +from docx.shared import Pt +from docx.enum.text import WD_ALIGN_PARAGRAPH + +def 
create_test_document(output_path): + """Create a test Word document with various formatting""" + doc = Document() + + # Add a title + doc.add_heading('Test Document', level=1) + + # Add some regular paragraphs + doc.add_paragraph('This is a regular paragraph with some text.') + + # Add formatted text + p = doc.add_paragraph() + p.add_run('This paragraph has ') + p.add_run('bold text, ').bold = True + p.add_run('italic text, ').italic = True + run = p.add_run('and bold-italic text.') + run.bold = True + run.italic = True + + # Add different heading levels + doc.add_heading('Heading Level 2', level=2) + doc.add_heading('Heading Level 3', level=3) + + # Add bullet points + doc.add_paragraph('First bullet point', style='List Bullet') + doc.add_paragraph('Second bullet point', style='List Bullet') + + # Add numbered list + doc.add_paragraph('First numbered item', style='List Number') + doc.add_paragraph('Second numbered item', style='List Number') + + # Add a code block + code_block = doc.add_paragraph() + code_block.style = doc.styles['Normal'] + code_run = code_block.add_run('''def hello_world(): + print("Hello, World!") + return True''') + code_run.font.name = 'Consolas' + + # Add inline code + p = doc.add_paragraph('Here is some inline code: ') + code = p.add_run('print("Hello")') + code.font.name = 'Courier New' + + # Save the document + doc.save(output_path) + +if __name__ == "__main__": + create_test_document("test_document.docx") \ No newline at end of file diff --git a/output.md b/output.md new file mode 100644 index 0000000..ddc28a9 --- /dev/null +++ b/output.md @@ -0,0 +1,51 @@ +**Question: Which products and services is amazon** + +Based on the content of the provided file, Amazon is involved in a wide range of products and services, including: + +### 1. E-commerce: +* Started as an online bookseller. +* Expanded into music, movies, electronics, and general merchandise. +* Operates the Amazon Marketplace, allowing third-party merchants to sell products. 
+### 2. Cloud Computing: +* Amazon Web Services (AWS): Provides cloud computing services, generating significant revenue and operating income. +### 3. Subscription Services: +* Amazon Prime: Offers free two-day shipping, Prime Music (ad-free music streaming), and Prime Video (movies and TV shows). +* Prime Now: Delivery service within two hours for daily essentials. +### 4. Streaming and Digital Media: +* Prime Video: Streaming service for movies and TV shows. +* Amazon Studios: Produces original content, including feature films. +* Twitch: Live-streaming site for gamers. +### 5. Devices: +* Kindle: E-reader for digital books and movies. +* Echo: Voice-activated device with Alexa, Amazon's virtual assistant. +* Fire Phone: (Though it was a failure and discontinued). +### 6. Advertising: +* Operates a digital advertising network that generated substantial revenue. +### 7. Wholesale and Distribution: +* Amazon Business: Offers products for business clients, including supplies and mechanical parts. +### 8. Home Services: +* Amazon Home Services: Offers professional services for plumbing, electrical, pet care, etc. +### 9. Education: +* Amazon Inspire: An online marketplace for instructional materials for school teachers. +### 10. Retail and Grocery: +* Amazon Go: Cashless convenience stores. +* Whole Foods Market: Acquired for expanding into the grocery market. +### 11. Healthcare: +* Amazon Clinic: Virtual platform for connecting with healthcare providers. +* Amazon Pharmacy: Provides pharmaceutical services, expanded by acquiring PillPack. +### 12. Miscellaneous: +* Amazon Salon: A hair salon with augmented reality and new technology testing. +* Amazon AppStore: Competes with Google and Apple's app stores. +Amazon has a diverse portfolio, covering everything from online retail to cloud computing, streaming services, smart devices, and even healthcare. 
+ +**Question: Did it make sense for a technology company like amazon with stratospheric stock price to enter gross market industry with thin margin industry?** + +Yes, it can make sense for a technology company like Amazon with a high stock price to enter a thin-margin industry. Here’s why: + +1. Economies of Scale: Amazon has significant resources and scale, allowing it to achieve economies of scale that can help mitigate the impact of thin margins. The company's extensive logistics and distribution network can reduce costs and improve efficiency. +1. Diversification: Entering different industries helps Amazon diversify its revenue streams and reduce dependency on any single sector. This can be a strategic move to hedge against market fluctuations and capture new growth opportunities. +1. Market Penetration: Amazon’s entry into a thin-margin industry might be aimed at dominating or disrupting that market. Even with thin margins, a large market share can translate into significant overall revenue and profitability due to volume. +1. Long-Term Strategy: Amazon often invests in lower-margin or loss-leading areas with the expectation that these ventures will become profitable in the long run or provide strategic advantages such as increased customer data, market share, or complementary services. +1. Customer Acquisition: A thin-margin industry might be a way for Amazon to attract more customers and drive traffic to its broader ecosystem, including its higher-margin services like AWS or Prime memberships. +In summary, while thin margins present challenges, Amazon's large scale, strategic focus, and long-term vision can make it feasible for them to enter and succeed in such industries. 
+ diff --git a/requirements.txt b/requirements.txt index 722232a..5157833 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ -markdown -python-docx -beautifulsoup4 \ No newline at end of file +markdown>=3.7 +python-docx>=1.1.2 +beautifulsoup4>=4.13.3 diff --git a/sample_files/amazon_case_study.docx b/sample_files/amazon_case_study.docx index 1e0b09b..5a4b0dc 100644 Binary files a/sample_files/amazon_case_study.docx and b/sample_files/amazon_case_study.docx differ diff --git a/sample_files/test_document.docx b/sample_files/test_document.docx new file mode 100644 index 0000000..d958867 Binary files /dev/null and b/sample_files/test_document.docx differ diff --git a/sample_files/test_document_output.md b/sample_files/test_document_output.md new file mode 100644 index 0000000..d9485fd --- /dev/null +++ b/sample_files/test_document_output.md @@ -0,0 +1,19 @@ +# Test Document +This is a regular paragraph with some text. + +This paragraph has **bold text, ***italic text, ***and bold-italic text.** + +## Heading Level 2 +### Heading Level 3 +* First bullet point +* Second bullet point +1. First numbered item +1. 
Second numbered item +``` +def hello_world(): + print("Hello, World!") + return True +``` + +Here is some inline code: `print("Hello")` + diff --git a/sample_files/~$st_document.docx b/sample_files/~$st_document.docx new file mode 100644 index 0000000..03629e4 Binary files /dev/null and b/sample_files/~$st_document.docx differ diff --git a/setup.py b/setup.py index dc390c2..341c70b 100644 --- a/setup.py +++ b/setup.py @@ -7,7 +7,7 @@ setup( name='md2docx_python', - version='0.3.2', + version='1.0.0', url='https://github.com/shloktech/md2docx-python', author='Shlok Tadilkar', author_email='shloktadilkar@gmail.com', @@ -26,7 +26,7 @@ long_description=long_description, long_description_content_type='text/markdown', packages=find_packages(), - install_requires=['markdown', 'python-docx', 'beautifulsoup4'], + install_requires=['markdown>=3.7', 'python-docx>=1.1.2', 'beautifulsoup4>=4.13.3'], python_requires=">=3.9.0", ) diff --git a/test_document.docx b/test_document.docx new file mode 100644 index 0000000..09f2a4d Binary files /dev/null and b/test_document.docx differ
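
The committed `sample_files/test_document_output.md` contains `**bold text, ***italic text, ***and bold-italic text.**`. There, emphasis markers wrap each run's trailing space, and the bold-italic run is emitted as plain bold. A minimal sketch of one way the per-run formatting in `word_to_markdown()` could avoid both effects follows. The helper name `format_run` and its plain `text`/`bold`/`italic` arguments are assumptions made for illustration and are not part of this diff.

```python
def format_run(text, bold=False, italic=False):
    """Wrap text in Markdown emphasis, keeping edge whitespace outside the markers.

    Hypothetical helper, not part of the diff above.
    """
    core = text.strip()
    if not core:
        return text  # whitespace-only runs pass through unchanged
    # Combined case first, otherwise the plain-bold branch would shadow it
    if bold and italic:
        marker = "***"
    elif bold:
        marker = "**"
    elif italic:
        marker = "*"
    else:
        return text
    lead = text[: len(text) - len(text.lstrip())]   # leading whitespace
    trail = text[len(text.rstrip()):]               # trailing whitespace
    return f"{lead}{marker}{core}{marker}{trail}"


# Runs mirroring the formatted paragraph built in create_test_document()
runs = [
    ("This paragraph has ", False, False),
    ("bold text, ", True, False),
    ("italic text, ", False, True),
    ("and bold-italic text.", True, True),
]
line = "".join(format_run(t, b, i) for t, b, i in runs)
print(line)
```

Inside the converter this would presumably be called as `format_run(run.text, run.bold, run.italic)`. In python-docx, `run.bold` and `run.italic` can be `None` when the value is inherited from the style, and `None` is treated as "not set" by the truthiness checks here.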