A high-performance XML parser for Python based on Cython and pugixml, providing fast XML parsing, manipulation, XPath queries, text extraction, and advanced XML processing capabilities.
pygixml delivers exceptional performance compared to other XML libraries:
| Library | Parsing Time | Speedup vs ElementTree |
|---|---|---|
| pygixml | 0.00077s | 15.9x faster |
| lxml | 0.00407s | 3.0x faster |
| ElementTree | 0.01220s | 1.0x (baseline) |
- 15.9x faster than Python's ElementTree for XML parsing
- 5.3x faster than lxml for XML parsing
- Memory efficient - uses pugixml's optimized C++ memory management
- Scalable performance - maintains speed advantage across different XML sizes
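As a quick sanity check, the speedup figures can be recomputed from the rounded timings in the table. This is a minimal sketch using only the three published numbers; the variable names are illustrative. The rounded timings give roughly 15.8x for pygixml, so the published 15.9x presumably comes from the unrounded measurements.

```python
# Parsing times (seconds) copied from the table above.
timings = {"pygixml": 0.00077, "lxml": 0.00407, "ElementTree": 0.01220}

# Speedup of each library relative to the ElementTree baseline.
baseline = timings["ElementTree"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x vs ElementTree")

# Speedup of pygixml relative to lxml.
lxml_over_pygixml = timings["lxml"] / timings["pygixml"]
print(f"pygixml vs lxml: {lxml_over_pygixml:.1f}x")
```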
pip install pygixml

pip install git+https://github.com/MohammadRaziei/pygixml.git

- Node selection: //book, /library/book, book[1]
- Attribute selection: book[@id], book[@category='fiction']
- Boolean operations: and, or, not()
- Comparison operators: =, !=, <, >, <=, >=
- Mathematical operations: +, -, *, div, mod
- Functions: position(), last(), count(), sum(), string(), number()
- Axes: child::, attribute::, descendant::, ancestor::
- Wildcards: *, @*, node()

- XMLDocument: Create, parse, save XML documents
- XMLNode: Navigate and manipulate XML nodes
- XMLAttribute: Handle XML attributes
- XPathQuery: Compile and execute XPath queries
- XPathNode: Result of XPath queries (wraps nodes and attributes)
- XPathNodeSet: Collection of XPath results

XMLDocument:

- parse_string(xml_string) - Parse XML from string
- parse_file(file_path) - Parse XML from file
- save_file(file_path) - Save XML to file
- append_child(name) - Add child node
- first_child() - Get first child node
- child(name) - Get child by name
- reset() - Clear document

XMLNode:

- name - Get/set node name
- value - Get/set node value (for text nodes only)
- child_value(name) - Get text content of child node
- append_child(name) - Add child node
- first_child() - Get first child
- child(name) - Get child by name
- next_sibling - Get next sibling
- previous_sibling - Get previous sibling
- parent - Get parent node
- text(recursive, join) - Get text content
- to_string(indent) - Serialize to XML string
- xml - XML representation property
- xpath - Absolute XPath of node
- is_null() - Check if node is null
- mem_id - Memory identifier for debugging

XPath methods:

- select_nodes(query) - Select multiple nodes using XPath
- select_node(query) - Select single node using XPath
- XPathQuery(query) - Create reusable XPath query object
- evaluate_node_set(context) - Evaluate query and return node set
- evaluate_node(context) - Evaluate query and return first node
- evaluate_boolean(context) - Evaluate query and return boolean
- evaluate_number(context) - Evaluate query and return number
- evaluate_string(context) - Evaluate query and return string

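The methods parse_file and save_file are listed above but do not appear in the usage examples. The following is a minimal round-trip sketch, not taken from the project itself. It assumes that parse_file is exposed at module level in the same way as parse_string; the helper name round_trip and the file name library.xml are hypothetical. The import is guarded so the sketch is skipped where pygixml is not installed.

```python
import os
import tempfile

try:
    import pygixml
except ImportError:  # lets the sketch load where pygixml is not installed
    pygixml = None


def round_trip(xml_string):
    """Parse a string, save it to a temporary file, and parse the file back."""
    doc = pygixml.parse_string(xml_string)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "library.xml")  # hypothetical file name
        doc.save_file(path)
        # Assumption: parse_file is a module-level function, like parse_string.
        reloaded = pygixml.parse_file(path)
    return reloaded


if pygixml is not None:
    reloaded = round_trip('<library><book id="1"><title>1984</title></book></library>')
    # Navigate the reloaded document with the same calls used for parsed strings.
    title = reloaded.first_child().child("book").child("title").child_value()
    print(title)
```

If the assumptions hold, the printed title should match the one in the input string, showing that the saved file preserves the document content.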
import pygixml
# Parse XML from string
xml_string = """
<library>
<book id="1">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
</book>
</library>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Access elements
book = root.first_child()
title = book.child("title")
print(f"Title: {title.child_value()}") # Output: Title: The Great Gatsby
# Create new XML
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
product = root.append_child("product")
product.name = "product"
# To add text content to an element, append a text node
text_node = product.append_child("") # Empty name creates text node
text_node.value = "content"

import pygixml
xml_string = """
<root>
<simple>Hello World</simple>
<nested>
<child>Child Text</child>
More text
</nested>
<mixed>Text <b>with</b> mixed <i>content</i></mixed>
</root>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Get direct text content
simple = root.child("simple")
print(simple.child_value()) # "Hello World"
# Get recursive text content
nested = root.child("nested")
print(nested.text(recursive=True)) # "Child Text\nMore text"
# Get direct text only (non-recursive)
mixed = root.child("mixed")
print(mixed.text(recursive=False)) # "Text "
# Custom join character
print(nested.text(recursive=True, join=" | ")) # "Child Text | More text"

import pygixml
doc = pygixml.XMLDocument()
root = doc.append_child("root")
child = root.append_child("item")
child.name = "product"
# Serialize to string
print(root.to_string()) # <root>\n <product/>\n</root>
print(root.to_string(" ")) # Custom indentation
# Convenience property
print(root.xml) # Same as to_string() with default indent

import pygixml
xml_string = """
<root>
<item>First</item>
<item>Second</item>
<item>Third</item>
</root>
"""
doc = pygixml.parse_string(xml_string)
# Iterate over document (depth-first)
for node in doc:
    print(f"Node: {node.name}, XPath: {node.xpath}")
# Iterate over children
root = doc.first_child()
for child in root:
    print(f"Child: {child.name}, Value: {child.child_value()}")

import pygixml
doc = pygixml.parse_string("<root><a/><b/></root>")
root = doc.first_child()
a = root.child("a")
b = root.child("b")
a2 = root.child("a")
print(a == a2) # True - same node
print(a == b) # False - different nodes
print(a.mem_id) # Memory address for debugging

pygixml provides full XPath 1.0 support through pugixml's powerful XPath engine:

import pygixml
xml_string = """
<library>
<book id="1" category="fiction">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
<price>12.99</price>
</book>
<book id="2" category="fiction">
<title>1984</title>
<author>George Orwell</author>
<year>1949</year>
<price>10.99</price>
</book>
</library>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Select all books
books = root.select_nodes("book")
print(f"Found {len(books)} books")
# Select fiction books
fiction_books = root.select_nodes("book[@category='fiction']")
print(f"Found {len(fiction_books)} fiction books")
# Select specific book by ID
book_2 = root.select_node("book[@id='2']")
if book_2:
    title = book_2.node.child("title").child_value()
    print(f"Book ID 2: {title}")
# Use XPathQuery for repeated queries
query = pygixml.XPathQuery("book[year > 1930]")
recent_books = query.evaluate_node_set(root)
print(f"Found {len(recent_books)} books published after 1930")
# XPath boolean evaluation
has_orwell = pygixml.XPathQuery("book[author='George Orwell']").evaluate_boolean(root)
print(f"Has George Orwell books: {has_orwell}")
# XPath number evaluation
avg_price = pygixml.XPathQuery("sum(book/price) div count(book)").evaluate_number(root)
print(f"Average price: ${avg_price:.2f}")

In pugixml (and therefore pygixml), element nodes do not have values directly. Instead, they contain child text nodes that hold the text content.

# ❌ This will NOT work (element nodes don't have values):
element_node.value = "some text"
# ✅ Correct approach - use child_value() to get text content:
text_content = element_node.child_value()
# ✅ To set text content, you need to append a text node:
text_node = element_node.append_child("") # Empty name creates text node
text_node.value = "some text"

Run performance comparisons:

# Run complete benchmark suite
python benchmarks/clean_visualization.py
# View results
cat benchmarks/results/benchmark_results.csv

The benchmark suite compares pygixml against:

- lxml - Industry-standard C-based parser
- xml.etree.ElementTree - Python standard library
Benchmark Files:

- benchmarks/clean_visualization.py - Main benchmark runner
- benchmarks/benchmark_parsing.py - Core benchmark logic
- benchmarks/results/ - Generated CSV data and SVG charts

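To give a rough idea of what a parsing comparison of this kind measures, here is a hedged sketch built on the standard library's timeit. It is not the project's benchmarks/benchmark_parsing.py. The synthetic document, the helper names (make_xml, collect_parsers, benchmark), and the repeat counts are all illustrative assumptions. lxml and pygixml are timed only if they are installed.

```python
import timeit
import xml.etree.ElementTree as ET


def make_xml(n_items):
    """Build a synthetic document with n_items <item> elements."""
    items = "".join(
        f'<item id="{i}"><name>Item {i}</name></item>' for i in range(n_items)
    )
    return f"<root>{items}</root>"


def collect_parsers():
    """Return {label: parse_function}; optional libraries are skipped if missing."""
    parsers = {"ElementTree": ET.fromstring}
    try:
        from lxml import etree as lxml_etree
        parsers["lxml"] = lxml_etree.fromstring
    except ImportError:
        pass
    try:
        import pygixml
        parsers["pygixml"] = pygixml.parse_string
    except ImportError:
        pass
    return parsers


def benchmark(xml_text, repeat=5, number=20):
    """Best per-call parse time in seconds for each available parser."""
    results = {}
    for label, parse in collect_parsers().items():
        runs = timeit.repeat(lambda: parse(xml_text), repeat=repeat, number=number)
        results[label] = min(runs) / number
    return results


xml_text = make_xml(1000)
results = benchmark(xml_text)
baseline = results["ElementTree"]
# Print the fastest parser first, with its speedup over ElementTree.
for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{label:12s} {seconds:.5f}s  {baseline / seconds:.1f}x vs ElementTree")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute numbers depend on the machine and the document, so they will not match the table above exactly.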
📖 Full documentation is available at: https://mohammadraziei.github.io/pygixml/
The documentation includes:
- Complete API reference with examples
- Installation guides for all platforms
- Performance benchmarks and optimization tips
- XPath 1.0 usage guide with comprehensive examples
- Real-world usage scenarios
MIT License - see LICENSE file for details.
To use this library, you must star the project on GitHub!
This helps support development and shows appreciation for the work. Please star the repository before using the library.

- pugixml - Fast and lightweight C++ XML processing library
- Cython - C extensions for Python
- scikit-build - Modern Python build system