Importing serialized meta data containing CRLF line endings fails when the importer uses WXR_Parser_SimpleXML #178

@angryaxi

Description

When a post whose serialized meta data contains CRLF line endings is imported while the SimpleXML PHP extension is in use, unserialize() emits an error and the meta value is missing from the database.

unserialize(): Error at offset 185 of 185 bytes
    wp-includes/functions.php:650
    unserialize()
    wp-includes/functions.php:650
    maybe_unserialize()
    wp-content/plugins/wordpress-importer/class-wp-import.php:891
    WP_Import->process_posts()
    wp-content/plugins/wordpress-importer/class-wp-import.php:89
    WP_Import->import()
    wp-content/plugins/wordpress-importer/class-wp-import.php:65
    WP_Import->dispatch()
    wp-admin/admin.php:364

It seems that the SimpleXML PHP extension replaces each \r\n (0x0D 0x0A) sequence with \n (0x0A). Every removed \r shortens the data by one byte, so the string length recorded inside the serialized data no longer matches the actual string, and unserialize() fails because of it.
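
The length mismatch can be illustrated without WordPress or any XML parser. This is only a minimal sketch: the str_replace() call stands in for whatever the parser does to the line endings, and the sample array is made up for the illustration.

```php
<?php
// Serialize a value whose string contains a CRLF line ending.
$original   = [ 'html_markup' => "<h2>CRLF</h2>\r\nLorem ipsum" ];
$serialized = serialize( $original );

// Simulate the normalization described above: every CRLF becomes LF.
// The "s:N:" length prefix inside the payload is left untouched.
$normalized = str_replace( "\r\n", "\n", $serialized );

// The string is now one byte shorter than its declared length,
// so unserialize() cannot find the closing quote where it expects it.
$intact  = unserialize( $serialized );
$damaged = @unserialize( $normalized ); // @ hides the warning for this demo

var_dump( $intact === $original );                          // bool(true)
var_dump( $damaged );                                       // bool(false)
var_dump( strlen( $serialized ) - strlen( $normalized ) );  // int(1)
```

The unmodified payload round-trips, while the payload with the \r stripped is rejected even though only a single byte changed.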

Judging by the date of this Stack Overflow question, this has been the case for at least 10 years.
https://stackoverflow.com/questions/27871572/php-simplexml-modifies-line-break-characters-in-cdata-elements
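
The behavior described in that question can probably be checked in isolation with a few lines of PHP. This is a minimal sketch that assumes the SimpleXML extension is loaded; the element names root and value are made up for the illustration and have nothing to do with the WXR format.

```php
<?php
// Hypothetical minimal document with a CRLF inside a CDATA section.
$text = "line one\r\nline two";
$xml  = '<root><value><![CDATA[' . $text . ']]></value></root>';

$doc   = simplexml_load_string( $xml );
$value = (string) $doc->value;

// Compare the byte length before and after parsing and look for a surviving CR.
printf( "input length:  %d\n", strlen( $text ) );
printf( "parsed length: %d\n", strlen( $value ) );
printf( "contains CR:   %s\n", strpos( $value, "\r" ) !== false ? 'yes' : 'no' );
```

If the parsed length is one byte shorter than the input and no CR is left, the parser has normalized the line ending, which is enough to invalidate a serialized string length.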

The import is successful when the SimpleXML extension is unavailable and the XML extension is used instead.
A quick way to test this without touching the PHP configuration files is to temporarily edit class-wxr-parser.php so that the SimpleXML branch is never taken.

/* Line 15: */ if ( false && extension_loaded( 'simplexml' ) ) { ... }
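
As a possible mitigation on the consuming side, a failed unserialize() could be retried with the line endings converted back to CRLF. The helper below is hypothetical and is not part of WordPress or the importer. It is only a sketch, and it can only recover values in which every line ending was originally CRLF; values with mixed line endings would still fail.

```php
<?php
/**
 * Hypothetical helper: try unserialize() as-is, then retry once with
 * every LF converted to CRLF. Returns false if both attempts fail.
 */
function unserialize_with_crlf_retry( string $value ) {
    $result = @unserialize( $value );
    // 'b:0;' is the serialized form of false, so false is a valid result there.
    if ( false !== $result || 'b:0;' === $value ) {
        return $result;
    }

    // Collapse any CRLF that survived first, so no "\r\r\n" is produced.
    $lf_only = str_replace( "\r\n", "\n", $value );

    return @unserialize( str_replace( "\n", "\r\n", $lf_only ) );
}

// Usage: simulate a payload whose CRLF line endings were turned into LF.
$original = [ 'html_markup' => "a\r\nb\r\n\r\nc" ];
$damaged  = str_replace( "\r\n", "\n", serialize( $original ) );

var_dump( unserialize_with_crlf_retry( $damaged ) === $original );
```

Guessing the original line endings is a workaround at best. Preserving the bytes during parsing or export would be the more robust fix.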

Steps to reproduce this bug

  1. Create a new WordPress installation.
  2. Run the script provided below with wp eval-file filename.php to create the sample posts.
  3. Observe two rows being added to the wp_postmeta table.
  4. Create an export WXR file that contains these posts.
  5. Move the two posts to the trash and then delete them permanently.
  6. Observe the two rows vanishing from the wp_postmeta table as the posts were deleted.
  7. Import the WXR file that you just created. An error is emitted from the unserialize function.
  8. Observe that the meta data containing CRLF line endings is missing from the database. The meta data with LF line endings has been imported successfully.

Sample data creation script

<?php
if (!defined('WP_CLI')) die('NOT RUN FROM CLI');

if (!extension_loaded('simplexml')) {
    WP_CLI::error('SimpleXML php extension is not loaded! The bug this issue is describing will not manifest!');
} else {
    WP_CLI::log('SimpleXML php extension is available');
}

if (!extension_loaded('xml')) {
    WP_CLI::log('XML php extension is not loaded.');
} else {
    WP_CLI::log('XML php extension is available');
}

$post_crlf = wp_insert_post([
    'post_author' => 1,
    'post_title'  => 'A post with serialized metadata - CRLF',
    'post_status' => 'publish',
    'meta_input'  => [
        'example_meta' => [
            'boolean'     => true,
            'integer'     => 42,
            'html_markup' => "<h2>CRLF</h2>\r\nLorem ipsum dolor sit amet\r\n\r\nQuisque ligula eros ullamcorper quis, lacinia quis facilisis sed sapien."
        ]
    ]
], true);

if (is_wp_error($post_crlf)) {
    WP_CLI::error('An error happened when inserting the CRLF post');
} else {
    WP_CLI::success("Successfully inserted the CRLF post to the database with an ID: $post_crlf");
}

$post_lf = wp_insert_post([
    'post_author' => 1,
    'post_title'  => 'A post with serialized metadata - LF',
    'post_status' => 'publish',
    'meta_input'  => [
        'example_meta' => [
            'boolean'     => true,
            'integer'     => 42,
            'html_markup' => "<h2>LF</h2>\nLorem ipsum dolor sit amet\n\nQuisque ligula eros ullamcorper quis, lacinia quis facilisis sed sapien."
        ]
    ]
], true);

if (is_wp_error($post_lf)) {
    WP_CLI::error('An error happened when inserting the LF post');
} else {
    WP_CLI::success("Successfully inserted the LF post to the database with an ID: $post_lf");
}

Environment information

PHP Version 8.3.16
WordPress 6.7.1
MySQL 8.0.33
Apple M1 Pro ARM64
macOS Sonoma 14.7.2 (23H311)

Screenshots

The wp_postmeta table after the posts have been created with the script.
Image

The wp_postmeta table after the posts were deleted and the WXR file was imported.
Image
