Contributor

@erikn69 erikn69 commented Dec 4, 2025

This is more of a problem with reading the stream when the size is known.

Alternative for #120, without guessing endstream (#120 (comment), #120 (comment))

This PR adds support for PDF streams where the end tokens (such as endstream) appear immediately after the stream data, without any whitespace or line break.
Some PDF generators embed metadata blocks that end like this:

<?xpacket end="w"?>endstream

In this case, the parser previously failed to recognize the endstream token because it expected a whitespace separator before it. As a result, the tokenization would break and parsing of the stream would fail.
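
To illustrate the failure mode, here is a minimal sketch. The two patterns below are made up for the example and are not the parser's actual code; they only show how requiring a whitespace separator misses the token when it follows the data directly.

```php
<?php
// Hypothetical illustration; these patterns are NOT taken from the library.
$data = '<?xpacket end="w"?>endstream';

// Requires at least one whitespace character before the token.
$strict = '/\sendstream$/';

// Allows the token to follow the stream data directly.
$lenient = '/\s*endstream$/';

var_dump(preg_match($strict, $data));  // int(0): token not recognized
var_dump(preg_match($lenient, $data)); // int(1): token recognized
```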

}
$stream_content .= $this->_c;
/// Get /Length from parsed dict
$length = $obj['Length']->get_int();
Owner

This code fails in some cases because the Length key is not always present; $obj['Length'] is then null, and the call raises an exception.
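
A guard along these lines would avoid the failure. This is only a sketch: IntValue and declared_length() are hypothetical names, and the dictionary is modeled as a plain array, which may not match the parser's real types.

```php
<?php
// Hypothetical stand-in for the library's integer value type.
final class IntValue
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function get_int(): int
    {
        return $this->value;
    }
}

/**
 * Return the declared stream length, or null when /Length is missing.
 * $dict is a plain array here; the real dictionary type may differ.
 */
function declared_length(array $dict): ?int
{
    $entry = $dict['Length'] ?? null;
    return $entry instanceof IntValue ? $entry->get_int() : null;
}

var_dump(declared_length(['Length' => new IntValue(42)])); // int(42)
var_dump(declared_length([]));                             // NULL
```

The caller can then decide what to do when null comes back instead of hitting an error on a missing key.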

Contributor Author

What would be your suggestion?

Owner

I would debug why the existing solution does not work and try to fix the issue in the appropriate function.

Contributor Author

I tried just advancing the position, and it seems to work. Can you confirm?

Owner

I haven't tested it yet because I'm not currently in front of my computer, but following the code, it looks very good. It would be nice if someone other than me could test it. If not, I will test it in a few hours.

@angeljqv angeljqv Dec 4, 2025

It would be better to use Length if it exists and guess only if it doesn't.
Is a false positive possible when searching for the end tokens?
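
A rough sketch of what I mean, and of the false positive I am worried about. read_stream() is a hypothetical helper, not a function from the library, and it works on a plain string rather than the parser's buffer.

```php
<?php
/**
 * Extract stream bytes from $buf starting at $start.
 * Uses $length when it is known; otherwise falls back to searching
 * for the first "endstream" token (the guessing path).
 */
function read_stream(string $buf, int $start, ?int $length): string
{
    if ($length !== null) {
        return substr($buf, $start, $length);
    }
    $end = strpos($buf, 'endstream', $start);
    if ($end === false) {
        throw new RuntimeException('endstream not found');
    }
    return substr($buf, $start, $end - $start);
}

// A payload that happens to contain the token itself.
$payload = "abc endstream xyz";
$buf = $payload . "\nendstream";

// With the declared length, the whole payload is returned.
var_dump(read_stream($buf, 0, strlen($payload)) === $payload); // bool(true)

// Without it, the search stops at the first occurrence: a false positive.
var_dump(read_stream($buf, 0, null)); // string(4) "abc "
```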

} else if (preg_match('/endobj\s$/', $this->_buffer->substratpos(7))) {
$stream_content .= $this->_c;
$this->nextchar();
break;
Contributor

It works for me.
It makes sense, because the branch does a break, so the while loop's next ->nextchar() call is never executed.
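
Here is a tiny model of that behaviour. consume() is a hypothetical function written for this comment, not the library's parser; it only shows that a break skips the advance at the bottom of the loop, so the position has to be advanced explicitly before breaking.

```php
<?php
// Hypothetical miniature of a character loop; NOT the library's code.
function consume(string $buf, string $token, bool $advanceBeforeBreak): int
{
    $pos = 0;
    $seen = '';
    while ($pos < strlen($buf)) {
        $seen .= $buf[$pos];
        if (substr($seen, -strlen($token)) === $token) {
            if ($advanceBeforeBreak) {
                $pos++; // step past the last character of the token
            }
            break;      // skips the $pos++ at the bottom of the loop
        }
        $pos++;
    }
    return $pos;
}

// Without the explicit advance, the position stays on the final "j".
var_dump(consume("xendobj\n", 'endobj', false)); // int(6)

// With it, the position moves on to the character after the token.
var_dump(consume("xendobj\n", 'endobj', true));  // int(7)
```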

Please release it

Development

Successfully merging this pull request may close these issues.

Invalid stream ending after upgrading to 1.5.5

4 participants