Conversation

@erikn69
Contributor

@erikn69 erikn69 commented Dec 3, 2025

This PR adds support for PDF streams where the end token (such as endstream) appears immediately after the stream data, without any whitespace or line break.
Some PDF generators embed metadata blocks that end like this:

<?xpacket end="w"?>endstream

In this case, the parser previously failed to recognize the endstream token because it expected a whitespace separator before it. As a result, the tokenization would break and parsing of the stream would fail.
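To illustrate the failure mode described above, here is a minimal sketch in Python. It is not the library's actual tokenizer: the pattern names and the `find_stream_end` helper are hypothetical and only show the difference between requiring an end-of-line marker before endstream and treating it as optional.

```python
import re

# Hypothetical patterns for illustration; not taken from the library's code.
# STRICT_END requires an end-of-line marker before the token.
STRICT_END = re.compile(rb"(?:\r\n|\r|\n)endstream")
# LENIENT_END accepts the token with or without a preceding end-of-line marker.
LENIENT_END = re.compile(rb"(?:\r\n|\r|\n)?endstream")


def find_stream_end(buf: bytes, pattern: re.Pattern) -> int:
    """Return the offset where the stream data ends, or -1 if no end token matches."""
    m = pattern.search(buf)
    return m.start() if m else -1


# Metadata block whose last bytes run straight into the end token.
data = b'<x:xmpmeta/>\n<?xpacket end="w"?>endstream\nendobj'

strict_offset = find_stream_end(data, STRICT_END)    # no separator, so no match
lenient_offset = find_stream_end(data, LENIENT_END)  # offset of "endstream"
```

Searching for the token like this is still a guess: stream data that happens to contain the bytes `endstream` would end the stream too early, which is why a declared length is the more reliable source when it is present.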

@dealfonso
Owner

While this may work, this is a tokenization problem and it should not be solved in that part of the code... it is more a problem of reading the stream when the size is known. I don't think this is the appropriate place to patch the code. What do you think?

@erikn69
Contributor Author

erikn69 commented Dec 4, 2025

It's possible, but due to lack of time, I came up with a last-minute solution; it's working, and when something's working, it's best not to touch it. 😂😂

But the other code is also kind of "guessing" where endstream is.

@dealfonso
Owner

> It's possible, but due to lack of time, I came up with a last-minute solution; it's working, and when something's working, it's best not to touch it. 😂😂
>
> But the other code is also kind of "guessing" where endstream is.

This is true, but that is because some PDF editors do not include the size of the stream (which is not correct).

Anyway, the code in the _parse_stream function seems to perform the same task that you are proposing... why is it not working? That is the key to solving the problem.
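As a rough sketch of the approach being discussed (use the declared size first, and only guess when it is missing or wrong), something like the following could apply. It is written in Python for illustration and is not the code of _parse_stream; `read_stream`, its parameters, and the error message are all assumptions.

```python
import re
from typing import Optional

# End token with an optional single end-of-line marker in front of it.
END_TOKEN = re.compile(rb"(?:\r\n|\r|\n)?endstream")


def read_stream(buf: bytes, start: int, length: Optional[int]) -> bytes:
    """Return the stream data that begins at offset `start` in `buf`.

    `length` is the declared stream size, or None when the file omits it.
    """
    if length is not None:
        end = start + length
        # Trust the declared size only if the end token follows it,
        # either immediately or after one end-of-line marker.
        if END_TOKEN.match(buf, end):
            return buf[start:end]
    # Size missing or inconsistent: fall back to searching for the token.
    m = END_TOKEN.search(buf, start)
    if m is None:
        raise ValueError("no endstream token found after stream data")
    return buf[start:m.start()]


payload = b'<?xpacket end="w"?>'
buf = b"stream\n" + payload + b"endstream\nendobj"
start = len(b"stream\n")

with_size = read_stream(buf, start, len(payload))  # declared size is correct
without_size = read_stream(buf, start, None)       # size omitted, token search
```

With this ordering, the token search only runs for files that omit or misstate the size, so a well-formed stream whose data runs directly into endstream is read from its declared length alone.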

Development

Successfully merging this pull request may close these issues.

Invalid stream ending after upgrading to 1.5.5

2 participants