UnicodeDecodeError due to non-ASCII chars in key #72

@jakubgs

I've run into glacier-cli failing because git-annex, when using the SHA256E backend, appends anything that looks like a file extension to the key. As a result, some keys end up with trailing characters that look like an extension but are not actually part of one, and those characters can be non-ASCII.

Example:

 % ls 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3 
12. Change The World (feat. 웅산).mp3
 % git annex info 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3
file: 12. Change The World (feat. 웅산).mp3
size: 7.48 megabytes
key: SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.웅산.mp3
present: true
 % git annex calckey 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3
SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.웅산.mp3

I've opened an issue with git-annex here:
https://git-annex.branchable.com/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/

There will be a fix for the case with brackets, but there are other cases in which a file extension might not be pure ASCII. When that happens, this is the result:

% git annex copy 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3 --to glacier
copy 12. Change The World (feat. 웅산).mp3 (checking glacier...) Traceback (most recent call last):
  File "/usr/local/bin/glacier", line 737, in <module>
    main() 
  File "/usr/local/bin/glacier", line 733, in main
    App().main()
  File "/usr/local/bin/glacier", line 719, in main
    self.args.func()
  File "/usr/local/bin/glacier", line 600, in archive_checkpresent
    self.args.vault, self.args.name)
  File "/usr/local/bin/glacier", line 161, in get_archive_last_seen
    result = self._get_archive_query_by_ref(vault, ref).one()
  File "/usr/local/bin/glacier", line 136, in _get_archive_query_by_ref
    if ref.startswith('id:'):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 83: ordinal not in range(128)
(user error (glacier ["--region=eu-west-1","archive","checkpresent","music","--quiet","SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.\50885\49328.mp3"] exited 1)) failed
git-annex: copy: 1 failed

As the bug report says, you can work around this by changing your backend from SHA256E to SHA256, which does not add extensions to keys. But I think addressing this issue here would be good anyway.
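
The failure in the traceback above can be reproduced outside glacier-cli. What follows is a minimal sketch, not glacier-cli's actual code: it assumes the key arrives as UTF-8 encoded bytes and that something then decodes those bytes as ASCII (the traceback suggests this happens implicitly at `ref.startswith('id:')`). The helper `to_text` is hypothetical and only illustrates one possible approach, which is decoding the argument explicitly before comparing it.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the ASCII decode failure from the traceback and
# shows an explicit UTF-8 decode as one possible way around it.

KEY = (u"SHA256E-s7479642--"
       u"957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09"
       u".웅산.mp3")

# Assumption: the key reaches the script as UTF-8 bytes.
raw = KEY.encode("utf-8")


def to_text(ref, encoding="utf-8"):
    """Hypothetical helper: return ref as text, decoding bytes explicitly."""
    if isinstance(ref, bytes):
        return ref.decode(encoding)
    return ref


# Decoding as ASCII fails at the first byte of the non-ASCII "extension".
try:
    raw.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = (exc.start, bytearray(raw)[exc.start])

print(failure)                           # offset and value of the offending byte
print(to_text(raw).startswith(u"id:"))   # comparison works once decoded
```

The offset and byte reported here match the error message above (byte 0xec in position 83), which is the first UTF-8 byte of 웅 right after the checksum and the dot.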
