Scraping does not seem to use Unicode/UTF-8, Japanese gets garbled #2318

@vermeeren

Describe the bug

Bookmark scraping does not appear to handle Unicode (UTF-8) correctly: non-ASCII text, for example Japanese, comes out garbled.

PHP's default_charset is set to UTF-8 (the Debian default). Japanese works fine with file syncing and in other parts of Nextcloud.

To Reproduce

Add the bookmark https://www.youtube.com/watch?v=OoM0ikOi1v4 with web scraping turned on. The scraped title comes out garbled:

�Blender�������Blender����座��簡������������������������

It seems the values are already garbled when they are inserted into the database:

# from psql command line
nextcloud=# select title from oc_bookmarks;

ã\u0080\u0090Blenderã\u0080\u0091å\u0088\u009Då¿\u0083è\u0080\u0085å\u0090\u0091ã\u0081\u0091ï¼\u0081Blenderè¶\u0085å\u0085¥é\u0096\u0080è¬\u009B座ã\u0080\u0080ï½\u009Eç°¡å\u008D\u0098ã\u0081ªã\u0082»ã\u0083«ã\u0083«ã\u0083\u0083ã\u0082¯ã\u0081®ã\u0081\u0086ã\u0081\u0095ã\u0081\u008Eã\u0081®ã\u0082­ã\u0083£ã\u0083©ã\u0082¯ã\u0082¿ã\u0083¼ã\u0082\u0092ä½\u009Cã\u0082\u008Dã\u0081\u0086ï¼\u0081ï½\u009E

The PostgreSQL database uses UTF8 encoding, with en_US.UTF-8 for collate and ctype.
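The escaped output above looks like what you get when the UTF-8 bytes of the title are decoded as ISO-8859-1 (Latin-1) somewhere before the insert. That is a guess from the byte pattern, not something confirmed in the scraper code. A minimal Python sketch (the function names `garble` and `repair` are made up for illustration) reproduces the first characters of the stored value and shows that the garbling is reversible:

```python
# Hypothesis only: UTF-8 bytes are being interpreted as ISO-8859-1 (Latin-1).
# The helper names below are illustrative and not part of Nextcloud Bookmarks.


def garble(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo garble(): recover the raw bytes via Latin-1, then decode as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    prefix = "【Blender】"
    stored = garble(prefix)
    # ascii() escapes the non-ASCII characters so they can be compared
    # with the escaped psql output.
    print(ascii(stored))
    print(repair(stored))
```

"【" is U+3010, which is E3 80 90 in UTF-8. Read as Latin-1, those three bytes become "ã", U+0080 and U+0090, matching the start of the stored title. If the hypothesis holds, rows that are already garbled could in principle be repaired the same way.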

Expected behavior

【Blender】初心者向け！Blender超入門講座　～簡単なセルルックのうさぎのキャラクターを作ろう！～
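For the expected title to come through, the scraper presumably needs to decode the fetched page with the charset the server declares, falling back to UTF-8 when none is given. The following is only a sketch of that idea in Python, not the Bookmarks app's actual code (which is PHP). `charset_from_content_type` and `decode_html` are hypothetical names, and a real scraper would probably also look at a `<meta charset>` tag in the page.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type header, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_html(body: bytes, content_type: Optional[str]) -> str:
    """Decode a response body using the declared charset, defaulting to UTF-8."""
    charset = charset_from_content_type(content_type) or "utf-8"
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown or misspelled charset name: fall back to UTF-8.
        return body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    raw = "【Blender】初心者向け".encode("utf-8")
    print(decode_html(raw, "text/html; charset=UTF-8"))
    print(decode_html(raw, "text/html"))
```

The fallback is UTF-8 rather than Latin-1 because Latin-1 accepts any byte sequence without error, so a wrong decode silently produces garbled text like the database output above.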

Screenshots

Rendering in the bookmarks UI in Firefox.

Image

Desktop (please complete the following information):

  • OS: Debian Linux
  • Browser: Firefox
  • Version: ESR 128

Additional context

Web server error log

Nothing shows up in logs.

Nextcloud log (nextcloud/data/nextcloud.log)

Nothing shows up in logs.

Browser log

Not sure about this.
