fix: preserve CJK/Unicode characters in directory listing for non-browser clients #970

Open
zlvb wants to merge 1 commit into http-party:master from zlvb:master

zlvb commented Apr 6, 2026

Problem

When browsing directory listings from non-browser HTTP clients (e.g. Nintendo Switch DBI), CJK characters (Chinese, Japanese, Korean) in file and directory names are displayed as HTML entity codes like &#x5DE8;&#x4E1D;&#x4E8B; instead of the actual characters (巨丝事).

This happens because he.encode() converts all non-ASCII characters into numeric HTML entities (e.g. 巨 becomes &#x5DE8;). Standard web browsers decode these entities automatically, so the issue is invisible in browsers. However, lightweight HTTP clients that don't implement full HTML parsing display the raw entity codes, making CJK filenames completely unreadable.
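
To make the effect concrete, here is a minimal sketch that imitates the output described above. encodeNonAsciiAsEntities() is a hypothetical stand-in written for this illustration; it is not the he library's code and only mimics the visible result for non-ASCII input.

```javascript
// Hypothetical stand-in: imitates the visible effect of entity-encoding
// every non-ASCII character. NOT the he library's implementation.
function encodeNonAsciiAsEntities(text) {
  let out = '';
  for (const ch of text) {            // iterate by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    out += cp > 0x7e ? `&#x${cp.toString(16).toUpperCase()};` : ch;
  }
  return out;
}

// What a directory listing entry looks like after this kind of encoding.
const listingEntry = encodeNonAsciiAsEntities('巨丝事.txt');
console.log(listingEntry); // &#x5DE8;&#x4E1D;&#x4E8B;.txt
```

A browser turns that string back into 巨丝事.txt when it parses the HTML; a client that prints the response body verbatim shows the entity codes instead.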

Root Cause

  1. he.encode() over-escapes non-ASCII characters: The he library's encode() function converts all non-ASCII characters (including CJK) to &#x...; HTML entities. Non-browser clients display these entities as-is.

  2. Missing charset=utf-8 in Content-Type header: The directory listing response set Content-Type to text/html without a charset, so clients that don't read the <meta charset> tag may decode the body with the wrong encoding.

  3. ensureUriEncoded() function bug: An early return text statement on line 39 made the URL encoding logic after it unreachable, so it never executed and non-ASCII characters in redirect URLs remained unencoded.
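
The third cause follows a common pattern, sketched below. This is a hypothetical reconstruction: the real body of ensureUriEncoded() in lib/core/index.js is not shown in this description, and encodeURI() is used here only as a stand-in for whatever encoding logic the function actually contains.

```javascript
// Hypothetical reconstruction of the bug pattern, not the project's code.
function ensureUriEncodedBuggy(text) {
  return text;              // early return: the input comes back untouched
  return encodeURI(text);   // unreachable stand-in for the real encoding logic
}

// Same function with the early return removed, so the encoding runs.
function ensureUriEncodedFixed(text) {
  return encodeURI(text);   // stand-in; the real logic may differ
}

console.log(ensureUriEncodedBuggy('/巨/')); // /巨/
console.log(ensureUriEncodedFixed('/巨/')); // /%E5%B7%A8/
```

JavaScript does not raise an error for statements placed after a return, which is why a leftover line like this can go unnoticed until a non-ASCII path is requested.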

Fix

  • Replace he.encode() with a custom escapeHtml() function that only escapes the 5 HTML-unsafe characters (& < > " ') while preserving CJK/Unicode characters as raw UTF-8.
  • Add charset=utf-8 to the Content-Type response header for directory listings.
  • Remove the early return text line in ensureUriEncoded() so the URL encoding logic after it is reached and works correctly.
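
The first two points can be sketched as follows. The PR names escapeHtml() but its body is not reproduced in this description, so the implementation below is an assumption; sendListing() and its names parameter are hypothetical helpers added only to show the header and the escaping together. The res argument is assumed to behave like Node's http.ServerResponse.

```javascript
// Assumed mapping for the 5 HTML-unsafe characters listed above.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Sketch of an escapeHtml(): escapes only & < > " ' and leaves every other
// character, including CJK, as-is.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical helper: writes a tiny listing with an explicit charset.
function sendListing(res, names) {
  const items = names.map((name) => `<li>${escapeHtml(name)}</li>`).join('\n');
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  res.end(`<!doctype html>\n<meta charset="utf-8">\n<ul>\n${items}\n</ul>\n`);
}

console.log(escapeHtml('巨丝事 <a & b>.txt')); // 巨丝事 &lt;a &amp; b&gt;.txt
```

Escaping only these five characters keeps the listing safe against markup injection from file names while the names themselves travel as raw UTF-8, which the charset parameter tells the client how to decode.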

Files Changed

  • lib/core/show-dir/index.js - escapeHtml() + charset fix
  • lib/core/index.js - ensureUriEncoded() bug fix

…ents (e.g. Switch DBI)

- Replace he.encode() with escapeHtml() to avoid converting CJK characters to HTML entities (&#x...;), which non-browser HTTP clients cannot decode

- Add charset=utf-8 to Content-Type header in directory listing responses

- Fix ensureUriEncoded() bug where unreachable code prevented proper URL encoding of non-ASCII paths