fix: preserve CJK/Unicode characters in directory listing for non-browser clients #970
Open
zlvb wants to merge 1 commit into http-party:master from
…ents (e.g. Switch DBI)
- Replace he.encode() with escapeHtml() to avoid converting CJK characters to HTML entities (&#x...;), which non-browser HTTP clients cannot decode
- Add charset=utf-8 to Content-Type header in directory listing responses
- Fix ensureUriEncoded() bug where unreachable code prevented proper URL encoding of non-ASCII paths
Problem
When browsing directory listings from non-browser HTTP clients (e.g. Nintendo Switch DBI), CJK characters (Chinese, Japanese, Korean) in file and directory names are displayed as HTML entity codes like &#x5DE8;&#x4E1D;&#x4E8B; instead of the actual characters.

This happens because he.encode() converts all non-ASCII characters into numeric HTML entities (e.g. 巨 → &#x5DE8;). Standard web browsers decode these entities automatically, so the issue is invisible in browsers. However, lightweight HTTP clients that don't implement full HTML parsing display the raw entity codes, making CJK filenames completely unreadable.

Root Cause
- he.encode() over-escapes non-ASCII characters: The he library's encode() function converts all non-ASCII characters (including CJK) to &#x...; HTML entities. Non-browser clients display these entities as-is.
- Missing charset=utf-8 in Content-Type header: The directory listing response header was set to text/html without specifying a charset, which can cause encoding issues in clients that don't sniff the <meta charset> tag.
- ensureUriEncoded() function bug: A premature return statement on line 39 made the URL encoding logic unreachable, so non-ASCII characters in redirect URLs remained unencoded.

Fix
- Replace he.encode() with a custom escapeHtml() function that only escapes the 5 HTML-unsafe characters (& < > " ') while preserving CJK/Unicode characters as raw UTF-8.
- Add charset=utf-8 to the Content-Type response header for directory listings.
- Remove the premature return text line in ensureUriEncoded() so URL encoding works correctly.

Files Changed
- lib/core/show-dir/index.js - escapeHtml() + charset fix
- lib/core/index.js - ensureUriEncoded() bug fix
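
For reviewers, the escaping approach described under Fix can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the HTML_ESCAPES table, the function body, and the LISTING_CONTENT_TYPE constant are assumptions made for the example, and the actual escapeHtml() in lib/core/show-dir/index.js may differ.

```javascript
// Sketch only: escape the 5 HTML-unsafe characters and leave everything
// else (including CJK) untouched. Names here are illustrative assumptions.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(text) {
  // Single pass, so an already-replaced "&" is never escaped twice.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical constant: the header value with an explicit charset,
// so clients do not have to sniff <meta charset>.
const LISTING_CONTENT_TYPE = 'text/html; charset=utf-8';

console.log(escapeHtml('巨丝事 <a&b>.txt'));
// prints: 巨丝事 &lt;a&amp;b&gt;.txt
```

Because the output keeps non-ASCII characters as raw UTF-8, the explicit charset in the Content-Type header matters more than it did with entity-encoded output: a client that assumes another encoding would otherwise misread the bytes.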