Problem
Japanese strings passed to Pandoc as CLI arguments inside the Nix build sandbox get garbled. Text inside files and ASCII-only text are unaffected.
Here is the reproduction code. (My project actually invokes Pandoc from the Zig Build System, but the result is the same.)
# Reproduction (default.nix)
{
  stdenv,
  pandoc,
}:
stdenv.mkDerivation {
  name = "nix-builder-lang-test";
  src = ./.;
  buildPhase = ''
    pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
  '';
  installPhase = ''
    mkdir -p $out
    cp out.html $out/out.html
  '';
  nativeBuildInputs = [ pandoc ];
}

ls
# default.nix flake.nix test.md
cat test.md
# あああ
nix build
tail -5 result/out.html
<h1 class="title">������</h1>
</header>
<p>あああ</p>
</body>
</html>

Cause
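The Nix build sandbox has no locale configured, so neither $LANG nor $LC_ALL is set. Presumably Pandoc then falls back to a non-UTF-8 default and cannot decode the UTF-8 bytes in its arguments, while file contents are still read as UTF-8. I suspect other non-ASCII characters have the same problem.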
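To see which locale-related variables the sandbox really provides, a throwaway derivation can print them to the build log. This is only a sketch: the file name check-locale.nix and the derivation name are hypothetical and not part of the original project.

```nix
# check-locale.nix (hypothetical helper, not part of the original project)
{ stdenv }:
stdenv.mkDerivation {
  name = "nix-builder-locale-check";
  # There is no source to unpack; the build only inspects the environment.
  dontUnpack = true;
  buildPhase = ''
    # Print LANG, LANGUAGE and LC_* if present, otherwise say so.
    env | grep -E '^(LANG|LC_)' || echo "no locale variables set"
  '';
  installPhase = ''
    touch $out
  '';
}
```

One way to run it is `nix-build -E 'with import <nixpkgs> { }; callPackage ./check-locale.nix { }'`, which shows the builder output while the derivation is being built.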
Solution
You can fix this by adding a locale package as a build dependency (nativeBuildInputs) and setting $LANG to en_US.UTF-8.
 {
   stdenv,
   pandoc,
+  glibcLocalesUtf8,
 }:
 stdenv.mkDerivation {
   name = "nix-builder-lang-test";
   src = ./.;
   buildPhase = ''
     pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
   '';
   installPhase = ''
     mkdir -p $out
     cp out.html $out/out.html
   '';
+
-  nativeBuildInputs = [ pandoc ];
+  nativeBuildInputs = [ pandoc glibcLocalesUtf8 ];
+
+  LANG = "en_US.UTF-8";
 }

nix build
tail -5 result/out.html
<h1 class="title">題名</h1>
</header>
<p>あああ</p>
</body>
</html>
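
For comparison, a more explicit variant points glibc at the locale archive itself instead of listing the locale package in nativeBuildInputs. This is an untested sketch, not the fix verified above. It assumes a glibc-based Linux system and that glibcLocalesUtf8 ships its archive at lib/locale/locale-archive.

```nix
# default.nix (variant sketch; assumes glibc-based Linux)
{
  stdenv,
  pandoc,
  glibcLocalesUtf8,
}:
stdenv.mkDerivation {
  name = "nix-builder-lang-test";
  src = ./.;

  nativeBuildInputs = [ pandoc ];

  # Tell glibc where the compiled locales live, then select a UTF-8 locale.
  LOCALE_ARCHIVE = "${glibcLocalesUtf8}/lib/locale/locale-archive";
  LANG = "en_US.UTF-8";

  buildPhase = ''
    pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
  '';
  installPhase = ''
    mkdir -p $out
    cp out.html $out/out.html
  '';
}
```

Writing out LOCALE_ARCHIVE makes the dependency on the locale data visible in the derivation itself, at the cost of one more line to maintain.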