Nix build sandbox lacks UTF8 locale

A problem caused by the missing UTF-8 locale inside the Nix build sandbox, and a solution for it.


Problem

Japanese strings passed to Pandoc as CLI arguments inside the Nix build sandbox get garbled. Text inside files and ASCII-only text are unaffected.

Here is the reproduction code. (My project actually invokes Pandoc from the Zig Build System, but the result is the same.)

# Reproduction (default.nix)
{
  stdenv,
  pandoc,
}:
stdenv.mkDerivation {
  name = "nix-builder-lang-test";

  src = ./.;

  buildPhase = ''
    pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
  '';

  installPhase = ''
    mkdir -p $out
    cp out.html $out/out.html
  '';

  nativeBuildInputs = [ pandoc ];
}

# Build and check the result (shell)
ls
# default.nix flake.nix test.md
cat test.md
# あああ
nix build
tail -5 result/out.html
<h1 class="title">������</h1>
</header>
<p>あああ</p>
</body>
</html>
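
The listing above shows a flake.nix whose contents are not included here. As a minimal sketch of what it might look like, assuming an x86_64-linux machine and the nixos-unstable branch of nixpkgs (both are my assumptions, not taken from the original project):

```nix
# flake.nix (hypothetical reconstruction; the original file is not shown)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # Assumption: change this to match your platform.
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      # `nix build` with no arguments builds this attribute.
      # callPackage fills in the stdenv and pandoc arguments of default.nix.
      packages.${system}.default = pkgs.callPackage ./default.nix { };
    };
}
```

With a file like this next to default.nix and test.md, `nix build` should produce the `result` symlink used in the session above.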

Cause

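The Nix build sandbox has no locale configured: neither $LANG nor $LC_ALL is set. With no locale set, programs fall back to the default C locale, so Pandoc presumably does not decode its command-line arguments as UTF-8. I suspect any non-ASCII characters, not just Japanese, have the same problem.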
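
One way to check this claim is to print the variables from inside a build. The following is a rough sketch, assuming `<nixpkgs>` is available on your NIX_PATH; the file name locale-check.nix and the derivation name are made up for illustration.

```nix
# locale-check.nix (hypothetical file name)
# Build with `nix-build locale-check.nix`, then read ./result.
{ pkgs ? import <nixpkgs> { } }:
pkgs.runCommand "locale-check" { } ''
  {
    # ''${ is how a literal ''${ is written inside a Nix indented string.
    # The shell expansion ''${VAR-<unset>} prints "<unset>" when VAR is not set.
    echo "LANG=''${LANG-<unset>}"
    echo "LC_ALL=''${LC_ALL-<unset>}"
  } > $out
''
```

If the sandbox really has no locale configured, both lines in `result` should report the variables as unset.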

Solution

You can fix this by adding a locale package as a build dependency (nativeBuildInputs) and setting $LANG to en_US.UTF-8.

 {
   stdenv,
   pandoc,
+  glibcLocalesUtf8,
 }:
 stdenv.mkDerivation {
   name = "nix-builder-lang-test";
 
   src = ./.;
 
   buildPhase = ''
     pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
   '';
 
   installPhase = ''
     mkdir -p $out
     cp out.html $out/out.html
   '';
 
-  nativeBuildInputs = [ pandoc ];
+  nativeBuildInputs = [ pandoc glibcLocalesUtf8 ];
+
+  LANG = "en_US.UTF-8";
 }

# Rebuild and check the result (shell)
nix build
tail -5 result/out.html
<h1 class="title">題名</h1>
</header>
<p>あああ</p>
</body>
</html>
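
To keep the problem from silently coming back, the build could also verify the title itself. This is only a sketch of one possible guard, not part of my actual project: it adds a check phase to the fixed derivation that fails the build when the expected title is missing from out.html.

```nix
{
  stdenv,
  pandoc,
  glibcLocalesUtf8,
}:
stdenv.mkDerivation {
  name = "nix-builder-lang-test";

  src = ./.;

  nativeBuildInputs = [ pandoc glibcLocalesUtf8 ];

  LANG = "en_US.UTF-8";

  buildPhase = ''
    pandoc --from markdown --to html --output out.html test.md --standalone --metadata=title:題名
  '';

  # Runs after buildPhase. grep exits non-zero (failing the build)
  # if the title was garbled and the exact string is not found.
  doCheck = true;
  checkPhase = ''
    grep -qF '<h1 class="title">題名</h1>' out.html
  '';

  installPhase = ''
    mkdir -p $out
    cp out.html $out/out.html
  '';
}
```

With this in place, removing LANG or glibcLocalesUtf8 should make `nix build` fail instead of producing a garbled page.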