log.pocka.io

Always set Content-Type when using net/http

Go's net/http standard library package performs MIME sniffing, using an algorithm designed for user agents, when the Content-Type header is empty at the time a response is written. This odd "safeguard" sets text/plain rather than text/html when it sees valid HTML that does not start with a byte pattern defined in the MIME sniffing spec.

This situation is an edge case, but if you follow Go's official tutorial and return non-templated HTML, the mechanism may confuse you. The solution (set Content-Type) is simple, but finding the root cause (not every valid HTML document passes MIME sniffing) requires a lot of digging.

To avoid letting an inappropriate algorithm decide the type of your content, and to secure your software, always set Content-Type explicitly.

How I stepped on this landmine

I was writing a demo/PoC web server for my side project, as hands-on experience with Event Sourcing architecture. I chose Go as the language because of its batteries-included standard library, GC, and ease of use.

Since I had never written a plain Go web server (or had done so long ago that I can't recall) and was eager to learn new paradigms, API designs, and practices, I decided to use Writing Web Applications - The Go Programming Language as a guide to "an ideal use of the net/http web server".

I wrote an HTML file and a handler that just returns the go:embed-ed HTML file, invoked go run . and then opened the page in a web browser, which successfully displayed the source code of the HTML file as plain text. The reason the browser displayed it as plain text is easy to guess: the Content-Type header was set to text/plain.

net/http heavily uses MIME sniffing

If I were a webdev newbie, I would have thought "Why does fmt.Fprint(http.ResponseWriter, string) set text/plain?" Fortunately, I'm not. The first question that came to my mind was "Why and how does fmt.Fprint(http.ResponseWriter, string) set Content-Type?" There is neither an explicit Content-Type nor a filename extension (many static file servers guess Content-Type based on the latter).

The http.ResponseWriter interface has a Write([]byte) method, which is called by various functions including fmt.Fprint. The doc comment for the Write method states:

// ...
// WriteHeader(http.StatusOK) before writing the data. If the Header
// does not contain a Content-Type line, Write adds a Content-Type set
// to the result of passing the initial 512 bytes of written data to
// [DetectContentType]. Additionally, if the total size of all written
// ...

The documentation for DetectContentType says:

DetectContentType implements the algorithm described at https://mimesniff.spec.whatwg.org/ to determine the Content-Type of the given data. It considers at most the first 512 bytes of data. DetectContentType always returns a valid MIME type: if it cannot determine a more specific one, it returns "application/octet-stream".

So, http.ResponseWriter guesses Content-Type from the beginning of the content, using a web content sniffing algorithm designed for user agents. And this mechanism classified my HTML as text/plain.
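The behavior described above can be observed without starting a server. The following is a minimal sketch, assuming that httptest.ResponseRecorder fills in the header the same way the real server does when the handler never sets it; sniffedType is a hypothetical helper name, not something from the tutorial.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sniffedType runs a handler that writes body without touching the
// Content-Type header, then reports the header found on the response.
func sniffedType(body string) string {
	handler := func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, body) // no w.Header().Set("Content-Type", ...)
	}
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(sniffedType("<!DOCTYPE html>\n<title>PoC</title>\n")) // text/html; charset=utf-8
	fmt.Println(sniffedType("hello\n"))                               // text/plain; charset=utf-8
}
```

The handler never mentions a type, yet each response carries one, chosen purely from the leading bytes.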

At this point, the culprit is either my HTML or the algorithm.

The HTML file

The HTML file was like this:

<!--
(Typical license header texts)

SPDX-FileCopyrightText: ...
SPDX-License-Identifier: ...
-->
<!DOCTYPE html>
<html lang="en-US">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>PoC</title>
  </head>
  <body>
    <h1>Top</h1>
  </body>
</html>

If anything was unusual, it was the comment header at the top. I removed the comment header and restarted the server, and it then returned the correct Content-Type: text/html; charset=utf-8.

I searched Go's issue tracker with terms such as "net/http" and "html comment" and found golang/go#16275. The second reply in the issue says the HTML is invalid, so I reached for the HTML spec: it turned out that the reply is wrong and the HTML code is valid. I don't know about HTML 4.0 and the XML thingy, but at least HTML 4.01 and the Living Standard (HTML5) allow comments before and after the <!DOCTYPE> and <html>.

So, it's the algorithm that is causing the issue.
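The comparison can be reproduced directly with http.DetectContentType, the function the Write documentation points to. This is a small sketch with an abbreviated copy of the file; the constant names and the sniff helper are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// doc is a shortened version of the HTML file, without the comment header.
const doc = "<!DOCTYPE html>\n<html lang=\"en-US\">\n  <body>\n    <h1>Top</h1>\n  </body>\n</html>\n"

// licenseHeader abbreviates the comment block at the top of the file.
const licenseHeader = "<!--\n(Typical license header texts)\n-->\n"

// sniff returns what net/http would pick for the given content.
func sniff(s string) string {
	return http.DetectContentType([]byte(s))
}

func main() {
	fmt.Println(sniff(licenseHeader + doc)) // text/plain; charset=utf-8
	fmt.Println(sniff(doc))                 // text/html; charset=utf-8
}
```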

whatwg/mimesniff — the MIME Sniffing standard

The last two comments on that issue say that the MIME Sniffing standard, which the net/http package implements underneath, does not recognize the HTML file as text/html.

Reading the spec, I found that it defines a "tag-terminating byte" as,

any one of the following bytes: 0x20 (SP), 0x3E (">").

That means when a newline or tab character appears right after a tag opening or comment marker, the sniffing does not recognize that content as HTML. Following the response from the Go team, the author of the above issue created an issue on the whatwg/mimesniff repository. Unsurprisingly, that issue was closed because the spec is not meant for this kind of use case.
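To make the rule concrete, here is a sketch that encodes the quoted definition and then feeds a few near-identical inputs to http.DetectContentType. isTagTerminating and sniff are hypothetical helpers written for this post, not part of net/http, and the samples only show how Go's implementation behaves.

```go
package main

import (
	"fmt"
	"net/http"
)

// isTagTerminating mirrors the definition quoted above:
// only 0x20 (SP) and 0x3E (">") qualify.
func isTagTerminating(b byte) bool {
	return b == 0x20 || b == 0x3E
}

// sniff returns what net/http would pick for the given content.
func sniff(s string) string {
	return http.DetectContentType([]byte(s))
}

func main() {
	for _, b := range []byte{' ', '>', '\n', '\t'} {
		fmt.Printf("%q terminates a tag: %v\n", b, isTagTerminating(b))
	}

	samples := []string{
		"<!-- license -->\n<!DOCTYPE html>",   // space after "<!--"
		"<!--\nlicense\n-->\n<!DOCTYPE html>", // newline after "<!--"
		"<html lang=\"en\">",                  // space after "<html"
		"<html\nlang=\"en\">",                 // newline after "<html"
	}
	for _, s := range samples {
		fmt.Printf("%q -> %s\n", s, sniff(s))
	}
}
```

In each pair only the byte after the marker differs, which is enough to flip the result between text/html and text/plain.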

Solutions

This time, I just ended up moving the comment header after the <!DOCTYPE> line, because I researched all of this only after that "fix". Obviously, the appropriate fix is to set the correct Content-Type header yourself.
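The appropriate fix is a single line in the handler. The sketch below uses a string constant in place of the go:embed-ed file so that it is self-contained; serveTop and contentTypeOf are illustrative names, and the check goes through httptest.ResponseRecorder rather than a real server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// page stands in for the embedded file. It starts with a comment followed
// by a newline, which sniffing would classify as text/plain.
const page = "<!--\nSPDX-License-Identifier: ...\n-->\n<!DOCTYPE html>\n<html lang=\"en-US\"><body><h1>Top</h1></body></html>\n"

func serveTop(w http.ResponseWriter, r *http.Request) {
	// Set the header before the first Write; once the body starts,
	// the headers have already been sent.
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, page)
}

// contentTypeOf reports the Content-Type a handler ends up sending.
func contentTypeOf(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeOf(serveTop)) // text/html; charset=utf-8
}
```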

But the official tutorial uses the "automatic" way. Isn't it the right way?

Tutorials do not necessarily represent "the authentic way" or "the best way". Oftentimes they focus on "easy": great ones accomplish that by keeping things simple, but the rest just hide the complexity and pile up implicit "magic".

I blindly assumed the Go documentation to be great despite not having interacted with it for a long time. That's the problem: I should have read carefully from the start and reached for the API reference, not just jumped to the relevant section.

Anyway, whether you serve content you own or handle user-uploaded content, I recommend not relying on MIME sniffing.

  • The sniffing algorithm does not perfectly align with your "file type".
  • A file's Content-Type can change in the future if the standard is updated.
  • If you handle user-uploaded files and let net/http sniff them, it could become a source of security vulnerabilities.

The last point was new to me; issues in the Caddy (web server) repository educated me about this topic (caddyserver/caddy#2629 and caddyserver/caddy#6843). While the API in use is different, the bare http.ResponseWriter could be abused in a similar way.

If it were just "oh, my HTML was sent as text/plain," this would not matter. However, potential future breakage and security vulnerabilities are not worth saving a single line, IMO. I'm not sure how many people use net/http directly, but I don't believe this is a low-risk edge case you can essentially ignore.
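As a rough illustration of how the bare http.ResponseWriter could be abused, consider a handler that echoes stored upload bytes. This is only a sketch under assumptions: the upload content, both handler names, and the choice of application/octet-stream plus X-Content-Type-Options: nosniff are mine, not taken from the Caddy issues.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// upload is attacker-controlled content, whatever name it was stored under.
var upload = []byte("<html><script>alert(1)</script></html>")

// serveSniffed lets net/http choose the type from the uploaded bytes.
func serveSniffed(w http.ResponseWriter, r *http.Request) {
	w.Write(upload)
}

// serveOpaque never lets the bytes decide how they are interpreted.
func serveOpaque(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Header().Set("X-Content-Type-Options", "nosniff") // ask browsers not to guess either
	w.Write(upload)
}

// contentTypeOf reports the Content-Type a handler ends up sending.
func contentTypeOf(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeOf(serveSniffed)) // text/html; charset=utf-8
	fmt.Println(contentTypeOf(serveOpaque))  // application/octet-stream
}
```

With the first handler, whoever uploads the file effectively decides that the response is served as HTML from your origin.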

Thoughts, What I learned from this

  • Although I appreciate the Go standard library in general, net/http is easy to misuse (this MIME sniffing, when to write header/status, etc.)
  • Writing a tutorial is a difficult task. Narrowing the scope could be the solution?
  • Do not perform content sniffing on the server side.
  • Do not guess user input; reject it or let the user choose from your guesses.
  • Select an appropriate standard.
  • Browsers don't sniff when Content-Type is text/html.