On Ken Thompson's "Reflections on Trusting Trust"
A detailed look at one of my favorite software security papers, and its implications for bootstrapping trust.
Ken Thompson’s “Reflections on Trusting Trust” (Thompson, 1984) is a short, effective paper describing a fascinating exploit and its implications for trust in computer security. The following is based on the paper itself, as well as discussions here and here.
Introduction
Consider the C compiler, which is bootstrapped and written in C.¹ For simplicity, let the compiler routine be represented as follows.
compile (code)
{
    /* Do regular compilation things */
    ...
}
Unix systems provide a login command that prompts for a password, encrypts it, and compares the result against the user’s stored encrypted password in /etc/passwd, allowing access on a match and rejecting it otherwise. Say we wanted to insert a backdoor into this command, so that it accepts either the user’s correct password or a particular known password of our choosing. Doing so would allow a third-party user to gain access to any system with this additional password.
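To make the target concrete, here is a minimal sketch of what such a backdoored check might look like. This is illustrative only: the function name, the magic password, and the use of the classic crypt(3) interface are assumptions of mine, not the actual Unix login source.

#include <string.h>
#include <unistd.h>   /* crypt() on classic Unix; modern systems declare it in <crypt.h> */

/* Hypothetical backdoored password check (identifiers are illustrative).
 * Classic crypt() reads the salt from the leading characters of its second
 * argument, so hashing the typed password with the stored hash as "salt"
 * lets us compare the two strings directly. */
int password_ok (const char *typed, const char *stored_hash)
{
    /* Normal path: hash the typed password and compare to the stored hash. */
    if (strcmp (crypt (typed, stored_hash), stored_hash) == 0)
        return 1;
    /* Backdoor: also accept a hard-coded magic password. */
    if (strcmp (typed, "wizard!") == 0)
        return 1;
    return 0;
}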
A Trivial Backdoor
At the time Thompson wrote his article, most Unix users compiled elements of their system themselves on heterogeneous hardware (binary distributions were less common than they are today). The C compiler thus occupied a privileged position as the centerpiece of this workflow.
As a naive first attempt, we could modify the C compiler to compile the backdoor in place of the traditional login code whenever it detects that the login code is being compiled, and to behave normally in all other situations. In pseudocode, our modification might look like
compile (code)
{
    /* If the code we're compiling is code for the Unix login command */
    if (match (code, login_code_pattern))
    {
        compile (backdoor);
        return;
    }
    else
    {
        /* Do regular compilation things */
        ...
    }
}
This works perfectly: whenever the login command is compiled, the code generated by the compiler includes our desired backdoor. However, our exploit isn’t fooling anyone: a simple look at the C compiler source code would immediately raise complaints.
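As an aside, the match routine above hides real difficulty (see note 2). A minimal sketch, assuming a naive substring test stands in for genuine pattern recognition:

#include <stddef.h>
#include <string.h>

/* Naive stand-in for match(): report whether the source text contains a
 * distinctive substring of the target code. Real source drifts over time,
 * so an actual exploit would need a looser, more semantic test. */
int match (const char *code, const char *pattern)
{
    return strstr (code, pattern) != NULL;
}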
Thompson’s Exploit
In the previous scenario, our modification to the compiler’s source code was poorly received, as a third party could easily verify the maliciousness of the source. However, the buck doesn’t stop at the source code: Thompson’s exploit and article build on the observation that building executables places implicit trust in many other, lesser-known components of the build system that often fly under the radar.
Consider, for example, the C compiler. Imagine an exploit that modifies the compiler with a malicious program that executes when the compiler compiles itself. In particular, when the compiler is asked to compile itself, it instead compiles a version of itself with the login backdoor and the malicious program inserted. To summarize, our compiler now includes two new procedures, which together we call the “malicious program”:²
1. If asked to compile the Unix login code, the compiler instead compiles the backdoor code.
2. If asked to compile itself, the compiler instead compiles a version of itself with both the logic in (1) and the logic that enables (2).
Note that for procedure (2) to work properly, the malicious program that the compiler executes when asked to compile itself must be able to print its own source code, so that it can splice that source into the modified compiler being compiled. Programs that can reproduce their own source code are called quines.
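To make this concrete, here is a classic two-line C quine (one of many known variants; this is a standard example, not code from Thompson’s paper). Compiled and run, it prints its own source exactly, up to a missing trailing newline:

#include <stdio.h>
int main(){char*s="#include <stdio.h>%cint main(){char*s=%c%s%c;printf(s,10,34,s,34);}";printf(s,10,34,s,34);}

The trick is that the format string contains a %s through which it reproduces itself, with character codes 10 and 34 standing in for the newline and double-quote characters that cannot appear literally inside the string. Thompson’s malicious program relies on the same self-reproduction trick at a larger scale.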
Our resulting compiler looks like
compile (code)
{
    /* If the code we're compiling is code for the Unix login command */
    if (match (code, login_code_pattern))
    {
        compile (backdoor);
        return;
    }
    /* If the code we're compiling is the compiler source code */
    if (match (code, compiler_code_pattern))
    {
        compile (compiler_code_with_both_if_statements_inserted);
        return;
    }
    else
    {
        /* Do regular compilation things */
        ...
    }
}
Now, when we compile the C compiler with itself, we obtain a binary that includes the code to insert a backdoor whenever the Unix login command is compiled. Cleaning up, we can delete our modifications to the compiler’s source code, leaving no trace of our exploit; the exploit nevertheless remains within the binary of the C compiler, and whenever that binary compiles the compiler’s (clean) source, the exploit is re-introduced!
In other words, since the C compiler compiles itself, any future (perfectly exploit-free) version of the C compiler compiled by our poisoned binary will reproduce the exploit, due to procedure (2). And if the Unix login command is compiled by our poisoned binary, it will include the backdoor, as desired, due to procedure (1).
Implications for Bootstrapping Trust
Thompson’s exploit is initially quite worrying: if you can’t trust the compiler binary, even when its source is demonstrably clean, what do you do? You can rebuild the compiler binary from source, but you can’t trust the binary you used to rebuild it… and so on, in an indefinite loop of recursive paranoia.
Indeed, the particular exploit is less important than the principle it conveys: trusting nobody is a fool’s errand, because at some point you must take some component on faith. Whatever layer of the stack you are developing, it’s worth keeping in mind what your trust model truly is, both in the direct contracts you form (e.g. the code you write and read) and the indirect, implied contracts you form with lower layers of the stack.
In Thompson’s own words:
In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.
Notes
1. How does this work? Certainly, the first-ever compiler for a new language $L$ cannot be written in $L$. However, one can bootstrap the process by first writing a compiler for $L$ in an existing language $E$ (call this $C_E$), then writing a compiler for $L$ in $L$ itself (and compiling this compiler with $C_E$ to get $C_L$), and from then on using $C_L$ to compile future versions of the compiler. For some more discussion, see this StackOverflow question. ↩
2. Technically, both of these checks require solving the halting problem in general: to determine whether two sources do the same thing, you would have to prove that they both halt under the same circumstances. But a heuristic match on the login/compiler source code is likely good enough in practice. ↩
References
- Thompson, K. (1984). Reflections on trusting trust. Communications of the ACM, 27(8), 761–763.