A Discussion of Ken Thompson's "Reflections on Trusting Trust"

Ken Thompson's Turing Award lecture "Reflections on Trusting Trust" took me a while to grasp, but proved immensely rewarding to understand. Here, I discuss the exploit it presents in an approachable manner.

Thompson’s “Reflections on Trusting Trust” (Thompson, 1984) is a short and effective paper discussing a fascinating exploit and its implications for trust in computer security. The following discussion is based on the paper itself as well as several other online discussions of the exploit.

Introduction

To begin, consider the C compiler, which is itself written in C and bootstrapped: each new version of the compiler is compiled by an existing compiler binary. For simplicity, let the compiler routine be represented as follows.

compile (code)
{
  /* Do regular compilation things */
  ...
}

Unix has a login command that accepts a user's password, encrypts it, and compares the result to the encrypted password stored for that user in /etc/passwd, allowing access on a match and rejecting it otherwise. Say we wanted to insert a backdoor into this command, so that it accepts either the user's correct password or a particular known password. Doing so would allow a third party who knows this additional accepted password to gain access to any such system.
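
A sketch of what such a backdoored check might look like is given below. This is not actual Unix source: hash_password () stands in for the system's password-hashing routine, and "open-sesame" is a made-up master password.

#include <string.h>

/* hash_password () stands in for the system's password-hashing routine */
extern const char *hash_password (const char *plaintext);

int check_password (const char *typed, const char *stored_hash)
{
  /* Normal path: the typed password hashes to the stored value */
  if (strcmp (hash_password (typed), stored_hash) == 0)
    return 1;

  /* Backdoor: a particular known password is always accepted */
  if (strcmp (typed, "open-sesame") == 0)
    return 1;

  return 0;
}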

A Trivial Backdoor

As a first attempt to insert such a backdoor, we can modify the C compiler so that it compiles the backdoor whenever it is handed the genuine login code, while behaving normally in all other situations. In particular, our modification might look like

compile (code)
{
  /* If the code we're compiling is code for the Unix login command */
  if (match (code, login_code_pattern))
  {
    compile (backdoor);
    return;
  }

  else
  {
    /* Do regular compilation things */
    ...
  }
}
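
Here, match () is a stand-in for whatever test recognizes the login source; a naive sketch (though see note 1 on why exact detection is harder than it looks) might simply search for a distinctive substring of the target code:

#include <string.h>

/* Naive recognizer: does the code being compiled contain a distinctive
   snippet of the target program's source? Illustrative only. */
int match (const char *code, const char *pattern)
{
  return strstr (code, pattern) != NULL;
}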

This would work perfectly: whenever the login command was compiled, the code generated by the compiler would include our desired backdoor. However, our exploit isn’t fooling anyone: a quick look at the C compiler source code would immediately raise complaints.

Thompson’s Solution

The brilliance of Thompson’s exploit is that the backdoor can be buried in the C compiler itself, in such a way that the compiler reinserts the exploit whenever it compiles itself and leaves no trace that the exploit was ever performed (aside from within the compiled binary).

We begin by further modifying the C compiler, adding malicious code that activates when the compiler compiles itself: when asked to compile its own (clean) source, the compiler instead compiles a version of that source with the login backdoor and this malicious code inserted. To summarize, our compiler now includes two new procedures (see note 1):

  1. If asked to compile the Unix login code, it instead compiles the backdoor code.
  2. If asked to compile itself, it instead compiles a version of itself with both the backdoor logic in (1) and this self-reproducing logic in (2) inserted.

Note that for procedure (2) to work properly, the malicious code must be able to reproduce its own source, so that it can insert a copy of itself into the compiler source it is about to compile. Programs that can print their own source code are called quines.
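
The quine trick is worth seeing on its own. Below is one classic construction of a C quine: the program keeps a copy of its own text in a string and uses the character codes 10 (newline) and 34 (double quote) to print the characters that would not survive a round trip through the string literal. Run it, and it prints its own source exactly.

#include <stdio.h>
char *s = "#include <stdio.h>%cchar *s = %c%s%c;%cint main (void) { printf (s, 10, 34, s, 34, 10, 10); return 0; }%c";
int main (void) { printf (s, 10, 34, s, 34, 10, 10); return 0; }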

Our resulting compiler looks like

compile (code)
{
  /* If the code we're compiling is code for the Unix login command */
  if (match (code, login_code_pattern))
  {
    compile (backdoor);
    return;
  }

  /* If the code we're compiling is similar to the compiler source code */
  if (match (code, compiler_code_pattern))
  {
    compile (compiler_code_with_both_if_statements_inserted);
    return;
  }

  else
  {
    /* Do regular compilation things */
    ...
  }
}

Now, when we compile this modified compiler source, we obtain a binary that includes the code to insert a backdoor whenever the Unix login command is compiled. We finally delete our modifications from the compiler source, so that we leave no trace of our exploit; the exploit code, however, remains within the binary of the C compiler.

Since the C compiler is used to compile itself, any future (perfectly exploit-free) version of the C compiler that is compiled by our poisoned binary will have the exploit reinserted, due to procedure (2). And, if the Unix login command is compiled by our poisoned binary, it will include the login backdoor, as desired.
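
To make this propagation concrete, here is a toy simulation, purely illustrative, in which sources and binaries are modeled as plain strings and compile_with () plays the role of running a given compiler binary on a given source:

#include <stdio.h>
#include <string.h>

/* Toy model: "sources" and "binaries" are just strings, and compile_with ()
   models what a compiler binary of a given kind produces from a given source. */
const char *compile_with (const char *compiler_binary, const char *source)
{
  int poisoned = (strcmp (compiler_binary, "poisoned-compiler") == 0);

  /* Procedure (1): a poisoned binary miscompiles the login source */
  if (poisoned && strcmp (source, "login-source") == 0)
    return "login-with-backdoor";

  /* Procedure (2): a poisoned binary re-poisons any compiler it builds */
  if (poisoned && strcmp (source, "compiler-source") == 0)
    return "poisoned-compiler";

  /* Otherwise: faithful compilation of whatever the source says */
  if (strcmp (source, "compiler-source") == 0)
    return "clean-compiler";
  if (strcmp (source, "login-source") == 0)
    return "login";
  return "some-binary";
}

int main (void)
{
  /* The compiler source is perfectly clean, yet every generation built with
     the poisoned binary comes out poisoned, and login stays backdoored. */
  const char *gen1 = compile_with ("poisoned-compiler", "compiler-source");
  const char *gen2 = compile_with (gen1, "compiler-source");

  printf ("gen1:  %s\n", gen1);                                /* poisoned-compiler */
  printf ("gen2:  %s\n", gen2);                                /* poisoned-compiler */
  printf ("login: %s\n", compile_with (gen2, "login-source")); /* login-with-backdoor */
  return 0;
}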

The Moral of the Story

Thompson’s exploit is initially quite worrying: if you can’t trust the compiler’s source code, and you can’t trust the compiler binary, what do you do? You can rebuild the compiler binary, but you can’t trust the binary you used to rebuild it… and so on, in an indefinite loop of recursive paranoia.

The moral of the paper, then, is quite simple: it’s impossible to operate without trusting someone. Unless you’ve written all (and I mean ALL) the code yourself, you’ll have to trust the security of some part of the program-handling process. Indeed, as Thompson states,

In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.

Notes

  1. Technically, determining whether a given piece of source code does the same thing as the login or compiler source is undecidable (it would require solving the halting problem), but in practice a pattern match that recognizes the standard login/compiler source is good enough.

References

Thompson, K. (1984). Reflections on trusting trust. Communications of the ACM, 27(8), 761–763.