Header banner: Castle on a headland in Brittany, 2024

Privilege Separation in Go

Almost three weeks ago, I gave a talk on privilege separation in the Go programming language at FOSDEM 2025. In my talk, I was foolish enough to announce two blog posts: one I had already written, and this one. After a few evenings spent working on it, this post is finally done.

My previous post, Dropping Privileges in Go, dealt with the privileges a computer program has. Usually, these privileges are derived from the executing user: if your user can read emails, a program executed by this user can do so as well. There, I also showed certain techniques for giving up privileges on POSIX-like operating systems.

But is limiting all privileges enough? Unfortunately not, because some software handles sensitive data and contains dangerous code paths at the same time. Consider an Internet-facing application that handles user credentials over a spooky committee-born protocol, perhaps even parsed by a library notorious for security opportunities.

It is possible to split this application apart: one restricted part deals with authentication, an even more restricted part handles the dangerous parser, and some communication happens in between. And, honestly, that is the gist of privilege separation. Since this has been a very superficial introduction, more details specific to POSIX-leaning operating systems and the Go programming language will follow.

Changes In Software Architecture

It is a good idea to do some preliminary thinking before starting, to identify both the parts into which the software should be divided and the permissions each of those parts will require throughout its lifetime.

For example, a web application that manages its state in a SQLite database requires both network permissions (opening a port for incoming HTTP connections) and file system permissions (the SQLite database). One could implement one subprocess for each task and would end up with the supervising monitor process, a web server subprocess (network permissions), and a SQLite subprocess (file system permissions).

Taking the example from this theoretical level to the POSIX concepts introduced in my previous post, the supervising monitor process could launch two subprocesses, each running under its own user and group IDs. The network-facing subprocess could be chrooted to an empty directory, while the database subprocess resides within a chroot containing the SQLite file. Alternatively, more modern but OS-specific features can be used to limit each process.
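To make this a bit more concrete, here is a minimal sketch of how one such subprocess could confine itself on Linux, assuming it was started as root: chroot into a directory, then switch to unprivileged group and user IDs. The function name dropToChroot, the directory /var/empty and the ID 65534 are illustrative assumptions, and syscall.Setuid and friends only work on Linux since Go 1.16.

```go
package main

import (
	"log"
	"os"
	"syscall"
)

// dropToChroot (hypothetical helper) confines the calling process to dir and
// then switches to the given unprivileged group and user IDs. It must be
// called as root.
func dropToChroot(dir string, uid, gid int) error {
	if err := syscall.Chroot(dir); err != nil {
		return err
	}
	// Move into the new root; otherwise the working directory would still
	// point outside of the chroot.
	if err := syscall.Chdir("/"); err != nil {
		return err
	}
	// Groups first, user last: after setuid(2) to a non-root user, the
	// process can no longer change its group IDs.
	if err := syscall.Setgroups([]int{}); err != nil {
		return err
	}
	if err := syscall.Setgid(gid); err != nil {
		return err
	}
	return syscall.Setuid(uid)
}

func main() {
	// Assumed values: an empty directory and the IDs of "nobody", which
	// differ between systems.
	if err := dropToChroot("/var/empty", 65534, 65534); err != nil {
		log.Fatalf("cannot drop privileges: %v", err)
	}
	log.Printf("now running as uid %d and gid %d", os.Getuid(), os.Getgid())
}
```

The order of operations is the important part here: once the user ID is gone, neither chroot(2) nor changing groups is possible anymore.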

But, to address the second part of the initial issue, do our subprocesses really need those privileges throughout their lifetimes? The answer is very often “no”, especially if the software is designed to perform privileged operations first. An architectural goal might be to start with the most privileged operations, then drop those privileges, and continue this cycle until the main task can be performed, which might also be the most dangerous, e.g., parsing user input.

The example web server may only require the permissions to listen on a port at the beginning. After that, the subprocess should be fine with the file descriptor it already has.
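The ordering can be sketched like this, assuming a hypothetical listenThenDrop helper whose drop callback stands in for whatever mechanism is actually used (setuid, chroot, seccomp, ...). The privileged step, binding the socket, happens first, while the resulting file descriptor stays usable after dropping.

```go
package main

import (
	"errors"
	"log"
	"net"
)

// listenThenDrop (hypothetical helper) performs the only privileged step of
// the example web server, opening the listening socket, and calls drop only
// afterwards. If dropping fails, the listener is closed again.
func listenThenDrop(addr string, drop func() error) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	if err := drop(); err != nil {
		return nil, errors.Join(err, ln.Close())
	}
	return ln, nil
}

func main() {
	ln, err := listenThenDrop("127.0.0.1:0", func() error {
		log.Println("dropping privileges (placeholder)")
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	// From here on, Accept works on the already open file descriptor.
	log.Printf("listening on %s", ln.Addr())
}
```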

Go Runtime

Nothing Go-specific has been stated so far. There are some elementary differences from C, such as Go having a runtime while C does not.
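One property of this runtime is easy to observe: even a trivial Go program usually runs on more than one OS thread. The following Linux-only sketch counts them via /proc/self/task; the helper name osThreadCount is made up for this example.

```go
package main

import (
	"log"
	"os"
)

// osThreadCount (hypothetical helper) returns the number of OS threads of
// the current process by counting the entries in /proc/self/task (Linux).
func osThreadCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	n, err := osThreadCount()
	if err != nil {
		log.Fatal(err)
	}
	// The exact number depends on the Go version and the machine.
	log.Printf("this Go process currently runs %d OS threads", n)
}
```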

But there are low-level packages and functions that provide access to OS-specific features. Most prominent is the standard library's frozen syscall package, which has been superseded by golang.org/x/sys/unix. The latter lives outside the standard library, for maintenance reasons and to break it free from the Go 1 compatibility promise. So it is quite easy to port C-style privilege separation to Go, if one is willing to adapt a bit.

Creating And Supervising Children

Even if Go has not quite reached C’s maturity, it has gone through over a decade of changes since Go 1. Starting processes is one of the rare areas where these changes are actually visible. While the syscall package has a ForkExec function, it did not make it into the golang.org/x/sys/unix package.

But wait, a quick interjection first. In C, fork(2) and the exec(3) family are independent building blocks. Calling fork(2) creates a new process, while functions from the exec(3) family, built on top of the execve(2) system call, replace the current process image with another program or, in simpler terms, start another executable file in the current process. Go's syscall.ForkExec merges these two low-level steps into one higher-level interface. This was most likely done to make it harder to break the Go runtime: fork(2) only duplicates the calling thread, so a forked copy of the multi-threaded runtime could not safely keep running Go code without an immediate exec.
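For illustration only, and not as a recommendation given the frozen state of the package, this is roughly what the merged interface looks like. The sketch starts /bin/sh via syscall.ForkExec and reaps it with syscall.Wait4; the helper name runViaForkExec is made up, Linux is assumed, and since ForkExec performs no PATH lookup, an absolute path is passed.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

// runViaForkExec (hypothetical helper) starts argv[0] through
// syscall.ForkExec, which does fork(2) and execve(2) as one operation, and
// reaps the child via wait4(2). It returns the child's exit status.
func runViaForkExec(argv []string) (int, error) {
	pid, err := syscall.ForkExec(argv[0], argv, &syscall.ProcAttr{
		Env:   []string{}, // start with an empty environment
		Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()},
	})
	if err != nil {
		return -1, err
	}

	var status syscall.WaitStatus
	for {
		_, err = syscall.Wait4(pid, &status, 0, nil)
		if err != syscall.EINTR {
			break
		}
	}
	if err != nil {
		return -1, err
	}
	if !status.Exited() {
		return -1, fmt.Errorf("process %d did not exit normally (status %#x)", pid, uint32(status))
	}
	return status.ExitStatus(), nil
}

func main() {
	code, err := runViaForkExec([]string{"/bin/sh", "-c", "exit 3"})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("child exited with status %d", code)
}
```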

In addition to merging multiple functions into one, syscall.ForkExec also supports a wide range of specific attributes via syscall.SysProcAttr. These attributes include user and group switching, chrooting and even cgroup support (on Linux through clone3(2)'s CLONE_INTO_CGROUP, which only works with cgroup v2). Unfortunately, this code was frozen ten years ago and SysProcAttr lacks documentation. Thus, I would advise taking a look at its implementation, but not using it. One demotivating example might be the internal forkAndExecInChild1 function.

What to use instead? The os package has a Process type and the os/exec package provides an even more abstract interface. From now on, I will stick to os/exec and will do all privilege dropping by myself, even if os/exec still supports syscall.SysProcAttr.

For starters, a short demo that forks itself and selects the child's operation mode via a command line flag should do the trick. A bit of glue code around os/exec results in the forkChild function shown in the demo below.

package main

import (
	"bufio"
	"flag"
	"fmt"
	"log"
	"os"
	"os/exec"
	"time"
)

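// forkChild forks off a subprocess of this executable with the -fork-child
// flag set to childName.
//
// The extraFiles are additional file descriptors for communication; the first
// one becomes file descriptor 3 in the child, the next one 4, and so on.
func forkChild(childName string, extraFiles []*os.File) (*os.Process, error) {
	// Re-execute ourselves; os.Args[0] is not necessarily a usable path,
	// e.g., when the program was found via $PATH.
	self, err := os.Executable()
	if err != nil {
		return nil, err
	}

	// pipe(2) to communicate child's output back to parent
	logParent, logChild, err := os.Pipe()
	if err != nil {
		return nil, err
	}

	cmd := &exec.Cmd{
		Path: self,
		// -fork-child goes first, so that flag parsing cannot stop at a
		// positional argument of the parent before reaching it.
		Args: append([]string{os.Args[0], "-fork-child", childName}, os.Args[1:]...),

		Env: []string{}, // don't inherit parent's env

		Stdin:      nil,
		Stdout:     logChild,
		Stderr:     logChild,
		ExtraFiles: extraFiles,
	}
	if err := cmd.Start(); err != nil {
		_ = logParent.Close()
		_ = logChild.Close()
		return nil, err
	}
	// The child holds its own copy of the write end now. Closing ours lets
	// the scanner below reach EOF once the child exits.
	_ = logChild.Close()

	// For the moment, just print the child's output
	go func() {
		defer logParent.Close()
		scanner := bufio.NewScanner(logParent)
		for scanner.Scan() {
			log.Printf("[%s] %s", childName, scanner.Text())
		}
		if err := scanner.Err(); err != nil {
			log.Printf("Child output scanner failed: %v", err)
		}
	}()

	return cmd.Process, nil
}

func main() {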
	var flagForkChild string
	flag.StringVar(&flagForkChild, "fork-child", "", "")
	flag.Parse()

	switch flagForkChild {
	case "":
		// Parent code
		childProc, err := forkChild("demo", nil)
		if err != nil {
			log.Fatalf("Cannot fork child: %v", err)
		}
		log.Printf("Started child process, wait for it to finish")

		childProcState, _ := childProc.Wait()
		log.Printf("Child exited: %d", childProcState.ExitCode())

	case "demo":
		// Child code
		for i := range 3 {
			fmt.Printf("hello world, %d\n", i)
			time.Sleep(time.Second)
		}
		fmt.Println("bye")

	default:
		panic("This example has only one child")
	}
}

While this example is quite trivial, it demonstrates how the parent process can .Wait() for its children and even inspect their exit codes. Using this information, the parent can monitor its children and raise an alarm, restart children or crash the whole execution if a child exits prematurely. In a more realistic setup, where each child should live as long as the parent, the parent would wait for either the first child to die or a wild SIGINT to appear, and then clean up all child processes.
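Such a supervision loop could look like the following sketch. It assumes children started via os/exec, as forkChild does; superviseChildren and the startShell demo helper are hypothetical names. The first exiting child or an incoming SIGINT/SIGTERM ends supervision, after which all remaining children are killed and reaped.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// startShell (hypothetical demo helper) starts /bin/sh -c script.
func startShell(script string) (*os.Process, error) {
	cmd := exec.Command("/bin/sh", "-c", script)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd.Process, nil
}

// superviseChildren blocks until the first child exits or SIGINT/SIGTERM
// arrives. Afterwards, it kills and reaps all remaining children. It returns
// the index of the first exited child, or -1 if a signal ended supervision.
func superviseChildren(procs ...*os.Process) int {
	type exit struct {
		idx   int
		state *os.ProcessState
	}

	exits := make(chan exit, len(procs))
	for i, p := range procs {
		go func(i int, p *os.Process) {
			state, err := p.Wait()
			if err != nil {
				log.Printf("cannot wait for child %d: %v", i, err)
			}
			exits <- exit{idx: i, state: state}
		}(i, p)
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	defer signal.Stop(sigs)

	first, pending := -1, len(procs)
	select {
	case ev := <-exits:
		first, pending = ev.idx, pending-1
		log.Printf("child %d exited prematurely: %v", ev.idx, ev.state)
	case sig := <-sigs:
		log.Printf("received %v, shutting down", sig)
	}

	// Kill all other children; errors for already finished ones are ignored.
	for i, p := range procs {
		if i != first {
			_ = p.Kill()
		}
	}
	// The goroutines above reap the killed children.
	for ; pending > 0; pending-- {
		<-exits
	}
	return first
}

func main() {
	a, err := startShell("sleep 1")
	if err != nil {
		log.Fatal(err)
	}
	b, err := startShell("sleep 60")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("first exited child: %d", superviseChildren(a, b))
}
```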

Inter-Process Communication

This first example was nice and all, but a bit useless. So far, no communication between the processes - main/parent and demo - is possible. This can be solved by creating a bidirectional communication channel between two processes, e.g., via socketpair(2).

A socketpair(2) is similar to a pipe(2), but it is bidirectional (both ends can read and write) and supports certain features usually reserved for Unix domain sockets. Using the already mentioned golang.org/x/sys/unix package allows creating a trivial helper function.

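// socketpair is a helper function wrapped around socketpair(2).
//
// SOCK_CLOEXEC keeps both ends from leaking into every subsequently started
// child; exec.Cmd's ExtraFiles still passes the intended end explicitly.
func socketpair() (parent, child *os.File, err error) {
	fds, err := unix.Socketpair(
		unix.AF_UNIX,
		unix.SOCK_STREAM|unix.SOCK_NONBLOCK|unix.SOCK_CLOEXEC,
		0)
	if err != nil {
		return
	}

	parent = os.NewFile(uintptr(fds[0]), "socketpair-parent")
	child = os.NewFile(uintptr(fds[1]), "socketpair-child")
	return
}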
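Since golang.org/x/sys/unix is an external module, here is an equivalent sketch of the helper above relying only on the standard library's frozen syscall package, assuming Linux for the SOCK_CLOEXEC flag. The names socketpairStd and exchange are made up; the sketch stays within one process and shows that, unlike a pipe(2), both ends can write and read.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
	"syscall"
)

// socketpairStd creates a connected pair of Unix domain stream sockets using
// only the standard library's syscall package.
func socketpairStd() (a, b *os.File, err error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return nil, nil, err
	}
	return os.NewFile(uintptr(fds[0]), "socketpair-a"), os.NewFile(uintptr(fds[1]), "socketpair-b"), nil
}

// exchange writes msg as one line to from and reads one line back from to.
// It assumes that only this single line is in flight.
func exchange(from, to *os.File, msg string) (string, error) {
	if _, err := fmt.Fprintln(from, msg); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(to).ReadString('\n')
	return strings.TrimSuffix(line, "\n"), err
}

func main() {
	a, b, err := socketpairStd()
	if err != nil {
		log.Fatal(err)
	}
	defer a.Close()
	defer b.Close()

	steps := []struct {
		from, to *os.File
		msg      string
	}{{a, b, "hello"}, {b, a, "hello again"}}
	for _, s := range steps {
		got, err := exchange(s.from, s.to, s.msg)
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("%s -> %s: %q", s.from.Name(), s.to.Name(), got)
	}
}
```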

The previously introduced forkChild function came with an extraFiles parameter, effectively setting exec.Cmd{ExtraFiles: extraFiles}. These extra files are passed to the newly created process as file descriptors following the standard streams stdin, stdout and stderr (file descriptors 0, 1 and 2): the first extra file becomes file descriptor 3, the second one 4, and so on.

Linking socketpair and forkChild’s extraFiles allows passing a bidirectional socket as file descriptor 3 to the child.
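The numbering can be observed with a small sketch that does not even need a Go child: /bin/sh gets a pipe as its only extra file and writes to file descriptor 3, which the parent then reads. The helper name readFromFd3 is made up, and a POSIX shell at /bin/sh is assumed.

```go
package main

import (
	"io"
	"log"
	"os"
	"os/exec"
)

// readFromFd3 (hypothetical helper) runs /bin/sh -c script with the write end
// of a pipe(2) as its only extra file; in the child, ExtraFiles[0] is fd 3.
func readFromFd3(script string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	cmd := exec.Command("/bin/sh", "-c", script)
	cmd.ExtraFiles = []*os.File{w}
	if err := cmd.Start(); err != nil {
		w.Close()
		return "", err
	}
	// The child has its own copy; closing ours lets ReadAll see EOF once the
	// child exits.
	w.Close()

	out, readErr := io.ReadAll(r)
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return string(out), readErr
}

func main() {
	out, err := readFromFd3("echo hello via fd 3 >&3")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("parent received: %q", out)
}
```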

Let’s follow this idea and modify the demo part to implement a simple string-based API that supports both the hello and bye commands returning a useful message back to the sender.

case "demo":
	// Child code
	cmdFd := os.NewFile(3, "")
	cmdScanner := bufio.NewScanner(cmdFd)
	for cmdScanner.Scan() {
		switch cmd := cmdScanner.Text(); cmd {
		case "hello":
			_, _ = fmt.Fprintln(cmdFd, "hello again")

		case "bye":
			_, _ = fmt.Fprintln(cmdFd, "ciao")
			return
		}
	}

This code starts by wrapping file descriptor 3 in an os.File, using it both for a line-wise reader and as a writer for the output. The counterpart in the parent can be altered accordingly.

case "":
	// Parent code
	childCommParent, childCommChild, err := socketpair()
	if err != nil {
		log.Fatalf("socketpair: %v", err)
	}

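	childProc, err := forkChild("demo", []*os.File{childCommChild})
	if err != nil {
		log.Fatalf("Cannot fork child: %v", err)
	}
	// The child holds its own copy now; closing ours makes a dead child
	// show up as EOF instead of a hanging read.
	_ = childCommChild.Close()
	log.Printf("Started child process, wait for it to finish")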

	cmdScanner := bufio.NewScanner(childCommParent)
	for _, cmd := range []string{"hello", "hello", "bye"} {
		_, _ = fmt.Fprintln(childCommParent, cmd)
		log.Printf("Send %q command to child", cmd)
		_ = cmdScanner.Scan()
		log.Printf("Received from child: %q", cmdScanner.Text())
	}

	childProcState, _ := childProc.Wait()
	log.Printf("Child exited: %d", childProcState.ExitCode())

In this example, the socketpair(2) is first created using our previously defined helper function. The child's end of the socketpair is passed to the newly created child process, while the parent's end is used for communication. As an example, hello is called twice, followed by a bye call, expecting the child to finish afterwards.

Running this demo will look as follows. The RPC API works!

2025/02/12 22:05:45 Started child process, wait for it to finish
2025/02/12 22:05:45 Send "hello" command to child
2025/02/12 22:05:45 Received from child: "hello again"
2025/02/12 22:05:45 Send "hello" command to child
2025/02/12 22:05:45 Received from child: "hello again"
2025/02/12 22:05:45 Send "bye" command to child
2025/02/12 22:05:45 Received from child: "ciao"
2025/02/12 22:05:45 Child exited: 0

This RPC is quite simple, even for demonstration purposes. So it should be replaced by something more powerful that one would expect to find in real-world applications.

Dropping Privileges

Wait, before we get serious about RPCs, we should first introduce dropping privileges. Otherwise, doing everything that follows would be useless.

The motivation for this post started with a mental image of a program being split into several subprograms, each running only with the necessary privileges. The first part - breaking down a program - has already been addressed. Now it is time to drop privileges.

Luckily, this section will be rather short, since I feel I have written more than enough on this topic in my earlier Dropping Privileges in Go post. I will assume that it was read or at least skimmed.

Looking at the demonstration program, there is a main process that starts the child before communicating with it, while the child itself just handles some IO. For this example, I am going to use my syscallset-go library to restrict system calls via Seccomp BPF. While this only works on Linux, there are mechanisms for other operating systems, as mentioned in my previous post, e.g., pledge(2) on OpenBSD.

The main program first needs the privileges to create a socketpair(2) and launch the child. After that, it only communicates over the created file descriptor and monitors the child process. So there are two places where privileges can be dropped: initially and after launching the child. Please take a look at this altered main part, where the two syscallset.LimitTo blocks drop privileges.

case "":
	// Parent code
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	childCommParent, childCommChild, err := socketpair()
	if err != nil {
		log.Fatalf("socketpair: %v", err)
	}

	childProc, err := forkChild("demo", []*os.File{childCommChild})
	if err != nil {
		log.Fatalf("Cannot fork child: %v", err)
	}
	_ = childCommChild.Close() // the child holds its own copy now
	log.Printf("Started child process, wait for it to finish")

	if err := syscallset.LimitTo("@basic-io @io-event @process"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	cmdScanner := bufio.NewScanner(childCommParent)
	for _, cmd := range []string{"hello", "hello", "bye"} {
		_, _ = fmt.Fprintln(childCommParent, cmd)
		log.Printf("Send %q command to child", cmd)
		_ = cmdScanner.Scan()
		log.Printf("Received from child: %q", cmdScanner.Text())
	}

	childProcState, _ := childProc.Wait()
	log.Printf("Child exited: %d", childProcState.ExitCode())

The same must be done for the demo child, where the only privileged task is setting up file descriptor 3 for communication. Afterwards, this process only needs to do IO for its simple RPC task.

case "demo":
	// Child code
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	cmdFd := os.NewFile(3, "")

	if err := syscallset.LimitTo("@basic-io @io-event"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	cmdScanner := bufio.NewScanner(cmdFd)
	for cmdScanner.Scan() {
		switch cmd := cmdScanner.Text(); cmd {
		case "hello":
			_, _ = fmt.Fprintln(cmdFd, "hello again")

		case "bye":
			_, _ = fmt.Fprintln(cmdFd, "ciao")
			return
		}
	}

Let’s take a moment to reflect on what was accomplished so far. Splitting up the process and applying different system call filters resulted in a first privilege separated demo. This was actually a lot less code than one might expect.

Using A Real RPC

After reminding ourselves of how to drop privileges, we will move on to a larger example using this technique while also using a more mature RPC. Since I find gRPC too powerful for this task, I will stick to Go’s net/rpc package, despite its shortcomings and feature-frozen state.

While the previous examples were very demo-like, the following one should be a bit more realistic. When finished, a child process should serve a simplified interface to a SQLite database, allowing only certain requests, while the main process should also drop privileges and serve the database’s content through a web server. To give it a realistic spin, let’s call this a web blog (or blog, as the cool kids say).

The skeleton, consisting of the forkChild function and the main function dispatching on the -fork-child command line flag, remains. However, the database needs some code, especially code that can be used by net/rpc. The following should work, creating a Database type and two RPC methods, ListPosts and GetPost.

// Database is a wrapper type around *sql.DB.
type Database struct {
	db *sql.DB
}

// OpenDatabase opens or creates a new SQLite database at the given file.
//
// If the database should be created, it will be populated with the posts table
// and two example entries.
func OpenDatabase(file string) (*Database, error) {
	_, fileInfoErr := os.Stat(file)
	requiresSetup := errors.Is(fileInfoErr, os.ErrNotExist)

	db, err := sql.Open("sqlite3", file)
	if err != nil {
		return nil, err
	}

	if requiresSetup {
		if _, err := db.Exec(`
			CREATE TABLE posts (id INTEGER NOT NULL PRIMARY KEY, text TEXT);
			INSERT INTO posts(id, text) VALUES (0, 'hello world!');
			INSERT INTO posts(id, text) VALUES (1, 'second post, wow');
		`); err != nil {
			_ = db.Close()
			return nil, fmt.Errorf("cannot prepare database: %w", err)
		}
	}

	return &Database{db: db}, nil
}

// ListPosts returns all post ids as an array of integers.
//
// This method follows the net/rpc method specification.
func (db *Database) ListPosts(_ *int, ids *[]int) error {
	rows, err := db.db.Query("SELECT id FROM posts")
	if err != nil {
		return err
	}
	defer rows.Close()

	*ids = make([]int, 0, 128)
	for rows.Next() {
		var id int
		if err := rows.Scan(&id); err != nil {
			return err
		}
		*ids = append(*ids, id)
	}
	return rows.Err()
}

// GetPost returns a post's text for the id.
//
// This method follows the net/rpc method specification.
func (db *Database) GetPost(id *int, text *string) error {
	return db.db.QueryRow("SELECT text FROM posts WHERE id = ?", *id).Scan(text)
}

Without further ado, add a database case to main's switch using this Database type. This time, the child is not named demo, since it now serves a real purpose.

case "database":
	// SQLite database child for posts
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	rpcFd := os.NewFile(3, "")

	db, err := OpenDatabase("posts.sqlite")
	if err != nil {
		log.Fatalf("Cannot open SQLite database: %v", err)
	}

	if err := landlock.V5.BestEffort().RestrictPaths(
		landlock.RODirs("/proc"),
		landlock.RWFiles("posts.sqlite"),
	); err != nil {
		log.Fatalf("landlock: %v", err)
	}
	if err := syscallset.LimitTo("@basic-io @io-event @file-system"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	rpcServer := rpc.NewServer()
	if err := rpcServer.Register(db); err != nil {
		log.Fatalf("cannot register RPC service: %v", err)
	}
	rpcServer.ServeConn(rpcFd)

This code starts by dropping some privileges to prohibit the juicy syscalls. Then it wraps the already known file descriptor 3 and opens the SQLite database located at posts.sqlite.

Since no more files need to be opened at this point, the privileges are dropped again. First, Landlock LSM restricts file system access to read-only access to Linux’ /proc, which some Go internals require, and read-write access to the posts.sqlite database file. Next comes a stricter system call filter.

Finally, the net/rpc server is started on file descriptor 3, serving the Database. This blocks until the connection is closed, which effectively means the child has finished.
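To try the net/rpc conventions without SQLite and its cgo dependency, here is a sketch with a hypothetical in-memory MemoryPosts type offering the same method signatures. It is registered under the name Database so the calls match the ones used in this post, and it is served over net.Pipe instead of a socketpair(2), an assumption made purely to keep the example self-contained.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"sort"
)

// MemoryPosts is a hypothetical in-memory stand-in for the SQLite-backed
// Database type, offering the same net/rpc method signatures.
type MemoryPosts struct {
	posts map[int]string
}

// ListPosts returns all post ids in ascending order.
func (m *MemoryPosts) ListPosts(_ *int, ids *[]int) error {
	*ids = make([]int, 0, len(m.posts))
	for id := range m.posts {
		*ids = append(*ids, id)
	}
	sort.Ints(*ids)
	return nil
}

// GetPost returns a post's text for the id.
func (m *MemoryPosts) GetPost(id *int, text *string) error {
	t, ok := m.posts[*id]
	if !ok {
		return fmt.Errorf("no post with id %d", *id)
	}
	*text = t
	return nil
}

// newPostsClient serves store on one end of an in-memory net.Pipe and returns
// an RPC client for the other end, mimicking the socketpair(2) setup.
func newPostsClient(store *MemoryPosts) (*rpc.Client, error) {
	server := rpc.NewServer()
	// Register under "Database" so that method names match Database.ListPosts.
	if err := server.RegisterName("Database", store); err != nil {
		return nil, err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn)
	return rpc.NewClient(clientConn), nil
}

func main() {
	client, err := newPostsClient(&MemoryPosts{posts: map[int]string{
		0: "hello world!",
		1: "second post, wow",
	}})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var ids []int
	if err := client.Call("Database.ListPosts", 0, &ids); err != nil {
		log.Fatal(err)
	}
	for _, id := range ids {
		var text string
		if err := client.Call("Database.GetPost", &id, &text); err != nil {
			log.Fatal(err)
		}
		log.Printf("post %d: %s", id, text)
	}
}
```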

The main part now needs to follow. As initially outlined, it should start by forking off the database child, then drop its own privileges, and finally serve a web server. The two RPC methods, Database.ListPosts and Database.GetPost, can be queried from the main code and used to build the web frontend for this blog.

case "":
	// Parent: Starts children, drops to HTTP server
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	databaseCommParent, databaseCommChild, err := socketpair()
	if err != nil {
		log.Fatalf("socketpair: %v", err)
	}

	_, err = forkChild("database", []*os.File{databaseCommChild})
	if err != nil {
		log.Fatalf("Cannot fork database child: %v", err)
	}
	_ = databaseCommChild.Close() // the child holds its own copy now

	httpLn, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalf("cannot listen: %v", err)
	}

	if err := landlock.V5.BestEffort().RestrictPaths(
		landlock.RODirs("/proc"),
	); err != nil {
		log.Fatalf("landlock: %v", err)
	}
	if err := syscallset.LimitTo("@basic-io @io-event @network-io @file-system"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	rpcClient := rpc.NewClient(databaseCommParent)

	httpMux := http.NewServeMux()
	httpMux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("Content-Type", "text/html")

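		var ids []int
		// A handler-local err: handlers run concurrently, so assigning to the
		// outer err variable would be a data race.
		if err := rpcClient.Call("Database.ListPosts", 0, &ids); err != nil {
			http.Error(w, "cannot list posts: "+err.Error(), http.StatusInternalServerError)
			return
		}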

		_, _ = fmt.Fprint(w, `<ul>`)
		for _, id := range ids {
			_, _ = fmt.Fprintf(w, `<li><a href="/post/%d">Post %d</a></li>`, id, id)
		}
		_, _ = fmt.Fprint(w, `</ul>`)
	})
	httpMux.HandleFunc("GET /post/{id}", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("Content-Type", "text/html")

		id, err := strconv.Atoi(r.PathValue("id"))
		if err != nil {
			http.Error(w, "cannot parse ID: "+err.Error(), http.StatusBadRequest)
			return
		}

		var text string
		err = rpcClient.Call("Database.GetPost", &id, &text)
		if err != nil {
			http.Error(w, "cannot fetch post: "+err.Error(), http.StatusInternalServerError)
			return
		}

		_, _ = fmt.Fprintf(w, `<h1>Post %d</h1><p>%s</p>`, id, html.EscapeString(text))
	})

	httpd := http.Server{Handler: httpMux}
	log.Fatal(httpd.Serve(httpLn))

Like the child, the main part starts by restricting system calls to the reasonable @system-service group. Then it creates the socketpair(2) and launches the child, as in the previous examples. Another privileged operation follows: opening a TCP port to listen on :8080.

At this point, privileges can be dropped further, again using Landlock LSM and Seccomp BPF. Next, an RPC client is created on the parent’s end of the socketpair(2) and the web server’s endpoints are defined. All known posts are listed on /, which the RPC client requests via the Database.ListPosts method. More details are then available on /post/{id} using the Database.GetPost RPC method.

Finally, an http.Server is created and serves on the previously created TCP listener. Each incoming request results in an RPC call to the child process, which is the only one allowed to access the database.

Passing Around File Descriptors

While this RPC works well under these constraints, what about file access? Think about an RPC API that needs to access lots of files and pass their contents from one process to another. Reading a whole file, encoding it, sending it, receiving it and decoding it does not sound very efficient. Fortunately, POSIX systems offer a way to actually share file descriptors between processes.

One process can open a file, pass the file descriptor to the other process, and close the file again, while the other process can now read the file, even though it would otherwise have no access to it. The art of passing file descriptors is a bit obscure and beautifully explained in section 17.4, Passing File Descriptors, of the definitive book Advanced Programming in the UNIX Environment. If you are interested in the details, please check it out; PDFs are available online.

This works because our socketpair(2) call created a pair of AF_UNIX sockets, which are Unix domain sockets. In addition to exchanging streaming data, Unix domain sockets can also carry special control messages. But let’s start with the code, which may look a bit cryptic on its own.

// unixConnFromFile converts a file (FD) into a Unix domain socket.
//
// net.FileConn works on a duplicate of f's file descriptor, so f itself stays
// open and usable.
func unixConnFromFile(f *os.File) (*net.UnixConn, error) {
	fConn, err := net.FileConn(f)
	if err != nil {
		return nil, err
	}

	conn, ok := fConn.(*net.UnixConn)
	if !ok {
		_ = fConn.Close()
		return nil, fmt.Errorf("cannot use %T as *net.UnixConn", fConn)
	}
	return conn, nil
}

// sendFd sends an open File (its FD) over a Unix domain socket.
func sendFd(f *os.File, conn *net.UnixConn) error {
	oob := unix.UnixRights(int(f.Fd()))
	_, _, err := conn.WriteMsgUnix(nil, oob, nil)
	return err
}

// recvFd receives a File (its FD) from a Unix domain socket.
func recvFd(conn *net.UnixConn) (*os.File, error) {
	oob := make([]byte, 128)
	_, oobn, _, _, err := conn.ReadMsgUnix(nil, oob)
	if err != nil {
		return nil, err
	}

	cmsgs, err := unix.ParseSocketControlMessage(oob[0:oobn])
	if err != nil {
		return nil, err
	} else if len(cmsgs) != 1 {
		return nil, fmt.Errorf("ParseSocketControlMessage: wrong length %d", len(cmsgs))
	}

	fds, err := unix.ParseUnixRights(&cmsgs[0])
	if err != nil {
		return nil, err
	} else if len(fds) != 1 {
		return nil, fmt.Errorf("ParseUnixRights: wrong length %d", len(fds))
	}

	return os.NewFile(uintptr(fds[0]), ""), nil
}

The unixConnFromFile function creates a *net.UnixConn based on a generic *os.File. This allows converting one end of the socketpair(2) into a Unix domain socket without losing Go’s type safety.

Then, the sendFd function encodes the file descriptor to be sent into a socket control message and sends it over the virtual wire. On the other side, the recvFd function waits for such a control message, unpacks it and returns a new *os.File to be used.

To give a little background, each process has its own file descriptor table, whose entries point into the kernel’s file table, which is eventually mapped to a vnode entry. Thus, one process’ file descriptor 42 and another’s file descriptor 23 could actually refer to the same file. The same applies here: sending a file descriptor will most likely result in a different file descriptor number at the receiving end. However, the kernel takes care that this little stunt works.

Again, please consult Stevens’ Advanced Programming in the UNIX Environment for more details or take a look at the implementation in the golang.org/x/sys/unix package. Or just accept that it works and move on.
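To see this in action within a single process, the following sketch uses the standard library’s syscall counterparts of the unix functions above. It writes to a temporary file, passes the open descriptor over a fresh socketpair(2), closes the original and reads the content back through the received descriptor, whose number will usually differ. The helper names are made up, Linux is assumed for SOCK_CLOEXEC, and one byte of regular data is sent explicitly alongside the control message.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"os"
	"syscall"
)

// toUnixConn wraps a socket fd; net.FileConn duplicates it, so f is closed.
func toUnixConn(fd int) (*net.UnixConn, error) {
	f := os.NewFile(uintptr(fd), "socketpair")
	defer f.Close()
	c, err := net.FileConn(f)
	if err != nil {
		return nil, err
	}
	return c.(*net.UnixConn), nil // AF_UNIX sockets yield a *net.UnixConn
}

// passFile sends f's descriptor as an SCM_RIGHTS control message over a
// fresh socketpair(2) and returns a *os.File for the received descriptor.
func passFile(f *os.File) (*os.File, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return nil, err
	}
	sender, err := toUnixConn(fds[0])
	if err != nil {
		syscall.Close(fds[1])
		return nil, err
	}
	defer sender.Close()
	receiver, err := toUnixConn(fds[1])
	if err != nil {
		return nil, err
	}
	defer receiver.Close()

	// One byte of regular data accompanies the control message.
	if _, _, err := sender.WriteMsgUnix([]byte{0}, syscall.UnixRights(int(f.Fd())), nil); err != nil {
		return nil, err
	}
	oob := make([]byte, syscall.CmsgSpace(4)) // space for one 32-bit descriptor
	_, oobn, _, _, err := receiver.ReadMsgUnix(make([]byte, 1), oob)
	if err != nil {
		return nil, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil || len(msgs) != 1 {
		return nil, fmt.Errorf("expected one control message, got %d: %v", len(msgs), err)
	}
	received, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil || len(received) != 1 {
		return nil, fmt.Errorf("expected one descriptor, got %d: %v", len(received), err)
	}
	return os.NewFile(uintptr(received[0]), "received"), nil
}

// roundTripFile reads content back only through a passed-over descriptor.
func roundTripFile(content string) (string, error) {
	orig, err := os.CreateTemp("", "fdpass-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(orig.Name())
	if _, err := orig.WriteString(content); err != nil {
		orig.Close()
		return "", err
	}
	received, err := passFile(orig)
	origFd := orig.Fd()
	orig.Close() // from now on, only the received descriptor keeps the file open
	if err != nil {
		return "", err
	}
	defer received.Close()
	log.Printf("sent fd %d, received fd %d", origFd, received.Fd())

	// Both descriptors share one open file description, including the offset.
	if _, err := received.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	data, err := io.ReadAll(received)
	return string(data), err
}

func main() {
	got, err := roundTripFile("hello from another descriptor\n")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read via received descriptor: %q", got)
}
```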

Let’s extend the previous example and add another child process that serves pictures for each blog post to be shown. This child will need file system access to a directory of images, sending them over to the main process via file descriptor passing, as just introduced.

First, implement the new child.

case "img":
	// File storage child to send pictures for posts back as a file descriptor
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	rpcFd := os.NewFile(3, "")

	rpcSock, err := unixConnFromFile(rpcFd)
	if err != nil {
		log.Fatalf("cannot create Unix Domain Socket: %v", err)
	}

	imgDir, err := filepath.Abs("./cmd/07-05-fork-exec-rpc/imgs/")
	if err != nil {
		log.Fatalf("cannot abs: %v", err)
	}

	if err := landlock.V5.BestEffort().RestrictPaths(
		landlock.RODirs("/proc", imgDir),
	); err != nil {
		log.Fatalf("landlock: %v", err)
	}
	if err := syscallset.LimitTo("@basic-io @io-event @file-system @network-io"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

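	// reply sends f's file descriptor back to the parent. Without a file, it
	// sends a single byte without a control message instead, so that the
	// parent's recvFd fails rather than blocking forever.
	reply := func(f *os.File) {
		var err error
		if f != nil {
			err = sendFd(f, rpcSock)
		} else {
			_, err = rpcSock.Write([]byte{0})
		}
		if err != nil {
			log.Printf("cannot reply: %v", err)
		}
	}

	rpcScanner := bufio.NewScanner(rpcFd)
	for rpcScanner.Scan() {
		file, err := filepath.Abs(imgDir + "/" + rpcScanner.Text() + ".png")
		if err != nil {
			log.Printf("cannot abs: %v", err)
			reply(nil)
			continue
		}
		if dir := filepath.Dir(file); dir != imgDir {
			log.Printf("file directory %q mismatches, expected %q", dir, imgDir)
			reply(nil)
			continue
		}

		f, err := os.Open(file)
		if err != nil {
			log.Printf("cannot open: %v", err)
			reply(nil)
			continue
		}

		reply(f)
		_ = f.Close()
	}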

I hope you are not bored reading this kind of code. It gets pretty repetitive, I know. But please bear with me and follow me through the img child.

The first part should be quite familiar by now: forbidding some syscalls and wrapping file descriptor 3. But now file descriptor 3 is also converted into a Unix domain socket for later use. Landlock LSM restricts file system access to read-only access to the directory containing the pictures (and /proc), and a stricter Seccomp BPF filter follows.

After that, a simple string-based RPC is used again, which reads the names of the files to open line by line. In addition to a simple directory check, the Landlock LSM filter denies everything outside the allowed directory. If the file can be opened, its file descriptor is sent back to the main process.
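The directory check on its own can be sketched as a small function, here called imagePath (a made-up name), using filepath.Join instead of string concatenation plus filepath.Abs; the directory /srv/imgs is just an example. It rejects names that would escape the image directory or point into a subdirectory, while Landlock remains the second line of defense.

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
)

// imagePath (hypothetical helper) maps an untrusted name to a PNG file
// directly inside imgDir and rejects anything else, such as "../x" or "a/b".
// imgDir must be absolute and clean, e.g., the result of filepath.Abs.
func imagePath(imgDir, name string) (string, error) {
	file := filepath.Join(imgDir, name+".png") // Join also cleans ".." elements
	if dir := filepath.Dir(file); dir != imgDir {
		return "", fmt.Errorf("file directory %q mismatches, expected %q", dir, imgDir)
	}
	return file, nil
}

func main() {
	for _, name := range []string{"1", "../../etc/passwd", "sub/1"} {
		file, err := imagePath("/srv/imgs", name)
		log.Printf("%q -> %q, %v", name, file, err)
	}
}
```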

A few small changes are required in the main process. They are explained below the code.

case "":
	// Parent: Starts children, drops to HTTP server
	if err := syscallset.LimitTo("@system-service"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	databaseCommParent, databaseCommChild, err := socketpair()
	if err != nil {
		log.Fatalf("socketpair: %v", err)
	}
	imgCommParent, imgCommChild, err := socketpair()
	if err != nil {
		log.Fatalf("socketpair: %v", err)
	}

	imgCommSock, err := unixConnFromFile(imgCommParent)
	if err != nil {
		log.Fatalf("cannot create Unix Domain Socket: %v", err)
	}

	_, err = forkChild("database", []*os.File{databaseCommChild})
	if err != nil {
		log.Fatalf("Cannot fork database child: %v", err)
	}
	_, err = forkChild("img", []*os.File{imgCommChild})
	if err != nil {
		log.Fatalf("Cannot fork img child: %v", err)
	}
	// Both children hold their own copies of these ends now.
	_ = databaseCommChild.Close()
	_ = imgCommChild.Close()

	httpLn, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalf("cannot listen: %v", err)
	}

	if err := landlock.V5.BestEffort().RestrictPaths(
		landlock.RODirs("/proc"),
	); err != nil {
		log.Fatalf("landlock: %v", err)
	}
	if err := syscallset.LimitTo("@basic-io @io-event @network-io @file-system"); err != nil {
		log.Fatalf("seccomp-bpf: %v", err)
	}

	rpcClient := rpc.NewClient(databaseCommParent)

	// Serializes the request/response pairs on the img child's socket, so
	// concurrent HTTP requests cannot receive each other's file descriptors.
	var imgMu sync.Mutex

	httpMux := http.NewServeMux()
	httpMux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
		// Same as before.
	})
	httpMux.HandleFunc("GET /post/{id}", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("Content-Type", "text/html")

		id, err := strconv.Atoi(r.PathValue("id"))
		if err != nil {
			http.Error(w, "cannot parse ID: "+err.Error(), http.StatusBadRequest)
			return
		}

		var text string
		err = rpcClient.Call("Database.GetPost", &id, &text)
		if err != nil {
			http.Error(w, "cannot fetch post: "+err.Error(), http.StatusInternalServerError)
			return
		}

		imgMu.Lock()
		_, _ = fmt.Fprintln(imgCommParent, id)
		imgFd, err := recvFd(imgCommSock)
		imgMu.Unlock()
		if err != nil {
			http.Error(w, "cannot fetch img: "+err.Error(), http.StatusInternalServerError)
			return
		}
		defer imgFd.Close()

		_, _ = fmt.Fprintf(w, `<h1>Post %d</h1><p>%s</p>`, id, html.EscapeString(text))
		_, _ = fmt.Fprint(w, `<img src="data:image/png;base64,`)

		encoder := base64.NewEncoder(base64.StdEncoding, w)
		_, _ = io.Copy(encoder, imgFd)
		_ = encoder.Close() // flushes the final, possibly partial, block

		_, _ = fmt.Fprint(w, `" />`)
	})

	httpd := http.Server{Handler: httpMux}
	log.Fatal(httpd.Serve(httpLn))

The first changes create another socketpair(2) and fork off the second child. Apart from the additional unixConnFromFile call, they are analogous to the startup code for the first child process.

The interesting part happens inside the HTTP handler for /post/{id}. If the SQLite database knows of a post for the id, that id is written to the parent’s end of the socketpair(2), to be read by the img child’s RPC loop. The code then waits to receive a file descriptor over the Unix domain socket created on the same connection. After receiving the file descriptor, its content is copied into a base64 encoder and written into the web response as an inline data URL image.
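The encoding step can be isolated as a small sketch with the hypothetical helper names writeDataURL and dataURL. It streams the input through base64.NewEncoder, so the picture is never fully buffered in memory, and it closes the encoder at the end, which flushes the last partial block including its padding.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"strings"
)

// writeDataURL streams r into w as a base64-encoded data URL, without reading
// all of r into memory first.
func writeDataURL(w io.Writer, mimeType string, r io.Reader) error {
	if _, err := fmt.Fprintf(w, "data:%s;base64,", mimeType); err != nil {
		return err
	}
	encoder := base64.NewEncoder(base64.StdEncoding, w)
	if _, err := io.Copy(encoder, r); err != nil {
		_ = encoder.Close()
		return err
	}
	// Close flushes the last partial block, including any padding.
	return encoder.Close()
}

// dataURL is a convenience wrapper for data that is already in memory.
func dataURL(mimeType string, data []byte) (string, error) {
	var sb strings.Builder
	err := writeDataURL(&sb, mimeType, bytes.NewReader(data))
	return sb.String(), err
}

func main() {
	url, err := dataURL("text/plain", []byte("hi"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(url) // data:text/plain;base64,aGk=
}
```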

This example now has two subprocesses, each running with differently restricted privileges. An RPC mechanism in between allows inter-process communication, including the passing of file descriptors. At this point, it is safe to say that privilege separation has been achieved.

What’s Next?

This post was the logical successor to Dropping Privileges in Go. While the first one described how to drop various privileges, this one focused on architectural changes to drop privileges more granularly. Adding privilege separation to the toolbox of software architectures makes it possible to build more robust software under the assumption that the software will be pwned some day.

The examples shown here and more are available in a public git repository at codeberg.org/oxzi/go-privsep-showcase. Before creating these explicit examples, I had toyed with these technologies in a “research project” of mine, called gosh. Please feel free to take a look at it for more inspiration.

There are still some related topics I plan to write about, but I will not go so far as to announce anything. Stay tuned.