Dropping Privileges in Go

February 1, 2025

Computer programs may do lots of things, both intended and unintended. What they can do is limited by their privileges. Since most operating systems execute programs as a certain user, the program has all the user’s precious privileges.

To take a concrete example, if a user has an SSH private key lying around and runs, e.g., a chat program, then this program is able to read the private key even though it has nothing to do with it. Assuming that this chat program is exploitable, an attacker might instruct it through a crafted message to exfiltrate the private key.

This may not be the core of the issue, but the damage is rooted in the fact that a program was able to access a resource that it should not have been able to access in the first place. As writing secure software is out of scope here, the private key could have been saved if the principle of least privilege had been enforced by some means. In a nutshell, it says that each component, e.g., the chat software, should only have the necessary privileges and nothing more. Many roads lead to this state, e.g., not using the same user for private key interactions and chatting, or sandboxing the chat application.

When developing software, the developer should know what their tool should be able to do. Thus, they are able to carve out the allowed territories, denying everything else with the help of system features. As a metaphor, think of a werewolf chaining themself up before full moon.

If you are asking yourself right now why you should do this to your code, as it will never fail, then you especially should do this. For most applications out there, the question is not if they can be broken, but when they will be broken. Since I wrote many bugs throughout the years and saw exploits I was unable to grasp, I am trying to self-chain all my future werewolves.

Changes In Software Architecture

The idea of self-restricting software is that privileges, once given up, cannot be regained. For example, once the program has denied itself file system access, no more files can be opened.

The software starts as a certain user, sometimes the root user, e.g., to use a restricted network port. After starting to listen on this port, this privilege can be dropped, e.g., by switching to an unprivileged user while keeping the file descriptor. The software continues accepting connections on the previously bound port, but cannot start listening on other restricted ports.

This limitation must be taken into account when planning the software. Instead of being able to access all resources when necessary, they must be acquired in the beginning phase before self-restricting. To introduce another questionable metaphor, think about a funnel or an upside-down cone: your program starts with all these privileges, dropping them as it goes until it continues with just the bare minimum.

Good Old Chroot And User-Switch

Let’s start with the classic approach of chrooting and user/group changing. I call it classic because this variant goes back to the early 1990s and works on any POSIX-like operating system (think BSDs, Linux and friends).

Unfortunately, this approach has the annoyance that the necessary system calls are reserved for the root user. While in most cases this is not a problem for daemons, this would be a blocker for end user applications, like GUIs. Instead of suggesting some SUID file flag madness, secure alternatives for rootless scenarios will follow.

`chroot(2)`

First things first: chroot(2) changes the process’ root directory to the supplied one. For example, / becomes /var/empty and accessing /etc/passwd would actually try to open /var/empty/etc/passwd. To activate the chroot(2), the process needs to chdir(2) into it.

Unless further actions are taken, an attacker can break out of a chroot. Actually, chrooting is not a security feature, but can be used to build one - as this post attempts. Nevertheless, please be aware of the limitations.

A chroot directly impacts the process. If it should not interact with any files, chrooting to /var/empty or some freshly created empty directory makes sense. If it only needs a single directory, one can consider chrooting to this directory. If, however, files from different locations must be accessed, a strict chroot can be a burden. This is a decision one has to make on a case-by-case basis.

After importing golang.org/x/sys/unix, the following snippet is enough to chroot(2) the process to /var/empty, which is a “[g]eneric chroot(2) directory”, according to OpenBSD’s hier(7).

if err := unix.Chroot("/var/empty"); err != nil {
    log.Fatalf("chroot: %v", err)
}
if err := unix.Chdir("/"); err != nil {
    log.Fatalf("chdir: %v", err)
}

`setuid(2)` or `setresuid(2)`

The process may now be chrooted to an empty directory, but otherwise still runs as root. To give up root privileges, let the process switch user rights to an unprivileged user without qualities.

Throughout the ages, multiple syscalls for user switching have emerged, starting at setuid(2). While not strictly being part of POSIX, the setresuid(2) syscall is available on most operating systems. It allows setting the real, effective and saved user ID, where there are fine differences between them. These may differ when developing or using a SUID application, having the real user ID of your user and the effective user ID of the root user. In this case, however, we just want to drop all privileges to our unqualified user, setting all three user IDs to the same user.

The same applies to groups with setresgid(2). In addition, as a process may have multiple groups, this list can be shortened via setgroups(2). As a result, only the privileges of the given groups apply, not those of all the user’s other groups.

For the example, create an unprivileged worker user. On Linux, this can be done as follows:

$ sudo useradd \
    --home-dir /var/empty \
    --system \
    --shell /run/current-system/sw/bin/nologin \
    --user-group \
    demoworker
$ id demoworker
uid=992(demoworker) gid=987(demoworker) groups=987(demoworker)

Continue with the following short code block:

// Prior chroot code

uid, gid := 992, 987
if err := unix.Setgroups([]int{gid}); err != nil {
    log.Fatalf("setgroups: %v", err)
}
if err := unix.Setresgid(gid, gid, gid); err != nil {
    log.Fatalf("setresgid: %v", err)
}
if err := unix.Setresuid(uid, uid, uid); err != nil {
    log.Fatalf("setresuid: %v", err)
}

While this snippet works as a demonstration, having to actually configure the user and group ID is a bit too much. Thus, let the code do the lookup by writing a short helper function around os/user.

One word of caution regarding the os/user package: it caches the current user, and this cache cannot simply be invalidated. Thus, once the following helper function has run, user.Current() will always return the user that was current at that time, even after switching to another user.

// uidGidForUserGroup fetches a UID and GID for the given user and group.
func uidGidForUserGroup(username, groupname string) (uid, gid int, err error) {
	userStruct, err := user.Lookup(username)
	if err != nil {
		return
	}
	userId, err := strconv.ParseInt(userStruct.Uid, 10, 64)
	if err != nil {
		return
	}
	groupStruct, err := user.LookupGroup(groupname)
	if err != nil {
		return
	}
	groupId, err := strconv.ParseInt(groupStruct.Gid, 10, 64)
	if err != nil {
		return
	}

	uid, gid = int(userId), int(groupId)
	return
}

When using this function, a word of caution is necessary as this might be the first situation where the chroot can shoot us in the foot. The lookup requires the /etc/passwd and /etc/group files to be accessible. Thus, when already chrooted to /var/empty, this will obviously fail:

open /etc/passwd: no such file or directory

First, do the lookup, then chroot(2), and finally perform the user/group switching.

// Start with root privileges, do necessary lookups.
uid, gid, err := uidGidForUserGroup("demoworker", "demoworker")
if err != nil {
    log.Fatalf("user/group lookup: %v", err)
}

// Drop into chroot
if err := unix.Chroot("/var/empty"); err != nil {
    log.Fatalf("chroot: %v", err)
}
if err := unix.Chdir("/"); err != nil {
    log.Fatalf("chdir: %v", err)
}

// Switch to an unprivileged user unable to escape chroot
if err := unix.Setgroups([]int{gid}); err != nil {
    log.Fatalf("setgroups: %v", err)
}
if err := unix.Setresgid(gid, gid, gid); err != nil {
    log.Fatalf("setresgid: %v", err)
}
if err := unix.Setresuid(uid, uid, uid); err != nil {
    log.Fatalf("setresuid: %v", err)
}

// Application code follows

Limiting Resources The POSIX Way

After chrooting and dropping root user privileges, the code may no longer access privileged system APIs or some files, but can still run cycles. For example, a parser may be vulnerable to a billion laughs attack, resulting in either 100% CPU load or even memory exhaustion.

One good old POSIX way to limit different kinds of resources is setrlimit(2). Depending on the target operating system, different resources are defined.

Both horror scenarios from the example can be addressed either via RLIMIT_CPU for CPU time or via RLIMIT_DATA for data.

`RLIMIT_CPU`

The CPU time or process time is the amount of CPU time actively consumed by a single process. If the process calculates something nonstop, this counter rises. However, if the process waits for certain events, the counter idles as well.

As an example, idle for a bit and then burn CPU cycles by useless hash computations while limiting the CPU time to one second.

if err := unix.Setrlimit(
    unix.RLIMIT_CPU,
    &unix.Rlimit{Max: 1},
); err != nil {
    log.Fatalf("setrlimit: %v", err)
}

log.Println("CPU time != execution time, hanging low")
time.Sleep(5 * time.Second)
log.Println("STRESS!")

buff := make([]byte, 32)
for i := uint64(1); ; i++ {
    _, _ = rand.Read(buff)
    _ = sha256.Sum256(buff)

    if i%100_000 == 0 {
        log.Printf("Run %d", i)
    }
}

Putting this example into a main function shows how the process gets aborted after consuming too much CPU time.

2025/01/26 20:17:18 CPU time != execution time, hanging low
2025/01/26 20:17:23 STRESS!
2025/01/26 20:17:23 Run 100000
[ . . . ]
2025/01/26 20:17:24 Run 800000
[1]    190379 killed     ./02-01-setrlimit-cpu

`RLIMIT_DATA`

This second example limits the maximum data segments to 10MiB or, in other words, restricts the amount of available memory to 10MiB.

The code will allocate memory within an infinite loop, resulting in an out of memory situation. However, due to the setrlimit(2) call, the process gets aborted.

if err := unix.Setrlimit(
    unix.RLIMIT_DATA,
    &unix.Rlimit{Max: 10 * 1024 * 1024},
); err != nil {
    log.Fatalf("setrlimit: %v", err)
}

var blobs [][]byte
for i := uint64(1); ; i++ {
    buff := make([]byte, 1024)
    _, _ = rand.Read(buff)
    blobs = append(blobs, buff)

    if i%1_000 == 0 {
        log.Printf("Allocated %dK", i)
    }
}

And this will be aborted due to its memory hunger.

2025/01/26 20:17:44 Allocated 1000K
2025/01/26 20:17:44 Allocated 2000K
fatal error: runtime: out of memory
[ . . . ]

Good Hard Limits?

These two examples set hard limits, which eventually abort the process. RLIMIT_CPU in particular, being an ever-increasing counter, will be reached sooner or later.

Thus, what are good values? Depends, of course.

Just to be sure, if deciding to set any limits, make them high enough that they are never reached during normal operation. If something goes south, they are still there as a safety net.

And what about soft limits? That is an exercise left for the reader.

Doubling Down With OS-specific Features

Everything so far should work on most POSIX-like operating systems. As an advantage, using these patterns in your program may work on platforms you did not even know existed.

But there are also OS-specific mechanisms that allow dropping privileges even when starting as a regular user instead of root. Since my personal experience is limited to Linux and OpenBSD, I will address some of their features. With OpenBSD having simpler APIs to use, I will start there.

Restricting Syscalls On OpenBSD

The operating system’s kernel verifies if the user privileges are sufficient to access some resource. For example, when trying to open(2) a file, the operating system may deny this. This check happens within open(2). But what if the program itself cannot even use open(2), since the developer knows that it never has to open a file in the first place? Welcome to system call filtering, allowing a program to restrict which syscalls it may use later on.

If there is a thing like having a favorite system call, mine might be OpenBSD’s pledge(2). It provides a simple string-based API to restrict the available system calls based on space separated keywords, called promises.

These promises are names of syscall groups, e.g., rpath for read-only system calls regarding the file system. By adding exec to the list, executing other programs will be allowed, starting them with their own promise, given in the second parameter.

int pledge(const char *promises, const char *execpromises);

After a pledge(2) was made, it cannot be undone, only tightened. Tightening means calling pledge(2) another time with a shorter promises list. In case of violating the syscall promise, the process gets either killed or, if error is part of the promise, the denied system call returns an error.

Being a system call, it is available for Go in golang.org/x/sys/unix.
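
The tightening rule can be modeled as a subset check on the promise strings. The following sketch is purely illustrative and runs on any platform: isTightening is a hypothetical helper, not part of golang.org/x/sys/unix, and the kernel remains the only authority on whether a pledge(2) call is accepted.

```go
package main

import (
	"fmt"
	"strings"
)

// isTightening reports whether every promise in next is already contained in
// current, i.e. whether pledging next after current would only drop promises.
func isTightening(current, next string) bool {
	held := make(map[string]bool)
	for _, promise := range strings.Fields(current) {
		held[promise] = true
	}
	for _, promise := range strings.Fields(next) {
		if !held[promise] {
			return false
		}
	}
	return true
}

func main() {
	// Dropping rpath is fine, adding it back later is not.
	fmt.Println(isTightening("stdio rpath error", "stdio error"))
	fmt.Println(isTightening("stdio error", "stdio rpath"))
}
```

Such a check could serve in unit tests on non-OpenBSD machines to catch a promise sequence that would try to widen privileges along the way.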

Unfortunately, the web Go Packages thingy only renders the docs for some selected platforms, not including OpenBSD. Thus, I took the liberty to paste the docs below. By the way, setting the GOOS or GOARCH environment variable also works for go doc, e.g., GOOS=openbsd go doc -all golang.org/x/sys/unix works on Linux.

func Pledge(promises, execpromises string) error
    Pledge implements the pledge syscall.

    This changes both the promises and execpromises; use PledgePromises or
    PledgeExecpromises to only change the promises or execpromises respectively.

    For more information see pledge(2).

func PledgeExecpromises(execpromises string) error
    PledgeExecpromises implements the pledge syscall.

    This changes the execpromises and leaves the promises untouched.

    For more information see pledge(2).

func PledgePromises(promises string) error
    PledgePromises implements the pledge syscall.

    This changes the promises and leaves the execpromises untouched.

    For more information see pledge(2).

Now, there are three Go functions for pledge(2): one to set both parameters, and one each to set only the first or the second. Let’s create an example for a simple program working with an input file, making a promise that just allows reading the file and another, tighter one after it was read. Use your imagination what the program should do, e.g., it could convert an image to another format and print it to stdout.

// Start with limited privileges
if err := unix.PledgePromises("stdio rpath error"); err != nil {
    log.Fatalf("pledge: %v", err)
}

// Read input file
f, err := os.Open("input")
if err != nil {
    log.Fatalf("cannot open input: %v", err)
}
inputFile, err := io.ReadAll(f)
if err != nil {
    log.Fatalf("cannot read input: %v", err)
}
if err := f.Close(); err != nil {
    log.Fatalf("cannot close input: %v", err)
}

// Drop further, reading files is no longer necessary
if err := unix.PledgePromises("stdio error"); err != nil {
    log.Fatalf("pledge: %v", err)
}

// Do some computation based on the input

As the example shows, using pledge(2) is both easy and boring. That might be the reason why most programs shipped with OpenBSD are pledged and why there are lots of patches for ported software. A single call drops so many privileges. Impressive.

Restricting File System Access On OpenBSD

This post opened with the constructed example of a pwned chat program used to exfiltrate the user’s SSH private key. What if a program could make a promise about which file system paths are needed, denying every other access? OpenBSD’s unveil(2) addresses this, similar to what pledge(2) does for system calls.

Multiple unveil(2) calls create an allow-list of unveiled paths the program is allowed to access. Each call adds a path and the kind of permission: read, write, exec and/or create. After a finalizing call with two empty parameters, the list can no longer be changed.

Thus, if the chat program had used unveil(2) for relevant directories - definitely not containing ~/.ssh -, this exploit would have been mitigated.

int unveil(const char *path, const char *permissions);

This system call is available for Go in golang.org/x/sys/unix as well.

func Unveil(path string, flags string) error
    Unveil implements the unveil syscall. For more information see unveil(2).
    Note that the special case of blocking further unveil calls is handled by
    UnveilBlock.

func UnveilBlock() error
    UnveilBlock blocks future unveil calls. For more information see unveil(2).

Thus, in Go multiple unix.Unveil(...) calls might be issued with a closing unix.UnveilBlock().

Continuing with the chat program example, a program can be written to allow read/write/create access to some Download directory to store cringy memes. All other file system requests should be denied.

// Restrict read/write/create file system access to the ./Download directory.
// This does not include exec!
if err := unix.Unveil("Download", "rwc"); err != nil {
    log.Fatalf("unveil: %v", err)
}
if err := unix.UnveilBlock(); err != nil {
    log.Fatalf("unveil: %v", err)
}

// Buggy application starts here: allowing path traversal
userInput := "../.ssh/id_ed25519"

f, err := os.Open("Download/" + userInput)
if err != nil {
    log.Fatalf("cannot open file: %v", err)
}
defer f.Close()

privateKey, err := io.ReadAll(f)
if err != nil {
    log.Fatalf("cannot read: %v", err)
}
log.Printf("looks familiar?\n%s", privateKey)

Taking this code for a test drive shows how a good old path traversal attack was mitigated, even if the file exists.

$ ./04-openbsd-unveil
2025/01/26 22:11:57 cannot open file: open Download/../.ssh/id_ed25519: no such file or directory
$ ls -l Download/../.ssh/id_ed25519
-rw-------  1 user  user  420 Jan 26 22:10 Download/../.ssh/id_ed25519

Great success!

Restricting Syscalls On Linux

Let’s switch operating systems and focus on the Linux kernel for a while.

The similarly named section about system call filtering on OpenBSD started with a few sentences about how system calls are the gateway between programs and the kernel to access resources. The same applies to Linux, and to almost any operating system out there. Denying unnecessary syscalls directly implies restricting privileges.

Linux comes with a very powerful tool, Seccomp BPF, allowing each process to supply a program to the kernel that decides which system calls are allowed. This program is a Berkeley Packet Filter (BPF), receiving both the syscall number and (some) arguments. Thus, it is even possible to allow or deny only certain parameters of a syscall.

Obviously, this great flexibility has its ups and downs. While there are situations where one might want to create a very specific filter, a quick-and-dirty one might be more common. At least in my experience as an unwashed userland developer, I mostly prefer a rougher filter, especially in Go, where one mostly does not interact with system calls directly.

But where to start? When wanting to get the pure Seccomp BPF experience in Go, there is the github.com/elastic/go-seccomp-bpf package, written in pure Go without any cgo. It allows developing fine-grained filters on a system call basis, as introduced above.

When I first used it, I had no real clue where to start with writing a filter for my Go program. Starting with an all-denying filter and using Linux’ auditd(8), I made progress in collecting hopefully all necessary syscalls. But then I had to realize that updating Go or any of my dependencies may result in other code (d’oh!) and therefore other system calls. Another constraint is that there are slight differences between the available system calls of different architectures.

Thus, I started to play around with a wider filter list and eventually “borrowed” the sets available in systemd’s SystemCallFilter. Administrators may know this feature already: it restricts the available system calls for each service under systemd’s control through a list of system calls, quite similar to pledge(2). After a while I built myself a small library exactly for this use case, github.com/oxzi/syscallset-go.

From a developer’s perspective, it offers a simple string-based API to build yourself a list of allowed system calls through groups. Honestly, I just ported systemd’s code to Go and made it fly using the aforesaid go-seccomp-bpf library. But it works.

// Start with limited privileges
if err := syscallset.LimitTo("@system-service"); err != nil {
    log.Fatalf("seccomp: %v", err)
}

// Read input file
f, err := os.Open("input")
if err != nil {
    log.Fatalf("cannot open input: %v", err)
}
inputFile, err := io.ReadAll(f)
if err != nil {
    log.Fatalf("cannot read input: %v", err)
}
if err := f.Close(); err != nil {
    log.Fatalf("cannot close input: %v", err)
}

// Drop further, reading files is no longer necessary
if err := syscallset.LimitTo("@basic-io"); err != nil {
    log.Fatalf("seccomp: %v", err)
}

// Do some computation based on the input

The reader who has not dozed off might recognize this code, as it is almost identical to the demo code used for pledge(2) above. However, there are a few small differences. First and most obvious, the filters differ. There are also no meta-filters on OpenBSD like the @system-service used here, which contains lots of commonly used system calls. Furthermore, OpenBSD’s pledge(2) has the handy error promise, which lets forbidden system calls fail with an error when set. Otherwise, the kernel kills the process. The latter behavior also applies here, where each misstep will directly be punished by process execution.

Restricting File System Access On Linux… And Network Access As Well

For symmetry’s sake, Linux’ answer to unveil(2) must follow. And it does, but it is also way more. Please welcome Landlock LSM.

While Landlock started out addressing the same issue as unveil(2) - limiting file system access -, it recently grew to also allow certain network isolations. But let the code speak for us, using the github.com/landlock-lsm/go-landlock library.

Restricting Paths

// Restrict file system access to the ./Download directory.
if err := landlock.V5.BestEffort().RestrictPaths(
    landlock.RWDirs("Download"),
); err != nil {
    log.Fatalf("landlock: %v", err)
}

// Buggy application starts here: allowing path traversal
userInput := "../.ssh/id_ed25519"

f, err := os.Open("Download/" + userInput)
if err != nil {
    log.Fatalf("cannot open file: %v", err)
}
defer f.Close()

privateKey, err := io.ReadAll(f)
if err != nil {
    log.Fatalf("cannot read: %v", err)
}
log.Printf("looks familiar?\n%s", privateKey)

This code might also look familiar, because it is an adapted version of the earlier unveil(2) example.

One thing worth mentioning might be the V5.BestEffort() part. Landlock itself is versioned, growing with Linux releases in features. But to build a Go program compatible with older targets, the BestEffort part falls back to what the targeted kernel supports. In case this is not desired, directly use V5.RestrictPaths - or whatever version is the latest when reading.

Restricting Network

At the moment, the possibilities to restrict network connections are still a bit limited in Landlock, but in return the API is simple. In case you are looking for a full-fledged network limitation suite on Linux, maybe take a look at cgroup eBPF-based network filtering.

So, what is definitely possible? The application is able to limit both inbound and outbound TCP traffic based on ports. Or, in simpler words, one can allow certain TCP ports.

// Restrict outbound TCP connections to port 443.
if err := landlock.V5.BestEffort().RestrictNet(
    landlock.ConnectTCP(443),
); err != nil {
    log.Fatalf("landlock: %v", err)
}

// HTTP should fail, while HTTPS should work.
for _, proto := range []string{"http", "https"} {
    _, err := http.Get(proto + "://pkg.go.dev/")
    log.Printf("%q worked: %t\t%v", proto, err == nil, err)
}

This little example only allows outgoing TCP connections to port 443, as the failing HTTP (port 80) connection shows. Of course, this is no secure way to restrict your application to only use HTTPS.

./06-02-linux-landlock-tcp
2025/01/27 21:35:32 "http" worked: false        Get "http://pkg.go.dev/": dial tcp [2600:1901:0:f535::]:80: connect: permission denied
2025/01/27 21:35:32 "https" worked: true        <nil>

And finally, there is also landlock.BindTCP as a rule option, restricting the TCP ports that may be bound. This may be especially useful when fearing that an attacker launches a shell listening on some port.

Is This All? Are We Finished?

Have I now covered all possible options for an application to self-restrict its privileges? Obviously not. I have not even started covering Linux’ cgroups, itself allowing a wide range of restrictions. And then there are even other operating systems, like FreeBSD with its capsicum(4).

My main goal for this post was to show that there are quite simple APIs to restrict privileges. OpenBSD itself comes with simple APIs, and for Linux there are the two shown Go libraries making these big, very configurable features also usable as a one-liner.

Then there is setrlimit(2), which one might wanna use or might wanna ignore. And of course the root-restricted chroot(2)/setresuid(2) dance.

Thus, you as a developer have a choice. As shown, it is quite easy to add some of the introduced mechanisms to protect your software against future mischief. I would urge you to give it a try, limiting the attack surface of your program.

The examples used in this post and more are available in the following git repository, https://codeberg.org/oxzi/go-privsep-showcase. May it be useful.