GO EP10: GOROOT, GOPATH, GOCACHE
When learning Go, we concentrate on understanding the syntax and specific features of the language, such as ‘for range’ loops, generics, pointers, etc., rather than learning the “environmental factors” of the language.
For instance, what is `GOPATH`, what is `GOROOT`, and what happens when you run `go get`, `go mod tidy`, etc.?
This discussion (and many later ones) aims to bridge that gap. We’ll talk about aspects that aren’t directly related to the language’s syntax. The material is broken down into several stories to keep each one concise and focused.
So, let’s begin with a fundamental question: What is GOROOT?
GOROOT
When you follow the steps to install Go from the official website at https://go.dev/doc/install, the distribution is typically installed in the `/usr/local/go` directory (Linux or macOS) or `C:\Program Files\Go` (Windows; older installers used `C:\Go`).
`GOROOT` is an environment variable that points to where your Go installation resides. This is important because it lets the Go tools know where to find the installed files they need to run properly.
You can see what your `GOROOT` is set to by running the following command.
$ go env GOROOT
/usr/local/go
Keep in mind, though, that you usually don’t need to worry about setting the `GOROOT` environment variable on your own, as the Go installation process typically handles this for you. Of course, there might be occasions when you need to set it manually, especially if you are juggling multiple versions of Go or using a custom build.
`GOROOT` is also where the Go standard library's source code resides, specifically within the `src` directory. Here, you'll find libraries such as `math`, `fmt`, and `net`.
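Additionally, the `bin` directory under the same `GOROOT` houses the core Go commands, `go` and `gofmt`, while lower-level tools such as the compiler and linker live under `pkg/tool`. Note that `goimports` is not part of the Go distribution; it is installed separately from `golang.org/x/tools`.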
Basically, this setup forms the core of the Go toolchain. It’s important to understand that `GOROOT` pertains to the tools and libraries provided by the Go language itself and is not directly tied to the specific projects you are working on or to managing your project's dependencies.
GOPATH
On the other hand, the `GOPATH` environment variable specifies where your Go dependencies, or anything related to your working projects, live on your computer.
Since Go version 1.8, this variable is automatically set up with a default value: `$HOME/go` on Unix-like systems and `%USERPROFILE%\go` on Windows.
$ go env GOPATH
/Users/Phuong/go
Within the `$GOPATH` directory, you’ll find three important subdirectories: `src`, `pkg`, and `bin`.
In the traditional GOPATH mode, when you use the `go get` command to download a repository, it places the source code into the `$GOPATH/src` directory. For instance, if you type `go get github.com/who/which`, the source code will go to `$GOPATH/src/github.com/who/which`.
This directory is where you manage both your projects and their dependencies.
If you run `go get example.com/user/project`, the structure would look like this:
$GOPATH
├── bin
│   └── (compiled binaries from go install)
├── pkg
│   └── (compiled package files)
│       └── example.com
│           └── user
│               └── project.a
└── src
    └── example.com
        └── user
            └── project
                ├── main.go
                ├── utils.go
                └── subpackage
                    ├── sub.go
                    └── helper.go
- The `bin` directory: this is where the compiled binaries go when you run the `go install` command.
- The `pkg` directory: stores the compiled package files, organized by import path.
- The `src` directory: holds the source code of all your projects and their dependencies.
But Go 1.11 introduced something new called Go modules, and most modern Go projects use this system now.
With Go modules, you don’t have to keep your project inside the `$GOPATH/src` folder. Instead, you can place your project anywhere on your computer, and Go will use a file named `go.mod` in your project folder to keep track of your dependencies.
“So, is GOPATH no longer needed?”
The answer is, GOPATH itself is still needed.
What has changed is the ‘GOPATH development mode’ (or ‘GOPATH mode’), which is no longer the main way to manage dependencies and projects since Go 1.16. So, we can ignore the `src` directory for now.
You can place your projects anywhere on your file system. Each project has its own `go.mod` file to manage dependencies, making it more distributed.
Even though you don’t use `$GOPATH/src` to organize your projects anymore, the `GOPATH` environment variable still plays a role. It helps Go know where to store certain things like the module cache, binary files, etc.
When you run the command `go get example.com/user/project`, the directory structure now looks like this:
$GOPATH
├── bin
│   └── (compiled binaries from go install)
└── pkg
    ├── mod
    │   ├── cache
    │   │   └── (cache files used during module resolution)
    │   └── example.com
    │       └── user
    │           └── project@v1.0.0
    │               ├── go.mod
    │               ├── main.go
    │               ├── utils.go
    │               └── subpackage
    │                   ├── sub.go
    │                   └── helper.go
    └── sumdb
        └── (cached checksum database state)
This setup is called module-aware mode.
It has been the default mode since Go 1.16. This means that even if there isn’t a `go.mod` file in your current directory or any parent directory, Go will still use module-aware mode. This behavior is also controlled by an environment variable named `GO111MODULE`.
“What exactly is a Go module?”
A Go module is a collection of related Go packages that are versioned and released together. It helps you manage your code and its dependencies.
A single repository in Go can have one or more modules, but usually, it has just one module and this module is typically located at the root of the repository. So, the module’s directory is the main directory of the repository.
Inside this directory, there’s a special file called `go.mod`. This file declares the module path, which is essentially the base import path for all the packages in that module.
For example, if your module path is `github.com/who/which`, then all the packages within this module will use this as the prefix for their import paths.
github.com/who/which
├── go.mod
├── main.go
└── pkg/
    └── mypackage/
        └── mypackage.go
`main.go` and any Go code in the root directory are part of the module `github.com/who/which`. The file `mypackage.go` within the `pkg/mypackage` directory is also part of the same module, with the import path `github.com/who/which/pkg/mypackage`.
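The relationship between the module path and a package's import path can be sketched in a few lines of Go. This is only an illustration: `importPath` is a hypothetical helper, not something the Go toolchain exposes, and it assumes the package directory is given relative to the module root with forward slashes.

```go
package main

import (
	"fmt"
	"path"
)

// importPath joins the module path declared in go.mod with a package
// directory given relative to the module root.
// Hypothetical helper for illustration only.
func importPath(modulePath, relDir string) string {
	return path.Join(modulePath, relDir)
}

func main() {
	// The package in the module root uses the module path itself.
	fmt.Println(importPath("github.com/who/which", "."))
	// A package in a subdirectory appends its directory to the module path.
	fmt.Println(importPath("github.com/who/which", "pkg/mypackage"))
}
```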
So, the module includes all the Go packages found in the directory where the `go.mod` file is located, as well as all packages in the subdirectories beneath this directory.
These packages are included as long as there isn’t another `go.mod` file in one of those subdirectories. If there is another `go.mod` file in a subdirectory, it indicates the start of a new module, and the hierarchy starts over from there.
In the example above, if there was another `go.mod` file inside `pkg/mypackage`, it would indicate a new module starts there, and the packages inside `mypackage` would belong to that new module.
“My folder is `which`, so my module name is `which`, right?”
No, the module name is not the same as the folder name. The module name is declared in the `go.mod` file, and it can be anything you want (as long as it follows valid syntax).
// go.mod
module github.com/who/which
The above `go.mod` file declares the module name as `github.com/who/which`, but we can specify different names, such as `github.com/who/which/v2` or even `whichofwho`, etc.
“What is the difference between a package and a module?”
They have a hierarchical relationship. A module is a collection of related packages, and a package is a collection of related Go files (*.go).
“Why was GOPATH mode replaced by Go modules?”
The root problem is that GOPATH mode lacks the concept of package versions.
GOPATH mode uses the code you have checked out on your computer for building dependencies. This means that if you clone a project and build it, you might get different dependencies than the original developer used.
In contrast, Go modules manage dependencies with specific versions, making sure that everyone uses the same code. When you clone a project and fetch its dependencies, Go checks these dependencies against the `go.sum` file to ensure they are the same as what the original developer used. The only thing you have to trust is the initial git clone.
Let’s move on to the last directory in the GOPATH, `$GOPATH/bin`. Actually, to explain this, we need to talk about the `GOBIN` environment variable.
GOBIN
When you install Go programs or tools using `go install` (or `go get` in releases before Go 1.18, where it still built and installed binaries), the resulting executable binaries need to be placed somewhere on your file system. Where these binaries are placed depends on two environment variables: `GOPATH` and `GOBIN`.
“I have run `go install` for many packages, but `/bin` only contains some. Why?”
Not every package generates an executable binary. Only `main` packages, those that declare `package main` and contain a `main()` function, produce one.
Now for the `GOBIN` environment variable: if `GOBIN` is set, it specifies the directory where the binaries will be installed.
If `GOBIN` is not set, the Go command then checks the `GOPATH` environment variable, and binaries will be installed in the `bin` subdirectory of the first directory listed in `GOPATH`. For instance, if `GOPATH` is set to `/home/user/go`, then the binaries will go into `/home/user/go/bin`.
If neither `GOBIN` nor `GOPATH` is set, Go falls back to using a default `GOPATH`. The default `GOPATH` is `$HOME/go` on Unix-like systems (like Linux or macOS) and `%USERPROFILE%\go` on Windows, as we mentioned before.
Therefore, in this case, the binaries will be installed in the `bin` subdirectory of the default `GOPATH` directory. So on a Unix-like system, they would go into `$HOME/go/bin`, and on a Windows system, they would go into `%USERPROFILE%\go\bin`.
GOCACHE
Have you ever noticed, when you build a project for the first time, it’s kinda slow, but the second time is significantly faster?
The Go build system uses a caching mechanism to store build outputs, which helps speed up future builds by reusing previous outputs. Here is a rough measurement of the time it takes to build a large project that involves hundreds of direct and indirect dependencies.
$ time go build
go build 169.96s user 28.43s system 556% cpu 35.622 total
$ time go build
go build 0.79s user 0.83s system 227% cpu 0.713 total
$ time go build
go build 0.81s user 1.38s system 476% cpu 0.460 total
The first time it runs, the Go toolchain needs to compile all the necessary packages and dependencies from scratch, so it took about 36 seconds in total. The high user and system times indicate that a lot of work was done by the CPU.
However, when we run `go build` again, the build time is significantly reduced and the build completes in less than a second. Since the Go toolchain has already compiled the necessary packages and stored the results in the cache during the first build, it can now reuse these precompiled outputs.
The third run shows a similarly fast result. The time is slightly different only because of normal run-to-run variation (thermal state, other applications, scheduling, etc.).
Again, the build process is very quick because the cache is being used effectively.
By default, this cache data is stored in a directory called `go-build` located within the standard user cache directory specific to the current OS. In my case, it's located in `$HOME/Library/Caches/go-build`.
$ go env GOCACHE
/Users/Phuong/Library/Caches/go-build
The `GOCACHE` environment variable can be used to specify a different location for the cache if you want to, but it's not recommended to change it unless you have a good reason to do so.
“But this cache can grow large over time, right?”
Partially true. The Go command also has a built-in mechanism to manage this cached data.
It periodically deletes cached items that haven’t been used recently, which frees up disk space by removing old and potentially unnecessary files.
The cache is not only used for storing build artifacts to speed up the build process, but also successful test results, values used in fuzzing, etc. But that’s not our main concern here.
“My machine is running out of disk space, should I clear the cache?”
Sure, sometimes you might want to clear the cache manually, and I recommend it in that situation. As we have seen, Go keeps several cache layers, and the direct way to clear them is with these commands:
- `go clean -cache` deletes the build cache stored in the `$GOCACHE` directory.
- `go clean -testcache` expires all cached test results, which are also kept in the `$GOCACHE` directory.
- `go clean -fuzzcache` deletes the fuzzing data stored in that directory.
- `go clean -modcache` deletes the module cache stored in the `$GOPATH/pkg/mod` directory.
“So, what’s the point? What’s the benefit of knowing this?”
We could speed up the build process of our image in Docker by mounting the cache directory to the container, especially in local development.
When building an image from a Dockerfile, we could make a cache mount to share the data between build processes or a bind mount to share the cache between the host and the container.
So, at least for now, we know two caches worth reusing: `$GOPATH/pkg/mod` for dependencies and `$GOCACHE` for build outputs. With both, we can cut down the time spent downloading dependencies and compiling the source code.
This is an example from the Official Docker guide for Go:
# syntax=docker/dockerfile:1
FROM golang:1.21-alpine AS base
WORKDIR /src
COPY go.mod go.sum .
# Download dependencies, keeping the module cache in a cache mount.
RUN --mount=type=cache,target=/go/pkg/mod/ \
    go mod download -x
COPY . .
FROM base AS build-client
RUN --mount=type=cache,target=/go/pkg/mod/ \
    go build -o /bin/client ./cmd/client
FROM base AS build-server
RUN --mount=type=cache,target=/go/pkg/mod/ \
    go build -o /bin/server ./cmd/server
FROM scratch AS client
COPY --from=build-client /bin/client /bin/
ENTRYPOINT [ "/bin/client" ]
FROM scratch AS server
COPY --from=build-server /bin/server /bin/
ENTRYPOINT [ "/bin/server" ]
The above Dockerfile only shows the dependencies cache. Look at the `--mount` flag of the `RUN` instruction. It mounts a cache at `/go/pkg/mod`, which is our `$GOPATH/pkg/mod` directory and contains all the dependencies of our project downloaded from the internet.
“How does it work?”
It's not closely related to our story, but I will give a short explanation here.
The `--mount=type=cache` flag tells Docker to create a persistent cache directory that can be reused across multiple build steps or even different builds. Note that it is not related to the `/go/pkg/mod` directory on the local machine; the storage is managed by Docker itself.
How about the `go-build` directory, or `$GOCACHE`?
Things could be even faster by caching the `go-build` directory as well. The process should be dramatically faster when building a project multiple times.
In my case, it goes from 1 minute and 15 seconds (no cache) to 45 seconds (`/go/pkg/mod` only) to 8 seconds (both `/go/pkg/mod` and `$GOCACHE`). Of course, you have to invalidate the cached image layers before each measurement. Otherwise, the benchmark will not be accurate.
Others
There are some other environment variables which are not as popular as what we discussed above, but I find it interesting to mention them here.
GO111MODULE
`GO111MODULE` (Go v1.11 module) is an environment variable that controls the use of Go modules. With Go 1.16 and later, the default setting for `GO111MODULE` is "on," meaning Go modules are used by default.
This means that module-aware mode is always on unless you explicitly change this setting. If you prefer the old behavior, where module-aware mode is only enabled when a `go.mod` file is present, you need to set `GO111MODULE` to "auto." So there are three possible values for this variable: "on," "off," and "auto."
- “on” means module-aware mode is always enabled, regardless of where your code is located or whether a `go.mod` file exists.
- “off” means module-aware mode is disabled, and the Go tools will look for dependencies in the traditional `GOPATH` mode.
- “auto” means module-aware mode is enabled only if there is a `go.mod` file in the current directory or any parent directory.
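The three values can be summarized as a small decision function. This is only a sketch of the rules listed above for Go 1.16 and later; `moduleAware` is a hypothetical name, and the real go command rejects values other than "on", "off", and "auto" instead of treating them as the default.

```go
package main

import "fmt"

// moduleAware reports whether the go command would run in module-aware mode,
// given the GO111MODULE value and whether a go.mod file exists in the current
// directory or any parent directory.
// Hypothetical helper for illustration only.
func moduleAware(go111module string, hasGoMod bool) bool {
	switch go111module {
	case "off":
		return false // traditional GOPATH mode
	case "auto":
		return hasGoMod // module-aware only when a go.mod is found
	default:
		return true // "on", or unset (the default since Go 1.16)
	}
}

func main() {
	for _, value := range []string{"", "on", "off", "auto"} {
		for _, hasGoMod := range []bool{false, true} {
			fmt.Printf("GO111MODULE=%q go.mod present=%v -> module-aware=%v\n",
				value, hasGoMod, moduleAware(value, hasGoMod))
		}
	}
}
```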
The Go team plans to remove this variable in a future release and switch entirely to module-aware mode.
GOMODCACHE
It’s not entirely accurate to say that the Go module cache is stored in `$GOPATH/pkg/mod`.
`GOMODCACHE` is the environment variable that actually specifies the directory where the Go command stores downloaded modules and related files. When it is not set, it defaults to `$GOPATH/pkg/mod`.
GOSUMDB
`GOSUMDB` is used to identify a checksum database that verifies the integrity of Go modules. This database ensures that the modules you download have not been tampered with.
When you download a module, the Go tool checks the `go.sum` file for a hash of that module. If the hash is not present, the Go tool verifies the module's hash using the checksum database specified by `GOSUMDB`. Once verified, the hash is added to `go.sum` and the module is stored in the module cache.
By default, `GOSUMDB` is set to `sum.golang.org`, which is a checksum database run by Google. It's known and trusted by the Go tools, so you don't need to provide any additional information for it to work. You can specify other checksum databases by setting `GOSUMDB` to include the database's public key and URL. The public key is used to verify the signatures from the checksum database.
Inside mainland China, `sum.golang.google.cn` connects to the `sum.golang.org` database. The Go tool recognizes this and uses the same public key for verification.
If `GOSUMDB` is set to `off`, or `go get` is run with the `-insecure` flag (available only before Go 1.17), the checksum database is not consulted. Instead, the Go tool accepts the module's hash without verification and adds the module to the cache. This also applies to private modules specified by `GOPRIVATE` or `GONOSUMDB`.
Newer versions of Go have made all of this much better: Go modules are the default, and running or installing tools locally has become much easier. `GOPRIVATE` and `GOPROXY` are also worth mentioning, since they can be used in CI/CD.