Directory Filters
When you pass a directory to a Dagger Function as an argument, Dagger uploads everything in that directory tree to the Dagger Engine. For large monorepos or directories containing large files, this can significantly slow down your Dagger Function while filesystem contents are transferred. To mitigate this problem, Dagger lets you apply filters to control which files and directories are uploaded.
Directory arguments
Dagger Functions do not have access to the filesystem of the host you invoke the Dagger Function from (i.e. the host you execute a CLI command like dagger call from). Instead, host files and directories need to be explicitly passed as command-line arguments to Dagger Functions.
There are two important reasons for this.
- Reproducibility: By providing a call-time mechanism to define and control the files available to a Dagger Function, Dagger guards against creating hidden dependencies on ambient properties of the host filesystem that could change at any moment.
- Security: By forcing you to explicitly specify which host files and directories a Dagger Function "sees" on every call, Dagger ensures that you're always 100% in control. This reduces the risk of third-party Dagger Functions gaining access to your data.
To tell a Dagger Function which directory to use, specify its path as an argument when using dagger call. Here's a simple example, which passes a directory from the host (./example/hello) to a Dagger Function:
git clone https://github.com/golang/example
dagger -m github.com/kpenfound/dagger-modules/golang@v0.2.0 call build --source=./example/hello --args=. directory --p
The important thing to know here is that, by default, Dagger will copy and upload everything in the specified directory and its sub-directories to the Dagger Engine. For complex directory trees, directories containing a large number of files, or directories containing large files, this can add minutes to your Dagger Function's execution time while the contents are transferred.
Dagger offers pre- and post-call filtering to mitigate this problem and optimize how your directories are handled.
Why filter?
Filtering improves the performance of your Dagger Functions in three ways:
- It reduces the size of the files being transferred from the host to the Dagger Engine, allowing the upload step to complete faster.
- It ensures that minor unrelated changes in the source directory don't invalidate Dagger's build cache.
- It enables different use-cases, such as setting up component/feature/service-specific pipelines for monorepos.
It is worth noting that Dagger already uses caching to optimize file uploads. Subsequent calls to a Dagger Function will only upload files that have changed since the preceding call. Filtering is an additional optimization that you can apply to improve the performance of your Dagger Function.
Pre-call filtering
Pre-call filtering means that a directory is filtered before it's uploaded to the Dagger Engine container. This is useful for:
- Large monorepos. Typically your Dagger Function only operates on a subset of the monorepo, representing a specific component or feature. Uploading the entire worktree imposes a prohibitive cost.
- Large files, such as audio/video files and other binary content. These files take time to upload. If they're not directly relevant, you'll usually want your Dagger Function to ignore them.
  Tip: The .git directory is a good example of both these cases. It contains a lot of data, including large binary objects, and for projects with a long version history, it can sometimes be larger than your actual source code.
- Dependencies. If you're developing locally, you'll typically have your project dependencies installed locally: node_modules (Node.js), .venv (Python), vendor (PHP) and so on. When you call your Dagger Function locally, Dagger will upload all these installed dependencies as well. This is both bad practice and inefficient. Typically, you'll want your Dagger Function to ignore locally-installed dependencies and only operate on the project source code.
At the time of writing, Dagger does not read exclusion patterns from existing .dockerignore/.gitignore files. If you already use these files, you'll need to manually implement the same patterns in your Dagger Function.
To implement a pre-call filter in your Dagger Function, add an ignore parameter to your Directory argument. The ignore parameter follows the .gitignore syntax. Some important points to keep in mind are:
- The order of arguments is significant: the pattern "**", "!**" includes everything, but "!**", "**" excludes everything (see the sketch after this list).
- Prefixing a path with ! negates a previous ignore: the pattern "!foo" has no effect, since nothing is previously ignored, while the pattern "**", "!foo" excludes everything except foo.
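For instance, the following two ignore lists look similar but behave very differently. A sketch, using the same pragma syntax as the Go examples below:

// uploads everything: the trailing "!**" re-includes all paths
// +ignore=["**", "!**"]

// uploads nothing: "**" comes last and ignores everything
// +ignore=["!**", "**"]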
- Go
- Python
- TypeScript
Here's an example of a Dagger Function that excludes everything in a given directory except Go source code files:
package main

import (
    "context"

    "dagger/my-module/internal/dagger"
)

type MyModule struct{}

func (m *MyModule) Foo(
    ctx context.Context,
    // +ignore=["*", "!**/*.go", "!go.mod", "!go.sum"]
    source *dagger.Directory,
) (*dagger.Container, error) {
    return dag.
        Container().
        From("alpine:latest").
        WithDirectory("/src", source).
        Sync(ctx)
}
Here's an example of a Dagger Function that excludes everything in a given directory except Python source code files:
from typing import Annotated

import dagger
from dagger import Ignore, dag, function, object_type


@object_type
class MyModule:
    @function
    async def foo(
        self,
        source: Annotated[dagger.Directory, Ignore(["*", "!**/*.py"])],
    ) -> dagger.Container:
        return await (
            dag.container().from_("alpine:latest").with_directory("/src", source).sync()
        )
Here's an example of a Dagger Function that excludes everything in a given directory except TypeScript source code files:
import {
  dag,
  object,
  argument,
  func,
  Directory,
  Container,
} from "@dagger.io/dagger"

@object()
class MyModule {
  @func()
  async foo(
    @argument({ ignore: ["*", "!**/*.ts"] }) source: Directory,
  ): Promise<Container> {
    return await dag
      .container()
      .from("alpine:latest")
      .withDirectory("/src", source)
      .sync()
  }
}
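To verify the filter, you can call the function and list what was actually copied into the container. A hedged example using the functions above (the chained directory and entries calls are standard Dagger API calls on the returned container):

dagger call foo --source=. directory --path=/src entries

Only the files matching the negated patterns should appear; in the Go variant, that means .go files plus go.mod and go.sum.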
Here are a few examples of useful patterns:
- Go
- Python
- TypeScript
// exclude Go tests and test data
// +ignore=["**_test.go", "**/testdata/**"]

// exclude binaries
// +ignore=["bin"]

// exclude Python dependencies
// +ignore=["**/.venv", "**/__pycache__"]

// exclude Node.js dependencies
// +ignore=["**/node_modules"]

// exclude Git metadata
// +ignore=[".git", "**/.gitignore"]
# exclude Go tests and test data
Ignore(["**_test.go", "**/testdata/**"])
# exclude binaries
Ignore(["bin"])
# exclude Python dependencies
Ignore(["**/.venv", "**/__pycache__"])
# exclude Node.js dependencies
Ignore(["**/node_modules"])
# exclude Git metadata
Ignore([".git", "**/.gitignore"])
// exclude tests and test data
@argument({ ignore: ["**_test.go", "**/testdata/**"] })
// exclude binaries
@argument({ ignore: ["bin"] })
// exclude Python dependencies
@argument({ ignore: ["**/.venv", "**/__pycache__"] })
// exclude Node.js dependencies
@argument({ ignore: ["**/node_modules"] })
// exclude Git metadata
@argument({ ignore: [".git", "**/.gitignore"] })
Post-call filtering
Post-call filtering means that a directory is filtered after it's uploaded to the Dagger Engine.
This is useful when working with directories that are modified "in place" by a Dagger Function. When building an application, your Dagger Function might modify the source directory during the build by adding new files to it. A post-call filter allows you to use that directory in another operation, only fetching the new files and ignoring the old ones.
A good example of this is a multi-stage build. Imagine a Dagger Function that reads and builds an application from source, placing the compiled binaries in a new sub-directory (stage 1). Instead of then transferring everything to the final container image for distribution (stage 2), you could use a post-call filter to transfer only the compiled files.
- Go
- Python
- TypeScript
To implement a post-call filter in your Dagger Function, use the DirectoryWithDirectoryOpts or ContainerWithDirectoryOpts structs, which support Include and Exclude patterns for Directory objects. Here's an example:
package main

import (
    "context"

    "dagger/my-module/internal/dagger"
)

type MyModule struct{}

func (m *MyModule) Foo(
    ctx context.Context,
    source *dagger.Directory,
) *dagger.Container {
    builder := dag.
        Container().
        From("golang:latest").
        WithDirectory("/src", source, dagger.ContainerWithDirectoryOpts{Exclude: []string{"*.git", "internal"}}).
        WithWorkdir("/src/hello").
        WithExec([]string{"go", "build", "-o", "hello.bin", "."})

    return dag.
        Container().
        From("alpine:latest").
        WithDirectory("/app", builder.Directory("/src/hello"), dagger.ContainerWithDirectoryOpts{Include: []string{"hello.bin"}}).
        WithEntrypoint([]string{"/app/hello.bin"})
}
To implement a post-call filter in your Dagger Function, use the include and exclude parameters when working with Directory objects. Here's an example:
import dagger
from dagger import dag, function, object_type


@object_type
class MyModule:
    @function
    def foo(self, source: dagger.Directory) -> dagger.Container:
        builder = (
            dag.container()
            .from_("golang:latest")
            .with_directory("/src", source, exclude=["*.git", "internal"])
            .with_workdir("/src/hello")
            .with_exec(["go", "build", "-o", "hello.bin", "."])
        )
        return (
            dag.container()
            .from_("alpine:latest")
            .with_directory(
                "/app", builder.directory("/src/hello"), include=["hello.bin"]
            )
            .with_entrypoint(["/app/hello.bin"])
        )
To implement a post-call filter in your Dagger Function, use the include and exclude parameters when working with Directory objects. Here's an example:
import { dag, Container, Directory, object, func } from "@dagger.io/dagger"

@object()
class MyModule {
  @func()
  foo(source: Directory): Container {
    const builder = dag
      .container()
      .from("golang:latest")
      .withDirectory("/src", source, { exclude: ["*.git", "internal"] })
      .withWorkdir("/src/hello")
      .withExec(["go", "build", "-o", "hello.bin", "."])

    return dag
      .container()
      .from("alpine:latest")
      .withDirectory("/app", builder.directory("/src/hello"), {
        include: ["hello.bin"],
      })
      .withEntrypoint(["/app/hello.bin"])
  }
}
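As a quick check, you can list the contents of the final image's /app directory from the CLI. A hedged example, assuming you cloned https://github.com/golang/example into ./example as in the earlier command (directory and entries are standard Dagger API calls on the returned container):

dagger call foo --source=./example directory --path=/app entries

Only hello.bin should be listed, since the include filter drops everything else from the builder stage.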
Here are a few examples of useful patterns:
- Go
- Python
- TypeScript
// exclude all Markdown files
dirOpts := dagger.ContainerWithDirectoryOpts{
    Exclude: []string{"*.md*"},
}

// include only the build output directory
dirOpts := dagger.ContainerWithDirectoryOpts{
    Include: []string{"build"},
}

// include only ZIP files
dirOpts := dagger.DirectoryWithDirectoryOpts{
    Include: []string{"*.zip"},
}

// exclude Git metadata
dirOpts := dagger.DirectoryWithDirectoryOpts{
    Exclude: []string{"*.git"},
}
# exclude all Markdown files
dir_opts = {"exclude": ["*.md*"]}
# include only the build output directory
dir_opts = {"include": ["build"]}
# include only ZIP files
dir_opts = {"include": ["*.zip"]}
# exclude Git metadata
dir_opts = {"exclude": ["*.git"]}
// exclude all Markdown files
const dirOpts = { exclude: ["*.md*"] }
// include only the build output directory
const dirOpts = { include: ["build"] }
// include only ZIP files
const dirOpts = { include: ["*.zip"] }
// exclude Git metadata
const dirOpts = { exclude: ["*.git"] }
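These options are passed directly to the corresponding WithDirectory call. A minimal Go sketch, assuming a hypothetical Docs function:

func (m *MyModule) Docs(
    ctx context.Context,
    source *dagger.Directory,
) (*dagger.Container, error) {
    // exclude all Markdown files when copying the source into the container
    dirOpts := dagger.ContainerWithDirectoryOpts{
        Exclude: []string{"*.md*"},
    }
    return dag.
        Container().
        From("alpine:latest").
        WithDirectory("/src", source, dirOpts).
        Sync(ctx)
}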
Debugging
Using logs
Both Dagger Cloud and the Dagger TUI provide detailed information on the patterns Dagger uses to filter your directory uploads. Look for the upload step in the TUI logs or Trace.
Inspecting directory contents
Another way to debug how directories are being filtered is to create a function that receives a Directory as input and returns the same Directory:
- Go
- Python
- TypeScript
func (m *MyModule) Debug(
    ctx context.Context,
    // +ignore=["*", "!analytics"]
    source *dagger.Directory,
) *dagger.Directory {
    return source
}
@function
def debug(
    self,
    source: Annotated[dagger.Directory, Ignore(["*", "!analytics"])],
) -> dagger.Directory:
    return source
@func()
debug(
  @argument({ ignore: ["*", "!analytics"] }) source: Directory,
): Directory {
  return source
}
Calling the function will show you the directory's digest and top-level entries. The digest is content-addressed, so it changes whenever the contents of the directory change. By looking at the entries field, you may be able to spot an interloper:
dagger call debug --source=.
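You can also query these fields explicitly; digest and entries are standard Directory API calls:
dagger call debug --source=. digest
dagger call debug --source=. entries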
You can also list all files recursively to inspect the directory more deeply:
dagger call debug --source=. glob --pattern="**/*"
You can open the directory in an interactive terminal to inspect the filesystem:
dagger call debug --source=. terminal
You can export the filtered directory to your host and check it with local tools:
dagger call debug --source=. export --path=audit
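For example, you could then list the exported files with standard tools to confirm that only the expected content was kept:
ls -R audit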