Guix package structure: inputs, native-inputs and propagated inputs

Figure 1: Htop's relations (screenshot of the dependencies for the Htop package)

In the last post we introduced the overall structure of Guix packages and how to process source code for a package. Along with the source code we need other libraries and utilities to build or run the software - these are referred to as inputs. The source code and inputs are processed by a build system which builds them. This is a functional approach: think of it as a pipeline where the build system transforms the inputs (and source code) into the outputs.

This post is going to be all about package inputs! It's part 7 of the ongoing Guix packaging series - click through for more fun!

Although this post is about inputs, let's quickly cover how the build-system also provides tools that are used to build the source code. By defining a build-system, a package sets how the build process will run, along with the utilities that will be used. Guix has multiple build-systems. The most common is the gnu-build-system where the standard utilities it provides are gcc and make; for the cmake-build-system it's cmake. This is why C packages don't each have to specify gcc, and Perl packages don't each have to specify Perl. We can make changes to how the build happens by providing arguments to the build system - the details of that will be in a later post.

Aside from the build utilities, many packages need to specify other libraries or tools that are needed. That's the job of the inputs, which are specified in three fields of the package definition: native-inputs, inputs and propagated-inputs.
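As a rough sketch of where the three fields sit in a package definition, here's a hypothetical variant of the hello package. The module name, the package name and the choice of pkg-config, ncurses and zlib are all assumptions made purely for illustration - hello doesn't actually need any of them.

```scheme
;; example-inputs.scm - hypothetical module, only here to show the three fields.
(define-module (example-inputs)
  #:use-module (guix packages)
  #:use-module (gnu packages base)          ; hello
  #:use-module (gnu packages compression)   ; zlib
  #:use-module (gnu packages ncurses)       ; ncurses
  #:use-module (gnu packages pkg-config))   ; pkg-config

(define-public hello-with-inputs            ; hypothetical package
  (package
    (inherit hello)
    (name "hello-with-inputs")
    ;; Tools that run on the build machine during the build.
    (native-inputs (list pkg-config))
    ;; Libraries needed during the build and at runtime.
    (inputs (list ncurses))
    ;; Dependencies that get installed into the profile with the package.
    (propagated-inputs (list zlib))))
```

Each field takes a plain list of package objects, which is why the modules that define those packages have to be imported at the top.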

The native-inputs list provides inputs that are needed during the build. They are native because when cross-compiling the native-inputs are built for the architecture of the build machine. Test tools (e.g. python-mock) are a typical example of a native-input, as they need to run on the build machine during the build.

The inputs list is for libraries that are needed both during the build and when the package is used. When cross-compiling, these libraries are built for the target architecture. An example would be Qt for a KDE application, or a Perl module that's used by a Perl utility.

The dividing line can be quite difficult to determine - I have no idea why glib always goes into native-inputs while gtk goes into inputs - but guix lint will flag some common mistakes. Generally, the inputs list will have much more in it than the native-inputs list - but it's easier to understand the special case of native-inputs first.

Inputs example

For a specific example here's htop which is defined in admin.scm:

 1 (define-public htop
 2   (package
 3     (name "htop")
 4     (version "3.2.2")
 5     (source
 6      (origin
 7        (method git-fetch)
 8        (uri (git-reference
 9              (url "https://github.com/htop-dev/htop")
10              (commit version)))
11        (sha256
12         (base32 "0cyaprgnhfrc7rqq053903bjylaplvxkb65b04bsxmiva09lvf9s"))
13        (file-name (git-file-name name version))))
14     (build-system gnu-build-system)
15     (inputs
16      (list ncurses))
17     (native-inputs
18      (list autoconf automake python-minimal-wrapper))     ; for scripts/MakeHeader.py
19     (home-page "https://htop.dev")
20     (synopsis "Interactive process viewer")
21     (description
22      "This is htop, an interactive process viewer.  It is a text-mode
23 application (for console or X terminals) and requires ncurses.")
24     (license license:gpl2)))

Line 14: the package uses the gnu-build-system so we know it's going to use configure (Autoconf) and Make to build.

Line 17-18: the package has a set of native-inputs in a list. These are the tools required during the build, including a minimal Python.

Line 15-16: the package's inputs list. In this case the only thing required to build the package, and presumably at runtime, is ncurses.

To build the package we do:

$ guix shell --development guix --container --nesting --network coreutils
[env] guix build --no-substitutes htop

The next thing to look at is what happens to the inputs in the final package that is output.

Figure 2: Htop's references (graph of package references in the Htop package)

Runtime references

We can see a graphical view of the inputs required to build a package by using guix graph like this:

$ guix package --install graphviz
$ guix graph --type=package htop > htop-dag.dot

In Figure 1 (above) we see that the package requires automake, which in turn requires m4: this is an example of how the DAG expands through dependencies of dependencies. We can also see that the python-minimal package in native-inputs pulls in other required libraries.

If all these packages were required to run the application there'd be a big download and install cost. But, as we know, there's a difference between the tools required to build the package and those required at runtime.

In Guix the final package includes references to anything that is needed for the package to work at runtime. These referenced store items are downloaded into the Store when the package is installed.

We can see the embedded references in a binary package by doing the following:

$ guix graph --type=references htop > htop-references.dot
$ xdot htop-references.dot

See Figure 2, which shows that htop embeds references to ncurses, gcc, glibc and bash-static. What we're seeing is the requirements that the binary has at runtime. As the build tools from the native-inputs aren't required at runtime they aren't referenced.

One mechanism that is used to do this is the runtime search path (RUNPATH), which is part of the ELF file format and lists the Store directories where the binary looks for its libraries. We can see this:

$ guix shell --container --nesting --network coreutils

[env]$ guix package --install htop binutils grep
[env]$ GUIX_PROFILE="/home/steve/.guix-profile" ; . "$GUIX_PROFILE/etc/profile"
[env]$ readelf --dynamic ~/.guix-profile/bin/htop | grep runpath

0x000000000000001d (RUNPATH)            Library runpath: [/gnu/store/1fjm5sqgiwl2rcy9fwn69abaahx6z3sq-ncurses-6.2.20210619/lib:
/gnu/store/ln6hxqjvz6m9gdd9s97pivlqck7hzs99-glibc-2.35/lib:/gnu/store/6ncav55lbk5kqvwwflrzcr41hp5jbq0c-gcc-11.3.0-lib/lib:
/gnu/store/6ncav55lbk5kqvwwflrzcr41hp5jbq0c-gcc-11.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/11.3.0/../../..]

As we can see it contains those references to ncurses, glibc and gcc that we saw from guix graph: guix graph is just a graphical method of showing them. And, what we're seeing - just to reinforce - is that the htop binary embeds references to the other binaries (libraries) that it needs at runtime.
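To make the idea concrete, here's a minimal sketch in plain Guile (no Guix modules needed) that pulls the distinct Store items out of the RUNPATH string from the readelf output. The store-item helper is a hypothetical name of my own, and this only illustrates what is embedded in the binary - it isn't how Guix itself records references.

```scheme
(use-modules (srfi srfi-1))   ; filter-map, delete-duplicates

;; RUNPATH value copied from the readelf output, joined onto one line.
(define runpath
  (string-append
   "/gnu/store/1fjm5sqgiwl2rcy9fwn69abaahx6z3sq-ncurses-6.2.20210619/lib:"
   "/gnu/store/ln6hxqjvz6m9gdd9s97pivlqck7hzs99-glibc-2.35/lib:"
   "/gnu/store/6ncav55lbk5kqvwwflrzcr41hp5jbq0c-gcc-11.3.0-lib/lib:"
   "/gnu/store/6ncav55lbk5kqvwwflrzcr41hp5jbq0c-gcc-11.3.0-lib/lib/gcc/"
   "x86_64-unknown-linux-gnu/11.3.0/../../.."))

;; Hypothetical helper: return the "<hash>-<name>" part of a Store path,
;; or #f when the path isn't under /gnu/store.
(define (store-item path)
  (and (string-prefix? "/gnu/store/" path)
       (car (string-split (substring path (string-length "/gnu/store/"))
                          #\/))))

;; Split the RUNPATH on ":" and keep each Store item once, in order.
(define referenced-items
  (delete-duplicates (filter-map store-item (string-split runpath #\:))))

(for-each (lambda (item) (display item) (newline)) referenced-items)
```

Running it prints three lines, one each for ncurses, glibc and gcc-lib, even though the RUNPATH has four entries: the last two entries live in the same Store item.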

The approach of having these references is different to how other package managers define runtime dependencies. If we look at the Debian package of Htop we can see that the packager defines other packages that are required at runtime (e.g. libc). There are also other differences, like the FHS, which impact how libraries work.

In Guix and Nix the packaging system determines the runtime requirements automatically: the Guix Htop package doesn't define that libc is required, but it's included because Guix found a reference to it in the build output (it scans the output for Store paths once the build finishes). When I discovered this it just blew my mind, as it's so different from other packaging systems! 🤯

The second element to notice is that these references are all in the Store; they are not installed into the user's profile. The goal is to simplify the interrelations between packages. This simplifies packaging and lets users install packages that need different versions of the same dependency at the same time.

To illustrate, imagine that Htop depended on a specific version of Grep, and that we wanted to install another package that required a different version of Grep. In many systems the two packages would conflict because if both were installed there'd be no way to guarantee that both could use their specific version of Grep. Whichever version was on the $PATH first would be the one that was used. While this is an imaginary example, there are plenty of situations where there's this kind of conflicting dependency. Guix removes this problem by using a reference to the tool or library in the Store - and as each package version is on a unique path there's no chance of a conflict.

Alright, hope that's clear - packages embed references to their runtime requirements which are in the Store.

Now we get to the BUT - you knew there was one!

Propagated inputs

Figure 3: Python-scp's references (graph of package references in the python-scp package)

Not all languages have the ability to record and use an ELF runtime path - many of the dynamic languages like Guile, Perl and Python fall into this category. This means that Guix can't automatically know about all the libraries that are required at runtime. To cover these packages there is the propagated-inputs list which is used when packaging a runtime dependency for these languages. If a package requires another library at runtime then it's put into the propagated-inputs list.

A good example of this is python-scp in python-xyz.scm:

(define-public python-scp
  (package
    (name "python-scp")
    (version "0.13.3")
    (source
     (origin
       (method url-fetch)
       (uri (pypi-uri "scp" version))
       (sha256
        (base32 "1m2v09m407p097cy3xy5rxicqfzrqjwf8v5rd4qhfqkk7lllimwb"))))
    (build-system python-build-system)
    (arguments
     '(#:tests? #f))                     ;tests require an SSH server
    (propagated-inputs
     (list python-paramiko))
    (home-page "https://github.com/jbardin/scp.py")
    (synopsis "SCP protocol module for Python and Paramiko")
    (description "The scp module extends the Paramiko library to send and
receive files via the SCP1 protocol, as implemented by the OpenSSH
@command{scp} program.")
    (license license:gpl2+)))

As we can see it doesn't have any inputs but it does have a propagated-inputs list. We can see the impact this has on the package's references like this:

$ guix graph --type=references python-scp > python-scp-references.dot
$ xdot python-scp-references.dot

# if you want to look at it in a browser do this
$ dot -Tsvg python-scp-references.dot > python-scp-references.svg
$ xdg-open python-scp-references.svg

In Figure 3 we see that the package only refers to itself, there are no embedded references. The picture's a bit different (see Figure 4) when we look at the package's full dependency graph:

$ guix graph --type=package python-scp > python-scp-package.dot
$ xdot python-scp-package.dot

# to see it in a browser do
$ dot -Tsvg python-scp-package.dot > python-scp-package.svg
$ xdg-open python-scp-package.svg

What this is telling us is that during the build no additional libraries were used, so nothing appears in the references. To capture the fact that python-paramiko is needed at runtime, the packager added it to propagated-inputs and consequently it's listed in the package's DAG (along with lots of transitive dependencies). Since Python doesn't use the rpath-based runtime search path, Guix sets up a search path in an environment variable - for Python this is GUIX_PYTHONPATH - and in this case adds the location of the paramiko library to it (under a site-packages directory in the profile).

Figure 4: Python-scp's package relations (graph of package relationships in the python-scp package)

It's worth noting that a key difference is that items in the inputs list are referenced from the Store, whereas those specified in propagated-inputs are added to the user's profile so that the appropriate interpreter (e.g. Python) can find them using its own runtime path.

Figure 5: Package dependents (reverse graph of package relationships in the python-scp package)

Other than for dynamic language libraries, avoid using propagated-inputs to specify runtime dependencies. Because they are added to the user's profile they make the associated dependency graph more complicated: if there are lots of libraries installed into a profile it's more likely that there will be a clash - whereas there can't be a clash if a reference is embedded. We'll look at ways to reduce propagated-inputs in a future post.

There's another interesting guix graph command:

$ guix graph --type=reverse-package python-scp > python-scp-reverse-package.dot

The reverse-graph shows the packages that require the python-scp package (Figure 5). In this case, it's just one package!

Modifying Inputs

Now that we understand the various inputs we can move on to using them to create our own package variants. The basic pattern is that we use inheritance and then modify-inputs to delete, prepend, append and replace parts of a package's inputs. We can alter any of the input fields - inputs, native-inputs or propagated-inputs.

To prepend or append an input see this excerpt from the python-pillow-simd (python-xyz.scm) package:

(inputs
    (modify-inputs (package-inputs python-pillow)
        (prepend libraqm libimagequant)))

It's altering the (inputs ...) field with a call to (modify-inputs ...), which is defined in packages.scm. There are two arguments: the first uses the (package-inputs ...) function to retrieve the inputs of the python-pillow package that's been inherited. The second is to (prepend ...) the libraries libraqm and libimagequant. Essentially, the first argument returns a list of the inputs and the second argument prepends the new libraries onto that list - when modify-inputs returns it provides the new list to inputs.
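The append clause works the same way but adds to the end of the list. Here's a minimal sketch of a hypothetical htop variant: the module and package names are made up, and lm-sensors (which I'm assuming lives in (gnu packages linux)) is chosen only to show the syntax - I haven't checked whether htop's build would actually make use of it.

```scheme
;; local-htop-sensors.scm - hypothetical module showing the append clause.
(define-module (local-htop-sensors)
  #:use-module (guix packages)
  #:use-module (gnu packages admin)    ; htop
  #:use-module (gnu packages linux))   ; lm-sensors (assumed location)

(define-public htop-with-sensors       ; hypothetical package
  (package
    (inherit htop)
    (name "htop-with-sensors")
    (inputs
     ;; Start from the parent's inputs and add one more at the end.
     (modify-inputs (package-inputs htop)
       (append lm-sensors)))))
```

Note that prepend and append take package objects, whereas delete and replace identify the existing input by a string.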

We can see an example of the modify-inputs replace capability in the guile2.2-mailutils (mail.scm) package:

(define-public guile2.2-mailutils
  (package
    (inherit mailutils)
    (name "guile2.2-mailutils")
    (inputs
     (modify-inputs (package-inputs mailutils)
       (replace "guile" guile-2.2)))))

The new package guile2.2-mailutils inherits from mailutils. The modify-inputs function is called with its two arguments. The first one is to retrieve the package's inputs. The second is to replace guile with guile-2.2. One thing to notice is that the input being replaced, guile, is specified as a string.
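If you're not sure which string to use, one way to find out is to list the labels of the parent package's inputs. This is a small sketch to evaluate inside guix repl; it assumes mailutils is exported from (gnu packages mail), as the file name above suggests, and the labels variable is just my own name.

```scheme
;; Evaluate inside `guix repl`, which puts the Guix modules on the load path.
(use-modules (guix packages)
             (gnu packages mail))   ; mailutils, per mail.scm

;; Each entry returned by package-inputs starts with its label, a string.
(define labels (map car (package-inputs mailutils)))

(display labels)
(newline)

;; A label that isn't in this list can't be matched by replace or delete.
(display (if (member "guile" labels)
             "guile is present"
             "guile is missing"))
(newline)
```

The label is normally the package's name rather than the Scheme variable it's bound to, so it's worth checking before relying on a delete or replace.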

Slimming Weechat

As a practical example let's look at slimming down Weechat by using the delete capability in modify-inputs. As Weechat has a plugin interface, the package embeds each interpreter. To see the size of Weechat in Guix do:

$ guix size weechat
store item                                                       total    self
/gnu/store/5lqhcv91ijy82p92ac6g5xw48l0lwwz4-gcc-11.3.0             223.6   148.1  25.3%
/gnu/store/4r7k7ipiaqkdf4lmnxwmbz0wx2yzygzc-python-3.10.7          226.3    74.0  12.6%
/gnu/store/lj75fc25zx2y9pqvfp95la84rdhlj4f8-perl-5.36.0            152.2    59.4  10.1%
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9            135.0    53.1   9.1%
/gnu/store/7ri578qarmn1cj2inl243xar6p7j1vxh-ruby-3.1.4             298.1    40.3   6.9%
[... more output ... ]
585.4 MiB

Most plugins are written in Python and Perl so let's remove Guile, Lua, Ruby and Tcl to see how much it reduces the package's size. Create a file called local-weechat.scm and add the following:

 1 (define-module (local-weechat)
 2     #:use-module (guix packages)
 3     #:use-module (guix utils)
 4     #:use-module (gnu packages irc))
 5 
 6 (define-public weechat-local
 7   (package
 8     (inherit weechat)
 9     (version "4.2.1-local")
10     (arguments
11         (substitute-keyword-arguments (package-arguments weechat)
12             ((#:configure-flags flags)
13               `(cons* "-DENABLE_GUILE=OFF" "-DENABLE_LUA=OFF"
14                       "-DENABLE_RUBY=OFF" "-DENABLE_TCL=OFF" ,flags))))
15     (inputs
16         (modify-inputs (package-inputs weechat)
17                         (delete "guile-3.0" "lua-5.1" "ruby" "tcl")))))

Line 1-4: we need (guix packages) and (guix utils) because they contain the package functions we're using. As we're inheriting the weechat package we have to be able to reference it, so we need (gnu packages irc). As usual, because our file is called local-weechat.scm our module is called local-weechat.

Line 6-9: we've seen these before - there's an (inherit weechat) and an altered version string.

Line 15-17: using modify-inputs to change the inputs field. Two arguments are provided: the first is the package's inputs, and the second deletes the various interpreters from that list.

Line 10-14: to remove the interpreters from the package, the software has to be built with these capabilities turned off. We can provide arguments to the build-system, which impact how the build operates. #:configure-flags is a keyword argument that controls how the configure phase runs: it is essentially a list of strings. The parent package already has some configure flags, so we use the substitute-keyword-arguments function to make some changes. This function accepts two arguments: the first is the package's arguments, (package-arguments weechat). The second (line 12-14) has two parts - first we get the existing configure flags list (line 12) - then we add the four strings to the flags list using cons* (line 13-14).

One odd part might be the use of ,flags (line 14) which is an unquote so that the actual list of existing flags is "injected" into the expression so cons* can then add the ones we want.
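If the quasiquote and unquote are unfamiliar, this little sketch in plain Guile (no Guix needed) shows the same trick on its own. The parent-flags value is a made-up stand-in for whatever expression the parent package holds, and new-expr is my own name.

```scheme
;; A stand-in for the parent's flags. Note that it's an *expression*
;; (a quoted call to list), not an already-built list of strings.
(define parent-flags '(list "-DENABLE_PHP=OFF" "-DENABLE_MAN=ON"))

;; The backquote builds a new expression; ,parent-flags drops the old
;; expression into it as the last argument of cons*.
(define new-expr
  `(cons* "-DENABLE_GUILE=OFF" "-DENABLE_LUA=OFF" ,parent-flags))

;; Show the expression itself: still code, not yet a list of flags.
(write new-expr)
(newline)

;; Evaluating it, which is what happens later on the build side,
;; gives one flat list with the new flags in front.
(write (eval new-expr (interaction-environment)))
(newline)
```

The first write shows a cons* call wrapped around the original list call; the second shows the four strings as a single list. That's why the result of the substitution is an expression that starts with cons* rather than a finished list.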

To build it do:

$ guix shell --container --nesting --network --share=/var/log/guix --preserve=^TERM$ coreutils

[env] guix build --no-substitutes --verbosity=3 --load-path=./ weechat@4.2.1-local
[... lots of output ...]
successfully built /gnu/store/qcja3fcrg3669nx9i48b2p3iks3fbb7b-weechat-4.2.1-local.drv
/gnu/store/9v1vvrm3bvpqh9qpx4dni8wd407100gp-weechat-4.2.1-local-doc
/gnu/store/bjn5nc5ls388zcx0lzdlyycanxs5va3j-weechat-4.2.1-local

When it's built we can look at the build derivation, which for me was:

/gnu/store/qcja3fcrg3669nx9i48b2p3iks3fbb7b-weechat-4.2.1-local.drv

In that file find the builder script - this is towards the bottom and has local-builder in its name:

/gnu/store/ib3ka505254a8rv7fmh3aks9hs8nnq9j-weechat-4.2.1-local-builder

This contains the %build-inputs that were sent to the builder; it's a list of lists. Each package is a single item list containing a Scheme pair like this:

("cmake" . "/gnu/store/gl26kr5v6ch5lc3ignly61kb224drijc-cmake-minimal-3.24.2")

This explains why modify-inputs delete (and replace) require a string (e.g. "ruby") - because it's searching through the first field of the Alist looking for the package name.
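Here's a toy version of that lookup in plain Guile, just to show the shape of it. The delete-by-label procedure is a hypothetical helper of mine, not Guix's implementation, and pairing the "ruby" and "perl" labels with the Store paths from the guix size output is an assumption for the sake of the example.

```scheme
(use-modules (srfi srfi-1))   ; remove

;; An association list of (label . store-path) pairs, like the one in
;; the builder. Only the cmake entry is copied from the builder itself.
(define build-inputs
  '(("cmake" . "/gnu/store/gl26kr5v6ch5lc3ignly61kb224drijc-cmake-minimal-3.24.2")
    ("ruby"  . "/gnu/store/7ri578qarmn1cj2inl243xar6p7j1vxh-ruby-3.1.4")
    ("perl"  . "/gnu/store/lj75fc25zx2y9pqvfp95la84rdhlj4f8-perl-5.36.0")))

;; Hypothetical helper: drop every entry whose label (the car of the
;; pair) is one of the given strings.
(define (delete-by-label inputs . labels)
  (remove (lambda (entry) (member (car entry) labels)) inputs))

;; Print the labels that are left after deleting "ruby".
(write (map car (delete-by-label build-inputs "ruby")))
(newline)
```

This prints ("cmake" "perl"). A string that doesn't match any label simply leaves the list unchanged, which is worth remembering if a delete seems to have no effect.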

The configure flags are also in there:

#:configure-flags (cons* "-DENABLE_GUILE=OFF" "-DENABLE_LUA=OFF" "-DENABLE_RUBY=OFF" "-DENABLE_TCL=OFF"
        (list "-DENABLE_PHP=OFF" "-DENABLE_MAN=ON" "-DENABLE_DOC=ON" "-DENABLE_DOC_INCOMPLETE=ON"))

We see it's using cons* to add items to the existing list of configure flags.

Let's look at the package's graph:

# create a graph for the existing package
$ guix graph --type=references weechat@4.2.1 > weechat-package-references.dot

$ guix graph --load-path=./ --type=references weechat@4.2.1-local > weechat-local-references.dot
$ xdot weechat-local-references.dot

The two graphs are lower down. Figure 6 shows the original Weechat package complete with the two outputs (doc and bin), and all the interpreters. Figure 7 shows the local package variant: Guile, Lua and Ruby have been removed, while Tcl is still being pulled in as a dependency of Python.

Repeating the guix size command from earlier:

$ guix size --load-path=./ weechat@4.2.1-local

/gnu/store/4r7k7ipiaqkdf4lmnxwmbz0wx2yzygzc-python-3.10.7          226.3    74.0  21.9%
/gnu/store/lj75fc25zx2y9pqvfp95la84rdhlj4f8-perl-5.36.0            152.2    59.4  17.6%
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35              40.6    38.8  11.5%
total 337.8 MiB

That's a pretty good reduction!

Figure 6: Weechat package references

Figure 7: Slimmed Weechat package references

We can actually go further - as an exercise can you figure out how to replace Python with Python-minimal? Replacing it results in the final package being about half the size of the original one!

Inheriting inputs

An alternative approach, rather than using modify-inputs, is to simply inherit and then put in all the inputs again. This is a good option if the inputs are significantly different from the parent package. The advantage is that the intent is very explicit, and it can be easier to clearly see what's happening with the package.
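As a sketch of what that looks like, here's a hypothetical htop variant that restates its input fields in full instead of editing the parent's lists. The module and package names are invented, the lists simply repeat the ones from the htop definition earlier, and I'm assuming autoconf and automake come from (gnu packages autotools) and python-minimal-wrapper from (gnu packages python).

```scheme
;; local-htop.scm - hypothetical module showing inherit plus explicit inputs.
(define-module (local-htop)
  #:use-module (guix packages)
  #:use-module (gnu packages admin)       ; htop
  #:use-module (gnu packages autotools)   ; autoconf, automake (assumed)
  #:use-module (gnu packages ncurses)     ; ncurses
  #:use-module (gnu packages python))     ; python-minimal-wrapper (assumed)

(define-public htop-explicit              ; hypothetical package
  (package
    (inherit htop)
    (name "htop-explicit")
    ;; Every field given here overrides the parent's value outright,
    ;; so the whole list has to be written out.
    (inputs (list ncurses))
    (native-inputs (list autoconf automake python-minimal-wrapper))))
```

The trade-off is that the variant no longer picks up changes to the parent's inputs automatically, so it needs checking whenever the parent package is updated.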

Final thoughts

We've covered all the main parts of inputs:

  • the different sorts of inputs - inputs and native-inputs
  • how Guix uses references to deal with runtime requirements
  • why propagated-inputs are needed and how they differ from the other inputs
  • altering a package's inputs with modify-inputs

We'll come back to propagated-inputs when we look at wrap-program. Before that, our next task is to look at the last element that goes into the build, the build-system, and to explore how we can change how the build runs.

Did this post cover everything that you wanted to know about Guix package inputs? If you have thoughts, comments or queries feel free to email or leave comments at futurile@mastodon.social


Posted in Tech Friday 29 March 2024
Tagged with tech ubuntu guix