We often want to make changes to the source that's used in a package: for example, to make a package easier to build by altering the versions of libraries it uses, or to modify some of the tests so they can run in Guix's build environment. In this post we'll look at Guix's substitute* procedure and how to use it to alter a package's source by replacing text. This is by far the most common change in package definitions; in fact, substitute* is called over 9,500 times in Guix's package archive.
This is part 10 of the ongoing Guix packaging series - click through for more fun!
There are three ways to alter the source (as defined in the <origin> record) of a package: a snippet, a patch or a custom phase. We introduced snippets and patches in an earlier post on source/origin inputs, and custom phases in the post on modify-phases. You might wonder when to use one over the others. There's no specific policy, but some guidance is:
The choice between a snippet and a custom phase is often personal preference. For a simple change, such as removing a bundled library, I prefer a snippet, as it makes the change before the source is added to the Store. For that same reason snippets are harder to test, so for complicated changes a custom phase seems better.
Having chosen the method, we now need to alter the source. The (guix build utils) module provides two procedures that search through a file and change it: substitute and substitute*.
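The substitute procedure takes a file and a list of pattern/procedure pairs. For each line that matches a pattern, it calls the corresponding procedure with the line and its matches, and writes out whatever line the procedure returns.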
The substitute* form (strictly speaking a macro) searches through a file, or a list of files, for text matching a regular expression and replaces it with the text we specify. The difference is that we don't have to write a procedure, so it's much easier to use. Here's the basic syntax, followed by a minimal call:
```scheme
(substitute* file
  ((regexp match-var …) body …)
  …)

(substitute* "somefile.txt"
  (("text-to-search") "replacement-text"))
```
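The first argument, "somefile.txt", is the file to search through. Each clause that follows pairs a regular expression ("text-to-search"), plus optional match variables, with a body; the value of the body ("replacement-text") replaces the matched text. Note that it always returns #t, even if nothing matches (at least in the Guix version used for this post; newer versions may raise an error when a pattern fails to match).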
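To see the syntax in action, here's a minimal sketch you could paste into guix repl, assuming Guix's (guix build utils) module is on the load path. It writes a small hypothetical file (substitute-demo.txt, made up for this example), applies a single clause, and reads the file back to show that only the matched text changes.

```scheme
;; Assumes Guix's modules are available, e.g. inside `guix repl`.
(use-modules (guix build utils)       ; substitute*
             (ice-9 textual-ports))   ; put-string, get-string-all

;; Hypothetical test file, created only for this sketch.
(define demo-file "substitute-demo.txt")

(call-with-output-file demo-file
  (lambda (port)
    (put-string port "checking for /usr/bin/file... yes\n")))

;; One clause: the regexp "/usr/bin/file" and a body whose value,
;; the string "file", replaces the matched text.
(substitute* demo-file
  (("/usr/bin/file") "file"))

;; Only the matched text changed; the rest of the line is kept.
(display (call-with-input-file demo-file get-string-all))
;; prints: checking for file... yes
```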
Some simple examples:
```scheme
;; replace -O with -O2 in Makefile
(substitute* "Makefile"
  (("-O") "-O2 "))

;; use the correct 'file' command
(substitute* "configure"
  (("/usr/bin/file") (which "file")))
```
The first example looks in a Makefile and, wherever it finds the string "-O", replaces it with "-O2 ". The second is something you'll often see in build scripts (configure in this case), where a hard-coded path to a tool has to be replaced with the one available in the Guix build environment; which returns the full path of the file command found on the PATH.
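Because substitute* replaces every match on a line, a short pattern like "-O" can match more than intended. This plain-Guile sketch uses regexp-substitute/global from (ice-9 regex) as a stand-in for what substitute* does to a single line; the Makefile line is hypothetical.

```scheme
(use-modules (ice-9 regex))   ; regexp-substitute/global

;; A hypothetical Makefile line, with its trailing newline.
(define line "CFLAGS = -O3 -Wall\n")

;; The loose pattern "-O" also matches the start of "-O3"...
(define loose (regexp-substitute/global #f "-O" line 'pre "-O2 " 'post))
;; loose => "CFLAGS = -O2 3 -Wall\n"

;; ...while also matching an optional level digit or 's' replaces the whole flag.
(define strict (regexp-substitute/global #f "-O[0-9s]?" line 'pre "-O2" 'post))
;; strict => "CFLAGS = -O2 -Wall\n"
```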
Note that the matching uses Guile's regular expressions (POSIX extended syntax); if these aren't familiar, see the A side of Guile posts on Guile Regular Expressions.
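One regex rule worth keeping in mind: "." matches any character, so a version like "2.72" also matches "2072". This sketch assumes, based on Guix's implementation, that substitute* compiles its string patterns with regexp/extended, and checks a loose and an escaped pattern against two hypothetical lines.

```scheme
(use-modules (ice-9 regex))

;; Assumption: substitute* compiles string patterns like this.
(define loose  (make-regexp "2.72" regexp/extended))
(define strict (make-regexp "2\\.72" regexp/extended))

(regexp-exec loose "AC_PREREQ([2.72])")   ; a match
(regexp-exec loose "build 2072")          ; also a match: "." is any character
(regexp-exec strict "build 2072")         ; #f: "\\." only matches a literal dot
(regexp-exec strict "AC_PREREQ([2.72])")  ; a match
```

Note the double backslash: the Scheme string "2\\.72" contains a single backslash, which is what the regex engine sees.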
As you might expect, it's common to see several different matches in a single call:
```scheme
;; python-awkward-cpp from python-xyz.scm
(substitute* "pyproject.toml"
  (("scikit-build-core..0.10") "scikit-build-core")
  (("^minimum-version =.*") ""))
```
Here we're looking for two matches. The first finds a specific scikit-build-core requirement (the two dots match any two characters, such as ">=") and removes the version constraint. The second finds the line starting with minimum-version and removes it by replacing the whole match with an empty string. Both use slightly more complex regular expressions.
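To check what those two patterns actually match, here's a plain-Guile sketch with string-match on two hypothetical pyproject.toml lines. It assumes, based on Guix's implementation, that substitute* hands each line to the regexp together with its trailing newline; that matters for the second pattern.

```scheme
(use-modules (ice-9 regex))   ; string-match, match:substring

;; Hypothetical pyproject.toml lines, newline included.
(define requires-line "requires = [\"scikit-build-core>=0.10\", \"pybind11\"]\n")
(define minimum-line "minimum-version = \"0.10\"\n")

;; The two dots stand for any two characters, here ">=".
(define m1 (string-match "scikit-build-core..0.10" requires-line))
(match:substring m1)   ; => "scikit-build-core>=0.10"

;; Without the regexp/newline flag "." also matches a newline, so ".*"
;; swallows the trailing "\n" and the empty replacement drops the whole line.
(define m2 (string-match "^minimum-version =.*" minimum-line))
(match:substring m2)   ; => "minimum-version = \"0.10\"\n"
```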
substitute* can also capture regular expression subexpressions (the parts of the pattern in parentheses) and bind each one to a variable. This means that within a single match we can pick out different parts and do something with them. A good example is matching a whole version string: subexpressions let us split it up, drop the part we don't want and substitute what we do want. The format is:
```scheme
(substitute* "file.txt"
  (("(regexp-sub-express1) ... (regexp-sub-express2)" all subexpress1 subexpress2)
   "replace-txt"))
```
The whole match is bound to the first variable (called all here), and each subexpression's match is bound to its own variable (e.g. subexpress1). Here's an example:
```scheme
; guix shell guile guile-readline guile-colorized -- guix repl
=> (use-modules (guix) (guix build utils))
=> (use-modules (ice-9 regex))
=> (substitute* "t/spamc_x_e.t"
     (("/bin/echo" all)
      (format #t "Substitute match: ~s\n" all)
      "LALALALA"))
Substitute match: "/bin/echo"
Substitute match: "/bin/echo"
$7 = #t
```
In this example the file being checked is "t/spamc_x_e.t". The regexp being looked for is "/bin/echo", and it's replaced with "LALALALA". Each match is bound to the variable all, and the body is evaluated for every match: here it first calls format to print what was matched, then the value of its last expression, "LALALALA", becomes the replacement text.
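Here's a sketch of the version-splitting idea with several subexpressions, assuming Guix's (guix build utils) is available (e.g. in guix repl). The file name and the libfoo requirement are hypothetical. It also uses _, which substitute* accepts in place of a variable name when you don't need that subexpression bound.

```scheme
(use-modules (guix build utils)       ; substitute*
             (ice-9 textual-ports))   ; put-string, get-string-all

;; Hypothetical file and contents, created only for this sketch.
(define demo-file "version-demo.txt")
(call-with-output-file demo-file
  (lambda (port) (put-string port "libfoo >= 1.2.3\n")))

(substitute* demo-file
  ;; all   => "libfoo >= 1.2.3"  (the whole match)
  ;; name  => "libfoo"           (first subexpression)
  ;; major => "1"                (second subexpression)
  ;; _     => third subexpression, matched but not bound
  (("(libfoo) >= ([0-9]+)\\.([0-9]+\\.[0-9]+)" all name major _)
   ;; keep the name and major version, drop the rest of the constraint
   (string-append name " >= " major)))

(call-with-input-file demo-file get-string-all)
;; => "libfoo >= 1\n"
```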
The fact that substitute* alters a file makes it a bit annoying when you're exploring a regex. I always find I make mistakes when using a regex, so I split my exploration into two phases in the REPL. In the first step I use the (ice-9 regex) functions to tune my regex. For the second step I use substitute* with a test file, before transferring it into the package definition.
Here's an example:
```scheme
; guix shell guile guile-readline guile-colorized -- guix repl
=> (use-modules (guix) (guix build utils))
;; for the initial experimentation
=> (use-modules (ice-9 regex))
;; create the string that I want to alter
=> (define base-str "humansize = 2.1.3")
;; match the name of the library in a subexpression,
;; then .{3} is any 3 characters (here " = "),
;; then the version of the library is the next subexpression
=> (string-match "(humansize).{3}(2.1.3)" base-str)
#("humansize = 2.1.3" (0 . 17) (0 . 9) (12 . 17))
=> (match:count (string-match "(humansize).{3}(2.1.3)" base-str))
3
;; 0 is the whole match, 1 is the library name, 2 is the version;
;; retrieve each one by its index
=> (match:substring (string-match "(humansize).{3}(2.1.3)" base-str) 2)
"2.1.3"
;; create a test.txt file containing:
;;   [dependencies]
;;   humansize = 2.1.3
;; then test using substitute*; the newline after the match is kept,
;; so the replacement doesn't need its own "\n"
=> (substitute* "test.txt"
     (("(humansize).{3}(2.1.3)" all name ver)
      (string-append name " = 3.0.0")))
```
When writing the replacement, note that it entirely replaces whatever the regexp matched. The base match (subexpression 0) is the whole string, so if you only want to alter the version you still have to provide the library name. For example, above we can't simply return "3.0.0", as the substituted line would then contain only the version and no library name. If the regexp matched only the version, returning the new version would be enough, but that increases the risk of matching the wrong text because the pattern is less precise.
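To make that concrete, this plain-Guile sketch uses regexp-substitute from (ice-9 regex) to mimic what substitute* does with one match: keep the text before ('pre) and after ('post) the match and swap in a replacement for the whole match. It's an emulation for exploration, not how Guix itself is implemented.

```scheme
(use-modules (ice-9 regex))   ; string-match, regexp-substitute

;; The test.txt line, with its trailing newline.
(define line "humansize = 2.1.3\n")
(define m (string-match "(humansize).{3}(2.1.3)" line))

;; Returning only a version loses the library name:
(define version-only (regexp-substitute #f m 'pre "3.0.0" 'post))
;; => "3.0.0\n"

;; Rebuilding from subexpression 1 (the name) keeps the line intact:
(define rebuilt (regexp-substitute #f m 'pre 1 " = 3.0.0" 'post))
;; => "humansize = 3.0.0\n"
```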
To get a sense of how substitute* is used, let's look at a few examples from Guix's package archive, starting with some where it's used in a snippet.
The gifsicle package in gnu/packages/images.scm has this snippet:
```scheme
(modules '((guix build utils)))
(snippet
 '(begin
    (substitute* "configure.ac"
      (("2.72") "2.69"))))
```
The origin record adds (guix build utils) to the modules that are active when the snippet runs, which makes substitute* available. The snippet uses the older quoted-list style. The begin isn't really required: it's used to group several procedure calls, and here there's only one. The substitute* call looks at configure.ac, searches for 2.72 and replaces it with 2.69. Simple enough, though it would be ideal if there were a comment telling us what it's looking for.
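For comparison, here's a sketch of how the same origin fields could look in the newer G-expression style, without the begin and with a comment. It assumes the package module already imports (guix gexp), as Guix package modules generally do; the wording of the comment is hypothetical, since the original doesn't say why the change is made.

```scheme
;; Sketch only: the same snippet written as a G-expression.
(modules '((guix build utils)))
(snippet
 #~(substitute* "configure.ac"
     ;; Hypothetical comment: explain why the older version is needed.
     ;; "\\." matches a literal dot rather than any character.
     (("2\\.72") "2.69")))
```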
The goaccess package in gnu/packages/web.scm does something a bit more involved:
```scheme
(modules '((guix build utils)))
(snippet
 '(substitute* '("src/error.h" "src/parser.c")
    (("__DATE__") "\"1970-01-01\"")
    (("__TIME__") "\"00:00:00\"")))
```
Here substitute* receives a quoted list of files to examine ("src/error.h" and "src/parser.c"). There are two patterns to check for, and they're replaced with a fixed date and time. Notice the backslash escaping of the double quotes inside the replacement strings, so that the C code ends up with string literals; the A side of Guile posts about regular expressions cover escaping extensively. This was presumably done because the build embedded the current date and time, which destroys reproducibility; with fixed values in the package, the hashes won't change between builds.
Finally, the python-awesomeversion package in gnu/packages/python-xyz.scm:
```scheme
(modules '((guix build utils)))
(snippet
 #~(substitute* "pyproject.toml"
     (("version = \"0\"")
      (format #f "version = \"~a\"" #$version))))
```
The modules line uses the old quoted-list style, while the snippet uses the newer G-expression format. G-expressions are generally preferred, and the style formatter (guix style) will warn about it. The #~ replaces the quote and signals that what follows is code to be run on the build side. The substitute* call examines pyproject.toml and looks for the version line (with embedded quotes). The format procedure then builds the replacement line using the package's version, which #$ (ungexp) inserts into the G-expression.
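The replacement text is just an ordinary string built by format. This plain-Guile sketch shows what it evaluates to; the version "24.6.0" is a hypothetical stand-in for the value #$version splices in on the build side.

```scheme
;; Hypothetical version standing in for the package's version field.
(define version "24.6.0")

;; ~a inserts the string as-is, and the escaped quotes become literal
;; double quotes in the TOML file.
(define replacement (format #f "version = \"~a\"" version))
(display replacement)
;; prints: version = "24.6.0"
```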
Custom phases are often more complex, and in many cases substitute* is one of a set of changes. Here are a few examples that are a bit simpler:
The r-adacgh2 package in gnu/packages/bioconductor.scm is an R package. It has a custom phase like this:
```scheme
(arguments
 (list
  #:phases
  '(modify-phases %standard-phases
     (add-after 'unpack 'python3-compatibility
       (lambda _
         (substitute* "inst/imagemap-example/toMap.py"
           (("print nameMap") "print(nameMap)")))))))
```
It uses modify-phases to add a phase called 'python3-compatibility, a name that gives us a good idea of the goal. The substitute* call looks at a single file and replaces a Python 2 print statement with a print function call that works with Python 3. Not that complex, but it demonstrates a different kind of alteration to a package's source. It's also a good example of a change that could be made in a snippet, where the developer chose a phase instead.
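As a sketch of that alternative, the same change could live in the package's origin as a snippet. This is not how r-adacgh2 is actually written; the elided origin fields stay as they are in the real package, and the G-expression form assumes (guix gexp) is imported.

```scheme
;; Sketch only: the python3-compatibility change as an origin snippet.
(origin
  ;; ... method, uri and sha256 as in the existing package ...
  (modules '((guix build utils)))
  (snippet
   #~(substitute* "inst/imagemap-example/toMap.py"
       ;; Python 3 needs print as a function call.
       (("print nameMap") "print(nameMap)"))))
```

The trade-off is the one described earlier: the snippet changes the source before it reaches the Store, while the phase is easier to test and iterate on.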
The ruby-minitest-bonus-assertions package in gnu/packages/ruby-check.scm custom phase is like this:
```scheme
#:phases
#~(modify-phases %standard-phases
    (add-before 'check 'clean-dependencies
      (lambda _
        ;; Remove unneeded require statement that would entail another
        ;; dependency.
        (substitute* "test/minitest_config.rb"
          (("require 'minitest/bisect'") "")))))
```
This uses a G-expression to define the added phase, called 'clean-dependencies, and there's a good comment explaining why it's required. The substitute* call changes test/minitest_config.rb: it looks for a specific require statement and replaces it with an empty string, leaving a blank line.
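The blank line is left because only the statement itself is matched, not the newline after it. This plain-Guile sketch emulates the substitution on that one line with regexp-substitute, assuming (as in Guix's implementation) that each line is processed together with its trailing newline. A blank line is harmless in a Ruby file, so either form works.

```scheme
(use-modules (ice-9 regex))   ; string-match, regexp-substitute

(define line "require 'minitest/bisect'\n")

;; Matching only the statement leaves its newline behind: a blank line.
(define blank
  (regexp-substitute #f (string-match "require 'minitest/bisect'" line)
                     'pre "" 'post))
;; => "\n"

;; Including "\n" in the pattern removes the line completely.
(define removed
  (regexp-substitute #f (string-match "require 'minitest/bisect'\n" line)
                     'pre "" 'post))
;; => ""
```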
Finally, the kigo package in gnu/packages/kde-games.scm makes an interesting change:
```scheme
#:phases
#~(modify-phases %standard-phases
    (add-after 'unpack 'patch-gnugo-command
      (lambda* (#:key inputs #:allow-other-keys)
        (substitute* "src/kigo.kcfg"
          (("\"gnugo\"")
           (format #f "~s" (search-input-file inputs "bin/gnugo")))))))
```
This is the newer style, using a G-expression (#~) to add a phase called 'patch-gnugo-command. The lambda* takes the package's inputs as a keyword argument and allows any other keywords to be passed in. The inputs are used later in the substitute* call; this is the first example that uses the phase's parameters.
The substitute* call looks in the source's configuration file (src/kigo.kcfg) and searches for the string "gnugo", including its surrounding double quotes. Again format creates the replacement string: search-input-file looks through the package's inputs for the file "bin/gnugo", and the ~s directive writes the resulting path back as a quoted string. The effect is to embed the correct Store path (i.e. "/gnu/store/xxxx/bin/gnugo"). Needing to embed explicit Store paths is pretty common, and we'll look at this in more detail in a future post.
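The ~s directive is what keeps the quotes balanced: the pattern "\"gnugo\"" consumes the double quotes, and ~s puts them back around the new path. A plain-Guile sketch, with a hypothetical store path standing in for the result of search-input-file:

```scheme
;; Hypothetical path standing in for (search-input-file inputs "bin/gnugo").
(define gnugo "/gnu/store/xxxx-gnugo/bin/gnugo")

;; ~s writes the string the way `write' does, quotes included.
(define replacement (format #f "~s" gnugo))
(display replacement)
;; prints: "/gnu/store/xxxx-gnugo/bin/gnugo"
```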
Alright, that's the end of our tour of substitute*, which has to be the most used procedure for Guix packages.
For those writing plain Guile Scheme, the (ice-9 regex) module provides regexp-substitute and regexp-substitute/global, which perform similar functions. Guix's version is tailored to packaging: it works directly on files and makes it easy to name subexpressions.
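For comparison, here's a plain-Guile sketch of the humansize change using regexp-substitute/global on a string. Subexpressions are referred to by number (1 here) instead of by name, and reading and writing the file would be up to you; the text is hypothetical.

```scheme
(use-modules (ice-9 regex))   ; regexp-substitute/global

;; Hypothetical file contents held in a string.
(define text "[dependencies]\nhumansize = 2.1.3\n")

;; 'pre and 'post keep the surrounding text; 1 inserts subexpression 1.
(define updated
  (regexp-substitute/global #f "(humansize) = [0-9.]+" text
                            'pre 1 " = 3.0.0" 'post))
;; => "[dependencies]\nhumansize = 3.0.0\n"
```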
It's really useful to see how different people have used substitute* when creating packages. I didn't include complex examples from custom phases; for further exploration, the graphical environment packages (e.g. GNOME and KDE) have lots of interesting uses, as do some of the programming language packages.
As usual, if you'd like to ask a question or comment you can either e-mail me (details on the About page), or reach me on Mastodon, futurile@mastodon.social.