Specter: a Clojure library for nested data manipulation

Specter is a Clojure library for navigating, querying and changing nested Clojure data. Nested data structures are common in Clojure, but there aren't that many functions in the core library to access them. Specter solves this by providing ways to navigate through a data structure and to transform the data at the point of navigation.

I found Specter quite hard to understand initially and then really rewarding when I'd understood it: it really is a swiss army knife for dealing with nested data.

If you prefer a video introduction then Understanding Specter: Clojure's missing piece by Misophistful is a good place to start. In this post I'm going to give an overview of Specter and then work through an example with common use-cases when handling data - my aim is that this is a good accessible introduction to the library.

Installing Specter

Add it to the dependencies section of your project.clj and do a lein deps:

[com.rpl/specter "1.1.3"]

Then require it in the ns form of your clj file:

  (:require [com.rpl.specter :as sp])

Introduction to Specter

Specter introduces the concept of Navigation which allows us to move through multiple different layers of a data structure. When we're at the point we want to be at, we can then perform operations. Common operations are to query (with select or equivalent), and to alter the data returning a new version (using transform and others).

 user=> (def data_v [1 5 7])
 user=> (sp/select sp/FIRST data_v) ;;=> [1]

In this example there are two elements - a navigation and an operation. In the first line we define a simple vector. In the second line the navigation uses the navigator called FIRST, which finds the first element within a collection. The operation is the select function - this performs a query by going through the navigation steps and then returning the data it finds as a vector. In this case it's navigated to the first element which is 1.

If you keep the two types of functions in mind then Specter is easier to understand: navigators and operations. Navigators such as FIRST or nthpath take us to a location in the data structure. By convention, navigators that don't take any parameters are written in upper case, and the ones that take parameters are lower case.

The second type of function handles operations (like select or transform) on the data. The select variants are for querying and the transform ones are for changing data. There are a few variants of each one; the main difference between them is how they return their data.

Let's look at how to change data with transform:

  ;; returning the same type of data structure
 user=> (sp/transform sp/ALL inc [1 5 7]) ;;=> [2 6 8]
 user=> (sp/transform sp/ALL inc '(1 5 7)) ;;=> (2 6 8)
 user=> (sp/transform sp/ALL inc #{1 5 7}) ;;=> #{6 2 8}

The transform function receives a navigator (ALL), an operation (inc) and the data structure to operate on. We're using the ALL navigator which provides each element of the sequence (in turn) to the operation: so each element (1, then 5, then 7) is given to inc. See the section on transform on the Specter Wiki for some more examples.

Specter tries hard to return the same data type that you provide when changing a data structure - this is very useful because you don't have to add additional transformations. In the first example above we use a vector ([1 5 7]); notice that Specter returns a vector. The second and third examples show the same operation on a list and a set, and again Specter returns the same collection type we provided. Compare this to plain Clojure, where map inc <data structure> returns a lazy sequence whatever collection type you pass in.
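To make the comparison with plain Clojure concrete, here is a small sketch that uses only clojure.core (no Specter). The name mapped is just for illustration. It shows that map hands back a sequence, so getting the original collection type back is a manual step:

```clojure
;; Plain clojure.core only; no Specter required.
(def mapped (map inc [1 5 7]))

(vector? mapped) ;;=> false
(seq? mapped)    ;;=> true

;; Rebuilding the original collection type by hand:
(mapv inc [1 5 7])              ;; a vector again: [2 6 8]
(into #{} (map inc) #{1 5 7})   ;; a set again (printed order may vary)
(apply list (map inc '(1 5 7))) ;; a list again: (2 6 8)
```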

Simple nested data

Now we can move on to nested data structures. Let's first define a nested structure and look at how you'd access it with plain Clojure:

user=> (def nest_m {:cat1 {:name "Molly" :age 8} :cat2 {:name "Jack" :age 6}})
user=> (get-in nest_m [:cat1 :age]) ;;=> 8

It's easy enough to access a single specified value in standard Clojure by using get-in. We can think of the vector [:cat1 :age] as being like a navigator in Specter. However, get-in only follows one fixed path of keys (map keys or vector indexes); beyond that it's limited, so people commonly write their own utility functions.

A good example of this: I find it difficult to collect values across a nested structure in plain Clojure, and this is what led me to investigate Specter in the first place. For example, let's say I want to collect all the ages of my pets - it's really simple using Specter:

user=> (sp/select [sp/MAP-VALS :age] nest_m) ;;=> [8 6]

Here we tell Specter to navigate in two steps; notice that we put the navigation steps into a vector [sp/MAP-VALS :age]. First, we tell it to get all the MAP-VALS, which will be {:name "Molly" :age 8} {:name "Jack" :age 6}. For the second step we tell it to access the :age. Navigation is then complete, and as this is a select it returns both values in a vector. See the Specter Wiki for more examples of select queries.

Specter has navigators for maps and for sequences, here's an example using a vector of vectors:

user=> (def data_v [[:turtle 4] [:cat 5] [:bear 10]])
user=> (sp/select [sp/ALL sp/FIRST] data_v)
[:turtle :cat :bear]

In this case we have two steps in the navigation, [sp/ALL sp/FIRST]. The first step says access every element, so [:turtle 4] then [:cat 5] then [:bear 10]. The second step of the navigation says access the first element, which is the keyword (name). As this is a select it returns the keywords (names) in a vector.

We can also change nested data structures using transform.

; example 1: using the nested map data structure we built earlier
(sp/transform [sp/MAP-VALS :age] inc nest_m) ;;=> {:cat1 {:name "Molly" :age 9} :cat2 {:name "Jack" :age 7}}

; example 2: using the nested vector data structure we created earlier
(sp/transform [sp/ALL sp/LAST odd?] inc data_v) ;;=> [[:turtle 4] [:cat 6] [:bear 10]]

; example 3: using a nested list data structure
(def data_l '((:cod 4) (:haddock 9) (:tuna 2)))
(sp/transform [sp/ALL sp/LAST odd?] inc data_l) ;;=> ((:cod 4) (:haddock 10) (:tuna 2))

; example 4: using a mixed data structure - a vector of maps
(def data_m [{:a 1 :b 2} {:c 3} {:d 4}])
(sp/transform [sp/ALL sp/MAP-VALS even?] inc data_m) ;;=> [{:a 1 :b 3} {:c 3} {:d 5}]

These four examples show how to change nested structures. In the first we have a map of maps - so we use MAP-VALS to go through each nested map in turn ({:name "Molly" :age 8} {:name "Jack" :age 6}), and then for each one we access the :age and apply inc to increment it. Notice that transform returns the whole data structure put back together.

In the second example we have a vector of vectors. The only difference from earlier is that we use LAST which gives us the last item in the nested vector, which is the age, we then check if it's odd or not.

In the third example we do the same thing but use a list of lists. Notice that Specter returns the same data type.

The fourth example shows mixing different collection types - we have a vector of maps. The first step is to access each element of the vector with ALL, we then want the map values (using MAP-VALS) and finally we can check if they are even or not. For each one that is even we increment it with inc.

Understanding navigation

We've seen a few of the unparametrized navigators (ALL, MAP-VALS, LAST) at this point. We'll look at the parametrized ones later. The documentation (specifically the GitHub Wiki) has examples for each one, which is really helpful.

I found it quite difficult to understand when to use one navigator over another - the main thing to know is whether you're at a sequence/collection or at an element. The navigators are split between those that deal with an entire collection (e.g. MAP-VALS) and those that deal with an element in a collection (e.g. LAST).

When investigating in the REPL another tip is that you can use select or select-one to progressively explore navigation.
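As a rough sketch of that progressive exploration, the session below grows a path one step at a time and looks at what select returns after each step. It assumes Specter is on the classpath and reuses the nest_m map from earlier:

```clojure
(require '[com.rpl.specter :as sp])

(def nest_m {:cat1 {:name "Molly" :age 8} :cat2 {:name "Jack" :age 6}})

;; Step 1: where does MAP-VALS leave us? At each inner map.
(sp/select [sp/MAP-VALS] nest_m)
;;=> [{:name "Molly", :age 8} {:name "Jack", :age 6}]

;; Step 2: add one more navigation step and look again.
(sp/select [sp/MAP-VALS :age] nest_m)
;;=> [8 6]

;; select-one returns the single value itself rather than a vector.
(sp/select-one [:cat1 :age] nest_m)
;;=> 8
```

Once the select shows the values you expect, the same path can be handed to transform or setval.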

If you're manipulating a section of a data structure repeatedly or want to use the same navigation later then you can create your own specific path:

user=> (def CAT-AGE (sp/nthpath 1 1))
user=> (sp/select-one CAT-AGE data_v)

In the first line we create an unparametrized navigator called CAT-AGE. Really we're just defining a normal var - capitalized to keep to Specter's conventions - that holds the result of calling (sp/nthpath 1 1). The effect is that when we apply it to data_v it gives us the age of the cat.

Before you continue it's worth watching Understanding Specter by Misophistful and Nathan Marz's video introducing Specter to get a good overview.

Fruit delivery example

The rest of the post uses a worked example of common use-cases when querying and transforming a nested data structure.

For various reasons you've started a fruit delivery service, where each week you deliver the finest fruit and vegetables to your local community. Like any good programmer you've been keeping a record as a Clojure data structure (rather than in a note pad which would be faster, but would make for a very short example!). Here's the data structure:

(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

These are the operations we want to carry out:

  1. Query a value: how many pears do we order?
  2. Define a path: set the path to winter clementines
  3. Query all values: how many apples do we order in the summer?
  4. Change a single element: how do we change the date of the order?
  5. Add something at a specific location in the structure: add bananas to each of the summer orders
  6. Remove a specific element: remove the pears from the summer order
  7. Add something if a condition is met: add bananas if we also have pears
  8. Add value from one part of the structure to another: add as many bananas in summer as we have clementines in winter
  9. Add a collection: add a new weekly fruit order
  10. Remove a collection: remove the last week of the winter order
  11. Replace a collection of values: change the second week of winter's order
  12. Remove something from every element: remove an apple from every order that has them
  13. Append a new element to each one: add bananas to every order

Query a value

How many pears did we order?

We've seen that we can use the unparametrized navigators (like FIRST) to access parts of the data structure. There are also parametrized navigators; the convention in Specter is that these are lower case. Here we use nthpath with index 2 to find the third element (indexes start at 0).

user=> (sp/select-one [:summer (sp/nthpath 2) (sp/must :pears)] fruit_order) ;;=> 3

The navigator first goes to the :summer section of the map, then accesses the vector and uses nthpath to go to the third element, which is a map. We then use must to navigate to the :pears key only if it exists, and return its value. Notice that our query uses select-one as we only want a single value.

Define a path

Create a winter clementines path

As we navigate through complex structures we'll often use the same path multiple times across queries and transformations. Rather than repeating it we can create a path var.

user=> (def WINTER-CLEMENTINES (sp/path :winter (sp/nthpath 2) (sp/must :clementine)))

user=> (sp/select-one WINTER-CLEMENTINES fruit_order)

The first line defines a path that starts at :winter, then takes the third map (index 2) from {:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}, and navigates to the :clementine key if it exists. The second line passes the defined path to select-one.
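The point of naming a path is that the same var works for both queries and changes. A minimal sketch, assuming Specter is on the classpath and using the fruit_order data from above (the name updated is only for illustration):

```clojure
(require '[com.rpl.specter :as sp])

(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

(def WINTER-CLEMENTINES (sp/path :winter (sp/nthpath 2) (sp/must :clementine)))

;; Query with the path...
(sp/select-one WINTER-CLEMENTINES fruit_order) ;;=> 2

;; ...and reuse exactly the same path to change the value.
(def updated (sp/transform WINTER-CLEMENTINES inc fruit_order))

(get-in updated [:winter 2]) ;;=> {:apples 3, :clementine 3}
```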

Query all values

How many apples did we order in the summer?

After accessing a single element, we might want to query and summarise all elements within a data structure. We saw this concept earlier in the introduction to Specter.

user=> (sp/select [:summer sp/ALL :apples] fruit_order)
[3 4 5]

Again we navigate to the summer section of the map in the first step. In the second step we navigate to all the elements of the vector. For each map we access the :apples key and return the value. In this case we're using select so we return a vector. If we want to summarise we use reduce.
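Because select returns an ordinary vector, summarising is just a matter of passing the result to a core function. A short sketch of the reduce step, assuming Specter is on the classpath (summer-apples is a name introduced here for illustration):

```clojure
(require '[com.rpl.specter :as sp])

(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

;; select gives back a plain vector of the apple counts: [3 4 5]
(def summer-apples (sp/select [:summer sp/ALL :apples] fruit_order))

;; Total apples ordered in the summer.
(reduce + summer-apples) ;;=> 12
```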

Change a single element

How do we change the date of the order?

We can use either transform or setval to make the change. With setval we navigate to a location and replace the specific element which is easier for a simple change.

user=> (sp/setval :order-date 20210303 fruit_order)
{:order-date 20210303,
 :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3}],
 :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}
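
For comparison, here is a sketch of the same change made three ways: with setval, with transform (which needs a function, so constantly supplies the fixed value), and with plain assoc, which is enough here because the key is at the top level. The by-... names are only for illustration, and Specter is assumed to be on the classpath:

```clojure
(require '[com.rpl.specter :as sp])

(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

(def by-setval    (sp/setval :order-date 20210303 fruit_order))
(def by-transform (sp/transform :order-date (constantly 20210303) fruit_order))
(def by-assoc     (assoc fruit_order :order-date 20210303))

;; All three produce the same map.
(= by-setval by-transform by-assoc) ;;=> true
```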

Add something at a specific location

Add bananas to the summer orders

In the previous example we changed a single element of the data structure. In this case we want to change a section of the data structure.

;; option 1: use setval
user=> (sp/setval [:summer sp/ALL :bananas] 5 fruit_order)
{:order-date 20210214, :summer [{:apples 3, :bananas 5} {:apples 4, :bananas 5} {:apples 5, :pears 3, :bananas 5}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

;; option 2: use a transform with an anonymous function
user=> (sp/transform [:summer sp/ALL] #(assoc % :bananas 3) fruit_order)
{:order-date 20210214, :summer [{:apples 3, :bananas 3} {:apples 4, :bananas 3} {:apples 5, :pears 3, :bananas 3}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

In the first case we use setval, which can add a value at a particular location. We navigate through the :summer section, then for each of the maps we navigate to a :bananas key and set it to 5.

The second option is to use an anonymous function with transform: with transform we can either provide an existing function (e.g. we used inc earlier) or we send our own anonymous function. In the second example, we've provided a function that uses the parameter (which will be each of the maps in turn) and then adds the bananas.

Remove a specific element of a collection

Remove the pears from the summer order

Many of the navigators let us use NONE as a value so we can remove a specific element. We just need to directly navigate to the element and then provide NONE to remove it.

; example of using NONE with a vector navigation
user=> (sp/setval sp/FIRST sp/NONE [1 2 3 4 5])  ;;=> [2 3 4 5]

user=> (sp/setval [:summer sp/ALL :pears] sp/NONE fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

We navigate to the :summer section, then for each of the maps we look for the :pears key. Where we find it we set it to NONE, which removes that element.
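To see what that setval is doing for us, here is a plain clojure.core sketch of the same removal, with no Specter involved. The cut-down order map and the name without-pears are assumptions for illustration only:

```clojure
;; Plain clojure.core only; a cut-down version of the order data.
(def order {:summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]})

;; Walk into :summer by hand, then dissoc :pears from every week,
;; using mapv so the weeks stay in a vector.
(def without-pears
  (update order :summer (fn [weeks] (mapv #(dissoc % :pears) weeks))))

without-pears ;;=> {:summer [{:apples 3} {:apples 4} {:apples 5}]}
```

Each extra level of nesting adds another layer of update and mapv, which is the bookkeeping the Specter path takes care of.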

Add something if a condition is met

Add bananas if we also have pears

We can also use logic in the navigation to make choices about what to do.

; Attempt 1: problem is it navigates to the element
user=> (sp/select [:summer sp/ALL :pears] fruit_order)
[nil nil 3]

; Option 1: do a pred to check if something submatches
user=> (sp/transform [:summer sp/ALL (sp/pred :pears)] #(assoc % :bananas 10) fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3, :bananas 10}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

; Option 2: run our own function and if it returns true we use it
user=> (sp/transform [:summer sp/ALL (sp/selected? #(% :pears))] #(assoc % :bananas 10) fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3, :bananas 10}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

The main issue is that we want to deal with the whole collection, not with the specific values. In Attempt 1 we navigate down to pears and we can check the value, but we actually want to add our value onto the whole of the map.

There are two options, using either pred or selected?.

In Option 1 it's the same navigation for the first part of the query: we navigate to :summer and then we look at each element of the vector (each map): {:apples 3}, {:apples 4} and {:apples 5 :pears 3}. For each of these maps we use pred to check if a predicate is satisfied - here it's simple, we just test if :pears is there. If the predicate is satisfied then the map is passed into our transform function.

Option 2 is a good choice if there's a more complex query to do. The selected? navigator takes a path, which here is just our own function, and stays at the current map only if that path selects something; the map is then passed to the transform function. Essentially, this lets us supply our own test and keep only the values we want. In this case we check whether :pears exists and if it does we add the bananas.
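One thing worth checking when using a keyword as the predicate is that it tests the value for truthiness, and in Clojure 0 is truthy. The sketch below (assuming Specter is on the classpath) shows that the winter week with :pears 0 passes (sp/pred :pears), and one possible stricter test that only keeps weeks with a positive number of pears:

```clojure
(require '[com.rpl.specter :as sp])

(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

;; (:pears m) is 0 for the second winter week, and 0 is truthy.
(sp/select [:winter sp/ALL (sp/pred :pears)] fruit_order)
;;=> [{:apples 5, :pears 0}]

;; A stricter predicate: only weeks where we ordered at least one pear.
(sp/select [:winter sp/ALL (sp/pred #(pos? (get % :pears 0)))] fruit_order)
;;=> []

(sp/select [:summer sp/ALL (sp/pred #(pos? (get % :pears 0)))] fruit_order)
;;=> [{:apples 5, :pears 3}]
```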

Add values from one part of the structure to another

Add as many bananas in summer as we have clementines in winter

So far we've navigated to a location and then changed or added a value to an element. We can also run a query on one section of the structure and then use those values to alter another part.

We basically want to say:

"Find me the value of clementines and store it, then navigate to the right place in summer and add it as bananas"

To do this we use the collect-one function which lets you run a subquery and keep the value of that query.

; example 1: using collect-one
user=> (sp/select-one [(sp/collect-one sp/FIRST) sp/LAST] [1 2 3 4 5 6]) ;;=> [1 6]

; define a SUMMER-PEARS path - same navigation path we used in the previous example
user=> (def SUMMER-PEARS (sp/path :summer sp/ALL (sp/selected? #(% :pears))))

; this is the query we'll use to find out how many Clementines there are
user=> (sp/select-one [:winter sp/ALL (sp/pred :clementine) :clementine] fruit_order) ;;=> 2

; use that path, but with collect-one which collects it
; when the transform function runs, it provides the value
user=> (sp/transform [(sp/collect-one :winter sp/ALL (sp/pred :clementine) :clementine) SUMMER-PEARS] #(assoc %2 :bananas %1) fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3, :bananas 2}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]}

In the first line we use the collect-one function in its simplest form: it lets us navigate somewhere and collect the value at that point. In this case we use collect-one to navigate to the first value in the vector and collect it - in the second step we navigate to the last value of the vector and select the value there. Think of it as two separate queries.

You'll recall we said we could define our own path. To simplify the transform we define a path to the summer pears.

Then in the third part we use collect-one to find the value of the winter clementines, and we navigate to the summer pears (using the path we defined above). As it's a transform we call an anonymous function and we add bananas with the value we collected.
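The argument order in that anonymous function is easy to get backwards: the collected value arrives first and the navigated value last. A tiny sketch on a made-up map with keys :a and :b (assumed purely for illustration, with Specter on the classpath) makes the order visible:

```clojure
(require '[com.rpl.specter :as sp])

;; With select, the collected value and the navigated value come back together.
(sp/select [(sp/collect-one :a) :b] {:a 1 :b 10})
;;=> [[1 10]]

;; With transform, the function is called as (f collected navigated).
;; Here: collected = 1 (from :a), navigated = 10 (the value at :b).
(sp/transform [(sp/collect-one :a) :b]
              (fn [collected navigated] (- navigated collected))
              {:a 1 :b 10})
;;=> {:a 1, :b 9}
```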

Add a collection

Add a new weekly fruit order

Rather than adding an element in a collection, we can also add a collection into an existing collection.

Our vector of maps collects the fruit orders we have each week. Let's say it's winter and we want to add a new order of apples and pears.

user=> (sp/setval [:winter sp/AFTER-ELEM] {:apples 4 :pears 9} fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3}],
                       :winter [{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2} {:apples 4, :pears 9}]}

The AFTER-ELEM navigator puts you in the collection and navigates to the 'void' at the end of it. Then we can use the setval function. If we used LAST instead we'd replace the last existing order.
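The difference is easiest to see on a small vector. This sketch (assuming Specter is on the classpath) contrasts AFTER-ELEM with LAST, and also shows END, a related navigator that appends a whole sequence of elements rather than a single one:

```clojure
(require '[com.rpl.specter :as sp])

;; AFTER-ELEM: add one element after the end.
(sp/setval sp/AFTER-ELEM 4 [1 2 3]) ;;=> [1 2 3 4]

;; LAST: overwrite the existing last element.
(sp/setval sp/LAST 4 [1 2 3])       ;;=> [1 2 4]

;; END: splice in several elements at the end.
(sp/setval sp/END [4 5] [1 2 3])    ;;=> [1 2 3 4 5]
```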

Remove a collection

Remove the last week of the winter order

We previously used NONE to remove an element from a collection: this is just an extension of that idea, here we're removing a collection which is an element of another collection.

user=> (sp/setval [:winter sp/LAST] sp/NONE fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3}],
                       :winter [{:apples 3} {:apples 5, :pears 0}]}

Replace a collection of values

We discover an error in the records and need to change the second week of winter's order. We actually ordered 8 pears, not zero as the records say.

We can also replace an existing collection with a new collection. If it's in the first layer then we can just use assoc and there's no need to use Specter.

The key navigation concept is that we need to navigate to a specific element in a collection. In this case we want the second map {:apples 5, :pears 0} which is inside a vector of maps.

user=> (def WINTER-WEEK2 (sp/path :winter (sp/nthpath 1)))

user=> (sp/setval WINTER-WEEK2 {:apples 5 :pears 8} fruit_order)
{:order-date 20210214, :summer [{:apples 3} {:apples 4} {:apples 5, :pears 3}],
                       :winter [{:apples 3} {:apples 5, :pears 8} {:apples 3, :clementine 2}]}

We're using nthpath with index 1 to navigate to the second element of the vector (indexes start at 0), and then replacing it with setval. We've already seen similar navigators such as FIRST and LAST.
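For a fixed path of keys and indexes like this one, plain Clojure can do the same replacement with assoc-in, which is a handy cross-check. A sketch using only clojure.core (the name fixed is just for illustration):

```clojure
;; Plain clojure.core only; no Specter required.
(def fruit_order {:order-date 20210214
                  :summer [{:apples 3} {:apples 4} {:apples 5 :pears 3}]
                  :winter [{:apples 3} {:apples 5 :pears 0} {:apples 3 :clementine 2}]})

;; [:winter 1] means: the :winter key, then index 1 of that vector.
(def fixed (assoc-in fruit_order [:winter 1] {:apples 5 :pears 8}))

(:winter fixed)
;;=> [{:apples 3} {:apples 5, :pears 8} {:apples 3, :clementine 2}]
```

The Specter version earns its keep once the path needs something assoc-in can't express, such as ALL or a predicate.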

Remove something from every element

Remove an apple from every order

If we can navigate to something then we can change it. This means we can also make changes to groups of things.

; Attempt 1: doesn't work because cond-path stops looking after it finds true
user=> (sp/select [(sp/cond-path (sp/must :winter) :winter (sp/must :summer) :summer)] fruit_order)

; Attempt 2
; doesn't work because filterer only works on a sequence and we're dealing with a map
user=> (sp/select [sp/MAP-VALS (sp/pred vector?) (sp/filterer :apples)] fruit_order)

;; Example 1: strip out anything that's not a vector
user=> (sp/select [sp/MAP-VALS (sp/pred vector?) sp/ALL :apples] fruit_order) ;;=> [3 4 5 3 5 3]
user=> (sp/transform [sp/MAP-VALS (sp/pred vector?) sp/ALL :apples] #(dec %) fruit_order)
{:order-date 20210214, :summer [{:apples 2} {:apples 3} {:apples 4, :pears 3}],
                       :winter [{:apples 2} {:apples 4, :pears 0} {:apples 2, :clementine 2}]}

;; Example 2: use multi-path to find multiple sections
user=> (sp/select [(sp/multi-path (sp/must :winter) (sp/must :summer))] fruit_order)
[[{:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}] [{:apples 3} {:apples 4} {:apples 5 :pears 3}]]

user=> (sp/transform [(sp/multi-path (sp/must :winter) (sp/must :summer)) sp/ALL :apples] #(dec %) fruit_order)
{:order-date 20210214, :summer [{:apples 2} {:apples 3} {:apples 4, :pears 3}],
                       :winter [{:apples 2} {:apples 4, :pears 0} {:apples 2, :clementine 2}]}

I found this one quite difficult - showing my attempts so we can see other ideas.

Initially, I tried using cond-path which tests a path and if it selects something returns. I thought it would search the whole of the data structure, but it stops when it finds the first true value (e.g. when it selects something). In this case when it finds :winter it goes into that structure, but it doesn't then look at the :summer part of the structure.

Next (Attempt 2) I tried filterer, which is a really interesting function. However, it only works with sequences, not with maps.

In Example 1 we're getting the map values, and from there we're checking if we have a vector using the pred function, for those that return true (basically the :summer and :winter section of the data structure) we look at all the elements (which are maps) and finally select those with :apples.

In Example 2 we're using multi-path to search through both the :summer and :winter sections of the data structure. The select shows that it returns two separate vectors with the maps inside them.

The actual transform uses multi-path to search through each section. The rest is the same as the previous example. Multi-path is great for looking in different sections of the data structure simultaneously.
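In our data every order happens to contain apples, so navigating with the bare :apples keyword is safe. If some orders had no :apples key, the keyword step would navigate to nil and dec would fail on it. A hedged sketch of one way to guard against that, using must so that only existing keys are visited (the orders vector is made up for illustration, and Specter is assumed to be on the classpath):

```clojure
(require '[com.rpl.specter :as sp])

;; Hypothetical data: the middle order has no apples at all.
(def orders [{:apples 3} {:pears 1} {:apples 5 :pears 2}])

;; must only navigates to :apples where the key exists,
;; so the order without apples is left untouched.
(sp/transform [sp/ALL (sp/must :apples)] dec orders)
;;=> [{:apples 2} {:pears 1} {:apples 4, :pears 2}]
```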

Append a new element to each one

Add bananas to every order

This is really the opposite of the remove example above, so let's slightly simplify.

; define our own path to use
user=> (def ALL-ORDERS (sp/path [sp/MAP-VALS (sp/pred vector?) sp/ALL]))

user=> (sp/select ALL-ORDERS fruit_order)
[{:apples 3} {:apples 4} {:apples 5, :pears 3} {:apples 3} {:apples 5, :pears 0} {:apples 3, :clementine 2}]

user=> (sp/transform ALL-ORDERS #(assoc % :bananas 3) fruit_order)
{:order-date 20210214,
:summer [{:apples 3, :bananas 3} {:apples 4, :bananas 3} {:apples 5, :pears 3, :bananas 3}],
:winter [{:apples 3, :bananas 3} {:apples 5, :pears 0, :bananas 3} {:apples 3, :clementine 2, :bananas 3}]}

As you can see it's just a matter of getting to the vector of maps and then we can simply pass it to an anonymous function that uses assoc.

Further Reading

I didn't find that many resources on Specter; the best information is on the Specter Wiki, which has really helpful examples for all the library's capabilities.

Final Thoughts

I'm finding Specter a really powerful tool and hope to use it for general data transformation. It is a big library so you have to invest time to understand its capabilities. It's worth mentioning Eidolon - a library of Specter navigators.

There are alternatives such as Vvvvalvalval's supdate (looks interesting and is simpler), there's also instar (simpler assoc/dissoc and update-in) and cats (it's too maths advanced for me personally!).

Even though this post is long I'm sure that Specter has capabilities I haven't touched on here - if you have any good examples please comment or drop me a note.

Posted in Tech Saturday 20 March 2021
Tagged with tech clojure