zq would compare to jq on one of the most common things I have to do with AWS JSON results: dig out a resource's name from its Tags. This has got to be one of the most annoying decisions AWS has made across the board. I understand names can't be required, but pushing the name down into Tags instead of offering a top-level optional attribute seems ridiculous to me.
{
"Reservations": [
{
"Groups": [],
"Instances": [
{
"InstanceId": "i-1234567890abcdefa",
"InstanceType": "t4g.medium",
"Tags": [
{
"Key": "Name",
"Value": "my-groovy-ec2"
}
]
},
{
"InstanceId": "i-9876543210abcdefa",
"InstanceType": "t4g.medium",
"Tags": [
{
"Key": "Name",
"Value": "et-c-tu-brute"
}
]
}
]
}
]
}
Name up from the Tags alongside its InstanceId? Here's a jq way to do it:
❯ aws ec2 describe-instances |
jq -c -r '
.Reservations[].Instances[]
| {name: (.Tags[] | select(.Key == "Name") | .Value), id: .InstanceId}'
{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}
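One wrinkle with the jq filter above: if an instance has no Name tag, the object construction produces nothing for that instance, and if Tags is missing altogether, .Tags[] errors out. Here's a minimal sketch of a more forgiving variant. It is not from the original run, and the sample data (including the i-bbb instance with no Tags) is made up for illustration.

```shell
# Hypothetical sample: the second instance has no Tags key at all.
sample='{"Reservations":[{"Instances":[
  {"InstanceId":"i-aaa","Tags":[{"Key":"Name","Value":"my-groovy-ec2"}]},
  {"InstanceId":"i-bbb"}
]}]}'

# (.Tags // []) swaps a missing Tags for an empty array;
# map(select(...)) | .[0].Value yields null when no Name tag is found.
names=$(printf '%s' "$sample" | jq -c '
  .Reservations[].Instances[]
  | {name: ((.Tags // []) | map(select(.Key == "Name")) | .[0].Value), id: .InstanceId}')

printf '%s\n' "$names"
```

With this shape every instance still comes out the other end, with name set to null when there's nothing to find.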
zq? First, let's add some commenting to the jq version so we're sorta tracking where we're at in the navigation of the original document as we go:
❯ aws ec2 describe-instances |
jq -c -r '
.Reservations[] # for each object in the Reservations array...
.Instances[] # for each object in the Res Instances array...
| {
name: # make a new object with a name key, and for its value...
# drop down into the Res Instances Tags array...
( # and find the Value of Name
.Tags[] | select(.Key == "Name") | .Value
),
# back up at the new object with the Instance object
# grab the InstanceId as the value of id
id: .InstanceId
}'
{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}
zq, how do we do the parenthetical bit of dropping down into the Tags to retrieve the Name while still being able to execute something on the parent to grab the Id?
“Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any Zed query and may refer to values from the outer sequence.”
# Lateral Subquery
❯ aws ec2 describe-instances |
zq -z '
over Reservations
| over Instances
| over Tags with obj=this => ( where Key=="Name"
| {name: this.Value, id:obj.InstanceId} )' -
# Lateral Expression
❯ aws ec2 describe-instances |
zq -z '
over Reservations
| over Instances
| {name:(over Tags | where Key=="Name" | Value), id:InstanceId}' -
“Lateral subqueries can also appear in expression context using the parenthesized form. Note that the parentheses disambiguate a lateral expression from a lateral dataflow operator.”
This is pretty tidy, and in this case it brings us about equal with jq.
zq version 1.16.0 and jq version 1.7
jq → zq learning curve gotchyas.
search Dataflow Operator is not only an Implied Operator but it is the “primary” implied operator, the first one to be evaluated by the parser (see also ZqImpliedOperator).
#!/usr/bin/env bash
echo "Downloading..."
curl "https://raw.githubusercontent.com/prust/wikipedia-movie-data/91ae721d/movies.json" \
>raw.json
echo "Flattening single array into separate lines:"
echo "- Processing to movies.zng..."
zq -f zng 'over this' raw.json >movies.zng
echo "- Processing to movies.json..."
jq -c '.[]' raw.json >movies.json
echo "Removing raw.json"
rm raw.json
zq, it also saves the data into a binary Zed format called ZNG (“zing”) that performs much better (see also ZqWhereDoYouGetTheseNames).
❯ zq 'Bacon' movies.zng
...
...[snip a bacon-ton of output]...
...
{title:"Leave the World Behind",year:2023...}
❯ zq 'year >= 2022 | Bacon | yield this.title' movies.zng
"They/Them"
"One Way"
"Smile"
"Space Oddity"
"Leave the World Behind"
❯ zq 'year >= 2022 | Bacon | ! Kevin | cut title, year, cast' movies.zng
{title:"Smile",year:2022,
cast:["Sosie Bacon","Jessie T. Usher","Kyle Gallner",↩
"Robin Weigert","Caitlin Stasey","Kal Penn","Rob Morgan"]}
❯ zq 'Bacon | ! Kevin | cut title, year' movies.zng
...
...[snip another bacon-ton of output]...
...
❯ zq -j 'Bacon | ! Kevin | cut title, year
| decade:=(year/10)*10 | count() by decade | sort decade' movies.zng
{"decade":1910,"count":7}
{"decade":1920,"count":19}
{"decade":1930,"count":43}
{"decade":1940,"count":31}
{"decade":1950,"count":13}
{"decade":2020,"count":1}
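For comparison, here's a rough jq sketch of the same decade bucketing. The zq query seems to get whole-number decades straight out of year/10, but jq's / does floating-point division, so it needs a floor. The three sample years are made up.

```shell
years='{"year":1915}
{"year":1919}
{"year":1923}'

# -s slurps the lines into one array so group_by can see them all.
decades=$(printf '%s\n' "$years" | jq -c -s '
  map((.year / 10 | floor) * 10)
  | group_by(.)
  | map({decade: .[0], count: length})
  | .[]')

printf '%s\n' "$decades"
```

Two of the sample years land in 1910 and one in 1920, so the result is one count object per decade, in ascending order because group_by sorts its groups.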
'Bacon | ! Kevin | year < 1960 | cut title, cast' is too much output, but scanning it doesn't show many Bacon names at all. It must be in the extract field (unless “Bacon” is a genre? And why shouldn't it be?! mmmm, bacon)
❯ zq 'year < 1960 | cut title, cast | Bacon' movies.zng
{title:"The Fireman",cast:["Charlie Chaplin","Edna Purviance","Lloyd Bacon"]}
{title:"The Floorwalker",cast:["Charlie Chaplin","Edna Purviance","Eric Campbell","Lloyd Bacon"]}
{title:"The House of Intrigue",cast:["Mignon Anderson","Lloyd Bacon"]}
{title:"The Girl in the Rain",cast:["Anne Cornwall","Lloyd Bacon"]}
{title:"The Greater Profit",cast:["Edith Storey","Pell Trenton","Lloyd Bacon"]}
{title:"Hearts and Masks",cast:["Elinor Field","Lloyd Bacon","Francis McDonald"]}
{title:"Bringin' Home the Bacon",cast:["Jay Wilsey","Jean Arthur"]}
{title:"Branded Men",cast:["Ken Maynard","June Clyde","Irving Bacon"]}
extract field:
❯ zq 'year < 1960 | cut extract | Bacon
| bacon:=regexp_replace(this.extract, /.*\W(\w+? Bacon).*/, "$1")
| cut bacon | sort | uniq' movies.zng
{bacon:"Daskam Bacon"}
{bacon:"David Bacon"}
{bacon:"Irving Bacon"}
{bacon:"Lloyd Bacon"}
{bacon:"the Bacon"}
❯ zq '"the Bacon" | cut title' movies.zng
{title:"Bringin' Home the Bacon"}
zq, in addition to its many other features, is designed to be a search tool, and because search is the primary Implied Operator, meaning you don't need to type out search in contexts where the tool presumes that's what you meant.
zq learning curve. Click on through to ZqImpliedOperatorsCanTrickYou for more on that.
zq for doing data exploration and normalization. The Zed docs offer a couple of great tutorials that go into more detail: Real-World GitHub Data & Zed and Schools Data.
zq vs jq in a different post, but we can take a peek in that direction here before we sign off.
jq, but even the basic zq search operator doesn't seem to be as easy in jq, to search through all of the values of objects. Here's what I've come up with so far:
❯ jq 'with_entries(select(.value | tostring | contains("Bacon")))
| select(. != {})' movies.json
❯ cat search.jq
def search(term):
with_entries(select(.value | tostring | contains(term))) |
select(. != {});
❯ jq 'include "search"; search("Bacon")' movies.json
...
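The with_entries version only looks at top-level values and leans on tostring to catch anything nested. A possible alternative, offered as a sketch rather than a known-best answer, walks every string at any depth with .. and keeps the whole record when any of them contains the term. It only checks string values, not keys or numbers, so it isn't a full match for zq's search. The records and names below are invented.

```shell
records='{"title":"A","cast":["Pat Bacon","Sam Smith"]}
{"title":"B","cast":["Someone Else"]}
{"title":"Bacon Bits","cast":[]}'

# .. recurses through every value; strings keeps only the string ones.
hits=$(printf '%s\n' "$records" | jq -c '
  select(any(.. | strings; contains("Bacon"))) | .title')

printf '%s\n' "$hits"
```

Unlike the with_entries approach, this returns the full matching record (trimmed to the title here) instead of only the entries that matched.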
zq you can define your own dataflow operators and functions. All functions need to be called with parens, even argumentless ones, whether built-in or user-defined.
func <id> ( [<param> [, <param> ...]] ) : ( <expr> )
<id> and <param> are identifiers and <expr> is an expression that may refer to parameters but not to runtime state such as this.”
op <id> ( [<param> [, <param> ...]] ) : (
  <sequence>
)
<id> is the operator identifier, <param> are the [optional] parameters for the operator, and <sequence> is the chain of operators (e.g., operator | ...) where the operator does its work.” And it can refer to this.
Though it's more precise to say the body of a user-defined function is an Expression, and the body of a user-defined operator is a Sequence of Dataflow Operators.
User-defined Function | Can only call other functions. Cannot refer to this. |
User-defined Operator | Can call operators and functions. Can refer to this. |
❯ zq -z \
'op add_ids(): (
over this
| yield {id:count(), ...this}
)
yield [{name:"lorem"},{name:"ipsum"}] | add_ids()'
{id:1(uint64),name:"lorem"}
{id:2(uint64),name:"ipsum"}
add_ids will take an array of records and add an auto-incrementing id integer to each one.
zq things going on here, but for now we'll just focus on calling the operator, with parens. A call without, and again this will fall to the search ImpliedOperator:
❯ zq -z \
'op add_ids(): (
over this
| yield {id:count(), ...this}
)
yield [{name:"lorem"},{name:"ipsum"}] | add_ids'
❯
add_ids without parens is interpreted by zq as search add_ids and the string “add_ids” can't be found in the array of two records.
jq (not to mention “operators” in jq docs refer to things like + and -, and zq “dataflow operators” are things like over, yield, and put).
Functions in zq always need to be called with parentheses, even if they have no arguments (like the now function). But in jq, argumentless functions receive their input via the pipe | operator and don't have parens, like Dataflow Operators in zq.
❯ echo "[1,2,3]" | jq 'length'
3
❯ echo "[1,2,3]" | jq 'length()'
jq: error: syntax error, unexpected ')' ... at <top-level>, line 1:
length()
jq: 1 compile error
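To make the jq side of that contrast concrete, here's a small sketch with user-defined jq functions (double and add_n are made-up names). The argumentless one is defined and called without parens, and both take their input from the pipe.

```shell
result=$(echo '[1,2,3]' | jq -c '
  def double: map(. * 2);     # no params, so no parens anywhere
  def add_n(n): map(. + n);   # params go in parens
  double | add_n(1)')

echo "$result"
```

Writing double() here would hit the same kind of syntax error as length() above.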
zq “thing” is a dataflow operator vs. a function (I generally have to look all those things up anyway with either tool), the rule to keep in mind in zq is: if it's a function, it's gotta have parens.
❯ echo '{a:null}' | zq 'now() | {a:this}' -
{a:2024-07-12T14:25:24.70848Z}
search implied operator (see ZqImpliedOperator).
❯ echo '{a:null}' | zq 'now | {a:this}' -
❯
zq operators are Implied Operators, which means “Zed allows certain operator names to be optionally omitted when they can be inferred from context.” The operators are evaluated in this order:
Evaluation | Operator | Implied | Explicit |
search expression | search | foo | search foo |
boolean expression | where | a >= 1 | where a >= 1 |
field assignment | put | a:=x+1,b:=y-1 | put a:=x+1,b:=y-1 |
aggregation | summarize | count() | summarize count() |
expression | yield | {a:x+1,b:y-1} | yield {a:x+1,b:y-1} |
-C flag can be passed to zq to output the parsed query with explicit operators.
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key' openlibrary.json
["OL369643A"]
❯
jq experience, piped our previous output to lower. But this didn't work, and I got no output:
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key | over this | lower' openlibrary.json
❯
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key | over this | lower(this)' openlibrary.json
"ol369643a"
❯
has and grep functions correctly, but I think because lower doesn't require an additional argument, I just fell into a jq habit.
lower properly? If I make a mistake by passing a non-string into it, I'll get an error:
❯ zq 'lower(1)'
error({message:"lower: string arg required",on:1})
❯ zq 'yield "HEY" | lower'
❯
-C flag has the answer:
❯ zq -C 'yield "HEY" | lower'
yield "HEY"
| search lower
search is now the implied operator for the term ‘lower’! It's not parsed as a built-in function because it's also a valid search expression, and because ZqSearchIsFirstClass, which is a good thing in search contexts, that's given the priority.
zq. Remember, if you make a change, and get no output, an Implied Operator is probably in play.
jq and zq. Here's an example of an
jq -c '[.docs[]
| {title, author_name: .author_name[0], publish_year: .publish_year[0]}
| select(.author_name!=null and .publish_year!=null)
]
| group_by(.author_name)
| [.[] | {author_name: .[0].author_name, count: . | length}]
| sort_by(.count) | reverse | limit(3;.[])' openlibrary.json
zq -j 'over docs
| {title, author_name: author_name[0], publish_year: publish_year[0]}
| has(author_name) and has(publish_year)
| count() by author_name | sort -r count | head 3' openlibrary.json
jq every program is a series of filters separated by the pipe operator. There are a few places the pipe operator cannot be used, but it's fairly ubiquitous.
zq has a more elaborate and structured syntax. At its highest level, one or more dataflow operators are joined together in a sequence with the pipe character (sequence overview). But within operators, syntax varies and the pipe character isn't used. Field references and expressions are common, and most functions receive arguments passed in parentheses. Function outputs can only be passed to other functions as nested calls, not via the pipe character.
operator | operator | ...
who value is replaced with “me” whenever it's “chrismo”:
❯ zq -j '[{who:"bob"}, {who:"chrismo"}]
| over this
| put who:=replace(who, "chrismo", "me")'
{"who":"bob"}
{"who":"me"}
put is an operation that sets a field to an expression. replace is a function taking three string arguments.
who field, we can use the coalesce function to return an empty string if who is missing, but we cannot use the pipe character like we could in jq between two functions:
❯ zq -j '[{}, {who:"bob"}, {who:"chrismo"}]
| over this
| put who:=((coalesce(who, "") | replace(who, "chrismo", "me"))'
zq: error parsing Zed at line 3, column 39:
| put who:=((coalesce(who, "") | replace(who, "chrismo", "me"))
=== ^ ===
put is [put] <field>:=<expr> and the pipe character is not valid in an expression. This can be accomplished by nesting the function calls:
❯ zq -j '[{}, {who:"bob"}, {who:"chrismo"}]
| over this
| put who:=replace(coalesce(who, ""), "chrismo", "me")'
{"who":""}
{"who":"bob"}
{"who":"me"}
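For contrast, a jq sketch of the same transformation, where piping between functions inside the expression is fine. This assumes jq's gsub (which needs a jq build with regex support) is an acceptable stand-in for zq's replace.

```shell
out=$(echo '[{},{"who":"bob"},{"who":"chrismo"}]' | jq -c '
  .[] | .who = ((.who // "") | gsub("chrismo"; "me"))')

printf '%s\n' "$out"
```

The right-hand side of = is an ordinary jq filter, so the default from // can be piped straight into gsub without nesting.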
jq code. Recently, I've been checking out zq and evaluating it as a jq replacement.
zq in more detail, while it looks similar on the surface, it's pretty different under the hood and I've been tripped up by a few things along the way. The goal with these posts is to try and flatten the learning curve for folks also coming over from jq.
zq to folks familiar with jq. I also want to thank Phil Rzewski from the Brim team, who's been incredibly helpful, both in their support Slack and in the GitHub repo, fielding my questions patiently and thoughtfully.
Post | tl;dr |
ZqSearchIsFirstClass | Searching is an integral part of aggregation and transformation, but compared to jq, zq search is first class. |
ZqImpliedOperatorsCanTrickYou | Did you make a change and now there's no output? It's probably the search implied operator hiding the fact that you tried to pipe something to a function like you would in jq, but that's not how it works. See previous post. |
ZqPipeCharacter | You can't use | everywhere, esp. not in or between Expressions. Only in-between Dataflow Operators. |
ZqFunctionsNeedParens | Functions always have to be called with parens. If you don't, the search implied operator will getchya. |
ZqUserOperatorsNeedParens | While built-in dataflow operators do not take parens, user-defined ones must. If you don't, (all together now) the search implied operator will getchya. |
ZqLateralSubquery | How to drop down a level of JSON while parsing. |
jq, why bother? In addition to wrangling JSON data, zq has a host of additional features that make it not just a decent replacement of jq, but a solid upgrade.
jq can too with a special option); it supports multiple input & output formats: from CSV to Parquet and more (including grokking log files a la Logstash); it can naturally join data together in a relational fashion from either files or its own Data Lake, and supports a form of gradual typing of your data with Shaping.
[36:28] Daniel Ek, Spotify co-founder: It's actually my co-founder's [Martin Lorentzon] saying. He said this thing, I'm not even sure he was aware that he coined what I think is an iconic quote. He said, “the value of a company is the sum of all problems solved.” Even to this day, it's one of those things that I think about. You may think about all the things you guys went through as all the issues that you went through, but you solved them, one by one. And I think the most important thing that you got right is the integrity of the programming and the shows that you make. At the end of the day, that is the value that you're bringing to that and bringing to consumers and it really served you well in the end.
[37:20] Alex Blumberg: In other words, Matt's and mine constant fighting had produced something valuable, the fighting itself in fact was the thing that made it better. If I hadn't cared about what I cared about and he hadn't cared about what he cared about and we hadn't each cared enough to fight with each other, the company we built, it wouldn't have worked as well.
Slack is a prescription for building a capacity to change into the modern enterprise. It looks into the heart of the efficiency-flexibility quandary: The more efficient you get, the harder it is to change. The book shows managers how to make their organizations slightly less efficient but enormously more effective. It coaches them on the introduction of slack, the missing ingredient required for all change. It counsels a thoughtful use of slack instead of the mindless obsession with elimination of all slack in the interests of efficiency.