ZqLateralSubquery

August 11, 2024
Recently I wondered how zq would compare to jq on one of the most common things I have to do with AWS JSON results: dig out a resource's name from its Tags. This has got to be one of the most annoying decisions AWS has made across the board. I understand names can't be required, but pushing the name down into Tags instead of offering an optional top-level attribute seems ridiculous to me.

If you're not familiar with the structure, here's a (very) slimmed down version of an EC2 JSON result:
{
  "Reservations": [
    {
      "Groups": [],
      "Instances": [
        {
          "InstanceId": "i-1234567890abcdefa",
          "InstanceType": "t4g.medium",
          "Tags": [
            {
              "Key": "Name",
              "Value": "my-groovy-ec2"
            }
          ]
        },
        {
          "InstanceId": "i-9876543210abcdefa",
          "InstanceType": "t4g.medium",
          "Tags": [
            {
              "Key": "Name",
              "Value": "et-c-tu-brute"
            }
          ]
        }
      ]
    }
  ]
}

How can we pull the Name up from the Tags alongside its InstanceId? Here's a jq way to do it:
❯ aws ec2 describe-instances |
    jq -c -r '
      .Reservations[].Instances[]
      | {name: (.Tags[] | select(.Key == "Name") | .Value), id: .InstanceId}'

{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}

How do we do this with zq? First, let's add some commenting to the jq version so we're sorta tracking where we're at in the navigation of the original document as we go:
❯ aws ec2 describe-instances |
    jq -c -r '
      .Reservations[]   # for each object in the Reservations array...
      .Instances[]      # for each object in the Res Instances array...
      | {
          name:         # make a new object with a name key, and for its value...
                        # drop down into the Res Instances Tags array...
            (           # and find the Value of Name
              .Tags[] | select(.Key == "Name") | .Value
            ),
                        # back up at the new object with the Instance object
                        # grab the InstanceId as the value of id
          id: .InstanceId
        }'

{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}

In zq, how do we do the parenthetical bit of dropping down into the Tags to retrieve the Name while still being able to execute something on the parent to grab the Id?

You can do this with a Lateral Subquery.
“Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any Zed query and may refer to values from the outer sequence.”
# Lateral Subquery

❯ aws ec2 describe-instances |
    zq -z '
      over Reservations
      | over Instances
      | over Tags with obj=this => ( where Key=="Name"
          | {name: this.Value, id:obj.InstanceId} )' -

If we keep reading the docs there, it turns out there's an even tighter way to do this!
# Lateral Expression

❯ aws ec2 describe-instances |
    zq -z '
      over Reservations
      | over Instances
      | {name:(over Tags | where Key=="Name" | Value), id:InstanceId}' -

What?! Dataflow Operators inside an Expression?! I thought you said that was impossible? And I quote: “the pipe character is not valid in an expression” from ZqPipeCharacter. Well, I didn't lie ... I just hadn't ... learned about Lateral Expressions yet...

Lateral Expressions
“Lateral subqueries can also appear in expression context using the parenthesized form. Note that the parentheses disambiguate a lateral expression from a lateral dataflow operator.”
This is pretty tidy, and in this case it brings us about equal with jq.
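
If it helps to see the two shapes side by side outside of zq, here's a rough Python sketch of how I picture them. This is only a mental model, not how zq is implemented: the function names are made up for illustration, the sample data is the slimmed-down EC2 result from above, and it assumes every instance has exactly one Name tag (it doesn't try to mirror what zq does when a tag is missing or duplicated).

```python
import json

# The slimmed-down EC2 result from earlier in this post.
SAMPLE = json.loads("""
{
  "Reservations": [
    {
      "Groups": [],
      "Instances": [
        {"InstanceId": "i-1234567890abcdefa", "InstanceType": "t4g.medium",
         "Tags": [{"Key": "Name", "Value": "my-groovy-ec2"}]},
        {"InstanceId": "i-9876543210abcdefa", "InstanceType": "t4g.medium",
         "Tags": [{"Key": "Name", "Value": "et-c-tu-brute"}]}
      ]
    }
  ]
}
""")


def instances(doc):
    """Roughly `over Reservations | over Instances`: yield each instance."""
    for reservation in doc["Reservations"]:
        yield from reservation["Instances"]


def lateral_subquery(doc):
    """Hypothetical model of the lateral subquery form.

    The outer value is bound to `obj` (like `with obj=this`), and the new
    record is built inside the inner scope, once per matching tag.
    """
    for obj in instances(doc):
        for tag in obj["Tags"]:
            if tag["Key"] == "Name":
                yield {"name": tag["Value"], "id": obj["InstanceId"]}


def lateral_expression(doc):
    """Hypothetical model of the lateral expression form.

    The record is built in the outer scope, and the inner query is just an
    expression that produces the value for `name`.
    """
    for inst in instances(doc):
        names = [t["Value"] for t in inst["Tags"] if t["Key"] == "Name"]
        # Assumption: exactly one Name tag, so take the only match.
        yield {"name": names[0], "id": inst["InstanceId"]}


if __name__ == "__main__":
    for record in lateral_subquery(SAMPLE):
        print(json.dumps(record))
```

The difference is where the new record gets built: inside the inner loop with the parent carried along as `obj`, or out at the instance level with the tag lookup tucked into the value of `name`. On this sample both functions yield the same two records.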