Simon Willison has written about his entire publishing pipeline, which is a wonderful combination of impressive and drop-dead simple. That was one inspiration.
I've also been experimenting with Claude Code Cloud in the last week or two (I think they officially call it “on the web”), inspired by Simon Willison's recent vibespiling posts.
Certain work lends itself to the cloud, I feel like, and there's a lot less prompting there, so you can run and get stuff done with fewer interruptions. The trade-off, of course, is that it might run a while down the wrong track and make a bigger mess.
The setup: I've got a new dedicated private repo for all my potential content. The goal is for pushes to the main branch of the repo to auto-schedule posts to my socials via Publer's configured time slots. No more manual copy-paste (not that I was even doing that).
We probably could have knocked this all out in one Claude Cloud session, but we kept running into network blocks trying to work out some details with the Publer API.
The main hangup I think was that network allowlist changes in Claude Code Cloud don't seem to apply to sessions that are already running? I kept getting blocked curl requests even after I added publer.com to the allowlist.
I'm still not 100% sure that was the issue, but eventually I just gave it full network access and started a fresh session, and that finally got things moving.
Once we got past the network stuff, we hit a bug: Publer uses Bearer-API as their auth header instead of Bearer (which is more common in my experience). There's a typo in their Getting Started example that shows plain Bearer, which is what we ran across first.
Claude eventually figured it out by fetching more of the API docs and then just trying different header variations until one actually worked. Kind of a brute-force approach, but it got the job done.
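For the curious, here's a rough Python sketch of that "try header variations until one works" loop. It's an illustration, not what Claude actually ran: the `find_working_scheme` helper, the `fake_send` stand-in, and the list of candidate schemes are all made up for this example, and no real Publer endpoint is called.

```python
from typing import Callable, Iterable, Optional

# Hypothetical candidates. The only thing taken from the post is that
# "Bearer-API" is the scheme that worked and plain "Bearer" did not.
CANDIDATE_SCHEMES = ("Bearer", "Bearer-API")


def find_working_scheme(
    api_key: str,
    send: Callable[[dict], int],
    schemes: Iterable[str] = CANDIDATE_SCHEMES,
) -> Optional[str]:
    """Return the first auth scheme for which send() reports HTTP 200."""
    for scheme in schemes:
        headers = {"Authorization": f"{scheme} {api_key}"}
        if send(headers) == 200:
            return scheme
    return None


def fake_send(headers: dict) -> int:
    """Stand-in for a real HTTP request: accepts only the Bearer-API scheme."""
    return 200 if headers["Authorization"].startswith("Bearer-API ") else 401


print(find_working_scheme("my-key", fake_send))
```

Passing the sender in as a function keeps the loop testable without network access, which would have been handy given the allowlist trouble.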
#ClaudeCode #Publer #ContentAutomation
zq would compare to jq on one of the most common things I have to do with AWS JSON results: dig out a resource's name from its Tags. This has got to be one of the most annoying decisions AWS has made across the board. I understand names can't be required, but not having a top-level optional attribute vs. pushing it down into Tags seems ridiculous to me.
If you're not familiar with the structure, here's a (very) slimmed down version of an EC2 JSON result:
{
  "Reservations": [
    {
      "Groups": [],
      "Instances": [
        {
          "InstanceId": "i-1234567890abcdefa",
          "InstanceType": "t4g.medium",
          "Tags": [
            {
              "Key": "Name",
              "Value": "my-groovy-ec2"
            }
          ]
        },
        {
          "InstanceId": "i-9876543210abcdefa",
          "InstanceType": "t4g.medium",
          "Tags": [
            {
              "Key": "Name",
              "Value": "et-c-tu-brute"
            }
          ]
        }
      ]
    }
  ]
}
How can we pull the Name up from the Tags alongside its InstanceId? Here's a jq way to do it:
❯ aws ec2 describe-instances |
    jq -c -r '
      .Reservations[].Instances[]
      | {name: (.Tags[] | select(.Key == "Name") | .Value), id: .InstanceId}'
{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}
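If it helps to see the same traversal outside of a query language, here's a small Python sketch of it. The `name_and_id` helper and the trimmed `SAMPLE` data are just for illustration. One deliberate difference: the jq filter above emits nothing for an instance that has no Name tag, while this sketch keeps the instance with a name of `None`.

```python
import json

# Trimmed copy of the sample describe-instances result shown earlier.
SAMPLE = json.loads("""
{"Reservations": [{"Groups": [], "Instances": [
  {"InstanceId": "i-1234567890abcdefa",
   "Tags": [{"Key": "Name", "Value": "my-groovy-ec2"}]},
  {"InstanceId": "i-9876543210abcdefa",
   "Tags": [{"Key": "Name", "Value": "et-c-tu-brute"}]}
]}]}
""")


def name_and_id(result: dict) -> list:
    """Pair each instance's Name tag (or None) with its InstanceId."""
    rows = []
    for reservation in result["Reservations"]:
        for instance in reservation["Instances"]:
            tags = instance.get("Tags", [])
            # First tag whose Key is "Name", or None when there isn't one.
            name = next((t["Value"] for t in tags if t["Key"] == "Name"), None)
            rows.append({"name": name, "id": instance["InstanceId"]})
    return rows


for row in name_and_id(SAMPLE):
    print(json.dumps(row))
```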
How do we do this with zq? First, let's add some commenting to the jq version so we're sorta tracking where we're at in the navigation of the original document as we go:
❯ aws ec2 describe-instances |
    jq -c -r '
      .Reservations[]   # for each object in the Reservations array...
      .Instances[]      # for each object in the Res Instances array...
      | {
          name:         # make a new object with a name key, and for its value...
                        # drop down into the Res Instances Tags array...
            (           # and find the Value of Name
              .Tags[] | select(.Key == "Name") | .Value
            ),
          # back up at the new object with the Instance object
          # grab the InstanceId as the value of id
          id: .InstanceId
        }'
{"name":"my-groovy-ec2","id":"i-1234567890abcdefa"}
{"name":"et-c-tu-brute","id":"i-9876543210abcdefa"}
In zq, how do we do the parenthetical bit of dropping down into the Tags to retrieve the Name while still being able to execute something on the parent to grab the Id?
You can do this with a Lateral Subquery.
“Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any Zed query and may refer to values from the outer sequence.”
# Lateral Subquery
❯ aws ec2 describe-instances |
zq -z '
over Reservations
| over Instances
| over Tags with obj=this => ( where Key=="Name"
| {name: this.Value, id:obj.InstanceId} )' -
If we keep reading the docs there, it turns out there's an even tighter way to do this!
# Lateral Expression
❯ aws ec2 describe-instances |
zq -z '
over Reservations
| over Instances
| {name:(over Tags | where Key=="Name" | Value), id:InstanceId}' -
What?! Dataflow Operators inside an Expression?! I thought you said that was impossible? And I quote: “the pipe character is not valid in an expression” from ZqPipeCharacter. Well, I didn't lie ... I just hadn't ... learned about Lateral Expressions yet...
Lateral Expressions
“Lateral subqueries can also appear in expression context using the parenthesized form. Note that the parentheses disambiguate a lateral expression from a lateral dataflow operator.”
This is pretty tidy. And in this case, it brings us about equal with jq.
TODO: It's also better because it still retains instances without a name.
Prev: ZqUserOperatorsNeedParens | Up: LearningZq
examples use zq version 1.16.0 and jq version 1.7
I've made this the first post in this series, even though it's about the 5th one I've written, because it really seems to be the foundation of my jq → zq learning curve gotchyas.
In service of making search easy, the search Dataflow Operator is not only an Implied Operator but it is the “primary” implied operator, the first one to be evaluated by the parser (see also ZqImpliedOperator).
Let's take it for a spin. For these examples, I'll be working with a dataset of English movies since 1900 scraped from Wikipedia, downloaded like so:
#!/usr/bin/env bash
echo "Downloading..."
curl "https://raw.githubusercontent.com/prust/wikipedia-movie-data/91ae721d/movies.json" \
>raw.json
echo "Flattening single array into separate lines:"
echo "- Processing to movies.zng..."
zq -f zng 'over this' raw.json >movies.zng
echo "- Processing to movies.json..."
jq -c '.[]' raw.json >movies.json
echo "Removing raw.json"
rm raw.json
The processing extracts each object/record out of the array onto its own line, which saves, in many cases, the step of iterating over it. For zq, it also saves the data into a binary Zed format called ZNG (“zing”) that performs much better (see also ZqWhereDoYouGetTheseNames).
Kevin Bacon seems like a fairly prominent figure in Hollywood, let's go bacon hunting!
❯ zq 'Bacon' movies.zng
...
...[snip a bacon-ton of output]...
...
{title:"Leave the World Behind",year:2023...}
That's a lot of output. How about just titles in the last couple years?
❯ zq 'year >= 2022 | Bacon | yield this.title' movies.zng
"They/Them"
"One Way"
"Smile"
"Space Oddity"
"Leave the World Behind"
That's better.
But wait, “Smile” isn't quite right, is it?
❯ zq 'year >= 2022 | Bacon | ! Kevin | cut title, year, cast' movies.zng
{title:"Smile",year:2022,
cast:["Sosie Bacon","Jessie T. Usher","Kyle Gallner",↩
"Robin Weigert","Caitlin Stasey","Kal Penn","Rob Morgan"]}
Right! Sosie is Kevin's daughter.
I wonder what other non-Kevin Bacon is in the data?
❯ zq 'Bacon | ! Kevin | cut title, year' movies.zng
...
...[snip another bacon-ton of output]...
...
That's a lot of olde-timey bacon -- can we summarize it?
❯ zq -j 'Bacon | ! Kevin | cut title, year
| decade:=(year/10)*10 | count() by decade | sort decade' movies.zng
{"decade":1910,"count":7}
{"decade":1920,"count":19}
{"decade":1930,"count":43}
{"decade":1940,"count":31}
{"decade":1950,"count":13}
{"decade":2020,"count":1}
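The decade bucketing in that query leans on integer division: dividing the year by 10 drops the last digit, and multiplying by 10 puts a zero back. Here's a minimal Python sketch of the same idea, assuming years are plain integers. The `decade` and `count_by_decade` names and the sample years are made up for illustration and are not pulled from the movie data.

```python
from collections import Counter


def decade(year: int) -> int:
    """Round a year down to the start of its decade, e.g. 1924 -> 1920."""
    return (year // 10) * 10


def count_by_decade(years) -> list:
    """Count years per decade and return (decade, count) pairs, sorted."""
    return sorted(Counter(decade(y) for y in years).items())


# Made-up years, just to show the grouping.
print(count_by_decade([1916, 1919, 1924, 2022]))
```

Note that Python needs the explicit `//` operator to get the truncating behavior; a plain `/` would produce floats and leave the years unchanged after multiplying back.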
Who are these other Bacons?
The query 'Bacon | ! Kevin | year < 1960 | cut title, cast' is too much output, but scanning it doesn't show many Bacon names at all. The rest must be in the extract field (unless “Bacon” is a genre? And why shouldn't it be?! mmmm, bacon).
First, the Bacons that are in title and cast:
❯ zq 'year < 1960 | cut title, cast | Bacon' movies.zng
{title:"The Fireman",cast:["Charlie Chaplin","Edna Purviance","Lloyd Bacon"]}
{title:"The Floorwalker",cast:["Charlie Chaplin","Edna Purviance","Eric Campbell","Lloyd Bacon"]}
{title:"The House of Intrigue",cast:["Mignon Anderson","Lloyd Bacon"]}
{title:"The Girl in the Rain",cast:["Anne Cornwall","Lloyd Bacon"]}
{title:"The Greater Profit",cast:["Edith Storey","Pell Trenton","Lloyd Bacon"]}
{title:"Hearts and Masks",cast:["Elinor Field","Lloyd Bacon","Francis McDonald"]}
{title:"Bringin' Home the Bacon",cast:["Jay Wilsey","Jean Arthur"]}
{title:"Branded Men",cast:["Ken Maynard","June Clyde","Irving Bacon"]}
...gives us Lloyd Bacon, Irving Bacon, and the 1924 movie Bringin’ Home the Bacon.
Now just Bacon from the extract field:
❯ zq 'year < 1960 | cut extract | Bacon
    | bacon:=regexp_replace(this.extract, /.*\W(\w+? Bacon).*/, "$1")
    | cut bacon | sort | uniq' movies.zng
{bacon:"Daskam Bacon"}
{bacon:"David Bacon"}
{bacon:"Irving Bacon"}
{bacon:"Lloyd Bacon"}
{bacon:"the Bacon"}
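The regular expression is doing most of the work there: it replaces the whole extract with just the single word that sits in front of “Bacon”. Here's a Python sketch of the same pattern using the standard `re` module. The `bacon_names` helper and the sample sentences are invented for illustration; they are not from the dataset.

```python
import re

# Same shape as the zq pattern: anything, a non-word character, then the
# one word immediately before " Bacon", then anything.
PATTERN = re.compile(r".*\W(\w+? Bacon).*")


def bacon_names(extracts) -> list:
    """Collapse each extract mentioning Bacon to '<word> Bacon', deduplicated."""
    names = set()
    for text in extracts:
        if "Bacon" in text:
            # Text that mentions Bacon but doesn't fit the pattern
            # (e.g. it starts with the name) is passed through unchanged.
            names.add(PATTERN.sub(r"\1", text))
    return sorted(names)


# Made-up single-line extracts.
samples = [
    "Directed by Lloyd Bacon in 1930.",
    "A film with Irving Bacon.",
    "Also by Lloyd Bacon.",
    "No match here.",
]
print(bacon_names(samples))
```

Using a set and then sorting plays the role of `sort | uniq` in the zq pipeline.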
...with “the Bacon” being our previously found title:
❯ zq '"the Bacon" | cut title' movies.zng
{title:"Bringin' Home the Bacon"}
Alrighty. I think I've had my fill. Good breakfast!
The main point here is to illustrate how simple many of these queries are to write because zq -- in addition to its many other features -- is designed to be a search tool, and because search is the primary Implied Operator, meaning you don't need to type out search in contexts where the tool presumes that's what you meant.
But this sets up what was, for me, a fairly consistent gotchya in my zq learning curve. Click on through to ZqImpliedOperatorsCanTrickYou for more on that.
A couple final things for this post:
• There's a lot more tooling in zq for doing data exploration and normalization. The Zed docs offer a couple of great tutorials that go into more detail: Real-World GitHub Data & Zed and Schools Data.
• I think it'd be interesting to do a deeper dive comparison on searching in zq vs jq in a different post, but we can take a peek in that direction here before we sign off.
I'm sure you can do all of these same queries in jq, but even the basic zq search operator doesn't seem to be as easy in jq, to search through all of the values of objects. Here's what I've come up with so far:
❯ jq 'with_entries(select(.value | tostring | contains("Bacon")))
    | select(. != {})' movies.json
We can put that into a file and re-use it like this:
❯ cat search.jq
def search(term):
with_entries(select(.value | tostring | contains(term))) |
select(. != {});
❯ jq 'include "search"; search("Bacon")' movies.json
...
But more time would need to be spent to do a more thorough side-by-side comparison.
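To make the logic of that jq search helper a bit more concrete, here's the same idea as a Python sketch: stringify each value, keep only the fields that contain the term, and drop records with no hits. The `search` and `search_all` names and the two sample records are made up for illustration, and this mirrors the jq helper above rather than zq's own search semantics.

```python
import json


def search(record: dict, term: str) -> dict:
    """Keep only the fields whose stringified value contains term."""

    def as_text(value) -> str:
        # Strings are used as-is; everything else is JSON-encoded,
        # similar in spirit to jq's tostring.
        return value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)

    return {k: v for k, v in record.items() if term in as_text(v)}


def search_all(records, term: str) -> list:
    """Apply search() to every record and drop the empty results."""
    return [hit for hit in (search(r, term) for r in records) if hit]


# Made-up records in the shape of the movie data.
movies = [
    {"title": "Smile", "year": 2022, "cast": ["Sosie Bacon", "Kal Penn"]},
    {"title": "Up", "year": 2009, "cast": ["Ed Asner"]},
]
print(search_all(movies, "Bacon"))
```

Like the jq version, this returns only the matching fields of each record, not the whole record.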
Up: LearningZq | Next: ZqImpliedOperatorsCanTrickYou
examples use zq version 1.16.0.
In zq you can define your own dataflow operators and functions. All functions need to be called with parens, even argumentless ones, whether built-in or user-defined.
Operators, however, are quirky in that built-in ones cannot be called with parens, but user-defined ones must be.
First off, when do I need to create a function vs. an operator?
The syntax for a user-defined function is:
func <id> ( [<param> [, <param> ...]] ) : ( <expr> )
“where <id> and <param> are identifiers and <expr> is an expression that may refer to parameters but not to runtime state such as this.”
The syntax for a user-defined operator is:
op <id> ( [<param> [, <param> ...]] ) : (
<sequence>
)
“where <id> is the operator identifier, <param> are the [optional] parameters for the operator, and <sequence> is the chain of operators (e.g., operator | ...) where the operator does its work.” And it can refer to this.
To over-simplify:
| User-defined Function | Can only call other functions. Cannot refer to this. |
| User-defined Operator | Can call operators and functions. Can refer to this. |
Though it's more precise to say the body of a user-defined function is an Expression, and the body of a user-defined operator is a Sequence of Dataflow Operators.
Let's make a user-defined operator.
❯ zq -z \
'op add_ids(): (
over this
| yield {id:count(), ...this}
)
yield [{name:"lorem"},{name:"ipsum"}] | add_ids()'
{id:1(uint64),name:"lorem"}
{id:2(uint64),name:"ipsum"}
The user-defined operator add_ids will take an array of records and add an auto-incrementing id integer to each one.
There's some other cool zq things going on here, but for now we'll just focus on calling the operator, with parens. Call it without parens, and again it falls through to the search ImpliedOperator:
❯ zq -z \
'op add_ids(): (
over this
| yield {id:count(), ...this}
)
yield [{name:"lorem"},{name:"ipsum"}] | add_ids'
❯
add_ids without parens is interpreted by zq as search add_ids and the string “add_ids” can't be found in the array of two records.
Prev: ZqFunctionsNeedParens | Up: LearningZq | Next: ZqLateralSubquery
examples use zq version 1.16.0 and jq version 1.7
Knowing the difference between an operator and a function in Zed can be confusing to me because there's not a hard distinction like that in jq (not to mention “operators” in jq docs refer to things like + and -, and zq “dataflow operators” are things like over, yield, and put).
In addition, functions in zq always need to be called with parentheses, even if they have no arguments (like the now function). But in jq, argumentless functions receive their input via the pipe | operator and don't have parens, like Dataflow Operators in zq.
❯ echo "[1,2,3]" | jq 'length'
3
❯ echo "[1,2,3]" | jq 'length()'
jq: error: syntax error, unexpected ')' ... at <top-level>, line 1:
length()
jq: 1 compile error
While I don't have any advice on how to “just know” when a built-in zq “thing” is a dataflow operator vs. a function (I generally have to look all those things up anyway with either tool) the rule to keep in mind in zq is if it's a function, it's gotta have parens.
❯ echo '{a:null}' | zq 'now() | {a:this}' -
{a:2024-07-12T14:25:24.70848Z}
If it doesn't have parens, then in many cases it won't even error out; it'll fall back to the search implied operator (see ZqImpliedOperator).
❯ echo '{a:null}' | zq 'now | {a:this}' -
❯
Prev: ZqPipeCharacter | Up: LearningZq | Next: ZqUserOperatorsNeedParens
examples use zq version 1.16.0 and jq version 1.7
A subset of zq operators are Implied Operators, which means “Zed allows certain operator names to be optionally omitted when they can be inferred from context.” The operators are evaluated in this order:
| Evaluation | Operator | Implied | Explicit |
| search expression | search | foo | search foo |
| boolean expression | where | a >= 1 | where a >= 1 |
| field assignment | put | a:=x+1,b:=y-1 | put a:=x+1,b:=y-1 |
| aggregation | summarize | count() | summarize count() |
| expression | yield | {a:x+1,b:y-1} | yield {a:x+1,b:y-1} |
Note: The -C flag can be passed to zq to output the parsed query with explicit operators.
In many contexts this is really helpful (see ZqSearchIsFirstClass), but as I've been learning zq, it's been confusing at times.
I was experimenting with the example data found in this Brim Data article and put together this query:
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key' openlibrary.json
["OL369643A"]
❯
Browsing the docs for some functions, I decided to try out the lower function, and as is my wont given my jq experience, piped our previous output to lower. But this didn't work, and I got no output:
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key | over this | lower' openlibrary.json
❯
I realized I made an error in how to call the function (see ZqFunctionsNeedParens and ZqPipeCharacter), and fixed it accordingly:
❯ zq 'over docs
| has(author_name)
| grep(/tuta/, author_name)
| yield author_key | over this | lower(this)' openlibrary.json
"ol369643a"
❯
I'm aware that I wrote both the has and grep functions correctly, but I think because lower doesn't require an additional argument, I just fell into a jq habit.
But I am still curious. Why didn't I get an error or something letting me know I wasn't using lower properly? If I make a mistake by passing a non-string into it, I'll get an error:
❯ zq 'lower(1)'
error({message:"lower: string arg required",on:1})
So, why no error here?
❯ zq 'yield "HEY" | lower'
❯
Our friend the -C flag has the answer:
❯ zq -C 'yield "HEY" | lower'
yield "HEY"
| search lower
Ahh. The real reason this returns nothing, not even an error, is search is now the implied operator for the term ‘lower’! It's not parsed as a built-in function because it's also a valid search expression, and because ZqSearchIsFirstClass, which is a good thing in search contexts, that's given the priority.
While this example is a bit contrived, hopefully it highlights a frequent experience for me while learning zq. Remember, if you make a change, and get no output, an Implied Operator is probably in play.
Prev: ZqSearchIsFirstClass | Up: LearningZq | Next: ZqPipeCharacter
examples use zq version 1.16.0 and jq version 1.7
The pipe character appears quite frequently in both jq and zq. Here's an example of an equivalent query in both:
jq -c '[.docs[]
| {title, author_name: .author_name[0], publish_year: .publish_year[0]}
| select(.author_name!=null and .publish_year!=null)
]
| group_by(.author_name)
| [.[] | {author_name: .[0].author_name, count: . | length}]
| sort_by(.count) | reverse | limit(3;.[])' openlibrary.json
zq -j 'over docs
| {title, author_name: author_name[0], publish_year: publish_year[0]}
| has(author_name) and has(publish_year)
| count() by author_name | sort -r count | head 3' openlibrary.json
(Examples taken from this Brim Data article comparing zq and jq).
In jq every program is a series of filters separated by the pipe operator. There are a few places the pipe operator cannot be used, but it's fairly ubiquitous.
zq has a more elaborate and structured syntax. At its highest level, one or more dataflow operators are joined together in a sequence with the pipe character (sequence overview):
operator | operator | ...
But within operators, syntax varies and the pipe character isn't used. Field references and expressions are common, and most functions receive arguments passed in parentheses. Function outputs can only be passed to other functions as nested calls, not via the pipe character.
In this example, the who value is replaced with “me” whenever it's “chrismo”:
❯ zq -j '[{who:"bob"}, {who:"chrismo"}]
| over this
| put who:=replace(who, "chrismo", "me")'
{"who":"bob"}
{"who":"me"}
put is an operation that sets a field to an expression. replace is a function taking three string arguments.
If not all records have a who field, we can use the coalesce function to return an empty string if who is missing, but we cannot use the pipe character like we could in jq between two functions:
❯ zq -j '[{}, {who:"bob"}, {who:"chrismo"}]
| over this
| put who:=((coalesce(who, "") | replace(who, "chrismo", "me"))'
zq: error parsing Zed at line 3, column 39:
| put who:=((coalesce(who, "") | replace(who, "chrismo", "me"))
=== ^ ===
The syntax for put is
[put] <field>:=<expr>
and the pipe character is not valid in an expression**. This can be accomplished by nesting the function calls:
❯ zq -j '[{}, {who:"bob"}, {who:"chrismo"}]
| over this
| put who:=replace(coalesce(who, ""), "chrismo", "me")'
{"who":""}
{"who":"bob"}
{"who":"me"}
**well ... there is an exception to this, but that's covered in ZqLateralSubquery.
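If nested calls feel unfamiliar after jq's pipes, this is the same shape most general-purpose languages use. Here's a tiny Python sketch of the nested coalesce-then-replace call. The `coalesce` and `rename_who` helpers are hypothetical stand-ins written for this example, not Zed's implementations.

```python
def coalesce(*values):
    """Return the first argument that is not None (None if all are)."""
    return next((v for v in values if v is not None), None)


def rename_who(record: dict) -> dict:
    """Set who to 'me' when it is 'chrismo', defaulting a missing who to ''."""
    # The inner call runs first and its result becomes the first argument
    # of the outer call, just like replace(coalesce(who, ""), ...) in zq.
    who = str.replace(coalesce(record.get("who"), ""), "chrismo", "me")
    return {**record, "who": who}


print([rename_who(r) for r in [{}, {"who": "bob"}, {"who": "chrismo"}]])
```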
Prev: ZqImpliedOperatorsCanTrickYou | Up: LearningZq | Next: ZqFunctionsNeedParens
jq code. Recently, I've been checking out zq and evaluating it as a jq replacement.
As I've been learning zq in more detail, while it looks similar on the surface, it's pretty different under the hood and I've been tripped up by a few things along the way. The goal with these posts is to try and flatten the learning curve for folks also coming over from jq.
For a more thorough introduction, the Brim team has a wonderful zq tutorial that helps introduce zq to folks familiar with jq. I also want to thank Phil Rzewski from the Brim team, who's been incredibly helpful, both in their support Slack and in the GitHub repo, fielding my questions patiently and thoughtfully.
While I've tried to be accurate in what is written here, these posts are not designed to be a comprehensive comparison of the two tools. I'd love any feedback, corrections, or clarifications you have. Check out my ContactInfo for how to reach me.
| Post | tl;dr |
| ZqSearchIsFirstClass | Searching is an integral part of aggregation and transformation, but compared to jq, zq search is first class. |
| ZqImpliedOperatorsCanTrickYou | Did you make a change and now there's no output? It's probably the search implied operator hiding the fact that you tried to pipe something to a function like you would in jq, but that's not how it works. See previous post. |
| ZqPipeCharacter | You can't use | everywhere, esp. not in or between Expressions. Only in-between Dataflow Operators. |
| ZqFunctionsNeedParens | Functions always have to be called with parens. If you don't, the search implied operator will getchya. |
| ZqUserOperatorsNeedParens | While built-in dataflow operators do not take parens, user-defined ones must. If you don't, (all together now) the search implied operator will getchya. |
| ZqLateralSubquery | How to drop down a level of JSON while parsing. |
So, with all of these quirks coming over from jq, why bother? In addition to wrangling JSON data, zq has a host of additional features that make it not just a decent replacement of jq, but a solid upgrade.
It performs better and streams data by default (jq can too with a special option); it supports multiple input & output formats, from CSV to Parquet and more (including grokking log files a la Logstash); it can naturally join data together in a relational fashion from either files or its own Data Lake; and it supports a form of gradual typing of your data with Shaping.
- Brooks's Law - Adding people to a late software project makes it later.
- Lewin's Equation - B = f(P, E). An individual’s behavior (B) is a function (f) of the person (P), including their history, personality and motivation, and their environment (E), which includes both their physical and social surroundings.
- Conway's Law - Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure.
- Goodhart's Law - Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
- Hyrum's Law - With a sufficient number of users of an API all observable behaviors of your system will be depended on by somebody.
- Gall's Law - A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.
- McNamara Fallacy - The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
- Seven Deadly Diseases of Management - Lack of constancy of purpose; Emphasis on short-term profits; Annual performance reviews; Manager job hopping within an org; Relying solely on measurable metrics (“the most important figures that one needs for management are unknown or unknowable”); Excessive medical costs; Excessive liability costs.
- deming.org - Every system is perfectly designed to get the results it gets.
- 85/15 Rule - So ... I can't actually trace this back to any reliable source...
- Psychological Safety - “Drive out fear, so that everyone may work effectively for the company.” -- Deming
- Usability Tests only need 5 Users
- Laws of Cooperation - A long list of laws assembled by @bartlog.
These are great. Some of my favs:
- Aesthetic Usability Effect - Users often perceive aesthetically pleasing design as design that’s more usable.
- Hick's Law - The time it takes to make a decision increases with the number and complexity of choices.
- Jakob's Law - Users spend most of their time on other sites. This means that users prefer your site to work the same way as all the other sites they already know.
- Peak-End Rule - People judge an experience largely based on how they felt at its peak and at its end, rather than the total sum or average of every moment of the experience.
- Postel's Law - Be liberal in what you accept, and conservative in what you send.
- Zeigarnik Effect - People remember uncompleted or interrupted tasks better than completed tasks.
The last season of Startup, Gimlet's first podcast, discusses their acquisition by Spotify, and it has some interesting behind-the-scenes stuff (maybe not as much as I'd like, but ...)
One of the ongoing themes is the tension between Alex and Matt, the co-founders of Gimlet. Alex cares more about the quality of the product and Matt cares more about the business needs, and Gimlet was struggling, even in the quarter preceding the acquisition. Neither of them was handling the conflict very well.
In this episode, Thanksgiving in Stockholm, Alex talks with Daniel Ek, CEO of Spotify, after the acquisition, reflecting on conversations they'd had before it.
[36:28] Daniel Ek, Spotify co-founder: It's actually my co-founder's [Martin Lorentzon] saying. He said this thing, I'm not even sure he was aware that he coined what I think is an iconic quote. He said, “the value of a company is the sum of all problems solved.” Even to this day, it's one of those things that I think about. You may think about all the things you guys went through as all the issues that you went through, but you solved them, one by one. And I think the most important thing that you got right is the integrity of the programming and the shows that you make. At the end of the day, that is the value that you're bringing to that and bringing to consumers and it really served you well in the end.
Then Alex sums up his thoughts in a way I don't quite agree with.
[37:20] Alex Blumberg: In other words, Matt's and mine constant fighting had produced something valuable, the fighting itself in fact was the thing that made it better. If I hadn't cared about what I cared about and he hadn't cared about what he cared about and we hadn't each cared enough to fight with each other, the company we built, it wouldn't have worked as well.
It's the caring that matters, not the fighting. It's possible to combine caring with good conflict resolution skills and minimize the fighting. Then things are even better.