LearningZq

July 07, 2024
Over the past couple of years in my ops work, I've built up a fair amount of jq code. Recently, I've been checking out zq and evaluating it as a jq replacement.

As I've learned zq in more detail, I've found that while it looks similar to jq on the surface, it's pretty different under the hood, and I've been tripped up by a few things along the way. The goal of these posts is to flatten the learning curve for folks also coming over from jq.

For a more thorough introduction, the Brim team has a wonderful tutorial that introduces zq to folks familiar with jq. I also want to thank Phil Rzewski from the Brim team, who's been incredibly helpful, both in their support Slack and in the GitHub repo, fielding my questions patiently and thoughtfully.

While I've tried to be accurate in what is written here, these posts are not designed to be a comprehensive comparison of the two tools. I'd love any feedback, corrections, or clarifications you have. Check out my ContactInfo for how to reach me.

Post: tl;dr
ZqSearchIsFirstClass: Searching is an integral part of aggregation and transformation, but compared to jq, zq search is first class.
ZqImpliedOperatorsCanTrickYou: Did you make a change and now there's no output? It's probably the search implied operator hiding the fact that you tried to pipe something to a function like you would in jq, which isn't how zq works. See previous post.
ZqPipeCharacter: You can't use | everywhere, especially not in or between Expressions. Only between Dataflow Operators.
ZqFunctionsNeedParens: Functions always have to be called with parens. If you don't, the search implied operator will getchya.
ZqUserOperatorsNeedParens: While built-in dataflow operators do not take parens, user-defined ones must. If you don't, (all together now) the search implied operator will getchya.
ZqLateralSubquery: How to drop down a level of JSON while parsing.


So, with all of these quirks to trip over when coming from jq, why bother? In addition to wrangling JSON data, zq has a host of additional features that make it not just a decent replacement for jq, but a solid upgrade.

It performs better and streams data by default (jq can stream too, with a special option). It supports multiple input and output formats, from CSV to Parquet and more, including grokking log files a la Logstash. It can naturally join data together in a relational fashion, from either files or its own Data Lake, and it supports a form of gradual typing of your data with Shaping.