SqlSets

September 04, 2003
It's always fascinating to me how simple it is for the human brain to conceive of queries that are then (relatively) difficult to break down for computer consumption.

A high-school teacher friend of mine and I were recently discussing the design of a simple database to track student tardies. A bit into it, we came across a needed query: “Show me all tardies for students with 3 or more tardies where at least one tardy has not been previously reported on.”**

What the brain is able to quickly express as one set is actually composed of three sets:

- First, show me all students with three or more tardies.
- Second, show me all students with tardies in the first set that have not been reported.
- Third, show me all tardies for students in the second set.

What's remarkable to me is that this step of breaking down the final set into its components is generally difficult for people to do, yet it's not hard to come up with the original requirement.

In addition to it being hard to break down, it's easy during the breakdown to get it wrong. For example, I might tell the computer to do the following:

- First, show me all students with three or more tardies where the tardy hasn't been reported.

Is that right? Well, probably not. There's a good chance a few different SQL statements might come out of that English statement, but I'd probably write one that gave me this:

- All students with three or more tardies, where all tardies are unreported.

That's not the same thing.

This also brings to mind the importance of testing queries, because if the above, incorrect query were written against a large enough dataset, it could be easily mistaken for the correct query. In many cases, I don't end up doing a full blown test of the query on sample sets, but I do usually write some spot-check queries to help expose some of the assumptions. For example, in this case, finding a student with 3 reported tardies and a 4th unreported one, and making sure they show up in the final results.

Why am I writing about all this? To help right the wrongs of unreasonable expectations.

Many times I find myself and/or the people I work with frustrated at how complex the solution to a seemingly simple problem has become. The frustration seems rooted in inflated expectations. “If it was simple for me to devise the requirement, why is it so blastedly difficult to get the computer to provide it?” Because it just is, and this small example is a good window into the great divide between the human brain and the pedantic computer.


**Now that I re-read the original statement, I realize that even that one was clarified from the original original statement from my friend which came about, ironically, in two statements: “I need to see all students who have had three or more tardies, and send the student a report of those tardies. Then I need to make sure I don't resend a tardy report to the same student, unless they've accrued additional tardies.”

It was a no-brainer for him to come up with his requirements -- it's his job, he knows it well. It then took a little bit of work to get the query down to one English statement.



tags: ComputersAndTechnology ThePedanticComputer