Queries as Data

SELECT  *
FROM    (
    SELECT  *
    FROM    mytable sample (0.01)
)
WHERE rownum <= 1000

Let's just admit that SQL is pretty gross. It's ad-hoc, awkward, and so foreign that you find giant SQL queries assembled as template strings (!!) in the wild. It's certainly powerful, but doesn't have very good interfaces to the rest of the world.

Queries deserve to be data. "Turning things that don't seem like data into data so that you can operate on them" is a powerful pattern seen throughout computing history: first-class functions (so useful that we resorted to garbage collection to support them!), "everything is a file" in Unix, code-as-data in Lisp, or even CloudFormation in AWS.

"X should be data" can be a tricky argument to make because people often point out that "X is already data". You can write a C parser in C. You can pass SQL queries around as strings. And they're right - "X should be data" is sloppy shorthand for some harder-to-articulate values like "you should be able to operate on the representation of X in useful ways", or "we should find an internal representation of X that we're comfortable exposing to the user", or better yet "we should try to represent X in a way that somehow corresponds to the semantics of what we interpret X to mean".

Another aspect to "turning something into data" is increasing its portability between environments. Lisp has a natural representation in any language that has a concept of linked lists - the parentheses thing is just incidental. You know something's really data when it transcends any one serialization - something definitely not true for SQL.

So let's go on a safari to hunt for a more structured way of representing queries!

Graphs. You've invented graphs.

We ultimately want to query graphs, and we're working on tooling around graphs, and we're arguing that graphs are a good candidate to be the universal fundamental data model from which all other data models inherit. Maybe queries can be graphs too?

RDF blank nodes are just begging to be interpreted as existential quantifiers, which suggests that JSON-LD can be viewed as a highly constrained declarative read-only query language.
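To make the blank-node reading concrete, here is a minimal sketch in Python of how a nested query object flattens into triple patterns, using the Jane and John Doe query that appears later in this post. The `flatten` helper is hypothetical and only understands `@vocab`, `@type`, `@id`, nested objects, and plain string literals; a real JSON-LD processor does far more (full context expansion, typed literals, lists, and so on).

```python
from itertools import count
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def flatten(query: Dict[str, Any]) -> List[Triple]:
    """Turn a nested query object into triples (simplified, not real JSON-LD)."""
    vocab = query.get("@context", {}).get("@vocab", "")
    labels = (f"_:b{i}" for i in count())  # fresh blank-node labels
    triples: List[Triple] = []

    def visit(node: Dict[str, Any]) -> str:
        # A node without an @id is an unknown: it gets a blank node.
        subject = node.get("@id") or next(labels)
        for key, value in node.items():
            if key in ("@context", "@id"):
                continue
            if key == "@type":
                triples.append((subject, RDF_TYPE, vocab + value))
            elif isinstance(value, dict):
                # Nested objects (including the empty {}) become nodes too.
                triples.append((subject, vocab + key, visit(value)))
            else:
                triples.append((subject, vocab + key, value))
        return subject

    visit(query)
    return triples


query = {
    "@context": {"@vocab": "http://schema.org/"},
    "@type": "Person",
    "name": "Jane Doe",
    "knows": {"@type": "Person", "name": "John Doe", "birthDate": {}},
}

for triple in flatten(query):
    print(triple)
```

Each object without an `@id` becomes one blank node, so this query has three unknowns: Jane, John, and John's birthdate. Read existentially, the query says "there exist three things such that all of these triples hold".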

This can be implemented as subgraph matching combined with JSON-LD's framing algorithm. Resolving the query means looking for assignments of URIs or literals to all of its blank nodes such that the resulting subgraph is contained in the database. Once a complete set of satisfying assignments has been found, the resulting subgraph can be "framed" by the original query to return a JSON object to the user that's structured the same as the query they gave.
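The matching step can be sketched as a naive backtracking search. This is only an illustration under simplifying assumptions, not a tuned implementation: terms are plain Python strings, anything starting with `_:` is treated as a blank node, and the contents of `database` are invented here to fit the Jane and John Doe example that follows. The names `unify` and `solve` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
JANE = "http://people.com/jane"
JOHN = "http://people.com/john"


def is_blank(term: str) -> bool:
    """Blank nodes play the role of query variables."""
    return term.startswith("_:")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for want, have in zip(pattern, triple):
        if is_blank(want):
            # Bind the blank node on first sight; afterwards it must agree.
            if extended.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return extended


def solve(patterns: List[Triple], database: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every assignment that maps all patterns into the database."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in database:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from solve(rest, database, extended)


# Assumed database contents (not from the original post).
database: List[Triple] = [
    (JANE, RDF_TYPE, SCHEMA + "Person"),
    (JANE, SCHEMA + "name", "Jane Doe"),
    (JANE, SCHEMA + "knows", JOHN),
    (JOHN, RDF_TYPE, SCHEMA + "Person"),
    (JOHN, SCHEMA + "name", "John Doe"),
    (JOHN, SCHEMA + "birthDate", "1969-12-31T23:00:00.000Z"),
]

query: List[Triple] = [
    ("_:b0", SCHEMA + "knows", "_:b1"),
    ("_:b0", SCHEMA + "name", "Jane Doe"),
    ("_:b0", RDF_TYPE, SCHEMA + "Person"),
    ("_:b1", SCHEMA + "birthDate", "_:b2"),
    ("_:b1", SCHEMA + "name", "John Doe"),
    ("_:b1", RDF_TYPE, SCHEMA + "Person"),
]

solutions = list(solve(query, database))
# Substituting an assignment back into the query gives the matched subgraph.
subgraph = [tuple(solutions[0].get(t, t) for t in p) for p in query]
print(solutions[0])
```

Note that this sketch lets two different blank nodes bind to the same term, and it scans the whole database for every pattern. A real store would use indexes and order the patterns by selectivity.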

The query below is implicitly asking for a person named Jane Doe who knows a person named John Doe who has a birthdate.

There's no "order" to this query - you can look at it as asking for Jane's friend John's birthday.

{
	"@context": {
		"@vocab": "http://schema.org/"
	},
	"@type": "Person",
	"name": "Jane Doe",
	"knows": {
		"@type": "Person",
		"name": "John Doe",
		"birthDate": {}
	}
}

The result:

{
	"@id": "http://people.com/jane",
	"knows": {
		"@id": "http://people.com/john",
		"birthDate": "1969-12-31T23:00:00.000Z"
	}
}

The query as triples:

_:b0 <http://schema.org/knows> _:b1 .
_:b0 <http://schema.org/name> "Jane Doe" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
_:b1 <http://schema.org/birthDate> _:b2 .
_:b1 <http://schema.org/name> "John Doe" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

An assignment of its blank nodes:

_:b0 <- <http://people.com/jane>
_:b1 <- <http://people.com/john>
_:b2 <- "1969-12-31T23:00:00.000Z"

The result as triples:

<http://people.com/jane> <http://schema.org/knows> <http://people.com/john> .
<http://people.com/john> <http://schema.org/birthDate> "1969-12-31T23:00:00.000Z" .

GraphQL's wild popularity is a testament to how much developers really like this isomorphic-style interface. SQL is powerful, but a huge barrier for most web developers, who'd much rather use something that looks like JSON to ask for their JSON data.

(related to subgraph isomorphism, but since