Building a Native GraphQL Database: Challenges, Learnings and Future

"GraphQL is not a Graph DB query language --- it's a replacement for REST APIs."

That's my standard opening statement whenever I talk about Dgraph's choice of query language.

Most people who don't know GraphQL assume it's a graph database query language. Most people who know GraphQL wonder how a graph database is being powered by it.

Our focus at Dgraph has always been on helping you build apps using an amazingly advanced graph database. And that extends to making sure Dgraph has a great development experience for modern developers. Other graph databases choose Gremlin, Cypher or write something custom inspired by SQL for their query language. We went a slightly different route in Dgraph, aiming to build a query language that modern developers would find useful and intuitive.

Our journey with GraphQL started almost 4 years ago, when I wrote this email to my ex-manager at Google.

October 22, 2015

I wanted your opinion on Graph languages. I'm working on building Dgraph outside of Google -- distributed low latency graph serving system essentially. This project is going to be open-source after the MVP launch in November. It's still early stage, and I'm debating which graph query language to support. Facebook just launched GraphQL, which seems pretty nicely written to me. But, I've also heard a lot about Gremlin. What do you think of them? I don't want to stretch out too thin and support both, at least at this stage. Which one do you think would be worth aiming at (given it's a new graph database)?

The reply was quick, short and sweet.

I like GraphQL, it has most of the nice properties of MQL[^1]. Gremlin has more Hadoop support if that matters.

[^1]: See here a rough comparison of MQL vs GraphQL.

December 1, 2015

Thanks for your advice! Went with GraphQL, quite like the query language so far. Just released: https://github.com/hypermodeinc/dgraph

And that was how GraphQL became the native query language for a new database called Dgraph. Dgraph v0.1 supported (a version of) GraphQL. Dgraph v1.1, which launched last month, still only supports GraphQL. Neither version ever really achieved spec compliance with GraphQL, but for different reasons: the first was too early to achieve feature parity, and the latter deliberately deviated from the spec.

In this post, we talk about GraphQL and the choices we've made in moving beyond the GraphQL spec to build a real graph database query language. We talk about the common issues GraphQL engineers face and how Dgraph solves them. We then come full-circle back to the GraphQL spec and our next releases which will provide native GraphQL backed by Dgraph.

What is GraphQL

As per the spec, GraphQL is a query language designed to build client applications by providing an intuitive and flexible syntax for describing data requirements. In layman's terms, GraphQL gives clients a way to understand what the server provides and to request specifically the data that is useful to them.

Typical access patterns for applications tend to be graph shaped. For example, a question has answers, and comments; answers have comments; comments can have comments; all might have likes, and so on. Outside the general understanding, there's no real mention of or association with the backend system being a graph system in the spec. Both the July 2015 draft and the current working draft mention the word "graph" only twice: once referring to a fragment (a concept in GraphQL) and once in a comment.

Despite Lee Byron's reservations, GraphQL is widely assumed to be a REST API replacement. It provides a simple way for developers to retrieve only as much of their strongly-typed dataset as they need, and in the shape in which they need it.

REST API endpoints are fixed. Each endpoint would give you a pre-determined amount of data, irrespective of how much data the client needs. Squeezing efficiency in such a system leads developers to generate an ever-increasing number of endpoints, each returning a contained set of results. The client would then call them in sequence to generate a view for the end-user. There are guidelines around what would be an HTTP POST, what would make a PUT, what would be a GET, and what the return error codes would be. By now, most developers are supposed to know these and be good at determining when to use what. These make for clichéd interview questions, which don't guarantee you a tech job if you get them right, but surely turn the interview negative if you happen to falter.

GraphQL turns all of this on its head. There's only one endpoint, most accesses are POST, all return codes are HTTP 200, with the errors captured in a JSON response. The server publishes what all the clients can access, and the client determines how much it wants to access, depending upon its use case. It is simple and powerful.
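To make the contrast with REST concrete, here is a minimal Python sketch of what a GraphQL exchange looks like from the client side. Nothing is sent over the network: the response body is a hand-written example, and the field names (`queryQuestions`, `title`) are assumptions chosen for illustration.

```python
import json

def build_request_body(query, variables=None):
    """Serialize a GraphQL operation into the JSON body of a single POST."""
    return json.dumps({"query": query, "variables": variables or {}})

def unpack_response(body):
    """Split a GraphQL JSON response into (data, errors).

    The HTTP status is typically 200 either way, so the client has to
    inspect the "errors" key instead of the status code.
    """
    payload = json.loads(body)
    return payload.get("data"), payload.get("errors", [])

request_body = build_request_body("{ queryQuestions { title } }")

# Hand-written example of a failed response (an assumption, not real server output).
example_response = json.dumps({
    "data": {"queryQuestions": None},
    "errors": [{"message": "not authorized", "path": ["queryQuestions"]}],
})

data, errors = unpack_response(example_response)
print(data)
print([e["message"] for e in errors])
```

The point of the sketch is that error handling moves out of HTTP status codes and into the response payload, which is why one endpoint is enough.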

Considering GraphQL is put in the same category as REST, gRPC and webhooks, calling it a "QL" (Query Language) is a bit of a stretch. It is really a syntax for describing an API. If you take up GraphQL, you get to build whatever API you like and use GraphQL to describe the API's types and operations. Those things then describe the data that your API can return.

Why choose GraphQL?

GraphQL had never been used as a query language for any database system. We bought into the simplicity and power of this system, but also felt like we were taking a risk in going with unproven technology, particularly considering it is a syntax for describing an API (like REST) rather than a query language.

For Dgraph, we needed a graph query language. And a language powerful enough to describe all the sorts of graph queries that we wanted to enable for users. It turns out, however, that a GraphQL-like syntax is a great place to start for two main reasons:

The query structure naturally models the graph that the query is traversing. A GraphQL query is like a structured explanation of the graph you'd like as a result. Other graph query languages, and even SQL joins, follow edges in the graph but return lists of results, losing the relationships between the entities in the response. GraphQL, on the other hand, responds in the same structure as both the graph itself and the query tree, allowing the relationships to stay intact. The bracketing and indentation are so revealing of the query, graph traversal and intended result that at a glance you know where it is going. So one can, for example, figure out exactly how a director is connected to an actor and through which movie.

The JSON result format of GraphQL not only naturally models the returned graph, but it's also the most useful structure for developers to consume. Graph results that are embedded in other formats are difficult to consume for developers. Special tools and libraries are required. Your language might not have a tool for the format, and it's not easy to pass the result on to other parts of your system because those too need to be dependent on the workings of the format. Those formats get in the way of developers trying to use the graph results. But a GraphQL result works with any JSON parser, in any language, and naturally maps to the types you're using in your programming language. GraphQL is easy to consume and just as easy to pass on, knowing that the next developer is also happy.
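As a rough illustration of how a nested JSON result maps onto ordinary program types, the Python sketch below parses a hand-written result into dataclasses. The field names (`director`, `films`, `name`) and the sample data are assumptions made for this example, not output from a real Dgraph query.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Movie:
    name: str

@dataclass
class Director:
    name: str
    films: List[Movie]

# Hand-written result in the nested shape a GraphQL-style query returns (illustrative data).
result_json = '''
{
  "director": [
    {"name": "Rob Reiner", "films": [{"name": "The Princess Bride"}, {"name": "Stand by Me"}]}
  ]
}
'''

def parse_directors(raw):
    """Map the nested JSON straight onto nested dataclasses."""
    payload = json.loads(raw)
    return [
        Director(name=d["name"], films=[Movie(name=f["name"]) for f in d.get("films", [])])
        for d in payload["director"]
    ]

directors = parse_directors(result_json)
for d in directors:
    print(d.name, [m.name for m in d.films])
```

No graph-specific client library is involved: the standard JSON parser is enough, and the relationship between a director and their films survives as plain nesting.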

We did not even fathom how wildly popular GraphQL would become. While we were focused on building the most advanced graph database in the market, the popularity of GraphQL intermixed with the popularity of Dgraph. One year after the original email went out, an article came out hailing Dgraph as a potential de facto GraphQL database.

Hitting Potholes: The Challenges

We found the article encouraging, and it perfectly captured our efforts. What it said resonated with us, but also exposed some gashes from our year of effort.

GraphQL is not a Database Query Language, but it can be --- Article

By this time, we were having doubts about whether GraphQL can really be a query language for a database. We were no longer convinced that the official spec matched the needs of a database.

Mutations

The original GraphQL draft spec was light on how mutations should be executed. The idea was (and still is) that the client would call mutation functions, with arguments, which are implemented by custom code written in the backend. Expecting a database user to provide a plugin function for data ingest just does not make sense. Moreover, any basic function we could have provided wouldn't work for the thousands of updates Dgraph would see in a single network call.

So, we realized we needed to build something custom into the spec to allow for inserting and modifying data in a standardized way. We achieved this by allowing RDF data input in mutation blocks. This was the first major foray outside the official spec for Dgraph. Later, we extended this to allow JSON inputs.
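For readers unfamiliar with RDF, the Python sketch below renders a few triples as N-Quads style lines, the general shape of data such a mutation block carries. It is only an illustration: the exact mutation syntax Dgraph accepts is not shown in this post, and the `to_nquad` helper and the predicates (`name`, `friend`) are hypothetical.

```python
def to_nquad(subject, predicate, obj, obj_is_node=False):
    """Render one triple as an N-Quads style line.

    Blank nodes are written as _:label; literal values are quoted.
    """
    if obj_is_node:
        rendered = obj
    else:
        escaped = str(obj).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f"{subject} <{predicate}> {rendered} ."

# Hypothetical data: two people and one edge between them.
triples = [
    ("_:alice", "name", "Alice", False),
    ("_:bob", "name", "Bob", False),
    ("_:alice", "friend", "_:bob", True),
]

lines = [to_nquad(*t) for t in triples]
print("\n".join(lines))
```

Because each line is an independent statement, thousands of them can be sent in one request, which is the bulk-update case that per-mutation resolver functions handle poorly.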

Variables and Query Blocks

Databases typically allow a way to temporarily hold a result from one step in a query, using it later to execute another step in the query. This is the "SQL AS" query concept --- something that's missing from GraphQL. We introduced this as variables in Dgraph, allowing a user to query for some data and use its result in another portion of the same query.

The interplay of variables and query blocks adds significant query power to Dgraph.

GraphQL realized that avoiding repeated calls to the database is a great practice, and has the concept of query blocks. That is, within the same query { ... } block, many queries (executed concurrently) are allowed. Dgraph's variables extend that further.

Dgraph allows a user to specify many different blocks to not only allow concurrent execution of these blocks for speed improvements, but also to divide the query up into function-like structures, making it easier to read and understand, and to share values between them.

In Dgraph, one block can define variables that are used by other blocks. Dgraph then orders block execution in the right way to run the entire query, sometimes running blocks concurrently, sometimes in serial order, to execute the query in the most efficient way possible.

Dgraph extends the GraphQL syntax so that any part of a query can be collected in a variable. Dgraph adds A as ...some-query-field... to mean variable A takes on the values (nodes or scalar values) calculated by the following field.

Doing this allows running a query that's dependent on a previous result without having to read the result back to the client, deserialize and pass parameters to a new query. It also means you can return multiple related subgraphs from a single query and that you can reshape the result to fit the client's requirement. With Dgraph's built-in aggregation, you can also use variables to build calculations into a query. You can even break out sub results and use those elsewhere.

For example, the query below finds a movie, calculates the number of roles the actors in that movie have had, and then returns that as a result about actors (ordered by the number of roles), rather than a result about movies as per the initial query. All that in one network call!

query {
  var(func:allofterms(name@en, "The Princess Bride")) {
    starring {
      actors as performance.actor {
        roles as count(actor.film)
      }
    }
  }

  totalRoles(func: uid(actors), orderasc: val(roles)) {
    name@en
    numRoles : val(roles)
  }
}

Again that's a great feature for developers using Dgraph: you don't have to orchestrate multiple queries and input parameters and you don't have to dig through a whole result graph to get the data that matters for particular operations; you can just return multiple subgraphs that make sense, deserialize into a model that works for your algorithms and off you go.

Dgraph's query blocks add significant extra query power, while also simplifying queries and reducing the network calls from the client to the server, and this is an easy win for Dgraph users.
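The idea of ordering blocks by the variables they define and use can be sketched as a small dependency planner in Python. This is not Dgraph's actual scheduler, just a minimal model of it: `plan_stages` is a hypothetical helper, the `var` and `totalRoles` blocks mirror the query above, and `movieInfo` is an invented independent block.

```python
def plan_stages(blocks):
    """Group query blocks into stages.

    `blocks` maps a block name to (defines, uses): the variables the block
    defines and the variables it reads. Blocks in the same stage have no
    unmet dependencies and could run concurrently; stages run in order.
    """
    defined = set()
    remaining = dict(blocks)
    stages = []
    while remaining:
        ready = sorted(
            name for name, (_, uses) in remaining.items() if set(uses) <= defined
        )
        if not ready:
            raise ValueError("circular or undefined variable dependency")
        stages.append(ready)
        for name in ready:
            defined |= set(remaining.pop(name)[0])
    return stages

blocks = {
    "var": (["actors", "roles"], []),
    "totalRoles": ([], ["actors", "roles"]),
    "movieInfo": ([], []),  # hypothetical block with no variable dependencies
}
stages = plan_stages(blocks)
print(stages)
```

Here `var` and `movieInfo` land in the first stage and could run concurrently, while `totalRoles` has to wait for the variables that `var` defines.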

Filters and Functions

The first step in any database query is to reduce the search space. The query select * from X where Y=Z reduces the search space to table X and, in particular, to only those rows where column Y has the value Z. select * from * does not exist, because a database would not know where to begin looking.

The GraphQL draft spec had examples which all started with a certain id. While it would be great to know the id of an object, it would be incredibly hard for a developer to code that way. Therefore, we needed a way for a user to give a starting point.

Dgraph introduced the concept of functions, oftentimes powered by indices. For example, a full-text index allows a user to search for entities using the anyoftext function. Similarly, a trigram index allows using the regexp function.

Furthermore, we needed a way to filter the results, and not only one level of filtering, but allowing the composition of these filters to build an entire filter tree. We introduced brackets and operators to allow for these.

// Dgraph
@filter((anyoftext(title, "jump") OR eq(name, "brown fox")) AND (NOT regexp(title, "A quick.*")))

// (How it might look in) GraphQL
filter: { and: { or: { anyoftext: { title: "jump"}, eq: { name: "brown fox"} }, not: { regexp: { title: "A quick.*" }}}}

GraphQL would typically require each server to implement its own filters (normally as arguments on fields), expressed as JSON-style input objects. That syntax is not as intuitive, particularly when dealing with filter trees like the one above. For example, compare the Dgraph infix syntax with the JSON-style GraphQL syntax above.
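Whichever syntax is used, the filter is a tree of boolean operators over leaf functions. The Python sketch below evaluates such a tree against in-memory records. It is a simplified stand-in, not how Dgraph executes filters: the real anyoftext uses a full-text index with stemming and stop words, while this version only does whole-word matching, and the sample records are invented.

```python
import re

def evaluate(node, record):
    """Recursively evaluate a filter tree against one record (a dict)."""
    op = node[0]
    if op == "and":
        return all(evaluate(child, record) for child in node[1:])
    if op == "or":
        return any(evaluate(child, record) for child in node[1:])
    if op == "not":
        return not evaluate(node[1], record)
    if op == "eq":
        _, field, value = node
        return record.get(field) == value
    if op == "anyoftext":
        # Simplified stand-in: true if any query word appears in the field.
        _, field, text = node
        words = set(record.get(field, "").lower().split())
        return any(w in words for w in text.lower().split())
    if op == "regexp":
        _, field, pattern = node
        return re.search(pattern, record.get(field, "")) is not None
    raise ValueError(f"unknown operator: {op}")

# Same shape as the filter above.
tree = (
    "and",
    ("or", ("anyoftext", "title", "jump"), ("eq", "name", "brown fox")),
    ("not", ("regexp", "title", "A quick.*")),
)

records = [
    {"title": "The fox will jump", "name": "red fox"},
    {"title": "A quick jump", "name": "brown fox"},
    {"title": "Lazy dog", "name": "brown fox"},
]
matches = [r["title"] for r in records if evaluate(tree, r)]
print(matches)
```

The second record passes the OR branch but is rejected by the NOT branch, which shows why composition into a tree matters more than any single filter.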

Plus Minus

Dgraph introduced concepts like Facets, which are key-value pairs on edges, adding properties to the edge instead of the node. Among other things, Facets are great for storing the weight of an edge to support queries like shortest path or k-shortest paths.
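To show why a weight stored on the edge is useful, here is a minimal Python sketch of Dijkstra's algorithm over edges that carry a facet-like dictionary. It models the idea only and is not Dgraph's shortest-path implementation; the `weight` key, the `shortest_path` helper and the sample graph are assumptions.

```python
import heapq

def shortest_path(edges, source, target):
    """Dijkstra over edges given as (from, to, facets), using facets["weight"]."""
    graph = {}
    for src, dst, facets in edges:
        graph.setdefault(src, []).append((dst, facets["weight"]))

    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]

# Hypothetical graph: each edge carries its properties as a facet-like dict.
edges = [
    ("A", "B", {"weight": 4}),
    ("A", "C", {"weight": 1}),
    ("C", "B", {"weight": 2}),
    ("B", "D", {"weight": 5}),
]
cost, path = shortest_path(edges, "A", "D")
print(cost, path)
```

The direct A to B edge costs more than the detour through C, so the weight has to live on the edge itself; a property on either node could not express that.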

Dgraph also has many other features from full-text search to geo search, to aggregations, built-in pagination support, and recursive queries. We've packed a whole lot into GraphQL+-.

We also felt that concepts such as unions, enums, and interfaces are not what a database should implement natively. For these, and many more similar reasons, we deviated just enough from the GraphQL spec that we could no longer call it GraphQL. So, we switched the name of our query language to GraphQL+-. Plus, because we added things to the query language (vars, blocks, etc.), and Minus, because we removed things from it (unions, enums, etc.).

Ideas were floated to change the name entirely to something else, but we felt there must be a way to converge these two worlds sometime in the future. We did not want to deviate from the spec; we just had to do so in order to continue using GraphQL as our native query language while building a graph database.

In retrospect, if we had really wanted to, we could have built ugly workarounds to stay within the confines of GraphQL. But our goal was to have a simple and intuitive query language, and that meant pushing the boundaries to see what's possible, particularly at that early stage when shaping the database.

The Learnings

Developers often come to us to ask about spec-compliant GraphQL support. They love Dgraph and what GraphQL+- has to offer, but also want spec-compliant GraphQL. It's not something we considered initially because our focus was on a great language and fast queries. But we can understand the reason for the question: developers want the Dgraph experience and query power, but with integration into GraphQL's growing ecosystem, tooling and development workflows.

Dgraph's unique approach to graphs solves common GraphQL problems by default, without any special code or handling.

Dgraph Solves GraphQL Problems

GraphQL has bought you a simple language for clients and architecture for your server, but now you've got to engineer solutions around production issues. It's a nice architecture for turning something that's not a graph into an API that serves a graph, but that comes with a host of engineering and performance concerns, in particular, related to the N+1 problem. There's caching, batching and data loading to help, but they aren't a panacea and can introduce latencies and complexity of their own.

All of these issues arise because of the mismatch between the underlying data model and how the GraphQL server must query and present that data.

Query Execution: Avoiding the N+1 Problem

The N+1 problem occurs when the GraphQL resolvers make multiple trips to the database/endpoint to resolve a single query, with one resolver invocation per field invocation in the query. The number of these round trips is directly proportional to the number of results in the intermediate steps.

query {
  queryQuestions(...) {
    title
    body
    answers {         # N answers
      text
      comments {      # N * M comments
        text
        likes { ... } # N * M * O likes
      }
    }
  }
}

For example, to get all the answers on a question and all their comments, you would make one query for the question and one query for its N answers, but then N queries for the comments, one per answer. As you go deeper to get likes on these questions, answers and comments, the round trips multiply: N*M calls for the likes on the comments (M comments per answer), and so on...

Many solutions are floated to avoid this problem, the most popular one being the data loader pattern. In the data loader pattern, every invocation of a resolver gets replaced by a promise to go load that piece of data, with a final step that executes those promises and fetches all those pieces in a batched call.

Depending upon how the implementation is done, this might result in k or more calls to the database/endpoint, where k = number of fields in the GraphQL request. Someone smarter might come along and further reduce the k calls by understanding that title and body resolvers are part of the same SQL table and therefore can be fetched together. But, that'd be an optimization over the same k order of network call complexity.
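The difference between the naive pattern and per-level batching can be counted with a small Python model. This is a counting sketch under simplifying assumptions, not a real resolver or data loader: every nested object has the same fan-out, each level is fetched by one kind of call, and `make_tree` and `count_round_trips` are hypothetical helpers.

```python
def make_tree(fanouts):
    """Build a nested object tree, e.g. [3, 2] = 3 answers with 2 comments each."""
    if not fanouts:
        return {"children": []}
    return {"children": [make_tree(fanouts[1:]) for _ in range(fanouts[0])]}

def count_round_trips(question):
    """Return (naive, batched) database round trips for one nested query.

    naive:   one call per parent object at every level (the N+1 pattern).
    batched: one call per level, as a data-loader style batcher would issue.
    """
    naive = 1      # fetch the question itself
    batched = 1
    level = [question]
    while any(node.get("children") for node in level):
        naive += len(level)   # one call per parent to load its children
        batched += 1          # one batched call for the whole level
        level = [child for node in level for child in node.get("children", [])]
    return naive, batched

# 10 answers, 5 comments per answer, 4 likes per comment.
naive, batched = count_round_trips(make_tree([10, 5, 4]))
print(naive, batched)
```

In this model the naive count grows with the size of the intermediate results (1 + 1 + 10 + 50 here), while the batched count grows only with the depth of the query, which is the k-order behavior described above.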

The k calls can be even worse if the underlying system is distributed. Instead of having a SQL DB running on just one server, it could be a cluster of SQL DBs running independently, put together by a thin layer, or a NoSQL database, etc. In those cases, the data loader pattern would result in more than k network calls as it becomes aware of how the data is sharded, etc.

Dgraph avoids this problem completely. Any GraphQL query over Dgraph can be consumed whole by the database in a single network call. Dgraph has its own lexer and parser for GraphQL, allowing it to parse and execute it in one shot. So, instead of doing many iterations, it can execute the query in as few disk seeks and (in a distributed Dgraph cluster) as few network calls as possible, utilizing the power of Dgraph's unique graph storage and execution. For the end-user, there's no need to write a resolver, no need for a data loading framework, no batching, etc. It all just works with a single call to the DB.

See, with Dgraph, instead of trying to squash a graph on top of a relational or NoSQL database and thus introducing N+1 problems, you can now just query for the subgraph you need and operate over that. You don't have to over-fetch or make multiple round trips; you can just ask for the subgraph that's needed to answer the query in a single call, and that's efficient and powerful, particularly when coupled with the advanced operations that Dgraph introduces to the QL.

Dgraph: The GraphQL Database

Dgraph's basis in GraphQL means that it is the closest thing available to a native GraphQL database. Because we spent years of engineering effort focused on making the world's best graph database, we have, as a result, also solved the engineering challenges that GraphQL adopters are hitting and having to build bespoke solutions for. Issues like N+1 problems and scale are solved in the database (including under- and over-fetching, overloading the endpoint with tons of calls, etc.). So a GraphQL solution built with Dgraph doesn't have the engineering concerns that keep GraphQL engineers up at night.

The Dgraph community realizes this power, which is why there has been a demand for official GraphQL spec support. How Dgraph should do GraphQL has become a hotly debated topic in our forums. Dgraph is simultaneously hailed and critiqued due to its love-hate relationship with GraphQL.

GitHub issues about supporting GraphQL get submitted and are much commented on. Our most commented issue is our 2019 Roadmap, where many of the comments are about GraphQL support, and our second most commented issue is "Make Dgraph work with standard GraphQL".

Ideas are thrown around about how GraphQL+- could become GraphQL compliant. Multiple GraphQL to GraphQL+- conversion layers have been built by our community members (for example, here and here). The demand has become too strong to ignore. Ultimately, we feel interoperability with the growing GraphQL ecosystem is too important for our users to not be addressed.

For a while, we have been thinking about a good way to bridge that gap. Here is a database that began with a version of GraphQL as its native query language. And four years later, this is still the only query language it speaks and executes natively. Meanwhile, the community has built social apps on top of Dgraph, in some cases replacing their Postgres and MongoDB instances to gain the simplicity of GraphQL and the power of native GraphQL execution. Dgraph is fast, scalable and solves many common GraphQL problems. Yet, Dgraph is a spectator in the GraphQL arena. That has to change.

Today, we are changing that.

We mentioned folks building a GraphQL to GraphQL+- conversion layer. Michael Compton was one of those folks. When we saw that project, we reached out to him to join the Dgraph team and build an official one. Over the past few months, Michael and his team have been hard at work building native, spec-compliant GraphQL support into Dgraph, tapping into the power of GraphQL+-.

We are proud to launch the beta release of Dgraph GraphQL. It is rough around the edges, the website needs work, but it is ready for a spin! So, bring your favorite GraphQL GUI, React library, or language you are using to implement your GraphQL server and check it out here!

The Future

Dgraph has the advantage of a native GraphQL engine, and we plan to adapt it to efficiently serve common GraphQL features like subscriptions, live queries and more. Over time, we envision the distinction between GraphQL+- and the GraphQL spec becoming smaller and smaller, until we ultimately merge the two worlds into a single powerful system.

GraphQL scalability is a work in progress for many companies. We plan to take that problem head-on with our already scalable and distributed engine, solving the needs of a startup which has to build an app using the power of GraphQL, while also solving the needs of a big company, which is struggling with performance issues from existing GraphQL solutions.

We at Dgraph bet on GraphQL before it reached ubiquity. We then joined the GraphQL Foundation as a founding member. And today, we reassert our faith that GraphQL will change the landscape of how apps are built. At the same time, we recognize that the GraphQL landscape needs performant native solutions, which are not just bolt-ons on top of an unrelated database.

We're also proud to sponsor the GraphQL Summit in San Francisco this week. If you're there, come by our booth and talk to us about Dgraph, our native GraphQL support and all things GraphQL.