Piloting a web browser with GraphQL

July 19, 2017


Driving the complex and confusing web drivers out there

Up until this point, browser automation and programmatic access have been a precarious challenge. The browser vendors that did offer programmatic access (PhantomJS, Casper and others) were a step in the right direction, but had challenges keeping up with the modern web. Once you did land on a browser, you were at the mercy of the available libraries out there for it. Mix in the noise of HTML, the rapidly changing JavaScript APIs, and the frequent release of new frameworks, and you can begin to see why most folks won’t even touch anything related to programmatic browsers.

In order to offer universal access to a live browser across a multitude of languages, a well-understood protocol is a must. The obvious choice for this task is HTTP, as it’s ubiquitous among runtimes, usually with a helping of REST. While most would agree that HTTP is likely the best candidate, the interface begins to crumble when good old REST enters the picture.

Imagining a browser interaction with REST


Just the simple action of loading a webpage and clicking a link is lost in the mire of the URL (and we haven’t even encoded the link!). Could you even imagine trying to implement such an interface?
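To make the problem concrete, here is a hypothetical sketch of what squeezing two actions into a REST-style URL might look like once the target address is encoded. The `/navigate/.../click/...` route shape and the `localhost:4000` host are invented purely for illustration; no tool mentioned here exposes such a route.

```javascript
// Hypothetical REST-style encoding of "navigate, then click".
// The route shape below is made up for illustration only.
function restUrl(base, pageUrl, selector) {
  return `${base}/navigate/${encodeURIComponent(pageUrl)}` +
    `/click/${encodeURIComponent(selector)}`;
}

console.log(restUrl('http://localhost:4000', 'http://localhost:1337', '.my-link'));
// prints "http://localhost:4000/navigate/http%3A%2F%2Flocalhost%3A1337/click/.my-link"
```

Even with only two steps the URL is hard to read, and every additional action or argument makes it worse.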

The next naive step is to POST some sort of string describing the interaction, but there are quite a few tradeoffs in that as well. Crafting your own domain-specific language puts you, as the author, in the hot seat once consumers start requesting some semblance of tooling. Things like type-ahead, linting, and other conveniences will need to be created in order to streamline the developer experience. What are we left with?

This is where GraphQL comes into the picture. It’s a well-defined and well-understood protocol that leaves almost nothing to be desired. With it you get all of the conveniences you could want: documentation, type-ahead, linting, and breaking-change detection, all for free. On top of the tooling, the protocol lends itself quite nicely to interaction with a browser. Queries themselves imitate functions in most languages, giving users the flexibility to craft elaborate workflows with little syntactic overhead.

Imagining a browser interaction in GraphQL

navigate(url: "http://localhost:1337")
click(selector: ".my-link")

Beyond the concise syntax lurks another feature: types. Since the protocol treats types as “first-class”, there’s little developers need to learn in order to get started. There’s never really a need to change contexts once you’ve started using GraphiQL, as it has all the information you need to be productive.

Defining higher-level abstractions

query surfNClick($url: String, $button: String) {
  goto(url: $url)
  click(selector: $button)
}

With these abstracted queries we get a few obvious benefits. The first is that our queries line up exactly with what the service provides. The second is that we can now reuse this query elsewhere in an application (such as in functional tests). Prior to GraphQL, abstracting HTTP calls was an exercise left to the consumer; now that’s no longer the case.
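As a rough sketch of that reuse, the named query can be kept in one place and sent as a standard GraphQL-over-HTTP POST. The helper names `buildRequest` and `surfNClick` are hypothetical, the endpoint is whatever address your server listens on, and the sketch assumes a runtime with a global `fetch` (a browser or Node 18+).

```javascript
// The query text from above, stored once so callers can reuse it.
const SURF_N_CLICK = `
  query surfNClick($url: String, $button: String) {
    goto(url: $url)
    click(selector: $button)
  }
`;

// Builds the options for a JSON-encoded GraphQL POST request.
function buildRequest(variables) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: SURF_N_CLICK, variables }),
  };
}

// Sends the query to `endpoint` (an assumption: supply your own server URL)
// and resolves with the parsed JSON response.
function surfNClick(endpoint, url, button) {
  return fetch(endpoint, buildRequest({ url, button })).then((res) => res.json());
}
```

A functional test could then call `surfNClick(endpoint, 'https://example.com', '.buy-now')` without repeating the query text or the HTTP plumbing.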

The only challenge remaining now is to support such an interface with a flexible API.

Navalia was originally written to manage and scale browser-based work. It’s a recently open-sourced TypeScript project targeted at the JavaScript environment. With it, you get a simple API to interact with a browser.

Simple goto and click

const { Chrome } = require('navalia');
const chrome = new Chrome();

// Placeholder URL; substitute the page you want to drive
chrome.goto('https://example.com')
  .then(() => chrome.click('.buy-now'));

And a more elaborate API to queue and manage work.

Executing jobs against a pool of browsers

const { Navalia } = require('navalia');
const navalia = new Navalia();

// Some time later
navalia.run((chrome) => {
  // ... drive the pooled browser here
});

Navalia fulfills our need for a robust API to interact with a browser, and lines up with GraphQL well: all actions are asynchronous and return a Promise, making composition and reuse trivial. The only remaining challenge is tying the two together.
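To illustrate why Promise-returning actions compose so easily, here is a minimal sketch that wraps two actions in one reusable step. It runs against a stand-in object rather than a real browser: `fakeBrowser` and `surfAndClick` are hypothetical names, and the only assumption about the real API is the one stated above, that `goto` and `click` each return a Promise.

```javascript
// Composes two Promise-returning actions into a single reusable step.
// `browser` is any object whose goto(url) and click(selector) return Promises.
function surfAndClick(browser, url, selector) {
  return browser.goto(url).then(() => browser.click(selector));
}

// Stand-in browser that records calls, so the sketch runs without Chrome.
function fakeBrowser() {
  const calls = [];
  return {
    calls,
    goto: (url) => Promise.resolve(calls.push(`goto ${url}`)),
    click: (selector) => Promise.resolve(calls.push(`click ${selector}`)),
  };
}

const browser = fakeBrowser();
surfAndClick(browser, 'https://example.com', '.buy-now')
  .then(() => console.log(browser.calls));
```

Because the composed step itself returns a Promise, it can be chained or reused exactly like the primitive actions it is built from.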

I’m extremely excited to announce that version 0.1.0 of Navalia, just now published, ships with a GraphQL front-end! With it you can now easily query, interact with, and extract information from headless Chrome, with more vendors coming soon.

Navalia UI

This implementation has all of the features you’ll likely want:

  1. An interactive client (GraphiQL) to author and test requests
  2. Documentation and types all available in one place
  3. A painless way to interact with a browser over HTTP

It also exposes more imperative features like screenshots and PDF generation, so those tasks can easily be automated as well. I’d encourage you to head over to GitHub to check it out, or install it from npm!

For all of the features and conveniences that now come bundled with Navalia, there are still some that are missing:

  • Subscriptions for events (things like network requests and so on)
  • More control over failure scenarios and timeouts
  • Live “replaying” of the queries as you author them

It’s my hope that the open-source community will gather behind this effort and push it forward. Let’s make the term “web-driver” something to be excited about, not something to shy away from.
