To Create An Evolvable API, Stop Thinking About URLs

February 14, 2019


The picture of a male peafowl standing on the grass. The bird is fanning its tail, displaying all the feathers. The feathers are enormous and contain eyespots. According to Charles Darwin, the peafowl has developed this setup to attract females.

An evolvable API is an API that can change with the least amount of effort without speculation or over-engineering. It’s a way to model the communication between two computers in the form of a conversation. Most APIs can leverage the power of evolvability. However, as there’s no silver bullet, a mere Remote Procedure Call may be enough for what you’re trying to achieve.

This post describes the fundamentals of an evolvable API. It shows how you can model a conversation between the client and the server to maintain the state of the conversation using the payload of the network requests. It also talks about how a uniform interface is essential to decouple client and server and allow independent systems evolution.

Imagine the website of a clinic where patients can book an appointment with a doctor. You need to develop an API so that third-party developers can integrate with the booking system of this clinic. The API should be able to change as new business requirements emerge.

For this post, let's ignore authentication and only focus on business requirements.

Nowadays, a common solution is to create a /booking end-point that accepts an HTTP POST request. The request takes two parameters indicating the patient's username and the intended doctor, plus two more for the time they want to book and the contents of that booking.

Here's what the code looks like for a request to book an appointment at 10:20 in the morning:

A pseudo-code that shows an HTTP request using the JavaScript “fetch” API to the URL "" using the method "post." The request body contains the field "username" as "mary.doe," the field "date time" as "May 1st, 2018 at 10:20 AM" in the ISO8601 format, the field "intended doctor" as "jane," and the field "booking" as "Consultation with Doctor Jane."
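A rough sketch of what that description could look like in JavaScript follows. The host `clinic.example` is a placeholder because the post does not show the original URL, and the field spellings (`username`, `dateTime`, `intendedDoctor`, `booking`) are assumptions based on the description. The `fetchImpl` parameter is also hypothetical; it only exists so the request can be tried without a network.

```javascript
// Placeholder host: the original URL is not shown in the post.
const BOOKING_URL = "https://clinic.example/booking";

// Builds the POST request described above. Field spellings are assumptions.
function buildBookingRequest() {
  return {
    url: BOOKING_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        username: "mary.doe",
        dateTime: "2018-05-01T10:20:00", // May 1st, 2018 at 10:20 AM, ISO 8601 without an offset
        intendedDoctor: "jane",
        booking: "Consultation with Doctor Jane",
      }),
    },
  };
}

// Sends the request. Pass a stub as fetchImpl to try it without a network.
function bookAppointment(fetchImpl = globalThis.fetch) {
  const { url, options } = buildBookingRequest();
  return fetchImpl(url, options);
}
```

Note that the client hardcodes the URL, the method, and every field name, which is exactly the coupling the rest of this post argues against.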

One week later, the business rules change. Doctor Jane now works in 15-minute time slots instead of 30-minute ones, and her working day starts at 10 AM. The time of 10:20 AM is now invalid.

One common solution to handle this change is to introduce a new version of the end-point. If clients try to book an invalid time, the new version responds with the validation status code 422 Unprocessable Entity, while clients using the old version keep working.

An alternative is to create a GET /booking end-point. The response returns the time slots doctor Jane has available so that clients can inspect them and pick the one they prefer.

Here's what the client code looks like for the second approach:

A pseudo-code that shows an HTTP request using the JavaScript "fetch" API to the URL "" using the method "get." The query string has the parameter "intended doctor" as "jane." The code extracts the "list of available time slots for the week" from the response. In the next statement, there's the same "post" HTTP request from the previous example, only now the "date time" field is the result of a function call. The name of the function is "choose available time for." The function receives the "date time" of "May 1st, 2018 at 10:20 AM" in the ISO8601 format as the first argument and the list of available time slots as the last argument.

A diagram that shows the client on the left and the server on the right. The client makes a GET request to the "/booking" end-point, looks at the response and then synchronously makes a POST request to the same end-point.
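The second approach could be sketched roughly like this. The host, the `intendedDoctor` query parameter spelling, and the `availableTimeSlots` response property are assumptions, and so is the policy inside `chooseAvailableTimeFor`: the post only names the function, so this version keeps the preferred time when the server offers it and otherwise falls back to the closest slot.

```javascript
// Placeholder host: the original URL is not shown in the post.
const BOOKING_URL = "https://clinic.example/booking";

// Assumed policy: keep the preferred time if the server offers it,
// otherwise fall back to the closest available slot.
function chooseAvailableTimeFor(preferred, availableSlots) {
  if (availableSlots.length === 0) throw new Error("No time slots available");
  if (availableSlots.includes(preferred)) return preferred;
  const distance = (slot) => Math.abs(Date.parse(slot) - Date.parse(preferred));
  return availableSlots.reduce((best, slot) => (distance(slot) < distance(best) ? slot : best));
}

async function bookClosestSlot(fetchImpl = globalThis.fetch) {
  // 1. Ask which slots doctor Jane has available this week.
  const listing = await fetchImpl(`${BOOKING_URL}?intendedDoctor=jane`, { method: "GET" });
  const { availableTimeSlots } = await listing.json(); // assumed response shape

  // 2. Same POST as before, but the time now comes from the server's list.
  return fetchImpl(BOOKING_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      username: "mary.doe",
      dateTime: chooseAvailableTimeFor("2018-05-01T10:20:00", availableTimeSlots),
      intendedDoctor: "jane",
      booking: "Consultation with Doctor Jane",
    }),
  });
}
```

With slots at 10:00, 10:15, and 10:30, a preferred time of 10:20 resolves to the 10:15 slot. The client survives a change in slot length, but it still owns both URLs, both methods, and the order of the calls.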

This example shows some critical aspects people get wrong with APIs:

  • The client code owns the URL and the methods. The server doesn’t have control over where the client makes the next request. Therefore, the server can’t change its URLs or methods because those changes can break somebody else.
  • Business logic changes in the server are very likely to cause breaking changes in the clients. That forces the server to create a rigid response body and implement URL versioning to remain backward compatible.
  • The client needs to find out exactly which parameters the server accepts on every end-point. Developers need to search for those parameters either through Swagger docs or code examples in the "Developer" section of the website.

The main problem here is that most changes in the business requirements are likely to break every client.

Instead of forcing clients to have prior knowledge of all URLs, fields, and HTTP methods, the client can ask the server what is required to complete an operation, and the server can provide that. The only code the client needs to write is the code to interpret the message.

With that in mind, let's reimagine the same booking system.

Instead of starting with the code, though, let's first understand how people book an appointment without technology:

Customer arrives at the clinic
Customer: Hi, I want to make an appointment.
Receptionist: What’s your name?
Customer: Mary.
Receptionist: What sort of appointment would you like to make?
Customer: I want to consult with doctor Jane.
Receptionist: We have these time slots available, which one do you prefer?
Customer: That one.
Conversation ends

As a server, you can ask the clients for the information you need, exactly as you would if you were a receptionist in real life. Once you understand how people book an appointment without technology, you learn some subtle business rules very early. For example, each doctor has a different set of working hours; therefore, the receptionist needs to know the intended doctor before providing the available time slots, not after.

The server doesn’t know what you want; it only knows the description of the fields, which are empty by default. The client who wants to book an appointment has the intelligence to fill those fields and provide the necessary information to the server:

The example of two responses from the server. The server returns the fields "user name" and "intended doctor" in the first response. Each of those fields has a property with the name "read-only" and the value of "false." The server returns the fields "user name" and "intended doctor" with their respective values in the second response and the "read-only" property as "true." Additionally, the server returns the fields "available time slots" and "booking" in the second response with their "read-only" properties containing the value of "false."
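One possible shape for those two responses is sketched below as plain JavaScript objects. Only the field names and the read-only flag come from the description; the top-level `data` and `actions` layout, the `id`, `method`, `url`, and `options` properties, the placeholder host, and the sample slot values are all assumptions made for illustration.

```javascript
// First response: the server asks who the patient is and which doctor they want.
const firstResponse = {
  data: {
    actions: [{
      id: "required-fields",
      method: "POST",
      url: "https://clinic.example/start-booking", // placeholder host
      fields: [
        { name: "username", value: "", readOnly: false },
        { name: "intendedDoctor", value: "", readOnly: false },
      ],
    }],
  },
};

// Second response: the answers come back as read-only state, next to the new questions.
const secondResponse = {
  data: {
    actions: [{
      id: "required-fields",
      method: "POST",
      url: "https://clinic.example/start-booking", // placeholder host
      fields: [
        { name: "username", value: "mary.doe", readOnly: true },
        { name: "intendedDoctor", value: "jane", readOnly: true },
        { name: "availableTimeSlots", value: "", readOnly: false,
          options: ["2018-05-01T10:00:00", "2018-05-01T10:15:00"] },
        { name: "booking", value: "", readOnly: false },
      ],
    }],
  },
};

// Names of the fields the client is still expected to fill in.
function editableFieldNames(response) {
  return response.data.actions
    .flatMap((action) => action.fields)
    .filter((field) => !field.readOnly)
    .map((field) => field.name);
}

console.log(editableFieldNames(firstResponse));
console.log(editableFieldNames(secondResponse));
```

The first call lists `username` and `intendedDoctor`; the second lists `availableTimeSlots` and `booking`, because the earlier answers are now read-only.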

The /start-booking path is the "entry point." The "entry point" is the only URL the client needs to initiate a conversation. That makes sense because in real life you also need to know which clinic and receptionist to contact; you need to know the host and the path. The client understands the questions the server asks and has some logic to fill the value of each field accordingly.

In the previous example, look at the value for the fields where the read-only property is true. The client puts the fields in the request body, and the server returns them in the next response. You can develop a contract telling clients not to fill the values for the fields that have a read-only property with the value of true.

That is how you store the state in the network.

Neither the client nor the server needs to be the permanent canonical source for the state. The state is in the conversation. You can write some code that sends the payload back with each subsequent request using a recursive strategy like this:

A pseudo-code that shows the implementation of a client for the API. The name of the function is "run." The code calls the function at the end and logs the return value as the "exit point." The function has 3 arguments. One argument with the name "method" and the default value as "get," one argument with the name "url" and the default value as "," and one argument with the name "fields" without a default value; the last argument is optional. The code inside the "run" function executes an HTTP request using the arguments "method" and "url." Then it queries for an "action" with the id "required-fields" from the response body. If it has that action, it calls a function with the name "fill fields" to set the value of the fields for that action. Then, the code recursively executes the function "run" passing the fields to the next request until the "required-fields action" is not there anymore. If the HTTP request returns a response with no action that the client supports, then the function returns with whatever response body the server returns.
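A minimal sketch of that recursive client might look like the following. It assumes the response layout `data.actions[].fields[]` with `name`, `value`, `readOnly`, and `options` properties, a placeholder host in front of the /start-booking entry point, and a driver that always answers with a fixed patient and the first slot on offer. The fourth `request` parameter is not part of the description; it is a hypothetical addition so the transport can be replaced by a stub.

```javascript
// Placeholder host: the post only names the /start-booking path.
const ENTRY_POINT = "https://clinic.example/start-booking";

// Default transport: one HTTP round trip, returning the parsed JSON body.
async function httpRequest(method, url, fields) {
  const response = await fetch(url, {
    method,
    headers: { "Content-Type": "application/json" },
    // Assumes the server never asks for fields with a body-less method such as GET.
    body: fields ? JSON.stringify({ fields }) : undefined,
  });
  return response.json();
}

// The "driver": decides the value of every field the server leaves editable.
// Assumed policy: a fixed patient and doctor, and the first slot on offer.
function fillFields(fields) {
  const answers = {
    username: "mary.doe",
    intendedDoctor: "jane",
    booking: "Consultation with Doctor Jane",
  };
  return fields.map((field) => {
    if (field.readOnly) return field; // server-owned state is echoed back untouched
    if (field.options) return { ...field, value: field.options[0] };
    return { ...field, value: answers[field.name] };
  });
}

// Follows the conversation until the server stops asking for fields.
async function run(method = "GET", url = ENTRY_POINT, fields, request = httpRequest) {
  const body = await request(method, url, fields);
  const actions = (body.data && body.data.actions) || [];
  const action = actions.find((candidate) => candidate.id === "required-fields");
  if (!action) return body; // no action the client supports: this is the exit point
  return run(action.method, action.url, fillFields(action.fields), request);
}
```

Calling `run().then((exitPoint) => console.log("exit point", exitPoint))` would start at the entry point and log whatever the server returns last. Apart from the entry point, the client never names a URL or a method; it takes both from the action the server sends.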

This way, the server can change the time slots, and the client code won't break. In the code example above, there's a function with the name "fill fields." That function represents the brain of the client, the "driver." It has the domain logic to consume the fields and select the time slots. That brain can have the intelligence to pick the best time slot available for its needs, just as a human would.

An Evolvable API is a different architectural style, a way to think about how two computers communicate.

A diagram that shows the machine driver on the left, next to the client code. The server is on the right. Both the server and the driver are represented as brains. The driver initiates a GET request to the server using the client code and asks for the booking; the server returns the fields so that the driver can fill them. The server eventually returns the option to book, and the driver follows that option. As a final step of the conversation, the “exit point,” the server may return the booking as successful or unsuccessful.

It’s possible that the machine driver is not smart enough to pick the best available time slot. After all, its brain is only composed of boards and circuits. In that case, you can use a UI so that the value for each field comes from a human, who is presented with lists to select from and text inputs to type in. Instead of the machine serving as the brain to execute with intelligence, the driver is the person who’s booking: a human brain!

A pseudo-code that shows the implementation of a CLI for the API. The code is the same as before; only now the "run" function accepts a function with the name "using CLI." The function "using CLI" prompts the user with the field names so that the system can receive the values as input.
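One way the CLI variant could be sketched with Node's promise-based `readline` module is shown below. The response layout (`data.actions[].fields[]`), the placeholder host, and the argument order of `run` are assumptions; `ask` and `request` are hypothetical injection points so the prompts and the transport can be scripted.

```javascript
// Placeholder host: the post only names the /start-booking path.
const ENTRY_POINT = "https://clinic.example/start-booking";

// Asks one question on the terminal and resolves with the typed answer.
async function askOnTerminal(question) {
  const { createInterface } = await import("node:readline/promises");
  const terminal = createInterface({ input: process.stdin, output: process.stdout });
  try {
    return await terminal.question(question);
  } finally {
    terminal.close();
  }
}

// Returns a driver that prompts a human for every editable field.
function usingCLI(ask = askOnTerminal) {
  return async function fillFields(fields) {
    const filled = [];
    for (const field of fields) {
      if (field.readOnly) {
        filled.push(field); // server-owned state is echoed back untouched
        continue;
      }
      const hint = field.options ? ` (${field.options.join(", ")})` : "";
      filled.push({ ...field, value: await ask(`${field.name}${hint}: `) });
    }
    return filled;
  };
}

// Default transport: one HTTP round trip, returning the parsed JSON body.
async function httpRequest(method, url, fields) {
  const response = await fetch(url, {
    method,
    headers: { "Content-Type": "application/json" },
    body: fields ? JSON.stringify({ fields }) : undefined,
  });
  return response.json();
}

// Same conversation loop as the machine client, except the driver is passed in.
async function run(fillFields, method = "GET", url = ENTRY_POINT, fields, request = httpRequest) {
  const body = await request(method, url, fields);
  const actions = (body.data && body.data.actions) || [];
  const action = actions.find((candidate) => candidate.id === "required-fields");
  if (!action) return body; // exit point
  return run(fillFields, action.method, action.url, await fillFields(action.fields), request);
}
```

Starting the conversation would then be `run(usingCLI())`. The loop itself does not change; only the source of the field values does.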

In this style, the server defines actions, URLs, and methods. Also, the server defines which fields are necessary to complete the conversation and drivers can write code to look for those fields. The client always sends the state back to the server on every step of the conversation.

A diagram that shows a human driver on the left. The client code is a CLI. The server is on the right. Both the server and the human driver are represented as brains. The driver initiates a GET request to the server using the CLI and asks for the booking; the server returns the fields so that the user can fill them using the interface of the command prompt. The server eventually returns the option to book, and the CLI shows that as a selectable option that the user can choose. As a final step of the conversation, the server may return the booking as successful or unsuccessful, which the CLI might represent in its own way.

These are the critical aspects that make an API evolvable:

  • You store the state of the conversation in the network. If the server wants to track the booking transaction and the client always sends the fields back, the server can create a new field with the value containing the transaction ID. You don't have to change any code in the client. Everything works.
  • You don’t need versioning. You can add or remove data from the response of any request and clients know how to react to it. For example, the CLI may not present an option to initiate the booking if the server returns with validation errors; the server decides if that option should be visible. If clients don’t know how to interpret a new feature from the server, say to add a mask for certain types of fields, they can ignore that feature and keep working the old way.
  • The server owns the "actions" which contain the values for the URLs, methods, and fields. This way it can control where the clients go to continue the conversation. The exception is the first call, the "entry point." The "entry point" needs to be hardcoded in the client. There are many ways to model this idea; you don’t need to use the term “action” as the examples in this post do.
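The first two points can be illustrated with a small sketch. It assumes fields shaped as `{ name, value, readOnly }`; the `transactionId` field name, its value, and the `mask` property are hypothetical additions standing in for a transaction ID and a new feature the server introduces later.

```javascript
// A driver written before the server knew about transactions or input masks.
function fillFields(fields) {
  const answers = { username: "mary.doe", intendedDoctor: "jane" };
  return fields.map((field) =>
    field.readOnly ? field : { ...field, value: answers[field.name] }
  );
}

// What the server asked for originally.
const before = [
  { name: "username", value: "", readOnly: false },
  { name: "intendedDoctor", value: "", readOnly: false },
];

// Later the server adds a read-only transactionId and a "mask" hint.
const after = [
  { name: "transactionId", value: "tx-123", readOnly: true },
  { name: "username", value: "", readOnly: false, mask: "lowercase" },
  { name: "intendedDoctor", value: "", readOnly: false },
];

console.log(fillFields(before));
// transactionId travels back unchanged; mask is carried along but never interpreted.
console.log(fillFields(after));
```

The driver is identical in both calls. The new field rides along in the payload, and the feature the client doesn't understand is simply ignored.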

Here’s why having the control of URLs in the server is very important:

  • The server can run A/B testing and direct clients to different servers running the same instance of the application without changes in the clients.
  • The server can decide to implement a polling functionality to track the status of a booking asynchronously without changes in the clients.
  • The server can change the intermediary URLs, methods or responses and there's no need to make any backward-compatible change in the clients, only forward changes.

The difference between an evolvable API approach and the RPC one is that the evolvable approach makes new business requirements easier to implement. It also forces you to think about how systems and teams communicate in your domain. The tradeoff is a small design investment.

There's no magic. If one side names the field user_name, with an underscore, while the other expects username, without one, and that field is mandatory, then obviously the server will never complete the conversation. The client may even hit an API rate limit if its code enters an infinite loop. There are some things, such as the field names, that both systems need to agree on, and that's the API contract.
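One defensive option, sketched here under the same assumed `data.actions[].fields[]` layout, is to cap the number of round trips so that a contract mismatch fails fast instead of looping until a rate limit kicks in. The function name `runWithLimit`, its parameters, and the default of 10 requests are arbitrary choices for illustration.

```javascript
// Runs the conversation, but gives up after maxSteps requests.
async function runWithLimit(request, fillFields, entryPoint, maxSteps = 10) {
  let method = "GET";
  let url = entryPoint;
  let fields;
  for (let step = 0; step < maxSteps; step += 1) {
    const body = await request(method, url, fields);
    const actions = (body.data && body.data.actions) || [];
    const action = actions.find((candidate) => candidate.id === "required-fields");
    if (!action) return body; // conversation finished
    ({ method, url } = action); // the server decides where to go next
    fields = await fillFields(action.fields);
  }
  throw new Error(`Conversation did not finish after ${maxSteps} requests`);
}
```

A driver that keeps sending `user_name` to a server that keeps asking for `username` would then reject with an error after `maxSteps` requests rather than retrying forever.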

Start thinking about capabilities, not URLs.

If you want to design evolvable APIs, the first thing you need to understand is that URLs are a meaningless implementation detail, except for the "entry-point." If you're spending your time discussing whether URLs should be in the singular /restaurant/:id or plural /restaurants/:id, then that means you're not creating evolvable APIs. Also, if you model URLs with IDs, you may inadvertently expose sensitive data that can affect the security of your application.

For every request to the same host, regardless of the URL, the client should be able to interpret the contents of that response and fill the data the server needs. The server may ask for all the fields in the first request, or it may acknowledge some fields and ask for the remaining ones in another request.

The most significant bit here is that the server responds to each interaction with the same uniform interface. This way, the code that you write for one response works for any kind of response in the same host, as long as the host sends the response in the agreed format.

This way you have no coupling between one specific URL and the response. One team can develop the client, and another team can develop the server; they only need to agree on the communication format. There's no need to deploy both client and server at the same time in a specific order!

All the code you write should have the power to interpret any response from any URL of an API.

Imagine you're booking a doctor's appointment in real life. When you call or get to the clinic (the host), you search for the receptionist to ask for what you need to make the booking (the path for the "entry point"). The receptionist gives you a list of possible times, and you pick one (as the client). If that list doesn't show the whole day, you ask for more (pagination).

You and the clinic know what a "list of available times" or a "booking" means; that’s the domain vocabulary. The server’s response, which contains the "data" property as the first level document structure, comes from the JSON API specification. The JSON API spec is the language both systems use to communicate. Two systems can’t understand each other with a language alone; they also need to understand the domain vocabulary.

The current state of the conversation is in the mutual understanding of both parties. The receptionist knows you're in the process of picking a time; you know you're in the process of choosing the time you prefer. Nobody is using a sheet of paper to store the state of the conversation; both parties know the state. The state is in the network, not in a database.

To have an efficient conversation, having a generic language is not enough. When you design an API, it’s your job as a programmer to develop the domain vocabulary.

In Domain Driven Design, you model the architecture of the code after how the people in your business operate. With an evolvable API, it's the same. You model the communication between machines exactly how people in your business communicate.

Think about how people would solve the business problem without technology and use technology to enhance the business.

If you write the code to interpret the message of the server and the server provides an evolvable API, you'll see that over time you'll be able to fit many business requirements with minimal effort. That means fewer code changes and less breakage for the code you don't control.

Now here's the catch: this is not a new idea.

Most of the developers who design APIs these days are struggling with problems that somebody else already figured out more than 15 years ago. Given that this post is getting considerably long, I won’t tell you where this idea comes from or who that “somebody else” is. That's a subject for another post.

For now, I can only say it’s time for you and me to get some rest.
