As I was publishing last week's post on whether GitHub is becoming obsolete or the future of development platforms, they decided to trigger a two-hour outage at Mergify in retaliation.
Just kidding. I am sure they did not do that on purpose.
Read my post-mortem if you want the whole story. The summary is that they broke their API for several hours, bringing down our service in the meantime, until people started to complain and they finally rolled back their change.
That event forces me to talk about APIs this week.
API Definitions Are Just Definitions
I won’t go into the definition of an API per se; it’d be boring. You can Google it if you need to.
The real question is what having an API means. Offering an API to your users means authorizing them to interact with your service. This implies many rules, such as the data model of your API, the behavior of your API, the rules of usage, etc. Some can be encoded in a machine-readable format; others cannot. Engineers like to talk about contracts, and I think it’s an almost-good analogy.
To describe this contract, you need multiple specifications.
Developers have been ecstatic about OpenAPI over the last decade as the go-to medium for describing their APIs. I want to emphasize here how little it actually documents your API: it describes the data model used but encodes very little of the behavior the system might exhibit.
Hey, I can confirm that GitHub did not break its OpenAPI schema when it broke its API last week. Formidable.
However, working from the assumption that OpenAPI is enough, many engineers mock their API consumption against that part of the contract and think they’re done.
In that situation, the minimum you should do is validate that your mocks follow the OpenAPI schema you’re relying on. Even that is not enough, because sometimes the schema changes, and sometimes it’s just not respected.
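A minimal sketch of that check in Python, using the jsonschema library; the label schema below is a hypothetical extract from an OpenAPI document, not GitHub’s actual one:

```python
import jsonschema

# Hypothetical JSON Schema for a label object, extracted from an OpenAPI spec.
LABEL_SCHEMA = {
    "type": "object",
    "required": ["id", "name", "color"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "color": {"type": "string"},
    },
}

mocked_response = {"id": 1, "name": "bug", "color": "d73a4a"}

# Fails loudly if the mock drifts away from the schema it claims to follow.
jsonschema.validate(instance=mocked_response, schema=LABEL_SCHEMA)
```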
Let’s take GitHub again as an example. Their API is so legacy that the JSON schemas were crafted manually, and they might still be, for all I know. It’s fine; it’s better than nothing, and it’s not easy to change a legacy API that’s been around for 15 years.
We know first-hand that their system does not always respect the GitHub API JSON Schema.
APIs Have Side-effects
Again, this approach is entirely based on the data model, which makes it insufficient and of limited value.
Most of an API’s value is in the behavior it triggers. Unless your API is a basic CRUD service that only does storage, it will have side effects that might or might not be visible through the API.
For example, creating an asynchronous job on any REST API will return nothing except a unique identifier, which can be used later to identify the work. You might receive the data via a webhook or have to poll the API to get the job’s status. This kind of behavior cannot be documented in OpenAPI as it’s not part of the data model; there’s nothing to tell you to expect a webhook.
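A sketch of what consuming such an endpoint looks like; the base URL, routes, and fields are entirely hypothetical:

```python
import time

import requests

BASE = "https://api.example.com"  # hypothetical async-job API

# Create the job: all we get back is an identifier.
job_id = requests.post(f"{BASE}/jobs", json={"task": "export"}).json()["id"]

# The interesting behavior (completion, failure, webhook delivery) is invisible
# to the schema; we have to poll until the side effect materializes.
while True:
    status = requests.get(f"{BASE}/jobs/{job_id}").json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(2)
```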
API Invisible Parts
Now, let’s discuss all the invisible parts of running an API. There are many. The first that come to mind are RBAC, quotas, and rate limits. Most APIs have to implement them, and they directly affect the API’s behavior and access.
Those features will massively impact the quality and quantity of API usage. Again, they are pretty hard to test as a black box: there’s no easy way to mock a full RBAC implementation or real-life rate limits.
Testing the Hard Way
Having consumed many different APIs over the last five years at Mergify, especially GitHub’s, which we know by heart, has given us a few ideas on what you can and cannot test.
Rule number one: do not mock. Record your tests.
We leverage vcrpy in Python to do that: the idea is to run your test in record mode, where real HTTP requests are made against the service. Once the recording is done, you can replay the test locally or in CI.
If any of your code tries to make a different HTTP call, the test will fail, and you will have to re-record it. This ensures that no change is made to the application without being noticed.
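Here is a minimal sketch of that setup with vcrpy; the repository path, cassette layout, and matching options are illustrative, not our actual configuration:

```python
import requests
import vcr

# The first run records the real HTTP exchange into a cassette; later runs
# replay it. With record_mode="none" (e.g., in CI), any request that is not
# already in the cassette raises an error and fails the test.
recorder = vcr.VCR(
    cassette_library_dir="tests/cassettes",
    record_mode="once",
    match_on=["method", "uri", "body"],
    filter_headers=["authorization"],  # keep tokens out of committed cassettes
)

@recorder.use_cassette("list_labels.yaml")
def test_list_labels():
    resp = requests.get(
        "https://api.github.com/repos/example/example/labels",
        headers={"Accept": "application/vnd.github+json"},
    )
    assert resp.status_code == 200
```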
Now, that prevents your application from silently changing its API usage, but it does not prevent the API from breaking your app. The only way to catch that is to regularly re-record all the tests and see if they break.
So, rule number two: re-record your tests regularly — every day if possible.
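One way to wire that up, assuming the same vcrpy setup as above and an environment variable of your choosing that a nightly CI job sets:

```python
import os

import vcr

# A nightly CI job runs `RECORD=1 pytest` to re-hit the real API and rewrite
# the cassettes; regular runs replay only and fail on any drift.
recorder = vcr.VCR(
    cassette_library_dir="tests/cassettes",
    record_mode="all" if os.environ.get("RECORD") == "1" else "none",
)
```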
For example, we have a test that plays with GitHub pull request labels. When re-recording it a few months ago, we noticed that it failed. It turned out that GitHub had changed its API to become case-sensitive overnight (and that was nowhere in the OpenAPI schema!).
In that case, we preferred to ask GitHub to fix their API rather than fix our code, but hey, your mileage may vary.
Rule number three: be ready to fix the code.
No amount of testing will cover all the edge cases. For example, request quotas or rate limits might be hit in real scenarios but never in testing, meaning you’ll have to handle those specific cases without being able to record them. It’s fine: you can actually mock part of the responses here.
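For instance, a minimal sketch using the responses library to fake GitHub’s rate-limit reply, which is hard to trigger in a recording; the endpoint and payload are illustrative:

```python
import requests
import responses

@responses.activate
def test_rate_limited_pull_list():
    # Register a fake 403 rate-limit response for the endpoint under test.
    responses.add(
        responses.GET,
        "https://api.github.com/repos/example/example/pulls",
        json={"message": "API rate limit exceeded"},
        status=403,
        headers={"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000000"},
    )
    resp = requests.get("https://api.github.com/repos/example/example/pulls")
    # Your client code should detect this and back off until the reset time.
    assert resp.status_code == 403
    assert resp.headers["X-RateLimit-Remaining"] == "0"
```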
For this, we leverage Sentry to obtain evidence of the problem, replicate it in a test, and fix it. No amount of testing can cover all scenarios, so having a way to hotfix your code is a must.
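The Sentry side is plain error tracking; a sketch with hypothetical application code and a placeholder DSN:

```python
import sentry_sdk

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")  # placeholder DSN

def sync_pull_request(number: int) -> None:
    # Hypothetical application code talking to the external API.
    raise RuntimeError(f"unexpected API response for PR #{number}")

try:
    sync_pull_request(42)
except Exception as exc:
    # The captured event carries the stack trace and context needed to
    # replicate the failure in a recorded test before shipping a hotfix.
    sentry_sdk.capture_exception(exc)
```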
In the end, mixing API test recording for safety and error tracking for fast action is the best combination we’ve seen for dealing with external systems.
If we map those rules to last week's incident: rule number three helped us fix the issue quickly, rule number one would technically have caught it, and rule number two would have done so in less than 24 hours. Even if, in our case, reality kicked in before the tests did.
So use that. And retry mechanisms.
I guess that’ll be for another post.