On the WWW team, we’re responsible for Cloudflare’s REST APIs, account management services and the dashboard experience. We take security and PCI compliance seriously, which means we move quickly to stay up to date with regulations and relevant laws.
A recent compliance project required us to detect certain end-user request data at the edge and react to it both in API responses and visually in the dashboard. We realized that this was an excellent opportunity to dogfood Cloudflare Workers.
Deploying workers to www.cloudflare.com and api.cloudflare.com
In this blog post, we’ll break down the problem we solved using a single worker that we shipped to multiple hosts, walk through the annotated source code of our worker, and share some best practices and tips we discovered along the way.
Since being deployed, our worker has served over 400 million requests for both calls to api.cloudflare.com and the www.cloudflare.com dashboard.
First, we needed to detect when a client was connecting to our services using an outdated TLS protocol. Next, we wanted to pass this information deeper into our application stack so that we could act upon it and conditionally decorate our responses with notices providing a heads up about the imminent changes.
Our Edge team was quick to create a patch to capture TLS connection information for every incoming request, but how would we propagate it to our application layer where it could be acted upon?
With our workers able to inspect the TLS protocol versions of requests, we needed only to append a custom HTTP header containing this information before forwarding them into our application layer.
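As a minimal sketch of this step (not the production worker), the function below copies the incoming request and sets a header carrying the TLS version. The header name `X-Client-TLS-Version` and the `'unknown'` fallback are assumptions made for illustration; `request.cf.tlsVersion` is the field the Workers runtime exposes on incoming requests.

```javascript
// Hypothetical header name; the real one is not given in this post.
const TLS_HEADER = 'X-Client-TLS-Version';

// Return a copy of the incoming request with the TLS version attached.
// `request.cf.tlsVersion` is populated by the Workers runtime, e.g. "TLSv1.2".
function withTlsVersionHeader(request) {
  // Assumption: fall back to 'unknown' when the runtime supplies no value.
  const tlsVersion = (request.cf && request.cf.tlsVersion) || 'unknown';

  // Headers on an incoming request are immutable, so copy them first.
  const headers = new Headers(request.headers);

  // set() replaces any value a client may have sent under the same name.
  headers.set(TLS_HEADER, tlsVersion);

  return new Request(request, { headers });
}
```

Using `set` rather than `append` means a client cannot inject its own value for the header, which matters when downstream services trust it.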
Our APIs use this data to add deprecation warnings to responses, and our UI uses it to display banners explaining the upcoming changes.
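On the application side, consuming the header might look like the sketch below. Everything in it is illustrative: the header name, the set of versions treated as outdated, and the wording of the warning are assumptions, not details from our services. It expects a plain object of lower-cased header names, as Node's HTTP server provides.

```javascript
// Hypothetical header name, lower-cased as in Node's `req.headers`.
const TLS_HEADER = 'x-client-tls-version';

// Assumption: which protocol versions count as outdated.
const OUTDATED_TLS = new Set(['TLSv1', 'TLSv1.1']);

// Return a warning string for outdated clients, or null otherwise.
function deprecationWarning(headers) {
  const version = headers[TLS_HEADER];
  if (!OUTDATED_TLS.has(version)) {
    return null;
  }
  return `Your client connected using ${version}, which will soon be unsupported.`;
}
```

An API handler could attach the returned string to its response body, while the dashboard could use a non-null result to decide whether to render a banner.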
Let's now take a look at the source code for our worker.
Anatomy of a fail-open worker
Asynchrony, logging and alerts via Sentry
We decided to use Sentry to capture events we sent from our worker, but you could follow this same pattern with any similar service.
The critical piece to making this work is understanding that you must signal to the Cloudflare Workers runtime that it needs to wait for your asynchronous logging subrequest to finish (and not cancel it).
You do this by:
- Ensuring that your logging function returns a promise (what your promise resolves to does not matter)
- Wrapping your call to your logging function in event.waitUntil as we have done above
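The two steps above can be sketched as follows. This is a simplified illustration rather than our production code: `LOG_URL` is a placeholder endpoint, `transform` stands in for whatever request processing your worker performs, and the `fetchImpl` parameter exists only so the functions can be exercised outside the Workers runtime.

```javascript
// Placeholder endpoint; substitute your logging service's ingestion URL.
const LOG_URL = 'https://logging.example.com/ingest';

// Step 1: the logging function returns a promise (the result of fetch).
function logError(err, request, fetchImpl = fetch) {
  return fetchImpl(LOG_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: err.message, url: request.url }),
  });
}

// `transform` is a hypothetical processing step that may throw.
async function handleRequest(event, transform, fetchImpl = fetch) {
  let request = event.request;
  try {
    request = transform(request);
  } catch (err) {
    // Step 2: hand the promise to waitUntil so the runtime keeps the
    // logging subrequest alive after the response has been returned.
    event.waitUntil(logError(err, event.request, fetchImpl));
  }
  // Forward the (possibly untouched) request to the origin.
  return fetchImpl(request);
}
```

Note that `handleRequest` never awaits the logging promise itself, so logging adds no latency to the response; `event.waitUntil` only tells the runtime not to cancel it.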
This pattern fixes a common race condition: if you don't leverage event.waitUntil, the runtime will race the passthrough subrequest and your logging subrequest.
If the passthrough subrequest completes significantly faster than your logging subrequest, the logging request could be cancelled. In practice, you'll notice this issue manifesting as dropped logging messages: whether or not a given exception is logged becomes a roll of the dice on every request.
For additional insight, check out our official guide to debugging Cloudflare workers.
Failing open to ensure service continuity
A key consideration when designing your worker is failure behavior. Depending on what your particular worker is accomplishing, you either want it to fail open or fail closed. Failing open means that if something goes horribly wrong, the original request will be passed through as if your worker did not exist, while failing closed means that a request that raises an exception in your worker will not be processed further.
If you are editing metadata, collecting metrics, or adding new non-critical HTTP headers, to name a few examples, you probably don't want an unhandled exception in your worker to prevent the request from being serviced.
In this case, you can leverage event.passThroughOnException as we have above, and it's recommended that you call this method in the first line of your fetch event handler. This sets a flag that the Cloudflare worker request handler can inspect in case of an exception to determine the desired passthrough behavior.
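A minimal sketch of a fail-open fetch handler is shown below. The `onFetch` wrapper and its `handleRequest` argument are hypothetical names used for illustration; `event.passThroughOnException` and `event.respondWith` are the runtime's own methods.

```javascript
// `handleRequest` is any function that takes the fetch event and
// returns a Response or a promise of one.
function onFetch(event, handleRequest) {
  // Opt in to fail-open behavior before anything else can throw.
  event.passThroughOnException();
  event.respondWith(handleRequest(event));
}

// In a deployed worker, this would be registered as:
// addEventListener('fetch', (event) => onFetch(event, handleRequest));
```

Because the flag is set on the first line, it is already in place even if `handleRequest` throws synchronously or its promise later rejects.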
On the other hand, if you're employing your worker in a security-centric role, tasking it with blocking malicious requests from nefarious bots, or blocking access to sensitive endpoints when valid security credentials are not supplied, you may want your worker to fail closed. This is the default behavior, and it will prevent requests that raise exceptions in your worker from being processed further.
Our services are in the critical path for our customers, yet our use case was to conditionally add a new HTTP header, so we knew we wanted to fail open.
Having your worker generate PagerDuty and HipChat alerts
Now that you've done the work to get data from your worker logged to Sentry, you can leverage Sentry's PagerDuty integration to configure alerts for exceptions that occur in your workers.
This will increase your team's confidence in your worker-based solutions, and alert you immediately to any new issues that occur in production.
You can find additional worker recipes and examples in our official documentation.