Lenny’s Little Apollo Book

Apollo Federation Motivation

April 22, 2020

In a previous post, I listed some assumptions that motivate the desire for a solution like Apollo Federation:

Microservices and the team autonomy they provide are valuable for developer velocity.
Many product features span multiple microservices even in the best-organized systems.
When microservices communicate with each other directly, you end up with a distributed monolith.
API Gateways are a useful solution for decoupling microservices.
API Gateways as shared codebases often become unmanageable piles of tech debt.

In this post I’d like to break down these assumptions and hopefully clarify my motivations for using Apollo Federation.

Microservices

There’s a ton of literature about the pros and cons of microservices—it’s all about tradeoffs. I like working with microservices because:

Teams can set their own deploy schedule.
Teams can choose the most appropriate language/framework/tools.
Smaller services are more approachable and often easier to understand.

Remember: this is just a starting assumption and may not apply to your company. But if we assume a microservice architecture, we then have to talk about the dangers of distributed monoliths.

Distributed Monoliths

It’s easy to start out with idealistic notions of microservice architectures, only to accidentally build a distributed monolith—something that looks like a microservice architecture, but is in fact a single application because of the tight coupling between services.

One reason you may fall into this trap is that you have to implement features that don’t fit neatly into a single service. Consider this simplistic microservice architecture:

Simple architecture diagram — Diagram of a single client and two microservices.

As soon as you need data from both services, you might be tempted to allow one service to fetch data from another service:

Fetching data from both services — The client fetches data from one service, which then fetches data from the other.

You’ve now coupled your services together. You may have to coordinate deploys, and a bug in one service could take down the other. Also, if each service has a Service Level Agreement (SLA) of 99% uptime and a request needs both services to be up, the total availability of the system is 99% × 99% ≈ 98%. This gets worse with each new service:

Diagram of many services calling each other — If each service has an SLA of 99.9% (about 8.8 hours of downtime per year), then this entire system has an SLA of 99.1% (about 3.3 days of downtime per year).
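
The arithmetic behind these figures can be sketched in a few lines of TypeScript. This is a rough model, not a measurement: it assumes failures are independent and that a request needs every service in the chain to be up. The function names are made up for illustration, and the count of nine services is an assumption chosen only because it reproduces roughly 99.1%.

```typescript
// Availability of a request path that needs every service in a chain to be up.
// Assumes independent failures, which real systems only approximate.
function chainAvailability(perService: number, serviceCount: number): number {
  return Math.pow(perService, serviceCount);
}

// Expected downtime in hours over a 365-day year.
function downtimeHoursPerYear(availability: number): number {
  return (1 - availability) * 365 * 24;
}

// Two services at 99% each.
const twoServices = chainAvailability(0.99, 2);

// Nine services at 99.9% each (nine is an assumption, see above).
const nineServices = chainAvailability(0.999, 9);

console.log((twoServices * 100).toFixed(2) + "% for two services at 99%");
console.log((nineServices * 100).toFixed(2) + "% for nine services at 99.9%");
console.log(downtimeHoursPerYear(0.999).toFixed(1) + " hours/year for one service");
console.log((downtimeHoursPerYear(nineServices) / 24).toFixed(1) + " days/year for the chain");
```

The point of the model is the shape of the curve rather than the exact numbers: every service added to the call chain multiplies in another factor below one.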

Client Complexity

You may be tempted to avoid tightly coupling services by having your clients talk to services directly:

Diagram of a client calling multiple services — One client calls each service separately and combines the data for a cross-domain feature at runtime.

When I put on my client engineering hat, this diagram stresses me out. For starters, I’m now required to efficiently coordinate multiple asynchronous requests, making my code more complex. I also need to be aware of failure modes: what if one request fails and the other succeeds? Finally, it’s increasingly rare to have a single client—web, iOS, and Android apps are usually table stakes, not to mention watches, TVs, IoT devices, and other platforms. When the responsibility of combining cross-service data resides in clients, you end up duplicating complex logic across all these platforms.

Diagram of many clients calling many services — Repeating cross-domain logic across all platforms and devices creates an ever-increasing maintenance burden and consistency challenges.
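
To make the client-side burden concrete, here is a minimal sketch of what one client might write to combine two services and cope with partial failure. Everything in it is hypothetical: the `User` and `Order` shapes, the injected fetchers, and the decision to show an empty order list when the order service is down. Each platform (web, iOS, Android, and so on) would need its own version of the same decisions.

```typescript
// Hypothetical shapes for two services; neither comes from a real API.
interface User { id: string; name: string; }
interface Order { id: string; total: number; }

// A settled request: either a value or the error that prevented it.
type Settled<T> =
  | { status: "ok"; value: T }
  | { status: "failed"; error: unknown };

// Convert a promise into one that never rejects, so one failure
// does not discard the result of the other request.
function settle<T>(p: Promise<T>): Promise<Settled<T>> {
  return p.then(
    (value): Settled<T> => ({ status: "ok", value }),
    (error): Settled<T> => ({ status: "failed", error })
  );
}

interface ProfileView {
  name: string | null;
  orders: Order[];
  warnings: string[];
}

// Pure merge step: the partial-failure policy lives here.
function buildProfileView(user: Settled<User>, orders: Settled<Order[]>): ProfileView {
  const warnings: string[] = [];
  if (user.status === "failed") warnings.push("user service unavailable");
  if (orders.status === "failed") warnings.push("order service unavailable");
  return {
    name: user.status === "ok" ? user.value.name : null,
    orders: orders.status === "ok" ? orders.value : [],
    warnings,
  };
}

// Fetchers are injected so the sketch does not depend on any HTTP library.
async function loadProfile(
  fetchUser: () => Promise<User>,
  fetchOrders: () => Promise<Order[]>
): Promise<ProfileView> {
  // Start both requests at once, then wait for both to settle.
  const [user, orders] = await Promise.all([settle(fetchUser()), settle(fetchOrders())]);
  return buildProfileView(user, orders);
}
```

Keeping the merge step pure makes it testable, but it does not remove the duplication: the same policy still has to be reimplemented, and kept consistent, in every client codebase.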

API Gateways

One way to avoid distributed monoliths and manage client complexity is to implement an API Gateway between your clients and services.

Diagram of many clients calling an API gateway connected to many services.

Chris Richardson’s book Microservices Patterns talks about the features and tradeoffs of API Gateways (and their close cousin, Backends-For-Frontends). The gateway acts as an abstraction layer, hiding the complex web of services from clients. Each app talks to a single service that handles all the cross-domain concerns like combining data from multiple sources.

However, as much as I love API Gateways from a client-engineering perspective, I’ve had little luck maintaining them for more than a couple of years. I’ve found that they only grow bigger and bigger and eventually collapse under their own weight.

The challenges of building and maintaining API Gateways include:

Managing a constant stream of changes to keep up with ever-evolving services.
Either having a single team own and develop the gateway, or requiring that domain teams contribute to the single codebase, possibly in a language they don’t use.
Avoiding the temptation to put complex business logic into this shared, difficult-to-test codebase.
Upgrading the underlying framework/platform (e.g. from Rails 5 to 6), which sometimes becomes a multi-week effort.

I’m definitely not saying that it’s impossible to successfully operate an API Gateway—many companies do it. It just hasn’t worked for me due to the specific engineering cultures I’ve worked in.

When I started investigating GraphQL, I was afraid that I’d have to build another monolithic API Gateway that would eventually become unmanageable. Then I discovered Apollo Federation!

Automated API Gateways

An API Gateway that supports Apollo Federation is more of an appliance than a service. It runs with little intervention, automatically combining the functionality of multiple services—defined as individual GraphQL APIs—into a single client-facing API.

This dramatically reduces the burden of building and maintaining an API Gateway. The tradeoff is that your services must be able to speak GraphQL and conform to the Federation Specification. In practice, this usually means that teams will run their own GraphQL services that translate lower-level APIs into GraphQL APIs with a topology like this:

Diagram of many clients calling a federated gateway, which calls team-specific GraphQL services — Each team runs one GraphQL service that abstracts away their web of microservices and converts their functionality into a composable GraphQL API.
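
As a rough illustration of what conforming to the Federation Specification looks like, here are two team-owned schemas held in TypeScript strings. The team names, types, and fields are invented for this sketch. The `@key` and `@external` directives and the `extend type` form follow the Federation specification as it stood when this post was written; check the current spec before copying them.

```typescript
// Schema for a hypothetical "accounts" team. The @key directive marks User
// as an entity that other services can reference by its id.
const accountsSchema = `
  type User @key(fields: "id") {
    id: ID!
    name: String
  }

  extend type Query {
    me: User
  }
`;

// Schema for a hypothetical "orders" team. It adds a field to User that it
// owns, without the accounts team changing any code.
const ordersSchema = `
  extend type User @key(fields: "id") {
    id: ID! @external
    orders: [Order!]!
  }

  type Order {
    id: ID!
    total: Int
  }
`;

// What a client could then send to the gateway: one query spanning both teams.
const clientQuery = `
  query {
    me {
      name
      orders { id total }
    }
  }
`;

console.log(accountsSchema, ordersSchema, clientQuery);
```

From the client's point of view there is a single `User` type with both `name` and `orders`. Which team serves which field is a detail the gateway resolves.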

With an automated platform, updating the API Gateway when microservices change becomes a matter of changing configuration, not writing a bunch of new code. Apollo Graph Manager provides APIs for managing the configuration and it updates the gateway for us. Ideally, we’ll rarely touch the gateway code at all.
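
A small sketch of why this is configuration rather than code: the gateway's view of the world can be modeled as a list of named service URLs. The names, URLs, and the `registerService` helper below are all hypothetical. The `{ name, url }` shape is modeled on the `serviceList` option that `@apollo/gateway` accepted at the time. Treat that resemblance as an assumption and check the library's documentation, since this sketch imports neither the gateway nor Graph Manager.

```typescript
// One entry per team-owned GraphQL service. Names and URLs are placeholders.
interface ServiceEntry {
  name: string;
  url: string;
}

const serviceList: ServiceEntry[] = [
  { name: "accounts", url: "http://accounts.internal/graphql" },
  { name: "orders", url: "http://orders.internal/graphql" },
];

// Adding a team's service is a data change: return a new list with one more
// entry, rejecting duplicate names so two teams cannot claim the same slot.
function registerService(list: ServiceEntry[], entry: ServiceEntry): ServiceEntry[] {
  if (list.some((s) => s.name === entry.name)) {
    throw new Error(`service "${entry.name}" is already registered`);
  }
  return [...list, entry];
}

const withReviews = registerService(serviceList, {
  name: "reviews",
  url: "http://reviews.internal/graphql",
});

console.log(withReviews.map((s) => s.name).join(", "));
```

No routing or data-combining logic appears here. In this model, onboarding a new service touches only the list, and the gateway derives the rest from each service's schema.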

Conclusion

Microservices come with many benefits and costs. One cost I would prefer not to pay is overwhelming complexity when building features that span services and domains. This is especially true when you want to provide a unified experience across your entire product line, or you want to create new lines of business with new combinations of your product offering.

API Gateways are a useful pattern in microservice architectures, but they also have costs. Apollo Federation minimizes that cost by turning the gateway into a configuration-driven appliance rather than a shared codebase.

Even though they’re less than a year old, the Federation JavaScript libraries and Apollo Graph Manager work great today. I’m excited to see how they can improve over time to allow for increasingly useful abstractions over microservice architectures.

Written by Lenny Burdette in San Francisco. You can follow him on Twitter but he doesn't tweet. Opinions written here do not necessarily reflect those of his employers and are subject to change.