An API gateway sits between clients and services. While there is no precise definition of what constitutes an API gateway, the functions a gateway is responsible for typically fall into three categories: routing, aggregation, and cross-cutting functionality. These functions apply to many backing services, so making the gateway responsible for their implementation yields more focused services and a consistent interface.
In their role as routers, gateways provide a single endpoint for clients to consume. When the gateway receives a request, it forwards that request on to one or more services. This decouples the task to be done from the services that accomplish that task. If the services used to accomplish the task change, clients do not necessarily have to change the request.
Routing also provides flexibility for introducing new functionality. When a service deploys a new version, we can route requests to the new version for only a subset of clients. Assuming the partial rollout goes well, we can subsequently roll out the new version to everyone.
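As a sketch of how such a partial rollout might work, the snippet below routes a fixed percentage of clients to a new version by hashing the client ID. The service names (`orders-v1`, `orders-v2`) and the percentage are hypothetical, not part of any particular gateway's API.

```python
import hashlib

def pick_backend(client_id: str, canary_percent: int) -> str:
    """Route a stable subset of clients to the new service version.

    Hashing the client ID (rather than choosing randomly) keeps each
    client pinned to the same backend across requests.
    """
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
    return "orders-v2" if bucket < canary_percent else "orders-v1"
```

Raising `canary_percent` over time moves more clients onto the new version without any client-side change, which is the decoupling the routing role provides.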
Keep in mind that routing all traffic to the gateway introduces a single point of failure. To mitigate this risk, it’s important to design the gateway for resiliency. Resiliency involves maintaining the availability of the gateway in the face of both well-intentioned and nefarious requests. Whether it’s an avalanche of legitimate requests or a denial-of-service attack, many of the same strategies apply. These strategies include authentication and authorization, IP whitelists, caching, and rate limiting.
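Rate limiting, one of the strategies above, can be sketched as a per-client token bucket; the rate and capacity values below are illustrative only.

```python
import time

class TokenBucket:
    """Per-client rate limiter: allow `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The gateway would keep one bucket per client (or per IP) and reject requests when `allow()` returns `False`, shielding backing services from both request avalanches and abusive traffic.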
Performance, or how long it takes the gateway to respond to a request, is closely related to resiliency. Assuming clients communicate with the gateway via HTTP, there is a threshold within which the gateway must send a response to prevent clients from timing out the request. To keep the gateway resilient and performant, the code we execute on the gateway should be short-lived.
To keep execution time short, a gateway often communicates with services asynchronously. This allows the gateway to handle other requests while it waits for responses. A common implementation of this paradigm is the event loop concurrency pattern. The event loop processes requests on a single thread by offloading the work to be done via asynchronous service calls. While the event loop waits for the service calls to complete, it processes other requests.
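A minimal sketch of this pattern using Python's `asyncio` event loop; `call_service` is a stand-in for a real non-blocking HTTP call to a backing service.

```python
import asyncio

async def call_service(name: str, delay: float) -> dict:
    # Stand-in for a non-blocking HTTP call to a backing service.
    await asyncio.sleep(delay)
    return {"service": name}

async def handle_request() -> list:
    # While these calls are in flight, the event loop is free to
    # process other incoming requests on the same thread.
    return await asyncio.gather(
        call_service("users", 0.05),
        call_service("orders", 0.05),
    )

results = asyncio.run(handle_request())
```

Both simulated calls run concurrently, so the handler completes in roughly the time of the slowest call rather than the sum of both.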
To ensure high availability, Microsoft recommends deploying at least two replicas of the gateway. From there, we can scale out the gateway further based on load. It can also make sense to run the gateway on dedicated nodes in a clustered environment to prevent noisy-neighbor problems.
Despite implementing strategies for maintaining resiliency and performance, we may still choose to partition the public interface into multiple gateways. Partitioning can help organize gateway responsibilities from a logical perspective. Partitioning the gateway by API version, particular endpoints, or service criticality are common strategies. Another partitioning strategy is separating the interface based on the types of clients being served.
The average human reaction time is 250 milliseconds (a quarter of a second). Actions performed in less than 250 milliseconds appear instantaneous. For a browsing experience to feel instantaneous, reducing round-trip time is a leading consideration. The dominant contributor to round-trip time is typically network latency: the time it takes for a request to travel to the server and for the response to travel back.
In 2012, the average round-trip time for a single Google request was 100 milliseconds. Many web pages require more than a single request. The more requests required to render a webpage, the greater the aggregate latency. Yes, the browser can parallelize some requests, but there is also an overhead cost to parallelization. We may choose to aggregate requests when a unit of work the client wants performed is not handled by a single backing service. By aggregating the unit of work into a single request to the gateway, we can reduce latency, thereby providing a better browsing experience.
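As a back-of-the-envelope illustration using the 100-millisecond figure above, a hypothetical six-request page, and an idealized parallel fan-out:

```python
# Illustrative numbers only: per-request round-trip time of 100 ms.
RTT_MS = 100
requests = 6

# Client issues the six calls one after another.
sequential_ms = requests * RTT_MS

# Gateway fans out to the services in parallel; for simplicity we
# pessimistically assume gateway-to-service latency equals the client
# RTT (internal network latency is usually much lower).
aggregated_ms = RTT_MS + RTT_MS  # one client hop + one parallel fan-out
```

Under these assumptions the aggregated path takes 200 ms versus 600 ms sequentially, and the gap widens as the internal network gets faster relative to the client's connection.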
Note: Aggregation is not the same as request batching. Request batching reduces the number of requests between a client and a single service across multiple units of work. Aggregation reduces the number of requests required to complete a single unit of work.
Without a gateway, clients send requests directly to each service. In addition to increased latency, sending requests directly to each service exposes potential problems such as:
The gateway's value as an aggregator rests on the implicit assumption that it can aggregate requests more efficiently than the client can. For the gateway to perform this function efficiently, we can implement the following resiliency strategies:
Additional recommendations for resiliency:
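One widely used resiliency strategy for an aggregating gateway, assuming the asynchronous model described earlier, is to bound each downstream call with a timeout and substitute a partial result when a backing service is slow; `slow_service` below is a stand-in for a degraded dependency.

```python
import asyncio

async def call_with_fallback(coro, timeout: float, fallback: dict) -> dict:
    """Bound a downstream call so one slow service cannot stall
    the whole aggregated response; return a partial result instead."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return fallback

async def slow_service() -> dict:
    await asyncio.sleep(1.0)  # simulates a degraded backing service
    return {"status": "ok"}

result = asyncio.run(
    call_with_fallback(slow_service(), timeout=0.05,
                       fallback={"status": "unavailable"})
)
```

Returning a degraded-but-fast response keeps the gateway within the client's timeout threshold; the Circuit Breaker and Retry patterns cited in the references build on the same idea.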
To simplify application development, we can offload cross-cutting functionality into the gateway. Security issues such as token validation, encryption, and SSL certificate management require specialized skills. Almost all services need functions such as authentication, authorization, logging, and monitoring. Some of these functions are not easily packaged and configured as dependencies, so it may be better to consolidate them into the gateway to reduce overhead and the chance for errors.
Terminating inbound SSL connections is a common function of the gateway. This pattern keeps data encrypted between the client and the gateway while allowing unencrypted traffic to flow between internal services. It alleviates the need to distribute and maintain certificates across backing services. The core engineering team can focus on application features, while security experts focus on authentication, authorization, and network monitoring at the gateway rather than at every level of the architecture.
Offloading functions such as logging and monitoring to the gateway provides a level of consistency. Even if an individual service is not properly instrumented, the gateway ensures we have a minimum level of logs available. The gateway can also take care of more specialized monitoring activities such as rate limiting.
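As a sketch of how the gateway can guarantee a baseline of logging regardless of how well each service is instrumented, the decorator below wraps a hypothetical route handler with an access-log line; the request/response shapes are assumptions for illustration.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gateway")

def with_access_log(handler):
    """Wrap any route handler so every request gets a baseline log
    line, even if the backing service emits none of its own."""
    def wrapped(request: dict) -> dict:
        start = time.monotonic()
        response = handler(request)
        log.info("method=%s path=%s status=%s duration_ms=%.1f",
                 request.get("method"), request.get("path"),
                 response.get("status"),
                 (time.monotonic() - start) * 1000)
        return response
    return wrapped

@with_access_log
def hello(request: dict) -> dict:
    # Stand-in for forwarding the request to a backing service.
    return {"status": 200, "body": "ok"}

response = hello({"method": "GET", "path": "/hello"})
```

Because the wrapper applies uniformly at the gateway, every route gets the same minimum set of fields (method, path, status, duration) for monitoring.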
Additional functions commonly handled by the gateway include:
Offloading functionality to the gateway is a balancing act. As discussed in the routing section, we must ensure the gateway maintains a reasonable level of performance and is resilient to failure. Practical recommendations for offloading include:
API Gateways play a critical role in microservices architecture, acting as a mediator between clients and services. While there is no one-size-fits-all approach for which responsibilities a gateway handles, at a high level gateways handle routing, request aggregation, and cross-cutting concerns. Because gateways act as the single interface for client requests, it’s critical to ensure an acceptable level of performance as well as resiliency to backing service failures.
Microsoft. (2018, October 22). Using API gateways in microservices. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/microservices/design/gateway
Microsoft. (2017, June 22). Gateway Routing pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-routing
Microsoft. (2017, June 22). Gateway Aggregation pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-aggregation
Microsoft. (2017, June 22). Gateway Offloading pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-offloading
Microsoft. (2017, June 22). Bulkhead pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/bulkhead
Microsoft. (2017, June 22). Circuit Breaker pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker
Microsoft. (2017, June 22). Retry pattern. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/patterns/retry
PubNub Staff. (2015, February 9). How Fast is Realtime? Human Perception and Technology. Retrieved from https://www.pubnub.com/blog/how-fast-is-realtime-human-perception-and-technology/
Grigorik, Ilya. (2012, July 19). Latency: The New Web Performance Bottleneck. Retrieved from https://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/
KeyCDN. (2018, October 4). What Is Latency and How to Reduce It. Retrieved from https://www.keycdn.com/support/what-is-latency