Developing and operating software used to be two separate jobs, performed by two different groups of people. Developers wrote software, and they passed it on to operations staff, who ran and maintained the software in production (that is to say, serving real users, instead of merely running under test conditions). Like computers that need their own floor of the building, this separation has its roots in the middle of the last century. Software development was a very specialist job, and so was computer operation, and there was very little overlap between the two.
Indeed, the two departments had quite different goals and incentives, which often conflicted with each other. Developers tend to be focused on shipping new features quickly, while operations teams care about making services stable and reliable over the long term.
When the cloud appeared on the horizon, things changed. Distributed systems are complex, and the internet is very big. The technicalities of operating the system, including recovering from failures, handling timeouts, and smoothly upgrading versions, are not so easy to separate from the design, architecture, and implementation of the system.
Further, “the system” is no longer just your software: it comprises in-house software, cloud services, open source libraries, network resources, load balancers, monitoring, content distribution networks, firewalls, DNS, and so on. All these things are intimately interconnected and interdependent. The people who write the software have to understand how it relates to the rest of the system, and the people who operate the system have to understand how the software works… or fails.
DevOps and Microservices
The origins of DevOps lie in attempts to bring these two groups together: to collaborate, to share understanding, to share responsibility for systems reliability and software correctness, and to improve the scalability of both the software systems and the teams of people who build them.
Microservice architecture emerged from a common set of DevOps ideologies that came into being at companies like Amazon, Netflix, Facebook and Google. In many cases, those companies started with monolithic applications, which rapidly evolved into decomposed services, which then communicated via RESTful APIs and other network-based messaging protocols to become the harbingers of a microservices-based architecture.
The benefits of decomposing an application into a suite of microservices for development teams are:
- Modularity
- Integration of heterogeneous and legacy systems
- Distributed development
For the Ops in DevOps, please see our post “How to industrialise API operations?”
The challenges
A key step in defining a microservice architecture is figuring out how big an individual microservice has to be. There is no consensus on this, as the right answer depends on the business and organisational context. For instance, Amazon famously uses a service-oriented architecture in which a service often maps 1:1 to a team of 3 to 10 engineers. It is considered bad practice to make a service too small, as the runtime overhead and the operational complexity can then overwhelm the benefits of the approach. When things get too fine-grained, alternative approaches must be considered, such as packaging the function as a library or moving it into other microservices.
In general, make sure that the portfolio of APIs (representing those microservices) is leading to a simpler, more adaptable IT landscape, combining agility with maintainability. One of the dangers of having autonomous teams building APIs and microservices is a complex proliferation of dependencies across a large network of connected APIs. Taking an architecture-guided approach to control this complexity, which allows light touch coordination across teams, avoids the pitfall of “API spaghetti” without requiring classical, monolithic, waterfall ways of working.
Perhaps the most important point to keep in mind, though, is that APIs are about data – providing data to applications in a reliable and performant way. So it’s critical to have a data architecture modeled, as part of the overall enterprise architecture, that enables data objects to be mapped to the APIs, applications and services that provide and consume them. This gives visibility into duplication of data services and APIs, enabling rationalization and optimization of the portfolio.
This post discusses architecture patterns for:
- Decomposition: how to decompose the application into a set of microservices?
- Integration: how to integrate the microservices with the outside world?
- Database: how to manage databases in a decomposed architecture?
- Observability: how to monitor the entire set of microservices as one whole?
1. Decomposition Patterns
Decompose by Business Capability
Microservice architecture is all about making services loosely coupled, applying the single responsibility principle. However, breaking an application into smaller pieces has to be done logically. How do we decompose an application into small services?
One strategy is to decompose by business capability. A business capability is something that a business does in order to generate value. The set of capabilities for a given business depends on the type of business. For example, the capabilities of an insurance company typically include sales, marketing, underwriting, claims processing, billing, compliance, etc. Each business capability can be thought of as a service, except that it’s business-oriented rather than technical.
Decompose by Subdomain
Decomposing an application using business capabilities might be a good start, but you will come across so-called “Super Classes” which will not be easy to decompose. These classes will be common among multiple services. For example, the Order class will be used in Order Management, Order Taking, Order Delivery, etc. How do we decompose them?
For the “Super Classes” issue, Domain-Driven Design comes to the rescue. It uses subdomains and bounded context concepts to solve this problem. Domain-Driven Design breaks the whole domain model created for the enterprise into subdomains. Each subdomain will have a model, and the scope of that model will be called the bounded context. Each microservice will be developed around the bounded context.
Note that identifying subdomains is not an easy task. It requires an understanding of the business. Like business capabilities, subdomains are identified by analyzing the business and its organizational structure and identifying the different areas of expertise.
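To make the bounded-context idea concrete, here is a small, hypothetical TypeScript sketch: the same business concept (Order) is modeled separately inside each context, so no shared “Super Class” is needed, and only the identifier crosses the boundary. All names are illustrative.

```ts
// Bounded context: Order Taking. Its model cares about line items and payment.
namespace OrderTaking {
  export interface Order {
    orderId: string;
    lines: { sku: string; quantity: number }[];
    totalCents: number;
  }
}

// Bounded context: Order Delivery. Its model cares about addresses and status.
namespace OrderDelivery {
  export interface Order {
    orderId: string; // only the identifier is shared across contexts
    shippingAddress: string;
    status: "PENDING" | "SHIPPED" | "DELIVERED";
  }
}
```

Each microservice is then built around one of these models, and the two definitions of Order are free to evolve independently.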
Wrappers
So far, the design patterns we have discussed decompose greenfield applications, but 80% of the work we do is with brownfield applications, which are big, monolithic applications. Applying the above design patterns to them is difficult, because breaking them into smaller pieces while they are serving live traffic is a big task.
The Wrapper pattern (known elsewhere as the Strangler pattern) comes to the rescue. It works well with web applications, where calls go back and forth, and for each URI the functionality can be broken into different domains and hosted as separate services. The idea is to do it one domain at a time. This creates two separate applications that live side by side in the same URI space. Eventually, the newly refactored application wraps the original application, until finally you can shut the monolithic application off.
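As a rough illustration, here is a minimal TypeScript sketch of the routing half of the pattern, assuming Node 18+ (for the built-in fetch) and hypothetical addresses for the new service and the monolith. Header and body forwarding are omitted for brevity.

```ts
import http from "node:http";

const NEW_ORDERS_SERVICE = "http://localhost:8081"; // hypothetical new service
const LEGACY_MONOLITH = "http://localhost:8080";    // hypothetical monolith

// Route one domain's URIs to the new service; every other URI still goes
// to the monolith. Over time more prefixes migrate, until nothing is left.
http.createServer(async (req, res) => {
  const target = req.url?.startsWith("/orders")
    ? NEW_ORDERS_SERVICE
    : LEGACY_MONOLITH;

  const upstream = await fetch(target + req.url);
  res.writeHead(upstream.status);
  res.end(await upstream.text());
}).listen(3000);
```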
2. Integration Patterns
API Gateway
When an application is broken down into smaller microservices, there are a few concerns that need to be addressed:
- How to call multiple microservices while abstracting away producer information.
- Different channels (desktop, mobile, and tablet) need different data from the same backend service, as their UIs might differ.
- Different consumers might need the responses of reusable microservices in different formats. Who will do the data transformation or field manipulation?
- How to handle different types of protocols, some of which might not be supported by the producer microservice.
An API Gateway helps to address many of the concerns raised by a microservice implementation, and not only the ones above; a minimal sketch follows the list below.
- An API Gateway is the single point of entry for any microservice call.
- An API Gateway can be the first line of defense against cyber attacks and can verify OAuth 2.1 tokens on behalf of the microservices.
- It can work as a proxy service to route a request to the concerned microservice, abstracting the producer details.
- It can fan out a request to multiple services and aggregate the results to send back to the consumer.
- One-size-fits-all APIs cannot satisfy every consumer’s requirements; an API Gateway can expose a fine-grained API for each specific type of client.
- It can also convert a request from one protocol (e.g. AMQP) to another (e.g. HTTP) and vice versa, so that both producer and consumer can handle it.
- It can also offload the authentication/authorization responsibility of the microservice.
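The sketch below shows two of these responsibilities in one place: a single entry point that checks a bearer token and then proxies to the owning microservice. It assumes Node 18+, and the route table, backend addresses, and token check are all placeholders, not a prescribed implementation.

```ts
import http from "node:http";

// Hypothetical backend addresses hidden behind the gateway.
const ROUTES: Record<string, string> = {
  "/customers": "http://localhost:8081",
  "/orders": "http://localhost:8082",
};

http.createServer(async (req, res) => {
  // Single point of entry: verify the bearer token before anything else.
  // (verifyToken is a stand-in for real OAuth token introspection.)
  if (!verifyToken(req.headers["authorization"])) {
    res.writeHead(401);
    return res.end("unauthorized");
  }

  // Proxy: route to the owning microservice, abstracting producer details.
  const prefix = Object.keys(ROUTES).find((p) => req.url?.startsWith(p));
  if (!prefix) {
    res.writeHead(404);
    return res.end("no such route");
  }
  const upstream = await fetch(ROUTES[prefix] + req.url);
  res.writeHead(upstream.status, { "content-type": "application/json" });
  res.end(await upstream.text());
}).listen(3000);

function verifyToken(header?: string): boolean {
  return header?.startsWith("Bearer ") ?? false; // placeholder check only
}
```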
Aggregator
We have talked about resolving the data aggregation problem in the API Gateway pattern, but what about collating the data returned by each service? This responsibility cannot be left to the consumer, as it would then need to understand the internal implementation of the producer application.
The Aggregator pattern is about aggregating the data from different services and then sending the final response to the consumer. This can be done in two ways:
- A composite microservice will make calls to all the required microservices, consolidate the data, and transform the data before sending back.
- An API Gateway can also partition the request to multiple microservices and aggregate the data before sending it to the consumer.
If any business logic is to be applied, choose a composite microservice; otherwise, the API Gateway is the established solution.
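Here is a minimal sketch of the composite-microservice variant, assuming Node 18+ and two hypothetical producer services. It fans the calls out in parallel, consolidates the data, and returns one response to the consumer.

```ts
import http from "node:http";

const CUSTOMER_SVC = "http://localhost:8081"; // hypothetical producer
const ORDER_SVC = "http://localhost:8082";    // hypothetical producer

http.createServer(async (req, res) => {
  // Naive parsing: assumes ?customerId= is present on the request.
  const customerId = new URL(req.url ?? "/", "http://composite")
    .searchParams.get("customerId");

  // Call both producers in parallel, then merge their results.
  const [customer, orders] = await Promise.all([
    fetch(`${CUSTOMER_SVC}/customers/${customerId}`).then((r) => r.json()),
    fetch(`${ORDER_SVC}/orders?customerId=${customerId}`).then((r) => r.json()),
  ]);

  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ customer, orders })); // the consolidated view
}).listen(3000);
```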
Client-Side Skeleton
When services are developed by decomposing business capabilities/subdomains, the services responsible for the user experience have to pull data from several microservices. In the monolithic world, there was only one call from the UI to a backend service to retrieve all data and refresh/submit the UI page. That is no longer the case, so we need a different approach.
With microservices, the UI has to be designed as a skeleton with multiple sections/regions of the screen/page. Each section makes a call to an individual backend microservice to pull its data. This is called composing UI components specific to each service. Frameworks like AngularJS and ReactJS make this easy. Such applications are known as Single Page Applications (SPAs), and this approach enables the app to refresh a particular region of the screen instead of the whole page.
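A browser-side sketch of the skeleton idea, with hypothetical endpoints and element ids: each region of the page fetches its content from its own backend microservice and can be refreshed independently of the others.

```ts
// Each region of the page maps to one backend microservice (all URLs invented).
const regions: { elementId: string; endpoint: string }[] = [
  { elementId: "profile", endpoint: "/api/profile-service/me" },
  { elementId: "orders", endpoint: "/api/order-service/recent" },
  { elementId: "recommendations", endpoint: "/api/reco-service/top" },
];

// Hydrate every section independently; re-running one entry refreshes
// just that region instead of reloading the whole page.
for (const { elementId, endpoint } of regions) {
  fetch(endpoint)
    .then((r) => r.text())
    .then((html) => {
      document.getElementById(elementId)!.innerHTML = html;
    });
}
```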
Webhooks
A typical API lets developers read the latest data from your application. Perhaps they can query for specific data, and they may even be able to write new data if they have appropriate permissions. These CRUD operations (Create, Read, Update, and Delete) provide the basic elements needed to emulate the functionality of your application.
However, your application exists within a network of other applications. Since changes to your user data affect many others, you need a robust model that takes this reality into account. An event-driven approach ensures you can send the right notification to the right application the moment a change occurs.
Let’s say one feature of your application includes contacts, each of which has a name, email address, and phone number. The moment a new contact is added to your application, some other application may want to use that data to send a text message, look up their email address in another system, or backup the contact data. Without a way to alert other applications when a new contact is added, you force them to constantly poll for new data.
Polling is bad for everyone. It’s taxing to servers on both sides, which spend most of their time and resources on requests and responses that contain nothing new. When there is data, it’s only as fresh as the polling interval, which means it might not even satisfy a user’s thirst for instant data access. Other than webhooks, the only way to have up-to-the-second access to new data is to poll every second. While some developers may try that, it’s not likely to be sustainable.
Webhooks are an architectural pattern that enable developers to receive updates to data as they happen rather than polling for the latest updates. The investment you put in up front building a webhook-enabled API can save your system resources and delight developers and end users alike.
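To ground this, here is a sketch of the producer side of a webhook for the contact example above, in TypeScript. The subscriber URL, event name, and payload shape are assumptions, not a prescribed format.

```ts
interface Contact {
  name: string;
  email: string;
  phone: string;
}

// URLs that other applications registered to be notified at (hypothetical).
const subscribers: string[] = ["https://example.com/hooks/contact-created"];

async function onContactCreated(contact: Contact): Promise<void> {
  // Push the event to every subscriber the moment it happens,
  // so no one has to poll for new contacts.
  await Promise.all(
    subscribers.map((url) =>
      fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ event: "contact.created", data: contact }),
      })
    )
  );
}

onContactCreated({ name: "Ada", email: "ada@example.com", phone: "555-0100" })
  .catch(console.error);
```

Production webhook systems also add retries, signatures for verification, and subscription management, which are omitted here.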
3. Database Patterns
Database per Service
How should the database architecture be defined for microservices? The following concerns need to be addressed:
- Services must be loosely coupled. They can be developed, deployed, and scaled independently.
- Business transactions may enforce invariants that span multiple services.
- Some business transactions need to query data that is owned by multiple services.
- Databases must sometimes be replicated and sharded in order to scale.
- Different services have different data storage requirements.
To address these concerns, design one database per microservice; it must be private to that service and accessed through the microservice’s API only. It cannot be accessed by other services directly. For relational databases, we can use private-tables-per-service, schema-per-service, or database-server-per-service. Each microservice should also have its own database credentials, so that access can be restricted and a barrier put up to prevent it from using other services’ tables.
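As a small illustration of that access barrier, here is a hypothetical configuration sketch: each service connects to its own database with its own credentials, so neither can read the other’s tables. All hosts and names are made up.

```ts
// Order service: owns the orders database and nothing else.
const orderServiceDb = {
  host: "orders-db.internal",
  database: "orders",
  user: "orders_svc", // this user can only see order tables
  password: process.env.ORDERS_DB_PASSWORD,
};

// Customer service: separate database, separate credentials.
const customerServiceDb = {
  host: "customers-db.internal",
  database: "customers",
  user: "customers_svc", // separate id = separate access barrier
  password: process.env.CUSTOMERS_DB_PASSWORD,
};
```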
Shared Database per Service
We have said that one database per service is the ideal for microservices, but that is only possible when the application is greenfield. If the application is a monolith being broken into microservices, denormalisation is not that easy. What is a suitable architecture in that case?
A shared database per service is not ideal, but it is a working solution for that scenario. Most people consider it an anti-pattern for microservices, but for brownfield applications it is a good start towards breaking the application into smaller logical pieces. It should not be applied to greenfield applications. In this pattern, one database can be aligned with more than one microservice, but that has to be restricted to two or three services at most, otherwise scaling, autonomy, and independence will be challenging to achieve.
Command & Query Segregation
Once we implement database-per-service, queries that need to join data owned by multiple services become impossible. So how do we implement such queries in a microservice architecture?
Command & Query Segregation (better known as CQRS, Command Query Responsibility Segregation) suggests splitting the application into two parts — the command side and the query side. The command side handles Create, Update, and Delete requests. The query side answers queries using materialized views. The event sourcing pattern is generally used along with it to create events for any data change; the materialized views are kept up to date by subscribing to that stream of events.
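The in-process sketch below shows the shape of the idea, with all names invented: commands record events, and the query side applies those events to keep a materialized view current. A real system would put a broker and an event store between the two sides rather than a shared array.

```ts
type OrderEvent = {
  kind: "OrderCreated";
  orderId: string;
  customerId: string;
  totalCents: number;
};

const eventLog: OrderEvent[] = [];                 // stand-in event store
const totalsByCustomer = new Map<string, number>(); // the materialized view

// Command side: handles writes and records an event for each data change.
function createOrder(orderId: string, customerId: string, totalCents: number) {
  const event: OrderEvent = { kind: "OrderCreated", orderId, customerId, totalCents };
  eventLog.push(event);
  applyToView(event); // in production this happens via a subscription
}

// Query side: keeps the view updated from events and answers reads from it,
// never from the command model.
function applyToView(event: OrderEvent) {
  const prev = totalsByCustomer.get(event.customerId) ?? 0;
  totalsByCustomer.set(event.customerId, prev + event.totalCents);
}

function getCustomerTotal(customerId: string): number {
  return totalsByCustomer.get(customerId) ?? 0;
}

createOrder("o-1", "c-42", 2500);
console.log(getCustomerTotal("c-42")); // 2500, served by the query side
```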
Pairing
When each service has its own database and a business transaction spans multiple services, how do we ensure data consistency across services? For example, for an e-commerce application where customers have a credit limit, the application must ensure that a new order will not exceed the customer’s credit limit. Since Orders and Customers are in different databases, the application cannot simply use a local ACID transaction.
A Pairing (commonly known as a saga) represents a high-level business process that consists of several sub-requests, each of which updates data within a single service. Each request has a compensating request that is executed when the request fails. It can be implemented in two ways (sketched in code after the list):
- Choreography — When there is no central coordination, each service produces and listens to other services’ events and decides whether an action should be taken.
- Orchestration — An orchestrator (object) takes responsibility for a saga’s decision making and sequencing business logic.
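A minimal sketch of the orchestration variant, using the credit-limit example above: the orchestrator runs each sub-request in order and executes the compensating requests in reverse if one fails. Every service call here is a hypothetical stub.

```ts
interface SagaStep {
  action: () => Promise<void>;     // the sub-request to a single service
  compensate: () => Promise<void>; // undoes it if a later step fails
}

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step);
    }
  } catch (err) {
    // A step failed: run compensating requests in reverse order.
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    throw err;
  }
}

// Hypothetical stand-ins for calls to the Customer and Order services.
const reserveCredit = async () => { /* POST to Customer service */ };
const releaseCredit = async () => { /* compensating POST */ };
const placeOrder = async () => { /* POST to Order service */ };
const cancelOrder = async () => { /* compensating POST */ };

runSaga([
  { action: reserveCredit, compensate: releaseCredit },
  { action: placeOrder, compensate: cancelOrder },
]).catch(console.error);
```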
4. Observability Patterns
Log Aggregation
Consider a use case where an application consists of multiple service instances that are running on multiple machines. Requests often span multiple service instances. Each service instance generates a log file in a standardized format. How can we understand the application behavior through logs for a particular request?
We need a centralized logging service that aggregates logs from each service instance. Users can search and analyze the logs, and they can configure alerts that are triggered when certain messages appear. For example, Pivotal Cloud Foundry (PCF) has Loggregator, which collects logs from each component (router, controller, Diego, etc.) of the PCF platform along with application logs. AWS CloudWatch does the same.
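The aggregation service itself is off-the-shelf tooling, but the “standardized format” half of the pattern lives in each service. A sketch, with invented field names: one JSON object per log line, so the central service can index service name, request id, and level uniformly.

```ts
// Emit one JSON object per line; aggregators parse these fields directly.
function log(level: "info" | "error", message: string, requestId: string) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    service: "order-service", // hypothetical service name
    level,
    requestId, // lets the aggregator stitch together one request's logs
    message,
  }));
}

log("info", "order accepted", "req-42");
```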
Performance Metrics
When the service portfolio grows under a microservice architecture, it becomes critical to keep a watch on transactions so that patterns can be monitored and alerts sent when an issue happens. How should we collect metrics to monitor application performance?
A metrics service is required to gather statistics about individual operations. It should aggregate the metrics of an application service and provide reporting and alerting. There are two models for aggregating metrics (a sketch of the pull model follows the list):
- Push — the service pushes metrics to the metrics service, e.g. New Relic, AppDynamics
- Pull — the metrics service pulls metrics from the service, e.g. Prometheus
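As an illustration of the pull model, here is a sketch of a service exposing a request counter at /metrics in a Prometheus-style text format; the metric name and port are arbitrary choices for this example.

```ts
import http from "node:http";

let requestCount = 0;

http.createServer((req, res) => {
  if (req.url === "/metrics") {
    // The pull-based metrics service (e.g. Prometheus) scrapes this endpoint.
    res.writeHead(200, { "content-type": "text/plain" });
    return res.end(`http_requests_total ${requestCount}\n`);
  }
  requestCount++; // count ordinary traffic
  res.writeHead(200);
  res.end("ok");
}).listen(3000);
```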
Distributed Tracing
In microservice architecture, requests often span multiple services. Each service handles a request by performing one or more operations, often calling other services in turn. How, then, do we trace a request end-to-end to troubleshoot a problem?
We need a service that does the following (a minimal sketch follows the list):
- Assigns each external request a unique external request id.
- Passes the external request id to all services.
- Includes the external request id in all log messages.
- Records information (e.g. start time, end time) about the requests and operations performed when handling an external request in a centralized service.
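A sketch of that plumbing in TypeScript, assuming Node 18+. The x-request-id header is a common convention rather than a standard, and the downstream URL is hypothetical.

```ts
import http from "node:http";
import { randomUUID } from "node:crypto";

http.createServer(async (req, res) => {
  // Assign each external request a unique id (or reuse an inbound one).
  const requestId = (req.headers["x-request-id"] as string) ?? randomUUID();

  // Include the request id in every log message.
  console.log(JSON.stringify({ requestId, msg: "handling " + req.url }));

  // Pass the request id along to every downstream service we call.
  await fetch("http://localhost:8081/inventory", { // hypothetical downstream
    headers: { "x-request-id": requestId },
  });

  res.writeHead(200, { "x-request-id": requestId });
  res.end("done");
}).listen(3000);
```

Recording start/end times against the same id in a centralized service (the last point above) is what tools like Zipkin and Jaeger build on top of this propagation.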
Health Checks
When a microservice architecture has been implemented, there is a chance that a service is up but not able to handle transactions. In that case, how do you ensure that requests don’t go to those failed instances? By implementing a health check that the load balancing pattern can use.
Each service needs an endpoint, such as /health, which can be used to check the health of the application. This API should check the status of the host, the connection to other services/infrastructure, and any service-specific logic.
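A minimal sketch of such an endpoint, assuming Node 18+; the dependency URL is hypothetical. A load balancer polling /health would take this instance out of rotation on a non-200 response.

```ts
import http from "node:http";

// Check one downstream dependency; real services would check several,
// plus local concerns like disk space or database connections.
async function dependencyHealthy(): Promise<boolean> {
  try {
    const r = await fetch("http://localhost:8081/health"); // hypothetical
    return r.ok;
  } catch {
    return false;
  }
}

http.createServer(async (req, res) => {
  if (req.url === "/health") {
    const healthy = await dependencyHealthy();
    // 200 keeps the instance in rotation; 503 takes it out.
    res.writeHead(healthy ? 200 : 503, { "content-type": "application/json" });
    return res.end(JSON.stringify({ status: healthy ? "UP" : "DOWN" }));
  }
  res.writeHead(404);
  res.end();
}).listen(3000);
```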

