Shobha Ramu Shivaiah is a Senior Software Development Manager with Amazon Music. System design is a broad topic that plays a major role in development. This article discusses the role of APIs in system design.
System Design
System design is an organized process of defining a system’s architecture, components, and data. In a large-scale system, the transaction volume is huge. System design doesn’t just solve the immediate problems; it also tries to anticipate future problems and put your system in a position where those issues can be resolved. It’s like a blueprint that you establish for the entire system right after the requirements phase. You can then carry out the detailed design of the system’s individual components.
Scaling
Scaling matters because, as your APIs receive more requests, your systems need to be able to handle them. Systems can scale vertically or horizontally.
Vertical Scaling – Vertical scaling is where you add more CPU or RAM to your existing servers so that they can take in more API requests. Vertical scaling has its limits, though: there is only so much you can add to one machine, and even if you double the CPU or RAM, your performance may not double.
Horizontal Scaling – Horizontal scaling is where you add more nodes, that is, more servers. When you scale horizontally, there will be a load balancer in front of your APIs.
A load balancer can be simple. It can round-robin your API requests across the nodes. But usually, the load balancer logic is a little bit more complicated.
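As a rough illustration of the simple case, here is a minimal round-robin sketch. The class name `RoundRobinBalancer` and the node names are made up for this example; a real load balancer would also track node health.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out nodes in a fixed rotating order (illustrative only)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(list(nodes))

    def pick(self):
        # Each call returns the next node, wrapping around at the end.
        return next(self._nodes)


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)
```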
Mod-n hashing – When a request comes in, you hash the request and take the result mod N, where N is the number of nodes. The computed value determines which node your API request goes to. It is simple to implement. However, the disadvantage is that if you lose or add a node, N changes, so almost all of your requests have to be redistributed. This means it will take more time for requests to be served.
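The redistribution cost is easy to see in a small experiment. The sketch below assumes MD5 as the hash and uses made-up request keys and node names; it counts how many keys land on a different node after one of four nodes is lost.

```python
import hashlib


def stable_hash(key: str) -> int:
    # hashlib gives the same value on every run, unlike the built-in hash().
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_node(request_key: str, nodes: list) -> str:
    # Mod-n hashing: the hash modulo the node count selects the node.
    return nodes[stable_hash(request_key) % len(nodes)]


nodes = ["node-0", "node-1", "node-2", "node-3"]
keys = [f"request-{i}" for i in range(1000)]

before = {k: pick_node(k, nodes) for k in keys}
after = {k: pick_node(k, nodes[:-1]) for k in keys}  # one node lost

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved to a different node")
```

With a uniform hash, going from four nodes to three should move roughly three quarters of the keys, far more than the quarter that actually lived on the lost node.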
Consistent hashing – Consistent hashing is a slightly more intelligent version of mod-n hashing. There is an imaginary ring with slots from zero to N minus one (here N is the number of slots on the ring, not the number of nodes). When a request comes in, you hash the request, take the result mod N, and figure out which slot the request falls in. You then do the same for the nodes in your system: hash each node, take the result mod N, and place the node on the ring. The load balancer takes the request, finds the closest node in the clockwise direction, and puts your request on that node. The advantage is that if you lose or add a node, you do not have to redistribute all the requests; you only need to redistribute a section of them.
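The ring can be sketched in a few lines. This is a minimal version under stated assumptions: MD5 as the hash, a ring of 2**32 slots, one position per node (production systems often place each node at several positions), and made-up node and request names.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # assumed number of slots on the ring


def slot(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


class HashRing:
    def __init__(self, nodes):
        # Place every node on the ring, sorted by slot.
        self._ring = sorted((slot(n), n) for n in nodes)
        self._slots = [s for s, _ in self._ring]

    def pick(self, request_key: str) -> str:
        # First node at or after the request's slot, wrapping past the end.
        i = bisect.bisect_left(self._slots, slot(request_key)) % len(self._ring)
        return self._ring[i][1]


nodes = ["node-0", "node-1", "node-2", "node-3"]
keys = [f"request-{i}" for i in range(1000)]

full = HashRing(nodes)
smaller = HashRing(nodes[:-1])  # node-3 is lost

moved = [k for k in keys if full.pick(k) != smaller.pick(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that were on the lost node change owner; every other key stays where it was.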
This matters because, as an API team, you need to know what happens to a request. Understanding how requests arrive and how they are routed helps you decide during design how to handle them.
Microservice architecture
When you design a system, never start with the data modeling or the database. Always start from the front end, understand your customers’ needs, understand the use case, and start designing from front to back. Start placing services based on the user’s use case, and then design the data behind the services. When you design the services, think about microservice architecture or monolithic architecture. More likely than not, for a large-scale system, microservice architecture is the way to go. But it’s always important for you to question that and understand why you’re going with that architecture versus monolithic. The advantages of monolithic architecture are that there’s less back and forth between the client and the API. It is also simple to implement, especially if your use cases are limited. But microservice architecture is scalable, so if you have a large system such as an ERP, you need microservice architecture.
Proxy Servers and API proxies
Proxy servers are used to conceal identities. A forward proxy server conceals the client’s identity. A reverse proxy server conceals the server’s identity. Along with concealing identity, proxy servers can also do other things, such as hosting lightweight code like HTML, policies, or routing logic for your APIs. Proxy servers can also act as a cache: they can cache the response to an incoming API request and answer later identical requests from the cache without hitting your API. This saves network bandwidth and reduces the load on your API.
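The caching behavior can be sketched as a thin wrapper in front of a backend call. `CachingProxy` and `backend` are hypothetical names, and the sketch leaves out expiry and invalidation, which a real proxy cache needs.

```python
class CachingProxy:
    """Answers repeated requests from memory instead of calling the API."""

    def __init__(self, backend):
        self._backend = backend  # callable standing in for the real API
        self._cache = {}
        self.backend_calls = 0

    def get(self, path):
        if path not in self._cache:
            # Cache miss: forward to the API and remember the response.
            self.backend_calls += 1
            self._cache[path] = self._backend(path)
        return self._cache[path]


def backend(path):
    return f"response for {path}"


proxy = CachingProxy(backend)
first = proxy.get("/albums/42")
second = proxy.get("/albums/42")  # served from the cache
print(proxy.backend_calls)
```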
There’s also a concept called collapsed forwarding that can be implemented in a proxy server, where multiple requests for the same resource that come into the proxy server are collapsed into a single request sent to your API, and the response is then sent back to all the waiting clients, depending on the type of the request.
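One way to picture request collapsing is with `asyncio`: concurrent callers asking for the same path share a single in-flight backend call. This is only a sketch; `CollapsingProxy` and `slow_backend` are invented for the example and do not reflect any particular proxy product.

```python
import asyncio


class CollapsingProxy:
    def __init__(self, backend):
        self._backend = backend
        self._in_flight = {}  # path -> task for the pending backend call
        self.backend_calls = 0

    async def get(self, path):
        task = self._in_flight.get(path)
        if task is None:
            # First caller for this path triggers the real request.
            self.backend_calls += 1
            task = asyncio.create_task(self._fetch(path))
            self._in_flight[path] = task
        # Later callers simply wait on the same task.
        return await task

    async def _fetch(self, path):
        try:
            return await self._backend(path)
        finally:
            # Once answered, the next request starts a fresh call.
            self._in_flight.pop(path, None)


async def slow_backend(path):
    await asyncio.sleep(0.01)  # simulated API latency
    return f"response for {path}"


async def demo():
    proxy = CollapsingProxy(slow_backend)
    responses = await asyncio.gather(*(proxy.get("/songs/1") for _ in range(5)))
    return responses, proxy.backend_calls


responses, backend_calls = asyncio.run(demo())
print(backend_calls, len(responses))
```

Five clients receive a response, but the API behind the proxy is called once.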
API gateways are API proxies that specifically help with API security, rate limiting, and transformation.
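Rate limiting is often explained with a token bucket, so here is a minimal sketch of that one technique. It is an assumption that a given gateway works this way; the class name, capacity, and refill rate are illustrative, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last_seen = 0.0

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the previous request, up to capacity.
        elapsed = now - self.last_seen
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_seen = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically answer with HTTP 429


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)
```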
CAP Theorem
When you start designing a microservice architecture, you want performance, uptime, and resilience. However, in a distributed system with parallel computing, you will not be able to get all three of your goals. You can only get two of them. This limitation is captured in a theorem called the CAP theorem. The CAP theorem states, “Out of the three overarching goals, you can only achieve two.”
Three overarching goals in APIs
Consistency – Every API should be consistent. The same API request should return the same API response, no matter which server it’s coming from, at any given time.
Availability – You want all the APIs to be available 100% of the time.
Partition tolerance – In a distributed network, there can be a network failure. When a system is partitioned, one partition cannot talk to another, so the data can become inconsistent. Partition tolerance means the system keeps operating despite such a failure.
So, the CAP theorem says that you can only get two out of these three. For example, think of a system where there’s a network failure, and there’s a partition. If you want to keep the API consistent at that time, even with the partition, you can only do so by bringing down the APIs until the data is synced up. So, you cannot have availability if there’s a partition and you want to keep consistency. At the same time, if there is a partition and you want to keep the APIs available, then you can only do so by not giving back consistent responses. If you want both consistency and availability, then there cannot be a partition in your distributed system.
As a professional, this is the limitation you work within when you design your APIs. You will achieve only two of the three, so consider which ones are important for you. For example, if you’re building a social media API that gives back the number of likes, shares, or a news feed of comments, consistency may not be that important for you. So, when there’s a network failure, it’s okay if there is no consistency. You can give up on consistency and keep the APIs available because you want the social media API to stay up. But if you’re thinking about a financial API, you want the APIs to be consistent no matter what. That’s where you bring down the API and do not make it available until the partition is resolved. Knowing these limitations will help during system design.
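The trade-off can be made concrete with a toy replica that is cut off from its primary. In this sketch (all names and values are hypothetical), a replica in "AP" mode keeps answering with whatever it last saw, while a replica in "CP" mode refuses to answer until the partition heals.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode: str, value: int):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.value = value        # last value synced from the primary
        self.partitioned = False  # True while cut off from the primary

    def read(self) -> int:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value")
        return self.value  # may be stale in AP mode


likes = Replica(mode="AP", value=5)      # social media style counter
balance = Replica(mode="CP", value=100)  # financial style record
likes.partitioned = balance.partitioned = True

stale_likes = likes.read()  # still answers, possibly with old data
try:
    balance.read()
    balance_served = True
except Unavailable:
    balance_served = False  # consistent, but not available

print(stale_likes, balance_served)
```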
Design patterns
Services generally fall into three different patterns.
Replicated services – Replicated services are the most common. Every replica is identical to every other replica, and all are capable of supporting traffic. Each replica can serve any request. Because the replicas behave exactly the same no matter which server they run on, the load balancer in front of them doesn’t have to think much; it can just distribute requests across the replicas. This works for stateless services. Replicating stateless services improves reliability, redundancy, and scaling.
Sharded services – In a sharded service, each replica, or shard, is only responsible for serving a subset of all requests. A load-balancing node has extra logic for examining each request and distributing it to the appropriate shard for processing.
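The extra routing logic might look like the sketch below, which assumes requests are sharded by the first letter of a username. The shard names and letter ranges are hypothetical; real systems often shard by a hash of the key instead.

```python
# Hypothetical shards, each owning an inclusive range of first letters.
SHARD_RANGES = [
    ("shard-1", "a", "h"),
    ("shard-2", "i", "q"),
    ("shard-3", "r", "z"),
]


def shard_for(username: str) -> str:
    """Examine the request key and return the shard that owns it."""
    if not username:
        raise ValueError("username must not be empty")
    first = username[0].lower()
    for shard, low, high in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard owns {username!r}")


routes = {name: shard_for(name) for name in ["alice", "Mallory", "zed"]}
print(routes)
```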
Scatter-gather services – In a scatter-gather service, each request is simultaneously farmed out to all of the service replicas in the system. Each replica does a small amount of processing and then returns a fraction of the result to the root. The root server then combines the various partial results to form a complete response to the request and sends this response back to the client.
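A small sketch of the scatter and gather steps, using threads to stand in for replicas. The data, the `search_replica` helper, and the search use case are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the slice of data held by one replica.
REPLICA_DATA = [
    ["blue moon", "red sky"],
    ["blue train", "green day"],
    ["yellow sun", "kind of blue"],
]


def search_replica(titles, term):
    # One replica's small share of the work: search only its own slice.
    return [t for t in titles if term in t]


def scatter_gather(term):
    with ThreadPoolExecutor(max_workers=len(REPLICA_DATA)) as pool:
        # Scatter: send the same request to every replica at once.
        partials = pool.map(search_replica, REPLICA_DATA, [term] * len(REPLICA_DATA))
        # Gather: the root merges the partial results into one response.
        return sorted(t for partial in partials for t in partial)


results = scatter_gather("blue")
print(results)
```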
Checklist for APIs with a system design hat
Concurrency – When your system is large and you’re getting a lot of requests, you’re bound to get a lot of concurrent requests. It’s not a big deal if your APIs are read-only. But if your API updates data, you must keep concurrency in mind. For example, when you get an update request, you may not want to lock the resource because locking will slow the system. So, you can do nothing, in which case the last update wins. Or you can use optimistic locking, where an update based on stale data fails with an error telling the user that something has changed recently and that they need to refresh and then retry the update.
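Optimistic locking is commonly implemented with a version number on each record. The following is a minimal in-memory sketch under that assumption; `Store` and `StaleUpdateError` are hypothetical names, and a real API would typically carry the version in a field or header.

```python
class StaleUpdateError(Exception):
    """The record changed since the caller last read it."""


class Store:
    def __init__(self):
        self._records = {}  # key -> (version, value)

    def read(self, key):
        return self._records.get(key, (0, None))

    def update(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            # Someone else updated first: ask the caller to refresh and retry.
            raise StaleUpdateError(f"{key} is now at version {current_version}")
        self._records[key] = (current_version + 1, value)
        return current_version + 1


store = Store()
version_seen_by_a, _ = store.read("playlist-1")
version_seen_by_b, _ = store.read("playlist-1")

store.update("playlist-1", version_seen_by_a, "A's edit")  # succeeds
try:
    store.update("playlist-1", version_seen_by_b, "B's edit")
    conflict = False
except StaleUpdateError:
    conflict = True  # B must re-read before updating

print(store.read("playlist-1"), conflict)
```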
Network – Always optimize your API for low latency and high throughput.
- Enforce a maximum request payload size
- Stream where applicable
- Define use-case-specific endpoints
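The first two items in the list can be sketched as follows. The 1 MiB limit, the exception name, and the helper functions are assumptions chosen for illustration, not recommended values.

```python
MAX_PAYLOAD_BYTES = 1024 * 1024  # hypothetical 1 MiB limit


class PayloadTooLarge(Exception):
    pass


def check_payload(body: bytes, limit: int = MAX_PAYLOAD_BYTES) -> bytes:
    # Reject oversized requests before doing any real work on them.
    if len(body) > limit:
        raise PayloadTooLarge(f"{len(body)} bytes exceeds limit of {limit}")
    return body


def stream_in_chunks(items, chunk_size):
    """Yield a response chunk by chunk instead of building it all in memory."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


chunks = list(stream_in_chunks(range(5), chunk_size=2))
print(chunks)
```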
Real-world performance – The best feature your API can have is great performance. Load-test your APIs for real-world volume and scenarios.
Estimation – It’s very important to write down estimates for your API and revisit them throughout the system design, high-level design, detailed design, implementation, and testing phases.
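A back-of-the-envelope estimate can be as simple as the arithmetic below. Every input here (one million daily users, 20 requests per user, a peak factor of 3) is a made-up figure for illustration; substitute your own numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_rps(daily_users: int, requests_per_user: int, peak_factor: float):
    """Return (average, peak) requests per second for the given assumptions."""
    average = daily_users * requests_per_user / SECONDS_PER_DAY
    return average, average * peak_factor


# Hypothetical inputs: 1M daily users, 20 requests each, peak at 3x average.
average_rps, peak_rps = estimate_rps(1_000_000, 20, 3.0)
print(f"average ~{average_rps:.0f} rps, peak ~{peak_rps:.0f} rps")
```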
Availability and reliability – Understand what in the system can make your API go down and how you will respond to that.
All these factors will help you design your APIs better and provide a good user experience.