David Freeman works with a consultancy firm called Sonrai Consulting. In this article, he discusses building on-premise hybrid API platforms.
An obvious question is: why build on-premise when everything seems to be going Cloud-first? But there are some very good reasons why people still build applications on-premise.
- Regulatory requirements – Regulations, or the terms and conditions of cloud providers, may prevent you from putting your data on the Cloud.
- Location – The Cloud is not everywhere. API responses often need to come back within a few seconds at most, and the farther away the servers are, the longer the response time. For businesses with very latency-sensitive data, location is very important.
- Resiliency – Customers believe that on-premise applications have less downtime and give them more control.
- Political – As “cancel culture” grows, organizations worry that saying the wrong thing on social media will get them de-platformed, so some want to be anti-fragile and move away from cloud operators.
When we build an application or an API platform at home, we can get from start to finish quickly. In an enterprise, projects take much longer. Systems tend to be designed around the organizational structure: you have network, application, integration, and infrastructure teams, held together by project managers, architects, engineering leads, and product and platform owners. Getting all of them to work in concert brings challenges – approval cycles, testing, coordination, technology readiness, and so on. These are inherited limitations of the enterprise business that can pose a challenge.
Case study
We had to build an on-premise API platform as an extension of the enterprise’s existing API platform on the Cloud – a hybrid platform. It had to span two sites in two states, running the API gateways on-premise in active-active mode, and it had to survive a total cloud outage: if the Cloud went down, the platform had to stay up and running, all the time. The timeline for delivery was less than three months.
Let’s look at some of the learnings, tips, and tricks. The first is that the schedule needs to be five to six months long. For the API platform itself, you need two dedicated, cross-skilled people who know API platforms and how to build API networks, and who also know Kubernetes and the other pieces of the technology stack needed for a successful outcome. Ensure that the architect and the designer are builders. Create a sandpit environment. Ensure that the builders provide support.
Physical infrastructure
For on-premise physical infrastructure, you have to consider a few things:
- Physical infrastructure delivery time – Getting the servers, network equipment, and storage into your data center can take several months, so factor that into your timeline.
- Data center space – Ensure there is enough physical space to install your equipment.
- Network capacity – Ensure you have enough network ports available at all times to meet your requirements.
- Matchy-matchy – The more critical your infrastructure, the more you want your lower environments to match production, even down to the hardware. This will, of course, depend on the project budget: the less critical the infrastructure, the less matchy-matchy it has to be.
Kubernetes deployment for your API platform
- Be careful of code or config drift as you move to higher environments. Manually deploying from dev up to production carries a very high risk of drift; before going into production, we had to come back and normalize that code across environments. One way to keep environments aligned is sketched after this list.
- Deploy as soon as you can. Deploy in dev first, and build as you design.
- Bring security on the journey. This ensures security is built in during design, so changes do not have to be made later.
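As a minimal sketch of one way to avoid that drift (Kustomize overlays – an assumption for illustration, not necessarily the tooling the project used), keep a single base of manifests and express only the per-environment differences as patches:

```yaml
# Illustrative layout: one shared base, thin per-environment overlays.
#
#   gateway/
#     base/
#       kustomization.yaml
#       deployment.yaml
#     overlays/
#       dev/kustomization.yaml
#       prod/kustomization.yaml

# --- base/kustomization.yaml ---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml

# --- overlays/prod/kustomization.yaml ---
# Only prod-specific differences live here; everything else comes
# from the base, so dev and prod cannot silently diverge.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: api-gateway        # hypothetical deployment name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 4
```

Each environment is then deployed with `kubectl apply -k overlays/<env>`, so the same reviewed base reaches production rather than hand-edited copies.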
Network security
Segregating your environment is critical. You may want to share clusters between different workloads, so it’s important to ensure they’re segregated. Network policies let you allow only specific traffic into each namespace or each OpenShift project, and you can back them with firewall rules.
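As a concrete illustration, a default-deny policy plus a single narrow allow rule might look like this; the namespace, label, and port values are assumptions for the example, not values from the project:

```yaml
# Deny all ingress traffic to the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: api-gateway            # hypothetical namespace
spec:
  podSelector: {}                   # selects every pod in the namespace
  policyTypes:
    - Ingress
---
# Then allow only the edge namespace to reach the gateway pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-edge-to-gateway
  namespace: api-gateway
spec:
  podSelector:
    matchLabels:
      app: gateway                  # hypothetical pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: edge   # assumed source namespace
      ports:
        - protocol: TCP
          port: 8443                # assumed gateway TLS port
```

OpenShift projects honor the same NetworkPolicy API, and data-center firewall rules can enforce the equivalent boundaries outside the cluster.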
Operations
- You should have local as well as cloud telemetry.
- Ensure that the time series database is on a PVC and not on ephemeral storage, so that the data is still there after a reboot.
- Set resource limits on the telemetry stack so that it does not impact production workloads. Both points are shown in the sketch after this list.
- Look at logging and indexing.
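Here is a minimal sketch of those two points, assuming a Prometheus-style time series database; the names, image tag, storage size, and limits are illustrative, not prescriptive:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring             # hypothetical namespace
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            - --storage.tsdb.path=/prometheus
          volumeMounts:
            - name: tsdb
              mountPath: /prometheus    # TSDB on the PVC, not ephemeral disk
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:                     # cap telemetry so it cannot starve
              cpu: "1"                  # production workloads on the node
              memory: 2Gi
  volumeClaimTemplates:                 # one PVC per replica
    - metadata:
        name: tsdb
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```

Because the TSDB sits on a PVC created from `volumeClaimTemplates`, the metrics history survives pod restarts and node reboots, and the CPU and memory limits keep the telemetry stack from crowding out production workloads.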