Omid Eidivandi is a technical lead. His primary role is modernizing distributed systems, with a focus on event-driven design. In this article, he discusses serverless sustainability.
Event-driven architecture and serverless software, thanks to the economic model they apply, are optimized to reduce resource consumption. When we talk about serverless, we talk about a set of services and communication details in different categories: compute, storage, brokers, events, integration between services, data flow, event flow, and everything transiting between these components.
What is the best metric to level up sustainability?
The best metric is one that is comprehensible to anyone in the company, that any human or machine can compare and differentiate, and that ultimately shows whether the direction taken toward sustainability is the right one. The most meaningful such metric is cost, which is familiar to every member of staff in the company.
Serverless compute billing is calculated with a simple model based on:
- Execution time
- Memory consumption
So improving your costs in a serverless design means avoiding unnecessary consumption as well as making optimal use of resources.
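As a rough sketch, the billing model above can be expressed as duration times allocated memory times a unit price. The price per GB-second below is an illustrative placeholder, not a quoted provider rate:

```python
# Illustrative sketch of the serverless compute billing model: cost
# grows with execution time and with allocated memory.
PRICE_PER_GB_SECOND = 0.0000166667  # placeholder unit price, not a real quote

def invocation_cost(duration_ms: float, memory_mb: int) -> float:
    """Cost of one invocation = duration (s) * memory (GB) * unit price."""
    gb_seconds = (duration_ms / 1000.0) * (memory_mb / 1024.0)
    return gb_seconds * PRICE_PER_GB_SECOND

# Halving the duration halves the cost at equal memory:
slow = invocation_cost(duration_ms=200, memory_mb=512)
fast = invocation_cost(duration_ms=100, memory_mb=512)
assert fast < slow
```

At equal memory, halving the execution time halves the cost, which is why the optimizations discussed later focus so heavily on duration.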
When thinking about sustainability, we need to ensure that the product and the business are aligned.
A few questions that we need to address are:
- How can we organize data?
- How do we use brokers?
- How do we transfer data?
- How to optimize events?
- How to optimize computing?
How to ensure the right resource consumption:
There is no silver bullet: when designing serverless solutions, for all pillars of the Well-Architected Framework we need to gather data, look at it, and improve the solution.
Continuous improvement has become a daily task over the last seven years of my experience:
- Continuously Observe
- Continuously Gather Data
- Continuously Evaluate
- Continuously Improve
The sustainability pillar likewise has no silver-bullet solution, but it can be observed, evolved, and improved iteratively. Sustainability must become a journey, not a one-size-fits-all decision.
When designing software, choosing the right datastore is a crucial decision and requires a good understanding of the requirements and of the interactions around the data. We can choose among relational databases, NoSQL databases, document databases, collection-based databases, and blob storage.
Datastores are basically used for CRUD operations, but the most resource-intensive operation has always been querying data, which is often based on complex conditions; in reality, even a very fast query can consume a large amount of resources behind the scenes.
One thing to consider is that not all datastores work and are designed in the same manner, and choosing the wrong datastore can lead to the wrong design.
When choosing a datastore, think about:
- CAP requirements (Consistency, Availability, and Partition tolerance)
- Indexing requirements
- Throughput needs
- Avoiding over-provisioning
Data is important in its own context, and a large part of the business is often represented as data. Keeping all of this data in place and alive reduces performance and increases resource consumption, both for storage and for querying.
So let's think about that data; let's be familiar with our software's data.
Is the data hot? Hot data is data that is used and consumed by the software in real time.
Is the data accessed frequently? Is it used and consumed frequently in real scenarios, and must it be available at any given time, now or later?
Is the data cold? Cold data is data that is never used, or used only occasionally for specific use cases.
Is the data needed? Is there a need to store it at all?
The data life cycle is the duration of time during which data exists in a specific storage type.
There are multiple storage classes, for example:
- Frequently accessed (standard)
- Infrequently accessed
- Archive
Based on the preceding questions we asked about the data, we can choose the best storage class to keep it in.
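As an illustration, the answers to those questions can be encoded in a lifecycle rule. The sketch below uses the shape of an S3 bucket lifecycle configuration; the prefix, day thresholds, and class choices are assumptions to adapt to your own data:

```python
# A minimal sketch of a storage lifecycle rule, shaped like an S3 bucket
# lifecycle configuration. Thresholds and classes are illustrative.

def lifecycle_rule(prefix: str) -> dict:
    return {
        "ID": f"tiering-{prefix}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Data no longer hot moves to an infrequent-access class, then to archive.
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 180, "StorageClass": "GLACIER"},
        ],
        # Data that is no longer needed at all is deleted.
        "Expiration": {"Days": 365},
    }

rule = lifecycle_rule("invoices/")
```

A dict like this could be passed to an API such as boto3's `put_bucket_lifecycle_configuration`; the point is that hot, infrequently accessed, cold, and unneeded data each get a matching storage treatment.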
Brokers are heavily used in serverless design, and they are one of the principal components of cloud-native designs.
Brokers are used to notify interested actors or to make crossing communication boundaries more flexible. Consumers fetch data, and that pattern has to be based on customer and business needs.
When designing APIs, consumers query the APIs to fetch their data based on criteria, but each consumer sends requests in its own style and based on its own needs. The question is: are all those requests useful? In other words, do all those requests yield meaningful results and data in the response?
How can we improve that design to improve the service and its resource consumption?
Let's notify consumers of any change in our system and let them react to that change.
This way we push data to the interested actors, avoiding extra resource consumption.
Brokers help improve consumption as well:
- Lightening communication
- Sharing responsibilities
- Inverting the dependency direction
- Notifying changes
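A minimal in-memory sketch of this push model: instead of every consumer polling the API, the producer publishes a change once and the broker fans it out to the interested actors. The `Broker` class here is a hypothetical stand-in for a managed topic or event bus:

```python
# In-memory fan-out sketch: one publish reaches every subscribed
# consumer, so no consumer has to poll for changes.
from collections import defaultdict
from typing import Callable

class Broker:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer notifies once; the broker delivers to all interested actors.
        for handler in self._subscribers[topic]:
            handler(event)

broker = Broker()
received = []
broker.subscribe("order.updated", received.append)
broker.publish("order.updated", {"orderId": "42", "status": "shipped"})
```

The dependency direction is inverted: consumers declare interest once instead of repeatedly asking the producer for news.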
When systems communicate, data passes from one side to the other; the source system exposes a well-defined data model that represents a part of the business process and must cover the consumers' needs.
In distributed systems, the consumers use the source based on versioned API documentation, and the exposed data model is a collection of elements that are not useful to all consumers at any given time.
In serverless design, these well-defined data models are present as well and are called Domain Events.
Domain Events: Domain Events represent an entity in a source system that is broadcast to one or all consumers. If these events get out of control, they can turn into Fat Events and become tricky in the long term; besides that, they can add overconsumption.
Integration Events: Integration Events are lightweight events used to let two systems communicate or be integrated into a distributed communication. An integration event lets the consumer be informed that something happened on the producer side, and also lets the consumer identify the event's purpose.
The consumer can decide, based on the integration info, whether or not it is interested in that event.
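To make the contrast concrete, here is an illustrative fat domain event next to a slim integration event; the field names are assumptions, not a standard:

```python
# A "fat" domain event drags the whole entity across the wire:
fat_domain_event = {
    "type": "OrderUpdated",
    "order": {                      # full entity state travels with the event
        "id": "42",
        "lines": [{"sku": "A-1", "qty": 3}],
        "customer": {"id": "c-7"},
        "status": "shipped",
    },
}

# A slim integration event carries just enough to identify the purpose
# and let an interested consumer fetch details on its own terms:
slim_integration_event = {
    "type": "OrderUpdated",
    "orderId": "42",
    "occurredAt": "2024-01-01T00:00:00Z",
}
```

The consumer decides from the slim event alone whether the change matters to it; uninterested consumers pay almost nothing in transfer and processing.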
Some considerations for data transfer:
- Transfer only essential information.
- Notify in a lightweight format.
- Let the consumers control their needs.
- Keep resources near the consumers.
- Optimize events.
When using events, we are basically connecting two bounded contexts by transferring some essential information that helps the consumer make business-centric decisions. But in distributed systems, events have some drawbacks:
- Hard to debug
- Hard to standardize
- Designed mostly as a means to move data
- Often too large
When deciding to move toward event-driven design, the first thing to do is to think about the events, not about brokers, communication channels, and so on.
An event must be:
- Lightweight
- Carrying observability info
- Carrying correlation info
- Optimized for transfer
- Governed by a decisional event lifecycle
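A sketch of an event envelope that covers the points above: a lightweight, transfer-optimized payload plus observability and correlation info. The field names are illustrative assumptions rather than any particular standard:

```python
# Hypothetical event envelope: small payload, plus metadata that makes
# the event observable and correlatable across systems.
import uuid
from datetime import datetime, timezone
from typing import Optional

def make_event(event_type: str, payload: dict,
               correlation_id: Optional[str] = None) -> dict:
    return {
        "id": str(uuid.uuid4()),                          # unique event identity
        "type": event_type,                               # decisional: what happened
        "time": datetime.now(timezone.utc).isoformat(),   # observability info
        "correlationId": correlation_id or str(uuid.uuid4()),  # tracing across systems
        "data": payload,                                  # keep this small
    }

event = make_event("invoice.paid", {"invoiceId": "inv-9"}, correlation_id="req-123")
```

Carrying the correlation id in every event is what keeps distributed flows debuggable without inflating the business payload.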
When talking about serverless, the most common category of serverless services that comes to mind is compute, or FaaS (Function as a Service).
A function has three life phases: initialization, invocation, and shutdown.
Initialization: During this phase the function is set up for invocation, including runtime init and code init.
The environment init sets up all the environment specifications: making the disk available, adding environment variables, making network ingress and egress available, allocating memory, and so on.
The code init runs the pre-init code, such as dependency initialization and extensions.
Invocation: This is the period during which the function logic (the handler or invoke method) runs. This phase must complete before the configured function timeout.
Shutdown: Every function needs to be cleaned up after use. The function is frozen for a short period of time after usage, and during this freeze any new demand reuses that already-initialized function. Otherwise, if there is no demand, the function is cleaned up and reset.
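These phases can be sketched in code. In a FaaS runtime, module-level code runs once per container (initialization) and the handler runs once per demand (invocation), so expensive setup placed at module level is reused across warm invocations; `ExpensiveClient` is a hypothetical stand-in for, say, a database client:

```python
# Module scope = init phase: runs once per container.
INIT_COUNT = 0

class ExpensiveClient:
    def __init__(self) -> None:
        global INIT_COUNT
        INIT_COUNT += 1  # pretend this is a slow connection setup

    def fetch(self, key: str) -> str:
        return f"value-for-{key}"

client = ExpensiveClient()  # initialized once, before any invocation

def handler(event: dict, context=None) -> str:
    # Invoke phase: reuses the already-initialized client.
    return client.fetch(event["key"])

# Two demands against the same warm container: init ran only once.
handler({"key": "a"})
handler({"key": "b"})
assert INIT_COUNT == 1
```

This is the mechanical reason the freeze/reuse behavior described above saves resources: the init cost is amortized over every warm invocation.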
Concurrency in serverless is a bit tricky because, given the serverless ecosystem, it works differently from containers or VMs (to find out more about serverless vs. traditional cloud, refer to my article here).
In serverless, concurrency means the capacity to allocate the available micro-containers to handle simultaneous demands; basically, any micro-container handles a single demand at a given time.
In serverless compute, there is no waiting for a demand to be treated behind a previous demand: the compute service treats each demand separately, and you can run thousands of demands at the same time, below the throttling limits.
Keeping your function's initialization + invocation phases short and optimized helps increase your concurrency level as well as reduce resource consumption.
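The relationship between duration and concurrency follows Little's law: the number of simultaneously busy micro-containers is roughly the arrival rate multiplied by the per-demand duration, so shortening init + invoke directly lowers the concurrency consumed:

```python
# Little's law sketch: concurrency ≈ arrival rate × per-demand duration.
def required_concurrency(requests_per_second: float, duration_seconds: float) -> float:
    return requests_per_second * duration_seconds

# 100 req/s at 500 ms each keeps ~50 micro-containers busy...
assert required_concurrency(100, 0.5) == 50
# ...while optimizing to 250 ms halves that to ~25.
assert required_concurrency(100, 0.25) == 25
```

The same traffic served twice as fast needs half the simultaneously allocated containers, which is exactly the sustainability lever discussed here.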
When using compute, to reduce resource consumption consider the following:
- Reuse already-initialized containers.
- Reduce the initialization phase by reducing the size of dependencies and function code.
- Reduce the invocation phase duration.
- Choose the right memory size, which helps optimize the function logic's processing.
- Use Neoverse (Arm-based) processors.
- Use direct service integrations when possible instead of writing a function.
- Move toward asynchronous design.
Sharing responsibility and caring about our planet must be applied in all dimensions: improving, reducing, and preventing impacts.
The sustainability pyramid is all about iteration; using the appropriate patterns leaves enough flexibility to simplify the improvement journey.
Relying on managed services can, in many use cases, reduce your solution's complexity and simplify continuous observation, data gathering, evaluation, and improvement.
To conclude, opt for serverless solutions where they apply, to gain more room to maneuver on sustainability.