Benjamin Davy is an innovation manager at Teads, a digital advertising company that helps news publishers monetize their content by delivering ads respectfully. In this article, Benjamin discusses building an AWS EC2 carbon emissions dataset.
We wanted to know more about our service’s carbon footprint and discovered a lack of resources. So, we tried to see if there would be a way to estimate it using a bottom-up approach, starting with compute resources.
Today, there is no standard methodology or set of emission factors to assess a digital service: nothing to translate the computing and networking activity we generate into carbon emissions. As our main provider is AWS and our main service is EC2, we started by looking at how to estimate the impact of using cloud instances.
In a data center, the footprint of computing resources comes from two sources: running the servers (including the supporting services around them) and manufacturing the servers. For the running part, we have to look at power consumption and location.
Emissions related to running instances
To estimate emissions related to running instances, we need to find a way to know the power consumed by our instances. We also need to know the data center’s power usage effectiveness (PUE). This tells us the power used for non-IT equipment in the data center, like the cooling systems.
AWS communicates that all their data centers have a PUE under 1.2. This means that if a server consumes 100 watts, we add 20 watts to cover everything around it that is required to run the machine and consumes power. Once we have the power consumption, we convert this value into carbon emissions using the electricity carbon emission factor of the data center's country. But we did not have the actual power consumption of instances, though there is a way to estimate it.
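The conversion chain described above can be sketched as a small helper; the PUE default and the grid carbon intensity in the example are illustrative assumptions, not official AWS or grid figures:

```python
# Convert a server's power draw into carbon emissions.
# PUE and grid intensity values below are illustrative assumptions.

def energy_kwh(power_watts: float, hours: float, pue: float = 1.2) -> float:
    """Energy drawn from the grid, including data center overhead (PUE)."""
    return power_watts * pue * hours / 1000.0

def emissions_kg(power_watts: float, hours: float,
                 grid_gco2_per_kwh: float, pue: float = 1.2) -> float:
    """Carbon emissions in kg CO2 eq for a given power draw and duration."""
    return energy_kwh(power_watts, hours, pue) * grid_gco2_per_kwh / 1000.0

# Example: a 100 W server running for 24 h on a 250 gCO2/kWh grid
print(round(emissions_kg(100, 24, 250), 3))  # 0.72 kg CO2 eq
```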
On modern Intel CPUs, we have access to RAPL (Running Average Power Limit), an interface that exposes a set of counters reporting energy and power consumption. Using turbostat, a tool by Intel, we can get the CPU's TDP (Thermal Design Power). According to Intel, the TDP refers to the power consumption under a maximum theoretical load, so it is supposed to be the maximum sustainable power consumption of the CPU. This is especially useful because AWS uses custom-made CPUs: even if we can get the CPU model using a simple lscpu command in the terminal, we will never find public information about that model online. Most importantly, turbostat gives us access to the power consumption of the CPU and of the machine's memory.
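On Linux, the RAPL counters are also exposed through the standard powercap sysfs interface; a minimal sketch of deriving average power from two counter readings (the sysfs path is standard on Intel machines, the helper functions are ours):

```python
import time

# Package 0 cumulative energy counter (microjoules), standard powercap path
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path: str = RAPL_ENERGY) -> int:
    """Read the cumulative energy counter, in microjoules."""
    with open(path) as f:
        return int(f.read())

def average_power_w(e_start_uj: int, e_end_uj: int, seconds: float) -> float:
    """Average power over an interval, from two energy readings."""
    return (e_end_uj - e_start_uj) / 1e6 / seconds

# Example usage (requires a bare metal Intel machine and read permissions):
# e0 = read_energy_uj(); time.sleep(1.0); e1 = read_energy_uj()
# print(average_power_w(e0, e1, 1.0))
```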
We must remember that this relies on software modeling and is not as accurate as a physical power analyzer. This is the first limitation.
So, to build consumption profiles, we combined two tools. The first one is stress-ng, which we used to create a stress protocol simulating different usage contexts; the second is turbostat, which we used to read the power consumption during the tests. The resulting tool is called Turbostress.
However, this approach has many limitations. First, we couldn't apply it to instances using AMD CPUs, as no bare metal AMD machines were available. The RAPL power reporting also has no equivalent on ARM CPUs, so we could not test Graviton-based machines either. Another limitation is that we only measure part of the server this way: the CPU and the memory. And finally, this is a bare metal measurement, and like many AWS customers, we don't use bare metal machines. So this was not directly usable.
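Measurements taken at a few load levels can be turned into a consumption profile by interpolating between them; a minimal sketch with made-up wattage figures (not real measurements from the dataset):

```python
def power_at_load(profile: list[tuple[float, float]], load: float) -> float:
    """Linearly interpolate power (W) at a given CPU load (0.0-1.0)
    from measured (load, watts) points, assumed sorted by load."""
    for (l0, w0), (l1, w1) in zip(profile, profile[1:]):
        if l0 <= load <= l1:
            frac = (load - l0) / (l1 - l0)
            return w0 + frac * (w1 - w0)
    raise ValueError("load outside measured range")

# Illustrative profile: (load, watts) points measured during a stress protocol
profile = [(0.0, 40.0), (0.5, 120.0), (1.0, 180.0)]
print(power_at_load(profile, 0.25))  # halfway between idle and 50% load: 80.0
```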
Now, the CPU and memory are supposed to be the main power drivers of a server. However, other parts also consume power: the main board, the fans, the storage drives, and the network cards. We ran several analyses and defined a value to add to our measurement to cover what we could not measure. For now, it is simply a percentage of the CPU consumption.
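That correction can be sketched as follows; the 25% ratio is an illustrative assumption, not the value used in the actual dataset:

```python
# Cover components we cannot measure (main board, fans, drives, NICs)
# by adding a fixed share of the measured CPU power.
UNMEASURED_RATIO = 0.25  # illustrative assumption

def server_power_w(cpu_w: float, memory_w: float,
                   unmeasured_ratio: float = UNMEASURED_RATIO) -> float:
    """Estimated server power: measured CPU + memory, plus a CPU-based overhead."""
    return cpu_w + memory_w + cpu_w * unmeasured_ratio

print(server_power_w(100.0, 30.0))  # 100 + 30 + 25 = 155.0
```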
Sizing of AWS instances – The next step is to go from a whole server to an instance estimate. On most instance types, there is no overcommitment: the resources you get are dedicated to you, and the server's resources are linearly cut into virtual instances. This simplifies calculations, because we can roughly estimate an instance's consumption as its share of the server's. It is not exact, but we considered it good enough for our estimation.
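Under that linear-cut assumption, attributing server power to an instance is a simple ratio; the server figures below are illustrative:

```python
def instance_power_w(server_w: float, instance_vcpus: int,
                     server_vcpus: int) -> float:
    """Attribute server power to an instance by its share of vCPUs,
    assuming resources are linearly cut with no overcommitment."""
    return server_w * instance_vcpus / server_vcpus

# Illustrative: a 48-vCPU server drawing 240 W, hosting a 4-vCPU instance
print(instance_power_w(240.0, 4, 48))  # 20.0 W
```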
For the AMD and ARM platforms we couldn't measure, we generalized our previous results based on the machines' and CPUs' specifications.
Estimating the carbon footprint of manufacturing the servers
This is an even more complex challenge. For our instances, we can retrieve information about the underlying hardware, but some parts are custom-made by AWS and not available anywhere else, so it can be hard to compare them with other products. Also, there isn't much information about the manufacturing footprint of IT equipment to compare these specifications with. Some manufacturers, like Dell, provide a product carbon footprint document for their machines. Unfortunately, the configurations covered by these documents are not comparable to the AWS servers: for example, the amount of memory in them is far lower. And only a few of these documents detail the impact at the component level, so we don't know how a specific component contributes to the manufacturing footprint.
A Naïve Proposal
Having studied the few data available, we came up with this naive model to estimate the emissions from manufacturing.
We defined a minimal server with one CPU and a small amount of memory and assumed it has a manufacturing footprint of 1,000 kg CO2 eq. Then we set carbon footprint values for the additional parts found in AWS servers.
| Manufacturing emissions for additional components | Emissions (kgCO2 eq) |
| ------------------------------------------------- | -------------------- |
| 128 GB of DRAM                                    | 150                  |
| 1 CPU                                             | 100                  |
| 1 HDD                                             | 50                   |
| 1 SSD                                             | 100                  |
| 1 GPU card                                        | 150                  |
But for GPUs, we didn't find any data, so the 150 kg of carbon emissions for a GPU card is a wild guess. Also, we define only one value per type of component, even though the actual manufacturing footprint can vary greatly between two different parts. So it is indeed a very first, naive estimation.
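The naive model can be written down directly from the table above; the component values are from the article, while the example server configuration is an illustrative assumption:

```python
# Naive manufacturing-footprint model: a 1,000 kg base server plus
# per-component values from the article's table (kg CO2 eq).
BASE_SERVER_KG = 1000  # minimal server: 1 CPU, small amount of memory
COMPONENT_KG = {
    "dram_128gb": 150,
    "cpu": 100,
    "hdd": 50,
    "ssd": 100,
    "gpu": 150,  # wild guess, no data found
}

def manufacturing_kg(extra_components: dict[str, int]) -> int:
    """Total manufacturing footprint: base server plus extra components."""
    return BASE_SERVER_KG + sum(
        COMPONENT_KG[name] * count for name, count in extra_components.items()
    )

# Illustrative server: 1 extra CPU, 3 x 128 GB extra DRAM, 2 SSDs
print(manufacturing_kg({"cpu": 1, "dram_128gb": 3, "ssd": 2}))  # 1750
```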
We applied this model to each bare metal platform on EC2, and we considered that a server has a lifespan of four years, the value used by most manufacturers. From this, we computed an hourly rate for the manufacturing emissions: for each additional part, we added its emissions at the server level before calculating an hourly rate at the instance level. Combining the two estimations, running and manufacturing, we can compute an overall footprint.
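The amortization step can be sketched as follows; the server footprint and vCPU counts are illustrative, while the four-year lifespan comes from the article:

```python
# Four-year lifespan, the value used by most manufacturers
LIFESPAN_HOURS = 4 * 365 * 24  # 35,040 hours

def hourly_manufacturing_kg(server_kg: float, instance_vcpus: int,
                            server_vcpus: int) -> float:
    """Amortize a server's manufacturing footprint into an hourly rate,
    attributed to an instance by its share of vCPUs."""
    return server_kg / LIFESPAN_HOURS * instance_vcpus / server_vcpus

# Illustrative: a 1,750 kg server with 48 vCPUs, hosting a 4-vCPU instance
rate = hourly_manufacturing_kg(1750, 4, 48)
print(round(rate * 1000, 3))  # grams of CO2 eq per instance-hour
```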
Looking at carbon emissions is an important and interesting step. But in the end, it’s not enough.
This tool and this dataset can be used for a granular analysis of our infrastructure, even if they are not perfectly accurate.