Amazon CloudWatch Metrics By Example

John Tucker
Published in codeburst
Aug 13, 2020 · 5 min read

Making sense of how Metrics are uniquely defined by a Name, a Namespace, and zero or more Dimensions.

I was surprised by how long it took me to understand Amazon CloudWatch Metrics, so I thought I would share my thinking.

Metrics are the fundamental concept in CloudWatch. A metric represents a time-ordered set of data points that are published to CloudWatch. Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2. The data points themselves can come from any application or business activity from which you collect data.

Metrics are uniquely defined by a name, a namespace, and zero or more dimensions.

— AWS — Amazon CloudWatch Concepts

It is relatively easy to understand how Names and Namespaces relate to each other and to Metrics; for example:

Observations:

  • As Amazon CloudWatch is a regional service, Namespaces are unique within a Region, e.g., AWS/EC2 is distinct from CWAgent; Namespaces are also not hierarchical
  • Names are unique within a Namespace, e.g., CPUUtilization is distinct from DiskReadOps
  • Multiple Metrics, however, can share the same Name (this will make more sense shortly)
  • The AWS/EC2 Namespace is used by Amazon EC2 monitoring and CWAgent is used by Amazon CloudWatch Agent

note: It is interesting that the no-cost Amazon EC2 monitoring does not include either memory or disk utilization; for these, one must use Custom Metrics (starts at $0.30 per Custom Metric per month), e.g., generated by Amazon CloudWatch Agent.

The key to distinguishing between Metrics with the same Name is their Dimensions:

A dimension is a name/value pair that is part of the identity of a metric. You can assign up to 10 dimensions to a metric.

Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. Dimensions help you design a structure for your statistics plan. Because dimensions are part of the unique identifier for a metric, whenever you add a unique name/value pair to one of your metrics, you are creating a new variation of that metric.

— AWS — Amazon CloudWatch Concepts

Upon first reading this, I was confused. Looking back, part of my confusion arose from the word Dimension itself; IMHO, Label would have been a better choice. Let us look at some examples:

Observations:

  • Under the mem_used_percent Name, Metrics are distinguished using three name/value pairs (Dimensions) with names InstanceId, ImageId, and InstanceType. Here the two Metrics refer to the memory utilization of two different Amazon EC2 Instances
  • If we were to change an Instance’s Type, the Amazon CloudWatch Agent (which publishes mem_used_percent) would report to a new Metric with the updated InstanceType Dimension (but with the same InstanceId). This is understandable, as mem_used_percent would not be comparable across different Instance Types (different memory values in the denominator)
  • Under the disk_used_percent Name, Metrics are distinguished using six Dimensions with names InstanceId, ImageId, InstanceType, fstype, device, and path. Here the two Metrics refer to the disk utilization of two different volumes on the same Amazon EC2 Instance: same InstanceId, different paths

note: It seems that the Metrics are labeled with some irrelevant (not useful in distinguishing Metrics) Dimensions, e.g., we would never have two metrics with the same InstanceId and different ImageId Dimensions (one cannot change an Instance’s AMI Id).
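To make the identity rule concrete, here is a minimal Python sketch that models a Metric's identity as a Namespace, a Name, and a set of Dimension pairs. MetricId and metric_id are illustrative names of my own, not AWS API types, and the instance IDs, AMI ID, and instance types are made up. It also assumes that the order of Dimensions does not matter.

```python
from typing import FrozenSet, NamedTuple, Tuple


class MetricId(NamedTuple):
    """Illustrative model of a metric's identity (not an AWS API type)."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]]


def metric_id(namespace: str, name: str, **dimensions: str) -> MetricId:
    # Assumption: dimension order is irrelevant, so keep the pairs in a frozenset.
    return MetricId(namespace, name, frozenset(dimensions.items()))


# Hypothetical instance IDs, AMI ID, and instance types, for illustration only.
a = metric_id("CWAgent", "mem_used_percent",
              InstanceId="i-aaaa", ImageId="ami-1111", InstanceType="t3.micro")
b = metric_id("CWAgent", "mem_used_percent",
              InstanceId="i-bbbb", ImageId="ami-1111", InstanceType="t3.micro")
# Same instance as `a`, but after a change of instance type.
c = metric_id("CWAgent", "mem_used_percent",
              InstanceId="i-aaaa", ImageId="ami-1111", InstanceType="t3.small")

print(a == b)          # False: different InstanceId
print(a == c)          # False: different InstanceType, same InstanceId
print(len({a, b, c}))  # 3 distinct Metrics sharing one Name
```

Changing any single name/value pair produces a different key, which is the sense in which a Dimension is part of a Metric's identity.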

note: Metrics, and their identifiers, can be listed using the AWS CLI tool, e.g.,

$ aws cloudwatch list-metrics \
    --namespace AWS/EC2
{
    "Metrics": [
        {
            "Namespace": "AWS/EC2",
            "MetricName": "NetworkPacketsIn",
            "Dimensions": [
                {
                    "Name": "InstanceId",
                    "Value": "i-08d99a369f572dbdb"
                }
            ]
        },
        ...
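As a rough sketch of how this output could be explored, the Python below counts Metrics by the set of Dimension names they carry. The sample is a hand-written stand-in in the shape of the list-metrics output: the first entry is the one shown above, while the other two entries (and the instance type in them) are hypothetical.

```python
import json
from collections import Counter

# Hand-written sample shaped like `list-metrics` output.
# Only the first entry comes from the article; the rest are hypothetical.
sample = json.loads("""
{
    "Metrics": [
        {"Namespace": "AWS/EC2", "MetricName": "NetworkPacketsIn",
         "Dimensions": [{"Name": "InstanceId", "Value": "i-08d99a369f572dbdb"}]},
        {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
         "Dimensions": [{"Name": "InstanceType", "Value": "t3.micro"}]},
        {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
         "Dimensions": []}
    ]
}
""")


def dimension_names(metric):
    """Return the sorted tuple of dimension names for one metric entry."""
    return tuple(sorted(d["Name"] for d in metric["Dimensions"]))


counts = Counter(dimension_names(m) for m in sample["Metrics"])
for names, count in counts.items():
    print(names or "(no dimensions)", count)
```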

Amazon EC2 monitoring generates aggregated Metrics across Amazon EC2 Instances with the same AMI, across Instances with the same Instance Type, and across all Instances. This is accomplished by using appropriate Dimensions:

note: This is specific to how Amazon EC2 monitoring generates Metrics; i.e., it is not something that Amazon CloudWatch does automatically.

Observations:

  • The Metrics with an InstanceId Dimension are the Per-Instance Metrics (as shown in the AWS Console)
  • The Metrics with an ImageId Dimension are the By Image (AMI) Id Metrics
  • The Metrics with the InstanceType Dimension are the Aggregated by Instance Type Metrics
  • The Metrics with no Dimensions (EMPTY) are the Across All Instances Metrics
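One way to picture the four groupings is a publisher that records each reading under four different Dimension sets. The Python below is only a sketch of that idea with an in-memory dictionary; it is not how Amazon EC2 monitoring is implemented, and the Example/EC2 Namespace, function names, and instance values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory store: metric identity -> list of recorded values.
store = defaultdict(list)


def publish(namespace, name, value, dimensions):
    """Append a value to the metric identified by namespace, name, dimensions."""
    key = (namespace, name, frozenset(dimensions.items()))
    store[key].append(value)


def publish_with_rollups(value, instance_id, image_id, instance_type):
    """Record one CPU reading under four dimension sets."""
    for dims in (
        {"InstanceId": instance_id},      # per-instance
        {"ImageId": image_id},            # by image (AMI) ID
        {"InstanceType": instance_type},  # aggregated by instance type
        {},                               # across all instances
    ):
        publish("Example/EC2", "CPUUtilization", value, dims)


# Two made-up instances sharing an AMI but not an instance type.
publish_with_rollups(10.0, "i-aaaa", "ami-1111", "t3.micro")
publish_with_rollups(30.0, "i-bbbb", "ami-1111", "t3.small")

all_instances = store[("Example/EC2", "CPUUtilization", frozenset())]
by_image = store[("Example/EC2", "CPUUtilization",
                  frozenset({("ImageId", "ami-1111")}))]

print(len(store))     # 6 distinct Metrics, all named CPUUtilization
print(all_instances)  # [10.0, 30.0]
print(by_image)       # [10.0, 30.0]
```

The two readings end up in six Metrics: two per-instance, one by image, two by instance type, and one with no Dimensions.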

Now that we know how to uniquely identify a Metric by supplying a Namespace, Name, and Dimensions, let us explore how to calculate statistics across Data Points in a Metric.

First, we use the AWS CLI tool to calculate an average across all Data Points in five-minute periods over the course of an hour for a CPU utilization Per-Instance Metric.

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-08d99a369f572dbdb \
    --start-time 2020-08-11T16:00:00Z \
    --end-time 2020-08-11T17:00:00Z \
    --period 300 \
    --statistics Average

{
    "Label": "CPUUtilization",
    "Datapoints": [
        {
            "Timestamp": "2020-08-11T16:30:00+00:00",
            "Average": 0.0666666666666676,
            "Unit": "Percent"
        },
        ...

Observations:

  • We are required to provide all of the identifiers (Namespace, Name, and Dimensions) to uniquely identify a single Metric
  • Here we get twelve (60 divided by 5) values representing the five-minute periods over an hour
  • While one cannot access the raw Data Points themselves, one can effectively get at them by calculating any statistic, e.g., Average, over a period that matches the period of the service generating the Data Points in the Metric (only one value is then used to calculate the statistic)
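The relationship between periods and statistics can be sketched in a few lines of Python. This is my own simplified model, not CloudWatch's implementation: period_statistics is a hypothetical helper, and the raw values are made up (one reading every five minutes for an hour).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def period_statistics(datapoints, start, period_seconds):
    """Group (timestamp, value) pairs into fixed periods aligned to `start`."""
    buckets = defaultdict(list)
    for ts, value in datapoints:
        index = int((ts - start).total_seconds() // period_seconds)
        buckets[index].append(value)
    return {
        index: {"SampleCount": len(values),
                "Average": sum(values) / len(values)}
        for index, values in sorted(buckets.items())
    }


# Hypothetical data: one reading every five minutes, with values 0.0 to 11.0.
start = datetime(2020, 8, 11, 16, 0, tzinfo=timezone.utc)
raw = [(start + timedelta(minutes=5 * i), float(i)) for i in range(12)]

five_min = period_statistics(raw, start, 300)
hourly = period_statistics(raw, start, 3600)

print(len(five_min))  # 12 periods
print(five_min[3])    # {'SampleCount': 1, 'Average': 3.0}
print(hourly[0])      # {'SampleCount': 12, 'Average': 5.5}
```

When the period matches the publishing interval, every bucket holds a single value, so the Average is simply that raw value; a longer period folds several values into one statistic.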

We get the same data using the Amazon CloudWatch service in the AWS Console:

note: After exploring the Basic (recorded every five minutes) Metrics generated by Amazon EC2 monitoring, I noticed that it records five Data Points at the same point in time every five minutes (I expected one). We can see this by using the SampleCount (number of Data Points in period) statistic. This threw me off for quite some time.

Wrap Up

Nothing too fancy here, but at least Dimensions make sense to me now.
