AWS EC2 Status Checks and Recovery Options

During our training, we discussed the importance of understanding and managing status checks for EC2 instances in AWS. Here's a summary of the key points:

EC2 Status Checks

AWS performs automatic status checks on EC2 instances to identify potential hardware and software issues. There are two main types of status checks:

System Status Checks

Monitor problems with the AWS infrastructure that hosts the instance.
Issues can include hardware failures or loss of power on the host.
To review these issues, use the AWS Health Dashboard (formerly the Personal Health Dashboard).
Remediation typically involves stopping and starting the instance, which migrates an EBS-backed instance to a new host.

Instance Status Checks

Monitor the software and network configuration of the individual instance.
Issues can include invalid network configurations or exhausted memory.
Remediation can be as simple as rebooting the instance or changing its configuration.
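
The two check types and their usual remediation can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names suggest_remediation and check_instance and the returned labels are hypothetical, and the sketch assumes boto3 is installed with credentials configured. It reads the SystemStatus and InstanceStatus fields returned by the EC2 describe_instance_status call.

```python
def suggest_remediation(status_entry):
    """Map one entry of describe_instance_status()['InstanceStatuses']
    to a suggested action (the labels are illustrative, not AWS terms)."""
    system = status_entry["SystemStatus"]["Status"]
    instance = status_entry["InstanceStatus"]["Status"]
    if system == "impaired":
        # Problem with the underlying host: stop/start moves the instance.
        return "stop-and-start"
    if instance == "impaired":
        # Problem inside the instance: reboot or fix its configuration.
        return "reboot-or-reconfigure"
    return "none"


def check_instance(instance_id, region_name):
    """Fetch current status checks for one instance (hypothetical helper)."""
    import boto3  # imported lazily so suggest_remediation has no dependencies

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return [suggest_remediation(entry) for entry in response["InstanceStatuses"]]
```

A system-level failure is checked first because an impaired host can also cause the instance check to fail.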

Automated Recovery Options

To automate the recovery process, you can leverage CloudWatch metrics and alarms:

Option 1: CloudWatch Alarms

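Set up CloudWatch alarms on the StatusCheckFailed_System and StatusCheckFailed_Instance metrics.
For StatusCheckFailed_System, configure the alarm to perform the "recover instance" action, which keeps the same instance ID, IP addresses, metadata, and placement group.
The recover action is not supported for StatusCheckFailed_Instance; use the reboot action for that metric instead.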
Optionally, send notifications via Amazon SNS.
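
As a rough sketch of Option 1, the alarm parameters might be assembled like this with boto3. The helper names build_alarm_params and create_alarm, the alarm name format, and the period and evaluation settings (two consecutive 60-second periods) are assumptions chosen for illustration; adjust them to your environment. The recover action is paired with the system metric and the reboot action with the instance metric.

```python
# EC2 alarm actions are addressed by ARNs of the form
# arn:aws:automate:<region>:ec2:<action>.
ACTION_FOR_METRIC = {
    "StatusCheckFailed_System": "recover",
    "StatusCheckFailed_Instance": "reboot",
}


def build_alarm_params(instance_id, region, metric_name, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    action = ACTION_FOR_METRIC[metric_name]
    alarm_actions = [f"arn:aws:automate:{region}:ec2:{action}"]
    if sns_topic_arn:
        # Optional notification via Amazon SNS.
        alarm_actions.append(sns_topic_arn)
    return {
        "AlarmName": f"{metric_name}-{instance_id}",  # assumed naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,               # assumed: evaluate every 60 seconds
        "EvaluationPeriods": 2,     # assumed: two consecutive failures
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": alarm_actions,
    }


def create_alarm(instance_id, region, metric_name, sns_topic_arn=None):
    """Create the alarm; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so build_alarm_params can be used without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **build_alarm_params(instance_id, region, metric_name, sns_topic_arn)
    )
```

For example, create_alarm("i-0123456789abcdef0", "us-east-1", "StatusCheckFailed_System") would request a recover alarm for that placeholder instance ID.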

Option 2: Auto Scaling Group