I built a Python (2.7) script that checks Mongo connections, queries, and replication status. The structure is basically three methods that each run their respective check and one method that sends the results to CloudWatch:
#!/usr/bin/python
import commands
import json
import pymongo
import subprocess, os
import re
from pymongo import MongoClient

ret, instanceId = commands.getstatusoutput("wget -q -O - http://169.254.169.254/latest/meta-data/instance-id")

# Checks Number of Connections Made against Total Connections Allowed
def parse_connections(ret, instanceId):
    # Obtains Connections made and Total Connections Allowed
    connection_result = os.popen("/usr/lib/nagios/plugins/check_mongodb.py -A connections").read()
    get_numeric_con_results = map(int, re.findall(r'\d+', connection_result))
    connections_so_far = get_numeric_con_results[1]
    total_connections = get_numeric_con_results[2]
    # Calculate percentage for CloudWatch
    metric_name = "Mongo Connections"
    percentage_connections_used = float(connections_so_far) / float(total_connections)
    percentage_float = float(percentage_connections_used)
    result = format(percentage_float, '.2f')
    send_mongo_results(metric_name, instanceId, ret, result)

# Checks Response time of Connectivity
def check_mongo_connections(ret, instanceId):
    connection_result = os.popen("/usr/lib/nagios/plugins/check_mongodb.py -A connect -W 2 -C 4").read()
    metric_name = "Mongo Connection Response In Seconds"
    # Parse Through Response
    connection_time = map(int, re.findall(r'\d+', connection_result))
    connection_time_result = connection_time[0]
    send_mongo_results(metric_name, instanceId, ret, connection_time_result)

# Queries Per Second
def queries_per_second(ret, instanceId):
    connection_result = os.popen("/usr/lib/nagios/plugins/check_mongodb.py -A queries_per_second").read()
    metric_name = "Mongo Queries Per Second"
    # Parse Response
    get_numeric_result = re.findall("\d+\.\d+", connection_result)
    result = get_numeric_result[0]
    send_mongo_results(metric_name, instanceId, ret, result)

## Submit Results
def send_mongo_results(metric_name, instance_id, ret, result):
    cmd = "aws cloudwatch put-metric-data --metric-name " + metric_name + " --namespace MONGO --dimensions \"instance=" + instanceId + ",servertype=Mongo\" --value " + str(result) + " --region us-east-1"
    ret, cmdout = commands.getstatusoutput(cmd)

parse_connections(ret, instanceId)
check_mongo_connections(ret, instanceId)
queries_per_second(ret, instanceId)
The script works, but I don't see the results in CloudWatch when the script is run. I placed a print statement in send_mongo_results() and it hits the method. Can someone recommend what could be preventing the method from sending the results to CloudWatch? (FYI: I have an IAM role for the script, so it's not that.)
Here are the docs for how to log from Python (in Lambda, but it should be the same for you): http://docs.aws.amazon.com/lambda/latest/dg/python-logging.html
Edit:
Sorry, you wanted to use CloudWatch metrics. Check this page: http://boto3.readthedocs.io/en/latest/reference/services/cloudwatch.html#CloudWatch.Client.put_metric_data
You need to use the boto3 library: https://aws.amazon.com/sdk-for-python/
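For example, here is a minimal sketch of publishing one of the question's metrics with put_metric_data (the namespace, dimensions, and region are copied from the question's CLI command; treat it as a starting point rather than a drop-in replacement):

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

def send_mongo_results(metric_name, instance_id, result):
    # boto3 passes the metric name as a single argument, so a space
    # like the one in "Mongo Connections" needs no shell quoting
    cloudwatch.put_metric_data(
        Namespace='MONGO',
        MetricData=[{
            'MetricName': metric_name,
            'Dimensions': [
                {'Name': 'instance', 'Value': instance_id},
                {'Name': 'servertype', 'Value': 'Mongo'},
            ],
            'Value': float(result),
        }],
    )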
I am trying to run an SSM command on more than 50 EC2 instances of my fleet. Using AWS boto3's SSM client, I am running a specific command on my nodes. My code is given below; after running it, an unexpected error shows up.
# running ec2 instances
instances = client.describe_instances()
instance_ids = [inst["InstanceId"] for inst in instances]  # might contain more than 50 instances

# run command
run_cmd_resp = ssm_client.send_command(
    Targets=[
        {"Key": "InstanceIds", "Values": inst_ids_all},
    ],
    DocumentName="AWS-RunShellScript",
    DocumentVersion="1",
    Parameters={
        "commands": ["#!/bin/bash", "ls -ltrh", "# some commands"]
    }
)
On executing this, I get the error below:
An error occurred (ValidationException) when calling the SendCommand operation: 1 validation error detected: Value '[...91 instance IDs...]' at 'targets.1.member.values' failed to satisfy constraint: Member must have length less than or equal to 50.
How do I run the SSM command on my whole fleet?
As shown in the error message and the boto3 documentation (link), the number of instances in one send_command call is limited to 50. To run the SSM command on all instances, splitting the original list into chunks of 50 could be a solution.
FYI: If your account has a fair number of instances, describe_instances() can't retrieve all instance info in one API call, so it would be better to check whether NextToken is in the response.
ref: How do you use "NextToken" in AWS API calls
# running ec2 instances
instances = client.describe_instances()
instance_ids = [
    inst["InstanceId"]
    for res in instances["Reservations"]
    for inst in res["Instances"]
]
while "NextToken" in instances:
    instances = client.describe_instances(NextToken=instances["NextToken"])
    instance_ids += [
        inst["InstanceId"]
        for res in instances["Reservations"]
        for inst in res["Instances"]
    ]

# run command in batches of 50 instance IDs
for i in range(0, len(instance_ids), 50):
    target_instances = instance_ids[i : i + 50]
    run_cmd_resp = ssm_client.send_command(
        Targets=[
            {"Key": "InstanceIds", "Values": target_instances},
        ],
        DocumentName="AWS-RunShellScript",
        DocumentVersion="1",
        Parameters={
            "commands": ["#!/bin/bash", "ls -ltrh", "# some commands"]
        }
    )
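Alternatively, a paginator can take care of the NextToken handling for you; a minimal sketch under the same assumptions (default credentials and region already configured):

import boto3

ec2 = boto3.client("ec2")

# collect every instance ID across all pages
instance_ids = []
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            instance_ids.append(instance["InstanceId"])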
Finally, following @Rohan Kishibe's answer, I implemented the batched execution below for the SSM RunShellScript document.
import math

ec2_ids_all = [...]  # all instance IDs fetched by pagination.

PG_START, PG_STOP = 0, 50
PG_SIZE = 50
PG_COUNT = math.ceil(len(ec2_ids_all) / PG_SIZE)

for page in range(PG_COUNT):
    cmd = ssm.send_command(
        Targets=[{"Key": "InstanceIds", "Values": ec2_ids_all[PG_START:PG_STOP]}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["ls -ltrh", "# other commands"]}
    )
    PG_START += PG_SIZE
    PG_STOP += PG_SIZE
In this way, the total set of instance IDs is distributed into batches and executed accordingly. One can also save the Command IDs and batch instance IDs in a mapping for future use, as sketched below.
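A minimal sketch of that bookkeeping, assuming the ec2_ids_all list and ssm client from the snippet above:

import math

PG_SIZE = 50
command_batches = {}  # CommandId -> list of instance IDs in that batch

for page in range(math.ceil(len(ec2_ids_all) / PG_SIZE)):
    batch = ec2_ids_all[page * PG_SIZE:(page + 1) * PG_SIZE]
    cmd = ssm.send_command(
        Targets=[{"Key": "InstanceIds", "Values": batch}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["ls -ltrh", "# other commands"]},
    )
    command_batches[cmd["Command"]["CommandId"]] = batch

Later, the saved CommandIds can be passed to get_command_invocation or list_command_invocations to check per-batch status.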
I am attempting to write a simple Lambda function to query a table in Athena. But after a few seconds I see "Status: FAILED" in the CloudWatch logs.
There is no descriptive error message on the cause of failure.
My test code is below:
import json
import time
import boto3

# athena constant
DATABASE = 'default'
TABLE = 'test'

# S3 constant
S3_OUTPUT = 's3://test-output/'

# number of retries
RETRY_COUNT = 1000

def lambda_handler(event, context):
    # created query
    query = "SELECT * FROM default.test limit 2"
    # % (DATABASE, TABLE)

    # athena client
    client = boto3.client('athena')

    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': S3_OUTPUT,
        }
    )

    # get query execution id
    query_execution_id = response['QueryExecutionId']
    print(query_execution_id)

    # get execution status
    for i in range(1, 1 + RETRY_COUNT):
        # get query execution
        query_status = client.get_query_execution(QueryExecutionId=query_execution_id)
        query_execution_status = query_status['QueryExecution']['Status']['State']
        if query_execution_status == 'SUCCEEDED':
            print("STATUS:" + query_execution_status)
            break
        if query_execution_status == 'FAILED':
            #raise Exception("STATUS:" + query_execution_status)
            print("STATUS:" + query_execution_status)
        else:
            print("STATUS:" + query_execution_status)
            time.sleep(i)
    else:
        # Did not encounter a break event. Need to kill the query
        client.stop_query_execution(QueryExecutionId=query_execution_id)
        raise Exception('TIME OVER')

    # get query results
    result = client.get_query_results(QueryExecutionId=query_execution_id)
    print(result)
    return
The logs show the following:
2020-08-31T10:52:12.443-04:00
START RequestId: e5434651-d36e-48f0-8f27-0290 Version: $LATEST
2020-08-31T10:52:13.481-04:00
88162f38-bfcb-40ae-b4a3-0b5a21846e28
2020-08-31T10:52:13.500-04:00
STATUS:QUEUED
2020-08-31T10:52:14.519-04:00
STATUS:RUNNING
2020-08-31T10:52:16.540-04:00
STATUS:RUNNING
2020-08-31T10:52:19.556-04:00
STATUS:RUNNING
2020-08-31T10:52:23.574-04:00
STATUS:RUNNING
2020-08-31T10:52:28.594-04:00
STATUS:FAILED
2020-08-31T10:52:28.640-04:00
....more status: FAILED
....
END RequestId: e5434651-d36e-48f0-8f27-0290
REPORT RequestId: e5434651-d36e-48f0-8f27-0290 Duration: 30030.22 ms Billed Duration: 30000 ms Memory Size: 128 MB Max Memory Used: 72 MB Init Duration: 307.49 ms
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
I think I have the right permissions for S3 bucket access given to the role (if not, I would have seen an error message). There are no files created in the bucket either. I am not sure what is going wrong here. What am I missing?
Thanks
The last line in your log shows
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
To me this looks like the timeout of the Lambda Function is set to 30 seconds. Try increasing it to more than the time the Athena query needs (the maximum is 15 minutes).
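If you would rather do that from code than the console, a quick boto3 sketch (the function name here is a placeholder):

import boto3

lambda_client = boto3.client("lambda")

# raise the function timeout to 120 seconds (the hard limit is 900)
lambda_client.update_function_configuration(
    FunctionName="my-athena-query-function",  # placeholder
    Timeout=120,
)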
Is there a way to delete all AWS Log Groups that haven't had any writes to them in the past 30 days?
Or conversely, get the list of log groups that haven't had anything written to them in the past 30 days?
Here is a quick script I wrote:
#!/usr/bin/python3

# describe log groups
# describe log streams
# get log groups with the lastEventTimestamp after some time
# delete those log groups
# have a dry run option
# support profile
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.describe_log_streams
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.describe_log_groups
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.delete_log_group

import boto3
import time

millis = int(round(time.time() * 1000))
delete = False
debug = False
log_group_prefix = '/'  # NEED TO CHANGE THESE
days = 30

# Create CloudWatchLogs client
cloudwatch_logs = boto3.client('logs')

log_groups = []
# List log groups through the pagination interface
paginator = cloudwatch_logs.get_paginator('describe_log_groups')
for response in paginator.paginate(logGroupNamePrefix=log_group_prefix):
    for log_group in response['logGroups']:
        log_groups.append(log_group['logGroupName'])

if debug:
    print(log_groups)

old_log_groups = []
empty_log_groups = []
for log_group in log_groups:
    response = cloudwatch_logs.describe_log_streams(
        logGroupName=log_group,  # logStreamNamePrefix='',
        orderBy='LastEventTime',
        descending=True,
        limit=1
    )
    # The time of the most recent log event in the log stream in CloudWatch Logs.
    # This number is expressed as the number of milliseconds after Jan 1, 1970 00:00:00 UTC.
    if len(response['logStreams']) > 0:
        if debug:
            print("full response is:")
            print(response)
            print("Last event is:")
            print(response['logStreams'][0]['lastEventTimestamp'])
            print("current millis is:")
            print(millis)
        if response['logStreams'][0]['lastEventTimestamp'] < millis - (days * 24 * 60 * 60 * 1000):
            old_log_groups.append(log_group)
    else:
        empty_log_groups.append(log_group)

# delete log group
if delete:
    for log_group in old_log_groups:
        response = cloudwatch_logs.delete_log_group(logGroupName=log_group)
    #for log_group in empty_log_groups:
    #    response = cloudwatch_logs.delete_log_group(logGroupName=log_group)
else:
    print("old log groups are:")
    print(old_log_groups)
    print("Number of log groups:")
    print(len(old_log_groups))
    print("empty log groups are:")
    print(empty_log_groups)
I have been using aws-cloudwatch-log-clean and I can say it works quite well.
You need boto3 installed, and then you run:
./sweep_log_streams.py [log_group_name]
It has a --dry-run option so you can first check that it does what you expect.
A note of caution: if you have a long-running process in ECS which is quiet on the logs, and the log has been truncated to empty in CloudWatch due to the log retention period, deleting its empty log stream can break and hang the service, as it has nowhere to post its logs.
I'm not aware of a simple way to do this, but you could use the awscli (or preferably python/boto3) to call describe-log-groups, then for each log group invoke describe-log-streams, and then for each log group/stream pair invoke get-log-events with a --start-time of 30 days ago. If the union of all the events arrays for the log group is empty, you know you can delete the log group.
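A rough boto3 sketch of that approach (it only lists candidates; deleting is left to delete_log_group, and the 30-day cutoff is an assumption you can change):

import time
import boto3

logs = boto3.client("logs")
cutoff_ms = int((time.time() - 30 * 24 * 60 * 60) * 1000)  # 30 days ago, in milliseconds

idle_groups = []
for group_page in logs.get_paginator("describe_log_groups").paginate():
    for group in group_page["logGroups"]:
        name = group["logGroupName"]
        has_recent_events = False
        for stream_page in logs.get_paginator("describe_log_streams").paginate(logGroupName=name):
            for stream in stream_page["logStreams"]:
                events = logs.get_log_events(
                    logGroupName=name,
                    logStreamName=stream["logStreamName"],
                    startTime=cutoff_ms,
                    limit=1,
                )
                if events["events"]:
                    has_recent_events = True
                    break
            if has_recent_events:
                break
        if not has_recent_events:
            idle_groups.append(name)  # candidate for delete_log_group

print(idle_groups)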
I did a similar setup; in my case I wanted to delete log streams older than X days from a CloudWatch log group.
remove.py
import optparse
import sys
import os
import json
import datetime

def deletefunc(loggroupname, days):
    os.system("aws logs describe-log-streams --log-group-name {} --order-by LastEventTime > test.json".format(loggroupname))
    oldstream = []
    milli = days * 24 * 60 * 60 * 1000
    today = datetime.datetime.now().strftime('%Y-%m-%d')
    with open('test.json') as json_file:
        data = json.load(json_file)
        for p in data['logStreams']:
            subtract = p['creationTime'] + milli
            sub1 = subtract / 1000
            sub2 = datetime.datetime.fromtimestamp(sub1).strftime('%Y-%m-%d')
            name = p['logStreamName']
            # stream is older than <days> days if creationTime + days is already in the past
            if sub2 < today:
                oldstream.append(name)
    for i in oldstream:
        os.system("aws logs delete-log-stream --log-group-name {} --log-stream-name {}".format(loggroupname, i))

parser = optparse.OptionParser()
parser.add_option('--log-group-name', action="store", dest="loggroupname", help="LogGroupName For Eg: testing-vpc-flow-logs", type="string")
parser.add_option('-d', '--days', action="store", dest="days", help="Type No.of Days in Integer to Delete Logs other than days provided", type="int")
options, args = parser.parse_args()

if options.loggroupname is None:
    print("Provide log group name to continue.\n")
    cmd = 'python3 ' + sys.argv[0] + ' -h'
    os.system(cmd)
    sys.exit(0)
if options.days is None:
    print("Provide date to continue.\n")
    cmd = 'python3 ' + sys.argv[0] + ' -h'
    os.system(cmd)
    sys.exit(0)
elif options.days and options.loggroupname:
    loggroupname = options.loggroupname
    days = options.days
    deletefunc(loggroupname, days)
You can run this file using the command:
python3 remove.py --log-group-name=testing-vpc-flow-logs --days=7
I've created an RDS Postgres instance with an initial size of 65 GB.
Is it possible to get the free space available using a Postgres query?
If not, how can I achieve the same?
Thank you in advance.
A couple of ways to do it:
Using the AWS Console
Go to the RDS console and select the region your database is in. Click on the Show Monitoring button and pick your database instance. There will be a graph that shows Free Storage Space.
This is documented in the AWS RDS documentation.
Using the API via AWS CLI
Alternatively, you can use the AWS API to get the information from CloudWatch.
I will show how to do this with the AWS CLI.
This assumes you have set up the AWS CLI credentials. I export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in my environment variables, but there are multiple ways to configure the CLI (or SDKs).
REGION="eu-west-1"
START="$(date -u -d '5 minutes ago' '+%Y-%m-%dT%T')"
END="$(date -u '+%Y-%m-%dT%T')"
INSTANCE_NAME="tstirldbopgs001"
AWS_DEFAULT_REGION="$REGION" aws cloudwatch get-metric-statistics \
--namespace AWS/RDS --metric-name FreeStorageSpace \
--start-time $START --end-time $END --period 300 \
--statistics Average \
--dimensions "Name=DBInstanceIdentifier, Value=${INSTANCE_NAME}"
{
    "Label": "FreeStorageSpace",
    "Datapoints": [
        {
            "Timestamp": "2017-11-16T14:01:00Z",
            "Average": 95406264320.0,
            "Unit": "Bytes"
        }
    ]
}
Using the API via Java SDK
Here's a rudimentary example of how to get the same data via the Java AWS SDK, using the Cloudwatch API.
build.gradle contents
apply plugin: 'java'
apply plugin: 'application'

sourceCompatibility = 1.8

repositories {
    jcenter()
}

dependencies {
    compile 'com.amazonaws:aws-java-sdk-cloudwatch:1.11.232'
}

mainClassName = 'GetRDSInfo'
Java class
Again, I rely on the credential chain to get AWS API credentials (I set them in my environment). You can change the call to the builder to change this behavior (see Working with AWS Credentials documentation).
import java.util.Calendar;
import java.util.Date;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;
import com.amazonaws.services.cloudwatch.model.StandardUnit;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.Datapoint;

public class GetRDSInfo {
    public static void main(String[] args) {
        final long GIGABYTE = 1024L * 1024L * 1024L;

        // calculate our endTime as now and startTime as 5 minutes ago.
        Calendar cal = Calendar.getInstance();
        Date endTime = cal.getTime();
        cal.add(Calendar.MINUTE, -5);
        Date startTime = cal.getTime();

        String dbIdentifier = "tstirldbopgs001";
        Regions region = Regions.EU_WEST_1;

        Dimension dim = new Dimension()
            .withName("DBInstanceIdentifier")
            .withValue(dbIdentifier);

        final AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.standard()
            .withRegion(region)
            .build();

        GetMetricStatisticsRequest req = new GetMetricStatisticsRequest()
            .withNamespace("AWS/RDS")
            .withMetricName("FreeStorageSpace")
            .withStatistics("Average")
            .withStartTime(startTime)
            .withEndTime(endTime)
            .withDimensions(dim)
            .withPeriod(300);

        GetMetricStatisticsResult res = cw.getMetricStatistics(req);

        for (Datapoint dp : res.getDatapoints()) {
            // We requested only the average free space over the last 5 minutes
            // so we only have one datapoint
            double freespaceGigs = dp.getAverage() / GIGABYTE;
            System.out.println(String.format("Free Space: %.2f GB", freespaceGigs));
        }
    }
}
Example Java Code Execution
> gradle run
> Task :run
Free Space: 88.85 GB
BUILD SUCCESSFUL in 7s
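For completeness, roughly the same query from Python with boto3 (a sketch; it reuses the instance identifier and 5-minute window from the examples above):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

now = datetime.utcnow()
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="FreeStorageSpace",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "tstirldbopgs001"}],
    StartTime=now - timedelta(minutes=5),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)

for dp in stats["Datapoints"]:
    print("Free Space: %.2f GB" % (dp["Average"] / 1024 ** 3))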
The method using the AWS Management Console has changed.
Now you have to go:
RDS > Databases > [your_db_instance]
From there, scroll down, and click on "Monitoring"
There you should be able to see your db's "Free Storage Space" (in MB/Second)
Is there a utility or script available to retrieve a list of all instances from an AWS EC2 Auto Scaling group?
I need a dynamically generated list of production instances to hook into our deploy process. Is there an existing tool, or is this something I am going to have to script?
Here is a bash command that will give you the list of IP addresses of your instances in an AutoScaling group.
for ID in $(aws autoscaling describe-auto-scaling-instances --region us-east-1 --query AutoScalingInstances[].InstanceId --output text);
do
aws ec2 describe-instances --instance-ids $ID --region us-east-1 --query Reservations[].Instances[].PublicIpAddress --output text
done
(you might want to adjust the region and filter by Auto Scaling group if you have several of them)
From a higher-level point of view, I would question the need to connect to individual instances in an Auto Scaling group. The dynamic nature of Auto Scaling encourages you to fully automate your deployment and admin processes. To quote an AWS customer: "If you need to ssh to your instance, change your deployment process."
--Seb
The describe-auto-scaling-groups command from the AWS Command Line Interface looks like what you're looking for.
Edit: Once you have the instance IDs, you can use the describe-instances command to fetch additional details, including the public DNS names and IP addresses.
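A rough boto3 version of that two-step lookup (the group name and region are placeholders):

import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# 1) instance IDs attached to the Auto Scaling group
groups = asg.describe_auto_scaling_groups(AutoScalingGroupNames=["my-asg"])  # placeholder name
instance_ids = [
    i["InstanceId"]
    for g in groups["AutoScalingGroups"]
    for i in g["Instances"]
]

# 2) public DNS names / IPs for those instances
if instance_ids:
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    for r in reservations:
        for inst in r["Instances"]:
            print(inst["InstanceId"], inst.get("PublicDnsName"), inst.get("PublicIpAddress"))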
You can use the describe-auto-scaling-instances CLI command and query for your Auto Scaling group name.
Example:
aws autoscaling describe-auto-scaling-instances --region us-east-1
--query 'AutoScalingInstances[?AutoScalingGroupName==`YOUR_ASG`]' --output text
Hope that helps
You can also use the command below to fetch the private IP addresses without any jq/awk/sed/cut:
$ aws autoscaling describe-auto-scaling-instances --region us-east-1 --output text \
    --query "AutoScalingInstances[?AutoScalingGroupName=='ASG-GROUP-NAME'].InstanceId" \
  | xargs -n1 aws ec2 describe-instances --region us-east-1 \
    --query "Reservations[].Instances[].PrivateIpAddress" --output text --instance-ids
(xargs -n1 appends each instance ID after --instance-ids, so that option has to come last)
courtesy this
I actually ended up writing a script in Python because I feel more comfortable in Python than Bash:
#!/usr/bin/env python
"""
ec2-autoscale-instance.py

Read Autoscale DNS from AWS

Sample config file,
{
    "access_key": "key",
    "secret_key": "key",
    "group_name": "groupName"
}
"""
from __future__ import print_function

import argparse
import boto.ec2.autoscale
try:
    import simplejson as json
except ImportError:
    import json

CONFIG_ACCESS_KEY = 'access_key'
CONFIG_SECRET_KEY = 'secret_key'
CONFIG_GROUP_NAME = 'group_name'

def main():
    arg_parser = argparse.ArgumentParser(description=
                                         'Read Autoscale DNS names from AWS')
    arg_parser.add_argument('-c', dest='config_file',
                            help='JSON configuration file containing ' +
                                 'access_key, secret_key, and group_name')
    args = arg_parser.parse_args()

    config = json.loads(open(args.config_file).read())
    access_key = config[CONFIG_ACCESS_KEY]
    secret_key = config[CONFIG_SECRET_KEY]
    group_name = config[CONFIG_GROUP_NAME]

    ec2_conn = boto.connect_ec2(access_key, secret_key)
    as_conn = boto.connect_autoscale(access_key, secret_key)

    try:
        group = as_conn.get_all_groups([group_name])[0]
        instances_ids = [i.instance_id for i in group.instances]
        reservations = ec2_conn.get_all_reservations(instances_ids)
        instances = [i for r in reservations for i in r.instances]
        dns_names = [i.public_dns_name for i in instances]
        print('\n'.join(dns_names))
    finally:
        ec2_conn.close()
        as_conn.close()

if __name__ == '__main__':
    main()
Gist
The answer at https://stackoverflow.com/a/12592543/20774 was helpful in developing this script.
Use the snippet below to filter ASGs by a specific tag and list their instance details.
#!/usr/bin/python
import boto3

ec2 = boto3.resource('ec2', region_name='us-west-2')

def get_instances():
    client = boto3.client('autoscaling', region_name='us-west-2')
    paginator = client.get_paginator('describe_auto_scaling_groups')
    groups = paginator.paginate(PaginationConfig={'PageSize': 100})
    # print(groups)
    # keep only ASGs carrying the tag Application=CCP
    filtered_asgs = groups.search('AutoScalingGroups[] | [?contains(Tags[?Key==`{}`].Value, `{}`)]'.format('Application', 'CCP'))
    for asg in filtered_asgs:
        print(asg['AutoScalingGroupName'])
        instance_ids = [i['InstanceId'] for i in asg['Instances']]
        running_instances = ec2.instances.filter(
            Filters=[{'Name': 'instance-id', 'Values': instance_ids}])
        for instance in running_instances:
            print(instance.private_ip_address)

if __name__ == '__main__':
    get_instances()
For Ruby, using the aws-sdk gem v2:
First create an EC2 resource object like this:
ec2 = Aws::EC2::Resource.new(region: 'region',
                             credentials: Aws::Credentials.new('IAM_KEY', 'IAM_SECRET'))

instances = []
ec2.instances.each do |i|
  p "instance id---", i.id
  instances << i.id
end
This will fetch all instance IDs in a particular region; you can add more filters, such as ip_address.