Single Quotes Converting to Special Characters using AWS CLI - amazon-web-services

I am making SES Templates using the AWS CLI and having issues with single quotes converting to special characters when the emails are sent.
This also happens when doing a DynamoDB put item operation using the CLI when a string contains a single quote within it.
I've tried backslashes, wrapping the quote in double quotes and then escaping it, etc.
aws ses send-bulk-templated-email --cli-input-json file://test.json
aws dynamodb put-item --table-name TABLE --item file://item.json
Item/Test Example (snippets of the json):
test: "SubjectPart":"Happy birthday! Get more involved in managing your healthcare now that you're 18"
item:
"S": "Now that you're 18"
Output:
Happy birthday! Get more involved in managing your healthcare now that you’re 18
and
Now that you’re 18
Expected:
Happy birthday! Get more involved in managing your healthcare now that you're 18
and
Now that you're 18

Assuming that you're using Linux or Mac, with the bash shell ...
Here is an example of how to escape quote characters when using the awscli:
aws dynamodb put-item \
--table-name mytable \
--item '{"id":{"S":"1"}, "name":{"S":"Fred'\''s Garage"}}'
Here is a second way:
aws dynamodb put-item \
--table-name mytable \
--item $'{"id":{"S":"1"}, "name":{"S":"Fred\'s Garage"}}'
In the latter example, words of the form $'string' are treated specially and allow you to quote certain characters.
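If you want to check what the shell will actually hand to the CLI before running the command, echo is a quick sanity test (this assumes bash and reuses the item JSON from the examples above):

echo '{"id":{"S":"1"}, "name":{"S":"Fred'\''s Garage"}}'
echo $'{"id":{"S":"1"}, "name":{"S":"Fred\'s Garage"}}'
# both print: {"id":{"S":"1"}, "name":{"S":"Fred's Garage"}}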

Welp, after much trial and error, this is what worked:
you\u2019re
I have no idea why, but it did (for reference, \u2019 is the JSON Unicode escape for the right single quotation mark, ’). Posting this answer in case others experience this as well.
Example:
"SubjectPart":"Happy birthday! Get more involved in managing your healthcare now that you\u2019re 18"
This will give you the expected output.

Related

AWS CLI DynamoDB Called From Powershell Put-Item fails when a value contains a space

So, let's say I'm trying to post this JSON via the command line (not in a file, because I'm not going to write a file for every invocation of this script) to a DynamoDB table:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"}}}}
Then the result from the CLI is an error like this:
Unknown options: Space"},"ScomSubscriptionId":{"S":"guid-ab22-2345"},"ZabbixActionId":{"S":"11"},"SnsTopic":{"M":{"TopicName":{"S":"ATopicName"},"TopicArn":{"S":"AtopicArn1234"},"CreatedDate":{"S":"today"},"CreatedBy":{"S":"someones, user"}}}}, user"},"EmailDistributionList":{"S":"test#test.com"},"RemedyGroup":{"S":"One
As you can see, it fails on the TeamName property, which in the above example is "One Space". If I change that value to "OneSpace", it instead starts to fail on the CreatedBy property, which is populated with "someones user", but if I remove all spaces from all properties I can suddenly pass this JSON to DynamoDB successfully.
In a working example the json looks like this:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"}}}}
I can't find any documentation that tells me I can't have spaces. If I read this in from a file, it will post with the spaces, so what gives? If anyone has any advice on this matter, I certainly appreciate it.
For what it's worth, in PowerShell the execution currently looks like this (though I've tried various combinations of quoting the $dbTeamTableEntry variable):
$dbEntry = aws.exe dynamodb put-item --region $region --table-name $table --item "$($dbTeamTableEntry)"

Creating Kinesis Analytics applications using aws cli

I want to create a Kinesis Analytics application using the AWS CLI. I use this command to create the application:
aws kinesisanalytics create-application --application-name smartfactorytest1 --application-code "CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM ( "device_serial" VARCHAR(16), "uploadRate" INTEGER, "downloadRate" INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP"
AS INSERT INTO DESTINATION_SQL_STREAM
SELECT STREAM "device_serial", "uploadRate", "downloadRate"
FROM SOURCE_SQL_STREAM_001
-- LIKE compares a string to a string pattern (_ matches all char, % matches substring)
-- SIMILAR TO compares string to a regex, may use ESCAPE
WHERE "uploadRate" >20000" --inputs NamePrefix="SOURCE_SQL_STREAM",KinesisStreamsInput={ResourceARN="sourcearn",RoleARN="rolearn"}
But I get this error
invalid type for parameter Inputs[0].KinesisStreamsInput, value: ResourceARN=string, type: <class 'str'>, valid types: <class 'dict'>
Can anyone tell me what am I doing wrong? Any help would be appreciated.
I believe the issue is either that you need to take the quotes out in the KinesisStreamsInput section, or you need to add quotes and escape them. The documentation is unclear on which is the correct option.
According to the AWS Kinesis Analytics CLI Reference, https://docs.aws.amazon.com/cli/latest/reference/kinesisanalytics/create-application.html, the syntax for --inputs with KinesisStreamsInput should look like the example provided for KinesisStreamsOutput:
Name=string,KinesisStreamsOutput={ResourceARN=string,RoleARN=string},...
This would mean removing the quotes around your sourcearn and rolearn. However, the documentation isn't clear that this refers to the CLI syntax in all cases.
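For illustration, here is a rough sketch of that first option; the ARNs are the question's placeholders, the application code is assumed to be in a shell variable, and the whole --inputs value is wrapped in single quotes so the shell does not split or brace-expand it:

aws kinesisanalytics create-application \
    --application-name smartfactorytest1 \
    --application-code "$APPLICATION_CODE" \
    --inputs 'NamePrefix=SOURCE_SQL_STREAM,KinesisStreamsInput={ResourceARN=sourcearn,RoleARN=rolearn}'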
If that doesn't work, according to this AWS CLI usage guide page, https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-parameters-quoting-strings.html, it specifies adding quotes and escaping the relevant ones, depending on your OS...
"Linux or macOS
Use single quotation marks (' ') to enclose the JSON data structure, as in the following example. You don't have to do anything special with the double quotation marks embedded in the JSON string.
aws ec2 run-instances --image-id ami-12345678 --block-device-mappings '[{"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":20,"DeleteOnTermination":false,"VolumeType":"standard"}}]'
PowerShell
PowerShell requires single quotation marks (' ') to enclose the JSON data structure. Also, because double quotation marks have a special meaning to PowerShell, you must use a backslash (\) to escape each double quotation mark (") within the JSON structure, as in the following example.
PS C:\> aws ec2 run-instances --image-id ami-12345678 --block-device-mappings '[{\"DeviceName\":\"/dev/sdb\",\"Ebs\":{\"VolumeSize\":20,\"DeleteOnTermination\":false,\"VolumeType\":\"standard\"}}]'
Windows Command Prompt
The Windows command prompt requires double quotation marks (" ") to enclose the JSON data structure. Also, to prevent the command processor from misinterpreting the double quotation marks embedded in the JSON, you must also escape (precede with a backslash [ \ ] character) each double quotation mark (") within the JSON data structure itself, as in the following example.
C:\> aws ec2 run-instances --image-id ami-12345678 --block-device-mappings "[{\"DeviceName\":\"/dev/sdb\",\"Ebs\":{\"VolumeSize\":20,\"DeleteOnTermination\":false,\"VolumeType\":\"standard\"}}]"
Only the outermost double quotation marks are not escaped."
This link also references needing to escape quotes on Windows, and is using the kinesisanalytics command: https://github.com/aws/aws-cli/issues/3103
"Rishi74744 commented on Feb 6, 2018
I got it to work as -
aws kinesisanalytics add-application-reference-data-source --endpoint https://kinesisanalytics.us-east-1.amazonaws.com --region us-east-1 --application-name alerts --reference-data-source "{\"TableName\":\"DeviceData\",\"S3ReferenceDataSource\":{\"BucketARN\":\"arn:aws:s3:::bucket-name\",\"FileKey\":\"device.csv\",\"ReferenceRoleARN\":\"arn:aws:iam::account-id:role/role-name\"},\"ReferenceSchema\":{\"RecordFormat\":{\"RecordFormatType\":\"CSV\",\"MappingParameters\":{\"CSVMappingParameters\":{\"RecordRowDelimiter\":\"\n\",\"RecordColumnDelimiter\":\", \"}}},\"RecordEncoding\":\"UTF-8\",\"RecordColumns\":[{\"Name\":\"key1\",\"SqlType\":\"VARCHAR(64)\"},{\"Name\":\"key2\",\"SqlType\":\"VARCHAR(64)\"}]}}" --current-application-version-id 2
But this should be mentioned in the documentation."
One note: it may be preferable to use JSON files as inputs and use this syntax instead: --cli-input-json file://input.json. This is referenced in the AWS Kinesis CLI Command Reference (first link, under 1.) and also mentioned in the GitHub link above. It's also the method used by the majority of the AWS Kinesis documentation. For example, JSON files used for different purposes in Kinesis Analytics:
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html
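As a hedged sketch of that workflow (the file name input.json is just an example), one way to build the JSON file is to let the CLI generate a skeleton first:

aws kinesisanalytics create-application --generate-cli-skeleton > input.json
# edit input.json to fill in the application name, code, and inputs, then:
aws kinesisanalytics create-application --cli-input-json file://input.json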
Please let me know what works, and I will work with my AWS rep to improve the documentation.

My AWS Cloudwatch bill is huge. How do I work out which log stream is causing it?

I got a $1,200 invoice from Amazon for Cloudwatch services last month (specifically for 2 TB of log data ingestion in "AmazonCloudWatch PutLogEvents"), when I was expecting a few tens of dollars. I've logged into the Cloudwatch section of the AWS Console, and can see that one of my log groups used about 2TB of data, but there are thousands of different log streams in that log group, how can I tell which one used that amount of data?
On the CloudWatch console, use the IncomingBytes metric on the Metrics page to find the amount of data (in uncompressed bytes) ingested by each log group over a particular time period. Follow these steps:
Go to the CloudWatch Metrics page and click on the AWS namespace 'Logs' --> 'Log Group Metrics'.
Select the IncomingBytes metric for the required log groups and click on the 'Graphed metrics' tab to see the graph.
Change the start time and end time such that their difference is 30 days, and change the period to 30 days. This way, we get only one data point. Also change the graph type to Number and the statistic to Sum.
This way, you will see the amount of data ingested by each log group and get an idea of which log group is ingesting how much.
You can also achieve the same result using the AWS CLI. For example, if you just want to know the total amount of data ingested by log groups over, say, 30 days, you can use the get-metric-statistics CLI command.
Sample CLI command:
aws cloudwatch get-metric-statistics --metric-name IncomingBytes --start-time 2018-05-01T00:00:00Z --end-time 2018-05-30T23:59:59Z --period 2592000 --namespace AWS/Logs --statistics Sum --region us-east-1
Sample output:
{
    "Datapoints": [
        {
            "Timestamp": "2018-05-01T00:00:00Z",
            "Sum": 1686361672.0,
            "Unit": "Bytes"
        }
    ],
    "Label": "IncomingBytes"
}
To find the same for a particular log group, you can change the command to accommodate dimensions like -
aws cloudwatch get-metric-statistics --metric-name IncomingBytes --start-time 2018-05-01T00:00:00Z --end-time 2018-05-30T23:59:59Z --period 2592000 --namespace AWS/Logs --statistics Sum --region us-east-1 --dimensions Name=LogGroupName,Value=test1
One by one, you can run this command on all log groups, check which log group is responsible for most of the data-ingestion bill, and take corrective measures (see the sketch below).
NOTE: Change the parameters to match your own environment and requirements.
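For example, a rough bash sketch (not part of the original answer) that loops over every log group and prints its ingested bytes for the same 30-day window; the region and dates come from the example above, so adjust them to your environment:

for lg in $(aws logs describe-log-groups --query 'logGroups[].logGroupName' --output text); do
    # prints None for groups with no ingestion in the window
    sum=$(aws cloudwatch get-metric-statistics --metric-name IncomingBytes \
        --start-time 2018-05-01T00:00:00Z --end-time 2018-05-30T23:59:59Z \
        --period 2592000 --namespace AWS/Logs --statistics Sum --region us-east-1 \
        --dimensions Name=LogGroupName,Value="$lg" \
        --query 'Datapoints[0].Sum' --output text)
    echo "$lg: $sum bytes"
done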
The solution provided by the OP gives data on the amount of logs stored, which is different from the logs ingested.
What is the difference?
Data ingested per month is not the same as data storage bytes. After the data is ingested into CloudWatch, it is archived by CloudWatch, which adds 26 bytes of metadata per log event and compresses it using gzip level 6 compression. So storage bytes refers to the storage space used by CloudWatch to store the logs after they're ingested.
Reference : https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-statistics.html
We had a Lambda logging GBs of data due to an accidental check-in. Here's a boto3-based Python script, based on the info from the answers above, that scans all log groups and prints any group with more than 1 GB of incoming bytes in the past 7 days. This helped me more than trying to use the AWS dashboard, which was slow to update.
#!/usr/bin/env python3
# Outputs all loggroups with > 1GB of incomingBytes in the past 7 days
import boto3
from datetime import datetime as dt
from datetime import timedelta

logs_client = boto3.client('logs')
cloudwatch_client = boto3.client('cloudwatch')

end_date = dt.today().isoformat(timespec='seconds')
start_date = (dt.today() - timedelta(days=7)).isoformat(timespec='seconds')
print("looking from %s to %s" % (start_date, end_date))

paginator = logs_client.get_paginator('describe_log_groups')
pages = paginator.paginate()
for page in pages:
    for json_data in page['logGroups']:
        log_group_name = json_data.get("logGroupName")
        cw_response = cloudwatch_client.get_metric_statistics(
            Namespace='AWS/Logs',
            MetricName='IncomingBytes',
            Dimensions=[
                {
                    'Name': 'LogGroupName',
                    'Value': log_group_name
                },
            ],
            StartTime=start_date,
            EndTime=end_date,
            Period=3600 * 24 * 7,
            Statistics=[
                'Sum'
            ],
            Unit='Bytes'
        )
        if len(cw_response.get("Datapoints")):
            stats_data = cw_response.get("Datapoints")[0]
            stats_sum = stats_data.get("Sum")
            sum_GB = stats_sum / (1000 * 1000 * 1000)
            if sum_GB > 1.0:
                print("%s = %.2f GB" % (log_group_name, sum_GB))
Although the author of the question and other folks have answered the question well, I will try to give a generic solution that can be applied without knowing the exact log group name that is producing too many logs.
To do this, we cannot use the describe-log-streams function, because it would need --log-group-name, and as I said earlier I do not know the value of my log group name.
We can use the describe-log-groups function, because it does not require any parameters.
Note that I am assuming that you have the required flag (--region) configured in your ~/.aws/config file and that your EC2 instance has the required permission to execute this command.
aws logs describe-log-groups
This command lists all the log groups in your AWS account. The sample output would be:
{
    "logGroups": [
        {
            "metricFilterCount": 0,
            "storedBytes": 62299573,
            "arn": "arn:aws:logs:ap-southeast-1:855368385138:log-group:RDSOSMetrics:*",
            "retentionInDays": 30,
            "creationTime": 1566472016743,
            "logGroupName": "/aws/lambda/us-east-1.test"
        }
    ]
}
If you are only interested in log groups with a specific name prefix, you can use --log-group-name-prefix like this:
aws logs describe-log-groups --log-group-name-prefix /aws/lambda
The output JSON of this command would also be similar to the above output.
If you have too many log groups in your account, analyzing this output becomes difficult, and we need a command-line utility to give a brief insight into the result.
We will use the 'jq' command-line utility to get what we want. The intention is to find which log group has produced the largest amount of logs and hence is costing the most money.
From the output JSON, the fields we need for our analysis are "logGroupName" and "storedBytes", so we take these two fields in the 'jq' command.
aws logs describe-log-groups --log-group-name-prefix /aws/ \
    | jq -M -r '.logGroups[] | "{\"logGroupName\":\"\(.logGroupName)\",\"storedBytes\":\(.storedBytes)}"'
The '\' characters in the command escape the double quotes, because we want the output to stay in JSON format so that we can use jq's sort_by function. The sample output would be something like this:
{"logGroupName":"/aws/lambda/test1","storedBytes":3045647212}
{"logGroupName":"/aws/lambda/projectTest","storedBytes":200165401}
{"logGroupName":"/aws/lambda/projectTest2","storedBytes":200}
Note that the output is not sorted by storedBytes, so we sort it to find out which log group is the most problematic one.
We will use the sort_by function of jq to accomplish this. The sample command would be like this:
aws logs describe-log-groups --log-group-name-prefix /aws/ \
    | jq -M -r '.logGroups[] | "{\"logGroupName\":\"\(.logGroupName)\",\"storedBytes\":\(.storedBytes)}"' \
    | jq -s -c 'sort_by(.storedBytes) | .[]'
This would produce the below result for the above sample output
{"logGroupName":"/aws/lambda/projectTest2","storedBytes":200}
{"logGroupName":"/aws/lambda/projectTest","storedBytes":200165401}
{"logGroupName":"/aws/lambda/test1","storedBytes":3045647212}
The elements at the bottom of this list are the ones with the most logs associated with them. You may want to set the Expire Events After property of these log groups to a finite period, say one month.
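If it helps, the same retention setting can also be applied from the CLI with put-retention-policy; the log group name below is just the example from the sample output above:

aws logs put-retention-policy --log-group-name /aws/lambda/test1 --retention-in-days 30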
If you want to know the sum of all the stored log bytes, you can use the map and add functions of jq like below:
aws logs describe-log-groups --log-group-name-prefix /aws/ \
    | jq -M -r '.logGroups[] | "{\"logGroupName\":\"\(.logGroupName)\",\"storedBytes\":\(.storedBytes)}"' \
    | jq -s -c 'sort_by(.storedBytes) | .[]' \
    | jq -s 'map(.storedBytes) | add'
The output of this command for the above sample output would be
3245812813
The answer has become lengthy, but I hope it helps in figuring out the most problematic log group in CloudWatch.
You can also click the gear icon on the CloudWatch Logs dashboard and choose the Stored Bytes column.
I also clicked anything that said 'never expire' and changed the logs to expire.
Use the CloudWatch Logs gear and select the "Stored Bytes" column.
*** UPDATE 20210907 - as #davur points out in one of the comments below, AWS deprecated storedBytes for individual LogStreams, so the method described in this answer no longer fulfils the requirement, although it might be interesting in other ways ***
Okay, I'm answering my own question here, but here we go (with all other answers welcome):
You can use a combination of AWS CLI tools, the csvkit CSV package and a spreadsheet to work this out.
Log into the AWS Cloudwatch Console and grab the name of the log group which has generated all the data. In my case it's called "test01-ecs".
Unfortunately in the Cloudwatch Console you can't sort the streams by "Stored Bytes" (which would tell you which ones are biggest). If there are too many streams in the log group to look through in the Console then you need to dump them somehow. For this you can use the AWS CLI tool:
$ aws logs describe-log-streams --log-group-name test01-ecs
The command above will give you JSON output (assuming your AWS CLI tool is set to JSON output - set it to output = json in ~/.aws/config if not) and it will look something like this:
{ "logStreams": [ { "creationTime": 1479218045690, "arn": "arn:aws:logs:eu-west-1:902720333704:log-group:test01-ecs:log-stream:test-spec/test-spec/0307d251-7764-459e-a68c-da47c3d9ecd9", "logStreamName": "test-spec/test-spec/0308d251-7764-4d9f-b68d-da47c3e9ebd8", "storedBytes": 7032 } ] }
Pipe this output to a JSON file - in my case the file was 31 MB in size:
$ aws logs describe-log-streams --log-group-name test01-ecs >> ./cloudwatch-output.json
Use the in2csv tool (part of csvkit) to convert the JSON file to a CSV file which can easily be imported into a spreadsheet, making sure you define the logStreams key as the key to import on:
$ in2csv cloudwatch-output.json --key logStreams >> ./cloudwatch-output.csv
Import the resulting CSV file into a spreadsheet (I use LibreOffice myself as it seems great at dealing with CSV) making sure the storedBytes field is imported as an integer.
Sort the storedBytes column in the spreadsheet to work out which log stream or streams are generating the most data.
In my case this worked - it turned out one of my log streams (with logs from a broken TCP pipe in a redis instance) was 4,000 times the size of all the other streams combined!
An alternative to using the now deprecated storedBytes for log streams is to use CloudWatch > Logs Insights and then run a query to count events by log stream:
stats count(*) by #logStream
The log stream with the largest number of events will then probably be what is causing the high bill.
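As a hedged sketch, the same Logs Insights query can also be run from the CLI with start-query and get-query-results; the log group name is the one from the answer above, the time range is the last 7 days, and GNU date is assumed:

query_id=$(aws logs start-query --log-group-name test01-ecs \
    --start-time $(date -d '7 days ago' +%s) --end-time $(date +%s) \
    --query-string 'stats count(*) by @logStream' \
    --query 'queryId' --output text)
sleep 10   # the query runs asynchronously; wait a moment before fetching results
aws logs get-query-results --query-id "$query_id"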

Pass comma separated argument to spark jar in AWS EMR using CLI

I am using the AWS CLI to create an EMR cluster and add a step. My create-cluster command looks like this:
aws emr create-cluster --release-label emr-5.0.0 --applications Name=Spark --ec2-attributes KeyName=*****,SubnetId=subnet-**** --use-default-roles --bootstrap-action Path=$S3_BOOTSTRAP_PATH --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=$instanceCount,InstanceType=m4.4xlarge --steps Type=Spark,Name="My Application",ActionOnFailure=TERMINATE_CLUSTER,Args=[--master,yarn,--deploy-mode,client,$JAR,$inputLoc,$outputLoc] --auto-terminate
$JAR - is my spark jar which takes two params input and output
$input is basically a comma separated list of input files like s3://myBucket/input1.txt,s3://myBucket/input2.txt
However, the AWS CLI treats comma-separated values as separate arguments, so my second input file is treated as the second parameter, and $outputLoc here becomes s3://myBucket/input2.txt.
Is there any way to escape comma and treat this whole argument as single value in CLI command so that spark can handle reading multiple files as input?
It seems there is no way to escape the comma in the list of input files.
After trying quite a few approaches, I finally resorted to a hack: passing a different delimiter to separate the input files and handling it in the code. In my case, I added % as my delimiter, and in the driver code I am doing:
if (inputLoc.contains("%")) {
    inputLoc = inputLoc.replaceAll("%", ",");
}
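For illustration (a sketch reusing the question's variable names), the inputs would then be joined with % instead of a comma, so the CLI sees them as a single argument and the driver splits them back apart:

inputLoc="s3://myBucket/input1.txt%s3://myBucket/input2.txt"
outputLoc="s3://myBucket/output"
# passed unchanged in the step definition:
#   Args=[--master,yarn,--deploy-mode,client,$JAR,$inputLoc,$outputLoc]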

How do I filter and extract raw log event data from Amazon Cloudwatch

Is there any way to 1) filter and 2) retrieve the raw log data out of Cloudwatch via the API or from the CLI? I need to extract a subset of log events from Cloudwatch for analysis.
I don't need to create a metric or anything like that. This is for historical research of a specific event in time.
I have gone to the log viewer in the console but I am trying to pull out specific lines to tell me a story around a certain time. The log viewer would be nigh-impossible to use for this purpose. If I had the actual log file, I would just grep and be done in about 3 seconds. But I don't.
Clarification
In the description of Cloudwatch Logs, it says, "You can view the original log data (only in the web view?) to see the source of the problem if needed. Log data can be stored and accessed (only in the web view?) for as long as you need using highly durable, low-cost storage so you don’t have to worry about filling up hard drives." --italics are mine
If this console view is the only way to get at the source data, then storing logs via Cloudwatch is not an acceptable solution for my purposes. I need to get at the actual data with sufficient flexibility to search for patterns, not click through dozens of pages of lines and copy/paste. It appears a better way to get to the source data may not be available, however.
For using AWSCLI (plain one as well as with cwlogs plugin) see http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/SearchDataFilterPattern.html
For pattern syntax (plain text, [space separated] as well as {JSON syntax}) see: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/FilterAndPatternSyntax.html
For python command line utility awslogs see https://github.com/jorgebastida/awslogs.
AWSCLI: aws logs filter-log-events
AWSCLI is the official CLI for AWS services and it now supports logs too.
To show help:
$ aws logs filter-log-events help
The filter can be based on:
log group name --log-group-name (only last one is used)
log stream name --log-stream-name (can be specified multiple times)
start time --start-time
end time --end-time (not --stop-time)
filter pattern --filter-pattern
Only --log-group-name is obligatory.
Times are expressed as epoch using milliseconds (not seconds).
The call might look like this:
$ aws logs filter-log-events \
--start-time 1447167000000 \
--end-time 1447167600000 \
--log-group-name /var/log/syslog \
--filter-pattern ERROR \
--output text
It prints 6 columns of tab-separated text:
1st: EVENTS (to denote that the line is a log record and not other information)
2nd: eventId
3rd: timestamp (time declared by the record as event time)
4th: logStreamName
5th: message
6th: ingestionTime
So if you have Linux command line utilities at hand and care only about log record messages for interval from 2015-11-10T14:50:00Z to 2015-11-10T15:00:00Z, you may get it as follows:
$ aws logs filter-log-events \
--start-time `date -d 2015-11-10T14:50:00Z +%s`000 \
--end-time `date -d 2015-11-10T15:00:00Z +%s`000 \
--log-group-name /var/log/syslog \
--filter-pattern ERROR \
--output text| grep "^EVENTS"|cut -f 5
AWSCLI with cwlogs plugin
The cwlogs AWSCLI plugin is simpler to use:
$ aws logs filter \
--start-time 2015-11-10T14:50:00Z \
--end-time 2015-11-10T15:00:00Z \
--log-group-name /var/log/syslog \
--filter-pattern ERROR
It expects human readable date-time and always returns text output with (space delimited) columns:
1st: logStreamName
2nd: date
3rd: time
4th till the end: message
On the other hand, it is a bit more difficult to install (a few more steps to do, plus current pip requires you to declare the installation domain as a trusted one).
$ pip install awscli-cwlogs --upgrade \
--extra-index-url=http://aws-cloudwatch.s3-website-us-east-1.amazonaws.com/ \
--trusted-host aws-cloudwatch.s3-website-us-east-1.amazonaws.com
$ aws configure set plugins.cwlogs cwlogs
(if you make a typo in the last command, just correct it in the ~/.aws/config file)
awslogs command from jorgebastida/awslogs
This has become my favourite one - easy to install, powerful, easy to use.
Installation:
$ pip install awslogs
To list available log groups:
$ awslogs groups
To list log streams:
$ awslogs streams /var/log/syslog
To get the records and follow them (see new ones as they come):
$ awslogs get --watch /var/log/syslog
And you may filter the records by time range:
$ awslogs get /var/log/syslog -s 2015-11-10T15:45:00 -e 2015-11-10T15:50:00
Since version 0.2.0 you have there also the --filter-pattern option.
The output has columns:
1st: log group name
2nd: log stream name
3rd: message
Using --no-group and --no-stream you may switch the first two columns off.
Using --no-color you may get rid of color control characters in the output.
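For example (a small sketch combining the options mentioned above), to print only the messages that match ERROR, without the group/stream columns or colour codes:
$ awslogs get /var/log/syslog --filter-pattern="ERROR" --no-group --no-stream --no-color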
EDIT: as awslogs version 0.2.0 adds --filter-pattern, text updated.
If you are using the Python boto3 library to extract AWS CloudWatch logs, the get_log_events() function accepts start and end times in milliseconds.
For reference: http://boto3.readthedocs.org/en/latest/reference/services/logs.html#CloudWatchLogs.Client.get_log_events
For this, you can take a UTC time input and convert it to milliseconds using the datetime and timegm modules, and you are good to go:
from calendar import timegm
from datetime import datetime, timedelta
import sys

# Optional 'YYYY-MM-DD HH:MM:SS' UTC arguments; if none are given, use the last hour
start_time = datetime.strptime(sys.argv[1], '%Y-%m-%d %H:%M:%S') if len(sys.argv) > 1 else None
end_time = datetime.strptime(sys.argv[2], '%Y-%m-%d %H:%M:%S') if len(sys.argv) > 2 else None
now = datetime.utcnow()
start_time = start_time or now - timedelta(hours=1)
end_time = end_time or now
# get_log_events() expects epoch milliseconds
start_ms = timegm(start_time.utctimetuple()) * 1000
end_ms = timegm(end_time.utctimetuple()) * 1000
So you can pass the inputs via sys.argv as shown below:
python flowlog_read.py '2015-11-13 00:00:00' '2015-11-14 00:00:00'
While Jan's answer is a great one and probably what the author wanted, please note that there is an additional way to get programmatic access to the logs - via subscriptions.
This is intended for always-on streaming scenarios where data is constantly fetched (usually into a Kinesis stream) and then further processed.
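As a hedged sketch (not from the original answer; all ARNs and names are placeholders), creating such a subscription filter from the CLI to stream a log group into a Kinesis stream looks roughly like this:

aws logs put-subscription-filter \
    --log-group-name /var/log/syslog \
    --filter-name all-events \
    --filter-pattern "" \
    --destination-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-log-stream \
    --role-arn arn:aws:iam::123456789012:role/CWLtoKinesisRole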
Haven't used it myself, but here is an open-source cloudwatch to Excel exporter I came across on GitHub:
https://github.com/petezybrick/awscwxls
Generic AWS CloudWatch to Spreadsheet Exporter
CloudWatch doesn't provide an Export utility - this does. awscwxls creates spreadsheets based on generic sets of Namespace/Dimension/Metric/Statistic specifications. As long as AWS continues to follow the Namespace/Dimension/Metric/Statistic pattern, awscwxls should work for existing and future Namespaces (Services). Each set of specifications is stored in a properties file, so each properties file can be configured for a specific set of AWS Services and resources. Take a look at run/properties/template.properties for a complete example.
I think the best option to retrieve the data is the one described in the API.