CloudWatch Insights Query - How to get a single count from counts

I have a log file that contains playerId values; some players have multiple entries in the file. I want an exact distinct count of unique players, regardless of whether they have one or multiple entries in the log file.
Using the query below, it scans 497 records and finds 346 unique rows (346 is the number I want).
Query:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like /(playerId)/
| parse @message "\"playerId\": \"*\"" as playerId
| stats count(playerId) as CT by playerId
If I change my query to use count_distinct instead, I get exactly what I want. Example below:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like /(playerId)/
| parse @message "\"playerId\": \"*\"" as playerId
| stats count_distinct(playerId) as CT
The problem with count_distinct, however, is that as the query expands to a larger timeframe or more records, the number of entries gets into the thousands and tens of thousands. This is an issue because the numbers become approximations, due to the nature of Insights' count_distinct behaviour:
"Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation."
Docs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
This is not acceptable, as I require exact numbers. Playing with the query a little, I believe that sticking with count() rather than count_distinct() is the answer; however, I have not been able to get it down to a single number. An example that does not work is below. Any thoughts?
Ex 1:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like /(playerId)/
| parse @message "\"playerId\": \"*\"" as playerId
| stats count(playerId) as CT by playerId
| stats count(*)
This fails with the error: "We are having trouble understanding the query."
To be clear, I'm looking for an exact count to be returned in a single row showing the number.
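One way to get a single exact number without count_distinct is to run the grouped query (stats count(playerId) as CT by playerId) and count the returned rows on the client. Below is a minimal Python sketch of that idea. It assumes the rows have the shape that boto3's get_query_results returns (a list of rows, each a list of {"field": ..., "value": ...} dicts); the function name and the sample rows are hypothetical, and any result-row limit on the query would still apply.

```python
def count_unique_values(results, field="playerId"):
    """Count distinct values of `field` in Logs Insights style result rows."""
    seen = set()
    for row in results:
        for cell in row:
            if cell["field"] == field:
                seen.add(cell["value"])
    return len(seen)


# Hypothetical rows, as if returned for:
#   stats count(playerId) as CT by playerId
sample_results = [
    [{"field": "playerId", "value": "player-a"}, {"field": "CT", "value": "2"}],
    [{"field": "playerId", "value": "player-b"}, {"field": "CT", "value": "1"}],
    [{"field": "playerId", "value": "player-c"}, {"field": "CT", "value": "5"}],
]

unique_players = count_unique_values(sample_results)
print(unique_players)
```

Because the grouped query already returns one row per playerId, the count is exact as long as every group is present in the results.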

What if we introduce a dummy field that's hardcoded to "1"? The idea is to retrieve its min value so that it stays as a "1" even if the same playerId occurs more than once. And then we sum this field.
The log entry might look like this:
[1]"playerId": "1b45b168-00ed-42fe-a977-a8553440fe1a"
Query:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like /(playerId)/
| parse @message "[*]\"playerId\": \"*\"" as dummyValue, playerId
| stats sum(min(dummyValue)) by playerId as CT
References used:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData_AggregationQuery.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CountOccurrencesExample.html
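To see why the dummy-field idea yields a distinct count, here is a small Python sketch of the same arithmetic: take the minimum dummy value per playerId (always 1), then sum those minimums. It only illustrates the logic on hypothetical in-memory data; it does not show whether Logs Insights accepts the nested aggregation as written.

```python
def sum_of_group_minimums(entries):
    """Sum the per-player minimum of the dummy value."""
    minimums = {}
    for player_id, dummy in entries:
        # Keep the smallest dummy value seen so far for this player.
        minimums[player_id] = min(dummy, minimums.get(player_id, dummy))
    return sum(minimums.values())


# Hypothetical parsed log entries: (playerId, dummyValue)
entries = [
    ("1b45b168-00ed-42fe-a977-a8553440fe1a", 1),
    ("player-b", 1),
    ("1b45b168-00ed-42fe-a977-a8553440fe1a", 1),  # repeat entry
]

distinct_players = sum_of_group_minimums(entries)
print(distinct_players)
```

Each player contributes exactly 1 no matter how many entries they have, so the sum equals the number of unique players.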

Related

Cloudwatch: merging the result from 2 fields into one

fields @timestamp, @message
| parse durationMs /(?<duration>[\d]+ )/
| parse message /(GET \/[^\s]+ [\d]+ )(?<responseTime>[\d]+)/
| display @timestamp, duration, responseTime
| sort @timestamp desc
This query works for me and fetches the values. The query is currently parsing the durationMs field and getting the value into duration field. Also parsing message field and getting the value into responseTime field.
I am looking for a way to parse durationMs and message fields and get the value into only one field. Is this possible? Please help.
The coalesce function did the job for me:
fields @timestamp, @message
| parse durationMs /(?<duration>[\d]+ )/
| parse message /(GET \/[^\s]+ [\d]+ )(?<responseTime>[\d]+)/
| display @timestamp, coalesce(duration, responseTime) as response_time
| sort @timestamp desc
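For readers unfamiliar with coalesce, it returns the first non-null value from its arguments. A rough Python equivalent, applied to hypothetical rows where only one of the two parsed fields is populated, might look like this:

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical parsed rows: each log line matched only one of the patterns.
rows = [
    {"duration": "120", "responseTime": None},
    {"duration": None, "responseTime": "87"},
    {"duration": None, "responseTime": None},
]

response_times = [coalesce(r["duration"], r["responseTime"]) for r in rows]
print(response_times)
```

The third row shows what happens when neither pattern matched: the merged field stays empty.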

How to search any string regular expression in AWS Log Insights?

I have this message filter:
fields @timestamp, @message
| sort @timestamp desc
| filter @message ~= 'simple query'
| limit 20
What query should I use for searching results with messages:
simple query
simple query 1
simple query 2
simple query error
simple query etc...
Big thx!
Two options that you have are the strcontains and like methods:
strcontains:
fields @timestamp, @message
| filter strcontains(@message, "simple query")
| sort @timestamp desc
| limit 20
like:
fields @timestamp, @message
| filter @message like /simple query/
| sort @timestamp desc
| limit 20
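Both options are substring-style matches, so each one should pick up every message in the question. As a sanity check, here is a Python analogy (not the query engine itself) that uses a plain substring test in place of strcontains and re.search in place of like with a regex; the last message is a hypothetical non-match added for contrast.

```python
import re

messages = [
    "simple query",
    "simple query 1",
    "simple query 2",
    "simple query error",
    "unrelated message",  # hypothetical non-matching line
]

# Analogy for: filter strcontains(@message, "simple query")
by_substring = [m for m in messages if "simple query" in m]

# Analogy for: filter @message like /simple query/
pattern = re.compile(r"simple query")
by_regex = [m for m in messages if pattern.search(m)]

print(by_substring == by_regex)
```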

CloudWatch Logs Insights query, combining two queries

I currently use two different CloudWatch Logs Insights queries: one to get the total request count and the other to get the total error count. Below are the queries:
Total count:
fields @timestamp, @message
| filter @message like /reply.*MyAPI/
| parse @message '"reqID":*' as reqID
| stats count_distinct(reqID) as request_count by bin(1h) as hour
Error count:
fields @timestamp, @message
| filter @message like /reply.*MyAPI.*Exception/
| parse @message '"reqID":*' as reqID
| stats count_distinct(reqID) as request_count by bin(1h) as hour
However I would like to calculate both total request counts and error request count in each bin and calculate error rates for each bin (error count/total request count) if possible with a single query. How would I go about this?
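If a single query turns out to be awkward, one fallback is to keep the two queries and join their per-hour results on the client. This is a minimal Python sketch of that join, assuming each result set has already been reduced to a dict of hour to count; the function name and the sample numbers are hypothetical.

```python
def error_rates(total_by_hour, errors_by_hour):
    """Return error_count / request_count for every hour with requests."""
    rates = {}
    for hour, total in total_by_hour.items():
        errors = errors_by_hour.get(hour, 0)  # no error row means zero errors
        rates[hour] = errors / total if total else 0.0
    return rates


# Hypothetical per-bin counts from the two queries.
total_by_hour = {"2024-01-01 00:00": 200, "2024-01-01 01:00": 50}
errors_by_hour = {"2024-01-01 00:00": 50}

rates = error_rates(total_by_hour, errors_by_hour)
print(rates)
```

Hours that appear only in the total query get a rate of 0.0, since the error query returns no row for bins without exceptions.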

Parse amazon cloudwatch RDS audit log for user

I am working with RDS Audit logs and trying to parse out the username with a log query. The data in the audit for the @message column looks like this:
1234567890,rds-instance-name,rdsadmin,localhost,123,0,CONNECT,,,0
I would like to aggregate the counts for the various entries in the logs but I don't know how to parse the username out of the @message column. In the example above the username is rdsadmin.
Here is the query I have so far:
fields @timestamp, @message
| filter @message like /(?i)(connect)/
| parse @message /(?<@ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
| stats count() AS counter by @user, @ip
| sort by @user desc, @counter desc
| limit 50
Would a regex be able to parse the third value in the comma separated string?
This appears to be working, though it may not be the best way:
fields @timestamp, @message
| filter @message like /(?i)(CONNECT)/
| parse @message ',*,*,' as @instance, @user
| parse @message /(?<@ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
| stats count() AS counter by @user, @ip
| sort by @user desc, @counter desc
| limit 50
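On the question of whether a regex can pull out the third comma-separated value: yes, by skipping two comma-delimited fields and capturing the next one. Here is a Python sketch against the sample line from the question. Note that Python spells named groups as (?P<name>...), whereas the queries above use (?<name>...); the pattern itself is an illustration and has not been run in Logs Insights.

```python
import re

# Skip the first two comma-delimited fields, then capture the third.
THIRD_FIELD = re.compile(r"^[^,]*,[^,]*,(?P<user>[^,]*),")

line = "1234567890,rds-instance-name,rdsadmin,localhost,123,0,CONNECT,,,0"

match = THIRD_FIELD.match(line)
user = match.group("user") if match else None
print(user)
```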

How to query AWS CloudWatch logs using AWS CloudWatch Insights?

I have a lot of AWS Lambda logs which I need to query to find the relevant log stream name,
I am logging a particular string in the logs,
Which I need to do a like or exact query on.
The log format is something like this -
Request ID => 572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd
I am able to query the logs without the UUID string -
But if I mention the UUID in the query, it does not show results -
Queries used -
fields @timestamp, @message
| filter @message like /Request ID =>/
| sort @timestamp desc
| limit 20
fields @timestamp, @message
| filter @message like /Request ID => 572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd/
| sort @timestamp desc
| limit 20
Have you tried adding an additional filter on the message field to your first query to further narrow your results?
fields @timestamp, @message
| filter @message like /Request ID =>/
| filter @message like /572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd/
| sort @timestamp desc
| limit 20
Alternatively, if all of your logs follow the same format, you could use the parse keyword to split out your UUID field and search on it with something like:
fields @timestamp, @message
| parse @message "* * Request ID => *" as datetime, someid, requestuuid
| filter requestuuid like /572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd/
| sort @timestamp desc
| limit 20
Also try widening your relative time range at the top right of the query, just in case the request you're looking for has dropped outside of the 1hr range since attempting the first query.
Instead of using two like filters as in the accepted answer, I would suggest using the in operator, as follows. This way your query is shorter and cleaner.
fields @timestamp, @message
| filter @message in ["Request ID =>", "572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd"]
| sort @timestamp desc
| limit 20
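It is worth testing this before relying on it. The Logs Insights documentation describes in as a set-membership test, and if it compares the whole field value against each array element, it is not equivalent to two like filters. The Python analogy below shows the difference between the two predicates; it is an assumption about the semantics, not output from the query engine.

```python
message = "Request ID => 572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd"
needles = ["Request ID =>", "572bf6d2-e3ff-45dc-bf7d-c9c858dd5ccd"]

# Analogy for two chained `like` filters: every needle is a substring.
matches_all_substrings = all(needle in message for needle in needles)

# Analogy for set membership: the whole message equals one of the elements.
is_member = message in needles

print(matches_all_substrings, is_member)
```

Under that reading, the in version would only match log lines whose entire message is exactly one of the listed strings.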