I'm trying to create a CloudWatch Logs Insights query for Amazon Connect that will give me call counts by date. I'm able to get the number of log messages by date, but I need to count only unique ContactIds. My current query counts many duplicate ContactIds, because each time Connect logs to CloudWatch it uses the ContactId to tie together all of the events related to a contact. Is there a way to modify this query to show only the count of unique ContactIds?
filter @message like /ContactId/
| stats count(*) as callCount by toMillis(datefloor(@timestamp, 1d))
| sort callCount desc
Embarrassingly enough, almost immediately after posting this, I found my answer: count_distinct() gets me what I needed.
filter @message like /ContactId/
| stats count_distinct(ContactId) as callCount by toMillis(datefloor(@timestamp, 1d))
| sort callCount desc
I'm running a CloudWatch Logs Insights query on a single log stream that corresponds to a single Python AWS Lambda function. This function logs a unique line containing the S3 key it is processing, once at the beginning of the invocation. The only case where it won't log this line is if it fails before it even reads the event.
The query is:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
by datefloor(@timestamp, 1d) as @_datefloor
| sort @_datefloor asc
Both regular expressions in this query parse the full key of the S3 object being processed. In this particular case, and in general, my understanding is that count(...) of any quantity minus count_distinct(...) of the same quantity should always be greater than or equal to zero.
For several of the days in the results, it is a negative number.
I thought I might be misunderstanding the correct usage of datefloor(), so I tried running the following query:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
The result was -20,347.
At this point the only scenarios I can see are:
1. Something is wrong with the code executing the query.
2. I'm misunderstanding this tool.
I have discovered that the count_distinct function in CloudWatch Logs Insights queries doesn't really return a distinct count! Per the documentation:
Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation.
Apparently I can't just assume that a function returns an accurate result.
The documentation page.
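If an exact count is needed, one possible workaround is chaining two stats commands (a sketch, assuming your Region's Logs Insights supports multiple stats commands per query, a relatively recent addition): the first stats collapses the rows to one per unique key, and the second counts those rows exactly. The shortened regex is a stand-in for the full pattern above:
parse @message /(?<@unique_key>Processing key: .*)/
| filter ispresent(@unique_key)
# one output row per unique key
| stats count(*) by @unique_key
# exact count of those rows
| stats count(*) as exact_distinct_keys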
I have two different log groups and am retrieving details from them in different ways.
Once with:
fields @timestamp, message.event.detail.myIdentifier as placeA.myIdentifier
And once with:
fields @timestamp
| parse message.event.Records.0.body '"myIdentifier ": "*"' as placeB.myIdentifier
Is there a way to join these log entries on myIdentifier?
Ideally, the goal is to compare the timestamps from each place per identifier.
So, e.g.:
myIdentifier | placeA-timestamp | placeB-timestamp
First        | 12:00:00         | 13:00:00
Second       | 12:05:00         | 13:05:00
Is there a way to achieve this with CloudWatch Logs Insights?
Thanks for your help!
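Logs Insights has no join command, but a single query can span multiple log groups, so here is a hedged sketch of one workaround (assuming the field names from the question, and that each identifier is logged exactly once in each place, with placeA always logging before placeB): normalize the identifier from both formats, then aggregate per identifier.
fields @timestamp
| parse @message '"myIdentifier ": "*"' as parsedId
# coalesce picks whichever form of the identifier is present in a given event
| fields coalesce(message.event.detail.myIdentifier, parsedId) as myId
| filter ispresent(myId)
| stats earliest(@timestamp) as placeA_timestamp, latest(@timestamp) as placeB_timestamp by myId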
Hope you're well. I've been trying to put together a CloudWatch Logs Insights query that returns the first event for each ContactId.
I thought I'd add a count stat and then exclude all events with a count of 2 or greater. I'm clearly not doing something right, though. Although I am getting the count, the count seems to exclude other information from the query: it returns almost no information about the events it is counting. I'd like the count to be added and also to INCLUDE the information from the query.
Here is the query I am using:
fields @timestamp, @message
| sort number asc
| stats count(ContactId) as number by ContactId
| filter ContactFlowModuleType = 'SetLoggingBehavior' and Parameters.LoggingBehavior = 'Enable'
| fields @message
| display Results, ContactId, @timestamp, ContactFlowModuleType, number
With this query, it says that 'time stamp' is invalid. I believe the stats clause has something to do with it.
I'm looking to determine the sequence of events on a per-ContactId basis, so that I can exclude all logged events after the initial event. For now, I'd just like to see a count per ContactId, so I can perform the exclusion myself.
Steve
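One thing worth noting: stats returns only the aggregated values and the grouping keys, which is why the other fields disappear from the results. Here is a minimal sketch of getting the first-event timestamp plus a count per ContactId (assuming the field names from the query above):
filter ContactFlowModuleType = 'SetLoggingBehavior' and Parameters.LoggingBehavior = 'Enable'
# earliest() returns the @timestamp of the earliest matching event per ContactId
| stats earliest(@timestamp) as firstEvent, count(*) as number by ContactId
| sort firstEvent asc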
I am trying to generate a graph that will display the success/failure rate of an operation. In my application I am pushing log events in the following format:
[loggingType] loggingMessage.
I want to create a pie chart that shows the ratio of success to failure, but it's not working. I am running the following:
filter @logStream="RunLogs"
| parse @message "[*] *" as loggingType, loggingMessage
| filter loggingType in ["pass","fail"]
| stats count(loggingType="pass")/count(loggingType="fail") as ratio by bin(12w)
It seems like the condition inside count does not work and grabs everything. It returns 1 every time :(
I came across a similar scenario and, weirdly enough, if you change the query to use sum instead of count, it works. I'm not sure why AWS interprets it this way, but presumably count() counts every row where the expression is non-null, and a comparison like loggingType="pass" evaluates to 1 or 0 on every row, so both counts cover everything; sum() adds up those 1/0 values, so only sum() counts just the matches.
filter @logStream="RunLogs"
| parse @message "[*] *" as loggingType, loggingMessage
| filter loggingType in ["pass","fail"]
| stats sum(loggingType="pass")/sum(loggingType="fail") as ratio by bin(12w)
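For a pie chart you generally want the pass and fail counts as two separate values rather than a single ratio; the same sum trick yields both (a sketch based on the query above):
filter @logStream="RunLogs"
| parse @message "[*] *" as loggingType, loggingMessage
| filter loggingType in ["pass","fail"]
# sum() adds the 1/0 result of each comparison, giving a per-type count
| stats sum(loggingType="pass") as passCount, sum(loggingType="fail") as failCount by bin(12w)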
I'm trying to create an AWS dashboard visualization that displays the counts of cache hits vs. misses over a period of time. To do this, I'm setting up a log-type dashboard widget with an Insights query on the log. To keep things as simple as possible, my log is either:
{"cache.hit": true} or {"cache.hit": false}
I would like my dashboard to track both possibilities on the same graph, but it seems I can't without breaking my log up into distinct rows for these values. For example, if my logs were simply:
{"cache.hit.true": true} or {"cache.hit.false": true}, then I could create two separate graphs to track these values independently in the dashboard, but that's not as clean.
To get them on one dash, I've tried this, but all it does is display the two fields, and the values for both display fields are the same, when they definitely shouldn't be:
fields @timestamp, @message, cache.hit as cache_hits
| filter cache_hits IN [0, 1]
| display cache_hits = 0 as in_cache_false
| display cache_hits = 1 as in_cache_true
| stats count(in_cache_true), count(in_cache_false) by bin(30s)
| sort @timestamp desc
| limit 20
The query below extracts the cache hits and cache misses and then works out the cache hit percentage.
fields @timestamp, @message
| filter @message like /cache.hit/
| fields strcontains(@message, "true") as @CacheHit,
strcontains(@message, "false") as @CacheMiss
| stats sum(@CacheHit) as CacheHits, sum(@CacheMiss) as CacheMisses, sum(@CacheHit) / (sum(@CacheMiss) + sum(@CacheHit)) * 100 as HitPercentage by bin(30s)
| sort @timestamp desc
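If Logs Insights discovers the boolean as a cache.hit field (as the question suggests), a variant that avoids string matching should also work; this is a sketch assuming the field is exposed as 1 for true and 0 for false:
fields @timestamp
| filter ispresent(cache.hit)
# each comparison evaluates to 1 or 0, so sum() counts each outcome separately
| stats sum(cache.hit = 1) as CacheHits, sum(cache.hit = 0) as CacheMisses by bin(30s)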