CloudWatch Filter Against OpenSearch Logs - amazon-web-services

I followed the instructions from the documentation, but they did not help in my scenario.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html#extract-log-event-values
I am able to filter JSON values as well as column data, but I am unable to filter on, for example, the took_millis[19] value from the log. I have tried multiple filters such as [,,,,,,took_millis >= 100,...] and [,,,,,,f7=took, milliseconds>=100,...], but no luck so far.
I want to filter slow log queries that are taking more than 100ms.
Example log data for Elasticsearch slow queries is shown below. Please have a look and share the filter pattern for CloudWatch events.
[2021-11-22T01:25:17,133][WARN ][index.search.slowlog.query] [319eDpW] [locations][1] took[19.3ms], took_millis[19], types[data_en], stats[], search_type[QUERY_THEN_FETCH], total_shards[6], source[...]
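A possible workaround, since the space-delimited filter pattern syntax does not seem to allow a numeric comparison on a value embedded inside took_millis[...]: run a CloudWatch Logs Insights query that parses the number out and filters on it. The sketch below uses boto3; the log group name is a placeholder, and this produces an ad hoc query result rather than the metric filter pattern asked for.

import time
import boto3

logs = boto3.client("logs")

# Logs Insights query: extract the value inside took_millis[...] and filter
# numerically. This is Logs Insights syntax, not metric filter pattern syntax.
query = """
fields @timestamp, @message
| parse @message "took_millis[*]" as took_millis
| filter took_millis >= 100
| sort @timestamp desc
"""

# The log group name is a placeholder -- use your OpenSearch/Elasticsearch
# search slow log group.
start = logs.start_query(
    logGroupName="/aws/aes/domains/my-domain/search-slow-logs",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)

while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({f["field"]: f["value"] for f in row})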

Related

Item Duration in Cache

I am trying to create a metric to measure the amount of time that an item has been in a cache when using ElastiCache. There does not seem to be any built-in metric for this in CloudWatch, and I have struggled to run a query in Logs Insights to obtain this information.
I have tried running a query in Logs Insights to create this metric, but it requires matching on an ID, and the query language used in AWS does not seem to support these kinds of conditional queries, so I am unsure how to solve this problem.

GCP log explorer filter for list item count more than 1

I am trying to write a filter in the GCP Log Explorer that can look at the count of values of an attribute.
Example:
I am trying to find logs like the one below, which have two items in the "referencedTables" attribute.
GCP Log Explorer Screenshot
I have tried the options below, which don't work:
protoPayload.metadata.jobChange.job.jobStats.queryStats.referencedTables.*.count>1
protoPayload.metadata.jobChange.job.jobStats.queryStats.referencedTables.count>1
I also tried a regex looking for the "tables" keyword occurring twice:
protoPayload.metadata.jobChange.job.jobStats.queryStats.referencedTable=~"(\tables+::\tables+))"
I also tried a regex querying the second item, which would imply there is more than one item:
protoPayload.metadata.jobChange.job.jobStats.queryStats.referencedTables1=~"^[A-Za-z0-9_.]+$"
Note that these are BigQuery audit logs, which are written to the GCP logging service when you run "insert into.. select" type queries in BigQuery.
I think you can't use logging filters to filter across log entries, only within a single log entry.
One solution to your problem is log-based metrics, where you'd create a metric by extracting values from logs, but you'd then have to use MQL to query (e.g. count) the metric.
A simpler (albeit ad hoc) solution is to use gcloud logging read to --filter the logs (possibly --format the results as JSON for easier processing) and then pipe the results into a tool like jq, where you can count them.
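In case it is useful, here is a rough Python sketch of that ad hoc approach using the google-cloud-logging client instead of the gcloud/jq pipeline: filter server-side as far as the filter language allows, then count the entries with more than one referenced table client-side. The project id and the payload structure are assumptions based on the paths in the question.

from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client(project="my-project")  # project id is a placeholder

# Server-side filter: only entries that have the referencedTables field at all.
# The "more than one table" condition has to be checked client-side.
log_filter = (
    'protoPayload.metadata.jobChange.job.jobStats.queryStats.referencedTables:*'
)

count = 0
for entry in client.list_entries(filter_=log_filter, page_size=1000):
    payload = entry.payload if isinstance(entry.payload, dict) else {}
    tables = (
        payload.get("metadata", {})
        .get("jobChange", {})
        .get("job", {})
        .get("jobStats", {})
        .get("queryStats", {})
        .get("referencedTables", [])
    )
    if len(tables) > 1:
        count += 1

print("entries with more than one referenced table:", count)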

Are there ways to find amount of data queried per lambda statement for AWS redshift?

I am trying to find the amount of data queried per statement from AWS Lambda on Redshift, but all I can find is the amount of data queried per query ID. There are multiple Lambdas that I am running, but I can't seem to relate the Lambdas to the query IDs.
I tried looking through the documentation on AWS Redshift system views, but there don't seem to be any tables that contain these values.
So there are a few ways to do this. First off, the Lambda can find its session id with PG_BACKEND_PID(). This can be logged from the Lambda so that you can report on all statements from that session. Or you can add a unique comment to all the queries coming from the Lambda and search for it in svl_statementtext. Or you can do both. Once you have the query id and session id, you can look at the query statistics (SVL_QUERY_REPORT or other catalog tables).
Be aware that query ids and session ids repeat over time, so also check the date to make sure you are not looking at a query from some time ago.
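To make the comment-tagging idea concrete, here is a hedged Python sketch (psycopg2 and all object names below are placeholders; the system views are the ones mentioned above):

import psycopg2  # any PostgreSQL-compatible driver works against Redshift

# Connection parameters are placeholders.
conn = psycopg2.connect(
    host="my-cluster.xxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="***",
)
cur = conn.cursor()

# 1) The Lambda can record its own session id.
cur.execute("SELECT pg_backend_pid()")
session_id = cur.fetchone()[0]
print("session id:", session_id)

# 2) Tag every statement issued by this Lambda with a unique comment.
tag = "lambda:my-ingest-function"  # hypothetical tag
cur.execute(f"/* {tag} */ SELECT count(*) FROM my_table")

# 3) Later, find the tagged statements in svl_statementtext (long statements
#    are split into 200-character chunks across rows), then join xid/pid back
#    to stl_query for the query id and look up SVL_QUERY_REPORT.
cur.execute(
    "SELECT xid, pid, starttime, text FROM svl_statementtext "
    "WHERE text LIKE %s AND starttime > dateadd(day, -1, getdate()) "
    "ORDER BY starttime DESC",
    (f"%{tag}%",),
)
for xid, pid, starttime, text in cur.fetchall():
    print(xid, pid, starttime, text[:80])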

Count number of GCP log entries during a specified time

Is it possible to count the number of occurrences of a specific log message over a specific period of time from GCP Stackdriver Logging? That is, to answer the question "How many times did this event occur during this time period?" Basically, I would like the integral of the curve in the chart below.
It doesn't have to be a moving window; this time it's more of a one-time task. A count aggregator or similar on the advanced log query would also work, if such a thing were available.
The query looks like this:
(resource.type="container"
logName="projects/xyz-142842/logs/drs"
"Publish Message for updated entity"
) AND (timestamp>="2018-04-25T06:20:53Z" timestamp<="2018-04-26T06:20:53Z")
My log-based metric for the graph above looks like this:
My dashboard is set up like this:
I ended up building stacked bars.
With the correct zoom level I can sum up the number of occurrences easily enough. It would have been a nice feature to get the count directly from a graph (the integral), but this works for now.
There are multiple ways to do this; the two that I have seen actually working and that can apply to your situation are the following:
Making use of Logs-based Metrics. They can, for example, record the number of log entries containing particular error messages, or they can extract latency information reported in log entries.
Stackdriver Logging logs-based metrics can be one of two metric types: counter or distribution. [...] Counter metrics count the number of log entries matching an advanced logs filter. [...] Distribution metrics accumulate numeric data from log entries matching a filter.
I would advise you to go through the documentation to check whether this feature completely covers your use case.
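If you want to create such a counter metric programmatically rather than in the console, a rough sketch could look like the following (assuming the google-cloud-logging client's metric helper; the metric name is made up, the filter is the one from the question, and note that logs-based metrics only count entries ingested after the metric is created):

from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client()

# Filter copied from the question, without the timestamp range: the metric
# counts matching entries from the moment it is created onward.
log_filter = (
    'resource.type="container" '
    'logName="projects/xyz-142842/logs/drs" '
    '"Publish Message for updated entity"'
)

metric = client.metric(
    "publish_message_count",  # hypothetical metric name
    filter_=log_filter,
    description="Count of 'Publish Message for updated entity' entries",
)
if not metric.exists():
    metric.create()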
You can export your logs to BigQuery; once you have them there, you can make use of classical tools like GROUP BY, SELECT and everything else that BigQuery offers.
Here you can find a very minimal step-by-step guide on how to export the logs and how to analyze audit logs using BigQuery, but I am sure you can find many more resources online.
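Once exported, the count becomes a plain SQL aggregation. Here is a sketch with the BigQuery Python client; the dataset/table name is invented and depends on your sink configuration, and the payload column may be textPayload or jsonPayload depending on the log:

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Table name is hypothetical: sinks to BigQuery create tables named after the
# log, e.g. <dataset>.<log_id>_<YYYYMMDD> for date-sharded tables.
sql = """
SELECT COUNT(*) AS occurrences
FROM `xyz-142842.exported_logs.drs_20180425`
WHERE resource.type = 'container'
  AND textPayload LIKE '%Publish Message for updated entity%'
  AND timestamp BETWEEN '2018-04-25T06:20:53Z' AND '2018-04-26T06:20:53Z'
"""

for row in client.query(sql).result():
    print(row.occurrences)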
The products and the approaches are really different; I would say that BigQuery is more flexible, but also more complex to configure and to use properly. If you find a third, better way, please update your question with that information.
First you have to create a metric:
Go to Logs Explorer.
Type your query.
Go to Actions >> Create Metric.
In the Monitoring dashboard:
Create a chart.
Select the resource and metric.
Go to "Advanced" and provide the details as given below:
Preprocessing step : Rate
Alignment function : count
Alignment period : 1
Alignment unit : minutes
Group by : log
Group by function : count
This will give you the visualisation as a bar chart with the count of the desired events.
There is one more option.
You can read your custom metric using the Stackdriver Monitoring API ( https://cloud.google.com/monitoring/api/v3/ ) and process it in a script with whatever aggregation you need.
If you are working with Python, you may look into the gcloud Python library: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/monitoring
It will be a very simple script, and you can stream the results of the calculation into a BigQuery table and use it in your dashboard.
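As a rough sketch of that idea with the google-cloud-monitoring client (the metric type below is hypothetical; user-defined logs-based metrics live under logging.googleapis.com/user/...), this reads the counter over the last 24 hours and sums it:

import time
from google.cloud import monitoring_v3  # pip install google-cloud-monitoring

client = monitoring_v3.MetricServiceClient()
project_name = "projects/xyz-142842"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 24 * 3600},
    }
)

# Align the counter into a single 24h bucket and sum across time series.
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 24 * 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
        "cross_series_reducer": monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
    }
)

results = client.list_time_series(
    request={
        "name": project_name,
        # Hypothetical logs-based counter metric.
        "filter": 'metric.type="logging.googleapis.com/user/publish_message_count"',
        "interval": interval,
        "aggregation": aggregation,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

total = sum(point.value.int64_value for ts in results for point in ts.points)
print("occurrences in the last 24h:", total)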
With PacketAI, you can send logs of arbitrary formats, including from GCP. Then the logs dashboard will automatically parse and group them into patterns, as shown in this video: https://streamable.com/n50kr8
Counts and trends of different log patterns are also displayed.
Disclaimer: I work for PacketAI

Elasticsearch Sorting "breaks" filter

I am trying to split logs per customer. I think I understand the Query DSL of Elasticsearch.
For filtering the logs I use a domain name as the filter parameter.
For this question we will call them:
bh250.example.com
bh500.example.com
Now I have managed to filter the logs so that the owner of the domain bh250.example.com can only see his log files.
But when I want to sort them based on the timestamp, it "breaks" the filter and shows both the bh250 and bh500 logs.
q =Q("match", domainname=domein)
q1 =Q("match", status="404")
search= Search(using=dev_client, index="access-logs").query(q).filter.("term" , status="200").sort("-#timestamp")[0:100]
Now, without the sort function it shows the correct logs, but in a different order. With the sort function I get both records (bh250 and bh500) on screen.
I have also looked at whether mappings could be the issue, but I'm not quite sure why the sort function breaks my "filter".
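For reference, here is a minimal elasticsearch-dsl sketch of the same search written as an explicit bool query, with the match clause, the term filter, and the sort chained on one Search object. This is not a fix for the behaviour described above, just the explicit form for comparison; the connection details are placeholders, and the sort field is written as @timestamp (the usual Logstash field name) rather than #timestamp.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Q, Search

dev_client = Elasticsearch("http://localhost:9200")  # connection details are placeholders

# Match on the customer's domain and keep the status filter in filter context.
q = Q(
    "bool",
    must=[Q("match", domainname="bh250.example.com")],
    filter=[Q("term", status="200")],
)

search = (
    Search(using=dev_client, index="access-logs")
    .query(q)
    .sort("-@timestamp")[0:100]
)

response = search.execute()
for hit in response:
    print(hit.domainname, hit.status)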