CloudWatch Logs Insights query [replace UUID with *] - amazon-web-services

I have logs coming into an AWS log group in the messages field, and I want to write a Logs Insights query to get a real-time count of incoming logs.
Each log contains unique UUIDs that need to be replaced by '*' so that similar logs can be grouped together and counted correctly.
Sample logs in the system:
messages (field)
v2/documents/0003cfad-c6ce-46f1-b617-9efd95d79b52
v2/documents/0003cfad-c6ce-46f1-b617-9efd95d79b52/status/004083f4-467e-4d25-9d71-baf7087acb2b/
v2/004083f4-467e-4d25-9d71-baf7087acb2b
v2/documents/004083f4-467e-4d25-9d71-baf7087acb2b/status
v2/0063891d-6822-493e-a650-31cc57989310/create/004083f4-467e-4d25-9d71-baf7087acb2b/
v2/documents/00ee9bb9-e21b-44c7-b437-d0c7dd1057f8
v2/documents/00ee9bb9-e21b-44c7-b437-d0c7dd1057f8/status/00ee9bb9-e21b-44c7-b437-d0c7dd1057f8/
v2/documents/00fcce48-1768-4e89-a58b-e699be061ae4/delete/00ee9bb9-e21b-44c7-b437-d0c7dd1057f8/
Expected result after replacement:
messages (field)
v2/documents/*
v2/documents/*/status/*/
v2/*
v2/documents/*/status
v2/*/create/*/
v2/documents/*
v2/documents/*/status/*/
v2/documents/*/delete/*/
Note: all UUIDs can be unique, but they follow the same pattern:
[varchar(8)-varchar(4)-varchar(4)-varchar(4)-varchar(12)]
I have tried to get the desired result using the query below. It uses the parse function and relies entirely on '/' as the delimiter for creating new columns, but this does not help because the logs do not all follow the same pattern.
PS: each of these is an error log with a 503 status code.
fields @message
| filter @message like '" 503'
| parse @message "* * * * *" as a, b, c, uri, e
| parse uri "/*/*/*/*/*/*/*" as f1, f2, f3, f4, f5, f6, f7
| parse f7 "*?" as f8
| parse f7 "*/" as f9
| stats count(*) as Count by f5 as API, f9 as Call, f8 as call
Thanks in advance!!
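As far as I know, Logs Insights does not have a regex-substitution function that could rewrite the UUIDs inside the query itself, so one workaround is to run the 503 filter in Logs Insights and normalise the UUIDs client-side. A minimal boto3 sketch under that assumption (the log group name and the one-hour time window are placeholders):

import re
import time
from collections import Counter

import boto3

# Standard UUID shape: 8-4-4-4-12 hex characters.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

logs = boto3.client("logs")

# Placeholder log group and time window; adjust to your setup.
query_id = logs.start_query(
    logGroupName="/my/log/group",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString="fields @message | filter @message like '\" 503'",
)["queryId"]

# Poll until the query finishes.
response = logs.get_query_results(queryId=query_id)
while response["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    response = logs.get_query_results(queryId=query_id)

# Replace every UUID with '*' and count the normalised messages.
counts = Counter()
for row in response["results"]:
    for field in row:
        if field["field"] == "@message":
            counts[UUID_RE.sub("*", field["value"])] += 1

for message, count in counts.most_common():
    print(count, message)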

Related

Log Insights regex to extract and group up to some characters of logs and get count

I have filtered my logs to get ERROR entries like so:
filter @message like /ERROR/
My logs look like this:
2023-01-01 06:01:02.010 ERROR <details of the error up to 150 chars>
I wish to extract the output and group similar errors as shown below. I mention up to 50 chars because the details in similar errors match for the first 50 (or so) characters. How can I do this in a Logs Insights query? Or using Python with boto3?
ERROR Type Count
ERROR <details up to 50 chars> 400 <- Error type x
ERROR <details up to 50 chars> 230 <- Error type y
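If the first 50 characters are a reliable grouping key, the substr string function may get you most of the way on the query side; with the boto3 route, the grouping itself is only a few lines of Python once the @message values are in hand. A minimal sketch (the sample messages are made-up placeholders; in practice they would come from get_query_results as in the earlier example):

from collections import Counter

# Made-up placeholder messages; in practice these would be the @message values
# returned by get_query_results for the /ERROR/ filter.
messages = [
    "2023-01-01 06:01:02.010 ERROR database connection refused while writing order 123 to shard A",
    "2023-01-01 06:02:11.330 ERROR database connection refused while writing order 456 to shard B",
    "2023-01-01 06:03:45.120 ERROR timeout while calling downstream payment service for order 789",
]

def error_key(message: str, length: int = 50) -> str:
    """Group key: everything from 'ERROR' onward, truncated to `length` characters."""
    start = message.find("ERROR")
    return message[start:start + length] if start != -1 else message[:length]

counts = Counter(error_key(m) for m in messages)
for key, count in counts.most_common():
    print(f"{count:>6}  {key}")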

Parse lines from messages in a Splunk query to be displayed as a chart on a dashboard

I generate events on multiple computers that list service names that aren't running. I want to make a chart that displays the top offending service names.
I can use the following to get a table for the dashboard:
ComputerName="*.ourDomain.com" sourcetype="WinEventLog:Application" EventCode=7223 SourceName="internalSystem"
| eval Date_Time=strftime(_time, "%Y-%m-%d %H:%M")
| table host, Date_Time, Message, EventCode
Typical Message(s) will contain:
The following services were not running after 5603 seconds and a start command has been sent:
Service1
Service2
The following services were not running after 985 seconds and a start command has been sent:
Service2
Service3
Using regex I can make a named group of everything but the first line with (?<Services>((?<=\n)).*)
However, I don't think this is the right approach, as I don't know how to do an evaluation for the chart with this information.
So in essence, how do I grab and tally service names from messages in Splunk?
Edit 1:
Coming back to this after a few days.
I created a field extraction called "Services" with regex that grabs the contents of each message after the first line.
If I use | stats count BY Services it counts each message as a whole instead of the lines inside. The results look like this:
Service1 Service2 | Count: 1
Service2 Service3 | Count: 1
My intention is to have it treat each line as its own value so the results would look like:
Service1 | Count: 1
Service2 | Count: 2
Service3 | Count: 1
I tried | mvexpand Services but it didn't change the output so I assume I'm either using it improperly or it's not applicable here.
I think you can do it with the stats command.
| stats count by service
will give the number of appearances for each service. You can then choose the bar chart visualization to create a graph.
I ended up using split() and mvexpand to solve this problem.
This is what worked in the end:
My search
| eval events=split(Service, "
")
| mvexpand events
| eval events=replace(events, "[\n\r]", "")
| stats count BY events
I had to add the replace() call because any event with just one service listed was being treated differently from an event with multiple services: after the split on a multi-service event, each service still had a carriage return, hence the replace.
My end result dashboard chart:
For a chart drop-down that is clean:
index="yourIndex" "<searchCriteria>" | stats count(eval(searchmatch("
<searchCriteria>"))) as TotalCount
count(eval(searchmatch("search1"))) as Name1
count(eval(searchmatch("search2" ))) as Name2
count(eval(searchmatch("search3"))) as Name3
| transpose 5
| rename column as "Name", "row 1" as "Count"
Horizontal table example with percentages:
index=something "Barcode_Fail" OR "Barcode_Success"
| stats count(eval(searchmatch("Barcode_Success"))) as SuccessCount
  count(eval(searchmatch("Barcode_Fail"))) as FailureCount
  count(eval(searchmatch("Barcode_*"))) as Totals
| eval Failure_Rate=FailureCount/Totals
| eval Success_Rate=SuccessCount/Totals

Apache Beam - BigQuery Upsert

I have a Dataflow job which splits a single file into x number of records (tables). These flow into BigQuery with no problem.
What I found, though, was that there was no way to execute another stage in the pipeline after those results.
For example
# Collection1 - filtered on first two characters = 95
collection1 = (
    rows
    | 'Build pCollection1' >> beam.Filter(lambda s: data_ingestion.filterRowCollection(s, '95'))
    | 'p1 Entities to JSON' >> beam.Map(lambda s: data_ingestion.SplitRowDict(s, '95'))
    | 'Load p1 to BIGQUERY' >> beam.io.WriteToBigQuery(
        data_ingestion.spec1,
        schema=parse_table_schema_from_json(data_ingestion.getBqSchema('95')),
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)  # Write to BigQuery
)

# Collection2 - filtered on first two characters = 99
collection2 = (
    rows
    | 'Build pCollection2' >> beam.Filter(lambda s: data_ingestion.filterRowCollection(s, '99'))
    | 'p2 Split Entities to JSON' >> beam.Map(lambda s: data_ingestion.SplitRowDict(s, '99'))
    | 'Load p2 to BIGQUERY' >> beam.io.WriteToBigQuery(
        data_ingestion.spec2,
        schema=parse_table_schema_from_json(data_ingestion.getBqSchema('99')),
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)  # Write to BigQuery
)
Following the above I'd like to run something like the following:
final_output = (
    collection1, collection2
    | 'Log Completion' >> beam.io.WriteToPubSub('<topic>'))
Is there any way to run another part of the pipeline following the upsert to BigQuery, or is this impossible? Thanks in advance.
Technically, there's no way to do exactly what you asked: beam.io.WriteToBigQuery consumes the PCollection, leaving nothing.
However, it's simple to duplicate the input in a ParDo just before you call beam.io.WriteToBigQuery and send copies of your PCollection down each path. See this answer, which references this sample DoFn from the docs.
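As a rough illustration of that approach (this is not the linked answer's exact code; the step names and the Create input are made up), a DoFn with a tagged output can fan the same elements out to the BigQuery write and to a second branch:

import apache_beam as beam

class Duplicate(beam.DoFn):
    """Emit each element twice: on the main output and on a tagged 'copy' output."""
    def process(self, element):
        yield element  # main output, e.g. feeds WriteToBigQuery
        yield beam.pvalue.TaggedOutput('copy', element)  # second branch

with beam.Pipeline() as p:
    rows = p | 'Create' >> beam.Create([{'id': 1}, {'id': 2}])  # placeholder input

    branches = rows | 'Duplicate' >> beam.ParDo(Duplicate()).with_outputs(
        'copy', main='to_bq')

    # branches.to_bq would feed beam.io.WriteToBigQuery(...) as in the question;
    # branches.copy is a second PCollection of the same elements that can be
    # windowed, counted, or otherwise processed to publish a completion signal.
    branches.copy | 'Log copies' >> beam.Map(print)

Note that the copy branch runs in parallel with the BigQuery write rather than strictly after it finishes, which is the limitation described above.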

Oracle window function not working in Snowflake

I am working on an Oracle to Snowflake migration.
While migrating Oracle window functions to Snowflake I get the error below. Could you let me know an alternative way to write this Oracle function in Snowflake?
SELECT
    COL1,
    COL2, ...,
    SUM(SUM(TAB1.COL1)) OVER (PARTITION BY
        TAB1.COL2,
        TAB1.COL3,
        TAB1.COL4,
        TAB1.COL5,
        TAB1.COL6,
        TAB1.COL7,
        TAB1.COL8,
        TAB1.COL9,
        TAB1.COL10,
        ORDER BY MAX(CALENDAR_TAB.DATE_COLUMN) RANGE BETWEEN INTERVAL '21' DAY PRECEDING AND CURRENT ROW)/4 AS COLMN
FROM TAB1, CALENDAR_TAB
JOIN
GROUP BY COL1,
    COL2, ...
Below is the error message:
SQL Error [1003] [42000]: SQL compilation error:
syntax error line 75 at position 60 unexpected 'INTERVAL'.
syntax error line 75 at position 78 unexpected 'PRECEDING'.
Per the documentation for Snowflake, here is the syntax:
https://docs.snowflake.com/en/sql-reference/functions-analytic.html#window-syntax-and-usage
slidingFrame ::=
{
ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND <N> { PRECEDING | FOLLOWING }
| ROWS BETWEEN UNBOUNDED PRECEDING AND <N> { PRECEDING | FOLLOWING }
| ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND UNBOUNDED FOLLOWING
}
It might not like the INTERVAL and the quoted number.
The Window frame document is a good place to start.
If I read the Oracle syntax correctly, the window frame you are using for the MAX is value-based (INTERVAL '21' DAY), which Snowflake does not support; it only supports N-rows-based logic. If you have exactly one row per day, every day, then you can use the row-count logic, but otherwise this is not supported.
That means you have to join back to your own data tables and apply the prior-time filter on the join.

Regex to extract two values from single string in Splunk

I have log statements appearing in Splunk as below.
info Request method=POST, time=100, id=12345
info Response statuscode=200, time=300, id=12345
I'm trying to write a Splunk query that would extract the time parameter from the lines starting with info Request and info Response and basically find the time difference. Is there a way I can do this in a query? I'm able to extract values separately from each statement but not the two values together.
I'm hoping for something like below, but I guess the piping won't work:
... | search log="info Request*" | rex field=log "time=(?<time1>[^\,]+)" | search log="info Response*" | rex field=log "time=(?<time2>[^\,]+)" | table time1, time2
Any help is highly appreciated.
General process:
Extract type into a field
Calculate response and request times
Group by id
Calculate the diff
You may want to use something other than latest() in the stats, but it won't matter if there's only one request/response per id.
| rex field=_raw "info (?<type>\w+).*"
| eval requestTime = if(type="Request",time,NULL)
| eval responseTime = if(type="Response",time,NULL)
| stats latest(requestTime) as requestTime latest(responseTime) as responseTime by id
| eval diff = responseTime - requestTime