GROK LOG Filter / grep specific values - regex

I am a noob at Grok and I need to grep specific things from a log file.
Here is an example of the log:
2021-03-16 12:23:30,717 [ STATUS ] {replicate_changes } Replication status: SRC_SCN 1235720653409 - SRC_TMSTMP 2021-03-16 12:23:27 - STMTS/s 189.18 - TX/s 101.05
From that line I need to grep for:
Timestamp
Value for STMTS/s
Value for TX/s
In regex it would look something like this:
(^\d.+) \[ .+ \].+ SRC_TMSTMP (\d.+) - STMTS\/s (\d.+) - TX\/s (\d.+)
Can anyone help me solve this mystery? Thx in advance!

Note that the original question asks for a timestamp, while the sample regex appears to capture both the (presumably) receipt timestamp and SRC_TMSTMP. The simple grok pattern below captures both and assigns each to its own field:
%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA} SRC_TMSTMP %{TIMESTAMP_ISO8601:source_timestamp} %{GREEDYDATA} STMTS/s %{BASE10NUM:stmts_per_sec:float} %{GREEDYDATA} TX/s %{BASE10NUM:tx_per_sec:float}
This could be further optimized based on additional sample data.
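For readers more comfortable with plain regex, here is a minimal Python sketch of roughly what the grok pattern above does to the sample line. TS and NUM are simplified stand-ins written for this example, not the real definitions of TIMESTAMP_ISO8601 and BASE10NUM, and parse_line is a hypothetical helper name.

```python
import re

LINE = (
    "2021-03-16 12:23:30,717 [ STATUS ] {replicate_changes } "
    "Replication status: SRC_SCN 1235720653409 - "
    "SRC_TMSTMP 2021-03-16 12:23:27 - STMTS/s 189.18 - TX/s 101.05"
)

# Simplified stand-ins (assumptions), much narrower than the real grok patterns.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?"
NUM = r"[+-]?\d+(?:\.\d+)?"

# Each unnamed %{GREEDYDATA} becomes a plain ".*" that captures nothing.
PATTERN = re.compile(
    rf"(?P<timestamp>{TS}) .* SRC_TMSTMP (?P<source_timestamp>{TS}) .* "
    rf"STMTS/s (?P<stmts_per_sec>{NUM}) .* TX/s (?P<tx_per_sec>{NUM})"
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Mirror the ":float" cast from the grok pattern.
    fields["stmts_per_sec"] = float(fields["stmts_per_sec"])
    fields["tx_per_sec"] = float(fields["tx_per_sec"])
    return fields


fields = parse_line(LINE)
print(fields)
```

This is only meant to show the mapping between grok tokens and named groups; in Elasticsearch or Logstash you would use the grok pattern itself.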
General grok syntax and usage is explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
Pre-defined grok patterns can be found here:
https://github.com/elastic/elasticsearch/blob/7.11/libs/grok/src/main/resources/patterns/grok-patterns
In short, grok pattern matching follows the format:
%{DEFINED_GROK_PATTERN:field_name:optional_cast_type}
Note that if no field_name is specified, the captured value is not assigned to a field. This is essentially the same as using a regex pattern without parentheses, or a non-capturing group.
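To make the %{PATTERN:field:type} format concrete, here is a toy Python sketch that expands such tokens into an ordinary regex. Everything in it is an assumption made for illustration: the PATTERNS table holds a handful of simplified definitions, and compile_grok and grok_match are hypothetical names, not part of any Elastic library. Real grok handles far more (nested patterns, richer field names, more types).

```python
import re

# Tiny, simplified pattern library (assumption; not the real grok definitions).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "BASE10NUM": r"[+-]?\d+(?:\.\d+)?",
    "WORD": r"\w+",
    "GREEDYDATA": r".*",
}
CASTS = {"int": int, "float": float}

# Matches %{NAME}, %{NAME:field} and %{NAME:field:type}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(expr):
    """Expand grok-style tokens in expr into a compiled regex plus a cast table."""
    casts = {}

    def expand(m):
        name, field, cast = m.group(1), m.group(2), m.group(3)
        body = PATTERNS[name]
        if field is None:
            # No field name: match the text but do not capture it.
            return f"(?:{body})"
        if cast:
            casts[field] = CASTS[cast]
        return f"(?P<{field}>{body})"

    return re.compile(TOKEN.sub(expand, expr)), casts


def grok_match(expr, text):
    """Return a dict of named (and cast) fields, or None if text does not match."""
    regex, casts = compile_grok(expr)
    m = regex.search(text)
    if m is None:
        return None
    result = {}
    for field, value in m.groupdict().items():
        cast = casts.get(field)
        result[field] = cast(value) if cast and value is not None else value
    return result


rates = grok_match(
    "STMTS/s %{BASE10NUM:stmts_per_sec:float} %{GREEDYDATA} TX/s %{BASE10NUM:tx_per_sec:float}",
    "STMTS/s 189.18 - TX/s 101.05",
)
print(rates)
```

Note how the unnamed %{GREEDYDATA} contributes to the match but leaves no key in the result.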
Usage of this pattern depends on where you intend to use it - Elasticsearch or Logstash (based on the question tags). If Elasticsearch, see the first link - if using Logstash, see the following: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
A useful tool in Kibana is the Grok Debugger, which can be found under Dev Tools.

Regex search for UUID based uri in splunk

I am trying to search for all events which contain a UUID as part of a request url. Here is my query:
.... | regex requestURI=*/employee/[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}*
It gives error as:
Unknown search command '0'
What's the mistake I am making?
Try wrapping the pattern in double quotes (") instead of asterisks (*):
.... | regex requestURI="/employee/[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}"
As for the regex itself, there is already a question covering that. Check the answers here: Searching for UUIDs in text with regex
Important points such as case sensitivity are discussed there.
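If you want to sanity-check the pattern outside Splunk, a small Python sketch like the following can help. The sample URIs are made up for this example, and Python's re module is not the engine Splunk uses, so treat it as a rough check only. The IGNORECASE flag is an addition here, to cover the case-sensitivity point.

```python
import re

# Same pattern as in the quoted search above; IGNORECASE is an assumption
# added so that upper-case UUIDs are accepted as well.
UUID_URI = re.compile(
    r"/employee/[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}",
    re.IGNORECASE,
)


def has_uuid(request_uri):
    """Return True if the URI contains /employee/<uuid>."""
    return UUID_URI.search(request_uri) is not None


# Hypothetical request URIs, for illustration only.
samples = [
    "/api/employee/123e4567-e89b-12d3-a456-426614174000/details",
    "/api/employee/123E4567-E89B-12D3-A456-426614174000",
    "/api/employee/42",
]
for uri in samples:
    print(uri, has_uuid(uri))
```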

GA Regex Filter - Filter PPC traffic and replace it with "PPC"

1) www.mysite.site/product/brand?card_type=all
2) www.mysite.site/product/brand?card_type=all&cp=randomID&keyword=randomKeyword&network=randomNetwork&v3=sometype&v4=MM
So I have these 2 types of URLs being reported in my Analytics:
Traffic that reached that page organically
Traffic that reached that page via Paid Traffic
I basically need to find all the links that have a "&" followed by (cp|keyword|v1|v2|v3|v5) after the value for “card_type” and replace that part with “ppctraffic”, so ideally I would have:
www.mysite.site/product/brand?card_type=all
www.mysite.site/product/brand/ppctraffic or just mysite.site/ppctraffic
What I attempted:
Search String
Request URI
^(https?:\/\/\S+\/[^?]*)(.*?)&(cp|keyword|v1|v2|v3|v5)
Replace String:
/ppctraffic
(I’ve also tried $1/ppctraffic and $2/ppctraffic)
When testing the regex online it seems to work, so I'm not sure what I'm doing wrong.
Any help is deeply appreciated.
One way is to capture everything before the question mark in a group using [^?]+ (this covers the path up to and including /brand), then match ?card_type=all& followed by one of the PPC parameters and any characters until the end of the string.
As your links do not start with https://, you could make that part optional using (?:https?:\/\/)?.
^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$
Then in the replacement use $1/ppctraffic
Regex demo
const pattern = /^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$/;
[
  // PPC parameter (cp) directly after card_type=all: rewritten to .../brand/ppctraffic
  "www.mysite.site/product/brand?card_type=all&cp=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID",
  // No extra parameters: left unchanged
  "www.mysite.site/product/brand?card_type=all",
  // First extra parameter (aa) is not a PPC parameter: left unchanged
  "www.mysite.site/product/brand?card_type=all&aa=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID"
].forEach(s => console.log(s.replace(pattern, "$1/ppctraffic")));

Using a wildcard in Regex at the end of a URL in GA

I'm a newbie at Regex. I'm trying to get a report in GA that returns all pages after a certain point in the URL.
For example:
http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/
I want to see all dates so: http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/*
Here's what I've got so far in my regex:
^https:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(?=(?:\/.*)?$)
You can try this:
https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox[\w/_-]*
The RE2 regex engine used by GA does not allow lookarounds (not even lookaheads) in the pattern, and you have defined one: (?=(?:\/.*)?$).
If you need all links having www.essentialibiza.com/ibiza-club-tickets/carl-cox/, you can use a simple regex:
www\.essentialibiza\.com/ibiza-club-tickets/carl-cox/
If you want to specify the protocol:
https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)
The ? makes the s optional (1 or 0 occurrences), and (/|$) matches either a / directly after cox or the end of the URL right after cox (replace this group with a plain / if you only want URLs that have a / after cox).
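As a quick check of the lookaround-free version, here is a small Python sketch. Python's re module is not RE2, so this only confirms what the pattern matches, not how GA treats it. The helper name is_carl_cox_page and every sample URL except the one from the question are made up for this example.

```python
import re

# Lookaround-free pattern; (/|$) requires either a slash or the end of the
# string directly after "carl-cox".
PATTERN = re.compile(
    r"https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)"
)


def is_carl_cox_page(url):
    """Return True if the URL is the carl-cox page or anything below it."""
    return PATTERN.search(url) is not None


samples = [
    "http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/",
    "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox",
    "http://www.essentialibiza.com/ibiza-club-tickets/carl-coxon/",  # hypothetical
    "http://www.essentialibiza.com/ibiza-club-tickets/",
]
for url in samples:
    print(url, is_carl_cox_page(url))
```

The third sample shows why the (/|$) group is worth keeping: without it, any path that merely starts with carl-cox would match too.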

What is the regexp pattern for multiline (logstash)

Currently I have:
multiline {
  type => "tomcat"
  pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)|(---)"
  what => "previous"
}
and this is part of my log:
TP-xxxxxxxxxxxxxxxxxxxxxxxx: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
--- The error occurred in xxxxxxxxx.
--- The error occurred xxxxxxxxxx.
My pattern doesn't work here, probably because I added the (---) at the end. What is the correct regexp to also match the --- lines?
Thanks
You'll want to account for the other characters on the line as well:
(^---.*$)
I have put your regex and text into these online regex buddies and tried the suggestion of Eric:
http://www.regextester.com/
http://www.regexr.com/
Sometimes these online buddies really help to clear the mind. This picture shows what is recognized:
If I were stuck on this, I wouldn't focus on the regex itself any further. Rather I'd check these points:
As there are different regex dialects, which dialect does logstash use? What does that mean for my pattern?
Are there any logstash specific modifiers that are not set and need to be set?
As Ben mentioned, there are further filter tools. Would it help to use grok instead?
If every log event starts with a timestamp or a specific word (for example, if all events in your logs start with TP), you can use that as the filter pattern.
multiline {
  pattern => "^TP"
  what => "previous"
  negate => true
}
With this filter you can merge your multiline logs easily, with no need for complex patterns.
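To see what negate => true together with what => "previous" amounts to, here is a rough Python simulation. It is a sketch of the idea only, not how Logstash implements it; merge_multiline is a hypothetical helper and the sample lines are invented for this example.

```python
import re


def merge_multiline(lines, pattern, negate=False):
    """Join lines into events, mimicking what => "previous".

    A line is appended to the previous event when it matches `pattern`,
    or, if negate is True, when it does NOT match.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = regex.search(line) is not None
        belongs_to_previous = matched != negate
        if belongs_to_previous and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


# Invented sample lines; every new event starts with "TP".
LOG = [
    "TP-Processor1: first error",
    "    at com.example.Foo",
    "Caused by: something",
    "--- The error occurred in foo.xml.",
    "TP-Processor2: second error",
    "    at com.example.Bar",
]

events = merge_multiline(LOG, r"^TP", negate=True)
for event in events:
    print(repr(event))
```

Because only the start of an event has to be recognised, the stack-trace lines, the Caused by lines and the --- lines all end up attached to the right event without being listed in the pattern.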

Getting rid of the parentheses with regular expression group matching

I'm trying to analyze logs using splunk and I need to parse lines that look like this:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message
I've got this regex which matches:
(?i)^[^\]]*\]\s+(?P<FIELDNAME>[^ ]+)
this part :
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Using groups I can extract the real information that I need and that is :
(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
The only problem is that I don't need the parentheses. I've tried some negative lookahead/lookbehind approaches found through Google searches, but I don't really know regex that well.
So my final goal is to capture b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf. Thanks.
(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)
That matches the parentheses literally but keeps them outside group 1, so the captured value no longer includes the ().
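A short Python sketch to confirm the behaviour on the log line from the question. Python's re module accepts the same (?P<name>...) syntax, but it is not the engine Splunk uses, so this is only a rough check. extract_id is a hypothetical helper name.

```python
import re

LINE = (
    "2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] "
    "(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) "
    "interceptor.CustomLoggingOutInterceptor "
    "(AbstractLoggingInterceptor.java:149) - Outbound Message"
)

# The literal \( and \) sit outside the named group, so they are matched
# but not captured.
PATTERN = re.compile(r"(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)")


def extract_id(line):
    """Return the value between the first pair of parentheses after ']', or None."""
    m = PATTERN.search(line)
    return m.group("FIELDNAME") if m else None


print(extract_id(LINE))
```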