Getting rid of the parentheses with regular expression group matching

I'm trying to analyze logs using Splunk, and I need to parse lines that look like this:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message
I've got this regex:
(?i)^[^\]]*\]\s+(?P<FIELDNAME>[^ ]+)
which matches this part:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Using the named group, I can extract the information I actually need:
(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
The only problem is that I don't want the parentheses. I've tried some negative lookahead/lookbehind patterns found through Google searches, but I don't really know regex that well.
So my final goal is to capture b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf. Thanks!

(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)
This matches the literal parentheses outside the group, so the FIELDNAME group captures the ID without them.
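For illustration, here is the same pattern in JavaScript. This is a sketch outside Splunk, and JavaScript syntax differs in two places. A named group is written (?<name>...) instead of (?P<name>...). Case-insensitivity is the i flag rather than a leading (?i). The helper name extractRequestId is hypothetical.

```javascript
// The answer's pattern adapted to JavaScript syntax:
// (?<FIELDNAME>...) instead of (?P<FIELDNAME>...), and the `i` flag instead of (?i).
const logPattern = /^[^\]]*\]\s+\((?<FIELDNAME>[^ ]+)\)/i;

// Hypothetical helper: returns the ID between the parentheses, or null if the line does not match.
function extractRequestId(line) {
  const m = line.match(logPattern);
  return m ? m.groups.FIELDNAME : null;
}

const line =
  "2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] " +
  "(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor " +
  "(AbstractLoggingInterceptor.java:149) - Outbound Message";

console.log(extractRequestId(line)); // b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf
```

The escaped \( and \) sit outside the named group. They must be present for the match to succeed, but they are not part of the captured value.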

Related

Generate Regex with special characters and brackets

I am trying to create a regex for this text: *[Failure] : Automation Failure, Reason - Unable to find Watch Live button on title detail page*. I want to extract everything between *[Failure] : and the closing *. I tried \*\[Failure][ :,-,-]+[a-zA-Z0-9]+\* but it does not work.
In this case, the desired output is: Automation Failure, Reason - Unable to find Watch Live button on title detail page
If you simply want everything between '*[Failure] :' and '*', you can use a lookbehind and a lookahead:
(?<=\*\[Failure] : ).*(?=\*)
(?<=\*\[Failure] : ) looks behind for '*[Failure] :'
(?=\*) looks ahead for a '*'
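A quick JavaScript check of the lookaround pattern on the question's text follows. It assumes an engine with lookbehind support, such as a current Node.js. Because both assertions are zero-width, the whole match is already the wanted text.

```javascript
// Sample text from the question.
const text =
  "*[Failure] : Automation Failure, Reason - Unable to find Watch Live button on title detail page*";

// Lookbehind and lookahead must succeed, but they do not become part of the match.
const lookaround = /(?<=\*\[Failure] : ).*(?=\*)/;

const match = text.match(lookaround);
console.log(match ? match[0] : null);
// Automation Failure, Reason - Unable to find Watch Live button on title detail page
```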
Your character classes are missing some characters needed to extend the match up to the closing *. For example, [a-zA-Z0-9]+ cannot match the commas, spaces and hyphen in the text. To get only the part in between, use a capture group; otherwise you only have the full match.
\*\[Failure][ :,-]+([a-zA-Z0-9, -]+)\*
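The capture-group version can be checked the same way in JavaScript. This is a sketch on the question's text: match[0] is the full match including the *[Failure] : prefix and the trailing *, while match[1] holds only the captured part.

```javascript
const text =
  "*[Failure] : Automation Failure, Reason - Unable to find Watch Live button on title detail page*";

// The second class now also allows commas, spaces and hyphens, so it can reach the closing *.
const withGroup = /\*\[Failure][ :,-]+([a-zA-Z0-9, -]+)\*/;

const match = text.match(withGroup);
console.log(match ? match[1] : null);
// Automation Failure, Reason - Unable to find Watch Live button on title detail page
```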

How to write Regex expression to extract the content in brackets, after string and the first match?

I would like to use a regular expression to extract the content between brackets that follows a specific string, taking only the first match.
Example text:
**-n --command PING being applied--:
Wed May 34 7:23:18 2010
[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] [[EIJERTMMMMIJE_EIEJ] gdyugedyue Service [ABC] is not available in domain [DEF]. Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back. It might be due to one of the following reasons:
=> Reason1
=> Reason3
=> Reason 4: deijdije djkeoidjeio.
info=4343 day=Mon year=2010*
I would like to extract the string between [] that follows the word Service, but only the first match, since Service can appear again later. In this case, that is ABC.
Could someone help me?
I am not able to combine these three conditions.
Thanks
Assuming you don't mind the square brackets themselves being included in the captured text, by far the easiest way to do this is to use the following simple regex:
Service (\[[^\]]*\])
and extract only the first capturing group from the result with whatever regex functionality you're using. For example, in JS you would write
string.match(/Service (\[[^\]]*\])/)[1]
to extract the first capturing group.
If you instead want a regex that can only capture the first occurrence on a line, you can exploit the greedy * quantifier. A trailing .* consumes the rest of the line, so a later Service on the same line cannot produce another match:
Service (\[[^\]]*\]).*
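To see the effect of the trailing .*, here is a JavaScript sketch run on an excerpt of the question's sample line, where Service [ABC] appears twice. It uses a global search (matchAll) to list every match. Note that group 1 includes the brackets themselves.

```javascript
// Excerpt of the sample line from the question.
const line =
  "gdyugedyue Service [ABC] is not available in domain [DEF]. " +
  "Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back.";

// Without the trailing .*, a global search reports both occurrences.
const plain = [...line.matchAll(/Service (\[[^\]]*\])/g)].map(m => m[1]);

// With the trailing .*, the first match swallows the rest of the line,
// so only the first occurrence is reported.
const greedy = [...line.matchAll(/Service (\[[^\]]*\]).*/g)].map(m => m[1]);

console.log(plain);  // [ '[ABC]', '[ABC]' ]
console.log(greedy); // [ '[ABC]' ]
```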
Service \[([^\]]+)\]
will match Service [anything except a closing bracket] and capture that content in group 1. Since regex engines scan left to right, the first match is the leftmost one.
Test it live on regex101.com.
In PHP, you could do this (code snippet generated by RegexBuddy):
if (preg_match('/Service \[([^\]]+)\]/', $subject, $groups)) {
    $result = $groups[1];
} else {
    $result = "";
}
How should I write the group-name definition? I know it can look like (?<name>...), but I don't know how to combine it with Service \[([^\]]+)\] in a single pattern.
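Here is a sketch of the named-group form. The numbered group in Service \[([^\]]+)\] becomes a named one by adding ?<name> right after its opening parenthesis. The name service is a placeholder. The same (?<name>...) syntax is what Splunk's rex command accepts. The check below uses JavaScript.

```javascript
// Service \[([^\]]+)\] with its capturing group given a placeholder name, "service".
const pattern = /Service \[(?<service>[^\]]+)\]/;

// Excerpt of the question's sample text.
const text =
  "gdyugedyue Service [ABC] is not available in domain [DEF]. " +
  "Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back.";

// Without the g flag, match() returns only the first (leftmost) match, with its named groups.
const m = text.match(pattern);
console.log(m ? m.groups.service : null); // ABC
```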

GA Regex Filter - Filter PPC traffic and replace it with "PPC"

1) www.mysite.site/product/brand?card_type=all
2) www.mysite.site/product/brand?card_type=all&cp=randomID&keyword=randomKeyword&network=randomNetwork&v3=sometype&v4=MM
So I have these two types of URLs being reported in my Analytics:
Traffic that went on that page organically
Traffic that went on that page via Paid Traffic
I basically need to find all links that have an "&" followed by (cp|keyword|v1|v2|v3|v5) after the value of "card_type" and replace them with "ppctraffic", so that ideally I would have:
www.mysite.site/product/brand?card_type=all
www.mysite.site/product/brand/ppctraffic or just mysite.site/ppctraffic
What I attempted:
Filter Field: Request URI
Search String:
^(https?:\/\/\S+\/[^?]*)(.*?)&(cp|keyword|v1|v2|v3|v5)
Replace String:
/ppctraffic
(I’ve also tried $1/ppctraffic and $2/ppctraffic)
When I test the regex online it seems to work, so I'm not sure what I'm doing wrong.
Any help is deeply appreciated.
One way is to capture everything up to /brand in a group by matching any characters except a question mark, [^?]+. Then match ?card_type=all& followed by one of the parameter names and any characters until the end of the string.
Since your links do not start with http:// or https://, you can make that part optional with (?:https?:\/\/)?.
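The following small JavaScript check illustrates that point. JavaScript's regex engine is not the one GA uses, but these patterns rely only on basic constructs that should behave the same here. It compares the question's search pattern with a variant whose scheme is optional, using the second sample URL from the question.

```javascript
// The question's search pattern: it requires an http:// or https:// prefix.
const asked = /^(https?:\/\/\S+\/[^?]*)(.*?)&(cp|keyword|v1|v2|v3|v5)/;

// The same idea with the scheme made optional, as the answer suggests.
const optionalScheme = /^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235])/;

const url =
  "www.mysite.site/product/brand?card_type=all&cp=randomID&keyword=randomKeyword&network=randomNetwork&v3=sometype&v4=MM";

console.log(asked.test(url));          // false: the URL does not start with http
console.log(optionalScheme.test(url)); // true
```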
^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$
Then in the replacement use $1/ppctraffic
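// Group 1 keeps the URL up to (but not including) the "?"; the rest of the URL is matched and dropped.
const pattern = /^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$/;
[
  // Paid traffic: becomes www.mysite.site/product/brand/ppctraffic
  "www.mysite.site/product/brand?card_type=all&cp=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID",
  // Organic traffic: no match, printed unchanged
  "www.mysite.site/product/brand?card_type=all",
  // "aa" is not one of the listed parameters: no match, printed unchanged
  "www.mysite.site/product/brand?card_type=all&aa=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID"
].forEach(s => console.log(s.replace(pattern, "$1/ppctraffic")));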

Regex to remove everything after -i- (with -i-)

I have been trying to find a solution to my problem.
Input: prd-abcd-efgh-i-0dflnk55f5d45df
Output: prd-abcd-efgh
Splunk query I tried: index=aws-* (host=prd-abcd-efgh*) | rex field=host "^(?<host>[^.]+)"| dedup host | stats count by host,methodPath
I want to remove "-i-" and everything that comes after it, using a simple regex. I tried the regex "^(?<host>[^.]+)" listed here:
https://answers.splunk.com/answers/77101/extracting-selected-hosts-with-regex-regex-hosts-with-exceptions.html
Please help me solve it.
replace(host, "(?<=-i-).*", "")
Example here: https://regex101.com/r/blcCcQ/2
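(?<=-i-) is a lookbehind. It requires -i- before the match but does not consume it, so this pattern removes only what follows -i- and leaves prd-abcd-efgh-i-. To drop -i- as well, use -i-.* as the pattern.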
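Here is a JavaScript check of the lookbehind pattern on the question's input. JavaScript's String.prototype.replace stands in for Splunk's replace() here; it is an illustration, not Splunk itself.

```javascript
const host = "prd-abcd-efgh-i-0dflnk55f5d45df";

// The lookbehind requires "-i-" before the match but does not consume it,
// so "-i-" itself survives the replacement.
const result = host.replace(/(?<=-i-).*/, "");
console.log(result); // prd-abcd-efgh-i-
```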
I have no knowledge of Splunk, but the usual way to do this is to match the part you don't want and replace it with an empty string.
The regex for doing that could be:
-i-.*
Then replace the match with an empty string.
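Here is the same idea checked in JavaScript on the question's input. In Splunk this would presumably be the replace() call from the previous answer with -i-.* as the pattern, but the snippet below only demonstrates the regex itself.

```javascript
const host = "prd-abcd-efgh-i-0dflnk55f5d45df";

// Match "-i-" and everything after it, then replace that whole match with nothing.
const result = host.replace(/-i-.*/, "");
console.log(result); // prd-abcd-efgh
```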
Something simple like this should work:
([a-z-]+)-i-.+
The first capture group will return only the part preceding -i-.
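The capture-group approach can be checked the same way in JavaScript. One caveat: [a-z-] covers only lowercase letters and hyphens, so a host prefix containing digits or uppercase letters would be captured only partially.

```javascript
const host = "prd-abcd-efgh-i-0dflnk55f5d45df";

// Greedy [a-z-]+ first grabs "prd-abcd-efgh-i-", then backtracks until "-i-" can match.
const m = host.match(/([a-z-]+)-i-.+/);
console.log(m ? m[1] : null); // prd-abcd-efgh
```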

Using a wildcard in Regex at the end of a URL in GA

I'm a newbie at regex. I'm trying to get a report in GA that returns all pages after a certain point in the URL.
For example:
http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/
I want to see all dates, i.e.: http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/*
Here's what I've got so far in my regex:
^https:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(?=(?:\/.*)?$)
You can try this:
https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox[\w/_-]*
GA uses the RE2 regex engine, which does not support lookarounds (not even lookaheads). Your pattern contains one: (?=(?:\/.*)?$).
If you need all links containing www.essentialibiza.com/ibiza-club-tickets/carl-cox/, you can use a simple regex:
www\.essentialibiza\.com/ibiza-club-tickets/carl-cox/
If you want to specify the protocol as well:
https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)
The ? makes the s optional (0 or 1 occurrences), and (/|$) allows matching a URL that ends right after cox. Replace this group with / if you only want URLs that have a / after cox.
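Here is a quick JavaScript check of the protocol-aware pattern. JavaScript is used only for illustration; the pattern uses no constructs beyond what RE2 also supports. String.raw keeps the pattern text identical to the one above, without extra escaping of /. The third URL is a made-up non-match.

```javascript
// Pattern from the answer, kept verbatim via String.raw.
const pattern = new RegExp(
  String.raw`https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)`
);

const urls = [
  "http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/", // dated page
  "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox",             // ends right after cox
  "http://www.essentialibiza.com/ibiza-club-tickets/carl-coxfoo/",          // hypothetical non-match
];

// Prints true, true, false for the three URLs.
for (const u of urls) console.log(pattern.test(u), u);
```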