Generate Regex with special characters and brackets - regex

I am trying to create a regex for this text *[Failure] : Automation Failure, Reason - Unable to find Watch Live button on title detail page*, I want to extract anything between *[Failure] : and *. I tried coming up with \*\[Failure][ :,-,-]+[a-zA-Z0-9]+\* but this does not work.
In my case desired output is Automation Failure, Reason - Unable to find Watch Live button on title detail page

If you simply want to get everything between the '*[Failure] :' and the '*', you can use a lookbehind and a lookahead to make the regex:
(?<=\*\[Failure] : ).*(?=\*)
(?<=\*\[Failure] : ) looks behind for '*[Failure] :'
(?=\*) looks ahead for a '*'

You are missing some essential characters in the 2 character classes that you use to span the match till *, and to only get the part in between you can use a capture group or else you will have the full match only.
\*\[Failure][ :,-]+([a-zA-Z0-9, -]+)\*
Regex demo

Related

How to write Regex expression to extract the content in brackets, after string and the first match?

I would like to use Regular expression to extract content between brackets, after some specific string and the 1st match.
Example text:
**-n --command PING being applied--:
Wed May 34 7:23:18 2010
[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] [[EIJERTMMMMIJE_EIEJ] gdyugedyue Service [ABC] is not available in domain [DEF]. Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back. It might be due to one of the following reasons:
=> Reason1
=> Reason3
=> Reason 4: deijdije djkeoidjeio.
info=4343 day=Mon year=2010*
I would like to extract the string between [] but after string Service and 1st match as Service could appear again later. In this case ABC
Could someone help me?
I am not able to combine these three conditionals.
Thanks
Assuming that you don't care about capturing square brackets inside the [ ] pair, by far the easiest way to do this is to use the following simple regex:
Service (\[[^\]]*\])
and extract only the 1st capturing group from the result using whatever regex functionality you're using. For example, using JS, you would write
string.match(/Service (\[[^\]]*\])/)[1]
to extract the first capturing group.
If you instead want a regex that will only capture the first occurrence, you can exploit the greedy nature of the * quantifier and change the regex to this:
Service (\[[^\]]*\]).*
Service \[([^\]]+)\]
will match Service [anything besides brackets] and capture anything besides brackets in group number 1. Since regex engines work left-to-right, the first match will be the leftmost match.
Test it live on regex101.com.
In PHP, you could do this (code snippet generated by RegexBuddy):
if (preg_match('/Service \[([^\]]+)\]/', $subject, $groups)) {
$result = $groups[1];
} else {
$result = "";
}
The definition of the group name How should I write it? I know that it can be like this: (?) but I dont know how to combine it with this part Service [([^]]+)] in a single way

Regex in syslog template

I try to use regex in a syslog template but it still not works. I first tried do match everthing to see if it works but it dont works. I tested my expression with a online tool and there it works. In my opinion the log message should be empty because the expression always match or? But there are always zeros in the log file.
template(name="testLogFormat" type="list")
property(name="syslogtag"
regex.type="ERE"
regex.submatch="0"
regex.expression=".*--end"
regex.nomatchmode="ZERO"
)
Your input string looks like abc[123]. So, you may fix your current config using
regex.submatch="1"
regex.expression="\[([0-9]+)]"
See the regex demo.
Here,
\[ - matches a [
([0-9]+) - captures into Group 1 (you access the value using regex.submatch="1")
] - a closing ].

How to extract file name from URL?

I have file names in a URL and want to strip out the preceding URL and filepath as well as the version that appears after the ?
Sample URL
Trying to use RegEx to pull, CaptialForecasting_Datasheet.pdf
The REGEXP_EXTRACT in Google Data Studio seems unique. Tried the suggestion but kept getting "could not parse" error. I was able to strip out the first part of the url with the following. Event Label is where I store URL of downloaded PDF.
The URL:
https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )
The result:
HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
Now trying to determine how do I pull out everything after the? where the version data is, so as to extract just the Filename.pdf.
You could try:
[^\/]+(?=\?[^\/]*$)
This will match CaptialForecasting_Datasheet.pdf even if there is a question mark in the path. For example, the regex will succeed in both of these cases:
https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver
Assuming that the name appears right after the last / and ends with the ?, the regular expression below will leave the name in group 1 where you can get it with \1 or whatever the tool that you are using supports.
.*\/(.*)\?
It basically says: get everything in between the last / and the first ? after, and put it in group 1.
Another regular expression that only matches the file name that you want but is more complex is:
(?<=\/)[^\/]*(?=\?)
It matches all non-/ characters, [^\/], immediately preceded by /, (?<=\/) and immediately followed by ?, (?=\?). The first parentheses is a positive lookbehind, and the second expression in parentheses is a positive lookahead.
This REGEXP_EXTRACT formula captures the characters a-zA-Z0-9_. between / and ?
REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")
Google Data Studio Report to demonstrate.
Please try the following regex
[A-Za-z\_]*.pdf
I have tried it online at https://regexr.com/. Attaching the screenshot for reference
Please note that this only works for .pdf files
Following regex will extract file name with .pdf extension
(?:[^\/][\d\w\.]+)(?<=(?:.pdf))
You can add more extensions like this,
(?:[^\/][\d\w\.]+)(?<=(?:.pdf)|(?:.jpg))
Demo

Google Analytics - Content grouping - Regex fix

This is our URL structure:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2
http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2
http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre
I am trying to create a list of groups based on the client names
/access-guide/[this bit]/...
So I can have a performance list of all our clients.
This is my regex:
/access-guide/(.*universit(y|ies)|.*colleg(e|es))/
I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.
At the moment, my current regex will only return groups that are X-University:
Durham-University
Plymouth-University
Cardiff-University
etc.
What does the regex need to be to have the list I'm looking for?
Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?
Thanks for your help in advance!
Depending upon your needs you may want to do:
/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/
This will match names even if "university" or "college" is not at the end of the string. For example "college-of-the-ozarks" Note the non-capturing internal parenthesis, that should probably be used no matter what solution you go with, as you don't want to just match the word "university" or "college"
Live Example
Additionally, I don't know what may be in your but if you may have compound words you want to eliminate using a \b may be advisable. For instance if you don't want to match "miskatonic-postcollege" you may want to do something like this:
/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
If the client name section of the URL is after the access-guid/ and before the next /:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
|----------------------------|
you need to use a negated character class to only match university before the regex reaches that rightmost / boundary.
As per the Reference:
You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)
Thus, you can use
/access-guide/([^/]*(universit(y|ies)|colleges?))
^^^^^
See demo.
The regex matches
/access-guide/ - leftmost boundary, matches /access-guide/ literally
[^/]* - any character other than / (so we still remain in that customer section)
(universit(y|ies)|colleges?) - university, or universities, orcollegeorcolleges` literally. Add more if needed.

Getting rid of the parenthesis with regular expression group matching

I'm trying to analyze logs using splunk and I need to parse lines that look like this:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message
I've got this regex which matches:
(?i)^[^\]]*\]\s+(?P<FIELDNAME>[^ ]+)
this part :
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Using groups I can extract the real information that I need and that is :
(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Only problem is that I don't need parenthesis, I've tried with some negative lookahead/lookbehind google searches, don't really know regex that well.
So my final goal would be to capture b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf . thanks
(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)
That matches and drops the () in group 1.
Play with the regex here.