Using regular expressions to match log file - regex

I am working on a regular expression problem and have run into some issues when I try to match text between certain markers. Below is a regular expression tester with what I have completed so far.
https://regex101.com/r/gE8uQ1/1
I am trying to select ALL of the query text that appears after "statement: " and before the \nTIMESTAMP. I have used \n\d{4}-d{2}-d{2} to represent the timestamp, but it does not select the whole query. Why is this happening? Is it because of my modifiers?

(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t']+?)(?=(?:\d{4}-\d{2}-\d{2}|$))
Try this. Just change your negative lookahead to a positive lookahead and add a quantifier to the character class.
See demo:
https://regex101.com/r/gE8uQ1/5

You can use the following with the g and s modifiers (s is needed because your queries contain newlines, which . does not match by default):
(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t'].+?)(?=\d{4}-\d{2}-\d{2}|$)
See DEMO
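The lookbehind/lookahead approach from the answers above can be sketched in Python roughly as follows. The log excerpt is a made-up sample (the real log format is an assumption), and re.DOTALL plays the role of the s modifier so that . also matches newlines.

```python
import re

# Hypothetical log excerpt; the real log layout is an assumption.
LOG = (
    "2015-05-20 10:00:01 UTC LOG:  statement: SELECT id,\n"
    "\tname\n"
    "FROM users;\n"
    "2015-05-20 10:00:02 UTC LOG:  statement: SELECT 1;\n"
)

STATEMENT = re.compile(
    r"(?<=statement: )"             # match must start right after the marker
    r"(.+?)"                        # lazily take the query, newlines included
    r"(?=\n\d{4}-\d{2}-\d{2}|\Z)",  # stop before the next timestamp line or at the end
    re.DOTALL,
)

# finditer is the equivalent of the g modifier: it returns every match.
queries = [m.group(1).strip() for m in STATEMENT.finditer(LOG)]
```

The lazy quantifier matters here: a greedy .+ with DOTALL would run to the last timestamp in the file and merge all statements into one match.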

Related

Regular Expression to return number without comma

I have to extract a number formatted xx,xxx.xx in a different format - xxxxx.xx by applying a regular expression. In other words, I have to remove the comma from the number in the final capture group.
I am not quite sure if it's possible to achieve only with the regular expression and without writing specific code to split and join at these values.
Here is the demo.
This is the part of input string:
AMT : EGP 3,000.00
My current regex is AMT\s*:\s*EGP\s*(\d*,\d*.\d*), which retrieves 3,000.00.
I'm expecting to have 3000.00 in the final capture group.
EDIT:
Since the OP doesn't want to capture and replace, the following can be done:
AMT\s*:\s*EGP\s*(\d*),(\d*.\d*)
The expected data is now part of the two capturing groups, and can be accessed by concatenating them: \1\2.
Demo
You can capture everything other than the , in two groups, and then replace:
Capture with:
(AMT\s*:\s*EGP\s*\d*),(\d*.\d*)
Replace with: \1\2
Demo
Try this:
AMT\s*:\s*EGP\s*\K\d+(,\d{3})*(\.\d+)?
Here is Demo
After finding the match, do something like: Mystring.Replace(",", "")
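Both suggestions above (joining two capture groups, or matching the whole amount and stripping the commas afterwards) can be sketched in Python. This is only an illustration: the function names are hypothetical, the dot before the decimals is escaped here so it matches a literal ".", and Python's re module does not support \K, so a capture group is used instead.

```python
import re

# Option 1: capture the digits on each side of the comma and join them.
TWO_GROUPS = re.compile(r"AMT\s*:\s*EGP\s*(\d*),(\d*\.\d*)")

# Option 2: capture the whole amount, then strip the thousands separators.
WHOLE = re.compile(r"AMT\s*:\s*EGP\s*(\d+(?:,\d{3})*(?:\.\d+)?)")

def amount_by_joining(text):
    """Return group 1 + group 2, or None when the pattern does not match."""
    m = TWO_GROUPS.search(text)
    return m.group(1) + m.group(2) if m else None

def amount_by_stripping(text):
    """Return the matched amount with every comma removed, or None."""
    m = WHOLE.search(text)
    return m.group(1).replace(",", "") if m else None

sample = "AMT : EGP 3,000.00"
joined = amount_by_joining(sample)
stripped = amount_by_stripping(sample)
```

The second variant also copes with amounts that contain more than one comma, which the two-group pattern does not.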

Regular Expression in Tableau returns only Nulls in Calculated Field

I'm trying to extract in Tableau the first occurrence of a part-of-speech name (e.g. subst, adj, fin) located between { and : in every line of the column below:
{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}
{subst:sg:gen:m3=5, subst:sg:inst:m3=1, subst:sg:gen:f=1, subst:sg:nom:m3=1}
{subst:sg:nom:f=3, subst:sg:loc:f=2, subst:sg:inst:f=1, subst:sg:nom:m3=1}
{adj:sg:nom:m3:pos=2, adj:sg:acc:m3:pos=1, adj:sg:acc:n1.n2:pos=3, adj:pl:acc:m1.p1:pos=3, adj:sg:nom:f:pos=1}
{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}
{fin:sg:ter:imperf=5}
To do this I use the following regular expression: {(\w+):(?:.*?)}$. Unfortunately, my calculated field returns only Nulls:
Screenshot from Tableau
I checked my regular expression in a regex tester and it is valid:
Screenshot from regex101.com
I don't know what I'm doing wrong, so if anybody has any suggestions I would be grateful.
Tableau's regex engine is ICU, and there are some differences between it and PCRE.
One of them is that braces that should be matched as literal symbols must be escaped.
Your regex also contains a redundant non-capturing group ((?:.*?) is equivalent to .*?) and a lazy quantifier that slows down matching: since you only want to check for a } at the end of the string, it should be changed to a greedy .*.
You can use
REGEXP_EXTRACT([col], '^\{(\w+):.*\}$')
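The pattern logic can be checked outside Tableau with a quick Python sketch. Python's re module is not ICU, so this only verifies that the escaped-brace pattern picks the expected tag from the sample rows. It says nothing about Tableau-specific behaviour, and the helper name first_pos is hypothetical.

```python
import re

# Same pattern as in the REGEXP_EXTRACT call, with the braces escaped.
POS = re.compile(r"^\{(\w+):.*\}$")

# A few rows copied from the question.
rows = [
    "{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}",
    "{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}",
    "{fin:sg:ter:imperf=5}",
]

def first_pos(row):
    """Return the first part-of-speech tag, or None when the row does not match."""
    m = POS.match(row)
    return m.group(1) if m else None

tags = [first_pos(row) for row in rows]
```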

Regular Expressions (pcre) for shortcode/bbcode

I have a regex (see https://regex101.com/r/mB7vQ8/2):
/\[content_box((.*?)!?\])(.*?)\[\/content_box\]/ig
to match all [content_box] tags (with or without tag parameters) in a text like:
[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
My regex works, but if [content_box] is nested inside [content_boxes], it matches the wrong text (the actual match was shown in bold):
[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
The expected match is:
[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
See it online: https://regex101.com/r/mB7vQ8/2
How can I solve it?
You can use this regex with word boundaries:
~\[content_box\b\s*([^]]*)\](.*?)\[/content_box\]~
RegEx Demo
Here content_box\b will not match content_boxes, so the match will always be the inner [content_box ..] tag.
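A small Python sketch of the word-boundary idea, run against the sample text from the question. The flags are assumptions on my side: IGNORECASE mirrors the i modifier of the original regex, and DOTALL lets the tag body span several lines.

```python
import re

TEXT = (
    "[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]\n"
    "[content_box]text[/content_box]\n"
    "[content_box foo=bar]text[/content_box]\n"
)

BOX = re.compile(
    r"\[content_box\b"      # \b rejects [content_boxes ...]
    r"\s*([^\]]*)\]"        # group 1: optional tag parameters
    r"(.*?)"                # group 2: tag body, as short as possible
    r"\[/content_box\]",    # closing tag
    re.IGNORECASE | re.DOTALL,
)

# Each entry is (parameters, body) for one inner [content_box] tag.
matches = [(m.group(1), m.group(2)) for m in BOX.finditer(TEXT)]
```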

Finding a pattern with optional end using regular expression

I am looking for a single regular expression to extract a block of text that may be followed by an optional end marker. The challenge here is to use just one regular expression.
The input is as follows:
Anchor: This is the text I want to extract A/C : 2015-5-20
Anchor: This is the text I want to extract
I am currently using the following regular expression:
Anchor:(?<extact>.*)(A\/C)
The result looks as follows:
If I make the A/C block optional using a ?, as in Anchor:(?<extact>.*)(A\/C)?, the match gets too long:
It looks as follows:
Any ideas on how to solve this elegantly with a single regex? An additional constraint is that I want to have a named group in the regex (here extact).
You can find the sample code on regex101: https://regex101.com/r/wH5iQ4/1
Anchor:(?<extact>.*?)\s*(?=A\/C|$)
You can make use of a lookahead here. See demo.
https://regex101.com/r/wH5iQ4/3
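For illustration, here is how the lookahead version behaves in Python on the two input lines from the question. Python spells named groups (?P<name>...), so the pattern is adapted accordingly. The helper name extract is hypothetical, and the result is stripped because the group also captures the space after "Anchor:".

```python
import re

# Lazy group plus a lookahead that accepts either the A/C marker or end of line.
ANCHOR = re.compile(r"Anchor:(?P<extact>.*?)\s*(?=A/C|$)")

def extract(line):
    """Return the text after 'Anchor:' up to the optional 'A/C' marker."""
    m = ANCHOR.search(line)
    return m.group("extact").strip() if m else None

with_end = extract("Anchor: This is the text I want to extract A/C : 2015-5-20")
without_end = extract("Anchor: This is the text I want to extract")
```

Because the lookahead does not consume anything, the A/C marker stays out of the match in both cases.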

Regular Expression on Strings

I wrote this regular expression in http://www.regexr.com/
Regular Expression: (^A.*\..\s)\|((\sS.*:\sA.*,\sN.....\s))\|(\sN.+)/g
Text:
AT1G01010.1 | Symbols: ANAC001, NAC001 | NAC domain containing protein 1
| chr1:3760-5630 FORWARD LENGTH=429
I'm able to detect the 1st String|2nd String| 3rd String| in the above text.
I would like to eliminate the 2nd part (" Symbols: ANAC001, NAC001 ") in the above text using the regular expression. Could anyone help? Or I need a regular expression to detect only the 1st and 3rd String.
Consider the following regex since you are already using the beginning of string ^ anchor.
^(A[^|]+)\s\|[^|]+\|\s*([^|]+)\s\|
Live Demo
What exactly are you trying to do? The regular expression you provided searches the whole text and returns whatever matches, so you are treating the regex as a whole. If you want to grab just the 1st and the 3rd part, you either need to run two separate regexes on the same text and merge the results, or exclude the 2nd part from the captures.
Try a non-capturing group (?:...):
(^A.*\..\s)\|(?:\sS.*:\sA.*,\sN.....\s)\|(\sN.+)
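As a rough check, the [^|]+ pattern from the first answer can be run in Python against the sample text, assuming the header sits on a single line. The middle field is matched but never captured, so only the 1st and 3rd strings come back.

```python
import re

# Sample header from the question, assumed to be on one line.
LINE = (
    "AT1G01010.1 | Symbols: ANAC001, NAC001 | "
    "NAC domain containing protein 1 | chr1:3760-5630 FORWARD LENGTH=429"
)

FIELDS = re.compile(
    r"^(A[^|]+)\s\|"    # group 1: first field, up to the first pipe
    r"[^|]+\|"          # second field: matched but not captured
    r"\s*([^|]+)\s\|"   # group 2: third field
)

m = FIELDS.search(LINE)
first, third = m.groups() if m else (None, None)
```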