Need improvement on the Regex

I have written a simple regex for extracting the user SC08.
https://regex101.com/r/L1DOzH/1/ Performance-wise, it's really bad, taking around 1448 steps.
Jun 2 11:16:44 192.168.55.19 1 2020-06-02T10:16:43.721Z chisdsm#abcd.com dsm 4493 USR1278I [U#21513 sev="INFO" msg="user logged out due to inactivity" user="SC08"]
Jun 2 10:13:50 192.168.55.19 1 2020-06-02T09:13:50.297Z chisdsm#abcd.com dsm 4493 DO0426I [DA#21513 sev="INFO" msg="switch domain" admin="SC08"
Jun 2 10:13:43 192.168.55.19 1 2020-06-02T09:13:42.956Z chisdsm#abcd.com dsm 4493 DAO0267I [DA#21513 sev="INFO" msg="user logged in" admin="SC08" stime="2020-06-02 10:13:42.944" role="ALL_ADMIN" source="192.168.54.9"]
May 27 15:53:38 192.168.55.129 1 2020-05-27T14:53:37.669Z chisdsm#abcd.com dsm 4493 DAO0227I [DA#21513 sev="INFO" msg="delete file signature" user="SC08" filePath="/bin/rm"]

An alternation group as the first pattern in a regex disables some optimizations that apply to patterns starting with a more specific pattern.
Since both of your alternatives contain an = sign, you may put the = at the beginning of the pattern and then use lookarounds, as in Michail's suggestion. Here is a small variation with 139 steps:
=(?:(?<=user=)"(?<user1>\w+)|(?<=admin=)"(?<user2>\w+))
See the regex demo. Details
= - an equals sign
(?:(?<=user=)"(?<user1>\w+)|(?<=admin=)"(?<user2>\w+)) - a non-capturing group:
(?<=user=) - user= must be immediately to the left of the current position
" - a " char
(?<user1>\w+) - Group "user1": 1+ word chars
| - or
(?<=admin=) - admin= must be immediately to the left of the current position
" - a " char
(?<user2>\w+) - Group "user2": 1+ word chars
If your matches are always preceded with a whitespace, use it as the first pattern:
\s(?:user="(?<user1>\w+)|admin="(?<user2>\w+))
See this regex demo, with 918 steps.
If you know the matches are somewhere close to the end of the line, use
.*\b(?:user="(?<user1>\w+)|admin="(?<user2>\w+))
See this regex demo, 568 steps. The .* at the start moves the regex index to the end of the line/string, and the engine then backtracks to find either user= or admin=.
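A minimal Python sketch of the lookbehind variant, run against two of the sample lines from the question. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly. The extract_user helper is a hypothetical name, and the regex101 step counts quoted above do not carry over to Python.

```python
import re

# Same idea as the 139-step pattern: start with the literal "=" and use
# lookbehinds to check which key precedes it.
# Assumption: Python syntax, so (?<user1>...) becomes (?P<user1>...).
PATTERN = re.compile(r'=(?:(?<=user=)"(?P<user1>\w+)|(?<=admin=)"(?P<user2>\w+))')

def extract_user(line):
    """Return the user/admin value from a log line, or None if absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Only one of the two named groups takes part in a match.
    return m.group("user1") or m.group("user2")

lines = [
    'Jun 2 11:16:44 192.168.55.19 1 2020-06-02T10:16:43.721Z chisdsm#abcd.com dsm 4493 '
    'USR1278I [U#21513 sev="INFO" msg="user logged out due to inactivity" user="SC08"]',
    'Jun 2 10:13:50 192.168.55.19 1 2020-06-02T09:13:50.297Z chisdsm#abcd.com dsm 4493 '
    'DO0426I [DA#21513 sev="INFO" msg="switch domain" admin="SC08"',
]

users = [extract_user(line) for line in lines]
print(users)
```

Merging the two groups in the helper keeps the calling code independent of which key (user or admin) happened to match.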

Related

Regular expression for matching a specific substring of a string

I have a log file that logs connection drops of computers in a LAN. I want to extract the name of each computer from every line of the log file, and for that I am using: (?<=Name:)\w+|(-PC)
The target text:
`[C417] ComputerName:KCUTSHALL-PC UserID:GO kcutshall Station 9900 (locked) LanId: | (11/23 10:54:09 - 11/23 10:54:44) | Average limit (300) exceeded while pinging www.google.com [74.125.224.147] 8x
[C445] ComputerName:FRONTOFFICE UserID:YB Yenae Ball Station 7C LanId: | (11/23 17:02:00) | Client is connected to agent.`
The problem is that some computer names contain -PC and some don't. My expression matches computer names without -PC, but if a name contains -PC, it treats that part as a separate match, and I don't want that. In short, it gives me 3 matches, but I want only 2. That's why I need help here; I am a beginner in regex.
You may use
(?<=Name:)\w+(?:-PC)?
Details
(?<=Name:) - a place immediately preceded with Name:
\w+ - 1+ word chars
(?:-PC)? - an optional non-capturing group that matches 1 or 0 occurrences of -PC substring.
Consider using word boundaries if you need to match PC as a whole word,
(?<=Name:)\w+(?:-PC\b)?
See the regex demo.
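As a rough check, the two patterns can be compared in Python on the sample text from the question (Python is an assumption here, since the question does not name a regex flavor; the variable names are illustrative only).

```python
import re

log = (
    "[C417] ComputerName:KCUTSHALL-PC UserID:GO kcutshall Station 9900 (locked) "
    "LanId: | (11/23 10:54:09 - 11/23 10:54:44) | Average limit (300) exceeded "
    "while pinging www.google.com [74.125.224.147] 8x\n"
    "[C445] ComputerName:FRONTOFFICE UserID:YB Yenae Ball Station 7C "
    "LanId: | (11/23 17:02:00) | Client is connected to agent."
)

# Original attempt: -PC is a separate alternative, so it becomes its own match.
original = [m.group(0) for m in re.finditer(r"(?<=Name:)\w+|(-PC)", log)]

# Suggested fix: -PC is an optional tail of the same match.
fixed = re.findall(r"(?<=Name:)\w+(?:-PC\b)?", log)

print(original)  # three matches
print(fixed)     # two matches
```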

How to create a matching regex pattern for "greater than 10-000-000 and lower than 150-000-000"?

I'm trying to make
09-546-943
fail in the below regex pattern.
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
Passing criteria is
greater than 10-000-000 or 010-000-000 and
less than 150-000-000
The tried example "09-546-943" passes. This should be a fail.
Any idea how to create a regex that makes this example a fail instead of a pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern is partially generated with this online number range regex generator, I set the min number to 10 and max to 150, then merged the branches that match 1-8 and 9 (the tool does a bad job here), added 0? to the two digit numbers to match an optional leading 0 and -[0-9]{3}-[0-9]{3} for 10-149 part and -000-000 for 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 and then a number from 10 to 99 or 1 followed with a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
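The breakdown above can be sanity-checked with a few boundary values. This is a minimal Python sketch (Python is an assumption; the question does not name a language), and the sample list is made up for illustration. Note that the pattern as written accepts both end points, 10-000-000 and 150-000-000.

```python
import re

RANGE = re.compile(
    r"^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$"
)

samples = [
    "09-546-943",   # below 10: should fail
    "10-000-000",   # lower end point
    "010-000-000",  # lower end point with leading zero
    "99-999-999",
    "149-999-999",
    "150-000-000",  # upper end point
    "150-000-001",  # above the range: should fail
]

results = {s: bool(RANGE.match(s)) for s in samples}
for value, ok in results.items():
    print(value, "pass" if ok else "fail")
```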
This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
It would also fail 10-000-000 and 150-000-000.
In this demo, the expression is explained, if you might be interested.
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately - depending on your need of them to be included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
also accepts a space or nothing in place of each -.
Test here.
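Because this pattern has no ^ and $ anchors, a substring search could match inside a longer string. A hedged Python sketch (Python and the accepts helper are assumptions, not part of the answer) uses re.fullmatch to require that the whole input matches:

```python
import re

# The separator-tolerant pattern from above, unchanged.
SEP = re.compile(r"((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}")

def accepts(value):
    """True if the entire string matches (fullmatch stands in for ^...$)."""
    return SEP.fullmatch(value) is not None

for value in ["10-000-000", "10 000 000", "10000000",
              "149 999-999", "09-546-943", "150-000-000"]:
    print(value, accepts(value))
```

As the answer says, 150-000-000 is rejected here and would need separate handling if it should pass.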

Get tag name of the first question by using regex

I have a problem with the following regex pattern:
m).*?^([^n]*)(modified)([^n]*)$.*
I want to replace the clipboard with
Clipboard := RegExReplace(Clipboard, "m).*?^([^n]*)(modified)([^n]*)$.*" ,"" )
Source looks like:
Ask Question Interesting 326 Featured
Hot Week Month 1 vote 0 answers 12 views
Type Guard for empty object
typescript modified 2 mins ago kremerd 312
0 votes
Expected result should be:
typescript modified 2 mins ago kremerd 312
But it's replacing nothing. If this works, I later want to get the tag names ^([^n]*) by using RegExMatch.
I am scripting with AutoHotkey (open-source software for Windows) from https://autohotkey.com
You want to match a line that contains a modified substring. The dot in a regex does not match a newline by default, so you need to pass the s (DOTALL) modifier (you may combine it with the m (MULTILINE) modifier, which makes ^ match at the start of each line and $ at the end of each line). Besides, to match non-newline chars you need [^\n] (not [^n], which matches any char other than the letter n).
To solve the issue you may use
RegExMatch(Clipboard, "s)^.*?(\n[^\n]*)(modified|asked|answered)", res)
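res holds the whole match (everything from the start of the string up to and including the keyword), res1 holds the text that precedes the keyword on its line (with the leading newline), and res2 holds the keyword itself.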
Details
s) - the . now matches any char including line break chars
^ - start of the string
.*? - any 0+ chars, as few as possible
(\n[^\n]*) - Group 1 (accessed via res1 later): a newline followed with 0+ chars other than newline chars
(modified|asked|answered) - any of the three alternatives: modified, asked or answered.
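For readers without AutoHotkey at hand, the same pattern can be tried in Python. This is a port made for illustration, not the answer's code: re.DOTALL stands in for the s) option, and the clipboard variable is a stand-in for the AutoHotkey Clipboard contents from the question.

```python
import re

# Port of "s)^.*?(\n[^\n]*)(modified|asked|answered)".
PATTERN = re.compile(r"^.*?(\n[^\n]*)(modified|asked|answered)", re.DOTALL)

clipboard = "\n".join([
    "Ask Question Interesting 326 Featured",
    "Hot Week Month 1 vote 0 answers 12 views",
    "Type Guard for empty object",
    "typescript modified 2 mins ago kremerd 312",
    "0 votes",
])

m = PATTERN.search(clipboard)
# Group 1 starts with the newline that precedes the line, so strip it.
before_keyword = m.group(1).strip()
keyword = m.group(2)
print(before_keyword, "|", keyword)
```

Since group 1 requires a leading newline, a keyword on the very first line of the text would not be found by this pattern.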

Regex up to a special character and group of letters

Using Regex, I'm attempting to get back the following (stars denote what I'd like to extract) from each string using a single Regex command:
FO4H56FD-BTU (Follow Home 56): PLTD8
***********
FO4H56FD-SYH-BI (Follow Home 56 SYH): PLTD8
***********
FO4H52FD-SZH-AG4R-BI (Follow Home 52 SAH): QQTD8
****************
FO4H58FD-SGH: (Follow Home 58 TGT): PLTS8
***********
For some reason I'm having a lot of difficulties. I've been using various methods and currently have =REGEXEXTRACT(A43,"(FO.+)\-BI") which isn't working. Mine also isn't looking for the : currently. I was using a | for multiple rules which didn't seem to work out.
You may use
=REGEXEXTRACT(A43,"^(.*?)(?:-BI)?(?:[ :]|$)")
Details:
^ - start of string
(.*?) - capturing group #1 matching any 0+ chars as few as possible
(?:-BI)? - an optional non-capturing group matching 1 or 0 occurrences of -BI substring
(?:[ :]|$) - either a space, : or end of string
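The pattern can be checked outside the spreadsheet as well. Below is a small Python sketch run on the four sample strings (Python's re is used as a stand-in for the REGEXEXTRACT engine, which is an assumption; the extract_code helper is a hypothetical name).

```python
import re

PATTERN = re.compile(r"^(.*?)(?:-BI)?(?:[ :]|$)")

def extract_code(cell):
    """Return capture group 1, mimicking what REGEXEXTRACT would return."""
    m = PATTERN.search(cell)
    return m.group(1) if m else None

cells = [
    "FO4H56FD-BTU (Follow Home 56): PLTD8",
    "FO4H56FD-SYH-BI (Follow Home 56 SYH): PLTD8",
    "FO4H52FD-SZH-AG4R-BI (Follow Home 52 SAH): QQTD8",
    "FO4H58FD-SGH: (Follow Home 58 TGT): PLTS8",
]

codes = [extract_code(c) for c in cells]
print(codes)
```

The lazy (.*?) stops at the first position where an optional -BI followed by a space, a colon, or the end of the string can match, which is why the trailing -BI is left out of the result.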

Matching a group that may or may not exist

My regex needs to parse an address which looks like this:
BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI
-------------------- ----- -------- -----
1                    2     3        4*
Groups one, two and three will always exist in an address. Group 4 may not exist. I've written a regex that helps me get the first, second and third part but I would also need the fourth part. Part 4 is the country name and can either be FINLAND or SUOMI. If the fourth part didn't exist in an address the fourth group would be empty. This is my regex so far but the third group captures the country too. Any help?
(.*?)\s(\d{5})\s(.*)$
(I'm going to be using this with Oracle's REGEXP functions.)
Change the regex to:
(.*?)\s(\d{5})\s(.+?)\s?(FINLAND|SUOMI)?$
Making group three non-greedy lets the pattern match the optional space and country choices. If group 4 doesn't match, I think it will be uninitialized rather than blank, but that depends on the language.
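To illustrate the point about an unmatched group, here is a short Python sketch (Python is an assumption; the question targets Oracle, whose behavior may differ). In Python an optional group that did not take part in the match comes back as None. The parse helper is a hypothetical name.

```python
import re

ADDRESS = re.compile(r"(.*?)\s(\d{5})\s(.+?)\s?(FINLAND|SUOMI)?$")

def parse(address):
    """Return the four groups as a tuple, or None if nothing matches."""
    m = ADDRESS.match(address)
    return m.groups() if m else None

with_country = parse("BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI")
without_country = parse("BLOOKKOKATU 20 A 773 00810 HELSINKI")

print(with_country)
print(without_country)  # group 4 is None here
```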
To match a character (or in your case a group) that may or may not exist, you need to use ? after the character/subpattern/class in question. I'm answering now because RegEx is complicated and should be explained: posting only the fix without an explanation isn't enough!
A question mark matches zero or one of the preceding character, class, or subpattern. Think of this as "the preceding item is optional". For example, colou?r matches both color and colour because the "u" is optional.
Above quote from http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
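The colou?r example from the quote can be tried directly. This is a tiny Python sketch (Python is an assumption; the quote comes from the AutoHotkey documentation), with a made-up word list:

```python
import re

# "u?" makes the letter u optional: zero or one occurrence.
OPTIONAL_U = re.compile(r"colou?r")

words = ["color", "colour", "colouur", "colr"]
matched = [w for w in words if OPTIONAL_U.fullmatch(w)]
print(matched)
```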
Try this:
(.*?)\s(\d{5})\s(.*?)\s?([^\s]*)?$
This will match your input more tightly and each of your groups is in its own regex group:
(\w+\s\d+\s\w\s\d+)\s(\d+)\s(\w+)\s(\w*)
or if space is OK instead of "whitespace":
(\w+ \d+ \w \d+) (\d+) (\w+) (\w*)
Group 1: BLOOKKOKATU 20 A 773
Group 2: 00810
Group 3: HELSINKI
Group 4: SUOMI (optional - doesn't have to match)
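One caveat, checked here with a small Python sketch (Python is an assumption; the question targets Oracle): in the space-separated pattern above, (\w*) may be empty, but the space in front of it is still required, so an address with no country and no trailing space does not match at all. A possible adjustment, not part of the original answer, wraps the space and the country together in an optional non-capturing group.

```python
import re

# Pattern as given above: the space before group 4 is mandatory.
AS_GIVEN = re.compile(r"(\w+ \d+ \w \d+) (\d+) (\w+) (\w*)")

# Hypothetical variant: separator and country are optional as one unit.
ADJUSTED = re.compile(r"(\w+ \d+ \w \d+) (\d+) (\w+)(?: (\w+))?$")

full = "BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI"
short = "BLOOKKOKATU 20 A 773 00810 HELSINKI"

given_full = AS_GIVEN.match(full)
given_short = AS_GIVEN.match(short)      # no match: trailing space is missing
adjusted_full = ADJUSTED.match(full)
adjusted_short = ADJUSTED.match(short)   # matches, group 4 is None

print(given_full.groups())
print(given_short)
print(adjusted_full.groups())
print(adjusted_short.groups())
```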
(.*?)\s(\d{5})\s(\w+)\s(\w*)
An example:
SQL> with t as
2 ( select 'BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI' text from dual
3 )
4 select text
5 , regexp_replace(text,'(.*?)\s(\d{5})\s(\w+)\s(\w*)','\1**\2**\3**\4') new_text
6 from t
7 /
TEXT
-----------------------------------------
NEW_TEXT
-----------------------------------------------------------------------------------------
BLOOKKOKATU 20 A 773 00810 HELSINKI SUOMI
BLOOKKOKATU 20 A 773**00810**HELSINKI**SUOMI
1 row selected.
Regards,
Rob.