Regular expression for java - regex

I have 2 regular expressions that I want to combine.
First Regex:
(?:for invalid user|for user|user|for|to account) (?:\"|')?(\\S+?)(?:\"|'|'s|,)?(?:\\.|\\s.*?)$
This regular expression captures the username from this string:
<37>May 14 10:02:10 imapd[7336]: [ID 210418 auth.notice] Login failed
user=k33360 auth=k33360 host=bas3-stlambert20-1176368555.dsl.bell.ca
[70.29.245.171]
Second Regex:
(?:\\w*\\:?\\s?\\[ID \\d+ \\w+.\\w+\\]\\s*(\\w*)?.*?)
This regular expression captures the username from this string:
<33>Jul 16 07:55:44 sudo: [ID 702911 auth.alert] do0905 : 1
incorrect password attempt ; TTY=pts/2 ; PWD=/home/do0905/bin ;
USER=root ; COMMAND=/usr/bin/su -
Now I want to combine both of these regular expressions with an OR condition, so that the result matches both strings.
I have tried using this regular expression, but it doesn't work:
(?:for invalid user|for user|user|for|to account) (?:\"|')?(\\S+?)(?:\"|'|'s|,)?(?:\\.|\\s.*?)$ | ((?:\\w*\\:?\\s?\\[ID \\d+ \\w+.\\w+\\]\\s*(\\w*)?.*?))
How can I combine these two regular expressions using the OR condition?

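Combine them with | and no surrounding spaces, since whitespace in a Java pattern is matched literally unless the COMMENTS flag is set: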
Pattern.compile("(?i)(?:for invalid user|for user|user|for|to account)(?:\"|')?(\\S+?)(?:\"|'|'s|,)?(?:\\.|\\s.*?)|((?:\\w*:?\\s?\\[ID \\d+ \\w+.\\w+]\\s*(\\w*)?.*?))");
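The same idea can be sketched in Python, whose alternation syntax matches Java's here. The two sub-patterns below are simplified, hypothetical stand-ins for the ones in the question (they are assumptions, not the asker's exact expressions), and `extract_user` is an illustrative helper name. Each alternative sits in its own non-capturing group, and the caller reads whichever capturing group took part in the match.

```python
import re

# Hypothetical, simplified stand-ins for the two patterns in the question.
USER_EQ = r"user=(\S+)"                             # e.g. "user=k33360"
SUDO_ID = r"sudo:\s*\[ID \d+ \w+\.\w+\]\s*(\w+)"    # e.g. "sudo: [ID 702911 auth.alert] do0905"

# No spaces around "|": whitespace inside a pattern is matched literally.
COMBINED = re.compile(rf"(?:{USER_EQ})|(?:{SUDO_ID})")


def extract_user(line):
    """Return the username captured by whichever alternative matched, or None."""
    m = COMBINED.search(line)
    if m is None:
        return None
    # Only one alternative participates in a match; the other group is None.
    return m.group(1) or m.group(2)


imap_line = ("<37>May 14 10:02:10 imapd[7336]: [ID 210418 auth.notice] Login failed "
             "user=k33360 auth=k33360 host=bas3-stlambert20-1176368555.dsl.bell.ca [70.29.245.171]")
sudo_line = ("<33>Jul 16 07:55:44 sudo: [ID 702911 auth.alert] do0905 : 1 incorrect password "
             "attempt ; TTY=pts/2 ; PWD=/home/do0905/bin ; USER=root ; COMMAND=/usr/bin/su -")

print(extract_user(imap_line))
print(extract_user(sudo_line))
```

Note that a regex search returns the leftmost match, so an alternative that is too permissive can win on a line meant for the other one. Keeping each alternative specific, as with the `sudo:` prefix above, avoids that.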

Related

Could anyone help me in understanding the groovy script below

I am trying to decode some groovy script. I was able to figure out that it is a regular expression but couldn't figure out what the code is exactly.
def dirNumber = this.'Directory Number'
dirNumber?"61" + (dirNumber =~ /0([0-9]+)/)[0][1] + "#":null
According to the Regular expression operators section of https://groovy-lang.org/operators.html, =~ is the find operator: it creates a java.util.regex.Matcher that applies the pattern on its right to the string on its left.
So, dirNumber =~ /0([0-9]+)/ is equivalent to
Pattern.compile("0([0-9]+)").matcher(dirNumber) and evaluates to an instance of java.util.regex.Matcher. The slashes only delimit Groovy's slashy string; they are not part of the pattern.
Groovy lets you access matches by index ([0] in your code). Because your regular expression contains a capturing group, each match also exposes its groups by index ([1] in your code): index 0 is the entire match, and capturing groups start at 1.
So, your code (if dirNumber is not null) extracts the first group of the first match of the regular expression.
You get an IndexOutOfBoundsException when the first index ([0] in your code) is outside the range of matches. When the second index ([1] in your code) is outside the range of groups, you get null rather than an exception.
You can test these scenarios with a simplified version of the code above:
def dirNumber = "Xxx 15 Xxx 16 Xxx 17"
def matcher = dirNumber =~ /Xxx ([0-9]+)/
println(matcher[0][1]) // 15
println(matcher[0][5]) // null
println(matcher[5][1]) // IndexOutOfBoundsException
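For readers more familiar with Python, here is a rough analogue of what the original Groovy expression computes. It is only a sketch: `format_directory_number` is a hypothetical name, and raising `IndexError` stands in for the IndexOutOfBoundsException that Groovy's `matcher[0]` would throw when nothing matches.

```python
import re


def format_directory_number(dir_number):
    """Rough Python analogue of: dirNumber ? "61" + (dirNumber =~ /0([0-9]+)/)[0][1] + "#" : null"""
    # Groovy truth: null and the empty string are both falsy.
    if not dir_number:
        return None
    match = re.search(r"0([0-9]+)", dir_number)  # first match, like matcher[0]
    if match is None:
        # Stand-in for the IndexOutOfBoundsException thrown by matcher[0] in Groovy.
        raise IndexError("no match for 0([0-9]+)")
    return "61" + match.group(1) + "#"           # group 1, like [0][1]


print(format_directory_number("0412345678"))
```

In other words, the script drops the leading 0 of the number, prefixes 61 and appends #.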

Regex in PostgreSQL

I'm ultimately trying to use the following regular expression.
SELECT *
into table
FROM table2
Where
(Description ~ '\bD\s*(&|AND|&AMP;|N|AMP|\*|\+)\s*B.*')
However this returns the following errors:
[XX000] ERROR: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: 'P;|N|AMP|>>>HERE>>>|+)sB.'. Detail: ----------------------------------------------- error: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: 'P;|N|AMP|>>>HERE>>>|+)sB.'. code: 8 ...
Any idea on the fix?
You should replace \b with \y (word boundary) or \m (start of a word) to fix the pattern. You may also move the single-character alternatives out of the capturing group's alternation into a character class, where they do not need escaping: (&|\*|\+) -> [*+&]. Note that you do not need .* at the end unless you need to consume the rest of the string; if you only test for a match with ~, it is unnecessary.
Use:
'\yD\s*(AND|&AMP;|N|AMP|[*+&])\s*B'
Demo:
CREATE TABLE tb1
(website character varying)
;
INSERT INTO tb1
(website)
VALUES
('D AND B...'),
('ROCK''N''ROLL'),
('www.google.com'),
('More text here'),
('D N Brother')
;
SELECT * FROM tb1 WHERE website ~ '\yD\s*(AND|&AMP;|N|AMP|[*+&])\s*B';
Output: the query returns the rows 'D AND B...' and 'D N Brother'.
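The pattern logic can also be checked outside the database. The sketch below uses Python's `re` module, where the word boundary is written `\b` instead of PostgreSQL's `\y`. The two engines differ in other details, so this only illustrates which sample rows the pattern selects.

```python
import re

# Python spells the word boundary \b; PostgreSQL spells it \y.
pattern = re.compile(r"\bD\s*(AND|&AMP;|N|AMP|[*+&])\s*B")

rows = [
    "D AND B...",
    "ROCK'N'ROLL",
    "www.google.com",
    "More text here",
    "D N Brother",
]

# Keep the rows where the pattern is found anywhere, like the ~ operator.
matching = [row for row in rows if pattern.search(row)]
print(matching)
```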

Regular expression with multiple endings

I have a pandas DataFrame like this:
idx name
1 "NM_014855.2(AP5Z1):c.80_83delGGATinsTGCTGTAAACTGTAACTGTAAA (p.Arg27_Ala362delinsLeuLeuTer)"
2 "NM_014630.2(ZNF592):c.3136G>A (p.Gly1046Arg)"
3 "NM_000410.3(HFE):c.892+48G>A"
4 "NC_000014.9:g.(31394019_31414809)_(31654321_31655889)del"
I need to extract whatever follows the ':' character, until any of the following:
" ("
"del"
{end of string}
I have tried the following:
df.str.extract(r"\):(.*) \(|\n")
But it doesn't work for all the cases.
How can I properly specify the condition I need?
Use a lazy quantifier, *?, to minimize how much .* captures, then list the stop conditions you're looking for:
df["name"].str.extract(r":(.*?)(?: \(|del|$)")
Quantifiers are greedy by default and match as much as possible; appending ? makes them lazy, so they match as little as possible.
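To see the difference between the lazy and greedy forms without a DataFrame, here is a small sketch using only the standard `re` module on the four sample names. It uses ` \(` (a space followed by a parenthesis) as the first stop condition, as the question describes. The same pattern string can be passed to `Series.str.extract`.

```python
import re

names = [
    "NM_014855.2(AP5Z1):c.80_83delGGATinsTGCTGTAAACTGTAACTGTAAA (p.Arg27_Ala362delinsLeuLeuTer)",
    "NM_014630.2(ZNF592):c.3136G>A (p.Gly1046Arg)",
    "NM_000410.3(HFE):c.892+48G>A",
    "NC_000014.9:g.(31394019_31414809)_(31654321_31655889)del",
]

# Lazy: stop at the first " (", "del", or end of string.
lazy = re.compile(r":(.*?)(?: \(|del|$)")
# Greedy, for comparison: .* runs to the end of the string, where $ is satisfied.
greedy = re.compile(r":(.*)(?: \(|del|$)")

lazy_results = [lazy.search(name).group(1) for name in names]
greedy_second = greedy.search(names[1]).group(1)

for result in lazy_results:
    print(result)
```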

Apply regular expression to the second word in a "|" separated string in a Flume interceptor config

My requirement is to apply regular expression to the data coming from kafka.
The data is as follow:
abc|def|mnq|xyz
abc1|def1|mnq1|xyz1
abc2|def2|mnq2|xyz2
I want to apply a regular expression to the second word, i.e. def, in the first string using a Flume interceptor.
The regular expression should filter words and decimal numbers.
Can someone help with this?
The following Python code matches the second word on every line:
import re
# Sample input: one pipe-separated record per line
parent = """abc|def|mnq|xyz
abc1|def1|mnq1|xyz1
abc2|def2|mnq2|xyz2"""
# With re.MULTILINE, ^ anchors at the start of each line; the group captures the second field
pattern = re.compile(r"^[^|\n]*\|([^|\n]*)", re.MULTILINE)
m = pattern.findall(parent)
print(m)
which outputs:
['def', 'def1', 'def2']
Note: '|' is a regex metacharacter (alternation), so escape it as '\|' to match a literal pipe.
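If the goal is to filter records by their second field, splitting on the separator first keeps the regex simple. The sketch below is not Flume configuration. It only illustrates the check in Python, and the definition of "words and decimal numbers" (`\w+` or digits with a fractional part) is an assumption, as is the helper name `second_field_matches`.

```python
import re

# Assumption: a "word" is \w+ and a "decimal number" is digits, a dot, then digits.
FIELD_OK = re.compile(r"\w+|\d+\.\d+")


def second_field_matches(line, sep="|"):
    """Return True if the second sep-separated field is a word or a decimal number."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 2:
        return False
    # fullmatch requires the whole field to match, not just a prefix.
    return FIELD_OK.fullmatch(fields[1]) is not None


for line in ["abc|def|mnq|xyz", "abc|3.14|mnq|xyz", "abc|d-f|mnq|xyz"]:
    print(line, second_field_matches(line))
```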

Perl regular expression to match input for port numbers

I would like to write a regular expression that would only accept valid input that would qualify as a port number. I want to only accept input for the characters 0-9 and no special characters. The number should not be longer than 5 characters.
I read user input using this method.
my $port_number = <>;
I know that the regular expression should look something like this.
^[0-9]*$
How do I combine the regular expression with the reading of the command line input without using an if statement?
Try this code:
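my $result = ($port_number =~ m/^[0-9]{1,5}$/);
$result will be set to 1 if $port_number matches your criteria, and to an empty string (which is false) otherwise. The $ anchor also matches before a trailing newline, so the input does not need to be chomped first.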
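One caveat: a five-digit pattern still accepts values such as 99999, which is above the highest port number, 65535. A minimal Python sketch of the same check with an added range test follows. The function name `is_valid_port` is hypothetical, and treating 0 as acceptable is an assumption carried over from the regex.

```python
import re


def is_valid_port(text):
    """Return True if text is 1 to 5 digits and no greater than 65535."""
    text = text.strip()  # drop the trailing newline left by reading a line of input
    if re.fullmatch(r"[0-9]{1,5}", text) is None:
        return False
    return int(text) <= 65535


for candidate in ["8080\n", "99999", "12a", ""]:
    print(repr(candidate), is_valid_port(candidate))
```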