Python regex match n lines after match

I have a pattern that works fine on regexr.com with PCRE, but when I use it in Python it doesn't match anything.
The pattern is:
.*(?<=RSA SHA256:).*(?:.*\n){3}.*
It matches the data on the website, but when I run it in my Python script it doesn't.
The goal is to match the Accepted publickey line and the next 3 lines.
Thank you!
Script below:
import re
Accepted_publickey=r'.*(?<=RSA SHA256:).*(?:.*\n){3}.*'
file=open('secure')
for items in file:
    re1=re.search(Accepted_publickey,items)
    if re1:
        print(re1.group())
The actual data is:
Oct 21 17:27:21 localhost sshd[19772]: Accepted publickey for vagrant from 192.168.2.140 port 54614 ssh2: RSA SHA256:uDsE4ecSD9ElWQ5Q0fdMsbqEzOe0Hszilv8xhU6dT6M
Oct 21 17:27:22 localhost sshd[19772]: pam_unix(sshd:session): session opened for user vagrant by (uid=0)
Oct 21 17:27:22 localhost sshd[19772]: User child is on pid 19774
Oct 21 17:27:22 localhost sshd[19774]: Starting session: shell on pts/2 for vagrant from 192.168.2.140 port 54614 id 0

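You don't have to use a lookbehind; you can match the value directly.
To match the 3 following lines, you could swap the newline and .* inside the group, which lets you omit the trailing .*
^.*\bRSA SHA256:.*(?:\n.*){3}
^ Start of the string (or of a line when re.M is used)
.*\bRSA SHA256:.* Match the whole line containing RSA SHA256:, which must be preceded by a word boundary
(?:\n.*){3} Repeat 3 times: a newline followed by any characters except a newline
Your loop passes one line at a time to re.search, so a pattern that spans several lines can never match. In your code you might use read() to search the whole file at once: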
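import re

# Match the "RSA SHA256:" line plus the 3 lines that follow it
Accepted_publickey = r'^.*\bRSA SHA256:.*(?:\n.*){3}'
with open('secure') as f:
    items = f.read()  # read the whole file so the pattern can span lines
re1 = re.search(Accepted_publickey, items, re.M)
if re1:
    print(re1.group())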
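As a quick check of the pattern ^.*\bRSA SHA256:.*(?:\n.*){3}, here is a minimal sketch that runs it against the sample log lines from the question, held in an in-memory string instead of the secure file. The names LOG, PATTERN and find_blocks are made up for this illustration, and re.finditer is used on the assumption that you may want every matching block rather than only the first.

```python
import re

# Sample data copied from the question; in practice this would come from f.read().
LOG = """\
Oct 21 17:27:21 localhost sshd[19772]: Accepted publickey for vagrant from 192.168.2.140 port 54614 ssh2: RSA SHA256:uDsE4ecSD9ElWQ5Q0fdMsbqEzOe0Hszilv8xhU6dT6M
Oct 21 17:27:22 localhost sshd[19772]: pam_unix(sshd:session): session opened for user vagrant by (uid=0)
Oct 21 17:27:22 localhost sshd[19772]: User child is on pid 19774
Oct 21 17:27:22 localhost sshd[19774]: Starting session: shell on pts/2 for vagrant from 192.168.2.140 port 54614 id 0
"""

# The "RSA SHA256:" line, then 3 times "newline + rest of that line".
PATTERN = re.compile(r'^.*\bRSA SHA256:.*(?:\n.*){3}', re.M)


def find_blocks(text):
    """Return each matched line plus the 3 lines after it, as lists of lines."""
    return [m.group().split('\n') for m in PATTERN.finditer(text)]


blocks = find_blocks(LOG)
for block in blocks:
    print('\n'.join(block))
```

Feeding the same pattern one line at a time, as the loop in the question does, finds nothing, because no single line contains three newlines.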

Related

Regex Stop at the First Occurrence of a Word

How would I change my Regex to stop after the first match of the word?
My text is:
-rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123/dir/to/file.txt
There is a variable called owner, the first arg from cmd:
owner=$1
My regex is: ^.*${owner}
My match ends up being:
-rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123
But I only want it to be: -rwxr--r-- 1 bob123.

Add a question mark: ^.*?${owner}. This makes the * quantifier non-greedy. Non-greedy quantifiers need Perl-compatible regular expressions, so with grep use the -P option: grep -P.
https://regex101.com/r/ThGpcq/1.
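The difference between the greedy and the non-greedy quantifier can also be checked outside the shell. The following is a small Python sketch, not the asker's script: the variable names are illustrative, and re.escape is an addition that guards against an owner name containing regex metacharacters.

```python
import re

text = '-rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123/dir/to/file.txt'
owner = 'bob123'

# Greedy: .* runs to the end of the line and backtracks to the LAST "bob123".
greedy = re.search(r'^.*' + re.escape(owner), text).group()

# Non-greedy: .*? grows one character at a time and stops at the FIRST "bob123".
lazy = re.search(r'^.*?' + re.escape(owner), text).group()

print(greedy)  # -rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123
print(lazy)    # -rwxr--r-- 1 bob123
```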

You do not need a regex here, use string manipulation and concatenation:
text='-rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123/dir/to/file.txt'
owner='bob123'
echo "${text%%$owner*}$owner"
# => -rwxr--r-- 1 bob123
${text%%$owner*} removes the longest suffix matching the pattern $owner* (the %% selects the longest match), that is, everything from the first occurrence of $owner to the end of the string. Since that also removes $owner itself, appending $owner after the expansion adds it back.
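For comparison, the same "cut at the first occurrence, then keep the owner" idea can be sketched in Python with str.partition. This is only an analogue of the Bash expansion, not a translation of it: if owner does not occur at all, this version returns text unchanged, whereas the Bash line would append $owner to the full text.

```python
text = '-rwxr--r-- 1 bob123 bob123 0 Nov 10 22:48 /path/for/bob123/dir/to/file.txt'
owner = 'bob123'

# partition() splits at the FIRST occurrence of owner:
# head is everything before it, sep is the owner itself ('' if not found).
head, sep, _tail = text.partition(owner)
result = head + sep

print(result)
```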

Graylog regex extract first valid Mac Address in message

I am trying to extract the first valid common MAC address out of several different message entries in Graylog. I can do it with different Grok extractors, but I want to do it with regex so I can convert the MAC to all lower case. Below are some sample messages and the Grok patterns that work.
Question: how would I convert these Grok extractors to regex, and/or is there a single regex that would work in all 4 examples? Basically the regex would just need to match the first valid MAC address in each string and extract it.
Sample 1:
Equinox: *spamApTask1: Mar 20 15:26:04.033: #CAPWAP-3-ECHO_ERR: capwap_ac_sm.c:7019 Did not receive heartbeat reply; AP: 00:3a:9a:48:9b:40
Sample 2:
Equinox: *spamReceiveTask: Mar 17 12:34:39.264: #CAPWAP-3-DTLS_CONN_ERR: capwap_ac.c:934 00:3a:9a:30:f5:90: DTLS connection not found forAP 192.168.99.74 (43456), Controller: 192.168.99.2 (5246) send packet
Sample 3:
Equinox: *spamApTask1: Mar 22 08:35:14.562: #LWAPP-4-SIG_INFO1: spam_lrad.c:44474 Signature information; AP 00:14:1b:61:f8:40, alarm ON, standard sig NULL probe resp 1, track per-Macprecedence 2, hits 1, slot 0, channel 1, most offending MAC 00:00:00:00:00:00 #yes but must make Mac lowercase
Sample 4:
Equinox: *idsTrackEventTask: Mar 22 08:40:13.816: #WPS-4-SIG_ALARM_OFF: sig_event.c:656 AP 00:14:1B:61:F8:40 : Alarm OFF, standard sig NULL probe resp 1, track=per-Mac preced=2 hits=1 slot=0 channel=1 yes but must make Mac lowercase
Sample 1 Grok pattern: %{GREEDYDATA}AP: %{COMMONMAC:WLC_APBaseMac}
Sample 2 Grok pattern: %{GREEDYDATA}capwap_ac.c:934 %{COMMONMAC:WLC_APBaseMac}
Sample 3 Grok pattern: %{GREEDYDATA}AP %{COMMONMAC:WLC_APBaseMac}
Sample 4 Grok pattern: %{GREEDYDATA}AP %{COMMONMAC:WLC_APBaseMac}

You can build a pattern that matches 5 groups of 2 hex digits, each followed by a colon, and then a sixth and final group of 2 hex digits:
(?i)(?:[0-9a-f]{2}:){5}[0-9a-f]{2}
The (?i) at the start makes the search case-insensitive.
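To illustrate the extraction together with the lowercase conversion the question asks for, here is a minimal Python sketch. It is not Graylog configuration: the helper name first_mac and the samples list are assumptions made for this example, and the messages are samples 1, 2 and 4 from the question (the trailing annotation on sample 4 is left out).

```python
import re

# Same pattern as above: five "hh:" groups followed by a final "hh".
MAC_RE = re.compile(r'(?i)(?:[0-9a-f]{2}:){5}[0-9a-f]{2}')


def first_mac(message):
    """Return the first MAC address found in message, lowercased, or None."""
    m = MAC_RE.search(message)
    return m.group().lower() if m else None


samples = [
    "Equinox: *spamApTask1: Mar 20 15:26:04.033: #CAPWAP-3-ECHO_ERR: "
    "capwap_ac_sm.c:7019 Did not receive heartbeat reply; AP: 00:3a:9a:48:9b:40",
    "Equinox: *spamReceiveTask: Mar 17 12:34:39.264: #CAPWAP-3-DTLS_CONN_ERR: "
    "capwap_ac.c:934 00:3a:9a:30:f5:90: DTLS connection not found forAP "
    "192.168.99.74 (43456), Controller: 192.168.99.2 (5246) send packet",
    "Equinox: *idsTrackEventTask: Mar 22 08:40:13.816: #WPS-4-SIG_ALARM_OFF: "
    "sig_event.c:656 AP 00:14:1B:61:F8:40 : Alarm OFF, standard sig NULL probe "
    "resp 1, track=per-Mac preced=2 hits=1 slot=0 channel=1",
]

for message in samples:
    print(first_mac(message))
```

Because search() stops at the first match, later addresses in the same message (such as the all-zero "most offending MAC" in sample 3) are ignored.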
UPDATED
If the above regex does not work in Graylog then you can try the very basic form of it, where all the quantifiers and character sets are expanded:
[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]

How to match a regex pattern group for fixed no of time?

I wrote a regex that matches the group, but I want to limit the search so that it doesn't keep looking for matches until the end and stops once n groups are found.
My log:
[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] [pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]
My pattern:
([^\[\]]+)
The above pattern matches all of them, but I want to capture only the first 2, i.e. [Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice]
Something like:
([^\[\]]+){2}
Repeating individual characters or words works, e.g. (abc){2}, but how about a group?
Is it possible?

Your pattern ([^\[\]]+) uses a negated character class that matches any character except [ and ], so it does not take the structure of the opening and closing brackets into account.
Repeating it with {2} repeats the group, which results in one match and one capturing group that contains only the text of the last iteration.
If you want the first 2 bracketed fields from the start of the string, you could use an anchor ^ to assert the start of the string and use 2 capturing groups separated by a space, each of the form (\[[^]]+\]), which includes the opening and closing square brackets.
^(\[[^]]+\]) (\[[^]]+\])
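Both points can be checked with a short Python sketch (the question does not name a language, so Python's re module is an assumption here, as are the variable names). It first shows that a repeated capturing group keeps only its last iteration, then applies the anchored two-group pattern to the log line from the question.

```python
import re

log = ('[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] '
       '[pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]')

# Repeating a capturing group keeps only the text of its LAST iteration,
# so group 1 is just a fragment of the overall match.
repeated = re.search(r'([^\[\]]+){2}', log)
print(repeated.group())
print(repeated.group(1))

# Two explicit groups anchored at the start keep both bracketed fields.
m = re.match(r'^(\[[^]]+\]) (\[[^]]+\])', log)
first, second = m.groups()
print(first)
print(second)
```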

Try this. You just need to handle the one extra whitespace character at the end of the match.
const str = "[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] [pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]";
const regex = /(?:\[.+?\] ){2}/;
console.log(str.match(regex));

Regex - Match first two words but drop second word if it contains variables

How do I match the first two words from the log description, but drop the second word if it contains a # and numbers?
Logs:
<14>Dec 19 08:48:44 Xwsdedserfse11 httpd: [century]: Tue, 19 Dec 2017 08:48:44 -0800|JohnnyDoe|auth|INFO|1|Successful login for 'JohnnyDoe' from 1.1.1.1 (authentication type: ldap)
<12>Dec 19 08:25:18 Xwsdedserfse11 php: [century]: Tue, 19 Dec 2017 08:25:18 -0800||error|WARNING|1|Query #145050 used to generate source data is inactive.
My match should be
Successful login
Query
I have been working with different variation of this (?:[^\|]*\|){5}(\S+\s)(\S+)\s which pulls the first two words
and (?:[^\|]*\|){5}(\S+\s)([a-zA_Z]+)\s\
and (?:[^\|]*\|){5}(\S+\s)([^#0-9]+)\s but this one doesn't pull Query from the second log.

You can use the following regex: (?:[^\|]*\|){5}(\S+)\s(?:[#\d]+|([a-zA-Z]+))\s
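Here is a minimal Python sketch of how the two capture groups of that regex could be combined into the wanted result. The helper name description_words is hypothetical, and Python is an assumption since the question does not say which engine is used. The second group only participates when the second word is purely alphabetic; otherwise it is None and is skipped.

```python
import re

PATTERN = re.compile(r'(?:[^\|]*\|){5}(\S+)\s(?:[#\d]+|([a-zA-Z]+))\s')

logs = [
    "<14>Dec 19 08:48:44 Xwsdedserfse11 httpd: [century]: Tue, 19 Dec 2017 "
    "08:48:44 -0800|JohnnyDoe|auth|INFO|1|Successful login for 'JohnnyDoe' "
    "from 1.1.1.1 (authentication type: ldap)",
    "<12>Dec 19 08:25:18 Xwsdedserfse11 php: [century]: Tue, 19 Dec 2017 "
    "08:25:18 -0800||error|WARNING|1|Query #145050 used to generate source "
    "data is inactive.",
]


def description_words(line):
    """Return the first word, plus the second word when it is alphabetic."""
    m = PATTERN.search(line)
    if not m:
        return None
    # Group 2 is None when the second word was consumed by the [#\d]+ branch.
    return ' '.join(g for g in m.groups() if g)


results = [description_words(line) for line in logs]
print(results)
```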

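You can use this regex (it relies on variable-length lookbehind, which only some engines such as .NET support; Python's re module, for example, rejects it):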
(?<=[^\|]+?\|)([a-zA-Z]+?\s(?:[^#]+?\s)?)(?<=[^\|]*)

Extract Name out of Log ( Regex, If/else)

I have different log entries in my logfile and want to extract the hostname.
My current regex looks like \[[^:]* (but it does not work very well).
What I want:
hostA
hostB
hostC
hostD
Log Example
Dec 22 12:15:40 0.0.0.0 [hostA: some.text]:
Dec 22 12:15:40 0.0.0.0 [loremipsumdolor#hostB: some.text]:
Dec 22 12:15:40 0.0.0.0 [hostC: some.text]:
Dec 22 12:15:40 0.0.0.0 [sometext#hostD: some.text]:

You can use either
\[([^\]#]*#)?(?P<host>.*?):
to capture the host name in the named group host,
or
(?<=[\[#])[^#]*?(?=:)
to match only the host name.
The first pattern matches text after the first [ character, skipping everything up to the next # if it exists.
The second pattern will pick up anything between a [ or # and a :, so it's a little more likely to produce false positives I think.
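Both patterns can be tried against the four log lines from the question with a short Python sketch. Python is an assumption here, suggested by the (?P<host>...) named-group syntax; the variable names are illustrative only.

```python
import re

lines = [
    "Dec 22 12:15:40 0.0.0.0 [hostA: some.text]:",
    "Dec 22 12:15:40 0.0.0.0 [loremipsumdolor#hostB: some.text]:",
    "Dec 22 12:15:40 0.0.0.0 [hostC: some.text]:",
    "Dec 22 12:15:40 0.0.0.0 [sometext#hostD: some.text]:",
]

# Pattern 1: optional "prefix#" after the bracket, host in a named group.
named = re.compile(r'\[([^\]#]*#)?(?P<host>.*?):')

# Pattern 2: the whole match is the host, delimited by lookarounds.
lookaround = re.compile(r'(?<=[\[#])[^#]*?(?=:)')

hosts_named = [named.search(line).group('host') for line in lines]
hosts_lookaround = [lookaround.search(line).group() for line in lines]

print(hosts_named)
print(hosts_lookaround)
```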
