I have different log entries in my logfile and want to extract the hostname.
My current regex looks like: \[[^:]* (but it does not work very well).
What I want:
hostA
hostB
hostC
hostD
Log Example
Dec 22 12:15:40 0.0.0.0 [hostA: some.text]:
Dec 22 12:15:40 0.0.0.0 [loremipsumdolor#hostB: some.text]:
Dec 22 12:15:40 0.0.0.0 [hostC: some.text]:
Dec 22 12:15:40 0.0.0.0 [sometext#hostD: some.text]:
You can use either
\[([^\]#]*#)?(?P<host>.*?):
to capture the host name in the named group host.
or
(?<=[\[#])[^#]*?(?=:)
to match only the host name.
The first pattern matches text after the first [ character, skipping everything up to the next # if it exists.
The second pattern picks up anything between a [ or # and the next :, so I think it is somewhat more likely to produce false positives.
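As a quick check, here is a minimal Python sketch that runs both patterns over the four sample lines from the question. The names NAMED, LOOKAROUND and sample are made up for this illustration, and the (?P<host>...) named-group syntax is the one Python's re module accepts.

```python
import re

# Pattern 1: optional "prefix#" after "[", then the host in a named group.
NAMED = re.compile(r"\[([^\]#]*#)?(?P<host>.*?):")
# Pattern 2: lookarounds only, so the whole match is the host name itself.
LOOKAROUND = re.compile(r"(?<=[\[#])[^#]*?(?=:)")

sample = """\
Dec 22 12:15:40 0.0.0.0 [hostA: some.text]:
Dec 22 12:15:40 0.0.0.0 [loremipsumdolor#hostB: some.text]:
Dec 22 12:15:40 0.0.0.0 [hostC: some.text]:
Dec 22 12:15:40 0.0.0.0 [sometext#hostD: some.text]:
"""

lines = sample.splitlines()
hosts_named = [m.group("host") for m in map(NAMED.search, lines) if m]
hosts_lookaround = [m.group() for m in map(LOOKAROUND.search, lines) if m]

print(hosts_named)
print(hosts_lookaround)
```

Both lists should contain the same four host names for this sample; the patterns can diverge on lines that contain additional # or : characters.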
Related
I have a pattern that works fine on regexr.com with PCRE, but when I use it with Python it doesn't match anything.
the pattern is:
.*(?<=RSA SHA256:).*(?:.*\n){3}.*
It matches the data on the website, but when I run it in my Python script it doesn't.
The goal is to match the Accepted publickey line and the next 3 lines.
Thank you!
script below:
import re
Accepted_publickey=r'.*(?<=RSA SHA256:).*(?:.*\n){3}.*'
file=open('secure')
for items in file:
    re1=re.search(Accepted_publickey,items)
    if re1:
        print(re1.group())
The actual data is:
Oct 21 17:27:21 localhost sshd[19772]: Accepted publickey for vagrant from 192.168.2.140 port 54614 ssh2: RSA SHA256:uDsE4ecSD9ElWQ5Q0fdMsbqEzOe0Hszilv8xhU6dT6M
Oct 21 17:27:22 localhost sshd[19772]: pam_unix(sshd:session): session opened for user vagrant by (uid=0)
Oct 21 17:27:22 localhost sshd[19772]: User child is on pid 19774
Oct 21 17:27:22 localhost sshd[19774]: Starting session: shell on pts/2 for vagrant from 192.168.2.140 port 54614 id 0
You don't have to use a lookbehind; you can match the value directly.
To match the 3 following lines, swap the order of the newline and .* inside the group, which lets you drop the trailing .*
^.*\bRSA SHA256:.*(?:\n.*){3}
^ Start of the string (or of each line when the multiline flag is set)
.*\bRSA SHA256:.* Match RSA SHA256: anywhere in the line, preceded by a word boundary
(?:\n.*){3} Repeat 3 times: a newline followed by any characters except a newline
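Iterating over the file object yields one line at a time, so a pattern that spans several lines can never match. In your code you might use read() and search the whole text instead:
import re

Accepted_publickey = r'^.*\bRSA SHA256:.*(?:\n.*){3}'
with open('secure') as f:
    items = f.read()  # whole file as one string, so the match can span lines
re1 = re.search(Accepted_publickey, items, re.M)
if re1:
    print(re1.group())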
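To see the difference without needing the secure file, here is a small self-contained sketch that uses the four lines from the question as an inline string. The names log, per_line_hits and whole_text_match are only for illustration.

```python
import re

pattern = re.compile(r'^.*\bRSA SHA256:.*(?:\n.*){3}', re.M)

log = (
    "Oct 21 17:27:21 localhost sshd[19772]: Accepted publickey for vagrant "
    "from 192.168.2.140 port 54614 ssh2: RSA SHA256:uDsE4ecSD9ElWQ5Q0fdMsbqEzOe0Hszilv8xhU6dT6M\n"
    "Oct 21 17:27:22 localhost sshd[19772]: pam_unix(sshd:session): session opened for user vagrant by (uid=0)\n"
    "Oct 21 17:27:22 localhost sshd[19772]: User child is on pid 19774\n"
    "Oct 21 17:27:22 localhost sshd[19774]: Starting session: shell on pts/2 for vagrant "
    "from 192.168.2.140 port 54614 id 0\n"
)

# Searching line by line (what a for loop over the file does): no line
# contains three more newlines, so nothing can match.
per_line_hits = [line for line in log.splitlines(keepends=True) if pattern.search(line)]

# Searching the whole text at once: the match spans four lines.
whole_text_match = pattern.search(log)

print(len(per_line_hits))
print(whole_text_match.group())
```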
I wrote a regex that matches each bracketed group, but I want to limit the search so that it stops once n groups have been found instead of scanning to the end of the line.
My log --
[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] [pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]
My pattern-
([^\[\]]+)
The above pattern matches all of the groups, but I want only the first 2, i.e. [Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice]
Something like:
([^\[\]]+){2}
Repeating individual characters or words works, e.g. (abc){2}, but what about a group like this?
Is it possible?
Your pattern ([^\[\]]+) uses a negated character class that matches any character except [ and ], so it does not take the structure of the opening and closing brackets into account.
Repeating it with {2} repeats the group, which still produces a single match, and the capturing group only keeps the text of its last iteration.
If you want the first 2 bracketed parts from the start of the string, use an anchor ^ to assert the start of the string, followed by 2 capturing groups separated by a space. Each group is (\[[^]]+\]), which includes the opening and closing square brackets.
^(\[[^]]+\]) (\[[^]]+\])
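A short Python sketch of both behaviours, using the log line from the question. The variable names are illustrative only, and the details of how a repeated group is reported can differ between regex engines; this shows what Python's re module does.

```python
import re

line = ("[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] "
        "[pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]")

# Quantifying the group: one match, and group 1 holds only the last iteration.
repeated = re.search(r"([^\[\]]+){2}", line)
print(repeated.group(0))
print(repeated.group(1))

# Two explicit groups anchored at the start: one group per bracketed part.
two_groups = re.search(r"^(\[[^]]+\]) (\[[^]]+\])", line)
print(two_groups.groups())
```

Here the quantified version has to split the first bracket's content between two iterations, so group 1 ends up holding just the final character of that text.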
Try this. You just need to handle the one extra whitespace character at the end of the match.
const str = "[Mon Feb 27 15:40:12.341031 2017] [auth_digest:notice] [pid 2420:tid 332] [AH01757: generating secret for digest authentication ...]";
const regex = /(?:\[.+?\] ){2}/;
console.log(str.match(regex));
Hi, I am trying to achieve the following.
[root@WEBSERVER]# ll /dev/disk/by-id/scsi-* | grep sdd
lrwxrwxrwx. 1 root root 9 Oct 25 15:26 /dev/disk/by-id/scsi-1234567891123455 -> ../../sdd
I want to assign "/dev/disk/by-id/scsi-1234567891123455" from the above output to a variable.
Ansible:
- name: Capture output
  command: ll /dev/disk/by-id/scsi-* | grep sdd
  register: lsblkoutput
Now I want to query lsblkoutput and get /dev/disk/by-id/scsi-1234567891123455
Thanks,
I have no knowledge of Ansible, but I have created a regex that matches what you want.
However, since I don't know Ansible, I don't know whether its regex engine supports this pattern:
lrwxrwxrwx[^/]*(.*(?=\s->))
The regex matches the word 'lrwxrwxrwx', followed by zero or more characters that are not a slash '/'. From the first slash onward it captures characters up to the point where the lookahead sees whitespace followed by '->'.
The value you're looking for is in capture group 1.
Hope you can use this in Ansible.
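The pattern can at least be checked outside Ansible. The following is a minimal Python sketch against the ls line from the question; ls_line and device_path are illustrative names, and how the result would be wired into a playbook is not covered here.

```python
import re

ls_line = ("lrwxrwxrwx. 1 root root 9 Oct 25 15:26 "
           "/dev/disk/by-id/scsi-1234567891123455 -> ../../sdd")

# Group 1 starts at the first "/" and stops right before " ->".
match = re.search(r"lrwxrwxrwx[^/]*(.*(?=\s->))", ls_line)
device_path = match.group(1) if match else None

print(device_path)
```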
How do I match the first two words of the log description, but skip the second word if it contains a # and numbers?
Logs:
<14>Dec 19 08:48:44 Xwsdedserfse11 httpd: [century]: Tue, 19 Dec 2017 08:48:44 -0800|JohnnyDoe|auth|INFO|1|Successful login for 'JohnnyDoe' from 1.1.1.1 (authentication type: ldap)
<12>Dec 19 08:25:18 Xwsdedserfse11 php: [century]: Tue, 19 Dec 2017 08:25:18 -0800||error|WARNING|1|Query #145050 used to generate source data is inactive.
My match should be
Successful login
Query
I have been working with different variations of this: (?:[^\|]*\|){5}(\S+\s)(\S+)\s which pulls the first two words
and (?:[^\|]*\|){5}(\S+\s)([a-zA_Z]+)\s\
and (?:[^\|]*\|){5}(\S+\s)([^#0-9]+)\s but this one doesn't pull Query from the second log.
You can use the next regex: (?:[^\|]*\|){5}(\S+)\s(?:[#\d]+|([a-zA-Z]+))\s
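A minimal Python sketch of how the pattern above behaves on the two log lines from the question. The helper first_words is a made-up name; it joins whichever of the two groups actually captured something, since group 2 stays empty when the second word is a #number token.

```python
import re

PATTERN = re.compile(r"(?:[^\|]*\|){5}(\S+)\s(?:[#\d]+|([a-zA-Z]+))\s")

logs = [
    "<14>Dec 19 08:48:44 Xwsdedserfse11 httpd: [century]: Tue, 19 Dec 2017 08:48:44 -0800"
    "|JohnnyDoe|auth|INFO|1|Successful login for 'JohnnyDoe' from 1.1.1.1 (authentication type: ldap)",
    "<12>Dec 19 08:25:18 Xwsdedserfse11 php: [century]: Tue, 19 Dec 2017 08:25:18 -0800"
    "||error|WARNING|1|Query #145050 used to generate source data is inactive.",
]

def first_words(line):
    """Return the first word, plus the second one if it is purely alphabetic."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return " ".join(word for word in m.groups() if word)

results = [first_words(line) for line in logs]
print(results)
```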
You can use this regex:
(?<=[^\|]+?\|)([a-zA-Z]+?\s(?:[^#]+?\s)?)(?<=[^\|]*)
I have some data like this
Wed Mar 18 15:16:10 2015 eth0:1 109.224.232.219 up (not currently mapped)
Wed Mar 18 15:18:12 2015 eth0:1 109.224.232.219 down (not responding)
Wed Mar 18 15:20:46 2015 eth0:1 109.224.232.219 up (not currently mapped)
Wed Mar 18 15:22:52 2015 eth0:1 109.224.232.219 down (not responding)
Wed Mar 18 15:24:26 2015 eth0:1 109.224.232.219 up (not currently mapped)
I am trying to capture the IP and the date string on each line. I thought I could capture everything before the word eth and then apply my IP check, but this isn't working. Have I misunderstood the concept of capture groups?
Is there a sensible way to get this data with a single regex?
(^(.*?)eth)(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
Any help would be appreciated.
This is an image of the regex currently
https://www.debuggex.com/i/BaXnqh2DzRhUCph8.png
You're almost there. You just need to add .*? after eth so that it matches the characters between eth and the IP address.
^(.*?)eth.*?\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
If you don't want the space before eth to be captured by group 1, you could change your regex like this:
^(.*?)\s+eth.*?\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
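For reference, a minimal Python sketch that applies the second pattern to the sample data from the question. PATTERN, data and pairs are illustrative names.

```python
import re

PATTERN = re.compile(r"^(.*?)\s+eth.*?\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

data = """\
Wed Mar 18 15:16:10 2015 eth0:1 109.224.232.219 up (not currently mapped)
Wed Mar 18 15:18:12 2015 eth0:1 109.224.232.219 down (not responding)
Wed Mar 18 15:20:46 2015 eth0:1 109.224.232.219 up (not currently mapped)
Wed Mar 18 15:22:52 2015 eth0:1 109.224.232.219 down (not responding)
Wed Mar 18 15:24:26 2015 eth0:1 109.224.232.219 up (not currently mapped)
"""

# Group 1 is the date string, group 2 the IP address.
pairs = [m.groups() for m in map(PATTERN.search, data.splitlines()) if m]

for date_string, ip in pairs:
    print(date_string, ip)
```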
People often overlook how well-defined a sequence of characters a dotted-decimal IP address is. I have almost no trouble identifying an IP when I spell out a proper IP octet in full.
my $octet = qr/\b(?:0|1\d{0,2}|2(?:[0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)\b/;
( my $foctet = "$octet" ) =~ s/0[|]//;  # first octet: same alternatives minus the leading 0
On top of that, I specify that an IP address is a set of four octets separated by dots.
my $ip_regex = qr/($foctet(\.$octet){3})/;
This little beauty will almost always pull anything that is a valid IP out of any file.
Similarly, dates can be specified more precisely. Again, with a specification this tight, what you match will almost inevitably be a date:
my $dow = qr/\b(?:Fri|Mon|Sat|Sun|Thu|Tue|Wed)\b/;
my $mon = qr/\b(?:Apr|Aug|Dec|Feb|Jan|Jul|Jun|Mar|May|Nov|Oct|Sep)\b/;
my $day = qr/\b(?:[012]\d?|3[01]?|[4-9])\b/;
my $hr24 = qr/\b(?:[01]\d?|2[0-3])\b/;
my $minsec = qr/\b(?:[0-5]\d)\b/;
my $datetime_regex = qr/$dow\s+$mon\s+$day\s+$hr24:$minsec:$minsec\s+\d+/;
So by simply running both regexes against the source line, you get what you want without much backtracking.
my @date_parts = $line =~ /$datetime_regex/;
my ( $ip ) = $line =~ /$ip_regex/;
In fact, if performance is a concern, I saw numerous failed match attempts in the single regex with the non-greedy match, whereas the IP regex succeeds on the first try. The regex engine finds '.' at offset 35 and starts back at position 32.
The following, however, matches both without a single failed attempt. It is an indication of how much it helps to tailor your expressions to the expected range of data:
my ( $dt, $ip ) = m/($datetime_regex)\s+eth\d:\d+\s+($ip_regex)/;
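For readers who don't use Perl, here is a rough Python translation of the same idea, offered as a sketch rather than a drop-in equivalent. The constant names are mine, and the IP sub-pattern uses non-capturing groups so that the combined regex has exactly two capture groups.

```python
import re

# One IP octet (0-255), and a first octet that may not be a bare 0.
OCTET = r"\b(?:0|1\d{0,2}|2(?:[0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)\b"
FIRST_OCTET = OCTET.replace("0|", "", 1)
IP = rf"{FIRST_OCTET}(?:\.{OCTET}){{3}}"

# Building blocks for a "Wed Mar 18 15:16:10 2015" style timestamp.
DOW = r"\b(?:Fri|Mon|Sat|Sun|Thu|Tue|Wed)\b"
MON = r"\b(?:Apr|Aug|Dec|Feb|Jan|Jul|Jun|Mar|May|Nov|Oct|Sep)\b"
DAY = r"\b(?:[012]\d?|3[01]?|[4-9])\b"
HR24 = r"\b(?:[01]\d?|2[0-3])\b"
MINSEC = r"\b(?:[0-5]\d)\b"
DATETIME = rf"{DOW}\s+{MON}\s+{DAY}\s+{HR24}:{MINSEC}:{MINSEC}\s+\d+"

# Combined pattern: group 1 is the timestamp, group 2 the IP address.
LINE = re.compile(rf"({DATETIME})\s+eth\d:\d+\s+({IP})")

line = "Wed Mar 18 15:16:10 2015 eth0:1 109.224.232.219 up (not currently mapped)"
m = LINE.search(line)
dt, ip = m.groups() if m else (None, None)

print(dt)
print(ip)
```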