Negating from symbol until the previous space - regex

Trying to use logstash grok filters (oniguruma regex) to filter some logs. For a log entry that looks like this:
2019-03-24 17:57:14,202 p=19455 u=root | TASK [this is the task name msg=Debug message] ************************
I have written this filter:
%{DATE:date}\s%{TIME:time}\sp=(?<id>[\d]+)\su=(?<user>[\w]+)\s\|\s*TASK\s*\[(?<task>[^=]*)
Difficulty for me here, is that I need to match "task" label to exactly this:
"this is the task name". At this time "task" matches ""this is the task name msg".
And of course, this is only an example and the words themselves change from example to example.
This is an ansible log, which for some reason mixes the task name and the tasks themselves in the same log line and only uses spaces to separate them. In all the cases, I know that the task name has finished and the task details are showing, because of the "=" symbol.
So I would need to match until a "=" is found, and then negate the word behind it, in this case is "msg" (depending on the task, this word could also change).
Any ideas how to accomplish this? Thanks!

You may use
%{DATE:date}\s%{TIME:time}\su=(?<user>\w+)\s\|\s*TASK\s*\[(?<task>[^\]=]*)\s\w+=
See the regex demo
The (?<task>[^\]=]*)\s\w+= part is of interest:
(?<task>[^\]=]*) - Group named "task": [^\]=]* matches any 0+ chars other than ] and =
\s - one whitespace
\w+ - 1+ word chars
= - a = char

Related

Regex (grok) - create general pattern for log which occurs but don't have to

I am sorry for enigmatic topic title, but I did not know how to put it correctly.
These are log types:
{vpnclient} Client[10.10.10.10:54576](11764): sending R_KEYCHANGE message
{vpnclient} Client[10.10.10.10:54576](16031): sending R_IPCONFIG message - client IP = 172.11.11.11/255.255.255.0, CEP = 3600 s, DNS = 172.11.1.101, 172.11.1.102
And this is my grok pattern:
^{vpnclient} %{WORD}\[%{IP:[client][ip]}:%{NUMBER:[source][port]}\]\(%{INT:[process][pid]}\): %{GREEDYDATA:message} (:?%{GREEDYDATA:kv_vpn_message})
What i want to do is forward log after hyphen (so - client IP) to kv filter.
My problem is - this type of log does not occur always, so i want to wrap the whole grok pattern, so it matches until %{GREEDYDATA:message} and also %{GREEDYDATA:kv_vpn_message}, but only when it occurs.
You can use
^{vpnclient} %{WORD}\[%{IP:[client][ip]}:%{NUMBER:[source][port]}\]\(%{INT:[process][pid]}\): %{DATA:message}(?: - %{GREEDYDATA:kv_vpn_message})?$
There are several changes:
%{DATA:message} - the message pattern is turned into a non-greedy dot pattern, .*?, with GREEDYDATA changed to DATA
(?: - %{GREEDYDATA:kv_vpn_message})? - is an optional non-capturing group that matches one or zero occurrences of - and then zero or more chars as many as possible captured into the "kv_vpn_message" group
$ - end of string anchor, it allows the "message" DATA pattern match till the end of line.

Regex Group Name prefix multiple options

I'm performing regex extraction for parsing logs for our SIEM. I'm working with PCRE2.
In those logs, I have this problem: I have to extract a field that can be preceded by multiple options and I want use only one group name.
Let me be clearer with an example.
The SSH connection can appear in our log with this form:
UserType=SSH,
And I know that a simple regex expression to catch this is:
UserType=(?<app>.*?),
But, at the same time, SSH can appear with another "prefix":
ACCESS TYPE:SSH;
that can be captured with:
ACCESS\sTYPE:(?<app>.*?);
Now, because the logical field is the same (SSH protocol) and I want map it in every case under group name "app", is there a way to put the previous values in OR and use the same group name?
The desiderd final result is something like:
(UserType=) OR (ACCESS TYPE:) <field_value_here>
You can use
(?:UserType=|ACCESS\sTYPE:)(?<app>[^,;]+)
See the regex demo. Details:
(?:UserType=|ACCESS\sTYPE:) - either UserType= or ACCESS + whitespace + TYPE:
(?<app>[^,;]+) - Group "app": one or more chars other than , and ;.

How to write Regex expression to extract the content in brackets, after string and the first match?

I would like to use Regular expression to extract content between brackets, after some specific string and the 1st match.
Example text:
**-n --command PING being applied--:
Wed May 34 7:23:18 2010
[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] [[EIJERTMMMMIJE_EIEJ] gdyugedyue Service [ABC] is not available in domain [DEF]. Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back. It might be due to one of the following reasons:
=> Reason1
=> Reason3
=> Reason 4: deijdije djkeoidjeio.
info=4343 day=Mon year=2010*
I would like to extract the string between [] but after string Service and 1st match as Service could appear again later. In this case ABC
Could someone help me?
I am not able to combine these three conditionals.
Thanks
Assuming that you don't care about capturing square brackets inside the [ ] pair, by far the easiest way to do this is to use the following simple regex:
Service (\[[^\]]*\])
and extract only the 1st capturing group from the result using whatever regex functionality you're using. For example, using JS, you would write
string.match(/Service (\[[^\]]*\])/)[1]
to extract the first capturing group.
If you instead want a regex that will only capture the first occurrence, you can exploit the greedy nature of the * quantifier and change the regex to this:
Service (\[[^\]]*\]).*
Service \[([^\]]+)\]
will match Service [anything besides brackets] and capture anything besides brackets in group number 1. Since regex engines work left-to-right, the first match will be the leftmost match.
Test it live on regex101.com.
In PHP, you could do this (code snippet generated by RegexBuddy):
if (preg_match('/Service \[([^\]]+)\]/', $subject, $groups)) {
$result = $groups[1];
} else {
$result = "";
}
The definition of the group name How should I write it? I know that it can be like this: (?) but I dont know how to combine it with this part Service [([^]]+)] in a single way

Regex for Windows "Message" from syslog source

I have Windows logs being aggregated to a syslog server which is messing with the format a little bit and I'm trying to work a regular expression (PCRE) to be reformat it a little so I can extract some key/value pairs
I've had a go at the regular expression myself, but I'm stuck on the fact that each "Message" section has several "Headers" which have defined key/value pairs underneath them.
An example would be:
An attempt was made to access an object. Subject: Security ID: NT AUTHORITY\SYSTEM Account Name: NAME$ Account Domain: DOMAIN Logon ID: 0x3e7 Object: Object Server: Security Object Type: File Object Name: Z:\PATH\PATH\PATH\file.log Handle ID: 0x9b0 Process Information: Process ID: 0xa84 Process Name: C:\Program Files\PROGRAM\EXECUTABLE.exe Access Request Information: Accesses: ReadData (or ListDirectory) Access Mask: 0x1
The headers would be Subject, Object and Process Information.
Where I seem to be stuck is the only delimiter here is \s regardless of a header or pair.
This has got me close.
\s([^:\s]+)\:[\s]([^\s]*) but only captures the first word in a multi-word header or key.
With /s being the only delimiter, will this be possible?
If you only want those header names you might use an alternation and list the words between word boundaries \b.
Note that you don't have to escape the : and a single \s could also be written without the square brackets.
\b(Process Information|\S+)\b:\s(\S*)
Explanation
\b Word boundary
( Capturing group 1
Process Information|\S+ Match any of the listed
) Close capturing group
\b:\s Match word boundary, : and whitespace char
(\S*) Capturing group 2 matching 0+ times a non whitespace char
See a regex demo

Google Analytics - Content grouping - Regex fix

This is our URL structure:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2
http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2
http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre
I am trying to create a list of groups based on the client names
/access-guide/[this bit]/...
So I can have a performance list of all our clients.
This is my regex:
/access-guide/(.*universit(y|ies)|.*colleg(e|es))/
I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.
At the moment, my current regex will only return groups that are X-University:
Durham-University
Plymouth-University
Cardiff-University
etc.
What does the regex need to be to have the list I'm looking for?
Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?
Thanks for your help in advance!
Depending upon your needs you may want to do:
/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/
This will match names even if "university" or "college" is not at the end of the string. For example "college-of-the-ozarks" Note the non-capturing internal parenthesis, that should probably be used no matter what solution you go with, as you don't want to just match the word "university" or "college"
Live Example
Additionally, I don't know what may be in your but if you may have compound words you want to eliminate using a \b may be advisable. For instance if you don't want to match "miskatonic-postcollege" you may want to do something like this:
/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
If the client name section of the URL is after the access-guid/ and before the next /:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
|----------------------------|
you need to use a negated character class to only match university before the regex reaches that rightmost / boundary.
As per the Reference:
You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)
Thus, you can use
/access-guide/([^/]*(universit(y|ies)|colleges?))
^^^^^
See demo.
The regex matches
/access-guide/ - leftmost boundary, matches /access-guide/ literally
[^/]* - any character other than / (so we still remain in that customer section)
(universit(y|ies)|colleges?) - university, or universities, orcollegeorcolleges` literally. Add more if needed.