Regex: select the XML messages and timestamp from the log

I am going to stream logs into nxlog, and I need to push the XML messages to the nxlog server. To select the XML messages I use:
(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(.*)(my sentence 1....|my sentence 2 : [\S+\s+]*>\n)(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})
But with this I am not able to select all of the XML messages from the logs:
https://regex101.com/r/iA8qE5/5

In your regex you have to close the alternation using ) after:
(Message Picked from the queue....|Response Message :
A + inside a character class has a different meaning: it matches a plus sign literally instead of acting as a quantifier. Used as a quantifier after the class, the plus is greedy, so you have to make it non-greedy with a question mark to stop [\S\s]+ from matching all the lines.
Update [\S+\s+]*>\n)
to
)([\S\s]+?>)\n
Your match is in the 4th capturing group.
(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(.*)(Message Picked from the queue....|Response Message : )([\S\s]+?>)\n(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})
Note that if you don't need all the capturing groups, you can omit the others and keep only the one around the XML, which then becomes the first capturing group.
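As a rough check of the corrected pattern, here is a minimal Python sketch. The log excerpt is invented for illustration (the real log is only reachable through the regex101 link), so the sample lines, the XML payload and the variable names are all assumptions.

```python
import re

# The corrected pattern from the answer, split over several lines for readability.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(.*)"
    r"(Message Picked from the queue....|Response Message : )"
    r"([\S\s]+?>)\n"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})"
)

# Hypothetical log excerpt; the layout is an assumption, not taken from the question.
SAMPLE_LOG = (
    "2016-05-02 10:15:30.123 INFO Response Message : <order>\n"
    "  <id>42</id>\n"
    "</order>\n"
    "2016-05-02 10:15:31.456 INFO done\n"
)

match = PATTERN.search(SAMPLE_LOG)
timestamp = match.group(1)    # timestamp of the line that starts the message
xml_message = match.group(4)  # the multi-line XML, ending at the last ">" before the next timestamp

print(timestamp)
print(xml_message)
```

Because the pattern also consumes the timestamp of the following line, a message that begins on that very line would be skipped by a repeated search. Turning the final group into a lookahead, (?=...), is one possible way around that.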

This captures the date at the start of the line, the message and the XML. It uses the g, m and s flags:
^([\d-\.\s\:]+)\s.*?-\s([\w\s:\.]+)(<\w+.*?)\n\d{4}
For the date and the XML only:
^([\d-\.\s\:]+)\s.*?(<\w+.*?)\n\d{4}

Related

Regex (grok) - create general pattern for log which occurs but don't have to

I am sorry for the enigmatic title, but I did not know how to phrase it properly.
These are the log types:
{vpnclient} Client[10.10.10.10:54576](11764): sending R_KEYCHANGE message
{vpnclient} Client[10.10.10.10:54576](16031): sending R_IPCONFIG message - client IP = 172.11.11.11/255.255.255.0, CEP = 3600 s, DNS = 172.11.1.101, 172.11.1.102
And this is my grok pattern:
^{vpnclient} %{WORD}\[%{IP:[client][ip]}:%{NUMBER:[source][port]}\]\(%{INT:[process][pid]}\): %{GREEDYDATA:message} (:?%{GREEDYDATA:kv_vpn_message})
What I want to do is forward the part of the log after the hyphen (so everything from "client IP" onwards) to the kv filter.
My problem is that this type of log does not always occur, so I want a single grok pattern that always matches up to %{GREEDYDATA:message} and also captures %{GREEDYDATA:kv_vpn_message}, but only when that part is present.
You can use
^{vpnclient} %{WORD}\[%{IP:[client][ip]}:%{NUMBER:[source][port]}\]\(%{INT:[process][pid]}\): %{DATA:message}(?: - %{GREEDYDATA:kv_vpn_message})?$
There are several changes:
%{DATA:message} - the message pattern is turned into a non-greedy dot pattern, .*?, by changing GREEDYDATA to DATA
(?: - %{GREEDYDATA:kv_vpn_message})? - an optional non-capturing group that matches one or zero occurrences of " - " followed by zero or more chars, as many as possible, captured into the "kv_vpn_message" group
$ - the end-of-string anchor; it forces the lazy "message" DATA pattern to extend to the end of the line when the optional group is absent.
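The interplay of the lazy message, the optional group and the $ anchor can be tried outside Logstash with a plain regex. The sketch below is only an approximation of the grok pattern: it assumes %{WORD} is roughly \w+, %{IP} is simplified to [\d.]+ (IPv4 only), %{NUMBER} and %{INT} to \d+, %{DATA} to .*? and %{GREEDYDATA} to .*, and the flat group names are made up because Python does not accept names like [client][ip].

```python
import re

# Hand-expanded approximation of the grok pattern (see the assumptions above).
VPN_LINE = re.compile(
    r"^\{vpnclient\} \w+\[(?P<client_ip>[\d.]+):(?P<source_port>\d+)\]"
    r"\((?P<pid>\d+)\): (?P<message>.*?)(?: - (?P<kv_vpn_message>.*))?$"
)

# The two log types from the question.
LINES = [
    "{vpnclient} Client[10.10.10.10:54576](11764): sending R_KEYCHANGE message",
    "{vpnclient} Client[10.10.10.10:54576](16031): sending R_IPCONFIG message"
    " - client IP = 172.11.11.11/255.255.255.0, CEP = 3600 s,"
    " DNS = 172.11.1.101, 172.11.1.102",
]

parsed = [VPN_LINE.match(line).groupdict() for line in LINES]

for fields in parsed:
    # kv_vpn_message is None when the " - ..." part is missing.
    print(fields["message"], "|", fields["kv_vpn_message"])
```

Without the trailing $, the lazy message group would be satisfied with an empty match, since nothing after it would be required to match.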

Regex capture group with non-uniform space group

I'm trying to parse the output of the "display interface brief" Comware switch command to convert it to a CSV file using RegEx. This command is printed using the following format:
Interface Link Speed Duplex Type PVID Description
BAGG51 UP 4G(a) F(a) T 1
FGE1/0/42 DOWN auto A T 1 ### LIVRE ###
GE6/0/20 UP 100M(a) F(a) A 1 LIVRE (MGMT - [WAN8-P8]
This is quite challenging for me: no matter which RegEx I try, it doesn't properly handle output such as "DOWN auto" and "100M(a) F(a)", where there is only one space between the columns. I also couldn't find a way to handle the last field, which can contain one or more spaces; most of the RegExes I tried create a separate capture group for each space-separated word instead of keeping its text together.
I have tried countless ways to parse it, and I couldn't find much about parsing non-uniform columns on the Internet or in the StackOverflow community.
I need to parse it into the following format, with 7 capture groups per line, respecting the end of line:
BAGG51;UP;4G(a);F(a);T;1
FGE1/0/42;DOWN;auto;A;T;1;### LIVRE ###
GE6/0/20;UP;100M(a);F(a);A;1;LIVRE (MGMT - [WAN8-P8]
The most successful RegEx that I found so far was ^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+), replaced with $1;$2;$3;$4;$5;$6;$7 in Notepad++, but it doesn't properly handle the "Description" field, which can be empty.
The following pattern seems to be working here:
^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)(?:[ ]+(.*))?
This follows your pattern with six mandatory capture groups, followed by an optional seventh capture group. The (?:[ ]+(.*))? at the end of the pattern matches one or more spaces followed by the rest of the line, so a description that contains spaces stays in a single group, and a missing description leaves the group empty. Note that this pattern should be used in multiline mode.
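For anyone doing the conversion in a script rather than in Notepad++, the same pattern can be applied in Python. This is a minimal sketch using the three data rows from the question; the function name to_csv and the choice to drop an empty description (rather than emit a trailing separator) are assumptions made for the example.

```python
import re

# The pattern from the answer; re.MULTILINE makes ^ match at every line start.
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)(?:[ ]+(.*))?",
    re.MULTILINE,
)

SAMPLE = (
    "BAGG51 UP 4G(a) F(a) T 1\n"
    "FGE1/0/42 DOWN auto A T 1 ### LIVRE ###\n"
    "GE6/0/20 UP 100M(a) F(a) A 1 LIVRE (MGMT - [WAN8-P8]"
)

def to_csv(text: str) -> str:
    """Join the captured columns of every row with semicolons."""
    rows = []
    for match in ROW.finditer(text):
        # Group 7 is None when the line has no description; skip it in that case.
        fields = [group for group in match.groups() if group is not None]
        rows.append(";".join(fields))
    return "\n".join(rows)

print(to_csv(SAMPLE))
```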

Generate Regex with special characters and brackets

I am trying to create a regex for this text *[Failure] : Automation Failure, Reason - Unable to find Watch Live button on title detail page*, I want to extract anything between *[Failure] : and *. I tried coming up with \*\[Failure][ :,-,-]+[a-zA-Z0-9]+\* but this does not work.
In my case desired output is Automation Failure, Reason - Unable to find Watch Live button on title detail page
If you simply want to get everything between the '*[Failure] :' and the '*', you can use a lookbehind and a lookahead to make the regex:
(?<=\*\[Failure] : ).*(?=\*)
(?<=\*\[Failure] : ) looks behind for '*[Failure] :'
(?=\*) looks ahead for a '*'
Your two character classes are missing some of the characters needed to span the match up to the closing *. To get only the part in between, use a capture group; otherwise you only have the full match.
\*\[Failure][ :,-]+([a-zA-Z0-9, -]+)\*
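Both suggestions can be compared side by side. The following Python sketch runs the lookaround pattern and the capture-group pattern against the text from the question; the variable names are made up for the example.

```python
import re

TEXT = (
    "*[Failure] : Automation Failure, Reason - Unable to find "
    "Watch Live button on title detail page*"
)

# First answer: lookbehind and lookahead, the wanted text is the full match.
LOOKAROUND = re.compile(r"(?<=\*\[Failure] : ).*(?=\*)")

# Second answer: extended character classes, the wanted text is capture group 1.
CAPTURE = re.compile(r"\*\[Failure][ :,-]+([a-zA-Z0-9, -]+)\*")

via_lookaround = LOOKAROUND.search(TEXT).group(0)
via_capture = CAPTURE.search(TEXT).group(1)

print(via_lookaround)
print(via_capture)
```

The lookaround version accepts any characters between the markers, while the capture-group version only accepts letters, digits, commas, spaces and hyphens, so it would stop matching if the failure reason contained, for example, a colon.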

Regex pattern for Prometheus exporter

I am trying to create a regex pattern for the configuration file of a Prometheus exporter (the JMX exporter) to export WebLogic JMS queues.
My String is as below
(com.bea<ServerRuntime=AC_Server-10-100-40-122, Name=iLoyalJMSModule!AC_JMSServer#AC_Server-10-100-40-122#com.ibsplc.iloyal.eai.EN.retro.outErrorqueue, Type=JMSDestinationRuntime, JMSServerRuntime=AC_JMSServer#AC_Server-10-100-40-122><>MessagesCurrentCount)
And the RegEx is as below
Pattern
com.bea<ServerRuntime=(.+), Name=(.+), Type=(.+), JMSServerRuntime=(.+)<>(MessagesCurrentCount|MessagesPendingCount)
Name to display in Prometheus exporter output
name: "weblogic_jmsserver_$1_$5"
Current Output
weblogic_jmsserver_ac_server_10_100_40_122_messagescurrentcount
Now I would like to add the queue name (outErrorqueue), taken from the Name= string, to my output. The final output should look like the one below.
Required Output
weblogic_jmsserver_ac_server_10_100_40_122_outErrorqueue_messagespendingcount
You could reduce the number of capture groups from 5 to the 2 that you need in the replacement. Instead of using .+, you can either use .*? or a negated character class that matches any char except a comma: [^,]+
If the surrounding parentheses of the example data should not be part of the replacement, you can use:
\(com\.bea<ServerRuntime=([^,]+), Name=[^,]+, Type=[^,]+, JMSServerRuntime=.+?<>(Messages(?:Current|Pending)Count)\)
In the replacement use:
weblogic_jmsserver_$1_outErrorqueue_$2
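To check the two remaining groups outside the exporter, the pattern can be run with Python's re.sub. This is only a sketch of the regex part: Python writes backreferences as \1 and \2 instead of $1 and $2, and the lower-casing and character replacement visible in the question's output are not reproduced here.

```python
import re

# The pattern from the answer: group 1 is the server name, group 2 the metric.
PATTERN = re.compile(
    r"\(com\.bea<ServerRuntime=([^,]+), Name=[^,]+, Type=[^,]+, "
    r"JMSServerRuntime=.+?<>(Messages(?:Current|Pending)Count)\)"
)

# The example string from the question.
MBEAN = (
    "(com.bea<ServerRuntime=AC_Server-10-100-40-122, "
    "Name=iLoyalJMSModule!AC_JMSServer#AC_Server-10-100-40-122"
    "#com.ibsplc.iloyal.eai.EN.retro.outErrorqueue, "
    "Type=JMSDestinationRuntime, "
    "JMSServerRuntime=AC_JMSServer#AC_Server-10-100-40-122><>MessagesCurrentCount)"
)

match = PATTERN.search(MBEAN)
metric_name = PATTERN.sub(r"weblogic_jmsserver_\1_outErrorqueue_\2", MBEAN)

print(match.groups())
print(metric_name)
```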

Regex to parse certain fields of a log file

I have this log line:
blabla#gmail.com, Portal, qtp724408050-38, com.blabla.search.lib.SearchServiceImpl .logRequest, [Input request is lookupRequestDTO]
I need to find a regex that grabs that email, then matches lookupRequestDTO ignoring everything in between.
Currently my regex grabs the whole line:
([\w-\.]+)#gmail.com,(.+)lookupRequestDTO
How do I not match anything in between the email and lookupRequestDTO ?
What about this?
([^,]+).*?lookupRequestDTO
[^,]+ matches everything up to the first comma, so it should get you the email.
This assumes lookupRequestDTO is a fixed search criterion. If it is a variable you want to retrieve, you could use this:
([^,]+).*?\[Input request is ([^\]]+)
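Both variants can be tried against the log line from the question. A minimal Python sketch, with made-up variable names:

```python
import re

LOG_LINE = (
    "blabla#gmail.com, Portal, qtp724408050-38, "
    "com.blabla.search.lib.SearchServiceImpl .logRequest, "
    "[Input request is lookupRequestDTO]"
)

# Variant 1: lookupRequestDTO is a fixed search term, only the email is captured.
FIXED = re.compile(r"([^,]+).*?lookupRequestDTO")

# Variant 2: the request name is captured as well.
VARIABLE = re.compile(r"([^,]+).*?\[Input request is ([^\]]+)")

email_only = FIXED.search(LOG_LINE).group(1)
email, request = VARIABLE.search(LOG_LINE).groups()

print(email_only)
print(email, request)
```

The text between the email and the request name is still part of the overall match in both variants; it is simply not captured, so only the groups should be read.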
Assuming you're using PCRE (php, perl, etc., and this should work in javascript):
([\w-\.]+?#gmail\.com),(?:.+)(lookupRequestDTO)
Out of capture groups 1 and 2, you'll get:
MATCH 1
blabla#gmail.com
lookupRequestDTO
Working example: http://regex101.com/r/yW9eU3