How to parse word using grok - regex

I am very new to grok syntax. I have lines of the form:
/app-name/version/code_suffix/sync
for example:
/my-app/v1/O03_ABCD/sync
/my-app/v1/O04/sync
and I need to parse out the code, which always consists of 3 characters. I tried something using:
http://grokconstructor.appspot.com/do/match
but with no success.

This regex will match each part of your format and put each one in its own named capturing group:
/(?<appName>[^/]*)/(?<version>[^/]*)/(?<code>[^\W_]{3})(?:_(?<suffix>[^/]*))?/sync
You can try it here, and it also works on grokConstructor.
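As a rough way to check the pattern outside grok, here is a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<name>...)` rather than `(?<name>...)`. The helper name `parse_path` and the use of `fullmatch` are illustrative choices, not part of the original answer.

```python
import re

# Same pattern as above, with named groups rewritten in Python's (?P<name>...) form.
PATH_RE = re.compile(
    r"/(?P<appName>[^/]*)/(?P<version>[^/]*)/(?P<code>[^\W_]{3})(?:_(?P<suffix>[^/]*))?/sync"
)

def parse_path(path):
    """Return the named groups as a dict, or None if the path does not match.

    parse_path is a hypothetical helper; fullmatch requires the whole path to fit.
    """
    m = PATH_RE.fullmatch(path)
    return m.groupdict() if m else None

for sample in ("/my-app/v1/O03_ABCD/sync", "/my-app/v1/O04/sync"):
    print(parse_path(sample))
```

When the path has no suffix, the optional group does not participate and `suffix` comes back as `None`.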

Related

Match a string between hash symbols

I have a string in this shape
State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0
I want to extract the State value via regex. In the example above, I only want to extract Received.
I have tried the following ([^State#])([A-Za-z]) which matches Received but I am stuck at excluding the rest of the string #ID#e23d8926-1327-4fde-9ea7-d364af3325e0
You should not put parentheses around the parts you don't want to capture. Match the literal State# prefix outside the group and capture only what follows it, up to the next #. My solution is:
State#(?'state'[^#]+)#
Sample: https://regex101.com/r/vAr65j/1
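The `(?'state'...)` named-group spelling is not accepted by every engine. As a sketch of the same idea in Python, where named groups are written `(?P<state>...)`, something like the following should work. The function name `extract_state` is made up for illustration.

```python
import re

# (?P<state>...) is Python's spelling of the named group (?'state'...).
STATE_RE = re.compile(r"State#(?P<state>[^#]+)#")

def extract_state(text):
    """Return the value between 'State#' and the next '#', or None if absent."""
    m = STATE_RE.search(text)
    return m.group("state") if m else None

print(extract_state("State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0"))
```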

How to write Regex expression to extract the content in brackets, after string and the first match?

I would like to use Regular expression to extract content between brackets, after some specific string and the 1st match.
Example text:
**-n --command PING being applied--:
Wed May 34 7:23:18 2010
[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] [[EIJERTMMMMIJE_EIEJ] gdyugedyue Service [ABC] is not available in domain [DEF]. Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back. It might be due to one of the following reasons:
=> Reason1
=> Reason3
=> Reason 4: deijdije djkeoidjeio.
info=4343 day=Mon year=2010*
I would like to extract the string between [ ] that follows the word Service, taking only the first match, since Service could appear again later. In this case that is ABC.
Could someone help me?
I am not able to combine these three conditions.
Thanks
Assuming that you don't care about capturing square brackets inside the [ ] pair, by far the easiest way to do this is to use the following simple regex:
Service (\[[^\]]*\])
and extract only the 1st capturing group from the result with whatever regex API you're using. For example, in JS you would write
string.match(/Service (\[[^\]]*\])/)[1]
to extract the first capturing group.
If you instead want a regex that will only capture the first occurrence, you can exploit the greedy nature of the * quantifier and change the regex to this:
Service (\[[^\]]*\]).*
Service \[([^\]]+)\]
will match Service [anything besides brackets] and capture anything besides brackets in group number 1. Since regex engines work left-to-right, the first match will be the leftmost match.
Test it live on regex101.com.
In PHP, you could do this (code snippet generated by RegexBuddy):
if (preg_match('/Service \[([^\]]+)\]/', $subject, $groups)) {
$result = $groups[1];
} else {
$result = "";
}
How should I write the definition of the group name? I know that it can look like this: (?) but I don't know how to combine it with this part, Service \[([^\]]+)\], in a single expression.
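One possible way to combine a group name with that pattern is sketched below in Python, where the named-group syntax is `(?P<name>...)`. Other engines spell it differently. The group name `service` and the helper `first_service` are assumptions chosen for illustration.

```python
import re

# The capturing group from the pattern above, given the (hypothetical) name "service".
SERVICE_RE = re.compile(r"Service \[(?P<service>[^\]]+)\]")

def first_service(text):
    """Return the bracketed name after the first 'Service ', or None if absent."""
    # search() scans left to right and stops at the first (leftmost) match.
    m = SERVICE_RE.search(text)
    return m.group("service") if m else None

log = (
    "[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] "
    "gdyugedyue Service [ABC] is not available in domain [DEF]. "
    "Service [ABC] Domain [DEF] ] did not ping back."
)
print(first_service(log))
```

Because `search` returns only the leftmost match, later occurrences of Service in the same text are ignored.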

Match 'complex' string using regex

I'm very new to regex, and I'm trying to analyse data that comes from a simple text file. Before I start the data analysis, I need to make sure the format or structure of the file's content is correct; only then can the process continue. The content of the file looks like this:
,file_06,,
x data,y data
-969.0,-42.18187,
-958.0,-39.62946,
-948.0,-37.748737,
-938.0,-35.73368,
-929.0,-33.9873,
-919.0,-32.24092,
-910.0,-30.76321,
-899.0,-29.01683,
-891.0,-27.40478,
-878.0,-26.19575,
-872.0,-24.986712,
-864.0,-23.24033,
-853.0,-22.16563,
Looking for help in writing the regex.
I tried to write a regex myself, but it keeps matching only the first line. I can't match the whole content.
Regex pattern :
/(,file_[\d]*,,)\n(x data,y data)\n((-?[\d]*.[\d]*,-?[\d]*.[\d]*,?)\n)*(,,)?/g
This will work:
/(?=-)(.?[^\,]*)/gm
It uses a positive lookahead to start at each '-' and then captures everything up to the next ','.
Use
/(?=-)(.*)/gm
if you want to capture each pair of values together, as one match per line.
Sample at https://regex101.com/r/a5Dk5Y/1/
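To see what the two patterns above return, here is a small Python sketch run against a shortened copy of the sample file. It assumes Python's `re` module, where `findall` plays the role of the `g` flag and `re.MULTILINE` the role of `m`. The variable names are illustrative.

```python
import re

# Shortened copy of the sample file: two header lines and two data rows.
sample = (
    ",file_06,,\n"
    "x data,y data\n"
    "-969.0,-42.18187,\n"
    "-958.0,-39.62946,\n"
)

# Individual values: start at each '-' and read up to the next ','.
values = re.findall(r"(?=-)(.?[^,]*)", sample, flags=re.MULTILINE)

# Whole rows: start at the first '-' on a line and read to the end of that line.
rows = re.findall(r"(?=-)(.*)", sample, flags=re.MULTILINE)

print(values)
print(rows)
```

Both patterns only pick out text that begins with '-', so they extract the data but do not check the two header lines, and a positive x value would not be picked up.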

Multiple replace regex in one Apache-NiFi statement

I have a csv in following format.
id,mobile
1,02146477474
2,08585377474
3,07646474637
4,02158789566
5,04578599525
I want to add a new column containing just the leading 3 digits of the number (for specific prefixes only; all other rows should get the string NOT_VALID). So the result should be:
id,number,provider
1,02146477474,021
2,08585377474,085
3,07646474637,NOT_VALID
4,02158789566,021
5,04578599525,NOT_VALID
I can use the following regex to do the replacement for one prefix, but I would like to handle all possible conversions in one step, using the UpdateRecord processor.
${field.value:replaceFirst('085[0-9]+','085')}
When I use something like this:
${field.value:replaceFirst('085[0-9]+','085'):or(${field.value:replaceFirst('086[0-9]+','086')}`)}
This replaces every value with false.
NiFi uses Java regex.
Since you are using record processing, this should work for you:
${field.value:replaceFirst('^(021|085)?.*','$1')}
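The group ( ) optionally (?) captures 021 or 085 at the beginning of the string (^), and .* consumes the rest of the value.
The replacement, $1, is the content of the first group. For values that start with neither prefix the group captures nothing, so the result is an empty string rather than NOT_VALID.
PS: Sites like https://regex101.com/ help with understanding regex.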
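The behaviour of that regex can be tried outside NiFi. The sketch below is Python, not NiFi Expression Language: it applies the same pattern with `re.sub` and then adds a NOT_VALID fallback, which is an addition for illustration and not something the expression above does. The function name `provider` is hypothetical.

```python
import re

# Same pattern as in the NiFi expression; Python writes the replacement as \1 instead of $1.
PREFIX_RE = re.compile(r"^(021|085)?.*")

def provider(mobile):
    """Return the 3-digit prefix if it is 021 or 085, else 'NOT_VALID'."""
    # An optional group that did not participate is substituted as '' (Python 3.5+).
    prefix = PREFIX_RE.sub(r"\1", mobile, count=1)
    # Fallback added for illustration; the regex alone yields an empty string here.
    return prefix or "NOT_VALID"

for number in ("02146477474", "08585377474", "07646474637"):
    print(number, provider(number))
```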

Getting rid of the parenthesis with regular expression group matching

I'm trying to analyze logs using splunk and I need to parse lines that look like this:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message
I've got this regex which matches:
(?i)^[^\]]*\]\s+(?P<FIELDNAME>[^ ]+)
this part:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Using groups I can extract the information that I actually need, which is:
(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
The only problem is that I don't need the parentheses. I've tried a few negative lookahead/lookbehind approaches found through Google searches, but I don't really know regex that well.
So my final goal is to capture b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf. Thanks.
(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)
That matches the parentheses literally, outside the named group, so the captured FIELDNAME value no longer includes them.
Play with the regex here.
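The pattern uses the `(?P<name>...)` syntax, which Python's `re` module also accepts, so it can be checked with a short sketch like this one. The helper name `extract_id` is an assumption for illustration; Splunk itself would apply the regex through its own field extraction.

```python
import re

# Pattern from the answer above, unchanged.
ID_RE = re.compile(r"(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)")

def extract_id(line):
    """Return the value inside the first (...) after the [thread] part, or None."""
    m = ID_RE.search(line)
    return m.group("FIELDNAME") if m else None

line = (
    "2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] "
    "(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor "
    "(AbstractLoggingInterceptor.java:149) - Outbound Message"
)
print(extract_id(line))
```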