Regex: match if it exists, otherwise ignore

I'm having trouble with a regex.
Input example:
/aaaa/admin.php?file=xpto.js&version=abcd123
/aaaa/admin.php
Output 1 -
url => /aaaa/admin.php
var => file=xpto.js&version=abcd123
Output 2 -
url => /aaaa/admin.php
I tried %{NOTSPACE:url}(?:/?%{NOTSPACE:var}) and a few others, but none of them worked.

You may use
%{URIPATH:path}(?:%{URIPARAM:param})?
The patterns are provided at https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns.
The %{URIPATH:path} will match the path, while (?:%{URIPARAM:param})? will match 1 or 0 occurrences (due to the optional non-capturing group (?:...)?) of the query string.
If you need to get rid of the ? in the param, you may also use
(?<path>(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+)(?:\?(?<param>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\[\]-]*))?
The output of the first pattern for the /aaaa/admin.php?file=xpto.js&version=abcd123 input (note the leading ? in param):
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
"?file=xpto.js&version=abcd123"
]
]
}
The output for /aaaa/admin.php:
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
null
]
]
}

Try this regex :
(%{NOTSPACE:url})(?:\?(%{NOTSPACE:var}))?
Demo : http://regexr.com/3f6sm

Is this what you're looking for?
([^\s?]+)(?:\?(\S+))?
You can test it here.
Also, you could just split the URL string on the first ?.
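The optional-group idea from the answers above can be tried outside grok. The sketch below is only an illustration in Python's re module, not a grok pattern: the group names url and var come from the question, while split_url and split_url_plain are hypothetical helper names. Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re

# Path: one or more characters that are neither whitespace nor "?".
# Query: an optional non-capturing group; the "?" separator stays
# outside the named capture, so var never contains it.
URL_RE = re.compile(r"(?P<url>[^\s?]+)(?:\?(?P<var>\S+))?")


def split_url(line):
    """Return (url, var); var is None when there is no query string."""
    m = URL_RE.match(line)
    if m is None:
        return None
    return m.group("url"), m.group("var")


def split_url_plain(line):
    """Same result without a regex: split once on the first "?"."""
    url, sep, var = line.partition("?")
    return url, (var if sep else None)


print(split_url("/aaaa/admin.php?file=xpto.js&version=abcd123"))
print(split_url("/aaaa/admin.php"))
```

The first character class excludes ? on purpose: a plain \S+ is greedy and would consume the query string too, leaving the optional group with nothing to match.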

Related

grok parse optional field pattern doesn't work

I've got a log like this:
ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED
I'm trying to parse it with grok using grok debugger:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage},THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
It works, but sometimes the log comes without the THROTTLED_OUT_REASON field:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}
Since THROTTLED_OUT_REASON is an optional field, I tried the pattern below for that case:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
So this should work for both cases. The given output for the log with optional field is:
{
"errorMassage": [
[
"Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
]
],
"throttledOutReason": [
[
null
]
]
}
But the expected output for the log with optional field:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
"API_LIMIT_EXCEEDED"
]
]
}
Expected output for the log without the optional field:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
null
]
]
}
Can anyone suggest a solution that gives the correct output for both types of logs?
Since you use GREEDYDATA, it "eats" as much as it can get in order to fill errorMassage.
I do not know grok well enough to tell you which predefined patterns would work as alternatives, but you should be able to use a custom pattern:
ERROR_MESSAGE:(?<errorMassage>.*?),THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
I got the answer using @Skeeve's idea.
Here it is for anyone who would come up with a similar question:
I've used a custom pattern to stop GREEDYDATA from consuming too much (for the errorMassage field).
ERROR_MESSAGE:(?<errorMassage>([^,]*)?)(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
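The difference between the greedy and the comma-bounded first field can be reproduced with plain regexes. This is a rough Python sketch, not grok itself: GREEDYDATA is written out as .* and the named groups use Python's (?P<name>...) syntax. The names GREEDY, FIXED and parse are made up for the illustration.

```python
import re

# Greedy first field: ".*" swallows the rest of the line, so the optional
# group is left with nothing to match (the behaviour from the question).
GREEDY = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>.*)"
    r"(?:,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)

# Bounded first field: "[^,]*" stops at the first comma, which leaves the
# optional group free to match when the second field is present.
FIXED = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>[^,]*)"
    r"(?:,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)


def parse(pattern, line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


both = "ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
only_error = "ERROR_MESSAGE:Invalid Credentials"

print(parse(GREEDY, both))
print(parse(FIXED, both))
print(parse(FIXED, only_error))
```

One caveat of the [^,]* approach: an error message that itself contains a comma is cut off at that comma.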

Watson Assistant: Problem with extracting value for pattern entity

I am trying to get the value for the first group match of a pattern entity from the json response of Watson Assistant. The pattern is a simple regex to recognize sequences of numbers: ([0-9]+)
The json response looks like this:
"entity": "ID",
"location": [
18,
23
],
"value": "id",
"confidence": 1.0,
"groups": [
{
"group": "group_0",
"location": [
18,
23
]
}
]
},
{
"entity": "sys-number",
"location": [
18,
23
],
"value": "12345",
"confidence": 1.0,
"metadata": {
"numeric_value": 12345.0
}
}
]
So, the group is matched all right, but the field "value" is populated with the string literal from the entity config. I would have expected to find the actual value there (which is the one in the value field of the next entity, sys-number).
How do I need to change the config so that the value is included as-is in the value field (or somewhere else) and so that I don't have to extract the entity from the text string using the location values? Is it possible at all?
Thanks a lot
Cheers,
Martin
To access the value of a pattern-based entity, you can use either <? @entity_name.literal ?> or, if groups were captured, <? @entity_name.groups[0] ?>. You can find more info in the docs: https://cloud.ibm.com/docs/services/assistant?topic=assistant-entities

custom log grok pattern

I am trying to parse a custom log line using grok pattern but I'm not able to completely parse the line.
Custom log line:
site 'TRT' : alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' : serveur 'Test10011' RAS : TRT / TRT serveur 'Test10011' OK
Grok pattern:
%{DATA:site}\:%{DATA:alias}\:%{DATA:server}\:%{DATA:msg}
Result:
{
"site": [
[
"site 'TRT' "
]
],
"alias": [
[
" alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' "
]
],
"server": [
[
" serveur 'Test10011' RAS "
]
],
"msg": [
[
""
]
]
}
I am not able to parse the last few items into 'msg'. Could you please help me see where I'm going wrong? msg should contain "TRT / TRT serveur 'Test10011' OK"
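It seems you just need to use GREEDYDATA instead of DATA for the last field. DATA is lazy (.*?), so with nothing after it in the pattern it matches the empty string, whereas GREEDYDATA (.*) runs to the end of the line: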
%{DATA:site}\s*:\s*%{DATA:alias}\s*:\s*%{DATA:server}\s*:\s*%{GREEDYDATA:msg}
I also suggest adding \s* around each : to get rid of leading and trailing whitespace.
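The lazy-versus-greedy behaviour of the last field is easy to check with an ordinary regex. The following is a sketch in Python's re module, with DATA written as .*? and GREEDYDATA as .*; LAZY_TAIL, GREEDY_TAIL and LINE are names chosen for this example only.

```python
import re

LINE = ("site 'TRT' : alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' : "
        "serveur 'Test10011' RAS : TRT / TRT serveur 'Test10011' OK")

# Last field lazy (like DATA): nothing follows it, so it matches "".
LAZY_TAIL = re.compile(
    r"(?P<site>.*?)\s*:\s*(?P<alias>.*?)\s*:\s*(?P<server>.*?)\s*:\s*(?P<msg>.*?)"
)

# Last field greedy (like GREEDYDATA): it runs to the end of the line.
GREEDY_TAIL = re.compile(
    r"(?P<site>.*?)\s*:\s*(?P<alias>.*?)\s*:\s*(?P<server>.*?)\s*:\s*(?P<msg>.*)"
)

lazy = LAZY_TAIL.match(LINE).groupdict()
greedy = GREEDY_TAIL.match(LINE).groupdict()

print(repr(lazy["msg"]))
for name, value in greedy.items():
    print(name, "=>", repr(value))
```

The \s* on both sides of each colon is what keeps the surrounding spaces out of the captured fields.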

Regex for json with negative lookahead

I have the following json array:
[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
},
{
"permissionId": 2271,
"isPermissionActive": true
},
{
"permissionId": 8075,
"isPermissionActive": true
},
{
"permissionId": 2275,
"isPermissionActive": true
}
]
},
{
"roleId": 201,
"roleName": "B",
"Permissions": []
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]
I need to select the roleId of the JSON objects with non-empty permissions. I've tried using a negative lookahead, but it doesn't seem to work.
Before selecting the roleId, I wanted to get a correct match on the JSON object, so I tried \{(\s*.*?(?!"Permissions":\s*\[\]))*\}, but it seems to match all the objects and even the inner objects of the permission array. How should I change the lookahead to match properly? I'm currently matching in Sublime Text 3.
Try the following Regex:
(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))
Explanation:
The first part (?<="roleId": )\d* uses a positive lookbehind to find the roleId.
The second part (?=(?:[^{]*"Permissions": \[(?!]))) is a positive lookahead that requires a "Permissions" array to follow before the next {, and inside it the negative lookahead (?!]) rejects an array that closes immediately, i.e. an empty one.
See detailed explanation here
While this doesn't directly answer your question, I would like to point out that it might be more natural to process JSON with JavaScript or with one of many languages that can easily convert JSON to native types. Alternatively, you could use a JSON processing language such as jq:
jq '
map( select(.Permissions != []) | .roleId )
'
which produces an array of role IDs with non-empty permissions.
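The same non-regex approach can be written in a general-purpose language. Below is a small Python sketch using the standard json module; it assumes the array is available as a JSON string, the function name role_ids_with_permissions is hypothetical, and the sample is an abbreviated copy of the data from the question.

```python
import json


def role_ids_with_permissions(text):
    """Return the roleId of every role whose Permissions list is non-empty."""
    roles = json.loads(text)
    # An empty list is falsy in Python, so this keeps only non-empty ones.
    return [role["roleId"] for role in roles if role["Permissions"]]


# Abbreviated version of the array from the question.
sample = json.dumps([
    {"roleId": 128, "roleName": "B", "Permissions": []},
    {"roleId": 310, "roleName": "ROLE", "Permissions": [
        {"permissionId": 8074, "isPermissionActive": True},
        {"permissionId": 2271, "isPermissionActive": True},
    ]},
    {"roleId": 201, "roleName": "B", "Permissions": []},
    {"roleId": 5, "roleName": "B", "Permissions": []},
])

print(role_ids_with_permissions(sample))
```

Unlike the regex answers, this does not depend on key order or on how the JSON is indented.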
A lookbehind checks that the number is a "roleId".
And a lookahead checks that the "Permissions" array starts with a mustache, i.e. an opening {.
(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)
Test it here
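For anyone who wants to try the lookbehind/lookahead pattern above outside an editor, here is a sketch in Python's re module. It assumes the text is laid out as in the question, with "roleId" appearing before "Permissions" in each object and a single space after the colon. TEXT is a shortened sample and ROLE_ID_RE is a name picked for this example.

```python
import re

# Shortened sample, formatted like the array in the question.
TEXT = """[
  {
    "roleId": 128,
    "roleName": "B",
    "Permissions": []
  },
  {
    "roleId": 310,
    "roleName": "ROLE",
    "Permissions": [
      {
        "permissionId": 8074,
        "isPermissionActive": true
      }
    ]
  },
  {
    "roleId": 5,
    "roleName": "B",
    "Permissions": []
  }
]"""

# Lookbehind: the digits must directly follow '"roleId": '.
# Lookahead: without crossing a "}", a Permissions array must follow
# whose first non-space character is "{" (so the array is not empty).
ROLE_ID_RE = re.compile(r'(?<="roleId": )\d+(?=[^}]*?"Permissions":\s*\[\s*\{)')

matches = ROLE_ID_RE.findall(TEXT)
print(matches)
```

The [^}] class is what stops the lookahead from wandering out of an object with empty permissions and into the next object.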

Regex working with calculator but not with AWS lambda function

I'm having trouble with a regex that should gather all data between [ and ].
Testing with the program: http://regexr.com/
String data
{
"Items": [
{
"UserID": "1487840267893246",
"Timestamp": 1487204364877,
},
{
"UserID": "1487840267893336",
"Timestamp": 1487204364888,
}
],
"Count": 2,
"ScannedCount": 3
}
The code below (run in AWS Lambda) is intended to pull out all characters between [ and ] and output them. (\[[^]*\]) works in the regex tester above, but only returns "undefined" in Lambda. Why?
Items = data.match(/"(\[[^]*\])"/);
console.log(Items);
An alternative solution was to extract the data into an array as follows:
userID = data.match(/"UserID":"([^"]+)"/g);
console.log(userID);
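Try the dotall flag. JavaScript writes it as the s flag after the closing slash (ES2018 and later):
Items = data.match(/\[.*\]/s);
You don't need the capturing parentheses, and the double quotes inside the regex literal have to go: they are matched literally, and in the data no " sits directly next to the [ or the ].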
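The effect of the dotall flag is the same across regex engines: without it, . stops at a newline. A quick way to see this is the following Python sketch, where the flag is spelled re.DOTALL; the data string is copied from the question and the variable names are only for illustration.

```python
import re

# String data from the question (not strictly valid JSON because of the
# trailing commas, which is why it is treated as plain text here).
data = """{
"Items": [
{
"UserID": "1487840267893246",
"Timestamp": 1487204364877,
},
{
"UserID": "1487840267893336",
"Timestamp": 1487204364888,
}
],
"Count": 2,
"ScannedCount": 3
}"""

# Without DOTALL, "." cannot cross a newline, so "[" and "]" would have
# to sit on the same line for the pattern to match.
without_flag = re.search(r"\[.*\]", data)

# With DOTALL, ".*" spans lines and runs from "[" to the last "]".
with_flag = re.search(r"\[.*\]", data, re.DOTALL)

print(without_flag)
print(with_flag.group(0))
```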