GROK (regular expressions), field with backslash, space and a long

I'm using Logstash to extract some text from a string and create fields from it.
The message string is:
"\"07/12/2016 16:21:24.652\",\"13.99\",\"1467351040\""
I can't figure out how to get three results. The first:
07/12/2016 16:21:24.652
The second:
13.99
The third:
1467351040

match => {
"message"=> [
"\\"%{DATESTAMP:a}\\",\\"%{NUMBER:b}\\",\\"%{NUMBER:c}\\""
]
}
To help the next time you have to craft a grok pattern:
GrokConstructor, to test your pattern
The main patterns
Grok filter documentation

That's the correct line indeed.
I had to remove one backslash for my own config. Thanks very much, this saved me a lot of time.
grok{ match => { "message"=> [ "\"%{DATESTAMP:a}\",\"%{NUMBER:b}\",\"%{NUMBER:c}\"" ]} }
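
For anyone who wants to check the idea outside Logstash, here is a rough Python sketch of the same match. It assumes the real message contains plain quotes (the backslashes in the question being only display escaping), and the sub-patterns for the date and the numbers are simplified stand-ins I made up, not grok's actual DATESTAMP and NUMBER definitions.

```python
import re

# Assumed raw message: three quoted values separated by commas.
message = '"07/12/2016 16:21:24.652","13.99","1467351040"'

# Simplified stand-ins for grok's DATESTAMP and NUMBER (assumptions,
# far narrower than the real grok patterns).
pattern = re.compile(
    r'"(?P<a>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)",'
    r'"(?P<b>-?\d+(?:\.\d+)?)",'
    r'"(?P<c>-?\d+(?:\.\d+)?)"'
)

match = pattern.fullmatch(message)
# groupdict() plays the role of the fields a, b and c that grok would create.
fields = match.groupdict() if match else {}
print(fields)
```

The named groups a, b and c mirror the field names used in the grok pattern above.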

Related

Extracting message content from JSON using regular expressions

I need to extract the content of messages from JSON. I am not allowed to use a JSON parser, so I tried regular expressions, but I got stuck on extracting the message content. I am using C++.
Here's an example of the JSON file:
{
"id":"776752463986294785",
"type":0,
"content":"\"",
"channel_id":"762106839054811176",
"author":{
"id":"487706666905894923",
"username":"Emzak",
"avatar":"a70859ecda1355dfd55bddcfd0194458",
"discriminator":"6235",
"public_flags":0
},
"attachments":[
],
"embeds":[
],
"mentions":[
],
"mention_roles":[
],
"pinned":false,
"mention_everyone":false,
"tts":false,
"timestamp":"2020-11-13T10:16:58.777000+00:00",
"edited_timestamp":null,
"flags":0
}
As I said, I need the content field. My current regex is:
"content"[ :]+(\"[^"]*\")
This works unless there are quotation marks in the content. If there are any, they are always escaped, but I haven't found a way to get past them. With quotation marks, my current regex gives me this string:
"content": "\"
That would be a problem if any message text followed that quotation mark. I would like to get this string:
"content": "\""
Any help would be appreciated. Thanks :)

You can make it skip over an escaped \" as follows:
"content"[ :]+(\"(?:\\.|[^"])*\")
It adds a non-capturing group that matches either a \ together with the character that follows it, or any character allowed by the original [^"] class.
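
Here is a small Python sketch of that approach (Python rather than C++, only to keep it short). The input string is a made-up fragment, and I use [^"\\] instead of [^"] so that a backslash can only ever be consumed as part of an escape pair. That is a slight variation on the regex above, not a requirement.

```python
import re

# Hypothetical JSON fragment whose content value contains escaped quotes.
raw = r'{"id":"1","content":"\" said \"hi\"","type":0}'

# \\. consumes a backslash plus the character it escapes;
# [^"\\] consumes any other character except a bare quote or backslash,
# so the match ends at the first unescaped quote.
pattern = re.compile(r'"content"[ :]+("(?:\\.|[^"\\])*")')

match = pattern.search(raw)
content = match.group(1) if match else None
print(content)
```

The same pattern text should carry over to std::regex in C++ (ECMAScript syntax), as long as the backslashes are doubled again inside the C++ string literal or a raw string literal is used.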

Query document based on field's value containing backslash using regex

I'm trying to query a DB with documents similar to the one below.
{
"_id":"5b9bd1b947c7471038399a39",
"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5",
}
How do I filter all documents whose subdir starts with ge\\pt02\\kr02?
I tried many different approaches,
for example:
{"subdir": {"$regex": "pt02\\kr02*"}}
but I cannot figure out how to write a correct filter.

The problem is that you need to escape the backslashes.
Here is a working example:
db.test1.insert({"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5"})
db.test1.find({"subdir": { $regex: "^ge\\\\pt02\\\\kr02"}})
This prints out:
{ "_id" : ObjectId("5ba28194fbb45cb9f7c58b18"), "subdir" : "ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5" }

We need to escape the backslashes there. Also, since you only want the documents that start with this pattern, anchor the regex with a caret. Wrapping the pattern in parentheses is optional here, but harmless. This gives us the following regex:
let pattern = "^(ge\\\\pt02\\\\kr02)";
{"subdir": {"$regex": pattern}}

Remove an object from JSON using RegEx

I have JSON objects in this format:
{
"1f626": {
"name": "frowning face with open mouth",
"ascii": [],
"code_points": {
"base": "1f626",
"default_matches": [
"1f626"
],
"greedy_matches": [
"1f626"
],
"decimal": ""
}
}
}
I have to remove the code_points object using Regular Expressions.
I have tried using this RegEx:
(("code\w+)(.*)(}))
But it is only selecting the first line.
I have to select until end of curly brackets in order to fully get rid of the code_points object.
How can I do that?
Note: I have to remove it using Regular Expressions and not JavaScript.
Please don't post any JavaScript answers or mark this as a possible duplicate of a JavaScript-based question.

Alternatively, at the command line, if you can use jq:
jq "del(.[].code_points)" <monster.json >smaller_monster.json
This deletes the code_points key inside each 2nd-level object.
It took my machine about 5 seconds on a 60MB document.
It's not a regular expression but it's not JavaScript, either. So, it meets half of your non-functional requirements.

("code_points")([\s\S]*?)(})
The problem you had is that . matches any character except \n. In this case I usually use [\s\S], which means any whitespace or non-whitespace character (so effectively any character). You should also make the * quantifier lazy by adding ?.
Remember that this regular expression won't work properly if code_points contains an inner object (another {}).
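
A quick way to try the lazy [\s\S]*? idea is the Python sketch below. It is run through Python only to show the regex at work; the pattern itself is what matters. Two things here are my own additions and not part of the regex above: the pattern also swallows the comma before "code_points" so the result stays valid JSON (this assumes code_points is never the first key), and the sample document is shortened.

```python
import json
import re

doc = '''{
  "1f626": {
    "name": "frowning face with open mouth",
    "ascii": [],
    "code_points": {
      "base": "1f626",
      "default_matches": ["1f626"],
      "decimal": ""
    }
  }
}'''

# Lazy [\s\S]*? stops at the first closing brace, which is only correct
# while code_points contains no nested object.
pattern = re.compile(r',\s*"code_points"\s*:\s*\{[\s\S]*?\}')

stripped = pattern.sub("", doc)
# json.loads is used here only to confirm the result is still valid JSON.
parsed = json.loads(stripped)
print(parsed)
```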

Using RegEx select line based on positive criteria but exclude certain lines based on negative

I apologize if there is an answer for this somewhere, but my search skills have failed me if there is.
I'm using UltraEdit and there are lines I need to remove from some JSON schemas to make comparing them easier.
So given the following:
"PaymentMethod": {
"$id": "/properties/PaymentMethod",
"items": {
"$ref": "PaymentMethod.json"
},
"type": "array"
}
Using this RegEx:
^.*\".*\"\: \".*$\r\n
Selects these lines:
"$id": "/properties/PaymentMethod",
"$ref": "PaymentMethod.json"
"type": "array"
What I need to do is skip the $ref line.
I've tried to get negative lookaround to work using (?!json) in various ways with the selection criteria and have failed miserably.
The purpose of this is to delete the selected lines.
Thanks for any help.
Update:
To clarify, there are lines I want to delete that match the criteria my Regex finds, but I do not want to delete the $ref line.
So I was hoping to find a relatively easy way to do this using straight up perl regex within UltraEdit.
I've created a workaround with a Python script so I can get my work done, but it would still be interesting to find out if there is a way to do this. :)

Don't write your own broken parser; use an existing one.
use Cpanel::JSON::XS qw( decode_json );
my $json_utf8 = '...';
my $data = decode_json($json_utf8);
my $payment_method = $data->{PaymentMethod};
my $id = $payment_method->{'$id'};
my $item = $payment_method->{items}{'$ref'};
my $type = $payment_method->{type};

Using a JSON module to manipulate the data directly would be a more robust solution, but a quick and dirty way to edit the JSON file is to run this on the command line.
Again, this is not a good way to manipulate JSON, but it may be suitable for your ad hoc case.
perl -ne 'print unless /"\$ref":/' file.json > new_file.json

If you don't want $ref after the initial quotes:
^.*\"(?!\$ref).*\"\: \".*$\r\n
see it here: https://regex101.com/r/yxTwck/1
or to exclude based on .json" at the end of the line:
^.*\".*\"\: \".*(?<!\.json")$\r\n
Also note that you are using greedy quantifiers (.* vs. .*?). If your idea is to have the first .* stop at the first ", you should probably use [^\n"]* which will prevent line feeds or "s from being consumed. Your regex matches """""""""""""" "type": "array", for example.
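
To see the negative lookahead and the [^\n"]* suggestion working together, here is a Python sketch. It is not UltraEdit, so the line ending is a plain \n instead of \r\n, and the indentation in the sample is my assumption.

```python
import re

schema = '''"PaymentMethod": {
  "$id": "/properties/PaymentMethod",
  "items": {
    "$ref": "PaymentMethod.json"
  },
  "type": "array"
}
'''

# Match whole lines of the form "key": "value..., unless the key starts with $ref.
# [^\n"]* cannot run past a quote, so the lookahead is tested right after the
# first quote on the line.
pattern = re.compile(r'^[^\n"]*"(?!\$ref)[^\n"]*": ".*\n', re.MULTILINE)

cleaned = pattern.sub("", schema)
print(cleaned)
```

The $id and type lines are deleted and the $ref line stays, which is the behaviour asked for in the question.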

logstash grok filter regular expression works in debug tool but failed in actual execution

I'm trying to extract a field from a log line. I use http://grokdebug.herokuapp.com/ to debug my regular expression:
(?<action>(?<=action=).*(?=\&))
with input text like this:
/event?id=123&action={"power":"on"}&package=1
I was able to get a result like this:
{
"action": [
"{"power":"on"}"
]
}
But when I copy this into my Logstash config file:
input { stdin{} }
filter {
grok {
match => { "message" => "(?<action>(?<=action=).*(?=\&))"}
}
}
output { stdout {
codec => 'json'
}}
the output says the match failed:
{"message":" /event?id=123&action={\"power\":\"on\"}&package=1","#version":"1","#timestamp":"2016-01-05T10:30:04.714Z","host":"xxx","tags":["_grokparsefailure"]}
I'm using logstash-2.1.1 in Cygwin.
Any idea why this happens?

You are probably hitting an issue caused by the greedy dot subpattern .*. Since you only want the text after action= up to the next & or the end of the string, a negated character class [^&] is a better fit.
So, use
[?&]action=(?<action>[^&]*)
The [?&] matches either a ? or & and works as a boundary here.
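
The same pattern can be checked quickly in Python. The only change is that Python spells named groups (?P<action>...) instead of (?<action>...); the input line is the one from the question.

```python
import re

line = '/event?id=123&action={"power":"on"}&package=1'

# [?&] anchors the match to the start of a query parameter;
# [^&]* stops at the next & or at the end of the string.
pattern = re.compile(r'[?&]action=(?P<action>[^&]*)')

match = pattern.search(line)
action = match.group("action") if match else None
print(action)
```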

It doesn't answer your regexp question, but...
Parse the query string to a separate field and use the kv{} filter on it.
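
As an illustration of the key/value idea outside Logstash, this Python sketch splits the query string with the standard library instead of a regex. It is only an analogy for what the kv filter does, not Logstash configuration.

```python
from urllib.parse import parse_qs, urlsplit

line = '/event?id=123&action={"power":"on"}&package=1'

# Take everything after the ? and split it into key/value pairs.
query = urlsplit(line).query
params = parse_qs(query)

# parse_qs returns a list per key because a key may repeat.
action = params["action"][0]
print(params)
```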
