Remove an object from JSON using RegEx - regex

I have JSON objects in this format:
{
"1f626": {
"name": "frowning face with open mouth",
"ascii": [],
"code_points": {
"base": "1f626",
"default_matches": [
"1f626"
],
"greedy_matches": [
"1f626"
],
"decimal": ""
}
}
}
I have to remove the code_points object using Regular Expressions.
I have tried using this RegEx:
(("code\w+)(.*)(}))
But it is only selecting the first line.
I have to select until end of curly brackets in order to fully get rid of the code_points object.
How can I do that?
Note: I have to remove it using Regular Expressions and not JavaScript.
Please don't post any JavaScript answers or mark this as a possible duplicate of a JavaScript-based question.

Alternatively, if you can use jq at the command line:
jq "del(.[].code_points)" <monster.json >smaller_monster.json
This deletes the code_points key inside each 2nd-level object.
It took my machine about 5 seconds on a 60MB document.
It's not a regular expression but it's not JavaScript, either. So, it meets half of your non-functional requirements.
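If jq is not available, the same deletion can be sketched with a JSON parser in another non-JavaScript language. The Python below is only an illustration: the function name drop_code_points is made up for this example, the sample is a shortened version of the question's data, and it assumes every top-level value is itself an object.

```python
import json


def drop_code_points(doc: dict) -> dict:
    """Return a copy of doc with "code_points" removed from each second-level object."""
    return {
        key: {k: v for k, v in entry.items() if k != "code_points"}
        for key, entry in doc.items()
    }


# Shortened version of the sample from the question.
sample = json.loads('''
{
  "1f626": {
    "name": "frowning face with open mouth",
    "ascii": [],
    "code_points": {"base": "1f626", "decimal": ""}
  }
}
''')

cleaned = drop_code_points(sample)
print(json.dumps(cleaned, indent=2))
```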

("code_points")([\s\S]*?)(})
The problem is that . matches any character except \n, so in cases like this I usually use [\s\S], which matches any whitespace or non-whitespace character (that is, any character at all). You should also make the * quantifier lazy by adding ?.
Keep in mind that this regular expression won't work properly if code_points contains an inner object (another pair of {}).
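Here is a small Python sketch of that pattern applied to a shortened version of the sample. One thing it makes visible: removing only the key and its braces leaves the comma after "ascii": [] behind, so the result is no longer valid JSON. The second pattern (PATTERN_WITH_COMMA) is my own hypothetical variant that also swallows the preceding comma; it assumes code_points is the last member of its parent and has no nested objects.

```python
import json
import re

# The answer's pattern: lazy [\s\S]*? spans newlines and stops at the first "}".
PATTERN = re.compile(r'("code_points")([\s\S]*?)(})')

# Hypothetical variant: also consume the comma and whitespace before the key.
PATTERN_WITH_COMMA = re.compile(r',\s*"code_points"[\s\S]*?}')

text = '''{
"1f626": {
"name": "frowning face with open mouth",
"ascii": [],
"code_points": {
"base": "1f626",
"default_matches": [
"1f626"
],
"decimal": ""
}
}
}'''


def is_valid_json(s: str) -> bool:
    """Report whether s parses as JSON."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


stripped = PATTERN.sub("", text)
stripped_clean = PATTERN_WITH_COMMA.sub("", text)

print(is_valid_json(stripped))        # False: a trailing comma is left behind
print(is_valid_json(stripped_clean))  # True
```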

Related

Extracting message content from JSON using regular expressions

I need to extract the content of messages from JSON. I am not allowed to use a JSON parser, so I tried using regular expressions, but I got stuck on extracting the message content. I am using C++.
Here's an example of the JSON file:
{
"id":"776752463986294785",
"type":0,
"content":"\"",
"channel_id":"762106839054811176",
"author":{
"id":"487706666905894923",
"username":"Emzak",
"avatar":"a70859ecda1355dfd55bddcfd0194458",
"discriminator":"6235",
"public_flags":0
},
"attachments":[
],
"embeds":[
],
"mentions":[
],
"mention_roles":[
],
"pinned":false,
"mention_everyone":false,
"tts":false,
"timestamp":"2020-11-13T10:16:58.777000+00:00",
"edited_timestamp":null,
"flags":0
}
As I said, I need the content field. My current regex is:
"content"[ :]+(\"[^"]*\")
Which works unless there are quotation marks in the content. If there are any, they are always escaped, but I haven't found a way to get past them. With quotation marks my current regex gives me this string :
"content": "\"
Which would be problematic if there was any message behind that quotation mark. I would like to get this string :
"content": "\""
Any help would be appreciated, Thanks :)
You can make it skip over escaped quotes (\") as follows:
"content"[ :]+(\"(?:\\.|[^"])*\")
It uses a non-capturing group that matches either a backslash together with the character that follows it (\\.) or, as in the original pattern, any character other than a quote ([^"]).
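The question is about C++, but the pattern itself is not language-specific, so here is a quick check in Python. The sample strings are shortened, made-up versions of the message from the question.

```python
import re

# Original pattern from the question: stops at the first quote, escaped or not.
ORIGINAL_RE = re.compile(r'"content"[ :]+(\"[^"]*\")')

# Pattern from the answer: a backslash always consumes the character after it.
CONTENT_RE = re.compile(r'"content"[ :]+(\"(?:\\.|[^"])*\")')

sample = r'{"type":0,"content":"\"","channel_id":"762106839054811176"}'
longer = r'{"content":"say \"hi\" now","flags":0}'

print(ORIGINAL_RE.search(sample).group(1))  # "\"
print(CONTENT_RE.search(sample).group(1))   # "\""
print(CONTENT_RE.search(longer).group(1))   # "say \"hi\" now"
```

Because \\. is tried before [^"], an escaped quote is consumed as a pair and never ends the match early.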

What is the correct way to format regex metacharacters and options when using the regex operator in $searchBeta in MongoDB?

I'm trying to do full-text search in MongoDB with $searchBeta (aggregation), and I'm using the 'regex' operator to do so. Here's the portion of the $searchBeta stage that isn't working the way I expected:
$searchBeta: {
regex: {
query: '\blightn', // '\b' is the word boundary metacharacter
path: ["name", "set_name"],
allowAnalyzedField: true
}
}
Here's an example of two documents that I'm expecting to get matched by the expression:
{
"name": "Lightning Bolt"
"set_name": "Masters 25"
},
{
"name": "Chain Lightning",
"set_name": "Battlebond"
}
What I actually get:
[] //empty array
If I use an expression like:
$searchBeta: {
regex: {
query: '[a-zA-Z]',
path: ["name", "set_name"],
allowAnalyzedField: true
}
}
then I get results back.
I can't get any expression that has regex metacharacters and/or options in it to work, so I'm pretty sure I'm just entering it wrong in my query string. The $searchBeta regex documentation doesn't really cover how to format metacharacters into your query string. Also, the $searchBeta regex operator is different from $regex because it doesn't require slashes (i.e. "/your expression/" ). Really pulling my hair out on something so simple that I can't figure out.
$searchBeta uses Lucene for regular expressions, which is not Perl-compatible (PCRE) and doesn't support \b. The Lucene regex syntax documentation describes what is supported, and Elastic's docs on it are also helpful.
There is a similar question for Elasticsearch that includes some workarounds.

Extracting a value from JSON Response using Regular Expression Extractor in Jmeter

I have a JSON response from which I want to extract the "transaction_id" value (3159184 in this case) and use it in my next sampler. Can somebody give me a regular expression to extract that value? I have looked at some solutions, but they don't seem to work.
{
"lock_release_date": "2021-04-03T16:16:59.7800000+00:00",
"party_id": "13623162",
"reservation_id": "reserve-1-81b70981-f766-4ca7-a423-1f66ecaa7f2b",
"reservation_line_items": [
{
"extended_properties": null,
"inventory_pool": "available",
"lead_type": "Flex",
"line_item_id": "1",
"market_id": 491759,
"market_key": "143278|CA|COBROKE|CITY|FULL",
"market_name": "143278",
"market_state_id": "CA",
"product_name": "Local Expert",
"product_size": "SOV30",
"product_type": "Postal Code",
"reserved_quantity": 0,
"transaction_id": 3159174
}
],
"reserved_by": "user1@abc.com"
}
Here's what i'm trying in Jmeter
If you really want the regular expression it would be something like:
"transaction_id"\s?:\s?(\d+)
where:
\s? stands for an optional whitespace character; leaving it out is why your expression doesn't work
\d+ stands for one or more digits, i.e. the number
See Regular Expressions chapter of JMeter User Manual for more details.
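To try the expression outside JMeter, here is a minimal Python sketch. The response body is a trimmed-down copy of the one in the question, and the compact string at the end is a made-up example to show that the optional whitespace also covers responses with no space after the colon.

```python
import re

TRANSACTION_RE = re.compile(r'"transaction_id"\s?:\s?(\d+)')

# Trimmed-down response body; only a few members are kept.
response = '''{
  "party_id": "13623162",
  "reservation_line_items": [
    {
      "reserved_quantity": 0,
      "transaction_id": 3159174
    }
  ]
}'''

match = TRANSACTION_RE.search(response)
transaction_id = match.group(1) if match else None
print(transaction_id)  # 3159174

# Hypothetical compact response: no whitespace around the colon.
compact = TRANSACTION_RE.search('{"transaction_id":42}')
print(compact.group(1))  # 42
```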
Be aware that parsing JSON using regular expressions is not the best idea, consider using JSON Extractor instead. It allows fetching "interesting" values from JSON using simple JsonPath queries which are easier to create/read and they are more robust and reliable. The relevant JSON Path query would be:
$.reservation_line_items[0].transaction_id
More information: API Testing With JMeter and the JSON Extractor
Use the JSON Extractor for a JSON response rather than the Regular Expression Extractor.
Use a JSON Path expression such as $..transaction_id
Simplest Regular Expression for extracting above is:
transaction_id": (.+)
Where:
() is used for creating capture group.
. (dot) matches any character except line breaks.
+ (plus) matches 1 or more of the preceding token.
(.+?) makes the quantifier lazy, causing it to match as few characters as possible. By default, quantifiers are greedy and match as many characters as possible.
Note that a lazy group only helps when something follows it in the pattern (for example a comma or a closing quote); at the very end of the pattern, (.+?) matches just a single character.
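The difference between the greedy and lazy forms is easy to check on a single line. This is a Python sketch rather than JMeter itself, and the input line (with a trailing comma) is a made-up variation of the response, but these quantifiers behave the same way in the common regex flavours.

```python
import re

# Made-up line: the value is followed by a comma, as it would be mid-object.
line = '"transaction_id": 3159174,'

greedy = re.search(r'transaction_id": (.+)', line).group(1)
lazy = re.search(r'transaction_id": (.+?)', line).group(1)
lazy_delimited = re.search(r'transaction_id": (.+?),', line).group(1)

print(greedy)          # 3159174,
print(lazy)            # 3
print(lazy_delimited)  # 3159174
```

The greedy version drags the comma along, the bare lazy version stops after one character, and the lazy version with an explicit delimiter returns exactly the number.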

GROK (regular expressions), field with backslash, space and a long

I'm using Logstash to get some text out of a string and create a field.
The string of the message is:
"\"07/12/2016 16:21:24.652\",\"13.99\",\"1467351040\""
I can't figure out how to get three results, the first being:
07/12/2016 16:21:24.652
The second
13.99
The third
1467351040
match => {
"message"=> [
"\\"%{DATESTAMP:a}\\",\\"%{NUMBER:b}\\",\\"%{NUMBER:c}\\""
]
}
To help the next time you have to craft a grok pattern:
GrokConstructor, to test your pattern
The main patterns
Grok filter documentation
That's the correct line indeed.
I had to remove one backslash for my own config. Thanks very much. Saves me a lot of time and stuff.
grok{ match => { "message"=> [ "\"%{DATESTAMP:a}\",\"%{NUMBER:b}\",\"%{NUMBER:c}\"" ]} }
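For experimenting outside Logstash, the structure of that grok pattern can be approximated with plain named groups in Python. The DATESTAMP and NUMBER expressions below are my own rough stand-ins that fit this sample; the real grok definitions are broader. The message is assumed to be the unescaped form of the string in the question.

```python
import re

# Rough stand-ins for grok's DATESTAMP and NUMBER (assumptions, not grok's definitions).
DATESTAMP = r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+'
NUMBER = r'[+-]?\d+(?:\.\d+)?'

# Three quoted, comma-separated fields captured as a, b and c.
LINE_RE = re.compile(rf'"(?P<a>{DATESTAMP})","(?P<b>{NUMBER})","(?P<c>{NUMBER})"')

message = '"07/12/2016 16:21:24.652","13.99","1467351040"'

fields = LINE_RE.fullmatch(message).groupdict()
print(fields)
# {'a': '07/12/2016 16:21:24.652', 'b': '13.99', 'c': '1467351040'}
```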

logstash grok filter regular expression works in debug tool but failed in actual execution

I'm trying to extract a field out of a log line. I use http://grokdebug.herokuapp.com/ to debug my regular expression:
(?<action>(?<=action=).*(?=\&))
with input text like this:
/event?id=123&action={"power":"on"}&package=1
i was able to get result like this:
{
"action": [
"{"power":"on"}"
]
}
but when i copy this config to my logstash config file:
input { stdin{} }
filter {
grok {
match => { "message" => "(?<action>(?<=action=).*(?=\&))"}
}
}
output { stdout {
codec => 'json'
}}
the output says matching failed:
{"message":" /event?id=123&action={\"power\":\"on\"}&package=1","@version":"1","@timestamp":"2016-01-05T10:30:04.714Z","host":"xxx","tags":["_grokparsefailure"]}
I'm using logstash-2.1.1 in Cygwin.
Any idea why this happens?
You are probably running into an issue caused by the greedy dot-matching subpattern .*. Since you are only interested in the text after action= up to the next & or the end of the string, you'd be better off using a negated character class, [^&].
So, use
[?&]action=(?<action>[^&]*)
The [?&] matches either a ? or & and works as a boundary here.
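A quick way to sanity-check this pattern outside Logstash is the Python sketch below. Python spells the named group (?P<action>...) instead of (?<action>...), and the second and third input lines are made up to show the boundary behaviour.

```python
import re

# Same idea as the grok pattern, using Python's named-group syntax.
ACTION_RE = re.compile(r'[?&]action=(?P<action>[^&]*)')

line = '/event?id=123&action={"power":"on"}&package=1'
print(ACTION_RE.search(line).group("action"))  # {"power":"on"}

# Hypothetical line where action is the last parameter: [^&]* runs to the end.
last = ACTION_RE.search('/event?id=123&action=off').group("action")

# Hypothetical line where "reaction" must not be mistaken for "action".
boundary = ACTION_RE.search('/event?reaction=1&action=2').group("action")
print(last, boundary)  # off 2
```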
It doesn't answer your regexp question, but...
Parse the query string to a separate field and use the kv{} filter on it.
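The idea of splitting the query string into key/value pairs first, rather than writing a regex per field, looks like this in Python. This is only an analogy for what the kv{} filter does, not Logstash configuration, and it reuses the request line from the question.

```python
import json
from urllib.parse import parse_qs, urlsplit

line = '/event?id=123&action={"power":"on"}&package=1'

# Take everything after "?" and split it into key/value pairs on "&" and "=".
params = parse_qs(urlsplit(line).query)
print(params["action"][0])  # {"power":"on"}

# The action value is itself JSON, so it can be parsed further if needed.
action = json.loads(params["action"][0])
print(action["power"])  # on
```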