Regex to capture text between two characters that contains a given word - regex

I am trying to find the best way to extract all the text between two characters (ignoring line breaks) when that text contains a given word.
In the example below, I want to find the block containing zip 22222 and extract/group it from { to }, that is:
{
"zip":"22222",
"total":2
}
Example:
{
"zip":"11111",
"total":1
},
{
"zip":"22222",
"total":2
},
{
"zip":"33333",
"total":3
}
I want to extract/capture/group the block {...} for zip 22222, as below:
{
"zip":"22222",
"total":2
}
I tried the regex below, but it captures the blocks for all zip codes:
(?s)(?<={)(.*?)(?=})
https://regex101.com/r/0wTDyj/1

The regex below worked for me:
(?s)(?={)(?:(?:(?!"zip").)*?"zip"\s*:\s*"22222".*?)(?<=})
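A minimal sketch of how a pattern of this shape behaves, written in Python rather than on regex101. It assumes the quantifiers are *? after the tempered group and \s* around the colon, and it escapes the braces. The helper name extract_block and the SAMPLE constant are illustrative only.

```python
import re
from typing import Optional

# Sample data copied from the question.
SAMPLE = """{
"zip":"11111",
"total":1
},
{
"zip":"22222",
"total":2
},
{
"zip":"33333",
"total":3
}"""


def extract_block(text: str, zip_code: str) -> Optional[str]:
    """Return the {...} block whose "zip" value equals zip_code, or None."""
    # Assumption: blocks are flat (no nested braces) and "zip" appears once per block.
    pattern = (
        r'(?=\{)(?:(?!"zip").)*?"zip"\s*:\s*"'  # start at {, never skip past "zip"
        + re.escape(zip_code)                    # the zip value we are looking for
        + r'".*?(?<=\})'                         # shortest run ending right after }
    )
    # re.DOTALL plays the role of (?s): the dot also matches newlines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None


print(extract_block(SAMPLE, "22222"))
```

The tempered group (?:(?!"zip").)*? is what stops a match that starts at an earlier { from running on into a later block: it cannot step over the first "zip" it meets, so a block with a different zip simply fails.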

Related

Logstash Grok Regex: get each line in each block

I need a custom logstash-grok regex pattern
Some sample data:
abc blabla
[BLOCK]
START=1
END=2
[/BLOCK]
more blabla
[BLOCK]
START=3
END=4
[/BLOCK]
Note: each line ends in a newline character.
How do I capture all START and END values?
The desired result is:
{ "BLOCK1": { "START":"1", "END":"2" }, "BLOCK2": { "START":"3", "END":"4" } }
I tried
START \bSTART=(?<start>\d*)
END \bEND=(?<end>\d*)
but the result is the values of only the first block:
{ "start": "1", "end": "2" }
I also tried putting the multiline flag (?m) in front of the grok pattern, but that doesn't work either...
Any help is appreciated.
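For illustration only, here is the matching idea sketched in Python. This is not a grok pattern and says nothing about how Logstash would need to be configured; it only shows that iterating over every [BLOCK] section, instead of stopping at the first match, yields the desired structure. The names BLOCK_RE and parse_blocks are hypothetical.

```python
import re

# Sample data copied from the question.
LOG = """abc blabla
[BLOCK]
START=1
END=2
[/BLOCK]
more blabla
[BLOCK]
START=3
END=4
[/BLOCK]
"""

# One match per [BLOCK]...[/BLOCK] section; DOTALL lets .*? cross newlines.
# Assumption: START always comes before END inside a block.
BLOCK_RE = re.compile(
    r"\[BLOCK\].*?\bSTART=(?P<start>\d+).*?\bEND=(?P<end>\d+).*?\[/BLOCK\]",
    re.DOTALL,
)


def parse_blocks(text):
    """Map BLOCK1, BLOCK2, ... to the START and END values of each block."""
    return {
        f"BLOCK{i}": {"START": m.group("start"), "END": m.group("end")}
        for i, m in enumerate(BLOCK_RE.finditer(text), start=1)
    }


print(parse_blocks(LOG))
```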

Extracting multiple values with RegEx in a Google Sheet formula

I have a Google spreadsheet with 2 columns.
Each cell of the first one contains JSON data, like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
Then I want a second column that uses a formula to extract every value of name and join them like this:
Love,You
Right now I am using this formula:
=REGEXEXTRACT(A1, CONCATENER(CHAR(34),"name",CHAR(34),":",CHAR(34),"([^",CHAR(34),"]+)",CHAR(34),","))
The regex being "name":"([^"]+)",
The problem is that it currently returns only the first occurrence, like this:
Love
(Also, I don't know how many occurrences of "name" there are. It could be anywhere from 0 to around 20.)
Is it even possible to achieve what I want?
Thank you so much for reading!
EDIT:
My JSON data starts with:
{
"time":4,
"annotations":[
{
Then in the middle, something like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
and ends with:
],
"topEntities":[
{
"id":247120,
"score":0.12561166,
"uri":"http://en.wikipedia.org/wiki/Revenue"
},
{
"id":31512491,
"score":0.12504959,
"uri":"http://en.wikipedia.org/wiki/Wii_U"
}
],
"lang":"en",
"langConfidence":1.0,
"timestamp":"2020-05-22T12:17:47.380"
}
Since your text is basically a JSON string, you may parse all name fields from it using the following custom function:
function ExtractNamesFromJSON(input) {
  var obj = JSON.parse("[" + input + "]");
  var results = obj.map((x) => x["name"]);
  return results.join(",");
}
Then use it as =ExtractNamesFromJSON(C1).
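The same parse-then-join idea, sketched in Python for readers who want to try it outside Sheets. It assumes the cell holds only the comma-separated objects shown at the top of the question; the names CELL and extract_names_from_json are illustrative.

```python
import json

# Cell content as shown at the top of the question.
CELL = """{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}"""


def extract_names_from_json(fragment):
    """Wrap the comma-separated objects in [ ] and join their "name" values."""
    objects = json.loads("[" + fragment + "]")
    # Skip objects without a "name" key instead of failing on them.
    return ",".join(obj["name"] for obj in objects if "name" in obj)


print(extract_names_from_json(CELL))
```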
If you need a regex, use a similar approach:
function ExtractAllRegex(input, pattern, groupId, separator) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]).join(separator);
}
Then use it as =ExtractAllRegex(C1, """name"":""([^""]+)""",1,",").
Note:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
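A rough Python equivalent of the four-parameter function, to show the "collect every match, then join" step in isolation. The parameter names mirror the note above; extract_all_regex and CELL are illustrative names.

```python
import re

# Cell content as shown at the top of the question.
CELL = """{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}"""


def extract_all_regex(text, pattern, group_id, separator):
    """Join the chosen capturing group of every match of pattern in text."""
    return separator.join(m.group(group_id) for m in re.finditer(pattern, text))


print(extract_all_regex(CELL, r'"name":"([^"]+)"', 1, ","))
```

With zero matches the result is an empty string, which covers the "anywhere from 0 to around 20" case from the question.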

How do I query a field from Mongo where the value is actually a number but stored as string?

The confusing part is that the number in the string field can have leading zeros, but my query parameter will not contain them.
Object 1:
{
"_id" : ObjectId("5c3f6aec29c2e3193315b485"),
"flightCode" : "000541300157840"
}
Object 2:
{
"_id" : ObjectId("5c3f6aec29c2e3193315b485"),
"flightCode" : "00054130015784"
}
If my intent is to find flight code that matches number 54130015784, how will I write my query?
You need the $regex operator with the following regular expression:
var code = "54130015784";
var regex = "^0*" + code + "$";
db.col.find({ flightCode: { $regex: new RegExp(regex) } })
Here 0* means that the character 0 occurs zero or more times, and ^ and $ anchor the pattern to the whole string, so it matches both 00054130015784 and 54130015784 but not 000541300157840.
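To check the anchored pattern without a database, here is a small Python sketch that builds ^0*<number>$ and tries it against the two stored values from the question plus an unpadded one. It only exercises the regex itself, not MongoDB; leading_zero_pattern is a hypothetical helper name.

```python
import re


def leading_zero_pattern(number):
    """Build ^0*<number>$ so any count of leading zeros is accepted."""
    # re.escape is a precaution in case the value ever contains regex metacharacters.
    return re.compile("^0*" + re.escape(str(number)) + "$")


pattern = leading_zero_pattern(54130015784)
codes = ["000541300157840", "00054130015784", "54130015784"]
matches = [code for code in codes if pattern.search(code)]
print(matches)
```

The first value is rejected because of its trailing 0: without the $ anchor it would be accepted by mistake.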
If your data contains the flight code as text, so that the string can be identified, you can use this.
Regex:
54130015784(?="\n)
Explanation:
Positive Lookahead (?="\n)
Assert that the Regex below matches
" matches the character " literally (case sensitive)
\n matches a line-feed (newline) character (ASCII 10)
Example:
https://regex101.com/r/sF0YfH/3
Let me know if it works. If not, please describe more clearly what you want.

c# Regex expression to extract all non-numeric values in brackets

This is the regex I have built so far: \{([^{]*[^0-9])\}.
"This is the sample string {0} {1} {} {abc} {12abc} {abc123}"
I wish to extract everything in the string that is enclosed in curly brackets and does not contain only an integer, e.g.
'{}' '{abc}' '{12abc}' '{abc123}'
However, the last one, which has numbers at the end, is not extracted with the rest:
{abc123}
How can I extract all values in the string that are in curly brackets and are not just an integer?
You may use
var res = Regex.Matches(s, @"{(?!\d+})[^{}]*}")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
See the regex demo and the online C# demo.
Pattern details
{ - a { character
(?!\d+}) - a negative lookahead that fails the match if one or more digits followed by } appear immediately to the right of the current location
[^{}]* - zero or more characters other than { and }
} - a } character.
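The pattern details above can be checked quickly in Python, where the same pattern works once the braces are escaped. This is a sketch on the sample string from the question; the names SAMPLE, NON_INTEGER_BRACES and found are illustrative.

```python
import re

# Sample string copied from the question.
SAMPLE = "This is the sample string {0} {1} {} {abc} {12abc} {abc123}"

# { ... } groups whose content is not made up solely of digits.
NON_INTEGER_BRACES = re.compile(r"\{(?!\d+\})[^{}]*\}")

found = NON_INTEGER_BRACES.findall(SAMPLE)
print(found)
```

{0} and {1} are skipped because the lookahead sees digits followed directly by }, while {12abc} passes because the digits are followed by a letter.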

Regular Expression to match last word when string starts with pattern

I'm trying to create a regex to match the last word of a string, but only if the string starts with a certain pattern.
For example, I want to get the last word of a string only if the string starts with "The cat".
"The cat eats butter" -> would match "butter".
"The cat drinks milk"-> would match "milk"
"The dog eats beef" -> would find no match.
I know the following will give me the last word:
\s+\S*$
I also know that I can use a positive look behind to make sure a string starts with a certain pattern:
(?<=The cat )
But I can't figure out how to combine them.
I'll be using this in C#. I know I could combine it with some string comparison operators, but I'd like it all to be in one regex, as this is one of several regex pattern strings that I'll be looping through.
Any ideas?
Use the following regex:
^The cat.*?\s+(\S+)$
Details:
^ - Start of the string.
The cat - The required starting text.
.*? - Any sequence of characters, matched lazily (as few as possible).
\s+ - One or more whitespace characters.
(\S+) - A capturing group: one or more non-whitespace characters; this is what you want to capture.
$ - End of the string.
So the last word will be in the first capturing group.
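A short Python sketch of the same pattern, reading the last word from the first capturing group. The question targets C#, so this only illustrates the regex itself; LAST_WORD and last_word are hypothetical names.

```python
import re

# Anchored at both ends; group 1 holds the last whitespace-delimited word.
LAST_WORD = re.compile(r"^The cat.*?\s+(\S+)$")


def last_word(sentence):
    """Return the last word if the sentence starts with 'The cat', else None."""
    match = LAST_WORD.search(sentence)
    return match.group(1) if match else None


for text in ("The cat eats butter", "The cat drinks milk", "The dog eats beef"):
    print(text, "->", last_word(text))
```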
What about this one?
^The\scat.*\s(\w+)$
My regex knowledge is quite rusty, but couldn't you simply add the words you are looking for in front of \s+\S*$, given that you know it returns the last word?
Something like this (each "\" is meant to be an escape character so the letter after it is read literally):
\T\h\e\ \c\a\t\ \s+\S*$
Without Regex
No need for regex. Just use C#'s StartsWith together with Split(' ') and LINQ's Last().
See code in use here
using System;
using System.Linq;
using System.Text.RegularExpressions;
class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        foreach(string s in strings) {
            if(s.StartsWith("The cat")) {
                Console.WriteLine(s.Split(' ').Last());
            }
        }
    }
}
Result:
butter
milk
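The prefix-check-then-split approach looks much the same in other languages. A minimal Python sketch, assuming words are separated by whitespace; note that split() here splits on any run of whitespace, unlike Split(' ') in the C# code. The function name last_word_if_prefixed is illustrative.

```python
def last_word_if_prefixed(sentence, prefix="The cat"):
    """Return the last word of sentence when it starts with prefix, else None."""
    if not sentence.startswith(prefix):
        return None
    words = sentence.split()  # splits on any whitespace run
    return words[-1] if words else None


for text in ("The cat eats butter", "The cat drinks milk", "The dog eats beef"):
    print(text, "->", last_word_if_prefixed(text))
```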
With Regex
If you prefer a regex solution, however, you may use the following.
See code in use here
using System;
using System.Text.RegularExpressions;
class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        // .NET supports the variable-length lookbehind used here.
        Regex regex = new Regex(@"(?<=^The cat.*)\b\w+$");
        foreach(string s in strings) {
            Match m = regex.Match(s);
            if(m.Success) {
                Console.WriteLine(m.Value);
            }
        }
    }
}
Result:
butter
milk