Query document based on field's value containing backslash using regex

I'm trying to query a DB with documents similar to the one below.
{
"_id":"5b9bd1b947c7471038399a39",
"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5",
}
How can I filter all documents whose subdir starts with ge\\pt02\\kr02?
I tried many different approaches, for example:
{"subdir": {"$regex": "pt02\\kr02*"}}
but I cannot figure out how to write a correct filter.

The problem is that you need to escape the backslashes.
Here is a working example:
db.test1.insert({"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5"})
db.test1.find({"subdir": { $regex: "^ge\\\\pt02\\\\kr02"}})
This prints out:
{ "_id" : ObjectId("5ba28194fbb45cb9f7c58b18"), "subdir" : "ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5" }

The backslashes need to be escaped. Also, since you want only the documents that start with this pattern, anchor the regex with a caret (^); the parentheses that group the pattern are optional. This gives us the following regex:
let pattern = "^(ge\\\\pt02\\\\kr02)";
{"subdir": {"$regex": pattern}}
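For anyone building the filter from Python, here is a minimal sketch of the same escaping idea. It only uses the standard re module to build and check the pattern; the query_filter dict is just constructed, not sent to a server, and the assumption is that the escaped pattern means the same thing to MongoDB's regex engine as it does to Python's (true for plain backslash escaping).

```python
import re

# The stored value contains single backslashes; they are doubled here
# only because of Python's string-literal escaping.
subdir = "ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5"
prefix = "ge\\pt02\\kr02"

# re.escape doubles each backslash so the regex engine reads it literally;
# the caret anchors the match at the start of the field.
pattern = "^" + re.escape(prefix)

# Filter in the same shape as the answers above (built only, not executed).
query_filter = {"subdir": {"$regex": pattern}}

print(pattern)
print(bool(re.search(pattern, subdir)))
```

Letting re.escape produce the pattern avoids counting backslashes by hand.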

RegEx remove part of string and replace another part

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can remove the string from 1) e.g. ^[^&]+ above but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
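To see the find/replace pair in action, here is a small sketch in Python (the question is about C#, so treat this purely as an illustration of the pattern; Python writes the backreference as \1).

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Group 1 keeps everything from the slash after "category" through "value";
# the trailing "&.+" consumes the parameters that should be dropped.
result = re.sub(r"/category(/.+?value)&.+", r"/newcategory\1", url)

print(result)
```

Because "category" sits outside the group, the replacement name can be anything, not just a prefixed version of the old one.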
You haven't specified a language here; I mainly work in Python, so this solution is in Python.
import re  # `value` below holds the original URL
url = re.sub('category','newcategory',re.search('^https.*value', value).group(0))
Explanation
re.sub(a, b, c) replaces every match of a with b in the string c.
re.search looks for a pattern in a string and returns a match object; group(0) is the whole matched text. In the code above, re.search matches everything from "https" through "value".
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value
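If the URL shape is less predictable, another option is to take it apart with urllib.parse instead of matching text. This is only a sketch: the function name rewrite_url and its parameters are made up for illustration, and it assumes the parameter to keep is identified by its name (abc in the question).

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_url(url, old_segment, new_segment, keep_param):
    """Hypothetical helper: rename one path segment, keep one query parameter."""
    parts = urlsplit(url)
    # Replace only path segments that equal old_segment exactly.
    segments = [new_segment if s == old_segment else s
                for s in parts.path.split("/")]
    # Keep the raw "name=value" pairs whose name matches keep_param.
    kept = [p for p in parts.query.split("&")
            if p.split("=", 1)[0] == keep_param]
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments),
                       "&".join(kept), parts.fragment))

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = rewrite_url(url, "category", "newcategory", "abc")
print(new_url)
```

Unlike a plain string replace, this will not touch "category" if it happens to appear in the host name or in a query value.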

GROK (regular expressions), field with backslash, space and a long

I'm using Logstash to get some text out of a string and create a field.
The string of the message is:
"\"07/12/2016 16:21:24.652\",\"13.99\",\"1467351040\""
I can't figure out how to get three results, the first being:
07/12/2016 16:21:24.652
The second
13.99
The third
1467351040
match => {
"message"=> [
"\\"%{DATESTAMP:a}\\",\\"%{NUMBER:b}\\",\\"%{NUMBER:c}\\""
]
}
To help the next time you have to craft a grok pattern:
GrokConstructor, to test your pattern
The main patterns
Grok filter documentation
That's the correct line indeed.
I had to remove one backslash for my own config. Thanks very much. Saves me a lot of time and stuff.
grok{ match => { "message"=> [ "\"%{DATESTAMP:a}\",\"%{NUMBER:b}\",\"%{NUMBER:c}\"" ]} }
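For checking the logic outside Logstash, here is a rough Python equivalent of that pattern. It is a sketch only: the named groups a, b and c mirror the grok field names, but each one simply captures whatever sits between the quotes rather than applying grok's DATESTAMP and NUMBER definitions.

```python
import re

# The message exactly as it arrives: three quoted values separated by commas.
message = '"07/12/2016 16:21:24.652","13.99","1467351040"'

# Each named group captures the text between one pair of double quotes.
pattern = re.compile(r'"(?P<a>[^"]*)","(?P<b>[^"]*)","(?P<c>[^"]*)"')

match = pattern.fullmatch(message)
fields = match.groupdict() if match else {}
print(fields)
```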

logstash grok filter regular expression works in debug tool but failed in actual execution

I'm trying to extract a field out of a log line. I use http://grokdebug.herokuapp.com/ to debug my regular expression with:
(?<action>(?<=action=).*(?=\&))
with input text like this:
/event?id=123&action={"power":"on"}&package=1
I was able to get a result like this:
{
"action": [
"{"power":"on"}"
]
}
but when I copy this pattern into my Logstash config file:
input { stdin{} }
filter {
grok {
match => { "message" => "(?<action>(?<=action=).*(?=\&))"}
}
}
output { stdout {
codec => 'json'
}}
the output says matching failed:
{"message":" /event?id=123&action={\"power\":\"on\"}&package=1","@version":"1","@timestamp":"2016-01-05T10:30:04.714Z","host":"xxx","tags":["_grokparsefailure"]}
I'm using logstash-2.1.1 in Cygwin.
Any idea why this happens?
You might be experiencing an issue caused by the greedy dot-matching subpattern .*. Since you are only interested in the text after action= up to the next & or the end of the string, you'd better use a negated character class, [^&].
So, use
[?&]action=(?<action>[^&]*)
The [?&] matches either a ? or & and works as a boundary here.
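A quick way to see the difference between the two patterns is to run both on a query string with one more parameter after package. The sketch below uses Python, where named groups are written (?P<name>...); the extra &x=2 parameter is added here for illustration and is not in the original log line.

```python
import re

line = '/event?id=123&action={"power":"on"}&package=1&x=2'

# Greedy version from the question: .* runs to the LAST "&" in the line.
greedy = re.search(r'(?P<action>(?<=action=).*(?=&))', line)

# Negated character class: stops at the first "&" after action=.
bounded = re.search(r'[?&]action=(?P<action>[^&]*)', line)

print(greedy.group("action"))
print(bounded.group("action"))
```

The greedy pattern swallows the package parameter as well, while the negated class returns only the JSON value.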
It doesn't answer your regexp question, but...
Parse the query string to a separate field and use the kv{} filter on it.
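The same idea (split the query string into key/value pairs first, then pick the field) can be sketched in Python with the standard library. This is only an analogy for what the kv filter does, not Logstash configuration.

```python
import json
from urllib.parse import parse_qs, urlsplit

line = '/event?id=123&action={"power":"on"}&package=1'

# Isolate the query string, then split it into key/value pairs.
params = parse_qs(urlsplit(line).query)

# parse_qs returns a list per key; take the first value of "action".
action_raw = params["action"][0]
action = json.loads(action_raw)

print(action_raw)
print(action)
```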

Regex with non-capturing hashbangs

I'm trying to write a regex which will parse the hash portion of a URL, removing whichever conventionally-formatted hashbang may be present.
For example, I wish to remove any of the following:
#
#/
#!
#!/
This is what I currently have:
/[(?:#|#\/|#!|#!\/)]+/
However, this is capturing an empty group at the start, and splitting the remaining strings. For example,
"#!/E/F".split(/[(?:#|#\/|#!|#!\/)]/); // ["", "", "", "E", "F"]
Whereas the desirable outcome is simply a single group
["E/F"]
Could someone please point out the error in my regex?
[If it makes a difference, I produced the above output using the JavaScript console in Firebug.]
Use string.replace instead of string.split.
#!?\/?
Use the above regex and then replace the match with empty string.
> '#!/E/F'.replace(/#!?\/?/g, '');
'E/F'
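The reason the original pattern splits on every character is that the square brackets turn the whole thing into a character class, so (, ?, :, #, |, / and ! each match individually. Here is a sketch in Python that reproduces the symptom and the fix; strip_hashbang is a name chosen for illustration, and the pattern is anchored with ^ on the assumption that the input is just the hash portion.

```python
import re

def strip_hashbang(hash_part):
    """Remove a leading #, #/, #! or #!/ from the hash portion of a URL."""
    return re.sub(r"^#!?/?", "", hash_part)

# The bracketed pattern from the question is a character class, so every
# "#", "!" and "/" is a separate delimiter.
broken = re.split(r"[(?:#|#\/|#!|#!\/)]", "#!/E/F")

cleaned = [strip_hashbang(h) for h in ("#E/F", "#/E/F", "#!E/F", "#!/E/F")]

print(broken)
print(cleaned)
```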
Your regex seems awfully complicated. Maybe this is more what you're looking for:
"#!/E/F".split(/#!\/|#\/|#!|#/); // ["", "E/F"]
Did you check out the JavaScript regex documentation?
It might be different from what you imagined, since I don't understand why you're using the : and ? in your regex.
If you're using Javascript then you can just use:
location.assign(location.href.replace(/#.*$/, ""));
However, if you only want to remove the hashbangs listed above, use:
var repl = location.href.replace(/#(!\/?|\/)?$/, '');

Regex URI portion: Remove hyphens

I have to split URIs on the second portion:
/directory/this-part/blah
The issue I'm facing is that I have 2 URIs which logically need to be one
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*
(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
then you should use a replace method. Because I am not sure what language you are using, I will give my example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String regex = "(?<=directory/)(.+?)(?=/)";
String str = "/directory/house-&-home/blah";
Matcher m = Pattern.compile(regex).matcher(str);
String result = m.find() ? m.group(1).replace("&", "") : str; // "house--home"
The replace call removes every occurrence of a literal string (here the & symbol) from the matched segment by replacing it with "".
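Since one of the two URIs carries the ampersand percent-encoded as %26, it may help to decode the segment before stripping it so that both forms collapse to the same key. A minimal sketch in Python, assuming the segment of interest is always the second one in the path; category_key is a hypothetical helper name.

```python
from urllib.parse import unquote

def category_key(uri):
    """Return the second path segment with '&' (raw or %26) removed."""
    # "/directory/house-&-home/blah".split("/") gives ["", "directory", "house-&-home", "blah"].
    segment = uri.split("/")[2]
    # unquote turns %26 into &, so both spellings are handled by one replace.
    return unquote(segment).replace("&", "")

keys = [category_key("/directory/house-&-home/blah"),
        category_key("/directory/house-%26-home/blah")]
print(keys)
```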