Regex with multiple filters - regex

I want to create a filter that will match a string with the following interface:
Date: dd/mm/yyyy-dd/mm/yyyy
Name: string
ID: string
The string itself:
Date: 11/02/2020,Name:SO,ID:10
The regex I tried looks like this:
(Date:((((([13578]|0[13578]|1[02])[\/](0[1-9]|[1-9]|1[0-9]|2[0-9]|3[01]))|(([469]|0[469]|11)[\/]([1-9]|1[0-9]|2[0-9]|3[0]))|((2|02)([\/](0[1-9]|1[0-9]|2[0-8]))))[\/](19([6-9][0-9])|20([0-9][0-9])))|((02)[\/](29)[\/](19(6[048]|7[26]|8[048]|9[26])|20(0[048]|1[26]|2[048]))))(-)?((((([13578]|0[13578]|1[02])[\/](0[1-9]|[1-9]|1[0-9]|2[0-9]|3[01]))|(([469]|0[469]|11)[\/]([1-9]|1[0-9]|2[0-9]|3[0]))|((2|02)([\/](0[1-9]|1[0-9]|2[0-8]))))[\/](19([6-9][0-9])|20([0-9][0-9])))|((02)[\/](29)[\/](19(6[048]|7[26]|8[048]|9[26])|20(0[048]|1[26]|2[048]))))?|Name|ID)`
The problem with this regex is that it only captures the first filter.

I'm not sure I understood your problem, so I wrote a regex that matches the format you describe, whatever the dates and names are:
Date: (\d{2}\/\d{2}\/\d{4})-(\d{2}\/\d{2}\/\d{4})\sName:([a-zA-Z]+)
Can you edit your post and give more examples of strings that must match and strings that must not?
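One way to pick up all three filters rather than only the first is to match one key/value pair at a time and iterate over every match. The following is a minimal Python sketch, assuming the filters are comma-separated as in the sample string. FILTER_RE and parse_filters are illustrative names, and the date is not validated here.

```python
import re

# Assumed format: "Key: value" pairs separated by commas.
# Each match is one pair; the value runs up to the next comma.
FILTER_RE = re.compile(r"(?P<key>Date|Name|ID):\s*(?P<value>[^,]*)")

def parse_filters(text):
    """Return a dict mapping each filter name to its raw value."""
    return {
        m.group("key"): m.group("value").strip()
        for m in FILTER_RE.finditer(text)
    }

filters = parse_filters("Date: 11/02/2020,Name:SO,ID:10")
print(filters)  # {'Date': '11/02/2020', 'Name': 'SO', 'ID': '10'}
```

A stricter date pattern could then be applied to the Date value alone, which keeps the key/value pattern short.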

Related

Match a string between hash symbols

I have a string in this shape
State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0
I want to extract the State value via regex, so in the example above I only want to extract Received.
I have tried ([^State#])([A-Za-z]), which matches Received, but I am stuck at excluding the rest of the string: #ID#e23d8926-1327-4fde-9ea7-d364af3325e0
Don't put parentheses around the part you don't want to capture. My solution is:
State#(?'state'[^#]+)#
Sample: https://regex101.com/r/vAr65j/1
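For comparison, here is a sketch of the same idea in Python, where named groups are written (?P<name>...) rather than (?'name'...). The function name extract_state is only illustrative. Note that [^State#] in the original attempt is a character class (any one character other than S, t, a, e or #), not a negated word, which is why a literal State# prefix is used instead.

```python
import re

# Literal "State#" prefix, then everything up to the next "#".
STATE_RE = re.compile(r"State#(?P<state>[^#]+)#")

def extract_state(text):
    """Return the value between 'State#' and the next '#', or None."""
    m = STATE_RE.search(text)
    return m.group("state") if m else None

print(extract_state("State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0"))
# Received
```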

Query document based on field's value containing backslash using regex

I'm trying to query a DB with documents similar to the one presented below.
{
"_id":"5b9bd1b947c7471038399a39",
"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5",
}
How can I filter all documents starting with ge\\pt02\\kr02?
I tried many different approaches,
for example:
{"subdir": {"$regex": "pt02\\kr02*"}}
but I cannot figure out how to write a correct filter.
The problem is that you need to escape the backslashes.
Here is a working example:
db.test1.insert({"subdir":"ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5"})
db.test1.find({"subdir": { $regex: "^ge\\\\pt02\\\\kr02"}})
This prints out:
{ "_id" : ObjectId("5ba28194fbb45cb9f7c58b18"), "subdir" : "ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5" }
The backslashes need to be escaped. Also, since you only want the documents that start with this pattern, prefix the regex with a caret; the grouping parentheses are optional. This gives us the following regex:
let pattern = "^(ge\\\\pt02\\\\kr02)";
{"subdir": {"$regex": pattern}}
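The two levels of escaping can be checked without a database. This is a small Python sketch using only the standard re module, under the assumption that the stored value contains single backslashes (written as \\ in JSON and in ordinary string literals). The regex engine needs each literal backslash doubled, and a raw string or re.escape avoids doubling them a second time by hand.

```python
import re

# The value as stored: one backslash between path segments.
subdir = "ge\\pt02\\kr02_20180824\\kr02_2018091log\\0010796ab5"

# In a raw string, \\ is the regex for one literal backslash.
prefix_re = re.compile(r"^ge\\pt02\\kr02")

# re.escape builds the same pattern from the plain prefix.
built = "^" + re.escape("ge\\pt02\\kr02")

print(prefix_re.search(subdir) is not None)  # True
print(built == prefix_re.pattern)            # True
```

The single-escaped attempt, "pt02\\kr02*", reaches the regex engine as pt02\kr02*, so the engine sees the escape sequence \k instead of a literal backslash.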

What is the regular expression to extract word which is not preceded by any characters

For example, given
ID is content ID
I need a regex that extracts only the first ID. I tried [/b]ID, but it is not working.
You're not giving us a lot to go on, but here is an extremely general example:
(ID.+)
which matches ID and everything after it, be it ID:, ID "123", or ID: "123". Note that any characters after ID are captured too. Update your question and I will update my answer to accommodate it.
Here is a live example: http://rubular.com/r/enuYP1kze4.
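If the goal is only the ID at the very start of the string, an anchor may be closer to what was intended. The attempt [/b]ID is a character class (a "/" or a "b" followed by ID); the word-boundary escape is \b. Here is a short Python sketch under that reading of the question:

```python
import re

text = "ID is content ID"

# ^ anchors at the start of the string; \b stops "ID" from
# matching the beginning of a longer word such as "IDs".
m = re.search(r"^ID\b", text)
print(m.span() if m else None)  # (0, 2)

# \bID\b finds every standalone "ID"; re.search would return
# only the first of these.
starts = [x.start() for x in re.finditer(r"\bID\b", text)]
print(starts)  # [0, 14]
```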

Perform Regex on value returned by Regex

This is probably straightforward but I'm not even sure which phrase I should google to find the answer. Forgive my noobiness.
I've got strings (filenames) that look like this:
site12345678_date20160912_23001_to_23100_of_25871.txt
What this naming convention means is "Records 23001 through 23100 out of 25871 for site 12345678 for September 12th 2016 (20160912)"
What I want to do is extract the date part (the digits between _date and the following _).
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912. But what I'm actually looking for is just 20160912. Obviously, [0-9]{8} on its own doesn't give me what I want in this case, because it could be confused with the site or, potentially, the record counts.
How can I responsibly accomplish this sort of 'substringing' with a single regular expression?
You just need to move your parentheses so that the capture group no longer includes '_date'. Then read capture group #1.
In Python, for example, it would look something like:
import re

# Capture only the eight digits that follow "_date"
regex = r'.*_date([0-9]{8}).*'
filename = 'site12345678_date20160912_23001_to_23100_of_25871.txt'
m = re.match(regex, filename)
print(m.group(0))  # the whole string
print(m.group(1))  # the string you are looking for: '20160912'
See it in action here: https://eval.in/641446
"The Regex: .*(_date[0-9]{8}).* will return the string _date20160912."
That means you are using the regex with a method that requires a full-string match and that gives you access to the Group 1 value. The only thing you need to change in the regex is the placement of the capturing group:
.*_date([0-9]{8}).*
       ^^^^^^^^^^
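If the method in use returns only the overall match and gives no access to groups, lookarounds are an alternative technique (not the one used in the answers above). This is a minimal Python sketch, assuming the engine supports fixed-width lookbehind, as Python's re does:

```python
import re

filename = "site12345678_date20160912_23001_to_23100_of_25871.txt"

# (?<=_date) requires "_date" just before the digits and (?=_)
# requires "_" just after, but neither is part of the match.
m = re.search(r"(?<=_date)[0-9]{8}(?=_)", filename)
date_part = m.group(0) if m else None
print(date_part)  # 20160912
```

Here the whole match is already the date, so no capture group is needed.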

Can I capture a label not found in the test string using regex?

Assuming I have some strings of the following type:
session opened by (uid=0)
session opened by scotty
Is it possible to write a regex that will either capture the text "root" if (uid=0) is found in the string, otherwise capture the normal user name (i.e. scotty)?
Regex does not allow you to capture anything that is missing from the input string. If you know the structure of the input text, you can have a regex pattern return the required part. Here is an example that works for the .NET regex flavor:
(?s)(?<=\(uid=0\).*opened by )\w+
Matches Found:
[0][0] = scotty
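Since the regex itself cannot produce text that is absent from the input, one common workaround is to capture what is there and do the substitution in code. This is a hedged Python sketch of that approach. SESSION_RE, UID_NAMES and session_user are illustrative names, and the uid-to-name table is an assumption rather than something taken from the question.

```python
import re

# Capture either the numeric uid in "(uid=N)" or a plain user name.
SESSION_RE = re.compile(
    r"session opened by (?:\(uid=(?P<uid>\d+)\)|(?P<user>\w+))"
)

# Hypothetical lookup table; only uid 0 is known from the question.
UID_NAMES = {"0": "root"}

def session_user(line):
    """Return the user name for a log line, mapping uid=0 to 'root'."""
    m = SESSION_RE.search(line)
    if m is None:
        return None
    uid = m.group("uid")
    if uid is not None:
        # Unknown uids fall back to the raw "uid=N" text.
        return UID_NAMES.get(uid, "uid=" + uid)
    return m.group("user")

print(session_user("session opened by (uid=0)"))  # root
print(session_user("session opened by scotty"))   # scotty
```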