Partial string matching in MongoDB [duplicate]

This question already has answers here:
How to query MongoDB with "like"
(45 answers)
Closed 7 years ago.
Let's say I have a bunch of MongoDB records like so, which are all strings:
{myRecord:'foobarbazfoobaz'}
{myRecord:'bazbarfoobarbaz'}
{myRecord:'foobarfoofoobaz'}
{myRecord:'bazbarbazbazbar'}
I need to be able to partial string match in two ways:
1) I want to match on 'foobar' so it returns:
'foobarbazfoobaz'
'foobarfoofoobaz'
Note that here, 'foobar' is a partial string that is matched against any of the records from the beginning of the string. It doesn't matter if 'foobar' turns up later in the string. As long as the six characters of 'foobar' match the first six characters of the record, I want to get it back.
2) I need to be able to match on 'baz%%%baz' so it returns:
bazbarbazbazbar
Here 'baz%%%baz' matches the first three characters of any of the records, ignores the next three, then matches against the following three. Again, it doesn't matter if this pattern occurs later in the string; I am only interested in whether I can match it from the beginning of the string.
I think there is some kind of Mongo regex to do this (hopefully) but I am terrible when it comes to regex. Any help would be greatly appreciated.
This is for a web application where users are searching for sequences of events on a timeline and they will always have to search from the beginning, but can leave blanks in the search if they wish to.

You can try the $regex operator. The ^ anchors the match at the start of the string, and each . matches any single character.
1) I want to match on 'foobar'
db.collection.find({"myRecord":{"$regex":"^foobar"}})
2) I need to be able to match on 'baz%%%baz'
db.collection.find({"myRecord":{"$regex":"^baz.{3}baz"}})
Hope this helps.
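For the web application described in the question, the user's search string has to be turned into such a pattern first. Below is a minimal Python sketch of that translation, assuming `%` is the "blank" character exactly as in the question. The helper names `wildcard_to_regex` and `search` are hypothetical, and the sample records are matched in memory with Python's `re` module rather than against a real collection.

```python
import re

def wildcard_to_regex(pattern: str, wildcard: str = "%") -> str:
    """Build a start-anchored regex where each wildcard matches any one character."""
    # Escape literal characters so user input cannot inject regex syntax.
    parts = ["." if ch == wildcard else re.escape(ch) for ch in pattern]
    return "^" + "".join(parts)

# Sample records taken from the question.
records = [
    "foobarbazfoobaz",
    "bazbarfoobarbaz",
    "foobarfoofoobaz",
    "bazbarbazbazbar",
]

def search(pattern: str) -> list[str]:
    """Return the records that match the wildcard pattern from the beginning."""
    regex = re.compile(wildcard_to_regex(pattern))
    return [record for record in records if regex.search(record)]

print(search("foobar"))     # the two records starting with 'foobar'
print(search("baz%%%baz"))  # the one record with 'baz', any three characters, 'baz'
```

The string returned by `wildcard_to_regex` is the kind of value that would go into the `$regex` field of the queries above; whether the escaping produced by `re.escape` is appropriate for every input in MongoDB's regex engine is something to verify for your data.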

Hang on - just found a way to deal with the second case, which turns out to be unexpectedly straightforward:
{"myRecord":{"$regex":"^baz.{3}baz"}}
I probably should spend some time learning how to use regex!

Related

How can I use if-else validation in regex

I know how to validate a constant string with regex, but I'm really having trouble finding out how to do the following: I want a regex to validate the string edv_, and after that I want the validation to depend on the next character:
if the user inputs for example edv_2, only 6 or 7 can be the next character. So only edv_26 or edv_27 would be valid
if the user enters edv_3, then only edv_32 or edv_39 would be valid
I've tried searching on the internet and watched several YouTube tutorials. None of them seem to handle this kind of thing. It's always only one constant thing they want to validate.
/[e][d][v][_][A]/ig
This matches the first part (edv_digit) but I have no clue how I should continue with the if else conditions.
You basically need alternations for handling your various cases. You can use this regex which matches as per your criteria.
\bedv_(?:2[67]|3[29])\b
Here the word boundaries ensure it doesn't match partial text like abcedv_26 or edv_26111. The pattern starts by matching edv_, then looks for either a 2 followed by 6 or 7, or a 3 followed by 2 or 9.
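To see the alternation at work, here is a small Python sketch that runs the pattern against a few candidate inputs. The function name `is_valid_code` is hypothetical, and the case-insensitive flag is an assumption carried over from the `i` flag in the question's own attempt.

```python
import re

# Same pattern as in the answer; IGNORECASE mirrors the /i flag from the question.
EDV_CODE = re.compile(r"\bedv_(?:2[67]|3[29])\b", re.IGNORECASE)

def is_valid_code(text: str) -> bool:
    """True only if the whole input is one of edv_26, edv_27, edv_32, edv_39."""
    return EDV_CODE.fullmatch(text) is not None

for candidate in ["edv_26", "edv_27", "edv_32", "edv_39", "edv_28", "edv_3", "edv_261"]:
    print(candidate, is_valid_code(candidate))
```

Adding another "if" branch is then just another alternative inside the group, for example `4[15]` for a hypothetical edv_4 rule.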

Regex, take last match before suffix [duplicate]

This question already has answers here:
Tempered Greedy Token - What is different about placing the dot before the negative lookahead?
(3 answers)
Closed 4 years ago.
I know this is going to sound like the kind of question that's been asked hundreds of times. But I've been searching for over an hour and none of the solutions I found worked in my case.
I have many different numbers of the form
\d*'?\d+\.\d\d
An example of string I work with would be
The base item costs 1'245.48, the tax is of 18.45 and the bonus of 250.00, the total price is of 1'013.93. In case of trouble, contact our e-mail. Bank account 784.45
I want to get ONLY the last match corresponding to my regex before e-mail, i.e. 1'013.93. I would like to use only regex, with no extra Python, JavaScript or anything else.
I have tried code inspired by this Regex Last occurrence?, this How to capture only last match in Regex, this Find Last Occurrence of Regex Word, and many other expressions of my own, but so far there always seems to be one piece missing.
For example, after successfully selecting the very last number with (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d), I tried (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d)(?=e-mail), which does not match anything.
Any insights?
You could try this:
((\d+')?\d+(\.\d+)?)(?=[^\d]+e-mail)
The first group matches the number you want.
Something like this with an extra number format check:
((\d{1,3}')*(\d{1,3})\.\d{2})(?=\D+e-mail)
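The lookahead idea can be checked quickly in Python. This is only a sketch using the sample sentence from the question and the second pattern above; the variable names are made up for the example.

```python
import re

text = (
    "The base item costs 1'245.48, the tax is of 18.45 and the bonus of 250.00, "
    "the total price is of 1'013.93. In case of trouble, contact our e-mail. "
    "Bank account 784.45"
)

# The lookahead (?=\D+e-mail) only succeeds when nothing but non-digits
# separates the number from "e-mail", so earlier amounts are rejected.
LAST_AMOUNT = re.compile(r"((\d{1,3}')*(\d{1,3})\.\d{2})(?=\D+e-mail)")

match = LAST_AMOUNT.search(text)
print(match.group(1) if match else None)
```

The design relies on `\D+` being unable to step over another number: any amount that has a later amount between itself and "e-mail" fails the lookahead.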

Remove first char from string - Regex [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have started using Workflow on iOS to help speed up tasks at work. One of those is entering delivery records into the computer (via the iPad barcode scan function) instead of manually writing down the ref code and then typing it in.
Workflow has a "Replace Text" function that can be used with regexs to strip out characters etc.
I have managed to find a regex to get rid of the last digit in a scan (a checksum digit, always a capital letter).
The regex is simple.
.{0}-$.
This goes in the "Find Text" field. The "Replace With" is left empty. It works wonderfully.
How can I adapt this to work with other scan types where I want to specifically get rid of the FIRST character only? I've searched the forums but can only find long and difficult-to-interpret regexes that I am sure won't do what I am trying to achieve, which is something simple by comparison.
An example of what I mean is converting "Y300006944" to "300006944".
You can use the following regex:
^.(.*)$
with a backreference $1 that you can use as replacement.
Good luck.
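For readers who want to try the same replacement outside Workflow, here is a rough Python equivalent. Note that Python's `re.sub` writes the backreference as `\1` rather than `$1`; the function names are hypothetical.

```python
import re

def drop_first_char(code: str) -> str:
    """Keep everything after the first character via a capture group."""
    return re.sub(r"^.(.*)$", r"\1", code)

def drop_first_char_simple(code: str) -> str:
    """Alternative: match only the first character and replace it with nothing."""
    return re.sub(r"^.", "", code, count=1)

print(drop_first_char("Y300006944"))
print(drop_first_char_simple("Y300006944"))
```

The second form corresponds to putting `^.` in the "Find Text" field and leaving "Replace With" empty, which avoids the need for a backreference.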
Thanks to those who contributed something useful :)
I got it resolved by using the "Split Text" function in Workflow for iOS.
I gave it the command to split based on a custom char, "Y" in this case. It's enough in this simple case.

Regex expression to validate a list of email with ; delimiter at the end of each email address [duplicate]

This question already has answers here:
How can I validate an email address in JavaScript?
(79 answers)
Closed 8 years ago.
I found this regular expression:
"^(([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+([;.](([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+)*$"
which validates a list of emails like: address1@gmail.com;adresse2@gmail.com
But I need to tweak it to validate this structure instead:
address1@gmail.com;adresse2@gmail.com;
and also just one email address with this structure:
address1@gmail.com;
I also want to be able to validate email addresses containing a + sign,
for example validating:
address1@gmail;adress2@gmail.com;addres+3@gmail.com;
as a valid list of emails.
Thank you for your help.
Do not abuse regular expressions too much.
It's not worth spending a lot of time and effort writing something inefficient and hard to analyze.
If you know the list is semicolon-separated, I would provide the following pseudocode:
A <- split email-list on ';'
valid <- True
foreach email in A
    if email != "" and email doesn't fully match [\w\-\.+]+@([\w+\-]+\.)+[a-zA-Z]{2,5}
        valid <- False
    end
end
return valid
The regular expression [\w\-\.+]+@([\w+\-]+\.)+[a-zA-Z]{2,5} validates one email address. It uses Perl-compatible syntax.
It accepts word characters (letters, digits, _) plus -, . and + in the local part, allows domain labels made of word characters, + and -, and requires the address to end with 2 to 5 letters.
In the regex you provided, the domain name is matched with ([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+, which is odd. I don't think you should do it this way.
The reason I say you are abusing regex is that, even though the problem you want to solve can be solved with a regex:
The time needed to design and debug a regex grows faster than linearly as it gets longer.
A long regex can take more than linear time to execute.
It's hard for other people to understand what you are attempting to do, which discourages them from modifying your code if it doesn't do what you want.
So, please, never try to solve a problem like this using pure regex. It's not a programming language.
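The pseudocode above translates almost line for line into Python. The sketch below assumes addresses use `@` between the local part and the domain, and it uses `fullmatch` so that each item must match as a whole; the function name `is_valid_email_list` is hypothetical.

```python
import re

# Single-address pattern from this answer.
EMAIL = re.compile(r"[\w\-.+]+@([\w+\-]+\.)+[a-zA-Z]{2,5}")

def is_valid_email_list(email_list: str) -> bool:
    """Split on ';' and require every non-empty item to be a valid address."""
    for email in email_list.split(";"):
        # Empty items (such as the one after a trailing ';') are skipped,
        # exactly as in the pseudocode.
        if email != "" and EMAIL.fullmatch(email) is None:
            return False
    return True

print(is_valid_email_list("address1@gmail.com;adresse2@gmail.com;"))
print(is_valid_email_list("address1@gmail.com;addres+3@gmail.com;"))
print(is_valid_email_list("address1@gmail.com;not-an-address;"))
```

Like the pseudocode, this does not enforce that the list ends with a semicolon; if that is required, an extra `email_list.endswith(";")` check would be one way to add it.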
This regex will match email IDs based on your criteria.
(?![\W_])((?:([\w+-]{2,})\.?){1,})(?<![\W_])@(?![\W_])(?=[\w.-]{5,})(?=.+\..+)(?1)(?<![\W_])
Regarding the semicolon-separated email IDs, it is best to split them on the semicolon and match each occurrence individually to avoid confusion.
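Just split the whole string on the ; character and match each element against the following regex. It also handles the plus sign.
// requires: using System.Text.RegularExpressions;
// Verbatim string (@"...") so that \b and \. reach the regex engine unchanged.
string pattern = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b";
foreach (string email in emailString.Split(';'))
{
    // IgnoreCase is needed because the pattern only lists upper-case letters.
    if (Regex.IsMatch(email, pattern, RegexOptions.IgnoreCase))
    {
        // do stuff
    }
}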
As others have said, first split on ;, then validate the addresses individually. If you are going to use a regex to validate email, at least use one that's been at least vaguely tested against both good and bad examples, such as those on fightingforalostcause.net, listed in this project, or discussed in this definitive question, of which this question is effectively a duplicate.

Bash regex to detect IPv6 or none [duplicate]

This question already has answers here:
Regular expression that matches valid IPv6 addresses
(30 answers)
Closed 9 years ago.
How would I modify this IPv6 regex I wrote to either detect the address (i.e. the way the regex is written right now), but also accept "blank", i.e. the user did not specify an IPv6 address?
^[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}$
Right now, the regex is looking for a minimum of 0:0:0:0:0:0:0:0 or similar. In fact, in addition to a blank address, I probably also need to be able to handle compression such as the following address:
FE80::1
or ::1
etc
Thanks!
* UPDATE *
So let me make sure I have this straight...
(^$|^IPV4)\|(^$|IPV6)\|REST OF STUFF$
That doesn't seem right. I feel like I have misplaced the ^ and $ at the very beginning and end of my entire regex.
Maybe this instead:
^(^$|IPV4)\|(^$|IPV6)\|REST$
* UPDATE *
Still no luck. Here is part of my code with the middles chopped out for sanity:
^(|[0-9]{1,3}.<<<OMIT MIDDLE IPV4>>>.[0-9]{1,3})\|(|(\A([0-9a-f]{1,4}:){1,1}<<<OMIT MIDDLE IPV6>>>[0-1]?\d?\d)){3}\Z))\|[a-zA-Z0-<<<MORE STUFF MIDDLE OMITTED>>>{0,50}$
I hope that isn't confusing. That's the beginning and end of each regex with the middles omitted so you can see the ( ).
Perhaps I need to enclose the entire gigantic IPV6 regex in parentheses?
* UPDATE *
Tried last statement above... no luck.
You can specify alternation with the | character, so a|b means "match either a or b". In this case it would look something like this:
^$|^[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}$
The regex ^$ will match empty strings, so ^$|<current-regex> means "match either an empty string, or whatever <current-regex> matches (in this case IPv6)". You could use ^\s*$ in place of ^$ if you want strings that consist only of whitespace characters to also be considered "empty".
This just handles the first part of the question, handling the compression like FE80::1 is more complex and it looks like there are already some other good answers for that in comments (note that I don't think this question is a dupe, because the "also matching an empty string" part isn't present in those questions).
edit: If it is part of a larger regex, then you should wrap everything in a group and get rid of the ^$, so it would be something like (|<current-regex>). Since there is nothing before the |, it means that the group can match either empty strings or whatever your current regex would match.
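Here is a small Python sketch of the `(|<current-regex>)` idea, checked with Python's `re` module rather than bash. It assumes the uncompressed eight-group IPv6 form from the question, and the pipe-delimited record layout (`<ipv6-or-empty>|<name>`) is a simplified, hypothetical stand-in for the larger regex.

```python
import re

# One group of 1 to 4 hex digits, as in the question's regex.
HEXTET = r"[0-9a-fA-F]{1,4}"
# Uncompressed IPv6 only: eight groups separated by colons.
FULL_IPV6 = HEXTET + (":" + HEXTET) * 7

# Empty alternative first: the group matches either nothing or a full address.
OPTIONAL_IPV6 = "(|" + FULL_IPV6 + ")"

# Hypothetical larger pattern: optional IPv6 field, a literal pipe, then a name.
# The anchors go around the whole pattern, not inside the group.
RECORD = re.compile("^" + OPTIONAL_IPV6 + r"\|[a-zA-Z]+$")

for line in ["fe80:0:0:0:0:0:0:1|host", "|host", "fe80::1|host"]:
    print(repr(line), bool(RECORD.match(line)))
```

The compressed form `fe80::1` is rejected here because the inner pattern only covers the uncompressed notation; supporting compression needs a more elaborate IPv6 pattern in place of `FULL_IPV6`.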
According to another Stack Overflow post, a different site has an explanation and example of a huge but very usable regex, which is this:
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,6}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,5}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,4}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,3}\Z)|
(\A([0-9a-f]{1,4}:){1,5}(:[0-9a-f]{1,4}){1,2}\Z)|
(\A([0-9a-f]{1,4}:){1,6}(:[0-9a-f]{1,4}){1,1}\Z)|
(\A(([0-9a-f]{1,4}:){1,7}|:):\Z)|
(\A:(:[0-9a-f]{1,4}){1,7}\Z)|
(\A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A(([0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A([0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,3}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,2}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,1}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A(([0-9a-f]{1,4}:){1,5}|:):(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A:(:[0-9a-f]{1,4}){1,5}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)