Regex, take last match before suffix [duplicate] - regex

This question already has answers here:
Tempered Greedy Token - What is different about placing the dot before the negative lookahead?
(3 answers)
Closed 4 years ago.
I know this is going to sound like the kind of question that's been asked hundreds of times. But I've been searching for over an hour and none of the solutions I found worked in my case.
I have many different numbers of the form
\d*'?\d+\.\d\d
An example of a string I work with would be
The base item costs 1'245.48, the tax is of 18.45 and the bonus of 250.00, the total price is of 1'013.93. In case of trouble, contact our e-mail. Bank account 784.45
I want to get ONLY the last match corresponding to my regex before e-mail, i.e. 1'013.93. I would like to use only regex, no extra Python, JavaScript or anything.
I have tried code inspired by this Regex Last occurrence?, this How to capture only last match in Regex, this Find Last Occurrence of Regex Word, and many other expressions of my own, but so far there always seems to be one piece missing.
For example, after successfully selecting the very last number with (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d), I tried (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d)(?=e-mail), which does not match anything.
Any insights?

You could try this:
((\d+')?\d+(\.\d+)?)(?=[^\d]+e-mail)
The first group matches the number you want.

Something like this with an extra number format check:
((\d{1,3}')*(\d{1,3})\.\d{2})(?=\D+e-mail)
Demo
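A minimal Python sketch of the second pattern applied to the example string from the question. The variable names are illustrative, and the inner groups are made non-capturing here for readability; the lookahead `(?=\D+e-mail)` only succeeds when no digit sits between the number and "e-mail", which is what singles out the last number before it.

```python
import re

text = ("The base item costs 1'245.48, the tax is of 18.45 and the bonus of "
        "250.00, the total price is of 1'013.93. In case of trouble, contact "
        "our e-mail. Bank account 784.45")

# A number with optional ' thousands separators and two decimals, followed
# only by non-digit characters up to the literal "e-mail".
pattern = re.compile(r"((?:\d{1,3}')*\d{1,3}\.\d{2})(?=\D+e-mail)")

match = pattern.search(text)
last_before_email = match.group(1) if match else None
print(last_before_email)
```

Earlier numbers are rejected because another number (and therefore a digit) stands between them and "e-mail", and 784.45 is rejected because "e-mail" does not follow it.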

trying to match while excluding from front/behind - Eclipse Negative Lookahead not working as I expect it to [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
I am trying to write some regex to use in Eclipse file search that will find "http:", while allowing me to exclude a word or words before and after the "http:"
Before making it work for both in front and behind, I am just trying to get it to work excluding phrases in the front using negative lookahead. So I have been trying this:
^(?!QName).*http:
or
.*^(?!QName).*http:
I would expect this line to not come back in the search:
// QName qn = new QName( "http://BUNDLE.wsdl","bundle");
But it does come back in the search. These both match the line all the way up to http: if QName is not present, or it matches the entire line if QName is present.
Eventually I want to make it more complex where I can exclude words in the front and back:
^(?!QName|xlmns).*http:^(?!word1|word2)
But I am far from that point. Any help on that would be appreciated too, since I am likely going to have trouble with it as well.
Credit to Wiktor Stribiżew in the above post:
^(?!.*(?:QName|xmlns).*http:).*http:
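The difference between the two patterns can be checked with a short Python sketch. This assumes Python's `re` behaves like Eclipse's Java regex engine for these constructs, and the second and third sample lines are made up for illustration. The original lookahead `(?!QName)` is evaluated only at the start of the line, so it rejects only lines that literally begin with "QName"; the corrected lookahead scans the whole line for an excluded word followed by "http:".

```python
import re

# Original attempt: only rejects lines that start with "QName".
naive = re.compile(r"^(?!QName).*http:")

# Pattern from the answer: reject the line if QName or xmlns appears
# anywhere before an "http:".
fixed = re.compile(r"^(?!.*(?:QName|xmlns).*http:).*http:")

lines = [
    '// QName qn = new QName( "http://BUNDLE.wsdl","bundle");',
    'String url = "http://example.com/service";',   # hypothetical line
    '<beans xmlns="http://example.com/schema">',     # hypothetical line
]

naive_hits = [bool(naive.search(line)) for line in lines]
fixed_hits = [bool(fixed.search(line)) for line in lines]
print(naive_hits, fixed_hits)
```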

Remove first char from string - Regex [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have started using Workflow on iOS to help speed up tasks at work. One of those is entering delivery records into the computer (via the iPad barcode scan function) instead of manually writing down the ref code and then typing it in.
Workflow has a "Replace Text" function that can be used with regexes to strip out characters etc.
I have managed to find a regex to get rid of the last digit in a scan (a checksum digit, always a capital letter).
The regex is simple.
.{0}-$.
This goes in the "Find Text" field. The "Replace With" is left empty. It works wonderfully.
How can I adapt this to work with other scan types where I want to specifically get rid of the FIRST character only? I've searched the forums but can only find long and difficult-to-interpret regexes that I am sure won't do what I am trying to achieve, which is something simple by comparison.
An example of what I mean is converting "Y300006944" to "300006944".
You can use the following regex:
^.(.*)$
with a backreference $1 that you can use as replacement.
Good luck.
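For reference, the same replacement can be sketched in Python. Note that Python's `re.sub` writes the backreference as `\1` rather than `$1`, and the variable names are illustrative. A shorter variant that matches only the first character and replaces it with nothing gives the same result.

```python
import re

scan = "Y300006944"

# Equivalent of Find "^.(.*)$" with Replace "$1": keep everything after
# the first character.
with_group = re.sub(r"^.(.*)$", r"\1", scan)

# Simpler variant: match only the first character and delete it.
first_only = re.sub(r"^.", "", scan)

print(with_group, first_only)
```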
Thanks to those who contributed something useful :)
I got it resolved by using the "Split Text" function in Workflow for iOS.
I told it to split on a custom character, "Y" in this case. That's enough in this simple case.

Partial string matching in Mongodb [duplicate]

This question already has answers here:
How to query MongoDB with "like"
(45 answers)
Closed 7 years ago.
Let's say I have a bunch of MongoDB records like so, which are all strings:
{myRecord:'foobarbazfoobaz'}
{myRecord:'bazbarfoobarbaz'}
{myRecord:'foobarfoofoobaz'}
{myRecord:'bazbarbazbazbar'}
I need to be able to partial string match in two ways:
1) I want to match on 'foobar' so it returns:
'foobarbazfoobaz'
'foobarfoofoobaz'
Note that here, 'foobar' is a partial string that is matched against any of the records from the beginning of the string. It doesn't matter if 'foobar' turns up later in the string. As long as the six characters of 'foobar' match the first six characters of the record, I want to get it back.
2) I need to be able to match on 'baz%%%baz' so it returns:
bazbarbazbazbar
Here 'baz%%%baz' matches the first three characters of any of the records, ignores the next three, then matches against the final three. Again, it doesn't matter if this pattern occurs later in the string, I am just interested in if I can match it from the beginning of the string.
I think there is some kind of Mongo regex to do this (hopefully), but I am terrible when it comes to regex. Any help would be greatly appreciated.
This is for a web application where users are searching for sequences of events on a timeline and they will always have to search from the beginning, but can leave blanks in the search if they wish to.
You can try the $regex operator.
1) I want to match on 'foobar'
db.collection.find({"myRecord":{"$regex":"^foobar"}})
2) I need to be able to match on 'baz%%%baz'
db.collection.find({"myRecord":{"$regex":"^baz.{3}baz"}})
Hope it will help
Hang on - just found a way to deal with the second case, which turns out to be unexpectedly straightforward:
{"myRecord":{"$regex":"^baz.{3}baz"}}
I probably should spend some time learning how to use regex!
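Since users type searches with % as the blank marker, the conversion to an anchored regex can be sketched in Python as follows. The helper names `wildcard_to_regex` and `search` are hypothetical, and the matching is done in memory here rather than against a real collection; the resulting pattern string is what would be passed as the `$regex` value.

```python
import re

def wildcard_to_regex(query: str, wildcard: str = "%") -> str:
    """Turn a user query such as 'baz%%%baz' into an anchored regex string.

    Each wildcard becomes '.', every other character is escaped so it is
    matched literally, and '^' anchors the match at the start.
    """
    parts = ["." if ch == wildcard else re.escape(ch) for ch in query]
    return "^" + "".join(parts)

records = [
    "foobarbazfoobaz",
    "bazbarfoobarbaz",
    "foobarfoofoobaz",
    "bazbarbazbazbar",
]

def search(query: str) -> list[str]:
    # In-memory stand-in for a find() call using {"$regex": pattern}.
    rx = re.compile(wildcard_to_regex(query))
    return [r for r in records if rx.search(r)]

print(search("foobar"))
print(search("baz%%%baz"))
```

Escaping the literal characters matters once searches can contain characters such as `.` or `*`, which would otherwise be read as regex syntax.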

Regular Expression to find CVE Matches

I am pretty new to regex, so I am hoping an expert can help me craft the right expression to find all the matches in a string. I have a string containing a lot of support information about vulnerabilities. In that string is a series of CVE references in the format CVE-2015-4000. Can anyone provide a sample regex for finding all occurrences of that? Obviously, the numeric part changes throughout the string.
Generally you should always include your previous efforts in your question, what exactly you expect to match, etc. But since I am aware of the format and this is an easy one...
CVE-\d{4}-\d{4,7}
This matches first CVE- then a 4-digit number for the year identifier and then a 4 to 7 digit number to identify the vulnerability as per the new standard.
See this in action here.
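A minimal Python sketch of using this pattern to collect every occurrence in a string. The sample text is made up for illustration; `re.findall` returns all non-overlapping matches in order, and duplicates can be collapsed afterwards if needed.

```python
import re

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")

# Hypothetical support text containing several CVE references.
report = ("Host is affected by CVE-2015-4000 and CVE-2014-0160; "
          "the advisory for CVE-2015-4000 was updated.")

found = CVE_PATTERN.findall(report)   # every occurrence, in order
unique = sorted(set(found))           # de-duplicated

print(found)
print(unique)
```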
If you need an exact match without any syntax or logic violations, you can try this:
^(CVE-(1999|2\d{3})-(0\d{2}[1-9]|[1-9]\d{3,}))$
You can run this against the test data supplied by MITRE here to test your code or test it online here.
I will add my two cents to the accepted answer. In case we want to match "CVE" case-insensitively, we can use the following regex:
r'(?i)\bcve\-\d{4}-\d{4,7}'

Regex expression to validate a list of email with ; delimiter at the end of each email address [duplicate]

This question already has answers here:
How can I validate an email address in JavaScript?
(79 answers)
Closed 8 years ago.
I found this regular expression:
"^(([a-zA-Z0-9_\\-\\.]+)#([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+([;.](([a-zA-Z0-9_\\-\\.]+)#([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+)*$"
which validates a list of emails like: address1#gmail.com;adresse2#gmail.com
But I need to tweak it to validate this structure instead:
address1#gmail.com;adresse2#gmail.com;
and also just one email address with this structure:
address1#gmail.com;
I also want to be able to validate email addresses containing a + sign,
for example validating:
address1#gmail;adress2#gmail.com;addres+3#gmail.com;
as a valid list of emails.
Thank you for your help.
Do not abuse regular expressions too much.
It's not worth spending a lot of time and effort writing something inefficient and hard to analyze.
If you know it's semicolon separated, I would provide the following pseudocode:
A <- split email-list on ';'
valid <- True
foreach email in A
    if email != "" and email doesn't match [\w\-\.+]+#([\w+\-]+\.)+[a-zA-Z]{2,5}
        valid <- False
    end
end
return valid
The regular expression [\w\-\.+]+#([\w+\-]+\.)+[a-zA-Z]{2,5} validates one email address. It's Perl-compatible syntax.
It allows word characters (letters, digits, _), -, . and + in the local part, allows domain labels made of word characters, + and -, and ends with a top-level domain of 2 to 5 letters.
In the regex you provided, the domain name is matched with ([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+, which is odd. I don't think you should do it this way.
The reason I said you are abusing regex is that, even though the problem you want to solve can be solved with a regex:
It takes more than linear time to design and debug a regex as it gets longer.
It takes more than linear time for a long regex to execute.
It's hard for other people to understand what you are attempting to do. It effectively prevents people from modifying your code if it's not what you want.
So, please, never try to solve a problem like this using pure regex. It's not a programming language.
This regex will match email IDs based on your criteria.
(?![\W_])((?:([\w+-]{2,})\.?){1,})(?<![\W_])#(?![\W_])(?=[\w.-]{5,})(?=.+\..+)(?1)(?<![\W_])
Regarding the semicolon-separated email IDs, it is best to split them on the semicolon and match each occurrence individually to avoid confusion.
You can take a look at the matches here.
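Just split the whole string on the ; character and match each element against the following regex. It will take care of the plus sign as well.
// Requires: using System.Text.RegularExpressions;
// Verbatim string (@"...") so that \b and \. reach the regex engine unchanged.
string pattern = @"\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b";
foreach (string email in emailString.Split(';'))
{
    // IgnoreCase lets the A-Z classes accept lowercase addresses too.
    if (Regex.IsMatch(email, pattern, RegexOptions.IgnoreCase))
    {
        // do stuff
    }
}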
As others have said, first split on ;, then validate the addresses individually. If you are going to use a regex to validate email, at least use one that has been vaguely tested against both good and bad examples, such as those on fightingforalostcause.net, listed in this project, or discussed in this definitive question, of which this question is effectively a duplicate.
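The split-then-validate approach suggested in the answers can be sketched in Python as follows. This is a sketch under assumptions, not a definitive validator: it assumes the `#` in the sample addresses stands in for `@`, the function name `is_valid_list` is hypothetical, and the per-address pattern is the simple one from the pseudocode answer rather than a well-tested email regex.

```python
import re

# Simple per-address pattern adapted from the pseudocode answer, with "@"
# assumed in place of the "#" shown in the examples. Not a full validator.
EMAIL_RE = re.compile(r"[\w.+-]+@(?:[\w+-]+\.)+[a-zA-Z]{2,5}")

def is_valid_list(email_list: str) -> bool:
    """Return True if every address is valid and each one ends with ';'."""
    if not email_list.endswith(";"):
        return False
    # Drop the final ';' so that split() does not yield a trailing empty entry.
    entries = email_list[:-1].split(";")
    return all(EMAIL_RE.fullmatch(entry) for entry in entries)

print(is_valid_list("address1@gmail.com;adresse2@gmail.com;"))
print(is_valid_list("addres+3@gmail.com;"))
print(is_valid_list("address1@gmail.com"))
```

Using `fullmatch` on each entry means an empty entry (for example from ";;") is rejected, which is stricter than the pseudocode's choice to skip empty strings.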