"REGEX" Match string not containing specific substring - regex

Here is an example. I have two strings:
FL_0DS906555B_3661_27012221225012_V001_S
FL_0DS906555C_3661_27012221225012_V001_S
I want to match any string that does not contain "0DS906555B", does contain "2701222122", and in which the four digits that follow (here "5012") are in the range 5003-5012.
My regex looks like this:
^.*(?!.*0DS906555B).{6}2701222122(500[3-9]|501[0-2]).*$
Unfortunately, it matches every string. I have looked through many posts here, but none of them helped, because they usually dealt with shorter, less complex strings.
Thank you

Try (regex101):
^(?!.*0DS906555B)(?=.*_2701222122(?:500[3-9]|501[012])_).*$
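A minimal sketch of why the two patterns behave differently, assuming Python's re module (the question does not name a regex engine). In the pattern from the question, the leading .* is free to consume "0DS906555B" before the negative lookahead is evaluated, so the lookahead only inspects the rest of the string. Placing the lookahead directly after ^ makes it scan the whole string.

```python
import re

# Pattern from the question: the leading .* may swallow "0DS906555B"
# before the lookahead runs, so the lookahead only sees the tail.
original = re.compile(r"^.*(?!.*0DS906555B).{6}2701222122(500[3-9]|501[0-2]).*$")

# Pattern from the answer: both lookaheads are evaluated at position 0.
fixed = re.compile(r"^(?!.*0DS906555B)(?=.*_2701222122(?:500[3-9]|501[012])_).*$")

samples = [
    "FL_0DS906555B_3661_27012221225012_V001_S",
    "FL_0DS906555C_3661_27012221225012_V001_S",
]

original_hits = [s for s in samples if original.match(s)]
fixed_hits = [s for s in samples if fixed.match(s)]

print(original_hits)  # both strings match
print(fixed_hits)     # only the string without "0DS906555B" matches
```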

Related

Matching within matches by extending an existing Regex

I'm trying to see if it's possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to parse the whole regex, which could be quite a long operation, and then rewrite its internals to produce the output, as in:
^ca[tr]
or run the original regex and then a second one over the results. I'm taking the original regex as an argument in Python but want to 'prefilter' the matches by wrapping it in additional pattern code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the initial query might be [a-z]{4}, but if I create (?<=^ca([a-z]{4})) it matches 6-letter strings starting with ca, not 4-letter ones.
Thanks for any solutions and/or opinions on it.
EDIT: See the solution including @Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried, only you don't need a lookbehind, but rather a lookahead assertion, and a parenthesis was misplaced. The right thing is: Put the original pattern in parentheses, and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
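A minimal Python sketch of the lookahead approach, since the question mentions taking the regex as an argument in Python. The helper name prefilter is hypothetical, and the sketch wraps the original pattern in a non-capturing group (?:...) rather than the capturing parentheses shown above, so that group numbering inside the original pattern is left alone.

```python
import re

def prefilter(original, prefix):
    """Build a pattern that matches what `original` matches, but only
    where the match begins with the literal text `prefix`."""
    # (?=...) checks the prefix without consuming it; (?:...) keeps an
    # alternation inside `original` from escaping the wrapper.
    return "(?=" + re.escape(prefix) + ")(?:" + original + ")"

def all_matches(pattern, text):
    # group(0) is the whole match even if `pattern` has capturing groups.
    return [m.group(0) for m in re.finditer(pattern, text)]

alt_hits = all_matches(prefilter("cat|car|bat", "ca"), "cat car bat")
four_hits = all_matches(prefilter("[a-z]{4}", "ca"), "cart bats cave")

print(alt_hits)   # ['cat', 'car']
print(four_hits)  # ['cart', 'cave']
```

Because the lookahead consumes nothing, the matches keep the length the original pattern produces (4 characters for [a-z]{4}), unlike the lookbehind attempts.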
Ok, thanks to @Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See @Nick's comment.
I've also raised an issue on the exrex GitHub for this.

Match a string with a fixed substring in variable positions

Hi there,
I want to create a filter in my email server that matches any message that contains any URL (using either http or https protocols) from a certain domain (let's say domain.org). I want it to match things like:
https://site1.domain.org
https://anothersite.domain.org
http://yetanotherone.domain.org
The problem here is that these strings can be wrapped in the message body at any random position of the string. And even worse, when the string is wrapped an equal sign is added before the end of the line, so I would need it to be able to match strings like these:
ht=
tps://thisisanexample.domain.org
https://thisisane=
xample.domain.org
https://thisisanexample.do=
main.org
I came up with a simple (but huge) solution, but I think there must be a much more elegant one than mine:
/h[=[:cntrl:]]*t[=[:cntrl:]]*t[=[:cntrl:]]*p[=[:cntrl:]]*s?[=[:cntrl:]]*:[=[:cntrl:]]*\/[=[:cntrl:]]*\/[=[:cntrl:]]*[-+_#&%$#|()=?¿:;,.,çÇ^[:cntrl:][:alnum:]\[\]\{\}\*\\]*[=[:cntrl:]]*.[=[:cntrl:]]*d[=[:cntrl:]]*o[=[:cntrl:]]*m[=[:cntrl:]]*a[=[:cntrl:]]*[=[:cntrl:]]*i[=[:cntrl:]]*n[=[:cntrl:]]*.[=[:cntrl:]]*o[=[:cntrl:]]*r[=[:cntrl:]]*g/
I have been looking around but I can not find anything that I understand to improve my solution given that my knowledge of regex does not go beyond simple queries.
Thank you very much in advance.
Regards.
2018/04/11 EDIT: Thank you to everyone who tried but the solutions proposed do not meet the requirements of elegance and readability I was expecting. I was looking for something like capturing everything but the equal-return string and performing the web address string search on the captured result of the first search. Is this a doable idea?
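The two-step idea from the edit can be sketched in Python as follows. This only illustrates the logic; whether the mail server's filter language allows preprocessing the body like this is an assumption, and the function name find_domain_urls is hypothetical. The first step deletes every "=" that is immediately followed by a line break, the second searches the unwrapped text with a short pattern.

```python
import re

def find_domain_urls(body, domain="domain.org"):
    # Step 1: drop the soft line breaks ("=" directly before a newline).
    unwrapped = re.sub(r"=\r?\n", "", body)
    # Step 2: plain search for http/https URLs on the domain or any of
    # its subdomains. re.escape keeps the dots in `domain` literal.
    pattern = r"https?://(?:[\w-]+\.)*" + re.escape(domain)
    return re.findall(pattern, unwrapped)

body = (
    "first ht=\ntps://thisisanexample.domain.org\n"
    "second https://thisisane=\nxample.domain.org\n"
    "third https://thisisanexample.do=\nmain.org\n"
)
urls = find_domain_urls(body)
print(urls)
```

All three wrapped variants from the question unwrap to the same URL, so the search pattern itself stays short and readable.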

Using boost::regex to match two whole words

This seems like a really simple problem, but regardless of what I try the expression can't read the names.
The task here is to match two strings of arbitrary length (someone's name) followed by an ID number, in this format: Joe Blow 123-456-678
I'm using boost::regex_search for this.
So far I have tried these expressions, and none of them worked:
"\\w{15}? \\s? \\w{15}? \\s? \\d{3}-\\d{3}-\\d{3}"
"\\w* \\s \\w* \\s \\d{3}-\\d{3}-\\d{3}"
"\\w+ \\s \\w+ \\s \\d{3}-\\d{3}-\\d{3}"
I tried a few other small variations of that as well but nothing has worked. This is the first time ever using regex, so if some of you are pros and this is stupidly simple, please go easy on me.
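The spaces inside your patterns are literal, so " \\s " requires three whitespace characters in a row (a space, any whitespace character, and another space), which the input never contains. Separate the tokens with a single space instead:
"^[a-zA-Z]+? [a-zA-Z]+? \\d{3}-\\d{3}-\\d{3}$"
The ^ and $ anchors additionally require that nothing else surrounds the record; drop them if the record has to be found inside longer text.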
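A small sketch of the difference, written in Python rather than C++ and assuming that Python's re module treats \w, \s, \d and literal spaces the same way as Boost's default Perl syntax does for these constructs. The capturing groups are an addition for illustration, not part of the suggested pattern.

```python
import re

line = "Joe Blow 123-456-678"

# Attempt from the question: every " \s " needs a literal space, one
# whitespace character and another literal space between the tokens.
attempt = re.compile(r"\w+ \s \w+ \s \d{3}-\d{3}-\d{3}")

# One run of whitespace between tokens; groups pick out the three parts.
fixed = re.compile(r"([A-Za-z]+)\s+([A-Za-z]+)\s+(\d{3}-\d{3}-\d{3})")

attempt_match = attempt.search(line)
fixed_match = fixed.search(line)

print(attempt_match)         # None
print(fixed_match.groups())  # ('Joe', 'Blow', '123-456-678')
```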

Exclude a certain String from variable in regex

Hi, I have a stylesheet where I use xsl:analyze-string with the following regex:
(&journal_abbrevs;)[\s ]*([0-9]{{4}})[,][\s ][S]?[\.]?[\s ]?([0-9]{{1,4}})([\s ][(][0-9]{{1,4}}[)])?
You don't need to look at the whole thing :)
&journal_abbrevs; looks like this:
"example-String1|example-String2|example-String3|..."
What I need to do now is exclude one of the strings in &journal_abbrevs; from this regex. E.g. I don't want example-String1 to be matched.
Any ideas on how to do that?
It seems XSLT regex does not support look-around. So I don't think you'll be able to get a solution for this that does not involve writing out all strings from journal_abbrevs in your regex.
To minimize the amount of writing out, you could split journal_abbrevs into say journal_abbrevs1, journal_abbrevs2 and journal_abbrevs3 (or how many you decide to use) and only write out whichever one that contains the string you wish to exclude. If journal_abbrevs1 contains the string, you'd then end up with something like:
((&journal_abbrevs2;)|(&journal_abbrevs3;)|example-String2|example-String3|...)...
If it supported look-around, you could've used a very simple:
(?!example-String1)(&journal_abbrevs;)...
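For illustration only, here is how that negative lookahead behaves in an engine that does support look-around (Python's re, not XSLT). The abbreviation list is the placeholder from the question, and the pattern is a shortened stand-in for the full stylesheet regex.

```python
import re

abbrevs = "example-String1|example-String2|example-String3"
excluded = "example-String1"

# (?!...) rejects any position where the excluded name starts; the
# alternation is then tried only at the remaining positions.
pattern = re.compile(
    "(?!" + re.escape(excluded) + ")(" + abbrevs + r")\s*([0-9]{4})"
)

text = "example-String1 2001, example-String2 2002, example-String3 2003"
hits = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(hits)  # [('example-String2', '2002'), ('example-String3', '2003')]
```

Note that a bare (?!example-String1) also rejects any longer name that merely starts with the excluded string.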

Regex to find common letters between two strings

I've been searching on Google for a few hours and got a partial solution.
I'm new to both Groovy and regular expressions. I've used regex sporadically over the years, but I am far from comfortable with it.
I've got a simple game that checks how many letters you have in common with a hidden word.
For simplicity's sake, let's say the word is "pan" and the person types "can".
I want the result of the regex to give me "an".
Right now, I've got this partly working by doing this (in Groovy):
// Where "guess" is the user's try and "word" is the word they need to guess.
def expr = "[$word]"
def result = guess.find(expr)
The result string contains only the first matching letter. Anyone have any more elegant solutions?
Thanks in advance
I don't think this is a good use case for a regex. You would have to guard against things like the user automatically guessing right by entering .* or the like.
Typical collection work is better suited for this task IMO. One solution would be to find the intersection of both words treating them as sets of characters:
(word as Set).intersect(guess as Set).join()
Or filtering the guess' characters that appear in the secret word:
guess.findAll { word.contains(it) }.unique().join()
Suppose the two strings are s1 and s2. To find the characters they have in common, do:
commonString = s1.replaceAll("[^" + s2 + "]", "");
If the word can contain regex metacharacters, quote it first. Pattern.quote returns the quoted string, so its result has to be used:
s2 = Pattern.quote(s2);
and then
commonString = s1.replaceAll("[^" + s2 + "]", "");
You could try:
guess.findAll( /[$word]/ ).join()
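For comparison, a minimal Python sketch of the two styles suggested above (collection filtering and a character class). It is not a Groovy translation; the function names are hypothetical, and the regex variant assumes a non-empty word, since an empty character class is not a valid pattern.

```python
import re

def common_letters(word, guess):
    # Keep the letters of `guess` that occur in `word`, in guess order,
    # each letter once (dict.fromkeys drops repeats, keeps order).
    return "".join(dict.fromkeys(ch for ch in guess if ch in word))

def common_letters_regex(word, guess):
    # re.escape stops characters such as "]" or "^" in `word` from
    # changing the meaning of the character class.
    char_class = "[" + re.escape(word) + "]"
    return "".join(re.findall(char_class, guess))

print(common_letters("pan", "can"))        # an
print(common_letters_regex("pan", "can"))  # an
```

The regex variant keeps repeated letters from the guess, while the filtering variant reports each common letter once, so which one fits depends on how the game scores duplicates.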