Using boost::regex to match two whole words - c++

This seems like a really simple problem, but regardless of what I try, the expression can't read the names.
The task here is to match two strings of arbitrary length (someone's name) followed by an ID number, in this format: Joe Blow 123-456-678
I'm using boost::regex_search for this.
So far I have tried these expressions and they haven't worked:
"\\w{15}? \\s? \\w{15}? \\s? \\d{3}-\\d{3}-\\d{3}"
"\\w* \\s \\w* \\s \\d{3}-\\d{3}-\\d{3}"
"\\w+ \\s \\w+ \\s \\d{3}-\\d{3}-\\d{3}"
I tried a few other small variations of these as well, but nothing has worked. This is my first time using regex, so if some of you are pros and this is stupidly simple, please go easy on me.

Try using
"^[a-zA-Z]+? [a-zA-Z]+? \\d{3}-\\d{3}-\\d{3}$"
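and see whether it works. Note that in your attempts each literal space in the pattern must match a space in the input in addition to the \\s next to it, so a single space between the words is never enough.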
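As a rough check of the suggestion above, here is a minimal sketch in Python's re module rather than boost::regex (an assumption made for easy testing; the constructs used here should behave the same way under boost's default Perl-style syntax). The names ANSWER, ATTEMPT and matches are made up for the illustration.

```python
import re

# Pattern from the answer, written as a Python raw string
# (single backslashes instead of the doubled ones in a C++ literal).
ANSWER = re.compile(r"^[a-zA-Z]+? [a-zA-Z]+? \d{3}-\d{3}-\d{3}$")

# Third attempt from the question: the literal spaces around \s are
# themselves part of the pattern, so three whitespace characters are
# required between the words.
ATTEMPT = re.compile(r"\w+ \s \w+ \s \d{3}-\d{3}-\d{3}")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    print(matches(ANSWER, "Joe Blow 123-456-678"))
    print(matches(ATTEMPT, "Joe Blow 123-456-678"))
```

The question's pattern only starts matching once the input has three whitespace characters between each pair of fields, which is why removing either the literal spaces or the \s fixes it.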

Related

"REGEX" Match string not containing specific substring

I will give an example, I have two strings:
FL_0DS906555B_3661_27012221225012_V001_S
FL_0DS906555C_3661_27012221225012_V001_S
I want to match any string that does not contain "0DS906555B", does contain "2701222122", and whose following number ("5012" here) is in the range 5003-5012.
My regex looks like this:
^.*(?!.*0DS906555B).{6}2701222122(500[3-9]|501[0-2]).*$
Unfortunately, it keeps matching everything. I have looked through many posts here, but nothing helped, since people usually asked about smaller, less complex strings.
Thank you
Try (regex101):
^(?!.*0DS906555B)(?=.*_2701222122(?:500[3-9]|501[012])_).*$
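To see why the original pattern matches everything, it may help to run both patterns side by side. The sketch below uses Python's re module (an assumption; the question does not name an engine), and the helper accepted is a made-up name. In the original, the leading .* lets the engine move past "0DS906555B" before the negative lookahead is evaluated, so the lookahead only inspects the rest of the string.

```python
import re

# Pattern from the question: the lookahead runs after ".*" has already
# consumed part of the string.
ORIGINAL = re.compile(
    r"^.*(?!.*0DS906555B).{6}2701222122(500[3-9]|501[0-2]).*$"
)

# Pattern from the answer: both lookaheads are anchored at the start.
SUGGESTED = re.compile(
    r"^(?!.*0DS906555B)(?=.*_2701222122(?:500[3-9]|501[012])_).*$"
)

SAMPLES = [
    "FL_0DS906555B_3661_27012221225012_V001_S",
    "FL_0DS906555C_3661_27012221225012_V001_S",
]


def accepted(pattern, names):
    """Return the names the pattern matches, in input order."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print(accepted(ORIGINAL, SAMPLES))
    print(accepted(SUGGESTED, SAMPLES))
```

With the lookahead placed directly after ^, it scans the whole string, so only the second sample is accepted.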

Matching within matches by extending an existing Regex

I'm trying to see if it's possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to parse the whole regex, which could be quite a long operation, and then change its internal content to produce the output, as in:
^ca[tr]
or run the original regex and then the second one over the results. I'm taking the original regex as an argument in Python but want to 'prefilter' the matches by adding the additional code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the initial query might be [a-z]{4}, but if I create (?<=^ca([a-z]{4})) it matches against 6-letter strings starting with ca, not 4-letter ones.
Thanks for any solutions and/or opinions on it.
EDIT: See solution including @Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried. You just need a lookahead assertion rather than a lookbehind, and a parenthesis was misplaced. The right approach is to put the original pattern in parentheses and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
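Since the question takes the original regex as an argument in Python, the lookahead idea could be wrapped up roughly as follows. This is only a sketch: prefilter and all_matches are made-up helper names, and the original pattern is wrapped in a non-capturing group (?:...) instead of the capturing parentheses shown above, so that any group numbers inside the original pattern stay unchanged.

```python
import re


def prefilter(pattern, prefix):
    """Build a pattern that matches like `pattern`, but only at
    positions where the text starts with the literal `prefix`."""
    return "(?=" + re.escape(prefix) + ")(?:" + pattern + ")"


def all_matches(pattern, text):
    """Return every full match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    print(prefilter("cat|car|bat", "ca"))
    print(all_matches(prefilter("cat|car|bat", "ca"), "cat car bat"))
    print(all_matches(prefilter("[a-z]{4}", "ca"), "cake bark cart"))
```

Because the lookahead consumes no characters, the wrapped [a-z]{4} still matches exactly four letters.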
Ok, thanks to @Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See @Nick's comment.
I've also raised an issue on the exrex GitHub for this.

Regex: Non fixed-width look around assertions?

My colleague asked me to provide him with a regex that only matches if the test string ends with
.rar or .part1.rar or .part01.rar or .part001.rar (and so on).
Should match:
foo.part1.rar
xyz.part01.rar
archive.rar
part3_is_the_best.rar
Should not match:
foo.r61
bar.part03.rar
test.sfv
I immediately came up with the regex \.(part0*1\.)?rar$. But this also matches bar.part03.rar.
Next I tried to add a negative look behind assertion: .*(?<!part\d*)\.(part\0*1\.)?rar$ That didn't work either, because look around assertions need to be fixed width.
Then I tried using a regex-conditional. But that didn't work either.
So my question: Can this even be solved by using pure regex?
An answer should either contain a link to regex101.com providing a working solution, or explain why it can't work by using pure regex.
You could use a negative lookahead to rule out the one case that your original regex gets wrong (a .rar preceded by a .part number that isn't 0*1):
^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$
See it in action
This is an old question, but here's another approach:
(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$
The idea is to match either:
A string that ends with .part0*1.rar (i.e. foo.part01.rar, foo.part1.rar, bar.part001.rar), OR
A string that ends with .rar and doesn't contain any other dots (.) before that.
Works on all your test cases, plus your extra foo.part19.rar.
https://regex101.com/r/EyHhmo/2
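Both answers can be checked against the test cases from the question with a short script. The sketch below uses Python's re module (an assumption; the question does not name an engine) and a made-up helper, accepted. It also tries foo.part19.rar, the extra case mentioned in the second answer: the lookahead version accepts it, because [^1] only inspects a single character after the zeros.

```python
import re

# First answer: negative lookahead in front of the original regex.
FIRST = re.compile(r"^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$")

# Second answer: either ".part0*1.rar" or a dot-free name plus ".rar".
SECOND = re.compile(r"(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$")

SHOULD_MATCH = [
    "foo.part1.rar",
    "xyz.part01.rar",
    "archive.rar",
    "part3_is_the_best.rar",
]
SHOULD_NOT_MATCH = ["foo.r61", "bar.part03.rar", "test.sfv"]


def accepted(pattern, names):
    """Return the names the pattern matches, in input order."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    for label, pattern in (("first", FIRST), ("second", SECOND)):
        print(label, accepted(pattern, SHOULD_MATCH + SHOULD_NOT_MATCH))
        print(label, accepted(pattern, ["foo.part19.rar"]))
```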

select area within characters using regex (spaces are an issue)

Some other guy asked a similar question earlier which got a lot of down votes, and I was interested in solving it. I came to a similar issue and would like some help with it.
Take into consideration this wall of text:
__don't__ and __do it__
__yellow__
__green__ and __purple__
I would like to select all the area within the underscores __'s
I attempted the following regex:
/__[!-~]+__/g, which worked great on most things. I would like to add the ability to have spaces within the underscores: __do it__ is not matched, because it includes a space, which the character class excludes. I attempted the following:
/__[ -~]+__/g
It didn't work as planned, and selected everything from the very first __ to the very last. I was wondering how to tell the regex it has reached the end of a search once it sees a space after a __.
Here is the regex you could play around with below:
http://regexr.com/39br7
I tried using __[^ ]/g at the end but it didn't seem to help.
You could simply use the below regex,
__[^_]*__
DEMO
__(.*?)__
This seems to work. Look at the demo.
http://regex101.com/r/lJ1jB1/1
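The difference between the greedy attempt and the two suggested patterns can be seen on the first line of the sample text. This is a small sketch using Python's re module (an assumption; the question uses JavaScript-style /.../g literals, for which re.findall is a rough equivalent here). The constant names are made up for the illustration.

```python
import re

LINE = "__don't__ and __do it__"

# Attempt from the question: [ -~] also covers "_", so the greedy "+"
# runs to the last "__" on the line.
GREEDY = r"__[ -~]+__"

# First answer: stop at the next underscore.
NEGATED = r"__[^_]*__"

# Second answer: lazy quantifier, with the inner text captured.
LAZY = r"__(.*?)__"

if __name__ == "__main__":
    print(re.findall(GREEDY, LINE))
    print(re.findall(NEGATED, LINE))
    print(re.findall(LAZY, LINE))
```

Note that re.findall returns the captured group for the lazy pattern, so that result holds the text between the underscores rather than the full match.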

Regex pattern not matching for integers followed by strings

I want to create a regex that starts with an integer and may then have a colon followed by a string. For example, it should pass for:
123
123:e43e
123:444+:343
I tried using the regex as:
String timeZoneRegex = "^\\d+[:(=[a-zA-Z+-:0-9]+)]*";
This does not work; appreciate any help here.
I have to say that some regexp features depend on the regexp engine, but try with:
\d+(\:[a-zA-Z0-9\-+]+)*
I've had a look at your expression and you've made some mistakes; maybe the most relevant one is the use of nested []. You should know that inside square brackets symbols are interpreted a little differently. This is a very good source if you want to learn them. Cheers.
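The suggested pattern can be tried against the three examples from the question. The sketch below uses Python's re module rather than Java (an assumption made for easy testing; in a Java string literal the backslashes would need to be doubled), and is_valid is a made-up helper name. fullmatch is used so that the whole input has to fit the pattern.

```python
import re

# Pattern from the answer: digits, then zero or more ":segment" parts,
# where a segment is made of letters, digits, "-" or "+".
TIME_ZONE = re.compile(r"\d+(\:[a-zA-Z0-9\-+]+)*")


def is_valid(text):
    """Return True if the whole text fits the pattern."""
    return TIME_ZONE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("123", "123:e43e", "123:444+:343", "123:", "abc"):
        print(sample, is_valid(sample))
```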