Regex match whole string - c++

I have the following pattern:
[ \n\t]*([a-zA-Z][a-zA-Z0-9_]*)[ \n\t]+((char)[ \n\t]*\[[ \n\t]*([0-9]+)[ \t\n]*\]|(char)|(int)|(double)|(bool)|(blob)[ \n\t]*\[[ \n\t]*([0-9]+)[ \t\n]*\])[ \n\t]*
You can try it here: http://regex101.com/r/vA0xG9
In the first capturing group ([a-zA-Z][a-zA-Z0-9_]*), I want to capture only words that start with a letter (a-zA-Z).
The following two strings both match:
cpf char[12]
and
9cpf char[12]
For the second one, the regex skips the leading 9 and matches the rest exactly as it does for the first string.
I've tried to use this capturing group: (ˆ[a-zA-Z][a-zA-Z0-9_]*$), but it didn't work.
I'm using lib regex.h.
What should I do?
Thanks.

Put ^ at the beginning of the whole pattern and $ at the end:
^[ \n\t]*([a-zA-Z][a-zA-Z0-9_]*)[ \n\t]+((char)[ \n\t]*\[[ \n\t]*([0-9]+)[ \t\n]*\]|(char)|(int)|(double)|(bool)|(blob)[ \n\t]*\[[ \n\t]*([0-9]+)[ \t\n]*\])[ \n\t]*$
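I would also suggest \s instead of [ \n\t] if you want to match any whitespace and your regex engine supports it; POSIX regex.h does not guarantee \s, but [[:space:]] works there.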

In C++, there is a handy regex function that anchors the match to the whole string automatically, std::regex_match:
Determines if the regular expression e matches the entire target character sequence, which may be specified as std::string, a C-string, or an iterator pair.
This way, you avoid issues such as mistyping ^ as ˆ, as well as alternation pitfalls (e.g. ^A|B$ matches any string that starts with A or ends with B, not only strings equal to A or B; for that you need ^(A|B)$ or ^(?:A|B)$).
Note that there is an equivalent boost::regex_match function.
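As a minimal sketch of the difference, assuming std::regex with its default ECMAScript grammar rather than regex.h (the helper names matches_anywhere, matches_whole and identifier_of are made up for illustration): regex_search accepts 9cpf char[12] because it is free to start matching at the c, while regex_match requires the whole string to fit the pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern from the question, unanchored (no ^ or $).
// \s replaces [ \n\t]; note that \s also matches \r, \f and \v.
static const std::regex kDecl(
    R"re(\s*([a-zA-Z][a-zA-Z0-9_]*)\s+((char)\s*\[\s*([0-9]+)\s*\]|(char)|(int)|(double)|(bool)|(blob)\s*\[\s*([0-9]+)\s*\])\s*)re");

// Hypothetical helper: true if the pattern matches some substring of s.
bool matches_anywhere(const std::string& s) {
    return std::regex_search(s, kDecl);
}

// Hypothetical helper: true only if the pattern matches all of s.
bool matches_whole(const std::string& s) {
    return std::regex_match(s, kDecl);
}

// Hypothetical helper: returns the identifier (capture group 1),
// or an empty string if the whole string does not match.
std::string identifier_of(const std::string& s) {
    std::smatch m;
    return std::regex_match(s, m, kDecl) ? m[1].str() : std::string();
}
```

With these helpers, matches_anywhere("9cpf char[12]") is true but matches_whole("9cpf char[12]") is false, which is the behaviour the question asks for.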

Related

Removing last character from a line using regex

I just started learning regex and I'm trying to understand how it is possible to do the following:
If I have:
helmut_rankl:20Suzuki12
helmut1195:wasserfall1974
helmut1951:roller11
Get:
helmut_rankl:20Suzuki1
helmut1195:wasserfall197
helmut1951:roller1
I tried using .$, which matches the last character of a string, but it doesn't match the letters and numbers before it.
How do I get these results from the input?
You could match the whole line and use a lookahead to assert that one more character follows. If the match itself must contain at least one character:
.+(?=.)
If you also want to match empty strings:
.*(?=.)
This will do what you want with a regex match function:
^(.*).$
Broken down:
^ matches the start of the string
( and ) denote a capturing group. The text matched inside it is returned.
.* matches everything, as much as it can.
The final . matches any single character (i.e. the last character of the line)
$ matches the end of the line/input
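Both approaches can be sketched in C++ with std::regex (an assumption, since the question names no language; the function names below are hypothetical). The first helper uses the capturing-group pattern, the second the lookahead pattern that also tolerates empty input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: drops the last character of a single line using
// ^(.*).$ and returns capture group 1. Empty input is returned unchanged.
std::string drop_last_char(const std::string& line) {
    static const std::regex re(R"re(^(.*).$)re");
    std::smatch m;
    // regex_match already requires the whole string to match, so ^ and $
    // are redundant here; they are kept to mirror the pattern above.
    if (std::regex_match(line, m, re)) return m[1].str();
    return line;
}

// Hypothetical helper: same result via the lookahead pattern .*(?=.),
// where the whole match (group 0) is everything except the last character.
std::string drop_last_char_lookahead(const std::string& line) {
    static const std::regex re(R"re(.*(?=.))re");
    std::smatch m;
    return std::regex_search(line, m, re) ? m[0].str() : line;
}
```

Both helpers expect one line at a time, because . does not match a newline in this grammar.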

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all the requirements stated by the OP:
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from the beginning of the line up to the pattern
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, up to and including the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for matching functions that are implicitly anchored to the whole input, like Java's matches(), since you did not specify a language.
.* matches any sequence of characters, including none. As it surrounds the pattern, the regex matches the entire string as long as the pattern occurs somewhere in it.
Just simply use this:
E[0-9]{4}49
How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use the global flag, /E\d{4}49/g, if supported by the language
OR
Try a capturing group, (E\d{4}49)+, which is formed by enclosing the pattern in parentheses (...)
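Whether the surrounding .* is needed depends on whether the API searches or matches the whole string. A small sketch in C++ with std::regex (an assumption, as the question names no language; find_code and whole_string_has_code are hypothetical names) shows both styles:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first Exxxx49 code found anywhere in s,
// or an empty string if there is none. A search needs no anchors or .*.
std::string find_code(const std::string& s) {
    static const std::regex re("E[0-9]{4}49");
    std::smatch m;
    return std::regex_search(s, m, re) ? m[0].str() : std::string();
}

// Hypothetical helper: the same check with a whole-string matcher,
// which needs .* on both sides of the pattern.
bool whole_string_has_code(const std::string& s) {
    static const std::regex re(".*E[0-9]{4}49.*");
    return std::regex_match(s, re);
}
```

The search variant also gives direct access to the matched code, while the whole-string variant only answers yes or no unless a capturing group is added.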

Regular exp to match string from beginning until certain char is met

I have a long string and I'm trying to capture a substring up to a certain character.
Let's suppose I have the following string, and I would like to get the text up to the first ampersand.
abc.8965.aghtj&hgjkiyu5.8jfhsdj
I would like to extract what is present before the ampersand so: abc.8965.aghtj
I thought this would work:
grep '^.*&{1}'
I would translate it as
^ start of string
.* match whatever chars
&{1} until the first ampersand is matched
Any advice?
I'm afraid this will take me weeks
{1} does not match the first occurrence; instead it means "match exactly one of the preceding pattern/character", which is identical to just matching the character (&{3} would match &&&).
In order to match the first occurrence of &, you need to use .*?:
grep '^.*?&'
Normally, .* is greedy, meaning it matches as much as possible. This means your pattern would match the last ampersand rather than the first one. .*? is the non-greedy version, matching as little as possible while fulfilling the pattern.
Update: That syntax may not be supported by grep (GNU grep accepts lazy quantifiers only in Perl mode, -P). Here is another option:
'^[^&]*&'
It matches anything that is not an ampersand, up to the first ampersand.
You may also have to enable extended regular expressions in grep (-E).
Try this one:
^.*?(?=&)
It won't capture the ampersand itself, just the text before it.
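The difference between the greedy, lazy and negated-class patterns shows up once the input has more than one ampersand. The sketch below uses C++ std::regex instead of grep (an assumption made for illustration; capture_prefix is a hypothetical helper) and returns capture group 1 for a given pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: searches s with the given pattern and returns
// capture group 1, or an empty string if the pattern does not match.
std::string capture_prefix(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(s, m, re) ? m[1].str() : std::string();
}

// Patterns compared (each captures the text before an ampersand):
//   ^(.*)&      greedy: runs to the LAST ampersand
//   ^(.*?)&     lazy: stops at the FIRST ampersand
//   ^([^&]*)&   negated class: cannot cross an ampersand at all
```

For the input a&b&c, the greedy pattern captures a&b, while the lazy and negated-class patterns both capture a.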

regular expressions boost c++

I'm trying to catch certain characters at the start of the string and after each newline. The string is:
.V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
From the string above I need to catch .V/ and .E/. The regular expression I am using is:
^.[VE]/*
But it only seems to catch .V/. Can anyone see why? I thought ^ matched after newlines as well as at the start of the string. Any help would be much appreciated, as I've had this problem for a while now. If this is not the correct way of doing it, could you propose a different way?
Regex 101:
^ means start of string. And you guessed it right. There can only be one start of string.
^.[VE]/*
means :
Match start of string, followed by any character (other than newline), followed by either a V or an E, followed by zero or more / characters (greedy).
Probably you want something like this :
\.[VE].*?(?:\\n|$)
Which means match a dot, followed by V or E and match everything until \n or end of string.
Comment if I am wrong.
So does .V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
look like this?
.V/1LBOG
.F/AV0094/08NOV/SAL/Y
.E/0134249356001"
If yes, then you need to change your regex a little bit:
\.[VE].*
Abusing the fact that . does not match newlines by default.
. in regular expressions matches any single character, not a literal .. If you want to match a literal period, you need to escape it (\.). * doesn't match any number of any characters (as in most shells), but instead matches zero or more instances of whatever you put before it. For example, A* will match the empty string, A, AAAA, etc., and .* will match any string.
^ means the beginning of a line. ^\.[VE]/ will match .V/ and .E/ (but only at the start of the line).
If you need .V or .E, try ^.(V|E)/*. The alternation operator | is useful here: the pattern checks for ^.V/* or ^.E/*.
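One way to sidestep the question of whether ^ matches after newlines is to split the text into lines first and test each line on its own. The sketch below does that with std::regex rather than Boost (an assumption; boost::regex offers equivalent calls), assumes the \n in the input are real newline characters, and uses a hypothetical helper name, line_prefixes.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the ".V/" or ".E/" prefix of every line
// that starts with one, in order of appearance.
std::vector<std::string> line_prefixes(const std::string& text) {
    // \. is a literal dot; [VE] is V or E; / is required exactly once.
    static const std::regex re(R"re(^\.[VE]/)re");
    std::vector<std::string> found;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        // Each line is searched separately, so ^ always means "start of line".
        if (std::regex_search(line, m, re)) found.push_back(m[0].str());
    }
    return found;
}
```

For the sample string this yields .V/ and .E/ and skips the .F/ line.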

regular expression no characters

I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regular expression so that it doesn't match if there are any characters after the space?
For example if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters (upper or lower case) -- replace * with + to require 1 or more letters.
, matches a comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between mine and Donut's is that his will match , and fail for the empty string, while mine will match the empty string and fail for , (and his is more case-insensitive than mine. With mine you'll have to add case-insensitivity to the options of your regex function, but it's like your example).
I am not sure which regex engine/language you are using, but there is often something like a negated character class, e.g. [^a-z], meaning "any character other than a lowercase letter".
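The edge cases that separate the two anchored patterns can be checked with a short C++ sketch using std::regex (an assumption, since the question names no language; the function names are hypothetical). The second pattern spells out A-Za-z instead of relying on a case-insensitivity option.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for ^[A-Za-z]*, $ : zero or more letters,
// then a comma and a space, then end of string.
bool matches_letters_then_comma(const std::string& s) {
    static const std::regex re("^[A-Za-z]*, $");
    return std::regex_match(s, re);
}

// Hypothetical helper for ^([A-Za-z]+, )?$ : either the empty string,
// or one or more letters followed by a comma and a space.
bool matches_optional_word(const std::string& s) {
    static const std::regex re("^([A-Za-z]+, )?$");
    return std::regex_match(s, re);
}
```

Both reject test, test because of the trailing $; they differ only on the empty string and on a bare comma followed by a space.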