Could you explain this regular expression to me? - regex

I'm sure this is a newbie question about regular expressions. I saw this pattern in a program, but I can't understand the part with the two backslashes. Does "\\" have a special meaning, like \r or \t?
[a-zA-Z]+\\.?
Thank you

The backslash (\) is the escape character in regular expression patterns, which is why \r and \t work: they are ordinary characters preceded by the escape character to denote a special character that you can't simply type on your keyboard. To tell the pattern matcher to look for an actual backslash, you have to escape the backslash itself, which gives \\. Read purely as a regex, that is what your pattern does: \\ matches one backslash, and .? then matches an optional arbitrary character.

Yes, \ is the escape character in regex. \\ means a literal \, and \. means a literal dot, whereas an unescaped dot matches any character. If you see the pattern inside a string literal in a language like C, the compiler processes the double backslash first, so the string really contains \. and the regex engine parses that as a literal dot.
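The two answers differ only in where the pattern lives: inside an ordinary string literal or handed to the regex engine as-is. A minimal sketch of that difference, using Python's re module as a stand-in for whatever language the original program was written in (an assumption; the names from_literal and from_raw are made up for illustration):

```python
import re

# Ordinary string literal: the language turns "\\" into one backslash,
# so the regex engine receives  [a-zA-Z]+\.?  (letters, then an optional literal dot).
from_literal = re.compile("[a-zA-Z]+\\.?")

# Raw string: both backslashes reach the regex engine, which reads "\\" as a
# literal backslash followed by ".?" (an optional arbitrary character).
from_raw = re.compile(r"[a-zA-Z]+\\.?")

print(from_literal.pattern)                         # [a-zA-Z]+\.?
print(from_literal.fullmatch("abc.") is not None)   # True
print(from_literal.fullmatch("abc!") is not None)   # False
print(from_raw.fullmatch("abc\\X") is not None)     # True  (the text is abc\X)
print(from_raw.fullmatch("abc.") is not None)       # False
```

So which reading applies depends on whether the pattern you saw was inside a string literal that does its own backslash processing.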

Related

modify regex to include the backslash and single quote (so that backslash behaves as an escaping character)

I have the following string:
arg1('value1') arg2('value '')2') arg3('value\'3')
The regex to extract the value looks like:
boost::regex re_arg_values("('[^']*(?:''[^']*)*')");
Now this regex is not able to extract 'value\'3'. How can I modify the regex to handle \' inside the parentheses as well?
FYI: the value can contain spaces, special characters, and also tabs. The code is in C++.
Thanks in advance.
// Raw string literal (C++11), so the backslashes reach the regex engine unchanged.
boost::regex re_arg_values( R"re(\('([^'\\]|''|\\.)*'\))re" );
The \(' and '\) match the opening and closing delimiters.
The (, |, and )* form a group that repeats any of the alternatives inside it.
The [^'\\] matches normal characters (anything other than a single quote or a backslash).
The '' matches a pair of single quotes.
The \\. matches any escaped character (including stacked backslashes).
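To check the pattern against the sample string from the question, here is a small sketch in Python's re module rather than Boost (an assumption: both engines are expected to treat this particular pattern the same way). The names text, pattern and values are illustrative only.

```python
import re

# Sample input from the question, kept in a raw string so \' stays as two characters.
text = r"arg1('value1') arg2('value '')2') arg3('value\'3')"

# Regex-level pattern from the answer: \(' ... '\) with three alternatives inside.
pattern = re.compile(r"\('([^'\\]|''|\\.)*'\)")

# group(0) is the whole match, including the surrounding (' and ').
values = [m.group(0) for m in pattern.finditer(text)]

for value in values:
    print(value)
# ('value1')
# ('value '')2')
# ('value\'3')
```

All three arguments are found, including the one with the backslash-escaped quote.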

how regular expression starts from dot

My string is Car.audi=true and I want to select audi alone. If I use ., the regular expression treats it as any single character. How do I match a literal dot?
To treat a special character as a literal in a regular expression, you can place the character in a character class. This may vary by implementation, but generally the regex [.] will match a single literal period character.
Alternatively you may escape the period with a backslash, as in the regex \. which should also match a literal dot.
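Both spellings can be tried on the string from the question. This is only a sketch in Python: the capture group (\w+) after the dot is my own assumption about what "select audi alone" should mean, and the variable names are illustrative.

```python
import re

text = "Car.audi=true"

# Escaped dot: \. matches a literal period, (\w+) captures the word after it.
by_escape = re.search(r"\.(\w+)", text).group(1)

# Character class: [.] also matches a literal period.
by_class = re.search(r"[.](\w+)", text).group(1)

print(by_escape)  # audi
print(by_class)   # audi
```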

Need help understanding this particular regular expression [^.]

[^.]+\.(txt|html)
I am learning regex, and am trying to parse this.
[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?
The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".
I get \., it means a period.
(txt|html) means match with a txt file or html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something like the DOS equivalent, where I can just do this: *.txt or *.(txt|html), where * means everything that ends in the file extension .txt or .html?
Is [^.] the equivalent of * in DOS?
The dot (.) has no special meaning when it's inside a character class, and does not need to be escaped.
[^.] means "any character that is not a literal . character". [^.]+ matches one or more occurrences of any character that is not a dot.
From regular-expressions.info:
In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
. is not special inside [] character class. [^.]+ means one or more occurrences (+) of any character which is not a dot.
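If you write *.txt, it is not a valid regex, because * is a quantifier (zero or more times) and there is no preceding character for it to repeat. The closest regex counterpart of the DOS * is .* (any character, zero or more times).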
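A short Python sketch of the points above. The file names are made up for illustration, and the glob-like pattern .*\.(txt|html) is my own addition for comparison, not something from the question.

```python
import re

pattern = re.compile(r"[^.]+\.(txt|html)")

# One or more non-dot characters, a literal dot, then txt or html.
print(pattern.fullmatch("notes.txt") is not None)    # True
print(pattern.fullmatch("index.html") is not None)   # True
print(pattern.fullmatch("a.b.txt") is not None)      # False: [^.]+ cannot cross the first dot
print(pattern.search("a.b.txt").group(0))            # b.txt

# A DOS-style wildcard is not a regex: * has nothing to repeat.
try:
    re.compile(r"*.txt")
    star_is_valid = True
except re.error:
    star_is_valid = False
print(star_is_valid)                                 # False

# The regex that behaves like the DOS pattern uses .* instead of a bare *.
glob_like = re.compile(r".*\.(txt|html)")
print(glob_like.fullmatch("a.b.txt") is not None)    # True
```

So [^.] is narrower than the DOS *: it refuses dots, which keeps the match to a single name segment before the extension.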

Regular Expression Explanation [\w-\.]

What does this RegExp mean please?
[\w-\.]
I know the \w stands for word characters and could alternatively be written as:
[A-Za-z0-9_]
I know the \. means that the point will be treated as an ordinary character.
The only thing I don't really know is the hyphen character. Is this used as a Range Operator here or just the hyphen character in e.g. "fine-tune"?
Hyphen here is just the hyphen character.
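A hyphen is treated as a range operator only when it sits between two characters that can form a range; \w is a shorthand class rather than a single character, so it cannot be a range endpoint.
In flavors that accept this pattern, the hyphen is therefore a normal character, and the class works like [a-zA-Z0-9_.-] (letters, digits, and these three characters: - _ .).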
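Flavors differ on a hyphen that directly follows \w inside a class. Python's re module, for one, rejects [\w-\.] as a bad character range, so the sketch below uses the unambiguous spelling [\w.-], with the hyphen placed last. The test strings are made up for illustration.

```python
import re

# Python's re refuses a "range" whose left end is the shorthand class \w.
try:
    re.compile(r"[\w-\.]")
    accepted = True
except re.error:
    accepted = False
print(accepted)  # False

# Hyphen at the end of the class is always literal; the dot needs no escape.
word_dot_hyphen = re.compile(r"[\w.-]+")
print(word_dot_hyphen.fullmatch("fine-tune_v1.2") is not None)  # True
print(word_dot_hyphen.fullmatch("fine tune") is not None)       # False (space is not in the class)
```

Putting the hyphen first or last in the class avoids the question entirely, whichever engine you use.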

Does a dot have to be escaped in a character class (square brackets) of a regular expression?

A dot . in a regular expression matches any single character. In order for regex to match a dot, the dot has to be escaped: \.
It has been pointed out to me that inside square brackets [] a dot does not have to be escaped. For example, the expression:
[.]{3} would match the string "...".
Is that really the case? And if so, is it true for all regex standards?
In a character class (square brackets) any character except ^, -, ] or \ is a literal.
This website is a brilliant reference and has lots of info on the nuances of different regex flavours.
http://www.regular-expressions.info/refcharclass.html
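This can be checked quickly, at least for one flavor. A minimal sketch in Python's re module (other engines may differ in details, so treat it as one data point, not as proof for all regex standards):

```python
import re

# Inside brackets the dot is literal: exactly three periods.
print(re.fullmatch(r"[.]{3}", "...") is not None)   # True
print(re.fullmatch(r"[.]{3}", "abc") is not None)   # False

# Outside brackets the dot is a wildcard: any three characters.
print(re.fullmatch(r".{3}", "abc") is not None)     # True

# The four characters that are special inside a class, each escaped to be literal.
specials = re.compile(r"[\^\-\]\\]+")
print(specials.fullmatch("^-]\\") is not None)      # True  (the text is ^-]\)
```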