My string is Car.audi=true and I want to select audi alone. If I use . the regular expression takes it as any single character. How do I match it literally?
To treat a special character as a literal in a regular expression, one option is to place the character in a character class. This may vary by implementation, but generally the regex [.] will match a single literal period character.
Alternatively you may escape the period with a backslash, as in the regex \. which should also match a literal dot.
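As a rough illustration of both spellings, here is a minimal sketch that assumes Python's re module (the question does not say which engine is in use). The variable names are only for illustration.

```python
import re

s = "Car.audi=true"

# An unescaped dot is a wildcard: it matches every character in "a.b".
wildcard_hits = re.findall(r".", "a.b")

# Escaped or bracketed, the dot matches only a literal period.
escaped_hits = re.findall(r"\.", "a.b")
bracket_hits = re.findall(r"[.]", "a.b")

# Capture the word between the literal dot and the equals sign.
match = re.search(r"\.(\w+)=", s)
name = match.group(1) if match else None

print(wildcard_hits, escaped_hits, bracket_hits, name)
```

The capturing group around \w+ is what isolates audi; the literal dot and the = only pin down where it sits.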
[^.]+\.(txt|html)
I am learning regex, and am trying to parse this.
[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?
The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".
I get \., it means a period.
(txt|html) means match a txt file or an html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something like the DOS equivalent, where I can just do this: *.txt or *.(txt|html), where * means everything that ends in the file extension .txt or .html?
Is [^.] the equivalent of * in DOS?
The dot (.) has no special meaning when it's inside a character class, and does not need to be escaped.
[^.] means "any character that is not a literal . character". [^.]+ matches one or more occurrences of any character that is not a dot.
From regular-expressions.info:
In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
. is not special inside [] character class. [^.]+ means one or more occurrences (+) of any character which is not a dot.
If you do *.txt it would not be valid regex as * would not get a character to repeat (zero or more times).
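To see these points in action, here is a small sketch assuming Python's re module; other engines should behave similarly for this pattern, but that is an assumption. The file names are made up for the example.

```python
import re

pattern = re.compile(r"[^.]+\.(txt|html)")

# fullmatch requires the whole name to fit the pattern.
html_ext = pattern.fullmatch("index.html").group(1)

# [^.]+ cannot cross the first dot, and "tar" is neither txt nor html.
whole_name = pattern.fullmatch("archive.tar.txt")

# search may match a substring, so it finds the tail of the same name.
tail = pattern.search("archive.tar.txt").group(0)

# A leading * has nothing to repeat, so the DOS-style glob is not a valid regex.
try:
    re.compile("*.txt")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(html_ext, whole_name, tail, glob_is_valid_regex)
```

So [^.]+ is not quite the DOS *: it is "one or more characters, none of them a dot", which is why it stops at the first period.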
I'm writing a regular expression in Java to capture a word without spaces.
The word can contain only letters, numbers, hyphens and dots.
The character set [\w+\-\\.] works well.
Now I want to edit the set to allow a single space after the dot.
How do I have to edit my regular expression?
You can add an alternation that matches this additional requirement
([\w\-.]|(?<=\.) )+
See it here on Regexr
(?<=\.) is a lookbehind assertion. It ensures that space is only matched, if it is preceded by a dot.
Other hints:
\w contains the underscore and matches per default only ASCII letters/digits. If you care about Unicode, use either the modifier UNICODE_CHARACTER_CLASS to enable Unicode for \w or use the Unicode properties \p{L} and \p{Nd} to match Unicode letters and digits.
You don't need to escape the dot in a character class.
You have \w+ in your character class, are you aware, that you just add the "+" character to the accepted characters?
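The lookbehind alternation and the hint about the stray + can be checked with a short sketch. It is written in Python rather than Java, which is an assumption made for brevity; the lookbehind syntax is the same, but Python's \w is Unicode-aware by default for str patterns, unlike Java's. A non-capturing group (?:...) replaces the capturing one, which does not change what matches.

```python
import re

with_space = re.compile(r"(?:[\w\-.]|(?<=\.) )+")

# A single space directly after a dot is accepted.
ok = with_space.fullmatch("foo. bar") is not None

# A space that does not follow a dot is rejected.
no_dot_before_space = with_space.fullmatch("foo bar") is not None

# The second space follows a space, not a dot, so it is rejected too.
two_spaces = with_space.fullmatch("foo.  bar") is not None

# The + inside the original class is just one more allowed character.
original_set = re.compile(r"[\w+\-\\.]+")
accepts_plus = original_set.fullmatch("a+b") is not None

print(ok, no_dot_before_space, two_spaces, accepts_plus)
```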
In case of a dot followed by a space, I suppose this pattern should be neither the first, nor the last in the matched string? You may want to enclose it in word boundaries \b:
([0-9A-Za-z-]|\b\.( \b)?)+
I deliberately did not use \w, to exclude underscores.
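A quick check of the word-boundary variant, again as a sketch in Python (an assumption; \b behaves the same way for these ASCII inputs). The sample strings are invented for the test.

```python
import re

bounded = re.compile(r"(?:[0-9A-Za-z-]|\b\.( \b)?)+")

def accepted(s: str) -> bool:
    """True when the whole string fits the pattern."""
    return bounded.fullmatch(s) is not None

inner_dot_space = accepted("foo. bar")   # dot and space between two words
leading_dot = accepted(".foo")           # no word boundary before the dot
trailing_dot_space = accepted("foo. ")   # no word boundary after the space
trailing_dot = accepted("foo.")          # the optional "( \b)?" is skipped
underscore = accepted("foo_bar")         # "_" is not in the explicit class

print(inner_dot_space, leading_dot, trailing_dot_space, trailing_dot, underscore)
```

Note that a bare trailing dot is still accepted, because only the dot-plus-space pair needs a word character on both sides.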
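To allow only a single space after the dot, you can use this regex:
^(?!.*?\. {2})[\w. -]+$
You don't need to escape the dot inside a character class, nor the hyphen when it comes last in the class.
(?!.*?\. {2}) is a negative lookahead that disallows 2 or more spaces after a dot. The character class has to contain a space for any space to match at all; note that this also permits single spaces that do not follow a dot.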
/\ATo\:\s+(.*)/
Also, how do you work it out, what's the approach?
In multi-line regular expressions, \A matches the start of the string (and \Z is end of string, while ^/$ matches the start/end of the string or the start/end of a line). In single line variants, you just use ^ and $ for start and end of string/line since there is no distinction.
To is literal, \: is an escaped :.
\s means whitespace and the + means one or more of the preceding "characters" (white space in this case).
() is a capturing group, meaning everything in here will be stored in a "register" that you can use. Hence, this is the meat that will be extracted.
.* simply means any non newline character ., zero or more times *.
So, what this regex will do is process a string like:
To: paxdiablo
Re: you are so cool!
and return the text paxdiablo.
As to how to learn how to work this out yourself, the Perl regex tutorial(a) is a good start, and then practise, practise, practise :-)
(a) You haven't actually stated which regex implementation you're using but most modern ones are very similar to Perl. If you can find a specific tutorial for your particular flavour, that would obviously be better.
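Here is a minimal sketch of that walkthrough, assuming Python's re module (the question does not name the regex flavour). It also shows the \A versus ^ distinction under multi-line mode.

```python
import re

text = "To: paxdiablo\nRe: you are so cool!"

# The capture group stops at the end of the first line,
# because . does not match a newline by default.
match = re.search(r"\ATo\:\s+(.*)", text)
recipient = match.group(1) if match else None

# With MULTILINE, ^ also matches after each newline,
# but \A still means "start of the whole string".
later_line = "Re: you are so cool!\nTo: paxdiablo"
caret_hit = re.search(r"^To:\s+(.*)", later_line, re.MULTILINE)
anchor_hit = re.search(r"\ATo:\s+(.*)", later_line, re.MULTILINE)

print(recipient, caret_hit.group(1), anchor_hit)
```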
\A is a zero-width assertion and means "Match only at beginning of string".
The regex reads: On a line beginning with "To:" followed by one or more whitespaces (\s), capture the remainder of the line ((.*)).
First, you need to know what the different character classes, anchors and quantifiers are. Shorthand character classes are backslash-prefixed, \s from your regex, for instance; \A is also backslash-prefixed, but it is an anchor rather than a character class. Quantifiers are for instance the +. There are several references on the internet, for instance this one.
Using that, we can see what happens by going left to right:
\A matches a beginning of the string.
To matches the text "To" literally
\: escapes the ":"; a colon has no special meaning here, so with or without the backslash it is "just a colon"
\s matches whitespace (space, tab, etc)
+ means to match the previous class one or more times, so \s+ means one or more whitespace characters
() is a capture group, anything matched within the parens is saved for later use
. means "any character"
* is like the +, but zero or more times, so .* means any number of any characters
Taking that together, the regex will match a string beginning with "To:", then at least one whitespace character, and then anything, which it will save. So, with the string "To: JaneKealum", you'll be able to extract "JaneKealum".
You start from left and look for any escaped (ie \A) characters. The rest are normal characters. \A means the start of the input. So To: must be matched at the very beginning of the input. I think the : is escaped for nothing. \s is a character group for all spaces (tabs, spaces, possibly newlines) and the + that follows it means you must have one or more space characters. After that you capture all the rest of the line in a group (marked with ( )).
If the input was
To: progo#home
the capture group would contain "progo#home"
It matches To: at the beginning of the input, followed by at least one whitespace, followed by any number of characters as a group.
The initial and trailing / characters delimit the regular expression.
A \ inside the expression means to treat the following character specially or treat it as a literal if it normally has a special meaning.
The \A means match only at the beginning of a string.
To means match the literal "To"
\: means match a literal ':'. A colon is normally a literal already, so the backslash is not needed here.
\s means match a whitespace character.
+ means match as many as possible but at least one of whatever it follows, so \s+ means match one or more whitespace characters.
The ( and ) define a group of characters that will be captured and returned by the expression evaluator.
And finally the . matches any character except a newline (by default) and the * means match as many as possible but can be zero. Therefore the (.*) will capture all characters to the end of the line.
So the pattern will match a string that starts with "To:" followed by whitespace, and capture everything from the first non-whitespace character after that to the end of that line.
The only way to really understand these things is to go through them one bit at a time and check the meaning of each component.
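Going through it one bit at a time can also be done experimentally. The sketch below, assuming Python's re module, probes three of the points made in the answers: the escaped colon is redundant, (.*) stops at a newline unless told otherwise, and \s+ needs at least one whitespace character but may span several, including newlines.

```python
import re

sample = "To: first line\nsecond line"

# Escaped and unescaped colons behave identically.
escaped = re.search(r"\ATo\:\s+(.*)", sample).group(1)
plain = re.search(r"\ATo:\s+(.*)", sample).group(1)

# By default . stops at the newline; DOTALL lets it run to the end of input.
to_end = re.search(r"\ATo:\s+(.*)", sample, re.DOTALL).group(1)

# \s+ is greedy and also consumes newlines before the capture starts.
skipped = re.search(r"\ATo:\s+(.*)", "To:\n\n  bob").group(1)

# With no whitespace after the colon there is no match at all.
no_space = re.search(r"\ATo:\s+(.*)", "To:bob")

print(repr(escaped), repr(plain), repr(to_end), repr(skipped), no_space)
```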
I'd like to understand what this line of JavaScript means...
(/^\w+, ?\w+, ?\w\.?$/)
I understand \w stands for 'word', but need your help in understanding '/', '^', '+', '?' and '.?$/'
Thank you..
That's a regular expression, not HTML.
It's inside of a regex literal (/.../) in Javascript.
^ matches the beginning of the string
\w matches any word character
+ matches one or more of the previous set.
? matches zero or one of the previous set (in this case a single space)
\. matches a .. (An unescaped . matches any single character)
$ matches the end of the string.
Let's break it down, because then it is easier to read:
^ beginning of the line
\w+ 1 or more 'word' characters
, a comma
? an optional space
\w+ 1 or more 'word' characters
, a comma
? an optional space
\w a single 'word' character
\.? an optional period
$ end of line
The meaning of a 'word' character is an alpha-numeric character or an underscore.
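The breakdown above can be tried out with a few inputs. This sketch uses Python instead of JavaScript, which is an assumption made to keep the example short; re.ASCII is passed so that \w stays limited to ASCII letters, digits and underscore as in JavaScript, and Python's $ additionally tolerates one trailing newline. The sample names are invented.

```python
import re

pattern = re.compile(r"^\w+, ?\w+, ?\w\.?$", re.ASCII)

def matches(s: str) -> bool:
    """True when the string fits the pattern from start to end."""
    return pattern.search(s) is not None

with_spaces = matches("Smith, John, Q.")     # optional spaces and period present
without_spaces = matches("Smith,John,Q")     # optional parts left out
double_space = matches("Smith,  John, Q")    # " ?" allows at most one space
long_last_part = matches("Smith, John, Qu.") # last part is a single \w only
missing_part = matches("Smith, John")        # second comma is required

print(with_spaces, without_spaces, double_space, long_last_part, missing_part)
```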
It is not HTML code but Regular Expression. Read more about it:
Regular expression
In computing, regular expressions, also referred to as regex or regexp, provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
/^\w+, ?\w+, ?\w\.?$/
Outside in...
/ / delimiters
^ $ Matches the whole string (^ means to match the beginning, $ means to match the end)
One by one...
\w means word character (simply w doesn't match anything but the ASCII character w)
\w+ word characters (at least one, matches as much as possible)
? means the spaces are optional, matches 0 or 1 space character
. matches any character that is not a line break (can be configured with regex modifiers)
\. (like in the example) matches exactly one dot
It's a regular expression that looks for a string of word characters (like letters, digits, or underscores) that has two commas in it with an optional single space after each comma.
This is surely a newbie regular expression question. I saw this in a program but I can't understand the part with the two backslashes. Does "\\" have a special meaning like \r or \t?
[a-zA-Z]+\\.?
Thank you
The backslash (\) is the escape character in your regular expression pattern, which is why \r and \t work: they are regular characters preceded by the escape character to denote a special character you can't just type on your keyboard. To tell the pattern matcher that it should look for an actual backslash, which is what your pattern is doing, you have to escape it, giving \\.
Yes, \ is the escape character in regex. \\ means a literal \, and \. means a single literal dot, while an unescaped dot means any character. If you see that pattern inside a string literal in a language like C, the double backslash is first processed by the language compiler, so the string really contains \. which the regex engine then parses as a single literal dot.
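The two readings of the pattern can be compared side by side. This is a sketch in Python, chosen as an assumption since the original program's language is not stated; an ordinary Python string literal processes backslashes much like a C string does, while a raw string (r"...") passes them to the regex engine untouched.

```python
import re

# Ordinary string literal: the language turns \\ into one backslash,
# so the regex engine sees [a-zA-Z]+\.?  (letters, optional literal dot).
from_literal = "[a-zA-Z]+\\.?"
same_as_raw_single = from_literal == r"[a-zA-Z]+\.?"

dot_ok = re.fullmatch(from_literal, "abc.") is not None
no_dot_ok = re.fullmatch(from_literal, "abc") is not None
bang_rejected = re.fullmatch(from_literal, "abc!") is None

# Raw string: the regex engine itself sees \\ (a literal backslash)
# followed by .? (an optional arbitrary character).
from_raw = r"[a-zA-Z]+\\.?"
backslash_ok = re.fullmatch(from_raw, "abc\\") is not None
backslash_any_ok = re.fullmatch(from_raw, "abc\\x") is not None
dot_rejected = re.fullmatch(from_raw, "abc.") is None

print(same_as_raw_single, dot_ok, no_dot_ok, bang_rejected,
      backslash_ok, backslash_any_ok, dot_rejected)
```

Which reading applies depends on whether the text [a-zA-Z]+\\.? was seen in source code as a string literal or is the pattern as the regex engine receives it.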