I want to match a string that starts with a number, is followed by any characters, and ends with .html.
I have tried the following:
/([0-9]*[^\.html]*.html)/g
But for an example like "21212dfsd.htmlfdf.html", Regexr reports 2 matches. Why is that?
Thanks
You get two matches because of the * quantifier following the [0-9] character class. * means match the preceding token "zero or more" times, so the second match, "fdf.html", is allowed to start with no digits at all. Use + instead, meaning "one or more".
You also can't place whole words inside a character class. A character class matches any one character from a set of characters, so [^\.html] means "any one character except ., h, t, m, or l". Outside a character class, the dot . needs to be escaped to match a literal dot (it's a character with special meaning).
You can use the below regular expression:
/\d+.*?\.html/g
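As a quick check of the explanation above, here is a small sketch in Python (chosen only for illustration; the question itself uses JavaScript-style regex literals). It runs the original attempt and the suggested pattern against the sample string from the question. The variable names are made up for this example.

```python
import re

sample = "21212dfsd.htmlfdf.html"

# Original attempt: [0-9]* may match zero digits, so a second match can start at "fdf".
original = re.findall(r"([0-9]*[^\.html]*.html)", sample)

# Suggested pattern: \d+ requires at least one digit, and \. matches a literal dot.
fixed = re.findall(r"\d+.*?\.html", sample)

print(original)  # ['21212dfsd.html', 'fdf.html']
print(fixed)     # ['21212dfsd.html']
```

If the whole string has to start with a digit and end with .html, anchoring the pattern (or using re.fullmatch in Python) is one way to express that.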
Related
Is there a way to have a regular expression to match anything but certain characters? Say for example the only characters that aren't allowed is the * character. Rather than list out all possibly characters allowed in the regular expression is there anything that will say "everything not equal to * is allowed".
You can use a negated character class, which is written [^...]. So, for your case you can use:
^[^*]+$
You can read more about the theory of negated character classes. The quotation below explains them.
Negated Character Classes
Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
[^*] Any single character except: *
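The q[^u] versus q(?!u) distinction from the quotation can be tried out with a short Python sketch. The helper name first_match is hypothetical and exists only for this example.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m else None

# The negated class must consume a character after the q.
class_on_iraq = first_match(r"q[^u]", "Iraq")
class_on_sentence = first_match(r"q[^u]", "Iraq is a country")

# The negative lookahead only checks what follows; it consumes nothing.
lookahead_on_iraq = first_match(r"q(?!u)", "Iraq")
lookahead_on_sentence = first_match(r"q(?!u)", "Iraq is a country")

print(repr(class_on_iraq))          # None
print(repr(class_on_sentence))      # 'q '
print(repr(lookahead_on_iraq))      # 'q'
print(repr(lookahead_on_sentence))  # 'q'
```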
Whenever I have to work with regular expressions, I usually go to rubular.com and test my attempts. It also has some examples, which are pretty useful.
This is explained in the manual.
The solution is:
"[^*]*"
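For illustration, the "everything except *" check could look like this in Python. This is a sketch, not the only way to do it: the function name has_no_asterisk is made up, and re.fullmatch stands in for the ^ and $ anchors used in the answer above.

```python
import re

def has_no_asterisk(text):
    """True when text is non-empty and contains no '*' character."""
    # [^*]+ means: one or more characters, none of which is an asterisk.
    return re.fullmatch(r"[^*]+", text) is not None

print(has_no_asterisk("hello world"))   # True
print(has_no_asterisk("a*b"))           # False
print(has_no_asterisk(""))              # False, because + needs at least one character
print(has_no_asterisk("line1\nline2"))  # True, a negated class also matches line breaks
```

Switching + to * (as in the last answer) would also accept the empty string.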
I have a large number of strings which start like DD_filename.
How can I extract the characters before _ using a regular expression?
I tried learning from here, where it says that a.b will retrieve characters starting from a and ending on b.
I tried ^._ similarly, but it is not working for me.
^._ will only match one character before _. Try this pattern:
^.*?(?=_)
Starting from the beginning of the string, capture all non-underscore characters:
"^[^_]*"
The first ^ (caret) character means that the match starts from the beginning of the string. The brackets allow you to define a set of possible characters (character class). The second ^ character means "not". So the character class is "not underscore". The star means "zero or more". So in plain English: "match from the start of the string zero or more non underscore characters".
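The plain-English reading above can be checked with a small Python sketch. The function name prefix_before_underscore is hypothetical; because the star allows zero characters, the pattern always matches, so calling .group() directly is safe here.

```python
import re

def prefix_before_underscore(name):
    """Return the leading run of non-underscore characters (possibly empty)."""
    return re.match(r"^[^_]*", name).group()

print(prefix_before_underscore("DD_filename"))  # DD
print(prefix_before_underscore("AB_c_d"))       # AB
print(prefix_before_underscore("filename"))     # filename (no underscore: whole string)
print(repr(prefix_before_underscore("_x")))     # '' (nothing before the underscore)
```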
You can try something like
.*?(?=_)
. matches any character and *? is a reluctant quantifier. (?=_) is a positive lookahead to ensure our match is followed by an _.
If you want to only extract characters that occur at the beginning of a string you can add the ^ anchor: ^.*?(?=_). ^ matches the position before the first character in the string.
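To see what the reluctant quantifier and the lookahead contribute, here is a short Python sketch comparing the anchored pattern with a greedy variant. The greedy pattern and the sample string "DD_file_name" are additions for contrast and are not part of the answer.

```python
import re

name = "DD_file_name"

# Reluctant .*? stops at the first position that is followed by an underscore.
lazy = re.search(r"^.*?(?=_)", name).group()

# Greedy .* runs to the end and backtracks to the last underscore instead.
greedy = re.search(r"^.*(?=_)", name).group()

# Without any underscore the lookahead can never succeed, so there is no match.
no_underscore = re.search(r"^.*?(?=_)", "filename")

print(lazy)           # DD
print(greedy)         # DD_file
print(no_underscore)  # None
```

Note that the lookahead version returns no match when the string has no underscore, while the [^_]* version returns the whole string in that case.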
Just capture all characters that are not an underscore:
"[^_]*"
Regular Expression to get all characters before "-"
Check out @stema's answer. He gives four ways to do this, but the first is probably the best.
Match result = Regex.Match(text, @"^.*?(?=-)");
Console.WriteLine(result);
[\w+\.]{3}
and
\w+\.\w+\.\w+\.
The former matches "dra".
The latter matches "dragon.is.awesome".
What am I not understanding right about them?
Input text looks like
i know dragon.is.awesome but
i know dragon.is.awesome.because, he is awesome
i know dragon.sucks.because, he is not awesome
i know dragon.is.dead, someone killed him
so i need to match any combination of groupings that are of the pattern \w+.
Because the first one is a character class.
[\w+/\.]
matches either one \w, or one + or one / or one literal .. If you want to shorten the latter, use normal parentheses:
(\w+\.){3}
Note that within character classes, most meta-characters lose their meaning. So + and . and * (for example) can all be contained and matched without being escaped.
[...] is a character class. It matches one character. [\w+\.] matches one character which is either a "word" character (letter, number, or underscore), or a plus, or a dot. [\w+\.]{3} matches three such characters in a row.
[] is a character class, not a subpattern. [abc] Matches a single a, b or c.
You probably meant (\w+\.){3}, which does match the same as your second regex.
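The difference between the character class and the group can be reproduced with a few lines of Python. This is only a sketch: the input string with a trailing dot is an assumption, chosen so that three "word plus dot" groups are actually present.

```python
import re

text = "dragon.is.awesome."

# Character class: three single characters, each a word character, '+' or '.'.
by_class = re.search(r"[\w+\.]{3}", text).group()

# Group: three repetitions of "one or more word characters, then a dot".
by_group = re.search(r"(\w+\.){3}", text).group()

# The spelled-out version from the question matches the same text as the group.
spelled_out = re.search(r"\w+\.\w+\.\w+\.", text).group()

# Without the trailing dot there are only two "word plus dot" groups, so no match.
no_trailing_dot = re.search(r"(\w+\.){3}", "dragon.is.awesome")

print(by_class)         # dra
print(by_group)         # dragon.is.awesome.
print(spelled_out)      # dragon.is.awesome.
print(no_trailing_dot)  # None
```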
I'm new to regular expressions and I'm having trouble finding what "\'.-" means.
'/^[A-Z \'.-]{2,20}$/i'
So far from my research, I have found that the regular expression starts (^) and requires two to twenty ({2,20}) alphabetical (A-Z) characters. The expression is also case insensitive (/i).
Any hints about what "\'.-" means?
The character class is the entire expression [A-Z \'.-], meaning any of A-Z, space, single quote, period, or hyphen. The \ is needed to protect the single quote, since it's also being used as the string quote. This charclass must be repeated 2 to 20 times, and because of the leading ^ and trailing $ anchors that must be the entire content of the matching string.
It means to escape the single quote (') that delimits the regex (so as not to end the string prematurely), and then a . which means a literal . and a - which means a literal -.
Inside a character class, the . is treated literally, and if the - isn't part of a valid range, e.g. a-z, then it is treated literally as well.
Your regex says Match the characters a-zA-Z '.- between 2 and 20 times as the entire string, with an optional trailing \n.
This regex is in a string. The backslash is there to escape the single quote so the string doesn't end early, in the middle of the regex. The dot and dash are just what they are, a period and a dash.
So, you were nearly right, except it's 2-20 characters that are letters, space, single quote, period, or dash.
It's quoting the quote.
The regular expression is ^[A-Z '.-]{2,20}$.
In the programming language you are using, you write it as a quoted string:
'SOMETHING'
To get a single quote in there, it's been backslashed.
Everything inside the square brackets is part of the character class, and will match a single character listed. In your example, the characters listed are the letters A through Z, a space, a single quote, a period, or a hyphen. (Note the hyphen must be listed last, or first, or be escaped, to avoid indicating a range like A-Z.) Your full regular expression will match between 2 and 20 of the listed characters. The backslash before the single quote is needed so the compiler knows you are not ending the string that defines the regular expression.
Some examples of things this will match:
....................
abaca af - .
AAfa- - ..
.z
And so on.
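As a rough check of the answers above, the same pattern can be tried in Python. This is a sketch under a few assumptions: re.IGNORECASE stands in for the /i flag, the names NAME_RE and is_valid are made up, and because the pattern sits in a double-quoted raw string the single quote needs no backslash here.

```python
import re

# Letters (any case), space, single quote, period, or hyphen; 2 to 20 of them.
NAME_RE = re.compile(r"^[A-Z '.-]{2,20}$", re.IGNORECASE)

def is_valid(text):
    """True when the whole string consists of 2 to 20 allowed characters."""
    return NAME_RE.search(text) is not None

print(is_valid("." * 20))            # True
print(is_valid("." * 21))            # False, too long
print(is_valid(".z"))                # True
print(is_valid("O'Neil-Smith Jr."))  # True
print(is_valid("A"))                 # False, too short
print(is_valid("abc1"))              # False, digits are not in the class
print(is_valid("ab\n"))              # True, $ also matches before a trailing newline
```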
I have a question about regular expressions. In my case, I have to make sure that
the first character is a letter, and from the second character onwards it can be any alphanumeric character plus some special characters.
Regards,
Anto
Try something like this:
^[a-zA-Z][a-zA-Z0-9.,$;]+$
Explanation:
^ Start of line/string.
[a-zA-Z] Character is in a-z or A-Z.
[a-zA-Z0-9.,$;] Alphanumeric or `.` or `,` or `$` or `;`.
+ One or more of the previous token (change to * for zero or more).
$ End of line/string.
The special characters I have chosen are just an example. Add your own special characters as appropriate for your needs. Note that a few characters need escaping inside a character class; otherwise they have a special meaning in the regular expression.
I am assuming that by "alphabet" you mean A-Z. Note that in some other countries there are also other characters that are considered letters.
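Here is a minimal Python sketch of the pattern explained above, using the same example set of special characters (. , $ ;). The names PATTERN and is_valid are hypothetical.

```python
import re

# First character: a letter. Remaining characters (one or more): letters, digits, or . , $ ;
PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9.,$;]+$")

def is_valid(text):
    """True when text starts with a letter and continues with allowed characters."""
    return PATTERN.search(text) is not None

print(is_valid("a1"))    # True
print(is_valid("ab$;"))  # True, $ is a literal character inside the class
print(is_valid("1abc"))  # False, first character is a digit
print(is_valid("a"))     # False, + requires at least one more character
print(is_valid("a b"))   # False, space is not in the class
```

Changing + to * would also accept a single letter on its own.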
More information
Character Classes
Repetition
Anchors
Try this :
/^[a-zA-Z]/
where
^ -> Starts with
[a-zA-Z] -> characters to match
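Note that this shorter pattern only constrains the first character. A small Python sketch (the function name starts_with_letter is made up) shows that anything may follow the leading letter:

```python
import re

def starts_with_letter(text):
    """True when the first character is an ASCII letter; the rest is not checked."""
    return re.search(r"^[a-zA-Z]", text) is not None

print(starts_with_letter("s12353467457458"))  # True
print(starts_with_letter("a !!"))             # True, later characters are unrestricted
print(starts_with_letter("1abc"))             # False
print(starts_with_letter(""))                 # False
```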
I think the simplest answer is to pick and match only the first character with regex.
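String str = "s12353467457458";
// matches() must match the whole input, so test only the first character.
// The isEmpty() guard avoids an exception from charAt(0) on an empty string.
if (!str.isEmpty() && String.valueOf(str.charAt(0)).matches("[a-zA-Z]")) {
    System.out.println("Valid");
}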