How to match multiple characters in a regular expression?

I have the string
7,456.23%
where I would like to use a regular expression to match BOTH the comma (,) and percent (%) characters and remove them, so that the result is
7456.23
I can figure out how to match one character or the other, but not both.

Simply use a character class (also called a character set).
With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters.
Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae].
System.out.println("7,456.23%".replaceAll("[,%]",""));
Or use alternation (the | operator):
System.out.println("7,456.23%".replaceAll(",|%",""));

Related

Regular Expression for not allowing two consecutive special characters

What I am trying to do is to disallow two consecutive special characters like &* or *$ or &&, while still allowing single special characters between strings, as in Hello%Mr&.
What I have tried so far:
^(([\%\/\\\&\?\,\'\;\:\!\-])\2?(?!\2))+$
^(?!.*[\%\/\\\&\?\,\'\;\:\!\-]{2}).*$
The idea is to use a negative lookahead ((?!)) to verify that nowhere in the string (.*) are there two consecutive "special" characters ([...]{2}). Afterwards, you just match the entire string (.*).
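As a rough illustration of the lookahead approach, here is a minimal Java sketch. The class name ConsecutiveSpecials and the method isValid are made up for this example; the character set is the one from the pattern above, so note that * and $ are not in it.

```java
import java.util.regex.Pattern;

final class ConsecutiveSpecials {
    // Same "special" set as the pattern above. In a Java string literal the
    // regex backslash is written as four backslashes; the trailing - is literal.
    private static final Pattern NO_DOUBLE_SPECIAL =
            Pattern.compile("^(?!.*[%/\\\\&?,';:!-]{2}).*$");

    // Hypothetical helper: true when no two special characters are adjacent.
    static boolean isValid(String input) {
        return NO_DOUBLE_SPECIAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Hello%Mr&")); // true: specials are never adjacent
        System.out.println(isValid("Hello%&Mr")); // false: "%&" is two in a row
        System.out.println(isValid("Hello&&"));   // false: "&&" is two in a row
    }
}
```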
You can use this kind of pattern:
\A\W?(?>\w+\W)*\w*\z
or
\A[%/\\&?,';:!-]?(?>[^%/\\&?,';:!-]+[%/\\&?,';:!-])*[^%/\\&?,';:!-]*\z
or
\A[^\p{L}\p{N}\s]?(?>[\p{L}\p{N}\s]+[^\p{L}\p{N}\s])*[\p{L}\p{N}\s]*\z
or
\A[^a-zA-Z0-9 ]?(?>[a-zA-Z0-9 ]+...
depending on what you call a "special character".
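A small Java sketch of the first variant, where "special" simply means any non-word character (\W), might look like the following. The class and method names are hypothetical, and this reading also treats a space as special.

```java
import java.util.regex.Pattern;

final class SingleSeparators {
    // First pattern from the list above: an optional leading special, then
    // runs of word characters each followed by exactly one special.
    private static final Pattern SINGLE_SPECIALS =
            Pattern.compile("\\A\\W?(?>\\w+\\W)*\\w*\\z");

    // Hypothetical helper: true when no two non-word characters are adjacent.
    static boolean isValid(String input) {
        return SINGLE_SPECIALS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Hello%Mr&")); // true
        System.out.println(isValid("&Hello"));    // true: one leading special is allowed
        System.out.println(isValid("Hello%%Mr")); // false
        System.out.println(isValid("&&Hello"));   // false
    }
}
```

The atomic group (?>...) keeps the engine from retrying different splits of the same run, which helps failing inputs get rejected quickly.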

Regular Expression for an alphanumeric after a text

This is my regular expression
(\b(serial|sheet))+(\s(number|code|no))+?\b
For the input :
Serial no
sheet no
Sheet Number
The requirement is to parse text that contains:
Serial no : 2424ABC
Sheet No 5 (Without colon)
Sheet No : 5
Serial No = 5335ABC
How do I escape an assignment character (if present) and parse the alphanumeric value that follows?
This should work:
(\b(serial|sheet))+(\s(number|code|no))+?\b\s*[:=#~– ]*(.*)
You can try it here : https://regex101.com/r/rO2cX1/1
To escape an assignment character, use \=. Note that = is not a metacharacter on its own, so a plain = also works.
To parse the alphanumeric characters, use [a-zA-Z0-9]* or simply \w* (which also allows underscores).
If the = is optional, you could replace the \s in the regular expression with [=\s] to allow either a space or an equals sign. Perhaps better, and matching your example, try \s=?\s*.
If any of several characters might appear between the word and the number, then perhaps use \s[-=#~_]?\s*. Note that the - goes at the start; otherwise it will be interpreted as a range of characters. Namely, [a-f] means [abcdef], i.e. any of those six characters, whereas [-af] means any of those three characters.
Hence the regular expression becomes:
(\b(serial|sheet))+(\s[-=#~_]?\s*(number|code|no))+?\b
Try the following pattern:
(serial\s+no|sheet\s*no)(\s*\:\s*)([a-z0-9]+)
Demo.
You can add further cases to the first group of the pattern. I covered two cases, separated by |.
You can find the alphanumeric value in the last group of this pattern.
Please note that this pattern is meant to be used as a case-insensitive pattern.
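To tie the suggestions in this thread together, here is a hedged Java sketch. The pattern is a simplified combination of the ideas above rather than any one answer verbatim: it assumes the only separators are an optional : or = surrounded by optional whitespace, and the names LabelValueParser and extractValue are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabelValueParser {
    // Assumed pattern: label word, qualifier, optional ":" or "=", then the value.
    // Group 3 captures the alphanumeric value.
    private static final Pattern LABEL_VALUE = Pattern.compile(
            "\\b(serial|sheet)\\s+(number|code|no)\\b\\s*[:=]?\\s*(\\w+)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the value after the label, or null if absent.
    static String extractValue(String line) {
        Matcher m = LABEL_VALUE.matcher(line);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractValue("Serial no : 2424ABC")); // 2424ABC
        System.out.println(extractValue("Sheet No 5"));          // 5
        System.out.println(extractValue("Sheet No : 5"));        // 5
        System.out.println(extractValue("Serial No = 5335ABC")); // 5335ABC
        System.out.println(extractValue("Sheet no"));            // null: no value present
    }
}
```

Other separators such as # or ~ could be added to the [:=] class if the real input needs them.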

Regular expressions, can I exclude pairs of characters?

How do you exclude pairs of characters from a regular expression?
I am trying to get a regular expression that will have 5 alphanumeric characters followed by anything except "XX" and "AD", followed by XX.
So
D22D0ACXX
will match, but the following two will not match
D22D0ADXX
D22D0XXXX.
My first attempt was :
([A-Z0-9]{5}[^(?AD)|(?XX)]XX)
But this treats the character-class part [^(?AD)|(?XX)] as one character, so I end up with the last 8 characters, not all 9.
Can I exclude pairs of characters without getting into back references?
I need to capture the whole group, hence the outer parenthesis. The negative lookahead suggestions don't seem to do this.
Use negative lookahead:
([A-Z0-9]{5}(?!(AD|XX)XX).{4})
Don't treat it as a character class, instead, think of it as an alternation with a negative lookahead, e.g:
([A-Z0-9]{5}(?!(AD|XX)XX))
Then, if you need the tail, include it after the lookahead, e.g:
([A-Z0-9]{5}(?!(AD|XX)XX)[A-Z0-9]{4})
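A quick Java check of the lookahead pattern against the three sample strings could look like this. The class and method names are hypothetical. The second pattern is not from the answers: it is an assumed stricter variant that also forces the tail to end in XX, which the pattern above does not do by itself.

```java
import java.util.regex.Pattern;

final class ExcludePairs {
    // Pattern from the answer above: rejects only AD or XX followed by XX.
    private static final Pattern FROM_ANSWER =
            Pattern.compile("([A-Z0-9]{5}(?!(AD|XX)XX)[A-Z0-9]{4})");

    // Assumed stricter variant: any pair except AD or XX, then a literal XX.
    private static final Pattern STRICT =
            Pattern.compile("([A-Z0-9]{5}(?!AD|XX)[A-Z0-9]{2}XX)");

    static boolean matchesAnswer(String s) {
        return FROM_ANSWER.matcher(s).matches();
    }

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAnswer("D22D0ACXX")); // true
        System.out.println(matchesAnswer("D22D0ADXX")); // false
        System.out.println(matchesAnswer("D22D0XXXX")); // false
        System.out.println(matchesAnswer("D22D0ACYY")); // true: tail is not forced to be XX
        System.out.println(matchesStrict("D22D0ACYY")); // false
    }
}
```

Because the whole expression sits inside the outer parentheses, group(1) holds all 9 characters when a match succeeds.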

regex why aren't these two the same?

[\w+\.]{3}
and
\w+\.\w+\.\w+\.
The former matches "dra".
The latter matches "dragon.is.awesome".
What am I not understanding right about them?
Input text looks like
i know dragon.is.awesome but
i know dragon.is.awesome.because, he is awesome
i know dragon.sucks.because, he is not awesome
i know dragon.is.dead, someone killed him
So I need to match any combination of groupings that are of the pattern \w+.
Because the first one is a character class.
[\w+/\.]
matches either one \w, or one + or one / or one literal dot (.). If you want to shorten the latter pattern, use normal parentheses:
(\w+\.){3}
Note that within character classes, most meta-characters lose their meaning. So + and . and * (for example) can all be contained and matched without being escaped.
[...] is a character class. It matches one character. [\w+\.] matches one character which is either a "word" character (letter, number, or underscore), or a plus, or a dot. [\w+\.]{3} matches three such characters in a row.
[] is a character class, not a subpattern. [abc] Matches a single a, b or c.
You probably meant (\w+\.){3}, which does match the same as your second regex.
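The difference is easy to see by running each pattern over the same text. This is a small Java sketch; the helper firstMatch and the class name are invented for the example, and the input is a fragment of the sample text starting at the word "dragon".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClassVersusGroup {
    // Hypothetical helper: returns the first substring the regex finds, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "dragon.is.awesome.because";

        // Character class: three single characters, each a word char, + or dot.
        System.out.println(firstMatch("[\\w+\\.]{3}", text));           // dra

        // Group: the sequence "word characters then dot", repeated three times.
        System.out.println(firstMatch("(\\w+\\.){3}", text));           // dragon.is.awesome.

        // The long form from the question gives the same result as the group.
        System.out.println(firstMatch("\\w+\\.\\w+\\.\\w+\\.", text));  // dragon.is.awesome.
    }
}
```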

String negation using regular expressions

Is it possible to do string negation in regular expressions? I need to match all strings that do not contain the string "..". I know you can use ^[^\.]*$ to match all strings that do not contain "." but I need to match more than one character. I know I could simply match a string containing ".." and then negate the return value of the match to achieve the same result but I just wondered if it was possible.
You can use negative lookaheads:
^(?!.*\.\.).*$
That causes the expression to not match if it can find a sequence of two periods anywhere in the string.
^(?:(?!\.\.).)*$
will only match if there are no two consecutive dots anywhere in the string.
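Both patterns can be compared side by side with a short Java sketch like the one below. The class and method names are hypothetical, and the inputs are assumed to be single-line strings, since . does not match line terminators by default.

```java
import java.util.regex.Pattern;

final class NoDoubleDot {
    // One lookahead at the start scans the whole string for "..".
    private static final Pattern LOOKAHEAD_ONCE =
            Pattern.compile("^(?!.*\\.\\.).*$");

    // A lookahead before every character checks that ".." does not start there.
    private static final Pattern LOOKAHEAD_EACH =
            Pattern.compile("^(?:(?!\\.\\.).)*$");

    static boolean passesOnce(String s) {
        return LOOKAHEAD_ONCE.matcher(s).matches();
    }

    static boolean passesEach(String s) {
        return LOOKAHEAD_EACH.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a.b.c", "a..b", "..."}) {
            // Both columns agree for every input: true, then false, then false.
            System.out.println(s + " -> " + passesOnce(s) + " / " + passesEach(s));
        }
    }
}
```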