Ignore specific characters in regex - regex

I have this method to check if a string contains a special character, but I don't want it to flag specific characters such as + or -. How would I go about doing this?
public boolean containsSpecialCharacters(String teamName) {
    // flag any character that is not a letter, digit or space
    Pattern p = Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(teamName);
    boolean b = m.find();
    if (b) {
        return true;
    }
    return false;
}

You can try this:
[^\w +-]
REGEX EXPLANATION
[^\w +-]
Match a single character NOT present in the list below «[^\w +-]»
A word character (letters, digits, and underscores) «\w»
The character “ ” « »
The character “+” «+»
The character “-” «-»
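To see what `[^\w +-]` changes, here is a small Java sketch that runs the question's original class and the suggested one side by side. The class name `SpecialCharCheck` and the sample team names are made up for illustration. One thing to be aware of: in Java `\w` is `[a-zA-Z_0-9]` by default, so this version also stops flagging underscores, which the original `[^a-z0-9 ]` treated as special.

```java
import java.util.regex.Pattern;

// Illustrative comparison of the question's class and the [^\w +-] suggestion.
class SpecialCharCheck {
    // Original: anything that is not a letter, digit or space is "special".
    static final Pattern ORIGINAL = Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE);
    // Suggested: \w covers letters, digits and '_'; ' ', '+' and '-' are added.
    static final Pattern IGNORE_PLUS_MINUS = Pattern.compile("[^\\w +-]");

    static boolean containsSpecial(Pattern p, String teamName) {
        // find() succeeds as soon as one character outside the allowed set appears
        return p.matcher(teamName).find();
    }

    public static void main(String[] args) {
        String[] names = {"Team A+", "Team-B", "team_c", "Team #1"};
        for (String name : names) {
            System.out.printf("%-8s original=%b suggested=%b%n", name,
                    containsSpecial(ORIGINAL, name),
                    containsSpecial(IGNORE_PLUS_MINUS, name));
        }
    }
}
```

If underscores should still count as special, spell the class out explicitly, as `[^a-z0-9 +-]` does.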

You can use the following: simply add these characters inside your negated character class.
Within a character class [], you can place a hyphen (-) as the first or last character to match it literally. If you place the hyphen anywhere else, you need to escape it (\-) for it to be matched.
Pattern p = Pattern.compile("(?i)[^a-z0-9 +-]");
Regular expression:
(?i) # set flags for this block (case-insensitive)
[^a-z0-9 +-] # any character except: 'a' to 'z', '0' to '9', ' ', '+', '-'
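A minimal sketch of the whole method rewritten around that class, assuming the allowed set is exactly letters, digits, space, `+` and `-`. The class name `TeamNameValidator` and the second, escaped-hyphen pattern are illustrative; the second pattern is only there to show that an escaped `\-` in the middle of the class behaves the same as a bare `-` at the end. Returning `find()` directly replaces the `if (b) return true; return false;` from the question.

```java
import java.util.regex.Pattern;

// Illustrative rewrite of the question's method with (?i)[^a-z0-9 +-].
class TeamNameValidator {
    // Hyphen placed last, so it is a literal '-' rather than a range operator.
    private static final Pattern SPECIAL = Pattern.compile("(?i)[^a-z0-9 +-]");
    // Same set, with the hyphen escaped in the middle of the class instead.
    private static final Pattern SPECIAL_ESCAPED = Pattern.compile("(?i)[^a-z0-9 \\-+]");

    static boolean containsSpecialCharacters(String teamName) {
        // true if any single character falls outside the allowed set
        return SPECIAL.matcher(teamName).find();
    }

    static boolean containsSpecialCharactersEscaped(String teamName) {
        return SPECIAL_ESCAPED.matcher(teamName).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Red-Sox", "A+ Team", "team_c", "Team #1"}) {
            System.out.println(name + " -> " + containsSpecialCharacters(name));
        }
    }
}
```

Compiling the `Pattern` once in a static field avoids recompiling it on every call.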

Related

Regex Pattern - Groovy

I need to create a regex with the following requirements:
starts with C, D, F, G, I, M or P
has at least one underscore (_)
e.g. C6352_3
I've tried the following:
@Pattern(regexp = '^(\C|\D|\F|\G|\I\|\M|\P)+\_*' , message = "error")
You may use:
/^[CDFGIMP][^_\s]*_\S*$/
Or, to only handle word chars (letters, digits and _),
/^[CDFGIMP]\w*_\w*$/
or a bit more efficient one with character class subtraction:
/^[CDFGIMP][\w&&[^_]]*_\w*$/
Details
^ - start of a string
[CDFGIMP] - any char listed in the character set
[^_\s]* - zero or more chars other than _ and whitespace
\w* - matches 0+ word chars: letters, digits or _ ([\w&&[^_]]* matches 0+ letters and digits only)
_ - an underscore
\S* - 0+ non-whitespace chars (or \w* will match any letters, digits or _)
$ - end of string (or better, \z to only match at the very end of the string).
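Groovy uses `java.util.regex` under the hood, so the three patterns can be tried directly from Java. This sketch (class name and sample strings are illustrative) uses `matches()`, which, like Groovy's `==~` operator, requires the whole string to match. With these samples, `P12-4_a` passes only the first pattern, since `[^_\s]*` allows `-` while `\w*` does not, and `G1_ 2` fails all three because of the space.

```java
import java.util.regex.Pattern;

// Illustrative check of the three suggested patterns (Groovy slashes dropped).
class PrefixUnderscoreCheck {
    static final Pattern NON_SPACE = Pattern.compile("^[CDFGIMP][^_\\s]*_\\S*$");
    static final Pattern WORD_ONLY = Pattern.compile("^[CDFGIMP]\\w*_\\w*$");
    // Character class subtraction: \w minus '_' (letters and digits only)
    static final Pattern SUBTRACTION = Pattern.compile("^[CDFGIMP][\\w&&[^_]]*_\\w*$");

    static boolean valid(Pattern p, String value) {
        // matches() must consume the entire string
        return p.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"C6352_3", "X6352_3", "C6352", "M_", "P12-4_a", "G1_ 2"};
        for (String s : samples) {
            System.out.printf("%-8s %b %b %b%n", s,
                    valid(NON_SPACE, s), valid(WORD_ONLY, s), valid(SUBTRACTION, s));
        }
    }
}
```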
You could skip regex, and make it readable:
boolean valid(String value) {
    (value?.take(1) in ['C', 'D', 'F', 'G', 'I', 'M', 'P']) && value?.contains('_')
}
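For comparison, a Java version of the same regex-free check. Treating `null` and empty strings as invalid is an assumption meant to mirror the `?.` safe navigation in the Groovy code, and the class name is illustrative. Unlike the regexes, this accepts whitespace or any other character after the first letter, e.g. `G1_ 2`.

```java
// Illustrative Java analogue of the Groovy non-regex check.
class PrefixUnderscoreNoRegex {
    private static final String ALLOWED_FIRST = "CDFGIMP";

    static boolean valid(String value) {
        // assumption: null or empty input is simply invalid
        if (value == null || value.isEmpty()) {
            return false;
        }
        // the first character must be one of the allowed letters...
        boolean goodStart = ALLOWED_FIRST.indexOf(value.charAt(0)) >= 0;
        // ...and the string must contain at least one underscore anywhere
        return goodStart && value.contains("_");
    }
}
```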

How can I replace the last word using Regex?

I have a String extension:
import Foundation

extension String {
    func replaceLastWordWithUsername(_ username: String) -> String {
        let pattern = "#*[A-Za-z0-9]*$"
        do {
            Log.info("Replacing", self, username) // app-specific logger
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRange is measured in UTF-16 code units, not Characters
            let range = NSRange(location: 0, length: self.utf16.count)
            return regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: username)
        } catch {
            return self
        }
    }
}
let oldString = "Hey jess"
let newString = oldString.replaceLastWordWithUsername("#jessica")
newString now equals "Hey #jessica #jessica". The expected result is "Hey #jessica".
I think it's because the * quantifier matches 0 or more times, as many times as possible.
This causes the pattern to also match the empty string at the end of the input in addition to the last word, resulting in two replacements.
As mentioned by @Code Different, if you use let pattern = "\\w+$" instead, it will only match if there is at least one character, eliminating the empty match.
"Word1 Word2"
       ^some characters and then end
            ^0 characters and then end
Use this regex:
(?<=\s)\S+$
Sample: https://regex101.com/r/kGnQEM/1
/(?<=\s)\S+$/g
Positive Lookbehind (?<=\s)
Assert that the Regex below matches
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S+ matches any non-whitespace character (equal to [^\r\n\t\f\v ])
Quantifier: + matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Just change your pattern:
let pattern = "\\w+$"
\w matches any word character, i.e. letters, digits and underscore ([A-Za-z0-9_] for ASCII text)
+ means one or more
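A Java sketch of the same effect, using the JDK's `String.replaceAll` instead of `NSRegularExpression`; the class and method names are illustrative. After the engine matches `jess`, it tries again at the end of the input, where `#*[A-Za-z0-9]*$` can match an empty string, so the replacement is inserted a second time. Both `\w+$` and `(?<=\s)\S+$` require at least one character, so they replace only once. `Matcher.quoteReplacement` keeps any `$` or `\` in the username literal.

```java
import java.util.regex.Matcher;

// Illustrative helper: replaceAll replaces every match, including empty ones.
class ReplaceLastWord {
    static String replaceAllMatches(String text, String regex, String username) {
        // quoteReplacement keeps any '$' or '\' in the username literal
        return text.replaceAll(regex, Matcher.quoteReplacement(username));
    }

    public static void main(String[] args) {
        String old = "Hey jess";
        // "jess" matches, then the empty string at the end matches again
        System.out.println(replaceAllMatches(old, "#*[A-Za-z0-9]*$", "#jessica")); // Hey #jessica#jessica
        // at least one character is required, so only "jess" is replaced
        System.out.println(replaceAllMatches(old, "\\w+$", "#jessica"));           // Hey #jessica
        System.out.println(replaceAllMatches(old, "(?<=\\s)\\S+$", "#jessica"));   // Hey #jessica
    }
}
```

One difference between the two fixes: `(?<=\s)\S+$` needs whitespace before the last word, so a single-word string such as `jess` is left unchanged, while `\w+$` replaces it.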

(Vim regex) Following by anything except bracket character

Test string:
best.string_a = true;
best.string_b + bad.string_c;
best.string_d ();
best.string_e );
I want to match the word that comes after '.' unless it is followed by '(' (possibly with spaces in between). My expression:
\.\@<=[_a-z]\+\(\s*[^(]\)\@=
I want:
string_a
string_b
string_c
string_e
But it doesn't work; the result is:
string_a
string_b
string_c
string_d
string_e
I am new to Vim regex and I don't know why :(
Make it this: \.\@<=\<[_a-z]\+\>\(\s*(\)\@!
This matches:
\.\@<= asserts that a dot comes before the match, followed by
\<[_a-z]\+\> a whole word containing only lowercase letters or '_'
\(\s*(\)\@! not followed by any amount of whitespace and then a '('
This would also work for your needs:
\.\zs[_a-z]\+\>\ze\s*[^( ]
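Vim's `\@<=`, `\@=` and `\@!` correspond to `(?<=...)`, `(?=...)` and `(?!...)` in Java, so both patterns can be replayed with `java.util.regex`; the `\<`/`\>` word anchors are approximated with `\b`, and the class name is illustrative. This also shows why the question's pattern returned `string_d`: in `string_d ();` the `\s*` can match zero spaces, and `[^(]` then matches the space itself, so the lookahead succeeds. The answer's negative lookahead fails for every split of the spaces, and the word boundaries stop `[_a-z]+` from backtracking to a shorter word such as `string_`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative translation of the Vim patterns to java.util.regex syntax.
class AfterDotNotCall {
    static final String TEXT = String.join("\n",
            "best.string_a = true;",
            "best.string_b + bad.string_c;",
            "best.string_d ();",
            "best.string_e );");

    // Question's pattern: [^(] may match the space before '(', so string_d slips through
    static final Pattern QUESTION = Pattern.compile("(?<=\\.)[_a-z]+(?=\\s*[^(])");
    // Answer's pattern: a whole word not followed by optional whitespace and '('
    static final Pattern ANSWER = Pattern.compile("(?<=\\.)\\b[_a-z]+\\b(?!\\s*\\()");

    static List<String> matches(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches(QUESTION, TEXT));
        System.out.println(matches(ANSWER, TEXT));
    }
}
```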

How to validate a string to have only certain letters by perl and regex

I am looking for a Perl regex which will validate a string containing only the letters ACGT. For example, "AACGGGTTA" should be valid while "AAYYGGTTA" should be invalid, since the second string has "YY", which is not one of the letters A, C, G, T. I have the following code, but it validates both of the above strings:
if($userinput =~/[A|C|G|T]/i)
{
    $validEntry = 1;
    print "Valid\n";
}
Thanks
Use a character class, and make sure you check the whole string by using the start of string token, \A, and end of string token, \z.
You should also use * or + to indicate how many characters you want to match -- * means "zero or more" and + means "one or more."
Thus, the regex below is saying "between the start and the end of the (case insensitive) string, there should be one or more of the following characters only: a, c, g, t"
if($userinput =~ /\A[acgt]+\z/i)
{
    $validEntry = 1;
    print "Valid\n";
}
Using the character-counting tr operator:
if( $userinput !~ tr/ACGT//c )
{
    $validEntry = 1;
    print "Valid\n";
}
tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in the characterset. Using !~ instead of =~ negates the result, so it will be true if there are no characters not in characterset or false if there are characters not in characterset.
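Java has no `tr` operator, but the counting idea behind `tr/ACGT//c` carries over: count the characters outside the set and accept only a count of zero. A sketch with illustrative names: like `tr` without extra flags, it is case-sensitive, and an empty string has nothing outside the set, so it passes here while `\A[acgt]+\z` would reject it.

```java
// Illustrative Java analogue of the tr/ACGT//c counting approach.
class AcgtCount {
    static long countOutside(String input, String allowed) {
        // chars() streams each UTF-16 code unit; keep only those not in the set
        return input.chars().filter(c -> allowed.indexOf(c) < 0).count();
    }

    static boolean valid(String input) {
        // valid when no character falls outside A, C, G, T
        return countOutside(input, "ACGT") == 0;
    }
}
```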
Your character class [A|C|G|T] contains |. Inside a character class, | does not stand for alternation; it only stands for itself. Therefore, the character class would include the | character, which is not what you want.
Your pattern is not anchored. The pattern /[ACGT]+/ would match any string that contains one or more of any of those characters. Instead, you need to anchor your pattern, so that only strings that contain just those characters from beginning to end are matched.
$ can also match just before a trailing newline. To avoid that, use \z to anchor at the very end. \A anchors at the beginning (it doesn't make a difference whether you use \A or ^ in this case, but using \A provides a nice symmetry).
So your check should be written:
if ($userinput =~ /\A [ACGT]+ \z/ix)
{
    $validEntry = 1;
    print "Valid\n";
}
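The three points above can be checked with `java.util.regex`, which treats `|` inside a character class, `$` before a final newline, and `\A`/`\z` the same way Perl does for these inputs. `find()` plays the role of Perl's `=~` (match anywhere); the class and field names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative comparison of the question's pattern with anchored versions.
class AcgtRegex {
    // Question's class: '|' is a literal member, and nothing is anchored
    static final Pattern QUESTION = Pattern.compile("[A|C|G|T]", Pattern.CASE_INSENSITIVE);
    // Anchored with ^ and $: $ also matches just before a final newline
    static final Pattern DOLLAR = Pattern.compile("^[ACGT]+$", Pattern.CASE_INSENSITIVE);
    // \A and \z pin the match to the very start and very end of the input
    static final Pattern ANCHORED = Pattern.compile("\\A[ACGT]+\\z", Pattern.CASE_INSENSITIVE);

    static boolean check(Pattern p, String input) {
        // find() succeeds if the pattern matches anywhere in the input
        return p.matcher(input).find();
    }
}
```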

Wrong return from Regex.IsMatch - Regular expression

I want to find a specific string in a text, surrounded by whitespace. For example, I want to receive the value true from:
Regex.IsMatch("I like ZaleK", "zalek",RegexOptions.IgnoreCase)
and value false from:
Regex.IsMatch("I likeZaleK", "zalek",RegexOptions.IgnoreCase)
Here is my code:
Regex.IsMatch(w_all_file, @"\b" + TB_string.Text.Trim() + @"\b", RegexOptions.IgnoreCase);
It does not work when the string I am looking for is followed by "-" in w_all_file.
For example, if w_all_file = "I like zalek_", the string "zalek" is not found, but if
w_all_file = "I like zalek-", the string "zalek" is found.
Any ideas why?
Thanks,
Zalek
The \b anchor doesn't treat an underscore as a word boundary, because _ is itself a word character. Inside a character class, \b means a backspace rather than a word boundary, so [\b_] would not work; an alternation does what was intended:
Regex.IsMatch(w_all_file, @"(?:\b|_)" + TB_string.Text.Trim() + @"(?:\b|_)", RegexOptions.IgnoreCase);
Is this what you need?
string input = "type your name";
string pattern = "your";
Regex.IsMatch(input, " " + pattern + " ");
\b matches at a word boundary, which is defined as the position between a character that is included in \w and one that is not. \w includes [a-zA-Z0-9_], so it matches underscores.
So basically, \b will match after the "k" in zalek- but not in zalek_.
It sounds like you want the match to also fail on zalek-, which you can do by using lookaround. Just replace the \b at the beginning with (?<![\w-]), and replace the \b at the end with (?![\w-]):
Regex.IsMatch(w_all_file, @"(?<![\w-])" + TB_string.Text.Trim() + @"(?![\w-])", RegexOptions.IgnoreCase);
Note that if you add more characters to the character class [\w-], you need to make sure that the "-" stays the very last character, or escape it with a backslash; otherwise it will be interpreted as a range of characters.
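The .NET calls don't translate one-to-one, so this sketch uses Java's `java.util.regex`, whose `\b`, `\w` and single-character lookbehind behave the same way for this ASCII input. `Pattern.quote` stands in for .NET's `Regex.Escape`; it is an addition not found in the answers above, so that characters such as `+` or `(` in the search text are matched literally. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative comparison of \b with the (?<![\w-]) / (?![\w-]) lookarounds.
class WholeWord {
    static boolean withWordBoundary(String text, String word) {
        String regex = "\\b" + Pattern.quote(word.trim()) + "\\b";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    static boolean withLookaround(String text, String word) {
        // '-' is last in [\w-], so it is a literal hyphen, not a range
        String regex = "(?<![\\w-])" + Pattern.quote(word.trim()) + "(?![\\w-])";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }
}
```

With these helpers, `"I like zalek-"` is found by `withWordBoundary` but not by `withLookaround`, while `"I like zalek_"` is rejected by both.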