I've seen lots of examples of making an entire regular expression case-insensitive. What I'm wondering about is having just part of the expression be case-insensitive.
For example, let's say I have a string like this:
fooFOOfOoFoOBARBARbarbarbAr
What if I want to match all occurrences of "foo" regardless of case but I only want to match the upper-case "BAR"s?
The ideal solution would be something that works across regex flavors but I'm interested in hearing language-specific ones as well (Thanks Espo)
Edit
The link Espo provided was very helpful. There's a good example in there about turning modifiers on and off within the expression.
For my contrived example, I can do something like this:
(?i)foo(?-i)|BAR
which makes the match case-insensitive for just the foo portion of the match.
That seemed to work in most regex implementations except JavaScript, Python, and a few others (as Espo mentioned).
The big ones that I was wondering about (Perl, PHP, .NET) all support inline mode changes.
Perl lets you make part of your regular expression case-insensitive by using the (?i:) pattern modifier.
Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex, the modifier only applies to the part of the regex to the right of the modifier. You can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.
Not all regex flavors support this. JavaScript and Python apply all mode modifiers to the entire regular expression. They don't support the (?-ismx) syntax, since turning off an option is pointless when mode modifiers apply to the whole regular expression. All options are off by default.
You can quickly test how the regex flavor you're using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
Source
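The quick test quoted above can be turned into a small probe. What follows is only a Python sketch and not part of the quoted source; the helper name supports_inline_toggle is made up for illustration. Python's re rejects a bare (?-i), so the probe should report False there, and the scoped form (?i:te)st is used to show the same matching behaviour.

```python
import re

def supports_inline_toggle() -> bool:
    """Return True if this engine accepts (?-i) in the middle of a pattern."""
    try:
        re.compile(r"(?i)te(?-i)st")
    except re.error:
        return False
    return True

# Scoped equivalent of the quoted test pattern: only "te" ignores case.
scoped = re.compile(r"(?i:te)st")
candidates = ["test", "TEst", "teST", "TEST"]
matched = [s for s in candidates if scoped.fullmatch(s)]

print(supports_inline_toggle())
print(matched)
```

The same try/compile idea carries over to other engines, with that engine's own compile call and error type.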
It is true one can rely on inline modifiers as described in Turning Modes On and Off for Only Part of The Regular Expression:
The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
However, a more widely supported feature is the (?i:...) inline modifier group (see Modifier Spans). The syntax is (?i:, then the pattern that you want to make case-insensitive, and then a ).
(?i:foo)|BAR
The reverse: If your pattern is compiled with a case insensitive option and you need to make a part of a regex case sensitive, you add - after ?: (?-i:...).
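As a rough illustration of both directions, here is a Python sketch (it assumes Python 3.6 or later, where re accepts modifier groups). The variable names are just for the example.

```python
import re

text = "fooFOOfOoFoOBARBARbarbarbAr"

# (?i:...) makes only the enclosed part case-insensitive;
# BAR outside the group is still matched case-sensitively.
partial = re.findall(r"(?i:foo)|BAR", text)

# The reverse: the whole pattern is compiled case-insensitively,
# and (?-i:...) switches case sensitivity back on for BAR only.
reverse = re.findall(r"foo|(?-i:BAR)", text, flags=re.IGNORECASE)

print(partial)
print(reverse)
```

Both calls should return the four "foo" variants followed by the two upper-case "BAR"s, and none of the lower- or mixed-case "bar"s.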
Example uses in various languages (wrapping the matches with angle brackets):
php - preg_replace("~(?i:foo)|BAR~", '<$0>', "fooFOOfOoFoOBARBARbarbarbAr") (demo)
python - re.sub(r'(?i:foo)|BAR', r'<\g<0>>', 'fooFOOfOoFoOBARBARbarbarbAr') (demo) (note Python re supports inline modifier groups since Python 3.6)
c# / vb.net / .net - Regex.Replace("fooFOOfOoFoOBARBARbarbarbAr", "(?i:foo)|BAR", "<$&>") (demo)
java - "fooFOOfOoFoOBARBARbarbarbAr".replaceAll("(?i:foo)|BAR", "<$0>") (demo)
perl - $s =~ s/(?i:foo)|BAR/<$&>/g (demo)
ruby - "fooFOOfOoFoOBARBARbarbarbAr".gsub(/(?i:foo)|BAR/, '<\0>') (demo)
r - gsub("((?i:foo)|BAR)", "<\\1>", "fooFOOfOoFoOBARBARbarbarbAr", perl=TRUE) (demo)
swift - "fooFOOfOoFoOBARBARbarbarbAr".replacingOccurrences(of: "(?i:foo)|BAR", with: "<$0>", options: [.regularExpression])
go - (uses RE2) - regexp.MustCompile(`(?i:foo)|BAR`).ReplaceAllString( "fooFOOfOoFoOBARBARbarbarbAr", `<${0}>`) (demo)
Not supported in javascript, bash, sed, c++ std::regex, lua, tcl.
In these cases, you can put both letter variants into a character class (not a group, see Why is a character class faster than alternation?). Examples:
sed posix-ere - sed -E 's/[Ff][Oo][Oo]|BAR/<&>/g' file > outfile (demo)
grep posix-ere - grep -Eo '[Ff][Oo][Oo]|BAR' file (or if you are using GNU grep, you can still use the PCRE regex, grep -Po '(?i:foo)|BAR' file (demo))
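Writing the character classes by hand gets tedious for longer literals. A small helper can generate them; the sketch below is in Python, the function name per_letter_classes is hypothetical, and it only expands ASCII letters (anything else is escaped and matched literally).

```python
import re
import string

def per_letter_classes(literal: str) -> str:
    """Expand each ASCII letter into an [Xx] class; escape everything else."""
    parts = []
    for ch in literal:
        if ch in string.ascii_letters:
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# Build the portable pattern used in the sed/grep examples above.
pattern = per_letter_classes("foo") + "|BAR"
text = "fooFOOfOoFoOBARBARbarbarbAr"
matches = re.findall(pattern, text)

print(pattern)
print(matches)
```

The generated pattern uses nothing beyond character classes and alternation, so it can be pasted into tools that lack inline modifiers.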
What language are you using? A standard way to do this would be something like /([Ff][Oo]{2}|BAR)/ with case sensitivity on, but in Java, for example, there is a case sensitivity modifier (?i) which makes all characters to the right of it case insensitive and (?-i) which forces sensitivity. An example of that Java regex modifier can be found here.
Unfortunately, there is no common syntax for case-insensitive matching across regex flavors.
In .NET you can use the RegexOptions.IgnoreCase flag or the (?i) modifier.
You could use
(?:F|f)(?:O|o)(?:O|o)
The ?: inside the parentheses means the group is non-capturing in .NET; it is only there to group the terms of the | (or) alternation.
How can I make the following regex ignore case sensitivity? It should match all the correct characters but ignore whether they are lower or uppercase.
G[a-b].*
Assuming you want the whole regex to ignore case, you should look for the i flag. Nearly all regex engines support it:
/G[a-b].*/i
string.match("G[a-b].*", "i")
Check the documentation for your language/platform/tool to find how the matching modes are specified.
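For instance, in Python the whole-pattern flag is re.IGNORECASE (short form re.I). This is just one platform's spelling, shown as a sketch with made-up sample strings:

```python
import re

# Compile the pattern from the question with case-insensitive matching
# applied to the whole expression.
pattern = re.compile(r"G[a-b].*", re.IGNORECASE)

samples = ["Gab", "gB and more", "GC", "xga"]
# match() anchors at the start of the string, so "xga" is not a match.
results = {s: bool(pattern.match(s)) for s in samples}

print(results)
```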
If you want only part of the regex to be case insensitive (as my original answer presumed), then you have two options:
Use the (?i) and [optionally] (?-i) mode modifiers:
(?i)G[a-b](?-i).*
Put all the variations (i.e. lowercase and uppercase) in the regex - useful if mode modifiers are not supported:
[gG][a-bA-B].*
One last note: if you're dealing with Unicode characters besides ASCII, check whether or not your regex engine properly supports them.
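To make the Unicode caveat concrete, here is what Python's re does, as far as I know; other engines may behave differently, so treat this as one data point rather than a rule. Escapes are used for the non-ASCII letters to avoid encoding surprises.

```python
import re

# Simple one-to-one case pairs work for non-ASCII letters by default:
# U+00E9 (e with acute) matches U+00C9 (E with acute).
accented = bool(re.fullmatch("\u00e9", "\u00c9", re.IGNORECASE))

# With re.ASCII, case-insensitive matching is limited to ASCII letters.
accented_ascii_only = bool(
    re.fullmatch("\u00e9", "\u00c9", re.IGNORECASE | re.ASCII)
)

# Multi-character case mappings are not applied:
# U+00DF (sharp s) does not match "SS".
sharp_s = bool(re.fullmatch("stra\u00dfe", "STRASSE", re.IGNORECASE))

print(accented, accented_ascii_only, sharp_s)
```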
It depends on the implementation, but I would use
(?i)G[a-b].*
VARIATIONS:
(?i) case-insensitive mode ON
(?-i) case-insensitive mode OFF
Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?im) in the middle of the regex then the modifier only applies to the part of the regex to the right of the modifier. With these flavors, you can turn off modes by preceding them with a minus sign (?-i).
Description is from the page:
https://www.regular-expressions.info/modifiers.html
A regular expression to validate 'abc', ignoring case:
(?i)(abc)
The i flag is normally used for case insensitivity. You don't give a language here, but it'll probably be something like /G[ab].*/i or /(?i)G[ab].*/.
Just for the sake of completeness I wanted to add the solution for regular expressions in C++ with Unicode:
std::tr1::wregex pattern(szPattern, std::tr1::regex_constants::icase);
if (std::tr1::regex_match(szString, pattern))
{
...
}
JavaScript
If you want to make it case insensitive just add i at the end of regex:
'Test'.match(/[A-Z]/gi) //Returns ["T", "e", "s", "t"]
Without i
'Test'.match(/[A-Z]/g) //Returns ["T"]
In JavaScript you should pass the i flag to the RegExp constructor as stated in MDN:
const regex = new RegExp('(abc)', 'i');
regex.test('ABc'); // true
As I discovered from this similar post (ignorecase in AWK), on old versions of awk (such as on vanilla Mac OS X), you may need to use 'tolower($0) ~ /pattern/'.
IGNORECASE or (?i) or /pattern/i will either generate an error or return true for every line.
C#
using System.Text.RegularExpressions;
...
Regex.Match(
input: "Check This String",
pattern: "Regex Pattern",
options: RegexOptions.IgnoreCase)
specifically: options: RegexOptions.IgnoreCase
[gG][aAbB].* is probably the simplest solution if the pattern is not too complicated or long.
Addition to the already-accepted answers:
Grep usage:
Note that for grepping, it is simply a matter of adding the -i option. Example: grep -rni regular_expression searches for 'regular_expression' 'r'ecursively, case-'i'nsensitively, showing line 'n'umbers in the result.
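If grep is not available, the -n and -i behaviour can be approximated in a few lines of Python. This is only a sketch: the function name grep_ignore_case and the sample lines are invented, and it works on an in-memory list rather than walking directories as -r would.

```python
import re
from typing import Iterable, List, Tuple

def grep_ignore_case(pattern: str, lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line matching pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(n, line) for n, line in enumerate(lines, start=1) if rx.search(line)]

sample = ["Regular_Expression here", "nothing", "a REGULAR_EXPRESSION too"]
hits = grep_ignore_case("regular_expression", sample)

for number, line in hits:
    print(f"{number}:{line}")
```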
Also, here's a great tool for verifying regular expressions: https://regex101.com/
References:
man pages (man grep)
http://droptips.com/using-grep-and-ignoring-case-case-insensitive-grep
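In Kotlin, the Regex class has the constructor
Regex(pattern: String, option: RegexOption)
So to ignore case, use
option = RegexOption.IGNORE_CASE
(Plain Java has no Regex class; the equivalent there is Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).)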
Kotlin:
"G[a-b].*".toRegex(RegexOption.IGNORE_CASE)
You can also convert the string you are going to check to lower case before matching, and use only lower-case characters in your pattern.
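A minimal Python sketch of that approach follows. The helper name search_lowercased is hypothetical, and it assumes ASCII input so that lower-casing does not change the string length (which is what lets the match offsets be mapped back onto the original text).

```python
import re

def search_lowercased(pattern_lower: str, text: str):
    """Search a lower-cased copy of text; return the matching slice of the original."""
    m = re.search(pattern_lower, text.lower())
    if m is None:
        return None
    # Offsets are valid for the original only if lower() kept the length.
    return text[m.start():m.end()]

found = search_lowercased(r"g[a-b].*", "xx GB Rest")
missing = search_lowercased(r"g[a-b].*", "xx GC Rest")

print(found)
print(missing)
```

The trade-off is that the pattern can no longer distinguish case anywhere, so this only replaces a whole-pattern ignore-case flag, not a partial one.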
You can practice Regex In Visual Studio and Visual Studio Code using find/replace.
You need to select both Match Case and Regular Expressions for case-sensitive regular expressions. Otherwise [A-Z] won't work as expected.