Ignore the case in part of the search pattern in Vim - regex

In the following Vim search I would like to ignore the case of the first letter:
/[tlcp]omo
I'd like to know how the case can be ignored for only the first letter of the search pattern.
Vim has the following options to ignore the case in the search pattern:
:set ignorecase
:set smartcase [ignore case if no uppercase in search]
or use \c at any position in the search pattern:
/hello\c => [find hello and HELLO]
But all of these options ignore the case in the entire pattern, not in part.
One way to ignore the case of a single letter in the search pattern is to use a [] collection that lists both the lowercase and the uppercase form of each letter:
/[tTlLcCpP]omo
But, is there any way to ignore the case in a part of the search pattern without having to specify each and every upper and lower case character using regular expression?

In general, this isn't possible in Vim. The /\c and /\C regexp modifiers unfortunately turn the whole pattern into case (in-)sensitive matching, regardless of where they are placed. (Introducing a new set of modifiers that only work from that position onwards would in my opinion be the best solution.)
Most people usually get around this by using lower/uppercase collections for the insensitive parts, /like [tT][hH][iI][sS]/.
You could also go the opposite route and instead force certain characters to a case (using /\l for lowercase and /\u for uppercase), /\c\%(\l\l\l\l\&like\) this/.

My CmdlineSpecialEdits plugin has (among many others) a CTRL-G c mapping that converts a pattern in the search command-line in such a way that the alphabetic characters between \c...\C become case-insensitive matches while the rest remains case-sensitive. In other words, it converts the pattern as if \c and \C applied only to the atoms that follow them, and not to the entire pattern.
Example
/My \cfoo\C is \cbad!/
becomes
/My [fF][oO][oO] is [bB][aA][dD]!/
or alternatively
/\c\%(\u\&M\)\%(\l\&y\) foo\%(\l\{2}\&is\) bad!/
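The first of the two conversions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's actual implementation: the function name expand_case_spans is hypothetical, only ASCII letters are expanded, and other backslash escapes are copied through untouched.

```python
def expand_case_spans(pattern: str) -> str:
    """Rewrite letters between \\c and \\C as [xX] collections (hypothetical helper)."""
    out = []
    insensitive = False
    i = 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if two == r"\c":          # start of a case-insensitive span
            insensitive = True
            i += 2
        elif two == r"\C":        # back to case-sensitive matching
            insensitive = False
            i += 2
        elif pattern[i] == "\\":  # any other escape: copy it verbatim
            out.append(two)
            i += 2
        else:
            ch = pattern[i]
            if insensitive and ch.isascii() and ch.isalpha():
                out.append(f"[{ch.lower()}{ch.upper()}]")
            else:
                out.append(ch)
            i += 1
    return "".join(out)


converted = expand_case_spans(r"My \cfoo\C is \cbad!")
print(converted)
```

The result can then be pasted into a Vim search without any \c or \C modifier, so the unmarked parts stay case-sensitive.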

Related

regular expression match case sensitive off

I have a regular expression that finds two words in a line.
The problem is that it is case sensitive. I need to edit it so that it matches regardless of case.
regular expression
^(.*?(\bPain\b).*?(\bfever\b)[^$]*)$
You can use RegexOptions.IgnoreCase to turn on case-insensitive matching. This makes the entire pattern case insensitive. The same effect can be achieved with the (?i) inline option at the beginning of the pattern:
(?i)^(.*?(\bPain\b).*?(\bfever\b)[^$]*)$
You can use the inline flag to only set case insensitive mode to part of a pattern:
^(.*?(\b(?i:Pain)\b).*?(\b(?i:fever)\b)[^$]*)$
Or you can just match "pain" or "Pain" with
^(.*?(\b(?i:P)ain\b).*?(\bfever\b)[^$]*)$
Another alternative is using character classes [Pp], etc.
Note that you do not have to set a capturing group round the whole pattern, you will have access to it via rx.Match(str).Groups(0).Value.
^.*?(\b[pP]ain\b).*?(\b[Ff]ever\b)[^$]*$
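The answer above is about .NET, but the scoped inline flag works the same way in other engines. As a rough sketch, here is the same idea in Python (3.6 or later), whose re module also accepts (?i:...). The sample sentences are made up for illustration.

```python
import re

# Only "Pain" is case-insensitive; "fever" must be lowercase.
partial = re.compile(r"^(.*?(\b(?i:Pain)\b).*?(\bfever\b)[^$]*)$")

# The whole pattern is case-insensitive.
whole = re.compile(r"^(.*?(\bPain\b).*?(\bfever\b)[^$]*)$", re.IGNORECASE)

line_a = "Severe PAIN and fever since Monday"
line_b = "Severe pain and Fever since Monday"

match_a = partial.search(line_a)
print(match_a.group(2))               # the word matched by (?i:Pain)
print(partial.search(line_b))         # None: "Fever" is not lowercase
print(bool(whole.search(line_b)))     # the global flag accepts both words
```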
You can usually set a flag for that, depending on your language, or you can mess up your regex into a more ugly looking one using multiple character classes. [pP][aA][iI][nN] is essentially the word "pain" without it being case sensitive at all.
Well, if you're using VB.net, you can tell the regex object to ignore the case sensitivity when you create it
'Defines the pattern
Dim MyPattern As String = "BlaBla"
'Create a new instance of the regex class with the above pattern
'and the option to ignore the casing
Dim Regex As New Regex(MyPattern, RegexOptions.IgnoreCase)

Regex workaround for applications that ignore case

I am trying to build a regex match in an application that ignores case in its queries. Unlike, say, Notepad++, in which one can turn case matching off and on, this application (BitCurator) simply ignores case. This, needless to add, makes case-sensitive searching very difficult. For example, I want to search for the string (S) and do not want (s), which means something entirely different in this context.
\(S\)
in BitCurator returns both (s) and (S)!
Does anyone know of a workaround?
I suppose if the text string preceding (s) were consistent, I could omit those matches, but I am not sure that is the case (no pun intended). Thanks!
In regex flavors like C#, Java you can use:
`(?i)` to enable case insensitive match.
`(?-i)` to disable case insensitive match.
In PHP you can use:
`(?i)` to enable case insensitive match.
`(?-i)` to disable case insensitive match.
I can explain it using notepad++
If you have a document with:
Test
test
If you search with regex:
(?-i)Test ... then it will find the upper case word.
(?-i)test ... then it will find the lower case word.
No matter if you have match case enabled or not in the search window.
Most regex engines can be used the same way.
I just tried it out and it worked:
bulk_extractor -f '(?-i)test' -o ~/Desktop/output ~/Desktop/input.txt
The results are written into ~/Desktop/output/find.txt
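To illustrate the same workaround outside BitCurator, here is a small Python sketch. It assumes an engine that has case-insensitivity forced on (simulated here with re.IGNORECASE) and uses the scoped form (?-i:...), which Python accepts from version 3.6. The sample text is invented.

```python
import re

text = "See clause (S) for details; add item(s) as needed."

# Simulates the application: case is ignored for the whole query.
forced_insensitive = re.compile(r"\(S\)", re.IGNORECASE)

# Same forced flag, but the pattern switches it off for its own contents.
with_override = re.compile(r"(?-i:\(S\))", re.IGNORECASE)

all_hits = forced_insensitive.findall(text)
exact_hits = with_override.findall(text)
print(all_hits)
print(exact_hits)
```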

How to jump to first uncommented out statement in vim?

I'm using /print to search for my uncommented-out print statements as I want to comment them out. I know I could use search and replace to first remove all the comments, and then apply them, but I simply want to find the next uncommented out print statement, and I can't work out how to.
E.g. I have :
#print fooVal
#... do stuff
#print barF
#... more stuff
print gold # <-- I want to use vim to jump straight to this line
I want to match this so I don't have to cycle through all the print statements (even the commented-out ones) just to find the one print statement that is without #.
I've tried using :s/^\s+print and /^print but vim does not like it. Also, I looked here, but I could not find the info.
In your case there are no spaces before print, so use '*' (zero or more) instead of '+' (one or more).
This works for me: /^\s*print
A less convenient pattern that should also highlight uncommented lines containing print: /^[^#]*\s*print
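The difference between * and + is not specific to Vim. As a quick sketch, the same two patterns can be tried in Python on the lines from the question (note that Python writes + unescaped, while Vim's default "magic" mode needs \+):

```python
import re

lines = [
    "#print fooVal",
    "#... do stuff",
    "#print barF",
    "#... more stuff",
    "print gold",
]

star = re.compile(r"\s*print")   # zero or more leading whitespace characters
plus = re.compile(r"\s+print")   # at least one leading whitespace character

# re.match anchors at the start of each line, like ^ in the Vim pattern.
star_hits = [n for n, line in enumerate(lines, 1) if star.match(line)]
plus_hits = [n for n, line in enumerate(lines, 1) if plus.match(line)]

print(star_hits)   # line numbers of uncommented print statements
print(plus_hits)   # empty: no print line here is indented
```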
In your case @Taky's solution is the best, I think, but I noticed some comments on Vim's regexp. I've been studying this over the last few days, so perhaps it's a good idea to write it down.
In vimworld the use of e.g. *, ^ and $ as regexp special characters is called "magic" (set magic is default). As default + is not a regexp character and has to be used as \+ to mean "regexp +". However with small changes it can be "magic" too, read this: vim help - search patterns (the same as in vim editor, but as HTML, and with a good search function at the top).
See also vim help - pattern
Here is a little short guide (rules are overruling down the list):
settings:
:set ignorecase to ignore case in search and replace,
:set smartcase for ignoring case as long as no upper case letter is used (ignorecase must be on for it to work).
:set magic to be able to use some characters as regexp special characters, e.g. *, without having to precede them with \. This is default (I think).
Rules in search pattern (overrules settings)
\c ignore case, \C case sensitive
Use as e.g. /\cxxx for "ignore case", /\Cxxx for case sensitive (\c is the same as flag i in replace syntax, as in s/xxx/yyy/i).
\m use "magic" (same as setting magic), \M no "magic".
Here is the interesting part, to use + in vim patterns:
\v described as "very magic", that is what we all are used to when using regexp I think.
\V "very nomagic", ALL is literal, only \ has a special meaning.
Now, + can be used directly as in /\v\s+print (but for you it's better to use * in that particular case, since + won't find a "print" that starts the line).
E.g. also { expressions must have \ if \v isn't used.
Tip: do these mappings in .vimrc to always use \v in search patterns:
" To get 'normal' behavior for regexps (use "\V" to avoid)
nnoremap / /\v
vnoremap / /\v
(And it's very easy to just "backspace" away the \v when it's unwanted).

Regex for a specific number or specific word

For a security question on a form, I want the user to enter either 4 or four, or any variation of the latter.
Right now I have this regex /\b4|four\b/gi that is a variation of one I've found on this site. The problem is that the user can enter 458 or something. So, can somebody help me out?
So you should be using a case-insensitive comparison for this. Some (perhaps most) regex flavors will support the pattern (?i) to denote case insensitivity.
^(?i)(?:4|four)$
But if this is JavaScript then you can use a syntax more like what you started with...
/^(?:4|four)$/i
The /i is for case insensitivity in this case. But I removed /g since it's for global matching and wouldn't be needed here.
Notice that I also put 4|four inside a (?:non capturing) group. This is more efficient than using a traditional (capturing) group when you don't need to do anything with the captured value.
Then the ^ and $ anchors surrounding everything will ensure you have no extra leading or following characters.
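As a sketch of why both the group and the anchors matter, here is the same check in Python, with re.IGNORECASE standing in for the /i flag. The variable names are illustrative only.

```python
import re

# Anchors apply to the whole alternation because of the group.
grouped = re.compile(r"^(?:4|four)$", re.IGNORECASE)

# Without the group, this reads as "starts with 4" OR "ends with four".
ungrouped = re.compile(r"^4|four$", re.IGNORECASE)

candidates = ["4", "four", "FOUR", "458", "twenty-four"]
grouped_ok = [s for s in candidates if grouped.search(s)]
ungrouped_ok = [s for s in candidates if ungrouped.search(s)]

print(grouped_ok)
print(ungrouped_ok)
```

The ungrouped form accepts 458 and twenty-four, which is exactly the problem described in the question.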
Try:
^(4|four)$
which will match "4" and "four". Depending on the programming language that you use there might be a case insensitive option like C#'s RegexOptions.IgnoreCase
I'm not sure what you mean by "any variation of the latter", but:
/^(4|four)$/i
will match the entire string being either 4 or four. The ^ matches only at the start and $ only at the end. /i means case insensitive (so FOUR would be accepted as well). Some languages don't take flags like that (in them, you'll have to check the docs on how to do an insensitive match). If you can't use a case-insensitive match, you can do this instead:
/^(4|[fF][oO][uU][rR])$/
Of course, whatever language you're working in probably also has equality comparisons. So you could just do (for example)
if (str == "4" || str == "four")

Case sensitive and insensitive in the same pattern

Thanks for the help with my previous homework question, Regex to match tags like <A>, <BB>, <CCC> but not <ABC>. Now I have another homework question.
I need to match tags like <LOL>, <LOLOLOL> (3 uppercase letters, with repeatable last two letters), but not <lol> (need to be uppercase).
Using the technique from the previous homework, I tried <[A-Z]([A-Z][A-Z])\1*>. This works, except there's an additional catch: the repeating part can be in mixed case!
So I need to also match <LOLolol>, <LOLOLOlol>, because it's 3 uppercase letters, with repeatable last two letters in mixed case. I know you can make a pattern case-insensitive with /i, and that will let me match <LOLolol> with the regex I have, but it will also now match <lololol>, because the check for the first 3 letters is also case-insensitive.
So how do I do this? How can I check the first 3 letters case sensitively, and then the rest of the letters case-insensitively? Is this possible with regex?
Yes! You can in fact do this in some flavors, using what is called an embedded modifier. This puts the modifier in the pattern, and you can essentially select which parts of the pattern the modifier applies to.
The embedded modifier for case insensitivity is (?i), so the pattern you want in this case is:
<[A-Z]([A-Z]{2})(?i:\1*)>
References
regular-expressions.info/Modifiers
Specifying Modes Inside The Regular Expression
Instead of /regex/i, you can also do /(?i)regex/
Turning Modes On and Off for Only Part of The Regular Expression
You can also do /first(?i)second(?-i)third/
Modifier Spans
You can also do /first(?i:second)third/
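Python's re module is one flavor that supports modifier spans (from version 3.6), so the pattern from this answer can be tried there as a quick sketch. The tag list is taken from the question; the variable names are illustrative.

```python
import re

# First three letters are case-sensitive; the repeated backreference is not.
tag = re.compile(r"<[A-Z]([A-Z]{2})(?i:\1*)>")

samples = ["<LOL>", "<LOLOLOL>", "<LOLolol>", "<LOLOLOlol>", "<lol>", "<lololol>"]
accepted = [s for s in samples if tag.fullmatch(s)]
rejected = [s for s in samples if not tag.fullmatch(s)]

print(accepted)
print(rejected)
```

Note that Python accepts the span form (?i:...) anywhere, but recent versions reject a bare (?i) in the middle of a pattern, so the /first(?i)second(?-i)third/ style does not carry over to that flavor.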