Regex workaround for applications that ignore case - regex

I am trying to build a regex match in an application that ignores case in its queries. Unlike, say, Notepad++, in which one can turn case matching on and off, this application (BitCurator) simply ignores case. This, needless to say, makes case-sensitive searching very difficult. For example, I want to search for the string (S) and do not want (s), which means something entirely different in this context.
\(S\)
in BitCurator returns both (s) and (S)!
Does anyone know of a workaround?
I suppose if the text string preceding (s) were consistent, I could omit those matches, but I am not sure that is the case (no pun intended). Thanks!

In the regex flavors of languages such as C# and Java, you can use:
`(?i)` to enable case insensitive match.
`(?-i)` to disable case insensitive match.
In PHP you can use:
`(?i)` to enable case insensitive match.
`(?-i)` to disable case insensitive match.
I can explain it using Notepad++.
If you have a document with:
Test
test
If you search with regex:
(?-i)Test ... then it will find the upper case word.
(?-i)test ... then it will find the lower case word.
This works whether or not Match case is enabled in the search window.
Most regex engines can be used the same way.
I just tried it out and it worked:
bulk_extractor -f '(?-i)test' -o ~/Desktop/output ~/Desktop/input.txt
The results are written into ~/Desktop/output/find.txt
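The same idea can be sketched in Python, whose `re` module supports a scoped `(?-i:...)` group (Python 3.6 and later). This is only an illustration of the technique, not of BitCurator itself: the `forced` flag stands in for an application that always searches case-insensitively, and the sample text is made up.

```python
import re

# Hypothetical sample text; "(s)" and "(S)" mean different things here.
text = "item(s) listed under (S) heading"

# Stand-in for an application that always ignores case.
forced = re.IGNORECASE

# Without the inline modifier, both variants are found.
plain = re.findall(r"\(S\)", text, forced)

# (?-i:...) switches case-insensitivity off inside the group,
# even though the IGNORECASE flag was passed in from outside.
scoped = re.findall(r"(?-i:\(S\))", text, forced)

print(plain)
print(scoped)
```

Whether a given tool honors the inline modifier depends on the regex engine it embeds, so it is worth testing on a small file first.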

Related

Correct syntax for REGEXEXTRACT, where condition comes from certain cell and case insensitive parameter

I have a REGEXMATCH expression, where the regex comes from a certain cell. Like this: =REGEXMATCH(A2, $B$2). In B2, the regex looks like .*abc.*|.*def.*.
I tried to adjust regex with (?i).*abc.*|.*def.*, but it seems like only abc becomes case insensitive.
Should I add (?i) to each part of the alternation? I have many of them, so that would be tedious. Or is it possible to add (?i) once so that it applies to all parts?
use:
=REGEXMATCH(A2; "(?i).*abc.*|.*def.*")
If you need your REGEXEXTRACT to match regardless of case, and you are looking for the shortest path (fewest edits), you could consider converting the string you are testing to lowercase inside your formula, i.e.:
=REGEXEXTRACT(LOWER(A2), $B$2)
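This way, your regex always operates on an all-lowercase string, which makes the match case insensitive as long as the pattern in B2 is itself written in lowercase. Note that any extracted text will also come back in lowercase.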
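Both suggestions can be checked outside the spreadsheet. The sketch below uses Python's `re` module, which is a different engine from the one Google Sheets uses, so it only illustrates the general behavior: a leading `(?i)` covers every alternative, and lowercasing the input gives the same result for a lowercase pattern. The sample strings are invented.

```python
import re

samples = ["xxABCxx", "xxDEFxx", "nothing here"]

# A single leading (?i) applies to the whole pattern, including
# every branch of the alternation.
flagged = r"(?i).*abc.*|.*def.*"
with_flag = [re.fullmatch(flagged, s) is not None for s in samples]

# Alternative: keep the pattern lowercase and lowercase the input,
# which mirrors wrapping the cell in LOWER().
plain = r".*abc.*|.*def.*"
with_lower = [re.fullmatch(plain, s.lower()) is not None for s in samples]

print(with_flag)
print(with_lower)
```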

Ignore the case in part of the search pattern in Vim

In the following Vim search I would like to ignore the case of the first letter:
/[tlcp]omo
I'd like to know how the case can be ignored for only the first letter of the search pattern.
Vim has the following options to ignore the case in the search pattern:
:set ignorecase
:set smartcase [ignore case if no uppercase in search]
or use \c at any position in the search pattern:
/hello\c => [find hello and HELLO]
But all of these options ignore the case in the entire pattern, not in part.
One option to ignore the case of a single letter in the search pattern is, using the [] collection of regular expression, to specifically capitalize each letter:
/[tTlLcCpP]omo
But, is there any way to ignore the case in a part of the search pattern without having to specify each and every upper and lower case character using regular expression?
In general, this isn't possible in Vim. The /\c and /\C regexp modifiers unfortunately turn the whole pattern into case (in-)sensitive matching, regardless of where they are placed. (Introducing a new set of modifiers that only work from that position onwards would in my opinion be the best solution.)
Most people usually get around this by using lower/uppercase collections for the insensitive parts, /like [tT][hH][iI][sS]/.
You could also go the opposite route and instead force certain characters to a case (using /\l for lowercase and /\u for uppercase), /\c\%(\l\l\l\l\&like\) this/.
My CmdlineSpecialEdits plugin has (among many others) a CTRL-G c mapping that converts a pattern in the search command-line in such a way that those alphabetic characters between \c...\C become case-insensitive matches while the rest remains case-sensitive. In other words, it converts the pattern as if \c and \C applied only to the following atoms, and not to the entire pattern.
Example
/My \cfoo\C is \cbad!/
becomes
/My [fF][oO][oO] is [bB][aA][dD]!/
or alternatively
/\c\%(\u\&M\)\%(\l\&y\) foo\%(\l\{2}\&is\) bad!/
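The first conversion (letters between \c and \C turned into two-letter collections) can be sketched in a few lines of Python. This is not the plugin's implementation; `expand_case_markers` is a hypothetical helper that only handles plain letters and passes other backslash escapes through untouched. It does not understand existing `[...]` collections.

```python
def expand_case_markers(pattern: str) -> str:
    """Rewrite letters between \\c and \\C as [xX] collections."""
    out = []
    insensitive = False
    i = 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if two == r"\c":
            insensitive = True      # start of a case-insensitive region
            i += 2
        elif two == r"\C":
            insensitive = False     # back to case-sensitive matching
            i += 2
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(two)         # keep any other escape as-is
            i += 2
        else:
            ch = pattern[i]
            if insensitive and ch.isalpha():
                out.append(f"[{ch.lower()}{ch.upper()}]")
            else:
                out.append(ch)
            i += 1
    return "".join(out)


print(expand_case_markers(r"My \cfoo\C is \cbad!"))
```

The result can then be pasted into a Vim search without \c or \C, so only the expanded letters match in either case.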

regular expression match case sensitive off

I have a regular expression that finds two words in a line.
The problem is that it is case sensitive. I need to edit it so that it matches both the case.
regular expression:
^(.*?(\bPain\b).*?(\bfever\b)[^$]*)$
You can use RegexOptions.IgnoreCase to set case-insensitive matching mode. This makes the entire pattern case insensitive. The same effect can be achieved with the (?i) inline option at the beginning of the pattern:
(?i)^(.*?(\bPain\b).*?(\bfever\b)[^$]*)$
You can use the inline flag to only set case insensitive mode to part of a pattern:
^(.*?(\b(?i:Pain)\b).*?(\b(?i:fever)\b)[^$]*)$
Or you can just match "pain" or "Pain" with
^(.*?(\b(?i:P)ain\b).*?(\bfever\b)[^$]*)$
Another alternative is using character classes [Pp], etc.
Note that you do not have to set a capturing group round the whole pattern, you will have access to it via rx.Match(str).Groups(0).Value.
^.*?(\b[pP]ain\b).*?(\b[Ff]ever\b)[^$]*$
You can usually set a flag for that, depending on your language, or you can mess up your regex into a more ugly looking one using multiple character classes. [pP][aA][iI][nN] is essentially the word "pain" without it being case sensitive at all.
Well, if you're using VB.net, you can tell the regex object to ignore the case sensitivity when you create it
'Defines the pattern
Dim MyPattern As String = "BlaBla"
'Create a new instance of the regex class with the above pattern
'and the option to ignore the casing
Dim Regex As New Regex(MyPattern, RegexOptions.IgnoreCase)
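For comparison, here is a minimal Python sketch of the scoped inline flag, where only one word is matched case-insensitively. It assumes Python 3.6 or later for the `(?i:...)` group syntax; the `matches` helper and the sample lines are illustrative only, and the pattern is a simplified form of the one in the question.

```python
import re

# Only "pain" ignores case; "fever" must be lowercase.
pattern = re.compile(r"^.*?\b(?i:pain)\b.*?\bfever\b.*$")


def matches(line: str) -> bool:
    """Return True if the line contains pain (any case) followed by fever."""
    return pattern.search(line) is not None


print(matches("PAIN followed by fever"))
print(matches("Pain followed by FEVER"))
```

Passing `re.IGNORECASE` to `re.compile` instead would be the counterpart of `RegexOptions.IgnoreCase` and make the whole pattern case insensitive.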

Regex for a specific number or specific word

For a security question on a form, I want the user to enter either 4 or four, or any variation of the latter.
Right now I have this regex /\b4|four\b/gi that is a variation of one I've found on this site. The problem is that the user can enter 458 or something. So, can somebody help me out?
So you should be using a case-insensitive comparison for this. Some (perhaps most) regex flavors will support the pattern (?i) to denote case insensitivity.
^(?i)(?:4|four)$
But if this is JavaScript then you can use a syntax more like what you started with...
/^(?:4|four)$/i
The /i is for case insensitivity in this case. But I removed /g since it's for global matching and wouldn't be needed here.
Notice that I also put 4|four inside a (?:non capturing) group. This is more efficient than using a traditional (capturing) group when you don't need to do anything with the captured value.
Then the ^ and $ anchors surrounding everything will ensure you have no extra leading or following characters.
Try:
^(4|four)$
which will match "4" and "four". Depending on the programming language that you use there might be a case insensitive option like C#'s RegexOptions.IgnoreCase
I'm not sure what you mean by "any variation of the latter", but:
/^(4|four)$/i
will match the entire string being either 4 or four. The ^ matches only at the start and $ only at the end. /i means case insensitive (so FOUR would be accepted as well). Some languages don't take flags like that (in them, you'll have to check the docs on how to do an insensitive match). If you can't use a case-insensitive match, you can do this instead:
/^(4|[fF][oO][uU][rR])$/
Of course, whatever language you're working in probably also has equality comparisons. So you could just do (for example)
if (str == "4" || str == "four")
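The difference between the original unanchored pattern and a whole-string match is easy to see in a short Python sketch. The names `is_valid` and `loose` are made up for this illustration; `fullmatch` plays the role of the `^...$` anchors.

```python
import re

# Whole-string match, any casing of "four".
ANSWER = re.compile(r"4|four", re.IGNORECASE)


def is_valid(answer: str) -> bool:
    """Accept only 4 or four (case-insensitive), nothing before or after."""
    return ANSWER.fullmatch(answer) is not None


# The pattern from the question: it only needs to find a match somewhere.
loose = re.compile(r"\b4|four\b", re.IGNORECASE)

print(is_valid("FOUR"))
print(is_valid("458"))
print(loose.search("458") is not None)
```

The last line shows why 458 slipped through originally: `\b4` is satisfied by the leading 4 alone.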

Is it possible to transform to lowercase using Eclipse's regex search and replace?

I've tried '\L' but it doesn't seem to be part of its grammar, and its help makes no mention of any available transforms.
I'm aware that I can and know how to do this from the command line but...
Is it possible to do case transforms using Eclipse's regex search and replace?
You can do it in two steps. First you insert the alphabet in lowercase. Then you only keep the right letter:
reg expr 1: ([A-Z])
replacement 1: abcdefghijklmnopqrstuvwxyz$1
reg expr 2: (a)bcdefghijklmnopqrstuvwxyzA|a(b)cdefghijklmnopqrstuvwxyzB|ab(c)defghijklmnopqrstuvwxyzC|abc(d)efghijklmnopqrstuvwxyzD|abcd(e)fghijklmnopqrstuvwxyzE|abcde(f)ghijklmnopqrstuvwxyzF|abcdef(g)hijklmnopqrstuvwxyzG|abcdefg(h)ijklmnopqrstuvwxyzH|abcdefgh(i)jklmnopqrstuvwxyzI|abcdefghi(j)klmnopqrstuvwxyzJ|abcdefghij(k)lmnopqrstuvwxyzK|abcdefghijk(l)mnopqrstuvwxyzL|abcdefghijkl(m)nopqrstuvwxyzM|abcdefghijklm(n)opqrstuvwxyzN|abcdefghijklmn(o)pqrstuvwxyzO|abcdefghijklmno(p)qrstuvwxyzP|abcdefghijklmnop(q)rstuvwxyzQ|abcdefghijklmnopq(r)stuvwxyzR|abcdefghijklmnopqr(s)tuvwxyzS|abcdefghijklmnopqrs(t)uvwxyzT|abcdefghijklmnopqrst(u)vwxyzU|abcdefghijklmnopqrstu(v)wxyzV|abcdefghijklmnopqrstuv(w)xyzW|abcdefghijklmnopqrstuvw(x)yzX|abcdefghijklmnopqrstuvwx(y)zY|abcdefghijklmnopqrstuvwxy(z)Z
replacement 2: $1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26
And you can transform to UPPERCASE like this:
reg expr 1: ([a-z])
replacement 1: ABCDEFGHIJKLMNOPQRSTUVWXYZ$1
reg expr 2: (A)BCDEFGHIJKLMNOPQRSTUVWXYZa|A(B)CDEFGHIJKLMNOPQRSTUVWXYZb|AB(C)DEFGHIJKLMNOPQRSTUVWXYZc|ABC(D)EFGHIJKLMNOPQRSTUVWXYZd|ABCD(E)FGHIJKLMNOPQRSTUVWXYZe|ABCDE(F)GHIJKLMNOPQRSTUVWXYZf|ABCDEF(G)HIJKLMNOPQRSTUVWXYZg|ABCDEFG(H)IJKLMNOPQRSTUVWXYZh|ABCDEFGH(I)JKLMNOPQRSTUVWXYZi|ABCDEFGHI(J)KLMNOPQRSTUVWXYZj|ABCDEFGHIJ(K)LMNOPQRSTUVWXYZk|ABCDEFGHIJK(L)MNOPQRSTUVWXYZl|ABCDEFGHIJKL(M)NOPQRSTUVWXYZm|ABCDEFGHIJKLM(N)OPQRSTUVWXYZn|ABCDEFGHIJKLMN(O)PQRSTUVWXYZo|ABCDEFGHIJKLMNO(P)QRSTUVWXYZp|ABCDEFGHIJKLMNOP(Q)RSTUVWXYZq|ABCDEFGHIJKLMNOPQ(R)STUVWXYZr|ABCDEFGHIJKLMNOPQR(S)TUVWXYZs|ABCDEFGHIJKLMNOPQRS(T)UVWXYZt|ABCDEFGHIJKLMNOPQRST(U)VWXYZu|ABCDEFGHIJKLMNOPQRSTU(V)WXYZv|ABCDEFGHIJKLMNOPQRSTUV(W)XYZw|ABCDEFGHIJKLMNOPQRSTUVW(X)YZx|ABCDEFGHIJKLMNOPQRSTUVWX(Y)Zy|ABCDEFGHIJKLMNOPQRSTUVWXY(Z)z
replacement 2: $1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26
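To see why the two-step trick works, it can be simulated in Python. This is only a sketch of the lowercase direction, written with the standard `re` module rather than Eclipse's engine; `to_lower_two_step` is a hypothetical name, and the long alternation is generated instead of typed out.

```python
import re
import string

LOWER = string.ascii_lowercase


def to_lower_two_step(text: str) -> str:
    """Lowercase ASCII capitals using only search-and-replace steps."""
    # Step 1: put the whole lowercase alphabet in front of each capital.
    step1 = re.sub(r"([A-Z])", lambda m: LOWER + m.group(1), text)

    # Step 2: one alternative per letter, capturing only the lowercase
    # letter that corresponds to the capital at the end.
    alternatives = [
        LOWER[:i] + "(" + LOWER[i] + ")" + LOWER[i + 1:] + LOWER[i].upper()
        for i in range(26)
    ]
    pattern = "|".join(alternatives)

    # Exactly one group takes part in each match; keep just that letter.
    return re.sub(pattern, lambda m: m.group(m.lastindex), step1)


print(to_lower_two_step("Hello World"))
```

In the Eclipse version the replacement lists all 26 groups because the groups that did not take part in the match contribute nothing.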
Hmm, I don't think it is possible. \L and even \u seem not to work as I expected (at least in Indigo). Maybe it would be better to do that outside of Eclipse and then refresh the workspace.
PS: If you feel bored you can search for them and use CTRL+SHIFT+Y and CTRL+SHIFT+X to change the case :P