How do you match [ ] with regex?

I thought I was doing \[/b\]
but the machine disagrees.

How do you match [ ] with regex?
\[ \] should do just fine. At least in the Java regular expression engine.
System.out.println("[ ]".matches("\\[ \\]")); // prints true
Not sure where you get the /b from. Perhaps you're after a "blank" character. The most common expression for whitespace characters is \s. I.e., you could do \[\s\].
(Matching balanced [ ] is another story, though: a task that regular expressions are not well suited for.)

It's hard to answer well without knowing which flavor of regex you're using, but:
If you're writing a regular expression literal in a language (like JavaScript) that has them, then just put a backslash in front of the [ and ]. E.g.:
var re = /\[\/b\]/;
...creates a regular expression that will match a [ followed by a / followed by a b followed by a ]. (I had to escape the / because in JavaScript regular expression literals, of course the / is the delimiter.)
In languages where you use a string to specify the regular expression (Java, for instance), escaping can be confusing, because you have to escape with a backslash, but of course backslashes are special in strings and so you have to escape them. You end up with lots of them:
Pattern p = Pattern.compile("\\[/b\\]");
That creates a regex that does what the one above does, but note how we had to escape the escapes.
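To make the "escape the escapes" point concrete, here is a minimal Java sketch. The class name BracketEscapeDemo and the sample strings are made up for illustration. It compares the hand-escaped pattern with Pattern.quote, which treats the whole text as a literal and avoids counting backslashes.

```java
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // Hand-escaped: the regex \[/b\] written as a Java string literal.
    static final Pattern ESCAPED = Pattern.compile("\\[/b\\]");

    // Pattern.quote wraps the text in \Q...\E, so every character is literal.
    static final Pattern QUOTED = Pattern.compile(Pattern.quote("[/b]"));

    // True if the pattern occurs anywhere in the input.
    static boolean containsClosingTag(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String sample = "[b]bold[/b]"; // hypothetical sample input
        System.out.println(containsClosingTag(ESCAPED, sample));         // true
        System.out.println(containsClosingTag(QUOTED, sample));          // true
        System.out.println(containsClosingTag(ESCAPED, "no tags here")); // false
    }
}
```

Pattern.quote is handy when the text to match comes from elsewhere and should never be interpreted as regex syntax.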

'^[a-z]' // Should do fine in UNIX tools and possibly Perl/PHP too.
An example with the grep command:
grep '^[A-Z].*' file.txt
This finds lines that begin with a capital letter followed by any characters, capitals or not. Note that the unescaped [ ] here form a character class; they do not match literal brackets.
Hope that helps.
DL.

Related

Strange behavior when using regex to match parentheses in vim

I'm having some trouble understanding why a regular expression is not working. I'm searching for the phrase #Test(groups = {"broken"}), and I'm not able to find it with this expression:
#Test\(groups = {"broken"}\)
However, this expression yields results:
#Test\(.*groups = {"broken"}\)
Why is this happening? I can't see why the first expression would not work, but I understand why the second one does.
\( starts a capture group in Vim, because Vim does not use "very magic" (extended-style) patterns by default. If you want to search for a literal paren, use a plain (.
The second expression works because .* matches (.
If you want to search for literal text, just prepend \V to the search pattern; then, only the backslash has special meaning and must be escaped:
/\V#Test(groups = {"broken"})
In contrast to most other regular expression dialects, many Vim atoms need to be prefixed with \ to be non-literal. To make Vim's patterns look more like Perl's, you can prepend \v; then (...) does capture grouping (as you expected), and you need to write \( and \) to match literal parentheses. Under \v other punctuation such as { and = becomes special as well, so those would need escaping too.

Visual Studio Find and Replace with regex and single/double quotes

How to use VS Find/Replace to replace:
this: $('a[name="lnkFind"]').on('click', function
with this: $(document).on("click", "a[name='lnkFind']", function
I'm not sure which characters need to be escaped - single or double quotes or both? None of the patterns I've tried seem to find a match.
You'll need to escape many of these characters.
Find/Replace will complain about the un-escaped ( and ), even the bare ( at the end because it's missing a matching ). Also the square brackets, which are used for character sets, and finally the $.
So this should work as the pattern:
\$\('a\[name="lnkFind"\]'\)\.on\('click', function
You should look at a list of special characters in Regular Expressions.
$, ., [, ] should all be escaped.
http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html
Except in special cases (such as vim regex), in general you can escape any and all special characters in regex to get their literal form, i.e. escaping a special character that doesn't need to be escaped, won't do any harm.
That said, here's the minimum that needs to be escaped:
\$\('a\[name="lnkFind"]'\)\.on\('click', function
I don't think you'll need to escape anything in the replacement, because only a $ or \ followed by a number will be interpreted.
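If you want to sanity-check the escaped pattern outside the editor, a small Java sketch like the one below can help. This is only an approximation: Visual Studio uses .NET regex syntax, and the class name FindReplaceDemo and the sample line are invented for the example. One difference worth noting is that Java's replaceAll treats $ in the replacement as a group reference, so the sketch passes the replacement through Matcher.quoteReplacement to keep it literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindReplaceDemo {
    // The fully escaped search pattern, written as a Java string literal.
    static final Pattern FIND = Pattern.compile(
        "\\$\\('a\\[name=\"lnkFind\"\\]'\\)\\.on\\('click', function");

    // The literal text we want to end up with.
    static final String REPLACEMENT =
        "$(document).on(\"click\", \"a[name='lnkFind']\", function";

    static String rewrite(String source) {
        // quoteReplacement escapes $ and \ so Java inserts the text verbatim.
        return FIND.matcher(source).replaceAll(Matcher.quoteReplacement(REPLACEMENT));
    }

    public static void main(String[] args) {
        String line = "$('a[name=\"lnkFind\"]').on('click', function () {});";
        System.out.println(rewrite(line));
        // $(document).on("click", "a[name='lnkFind']", function () {});
    }
}
```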

How to replace the whitespace around certain characters?

I am working on some free text that needs data cleaning, and I have a question (one of many, which I will ask later, I am sure):
I need to replace the following combinations:
[ ; ] (space before and after the punctuation)
[;] (no space before and after the punctuation)
[ ;] (only space before the punctuation)
to
[; ] (only space after the punctuation)
...where the punctuation can be one of [;:,.]. How can I do this with a regex?
A possible expression would be:
\s?([;:,.])\s?
and depending on the programming language or tool you are using, you have to use $1, \\1 or \1 for the backreference and the replacement would be e.g. $1 (there is a space after 1).
Explanation:
\s? - match at most one whitespace character
(...) - capture group, storing the matched characters in a reference
[...] - character class, matching one of the characters inside
References: character class, capture group, quantifier
But again: The expression can differ, depending on the tool/language you are using. E.g. a similar expression for sed would look like:
s/ *\([;:,.]\) */\1 /g
but this would also trim the spaces around the punctuation (there is probably a better way, but I'm not so familiar with sed).
I would use \s*([;:,.])\s* and replace with '$1 ' (single quotes added to emphasise the space after the back-reference). It's a cross between Felix's first and last suggestion, so it can clean up multiple spaces, including tabs and newlines.
It depends on what language you're using on how to move it into the cleaned form, [; ], but you can match any of the punctuation marks by enclosing them in [], like [;:,.].
Once you have your pattern complete, you can replace the matches with your clean version. In at least Java, you could replace it with something like "\[$<GroupNumber> \]", with the <GroupNumber> referring to the parenthesized group with your punctuation mark, like 1, 2, 3, etc., based on the order of the groups.
Remember, depending on the language you're using, you might need to escape backslashes. If you are using Java, then for all the examples above, you need to use \\ in place of \.
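As a rough illustration of the \s*([;:,.])\s* suggestion in Java, here is a minimal sketch. The class name PunctuationSpacing and the sample text are assumptions for the example, not part of the question.

```java
class PunctuationSpacing {
    // Drop any whitespace around ; : , . and put exactly one space after it.
    static String normalize(String text) {
        // Regex \s*([;:,.])\s* as a Java string; $1 is the captured punctuation.
        return text.replaceAll("\\s*([;:,.])\\s*", "$1 ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("a ; b;c ,d")); // a; b; c, d
    }
}
```

Be aware that this also adds a space after punctuation at the very end of the text, and that it will split things like decimal numbers ("3.14" becomes "3. 14"), so real data may need extra rules.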

How do you regex match some unicode characters followed by brackets?

I am not too familiar with regex and hope someone could help.
example:
This is a sentence with some_unicode[some other word] and other stuff.
After removing the characters and brackets, the result should be:
This is a sentence with and other stuff.
Thank you!!
Search for
some_unicode\[[^\]]*\]
and replace with nothing.
Explanation:
\[: Match a literal [.
[: Match a character class with the following properties (here [ is a metacharacter, starting a character class)...
^\]: "any character except a literal ]" (^ at the start of a character class negates its contents).
]*: ...zero or more times. Note again the unescaped ], ending the character class.
\]: Match a literal ].
This of course will only work if there can be no brackets inside brackets. How to actually format and use the regex is highly dependent on the language/tool you're doing this with; so if you add another tag to your question specifying the language, I can give you a code example.
[ and ] are metacharacters in regular expressions and must be escaped by a backslash, e.g. \[.
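Since the question doesn't name a language, here is a sketch of the search-and-replace in Java, purely as an assumption. The class name BracketedRemoval is made up, and some_unicode is kept as the literal placeholder from the question.

```java
class BracketedRemoval {
    // Remove "some_unicode" plus the bracketed text that directly follows it.
    static String strip(String text) {
        // Regex some_unicode\[[^\]]*\] written as a Java string literal.
        return text.replaceAll("some_unicode\\[[^\\]]*\\]", "");
    }

    public static void main(String[] args) {
        String in = "This is a sentence with some_unicode[some other word] and other stuff.";
        System.out.println(strip(in));
    }
}
```

Note that the spaces on both sides of the removed text stay, so the result has two spaces between "with" and "and". If that matters, add \s* in front of the pattern or tidy the whitespace in a second pass.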

Using escape characters inside grep

I have the following regular expression for eliminating spaces, tabs, and new lines: [^ \n\t]
However, I want to expand this for certain additional characters, such as > and <.
I tried [^ \n\t<>], which works well for now, but I want the expression to not match if the < or > is preceded by a \.
I tried [^ \n\t[^\\]<[^\\]>], but this did not work.
Can any one of the sequences below occur in your input?
\\>
\\\>
\\\\>
\blank
\tab
\newline
...
If so, how do you propose to treat them?
If not, then zero-width look-behind assertions will do the trick, provided that your regular expression engine supports them. This will be the case in any engine that supports Perl-style regular expressions (Perl, PHP, etc.):
(?<!\\)[ \n\t<>]
The above will match any un-escaped space, newline, tab or angle bracket. More generically (using \s to denote any whitespace character, including \r):
(?<!\\)\s
Alternatively, using complementary notation without the need for a zero-width look-behind assertion (but arguably less efficiently):
(?:\\[<>]|[^ \n\t<>])
You may also use a variation of the latter to handle the \\>, \\\>, \\\\> etc. cases as well, treating < or > as escaped only after an odd number of preceding backslashes, up to some finite limit (here 1, 3, 5, 7 or 9), such as:
(?:(?:^|[^\\])(?:\\\\){0,4}\\[<>]|[^ \n\t<>])
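To see the look-behind variant from this answer in action, here is a small Java sketch. Java is just one engine that supports look-behind; the class name UnescapedDelimiters and the sample input are assumptions for the example. It lists the positions of delimiters that are not directly preceded by a backslash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnescapedDelimiters {
    // Regex (?<!\\)[ \n\t<>] : a delimiter not directly preceded by a backslash.
    static final Pattern UNESCAPED = Pattern.compile("(?<!\\\\)[ \\n\\t<>]");

    // Returns the index of every un-escaped delimiter in the input.
    static List<Integer> positions(String input) {
        List<Integer> found = new ArrayList<>();
        Matcher m = UNESCAPED.matcher(input);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        // The Java literal "a\\<b>c" is the six characters a \ < b > c
        System.out.println(positions("a\\<b>c")); // [4]
    }
}
```

As the questions at the top of this answer point out, this simple form looks at only one preceding character, so an escaped backslash followed by > (that is, \\>) is still treated as escaped.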
According to the grep man page:
A bracket expression is a list of
characters enclosed by [ and ]. It
matches any single character in that
list; if the first character of the
list is the caret ^ then it matches
any character not in the list.
This means that a bracket expression can't match a sequence of characters such as \< or \>, only single characters.
If you have a version of grep built with Perl regex support (grep -P), you can use lookarounds as one of the other posters mentioned. Not all versions of grep have this support, though.
Maybe you can use egrep and put your pattern string inside quotes. That should at least remove the need to escape characters for the shell.