Vim how to remove some words using regex - regex

In the Vim editor, I want to delete the parentheses and everything inside them using a regular expression.
Help me please!
As-is:
DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),
Want To-be:
DOT, COMMA, SEMICOLON, COLON, QUOTE,
EQUALS, NOT_EQUALS, LESS_THAN, LESS_EQUALS,

Here is a short one:
%s/(.\{-})//g
Explanation: it matches an opening parenthesis (, then as few characters as possible .\{-} before the next closing parenthesis ). It replaces the whole match with nothing.
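For readers who want to check the idea outside Vim, here is a rough Python analog. It is only a sketch: Python's lazy quantifier `.*?` stands in for Vim's `.\{-}`, and the name `strip_parens` is made up for illustration.

```python
import re

text = '''DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),'''

def strip_parens(s: str) -> str:
    # \( and \) are literal parentheses in Python's syntax;
    # .*? matches as few characters as possible, like Vim's .\{-}
    return re.sub(r'\(.*?\)', '', s)

print(strip_parens(text))
```

A greedy `.*` in the same place would run from the first ( to the last ) on the line and delete the names in between as well.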

To keep it simple without making the regex too strict, I would use
:%s#("..\?")##g
This removes any one or two characters that are wrapped in double quotes and parentheses.
It also uses # instead of / as the delimiter, which can be easier to read and avoids having to escape / when the pattern contains one.

You should really take the time to learn regex properly, it's fairly useful and pretty cool stuff. That being said, this is a good time to learn at least this part.
You have a text list and you want to match everything that isn't within parentheses, repeatedly over a line.
%s/\([^(]*\)[^)]*)\([^(]*\)/\1\2/g
First, we're gonna do this over the whole file, so let's use %s. Next, we have / as our separator. Our pattern that we'll match is therefore \([^(]*\)[^)]*)\([^(]*\).
Let's break that down some more. \( \) is the grouping operator, which just tells vim "hey, I might want the stuff in here later." [^ ] is the not operator, and says "I want a character that isn't any of these characters". [^(]* then says "I want all the characters I can grab in a row that aren't "("". All of that was group one.
After our first \( \) we have stuff that isn't in a group, because we don't want to keep it. [^)]*) uses the not operator again, to match a bunch of characters that aren't ")", and then we have a ")", which matches a literal ")" (there's probably a better way to do this part, but it works).
Next, we have our second \( \) group which contains [^(]*. Again, another not operator, matching as many non "(" in a row as we can. We need our pattern to stop by the next "(" so that our regex can match multiple times on the line; if we'd used \(.*\) instead, we'd have to run our regex a bunch of times since we'd only remove one set of parens per run.
After our pattern, we have another / which separates the pattern from what we're going to put in its place. Remember how I said \( \) tells vim to keep the stuff inside for later? Here's where we use it. Our first group is basically "everything before a (" and our second group is basically "everything after a )". We tell vim we want to just keep group 1 followed by group 2 with \1\2.
Finally, the g flag after the last / applies our regex globally over the line, meaning it tries to match more than once in the line if possible.
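The walkthrough above can be sketched in Python to see what each group captures. This is an analog, not the Vim command itself: Python writes groups as ( ) and a literal closing parenthesis as \), the reverse of Vim's default syntax. The names `pattern` and `keep_outside_parens` are illustrative.

```python
import re

line = 'DOT("."), COMMA(","), SEMICOLON(";"),'

# Group 1: everything before a "(".
# Ungrouped middle: everything up to and including the next ")".
# Group 2: everything up to the next "(".
pattern = re.compile(r'([^(]*)[^)]*\)([^(]*)')

def keep_outside_parens(s: str) -> str:
    # \1\2 keeps the two groups and drops the ungrouped middle
    return pattern.sub(r'\1\2', s)

print(pattern.findall(line))      # the (group 1, group 2) pair of each match
print(keep_outside_parens(line))
```

Because group 2 stops at the next "(", the following match can start right there, which is what lets a single pass handle every pair of parentheses on the line.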

Try this pattern (written in PCRE-style syntax, not Vim's default flavor):
(?:[A-Z]{3,9}|, |_){1,2}
You can test it online

Many of the solutions already given are excellent. Like some of the others, I'd recommend learning regex in more depth. For your specific issue, you could alternatively search for opening brackets with /( then use da) to delete the brackets and their contents (skip if you want to keep this particular pair), move to the next match with n, repeat the deletion with . and do this until you've deleted what you need.

This seems to work:
%s/("[.,;:'=<][>=]*")//g

Related

What is the difference between `(\S.*\S)` and `^\s*(.*)\s*$` in regex?

I'm doing the RegexOne regex tutorial and it has a question about writing a regular expression to remove unnecessary whitespace.
The solution provided in the tutorial is
We can just skip all the starting and ending whitespace by not capturing it in a line. For example, the expression ^\s*(.*)\s*$ will catch only the content.
The setup for the question does indicate the use of the hat at the beginning and the dollar sign at the end, so it makes sense that this is the expression that they want:
We have previously seen how to match a full line of text using the hat ^ and the dollar sign $ respectively. When used in conjunction with the whitespace \s, you can easily skip all preceding and trailing spaces.
That said, using \S instead, I was able to come up with what seems like a simpler solution - (\S.*\S).
I've found this Stack Overflow solution that matches the one in the tutorial - Regex Email - Ignore leading and trailing spaces? and I've seen other guides that recommend the same format, but I'm struggling to find an explanation for why the \S is bad.
Additionally, this validates as correct in their tool... so, are there cases where this would not work as well as the provided solution? Or is the recommended version just a standard format?
The tutorial's solution of ^\s*(.*)\s*$ is wrong. The capture group .* is greedy, so it will expand as much as it can, all the way to the end of the line - it will capture trailing spaces too. The .* will never backtrack, so the \s* that follows will never consume any characters.
https://regex101.com/r/584uVG/1
Your solution is much better at actually matching only the non-whitespace content in the line, but there are a couple of odd cases in which it won't match the non-space characters in the middle. (\S.*\S) needs at least two characters to match (a non-space character at each end), whereas the tutorial's technique of (.*) may not capture any characters if the input is composed of all whitespace. (.*) may also capture only a single character.
But, given the problem description at your link:
Occasionally, you'll find yourself with a log file that has ill-formatted whitespace where lines are indented too much or not enough. One way to fix this is to use an editor's search and replace and a regular expression to extract the content of the lines without the extra whitespace.
From this, matching only the non-whitespace content (like you're doing) probably wouldn't remove the undesirable leading and trailing spaces. The tutorial is probably thinking to guide you towards a technique that can be used to match a whole line with a particular pattern, and then replace that line with only the captured group, like:
Match ^\s*(.*\S)\s*$, replace with $1: https://regex101.com/r/584uVG/2/
Your technique would work given the problem if you had a way to make a new text file containing only the captured groups (or all the full matches), eg:
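// Sample input with stray leading and trailing whitespace
const input = ` foo
bar
baz
qux `;
// With the g and m flags, match() returns every per-line match;
// `|| []` covers the case where nothing matches and match() returns null
const newText = (input.match(/\S(?:$|.*\S)/gm) || [])
  .join('\n');
console.log(newText);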
Using \S instead of . is not bad. If one knows a particular location must be matched by a non-space character, rather than by a space, using \S is more precise, makes the intent of the pattern clearer, makes a bad match fail faster, and can avoid problems with catastrophic backtracking in some cases. These patterns don't have backtracking issues, but it's still a good habit to get into.
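The differences between the three patterns discussed in this answer can be checked with a small Python sketch. The sample strings and variable names are made up for illustration, and Python's `re` is assumed to behave like the regex101 flavor for these simple patterns.

```python
import re

line = '   foo bar  '

# Tutorial pattern: the greedy (.*) swallows the trailing spaces too
greedy = re.match(r'^\s*(.*)\s*$', line).group(1)

# Requiring the group to end in \S leaves the trailing spaces to \s*
fixed = re.match(r'^\s*(.*\S)\s*$', line).group(1)

# The asker's pattern: fine here, but it needs at least two characters
short = re.search(r'(\S.*\S)', line).group(1)

# Single-character content: (\S.*\S) finds nothing, (.*\S) still works
single_short = re.search(r'(\S.*\S)', ' x ')
single_fixed = re.match(r'^\s*(.*\S)\s*$', ' x ').group(1)

print(repr(greedy), repr(fixed), repr(short), single_short, repr(single_fixed))
```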

How to add quotations around numbers in Vim - Regex?

I have been trying, unsuccessfully, to add quotation marks around some numbers in my file using regex. To clarify, let me give an example of what I am trying to do.
Something like myFunction(100) would be changed to myFunction("100").
I thought :100,300s/\([0-9]*\)/"\0" would work but it put quotation marks around spaces as well.
What can I do to fix this?
You should slightly modify the regular expression:
%s/\(\d\+\)/"\1"
In a regular expression, the first captured group is \1, not \0. It also looks safer to use \+ instead of *.
The reason this isn't working as expected is that [0-9]* also matches strings of zero length, so your substitution inserts an empty pair of quotes wherever it finds a zero-length match. Changing it to [0-9]\+ (to require at least one digit; Vim's default syntax needs the backslash before +) will solve your problem.
As an additional improvement, you can replace [0-9] with \d. Also, \0 is the replacement for the entire matched expression, so your parentheses are unnecessary: :100,300s/\d\+/"\0" will accomplish what you want. Captured subgroups start at \1.
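The zero-length-match problem is not specific to Vim. Below is a minimal Python sketch of the same effect. The function names are illustrative, `\g<0>` is Python's spelling of "the whole match", and the exact placement of the empty quote pairs can differ between engines, so only the general effect is shown.

```python
import re

def quote_numbers(s: str) -> str:
    # + requires at least one digit, so only real numbers get quoted
    return re.sub(r'[0-9]+', r'"\g<0>"', s)

def quote_numbers_star(s: str) -> str:
    # * also matches the empty string, so empty "" pairs appear
    # at positions where there is no digit at all
    return re.sub(r'[0-9]*', r'"\g<0>"', s)

print(quote_numbers('myFunction(100)'))
print(quote_numbers_star('myFunction(100)'))
```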

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any run of non-newline characters that ends in a slash; however, it uses a possibly unnecessary capturing group. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
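The points above can be tried out in Python as a rough stand-in for Java's `java.util.regex` (an assumption: the two engines agree on these simple patterns, although the error text differs). The helper name `compiles` is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    # A leading * has nothing to quantify, so compilation fails
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles('*/'))       # the original pattern from the question
print(compiles(r'.*/$'))    # the suggested replacement

s = 'abc/def/'
# Anchored, greedy: one match covering the whole string
print(re.findall(r'.*/$', s))
# Lazy, unanchored: stops at each slash, giving two matches
print(re.findall(r'[\s\S]*?/', s))
```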

Regex expression to match all char inside

I'm trying to mass update a web app, I need to create a regex that matches:
lang::id(ALLCHARACTERS]
Can someone assist me with this? I'm not good with regex. I'm pretty sure it can start like:
lang\:\:\(WHAT GOES HERE\]
Something like this would work:
lang::id\([^]]*]
This will match a literal lang::id(, followed by zero or more of any character other than ], followed by a literal ].
Note that the only character that really needs to be escaped is the open parenthesis.
lang::id\(.*]
The . means any single character, and the * repeats it zero or more times. Make sure to escape the ( with \, since ( is a special character in regex; otherwise the regex engine will probably complain about unbalanced parentheses. Note that .* is greedy, so if a line contains more than one ], the match runs to the last one.
If you wanted it to not include all characters, you can add a smaller regex in place of the .*. This way you can break the regex down into smaller chunks which help make it easier to understand and develop for some complex rules.
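To compare the two suggestions, here is a short Python sketch on a made-up sample string. The ] characters are written as \] here purely for readability, and which regex engine the asker's tool uses is unknown, so treat this as an illustration rather than a drop-in answer.

```python
import re

text = 'a lang::id(foo] b lang::id(bar] c'

# Negated class: each match stops at the first "]" after "lang::id("
bounded = re.findall(r'lang::id\([^\]]*\]', text)

# Greedy dot: a single match that runs on to the last "]" on the line
greedy = re.findall(r'lang::id\(.*\]', text)

print(bounded)
print(greedy)
```

For a mass update where several occurrences may share a line, the negated-class form is usually the safer of the two.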

Regex negation in vim

In Vim I would like to use a regex to highlight each line that ends with a letter that is preceded by neither // nor :. I tried the following
syn match systemverilogNoSemi "\(.*\(//\|:\).*\)\@!\&.*[a-zA-Z0-9_]$" oneline
This worked very well on comments, but did not work on lines containing a colon.
Any idea why?
Because with this regex Vim can choose any point as the start of the match. Obviously it chooses a point where the first concat matches (i.e. where the rest of the line no longer has // or :). These things are normally done by using either
\v^%(%(\/\/|\:)@!.)*\w$
(removed the first concat and the branch itself, changed .* to %(%(\/\/|\:)@!.)*; replaced the collection with the equivalent \w; added an anchor pointing to the start of the line) if you need to match the whole line, or a negative look-behind if you need to match only the last character. You can also just add an anchor to the first concat of your variant (you should remove the trailing .* from the first concat as it is useless, and the branch symbol for the same reason).
Note: I have no idea why your regex worked for comments. It does not work with comments the way you need it in all cases I checked.
Does this work for you?
^\(\(//\|:\)\@<!.\)*[a-zA-Z0-9_]$
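The anchoring issue can be reproduced outside Vim. The sketch below uses Python lookaheads as a loose analog of the Vim patterns in this thread (an assumption: `(?!...)` plays the role of Vim's negative lookahead, and the sample lines are invented).

```python
import re

WORD_END = r'[A-Za-z0-9_]$'

# Unanchored: the engine may start the match after the "//" or ":",
# where the negative lookahead succeeds, so such lines still match
unanchored = re.compile(r'(?!.*(?://|:)).*' + WORD_END)

# Anchored: every character from the start of the line is checked
anchored = re.compile(r'^(?:(?!//|:).)*' + WORD_END)

for line in ['assign a = b', 'case x: y', 'foo // bar', 'assign a = b;']:
    print(line, bool(unanchored.search(line)), bool(anchored.search(line)))
```

Only the first sample line should be reported by the anchored pattern, while the unanchored one also reports the lines containing : and //.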