Regex expression to match all characters inside

I'm trying to mass update a web app, I need to create a regex that matches:
lang::id(ALLCHARACTERS]
Can someone assist me with this? I'm not good with regex. I'm pretty sure it can start like:
lang\:\:\(WHAT GOES HERE\]

Something like this would work:
lang::id\([^]]*]
This will match a literal lang::id(, followed by zero or more of any character other than ], followed by a literal ].
Note that the only character that really needs to be escaped is the open parenthesis.

lang::id\(.*]
The . means any single character, and * repeats it zero or more times. Make sure to escape the ( with \, since it is a special character in regex; otherwise the regex engine will probably complain about unbalanced parentheses. Note that .* is greedy, so if a line contains several ] characters the match runs to the last one.
If you don't want to accept every character, you can put a narrower sub-pattern in place of the .*. Building the regex out of smaller chunks like this makes complex rules easier to understand and develop.
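To see how the two patterns above differ in practice, here is a small sketch. Python's re module is my choice for illustration (the question does not name a language), and the sample string is made up.

```python
import re

# Hypothetical sample line containing two occurrences.
text = "a lang::id(foo] b lang::id(bar] c"

# Negated character class: stops at the first ']' after each 'lang::id('.
negated = re.findall(r"lang::id\([^]]*]", text)

# Greedy dot-star: runs on to the last ']' on the line.
greedy = re.findall(r"lang::id\(.*]", text)

print(negated)
print(greedy)
```

The negated class yields one match per occurrence, while the greedy version swallows both occurrences in a single match.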

Related

Vim how to remove some words using regex

In vim editor, I want to delete parentheses and the words in parentheses using regular expression.
Help me please!
As-is:
DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),
Want To-be:
DOT, COMMA, SEMICOLON, COLON, QUOTE,
EQUALS, NOT_EQUALS, LESS_THAN, LESS_EQUALS,
Here is a short one:
%s/(.\{-})//g
Explanation: it matches an opening parenthesis (, then as few characters as possible (.\{-}) before the next closing parenthesis ), and replaces the whole match with nothing.
To keep it simple without making the regex too strict, I would use
:%s#("..\?")##g
This removes any one or two characters that are wrapped in double quotes and parentheses.
Using # instead of / as the delimiter can be easier to read, and in some cases it avoids having to escape /.
You should really take the time to learn regex properly, it's fairly useful and pretty cool stuff. That being said, this is a good time to learn at least this part.
You have a text list and you want to match everything that isn't within parentheses, repeatedly over a line.
%s/\([^(]*\)[^)]*)\([^(]*\)/\1\2/g
First, we're gonna do this over the whole file, so let's use %s. Next, we have / as our separator. Our pattern that we'll match is therefore \([^(]*\)[^)]*)\([^(]*\).
Let's break that down some more. \( \) is the grouping operator, which just tells vim "hey, I might want the stuff in here later." [^ ] is the not operator, and says "I want a character that isn't any of these characters". [^(]* then says "I want all the characters I can grab in a row that aren't (". All of that was group one.
After our first \( \) we have stuff that isn't in a group, because we don't want to keep it. [^)]*) uses the not operator again to match a bunch of characters that aren't ")", and then we have a ")", which matches a literal ")" (there's probably a better way to do this part, but it works).
Next, we have our second \( \) group which contains [^(]*. Again, another not operator, matching as many non "(" in a row as we can. We need our pattern to stop by the next "(" so that our regex can match multiple times on the line; if we'd used \(.*\) instead, we'd have to run our regex a bunch of times since we'd only remove one set of parens per run.
After our pattern, we have another /, which separates the pattern from what we're going to put in its place. Remember how I said \( \) tells vim to keep the stuff inside for later? Here's where we use it. Our first group is basically "everything before a (" and our second group is basically "everything after a )". We tell vim we want to keep just group 1 followed by group 2 with \1\2.
Finally, /g means apply our regex globally over the line, that is, try matching more than once in the line if possible.
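For readers who want to try the same substitutions outside Vim, here is a rough Python counterpart. It is only a sketch: Python is not part of the question, and its syntax is the reverse of Vim's default here (bare parentheses group, \( is a literal parenthesis, and .*? plays the role of .\{-}).

```python
import re

# The two sample lines from the question.
lines = [
    'DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("\'"),',
    'EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),',
]

# Counterpart of %s/(.\{-})//g : non-greedy match from '(' to the next ')'.
short = [re.sub(r"\(.*?\)", "", line) for line in lines]

# Counterpart of the two-group pattern: keep group 1 and group 2,
# drop everything from '(' through ')'.
grouped = [re.sub(r"([^(]*)[^)]*\)([^(]*)", r"\1\2", line) for line in lines]

for line in short:
    print(line)
```

Both approaches produce the same result on this input; they would differ only if a quoted value itself contained a parenthesis.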
Try this pattern:
(?:[A-Z]{3,9}|, |_){1,2}
You can test it online
Many of the solutions already given are excellent. Like some of the others, I'd recommend learning regex in more depth. For your specific issue, you could alternatively search for opening brackets with /( then use da) to delete the brackets and their contents (skip this step if you want to keep a particular pair), move to the next match with n, repeat the deletion with ., and do this until you've deleted what you need.
This seems to work:
%s/("[.,;:'=<][>=]*")//g

Match specific numbers and before after have or regex escaped

With /\escape/ I can escape special regex characters, right? So why isn't this working?
I'm trying to find specific numbers that begin with |something, contain only digits [0-9] in the middle, and end with | again.
There is also other text to the left and to the right, like so: left|something[0-9]|right
This is what I've done, but it's not working:
/\|/234123[0-9]/\|/
\ only escapes the next character, so the second forward slash is ending the regular expression. Instead, you want this:
/\|something[0-9]\|/
You have to make sure that something is escaped correctly.
Note that if you need to match a number of any length rather than a single digit, you need [0-9]+.
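As a quick check of the escaping rule, here is a minimal sketch in Python (my choice for illustration; the question looks like a /.../-delimited flavour). Python has no slash delimiters, so only the pipes need a backslash. The input string is made up to fit the left|digits|right shape.

```python
import re

# Escaped pipes are literal characters; [0-9]+ allows one or more digits.
pattern = re.compile(r"\|234123[0-9]+\|")

# Hypothetical input in the left|digits|right shape from the question.
match = pattern.search("left|2341237|right")
print(match.group(0))

# Unescaped, '|' means alternation; the empty first alternative matches
# immediately, so the "match" is an empty string.
unescaped = re.search(r"|234123[0-9]+|", "left|2341237|right")
print(repr(unescaped.group(0)))
```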
What would probably help you the most would be the right tool for the job:
https://regex101.com/r/ZdjhCE/2
You'll still have to set your language, as regex flavours are similar between languages but unfortunately not 100% identical.

Regex expression - single quote without comma

I need a regex to find single quotes that have a comma neither right before nor right after them. Also, the single quote should not be the first or last character in the string, and it should have an alphanumeric character on each side.
For example, "Jane's book" should be detected, while "'apples','oranges'" should not.
Can anyone help?
You can use this regex with lookarounds:
(?<=[a-zA-Z0-9])'(?=[a-zA-Z0-9])
Something like:
(?<=[A-Za-z0-9])\'(?=[A-Za-z0-9])
should give you matches in languages that support positive lookaheads and positive lookbehinds (if I remember correctly, JavaScript originally supported only lookaheads; lookbehind was added in ES2018). I haven't tested the above, and I don't think you even need to escape the single quote.
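A small sketch of the lookaround pattern, using Python's re module purely as an example of an engine that supports both lookahead and lookbehind. The two test strings are the ones from the question.

```python
import re

# Apostrophe with an alphanumeric character directly on each side.
inner_quote = re.compile(r"(?<=[a-zA-Z0-9])'(?=[a-zA-Z0-9])")

# Index of every qualifying quote in the positive example.
hits = [m.start() for m in inner_quote.finditer("Jane's book")]

# The negative example: every quote touches a comma or a string edge.
misses = inner_quote.findall("'apples','oranges'")

print(hits, misses)
```

Because the lookarounds consume nothing, the match itself is just the quote character, which makes it easy to replace or count.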
You need your language-appropriate variation of:
.+'[^,]+.*
' finds you a single quote. You generally do not need to escape a single quotation mark.
[^,] allows any character but a comma and + indicates that you require at least one such character
.* says you can have as many of any character as you like, so putting it after what you care about says the rest of the string can be anything. .+ at the start means there must be at least one character (of any kind, commas included) before the single quote, so the quote cannot be the first character.
Note that I'm making some assumptions, like that you'll only have one ' in the string that you want to find. Also I'm assuming you don't care about , except for right after '. If that's not true, you need to be more specific about your requirements.
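The assumptions listed above matter in practice. The following sketch (Python chosen for illustration; accepts is a hypothetical helper, and two of the inputs are made up) shows what the looser pattern accepts and rejects.

```python
import re

loose = re.compile(r".+'[^,]+.*")

def accepts(s: str) -> bool:
    """Hypothetical helper: True if the loose pattern matches anywhere in s."""
    return loose.search(s) is not None

samples = ["Jane's book", "it',", "'abc", "'apples','oranges'"]
results = {s: accepts(s) for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

The last sample is accepted because only the character right after the quote is checked for a comma, so this pattern fits the stated requirements only if the commas before a quote really do not matter.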

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any run of non-newline characters that ends in a slash, but it uses a capturing group that is probably unnecessary. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
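The difference between the anchored and the lazy pattern can be checked quickly. Ant uses Java's regex engine; the sketch below uses Python instead, on the assumption that these particular constructs behave the same way in both.

```python
import re

# A leading '*' has nothing to quantify, so compilation fails,
# much like the PatternSyntaxException in the question.
try:
    re.compile("*/")
    compiled = True
except re.error:
    compiled = False

# Anchored and greedy: one match covering the whole string.
whole = re.findall(r".*/$", "abc/def/")

# Lazy and unanchored: stops at each slash, giving two matches.
pieces = re.findall(r"[\s\S]*?/", "abc/def/")

print(compiled, whole, pieces)
```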

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
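A minimal check of the two expressions above on the example string, sketched in Python rather than the asker's language:

```python
import re

url = "topics/install.xml#id_install"

# Everything from the start up to (not including) the first '#'.
before = re.search(r"^([^#]*)", url).group(1)

# Everything after the '#' up to the end of the string.
after = re.search(r"#(.*)$", url).group(1)

print(before, after)
```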
[a-zA-Z0-9]*[\#]
If your string contains any other special characters (such as the / and . in the example), you need to add them, escaped where necessary, inside the first pair of square brackets.
I don't use C#, but I will assume that it uses PCRE-style syntax. If so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers return the entire matched sequence as group(0), the first paren-matched group as group(1), and so forth.
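The greedy behaviour described above is easy to reproduce. This sketch uses Python, whose match and group calls happen to line up with the terms used in this answer; the input string is made up.

```python
import re

s = "a#b#c"  # hypothetical input with two '#' signs

# Greedy .* backs off only as far as the last '#'.
greedy = re.match(r"(.*)#.*", s).group(1)

# [^#]* cannot cross a '#', so it stops at the first one.
negated = re.match(r"([^#]*)#.*", s).group(1)

print(greedy, negated)
```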
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for a missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-capturing group to test for the second half and, if it finds it, returns everything after the pound.
Honestly though, for your situation, it is probably easier to use the Split method provided in String.
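Here is a short sketch of the lookaround and optional-group patterns above, written in Python for illustration (the answer itself targets C#). The # is left unescaped because it is not special in Python outside verbose mode.

```python
import re

url = "topics/install.xml#id_install"

# Lookahead / lookbehind: the '#' is required but not part of the match.
head = re.search(r".*?(?=#)", url).group(0)
tail = re.search(r"(?<=#).*", url).group(0)

# Optional fragment: group 2 is None when there is no '#'.
optional_fragment = re.compile(r"([^#]*)(?:#(.*))?")
with_fragment = optional_fragment.match(url).groups()
without_fragment = optional_fragment.match("topics/install.xml").groups()

print(head, tail, with_fragment, without_fragment)
```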
first:
/[^\#]*(?=\#)/ edit: is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
string[] split = exampleString.Split('#');
string firstString = split[0]; // everything before the first '#'
string secondString = split.Length > 1 ? split[1] : null; // null when there is no '#'