Vim any character replaced by itself - regex

In Vim, I am looking for a search-and-replace command which, applied to the segment below:
func {print("touching curly braces")}
transforms it into this:
func { print("touching curly braces") }
So far I have:
:%s/{.\( \)\@!/{./g
However, it does this to the first segment:
func {.rint("touching curly braces")}
I believe I need something like this:
:%s/{.\( \)\@!/{ ./g
:%s/}.\( \)\@!/} ./g
How do I replace the wildcard '.' with the character it matched?

You need to put the . into a group, \(.\), so that you can repeat it in the substitution string as \1:
:%s/{\(.\)\( \)\@!/{ \1/g
:%s/}\(.\)\( \)\@!/} \1/g
This is called grouping and backreferencing.
If you want to add both spaces at once, here is the command:
:%s/{\(\S.*\S\)}/{ \1 }/g
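For readers outside Vim, the same grouping-and-backreference idea can be sketched in Python. This is only an analogue of the last command, not a Vim feature: the helper name pad_braces is made up for illustration, and Python's re module writes groups with plain parentheses where Vim uses \( \).

```python
import re


def pad_braces(line: str) -> str:
    """Insert one space inside a {...} pair whose content touches both braces."""
    # (\S.*\S) captures the content, which must start and end with a
    # non-space character; \1 in the replacement puts the captured text back.
    return re.sub(r"\{(\S.*\S)\}", r"{ \1 }", line)


print(pad_braces('func {print("touching curly braces")}'))
```

Because the captured text is written back through \1, only the spaces are new; the content between the braces is untouched.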

I'd replace
:%s/{\($\| \)\@!/{ /g
and
:%s/\(^\| \)\@<!}/ }/g
where {\($\| \)\@! is { followed by a negative lookahead for either end-of-line or a space; the other expression is analogous, with a negative lookbehind and }.
Note that replacing in source code like that is a dangerous endeavor. You can break things very easily. Think of curly braces inside strings, or in regular expressions, or other situations you did not quite think of. Use /gc instead of /g to manually confirm each change.
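As a rough illustration of the negative lookahead and lookbehind described above, here is a Python sketch. The function name space_braces is hypothetical, and the lookbehind is split in two because Python requires fixed-width lookbehinds; this is an assumption-laden analogue, not the Vim command itself.

```python
import re


def space_braces(text: str) -> str:
    # "{" that is not followed by end-of-line or a space gets a space after it.
    text = re.sub(r"\{(?!$| )", "{ ", text, flags=re.MULTILINE)
    # "}" that is not preceded by start-of-line or a space gets a space
    # before it. The two conditions are written as two separate look-behinds
    # because Python look-behinds must have a fixed width.
    text = re.sub(r"(?<!^)(?<! )\}", " }", text, flags=re.MULTILINE)
    return text


print(space_braces('func {print("touching curly braces")}'))
```

Since the lookarounds consume nothing, running the function a second time leaves already-spaced braces alone.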

Do you mean this? I hope I understood your problem correctly:
s/\zs{\ze./& /g
s/.\zs}\ze/ &/g
You can do it in this way too:
s/{\ze./& /g
s/.\zs}/ &/g

Related

Regex: Exact match string ending with specific character

I'm using Java. So I have a comma separated list of strings in this form:
aa,aab,aac
aab,aa,aac
aab,aac,aa
I want to use regex to remove aa and the trailing ',' if it is not the last string in the list. I need to end up with the following result in all 3 cases:
aab,aac
Currently I am using the following pattern:
"aa[,]?"
However it is returning:
b,c
If lookarounds are available, you can write:
,aa(?![^,])|(?<![^,])aa,
with an empty string as replacement.
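A quick way to check the lookaround pattern against the three sample lines is a small Python sketch (Python's re module supports these lookarounds; the function name remove_aa is made up here):

```python
import re

# The look-around pattern from the answer above, unchanged.
PATTERN = re.compile(r",aa(?![^,])|(?<![^,])aa,")


def remove_aa(csv_line: str) -> str:
    # Replace every match with the empty string.
    return PATTERN.sub("", csv_line)


for line in ("aa,aab,aac", "aab,aa,aac", "aab,aac,aa"):
    print(line, "->", remove_aa(line))
```

The double negation (?![^,]) reads as "not followed by a non-comma", which is true both before a comma and at the end of the string.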
Otherwise, with a POSIX ERE syntax you can do it with a capture:
^(aa(,|$))+|(,aa)+(,|$)
with the 4th group as replacement (so $4 or \4)
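The capture-based variant can be tried the same way. In this sketch (function name hypothetical) a small replacement function stands in for $4, because group 4 does not participate when the first alternative matches:

```python
import re

PATTERN = re.compile(r"^(aa(,|$))+|(,aa)+(,|$)")


def remove_aa_with_capture(csv_line: str) -> str:
    # Group 4 holds the separator that must survive (a comma or nothing).
    # It is None when the first alternative matched, hence the `or ""`.
    return PATTERN.sub(lambda m: m.group(4) or "", csv_line)


for line in ("aa,aab,aac", "aab,aa,aac", "aab,aac,aa"):
    print(line, "->", remove_aa_with_capture(line))
```

The + quantifiers also collapse repeated entries, so a line such as aa,aa,aab is reduced to aab in one pass.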
Without knowing your flavor, I propose this solution for the case that your engine supports \b.
I use Perl as the demo environment and replace with "_" for demonstration.
perl -pe "s/\baa,|,aa\b/_/"
\b is the "word boundary" anchor, i.e. it matches at any start or end of something that looks like a word. That lets it handle line end, line start, blanks, and commas.
Using it, two alternatives suffice to cover all the cases in your sample input.
Output (with interleaved input, with both, line ending in newline and line ending in blank):
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
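The same \b experiment can be reproduced in Python, which also knows \b. This is a sketch rather than the Perl one-liner itself; mark_aa is an invented name, and count=1 is used to mimic Perl's s/// without /g.

```python
import re


def mark_aa(csv_line: str) -> str:
    # count=1 mirrors Perl's s/// without the /g flag: only the first match
    # is replaced. "_" is the visible marker used in the demonstration.
    return re.sub(r"\baa,|,aa\b", "_", csv_line, count=1)


for line in ("aa,aab,aac", "aab,aa,aac", "aab,aac,aa"):
    print(mark_aa(line))
```

Replacing "_" with an empty string turns the demonstration into the actual deletion.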
If the \b is unknown in your regex engine, then please state which one you are using, i.e. which tool (e.g. perl, awk, notepad++, sed, ...). Also in that case it might be necessary to do replacing instead of deleting, i.e. to fine tune a "," or "" as replacement. For supporting that, please show the context of your regex, i.e. the replacing mechanism you are using. If you are deleting, then please switch to replacing beforehand.
(I picked up a suggestion from a comment by gisek that the capturing groups are not needed. I usually use () generously, including in other syntaxes. In my opinion, not having to think about or look up evaluation order is a net benefit in time spent and risks taken. But after testing, I use this terser, more elegant way.)
If your regex engine supports positive lookaheads and positive lookbehinds, this should work:
,aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)
You could probably use the following and replace it with nothing:
(aa,|,aa$)
It matches either aa, when it is at the beginning or in the middle of the string, or ,aa$ when it is at the end of the string.
As you want to delete aa followed by a comma or the end of the line, this should do the trick: ,aa(?=,|$)|^aa,

Notepad++ and delimiters: automatically replace ``string'' by \command{string}

Within Notepad++, I want to replace many instances of the type ``string'' by \command{string} where string can be any string of characters. I am fairly close to what I want to achieve with:
Find: (?<=``)(.*?)(?='')
Replace: \\command{\1}
There is still a problem. With the regex code above, instead of \command{string} I get ``\command{string}'' and I am not sure why the `` and '' are not removed?
It is because you are using lookaround assertions. Lookaround (zero-width) assertions only assert that a position can be matched and do not "consume" any characters on the string. You can use the below regular expression.
Find: ``([^']+)''
Replace: \\command{\1}
You need to wrap everything into a capture group and use that. NP++ seems to not support lookahead/behind, but you don't need that for this specific case anyway:
``([^']+)'' -> \\command{\1}
This will make sure it does not match two commands (longest match) in something like:
run ``ls -l'' or ``ls -a''
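The difference between asserting and consuming the delimiters can be seen in a short Python sketch. Python's re module is used here only as a stand-in for Notepad++'s engine, so treat the exact replacement syntax as an assumption about Python, not about Notepad++.

```python
import re

text = "run ``ls -l'' or ``ls -a''"

# Look-arounds only assert; the quotes are not part of the match, so they stay.
with_lookaround = re.sub(r"(?<=``)(.*?)(?='')", r"\\command{\1}", text)

# Consuming the quotes in the match removes them along with the replacement.
with_capture = re.sub(r"``([^']+)''", r"\\command{\1}", text)

print(with_lookaround)
print(with_capture)
```

Only the second call produces the wanted \command{...} form, because the delimiters are part of what gets replaced.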

Trouble converting regex

This regex:
"REGION\\((.*?)\\)(.*?)END_REGION\\((.*?)\\)"
currently finds this info:
REGION(Test) my user typed this
END_REGION(Test)
I need it to instead find this info:
#region REGION my user typed this
#endregion END_REGION
I have tried:
"#region\\ (.*?)\\\n(.*?)#endregion\\ (.*?)\\\n"
It tells me that the pattern assignment has failed. Can someone please explain what I am doing wrong? I am new to Regex.
It seems the issue lies in the multiline \n. My recommendation is to use the s modifier to avoid multiline complexities, like this:
/#region\ \(.*?\)(.*?)\s#endregion\s\(.*?\)/s
The s ("single line") modifier makes . match all characters, including line breaks.
Try this:
#region(.*)?\n(.*)?#endregion(.*)?
This works for me when testing here: http://regexpal.com/
When using your original text and regex, the only thing that threw it off is that I did not have a new line at the end because your sample text didn't have one.
Constructing this regex doesn't fail using boost, even if you use the expanded modifier.
Your string to the compiler:
"#region\\ (.*?)\\\n(.*?)#endregion\\ (.*?)\\\n"
After being parsed by the compiler:
#region\ (.*?)\\n(.*?)#endregion\ (.*?)\\n
It looks like you have one too many escapes on the newline.
If you present the regex to Boost with the expanded modifier, an unescaped pound sign # is interpreted as the start of a comment.
In that case, you need to escape the pound sign.
\#region\ (.*?)\\n(.*?)\#endregion\ (.*?)\\n
If you don't use the expanded modifier, then you don't need to escape the space characters.
Taking that tack, you can remove the escapes on the spaces and fix up the newline escapes. The raw pattern (what gets passed to the regex engine) then looks like this:
#region (.*?)\n(.*?)#endregion (.*?)\n
And like this as a source code string:
"#region (.*?)\\n(.*?)#endregion (.*?)\\n"
Your regular expression has an extra backslash when escaping the newline sequence (\\\n); use \\s* instead. Also, for the last capturing group you can use a greedy quantifier and drop the trailing newline sequence.
#region\\ (.*?)\\s*(.*?)#endregion\\ (.*)
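To see what the cleaned-up pattern captures, here is a small Python sketch. It is not Boost code: the sample text is invented for illustration, and re.DOTALL is added as an assumption so that the body between the two markers may span several lines (the "single line" behaviour mentioned in one of the answers).

```python
import re

# Pattern with single-escaped newlines, written as a Python raw string.
REGION = re.compile(r"#region (.*?)\n(.*?)#endregion (.*?)\n", re.DOTALL)

# Hypothetical input; the trailing newline matters because the pattern ends in \n.
sample = "#region Test\nmy user typed this\n#endregion Test\n"

match = REGION.search(sample)
name, body, end_name = match.groups()
print(repr(name), repr(body), repr(end_name))
```

Without the final newline in the input the search would fail, which matches the observation in one of the answers about the missing trailing newline.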

Replace a character by another, unless it is located in between braces

What I would like to do with the following string is to replace all commas "," with tabs, unless the comma is between braces { }.
Say I have:
goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"
The result should be:
goldRigged\t1\t0\t0\t0\t1\t0\t0\t0\t1\t"{"LootItemID": "goldOre"**,** "Amount": 1}"
I already have: \"(\\{((.*?))\\})\" which allows me to match what's between { }.
The idea would be to exclude that content somehow and match any commas with something like \",^(\\{((.*?))\\})\"
But I guess that by doing that it will exclude the comma itself.
What you would need is called a negative lookahead and a negative lookbehind. However, this would make for quite a complex expression:
Match all commas that are not preceded by an opening brace, as long as they were not previously preceded by a closing brace (plus the reversed logic for the right side of the comma). This results in an expression that is difficult to process, because the regex engine constantly needs to run up and down your string from its current position, which is rather inefficient.
Instead, iterate over all characters of your string. If you find an opening brace, set an escape hint. Remove it when you find a closing brace. When you find a comma, replace it if your escape hint is not set. Write your result to some sort of string buffer, and your solution will be significantly more efficient than the regex.
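The character-by-character approach could look like the following Python sketch. The function name commas_to_tabs and the flag name inside_braces are illustrative choices, and a single boolean flag is used on the assumption that braces are never nested.

```python
def commas_to_tabs(line: str) -> str:
    """Replace commas with tabs, except for commas between { and }."""
    out = []                 # plays the role of the string buffer
    inside_braces = False    # the "escape hint"; assumes no nested braces
    for ch in line:
        if ch == "{":
            inside_braces = True
        elif ch == "}":
            inside_braces = False
        if ch == "," and not inside_braces:
            out.append("\t")
        else:
            out.append(ch)
    return "".join(out)


line = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"'
print(commas_to_tabs(line))
```

If nested braces can occur, the boolean would have to become a depth counter that is incremented on { and decremented on }.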
You want to use a negative lookaround to achieve this:
(?<![\{\}]),*(?![\{\}]) should work, try here: http://regex101.com/r/gG3oU1
Use negative lookahead (?!expr) and negative lookbehind (?<!expr) in your regular expression.
For example, you can write:
System.Text.RegularExpressions.Regex.Replace(
"goldRigged,1,0,0,0,1,0,0,0,1, {\"LootItemID\": \"goldOre\", \"Amount\": 1}" ,
@"(?<!\{[^\}].*)[,](?![^\{]*\})", "\t");
Does your input line contain the { only in the last token?
If yes, then you can try this brute-force approach:
echo 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"' | awk -F'{' '{one=$1; gsub(",","\t",one); printf("%s{%s\n", one, $2);}'
The regex below is an expensive way of doing it. As suggested by @Sniffer, a parser would be nicer here :)
(?=,.*?"{),|(?!,.*?\}),
First alternation
(?=,.*?"{), - make sure comma is outside the sequence "{
Second alternation
(?!,.*?\}), - make sure comma isn't inside the sequence }"
There will be edge cases that haven't been accounted for; that's where a parser comes in.
I think you actually need only one lookahead:
,(?=[^{}]*({|$))
reads: a comma, followed by some non-braces and then either an open brace or the end.
Example in JS:
> x = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}",some,more{stuff,ff}end'
> x.replace(/,(?=[^{}]*({|$))/g, "#")
"goldRigged#1#0#0#0#1#0#0#0#1#"{"LootItemID": "goldOre", "Amount": 1}"#some#more{stuff,ff}end"
Note that this doesn't work if braces can be nested; in that case you need either a regex engine with recursion (?R) or a proper parser.
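The single-lookahead pattern carries over to Python unchanged apart from escaping the brace. The sketch below reuses the test string from the JS example and substitutes a tab instead of "#"; the function name tabs_outside_braces is an invented helper.

```python
import re

# A comma followed by non-brace characters and then "{" or the end of input.
OUTSIDE_COMMA = re.compile(r",(?=[^{}]*(\{|$))")


def tabs_outside_braces(line: str) -> str:
    return OUTSIDE_COMMA.sub("\t", line)


x = ('goldRigged,1,0,0,0,1,0,0,0,1,'
     '"{"LootItemID": "goldOre", "Amount": 1}",some,more{stuff,ff}end')
print(tabs_outside_braces(x))
```

As with the JS version, this relies on braces being balanced and not nested.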

Vim/Perl Regex Tag Match Problem

I have data that looks like this:
[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]
I want to find all [.*] tags that have the word mouse in them. Keeping in mind non-greedy specifiers, I tried this in Vim: \[.\{-}mouse.\{-}\], but this yielded this result,
[Shift]);[Ctrl][Ctrl+S][Left mouse-click]
Rather than just the desired,
[Left mouse-click]
Any ideas? Ultimately I need this pattern in Perl syntax as well, so if anyone has a solution in Perl that would also be appreciated.
\[[^]]*mouse[^[]*\]
That is, match a literal opening bracket, then any number of characters that aren't closing brackets, then "mouse," then any number of non-opening-brackets, and finally a literal closing bracket. Should be the same in Perl.
You can use the following regex:
\[[^\]]*mouse.*?\]
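The following Python sketch contrasts the lazy attempt with the negated-character-class pattern on the sample data. Python is used only as a convenient test bed here; the brackets inside the classes are escaped, which Perl also accepts.

```python
import re

data = "[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]"

# Lazy quantifiers still start at the leftmost "[", so the match runs too far.
lazy = re.search(r"\[.*?mouse.*?\]", data).group()

# Negated character classes keep the match inside a single pair of brackets.
tags = re.findall(r"\[[^\]]*mouse[^\[]*\]", data)

print(lazy)
print(tags)
```

Laziness only shortens the right end of a match; it never moves the starting point, which is why the first attempt begins at [Shift].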