Replace the very first matching pattern on later lines in Vim - regex

I have a long text where some lines need to be repeated later.
I put tags like this in the text:
{F1}text need to be repeated later{/F1}
so I can add multiple {F1}{/F1} to later sections and put the contents of the first line between them.
The problem is that there will be a lot of tags like this ({F2}{/F2} and so on), and this pattern matches all of those too:
{\(.*\)}.*{\/\1}
So, I want to find the first occurrence of each distinct tag and copy its contents into the later occurrences, so that when I change the first line and run the substitution again, all of the lines will be updated, maybe automatically with an autocmd BufWrite.
How could I do this? I accept any solution, not necessarily using my idea of marking the first lines with {}{/} tags. There will be a lot of tags and I don't want to do it one-by-one with individual substitute commands.
I tried with this:
:g/{\(.*\)}\(.*\){\/\1}/s/{\1}.*{\/\1}/{\1}\2{\/\1}/
but it says:
E65 Illegal back reference.

The ReplicateTags() function that is listed below runs a substitution
command replacing contents of each tag (according to its description in the
question) with text in the first occurrence of that tag. The substitution
operates on the whole buffer and processes all of the tags in one pass
(accepting multiline non-overlapping tags). The function returns a dictionary
that maps tag names to contents of their first occurrence.
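function! ReplicateTags()
    " Maps each tag name to the contents of its first occurrence.
    let dict = {}
    " \_.\{-} matches lazily across line breaks; the e flag suppresses
    " the error when the buffer contains no tags.
    %s/{\([^}]\+\)}\(\_.\{-}\){\/\1}/\=Tag(dict, submatch(1), submatch(2))/ge
    return dict
endfunction

function! Tag(dict, tag, str)
    " Store the contents of the first occurrence; reuse them afterwards.
    let a:dict[a:tag] = get(a:dict, a:tag, a:str)
    return printf('{%s}%s{/%s}', a:tag, a:dict[a:tag], a:tag)
endfunction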
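For readers who want to try the same first-occurrence-wins logic outside Vim, here is a minimal Python sketch. It is not part of the answer above: the name replicate_tags and the sample text are made up for illustration, and it assumes tags never nest or overlap.

```python
import re

# {NAME}...{/NAME}, matched lazily and across line breaks.
TAG_RE = re.compile(r"\{([^}]+)\}(.*?)\{/\1\}", re.DOTALL)


def replicate_tags(text):
    """Return (new_text, first) where first maps tag name -> first contents."""
    first = {}

    def fill(match):
        name, body = match.group(1), match.group(2)
        # setdefault stores the body only the first time a tag is seen.
        body = first.setdefault(name, body)
        return "{%s}%s{/%s}" % (name, body, name)

    return TAG_RE.sub(fill, text), first


sample = "{F1}intro{/F1}\nlater: {F1}{/F1}\n{F2}x{/F2} and {F2}stale{/F2}"
new_text, first = replicate_tags(sample)
print(new_text)
```

As in the Vim version, the returned dictionary can be inspected afterwards to see which contents were propagated.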

Related

Regex that deletes everything except for any tags that contain a specific string inside of them

I need a regex that can be applied in the Vim editor, or in bash (grep command), that will delete everything in a file, leaving only the tags containing a specific string:
<generic>
stuff1
stuff2
stuff3
</generic>
and
<generic>
stuff1
stuff2
DESIRED_STRING
stuff3
</generic>
The first one would be wiped and the second one would remain because of the DESIRED_STRING.
In the end, I need a file with only the (many) tags that contain a given modifier. This process will be executed several times to separate one huge file into multiple others.
This (?<=\<custom_item\>).*?(?=\<\/custom_item\>) got me to the point where I could match the content inside the tags, but I am not able to filter it.
The file will always follow this structure
<tag>
system : "Linux"
type : CHECK
</tag>
Where 'CHECK' is the modifier and the word I am looking for
Thank you!!
You may use this approach using awk:
awk '/<generic>/ { tag=1 }
tag && /DESIRED_STRING/ { p=1 }
tag { s = s $0 RS }
/<\/generic>/ { if (p) printf "%s", s; tag=p=0; s="" }' file
We use two flags to track our state here: tag is set while we are between the opening and closing tags, and p is set once the desired string has been found inside the current block.
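The same two-flag state machine can be sketched in Python, which may be easier to step through than the awk one-liner. This is only an illustration: the function name keep_blocks and its parameters are assumptions, and it expects the opening and closing tags to sit on their own lines as in the question.

```python
def keep_blocks(lines, open_tag="<generic>", close_tag="</generic>",
                needle="DESIRED_STRING"):
    """Return only the lines of blocks that contain `needle`."""
    kept = []
    buffer = []      # lines of the block currently being read
    inside = False   # plays the role of `tag` in the awk script
    found = False    # plays the role of `p` in the awk script
    for line in lines:
        if open_tag in line:
            inside = True
        if inside:
            if needle in line:
                found = True
            buffer.append(line)
            if close_tag in line:
                if found:
                    kept.extend(buffer)
                # Reset the state for the next block.
                buffer, inside, found = [], False, False
    return kept


sample = [
    "<generic>", "stuff1", "stuff2", "</generic>",
    "<generic>", "stuff1", "DESIRED_STRING", "stuff3", "</generic>",
]
print("\n".join(keep_blocks(sample)))
```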
Here's an alternative, in Vim: it is much easier to match than to avoid matching, so...
Gmz:1,'z g/DESIRED_STRING/norm yat:$pu<Ctrl-V><Enter><Enter>'zdgg
where <Ctrl-V> and <Enter> are supposed to be keys, not actual text to be entered.
Gmz will set a z mark at the last line. Then, we search for the DESIRED_STRING, and at each one, yank the tag, then paste it to the bottom of the file (in order). Then 'zdgg to delete the original (from the mark z to the top of the file).
Basically, instead of trying to delete everything and making exceptions for the desired content, pull the desired content out first, then delete everything.
Bonus: This will work even with tags that don't align with line breaks (even though OP doesn't have those). For example,
outside<tag>inside
foo DESIRED_STRING inside</tag>outside
will correctly produce
<tag>inside
foo DESIRED_STRING inside</tag>
With Vim regex:
:%s/<\([^>]*\)>\(\_.\(DESIRED_STRING\)\@!\)\{-}<\/\1>//
This regex uses a negative lookahead, \@!, to match all blocks of text not containing DESIRED_STRING. These blocks are then removed with the :%s command.
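The same idea (a lazy match in which every character is guarded by a negative lookahead) can be tried in Python. This is a sketch under assumptions, not a translation of the Vim command: the function name is hypothetical, the lookahead is checked before each character rather than after it, and blocks are assumed not to nest.

```python
import re


def drop_blocks_without(text, needle):
    """Remove every <tag>...</tag> block that does not contain `needle`."""
    pattern = re.compile(
        r"<([^>]*)>"                                 # opening tag, name in group 1
        r"(?:(?!%s).)*?" % re.escape(needle) +       # any char not starting needle
        r"</\1>\n?",                                 # matching closing tag
        re.DOTALL,
    )
    return pattern.sub("", text)


sample = (
    "<generic>\nstuff1\nstuff2\n</generic>\n"
    "<generic>\nstuff1\nDESIRED_STRING\nstuff3\n</generic>\n"
)
print(drop_blocks_without(sample, "DESIRED_STRING"))
```

Note that text outside any tag is left in place by this approach; only the unwanted blocks are deleted.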

RegEX: Matching everything but a specific value

How do I match everything in an HTML response except this piece of text?
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will generate two groups that hold the text before and after the excluded piece.
If the match is not successful, the text does not contain that piece.
If the text contains the text more than once, it is only the first time that is ignored.
If you need to remove all instances of the text you can just as well use a replace string method.
But usually it is a bad idea to use regex on html.
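A short Python sketch of both suggestions follows. The function names and the sample markup are made up for illustration, and re.DOTALL is an added assumption so that "." also spans the line breaks an HTML response normally contains.

```python
import re

# Group 1: everything before the attribute pair; group 2: everything after.
AROUND_RE = re.compile(r'^(.*?)"signed_request" value="[^"]*"(.*)$', re.DOTALL)
PIECE_RE = re.compile(r'"signed_request" value="[^"]*"')


def split_around(html):
    """Return (before, after) for the first occurrence, or None if absent."""
    match = AROUND_RE.match(html)
    if match is None:
        return None
    return match.group(1), match.group(2)


def strip_all(html):
    """Remove every occurrence with a plain substitution."""
    return PIECE_RE.sub("", html)


html = '<input name="signed_request" value="abc123" />\n<p>rest</p>'
print(split_around(html))
print(strip_all(html))
```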

Regex to find arguments in text

There's undoubtedly a better way to do this but this is the way my requirements need me to do this.
I'm creating a search form for my web application. I want to use a tagged based search. So I'm using regex to make it work.
So I have a search string: 'c:john customer:15478'
The regex needs to find the tag (c:) and the argument (john), drop the tag, and give me the argument -- and it needs to do so for all of the instances of a tag and their arguments. The regex I have comes close, but it doesn't work correctly. It doesn't grab every argument, or drop the tags in a consistent way. So the question: what's wrong with my regex that needs to be fixed in order to achieve the correct results?
Currently it finds the first tag, grabs its argument, and everything else after it. I need it to stop the match after it finds an argument. i.e. in the case above it will match john customer:15478
Maybe a better question is how do I make VB's regex return everything between the first colon, and the beginning of the next tag (which is followed by another colon) or otherwise stop matching at the beginning of the next tag?
Regex:
(?<=({0}({1})??:)+?)(\S+\s*\S*)(?=\s+?\b\w+:.+?)??
The {0} and the {1} represent a String.format call using a string, say Customer (but it could be anything), to define the tag. the {0} is the first character, and the {1} are the rest of the characters. This regex will match anything that exists behind the tag including another tag and its argument if it exists. So for the string
"c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:""192.168.0.1"""
The matches would be
'5401 4664, john smith, 215-548-5487 d:"192.168.0.1"'
'domain.com p:1548, "192.168.0.1"'
'joam d:domain.com, 1548 c:215-548-5487'
given the tags I have defined. The regex fails to stop its matching at the start of the next tag.
If I understood you correctly, this should solve the problem in general:
/\w+:([^:]+)(?:\s|$)/g
https://regex101.com/r/vN6fH1/1
and with defined tag it would look like this:
/{0}({1})?:([^:]+)(?:\s|$)/g
but this still relies on the colon, not on the tag name
(so it won't match at all if the tag name you pass does not occur in the string)
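To see how the general pattern behaves, here is a small Python sketch (the question is about VB, so treat this as an illustration only). The extra capturing group around the tag name and the function parse_query are additions, not part of the answer. It works because [^:]+ first runs up to the next colon and then backtracks to the last whitespace before the following tag.

```python
import re
from collections import defaultdict

# Group 1: tag name; group 2: argument, which may contain spaces but no colon.
ARG_RE = re.compile(r"(\w+):([^:]+)(?:\s|$)")


def parse_query(query):
    """Return a dict mapping each tag to the list of its arguments."""
    args = defaultdict(list)
    for tag, value in ARG_RE.findall(query):
        args[tag].append(value)
    return dict(args)


query = "c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487"
print(parse_query(query))
```

Arguments that themselves contain a colon (a URL or a time of day, for example) would not be handled by this pattern.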

Powershell with regex: Unable to find and replace ALL occurences of specified string in a set of data

I am new to regular expressions and stackoverflow. Any help would be greatly appreciated.
I am trying to remove unwanted data from a data set. The data is contained in a .csv file column with multiple cells, each cell containing data similar to this:
OSVDB #109124,OSVDB #109125,OSVDB #109126,OSVDB #109127,OSVDB #109128,OSVDB #109129,OSVDB #109130,OSVDB #109131,OSVDB #109132,OSVDB #109133,OSVDB #109134,OSVDB #109135,OSVDB #109136,OSVDB #109137,OSVDB #109138,OSVDB #109139,OSVDB #109140,OSVDB #109141,OSVDB #109142,OSVDB #109143,VMSA #2014-0012,OSVDB #102715,OSVDB #104972,OSVDB #106710,OSVDB #115364,IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
I want to replace the above data with each occurrence of the strings beginning "IAV...". So, the above cell would read:
IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
Below is a snippet of the script that imports the .csv and gets the column containing the data.
My regex, within powershell is:
$reg1 = '$1'
$reg2 = '(IAV[A|B]\s#[0-9]{4}-[A|B]-[0-9]{4}){1,}'
ForEach-Object {$_.IAVM = [regex]::replace($_.IAVM,$reg2,$reg1); $_}
The result is:
The entire cell contents posted above.
From my understanding {1,} at the end of the regex should return each occurrence of the string pattern, but I'm returning all contents of every cell containing my regex string.
Maybe instead of trying to pick out your string you just delete the stuff you don't want? Try something like:
$reg1=''
$reg2='((OSVDB|VMSA)\s#[M-S0-9-]{6,9}[,]?)'
You have .* at the very beginning of that regex. This will capture everything up to the last match of the pattern that follows it. In your case I don't think you need that part anyway.
Also note that PowerShell has a handy -replace operator, so there's often no reason to use the static methods on the Regex type.
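A replace call only rewrites the parts of the string that match and leaves everything else untouched, which is why substituting each match with itself returns the whole cell. A third option is to collect the matches and join them. The sketch below shows that in Python rather than PowerShell, purely to illustrate the approach; keep_iav is a hypothetical name, and the pattern uses [AB] instead of [A|B] because a pipe inside a character class is matched literally.

```python
import re

# One IAV entry, e.g. "IAVA #2014-A-0191".
IAV_RE = re.compile(r"IAV[AB] #\d{4}-[AB]-\d{4}")


def keep_iav(cell):
    """Return only the IAV entries of a cell, comma-separated."""
    return ",".join(IAV_RE.findall(cell))


cell = ("OSVDB #109143,VMSA #2014-0012,OSVDB #102715,"
        "IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2015-B-0007")
print(keep_iav(cell))
```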

Contents within an attribute for both single and multiple ending tags

How can I fetch the contents of the value attribute of the tags below across the files?
<h:graphicImage .... value="*1.png*" ...../>
<h:graphicImage .... value="*2.png*" ....>...</h:graphicImage>
My regular expression search should return:
1.png
2.png
All I could find was content for multiple ending tags but what about the single ending tags.
Use an XML parser instead. Regex cannot truly parse XML properly, unless you know the input will always follow a particular form.
However, here is a regex you can use to extract the value attribute of h:graphicImage tags, but read the caveats after:
<h:graphicImage[^>]+value="\*(.*?)\*"
and the 1.png or 2.png will be in the first captured group.
Caveats:
Here I have assumed that your 1.png, 2.png etc. are always surrounded by asterisks, as that is what it seems from your question (that is what the \* is for).
This regex will fail if one of the attributes has a ">" character in it, for example:
<h:graphicImage foo=">" value="*1.png*"
This is what I mentioned before about regex never being able to parse XML properly.
You could work around this by adjusting your regex:
<h:graphicImage.+?value="\*(.*?)\*"
But this means that if you had <h:graphicImage /><foo value="*1.png*"> then the 1.png from the foo tag is extracted, when you only want to extract from the graphicImage tag.
Again, regex will always have issues with corner cases for XML, so you need to adjust according to your application (for example, if you know that only the graphicImage tag will ever have a "value" attribute, then the second case may be better than the first).
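The first pattern and its main caveat can be checked with a few lines of Python. The function name image_values and the sample markup (including the id attributes) are assumptions made for the example.

```python
import re

# First pattern from the answer: the value must sit inside the same tag,
# and the file name is assumed to be wrapped in asterisks.
VALUE_RE = re.compile(r'<h:graphicImage[^>]+value="\*(.*?)\*"')


def image_values(markup):
    """Return the value attribute contents of all h:graphicImage tags."""
    return VALUE_RE.findall(markup)


sample = (
    '<h:graphicImage id="a" value="*1.png*" />\n'
    '<h:graphicImage id="b" value="*2.png*">text</h:graphicImage>'
)
print(image_values(sample))

# The caveat: a ">" inside an earlier attribute stops [^>]+ too early.
broken = '<h:graphicImage foo=">" value="*1.png*" />'
print(image_values(broken))
```

Both the self-closing and the paired form are covered because the pattern only looks at the opening tag.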