regex: return ini section as string

Using regex (I am using AutoHotkey, which uses PCRE), how can I match a section of an ini file? I don't need to get the individual keys, just the section block.
I've come up with this, which seems to match as long as there is a section after the sought section, but if it is the last section, it fails.
iniregex := "ms)(?<=^\[keys\]).*(?=^\[)"
Example: I want to get the entire contents of the section [keys], whilst excluding the comments and ignoring the empty lines (it should, however, capture test=2 but exclude the comment on that line):
[settings]
settings=0
;settings=1
[keys]
test=0
;test=1
test=2 ;comment
test=3
[nextsection]
this section has an empty and should be caught.
there is an empty line after this line, and it should be caught, too.
eof
I found this, but I'm not sure where to put the sought section name.

You cannot achieve this with a single regexp.
What you can do is use this regexp, based on the one you quoted, to extract the [keys] section without including the [keys] tag:
/^(?<=\[keys\]\r\n)(?:(?!^\[).)*(?=\r\n)/ms
Afterwards you can run this regexp on the extracted section to exclude comments and blank lines:
/^[^;\s][^;\r\n]*/gm
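For readers who want to try the two-step idea outside AutoHotkey, here is a minimal Python sketch. The names section_body and strip_comments are hypothetical, and it assumes the section header sits on its own line and is followed by a newline. The first step also accepts end of input as a terminator, so the last section of the file is handled as well.

```python
import re

INI_TEXT = """[settings]
settings=0
;settings=1
[keys]
test=0
;test=1
test=2 ;comment
test=3
[nextsection]
other=1
"""

def section_body(text, name):
    """Step 1: return the raw text between [name] and the next header (or end of input)."""
    pattern = re.compile(
        r"^\[" + re.escape(name) + r"\][ \t]*\r?\n(.*?)(?=^\[|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(text)
    return match.group(1) if match else None

def strip_comments(body):
    """Step 2: drop blank lines, full-line comments, and trailing ';' comments."""
    lines = re.findall(r"^[^;\s][^;\r\n]*", body, re.MULTILINE)
    return [line.rstrip() for line in lines]

keys = strip_comments(section_body(INI_TEXT, "keys"))
```

Here keys ends up holding the three uncommented entries of [keys], and section_body(INI_TEXT, "nextsection") still works although nothing follows that section.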

From your linked question, you would put the sought section name here:
(?ms)^\[keys](?:(?!^\[[^]\r\n]+]).)*
I don't think you'll be able to strip the comments out in the same regex as the capture, however. You'll have to do that in a secondary step.
Your regex fails if there is no section after [keys] because the lookahead (?=^\[) requires another section header to follow. Make the quantifier lazy and let the lookahead also accept the end of the input. Something like:
iniregex := "ms)(?<=^\[keys\]).*?(?=^\[|\z)"

Related

Regex that deletes everything except for any tags that contain a specific string inside them

I need a regex that can be applied in the vim editor or in bash (grep command) that will delete everything in a file, leaving only the tags containing a specific string:
<generic>
stuff1
stuff2
stuff3
</generic>
and
<generic>
stuff1
stuff2
DESIRED_STRING
stuff3
</generic>
The first one would be wiped and the second one would remain because of the DESIRED_STRING.
In the end, I need a file with lots of tags that each contain a modifier. This process will be executed several times to separate one huge file into multiple others.
This (?<=\<custom_item\>).*?(?=\<\/custom_item\>) got me to the point where I could match the content inside the tags, but I am not able to filter it.
The file will always follow this structure
<tag>
system : "Linux"
type : CHECK
</tag>
Where 'CHECK' is the modifier and the word I am looking for
Thank you!!
You may use this approach using awk:
awk '/<generic>/ { tag=1 }
tag && /DESIRED_STRING/ { p=1 }
tag { s = s $0 RS }
/<\/generic>/ { if (p) printf "%s", s; tag=p=0; s="" }' file
We use 2 flags to track our state here. tag represents state when we are inside open and close tags and p represents a state when we find our desired string while inside the open/close tags.
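The same two-flag state machine can be sketched in Python for anyone who prefers it to awk. The function name keep_blocks and its parameters are made up for this illustration, and like the awk version it assumes the tags do not nest.

```python
def keep_blocks(lines, open_tag, close_tag, needle):
    """Return only the lines of blocks that contain `needle`."""
    kept = []      # output lines
    buf = []       # lines of the block currently being read (awk's `s`)
    inside = False # awk's `tag` flag
    found = False  # awk's `p` flag
    for line in lines:
        if open_tag in line:
            inside = True
        if inside:
            if needle in line:
                found = True
            buf.append(line)
        if close_tag in line:
            if found:
                kept.extend(buf)
            # Reset the state for the next block.
            buf, inside, found = [], False, False
    return kept

sample = [
    "<generic>", "stuff1", "</generic>",
    "and",
    "<generic>", "stuff1", "DESIRED_STRING", "</generic>",
]
result = keep_blocks(sample, "<generic>", "</generic>", "DESIRED_STRING")
```

Lines outside any block (such as "and" above) are never buffered, so they are dropped along with the unwanted blocks.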
Here's an alternative, in Vim: it is much easier to match something than to avoid matching it, so...
Gmz:1,'z g/DESIRED_STRING/norm yat:$pu<Ctrl-V><Enter><Enter>'zdgg
where <Ctrl-V> and <Enter> are supposed to be keys, not actual text to be entered.
Gmz will set a z mark at the last line. Then, we search for the DESIRED_STRING, and at each one, yank the tag, then paste it to the bottom of the file (in order). Then 'zdgg to delete the original (from the mark z to the top of the file).
Basically, instead of trying to delete everything and making exceptions for the desired content, pull the desired content out first, then delete everything.
Bonus: This will work even with tags that don't align with line breaks (even though OP doesn't have those). For example,
outside<tag>inside
foo DESIRED_STRING inside</tag>outside
will correctly produce
<tag>inside
foo DESIRED_STRING inside</tag>
With Vim regex:
:%s/<\([^>]*\)>\(\_.\(DESIRED_STRING\)\@!\)\{-}<\/\1>//
This regex uses a negative lookahead, \@!, to match all blocks of text not containing DESIRED_STRING. These blocks are then removed with the :%s command.
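For comparison, a similar negative-lookahead substitution can be sketched in Python. This is an analogue rather than a translation of the Vim command: the lookahead here is checked before each character, the function name drop_blocks_without is hypothetical, and tags are assumed not to nest.

```python
import re

def drop_blocks_without(text, needle):
    """Delete every <tag>...</tag> block whose body does not contain `needle`."""
    pattern = re.compile(
        # Open tag, then characters that never start `needle`, then the matching close tag.
        r"<([^>]*)>(?:(?!" + re.escape(needle) + r").)*?</\1>\n?",
        re.DOTALL,
    )
    return pattern.sub("", text)

text = (
    "<generic>\nstuff1\nstuff2\nstuff3\n</generic>\n"
    "<generic>\nstuff1\nstuff2\nDESIRED_STRING\nstuff3\n</generic>\n"
)
cleaned = drop_blocks_without(text, "DESIRED_STRING")
```

A block that contains the needle can never be matched, because the lookahead stops the scan before the closing tag is reached, so only the other blocks are removed.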

Regex to find arguments in text

There's undoubtedly a better way to do this but this is the way my requirements need me to do this.
I'm creating a search form for my web application. I want to use a tagged based search. So I'm using regex to make it work.
So I have a search string: 'c:john customer:15478'
The regex needs to find the tag (c:) and the argument (john), drop the tag, and give me the argument -- and it needs to do so for all of the instances of a tag and their arguments. The regex I have comes close, but it doesn't work correctly. It doesn't grab every argument, or drop the tags in a consistent way. So the question: what's wrong with my regex that needs to be fixed in order to achieve the correct results?
Currently it finds the first tag, grabs its argument, and everything else after it. I need it to stop the match after it finds an argument. i.e. in the case above it will match john customer:15478
Maybe a better question is how do I make VB's regex return everything between the first colon, and the beginning of the next tag (which is followed by another colon) or otherwise stop matching at the beginning of the next tag?
Regex:
(?<=({0}({1})??:)+?)(\S+\s*\S*)(?=\s+?\b\w+:.+?)??
The {0} and the {1} are placeholders filled in by a String.Format call using a string, say Customer (but it could be anything), to define the tag. The {0} is the first character, and the {1} is the rest of the characters. This regex will match anything that exists behind the tag, including another tag and its argument if it exists. So for the string
"c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:""192.168.0.1"""
The matches would be
'5401 4664, john smith, 215-548-5487 d:"192.168.0.1"'
'domain.com p:1548, "192.168.0.1"'
'joam d:domain.com, 1548 c:215-548-5487'
given the tags I have defined. The regex fails to stop its matching at the start of the next tag.
If I understood you correctly, this should solve the problem in general:
/\w+:([^:]+)(?:\s|$)/g
https://regex101.com/r/vN6fH1/1
and with defined tag it would look like this:
/{0}({1})?:([^:]+)(?:\s|$)/g
but this still relies on the colon, not the tag name
(so it won't match at all if the tag name you pass is not in the string)
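To make the "stop at the start of the next tag" idea concrete, here is a small Python sketch (the question is about VB, so treat it as an illustration only). The name parse_tags is hypothetical; it assumes a tag is a run of word characters followed by a colon, and that a value ends where whitespace plus another tag begins.

```python
import re
from collections import defaultdict

# tag ':' value, where the lazy value stops before "<whitespace><word>:" or at the end.
TAG_ARG = re.compile(r"\b(\w+):(.*?)(?=\s+\w+:|\s*$)", re.DOTALL)

def parse_tags(query):
    """Group the argument of every tag occurrence under its tag name."""
    found = defaultdict(list)
    for match in TAG_ARG.finditer(query):
        found[match.group(1)].append(match.group(2).strip())
    return dict(found)

simple = parse_tags("c:john customer:15478")
full = parse_tags(
    'c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:"192.168.0.1"'
)
```

Because the value is matched lazily and the lookahead does not consume anything, an argument may contain spaces ("john smith") and still end right before the next tag.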

Contents within an attribute for both single and multiple ending tags

How can I fetch the contents of the value attribute of the tags below across my files?
<h:graphicImage .... value="*1.png*" ...../>
<h:graphicImage .... value="*2.png*" ....>...</h:graphicImage>
My regular expression search should result in
1.png
2.png
All I could find was how to get the content for tags with a separate closing tag, but what about self-closing tags?
Use an XML parser instead. Regex cannot truly parse XML properly unless you know the input will always follow a particular form.
However, here is a regex you can use to extract the value attribute of h:graphicImage tags, but read the caveats after:
<h:graphicImage[^>]+value="\*(.*?)\*"
and the 1.png or 2.png will be in the first captured group.
Caveats:
here I have assumed that your 1.png, 2.png etc are always surrounded by asterisks as that is what it seems from your question (that is what the \* is for)
this regex will fail if one of the attributes has a ">" character in it, for example
<h:graphicImage foo=">" value="*1.png*"
This is what I mentioned before about regex never being able to parse XML properly.
You could work around this by adjusting your regex:
<h:graphicImage.+?value="\*(.*?)\*"
But this means that if you had <h:graphicImage /><foo value="*1.png*"> then the 1.png from the foo tag is extracted, when you only want to extract from the graphicImage tag.
Again, regex will always have issues with corner cases for XML, so you need to adjust according to your application (for example, if you know that only the graphicImage tag will ever have a "value" attribute, then the second case may be better than the first).
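As a rough illustration of the parser route, here is a Python sketch using the standard library's xml.etree.ElementTree. It assumes the file is well-formed XML and that the h prefix is bound to the JSF HTML namespace URI shown below; the wrapping root element, the sample attributes, and the name image_values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace binding for the "h" prefix; adjust to whatever the real files declare.
H_NS = "http://java.sun.com/jsf/html"

SAMPLE = """<root xmlns:h="http://java.sun.com/jsf/html">
  <h:graphicImage foo=">" value="*1.png*"/>
  <h:graphicImage value="*2.png*">text</h:graphicImage>
  <foo value="*3.png*"/>
</root>"""

def image_values(xml_text):
    """Collect the value attribute of every h:graphicImage, without the asterisks."""
    root = ET.fromstring(xml_text)
    values = []
    for element in root.iter("{%s}graphicImage" % H_NS):
        value = element.get("value")
        if value:
            values.append(value.strip("*"))
    return values

images = image_values(SAMPLE)
```

Both corner cases from the caveats are handled by the parser: the ">" inside another attribute is harmless, and the value on the unrelated foo tag is ignored.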

Replace the very first matching pattern on later lines in Vim

I have a long text where some lines need to be repeated later.
I put tags like this in the text:
{F1}text need to be repeated later{/F1}
so I can add multiple {F1}{/F1} to later sections and put the contents of the first line between them.
The problem is that there will be a lot of tags like this, such as {F2}{/F2} etc., and this pattern matches all of those too:
{\(.*\)}.*{\/\1}
So, I want to search for the first occurrence of each different tag and replace the later ones with it, so that when I change the first line and run the substitute again, all of the lines will be updated, maybe automatically with an autocmd BufWrite.
How could I do this? I accept any solution, not necessarily using my idea of marking the first lines with {}{/} tags. There will be a lot of tags and I don't want to do it one-by-one with individual substitute commands.
I tried with this:
:g/{\(.*\)}\(.*\){\/\1}/s/{\1}.*{\/\1}/{\1}\2{\/\1}/
but it says:
E65 Illegal back reference.
The ReplicateTags() function that is listed below runs a substitution
command replacing contents of each tag (according to its description in the
question) with text in the first occurrence of that tag. The substitution
operates on the whole buffer and processes all of the tags in one pass
(accepting multiline non-overlapping tags). The function returns a dictionary
that maps tag names to contents of their first occurrence.
function! ReplicateTags()
  let dict = {}
  %s/{\([^}]\+\)}\(\_.\{-}\){\/\1}/\=Tag(dict, submatch(1), submatch(2))/ge
  return dict
endfunction

function! Tag(dict, tag, str)
  let a:dict[a:tag] = get(a:dict, a:tag, a:str)
  return printf('{%s}%s{/%s}', a:tag, a:dict[a:tag], a:tag)
endfunction
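The same one-pass idea (remember the first body seen for each tag, then reuse it for every later occurrence) can be sketched in Python with a replacement callback. This is only an illustration of the technique outside Vim; the name replicate_tags is hypothetical.

```python
import re

TAG_BLOCK = re.compile(r"\{([^}]+)\}(.*?)\{/\1\}", re.DOTALL)

def replicate_tags(text):
    """Fill every {X}...{/X} with the body of the first {X} block.

    Returns the new text and a dict mapping tag names to their first body.
    """
    first_body = {}

    def replace(match):
        tag, body = match.group(1), match.group(2)
        # Only the first occurrence of a tag gets to define its body.
        first_body.setdefault(tag, body)
        return "{%s}%s{/%s}" % (tag, first_body[tag], tag)

    return TAG_BLOCK.sub(replace, text), first_body

source = "{F1}hello{/F1}\n{F2}x{/F2}\nlater {F1}{/F1} and {F2}old{/F2}"
updated, bodies = replicate_tags(source)
```

Running it again after editing the first {F1} line propagates the change to all later {F1} blocks, which is the behaviour asked for in the question.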

Regexp for finding tags without nested tags

I'm trying to write a regexp which will help to find non-translated texts in html code.
Translated texts are those that go through a special tag, <fmt:message>, or through the construction ${...}
Ex. non-translated:
<h1>Hello</h1>
Translated texts are:
<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>
I've written the following expression:
\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>
It finds correct strings like:
<p>text</p>
Correctly skips
<a><fmt:message key="common.delete" /></a>
But also catches:
<li><p><fmt:message key="common.delete" /></p></li>
And I can't figure out how to add exception for ${...} strings in this expression
Can anybody help me?
If I understand you properly, you want to ensure the data inside the "tag" doesn't contain fmt:message or ${....}
You might be able to use a negative lookahead in conjunction with a . to assert that the characters captured by the . are not one of those cases:
/<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|<\/\1>).)*<\/\1>/i
If you want to avoid capturing any "tags" inside the tag, you can drop the <fmt:message alternative and use [^<] instead of . so that only non-< characters match:
/<(\w+)[^>]*>(?:(?!\$\{)[^<])*<\/\1>/i
Added from a comment: if you also want to exclude "empty" tags, add another negative lookahead, this time (?!\s*<), to ensure that the content of the tag is not empty or whitespace-only:
/<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*<\/\1>/i
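A quick way to check this last pattern against the examples from the question is a short Python sketch. The pattern is the one above written for Python's re module; the function name find_untranslated is made up here.

```python
import re

UNTRANSLATED = re.compile(
    r"<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*</\1>",
    re.IGNORECASE,
)

def find_untranslated(html):
    """Return every element whose content is plain text (no child tag, no ${...})."""
    return [match.group(0) for match in UNTRANSLATED.finditer(html)]

page = "\n".join([
    "<h1>Hello</h1>",
    '<h1><fmt:message key="hello" /></h1>',
    "<button>${expression}</button>",
    '<li><p><fmt:message key="common.delete" /></p></li>',
])
hits = find_untranslated(page)
```

Only the plain <h1>Hello</h1> is reported: the [^<] class rejects nested tags, the (?!\$\{) lookahead rejects expressions, and (?!\s*<) rejects empty elements.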
If the format is simple as in your examples you can try this:
<(\w+)>(?:(?!<fmt:message).)+</\1>
Rewritten into a more formal question:
Can you match
aba
but not
aca
without catching
abcba ?
Yes.
FSM:
Start->A->B->A->Terminate
Insert abcba and run it
Start is ready for input.
a -> MATCH, transition to A
b -> MATCH, transition to B
c -> FAIL, return fail.
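The walk-through above can be written out as a tiny state machine in Python. This is just a sketch of that FSM, with the state stored as an index into the expected string; the name accepts_aba is hypothetical.

```python
def accepts_aba(text):
    """Run the Start -> A -> B -> A -> Terminate machine over `text`."""
    expected = "aba"
    state = 0  # number of characters matched so far
    for char in text:
        # Any unexpected or extra character is an immediate failure.
        if state >= len(expected) or char != expected[state]:
            return False
        state += 1
    return state == len(expected)
```

With this, accepts_aba("abcba") fails on the third character, exactly as in the trace.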
I've used a simple one like this with success,
<([^>]+)[^>]*>([^<]*)</\1>
Of course, if there is any CDATA with '<' in it, this is not going to work so well, but it should do fine for simple XML.
Also see
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
for a discussion of using regex to parse HTML.
Executive summary: don't.