Vim search replace regex + incremental function - regex

I'm currently stuck in Vim trying to find a search/replace one-liner that replaces a number with another value and increments that value each time it finds a new match.
I'm working on XML/SVG code to batch process files, because Inkscape cannot process the text (plain SVG multiline text bug).
<tspan
x="938.91315"
y="783.20563"
id="tspan13017"
style="font-weight:bold">Text1:</tspan><tspan
x="938.91315"
y="833.20563"
id="tspan13019">Text2</tspan><tspan
x="938.91315"
y="883.20563"
id="tspan13021">✗Text3</tspan>
etc.
So what I want to do is to change that to this result:
<tspan
x="938.91315"
y="200"
id="tspan13017"
style="font-weight:bold">Text1:</tspan><tspan
x="938.91315"
y="240"
id="tspan13019">Text2</tspan><tspan
x="938.91315"
y="280"
id="tspan13021">✗Text3</tspan>
etc.
So I duckducked and found the best vim tips resource from zzapper, but I cannot understand it:
convert yy to 10,11,12 :
:let i=10 | 'a,'bg/Abc/s/yy/\=i/ |let i=i+1
I then adapted it to something I can understand and should work in my home vim:
:let i=300 | 327,$ smagic ! y=\"[0-9]\+.[0-9]\+\" ! \=i ! g | let i=i+50
But somehow it doesn't loop; all I get is this:
<tspan
x="938.91315"
300
id="tspan13017"
style="font-weight:bold">Text1:</tspan><tspan
x="938.91315"
300
id="tspan13019">Text2</tspan><tspan
x="938.91315"
300
id="tspan13021">✗Text3</tspan>
So here I'm seriously stuck. I cannot figure out what doesn't work :
My adaptation of the original formula ?
My data layout ?
My .vimrc ?
I'll try to find other resources myself, but for this kind of trick they are pretty rare, and, as with the zzapper tips, they don't always come with an explanation.

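Your command runs :s once over the whole range and only afterwards executes let i=i+50 a single time, so every match is replaced with the same value. Wrapping the substitution in :g runs both commands once per matching line.
One way to fix it: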
:let i = 300 | g/\m\<y=/ s/\my="\zs\d\+.\d\+\ze"/\=i/ | let i += 50
Translation:
let i = 300 - hopefully obvious
g/\m\<y=/ ... - for all lines matching \m\<y=, apply the following command; the "following command" is s/.../.../ | let ...; the regexp:
\m - "magic" regexp
\< - match only at word boundary
s/\my="\zs\d\+.\d\+\ze"/\=i/ - substitute; the regexp:
\m - "magic" regexp
\d\+ - one or more digits
\zs...\ze - replace only what is matched between these points
\=i - replace with the value of expression i
let i += 50 - hopefully obvious again.
For more information: :help :g, :help \zs, :help \ze, :help s/\\=.
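For readers who want to check the idea outside Vim, the same "replace each match with a counter that grows per match" logic can be sketched in Python. This is only an illustration, not part of the Vim answer: the function name renumber_y and its start/step parameters are assumptions chosen to mirror i=300 and the increment of 50.

```python
import re
from itertools import count


def renumber_y(svg_text, start=300, step=50):
    """Replace each y="<number>" value with start, start+step, ... in order."""
    counter = count(start, step)  # hypothetical helper state, one value per match
    # \b plays the role of Vim's \< so that attributes such as ry="..." are skipped.
    pattern = re.compile(r'(\by=")\d+\.\d+(")')
    # The replacement function is called once per match, like \=i under :g.
    return pattern.sub(
        lambda m: f"{m.group(1)}{next(counter)}{m.group(2)}", svg_text
    )


sample = (
    '<tspan\nx="938.91315"\ny="783.20563"\nid="tspan13017">Text1:</tspan><tspan\n'
    'x="938.91315"\ny="833.20563"\nid="tspan13019">Text2</tspan>'
)
print(renumber_y(sample))
```

Passing a function as the replacement is what makes the counter advance per match, which is the part the original :s command was missing.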

Just to add my take as a memo (wrote this as an answer as an EDIT didn't seem right). Sorry it is not the best vim scripting here but it enables me to understand (I'm not a vim specialist).
:let i=300 | 323,$g/y="/smagic![0-9]\+.[0-9]\+!\=i!g | let i+=50
Assign the initial value to i :
:let i=300
Start the :global (:g) command from line 323 to the end of the file:
323,$g
Pattern to match for executing the commands (literal text here)
y="
Substitution with magic on (magic meaning special characters "enabled")
smagic
Pattern to find
[0-9]\+.[0-9]\+
(one or more digits 0-9, then any single character, then one or more digits again; note that an unescaped . matches any character, so write \. to require a literal dot)
Replaced with
\=i
\= tells Vim to evaluate i as an expression instead of writing it literally
Increment i with 50 for the next iteration
let i+=50
This part is still inside the :g command.
The separators:
| separates the different commands
/ is the separator in the :g command
! is the separator in the smagic command

Related

RegEx to format Wikipedia's infoboxes code [SOLVED]

I am a contributor to Wikipedia and I would like to make a script with AutoHotKey that could format the wikicode of infoboxes and other similar templates.
Infoboxes are templates that display a box on the side of articles showing the values of the parameters entered (the parameters are numerous and differ in number, length, and type of characters used depending on the infobox).
Parameters are always preceded by a pipe (|) and end with an equal sign (=). On rare occasions, multiple parameters can be put on the same line, but I can sort this manually before running the script.
A typical infobox will be like this:
{{Infobox XYZ
| first parameter = foo
| second_parameter =
| 3rd parameter = bar
| 4th = bazzzzz
| 5th =
| etc. =
}}
But sometimes (lazy) contributors write them like this:
{{Infobox XYZ
|first parameter=foo
|second_parameter=
|3rd parameter=bar
|4th=bazzzzz
|5th=
|etc.=
}}
Which isn't very easy to read and modify.
I would like to know if it is possible to make a regex (or a series of regexes) that would transform the second example into the first.
The lines should start with a space, then a pipe, then another space, then the parameter name, then any number of spaces (to match the length of the other lines), then an equal sign, then another space, and if present, the parameter value.
I tried some things using multiple capturing groups, but I'm going nowhere... (I'm even ashamed to show my attempts as they really don't work).
Would someone have an idea on how to make it work?
Thank you for your time.
The lines should start with a space, then a pipe, then another space, then the parameter name, then a space, then an equal sign, then another space, and if present, the parameter value.
First the selection, it's relatively trivial:
^\s*\|\s*([^=]*?)\s*=(.*)$
Then the replacement, literally your description of what you want (note the space at the beginning):
| $1 = $2
See it in action here.
@Blindy:
The best code I have found so far is the following : https://regex101.com/r/GunrUg/1
The problem is it doesn't align the equal signs vertically...
I got an answer on AutoHotKey forums:
^i::
out := ""
Send, ^x
regex := "O)\s*\|\s*(.*?)\s*=\s*(.*)", width := 1
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        width := Max(width, StrLen(_[1]))
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        out .= Format(" | {:-" width "} = {2}", _[1],_[2]) "`n"
    else
        out .= A_LoopField "`n"
Clipboard := out
Send, ^v
Return
With this script, pressing Ctrl+i formats the infobox code just right (I guess a simple regex isn't enough to do the job).
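The two-pass idea in the AutoHotKey script (first measure the longest parameter name, then pad every name to that width) can also be sketched in Python. This is a rough equivalent for illustration only: the name align_infobox is an assumption, and stripping the trailing space after an empty value is a choice of this sketch, not something the script above does.

```python
import re

# Same shape as the regex in the script: pipe, parameter name, "=", value.
PARAM = re.compile(r"^\s*\|\s*(.*?)\s*=\s*(.*)$")


def align_infobox(text):
    """Align the "=" signs of infobox parameter lines; other lines pass through."""
    lines = text.splitlines()
    matches = [PARAM.match(line) for line in lines]
    # Pass 1: width of the longest parameter name.
    width = max((len(m.group(1)) for m in matches if m), default=1)
    out = []
    # Pass 2: left-align each name in a field of that width.
    for line, m in zip(lines, matches):
        if m:
            out.append(f" | {m.group(1):<{width}} = {m.group(2)}".rstrip())
        else:
            out.append(line)
    return "\n".join(out)


sample = """{{Infobox XYZ
|first parameter=foo
|second_parameter=
|4th=bazzzzz
}}"""
print(align_infobox(sample))
```

A single substitution cannot do this because the padding depends on all lines at once, which is why both versions read the text twice.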

regex match all but not specific string

I have an XML file and want to remove everything but a specific string.
There are quite a few similar questions on StackOverflow, but none of them works for my file, and after a few hours of trying different regexes I would like to ask for help.
So far the closest regex, which succeeded partly but not completely, is:
^((?!<query.*<\/query>).)*$
a sample of the xml file:
<search>
<query>index=_internal [`set_local_host`] source=*license_usage.log* type="Usage" | eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) | eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) | eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) | bin _time span=1d | stats sum(b) as b by _time, pool, s, st, h, idx | timechart span=1d sum(b) AS volumeB by st fixedrange=false | join type=outer _time [search index=_internal [`set_local_host`] source=*license_usage.log* type="RolloverSummary" | eval _time=_time - 43200 | bin _time span=1d | stats latest(stacksz) AS "stack size" by _time] | fields - _timediff | foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)] </query>
<earliest>$central_time.earliest$</earliest>
<latest>$central_time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="trellis.scales.shared">1</option>
<option name="trellis.size">medium</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<search>
<query>index=_introspection sourcetype=splunk_resource_usage component=hostwide saxsa
| eval tcu = ('data.cpu_system_pct' + 'data.cpu_user_pct')
| timechart limit=0 span=1d avg(tcu) by host</query>
<earliest>$central_time.earliest$</earliest>
<latest>$central_time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
I use regex101, so the sample can be pasted there to see why the regex works only partly. In short, it doesn't match the first occurrence of <query> but it matches the second occurrence. What I expect is that the regex does not match any occurrence of <query>.*</query>
For example, I want to match anything but not the following string:
<query>anything between(can be multiple lines*)</query>
Sorry for the delayed response. Partially because I'm at work, partially due to the fact that this type of situation was actually rather new to me (I love regular expressions, but I haven't had any exposure to this situation so it's a learning experience for both of us), but I think I may have a solution that you're looking for.
What I basically tried to do was use a bit of recursion in the expression with a combination of a negative look-ahead and negative lookbehind to ensure I'm not capturing any <query> tags
<(?!query).*(?<!<\/query)(?R)*>
< - Matches the literal character < to match the beginning of an opening tag
(?!query) - a negative look-ahead that ensures the text following < is not query, so any opening tag except query can match
.* - Matches all of the characters (including the > of the opening tag) up until:
(?<!<\/query) which is a negative look-behind assertion to ensure I'm not getting text from the .* with anything before a </query closing tag (note the missing >).
(?R)* - This is the part that took me a while to wrap my mind around, so I may possibly butcher this explanation because I haven't used this before. It recurses the entire regular expression from the current string position. Now I know that sounded confusing because it was (still is slightly) confusing to me too. But, I believe once <(?!query).*(?<!<\/query) finds its first match, which would be <search>, from there, it repeats that entire pattern from the end of <search>. Thus, it will then check for an opening <query and a closing </query tag. And if it finds it, it skips it.
> - Matches the literal closing tag >, in which if the XML is written correctly, it should match <search's closing tag.
Test of your example with the following regex here
I sincerely hope this helps!
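As a side note, Python's standard re module does not support the (?R) recursion used above, so the sketch below swaps in a simpler technique: match the <query>...</query> blocks directly with a lazy, dot-matches-newline pattern and then either keep only those blocks or delete them. The function names are made up for this example, and it assumes query elements are not nested.

```python
import re

# re.DOTALL lets "." cross line breaks; the lazy ".*?" stops at the first </query>.
QUERY = re.compile(r"<query>.*?</query>", re.DOTALL)


def keep_only_queries(xml_text):
    """Return just the <query>...</query> blocks, one after another."""
    return "\n".join(QUERY.findall(xml_text))


def drop_queries(xml_text):
    """Return the text with every <query>...</query> block removed."""
    return QUERY.sub("", xml_text)


sample = (
    "<search>\n<query>index=_internal | stats count</query>\n"
    "<earliest>x</earliest>\n</search>\n"
    "<search>\n<query>index=_introspection\n| timechart avg(tcu)</query>\n</search>"
)
print(keep_only_queries(sample))
```

Matching the part you want to keep (or remove) is usually easier to reason about than writing a pattern for "everything except it".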

Regex to Capture and wrap outline formatted text

I have source text that is not particularly clean or well formed but I have a need to find text and wrap a line in a tag. The text is in outline format.
1. becomes a <h1> tag
A. becomes a <h2> tag
(1) becomes a <h3> tag
and so on...
Here are some examples of the source.
1. PREPARE FOR TEST A. Open the door. B. Turn on the light.
The desired result would be
<h1>1. PREPARE FOR TEST</h1>
<h2>A. Open the door.</h2>
<h2>B. Turn on the light.</h2>
Unfortunately, the text could be on the same line, or it could be on multiple lines, or it could even have a different number of spaces between the outline number and the text. Another example:
(1) Check air inlet and air outlet valves are shown open if OAT is above > 53.6 deg F., or closed if OAT is below
48.2 deg F.
In this case the desired result would be
<h3>(1) Check skin air inlet and skin air outlet valves are shown open if temperature is above 53.6 deg F., or closed if temperature is below 48.2 deg F.</h3>
My questions are
How do I find an entire line of text that is associated with an outline level, i.e., the 1., A., (1) and so on.
How do I then wrap that text with the appropriate tag.
I'm not particularly strong at regex, I have been able to do some of the simpler things required of this project but this has me stumped a bit. Here's what I used to try to find the H1 lines, but as anyone that knows regex can plainly see, this won't work past the first word.
\d{1,3}.\s+[A-Z]{2,}
I'm using Python at the moment, but I'm better with PHP than with Python and may still move to it if needed.
Thank you.
Since every regex needs a different substitution, you need to apply each regex in turn. Assuming that you want the match to always span an entire line, I'd suggest something like this:
import re
s = """1. becomes a h1 tag
A. becomes a h2 tag
(1) becomes a h3 tag
and so on..."""
regexes = {r"\d+\.": "h1",
           r"[A-Z]+\.": "h2",
           r"\(\d+\)": "h3",
          }
for regex in regexes:
    repl = regexes[regex]
    s = re.sub("(?m)^" + regex + ".*", "<" + repl + ">" + r"\g<0>" + "</" + repl + ">", s)
print(s)
Result:
<h1>1. becomes a h1 tag</h1>
<h2>A. becomes a h2 tag</h2>
<h3>(1) becomes a h3 tag</h3>
and so on...
Explanation:
Each of the regexes (which only match the actual identifiers) is modified to match from the start of the line until the end of the line:
"(?m)^" + regex + ".*" # (?m) allows ^ to match at the start of lines
The entire match is contained in group 0 which can be accessed in the replacement string via \g<0>.
"<" + repl + ">" + r"\g<0>" + "</" + repl + ">" # add tags around line
For future reference and to close this, what I eventually came up with was to run through the entire string of text and remove some trash first. There are actually 15 of these that I use for this step.
$regexes['lf'] = "/[\n\r]*/";
$regexes['tab-cr-lf'] = "/\t[\r\n]/";
$string = preg_replace($regexes, "", $string);
I then discovered that I could count on space and \t after each header identifier, so then I run some more regexes on the string
$regexes['step1'] = "/(\d{1,2}\..\t)/";
$regexes['step2'] = "/([A-Z]\. \t)/";
$replacements['step1'] = "\n\n<step1>$0";
$replacements['step2'] = "\n\n<step2>$0";
$string = preg_replace($this->headerRegexes, $replacements, $string);
These steps have given me some usable text that I can work with.
Thanks to everyone who chimed in; it gave me some things to think about as I tackled this problem.

Format a text file by regex match and replace

I have a text file that looks like the following:
Chanelle
Jettie
Winnie
Jen
Shella
Krysta
Tish
Monika
Lynwood
Danae
2649
2466
2890
2224
2829
2427
2816
2648
2833
2453
I need to make it look like this
Chanelle 2649
Jettie 2466
... ...
I tried a lot in the Sublime editor but couldn't figure out the regex to do that. Can somebody demonstrate whether it can be done?
I tested the following in Notepad++ but it should work universally.
Use this as the search string:
(?:(\s+[A-Za-z]+)(\r?\n))((?:\s*[A-Za-z]*\r?\n)+)\s+(\d+)
and this as the replacement:
$1 $4$2$3
Running a replace with it once will do one line at a time; if you run it multiple times, it will continue to replace lines until there are no matching lines left.
Alternatively, you can use this as the replacement if you want to have the values aligned by tabs, but it's not going to match in all cases:
$1\t\t$4$2$3
While the regex answer by SeinopSys will work, you don't need a regex to do this - instead, you can take advantage of Sublime's multiple cursors.
1. Place your cursor at the beginning of line 1, then hold down Shift+↓ to select all the names.
2. Hit Ctrl+Shift+L (Selection -> Split into Lines) to split the selection into lines.
3. Ctrl+C to copy.
4. Place your cursor on line 11 (the first number line) and press Ctrl+Shift+↓ (Windows/OS X) or Alt+Shift+↓ (Linux) to place a cursor at the beginning of each number line.
5. Hit Ctrl+V to paste the names before the numbers.
You can now delete the names at the top and you're all set. Alternatively, you could use Ctrl+X to cut the names in step 3.
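If the file is too large to edit comfortably by hand, the same pairing can be scripted. The sketch below is an assumption-laden illustration rather than either answerer's method: it assumes the first half of the non-empty lines are names and the second half are the matching numbers, in the same order, and pair_halves is a made-up name.

```python
def pair_halves(text):
    """Join line i of the first half with line i of the second half."""
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    half = len(lines) // 2
    names, numbers = lines[:half], lines[half:]
    return "\n".join(f"{name} {number}" for name, number in zip(names, numbers))


sample = "Chanelle\nJettie\nWinnie\n2649\n2466\n2890"
print(pair_halves(sample))
```

This sidesteps the regex entirely, in the same spirit as the multiple-cursor approach.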

Format all IP-Addresses to 3 digits

I'd like to use the search & replace dialogue in UltraEdit (Perl Compatible Regular Expressions) to format a list of IPs into a standard format.
The list contains:
192.168.1.1
123.231.123.2
23.44.193.21
It should be formatted like this:
192.168.001.001
123.231.123.002
023.044.193.021
The RegEx from http://www.regextester.com/regular+expression+examples.html for IPv4 in the PCRE-Format is not working properly:
^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]){3}$
I'm stuck. Does anybody have a proper solution which works in UltraEdit?
Thanks in advance!
Set the regular expression engine to Perl (in the advanced section) and replace this:
(?<!\d)(\d\d?)(?!\d)
with this:
0$1
twice. That should do it.
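To see why two passes are enough, the same search and replacement can be tried in Python, whose re module supports the look-behind and look-ahead used here. This is only a sketch for checking the logic outside UltraEdit; pad_octets is a made-up name, and \g<1> is Python's spelling of $1.

```python
import re

# A run of one or two digits that is not part of a longer run of digits.
SHORT_OCTET = re.compile(r"(?<!\d)(\d\d?)(?!\d)")


def pad_octets(text):
    """Apply the replacement twice: 1 -> 01 -> 001, 21 -> 021 (then unchanged)."""
    for _ in range(2):
        text = SHORT_OCTET.sub(r"0\g<1>", text)
    return text


print(pad_octets("192.168.1.1\n123.231.123.2\n23.44.193.21"))
```

Each pass adds at most one zero per octet, and three-digit octets never match, so a third pass would change nothing.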
If your input is a single IP address (per line) and nothing else (no other text), this approach will work:
I used "Replace All" with Perl style regular expressions:
Replace (?<!\d)(?=\d\d?(?=[.\s]|$))
with 0
Just replace as often as it matches. If there is other text, things will get more complicated. Maybe the "Search in Column" option is helpful here, in case you are dealing with CSV.
If this is just a one-off data cleaning job, I often just use Excel or OpenOffice Calc for this type of thing:
Open your textfile and make sure only one IP address per line.
Open Excel or a similar spreadsheet, go to "Data|Import External Data", and import your text file using "." as the separator.
You should now have 4 columns in excel:
192 | 168 | 1 | 1
Right click and format each column as a number with 3 digits and leading zeroes.
In column 5 just do a string concatenation of the previous columns with a "." in between each column:
A1 & "." & B1 & "." & C1 & "." & D1
This obviously is a cheap and dirty fix and is not a programmatic way of dealing with this, but I find this sort of technique useful for cleaning up data every now and then.
I'm not sure how to use a regular expression in the Replace With box in UltraEdit.
You can use this regular expression to find your string:
^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])$