In VQMod it is possible to perform the same operations on multiple files by separating the file paths with a comma, e.g.
<file name="path/to/file1.php, path/to/file2.php, path/to/file3.php">
I want to add the same bit of code after three different lines of code in one file. Is there a way to do this with one search instead of three operations?
<search position="after"><![CDATA[<?php //some code here ?>, <?php some more code; ?>, <?php //more code; ?>]]></search>
It is possible to use regular expressions to specify the three search terms in one operation:
<search regex="true" position="after"><![CDATA[~regex-here-including-delimiters~]]></search>
So, for our three bits of code above, we get:
<search regex="true" position="after"><![CDATA[~<\?php //some code here \?>|<\?php some more code; \?>|<\?php //more code; \?>~]]></search>
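VQMod itself is written in PHP, so the snippet below is only a language-neutral sketch, with Python's re module standing in for PHP's regex functions. It builds the same alternation from the three literal lines with re.escape, which also keeps stray characters (such as a space next to a |) out of the pattern. The TARGETS list and the insert_after_matches helper are hypothetical and only mimic what position="after" does.

```python
import re

# Hypothetical target lines, copied from the question.
TARGETS = [
    "<?php //some code here ?>",
    "<?php some more code; ?>",
    "<?php //more code; ?>",
]

# re.escape makes each literal line safe to use as a pattern (it escapes '?'),
# and '|' joins them into one alternation, like the ~...|...|...~ pattern above.
PATTERN = re.compile("|".join(re.escape(t) for t in TARGETS))


def insert_after_matches(source: str, addition: str) -> str:
    """Hypothetical emulation of position="after": add `addition` after each matching line."""
    out = []
    for line in source.splitlines():
        out.append(line)
        if PATTERN.search(line):
            out.append(addition)
    return "\n".join(out)
```

With one compiled alternation, a single pass over the file handles all three anchors, which is the one-search behaviour the question asks for.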
I have the following XML code:
<quantity1 value="foo" name="bar">
<subquantity duration="2">
<parameter unit="meters" />
</subquantity>
</quantity1>
I want to export all name values for further analysis in another document, but only for elements that contain a certain attribute value further down. For example, how can I use a regex to find all names where unit="meters"?
Bonus points if you can explain how to do this in Notepad++. Open to other suggestions/SO posts as well.
Regular expressions are the wrong tool for parsing XML.
Use XPath instead, from XSLT, a scripting language, or xmlstarlet.
Examples:
//quantity1[subquantity/parameter/@unit="meters"]/@name
//*[*/*/@unit="meters"]/@name
//*[.//@unit="meters"]/@name
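As a sketch of the first XPath expression used from a scripting language, here is a version with Python's standard xml.etree.ElementTree. ElementTree supports only a subset of XPath (no nested paths inside a predicate), so the inner condition is checked with find() on each quantity1 element. The <root> wrapper and the second quantity1 element are made up for illustration, and the helper assumes the unit value contains no quote characters.

```python
import xml.etree.ElementTree as ET


def names_with_unit(xml_text: str, unit: str) -> list:
    """Return the name attribute of each quantity1 element that has a
    subquantity/parameter descendant path whose unit attribute equals `unit`."""
    root = ET.fromstring(xml_text)
    names = []
    for quantity in root.iter("quantity1"):
        # Assumes `unit` contains no quote characters (it is pasted into the path).
        if quantity.find(f"subquantity/parameter[@unit='{unit}']") is not None:
            names.append(quantity.get("name"))
    return names


# Hypothetical document: the question's fragment wrapped in <root>, plus a
# second, made-up quantity1 that should not match "meters".
SAMPLE = """<root>
  <quantity1 value="foo" name="bar">
    <subquantity duration="2">
      <parameter unit="meters" />
    </subquantity>
  </quantity1>
  <quantity1 value="baz" name="qux">
    <subquantity duration="3">
      <parameter unit="seconds" />
    </subquantity>
  </quantity1>
</root>"""

print(names_with_unit(SAMPLE, "meters"))  # ['bar']
```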
I'm trying to modify existing XML text so that an opening and a closing tag are added inside specific tags.
I'm trying to do this with a regex in Perl, but I'm doing something wrong.
First, an example of the original text:
......
<xvcs:insert id="1" name="test1" data="mydata"><b>Test1</b></div>
<xvcs:insert id="2" name="test2" class="result">Test2</div>
.....
I want to add <span class="test"> after every <xvcs:insert ...> opening tag and </span> before every </xvcs:insert> closing tag.
Thus it should be:
<xvcs:insert id="1" name="test1" data="mydata"><span class="test"><b>Test1</b></span></div>
<xvcs:insert id="2" name="test2" class="result"><span class="test">Test2</span></div>
What I got so far is:
$newtext =~ s/(\<xvcs\:insert(.)+\>)/$1<span class="test">/g;
$newtext =~ s/(\<\/xvcs\:insert\>)/<\/span>$1/g;
But it doesn't work as intended: the first substitution also inserts the span after <b>, which is not what I want.
So how could this be done better or more robustly? (Or what did I do wrong?)
(The result has to be a string, which is why I chose regexes: I don't want to split the text into an array and then join the elements back into a string one by one.)
You have:
s/(\<xvcs\:insert(.)+\>)/$1<span class="test">/g
Cleaned up:
s/(<xvcs:insert.+>)/$1<span class="test">/g
The problem is that .+ is too permissive: it can run past the end of the opening tag, so the following > can match a later one, such as the one in "<b>". Fixed:
s/(<xvcs:insert[^>]*>)/$1<span class="test">/g;
All together:
$newtext =~ s{(<xvcs:insert[^>]*>)}{$1<span class="test">}g;
$newtext =~ s{(</xvcs:insert>)}{</span>$1}g;
Or, if you have Perl 5.10+ (for \K):
$newtext =~ s{<xvcs:insert[^>]*>\K}{<span class="test">}g;
$newtext =~ s{(?=</xvcs:insert>)}{</span>}g;
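For comparison, here is the same two-step substitution in Python, assuming the elements are closed with </xvcs:insert> as the regexes above expect. Python's built-in re module has no \K, so this sketch keeps the capture-group form for the opening tag and the lookahead form for the closing tag; wrap_insert_contents is a hypothetical name.

```python
import re

# [^>]* cannot run past the first '>', so only the opening xvcs:insert tag is
# captured, never a later '>' such as the one in <b>.
OPEN_TAG = re.compile(r"(<xvcs:insert[^>]*>)")
# Zero-width lookahead: matches the empty position just before each closing tag.
CLOSE_TAG = re.compile(r"(?=</xvcs:insert>)")


def wrap_insert_contents(text: str) -> str:
    """Hypothetical helper: wrap each xvcs:insert element's content in a span."""
    text = OPEN_TAG.sub(r'\1<span class="test">', text)
    return CLOSE_TAG.sub("</span>", text)
```

Like the Perl version, this works on the whole string at once, so no splitting into an array and rejoining is needed.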
Using vim, I am attempting to remove all text outside of <text> blocks. This needs to span across newlines and other (unrelated) tags.
I have tried regex substitutions, but they failed for a couple of reasons: my attempts did not span multiple lines, and I need my matches to be non-greedy. (Is that done with {-} somehow?)
The regex that should match the content I want to delete would look like </text>.*<text.*>, but if I make this match non-greedy, I may have other issues. (I also realize this leaves one partial tag section to clean up at the beginning.)
Is there another approach that I should be taking, or can someone guide me to remove all content not between such tags using vim?
EDIT: Including sample text
<contributor>
<username>MalafayaBot</username>
<id>628</id>
</contributor>
<minor />
<comment>Robô: A modificar Categoria:Vocábulo de étimo latino (Português) para Categoria:Entrada de étimo latino (Português)</comment>
<text xml:space="preserve">={{-pt-}}=
==Substantivo==
{{flex.pt|ms=excerto|mp=excertos}}
{{paroxítona|ex|cer|to}} {{m}}
# [[extrato]] de um [[texto]], [[fragmento]]
#: ''A seguir, um '''excerto''' do texto original.''
===Tradução===
{{tradini}}
* {{trad|es|extracto}}
* {{trad|fr|extrait}}
{{tradmeio}}
* {{trad|en|excerpt}}
{{tradfim}}
=={{etimologia|pt}}==
:Do latim ''[[excerptu]]'' (colhido de).
=={{pronúncia|pt}}==
===Brasil===
* [[SAMPA]]: /e."sEx.tu/
* [[AFI]]: /esˈertu/
[[zh:excerto]]</text>
<sha1>8i1zywj37s74ah4wnai11ohorfjn8j5</sha1>
<model>wikitext</model>
Your struggles with regular expressions indicate that you're using the wrong tool for the job.
For text extraction from XML, you can use XSLT, which will handle all special cases far better than a regular expression. Or use special-purpose tools like xidel, a kind of grep for XML. With it, the extraction is as easy as:
xidel --extract "//text" input.xml
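The same parser-based extraction can be sketched in a general-purpose language. The example below uses Python's standard xml.etree.ElementTree and assumes the full dump is well-formed XML (the excerpt in the question is only a fragment); text_contents is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def text_contents(xml_text: str) -> List[str]:
    """Return the character content of every <text> element, in document order."""
    root = ET.fromstring(xml_text)
    # "{*}text" matches <text> in any namespace or in none (Python 3.8+), so the
    # same call works whether or not the dump declares a default namespace.
    return [el.text or "" for el in root.iterfind(".//{*}text")]
```

For a very large dump, ET.iterparse would avoid loading the whole file into memory at once.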
If you don't need to use vim, you can try this sed command; just replace "test" with the name of your file. Test it on a COPY of your file first, since the -i option tells sed to modify the file you pass in place.
sed -i 's/<\/text>[^<]*/<\/text>/g' test
EDIT: After seeing the sample, I'm taking a different approach: instead of deleting all the text outside the tags, select all the <text> blocks and write them to a new file. Hopefully your version of grep supports the -P option. Try this:
grep -Pzo "(?s)<text.*?<\/text>" sample.txt > out.txt
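The grep command relies on two features: (?s), which lets . match newlines, and the non-greedy .*?. The sketch below shows the same idea with Python's re module, where re.DOTALL is the flag equivalent of (?s). extract_text_blocks is a hypothetical helper, and the \b after "text" is an addition so that a tag such as <textarea> would not match.

```python
import re
from typing import List

# re.DOTALL lets '.' match newlines; '.*?' is non-greedy, so each match ends at
# the FIRST </text> after its opening tag instead of the last one in the file.
TEXT_BLOCK = re.compile(r"<text\b[^>]*>.*?</text>", re.DOTALL)


def extract_text_blocks(document: str) -> List[str]:
    """Hypothetical helper: return every <text>...</text> block, tags included."""
    return TEXT_BLOCK.findall(document)
```

With a greedy .* instead, the same call would return a single match running from the first <text to the last </text>.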
I assume that there is only one <text> block in your file. In vim this line works for your sample text:
%s#\_.*\(<text.\{-}>\_.*</text>\)\_.*#\1#
I'm struggling with regex substitution patterns and with the limited documentation for Ant's propertyregex task. I need to take the ${user.name} property and make a lowercase version called ${user.name.lc}, but I can't get the replace string right.
This is what I've got:
<target name="foobar">
<echo>${user.name}</echo>
<propertyregex
property="user.name.lc"
input="${user.name}"
regexp="[A-Z]"
replace="[a-z]"
global="true" />
<echo>${user.name.lc}</echo>
</target>
It finds the uppercase letters of the name correctly, but the replacement fails. This is what I get:
foobar:
[echo] Sally Fields
[echo] [a-z]ally [a-z]ields
I've spent about two hours googling, reading, and trying different substitution strings. The Ant documentation refers to groupings and shows examples using them, which doesn't help me because there may or may not be groupings in the user name.
Can anyone tell me what to use for what Ant calls a "regular expression substitution pattern"?
Don't use regex for this. Only a few regex engines support case conversion in the replacement, and I don't think propertyregex is one of them. Use this instead:
<pathconvert property="converted">
<path path="${user.name}"/>
<chainedmapper>
<flattenmapper/>
<scriptmapper language="javascript">
self.addMappedName(source.toLowerCase());
</scriptmapper>
</chainedmapper>
</pathconvert>
<echo>${converted}</echo>
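The "[a-z]" in the question's output comes from the replacement string being inserted literally: a character class only has meaning in the pattern, not in the replacement. A small Python sketch (Python's re here, not the engine Ant uses) reproduces that output and shows that case conversion needs a replacement function; the two helper names are hypothetical. Outside Ant, plain name.lower() does the job without any regex.

```python
import re


def literal_replacement(name: str) -> str:
    # A replacement *string* is inserted as-is, so "[a-z]" is not a class here.
    return re.sub(r"[A-Z]", "[a-z]", name)


def lowercase_via_regex(name: str) -> str:
    # A replacement *function* receives each match object and can transform it.
    return re.sub(r"[A-Z]", lambda m: m.group(0).lower(), name)


print(literal_replacement("Sally Fields"))  # [a-z]ally [a-z]ields
print(lowercase_via_regex("Sally Fields"))  # sally fields
```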
You can use %1> in the replace attribute; the > is meant to convert to upper case, although this is not standard regex replacement syntax. Your code would look like:
<propertyregex
property="user.name.lc"
input="${user.name}"
regexp="[A-Z]"
replace="%1>"
global="true" />
I am using MS Word's Find and Replace feature with wildcards enabled,
but the text I have to edit contains content that trips up the search:
//more text in different format above
<file name="london_bitmap" bits="24", owner="sergio"> 1 2 3 </file>
<file name="paris_bitmap" bits="24", owner="sergio"> 1 2 3 </file>
<file name="moscow_bitmap" bits="24", owner="sergio"> 1 2 3 </file>
I want to replace the _bitmap suffix with a bmp_ prefix, so that:
<file name="bmp_london" bits="24", owner="sergio"> 1 2 3 </file>
When I use something like this:
(<*>)_(<bitmap>)
It matches not just one line but everything it can until it reaches "bitmap".
Any idea how to solve this? Maybe by getting the word just before "bitmap"?
The following works (in Word 2003) as the search string
([!"]#)_(<bitmap>)
The [!"] part means: match any single character that isn't a double quote, and the # qualifier means "find at least one of the preceding". The replacement expression (which I expect you already know :) is
bmp_\1
Hope this helps!
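The same idea can be sketched with Python's re module, for example to batch-process the text outside Word. This is a translation of the Word wildcard expression, not something Word runs, and the names PATTERN and bitmap_suffix_to_prefix are hypothetical. Word's < (start of word) is dropped because Python treats "_" as a word character, so \b would never match between "_" and "b".

```python
import re

# [^"]+ is "one or more characters that are not a double quote" (Word: [!"]#),
# so a match cannot start before the opening quote of the name attribute.
# \b after "bitmap" plays the role of Word's ">" (end of word).
PATTERN = re.compile(r'([^"]+)_bitmap\b')


def bitmap_suffix_to_prefix(text: str) -> str:
    """Hypothetical helper: turn name="city_bitmap" into name="bmp_city"."""
    return PATTERN.sub(r"bmp_\1", text)
```

Because the negated class cannot cross a double quote, each match stays inside one name="..." value even when the input spans many lines.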
Not sure whether Word supports lazy matching, but you might try replacing * with *?