I have a script which pulls XML hosted online and saves it locally. The script then goes through the local file and replaces/adds certain text. However, for some reason, when I use the "&" symbol, there is an extra space inserted along with it within the element text. Here is a sample of the XML elements I am parsing:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test
</Tracking>
However, after running my script to add the additional text to my elements, the text is added with an extra space, like this:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test &cb={CACHEBUSTER}
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test &cb={CACHEBUSTER}
</Tracking>
I have tried everything, but I don't know why this is occurring or how to prevent the space from being added. I have even tried stripping the whitespace. When I look at the XML that is saved locally before uploading it, everything looks fine (&amp; is the escaped form of the "&" symbol), as seen here in the source:
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test
&amp;cb={CACHEBUSTER}</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test
&amp;cb={CACHEBUSTER}</Tracking>
Here is what the code from my script looks like:
import os
import subprocess

import requests
from lxml import etree as ET  # assumption: iter(tag=ET.Element) is lxml-specific
from tqdm import tqdm

for URL, xml_name, original_server in tqdm(XML_tags):
    response = requests.get(URL)
    with open(xml_name, 'wb') as file:
        file.write(response.content)
    with open(xml_name) as saved_file:
        tree = ET.parse(saved_file)
    root = tree.getroot()
    for element in root.iter(tag=ET.Element):
        if element.text is not None:
            if ".png" in element.text:
                if "?" in element.text:
                    element.text = element.text + "&cb={CACHEBUSTER}"
                    element.text = element.text.strip()
                else:
                    element.text = element.text + "?cb={CACHEBUSTER}"
                    element.text = element.text.strip()
    server = "example.server: ../sample/sample/" + original_server
    tree.write(xml_name, xml_declaration=True, method='xml',
               encoding='utf8')
    server_upload = subprocess.Popen(["scp", xml_name, server])
    upload_wait = os.waitpid(server_upload.pid, 0)
I can definitely use some help with this. Thanks.
Update: Actually, it appears that this has nothing to do with using the "&". Here is a sample when I just add different text:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test test123
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test test123
</Tracking>
</TrackingEvents>
The whitespace was already in the original XML before you added anything to element.text: it is the newline between the end of the URL and the closing tag. Calling strip() after appending cannot remove it, because by then it sits in the middle of the string, and that newline is what shows up as the extra space. So you should remove the whitespace before appending the text, not afterwards as in your code above:
    ....
    if "?" in element.text:
        element.text = element.text.strip() + "&cb={CACHEBUSTER}"
    else:
        element.text = element.text.strip() + "?cb={CACHEBUSTER}"
    ....
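To see why the order matters, here is a small, self-contained sketch. It uses the standard library's xml.etree.ElementTree rather than whatever ET is in the original script, and add_cachebuster is a hypothetical helper that strips first and then picks & or ? as the separator, mirroring the two branches above. The second URL in the sample is shortened (no query string) so that both branches run.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the question's feed: each URL sits between two newlines.
SAMPLE = """<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png
</Tracking>
</TrackingEvents>"""


def add_cachebuster(text):
    """Hypothetical helper: strip surrounding whitespace first, then append."""
    url = text.strip()
    separator = "&" if "?" in url else "?"
    return url + separator + "cb={CACHEBUSTER}"


root = ET.fromstring(SAMPLE)
rewind, pause = root.findall("Tracking")

# The question's order: append first, strip afterwards.
# The trailing newline is now in the middle of the string, so strip() cannot reach it.
broken = (rewind.text + "&cb={CACHEBUSTER}").strip()
print(repr(broken))

# The answer's order: strip first, then append.
for element in root.iter("Tracking"):
    element.text = add_cachebuster(element.text)

# ElementTree escapes "&" as "&amp;" when serializing.
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

In the real script you could call add_cachebuster(element.text) inside the existing ".png" check; when the tree is written back, the & appearing as &amp; in the file is expected.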
Related
I'm trying to replace part of an XML response with something else.
Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<trustedDevices><trustedDevice><id>1942</id><name>BksQ9LKwWuNOHpn</name></trustedDevice><trustedDevice><id>1944</id><name>6f4srs4PkJk1j36</name></trustedDevice><trustedDevice><id>1943</id><name>7cGYVAlmQoXaVrf</name></trustedDevice></trustedDevices>
I'm trying to get all the <name>(.+?)<\/name> data and replace it with something else (a timestamp or a random string).
So far, my Groovy PostProcessor code looks like this:
String trustedDevices = prev.getResponseDataAsString()
log.info('Response: ' + trustedDevices)
def nameFind = "/<name>(.+?)<\/name>/"
def newTrustedDevices = trustedDevices.replaceAll(nameFind, "test")
log.info('New response: ' + newTrustedDevices)
Unfortunately it seems that replaceAll requires String or Long to work, and won't work with regex.
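Your regex just needs correct escaping. replaceAll does accept the regex as a String; the problem is the pattern itself. Inside a double-quoted Groovy string the surrounding slashes are not regex delimiters but literal characters of the pattern, so drop them and double the backslash:
def nameFind = "<name>(.+?)<\\/name>"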
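Groovy's replaceAll uses Java regular expressions, and for this pattern Python's re module behaves the same way, so the effect of the replacement string can be sketched in Python (the sample is trimmed to two devices). Note that the whole match, tags included, is replaced: passing plain "test" removes the <name> elements, while writing the tags back into the replacement keeps them.

```python
import re

response = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<trustedDevices>"
    "<trustedDevice><id>1942</id><name>BksQ9LKwWuNOHpn</name></trustedDevice>"
    "<trustedDevice><id>1944</id><name>6f4srs4PkJk1j36</name></trustedDevice>"
    "</trustedDevices>"
)

# Same pattern as the corrected Groovy string, without the surrounding slashes.
# The lazy (.+?) stops at the first closing tag, so each <name> is matched separately.
name_pattern = re.compile(r"<name>(.+?)</name>")

# Replacing with plain "test" drops the <name> tags along with the value.
without_tags = name_pattern.sub("test", response)

# Writing the tags back into the replacement keeps the element intact.
with_tags = name_pattern.sub("<name>test</name>", response)
print(with_tags)
```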
Replacing values in XML using regular expressions is not the best option as it will be fragile and very sensitive to any markup change.
I would suggest using Groovy's XML parsing capabilities instead.
Example code:
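import groovy.xml.StreamingMarkupBuilder

// XmlSlurper is in groovy.util (imported by default) before Groovy 4; on Groovy 4+ also add: import groovy.xml.XmlSlurper
def trustedDevices = new XmlSlurper().parseText(prev.getResponseDataAsString())
trustedDevices.trustedDevice.findAll().each {
    it.name = 'test'
}
def newTrustedDevices = new StreamingMarkupBuilder().bind { mkp.yield trustedDevices }.toString()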
More information on Groovy scripting in JMeter: Apache Groovy - Why and How You Should Use It
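For comparison, the same parse-and-replace idea with Python's standard xml.etree.ElementTree looks roughly like this. It only illustrates the technique outside JMeter, using a trimmed version of the sample response:

```python
import xml.etree.ElementTree as ET

response = (
    "<trustedDevices>"
    "<trustedDevice><id>1942</id><name>BksQ9LKwWuNOHpn</name></trustedDevice>"
    "<trustedDevice><id>1944</id><name>6f4srs4PkJk1j36</name></trustedDevice>"
    "</trustedDevices>"
)

root = ET.fromstring(response)

# Visit the <name> child of every <trustedDevice>, like trustedDevice.findAll().each in Groovy.
for name in root.findall("./trustedDevice/name"):
    name.text = "test"

# Serialize back to a string; the <id> elements are left untouched.
new_response = ET.tostring(root, encoding="unicode")
print(new_response)
```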
I'm quite new to Django. As a first project I wrote a little tool to create M3U8 playlists.
My problem is that long playlists are not transferred completely.
This is the stripped-down code:
from django.core.files.base import ContentFile
from django.http import HttpResponse


def create(request):
    # just a dummy filelist
    playlist = ["#EXTM3U"] + 5 * ["/home/pi/music/" + 5 * "äöü0123456789á/" + "xyz.mp3"]
    file_to_send = ContentFile("")
    for item in playlist:
        file_to_send.write("{}\n".format(item.replace("/home/pi/Music", r"\\raspberry\music").replace("/", "\\")))
    response = HttpResponse(file_to_send, "audio/x-mpegurl")
    response["Content-Length"] = file_to_send.size
    response["Content-Disposition"] = 'attachment; filename="playlist.m3u8"'
    # print some debug info
    print("lines:", len(playlist), "chars (no linebreaks)", sum([len(entry) for entry in playlist]),
          "filesize:", file_to_send.size)
    return response
The problem seems to lie in the non-ascii chars in playlist entries (äöüá). When there are no such characters, the file is transferred intact. I assume that these are characters that use two bytes in UTF-8, but writing strings to the ContentFile like I do is probably not correct.
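A quick way to check this assumption without Django is to compare the character count and the UTF-8 byte count of the same dummy playlist:

```python
# Same dummy playlist as in the question.
playlist = ["#EXTM3U"] + 5 * ["/home/pi/music/" + 5 * "äöü0123456789á/" + "xyz.mp3"]
text = "\n".join(playlist)

char_count = len(text)                   # what a str-based size reports
byte_count = len(text.encode("utf-8"))   # what actually goes over the wire

# Each of ä, ö, ü and á needs two bytes in UTF-8, so the byte count is larger.
print("characters:", char_count, "bytes:", byte_count)
```

Here the two numbers differ by 100 (four two-byte characters per path segment, five segments per entry, five entries), which is how much a character-based Content-Length would under-report.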
I found the answer while writing up the problem description.
This works:
from django.core.files.base import ContentFile
from django.http import HttpResponse


def create(request):
    # just a dummy filelist
    playlist = ["#EXTM3U"] + 5 * ["/home/pi/music/" + 5 * "äöü0123456789á/" + "xyz.mp3"]
    joined_playlist = "\n".join([item.replace("/home/pi/Music", r"\\raspberry\music").replace("/", "\\") for item in playlist])
    # encode to bytes so that file_to_send.size is a byte count
    file_to_send = ContentFile(joined_playlist.encode("UTF-8"))
    response = HttpResponse(file_to_send, "audio/x-mpegurl")
    response["Content-Length"] = file_to_send.size
    response["Content-Disposition"] = 'attachment; filename="playlist.m3u8"'
    # print some debug info
    print("lines:", len(playlist), "chars (no linebreaks)", sum([len(entry) for entry in playlist]),
          "filesize:", file_to_send.size)
    return response
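The important difference is that I no longer write strings to the ContentFile but bytes, obtained by encoding the string as UTF-8. For str content, ContentFile.size is a character count, whereas Content-Length must be given in bytes; since ä, ö, ü and á each take two bytes in UTF-8, the header under-reported the size and the download was cut off. With bytes, file_to_send.size is the byte count.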
HTH
I'm trying to truncate the text of several elements in an XML file. I have managed to build two lists: the first (long_name) holds the original, too-long element texts as strings, and the second (short_name) holds the same texts after truncation.
Now I want to replace the element texts in my XML. I tried a few scripts but ended up falling back on readlines(); I would like an lxml solution equivalent to this code:
txt = open('IF_Generic.arxml','r')
Lines = txt.readlines()
txt.close()
txt = open('IF_Genericnew.arxml','w')
for e in range(len(long_name)) :
    for i in range(len(Lines)) :
        if (long_name[e] in Lines[i]) == True :
            Lines[i] = Lines[i].replace(long_name[e],short_name[e])
for i in Lines :
    txt.write(i)
txt.close()
I tried this, but it doesn't work:
# Python 2.7, as in the question
from StringIO import StringIO
from lxml import etree

f = open('IF_Generic.arxml')
arxml = f.read()
f.close()
tree = etree.parse(StringIO(arxml))
for e,b in enumerate(long_name) :
    context = etree.iterparse(StringIO(arxml))
    for a,i in context:
        if not i.text:
            pass
        else:
            if (b in i.text) == True :
                i.text = short_name[e]
obj_arxml = etree.tostring(tree,pretty_print=True)
f = open('IF_Genericnew.arxml','w')
f.write(obj_arxml)
f.close()
Let's say the first element of long_name is RoutineServices_EngMGslLim_NVMID03 and the file contains:
<BALISE_A>
<BALISE_B>
<SHORT-NAME>RoutineServices_EngMGslLim_NVMID03</SHORT-NAME>
</BALISE_B>
</BALISE_A>
<BALISE_C>
<POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NVMID03/E_NOT_OK</POSSIBLE-ERROR-REF>
<SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
The first element of short_name is RoutineServices_EngMGslLim_NV, so after the replacement the file should look like this:
<BALISE_A>
<BALISE_B>
<SHORT-NAME>RoutineServices_EngMGslLim_NV</SHORT-NAME>
</BALISE_B>
</BALISE_A>
<BALISE_C>
<POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NV/E_NOT_OK</POSSIBLE-ERROR-REF>
<SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
I want this result: every occurrence is replaced, including the one inside the POSSIBLE-ERROR-REF path.
P.S.: I use Python 2.7.9.
Thanks in advance everyone !
Don't open XML files like text files. I have explained in this answer why this is a bad idea.
Simply let etree read and write the file. It's also less code to write.
from lxml import etree

# read the file and load it into a DOM tree
tree = etree.parse('IF_Generic.arxml')

for elem in tree.xpath("//*"):
    # find elements that contain only text
    if len(elem) == 0 and elem.text and elem.text.strip():
        # do your replacements ...
        elem.text = "new text"

# serialize the DOM tree and write it to file
tree.write('IF_Genericnew.arxml', pretty_print=True)
Instead of visiting all elements, which is what "//*" does, you can use a more specific XPath expression to narrow down the elements you want to work on.
For example, tree.xpath("//SHORT-NAME | //POSSIBLE-ERROR-REF") would reduce the overall workload. Union expressions like this need xpath(); find() and iterfind() only understand a limited subset of XPath.
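The replacement step itself can be sketched with the standard library's xml.etree.ElementTree so it runs without lxml. This sketch assumes the two lists line up index by index and replaces each long name wherever it occurs inside an element's text, so the POSSIBLE-ERROR-REF path is updated as well as SHORT-NAME. The question's two sample fragments are wrapped in a hypothetical <ROOT> element so they form one document, and shorten_names is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The question's two fragments, wrapped in a hypothetical <ROOT> element.
SAMPLE = """<ROOT>
<BALISE_A>
<BALISE_B>
<SHORT-NAME>RoutineServices_EngMGslLim_NVMID03</SHORT-NAME>
</BALISE_B>
</BALISE_A>
<BALISE_C>
<POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NVMID03/E_NOT_OK</POSSIBLE-ERROR-REF>
<SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
</ROOT>"""

long_name = ["RoutineServices_EngMGslLim_NVMID03"]
short_name = ["RoutineServices_EngMGslLim_NV"]


def shorten_names(root, long_names, short_names):
    """Replace every long name with its short counterpart inside element text."""
    for elem in root.iter():
        if not elem.text:
            continue
        for old, new in zip(long_names, short_names):
            if old in elem.text:
                # Substring replacement keeps the rest of the text (e.g. a path) intact.
                elem.text = elem.text.replace(old, new)


root = ET.fromstring(SAMPLE)
shorten_names(root, long_name, short_name)
for elem in root.iter():
    if elem.text and elem.text.strip():
        print(elem.tag, elem.text)
```

shorten_names only relies on iter() and .text, which lxml elements also provide, so in the real script it could be called on tree.getroot() between the answer's etree.parse() and tree.write() calls.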
I'm trying to solve an issue with posting comments on a blog that uses the WeBlog Sitecore module. From what I can tell, if the blog entry URL contains dashes (e.g. http://[domain.org]/blog/2016/december/test-2-entry), I get the "End of string expected at line [#]" error. If the URL does NOT contain dashes, the comment form works fine. I tried adding this replacement to the Web.config:
<replace mode="on" find="-" replaceWith="_"/>
I also tried replacing the dash with a space. Neither solution has worked; I still get the error.
Is there another setting in the Web.config I can change to escape the dashes in the URLs? I have read that enclosing dashed URL text in # symbols works, but I'd like to do that automatically instead of having users go back and rename all their blog entries.
Here is a screenshot of the error for reference:
I have no experience with the WeBlog module, but for the issue you are facing you should escape the dashes by wrapping the path segments in #. Please see the following code snippet:
using System.Text.RegularExpressions;

public string EscapePath(string path)
{
    string[] joints = Regex.Split(path, "/");
    string output = string.Empty;
    for (int index = 0; index < joints.Length; index++)
    {
        string joint = joints[index];
        if (!string.IsNullOrEmpty(joint))
            output += string.Format("#{0}#", joint);
        if (index != joints.Length - 1)
            output += "/";
    }
    return output;
}
Reference: https://github.com/WeTeam/WeBlog/issues/52
More information about escaping dashes in queries can be found here
UPDATE
Call this method on the path before posting the comment so that the dashes are escaped. You can also download the DLL from here and use it in your solution.
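To make the escaping rule concrete, here is a rough Python port of the EscapePath method above; the item path in the example is made up. Every non-empty segment gets wrapped, not only the ones containing dashes.

```python
def escape_path(path):
    """Wrap every non-empty path segment in '#', mirroring the C# EscapePath above."""
    escaped = []
    for joint in path.split("/"):
        # Empty segments (e.g. before a leading "/") stay empty, as in the C# loop.
        escaped.append("#" + joint + "#" if joint else "")
    return "/".join(escaped)


print(escape_path("/sitecore/content/blog/2016/december/test-2-entry"))
```

Joining with "/" reproduces the C# loop, which appends a slash after every segment except the last.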
I'm looking to match a part of several HTML files that get passed into a loop in an ASP file and then return that part of the HTML files to include in my output. Here's my code so far:
<%for i=0 to uBound(fileIDs) ' fileIDs is an array of URLs
    dim srcText, outText, url
    Set ex = New RegExp
    ex.Global = true
    ex.IgnoreCase = true
    ex.Pattern = "<section>[\S\s]+</section>" ' This finds the HTML I want
    url = fileIDs(i)
    Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlhttp.open "GET", url, false
    xmlhttp.send ""
    srcText = xmlhttp.responseText
    outputText = ex.Execute(mediaSrcText) ' I expect this to be the HTML I want
    Response.Write(outputText.Item(0).Value) ' This would then return the first instance
    set xmlhttp = nothing
next %>
I've tested the regular expression on my files and it's matching the parts that I want it to.
When I run the page containing this code, I get an error:
Microsoft VBScript runtime error '800a01b6'
Object doesn't support this property or method
on the line with ex.Execute. I've also tried ex.Match, but got the same error. So I'm clearly missing the proper method for returning the match so I can write it out into the file. What is that method? Or am I approaching the problem from the wrong direction?
Thanks!
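You need Set when you're assigning outputText, because Execute returns an object (a Matches collection) and VBScript requires Set for object assignments. Also pass srcText, where the response text is actually stored; mediaSrcText is never assigned in the code above:
Set outputText = ex.Execute(srcText)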
I should probably also say that you really shouldn't be using regular expressions to attempt to parse HTML, although I don't know enough about the context to offer more specific advice.
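One thing to watch with this pattern: [\S\s]+ is greedy, so if a page ever contains more than one <section>, the first match runs from the first <section> to the last </section>. A lazy +? (supported by VBScript 5.5 and later, as far as I know) stops at the first closing tag. The sketch below uses Python's re module and a made-up page to show the difference:

```python
import re

# Made-up page with two sections.
page = "<section>first</section><p>between</p><section>second</section>"

greedy = re.compile(r"<section>[\S\s]+</section>", re.IGNORECASE)
lazy = re.compile(r"<section>[\S\s]+?</section>", re.IGNORECASE)

# Like outputText.Item(0).Value: take the first match only.
first_greedy = greedy.search(page).group(0)
first_lazy = lazy.search(page).group(0)
print(first_greedy)
print(first_lazy)
```

The greedy version returns the whole page including the <p> in between, while the lazy one returns only the first section.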