Can we copy text after an image from a Unicode file in Sikuli? - python-2.7

I have a file with the content below:
<?xml version='1.0' encoding='UTF-8'?><cont:ContactId><all:Individual.partyId>10028305</all:Individual.partyId></cont:ContactId><all:applicationId>C18400</all:applicationId>
I need to copy "C18400" from this file and print it in the output in Sikuli. Can a Python script fetch this value, or is there another way to do it? Could Sikuli image capture be used for this, given that the application ID I need is dynamic and keeps changing?
I used the Python script below, but it didn't work.
CODE:
with open("D:\\SODS.txt","r") as fp:
    for line in fp:
        if "applicationId" in line:
            print "true"
            print re.search(r"applicationId\>(.+?)\<",fp)
ERROR:
TypeError: expected str or unicode but got
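A note on the error above: `re.search` is handed the file object `fp` rather than the current `line`, which is the likely cause of the TypeError, and `re` is never imported. A minimal sketch of a corrected loop follows. The helper name `find_application_id` and the in-memory sample are illustrative assumptions; in practice the lines would come from the opened file.

```python
import re

# Lazily capture whatever sits between "applicationId>" and the next "<".
APP_ID_PATTERN = re.compile(r"applicationId>(.+?)<")


def find_application_id(lines):
    """Return the first application ID found in an iterable of lines, or None."""
    for line in lines:
        match = APP_ID_PATTERN.search(line)  # search the line, not the file object
        if match:
            return match.group(1)
    return None


# Hypothetical stand-in for iterating over the opened file.
sample_lines = [
    "<?xml version='1.0' encoding='UTF-8'?><cont:ContactId>"
    "<all:Individual.partyId>10028305</all:Individual.partyId></cont:ContactId>"
    "<all:applicationId>C18400</all:applicationId>"
]
print(find_application_id(sample_lines))
```

With a real file, `find_application_id(fp)` inside the `with open(...)` block would do the same job, since a file object iterates line by line.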

My actual file looks like this:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<cont:CreateContactResponse xmlns:cont="http://DataModelContactManagement" xmlns:all="http://MID/Business/0.1/All" xmlns:com="http://SOA/1.0/Common">
<com:status>0</com:status>
<com:description/>
<cont:ContactId>
<all:MIDBusCm.Individual.partyId>10028310</all:MIDBusCm.Individual.partyId>
</cont:ContactId>
<all:MIDBusCu.CustomerOrder.applicationId>C18403</all:MIDBusCu.CustomerOrder.applicationId>
</cont:CreateContactResponse>
I tried the Python script below:
import xml.etree.ElementTree as ET
data = open("D:\\Pravina\\Projects\\Meteor\\Automation\\API\\SODS.txt", 'r')
tree = ET.parse(data)
doc = tree.getroot()
print doc
rootText='.//{http://SOA/1.0/Common}'
errorCode=tree.find(rootText + 'status').text
print errorCode
With the above code, the output I got was:
"/schemas.xmlsoap.org/soap/envelope/}Envelope at a>
0 "
In my case the output should be "C18403", which is the text inside the tag 'all:MIDBusCu.CustomerOrder.applicationId'.
Please let me know if I am missing something here.
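The script above looks up `status` in the `com` namespace, which is why it prints 0. Here is a sketch of reading the application ID instead, assuming the element lives in the `all` namespace (`http://MID/Business/0.1/All`) as declared in the file. The sample string is abridged from the file shown above, with closing `Body` and `Envelope` tags added on the assumption that the real file has them.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the response; the closing Body/Envelope tags are assumed.
SAMPLE = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soapenv:Body>'
    '<cont:CreateContactResponse xmlns:cont="http://DataModelContactManagement"'
    ' xmlns:all="http://MID/Business/0.1/All" xmlns:com="http://SOA/1.0/Common">'
    '<com:status>0</com:status>'
    '<all:MIDBusCu.CustomerOrder.applicationId>C18403'
    '</all:MIDBusCu.CustomerOrder.applicationId>'
    '</cont:CreateContactResponse>'
    '</soapenv:Body>'
    '</soapenv:Envelope>'
)

# ElementTree addresses namespaced tags as "{namespace-uri}localname".
ALL_NS = "{http://MID/Business/0.1/All}"

root = ET.fromstring(SAMPLE)
element = root.find(".//" + ALL_NS + "MIDBusCu.CustomerOrder.applicationId")
application_id = element.text if element is not None else None
print(application_id)
```

For the file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`.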

Related

Python 3: convert str to a bytes-like object without using encode

I wrote an HTTP server to serve HTML files that should run on both Python 2.7 and Python 3.5.
def do_GET(self):
    ...
    # if the resource is an API call
    data = json.dumps({'message':['thanks for your answer']})
    # if the resource is a file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data) # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the error above.
I could use bytearray(data, 'utf-8') to convert the str to bytes, but then the HTML shown in the browser changes.
My questions:
How can I support both Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource,"rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, and it works in both versions.
As a positive side effect, this code also works on Windows; otherwise binary files such as images get corrupted by the line-ending conversion that happens when a file is opened in text mode.
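One gap the answer does not cover: the `json.dumps` branch in the question still produces a `str` on Python 3. Below is a sketch that produces bytes for both branches, assuming UTF-8 is acceptable for the JSON body. `file_body` and `json_body` are hypothetical helper names, not part of the original server.

```python
import json


def file_body(path):
    """Read a file as raw bytes; no newline translation or decoding happens."""
    with open(path, "rb") as f:
        return f.read()


def json_body(obj):
    """Serialize to JSON and encode, so the result is bytes on Python 3."""
    return json.dumps(obj).encode("utf-8")


body = json_body({"message": ["thanks for your answer"]})
print(isinstance(body, bytes))
```

Either return value can be passed straight to `self.wfile.write(...)`, because both are bytes on Python 3 and plain `str` on Python 2.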

Is outputMode Still Supported In alchemy_language.entities?

I have this inherited code which in Python 2.7 successfully returns results in xml that are then parsed by ElementTree.
result = alchemyObj.TextGetRankedNamedEntities(text)
root = ET.fromstring(result)
I am updating the program to Python 3.5 and am attempting the following so that I don't need to modify the XML parsing of the results:
result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
root = ET.fromstring(result)
Per http://www.ibm.com/watson/developercloud/alchemy-language/api/v1/#entities outputMode allows the choice between json default and xml. However, I get this error:
Traceback (most recent call last):
  File "bin/nerv35.py", line 93, in <module>
    main()
  File "bin/nerv35.py", line 55, in main
    result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
TypeError: entities() got an unexpected keyword argument 'outputMode'
Does outputMode actually still exist? If so, what is wrong with the entities parameters?
The watson-developer-cloud package does not appear to have this option for entities. The allowed parameters are:
html
text
url
disambiguate
linked_data
coreference
quotations
sentiment
show_source_text
max_items
language
model
You can try accessing the API directly by using requests. For example:
import requests
alchemyApiKey = 'YOUR_API_KEY'
url = 'https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities'
payload = {
    'apikey': alchemyApiKey,
    'outputMode': 'xml',
    'text': 'This is an example text. IBM Corp',
}
r = requests.post(url, data=payload)
print(r.text)
This should return:
<?xml version="1.0" encoding="UTF-8"?>
<results>
    <status>OK</status>
    <usage>By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html</usage>
    <url></url>
    <language>english</language>
    <entities>
        <entity>
            <type>Company</type>
            <relevance>0.961433</relevance>
            <count>1</count>
            <text>IBM Corp</text>
        </entity>
    </entities>
</results>
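Since the goal in the question is to keep the existing ElementTree parsing, here is a sketch of pulling entities out of a response shaped like the one above. The response is hard-coded and abridged so the example runs offline; with a live call it would come from the HTTP response body.

```python
import xml.etree.ElementTree as ET

# Abridged, hard-coded copy of the XML response shown above.
RESPONSE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<results><status>OK</status><language>english</language>'
    '<entities><entity><type>Company</type><relevance>0.961433</relevance>'
    '<count>1</count><text>IBM Corp</text></entity></entities></results>'
)

# Passing bytes lets the parser honor the encoding named in the XML declaration.
root = ET.fromstring(RESPONSE.encode("utf-8"))

# Collect (text, type) pairs for every <entity> element in the document.
entities = [
    (entity.findtext("text"), entity.findtext("type"))
    for entity in root.iter("entity")
]
print(entities)
```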

Python: Returning a filename for matching a specific condition

import sys, hashlib
import os
inputFile = 'C:\Users\User\Desktop\hashes.txt'
sourceDir = 'C:\Users\User\Desktop\Test Directory'
hashMatch = False
for root, dirs, files in os.walk(sourceDir):
    for filename in files:
        sourceDirHashes = hashlib.md5(filename)
        for digest in inputFile:
            if sourceDirHashes.hexdigest() == digest:
                hashMatch = True
                break
        if hashMatch:
            print str(filename)
        else:
            print 'hash not found'
Contents of inputFile =
2899ebdb5f7a90a216e97b3187851fc1
54c177418615a90a6424cb945f7a6aec
dd18bf3a8e0a2a3e53e2661c7fb53534
Contents of sourceDir files =
test
test 1
test 2
I almost have the code working, but I'm tripping up somewhere. The code I have posted always takes the else branch and reports that the hash hasn't been found, even though the hashes do match, as I have verified. I have listed the contents of my sourceDir so that someone can try this: the file names are test, test 1 and test 2, and all three files have the same content.
I must add, however, that I am not looking for the script to print the actual file content, but rather the name of the file.
Could anyone suggest where I am going wrong and why the condition evaluates to false?
You need to open inputFile with open(inputFile, 'rt') before you can read the hashes; as written, the loop iterates over the characters of the path string. Also, when you read the hashes, strip each line first to remove the newline character \n at the end.
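A sketch of that advice follows. It assumes the digests in the hash file are MD5 digests of the files' contents; if they are digests of the file names instead, hash the name rather than the content. The function names are hypothetical.

```python
import hashlib
import os


def load_hashes(hash_file):
    """Read one digest per line, stripping newlines and skipping blank lines."""
    with open(hash_file, "rt") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def md5_of_file(path):
    """Hash the file's contents in chunks (assumption: content digests are wanted)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matching_files(source_dir, hash_file):
    """Return the names of files under source_dir whose digest is listed."""
    known = load_hashes(hash_file)
    matches = []
    for root, dirs, files in os.walk(source_dir):
        for filename in files:
            if md5_of_file(os.path.join(root, filename)) in known:
                matches.append(filename)
    return matches
```

Collecting the matches in a list also avoids the single `hashMatch` flag, which in the original code is never reset between files.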

PYPDF watermarking returns error

Hi, I'm trying to watermark a PDF file using PyPDF2, but I get an error and can't figure out what is going wrong.
I get the following error:
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    page.mergePage(watermark.getPage(0))
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
    self._mergePage(page2)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
    page2Content, rename, self.pdf)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1547, in _contentStreamRename
    op = operands[i]
KeyError: 0
I'm using Python 2.7.6 with PyPDF2 1.19 on 32-bit Windows.
Hopefully someone can tell me what I'm doing wrong.
My Python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages input1 has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to document-output.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
with this:
import StringIO
outputStream = StringIO.StringIO()
output.write(outputStream) # write merged output to the StringIO object
outputStream.close()
If the above code works, then you might have a file write permission issue. For reference, look at the PyPDF working example in my article.
I encountered this error when attempting to use PyPDF2 to merge in a page which had been generated by reportlab, which used an inline image canvas.drawInlineImage(...), which stores the image in the object stream of the PDF. Other PDFs that use a similar technique for images might be affected in the same way -- effectively, the content stream of the PDF has a data object thrown into it where PyPDF2 doesn't expect it.
If you're able to, one solution is to regenerate the source PDF without inline, content-stream-stored images, e.g. by generating it with canvas.drawImage(...) in reportlab.
Here's an issue about this on PyPDF2.

django file upload doesn't work: f.read() returns ''

I'm trying to upload and parse json files using django. Everything works great up until the moment I need to parse the json. Then I get this error:
No JSON object could be decoded: line 1 column 0 (char 0)
Here's my code. (I'm following the instructions here, and overwriting the handle_uploaded_file method.)
def handle_uploaded_file(f, collection):
    # assert False, [f.name, f.size, f.read()[:50]]
    t = f.read()
    for j in serializers.deserialize("json", t):
        add_item_to_database(j)
The weird thing is that when I uncomment the "assert" line, I get this:
[u'myfile.json', 59478, '']
So it looks like my file is getting uploaded with the right size (I've verified this on the server), but the read command seems to be failing entirely.
Any ideas?
I've seen this before. Your file has a length, but reading it returns nothing. I wonder whether it has already been read earlier... try this:
f.seek(0)
f.read()
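The effect is easy to reproduce with any file-like object. Here is a small sketch using `io.BytesIO` as a stand-in for the uploaded file (an assumption made so the example runs without Django):

```python
import io

# Hypothetical stand-in for the uploaded file object.
upload = io.BytesIO(b'{"key": "value"}')

first = upload.read()    # consumes the stream; the position is now at the end
second = upload.read()   # nothing left to read from the current position
upload.seek(0)           # rewind to the start
third = upload.read()    # the full content is available again

print(len(first), len(second), len(third))
```

A second `read()` without the rewind comes back empty, which matches the empty string seen in the assert output while `f.size` still reports the full length.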