portia (scrapy/slybot) errors on windows - python-2.7

i installed portia and got it to work i annotated some websites (looks really good)
but when i try to run the spiders i get some errors and nothing gets crawled
im running python 2.7.6 on win 7
C:\Python27\Scripts>python portiacrawl C:\portia\slyd\data\projects\new_project
Traceback (most recent call last):
File "portiacrawl", line 7, in <module>
execfile(__file__)
File "C:\portia\slybot\bin\portiacrawl", line 56, in <module>
main()
File "C:\portia\slybot\bin\portiacrawl", line 54, in main
subprocess.call(command_spec)
File "C:\Python27\lib\subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "C:\Python27\lib\subprocess.py", line 709, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 957, in _execute_child
startupinfo)
WindowsError: [Error 2] O sistema nÒo conseguiu localizar o ficheiro especificado

I am troubleshooting portia on Windows 8.1 and encountered the same error, exactly.
Try running 'python portiacrawl' by itself to determine if there is a subsequent menu. You should be able to see Help info on 'portiacrawl'. I suspect that you need to name the [spider] & [options] as well as change the terminal directory to see the output from the crawler. I suggest trying the following but rename [spider] to actual name of your spider w/o brackets:
Enter into terminal: C:\portia\slyd\data\projects <------Change to proper directory in cmd
Make sure you are in the terminal directory "C:\portia\slyd\data\projects"
The Cmd propmpt should look like: C:\portia\slyd\data\projects> <----waiting for portia initiation.
Enter into terminal:
python portiacrawl C:\portia\slyd\data\projects\new_project [spider] -t csv -o test.csv; or,
python portiacrawl [spider] -t csv -o test.csv
Report back. I am curious as to the terminal response. Did it initiate portiacrawl & return "access is denied."

Related

trying to use cookiecutter-django, getting errors and does not create anything

trying to get a Django project started using cookiecutter-django and can't seem to get it to generate anything.
using Python 3.6, Django 2.0.5, cookiecutter 1.6.0 (then created a virtualenv and entered a new, blank directory)
so I enter this command:
cookiecutter https://github.com/pydanny/cookiecutter-django
and get this error traceback:
Traceback (most recent call last):
File "c:\python\python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\python\python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\Python\python36\Scripts\cookiecutter.exe\__main__.py", line 9, in
<module>
File "c:\python\python36\lib\site-packages\click\core.py", line 722, in
__call__
return self.main(*args, **kwargs)
File "c:\python\python36\lib\site-packages\click\core.py", line 697, in main
rv = self.invoke(ctx)
File "c:\python\python36\lib\site-packages\click\core.py", line 895, in
invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\python\python36\lib\site-packages\click\core.py", line 535, in
invoke
return callback(*args, **kwargs)
File "c:\python\python36\lib\site-packages\cookiecutter\cli.py", line 120,
in main
password=os.environ.get('COOKIECUTTER_REPO_PASSWORD')
File "c:\python\python36\lib\site-packages\cookiecutter\main.py", line 63,
in cookiecutter
password=password
File "c:\python\python36\lib\site-packages\cookiecutter\repository.py", line
103, in determine_repo_dir
no_input=no_input,
File "c:\python\python36\lib\site-packages\cookiecutter\vcs.py", line 99, in
clone
stderr=subprocess.STDOUT,
File "c:\python\python36\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "c:\python\python36\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'clone',
'https://github.com/pydanny/cookiecutter-django']' returned non-zero exit
status 128.
OK - figured out how to get this to work.
used Github desktop
from cookiecutter-django repository, right click
open it Git Shell
this opens a Powershell window.
CD to directory where project will be placed in.
cookiecutter https://github.com/pydanny/cookiecutter-django
and it works.
not sure exactly why this works when regular CMD and elevated CMD do not, but this was the only way I could get it to work.
This is a permission issue with github due to the need to setup ssh keys. By the way I'm using ubuntu 12.
https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/ - create a key first in your machine using the instructions in the link. Once you have your ssh key, proceed to step 2. (Step 2 is indicated in the first link as last step)
https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account - add the generated ssh key to your github account.

Python wrapper install gives "Command "python setup.py egg_info" failed with error code 1"

I've been trying to install this Python wrapper for the past two days. I went through all the other questions here on Stack Overflow. Tried literally everything, and nothing seems to work.
Processing /../../../../../wrappers/Python
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-twPZdY-build/setup.py", line 50, in <module>
**cffi_args
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 111, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/local/lib/python2.7/site-packages/setuptools/dist.py", line 319, in __init__
_Distribution.__init__(self, attrs)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 287, in __init__
self.finalize_options()
File "/usr/local/lib/python2.7/site-packages/setuptools/dist.py", line 386, in finalize_options
ep.load()(self, ep.name, value)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 188, in cffi_modules
add_cffi_module(dist, cffi_module)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 49, in add_cffi_module
execfile(build_file_name, mod_vars)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 25, in execfile
exec(code, glob, glob)
File "../ffi_build.py", line 34, in <module>
ffi.set_source('../_ffi', None)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/api.py", line 612, in set_source
raise ValueError("'module_name' must not contain '/': use a dotted "
ValueError: 'module_name' must not contain '/': use a dotted name to make a 'package.module' location
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-twPZdY-build/
I've reinstalled everything at least twice, updated, tried sudo -H, but nothing seems to work. It seems like a problem with setuptools, but I have no idea how to fix it.
Mac OSX 10.11.6 (El Capitan)
Python 2.7.13
Pip 9.0.1
After carefully reading through the error message, I managed to find the file called ffi_build.py under the Python folder I was trying to bind. As stated in the error message, at line 34 there was a module naming statement that contained a '/'. By replacing that '/' with a '.' I solved the issue and managed to bind the Python wrapper with no issue whatsoever.

Difference running python in PyCharm and in terminal

I'm wondering what is the difference when I run a python program from PyCharm or from the command line.
Actually I'm using a library called wand-py (ImageMagick binding).
If I run my program from the command line it works.
Though if I use PyCharm Run or debug it doesn't and I get the following traceback.
/Users/alexisbenoist/Documents/python/papyrus/env/bin/python "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py" --multiproc --client 127.0.0.1 --port 58993 --file /Users/alexisbenoist/Documents/python/papyrus/tets.py
Connected to pydev debugger (build 135.973)
pydev debugger: process 73166 is connecting
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/alexisbenoist/Documents/python/papyrus/tets.py", line 23, in <module>
blob = image_to_blob(PATH)
File "/Users/alexisbenoist/Documents/python/papyrus/tets.py", line 12, in image_to_blob
pdf.alpha_channel = False
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/image.py", line 419, in wrapped
result = function(self, *args, **kwargs)
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/image.py", line 992, in alpha_channel
self.raise_exception()
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/resource.py", line 218, in raise_exception
raise e
wand.exceptions.WandError: wand contains no images `MagickWand-1' # error/magick-image.c/MagickSetImageAlphaChannel/9504
I'm using the same virtual env in the terminal and PyCharm.
Do you guys know what could cause the problem?
Thanks,
Alexis.

Python Mount CIFS share with space in linux

I am trying to mount to a CIFS share with space in my linux machine using python like below:
from subprocess import call
t = mount -t cifs -o username=kalair2 "//10.32.135.87/root/Singapore Lab/SYMM" /mnt/share
print t
if os.path.exists("/mnt/share"):
print "/mnt/share path already exists"
else:
call("mkdir /mnt/share")
call("chmod 777 /mnt/share")
print "mnt/share has been created"
call(t)
But ended up in the error... Traceback (most recent call last): File
"mounttest.py", line 12, in
call(t) File "/usr/lib64/python2.6/subprocess.py", line 478, in call
p = Popen(*popenargs, **kwargs) File "/usr/lib64/python2.6/subprocess.py", line 642, in init
errread, errwrite) File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
raise child_exception OS Error: [Err no 2] No such file or directory
It works if i execute this mount command in shell with space. Can anyone help me with this?
Try escaping your space:
"//10.32.135.87/root/Singapore\ Lab/SYMM"
call should be called with an array of arguments - not a single string for the command with the arguments, as documented here: https://docs.python.org/2/library/subprocess.html
Your command should look like this:
call(["mkdir", "/mnt/share"])

failed to use mapreduce in python

I am trying to learn mapreduce program using python mrjob. I am getting following error:
Traceback:
dumping stdin to local file /tmp/pyes_mrjob.testuser.20131004.103251.998597/STDIN
Making directory hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.user.20131004.103251.998597/files/ on HDFS
> /usr/lib/hadoop-mapreduce/bin/hadoop fs -mkdir hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.testuser.20131004.103251.998597/files/
Traceback (most recent call last):
File "pyes_mrjob.py", line 34, in <module>
MRWordFrequencyCount.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 500, in run
mr_job.execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
self.run_job()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
runner.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
self._run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 236, in _run
self._upload_local_files_to_hdfs()
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
self._mkdir_on_hdfs(self._upload_mgr.prefix)
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs
self.invoke_hadoop(['fs', '-mkdir', path])
File "/usr/local/lib/python2.7/dist-packages/mrjob/fs/hadoop.py", line 81, in invoke_hadoop
proc = Popen(args, stdout=PIPE, stderr=PIPE)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I executed the command manually its working fine there but when i try to execute my program its not working.
Since just started learning can someone suggest what library i have to choose. According to some blogs somelibraries has good documention and some libraries has better perfomance and .... I came across below post which looks older
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
But so many libraries got updates recently. So can some suggest me library i can start with..
i guess this problem is caused by the way how mrjob calls "hadoop fs -mkdir", if the parent dir of the targeted dir you want to make doesn't exist, -mkdir will fail. that means you have to use "hadoop fs -mkdir -p [path]". Ultimately, you will need to modify mrjob library manually in [path of mrjob install](mine is /usr/lib/python2.6/site-packages/mrjob)/hadoop.py at line 271:
self.invoke_hadoop(['fs', '-mkdir', path])
to
self.invoke_hadoop(['fs', '-mkdir', '-p', path])
Good Luck!
It looks like you set your HADOOP_HOME to "/usr/lib/hadoop-mapreduce". However, this is wrong and it should be set to "/usr/lib/hadoop".
Also, if you get an error saying that the hadoop-streaming.jar could not be found, create a symlink in "/usr/lib/hadoop" to this jar as follows:
sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop