I installed Portia and got it working, and I annotated some websites (it looks really good).
But when I try to run the spiders, I get errors and nothing gets crawled.
I'm running Python 2.7.6 on Windows 7.
C:\Python27\Scripts>python portiacrawl C:\portia\slyd\data\projects\new_project
Traceback (most recent call last):
File "portiacrawl", line 7, in <module>
execfile(__file__)
File "C:\portia\slybot\bin\portiacrawl", line 56, in <module>
main()
File "C:\portia\slybot\bin\portiacrawl", line 54, in main
subprocess.call(command_spec)
File "C:\Python27\lib\subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "C:\Python27\lib\subprocess.py", line 709, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 957, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
I am troubleshooting Portia on Windows 8.1 and hit exactly the same error.
Try running 'python portiacrawl' by itself to see whether it prints usage information; you should get the help text for 'portiacrawl'. I suspect that you need to name the [spider] and [options], and also change the terminal's working directory, to see output from the crawler. I suggest trying the following, replacing [spider] with the actual name of your spider, without the brackets:
Change to the projects directory in cmd: cd C:\portia\slyd\data\projects
Make sure the terminal is now in "C:\portia\slyd\data\projects".
The command prompt should look like: C:\portia\slyd\data\projects>
Enter into the terminal:
python portiacrawl C:\portia\slyd\data\projects\new_project [spider] -t csv -o test.csv
or:
python portiacrawl [spider] -t csv -o test.csv
Report back; I am curious about the terminal response. Did it start portiacrawl and return "access is denied"?
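For background, [Error 2] raised from subprocess.call usually means Windows could not find the program the script tried to launch, rather than the project directory. A minimal sketch for checking which programs are visible on PATH follows. It assumes Python 3.3+ for shutil.which (on Python 2.7, distutils.spawn.find_executable plays a similar role), and the program names are examples only, not taken from the traceback:

```python
import shutil


def missing_executables(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# "scrapy" is an assumed example; check whichever program the script launches.
print(missing_executables(["scrapy", "python"]))
```

An empty list means every name was found; anything listed would need its folder added to PATH.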
I'm trying to get a Django project started using cookiecutter-django and can't get it to generate anything.
I'm using Python 3.6, Django 2.0.5, and cookiecutter 1.6.0 (I created a virtualenv and entered a new, blank directory).
So I enter this command:
cookiecutter https://github.com/pydanny/cookiecutter-django
and get this error traceback:
Traceback (most recent call last):
File "c:\python\python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\python\python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\Python\python36\Scripts\cookiecutter.exe\__main__.py", line 9, in <module>
File "c:\python\python36\lib\site-packages\click\core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "c:\python\python36\lib\site-packages\click\core.py", line 697, in main
rv = self.invoke(ctx)
File "c:\python\python36\lib\site-packages\click\core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\python\python36\lib\site-packages\click\core.py", line 535, in invoke
return callback(*args, **kwargs)
File "c:\python\python36\lib\site-packages\cookiecutter\cli.py", line 120, in main
password=os.environ.get('COOKIECUTTER_REPO_PASSWORD')
File "c:\python\python36\lib\site-packages\cookiecutter\main.py", line 63, in cookiecutter
password=password
File "c:\python\python36\lib\site-packages\cookiecutter\repository.py", line 103, in determine_repo_dir
no_input=no_input,
File "c:\python\python36\lib\site-packages\cookiecutter\vcs.py", line 99, in clone
stderr=subprocess.STDOUT,
File "c:\python\python36\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "c:\python\python36\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'clone', 'https://github.com/pydanny/cookiecutter-django']' returned non-zero exit status 128.
OK, I figured out how to get this to work:
I used GitHub Desktop.
From the cookiecutter-django repository, right-click
and open it in Git Shell.
This opens a PowerShell window.
cd to the directory where the project will be placed.
cookiecutter https://github.com/pydanny/cookiecutter-django
And it works.
I'm not sure exactly why this works when regular CMD and elevated CMD do not, but this was the only way I could get it to work.
This is a permission issue with GitHub caused by the need to set up SSH keys. By the way, I'm using Ubuntu 12.
https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/ - First, create a key on your machine using the instructions in this link. Once you have your SSH key, proceed to step 2 (the first link indicates step 2 as its last step).
https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account - Then add the generated SSH key to your GitHub account.
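The CalledProcessError in the question only reports exit status 128; the text git printed is not shown. A minimal sketch of capturing a child process's combined output so the underlying message becomes visible follows. The helper name run_and_capture is hypothetical, and a stand-in Python child process is used so the sketch runs without git installed:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list) and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout, as cookiecutter does
        universal_newlines=True,   # decode bytes to text
    )
    return proc.returncode, proc.stdout


# Stand-in child process; swap in
# ["git", "clone", "https://github.com/pydanny/cookiecutter-django"]
# to see what git itself reports.
code, output = run_and_capture(
    [sys.executable, "-c", "import sys; print('fatal: example'); sys.exit(128)"]
)
print(code, output)
```

Running the real git command this way, from the same CMD window where cookiecutter fails, should show whether the failure is about authentication, the network, or something else.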
I've been trying to install this Python wrapper for the past two days. I went through all the other questions here on Stack Overflow. Tried literally everything, and nothing seems to work.
Processing /../../../../../wrappers/Python
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-twPZdY-build/setup.py", line 50, in <module>
**cffi_args
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 111, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/local/lib/python2.7/site-packages/setuptools/dist.py", line 319, in __init__
_Distribution.__init__(self, attrs)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 287, in __init__
self.finalize_options()
File "/usr/local/lib/python2.7/site-packages/setuptools/dist.py", line 386, in finalize_options
ep.load()(self, ep.name, value)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 188, in cffi_modules
add_cffi_module(dist, cffi_module)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 49, in add_cffi_module
execfile(build_file_name, mod_vars)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/setuptools_ext.py", line 25, in execfile
exec(code, glob, glob)
File "../ffi_build.py", line 34, in <module>
ffi.set_source('../_ffi', None)
File "/private/tmp/pip-twPZdY-build/.eggs/cffi-1.10.0-py2.7-macosx-10.11-x86_64.egg/cffi/api.py", line 612, in set_source
raise ValueError("'module_name' must not contain '/': use a dotted "
ValueError: 'module_name' must not contain '/': use a dotted name to make a 'package.module' location
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-twPZdY-build/
I've reinstalled everything at least twice, updated, tried sudo -H, but nothing seems to work. It seems like a problem with setuptools, but I have no idea how to fix it.
Mac OSX 10.11.6 (El Capitan)
Python 2.7.13
Pip 9.0.1
After carefully reading through the error message, I managed to find the file called ffi_build.py under the Python folder I was trying to bind. As stated in the error message, at line 34 there was a module naming statement that contained a '/'. By replacing that '/' with a '.' I solved the issue and managed to bind the Python wrapper with no issue whatsoever.
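As the error message says, cffi wants a dotted 'package.module' name rather than a path. A small sketch of that conversion follows, using a hypothetical helper, to_dotted_module_name, that is not part of cffi. Dropping the leading '..' component is an assumption about the intended layout, so check the result against where the module should actually live:

```python
def to_dotted_module_name(path):
    """Turn a slash-separated path into a dotted module name (hypothetical helper)."""
    # Discard empty, "." and ".." components: they have no meaning in a module name.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return ".".join(parts)


print(to_dotted_module_name("mypackage/_ffi"))  # prints mypackage._ffi
print(to_dotted_module_name("../_ffi"))         # prints _ffi
```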
I'm wondering what the difference is between running a Python program from PyCharm and from the command line.
I'm using a library called wand-py (an ImageMagick binding).
If I run my program from the command line, it works.
But if I use PyCharm's Run or Debug, it doesn't, and I get the following traceback.
/Users/alexisbenoist/Documents/python/papyrus/env/bin/python "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py" --multiproc --client 127.0.0.1 --port 58993 --file /Users/alexisbenoist/Documents/python/papyrus/tets.py
Connected to pydev debugger (build 135.973)
pydev debugger: process 73166 is connecting
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/Applications/PyCharm CE.app/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/alexisbenoist/Documents/python/papyrus/tets.py", line 23, in <module>
blob = image_to_blob(PATH)
File "/Users/alexisbenoist/Documents/python/papyrus/tets.py", line 12, in image_to_blob
pdf.alpha_channel = False
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/image.py", line 419, in wrapped
result = function(self, *args, **kwargs)
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/image.py", line 992, in alpha_channel
self.raise_exception()
File "/Users/alexisbenoist/Documents/python/papyrus/env/lib/python2.7/site-packages/wand/resource.py", line 218, in raise_exception
raise e
wand.exceptions.WandError: wand contains no images `MagickWand-1' # error/magick-image.c/MagickSetImageAlphaChannel/9504
I'm using the same virtual env in the terminal and PyCharm.
Do you guys know what could cause the problem?
Thanks,
Alexis.
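One way to look for the difference is to print the same few facts from both environments and compare them. The sketch below is only a diagnostic, and which fields matter for wand is an assumption: a different working directory changes how a relative file path resolves, and a different PATH environment variable changes which external programs a library can find.

```python
import os
import sys


def runtime_snapshot():
    """Collect a few facts that commonly differ between an IDE and a terminal."""
    return {
        "executable": sys.executable,
        "cwd": os.getcwd(),
        "path": os.environ.get("PATH", "").split(os.pathsep),
    }


for key, value in sorted(runtime_snapshot().items()):
    print(key, value)
```

Run it once from the terminal and once from PyCharm, then compare the two outputs line by line.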
I am trying to mount a CIFS share whose path contains a space on my Linux machine using Python, like below:
import os
from subprocess import call
t = 'mount -t cifs -o username=kalair2 "//10.32.135.87/root/Singapore Lab/SYMM" /mnt/share'
print t
if os.path.exists("/mnt/share"):
    print "/mnt/share path already exists"
else:
    call("mkdir /mnt/share")
    call("chmod 777 /mnt/share")
    print "mnt/share has been created"
call(t)
But I ended up with this error:
Traceback (most recent call last):
File "mounttest.py", line 12, in <module>
call(t)
File "/usr/lib64/python2.6/subprocess.py", line 478, in call
p = Popen(*popenargs, **kwargs)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The mount command works, space included, if I execute it directly in the shell. Can anyone help me with this?
Try escaping your space:
"//10.32.135.87/root/Singapore\ Lab/SYMM"
call should be given a list of arguments, not a single string containing the command and its arguments, as documented here: https://docs.python.org/2/library/subprocess.html
Your command should look like this:
call(["mkdir", "/mnt/share"])
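The same idea applied to the mount command from the question might look like the sketch below. The helper name build_mount_command is hypothetical, and the actual call is left commented out because it needs root privileges and a reachable share:

```python
def build_mount_command(share, mount_point, username):
    """Return the mount command as an argument list (hypothetical helper)."""
    # Each list element reaches mount as exactly one argument, so the
    # space in the share path needs no quotes or backslashes.
    return ["mount", "-t", "cifs", "-o", "username=" + username, share, mount_point]


cmd = build_mount_command("//10.32.135.87/root/Singapore Lab/SYMM", "/mnt/share", "kalair2")
print(cmd)
# To run it for real:
# from subprocess import call
# call(cmd)
```

Note that with a list, a backslash placed before the space would be passed to mount literally, since no shell is involved to interpret it.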
I am trying to learn to write MapReduce programs using Python's mrjob library. I am getting the following error:
Traceback:
dumping stdin to local file /tmp/pyes_mrjob.testuser.20131004.103251.998597/STDIN
Making directory hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.user.20131004.103251.998597/files/ on HDFS
> /usr/lib/hadoop-mapreduce/bin/hadoop fs -mkdir hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.testuser.20131004.103251.998597/files/
Traceback (most recent call last):
File "pyes_mrjob.py", line 34, in <module>
MRWordFrequencyCount.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 500, in run
mr_job.execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
self.run_job()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
runner.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
self._run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 236, in _run
self._upload_local_files_to_hdfs()
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
self._mkdir_on_hdfs(self._upload_mgr.prefix)
File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs
self.invoke_hadoop(['fs', '-mkdir', path])
File "/usr/local/lib/python2.7/dist-packages/mrjob/fs/hadoop.py", line 81, in invoke_hadoop
proc = Popen(args, stdout=PIPE, stderr=PIPE)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I executed the command manually and it works fine there, but when I try to run my program it does not work.
Since I have just started learning, can someone suggest which library I should choose? According to some blogs, some libraries have good documentation and others have better performance, and so on. I came across the post below, which looks rather old:
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
But many of these libraries have been updated recently. So can someone suggest a library I can start with?
I guess this problem is caused by the way mrjob calls "hadoop fs -mkdir": if the parent directory of the target directory doesn't exist, -mkdir fails. That means you have to use "hadoop fs -mkdir -p [path]". Ultimately, you will need to modify the mrjob library manually, in hadoop.py under your mrjob install path (mine is /usr/lib/python2.6/site-packages/mrjob), at line 271. Change:
self.invoke_hadoop(['fs', '-mkdir', path])
to
self.invoke_hadoop(['fs', '-mkdir', '-p', path])
Good Luck!
It looks like you set your HADOOP_HOME to "/usr/lib/hadoop-mapreduce". However, this is wrong; it should be set to "/usr/lib/hadoop".
Also, if you get an error saying that hadoop-streaming.jar could not be found, create a symlink to this jar in "/usr/lib/hadoop" as follows:
sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop
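The log in the question shows mrjob invoking /usr/lib/hadoop-mapreduce/bin/hadoop, and an OSError [Errno 2] from Popen typically means that executable does not exist. A minimal sketch for checking this before running a job follows. The helper names are hypothetical, and the bin/hadoop layout under HADOOP_HOME is an assumption inferred from that log line:

```python
import os


def hadoop_binary(hadoop_home):
    """Return the path where a hadoop launcher would sit under hadoop_home."""
    return os.path.join(hadoop_home, "bin", "hadoop")


def hadoop_binary_exists(hadoop_home):
    """True if a file exists at the assumed launcher location."""
    return os.path.isfile(hadoop_binary(hadoop_home))


home = os.environ.get("HADOOP_HOME", "")
print(hadoop_binary(home), hadoop_binary_exists(home))
```

If this prints False for the current HADOOP_HOME, pointing the variable at the directory that really contains bin/hadoop is the first thing to try.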