On my system, I've both python3 and python 2.7, scrapy only suports python2.7 but by debault my libraries are linked to python 3.4.
I am trying to run a basic example that comes with scrapy documentation that is:
#!/usr/bin/python2.7
import scrapy
class StackOverflowSpider(scrapy.Spider):
name = 'stackoverflow'
start_urls = ['http://stackoverflow.com/questions?sort=votes']
def parse(self, response):
for href in response.css('.question-summary h3 a::attr(href)'):
full_url = response.urljoin(href.extract())
yield scrapy.Request(full_url, callback=self.parse_question)
def parse_question(self, response):
yield {
'title': response.css('h1 a::text').extract()[0],
'votes': response.css('.question .vote-count-post::text').extract()[0],
'body': response.css('.question .post-text').extract()[0],
'tags': response.css('.question .post-tag::text').extract(),
'link': response.url,
}
To run this code, is suggested for the command:
scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.json
being stackoverflow_spider.py the above snippet.
The problem is that somehow this is calling python3,and since I am not explicitly calling a python version, not sure how to force the command to use python2.7 libs.
below is the error I get:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 9, in <module>
load_entry_point('Scrapy==1.0.5', 'console_scripts', 'scrapy')()
File "/usr/local/lib/python3.4/dist-packages/scrapy/cmdline.py", line 122, in execute
cmds = _get_commands_dict(settings, inproject)
File "/usr/local/lib/python3.4/dist-packages/scrapy/cmdline.py", line 46, in _get_commands_dict
cmds = _get_commands_from_module('scrapy.commands', inproject)
File "/usr/local/lib/python3.4/dist-packages/scrapy/cmdline.py", line 29, in _get_commands_from_module
for cmd in _iter_command_classes(module):
File "/usr/local/lib/python3.4/dist-packages/scrapy/cmdline.py", line 21, in _iter_command_classes
for obj in vars(module).itervalues():
AttributeError: 'dict' object has no attribute 'itervalues'
PS: I have installed scrapy under python 2.7
Related
When I run
python manage.py inspectdb --database=sybase_database
it ends with error message:
Database.register_converter(Database.DT_DECIMAL, util.typecast_decimal) AttributeError: module 'django.db.backends.utils' has no attribute 'typecast_decimal'
$ pip freeze
certifi==2021.10.8
chardet==3.0.4
defusedxml==0.7.1
Django==2.2.4
django-allauth==0.40.0
django-bootstrap-form==3.4
django-bootstrap3==15.0.0
django-crispy-forms==1.7.2
django-crudbuilder==0.2.7
django-debug-toolbar==1.10.1
django-filter==2.2.0
django-mssql-backend==2.8.1
django-tables2==2.4.1
idna==2.8
importlib-metadata==2.1.1
oauthlib==3.1.1
pyodbc==4.0.32
python-dateutil==2.8.2
python3-openid==3.2.0
pytz==2021.3
requests==2.21.0
requests-oauthlib==1.3.0
six==1.16.0
sqlany-django==1.13
sqlanydb==1.0.11
sqlparse==0.4.2
urllib3==1.24.3
zipp==3.6.0
Ubuntu 18.04
sqlany-django only supports up to Django 1.8.5 (Django 1.7.0 for inspectdb command).
It is no longer maintained; it was last updated in May 2016.
You can patch for later versions of Django in manage.py or an AppConfig.
# Django >= 2.0, for sqlany_django/base.py
import decimal
from django.db.backends import utils
utils.typecast_decimal = decimal.Decimal
# Django >= 1.11, for BaseDatabaseWrapper
from sqlany_django.base import DatabaseWrapper
DatabaseWrapper.client_class = lambda self, connection: None
DatabaseWrapper.creation_class = lambda self, connection: None
DatabaseWrapper.features_class = lambda self, connection: None
DatabaseWrapper.introspection_class = lambda self, connection: None
DatabaseWrapper.ops_class = lambda self, connection: None
# Django >= 1.7.1, for inspectdb
from django.db.backends import utils
from sqlany_django import base
class CursorWrapper(base.CursorWrapper, utils.CursorWrapper):
pass
base.CursorWrapper = CursorWrapper
References:
https://github.com/django/django/pull/2154 regarding CursorWrapper
https://github.com/django/django/pull/7206 regarding DatabaseWrapper
https://github.com/django/django/pull/8889 regarding typecast_decimal
https://github.com/sqlanywhere/sqlany-django/pull/15 regarding typecast_decimal
https://sqlanywhere-forum.sap.com/questions/37563/sqlany-django-driver-upgrade-to-django-32 requesting an update to sqlany_django (unanswered as of this post)
Stack traces:
Regarding DatabaseWrapper
Traceback (most recent call last):
...
File "/path/to/site-packages/django/core/management/commands/inspectdb.py", line 40, in handle_inspection
connection = connections[options['database']]
File "/path/to/site-packages/django/db/utils.py", line 207, in __getitem__
conn = backend.DatabaseWrapper(db, alias)
File "/path/to/site-packages/sqlany_django/base.py", line 447, in __init__
super(DatabaseWrapper, self).__init__(*args, **kwargs)
File "/path/to/site-packages/django/db/backends/base/base.py", line 102, in __init__
self.client = self.client_class(self)
TypeError: 'NoneType' object is not callable
Regarding CursorWrapper
Traceback (most recent call last):
...
File "/path/to/site-packages/django/core/management/commands/inspectdb.py", line 47, in handle_inspection
with connection.cursor() as cursor:
AttributeError: __enter__
Replaced typecast_decimal() with decimal.Decimal()
change base.py code like to
Database.register_converter("decimal", decoder(decimal.Decimal))
and in utils.py you can return
decimal.Decimal()
maybe help to you
I started with the following basic python script for using selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
path_bin='/usr/bin/firefox'
path_dr='/usr/local/bin/geckodriver'
profile = webdriver.FirefoxProfile()
binary=FirefoxBinary(path_bin)
self.driver = webdriver.Firefox(executable_path=path_dr,firefox_profile=profile,firefox_binary=binary)
self.driver.implicitly_wait(30)
Then, I tried executing it twice (first time without sudo and second time with sudo) as follows:
user4#pc-4:~/Scripts$ python test.py
Traceback (most recent call last):
File "test.py", line 2049, in <module>
self.driver = webdriver.Firefox(executable_path=path_dr,firefox_profile=profile,firefox_binary=binary)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 144, in __init__
self.service = Service(executable_path, log_path=log_path)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/service.py", line 44, in __init__
log_file = open(log_path, "a+") if log_path is not None and log_path != "" else None
IOError: [Errno 13] Permission denied: 'geckodriver.log'
user4#pc-4:~/Scripts$ sudo python test.py
Traceback (most recent call last):
File "test.py", line 2049, in <module>
self.driver = webdriver.Firefox(executable_path=path_dr,firefox_profile=profile,firefox_binary=binary)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 155, in __init__
keep_alive=True)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 183, in start_session
self.capabilities = response['value']
KeyError: 'value'
So, I have to understand the following:
As observed IOError: [Errno 13] Permission denied: 'geckodriver.log' is being observed when I run the program without a sudo. Is that really required?
Even though the browser window is opened, in spite of having resolved all the dependencies (I'm using Ubuntu 16.04) like geckodriver for Firefox, I'm getting the above error KeyError: 'value'. How can I resolve this? Do I have to make any changes in my code to avoid this?
I could use some helpful advice/tips/tweaks to get my program running.
P.S.: Digging into the web to learn on selenium, points to some architecture-specific issues as mentioned in here and here and none of them seems to have been solved and that concerns me. Would chrome be a better option?
Update: The following is the output of ls -alh:
-rw-r--r-- 1 root root 238K Aug 22 17:12 geckodriver.log
-rwxrwxrwx 1 user4 user 82K Aug 22 17:08 test.py
As you were trying to pass a couple of arguments when invoking webdriver.Firefox(), we need to consider that executable_path and firefox_binary both these arguments takes string type of arguments. So you need to pass the absolute path of geckodriver binary to executable_path and the absolute path of firefox binary to firefox_binary. Further you don't need to use binary=FirefoxBinary(path_bin). The following code block works fine on my Windows 8 machine:
Code is based on Python 3.6.1
from selenium import webdriver
path_bin=r'C:\Program Files\Mozilla Firefox\firefox.exe'
path_dr=r'C:\Utility\BrowserDrivers\geckodriver.exe'
profile = webdriver.FirefoxProfile()
driver = webdriver.Firefox(executable_path=path_dr,firefox_profile=profile,firefox_binary=path_bin)
driver.get('https://www.google.co.in')
As you are on ubuntu the following code block must work for you:
from selenium import webdriver
path_bin='/usr/bin/firefox'
path_dr='/usr/local/bin/geckodriver'
profile = webdriver.FirefoxProfile()
driver = webdriver.Firefox(executable_path=path_dr,firefox_profile=profile,firefox_binary=path_bin)
driver.get('https://www.google.co.in')
I had this same issue with
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 183, in start_session
self.capabilities = response['value']
KeyError: 'value'
So I have updated selenium and geckodriver to the latest version. This was the solution in my case.
I'm new to django (using python 2.7) and I was just trying to use heroku for the first time but I always get the following error:
remote: -----> Preparing static assets
remote: Collectstatic configuration error. To debug, run:
remote: $ heroku run python manage.py collectstatic --noinput
When I run that command, an import error regarding the django registration redux library shows up. I've had this problem before in Django and I fixed it by placing RequestSite under 'requests' and Site under 'models'. That solved the problem but the error still shows up in Heroku.
(venv) C:\Users\Carolina\Desktop\Coding\venv\project1>heroku run python manage.p
y collectstatic --noinput
Running python manage.py collectstatic --noinput on acla-acla... up, run.3645
Traceback (most recent call last):
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/_
_init__.py", line 350, in execute_from_command_line
utility.execute()
File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/_
_init__.py", line 324, in execute
django.setup()
File "/app/.heroku/python/lib/python2.7/site-packages/django/__init__.py", lin
e 18, in setup
apps.populate(settings.INSTALLED_APPS)
File "/app/.heroku/python/lib/python2.7/site-packages/django/apps/registry.py"
, line 115, in populate
app_config.ready()
File "/app/.heroku/python/lib/python2.7/site-packages/django/contrib/admin/app
s.py", line 22, in ready
self.module.autodiscover()
File "/app/.heroku/python/lib/python2.7/site-packages/django/contrib/admin/__i
nit__.py", line 26, in autodiscover
autodiscover_modules('admin', register_to=site)
File "/app/.heroku/python/lib/python2.7/site-packages/django/utils/module_load
ing.py", line 50, in autodiscover_modules
import_module('%s.%s' % (app_config.name, module_to_search))
File "/app/.heroku/python/lib/python2.7/importlib/__init__.py", line 37, in im
port_module
__import__(name)
File "/app/.heroku/python/lib/python2.7/site-packages/registration/admin.py",
line 2, in <module>
from django.contrib.sites.models import RequestSite
ImportError: cannot import name RequestSite
The thing is - that line doesn't exist. I went to venv/lib/site-packages/registration/admin.py and line 2 is this one:
from django.contrib import admin
from django.contrib.sites.requests import RequestSite # HERE
from django.contrib.sites.models import Site
from django.utils.translation import ugettext_lazy as _
from registration.models import RegistrationProfile
from registration.users import UsernameField
class RegistrationAdmin(admin.ModelAdmin):
actions = ['activate_users', 'resend_activation_email']
list_display = ('user', 'activation_key_expired')
raw_id_fields = ['user']
search_fields = ('user__{0}'.format(UsernameField()), 'user__first_name', 'user__last_name')
def activate_users(self, request, queryset):
"""
Activates the selected users, if they are not already
activated.
"""
for profile in queryset:
RegistrationProfile.objects.activate_user(profile.activation_key)
activate_users.short_description = _("Activate users")
def resend_activation_email(self, request, queryset):
"""
Re-sends activation emails for the selected users.
Note that this will *only* send activation emails for users
who are eligible to activate; emails will not be sent to users
whose activation keys have expired or who have already
activated.
"""
if Site._meta.installed:
site = Site.objects.get_current()
else:
site = RequestSite(request)
for profile in queryset:
if not profile.activation_key_expired():
profile.send_activation_email(site)
resend_activation_email.short_description = _("Re-send activation emails")
admin.site.register(RegistrationProfile, RegistrationAdmin)
This is what I get with pip freeze, just in case:
Django==1.9
django-crispy-forms==1.5.2
django-registration==2.0.3
django-registration-redux==1.2
django-tinymce==2.2.0
Pillow==3.0.0
requests==2.9.0
South==1.0.2
stripe==1.27.1
wheel==0.24.0
Anyone knows why this is happening? Thanks in advance!
EDIT ----
Ok, so the problem was the one mentioned by Daniel Roseman. The library is broken in pypi and I had to tell heroku to install it from github (where the package is fixed).
So, I went to my requirements.txt file and replaced this line:
django-registration-redux==1.2
with this one:
-e git://github.com/macropin/django-registration.git#egg=django-registration==1.2
(I also removed 'django-registration==2.0.3' because it is an old version of django-registration-redux and was creating problems).
Hope this helps people with the same issue!
It sounds like you edited the code for your locally installed copy of django-registration-redux. But that won't have any effect on Heroku, since the library will be installed directly from PyPI, according to the version in your requirements.txt.
If the library is really broken, you will need to fork it and point your requirements.txt to your fixed version. However, looking at the code on GitHub, it doesn't actually seem to be broken; you just need to update the version you are pointing to.
I am trying to get familiar with Flask-APScheduler plug-in through the following sample code: https://github.com/viniciuschiele/flask-apscheduler/blob/master/examples/jobs.py#L1
My project has the following structure:
backend
run.py
application
__init__.py
utilities
__init__.py
views
models
where,
backend>run.py is:
from application import app
app.run(debug=True)
from application import scheduler
scheduler.start()
backend>application>__init__.py is:
from flask import Flask
app = Flask(__name__)
from application.utilities.views import Config
from flask_apscheduler import APScheduler
app.config.from_object(Config())
scheduler = APScheduler()
scheduler.init_app(app)
backend>application>utilities>__init__.py is empty
backend>application>utilities>models.py is empty
backend>application>utilities>views.py is:
class Config(object):
JOBS = [
{
'id': 'job1',
'func': 'application:utilities:views:job1',
'args': (1, 2),
'trigger': {
'type': 'cron',
'second': 10
}
}
]
def job1(a, b):
print(str(a) + ' ' + str(b))
However, I get the following error:
(env)$ python run.py local
Traceback (most recent call last):
File "run.py", line 1, in <module>
from application import app
File "HOME/backend/application/__init__.py", line 106, in <module>
scheduler.init_app(app)
File "/home/xxxxxx/.anaconda/envs/env/lib/python2.7/site-packages/flask_apscheduler/scheduler.py", line 73, in init_app
self.__load_jobs(app)
File "/home/xxxxxx/.anaconda/envs/env/lib/python2.7/site-packages/flask_apscheduler/scheduler.py", line 136, in __load_jobs
self.__load_job(job, app)
File "/home/xxxxxx/.anaconda/envs/env/lib/python2.7/site-packages/flask_apscheduler/scheduler.py", line 159, in __load_job
func = ref_to_obj(func)
File "/home/xxxxxx/.anaconda/envs/env/lib/python2.7/site-packages/apscheduler/util.py", line 264, in ref_to_obj
raise LookupError('Error resolving reference %s: error looking up object' % ref)
LookupError: Error resolving reference application:utilities:views:job1: error looking up object
Does my structure look like ok? Have a placed the right code in the right place? What should I change to make it work?
Your reference should only have one colon (":"). The colon separates the required import from the variable that has to be looked up. So:
'func': 'application.utilities.views:job1'
I can run python using Beautiful Soup and Mechanized, but for some reason when I try to use Spray-Scraper it just doesn't work. Here's an example of what happens when I attempt to test the scraper with a tutorial:
Project name & BOT name = "tutorial"
The following scripts are the items.py and settings.py that I used.
items.py
import scrapy
class DmozSpider(scrapy.Spider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
filename = response.url.split("/")[-2]
with open(filename, 'wb') as f:
f.write(response.body)
settings.py
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'
CMD
C:\Users\Turbo>scrapy startproject tutorial
New Scrapy project 'tutorial' created in:
C:\Users\Turbo\tutorial
You can start your first spider with:
cd tutorial
scrapy genspider example example.com
C:\Users\Turbo>cd tutorial
C:\Users\Turbo\tutorial>scrapy crawl dmoz
Traceback (most recent call last):
File "C:\Python27\Scripts\scrapy-script.py", line 9, in <module>
load_entry_point('scrapy==0.24.4', 'console_scripts', 'scrapy')()
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 89, in _run_print_help
func(*a, **kw)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 150, in _run_command
cmd.run(args, opts)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\commands\cr
awl.py", line 58, in run
spider = crawler.spiders.create(spname, **opts.spargs)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\spidermanag
er.py", line 44, in create
raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: dmoz'
The problem is that you are putting your spider into the items.py.
Instead, create a package spiders, inside it create a dmoz.py and put your spider into it.
See more at Our first Spider paragraph of the tutorial.