Initiating a Scrapy crawl as a Django management command

I’ve got a project that connects Django and Scrapy, and I want to start a spider crawl through a Django management command. The idea is to run it periodically via cron. I’m using Django 1.11, Python 3.5 and Scrapy 1.5.
Here's the code for my custom management command in the ‘~/djangoscrapy/src/app/management/commands/run_sp.py’ file:
from django.core.management.base import BaseCommand
from scrapy.cmdline import execute
import os
from django.conf import settings
os.chdir(settings.CRAWLER_PATH)
class Command(BaseCommand):
    def run_from_argv(self, argv):
        print ('In run_from_argv')
        self._argv = argv[:]
        return self.execute()
    def handle(self, *args, **options):
        execute(self._argv[1:])
When I run
$ python manage.py run_sp crawl usc
I get this error:
In run_from_argv
Traceback (most recent call last):
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/home/greendot/lib/python2.7/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/home/greendot/lib/python2.7/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/greendot/webapps/scraper3/src/app/management/commands/run_sp.py", line 15, in run_from_argv
return self.execute()
File "/home/greendot/lib/python2.7/django/core/management/base.py", line 314, in execute
if options['no_color']:
KeyError: u'no_color'
My project structure is as follows:
SRC
├── app
│   ├── __init__.py
│   ├── admin.py
│   ├── management
│   │   └── commands
│   │       ├── __init__.py
│   │       └── run_sp.py
│   ├── models.py
│   └── views.py
├── example_bot
│   ├── dbs
│   ├── example_bot
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       ├── __pycache__
│   │       └── usc.py
│   └── scrapy.cfg
├── example_project
│   ├── __init__.py
│   ├── __pycache__
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── manage.py
I've added the line below to my Django settings file so that, when the management command executes, the working directory is 'example_bot'. The 'scrapy crawl' command is only available inside the Scrapy project directory, not in BASE_DIR.
CRAWLER_PATH = os.path.join(BASE_DIR, 'example_bot/')
I can't seem to get this to work, though, so any help is much appreciated.
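One possible way around both problems, sketched under assumptions rather than offered as a confirmed fix: the KeyError appears because the overridden run_from_argv calls self.execute() with no keyword arguments, while Django 1.11's BaseCommand.execute() reads options['no_color'] (line 314 of base.py in the traceback). If run_from_argv is left alone, Django parses the command line and fills that options dict itself. The crawl can then be started as a child process whose working directory is the Scrapy project, which also removes the need for os.chdir() at import time. The helper names build_crawl_argv and run_crawl are hypothetical, and the code assumes the scrapy executable is on the PATH that cron uses.

```python
import subprocess


def build_crawl_argv(spider, scrapy_args=()):
    """Return the argv for `scrapy crawl <spider>` followed by any extra options."""
    return ["scrapy", "crawl", spider] + list(scrapy_args)


def run_crawl(crawler_path, spider, scrapy_args=(), runner=subprocess.run):
    """Run the crawl with crawler_path (the folder holding scrapy.cfg) as the
    working directory of the child process, instead of calling os.chdir()
    when the command module is imported.

    runner defaults to subprocess.run; check=True makes it raise
    CalledProcessError when scrapy exits with a non-zero status.
    """
    argv = build_crawl_argv(spider, scrapy_args)
    return runner(argv, cwd=crawler_path, check=True)
```

Inside the management command you would declare the spider name in add_arguments() (for example parser.add_argument('spider')) and call run_crawl(settings.CRAWLER_PATH, options['spider']) from handle(), invoking it as python manage.py run_sp usc. The runner parameter only exists so the helper can be exercised without Scrapy installed.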

Related

Celery Cannot find a module in my Django project

I have a Django 2.0 project using Celery 4.2.1 and redis 2.10.6. The Django project has two apps, memorabilia and face_recognition. Everything, including running tasks, works on my development machine. I pushed everything to my git server, then installed the apps on my laptop from git, updated all requirements, etc. Both are Ubuntu machines. I am not using django-celery.
When I try to run
celery -A MemorabiliaJSON worker -l debug
I get the exception ModuleNotFoundError: No module named 'face_recognition.tasks'
I am not sure how to fix this, as the same code base runs fine on my development machine.
My file structure is:
├── celery.sh
├── face_recognition
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   ├── models.py
│   ├── __pycache__
│   ├── tasks.py
│   ├── tests.py
│   └── views.py
├── __init__.py
├── manage.py
├── memorabilia
│   ├── admin.py
│   ├── apps.py
│   ├── fields.py
│   ├── fixtures
│   ├── __init__.py
│   ├── logs
│   ├── migrations
│   ├── models.py
│   ├── __pycache__
│   ├── storage.py
│   ├── tasks.py
│   ├── templates
│   ├── tests
│   ├── urls.py
│   ├── validators.py
│   ├── views.py
│   └── widgets.py
├── MemorabiliaJSON
│   ├── celery.py
│   ├── default_images
│   ├── documents
│   ├── __init__.py
│   ├── __pycache__
│   ├── settings
│   ├── static
│   ├── urls.py
│   ├── views.py
│   └── wsgi.py
├── __pycache__
│   ├── celery.cpython-36.pyc
│   └── __init__.cpython-36.pyc
├── requirements.txt
└── tests
MemorabiliaJSON/celery.py
# http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from django.apps import apps
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MemorabiliaJSON.settings.tsunami')
app = Celery('MemorabiliaJSON')
app.config_from_object('django.conf:settings', namespace='CELERY')
#app.autodiscover_tasks(lambda: [n.name for n in apps.get_app_configs()])
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
face_recognition/__init__.py
default_app_config = 'face_recognition.apps.FaceRecognitionConfig'
memorabilia/__init__.py
default_app_config = 'memorabilia.apps.MemorabiliaConfig'
INSTALLED_APPS has these two apps:
'memorabilia.apps.MemorabiliaConfig',
'face_recognition.apps.FaceRecognitionConfig',
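The question doesn't include a diagnosis, but one thing worth checking (an assumption, not a confirmed cause) is which file the name face_recognition actually resolves to on the laptop. PyPI hosts a face recognition library whose import name is also face_recognition; if it is installed in the laptop's virtualenv, it could shadow the app and would have no tasks submodule. The hypothetical helper below uses importlib.util.find_spec to report where a top-level module name would be imported from.

```python
import importlib.util


def where_is(module_name):
    """Return the file a top-level module name would be imported from,
    or None if the name cannot be found on sys.path.

    Only pass top-level names: for dotted names, find_spec imports the parent.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # For a regular package this is its __init__.py; for namespace
    # packages it may not be a real file path.
    return spec.origin
```

Run where_is("face_recognition") and where_is("memorabilia") in the same virtualenv the worker uses; if the first path points into site-packages instead of your project directory, that would explain why face_recognition.tasks cannot be found.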

google cloud pubsub ImportError: cannot import name types

I wrote a small program for the Google App Engine Python standard environment using google-cloud-pubsub, and I get this error:
ImportError: cannot import name types
I have also seen that this problem is reported as still unsolved. Has anyone managed to use Pub/Sub in the standard environment?
I installed the library with: pip install -t lib google-cloud-pubsub
In appengine_config.py:
from google.appengine.ext import vendor
vendor.add('lib')
The error when accessing the App Engine application:
Traceback (most recent call last):
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/7894e0c59273b2b7/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 240, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/7894e0c59273b2b7/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 299, in _LoadHandler
handler, path, err = LoadObject(self._handler)
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/7894e0c59273b2b7/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 85, in LoadObject
obj = __import__(path[0])
File "/base/data/home/apps/e~pqcloud-sp/agg2:20180703t012034.410851540172441919/service_main.py", line 5, in <module>
from google.cloud.pubsub_v1 import PublisherClient
File "/base/data/home/apps/e~pqcloud-sp/agg2:20180703t012034.410851540172441919/lib/google/cloud/pubsub_v1/__init__.py", line 17, in <module>
from google.cloud.pubsub_v1 import types
ImportError: cannot import name types
app.yaml:
runtime: python27
api_version: 1
threadsafe: true
service: agg2
handlers:
- url: .*
  script: service_main.app
libraries:
- name: webapp2
  version: "2.5.1"
- name: jinja2
  version: latest
skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\..*$
service_main.py:
import os
import logging
import webapp2
from google.cloud.pubsub_v1 import PublisherClient
logger = logging.getLogger('service_main')
logger.setLevel(logging.WARNING)
class ServiceTaskMainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write('test')
config = {
    'webapp2_extras.sessions': {
        'secret_key': 'YOUR_SECRET_KEY'
    }
}
MAIN_ROUTE = [
    webapp2.Route('/', ServiceTaskMainHandler, name='main'),
]
app = webapp2.WSGIApplication(MAIN_ROUTE, debug=True, config=config)
tree lib/google/cloud/pubsub_v1
lib/google/cloud/pubsub_v1
├── exceptions.py
├── exceptions.pyc
├── futures.py
├── futures.pyc
├── gapic
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── publisher_client_config.py
│   ├── publisher_client_config.pyc
│   ├── publisher_client.py
│   ├── publisher_client.pyc
│   ├── subscriber_client_config.py
│   ├── subscriber_client_config.pyc
│   ├── subscriber_client.py
│   └── subscriber_client.pyc
├── _gapic.py
├── _gapic.pyc
├── __init__.py
├── __init__.pyc
├── proto
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── pubsub_pb2_grpc.py
│   ├── pubsub_pb2_grpc.pyc
│   ├── pubsub_pb2.py
│   └── pubsub_pb2.pyc
├── publisher
│   ├── batch
│   │   ├── base.py
│   │   ├── base.pyc
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── thread.py
│   │   └── thread.pyc
│   ├── client.py
│   ├── client.pyc
│   ├── exceptions.py
│   ├── exceptions.pyc
│   ├── futures.py
│   ├── futures.pyc
│   ├── __init__.py
│   └── __init__.pyc
├── subscriber
│   ├── client.py
│   ├── client.pyc
│   ├── futures.py
│   ├── futures.pyc
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── message.py
│   ├── message.pyc
│   ├── _protocol
│   │   ├── bidi.py
│   │   ├── bidi.pyc
│   │   ├── dispatcher.py
│   │   ├── dispatcher.pyc
│   │   ├── heartbeater.py
│   │   ├── heartbeater.pyc
│   │   ├── helper_threads.py
│   │   ├── helper_threads.pyc
│   │   ├── histogram.py
│   │   ├── histogram.pyc
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── leaser.py
│   │   ├── leaser.pyc
│   │   ├── requests.py
│   │   ├── requests.pyc
│   │   ├── streaming_pull_manager.py
│   │   └── streaming_pull_manager.pyc
│   ├── scheduler.py
│   └── scheduler.pyc
├── types.py
└── types.pyc
Your code fails because App Engine Standard's Python 2.7 runtime does not support the Pub/Sub Cloud Client Library, only the Pub/Sub API Client Library. There are some new code samples showing how to use Pub/Sub with App Engine Standard:
import base64
from googleapiclient.discovery import build
service = build('pubsub', 'v1')
topic_path = 'projects/{your_project_id}/topics/{your_topic}'
data = b'Hello, Pub/Sub'  # example payload; the API expects it base64-encoded
service.projects().topics().publish(
    topic=topic_path, body={
        "messages": [{
            "data": base64.b64encode(data).decode('ascii')
        }]
    }).execute()
Update: both the GAE (Google App Engine) Standard and GAE Flexible Python 3 runtimes support the Cloud Pub/Sub Client Library.
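The REST publish call expects each message's data field as a base64-encoded string, which is why the answer's snippet encodes data. A small helper can build the request body for several messages at once. build_publish_body is a hypothetical name, and the optional attributes argument maps to the string key/value attributes field of a Pub/Sub message; the code is written to run on both the python27 runtime and Python 3.

```python
import base64


def build_publish_body(payloads, attributes=None):
    """Build the request body for projects.topics.publish.

    payloads: iterable of bytes, one message per item.
    attributes: optional dict of string keys/values attached to every message.
    """
    messages = []
    for payload in payloads:
        # base64 output is ASCII; decoding keeps the body JSON-serialisable on Python 3
        message = {"data": base64.b64encode(payload).decode("ascii")}
        if attributes:
            message["attributes"] = dict(attributes)
        messages.append(message)
    return {"messages": messages}
```

With the discovery-based client from the answer, the result would be passed as body=build_publish_body([data]) to service.projects().topics().publish(topic=topic_path, ...).execute().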

Sphinx fails when generating documentation for Django project

I'm trying to automatically generate documentation for my Django project using Sphinx with the autodoc and napoleon extensions.
Using sphinx-quickstart I've created the following structure:
MyDjangoProject
├── __init__.py
├── config
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── docs
│   ├── Makefile
│   ├── build
│   └── source
│       ├── _static
│       ├── _templates
│       ├── conf.py
│       └── index.rst
├── myfirstapp
│   ├── __init__.py
│   ├── models.py
│   └── views.py
├── mysecondapp
│   ├── __init__.py
│   ├── models.py
│   └── views.py
...
I've customized docs/source/conf.py to reflect my project structure.
import os
import sys
proj_folder = os.path.realpath(
    os.path.join(os.path.dirname(__file__), '../..'))
sys.path.append(proj_folder)
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')
import django
django.setup()
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.intersphinx', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode']
# The rest of the default configuration...
Then, from the root of my project, I run
sphinx-apidoc -f -o docs/source .
which adds a .rst file for each module to docs/source.
Finally, I go to MyDjangoProject and run make html. This fails with an error for each module saying:
Traceback (most recent call last):
File "/Users/Oskar/git/MyDjangoProject/venv/lib/python2.7/site-packages/sphinx/ext/autodoc.py", line 551, in import_object
__import__(self.modname)
ImportError: No module named MyDjangoProject.myfirstapp
What am I doing wrong?
Since you have added the MyDjangoProject folder itself to the Python path, its apps are importable as top-level modules: refer to myfirstapp as myfirstapp, not as MyDjangoProject.myfirstapp.
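The dotted name autodoc has to import depends on which directory is on sys.path. The hypothetical helper below computes that name for a file, given one sys.path entry. It shows why, with MyDjangoProject itself on the path (as in the conf.py above), the module is myfirstapp.models, while MyDjangoProject.myfirstapp.models would only resolve if the parent folder were the path entry.

```python
import os


def dotted_name(file_path, path_entry):
    """Return the import name of file_path as seen from a sys.path entry."""
    rel = os.path.relpath(os.path.abspath(file_path), os.path.abspath(path_entry))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("%s is not under %s" % (file_path, path_entry))
    parts = os.path.splitext(rel)[0].split(os.sep)
    # A package is imported by its directory name, not by __init__
    if parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)
```

For example, dotted_name("/src/MyDjangoProject/myfirstapp/models.py", "/src/MyDjangoProject") gives "myfirstapp.models".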

Flask Blueprint; No module found named

I'm trying to get Flask Blueprints running in Docker, but I'm having trouble registering the Blueprints correctly.
I have the following structure:
├── docker-compose.yml
├── nginx
│   ├── Dockerfile
│   └── sites-enabled
│       └── flask_project
└── web
    ├── Dockerfile
    ├── __init__.py
    ├── app.py
    ├── modules
    │   ├── __init__.py
    │   └── page
    │       ├── __init__.py
    │       ├── forms.py
    │       ├── models.py
    │       └── views.py
    ├── requirements.txt
    ├── static
    │   ├── css
    │   │   ├── bootstrap.min.css
    │   │   └── main.css
    │   ├── img
    │   └── js
    │       └── bootstrap.min.js
    └── templates
        ├── _base.html
        └── index.html
app.py contains:
from flask import Flask
from web.modules.page import simple_page
app = Flask(__name__)
app.register_blueprint(simple_page)
if __name__ == '__main__':
    print app.url_map
    app.run(debug=True)
views.py contains:
from flask import Blueprint
simple_page = Blueprint('simple_page', __name__,
                        template_folder='templates')
@simple_page.route('/')
def index():
    return "Hello world"
page/__init__.py:
from web.modules.page.views import simple_page
The other __init__.py files are empty.
The console gives an ImportError: No module named web.modules.page
Thanks for your time.
It looks like a project structure problem; you can use this as a reference: https://www.digitalocean.com/community/tutorials/how-to-structure-large-flask-applications
Here is my own layout; I hope it helps:
├── app
│   ├── __init__.py
│   ├── main
│   │   ├── __init__.py
│   │   └── views.py
│   ├── models
│   │   └── __init__.py
│   ├── static
│   │   ├── css
│   │   ├── js
│   │   ├── img
│   │   └── file
│   └── templates
│       └── index.html
└── master.py
app/__init__.py
from flask import Flask
from app.main import main
def create_app():
    app = Flask(__name__)
    app.register_blueprint(main)
    return app
app/main/__init__.py
from flask import Blueprint
main = Blueprint('main', __name__)
# imported last so that views.py can import `main` without a circular-import error
from app.main import views
master.py
from app import create_app
if __name__ == '__main__':
    app = create_app()
    app.run(host='0.0.0.0', port=8000, threaded=True)
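A likely reason for the original ImportError, offered as an assumption since the question doesn't say how app.py is started: when a script is run as python app.py, Python puts the script's own directory at the front of sys.path. If that directory is web/ itself, the top-level name web is not importable, so from web.modules.page import simple_page fails. Starting it from the parent directory as python -m web.app puts the parent on sys.path instead. The self-contained sketch below builds a throwaway package and tries both launch styles; try_both_launch_styles and the package name are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Source of the throwaway app.py: it only tries to import its own package.
APP_TEMPLATE = (
    "try:\n"
    "    import {pkg}\n"
    "    print('ok')\n"
    "except ImportError:\n"
    "    print('failed')\n"
)


def try_both_launch_styles(pkg="web"):
    """Create <tmp>/<pkg>/{__init__.py, app.py} and start app.py two ways."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, pkg)
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "app.py"), "w") as f:
            f.write(APP_TEMPLATE.format(pkg=pkg))

        def run(args, cwd):
            done = subprocess.run([sys.executable] + args, cwd=cwd,
                                  stdout=subprocess.PIPE, universal_newlines=True)
            return done.stdout.strip()

        return {
            # `python app.py` inside the package: sys.path[0] is pkg_dir itself
            "script": run(["app.py"], pkg_dir),
            # `python -m <pkg>.app` from the parent: sys.path[0] is the parent
            "module": run(["-m", pkg + ".app"], root),
        }
```

Calling try_both_launch_styles() should report "failed" for the script style and "ok" for the module style, mirroring the difference between python app.py inside web/ and python -m web.app from its parent.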

Import django setting in a app for test

I would like to import the Django settings in API_script.py, which is in the API app.
The settings are in Agora.settings.
Here is my API_script.py in API:
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "Agora.settings")
from django.contrib.auth.models import User
import django
from django.db.models.loading import cache as model_cache
from Profile.models import Profile_User
try:
    django.setup()
except:
    pass
def check_profile_exist(token):
    print(token)
Here is the error I get:
Traceback (most recent call last):
File "/home/bussiere/WorkspaceSafe/Agora/API/API_script.py", line 3, in <module>
from django.contrib.auth.models import User
File "/usr/local/lib/python3.4/dist-packages/django/contrib/auth/__init__.py", line 7, in <module>
from django.middleware.csrf import rotate_token
File "/usr/local/lib/python3.4/dist-packages/django/middleware/csrf.py", line 14, in <module>
from django.utils.cache import patch_vary_headers
File "/usr/local/lib/python3.4/dist-packages/django/utils/cache.py", line 26, in <module>
from django.core.cache import caches
File "/usr/local/lib/python3.4/dist-packages/django/core/cache/__init__.py", line 34, in <module>
if DEFAULT_CACHE_ALIAS not in settings.CACHES:
File "/usr/local/lib/python3.4/dist-packages/django/conf/__init__.py", line 48, in __getattr__
self._setup(name)
File "/usr/local/lib/python3.4/dist-packages/django/conf/__init__.py", line 44, in _setup
self._wrapped = Settings(settings_module)
File "/usr/local/lib/python3.4/dist-packages/django/conf/__init__.py", line 92, in __init__
mod = importlib.import_module(self.SETTINGS_MODULE)
File "/usr/lib/python3.4/importlib/__init__.py", line 109, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: No module named 'Agora'
And here is my file tree:
.
├── Agora
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-34.pyc
│   │   ├── settings.cpython-34.pyc
│   │   ├── urls.cpython-34.pyc
│   │   └── wsgi.cpython-34.pyc
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── API
│   ├── admin.py
│   ├── API_script.py
│   ├── __init__.py
│   ├── migrations
│   │   ├── __init__.py
│   │   └── __pycache__
│   │       └── __init__.cpython-34.pyc
│   ├── models.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── API_script.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   ├── models.cpython-34.pyc
│   │   └── views.cpython-34.pyc
│   ├── tests.py
│   ├── unit_test.py
│   └── views.py
├── Contact
│   ├── admin.py
│   ├── __init__.py
│   ├── models.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   └── models.cpython-34.pyc
│   ├── tests.py
│   └── views.py
├── Dockerfile
├── generateadm.py
├── IMG_20150928_105102.jpg
├── __init__.py
├── manage.py
├── Message
│   ├── admin.py
│   ├── __init__.py
│   ├── models.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   ├── models.cpython-34.pyc
│   │   └── views.cpython-34.pyc
│   ├── tests.py
│   └── views.py
├── Mock
│   ├── admin.py
│   ├── __init__.py
│   ├── models.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   ├── models.cpython-34.pyc
│   │   └── views.cpython-34.pyc
│   ├── tests.py
│   └── views.py
├── Profile
│   ├── admin.py
│   ├── __init__.py
│   ├── models.py
│   ├── profile_script.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   ├── models.cpython-34.pyc
│   │   └── profile_script.cpython-34.pyc
│   ├── tests.py
│   └── views.py
├── Queue
│   ├── admin.py
│   ├── __init__.py
│   ├── models.py
│   ├── __pycache__
│   │   ├── admin.cpython-34.pyc
│   │   ├── __init__.cpython-34.pyc
│   │   └── models.cpython-34.pyc
│   ├── tests.py
│   └── views.py
├── requierement.txt
├── result.txt
└── runserver.sh
Regards and thanks
Have you appended the Django project path to Python's path?
For example:
import os, sys
BASE_PATH="/location/folder/where/manage.py/lives"
sys.path.append(BASE_PATH)
os.environ['DJANGO_SETTINGS_MODULE'] = 'Agora.settings'
While the sys.path issue pointed out by other answers is probably your current problem, for your use case (a script that does "something" with an app) a Django custom management command is better suited.
Setting up a custom command is easy:
Create the path management/commands in your API folder. Do not forget to add empty __init__.py files in both management and commands folders.
Then create a Python module named for example apiscript.py inside the management/commands folder, with this content:
from django.core.management.base import BaseCommand, CommandError
from Profile.models import Profile_User
class Command(BaseCommand):
    help = 'Describe the purpose of your script'
    def handle(self, *args, **options):
        # do something with Profile_User model
        p = Profile_User.objects.get(pk=1)
You have all the Django machinery already set up (no need to call django.setup()) and you can call your script with:
./manage.py apiscript
Most likely, the Agora project is not on the Python path.
There are two ways to add it, depending on your situation.
First: depending on your OS, you can symlink the Agora project into a directory on the Python path. This is easy on Linux and OS X and harder on Windows. It will be something like:
ln -s /path/to/Agora /usr/local/lib/python2.7/site-packages/Agora
Secondly: Add the following code before your application code:
import sys
import os
agora_path = os.path.join('/path/to/library')
sys.path.append(agora_path)
# now add your code
# ...
I used your ideas and made it location-agnostic:
import os
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)
print(BASE_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "Agora.settings")
Thanks
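The location-agnostic snippet above can be wrapped in a small reusable helper. ensure_project_on_path and project_root are hypothetical names; the sketch assumes, as in the tree from the question, that the script sits one directory below the folder holding manage.py and the Agora settings package. It uses sys.path.insert(0, ...) instead of append() so the project wins over any same-named installed module, and skips the insert when the directory is already present.

```python
import os
import sys


def project_root(script_file, levels_up=2):
    """Return the directory `levels_up` dirname() calls above script_file.

    For .../Agora/API/API_script.py and levels_up=2 this is the folder
    holding manage.py and the Agora settings package.
    """
    path = os.path.abspath(script_file)
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


def ensure_project_on_path(script_file, settings_module="Agora.settings"):
    """Put the project root first on sys.path (only once) and point Django at the settings."""
    root = project_root(script_file)
    if root not in sys.path:
        sys.path.insert(0, root)
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return root
```

Call ensure_project_on_path(__file__), then import django and run django.setup(), and only after that import models such as Profile.models; importing models before setup() typically raises AppRegistryNotReady.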