I'm using scrapy-0.16 for data extraction from LinkedIn.
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request
from scrapy import log
from linkedin.items import LinkedinItem, PersonProfileItem
from os import path
from linkedin.parser.HtmlParser import HtmlParser
import os
import urllib
from bs4 import UnicodeDammit
from linkedin.db import MongoDBClient
The spider is based on this project: https://github.com/pondering/scrapy-linkedin
This is the error I get:
Traceback (most recent call last):
File "C:\Users\TAWANE DUDEZ\Desktop\linkedin\linkedin\spiders\LinkedinSpider.py", line 6, in <module>
from linkedin.items import LinkedinItem, PersonProfileItem
ImportError: No module named linkedin.items
It cannot find the linkedin.items module.
My suspicion is that you're trying to run the scrapy crawl LinkedinSpider command from the wrong directory. Try navigating to C:\Users\TAWANE DUDEZ\Desktop\linkedin and then running the command again.
Since the crawler is now starting, you also need a running MongoDB instance before starting the crawl. The README of the GitHub project being used says to type mongod to start an instance. Just to check, you do have MongoDB and pymongo installed, right?
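If you want to verify both at once before kicking off the crawl, here is a minimal sketch (the localhost host and default port are assumptions, not from the thread):

import pymongo

client = pymongo.MongoClient("localhost", 27017)  # an ImportError here means pymongo is missing
print(client.server_info()["version"])  # raises ConnectionFailure if no mongod is running

If your pymongo is older than 2.4, use pymongo.Connection instead of MongoClient.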
Django Version: 3.1.5
folder structure (from the screenshot): the command file is C:\PythonProjects\DJANGO\myblogsite\blog\management\commands\create_data.py
So, I'm studying Django. When I try to generate random data for my project I get this error:
Traceback (most recent call last):
File "C:\PythonProjects\DJANGO\myblogsite\blog\management\commands\create_data.py", line 2, in <module>
from core.models import Category, Post, Comment
ModuleNotFoundError: No module named 'core'
Process finished with exit code 1
create_data.py
from django.core.management.base import BaseCommand
from core.models import Category, Post, Comment
from random import randint
import datetime
Does anybody have a clue how to deal with this problem?
Your create_data.py lives inside the blog app (see the path in the traceback), so there is no top-level core package to import from. Import the models from blog.models instead:
from django.core.management.base import BaseCommand
from blog.models import Category, Post, Comment
from random import randint
import datetime
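For reference, the command itself needs a Command class with a handle() method; a minimal sketch (the model field names 'name' and 'title' are hypothetical, substitute your own) could look like this:

from django.core.management.base import BaseCommand
from blog.models import Category, Post
from random import randint

class Command(BaseCommand):
    help = "Generate random test data for the blog"

    def handle(self, *args, **options):
        # 'name' and 'title' are hypothetical field names
        category, _ = Category.objects.get_or_create(name="General")
        for i in range(10):
            Post.objects.create(title="Post %d" % randint(0, 9999), category=category)
        self.stdout.write("Done")

You can then run it with python manage.py create_data from the project root.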
I'm using bs4 to crawl news data.
At first I wrote the crawler function in views.py, but it caused 504 errors because of the long loading time.
So I decided to crawl and save the data through the Django ORM in a new Python file named 'crawling.py', placed in the same directory as models.py.
My crawler imports the following:
# from django.portal import settings
from .models import *
import requests
from bs4 import BeautifulSoup
import urllib.request as req
import ssl
from bs4.builder import builder_registry
import time
but it raises the error below:
(project) macs-MacBook-Pro:portalpage mac$ python crawling.py
Traceback (most recent call last):
File "crawling.py", line 2, in <module>
from .models import *
ImportError: attempted relative import with no known parent package
I found a way to run my crawler from the root directory, but I will use crontab for batch jobs, so I would like to keep crawling.py inside the app directory.
How can I run my crawler smoothly on the back end?
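One common pattern for standalone scripts that use the ORM (a sketch, not from the thread; the settings module 'portalpage.settings' and the app name 'news' are assumptions, adjust them to your project) is to bootstrap Django at the top of crawling.py and switch to an absolute import:

import os
import django

# Point at the project's settings module *before* importing any models.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "portalpage.settings")
django.setup()

from news.models import *  # absolute import; the relative .models only works inside a package

def crawl():
    ...  # requests/BeautifulSoup logic goes here

if __name__ == "__main__":
    crawl()

crontab can then invoke python /path/to/crawling.py directly. Alternatively, a custom management command (like the one in the previous question) runs under manage.py and avoids the bootstrapping entirely.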
I want to run a wordcount program using Dataflow in Python 2.7 on the gcloud SDK.
I have tried two import paths for PipelineOptions:
1. from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
2. from apache_beam.pipeline import PipelineOptions
from apache_beam.pipeline import SetupOptions
but both approaches still show 'ImportError: No module named options.pipeline_options'.
Is there any solution in apache_beam or in Python 2.7?
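One thing worth checking (an assumption, not stated in the question): Beam releases before 2.0 kept these classes in apache_beam.utils, and the module only moved to apache_beam.options in 2.0, so an older SDK would raise exactly this ImportError. A version-tolerant sketch:

try:
    # Beam >= 2.0
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
except ImportError:
    # Beam < 2.0 shipped the same classes under apache_beam.utils
    from apache_beam.utils.pipeline_options import PipelineOptions, SetupOptions

Upgrading the SDK with pip install --upgrade apache-beam[gcp] should also make the first import path work as-is.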
I'm receiving this error when I add disqus to INSTALLED_APPS:
Error: No module named urllib.parse
I tracked this down to the following lines:
from django.utils.http import urlencode
from django.utils.six.moves.urllib.error import URLError
from django.utils.six.moves.urllib.request import (
ProxyHandler,
Request,
urlopen,
build_opener,
install_opener
)
I know that six.moves is not included with Django 1.4.8. Is there any substitute?
Thanks
Six is an external library, which Django includes for convenience.
You could try installing six separately and changing the imports; for example, change
from django.utils.six.moves.urllib.error import URLError
to
from six.moves.urllib.error import URLError
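Applied to the whole block above (assuming six is installed, e.g. with pip install six), the imports become:

from django.utils.http import urlencode
from six.moves.urllib.error import URLError
from six.moves.urllib.request import (
    ProxyHandler,
    Request,
    urlopen,
    build_opener,
    install_opener
)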
I compiled my prototype application with py2exe to check that it works as an exe, and ran into 0 errors until I went to start it. Nothing happens: a process starts with my app's name, it thinks for a few seconds, then nothing. No log file is generated. The app works great when run in the Python environment, but not as the compiled exe. My setup code is below. Any ideas?
from distutils.core import setup
import py2exe, sys, os
import matplotlib
import FileDialog
import dateutil

sys.argv.append('py2exe')

setup(
    windows=['ATLAS.pyw'],
    # matplotlib ships data files (fonts, mpl-data) that py2exe must copy into the bundle
    data_files=matplotlib.get_py2exe_datafiles(),
    options={"py2exe": {
        "includes": ["decimal", "datetime"],
        "packages": ["FileDialog", "dateutil"],
        "bundle_files": 2,  # 2 = bundle everything except the Python interpreter DLL
        "compressed": True,
    }},
    zipfile=None,
)
Hooks utilized in the Application:
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg, NavigationToolbar2TkAgg
from matplotlib.backend_bases import key_press_handler
from pandas.sandbox.qtpandas import DataFrameWidget
from matplotlib.widgets import LassoSelector
from tkFileDialog import askopenfilename
from matplotlib.figure import Figure
import matplotlib.image as mpimg
from PySide import QtGui, QtCore
from matplotlib.path import Path
import pandas.io.sql as psql
from numpy import nonzero
import tkMessageBox as mb
from pylab import *
import pyodbc
import sys
import ttk
SOLVED:
So, using quick fingers (and compiling it with PyInstaller with the --debug option) I screen-capped the quickly-closing console window that contained the Traceback:
WindowsError: [Error 3] The system cannot find the path specified: 'C:\\path\\dateutil\\zoneinfo/*.*'
The zoneinfo file was being saved in pytz instead of dateutil. A quick rename solved the problem.
The only remaining issue: if you want to compile with -F or --onefile, it will not work because of the same improper naming. I'm not quite sure how to fix that, though.
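For the --onefile case, one possible starting point (a sketch based on an assumption, not verified in the thread) is to locate the timezone archive that dateutil actually ships and hand it to the bundler explicitly as a data file, rather than relying on the rename:

import os
import dateutil.zoneinfo

# Directory holding dateutil's bundled timezone archive
zoneinfo_dir = os.path.dirname(dateutil.zoneinfo.__file__)
archives = [os.path.join(zoneinfo_dir, f)
            for f in os.listdir(zoneinfo_dir) if f.endswith('.tar.gz')]
print(archives)  # e.g. add (os.path.join('dateutil', 'zoneinfo'), archives) to data_files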