Importing CSV to Django and settings not recognised

So I'm getting to grips with Django, or trying to. I have some code that isn't dependent on being called by the webpage; it's designed to populate the database with information. Eventually it will be set up as a cron job to run overnight. This is the first crack at it, which does an initial population (once I have that working, I'll move to an add structure, where only new records are pushed). I'm using Python 2.7, Django 1.5 and Sqlite3. When I run this code, I get
Requested setting DATABASES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
That seems fairly obvious, but I've spent a couple of hours now trying to work out how to adjust that setting. How do I call / open a connection / whatever the right terminology is here? I have a number of functions like this that will be scheduled jobs, and this has been frustrating me all afternoon.
import urllib2
import csv
import requests
from django.db import models
from gmbl.models import Match

master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))

for row in data:
    current_match = Match(matchdate=row[1],
                          hometeam=row[2],
                          awayteam=row[3],
                          homegoals=row[4],
                          awaygoals=row[5],
                          homeshots=row[10],
                          awayshots=row[11],
                          homeshotsontarget=row[12],
                          awayshotsontarget=row[13],
                          homecorners=row[16],
                          awaycorners=row[17])
    current_match.save()
I had originally started out with http://django-csv-importer.readthedocs.org/en/latest/, but I hit the same error there, and the documentation doesn't make much sense when trying to debug it. When I tried calling settings.configure() in the function, Python said it didn't exist; presumably I had to import it, but I couldn't make that work.

Make sure Django and your project are on the PYTHONPATH; then you can do:
import urllib2
import csv
import requests
from django.core.management import setup_environ
from django.db import models
from yoursite import settings

setup_environ(settings)

from gmbl.models import Match

master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
# ... your code ...
Reference: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
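Note: setup_environ was deprecated and later removed (in Django 1.6); if you upgrade past 1.5, the equivalent standalone-script boilerplate on Django 1.7+ is roughly this sketch (assuming your project is still called yoursite):

import os
import django

# Tell Django which settings module to use before touching the ORM.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'yoursite.settings')
django.setup()  # Django 1.7+: loads settings and populates the app registry

from gmbl.models import Match  # import models only after setup()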
Hope it helps!

Related

How to configure firebase admin at Django server?

I am trying to add custom tokens for user authentication (phone number and password), and as per the reference documents I would like to configure the server to generate a custom token.
I have installed it: $ sudo pip install firebase-admin
and also set up an environment variable: export GOOGLE_APPLICATION_CREDENTIALS="[to json file at my server]"
I am using a Django project on my server, where I have created all my APIs.
I am stuck at the point where it says to initialize the app:
default_app = firebase_admin.initialize_app()
Where should I write the above statement within my Django files? And how should I create an endpoint to get a custom token?
pip install firebase-admin
The credentials.json file contains private keys, so you can't add it to your project directly. If you're using git and want to keep this file in your project folder, you must add its name to your .gitignore.
Set the environment variable in your operating system. On macOS or Linux distributions you can use the export below; to set a variable on Windows, see https://www.computerhope.com/issues/ch000549.htm.
$ export GOOGLE_APPLICATION_CREDENTIALS='/path/to/credentials.json'
This part is important: the google package (which comes with the firebase_admin package) checks several places for credentials, one of them being os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'). If you set this variable, you don't need anything else to initialize Firebase; otherwise you have to supply the credentials manually.
To initialize Firebase, look at the "Set up configurations" section of the quickstart (https://firebase.google.com/docs/firestore/quickstart#initialize).
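If you don't want to rely on the environment variable, a minimal sketch of manual initialization (the path is a placeholder):

import firebase_admin
from firebase_admin import credentials

# Load the service-account key explicitly instead of relying on
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
cred = credentials.Certificate('/path/to/credentials.json')
firebase_admin.initialize_app(cred)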
Create a file named “firebase.py”.
$ touch firebase.py
Now we can use the firebase_admin package for querying. Our firebase.py looks like this:
import time
from datetime import timedelta
from uuid import uuid4

from firebase_admin import firestore, initialize_app

__all__ = ['send_to_firebase', 'update_firebase_snapshot']

initialize_app()


def send_to_firebase(raw_notification):
    db = firestore.client()
    start = time.time()
    db.collection('notifications').document(str(uuid4())).create(raw_notification)
    end = time.time()
    spend_time = timedelta(seconds=end - start)
    return spend_time


def update_firebase_snapshot(snapshot_id):
    start = time.time()
    db = firestore.client()
    db.collection('notifications').document(snapshot_id).update(
        {'is_read': True}
    )
    end = time.time()
    spend_time = timedelta(seconds=end - start)
    return spend_time
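To address the endpoint part of the question: firebase_admin.auth can mint custom tokens, and a plain Django view can return one. A minimal sketch, assuming the Django user's primary key doubles as the Firebase uid (an assumption; adjust to your user model):

from django.http import JsonResponse
from firebase_admin import auth

def firebase_token(request):
    # Assumption: the authenticated Django user's pk is used as the Firebase uid.
    uid = str(request.user.pk)
    custom_token = auth.create_custom_token(uid)  # returns bytes
    return JsonResponse({'token': custom_token.decode('utf-8')})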
Reference: https://medium.com/@canadiyaman/how-to-use-firebase-with-django-project-34578516bafe

How to pass custom parameters (such as -o) to the scrapy crawler

I'm currently working on a Python 2.7 / Scrapy 1.8 project.
I work within a Docker container and use a launchable.py:
import scrapy
from scrapy.crawler import CrawlerProcess
from spiders import addonsimilartechSpider, similartechSpider
process = CrawlerProcess()
process.crawl(similartechSpider.SimilarTechSpider)
process.crawl(addonsimilartechSpider.AddonSimilarSpider)
process.start()
I used to start my spider like this:
scrapy crawl <nameofmyspider> -o output.xlsx
I installed scrapy-xlsx and have used it until now. Now that I have my launchable.py, I don't know how to pass 'custom' arguments through the scrapy crawler (not the spider).
I understand the difference between scrapy settings and spider settings, so:
process.crawl(similartechSpider.SimilarTechSpider, input='-o', first='test1.xlsx')
will likely not work, right?
Thanks for taking the time to answer this.
Use the corresponding Scrapy settings instead (FEED_*).
You can pass them to CrawlerProcess as a dict.
CrawlerProcess(settings={
    'FEED_URI': 'output_file_name.xlsx',
    'FEED_EXPORTERS': {'xlsx': 'scrapy_xlsx.XlsxItemExporter'},
})
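Putting that together with the launchable.py from the question, a sketch (note that with a single FEED_URI both spiders write into the same file):

import scrapy
from scrapy.crawler import CrawlerProcess
from spiders import addonsimilartechSpider, similartechSpider

# The settings dict replaces the '-o output.xlsx' command-line option.
process = CrawlerProcess(settings={
    'FEED_URI': 'output.xlsx',
    'FEED_EXPORTERS': {'xlsx': 'scrapy_xlsx.XlsxItemExporter'},
})
process.crawl(similartechSpider.SimilarTechSpider)
process.crawl(addonsimilartechSpider.AddonSimilarSpider)
process.start()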

How to create a pyspark udf that calls a class function from another class function in the same file?

I'm creating a pyspark udf inside a class-based view, and I have the function I want to call inside another class-based view; both of them are in the same file (api.py). But when I inspect the content of the resulting dataframe, I get this error:
ModuleNotFoundError: No module named 'api'
I can't understand why this happens. I tried similar code in the pyspark console and it worked fine. A similar question was asked here, but the difference is that I'm trying to do it all in the same file.
This is a piece of my full code:
api.py
class TextMiningMethods():
    def clean_tweet(self, tweet):
        '''
        some logic here
        '''
        return "Hello: " + tweet

class BigDataViewSet(TextMiningMethods, viewsets.ViewSet):

    @action(methods=['post'], detail=False)
    def word_cloud(self, request, *args, **kwargs):
        '''
        some previous logic here
        '''
        spark = SparkSession \
            .builder \
            .master("spark://" + SPARK_WORKERS) \
            .appName('word_cloud') \
            .config("spark.executor.memory", '2g') \
            .config('spark.executor.cores', '2') \
            .config('spark.cores.max', '2') \
            .config("spark.driver.memory", '2g') \
            .getOrCreate()
        spark.sparkContext.addPyFile('path/to/udfFile.py')

        cols = ['text']
        rows = []
        for tweet_account_index, tweet_account_data in enumerate(tweets_list):
            tweet_data_aux_pandas_df = pd.Series(tweet_account_data['tweet']).dropna()
            for tweet_index, tweet in enumerate(tweet_data_aux_pandas_df):
                row = [tweet['text']]
                rows.append(row)

        # Create a Pandas Dataframe of tweets
        tweet_pandas_df = pd.DataFrame(rows, columns=cols)

        schema = StructType([
            StructField("text", StringType(), True)
        ])

        # Converts to Spark DataFrame
        df = spark.createDataFrame(tweet_pandas_df, schema=schema)

        clean_tweet_udf = udf(TextMiningMethods().clean_tweet, StringType())
        clean_tweet_df = df.withColumn("clean_tweet", clean_tweet_udf(df["text"]))
        clean_tweet_df.show()  # This line produces the error
A similar test in the pyspark console works fine:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import udf

def clean_tweet(name):
    return "This is " + name

schema = StructType([StructField("Id", IntegerType(), True), StructField("tweet", StringType(), True)])
data = [[1, "tweet 1"], [2, "tweet 2"], [3, "tweet 3"]]

df = spark.createDataFrame(data, schema=schema)
clean_tweet_udf = udf(clean_tweet, StringType())
clean_tweet_df = df.withColumn("clean_tweet", clean_tweet_udf(df["tweet"]))
clean_tweet_df.show()
So these are my questions:
What is this error related to, and how can I fix it?
What is the right way to create a pyspark udf when you're working with class-based views? Is it bad practice to write the functions you will use as pyspark udfs in the same file where you call them? (In my case, all my API endpoints work with django rest framework.)
Any help will be appreciated; thanks in advance.
UPDATE:
This link and this link explain how to use custom classes with pyspark using SparkContext, but not with SparkSession, which is my case; still, I used this:
spark.sparkContext.addPyFile('path/to/udfFile.py')
The problem is that the class with the functions I want to use as pyspark udfs is defined in the same file where I'm creating the udf for the dataframe (as I showed in my code). I couldn't find how to make that work when the path passed to addPyFile() points at the same file. In spite of that, I moved my code and followed these steps (this also fixed another error):
Create a new folder called udf
Create a new empty __init__.py file, to make the directory a package.
Create a pyspark_udf.py file for my udf functions.
core/
udf/
├── __init__.py
├── __pycache__
└── pyspark_udf.py
api/
├── admin.py
├── api.py
├── apps.py
├── __init__.py
In this file I tried to import the dependencies either at the beginning of the file or inside the function. In all cases I receive ModuleNotFoundError: No module named 'udf'
pyspark_udf.py
import re
import string
import unidecode
from nltk.corpus import stopwords

class TextMiningMethods():
    """docstring for TextMiningMethods"""
    def clean_tweet(self, tweet):
        # some logic here
I have tried all of these. At the beginning of my api.py file:
from udf.pyspark_udf import TextMiningMethods
# or
from udf.pyspark_udf import *
And inside the word_cloud function:
class BigDataViewSet(viewsets.ViewSet):
    def word_cloud(self, request, *args, **kwargs):
        from udf.pyspark_udf import TextMiningMethods
In the python debugger this line works:
from udf.pyspark_udf import TextMiningMethods
But when I show the dataframe, I receive the error:
clean_tweet_df.show()
ModuleNotFoundError: No module named 'udf'
Obviously, the original problem has changed into another one; my problem is now more related to this question, but I still couldn't find a satisfactory way to import the file and create a pyspark udf calling a class function from another class function.
What am I missing?
After several tries, I couldn't find a solution by pointing addPyFile() at a method located in the same file where I was creating the udf (I would like to know whether this is bad practice) or in another file. Technically, the addPyFile(path) documentation says:
Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
So what I mention should be possible. Based on that, I had to use this solution and zip the whole udf folder from its top level with:
zip -r udf.zip udf
Also, in pyspark_udf.py I had to import my dependencies inside the method, as below, to avoid this problem:
class TextMiningMethods():
    """docstring for TextMiningMethods"""
    def clean_tweet(self, tweet):
        import re
        import string
        import unidecode
        from nltk.corpus import stopwords
Instead of:
import re
import string
import unidecode
from nltk.corpus import stopwords

class TextMiningMethods():
    """docstring for TextMiningMethods"""
    def clean_tweet(self, tweet):
Then, finally, this line worked fine:
clean_tweet_df.show()
I hope this is useful for someone else.
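For reference, a minimal sketch of the final arrangement in api.py (the zip path is a placeholder; the names are taken from the code above):

# Ship the zipped package to the executors, then import from it.
spark.sparkContext.addPyFile('path/to/udf.zip')

from udf.pyspark_udf import TextMiningMethods

clean_tweet_udf = udf(TextMiningMethods().clean_tweet, StringType())
clean_tweet_df = df.withColumn("clean_tweet", clean_tweet_udf(df["text"]))
clean_tweet_df.show()  # the workers can now resolve the udf module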
Thank you! Your approach worked for me.
Just to clarify my steps:
Made a udf module with __init__.py and pyspark_udfs.py
Made a bash file to zip the udfs first and then run my files from the top level:
runner.sh
echo "zipping udfs..."
zip -r udf.zip udf
echo "udfs zipped"
echo "running script..."
/opt/conda/bin/python runner.py
echo "script ended."
In the actual code, I imported my udfs from the udf.pyspark_udfs module and initialized them in the Python function where I need them, like so:
def _produce_period_statistics(self, df: pyspark.sql.DataFrame, period: str) -> pyspark.sql.DataFrame:
    """Produces basic and trend statistics based on user visits."""
    # udfs
    get_hist_vals_udf = F.udf(lambda array, bins, _range: get_histogram_values(array, bins, _range), ArrayType(IntegerType()))
    get_hist_edges_udf = F.udf(lambda array, bins, _range: get_histogram_edges(array, bins, _range), ArrayType(FloatType()))
    get_mean_udf = F.udf(get_mean, FloatType())
    get_std_udf = F.udf(get_std, FloatType())
    get_lr_coefs_udf = F.udf(lambda bar_height, bar_edges, hist_upper: get_linear_regression_coeffs(bar_height, bar_edges, hist_upper), StringType())
...

Why can't I import WD_ALIGN_PARAGRAPH from docx.enum.text?

I transferred some code from IDLE 3.5 (64-bit) to PyCharm (Python 2.7). Most of the code is still working; for example, I can import WD_LINE_SPACING from docx.enum.text, but for some reason I can't import WD_ALIGN_PARAGRAPH.
At first, nearly none of the imports worked, but after I did
pip install python-docx
instead of
pip install docx
most of the imports worked except for WD_ALIGN_PARAGRAPH.
# works
from __future__ import print_function
import xlrd
import xlwt
import os
import subprocess
from calendar import monthrange
import datetime
from docx import Document
from datetime import datetime
from datetime import date
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt
# does not work
from docx.enum.text import WD_ALIGN_PARAGRAPH
I don't get any error messages, but PyCharm marks the line as an error:
"Cannot find reference 'WD_ALIGN_PARAGRAPH' in 'text.py'".
You can use this instead:
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
and then substitute WD_PARAGRAPH_ALIGNMENT wherever WD_ALIGN_PARAGRAPH would have appeared before.
The reason this is happening is that the actual enum object is named WD_PARAGRAPH_ALIGNMENT, and a decorator is applied that also allows it to be referenced as WD_ALIGN_PARAGRAPH (which is a little shorter, and possibly clearer). I expect the syntax checker in PyCharm is operating on direct module attributes and doesn't pick up the alias, which is resolved by the Python parser/compiler.
Interestingly, I expect your code would work fine either way. But to get rid of the annoying message you can use the base name.
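For example, a short sketch of the replacement in use:

from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

document = Document()
paragraph = document.add_paragraph('Centered text')
# Same value that WD_ALIGN_PARAGRAPH.CENTER aliases.
paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
document.save('demo.docx')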
If you use pylint, the warning can easily be suppressed by adding # pylint: disable=E0611 at the end of the import line.

Separate custom settings variable between development, staging and production

I'm following the project structure as laid out by Zachary Voase, but I'm struggling with one specific issue.
I'd very much like to have a custom boolean settings variable (let's call it SEND_LIVE_MAIL) that I would use in the project. Basically, if SEND_LIVE_MAIL is True my code actually sends out a mail, whereas when it is False it just prints the mail's contents to the console. The latter would apply to the dev environment and when running unit tests.
What would be a good way of implementing this? Currently, depending on the environment, the Django server uses dev, staging or prd settings, but I believe custom settings variables need to be imported 'literally'. In other words, in my views I'd be using something like
from settings.development import SEND_LIVE_MAIL
which of course isn't what I want. I'd like to be able to do something like:
from settings import SEND_LIVE_MAIL
and depending on the environment, the correct value is assigned to the SEND_LIVE_MAIL variable.
Thanks in advance!
You shouldn't be importing directly from your settings files anyway. Use:
>>> from django.conf import settings
>>> settings.SEND_LIVE_MAIL
True
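Then the behaviour you describe becomes a small helper; a sketch (the helper name and the use of DEFAULT_FROM_EMAIL are assumptions):

from django.conf import settings
from django.core.mail import send_mail

def deliver_mail(subject, body, recipients):
    # In dev and unit tests SEND_LIVE_MAIL is False, so just echo to the console.
    if settings.SEND_LIVE_MAIL:
        send_mail(subject, body, settings.DEFAULT_FROM_EMAIL, recipients)
    else:
        print('Would send %r to %r:\n%s' % (subject, recipients, body))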
The simplest solution is to have this at the bottom of your settings file:
try:
    from local_settings import *
except ImportError:
    pass
And in local_settings.py specify all your environment-specific overrides. I generally don't commit this file to version control.
There are more advanced ways of doing it, where you end up with a default settings file and a per-environment override.
This article by David Cramer covers the various approaches, including both of the ones I've mentioned: http://justcramer.com/2011/01/13/settings-in-django/
import os

PROJECT_PATH = os.path.dirname(__file__)

try:
    execfile(os.path.join(PROJECT_PATH, 'local_settings.py'))
except IOError:
    pass
Then you can have your local_settings.py behave as if it was pasted directly into your settings.py:
$ cat local_settings.py
INSTALLED_APPS += ['foo']
You can do something like this for a wide variety of environment-based settings, but here's an example for just SEND_LIVE_MAIL.
settings_config.py
import re
import socket

class Config:
    def __init__(self):
        fqdn = socket.getfqdn()
        env = re.search(r'(devhost|stagehost|prodhost)', fqdn)
        env = env and env.group(1)
        env = env or 'devhost'
        if env == 'devhost':
            self.SEND_LIVE_MAIL = # whatever
        elif env == 'stagehost':
            self.SEND_LIVE_MAIL = # whatever
        elif env == 'prodhost':
            self.SEND_LIVE_MAIL = # whatever

config = Config()
settings.py
from settings_config import config
SEND_LIVE_MAIL = config.SEND_LIVE_MAIL