Is it dangerous to use dynamic database TEST_NAME in Django? - django

We are a small team of developers working on an application using the PostgreSQL database backend. Each of us has a separate working directory and virtualenv, but we share the same PostgreSQL database server, even Jenkins is on the same machine.
So I am trying to figure out a way to allow us to run tests on the same project in parallel without running into a database name collision. Furthermore, sometimes a Jenkins build would fail mid-way and the test database doesn't get dropped in the end, such that subsequent Jenkins build could get confused by the existing database and fail automatically.
What I've decided to try is this:
import os
from datetime import datetime
DATABASES = {
'default': {
# the usual lines ...
TEST_NAME: '{user}_{epoch_ts}_awesome_app'.format(
user=os.environ.get('USER', os.environ['LOGNAME']),
# This gives the number of seconds since the UNIX epoch
epoch_ts=int((datetime.utcnow() - datetime.utcfromtimestamp(0)).total_seconds())
),
# etc
}
}
So the test database name at each test run most probably will be unique, using the username and the timestamp. This way Jenkins can even run builds in parallel, I think.
It seems to work so far. But could it be dangerous? I'm guessing we're safe as long as we don't try to import the project settings module directly and only use django.conf.settings because that should be singleton-like and evaluated only once, right?

I'm doing something similar and have not run into any issue. The usual precautions should be followed:
Don't access settings directly.
Don't cause the values in settings to be evaluated in a module's top level. See the doc for details. For instance, don't do this:
from django.conf import settings
# This is bad because settings might still be in the process of being
# configured at this stage.
blah = settings.BLAH
def some_view():
# This is okay because by the time views are called by Django,
# the settings are supposed to be all configured.
blah = settings.BLAH
Don't do anything that accesses the database in a module's top level. The doc warns:
If your code attempts to access the database when its modules are compiled, this will occur before the test database is set up, with potentially unexpected results. For example, if you have a database query in module-level code and a real database exists, production data could pollute your tests. It is a bad idea to have such import-time database queries in your code anyway - rewrite your code so that it doesn’t do this.

Instead of the time, you could use the Jenkins executor number (available in the environment); that would be unique enough and you wouldn't have to worry about it changing.
As a bonus, you could then use --keepdb to avoid rebuilding the database from scratch each time... On the downside, failed and corrupted databases would have to be dropped separately (perhaps the settings.py can print out the database name that it's using, to facilitate manual dropping).

Related

Django - Settings : sensitive settings and different environments (dev, prod, staging). What is the recommended way to do this

I am using django version 2.1.7:
I have read a lot of article and questions about this. I found people using various methods.
Approches used in the general are:
1) setting environmental variables
2) multiple settings
3) load configuration variables from a file like using django-environ etc
Approach 1: I will not be using, because i dont want to use environmental variables to store variables.
Approach 3: uses 3rd party library and sometimes they may not be maintained any more. But if there is any standard way to do this i am fine.
Approach 2: Here i am not sure how to store the sensitive data seprately.
So can someone guide me in this regard.
I'm using a combination of all:
When launching gunicorn as a service in systemd, you can set your environment variables using the conf file in your .service.d directory:
[Service]
Environment="DJANGO_SETTINGS_MODULE=myapp.settings.production"
Environment="DATABASE_NAME=MYDB"
Environment="DATABASE_USER=MYUSER"
Environment="DATABASE_PASSWORD=MYPASSWORD"
...
That file is fetched from S3 (where it is encrypted) when the instance is launched. I have a startup script using aws-cli that does that.
In order to be able to run management commands, e.g. to migrate your database in production, you also need these variables, so if the environment variables aren't set, I fetch them directly from this file. So in my settings, I have something like this:
def get_env_variable(var_name):
try:
return os.environ[var_name]
except KeyError:
env = fetch_env_from_gunicorn()
return env[var_name]
def fetch_env_from_gunicorn():
gunicorn_conf_directory = '/etc/systemd/system/gunicorn.service.d'
gunicorn_env_filepath = '%s/gunicorn.conf' % (gunicorn_conf_directory,)
if not os.path.isfile(gunicorn_env_filepath):
raise ImproperlyConfigured('No environment settings found in gunicorn configuration')
key_value_reg_ex = re.compile(r'^Environment\s*=\s*\"(.+?)\s*=\s*(.+?)\"$')
environment_vars = {}
with open(gunicorn_env_filepath) as f:
for line in f:
match = key_value_reg_ex.search(line)
if match:
environment_vars[match.group(1)] = match.group(2)
return environment_vars
...
DATABASES = {
...
'PASSWORD': get_env_variable('DATABASE_PASSWORD'),
...}
Note that all management commands I run in my hosted instances, I need to explicitly pass --settings=myapp.settings.production so it knows which settings file to use.
Different settings for different types of environments. That's because things like DEBUG but also mail settings, authentication settings, etc... are too different between the environments to use one single settings file.
I have default.py settings in a settings directory with all common settings, and then production.py for example starts with from .default import * and just overrides what it needs to override like DEBUG = False and different DATABASES (to use the above mechanism), LOGGING etc...
With this, you have the following security risks:
Anyone with ssh access and permissions for the systemd directory can read the secrets. Make sure you have your instances in a VPN and restrict ssh access to specific IP addresses for example. Also I only allow ssh with ssh keys, not username/password, so no risk of stealing passwords.
Anyone with ssh access to the django app directory could run a shell and from django.conf import settings to then read the settings. Same as above, should be restricted anyway.
Anyone with access to your storage bucket (e.g. S3) can read the file. Again, this can be restricted easily.

Is it possible to log queries only for one view in Django?

I know, that in settings, I can have debug = True and all SQL queries are logged.
But, I want to log all the SQL queries made by one specific view, before the response is returned.
How can I do this in Django 1.3 ?
This can be done by changing logging settings within context of specific view.
You still need to have DEBUG enabled:
For performance reasons, SQL logging is only enabled when settings.DEBUG is set to True, regardless of the logging level or handlers that are installed.
... but you can modify configuration so that only queries by specific view(s) are logged.
Here's how it could be done:
1: In settings.py set django SQL logger (django.db.backends) level to INFO or higher and verify that it stopped recording SQL queries.
2.a. First line in your view to have SQL logging, set that logger level to DEBUG. Last line, set it back to original value. This is simplest way, but queries executed by middleware before/after view code don't get logged.
2.b. Write custom middleware to do the same thing before and after view processing. If you place it first, all queries made by other middlewares will be also logged.
Note that this method is not thread safe, but since you only want this setup in development environment, this should not matter.
Django documentation "Logging" chapter:
https://docs.djangoproject.com/en/1.4/topics/logging/
Python logging library reference:
http://docs.python.org/library/logging.html

Django+Postgres: "current transaction is aborted, commands ignored until end of transaction block"

I've started working on a Django/Postgres site. Sometimes I work in manage.py shell, and accidentally do some DB action that results in an error. Then I am unable to do any database action at all, because for any database action I try to do, I get the error:
current transaction is aborted, commands ignored until end of transaction block
My current workaround is to restart the shell, but I should find a way to fix this without abandoning my shell session.
(I've read this and this, but they don't give actionable instructions on what to do from the shell.)
You can try this:
from django.db import connection
connection._rollback()
The more detailed discussion of This issue can be found here
this happens to me sometimes, often it's the missing
manage.py migrate
or
manage.py syncdb
as mentioned also here
it also can happen the other way around, if you have a schemamigration pending from your models.py. With south you need to update the schema with.
manage.py schemamigration mymodel --auto
Check this
The quick answer is usually to turn on database level autocommit by adding:
'OPTIONS': {'autocommit': True,}
To the database settings.
I had this error after restoring a backup to a totally empty DB. It went away after running:
./manage syncdb
Maybe there were some internal models missing from the dump...
WARNING: the patch below can possibly cause transactions being left in an open state on the db (at least with postgres). Not 100% sure about that (and how to fix), but I highly suggest not doing the patch below on production databases.
As the accepted answer does not solve my problems - as soon as I get any DB error, I cannot do any new DB actions, even with a manual rollback - I came up with my own solution.
When I'm running the Django-shell, I patch Django to close the DB connection as soon as any errors occur. That way I don't ever have to think about rolling back transactions or handling the connection.
This is the code I'm loading at the beginning of my Django-shell-session:
from django import db
from django.db.backends.util import CursorDebugWrapper
old_execute = CursorDebugWrapper.execute
old_execute_many = CursorDebugWrapper.executemany
def execute_wrapper(*args, **kwargs):
try:
old_execute(*args, **kwargs)
except Exception, ex:
logger.error("Database error:\n%s" % ex)
db.close_connection()
def execute_many_wrapper(*args, **kwargs):
try:
old_execute_many(*args, **kwargs)
except Exception, ex:
logger.error("Database error:\n%s" % ex)
db.close_connection()
CursorDebugWrapper.execute = execute_wrapper
CursorDebugWrapper.executemany = execute_many_wrapper
For me it was a test database without migrations. I was using --keepdb for testing. Running it once without it fixed the error.
There are a lot of useful answers on this topic, but still it can be a challenge to figure out what is the root of the issue. Because of this, I will try to give just a little more context on how I was able to figure out the solution for my issue.
For Django specifically, you want to turn on logs for db queries and before the error is raised, you can find the query that is failing in the console. Run that query directly on db, and you will see what is wrong.
In my case, one column was missing in db, so after migration everything worked correctly.
I hope this will be helpful.
If you happen to get such an error when running migrate (South), it can be that you have lots of changes in database schema and want to handle them all at once. Postgres is a bit nasty on that. What always works, is to break one big migration into smaller steps. Most likely, you're using a version control system.
Your current version
Commit n1
Commit n2
Commit n3
Commit n4 # db changes
Commit n5
Commit n6
Commit n7 # db changse
Commit n8
Commit n9 # db changes
Commit n10
So, having the situation described above, do as follows:
Checkout repository to "n4", then syncdb and migrate.
Checkout repository to "n7", then syncdb and migrate.
Checkout repository to "n10", then syncdb and migrate.
And you're done. :)
It should run flawlessly.
If you are using a django version before 1.6 then you should use Christophe's excellent xact module.
xact is a recipe for handling transactions sensibly in Django applications on PostgreSQL.
Note: As of Django 1.6, the functionality of xact will be merged into the Django core as the atomic decorator. Code that uses xact should be able to be migrated to atomic with just a search-and-replace. atomic works on databases other than PostgreSQL, is thread-safe, and has other nice features; switch to it when you can!
I add the following to my settings file, because I like the autocommit feature when I'm "playing around" but dont want it active when my site is running otherwise.
So to get autocommit just in shell, I do this little hack:
import sys
if 'shell' in sys.argv or sys.argv[0].endswith('pydevconsole.py'):
DATABASES['default']['OPTIONS']['autocommit'] = True
NOTE: That second part is just because I work in PyCharm, which doesnt directly run manage.py
I got this error in Django 1.7. When I read in the documentation that
This problem cannot occur in Django’s default mode and atomic()
handles it automatically.
I got a bit suspicious. The errors happened, when I tried running migrations. It turned out that some of my models had my_field = MyField(default=some_function). Having this function as a default for a field worked alright with sqlite and mysql (I had some import errors, but I managed to make it work), though it seems to not work for postgresql, and it broke the migrations to the point that I didn't event get a helpful error message, but instead the one from the questions title.

Django Celery Database for Models on Producer and Worker

I want to develop an application which uses Django as Fronted and Celery to do background stuff.
Now, sometimes Celery workers on different machines need database access to my django frontend machine (two different servers).
They need to know some realtime stuff and to run the django-app with
python manage.py celeryd
they need access to a database with all models available.
Do I have to access my MySQL database through direct connection? Thus I have to allow user "my-django-app" access not only from localhost on my frontend machine but from my other worker server ips?
Is this the "right" way, or I'm missing something? Just thought it isn't really safe (without ssl), but maybe that's just the way it has to be.
Thanks for your responses!
They will need access to the database. That access will be through a database backend, which can be one that ships with Django or one from a third party.
One thing I've done in my Django site's settings.py is load database access info from a file in /etc. This way the access setup (database host, port, username, password) can be different for each machine, and sensitive info like the password isn't in my project's repository. You might want to restrict access to the workers in a similar manner, by making them connect with a different username.
You could also pass in the database connection information, or even just a key or path to a configuration file, via environment variables, and handle it in settings.py.
For example, here's how I pull in my database configuration file:
g = {}
dbSetup = {}
execfile(os.environ['DB_CONFIG'], g, dbSetup)
if 'databases' in dbSetup:
DATABASES = dbSetup['databases']
else:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
# ...
}
}
Needless to say, you need to make sure that the file in DB_CONFIG is not accessible to any user besides the db admins and Django itself. The default case should refer Django to a developer's own test database. There may also be a better solution using the ast module instead of execfile, but I haven't researched it yet.
Another thing I do is use separate users for DB admin tasks vs. everything else. In my manage.py, I added the following preamble:
# Find a database configuration, if there is one, and set it in the environment.
adminDBConfFile = '/etc/django/db_admin.py'
dbConfFile = '/etc/django/db_regular.py'
import sys
import os
def goodFile(path):
return os.path.isfile(path) and os.access(path, os.R_OK)
if len(sys.argv) >= 2 and sys.argv[1] in ["syncdb", "dbshell", "migrate"] \
and goodFile(adminDBConfFile):
os.environ['DB_CONFIG'] = adminDBConfFile
elif goodFile(dbConfFile):
os.environ['DB_CONFIG'] = dbConfFile
Where the config in /etc/django/db_regular.py is for a user with access to only the Django database with SELECT, INSERT, UPDATE, and DELETE, and /etc/django/db_admin.py is for a user with these permissions plus CREATE, DROP, INDEX, ALTER, and LOCK TABLES. (The migrate command is from South.) This gives me some protection from Django code messing with my schema at runtime, and it limits the damage an SQL injection attack can cause (though you should still check and filter all user input).
This isn't a solution to your exact problem, but it might give you some ideas for ways to smarten up Django's database access setup for your purposes.

How do I run a unit test against the production database?

How do I run a unit test against the production database instead of the test database?
I have a bug that's seems to occur on my production server but not on my development computer.
I don't care if the database gets trashed.
Is it feasible to make a copy the database, or part of the database that causes the problem? If you keep a backup server, you might be able to copy the data from there instead (make sure you have another backup, in case you messed the backup database).
Basically, you don't want to mess with live data and you don't want to be left with no backup in case you mess something up (and you will!).
Use manage.py dumpdata > mydata.json to get a copy of the data from your database.
Go to your local machine, copy mydata.json to a subdirectory of your app called fixtures e.g. myapp/fixtures/mydata.json and do:
manage.py syncdb # Set up an empty database
manage.py loaddata mydata.json
Your local database will be populated with data and you can test away.
Make a copy the database... It's really a good practices!!
Just execute the test, instead call commit, call rollback at the end of.
The first thing to try should be manually executing the test code on the shell, on the production server.
python manage.py shell
If that doesn't work, you may need to dump the production data, copy it locally and use it as a fixture for the testcase you are using.
If there is a way to ask django to use the standard database without creating a new one, I think rather than creating a fixture, you can do a sqldump which will generally be a much smaller file.
Short answer: you don't.
Long answer: you don't, you make a copy of the production database and run it there
If you really don't care about trashing the db, then Marco's answer of rolling back the transaction is my preferred choice as well. You could also try NdbUnit but I personally don't think the extra baggage it brings is worth the gains.
How do you test the test db now? By test db do you mean SQLite?
HTH,
Berryl
I have both a full-on-slow-django-test-db suite and a crazy-fast-runs-against-production test suite built from a common test module. I use the production suite for sanity checking my changes during development and as a commit validation step on my development machine. The django suite module looks like this:
import django.test
import my_test_module
...
class MyTests(django.test.TestCase):
def test_XXX(self):
my_test_module.XXX(self)
The production test suite module uses bare unittest and looks like this:
import unittest
import my_test_module
class MyTests(unittest.TestCase):
def test_XXX(self):
my_test_module.XXX(self)
suite = unittest.TestLoader().loadTestsFromTestCase(MyTests)
unittest.TextTestRunner(verbosity=2).run(suite)
The test module looks like this:
def XXX(testcase):
testcase.assertEquals('foo', 'bar')
I run the bare unittest version like this, so my tests in either case have the django ORM available to them:
% python manage.py shell < run_unit_tests
where run_unit_tests consists of:
import path.to.production_module
The production module needs a slightly different setUp() and tearDown() from the django version, and you can put any required table cleaning in there. I also use the django test client in the common test module by mimicking the test client class:
class FakeDict(dict):
"""
class that wraps dict and provides a getlist member
used by the django view request unpacking code, used when
passing in a FakeRequest (see below), only needed for those
api entrypoints that have list parameters
"""
def getlist(self, name):
return [x for x in self.get(name)]
class FakeRequest(object):
"""
an object mimicing the django request object passed in to views
so we can test the api entrypoints from the developer unit test
framework
"""
user = get_test_user()
GET={}
POST={}
Here's an example of a test module function that tests via the client:
def XXX(testcase):
if getattr(testcase, 'client', None) is None:
req_dict = FakeDict()
else:
req_dict = {}
req_dict['param'] = 'value'
if getattr(testcase, 'client', None) is None:
fake_req = FakeRequest()
fake_req.POST = req_dict
resp = view_function_to_test(fake_req)
else:
resp = testcase.client.post('/path/to/function_to_test/', req_dict)
...
I've found this structure works really well, and the super-speedy production version of the suite is a major time-saver.
If you database supports template databases, use the production database as a template database. Ensure that you Django database user has sufficient permissions.
If you are using PostgreSQL, you can easily do this specifying the name of your production database as POSTGIS_TEMPLATE(and use the PostGIS backend).