I'm running a multi-tenant website, where I would like to reduce the overhead of creating a PostgreSQL connection per request. Django's CONN_MAX_AGE allows this, at the expense of keeping a lot of open idle connections to PostgreSQL (8 workers * 20 threads = 160 connections). At roughly 10 MB per connection, this consumes a lot of memory.
The main purpose is reducing connection-time overhead.
Hence my questions:
Which setup should I use for such a solution? (PgBouncer?)
Can I use 'transaction' pool mode with Django?
Would I be better off using something like https://github.com/kennethreitz/django-postgrespool instead of Django's persistent connections?
Django 1.6 settings:
DATABASES['default'] = {
    'ENGINE': 'django.db.backends.postgresql_psycopg2',
    # ...
    'PORT': '6432',
    'OPTIONS': {'autocommit': True},
    'CONN_MAX_AGE': 300,
}
ATOMIC_REQUESTS = False # default
Postgres:
max_connections = 100
PgBouncer:
pool_mode = session # Can this be transaction?
max_client_conn = 400 # Should this match postgres max_connections?
default_pool_size = 20
reserve_pool_size = 5
Here's a setup I've used.
pgbouncer running on the same machine as gunicorn, celery, etc.
pgbouncer.ini:
[databases]
<dbname> = host=<dbhost> port=<dbport> dbname=<dbname>
[pgbouncer]
; your app will need filesystem permissions on this unix socket
unix_socket_dir = /var/run/postgresql
; you'll need to configure this file with username/password pairs you plan on
; connecting with.
auth_file = /etc/pgbouncer/userlist.txt
; "session" resulted in atrocious performance for us. I think
; "statement" prevents transactions from working.
pool_mode = transaction
; you'll probably want to change default_pool_size. take the max number of
; connections for your postgresql server, and divide that by the number of
; pgbouncer instances that will be connecting to it, then subtract a few
; connections so you can still connect to PG as an admin if something goes wrong.
; you may then need to adjust min_pool_size and reserve_pool_size accordingly.
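; worked example (illustrative numbers, not from this setup): with
; postgres max_connections = 100 and two pgbouncer instances,
; 100 / 2 = 50; keep a few connections spare for admin access, so
; something a bit under 50 per instance.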
default_pool_size = 50
min_pool_size = 10
reserve_pool_size = 10
reserve_pool_timeout = 2
; I was using gunicorn + eventlet, which is why this is so high. It
; needs to be high enough to accommodate all the persistent connections we're
; going to allow from Django & other apps.
max_client_conn = 1000
...
/etc/pgbouncer/userlist.txt:
"<dbuser>" "<dbpassword>"
Django settings.py:
...
DATABASES = {
    'default': {
        'ENGINE': 'django.contrib.gis.db.backends.postgresql_psycopg2',
        'NAME': '<dbname>',
        'USER': '<dbuser>',
        'PASSWORD': '<dbpassword>',
        'HOST': '/var/run/postgresql',
        'PORT': '',
        'CONN_MAX_AGE': None,  # Set to None for persistent connections
    }
}
...
If I remember correctly, you can basically have any number of "persistent" connections to pgbouncer, since pgbouncer releases server connections back to the pool when Django is done with them (as long as you're using transaction or statement for pool_mode). When Django tries to reuse its persistent connection, pgbouncer takes care of waiting for a usable connection to Postgres.
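One caveat worth adding for newer Django versions: with pool_mode = transaction, server-side cursors break, because the server connection backing a cursor can change between transactions. Django 1.11.1+ has a dedicated database setting for this; a minimal sketch (not applicable to the Django 1.6 setup above):
# settings.py -- sketch for Django >= 1.11.1 behind pgbouncer in transaction mode;
# 'django.db.backends.postgresql' is the modern engine name
# (postgresql_psycopg2 on older versions).
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': '/var/run/postgresql',  # pgbouncer's unix socket dir
        'CONN_MAX_AGE': None,  # persistent client connections to pgbouncer
        'DISABLE_SERVER_SIDE_CURSORS': True,  # required with transaction pooling
    }
}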
My Django settings.py file contains the following configuration options:
# Caches
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://redis:6379',
    }
}
# Queues
RQ_QUEUES = {
    'default': {
        'HOST': 'redis',
        'PORT': 6379,
        'DB': 0,
        'DEFAULT_TIMEOUT': 360,
    },
}
Both CACHES and RQ_QUEUES contain configuration details that point to a redis server.
Is it possible to reconfigure these settings to point to an instance of fakeredis instead?
I have reviewed the fakeredis documentation, and so far I have only seen examples where the redis connection is manually overridden every time a call to redis is made.
It seems to me that when running tests, it would be much more convenient to simply point the Django CACHE location directly to fakeredis.
Is this possible?
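Not an authoritative answer, but here is a sketch of what this could look like. It assumes a recent fakeredis that provides fakeredis.FakeConnection (Django's RedisCache forwards OPTIONS to redis-py's connection pool), and django-rq's ASYNC flag, which runs jobs synchronously so tests don't need a real broker:
# test_settings.py -- a sketch; fakeredis.FakeConnection and the ASYNC
# flag are assumptions about your fakeredis / django-rq versions.
import fakeredis

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://redis:6379',
        'OPTIONS': {
            # Forwarded to redis.ConnectionPool, so connections are in-memory fakes
            'connection_class': fakeredis.FakeConnection,
        },
    }
}

RQ_QUEUES = {
    'default': {
        'HOST': 'redis',
        'PORT': 6379,
        'DB': 0,
        'ASYNC': False,  # run jobs eagerly in tests instead of enqueueing
    },
}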
I am trying to run two identical Django projects on different databases, one for production using a certain port number (say, 80) and the other for testing using another port number (say, 8000). I also use Nginx and Gunicorn as the reverse proxy and application server, with Nginx listening on ports 80 and 8000 and forwarding to gunicorn on ports 8001 and 8002, respectively.
The problem is: how do I know the port number of the request in Django's settings.py so that the project can choose different databases?
The standard practice for doing this in Django is to create a local_settings.py file.
Put this at the top of the local_settings.py file:
try:
    from settings import *
except ImportError:
    pass
Now in local_settings.py you must override the following variable:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'dbname',
        'USER': 'dbuser',
        'PASSWORD': 'dbpassword',
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}
Use different values for each project.
Then when running Django you need to set the following environment variable:
export DJANGO_SETTINGS_MODULE="appname.local_settings"
One way to tie this together is to create a run.sh file that first sets this variable and then runs gunicorn.
To sum it up: settings.py is shared between both projects, while local_settings.py overrides the variables that differ between them.
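For reference, the export takes precedence because the stock manage.py only provides a default settings module. A minimal sketch ('appname.settings' is a placeholder for your project's module):
# manage.py -- sketch; 'appname.settings' stands in for your project's module
import os
import sys

if __name__ == '__main__':
    # setdefault: only used when DJANGO_SETTINGS_MODULE isn't already exported,
    # e.g. by a run.sh that sets it to appname.local_settings
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'appname.settings')
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)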
I have a Django website with Postgresql backend, for which I'm utilizing pgbouncer for db connection pooling (transaction mode).
The application and the DB reside on separate servers (1 server each). I have installed pgbouncer on the application server. My question is: what should the config be in settings.py? Note that I'm using Unix sockets for connecting to pgbouncer.
My current settings.py contains:
DATABASE_URL = 'postgres://user1:pass1@xx.xxx.xxx.xxx:5432/db1'
DATABASES = {
    'default': dj_database_url.config(default=DATABASE_URL)
}
Relevant sections of pgbouncer.ini are:
[databases]
db1 = host=xx.xxx.xxx.xxx port=5432 dbname=db1
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
unix_socket_dir = /var/run/postgresql
pool_mode = transaction
max_client_conn = 200
default_pool_size = 300
userlist.txt contains:
"user1" "pass1"
Note: One answer is here, but doesn't work for me since the DB isn't available locally in my case. I need to set the DATABASE_URL environment variable, instead of using default = '...'.
One suggestion seems to be to treat pgbouncer as a database in settings.py. In that case, would something like the following work?
if PRODUCTION == '1':
    # PRODUCTION is set to '1' if in production environment
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'pgbouncer',
            'USER': 'user1',
            'PASSWORD': 'pass1',
            'HOST': '/var/run/postgresql',
            'PORT': '6432',
        }
    }
From the docs:
pgbouncer is a PostgreSQL connection pooler. Any target application can be connected to pgbouncer as if it were a PostgreSQL server, and pgbouncer will create a connection to the actual server, or it will reuse one of its existing connections.
Also,
Have your application (or the psql client) connect to pgbouncer instead of directly to PostgreSQL server.
The configurations:
pgbouncer.ini (the docs include an example pgbouncer.ini with comments about defaults):
[databases]
db1 = host=xx.xxx.xxx.xxx port=5432 dbname=db1
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = userlist.txt
unix_socket_dir = /var/run/postgresql
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
userlist.txt:
"user1" "pass1"
To put in settings.py:
if PRODUCTION == '1':
    # PRODUCTION is set to '1' if in production environment
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'db1',
            'USER': 'user1',
            'PASSWORD': 'pass1',
            'HOST': '/var/run/postgresql',
            'PORT': '6432',  # matches pgbouncer's listen_port (see note below)
        }
    }
Extra:
If you're not using a Unix socket, set HOST to '127.0.0.1' or 'localhost' if pgbouncer runs locally, or to the IP of the server pgbouncer runs on.
From the docs:
If you're using PostgreSQL, by default (empty HOST), the connection to the database is done through UNIX domain sockets ('local' lines in pg_hba.conf). If your UNIX domain socket is not in the standard location, use the same value of unix_socket_directory from postgresql.conf. If you want to connect through TCP sockets, set HOST to 'localhost' or '127.0.0.1' ('host' lines in pg_hba.conf). On Windows, you should always define HOST, as UNIX domain sockets are not available.
For ENGINE with PostgreSQL you can use either postgresql or postgresql_psycopg2; which one applies depends on your Django version (django.db.backends.postgresql is the name from Django 1.9 onward, while postgresql_psycopg2 works on older versions and remains as an alias).
All of your DB settings in settings.py should be identical to the settings in your pgbouncer config, except that HOST in settings.py points at pgbouncer. You probably need to change 'NAME': 'pgbouncer' to 'NAME': 'db1'. Note that the port still matters even over a Unix socket: clients append it to the socket file name (.s.PGSQL.6432), so PORT should match pgbouncer's listen_port.
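Since the question notes that DATABASE_URL must be used rather than a literal dict: dj-database-url can express a Unix-socket host by percent-encoding the socket directory as the URL's host. A sketch using the credentials from the question (an assumption, untested here):
import dj_database_url

# '%2F' is a URL-encoded '/', so the "host" below is the socket directory
# /var/run/postgresql that pgbouncer listens on.
DATABASE_URL = 'postgres://user1:pass1@%2Fvar%2Frun%2Fpostgresql:6432/db1'
DATABASES = {
    'default': dj_database_url.config(default=DATABASE_URL),
}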
I've been stuck with this problem for a couple of days.
I developed an application for appengine using Django and I'd like to use Google Cloud SQL for my database. Everything works fine until I want to apply migrations on the development server when it fails with the following error:
django.db.utils.OperationalError: (1045, "Access denied for user 'MY_DB_USER'@'MY_IP' (using password: YES)")
What I've done is as follows:
I followed the instructions in the Django Support page to develop my application.
In order to create a 1st generation Cloud SQL instance I followed the steps outlined here, using the Cloud SDK.
I then created a new user following the instructions here and assigned it a password.
I deployed the application using the following command line:
gcloud preview app deploy MY-APP-DIR/app.yaml --version 0-1-0
I authorized my IP and my App Engine application ID. They are both listed in the "Authorization" section under "Access Control" in my SQL instance.
Finally, I tried to apply migrations using the following command line:
SETTINGS_MODE='prod' MY-APP-DIR/manage.py migrate
settings.py
The relevant portion of my settings.py looks as follows:
if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
    DEBUG = False
    # Running on production App Engine, so use a Google Cloud SQL database.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'HOST': '/cloudsql/[MY-PROJECT-ID]:[MY-CLOUD-SQL-INSTANCE]',
            'NAME': '[MY-DB-NAME]',
            'USER': 'root',
        }
    }
elif os.getenv('SETTINGS_MODE') == 'prod':
    DEBUG = False
    # Running in development, but want to access the Google Cloud SQL instance
    # in production.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': '[MY-DB-NAME]',  # db name
            'USER': '[MY-DB-USER]',
            'PASSWORD': '[MY-DB-USER-PASSWORD]',
            'HOST': '[IPV4 ASSIGNED IN GOOGLE CONSOLE]',
            'PORT': '3306',
        }
    }
else:
    # Running in development, so use a local MySQL database.
    DEBUG = True
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': '[MY-LOCAL-DB]',
            'USER': 'root',
            'PASSWORD': 'root',
        }
    }
Any idea as to what might be causing the problem?
Thank you!
I finally figured out what the problem was.
The proper way to grant a user database access, in order to apply migrations, is the following:
Whitelist your IP. It should then be listed under 'Authorized Networks'.
Create a new database user account, but do not choose the 'Allow any host (%)' wildcard; instead, select the 'Restrict host by name, address, or address range' option and assign your IP (the one you just whitelisted).
You should now be able to run migrations with the command: SETTINGS_MODE='prod' PROJECT_DIR/manage.py migrate
As a side note, make sure the root user whose host is localhost doesn't have a password; otherwise your App Engine application won't be able to connect to the database.
Hope this helps someone else!
I have two Django projects with the following DB settings:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'db1',  # 'db2' for second db
        # ...
    }
}
When trying to sync second db with command
python manage.py syncdb --database=db2
I receive error
django.db.utils.ConnectionDoesNotExist: The connection db2 doesn't exist
When I use some other commands, South uses the migrations from the first project and fills db2 with the wrong tables. How do I correctly sync/migrate several projects served by a single Django + South instance?
The --database option takes the connection alias (the key in the DATABASES dict), not the NAME of the database. As your settings stand, 'default' is the only alias that exists, so it is the only one that works.
So you need to set up an additional database dictionary for your db2.
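A sketch of the needed change (the 'db2' alias is illustrative; --database matches the alias, not NAME):
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'db1',
        # ...
    },
    'db2': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'db2',
        # ...
    },
}

# then: python manage.py syncdb --database=db2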