Accessing and sharing data between multiple sites in django from different databases

Accessing and sharing data between multiple sites in django from different databases - django

Currently I am having three sites for example let it be site1, site2 and site3 . Each site require authentication. Both site1 and site2 take the same database let it be "Portfolio" database and site3 is having a different database let it be "site3specific" database.
I am planning to have a Common Account database for keeping the login credentials of users for the all different sites available. So that each sites (i.e. site1, site2 and site3) will make use of the Common Account database for authenticating the user login. I am planning to keep the user details in a separate database since all the three sites in development, testing and live environment can share the same user credentials without redundancy. Also each site may have its own specific data that we may be having or entering differently in development, staging and live environments.
Also there is a possibility of sharing some data between sites.
Could anyone please tell me how can I achieve these task in django + Apache + mod_wsgi.
Please advice whether I need to have a globally shared settings file , model file and urls file. IF then how my globally shared settings files need to be modified . Please advice.

This is how we currently operate.
Each site has its own VirtualHost entry in the httpd.conf, and each app has its own django.wsgi config file which looks something like this (you can probably use a simpler one):
import os, sys, site, glob
prev_sys_path = list(sys.path)
root_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
site.addsitedir(glob.glob(os.path.join(root_dir, 'venv/lib/python*/site-packages'))[0])
sys.path.append('/usr/local/django-apps')
sys.path.append('/usr/local/django-apps/AppName')
new_sys_path = []
for item in list(sys.path):
if item not in prev_sys_path:
new_sys_path.append(item)
sys.path.remove(item)
sys.path[:0] = new_sys_path
os.environ['DJANGO_SETTINGS_MODULE'] = 'AppName.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
The VirtualHost needs to contain entries like this:
SetEnv DJANGO_ENV ${environment
WSGIDaemonProcess appname user=apache group=apache processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup appname
WSGIScriptAlias / /usr/local/django-apps/AppName/apache/django.wsgi
<Directory /usr/local/django-apps/AppName/apache>
Order deny,allow
</Directory>
From there, the database set up is dependent on what database engine you're using.
Hope this helps.

You have to look at your requirements, and see if all sites would perhaps require and if so respect a single sign-on (sso) service. If that is the case, then you might need to look at how sessions are transfered between sites as sessions are SITE_ID specific. So, just making it work may be a great start, but looking at the big picture before you dig too deal in might be a good idea.

I set the same session name in these sites (a.xx.com/b.xx.com/c.xx.com -> sesssion name=xx.com). In my Django project, I used three settings files for each site and used the manager.py to separate these sites. The last step, start them up separately.

Related

Isn't it unsafe to expose django media_url to everyone?

According to Django document: "it was common to place static assets in MEDIA_ROOT along with user-uploaded files, and serve them both at MEDIA_URL. "
Does that mean everyone could access other people's uploaded files?
Isn't it unsafe?

Yes
A clever user can possibly guess the path to media files belonging to other users.
Django was born in the news publishing business where this was not of concern: the admin is based in the concept of trusted users like writers and editors belonging to the same organization.
Mitigation
Requiring authentication
Not my first choice but you can make the webserver authenticate against Django's user database:
WSGIScriptAlias / /path/to/mysite.com/mysite/wsgi.py
WSGIPythonPath /path/to/mysite.com
WSGIProcessGroup %{GLOBAL}
WSGIApplicationGroup %{GLOBAL}
<Location "/media/private-user-content/">
AuthType Basic
AuthName "Top Secret"
Require valid-user
AuthBasicProvider wsgi
WSGIAuthUserScript /path/to/mysite.com/mysite/wsgi.py
</Location>
The accepted answer recommends serving sensitive files from an authenticated Django view. It is OK for low traffic apps but for larger projects it carries a performance hit not every site can afford.
Serving from Cloud Storage Services
Large projects should be using some cloud storage backend for both performance and cost considerations. If your project is already hosted at some of the big 3 (AWS, GCP, Azure) check Django Storages. For example, if you are using the S3 backend, you can turn "query parameter authentication" for generated URLs and voila, problem gone. This has some advantages:
it is transparent to developers
enterprise-grade performance
lower cost of storage and network
probably the most secure option
Obfuscating the path
For small projects where you are serving media and application from the same webserver you can make very hard for nosy users to find media files not belonging to them:
1) disable the web server "auto index" in the MEDIA_ROOT folder. For apache, it is like:
<Directory /path/to/application/media/root>
Options -Indexes
</Directory>
Without indexes, in order to access files belonging to other people you will have to guess the exact file name.
2) make the file path hard to guess using a crypto hash in the "upload_to" parameter from FileFields:
def hard_to_guess(instance, filename):
salt = 'six random words for hidden salt'
hash = hashlib.md5(instance.user.username + salt)
return '/'.join(['content', hash, filename])
...
class SomeModel(models.Model):
...
user = models.ForeignKey(User)
content = models.FileField(upload_to=hard_to_guess)
...
This solution has no performance hit because media files are still served directly from the webserver.

To answer your question: yes, this would allow everyone to access everybody's uploaded files. And yes, this is a security risk.
As a general rule, sensitive files should never be served directly from the filesystem. As another rule, all files should be considered sensitive unless explicitly marked otherwise.
The origin of the MEDIA_ROOT and MEDIA_URL settings probably lie in Django's history as a publishing platform. After all, your editors probably won't mind if the pictures they add to articles can easily be found. But then again, pictures accompanying an article are usually non-sensitive.
To expand on your question: sensitive files should always be placed in a directory that is not directly accessible by the web server. Requesting those files should only be done through a view class or function, which can do some sensible access checking before serving the file.
Also, do not rely on obfuscation for sensitive files. For example, let's use Paulo's example (see other answer) to obfuscate photo albums. Now my pictures are stored like MEDIA_URL/A8FEB0993BED/P100001.JPG. If I share this link with someone else, they can easily try URLs like MEDIA_URL/A8FEB0993BED/P710032.JPG, basically allowing them to brute-force my entire photo album.

Django moving from Development server to Deployment Server

Regarding this documentation page from the Django website,
https://docs.djangoproject.com/en/1.2/howto/static-files/
where it says for development "With that said, Django does support static files during development. You can use the django.views.static.serve() view to serve media files."
So my question is, if I use this method, How much work is required to move to apache.
Currently I have a symbolic link to my image folder in the /var/www folder, and in the Django settings I have set the media url to :
MEDIA_URL = 'http://127.0.0.1:80/Images/'
This seems like a fairly easy hack but my project is going to get very big (with lots of css, js and pdfs) and I doubt if this method is good.

My approach was to have apache itself intercept the static files urls and serve them directly without invoking django at all. So my apache config looked something like this:
<VirtualHost *:80>
ServerName www.myproject.com
Alias /static /docs/my_website/static
<Directory /docs/my_website/static>
Order allow,deny
Allow from all
</Directory>
Alias /favicon.ico /docs/my_website/static/images/icons/favicon.ico
Include "/13parsecs/conf/django.conf"
</VirtualHost>
Then you can just keep doing whatever you're doing in the dev environment, and when you get to apache it won't invoke django at all for static content, which is what you want.

This is a perfectly good way of doing things. The only change you'll need to make is to put the actual URL of your site, rather than the localhost IP.

Don't "move to Apache", start using it in the first place. None of the software needed has licensing fees and runs on almost any platform, so the only excuse you could have is "I'm too lazy".

multiple django projects at same url

I would like to have multiple django projects living at the same root url like this:
example.com/ # controlled by home django project
example.com/project-2 # controlled by a separate django project
example.com/project-3 # controlled by yet another django project
I am already redefining the LOGIN_REDIRECT_URL, etc. as suggested by this excellent answer, but I have discovered another hurdle. I am actually using the same apps in the projects that live at example.com/project-2 and example.com/project-3, which causes some non-trivial problems for linking to content inside of a django project that have thus far been solved with seemingly hacky solutions.
For example, you can never refer to '/' in any template in either example.com/project-2 or example.com/project-3 to return to the root of the django project hosted at either of these URLs --- this will link to the home django project at example.com. To get around this, I have made a context processor that correctly prepends the root url of the project based on a custom settings.py variable SCRIPT_NAME: '' (for example.com), '/project-2' (for example.com/project-2), or '/project-3' (for example.com/project-3). This is all fine and good except that you need to do the same thing in the get_absolute_url functions. Before I knew it, I had just turned a bunch of code that was very reusable (by people other than myself) into code that was not reusable at all.
Is there a way to accomplish the same effect without having to prepend absolute URLs with the SCRIPT_NAME? Perhaps something clever with apache or mod_wsgi configuration? I am at a loss and hoping someone can help...
EDIT:
My apache configuration for example.com looks like this:
# redirect un-'/'-terminated urls to the '/'-terminated root urls
RewriteEngine On
RewriteRule /project-2$ /project-2/ [R=302,L]
RewriteRule /project-3$ /project-3/ [R=302,L]
# mod wsgi setup
WSGIScriptAlias /project-2 /srv/project2/project-2.wsgi
WSGIScriptAlias /project-3 /srv/project3/project-3.wsgi
WSGIScriptAlias / /srv/project1/project-1.wsgi

You don't show how you're serving these projects from your Apache configuration, which would have been useful. But if you define them as separate WSGIScriptAlias directives, then SCRIPT_NAME is automatically passed through for you, and Django takes it into account when reversing and creating URLs.
WSGIScriptAlias /project-2 /srv/project2/project2.wsgi
WSGIScriptAlias /project-3 /srv/project3/project3.wsgi
WSGIScriptAlias / /srv/project1/project1.wsgi

Django, how to access request object in settings.py

Is it somehow possible to access the request object inside settings.py?
Maybe by creating a temporary settings object, modifying it and then telling the rest of the "chain" to use that instead of the normal settings.py?
I need it to decide which DB-connection to use.
As an extra question. If I were to have something like 5000 database connections, would settings.py be just as efficient as storing them in a sqlite db on the web-frontend?
And would it be just as painless to update the connections? Or does the server have to be reloaded to catch the changes in settings.py?
Edit: To clarify why I might be needing that many connections.
I am building a webapp. It's SaaS and like many others the accounts will each have a subdomain that they can create users on and will have no need to interact with any other subdomain/account.
It would then be nice to confine each account to a DB all of its own. This grants some extra security and simplifies the app. There are many more advantages to it, but this should illustrate it just fine.
This is why I might end up with that many different databases (but not that many different physical servers if that makes any difference).

If i understand this right, you could use django's new db-routing system and select database on-the-fly based on model instance (e.g. your user) without the need of using() call.

Just adding this for anyone else looking for the same.
It is not currently possible. I have created a feature request on the Django bug-tracker (#13056 i think) and submitted a prototype for a fix, but I don't think it will be included anytime soon and it probably has a lot of bugs in it.
I have moved the project to Flask as it has the g object that is perfectly suited for this.

Django's ORM is not designed to switch database credentials mid-stride. Perhaps you would be happier with something a bit more DIY, such as SQLAlchemy.

I've addressed this problem on a site I've been using recently, and decided to let Apache/mod_wsgi do the work. This solution adds a bit of memory and CPU overhead, but for my app it was the best way to keep everything flexible.
Apache .conf:
SetEnv DJANGO_TEMPLATE_DIR '/usr/local/www/apache22/data/django/templates/'
<VirtualHost *:80>
ServerName encendio.whatever.com
ServerAdmin your_admin#whatever.com
DocumentRoot "/usr/local/www/apache22/data"
SetEnv DJANGO_DATABASE_NAME monkeys
SetEnv DJANGO_DATABASE_USER root
SetEnv DJANGO_DATABASE_PASSWORD secretPass
SetEnv DJANGO_DATABASE_PORT ''
SetEnv DJANGO_DATABASE_HOST ''
WSGIScriptAlias / /usr/local/www/apache22/data/django/wsgi_handler.py
</VirtualHost>
settings.py:
DATABASE_NAME = os.environ.get('DJANGO_DATABASE_NAME', '')
DATABASE_USER = os.environ.get('DJANGO_DATABASE_USER', '')
DATABASE_PASSWORD = os.environ.get('DJANGO_DATABASE_PASSWORD', '')
DATABASE_HOST = os.environ.get('DJANGO_DATABASE_HOST', '')
This allows you to set up each site as a VirtualHost in the httpd.conf.

How does one set up multiple accounts with separate databases for Django on one server?

What options are there for installing Django such that multiple users (each with an "Account") can each have their own database?
The semantics are fairly intuitive. There may be more than one User for an Account. An Account has a unique database (and a database corresponds to an account). Picture WordpressMU. :)
I've considered this:
External solution - Multiplex to multiple servers/daemons
Multiple Django installations, with each Django installation / project corresponding to an account that sets its own DATABASE_NAME, e.g.
File system:
/bob
/settings.py (contains DATABASE_NAME="bob")
/sue
/settings.py (contains DATABASE_NAME="sue")
Then having a Django instance running for each of bob and sue. I don't like this methodology- it feels brutish and it smells foul. But I'm confident it would work, and based on the suggestions it might be the cleanest, smartest way to do it.
The apps can be stored elsewhere; the only thing that need be unique to the django configuration is the settings.py (and even there, only DATABASE_NAME, etc. need be different, the rest can be imported).
(Incidentally, I'm using lighttpd and FastCGI.)
Internal solution - Django multiplexing database settings
On the other hand, I've thought of having one single Django installation, and
(a) Adding a "prefix_" to each database table, corresponding to account of the logged-in user; or
(b) Changing the database according to the account of the User that is logged in.
I'd be particularly interested in seeing the "Django way" to do these (hoping that it's something dead-simple). For example, middleware that takes a Request's User and changes the django.conf.SETTINGS['DATABASE_NAME'] to the database for this user's account.
This raises red flags, viz. Is this thread-safe? i.e. Does changing django.conf.SETTINGS affect other processes? Is there just an inherent danger in changing django.conf.SETTINGS -- would the DB connection be setup already? Is restarting the DB connection part of the public API? -- I'm going to have a look at the Django source when I look to this problem again.
I'm conscious that 2(a) and (b) could require User authentication to be stored and accessed in a different mechanism that the core.
For now, I'm going to go with the external mapping at the webserver layer- it's simplest and cleanest for now. However, I don't like the idea of FastCGI daemons running for every account- it seems to needlessly waste memory, particularly if there will be 2000+ accounts. However, I'd like to keep this discussion open as it's an interesting problem and the solution doesn't seem ideal for certain cases.
Comments duly appreciated.
Cheers

The Django way would definitely be to have separate installations with their own database name (#1). #2 would involve quite a bit of hacking with the ORM, and even then I'm not quite sure it's possible at all.
But mind you, you don't need a WHOLE new installation of all the site's models/views/templates for each user, just a new settings.py with all the appropriate paths to the common source files. Plus, to run all these installations in Apache, do it the way I do here:
<VirtualHost 1.2.3.4>
DocumentRoot /www/site1
ServerName site1.com
<Location />
SetHandler python-program
SetEnv DJANGO_SETTINGS_MODULE site1.settings
PythonPath "['/www'] + sys.path"
PythonDebug On
PythonInterpreter site1
</Location>
</VirtualHost>
<VirtualHost 1.2.3.4>
DocumentRoot /www/site2
ServerName site2.com
<Location />
SetHandler python-program
SetEnv DJANGO_SETTINGS_MODULE site2.settings
PythonPath "['/www'] + sys.path"
PythonDebug On
PythonInterpreter site2
</Location>
</VirtualHost>
assuming you've got /www/site1/settings.py, www/site2/settings.py and so on...
Of course, you now need to have a main site where people log in, that then redirects you to the appropriate site (here I've just put it as "site1.com", "site2.com", but you get the idea.)

The Django ORM doesn't provide multiple database support classes, but it is definitely possible - you'll have to write a custom manager and make a few other tweaks. Eric Florenzano has a great article with detailed code samples:
http://www.eflorenzano.com/blog/post/easy-multi-database-support-django/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js