Managing large user databases for single sign-on - django

How would you implement a system with the following objectives:
Manage authentication and authorization for hundreds of thousands of existing users currently tightly integrated with a 3rd party vendor's application (we want to bust these users out into something we manage and make our apps work against it, plus have our 3rd party vendors work against it).
Manage profile information linked to those users
Must be able to be accessed from any number of web applications on just about any platform (Windows, *nix, PHP, ASP/C#, Python/Django, et cetera).
Here are some sample implementations:
LDAP/AD Server to manage everything. Use custom schema for all profile data. Everything can authenticate against LDAP/AD and we can store all sorts of ACLs and profile data in a custom schema.
Use LDAP/AD for authentication only, and tie LDAP users to a more robust profile/authorization server using some sort of traditional database (MSSQL/PostgreSQL/MySQL) or document-based DB (CouchDB, SimpleDB, et cetera). Use LDAP for authentication, then hit the DB for more advanced stuff.
Use a traditional database (Relational or Document) for everything.
Are any of these three the best? Are there other solutions which fit the objectives above and are easier to implement?
** I should add that almost all applications that will be authenticating against the user database will be under our control. The lone few outsiders will be the applications we're removing the current user database from and perhaps 1 or 2 others. Nothing so broad as to need an OpenID server.
It's also important to know that a lot of these users have had these accounts for 5-8 years and know their logins and passwords, et cetera.

There is a difference between authentication and authorization/profiling, so don't necessarily force both into a single tool. Your second solution of using LDAP for authentication and a DB for authorization seems more robust, as the LDAP data is controlled by the user and the DB would be controlled by an admin. The latter would likely morph in structure and complexity over time, but authentication is just that: authentication. Separating these functions will prove more manageable.
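To make the separation concrete, here is a minimal sketch in Python. It is not tied to any real LDAP or database library: `DirectoryAuthenticator` and `PermissionStore` are hypothetical in-memory stand-ins for the directory server and the admin-managed database, and the role names are made up for illustration.

```python
import hashlib
import hmac
import os


class DirectoryAuthenticator:
    """Hypothetical stand-in for the LDAP/AD side: it only checks passwords."""

    def __init__(self):
        self._records = {}  # username -> (salt, derived key)

    def set_password(self, username, password):
        salt = os.urandom(16)
        self._records[username] = (salt, self._derive(password, salt))

    def authenticate(self, username, password):
        record = self._records.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, self._derive(password, salt))

    @staticmethod
    def _derive(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class PermissionStore:
    """Hypothetical stand-in for the admin-managed database of roles."""

    def __init__(self):
        self._roles = {}  # username -> set of role names

    def grant(self, username, role):
        self._roles.setdefault(username, set()).add(role)

    def is_allowed(self, username, role):
        return role in self._roles.get(username, set())


def login_and_check(auth, perms, username, password, required_role):
    """Authenticate first, then authorize; the two stores never touch each other."""
    if not auth.authenticate(username, password):
        return "denied: bad credentials"
    if not perms.is_allowed(username, required_role):
        return "denied: missing role"
    return "ok"
```

Because the two classes share nothing but the username, the authorization side can change its schema freely without touching how passwords are checked.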

If you have an existing ActiveDirectory infrastructure, that will be the way to go. This will be particularly advantageous to companies that have already had Windows servers set up for authentication. If this is the case, I'm leaning towards your first bullet point in "sample implementations".
Otherwise it will be a toss-up between AD and open-source LDAP options.
It might not be viable to roll your own authentication scheme for single sign-on (especially considering the amount of documentation and integration work you might have to do), and obviously do not bundle your authentication server with any of the applications running on your system (since you want it to be independent of the load of such applications).
Good luck!

Use LDAP/AD for authentication only, and tie LDAP users to a more robust profile/authorization server using some sort of traditional database (MSSQL/PostgreSQL/MySQL) or document-based DB (CouchDB, SimpleDB, et cetera). Use LDAP for authentication, then hit the DB for more advanced stuff.

We have different sites with around 100k users and they all work with normal databases. If most applications can access the db you can use this solution.

You can always implement your own OpenID server. There is already a Python library for OpenID so it should be fairly easy.
Of course you don't need to accept logins authorized by other servers in your applications. Accept credentials authorized only by your own server.
Edit: I have found an implementation of OpenID server protocol in Django.
Edit2: There is an obvious advantage in implementing OpenID for your users. They will be able to login to StackOverflow with their logins :-)

Related

Cross Platform Single Sign On

I want to know the best ways to achieve Single Sign On for cross-platform Django projects. I have a monolithic application which is being converted to a multi-tenant system. The core of the monolithic application has been converted and divided into microservices, but some parts of the monolith will take time to convert.
So I currently cannot remove the monolithic application, and I need a way to implement Single Sign On for these two applications running in parallel.
Monolithic stack: Python, Django 1.10, MySQL
Multi-tenant system stack: Python, Django 2.1, Postgres
Some references:
https://github.com/aldryn/django-simple-sso
https://medium.com/#MicroPyramid/django-single-sign-on-sso-to-multiple-applications-64637da015f4
I would recommend working with OpenID Connect or SAML.
At work we are currently using django-oidc-provider, with some business customization of course. This allows you to serve a single sign-on across multiple platforms.
The way it works is that a central authentication server handles all logins and redirects the user back to the clients, which in turn exchange the result for an access token and/or ID token. How you use the access token from there varies, but in your case the back-end middleware would fetch user info from the authentication server and give the user a session cookie for your service tied to the user info just fetched.
Or even better, use ID tokens. That way you don't need to ask the authentication server for user info, as these are JWTs and can be verified cryptographically.
For more info you can check out the OpenID website.
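The ID-token check mentioned above can be sketched with the standard library alone. This is an illustration under assumptions, not how django-oidc-provider does it: it assumes tokens signed with HS256 and a secret shared between server and client, whereas OpenID Connect providers commonly sign with RS256, and in practice you would normally use a maintained JWT library. The function names and the example issuer/audience values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    """Build a compact HS256 token (only needed here to produce sample input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_id_token(token, secret, issuer, audience, now=None):
    """Return the claims if the token checks out, otherwise raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    # Pin the algorithm instead of trusting whatever the header claims.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("wrong audience")
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims
```

Once `verify_id_token` returns, the client can create its own session from the `sub` claim without another round trip to the authentication server.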

user-authentication from remote site?

I am building a tool, in django, for a client's web site.
The tool I am building requires users to be signed in to an existing account.
User-authentication is handled by legacy software on another vendor's servers.
I can contact the programmer who wrote the legacy software (I am unsure of their development environment), but I am not sure what to ask for -- what hooks, api, rpc, etc. do I want?
Is there a design pattern for this type of situation? And what features of django should I use or extend to make this as straightforward as possible? REMOTE_USER sounds like the right thing, but I am not sure how I would use it in this case.
I'd recommend using jQuery requests. You can send the username and password (encrypted, of course) to the remote site and get back a cookie/session key.
If you have access to the database, I'd also recommend doing that. For example, if the remote host is using MySQL, ask to have a view created for your user and then you can authenticate directly. With this approach, however, you may have to set up a MySQL connection outside of settings.py.
Two approaches:
1) API: If they have released an API, it would be much simpler: you authenticate users through their API.
2) Expose Database: If they don't have an API, they must give you access to their database so that you can go in and authenticate. But while doing this, keep several things in mind: Django's authenticate() won't work out of the box, because by default it authenticates against the auth_user table. You can of course authenticate manually using your own logic, but that would be a problem too: you would have to create your own sessions and so on. So your option is to use custom user models (only available from Django 1.5) in Django.
I am sure others may have a better solution than this.

What is the purpose of a web API

I'm working on an app and websites. They have related information such as users, contracts, etc. What is the reason for designing an API and not connecting directly to the database?
Edit:
I'm just starting development and have no experience with web services. Please be as thorough as possible.
Sites such as Facebook, Google, and Twitter could never let third party apps connect directly to their database: it's an enormous security risk. (Would you be comfortable if Facebook allowed anyone to access their database, including private user information and messages?)
APIs serve as a gate through which third party apps can get the kinds of information they are permitted to access.
There are several reasons why you would use an API instead of using direct access.
The first 2 that come to mind:
Using an API allows you to write the client code without knowing any details of the specific implementation, so if you change your database structure or location, for instance, you need only rewrite the API wrapper code, not every place it is referenced.
It allows you to have different levels of authentication. As mentioned in another answer, it is not ideal for all users of an application to have access to every other user's data.
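Both points can be seen in a small sketch. It assumes a made-up `contracts` table in an in-memory SQLite database, and `ContractAPI` is a hypothetical wrapper name: clients call its method and never see the SQL, and the access rule lives in one place.

```python
import sqlite3


class ContractAPI:
    """The only surface clients see; storage details stay behind it."""

    def __init__(self, connection):
        self._db = connection

    def contracts_for(self, requesting_user, owner):
        # The access rule lives in the API layer, not in every client.
        if requesting_user != owner:
            raise PermissionError("users may only read their own contracts")
        rows = self._db.execute(
            "SELECT title FROM contracts WHERE owner = ? ORDER BY title", (owner,)
        )
        return [title for (title,) in rows]


def build_demo_database():
    """Create the hypothetical schema with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE contracts (owner TEXT, title TEXT)")
    db.executemany(
        "INSERT INTO contracts VALUES (?, ?)",
        [("alice", "Hosting"), ("alice", "Support"), ("bob", "Licensing")],
    )
    return db
```

If the table were later renamed or moved to another database, only `contracts_for` would need to change; callers would keep working unmodified.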

Authorizing an application with Oauth and Python

I am trying to build an application that will use data from multiple social services. The user will need to authorize their accounts to be accessed across these multiple services (e.g. facebook, twitter, foursquare) using oauth.
I don't really need the users to login with these accounts, really it is just allowing their data from the api to be pulled.
I know I need to use oauth, but I am having trouble finding a basic example of how to do this type of thing (a lot of examples exist for logging in with oauth).
I have been trying the python-oauth2 library.
Does anyone have any recommendation for a good tutorial or example of doing this type of thing in Python, and if possible Django?
Thanks.
Why reinvent the wheel? There is a plethora of reusable applications that have this implemented. You can find a comparison here: http://djangopackages.com/grids/g/authentication/
Why not give rauth a try? We use this in production for this exact purpose. Although you don't need to require the user to log in to your app via the provider, you're going to redirect to the provider, where they'll be asked to authorize your application. Assuming they accept (or even if they don't), they'll be redirected back to your application, i.e. via the redirect_uri or oauth_callback. There you'll ensure they authorized your app and then proceed with whatever housekeeping you need to do, e.g. saving some info about the user in your database. Try the examples, and pay particular attention to the Facebook example. The Facebook example is intended for authentication with the example web app, but the same pattern can be used for what you're trying to do. (You just won't be having them log in via Facebook, for instance. However, the flow can be, and probably should be, identical, sans database operations and template login lingo.)
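The redirect-and-callback round trip described above can be sketched without any OAuth library. This is not rauth's API: it is a standard-library illustration of the OAuth 2.0 authorization-code pattern (OAuth 1.0a providers use a different, signed flow), and the endpoint URLs, client ID and function names are all hypothetical. Exchanging the returned code for an access token is a further HTTP request that is not shown.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state); store the state in the user's session."""
    state = secrets.token_urlsafe(16)  # random value to tie callback to request
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def read_callback(callback_url, expected_state):
    """Return the authorization code, or None if the user declined."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch; possible CSRF")
    if "error" in params:
        return None  # the user (or provider) refused the authorization
    return params.get("code", [None])[0]
```

The `state` check is what lets your callback handler trust that the response belongs to a request your application actually started.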

Why do some API providers require an API key?

Several web service APIs have you sign up for an API key. For example, UPS Web services requires a key, which is included in calls to their service -- In addition to the username and password.
What is this key used for by the provider? Perhaps UPS is the only one to require both API key and username/password?
One idea is that they use it to limit or measure API usage, but it seems to me that a setting in the user's profile could easily do the same thing -- especially since you generally have to get an account w/ username and password to get the API in the first place.
There are two predominant use cases. The first is to measure, track and restrict API usage. If someone is building a service that allows third parties to access it, the service provider may want to control (or at least know) who has access so that they can try and prevent things like denial of service attacks. On the measure and track side, interesting information can be obtained such as knowing which applications are popular for accessing the service or which features people use the most.
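As a rough illustration of the measure, track and restrict use case, the sketch below counts calls per key, enforces a fixed quota, and lets the provider revoke a single key. `ApiKeyMeter` and its quota scheme are hypothetical; real providers typically use time-based windows and persistent storage.

```python
from collections import defaultdict


class ApiKeyMeter:
    """Counts calls per API key and rejects keys that exceed a fixed quota."""

    def __init__(self, quota_per_window):
        self.quota = quota_per_window
        self.calls = defaultdict(int)  # api_key -> calls in the current window
        self.revoked = set()

    def revoke(self, api_key):
        # Shut off one misbehaving application without affecting the others.
        self.revoked.add(api_key)

    def allow(self, api_key):
        if api_key in self.revoked:
            return False
        if self.calls[api_key] >= self.quota:
            return False
        self.calls[api_key] += 1
        return True

    def reset_window(self):
        self.calls.clear()

    def most_active(self):
        """Return the key with the most calls this window, or None if idle."""
        return max(self.calls, key=self.calls.get, default=None)
```

Because the counter is keyed by application rather than by end user, the provider can tell which applications generate the traffic even when many users sit behind one key.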
The other use case is related to security and authentication. It is unwise for a service provider to have third party applications and services require users to give up their username and password for the primary service. This is a huge exposure. That is why many services are standardizing on protocols such as OAuth, which provides delegated access via authorization to a user's data. While not foolproof, it is definitely preferable to distributing user credentials to unknown, and untrusted, parties.
Most of the time it is to monitor how developers are using the web-api. If they somehow disagree with your usage of the api it provides a means for them to shut it/you down without hurting the other users. And the statistics per user/app are always valuable.
I've used the Flickr API - in that situation the key is yours, but the login data might be those of people using your app, so the API key is the only way to differentiate between the apps.
Usually it is used to get stats on how many queries an application makes to the API.
I think asking for a username/password along with an API key is ambiguous in some cases, but that is how it is implemented, so we can't do anything about it.
They ask for an API key because you could have more than one key under the same account, in case you have more than one site using the same API.
They could use it to signify which version of the API you are trying to use. Perhaps in Version 1.0 there is a method that takes a POST on www.UPS.com/search, and there is another one in Version 2.0 at the same address that takes a different parameter set, or even returns data in a different format/style. Your program was built on V1.0 and expects a certain API contract. They want to be able to create V2.0 without interfering with their customers' products.
That's just a guess, but it sounds good to me.
I think Gracenote does a similar thing for cddb. I forget the details, but I remember something about some token.
(They have/had really draconian rules about using their service too.)
Simon reminded me what the Gracenote thing was. Gracenote and FedEx and other web services have lots of developers writing apps for the software. So the developers get a token to put into their apps, but the end users have their own user name and password. It lets the services keep an eye on abusive programs, etc. That is probably the primary reason. (Like a browser or a webbot informing the web server who/what it is.)
Originally, Blogger required you to apply for an API key (a la Google Maps) and used it to restrict access to the API. As Blogger evolved into Metaweblog, the requirement for the API became less important, and Blogger no longer requires you to apply for a key. As noted by others, it can still be used for tracking purposes.
In our situation, our clients want it for:
Tracking/analytics - figuring out who's doing what and building what products. Because a number of users are desktop apps, just looking at referrers isn't always enough.
Permissions - which resources should a user have access to? How can a user build apps that have access to specified resources?
Licensing/legal - enforcing that users have read and accepted ToU/licensing information.
Security - passing around usernames/passwords is a really bad idea.