How to prioritize queries from different user roles in Superset?

I have low-priority users and high-priority users. First of all, I need to process queries from high-priority users. Low-priority users should be limited initially because they slow down the high-priority users.
At the moment I have not found a solution. Is this generally possible or do I need to fork apache-superset and implement such logic myself in the source code? Is this functionality planned in the roadmap?

Superset generally is a thin layer over your existing data stores and doesn't have much of a compute layer.
With this in mind, the right technical decision is probably to configure this at the database / data store layer. Many people integrate LDAP into both Superset and their data store, so that roles and data store priorities / permissions can be configured on both sides.
With that being said, Superset is open source! You're definitely welcome to fork the code and implement it yourself. Better yet, feel free to start a discussion by creating a GitHub issue.

Like Srini said, it depends on your data layer.
One way you could do this is by defining a custom SQL_QUERY_MUTATOR in your superset_config.py:
# superset_config.py
def SQL_QUERY_MUTATOR(sql, username, security_manager):
    # VIP_LIST is assumed to be defined elsewhere in this file
    pool = "vip" if username in VIP_LIST else "normal"
    return f"-- pool: {pool}\n{sql}"
This will prepend a comment to the query specifying a pool that's either "vip" or "normal". You could then send this to a proxy that parses the comment and dispatches the query to the proper pool.
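A minimal sketch of the parsing such a proxy would need, assuming the exact comment format produced by the mutator above. The function name extract_pool and the fallback to "normal" are illustrative choices, not part of Superset:

```python
import re

# Matches the leading "-- pool: <name>" comment that the mutator prepends.
POOL_COMMENT = re.compile(r"^--\s*pool:\s*(\w+)\s*$")


def extract_pool(sql, default="normal"):
    """Return the pool named on the first line of the query, or the default."""
    first_line = sql.split("\n", 1)[0]
    match = POOL_COMMENT.match(first_line)
    return match.group(1) if match else default
```

The proxy could then use the returned name to pick a backend or queue; how it does that depends entirely on your data layer.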
Another way of doing this is specifying a DB_CONNECTION_MUTATOR that sets connection parameters depending on the user:
# superset_config.py
def DB_CONNECTION_MUTATOR(uri, params, username, security_manager, source):
    # VIP_LIST is assumed to be defined elsewhere in this file
    pool = "vip" if username in VIP_LIST else "normal"
    params["configuration"] = {"job.queue.name": pool}
    return uri, params

Should I store failed login attempts in AWS Cognito or Dynamo DB?

I have a requirement to build a basic "3 failed login attempts and your account gets locked" functionality. The project uses AWS Cognito for Authentication, and the Cognito PreAuth and PostAuth triggers to run a Lambda function look like they will help here.
So the basic flow is to increment a counter in the PreAuth Lambda, check it and block login there, or reset the counter in the PostAuth Lambda (so successful logins don't end up locking the user out). Essentially it boils down to:
PreAuth Lambda
  if failed-login-count > LIMIT:
    block login
  else:
    increment failed-login-count
PostAuth Lambda
  reset failed-login-count to zero
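As a rough sketch, this flow could look like the following in Python. The store here is a plain dict standing in for whatever holds the counter (a DynamoDB table or a Cognito attribute), and the names pre_auth, post_auth and LoginBlocked are made up for illustration. Note that it uses >= rather than >, so that exactly LIMIT attempts get through before the block:

```python
LIMIT = 3  # assumed threshold, from the "3 failed login attempts" requirement


class LoginBlocked(Exception):
    """Raised by the pre-auth step to reject the login attempt."""


def pre_auth(store, username, limit=LIMIT):
    """Count this attempt, or block it if the limit was already reached."""
    count = store.get(username, 0)
    if count >= limit:
        raise LoginBlocked(username)
    store[username] = count + 1


def post_auth(store, username):
    """Reset the counter after a successful login."""
    store.pop(username, None)
```

In a real trigger, raising an exception from the pre-auth handler is what rejects the sign-in; the store would be replaced by calls to the chosen backend.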
Now at the moment I am using a dedicated DynamoDB table to store the failed-login-count for a given user. This seems to work fine for now.
Then I figured it'd be neater to use a custom attribute in Cognito (using CognitoIdentityServiceProvider.adminUpdateUserAttributes) so I could throw away the DynamoDB table.
However reading https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-dg.pdf the section titled "Configuring User Pool Attributes" states:
Attributes are pieces of information that help you identify individual users, such as name, email, and phone number. Not all information about your users should be stored in attributes. For example, user data that changes frequently, such as usage statistics or game scores, should be kept in a separate data store, such as Amazon Cognito Sync or Amazon DynamoDB.
Given that the counter will change on every single login attempt, the docs would seem to indicate I shouldn't do this...
But can anyone tell me why? Or if there would be some negative consequence of doing so?
As far as I can see, Cognito billing is purely based on storage (i.e. number of users), and not operations, whereas Dynamo charges for read/write/storage.
Could it simply be AWS not wanting people to abuse Cognito as a storage mechanism? Or am I being daft?
We are dealing with a similar problem, and the main reason we decided to store extra attributes in a DB is that Cognito has quotas for all actions, and "AdminUpdateUserAttributes" is limited to 25 per second.
More information here:
https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html
So if you have a pool with 100k users or more, updating a Cognito user record on every login can create a bottleneck.
Cognito UserAttributes are meant to store information about the users. This information can then be read from the client using the AWS Cognito SDK, or just by decoding the idToken on the client-side. Every custom attribute you add will be visible on the client-side.
Other downsides of custom attributes:
You can define only 25 of them.
They cannot be removed or changed once added to the user pool.
I have personally used custom attributes and the interface to manipulate them is not excellent. But that is just a personal thought.
If you want to store this information and not depend on DynamoDB, you can use Amazon Cognito Sync. Besides the service, it offers a client with great features that you can incorporate into your app.
DynamoDB appears to be your best option; it is commonly used for such use cases. Some of the benefits of using it:
You can store a separate record for each login attempt with as much information as you want, such as IP address, location and user agent. You can also add a timestamp that the pre-auth Lambda can use to query by time range, for example failed attempts within the last 30 minutes.
You don't need to clean up the table yourself, because you can set a TTL on each DynamoDB record so that it is deleted automatically after a specified time.
You can also archive items in S3.
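To make the per-attempt record idea concrete, here is a small sketch of what such an item might look like. The attribute names (username, attempted_at, ip_address, expires_at) are assumptions for illustration; the one firm requirement is that the TTL attribute is a number holding an epoch timestamp in seconds. As far as I know, expired items are removed in the background rather than at the exact expiry time, so the query should still filter on the timestamp:

```python
import time


def failed_attempt_item(username, ip_address, now=None, ttl_seconds=30 * 60):
    """Build one record per failed attempt, with an epoch-seconds TTL attribute."""
    now = int(time.time()) if now is None else int(now)
    return {
        "username": username,
        "attempted_at": now,
        "ip_address": ip_address,
        "expires_at": now + ttl_seconds,  # hypothetical TTL attribute name
    }


def recent_failures(items, now, window_seconds=30 * 60):
    """Count attempts inside the window, ignoring items TTL has not removed yet."""
    return sum(1 for item in items if now - item["attempted_at"] <= window_seconds)
```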

How to change users while preserving the store?

I want to implement a "fast login".
I'm developing enterprise software where many users in the same organization work with the same data on the same computer, and I want to be able to know who did what and when. Right now they have to log out and log in, and the data has to be loaded into the store all over again.
What I want is for them to be able to click on a user from the organization, without logging out, and enter that user's password so that the user is switched while the store is preserved.
Any idea how I can accomplish this?
I'm using ember-simple-auth v1.1.0 and ember v2.10.2
The simplest solution would be to disable the page reload when the user logs out. As far as I know, it is the reload that causes the store to lose its data, not the logout itself. To do this, you need to override the sessionInvalidated method in your application route. For example,
sessionInvalidated() {
  this.transitionTo('/login');
},
But remember that you lower security with this method: if someone logs out and leaves the page with the app open, another person can extract the data (if they have enough technical background to at least install Ember Inspector).
The other solution will require heavy research. I think it should be possible to implement a custom authenticator that authenticates a new user without logging out the previous one, simply by replacing the tokens in the store. But I don't know how hard it would be to implement or what obstacles you might meet. You will need to read ember-simple-auth's sources a lot, that's for sure.
I was actually able to solve it by simply using authenticate() with another user but never calling invalidateSession() which is the function that calls sessionInvalidated() that looks like this:
sessionInvalidated() {
  if (!testing) {
    if (this.get('_isFastBoot')) {
      this.transitionTo(Configuration.baseURL);
    } else {
      window.location.replace(Configuration.baseURL);
    }
  }
}
So by not calling sessionInvalidated() the user isn't redirected or the page refreshed and the new user is able to keep using the store without switching pages.

How to decouple uniqueness validation from the persistence layer?

Let's say I have a super simple user registration check that a user's email must be unique across all users.
I've expressed this requirement in these functions.
(defn is-unique? [email]
  (not (db-api/user-exists {:email email})))
(defn validate-user [user]
  (and (:email user) (is-unique? (:email user))))
But I want to decouple my validation from the database and make it purely functional. I could probably also inject the database API as a parameter to validate-user, like
(defn is-unique? [db-api email]
  (not ((:user-exists db-api) {:email email})))
(defn validate-user [db-api user]
  (and (:email user) (is-unique? db-api (:email user))))
but I don't know if this is idiomatic.
Also, it feels like the consumer of validate-user should not care about the database api. It feels like having this dependency undermines the entire concept of separating the business logic layer from the persistence layer. So I'm looking for a mindset that explains how to do this properly, or why it could not be done.
In order to avoid race conditions, the database should handle this constraint. You are probably looking for the equivalent of INSERT IF NOT EXISTS in the SQL world.
In practice, you could have a function create-user and a function update-user. The create-user function could use the IF NOT EXISTS check.
You cannot decouple it from the database: it is the database's responsibility to maintain the constraints on the data (relations, in the relational world). Nothing other than the database can do it, because of race conditions.
Let me expand on that with an example:
Suppose that two users (userA and userB) try at the very same time to create an account with a new email that has not been taken yet: "user@example.com". Your system then queries the database:
checking on behalf of userA that "user@example.com" doesn't exist; it returns true
checking on behalf of userB that "user@example.com" doesn't exist; it returns true
You then proceed to:
create an account for userA with the email "user@example.com"
create an account for userB with the email "user@example.com"
Depending on the semantics of the database and your requests, you may end up with:
two accounts with the email "user@example.com" (e.g. no constraint on the database, no unique index)
an account for userB is created, no account for userA (e.g. the request to create the account for userA failed because that email was already present)
an account for userB is created, then the data for userA overrides the data from userB (e.g. the statement has update semantics in the database, which is probably a bug at this point)
Because of that, you should try to create the account and let the database tell you if it failed. You cannot check that yourself without recreating the properties of a database (I personally wouldn't dare try).
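To illustrate the "try to create it and let the database tell you" approach, here is a minimal sketch in Python with SQLite rather than Clojure, purely because it is self-contained. The table layout and the create_user name are assumptions for the example; the point is that the UNIQUE constraint does the check, and the application only reacts to the failure:

```python
import sqlite3


def create_user(conn, email):
    """Try to insert the user; return True on success, False if the email is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row, so the email already exists.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")
```

The remaining validation (for example, that an email is present at all) can stay purely functional; only the uniqueness check is left to the insert itself.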

Multi-tenant Django applications using Mongoengine

I want to build a multi-tenant architecture for a SaaS system. We are using Django as our backend, mongoengine as our main database and gunicorn as our web server.
Our clients are a few big companies, so the number of databases pre-allocating space shouldn't be a problem.
The first approach we took was to write a middleware to determine the source of the request to properly connect to a mongoengine database. Here is the code:
class MongoConnectionMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
from mongoengine import connect
def mongo_connect(establishment):
    db_name = 'db_client_%d' % establishment.id
    connect(db_name)
This will register the "default" alias as the db_name for every mongoengine request.
But it seems that when many concurrent users from different companies are making requests, each one sets the default db_name to its own name.
As an example:
Company A makes a request and connects to database A. While A is doing its work, company B connects to database B. This makes A also connect to B's database in the process, so A fails to find some ids.
Is there a way to isolate the connection to the Mongo database per request to avoid this problem?
Unfortunately MongoEngine seems to be designed around a very basic use case of a single primary connection and multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I define the first connection I come across as the default, I also add it as a named connection. I then add any subsequent connection as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
You can use the switch_db context manager to switch from one connection to another, but because it is a context manager, the switch reverts as soon as you leave the with statement. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
You might be able to put it inside a function and then yield inside the with to prevent it from reverting immediately, but I'm not sure if this is valid.
You could use a wrapper of some kind, either a function, class or a custom QuerySet, that checks the current django/flask session and switches the db to the appropriate connection.
I'm not sure if a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
I included some code in this issue here where I change the database connection for my models.
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
    model._meta['db_alias'] = db
    # must set _collection to none so it is re-evaluated
    model._collection = None
    return model
MyDocument = switch(MyDocument, 'db-alias')
You'll also want to take a look at the code that mongoengine uses to switch dbs.
Beware that MongoEngine likes to cache things, so changing a few variables here and there doesn't always have an effect. It's full of surprises like this.
Edit:
I should also add that the connect call won't pick up value changes. So calling connect with new parameters won't take effect unless it's a new alias. Even the disconnect function (which isn't exposed publicly) doesn't let you do this, as the models will cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566

When using AWS SQS, is there any reason to prefer using GetQueueUrl to building a queue url from the region, account id, and name?

I have an application that uses a single SQS queue.
For the sake of flexibility I would like to configure the application using the queue name, SQS region, and AWS account id (as well as the normal AWS credentials and so forth), rather than giving a full queue url.
Does it make any sense to use GetQueueUrl to retrieve a url for the queue when I can just build it with something like the following (in ruby):
region = ENV['SQS_REGION'] # 'us-west-2'
account_id = ENV['SQS_AWS_ACCOUNT_ID'] # '773083218405'
queue_name = ENV['SQS_QUEUE_NAME'] # 'test3'
queue_url = "https://sqs.#{region}.amazonaws.com/#{account_id}/#{queue_name}"
# => https://sqs.us-west-2.amazonaws.com/773083218405/test3
Possible reasons that it might not:
Amazon might change their url format.
Others???
I don't think you have any guarantee that the URL will have such a form. The official documentation states the GetQueueUrl call as the official method for obtaining queue urls. So while constructing it using the method above may be a very good guess, it may also fail at any time because Amazon can change the URL scheme (e.g. for new queues).
If Amazon changes the queue URL format in a breaking way, the change will not be immediate: the old format will be deprecated slowly, and the new one will take effect with a version bump (i.e. when you upgrade your SDK).
While the documentation doesn't guarantee it, Amazon knows that it would be a massively breaking change for thousands of customers.
Furthermore, lots of customers use hard coded queue URLs which they get from the console, so those customers would not get the updated queue URL format either.
In the end, you will be safe either way. If you have lots of queues, then you will be better off formatting them yourself. If you have a small number of queues, then it shouldn't make much difference either way.
I believe for safety purposes the best way to get the URL is through the sqs.queues.named method. What you can do is memoize the queues by name to avoid multiple calls, something like this:
# https://github.com/phstc/shoryuken/blob/master/lib/shoryuken/client.rb
class Client
  @@queues = {}
  class << self
    def queues(queue)
      @@queues[queue.to_s] ||= sqs.queues.named(queue)
    end
  end
end
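The same memoization idea, sketched in Python for comparison. The lookup function is injected so the sketch stays self-contained; in practice it could wrap a GetQueueUrl call (for example boto3's get_queue_url(QueueName=...)["QueueUrl"]), which is an assumption about your setup rather than something the snippet does. The name make_queue_url_resolver is made up for illustration:

```python
from functools import lru_cache


def make_queue_url_resolver(lookup):
    """Wrap a queue-name -> URL lookup so each name is resolved only once."""

    @lru_cache(maxsize=None)
    def resolve(queue_name):
        return lookup(queue_name)

    return resolve
```

With this, the application can still be configured by queue name, and the cost of asking SQS for the URL is paid once per queue per process.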