sqlalchemy random seed per sql query - flask

I am looking for a way to order the results of an sqlalchemy query randomly, however I want the random order to be consistent for each user based on lets say a token.
q = q.order_by(func.random())
Note: I'm using postgresql
I found that I can use select setseed(0) and union all with the result of the query and then offset by 1, however that assumes that I know the number of columns q will have (which I could probably find out), but it sounds like a very bad practice to hardcode the number of columns especially for maintainability.
Is this the right approach? If so, is there a way to dynamically get the number of columns of a query? If not, what would be the correct approach?

you can use inspect to get the number of columns
something like:
from sqlalchemy import inspect
url=YOUR_DB_URL
engine = create_engine(url, convert_unicode=True, echo=False)
inspector = inspect(engine)
schemas = inspector.get_schema_names() #(If you need to find the names of the schemas)
number_of_cols= len(inspector.get_columns(YOUR_TABLE_NAME, schema=SCHEMA_NAME))

Related

How can I annotate another annotate group by query in Django?

I have two queries:
Proyecto.objects.filter().order_by('tipo_proyecto')
Proyecto.objects.values('tipo_proyecto').annotate(total=Sum('techo_presupuestario'))
How can I make this in only one query? I want that the first query contains an annotate data that represents all sums of techo_presupuestario, depending on your tipo_proyecto. Is this posible?
If I understand you correct, you'd like to add a conditionally aggregated sum over one field to each object, so you get each object with a sum fitting to its tipo_proyecto. Right?
I don't know, if this makes sense, but it could be done anyway using Subquery:
from django.db.models import Sum, Subquery, OuterRef
sq = Subquery(Proyecto.objects.filter(tipo_proyecto=OuterRef("tipo_proyecto")).values("tipo_proyecto").annotate(techoSum=Sum("techo_presupuestario")).values("techoSum"))
Proyecto.objects.all().annotate(tipoTechoSum = sq).order_by('tipo_proyecto')
Nonetheless, I wouldn't recommend this, as it puts some heavy load on your database. (In MySQL there will be an nested SELECT statement referring to the same table, which might be pretty unpleasant depending on the table's size.)
I'd say the better approach is to "collect" your aggregated sums separately and add the values to the model objects in your code.

Convert Django ORM query with large IN clause to table value constructor

I have a bit of Django code that builds a relatively complicated query in a programmatic fashion, with various filters getting applied to an initial dataset through a series of filter and exclude calls:
for filter in filters:
if filter['name'] == 'revenue':
accounts = accounts.filter(account_revenue__in: filter['values'])
if filter['name'] == 'some_other_type':
if filter['type'] == 'inclusion':
accounts = accounts.filter(account__some_relation__in: filter['values'])
if filter['type'] == 'exclusion':
accounts = accounts.exclude(account__some_relation__in: filter['values'])
...etc
return accounts
For most of these conditions, the possible values of the filters are relatively small and contained, so the IN clauses that Django's ORM generates are performant enough. However there are a few cases where the IN clauses can be much larger (10K - 100K items).
In plain postgres I can make this query much more optimal by using a table value constructor, e.g.:
SELECT domain
FROM accounts
INNER JOIN (
VALUES ('somedomain.com'), ('anotherdomain.com'), ...etc 10K more times
) VALS(v) ON accounts.domain=v
With a 30K+ IN clause in the original query it can take 60+ seconds to run, while the table value version of the query takes 1 second, a huge difference.
But I cannot figure out how to get Django ORM to build the query like I want, and because of the way all the other filters are constructed from ORM filters I can't really write the entire thing as raw SQL.
I was thinking I could get the raw SQL that Django's ORM is going to run, regexp parse it, but that seems very brittle (and surprisingly difficult to get the actual SQL that is about to be run, because of parameter handling etc). I don't see how I could annotate with RawSQL since I don't want to add a column to select, but instead want to add a join condition. Is there a simple solution I am missing?

Django order_by('?') filter with distinct()

I want to retrieving unique foreign key instances and ordering randomly
However, I got an error when I want to use order_by('?')
My query is like this:
qs=Course.objects.distinct('courseschedule__object_id').order_by('courseschedule__object_id')
this query is works fine, but right now I want to order randomly(to get random result every time),I try this
qs=qs.order_by('?')
I got this error:
django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Any idea how to fix it? My database is Postgres, I don't want do rawSQL.... I really appreciate you guys help!!!!
First of all, random order using the database is expensive and slow according to: querysetsDjango . Second, in my opinion instead of using the database to return a random order get your query from the database without distinct (because it also can be slow), shuffle the query using python methods for example: randomOrder and make a set() and then . This way you are not using the database to do your stuff but using pure python code.

How to order django query set filtered using '__icontains' such that the exactly matched result comes first

I am writing a simple app in django that searches for records in database.
Users inputs a name in the search field and that query is used to filter records using a particular field like -
Result = Users.objects.filter(name__icontains=query_from_searchbox)
E.g. -
Database consists of names- Shiv, Shivam, Shivendra, Kashiva, Varun... etc.
A search query 'shiv' returns records in following order-
Kahiva, Shivam, Shiv and Shivendra
Ordered by primary key.
My question is how can i achieve the order -
Shiv, Shivam, Shivendra and Kashiva.
I mean the most relevant first then lesser relevant result.
It's not possible to do that with standard Django as that type of thing is outside the scope & specific to a search app.
When you're interacting with the ORM consider what you're actually doing with the database - it's all just SQL queries.
If you wanted to rearrange the results you'd have to manipulate the queryset, check exact matches, then use regular expressions to check for partial matches.
Search isn't really the kind of thing that is best suited to the ORM however, so you may which to consider looking at specific search applications. They will usually maintain an index, which avoids database hits and may also offer a percentage match ordering like you're looking for.
A good place to start may be with Haystack

Cassandra NOT EQUAL Operator

Question to all Cassandra experts out there.
I have a column family with about a million records.
I would like to query these records in such a way that I should be able to perform a Not-Equal-To kind of operation.
I Googled on this and it seems I have to use some sort of Map-Reduce.
Can somebody tell me what are the options available in this regard.
I can suggest a few approaches.
1) If you have a limited number of values that you would like to test for not-equality, consider modeling those as a boolean columns (i.e.: column isEqualToUnitedStates with true or false).
2) Otherwise, consider emulating the unsupported query != X by combining results of two separate queries, < X and > X on the client-side.
3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that will do client-side filtering and construct the not-equal set dynamically. This will work if you can first narrow down your search space to manageable proportions, such that it's relatively cheap to run the query without the not-equal.
So let's say you're interested in all purchases of a particular customer of every product type except Widget. An ideal query could look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget'; Now of course, you cannot run this, but in this case you should be able to run SELECT * FROM purchases WHERE customer = 'Bob' without wasting too many resources and filter item != 'Widget' in the client application.
4) Finally, if there is no way to restrict the data in a meaningful way before doing the scan (querying without the equality check would returning too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that would scan all rows in the table across the cluster. Such jobs will obviously run a lot slower than native queries, and are quite complex to set up. If you want to go this way, please look into Cassandra Hadoop integration.
If you want to use not-equals operator on a specific partition key and get all other data from table then you can use a combination of range queries and TOKEN function from CQL to achieve this
For example, if you want to fetch all rows except the ones having partition key as 'abc' then you execute below 2 queries
select <column1>,<column2> from <keyspace1>.<table1> where TOKEN(<partition_key_column_name>) < TOKEN('abc');
select <column1>,<column2> from <keyspace1>.<table1> where TOKEN(<partition_key_column_name>) > TOKEN('abc');
But, beware that result is going to be huge (depending on size of table and fields you need). So you might want to use this in conjunction with dsbulk kind of utility. Also note that there is no guarantee of ordering in your result. This is just a kind of data dump which will most probably be useful for some kind of one-time data migration like scenarios.