Doctrine Query with subquery, order by, rand and group by

Doctrine Query with subquery, order by, rand and group by - doctrine-orm

We have a table, from which we want to choose a random row, out of the last 1000 entries. However a user can enter multiple times, and independent of the amount of entries each user should have the same chance. This is currently solved by a native query (see below), which works. I was wondering if it's possible to translate this behaviour to a Doctrine statement, as we're planing to switch from MySQL to PostgreSQL in the future.
Current solution:
$conn = $this->em->getConnection();
$sql = 'SELECT tmp.username FROM (SELECT username, id FROM entries ORDER BY id DESC limit 1000) AS tmp GROUP BY tmp.id ORDER BY RAND() LIMIT 1';
$stmt = $conn->prepare($sql);
$stmt->execute();

Related

Netsuite suiteql how to get all available tables to query?

I am using Postman and Netsuite's SuiteQL to query some tables. I would like to write two queries. One is to return all items (fulfillment items) for a given sales order. Two is to return all sales orders that contain a given item. I am not sure what tables to use.
The sales order I can return from something like this.
"q": "SELECT * FROM transaction WHERE Type = 'SalesOrd' and id = '12345'"
The item I can get from this.
"q": "SELECT * FROM item WHERE id = 1122"
I can join transactions and transactionline for the sale order, but no items.
"q": "SELECT * from transactionline tl join transaction t on tl.transaction = t.id where t.id in ('12345')"
The best reference I have found is the Analytics Browser, https://system.netsuite.com/help/helpcenter/en_US/srbrowser/Browser2021_1/analytics/record/transaction.html, but it does not show relationships like an ERD diagram.
What tables do I need to join to say, given this item id 1122, return me all sales orders (transactions) that have this item?

You are looking for TransactionLine.item. That will allow you to query transaction lines whose item is whatever internal id you specify.
{
"q": "SELECT Transaction.ID FROM Transaction INNER JOIN TransactionLine ON TransactionLine.Transaction = Transaction.ID WHERE type = 'SalesOrd' AND TransactionLine.item = 1122"
}
If you are serious about getting all available tables to query take a look at the metadata catalog. It's not technically meant to be used for learning SuiteQL (supposed to make the normal API Calls easier to navigate), but I've found the catalog endpoints are the same as the SuiteQL tables for the most part.
https://{{YOUR_ACCOUNT_ID}}.suitetalk.api.netsuite.com/services/rest/record/v1/metadata-catalog/
Headers:
Accept application/schema+json

You can review all the available records, fields and joins in the Record Catalog page (Customization > Record Catalog).

PowerBI DAX function to count number of occurrences using DISTINCT (Countif)

I am trying to create a column in PowerBI that counts how times a customer name occurs in a list.
I may be going in the wrong direction, but what I have so far is;
My SQL query returns a table of customer site visits (Query1), from which I have created a new table (Unique) and column (Customer) that lists the distinct names (cust_company_descr) from Query1;
Unique = DISTINCT(Query1[cust_company_descr])
What I need is a new column in the Unique table (Count) that will give me a count of how many times each customer name in the Query1 table appears.
If I were working in Excel, the solution would be to generate a list of unique values and do a COUNTIF, but I can't find a way to replicate this.
I have seen a few solutions that involve this sort of thing;
CountValues =
CALCULATE ( COUNTROWS ( TableName ); TableName[ColumnName] = " This Value " )
The issue I have with this is the Unique table contains over 300 unique entries, so I can't make this work.
Example (The Count column is what I'm trying to create)
Query 1
cust_company_descr
Company A
Company B
Company A
Company C
Company B
Company A
Unique
Company_____Count
Company A_____3
Company B_____2
Company C_____1
Any help is gratefully received.

Top user of day, week, all time - best way to implement?

Suppose each user on my site has a score which increases as they use the site (like Stackoverflow) and I have a field score stored in the user profile table for each user.
Getting the top 10 users of all time is easy, just order by the score column.
I want to have "top 10 today", "top 10 this week", "top 10 of all time".
What's the best way to implement this? Do I need to store every single score change with a timestamp?

You would have to have a table that stored the increments and use a timestamp. I.E.
CREATE TABLE ScoreIncreases (
PrimaryKey UNIQUEIDENTIFIER,
UserId UNIQUEIDENTIFIER,
ScoreIncrease INT,
CreatedDate DATETIME)
Your query would then be something like
SELECT TOP 1 u.PrimaryKey, SUM(ScoreIncrease)
FROM Users u
INNER JOIN ScoreIncreases si ON si.Userid=u.PrimaryKey
WHERE DATEDIFF(day,si.CreatedDate,GETDATE()) = 0
GROUP BY u.PrimaryKey
ORDER BY SUM(ScoreIncrease) DESC

How to use subquery in django?

I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want except for the date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so what the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I don't want to sort using python because I still got to page limit the query. There can be tens of thousands of rows in the database.
In fact it is currently sorted by in python now and is causing very long page load times, so that's why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
user = models.OneToOneField(User)
class Purchase(models.Model):
customer = models.ForeignKey(Customer)
date = models.DateField(auto_now_add=True)
item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.

This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
Another option, would be adding a property to your customer model that would look something like this:
#property
def latest_purchase(self):
return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!

Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it will give you only the customer name and date. Using that will not help you in finding the records when you use these results in a subquery.
To use IN you need a list of unique parameters to identify a record, e.g., id
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_customer
WHERE "shop_customer.id" IN
(SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id");
and using the Django ORM it looks like:
(Purchase.objects.filter(
id__in=Purchase.objects
.values('customer_id')
.annotate(latest=Max('id'))
.values_list('latest', flat=True)))
Hope it helps!

I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
Upside it gives me the query I want. Downside is that it is raw query and I can't append any other queryset filters.

This is my approach if I need some subset of data (N items) along with the Django query. This is example using PostgreSQL and handy json_build_object() function (Postgres 9.4+), but same way you can use other aggregate function in other database system. For older PostgreSQL versions you can use combination of array_agg() and array_to_string() functions.
Imagine you have Article and Comment models and along with every article in the list you want to select 3 recent comments (change LIMIT 3 to adjust size of subset or ORDER BY c.id DESC to change sorting of subset).
qs = Article.objects.all()
qs = qs.extra(select = {
'recent_comments': """
SELECT
json_build_object('comments',
array_agg(
json_build_object('id', id, 'user_id', user_id, 'body', body)
)
)
FROM (
SELECT
c.id,
c.user_id,
c.body
FROM app_comment c
WHERE c.article_id = app_article.id
ORDER BY c.id DESC
LIMIT 3
) sub
"""
})
for article in qs:
print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....

Django: Distinct foreign keys

class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!

Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)

Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.

I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]

You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Doctrine Query with subquery, order by, rand and group by - doctrine-orm

Related

Netsuite suiteql how to get all available tables to query?

PowerBI DAX function to count number of occurrences using DISTINCT (Countif)

Top user of day, week, all time - best way to implement?

How to use subquery in django?

Django: Distinct foreign keys

Categories

Resources