I have this ActiveRecord query:
Stock.select('DISTINCT ON (stocks.part_number)*').joins(:part, :manufacturer)
.includes(:manufacturer, :part).order(:part_number).with_cat(category).
where(manufacturers: {abbr: ['manufacturer1', 'manufacturer2']})
with_cat is a scope:
scope :with_cat, -> (category) { where(parts: {category_id: category}) }
Now the reason I am using Distinct on is because every manufacturer can have the same part as another, hence duplicates. I do not want duplicates. The above gets the job done. Except when I add count to it I get an error.
PG::SyntaxError: ERROR: syntax error at or near "ON"
LINE 1: SELECT COUNT(DISTINCT ON (stocks.part_number)*) FROM "stocks...
^
: SELECT COUNT(DISTINCT ON (stocks.part_number)*) FROM "stocks"
INNER JOIN "parts" ON "parts"."id" = "stocks"."part_id"
INNER JOIN "manufacturers" ON "manufacturers"."id" = "stocks"."manufacturer_id"
WHERE "parts"."category_id" = 17 AND "manufacturers"."abbr" IN ('manufacturer1', 'manufacturer2')
Not really sure how to add a count to the query without causing that error. I'm not familiar with Distinct on either. Any explanation as to why this is happening would be great!
Does it fix the query if you use the count ahead of distinct? Got the idea per this postgres documentation.
Stock.select('count(distinct part_number)')...
Related
I need to correct a datapoint in a pre-existing table. I am using multiple CTEs to find the bad value and the corresponding good value. I am having trouble working out how to overwrite the value in the table using the output of the CTE. Here is what I am trying:
with [extra CTEs here]....
,CTE3 AS (
SELECT c1.FIELD_1, c1.FIELD_2 AS GOOD, c2.FIELD_3 AS BAD
FROM CTE1 c1
JOIN CTE2 c2 ON c1.FIELD_1 = c2.FIELD_1
)
update TABLE1
set TABLE1.FIELD_3 = CTE3.GOOD
from CTE3
INNER JOIN TABLE1 ON CTE3.BAD = TABLE1.FIELD_3
Is it even possible to achieve this?
If so, how should I change my logic to get it to work?
Trying the above logic is throwing the following error:
SQL Error [42601]: An unexpected token "WITH CTE1 AS ( SELECT
FIELD_1" was found following "BEGIN-OF-STATEMENT". Expected tokens
may include: "<update>".. SQLCODE=-104, SQLSTATE=42601,
DRIVER=4.27.25
Table designs and expected output:
I have a quick question about the following piece of code. Why can we use 'NA' = for the subquery ? I mean, the subquery might return a group of values, not a single one, right? Could anyone tell me the reason? Many thanks for your time and attention.
proc sql;
select lastname, first name
from sasuser.staffmaster
where 'NA' =
(select jobcategory
from sasuser.supervisors
where staffmaster.empid = supervisors.empid);
quit;
Thanks again.
Assuming EMPID is a unique ID for an employee (I hope it is?), and each employee has only one supervisor, that query should resolve to a single row every time. (A single row for each row returned from the outer query, of course, which is important. Think of it like a join - that's basically what that is, a slightly oddly phrased join, which often will be turned into an actual join by the SQL parser.)
In general, however, sure, it could resolve to multiple rows. SAS will let you do the query, and if it returns just one row it works; if it returns 2+ rows, it fails. As Quentin pointed out in comments, this is a correlated subquery.
I am using Java DB (Java DB is Oracle's supported version of Apache Derby and contains the same binaries as Apache Derby. source: http://www.oracle.com/technetwork/java/javadb/overview/faqs-jsp-156714.html#1q2).
I am trying to update a column in one table, however I need to join that table with 2 other tables within the same database to get accurate results (not my design, nor my choice).
Below are my three tables, ADSID is a key linking Vehicles and Customers and ADDRESS and ZIP in Salesresp are used to link it to Customers. (Other fields left out for the sake of brevity.)
Salesresp(address, zip, prevsale)
Customers(adsid, address, zipcode)
Vehicles(adsid, selldate)
The goal is to find customers in the SalesResp table that have previously purchased a vehicle before the given date. They are identified by address and adsid in Customers and Vechiles respectively.
I have seen updates to a column with a single join and in fact asked a question about one of my own update/joins here (UPDATE with INNER JOIN). But now I need to take it that one step further and use both tables to get all the information.
I can get a multi-JOIN SELECT statement to work:
SELECT * FROM salesresp
INNER JOIN customers ON (SALESRESP.ZIP = customers.ZIPCODE) AND
(SALESRESP.ADDRESS = customers.ADDRESS)
INNER JOIN vehicles ON (Vehicles.ADSId =Customers.ADSId )
WHERE (VEHICLES.SELLDATE<'2013-09-24');
However I cannot get a multi-JOIN UPDATE statement to work.
I have attempted to try the update like this:
UPDATE salesresp SET PREVSALE = (SELECT SALESRESP.address FROM SALESRESP
WHERE SALESRESP.address IN (SELECT customers.address FROM customers
WHERE customers.adsid IN (SELECT vehicles.adsid FROM vehicles
WHERE vehicles.SELLDATE < '2013-09-24')));
And I am given this error: "Error code 30000, SQL state 21000: Scalar subquery is only allowed to return a single row".
But if I change that first "=" to a "IN" it gives me a syntax error for having encountered "IN" (Error code 30000, SQL state 42X01).
I also attempted to do more blatant inner joins, but upon attempting to execute this code I got the the same error as above: "Error code 30000, SQL state 42X01" with it complaining about my use of the "FROM" keyword.
update salesresp set prevsale = vehicles.selldate
from salesresp sr
inner join vehicles v
on sr.prevsale = v.selldate
inner join customers c
on v.adsid = c.adsid
where v.selldate < '2013-09-24';
And in a different configuration:
update salesresp
inner join customer on salesresp.address = customer.address
inner join vehicles on customer.adsid = vehicles.ADSID
set salesresp.SELLDATE = vehicles.selldate where vehicles.selldate < '2013-09-24';
Where it finds the "INNER" distasteful: Error code 30000, SQL state 42X01: Syntax error: Encountered "inner" at line 3, column 1.
What do I need to do to get this multi-join update query to work? Or is it simply not possible with this database?
Any advice is appreciated.
If I were you I would:
1) Turn off autocommit (if you haven't already)
2) Craft a select/join which returns a set of columns that identifies the record you want to update E.g. select c1, c2, ... from A join B join C... WHERE ...
3) Issue the update. E.g. update salesrep SET CX = cx where C1 = c1 AND C2 = c2 AND...
(Having an index on C1, C2, ... will boost performance)
4) Commit.
That way you don't have worry about mixing the update and the join, and doing it within a txn ensures that nothing can change the result of the join before your update goes through.
I am trying to select MAX scores objects for unique user(s).
So user should be distinct and i should receive max score for a user (order), and I need to get other object values like dataAdded for this max score.
$query = $this->createQueryBuilder('s');
$query->select('DISTINCT s.user, s');
$query->where('s.challenge = :challenge')->setParameter('challenge', $challenge);
$query->andWhere('s.date_added BETWEEN :monday AND :sunday')
->setParameter('monday', $startDate->format('Y-m-d'))
->setParameter('sunday', $endDate->format('Y-m-d'));
$query->orderBy('s.score', 'DESC');
//$query->groupBy('s.user');
$query->setMaxResults($limit);
$results = $query->getQuery();
Does not matter how I try i can't make distinct to work. I have tried group by MAX(s.score) but this will associate random object with max score rather then the same object..
Does anyone have a solution to this? Distinct simply don't seem to work...
Generated query:
SELECT DISTINCT s.user, s FROM Digital\ApplicationBundle\Entity\ChallengeScore s WHERE s.challenge = :challenge AND (s.date_added BETWEEN :monday AND :sunday) ORDER BY s.score DESC
But an error ;(
[Semantical Error] line 0, col 18 near 'user, s FROM': Error: Invalid PathExpression. Must be a StateFieldPathExpression
Its worth to mention that user is a relation. When i try this on other files this works fine...
If you need get count distinct values, you can use countDistinct. But if you must get all distinct values, use groupBy
class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!
Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC