Django Window Function FirstValue Producing Invalid SQL - django

I'm trying to use a window function to get the first of each group and its producing the sql query below - I took the sql and played around with it in the console and it appears that the CAST() function is causing the issue. Is there a way to get rid of this in the query?
Django 2,1 Code:
def contract_list(user):
price_window = Window(
expression=FirstValue('settle_price'),
partition_by=F('contract'),
order_by=F('trade_time').desc()
)
market_prices = Trade.objects.filter(
contract__isnull=False
).annotate(
market_price=price_window
).values(
con=F('contract'),
market_price=F('market_price'),
).order_by(
'contract'
).all()
produces this SQL run on a SQLite3 database:
SELECT
CAST(FIRST_VALUE("trading_trade"."settle_price") AS NUMERIC) OVER PARTITION BY "trading_trade"."contract_id" ORDER BY "trading_trade"."trade_time" DESC) AS "market_price",
"trading_trade"."contract_id" AS "con"
FROM
"trading_trade"
WHERE
"trading_trade"."contract_id" IS NOT NULL
ORDER BY
"trading_trade"."contract_id" ASC
Again, the issue seems to be related to the CAST(FIRST_VALUE("trading_trade"."settle_price") AS NUMERIC) since removing the cast function produces a result set.

Related

Is it possible to update a piece of data in a pre-existing table using the output of data from a CTE?

I need to correct a datapoint in a pre-existing table. I am using multiple CTEs to find the bad value and the corresponding good value. I am having trouble working out how to overwrite the value in the table using the output of the CTE. Here is what I am trying:
with [extra CTEs here]....
,CTE3 AS (
SELECT c1.FIELD_1, c1.FIELD_2 AS GOOD, c2.FIELD_3 AS BAD
FROM CTE1 c1
JOIN CTE2 c2 ON c1.FIELD_1 = c2.FIELD_1
)
update TABLE1
set TABLE1.FIELD_3 = CTE3.GOOD
from CTE3
INNER JOIN TABLE1 ON CTE3.BAD = TABLE1.FIELD_3
Is it even possible to achieve this?
If so, how should I change my logic to get it to work?
Trying the above logic is throwing the following error:
SQL Error [42601]: An unexpected token "WITH CTE1 AS ( SELECT
FIELD_1" was found following "BEGIN-OF-STATEMENT". Expected tokens
may include: "<update>".. SQLCODE=-104, SQLSTATE=42601,
DRIVER=4.27.25
Table designs and expected output:

DAX Query Union Multiple Tables & Return Distinct

I have two tables (CompletedJobs & ScriptDetails) and using DAX, I want to return distinct Names that appear in CompletedJobs that do not appear in ScriptDetails.
Here is my SQL Query. Works and return values.
Select Distinct CJ.[Name]
From CompletedJobs CJ
Left Join ScriptDetails SD
ON CJ.[Name]=SD.ActivityName
Where SD.ActivityName IS NULL
I started with creating the following DAX query, but just doing this, I get the following error message:
"A table of multiple values was supplied where a single value was expected"
AdhocJobs = DISTINCT(UNION(SELECTCOLUMNS(CompletedJobs,"Name",CompletedJobs[Name]),SELECTCOLUMNS(ScriptDetails,"Name",ScriptDetails[ActivityName])))
How do I create a DAX query that would replicate the SQL query?
Rather than recreate your SQL, there is DAX that already addresses your specific use case. The EXCEPT function returns a table where rows from the LEFT SIDE table do not appear in the RIGHT SIDE table.
EVALUATE
DISTINCT (
EXCEPT (
SUMMARIZE ( CompletedJobs , CompletedJobs [Name]),
SUMMARIZE ( ScriptDetails , ScriptDetails [ActivityName])
)
)
In this case I used SUMMARIZE to reduce each table down to one column, and then wrapped them with EXCEPT to take only the Names from Completed Jobs that aren't ActivityNames in ScriptDetails.
Hope it helps.

How to enforce Django to use "JOIN VALUES"

I'm having a performance problem where I need to replace section of my query statement. Right now I have a the following:
select count(*) FROM "mytable" WHERE "field" IN ('v1', 'v2', ..., 'vN');
this can be translated to Django ORM:
Mytable.objects.all().filter(field__in=[myvalues]).count()
I need to do the following though:
select count(*) FROM "mytable" JOIN (values ('v1', 'v2', ..., 'vN')) as lookup(value) on lookup.value = "mytable".field;
Is there a way to add this to the ORM? I need to do with ORM because I already have other filters. Worst case scenario I thought of getting the query string and adding there manually...
I'm using Postgresql 9.6
I found a way after reading over and over the documentation. I even found a patch that was not merged a while ago.
It doesn't really do the join, but it works much faster than using __in straightforward.
What I'm doing is executing a RawSQL() that was introduced in Django 2.0 and with that result I do the __in again.
So here is a code example:
query = """select myfield from mytable join (values
('v1'), ('v2'), ..., ('vN')
) as lookup(value) on lookup.value = mytable.myfield"""
r = RawSQL(query, [])
mymodel.filter(myfield__in=r)
Now it takes miliseconds instead of minutes!

Executemany on pyodbc only return result from last parameter

I have a problem when I try to use pyodbc executemany function.
I have an Oracle database and I want to extract data for multiple days.
I cannot use between in my request, because the database is not indexed on the date field and its taking forever.
I want to manually ask all day and process answers.
I cannot thread this part, so I wanted to use executemany to get rows more quickly.
The problem is when I use executemany I only got the result of the last argument asked.
Here is my code:
import pyodbc
conn = pyodbc.connect('DRIVER={Oracle in instantclient_11_2};DBQ=dbname;UID=uid;PWD=pwd')
cursor = conn.cursor()
query = "SELECT date FROM table WHERE date = TO_DATE(?, 'DD/MM/YYYY')"
query_args = (
('29/04/2016',),
('28/04/2016',),
)
cursor.executemany(query, query_args)
rows = cursor.fetchall()
In rows, I can only find rows with (datetime.datetime(2016, 4, 28, 0, 0), ).
Always the last argument.
I am using python 2.7.9 from WinPython on a Oracle database with a client on 11.0.2.
Except this query, every other query is perfectly fine.
I cannot use IN () synthax for 2 reasons:
I want to limit operations on database side, and do most of thing on script side (I've tried but it's way too long)
I might have more than 1000 different dates in the request.
(Right now I'm using IN() OR IN() OR IN()... but if anyone find something better that would be wonderful !)
Am I doing something wrong ?
Thanks for helping.
Your query runs once with one argument. If you want to run for multiple dates either use "IN" clause, this will require to modify query_args a bit.
"SELECT date FROM table WHERE date in (TO_DATE(?, 'DD/MM/YYYY'), TO_DATE(?, 'DD/MM/YYYY'))"
query_args = (
('29/04/2016','28/04/2016'),
)
or cursor through each date argument:
while query_arg in query_args:
cursor.executemany(query, query_arg )
rows = cursor.fetchall()

Django: Distinct foreign keys

class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!
Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC