JPA Criteria Builder - jpa-2.0

I am working with JPA and have run into the problem described below.
Table structure:
Id (Integer), StatusType (String), CreationTime (Timestamp)
I want to select StatusType, Count(StatusType) and CreationTime, grouped by CreationTime cast to a Date instead of a Timestamp.
If I group by CreationTime as a Timestamp, effectively no grouping happens, because almost every timestamp is unique.
I have an SQL query that does what I need: Select StatusType , Count(*) , Date(CreationTime) from Table Group By Date(CreationTime)
It casts the Timestamp to a Date and groups by the result,
but I want to do this with the CriteriaBuilder API, or at least in a JPQL query, so that it works on all databases. Any ideas?
Thanks in advance.

You may convert the timestamp to an integer in "yyyyMMdd" form by using:
year(CreationTime) * 10000 + month(CreationTime) * 100 + day(CreationTime)
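To illustrate why the integer key helps, here is a minimal Python sketch of the same arithmetic. It is not JPA code; the rows and the helper name day_key are made up for the example. All timestamps from one calendar day collapse to the same yyyyMMdd integer, so grouping on that key counts per day.

```python
from collections import Counter
from datetime import datetime


def day_key(ts: datetime) -> int:
    """Collapse a timestamp to an integer of the form yyyyMMdd."""
    return ts.year * 10000 + ts.month * 100 + ts.day


# Hypothetical rows: (StatusType, CreationTime)
rows = [
    ("OPEN", datetime(2015, 3, 1, 9, 15, 0)),
    ("OPEN", datetime(2015, 3, 1, 17, 40, 5)),
    ("CLOSED", datetime(2015, 3, 1, 18, 0, 0)),
    ("OPEN", datetime(2015, 3, 2, 8, 0, 0)),
]

# Group by (status, day key) instead of by the full timestamp.
counts = Counter((status, day_key(ts)) for status, ts in rows)

for (status, key), n in sorted(counts.items()):
    print(status, key, n)
```

Whether year(), month() and day() are available in a query depends on the JPA provider, so treat the expression above as something to check against your provider's documentation.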

Yes, a Timestamp can be cast to a Date using JPQL.
I think this blog post will solve your problem.

Related

Django ORM and GROUP BY

Newcomer to Django here.
I'm currently trying to fetch some data from my model with a query that would need a GROUP BY in SQL.
Here is my simplified model:
class Message(models.Model):
mmsi = models.CharField(max_length=16)
time = models.DateTimeField()
point = models.PointField(geography=True)
I'm basically trying to get the last Message from every distinct mmsi number.
In SQL that would translate to something like this, for example:
select a.* from core_message a
inner join
(select mmsi, max(time) as time from core_message group by mmsi) b
on a.mmsi=b.mmsi and a.time=b.time;
After some tries, I managed to get something similar working with the Django ORM:
>>> mf=Message.objects.values('mmsi').annotate(Max('time'))
>>> Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max'))
That works, but I find my Django solution quite clumsy, and I'm not sure it's the proper way to do it.
The underlying query looks like this:
>>> print(Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max')).query)
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea FROM "core_message" WHERE ("core_message"."mmsi" IN (SELECT U0."mmsi" FROM "core_message" U0 GROUP BY U0."mmsi") AND "core_message"."time" IN (SELECT MAX(U0."time") AS "time__max" FROM "core_message" U0 GROUP BY U0."mmsi"))
I'd appreciate it if you could propose a better solution for this problem.
Thanks !
You only need something like this:
Message.objects.all().distinct('mmsi').values('mmsi', 'time').order_by('mmsi','-id')
or like this:
Message.objects.all().values('mmsi').annotate(date_last=Max('time'))
Note: the latter is translated by Django into this SQL query:
SELECT "message"."mmsi", MAX("message"."time") AS "date_last" FROM "message" GROUP BY "message"."mmsi", "message"."time" ORDER BY "message"."time" DESC
Using the answers and comments, I managed to solve this using a subquery or a simple distinct order by.
Simple distinct order by solution inspired by @Oriphiel's answer:
Message.objects.distinct('mmsi').order_by('mmsi','-time')
The underlying SQL query looks like this :
SELECT DISTINCT ON ("core_message"."mmsi") "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
ORDER BY "core_message"."mmsi" ASC, "core_message"."time" DESC
Simple and straightforward.
Subquery solution inspired by @DanielRoseman's comment:
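from django.db.models import OuterRef, Subquery
# Messages sharing the outer row's mmsi, newest first
time_order=Message.objects.filter(mmsi=OuterRef('mmsi')).order_by('-time')
Message.objects.filter(id__in=Subquery(time_order.values('id')[:1]))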
The underlying SQL query looks like this :
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
WHERE "core_message"."id" IN
(SELECT U0."id" FROM "core_message" U0 WHERE U0."mmsi" = ("core_message"."mmsi") ORDER BY U0."time" DESC LIMIT 1)
A tad more complex, but it gives more flexibility. If I wanted to get the latest five messages for every MMSI, I'd just need to change the LIMIT value. In Django, it would look like this:
Message.objects.filter(id__in=Subquery(time_order.values('id')[:5]))
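For anyone who wants to try the correlated-subquery idea without a Django project, here is a small sketch using Python's built-in sqlite3 module. The table is a cut-down stand-in for core_message (no point column), the sample rows are invented, and latest_ids is a hypothetical helper. It only shows the shape of the SQL, not what Django generates for every backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_message (id INTEGER PRIMARY KEY, mmsi TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO core_message (id, mmsi, time) VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01 10:00:00"),
        (2, "A", "2020-01-02 10:00:00"),
        (3, "B", "2020-01-01 09:00:00"),
        (4, "B", "2020-01-03 09:00:00"),
        (5, "B", "2020-01-02 09:00:00"),
    ],
)


def latest_ids(limit):
    """Return ids of the `limit` newest messages for every mmsi."""
    sql = """
        SELECT m.id FROM core_message m
        WHERE m.id IN (
            SELECT u.id FROM core_message u
            WHERE u.mmsi = m.mmsi
            ORDER BY u.time DESC
            LIMIT ?
        )
        ORDER BY m.id
    """
    return [row[0] for row in conn.execute(sql, (limit,))]


print(latest_ids(1))  # newest message per mmsi
print(latest_ids(2))  # two newest messages per mmsi
```

Changing the LIMIT is the only difference between "latest one" and "latest N" per group, which is the flexibility mentioned above.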

How to pick the maximum date from a group of dates coming from the source, other than the high date?

I have around 7 fields coming from the source. I need to pick the highest date of them all in an Informatica expression. Any of the fields may also contain a default high date (12/31/9999); if that date shows up in a field, it has to be skipped in the comparison.
E.g., if my source fields contain 1/1/2001, 1/2/2002, 2/2/2003 and 12/31/9999,
then my expression output has to be 2/2/2003.
Create an additional port that discards the default, like
agg_Date = IIF(in_Date = '12/31/9999', NULL, in_Date)
Now use agg_Date in Aggregator Transformation to calculate MAX.
You can do it at the SQ level (custom SQL query) or at the Informatica level:
https://forgetcode.com/informatica/1472-greatest-find-greatest-value
Step 1:
Define check for each field ( or do it inline )
DATE_1_CHECKED = IIF( DATE_1 = TO_DATE('9999.. ', 'YYYY-' ), NULL, DATE_1)
Step 2:
GREATEST(DATE_1_CHECKED, DATE_2_CHECKED, DATE_3_CHECKED)
ps. I'm not sure about casting function TO_DATE, please read doc.
ps. If you want to cut the precision of a date/time in Informatica, use TRUNC(DATE_1, 'DD') to get the date with the time part (HH24:MI:SS) set to zero.
Write an IIF that compares the date values using < and >, skips any value equal to 12/31/9999, and outputs the greatest of the rest.
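The logic shared by the answers above (drop the sentinel, then take the maximum) can be sketched in Python. This is not Informatica expression syntax; HIGH_DATE and latest_real_date are names chosen for the example, and missing values are modelled as None.

```python
from datetime import date
from typing import Iterable, Optional

# The default "high date" that must be ignored in the comparison.
HIGH_DATE = date(9999, 12, 31)


def latest_real_date(values: Iterable[Optional[date]]) -> Optional[date]:
    """Return the greatest date, skipping None and the high-date sentinel."""
    candidates = [d for d in values if d is not None and d != HIGH_DATE]
    return max(candidates) if candidates else None


fields = [date(2001, 1, 1), date(2002, 1, 2), date(2003, 2, 2), HIGH_DATE]
print(latest_real_date(fields))
```

Note the edge case where every field holds the high date: the sketch returns None, and the real mapping needs an explicit decision for that case too.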

QDateTime with sqlite3

I'm using SQLite3 with Qt. To store date-times in the database I used the TEXT type; see this sample from my database:
data
INSERT and SELECT work fine, but how can I write a SELECT that filters on a specific date range?
My code:
QString("SELECT * from main.sell_cash_log WHERE 'when' >= '%1' AND 'when' <= '%2'").arg(ui->fromdate->dateTime().toString("dd-MM-yyyy:HH-mm-ss")).arg(ui->todate->dateTime().toString("dd-MM-yyyy:HH-mm-ss"))
You're probably better off using one of the date operators to get info for a specific date
https://www.tutorialspoint.com/sqlite/sqlite_data_types.htm
To select all in the month of November:
SELECT * FROM main.sell_cash_log WHERE strftime('%Y-%m-%d', "when") BETWEEN '2016-11-01' AND '2016-11-30'
See also SQL Select between dates, which is where I copied that query from.
The problem was the column named when. In the INSERT query I was quoting it as 'when', but in the SELECT that does not work, because 'when' in single quotes is compared as a string literal. So I used `when` (in backticks) and it worked:
CartItems->setQuery(QString("SELECT * from main.sell_cash_log WHERE datetime(`when`) BETWEEN datetime('%1') AND datetime('%2')").arg(ui->fromdate->dateTime().toString("yyyy-MM-dd hh:mm:ss")).arg(ui->todate->dateTime().toString("yyyy-MM-dd hh:mm:ss")));
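The two points in this thread (quote the column as an identifier, and store date-times in a sortable yyyy-MM-dd hh:mm:ss form) can be checked quickly with Python's built-in sqlite3 module. This is a sketch, not the Qt code: the table is reduced to two columns, the rows are invented, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "when" is an SQL keyword, so it is quoted as an identifier.
conn.execute('CREATE TABLE sell_cash_log (id INTEGER PRIMARY KEY, "when" TEXT)')
conn.executemany(
    'INSERT INTO sell_cash_log (id, "when") VALUES (?, ?)',
    [
        (1, "2016-10-31 23:59:59"),
        (2, "2016-11-01 00:00:00"),
        (3, "2016-11-15 12:30:00"),
        (4, "2016-12-01 00:00:00"),
    ],
)


def ids_between(start, end):
    """Filter on the column: the identifier is quoted with double quotes."""
    sql = (
        'SELECT id FROM sell_cash_log '
        'WHERE datetime("when") BETWEEN datetime(?) AND datetime(?) '
        'ORDER BY id'
    )
    return [row[0] for row in conn.execute(sql, (start, end))]


def ids_quoted_as_literal(start, end):
    """Same filter, but 'when' in single quotes is just the text 'when'."""
    sql = (
        "SELECT id FROM sell_cash_log "
        "WHERE 'when' >= ? AND 'when' <= ? ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (start, end))]


print(ids_between("2016-11-01 00:00:00", "2016-11-30 23:59:59"))
print(ids_quoted_as_literal("2016-11-01 00:00:00", "2016-11-30 23:59:59"))
```

The second query compares the constant text 'when' with the bounds instead of reading the column, which is why the original SELECT never matched the intended rows.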

Django + PostgreSQL: Fill missing dates in a range

I have a table with one of the columns as date. It can have multiple entries for each date.
date .....
----------- -----
2015-07-20 ..
2015-07-20 ..
2015-07-23 ..
2015-07-24 ..
I would like to get data in the following form using Django ORM with PostgreSQL as database backend:
date count(date)
----------- -----------
2015-07-20 2
2015-07-21 0 (missing after aggregation)
2015-07-22 0 (missing after aggregation)
2015-07-23 1
2015-07-24 1
Corresponding PostgreSQL Query:
WITH RECURSIVE date_view(start_date, end_date)
AS ( VALUES ('2015-07-20'::date, '2015-07-24'::date)
UNION ALL SELECT start_date::date + 1, end_date
FROM date_view
WHERE start_date < end_date )
SELECT start_date, count(date)
FROM date_view LEFT JOIN my_table ON date=start_date
GROUP BY date, start_date
ORDER BY start_date ASC;
I'm having trouble translating this raw query to Django ORM query.
It would be great if someone can give a sample ORM query with/without a workaround for Common Table Expressions using PostgreSQL as database backend.
The simple reason is quoted here:
My preference is to do as much data processing in the database, short of really involved presentation stuff. I don't envy doing this in application code, just as long as it's one trip to the database
As per this answer, Django doesn't support CTEs natively, but the answer seems quite outdated.
References:
MySQL: Select All Dates In a Range Even If No Records Present
WITH Queries (Common Table Expressions)
Thanks
I do not think you can do this with pure Django ORM, and I am not even sure if this can be done neatly with extra(). The Django ORM is incredibly good in handling the usual stuff, but for more complex SQL statements and requirements, more so with DBMS specific implementations, it is just not quite there yet. You might have to go lower and down to executing raw SQL directly, or offload that requirement to be done by the application layer.
You can always generate the missing dates using Python, but that will be incredibly slow if the range and number of elements are huge. If this is being requested by AJAX for other use (e.g. charting), then you can offload that to Javascript.
from datetime import date, timedelta
from django.db.models.functions import Trunc
from django.db.models.expressions import Value
from django.db.models import Count, DateField

# A is model
start_date = date(2022, 5, 1)
end_date = date(2022, 5, 10)
result = A.objects\
    .annotate(date=Trunc('created', 'day', output_field=DateField())) \
    .filter(date__gte=start_date, date__lte=end_date) \
    .values('date')\
    .annotate(count=Count('id'))\
    .union(A.objects.extra(select={
        'date': 'unnest(Array[%s]::date[])' %
            ','.join(map(lambda d: "'%s'::date" % d.strftime('%Y-%m-%d'),
                set(start_date + timedelta(n) for n in range((end_date - start_date).days + 1)) -
                set(A.objects.annotate(date=Trunc('created', 'day', output_field=DateField())) \
                    .values_list('date', flat=True))))})\
        .annotate(count=Value(0))\
        .values('date', 'count'))\
    .order_by('date')
Instead of the recursive CTE you could use generate_series() to construct a calendar table:
SELECT calendar, count(mt.zdate) as THE_COUNT
FROM generate_series('2015-07-20'::date
, '2015-07-24'::date
, '1 day'::interval) calendar
LEFT JOIN my_table mt ON mt.zdate = calendar
GROUP BY 1
ORDER BY 1 ASC;
BTW: I renamed date to zdate. DATE is a bad name for a column (it is the name for a data type)
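One of the answers above mentions generating the missing dates in Python. A minimal sketch of that application-side approach follows. It assumes the dates have already been fetched (for example with values_list), and counts_per_day is a hypothetical helper name.

```python
from collections import Counter
from datetime import date, timedelta


def counts_per_day(dates, start, end):
    """Return (day, count) for every day in [start, end], zero-filling gaps."""
    seen = Counter(dates)
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [(day, seen.get(day, 0)) for day in days]


# The sample data from the question.
rows = [date(2015, 7, 20), date(2015, 7, 20), date(2015, 7, 23), date(2015, 7, 24)]

for day, count in counts_per_day(rows, date(2015, 7, 20), date(2015, 7, 24)):
    print(day, count)
```

This needs one trip to the database and a pass over the range, so it is reasonable for short ranges. For large ranges the generate_series() query keeps the work in PostgreSQL.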

Analyzing Twitter with Hive, regex extract

I am trying to find the most popular hashtags of July. So far I can select the tweets from July, or display the most popular tweets, but I have not succeeded in putting the two together. I am thinking about creating an intermediate table with the July tweets and then displaying the popular hashtags, but I don't know how. Can you help me? What about a two-level select (select a from select b from table)?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far I have this, which is pretty much what I want, but is there any other way to achieve it?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
EDIT:
OK, so if you want, you can also do it with a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you populate it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the request becomes as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
The pros and cons of the second method:
You need to refresh the table whenever you want accurate results, so it is not suited to a one-shot request; but if you need to run several requests against the current state of the data, this method is better.
Don't forget that copying the data is a costly operation! So know when to use it :)
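The filter-then-count logic of the nested query can also be sketched in plain Python, which makes it easy to check the regex part on its own. This is not Hive code. It assumes created_at looks like "Tue Jul 08 12:00:00 +0000 2014" (day of week first, then month), and the tweet dictionaries and function names are invented for the example. Under that assumption a pattern anchored on "Tue Jul" would keep only Tuesdays, so the sketch captures the month as its own group.

```python
import re
from collections import Counter

# Assumed layout: "<day-of-week> <month> <day> <time> <offset> <year>"
CREATED_AT = re.compile(r"^(\w{3}) (\w{3}) ")


def month_of(created_at):
    """Return the three-letter month of a created_at string, or None."""
    match = CREATED_AT.match(created_at)
    return match.group(2) if match else None


def top_hashtags(tweets, month, limit=15):
    """Count lower-cased hashtags of the tweets created in `month`."""
    counts = Counter(
        tag.lower()
        for tweet in tweets
        if month_of(tweet["created_at"]) == month
        for tag in tweet["hashtags"]
    )
    return counts.most_common(limit)


# Hypothetical, already-flattened tweets.
tweets = [
    {"created_at": "Tue Jul 08 12:00:00 +0000 2014", "hashtags": ["Hive", "SQL"]},
    {"created_at": "Wed Jul 09 08:30:00 +0000 2014", "hashtags": ["hive"]},
    {"created_at": "Fri Aug 01 10:00:00 +0000 2014", "hashtags": ["hive", "spark"]},
]

print(top_hashtags(tweets, "Jul"))
```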