DQL Perform a union in query - doctrine-orm

I know there are several posts related to this question, but I still can't get mine to work.
I have two tables (studios and models) and I want to perform a union for my datatable.
Currently, I have this:
$cnn = $this->Doctrine->getConnection();
$inscr = $cnn->fetchAll("(SELECT s.thedate AS mydate, s.name AS designation, \'Studio\' AS mytype
FROM studios s ORDER BY mydate LIMIT 2)
UNION
(SELECT m.thedate AS mydate, m.nickname AS designation, \'Hotesse\' AS mytype
FROM models m ORDER BY mydate LIMIT 2
)");
return $inscr;
But nothing appears in my datatable, and I can't run var_dump or any other debugging output.
When I test this query directly in my RDBMS, I get the expected results. Could anyone help?

I finally found a solution to run the query and get a result:
$sql = "(SELECT s.thedate AS mydate, s.name AS designation, 'Studio' AS mytype
FROM studios s ORDER BY mydate LIMIT 2)
UNION
(SELECT m.thedate AS mydate, m.nickname AS designation, 'Hotesse' AS mytype
FROM models m ORDER BY mydate LIMIT 2
)";
$stmt = $this->em->getConnection()->prepare($sql);
$stmt->execute();
return $stmt->fetchAll();
Unfortunately, the datatable requires the result to be an instance of Doctrine\ORM\QueryBuilder, and my solution returns an array.
If someone has an idea for my special case, I'm all ears!

Related

why use 'NA' = with the possibility of returning a group of values in SAS?

I have a quick question about the following piece of code. Why can we use 'NA' = with the subquery? The subquery might return a group of values rather than a single one, right? Could anyone explain the reason? Many thanks for your time and attention.
proc sql;
select lastname, firstname
from sasuser.staffmaster
where 'NA' =
(select jobcategory
from sasuser.supervisors
where staffmaster.empid = supervisors.empid);
quit;
Thanks again.
Assuming EMPID is a unique ID for an employee (I hope it is?), and each employee has only one supervisor, that query should resolve to a single row every time. (A single row for each row returned from the outer query, of course, which is important. Think of it like a join - that's basically what that is, a slightly oddly phrased join, which often will be turned into an actual join by the SQL parser.)
In general, however, sure, it could resolve to multiple rows. SAS will let you do the query, and if it returns just one row it works; if it returns 2+ rows, it fails. As Quentin pointed out in comments, this is a correlated subquery.
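The one-value-per-outer-row behaviour of a correlated subquery can be tried out with a small sketch. This one uses Python's built-in sqlite3 module rather than SAS, and the table contents are invented purely for illustration. Note one difference from the SAS behaviour described above: SQLite does not fail when a scalar subquery returns several rows, it simply uses the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staffmaster (empid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
    CREATE TABLE supervisors (empid INTEGER PRIMARY KEY, jobcategory TEXT);
    -- Hypothetical sample rows; employee 3 has no supervisors entry.
    INSERT INTO staffmaster VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob'), (3, 'Brown', 'Cy');
    INSERT INTO supervisors VALUES (1, 'NA'), (2, 'FA');
""")

# The subquery is re-evaluated for every staffmaster row and, because
# empid is unique in supervisors, yields at most one jobcategory each time.
rows = conn.execute("""
    SELECT lastname, firstname
    FROM staffmaster
    WHERE 'NA' = (SELECT jobcategory
                  FROM supervisors
                  WHERE staffmaster.empid = supervisors.empid)
""").fetchall()

print(rows)  # [('Smith', 'Ann')]
```

Employee 3 drops out because the subquery finds no row for it, so the comparison is against NULL and never true.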

Analyzing tweeter with hive, regex extract

I am trying to find the most popular hashtags of July. So far I can select tweets from July, or display the most popular tweets, but I haven't succeeded in putting the two together. I am thinking about creating an intermediate table with the July tweets and then displaying the popular hashtags from it, but I don't know how. Can you help me? What about a two-level select (select a from (select b from table))?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far I have this, which is pretty much what I want, but is there any other way to achieve it?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
EDIT:
OK, so if you want, you can also do it with a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you update it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the query becomes as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
The pros and cons of the second method:
You need to refresh the table whenever you want accurate results, so it is not suited to one-off queries. If you need to run multiple queries against the current state of the data, however, this method is better.
Don't forget that copying the data is a costly operation, so know when to use it :)
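The filter-explode-group-sort pipeline used in both variants can be sketched in plain Python to check the logic on a handful of rows. This is not Hive: the sample tweets are invented, and the created_at layout ("Tue Jul 02 10:00:00 +0000 2013") is an assumption inferred from the regular expression above. Unlike the "Tue Jul" pattern, which only keeps tweets posted on a Tuesday, this sketch accepts any weekday in July.

```python
import re
from collections import Counter

# Hypothetical sample rows: created_at plus the hashtag texts of each tweet.
tweets = [
    {"created_at": "Tue Jul 02 10:00:00 +0000 2013", "hashtags": ["Hive", "BigData"]},
    {"created_at": "Wed Jul 03 11:30:00 +0000 2013", "hashtags": ["hive"]},
    {"created_at": "Thu Aug 01 09:15:00 +0000 2013", "hashtags": ["Hive"]},
]

# Any three-letter weekday followed by the month token "Jul".
JULY = re.compile(r"^[A-Z][a-z]{2} Jul ")

def top_hashtags(rows, limit=15):
    counts = Counter()
    for row in rows:
        if JULY.match(row["created_at"]):   # inner SELECT: keep July tweets
            for tag in row["hashtags"]:     # LATERAL VIEW EXPLODE
                counts[tag.lower()] += 1    # GROUP BY LOWER(hashtags.text)
    return counts.most_common(limit)        # ORDER BY total_count DESC LIMIT n

result = top_hashtags(tweets)
print(result)  # [('hive', 2), ('bigdata', 1)]
```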

How to use subquery in django?

I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want except for the date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I don't want to sort in Python because I still have to paginate the query. There can be tens of thousands of rows in the database.
In fact, the results are currently sorted in Python, which causes very long page load times; that is why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with Django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
    user = models.OneToOneField(User)
class Purchase(models.Model):
    customer = models.ForeignKey(Customer)
    date = models.DateField(auto_now_add=True)
    item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.
This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
Another option would be to add a property to your customer model that looks something like this:
@property
def latest_purchase(self):
    return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!
Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase"."customer_id", max("shop_purchase"."date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it will give you only the customer id and the date. That is not enough to identify the matching records when you use these results in a subquery.
To use IN you need a list of unique parameters to identify a record, e.g., id
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase"."id")
FROM shop_purchase
GROUP BY "shop_purchase"."customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_purchase
WHERE "shop_purchase"."id" IN
(SELECT max("shop_purchase"."id")
FROM shop_purchase
GROUP BY "shop_purchase"."customer_id");
and using the Django ORM it looks like:
from django.db.models import Max
(Purchase.objects.filter(
    id__in=Purchase.objects
        .values('customer_id')
        .annotate(latest=Max('id'))
        .values_list('latest', flat=True)))
Hope it helps!
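The "maximum id per customer" idea can be checked outside Django with a small sketch using Python's built-in sqlite3 module. The table name mirrors the question, but the rows and their dates (including the year) are made up to match the Chair/Table/Monitor example, and the final ORDER BY is an addition that supplies the date ordering the question asks for. The sketch relies on the same assumption as the answer: ids grow in purchase order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop_purchase (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER,
        item TEXT,
        date TEXT);
    -- Hypothetical rows, inserted in chronological order so id follows date.
    INSERT INTO shop_purchase (customer_id, item, date) VALUES
        (1, 'Chair',    '2015-01-10'),
        (2, 'Speakers', '2015-01-15'),
        (1, 'Table',    '2015-02-05'),
        (3, 'Laptop',   '2015-03-20'),
        (3, 'Printer',  '2015-04-02'),
        (2, 'Monitor',  '2015-05-11');
""")

# Inner query: one id per customer (the latest). Outer query: sort by date.
latest = conn.execute("""
    SELECT item, date
    FROM shop_purchase
    WHERE id IN (SELECT MAX(id) FROM shop_purchase GROUP BY customer_id)
    ORDER BY date DESC
""").fetchall()

print(latest)
# [('Monitor', '2015-05-11'), ('Printer', '2015-04-02'), ('Table', '2015-02-05')]
```

In the ORM version, the equivalent ordering would presumably come from appending .order_by('-date') to the outer queryset, which keeps the result a regular queryset that can still be paginated.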
I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
The upside is that it gives me the query I want. The downside is that it is a raw query, so I can't append any other queryset filters.
This is my approach when I need a subset of related data (N items) along with the Django query. The example uses PostgreSQL and its handy json_build_object() function (Postgres 9.4+), but you can do the same with other aggregate functions in other database systems. For older PostgreSQL versions you can use a combination of the array_agg() and array_to_string() functions.
Imagine you have Article and Comment models, and for every article in the list you want to select the 3 most recent comments (change LIMIT 3 to adjust the size of the subset, or ORDER BY c.id DESC to change its sorting).
qs = Article.objects.all()
qs = qs.extra(select={
    'recent_comments': """
        SELECT
            json_build_object('comments',
                array_agg(
                    json_build_object('id', id, 'user_id', user_id, 'body', body)
                )
            )
        FROM (
            SELECT
                c.id,
                c.user_id,
                c.body
            FROM app_comment c
            WHERE c.article_id = app_article.id
            ORDER BY c.id DESC
            LIMIT 3
        ) sub
    """
})
for article in qs:
    print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....

Doctrine 2 edit DQL in entity

I have several database tables with 2 primary keys, id and date. I do not update the records but instead insert a new record with the updated information. This new record has the same id and the date field is NOW(). I will use a product table to explain my question.
I want to be able to request the product details at a specific date. I therefore use the following subquery in DQL, which works fine:
WHERE p.date = (
SELECT MAX(pp.date)
FROM Entity\Product pp
WHERE pp.id = p.id
AND pp.date < :date
)
This product table has some referenced tables, like category. This category table has the same id and date primary key combination. I want to be able to request the product details and the category details at a specific date. I therefore expanded the DQL as shown above to the following, which also works fine:
JOIN p.category c
WHERE p.date = (
SELECT MAX(pp.date)
FROM Entity\Product pp
WHERE pp.id = p.id
AND pp.date < :date
)
AND c.date = (
SELECT MAX(cc.date)
FROM Entity\ProductCategory cc
WHERE cc.id = c.id
AND cc.date < :date
)
However, as you can see, if I have multiple referenced tables I will have to copy the same piece of DQL. I want to somehow add these subqueries to the entities so that every time an entity is called it adds this subquery.
I have thought of adding this in a __construct($date) or some kind of setUp($date) method, but I'm kind of stuck here. Also, would it help to add @Id to Entity\Product::date?
I hope someone can help me. I do not expect a complete solution, one step in a good direction would be very much appreciated.
I think I've found my solution. The trick was (first, to update to Doctrine 2.2 and) using a filter:
namespace Filter;
use Doctrine\ORM\Mapping\ClassMetadata,
    Doctrine\ORM\Query\Filter\SQLFilter;
class VersionFilter extends SQLFilter {
    public function addFilterConstraint(ClassMetadata $targetEntity, $targetTableAlias) {
        $return = $targetTableAlias . '.date = (
            SELECT MAX(sub.date)
            FROM ' . $targetEntity->table['name'] . ' sub
            WHERE sub.id = ' . $targetTableAlias . '.id
            AND sub.date < ' . $this->getParameter('date') . '
        )';
        return $return;
    }
}
Add the filter to the configuration:
$configuration->addFilter("version", "Filter\VersionFilter");
And enable it in my repository:
$this->_em->getFilters()->enable("version")->setParameter('date', $date);
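The constraint that the filter appends (keep only the newest row per id that is older than a given date) can be tried in isolation. The following is a minimal sketch using Python's built-in sqlite3 module rather than Doctrine; the product table, its rows, and the products_as_of helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Composite primary key (id, date): every change inserts a new row.
    CREATE TABLE product (id INTEGER, date TEXT, name TEXT, PRIMARY KEY (id, date));
    INSERT INTO product VALUES
        (1, '2012-01-01', 'Widget v1'),
        (1, '2012-03-01', 'Widget v2'),
        (2, '2012-02-01', 'Gadget v1');
""")

def products_as_of(connection, date):
    """Return the version of each product that was current just before `date`."""
    return connection.execute("""
        SELECT p.id, p.name
        FROM product p
        WHERE p.date = (SELECT MAX(sub.date)
                        FROM product sub
                        WHERE sub.id = p.id AND sub.date < :date)
        ORDER BY p.id
    """, {"date": date}).fetchall()

feb = products_as_of(conn, "2012-02-15")
apr = products_as_of(conn, "2012-04-01")
print(feb)  # [(1, 'Widget v1'), (2, 'Gadget v1')]
print(apr)  # [(1, 'Widget v2'), (2, 'Gadget v1')]
```

The dates are stored as ISO 8601 text here so that string comparison matches chronological order; a real schema would more likely use a native date or timestamp column.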

Nested statements in sqlite

I'm using the sqlite3 library in C++ to query a database from a *.sqlite file. Can you write a query statement in sqlite3 like this:
char* sql = "select name from table where id = (select full_name from second_table where column = 4);"
The nested statement should return an id that completes the outer query.
Yes, you can. With =, only a single value from the nested query is compared, so make sure it doesn't return more than one row; adding LIMIT 1 to the end of the nested query makes that explicit. Also make sure that it always returns a row; otherwise the nested query evaluates to NULL and the main query matches nothing.
If you want to match several rows in the nested query, then you can use either IN, like so:
char* sql = "select name from table WHERE id IN (select full_name from second_table where column = 4);"
or you can use JOIN:
char* sql = "select name from table JOIN second_table ON table.id = second_table.full_name WHERE second_table.column = 4"
Note that the IN method can be very slow, and that JOIN can be very fast if you index the right columns.
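The three variants can be compared side by side in a quick sketch. It uses Python's built-in sqlite3 module instead of the C/C++ API, and the names people, links, person_id and grp are hypothetical stand-ins for table, second_table, full_name and column (TABLE and COLUMN are SQL keywords, so they make awkward identifiers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE links (person_id INTEGER, grp INTEGER);
    CREATE INDEX links_grp ON links (grp);  -- index the column being filtered on
    INSERT INTO people VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO links VALUES (1, 4), (3, 4), (2, 5);
""")

# Scalar subquery with "=": LIMIT 1 pins it to a single row.
scalar = conn.execute(
    "SELECT name FROM people WHERE id = "
    "(SELECT person_id FROM links WHERE grp = 4 ORDER BY person_id LIMIT 1)"
).fetchall()

# IN: matches every row returned by the nested query.
with_in = conn.execute(
    "SELECT name FROM people WHERE id IN "
    "(SELECT person_id FROM links WHERE grp = 4) ORDER BY name"
).fetchall()

# JOIN: same result set expressed as a join.
with_join = conn.execute(
    "SELECT people.name FROM people "
    "JOIN links ON people.id = links.person_id "
    "WHERE links.grp = 4 ORDER BY people.name"
).fetchall()

print(scalar)     # [('Ada',)]
print(with_in)    # [('Ada',), ('Cy',)]
print(with_join)  # [('Ada',), ('Cy',)]
```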
On a sidenote, you can use SQLiteadmin (http://sqliteadmin.orbmu2k.de/) to view the database and make queries directly in it (useful for testing etc).