I'm trying to use Doctrine 2 (LaravelDoctrine) instead of Eloquent.
Performing a simple query to get paginated data:
// Doctrine
$orders = $this->orderRepository->paginateAll(50);
// Eloquent
$orders = $this->orders->paginate(50);
I end up with Doctrine queries taking too much time (info from Laravel Debugbar):
Doctrine queries took 3.86s
1.79s
SELECT DISTINCT id_0 FROM (SELECT o0_.id AS id_0, o0_.customer_id AS customer_id_1, o0_.customer_ref AS customer_ref_2, ... FROM orders o0_) dctrn_result LIMIT 50 OFFSET 0
16.63ms
SELECT o0_.id AS id_0, o0_.customer_id AS customer_id_1, o0_.customer_ref AS customer_ref_2, ... FROM orders o0_ WHERE o0_.id IN (?)
2.05s
SELECT COUNT(*) AS dctrn_count FROM (SELECT DISTINCT id_0 FROM (SELECT o0_.id AS id_0, o0_.customer_id AS customer_id_1, o0_.customer_ref AS customer_ref_2, ... FROM orders o0_) dctrn_result) dctrn_table
Eloquent queries took 50.38ms
36.73ms
select count(*) as aggregate from `orders`
13.65ms
select * from `orders` limit 50 offset 0
I'm not pasting the whole Doctrine queries here, as the table has 65 columns, but all of them are aliased with 'AS', exactly like the columns shown above.
Is it intended Doctrine 2 behavior to be this slow, or am I missing something?
I used annotations as mapping metadata and Doctrine\ORM\EntityRepository for accessing the data.
Thank you in advance.
Best Regards.
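For reference, the cost difference is visible in the count queries alone: Doctrine's paginator wraps the full 65-column projection in a derived table before applying DISTINCT and COUNT, while Eloquent counts the base table directly. A minimal sketch of the two shapes, taken from the queries above (column list abbreviated):
-- Doctrine's count: DISTINCT and COUNT over a derived table that
-- first materializes every column of every row.
SELECT COUNT(*) AS dctrn_count FROM (
  SELECT DISTINCT id_0 FROM (
    SELECT o0_.id AS id_0 /* ...64 more aliased columns... */ FROM orders o0_
  ) dctrn_result
) dctrn_table;
-- Eloquent's count: the base table only.
SELECT COUNT(*) AS aggregate FROM orders;
If paginateAll wraps Doctrine's Doctrine\ORM\Tools\Pagination\Paginator (an assumption; LaravelDoctrine's pagination traits do use it), constructing the paginator with $fetchJoinCollection = false and calling setUseOutputWalkers(false) typically removes the wrapping subqueries for a simple single-table query like this one.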
Related
I'd like to make a query that shows all the datasets in a project, and the number of tables in each one. My problem is with the number of tables.
Here is where I'm stuck:
SELECT
  smt.catalog_name AS `Project`,
  smt.schema_name AS `DataSet`,
  ( SELECT COUNT(*)
    FROM ***DataSet***.INFORMATION_SCHEMA.TABLES
  ) AS `nbTable`,
  smt.creation_time,
  smt.location
FROM INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet
The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets of the project the query is executed in, and the view INFORMATION_SCHEMA.TABLES lists all the tables of a given dataset.
The thing is that the view INFORMATION_SCHEMA.TABLES needs to have the dataset specified to give the table information, like this: dataset.INFORMATION_SCHEMA.TABLES
So what I need is to replace the ***DataSet*** with the value obtained from the query itself (smt.schema_name).
I am not sure whether this can be done with a subquery, and I don't really know how to go about it.
I hope I'm clear enough; thanks in advance if you can help.
You can do this using BigQuery's procedural scripting, as follows:
CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);

FOR record IN
(
  SELECT
    catalog_name AS project_id,
    schema_name AS dataset_id
  FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
)
DO
  -- Table names cannot be parameterized, so the per-dataset statement
  -- is built as a string and run with EXECUTE IMMEDIATE.
  EXECUTE IMMEDIATE CONCAT(
    "INSERT table_counts (dataset_id, table_count) ",
    "SELECT table_schema AS dataset_id, COUNT(table_name) FROM ",
    record.dataset_id,
    ".INFORMATION_SCHEMA.TABLES GROUP BY dataset_id"
  );
END FOR;

SELECT * FROM table_counts;
This will return one row per dataset with its table count.
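If all of the project's datasets are in a single location, there is also a one-query alternative: the region-qualified INFORMATION_SCHEMA.TABLES view spans every dataset in the project. A sketch, assuming the US multi-region (substitute your own location for region-us):
-- One row per dataset with its table count, across the whole project.
SELECT
  table_schema AS dataset_id,
  COUNT(*) AS table_count
FROM `region-us.INFORMATION_SCHEMA.TABLES`
GROUP BY table_schema
ORDER BY dataset_id;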
I have two tables:
Table 1:
Product      LOB
BVPN         NS
SD-WAN       IS
QUICK START  NS
BVPN SMALL   OSBU
Table 2:
Product      LOB
BVPN         NS
SD-WAN       IS
QUICK START  NS
BVPN SMALL   NS
I want to create a custom column that changes the value "OSBU" in table1's LOB column to NS, based on the value in the LOB column of table2, and keeps the other values the same. I used the following code, but it's not giving me the desired output. Can anyone tell me what is wrong?
Column =
IF (
'table1'[LOB] = "OSBU",
RELATED ( 'table2'[LOB] ),
'table1'[GOLD_BILLING_PROFILE.Product/Service]
)
The RELATED function only works between tables that have an established relationship. You would have to create a relationship between table1 and table2 based on Product, and hopefully it is a one-to-one mapping. The following link gives the basic details on creating and managing a relationship:
https://learn.microsoft.com/en-us/power-bi/desktop-create-and-manage-relationships
Hope this helps.
Edit:
I don't know why you are using a different column for the FALSE condition. Ideally it should be something like:
Column =
IF (
'table1'[LOB] = "OSBU",
RELATED ( 'table2'[LOB] ),
'table1'[LOB]
)
From
https://cloud.google.com/bigquery/docs/partitioned-tables:
you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD
This enables me to do:
SELECT count(*) FROM `xxx.xxx.xxx_*`
and query across all the shards. Is there a special notation that queries only the latest shard? For example, say I had:
xxx_20180726
xxx_20180801
could I do something along the lines of
SELECT count(*) FROM `xxx.xxx.xxx_{{ latest }}`
to query xxx_20180801?
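There is no special "latest" notation built in, but the _TABLE_SUFFIX pseudo-column gives you the shard name at query time, so the newest shard can be selected with MAX(_TABLE_SUFFIX). A sketch (note the cost caveat in the answers below still applies: the wildcard scan touches every shard):
SELECT COUNT(*)
FROM `xxx.xxx.xxx_*`
WHERE _TABLE_SUFFIX = (
  SELECT MAX(_TABLE_SUFFIX)
  FROM `xxx.xxx.xxx_*`
)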
Edit: single query inspired by Mikhail Berlyant's answer below:
SELECT COUNT(*) AS c
FROM `XXX.PREFIX_*`
WHERE _TABLE_SUFFIX IN (
  SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
  FROM `XXX.__TABLES_SUMMARY__`
  WHERE table_id LIKE 'PREFIX_%'
)
If you do care about cost (meaning how many tables will be scanned by your query), the only way is to do it in two steps, as below.
First query
#standardSQL
SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
FROM `xxx.xxx.__TABLES_SUMMARY__`
WHERE table_id LIKE 'PREFIX_%'
Second query
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '<result of first query>'
So, if the result of the first query is 20180801, the second query will look like this:
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '20180801'
If you don't care about cost but just need the result, you can easily combine the above two queries into one (as in the single query shown in the question). But remember: even though the result comes only from the last table, the cost is as if you had queried all tables matching xxx.xxx.PREFIX_*.
Forgot to mention (even though it should be obvious): of course, when you have only COUNT(1) in your SELECT, the cost will be zero for both options; but in reality you will most likely select something more valuable than just COUNT(1).
I know this is kind of an old thread, but I was surprised that no one offered an answer using variables.
"Héctor Neri" already mentioned this in the comments, but I thought it might be better to have an actual answer with sample code posted.
#standardSQL
DECLARE SHARD_DATE STRING;
SET SHARD_DATE=(
SELECT MAX(REPLACE(table_name,'{TABLE}_',''))
FROM `{PRJ}.{DATASET}.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE '{TABLE}_20%'
);
SELECT * FROM `{PRJ}.{DATASET}.{TABLE}_*`
WHERE _TABLE_SUFFIX = SHARD_DATE
Make sure to replace {PRJ}, {DATASET}, and {TABLE} values with your table location.
If you run this on BigQuery Web UI, you will see this message:
WARNING: Could not compute bytes processed estimate for script.
But after running the script you can see that the variable properly reduces the table scan to the latest shard and does not cause any extra cost.
I have a table where one of the columns is a date. It can have multiple entries for each date.
date .....
----------- -----
2015-07-20 ..
2015-07-20 ..
2015-07-23 ..
2015-07-24 ..
I would like to get data in the following form using Django ORM with PostgreSQL as database backend:
date count(date)
----------- -----------
2015-07-20 2
2015-07-21 0 (missing after aggregation)
2015-07-22 0 (missing after aggregation)
2015-07-23 1
2015-07-24 1
Corresponding PostgreSQL Query:
WITH RECURSIVE date_view(start_date, end_date)
AS ( VALUES ('2015-07-20'::date, '2015-07-24'::date)
UNION ALL SELECT start_date::date + 1, end_date
FROM date_view
WHERE start_date < end_date )
SELECT start_date, count(date)
FROM date_view LEFT JOIN my_table ON date=start_date
GROUP BY date, start_date
ORDER BY start_date ASC;
I'm having trouble translating this raw query to Django ORM query.
It would be great if someone can give a sample ORM query with/without a workaround for Common Table Expressions using PostgreSQL as database backend.
The simple reason is quoted here:
My preference is to do as much data processing in the database, short of really involved presentation stuff. I don't envy doing this in application code, just as long as it's one trip to the database
As per this answer, Django doesn't support CTEs natively, but that answer seems quite outdated.
References:
MySQL: Select All Dates In a Range Even If No Records Present
WITH Queries (Common Table Expressions)
Thanks
I do not think you can do this with the pure Django ORM, and I am not even sure it can be done neatly with extra(). The Django ORM is incredibly good at handling the usual stuff, but for more complex SQL statements and requirements, especially DBMS-specific features, it is just not quite there yet. You might have to go lower and execute raw SQL directly, or offload the requirement to the application layer.
You can always generate the missing dates in Python, but that will be incredibly slow if the range and the number of elements are huge. If this is being requested via AJAX for some other use (e.g. charting), you can offload that work to JavaScript.
from datetime import date, timedelta

from django.db.models import Count, DateField
from django.db.models.expressions import Value
from django.db.models.functions import Trunc

# A is the model; 'created' is assumed to be its datetime field.
start_date = date(2022, 5, 1)
end_date = date(2022, 5, 10)

# Dates in the requested range for which no row exists at all.
missing_dates = (
    set(start_date + timedelta(n) for n in range((end_date - start_date).days + 1))
    - set(
        A.objects
        .annotate(date=Trunc('created', 'day', output_field=DateField()))
        .values_list('date', flat=True)
    )
)

result = (
    # Per-day counts for the dates that do have rows...
    A.objects
    .annotate(date=Trunc('created', 'day', output_field=DateField()))
    .filter(date__gte=start_date, date__lte=end_date)
    .values('date')
    .annotate(count=Count('id'))
    # ...unioned with zero-count rows generated in SQL for the missing dates.
    .union(
        A.objects.extra(select={
            'date': 'unnest(Array[%s]::date[])' % ','.join(
                "'%s'::date" % d.strftime('%Y-%m-%d') for d in missing_dates
            )
        })
        .annotate(count=Value(0))
        .values('date', 'count')
    )
    .order_by('date')
)
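For reference, the union above corresponds roughly to this SQL shape (the table name myapp_a, the created column, and the two example missing dates are illustrative assumptions):
-- Per-day counts for the dates that have rows...
SELECT date_trunc('day', created)::date AS date, COUNT(id) AS count
FROM myapp_a
WHERE date_trunc('day', created)::date BETWEEN '2022-05-01' AND '2022-05-10'
GROUP BY 1
UNION
-- ...plus zero-count rows for the dates that have none.
SELECT unnest(ARRAY['2022-05-02'::date, '2022-05-07'::date]) AS date, 0 AS count
ORDER BY date;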
Instead of the recursive CTE you could use generate_series() to construct a calendar table:
SELECT calendar, count(mt.zdate) as THE_COUNT
FROM generate_series('2015-07-20'::date
, '2015-07-24'::date
, '1 day'::interval) calendar
LEFT JOIN my_table mt ON mt.zdate = calendar
GROUP BY 1
ORDER BY 1 ASC;
BTW: I renamed date to zdate. DATE is a bad name for a column (it is the name of a data type).
Django is making a query much more complicated than it needs to be.
A Sentiment may have a User and a Card, and I am getting the Cards that are not in the passed User's Sentiments.
This is the query:
Card.objects.all().exclude(sentiments__in=user.sentiments.all())
This is what Django runs:
SELECT * FROM "cards_card"
WHERE NOT ("cards_card"."id" IN (
SELECT V1."card_id" AS "card_id"
FROM "sentiments_sentiment" V1
WHERE V1."id" IN (
SELECT U0."id"
FROM "sentiments_sentiment" U0
WHERE U0."user_id" = 1
)
)
)
This is a version I came up with that doesn't do an N-times full table scan:
# Parameterized query: raw() accepts a params list, which avoids
# string concatenation and SQL injection.
Card.objects.raw('''
    SELECT DISTINCT "id"
    FROM "cards_card"
    WHERE NOT "id" IN (
        SELECT "card_id"
        FROM "sentiments_sentiment"
        WHERE "user_id" = %s
    )
''', [user_id])
I don't know why Django has to do it with the N-times scan. I've been scouring the web for answers, but have found nothing so far. Any suggestions on how to keep the performance without falling back to raw SQL?
A better way of writing this query, without the nested subquery, would be:
Card.objects.all().exclude(sentiments__user__id=user.id)
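With the relationship excluded directly, Django should collapse the exclusion into a single subquery; the generated SQL should look roughly like the hand-written raw version above (a sketch, with user id 1 as in the question; the exact SQL depends on the Django version):
SELECT * FROM "cards_card"
WHERE NOT ("cards_card"."id" IN (
    SELECT "card_id"
    FROM "sentiments_sentiment"
    WHERE "user_id" = 1
))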