I have the MSSQL query below and cannot figure out the corresponding Korma entities. Please help out.
SELECT t.d AS did FROM (
    SELECT dataid AS d, COUNT(dataid) AS cd
    FROM <table_name>
    WHERE prid = <pid>
    GROUP BY dataid
) AS t WHERE t.cd > 1;
Thanks
The SQL Korma documentation site contains a subselect sample:
;; Subselects can be used as entities too!
(defentity subselect-example
  (table (subselect users
           (where {:active true}))
         :activeUsers))
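Building on that sample, here is an untested sketch of how the query above might map onto Korma (the entity name and pid are placeholders for <table_name> and <pid>; aggregate adds both the COUNT and the GROUP BY):
(use 'korma.core)

;; Placeholder entity for <table_name>.
(defentity data-table)

;; The derived table t:
;; SELECT dataid AS d, COUNT(dataid) AS cd ... WHERE prid = <pid> GROUP BY dataid
(defentity grouped-data
  (table (subselect data-table
           (fields [:dataid :d])
           (aggregate (count :dataid) :cd :dataid) ; COUNT(dataid) AS cd, GROUP BY dataid
           (where {:prid pid}))                    ; pid is a placeholder for <pid>
         :t))

;; Outer query: SELECT t.d AS did FROM (...) t WHERE t.cd > 1
(select grouped-data
  (fields [:d :did])
  (where {:cd [> 1]}))
Note that since the inner query only exists to feed the cd > 1 condition, the same result should also be reachable with a single grouped select using (having {:cd [> 1]}), avoiding the subselect entirely.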
I'd like to make a query that shows all the datasets in a project and the number of tables in each one. My problem is with the number of tables.
Here is what I'm stuck with:
SELECT
    smt.catalog_name AS `Project`,
    smt.schema_name AS `DataSet`,
    (SELECT COUNT(*)
     FROM ***DataSet***.INFORMATION_SCHEMA.TABLES) AS `nbTable`,
    smt.creation_time,
    smt.location
FROM INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet
The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets in the project the query is executed in, and the view INFORMATION_SCHEMA.TABLES lists all the tables in a given dataset.
The catch is that INFORMATION_SCHEMA.TABLES has to be qualified with a dataset to return table information: dataset.INFORMATION_SCHEMA.TABLES.
So what I need is to replace ***DataSet*** with the value I get from the query itself (smt.schema_name).
I am not sure whether this can be done with a subquery, and I don't really know how to go about it.
I hope I'm clear enough; thanks in advance if you can help.
You can do this using BigQuery's procedural language as follows:
CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);

FOR record IN (
  SELECT
    catalog_name AS project_id,
    schema_name AS dataset_id
  FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
)
DO
  EXECUTE IMMEDIATE CONCAT(
    "INSERT table_counts (dataset_id, table_count) ",
    "SELECT table_schema AS dataset_id, COUNT(table_name) ",
    "FROM ", record.dataset_id, ".INFORMATION_SCHEMA.TABLES ",
    "GROUP BY dataset_id");
END FOR;

SELECT * FROM table_counts;
This will return one row per dataset with its table count.
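Depending on your setup, a region-scoped INFORMATION_SCHEMA view may avoid the loop and the temp table entirely; a sketch, assuming all the datasets live in the US multi-region (swap region-us for your region):
SELECT table_schema AS dataset_id, COUNT(table_name) AS table_count
FROM `elzagales.region-us.INFORMATION_SCHEMA.TABLES`
GROUP BY table_schema
ORDER BY dataset_id;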
I have 2 databases in Athena, each with its own table: Contractinfo_2019 is one database and enrollmentinfo_2019 is another. I'm not sure how to join the two tables. I keep getting this error:
"SYNTAX_ERROR: line 11:10: Table awsdatacatalog.enrollmentinfo_2019.contractinfo2019 does not exist
This query ran against the "enrollmentinfo_2019" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 1bbc3941-4fa1-40a0-87c1-eb093784c990."
SELECT a.*,
b.*
FROM
(SELECT contract_id,
plan_id,
organization_type,
plan_type,
organization_name,
plan_name,
parent_organization
FROM contractinfo2019) AS a
LEFT JOIN
(SELECT contract_number,
plan_id,
state,
county,
enrollment
FROM enrollmentinfo2019) AS b
ON a.contract_id=b.contract_number
AND a.plan_id=b.plan_id
Can someone please guide me on how to join tables in Athena? I'm not sure what I am doing wrong here.
I would recommend rewriting the query using WITH, for example:
WITH a AS
(SELECT contract_id,
plan_id,
organization_type,
plan_type,
organization_name,
plan_name,
parent_organization
FROM Contractinfo_2019.contractinfo2019),
b as
(SELECT contract_number,
plan_id,
state,
county,
enrollment
FROM enrollmentinfo_2019.enrollmentinfo2019)
SELECT * FROM a
LEFT JOIN b ON a.contract_id=b.contract_number
AND a.plan_id=b.plan_id
You just need qualified table names.
Instead of:
FROM contractinfo2019
use this (assuming I got your database and table name right):
FROM contractinfo_2019.contractinfo2019
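Applied to the original query, the inner subselects are not even needed; a minimal sketch with just the qualified names (database names assumed from the error message):
SELECT a.*, b.*
FROM contractinfo_2019.contractinfo2019 a
LEFT JOIN enrollmentinfo_2019.enrollmentinfo2019 b
    ON a.contract_id = b.contract_number
    AND a.plan_id = b.plan_id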
I have several database tables with a two-column primary key: id and date. I do not update records; instead I insert a new record with the updated information. The new record has the same id, and its date field is NOW(). I will use a product table to explain my question.
I want to be able to request the product details at a specific date. I therefore use the following subquery in DQL, which works fine:
WHERE p.date = (
SELECT MAX(pp.date)
FROM Entity\Product pp
WHERE pp.id = p.id
AND pp.date < :date
)
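For context, that fragment sits inside a full DQL query along these lines (a sketch; the alias and SELECT clause are assumed):
SELECT p
FROM Entity\Product p
WHERE p.date = (
    SELECT MAX(pp.date)
    FROM Entity\Product pp
    WHERE pp.id = p.id
    AND pp.date < :date
)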
This product table has some referenced tables, like category. The category table has the same id and date primary-key combination. I want to be able to request the product details and the category details at a specific date, so I expanded the DQL above to the following, which also works fine:
JOIN p.category c
WHERE p.date = (
SELECT MAX(pp.date)
FROM Entity\Product pp
WHERE pp.id = p.id
AND pp.date < :date
)
AND c.date = (
SELECT MAX(cc.date)
FROM Entity\ProductCategory cc
WHERE cc.id = c.id
AND cc.date < :date
)
However, as you can see, with multiple referenced tables I would have to copy the same piece of DQL for each one. I would like to somehow attach these subqueries to the entities, so that the subquery is added every time an entity is queried.
I have thought of adding this in a __construct($date) or some kind of setUp($date) method, but I'm kind of stuck here. Also, would it help to add @Id to Entity\Product::date?
I hope someone can help me. I do not expect a complete solution; one step in a good direction would be very much appreciated.
I think I've found my solution. The trick was (after first updating to Doctrine 2.2) to use a filter:
namespace Filter;

use Doctrine\ORM\Mapping\ClassMetadata,
    Doctrine\ORM\Query\Filter\SQLFilter;

class VersionFilter extends SQLFilter
{
    public function addFilterConstraint(ClassMetadata $targetEntity, $targetTableAlias)
    {
        return $targetTableAlias . '.date = (
            SELECT MAX(sub.date)
            FROM ' . $targetEntity->table['name'] . ' sub
            WHERE sub.id = ' . $targetTableAlias . '.id
            AND sub.date < ' . $this->getParameter('date') . '
        )';
    }
}
Add the filter to the configuration:
$configuration->addFilter("version", "Filter\VersionFilter");
And enable it in my repository:
$this->_em->getFilters()->enable("version")->setParameter('date', $date);
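For completeness, a hedged sketch of a repository convenience method wrapping that call (the method name and date format are my assumptions; SQLFilter quotes the parameter value itself):
// Hypothetical repository method; assumes the filter is registered as "version".
public function findAllAtDate(\DateTime $date)
{
    $this->_em->getFilters()
        ->enable('version')
        ->setParameter('date', $date->format('Y-m-d H:i:s'));

    return $this->findAll();
}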
class Log(models.Model):
    project = models.ForeignKey(Project)
    msg = models.CharField(...)
    date = models.DateField(...)
I want to select the four most recent Log entries, where each Log entry must have a unique project foreign key. I've tried the solutions a Google search turns up, but none of them works, and the Django documentation isn't very good on lookups.
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But these either return nothing or Log entries from the same project.
Any help would be appreciated!
Queries don't work like that, either in Django's ORM or in the underlying SQL. If you want unique project IDs, you can only query for the IDs, so you'll need two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id', flat=True).distinct()[:4]
entries = Log.objects.filter(project_id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, MAX(logs.date) AS max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
    SELECT project_id, MAX(logs.date) AS max_date
    FROM logs
    GROUP BY project_id
    ORDER BY max_date DESC LIMIT 4) AS latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
    SELECT logs.field1, logs.field2, logs.field3, logs.date,
           rank() OVER (PARTITION BY project_id
                        ORDER BY "date" DESC) AS dateorder
    FROM logs) AS logsort
WHERE dateorder = 1
ORDER BY logsort.date DESC LIMIT 4;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
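For what it's worth, recent Django versions can express the window-function form in the ORM; a sketch, assuming Django 4.2+ (earlier versions cannot filter on a window annotation):
from django.db.models import F, Window
from django.db.models.functions import Rank

latest_logs = (
    Log.objects.annotate(
        dateorder=Window(
            expression=Rank(),
            partition_by=[F('project')],
            order_by=F('date').desc(),
        )
    )
    .filter(dateorder=1)    # latest entry per project (needs Django 4.2+)
    .order_by('-date')[:4]  # the four most recent of those
)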
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
from django.db.models import Max

latest_ids_per_project = Log.objects.values_list(
    'project').annotate(latest=Max('date')).order_by(
    '-latest').values_list('project')
log_objects = Log.objects.filter(
    id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC
Here's how I can do it when MySQL is the backend:
cursor.execute('show tables')
rows = cursor.fetchall()
for row in rows:
    cursor.execute('drop table %s;' % row[0])
But how can I do it when PostgreSQL is the backend?
cursor.execute("""SELECT table_name FROM information_schema.tables WHERE table_schema='public' AND table_type != 'VIEW' AND table_name NOT LIKE 'pg_ts_%%'""")
rows = cursor.fetchall()
for row in rows:
try:
cursor.execute('drop table %s cascade ' % row[0])
print "dropping %s" % row[0]
except:
print "couldn't drop %s" % row[0]
Courtesy of http://www.siafoo.net/snippet/85
You can use select * from pg_tables; to get a list of tables, although you probably want to filter it with where schemaname <> 'pg_catalog'...
Based on another one of your recent questions, if you're just trying to drop all your Django stuff but don't have permission to drop the DB, can you just DROP the SCHEMA that Django has everything in?
Also, use CASCADE on your drops.
EDIT: Can you select * from information_schema.tables; ?
EDIT: Your column should be row[2] instead of row[0], and you need to specify which schema to look at with a WHERE table_schema = 'my_django_schema_here' clause.
EDIT: Or just SELECT table_name from pg_tables where schemaname = 'my_django_schema_here'; and row[0]
The documentation says that ./manage.py sqlclear prints the DROP TABLE SQL statements for the given app name(s).
I use this script to clear the tables. I put it in a script called phoenixdb.sh, because it burns the DB down and a new one rises from the ashes. I use it to avoid accumulating lots of migrations during the early development phase of a project.
set -e
python manage.py dbshell <<EOF
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
EOF
python manage.py migrate
This wipes the tables from the DB without deleting the DB itself. Your Django user will need to own the schema, though, which you can set up with:
alter schema public owner to "django-db-user-name";
And you might want to change the owner of the DB as well:
alter database "django-db-name" owner to "django-db-user-name";
\dt is the equivalent command in Postgres to list tables. Each row contains values for (Schema, Name, Type, Owner), so you have to use the second value (row[1]).
Anyway, your solution will break (in MySQL and PostgreSQL) when foreign-key constraints are involved, and even without them you may get into trouble with the sequences. So in my opinion the best way is to simply drop the whole database and recreate it (which is also the more efficient solution).