Apache Superset tree chart doesn't display hierarchy correctly

I'm trying to present a hierarchy query in the tree chart in Apache Superset.
For some reason, it always displays it as a single dot or a straight line.
I originally tried to use it to present the structure of pgBackRest information for PostgreSQL backups. When that didn't work, I tried a simple hierarchy query for employees and managers, and that didn't work either.
If someone has worked with the tree chart, please assist.
My Apache Superset version is 1.3.2.
Below are the queries I've tried.
with recursive cte as (
select 1 as level, ds.name, ds.backup_label, ds.backup_prior from (
select data->'name' as name,
(jsonb_array_elements(data->'backup')->>'label')::text as backup_label,
(jsonb_array_elements(data->'backup')->>'prior')::text as backup_prior
from jsonb_array_elements(v2.pgbackrest_info()) as data
) as ds
where ds.backup_prior is null
union all
select c.level + 1 as level, ds2.name, ds2.backup_label, ds2.backup_prior from (
select data->'name' as name,
(jsonb_array_elements(data->'backup')->>'label')::text as backup_label,
(jsonb_array_elements(data->'backup')->>'prior')::text as backup_prior
from jsonb_array_elements(v2.pgbackrest_info()) as data
) as ds2 join cte c on c.backup_label = ds2.backup_prior)
select * from cte;
Employees queries
WITH RECURSIVE tree AS (
SELECT id, name, manager_id, 1 as depth FROM employees
WHERE id = 2
UNION
SELECT e.id, e.name, e.manager_id, t.depth + 1
FROM employees as e
JOIN tree t
ON t.id = e.manager_id
)
SELECT id, name, manager_id, depth FROM tree;

Just in case this is of help, you can go through this particular example and adapt it to your own data.
First, we need to create a chart. I've run this query on SQL Lab and created a chart from it:
select 'Terror' as genre, 'IT' as movie
union
select 'Terror' as genre, 'The Shining' as movie
union
select 'Action' as genre, 'Terminator 2' as movie
union
select 'Comedy' as genre, 'Hot Fuzz' as movie
union
select 'Comedy' as genre, 'Bad Santa' as movie
union
select 'Movies' as genre, 'Terror' as movie
union
select 'Movies' as genre, 'Comedy' as movie
union
select 'Movies' as genre, 'Action' as movie
union
select '' as genre, 'Movies' as movie
Then I configured the chart like this:
As you can see, I'm not using a column name since the ones I'm putting together are strings already, and I'm setting a root id value to the entry that should come as a root.
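To make the shape of that result set explicit: each row is a (parent, child) pair, and 'Movies' is the entry chosen as the root. The following is a minimal Python sketch of how such rows resolve into a hierarchy. It only illustrates the data layout and is not Superset's implementation; the helper names build_children and render are made up for this example.

```python
from collections import defaultdict

# (parent, child) pairs, mirroring the genre/movie rows of the query above.
ROWS = [
    ("Terror", "IT"),
    ("Terror", "The Shining"),
    ("Action", "Terminator 2"),
    ("Comedy", "Hot Fuzz"),
    ("Comedy", "Bad Santa"),
    ("Movies", "Terror"),
    ("Movies", "Comedy"),
    ("Movies", "Action"),
    ("", "Movies"),
]


def build_children(rows):
    """Map each parent to the sorted list of its children."""
    children = defaultdict(list)
    for parent, child in rows:
        children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def render(children, node, depth=0):
    """Return the subtree under `node` as indented lines."""
    lines = ["  " * depth + node]
    for kid in children.get(node, []):
        lines.extend(render(children, kid, depth + 1))
    return lines


CHILDREN = build_children(ROWS)
TREE_LINES = render(CHILDREN, "Movies")
print("\n".join(TREE_LINES))
```

Rows that do not chain back to a single root cannot be connected this way, so that may be worth checking in the original queries.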

Oracle Apex error ORA-01776: cannot modify more than one base table through a join view

I have an app in Oracle Apex 22.21. There are multiple tables (ORDERS, ORDER_ITEMS, STORES, and PRODUCTS).
ORDERS table
I have a Master Detail report that is editable. The main report shows the ORDERS table and the detail shows the ORDER_ITEMS table.
Report image
In the ORDERS table, there is a column STORE_ID which is a foreign key to the STORES table. The STORES table has a column STORE_NAME. I am able to edit the report (change the STORE_ID to another 'id' ex: 1,2,3) when the table's Source is set to the ORDERS table.
STORES table
STORES table data
I want the ORDERS report to include the STORE_NAME column from the STORES table, since it does not make sense for the user to enter a STORE_ID to edit a row. I want the user to be able to change the STORE_ID by entering the STORE_NAME or by choosing from an LOV. I changed the report Source Type to SQL Query and used the query below.
select
ORDERS_LOCAL.*,
STORES.STORE_NAME
from ORDERS_LOCAL
inner join STORES
ON ORDERS_LOCAL.STORE_ID=STORES.STORE_ID
However, when I try to edit a cell, I get the error ORA-01776: cannot modify more than one base table through a join view.
I've found a post/solution regarding this error and tried to follow the instructions. The first solution does not work in my case because I actually want the user to be able to edit the STORE_ID column by showing STORE_NAME.
I tried adapting and running the PL/SQL code exactly as instructed, but nothing is saved when I change a cell value and click save, and I don't receive any error.
BEGIN
CASE :apex$row_status
WHEN 'C'
THEN
INSERT INTO stores (store_id, store_name)
VALUES ( :p10_store_id, :p10_store_name);
INSERT INTO orders_local (order_id,
order_number,
order_date,
store_id,
full_name,
email,
city,
state,
zip_code,
credit_card,
order_items
)
VALUES ( :p10_order_id,
:p10_order_number,
:p10_order_date,
:p10_store_id,
:p10_full_name,
:p10_email,
:p10_city,
:p10_state,
:p10_zip_code,
:p10_credit_card,
:p10_order_items);
WHEN 'U'
THEN
UPDATE orders_local
SET order_id = :p10_order_id,
order_number = :p10_order_number,
order_date = :p10_order_date,
store_id = :p10_store_id,
full_name = :p10_full_name,
email = :p10_email,
city= :p10_city,
state= :p10_state,
zip_code= :p10_zip_code,
credit_card= :p10_credit_card,
order_items= :p10_order_items
WHERE order_id = :p10_order_id;
UPDATE stores
SET store_name = :p10_store_name
WHERE store_id = :p10_store_id;
WHEN 'D'
THEN
DELETE orders_local
WHERE order_id = :p10_order_id;
DELETE stores
WHERE store_id = :p10_store_id;
END CASE;
END;
Take a step back. The "report that is editable" is an interactive grid. If the report is display only, then you can use any SQL to display data. However, if it is editable then the SQL statement is used to update the rows as well. The statement
select
ORDERS_LOCAL.*,
STORES.STORE_NAME
from ORDERS_LOCAL
inner join STORES
ON ORDERS_LOCAL.STORE_ID=STORES.STORE_ID
cannot be used to update the store_id in the orders_local table. Currently you're trying to work around this with custom code for the update, but that overcomplicates things. So, take a step back and restart.
The query for the interactive grid should be
select
*
from ORDERS_LOCAL
Define a List of Values to display the select list for Stores. The query for that list of values is
select
store_id as return_value,
store_name as display_value
from stores
In the interactive grid, use this list of values for the store_id column.
That is all there is to it. This will allow you to use the native process for handling the IG updates.
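The data flow can be tried outside APEX. The sketch below uses Python's sqlite3 module with a cut-down, hypothetical version of the two tables (the store names and the order id are invented). The grid's source stays a single-table query, the LOV query supplies the return_value/display_value pairs, and an edit writes only store_id back to orders_local. It illustrates the idea and does not show what APEX does internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE orders_local (order_id INTEGER PRIMARY KEY, store_id INTEGER);
    INSERT INTO stores VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO orders_local VALUES (100, 1);
    """
)

# The list of values: what the user sees, mapped to what gets stored.
lov = {
    display: ret
    for ret, display in conn.execute(
        "SELECT store_id AS return_value, store_name AS display_value FROM stores"
    )
}

# The user picks a store by name; only the single base table is updated.
chosen_name = "Airport"
conn.execute(
    "UPDATE orders_local SET store_id = ? WHERE order_id = ?",
    (lov[chosen_name], 100),
)

new_store_id = conn.execute(
    "SELECT store_id FROM orders_local WHERE order_id = 100"
).fetchone()[0]
print(new_store_id)
```

Because the update touches one base table only, there is no join view involved and nothing for ORA-01776 to complain about.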

Query for listing Datasets and Number of tables in Bigquery

I'd like to write a query that shows all the datasets in a project and the number of tables in each one. My problem is with the number of tables.
Here is what I'm stuck with:
SELECT
smt.catalog_name as `Project`,
smt.schema_name as `DataSet`,
( SELECT
COUNT(*)
FROM ***DataSet***.INFORMATION_SCHEMA.TABLES
) as `nbTable`,
smt.creation_time,
smt.location
FROM
INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet
The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets in the project where the query is executed, and the view INFORMATION_SCHEMA.TABLES lists all the tables in a given dataset.
The catch is that INFORMATION_SCHEMA.TABLES needs the dataset specified in order to return the table information, like this: dataset.INFORMATION_SCHEMA.TABLES
So what I need is to replace ***DataSet*** with the value I get from the query itself (smt.schema_name).
I am not sure whether this can be done with a subquery, and I don't really know how to go about it.
I hope I'm clear enough, thanks in advance if you can help.
You can do this using some procedural language as follows:
CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);
FOR record IN
(
SELECT
catalog_name as project_id,
schema_name as dataset_id
FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
)
DO
EXECUTE IMMEDIATE
CONCAT("INSERT table_counts (dataset_id, table_count) SELECT table_schema as dataset_id, count(table_name) from ", record.dataset_id,".INFORMATION_SCHEMA.TABLES GROUP BY dataset_id");
END FOR;
SELECT * FROM table_counts;
This returns one row per dataset that contains tables, with its dataset_id and table_count.
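If a client library is an option, the same counts can be collected without dynamic SQL. The following is a rough sketch assuming the google-cloud-bigquery Python client and its Client.list_datasets() and Client.list_tables() methods. The function name count_tables_per_dataset is made up, and the client is passed in as an argument so the function itself needs no credentials.

```python
def count_tables_per_dataset(client):
    """Return {dataset_id: number_of_tables} for the client's project.

    `client` is expected to behave like google.cloud.bigquery.Client:
    list_datasets() yields items with a .dataset_id attribute, and
    list_tables(dataset) yields one item per table in that dataset.
    """
    counts = {}
    for dataset in client.list_datasets():
        tables = client.list_tables(dataset)
        # Consume the iterator and count its items.
        counts[dataset.dataset_id] = sum(1 for _ in tables)
    return counts
```

With the real library this would be called as count_tables_per_dataset(bigquery.Client()) after "from google.cloud import bigquery". Unlike the scripted query, datasets with no tables appear here with a count of 0.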

Using another table in BigQuery Regex

I would like to map a string column to a category based on a regular expression match.
Is it possible to use another bigquery table containing the regular expressions and corresponding category for this? This would make it easier for me to update only a table when adding new categories/updating the regex, instead of having to update all queries that would use this lookup.
Query:
CASE
-- Use the entries from another table here
WHEN REGEXP_MATCH(string_to_check, cat1regex) THEN cat1
WHEN REGEXP_MATCH(string_to_check, cat2regex) THEN cat2
etc.
END
Mapping table:
Regex category
pagex|pagey xy
pagez|page1 z1
It's also possible there is another simple way to do something similar that I'm not thinking of; answers pointing those out are welcome too.
Any help would be appreciated.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
string_to_check,
MAX(IF(REGEXP_CONTAINS(string_to_check, reg), category, NULL)) AS category
FROM yourTable
CROSS JOIN mappingTable
GROUP BY string_to_check
You can test / play with it using the dummy data below, taken from your question
#standardSQL
WITH `mappingTable` AS (
SELECT r'pagex|pagey' AS reg, 'xy' AS category UNION ALL
SELECT r'pagez|page1', 'z1'
),
`yourTable` AS (
SELECT string_to_check
FROM UNNEST(["pagex.com", "pagez#example.org", "page.example.net"]) AS string_to_check
)
SELECT
string_to_check,
MAX(IF(REGEXP_CONTAINS(string_to_check, reg), category, NULL)) AS category
FROM yourTable
CROSS JOIN mappingTable
GROUP BY string_to_check
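For readers who want to check the matching logic outside BigQuery, here is a small Python sketch of the same idea, using the dummy data above. It assumes Python's re module as a stand-in for BigQuery's RE2 engine (the two syntaxes differ in places, though not for these patterns), and the names MAPPING and categorize are invented for the illustration.

```python
import re

# Stand-in for the mapping table: (regex, category) rows.
MAPPING = [
    (r"pagex|pagey", "xy"),
    (r"pagez|page1", "z1"),
]


def categorize(value, mapping=MAPPING):
    """Return the greatest matching category, or None if nothing matches.

    re.search mirrors REGEXP_CONTAINS (a partial match anywhere in the
    string); max() mirrors the MAX(...) aggregate over the cross join.
    """
    matches = [cat for pattern, cat in mapping if re.search(pattern, value)]
    return max(matches) if matches else None


RESULT = {
    s: categorize(s)
    for s in ["pagex.com", "pagez#example.org", "page.example.net"]
}
print(RESULT)
```

As in the SQL, a string that matches several regexes gets the category that sorts last, and a string that matches none gets no category.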

Analyzing Twitter with Hive, regex extract

I am trying to find the most popular hashtags of July. So far I am able to select tweets from July, or display the most popular hashtags, but I haven't succeeded in putting the two together. I am thinking about creating an intermediate table with the July tweets and then displaying the popular hashtags, but I don't know how. Can you help me? What about a two-level select (select a from select b from table)?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far I have done this, which is pretty much what I want, but is there any other way to achieve it?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
EDIT:
Ok, so if you want you can also do it by a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you update it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the request becomes as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
The pros and cons of the second method:
You need to refresh the table whenever you want accurate results, so it is not suited to one-off requests. If you need to run multiple requests against the current state of the database, this method is better.
Don't forget that copying a database is a costly operation! So know when to use it :)
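The filter, explode, group and sort steps can also be followed in a few lines of plain Python. This is only a sketch on invented sample rows: it assumes created_at looks like "Tue Jul 02 10:15:00 +0000 2013", and it filters on the month token alone, which differs from the "Tue Jul" pattern used in the queries above. The name top_hashtags is hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; created_at is assumed to start with
# "<weekday> <month> <day> ...".
TWEETS = [
    {"created_at": "Tue Jul 02 10:15:00 +0000 2013", "hashtags": ["Hive", "BigData"]},
    {"created_at": "Wed Jul 03 11:00:00 +0000 2013", "hashtags": ["hive"]},
    {"created_at": "Thu Aug 01 09:30:00 +0000 2013", "hashtags": ["Hive"]},
]


def top_hashtags(tweets, month="Jul", limit=15):
    """Count lower-cased hashtags of tweets whose second token is `month`."""
    counts = Counter()
    for tweet in tweets:
        if tweet["created_at"].split()[1] != month:
            continue  # the WHERE step: keep only the requested month
        # the LATERAL VIEW EXPLODE + GROUP BY LOWER(...) step
        counts.update(tag.lower() for tag in tweet["hashtags"])
    # the ORDER BY total_count DESC + LIMIT step
    return counts.most_common(limit)


TOP = top_hashtags(TWEETS)
print(TOP)
```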

How to use subquery in django?

I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want except for the date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I don't want to sort in Python because I still have to paginate the query. There can be tens of thousands of rows in the database.
In fact, it is currently sorted in Python, which causes very long page load times; that's why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
    user = models.OneToOneField(User)

class Purchase(models.Model):
    customer = models.ForeignKey(Customer)
    date = models.DateField(auto_now_add=True)
    item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.
This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
Another option would be adding a property to your customer model that would look something like this:
@property
def latest_purchase(self):
    return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!
Whenever there is a difficult query to write using the Django ORM, I first try the query in psql (or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it gives you only the customer id and the date. That is not enough to identify the purchase records when you use these results in a subquery.
To use IN you need a list of unique values that identify a record, e.g., id.
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_purchase
WHERE "shop_purchase.id" IN
(SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id");
and using the Django ORM it looks like:
from django.db.models import Max
(Purchase.objects.filter(
    id__in=Purchase.objects
    .values('customer_id')
    .annotate(latest=Max('id'))
    .values_list('latest', flat=True)))
Hope it helps!
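The max-id-per-customer idea can be checked against the sample data from the question without a Django project. The sketch below uses Python's sqlite3 module; the concrete dates (and the year) are invented to match the months in the question, and it relies on the same assumption as above, namely that a later purchase always has a larger id. The outer ORDER BY date DESC is added to get the ordering the question asks for; in the ORM that would correspond to appending .order_by('-date') to the outer queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shop_purchase (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT, item TEXT, date TEXT
    );
    INSERT INTO shop_purchase (customer_id, item, date) VALUES
        ('A', 'Chair',    '2020-01-10'),
        ('A', 'Table',    '2020-02-10'),
        ('B', 'Speakers', '2020-01-15'),
        ('B', 'Monitor',  '2020-05-10'),
        ('C', 'Laptop',   '2020-03-10'),
        ('C', 'Printer',  '2020-04-10');
    """
)

# Latest purchase per customer (largest id), newest first.
LATEST = conn.execute(
    """
    SELECT item, date
    FROM shop_purchase
    WHERE id IN (SELECT max(id) FROM shop_purchase GROUP BY customer_id)
    ORDER BY date DESC
    """
).fetchall()
print(LATEST)
```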
I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
The upside is that it gives me the query I want. The downside is that it is a raw query, so I can't append any other queryset filters.
This is my approach when I need a subset of related data (N items) along with the Django query. This example uses PostgreSQL and the handy json_build_object() function (Postgres 9.4+), but you can use other aggregate functions in other database systems in the same way. For older PostgreSQL versions you can use a combination of the array_agg() and array_to_string() functions.
Imagine you have Article and Comment models, and along with every article in the list you want to select the 3 most recent comments (change LIMIT 3 to adjust the size of the subset, or ORDER BY c.id DESC to change its sorting).
qs = Article.objects.all()
qs = qs.extra(select = {
'recent_comments': """
SELECT
json_build_object('comments',
array_agg(
json_build_object('id', id, 'user_id', user_id, 'body', body)
)
)
FROM (
SELECT
c.id,
c.user_id,
c.body
FROM app_comment c
WHERE c.article_id = app_article.id
ORDER BY c.id DESC
LIMIT 3
) sub
"""
})
for article in qs:
    print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....