Does Superset support query/join multiple tables at a time? - apache-superset

It is said in the superset documentation that it is impossible to query/join multiple tables.
Can I query/join multiple tables at one time?
Not directly no. A Superset SQLAlchemy datasource can only be a single table or a view.
But from my experience I can do that without any problems. Is it outdated documentation, or something that I don't yet understand?

Yes, Superset's SQL Lab supports joining multiple tables in a single query:
select * from charles.m_jdbc_3 m left join druid.druid_supervisors d on m.id=d.id
where m.id={{id_value}} and m.value={{value_key}}

There is no simple way to do joins:
https://github.com/apache/incubator-superset/issues/875

Related

What is LATEST_ON syntax in QuestDB?

I'm using QuestDB and SQL for the first time, and I stumbled upon the LATEST ON syntax used in QuestDB. Can someone explain its usage and where to use it?
Quoted from the docs:
For scenarios where multiple time series are stored in the same table, it is relatively difficult to identify the latest items of these time series with standard SQL syntax. QuestDB introduces LATEST ON clause for a SELECT statement to remove boilerplate clutter and splice the table with relative ease.
For more information visit the official documentation
LATEST ON is to find the latest record for each unique time series in a table. See this page for some examples: https://questdb.io/docs/reference/sql/latest-on/
It gives you the latest available record for each combination of the PARTITION BY values, according to the ON timestamp
Maybe easier to understand with an example. If you go to https://demo.questdb.io you can execute this query
select * from trades latest on timestamp
partition by symbol, side
It will then show you the latest existing row for each combination of Symbol and Side. If you wanted to do this using standard SQL, you would probably have to use a window function, something like this
select * from
(select *
,ROW_NUMBER() over (partition by Symbol, Side
order by timestamp DESC) AS RowNumber
from trades where timestamp > '2022-10-01') t
where t.RowNumber = 1
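The window-function approach can be tried locally without QuestDB. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The sample rows, the `price` column, and the `latest_per_key` helper are made up for illustration. Note that ROW_NUMBER() numbering starts at 1, so the newest row in each partition is row 1.

```python
import sqlite3


def latest_per_key(rows):
    """Return the newest row for each (symbol, side) pair via ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (symbol TEXT, side TEXT, price REAL, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT symbol, side, price, timestamp
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY symbol, side
                       ORDER BY timestamp DESC
                   ) AS row_num
            FROM trades
        ) AS t
        WHERE t.row_num = 1      -- row 1 is the newest row in each partition
        ORDER BY symbol, side
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data; ISO-8601 strings sort chronologically as text.
SAMPLE_TRADES = [
    ("BTC-USD", "buy", 19000.0, "2022-10-01T10:00:00"),
    ("BTC-USD", "buy", 19100.0, "2022-10-01T11:00:00"),
    ("BTC-USD", "sell", 19050.0, "2022-10-01T10:30:00"),
    ("ETH-USD", "buy", 1300.0, "2022-10-01T09:00:00"),
    ("ETH-USD", "buy", 1310.0, "2022-10-01T12:00:00"),
]

if __name__ == "__main__":
    for row in latest_per_key(SAMPLE_TRADES):
        print(row)
```

With the sample data this yields one row per (symbol, side) combination, which is the same shape of result that LATEST ON ... PARTITION BY produces in QuestDB.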
Latest on retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple time series are stored in the same table.
Check this link for some examples: https://questdb.io/docs/reference/sql/latest-on/

In Redshift, how do you combine CTAS with the "if not exists" clause?

I'm having some trouble getting this table creation query to work, and I'm wondering if I'm running into a limitation in Redshift.
Here's what I want to do:
I have data that I need to move between schema, and I need to create the destination tables for the data on the fly, but only if they don't already exist.
Here are queries that I know work:
create table if not exists temp_table (id bigint);
This creates a table if it doesn't already exist, and it works just fine.
create table temp_2 as select * from temp_table where 1=2;
So that creates an empty table with the same structure as the previous one. That also works fine.
However, when I do this query:
create table if not exists temp_2 as select * from temp_table where 1=2;
Redshift chokes and says there is an error near as (for the record, I did try removing "as" and then it says there is an error near select)
I couldn't find anything in the redshift docs, and at this point I'm just guessing as to how to fix this. Is this something I just can't do in redshift?
I should mention that I absolutely can separate out the queries that selectively create the table and populate it with data, and I probably will end up doing that. I was mostly just curious if anyone could tell me what's wrong with that query.
EDIT:
I do not believe this is a duplicate. The post linked to offers a number of solutions that rely on user-defined functions, and Redshift doesn't support UDFs. They did recently implement a Python-based UDF system, but my understanding is that it's in beta, and we don't know how to implement it anyway.
Thanks for looking, though.
I couldn't find anything in the redshift docs, and at this point I'm
just guessing as to how to fix this. Is this something I just can't do
in redshift?
Indeed, combining CREATE TABLE ... AS SELECT with IF NOT EXISTS is not possible in Redshift (per the documentation). In PostgreSQL, it has been possible since version 9.5.
On SO, this is discussed here: PostgreSQL: Create table if not exists AS . The accepted answer provides options that don't require any UDF or procedural code, so they're likely to work with Redshift too.
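One UDF-free option is to do the existence check on the client side and only then issue the plain CTAS. The sketch below shows the idea with Python's built-in sqlite3 module as a local stand-in; the helper names are hypothetical, and the `sqlite_master` lookup is SQLite-specific, so against Redshift the check would have to query that database's own catalog instead (an assumption not tested here). The check and the create are two separate statements, so this is not safe against concurrent writers.

```python
import sqlite3


def table_exists(conn, name):
    """SQLite-specific existence check; other databases need their own catalog query."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def create_table_as_if_missing(conn, name, select_sql):
    """Run CREATE TABLE <name> AS <select_sql> only if <name> is absent.

    Returns True when the table was created, False when it already existed.
    """
    if table_exists(conn, name):
        return False
    # Identifiers cannot be bound as parameters, so `name` and `select_sql`
    # must come from trusted code, never from user input.
    conn.execute(f'CREATE TABLE "{name}" AS {select_sql}')
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IF NOT EXISTS temp_table (id bigint)")
    empty_copy = "SELECT * FROM temp_table WHERE 1=2"
    print(create_table_as_if_missing(conn, "temp_2", empty_copy))  # first call creates it
    print(create_table_as_if_missing(conn, "temp_2", empty_copy))  # second call is a no-op
```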

Write to multiple tables in HBASE

I need to write to two HBase tables, say table1 and table2. Whenever a write happens on table1, I need to perform some operation on table2, such as incrementing a counter (like a trigger). For this I need to write to both tables from the same task of a MapReduce program. I have heard that this can be done using MultiTableOutputFormat, but I could not find a good example that explains it in detail. Is this possible, and if so, how should I do it? Thanks in advance.
Please provide me an answer that should not include co-processors.
To write to more than one table in a MapReduce job, you have to specify that in the job configuration. You are right that this can be done using MultiTableOutputFormat.
Normally, for a single table, you would use:
TableMapReduceUtil.initTableReducerJob("tableName", MyReducer.class, job);
Instead of this write:
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(2);
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.addDependencyJars(job.getConfiguration());
Then, when writing data, pass the name of the target table as the key:
context.write(new ImmutableBytesWritable(Bytes.toBytes("tableName1")),put1);
context.write(new ImmutableBytesWritable(Bytes.toBytes("tableName2")),put2);
For this you can use an HBase observer. You have to create an observer and deploy it on your server (applicable only for HBase version >0.92); it will then automatically trigger the write to the other table.
I think HBase observers are conceptually similar to aspects.
For more details -
https://blogs.apache.org/hbase/entry/coprocessor_introduction

Django - Raw SQL Queries - What Happens in Joins

I'm reading that I can use raw SQL in Django and have Django actually build my models from the results.
However I'm wondering what happens if I use joins in the raw SQL. How will Django know what models to use?
(Are there any other issues I should be aware of?)
It's not the joins that matter, but the column names. You could, for example, do the following:
SELECT table.id, other_table.name AS name from table join other_table using (id)
and pass that into your table model. Django would then treat the names from other_table as though they were names from table and give your normal table instances. I can't imagine why you would want to do that though...
The important thing to remember is that Django uses a very simple mapping from your SQL to its model structure. You can subvert it if you want, but you'll probably end up with some hard-to-maintain code.
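To make the "column names, not joins" point concrete, here is a toy stand-in that does not use Django at all. The `MainTable` dataclass and the `raw` helper are hypothetical names invented for this sketch; the helper builds instances purely from the column names the query returns, which is the behaviour described above, and is not Django's actual implementation.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MainTable:
    """Toy stand-in for a model with `id` and `name` fields."""
    id: int
    name: str


def raw(conn, model, sql):
    """Build `model` instances by matching result column names to field names."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [model(**dict(zip(columns, row))) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE other_table (id INTEGER, name TEXT)")
conn.execute("INSERT INTO main_table VALUES (1, 'from main_table')")
conn.execute("INSERT INTO other_table VALUES (1, 'from other_table')")

# The join is invisible to the mapping: only the output column names matter.
instances = raw(
    conn,
    MainTable,
    "SELECT main_table.id AS id, other_table.name AS name "
    "FROM main_table JOIN other_table ON main_table.id = other_table.id",
)

if __name__ == "__main__":
    print(instances)
```

The resulting `MainTable` instance carries the `name` value from `other_table`, because nothing in the mapping knows which table a column came from.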

borland builder c++ oracle question

I have a Borland C++Builder 6 application that calls an Oracle 10g database over a LAN. When the application makes a simple database select, e.g.
select table_name from element_tablenames where element_id = 10023842
the following is recorded as happening in Oracle (from the performance logs)
select table_name
from element_tablenames
where element_id = 10023842
then immediately (and not from C++ source code but perhaps deeper)
select table_name, element_tablenames.ROWID
from element_tablenames
where element_id = 10023842
The select statement is only called once in the TADODbQuery object, yet two queries are being performed: one to parse, and a second that adds the ROWID for execution.
Over a WAN and many, many queries this is obviously a problem to the user.
Does anyone know why this might be happening, can someone suggest a solution?
Agree with Robert.
The ROWID uniquely identifies a row in a table so that the returned record can be applied back to the database with any changes (or as a DELETE).
Is there a way to identify a particular column (or set of columns) as a primary key so that it can be used to identify a row without using a ROWID?
I don't know exactly where the RowID is coming from, it could be either the TAdoQuery implementation or the Oracle Driver. But I am sure I found the reason.
From the Oracle docs:
If the database table does not contain a primary key, the ROWID must be selected explicitly when populating DataTable.
So I suspect your Table does not have a primary key, either add one or add the rowid.
Either way this will solve the duplicate query problem.
Since you are concerned about performance, some general advice:
Using TAdoQuery you can set the CursorType to optimize different behaviors for performance. This article covers this from a TAdoQuery perspective. MSDN also has an article that covers it from a general ADO perspective. Finally, the specifications from the Oracle driver can be useful.
I would recommend setting the cursor to one of the following, as they are the only types supported by Oracle:
ctStatic - Bi-directional query produced.
ctOpenForwardOnly - Unidirectional query produced, fastest but can't call Prior
You can also play with CursorLocation to see how it affects your speed.