How to skip N rows in QuestDB SQL? - questdb

I'm creating paged output from data in a QuestDB table and want to take 100 rows after skipping X pages of 100 rows. In Postgres it would be something like
select * from tbl
OFFSET 200
LIMIT 100
I see LIMIT but cannot find OFFSET equivalent in QuestDB SQL, is it supported?

Thanks to #basha04 the equivalent query is
select * from tbl
LIMIT 200,300
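The two numbers are the lower and upper row bounds rather than an offset and a count, so the bounds for any page follow from the page index and page size. Below is a minimal Python sketch of that arithmetic. The helper names `page_limit_clause` and `page_query` are made up for illustration, and the page index is assumed to be zero-based.

```python
def page_limit_clause(page_index: int, page_size: int) -> str:
    """Build a QuestDB-style `LIMIT lo,hi` clause for a zero-based page index."""
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size must be > 0")
    lo = page_index * page_size  # number of rows to skip
    hi = lo + page_size          # upper bound, not a row count
    return f"LIMIT {lo},{hi}"


def page_query(table: str, page_index: int, page_size: int) -> str:
    # Hypothetical helper: the table name is interpolated as-is, which is
    # acceptable for a sketch but should not be done with untrusted input.
    return f"select * from {table} {page_limit_clause(page_index, page_size)}"


if __name__ == "__main__":
    # Third page of 100 rows: skip 200 rows, return up to row 300.
    print(page_query("tbl", 2, 100))
```

For page index 2 and page size 100 this produces the same `LIMIT 200,300` clause as the query above.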

Related

Retrieving the row with the greatest timestamp in questDB

I'm currently running QuestDB 6.1.2 on Linux. How do I get the row with the maximum timestamp from a table? I have tried the following on a test table with around 5 million rows:
select * from table where cast(timestamp as symbol) in (select cast(max(timestamp) as symbol) from table );
select * from table inner join (select max(timestamp) mm from table ) on timestamp >= mm
select * from table where timestamp = max(timestamp)
select * from table where timestamp = (select max(timestamp) from table )
Query 1 is correct but runs in ~5s; query 2 is correct and runs in ~500ms but looks unnecessarily verbose; query 3 compiles but returns an empty table; and query 4 is rejected as incorrect syntax, although that is how SQL usually does it.
select * from table limit -1 works. QuestDB returns rows sorted by timestamp as default, and limit -1 takes the last row, which happens to be the row with the greatest timestamp. To be explicit about ordering by timestamp, select * from table order by timestamp limit -1 could be used instead. This query runs in around 300-400ms on the same table.
As a side note, the fourth query, using timestamp = (select max(timestamp) from table), doesn't work yet since QuestDB does not support subqueries in where yet (QuestDB 6.1.2).

BigQuery - Join take few hours in 3TB table

I have 2 tables in BQ.
Table ID: blog
Table size: 3.07 TB
Table ID: llog
Table size: 259.82 GB
I'm running the query below and it ran for a few hours (it never finished; I killed it, so I was not able to capture the query plan).
select bl.row.iddi as bl_iddi, count(lg.row.id) as count_location_ping
from
`poc_dataset.blog` bl LEFT OUTER JOIN `poc_dataset.llog` lg
ON
bl.row.iddi = lg.row.iddi
where bl.row.iddi="124623832432"
group by bl.row.iddi
I'm not sure how to optimize this. The blog table has trillions of rows.
Unless some details are missing from your question, the query below should give you the expected result:
#standardSQL
SELECT row.iddi AS iddi, COUNT(row.id) AS count_location_ping
FROM `poc_dataset.llog`
WHERE row.iddi= '124623832432'
GROUP BY row.iddi

How do I select rows from an Sqlite table excluding ones from a previous query?

I have an Sqlite table having > 25 million rows. I'd selected 1 million rows randomly from this table using the following code:
# using sqlite3 code
# A triple-quoted string lets the SQL span several lines
c = cursor.execute("""SELECT *
FROM reviews_table WHERE ROWID IN (SELECT ROWID FROM reviews_table ORDER BY RANDOM() LIMIT 1000000)""")
Now, I wish to select another 1 million rows from the table, excluding those rows in the previous query. How would I go about doing this?
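One possible approach, as a sketch rather than a tested solution for a 25-million-row table: store the ROWIDs of the first sample in a temporary table, then draw the second sample only from rows whose ROWID is not in it. The function name `sample_two_disjoint_batches` and the temp table `first_sample` are made up for illustration, and the code assumes the table is an ordinary ROWID table.

```python
import sqlite3


def sample_two_disjoint_batches(conn, table, batch_size):
    """Return two random, non-overlapping batches of rows from `table`."""
    cur = conn.cursor()
    # Remember which ROWIDs the first sample used (hypothetical temp table).
    cur.execute("DROP TABLE IF EXISTS temp.first_sample")
    cur.execute("CREATE TEMP TABLE first_sample (rid INTEGER PRIMARY KEY)")
    cur.execute(
        f"INSERT INTO first_sample (rid) "
        f"SELECT ROWID FROM {table} ORDER BY RANDOM() LIMIT ?",
        (batch_size,),
    )
    first = cur.execute(
        f"SELECT * FROM {table} WHERE ROWID IN (SELECT rid FROM first_sample)"
    ).fetchall()
    # Second sample: random rows whose ROWID was not picked the first time.
    second = cur.execute(
        f"SELECT * FROM {table} WHERE ROWID IN ("
        f" SELECT ROWID FROM {table}"
        f" WHERE ROWID NOT IN (SELECT rid FROM first_sample)"
        f" ORDER BY RANDOM() LIMIT ?)",
        (batch_size,),
    ).fetchall()
    return first, second
```

It would be called as `sample_two_disjoint_batches(conn, "reviews_table", 1000000)`. The table name is interpolated directly into the SQL, so it must come from trusted code, not user input.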

how to perform 'SELECT TOP X FROM TABLE' type queries with DB2 / dashDB

I would like to perform the equivalent of SELECT TOP 1 ... query in db2 / dashDB:
SELECT TOP 1 * FROM customers
How can I achieve this?
You can achieve this using the FETCH FIRST x ROWS ONLY clause, e.g.
SELECT * FROM customers FETCH FIRST 1 ROWS ONLY
Another way on dashDB, and an easier one in my opinion, is to use LIMIT n, e.g.
SELECT * FROM customers LIMIT 1

Using TABLESAMPLE with PERCENT returns all the records from table

I have a small test table with two fields, id and name, and 19 records in total. When I try to get 10 percent of the records from this table using the following query, I get ALL the records. I tried this on a large table, but the result is the same: all records are returned. The query:
select * from test tablesample (10 percent) s;
If I use ROWS instead of PERCENT (i.e. select * from test tablesample (10 rows) s;), it works fine and only 10 records are returned. How can I get just the necessary percentage of records?
You can refer to the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
You must be using CombinedHiveOutputFormat, which does not go well with ORC format. Hence you will never be able to save the output from Percent query to a table.
To my knowledge, the best way to do this is to use the rand() function. You should not combine it with an order by clause, though, as that will hurt performance. Here is my sample query, which is time-efficient:
SELECT * FROM table_name
WHERE rand() <= 0.0001
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 5000;
I tested this on a 900M-row table and the query executed in 2 minutes.
Hope this helps.
You can use PERCENT with TABLESAMPLE. For example:
SELECT * FROM TABLE_NAME
TABLESAMPLE(1 PERCENT) T;
This will select 1% of the data size of inputs and not necessarily the number of rows. More details can be found here.
But if you are really looking for a way to select a percentage of the number of rows, then you may have to use a LIMIT clause with the number of records you need to retrieve.
For example, if your table has 1000 records, then you can select a random 10% of the records as:
select * from table_name order by rand() limit 100;