Does QuestDB have EXPLAIN?

Postgres has syntax to show the query plan for a SQL statement:
https://www.postgresql.org/docs/9.4/using-explain.html
Does QuestDB have something equivalent so that I can see the query plan that would result from my SQL before actually executing it?

The QuestDB docs show an explain option for the /exec HTTP endpoint. However, I can't get it to work on my instance running v6.2.
It should work like this:
curl -G \
--data-urlencode "query=select * from t limit 2;" \
--data-urlencode "explain=true" \
http://localhost:9000/exec
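
For reference, the same request can be sent programmatically; a minimal sketch with Python's requests library (assuming a local instance on port 9000, and that your QuestDB version honours the explain parameter):

# Sketch: same /exec call as the curl above, from Python.
# Assumes QuestDB is listening on localhost:9000.
import requests

resp = requests.get(
    "http://localhost:9000/exec",
    params={"query": "select * from t limit 2;", "explain": "true"},
)
resp.raise_for_status()
print(resp.json())  # inspect the response for any plan/timing details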

Related

Executing an SQL file on Redshift via the CLI

We have an SQL file that we would like to run on our Redshift cluster. We're already aware that this is possible via psql, as described in this Stack Overflow answer and this Stack Overflow answer. However, we were wondering whether this is possible using the Redshift Data API?
We looked through the documentation but were unable to find anything apart from batch-execute-statement, which takes a space-delimited list of SQL statements. We're happy to resort to this but would prefer a method of running a file directly against the cluster.
We'd also like to parameterise the file; can this be done?
Our Current Attempt
This is what we've tried so far:
PARAMETERS="[\
{\"name\": \"param1\", \"value\": \"${PARAM1}\"}, \
{\"name\": \"param2\", \"value\": \"${PARAM2}\"}, \
{\"name\": \"param3\", \"value\": \"${PARAM3}\"}, \
{\"name\": \"param4\", \"value\": \"${PARAM4}\"}, \
{\"name\": \"param5\", \"value\": \"${PARAM5}\"}, \
{\"name\": \"param6\", \"value\": \"${PARAM6}\"}\
]"
# Collapse the file to a single line so it can be passed as one --sql string
SCRIPT_SQL=$(tr -d '\n' <./sql/script.sql)
# Submit the statement via the Data API using Secrets Manager credentials
AWS_RESPONSE=$(aws redshift-data execute-statement \
--region $AWS_REGION \
--cluster-identifier $CLUSTER_IDENTIFIER \
--sql "$SCRIPT_SQL" \
--parameters "$PARAMETERS" \
--database public \
--secret-arn $CREDENTIALS_ARN)
Where all undeclared variables are set earlier in the script.
I am a bit confused. The Redshift Data API is a REST API: you send it a request and it executes the query against your cluster (or serverless workgroup). Typical usage might be a Lambda function that connects to your Redshift environment and executes queries from there. You can load your file into the Lambda, decompose it, and send the commands one by one if you like. And of course, you can parametrise anything within that Lambda. But for the Redshift Data API to work, it needs to be a request/response type of operation.
And please note it is an asynchronous API.
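
As a sketch of the decompose-and-send approach (the file path, cluster name, secret ARN, and the naive split on ";" are all assumptions; splitting like this will break on semicolons inside string literals or procedure bodies):

# Sketch: run each statement of a SQL file through the Redshift Data API,
# referencing named parameters as :param1 etc. inside the SQL.
# Cluster, database and secret ARN values below are placeholders.
import time
import boto3

client = boto3.client("redshift-data", region_name="us-east-2")

with open("sql/script.sql") as f:
    # naive split on ';' -- fine for simple files
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

parameters = [{"name": "param1", "value": "some value"}]

for sql in statements:
    kwargs = dict(
        ClusterIdentifier="my-cluster",
        Database="dev",
        SecretArn="arn:aws:secretsmanager:us-east-2:123456789012:secret:my-redshift-creds",
        Sql=sql,
    )
    # The Data API may reject parameters a statement does not reference,
    # so only attach the ones whose placeholder appears in this statement.
    used = [p for p in parameters if ":" + p["name"] in sql]
    if used:
        kwargs["Parameters"] = used
    resp = client.execute_statement(**kwargs)
    # The API is asynchronous: poll until the statement finishes before sending the next one.
    while True:
        desc = client.describe_statement(Id=resp["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))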

How do I delete tables in the QuestDB console?

I imported a lot of CSV files into my database for testing and I'd like to clear out a few of the tables I don't need. How can I get rid of multiple tables at once? Is there an easy way, like selecting many tables at once in the web console's table list?
The easiest way I found was to use the REST API /exec endpoint: https://questdb.io/docs/develop/insert-data/#exec-endpoint
I generated a bash script using the output of the "select name from tables()" meta function.
Example lines:
curl -G --data-urlencode "query=DROP TABLE 'delete_me.csv'" http://localhost:9000/exec
curl -G --data-urlencode "query=DROP TABLE 'delete_me_also.csv'" http://localhost:9000/exec
If you use the web console (or even /exec to query), select name from tables() can be filtered with a regex just like a regular query.
Converting that to a bash script is manual, though. I recommend just dumping the table names to CSV, then using bash to add the appropriate quotes, etc.
I did it with awk:
awk -F, '{ printf "curl -G --data-urlencode \"query=DROP TABLE '\''%s'\''\" http://localhost:9000/exec\n", $0 }' ~/Downloads/quest_db_drop_tables.sql > ~/Downloads/quest_db_drop_tables.sh
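
If you would rather skip the CSV-and-awk step entirely, here is a small sketch that pulls the table names from /exec and drops the ones matching a pattern (the .csv suffix filter and the localhost:9000 address are assumptions, adjust to taste):

# Sketch: list tables via /exec and drop the ones matching a pattern.
# Assumes QuestDB on localhost:9000; double-check the filter before running.
import requests

ENDPOINT = "http://localhost:9000/exec"

def run(sql):
    resp = requests.get(ENDPOINT, params={"query": sql})
    resp.raise_for_status()
    return resp.json()

# /exec returns rows in a "dataset" array; column 0 here is the table name
tables = [row[0] for row in run("select name from tables()").get("dataset", [])]

for name in tables:
    if name.endswith(".csv"):  # pattern for the throwaway CSV imports
        print("dropping", name)
        run("DROP TABLE '" + name + "'")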

How can I programmatically download data from QuestDB?

Is there a way to download query results from the database such as tables or other datasets? The UI supports a CSV file download, but this is manual work to browse and download files at the moment. Is there a way I can automate this? Thanks
You can use the export REST API endpoint; this is what the UI uses under the hood. To export a table via this endpoint:
curl -G --data-urlencode "query=select * from my_table" http://localhost:9000/exp
query= may be any SQL query, so if you have a more granular report that needs to be generated regularly, the query may be passed into the request. If you don't need anything complicated, you can redirect the curl output to a file:
curl -G --data-urlencode "query=select * from my_table" \
http://localhost:9000/exp > myfile.csv
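
To automate this end to end, a minimal sketch that streams the /exp response straight to a file from Python (the table name and output path are assumptions; any SQL query can go in their place):

# Sketch: download a query result as CSV via /exp and write it to disk.
# Assumes QuestDB on localhost:9000; swap in whatever query your report needs.
import requests

query = "select * from my_table"
with requests.get("http://localhost:9000/exp", params={"query": query}, stream=True) as resp:
    resp.raise_for_status()
    with open("myfile.csv", "wb") as out:
        for chunk in resp.iter_content(chunk_size=65536):
            out.write(chunk)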

Clockify API - get hours for each user for the last 24 hours

Hi, I was wondering if you could help me.
What would be the right endpoint to get a report of the hours logged by each user in a given workspace over the past 24 hours? The API doesn't make it clear which report to use and what values to supply in a POST request to get this result.
Excuse me if I seem a little naive about the capabilities; I have been asked to look at this without prior knowledge of the API and I'm just trying to get my head around it.
Clockify seems to have just shut down their old API (the one I was using). The report API is documented here: https://clockify.me/developers-api#tag-Reports
And this works quite well. For your case, the request could look like:
curl --request POST \
--url https://reports.api.clockify.me/v1/workspaces/<YOUR WORKSPACE>/reports/summary \
--header 'content-type: application/json' \
--header 'x-api-key: <YOUR API KEY>' \
--data '{
"dateRangeStart": "2020-08-13T00:00:00.000Z",
"dateRangeEnd": "2020-08-13T23:59:59.000Z",
"summaryFilter": {"groups": ["USER"]},
"exportType": "JSON"
}'
There is no "last 24h" range though; you will have to adjust the dates yourself (a sketch of that follows at the end of this answer).
What might be interesting for your case:
This example just returns all durations grouped by user. If you want all time entries explicitly, add TIMEENTRY as a summary group:
"groups": ["USER", "TIMEENTRY"]
You can also export it to a different format: JSON, CSV, XLSX, PDF are supported.
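
For a rolling last-24-hours window, a sketch that computes the range and posts the same summary request from Python (the workspace ID and API key are placeholders; the endpoint is the v1 reports API linked above):

# Sketch: summary report for the last 24 hours, grouped by user.
# Replace the workspace ID and API key placeholders with your own values.
from datetime import datetime, timedelta, timezone
import requests

WORKSPACE_ID = "<YOUR WORKSPACE>"
API_KEY = "<YOUR API KEY>"

now = datetime.now(timezone.utc)
body = {
    "dateRangeStart": (now - timedelta(hours=24)).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    "dateRangeEnd": now.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    "summaryFilter": {"groups": ["USER"]},
    "exportType": "JSON",
}

resp = requests.post(
    "https://reports.api.clockify.me/v1/workspaces/" + WORKSPACE_ID + "/reports/summary",
    headers={"x-api-key": API_KEY, "content-type": "application/json"},
    json=body,
)
resp.raise_for_status()
print(resp.json())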

How to connect to Redshift from AWS Glue (PySpark)?

I am trying to connect to Redshift and run simple queries from a Glue DevEndpoint (that is a requirement) but cannot seem to connect.
The following code just times out:
df = spark.read \
.format('jdbc') \
.option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev?user=myuser&password=mypass") \
.option("query", "select distinct(tablename) from pg_table_def where schemaname = 'public'; ") \
.option("tempdir", "s3n://test") \
.option("aws_iam_role", "arn:aws:iam::147912345678:role/my-glue-redshift-role") \
.load()
What could be the reason?
I checked the URL, user and password, and also tried different IAM roles, but it just hangs every time.
I also tried without an IAM role (just the URL, user/pass, and a schema/table that already exists there) and it also hangs/times out:
jdbcDF = spark.read \
.format("jdbc") \
.option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev") \
.option("dbtable", "public.test") \
.option("user", "myuser") \
.option("password", "mypass") \
.load()
Reading data (directly in the Glue SSH terminal) from S3 or from Glue tables (catalog) works fine, so I know Spark and DataFrames are fine; there is just something wrong with the connection to Redshift, but I'm not sure what.
Select the last option while creating the Glue job. On the next screen, it will ask you to select a Glue connection.
You seem to be on the correct path. I connect to and query Redshift from a Glue PySpark job the same way, except for the minor change of using
.format("com.databricks.spark.redshift")
I have also successfully used
.option("forward_spark_s3_credentials", "true")
instead of
.option("iam_role", "my_iam_role")
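
Putting the original snippet together with those two changes, a sketch of the read (assuming the spark-redshift connector and the Redshift JDBC driver are available to the Glue job, and that the job's VPC/security-group setup can actually reach the cluster, since a hang usually points at networking rather than the options):

# Sketch: read from Redshift in a Glue PySpark job via the spark-redshift connector.
# URL, credentials, table and tempdir are placeholders; the connector stages data
# in the S3 tempdir, so Spark needs access to that bucket as well.
# (Runs inside a Glue job/dev endpoint where `spark` is the SparkSession.)
df = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2"
                   ".redshift.amazonaws.com:5439/dev?user=myuser&password=mypass")
    .option("dbtable", "public.test")                # or .option("query", "select ...")
    .option("tempdir", "s3://my-temp-bucket/redshift/")
    .option("forward_spark_s3_credentials", "true")  # or .option("aws_iam_role", "...")
    .load()
)
df.show()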