How can I programmatically download data from QuestDB?

Is there a way to download query results from the database, such as tables or other datasets? The UI supports a CSV file download, but that currently means manually browsing and downloading files. Is there a way I can automate this? Thanks

You can use the export REST API endpoint; this is what the UI uses under the hood. To export a table via this endpoint:
curl -G --data-urlencode "query=select * from my_table" http://localhost:9000/exp
query= may be any SQL query, so if you have a report with more granularity that needs to be generated regularly, it can be passed into the request. If you don't need anything complicated, you can redirect the curl output to a file:
curl -G --data-urlencode "query=select * from my_table" \
http://localhost:9000/exp > myfile.csv
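If you want to automate this, a small script can loop over the tables you care about and then be scheduled (for example with cron). A minimal sketch, assuming the table names and output directory are placeholders you would adjust:
#!/bin/bash
# Export each listed table to its own CSV file via the /exp endpoint.
HOST="http://localhost:9000"
OUT_DIR="./exports"
mkdir -p "$OUT_DIR"
for table in my_table my_other_table; do
  curl -sG --data-urlencode "query=select * from $table" \
       "$HOST/exp" > "$OUT_DIR/$table.csv"
done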

Related

How to load data/update Power BI Dataset monthly

I've been asked to implement a way to load data to my datasets once a month. As Power BI Service doesn't have this option, I had to find a solution using Power Query, and below I describe the step-by-step of my solution.
If it helps you in some way, please let me know by posting a comment below. If you have a better and/or more elegant solution, I'm glad to hear from you.
So, as my first solution didn't work, here I'll post the definitive solution that we (me and my colleagues) found.
I have to say that this solution is not so simple, as it uses a Linux server, GitLab and Jenkins, so it requires a relatively complex environment, and I'll not describe how to build it.
At the end, I'll suggest a simpler solution.
THE ENVIRONMENT
At my company we use Jenkins to schedule jobs, GitLab to store source code, and we have a Linux server to execute small tasks using shell script. For this problem I used all three services along with the Power BI API.
JENKINS
I use Jenkins to schedule a job that runs monthly. This job was created with the following configs:
Parameters: I created 2 parameters (workspace_id and dataset_id) so I can test the script in any environment (Power BI Workspace) by just changing the value of those parameters;
Schedule Job: this job was scheduled to run on day 1 of every month at 02:00 a.m. As Jenkins uses the same syntax as cron (I think it is just an intermediary between you and cron), the value of this field is 0 2 1 * *.
Build: as we have a remote Linux server to execute the scripts, I used an Execute shell script on remote host using ssh build step. I don't know why you cannot execute the curl command directly in the Jenkins job, it just didn't work, so I had to split the solution between Jenkins and the Linux server. In the SSH site you have to select the credentials (previously created by my team) and in the command field are the commands below:
#Navigate to the script shell directory
cd "script-shell-script/"
# pulls the last version of the script. If you aren't using Gitlab,
# remove this command
git pull
# every time git pulls a new file version, it has read access.
# This command allows the execution of the file
chmod +x powerbi_refresh_dataset.sh
# make a call to the file passing as parameter the workspace id and dataset id
./powerbi_refresh_dataset.sh $ID_WORKSPACE $ID_DATASET
SHELL SCRIPT
As you may already imagine, the core of the solution is the content of powerbi_refresh_dataset.sh. But before going there, you must understand how the Power BI API works, and you have to configure your Power BI environment to make API calls work. So please make sure that you already have your Service Principal properly configured by following this tutorial: https://learn.microsoft.com/en-us/power-bi/developer/embedded/embed-service-principal
Once you have your object_id, client_id and client_secret, you can create your shell script file. Below is the code of my .sh file.
# load OBJECT_ID, CLIENT_ID and CLIENT_SECRET as environment variables
source credential_file.sh
# This command retrieves a new token from Microsoft Credentials Manager
token_msg=$(curl -X POST "https://login.windows.net/$OBJECT_ID/oauth2/token" \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Accept: application/json' \
-d 'grant_type=client_credentials&resource=https://analysis.windows.net/powerbi/api&client_id='$CLIENT_ID'&client_secret='$CLIENT_SECRET
)
# Extract the token from the response message
token=$(echo "$token_msg" | jq -r '.access_token')
# Ask Power BI to refresh dataset
refresh_msg=$(curl -X POST 'https://api.powerbi.com/v1.0/myorg/groups/'$1'/datasets/'$2'/refreshes' \
-H 'Authorization: Bearer '$token \
-H 'Content-Type: application/json' \
-d '{"notifyOption": "NoNotification"}')
And here goes some explanation. The first command is source credential_file.sh, which loads 3 variables (OBJECT_ID, CLIENT_ID and CLIENT_SECRET). The intention here is to separate confidential info from the script so I can store the main script file in version control (Git) and not disclose any sensitive information. So, besides the powerbi_refresh_dataset.sh file, you must have credential_file.sh in the same directory with the following content:
OBJECT_ID=OBJECT_ID_VALUE
CLIENT_ID=CLIENT_ID_VALUE
CLIENT_SECRET=CLIENT_SECRET_VALUE
It's important to say that if you are using Git or any other version control, only the powerbi_refresh_dataset.sh file goes into version control; the credential_file.sh file must remain only on your Linux server. I suggest saving its content in a password manager like KeePass, as the CLIENT_SECRET cannot be retrieved later.
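If you also want to verify that the refresh actually ran, the Power BI API exposes the refresh history of a dataset (GET .../refreshes). A hedged addition to the end of the script, reusing $token and the same workspace/dataset parameters:
# Query the most recent refresh entry and print its status (e.g. Completed/Failed).
status_msg=$(curl -s -X GET "https://api.powerbi.com/v1.0/myorg/groups/$1/datasets/$2/refreshes?\$top=1" \
-H 'Authorization: Bearer '$token)
echo "$status_msg" | jq -r '.value[0].status'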
FINAL CONSIDERATIONS
So, above is the most relevant info of my solution. As you can see, I'm omitting (intentionally) how to build the environment and make the pieces talk to each other (Jenkins with Linux, Jenkins with Git and so on).
If all you have is a Linux or Windows host, I suggest this:
Linux Host
On this simpler setup, just create powerbi_refresh_dataset.sh and credential_file.sh, place them in any directory and create a cron task to call powerbi_refresh_dataset.sh as often as you wish, as in the sketch below.
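A minimal sketch of such a cron entry (paths, schedule and IDs are placeholders):
# Run at 02:00 on day 1 of every month, logging output to refresh.log
0 2 1 * * cd /home/me/powerbi-refresh && ./powerbi_refresh_dataset.sh WORKSPACE_ID DATASET_ID >> refresh.log 2>&1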
Windows Host
On Windows you can do almost the same as on Linux, but you'll have to replace the content of the shell script with PowerShell commands (google it) and use Task Scheduler to regularly execute your PowerShell file.
Well, I think this will help you. I know it's not a complete answer, as it will only work if you have a similar environment, but I hope the final tips might help you.
Best regards
The Solution
First, let me summarize the solution. I just put a conditional check at the end of each query that tests whether today is the day new data must be uploaded. If yes, it returns the step to be executed; if not, it raises an error.
There are many ways to implement that, and I'll go from the simplest form to the more complex one.
Simplest Version: checking if it's the day to load new data directly in the query
This is the simplest way to implement the solution, but, depending on your dataset it may not be the smartest one.
Let's say you have this foo query:
let
step1 = ...,
...,
...,
step10 = SomeFunction(Somevariable, someparameter)
in
step10
Now let's pretend you want that query to upload new data only on the 1st day of the month. To do that, you just insert a conditional statement in the in clause.
let
step1 = ...,
...,
...,
step10 = SomeFunction(Somevariable, someparameter)
in
if Date.Day(DateTime.LocalNow()) = 1 then step10 else error "Today is not the day to load data"
In this example I just replaced step10 in the return of the query with this piece of code: if Date.Day(DateTime.LocalNow()) = 1 then step10 else error "Today is not the day to load data". By doing that, step10 will be the result of this query only if the query is executed on the 1st day of the month; otherwise, it will return an error.
And here some explanation is worthwhile. Power Query is not a script language that runs in the same order in which it is declared. So the fact that the conditional statement was placed at the end of the query doesn't mean that all the code above it will be executed before the error is raised. As Power Query only executes what's necessary, the if... statement will probably be the first one to be executed. For more info about how Power Query works behind the scenes, I strongly recommend this reading: https://bengribaudo.com/blog/2018/02/28/4391/power-query-m-primer-part5-paradigm
Using a function
Now let's move forward. Let's say that your dataset has not only one, but many queries, and all of them need to be executed only once a month. In this case, a smart way to handle that is to use what all other programming languages have for reusing blocks of code: create a function!
For this, create a new Blank Query and paste this code in its body:
(step) =>
let
result = if Date.Day(DateTime.LocalNow()) = 1 then step else error "Today is not the day to load data"
in
result
Now, in each query you'll call this function, passing the last step as a parameter. The function will check which day is today and return the same step passed as a parameter if it's the day to load the data. Otherwise, it will return the error.
Below is the code of our query using our function called check_if_upload:
let
step1 = ...,
...,
...,
step10 = SomeFunction(Somevariable, someparameter),
step11 = check_if_upload(step10)
in
step11
Using parameters
One final tip. As your query raises an error if today is not the upload day, it means that you can only test your ETL once a month, right? The error also keeps you from saving your Power Query work, which means that if you can't apply the modifications you can't publish the new Power Query version (with this implementation) to Power BI Service.
Well, you could change the day value inside the function every time you need to test, but that is, let's say, a bit clumsy.
A more elegant way to change this value is by using parameters. So, let's do it. Create a parameter (I'll call it Upload Day) of type number. Now, all you have to do is use this parameter in your function. It will look like this:
(step) =>
let
result = if Date.Day(DateTime.LocalNow()) = #"Upload Day" then step else error "Today is not the day to load data"
in
result
That's it. Now you can change the upload day directly in Power BI Service, just by changing this parameter on the dataset (click on the dataset name and go to Settings >> Parameters).
Hope you nailed it and that it's helpful for you.
Best regards.

How to set metadata on objects created at a certain time?

I want to set metadata on all objects whose creation date is 12 o'clock tonight. For now I can only set metadata for all objects that are already in a bucket, with the command below:
gsutil -m setmeta -h "Content-Type:application/pdf" -h "Content-disposition: inline" gs://mystorage/pdf/*.pdf
My plan is to set the metadata on all new objects by running a gsutil command at midnight automatically, because I already have a command which uploads all files from my server to Google Storage every midnight. The only problem is that I don't know which files are new.
I know that we can use a Google Cloud trigger, but I just want to use a gsutil command if possible.
I think there is no feature in gsutil or the GCS API to set metadata on objects based on a timestamp.
According to the documentation, at upload time you can specify one or more metadata properties to associate with objects.
As you mentioned, "I already make a command which uploads all files from my server to google storage every midnight", so you can set the metadata while uploading the objects. The command may look like the one below in your case.
gsutil -m -h "Content-Type:application/pdf" -h "Content-disposition: inline" cp -r images gs://bucket/images
Or else, you can list the objects based on timestamp and store the output in a file. Then, iterating through each line of the output file, run your setmeta command for those objects, as in the sketch below.
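A minimal sketch of that idea, assuming the bucket/prefix and the midnight cutoff are placeholders you would adjust (gsutil ls -l prints size, creation time and URL for each object):
BUCKET="gs://mystorage/pdf"           # your bucket/prefix
SINCE=$(date -u +%Y-%m-%dT00:00:00Z)  # midnight (UTC) today
# Keep only objects created at or after the cutoff, then set their metadata.
gsutil ls -l "$BUCKET/*.pdf" \
  | awk -v since="$SINCE" '$2 >= since { print $3 }' > new_objects.txt
while read -r obj; do
  gsutil setmeta -h "Content-Type:application/pdf" \
                 -h "Content-disposition: inline" "$obj"
done < new_objects.txt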
Or, you can use Pub/Sub notifications for Cloud Storage and subscribe to the new-object event OBJECT_FINALIZE.
Sample code showing this can be found here.
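If you go the Pub/Sub route, the notification itself can be created with gsutil; a minimal sketch (the topic name is a placeholder):
# Publish an OBJECT_FINALIZE message to the topic whenever a new object is created.
gsutil notification create -t my-topic -f json -e OBJECT_FINALIZE gs://mystorage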

How do I delete tables in the QuestDB console?

I imported a lot of CSV files into my database for testing and I'd like to clear out a few of the tables I don't need. How can I get rid of multiple tables at once? Is there an easy way, like selecting many tables in the console view?
The easiest way I found was to use the REST API /exec endpoint: https://questdb.io/docs/develop/insert-data/#exec-endpoint
I generated a bash script using the output of the select name from tables() meta function.
Example lines:
curl -G --data-urlencode "query=DROP TABLE 'delete_me.csv'" http://localhost:9000/exec
curl -G --data-urlencode "query=DROP TABLE 'delete_me_also.csv'" http://localhost:9000/exec
If you use the web console (or even /exec to query), select name from tables() can be filtered on a regex just like a regular query; see the sketch below.
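A hedged example using the regex match operator (filtering on a .csv suffix is only an assumption based on the imported file names):
curl -G --data-urlencode "query=select name from tables() where name ~ '\.csv$'" http://localhost:9000/exec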
Converting the output to a bash script is manual, though. I recommend just dumping the table names to CSV, then using bash to put in the appropriate quotes, etc.
I did it with awk:
awk -F, '{ print "curl -G --data-urlencode \"query=DROP TABLE \047" $0 "\047\" http://localhost:9000/exec" }' ~/Downloads/quest_db_drop_tables.sql > ~/Downloads/quest_db_drop_tables.sh
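If you would rather skip the intermediate file, a hedged alternative is to pull the names straight from /exp and pipe them into a loop (this assumes the table names contain no commas or embedded quotes):
# /exp returns CSV; tail skips the header line and tr strips quotes and CRs.
curl -sG --data-urlencode "query=select name from tables() where name like '%.csv'" http://localhost:9000/exp \
  | tail -n +2 \
  | tr -d '"\r' \
  | while read -r t; do
      curl -G --data-urlencode "query=DROP TABLE '$t'" http://localhost:9000/exec
    done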

What's the best way to add project level metadata to a google cloud project?

Labels are project-level but have character limitations, such as not allowing spaces. I could add metadata through a BigQuery table, or on each server. I could also make a README.txt in the default appspot bucket.
What's the best way to add metadata at a project level? Things like what the project is about, why it's there, people responsible, stakeholders, developers, context/vocabulary. E.g. when I get fired, people can see what is what.
Storing Metadata:
1. Console
This is quite straightforward. Once you navigate to the Metadata section under Compute Engine (Compute Engine > Metadata), you can add project-level key:value pairs in the console.
2. gcloud
Type the following command in the cloud shell of the project.
gcloud compute project-info add-metadata --metadata projectMailID=abc@gmail.com
3. API
Send a POST request to the Google API. This is usually a more manual task, where you first make a GET request to obtain the fingerprint and then POST to the API using that fingerprint, roughly as sketched below.
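A hedged sketch of that GET-then-POST flow with curl (the project ID and key/value are placeholders; it assumes an authenticated gcloud for the access token and jq for parsing):
PROJECT="my-project"
TOKEN=$(gcloud auth print-access-token)
# 1. GET the project to read the current metadata fingerprint.
FINGERPRINT=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "https://www.googleapis.com/compute/v1/projects/$PROJECT" \
  | jq -r '.commonInstanceMetadata.fingerprint')
# 2. POST the new key:value pair with that fingerprint. Note that
# setCommonInstanceMetadata replaces the whole items list, so in practice
# you would merge the existing items into the request body first.
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d "{\"fingerprint\": \"$FINGERPRINT\", \"items\": [{\"key\": \"projectMailID\", \"value\": \"abc@gmail.com\"}]}" \
  "https://www.googleapis.com/compute/v1/projects/$PROJECT/setCommonInstanceMetadata"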
Querying Metadata:
1. curl or wget
This is the most frequently used option for getting instance or project metadata.
curl "http://metadata.google.internal/computeMetadata/v1/project/" -H "Metadata-Flavor: Google"
The above command will list all the metadata associated with the given project. Metadata can be stored either as a directory or as a single entry. If the URL ends in /, it lists the directory contents; otherwise it shows the value of the single entry key.
The custom metadata entries are stored under the attributes directory. They can be retrieved with:
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/" -H "Metadata-Flavor: Google"
The above command lists all custom entries made in the project. To get the value of a single entry, try this:
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/ProjectMailID" -H "Metadata-Flavor: Google"
Metadata-Flavor: Google
This header indicates that the request was sent with the intention of retrieving metadata values, rather than unintentionally.
2. gcloud
The gcloud command will list all metadata and other information about the project.
gcloud compute project-info describe
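If you only want a single custom entry back, a hedged approach is to take the JSON output and filter it, for example with jq (the key name is a placeholder):
gcloud compute project-info describe --format=json \
  | jq -r '.commonInstanceMetadata.items[] | select(.key=="projectMailID") | .value'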
3. API
Making a GET request to the API does the equivalent of the gcloud command.
GET https://www.googleapis.com/compute/v1/projects/<project>
Additional Information:
Waiting For Updates
This option allows you to wait for any change to the metadata and then retrieve the updated value. This can be done by appending ?wait_for_change=true as a query parameter.
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/?wait_for_change=true" -H "Metadata-Flavor: Google"
Recursive
This option recursively prints the entries in the directory. This can be done by appending ?recursive=true as a query parameter.
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/?recursive=true" -H "Metadata-Flavor: Google"

Proc http with https url

So, I want to use the Google URL Shortener API, and I am trying to use
proc http
so, when I run this code
filename req "D:\input.txt";
filename resp "D:\output.txt";
proc http
url="https://www.googleapis.com/urlshortener/v1/url"
method="POST"
in=req
ct="application/JSON"
out=resp
;run;
(where D:\input.txt looks like {"longUrl": "http://www.myurl.com"} ) everything works great on my home SAS Base 9.3. But at work, on EG 4.3, I get:
NOTE: The SAS System stopped processing this step because of errors.
and it is not possible to debug. After googling, I found that I have to set a Java system option like this:
-jreoptions (-Djavax.net.ssl.trustStore=full-path-to-the-trust-store -Djavax.net.ssl.trustStorePassword=trustStorePassword)
But where can I get "the certificate of the service to be trusted" and the password for it?
Edit: As I noted in the comments below, my work SAS is installed on a server, so I don't have direct access to the configuration. Also, it isn't a good idea to change the server's config. So I googled some more and found a nice solution using curl, without the X command (because it is blocked in my EG). The equivalent syntax is:
filename test pipe 'curl -X POST -d #D:\input.txt https://www.googleapis.com/urlshortener/v1/url --header "Content-Type:application/json"';
data _null_;
infile test missover lrecl= 32000;
input ;
file resp;
put _infile_;
run;
Hope it helps someone.
Where to get the certificate
Open the URL that you want the certificate from in Chrome. Click on the lock icon in the URL bar, click on the "Details" tab and then click on "Save as file" in the bottom right. You will need to know which trust store you are going to use at this stage. See the following step.
The password and the trust store are defined by you. A trust store is in most cases nothing more than an encrypted zip file. There are a lot of tools out there that allow you to create a trust store, encrypt it and import certificates into it. The choice will depend on which OS you are using. There are some Java-based tools that are OS independent, for example Portecle. It allows you to define various trust stores on different OSes, and you can administer them remotely.
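If you prefer the JDK's own tooling to a GUI, keytool can build the trust store from the saved certificate; a minimal sketch (file names, alias and password are placeholders):
# Import the saved certificate into a new (or existing) trust store file.
keytool -importcert -alias googleapis -file googleapis.cer \
        -keystore my_truststore.jks -storepass myTrustStorePassword
The resulting trust store path and password are then what go into the -jreoptions shown above.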
Regards,
Vasilij