How to connect apache superset directly with google BigQuery? - apache-superset

I am running Apache superset on GCP instance and it works fine with Sqlite database which is default in superset and I don't need to configure so many things. But my requirement is that I need superset to connect directly with BigQuery instead of Sqlite and I don't have developer background. So, is there an easy way to do that without heavy codes?

Connecting to BigQuery is very well documented here in Preset's Superset user documentation https://docs.preset.io/docs/big-query-database

Following the steps mentioned at the official Google Cloud page here, you need to do the following
Install pybigquery
pip install pybigquery
Download your Google Cloud authorization json key file
From your terminal instance, set GOOGLE_APPLICATION_CREDENTIALS env. var to the path of your json key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/[json_file].json"

Elbehery is right. I don't have enough rep to comment, but I wanted to note that Apache has created docs for this.
FWIW, I couldn't use the UI for importing credentials.json so I set it as an env var in my Docker image. Here are the commands and steps I run locally:
# Setup virtual environment (exit by typing "deactivate")
pip3 install virtualenv
python3 -m virtualenv ./.venv
source ./.venv/bin/activate
​
# Download Superset
git clone https://github.com/apache/superset.git
cd superset/
​
# Create a copy of your credentials for docker to use
cp ~/.config/gcloud/application_default_credentials.json docker/credentials.json
echo "GOOGLE_APPLICATION_CREDENTIALS=docker/credentials.json" >> docker/.env-non-dev
​
# Run Superset
docker-compose -f docker-compose-non-dev.yml pull
docker-compose -f docker-compose-non-dev.yml up
​
Now that Superset is running locally:
visit http://0.0.0.0:8088/ in your web browser
In the top right of UI, click the +DATABASE button (or + > Data > Connect Database)
In popup window Click Supported Databases then (at the bottom) Other
Set DISPLAY NAME: BigQuery
Set SQLALCHEMY URI: bigquery://my_project_id
Click Test Connection
Click Connect
​
Now that BigQuery is integrated into Superset:
In the top of page Click SQL Lab > SQL Editor

Related

How do I install Apache Superset CLI on Windows?

Superset offers a CLI for managing the Superset instance, but I am unable to find instructions for getting it installed and talking to my instance of Superset.
My local machine is Windows, but my instance of Superset is running in a hosted Kubernetes cluster.
-- Update 2 2022.08.06
After some continued exploration, have found some steps that seem to be getting me closer.
# clone the Superset repo
git clone https://github.com/apache/superset
cd superset
# create a virtual environment using Python 3.9,
# which is compatible with the current version of numpy
py -3.9 -m venv .venv
.venv\Scripts\activate
# install the Superset package
pip install apache-superset
# install requirements (not 100% sure which requirements are needed)
pip install -r .\requirements\base.txt
pip install -r .\requirements\development.txt
# install psycopg2
pip install psycopg2
# run superset-cli
superset-cli
# error: The term 'superset-cli' is not recognized
# run superset
superset
superset will run, but now I'm getting an error from psycopg2 about unknown host:
Loaded your LOCAL configuration at [c:\git\superset\superset_config.py]
logging was configured successfully
2022-08-06 06:29:08,311:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-08-06 06:29:08,317:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:08,318:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:08,322:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:10,602:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (psycopg2.OperationalError) could not translate host name "None" to address: Unknown host
My config file c:\git\superset\superset_config.py has the following database settings:
DATABASE_HOST = os.getenv("DATABASE_HOST")
DATABASE_DB = os.getenv("DATABASE_DB")
POSTGRES_USER = os.getenv("POSTGRES_USER")
POSTGRES_PASSWORD = os.getenv("DATABASE_PASSWORD")
I could set those values in the superset_config.py or I could set the environment variables and let superset_config.py read them. However, my instance of superset is running in a hosted kubernetes cluster and the superset-postgres service is not exposed by external ip. The only service with an external ip is superset.
Still stuck...
I was way off track - once I found the Preset-io backend-sdk repo on github it started coming together.
https://github.com/preset-io/backend-sdk
Install superset-cli
mkdir superset_cli
cd superset_cli
py -3.9 -m venv .venv
.venv\Scripts\activate
pip install -U setuptools setuptools_scm wheel #for good measure
pip install "git+https://github.com/preset-io/backend-sdk.git"
Example command
# Export resources (databases, datasets, charts, dashboards)
# into a directory as YAML files from a superset site: https://superset.example.org
mkdir export
superset-cli -u [username] -p [password] https://superset.example.org export export

Connect Google Cloud Build to Google Cloud SQL

Google Cloud Run allows for using Cloud SQL. But what if you need Cloud SQL when building your container in Google Cloud Build? Is that possible?
Background
I have a Next.js project, that runs in a Container on Google Cloud Run. Pushing my code to Cloud Build (installing the stuff, generating static pages and putting everything in a Container) and deploying to Cloud Run works perfectly. 👌
Cloud SQL
But, I just added some functionality in which it also needs to some data from my PostgreSQL instance that runs on Google Cloud SQL. This data is used when building the project (generating the static pages).
Locally, on my machine, this works fine as the project can connect to my CloudSQL proxy. While running in CloudRun this should also work, as Cloud Run allows for connecting to my Postgres instance on Cloud SQL.
My problem
When building my project with Cloud Build, I need access to my database to be able to generate my static pages. I am looking for a way to connect my Docker cloud builder to Cloud SQL, perhaps just like Cloud Run (fully managed) provides a mechanism that connects using the Cloud SQL Proxy.
That way I could be connecting to /cloudsql/INSTANCE_CONNECTION_NAME while building my project!
Question
So my question is: How do I connect to my PostgreSQL instance on Google Cloud SQL via the Cloud SQL Proxy while building my project on Google Cloud Build?
Things like my database credentials, etc. already live in Secrets Manager, so I should be able to use those details I guess 🤔
You can use the container that you want (and you need) to generate your static pages, and download cloud sql proxy to open a tunnel with the database
- name: '<YOUR CONTAINER>'
entrypoint: 'sh'
args:
- -c
- |
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
chmod +x cloud_sql_proxy
./cloud_sql_proxy -instances=<my-project-id:us-central1:myPostgresInstance>=tcp:5432 &
<YOUR SCRIPT>
App engine has an exec wrapper which has the benefit of proxying your Cloud SQL in for you, so I use that to connect to the DB in cloud build (so do some google tutorials).
However, be warned of trouble ahead: Cloud Build runs exclusively* in us-central1 which means it'll be pathologically slow to connect from anywhere else. For one or two operations, I don't care but if you're running a whole suite of integration tests that simply will not work.
Also, you'll need to grant permission for GCB to access GCSQL.
steps:
- id: 'Connect to DB using appengine wrapper to help'
name: gcr.io/google-appengine/exec-wrapper
args:
[
'-i', # The image you want to connect to the db from
'$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME:$SHORT_SHA',
'-s', # The postgres instance
'${PROJECT_ID}:${_POSTGRES_REGION}:${_POSTGRES_INSTANCE_NAME}',
'-e', # Get your secrets here...
'GCLOUD_ENV_SECRET_NAME=${_GCLOUD_ENV_SECRET_NAME}',
'--', # And then the command you want to run, in my case a database migration
'python',
'manage.py',
'migrate',
]
substitutions:
_GCLOUD_ENV_SECRET_NAME: mysecret
_GCR_HOSTNAME: eu.gcr.io
_POSTGRES_INSTANCE_NAME: my-instance
_POSTGRES_REGION: europe-west1
* unless you're willing to pay more and get very stung by Beta software, in which case you can use cloud build workers (at the time of writing are in Beta, anyway... I'll come back and update if they make it into production and fix the issues)
The ENV VARS (including DB connections) are not available during build steps.
However, you can use ENTRYPOINT (of Docker) to run commands when the container runs (after completing the build steps).
I was having the need to run DB migrations when a new build was deployed (i.e. when the container starts running) and using ENTRYPOINT (to a file/command) was able to run migrations (which require DB connection details, not available during the build-process).
"How to" part is pretty brief and is located here : https://stackoverflow.com/a/69088911/867451

How do i continue working with Amplify on a new machine?

I'm using react native for my project. On my old machine, when i ran amplify status, i had Auth, Api and Storage services listed.
I moved to my new machine, installed node, watchman, brew etc... and then navigated to my react native project and ran: react-native run-ios, and voila, my app is running. All the calls to my AWS Api, Auth and Storage are working perfectly.
Now i can make some amplify commands. Such as amplify status. I tried: amplify env add: here's what i got:
Users-MBP-2:projectname username$ amplify env add
Note: It is recommended to run this command from the root of your app directory
? Do you want to use an existing environment? Yes
? Choose the environment you would like to use: dev
Using default provider awscloudformation
✖ There was an error initializing your environment.
init failed
Error: ENOENT: no such file or directory, open '/Users/username/.aws/credentials'
at Object.openSync (fs.js:462:3)
at Proxy.readFileSync (fs.js:364:35)
at Object.readFileSync (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/util.js:95:26)
at IniLoader.parseFile (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/shared-ini/ini-loader.js:6:47)
at IniLoader.loadFrom (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/shared-ini/ini-loader.js:56:30)
at Config.region (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/node_loader.js:100:36)
at Config.set (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/config.js:507:39)
at Config.<anonymous> (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/config.js:342:12)
at Config.each (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/util.js:507:32)
at new Config (/usr/local/lib/node_modules/#aws-amplify/cli/node_modules/aws-sdk/lib/config.js:341:19) {
errno: -2,
syscall: 'open',
code: 'ENOENT',
path: '/Users/username/.aws/credentials'
}
Do you think credentials info needs to be brought/configured to my new machine?
When i run amplify configure project it's like doing an amplify init and building a project from scratch. I'm being asked:
? Enter a name for the project: ProjectName
? Choose your default editor: Visual Studio Code
? Choose the type of app that you're building javascript
Please tell us about your project
? What javascript framework are you using (Use arrow keys)
angular
ember
ionic
react
❯ react-native
vue
none
etc....
I also already have a region, username and accessKey, secretAccess key etc..
I do not want to replace or ruin anything in my current backend or current project! Whats going on?
Ensure amplify-cli is installed and you're logged in with your AWS details.
npm install -g #aws-amplify/cli
amplify configure
Running amplify configure is mainly to give the cli knowledge of your AWS account so subsequent commands can have access to things.
If you get amplify: command not found errors try restarting your terminal. If still no luck, you will need to check amplify has been added to your PATH variable.
Run amplify env add , but choose an existing environment. This will let you choose the environment you created on your other machine so you can pull those settings down to your new machine.
amplify env add
? Do you want to use an existing environment? Yes
Production
Follow up with:
amplify pull
You don't need to run amplify add auth again or anything. All of that will pull down automatically after you've done the above.
You DO NOT need to do all config again, but some for sure
You have to install amplify cli npm install -g #aws-amplify/cli
use amplify pull
https://docs.amplify.aws/cli/start#amplify-pull
Follow the rest of steps -
-- provide the accessKeyId, secretAccessKey
-- region
-- select amplify project
and then rest of app related thing like IDE, directory......
I tried every solution then I found this. (in MacBook)
% sudo -i
Password:
~ root# npm install -g #aws-amplify/cli
-- Ctrl+D to exist from Root user
% amplify pull --appId xxxx --envName yyyy.
Note: To get --appId xxxx --envName yyyy
Log in to the AWS console. Choose AWS Amplify. Click your app. Go to Backend
environments. Find the backend environment you wish to pull. Click
Edit backend. See top right then click 'Local setup instructions
' ( amplify pull --appId
YOUR_APP_ID --envName YOUR_ENV_NAME )
Waiting until it request to verify your amplify.
✔ Successfully received Amplify Studio tokens.
? Choose your default editor: Visual Studio Code
? Choose the type of app that you're building javascript
Please tell us about your project
? What javascript framework are you using react
? Source Directory Path: src
? Distribution Directory Path: build
? Build Command: npm run-script build
? Start Command: npm run-script start
✔ Synced UI components.
? Do you plan on modifying this backend? Yes
⠴ Building resource api/xxxx✅ GraphQL schema compiled successfully.
Edit your schema at ....
✔ Successfully pulled backend environment yyyy from the cloud.
✅
Successfully pulled backend environment staging from the cloud.
Run 'amplify pull' to sync future upstream changes.
% amplify pull
% npm install
% npm start
Hope this help every one!!
Happy Coding :)

Google Cloud Platform: cloudshell - is there any way to "keep" gcloud init configs?

Does anyone know of a way to persist configurations done using "gcloud init" commands inside cloudshell, so they don't vanish each time you disconnect?
I figured out how to persist python pip installs using the --user
example: pip install --user pandas
But, when I create a new configuration using gcloud init, use it for a bit, close cloudshell (or cloudshell times out on me), then reconnect later, the configurations are gone.
Not a big deal, I bounce between projects/etc so it's nice to have the configs saved so I can simply run
gcloud config configurations activate config-name
Thanks...Rich Murnane
Google Cloud Shell only persists data in your $HOME directory. Commands like gcloud init modify the environment variables and store configuration files in /tmp which is deleted when the VM is restarted. The VM is terminated after being idle for 20 minutes or 60 minutes depending on which document you read.
Google Cloud Shell is a Docker container. You can modify the docker image to customize to fit your needs. This method will allow you to install packages, tools, etc that are not located in your $HOME directory.
You can also store your files and configuration scripts on Google Cloud Storage. Modify .bashrc to download your cloud files and run your configuration script.
Either method will allow you to create a persistent environment.
This StackOverflow answer covers in detail what gcloud init does and how to basically emulate the same thing via script or command line.
gcloud init details
this isn't exactly what I wanted, but since my
account (userid) isn't changing, I'm simply going to
do the command
gcloud config set project second-project-name
good enough, thanks...Rich

Bitnami Redmine Windows Installer Upgrade to 3.3.1

Most of the documentation examples are for Linux:
https://docs.bitnami.com/installer/apps/redmine/#how-to-upgrade-redmine
I would like to see one for the Windows Server variety. Trying to upgrade 3.2.2 to 3.3.1. Client wants to keep it on local Windows only. No cloud.
Bitnami Developer here. Thanks for your comment, we will update soon the documentation of bitnami to add more guides of windows.
I have been able to migrate to redmine 3.2.2 to 3.3.1, these are the steps you have to follow:
Go to the manager-windows (C:\Bitnami\redmine-3.2.2-0\manager-windows.exe) and stop all the services. Then start again mysql. You should have something like this:
manager-windows
Do a dump of your mysql database. You can use the use-redmine console to do this (C:\Bitnami\redmine-3.2.2-0\use_redmine.exe) and execute the following:
mysqldump -u root -p --databases bitnami_redmine > backup.sql
Save that backup and download the last version of redmine stack installer (3.3.1-0): Bitnami redmine installers
Install it in your machine and open the manager-windows (C:\Bitnami\redmine-3.3.1-0\manager-windows.exe). Stop all services and start again the mysql service to restore the backup.
Start the use_redmine console(C:\Bitnami\redmine-3.3.1-0\use_redmine.exe)
Execute the following in the use_redmine console:
mysql -u root -p
Password: ****
mysql> drop database bitnami_redmine;
mysql> create database bitnami_redmine;
mysql> grant all privileges on bitnami_redmine.* to 'bn_redmine'#'localhost' identified by 'DATABASE_PASSWORD';
Restore the new database:
`mysql -u root -p bitnami_redmine < /path/to/your/backup.sql`
Edit the Redmine configuration file to update the database user password (the same that you set previously) at
C:\Bitnami\redmine-3.3.1-0\apps\redmine\htdocs\config\database.yml:
production:
adapter: mysql2
database: bitnami_redmine
host: localhost
username: bn_redmine
password: "DATABASE_PASSWORD"
encoding: utf8
In the use_redmine console migrate the database to the latest version:
bundle exec rake db:migrate RAILS_ENV=production
After this, you should be able to start all the services again in the C:\Bitnami\redmine-3.3.1-0\manager-windows.exe and log in in the application as always.