How do I install Apache Superset CLI on Windows? - apache-superset

Superset offers a CLI for managing the Superset instance, but I am unable to find instructions for getting it installed and talking to my instance of Superset.
My local machine is Windows, but my instance of Superset is running in a hosted Kubernetes cluster.
-- Update 2 2022.08.06
After some continued exploration, have found some steps that seem to be getting me closer.
# clone the Superset repo
git clone https://github.com/apache/superset
cd superset
# create a virtual environment using Python 3.9,
# which is compatible with the current version of numpy
py -3.9 -m venv .venv
.venv\Scripts\activate
# install the Superset package
pip install apache-superset
# install requirements (not 100% sure which requirements are needed)
pip install -r .\requirements\base.txt
pip install -r .\requirements\development.txt
# install psycopg2
pip install psycopg2
# run superset-cli
superset-cli
# error: The term 'superset-cli' is not recognized
# run superset
superset
superset will run, but now I'm getting an error from psycopg2 about unknown host:
Loaded your LOCAL configuration at [c:\git\superset\superset_config.py]
logging was configured successfully
2022-08-06 06:29:08,311:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-08-06 06:29:08,317:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:08,318:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:08,322:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2022-08-06 06:29:10,602:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (psycopg2.OperationalError) could not translate host name "None" to address: Unknown host
My config file c:\git\superset\superset_config.py has the following database settings:
DATABASE_HOST = os.getenv("DATABASE_HOST")
DATABASE_DB = os.getenv("DATABASE_DB")
POSTGRES_USER = os.getenv("POSTGRES_USER")
POSTGRES_PASSWORD = os.getenv("DATABASE_PASSWORD")
I could set those values in the superset_config.py or I could set the environment variables and let superset_config.py read them. However, my instance of superset is running in a hosted kubernetes cluster and the superset-postgres service is not exposed by external ip. The only service with an external ip is superset.
Still stuck...

I was way off track - once I found the Preset-io backend-sdk repo on github it started coming together.
https://github.com/preset-io/backend-sdk
Install superset-cli
mkdir superset_cli
cd superset_cli
py -3.9 -m venv .venv
.venv\Scripts\activate
pip install -U setuptools setuptools_scm wheel #for good measure
pip install "git+https://github.com/preset-io/backend-sdk.git"
Example command
# Export resources (databases, datasets, charts, dashboards)
# into a directory as YAML files from a superset site: https://superset.example.org
mkdir export
superset-cli -u [username] -p [password] https://superset.example.org export export

Related

How to connect apache superset directly with google BigQuery?

I am running Apache superset on GCP instance and it works fine with Sqlite database which is default in superset and I don't need to configure so many things. But my requirement is that I need superset to connect directly with BigQuery instead of Sqlite and I don't have developer background. So, is there an easy way to do that without heavy codes?
Connecting to BigQuery is very well documented here in Preset's Superset user documentation https://docs.preset.io/docs/big-query-database
Following the steps mentioned at the official Google Cloud page here, you need to do the following
Install pybigquery
pip install pybigquery
Download your Google Cloud authorization json key file
From your terminal instance, set GOOGLE_APPLICATION_CREDENTIALS env. var to the path of your json key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/[json_file].json"
Elbehery is right. I don't have enough rep to comment, but I wanted to note that Apache has created docs for this.
FWIW, I couldn't use the UI for importing credentials.json so I set it as an env var in my Docker image. Here are the commands and steps I run locally:
# Setup virtual environment (exit by typing "deactivate")
pip3 install virtualenv
python3 -m virtualenv ./.venv
source ./.venv/bin/activate
​
# Download Superset
git clone https://github.com/apache/superset.git
cd superset/
​
# Create a copy of your credentials for docker to use
cp ~/.config/gcloud/application_default_credentials.json docker/credentials.json
echo "GOOGLE_APPLICATION_CREDENTIALS=docker/credentials.json" >> docker/.env-non-dev
​
# Run Superset
docker-compose -f docker-compose-non-dev.yml pull
docker-compose -f docker-compose-non-dev.yml up
​
Now that Superset is running locally:
visit http://0.0.0.0:8088/ in your web browser
In the top right of UI, click the +DATABASE button (or + > Data > Connect Database)
In popup window Click Supported Databases then (at the bottom) Other
Set DISPLAY NAME: BigQuery
Set SQLALCHEMY URI: bigquery://my_project_id
Click Test Connection
Click Connect
​
Now that BigQuery is integrated into Superset:
In the top of page Click SQL Lab > SQL Editor

Perform actions on server after CircleCI deployment

I have a Django project that I deploy on a server using CircleCI. The server is a basic cloud server, and I can SSH into it.
I set up the deployment section of my circle.yml file, and everything is working fine. I would like to automatically perform some actions on the server after the deployment (such as migrating the database or reloading gunicorn).
I there a way to do that with CircleCI? I looked in the docs but couldn't find anything related to this particular problem. I also tried to put ssh user#my_server_ip after my deployment step, but then I get stuck and cannot perform any action. I can successfully SSH in, but the rest of the commands is not called.
Here is what my ideal circle.yml file would look like:
deployment:
staging:
branch: develop
commands:
- rsync --update ./requirements.txt user#server:/home/user/requirements.txt
- rsync -r --update ./myapp/ user#server:/home/user/myapp/
- ssh user#server
- workon myapp_venv
- cd /home/user/
- pip install -r requirements.txt
I solved the problem by putting a post_deploy.sh file on the server, and putting this line on the circle.yml:
ssh -i ~/.ssh/id_myhost user#server 'post_deploy.sh'
It executes the instructions in the post_deploy.sh file, which is exactly what I wanted.

AWS: CodeDeploy for a Docker Compose project?

My current objective is to have Travis deploy our Django+Docker-Compose project upon successful merge of a pull request to our Git master branch. I have done some work setting up our AWS CodeDeploy since Travis has builtin support for it. When I got to the AppSpec and actual deployment part, at first I tried to have an AfterInstall script do docker-compose build and then have an ApplicationStart script do docker-compose up. The containers that have images pulled from the web are our PostgreSQL container (named db, image aidanlister/postgres-hstore which is the usual postgres image plus the hstore extension), the Redis container (uses the redis image), and the Selenium container (image selenium/standalone-firefox). The other two containers, web and worker, which are the Django server and Celery worker respectively, use the same Dockerfile to build an image. The main command is:
CMD paver docker_run
which uses a pavement.py file:
from paver.easy import task
from paver.easy import sh
#task
def docker_run():
migrate()
collectStatic()
updateRequirements()
startServer()
#task
def migrate():
sh('./manage.py makemigrations --noinput')
sh('./manage.py migrate --noinput')
#task
def collectStatic():
sh('./manage.py collectstatic --noinput')
# find any updates to existing packages, install any new packages
#task
def updateRequirements():
sh('pip install --upgrade -r requirements.txt')
#task
def startServer():
sh('./manage.py runserver 0.0.0.0:8000')
Here is what I (think I) need to make happen each time a pull request is merged:
Have Travis deploy changes using CodeDeploy, based on deploy section in .travis.yml tailored to our CodeDeploy setup
Start our Docker containers on AWS after successful deployment using our docker-compose.yml
How do I get this second step to happen? I'm pretty sure ECS is actually not what is needed here. My current status right now is that I can get Docker started with sudo service docker start but I cannot get docker-compose up to be successful. Though deployments are reported as "successful", this is only because the docker-compose up command is run in the background in the Validate Service section script. In fact, when I try to do docker-compose up manually when ssh'd into the EC2 instance, I get stuck building one of the containers, right before the CMD paver docker_run part of the Dockerfile.
This took a long time to work out, but I finally figured out a way to deploy a Django+Docker-Compose project with CodeDeploy without Docker-Machine or ECS.
One thing that was important was to make an alternate docker-compose.yml that excluded the selenium container--all it did was cause problems and was only useful for local testing. In addition, it was important to choose an instance type that could handle building containers. The reason why containers couldn't be built from our Dockerfile was that the instance simply did not have the memory to complete the build. Instead of a t1.micro instance, an m3.medium is what worked. It is also important to have sufficient disk space--8GB is far too small. To be safe, 256GB would be ideal.
It is important to have an After Install script run service docker start when doing the necessary Docker installation and setup (including installing Docker-Compose). This is to explicitly start running the Docker daemon--without this command, you will get the error Could not connect to Docker daemon. When installing Docker-Compose, it is important to place it in /opt/bin/ so that the binary is used via /opt/bin/docker-compose. There are problems with placing it in /usr/local/bin (I don't exactly remember what problems, but it's related to the particular Linux distribution for the Amazon Linux AMI). The After Install script needs to be run as root (runas: root in the appspec.yml AfterInstall section).
Additionally, the final phase of deployment, which is starting up the containers with docker-compose up (more specifically /opt/bin/docker-compose -f docker-compose-aws.yml up), needs to be run in the background with stdin and stdout redirected to /dev/null:
/opt/bin/docker-compose -f docker-compose-aws.yml up -d > /dev/null 2> /dev/null < /dev/null &
Otherwise, once the server is started, the deployment will hang because the final script command (in the ApplicationStart section of my appspec.yml in my case) doesn't exit. This will probably result in a deployment failure after the default deployment timeout of 1 hour.
If all goes well, then the site can finally be accessed at the instance's public DNS and port in your browser.

How can i install puppet cluster on Amazon EC2 instances?

I'm using ubuntu 12.04 AMI in EC2 for creating puppet cluster and i'm facing problems while configuring it.
The problem is that the master is not able to recognize the slaves.
Do i need more packages other than mysql
/etc/mysql/my.cnf
what changes do i need in the above file?
Puppet is a configuration management tool that allows automating the process of defining and maintaining consistent state of several developer workstations. It is a descriptive, centralized and client-server based system. The central server is configured and the clients synchronize themselves to it to ensure that all systems end in the described state. For instance, the task of ensuring the same development environment on all developer systems in a project can be easily accomplished using Puppet.
Here is a quick procedure to set up a Puppet server and one Puppet client on Amazon EC2 instance having Ubuntu OS, and also installing Puppet Dashboard on server to view the status of the clients.
Prerequisites
Two ec2 instances set up with Ubuntu ami.
One instance named as puppetserver and other as puppetclient.
Procedure
Puppet server and client set up
Configuring hosts files View the /etc/hostname file on puppetserver and puppetclient. These are the Puppet server and client hostnames respectively
Edit /etc/hosts file on both the systems. Add server and client IPs and corresponding hostnames.
Setting up the Puppet Server
Enabling the Puppet Labs Package repository
Download the "puppetlabs-release" package for the OS (here, Ubuntu 12.04) on Puppet server
Install the package by running
dpkg -i
Run apt-get update to get new list of available packages.
For example, to enable the repository for Ubuntu 12.04, Precise Pangolin:
wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
sudo dpkg -i puppetlabs-release-precise.deb
sudo apt-get updateInstall Puppet
Install Puppet
Install puppetmaster
sudo apt-get update sudo apt-get install puppetmaster
Setting up the Puppet Client
Install Puppet on the puppet client(s)
sudo apt-get update sudo apt-get install puppet
Specify the Puppet server domain name on the client. To do this, modify the
/etc/puppet/puppet.conf
file and add the line
server=.
The client can now connect to the Puppet master.
Start the Puppet agent service for establishing first communication between server and client.
sudo puppet agent --verbose --no-daemonize --onetime
This starts a connection to the Puppet master process that is listening on port 8140 on the Puppet server. The output will be verbose, and the agent will not continue running in the background as a daemon. Also, it will run only one time, that is, after the connection is closed, the agent process will exit. The output looks like:
The client has made itself known to the server by sending an SSL certificate request. The server needs to certify the client.
To view the list of yet-to-be signed certificates on the server
sudo puppet cert --list
This lists the following
Sign the client node's SSL certificate
sudo puppet cert --sign <puppet client name>
Client can now establish full connection to the server and poll the Puppet master for any configuration updations.
Defining Configurations
We have set up puppet on both Puppet server and client and have also established communication between the two machines. Next step is to define the configuration for the target systems using puppet manifest. These manifests are specified in site.pp file.
As an example, we define a manifest that will create a helloworld.txt file on the client.
Defining manifest
Put the following manifest definition in /etc/puppet/manifests/site.pp file,
node "<puppet client hostname>" { file { "/home/ubuntu/helloworld.txt": content => "This is test content", ensure => file, owner => "ubuntu", group => "ubuntu", mode => 0644 } }
This manifest defines that the puppet client must have a helloworld.txt file
in /home/ubuntu/ folder with content, This is test content.
Getting changes on client
On puppet client, run the following command.
sudo puppet agent -t
The puppet client pulls the manifests defined in the site.pp file on the puppet server. It learned that a file named helloworld.txt with defined specifications, is expected to exist at location /home/ubuntu. Since, no such file exists on the client, the agent takes action and creates the file.
View the 'helloworld.txt' file
To verify that the client exists in a state defined by the Puppet server, run the following command
sudo vi /home/ubuntu/helloworld.txt
The file contents are same as defined in the manifest definition on the server.
Installing Puppet Dashboard
Overview
Puppet Dashboard is a GUI that interfaces with Puppet. It can be used to view and report the status of all the client nodes. Puppet dashboard runs on port 3000 on the puppet server.
Following are the steps for set up
Installing external dependencies
Dashboard is a Ruby on Rails web app and thus requires certain software to be installed
RubyGems
Rake version 0.8.3 or newer
MySQL database server version 5.x
Ruby-MySQL bindings version 2.7.x or 2.8.x
Install the packages
sudo apt-get install -y build-essential irb libmysql-ruby libmysqlclient-dev libopenssl-ruby libreadline-ruby mysql-server rake rdoc ri ruby ruby-dev
Install RubyGems package system
( URL="http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz" PACKAGE=$(echo $URL | sed "s/\.[^\.]*$//; s/^.*\///") cd $(mktemp -d /tmp/install_rubygems.XXXXXXXXXX) && \ wget -c -t10 -T20 -q $URL && \ tar xfz $PACKAGE.tgz && \ cd $PACKAGE && \ sudo ruby setup.rb )
Create gem as an alternative name for gem1.8
sudo update-alternatives --install /usr/bin/gem gem /usr/bin/gem1.8 1
Installing Puppet Dashboard
Install puppet-dashboard from puppetlabs package repository
sudo apt-get update sudo apt-get install puppet-dashboard
Configuring Dashboard
Modify the database.yml file. It can be found at /usr/share/puppet-dashboard/config/database.yml.
Under the key-value pairs for production environment, the database value 'dashboard_production' specifies the dashboard database name, and username value 'dashboard' specifies the user for this database. In the next step, we will create both the database and the user. password value is the password for MySQL.
Creating and Configuring MySQL database
Create the user and database for puppet-dashboard. Navigate to MySQL command line
CREATE DATABASE dashboard_production CHARACTER SET utf8; CREATE USER 'dashboard'#'localhost' IDENTIFIED BY 'my_password'; GRANT ALL PRIVILEGES ON dashboard_production.* TO 'dashboard'#'localhost';
Configure MySQL's maximum packet size to permit larger rows in database
set global max_allowed_packet = 33554432;
Also modify the mysql configuration file /etc/mysql/my.cnf
Allowing 32MB allows an occasional 17MB row with plenty of spare room
max_allowed_packet = 32M
To create dashboard tables, run the following command in the puppet-dashboard folder
cd /usr/share/puppet-dashboard rake RAILS_ENV=production db:migrate
Testing that Dashboard is working
Start the dashboard using Ruby’s built-in WEBrick server
cd /usr/share/puppet-dashboard
sudo ./script/server -e production
Dashboard instance starts on port 3000 using the “production” environment. Dashboard’s UI can be viewed at :3000
Configure puppet
Both the puppet server and client need to be configured for the dashboard to receive reports.
Configure agent nodes to submit reports to master by turning their reporting ON.
puppet.conf (on each agent)
[agent]
report = true
Configure the server. Add the http report handler to puppet server's reports setting and set reporturl to Dashboard instance’s reports/upload URL
puppet.conf (on puppet master)
[master]
reports = store, http
reporturl = http://<server hostname>:3000/reports/upload
For enabling dashboard's external node classifier(ENC),
puppet.conf (on puppet master)
[master]
node_terminus = exec
external_nodes = /usr/bin/env PUPPET_DASHBOARD_URL=http://<server hostname>:3000 /usr/share/puppet-dashboard/bin/external_node
Testing Puppet's connection to Dashboard
Restart the puppet master
Run one of the puppet agents to test the configurations
sudo puppet agent -t
The output will be:
This means that the report has arrived. To process it, we will activate the delayed_job workers.
Starting delayed_job workers
Run the following command
cd /usr/share/puppet-dashboard
sudo env RAILS_ENV=production script/delayed_job -p dashboard -n 1 -m start
This starts the delayed_job workers, and completes the pending task.
Thus, puppet is now installed on two EC2 instances, out of which one is server and the other is client. Also, puppet-dashboard is installed to view the status of the client nodes.

How did you setup your Django dev environment?

I'm trying to setup a local Django dev environment using VMs enabled with Vagrant but I'm not sure what's the best way to go about it.
I did a git clone for Django files from production server and installed all the modules that the production server has on my local VM. I wanted to avoid installing a database on my local VM but ran into some problems with the sessions. The local machine is using SESSION_COOKIE_DOMAIN='localhost' and the production is using SESSION_COOKIE_DOMAIN='.mydomain.com' so that creates some confusion.
Not to mention that on the setting.py on my dev environment I had to change IPs to point to the public IP address of the database (thus poking a hole on the security) while my production settings.py is using the local IPs so I ended up using different settings.py files.
I can continue experimenting with new methods but I really have to get going with the project and I'm pretty sure some people had this figured out already.
So how did you setup your Django dev environment?
I have a public repo on GitHub available here:
https://github.com/FlipperPA/djangovagrant
Instructions from the README.md:
Django / Python / MySQL
This is a Vagrant project for Django development.
This does not yet support berkshelf or librarian; all necessary repos are included in 'cookbooks'.
Prerequisites, all platforms:
Virtualbox https://www.virtualbox.org/wiki/Downloads
Vagrant http://downloads.vagrantup.com/
Pre-requisites, Windows only:
git-bash
ruby rvm
Fairly easy to get it running:
vagrant up
vagrant ssh djangovm
** (Note: You are now in the Virtualbox VM as superuser vagrant)
sudo apt-get install python-pip
** (Note: PIP is a Python package manager)
sudo apt-get install python-mysqldb
sudo pip install django
Starting a Django project:
django-admin.py startproject django_project
cd django_project
python manage.py runserver [::]:8000
The VM is configured to use port forwarding. If everything went right, you should be able to access the running server through the browser on your computer running the virtual machine at this url:
http://localhost:8001/
New to Django? Next steps? I highly recommend: http://www.tangowithdjango.com/
For more advanced topics, check out Two Scoops of Django: http://twoscoopspress.org/
there are a few django Apps that I've seen to manage this but I always prefer the following in my settings.py as the number of different configs are usually minimal
SITE_TYPE = environ.get( 'SITE_TYPE', 'DEV' )
if SITE_TYPE == 'LIVE':
DEBUG = False
DEFAULT_HOST = ''
else:
DEBUG = True
DEFAULT_HOST = '50.56.82.194'
EMAIL_HOST = DEFAULT_HOST
I can recommend this repository.
You can modify it to support Django projects.
Vagrantfile updates:
config.vm.define "web1", primary: true do |web1_config|
web1_config.ssh.forward_agent = true
# Create a private network, which allows host-only access to the machine
web1_config.vm.network "private_network", ip: "192.168.11.10"
web1_config.vm.hostname = "web1.#{domain}"
web1_config.vm.provision "shell", path: "provisioners/shell/python.setup.sh"
web1_config.vm.provision "shell", path: "provisioners/shell/application.setup.sh"
end
Then add a provisioners/shell/application.setup.sh file with the following content:
#!/bin/bash
local_user=vagrant
if [ ! -n "$(grep "^bitbucket.org " /home/$local_user/.ssh/known_hosts)" ]; then
ssh-keyscan bitbucket.org >> ~/.ssh/known_hosts 2>/dev/null;
fi
if [[ ! -d "/home/$local_user/app" ]]; then
git clone git#bitbucket.org:czerasz/sample-django-app.git /home/$local_user/app
chown -R $local_user:$local_user /home/$local_user/app
su - $local_user -c "source /usr/local/bin/virtualenvwrapper.sh && mkvirtualenv sample-django-app-env && workon sample-django-app-env && pip install -r /home/$local_user/app/requirements.txt"
fi
Change the repository address (git#bitbucket.org:czerasz/sample-django-app.git) and make also sure that you have a requirements.txt in the root of your git repository. Run vagrant up.
Vagrant will start two machines:
web1 with your django project
db1 with a PoestgreSQL database
If you still have issues add the following to your Vagrantfile:
web1_config.ssh.private_key_path = [ '~/.vagrant.d/insecure_private_key', '~/.ssh/bitbucket' ]
And execute this command on your host (the machine where you run vagrant):
ssh-add ~/.ssh/bitbucket
The ~/.ssh/bitbucket is the ssh private key which you use for bitbucket. It can be ~/.ssh/id_rsa or something different depending how you configured it.