Is there any way to install Airflow on Kubernetes on-premise? - airflow-scheduler

I am new to Airflow and need assistance on how to install Airflow on k8s.
My needs are:
1. How to build a Docker image of Airflow for only the webserver and scheduler
2. How to build a separate Docker image for MySQL
3. How to create airflow.cfg with the Kubernetes executor
4. Any sample would be appreciated.

If you mean Apache Airflow - you can just use the existing Helm chart, described here: https://airflow.apache.org/docs/apache-airflow/stable/kubernetes.html
Unfortunately, from your description it is not clear what kind of problem you want to solve.
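If the Helm chart route fits your needs, here is a minimal install sketch (the release name and namespace are placeholders; the chart repo URL is the official one from the Airflow docs):
# Add the official Apache Airflow chart repository
helm repo add apache-airflow https://airflow.apache.org
helm repo update
# Install the chart; it deploys the scheduler and webserver as separate pods
helm install airflow apache-airflow/airflow --namespace airflow --create-namespace
The chart also provisions a metadata database for you (PostgreSQL by default, with an option to point at an external database instead), so points 1-3 don't require hand-building images.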

Average CI/CD duration on GitLab

I'm new to GitLab and AWS EC2 and still learning.
I have a pipeline on GitLab to deploy an n-tier app with two Angular frontends and a backend in Java Spring Boot + PostgreSQL.
I do the ng build on the GitLab server and then push it to my EC2 with rsync; for the Spring part I do the .tar generation on my EC2 with some caching, so it takes less than 20 s to fully update the backend and there's not much of a problem there.
Each part is containerized under Docker with docker-compose, and I have to run docker-compose build --no-cache to pick up fresh data.
It takes 3 min 38 s to fully deploy all parts of my app, and around 2 min if I want to deploy only one part (for example, updating just one of my frontends with a specific job).
Is that too long, or am I around an average time for CI/CD?
If you know a better or more standard way to push a containerized app to production on AWS EC2, feel free to tell me.
Thanks!

How can I run a beta gcloud component like "gcloud beta artifacts docker images scan" within Cloud Build?

I am trying to include the Container Analysis API in a Cloud Build pipeline. This is a beta component, and on the command line I need to install it first:
gcloud components install beta local-extract
Then I can run the on-demand container analysis (if the container is present locally):
gcloud beta artifacts docker images scan ubuntu:latest
My question is how I can use a component like beta local-extract within Cloud Build.
I tried a first step that installs the missing component:
## Update components
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['components', 'install', 'beta', 'local-extract', '-q']
  id: Update component
but as soon as I move to the next step the installation is gone (since it is not persisted in the container).
I also tried to install the component and then run the scan in the same step (chaining with & or ;), but it fails:
## Run vulnerability scan
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['components', 'install', 'beta', 'local-extract', '-q', ';', 'gcloud', 'beta', 'artifacts', 'docker', 'images', 'scan', 'ubuntu:latest', '--location=europe']
  id: Run vulnerability scan
and I get:
Already have image (with digest): gcr.io/cloud-builders/gcloud
ERROR: (gcloud.components.install) unrecognized arguments:
;
gcloud
beta
artifacts
docker
images
scan
ubuntu:latest
--location=europe (did you mean '--project'?)
To search the help text of gcloud commands, run:
gcloud help -- SEARCH_TERMS
So my questions are:
How can I run "gcloud beta artifacts docker images scan ubuntu:latest" within Cloud Build?
Bonus: from the previous command, how can I get the "scan" output value that I will need to pass as a parameter to my next step? (I guess it should be something with --format.)
You should try the cloud-sdk docker image:
https://github.com/GoogleCloudPlatform/cloud-sdk-docker
The Cloud Build team (implicitly?) recommends it:
https://github.com/GoogleCloudPlatform/cloud-builders/tree/master/gcloud
With the cloud-sdk-docker container you can change the entrypoint to bash and pipe gcloud commands together. Here is an (ugly) example:
https://github.com/GoogleCloudPlatform/functions-framework-cpp/blob/d3a40821ff0c7716bfc5d2ca1037bcce4750f2d6/ci/build-examples.yaml#L419-L432
As to your bonus question: yes, --format=value(the.name.of.the.field) is probably what you want. The trick is knowing the name of the field. I usually start with --format=json on my development workstation to figure out the name.
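Putting both ideas together, a sketch of such a step (the alpine tag is used because the component manager is disabled on some cloud-sdk image variants, and the response.scan field name is an assumption to verify with --format=json first):
## Install the beta component and run the scan inside a single bash shell
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:alpine'
  entrypoint: 'bash'
  args:
    - '-c'
    - |
      gcloud components install beta local-extract -q
      # Capture the scan resource name so a later step can read it from /workspace
      gcloud beta artifacts docker images scan ubuntu:latest --location=europe \
        --format='value(response.scan)' > /workspace/scan_id.txt
  id: Run vulnerability scan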
The problem comes from Cloud Build. It caches some often-used images, and if you want to use a brand-new feature of the gcloud CLI, the cache can be too old.
I performed a test tonight: the version in cache is 326, while 328 has just been released. So the cached version is about 2 weeks old, maybe too old for your feature. It could be worse in your region!
The solution is to explicitly request the latest version:
Go to this URL: gcr.io/cloud-builders/gcloud
Copy the latest version
Paste the full version name into the step of your Cloud Build pipeline.
The side effect is a longer build: because this latest image isn't cached, it has to be downloaded by Cloud Build.
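For example, a pinned step would look like this (LATEST_TAG is a placeholder for the version string you copied from the registry):
## Pin the builder image to an explicit version instead of the cached one
- name: 'gcr.io/cloud-builders/gcloud:LATEST_TAG'
  args: ['beta', 'artifacts', 'docker', 'images', 'scan', 'ubuntu:latest', '--location=europe']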

How to restart webserver in Cloud Composer

We recently hit a known issue on Airflow:
Airflow "This DAG isn't available in the webserver DagBag object"
For now we use a temporary workaround: restarting the whole environment by changing configurations, but this is not an efficient method.
The best workaround we can think of is to restart the webservers on Cloud Composer, but we didn't find any command to restart the webserver. Is that a possible action?
Thanks!
For those who wander in and find this thread: for versions >= 1.13.1, Composer currently has a preview feature for a web server restart.
Only certain types of updates cause the webserver container to be restarted, such as adding, removing, or upgrading one of the PyPI packages, or changing an Airflow setting.
You can do for example:
# Set some arbitrary Airflow config value to force a webserver rebuild.
gcloud composer environments update ${ENVIRONMENT_NAME} \
--location=${ENV_LOCATION} \
--update-airflow-configs=dummy=true
# Remove the previously set config value.
gcloud composer environments update ${ENVIRONMENT_NAME} \
--location=${ENV_LOCATION} \
--remove-airflow-configs=dummy
From Google Cloud Docs:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION
I finally found an alternative solution!
Based on this document:
https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver
we can run a self-managed Airflow webserver on Kubernetes (yes, you can throw away the built-in webserver). Then we can kill the webserver pods to force a restart =)
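A minimal sketch of that pod restart, assuming kubectl access to the cluster running the self-managed webserver (the cluster name, zone, namespace, and pod name are placeholders):
# Point kubectl at the GKE cluster that runs the webserver
gcloud container clusters get-credentials CLUSTER_NAME --zone=ZONE
# Find the webserver pod, then delete it; its Deployment recreates it, which is effectively a restart
kubectl get pods --namespace=NAMESPACE | grep webserver
kubectl delete pod WEBSERVER_POD_NAME --namespace=NAMESPACE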
DAGs can also be retrieved from the console; we can list all the DAGs present, and there are other commands too.
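For example, a sketch reusing the environment variables from above (list_dags is the Airflow 1.x CLI subcommand; newer Airflow versions use dags list instead):
# List every DAG the environment has parsed
gcloud composer environments run ${ENVIRONMENT_NAME} \
--location=${ENV_LOCATION} \
list_dags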

Zeppelin pyspark interpreter not able to submit application in YARN

Environment: AWS EMR emr-5.11.1, Zeppelin 0.7.3, Spark 2.2.1
Problem: the Zeppelin pyspark interpreter is not submitting jobs as applications in YARN.
As per this, I have made the following changes, with no effect:
set SPARK_HOME
added spark.executor.memory=5g, spark.cores.max,
master=yarn-client, spark.home in the pyspark interpreter tab in Zeppelin
added spark.dynamicAllocation.enabled=true in yarn-site.xml
restarted the interpreter and the Zeppelin process
Please help.
Solution 1
I had the same problem; please upgrade to 0.8.0, the newest version solves it.
Solution 2
Edit $ZEPPELIN_HOME/conf/zeppelin-env.sh and add export SPARK_SUBMIT_OPTIONS="--num-executors 10 --driver-memory 8g --executor-memory 10g --executor-cores 4".
If you don't have zeppelin-env.sh, copy zeppelin-env.sh.template and rename it to zeppelin-env.sh.
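A minimal sketch of the relevant zeppelin-env.sh lines (the SPARK_HOME path is EMR's usual default and is an assumption; tune the resource numbers to your cluster):
# $ZEPPELIN_HOME/conf/zeppelin-env.sh
export SPARK_HOME=/usr/lib/spark
# Submit through YARN in client mode so notebook jobs appear as YARN applications
export MASTER=yarn-client
export SPARK_SUBMIT_OPTIONS="--num-executors 10 --driver-memory 8g --executor-memory 10g --executor-cores 4"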
Solution 3
Edit $SPARK_CONF_DIR/spark-defaults.conf and add or modify what you want.
After that, restart your server.
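For example, a sketch of spark-defaults.conf entries matching the settings attempted in the question (values are illustrative):
# $SPARK_CONF_DIR/spark-defaults.conf
spark.master yarn
spark.submit.deployMode client
spark.executor.memory 5g
# Note: this property belongs here (or in the Spark conf), not in yarn-site.xml
spark.dynamicAllocation.enabled true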

Setting up jupyterhub docker using one of the jupyter stacks

I'm trying to get a JupyterHub up and running. Python 2.7 kernels are required, so basically anything from the docker-stacks repo would be great. The documentation mentions that it can work with JupyterHub using DockerSpawner, but I can't quite see how it all fits together. Is anyone aware of a simple step-by-step guide to get this working?
To use any docker image, first pull it from Docker Hub: docker pull jupyter/scipy-notebook
Now install dockerspawner: pip install dockerspawner
Add the necessary lines to jupyterhub_config.py
(https://github.com/jupyterhub/dockerspawner/blob/master/README.md)
To spawn a specific docker image, this line does the magic: c.DockerSpawner.image = 'jupyter/scipy-notebook'
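Putting it together, a minimal jupyterhub_config.py sketch (the network settings are assumptions; adjust them for your deployment):
# jupyterhub_config.py
# Spawn each user's server as a Docker container
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
# Single-user servers run this docker-stacks image
c.DockerSpawner.image = 'jupyter/scipy-notebook'
# Remove stopped containers so stale ones don't block respawns
c.DockerSpawner.remove = True
# Make the hub API reachable from inside the spawned containers
c.JupyterHub.hub_ip = '0.0.0.0'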