I am currently running Google Cloud Composer version 2.0.9 with Airflow version 2.1.4. I am trying to install the most recent version of dbt (1.0.4 for core and 1.0.0 for the BigQuery plugin). Because the Cloud Composer images have specific packages pre-installed, I am running into conflicting PyPI dependencies: when I try to fix one dependency, another conflict appears. Does anyone know the specific set of packages that would resolve this? I have read the following community posts, but I wanted to know if anyone has a solution using just Composer:
How to run DBT in airflow without copying our repo
How to set up dbt with Google Cloud Composer?
I was able to reproduce the behaviour you are seeing. Below are the dependency conflicts I saw in the Cloud Build logs. These conflicts occur between the dbt-core requirements and the pre-installed package requirements in Composer.
Pre-installed package requirements:
hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. (can be installed manually)
flask 1.1.4 has requirement click<8.0,>=5.1, but you have click 8.1.2.
apache-airflow 2.1.4+composer has requirement markupsafe<2.0,>=1.1.1, but you have markupsafe 2.0.1.
looker-sdk 22.4.0 has requirement typing-extensions>=4.1.1, but you have typing-extensions 3.10.0.2.
dbt-core requirements:
hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. (can be installed manually)
dbt-core 1.0.4 has requirement click<9,>=8, but you have click 7.1.2.
dbt-core 1.0.4 has requirement MarkupSafe==2.0.1, but you have markupsafe 1.1.1.
dbt-core 1.0.4 has requirement typing-extensions<3.11,>=3.7.4, but you have typing-extensions 4.1.1.
I tried downgrading the pre-installed packages, but subsequent package installations failed, and downgrading pre-installed packages is not recommended in any case.
Therefore, I would suggest using an external solution, as stated in the thread you linked. Quoting the workarounds given in @Ryan Yuan's answer there:
Using external services to run dbt jobs, e.g. Cloud Run.
Using Composer's KubernetesPodOperator (updated Composer 2 link); a minimal sketch is shown after this list. My colleague has put up a nice article on dbt Discourse here going through the setup process.
Ignoring Composer's dependency conflicts by setting Composer's environment variable IGNORE_PYPI_DEPENDENCY_CONFLICTS to True.
However, I don't recommend this, as it may cause issues.
Creating a Python virtual environment in Composer and installing the dbt packages there.
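To illustrate the KubernetesPodOperator route, here is a minimal sketch, assuming a container image that bundles dbt-bigquery; the image tag, namespace, and profiles path are placeholders, not values from the original setup (the import path matches the cncf.kubernetes provider versions contemporary with Airflow 2.1.4):

from datetime import datetime
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="dbt_via_kubernetes_pod",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    # dbt runs inside its own image, so its dependencies never
    # touch Composer's pre-installed packages.
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        namespace="default",  # placeholder; use your cluster's namespace
        image="ghcr.io/dbt-labs/dbt-bigquery:1.0.0",  # placeholder image tag
        cmds=["dbt"],
        arguments=["run", "--profiles-dir", "/dbt/profiles"],  # placeholder path
    )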
As mentioned by @Kabilan Mohanraj, the current version of dbt (1.0.4) has dependency conflicts with more recent versions of Composer (in my case, Composer version 2.0.9 with Airflow version 2.1.4), so an alternative solution is needed. I played around and searched for solutions from other people in the community, and found one person using a particular combination of Composer and dbt versions that had only minimal dependency issues. However, as mentioned by @Kabilan Mohanraj, Google does not recommend downgrading pre-installed packages, so this would not be a viable solution for production.
Create the Composer environment through gcloud to use an older image version that is not available via the Composer UI:
gcloud composer environments create my_airflow_dbt_example \
    --location us-central1 \
    --image-version composer-1.17.9-airflow-2.1.4
Then install the following PyPI requirements:
dbt-bigquery==0.21.0
jsonschema==3.1.1
packaging==20.9
For this specific Composer version, you are downgrading jsonschema from 3.2.0 to 3.1.1 and packaging from 21.3 to 20.9.
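With dbt-bigquery installed into the environment itself, a plain BashOperator is enough to trigger dbt from a DAG. A minimal sketch; the project and profiles paths below are placeholders (Composer 1 mounts the environment bucket's data/ folder at /home/airflow/gcs/data):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_via_bash",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # dbt is on the PATH because dbt-bigquery was installed
    # as a PyPI package in the Composer environment.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "dbt run "
            "--project-dir /home/airflow/gcs/data/dbt "  # placeholder paths
            "--profiles-dir /home/airflow/gcs/data/dbt"
        ),
    )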
Related
I am using Cloud Foundry's nodejs profile, and my package.json requires chartjs-node-canvas. That package uses node-canvas, and node-canvas is based on Cairo. The node-canvas site says I have to add the cairo-devel package to Linux (via apt-get) in order for canvas to be installed.
Is it possible to add software to the OS image running on cloud foundry? If so, how?
You can do that by vendoring the dependencies. When you vendor them, you build locally in an Ubuntu Bionic Linux container or VM. Node will build everything that's required, and you will no longer need the cairo-devel package at runtime (it's only needed at build time).
The process to vendor dependencies is documented here.
The other option is to use the Apt Buildpack which is described on this SO post. That can be used to install any apt packages.
I have installed OpenStack using PackStack on CentOS-7. I need to successfully install the Tacker service with OpenStack PackStack on CentOS-7.
Any help on this would be helpful for me.
Thanks.
Follow the official OpenStack guide to install Tacker: https://docs.openstack.org/tacker/latest/install/manual_installation.html
There are .rpm packages in the RDO repositories called openstack-tacker and openstack-tacker-common; install them and configure the service following the official guide.
Tacker also requires some other services (Mistral and Barbican), which need to be installed and configured. There are other deployment options that support deploying Tacker and its dependencies in a single config, for example kolla-ansible.
I am a scientist who is exploring the use of Dask on Amazon Web Services. I have some experience with Dask, but none with AWS. I have a few large custom task graphs to execute, and a few colleagues who may want to do the same if I can show them how. I believe that I should be using Kubernetes with Helm because I fall into the "Try out Dask for the first time on a cloud-based system like Amazon, Google, or Microsoft Azure" category.
I also fall into the "Dynamically create a personal and ephemeral deployment for interactive use" category. Should I be trying native Dask-Kubernetes instead of Helm? It seems simpler, but it's hard to judge the trade-offs.
In either case, how do you provide Dask workers with a uniform environment that includes your own Python packages (not on any package index)? The solutions I've found suggest that packages need to be on a pip or conda index.
Thanks for any help!
Use Helm or Dask-Kubernetes?
You can use either. Generally starting with Helm is simpler.
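If you do want to try native Dask-Kubernetes, a minimal sketch using the classic dask-kubernetes API looks roughly like this; the image and resource values are placeholders, and the API has evolved since, so check the current dask-kubernetes docs:

from dask.distributed import Client
from dask_kubernetes import KubeCluster, make_pod_spec

# Describe the worker pods; a custom image is the natural place
# to bake in private Python packages.
pod_spec = make_pod_spec(
    image="daskdev/dask:latest",  # placeholder image
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
)

cluster = KubeCluster(pod_spec)
cluster.scale(10)          # request ten workers
client = Client(cluster)   # point the Dask session at the cluster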
How to include custom packages
You can install custom software using pip or conda. Packages don't need to be on PyPI or the Anaconda default channel; you can point pip at other indexes or version-control repositories, and conda at other channels. Here is an example installing software with pip from GitHub:
pip install git+https://github.com/username/repository@branch
For small custom files you can also use the Client.upload_file method.
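For example (the scheduler address and file name are placeholders):

from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Ships the local file to every worker and makes it importable there.
client.upload_file("my_module.py")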
I am using Cloud Composer and I noticed that it selects the version of Apache Airflow and Python (2.7.x) for me. I want to use a different version of Airflow and/or Python. How can I change this?
Cloud Composer deploys the latest stable build of Airflow. New versions of Airflow are usually deployed by Composer within a few weeks of their stable release. The Airflow version deployed and the Python version installed cannot be changed at this time. A future release of Cloud Composer may offer the ability to select the Airflow and/or Python version for new environments.
If you want to deploy a specific version of Airflow you will need to use the gcloud CLI tool in order to specify this. It is not currently possible to do this from the web front end.
Have a look at the following page to see the available versions: https://cloud.google.com/composer/docs/concepts/versioning/composer-versions
If you would like to deploy, say, Airflow 1.10 and Python 3 to your environment, you would use the
--image-version
--python-version
flags in order to set this. For example, the following would install Composer 1.4.1 with Airflow 1.10 and Python 3:
gcloud beta composer environments create ENV_NAME --image-version composer-1.4.1-airflow-1.10.0 --python-version 3
You will need to specify all the other parameters and arguments required for the environment as well. The above only shows the two arguments to set the Airflow and Python versions.
I have a specific version of google-cloud-core on my server, and I don't know how to choose versions of the other Google Cloud libraries that are compatible with it.
I can't just move to the latest versions, because my old programs cannot run against the new versions, for example of the BigQuery library.
Specifically, I need to know which version of google-cloud-storage I should choose to go with core version 0.26.0.
Are there repositories where we can find packages grouped by google-cloud-core version?
In my case, version 1.6 of google-cloud-storage works, but I found it only by the downgrade-and-try-again method!
Best regards
What you can do as a workaround is create a virtual environment, install a specific library version - like google-cloud-storage - and check the dependencies installed with that version. I made a quick test and installed a few versions of google-cloud-storage; for version 1.3.0, the google-cloud-core 0.26.0 dependency was installed.
You can do so by following these steps:
virtualenv env-name
source env-name/bin/activate
pip freeze (to check there is nothing there)
pip install google-cloud-storage==1.3.0
pip freeze (again)
Once finished, you’ll see that google-cloud-core 0.26.0 was installed.
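If you prefer to check from Python rather than scanning the pip freeze output, here is a small sketch using the standard library (Python 3.8+; earlier versions need the importlib-metadata backport):

from importlib.metadata import version

# Print the versions pip resolved inside the virtual environment.
for pkg in ("google-cloud-storage", "google-cloud-core"):
    print(pkg, version(pkg))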