Accessing Airflow REST API in AWS Managed Workflows? - amazon-web-services

I have Airflow running in AWS MWAA and I would like to access its REST API. There are two ways to do this, but neither seems to work for me.
Overriding api.auth_backend. This used to work, but AWS MWAA no longer allows you to add this option; it is on the blocklist and gets rejected.
api.auth_backend = airflow.api.auth.backend.default
Using the MWAA CLI (Python). This doesn't work if any of the DAGs use packages that are listed in the requirements.txt file.
a. As an example, I have "paramiko" in requirements.txt because I have a task that uses SSHOperator. The MWAA CLI fails with "no module paramiko".
b. This is also noted here: https://docs.aws.amazon.com/mwaa/latest/userguide/access-airflow-ui.html
"Any command that parses a DAG (such as list_dags, backfill) will fail if the DAG uses plugins that depend on packages that are installed through requirements.txt."

We are using MWAA 2.0.2 and managed to use Airflow's REST API through the MWAA CLI, basically following the instructions and sample code in the Apache Airflow CLI command reference. You'll notice that not all REST API calls are supported, but many of them are (even when you have a requirements.txt in place).
Also have a look at the AWS sample code on GitHub.
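For reference, here is a minimal sketch of that CLI-token approach (the environment name, region and DAG id are placeholders): a CLI token is requested with boto3 and an Airflow CLI command is POSTed to the MWAA web server, which returns base64-encoded stdout/stderr.

import base64
import json

import boto3
import requests

# Request a short-lived CLI token for the environment (name/region are examples).
mwaa = boto3.client("mwaa", region_name="us-east-1")
token = mwaa.create_cli_token(Name="my-mwaa-environment")

# POST an Airflow 2.x CLI command to the MWAA CLI endpoint.
resp = requests.post(
    f"https://{token['WebServerHostname']}/aws_mwaa/cli",
    headers={
        "Authorization": f"Bearer {token['CliToken']}",
        "Content-Type": "text/plain",
    },
    data="dags trigger example_dag",
)

result = json.loads(resp.text)
print(base64.b64decode(result["stdout"]).decode())
print(base64.b64decode(result["stderr"]).decode())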

Related

EMR Serverless Airflow Operator not allowing EMR custom images

I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13, but the emr-6.9.0 release ships with Scala 2.12. I created a fat JAR including all the Spark dependencies, but that doesn't work either. As an alternative, I am trying to use an EMR custom image by creating an application using --image-configuration with the Airflow operator, but it doesn't pass all of the arguments through to the boto API.
create_app = EmrServerlessCreateApplicationOperator(
    task_id="create_my_app",
    job_type="SPARK",
    release_label="emr-6.9.0",
    config={
        "name": "data-ingestion",
        "imageConfiguration": {
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
        },
    },
)
Airflow gives the following error message:
Unknown parameter in input: "imageConfiguration", must be one of:
name, releaseLabel, type, clientToken, initialCapacity, maximumCapacity, tags, autoStartConfiguration, autoStopConfiguration, networkConfiguration
This other config won't work either:
config={"name": "data-ingestion",
"imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"})
Does anybody have any ideas other than downgrading my Scala version?
The Airflow operator passes the arguments to the boto3 client, and this client creates the application.
The imageConfiguration parameter was added to the boto3 client in 1.26.44 (PR), and the other configuration options were added in different versions (please check the changelog).
So you can try to upgrade the version of boto3 on your Airflow server, provided it is compatible with the other dependencies; if not, you may need to upgrade your Airflow version.
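As a quick check before upgrading, a sketch like the following (region chosen arbitrarily) prints the installed boto3/botocore versions and whether the local emr-serverless service model already accepts imageConfiguration:

import boto3
import botocore

print("boto3:", boto3.__version__)
print("botocore:", botocore.__version__)

# Inspect the local service model; on versions that are too old this client
# may not even know the "emr-serverless" service yet.
client = boto3.client("emr-serverless", region_name="eu-west-1")
operation = client.meta.service_model.operation_model("CreateApplication")
print("imageConfiguration" in operation.input_shape.members)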

Serverless and AWS deploy issue

I have to update a website on AWS using serverless deploy.
This website was not created by me; it's the first time I've worked with Serverless and AWS solutions.
I have the source code, deploy files, etc. from the last person in charge.
I run a before-deploy.js script to create all the local files and check them to see if the updates went OK. Everything's fine.
But any time I try to deploy using the simple command "serverless deploy", it fails with this error:
CREATE_FAILED: MainStaticSite (AWS::S3::Bucket)
“mywebsite.com” already exists
I don't really understand this error: I know the website already exists, but I just want to update it.
I tried more specific commands like:
serverless deploy -v --stage production --region eu-west-1
But this one only shows this output:
Framework Core: 3.10.1
Plugin: 6.2.0
SDK: 4.3.2
and it doesn't update the website.
I changed the keys on AWS; maybe that's the cause?
It looks like it doesn't want to overwrite the existing files, but I have no idea why.
If someone has an answer or a lead, thank you :)

AWS cloudformation: How to run cfn-nag locally in Windows

I have a CloudFormation template with all the resources and details for the project.
I have cfn-lint set up locally and it runs perfectly fine. However, when I push code changes, the build fails at the deployment stage because cfn-nag flags some simple issues that could easily be fixed.
I'm using a Windows machine and I need a way to run cfn-nag locally, so that I can check the template just like with cfn-lint and fix issues locally instead of waiting 40 minutes for the build to reach the deployment stage.
I referred to several posts online and found the two below helpful:
https://stelligent.com/2018/03/23/validating-aws-cloudformation-templates-with-cfn_nag-and-mu/
https://github.com/stelligent/cfn_nag
What is the difference between cfn-nag and cfn-lint, and why does lint not fail on what cfn-nag is complaining about?
The above links have some instructions for Ruby and Brew, but I'm using Node.js and felt lost. Please help.
cfn-nag looks for patterns in AWS CloudFormation templates that may indicate insecure infrastructure, for example:
IAM rules that are too permissive (wildcards),
security group rules that are too permissive (wildcards),
access logs that aren't enabled,
encryption that isn't enabled.
cfn-lint scans the AWS CloudFormation template by processing a collection of rules, where every rule handles a specific check or validation of the template. It validates against the AWS CloudFormation resource specification.
This collection of rules can be extended with custom rules using the --append-rules argument.
Examples: whitespace, alignment (YAML), type checks, valid values for resource properties, and other best practices.
The two links you provided above have all the information needed, just not aimed directly at a Node.js developer on a Windows machine.
Step 1: Pull the docker image stelligent/cfn-nag.
Step 2: Add a script for cfn-nag to your package.json, for example:
"scripts": {
  "cfn:nag": "cfn-nag"
}
If you're using docker-compose.yml, add the cfn-nag image details to it like below:
cfn-nag:
  image: "stelligent/cfn-nag"
  volumes:
    - ./path_of_cfn_file_to_copy:/path_to_copy_to
  command: ${COMMAND:-/path_to_copy_to/cfn_file}
Then set the script in package.json to run via docker-compose:
"cfn:nag": "docker-compose run --rm cfn-nag"

GCP, Composer, Airflow, Operators

Google Cloud Composer uses Cloud Storage to store Apache Airflow DAGs. However, where are the operators stored? I am getting the error below:
Broken DAG: [/home/airflow/gcs/dags/example_pubsub_flow.py] cannot import name PubSubSubscriptionCreateOperator.
This operator was added in Airflow 1.10.0. As of today, Cloud Composer is still using Airflow 1.9.0, hence this operator is not available yet. You can add it as a plugin.
Apparently, according to this message in the Composer Google Group, to install a contrib operator as a plugin you don't need to add the plugin boilerplate.
It is enough to register the plugin via this command:
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=custom_operators.py
See here for details.
The drawback is that if your contrib operator uses other operators, you will also have to copy those and modify the way they are imported in Python, using:
from my_custom_operator import MyCustomOperator
instead of:
from airflow.contrib.operators.my_custom_operator import MyCustomOperator
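As an illustration (the module and operator names are hypothetical, mirroring the import above), once the file has been imported as a plugin a DAG can use the operator directly:

from datetime import datetime

from airflow import DAG
from my_custom_operator import MyCustomOperator  # no airflow.contrib prefix needed

with DAG(
    dag_id="custom_operator_example",
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,
) as dag:
    # Constructor arguments depend on your operator; only task_id is assumed here.
    run_custom = MyCustomOperator(task_id="run_custom_operator")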

AWS EMR script-runner access error

I'm running emr-5.12.0 with Amazon 2.8.3, Hive 2.3.2, Hue 4.1.0, Livy 0.4.0, Spark 2.2.1 and Zeppelin 0.7.3, with one m4.large as my master node and one m4.large as a core node.
I am trying to execute a bootstrap action that configures some parts of the cluster. One of these includes the line:
sudo sed -i '/zeppelin.pyspark.python/c\ \"zepplin.pyspark.python\" : \"python3\",' /etc/alternatives/zeppelin-conf/interpreter.json
It makes sure that Zeppelin uses Python 3.4 instead of Python 2.7. It works fine if I execute it in the terminal after SSHing to the master node, but it fails when I submit it as a Custom JAR step through the AWS web interface. I get the following error:
sed: can't read /etc/alternatives/zeppelin-conf/interpreter.json: No such file or directory
Command exiting with ret '2'
The same thing happens if I use
sudo sed -i '/zeppelin.pyspark.python/c\ \"zepplin.pyspark.python\" : \"python3\",' /etc/zeppelin-conf/interpreter.json
Obviously I could just change it from the Zeppelin UI, but I would like to include it in the bootstrap action.
Thanks!
It turns out that a bootstrap action submitted through the AWS EMR web interface is submitted as a regular EMR step, so it is only run on the master node. You can see this if you click 'AWS CLI export' in the cluster web interface: the intended bootstrap action is listed as a regular step.
Using the command line to launch a cluster with a bootstrap action avoids this problem, so I've just used that.
Edit: Looking back at the web interface, it's pretty clear that I was adding regular steps instead of bootstrap actions. My bad!
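For completeness, here is a sketch of launching such a cluster programmatically with a genuine bootstrap action (bucket, script path, roles and instance settings are placeholders); boto3's run_job_flow accepts a BootstrapActions list, and those scripts run on every node as the cluster is provisioned:

import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="zeppelin-python3-cluster",
    ReleaseLabel="emr-5.12.0",
    Applications=[{"Name": "Spark"}, {"Name": "Zeppelin"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 2,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    BootstrapActions=[
        {
            "Name": "configure-zeppelin-python3",
            "ScriptBootstrapAction": {
                # The script in S3 would carry the configuration commands from the question.
                "Path": "s3://my-bucket/bootstrap/configure_zeppelin.sh",
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])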