I'm trying to figure out how to properly add a Spark step to my AWS EMR cluster from the command line with the AWS CLI.
Some background:
I have a large dataset (thousands of .csv files) that I need to read in and analyze. I have a python script that looks something like:
analysis_script.py
import pandas as pd
from pyspark.sql import SQLContext, DataFrame
from pyspark.sql.types import *
from pyspark import SparkContext
import boto3
#Spark context
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
df = sqlContext.read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").load("s3n://data_input/*csv")
def analysis(df):
    # do bunch of stuff. Create output dataframe
    return df_output
df_output = analysis(df)
df_output.save_as_csv_to_s3_somehow
I want the output csv file to go to the directory s3://dataoutput/
Do I need to add the .py file to a jar or something? What command do I use to run this analysis utilizing my cluster nodes, and how do I get the output to the correct directory? Thanks.
I launch the cluster using:
aws emr create-cluster --release-label emr-5.5.0 \
--name PySpark_Analysis \
--applications Name=Hadoop Name=Hive Name=Spark Name=Pig Name=Ganglia Name=Presto Name=Zeppelin \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r3.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=r3.xlarge \
--region us-west-2 \
--log-uri s3://emr-logs-zerex/ \
--configurations file://./zeppelin-env-config.json \
--bootstrap-actions Name="Install Python Packages",Path="s3://emr-code/bootstraps/install_python_packages_custom.bash"
I usually use the --steps parameter of aws emr create-cluster, which can be specified like --steps file://mysteps.json. The file looks like this:
[
    {
        "Type": "Spark",
        "Name": "KB Spark Program",
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "Args": [
            "--verbose",
            "--packages",
            "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1,com.amazonaws:aws-java-sdk-s3:1.11.27,org.apache.hadoop:hadoop-aws:2.7.2,com.databricks:spark-csv_2.11:1.5.0",
            "/tmp/analysis_script.py"
        ]
    },
    {
        "Type": "Spark",
        "Name": "KB Spark Program",
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "Args": [
            "--verbose",
            "--packages",
            "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1,com.amazonaws:aws-java-sdk-s3:1.11.27,org.apache.hadoop:hadoop-aws:2.7.2,com.databricks:spark-csv_2.11:1.5.0",
            "/tmp/analysis_script_1.py"
        ]
    }
]
You can read more about steps here. I use the bootstrap script to load my code from S3 into /tmp and then specify the steps of execution in the file.
As for writing to S3, here is a link that explains that.
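To make the missing pieces concrete, here is a minimal sketch of what the read and the write to s3://dataoutput/ could look like on EMR's Spark 2.x. The bucket names come from the question; everything else is an assumption on my part, not the content of the linked answer:

from pyspark.sql import SparkSession

# Sketch only: Spark 2.x on EMR reads and writes S3 directly through EMRFS (s3:// URIs).
spark = SparkSession.builder.appName("analysis").getOrCreate()

df = spark.read.csv("s3://data_input/*.csv", header=True, inferSchema=True)

def analysis(df):
    # stand-in for the question's analysis(); returns the output dataframe
    return df

df_output = analysis(df)

# Writes a directory of part-*.csv files under the prefix; call .coalesce(1) first if a single file is wanted.
df_output.write.mode("overwrite").csv("s3://dataoutput/", header=True)

Submitted as a Spark step (as in the JSON above), this runs through spark-submit on YARN, so the work is spread across the cluster nodes; there is no need to package the .py file into a jar.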
Related
I have a python script that launches a Spark step on an EMR cluster.
This is the definition of the step:
Steps=[{
    'Name': f"remove_num_{source['target_table'].upper()}",
    'ActionOnFailure': 'CONTINUE',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': ["spark-submit",
                 "--deploy-mode", "cluster",
                 "--master", "yarn",
                 "--conf", "spark.yarn.submit.waitAppCompletion=false",
                 "/home/hadoop/remove_row_num.py",
                 source['target_table'].upper(),
                 HDFS_PATH,
                 IMPORT_BUCKET,
                 S3_KEY]
    }
}])
The script works fine and the step is correctly created and executed.
The problem is that the Spark job takes a few minutes to complete, while the status of the EMR step turns to Completed about 10 seconds after the Spark job is triggered, not at the end of the Spark processing.
Is there a way to make the EMR step turn to Completed only when the Spark job has actually finished?
Thanks,
Luca
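For what it's worth (this is my reading, not an answer from the thread): spark.yarn.submit.waitAppCompletion=false tells spark-submit in YARN cluster mode to exit as soon as the application is submitted, which is exactly when command-runner.jar, and therefore the EMR step, reports Completed. A hedged sketch of the same step with that flag flipped back to its default:

# Sketch only: the placeholder values stand in for the question's variables.
source = {"target_table": "my_table"}
HDFS_PATH, IMPORT_BUCKET, S3_KEY = "<hdfs-path>", "<import-bucket>", "<s3-key>"

Steps = [{
    'Name': f"remove_num_{source['target_table'].upper()}",
    'ActionOnFailure': 'CONTINUE',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': [
            "spark-submit",
            "--deploy-mode", "cluster",
            "--master", "yarn",
            # default is true: spark-submit then blocks until the YARN application
            # finishes, so the EMR step status tracks the Spark job, not the submission
            "--conf", "spark.yarn.submit.waitAppCompletion=true",
            "/home/hadoop/remove_row_num.py",
            source['target_table'].upper(), HDFS_PATH, IMPORT_BUCKET, S3_KEY,
        ],
    },
}]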
Looking for a way to pause a GitLab job and resume it based on AWS Lambda input.
Due to restrictive permissions in my organization, below is my current CI workflow:
In the diagram above: on a push event, a Lambda triggers the GitLab job through a webhook. The GitLab job only fetches the latest code, zips the files, and copies the archive to a certain S3 bucket. AFTER the GitLab job is finished, the same Lambda triggers a CodeBuild build, which fetches the latest zip file from that S3 bucket, creates the UI chunk files, and pushes the artifacts to a different S3 bucket.
### gitlab-ci.yml ###
variables:
  environment: <aws-account-number>

stages:
  - get-latest-code

get-latest-code:
  stage: get-latest-code
  script:
    - zip -r project.zip $(pwd)/*
    - export PATH=$PATH:/tmp/project/.local/bin
    - pip install awscli
    - aws s3 cp $(pwd)/project.zip s3://project-input-bucket-dev
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event" || $CI_PIPELINE_SOURCE == "push"'
### lambda code ###
def runner_lambda_handler(event, context):
    cb = boto3.client('codebuild')
    builds_dir = os.environ.get('BUILDS_DIR', '/tmp/project/builds')
    logger.debug("STARTING GIT LAB RUNNER")
    gitlab_runner_cmd = f"gitlab-runner --debug run-single -u https://git.company.com/ -t {token} " \
                        f"--builds-dir {builds_dir} --max-builds 1 " \
                        f"--wait-timeout 900 --executor shell"
    s3_libraries_loader.subprocess_cmd(gitlab_runner_cmd)
    cb.start_build(projectName='PROJECT-Deploy-dev')
    return {
        "statusCode": 200,
        "body": "Gitlab build success."
    }
#### Codebuild Stack ####
codebuild.Project(self, f"Project-{env_name}",
    project_name=f"Project-{env_name}",
    role=codebuild_role,
    environment_variables={
        "INPUT_S3_ARTIFACTS_BUCKET": {
            "value": input_bucket.bucket_name
        },
        "INPUT_S3_ARTIFACTS_OBJECT": {
            "value": "project.zip"
        },
        "OUTPUT_S3_ARTIFACTS_BUCKET": {
            "value": output_bucket.bucket_name
        },
        "PROJECT_NAME": {
            "value": f"Project-{env_name}"
        }
    },
    cache=codebuild.Cache.bucket(input_bucket),
    environment=codebuild.BuildEnvironment(
        build_image=codebuild.LinuxBuildImage.STANDARD_5_0
    ),
    vpc=vpc,
    security_groups=[codebuild_sg],
    artifacts=codebuild.Artifacts.s3(
        bucket=output_bucket,
        include_build_id=False,
        package_zip=False,
        encryption=False
    ),
    build_spec=codebuild.BuildSpec.from_object({
        "version": "0.2",
        "cache": {
            "paths": ['/root/.m2/**/*', '/root/.npm/**/*', 'build/**/*', '*/project/node_modules/**/*']
        },
        "phases": {
            "install": {
                "runtime-versions": {
                    "nodejs": "14.x"
                },
                "commands": [
                    "aws s3 cp --quiet s3://$INPUT_S3_ARTIFACTS_BUCKET/$INPUT_S3_ARTIFACTS_OBJECT .",
                    "unzip $INPUT_S3_ARTIFACTS_OBJECT",
                    "cd project",
                    "export SASS_BINARY_DIR=$(pwd)",
                    "npm cache verify",
                    "npm install",
                ]
            },
            "build": {
                "commands": [
                    "npm run build"
                ]
            },
            "post_build": {
                "commands": [
                    "echo Clearing s3 bucket folder",
                    "aws s3 rm --recursive s3://$OUTPUT_S3_ARTIFACTS_BUCKET/$PROJECT_NAME"
                ]
            }
        },
        "artifacts": {
            "files": [
                "**/*.html",
                "**/*.js",
                "**/*.css",
                "**/*.ico",
                "**/*.woff",
                "**/*.woff2",
                "**/*.svg",
                "**/*.png"
            ],
            "discard-paths": "yes",
            "base-directory": "$(pwd)/dist/proj"
        }
    })
)
What's needed:
Currently there is a disconnect between the GitLab job and the CodeBuild job. I'm looking for a way to PAUSE the GitLab job after all its steps are executed; later, on successful completion of the CodeBuild job, I can resume the same GitLab job and mark it as done.
thanks in advance
You would need to change the part of your setup where GitLab's webhook is handled, and from there use curl to check the status of the other parts.
If their status is not finished, hold back the push event until they are.
You'll need to implement your own logic to wait for the CodeBuild job to finish; you can use batch-get-builds to check its status.
Check this out; you can have something similar in your GitLab job, waiting for the CodeBuild job to finish.
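As an illustration of that suggestion, here is a rough polling sketch with boto3; the project name is taken from the Lambda above, while the timing values are my own assumptions:

import time

import boto3

def wait_for_codebuild(build_id: str, poll_seconds: int = 15, timeout_seconds: int = 1800) -> str:
    """Poll CodeBuild until the given build leaves the IN_PROGRESS state."""
    cb = boto3.client("codebuild")
    waited = 0
    while waited < timeout_seconds:
        build = cb.batch_get_builds(ids=[build_id])["builds"][0]
        status = build["buildStatus"]  # SUCCEEDED, FAILED, IN_PROGRESS, ...
        if status != "IN_PROGRESS":
            return status
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"CodeBuild build {build_id} did not finish in time")

# Example: start the build and block until it completes (e.g. from the GitLab job
# or from the Lambda, instead of returning right after start_build()).
build_id = boto3.client("codebuild").start_build(projectName="PROJECT-Deploy-dev")["build"]["id"]
print(wait_for_codebuild(build_id))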
You can create a manual job in your pipeline to serve as a pause. CodeBuild can call the GitLab API to run the manual job, allowing the pipeline to continue. That would probably be the most resource-efficient way to handle this scenario.
However, you won't be able to resume the same job with this method.
There is no mechanism to ‘pause’ a job, but you can implement a polling mechanism as suggested in another answer if you really need to keep things in the same job for some reason. However, you will end up consuming more minutes (if using GitLab.com shared runners) or system resources than needed. You may also want to consider your job timeouts if the process takes a long time.
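For reference, a hedged sketch of what "CodeBuild calls the GitLab API to run the manual job" might look like; the project ID, job name, and token variable are placeholders, and only the GitLab host comes from the question:

import os

import requests

GITLAB_URL = "https://git.company.com"   # host from the question's runner command
PROJECT_ID = 1234                        # placeholder project ID
TOKEN = os.environ["GITLAB_API_TOKEN"]   # placeholder token with api scope

def play_manual_job(pipeline_id: int, job_name: str = "resume-after-codebuild") -> None:
    """Find the manual job in the given pipeline and trigger it via the jobs 'play' endpoint."""
    headers = {"PRIVATE-TOKEN": TOKEN}
    jobs = requests.get(
        f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipelines/{pipeline_id}/jobs",
        headers=headers,
        params={"scope[]": "manual"},
    ).json()
    job = next(j for j in jobs if j["name"] == job_name)
    requests.post(
        f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/jobs/{job['id']}/play",
        headers=headers,
    ).raise_for_status()

Called from the CodeBuild post_build phase (or from the Lambda), this lets the pipeline continue once the build succeeds.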
I'm trying to package a pyspark job with PEX to be run on google cloud dataproc, but I'm getting a Permission Denied error.
I've packaged my third party and local dependencies into env.pex and an entrypoint that uses those dependencies into main.py. I then gsutil cp those two files up to gs://<PATH> and run the script below.
from google.cloud import dataproc_v1 as dataproc
from google.cloud import storage
def submit_job(project_id: str, region: str, cluster_name: str):
    job_client = dataproc.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = job_client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": {
                "placement": {"cluster_name": cluster_name},
                "pyspark_job": {
                    "main_python_file_uri": "gs://<PATH>/main.py",
                    "file_uris": ["gs://<PATH>/env.pex"],
                    "properties": {
                        "spark.pyspark.python": "./env.pex",
                        "spark.executorEnv.PEX_ROOT": "./.pex",
                    },
                },
            },
        }
    )
The error I get is
Exception in thread "main" java.io.IOException: Cannot run program "./env.pex": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 14 more
Should I expect packaging my environment like this to work? I don't see a way to change the permission of files included as file_uris in the pyspark job config, and I don't see any documentation on google cloud about packaging with PEX, but PySpark official docs include this guide.
Any help is appreciated - thanks!
You can always run a PEX file using a compatible interpreter. So instead of specifying a program of ./env.pex you could try python env.pex. That does not require env.pex to be executable.
I wasn't able to run the pex directly in the end, but did get a workaround working for now, which was suggested by a user in the pants slack community (thanks!)...
The workaround is to unpack the pex as a venv in a cluster initialization script.
The initialization script, copied with gsutil to gs://<PATH TO INIT SCRIPT>:
#!/bin/bash
set -exo pipefail

readonly PEX_ENV_FILE_URI=$(/usr/share/google/get_metadata_value attributes/PEX_ENV_FILE_URI || true)
readonly PEX_FILES_DIR="/pexfiles"
readonly PEX_ENV_DIR="/pexenvs"

function err() {
    echo "[$(date +'%Y-%m-%dT%H:%M:%S%z')]: $*" >&2
    exit 1
}

function install_pex_into_venv() {
    local -r pex_name=${PEX_ENV_FILE_URI##*/}
    local -r pex_file="${PEX_FILES_DIR}/${pex_name}"
    local -r pex_venv="${PEX_ENV_DIR}/${pex_name}"

    echo "Installing pex from ${pex_file} into venv ${pex_venv}..."
    gsutil cp "${PEX_ENV_FILE_URI}" "${pex_file}"
    PEX_TOOLS=1 python "${pex_file}" venv --compile "${pex_venv}"
}

function main() {
    if [[ -z "${PEX_ENV_FILE_URI}" ]]; then
        err "ERROR: Must specify PEX_ENV_FILE_URI metadata key"
    fi
    install_pex_into_venv
}

main
To start the cluster and run the initialization script to unpack the pex into a venv on the cluster:
from google.cloud import dataproc_v1 as dataproc
def start_cluster(project_id: str, region: str, cluster_name: str):
    cluster_client = dataproc.ClusterControllerClient(...)
    operation = cluster_client.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": {
                "project_id": project_id,
                "cluster_name": cluster_name,
                "config": {
                    "master_config": <CONFIG>,
                    "worker_config": <CONFIG>,
                    "initialization_actions": [
                        {
                            "executable_file": "gs://<PATH TO INIT SCRIPT>",
                        },
                    ],
                    "gce_cluster_config": {
                        "metadata": {"PEX_ENV_FILE_URI": "gs://<PATH>/env.pex"},
                    },
                },
            },
        }
    )
To start the job and use the unpacked pex venv to run the pyspark job:
def submit_job(project_id: str, region: str, cluster_name: str):
    job_client = dataproc.JobControllerClient(...)
    operation = job_client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": {
                "placement": {"cluster_name": cluster_name},
                "pyspark_job": {
                    "main_python_file_uri": "gs://<PATH>/main.py",
                    "properties": {
                        "spark.pyspark.python": "/pexenvs/env.pex/bin/python",
                    },
                },
            },
        }
    )
Following #megabits' answer, here is the bash-based workflow that works for me.
copy the init script (from answer) to GCS as gs://BUCKET/pkg/cluster-env-init.bash
build the PEX, providing the --include-tools argument that is required by the initialization script, e.g.
pex --include-tools -r requirements.txt -o env.pex
put PEX file on GCS
gsutil mv env.pex "gs://BUCKET/pkg/env.pex"
create the cluster, using the PEX file to set up the environment
gcloud dataproc clusters create your-cluster --region us-central1 \
--initialization-actions="gs://BUCKET/pkg/cluster-env-init.bash" \
--metadata "PEX_ENV_FILE_URI=gs://BUCKET/pkg/env.pex"
run job
gcloud dataproc jobs submit pyspark your-script.py \
--cluster=your-cluster --region us-central1 \
--properties spark.pyspark.python="/pexenvs/env.pex/bin/python"
I'm doing a lot of work with AWS EMR and when you build an EMR cluster through the AWS Management Console you can click a button to export the AWS CLI Command that creates the EMR cluster.
It then gives you a big CLI command that isn't formatted in any way, i.e., if you copy and paste the command it's all on a single line.
I'm using these EMR CLI commands, which were created by other people, to create EMR clusters in Python using the AWS SDK Boto3 library, i.e., I read the CLI command to get all the configuration details. Some of the configuration details are present on the AWS Management Console UI, but not all of them, so it's easier for me to use the CLI command that you can export.
However, the AWS CLI command is very hard to read since it's not formatted. Is there an AWS CLI command formatter available online similar to JSON formatters?
Another solution I could use is to clone the EMR cluster and go through the EMR cluster creation screens on the AWS Management Console to get all the configuration details, but I'm still curious whether I could format the CLI command and do it that way. An added benefit of being able to format the exported CLI command is that I could put it on a Confluence page for documentation.
Here is some quick Python code to do it:
import shlex
import json
import re

def format_command(command):
    tokens = shlex.split(command)
    formatted = ''
    for token in tokens:
        # Flags get a new line
        if token.startswith("--"):
            formatted += '\\\n    '
        # JSON data
        if token[0] in ('[', '{'):
            json_data = json.loads(token)
            data = json.dumps(json_data, indent=4).replace('\n', '\n    ')
            formatted += "'{}' ".format(data)
        # Quote token when it contains whitespace
        elif re.search(r'\s', token):
            formatted += "'{}' ".format(token)
        # Simple print for remaining tokens
        else:
            formatted += token + ' '
    return formatted
example = """aws emr create-cluster --applications Name=spark Name=ganglia Name=hadoop --tags 'Project=MyProj' --ec2-attributes '{"KeyName":"emr-key","AdditionalSlaveSecurityGroups":["sg-3822994c","sg-ccc76987"],"InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-60832c2b","SubnetId":"subnet-3c76ee33","EmrManagedSlaveSecurityGroup":"sg-dd832c96","EmrManagedMasterSecurityGroup":"sg-b4923dff","AdditionalMasterSecurityGroups":["sg-3822994c","sg-ccc76987"]}' --service-role EMR_DefaultRole --release-label emr-5.14.0 --name 'Test Cluster' --instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"CORE"}]' --configurations '[{"Classification":"spark-defaults","Properties":{"spark.sql.avro.compression.codec":"snappy","spark.eventLog.enabled":"true","spark.dynamicAllocation.enabled":"false"},"Configurations":[]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"SPARK_DAEMON_MEMORY":"4g"},"Configurations":[]}]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-1"""
print(format_command(example))
Output looks like this:
aws emr create-cluster \
    --applications Name=spark Name=ganglia Name=hadoop \
    --tags Project=MyProj \
    --ec2-attributes '{
        "ServiceAccessSecurityGroup": "sg-60832c2b",
        "InstanceProfile": "EMR_EC2_DefaultRole",
        "EmrManagedMasterSecurityGroup": "sg-b4923dff",
        "KeyName": "emr-key",
        "SubnetId": "subnet-3c76ee33",
        "AdditionalMasterSecurityGroups": [
            "sg-3822994c",
            "sg-ccc76987"
        ],
        "AdditionalSlaveSecurityGroups": [
            "sg-3822994c",
            "sg-ccc76987"
        ],
        "EmrManagedSlaveSecurityGroup": "sg-dd832c96"
    }' \
    --service-role EMR_DefaultRole \
    --release-label emr-5.14.0 \
    --name 'Test Cluster' \
    --instance-groups '[
        {
            "EbsConfiguration": {
                "EbsBlockDeviceConfigs": [
                    {
                        "VolumeSpecification": {
                            "VolumeType": "gp2",
                            "SizeInGB": 32
                        },
                        "VolumesPerInstance": 1
                    }
                ]
            },
            "InstanceCount": 1,
            "Name": "Master",
            "InstanceType": "m4.xlarge",
            "InstanceGroupType": "MASTER"
        },
        {
            "EbsConfiguration": {
                "EbsBlockDeviceConfigs": [
                    {
                        "VolumeSpecification": {
                            "VolumeType": "gp2",
                            "SizeInGB": 32
                        },
                        "VolumesPerInstance": 1
                    }
                ]
            },
            "InstanceCount": 1,
            "Name": "CORE",
            "InstanceType": "m4.xlarge",
            "InstanceGroupType": "CORE"
        }
    ]' \
    --configurations '[
        {
            "Properties": {
                "spark.eventLog.enabled": "true",
                "spark.dynamicAllocation.enabled": "false",
                "spark.sql.avro.compression.codec": "snappy"
            },
            "Classification": "spark-defaults",
            "Configurations": []
        },
        {
            "Properties": {},
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Properties": {
                        "SPARK_DAEMON_MEMORY": "4g"
                    },
                    "Classification": "export",
                    "Configurations": []
                }
            ]
        }
    ]' \
    --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
    --region us-east-1
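As a small usage note (my addition, reusing format_command from above), the same function can be fed a command pasted on stdin:

import sys

# Paste the exported one-line command, then send EOF (Ctrl-D) to print the formatted version.
print(format_command(sys.stdin.read().strip()))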
I am unable to set environment variables for my Spark application. I am using AWS EMR to run the application, which is really a framework I wrote in Python on top of Spark to run multiple Spark jobs according to the environment variables present. So in order to start the exact job I want, I need to pass an environment variable into spark-submit. I tried several methods to do this, but none of them works: when I try to print the value of the environment variable inside the application, it comes back empty.
To launch the cluster in EMR I am using the following AWS CLI command:
aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Spark --ec2-attributes '{"KeyName":"<Key>","InstanceProfile":"<Profile>","SubnetId":"<Subnet-Id>","EmrManagedSlaveSecurityGroup":"<Group-Id>","EmrManagedMasterSecurityGroup":"<Group-Id>"}' --release-label emr-5.13.0 --log-uri 's3n://<bucket>/elasticmapreduce/' --bootstrap-action 'Path="s3://<bucket>/bootstrap.sh"' --steps file://./.envs/steps.json --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"c4.xlarge","Name":"Master"}]' --configurations file://./.envs/Production.json --ebs-root-volume-size 64 --service-role EMRRole --enable-debugging --name 'Application' --auto-terminate --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region <region>
Now Production.json looks like this:
[
    {
        "Classification": "yarn-env",
        "Properties": {},
        "Configurations": [
            {
                "Classification": "export",
                "Properties": {
                    "FOO": "bar"
                }
            }
        ]
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.executor.memory": "2800m",
            "spark.driver.memory": "900m"
        }
    }
]
And steps.json like this:
[
    {
        "Name": "Job",
        "Args": [
            "--deploy-mode", "cluster",
            "--master", "yarn",
            "--py-files", "s3://<bucket>/code/dependencies.zip",
            "s3://<bucket>/code/__init__.py",
            "--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
            "--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
            "--conf", "spark.executorEnv.SHAPE=SQUARE"
        ],
        "ActionOnFailure": "CONTINUE",
        "Type": "Spark"
    }
]
When I try to access the environment variable inside my __init__.py code, it simply prints empty. As you can see, I am running the step with Spark on YARN in cluster mode. I went through these links to get to this point.
How do I set an environment variable in a YARN Spark job?
https://spark.apache.org/docs/latest/configuration.html#environment-variables
https://spark.apache.org/docs/latest/configuration.html#runtime-environment
Thanks for any help.
Use classification yarn-env to pass environment variables to the worker nodes.
Use classification spark-env to pass environment variables to the driver, with deploy mode client. When using deploy mode cluster, use yarn-env.
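To make that concrete, here is a small verification sketch of my own (not from the thread) that prints what the driver and the executors actually see for the SHAPE variable set in the question's steps.json. With --deploy-mode cluster the driver runs inside the YARN application master, so spark.yarn.appMasterEnv.SHAPE is what should show up on the driver, and spark.executorEnv.SHAPE on the executors:

import os

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-check").getOrCreate()
sc = spark.sparkContext

# Value seen by the driver process (the YARN application master in cluster mode).
print("driver SHAPE =", os.environ.get("SHAPE"))

# Values seen by the executor processes.
print("executor SHAPE =",
      sc.parallelize(range(2), 2).map(lambda _: os.environ.get("SHAPE")).collect())

Note that in cluster mode the driver's stdout ends up in the YARN container logs rather than in the EMR step's stdout, which is worth keeping in mind when checking the output.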
To work with EMR clusters I use AWS Lambda, with a project that builds an EMR cluster when a flag is set in the condition.
Inside this project we define the variables that can be set in the Lambda and then replace each one with its value. To do this we use the AWS API; the method to use is AWSSimpleSystemsManagement.getParameters.
Then build a map like val parametersValues = parameterResult.getParameters.asScala.map(k => (k.getName, k.getValue)) to get (name, value) pairs.
E.g.: ${BUCKET} = "s3://bucket-name/"
This means you only have to write ${BUCKET} in your JSON instead of the full path.
Once you have replaced the values, the step JSON can look like this:
[
    {
        "Name": "Job",
        "Args": [
            "--deploy-mode", "cluster",
            "--master", "yarn",
            "--py-files", "${BUCKET}/code/dependencies.zip",
            "${BUCKET}/code/__init__.py",
            "--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
            "--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
            "--conf", "spark.executorEnv.SHAPE=SQUARE"
        ],
        "ActionOnFailure": "CONTINUE",
        "Type": "Spark"
    }
]
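The snippets above are Scala; as a rough sketch of the same idea in Python with boto3 (the parameter name and template file are illustrative assumptions, not part of the original answer):

import boto3

# Fetch named parameters from SSM Parameter Store and substitute the ${NAME}
# placeholders in a steps-JSON template before submitting it.
ssm = boto3.client("ssm")
resp = ssm.get_parameters(Names=["BUCKET"], WithDecryption=True)
values = {p["Name"]: p["Value"] for p in resp["Parameters"]}

with open("steps.json.template") as f:  # hypothetical template containing ${BUCKET} placeholders
    steps_json = f.read()

for name, value in values.items():
    steps_json = steps_json.replace("${" + name + "}", value)

print(steps_json)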
I hope this can help you to solve your problem.