Sagemaker Pipelines - Unable to parse Pipeline Definition

I'm using SageMaker Pipelines to chain together two consecutive ProcessingJobs. I'm getting a weird error when I call pipeline.upsert():
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreatePipeline operation: Unable to parse pipeline definition. Property 'null' with value 'null' is not of expected type 'String'
This is what my pipeline looks like:
step_process_data = ProcessingStep(
    name='ProcessDataStep',
    processor=script_processor,
    code=os.path.join(BASE_DIR, "scripts/preprocess.py"),
    job_arguments=job_arguments
)

step_split_data = ProcessingStep(
    name='SplitDataStep',
    processor=script_processor,
    code=os.path.join(BASE_DIR, "scripts/split_data.py"),
    job_arguments=job_arguments,
    depends_on=[step_process_data]
)

pipeline = Pipeline(
    name="DataPreperationPipeline",
    steps=[step_process_data, step_split_data],
    sagemaker_session=sagemaker_session
)
Any thoughts on what I am doing wrong or missing?

I ran into the same issue: my job_arguments were not all strings. Make sure every item in job_arguments is a string.
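As a minimal sketch of that fix (the argument names and values here are made up for illustration), coercing every item to a string before building the step avoids the "is not of expected type 'String'" parse error:

```python
# SageMaker serializes job_arguments into the pipeline definition as a list
# of strings; a non-string item (int, float, None) can make the definition
# fail to parse with "is not of expected type 'String'".
raw_arguments = ["--train-split", 0.8, "--seed", 42]

# Coerce every argument to str before passing it to ProcessingStep.
job_arguments = [str(arg) for arg in raw_arguments]

print(job_arguments)
```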

I'm not sure whether all your objects are set up correctly. Can you please follow the example below and verify?
https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb

Related

InSpec Testing on AWS Auto Scaling Groups

I'm trying to perform some testing on infrastructure that's created using Terraform. This specific test is testing an auto scaling group. I don't know the full name of the resource as it's appended with a dynamic token, but I know the start of the resource name which is set in the asg_name variable. I've got the following test:
asg_name = input('asg_name')

control 'aws_auto_scaling_groups' do
  title 'Auto Scale Group'
  desc 'Ensures Autoscale Group exists with correct configuration'

  describe aws_auto_scaling_group ( name: /^#{asg_name}*/ ) do
    it { should exist }
    its('min_size') { should be 1 }
    its('max_size') { should be 1 }
  end
end
This is failing with the following:
/opt/inspec/embedded/lib/ruby/gems/2.7.0/gems/inspec-core-5.17.4/lib/inspec/profile_context.rb:171:in `instance_eval': aws/controls/loadbalancer.rb:6: syntax error, unexpected tLABEL, expecting ')' (SyntaxError)
... aws_auto_scaling_group ( name: /^#{asg_name}*/ ) do
... ^~~~~
aws/controls/loadbalancer.rb:12: syntax error, unexpected `end', expecting end-of-input
I've tried a number of different options, including aws_auto_scaling_groups.where, which didn't work as expected because it returned an array, but I still haven't been able to get it working. Can anyone tell me how to match a single resource against a name like this using InSpec?
Thank you in advance!

How to pass the "user_data_mapper" argument to a Beam pipeline's WriteToSnowflake function?

I am trying to create a pipeline for writing data to Snowflake using Apache Beam. For writing the data I am using the WriteToSnowflake transform, but I am getting the error message below.
TypeError: __init__() missing 1 required positional argument: 'user_data_mapper'
When I searched for this argument, I found this link:
https://beam.apache.org/documentation/io/built-in/snowflake/#required-parameters-1
but I am not able to understand this function: what this 'user_data_mapper' argument means, how and where to define it, and what data has to be passed to it from the pipeline. Can anyone please help me with some sample code for this argument?
The sample code:
with TestPipeline(options=PipelineOptions(OPTIONS)) as p:
    (p
     | <SOURCE OF DATA>
     | WriteToSnowflake(
         server_name=<SNOWFLAKE SERVER NAME>,
         username=<SNOWFLAKE USERNAME>,
         password=<SNOWFLAKE PASSWORD>,
         schema=<SNOWFLAKE SCHEMA>,
         database=<SNOWFLAKE DATABASE>,
         staging_bucket_name=<GCS OR S3 BUCKET>,
         storage_integration_name=<SNOWFLAKE STORAGE INTEGRATION NAME>,
         table_schema=<SNOWFLAKE TABLE SCHEMA>,
         table=<SNOWFLAKE TABLE>,
         create_disposition='CREATE_NEVER',
         write_disposition='WRITE_TRUNCATE',
     ))
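Per the Beam Snowflake I/O documentation linked above, user_data_mapper is a function that maps each element of the input PCollection to an array of strings, one entry per column of the target table, which the connector writes to temporary CSV files before loading into Snowflake. A minimal sketch (the element shape and column layout here are assumptions for illustration):

```python
# user_data_mapper turns one PCollection element into a list of string
# values, in the same order as the columns of the target table.
# Assumed element shape: {"id": 1, "name": "ana"} for a table (ID, NAME).
def user_data_mapper(element):
    return [str(element["id"]), element["name"]]

# It is then passed to the transform alongside the other parameters:
#   WriteToSnowflake(..., user_data_mapper=user_data_mapper)
print(user_data_mapper({"id": 1, "name": "ana"}))
```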

Dataflow job run failing when templateLocation argument is set

My Dataflow job fails with the exception below when I pass the staging, temp, and output GCS bucket locations as parameters.
Java code:
final String[] used = Arrays.copyOf(args, args.length + 1);
used[used.length - 1] = "--project=OVERWRITTEN";
final T options = PipelineOptionsFactory.fromArgs(used).withValidation().as(clazz);
options.setProject(PROJECT_ID);
options.setStagingLocation("gs://abc/staging/");
options.setTempLocation("gs://abc/temp");
options.setRunner(DataflowRunner.class);
options.setGcpTempLocation("gs://abc");
The error:
INFO: Staging pipeline description to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/
May 10, 2018 11:56:35 AM org.apache.beam.runners.dataflow.util.PackageUtil tryStagePackage
INFO: Uploading <42088 bytes, hash E7urYrjAOjwy6_5H-UoUxA> to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/pipeline-E7urYrjAOjwy6_5H-UoUxA.pb
Dataflow SDK version: 2.4.0
May 10, 2018 11:56:38 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Printed job specification to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/templates/DataValidationPipeline
May 10, 2018 11:56:40 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Template successfully created.
Exception in thread "main" java.lang.NullPointerException
at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:501)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:477)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:312)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:248)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:202)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:195)
at com.example.DataValidationPipeline.main(DataValidationPipeline.java:66)
I was also facing the same issue; the error was thrown at p.run().waitUntilFinish();. Then I tried the following code:
PipelineResult result = p.run();
System.out.println(result.getState().hasReplacementJob());
result.waitUntilFinish();
This was throwing the following exception
java.lang.UnsupportedOperationException: The result of template creation should not be used.
at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getState(DataflowTemplateJob.java:67)
Then, to fix the issue, I used the following code:
PipelineResult result = pipeline.run();
try {
    result.getState();
    result.waitUntilFinish();
} catch (UnsupportedOperationException e) {
    // do nothing
} catch (Exception e) {
    e.printStackTrace();
}
I was running into the java.lang.UnsupportedOperationException: The result of template creation should not be used. problem today as well, and I tried to fix it by checking whether the job was of type DataflowTemplateJob first:
val (sc, args) = ContextAndArgs(cmdlineArgs)
// ...
val result = sc.run()
if (!result.isInstanceOf[DataflowTemplateJob]) result.waitUntilFinish()
I think this should work for bare Java jobs, but if you use Scio the result will be some anonymous type, so in the end I had to use the try/catch version as well:
try {
  val result = sc.run().waitUntilFinish()
} catch {
  case _: UnsupportedOperationException => // this happens during template creation
}
As displayed in the official Flex Template sample, there is a comment saying:
// For a Dataflow Flex Template, do NOT waitUntilFinish().
The same applies if you call any of those methods on the runner when you pass the --templateRunner argument: if you change the pipeline to pipeline.run();, it is not going to fail.
The issue is still flagged as open in Apache Beam:
https://github.com/apache/beam/issues/20106

Amazon Lex Error: An error occurred (BadRequestException) when calling the PutIntent operation: RelativeId does not match Lex ARN format

I'm trying to build a chatbot using Amazon's boto3 library. Right now, I am trying to create an intent using the put_intent function. My code is as follows:
intent = lexClient.put_intent(
    name='test',
    sampleUtterances=["Who is messi?"]
)
When I try running this, I get the following exception:
botocore.errorfactory.BadRequestException: An error occurred
(BadRequestException) when calling the PutIntent operation: RelativeId
does not match Lex ARN format: intent:test2:$LATEST
Can anyone tell me what I'm doing wrong?
I got the same error when trying to use a digit in the intent name field. I realized that was not allowed when trying to do the same from the AWS console. The error handling could really be more specific.
Try taking the question mark out of the utterance; that has caused me issues in the past!
You need to run GetSlotType first. That returns the current checksum for that slot; put that checksum in your PutSlotType call. Big bang boom.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexModelBuildingService.html#getSlotType-property
var params = {
  name: "AppointmentTypeValue",
  checksum: '54c6ab5f-fe30-483a-a364-b76e32f6f05d',
  description: "Type of dentist appointment to schedule",
  enumerationValues: [
    { value: "cleaning" },
    { value: "whitening" },
    { value: "root canal" },
    { value: "punch my face" }
  ]
};
I faced similar issues with the put_intent function. At least the following three are worth mentioning.
Sample Utterances
There are requirements for the sample utterances:
An utterance can consist only of Unicode
characters, spaces, and valid punctuation marks. Valid punctuation
marks are: periods for abbreviations, underscores, apostrophes, and
hyphens. If there is a slot placeholder in your utterance, ensure that
it's in the {slotName} format and has spaces at both ends.
It seems like no error is raised when calling the put_intent function with the following code:
intent = lexClient.put_intent(
    name='test',
    sampleUtterances=["Who is messi?"]
)
However, if you try to add the intent to your bot and start building the bot, it will fail. To fix it, remove the question mark at the end of your sample utterance:
intent = lexClient.put_intent(
    name='test',
    sampleUtterances=["Who is messi"]
)
Prior intent version
If your intent already exists, you need to add the checksum to your function call. To get the checksum of your intent, you can use the get_intent function. For example (from the docs):
response = client.get_intent(
    name='test',
    version='$LATEST'
)
found_checksum = response.get('checksum')
After that you can put a new version of the intent:
intent = lexClient.put_intent(
    name='test',
    sampleUtterances=["Who is messi"],
    checksum=found_checksum
)
Intent Name (correct in your case, just adding this for reference)
It seems like the name can only contain letters and underscores (no digits) and must be at most 100 characters long. I haven't found anything in the docs; this is just trial and error. Calling put_intent with the following:
intent = lexClient.put_intent(
    name='test_1',
    sampleUtterances=["Who is messi"]
)
Results in the following error:
BadRequestException: An error occurred (BadRequestException) when calling the PutIntent operation: RelativeId does not match Lex ARN format: intent:test_1:$LATEST
To fix the name, you can change it to:
intent = lexClient.put_intent(
    name='test',
    sampleUtterances=["Who is messi"]
)
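Based on the trial-and-error rules above (letters and underscores only, no digits, at most 100 characters), a quick client-side check could be sketched like this. The regex is an assumption inferred from the errors above, not an official Lex specification:

```python
import re

# Inferred rule: letters and underscores only, 1-100 characters.
# This is a guess based on the errors above, not documented Lex behavior.
_NAME_RE = re.compile(r"^[A-Za-z_]{1,100}$")

def looks_like_valid_intent_name(name):
    return bool(_NAME_RE.match(name))

print(looks_like_valid_intent_name("test"))    # letters only
print(looks_like_valid_intent_name("test_1"))  # contains a digit
```

Checking names before calling put_intent gives a clearer failure mode than the generic "RelativeId does not match Lex ARN format" error.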

No error description provided though QSqlQuery::exec fails

I get into the if block:
if ( !_query.exec() )
{
    QString errdb  = _db.driver()->lastError().databaseText();
    QString errdrv = _db.driver()->lastError().driverText();
    // ...
but errdb and errdrv are empty.
Is there another way to check what went wrong?
You can get the error using QSqlQuery::lastError(), in your case _query.lastError().
Quote from the Qt documentation:
Returns error information about the last error (if any) that occurred
with this query.
What you need is _db.lastError().text().