I am trying to run model monitoring on a model in AWS SageMaker. The monitoring jobs are failing with: "Encoding mismatch: Encoding is JSON for endpointInput, but Encoding is base64 for endpointOutput. We currently only support the same type of input and output encoding at the moment."
The encoding is JSON for endpointInput and base64 for endpointOutput, but JSON is expected for both input and output.
I tried using json_content_types in the DataCaptureConfig, but the endpointOutput is still base64 encoded.
Below is the DataCaptureConfig I used in the deploy call:
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    json_content_types='application/json',
    destination_s3_uri=MY_BUCKET)
My capture files from the model look something like this:
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "application/json",
      "mode": "INPUT",
      "data": "{ === json data === }",
      "encoding": "JSON"
    },
    "endpointOutput": {
      "observedContentType": "*/*",
      "mode": "OUTPUT",
      "data": "{ === base64 encoded output === }",
      "encoding": "BASE64"
    }
  },
  "eventMetadata": {
    === some metadata ===
  }
}
I have observed that the output content type is not being recognized as application/json, so I need a workaround or procedure to get the output captured with JSON encoding. How can I get JSON encoding for both input and output data?
A similar issue was reported here, but there is no response.
I came across a similar issue earlier while invoking the endpoint using the boto3 sagemaker-runtime client. Try adding the 'Accept' request parameter to the invoke_endpoint call with the value 'application/json'; a minimal sketch follows the reference link below.
Refer to https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_RequestSyntax for more help.
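A minimal sketch with boto3 — the endpoint name and payload are placeholders; the point is the explicit Accept parameter, which makes the response content type unambiguous:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",                          # placeholder endpoint name
    ContentType="application/json",
    Accept="application/json",                           # ask the container for a JSON response
    Body=json.dumps({"instances": [[1.0, 2.0, 3.0]]}),   # placeholder payload
)
print(response["Body"].read().decode("utf-8"))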
While deploying the endpoint, please set the CaptureContentTypeHeader in the DataCaptureConfig and map the output appropriately to either JsonContentTypes or CsvContentTypes.
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CaptureContentTypeHeader.html
Doing this sets the capture encoding accordingly. If it is not set, the default is base64 encoding, hence the issue. A sketch is shown below.
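A minimal sketch, assuming the SageMaker Python SDK and that `model` and `MY_BUCKET` are already defined; note that json_content_types is passed as a list, which maps to CaptureContentTypeHeader.JsonContentTypes:

from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=MY_BUCKET,                  # assumed to be defined elsewhere
    json_content_types=["application/json"],       # maps to CaptureContentTypeHeader.JsonContentTypes
)

predictor = model.deploy(                          # `model` assumed to exist
    initial_instance_count=1,
    instance_type="ml.m5.large",                   # placeholder instance type
    data_capture_config=data_capture_config,
)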
Related
I am trying to use the Google Cloud Vision API, calling the REST method documented at this link:
POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate
My request is
{
  "requests": [
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://redaction-vision/pdf_page1_employment_request.pdf"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "gs://redaction-vision"
        }
      }
    }
  ]
}
But the response always contains only "name", like below:
{
  "name": "operations/a7e4e40d1e1ac4c5"
}
My "gs" location is valid.
When I write the wrong path in "gcsSource", 404 not found error is coming.
Who knows why my response is weird?
This is expected; the API does not send you the output as an HTTP response. To see what the API did, go to your destination bucket and check for a file named "xxxxxxxxoutput-1-to-1.json". Also, you need to specify the name of the object in your gcsDestination section, for example: gs://redaction-vision/test.
Since asyncBatchAnnotate is an asynchronous operation, it won't return the result directly; it instead returns the name of the operation. You can use that unique name to call GetOperation to check the status of the operation, for example:
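One possible way to poll that operation over REST (a sketch; the access token and polling interval are assumptions, and the operation name is the one returned above):

import time
import requests

ACCESS_TOKEN = "..."  # e.g. obtained via `gcloud auth print-access-token`
operation_name = "operations/a7e4e40d1e1ac4c5"  # returned by asyncBatchAnnotate

while True:
    op = requests.get(
        f"https://vision.googleapis.com/v1/{operation_name}",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    ).json()
    if op.get("done"):           # the operation reports done once the output files are written
        break
    time.sleep(5)
print(op)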
Note that there could be more than one output file for your PDF if the PDF has more pages than batchSize, and the output JSON file names change depending on the number of pages, so it isn't safe to always append "output-1-to-1.json".
Make sure that the URI prefix you put in the output config is unique, because you have to do a wildcard search in GCS on the prefix you provide to get all of the JSON files that were created, along the lines of the sketch below.
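A sketch of that prefix search, assuming the google-cloud-storage client library and the bucket/prefix from the example above (gs://redaction-vision/test):

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("redaction-vision", prefix="test"):
    if blob.name.endswith(".json"):
        print(blob.name)   # e.g. test-output-1-to-2.json, test-output-3-to-4.json, ...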
I am trying to save data to S3 through Firehose, proxied by API Gateway. I have created an API Gateway endpoint that uses the AWS service integration type and the PutRecord action for Firehose. My mapping template is:
{
    "DeliveryStreamName": "test-stream",
    "Records": [
        #foreach($elem in $input.path('$.data'))
        {
            "Data": "$elem"
        }
        #if($foreach.hasNext),#end
        #end
    ]
}
Now when I test the endpoint with the JSON below:
{
  "data": [
    {"ticker_symbol": "DemoAPIGTWY", "sector": "FINANCIAL", "change": -0.42, "price": 50.43},
    {"ticker_symbol": "DemoAPIGTWY", "sector": "FINANCIAL", "change": -0.42, "price": 50.43}
  ]
}
the JSON gets modified and shows up as below after the transformation:
{ticker_symbol=DemoAPIGTWY, sector=FINANCIAL, change=-0.42, price=50.43}
The ':' is being converted to '=', which is not valid JSON.
I am not sure if something is wrong in the above mapping template.
The problem is that $input.path() returns a JSON object and not a stringified version of the JSON. You can take a look at the documentation here.
The Data property expects the value to be a string, not a JSON object. Long story short: there is currently no built-in function that converts a JSON object back into its stringified form. This means you need to re-read the current element in the loop via $input.json(), which returns a JSON string representation of the element that you can then add as Data.
Take a look at the answer here which illustrates this concept.
In your case, applying the concept described in the link above would result in a mapping like this:
{
    "DeliveryStreamName": "test-stream",
    "Records": [
        #foreach($elem in $input.path('$.data'))
        {
            #set($json = $input.json("$.data[$foreach.index]"))
            "Data": "$util.base64Encode($json)"
        }
        #if($foreach.hasNext),#end
        #end
    ]
}
API Gateway treats the payload as text and not as JSON unless explicitly specified.
Kinesis Firehose also expects the data to be base64 encoded when proxied through API Gateway.
Try the following (I am wondering why the for loop has been commented out in the mapping template).
Assuming you are not looping through the record set, the solution below should work for you; the trailing Cg== in the template is a base64-encoded newline, so the records end up newline-delimited in S3:
{
    "DeliveryStreamName": "test-stream",
    "Record": {
        "Data": "$util.base64Encode($input.json('$.Data'))Cg=="
    }
}
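For completeness, a hedged test call against that template — the invoke URL, stage, and resource path are placeholders for your own deployment, and the payload shape matches the $.Data path used above:

import requests

payload = {"Data": {"ticker_symbol": "DemoAPIGTWY", "sector": "FINANCIAL",
                    "change": -0.42, "price": 50.43}}
resp = requests.post(
    "https://abc123.execute-api.us-east-1.amazonaws.com/test/firehose",  # placeholder invoke URL
    json=payload,
)
print(resp.status_code, resp.text)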
Thanks & Regards,
Srivignesh KN
I am using AWS Kinesis Firehose with a custom data transformation. The Lambda is written in Python 3.6 and returns strings that look like the following:
{
  "records": [
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    }
  ]
}
This Lambda is perfectly happy, and logs outputs that look like the above just before returning them to Firehose. However, the Firehose's S3 Logs then show an error:
Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed.
Looking at the examples for this spread across the web in JS and Java, it's not clear to me what I need to be doing differently; I'm quite confused.
If your data is a JSON object, you can try the following:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Incoming Firehose records carry base64-encoded data; decode it,
        # then apply your own business logic to build the object to emit.
        json_object = json.loads(base64.b64decode(record['data']))
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            # Firehose expects 'data' to be a base64-encoded string.
            'data': base64.b64encode(json.dumps(json_object).encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)
    return {'records': output}
Note that base64.b64encode only works with bytes (b'xxx'), while the 'data' attribute of output_record needs to be a normal 'xxx' string, hence the encode/decode round trip.
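A tiny illustration of that round trip, using a hypothetical record:

import base64
import json

payload = {"ticker_symbol": "DemoAPIGTWY"}                        # hypothetical record
encoded = base64.b64encode(json.dumps(payload).encode("utf-8"))   # str -> bytes -> base64 bytes
data_field = encoded.decode("utf-8")                              # Firehose wants a plain str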
I found the same error using Node.js.
Reading the documentation http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html, my mistake was not base64-encoding the data field of every record.
I resolved it by doing this:
{
    recordId: record.recordId,
    result: 'Ok',
    data: Buffer.from(JSON.stringify(data)).toString('base64')
}
You can check the code in my repo.
https://github.com/hixichen/golang_lamda_decode_protobuf_firehose
I have defined a RESTful service with nested queries; the output mapping is defined in XML. I get a proper response as XML, but if I request a JSON response using Accept: application/json I get:
{
  "Fault": {
    "faultcode": "soapenv:Server",
    "faultstring": "Error while writing to the output stream using JsonWriter",
    "detail": ""
  }
}
I was getting the exception below in 3.5.0, and I found a JIRA saying it is fixed in 3.5.1, so I tried 3.5.1; now I am not getting the exception, but I still get the same output.
javax.xml.stream.XMLStreamException: Invalid Staring element
Please note I have also tried the escapeNonPrintableChar="true" option in my queries, but to no avail. The strange thing is that it works for other data sets; just one particular data set produces this output.
I have changed the JSON formatters as below and got it to work, but there is a problem with that.
<messageFormatter contentType="application/json" class="org.apache.axis2.json.JSONMessageFormatter"/>
<!--messageFormatter contentType="application/json" class="org.apache.axis2.json.gson.JsonFormatter" / -->
<messageBuilder contentType="application/json" class="org.apache.axis2.json.JSONOMBuilder"/>
<!--messageBuilder contentType="application/json" class="org.apache.axis2.json.gson.JsonBuilder" /-->
If I use the above formatter, the null values are not represented properly. I get:
"Person": {
"Name": {
"#nil": "true"
}
but I want it as below (like the other JSON formatter used to give):
"Person": {
"Name": null
}
Any help, please? Is there still a bug in this area?
When you create the query for your output response, you define the format in which you want to receive the response; you can select XML or JSON. In the case you mention, select the JSON option and then select "generate response", which creates this JSON structure:
{
  "entries": {
    "entry": [
      {
        "field1": "$column1",
        "field2": "$column2"
      }
    ]
  }
}
You can then adapt that response to your own fields. Here is an example of how I use it in my query:
{
  "Pharmacies": {
    "Pharmacy": [
      {
        "ID": "$Id",
        "Descripcion": "$Desc",
        "Latitude": "$Latitude",
        "Longitude": "$Longitude",
        "Image": "$Image"
      }
    ]
  }
}
The values with "$" are correspond to the name of the column of the query
Problem
Using the REST API, I have trained and deployed a model that I now want to use for prediction. I've defined the collections for prediction input and output and uploaded a JSON file, formatted accordingly, to Cloud Storage. However, when trying to create a prediction job I cannot figure out what value to use for the dataFormat field, which is a required parameter. Is there any way to list all valid values?
What I've tried
My requests look like the one below. I've tried JSON, NEWLINE_DELIMITED_JSON (as when importing data into BigQuery), and even the JSON MIME type application/json, in pretty much all the different casings I can think of (upper and lower, combined with snake, camel, etc.).
{
  "jobId": "my_predictions_123",
  "predictionInput": {
    "modelName": "projects/myproject/models/mymodel",
    "inputPaths": [
      "gs://model-bucket/data/testset.json"
    ],
    "outputPath": "gs://model-bucket/predictions/0/",
    "region": "us-central1",
    "dataFormat": "JSON"
  },
  "predictionOutput": {
    "outputPath": "gs://my-bucket/predictions/1/"
  }
}
All my attempts have only gotten me this back though:
{
  "error": {
    "code": 400,
    "message": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\"",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "job.prediction_input.data_format",
            "description": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\""
          }
        ]
      }
    ]
  }
}
From the Cloud ML API reference document https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#DataFormat, the dataFormat field in your request should be "TEXT" for all text inputs (including JSON, CSV, etc.), for example:
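A hedged sketch of the corrected job submission over REST — the host and access token are assumptions based on the v1beta1 reference linked above, the project and bucket paths are copied from the question, and only dataFormat changes:

import requests

ACCESS_TOKEN = "..."  # e.g. obtained via `gcloud auth print-access-token`
body = {
    "jobId": "my_predictions_123",
    "predictionInput": {
        "modelName": "projects/myproject/models/mymodel",
        "inputPaths": ["gs://model-bucket/data/testset.json"],
        "outputPath": "gs://model-bucket/predictions/0/",
        "region": "us-central1",
        "dataFormat": "TEXT",   # text inputs (JSON, CSV, ...) are submitted as TEXT
    },
}
resp = requests.post(
    "https://ml.googleapis.com/v1beta1/projects/myproject/jobs",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
print(resp.json())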