python, google cloud platform: unable to overwite a file from google bucket: CRC32 does not match - google-cloud-platform

I am using python3 client to connect to google buckets and trying to the following
download 'my_rules_file.yaml'
modify the yaml file
overwrite the file
Here is the code that i used
from google.cloud import storage
import yaml
client = storage.Client()
bucket = client.get_bucket('bucket_name')
blob = bucket.blob('my_rules_file.yaml')
yaml_file = blob.download_as_string()
doc = yaml.load(yaml_file, Loader=yaml.FullLoader)
doc['email'].clear()
doc['email'].extend(["test#gmail.com"])
yaml_file = yaml.dump(doc)
blob.upload_from_string(yaml_file, content_type="application/octet-stream")
This is the error I get from the last line for upload
BadRequest: 400 POST https://storage.googleapis.com/upload/storage/v1/b/fc-sandbox-datastore/o?uploadType=multipart: {
"error": {
"code": 400,
"message": "Provided CRC32C \"YXQoSg==\" doesn't match calculated CRC32C \"EyDHsA==\".",
"errors": [
{
"message": "Provided CRC32C \"YXQoSg==\" doesn't match calculated CRC32C \"EyDHsA==\".",
"domain": "global",
"reason": "invalid"
},
{
"message": "Provided MD5 hash \"G/rQwQii9moEvc3ZDqW2qQ==\" doesn't match calculated MD5 hash \"GqyZzuvv6yE57q1bLg8HAg==\".",
"domain": "global",
"reason": "invalid"
}
]
}
}
: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)
why is this happening. This seems to happen only for ".yaml files".

The reason for your error is because you are trying to use the same blob object for both downloading and uploading this will not work you need two separate instances... You can find some good examples here Python google.cloud.storage.Blob() Examples
You should use a seperate blob instance to handle the upload you are trying with only one...
.....
blob = bucket.blob('my_rules_file.yaml')
yaml_file = blob.download_as_string()
.....
the second instance is needed here
....
blob.upload_from_string(yaml_file, content_type="application/octet-stream")
...

Related

AWS sagemaker endpoint received client (400) error

I've deployed a tensorflow multi-label classification model using a sagemaker endpoint as follows:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge", endpoint_name='testing-2')
It gets deployed and works fine when I invoke it from the Sagemaker Jupyter instance:
sample = ['this movie was extremely good']
output=predictor.predict(sample)
output:
{'predictions': [[0.00370046496,
4.32942124e-06,
0.00080883503,
9.25126587e-05,
0.00023958087,
0.000130862]]}
However, I am unable to send a request to the deployed endpoint from other notebooks or sagemaker studio. I'm unsure of the request format.
I've tried several variations in the input format and still failed. The error message is as below:
sagemaker error
Request:
{
"body": {
"text": "Testing model's prediction on this text"
},
"contentType": "application/json",
"endpointName": "testing-2",
"customURL": "",
"customHeaders": [
{
"Key": "sm_endpoint_name",
"Value": "testing-2"
}
]
}
Error:
Error invoking endpoint: Received client error (400) from primary with message "{ "error": "Failed to process element:
0 key: text of 'instances' list. Error: INVALID_ARGUMENT: JSON object: does not have named input: text" }".
See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/testing-2
in account 793433463428 for more information.
Is there any way to find out exactly how the model expects the request format to be?
Earlier I had the same model on my local system and the way I tested it was using this curl request:
curl -s -H 'Content-Type: application/json' -d '{"text": "what ugly posts"}' http://localhost:7070/sentiment
And it worked fine without any issues.
I've tried different formats and replaced the "text" key inside body with other words like "input", "body", nothing etc.
Based on your description above, I assume you are deploying the TensorFlow model using the SageMaker TensorFlow container.
If you want to view what your model expects as input you can use the saved_model CLI:
1
├── keras_metadata.pb
├── saved_model.pb
└── variables
├── variables.data-00000-of-00001
└── variables.index
!saved_model_cli show --all --dir {"1"}
After you have confirmed the input name above you can invoke the endpoint as follows:
import json
import boto3
client = boto3.client('runtime.sagemaker')
data = {"instances": ['this movie was extremely good']}
response = client.invoke_endpoint(EndpointName=<EndpointName>,
Body=json.dumps(data))
response_body = response['Body']
print(response_body.read())
The same payload can then also be used in Studio when invoking the endpoint.

Regex - extract ip

I'm tring to pull some data from a plain log file with a json convrestor.
this is the log entry:
01/04/2022 15:29:34.2934 +03:00 - [INFO] - [w3wp/LPAPI-Last Casino/177] - AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla
This is the regex I'm using:
(?P<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?P<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?P<level>\[\D+\]).-.\[(?P<application_subsystem_thread>.*)\].-.(?P<message>.*)
This is the output I'm getting:
{
"application_subsystem_thread": "w3wp/LPAPI-Last Casino/177",
"date": "01/04/2022",
"level": "[INFO]",
"message": "AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla",
"time": "15:29:34"
}
As you can see, the convertor is using the group names as the json key.
I would like to get the following output instead:
{
"application_subsystem_thread": "w3wp/LPAPI-Last Casino/177",
"date": "01/04/2022",
"level": "[INFO]",
"message": "AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla",
"time": "15:29:34",
"ip": "8.8.8.8"
}
As you can see I would like to get the IP as well how can I do it ?
You could extract it from the part of the message:
As defined in the message it could be captured with
ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+)
So then we incoperate it as part of the greater message group
(?P<message>.*ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+).*)
Resulting in the final expression
(?P<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?P<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?P<level>\[\D+\]).-.\[(?P<application_subsystem_thread>.*)\].-.(?P<message>.*ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+).*)
var message = `01/04/2022 15:29:34.2934 +03:00 - [INFO] - [w3wp/LPAPI-Last Casino/177] - AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla`;
// NOTE - The regex in this code sample has been modified to be ECMAScript compliant
console.log(/(?<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?<level>\[\D+\]).-.\[(?<application_subsystem_thread>.*)\].-.(?<message>.*ip\=(?<ip_address>(?:[0-9]+\.){3}[0-9]+).*)/gm.exec(message).groups)

An error has occurred: The server encountered an error processing the Lambda response

I am using AWS Lex and AWS Lambda for creating a chatbot. The request and response format are as follows
Event being passed to AWS Lambda
{
"alternativeIntents": [
{
"intentName": "AMAZON.FallbackIntent",
"nluIntentConfidence": null,
"slots": {}
}
],
"botVersion": "$LATEST",
"dialogState": "ConfirmIntent",
"intentName": "OrderBeverage",
"message": "you want to order 2 pints of beer",
"messageFormat": "PlainText",
"nluIntentConfidence": {
"score": 0.92
},
"responseCard": null,
"sentimentResponse": null,
"sessionAttributes": {},
"sessionId": "2021-05-10T09:13:06.841Z-bSWmdHVL",
"slotToElicit": null,
"slots": {
"Drink": "beer",
"Quantity": "2",
"Unit": "pints"
}
}
Response Format-
{
"statusCode": 200,
"dialogAction": {
"type": "Close",
"fulfillmentState": "Fulfilled",
"message": {
"contentType": "PlainText",
"content": "Message to convey to the user. For example, Thanks, your pizza has been ordered."
}
}
}
AWS LAMBDA PYTHON IMPLEMENTATION-
import json
def lambda_handler(event, context):
# TODO implement
slots= event["slots"];
drink,qty,unit= slots["Drink"], slots["Quantity"], slots["Unit"]
retStr= "your order of "+qty+" "+unit+ " of "+drink+ " is coming right up!";
return {"dialogAction": {
"type": "Close",
"fulfillmentState": "Fulfilled",
"message": {
"contentType": "PlainText",
"content": retStr
},
}
}
The formats are in accordance with the documentation, however still getting error in processing lambda response. What is the issue?
This error occurs when the execution of the Lambda function fails and throws an error back to Amazon Lex.
I have attempted to recreate your environment using the python code shared and the test input event.
The output format that you have specified in the original post is correct. Your problem appears to lie with the input test event. The input message that you are using differs from what Lex is actually sending to your Lambda function.
Try adding some additional debugging to your Lambda function to log the event that Lex passes into it and then use the logged event as your new test event.
Ensure that you have CloudWatch logging enabled for the Lambda function so that you can view the input message in the logs.
Here's how my Lambda function looks:
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
def dispatch(event):
# TODO implement
slots= event["slots"];
drink,qty,unit= slots["Drink"], slots["Quantity"], slots["Unit"]
retStr= "your order of "+qty+" "+unit+ " of "+drink+ " is coming right up!";
return {"dialogAction": {
"type": "Close",
"fulfillmentState": "Fulfilled",
"message": {
"contentType": "PlainText",
"content": retStr
},
}}
def lambda_handler(event, context):
logger.debug('event={}'.format(event))
response = dispatch(event)
logger.debug(response)
return response
Now if you test via the Lex console you will find your error in the CloudWatch logs.:
[ERROR] KeyError: 'slots' Traceback (most recent call last): File
"/var/task/lambda_function.py", line 33, in lambda_handler
response = dispatch(event) File "/var/task/lambda_function.py", line 19, in dispatch
slots= event["slots"];
Using this error trace and the logged event, you should see that slots is nested within currentIntent.
You will need to update your code to extract the slot values from the correct place.
Trust this helps you.

Pyspark - read data from elasticsearch cluster on EMR

I am trying to read data from elasticsearch from pyspark. I was using the elasticsearch-hadoop api in Spark. The es cluster sits on aws emr, which requires credential to sign in. My script is as below:
from pyspark import SparkContext, SparkConf sc.stop()
conf = SparkConf().setAppName("ESTest") sc = SparkContext(conf=conf)
es_read_conf = { "es.host" : "vhost", "es.nodes" : "node", "es.port" : "443",
"es.query": '{ "query": { "match_all": {} } }',
"es.input.json": "true", "es.net.https.auth.user": "aws_access_key",
"es.net.https.auth.pass": "aws_secret_key", "es.net.ssl": "true",
"es.resource" : "index/type", "es.nodes.wan.only": "true"
}
es_rdd = sc.newAPIHadoopRDD( inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
keyClass="org.apache.hadoop.io.NullWritable",
valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=es_read_conf)
Pyspark keeps throwing error:
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on
[index] failed; servernode:443] returned [403|Forbidden:]
I checked everything which all made sense except for the user and pass entries, would aws access key and secret key work here? We don't want to use the console user and password here for security purpose. Is there a different way to do the same thing?

Submit an App by Url to Firefox Marketplace

I can install it with URL, but i can't upload to firefox marketplace.
but i have 2 errors:
JSON Parse Error
Error: The webapp extension could not be parsed due to a syntax error in the JSON.
No JSON object could be decoded: line 1 column 0 (char 0)
well the json is this:
{
"name": "Snake",
"description": "Snake in html and js",
"launch_path": "/index.html",
"developer": {
"name": "ZiTAL",
"url": "https://github.com/ZiTAL/snakejs"
},
"icons": {
"128": "/img/snake-128.png"
},
"installs_allowed_from": ["*"]
}
Second error:
Manifests must be served with the HTTP header "Content-Type: application/x-web-app-manifest+json". See https://developer.mozilla.org/docs/Web/Apps/Manifest#Serving_manifests for more information.
well if i downloaded with wget:
wget http://myurl/manifest.webapp
the header is OK
HTTP eskaera bidalia, erantzunaren zain... 200 OK
Luzera: 267 [application/x-web-app-manifest+json]
Saving to: ‘manifest.webapp’
To validate the app, you need to put the manifest.webapp url, not the app url:
http://myurl/manifest.webapp
Second error:
You could try wget --save-headers and look in the output file, if the Content-Type header is really correct...