Importing an AWS image into SoftLayer - amazon-web-services

Is it possible to import an AWS image into SoftLayer directly? I know we can download the AWS image and then import it into SoftLayer, but I was looking for an automated solution.

There is no SoftLayer API method that performs the whole process automatically. The image must first be uploaded to one of your Object Storage accounts; you can use the API to upload the image there. Here are some references:
http://sldn.softlayer.com/blog/waelriac/Managing-SoftLayer-Object-Storage-Through-REST-APIs
and see this documentation on how to handle large files:
https://docs.openstack.org/developer/swift/overview_large_objects.html
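For the upload step itself, here is a minimal sketch (not part of the original answer) using python-swiftclient and the Dynamic Large Object pattern described in that documentation; the auth endpoint, credentials, container names and segment size are placeholders:

import swiftclient

SEGMENT_SIZE = 1024 * 1024 * 1024  # 1 GB segments

# placeholder endpoint and credentials for the target cluster
conn = swiftclient.Connection(
    authurl='https://dal05.objectstorage.softlayer.net/auth/v1.0',
    user='SLOS307608-10:SL123456',
    key='object storage API key',
)

# upload the image in segments: OS_segments/testImage2.vhd/00000000, 00000001, ...
with open('testImage2.vhd', 'rb') as image:
    index = 0
    while True:
        chunk = image.read(SEGMENT_SIZE)
        if not chunk:
            break
        conn.put_object('OS_segments', 'testImage2.vhd/%08d' % index, contents=chunk)
        index += 1

# the manifest object in the OS container stitches the segments together
conn.put_object('OS', 'testImage2.vhd', contents='',
                headers={'X-Object-Manifest': 'OS_segments/testImage2.vhd/'})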
Once the file has been uploaded you can import it using the API.
Here is an example using the SoftLayer Python client:
"""
Create Image Template from external source
This script creates a transaction to import a disk image from an external source and create
a standard image template
Important manual pages:
http://sldn.softlayer.com/reference/services/SoftLayer_Virtual_Guest_Block_Device_Template_Group/createFromExternalSource
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Container_Virtual_Guest_Block_Device_Template_Configuration
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Virtual_Guest_Block_Device_Template_Group
License: http://sldn.softlayer.com/article/License
Author: SoftLayer Technologies, Inc. <sldn@softlayer.com>
"""
import SoftLayer
# Your SoftLayer username and apiKey
USERNAME = 'set me'
API_KEY = 'set me'
# Declare the group name to be applied to the imported template
name = 'imageTemplateTest'
# Declare the note to be applied to the imported template
note = 'This is for test Rcv'
'''
Declare referenceCode of the operating system software description for the imported VHD
available options: CENTOS_6_32, CENTOS_6_64, CENTOS_7_64, REDHAT_6_32, REDHAT_6_64, REDHAT_7_64,
UBUNTU_12_32, UBUNTU_12_64, UBUNTU_14_32, UBUNTU_14_64, WIN_2003-STD-SP2-5_32, WIN_2003-STD-SP2-5_64,
WIN_2008-STD-R2-SP1_64, WIN_2012-STD_64.
'''
operatingSystemReferenceCode = 'CENTOS_6_64'
'''
Define the parameters below, which refer to the object storage where the image template is stored.
They are used to build the URI.
'''
# Declare the object storage account name
objectStorageAccountName = 'SLOS307608-10'
# Declare the cluster name where the image is stored
clusterName = 'dal05'
# Declare the container name where the image is stored
containerName = 'OS'
# Declare the file name of the image stored in the object storage; it should be a .vhd or .iso file
fileName = 'testImage2.vhd-0.vhd'
"""
Create a SoftLayer_Container_Virtual_Guest_Block_Device_Template_Configuration skeleton
which contains the information from the external source
"""
configuration = {
    'name': name,
    'note': note,
    'operatingSystemReferenceCode': operatingSystemReferenceCode,
    'uri': 'swift://' + objectStorageAccountName + '@' + clusterName + '/' + containerName + '/' + fileName
}
# Declare the API client
client = SoftLayer.Client(username=USERNAME, api_key=API_KEY)
groupService = client['SoftLayer_Virtual_Guest_Block_Device_Template_Group']
try:
    result = groupService.createFromExternalSource(configuration)
    print(result)
except SoftLayer.SoftLayerAPIError as e:
    print("Unable to create the image template from external source. faultCode=%s, faultString=%s"
          % (e.faultCode, e.faultString))
    exit(1)
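createFromExternalSource returns the new SoftLayer_Virtual_Guest_Block_Device_Template_Group record and starts an import transaction; the template becomes available for provisioning once that transaction completes.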
Regards

Related

SageMaker Monitoring Tutorial boto3 Object Function Type Error

I am following the steps in the SageMaker Monitoring Tutorial here:
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html
And for the line:
bucket.Object(code_prefix + "/preprocessor.py").upload_file("preprocessor.py")
I get the error:
TypeError: expected string or bytes-like object
which I don't understand, because the input to the upload_file() function is "preprocessor.py", which is a string.
To unblock you quickly, take the SageMaker Session route to upload the artifact to the S3 bucket:
import sagemaker
from sagemaker import get_execution_role

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket and prefix
bucket = sagemaker.Session().default_bucket()
artifact_name = "preprocessor.py"
prefix = "code"  # change as required
sample_url = sagemaker.Session().upload_data(
    artifact_name, bucket=bucket, key_prefix=prefix
)
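sagemaker.Session().upload_data returns the S3 URI of the uploaded object, so with the bucket and prefix assumed above, sample_url would look like s3://<your-default-bucket>/code/preprocessor.py and can be passed wherever the tutorial expects the script's S3 location.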

How to add a PubSub destination while creating a job from the preset "preset/web-hd" using the Google Cloud Transcoder API?

I am converting a video, uploaded to Cloud Storage using a signed URL, with the Transcoder API. I have written a Cloud Function that is triggered by write operations on the bucket. Everything is working fine, but I need to get a notification when the conversion is completed. I am creating the job to convert the video using the following code, following the solution proposed in this answer: Google Cloud Platform: Convert uploaded MP4 file to HLS file.
def create_job_from_preset(project_id, location, input_uri, output_uri, preset):
    """Creates a job based on a job preset.
    Args:
        project_id: The GCP project ID.
        location: The location to start the job in.
        input_uri: Uri of the video in the Cloud Storage bucket.
        output_uri: Uri of the video output folder in the Cloud Storage bucket.
        preset: The preset template (for example, 'preset/web-hd')."""
    client = TranscoderServiceClient()
    parent = f"projects/{project_id}/locations/{location}"
    job = transcoder_v1.types.Job()
    job.input_uri = input_uri
    job.output_uri = output_uri
    job.template_id = preset
    job.ttl_after_completion_days = 1
    job.config = transcoder_v1.types.JobConfig(
        PubsubDestination={
            topic_name=f"projects/{project_id}/topics/testing"
        }
    )
    response = client.create_job(parent=parent, job=job)
    print(f"Job: {response.name}")
    return response
The following snippet in the above code is not working
job.config = transcoder_v1.types.JobConfig(
    PubsubDestination={
        topic_name=f"projects/{project_id}/topics/testing"
    }
)
I have viewed the following but couldn't find any solution.
https://cloud.google.com/transcoder/docs/how-to/create-pub-sub
How to configure pubsub_destination in Transcoder API of GCP
You cannot define any configuration in your JobConfig in your code if you are creating a job from a preset or template, since the preset or template will already populate the JobConfig for you.
As an alternative, you may create the job using an ad-hoc configuration and then define PubsubDestination as shown in the code below.
Note that I also corrected the syntax for PubsubDestination.
from google.cloud.video import transcoder_v1
from google.cloud.video.transcoder_v1.services.transcoder_service import (
    TranscoderServiceClient,
)

def create_job_from_ad_hoc(project_id, location, input_uri, output_uri):
    """Creates a job based on an ad-hoc job configuration.
    Args:
        project_id: The GCP project ID.
        location: The location to start the job in.
        input_uri: Uri of the video in the Cloud Storage bucket.
        output_uri: Uri of the video output folder in the Cloud Storage bucket."""
    client = TranscoderServiceClient()
    parent = f"projects/{project_id}/locations/{location}"
    job = transcoder_v1.types.Job()
    job.input_uri = input_uri
    job.output_uri = output_uri
    job.config = transcoder_v1.types.JobConfig(
        elementary_streams=[
            transcoder_v1.types.ElementaryStream(
                key="video-stream0",
                video_stream=transcoder_v1.types.VideoStream(
                    h264=transcoder_v1.types.VideoStream.H264CodecSettings(
                        height_pixels=360,
                        width_pixels=640,
                        bitrate_bps=550000,
                        frame_rate=60,
                    ),
                ),
            ),
            transcoder_v1.types.ElementaryStream(
                key="video-stream1",
                video_stream=transcoder_v1.types.VideoStream(
                    h264=transcoder_v1.types.VideoStream.H264CodecSettings(
                        height_pixels=720,
                        width_pixels=1280,
                        bitrate_bps=2500000,
                        frame_rate=60,
                    ),
                ),
            ),
            transcoder_v1.types.ElementaryStream(
                key="audio-stream0",
                audio_stream=transcoder_v1.types.AudioStream(
                    codec="aac", bitrate_bps=64000
                ),
            ),
        ],
        mux_streams=[
            transcoder_v1.types.MuxStream(
                key="sd",
                container="mp4",
                elementary_streams=["video-stream0", "audio-stream0"],
            ),
            transcoder_v1.types.MuxStream(
                key="hd",
                container="mp4",
                elementary_streams=["video-stream1", "audio-stream0"],
            ),
        ],
        pubsub_destination=transcoder_v1.types.PubsubDestination(
            topic=f"projects/{project_id}/topics/your-topic"
        ),
    )
    response = client.create_job(parent=parent, job=job)
    print(f"Job: {response.name}")
    return response
Another alternative is to create your own job template and then use it as the template_id, so that you don't have to define PubsubDestination in your code every time.
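A minimal sketch of that template route (not part of the original answer); the template id, topic name and stream settings are placeholders that mirror the ad-hoc example above:

from google.cloud.video import transcoder_v1
from google.cloud.video.transcoder_v1.services.transcoder_service import (
    TranscoderServiceClient,
)

def create_template_with_pubsub(project_id, location, template_id="web-hd-with-pubsub"):
    client = TranscoderServiceClient()
    parent = f"projects/{project_id}/locations/{location}"
    template = transcoder_v1.types.JobTemplate()
    template.config = transcoder_v1.types.JobConfig(
        elementary_streams=[
            transcoder_v1.types.ElementaryStream(
                key="video-stream0",
                video_stream=transcoder_v1.types.VideoStream(
                    h264=transcoder_v1.types.VideoStream.H264CodecSettings(
                        height_pixels=720, width_pixels=1280,
                        bitrate_bps=2500000, frame_rate=60,
                    ),
                ),
            ),
            transcoder_v1.types.ElementaryStream(
                key="audio-stream0",
                audio_stream=transcoder_v1.types.AudioStream(codec="aac", bitrate_bps=64000),
            ),
        ],
        mux_streams=[
            transcoder_v1.types.MuxStream(
                key="hd", container="mp4",
                elementary_streams=["video-stream0", "audio-stream0"],
            ),
        ],
        pubsub_destination=transcoder_v1.types.PubsubDestination(
            topic=f"projects/{project_id}/topics/your-topic"
        ),
    )
    return client.create_job_template(
        parent=parent, job_template=template, job_template_id=template_id
    )

# A job created later can then simply reference it:
#     job.template_id = "web-hd-with-pubsub"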

SageMaker Python SDK: accessing custom_attributes in an inference job

I am using the SageMaker Python SDK for my inference job and following this guide. I am triggering my SageMaker inference job from Airflow with the Python callable below:
def transform(sage_role, inference_file_local_path, **kwargs):
    """
    Python callable to execute a SageMaker SDK batch transform job. It takes infer_batch_output,
    infer_batch_input, model_artifact, instance_type and infer_file_name as runtime parameters.
    :param inference_file_local_path: Local entry_point path for the inference file.
    :param sage_role: SageMaker execution role.
    """
    model = TensorFlowModel(entry_point=infer_file_name,
                            source_dir=inference_file_local_path,
                            model_data=model_artifact,
                            role=sage_role,
                            framework_version="2.5.1")
    tensorflow_serving_transformer = model.transformer(
        instance_count=1,
        instance_type=instance_type,
        accept="text/csv",
        strategy="SingleRecord",
        max_payload=10,
        max_concurrent_transforms=10,
        output_path=batch_output)
    return tensorflow_serving_transformer.transform(data=batch_input, content_type='text/csv')
and my simple inference.py looks like:
def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/x-npy':
        # very simple numpy handler
        payload = np.load(data.read().decode('utf-8'))
        x_user_feature = np.asarray(payload.item().get('test').get('feature_a_list'))
        x_channel_feature = np.asarray(payload.item().get('test').get('feature_b_list'))
        examples = []
        for index, elem in enumerate(x_user_feature):
            examples.append({'feature_a_list': elem, 'feature_b_list': x_channel_feature[index]})
        return json.dumps({'instances': examples})
    if context.request_content_type == 'text/csv':
        payload = pd.read_csv(data)
        print("Model name is ..............")
        model_name = context.model_name
        print(model_name)
        examples = []
        row_ch = []
        if config_exists(model_bucket, "{}{}".format(config_path, model_name)):
            config_keys = get_s3_json_file(model_bucket, "{}{}".format(config_path, model_name))
            feature_b_list = config_keys["feature_b_list"].split(",")
            row_ch = [float(ch_feature_str) for ch_feature_str in feature_b_list]
            if "column_names" in config_keys.keys():
                cols = config_keys["column_names"].split(",")
                payload.columns = cols
        for index, row in payload.iterrows():
            row_user = row['feature_a_list'].replace('[', '').replace(']', '').split()
            row_user = [float(x) for x in row_user]
            if not row_ch:
                row_ch = row['feature_b_list'].replace('[', '').replace(']', '').split()
                row_ch = [float(x) for x in row_ch]
            example = {'feature_a_list': row_user, 'feature_b_list': row_ch}
            examples.append(example)
        # (assumed) return the assembled examples, mirroring the x-npy branch above
        return json.dumps({'instances': examples})
    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))
def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
It is working fine, however I want to pass custom arguments to inference.py so that I can modify the input data based on the requirement. I thought of using a config file per requirement and downloading it from S3 based on the model name, but since I am using model_data and passing model.tar.gz at runtime, context.model_name is always None.
Is there a way I can pass a runtime argument to inference.py that I can use for customization?
In the docs I see SageMaker provides custom_attributes, but I don't see any example of how to use it and access it in inference.py.
custom_attributes (string): content of ‘X-Amzn-SageMaker-Custom-Attributes’ header from the original request. For example, ‘tfs-model-name=half_plus_three,tfs-method=predict’
Currently CustomAttributes is supported in the InvokeEndpoint API call when using a realtime Endpoint.
As an example, you can pass JSON Lines as input to your Transform Job, where each record contains the input payload and some custom arguments which you can consume in your inference.py file.
For example,
{
    "input": "1,2,3,4",
    "custom_args": "my_custom_arg"
}
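A minimal sketch (the field names and content type are illustrative, not from the original answer) of consuming such JSON Lines records inside input_handler:

import json

def input_handler(data, context):
    if context.request_content_type == 'application/jsonlines':
        instances = []
        for line in data.read().decode('utf-8').splitlines():
            record = json.loads(line)
            features = [float(x) for x in record["input"].split(",")]
            # branch your preprocessing on the custom argument carried with the payload
            custom_args = record.get("custom_args")
            instances.append(features)
        return json.dumps({'instances': instances})
    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))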

Search for 2 strings in multiple PDFs in an AWS S3 bucket with subdirectories, without downloading them to the local machine

I'm looking to search for two words in multiple PDFs located in an AWS S3 bucket. However, I don't want to download those docs to my local machine; instead, the search should run directly on those PDFs via URL. Note that these PDFs are located in multiple subdirectories within the bucket (year folder, then month folder, then date).
Amazon S3 does not have a 'Search' capability. It is a "simple storage service".
You would either need to download those documents to some form of compute platform (e.g. EC2, Lambda, or your own computer) and perform the searches there, or you could pre-index the documents using a service like Amazon OpenSearch Service and then send the query to the search service.
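A minimal sketch of the first option (bring the PDFs to compute and search); the bucket name, prefix, search terms and the pypdf dependency are assumptions, and each object is streamed into memory rather than written to disk:

import io
import boto3
from pypdf import PdfReader  # assumption: pypdf (or PyPDF2) is installed

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
terms = ("invoice", "overdue")  # the two words to search for

for page in paginator.paginate(Bucket="my-bucket", Prefix="2023/"):
    for obj in page.get("Contents", []):
        if not obj["Key"].lower().endswith(".pdf"):
            continue
        # read the object body into memory and extract text from every page
        body = s3.get_object(Bucket="my-bucket", Key=obj["Key"])["Body"].read()
        text = " ".join(p.extract_text() or "" for p in PdfReader(io.BytesIO(body)).pages)
        if all(t in text.lower() for t in terms):
            print(obj["Key"])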
Running a direct scan of PDFs to search for text in an S3 bucket is HARD:
Some PDFs contain text that is embedded inside images (it is not readable as text).
If you want to process a PDF without saving it to disk, consider using memory-optimized machines and in-memory streams instead of writing the files to the virtual machine's drive.
To get around text inside images you need OCR logic, which is also HARD to get right. You'll probably want to use AWS Textract or Google Vision for OCR. If compliance and security are an issue, you could use Tesseract.
If you do have a reliable OCR solution, I would suggest running a text-extraction job whenever an upload event happens. This will save you a lot of money on whatever OCR service you consume, and it also lets your organization cache the contents of the PDFs in text form in more search-friendly services like AWS OpenSearch.
Here's a tutorial which uses Tika (for PDF text extraction) and OpenSearch (as the search engine) to search the contents of PDF files within an S3 bucket:
import boto3
from tika import parser
from opensearchpy import OpenSearch
from config import *
import sys

# opensearch client
os = OpenSearch(opensearch_uri)

s3_file_name = "prescription.pdf"
bucket_name = "mixpeek-demo"

def download_file():
    """Download the file, extract its text and index it
    :param str s3_file_name: name of the s3 file
    :param str bucket_name: bucket where the s3 file is stored
    """
    # s3 boto3 client instantiation
    s3_client = boto3.client(
        's3',
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=region_name
    )
    # download the object to a local file
    with open(s3_file_name, 'wb') as file:
        s3_client.download_fileobj(
            bucket_name,
            s3_file_name,
            file
        )
    print("file downloaded")
    # parse the file
    parsed_pdf_content = parser.from_file(s3_file_name)['content']
    print("file contents extracted")
    # insert parsed pdf content into the search engine
    insert_into_search_engine(s3_file_name, parsed_pdf_content)
    print("file contents inserted into search engine")

def insert_into_search_engine(s3_file_name, parsed_pdf_content):
    """Index the extracted PDF content
    :param str s3_file_name: name of the s3 file
    :param str parsed_pdf_content: extracted contents of the PDF file
    """
    doc = {
        "filename": s3_file_name,
        "parsed_pdf_content": parsed_pdf_content
    }
    # insert
    resp = os.index(
        index=index_name,
        body=doc,
        id=1,
        refresh=True
    )
    print('\nAdding document:')
    print(resp)

def create_index():
    """Create the index
    """
    index_body = {
        'settings': {
            'index': {
                'number_of_shards': 1
            }
        }
    }
    response = os.indices.create(index_name, body=index_body)
    print('\nCreating index:')
    print(response)

if __name__ == '__main__':
    globals()[sys.argv[1]]()
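Since the script dispatches on its first command-line argument via globals()[sys.argv[1]](), a run would look like python search_pdfs.py create_index followed by python search_pdfs.py download_file, assuming the script is saved as search_pdfs.py and config.py defines the AWS credentials, opensearch_uri and index_name.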
full tutorial: https://medium.com/@mixpeek/search-text-from-pdf-files-stored-in-an-s3-bucket-2f10947eebd3
Corresponding github repo: https://github.com/mixpeek/pdf-search-s3

Using different settings on an AWS Lambda function coded in Python

I'm using a Lambda function, coded in Python, as the backend for an AWS API Gateway method.
The API is complete, but now I have a new problem: it should be deployed to multiple environments (production, test, etc.), and each one should use a different configuration for the backend. Let's say I have this handler:
import settings
import boto3

def dummy_handler(event, context):
    logger.info('got event{}'.format(event))
    utils = Utils(event["stage"])
    response = utils.put_ticket_on_dynamodb(event["item"])
    return json.dumps(response)

class Utils:
    def __init__(self, stage):
        self.stage = stage

    def put_ticket_on_dynamodb(self, item):
        # Write record to DynamoDB
        try:
            dynamodb = boto3.resource('dynamodb')
            table = dynamodb.Table(settings.TABLE_NAME)
            table.put_item(Item=item)
        except Exception as e:
            logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
            raise
        logger.info("Item successfully written to DynamoDB")
        return item
Now, in order to use a different TABLE_NAME for each stage, I replaced the settings.py file with a package with this structure:
settings/
    __init__.py
    _base.py
    _servers.py
    development.py
    production.py
    testing.py
Following this answer here.
But I have no idea how to use it in my solution, considering that stage (passed as a parameter to the Utils class) will match the settings filename in the settings package. What should I change in my Utils class to make it work?
Another alternative for handling this use case is to use API Gateway's stage variables and pass the settings which vary by stage as parameters to your Lambda function.
Stage variables are name-value pairs associated with a specific API deployment stage and act like environment variables for use in your API setup and mapping templates. For example, you can configure an API method in each stage to connect to a different backend endpoint by setting different endpoint values in your stage variables.
Here is a blog post on using stage variables.
Here is the full documentation on using stage variables.
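A minimal sketch of that route (not from the original answer), assuming the method's mapping template forwards a stage variable named tableName alongside the item, e.g. { "item": $input.json('$'), "tableName": "$stageVariables.tableName" }:

import json
import boto3

def dummy_handler(event, context):
    # the table name arrives per stage via the API Gateway stage variable
    table_name = event["tableName"]
    dynamodb = boto3.resource('dynamodb')
    dynamodb.Table(table_name).put_item(Item=event["item"])
    return json.dumps({"status": "ok"})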
I finally used a different approach. Instead of a Python package for the settings, I used a single settings script with a dictionary containing the configuration for each environment. I would like to use a separate settings script for each environment, but so far I haven't found how.
So, now my settings file looks like this:
COUNTRY_CODE = 'CL'
TIMEZONE = "America/Santiago"
LOCALE = "es_CL"
DEFAULT_PAGE_SIZE = 20
ENV = {
    'production': {
        'TABLE_NAME': "dynamodbTable",
        'BUCKET_NAME': "sssBucketName"
    },
    'testing': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    },
    'test-invoke-stage': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    }
}
And my code:
def put_ticket_on_dynamodb(self, item):
    # Write record to DynamoDB
    try:
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(settings.ENV[self.stage]["TABLE_NAME"])
        table.put_item(Item=item)
    except Exception as e:
        logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
        raise
    logger.info("Item successfully written to DynamoDB")
    return item
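For the "separate settings script per environment" idea mentioned above, a minimal sketch (an assumption, not part of the original answer) using importlib, given that each module in the settings package (settings/production.py, settings/testing.py, ...) defines TABLE_NAME and BUCKET_NAME:

import importlib

def load_settings(stage):
    # e.g. stage == "production" resolves to settings/production.py
    return importlib.import_module("settings.{}".format(stage))

# inside Utils.put_ticket_on_dynamodb:
#     stage_settings = load_settings(self.stage)
#     table = dynamodb.Table(stage_settings.TABLE_NAME)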