Input data to AWS Elasticsearch using boto3 or the es library - amazon-web-services

I have a lot of data that I want to send to AWS Elasticsearch. Looking at the AWS guide at https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg-upload-data.html, it uses curl -XPUT. However, I want to do this from Python, so I've looked into the boto3 documentation but cannot find a way to upload data.
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/es.html I cannot see any method that inserts data.
This seems like a very basic job. Any help?

You can send the data to Elasticsearch using the HTTP interface. Here is the code, sourced from
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-request-signing.html
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

host = ''  # For example, my-test-domain.us-east-1.es.amazonaws.com
region = ''  # e.g. us-west-1
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
document = {
    "title": "Moneyball",
    "director": "Bennett Miller",
    "year": "2011"
}

es.index(index="movies", doc_type="_doc", id="5", body=document)
print(es.get(index="movies", doc_type="_doc", id="5"))
EDIT
To confirm whether the data was pushed to your Elasticsearch domain under your index, you can do an HTTP GET, replacing the domain and index name:
search-my-domain.us-west-1.es.amazonaws.com/movies/_search
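From Python, the same check can be done with requests (a minimal sketch reusing the awsauth object from the code above; the domain endpoint below is a placeholder):
import requests

# Placeholder domain endpoint; replace with your own
url = 'https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search'
r = requests.get(url, auth=awsauth)  # awsauth is the AWS4Auth object built above
print(r.json())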

Related

Not able to create a SageMaker endpoint with DataCaptureConfig enabled using the boto3 API

SageMaker version: 2.129.0
boto3 version: 1.26.57
Error details: ClientError: An error occurred (ValidationException) when calling the CreateEndpoint operation: One or more endpoint features are not supported using this configuration.
Steps to replicate the above issue:
# Download model
!aws s3 cp s3://sagemaker-sample-files/models/xgb-churn/xgb-churn-prediction-model.tar.gz model/
# Step 1 – Create model
import datetime
import boto3
import sagemaker
from sagemaker.s3 import S3Uploader
from sagemaker import get_execution_role

bucket = "sagemaker-us-east-x-xxxxxxxxxx"
prefix = "sagemaker/xgb"
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_session.region_name
sm_boto3 = boto3.client("sagemaker")
model_url = S3Uploader.upload(
    local_path="model/xgb-churn-prediction-model.tar.gz",
    desired_s3_uri=f"s3://{bucket}/{prefix}",
)
from sagemaker import image_uris
image_uri = image_uris.retrieve("xgboost", region, "0.90-1")
model_name = f"DEMO-xgb-churn-pred-model-{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}"
resp = sm_boto3.create_model(
    ModelName=model_name,
    ExecutionRoleArn=get_execution_role(),
    Containers=[{"Image": image_uri, "ModelDataUrl": model_url}],
)
# Step 2 – Create endpoint config
epc_name = f"DEMO-xgb-churn-pred-epc-{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}"
endpoint_config_response = sm_boto3.create_endpoint_config(
    EndpointConfigName=epc_name,
    ProductionVariants=[
        {
            'InstanceType': 'ml.m5.xlarge',
            'InitialInstanceCount': 1,
            'ModelName': model_name,
            'VariantName': 'production',
            'InitialVariantWeight': 1
        }
    ],
    DataCaptureConfig={
        'EnableCapture': True,
        'InitialSamplingPercentage': 50,
        'DestinationS3Uri': 's3://sagemaker-us-east-x-xxxxxxxxxx/sagemaker/xgb/',
        'CaptureOptions': [
            {
                'CaptureMode': 'InputAndOutput'
            },
        ],
        'CaptureContentTypeHeader': {
            'JsonContentTypes': [
                'application/json',
            ]
        }
    }
)
print('Endpoint configuration name: {}'.format(epc_name))
print('Endpoint configuration arn: {}'.format(endpoint_config_response['EndpointConfigArn']))
# Step 3 - Create endpoint
endpoint_name = f"DEMO-xgb-churn-pred-ep-{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}"
endpoint_params = {
    'EndpointName': endpoint_name,
    'EndpointConfigName': epc_name,
}
endpoint_response = sm_boto3.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=epc_name)
print('EndpointArn = {}'.format(endpoint_response['EndpointArn']))
Expected behaviour: Should be able to create an endpoint with data capture enabled using the boto3 API.
I was able to replicate your setup and it works for me as expected. I am using higher versions of boto3 and SageMaker; consider upgrading both and giving it a try!
boto3 version - 1.26.62
sagemaker version - 2.131.0
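If you are running in a notebook like the one in the question, a quick way to upgrade both libraries is a sketch like the following (assuming pip is available in the kernel environment; restart the kernel afterwards):
# Upgrade both libraries in the notebook environment, then restart the kernel
!pip install --upgrade "boto3>=1.26.62" "sagemaker>=2.131.0"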

OpenSearch authentication with opensearch-py on AWS Lambda

I am trying to connect to an AWS OpenSearch domain from AWS Lambda using the opensearch Python client (development purposes, non-production).
I was trying the following:
from opensearchpy import OpenSearch
import boto3
from requests_aws4auth import AWS4Auth
import os
import config
my_region = os.environ['AWS_REGION']
service = 'es' # still es???
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, my_region, service, session_token=credentials.token)
openSearch_endpoint = config.openSearch_endpoint
# sth wrong here:
openSearch_client = OpenSearch(hosts = [openSearch_endpoint], auth = awsauth)
as per the following blogs:
https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/search-example.html
but it does not work; it does not want to authenticate: "errorMessage":"AuthorizationException(403, '')". However, if I don't use the Python client but simply go through requests instead:
import requests
host = config.openSearch_endpoint
url = host + '/' +'_cat/indices?v'
# this one works:
r = requests.get(url, auth=awsauth)
then my Lambda function does communicate with the OpenSearch domain.
I consulted the OpenSearch() documentation but it is not clear to me how its parameters map to boto3 session credentials and/or to AWS4Auth. So what should this line
openSearch_client = OpenSearch(hosts = [openSearch_endpoint], auth = awsauth)
be?
I actually managed to find the solution a couple of hours later:
import os
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import config

my_region = os.environ['AWS_REGION']
service = 'es'  # still es?
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, my_region, service, session_token=credentials.token)
openSearch_endpoint = config.openSearch_endpoint

openSearch_client = OpenSearch(
    hosts=[openSearch_endpoint],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
    connection_class=RequestsHttpConnection
)
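To quickly verify that the client authenticates, a minimal check like the following should now succeed instead of raising the 403:
print(openSearch_client.info())  # returns basic cluster details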
OpenSearch Service requires port 443 for incoming requests, so you need to add a new Inbound Rule to the Security Group attached to your OpenSearch Service domain.
Try the rule:
Type: HTTPS
Protocol: TCP
Port range: 443
Source: 0.0.0.0/0 (Anywhere-IPv4)
Additionally, the domain should have a resource-based access policy that allows your Lambda function's execution role to perform requests against it.
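For reference, a domain access policy of this shape grants an execution role HTTP access to the domain (a sketch only; the account ID, role name, region and domain name below are placeholders):
import json

# All ARN components below are placeholders; substitute your own account, role and domain
access_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/my-lambda-execution-role"},
            "Action": "es:ESHttp*",
            "Resource": "arn:aws:es:us-west-1:123456789012:domain/my-domain/*"
        }
    ]
})
print(access_policy)  # paste the resulting JSON into the domain's access policy editor in the console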

How can I get reports from Google Cloud Storage using Google's API

I have to create a program that gets information on a daily basis about installations of a group of apps on the App Store and the Play Store.
For the Play Store, using Google Cloud Storage, I followed the instructions on this page using the client library, a service account, and the Python code example:
https://support.google.com/googleplay/android-developer/answer/6135870?hl=en&ref_topic=7071935
I slightly changed the given code to make it work, since the documentation does not look up to date. I managed to connect to the API and it seems to connect correctly.
My problem is that I don't understand what object I get back and how to use it. It's not a report; it just looks like file properties in a dict.
This is my code (private data "hidden"):
import json
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
client_email = '************.iam.gserviceaccount.com'
json_file = 'PATH/TO/MY/JSON/FILE'
cloud_storage_bucket = 'pubsite_prod_rev_**********'
report_to_download = 'stats/installs/installs_****************_202005_app_version.csv'
private_key = json.loads(open(json_file).read())['private_key']
credentials = ServiceAccountCredentials.from_json_keyfile_name(json_file, scopes='https://www.googleapis.com/auth/devstorage.read_only')
storage = build('storage', 'v1', http=credentials.authorize(Http()))
supposed_to_be_report = storage.objects().get(bucket=cloud_storage_bucket, object=report_to_download).execute()
When I print the supposed_to_be_report (which is a dictionary), I only get what I understand to be metadata about the report, like this:
{'kind': 'storage#object',
 'id': 'pubsite_prod_rev_***********/stats/installs/installs_****************_202005_app_version.csv/1591077412052716',
 'selfLink': 'https://www.googleapis.com/storage/v1/b/pubsite_prod_rev_***********/o/stats%2Finstalls%2Finstalls_*************_202005_app_version.csv',
 'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/pubsite_prod_rev_***********/o/stats%2Finstalls%2Finstalls_****************_202005_app_version.csv?generation=1591077412052716&alt=media',
 'name': 'stats/installs/installs_***********_202005_app_version.csv',
 'bucket': 'pubsite_prod_rev_***********',
 'generation': '1591077412052716',
 'metageneration': '1',
 'contentType': 'text/csv; charset=utf-16le',
 'storageClass': 'STANDARD',
 'size': '378',
 'md5Hash': '*****==',
 'contentEncoding': 'gzip', ...}
I am not sure I'm using it correctly. Could you please explain where I am wrong and/or how to get the installs reports correctly?
Thanks.
I can see that you are using the googleapiclient.discovery client. This is not an issue, but the recommended way to access Google Cloud APIs programmatically is by using the client libraries.
Second, you are just retrieving the object's metadata. You need to download the object to get access to the file contents. Here is a sample using the client library:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
Sample taken from official docs.
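In your setup, a call along these lines should fetch the report (a sketch reusing the cloud_storage_bucket and report_to_download values from the question; it assumes GOOGLE_APPLICATION_CREDENTIALS points at the service account JSON file, and the local file name is a placeholder). Per the contentType metadata above, the installs report is a UTF-16LE encoded CSV:
# Assumes GOOGLE_APPLICATION_CREDENTIALS is set to the service account JSON file
download_blob(cloud_storage_bucket, report_to_download, 'installs_202005_app_version.csv')

# The installs report is a UTF-16LE encoded CSV, as shown in the object's contentType metadata
with open('installs_202005_app_version.csv', encoding='utf-16-le') as f:
    print(f.read())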

How to get EC2 memory utilization using the command line (AWS CLI)

I am trying to get EC2 memory utilization using the AWS CLI and I see that EC2MemoryUtilization is not available as a metric. I installed the CloudWatch agent on the EC2 instance and I have created a dashboard for mem_used_percent.
Now I want to consume the memory-usage data points programmatically. I could find CPUUtilization but I am unable to find anything for memory utilization.
Any help in this regard is appreciated. Thanks!
This Python script pushes the system memory metrics to CloudWatch in a custom namespace. Schedule the script in crontab to run every 1 or 5 minutes to plot the system memory metrics over time. Ensure that the IAM role assigned to the VM has sufficient privileges to put metric data to CloudWatch.
#!/usr/bin/env python
import psutil
import requests
import json
import os
import boto3

get_memory = psutil.virtual_memory()
free_memory = get_memory.free / (1024 * 1024 * 1024)
print("Free Memory:", free_memory, "GB")

# Fetch temporary credentials for the instance role from the instance metadata service
headers = {'content-type': 'application/json'}
req = requests.get(url='http://169.254.169.254/latest/meta-data/iam/security-credentials/cloudwatch_access', headers=headers)
res = json.loads(req.text)
AccessKeyId = res['AccessKeyId']
SecretAccessKey = res['SecretAccessKey']
Token = res['Token']
Region = "ap-south-1"
os.environ["AWS_ACCESS_KEY_ID"] = AccessKeyId
os.environ["AWS_SECRET_ACCESS_KEY"] = SecretAccessKey
os.environ["AWS_SESSION_TOKEN"] = Token
os.environ["AWS_DEFAULT_REGION"] = Region

namespace = 'mynamespace'
dimension_name = 'my_dimension_name'
dimension_value = 'my_dimension_value'
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
    MetricData=[
        {
            'MetricName': 'Free Memory',
            'Dimensions': [
                {
                    'Name': dimension_name,
                    'Value': dimension_value
                },
            ],
            'Unit': 'Gigabytes',
            'Value': free_memory
        },
    ],
    Namespace=namespace
)
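To consume the data points programmatically (either the custom metric above or the agent's mem_used_percent), you can query CloudWatch with boto3. A minimal sketch, assuming the CloudWatch agent publishes mem_used_percent under its default CWAgent namespace with an InstanceId dimension (the instance ID below is a placeholder):
import datetime
import boto3

cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
    Namespace='CWAgent',  # default namespace used by the CloudWatch agent
    MetricName='mem_used_percent',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],  # placeholder instance ID
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
    EndTime=datetime.datetime.utcnow(),
    Period=300,
    Statistics=['Average'],
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])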

AWS S3 Bucket Upload/Transfer with boto3

I need to upload files to S3 and I was wondering which boto3 API call I should use.
I have found two methods in the boto3 documentation:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.upload_file
http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html
Do I use the client.upload_file() ...
#!/usr/bin/python
import boto3

session = boto3.Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, region_name=region)
s3 = session.resource('s3')
s3.Bucket('my_bucket').upload_file('/tmp/hello.txt', 'hello.txt')
or do I use S3Transfer.upload_file() ...
#!/usr/bin/python
import boto3
from boto3.s3.transfer import S3Transfer

session = boto3.Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, region_name=region)
S3Transfer(session.client('s3')).upload_file('/tmp/hello.txt', 'my_bucket', 'hello.txt')
Any suggestions would be appreciated. Thanks in advance.
Possible solution...
# http://boto3.readthedocs.io/en/latest/reference/services/s3.html#examples
# http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
# http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
import boto3

client = boto3.client("s3", "us-west-1", aws_access_key_id="xxxxxxxx", aws_secret_access_key="xxxxxxxxxx")
with open('drop_spot/my_file.txt', 'rb') as f:
    client.put_object(Bucket='s3uploadertestdeleteme', Key='my_file.txt', Body=f)
response = client.get_object(Bucket='s3uploadertestdeleteme', Key='my_file.txt')
print("Done, response body: {}".format(response['Body'].read()))
It's better to use the methods on the client. They're the same underneath, but using the client methods means you don't have to set things up yourself.
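For example, a minimal sketch of the client-level call (the bucket name and paths are placeholders):
import boto3

s3_client = boto3.client('s3')
# upload_file manages multipart uploads and retries for you
s3_client.upload_file('/tmp/hello.txt', 'my_bucket', 'hello.txt')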
You can use the client (low-level service access). I saw sample code at https://www.techblog1.com/2020/10/python-3-how-to-communication-with-aws.html