Consuming APIs using AWS Lambda - amazon-web-services

I am a newcomer to AWS with very little cloud experience. The project I have is to call an API from NOAA, parse the returned XML document, and save the results to a database. I have an ASP.NET console app that does this pretty easily and successfully. However, I need to do the same thing in the cloud on a serverless architecture. Here are the steps I want it to take:
Lambda calls the NOAA API every day at midnight
The API returns an XML doc with results
Parse the data and save it to a cloud PostgreSQL database
It sounds simple, but I am having one heck of a time figuring out how to do this. I have a DB provisioned in AWS, as that is where data currently goes via my console app. Does anyone have any advice or a resource I could look at? Also, I would prefer to keep this in .NET, but realize that I may need to move it to Python.
Thanks in advance everyone!

It's pretty simple, and you can test your setup with the simple Python Lambda code below.
Create a new Lambda function with admin access (temporarily set an Admin role; you can restrict it to the required role later)
Add the following code
https://github.com/mmakadiya/public_files/blob/main/lambda_call_get_rest_api.py
```python
import json
import urllib3

def lambda_handler(event, context):
    # TODO implement
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://api.open-notify.org/astros.json')
    print(r.data)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```
The above code calls a REST API to fetch data. It is just a sample program, but it should help you go further.
MAKE SURE you account for the Lambda function maximum run time of 15 minutes; it cannot run longer than that, so plan accordingly.
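To connect that back to the original question, here is a rough sketch of a handler that fetches XML and writes rows to PostgreSQL. The NOAA endpoint, environment variables, XML tag names, and table schema are all placeholders, and psycopg2 would need to be packaged with the function or supplied as a Lambda layer:
```python
import os
import urllib3
import xml.etree.ElementTree as ET
import psycopg2  # must be bundled in the deployment package or a layer

# Placeholder configuration: adjust to the real NOAA endpoint and your schema.
NOAA_URL = os.environ['NOAA_API_URL']

def lambda_handler(event, context):
    # 1. Call the NOAA API
    http = urllib3.PoolManager()
    resp = http.request('GET', NOAA_URL)

    # 2. Parse the returned XML document (tag names are illustrative)
    root = ET.fromstring(resp.data)
    rows = [(item.findtext('station'), item.findtext('value'))
            for item in root.iter('observation')]

    # 3. Save the parsed rows to the PostgreSQL instance
    conn = psycopg2.connect(
        host=os.environ['DB_HOST'],
        dbname=os.environ['DB_NAME'],
        user=os.environ['DB_USER'],
        password=os.environ['DB_PASSWORD'],
    )
    with conn, conn.cursor() as cur:
        cur.executemany(
            'INSERT INTO observations (station, value) VALUES (%s, %s)', rows)
    conn.close()

    return {'statusCode': 200, 'body': f'{len(rows)} rows inserted'}
```
The daily midnight run can be scheduled with an Amazon EventBridge (CloudWatch Events) rule targeting the function. Note also that .NET is a supported Lambda runtime, so the existing console-app logic could be ported rather than rewritten in Python.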

Related

Need recommendation to create an API by aggregating data from multiple source APIs

Before I start doing this I wanted to get advice from the community on the best and most efficient manner to go about doing it.
Here is what I want to do:
Ingest data from multiple APIs, which return JSON
Store it in either S3 or DynamoDB
Modify the data to use my JSON structure
Pipe out the aggregate data as an API
The data will be updated twice a day, so I would pull in the data from the source APIs and put it through my pipeline twice a day.
So basically I want to create an API by aggregating data from multiple source APIs.
I've started playing with Lambda and created the following function using Python.
```python
# https://stackoverflow.com/a/41765656
import requests
import json

def lambda_handler(event, context):
    # https://www.nylas.com/blog/use-python-requests-module-rest-apis/ USEFUL!!!
    # https://stackoverflow.com/a/65896274
    response = requests.get("https://remoteok.com/api")
    # print(response.json())
    return {
        'statusCode': 200,
        'body': response.json()
    }
# https://stackoverflow.com/questions/63733410/using-lambda-to-add-json-to-dynamodb DYNAMODB
```
This works and returns a JSON response.
Here are my questions:
Should I store the data on S3 or DynamoDB?
Which AWS service should I use to aggregate the data into my JSON structure?
Which service should I use to publish the aggregate data as an API, API Gateway?
However, before I go further I would like to know what is the best way to go about doing this.
If you have experience with this I would love to hear from you.
The answer will vary depending on the quantity of data you're planning to mine. Lambdas are designed for short-duration, high-frequency workloads and thus might not be suitable.
I would recommend looking into AWS Glue, as this seems like a fairly typical ETL (Extract, Transform, Load) problem. You can set up Glue jobs to run on a schedule, and as for data aggregation, that's the T in ETL.
It's simple to output the Glue frame (the result of a transformation) as S3 files, which can then be queried directly by Amazon Athena (as if they were database tables).
As for exposing that data via an API, the serverless framework or SST are great tools for taking the sting out of spinning up a serverless API and associated resources.
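To make that concrete, here is a minimal sketch of what such a Glue job script could look like, assuming the raw JSON from the source APIs has already been landed in S3. The bucket paths and field mappings below are placeholders, not your actual schema:
```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args['JOB_NAME'], args)

# Extract: read the raw JSON previously dumped to S3 (path is a placeholder)
raw = glue_context.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['s3://my-raw-bucket/source-api/']},
    format='json',
)

# Transform: keep and rename only the fields needed for the target JSON
# structure (mappings here are illustrative)
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[
        ('id', 'string', 'job_id', 'string'),
        ('position', 'string', 'title', 'string'),
    ],
)

# Load: write Parquet to S3 so Athena can query it directly
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type='s3',
    connection_options={'path': 's3://my-curated-bucket/aggregated/'},
    format='parquet',
)

job.commit()
```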

AWS Lambda function is not updating in basic test environment

I'm trying to set up a very basic AWS Lambda script, but I struggle to get the AWS Lambda Test functionality to recognize the changes I make.
To setup the simplest possible test, I created a new AWS Lambda function for Python 3.7. I then make a simple change in the code as below, add a test output and run Test:
```python
import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('I changed this text')
    }
```
I've verified that Version: is set to $LATEST - yet when I run the test, there is no change in my output - it keeps returning the original code output. Further, if I try to export the function, I get the original code - not my updated code above (despite having saved it).
I realize this seems very basic, but I wanted to check if others have experienced this as well.
Based on feedback, it seems hitting Deploy is required in order to test the updated function.
Even with hitting Deploy, there's a definite delay. Try adding a print statement, hitting Deploy, and then testing. This will show that it often doesn't accept the new code right away. EXTREMELY frustrating for debugging. I literally have to refresh my Lambda console page to get the changes to take.
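One way to check what is actually deployed, rather than what the editor shows, is to invoke the function from outside the console with boto3. A minimal sketch, assuming a function named my-test-function and working local credentials:
```python
import json
import boto3

# Invoke the deployed function and print its real response payload.
client = boto3.client('lambda')
response = client.invoke(
    FunctionName='my-test-function',   # placeholder name
    InvocationType='RequestResponse',
    Payload=json.dumps({}).encode(),
)
print(json.loads(response['Payload'].read()))
```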

import JSON file to google firestore using cloud functions

I'm new to GCP and Python. I have been given a task to import a JSON file into Google Firestore using Google Cloud Functions via Python.
Kindly assist please.
I was able to achieve this setup using the code below. Posting it for your reference:
CLOUD FUNCTIONS CODE
REQUIREMENTS.TXT (Dependencies)
```
google-api-python-client==1.7.11
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-cloud-storage==1.19.1
google-cloud-firestore==1.6.2
```
MAIN.PY
```python
from google.cloud import storage
from google.cloud import firestore
import json

client = storage.Client()

def hello_gcs_generic(data, context):
    print('Bucket: {}'.format(data['bucket']))
    print('File: {}'.format(data['name']))
    bucketValue = data['bucket']
    filename = data['name']
    print('bucketValue : ', bucketValue)
    print('filename : ', filename)
    testFile = client.get_bucket(data['bucket']).blob(data['name'])
    dataValue = json.loads(testFile.download_as_string(client=None))
    print(dataValue)
    db = firestore.Client()
    doc_ref = db.collection(u'collectionName').document(u'documentName')
    doc_ref.set(dataValue)
```
Cloud Functions are serverless functions provided by Google. The beauty of a Cloud Function is that it is invoked when its trigger fires and is torn down once execution completes. Cloud Functions are single-purpose functions. Besides Python, you can also write Cloud Functions in Node.js and Go. You can create a Cloud Function very easily by following the quickstart (https://cloud.google.com/functions/docs/quickstart-console).
Your task is to import a JSON file into Google Firestore. That part you can do with the Firestore Python client, like any normal Python program, and then add it in the Cloud Functions console or upload it via gcloud. The trigger part is still missing here, though. As I mentioned, a Cloud Function is serverless: it executes when an event happens on the attached trigger. You haven't mentioned any trigger (i.e. when you want the function to run). Once you give information about the trigger I can give more insights on a resolution.
Ishank Aggarwal, you can add the above code snippet as part of a Cloud Function by following these steps:
https://console.cloud.google.com/functions/
Create the function with a name, your requirements, Python as the runtime, and your GCS bucket as the trigger.
Once you create it, any change in your bucket will trigger the function and execute your code.
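For a quick local check before wiring up the trigger, you can call the function with a hand-built event dict shaped like the one a GCS trigger delivers. The bucket and file names below are placeholders, and local application-default credentials with access to the bucket and Firestore are assumed:
```python
# Fake the payload that a GCS "finalize" trigger would pass to the function.
fake_event = {'bucket': 'my-import-bucket', 'name': 'data.json'}

# Run the handler directly; it downloads data.json and writes it to Firestore.
hello_gcs_generic(fake_event, context=None)
```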

extract all aws transcribe results using boto3

I have a couple hundred transcription results in AWS Transcribe and I would like to get all the transcribed text and store it in one file.
Is there any way to do this without clicking on each transcribed result and copy and pasting the text?
You can do this via the AWS APIs.
For example, if you were using Python, you can use the Python boto3 SDK:
list_transcription_jobs() will return a list of Transcription Job Names
For each job, you could then call get_transcription_job(), which will provide the TranscriptFileUri that is the location where the transcription is stored.
You can then use get_object() to download the file from Amazon S3
Your program would then need to combine the content from each file into one file.
See how you go with that. If you run into any specific difficulties, post a new Question with the code and an explanation of the problem.
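Putting those steps together, here is a rough sketch. The Status filter, pagination handling, and output file name are assumptions, and it fetches the TranscriptFileUri over HTTPS with requests rather than get_object, which works when Transcribe writes to its default output location:
```python
import boto3
import requests

transcribe = boto3.client('transcribe')

def fetch_all_transcripts(output_path='all_transcripts.txt'):
    """Write the transcript text of every completed job into one file."""
    # Step 1: collect all completed job names, following pagination
    job_names = []
    kwargs = {'Status': 'COMPLETED'}
    while True:
        resp = transcribe.list_transcription_jobs(**kwargs)
        job_names += [j['TranscriptionJobName']
                      for j in resp['TranscriptionJobSummaries']]
        if 'NextToken' not in resp:
            break
        kwargs['NextToken'] = resp['NextToken']

    # Steps 2 and 3: look up each job's TranscriptFileUri, download the JSON,
    # and append the transcript text to one combined file
    with open(output_path, 'w') as out:
        for name in job_names:
            job = transcribe.get_transcription_job(TranscriptionJobName=name)
            uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
            transcript = requests.get(uri).json()
            text = transcript['results']['transcripts'][0]['transcript']
            out.write('--- {} ---\n{}\n\n'.format(name, text))
```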
I put an example on GitHub that shows how to:
run an AWS Transcribe job,
use the Requests package to get the output,
write output to the console.
You ought to be able to refit it pretty easily for your purposes. Here's some of the code, but it'll make more sense if you check out the full example:
```python
job_name_simple = f'Jabber-{time.time_ns()}'
print(f"Starting transcription job {job_name_simple}.")
start_job(
    job_name_simple, f's3://{bucket_name}/{media_object_key}', 'mp3', 'en-US',
    transcribe_client)
transcribe_waiter = TranscribeCompleteWaiter(transcribe_client)
transcribe_waiter.wait(job_name_simple)
job_simple = get_job(job_name_simple, transcribe_client)
transcript_simple = requests.get(
    job_simple['Transcript']['TranscriptFileUri']).json()
print(f"Transcript for job {transcript_simple['jobName']}:")
print(transcript_simple['results']['transcripts'][0]['transcript'])
```

Is it possible to make boto3 ignore signature expired error?

I was testing a Python app using boto3 to access DynamoDB and I got the following error message from boto3.
```
{'error':
 {'Message': u'Signature expired: 20160915T000000Z is now earlier than 20170828T180022Z (20170828T181522Z - 15 min.)',
  'Code': u'InvalidSignatureException'}}
```
I noticed that it's because I'm using the Python package freezegun's freeze_time to freeze the time at 20160915, since the mock data used by the tests is static.
I did research the error a little bit and I found this answer post. Basically, it's saying that AWS makes signatures invalid after a short time after they are created. From my understanding, in my case, the signature is marked to be created at 20160915 because of the use of 'freeze_time', but AWS uses the current time (the time when the test runs). Therefore, AWS thinks that this signature has expired for almost a year and sends an error message back.
Is there any way to make AWS ignore that error? Or is it possible to use boto3 to manually modify the date and time the signature is created at?
Please let me know if I'm not explaining my questions clearly. Any ideas are appreciated.
AWS API calls use a timestamp to prevent replay attacks. If your computer's time/date is skewed too far from the actual time, then the API calls will be denied.
Running requests from a computer with the date set to 2016 would certainly trigger this failure situation.
The checking is done on the host side, so there is nothing you can fix locally aside from using the real date (or somehow forcing Python into using a different date to the rest of your system).
Just came across a similar issue with immobilus. My solution was to replace datetime in botocore.auth with an unmocked version, as suggested by Antonio.
The pytest example would look like this
```python
import types

import pytest
from immobilus import logic


@pytest.fixture(scope='session', autouse=True)
def _boto_real_time():
    from botocore import auth
    auth.datetime = get_original_datetime()


def get_original_datetime():
    original_datetime = types.ModuleType('datetime')
    original_datetime.mktime = logic.original_mktime
    original_datetime.time = logic.original_time
    original_datetime.gmtime = logic.original_gmtime
    original_datetime.localtime = logic.original_localtime
    original_datetime.strftime = logic.original_strftime
    original_datetime.date = logic.original_date
    original_datetime.datetime = logic.original_datetime
    return original_datetime
```
Is there any way to make AWS ignore that error?
No
Or is it possible to use boto3 to manually modify the date and time the signature is created at?
You should patch any datetime / time call that is in the auth.py file of the botocore library (source: https://github.com/boto/botocore/blob/develop/botocore/auth.py).
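Since the question uses freezegun rather than immobilus, another option (depending on your freezegun version) is to tell freezegun not to patch time inside botocore at all via its ignore list. A minimal sketch, with placeholder region and API call:
```python
import boto3
from freezegun import freeze_time

# Freeze time for the application code under test, but leave botocore
# unpatched so request signatures still use the real clock. The 'ignore'
# argument takes package names that freezegun should skip.
@freeze_time('2016-09-15', ignore=['botocore'])
def test_dynamodb_with_frozen_time():
    client = boto3.client('dynamodb', region_name='us-east-1')
    # This call is signed with the current (unfrozen) time, so it is not
    # rejected with InvalidSignatureException.
    print(client.list_tables())
```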