Looking to build a mobile application that records a session of data. The data needs to be cleansed and then uploaded into an incoming S3 bucket. An event on this bucket triggers a Lambda function to process the data, which is then placed into an outgoing S3 bucket as a file whose contents are a word describing the result of the processing. This result then needs to be returned to the device.
I'm looking to architect this using as many AWS services as possible. Historical data also needs to be available to the user (device) so they can see their previous results.
At the moment, I have the following ideas:
AWS Cognito to authenticate device
The mobile device will process and cleanse the data and, again using Cognito authentication, place the payload packet into the incoming S3 bucket, with the DeviceID making up part of the filename
Processing remains as described, handled by the Lambda function, with the output being a text file that again uses the DeviceID naming convention
Event trigger on outgoing S3 bucket, with another Lambda function to store the result into DynamoDB. Once stored, send a push notification to the device with the latest result (status)
A small EC2 instance with a custom Node.js admin app to search DynamoDB and view all results, and potentially intercept results (like a workflow) before they are sent to the user. It may even be possible to trigger the final notification to the user from the admin console
The device application will use the AWS SDK to read historical results from DynamoDB
In the future we may incorporate Elastic MapReduce to perform complex queries on the results
The solution seems fairly sound, but I'm still getting up to speed on all the available AWS services, so I'm not sure if I'm missing anything glaringly obvious.
I'd recommend using AWS API Gateway in combination with AWS Lambda instead of deploying a standalone EC2 instance.
I'd also recommend using SNS for mobile push notifications, since it aligns well with the rest of your architecture.
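As a rough sketch of the push step (the endpoint ARN is a placeholder; it would come from registering the device's push token with SNS mobile push), the notifying Lambda could do something like this with the AWS SDK v2 for JavaScript:

```javascript
// Minimal sketch: publish a processing result to a device's SNS platform endpoint.
const AWS = require('aws-sdk');
const sns = new AWS.SNS({ region: 'us-east-1' }); // assumed region

// endpointArn is hypothetical - created via SNS CreatePlatformEndpoint when the
// device registers its push token.
async function notifyDevice(endpointArn, result) {
  const message = {
    default: `Result ready: ${result}`,
    GCM: JSON.stringify({ data: { result } }),                    // Android payload
    APNS: JSON.stringify({ aps: { alert: `Result: ${result}` } }) // iOS payload
  };

  await sns.publish({
    TargetArn: endpointArn,
    MessageStructure: 'json',
    Message: JSON.stringify(message)
  }).promise();
}
```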
S3 doesn't seem to be adding much value to this flow unless you are trying to decouple the processing of the input from its ingestion. You can invoke AWS Lambda directly from your mobile app and write the result into DynamoDB directly from the Lambda function.
If you do use S3, I'd recommend using the Cognito identity ID as part of the key rather than the device ID. The benefits of this are twofold: you can enable fine-grained access control so that one user can't access another user's S3 objects or rows in DynamoDB, and if a user has multiple devices and you use authenticated users, the user can see the same data as they move between their mobile devices.
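For illustration, here is a sketch of what the upload could look like from the device side, keyed by the Cognito identity ID. The identity pool ID, region, and bucket name are placeholders, and the fine-grained policy mentioned in the comment is something you would attach to the pool's role yourself:

```javascript
// Sketch (AWS SDK v2 for JavaScript): upload the cleansed payload under a key
// prefixed with the Cognito identity ID rather than the DeviceID.
const AWS = require('aws-sdk');

AWS.config.region = 'us-east-1'; // assumed region
const credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:example-identity-pool-id' // placeholder
});
AWS.config.credentials = credentials;

function uploadSession(payload, callback) {
  credentials.get(err => {
    if (err) return callback(err);
    const s3 = new AWS.S3();
    // The key prefix lines up with an IAM policy resource such as
    //   arn:aws:s3:::incoming-bucket/${cognito-identity.amazonaws.com:sub}/*
    // so each identity can only read/write its own objects.
    const key = `${credentials.identityId}/session-${Date.now()}.json`;
    s3.putObject(
      { Bucket: 'incoming-bucket', Key: key, Body: JSON.stringify(payload) },
      callback
    );
  });
}
```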
There is a use case I'm working on and I'm not quite sure how it can be solved. The main goal is to upload an image from a React Native app to an Amazon S3 bucket using an AWS Lambda function (behind API Gateway), in order to use the Amazon Rekognition service with another S3 image, depending on some values sent to the Lambda.
Since the image could be too large, I have to use presigned URLs, which means I make a request to the Lambda to get a presigned S3 URL back to the client, so that the client uploads the image to the bucket straight away. But then, how can I use the face Rekognition service within AWS Lambda?
I know I can trigger a Lambda after an S3 upload, so I could make the face Rekognition request right after the user uploads via the presigned URL, but how can I get the Rekognition response from that triggered Lambda back to the original user?
I've thought about SNS, but sending the user a text message after an image upload, rather than a message in the app, seems odd.
Thank you in advance and apologies for the long read
You're on the right track with SNS, maybe not with the service, but with the principle.
This is a problem of handling requests asynchronously and then informing the user of the server's decision. To start, your async process will need to store the result of the facial recognition somewhere. If you're already using a database (SQL or NoSQL), that would seem to be the place to do this.
Then you have to get the information to the user. Since your user is running a mobile application, there are only two ways of doing this: either the user polls the back-end service to retrieve the result of the async process, or your back-end pushes the result to the device. Polling the service is straightforward and is usable depending on the load you expect from your application and the duration of the asynchronous process. You can also use long polling to reduce the number of requests, but this doesn't fix the underlying issue itself (too many users spamming your service while waiting for the result).
If you want to notify the users, you will have to create a notification mechanism that is not based on polling a service. You could, for example, make use of WebSockets, configure your devices to hold an MQTT connection (e.g., with AWS IoT), or use another cloud-based notification service that allows you to push messages to the device. You also do not have to include all the information in the message you push to your devices. The pushed message can be a trigger for the device to retrieve the result from the back-end service, e.g., using an HTTP API.
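To make that concrete, here is a rough sketch (Node.js, AWS SDK v2) of what the S3-triggered Lambda could look like if you take the AWS IoT route: compare the uploaded image against a reference image, persist the outcome, and push only a small trigger message that the app can react to. The table name, reference image location, IoT endpoint, key layout, and topic naming are all assumptions, not anything from your setup:

```javascript
const AWS = require('aws-sdk');
const rekognition = new AWS.Rekognition();
const ddb = new AWS.DynamoDB.DocumentClient();
const iot = new AWS.IotData({ endpoint: 'xxxxxxxx-ats.iot.us-east-1.amazonaws.com' }); // placeholder

exports.handler = async (event) => {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  const userId = key.split('/')[0]; // assuming keys look like <userId>/<imageId>.jpg

  // Compare the freshly uploaded image against a per-user reference image.
  const result = await rekognition.compareFaces({
    SourceImage: { S3Object: { Bucket: bucket, Name: key } },
    TargetImage: { S3Object: { Bucket: 'reference-images', Name: `${userId}/reference.jpg` } },
    SimilarityThreshold: 80
  }).promise();

  const matched = result.FaceMatches && result.FaceMatches.length > 0;

  // Store the outcome so the app (or a later HTTP call) can read the full details.
  await ddb.put({
    TableName: 'FaceResults',
    Item: { userId, imageKey: key, matched, createdAt: Date.now() }
  }).promise();

  // Push only a lightweight trigger; the app fetches the rest from the back-end.
  await iot.publish({
    topic: `results/${userId}`,
    qos: 1,
    payload: JSON.stringify({ imageKey: key, matched })
  }).promise();
};
```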
There are close to 100,000 devices generating logs (10-20 TB a day in total) which I would like to have uploaded directly to Kinesis. How do I control access? IAM only lets me create a maximum of 1000 users per account (I know we can request a user limit increase), but I would like to know if there is a better way to do this.
One requirement is that I would like to be able to grant/revoke access to Kinesis per device.
Since you have IoT Core already, I would first try to leverage it for logging. This lets you take advantage of the certificate-based authorization that's built into IoT Core, and I know that you can hook an IoT topic into a Kinesis stream.
If you feel that this would be too much volume (and perhaps too expensive based on the number of messages and rules), then I'd provide my devices with temporary security credentials that let them write to Kinesis and nothing else.
You would generate these credentials on a per-device basis (as far as I can tell, there are no quotas on the number of credentials per account), using a scheduled job, either in Lambda or on ECS. This job would iterate through your devices and generate a set of credentials for each. It would then either publish these credentials to the device via IoT Core, or update the device shadow.
The device could then use these credentials to create a Kinesis client and publish log messages. It would have to create a new client whenever it receives fresh credentials.
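Roughly, the rotation job could look like this (Node.js, AWS SDK v2). The role ARN, stream name, topic layout, and the device list are all placeholders, and the session policy shown is just one way to scope the role down to write-only access for the session:

```javascript
const AWS = require('aws-sdk');
const sts = new AWS.STS();
const iot = new AWS.IotData({ endpoint: 'xxxxxxxx-ats.iot.us-east-1.amazonaws.com' }); // placeholder

// Session policy: further restricts the assumed role to writing the log stream only.
const WRITE_ONLY_POLICY = JSON.stringify({
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Action: ['kinesis:PutRecord', 'kinesis:PutRecords'],
    Resource: 'arn:aws:kinesis:us-east-1:123456789012:stream/device-logs'
  }]
});

async function rotateCredentials(deviceIds) {
  for (const deviceId of deviceIds) {
    const { Credentials } = await sts.assumeRole({
      RoleArn: 'arn:aws:iam::123456789012:role/DeviceLogWriter', // placeholder
      RoleSessionName: `device-${deviceId}`,
      Policy: WRITE_ONLY_POLICY,
      DurationSeconds: 3600
    }).promise();

    // Deliver the temporary credentials over the device's IoT topic.
    await iot.publish({
      topic: `devices/${deviceId}/credentials`,
      qos: 1,
      payload: JSON.stringify({
        accessKeyId: Credentials.AccessKeyId,
        secretAccessKey: Credentials.SecretAccessKey,
        sessionToken: Credentials.SessionToken,
        expiration: Credentials.Expiration
      })
    }).promise();
  }
}
```

On the device, each delivery would simply replace the Kinesis client's credentials with the new set before the previous one expires.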
As an alternative, if your devices maintain log files internally, you could use a similar approach to trigger uploading those files to S3. In that case, rather than publishing temporary credentials, the scheduled task would publish a pre-signed URL for each device. It would publish the URL to the device, and the device would use it to upload its accumulated logs. Then you'd need something downstream to process the files once they land in S3.
I have a system which uses AWS IoT to send data to a DynamoDB table. This is simple weather data storing time, temperature, humidity, and pressure. So far that side is working well.
I want to display data from this table on a webpage hosted on S3. In its most simple form this would just display the latest row of data. I had thought that this would be a case of some simple client-side JavaScript to query the database, but looking at Amazon's documentation it gets quite complicated, with Lambda functions called through API Gateway using IAM for authorization.
Is there a simpler way to go about this? The data should be publicly readable and non-writeable, so I thought it should be easier than what I have read so far.
Please have a look at the Simple Web Service CDK Pattern. It helps you create a simple end-to-end service using API Gateway, a Lambda function, and access to a DynamoDB table with just a few lines of code. It is available in multiple programming languages.
As a general note: whenever you want to provide dynamic content, you need some kind of application that takes care of it. An API Gateway backed by an AWS Lambda function is no more complicated than running a web server, and it spares you the undifferentiated heavy lifting like network configuration, firewall setup, OS patching, and maintenance. And proper handling of identity and access control needs to be done in any case.
If you really want to just display the latest row, and you prefer to keep your webpage as static as possible, I would consider just writing out the latest row of DynamoDB to a simple JSON file using whatever backend process you want; that file can then be consumed by your front-end application without having to worry about IAM credentials or even the AWS JS SDK - keep it as simple and lightweight as possible.
There's no need to repeatedly hit your DynamoDB table to pull back the same data on each page load either - which should also save you some money in the long run.
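As a rough sketch of that approach (the table name, key schema, bucket, and schedule are all assumptions), the backend job could be as small as this Node.js Lambda using the AWS SDK v2:

```javascript
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();
const s3 = new AWS.S3();

exports.handler = async () => {
  // Assumes a table keyed by a constant partition key ("station") and a numeric
  // "time" sort key, so the newest item comes back first.
  const { Items } = await ddb.query({
    TableName: 'WeatherData',
    KeyConditionExpression: '#s = :s',
    ExpressionAttributeNames: { '#s': 'station' },
    ExpressionAttributeValues: { ':s': 'home' },
    ScanIndexForward: false, // newest first
    Limit: 1
  }).promise();

  // Write the latest reading next to the static site so the page can just fetch it.
  await s3.putObject({
    Bucket: 'my-weather-site',     // placeholder: same bucket that hosts the page
    Key: 'latest.json',
    Body: JSON.stringify(Items[0] || {}),
    ContentType: 'application/json',
    CacheControl: 'max-age=60'     // let browsers cache briefly
  }).promise();
};
```

The page then just fetches latest.json from the same bucket, with no SDK or credentials involved.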
There is an in-browser JavaScript SDK. It allows the JavaScript on your web page to make calls directly to DynamoDB, without having to make API calls through API Gateway that trigger a Lambda function that itself calls DynamoDB on your behalf.
The main consideration here is how to authenticate your web browser client. It cannot make DynamoDB API calls without AWS credentials. I'm assuming that you have no server-side component that can return AWS credentials to the client.
You could embed read-only, minimally-permissioned AWS credentials in the page itself - this is definitely not a best practice, but it's simple and in some cases might be acceptable. Other options include requiring the web site user to authenticate via Web Identity Federation, which would dynamically yield a set of usable, time-limited AWS credentials. Another, more sophisticated option would be AWS Amplify, which supports both authenticated and unauthenticated clients. This is generally preferable to hard-coding (read-only) AWS credentials in a web page but is a little more complex to set up.
There is also a blog post: Dynamic Websites Using the AWS SDK for JavaScript in the Browser.
In all these scenarios, the page itself would make API calls directly to DynamoDB, and you should ensure that the IAM role associated with the credentials is heavily restricted (just the relevant DynamoDB table(s) and just the necessary query/get permissions).
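If you do go the direct route, a minimal browser sketch using unauthenticated Cognito identity pool credentials might look like the following (with the AWS SDK v2 loaded via its script tag). The identity pool ID, region, table name, and key values are placeholders, and the pool's unauthenticated role should only allow query/get on this table:

```javascript
AWS.config.region = 'eu-west-1'; // assumed region
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'eu-west-1:example-identity-pool-id' // placeholder
});

const docClient = new AWS.DynamoDB.DocumentClient();

// Read the newest reading directly from DynamoDB and drop it into the page.
docClient.query({
  TableName: 'WeatherData',
  KeyConditionExpression: '#s = :s',
  ExpressionAttributeNames: { '#s': 'station' },
  ExpressionAttributeValues: { ':s': 'home' },
  ScanIndexForward: false, // newest first
  Limit: 1
}, (err, data) => {
  if (err) return console.error(err);
  document.getElementById('latest').textContent = JSON.stringify(data.Items[0]);
});
```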
Well, I have a web page (PHP) that runs on-premise and is accessed from different countries. I would like to capture some data and store it somewhere. My team and I can handle the data and the format of the file internally. But we would like to leverage AWS to store it in S3. We've noticed that we need an intermediate layer to avoid using the AWS credentials required for S3.
Since this page is on the internet and is consumed by users through the web, we definitely don't want any credentials embedded in the site. It seems that Kinesis Data Firehose, acting as a consumer, could capture the data sent by our page and then store it in S3 internally.
Question
I see that an SDK for Kinesis exists, but it requires AWS credentials. We really need a kind of endpoint where we send the data we produce and AWS handles the rest. But I don't understand why I need to set up AWS credentials to use the SDK. Does that mean our website will be deployed and running with our credentials in it? That approach doesn't feel secure to me. I appreciate any comments.
You can use an API Gateway Kinesis proxy to avoid using credentials, or even the aws-sdk, in your web pages.
https://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-kinesis.html
This way you don't need to expose any credentials, and you control permissions with a role.
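Assuming the proxy is wired up as in that tutorial (a /streams/{stream-name}/record resource mapped to Kinesis PutRecord, with a mapping template that base64-encodes the Data field), the page-side call is just a plain HTTP request. The API URL and stream name below are placeholders:

```javascript
// Send a record from the page to the API Gateway Kinesis proxy - no AWS SDK,
// no credentials in the browser.
fetch('https://abc123.execute-api.us-east-1.amazonaws.com/prod/streams/page-events/record', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    Data: JSON.stringify({ page: location.pathname, ts: Date.now() }),
    PartitionKey: 'web'
  })
}).then(res => {
  if (!res.ok) console.error('PutRecord failed', res.status);
});
```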
If you are worried about a security issue and your users are authenticated, you can use custom authorizers to authorize the URL.
https://docs.aws.amazon.com/apigateway/latest/developerguide/use-custom-authorizer.html
If it is public facing, then just the above integration should work.
Hope it helps.
I'm a first-timer with AWS and I'm a bit lost.
I would like to have a serverless stack using Cognito to handle authentication, DynamoDB, Lambda and CloudFront for exposing REST services.
I don't know exactly how to handle user data. For example, I would like to store the user's email and physical address. I've seen that you can keep that directly in Cognito; however, I would like to perform custom validation when these attributes are set/updated.
Can I do that easily with a trigger, while letting the user have write access to their own data?
Or should I restrict write access to these attributes and expose a REST service that updates them manually in a Lambda?
I've also seen someone using a users table in DynamoDB to store some data. What are the advantages compared to using the identity pool directly?
Thanks,
You can easily store this kind of data (email, address) in Cognito user pools and validate it using a PreSignUp Lambda trigger; see the docs for more details.
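As a rough sketch (the attribute checks here are just examples, not a recommended rule set), a PreSignUp trigger can reject a sign-up by throwing an error and allow it by returning the event:

```javascript
// Minimal PreSignUp trigger sketch (Node.js): validate standard attributes before
// Cognito creates the user.
exports.handler = async (event) => {
  const attrs = event.request.userAttributes || {};

  if (!attrs.email || !attrs.email.endsWith('@example.com')) {
    // Throwing makes Cognito reject the sign-up and surface the message to the client.
    throw new Error('Email domain is not allowed');
  }
  if (attrs.address && attrs.address.length < 10) {
    throw new Error('Address looks incomplete');
  }

  // Returning the event (optionally setting response flags like autoConfirmUser)
  // lets the sign-up proceed.
  return event;
};
```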
The advantage of using DynamoDB to store user data is that you will almost certainly hit a RequestLimitExceeded exception if you use Cognito as a primary data store. If you contact AWS support and explain what you are doing, they will raise the Cognito API limit on your account - but that only solves the problem temporarily. Since Amazon doesn't publish what will trigger a RequestLimitExceeded error, you will eventually hit it again if your traffic increases.
Every time I have tried to use Cognito as the only source of user data I have run into this problem. So I end up storing user data in Dynamo or RDS.
If you don't have a lot of traffic, or you aren't going to be querying the Cognito API often, then it might work for you.