Where does AWS Amplify DataStore store its data? - amazon-web-services

I have been looking for a Firebase Firestore equivalence to AWS World, I came across AWS Amplify and saw the DataStore module. From watching this video I see it does pretty much exactly what I expected, that is being about to have a real-time data mechanism, I watched this video from the official Amplify website but I haven't really seen where the data get a store or is this just a pub-sub kind of mechanism where I need to additionally write code which writes data to where I want it to be store e.g to DynamoDb or any other destination on Cloud?

Related

Transfer/Replicate Data periodically from AWS Documentdb to Google Cloud Big Query

We are building a customer facing App. For this app, data is being captured by IoT devices owned by a 3rd party, and is transferred to us from their server via API calls. We store this data in our AWS Documentdb cluster. We have the user App connected to this cluster with real time data feed requirements. Note: The data is time series data.
The thing is, for long term data storage and for creating analytic dashboards to be shared with stakeholders, our data governance folks are requesting us to replicate/copy the data daily from the AWS Documentdb cluster to their Google cloud platform -> Big Query. And then we can directly run queries on BigQuery to perform analysis and send data to maybe explorer or tableau to create dashboards.
I couldn't find any straightforward solutions for this. Any ideas, comments or suggestions are welcome. How do I achieve or plan the above replication? And how do I make sure the data is copied efficiently - memory and pricing? Also, don't want to disturb the performance of AWS Documentdb since it supports our user facing App.
This solution would need some custom implementation. You can utilize Change Streams and process the data changes in intervals to send to Big Query, so there is a data replication mechanism in place for you to run analytics. One of the use cases of using Change Streams is for analytics with Redshift, so Big Query should serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
This document also contains a sample Python code for consuming change streams events.

How to retrieve last added files programmatically from Amazon S3

I'm using the Agora SDK & RestAPI to recording life streams and upload them on AWS S3, the sdk comes with snapshoting feature that's Im planning to use for stream thumbnails.
Both Cloud Recording and Snapshot Recording are integrated with my app and works perfectly,
the Remaining problem is that the snapshots are named down to the microseconds
Agora snapshot services image file naming
From my overview, The services works as follow, Mobile App sends data to my Server, My server make requests to the Agora API so It joins the Life Stream channel and starts snapshoting and saving the images to AWS, so I suppose it's impossible to have time synchronization between AWS, AgoraRestAPI, my Server & my mobile app.
I've gone through their docs and I can't find anything about retrieving the file names.
I was thinking maybe I can have a Lambda Function that retrieves the last added file on a given bucket/folder? but due to my lack of knowledge on AWS and Lambda functions I don't know how's that or if it's possible.
any suggestions would be appreciated

Rate limited API requests in Cloud Composer

I'm planning a project whereby I'd be hitting the (rate-limited) Reddit API and storing data in GCS and BigQuery. Initially, Cloud Functions would be the choice, but I'd have to create a Datastore implementation to manage the "pseudo" queue of requests and GAE for cron jobs.
Doing everything in Dataflow wouldn't make sense because it's not advised the make external requests (i.e. hitting the Reddit API) and perpetually running a single job.
Could I use Cloud Composer to read fields from a Google Sheet, then create a queue of requests based on the Google Sheet, then have a task queue execute those requests, store them in GCS and load into BigQuery?
Sounds like a legitimate use case for Composer, additionally you could also leverage the pool concept in Airflow to manage concurrent calls to the same endpoint (e.g., Reddit API).

What would be the AWS equivalent to Firebase Realtime Database?

I'm working on a new game project at the moment that will consist of a React Native front-end and a Lambda-based back-end. The app requires some real time features such as active user records, geofencing, etc.
I was looking at Firebase's Realtime Database that looks like a really elegant solution for real-time data sync but I don't think AWS has anything quite like it.
The 3 options I could think of for "serverless" realtime using only AWS services are:
Option 1: AWS IoT Messaging over WebSockets
This one is quite obvious, a managed WebSockets connection through the IoT SDK. I was thinking of triggering Lambdas in response to inbound and outbound events and just use WebSockets as the realtime layer, building custom handling logic on the app client as you typically would.
The downside to this, at least compared to Firebase, is that I will have to handle the data in the events myself which will add another layer of management on top of WebSockets and will have to be standardized with the API data layer in the application's stores.
Pros:
Scalable bi-directional realtime connection
Cons:
Only works when the app is open
Message structure needs to be implemented
Multiple transport layers to be managed
Option 2: Push-triggered re-fetch
Another option is to use push notifications as real-time triggers but use a regular HTTP request to API Gateway to actually get the updated payload.
I like this approach because it sticks to only one transport layer and a single source of truth for application state. It will also trigger updates when the app is not open since these are Push Notifications.
The downside is that this is a lot of custom work with potentially difficult mappings between push notifications to the data that needs to be fetched.
Pros:
Push notifications work even when app is closed
Single source of truth, transport layer
Cons:
Most custom solution
Will involve many more HTTP requests overall
Option 3: Cognito Sync
This is newer to me and I'm not sure if it can actually be interfaced with from the server.
Cognito Sync offers user state sync. across devices complete with offline support and is part of the Cognito SDK which I'll be using anyway. It sounds like just what I'm looking for but couldn't find any conclusive evidence as to whether it is possible to modify, or "trigger", updates from AWS and not just from one of the devices.
Pros:
Provides an abstracted real-time data model
Connected to Cognito user records OOTB
Cons:
Not sure if can be modified or updated from Lambdas
I'm wondering if anyone has experience doing real-time on AWS as part of a Lambda-based architecture and if you have an opinion on what is the best way to proceed?
I asked a similar question to the AWS Support, and this was their response.
My question to them:
What's the group of AWS services (if it's possible) to give that same
in-browser real-time DBaaS feel like Firebase?
AWS Cognito seems to be great for user-accounts. Is there anything
similar for the WebSockets / real-time DB part?
Their response:
To your question, Firebase is closest to the AWS service AWS
MobileHub. You can check out more details below about mobilehub from
below link.
https://aws.amazon.com/mobile/details/
https://aws.amazon.com/mobile/getting-started/
"AWS Cognito seems to be great for user-accounts. Is there anything
similar for the WebSockets / real-time DB part?"
Amazon Dynamodb is a fast and flexible NoSQL database service for all
applications that need consistent, single-digit millisecond latency at
any scale. It is a fully managed cloud database and supports both
document and key-value store models. Its flexible data model, reliable
performance, and automatic scaling of throughput capacity, makes it a
great fit for mobile, web, gaming, ad tech, IoT, and many other
applications.
Amazon Dynamodb can be further optimized with Amazon DynamoDB
Accelerator (DAX) which is a fully managed, highly available,
in-memory cache that can reduce Amazon DynamoDB response times from
milliseconds to microseconds, even at millions of requests per second.
For more information, please see below documentation.
https://aws.amazon.com/dynamodb/getting-started/
https://aws.amazon.com/dynamodb/dax/
Should you have any further questions, please do not hesitate to let
me know.
Thanks.
Best regards,
Tayo O. Amazon Web Services
Check out the AWS Support Knowledge Center, a knowledge base of
articles and videos that answer customer questions about AWS services:
https://aws.amazon.com/premiumsupport/knowledge-center/?icmpid=support_email_category
Also while researching this answer I also found this, looks interesting:
https://aws.amazon.com/blogs/database/how-to-build-a-chat-application-with-amazon-elasticache-for-redis/
The comments to that article is interesting as well.
Jacob Wakeem:
What advantage this
approach have over using aws iot? It seems that iot has all these
functionality without writing a single line of code and with
server-less architecture.
Sam Dengler:
The managed PubSub feature in the AWS IoT
service is also a good approach to message-based applications, like
the one demonstrated in the article. With Elasticache (Redis),
customers who use Pub/Sub are typically also using Redis as a data
store for other use cases such as caching, leaderboards, etc. With
that said, you could also use ElastiCache (Redis) with the AWS IoT
service by triggering an AWS Lambda function via the AWS IoT rules
engine. Depending on how the message-based application is architected
and how the data is leveraged, one solution may be a better fit than
the other.
Check out AWS AppSync for some of these realtime and offline features using different data sources, including databases search and compute.
AWS Amplify is AWS's modern answer to Firebase.
Fastest way to build mobile and web applications
AWS Amplify is a development platform for building secure, scalable
mobile and web applications. It makes it easy for you to authenticate
users, securely store data and user metadata, authorize selective
access to data, integrate machine learning, analyze application
metrics, and execute server-side code. Amplify covers the complete
mobile application development workflow from version control, code
testing, to production deployment, and it easily scales with your
business from thousands of users to tens of millions. The Amplify
libraries and CLI, part of the Amplify Framework, are open source and
offer a pluggable interface that enables you to customize and create
your own plugins.
Sounds like AWS Serverless is most suited alternative.
Also wondering: AWS vs Firebase - Is It Even a Fair Fight?
AWS Amplify. You can find more information here: AWS Amplify
You could consider using supabase.
It is opensource and can be installed onto ec2 / docker containers.
https://supabase.com/docs/guides/hosting/docker
I've found the hosted solution / free really poewrful to get up and running quickly. (yet to deploy to aws)
I know this is an old question, but nowadays AWS offers AppSync... a service that destroys Firebase RDB in every aspect

AWS : What's the difference between Simple Workflow Service and Data Pipeline?

What's the difference between Amazon Simple Workflow Service and Amazon Data Pipeline ? It seems that they are pretty much the same product. The Data Pipeline has a nice web based diagram editor though.
Cheers !
From http://aws.amazon.com/datapipeline/faqs/
Q: How is AWS Data Pipeline different from Amazon Simple Workflow
Service?
While both services provide execution tracking, retry and
exception-handling capabilities, and the ability to run arbitrary
actions, AWS Data Pipeline is specifically designed to facilitate the
specific steps that are common across a majority of data-driven
workflows – inparticular, executing activities after their input data
meets specific readiness criteria, easily copying data between
different data stores, and scheduling chained transforms. This highly
specific focus means that its workflow definitions can be created
[with] very rapidly and with no code or programming knowledge.
Data Pipeline is service used to transfer data between various services of AWS. Example you can use DataPipeline to read the log files from your EC2 and periodically move them to S3.
Simple Workflow service is very powerful service. You can write even your workflow logic using it. Example : Most of the ecommerce systems have scalability problems in their order systems. You can use write code in SWF to make this ordering workflow process itself.
AWS Big Data Blog does a wonderful job of explaining key features of SWF, Data Pipeline & Lambda.
Below diagram is copied from the blog.