How can DynamoDB be used as a sink for an AWS Data Analytics application (Flink)?
I'm not finding examples or an existing DynamoDB Sink implementation class.
The Apache Flink community is working on it. This is the related PR.
There is no official DynamoDB connector for Flink.
However, there are various third-party ones, such as
https://github.com/klarna-incubator/flink-connector-dynamodb
Or
https://github.com/fabricalab/streaming-flink-dynamodb-connector
I would strongly suggest that you test any third-party connector to ensure it is production-ready and suits your needs.
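If neither third-party connector suits you, a common stopgap is a hand-rolled sink: a Flink RichSinkFunction that writes each record with the AWS SDK v2. A minimal sketch, assuming records arrive as simple string maps and using a placeholder "orders" table:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.HashMap;
import java.util.Map;

// Hand-rolled sink sketch: one PutItem per record, no batching or retries.
public class DynamoDbMapSink extends RichSinkFunction<Map<String, String>> {

    private transient DynamoDbClient client;

    @Override
    public void open(Configuration parameters) {
        // One client per parallel sink subtask; region/credentials come from the environment.
        client = DynamoDbClient.create();
    }

    @Override
    public void invoke(Map<String, String> row, Context context) {
        Map<String, AttributeValue> item = new HashMap<>();
        row.forEach((name, value) -> item.put(name, AttributeValue.builder().s(value).build()));

        client.putItem(PutItemRequest.builder()
                .tableName("orders") // placeholder table name
                .item(item)
                .build());
    }

    @Override
    public void close() {
        if (client != null) {
            client.close();
        }
    }
}
```

Attach it with stream.addSink(new DynamoDbMapSink()). Because it does a single unbatched write per record with no retry or backpressure handling, treat it as a stopgap only; batching and error handling are exactly what the connectors above (and the in-progress official one) give you.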
I have to create reports in SAP Analytics Cloud (SAC) using data saved in Delta tables in Databricks on AWS. I have come across some ready-made connectors (such as this: https://www.cdata.com/kb/tech/databricks-connect-sac.rst), but as a proof of concept my team has decided to deploy a Docker container with the SAP Data Provider (https://docs.aws.amazon.com/sap/latest/general/data-provider-installallation.html) and to pull the data into SAC via a JDBC connection. This feels like re-inventing the wheel, so I was wondering whether there are ready-made tools for this purpose, or, if not, whether anyone has done this using a Docker container and can share some tips or code. That would be much appreciated.
I just want to avoid the use of custom/manual resolvers in AppSync completely, so I'm using Amplify to set up the GraphQL AppSync API in my app. I'm doing everything by changing schema.graphql and running amplify push.
I have 2 questions:
1. What are the limitations, and what problems am I going to face in the future?
2. Can GraphQL subscriptions get updates when the app is not running (e.g. so the user gets notified)?
Tons of business logic will be exposed in the client-side code.
I think for push notifications you would still have to go via an external integration like FCM/APNS. Multiple integration options are available in SNS.
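To sketch what the SNS side of that looks like: once a device token has been registered as an SNS platform endpoint (backed by FCM or APNS), the backend just publishes to that endpoint ARN. A rough example with the AWS SDK for Java v2; the ARN and message are placeholders:

```java
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

public class PushNotifier {
    public static void main(String[] args) {
        try (SnsClient sns = SnsClient.create()) {
            // Placeholder ARN of a platform endpoint created for the user's device token.
            String endpointArn = "arn:aws:sns:us-east-1:123456789012:endpoint/GCM/my-app/example";

            sns.publish(PublishRequest.builder()
                    .targetArn(endpointArn)
                    .message("Your order has shipped")
                    .build());
        }
    }
}
```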
Just to preface these answers: the fact that you use Amplify-generated GraphQL and resolvers doesn't stop you from later including custom resolvers and pipeline functions; it's just that you need to learn quite a bit about where to include them in Amplify's backend file structure.
1. What are the limitations, and what problems am I going to face in the future?
This depends on how well your application's use case matches the GraphQL schema design and on whether your application is relatively self-contained. Amplify becomes more complex when your application needs to talk to other back-end systems: you'll need to start using DynamoDB triggers to notify other state machines, EventBridge, SNS, or similar services (see the sketch after this answer).
As mentioned, none of these problems are crippling; you can deal with them later, but it will be a step up in the AWS knowledge required to implement them.
For small, high-volume/high-availability apps, Amplify and DynamoDB as they come are great. If your application matures into many microservices and sites, then you'll need to learn quite a bit more AWS to make them play together well. Amplify determines your DynamoDB layout on a table-per-object basis, and you'll probably be stuck with (and paying for) that. Think hard about whether you might ever want to move to a differently optimised data source (RDS or a single DynamoDB table) to reduce the number of queries required to fulfil your GraphQL requests.
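To make the DynamoDB-trigger point above concrete: the usual pattern is a Lambda function on the table's stream that forwards changes to EventBridge (or SNS). A rough Java sketch using the aws-lambda-java-events library; the event bus name and source are hypothetical:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import software.amazon.awssdk.services.eventbridge.EventBridgeClient;
import software.amazon.awssdk.services.eventbridge.model.PutEventsRequest;
import software.amazon.awssdk.services.eventbridge.model.PutEventsRequestEntry;

// Triggered by the DynamoDB stream of the Amplify-generated table.
public class TableStreamHandler implements RequestHandler<DynamodbEvent, Void> {

    private final EventBridgeClient eventBridge = EventBridgeClient.create();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            // Forward each insert/modify/remove to a (hypothetical) custom event bus.
            // In real code, serialize the new image to JSON; String.valueOf is just for the sketch.
            eventBridge.putEvents(PutEventsRequest.builder()
                    .entries(PutEventsRequestEntry.builder()
                            .eventBusName("my-app-bus")
                            .source("amplify.table")
                            .detailType(record.getEventName()) // INSERT / MODIFY / REMOVE
                            .detail(String.valueOf(record.getDynamodb().getNewImage()))
                            .build())
                    .build());
        }
        return null;
    }
}
```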
2. Can GraphQL subscriptions get updates when the app is not running (e.g. so the user gets notified)?
No. Anurag mentions SNS, which would be a good option for notifying users outside the app; it's best to blend subscriptions with another service.
We have on-prem sources like SQL Server and Oracle, and data from them has to be ingested periodically, in batch mode, into BigQuery. What should the architecture be? Which GCP-native services can be used for this? Can Dataflow or Dataproc be used?
PS: Our organization hasn't licensed any third-party ETL tool so far. The preference is for a Google-native service; Data Fusion is very expensive.
There are two approaches you can take with Apache Beam.
Periodically run a Beam/Dataflow batch job against your database. You could use Beam's JdbcIO connector to read data. After that you can transform your data using Beam transforms (PTransforms) and write to the destination using a Beam sink. In this approach, you are responsible for handling duplicate data (for example, by providing different SQL queries across executions); this approach is sketched below.
Use a Beam/Dataflow pipeline that can read change streams from a database. The simplest approach here might be using one of the available Dataflow templates. For example, see here. You can also develop your own pipeline using Beam's DebeziumIO connector.
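A minimal sketch of the first approach (JDBC read, BigQuery write) in Java; the driver, connection string, query, and table names are all placeholders you'd replace with your own:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SqlServerToBigQuery {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            // Read from the on-prem database over JDBC (placeholder connection details).
            .apply("ReadFromSqlServer", JdbcIO.<TableRow>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                        "jdbc:sqlserver://onprem-host:1433;databaseName=sales")
                    .withUsername("etl_user")
                    .withPassword("secret"))
                // Bound each run (e.g. by a watermark column) so re-runs don't re-ingest old rows.
                .withQuery("SELECT id, amount, updated_at FROM orders WHERE updated_at >= '2024-01-01'")
                .withRowMapper(rs -> new TableRow()
                    .set("id", rs.getLong("id"))
                    .set("amount", rs.getDouble("amount"))
                    .set("updated_at", rs.getTimestamp("updated_at").toString()))
                .withCoder(TableRowJsonCoder.of()))
            // Append into an existing BigQuery table (placeholder project:dataset.table).
            .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
                .to("my-project:analytics.orders")
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        pipeline.run();
    }
}
```

You'd run this with --runner=DataflowRunner plus your project/region options, and trigger it on a schedule (for example Cloud Scheduler kicking off a Dataflow template, or an orchestrator like Composer) to get the periodic batches.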
Can you elaborate on the differences between Pub/Sub and Firestore and provide some scenarios or use cases for choosing one over the other?
I'm not sure which one to use for building an app for a food delivery service (like UberEats) that surfaces real-time updates as soon as they are added or changed in the database, ensuring that customers and drivers know when food is ready for pickup and when it is in transit to its destination.
The difference is quite simple:
Firestore (like the Realtime Database) is for backend-to-frontend (customers/users) communication and realtime updates.
Pub/Sub is a backend-to-backend message bus for async processing.
In your use case, you won't use Pub/Sub to send notifications to your users! Use Firestore's realtime updates for that.
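To illustrate the backend-to-backend side: one service publishes to a topic and some other backend worker consumes it asynchronously; the mobile app never sees the topic. A small sketch with the Pub/Sub Java client; the project and topic IDs are placeholders:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class OrderEventPublisher {
    public static void main(String[] args) throws Exception {
        // Placeholder project/topic; a backend worker (not the mobile app) subscribes to this.
        Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "order-events")).build();
        try {
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("{\"orderId\":\"123\",\"status\":\"READY_FOR_PICKUP\"}"))
                    .build();
            // publish() is async; get() just waits for the message ID in this sketch.
            String messageId = publisher.publish(message).get();
            System.out.println("Published " + messageId);
        } finally {
            publisher.shutdown();
        }
    }
}
```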
Pub/Sub is like a notification system wherein you receive updates when something is added, changed or removed.
Firestore, on the other hand, is a NoSQL database for mobile (Android, iOS) and web apps that can be accessed directly via native SDKs. It supports many data types, from simple strings to complex objects, and whatever data structure works best for your app.
It is best to use Firestore for your app as it provides realtime updates.
You can check the detailed documentation for Pub/Sub and Firestore.
For Firestore, you can use either the mobile/web client libraries or the server client libraries.
Here's the link for Firestore, containing its benefits and key features.
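For the delivery-tracking use case, the realtime part is a snapshot listener: the client (or a backend process) attaches a listener to a document or query and is called back on every change. A rough sketch with the Firestore server client library for Java; the collection, document ID, and status field are placeholders:

```java
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;

public class OrderTracker {
    public static void main(String[] args) throws InterruptedException {
        Firestore db = FirestoreOptions.getDefaultInstance().getService();

        // Listen to one order document; fires immediately with the current state
        // and again on every later change (e.g. status moving to IN_TRANSIT).
        db.collection("orders").document("order-123")
            .addSnapshotListener((snapshot, error) -> {
                if (error != null) {
                    error.printStackTrace();
                    return;
                }
                if (snapshot != null && snapshot.exists()) {
                    System.out.println("Order status: " + snapshot.getString("status"));
                }
            });

        // Keep the JVM alive long enough to receive updates in this sketch.
        Thread.sleep(60_000);
    }
}
```

The mobile and web client SDKs attach the same kind of listener directly, which is what gives customers and drivers the live status updates.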
I have been following this tutorial, and it has given me great insight into linking up MongoDB and getting Google authentication working. However, I want to use DynamoDB, and I found this for hooking up a localhost DynamoDB to test out my theories, but I am not sure how to swap out Mongoose, because I don't want it linking to MongoDB. Or can I use Mongoose for the schema and tell it to write to DynamoDB? Not sure. Thanks for any guidance in advance. I'm new to AWS, if you can't tell.
Mongoose is a library for connecting to MongoDB. I believe your question title should be "DynamoDB instead of MongoDB".
You can't use Mongoose to connect to DynamoDB. There are many differences between MongoDB and DynamoDB. If you want to use DynamoDB with your NodeJS application you should look into using the AWS SDK for NodeJS.
Please be aware that there are major differences between MongoDB and DynamoDB. It's not going to be trivial to take a MongoDB tutorial and modify that to work with DynamoDB. I highly recommend you read up on DynamoDB and understand its restrictions and limitations before committing to using it.