How can I achieve graphql/dataloader-like behaviour in AppSync?

I am currently building an AppSync API, and after doing some research into how to load four different fields with two data sources (two fields per data source), I came across this question.
The response to this question seems to do a great job of explaining the use of data loaders and of mostly using field resolvers. The problem is that in AppSync I can't use tools like graphql/dataloader, which would let me deduplicate, for example, multiple get requests for the same DynamoDB item.
Then I thought maybe something like this is built into AppSync natively, but after trying it and looking at the traces, it seems not to be doing any deduplication work and just does the dumb call to the DynamoDB table for each field, even though they use the same key.
Now I was wondering: is there a way to achieve this without borderline writing it myself in a pipeline resolver?
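For context, the closest workaround I've found so far is attaching a single Lambda data source to those fields with batching enabled (the BatchInvoke operation / maxBatchSize setting), so AppSync hands the function a list of field invocations and the deduplication happens inside the Lambda. A rough sketch of what I mean, with the table name and key shape as placeholders:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
TABLE_NAME = "ItemsTable"  # placeholder table name

def handler(events, context):
    # With batching enabled, AppSync passes a list of resolver events here
    # instead of invoking the function once per field.
    # Assumes each parent object carries the key under "id".
    keys = list({event["source"]["id"] for event in events})  # deduplicate keys

    # One BatchGetItem instead of one GetItem per field.
    response = dynamodb.batch_get_item(
        RequestItems={TABLE_NAME: {"Keys": [{"id": key} for key in keys]}}
    )
    items = {item["id"]: item for item in response["Responses"][TABLE_NAME]}

    # AppSync expects one result per incoming event, in the same order.
    return [items.get(event["source"]["id"]) for event in events]
```

It works, but it is exactly the kind of thing I was hoping AppSync could do for me out of the box.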

Related

What are the limitations of using an AWS AppSync API (GraphQL) through Amplify?

I just want to avoid the use of custom/manual resolvers in AppSync completely. So I'm using Amplify to set up the GraphQL AppSync API in my app. I'm doing everything by changing schema.graphql and running amplify push.
I have 2 questions:
1. What are the limitations, and what problems am I going to face in the future?
2. Can GraphQL subscriptions receive updates when the app is not running (e.g. so the user gets notified)?
Tons of business logic will be exposed in the client-side code.
I think for push notifications you would still have to go via external integrations like FCM/APNS. Multiple integration options are available in SNS.
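For example, a minimal sketch of pushing a notification out through SNS (assuming a platform application for FCM/APNS is already registered; the ARN and token handling are placeholders):

```python
import boto3

sns = boto3.client("sns")

# Hypothetical ARN; a platform application (FCM or APNS) must already exist.
PLATFORM_APP_ARN = "arn:aws:sns:us-east-1:123456789012:app/GCM/my-app"

def notify_user(device_token: str, message: str) -> None:
    # Register (or look up) the device as a platform endpoint...
    endpoint = sns.create_platform_endpoint(
        PlatformApplicationArn=PLATFORM_APP_ARN,
        Token=device_token,
    )
    # ...and publish the push notification to it.
    sns.publish(TargetArn=endpoint["EndpointArn"], Message=message)
```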
Just to preamble these answers: the fact that you use Amplify-generated GraphQL and resolvers doesn't stop you from later including custom resolvers and pipeline functions - it's just that you need to learn quite a bit about where to include them in Amplify's backend file structure.
1. What are the limitations, and what problems am I going to face in the future?
This depends on how well your application's use case matches the GraphQL schema design and on whether your application is relatively self-contained. Amplify becomes more complex when your application needs to talk to other back-end systems: you'll need to start using DynamoDB triggers to notify other state machines/EventBridge/SNS or similar services.
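As a rough sketch of what such a trigger might look like (a Lambda attached to the table's stream, forwarding each change to EventBridge; the source name is a placeholder, and chunking to PutEvents' 10-entry limit is left out):

```python
import json
import boto3

events = boto3.client("events")

def handler(event, context):
    # Each record describes an INSERT/MODIFY/REMOVE on the Amplify-managed table.
    entries = [
        {
            "Source": "my.app.todos",           # hypothetical source name
            "DetailType": record["eventName"],  # INSERT / MODIFY / REMOVE
            "Detail": json.dumps(record["dynamodb"], default=str),
            "EventBusName": "default",
        }
        for record in event["Records"]
    ]
    if entries:
        events.put_events(Entries=entries)
```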
As mentioned, none of these problems are crippling; you can deal with them later, but it will be a step up in the AWS knowledge required to implement them.
For small high-volume/high-availability apps, Amplify and DynamoDB as they come are great. If your application matures into many micro-services and sites, then you'll need to learn quite a bit more AWS to make them play together well. Amplify does set up your DynamoDB on a table-per-object basis, and you'll probably be stuck with (paying for) that. Think hard about whether you might ever want to move to a different optimised data source (RDS or a single DynamoDB table) to reduce the number of queries required to fulfil your GraphQL requests.
2. Can GraphQL subscriptions receive updates when the app is not running (e.g. so the user gets notified)?
No. Anurag mentions SNS, which would be a good option for notifying users outside the app; it's best to blend subscriptions with another service.

Sanity check on AWS Big Data Architecture

We're currently looking to move our AWS architecture over to something that supports large amounts of data and can scale as we gain more customers. When this project started we stuck with what we knew, a Ruby app on an EC2 making RESTful API calls, storing the results in S3, and also storing everything in an RDS. We have a SPA front end written in VueJS to support the stored data.
As our client list has grown, the outbound API calls and the subsequent data we are storing have also grown. I'm currently tasked with looking for a better solution, and I wanted to get some feedback on what I was thinking so far. Currently we have around 5 million rows of relational data, which will only increase as our client list does. I could see us being in the low billions of rows in a year or two.
The Ruby app does a great job of handling the queuing of outbound API calls, retries, and everything else in between. For this reason we thought about keeping the app and, rather than inserting directly into RDS, having it simply dump the results into S3 as CSV.
An S3 trigger could then convert the raw CSV data into Parquet format using a Lambda function (I was looking at something like PyArrow). From here we could move from the traditional RDS to something like Athena, which supports Parquet and would allow us to reuse most of our existing SQL queries.
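Roughly what I picture that Lambda doing (bucket layout and output prefix are placeholders, and PyArrow would have to be packaged as a layer):

```python
import urllib.parse
import boto3
import pyarrow.csv as pv
import pyarrow.parquet as pq

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by s3:ObjectCreated on the raw-CSV prefix.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Download the CSV, convert it to Parquet, and write it back under a
    # prefix that the Athena table points at.
    local_csv = "/tmp/input.csv"
    local_parquet = "/tmp/output.parquet"
    s3.download_file(bucket, key, local_csv)

    table = pv.read_csv(local_csv)
    pq.write_table(table, local_parquet, compression="snappy")

    out_key = "parquet/" + key.rsplit("/", 1)[-1].replace(".csv", ".parquet")
    s3.upload_file(local_parquet, bucket, out_key)
```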
To further optimize performance for the user, we thought about caching commonly used queries in a DynamoDB table. Because the data comes from the scheduled external API calls, we can control when to invalidate the cached queries.
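And something like this for the cache table (the table name and TTL are placeholders):

```python
import hashlib
import json
import time
import boto3

table = boto3.resource("dynamodb").Table("QueryCache")  # hypothetical table

def cache_key(sql: str) -> str:
    # Key cache entries by a hash of the query text.
    return hashlib.sha256(sql.encode()).hexdigest()

def get_cached(sql: str):
    item = table.get_item(Key={"query_hash": cache_key(sql)}).get("Item")
    return json.loads(item["result"]) if item else None

def put_cached(sql: str, result, ttl_seconds: int = 3600) -> None:
    # The TTL attribute lets DynamoDB expire entries; the scheduled API-call
    # job can also overwrite entries explicitly when fresh data lands.
    table.put_item(Item={
        "query_hash": cache_key(sql),
        "result": json.dumps(result),
        "expires_at": int(time.time()) + ttl_seconds,
    })
```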
Big Data backends aren't really my thing, so any feedback is greatly appreciated. I know I have a lot more research to do into Parquet, as it's new to me. Eventually we'd like to do some ML on this data, which I believe Parquet will also support. Thanks.

Google Tag Manager clickstream to Amazon

So the question has more to do with which services I should be using to get efficient performance.
Context and goal:
So what I'm trying to do, exactly, is use a Tag Manager custom HTML tag so that after each Universal Analytics tag (event or pageview) fires, an HTTP request is sent to my own EC2 server with a payload similar to what is sent to Google Analytics.
What I have thought about, planned, and researched so far:
At this moment I have two big options:
Use AWS Kinesis, which seems like a great idea, but the problem is that it only drops the information into one Redshift table, and I would like to have at least 4 or 5 so I can differentiate pageviews from events, etc. My solution to this would be to split each request, on the server side, into a separate stream (see the sketch after these two options).
The other option is to use Spark + Kafka. (Here is a detailed explanation.)
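For the first option, the server-side split I have in mind would look roughly like this (I'm assuming one Firehose delivery stream per Redshift table; the stream names and the parameters I inspect are placeholders):

```python
import json
import boto3

firehose = boto3.client("firehose")

# Hypothetical delivery streams, one per hit type, each loading its own
# Redshift table.
STREAMS = {
    "pageview": "ga-pageviews",
    "event": "ga-events",
}

def route_hit(payload: dict) -> None:
    # Universal Analytics payloads carry the hit type in the "t" parameter.
    hit_type = payload.get("t", "event")
    stream = STREAMS.get(hit_type, "ga-other")
    firehose.put_record(
        DeliveryStreamName=stream,
        Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
    )
```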
I know at some point this means I'm building a parallel Google Analytics, with everything that implies. I still need to decide what information (I'm referring to which parameters, for example source and medium) I should send, how to format it correctly, and how to process it correctly.
Questions and debate points:
Which option is more efficient and easier to set up?
Should I send this information directly from the server of the page/app, or send it from the user side, making the browser do the requests as I explained before?
Has anyone done something like this in the past? Any personal recommendations?
You'd definitely benefit from Google Analytics' custom task feature instead of custom HTML. More on this from Simo Ahava. Also, Google BigQuery is quite a popular destination for streaming hit data, since it allows many on-the-fly computations such as sessionization, and there are many ready-to-use cases for BQ.

GraphEngine Scaling

I don't see any documentation on scaling GraphEngine.
I'm confused as to how this works. I saw that there is a "distributed" mode. I can add the servers and get them to hook up, but I can't get data to be used in both places.
LocalStorage vs CloudStorage is confusing. It doesn't seem like you can query CloudStorage either (for data that might exist in multiple places).
Do you have any examples of this? I'd be very appreciative.
If you mean LINQ interfaces by "query", then currently there's no built-in support for LINQ over cloud storage. However, it should be easy to write a wrapper handler that broadcasts a query to the cloud, lets each server collect data locally using LINQ, and then aggregates the results.
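As a language-agnostic sketch of that scatter-gather idea (this is not GraphEngine's actual API, which is .NET; the endpoints and the /query route are purely hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

SERVERS = ["http://node-0:8080", "http://node-1:8080"]  # placeholder cluster

def query_one(server: str, predicate: dict) -> list:
    # Each server evaluates the predicate against its local storage only.
    return requests.post(f"{server}/query", json=predicate, timeout=10).json()

def query_cluster(predicate: dict) -> list:
    # Broadcast the query to every server in parallel...
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        partials = pool.map(lambda s: query_one(s, predicate), SERVERS)
    # ...then aggregate the partial results into one answer.
    return [item for partial in partials for item in partial]
```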
Could you elaborate on your scenario? It would be interesting to see how a user would like to query over a distributed cluster.

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly with my DynamoDB database. I've been reading around lately, and people are saying this isn't the proper way to go about getting data from my database.
By this I mean I just have a function in my code that queries my DynamoDB database and returns the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service, and standing up another server in front of it means that server also has to scale in line with the RCU/WCU configured for your tables, which we can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS to write the client application that runs on the mobile device.
Using the AWS Token Vending Machine to authenticate your mobile users and grant them credentials for running operations on DynamoDB tables.
Controlling access (i.e. which operations are allowed on which tables, etc.) using IAM policies (see the policy sketch below).
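For example, a fine-grained access policy could look like the following (sketched as a plain policy document; I'm assuming Cognito identities here rather than the older Token Vending Machine, and the table name and account are placeholders):

```python
import json

# Hypothetical fine-grained access policy: the authenticated mobile user may
# only read/write items whose partition key equals their own Cognito identity id.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query",
                   "dynamodb:PutItem", "dynamodb:UpdateItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserData",
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}

print(json.dumps(policy_document, indent=2))
```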
HTH.
From what you say, I can guess that you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database that multiple clients share. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can cause headaches supporting the schema in the future if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of writing changes to the database directly from the client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation procedure that transforms both types of events into the format that fits the database schema. It's basically a question of working with diffs vs. snapshots.
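A toy sketch of that translation step (the field names are made up for illustration):

```python
def translate(event: dict) -> dict:
    # Normalise both client event formats (a legacy v1 and a newer v2)
    # into the one shape the database schema expects before writing.
    if event.get("version") == 2:
        return {"user_id": event["userId"],
                "change": event["diff"],
                "occurred_at": event["timestamp"]}
    # Legacy v1 events used different field names.
    return {"user_id": event["uid"],
            "change": event["payload"],
            "occurred_at": event["ts"]}
```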
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and schema changes are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of using events while keeping things simple to implement.
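A minimal sketch of that kind of Streams-based preprocessing (the table names and the aggregation logic are placeholders):

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
summary = boto3.resource("dynamodb").Table("UserSummary")  # hypothetical table

def handler(event, context):
    # Clients keep writing to the main table; this function, attached to the
    # table's stream, does the extra processing after the fact.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = {k: deserializer.deserialize(v)
                 for k, v in record["dynamodb"]["NewImage"].items()}
        # Example post-processing step: maintain a per-user counter.
        summary.update_item(
            Key={"user_id": image["user_id"]},
            UpdateExpression="ADD item_count :one",
            ExpressionAttributeValues={":one": 1},
        )
```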