Setting up sns_publish_operator in airflow - amazon-web-services

Has anyone used the sns_publish_operator?
https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/sns_publish_operator.html
I am quite new to airflow and am having some issues around setting up the architecture correctly.
I have set up a simple DAG with a data quality check task. Basically, if the dataset fails the data quality checks, I'd like to send an SNS notification. If it passes the data quality checks, I'd like it to refrain from sending an email.
There does not seem to be as much online help in this realm as I thought. Any resources or general tips would be much appreciated.

This question is a bit older, but maybe this helps someone still.
First, addressing the SnsPublishOperator: you will need to set up an Airflow connection to AWS. There are multiple ways to do that; the easiest is probably using the Web UI. Go to Admin->Connections->[+] (add new record). Set it up with the 'Amazon Web Services' Conn Type. Login and Password are the AWS access key and secret key. Finally, you also have to provide, in the 'Extras' section, the region where your SNS topic lives:
{"region_name": "us-east-1"}
Now you can use the operator in your code by passing it the new connection's conn_id:
from airflow.contrib.operators.sns_publish_operator import SnsPublishOperator

my_sns_task = SnsPublishOperator(
    task_id='task_name',
    target_arn='your_sns_topic_arn',
    message='your_message',
    aws_conn_id=conn_id
)
Theoretically, that operator also has a "subject" parameter, but I receive an error from the component when I try to set it.
As for incorporating that operator into your DAG, one possible way is to have the task that evaluates your data fail if the checks do not pass, and have the SNS task be triggered on failure:
my_sns_task = SnsPublishOperator(
    task_id='task_name',
    target_arn='your_sns_topic_arn',
    message='your_message',
    aws_conn_id=conn_id,
    trigger_rule='one_failed'  # run only if an upstream task failed
)
my_sns_task.set_upstream(datacheck_task)
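For completeness, here is a minimal end-to-end sketch of that setup, assuming the Airflow 1.10-era contrib import path from the docs linked above; the DAG id, task ids, connection id, and check logic are all placeholders:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sns_publish_operator import SnsPublishOperator
from airflow.exceptions import AirflowException
from airflow.operators.python_operator import PythonOperator

def run_data_quality_checks():
    checks_passed = False  # replace with your real validation logic
    if not checks_passed:
        # Raising an exception marks this task as failed,
        # which is what triggers the SNS task below.
        raise AirflowException("Data quality checks failed")

with DAG('data_quality_dag',
         start_date=datetime(2020, 1, 1),
         schedule_interval='@daily') as dag:
    datacheck_task = PythonOperator(
        task_id='data_quality_check',
        python_callable=run_data_quality_checks,
    )
    my_sns_task = SnsPublishOperator(
        task_id='notify_failure',
        target_arn='your_sns_topic_arn',
        message='Data quality checks failed',
        aws_conn_id='my_aws_conn',  # hypothetical connection id
        trigger_rule='one_failed',  # fire only when an upstream task failed
    )
    datacheck_task >> my_sns_task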

Related

Send push notifications/emails when a query/mutation happens in AppSync/Aurora

I am using AppSync with Aurora/RDS.
In some cases, when a query/mutation is sent to the DB, I want to send an email and a push notification afterwards. This should be detached from the query/mutation itself; that is, it does not matter whether the notification fails or works.
At the moment I see all these options:
Can you tell me which one I should use?
1. Create a query that calls a Lambda function that sends the push/email, and call it from the client once the actual query/mutation is done. I don't like this because the logic is in the client rather than the server. It seems easy to implement, and I guess it is easy to ignore the result of the second operation from the client's point of view.
2. A variation of the previous one: pack both operations in a single network request. With GraphQL, that is easy, but I don't want the client to wait for the second operation. (Is it possible to invoke Lambda functions that return immediately, like a trigger of other functions? See the sketch after this list.)
3. Attach my queries/mutations to Lambda functions instead of RDS directly. Then, those Lambda functions call other Lambda functions for notifications. Seems more difficult to program, but more microservices-architecture friendly. Probably this is the best one, not sure.
4. Use SQL triggers and call Lambda functions from those triggers. I don't know if this is even possible. Researching...
5. Use pipeline resolvers. The first one is the query/mutation, the second one is the Lambda function that sends the push/email. I would say this is a bad option, because I don't want the client to wait for the second operation or to manage the logic when the second resolver fails.
6. Amazon RDS events: it appears it is possible to attach Lambda functions to specific AWS RDS events. https://docs.aws.amazon.com/lambda/latest/dg/services-rds.html It seems to be about creating DBs, restoring, and that kind of thing. I don't see anything like creating a row or updating a row, so I discard this unless I am wrong.
7. Invoking a Lambda function with an Aurora MySQL stored procedure: CALL mysql.lambda_async(lambda_function_ARN, lambda_function_input). https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Lambda.html "For example, you might want to send a notification using Amazon Simple Notification Service (Amazon SNS) whenever a row is inserted into a specific table in your database." That is exactly what I am looking for. I like this idea, but I don't know if it is possible with Aurora Serverless. Researching... It seems it is not possible when using serverless: https://www.reddit.com/r/aws/comments/a9szid/aurora_serverless_call_lambda/
8. Use Step Functions: no idea how to use them.
9. Somehow attach this Lambda notification function to GraphQL/AppSync instead of the database, but I guess this is not a good idea, because I need to read the database to get the push notification token and the email of the user who is going to receive the notifications.
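Regarding the open question in option 2: yes, a Lambda function can be invoked asynchronously so that the caller returns immediately. A minimal boto3 sketch (the function name and payload here are hypothetical):

import json
import boto3

lambda_client = boto3.client('lambda')

# InvocationType='Event' queues the invocation and returns immediately
# (HTTP 202) without waiting for the function to finish.
lambda_client.invoke(
    FunctionName='send-notification',  # hypothetical function name
    InvocationType='Event',
    Payload=json.dumps({'userId': '123', 'event': 'row_inserted'}),
)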
Which method do you recommend? I am using the Amplify CLI.
Thanks a lot.
Currently, AWS AppSync can only send notifications when the app is active. We are looking into implementing the non-active case.
If you want to send notifications when the app is not active, you can use push notifications on iOS (silent push/interactive push) or push notifications on Android.
If you want to send emails, voice/text messages, or phone notifications when the app is not active, you can integrate with Amazon Pinpoint.

WSO2 API Manager 2.1 : Gateway not enforcing Throttling Limits

We have deployed API-M 2.1 in a distributed way, with each component (GW, TM, KM) running in its own Docker image, on top of DC/OS 1.9 (Mesos).
We are having issues getting the gateway to enforce throttling policies (whether subscription tiers or app-level policies). Here is what we have established so far:
The Traffic Manager itself does its job: it receives the event streams, analyzes them on the fly, and pushes an event onto the JMS topic throttledata.
The Gateway reads the message properly.
So basically we have ruled out a communication issue.
However we found two potential issues:
In the event which is pushed to the TM component, the value of appTenant is null (instead of carbon.super). We have a single tenant defined.
When the gateway receives the throttling message, it decides to let the request through, thinking "stopOnQuotaReach" is set to false, when it is actually set to true (we checked the value in the database).
Digging into the source code, we traced both issues to a single source: both values above are read from the authContext and are apparently set incorrectly there. We are stuck and running out of ideas, and would appreciate pointers to potential sources of the problem and things to check.
Can somebody help, please?
Thanks, Isabelle.
Are there two TMs with HA enabled in the system?
If the TMs are HA-enabled, how do the gateways publish data to them: load-balanced or failover data publishing?
Did you follow the articles below to configure the environment for your deployment?
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-1/
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-2/
Is throttling not working at all in your environment?
Have you noticed any JMS-connection-related logs on the gateway nodes?
In these tests, we disabled HA to avoid possible complications. Neither subscription nor app-level throttling policies are working, in both cases because parameters that should have values do not hold the correct ones (appTenant, stopOnQuotaReach).
Our scenario is far more basic. If we go with one instance of each component, it fails as Isabelle described. And the only thing we know is that both parameters come from the Authentication Context.
Thank you!

Selecting message queue approach for multiple consumers in AWS

Please help selecting a MQ app/system/approach for the following use-case:
Check for incoming messages for a specific user -> read the message if available -> delete it from the queue, ideally staying within AWS.
Context:
Social networking app where users receive messages, i.e.
I need to identify incoming messages by recipient ID.
The app is doing long-polls for new messages every 30 seconds.
Message size is <1Kb.
As per current estimates, I'll need 100M+ message checks per month in total (however, far fewer actual messages; these are just checks).
Users acknowledge messages by choosing OK or Ignore; however, I'm not sure if ACK support from the MQ system is required for that.
I'm in AWS. Initially I thought of SQS, but the more I read, the less it looks like a good match (I cannot set a message recipient ID in a way that lets me filter by recipient, etc.); however, maybe I'm wrong.
One of the options I also thought about is to just use a DynamoDB "messages" table, with the partition key being userId and the sort key being messageId, so I'd be able to easily query by user; however, I'm concerned about the costs.
If possible, I would much prefer to stay within AWS, or at least use SaaS like SQS; being a 1-person startup, I really want to avoid the headache of supporting a self-hosted system.
Thank you!
D
You are right on both counts:
SQS won't work, because of the limitation you pointed out.
DynamoDB would work, but would cost a lot.
I can suggest the following:
Create a Redis cluster, possibly on Amazon ElastiCache.
In it, make one List per user.
Whenever a new message comes in, append it to the concerned user's list.
To deliver the messages, just read from the user's list; also, flush the list if needed (see the sketch below).
What I am suggesting is very similar to how Twitter manages each user's news feed and home feed.
It should also be cheap.
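A minimal sketch of that list-per-user idea with the redis-py client; the key naming scheme and the ElastiCache endpoint are made up:

import json
import redis

# Hypothetical ElastiCache endpoint
r = redis.Redis(host='my-cluster.abc123.use1.cache.amazonaws.com', port=6379)

def push_message(user_id, message):
    # Append the new message to the recipient's list
    r.rpush('messages:%s' % user_id, json.dumps(message))

def check_messages(user_id):
    # Read everything in the user's list, then flush it in one round trip
    key = 'messages:%s' % user_id
    pipe = r.pipeline()
    pipe.lrange(key, 0, -1)
    pipe.delete(key)
    items, _ = pipe.execute()
    return [json.loads(i) for i in items]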

I need help clarifying a high level use-case of Amazon SQS

So I need a second pair of eyes to correct or confirm my understanding of Amazon SQS. From my understanding, you can add an unlimited number of messages to one queue. A message can be 256 KB in size, and if it needs to be larger than that, you can use Amazon S3 to store up to 2 GB. Reading around online, it appears there are many use cases for this queuing service. For example, one use case of SQS is acting as a database buffer.
But here's what I'm looking to do: I'm looking to make a real-time messaging system. My current functionality acts more like a message board, so the implementation just inserts into the database, then reads the data and packages it into JSON to be inserted into SQLite on the mobile phone. That works great, but I'm getting a lot of requests from people to make it real-time.
So what I'm wondering is: can I utilize Amazon SQS to write and read messages for a chat application? My theoretical use case would have a message queue to write to, and the mobile client would pull from that queue every second to check for messages. But here's where I'm confused: since you cannot "query" a particular message from the queue, would it make sense to have a queue per user, plus a generic queue for the app server to read from? Or am I just talking crazy, and should I spend my cognitive resources thinking about implementing an open connection on an EC2 instance?
Any help would be great,
Thanks!
Have you thought about using Amazon SNS to push the chat messages to your mobile devices? Each user publishes to a topic and the readers subscribe to that topic (see the sketch below). You just have to be OK with missing messages if the app isn't running.
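With that layout, the publishing side is a single call per message. A boto3 sketch; the one-topic-per-user scheme and the topic ARN are made up:

import boto3

sns = boto3.client('sns')

# Hypothetical per-user topic; each user's devices subscribe to it
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:chat-user-42',
    Message='Hey, are you around?',
)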
If you only have a few (maybe fewer than 100) users, you could consider having one SQS queue per user. If that is not so, the solution won't be operationally feasible.
If you were to have one generic queue, SQS won't help, because it doesn't allow querying for a given field across all available messages.
I can think of the following options for your use case:
1. Set up one Redis cluster, possibly on Amazon ElastiCache, and have one message list per user.
2. Create one Messages table in MySQL, possibly on AWS RDS. This provides an easy way to query messages for a given user.
You can also use DynamoDB in #2; a sketch of that variant follows.
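Querying a user's messages from DynamoDB is one call with boto3; this sketch assumes a hypothetical Messages table with partition key userId and sort key messageId:

import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key 'userId', sort key 'messageId'
table = boto3.resource('dynamodb').Table('Messages')

def fetch_messages(user_id):
    # Returns all messages for one user, ordered by messageId
    resp = table.query(KeyConditionExpression=Key('userId').eq(user_id))
    return resp['Items']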

Amazon SNS CreatePlatformApplication returns error when reusing platform applications

I had working code that created a new platform application for every message that went out. I thought that was wasteful, so I changed the code to use list_platform_applications to get the available applications and reuse the one with the proper name (part of the PlatformApplicationArn).
This works for several messages in a row, until suddenly I get this error from CreatePlatformApplication:
{"Error":{"Code":"InvalidParameter","Message":"Invalid parameter: This
endpoint is already registered with a different
token.","Type":"Sender"},"RequestId":"06bd3443-598e-5c06-9f5c-7f84349ea067"}
That doesn't even make sense. I'm creating an endpoint; I didn't pass one in. Is it really complaining about the endpoint it's returning?
According to the Amazon documentation:
"The CreatePlatformEndpoint action is idempotent, so if the requester
already owns an endpoint with the same device token and attributes,
that endpoint's ARN is returned without creating a new endpoint."
So it seems to me that if there's an appropriate one, it will be returned; otherwise, a brand new one is created.
Am I missing something?
Oh darn. I think I found the reason for this behavior. After facing this issue, I made sure that each token was only uploaded once to AWS SNS. When testing this, I realized that nevertheless I ended up with multiple endpoints with the same token - huh???
It turned out that these duplicated tokens resulted from outdated tokens being uploaded to AWS SNS. After creating an endpoint using an outdated token, SNS would automagically revive the endpoint by updating it with the current device token (which afaik is delivered back from GCM as a canonical ID once you try to send push messages to outdated tokens).
So e.g. uploading these (made-up) tokens and custom data
APA9...YFDw, {original_token: APA9...YFDw}
APA9...XaSd, {original_token: APA9...XaSd} <-- Assume this token is outdated
APA9...sVQa, {original_token: APA9...sVQa}
might result in something like this - i.e. different endpoints with identical tokens:
APA9...YFDw, {original_token: APA9...YFDw}, arn:aws:sns:eu-west-1:4711:endpoint/GCM/myapp/daf64...5c204
APA9...YFDw, {original_token: APA9...XaSd}, arn:aws:sns:eu-west-1:4711:endpoint/GCM/myapp/a980f...e3c82 <-- Duplicate token!
APA9...sVQa, {original_token: APA9...sVQa}, arn:aws:sns:eu-west-1:4711:endpoint/GCM/myapp/14777...7d9ff
This scenario in turn seems to lead to the above error on subsequent attempts to create endpoints using outdated tokens. On the one hand, it seems correct that subsequent requests fail. On the other hand, I have a gut feeling that the token duplication taking place is wrong, or at least difficult to handle. Maybe once SNS discovers that a token is outdated and needs to be changed, it could first check whether another endpoint already exists with the same token...
I will research on this a bit more and see if I can find a way to handle this properly.
Cheers
Had the same issue, with the device reporting one token (outdated according to GCM) and SNS retrieving/storing another.
We solved it by clearing the app cache on the device and reopening the app (which, in our case, re-registered the device with the GCM service), generating the same token (no longer outdated) that SNS was attempting to push to.
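For reference, the pattern AWS documents for mobile push token management handles this kind of conflict by updating the existing endpoint instead of creating a new one. A hedged boto3 sketch; the exact wording of the SNS error message (and therefore the ARN extraction) varies by case, so treat this as an assumption to verify, not a definitive implementation:

import re
import boto3
from botocore.exceptions import ClientError

sns = boto3.client('sns')

def register_device(platform_application_arn, token):
    try:
        resp = sns.create_platform_endpoint(
            PlatformApplicationArn=platform_application_arn,
            Token=token,
        )
        return resp['EndpointArn']
    except ClientError as err:
        message = err.response['Error']['Message']
        # SNS usually embeds the conflicting endpoint's ARN in the error
        # text; the message format is not guaranteed, hence the broad regex.
        match = re.search(r'(arn:aws:sns:\S+)', message)
        if err.response['Error']['Code'] != 'InvalidParameter' or not match:
            raise
        endpoint_arn = match.group(1)
        # Re-point the existing endpoint at the current token and re-enable it
        sns.set_endpoint_attributes(
            EndpointArn=endpoint_arn,
            Attributes={'Token': token, 'Enabled': 'true'},
        )
        return endpoint_arn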