How to use ConversationHandler for Telegram bot in AWS Lambda - amazon-web-services

I'm currently writing a Telegram bot using python-telegram-bot as a wrapper. I want to try and host this on AWS Lambda. However, so far the examples I've seen are simple, dumb bots that are unable to continue a conversation. I'm leveraging ConversationHandler to run the bot's conversations but this doesn't work well on AWS Lambda. I'm not sure how to fix this issue.
import json

from telegram import Update

bot = MyBot()  # the asker's own bot class; it builds the Updater

def lambda_handler(event=None, context=None):
    try:
        dispatcher = bot.updater.dispatcher
        message = json.loads(event['body'])
        print("Incoming:", message)
        dispatcher.process_update(Update.de_json(message, bot.updater.bot))
    except Exception as e:
        print(e)
        return {"statusCode": 500}
    bot.updater.idle()
    return {"statusCode": 200}
How can I get the bot to hold a conversation state throughout?

ConversationHandler stores the state internally, i.e. in memory. I don't know how AWS handles initialization of variables, but if the ConversationHandler is initialized anew on each incoming update, it won't remember which state each conversation was in. If you can use some sort of database/file storage on AWS, you can try to use PTB's persistence setup to store the conversation states and reload them for each incoming update.
Disclaimer: I'm currently the maintainer of python-telegram-bot
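To make the load/advance/save idea concrete, here is a minimal sketch that uses only the standard library. It is not PTB's persistence API; the state names, the replies and the JSON file are all hypothetical, and the local file merely stands in for whatever durable storage (a database, S3, ...) is actually available on AWS.

```python
import json
import os
import tempfile

# Hypothetical stand-in for durable storage. A local JSON file is used only
# so that the sketch runs anywhere; it is not durable on Lambda.
STATE_FILE = os.path.join(tempfile.gettempdir(), "conversation_states.json")

START, AWAITING_NAME, AWAITING_AGE = "START", "AWAITING_NAME", "AWAITING_AGE"


def load_states(path=STATE_FILE):
    """Return the saved {chat_id: state} mapping, or an empty one."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_states(states, path=STATE_FILE):
    """Write the whole mapping back to storage."""
    with open(path, "w") as fh:
        json.dump(states, fh)


def handle_update(chat_id, text, path=STATE_FILE):
    """Handle one message as a fresh process would: load, advance, save."""
    states = load_states(path)
    key = str(chat_id)  # JSON object keys are always strings
    state = states.get(key, START)
    if state == START:
        reply, next_state = "What is your name?", AWAITING_NAME
    elif state == AWAITING_NAME:
        reply, next_state = f"Hi {text}, how old are you?", AWAITING_AGE
    else:
        reply, next_state = "Thanks, all done.", START
    states[key] = next_state
    save_states(states, path)
    return reply
```

Because nothing is kept in memory between calls, each call to handle_update behaves like a separate Lambda invocation, and the conversation still advances. With PTB itself, the persistence classes would take the place of load_states and save_states.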

Related

Can a lambda return a response and wait for a new body without closing the session?

I am running a puppeteer function in AWS Lambda and I have a scenario that the user makes a POST request to the lambda with his username and email. The function is going to check if they are valid in a website and return the JSON to the user with the answer. Is it possible to use the same lambda session to receive another input/body from the user?
The reason I need it to be the same session is that each time a user and email are sent to the Lambda, the Puppeteer website generates unique IDs that need to be used AFTER the user sends his data, in that exact moment, because it is logged into the website with a unique session.
I'm currently running this function in Node.js and it works fine because the session isn't closed, but in Lambda the session is closed once the function returns the first response.
As others have mentioned, a Lambda function is a stateless resource, so you can use DynamoDB to store values such as a session ID.
Additionally, if the Lambda function should wait for a response or for updated values by querying DynamoDB, you can use AWS Step Functions or Airflow, which provide a "wait" state.
See what States you can leverage in the AWS Docs.
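A minimal sketch of the DynamoDB suggestion, written in Python for brevity. The functions expect an object that behaves like a boto3 DynamoDB Table resource (put_item and get_item); the key name user_id, the attribute names and the 15-minute expiry are assumptions, and FakeTable is only an in-memory stand-in so the sketch runs without AWS. Whether the website session can actually be resumed from stored values is also an assumption.

```python
import time


def save_session(table, user_id, session_data, ttl_seconds=900):
    """Store per-user session values so a later invocation can read them."""
    table.put_item(Item={
        "user_id": user_id,            # assumed partition key name
        "session": session_data,       # e.g. the IDs generated by the website
        "expires_at": int(time.time()) + ttl_seconds,
    })


def load_session(table, user_id):
    """Return the stored session dict, or None if nothing was saved."""
    response = table.get_item(Key={"user_id": user_id})
    item = response.get("Item")        # "Item" is absent when the key is unknown
    return item["session"] if item else None


class FakeTable:
    """In-memory stand-in that mimics the two Table methods used above."""

    def __init__(self):
        self._items = {}

    def put_item(self, Item):
        self._items[Item["user_id"]] = Item

    def get_item(self, Key):
        item = self._items.get(Key["user_id"])
        return {"Item": item} if item else {}
```

In a real function the table argument would come from something like boto3.resource('dynamodb').Table(...), with a table name of your choosing.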

How should I handle asynchronous processes that occur after API calls in AWS?

I'm designing the backend for a website that uses API Gateway and Lambda to handle API requests, many of which target a MySQL DB on RDS. Some processes need to happen asynchronously but I'm debating which is best practice or cleaner.
In the given scenario, every time a user creates a new row in a certain table, let's say an email also needs to be sent asynchronously. There are many other scenarios similar to this but this will set precedent.
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, write to something like SQS which will later be read from another lambda that sends an email. When the response from SQS is successful that the record was added to the queue, send a 201 response saying the REST API call was successful.
Option 2: In the lambda that handles the API request, write to the MySQL instance to add the new row. When the response from MySQL comes back successful, send a 201 response saying the REST API call was successful. Then set up a DMS (Database Migration Service) task that runs indefinitely to send database modification binlogs to a Kinesis stream, which will trigger a lambda that handles all DB changes, reads the change as a new row in a certain table, and sends an email.
Option 1:
less infrastructure
more direct tracking of logic from an API call
1 extra http call (to sqs) delaying response times for an api for a web page
Option 2:
more infrastructure (dms task, replication instance)
scaling out shards may mean loss of ordering when processing binlog events if ordering is a requirement (it is)
side question: Are you able to choose hash key for kinesis for dms tasks from mysql?
a single codebase for reacting to all modifications in the DB may actually make following logic in code simpler
Is this the tradeoff or am I missing something? What is best practice in this scenario?
Option 1 in my view seems most logical, but I would replace SQS and second lambda with SNS. So, modified option 1 could be:
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, publish confirmation message to SNS that sends an email. When the response from SNS is successful send a 201 response saying the REST API call was successful.
This should be faster, cheaper and easier to implement than using SQS and a second lambda for sending email.
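The modified option 1 could look roughly like the sketch below. The insert_row callable, the topic ARN and the message shape are hypothetical; sns_client is expected to behave like a boto3 SNS client, whose publish call accepts TopicArn, Subject and Message.

```python
import json


def create_row_and_notify(insert_row, sns_client, topic_arn, row):
    """Write the row, publish a notification, then report success."""
    # 1. Write to MySQL first; insert_row is a hypothetical callable that
    #    returns the new row's id and raises on failure.
    row_id = insert_row(row)

    # 2. Publish to SNS only after the write succeeded. Subscribers of the
    #    topic (for example an email subscription) receive the message.
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="New row created",
        Message=json.dumps({"row_id": row_id}),
    )

    # 3. Only now tell the caller that the REST call succeeded.
    return {"statusCode": 201, "body": json.dumps({"id": row_id})}
```

If either step raises, no 201 is returned, which matches the ordering described in the option.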

Lambda Low Latency Messaging Options

I have a Lambda that requires messages to be sent to another Lambda to perform some action. In my particular case it is passing a message to a Lambda in order for it to perform HTTP requests and refresh cache entries.
Currently I am relying on the AWS SDK to send an SQS message. The mechanics of this are working fine. The concern that I have is that the SQS send method call takes around 50ms on average to complete. Considering I'm in a Lambda, I am unable to perform this in the background and expect it to complete before the Lambda returns and is frozen.
This is further compounded if I need to make multiple SQS send calls, which is particularly bad as the Lambda is responsible for responding to low-latency HTTP requests.
Are there any alternatives in AWS for communicating between Lambdas that do not require a synchronous API call, and that exhibit more of a fire-and-forget, asynchronous behavior?
Though there are several approaches to trigger one Lambda from another, in my experience one of the fastest is to invoke the target Lambda directly by its ARN.
Did you try invoking one Lambda from the other using the AWS SDKs?
For example, in Python using Boto3, I achieved it like this.
The parameter InvocationType = 'Event' invokes the target Lambda asynchronously.
The code below takes two parameters: name, which can be either your target Lambda function's name or its ARN, and params, a JSON-serializable object with the input you want to pass. Try it out!
import json

import boto3
from botocore.exceptions import ClientError

def invoke_lambda(name, params):
    lambda_client = boto3.client('lambda')
    params_bytes = json.dumps(params).encode()
    try:
        # InvocationType='Event' queues the call and returns without waiting
        response = lambda_client.invoke(FunctionName=name,
                                        InvocationType='Event',
                                        LogType='Tail',
                                        Payload=params_bytes)
    except ClientError as e:
        print(e)
        return None
    return response
Hope it helps!
For more, refer to Lambda's Invoke Event on Boto3 docs.
Alternatively, you can use Lambda's Async Invoke as well.
It's difficult to give exact answers without knowing what language you are writing the Lambda function in. To at least make "warm" function invocations faster I would make sure you are creating the SQS client outside of the Lambda event handler so it can reuse the connection. The AWS SDK should use an HTTP connection pool so it doesn't have to re-establish a connection and go through the SSL handshake and all that every time you make an SQS request, as long as you reuse the SQS client.
If that's still not fast enough, I would have the Lambda function handling the HTTP request pass off the "background" work to another Lambda function, via an asynchronous call. Then the first Lambda function can return an HTTP response, while the second Lambda function continues to do work.
You might also try Lambda Destinations, depending on your use case. With this you don't need to put things in a queue manually.
https://aws.amazon.com/blogs/compute/introducing-aws-lambda-destinations/
But it limits your flexibility. From my point of view, chaining Lambdas directly is an antipattern; if you need that, go for Step Functions.

What is the convention when using Boto3 clients vs resources?

So I have an API that makes calls to AWS services and I am using Boto3 in order to do this within my python application. The question I have deals with Boto3's client vs resource access levels. I think I understand the difference between them (one is low-level access the other is higher-level object-oriented service access) but my question is if it is okay to instantiate both a client and resource? For example, some resource functionality is easier to access using a resource over a client, but there is some functionality only the client has. Is it bad to instantiate both and use the easiest access level when needed or will there be some sort of disconnect when using two separate access levels when connecting to the same resource?
I am not running into any errors with my code to connect to SQS shown below, however I want to make sure that down the line I am not shooting myself in the foot by arbitrarily choosing between the client/resource for the same aws connection.
import boto3

REGION = 'us-east-1'
sqs_r = boto3.resource('sqs', REGION)
sqs_c = boto3.client('sqs', REGION)

def create_queue(queue_name):
    queue_attributes = {
        'FifoQueue': 'true',
        'DelaySeconds': '0',
        'MessageRetentionPeriod': '900',  # 15 minutes to complete a command, else deleted.
        'ContentBasedDeduplication': 'true'
    }
    try:
        queue = sqs_r.get_queue_by_name(QueueName=queue_name)
    except:
        queue = sqs_r.create_queue(QueueName=queue_name, Attributes=queue_attributes)

def list_all_queues(queue_name_prefix=''):
    all_queues = sqs_c.list_queues(QueueNamePrefix=queue_name_prefix)
    print(all_queues['QueueUrls'])
    print(type(all_queues))
Both of the above functions work properly: one creates a queue and the other lists all of the queues in SQS. However, one function uses a resource and the other uses a client. Is this okay?
You can certainly use both.
The resource method actually uses the client method behind-the-scenes, so AWS only sees client-like calls.
In fact, the resource even contains a client. You can access it like this:
import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
This example is from the boto3 documentation. It shows how a client is being extracted from a resource, and makes a client call, effectively identical to s3_client.copy().
Both client and resource just create a local object. There is no back-end activity involved.

creating a web url that listens to redis pubsub published message

Edit
OK I have a long polling from javascript that talks to a django view. The view looks as follows. It loses some messages that I publish from redis client in the channel. Also I should not be connecting to redis for every request (Perhaps the redis variables can be saved in session?)
If someone can point out the changes I need to make this view work with long polling, it would be awesome! Thank you!
def listen(request):
    if request.session:
        logger.info('request session: %s' % (request.session))
    channel = request.GET.get('channel', None)
    if channel:
        logger.info('not in cache - first time - constructing redis object')
        r = redis.Redis(host='localhost', port=6379, db=0)
        p = r.pubsub()
        logger.info('subscribing to channel: %s' % (channel))
        p.psubscribe(channel)
        logger.info('subscribed to channel: %s' % (channel))
        # blocks until the next item arrives on the subscription
        message = next(p.listen())
        logger.info('got msg %s' % (message))
        return HttpResponse(json.dumps(message))
    return HttpResponse('')
----Original question---
I am trying to create a chat application (using django, python) and am trying to avoid the polling mechanism. I have been struggling with this now - so any pointers would be really appreciated!
Since web sockets are not supported in most browsers, I think long polling is the right choice. Right now I am looking for something that scales better than regular polling and is easy to integrate with the Python/Django stack. Once I am done with this development, I plan to evaluate other Python frameworks (Tornado, Twisted, gevent, etc. come to mind).
I did some research and liked the redis pubsub mechanism. The chat message gets published to a channel to which both users have already subscribed. Following are my questions:
From what I understand, apache would not scale well since long polling would soon run into process/thread limits. Hence I have decided to switch to nginx. Is this rationale correct? Also, are there any issues involved in nginx that I should be worried about? In particular, I am worried about the latest version not supporting http 1.1 for proxy passing as mentioned in the blog post at http://www.letseehere.com/reverse-proxy-web-sockets?
How do I create the client portion of the subscription of messages on the browser side? In my mind, it would be a url to which the javascript code would "long poll". So at the javascript level, the client would poll a url which gets "blocked" in a "non blocking way" at the server side. When a result (in this case a new chat message) appears, the server returns the result. Javascript does what it needs to and then again polls the same url. Is this thinking correct? What happens in the intervals when the javascript loop is pausing: do we lose any messages from the server side?
In essence, I want to create the following:
From redis, I publish a message to a channel "foo" (can use redis-cli also - easy to incorporate it later in python/django)
I want the same message to appear in two browser windows that use the same js code to poll. Assume that the browser code knows the channel name for test purpose
I publish a second message that again appears in two browser windows.
I am new to real time apps, so apologies for any question that may not make sense.
Thank you!
Well, just answering your question partly and mentioning one option out of many: Gunicorn used with an async worker class is a solution for long-polling/non-blocking requests that is really easy to set up!
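A Gunicorn configuration file is plain Python, so the suggestion above might be sketched as follows. The file name, bind address and numbers are assumptions rather than recommendations, and the gevent worker class requires the gevent package to be installed.

```python
# gunicorn.conf.py (hypothetical file name and values)

# Address the server listens on.
bind = "127.0.0.1:8000"

# Async worker class, so a request that is waiting on a long poll does not
# tie up a whole worker process.
worker_class = "gevent"

# Number of worker processes.
workers = 2

# Maximum number of simultaneous clients per worker (async workers only).
worker_connections = 1000
```

It could then be started with something like gunicorn -c gunicorn.conf.py myproject.wsgi:application, where myproject is a placeholder for the Django project's module name.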