AWS SQS - Receive message from CRON

I have an AWS Elastic Beanstalk instance configured as a Worker Environment. It has a cron job that runs every x minutes. In the Worker Configuration I have the path to a PHP file that runs when the cron fires. If I go to the SQS dashboard I can manually send a message to this worker's SQS queue, as well as set an actual message body, for example "hello".
My question is, how can I have the PHP file access the SQS message's message attributes?
The obvious answer is to use the AWS\SQSClient, however the only way to read a message's attributes is to first receive the message. The problem here is that the message has already been retrieved by the Elastic Beanstalk worker code. So how can I now read its attributes?
EDIT
Just to add more clarity to what I am describing, I'm going to give a detailed write-up of my steps to reproduce this.
I log into Elastic Beanstalk and create a new environment in my application.
I select 'create new worker'
I configure a PHP instance
I upload a new source for my environment
The source zip contains 2 files: cron.yaml and someJob.php
See the file contents below
I continue through setup until I get to the "Worker Details" section. Here I set the following:
Worker Queue - Autogenerated queue
HTTP Path - /someJob.php
MIME Type - default
HTTP Connections - 10
Visibility Timeout - 300
I let the environment build
During the build an autogenerated SQS queue and dead-letter queue are automatically created
Once finished the environment sits there until the first cron job time is hit
A message is somehow sent to the autogenerated SQS message queue
someJob.php runs
The message apparently gets deleted
cron.yaml:
version: 1
cron:
  - name: "Worker"
    url: "someJob.php"
    schedule: "0 * * * *"
someJob.php (pseudocode):
<?php /* send me an email, update the db, whatever */ ?>
// NOTE: I don't even connect to the AWS lib or perform ANY SQS actions for this to work
Now my question is, if I go to the autogenerated SQS queue I can select it, go to Queue Actions, go to Send a Message, and then send an actual message string ... such as "Hello".
Can I access the message value "Hello" even though my PHP wasn't responsible for pulling the message from SQS? Obviously, I would need to call the AWS lib and the associated SQS commands, but the only command I can use is "receiveMessage", which I assume would pull a new message instead of the information from the currently received "Hello" message.
Note that sending the "Hello" message will also cause someJob.php to run.
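For what it's worth, the worker daemon delivers each SQS message to the configured HTTP path as a POST request, with the message body as the request body and any message attributes as X-Aws-Sqsd-Attr-* headers. Below is a minimal sketch of reading such a request, written in Python/Flask purely for illustration (in PHP the equivalents would be php://input and the $_SERVER array); the route simply mirrors the HTTP Path configured above.

# Illustrative sketch only: a Python/Flask stand-in for someJob.php, assuming
# the documented sqsd behaviour of POSTing the SQS message body to the
# configured HTTP path and passing message attributes as X-Aws-Sqsd-Attr-* headers.
from flask import Flask, request

app = Flask(__name__)

@app.route("/someJob.php", methods=["POST"])
def some_job():
    body = request.get_data(as_text=True)                 # the SQS message body, e.g. "Hello"
    message_id = request.headers.get("X-Aws-Sqsd-Msgid")  # set by the worker daemon
    # Each SQS message attribute arrives as its own request header.
    attributes = {
        name[len("x-aws-sqsd-attr-"):]: value
        for name, value in request.headers.items()
        if name.lower().startswith("x-aws-sqsd-attr-")
    }
    # ... send the email, update the db, whatever ...
    return "", 200  # a 200 OK response tells the daemon to delete the message from SQS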

Related

AWS lambda: Execute function on timeout

I am developing a lambda function that migrates logs from an SFTP server to an S3 bucket.
Due to the size of the logs, the function sometimes times out, even though I have set the maximum timeout of 15 minutes.
try:
    logger.info(f'Migrating {log_name}... ')
    transfer_to_s3(log_name, sftp)
    logger.info(f'{log_name} was migrated successfully')
If transfer_to_s3() fails due to a timeout, the logger.info(f'{log_name} was migrated successfully') line won't be executed.
I want to ensure that in this scenario, I will somehow know that a log was not migrated due to timeout.
Is there a way to force lambda to perform an action, before exiting, in the case of a timeout?
Probably a better way would be to use SQS for that:
Log info ---> SQS queue ---> Lambda function
If the Lambda successfully moves the files, it removes the log info from the SQS queue. If it fails, the log info persists in the SQS queue (or goes to a DLQ for special handling), so the next Lambda invocation can handle it.
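A minimal sketch of that flow, assuming the queue is configured as the Lambda's event source; transfer_to_s3() and sftp are the question's own placeholders, and LOG_QUEUE_URL plus the JSON message shape are assumptions made here for illustration.

# Sketch of the suggested SQS -> Lambda flow. transfer_to_s3() and sftp come
# from the question and are not defined here; LOG_QUEUE_URL and the message
# format ({"log_name": ...}) are assumptions.
import json
import os
import boto3

sqs = boto3.client("sqs")
LOG_QUEUE_URL = os.environ["LOG_QUEUE_URL"]

def handler(event, context):
    # Records delivered by the SQS event source mapping.
    for record in event["Records"]:
        log_name = json.loads(record["body"])["log_name"]
        transfer_to_s3(log_name, sftp)  # may raise, or the function may time out
        # Delete only after a successful transfer. If the function times out
        # before reaching this line, the message becomes visible again (or
        # moves to the DLQ after maxReceiveCount) and is retried later.
        sqs.delete_message(
            QueueUrl=LOG_QUEUE_URL,
            ReceiptHandle=record["receiptHandle"],
        )

With an SQS event source, Lambda also deletes the whole batch automatically when the handler returns without error; the explicit delete above just makes the per-message behaviour easier to see.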

Move Google Pub/Sub Messages Between Topics

How can I bulk move messages from one topic to another in GCP Pub/Sub?
I am aware of the Dataflow templates that provide this, however unfortunately restrictions do not allow me to use Dataflow API.
Any suggestions on ad-hoc movement of messages between topics (besides one-by-one copy and pasting?)
Specifically, the use case is for moving messages in a deadletter topic back into the original topic for reprocessing.
You can't use snapshots, because snapshots can only be applied to subscriptions of the same topic (to avoid message ID overlapping).
The easiest way is to write a function that pulls from your subscription. Here is how I would do it:
Create a topic (named, for example, "transfer-topic") with a push subscription. Set the timeout to 10 minutes.
Create a Cloud Function triggered over HTTP by the Pub/Sub push subscription (or a Cloud Run service). When you deploy it, set the timeout to 9 minutes for Cloud Functions or to 10 minutes for Cloud Run. The processing is the following (a sketch is given after the process description below):
Read a chunk of messages (for example 1,000) from the dead-letter pull subscription
Publish the messages (in bulk mode) to the initial topic
Acknowledge the messages on the dead-letter subscription
Repeat until the pull subscription is empty
Return code 200.
The global process:
Publish a message to the transfer-topic
The message triggers the function/Cloud Run service with a push HTTP request
The process pulls the messages and republishes them to the initial topic
If the timeout is reached, the function crashes and Pub/Sub retries the HTTP request (with exponential backoff)
If all the messages are processed, the HTTP 200 response code is returned and the process stops (and the message in the transfer-topic subscription is acked)
This process allows you to process a very large number of messages without worrying about the timeout.
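A minimal sketch of that drain loop, assuming the google-cloud-pubsub client library (v2 style); the project, subscription, and topic names are placeholders and the surrounding HTTP handler is omitted.

from google.cloud import pubsub_v1

# All names below are placeholders.
PROJECT = "<your_project_id>"
DEAD_LETTER_SUB = "<your_deadletter_subscription>"
ORIGINAL_TOPIC = "<your_original_topic>"

subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()

sub_path = subscriber.subscription_path(PROJECT, DEAD_LETTER_SUB)
topic_path = publisher.topic_path(PROJECT, ORIGINAL_TOPIC)

while True:
    # Read a chunk of messages from the dead-letter pull subscription.
    response = subscriber.pull(request={"subscription": sub_path, "max_messages": 1000})
    if not response.received_messages:
        break  # drained (a real implementation might retry a few empty pulls first)

    # Publish the messages to the initial topic and wait for the publishes to land.
    futures = [publisher.publish(topic_path, msg.message.data) for msg in response.received_messages]
    for future in futures:
        future.result()

    # Only then acknowledge the dead-letter copies.
    subscriber.acknowledge(request={
        "subscription": sub_path,
        "ack_ids": [msg.ack_id for msg in response.received_messages],
    })

# At this point the HTTP handler would return a 200 response.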
I suggest that you use a Python script for that.
You can use the Pub/Sub client library to read the messages and publish them to another topic, like below:
from google.cloud import pubsub
from google.cloud.pubsub import types

# Defining parameters
PROJECT = "<your_project_id>"
SUBSCRIPTION = "<your_current_subscription_name>"
NEW_TOPIC = "projects/<your_project_id>/topics/<your_new_topic_name>"

# Creating clients for publishing and subscribing. Adjust the max_messages for your purpose
subscriber = pubsub.SubscriberClient()
publisher = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500),
)

# Get your messages. Adjust the max_messages for your purpose
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
response = subscriber.pull(subscription_path, max_messages=500)

# Publish your messages to the new topic
for msg in response.received_messages:
    publisher.publish(NEW_TOPIC, msg.message.data)

# Ack the old subscription if necessary
ack_ids = [msg.ack_id for msg in response.received_messages]
subscriber.acknowledge(subscription_path, ack_ids)
Before running this code you will need to install the Pub/Sub client library in your Python environment. You can do that by running pip install google-cloud-pubsub
An approach to execute your code is using Cloud Functions. If you decide to use it, pay attention to two points:
The maximum time that your function can take to run is 9 minutes. If this timeout gets exceeded, your function will terminate without finishing the job.
In Cloud Functions you can just put google-cloud-pubsub on a new line of your requirements file instead of running a pip command.

Prevent multiple instances of an Azure web job processing the same Queue message

I noticed that multiple instances of my Web job are receiving the same message and end up acting on it. This is not the desired behavior. I would like multiple messages to be processed concurrently, however, I do not want the same message being processed by multiple instances of the web job.
My web job is of the continuous running type.
I use a QueueTrigger to receive the message and invoke the function
My function runs for several hours.
I have looked into the JobHostConfiguration.BatchSize and MaxDequeue properties and I am not sure about these. I simply want a single instance processing a message, and it could take several hours to complete.
This is what I see in the web job logs indicating the message is received twice.
[01/24/2017 16:17:30 > 7e0338: INFO] Executing: 'Functions.RunExperiment' - Reason: 'New queue message detected on 'runexperiment'.'
[01/24/2017 16:17:30 > 7e0338: INFO] Executing: 'Functions.RunExperiment' - Reason: 'New queue message detected on 'runexperiment'.'
According to the official document, if we use Azure Queue storage in a WebJob on multiple instances, we do not need to write code to prevent multiple instances from processing the same queue message.
The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempotent.
I deployed a WebJob to a Web App with 2 instances and it also works correctly (it does not execute twice for the same queue message). It runs on the 2 instances and there are no duplicate executed messages.
So it is very odd that the queue message is executed twice; please try to debug whether there are 2 queue messages with the same content being triggered.
The following is my debug code. I wrote the message, along with the execution time and instance ID info, into another queue.
public static void ProcessQueueMessage([QueueTrigger("queue")] string message, [Queue("logqueue")] out string newMessage, TextWriter log)
{
    string instance = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");
    string newMsg = $"WEBSITE_INSTANCE_ID:{instance}, timestamp:{DateTime.Now}, Message:{message}";
    log.WriteLine(newMsg);
    Console.WriteLine(newMsg);
    newMessage = newMsg;
}
I had the same issue of a single message being processed multiple times at the same time. The issue disappeared as soon as I set the MaxPollingInterval property...

SQS Messages never gets removed/deleted after script runs

I'm having issues where my SQS Messages are never deleted from the SQS Queue. They are only removed when the lifetime ends, which is 4 days.
So to summarize the app:
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawls the data and stores it in the database
The script seems to be working, in the sense that it does receive the message, it does crawl it successfully, and it stores the data successfully in the database. The only issue is that the messages remain in the queue, stuck at "Message Available".
So if I for example load the queue with 800 messages, it will be stuck at ~800 messages for 4 days and then they will all be deleted instantly because of the lifetime value. It seems like a few messages get deleted because the number changes slightly, but a large majority is never removed from the queue.
So question:
Isn't SQS supposed to remove the message as soon as it has been sent and received by the script?
Is there a manual way for me, in the script itself, to delete the current message? From what I know the message is only sent one way, from SQS -> App, so from what I know, I cannot do SQS <-> App.
Any ideas?
A web application in a worker environment tier should only listen on the local host. When the web application in the worker environment tier returns a 200 OK response to acknowledge that it has received and successfully processed the request, the daemon sends a DeleteMessage call to the SQS queue so that the message will be deleted from the queue. (SQS automatically deletes messages that have been in a queue for longer than the configured RetentionPeriod.) If the application returns any response other than 200 OK or there is no response within the configured InactivityTimeout period, SQS once again makes the message visible in the queue and available for another attempt at processing.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
So I guess that answers my question. Some messages do not return HTTP 200 and then they are stuck in an infinite loop.
No, the messages won't get deleted when you read a queue item; they are only hidden for a specific amount of time, which is called the visibility timeout. The idea behind the visibility timeout is to ensure that if there are multiple consumers for a single queue, no two consumers pick up the same item and start processing it.
This is the change you need to make to your app to get the expected behavior:
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawls the data and stores it in the database
On a successful crawl, use the receipt handle (not the message ID) to delete the item from the queue (see the sketch below)
AWS Documentation - DeleteMessage
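For illustration, a minimal boto3 sketch of that delete step; the queue URL and the crawl() helper are placeholders, and in the worker tier the sqsd daemon issues this delete for you whenever the app responds with HTTP 200.

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "<your_queue_url>"  # placeholder

response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
for message in response.get("Messages", []):
    crawl(message["Body"])  # hypothetical stand-in for the crawling step
    # Deleting requires the receipt handle from this particular receive,
    # not the message ID.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])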

Amazon Message Queue Service (SQS) sending and confirmation

Scenario:
An Elastic Beanstalk environment has a few web servers which send requests to another worker environment for user sign-up, etc.
Problem:
When one of the worker machines has finished the task, I also want it to send a confirmation to the web server.
It seems that SQS does not have a "confirmation" message.
We offload a job to send an email, but I also want the web server to know that sending the email was successful.
One solution would be to implement another queue that the web servers poll; however, many servers can poll the same queue, and the confirmation for Server 1 can be received by Server 2, and we would need to wait for the message's visibility timeout, but then Server 3 might intercept the message. It could take a while for Server 1 to get its confirmation.
The way your worker machines "confirm" they processed a message is by deleting it from the queue. The lifecycle of a queue message is:
Web server sends a message with a request.
Worker receives the message.
Worker processes the message successfully.
Worker deletes the message.
A message being deleted is the confirmation that it was processed successfully.
If the worker does not delete the message in step 4 then the message will reappear in the queue after a specified timeout, and the next worker to check for messages will get it and process it.
If you are concerned that a message might be impossible to process and keep reappearing in the queue forever, you can set up a "dead letter" queue. After a message has been received a specified number of times but never deleted, the message is transferred to the dead letter queue, where you can have a process for dealing with these unusual situations.
Amazon's Simple Queue Service Developer Guide has a good, clear explanation of this lifecycle and how you can be sure that every message is processed or moved to a dead letter queue for special handling.
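As a rough illustration of that dead-letter setup, here is how a redrive policy might be attached to a work queue with boto3; the queue names and the maxReceiveCount value are placeholders.

import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue names for the sign-up worker described above.
work_queue_url = sqs.create_queue(QueueName="signup-work-queue")["QueueUrl"]
dead_letter_url = sqs.create_queue(QueueName="signup-dead-letter-queue")["QueueUrl"]
dead_letter_arn = sqs.get_queue_attributes(
    QueueUrl=dead_letter_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# After a message has been received 5 times without being deleted,
# SQS moves it to the dead-letter queue instead of redelivering it.
sqs.set_queue_attributes(
    QueueUrl=work_queue_url,
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dead_letter_arn, "maxReceiveCount": "5"}
        )
    },
)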