The acknowledgement deadline is 10 seconds. When I use asynchronous pull to process messages, I call neither message.ack() nor message.nack(); I wait for the ack deadline to pass and expect Pub/Sub to redeliver the message.
After waiting more than 10 seconds, the subscriber does not receive the message again. Here is my code:
subscriber:
import { pubsubClient, IMessage, parseMessageData } from '../../googlePubsub';
import { logger } from '../../utils';
const topicName = 'asynchronous-pull-test';
const subName = 'asynchronous-pull-test';
const subscription = pubsubClient.topic(topicName).subscription(subName);
const onMessage = (message: IMessage) => {
const { data, ...rest } = message;
const jsonData = parseMessageData(data);
logger.debug('received message', { arguments: { ...rest, data: jsonData } });
const publishTime = new Date(message.publishTime).getTime();
const republishTimestamp = Date.now() - 5 * 1000;
if (publishTime < republishTimestamp) {
logger.info('message acked');
message.ack();
} else {
logger.info('push message back to MQ');
}
};
logger.info('subscribe the MQ');
subscription.on('message', onMessage).on('error', (err: Error) => logger.error(err));
publisher:
const topicName = 'asynchronous-pull-test';
async function main() {
const messagePayload = { email: faker.internet.email(), campaignId: '1' };
await pub(topicName, messagePayload);
}
main();
I am using "@google-cloud/pubsub": "^0.19.0".
I expect the subscriber to receive the message again once the 10-second ack deadline passes, which would mean my subscriber receives and processes the message every 10 seconds. Am I wrong?
The Google Cloud Pub/Sub client libraries automatically call modifyAckDeadline for messages that are neither acked nor nacked, for a configurable period of time. In Node.js, this is configured via the maxExtension property:
const options = {
flowControl: {
maxExtension: 60, // Specified in seconds
},
};
const subscription = pubsubClient.topic(topicName).subscription(subName, options);
In general, it is not good practice to withhold ack/nack on a message as a way to delay its redelivery. Such a message still counts against the flow control limit on outstanding messages, meaning it could prevent the delivery of future messages until the originally received messages are acked or nacked. At this time, Cloud Pub/Sub does not have a way to delay message redelivery, but it is something under consideration.
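The flow control cap referred to here sits alongside maxExtension in the same options object; a minimal sketch, with the maxMessages value purely illustrative:
const options = {
  flowControl: {
    maxMessages: 100, // at most 100 outstanding (un-acked, un-nacked) messages at a time
    maxExtension: 60,
  },
};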
You should nack the message to tell Pub/Sub to redeliver it; see the documentation.
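For illustration, here is a minimal sketch of the question's onMessage handler with an explicit nack added in the delay branch; everything else is unchanged from the code above:
const onMessage = (message: IMessage) => {
  const publishTime = new Date(message.publishTime).getTime();
  const republishTimestamp = Date.now() - 5 * 1000;
  if (publishTime < republishTimestamp) {
    logger.info('message acked');
    message.ack();
  } else {
    logger.info('push message back to MQ');
    message.nack(); // tell Pub/Sub to redeliver this message
  }
};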
This is my setup.
Subscription A is a push subscription that POSTs messages to a Cloud Run deployment.
That deployment exposes an HTTP endpoint, processes the message, posts the result to Topic B, and responds 200 to subscription A's POST request. The whole process takes ~1.5 seconds.
Therefore, for every message in subscription A, I should end up with 1 message in Topic B.
This is what my code looks like.
My app starts an Express server:
const express = require('express');
const bodyParser = require('body-parser');
const _ = require('lodash');
const startBrowser = require('./startBrowser');
const tab = require('./tab');
const createMessage = require('./publishMessage');
const domain = 'https://example.com';
require('dotenv').config();
const app = express();
app.use(bodyParser.json());
const port = process.env.PORT || 8080;
app.listen(port, async () => {
console.log('Listening on port', port);
});
The endpoint where all the magic happens
app.post('/', async (req, res) => {
// Define the success and fail functions, that respond status 200 and 500 respectively
const failed = () => res.status(500).send();
const completed = async () => {
const response = await res.status(200).send();
if (response && res.writableEnded) {
console.log('successfully responded 200');
}
};
//Process the data coming from Subscription A
let pubsubMessage = decodeBase64Json(req.body.message.data);
let parsed = await processor(pubsubMessage);
//Post the processed data to topic B
let messageId = await postParsedData(parsed);
if (messageId) {
// ACK the message once the data has been processed and posted to topic B.
completed();
} else {
console.log('Didnt get a message id');
// failed();
}
});
//define the function that posts data to Topic B
const postParsedData = async (parsed) => {
  if (!_.isEmpty(parsed)) {
    const topicName = 'topic-B';
    const messageIdInternal = await createMessage(parsed, topicName);
    return messageIdInternal;
  } else {
    console.log('Parsed is Empty');
    return null;
  }
};
function decodeBase64Json(data) {
return JSON.parse(Buffer.from(data, 'base64').toString());
}
Execution takes about ~1.5 seconds per message, and I can see the successful responses logged on Cloud Run every ~1.5 seconds. That adds up to about ~2,400 messages/hour per Cloud Run instance.
Topic B is getting new messages at ~2,400 messages/hour, but Subscription A's acknowledgement rate is ~200 messages/hour, which leads to the messages being redelivered many times.
Subscription A's Acknowledgement deadline is 600 seconds.
The request timeout period in Cloud run is 300 seconds.
I've tried ACKing messages before they're published to topic-B, or even before parsing, but I get the same result.
Edit: added a screenshot of the pending and processed messages. Many more messages are processed than acknowledged; the ratio should be 1:1.
Thanks for your help
Solution: GCP support could not reproduce this error, and it did not occur with a larger number of Cloud Run instances. The fix was simply to increase the number of worker instances.
You need to await your completed() function call, like this:
....
if (messageId) {
// ACK the message once the data has been processed and posted to topic B.
await completed();
} else {
console.log('Didnt get a message id');
// failed();
}
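For completeness, here is a minimal sketch of the endpoint with the awaited response plus a failure path that returns 500 so Pub/Sub retries the push delivery; processor, postParsedData and decodeBase64Json are the helpers from the question:
app.post('/', async (req, res) => {
  try {
    const pubsubMessage = decodeBase64Json(req.body.message.data);
    const parsed = await processor(pubsubMessage);
    const messageId = await postParsedData(parsed);
    if (messageId) {
      // ACK the push delivery only after the result is safely in Topic B.
      await res.status(200).send();
    } else {
      console.log('Didnt get a message id');
      // A non-2xx status makes Pub/Sub redeliver the message later.
      await res.status(500).send();
    }
  } catch (err) {
    console.error('processing failed', err);
    res.status(500).send();
  }
});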
If I use SQS long polling and set WaitTimeSeconds to, say, 10 seconds and MaxNumberOfMessages to 1, and a single message is delivered to the queue after, say, 0.1 seconds, will the call to sqs.receiveMessage() return immediately at that point, or will it not return until the 10 seconds of WaitTimeSeconds have elapsed?
In my testing, the call to sqs.receiveMessage() does not seem to return until the full duration of WaitTimeSeconds has elapsed.
Here is the code:
// Load the AWS SDK for Node.js
var AWS = require("aws-sdk");
const fmReqQ = "https://sqs.ap-southeast-2.amazonaws.com/myactid/fmReqQ";
const fmRspQ = "https://sqs.ap-southeast-2.amazonaws.com/myactid/fmRspQ";
const SydneyRegion = "ap-southeast-2";
var credentials = new AWS.SharedIniFileCredentials({ profile: "myprofile" });
AWS.config.credentials = credentials;
// Set the region
AWS.config.update({ region: SydneyRegion });
// Create an SQS service object
var sqs = new AWS.SQS({ apiVersion: "2012-11-05" });
async function sendRequest() {
var sendParams = {
MessageBody: "Information of 12/11/2016.",
QueueUrl: fmReqQ,
};
try {
const data = await sqs.sendMessage(sendParams).promise();
console.log("Success, request MessageId: ", data.MessageId);
} catch (err) {
console.log("Error", err);
}
}
async function doModelling() {
console.time("modelling");
await sendRequest();
await receiveResponse();
console.timeEnd("modelling");
}
async function receiveResponse() {
var receiveParams = {
AttributeNames: ["SentTimestamp"],
MaxNumberOfMessages: 1,
MessageAttributeNames: ["All"],
QueueUrl: fmRspQ,
WaitTimeSeconds: 1,
};
let data = null;
try {
data = await sqs.receiveMessage(receiveParams).promise();
console.log("Success, response MessageId: ", data);
} catch (err) {
console.log("Error", err);
}
}
doModelling();
When I set "WaitTimeSeconds: 3" I get output:
Success, request MessageId: e5079c2a-050f-4681-aa8c-77b05ac7da7f
Success, response MessageId: {
ResponseMetadata: { RequestId: '1b4d6a6b-eaa2-59ea-a2c3-3d9b6fadbb3f' }
}
modelling: 3.268s
When I set "WaitTimeSeconds: 10" I get output:
Success, request MessageId: bbf0a429-b2f7-46f2-b9dd-38833b0c462a
Success, response MessageId: {
ResponseMetadata: { RequestId: '64bded2d-5398-5ca2-86f8-baddd6d4300a' }
}
modelling: 10.324s
Notice how the elapsed time durations match the WaitTimeSeconds.
From reading about AWS SQS long polling, the documentation says it will "Return messages as soon as they become available."
I don't seem to be seeing the messages "as soon as they become available"; the sqs.receiveMessage() call always seems to take the full duration set in WaitTimeSeconds.
As you can see in the sample code, I have set MaxNumberOfMessages to 1.
Using ReceiveMessage() with Long Polling will return as soon as there is at least one message in the queue.
I'm not a Node person, but here's how I tested it:
Created an Amazon SQS queue
In one window, I ran:
aws sqs receive-message --queue-url https://sqs.ap-southeast-2.amazonaws.com/123/foo --visibility-timeout 1 --wait-time-seconds 10
Then, in another window, I ran:
aws sqs send-message --queue-url https://sqs.ap-southeast-2.amazonaws.com/123/foo --message-body bar
The receive-message command returned very quickly after I used the send-message command.
It is possible that your tests are impacted by messages being 'received' but marked as 'invisible', and remaining invisible for later tests since your code does not call DeleteMessage(). I avoided this by specifically stating --visibility-timeout 1, which made the message immediately reappear on the queue for the next test.
The number of messages being requested (--max-number-of-messages) does not impact this result. It returns as soon as there is at least one message available.
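For completeness, here is a rough Node.js equivalent of that test, reusing the sqs client and fmRspQ queue URL already defined in the question's script (the 10-second wait and 100 ms delay are illustrative); the long poll is started first and returns as soon as the separately sent message arrives, and the message is deleted afterwards so it cannot linger invisibly and skew later runs:
async function longPollOnce() {
  console.time('receive');
  const data = await sqs
    .receiveMessage({ QueueUrl: fmRspQ, MaxNumberOfMessages: 1, WaitTimeSeconds: 10 })
    .promise();
  console.timeEnd('receive'); // returns well before 10 s when a message shows up early
  if (data.Messages && data.Messages.length > 0) {
    const message = data.Messages[0];
    console.log('got message:', message.Body);
    // Delete it so it does not sit invisible and distort the next test.
    await sqs.deleteMessage({ QueueUrl: fmRspQ, ReceiptHandle: message.ReceiptHandle }).promise();
  } else {
    console.log('no message arrived within the wait time');
  }
}

// Start the long poll first, then send a message ~100 ms later.
longPollOnce();
setTimeout(() => sqs.sendMessage({ QueueUrl: fmRspQ, MessageBody: 'bar' }).promise(), 100);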
When I set WaitTimeSeconds to 0, I seem to get the behaviour I am after:
Success, request MessageId: 9f286e22-1a08-4532-88ba-06c88be3dbc3
Success, response MessageId: {
ResponseMetadata: { RequestId: '5c264e27-0788-5772-a990-19d78c8b2565' }
}
modelling: 307.884ms
The value I was specifying for WaitTimeSeconds determined the duration of the sqs.receiveMessage() call because there were no messages on my response queue, so the call waited for the full duration specified by WaitTimeSeconds before returning empty.
I am in the process of investigating AWS SQS FIFO queues and building a prototype. I am, however, having difficulty grasping how I can extract a specific message that was sent.
A couple of questions now:
I understand that with the ReceiveMessageAsync call, a list of messages is being returned. Should I loop through this list and match the MessageId property with that of my original sent message?
If there is a list of unprocessed messages in the queue, let's say 15, and I send a new message, will my message only be returned by ReceiveMessageAsync once at least 6 messages have been deleted off the queue?
Currently in my prototype, I perform a SendMessageAsync request and, immediately afterwards, a ReceiveMessageAsync in order to get the processed message. It is here that I loop through the received list of messages to find mine by MessageId, perform some logic on it, and then request to delete it off the queue. Is this logic correct?
var sqsClient = new AmazonSQSClient(RegionEndpoint.EUWest1);
var sendQueueUrl = $"{ConfigurationManager.AppSettings["AWSServer"]}{ConfigurationManager.AppSettings["SQSSend"]}";
var deduplicationId = "fc1e026d-4a04-4cdf-b0b0-16bc78dde19c"; //Guid.NewGuid().ToString();
var sqsMessageRequest = new SendMessageRequest
{
QueueUrl = sendQueueUrl,
MessageGroupId = "testGroup",
MessageDeduplicationId = deduplicationId,
MessageBody = "{\"message\":\"hello\"}"
};
try
{
var sendMessageResponse = await sqsClient.SendMessageAsync(sqsMessageRequest);
var receiveQueueUrl = $"{ConfigurationManager.AppSettings["AWSServer"]}{ConfigurationManager.AppSettings["SQSReceive"]}";
var receiveMessageRequest = new ReceiveMessageRequest
{
AttributeNames = { "All" },
MaxNumberOfMessages = 10,
MessageAttributeNames = { "All" },
QueueUrl = receiveQueueUrl,
WaitTimeSeconds = 20
};
bool messagesFound = false;
while (!messagesFound)
{
var receiveMessageResponse = await sqsClient.ReceiveMessageAsync(receiveMessageRequest);
if (receiveMessageResponse.HttpStatusCode != System.Net.HttpStatusCode.OK)
Console.WriteLine("Failed request to receive message\n");
else
{
foreach (var message in receiveMessageResponse.Messages)
{
if (message.MessageId != sendMessageResponse.MessageId)
continue;
messagesFound = true;
/*process message further and delete afterwards*/
var deleteMessageRequest = new DeleteMessageRequest($"{ConfigurationManager.AppSettings["AWSServer"]}{ConfigurationManager.AppSettings["SQSReceive"]}", message.ReceiptHandle);
var deleteMessageResponse = await sqsClient.DeleteMessageAsync(deleteMessageRequest);
}
}
}
}
catch (Exception ex)
{
throw new Exception("SendMessageAsync: " + ex.Message);
}
finally
{
sqsClient.Dispose();
}
Looping over a queue to find a specific message is inefficient in SQS, whether the queue is FIFO or not, if that is your aim.
You would be better off looking at virtual queues, which offer a way to use SQS for a one-to-one mapping of send and receive.
By using this feature, you would define a virtual queue for your specific message, and the consumer effectively consumes from that specific virtual queue rather than sifting through everything.
With both FIFO and standard SQS queues, you otherwise need to receive and inspect every message in the queue, at most 10 per request, which makes it very inefficient to try to find your specific message.
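Virtual queues come from the AWS temporary-queue client rather than the core SQS API, but the underlying idea can be sketched with plain SDK calls: the requester creates its own reply queue, tags the request with that queue's URL, and the responder posts the result there, so the requester never has to scan a shared queue for its MessageId. A rough Node.js illustration (names such as requestQueueUrl and ReplyQueueUrl are hypothetical, and the MessageGroupId/MessageDeduplicationId fields assume the request queue is FIFO, as in the question):
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ region: 'eu-west-1' });

async function sendRequestAndAwaitReply(requestQueueUrl, body) {
  // A dedicated reply queue per requester: nothing else ever reads from it.
  const { QueueUrl: replyQueueUrl } = await sqs
    .createQueue({ QueueName: `reply-${Date.now()}` })
    .promise();

  await sqs
    .sendMessage({
      QueueUrl: requestQueueUrl,
      MessageBody: body,
      MessageGroupId: 'testGroup',             // required on FIFO queues
      MessageDeduplicationId: `${Date.now()}`, // or enable content-based deduplication
      MessageAttributes: {
        ReplyQueueUrl: { DataType: 'String', StringValue: replyQueueUrl },
      },
    })
    .promise();

  // The responder is expected to send its result to ReplyQueueUrl,
  // so whatever arrives here is the answer to this request.
  const { Messages } = await sqs
    .receiveMessage({ QueueUrl: replyQueueUrl, MaxNumberOfMessages: 1, WaitTimeSeconds: 20 })
    .promise();

  await sqs.deleteQueue({ QueueUrl: replyQueueUrl }).promise();
  return Messages && Messages[0];
}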
We're trying to develop a self-invoking Lambda to process S3 files in chunks. The Lambda's execution role has the policies needed for the invocation attached.
Here's the code for the self-invoking lambda:
export const processFileHandler: Handler = async (
event: S3CreateEvent,
context: Context,
callback: Callback,
) => {
let bucket = loGet(event, 'Records[0].s3.bucket.name');
let key = loGet(event, 'Records[0].s3.object.key');
let totalFileSize = loGet(event, 'Records[0].s3.object.size');
const lastPosition = loGet(event, 'position', 0);
const nextRange = getNextSizeRange(lastPosition, totalFileSize);
context.callbackWaitsForEmptyEventLoop = false;
let data = await loadDataFromS3ByRange(bucket, key, nextRange);
await database.connect();
log.debug(`Successfully connected to the database`);
const docs = await getParsedDocs(data, lastPosition);
log.debug(`upserting ${docs.length} records to database`);
if (docs.length) {
try {
// upserting logic
log.debug(`total documents added: ${docs.length}`);
} catch (err) {
await recurse(nextRange.end, event, context);
log.debug(`error inserting docs: ${JSON.stringify(err)}`);
}
}
if (nextRange.end < totalFileSize) {
log.debug(`Last ${context.getRemainingTimeInMillis()} milliseconds left`);
if (context.getRemainingTimeInMillis() < 10 * 10 * 10 * 6) {
log.debug(`Less than 6000 milliseconds left`);
log.debug(`Invoking next iteration`);
await recurse(nextRange.end, event, context);
callback(null, {
message: `Lambda timed out processing file, please continue from LAST_POSITION: ${nextRange.start}`,
});
}
} else {
callback(null, { message: `Successfully completed the chunk processing task` });
}
};
Here recurse is an invocation call to the same Lambda. Everything else works as expected; it just times out whenever the call stack reaches this invocation request:
// `lambda` is assumed to be an aws-sdk Lambda client, e.g. const lambda = new AWS.Lambda();
const recurse = async (position: number, event: S3CreateEvent, context: Context) => {
let newEvent = Object.assign(event, { position });
let request = {
FunctionName: context.invokedFunctionArn,
InvocationType: 'Event',
Payload: JSON.stringify(newEvent),
};
let resp = await lambda.invoke(request).promise();
console.log('Invocation complete', resp);
return resp;
};
This is the stack trace logged to CloudWatch:
{
"errorMessage": "connect ETIMEDOUT 63.32.72.196:443",
"errorType": "NetworkingError",
"stackTrace": [
"Object._errnoException (util.js:1022:11)",
"_exceptionWithHostPort (util.js:1044:20)",
"TCPConnectWrap.afterConnect [as oncomplete] (net.js:1198:14)"
]
}
It is not a good idea to create a self-invoking Lambda function. In case of an error (which could also be a bad handler call on the AWS side), a Lambda function might re-run several times, and that is very hard to monitor and debug.
I would suggest using Step Functions instead. I believe the tutorial Iterating a Loop Using Lambda can help.
Off the top of my head, if you prefer not to deal with Step Functions, you could create a Lambda trigger for an SQS queue. Then you send a message to the queue whenever you want to run the Lambda function another time.
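As a rough illustration of that SQS-trigger approach (the queue, the chunk message shape, and the processChunk/sendNextChunkMessage helpers are hypothetical; getNextSizeRange is the question's own helper): the handler processes one chunk and, if more of the file remains, enqueues a message describing the next chunk, and SQS then invokes the same function again.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

// Hypothetical queue that this same Lambda consumes via an SQS trigger.
const CHUNK_QUEUE_URL = process.env.CHUNK_QUEUE_URL;

const sendNextChunkMessage = (chunk) =>
  sqs.sendMessage({ QueueUrl: CHUNK_QUEUE_URL, MessageBody: JSON.stringify(chunk) }).promise();

exports.handler = async (event) => {
  // With an SQS trigger, each record's body carries a chunk descriptor.
  for (const record of event.Records) {
    const { bucket, key, position, totalFileSize } = JSON.parse(record.body);
    const nextRange = getNextSizeRange(position, totalFileSize);
    await processChunk(bucket, key, nextRange); // stands in for the existing per-chunk logic
    if (nextRange.end < totalFileSize) {
      // Hand the continuation to SQS instead of invoking the Lambda directly.
      await sendNextChunkMessage({ bucket, key, position: nextRange.end, totalFileSize });
    }
  }
};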
I am trying to publish multiple messages at a time (around 50), but Pub/Sub gives a Deadline Exceeded error at /user_code/node_modules/@google-cloud/pubsub/node_modules/grpc/src/client.js:55.
const pubsub = PubSub();
const topic = pubsub.topic('send_wishes');
const publisher = topic.publisher();
// data is a dictionary object
Object.keys(data).forEach(function (key) {
  const userObj = data[key];
  const dataBuffer = Buffer.from(JSON.stringify(userObj));
  publisher.publish(dataBuffer)
    .then((results) => {
      const messageId = results[0];
      console.log(`Message ${messageId} published.`);
      return;
    })
    .catch((err) => console.error('publish failed', err));
});
For a single message it works fine. For batching, I tried the batch configuration of the publisher, but it is also not working:
const publisher = topic.publisher({
batching: {
maxMessages: 15,
maxMilliseconds: 2000
}
});
When creating the subscription, change its acknowledgement deadline from the default 10 seconds to 100 seconds.
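For reference, a minimal sketch of creating the subscription with a longer acknowledgement deadline from the Node.js client, assuming the client forwards ackDeadlineSeconds to the create request as current versions do (the subscription name is illustrative):
const pubsub = PubSub();
const topic = pubsub.topic('send_wishes');

// Ask for a 100-second ack deadline instead of the default 10 seconds.
topic.createSubscription('send_wishes_worker', { ackDeadlineSeconds: 100 })
  .then(([subscription]) => console.log(`Subscription ${subscription.name} created.`))
  .catch((err) => console.error('failed to create subscription', err));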