GCP PubSub retry backoff timing - google-cloud-platform

I have a dead-letter policy configured for my Google Pub/Sub subscription as:
...
dead_letter_policy {
dead_letter_topic = foobar
max_delivery_attempts = x
}
{
"minimumBackoff": y,
"maximumBackoff": z
}
...
Plugging in various values, i am not seeing the retries happen at times i would expect. E.g.
max_delivery_attempts: 5
minimumBackoff: 10 Seconds
maximumBackoff: 300 Seconds
Seconds between retries:
15
17
20
29
max_delivery_attempts: 30
minimumBackoff: 5 Seconds
maximumBackoff: 600 Seconds
Seconds between retries:
12
9
9
14
15
18
24
24
45
44
58
81
82
120
..., and so on.
From this testing, it seems u need a high max attempts value to get actual exponential back-off? For my first data set, i would have expected the time between my last 2 attempts would have been closer to 300. From my second data set, it seems this would only be the case if the max attempts is set to the max value of 100. Is this assumption correct?
(also, this is a pull subscription)
Thanks

Related answer: How does the exponential backoff configured in Google Pub/Sub's RetryPolicy work?
The exponential backoff based on minimum_backoff and maximum_backoff roughly follows the equation mentioned in the question above (with randomization factor). The relevant factor to your question are
Maximum backoff is not part of the calculation when deriving the backoff interval. Maximum backoff setting is used to ensure we do not back off more than configured, even if the backoff interval computation results in such an answer. The rate of growth in interval duration still increases with retry, as visible from your test.
The multiplication factor, responsible for the growth in backoff interval, is a system internal detail and clients should not be dependent on it.
If you want the maximum backoff to happen before the dead letter event occurs, I suggest starting with a higher minimum backoff configuration.

Related

concurrency in AWS Lambda function

I am just looking for confirmation that I understand the lambda settings correctly.
The minimum concurrency setting is 10, My lambda function takes 3000 ms.
So does that mean the maximum number of times my function can be called a minute is...
60 seconds / 3 seconds = 20 calls
20 x 10 concurrent = 200 function calls per minute.
thanks

C++ call to API function ::GetTickCount() jumps ~18 days

On a few Windows computers I have seen that two, on each other following, calls to ::GetTickCount() returns a difference of 1610619236 ms (around 18 days). This is not due to wrap around og int/unsigned int mismatch. I use Visual C++ 2015/2017.
Has anybody else seen this behaviour? Does anybody have any idea about what could cause behaviour like this?
Best regards
John
Code sample that shows the bug:
class CLTemp
{
DWORD nLastCheck;
CLTemp()
{
nLastCheck=::GetTickCount();
}
//Service is called every 200ms by a timer
void Service()
{
if( ::GetTickCount() - nLastCheck > 20000 )//check every 20 sec
{
//On some Windows machines, after an uptime of 776 days, the
//::GetTickCount() - nLastCheck gives a value of 1610619236
//(corresponding to around 18 days)
nLastCheck = ::GetTickCount();
}
}
};
Update - problem description, a way of recreating and solution:
The Windows API function GetTickCount() unexpectedly jumps 18 days forward in time when passing 776 days after Windows Restart.
We have experienced several times that some of our long running Windows pc applications coded in Microsoft Visual C++ suddenly reported a time-out error. In many of our applications we call GetTickCount() to perform some tasks with certain intervals or to watch for a time-out condition. The example code could go as this:
DWORD dwTimeNow, dwPrevTime = ::GetTickCount();
bool bExit = false;
While (!bExit)
{
dwTimeNow = ::GetTickCount();
if (dwTimeNow – dwPrevTime >= 5000)
{
dwPrevTime = dwTimeNow;
// Perform my task
}
else
{
::Sleep(10);
}
}
GetTickCount() returns a DWORD, which is an unsigned 32-bit int. GetTickCount() wraps around from its maximum value of 0xFFFFFFFF to zero after app. 49 days. The wrap around is easily handled by using unsigned arithmetic and always subtracting the previous value from the new value to calculate the distance. Do never compare two values from GetTickCount() against each other.
So, the wrap around at its maximum value each 49 days it expected and handled. But we have experienced an unexpected wrap around to zero of GetTickCount() after 776 days after latest Windows Restart. And in this case GetTickCount() wraps from 0x9FFFFFFF to zero, which is 1610612736 milliseconds too early corresponding to around 18.6 days. When GetTickCount() is used to check for a time-out condition and it suddenly reports that 18 days have elapsed since last check, then the software reports a false time-out condition. Note that it is 776 days after a Windows Restart. A Windows Restart resets the GetTickCount() value to zero. A pc reboot does not, instead the time elapsed while switched off is added to the initial GetTickCount() value.
We have made a test program that provides evidence of this issue. The test program reads the values of GetTickCount(), GetTickCount64(), InterruptTime(), and UnbiasedInterruptTime() each 5000 milliseconds scheduled by a Windows Timer. Each time the sample program calculates the distance in time for each of the four time-functions. If the distance in time is 10000 milliseconds or more, it is marked as a time jump event and logged. Each time it also keeps track of the minimum distance and the maximum distance in time for each time-function.
Before starting the test program, a Windows Restart is carried out. Ensure no automatic time synchronization is enabled. Then make a Windows shut down. Start the pc again and make it enter its Bios setup when it boots. In the Bios, advance the real time clock 776 days. Let the pc boot up and start the test program. Then after 17 hours the unexpected wraparound of GetTickCount() occurs (776 days, 17 hours, and 21 minutes). It is only GetTickCount() that shows this behavior. The other time-functions do not.
The following excerpt from the logfile of the test program shows the start values reported by the four time-functions. In this example the time has only been advanced to 775 days after Windows Restart. The format of the log entry is the time-function value converted into: days hh:mm:ss.msec. TickCount32 is the plain GetTickCount(). Because it is a 32-bit value it has wrapped around and shows a different value. At GetTickCount64() we can see the 775 days.
2024-05-14 09:13:27.262 Start times
TickCount32 : 029 08:30:11.591
TickCount64 : 775 00:12:01.031
InterruptTime : 775 00:12:01.036
UnbiasedInterruptTime: 000 00:05:48.411
The next excerpt from the logfile shows the unexpected wrap around of GetTickCount() (TickCount32). The format is: Distance between the previous value and the new value (should always be around 5000 msec). Then follows the new value converted into days and time, and finally follows the previous value converted into days and time. We can see that GetTickCount() jumps 1610617752 milliseconds (app. 18.6 days) while the other three time-functions only advances app. 5000 msec as expected. At TickCount64 one can see that it occurs at 776 days, 17 hours, and 21 minutes.
2024-05-16 02:22:30.394 Time jump *****
TickCount32 : 1610617752 - 000 00:00:00.156 - 031 01:39:09.700
TickCount64 : 5016 - 776 17:21:04.156 - 776 17:20:59.140
InterruptTime : 5015 - 776 17:21:04.165 - 776 17:20:59.150
UnbiasedInterruptTime: 5015 - 001 17:14:51.540 - 001 17:14:46.525
If you increase the time that the real time clock is advanced to two times 776 days and 17 hours – for example 1551 days – the phenomenon shows up once more. It has a cyclic nature.
2026-06-30 06:34:26.663 Start times
TickCount32 : 029 12:41:57.888
TickCount64 : 1551 21:44:51.328
InterruptTime : 1551 21:44:51.334
UnbiasedInterruptTime: 004 21:24:24.593
2026-07-01 19:31:47.641 Time jump *****
TickCount32 : 1610617736 - 000 00:00:04.296 - 031 01:39:13.856
TickCount64 : 5000 - 1553 10:42:12.296 - 1553 10:42:07.296
InterruptTime : 5007 - 1553 10:42:12.310 - 1553 10:42:07.303
UnbiasedInterruptTime: 5007 - 006 10:21:45.569 - 006 10:21:40.562
The only viable solution to this issue seems to be using GetTickCount64() and totally abandon usage of GetTickCount().

How to setup Concurrency Thread Group

I have following test plan to test concurrent user load test of a website -
Configuration set as -
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 6
So it's creating confusion, what I am expecting that it will send only 10 requests at a time in 1 second but the result is it sends first 10 request at a time in 1 second and continue sending requests till 60 seconds.
Why it is so?
Keep Hold Target Rate Time to 1 sec to match your expectations.
The graph should reflect the settings you made.
Note: In the graph you shared, it is clearly visible that you kept Hold Target Rate Time to 60 sec (reflected in the graph also) which resulted in 60 seconds execution after ramp-up time.
Reference:
Refer Concurrency ThreadGroup section in the link
as per requirements for simulating 10 requests at a time in 1 second
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 1
Keep Hold Target rate time till you want to run to test.
e.g 1 sec for running test plan for 1 sec, 1 min to run test plan for 1 min.

How to compare event count value with previous time interval event

I am looking for whether I can compare the total number of event count for the current one hr interval with the total number of event count with the previous one hour interval and if the current hour count is less than previous hour count then one email should get triggered from Riemann.
I am not sure whether we can store the value and compare it with the current event value because I learned events will get expired due to TTL option in Riemann.
Please correct me if I am wrong and suggest me a reference code to achieve it in Riemann.
Thanks in advance
It sounds like you want the rate of change of the count over an hour and then to decide if that rate is negative? One way to do this is just as you describe:
(fold-interval-metric 3600 folds/count
(fixed-event-window 2
(smap folds/difference
(where (neg? (:metric event))
email))))
and this makes sense. You may find that if you use the built in derivative over time function ddt that and graph it you can spot these problems over much shorter timescales. If your success rate falls to zero on minute three of an hour 57 minutes is a long time for the computer to wait before it calls a human for help. If the rate of change on a 15 minute period approches negative infinity it's very likely that your service just stopped.
I'm fond of wrapping ddt in the exponential weighted moving average ewma so spikes don't set off the alarms and have had an extremely low false positive rate with this pattern:
(ewma 30 (ddt ...your stuff here...))
I often want to compare the rate of the requests to a service with the responses with this pattern which uses ewma ddt and project:
(pipe ↲ (splitp = service
"service:input" (ewma 30 ↲)
"service:output" (ewma 30 ↲)
bit-bucket) ;; throw out other services here
(project [(service "service:input")
(service "service:output")]
(smap folds/quotient-sloppy
(with :service "service-ratio-rate-of-change"
(ddt ...your streams here...)))))
If requests are infrequent you will need to play with the interval in all these examples to ensure that the alarms don't go off between events. If your events are infrequent you may also need to set the :ttl on the events high enough that they don't expire while you are agrigating them.
ps: the ↲ can be any symbol(s) you want, I just chose that unicode character.
pss: a false posative rate of one alarm per quarter should be reasonable if you consider these things carefully.

SoapUI load test, calculate cnt in variance strategy

I work with SoapUI project and I have one question. In following example I've got 505 requests in 5 seconds with thread count =5. I would like to understand how count has been calculated in this example.
For example, if I want 1000 request in 1 minute what setting should I set in variance strategy?
Regards, Evgeniy
variance strategy as the name implies, it varies the number of threads overtime.Within the specified interval the threads will increase and decrease as per the variance value, thus simulating a realistic real time load on target web-service.
How variance is calculated : its not calculated using the mathematical variance formula. its just a multiplication. (if threads = 10 and variance = 0.5 then 10 * 0.5 = 5. The threads will be incremented and decremented by 5)
For example:
Threads = 20
variance = 0.8
Strategy = variance
interval = 60
limit = 60 seconds
the above will vary the thread by 16 (because 20 * 0.8 = 16), that is the thread count will increase to 36 and decrease to 4 and end with the original 20 within the 60 seconds.
if your requirement is to start with 500 threads and hit 1000 set your variance to 2 and so on.
refrence link:
chek the third bullet - simulating different type of load - soapUI site
Book for reference:
Web Service Testing with SoapUi by Charitha kankanamge