Multiple Timers in C++ / MySQL

I've got a service system that gets requests from another system. A request contains information that is stored on the service system's MySQL database. Once a request is received, the server should start a timer that will send a FAIL message to the sender if the time has elapsed.
The problem is, it is a dynamic system that can get multiple requests from the same, or various sources. If a request is received from a source with a timeout limit of 5 minutes, and another request comes from the same source after only 2 minutes, it should be able to handle both. Thus, a timer needs to be enabled for every incoming message. The service is a web-service that is programmed in C++ with the information being stored in a MySQL database.
Any ideas how I could do this?

A way I've seen this often done: Use a SINGLE timer, and keep a priority queue (sorted by target time) of every timeout. In this way, you always know the amount of time you need to wait until the next timeout, and you don't have the overhead associated with managing hundreds of timers simultaneously.
Say at time 0 you get a request with a timeout of 100.
Queue: [100]
You set your timer to fire in 100 seconds.
Then at time 10 you get a new request with a timeout of 50.
Queue: [60, 100]
You cancel your timer and set it to fire in 50 seconds.
When it fires, it handles the timeout, removes 60 from the queue, sees that the next time is 100, and sets the timer to fire in 40 seconds. Say you get another request with a timeout of 100, at time 80.
Queue: [100, 180]
In this case, since the head of the queue (100) doesn't change, you don't need to reset the timer. Hopefully this explanation makes the algorithm pretty clear.
Of course, each entry in the queue will need some link to the request associated with the timeout, but I imagine that should be simple.
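To make the bookkeeping concrete, here is a minimal sketch of the idea in C++ (not tied to any particular web-service framework). It uses a std::priority_queue ordered by absolute deadline and a condition variable standing in for the single timer; the TimeoutQueue name and the onExpire callback are illustrative, and in your case the callback would send the FAIL message for the associated request.

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Timeout {
    Clock::time_point deadline;       // absolute time at which to fire
    std::function<void()> onExpire;   // e.g. send the FAIL message for this request
};

// Order the heap so the earliest deadline is at the top.
struct Later {
    bool operator()(const Timeout& a, const Timeout& b) const {
        return a.deadline > b.deadline;
    }
};

class TimeoutQueue {
public:
    // Called by the request handlers: schedule a timeout 'delay' from now.
    void add(Clock::duration delay, std::function<void()> onExpire) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push({Clock::now() + delay, std::move(onExpire)});
        cv_.notify_one();   // the head may have changed; re-arm the wait
    }

    // Run this on one dedicated thread: it plays the role of the single timer.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            if (queue_.empty()) {
                cv_.wait(lock);   // nothing scheduled; sleep until add() wakes us
            } else if (cv_.wait_until(lock, queue_.top().deadline) == std::cv_status::timeout) {
                Timeout expired = queue_.top();
                queue_.pop();
                lock.unlock();
                expired.onExpire();   // fire outside the lock
                lock.lock();
            }
            // Otherwise we were woken because a new (possibly earlier) deadline
            // arrived, so the loop simply re-waits on the new head of the queue.
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::priority_queue<Timeout, std::vector<Timeout>, Later> queue_;
};

The request-handling code calls add() with the per-request timeout; the timer thread always sleeps exactly until the earliest outstanding deadline, which is the behavior walked through above.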
Note however that this all may be unnecessary, depending on the mechanism you use for your timers. For example, if you're on Windows, you can use CreateTimerQueue, which I imagine uses this same (or very similar) logic internally.

Related

Scheduler Design? Multiple Events on the same timeline starting at different times

I have multiple objects (Object1, Object2 and Object3) which MAY want to utilize a callback. If an object decides it wants to be registered for a periodic callback, they all will use a 30 second reset rate. The object will choose when it registers for a callback (which it would then want at that fixed interval of 30 seconds going forward).
If I wanted to give each object its own internal timer (such as a timer on a separate thread) this would be a simple problem. However, each timer would need to be on a separate thread, which would grow too much as my object count grows.
So for example:
at T=10 seconds into runtime, Object 1 registers for a callback. Since the callback occurs every 30 seconds, its next fire event will
be at T=40, then T=70, T=100 etc.
say 5 seconds later (T=15), Object 2 registers for a callback. Meaning its next call is at T=45, T=75, T=105 etc.
Lastly 1 second after Object 2, Object 3 registers for a callback. Its callback should be invoked at T=46 etc.
A dirty solution I have for this is for everything to calculate its delta from the first registered object.
So Object 0 is 0, Object 1 is 10 and Object 3 is 11. Then, in a constantly running loop, once the 30 seconds have elapsed, I know
that Object 0's callback can run, and 10 seconds after that point I can then call Object 1's callback, etc.
I don't like that this essentially busy-waits, as a while loop must constantly be running. I guess system sleep calls with semaphores may not be that different.
Another thought I had was finding the lowest common multiple of the fire events. For example, if I knew that every 3 seconds I may have to fire an event, I would keep track of that.
I think essentially what I am trying to make is some sort of simple scheduler? I'm sure I am hardly the first person to do this.
I am trying to come up with a performant solution. A while loop or a ton of timers on their own threads would make this easy, but that is not a good solution.
Any ideas? Is there a name for this design?
Normally you would use a priority queue, a heap or similar to manage your timed callbacks using a single timer. You check what callback needs to be called next and that is the time you set for the timer to wake you up.
But if all callbacks use a constant 30s repeat then you can just use a queue. New callbacks are added to the end as a pair of callback and (absolute) timestamp, and the next callback to call will always be at the front. Every time you call a callback, you add it back to the queue with its timestamp increased by 30s.
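As a rough sketch of that fixed-interval case (the layout, names and std::queue choice are mine, not a prescribed design), assuming a single scheduler thread and callbacks registered before the loop starts:

#include <chrono>
#include <functional>
#include <queue>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;
constexpr auto kInterval = std::chrono::seconds(30);   // the fixed 30 second repeat rate

int main() {
    // Because every callback repeats at the same interval, a plain FIFO queue is
    // already sorted by next fire time: new registrations go to the back, and the
    // next callback to run is always at the front.
    std::queue<std::pair<Clock::time_point, std::function<void()>>> q;

    // Example registrations (placeholders for Object1 and Object2).
    q.push({Clock::now() + kInterval, [] { /* Object1 callback */ }});
    q.push({Clock::now() + kInterval, [] { /* Object2 callback */ }});

    while (!q.empty()) {
        auto [when, callback] = q.front();
        q.pop();
        std::this_thread::sleep_until(when);   // one timed wait, no busy loop
        callback();
        q.push({when + kInterval, callback});  // re-schedule exactly 30 seconds later
    }
}

If objects can also register while the loop is already running, you would protect the queue with a mutex and wake the scheduler thread with a condition variable, exactly as in the single-timer approach described for the question above.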

Celery on SQS - Handling Duplicates [duplicate]

I know that it is possible to consume an SQS queue using multiple threads. I would like to guarantee that each message will be consumed once. I know that it is possible to change the visibility timeout of a message, e.g., to make it equal to my processing time. If my process spends more time than the visibility timeout (e.g. due to a slow connection), another thread can consume the same message.
What is the best approach to guarantee that a message will be processed once?
What is the best approach to guarantee that a message will be processed once?
You're asking for a guarantee - you won't get one. You can reduce probability of a message being processed more than once to a very small amount, but you won't get a guarantee.
I'll explain why, along with strategies for reducing duplication.
Where does duplication come from
When you put a message in SQS, SQS might actually receive that message more than once
For example: a minor network hiccup while sending the message caused a transient error that was automatically retried - from the message sender's perspective, it failed once, and successfully sent once, but SQS received both messages.
SQS can internally generate duplicates
Similar to the first example - there are a lot of computers handling messages under the covers, and SQS needs to make sure nothing gets lost - messages are stored on multiple servers, and this can result in duplication.
For the most part, by taking advantage of the SQS message visibility timeout, the chances of duplication from these sources are already pretty small - like a fraction of a percent small.
If processing duplicates really isn't that bad (strive to make your message consumption idempotent!), I'd consider this good enough - reducing chances of duplication further is complicated and potentially expensive...
What can your application do to reduce duplication further?
Ok, here we go down the rabbit hole... at a high level, you will want to assign unique ids to your messages, and check against an atomic cache of ids that are in progress or completed before starting processing:
Make sure your messages have unique identifiers provided at insertion time
Without this, you'll have no way of telling duplicates apart.
Handle duplication at the 'end of the line' for messages.
If your message receiver needs to send messages off-box for further processing, then it can be another source of duplication (for similar reasons to above)
You'll need somewhere to atomically store and check these unique ids (and flush them after some timeout). There are two important states: "InProgress" and "Completed"
InProgress entries should have a timeout based on how fast you need to recover in case of processing failure.
Completed entries should have a timeout based on how long you want your deduplication window
The simplest is probably a Guava cache, but would only be good for a single processing app. If you have a lot of messages or distributed consumption, consider a database for this job (with a background process to sweep for expired entries)
Before processing the message, attempt to store the messageId in "InProgress". If it's already there, stop - you just handled a duplicate.
Check if the message is "Completed" (and stop if it's there)
Your thread now has an exclusive lock on that messageId - Process your message
Mark the messageId as "Completed" - As long as this messageId stays here, you won't process any duplicates for that messageId.
You likely can't afford infinite storage though.
Remove the messageId from "InProgress" (or just let it expire from here)
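As a single-process illustration of those steps in C++ (roughly what the Guava-cache option does; the DedupCache class, its method names and the lazy expiry are assumptions of this sketch, and a distributed setup would use a database instead):

#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// An atomic "InProgress"/"Completed" store keyed by messageId, with per-state expiry.
class DedupCache {
public:
    enum class State { InProgress, Completed };

    DedupCache(Clock::duration inProgressTtl, Clock::duration completedTtl)
        : inProgressTtl_(inProgressTtl), completedTtl_(completedTtl) {}

    // Before processing: try to claim the messageId. Returns false if it is already
    // InProgress or Completed (and not yet expired), i.e. the caller saw a duplicate.
    bool tryClaim(const std::string& messageId) {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto now = Clock::now();
        auto it = entries_.find(messageId);
        if (it != entries_.end() && now < it->second.expiresAt) {
            return false;   // duplicate: another thread owns or already finished it
        }
        entries_[messageId] = {State::InProgress, now + inProgressTtl_};
        return true;        // exclusive "lock" on this messageId; go process the message
    }

    // After processing: mark the messageId Completed for the deduplication window.
    void markCompleted(const std::string& messageId) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[messageId] = {State::Completed, Clock::now() + completedTtl_};
    }

private:
    struct Entry { State state; Clock::time_point expiresAt; };

    std::mutex mutex_;
    std::unordered_map<std::string, Entry> entries_;
    Clock::duration inProgressTtl_;
    Clock::duration completedTtl_;
};

Expired entries are simply overwritten on the next claim here; a real implementation would also sweep them periodically, as noted above.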
Some notes
Keep in mind that the chance of a duplicate without all of that is already pretty low. Depending on how much time and money deduplication of messages is worth to you, feel free to skip or modify any of the steps.
For example, you could leave out "InProgress", but that opens up the small chance of two threads working on a duplicated message at the same time (the second one starting before the first has "Completed" it).
Your deduplication window is as long as you can keep messageIds in "Completed". Since you likely can't afford infinite storage, make this last at least as long as 2x your SQS message visibility timeout; there is a reduced chance of duplication after that (on top of the already very low chances, but still not guaranteed).
Even with all this, there is still a chance of duplication - all the precautions and SQS message visibility timeouts help reduce this chance to very small, but the chance is still there:
Your app can crash/hang/do a very long GC right after processing the message, but before the messageId is "Completed" (maybe you're using a database for this storage and the connection to it is down)
In this case, "InProgress" will eventually expire, and another thread could process this message (either after the SQS visibility timeout also expires or because SQS had a duplicate in it).
Store the message, or a reference to the message, in a database with a unique constraint on the Message ID, when you receive it. If the ID exists in the table, you've already received it, and the database will not allow you to insert it again -- because of the unique constraint.
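A hedged sketch of what that check could look like with the MySQL C API, assuming a hypothetical seen_messages table whose message_id column is the primary key (a real implementation should use a prepared statement instead of string concatenation):

#include <mysql/mysql.h>
#include <mysql/mysqld_error.h>   // ER_DUP_ENTRY
#include <stdexcept>
#include <string>

// Returns true if the INSERT succeeded (first time we see this message),
// false if the unique constraint rejected it (we already received it).
bool firstTimeSeen(MYSQL* conn, const std::string& messageId) {
    const std::string sql =
        "INSERT INTO seen_messages (message_id) VALUES ('" + messageId + "')";
    if (mysql_query(conn, sql.c_str()) == 0)
        return true;                              // inserted: not a duplicate
    if (mysql_errno(conn) == ER_DUP_ENTRY)        // 1062: unique constraint fired
        return false;                             // duplicate: skip processing
    throw std::runtime_error(mysql_error(conn));  // some other database error
}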
The AWS SQS API doesn't automatically "consume" the message when you read it. The developer needs to make the call to delete the message themselves.
SQS does have a feature called a "redrive policy" as part of the dead-letter queue settings: you just set the maximum receive count to 1. If the consuming process crashes, a subsequent read of the same message will put the message into the dead-letter queue.
The SQS visibility timeout can be set for up to 12 hours. Only if you have a special need beyond that would you need to implement a process that stores the message handle in a database so it can be inspected later.
You can use setVisibilityTimeout() for both messages and batches, in order to extend the visibility time until the thread has completed processing the message.
This can be done using a ScheduledExecutorService, scheduling a runnable event after half of the initial visibility time. The code snippet below creates and executes the VisibilityTimeExtender after half of the visibility time, with a period of half the visibility time. (That should be enough to guarantee the message gets processed, since the visibility keeps being extended by visibilityTime/2.)
// One background thread is enough to keep extending the visibility timeout
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

// Runs the extender after visibilityTime/2 seconds and then every visibilityTime/2 seconds
ScheduledFuture<?> futureEvent = scheduler.scheduleAtFixedRate(new VisibilityTimeExtender(..), visibilityTime/2, visibilityTime/2, TimeUnit.SECONDS);
VisibilityTimeExtender must implement Runnable, and is where you update the new visibility time.
When the thread is done processing the message, you can delete it from the queue, and call futureEvent.cancel(true) to stop the scheduled event.

epoll performance for smaller timeout values

I have a single-threaded server process that watches a few (around 100) sockets via epoll in a loop. My question is how to decide the optimum epoll_wait timeout value. Since this is a single-threaded process, everything is triggered off epoll_wait; if there is no activity on the sockets, the program remains idle. My guess is that if I give too small a timeout, which causes many epoll_wait calls, there is no harm, because even though my process is making many epoll_wait calls, it would be sitting idle otherwise. But there is another point: I run many other processes on this (8 core) box, something like 100 other processes which are clients of this process. I am wondering how the timeout value impacts CPU context switching, i.e. if I give too small a timeout, which results in many epoll_wait calls, will my server process be put into the wait state many more times than when I give a larger timeout value, which results in fewer epoll_wait calls?
Any thoughts/ideas?
Thanks
I believe there is no good reason to make your process wake up if it has nothing to do. Simply set the timeout to when you first need to do something. For example, if your server has a semantic of disconnecting a client after N seconds of inactivity, set the epoll timeout to the time after the first client would have to be disconnected assuming no activity. In other words, set it to:
min{expire_time(client); for each client} - current_time
Or, if that's negative, you can disconnect at least one client immediately. In general, this works not only for disconnecting clients; you can abstract the above into "software timers" within your application.
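A small sketch of how that formula could translate into the timeout argument of epoll_wait(); the Client struct and nextEpollTimeoutMs() are illustrative names, not part of any API:

#include <sys/epoll.h>
#include <algorithm>
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Client {
    int fd;
    Clock::time_point expireTime;   // when this client must be disconnected if idle
};

// min{expire_time(client); for each client} - current_time, expressed in the
// milliseconds epoll_wait() expects; -1 means "wait forever" (no timed event pending).
int nextEpollTimeoutMs(const std::vector<Client>& clients) {
    if (clients.empty())
        return -1;
    auto soonest = std::min_element(clients.begin(), clients.end(),
        [](const Client& a, const Client& b) { return a.expireTime < b.expireTime; });
    auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
        soonest->expireTime - Clock::now()).count();
    return remaining > 0 ? static_cast<int>(remaining) : 0;   // 0: expire someone right now
}

// In the event loop (epfd and events set up elsewhere):
//   int n = epoll_wait(epfd, events, maxEvents, nextEpollTimeoutMs(clients));
//   if (n == 0) { /* timeout: disconnect every client whose expireTime has passed */ }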
I'm failing to see this compromise you've mentioned. If you use a timeout any smaller than you have to, you'll wake up before you have to, then, presumably, go back to sleep because you have nothing to do. What good does that do? On the other hand, you must not use a timeout any larger than what you have to - because that would make your program not respect the disconnect timeout policy.
If your program is not waiting for any time-based event (like disconnecting clients), just give epoll_wait() timeout value -1, making it wait forever.
UPDATE If you're worried about this process being given less CPU when other processes are active, just give it a lower nice value (scheduler priority). On the other hand, if you're worried that your server process will be swapped out to disk in favour of other processes when it's idle, it is possible to avoid swapping it out (or you can just lower /proc/sys/vm/swappiness, affecting all processes).

Do any boost::asio async calls automatically time out?

I have a client and server using boost::asio asynchronously. I want to add some timeouts to close the connection and potentially retry if something goes wrong.
My initial thought was that any time I call an async_ function I should also start a deadline_timer to expire after I expect the async operation to complete. Now I'm wondering if that is strictly necessary in every case.
For example:
async_resolve presumably uses the system's resolver which has timeouts built into it (e.g. RES_TIMEOUT in resolv.h possibly overridden by configuration in /etc/resolv.conf). By adding my own timer, I may conflict with how the user wants his resolver to work.
For async_connect, the connect(2) syscall has some sort of timeout built into it
etc.
So which (if any) async_ calls are guaranteed to call their handlers within a "reasonable" time frame? And if an operation [can|does] timeout would the handler be passed the basic_errors::timed_out error or something else?
So I did some testing. Based on my results, it's clear that they depend on the underlying OS implementation. For reference, I tested this with a stock Fedora kernel: 2.6.35.10-74.fc14.x86_64.
The bottom line is that async_resolve() looks to be the only case where you might be able to get away without setting a deadline_timer. It's practically required in every other case for reasonable behavior.
async_resolve()
A call to async_resolve() resulted in 4 queries 5 seconds apart. The handler was called 20 seconds after the request with the error boost::asio::error::host_not_found.
My resolver defaults to a timeout of 5 seconds with 2 attempts (resolv.h), so it appears to send twice the number of queries configured. The behavior is modifiable by setting options timeout and options attempts in /etc/resolv.conf. In every case the number of queries sent was double whatever attempts was set to and the handler was called with the host_not_found error afterwards.
For the test, the single configured nameserver was black-hole routed.
async_connect()
Calling async_connect() with a black-hole-routed destination resulted in the handler being called with the error boost::asio::error::timed_out after ~189 seconds.
The stack sent the initial SYN and 5 retries. The first retry was sent after 3 seconds, with the retry timeout doubling each time (3+6+12+24+48+96=189). The number of retries can be changed:
% sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 5
The default of 5 is chosen to comply with RFC 1122 (4.2.3.5):
[The retransmission timers] for a SYN
segment MUST be set large enough to
provide retransmission of the segment
for at least 3 minutes. The
application can close the connection
(i.e., give up on the open attempt)
sooner, of course.
3 minutes = 180 seconds, though the RFC doesn't appear to specify an upper bound. There's nothing stopping an implementation from retrying forever.
async_write()
As long as the socket's send buffer wasn't full, this handler was always called right away.
My test established a TCP connection and set a timer to call async_write() a minute later. During the minute where the connection was established but prior to the async_write() call, I tried all sorts of mayhem:
Setting a downstream router to black-hole subsequent traffic to the destination.
Clearing the session in a downstream firewall so it would reply with spoofed RSTs from the destination.
Unplugging my Ethernet
Running /etc/init.d/network stop
No matter what I did, the next async_write() would immediately call its handler to report success.
In the case where the firewall spoofed the RST, the connection was closed immediately, but I had no way of knowing that until I attempted the next operation (which would immediately report boost::asio::error::connection_reset). In the other cases, the connection would remain open and not report errors to me until it eventually timed out 17-18 minutes later.
The worst case for async_write() is if the host is retransmitting and the send buffer is full. If the buffer is full, async_write() won't call its handler until the retransmissions time out. Linux defaults to 15 retransmissions:
% sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 15
The time between the retransmissions increases after each (and is based on many factors such as the estimated round-trip time of the specific connection) but is clamped at 2 minutes. So with the default 15 retransmissions and worst-case 2-minute timeout, the upper bound is 30 minutes for the async_write() handler to be called. When it is called, error is set to boost::asio::error::timed_out.
async_read()
This should never call its handler as long as the connection is established and no data is received. I haven't had time to test it.
Those two calls MAY have timeouts that get propagated up to your handlers, but you might be surprised at the length of time it takes before either of them times out. (I know I have let a connection just sit and try to connect on a single connect call for over 10 minutes with boost::asio before killing the process.) Also, the async_read and async_write calls do not have timeouts associated with them, so if you wish to have timeouts on your reads and writes, you will still need a deadline_timer.
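For reference, the usual pattern for adding your own timeout is to pair the async operation with a deadline_timer whose handler closes the socket; closing the socket makes the pending operation complete with operation_aborted. A minimal sketch, where the endpoint and the 5-second deadline are placeholder values:

#include <boost/asio.hpp>
#include <iostream>

int main() {
    namespace asio = boost::asio;
    asio::io_service io;
    asio::ip::tcp::socket socket(io);
    asio::deadline_timer timer(io);

    // Illustrative endpoint; in real code it would come from async_resolve().
    asio::ip::tcp::endpoint endpoint(asio::ip::address::from_string("192.0.2.1"), 80);

    // Arm the timer first: if it fires before the connect completes, close the
    // socket, which forces the pending async_connect handler to run.
    timer.expires_from_now(boost::posix_time::seconds(5));
    timer.async_wait([&](const boost::system::error_code& ec) {
        if (!ec) {                       // expired (not cancelled)
            std::cout << "connect timed out, closing socket\n";
            socket.close();
        }
    });

    socket.async_connect(endpoint, [&](const boost::system::error_code& ec) {
        timer.cancel();                  // we got an answer one way or another
        if (!ec)
            std::cout << "connected\n";
        else
            std::cout << "connect failed: " << ec.message() << "\n";
    });

    io.run();
}

Note that when the timer wins the race, the connect handler sees operation_aborted rather than timed_out, so you typically translate that yourself into your own timeout/retry handling.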

Sleep Function Error In C

I have a file of dumped data in which different timestamped data is available. I get the time from the timestamp and sleep my C thread for that time. But the problem is that the actual time difference is 10 seconds, while the data I receive at the receiving end is delayed by almost 14 or 15 seconds. I am using the Windows OS. Kindly guide me.
Sorry for my weak English.
The sleep function will sleep for at least as long as the time you specify, but there is no guarantee that it won't sleep for longer. If you need an accurate interval, you will need to use some other mechanism.
If I understand correctly:
you have a thread that sends data (through a network? what is the source of the data?)
you slow down the sending rhythm using sleep
the received data (at the other end of the network) can be delayed much more (15s instead of 10s)
If the above describes what you are doing, your design has several flaws:
sleep is very imprecise; it will wait at least n seconds, but it may be more (especially if your system is loaded by other running apps).
networks introduce a buffering delay; you have no guarantee that your data will be sent immediately on the wire (usually it is not).
the trip itself introduces some delay (latency); if your protocol waits for an ACK from the receiving end you should take that into account.
you should also consider the time necessary to read/build/retrieve the data to send and really send it over the wire. Depending on what you are doing it can be negligible or take several seconds...
If you give some more details it will be easier to diagnose the source of the problem: sleep, as you believe (it is indeed a really poor timer), or some other part of your system.
If your dump is large, I will bet that the additional time comes from reading the data and sending it over the wire. You should measure the time consumed by the sending process (read the time before sending and after sending finishes).
If this is indeed the source of the additional time, you just have to remove that time from the next time to wait.
Example: sending the previous block of data took 4s, and the next block is due 10s later, but as you already consumed 4s, you just wait for 6s.
sleep is still a quite imprecise timer, and obviously the above mechanism won't work if the sending time is larger than the delay between sendings, but you get the idea.
Correction: sleep is not as bad in a Windows environment as it is in Unixes. The accuracy of the Windows Sleep is a millisecond; the accuracy of the Unix sleep is a second. If you do not need high-precision timing (and if a network is involved, high-precision timing is out of reach anyway), sleep should be OK.
Any modern multitasking OS's scheduler will not guarantee exact timings to user apps.
You can try to assign 'realtime' priority to your app in some way, from the Windows Task Manager for instance, and see if it helps.
Another solution is to implement a 'controlled' sleep, i.e. sleep in a series of 500ms steps, checking the current timestamp between them. That way, if your app sleeps for 1s instead of 500ms at some step, you will notice it and not do an additional sleep(500ms).
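A minimal sketch of that 'controlled' sleep on Windows, assuming GetTickCount64() is available (the function name is just for illustration):

#include <windows.h>

// Sleep until 'wakeAt' (a GetTickCount64() value) in short 500 ms slices,
// re-checking the clock after each slice so one oversleep does not accumulate.
void controlledSleepUntil(ULONGLONG wakeAt) {
    for (;;) {
        ULONGLONG now = GetTickCount64();
        if (now >= wakeAt)
            return;                                  // target time reached (or passed)
        ULONGLONG remaining = wakeAt - now;
        Sleep(remaining < 500 ? static_cast<DWORD>(remaining) : 500);
    }
}

// Usage: wait 10 seconds between two timestamped records.
//   controlledSleepUntil(GetTickCount64() + 10 * 1000);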
Try out a Multimedia Timer. It is about as accurate as you can get on a Windows system. There is a good article on CodeProject about them.
The Sleep function can take longer than requested, but never less. Use WinAPI timer functions to get a function called back after an interval from now.
You could also use the Windows Task Scheduler, but that's going outside programmatic standalone options.