A bit of a puzzle to solve in CF here.
You have a timestamp passed as an argument to the server. For example 2012-5-10 14:55.
How would you see if 30 seconds has passed from the time given and return true or false?
Rules:
You cannot use the current server time to check if 30 seconds have passed, because the timestamp comes from anywhere in the world and the server's clock will always be wrong relative to it.
In this case you can always trust the argument timestamp passed to the server.
How would this be possible in ColdFusion 9 (if at all)?
Hmmm.... Your problem is that you don't know the latency. You can count 30 seconds from the time you receive the timestamp - but that could be seconds after the time stamp was created.
You can count 30 seconds easily enough with...
sleep(30000); //1000 milliseconds = 1 second. 30k = 30 seconds
So as soon as you get the var you could "wait" for thirty seconds and then do something. But from your question it seems like you need exactly 30 seconds from the time the timestamp was created on the client. You probably cannot be that exact because:
The two clocks are not in sync.
You cannot figure out the latency of the request.
Even if you could, since you don't control the client you will have trouble guaranteeing the results because HTTP is stateless and not real time.
If you can dictate an HTML5 browser you could use websockets for this - it's practically begging for it :) But that's the only real solution I can think of.
You don't specify whether the argument passed to the server is an API request, a form submit, an AJAX request, etc., so there will be differences in the implementation on the other end. However, the ColdFusion server end should be extremely similar either way and is basically bound by three issues: timezones, clock disparity, and latency.
To solve the timezone issue you can require the argument to be passed as UTC time. When comparing the times in ColdFusion you would do something like:
<cfif DateDiff("s", Variables.PassedTimestamp, DateConvert( "Local2UTC", Now() )) EQ 30>
<!--- Exactly 30 seconds difference between timestamps --->
</cfif>
But you don't know if the time sent to you is synchronized with the time on your server. To help with this you can provide a method for the other end to query your time and either adjust their time accordingly or echo back your time alongside their own.
That time syncing, and the original submission of the timestamp, will both suffer from latency. Any communication over a network suffers from latency, just varying levels of it. Location and Internet connection type can have more of an impact than protocol type. You could do something like pinging the other end to get the response time, then halving that and adding it to the timestamp, but that is still an approximation, and not all servers will respond to a ping.
Additionally, the code on your server and on the other end will also introduce latency. To reduce this you want the other end to calculate the timestamp as close to the end of its request as possible, and your code to capture the time as early as possible. In ColdFusion this can be done by setting the request time as near the top of your Application.cfm or Application.cfc as possible:
<cfset Request.Now = Now()>
And then change the checking code to:
DateDiff("s", Variables.PassedTimestamp, DateConvert( "Local2UTC", Request.Now ))
If, instead, you want to loop until 30 seconds have passed and then respond, note that your response won't appear to the other end exactly 30 seconds later, because latency will delay the response on its way back.
Getting exactly 30 seconds is impossible, but you can take steps that get you closer. If you just need to see whether approximately 30 seconds have passed, then that will be good enough.
So we have a very large database which has around 300,000 URLs. These URLs have to be pinged to get data from them (the URLs are radio stations which are playing songs; the data is the metadata).
Some of them are sometimes inactive and sometimes active.
At any given time, around 80,000 are active. Some respond slowly, some respond quickly. I have a server and I am thinking of doing this using C++.
My goal is to ping and parse (or crawl) them within 1 minute and keep repeating the process, because the information (the song playing on them) can change over time, mostly every 2-7 minutes. But I am not sure if it is possible.
What should my approach be?
I have thought of creating two programs. One would test whether a URL is active or not, run twice a day, and record how much time it generally takes to respond: does it usually respond slowly, or is it just responding more slowly right now?
The other would do the actual crawling, where the fastest URLs are crawled first, with some dedicated threads for URLs which respond faster.
I would love better ideas or better solutions for this. Can anyone tell me how to do the math to find out the number of dedicated threads I should allot to each group to get the results in the least amount of time?
You don't need CPU performance (that is not your bottleneck at the moment), but you do need to avoid stalling on the network layer... if the request timeout is 60 seconds and you have 16 threads that hit 16 very slow servers (which will eventually time out), you are stalled for 60 seconds and not processing anything else.
So I would start with, let's say, 500 threads (and something like a 15-30 s timeout, if you know the very slow radios can fit even within that), keep some statistics about their turnaround, and keep adding more worker threads dynamically for every request which didn't get a response within 2-3 seconds. 80,000/500 = 160, so each "normally quick" worker thread then has to ping around 160 URLs; if each takes 2 seconds, that's still 320 seconds ≈ 5 min! So 500 sounds like a minimum.
That said, having 500+ threads will somewhat burden the CPU and memory (not sure by how much; with a decent thread/memory model implementation 500 doesn't sound like much for a modern x86 CPU with GBs of RAM, and even 5000 still sounds reasonable), but I would worry a lot more about the network layer and about possible firewalls along the way. You need a server-grade network for that volume of requests (if I tried something like that from home, my own router would filter me out with its default settings, detecting it as some kind of DoS attack).
So get some statistics on how long a request takes on average, then take your target time (2-7 min) and divide the number of URLs by the requests each thread can make in that window: with an average ping of 5 s and a round time of 3 min, 300,000/(3*60/5) = 8,333.33 threads needed at least. Then you will have to profile your app to verify that with 8,000+ threads it will not choke on something else, but will really handle the task as expected.
(The other option is to fire asynchronous HTTP requests from a single thread, but that sort of creates its own thread for each task anyway, so I would rather manage the threads myself and use synchronous HTTP calls.)
And thinking about the dynamic-growth mechanics... you can keep counters of how many new requests were added in the last second and how many finished (either responded or failed). After a few seconds of running, these should start to form some kind of "throughput" statistic; if throughput is under the desired threshold, you can add more threads.
About active/inactive... keep the response time/last-seen/last-checked together with the URL, and add some further logic to check a URL only when it makes sense (e.g. not within the next 60 s if it just responded, or check an inactive one only 6 h after the last test). You also need to avoid checking the same URL in two different threads at the same time, so some central manager code should feed the threads with targets (maybe a thread-safe FIFO queue... actually you can use its size to estimate how well the worker threads are keeping up, so you can add more threads when you see the queue is not emptying fast enough; that avoids adding statistics code to the threads themselves).
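A minimal sketch of that manager/worker layout, assuming a plain std::thread pool and a placeholder fetch_metadata() standing in for the real synchronous HTTP call and parsing (the example URL is made up):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Placeholder for the real synchronous HTTP request + metadata parsing.
void fetch_metadata(const std::string& url) { /* ... */ }

std::queue<std::string> urls;   // fed by the central manager from the database
std::mutex m;
std::condition_variable cv;
bool done = false;

void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !urls.empty() || done; });
        if (urls.empty()) return;               // manager signalled shutdown
        std::string url = std::move(urls.front());
        urls.pop();
        lock.unlock();
        fetch_metadata(url);                    // slow part runs outside the lock
    }
}

int main() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 500; ++i) pool.emplace_back(worker);  // starting size, tune from stats

    {   // the manager would keep refilling this from the 300,000-URL table
        std::lock_guard<std::mutex> lock(m);
        urls.push("http://radio.example.com/stream1");
    }
    cv.notify_one();

    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
    for (auto& t : pool) t.join();
}

The queue size, checked periodically by the manager, doubles as the throughput signal mentioned above: if it keeps growing, spawn more workers.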
I've got a service system that gets requests from another system. A request contains information that is stored in the service system's MySQL database. Once a request is received, the server should start a timer that will send a FAIL message to the sender when the timeout elapses.
The problem is that it is a dynamic system that can get multiple requests from the same or various sources. If a request is received from a source with a timeout limit of 5 minutes, and another request comes from the same source only 2 minutes later, it should be able to handle both. Thus, a timer needs to be started for every incoming message. The service is a web service programmed in C++, with the information stored in a MySQL database.
Any ideas how I could do this?
A way I've seen this often done: Use a SINGLE timer, and keep a priority queue (sorted by target time) of every timeout. In this way, you always know the amount of time you need to wait until the next timeout, and you don't have the overhead associated with managing hundreds of timers simultaneously.
Say at time 0 you get a request with a timeout of 100.
Queue: [100]
You set your timer to fire in 100 seconds.
Then at time 10 you get a new request with a timeout of 50.
Queue: [60, 100]
You cancel your timer and set it to fire in 50 seconds.
When it fires, it handles the timeout, removes 60 from the queue, sees that the next time is 100, and sets the timer to fire in 40 seconds. Say you get another request with a timeout of 100, at time 80.
Queue: [100, 180]
In this case, since the head of the queue (100) doesn't change, you don't need to reset the timer. Hopefully this explanation makes the algorithm pretty clear.
Of course, each entry in the queue will need some link to the request associated with the timeout, but I imagine that should be simple.
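For illustration, a rough C++ sketch of that single-timer approach, assuming one dedicated thread that sleeps until the earliest deadline in a min-heap; the TimerQueue/Timeout names, the std::function callback, and the steady_clock choice are my own, not from the question:

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Timeout {
    Clock::time_point deadline;
    std::function<void()> on_expire;   // e.g. send the FAIL message for this request
    bool operator>(const Timeout& other) const { return deadline > other.deadline; }
};

class TimerQueue {
public:
    // Called whenever a new request arrives.
    void add(Clock::duration timeout, std::function<void()> cb) {
        std::lock_guard<std::mutex> lock(m_);
        heap_.push({Clock::now() + timeout, std::move(cb)});
        cv_.notify_one();                        // the head of the queue may have changed
    }

    // Run this from one dedicated thread.
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            if (heap_.empty()) { cv_.wait(lock); continue; }
            auto next = heap_.top().deadline;
            if (cv_.wait_until(lock, next) == std::cv_status::timeout) {
                Timeout t = heap_.top();         // deadline reached: fire its callback
                heap_.pop();
                lock.unlock();
                t.on_expire();
                lock.lock();
            }
            // Otherwise an earlier deadline was added; loop and re-check the head.
        }
    }

private:
    std::priority_queue<Timeout, std::vector<Timeout>, std::greater<Timeout>> heap_;
    std::mutex m_;
    std::condition_variable cv_;
};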
Note however that this all may be unnecessary, depending on the mechanism you use for your timers. For example, if you're on Windows, you can use CreateTimerQueue, which I imagine uses this same (or very similar) logic internally.
I have a very basic app that plugs data into a stored procedure which in turn returns a recordset. I've been experiencing what I thought were 'timeouts'. However, I'm no longer convinced that this is what is really happening. The reason is that the DBA and I watched SQL Server Spotlight to see when the stored procedure finished processing. As soon as the procedure finished processing and returned a recordset, the ColdFusion page returned a 'timeout' error. I'm finding this to be consistent whenever the procedure takes longer than a minute. To prove this, I created a stored procedure with nothing more than this:
BEGIN
WAITFOR DELAY '00:00:45';
SELECT TOP 1000 *
FROM AnyTableName
END
If I run it for 59 seconds I get a result back in ColdFusion. If I change it to one minute:
WAITFOR DELAY '00:01';
I get a cfstoredproc timeout error. I've tried running this in different instances of ColdFusion on the same server and against different databases/datasources. Now, what is strange is that I have other procedures that run longer than a minute and return a result. I've even tried this locally on my desktop with ColdFusion 10 and get the same result. At this point, I'm out of places to look, so I'm reaching out for other things to try. I've also increased the timeout in the datasource connections and that didn't help. I even tried the timeout attribute in ColdFusion 10 but no luck there either. What is consistent is that the timeout error is displayed when the query completes.
Also, I tried adding the WAITFOR in a cfquery and the same thing happened: it worked when set for 59 seconds but timed out when changed to a minute. I can change the SQL to SELECT TOP 1 and there is no difference in the result.
Per the comments, it looks like your request timeout is set to sixty seconds.
Use cfsetting to extend your timeout to whatever you need.
<cfsetting requesttimeout = "{numberOfSeconds}">
The default timeout for all pages is 60 s; you need to change this in the cfadmin if it is not enough, but most pages should not run this long.
Take some time to familiarise yourself with the cfadmin and all its settings to avoid such head scratching.
As stated, use the cfsetting tag to override it for specific pages.
I have a FastCGI app serving several hundred requests per second. Most of the requests finish in a millisecond or less, but some take more than half a second. After outputting some more timing information, I was able to pin the problem down to the call to FCGX_Finish_r.
As far as I can see, this call will flush all data written to apache (mod_fastcgi). But why is it blocking?
I tried narrowing it down to the response size, but although the longest response durations (i.e. 2 seconds) occur with the largest responses (i.e. 120 KB), those same large responses can also be served in as little as 16 ms.
-- EDIT --
I ran some more numbers, and the bad timings seem to occur once the responses are bigger than 16 KB. Beyond that point, though, there is no correlation between the size and the duration of the slow responses.
I have a file of dumped data in which data with different timestamps is available. I get the time from the timestamp and sleep my C thread for that time. But the problem is that the actual time difference is 10 seconds, while the data I receive at the receiving end is delayed by almost 14-15 seconds. I am using Windows. Kindly guide me.
Sorry for my weak English.
The sleep function will sleep for at least as long as the time you specify, but there is no guarantee that it won't sleep for longer. If you need an accurate interval, you will need to use some other mechanism.
If I understand correctly:
you have a thread that sends data (over a network? what is the source of the data?)
you slow down the sending rhythm using sleep
the received data (at the other end of the network) can be delayed much more (15 s instead of 10 s)
If the above describes what you are doing, your design has several flaws:
sleep is very imprecise; it will wait at least n seconds, but it may be more (especially if your system is loaded by other running apps).
networks introduce a buffering delay; you have no guarantee that your data will be sent immediately on the wire (usually it is not).
the trip itself introduces some delay (latency); if your protocol waits for an ACK from the receiving end you should take that into account.
you should also consider the time necessary to read/build/retrieve the data to send and to actually send it over the wire. Depending on what you are doing it can be negligible or take several seconds...
If you give some more details it will be easier to diagnose the source of the problem: sleep, as you believe (it is indeed a really poor timer), or some other part of your system.
If your dump is large, I would bet that the additional time comes from reading the data and sending it over the wire. You should measure the time consumed by the sending process (read the time before starting and after finishing sending).
If this is indeed the source of the additional time, you just have to subtract that time from the next time to wait.
Example: sending the previous block of data took 4 s; the next block is due 10 s later, but as you already consumed 4 s, you only wait for 6 s.
sleep is still a fairly imprecise timer, and obviously the above mechanism won't work if the sending time is longer than the delay between sends, but you get the idea.
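To illustrate, a minimal C++ sketch of that compensation, assuming 10 s between blocks in the dump and a placeholder read_and_send_next_block() standing in for the real read/send work. Sleeping until an absolute deadline means whatever time the sending consumed is automatically subtracted from the wait:

#include <chrono>
#include <thread>

void read_and_send_next_block() { /* read the dump, push the data onto the network */ }

int main() {
    using namespace std::chrono;
    auto deadline = steady_clock::now();

    for (int block = 0; block < 3; ++block) {
        deadline += seconds(10);                  // e.g. 10 s between blocks in the dump
        read_and_send_next_block();               // may itself take several seconds
        std::this_thread::sleep_until(deadline);  // waits only for the remaining time
    }
    return 0;
}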
Correction: sleep is not as bad in a Windows environment as it is on Unixes. The accuracy of the Windows Sleep is a millisecond; the accuracy of the Unix sleep is a second. If you do not need high-precision timing (and if a network is involved, high-precision timing is out of reach anyway), sleep should be OK.
No modern multitasking OS's scheduler will guarantee exact timings to user apps.
You can try to assign 'realtime' priority to your app somehow, for instance from the Windows Task Manager, and see if it helps.
Another solution is to implement a 'controlled' sleep, i.e. sleep in a series of 500 ms steps, checking the current timestamp between them. That way, if your app sleeps 1 s instead of 500 ms at some step, you will notice it and skip the extra sleep(500 ms).
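A small sketch of such a controlled sleep; the controlled_sleep name, the 500 ms slice size, and the steady_clock deadline are just illustrative choices:

#include <algorithm>
#include <chrono>
#include <thread>

// Sleep for 'total' in short slices, re-checking the clock so one oversleep
// cannot push the overall wait far past the target.
void controlled_sleep(std::chrono::milliseconds total) {
    using namespace std::chrono;
    const auto deadline = steady_clock::now() + total;
    while (steady_clock::now() < deadline) {
        auto remaining = duration_cast<milliseconds>(deadline - steady_clock::now());
        std::this_thread::sleep_for(std::min(remaining, milliseconds(500)));
    }
}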
Try out a Multimedia Timer. It is about as accurate as you can get on a Windows system. There is a good article on CodeProject about them.
The Sleep function can take longer than requested, but never less. Use the WinAPI timer functions to get a function called back after an interval from now (a sketch follows below).
You could also use the Windows Task Scheduler, but that's going outside programmatic standalone options.
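For completeness, a minimal sketch of the WinAPI timer route mentioned above, using a Win32 waitable timer that is simply waited on (rather than via a completion callback); the 10-second interval is just an example:

#include <windows.h>
#include <cstdio>

int main() {
    // Manual-reset waitable timer; a negative due time is relative, in 100-ns units.
    HANDLE timer = CreateWaitableTimer(NULL, TRUE, NULL);
    if (timer == NULL) return 1;

    LARGE_INTEGER due;
    due.QuadPart = -10LL * 10000000LL;         // fire 10 seconds from now
    SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);

    WaitForSingleObject(timer, INFINITE);      // blocks until the timer fires
    std::printf("timer fired\n");

    CloseHandle(timer);
    return 0;
}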