I'm trying to fetch a huge number of emails (2,500 or more) from an IMAP server. Currently I'm using the imap.FetchHeaders() function, but it is not that fast. I then tried imap.FetchSingleHeader(), but that is much slower than imap.FetchHeaders().
What would you recommend?
The imap.FetchHeaders() method sends a single IMAP command to fetch the headers, and the IMAP server sends all of the headers in a single reply. Most of the time the operation takes is likely the IMAP server's "think time", spent processing the request and sending the response. If you turn on verbose logging (set the imap.VerboseLogging property to true) and then examine the contents of the imap.LastErrorText property, you should see timing information in elapsed milliseconds.
In summary, it's unlikely that fetching 2500 headers can be made any faster.
One note: to avoid problems we've seen when fetching huge numbers of emails, Chilkat requests at most 1000 headers in a single request. This means that when FetchHeaders fetches 2500 headers, it internally makes three separate request/response pairs.
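The batching described above can be approximated outside Chilkat. Below is a minimal Python sketch using the standard-library imaplib. It assumes messages are addressed by sequence numbers 1..N, and it uses a batch size of 1000 to mirror the limit mentioned above. `batch_ranges` and `fetch_headers_batched` are hypothetical helpers, not Chilkat's internal implementation.

```python
import imaplib


def batch_ranges(total, batch_size=1000):
    """Split message sequence numbers 1..total into IMAP sequence-set strings."""
    ranges = []
    for start in range(1, total + 1, batch_size):
        end = min(start + batch_size - 1, total)
        ranges.append(f"{start}:{end}")
    return ranges


def fetch_headers_batched(host, user, password, mailbox="INBOX", batch_size=1000):
    """Fetch every message's headers, one FETCH command (one round trip) per batch.

    Hypothetical helper: BODY.PEEK[HEADER] returns the header block without
    setting the \\Seen flag.
    """
    headers = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        typ, data = conn.select(mailbox, readonly=True)
        total = int(data[0])  # SELECT reports the number of messages (EXISTS)
        for seq_set in batch_ranges(total, batch_size):
            typ, data = conn.fetch(seq_set, "(BODY.PEEK[HEADER])")
            # Each message arrives as a (b'1 (BODY[HEADER] {n}', b'<headers>') tuple
            headers.extend(part[1] for part in data if isinstance(part, tuple))
    return headers
```

For 2,500 messages this issues three FETCH commands ("1:1000", "1001:2000", "2001:2500"), so the cost is still dominated by server think time rather than by the number of round trips.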
Thanks Howard,
This is to answer your question in the comment above about GetMailboxStatus.
The GetMailboxStatus method sends a STATUS command requesting the following items: (MESSAGES RECENT UIDNEXT UIDVALIDITY UNSEEN)
Because it is part of the IMAP protocol standard (see https://www.rfc-editor.org/rfc/rfc3501#section-6.3.10), it should work with all servers. (I don't recall ever fielding a support question where GetMailboxStatus did not work correctly.)
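To show what that STATUS exchange looks like from another client, here is a minimal Python sketch using imaplib's IMAP4.status() with the same item list. It also includes a small parser for the untagged reply. `parse_status` and `mailbox_status` are hypothetical helper names, and `conn` is assumed to be an already logged-in imaplib connection.

```python
import re


def parse_status(response):
    """Parse one IMAP STATUS response line into a dict of item -> int.

    `response` is an entry of the data list returned by IMAP4.status(),
    e.g. b'INBOX (MESSAGES 3 RECENT 0 UIDNEXT 4 UIDVALIDITY 1 UNSEEN 0)'.
    """
    text = response.decode() if isinstance(response, bytes) else response
    # The items sit inside the last parenthesised group as NAME number pairs
    items = text[text.rindex("(") + 1 : text.rindex(")")]
    return {name: int(value) for name, value in re.findall(r"([A-Z]+) (\d+)", items)}


def mailbox_status(conn, mailbox="INBOX"):
    """Send a STATUS command with the item list GetMailboxStatus is described as using."""
    typ, data = conn.status(mailbox, "(MESSAGES RECENT UIDNEXT UIDVALIDITY UNSEEN)")
    if typ != "OK":
        raise RuntimeError(f"STATUS failed: {data!r}")
    return parse_status(data[0])
```

Searching from the last "(" keeps the parser working even when a quoted mailbox name itself contains parentheses.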
I want to use Bokeh to display a time series and feed it updates via source.stream(someDict). However, the data is generated by a C++ application (server) that may run on the same machine or on another machine on the network. I was looking into transmitting the updates (only the newly added rows of the time series) via ZMQ to the Python program (client).
Transmitting the message seems easy enough to implement, but the dictionary is column-based. Wouldn't it be more efficient to append rows, i.e. one row per point in time, and send those?
If there is no good way to do that, what kind of object should I send? Do I need to marshal the information, or is it sufficient to build a long string like {col1:[a,b,c,...], col2:[...],...} and send that to the client? I expect to send no more than a few hundred rows of 10 floats per second.
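One common approach is to send rows and pivot them into columns on the receiving side. This minimal Python sketch assumes the C++ side serializes each update as a JSON array of rows, one row per time point, in a fixed column order. The client then converts those rows into the column-oriented dict that source.stream() takes. The column names and the helper names are hypothetical. JSON avoids hand-parsing a `{col1:[...]}` string, and at a few hundred rows of 10 floats per second its size overhead is likely modest.

```python
import json


def rows_to_columns(rows, columns):
    """Pivot row-oriented records into the column dict source.stream() accepts.

    rows:    list of sequences, one per time point, ordered like `columns`
    columns: list of column names (must match the ColumnDataSource's columns)
    """
    data = {name: [] for name in columns}
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        for name, value in zip(columns, row):
            data[name].append(value)
    return data


def decode_update(message, columns):
    """Decode one ZMQ message payload (a JSON array of rows) into a column dict."""
    rows = json.loads(message)
    return rows_to_columns(rows, columns)
```

With pyzmq, the receiving side might then call `source.stream(decode_update(socket.recv(), COLUMNS))`, where `socket` and `COLUMNS` are placeholders. In a Bokeh server app, updates that arrive on another thread are typically applied through a `doc.add_next_tick_callback` callback.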
Thanks for any helpful answers.
I'm having a difficult time adapting libcurl to a particular situation. I'm essentially loading a variable number of objects into memory, performing various transforms on them, and then I want to upload them (as serialized binary data, of course) as part of a multipart POST.
The part I'm struggling with is that I want to just add them as a part as they finish down this pipeline, then delete them after that particular part is posted.
I have thought about giving it a read function pointer and, in the callbacks, manually filling the buffer with the part headers and data, but this approach seems like quite a hack.
I have tried the regular multipart approach (with a multi handle), but that seems to require all the data up front, or to have it read from a file, which I do not want libcurl to deal with.
To recap, I want to: open a connection, start an HTTP multipart POST request -> get an in-memory buffer -> add it as a POST attachment (multipart) -> send that off -> wait for the next chunk of data -> repeat until done.
Thanks in advance.
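Use the curl_formadd() function to prepare a multipart/form-data HTTP POST, and then use the CURLOPT_HTTPPOST option to actually send it. curl_formadd() has a CURLFORM_STREAM option that makes libcurl use the connection's CURLOPT_READFUNCTION callback, so you can custom-stream each part's data. The part's size must also be given with CURLFORM_CONTENTSLENGTH unless the post is sent with chunked encoding. (In libcurl 7.56.0 and later, curl_formadd() and CURLOPT_HTTPPOST are deprecated in favor of the MIME API, where curl_mime_data_cb() sets a read callback per part.)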
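To illustrate what a per-part read callback ultimately has to produce on the wire, here is a minimal Python sketch that emits multipart/form-data framing incrementally. It is not libcurl code. The sketch produces one part at a time as each buffer becomes available, so each buffer can be freed once it has been sent. The boundary string, field names, and the `application/octet-stream` content type are assumptions. Because the total size is not known up front, a body like this would typically go out with chunked transfer encoding.

```python
def multipart_stream(parts, boundary):
    """Yield a multipart/form-data body piece by piece.

    parts:    iterable of (field_name, filename, data_bytes); it can be a
              generator, so each part is produced only when it is ready and
              can be released right after it has been yielded.
    boundary: a string that must not occur inside any part's data.
    """
    delim = f"--{boundary}".encode("ascii")
    for name, filename, data in parts:
        # Part header block: delimiter line, part headers, then a blank line
        yield (
            delim + b"\r\n"
            + f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
            + b"Content-Type: application/octet-stream\r\n\r\n"
        )
        yield data
        yield b"\r\n"  # the CRLF before the next delimiter belongs to the delimiter
    # The closing delimiter ends the multipart body
    yield delim + b"--\r\n"
```

The request itself would carry `Content-Type: multipart/form-data; boundary=<boundary>`, using the same boundary string.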
I've created a C++ application using Winsock that implements a small HTTP server (it handles just the few features I need). It is used to communicate with the outside world via HTTP requests. It works, but sometimes requests are not handled correctly because the parsing fails. I'm quite sure the requests are correctly formed, since they are sent by major web browsers like Firefox/Chrome or by Perl/C# (which have HTTP modules/DLLs).
After some debugging, I found that the problem is in fact in receiving the message. When the message arrives in more than one part (i.e. it is not read in a single recv() call), the parsing sometimes fails. I have tried numerous ways to resolve this, but nothing seems reliable enough.
What I do now is read data until I find the "\r\n\r\n" sequence that marks the end of the headers. If WSAGetLastError() reports anything other than 10035 (WSAEWOULDBLOCK), meaning the connection closed or failed, before that sequence is found, I discard the message. Once I know I have the whole header, I parse it and look for information about the body length. However, I'm not sure whether this information is mandatory (I think not), and what I should do if it is absent: does that mean there is no body? Another problem is that I don't know whether I should look for a "\r\n\r\n" after the body (if its length is greater than zero).
Does anybody know how to reliably parse an HTTP message?
Note: I know there are HTTP server implementations out there. I want my own for various reasons. And yes, I know that reinventing the wheel is bad, too.
If you're set on writing your own parser, I'd take the Zed Shaw approach: use the Ragel state machine compiler and build your parser based on that. Ragel can handle input arriving in chunks, if you're careful.
Honestly, though, I'd just use something like this.
Your go-to resource should be RFC 2616, which describes HTTP/1.1; you can use it to construct a parser. (RFC 2616 has since been obsoleted by RFCs 7230-7235, and those in turn by RFCs 9110-9112.) Good luck!
You could try looking at their code to see how they handle an HTTP message.
Or you could look at the spec: there are message-length fields you should use. Apparently, only buggy browsers send additional CRLFs at the end.
In any case, an HTTP request always has "\r\n\r\n" at the end of the request headers, before the request data (if any), even when the request is just "GET / HTTP/1.0\r\n\r\n".
If the method is "POST", you should read exactly as many bytes after "\r\n\r\n" as the Content-Length field specifies.
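So the pseudocode is:
headerEnd = read_until(buf, "\r\n\r\n");   // index just past the blank line; recv() may overshoot
if (buf.starts_with("POST "))
{
    // header names are case-insensitive and each header sits on its own line
    contentLength = to_int(regex("^Content-Length:[ \t]*(\d+)\r?$", multiline | icase).find(buf)[1]);
    alreadyRead = buf.size() - headerEnd;        // body bytes that arrived with the headers
    read_all(buf, contentLength - alreadyRead);  // read exactly the remaining body bytes
}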
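Here is the same logic in runnable form: a minimal Python sketch of an incremental reader. It accepts data in whatever pieces recv() happens to return, and it keeps any bytes that arrive after the blank line. A request is reported only once its headers and its Content-Length bytes of body are complete. The class name and its `feed()` method are hypothetical. Unlike the pseudocode, this sketch takes the body length from Content-Length regardless of method, and it does not handle chunked transfer encoding.

```python
class RequestReader:
    """Accumulate raw bytes and split them into complete HTTP requests."""

    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        """Add the bytes from one recv() call; return a list of complete requests."""
        self.buf += chunk
        requests = []
        while True:
            end = self.buf.find(b"\r\n\r\n")
            if end < 0:
                break  # headers not complete yet: wait for more data
            lines = self.buf[:end].decode("iso-8859-1").split("\r\n")
            headers = {}
            for line in lines[1:]:
                name, _, value = line.partition(":")
                headers[name.strip().lower()] = value.strip()  # names are case-insensitive
            length = int(headers.get("content-length", "0"))
            body_start = end + 4
            if len(self.buf) < body_start + length:
                break  # body still incomplete: wait for more data
            body = self.buf[body_start : body_start + length]
            requests.append((lines[0], headers, body))
            self.buf = self.buf[body_start + length :]  # keep any pipelined data
        return requests
```

Feeding it one byte at a time produces the same result as feeding it the whole message at once. That property is exactly what the recv()-based loop needs.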
There will be a "\r\n\r\n" after the content only if the content itself includes one. The content may be binary data with no terminating sequence, so its size has to come from a header such as Content-Length.
HTTP GET/HEAD requests normally have no body, and a POST request can have no body either. Check whether the request is a GET/HEAD; if it is, no content (body) was sent. If it is a POST, do what the spec says about parsing a message of known or unknown length, as @gbjbaanb said.
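In HTTP/1.1, request framing is actually signalled by headers rather than by the method (RFC 7230 §3.3.3, now RFC 9112 §6.3). A request with neither Transfer-Encoding nor Content-Length has a zero-length body, whatever its method. Below is a simplified Python sketch of that decision. It assumes header names were already lower-cased while parsing, and it skips some edge cases, such as duplicate Content-Length headers. `request_body_framing` is a hypothetical name.

```python
import re


def request_body_framing(headers):
    """Decide how to read a request body from its headers (names lower-cased).

    Returns ("chunked", None) or ("length", n); raises ValueError for a
    request that should be rejected with 400 Bad Request.
    """
    te = headers.get("transfer-encoding")
    if te is not None:
        codings = [c.strip().lower() for c in te.split(",")]
        if codings[-1] != "chunked":
            raise ValueError("Transfer-Encoding present but chunked is not final")
        return ("chunked", None)  # Transfer-Encoding overrides Content-Length
    cl = headers.get("content-length")
    if cl is None:
        return ("length", 0)  # neither header: the request has no body
    if not re.fullmatch(r"[0-9]+", cl.strip()):
        raise ValueError("invalid Content-Length")
    return ("length", int(cl))
```

A "chunked" result means the body arrives as length-prefixed chunks ending with a zero-size chunk, which is another case where no "\r\n\r\n" marks the end of the content.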
I don't know if it's possible, but I want to ask whether we can use cfhttp (or anything else) to read a selected amount of data instead of putting the whole file in CFHTTP.FileContent.
I am using cfhttp and want to read only the last two lines of some remote XML files (about 20 of them) and the middle two lines of some text files (about 7 of them). Is there any way I could read just that specific data instead of fetching the whole files? Right now it takes a lot of time (about 15-20 seconds), and I want to reduce the run time of my .cfm page.
Any suggestions?
Hmm, there isn't really any special way to get just parts of the remote files.
Do you have to do it every time? Could you fetch the files in the background, write them locally, and have your actual incoming requests just read those files? Make the reading of the remote files asynchronous to the incoming requests?
If not, and you're using CF8+, you could use CFTHREAD to thread out the various requests to run in parallel: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=Tags_t_04.html
You can use the "join" action at the end to wait for all the threads to complete.
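The CFTHREAD run-then-join pattern is the same fan-out/fan-in idea found in most languages. As an illustration outside ColdFusion, here is a Python sketch using concurrent.futures. `fetch` is any function that takes a URL and returns its content; the default shown is built on urllib.request.urlopen. The worker count of 8 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch=fetch_url, max_workers=8):
    """Fetch all URLs in parallel; results come back in the same order as `urls`.

    Collecting the results with list() waits for every task, much like
    CFTHREAD action="join".
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

With around 27 files, total time should then be close to that of the slowest few requests rather than the sum of all of them.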
Edit:
Here's a great tutorial by Ben Nadel on using CFThread to parallelize CFHTTP requests:
http://www.bennadel.com/blog/749-Learning-ColdFusion-8-CFThread-Part-II-Parallel-Threads.htm
There's something else, though:
27-30 sequential HTTP requests should not take 15-20 seconds. They really shouldn't even take 1-2 seconds, so you may have some other serious issue going on here.
HTTP has no way to request specific lines of a file. This has nothing to do with ColdFusion.
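HTTP/1.1 does, however, support byte-range requests (RFC 7233). A `Range: bytes=-N` header asks for just the last N bytes, which is often enough to get the last two lines of a file. Servers are free to ignore the header: they answer 206 Partial Content if they honour it and 200 with the full body otherwise. Ranges also work on bytes, not lines, so the result has to be trimmed. Here is a Python sketch that assumes a 4096-byte tail is enough to contain the wanted lines. In ColdFusion, the same header could be sent with a cfhttpparam of type="header".

```python
from urllib.request import Request, urlopen


def last_lines(data, n=2):
    """Return the last n lines of a byte string as text."""
    lines = data.rstrip(b"\r\n").splitlines()
    return [line.decode("utf-8", "replace") for line in lines[-n:]]


def fetch_tail_lines(url, n=2, tail_bytes=4096):
    """Ask the server for only the last `tail_bytes` bytes, then keep n lines."""
    req = Request(url, headers={"Range": f"bytes=-{tail_bytes}"})
    with urlopen(req, timeout=10) as resp:
        data = resp.read()
        content_range = resp.headers.get("Content-Range", "")
        # e.g. "bytes 1000-4095/4096"; a range starting at 0 means we got the whole file
        starts_mid_file = resp.status == 206 and not content_range.startswith("bytes 0-")
    if starts_mid_file:
        data = data.split(b"\n", 1)[-1]  # drop a possibly truncated first line
    return last_lines(data, n)
```

If the server replies 200 instead, the function still works but downloads the whole file, so timing a few files would show whether the remote servers honour ranges.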
You can use some smart caching with CFHTTP's method="HEAD" to reduce the time somewhat, at the cost of a longer run the first time. A HEAD request does not download the file body.
Do you have a local copy of the page?
No: use CFHTTP method="GET" to grab and store it.
Yes: use CFHTTP method="HEAD" to check the Last-Modified timestamp and compare it with the cached version. If the cache is newer, use it; otherwise, use CFHTTP method="GET" to grab and parse the file you want.
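The freshness check in that decision can be sketched in Python as follows. `cache_is_fresh` is a hypothetical helper, and the sketch assumes the cached copy is a local file whose modification time records when it was last downloaded. The `last_modified_header` argument is the Last-Modified value returned by the HEAD request.

```python
import os
from email.utils import parsedate_to_datetime


def cache_is_fresh(cache_path, last_modified_header):
    """Decide whether a cached copy can be reused after a HEAD request.

    cache_path:           local file written by an earlier GET
    last_modified_header: value of the Last-Modified response header, or None
    """
    if not os.path.exists(cache_path):
        return False  # no local copy: do a GET and store it
    if not last_modified_header:
        return False  # the server sent no timestamp, so we cannot compare
    remote = parsedate_to_datetime(last_modified_header).timestamp()
    local = os.path.getmtime(cache_path)
    return local >= remote  # the cache was written after the last remote change
```

A conditional GET with an If-Modified-Since header can combine the check and the download into a single request when the server supports it; the server answers 304 Not Modified if nothing has changed.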
method="HEAD" grabs only the HTTP headers, not the entire file, which will speed things up slightly. Still, you are making almost 30 file requests, so this isn't going to be instantaneous however you cut it.
How about asking CF on the remote server to serve only that chunk of the file, selected with URL parameters?
Since it is XML, I guess you can use xmlSearch() and return only the result?
As for the text files, you could pass in startline and numOfLines and return only those lines as a string.
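On the serving side, the text-file endpoint only needs to slice lines. Here is a minimal Python sketch of that slicing; the suggestion above would implement it as a .cfm page. It assumes `startline` is 1-based and `numOfLines` is the number of lines to return. `middle_lines` is a hypothetical helper that picks the middle two lines the question asks about.

```python
def line_slice(text, startline, num_of_lines):
    """Return num_of_lines lines starting at 1-based startline, joined as a string."""
    if startline < 1 or num_of_lines < 0:
        raise ValueError("startline must be >= 1 and num_of_lines >= 0")
    lines = text.splitlines()
    return "\n".join(lines[startline - 1 : startline - 1 + num_of_lines])


def middle_lines(text, count=2):
    """Return `count` lines from the middle of the text (e.g. the middle two)."""
    total = len(text.splitlines())
    start = max(1, (total - count) // 2 + 1)
    return line_slice(text, start, count)
```

Only the selected lines then travel over the network, which is the point of moving the work to the server.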
I'm currently developing a standalone C++ program that lists all the URLs accessed in a browser and their corresponding response times.
At this point, I can already sniff all incoming and outgoing packets. I am using WinPcap for this.
Retrieved packets are filtered to only those matching 'tcp port 80 or 443' (HTTP and HTTPS).
Now I want to read some HTTP headers. The problem is that the data is usually fragmented: one HTTP message is spread over several packets.
I want to know how to reassemble them and how to extract details about the HTTP traffic.
Note: I want to implement what Wireshark does. For each packet/frame, it shows a 'Reassembled TCP Segments' entry.
Any ideas or tutorials on how I can easily achieve this?
Thanks a lot!
You'll have to do the same thing TCP does to reassemble packets: parse each packet's TCP header and place the payloads, ordered by sequence number, into another buffer. The hardest logic is probably dealing with missing data; you'll then have to check whether it was retransmitted.
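Here is the core of that sequencing step as a minimal Python sketch, for one direction of one connection. Payloads are buffered by TCP sequence number and released only once they are contiguous, so duplicate and overlapping retransmissions are absorbed. The sketch assumes the IP/TCP headers have already been parsed (for example from WinPcap captures) to get each segment's sequence number and payload. `TcpStream` is a hypothetical name, and sequence-number wraparound is ignored. Also, HTTPS (port 443) payloads are encrypted, so reassembled bytes on that port won't contain readable HTTP headers.

```python
class TcpStream:
    """Reassemble one direction of a TCP connection from captured segments.

    Sequence-number wraparound and SYN/FIN handling are left out for brevity.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # seq of the next byte expected (ISN + 1 after SYN)
        self.pending = {}            # seq -> payload for segments not yet contiguous
        self.data = bytearray()      # contiguous, in-order stream bytes

    def add_segment(self, seq, payload):
        """Add one segment's payload, using the seq taken from its TCP header."""
        if payload and len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload  # keep the longest payload seen for this seq
        self._drain()

    def _drain(self):
        """Move every segment that now lines up with next_seq into self.data."""
        progress = True
        while progress:
            progress = False
            for seq in sorted(self.pending):
                payload = self.pending[seq]
                end = seq + len(payload)
                if end <= self.next_seq:
                    del self.pending[seq]  # pure retransmission: bytes already stored
                elif seq <= self.next_seq:
                    # Overlaps what we already have: append only the new tail
                    self.data += payload[self.next_seq - seq :]
                    self.next_seq = end
                    del self.pending[seq]
                    progress = True
```

Feeding `self.data` into an HTTP parser, such as an incremental reader that tolerates partial input, then gives you the request line and headers needed to match requests with responses.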
A number of RFCs cover this: 675, 793, 1122 and others. If looking through those seems overwhelming, step back and look at the roadmap RFC, RFC 4614. (RFC 793 has since been replaced by RFC 9293, and RFC 4614 by RFC 7414.)