Libevent and file I/O - c++

Does libevent deal with buffered file I/O? I know it handles sockets pretty well, but does it also cover normal files, or is it "only" an epoll/... wrapper?

Using libevent (or any of the underlying readiness notification mechanisms such as e.g. epoll or kqueue) with normal file descriptors does not normally make sense. Exceptions are files on NFS or using kernel AIO with an eventfd.
File descriptors on local disks are always ready, there is always sufficient buffer space, and operations always complete "immediately". The write operation merely copies data to the buffer cache, and the actual write to the disk happens ... whenever it happens. (The details are Linux-specific, but apart from some minor implementation details it works the same way on other systems.)

libevent is not an epoll wrapper. It selects the highest performance method available on each platform.
Sockets are also file descriptors, so you should be able to use libevent for file I/O.
You will need to disable libevent's use of epoll, though. If I remember right, epoll does not support regular files.
#include <event2/event.h>
struct event_config *cfg = event_config_new();
event_config_avoid_method(cfg, "epoll");                    // fall back to poll/select
struct event_base *base = event_base_new_with_config(cfg);

libevent sits at a lower level than buffered file I/O (what you get with stdio.h), using file descriptors directly. You are correct in thinking that it is 'just' an epoll/select/kevent/etc. wrapper. Its purpose is to listen for events on descriptors, which is the lowest level of file I/O. However, you can use it in conjunction with the stdio.h file I/O facilities, as those also ultimately use file descriptors. You can use fileno(3) to retrieve the file descriptor from the FILE * you want to wait on.
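For illustration, here is a minimal sketch of that last point (the file name and callback are invented): fileno() extracts the descriptor behind a FILE *, which is then registered with a libevent 2.x event loop, keeping in mind the caveats above about regular files and epoll.
#include <event2/event.h>
#include <stdio.h>
#include <unistd.h>

// Invented callback: invoked whenever the descriptor behind fp is readable.
static void on_readable(evutil_socket_t fd, short what, void *arg)
{
    char buf[4096];
    read(fd, buf, sizeof(buf));   // or keep using stdio on the FILE * itself
}

int main(void)
{
    struct event_base *base = event_base_new();
    FILE *fp = fopen("some_input.log", "r");            // assumed input file
    struct event *ev = event_new(base, fileno(fp), EV_READ | EV_PERSIST,
                                 on_readable, NULL);
    event_add(ev, NULL);
    event_base_dispatch(base);
}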

Related

Can you open the same file multiple times for writing?

We are writing a multi threaded application that does a bunch of bit twiddling and writes the binary data to disk. Is it possible to have each thread std::fopen the same file for writing at the same time? The reasoning would be each thread could do its work and have its own access to the writable file.
std::fstream has functionality defined in terms of the C stdio library. I would be surprised if it were actually specified, but the most likely behavior from opening the same file twice is multiple internal buffers bound to the same file descriptor.
The usual way to simultaneously write to multiple points in the same file is POSIX pwrite or writev. This functionality is not wrapped by C stdio, and by extension not by C++ iostreams either. But, having multiple descriptors to the same filesystem file might work too.
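As a rough illustration of the pwrite route (the file name, thread count, and region size are invented for the example): each thread writes into its own pre-assigned region of a single descriptor, so no thread ever touches the shared file position.
#include <fcntl.h>
#include <unistd.h>
#include <thread>
#include <vector>

int main()
{
    int fd = open("out.bin", O_WRONLY | O_CREAT, 0644);       // assumed output file
    const size_t region = 1 << 20;                            // 1 MiB per thread, arbitrary

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([fd, i, region] {
            std::vector<char> chunk(region, char('A' + i));   // stand-in for the real data
            // pwrite takes an explicit offset, so threads never share a file position
            pwrite(fd, chunk.data(), chunk.size(), static_cast<off_t>(i) * region);
        });
    }
    for (auto &t : workers) t.join();
    close(fd);
}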
Edit: POSIX open called twice on the same file in Mac OS X produces different file descriptors. So, it might work on your platform, but it's probably not portable.
A definitive answer would require connecting these dots:
Where the C++ standard specifies that fstream works like a C (stdio) stream.
Where the C standard defines when a stream is created (fopen is only defined to associate a stream with a newly-opened file).
Where the POSIX standard defines its requirements for C streams.
Where POSIX defines the case of opening the same file twice.
This is a bit more research than I'm up for at the moment, but I'm sure someone out there has done the legwork.
I've written some high speed multi-threaded data capture utilities, but the output went to separate files on separate hard drives, and then were post-processed.
I seem to recall that you can have fopen not lock the file, so in theory that would allow different threads to write to the same file with independent handles. In practice you're going to run into other issues, namely concurrency: your threads are almost certainly going to step all over each other and scramble the results unless you implement some synchronization. And if you have to do that, why not just use one handle across all the threads?
I/O access is not a parallelizable task (it can't be; you simply can't send two or more data addresses over the device bus at the same time), so you'd better implement a queue in which many threads post their chunks of data and one single consumer actually writes them to disk.
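A minimal sketch of that single-consumer pattern (all names and types are invented for illustration): producer threads push finished buffers into a locked queue, and one writer thread drains it to disk through a single handle.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <vector>

struct WriteQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::vector<char>> chunks;
    bool done = false;

    void push(std::vector<char> c) {                 // called by the producer threads
        { std::lock_guard<std::mutex> lk(m); chunks.push_back(std::move(c)); }
        cv.notify_one();
    }
    void finish() {                                  // called once all producers are done
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    }
    void writer(std::FILE *out) {                    // the single consumer thread
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            cv.wait(lk, [this] { return done || !chunks.empty(); });
            while (!chunks.empty()) {
                std::vector<char> c = std::move(chunks.front());
                chunks.pop_front();
                lk.unlock();                         // write without holding the lock
                std::fwrite(c.data(), 1, c.size(), out);
                lk.lock();
            }
            if (done) break;
        }
    }
};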

Regarding handling more than 1024 socket descriptors

I have written a chat server in C on Linux. I have tested it and it performs well. The only thing holding it back is that I am using the select system call to handle socket descriptors. Since select has a limit of 1024 descriptors, my chat server can handle at most 1024 users concurrently.
I know that the other option is poll, but I am not so sure about its performance compared to select.
Please suggest the most effective way to resolve this situation.
poll() can be used as an almost drop-in replacement for select(), and will allow you to exceed 1024 file descriptors (you can make the array passed to poll() as large as you want).
It will have similar performance characteristics to select(), since both require the kernel and userspace application to scan the entire array - but if select() is working OK for you, then poll() should too. (There is actually a slight performance improvement in poll() - the .events field, specifying the events you are interested in for each file descriptor, is not changed by poll(), so you don't have to rebuild the array before every call like you do with the file descriptor sets passed to select()).
If you later find yourself having performance problems caused by scanning the poll file descriptor array, you can consider switching to the epoll interface, which is more complicated but also scales better with very large numbers of file descriptors.
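A rough sketch of the swap (the listener and client descriptors are assumed to come from the existing server setup): the pollfd array replaces the fd_set, it can be sized well beyond 1024 entries, and the .events fields can be reused across calls.
#include <poll.h>
#include <vector>

// listen_fd and client_fds are assumed to exist already in the server
void wait_for_activity(int listen_fd, const std::vector<int> &client_fds)
{
    std::vector<struct pollfd> fds;
    fds.push_back({listen_fd, POLLIN, 0});
    for (int fd : client_fds)
        fds.push_back({fd, POLLIN, 0});              // no 1024 ceiling here

    int ready = poll(fds.data(), fds.size(), -1);    // block until something is readable
    if (ready <= 0)
        return;
    for (const struct pollfd &p : fds) {
        if (p.revents & POLLIN) {
            // accept() on the listener, or recv() on a client, just as with select()
        }
    }
}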
Your question is known as the C10K problem (how to deal with more than ten thousand simultaneous connections). You'll find lots of resources about it on the web.
You should also consider select an obsolete system call. Even with only dozens of file descriptors, you should at least prefer poll.
Notice that Qt and Gtk provide event loop machinery, often using poll (and QtCore or Glib can be used outside of graphical interfaces). There are also libev and libevent. I suggest using one of them.
Linux has no 1024 limit on select(). But:
select() performance is very poor
FreeBSD does have the limit :)
You can use poll(). But its performance suffers when the number of active connections increases.
Using epoll() is preferable on Linux; however, I would suggest using libevent.
libevent is a fast, clean and portable way to implement heavily loaded servers, and on Linux it uses epoll under the hood.

fastest technique to read a file into memory?

Is there a generally-accepted fastest technique which is used to read a file into memory in c++?
I will only be reading the file.
I have seen that Boost has an implementation, and I have seen a couple of other implementations on here, but I would like to know what is considered the fastest.
Thank you in advance
In case it matters, I am considering files up to 1GB and this is for windows.
Use memory-mapped files, maybe using the boost wrapper for portability.
If you want to read files bigger than the free, contiguous portion of your virtual address space, you can move the mapped portion of the file at will.
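A small sketch of the Boost route (the file name is invented): boost::iostreams::mapped_file_source maps the file read-only and exposes a plain pointer and size.
#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>

int main()
{
    boost::iostreams::mapped_file_source file("big_input.dat");  // assumed input file
    const char *data = file.data();      // contents appear as ordinary memory
    std::size_t size = file.size();      // ~1 GB fits comfortably in a 64-bit address space
    // ... read data[0 .. size-1] ...
}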
Consider using memory-mapped files for your case, as the files can be up to 1 GB in size.
Memory-mapped files and how they work
And here is where you can start with the Win32 API:
MapViewOfFile
There are several other helpful APIs on the MSDN page.
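Here is a rough read-only sketch of that Win32 sequence (error handling omitted, file name invented): CreateFile, then CreateFileMapping, then MapViewOfFile to get a pointer to the contents.
#include <windows.h>

int main()
{
    HANDLE file = CreateFileA("big_input.dat", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    const char *data = static_cast<const char *>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));  // maps the whole file
    // ... read through data ...
    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
}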
In the event memory-mapped files are not adequate for your application, and file I/O is your bottleneck, using an I/O completion port to handle async I/O on the files will be the fastest you can get on Windows.
I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for requests whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request.
Generally speaking, mmap it is. But on Windows they have invented their own way of doing this, see "File Mapping". Boost has a Memory-Mapped Files library that wraps both ways under a portable pile of code.
Also, you have to optimize for your use case if you want to be fast. Just mapping file contents into memory is not enough. It is indeed possible that you don't need memory-mapped files at all and would be better off using async file I/O, for example. There are many solutions for many problems.

How to treat RAM data as if it was a real file?

So I have some temporary data in my program (in RAM). I want to somehow make it appear as if it were a file (for example, so I can pass it to another program that takes a file path as an argument).
Is it possible?
How to do such thing?
Why not simply write the file to disk? If writing to disk is too slow, you can pass the FILE_ATTRIBUTE_TEMPORARY flag to CreateFile to keep the data in cache (and avoid writing it to the physical device).
Sometimes the obvious solutions are the best...
If supported by your operating system (Unixoid systems and Windows do), you could try to use memory-mapped files.
You can do it in C using the popen() function:
#include <stdio.h>
// start "program args" with its standard input connected to f
FILE *f = popen("program args", "w");
// write your output to f here using stdio (fprintf, fwrite, ...)
pclose(f);
This is possible if your external program reads its input from stdin.
You could use pipe()
The pipe() function shall create a pipe and place two file descriptors, one each into the arguments fildes[0] and fildes[1], that refer to the open file descriptions for the read and write ends of the pipe. Their integer values shall be the two lowest available at the time of the pipe() call. The O_NONBLOCK and FD_CLOEXEC flags shall be clear on both file descriptors. (The fcntl() function can be used to set both these flags.)
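As a rough sketch of the pipe() route on a POSIX system (the child program name is invented): the parent writes the in-memory data into the write end while the child reads it from stdin.
#include <unistd.h>
#include <sys/wait.h>
#include <string>

int main()
{
    std::string data = "temp data living in RAM";       // stand-in for the real buffer

    int fds[2];
    pipe(fds);                        // fds[0] = read end, fds[1] = write end
    if (fork() == 0) {                // child: read end becomes stdin, then exec
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("consumer", "consumer", (char *)NULL);   // invented program name
        _exit(1);
    }
    close(fds[0]);                    // parent keeps only the write end
    write(fds[1], data.data(), data.size());
    close(fds[1]);                    // EOF for the child
    wait(NULL);
}
If the other program insists on a file-name argument rather than stdin, on many Unix systems you can instead pass it /dev/fd/<n>, where <n> is the read end's descriptor number.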
Yes, it is possible. You can transfer your data to your other application via an interprocess communication mechanism:
Depending on your OS, you have different options here. You could create a pipe, as other posters have mentioned here, as many OSes have pipes.
You could also use shared memory.
You could simply write it out to a file, and then open up that file in your other application.
Many OSes have other techniques you can use.
EDIT: MSDN documents all the IPC mechanisms available on Windows.

What's the deal with boost.asio and file i/o?

I've noticed that boost.asio has a lot of examples involving sockets, serial ports, and all sorts of non-file examples. Google hasn't really turned up a lot for me that mentions if asio is a good or valid approach for doing asynchronous file i/o.
I've got gobs of data I'd like to write to disk asynchronously. This can be done with native overlapped I/O on Windows (my platform), but I'd prefer a platform-independent solution.
I'm curious about the following:
Does boost.asio have any kind of file support?
Is boost.asio's file support mature enough for everyday file I/O?
Will file support ever be added? What's the outlook for this?
Does boost.asio have any kind of file support?
Starting with (I think) Boost 1.36 (which contains Asio 1.2.0) you can use [boost::asio::]windows::stream_handle or windows::random_access_handle to wrap a HANDLE and perform asynchronous read and write methods on it that use the OVERLAPPED structure internally.
User Lazin also mentions boost::asio::windows::random_access_handle that can be used for async operations (e.g. named pipes, but also files).
Is boost.asio file support mature enough for everyday file i/o?
As Boost.Asio in itself is widely used by now, and the implementation uses overlapped IO internally, I would say yes.
Will file support ever be added? What's the outlook for this?
As there's no roadmap found on the Asio website, I would say that there will be no new additions to Boost.Asio for this feature. Although there's always the chance of contributors adding code and classes to Boost.Asio. Maybe you can even contribute the missing parts yourself! :-)
boost::asio file i/o on Linux
On Linux, asio uses the epoll mechanism to detect if a socket/file descriptor is ready for reading/writing. If you attempt to use vanilla asio on a regular file on Linux you'll get an "operation not permitted" exception because epoll does not support regular files on Linux.
The workaround is to configure asio to use the select mechanism on Linux. You can do this by defining BOOST_ASIO_DISABLE_EPOLL. The trade-off here being select tends to be slower than epoll if you're working with a large number of open sockets. Open a file regularly using open() and then pass the file descriptor to a boost::asio::posix::stream_descriptor.
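A rough sketch of that workaround (file name invented, error handling omitted): disable epoll before including asio, open the file with open(), and hand the descriptor to a posix::stream_descriptor.
#define BOOST_ASIO_DISABLE_EPOLL     // fall back to select on Linux
#include <boost/asio.hpp>
#include <fcntl.h>

int main()
{
    boost::asio::io_context io;
    int fd = ::open("input.dat", O_RDONLY);              // assumed input file
    boost::asio::posix::stream_descriptor file(io, fd);

    char buf[4096];
    file.async_read_some(boost::asio::buffer(buf),
        [&](const boost::system::error_code &ec, std::size_t n) {
            // ... consume the n bytes now sitting in buf ...
        });
    io.run();
}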
boost::asio file i/o on Windows
On Windows you can use boost::asio::windows::object_handle to wrap a HANDLE that was created from a file operation.
boost::asio::windows::random_access_handle is the easiest way to do this. If you need something more advanced, for example an asynchronous LockFileEx or something else, you can extend asio and add your own asynchronous events.
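For reference, a small sketch of the random_access_handle route (file name invented): the HANDLE must be opened with FILE_FLAG_OVERLAPPED, and reads and writes take an explicit offset.
#include <boost/asio.hpp>
#include <windows.h>

int main()
{
    boost::asio::io_context io;
    HANDLE h = ::CreateFileA("data.bin", GENERIC_READ, 0, NULL, OPEN_EXISTING,
                             FILE_FLAG_OVERLAPPED, NULL);    // overlapped is required
    boost::asio::windows::random_access_handle file(io, h);

    char buf[4096];
    file.async_read_some_at(0, boost::asio::buffer(buf),     // read at offset 0
        [&](const boost::system::error_code &ec, std::size_t n) {
            // ... n bytes of buf are valid here ...
        });
    io.run();
}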
io_uring has changed everything.
asio now supports async file read/write.
See the release notes:
asio 1.21.0 release notes
ASIO supports overlapped I/O on Windows where support is good. On Unixes this idea has stagnated due to:
Files are often located on the same physical device, so accessing them sequentially is preferable.
File requests often complete very rapidly because the data is physically close by.
Files are often critical to complete the basic operation of a program (e.g. reading its configuration file must be done before initializing further).
The one common exception is serving files directly to sockets. This is such a common special case that Linux has a kernel function (sendfile) that handles it for you, again negating the reason to use asynchronous file I/O.
In short: ASIO appears to reflect the underlying OS design philosophy; overlapped I/O is ignored by most Unix developers, so it is not supported on that platform.
Asio 1.21 appears to have added built-in filesystem support.
For instance, asio::stream_file now exists with all the async methods you'd expect.
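A hedged sketch of the new interface (the file name is invented; on Linux builds this typically needs io_uring support enabled, e.g. by defining ASIO_HAS_IO_URING and linking against liburing):
#define ASIO_HAS_IO_URING            // needed for file support on Linux builds
#include <asio.hpp>

int main()
{
    asio::io_context io;
    asio::stream_file file(io, "input.dat", asio::stream_file::read_only);  // assumed file

    char buf[4096];
    file.async_read_some(asio::buffer(buf),
        [&](const asio::error_code &ec, std::size_t n) {
            // ... the first n bytes of the file are now in buf ...
        });
    io.run();
}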
Linux has an asynchronous I/O (AIO) library that is no harder to use than the Windows APIs for this job (I've used it). Both sets of operating systems implement the same conceptual architecture. They differ in details that are relevant to writing a good library, but not to the point that you cannot have a common interface for both OS platforms (I've used one).
Basically, all flavors of Async File I/O follow the "Fry Cook" architecture. Here's what I mean in the context of a Read op: I (processing thread) go up to a fast food counter (OS) and ask for a cheeseburger (some data). It gives me a copy of my order ticket (some data structure) and issues a ticket in the back to the cook (the Kernel & file system) to cook my burger. I then go sit down or read my phone (do other work). Later, somebody announces that my burger is ready (a signal to the processing thread) and I collect my food (the read buffer).