I am using the libpcap library to monitor HTTP requests and responses. I am also storing the 10 most recent GET requests (found via string search) and a few responses in memory. Suppose the monitor is running while I am downloading a file: will it affect my download speed, or is a copy of each packet passed to libpcap without affecting the traffic?
Previously, I was doing the same thing using iptables + libnetfilter_queue. My libnetfilter_queue-based module was a bit slow at analysing packets, as many string searches and related operations were done on every outgoing packet and a few incoming packets. It did affect my download speed: when downloading a file with a download accelerator, my download speeds were lower while the module was running than when it wasn't. This was probably because every packet was passed to my netfilter_queue module first and only then to the other user applications.
Will I face the same problem with libpcap? I heard it uses some zero-copy mechanism.
A copy of the packet is passed to the PF_PACKET socket (I'm inferring from "libnetfilter" that you're using Linux), so it's not processed in the same code path that processes it as regular network input.
Newer versions of libpcap (1.0 and later) pass those packets to userland through shared memory, which is the "zero-copy" mechanism.
However, there's still processing being done for each packet, so there will be some slowdown unless your machine has idle processor cores and spare memory bandwidth (and disk bandwidth if your program is writing significant amounts of data to the file system). It won't directly increase packet processing latency, as it's not in the code path the way your netfilter-based mechanism was, so it probably won't impact networking performance as much.
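As an illustration of how the capture side can be kept cheap, here is a minimal sketch (not the asker's actual code) of a libpcap capture loop; the interface name "eth0", the snapshot length, and the "tcp port 80" filter are assumptions. The compiled BPF filter runs in the kernel, so packets that don't match are never copied up to the monitor at all:

    // Minimal libpcap sketch: capture HTTP traffic with an in-kernel BPF filter.
    // "eth0", the snaplen and the filter string are placeholders.
    #include <pcap/pcap.h>
    #include <cstdio>

    static void on_packet(u_char*, const struct pcap_pkthdr* hdr, const u_char*) {
        // A real monitor would walk the Ethernet/IP/TCP headers and then
        // string-search the payload for "GET "; here we only report the length.
        std::printf("captured %u bytes\n", hdr->caplen);
    }

    int main() {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t* h = pcap_open_live("eth0", 65535, 0 /* not promiscuous */, 1000, errbuf);
        if (!h) { std::fprintf(stderr, "pcap_open_live: %s\n", errbuf); return 1; }

        // Compile and install a BPF filter so non-HTTP packets are dropped in-kernel.
        struct bpf_program prog;
        if (pcap_compile(h, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == 0)
            pcap_setfilter(h, &prog);

        pcap_loop(h, -1, on_packet, nullptr);   // runs until pcap_breakloop() or an error
        pcap_close(h);
    }

Keeping the per-packet work in on_packet() small (or handing packets off to another thread) is what keeps the capture from competing with the download for CPU.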
A copy of the packet is passed on to libpcap, so it does not affect the traffic.
Multiple processes are writing to the same file simultaneously. When the file size exceeds a limit (for example 10 MB), the file is renamed (sample.txt to sample1.txt, like a rolling appender) and a new file is created under the same name.
My issue is that, with multiple processes writing at the same time, when the size limit is exceeded and the file is closed while one of the processes is still writing to it, the file rolling doesn't happen. Can anyone help?
One strategy that I've used also works on a distributed computing system across multiple machines.
If you create a library which will package log messages and then send them via TCP to a destination, then you can have as many processes as you like writing to the same logger. You'd need a server at that destination to receive the log messages and write them to one file.
Generally, inter-process communication occurs via either shared memory or networking. Using networking we can go not only inter-process but also inter-machine. If we just use a destination of localhost or 127.0.0.1, then the packet never actually reaches the network card. Most drivers are smart enough to just pass the packet to any process listening, so performance is good too.
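A rough sketch of that approach, assuming a log server is already listening on localhost port 5514 and that messages are framed as one line each (both the port and the framing are made up for illustration); every process connects and writes lines, and the server is the only thing that touches the log file:

    // Hedged sketch of the "send log lines over TCP to a single writer" idea.
    // The port (5514) and newline framing are assumptions, not a real protocol.
    #include <string>
    #include <cstdio>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int connect_to_logger() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;

        sockaddr_in addr {};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5514);                     // assumed log-server port
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr); // loopback: never reaches the NIC

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    void log_line(int fd, const std::string& msg) {
        std::string line = msg + "\n";          // one message per line
        write(fd, line.data(), line.size());    // the server appends it to the one log file
    }

    int main() {
        int fd = connect_to_logger();
        if (fd < 0) { std::perror("connect"); return 1; }
        log_line(fd, "process 1234 started");
        close(fd);
    }

Because only the server writes to the file, the rolling logic (close, rename, reopen) lives in exactly one place and can no longer race with other writers.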
I'm using the asio (non-Boost version) library to capture incoming UDP packets via a 10 Gb Ethernet adapter.
150k packets a second is fine, but I start getting dropped packets when I go to higher rates like 300k packets/sec.
I'm pretty sure the bottleneck is in DMA'ing 300k separate transfers from the network card to the host system. The transfers aren't big, only 1400 bytes per transfer, so it isn't a bandwidth issue.
Ideally I would like a mechanism to coalesce the data from multiple packets into a single DMA transfer to the host. Currently I am using asio::receive to do synchronous transfers, which gives better performance than async_receive.
I have tried using the receive call with a larger buffer, or with an array of multiple buffers, but I always seem to get a single read of 1400 bytes.
Is there any way around this?
Ideally I would like to read some multiple of the 1400 bytes at a time, as long as it didn't take too long for the total to be filled.
i.e. wait up to 4 ms and then return 4 x 1400 bytes, or simply return after 4 ms with however many bytes are available...
I do not control the entire network so I cannot force jumbo frames :(
Cheers,
I would remove the asio layer and go direct to the metal.
If you're on Linux you should use recvmmsg(2) rather than recvmsg() or recvfrom(), as it at least allows for the possibility of transferring multiple messages at a time within the kernel, which the others don't.
If you can't do either of these things, you need to at least moderate your expectations. recvfrom() and recvmsg() and whatever lies over them in asio will never deliver more than one UDP datagram at a time. You need to:
speed up your receiving loop as much as possible, eliminating all possible overhead, especially dynamic memory allocation and I/O to other sockets or files.
ensure that the socket receive buffer is as large as possible, at least a megabyte, via setsockopt()/SO_RCVBUF, and don't assume that what you set was what you got: read it back via getsockopt() to see if the platform has limited you in some way (a sketch combining this with recvmmsg() follows below).
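For illustration, here is a Linux-only sketch combining both suggestions; the port, batch size, and 4 ms timeout are assumptions, and recvmmsg() only checks the timeout between datagrams, so it is not a hard deadline:

    // Hedged sketch: enlarge SO_RCVBUF, then batch UDP receives with recvmmsg().
    // Build with g++ on Linux (recvmmsg()/mmsghdr need _GNU_SOURCE, which g++ defines).
    #include <cstdio>
    #include <vector>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main() {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        // Ask for a large receive buffer, then read back what the kernel actually granted.
        int want = 4 * 1024 * 1024;
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want));
        int got = 0; socklen_t len = sizeof(got);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
        std::printf("SO_RCVBUF granted: %d bytes\n", got);

        sockaddr_in addr {};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(12345);                 // assumed port
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

        constexpr int kBatch = 64;                    // up to 64 datagrams per syscall
        constexpr int kDatagram = 1500;
        std::vector<char> storage(kBatch * kDatagram);
        mmsghdr msgs[kBatch] {};
        iovec iovs[kBatch] {};
        for (int i = 0; i < kBatch; ++i) {
            iovs[i].iov_base = storage.data() + i * kDatagram;
            iovs[i].iov_len  = kDatagram;
            msgs[i].msg_hdr.msg_iov    = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        timespec timeout { 0, 4000000 };              // ~4 ms, checked between datagrams
        int n = recvmmsg(fd, msgs, kBatch, MSG_WAITFORONE, &timeout);
        for (int i = 0; i < n; ++i)
            std::printf("datagram %d: %u bytes\n", i, msgs[i].msg_len);
        close(fd);
    }

This still delivers one datagram per buffer slot, but many datagrams per system call, which is usually what matters at these rates.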
Maybe you can try a workaround with tcpdump, using the libpcap library (http://www.tcpdump.org/) and filtering to receive the UDP packets.
I would like to write a program and run it on two machines, and send some data from one machine to another in an Ethernet frame.
Typically application data sits at layer 7 of the OSI model. Is there anything like a kernel or API restriction that would stop me from writing a program in which I can specify a destination MAC address and have some data sent to that MAC as the Ethernet payload? Then I would write a program to listen for incoming frames, grab the frames from a specified source MAC address, and extract the payload of data from the frame.
(So I don't want any other overhead like IP or TCP/UDP headers, I don't want to go higher than layer 2).
Can this be done in C++, or must all communication happen at the IP layer? Can this be done on Ubuntu? Extra love for pointing to or providing examples! :D
My problem is that I'm obviously new to network programming in C++, and as far as I know, if I want to communicate across a network I have to use a socket() call or similar, which works at the IP layer. So can I write a C++ program that works at OSI layer 2? Are there APIs for this? Does the Linux kernel even allow it?
As you already mentioned sockets, you probably just want to use a raw socket. Maybe this page with C example code is of some help.
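As a rough sketch of what that looks like (not a polished example): the interface name "eth0", the MAC addresses, and the experimental EtherType 0x88B5 below are all placeholders, and the program needs CAP_NET_RAW (see the privilege options mentioned further down):

    // Hedged sketch: send one raw Ethernet frame from an AF_PACKET socket on Linux.
    // Interface name, MAC addresses and the payload are placeholders.
    #include <cstring>
    #include <cstdio>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <arpa/inet.h>

    int main() {
        // Requires CAP_NET_RAW (run as root, setuid, or setcap - see below).
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { std::perror("socket"); return 1; }

        // Resolve the index of the outgoing interface.
        struct ifreq ifr {};
        std::strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
        if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) { std::perror("ioctl"); return 1; }

        unsigned char dst[6] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}; // placeholder MACs
        unsigned char src[6] = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb};

        // Frame layout: dst MAC, src MAC, EtherType, then the raw payload.
        unsigned char frame[64] = {};
        std::memcpy(frame, dst, 6);
        std::memcpy(frame + 6, src, 6);
        frame[12] = 0x88; frame[13] = 0xb5;   // 0x88B5: EtherType reserved for experiments
        const char payload[] = "hello, layer 2";
        std::memcpy(frame + 14, payload, sizeof(payload));

        struct sockaddr_ll addr {};
        addr.sll_family  = AF_PACKET;
        addr.sll_ifindex = ifr.ifr_ifindex;
        addr.sll_halen   = 6;
        std::memcpy(addr.sll_addr, dst, 6);

        if (sendto(fd, frame, sizeof(frame), 0,
                   reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr)) < 0)
            std::perror("sendto");
        close(fd);
    }

The receiving side is symmetric: open the same kind of socket (optionally with your own EtherType instead of ETH_P_ALL so the kernel filters for you), recvfrom() whole frames, and check the source MAC in the first bytes.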
In case you are looking for an idea for a program only using Ethernet while still being useful:
Wake on LAN in its original form is quite simple. Note however that most current implementations actually send UDP packets (exploiting the fact that the receiver does not parse packet headers etc. but just looks for a particular byte string in the packet's payload).
Also the use of raw sockets is usually restricted to privileged users. You might need to either
run your program as root,
or have it owned by root with the setuid bit set,
or set the capability for creating raw sockets using setcap CAP_NET_RAW+ep /path/to/your/program-file.
The last option gives more fine-grained privileges (just raw sockets, not write access to your whole file system etc.) than the other two. It is still less widely known, however, since it is "only" supported from kernel 2.6.24 on (which came with Ubuntu 8.04).
Yes, actually Linux has a very nice feature that makes it easy to deal with layer 2 packets. You can use a TAP device, which allows your userspace program to read/write Ethernet traffic through the kernel.
http://www.kernel.org/pub/linux/kernel/people/marcelo/linux-2.4/Documentation/networking/tuntap.txt
http://en.wikipedia.org/wiki/TUN/TAP
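As a rough sketch of the TAP approach (the device name "tap0" is an assumption, and the device still has to be created and brought up separately, e.g. with ip tuntap and ip link):

    // Hedged sketch: open a TAP device on Linux and read raw Ethernet frames from it.
    // The device name "tap0" is a placeholder; requires appropriate privileges.
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <linux/if_tun.h>

    int open_tap(const char* name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd < 0) return -1;

        struct ifreq ifr {};
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   // TAP = layer 2 frames, no extra header
        std::strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

        if (ioctl(fd, TUNSETIFF, &ifr) < 0) {  // attach to (or create) the device
            close(fd);
            return -1;
        }
        return fd;
    }

    int main() {
        int fd = open_tap("tap0");
        if (fd < 0) { std::perror("open_tap"); return 1; }

        unsigned char frame[2048];
        ssize_t n = read(fd, frame, sizeof(frame));   // one full Ethernet frame per read
        if (n > 0)
            std::printf("got a %zd-byte frame, dst MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
                        n, frame[0], frame[1], frame[2], frame[3], frame[4], frame[5]);
        close(fd);
    }

Writes to the same descriptor inject frames back into the kernel as if they had arrived on that interface.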
I am writing to a USB disk from a lowest-priority thread, using chunked buffered writes, and still, from time to time, the system as a whole lags during this operation. If I disable only the writing to disk, everything works fine. I can't use Windows file-operation API calls, only the C write call. So I thought maybe there is a WinAPI function to turn USB disk write caching on/off, which I could use in conjunction with FlushBuffers or similar alternatives? The number of drives involved is not fixed.
Ideally I would like the write call to never cause lagging; caching is fine too, as long as it happens transparently.
EDIT: would the _O_SEQUENTIAL flag on write-only operations be of any use here?
Try to reduce the I/O priority of the thread.
See this article: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
In particular, use THREAD_MODE_BACKGROUND_BEGIN for your I/O thread.
Warning: this doesn't work on Windows XP.
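A minimal sketch of that suggestion (the thread body is a placeholder for the asker's chunked writes); SetThreadPriority() with THREAD_MODE_BACKGROUND_BEGIN must be called on the thread itself and is only available on Windows Vista and later:

    // Minimal sketch: put the writer thread into background mode so the scheduler
    // lowers both its CPU and I/O priority (Vista and later only).
    #include <windows.h>
    #include <cstdio>

    DWORD WINAPI writer_thread(LPVOID) {
        // Must be called on the thread itself, with the pseudo-handle.
        if (!SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN))
            std::printf("background mode not available (e.g. Windows XP)\n");

        // ... perform the chunked writes to the USB disk here ...

        SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
        return 0;
    }

    int main() {
        HANDLE h = CreateThread(nullptr, 0, writer_thread, nullptr, 0, nullptr);
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
    }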
The thread priority won't affect the delay that happens in the process of writing the media, because it's done in the kernel mode by the file system/disk drivers that don't pay attention to the priority of the calling thread.
You might try to use the "T" flag (_O_SHORTLIVED) and flush the buffers at the end of the operation; also try decreasing the buffer size.
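Since the question is limited to the C-level write call, here is a hedged sketch of that idea using the CRT directly; the path, the chunk size, and the flag combination are assumptions (_O_SHORTLIVED could be OR'ed in to try the "T" behaviour), and whether it helps depends on the drive and OS version:

    // Hedged sketch: CRT-level open/write with explicit flags, flushed via _commit().
    // The path and chunk size are placeholders.
    #include <io.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <cstring>

    int main() {
        int fd = _open("E:\\data.bin",                       // placeholder USB-drive path
                       _O_WRONLY | _O_CREAT | _O_BINARY | _O_SEQUENTIAL,
                       _S_IWRITE);
        if (fd < 0) return 1;

        char chunk[16 * 1024];                               // smaller chunks than before
        std::memset(chunk, 0, sizeof(chunk));
        for (int i = 0; i < 64; ++i)
            _write(fd, chunk, sizeof(chunk));

        _commit(fd);                                         // force buffered data to the device
        _close(fd);
    }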
There are different types of data transfer for USB; for data there are three:
1. Bulk Transfer,
2. Isochronous Transfer, and
3. Interrupt Transfer.
Bulk Transfers Provide:
Used to transfer large bursty data.
Error detection via CRC, with guarantee of delivery.
No guarantee of bandwidth or minimum latency.
Stream Pipe - Unidirectional
Full & high speed modes only.
Bulk transfer is good for data that does not require delivery within a guaranteed amount of time. The USB host controller gives bulk transfers a lower priority than the other types of transfer.
Isochronous Transfers Provide:
Guaranteed access to USB bandwidth.
Bounded latency.
Stream Pipe - Unidirectional
Error detection via CRC, but no retry or guarantee of delivery.
Full & high speed modes only.
No data toggling.
Isochronous transfers occur continuously and periodically. They typically contain time sensitive information, such as an audio or video stream. If there were a delay or retry of data in an audio stream, then you would expect some erratic audio containing glitches. The beat may no longer be in sync. However if a packet or frame was dropped every now and again, it is less likely to be noticed by the listener.
Interrupt Transfers Provide:
Guaranteed Latency
Stream Pipe - Unidirectional
Error detection and next period retry.
Interrupt transfers are typically non-periodic, small device "initiated" communication requiring bounded latency. An Interrupt request is queued by the device until the host polls the USB device asking for data.
From the above, it seems that you want guaranteed latency, so you should use isochronous mode. There are some libraries you can use, like libusb, or you can read more on MSDN.
To find out what is making your system hang, you first need to drill down into the Windows hang. What was Windows doing while you experienced the hang?
To find this out you can take a kernel dump. Read here for how to get and analyze a kernel dump.
Depending on the findings you get there, you then need to decide whether there is anything under your control you can do about it. Since you are using a third-party library to do the writing, there is little you can do except set the I/O priority and thread priority at the thread or process level. If the library you were given links against a specific CRT, you could try to build your own customized version of it, e.g. to flush after every write, to prevent the OS from combining writes and writing the data back to disc only in big chunks.
Edit1
Your best bet would be to flush the device after every write. This could force the OS to flush any pending data and write the currently pending writes to disc, without caching writes up to a certain amount.
The second-best thing would be to simply wait after each write, to give the OS a chance to write the pending changes (though small) back to disc after a certain time interval.
If you are deeper into performance, you should try out XPerf, which has a nice GUI and even shows you the call stack where your process hung. The Windows team and many other teams at MS use this tool to troubleshoot hang experiences. The latest edition, with many more features, comes with the Windows 8 SDK. But beware that XPerf only works on Windows Vista and later.
I'm having problems with low performance using a Windows named pipe. The throughput drops off rapidly as the network latency increases. There is a roughly linear relationship between messages sent per second and round-trip time. It seems that the client must ack each message before the server will send the next one. This leads to very poor performance; I can only send 5 (~100 byte) messages per second over a link with an RTT of 200 ms.
The pipe is asynchronous, using multiple overlapped write operations (and multiple overlapped reads at the client end), but this is not improving throughput. Is it possible to send messages in parallel over a named pipe? The pipe is created using PIPE_TYPE_MESSAGE; would PIPE_READMODE_BYTE work better? Is there any other way I can improve performance?
This is a deployed solution, so I can't simply replace the pipe with a socket connection (I've read that Windows named pipes aren't recommended for use over a WAN, and I'm wondering if this is why). I'd be grateful for any help with this matter.
We found that Named Pipes had poor performance from Windows XP onwards.
I don't have a solution for you, but I concur with the notion of named pipes being useless from XP onwards. We changed our software (in terms of IPC) completely because of it.
Is your comms code factored into a separate DLL? Perhaps you could replace the DLL with one that exposes the same interface but behaves differently?
I've implemented a workaround, introducing a small (~1 ms) fixed delay to buffer up as much data as possible before writing to the pipe. Over a network link with an RTT of 200 ms, I can send ten times as much data in about a third of the time.
I send a message down the pipe when it first connects, so the client can determine the comms mode supported by the server and send data accordingly.
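A rough sketch of that buffering idea (not the poster's actual code): the ~1 ms window, the buffer limit, and the assumption of a connected, non-overlapped pipe handle are all made up for illustration. Messages are appended to an in-memory buffer and pushed down the pipe with a single WriteFile() once the window expires or the buffer fills, so many small messages share one round trip:

    // Hedged sketch of coalescing small messages before each pipe write.
    // Assumes a connected, non-overlapped pipe handle; sizes and timing are made up.
    #include <windows.h>
    #include <vector>

    class CoalescingPipeWriter {
    public:
        explicit CoalescingPipeWriter(HANDLE pipe) : pipe_(pipe) {
            buffer_.reserve(kMaxBuffer);
            windowStart_ = GetTickCount64();
        }

        void Send(const void* data, size_t len) {
            const char* p = static_cast<const char*>(data);
            buffer_.insert(buffer_.end(), p, p + len);

            // Flush when the buffer is big enough or the window has elapsed.
            // (GetTickCount64() is coarse; a real version might use QueryPerformanceCounter.)
            if (buffer_.size() >= kMaxBuffer ||
                GetTickCount64() - windowStart_ >= kWindowMs)
                Flush();
        }

        void Flush() {
            if (!buffer_.empty()) {
                DWORD written = 0;
                WriteFile(pipe_, buffer_.data(), static_cast<DWORD>(buffer_.size()),
                          &written, nullptr);   // one round trip carries many messages
                buffer_.clear();
            }
            windowStart_ = GetTickCount64();
        }

    private:
        static constexpr size_t kMaxBuffer = 64 * 1024;  // assumed coalescing limit
        static constexpr ULONGLONG kWindowMs = 1;        // assumed ~1 ms window
        HANDLE pipe_;
        std::vector<char> buffer_;
        ULONGLONG windowStart_;
    };

The receiver then needs some framing (e.g. a length prefix per message) to split the coalesced block back into individual messages.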
I would imagine that some of the WAN optimisation gear out there would be able to boost performance, as one of the things they do is understand protocols and reduce their chattiness. Given the latency of many WAN links, this alone can boost throughput and reduce timeouts.