Intercepting traffic to memcached for statistics/analysis - C++

I want to set up a statistics monitoring platform to watch a specific service, but I'm not quite sure how to go about it. Processing the intercepted data isn't my concern, just how to capture it. One idea was to set up a proxy between the client application and the service so that all TCP traffic goes first to my proxy; the proxy would then hand each intercepted message to a waiting thread/fork to pass the message on and receive the results. The other was to try to sniff the traffic between client & service.
My primary goal is to avoid any serious loss in transmission speed between client & service, but still get 100% complete communications between them.
Environment: Ubuntu 8.04
Language: C/C++
In the background I was thinking of using an SQLite DB running completely in memory or a 20-25MB memcached daemon slaved to my process.
Update:
Specifically, I am trying to track the usage of keys for a memcached daemon, storing the number of set/get successes and failures per key. Most keys contain some sort of separating character ([`|_-#]) that creates a kind of namespace, so the plan is to step in between the daemon and the client, split the keys apart on a configured separator, and record statistics on the resulting namespaces.
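For example, the counting itself could look something like this minimal sketch (the separator set, field names and sample keys are just placeholders for illustration):

    #include <map>
    #include <string>
    #include <iostream>

    // Hypothetical per-namespace counters; the field names are placeholders.
    struct KeyStats {
        unsigned gets, sets;
        KeyStats() : gets(0), sets(0) {}
    };

    // Extract the namespace prefix of a key using a configured separator set.
    std::string key_namespace(const std::string& key,
                              const std::string& seps = "`|_-#")
    {
        std::string::size_type pos = key.find_first_of(seps);
        return pos == std::string::npos ? key : key.substr(0, pos);
    }

    int main()
    {
        std::map<std::string, KeyStats> stats;

        // In the real proxy/sniffer these would come from parsed memcached
        // commands ("get user|1234", "set session#abc ...", etc.).
        stats[key_namespace("user|1234")].gets++;
        stats[key_namespace("session#abc")].sets++;

        for (std::map<std::string, KeyStats>::const_iterator it = stats.begin();
             it != stats.end(); ++it)
            std::cout << it->first << " gets=" << it->second.gets
                      << " sets=" << it->second.sets << "\n";
        return 0;
    }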

Exactly what are you trying to track? If you want a simple count of packets or bytes, or basic header information, then iptables will record that for you:
iptables -I INPUT -p tcp -d $HOST_IP --dport $HOST_PORT -j LOG $LOG_OPTIONS
If you need more detailed information, look into the iptables ULOG target, which sends each packet to userspace for analysis.
See http://www.netfilter.org for very thorough docs.

If you want to go the sniffer route, it might be easier to use tcpflow instead of tcpdump or libpcap. tcpflow outputs only the TCP payload, so you don't have to worry about reassembling the data stream yourself. If you prefer using a library instead of gluing a bunch of programs together, you might be interested in libnids.
libnids and tcpflow are also available on other Unix flavours and do not restrict you to Linux (unlike iptables).
http://www.circlemud.org/~jelson/software/tcpflow/
http://libnids.sourceforge.net/
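For instance, a rough libnids sketch along these lines collects the reassembled client-to-server payload (untested; it assumes memcached on its default port 11211, and real code would parse the get/set commands instead of printing them):

    #include <stdio.h>
    #include "nids.h"

    #define MEMCACHED_PORT 11211   /* assumption: default memcached port */

    /* Called by libnids for every TCP stream event. */
    static void tcp_callback(struct tcp_stream *a_tcp, void **param)
    {
        if (a_tcp->nids_state == NIDS_JUST_EST) {
            if (a_tcp->addr.dest == MEMCACHED_PORT)
                a_tcp->server.collect++;   /* collect client->server payload */
        } else if (a_tcp->nids_state == NIDS_DATA && a_tcp->server.count_new) {
            struct half_stream *hlf = &a_tcp->server;
            /* hlf->data holds hlf->count_new newly reassembled bytes of
               client-to-server payload; parse the memcached commands
               ("get ...", "set ...") out of it here. */
            fwrite(hlf->data, 1, hlf->count_new, stdout);
        }
    }

    int main(void)
    {
        if (!nids_init()) {
            fprintf(stderr, "nids_init: %s\n", nids_errbuf);
            return 1;
        }
        nids_register_tcp(tcp_callback);
        nids_run();
        return 0;
    }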

You didn't mention one approach: you could modify memcached or your client to record the statistics you need. This is probably the easiest and cleanest approach.
Between the proxy and the libpcap approach, there are a couple of tradeoffs:
- If you do the packet capture approach, you have to reassemble the TCP streams into something usable yourself. OTOH, if your monitor program gets bogged down, it'll just lose some packets, it won't break the cache. Same if it crashes. You also don't have to reconfigure anything; packet capture is transparent.
- If you do the proxy approach, the kernel handles all the TCP work for you. You'll never lose requests. But if your monitor bogs down, it'll bog down the app. And if your monitor crashes, it'll break caching. You probably will have to reconfigure your app and/or memcached servers so that the connections go through the proxy.
In short, the proxy will probably be easier to code, but implementing it may be a royal pain, and it had better be perfect or it will take down your caching. Changing the app or memcached seems like the sanest approach to me.
BTW: Have you looked at memcached's built-in statistics reporting? I don't think it's granular enough for what you want, but if you haven't seen it, take a look before doing actual work :-D

iptables provides libipq, a userspace packet queuing library. From the manpage:
Netfilter provides a mechanism for passing packets out of the stack for queueing to userspace, then receiving these packets back into the kernel with a verdict specifying what to do with the packets (such as ACCEPT or DROP). These packets may also be modified in userspace prior to reinjection back into the kernel.
By setting up tailored iptables rules that queue packets to libipq, and then issuing a verdict for each one, it's possible to do packet inspection for statistics analysis.
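The userspace side boils down to a read/verdict loop like this sketch (it assumes a rule such as `iptables -A INPUT -p tcp --dport 11211 -j QUEUE` is already in place; the port and buffer size are placeholders):

    #include <stdio.h>
    #include <libipq.h>
    #include <linux/netfilter.h>

    #define BUFSIZE 2048

    int main(void)
    {
        unsigned char buf[BUFSIZE];
        struct ipq_handle *h = ipq_create_handle(0, PF_INET);
        if (!h) { ipq_perror("ipq_create_handle"); return 1; }

        /* Ask the kernel to copy whole packets to userspace. */
        if (ipq_set_mode(h, IPQ_COPY_PACKET, BUFSIZE) < 0) {
            ipq_perror("ipq_set_mode");
            return 1;
        }

        for (;;) {
            if (ipq_read(h, buf, BUFSIZE, 0) < 0)
                break;
            if (ipq_message_type(buf) == IPQM_PACKET) {
                ipq_packet_msg_t *m = ipq_get_packet(buf);
                /* Inspect m->payload / m->data_len here, then reinject. */
                ipq_set_verdict(h, m->packet_id, NF_ACCEPT, 0, NULL);
            }
        }
        ipq_destroy_handle(h);
        return 0;
    }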
Another viable option is to sniff packets manually by means of libpcap or a PF_PACKET socket with socket-filter support.
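With libpcap, the skeleton of such a sniffer looks roughly like this (device name and filter expression are placeholders; the callback is where you'd parse the memcached protocol):

    #include <stdio.h>
    #include <pcap.h>

    /* Invoked once per captured packet; hdr->caplen bytes are at 'bytes'. */
    static void on_packet(u_char *user, const struct pcap_pkthdr *hdr,
                          const u_char *bytes)
    {
        printf("captured %u bytes\n", hdr->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        struct bpf_program fp;

        /* "eth0" and the filter are placeholders for your environment. */
        pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
        if (!p) { fprintf(stderr, "pcap_open_live: %s\n", errbuf); return 1; }

        if (pcap_compile(p, &fp, "tcp port 11211", 1, 0) == 0)
            pcap_setfilter(p, &fp);

        pcap_loop(p, -1, on_packet, NULL);
        pcap_close(p);
        return 0;
    }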

Related

Data access at socket at OS level TCP/IP UDP

I am no expert in network programming, although I do have some knowledge of Winsock. For any experts out there, I want to know if there is a way I can capture data at the socket coming from an application on my machine and do something with it. E.g.: I send a message via MSN but I want to capture it from a custom application before it actually gets sent.
Thanks.
You can certainly capture the packets. Tools like Wireshark are proof of that (have a look at the WinPCap library). Just keep in mind that you are capturing what an application sends, so if the application sends encrypted data using SSL/TLS or similar, that is what you are going to get. You won't be able to decrypt and view the original data without the security keys used.
Altering and/or discarding packets, on the other hand, is much harder, requiring much lower level access to the system, but it is possible (see WinDivert, for example).

Bypassing the NAT and open ports through C++ to achieve low latency

My purpose is to:
not ask to the user for opening ports on his router
doing everything by code with my application
Is it possible to do that? Considering that this application is supposed to work only with other machines that have the same application installed, can I write some kind of protocol from the ground up that can do that?
My general idea is to make the connection as fast as possible; I also have to exchange small packets, and lowering the delay is much more important for me than having high throughput.
Do not mess with the NAT. That won't help with latency all that much anyway. You're using TCP/IP, which is a pretty high level protocol, and is relatively slow. That is - the protocol does a lot of great work for you - but at a cost in terms of latency. ( It maintains connection state, and keeps packets in order, and does a decent job of guaranteeing packet delivery etc. )
If you want a very low latency network channel, use UDP - this is lower level and does not do nearly as much work as TCP. UDP simply does its best to deliver each packet to the destination without maintaining an open connection; packets don't necessarily arrive in order, and there is no way to know if a packet even got to its destination.
You need to build those things yourself - or learn to live without them.
Applications built on UDP tend to repeat a lot of information, and implement the protocol logic with a lot of room for error. The result is generally lower latency - but at a cost generally of either reliability or transfer rate.
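To make that concrete, sending a UDP datagram is essentially fire-and-forget (sketch; the address and port are placeholders, and nothing guarantees the datagram arrives):

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(9000);                   /* placeholder port */
        peer.sin_addr.s_addr = inet_addr("192.0.2.1"); /* placeholder address */

        const char msg[] = "ping";
        /* One datagram: no connection, no ordering, no delivery guarantee -
           any sequencing or acknowledgements are up to you. */
        sendto(s, msg, sizeof(msg), 0, (struct sockaddr *)&peer, sizeof(peer));

        close(s);
        return 0;
    }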
Also - if you need low latency do NOT tunnel through another protocol like tunneling through SSH or something. This will just increase latency.
This answer is geared toward your requirement that you not ask the customer to adjust any settings on their network. If you are okay with having the customer change things on their network for you, then you should just implement your own custom application as Rafael suggests.
You could use the HTTP protocol over port 80 or HTTPS (which is HTTP over SSL/TLS) protocol over port 443 if you need it encrypted. Or use SSH over port 22.
These are just ports that are typically opened up by a firewall, so you are likely to get through. There is no guarantee that they will be open.
The reason to use the HTTP protocol when using the HTTP or HTTPS is that the firewall may be doing deep packet inspection, and it may blackhole or reset your connection if it doesn't match its expectations. Same goes for using SSH over port 22. You might be better off using SSL over 443, and just hope the firewall is not decrypting the data to do HTTP inspection (just remember that it might be decrypting).
For implementation, you could use libCURL, or Boost.Asio.
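With libcurl, for example, a minimal HTTPS request is just a few calls (sketch; the URL is a placeholder and error handling is omitted):

    #include <curl/curl.h>

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);

        CURL *curl = curl_easy_init();
        if (curl) {
            /* Placeholder endpoint; HTTPS over 443 is the most likely to
               pass through a restrictive firewall unmodified. */
            curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/api");
            curl_easy_perform(curl);
            curl_easy_cleanup(curl);
        }

        curl_global_cleanup();
        return 0;
    }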

p2p open source library tcp/udp multicast support

I have a certain application running on my computer. The same application can run on many computers on a LAN or in different places in the world. I want to communicate between them, so I basically want a p2p system. But I will always know which computers (specific IP addresses) will be peers; I just want peers to have join and leave functionality. The single most important aim is communication speed and the time required. I assume simple UDP multicast (if anything like that exists) between peers will be the fastest possible solution. I don't want to retransmit messages even if they are lost. Should I use an existing p2p library, e.g. libjingle, etc., or just create some basic framework from scratch, as my needs are pretty basic?
I think you're missing the point of UDP. It's not saving any time in the sense that a message gets to the destination faster; it's just that you're posting the message and don't care if it arrives safely at the other side. On a WAN it will probably not arrive at the other side. UDP across networks is problematic, as it can be thrown out by any router on the way that is tight on bandwidth - there's no guarantee of delivery.
I wouldn't suggest using UDP out of the topology under your control.
As to P2P vs directed sockets - the question is what it is that you need to move around. Do you need bi/multidirectional communication between all the peers, or you're talking to a single server from all the nodes?
You mentioned multicast - that would mean that you have some centralized source of data that transmits information and all the rest listen. In this case there's no benefit to P2P, and multicast, as a UDP protocol, may not work well across multiple networks. But you can open TCP connections to each of the nodes and "multicast" on your own, not through IGMP. You can (and should) use threading and non-blocking sockets if you're concerned about sends blocking you, and of course you can use QoS settings to "ask" routers to rush your packets through.
You can use ZeroMQ for all the network communication:
ZeroMQ is a simple library that encapsulates TCP and UDP for high-level communication.
For P2P you can use the different 0MQ modes:
- PGM/EPGM to discover members of the P2P group on your LAN (it uses multicast)
- REQ/REP to ask a question of one member
- PULL/PUSH to duplicate one resource on the net
- Publish/subscribe to transmit a file to all requesters
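As a rough illustration of the publish/subscribe mode with the ZeroMQ 4.x C API (sketch; the endpoint and message are placeholders):

    /* Publisher side. A subscriber would zmq_connect() to this endpoint,
       set the ZMQ_SUBSCRIBE option, and zmq_recv() messages. */
    #include <zmq.h>
    #include <string.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *pub = zmq_socket(ctx, ZMQ_PUB);
        zmq_bind(pub, "tcp://*:5556");        /* placeholder port */

        const char msg[] = "status update";
        /* Delivered to every subscriber currently connected. */
        zmq_send(pub, msg, strlen(msg), 0);

        zmq_close(pub);
        zmq_ctx_term(ctx);
        return 0;
    }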
Warning: ZeroMQ is hard to install on Windows...
And for the HMI, use green-shoes?
I think you should succeed using multicast. Unfortunately I do not know of any library, but in case you have to do it from scratch, take a look at this:
http://www.tldp.org/HOWTO/Multicast-HOWTO.html
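For reference, joining a group and receiving its datagrams from scratch boils down to something like this (sketch; the group address and port are placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_port = htons(12345);            /* placeholder port */
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (struct sockaddr *)&local, sizeof(local));

        /* Join the group; 239.0.0.1 is a placeholder administratively
           scoped multicast address. */
        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1");
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

        char buf[1500];
        ssize_t n = recv(s, buf, sizeof(buf), 0); /* datagrams sent to the group */
        printf("received %zd bytes\n", n);
        return 0;
    }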
good luck :-)

Creating TCP network errors for unit testing

I'd like to create various network errors during testing. I'm using the Berkely sockets API directly in C++ on Linux. I'm running a mock server in another thread from within Boost.Test which listens on localhost.
For instance, I'd like to create a timeout during connect. So far I've tried not calling accept in my mock server and setting the backlog to 1, then making multiple connections, but all seem to successfully connect. I would think that if there wasn't room in the backlog queue I would at least get a connection refused error if not a timeout.
I'd like to do this all programmatically if possible, but I'd consider using something external like IPchains to intentionally drop certain packets to certain ports during testing. However, I'd need to automate creating and removing rules so I could do it from within my Boost.Test unit tests.
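For example, something along these lines could wrap the rule in a fixture (a sketch only; the port is a placeholder and the test process would need permission to run iptables):

    #include <cstdlib>

    // Hypothetical RAII helper: drop inbound packets to the mock server's
    // port while the fixture is alive, so connect() eventually times out.
    struct DropPortRule {
        DropPortRule()  { std::system("iptables -I INPUT -p tcp --dport 4567 -j DROP"); }
        ~DropPortRule() { std::system("iptables -D INPUT -p tcp --dport 4567 -j DROP"); }
    };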
I suppose I could mock the various system calls involved, but I'd rather go through a real TCP stack if possible.
Ideas?
When I did some intensive protocol testing recently, I used the Click modular router. The advantage is that it's quite powerful and relatively accessible. If you install Click as a kernel module on a Linux machine, you can easily reach network element parameters, both for setting and reading them. So you can, for example, change the loss rate of a drop element from 0 to 100%. While it is a little more difficult to get started with, you can simulate quite complex things with it. I personally used it this way to simulate varying bandwidth and packet loss to test an RTP video stream.
There was another question similar to this. My recommendation would be to use a network traffic generator like the IXIA. It will allow you to do many possible combinations of protocol testing in a repeatable way.

Rebind a socket to a different interface

Is there an existing Linux/POSIX C/C++ library or example code for how to rebind a socket from one physical interface to another?
For example, I have ping transmitting on a socket that is associated with a physical connection A and I want to rebind that socket to physical connection B and have the ping packets continue being sent and received on connection B (after a short delay during switch-over).
I only need this for session-less protocols.
Thank you
Update:
I am trying to provide failover solution for use with PPP and Ethernet devices.
I have a basic script which can accomplish 90% of the functionality through use of iptables, NAT and routing table.
The problem is when the failover occurs, the pings continue being sent on the secondary connection, however, their source IP is from the old connection.
I've spoken with a couple of people who work on commercial routers and their suggestion is to rebind the socket to the secondary interface.
Update 2:
I apologise for not specifying this earlier. This solution will run on a router. I cannot change the ping program because it will run on the client's computer. I used ping as just an example; any connection that is not session-based should be capable of being switched over. I tested this feature on several commercial routers and it does work. Unfortunately, their software is proprietary; however, from various conversations and testing, I found that they are re-binding the sockets on failover.
Per your updated post: the problem is that changing the routing info is not going to change the source address of your ping, it will just force it out the second interface. This answer contains some relevant info.
You'll need to change the ping program. You can use a socket-per-interface approach and somehow inform the program when to fail over. Or you will have to close the socket and then bind to the second interface.
You can get the required interface info in a couple of ways, including calling ioctl() with the SIOCGIFCONF option and looping through the returned structures to get the interface address info.
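A sketch of that enumeration (the buffer size is an arbitrary assumption, and the comment marks where a re-bind would happen):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        struct ifreq ifr[16];                 /* arbitrary upper bound */
        struct ifconf ifc;
        ifc.ifc_len = sizeof(ifr);
        ifc.ifc_req = ifr;

        if (ioctl(s, SIOCGIFCONF, &ifc) == 0) {
            int n = ifc.ifc_len / sizeof(struct ifreq);
            for (int i = 0; i < n; i++) {
                struct sockaddr_in *a = (struct sockaddr_in *)&ifr[i].ifr_addr;
                /* To "rebind", you would close the old socket, create a new
                   one, and bind() it to the chosen interface address before
                   sending again. */
                printf("%s -> %s\n", ifr[i].ifr_name, inet_ntoa(a->sin_addr));
            }
        }
        close(s);
        return 0;
    }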
I don't think that's quite a well-defined operation. The physical interfaces have different MAC addresses, so unless you have a routing layer mapping them (NAT or the like), they're going to have different IP addresses.
Ports are identified by a triple of <IP addr, Port number, protocol> so if your IP address changes the port is going to change.
What are you really trying to do here?
I'm not at all sure what you're trying to accomplish, but I have a guess... Are you trying to do some kind of failover? If so, then there are indeed ways to accomplish that, but why not do it in the OS instead of the application?
On one end you can use CARP, and on the other you can use interface trunking/bonding (terminology varies) in failover mode.