I am facing a dilemma deciding between the scenarios below and would appreciate some expert help.
Scenario: there is TCP/IP communication between two processes running on two machines.
Communication Method 1: stream-based communication over the socket. The receiver reads the incoming byte stream, interprets the first few fixed bytes as a header, deserializes it to learn the message length, extracts a message of that length and deserializes it, then proceeds to the next message header, and so on...
Communication Method 2: put all the messages in a vector that resides in a class object, serialize the class object in one go, and send it to the receiver. The receiver deserializes the class object and reads the vector elements one by one.
Please let me know which approach is more efficient, and if there is a better approach, please point me to it.
Also, what are the pros and cons of class-based versus structure-based data transmission, and which is suitable for which scenario?
Your question lacks some key details, and mixes different concerns, frustrating any attempt to provide a good answer.
Specifically, Method 2 mysteriously "serialises" and "deserialises" the object and contained vector without specifying any details of how that's being done. In practice, the details are of the kind alluded to in Method 1. So, 1 and 2 aren't alternatives unless you're choosing between using a serialisation library and doing it from scratch (in which case I'd say use the library as you're new to this and the library's more likely to get it right).
What I can say:
At the TCP level, it's most efficient to read into a decent-sized block (given I tend to work on PC/server hardware, I'd just use 64k, though smaller may be enough to get the same kind of throughput) and have each read() or recv() pull as much data from the socket as possible.
After reading enough bytes (in however many read/recvs) to attempt some interpretation of the data, it's necessary to recognise the end of particular parts of the serialised input: sometimes that's implicit in the data type involved, sometimes it's communicated using a sentinel (e.g. a linefeed or NUL), and other times there's a prefixed fixed-size "expect N bytes" header. This consideration often applies hierarchically to the stream of objects and nested sub-objects.
The TCP read/recvs may deliver more data than was sent in any single request, so the block assembled above may end with one or more bytes that logically belong to the next, still-incomplete logical message.
The process of reading larger blocks and then accessing the various fixed- and variable-sized elements inside the buffers is already supported by C++ iostreams, but you can roll your own if you want.
So, let me emphasise this: do NOT assume you will receive any more than 1 byte from any given read of the socket. If you have, say, a 20-byte header, you should loop reading until you either hit an error or have assembled all 20 bytes. Sending 20 bytes in a single write() or send() does not mean the 20 bytes will be presented to a single read()/recv(). TCP is a byte-stream protocol, and you have to take arbitrary numbers of bytes as and when they're provided, waiting until you have enough data to interpret. Similarly, be prepared to get more data than the client could write in a single write()/send().
Also, what are the pros and cons of class-based versus structure-based data transmission, and which is suitable for which scenario?
These terms are bogus: classes and structures are almost identical things in C++ - mechanisms for grouping data and related functions (they differ only in how, by default, they expose base classes and data members to client code). Either can have or lack member functions or support code that helps serialise and deserialise the data. For example, the simplest and most typical support is operator<< and/or operator>> streaming functions.
If you want to contrast this kind of streaming function with an ad-hoc "write a binary block, read a binary block" approach (perhaps from thinking of structs as PODs without support code), then I'd say prefer streaming functions where possible, starting with streaming to human-readable representations, as they'll make your system easier and quicker to develop, debug and support. Once you're really comfortable with that, if runtime performance requires it, then optimise with a binary representation. If you write the serialisation code well, you won't notice much difference in performance between a cruder void*/byte-count model of the data and proper per-member serialisation, but the latter more easily supports unusual cases: portability across systems with different-sized ints/longs, different byte ordering, intentional choices regarding shallow vs. deep copying of pointed-to data, and so on.
I'd also recommend looking at the boost serialisation library. Even if you don't use it, it should give you a better understanding of how this kind of thing is reasonably implemented in C++.
Both methods are equivalent. In both, you must send a header with the message size and an identifier in order to deserialize. If you assume the first option consists of a serialized 'class', just like a normal message, you must implement the same code.
Another thing to keep in mind is message size: you want to fill the TCP buffers to optimize communications. If the messages in your first method are small, try to improve the communication ratio with bigger messages, as in the second option you describe.
Keep in mind that it's not safe to simply stream out a struct or class directly by interpreting it as a sequence of bytes, even if it's a simple POD: there are issues like endianness (unlikely to be a real-world problem for most of us) and structure alignment/padding (a genuine potential problem).
C++ doesn't have any built-in serialization/deserialization; you'll either have to roll your own or look at something like Boost.Serialization or Google's protobuf.
If it is not a homework or study project, there may be little point in fiddling with IPC at the TCP stream level, especially if that's not something you're familiar with.
Use a messaging library, like ØMQ to send and receive complete messages rather than streams of bytes.
It's probably a pretty basic question, but I couldn't really find anything online, so I hope you can help me out here! :)
I'm currently working with C++ in Visual Studio, and I'm trying to send several different variables over a TCP connection at the same time. But I'm not quite sure what the best way to do that would look like.
I was thinking about sending one long string containing all the variables, with a symbol between every variable marking where it ends, like:
123.45#16.45#33#true#.....
But the program has to run as fast as possible, and an algorithm that parses every incoming variable out of a string doesn't sound very efficient, does it?
My second thought was to create a socket for every variable, each on a different port. But I'm not quite sure if that isn't dirty programming practice.
So what do you think? Any ideas/experiences?
Thanks in Advance!
Before answering this question, we need to be aligned on some concepts.
TCP is a stream protocol. It's as if you're accessing a regular file, but the socket API hides many complex tasks from the user. When you write to a socket, you always append to that socket's TCP stream. So accessing the same socket from two different threads will not give you what you want; instead you get two separate atomic data segments added to the TCP stream (you have just made two append operations).
"sending at the same time" may have 2 different meanings:
i) sending the variables at the same TCP session.
In this case, a TCP session is no different from writing to a binary file; this is a serialization problem rather than a network protocol problem. You have many options:
a) Convert everything to a string representation and send that buffer (fairly slow and inefficient). This is the simplest method. You also need some additional protocol to structure the written data, as you mentioned (adding # in between); JSON and XML are good examples.
b) If you know the exact order to write and read (encode/decode), just write your ints, char arrays, shorts and longs as binary (though float and double serialization may not be that straightforward). The decoder will read the binary data in the exact same order. Remember to convert to network byte order if you pick this solution.
ii) sending the variables simultaneously
If you really want to send many variables at (almost) the same time, then you'd better have many threads and many sockets. Maybe you are reading integer sensor data from one sensor which produces thousands of integers per second, and double values from another. "Simultaneously" here means that you do not have to wait for the whole batch of integers to be read before reading the doubles, or vice versa. This also saves encoding/decoding time.
I'm writing a TCP Network for a game project.
When a packet comes in, the first byte determines that packet's handling type. The packet should then be forwarded to a method that handles the packet based on its handling type.
I could have a bunch of logic cases that call a method based on the packet type, but I wanted to see what better design patterns I could implement to reduce code duplication.
I've thought about using the subscriber/notifier pattern already, I'm not fully against it, but I feel as if I'd have a bunch of Subscribe(packetType, funcReference) calls, so perhaps it isn't ideal either.
Having a big switch statement that handles each packet type is perfectly acceptable. Even in the case where there are multiple resolvers for a given packet type, you can just trigger the subscribed callbacks.
In my experience this is one of those cases where people (myself included, in the past) will over-complicate for the sake of what feels like "better" code. Switch then handle is very easy to grok at first glance, and easy to extend.
Since your packet type marker is only a byte, you can make an array of 256 pointers to handler functions. Initialize it once at program start.
Currently we have integrated networking into our game using the UDP protocol, and it works fine. But we are sending strings over the network to the server: "10,10,23 - 23,9,10 - 9,23,23"
I came across advice that I need to serialize the data, as this is the right way to do it. What are the benefits? Does it reduce performance? Or is sending strings fine?
You're already serialising it.
I think what you're asking is whether it is beneficial to serialise to a compact, binary format rather than human-readable strings. The answer is yes, since you can reduce bandwidth requirements and parsing time.
Sometimes you can simply copy the bytes that make up your objects straight into the communications medium, though watch out for endianness, padding, width, alignment and other implementation-defined quantities; generally you want to define a single, universal format for your data, and some translation may be required on one or more endpoints to produce that interchange format. That said, in most cases that's still going to be cheaper than string parsing and stringisation.
The downside is you cannot snoop on the communications channel and immediately see with your eyes what's going on, when debugging your networking.
I've never had formal training in this area so I'm wondering what do they teach in school (if they do).
Say you have two programs written in two different languages - C++ and Python, or some other combination - and you want to share a constantly updated variable on the same machine. What would you use, and why? The information need not be secured, but it must be isochronous and reliable.
E.g. Program A gets a value from a hardware device and updates variable X every 0.1 ms; I'd like to be able to access X from Program B as often as possible and obtain the latest value. Programs A and B are written and compiled in two different (robust) languages. How do I access X from Program B? Assume I have the source code for both A and B and I do not want to completely rewrite or port either of them.
The methods I've seen used thus far include:
File Buffer - Read and write to a single file (e.g. C:\temp.txt).
Create a wrapper - From A to B or B to A.
Memory Buffer - Designate a specific memory address (mutex?).
UDP packets via sockets - Haven't tried it yet but looks good.
Firewall?
Sorry for just throwing this out there, I don't know what the name of this technique is so I have trouble searching.
Well, you could write XML and use some basic message queuing (like RabbitMQ) to pass messages around.
Don't know if this will be helpful, but I'm also a student, and this is what I think you mean.
I've used marshalling to take a Java class and import it into a C# program.
With marshalling you use XML to transfer code in a way that lets it be read by other coding environments.
When asking particular questions, you should aim at providing as much information as possible. You have added a use case, but the use case is incomplete.
Your particular use case involves a very small amount of data that has to be available at a high frequency (10 kHz). I would first try to determine whether both pieces of code can be made part of a single process rather than two different processes. Depending on the languages (missing from the question) that might even be simple, or it might turn the impossible into the possible: depending on the OS (also missing from the question), the scheduler might not switch from one process to another fast enough, which would impact the availability of the latest reading. Switching between threads is usually much faster.
If you cannot turn them into a single process, then you will have to use some sort of IPC (Inter-Process Communication). Given the frequency, I would rule out most heavyweight protocols (avoid XML, CORBA), as the overhead will probably be too high. If the receiving end only needs access to the latest value, and that access may be less frequent than every 0.1 ms, then you don't want any protocol that involves queueing: you do not care about the next element in the queue, only the last one, and if you did not read an element while it was fresh, you want to avoid the cost of processing it once it is stale. That is, it does not make sense to loop, extracting from the queue and discarding.
I would be inclined to use shared memory, or a memory-mapped shared file (they are probably quite similar; it depends on the platform, missing from the question). Depending on the size of the element and the exact hardware architecture (missing from the question) you may be able to avoid locking with a mutex. As an example, on current Intel processors, read/write access to a 32-bit integer in memory is guaranteed to be atomic if the variable is correctly aligned, so in that case you would not need a lock.
At my school they teach CORBA. They shouldn't; it's an ancient, hideous technology from the era of mainframes, a classic case of design-by-committee: every feature you don't want is included, and some that you probably do want (asynchronous calls?) aren't. If you think the C++ specification is big, think again.
Don't use it.
That said though, it does have a nice, easy-to-use interface for doing simple things.
But don't use it.
It almost always passes through a C binding.
I need to send a C struct over the wire (using UDP sockets, and possibly XDR at some point) at a fairly high update rate, which potentially causes lots of redundant and unnecessary traffic at several khz.
This is because some of the data in the struct may not have changed at times, so I thought that delta-encoding the current struct against the previous one would be a good idea, much like a "diff".
But I am wondering, what's the best approach of doing something like this, ideally in a portable manner that also ensures that data integrity is maintained? Would it be possible to simply XOR the data and proceed like this?
Similarly, it would be important that the approach remains extensible enough, so that new fields can be added to the struct or reordered if necessary (padding), which sounds as if it'd require versioning information, as well.
Any ideas or pointers (are there existing libraries?) would be highly appreciated!
Thanks
EDIT: Thanks to everyone one who provided an answer, the level of detail is really appreciated, I realize that I probably should not have mentioned UDP though, because that is in fact not the main problem, because there is already a corresponding protocol implemented on top of UDP that accounts for the mentioned difficulties, so the question was really meant to be specific to feasible means of delta encoding a struct, and not so much about using UDP in particular as a transport mechanism.
UDP does not guarantee that a given packet was actually received, so encoding whatever you transmit as a "difference from last time" is problematic -- you can't know that your counterpart has the same idea as you about what the "last time" was. Essentially you'd have to build some overhead on top of UDP to check what packets have been received (tagging each packet with a unique ID) -- everybody who's tried to go this route will agree that more often than not you find yourself more or less duplicating the TCP streaming infrastructure on top of UDP... only, most likely, not as solid and well-developed (although admittedly sometimes you can take advantage of very special characteristics of your payloads in order to gain some modest advantage over plain good old TCP).
Does your transmission need to be one-way, sender to receiver? If that's the case (i.e., it's not acceptable for the receiver to send acknowledgments or retransmits) then there's really not much you can do along these lines. The one thing that comes to mind: if it's OK for the receiver to be out of sync for a while, then the sender could send two kinds of packets -- one with a complete picture of the current value of the struct, and an identifying unique tag, to be sent at least every (say) 5 minutes (so realistically the receiver may be out of sync for up to 15 minutes if it misses two of these "big packets"); one with just an update (diff) from the last "big packet", including the big packet's identifying unique tag and (e.g.) a run-length-encoded version of the XOR you mention.
Of course, once it has prepared the run-length-encoded version, the sender will compare its size with the size of the whole struct and only send the delta-style packet if the savings are substantial; otherwise it might as well send the big packet a bit earlier than needed (gaining reliability). The receiver will keep track of the last big-packet unique tag it has received and only apply deltas which pertain to it (this helps against missing packets and packets delivered out of order, depending on how sophisticated you want to make your client).
The need for versioning etc., depending on what exactly you mean (will senders and receivers with different ideas about the struct's C layout need to communicate regularly? how do they handshake about which versions are known to both? etc.), will add a whole further universe of complications, but that is really another question, and your core question as summarized in the title is already plenty big enough ;-).
If you can afford occasional meta-messages from the receiver back to the sender (acks or requests to resend) then depending on the various numerical parameters in play you may design different strategies. I suspect acks would have to be pretty frequent to do much good, so a request to resend a big-packet (either a specifically identified one or "whatever you have that's freshest") may be the best meta-strategy to cull the options space (which otherwise threatens to explode;-). If so then the sender may be blissfully ignorant of whatever strategy the receiver is using to request big-packet-resends, and you can experiment on the receiver side with various such strategies without needing to redeploy the sender as well.
It's hard to offer much more help without some specifics, i.e., at least ballpark numbers for all the numerical parameters -- packet sizes, frequencies of sending, how long is it tolerable for the sender to be out of sync with the receiver, a bundle of network parameters, etc etc. But I hope this somewhat generic analysis and suggestions still help.
To delta encode:
1) Send "key frames" periodically (e.g. once a second). A key frame is a complete copy (rather than a delta), so that if you lose comms for any reason you only lose a small amount of data before you can "acquire the signal" again. Use a simple packet header that allows you to detect the start of a packet and know what type of data it contains.
2) Calculate the delta from the previous packet and encode that in a compact form. By examining the type of data you are sending and the way it typically changes, you should be able to devise a pretty compact delta. However, you may need to check the size of the delta - in some cases it may not be an efficient encoding - if it's bigger than a key frame you can just send another key frame instead. You can also decide at this point whether your deltas are lossy or lossless.
3) Add a CRC check to the packet (search for CRC32). This will allow the receiver to verify that the packet has been received intact, allowing them to skip invalid packets.
NOTES:
Be careful about doing this over UDP - it gives no guarantee that your packets will arrive in the same order you sent them. Obviously a delta will only work if packets arrive in order. In this case, you will need to add some form of sequence ID to each packet (the first packet is "1", the second is "2", etc.) so that you can detect out-of-order delivery. You may even need to keep a buffer of "n" packets in the receiver so that you can reassemble them in the correct order before decoding them (but of course, this could introduce some latency). You will probably also lose some packets over UDP, in which case you'll need to wait until the next key frame before you can "re-acquire the signal" - so the key frames must be frequent enough to avoid catastrophic outages in your comms.
Consider using compression (e.g. zip etc.). You may find a full packet can be built in a zip-friendly manner (e.g. rearrange data to group bytes that are likely to have similar values (especially zeros) together) and then compressed so well that it is smaller than an uncompressed delta, in which case you won't need to go to all the effort of deltas at all (and you won't have to worry about packet ordering etc.).
edit
- Always use a version number (or packet type) in your packets so you can add new fields or change the delta encoding in the future! You'll need this for differentiating key/delta frames anyway.
I'm not convinced that delta encoding values on UDP - which is inherently unreliable and out of order - is going to be particularly easy. Instead, I'd send an ID of the field which has changed and its current value. That also doesn't require anything to change if you want to add extra fields to the data structure you're sending. If you want a standard way of doing this, look at SNMP; that may be something you can drop in, or it may be a bit baggy for you (it qualifies the field names globally and uses ASN.1 - both of which give maximum interoperability, but at the cost of some bytes in the packet).
Use an RPC mechanism like CORBA, or Protocol Buffers
Use DTLS with a compression option
Use a packed format
Repurpose an existing header compression library