parsing an XMPP stream with libxml2 - c++

I'm a beginner when it comes to libxml2, so here is my question.
I'm working on a small XMPP client. I receive a stream from the network, and each received buffer is fed into my Parser class chunk by chunk as the data arrives. A chunk may contain an incomplete fragment of XML data:
<stream><presence from='user1#dom
and the next read from the socket should deliver the rest:
ain.com to='hatter#wonderland.lit/'/>
The parser should report an error in this case.
I'm only interested in elements at depth 0 and depth 1, like stream and presence in my example above. I need to parse this kind of stream and, for each of these elements, create an xmlNodePtr (I have classes representing the stream and presence elements that take an xmlNodePtr as input). This means I must be able to create an xmlNodePtr from only a start element like <stream>, because the associated end element (</stream> in this case) is received only when the communication is finished.
I would like to use a pull parser.
What are the best functions to use in this case? xmlReaderForIO, xmlReaderForMemory, etc.?
Thank you !

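You probably want a push parser using xmlCreatePushParserCtxt and xmlParseChunk: you feed it each chunk as it arrives, and it invokes the SAX callbacks as soon as elements are recognized, so a start element is reported without waiting for its end element. Even better would be to choose one of the existing open source C libraries for XMPP. For example, here is the code from libstrophe that does what you want already.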

decompressing IMAP deflated message

I have an issue trying to decompress an IMAP message compressed using the deflate method. So far I have isolated one direction of an IMAP conversation (using Wireshark's "follow TCP stream" function) and saved the message data in raw format, which I hope contains only the deflated message part. I then found some programs like tinf (1st and 3rd example) and miniz (tgunzip example) and tried to inflate that file, but with no success.
Am I missing something? Thank you in advance.
tinf - http://www.ibsensoftware.com/download.html
Miniz - https://code.google.com/archive/p/miniz/source/default/source
Try piping that raw data to:
perl -MCompress::Zlib -pe 'BEGIN{$i = inflateInit(-WindowBits => -15)}
$_=$i->inflate($_)'
The important part is -WindowBits => -15, which switches the expected format to raw deflate, without the zlib header and Adler-32 checksum.
(This is derived from the Dovecot source and works for me on a Thunderbird-to-Gmail network capture.)
From RFC4978 that specifies IMAP compression (emphasis mine):
When using the zlib library (see RFC1951), the functions
deflateInit2(), deflate(), inflateInit2(), and inflate() suffice to
implement this extension. The windowBits value must be in the range
-8 to -15, or else deflateInit2() uses the wrong format.
deflateParams() can be used to improve compression rate and resource
use. The Z_FULL_FLUSH argument to deflate() can be used to clear the
dictionary (the receiving peer does not need to do anything).

Send metadata along with Akka stream

Here is my previous question: Send data from InputStream over Akka/Spring stream
I have managed to send a compressed and encrypted file over an Akka stream. Now I am looking for a way to transport metadata along with the data, mainly the filename and a hash (checksum).
My current idea is to use the Flow.prepend function and insert the metadata before the data, in this order:
filename, that can vary in size but always ends with null byte
fixed size hash (checksum)
data
Then, on the receiving end, I would have to use Flow.takeWhile twice, once to read the filename and a second time to read the hash, and then just read the data. It doesn't look like an elegant solution, and if I want to add more metadata in the future it will become even worse.
I have noticed method Flow.named, however documentation says just:
Add a ``name`` attribute to this Flow.
and I do not know how to use this (or whether it is possible to transport the filename with it).
The question is: is there a better way to transport metadata along with the data over an Akka stream than the one above?
EDIT: Attaching my drawing with idea.
I think prepending the metadata makes sense. A simple approach could be to prepend the metadata using the same framing you use to send the data.
The receiving end will need to know how many metadata blocks there are, and use this information to split them off from the data. See the example below.
// client end: send filename, then hash, then data
filenameSrc
  .concat(hashSrc)
  .concat(dataSrc)
  .via(Framing.delimiter(ByteString("\n"), Int.MaxValue, allowTruncation = true))
  .via(Tcp().outgoingConnection(???, ???))
  .runForeach{ ??? }
// server end
val printMetadata =
  Flow.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
    import GraphDSL.Implicits._
    val metadataSink = Sink.foreach(println)
    val bcast = builder.add(Broadcast[ByteString](2))
    // the first two frames are the metadata (filename and hash)
    bcast.out(0).take(2) ~> metadataSink
    // every frame after the first two is data
    FlowShape(bcast.in, bcast.out(1).drop(2).outlet)
  })
val handler =
  Framing.delimiter(ByteString("\n"), Int.MaxValue)
    .via(printMetadata)
    .via(???)
This is only one of the many possible approaches to solve this. But whatever solution you choose, the receiver will need to have knowledge of how to extract the metadata from the raw stream of bytes it reads over TCP.

Serialize and deserialize the message using google protobuf in socket programming in C++

Message format to send to server side as below :
package test;
message Test {
required int32 id = 1;
required string name = 2;
}
Server.cpp to do encoding :
string buffer;
test::Test original;
original.set_id(0);
original.set_name("original");
original.AppendToString(&buffer);
send(acceptfd,buffer.c_str(), buffer.size(),0);
This send call returns without any error, so I assume the data reaches the client.
My question is: how do I decode the above message on the client side using Google Protocol Buffers, so that I can see/print it?
You should send more than just the protobuf message to be able to decode it on the client side.
A simple solution would be to send the value of buffer.size() over the socket as a 4-byte integer using network byte order, and then send the buffer itself.
The client should first read the buffer's size from the socket and convert it from network to host byte order. Let's denote the resulting value s. The client must then preallocate a buffer of size s and read s bytes from the socket into it. After that, just use MessageLite::ParseFromString to reconstruct your protobuf.
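The length-prefix scheme described above can be sketched as follows. This is a minimal illustration, not the answerer's code: frame_message and try_unframe are hypothetical helper names, and the payload is treated as an opaque byte string (for example the output of SerializeToString). The length is written byte by byte in big-endian order, which is what network byte order means.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: prepend a 4-byte big-endian (network byte order)
// length to the serialized payload.
std::string frame_message(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 4 bytes
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.reserve(4 + payload.size());
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Hypothetical helper: try to take one complete frame off the front of
// `buffer` (the bytes received so far). Returns true and fills `payload`
// when a whole frame is present; otherwise returns false and leaves
// `buffer` untouched so more received bytes can be appended later.
bool try_unframe(std::string& buffer, std::string& payload) {
    if (buffer.size() < 4) return false;  // length header not complete yet
    const auto byte_at = [&buffer](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(buffer[i]));
    };
    const std::uint32_t n =
        (byte_at(0) << 24) | (byte_at(1) << 16) | (byte_at(2) << 8) | byte_at(3);
    if (buffer.size() - 4 < n) return false;  // body not complete yet
    payload.assign(buffer, 4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

On the server, the framed string would be passed to send() instead of the bare buffer. On the client, each chunk returned by recv() would be appended to a std::string, and every payload that try_unframe extracts could then be handed to ParseFromString.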
See here for more info on protobuf message methods.
Also, this document discourages the usage of required:
You should be very careful about marking fields as required. If at
some point you wish to stop writing or sending a required field, it
will be problematic to change the field to an optional field – old
readers will consider messages without this field to be incomplete and
may reject or drop them unintentionally. You should consider writing
application-specific custom validation routines for your buffers
instead. Some engineers at Google have come to the conclusion that
using required does more harm than good; they prefer to use only
optional and repeated. However, this view is not universal.

How to determine length of buffer at client side

I have a server sending a multi-dimensional character array
char buff1[][3] = { {0xff,0xfd,0x18} , {0xff,0xfd,0x1e} , {0xff,0xfd,21} }
In this case buff1 carries 3 messages (each having 3 characters). There could be multiple such buffers on the server side, each holding a different number of messages (note: each message will always have 3 characters), e.g.
char buff2[][3] = { {0xff,0xfd,0x20},{0xff,0xfd,0x27}}
How should I store the size of these buffers on the client side when compiling the code?
The server should send information about the length (and any other structure) of the message with the message as part of the message.
An easy way to do that is to send the number of bytes in the message first, then the bytes in the message. Often you also want to send the version of the protocol (so you can detect mismatches) and maybe even a message id header (so you can send more than one kind of message).
If blazing-fast performance isn't the goal, using a higher-level protocol or format (JSON, XML, whatever) is sometimes a good idea. You are talking over a network interface, which tends to be much slower than the computers at either end, so parsing may be cheap enough that you don't care. This also helps with debugging, because instead of debugging your custom protocol, you get to debug the higher-level format.
Alternatively, you can send some sign that the sequence has terminated. If there is a value that is never a valid sequence element (such as 0,0,0), you could send that to say "no more data". Or you could send each element with a header saying if it is the last element, or the header could say that this element doesn't exist and the last element was the previous one.
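As a rough sketch of the "length and version first" idea for the 3-byte messages in the question, the server could send a small header followed by the messages. The wire layout, the kProtocolVersion constant, and the function names below are assumptions made for illustration; they are not part of any existing protocol.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Triple = std::array<std::uint8_t, 3>;  // one 3-character message

// Hypothetical protocol version, used to detect mismatched peers.
constexpr std::uint8_t kProtocolVersion = 1;

// Assumed layout: [version: 1 byte][count: 2 bytes, big-endian][count * 3 bytes]
std::vector<std::uint8_t> encode_triples(const std::vector<Triple>& msgs) {
    assert(msgs.size() <= 0xFFFF);  // count must fit in 2 bytes
    std::vector<std::uint8_t> out;
    out.push_back(kProtocolVersion);
    out.push_back(static_cast<std::uint8_t>((msgs.size() >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(msgs.size() & 0xFF));
    for (const Triple& m : msgs) out.insert(out.end(), m.begin(), m.end());
    return out;
}

// Returns false on a version mismatch or when the byte count does not match
// the header; on success `msgs` holds the decoded messages.
bool decode_triples(const std::vector<std::uint8_t>& in, std::vector<Triple>& msgs) {
    if (in.size() < 3 || in[0] != kProtocolVersion) return false;
    const std::size_t count = (static_cast<std::size_t>(in[1]) << 8) | in[2];
    if (in.size() != 3 + count * 3) return false;
    msgs.assign(count, Triple{});
    for (std::size_t i = 0; i < count; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            msgs[i][j] = in[3 + i * 3 + j];
    return true;
}
```

With this layout the client never needs to know the buffer sizes at compile time: it reads the 3-byte header, learns the count, and then knows that exactly count * 3 more bytes follow.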

synchronizing between send/recv in sockets

I have a server that sends out data records as strings of varying length (e.g. 79, 80, 81, 82 bytes).
I want to be able to receive exactly one record at a time. I've delimited records with a (r), but because I don't know how many bytes I have to receive, a single receive sometimes merges records, which makes them difficult to process.
I have two ideas for you:
1. Use XML for the protocol. This way you know exactly when each message ends.
2. Send the packet size in the header of each "packet"; this way you know how much to read from the socket for this specific packet.
Edit:
Look at this dummy code for (2)
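int buffer_size;
char* buffer;
// read the size header first (sender and receiver must agree on int size and byte order)
read( socket, &buffer_size, sizeof(buffer_size));
// allocate exactly as many bytes as the header announced
buffer = (char*) malloc(buffer_size);
read( socket, buffer, buffer_size );
// do something
free( buffer) ;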
EDIT:
I recommend looking at the comments here: they note that the content might not all be available after a single read(), so you need to keep calling read() until you have received the full buffer size.
Also, you might not need to read the size. Basically you need to look for the closing top-level tag of the XML. This can be done by parsing the whole XML, or by partially parsing the XML you get from the stream until you have 0 nodes "open".
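The point about repeated reads can be sketched like this. It is a minimal POSIX example under stated assumptions: read_exact, read_packet, and the kMaxPacket limit are hypothetical names, and the size header is a 4-byte integer in the sender's native byte order, as in the dummy code above (so both ends must run on compatible machines).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unistd.h>

// Keep calling read() until exactly `len` bytes have arrived.
// Returns false on a read error or if the peer closes the connection early.
bool read_exact(int fd, void* dst, std::size_t len) {
    assert(dst != nullptr || len == 0);
    char* p = static_cast<char*>(dst);
    while (len > 0) {
        const ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error
        }
        if (n == 0) return false;          // end of stream before `len` bytes
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical upper bound so a corrupt header cannot force a huge allocation.
constexpr std::uint32_t kMaxPacket = 1u << 20;

// Read one size-prefixed packet into `body`.
bool read_packet(int fd, std::string& body) {
    std::uint32_t size = 0;  // assumption: same byte order on both ends
    if (!read_exact(fd, &size, sizeof size)) return false;
    if (size > kMaxPacket) return false;
    body.resize(size);
    return size == 0 || read_exact(fd, &body[0], size);
}
```

Using std::string for the body also removes the manual malloc/free pair from the dummy code.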
You should delimit with a null byte. Show us your code, and we may be able to help you.
Stream sockets do not natively support an idea of a "record" - the abstraction they provide is that of a continuous stream.
You must implement a layer on top of them to provide "records". It sounds like you are already part way there, with the end-of-record delimiter. The pseudo-code to complete it is:
create empty buffer;
forever {
    recv data and append to buffer;
    while (buffer contains end-of-record marker) {
        remove first record from buffer and process it;
        move remaining data to beginning of buffer;
    }
}
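The pseudo-code above could be turned into C++ roughly as follows. RecordSplitter is a hypothetical class name, and the delimiter is passed in by the caller because it depends on what the server actually sends. The caller would pass each chunk returned by recv() to feed().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Accumulates received bytes and hands back complete records only.
class RecordSplitter {
public:
    explicit RecordSplitter(char delimiter) : delimiter_(delimiter) {}

    // Append freshly received bytes and return every record that is now
    // complete. An unfinished record stays buffered until a later call.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        buffer_.append(data, len);
        std::vector<std::string> records;
        std::size_t start = 0;
        for (;;) {
            const std::size_t end = buffer_.find(delimiter_, start);
            if (end == std::string::npos) break;  // no complete record left
            records.push_back(buffer_.substr(start, end - start));
            start = end + 1;                      // skip the delimiter itself
        }
        buffer_.erase(0, start);  // keep only the incomplete tail
        return records;
    }

private:
    char delimiter_;
    std::string buffer_;
};
```

Because the splitter keeps the incomplete tail between calls, it does not matter whether one recv() delivers half a record, exactly one record, or several merged records.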
Are you sending your data as a stream?
You can send it as a structure which is easier to parse and retrieve the data from.
struct Message
{
    int dataSize;
    char data[256];
};