How do you link multipart AIS messages? - ais

The message format is
!AIVDM,2,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C
The second field (in this case, 2), designates the number of parts in the AIS message and the third field (in this case, 1) indicates the part or fragment sequence.
If the messages do NOT arrive in sequence, is there a fail safe method of linking the message fragments? [I understand that several fragments can arrive in random order.]

Message parts must follow an order.If they are not sequential you should not take the message into account.IEC 61162 document which is a standard defines the structures of NMEA sentences including AIS messages,says that the multipart messages have to be in order.If you are facing messages in random order it is a hardware failure as manufacturer have to obey the rules defined in IEC 61162 Standard.

The AIS payloads are encapsulated within the NMEA message format -- NMEA sees them only as a black box -- so you should not expect a NMEA parser to be sensitive to the encoded AIS content.
If you were desperate, you could try to match the multipart messages by the expected lengths of each AIS payload message type, but you'd run into the following problems:
Some message types (e.g. type 8: Binary Broadcast Message) have a variable length
There are conventions for aligning messages on byte boundaries that are not always followed, so even fixed-length messages can be received with differing lengths (e.g. type 15).
The best you can do in practice is assume that multipart messages may only arrive interleaved if the two messages have different NMEA message types (e.g. one message is AIVDM, and the other is SAVDM). This is the approach taken in the NMEA parsing library I wrote, under each_complete_message.
You could go further and use the total number of message parts as a way to differentiate messages, but in practice 3-part messages seem to be exceedingly rare.
TL;DR
No, there is no fail safe method of linking fragments.

Maybe you can check why the third field is empty and whether it can be activated.
ShineMicro uses it in the following way:
!AIVDM,X1,X2,X3,A,S--S,X4*CRC
where
X1 - total number of sentences (parts) needed to transmit 1 AIS message
X2 - sentence number (1 to 9)
X3 - sequential message identifier (0-9), sequentially assigned and is incremented for each new multy-sentences message

If you are using Python, you can use the libais library: https://github.com/schwehr/libais/. The NmeaQueue works a FIFO queue and does the job for you.
Just put all the messages (whatever the order) in the queue, and call the pop method if the length of the queue is positive. It handles single & multipart messages.
import ais
q = ais.nmea_queue.NmeaQueue()
for msg in msg_generator: # can be a list for instance
q.put(msg)
if q.qsize():
d = q.get().get('decoded', None)
print

Related

How to tell if SSL_read has received and processed all the records from single message

Following is the dilemma,
SSL_read, on success returns number of bytes read, SSL_pending is used to tell if the processed record has more that to be read, that means probably buffer provided is not sufficient to contain the record.
SSL_read may return n > 0, but what if this happens when first records has been processed and message effectively is multi record communication.
Question: I am using epoll to send/receive messages, which means I have to queue up event in case I expect more data. What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
PS: This code hasn't been tested so it may be incorrect. Purpose of the code is to share the idea that I am trying to implement.
Following is code snippet for the read -
//read whatever is available.
while (1)
{
auto n = SSL_read(ssl_, ptr_ + tail_, sz_ - tail_);
if (n <= 0)
{
int ssle = SSL_get_error(ch->ssl_, rd);
auto old_ev = evt_.events;
if (ssle == SSL_ERROR_WANT_READ)
{
//need more data to process, wait for epoll notification again
evt_.events = EPOLLIN | EPOLLERR;
}
else if (err == SSL_ERROR_WANT_WRITE)
{
evt_.events = EPOLLOUT | EPOLLERR;
}
else
{
/* connection closed by peer, or
some irrecoverable error */
done_ = true;
tail_ = 0; //invalidate the data
break;
}
if (old_ev != evt_.events)
if (epoll_ctl(epoll_fd_, EPOLL_CTL_MOD, socket_fd_, &evt_) < 0)
{
perror("handshake failed at EPOLL_CTL_MOD");
SSL_free(ssl_);
ssl_ = nullptr;
return false;
}
}
else //some data has been read
{
tail_ = n;
if (SSL_pending(ssl_) > 0)
//buffer wasn't enough to hold the content. resize and reread
resize();
else
break;
}
}
```
enter code here
SSL_read() returns the number of decrypted bytes returned in the caller's buffer, not the number of bytes received on the connection. This mimics the return value of recv() and read().
SSL_pending() returns the number of decrypted bytes that are still in the SSL's buffer and haven't been read by the caller yet. This would be equivalent to calling ioctl(FIONREAD) on a socket.
There is no way to know how many SSL/TLS records constitute an "application message", that is for the decrypted protocol data to dictate. The protocol needs to specify where a message ends and a new message begins. For instance, by including the message length in the message data. Or delimiting messages with terminators.
Either way, the SSL/TLS layer has no concept of "messages", only an arbitrary stream of bytes that it encrypts and decrypts as needed, and transmits in "records" of its choosing. Similar to how TCP breaks up a stream of arbitrary bytes into IP frames, etc.
So, while your loop is reading arbitrary bytes from OpenSSL, it needs to process those bytes to detect separations between protocol messages, so it can then act accordingly per message.
What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
I'd have hoped that your message has a header with the number of records in it. Otherwise the protocol you've got is probably unparseable.
What you'd need is to have a stateful parser that consumes all the available bytes and outputs records once they are complete. Such a parser needs to suspend its state once it reaches the last byte of decrypted input, and then must be called again when more data is available to be read. But in all cases if you can't predict ahead of time how much data is expected, you won't be able to tell when the message is finished - that is unless you're using a self-synchronizing protocol. Something like ATM headers would be a starting point. But such complication is unnecessary when all you need is just to properly delimit your data so that the packet parser can know exactly whether it's got all it needs or not.
That's the problem with sending messages: it's very easy to send stuff that can't be decoded by the receiver, since the sender is perfectly fine with losing data - it just doesn't care. But the receiver will certainly need to know how many bytes or records are expected - somehow. It can be told this a-priori by sending headers that include byte counts or fixed-size record counts (it's the same size information just in different units), or a posteriori by using unique record delimiters. For example, when sending printable text split into lines, such delimiters can be Unicode paragraph separators (U+2029).
It's very important to ensure that the record delimiters can't occur within the record data itself. Thus you need some sort of a "stuffing" mechanism, where if a delimiter sequence appears in the payload, you can alter it so that it's not a valid delimiter anymore. You also need an "unstuffing" mechanism so that such altered delimiter sequences can be detected and converted back to their original form, of course without being interpreted as a delimiter. A very simple example of such delimiting process is the octet-stuffed framing in the PPP protocol. It is a form of HDLC framing. The record separator is 0x7E. Whenever this byte is detected in the payload, it is escaped - replaced by a 0x7D 0x5E sequence. On the receiving end, the 0x7D is interpreted to mean "the following character has been XOR'd with 0x20". Thus, the receiver converts 0x7D 0x5E to 0x5E first (it removes the escape byte), and then XORs it with 0x20, yielding the original 0x7E. Such framing is easy to implement but potentially has more overhead than framing with a longer delimiter sequence, or even a dynamic delimiter sequence whose form differs for each position within the stream. This could be used to prevent denial-of-service attacks, when the attacker may maliciously provide a payload that will incur a large escaping overhead. The dynamic delimiter sequence - especially if unpredictable, e.g. by negotiating a new sequence for every connection - prevents such service degradation.

How to determine length of buffer at client side

I have a server sending a multi-dimensional character array
char buff1[][3] = { {0xff,0xfd,0x18} , {0xff,0xfd,0x1e} , {0xff,0xfd,21} }
In this case the buff1 carries 3 messages (each having 3 characters). There could be multiple instances of buffers on server side with messages of variable length (Note : each message will always have 3 characters). viz
char buff2[][3] = { {0xff,0xfd,0x20},{0xff,0xfd,0x27}}
How should I store the size of these buffers on client side while compiling the code.
The server should send information about the length (and any other structure) of the message with the message as part of the message.
An easy way to do that is to send the number of bytes in the message first, then the bytes in the message. Often you also want to send the version of the protocol (so you can detect mismatches) and maybe even a message id header (so you can send more than one kind of message).
If blazing fast performance isn't the goal (and you are talking over a network interface, which tends to be slower than computers: parsing may be cheap enough that you don't care), using a higher level protocol or format is sometimes a good idea (json, xml, whatever). This also helps with debugging problems, because instead of debugging your custom protocol, you get to debug the higher level format.
Alternatively, you can send some sign that the sequence has terminated. If there is a value that is never a valid sequence element (such as 0,0,0), you could send that to say "no more data". Or you could send each element with a header saying if it is the last element, or the header could say that this element doesn't exist and the last element was the previous one.

Changing this protocol to work with TCP streaming

I made a simple protocol for my game:
b = bool
i = int
sINT: = string whose length is INT followed by a : then the string
m = int message id.
Example:
m133s11:Hello Worldi-57989b0b1b0
This would be:
Message ID 133
String 'Hello World' length 11
int -57989
bool false
bool true
bool false
I did not know however that TCP could potentially only send PART of a message. I'm not sure exactly how I could modify this such that I can do the following:
on receive data from client:
use client's chunk parser
process data
if has partial message then try to find matching END
if no partial messages then try to read a whole message
for each complete message in queue, dispatch it
I could do this by adding B at the beginning of a message and E at the end, then parsing through for the first char to be B and last to be E.
The only problem is what if
I receive something silly in the middle that does not follow the protocol. Or, what if I was supposed to just receive something that is not a message and is just a string. So if I was somehow intended to receive the string HelloB, then I would parse this as hello and the beginning of a message, but I would never receive that message because it is not a message.
How could I modify my protocol to solve these potential issues? As much as I anticipate only ever receiving correctly formed messages, it would be a nightmare if one was poorly encoded and set everything out of whack.
Thanks
I decided to add the length at the beginning and keep track of if I'm working on a message or not:
so:
p32m133s11:Hello Worldi-57989b0b1b0
I then have 3 states, reading to find 'p', reading to find the length after 'p' or reading bytes until length bytes have been read.
What do you think?
It seems to work great.
What you are doing is pretty old-school, magnetic tape stuff. Nice.
The issue you might have is that if a part of the message is received, you cannot tell if you are partway through a token.
E.g. if you receive:
m12
Is this Message 12, or is it the first part of message 122?
If you receive:
i-12
Is this an integer -12 or is it the first part of an integer -124354?
So I think you need to change it so that the message numbers are fixed width (e.g. four digits), the string length is fixed (e.g. 6 digits) and the integer width is fixed at 10 digits.
So your example would be:
m_133s____11:Hello Worldi____-57989b0b1b0
That way if you get the first part of a message you can store it and wait for the remainder to be received before you process it.
You might also consider using control characters to separate message parts. There are ascii control codes often used for this purpose, RS, FS, GS and US. So a message could be
[RS]FieldName[US]FieldValue[RS]fieldName[US]FieldValue[GS].
You know when you have a complete message because the [GS] marks the end. You can then divide it up into fields using the [RS] as a separator, and split each into name/value using the [US].
See http://en.wikipedia.org/wiki/C0_and_C1_control_codes for a brief bit of information.

parsing an XMPP stream with libxml2

I'm a beginner when it comes to libxml2, so here is my question:
I'm working at a small XMPP client. I have a stream that I receive from the network, the received buffer is fed into my Parser class, chunk by chunk, as the data is received. I may receive incomplete fragments of XML data:
<stream><presence from='user1#dom
and at the next read from socket I should get the rest:
ain.com to='hatter#wonderland.lit/'/>
The parser should report an error in this case.
I'm only interested in elements having depth 0 and depth 1, like stream and presence in my example above. I need to parse this kind of stream and for each of this elements, depth 0 or 1, create a xmlNodePtr (I have classes representing stream, presence elements that take as input a xmlNodePtr). So this means I must be able to create an xmlNodePtr from only an start element like , because the associated end element( in this case) is received only when the communication is finished.
I would like to use a pull parser.
What are the best functions to use in this case ? xmlReaderForIO, XmlReaderForMemory etc ?
Thank you !
You probably want a push parser using xmlCreatePushParserCtxt and xmlParseChunk. Even better would be to choose one of the existing open source C libraries for XMPP. For example, here is the code from libstrophe that does what you want already.

Handling TCP Streams

Our server is seemingly packet based. It is an adaptation from an old serial based system. It has been added, modified, re-built, etc over the years. Since TCP is a stream protocol and not a packet protocol, sometimes the packets get broken up. The ServerSocket is designed in such a way that when the Client sends data, part of the data contains the size of our message such as 55. Sometimes these packets are split into multiple pieces. They arrive in order but since we do not know how the messages will be split, our server sometimes does not know how to identify the split message.
So, having given you the background information. What is the best method to rebuild the packets as they come in if they are split? We are using C++ Builder 5 (yes I know, old IDE but this is all we can work with at the moment. ALOT of work to re-design in .NET or newer technology).
TCP guarantees that the data will arrive in the same order it was sent.
That beeing said, you can just append all the incoming data to a buffer. Then check if your buffer contains one or more packets, and remove them from the buffer, keeping all the remaining data into the buffer for future check.
This, of course, suppose that your packets have some header that indicates the size of the following data.
Lets consider packets have the following structure:
[LEN] X X X...
Where LEN is the size of the data and each X is an byte.
If you receive:
4 X X X
[--1--]
The packet is not complete, you can leave it in the buffer. Then, other data arrives, you just append it to the buffer:
4 X X X X 3 X X X
[---2---]
You then have 2 complete messages that you can easily parse.
If you do it, don't forget to send any length in a host-independant form (ntohs and ntohl can help).
This is often accomplished by prefixing messages with a one or two-byte length value which, like you said, gives the length of the remaining data. If I've understood you correctly, you're sending this as plain text (i.e., '5', '5') and this might get split up. Since you don't know the length of a decimal number, it's somewhat ambiguous. If you absolutely need to go with plain text, perhaps you could encode the length as a 16-bit hex value, i.e.:
00ff <255 bytes data>
000a <10 bytes data>
This way, the length of the size header is fixed to 4 bytes and can be used as a minimum read length when receiving on the socket.
Edit: Perhaps I misunderstood -- if reading the length value isn't a problem, deal with splits by concatenating incoming data to a string, byte buffer, or whatever until its length is equal to the value you read in the beginning. TCP will take care of the rest.
Take extra precautions to make sure that you can't get stuck in a blocking read state should the client not send a complete message. For example, say you receive the length header, and start a loop that keeps reading through blocking recv() calls until the buffer is filled. If a malicious client intentionally stops sending data, your server might be locked until the client either disconnects, or starts sending.
I would have a function called readBytes or something that takes a buffer and a length parameter and reads until that many bytes have been read. You'll need to capture the number of bytes actually read and if it's less than the number you're expecting, advance your buffer pointer and read the rest. Keep looping until you've read them all.
Then call this function once for the header (containing the length), assuming that the header is a fixed length. Once you have the length of the actual data, call this function again.