My C++ project has a buffer that can be any size and is filled over Bluetooth. Incoming messages look like 0x43 0x0B 0x00 0x06 0xA2 0x03 0x03 0x00 0x01 0x01 0x0A 0x0B 0x0B 0xE6 0x0D: each message starts with 0x43 and ends with 0x0D. So each time the buffer is filled, its contents can differ, but every message follows the format above.
static const int BufferSize = 1024;
byte buffer[BufferSize];
What is the best way to parse the incoming messages in this buffer?
Since I come from Java and .NET: what is the best way to represent each extracted message as an object? Would a class be the solution?
I have created a separate class for parsing the buffer, like below. Am I heading in the right direction?
#include "parsingClass.h"

class A
{
    parsingClass ps;
public:
    void parseIncoming()
    {
        ps.parse(buffer, BufferSize); // buffer and BufferSize from above
    }
};
class ReturnMessage {
    char *message;
public:
    // Strips the 0x43 start byte and returns the remaining bytes as a
    // C string. Note: the caller owns the returned buffer and must
    // delete[] it, or this leaks on every call.
    char *getMessage(unsigned char *buffer, int count) {
        message = new char[count];
        for (int i = 1; i < count; i++) {
            message[i - 1] = buffer[i];
        }
        message[count - 1] = '\0';
        return message;
    }
};
class ParserToMessage {
    static const int BufferSize = 1024;
    unsigned char buffer[BufferSize];
    unsigned int counter;
public:
    ParserToMessage() : counter(0) {}

    char *parse_buffer()   // can't be static: it uses the member buffer
    {
        ReturnMessage rm;
        unsigned char buffByte;
        buffByte = blueToothGetByte(); // fictional getchar()-style function for Bluetooth
        if (buffByte == 0x43) {
            buffer[counter++] = buffByte;
            // continue until you find 0x0D
            while ((buffByte = blueToothGetByte()) != 0x0D) {
                buffer[counter++] = buffByte;
            }
        }
        return rm.getMessage(buffer, counter);
    }
};
Can you have the parser as a method of a 'ProtocolUnit' class? The method could take a buffer pointer/length as a parameter and return an int that indicates how many bytes it consumed from the buffer before it correctly assembled a complete protocol unit, or -1 if it needs more bytes from the next buffer.
Once you have a complete ProtocolUnit, you can do what you wish with it, (eg. queue it off to some processing thread), and create a new one for the remaining bytes/next buffer.
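A minimal sketch of that idea, assuming the 0x43/0x0D framing from the question (the class name and return convention follow the suggestion above; everything else is made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a ProtocolUnit that consumes bytes until it holds one
// complete 0x43 ... 0x0D frame. parse() returns the number of bytes it
// consumed from the given buffer, or -1 if the frame is still
// incomplete and it needs more bytes from the next buffer.
class ProtocolUnit {
public:
    int parse(const uint8_t* buf, std::size_t len) {
        std::size_t i = 0;
        // Skip garbage until the start byte if we haven't seen it yet.
        if (bytes_.empty()) {
            while (i < len && buf[i] != 0x43) ++i;
        }
        for (; i < len; ++i) {
            bytes_.push_back(buf[i]);
            if (buf[i] == 0x0D) {
                complete_ = true;
                return static_cast<int>(i + 1); // bytes consumed from this buffer
            }
        }
        return -1; // need more bytes from the next buffer
    }
    bool complete() const { return complete_; }
    const std::vector<uint8_t>& bytes() const { return bytes_; }
private:
    std::vector<uint8_t> bytes_;
    bool complete_ = false;
};
```

A unit that returns -1 simply gets fed the next buffer; a unit that returns n leaves the remaining len - n bytes for a fresh ProtocolUnit, as described above.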
My C++ project has a buffer which could be any size
The first thing I notice is that you have hard-coded the buffer size. You are in danger of buffer overflow if an attempt is made to read data bigger than the size you have specified into the buffer.
If possible, keep the buffer size dynamic and create the byte array according to the size of the data to be received. Let the object that owns the byte array know the incoming buffer size before you create the array.
int nBufferSize = GetBufferSize();
UCHAR* szByteArray = new UCHAR[nBufferSize];
What is the best way to parse the incoming messages in this buffer?
You are on the right lines, in that you have created and are using a parser class. I would suggest using memcpy to copy the individual data items one at a time, from the buffer to a variable of your choice. Not knowing the wider context of your intention at this point, I cannot add much to that.
Since I come from Java and .NET: what is the best way to represent each extracted message as an object? Would a class be the solution?
Depending on the complexity of the data you are reading from the buffer and what your plans are, you could use a class or a struct. If you do not need to create an object with this data, which provides services to other objects, you could use a struct. Structs are great when your need isn't so complex, whereby a full class might be overkill.
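For illustration, a struct for one decoded message might look like this. The field names are assumptions; the question doesn't say what the bytes between the 0x43 marker and the 0x0D terminator actually mean:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One possible shape for a decoded message. "command" and "payload"
// are hypothetical names -- adapt them to the real protocol fields.
struct Message {
    uint8_t command = 0;            // e.g. the byte right after 0x43
    std::vector<uint8_t> payload;   // everything up to the terminator
};

// Build a Message from one complete frame (0x43 ... 0x0D).
inline Message makeMessage(const uint8_t* frame, std::size_t len) {
    Message m;
    if (len >= 3 && frame[0] == 0x43 && frame[len - 1] == 0x0D) {
        m.command = frame[1];
        m.payload.assign(frame + 2, frame + len - 1);
    }
    return m;
}
```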
I have created a separate class for parsing the buffer, like below. Am I heading in the right direction?
I think so.
I hope that helps for starters!
The question "how should I parse this" depends largely on two things that are missing from your question:
Exactly how do you receive the data? You mention Bluetooth, but what is the programming medium? Are you reading from a socket? Do you have some other kind of API? Do you receive it a byte at a time or in blocks?
What are the rules for dealing with the data you are receiving? Most data is delimited in some way or uses fixed-length fields. In your case you mention that it can be of any length, but unless you explain how you want to parse it, I can't help.
One suggestion I would make is to change the type of your buffer to use std::vector :
std::vector<unsigned char> buffer;
buffer.reserve(normalSize);
You should choose normalSize to be something around the most frequently observed size of your incoming messages. A vector will grow as you push items onto it, so, unlike the array you created, you won't need to worry about buffer overrun if you get a large message. However, if you do go above normalSize, under the covers the vector will reallocate enough memory to cope with your extended requirements. This can be expensive, so you don't want it to happen too often.
You use a vector in pretty much the same way as your array. One key difference is that you can simply push elements onto the end of the vector, rather than having to keep a running pointer. So, imagining you received a single int at a time from the Bluetooth source, your code might look something like this:
// Clear out the previous contents of the buffer.
buffer.clear();

int elem(0);

// Find the start of your message. Throw away elements
// that we don't need.
while ( 0x43 != ( elem = getNextBluetoothInt() ) );

buffer.push_back( elem ); // Keep the start marker.

// Push elements of the message into the buffer until
// we hit the end.
while ( 0x0D != ( elem = getNextBluetoothInt() ) )
{
    buffer.push_back( elem );
}

buffer.push_back( elem ); // Remember to add on the last one.
The key benefit is that the vector will automatically resize itself without you having to do it, no matter whether the number of elements pushed on is 10 or 10,000.
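Here is the loop above in a runnable shape, with a canned byte stream standing in for the fictional getNextBluetoothInt() source:

```cpp
#include <cstddef>
#include <vector>

// Stub standing in for the Bluetooth byte source from the answer above.
// Real code would read from the radio; here we replay a canned stream
// that has some garbage before the 0x43 start marker.
static const unsigned char kStream[] = {0xFF, 0x00, 0x43, 0x0B, 0xA2, 0xE6, 0x0D};
static std::size_t kStreamPos = 0;
int getNextBluetoothInt() { return kStream[kStreamPos++]; }

std::vector<unsigned char> readOneMessage() {
    std::vector<unsigned char> buffer;
    int elem = 0;
    // Discard everything before the start marker.
    while (0x43 != (elem = getNextBluetoothInt()))
        ;
    buffer.push_back(static_cast<unsigned char>(elem));
    // Collect until the terminator, then keep the terminator too.
    while (0x0D != (elem = getNextBluetoothInt()))
        buffer.push_back(static_cast<unsigned char>(elem));
    buffer.push_back(static_cast<unsigned char>(elem));
    return buffer;
}
```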
Related
I am trying to create a LabVIEW DLL and call it from a C++ program, but I am facing a problem with data passing.
A scientific camera I recently bought comes with a LabVIEW SDK, and nothing else. The example program provided with the SDK is mainly a while loop around two functions, ReadData and DecodeData.
ReadData collects data from USB (using VISA read), the data obtained in one call contains several complete data blocks and an incomplete incoming block.
DecodeData is called multiple times to process all the complete blocks (it removes the processed data from the buffer). When all the complete blocks have been processed, the remaining data (the beginning of the incoming block) is passed to ReadData which will concatenate its new data at the end of the buffer.
Full example code:
Details of ReadData:
Details of DecodeData:
In the example program, written in LabVIEW, everything works fine. The problem arises when I export these functions in a DLL. The memory buffers, inputs and outputs of both functions, are char arrays. After ReadData, my C++ program correctly obtains a buffer containing data, including null bytes.
The problem is when I inject this buffer in DecodeData, it seems that LabVIEW only takes into account the bytes before the first null byte... I guess that the char[] input is just processed as a null-terminated string and the rest of the data is just discarded.
I tried to add data converters ("string to byte array" at outputs and "byte array to string" at inputs) but the conversion function also discards the data after the first null character.
I could modify the .vi from the SDK to handle only byte arrays and not strings, but it uses lots of character-processing functions and I would prefer to leave it as is.
How can I pass the data buffer from C++ to the LabVIEW DLL without losing part of my data?
Edit: here is the C++ code.
The header exported with the LabVIEW DLL:
int32_t __cdecl CORE_S_Read_data_from_USB(char VISARefIn[],
Enum1 blockToProcessPrevCycle, uint32_t bytesToProcessPrevCycle,
uint8_t inBytesRead[], uint32_t *BytesReceived, LVBoolean *DataReception,
uint8_t outBytesRead[], Enum1 *blockToProcess, uint32_t *bytesToProcess,
int32_t longueur, int32_t longueur2);
void __cdecl CORE_S_Decode_data(uint8_t inBytesRead[],
LVBoolean LUXELL256TypeB, uint32_t bytesToProcess, Enum1 blockToProcess,
Cluster2 *PrevHeader, LVBoolean *FrameCompleto,
uint32_t *bytesToProcessNextCycle, Enum1 *blockToProcessNextCycle,
Cluster2 *HeaderOut, uint8_t outBytesRead[], Int16Array *InfraredImage,
Cluster2 *Header, int32_t longueur, int32_t longueur2, int32_t longueur3);
Usage in my C++ source:
while (...)
{
// Append new data in uiBytesRead
ret = CORE_S_Read_data_from_USB(VISARef, blockToProcess, bytesToProcess, uiBytesRead, &BytesReceived,
&DataReception, uiBytesRead, &blockToProcess, &bytesToProcess, BUFFER_SIZE, BUFFER_SIZE);
if (DataReception == 0)
continue;
bool FrameCompleto = true;
while (FrameCompleto)
{
// Removes one frame of uiBytesRead per call
CORE_S_Decode_data(uiBytesRead, LUXELL256TypeB, 0, blockToProcess, &Header, &FrameCompleto, &bytesToProcess, &blockToProcess, &Header,
uiBytesRead, &InfraredImage, &Header, BUFFER_SIZE, BUFFER_SIZE, BUFFER_SIZE);
}
}
It is a little tricky to answer in this specific case, but assuming the problem is that NULL bytes in the buffer data are causing issues, it might be worth looking at the option to use String Handle Pointers for the String-Type controls and indicators of the VIs you are exporting.
This option can be selected during the "Define VI Prototype" stage of configuring the DLL Build
LabVIEW manages String Types internally as an integer of the string's length and an unsigned char array so it shouldn't matter what characters are used. For interfacing with external code, LabVIEW's extcode.h header defines an LStrHandle as follows:
typedef struct {
int32 cnt; /* number of bytes that follow */
uChar str[1]; /* cnt bytes */
} LStr, *LStrPtr, **LStrHandle;
So a String Handle Pointer is of type *LStrHandle.
extcode.h provides the macros LHStrBuf(LStrHandle) and LHStrLen(LStrHandle) which can ease dereferencing for the String Handle Pointer when you want to read or update the string content and length. Also, note a NULL handle can be used to represent an empty string so don't assume that the handle will be valid without checking.
When creating or resizing String Handle Pointers to pass to a function, it is worth noting that an LStr has exactly the same in-memory representation as a LabVIEW-array so the function NumericArrayResize with typeCode uB can create/resize a large enough buffer to store the string and the length-integer.
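To illustrate that layout without LabVIEW, here is a stand-alone mock. FakeLStr is not the real extcode.h type; it just demonstrates that a length-prefixed buffer carries embedded NUL bytes without loss, which is the whole point of switching away from C strings:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-alone illustration of the LStr layout described above: a 32-bit
// count followed immediately by the bytes, with no NUL terminator
// needed. This mimics extcode.h's LStr without requiring LabVIEW.
struct FakeLStr {
    int32_t cnt;          // number of bytes that follow
    unsigned char str[1]; // actually cnt bytes
};

FakeLStr* makeFakeLStr(const char* data, int32_t len) {
    // Allocate header + payload in one block, much like a resized
    // handle would be laid out in memory.
    FakeLStr* s = static_cast<FakeLStr*>(
        std::malloc(sizeof(int32_t) + static_cast<std::size_t>(len)));
    s->cnt = len;
    std::memcpy(s->str, data, static_cast<std::size_t>(len));
    return s;
}
```

Because the length travels with the data, a payload such as 'RE\0VM' keeps all five bytes, where a char* handed to a string function would stop at index 2.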
An example of creating a new String Handle Pointer for a string of length required_string_length is achieved by passing NumericArrayResize a handle pointer where the handle is NULL.
LStrHandle new_string_handle = NULL; // a NULL handle represents an empty string
MgErr err = NumericArrayResize(uB, 1, (UHandle *)&new_string_handle, required_string_length);
// new_string_handle now references an LStr large enough for the string
When updating the string value in a String Handle, remember to write the string's characters to the uChar array and to update the size integer. From a performance point of view it might not be worth shrinking a String Handle when updating it to a shorter string, but you will need to resize it if the string you are writing is longer than what it can currently hold.
You should clean up any handle that is passed to you from LabVIEW or a LabVIEW-based DLL so once you have finished with it, call DSDisposeHandle on the handle that the handle-pointer references.
For more information on LabVIEW's memory manager function please read this guide.
In a SoC solution, the FPGA saves a lot of integer values directly into RAM.
This data (the integers) is visible to the processor on the other side, which should send it over the network, unmodified, using the asio library.
Until now this data was not too big; I copied it into a vector and sent it over Ethernet without problems (see the code).
In the current project the amount of data has increased (about 200 MB), and I would like to send it directly from RAM without copying it into a vector first. Of course I will split it into parts.
Is there a way to send this raw data directly from a RAM pointer of type void (void *ptr), or is there a better way to do this?
Thanks in advance
std::vector<int> int_vect;
for( uint32_t i=from_index ; i<=to_index ; i++ )
{
int_vect.push_back(my_memory_ptr->get_value_axis(....));
}
asio::write(z_sock_ptr->socket_, asio::buffer(int_vect));
Yes. One of the overloads of asio::buffer provides exactly this functionality:
mutable_buffer buffer(
void * data,
std::size_t size_in_bytes);
If the data is contiguous, we can use it like this:
void* data = /* get data */;
size_t size = /* get size */;
asio::write(z_sock_ptr->socket_, asio::buffer(data, size));
It is possible to create an asio buffer from raw data; it is essentially just a non-owning array view:
asio::write(z_sock_ptr->socket_, asio::buffer(p_data, bytes_count));
I think this is a very common problem. Let me give an example.
I have a file, which contains many many lines (e.g. one million lines), and each line is of the following form: first comes a number X, and then follows a string of length X.
Now I want to read the file and store all the strings (for whatever reason). Usually, what I will do is: for every line I read the length X, and use malloc (in C) or new (in C++) to allocate X bytes, and then read the string.
The reason that I don't like this method: it might happen that most of the strings are very short, say under 8 bytes. In that case, according to my understanding, the allocation will be very wasteful, both in time and in space.
(First question here: am I understanding correctly, that allocating small pieces of memory is wasteful?)
I have thought about the following optimization: each time, allocate a big chunk, say 1024 bytes, and whenever a small piece is needed, just cut it from the big chunk. The problem with this method is that deallocation becomes almost impossible...
It might sound like I want to do the memory management myself... but still, I would like to know if there exists a better method? If needed, I don't mind use some data structure to do the management.
If you have some good idea that only works conditionally (e.g. with the knowledge that most pieces are small), I will also be happy to know it.
The "natural" way to do memory allocation is to ensure that every memory block is at least big enough to contain a pointer and a size, or some similar book-keeping that's sufficient to maintain a structure of free nodes. The details vary, but you can observe the overhead experimentally by looking at the actual addresses you get back from your allocator when you make small allocations.
This is the sense in which small allocations are "wasty". Actually with most C or C++ implementations all blocks get rounded to a multiple of some power of 2 (the power depending on the allocator and sometimes on the order of magnitude size of the allocation). So all allocations are wasty, but proportionally speaking there's more waste if a lot of 1 and 2 byte allocations are padded out to 16 bytes, than if a lot of 113 and 114 byte allocations are padded out to 128 bytes.
If you're willing to do away with the ability to free and reuse a single allocation (which is fine, for example, if you're planning to free it all together once you're done worrying about the contents of this file) then sure, you can allocate lots of small strings in a more compact way. For example, put them all end to end in one or a few big allocations, each string nul-terminated, and deal in pointers to the first byte of each. The overhead is either 1 or 0 bytes per string, depending on how you count the nul. This works particularly neatly when splitting a file into lines, if you just overwrite the linebreaks with nul bytes. Obviously you'd need to not mind that the linebreak has been removed from each line!
If you need freeing and re-use, and you know that all allocations are the same size, then you can do away with the size from the book-keeping, and write your own allocator (or, in practice, find an existing pool allocator you're happy with). The minimum allocated size could be one pointer. But that's only an easy win if all the strings are below the size of a pointer, "most" isn't so straightforward.
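A minimal sketch of the end-to-end packing idea from above: one growing backing block, strings stored nul-terminated, offsets handed out instead of individual allocations. The class name is made up:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Pack many small strings end to end in one backing block and hand out
// offsets instead of per-string allocations. Individual strings cannot
// be freed, but the whole pool goes away at once -- exactly the
// trade-off described above.
class StringPool {
public:
    std::size_t add(const char* s) {
        std::size_t off = bytes_.size();
        bytes_.insert(bytes_.end(), s, s + std::strlen(s) + 1); // keep the nul
        return off;
    }
    const char* get(std::size_t off) const { return bytes_.data() + off; }
private:
    std::vector<char> bytes_; // one backing allocation (amortized)
};
```

Offsets rather than raw pointers are deliberate: the vector may reallocate as it grows, which would invalidate pointers but not offsets.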
Yes, statically-allocating a large-ish buffer and reading into that is the usual way to read data.
Say you pick 1KB for the buffer size, because you expect most reads to fit into that.
Are you able to chop rare reads that go above 1KB into multiple reads?
Then do so.
Or not?
Then you can dynamically allocate if and only if necessary. Some simple pointer magic will do the job.
static const unsigned int BUF_SIZE = 1024;
static char buf[BUF_SIZE];

while (something) {
    const unsigned int num_bytes_to_read = foo();
    char* data = 0;

    if (num_bytes_to_read <= BUF_SIZE) {
        read_into(&buf[0]);
        data = buf;
    }
    else {
        data = new char[num_bytes_to_read];
        read_into(data);
    }

    // use data

    if (num_bytes_to_read > BUF_SIZE)
        delete[] data;
}
This code is a delightful mashup of C, C++ and pseudocode, since you did not specify a language.
If you're actually using C++, just use a vector for goodness' sake; let it grow if needed but otherwise just re-use its storage.
You could count the number of lines of text and their total length first, then allocate a block of memory to store the text and a block to store pointers into it. Fill these blocks by reading the file a second time. Just remember to add terminating zeros.
If the entire file will fit into memory, then why not get the size of the file, allocate that much memory and enough for pointers, then read in the entire file and create an array of pointers to the lines in the file?
I would store the "x" (the length prefix) in the buffer itself, using the largest buffer I can.
You did not tell us the maximum size of x, i.e. sizeof(x). Storing it in the buffer is, I think, crucial: it lets you avoid keeping an address for each word while still accessing them relatively quickly.
Something like:
char *buffer = "word1\0word2\0word3\0";
which requires storing each word's address somewhere for 'quick' access, becomes this:
char *buffer = "xx1word1xx2word2xx3word3\0\0\0\0";
As you can see, with x at a fixed size this can be really effective: you jump from word to word without storing each address; you only need to read x and advance the pointer by x.
x is not converted to char; it is written and read as an integer using its type's size. The words don't need a terminating \0 this way; only the buffer as a whole needs an end marker (if x == 0, it's the end of the buffer).
I am not that good at explaining, thanks to my English, so here is some code as a better explanation:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

void printword(char *buff){
    char *ptr;
    int i;
    union{
        uint16_t x;
        char c[sizeof(uint16_t)];
    }u;
    ptr = buff;
    memcpy(u.c, ptr, sizeof(uint16_t));
    while(u.x){
        ptr += sizeof(u.x);
        for(i = 0; i < u.x; i++)
            printf("%c", buff[i + (ptr - buff)]); /* jump in buff using x */
        printf("\n");
        ptr += u.x;
        memcpy(u.c, ptr, sizeof(uint16_t));
    }
}

void addword(char *buff, const char *word, uint16_t x){
    char *ptr;
    union{
        uint16_t x;
        char c[sizeof(uint16_t)];
    }u;
    ptr = buff;
    /* reach the end, x == 0 */
    memcpy(u.c, ptr, sizeof(uint16_t));
    while(u.x){ /* can jump easily, word to word */
        ptr += sizeof(u.x) + u.x;
        memcpy(u.c, ptr, sizeof(uint16_t));
    }
    u.x = x;
    memcpy(ptr, u.c, sizeof(uint16_t));
    ptr += sizeof(u.x);
    memcpy(ptr, word, u.x);
    ptr += u.x;
    memset(ptr, 0, sizeof(uint16_t)); /* end of buffer, x = 0 */
}

int main(void){
    char buffer[1024];
    memset(buffer, 0, sizeof(uint16_t)); /* first x = 0 because it's empty */
    addword(buffer, "test", 4);
    addword(buffer, "yay", 3);
    addword(buffer, "chinchin", 8);
    printword(buffer);
    return 0;
}
I have the following problem. I have to implement a class that has an attribute that is a char pointer meant to point to the object's "code", as follows:
class foo{
private:
    char* cod;
    ...
public:
    foo();
    void getVal();
    ...
};
So on, so forth. getVal() is a method that takes the code from the standard istream and fills in all the information, including the code. The thing is, the "code" that identifies the object can't be longer than a certain number of characters. This has to be done without using customized buffers for the method getVal(), so I can't do the following:
//suppose the maximum number of characters is 50
void foo::getVal()
{
    char buffer[100];
    cin >> buffer;
    // I'm not sure this would work, considering how the stream of
    // characters is copied into buffer and how strlen works, but
    // suppose this tells me how long the stream of characters was.
    if (strlen(buffer) > 50)
    {
        throw "Exception";
    }
    ...
}
This is forbidden. I also can't use a customized istream, nor the boost library.
I thought I could find the place where istream keeps its information rather easily, but I can't find it. All I've found were mentions to other types of stream.
Can somebody tell me if this can be done or where the stream keeps its buffered information?
Thanks
Yes, using strlen would definitely work. You can write a sample program:
#include <iostream>
#include <cstring>
#include <conio.h>

int main()
{
    char buffer[10];
    std::cout << "enter buffer:";
    std::cin >> buffer;
    if (strlen(buffer) > 6)
        std::cout << "size > 6";
    getch();
}
For inputs longer than 6 characters it will display size > 6.
uhm .... >> reads up to the first blank, while strlen counts up to the first null. They can only be mixed if you know for sure there are no blanks in the middle of the string you're going to read, and that there are no more than 100 consecutive characters. If not, you will overrun the buffer before throwing.
Also, accessing the buffer does not guarantee the whole string is already there (the string can extend past the buffer space, requiring a partial read and a buffer refill...).
If blanks are separators, why not just read into a std::string and react to its final state? All the dynamics above are already handled inside >> for std::string.
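A sketch of that suggestion, with a generic istream in place of cin so it can be exercised with a string stream; the 50-character limit comes from the question, and the function name is made up:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read one whitespace-delimited token into a std::string and enforce
// the length limit afterwards -- the stream handles all buffer growth
// internally, so no fixed-size buffer can be overrun.
std::string readCode(std::istream& in, std::size_t maxLen = 50) {
    std::string code;
    if (!(in >> code))
        throw std::runtime_error("no input");
    if (code.size() > maxLen)
        throw std::length_error("code too long");
    return code;
}
```

In the real class, cin would be passed as the istream argument.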
[EDIT after the comments below]
The only way to store a sequence of unknown size is to dynamically allocate the space and make it grow as required. This is, no more and no less, what string and vector do.
Whether you use them or write your own code to allocate and reallocate where more space is required, doesn't change the substance.
I'm starting to think the only reason for those requirements is to test your ability to write your own string class. So ... just write it:
declare a class holding a pointer, a size, and a capacity; allocate some space; track how much you store; and when no more store is available, allocate another, wider store, copy the old one, destroy it, and adjust the data members accordingly.
Accessing directly the file buffer is not the way, since you don't control how the file buffer is filled in.
An istream uses a streambuf.
I find that www.cppreference.com is a pretty good place for quick C++ references. You can go there to see how to use a streambuf or its derivative filebuf.
I'm currently unpacking one of Blizzard's .mpq files for reading.
For accessing the unpacked char buffer, I'm using a boost::interprocess::stream::memorybuffer.
Because .mpq files have a chunked structure that always begins with a version header (usually 12 bytes, see http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format#2.2_Archive_Header), the char* array representation seems to truncate at the first \0, even though the file size (about 1.6 MB) remains constant and (probably) fully allocated.
The result is a stream buffer with an effective length of 4 ('REVM', and byte no. 5 is \0). When attempting to read further, an exception is thrown. Here is an example:
// (somewhere in the code)
{
MPQFile curAdt(FilePath);
size_t size = curAdt.getSize(); // roughly 1.6 mb
bufferstream memorybuf((char*)curAdt.getBuffer(), curAdt.getSize());
// bufferstream.m_buf.m_buffer is now 'REVM\0' (Debugger says so),
// but internal length field still at 1.6 mb
}
//////////////////////////////////////////////////////////////////////////////
// wrapper around a file of the mpq_archive of libmpq
MPQFile::MPQFile(const char* filename) // I apologize for my inconsistent naming convention :P
{
for(ArchiveSet::iterator i=gOpenArchives.begin(); i!=gOpenArchives.end();++i)
{
// gOpenArchives points to MPQArchive, wrapper around the mpq_archive, has mpq_archive * mpq_a as member
mpq_archive &mpq_a = (*i)->mpq_a;
// if file exists in that archive, tested via hash table in file, not important here, scroll down if you want
mpq_hash hash = (*i)->GetHashEntry(filename);
uint32 blockindex = hash.blockindex;
if ((blockindex == 0xFFFFFFFF) || (blockindex == 0)) {
continue; //file not found
}
uint32 fileno = blockindex;
// Found!
size = libmpq_file_info(&mpq_a, LIBMPQ_FILE_UNCOMPRESSED_SIZE, fileno);
// HACK: in patch.mpq some files don't want to open and give 1 for filesize
if (size<=1) {
eof = true;
buffer = 0;
return;
}
buffer = new char[size]; // note: size is 1.6 mb at this time
// Now here comes the tricky part... if I step over the libmpq_file_getdata
// function, I'll get my truncated char array, which I absolutely don't want^^
libmpq_file_getdata(&mpq_a, hash, fileno, (unsigned char*)buffer);
return;
}
}
Maybe someone could help me. I'm really new to STL and boost programming, and inexperienced in C++ generally :P I hope to get a workable answer (please don't suggest rewriting libmpq and the underlying zlib architecture ^^).
The MPQFile class and the underlying decompression methods are actually taken from a working project, so the mistake is either somewhere in my use of the buffer with the streambuffer class, or something internal to char array arithmetic that I haven't a clue about.
By the way, what is the difference between using signed and unsigned chars as data buffers? Does it have anything to do with my problem? (You might notice that in the code, char* and unsigned char* are used interchangeably as function arguments.)
If you need more infos, feel free to ask :)
How are you determining that your char* array is being 'truncated', as you call it? If you're printing it or viewing it in a debugger, it will look truncated because it is treated as a string, which is terminated by \0. The data in 'buffer', however (assuming libmpq_file_getdata() does what it's supposed to do), will contain the whole file or data chunk or whatever.
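A small demonstration of that difference, using string streams so it is self-contained (the helper names are made up): streaming a char* stops at the first NUL, while write() emits the full length. The sample buffer mimics the 'REVM\0...' header from the question.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Streaming a char* treats it as a C string and stops at '\0'.
std::string viaCharPointer(const char* buf) {
    std::ostringstream os;
    os << buf; // stops at the first '\0'
    return os.str();
}

// write() takes an explicit length and emits every byte, NULs included.
std::string viaWrite(const char* buf, std::size_t len) {
    std::ostringstream os;
    os.write(buf, static_cast<std::streamsize>(len));
    return os.str();
}
```

The same effect explains the debugger view: it renders the pointer as a C string, so everything after the first NUL is invisible even though it is still in memory.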
Sorry, I messed up the terms a bit (it's not actually memorybuffer; streambuffer is meant, as in the code).
Yeah, you were right... I had a mistake in my exception handling. Right after that first bit of code comes this:
// check if the file has been opened
//if (!mpf.is_open())
pair<char*, size_t> temp = memorybuf.buffer();
if (temp.first)
    throw AdtException(ADT_PARSEERR_EFILE); // can't open the file
Notice the missing ! before temp.first. I was surprised by the exception thrown, looked at the stream buffer's internal buffer, and was confused by its length (C# background :P).
Sorry for that, it's working as expected now.