New NSData with range of old NSData maintaining bytes

I have a fairly large NSData (or NSMutableData if necessary) object which I want to take a small chunk out of and leave the rest. Since I'm working with large amounts of NSData bytes, I don't want to make a big copy, but instead just truncate the existing bytes. Basically:
NSData *source: <a few bytes I want to discard> + <big chunk of bytes I want to keep>
NSData *destination: <big chunk of bytes I want to keep>
There are truncation methods in NSMutableData, but they only truncate the end of it, whereas I want to truncate the beginning. My thoughts are to do this with the methods:
Note that I used the wrong (copying) method in the original posting. I've edited and fixed it
- (const void *)bytes
and
- initWithBytesNoCopy:length:freeWhenDone:
However, I'm trying to figure out how to manage memory with these. I'm guessing the process will be like this (I've placed ????s where I don't know what to do):
// Get bytes
const unsigned char *bytes = (const unsigned char *)[source bytes];
// Offset the start
bytes += myStart;
// Somehow (m)alloc the memory which will be freed up in the following step
?????
// Release the source, now that I've allocated the bytes
[source release];
// Create a new data, recycling the bytes so they don't have to be copied
NSData *destination = [[NSData alloc] initWithBytesNoCopy:bytes
                                                   length:myLength
                                             freeWhenDone:YES];
Thanks for the help!

Is this what you want?
NSData *destination = [NSData dataWithBytes:((char *)source.bytes) + myStart
length:myLength];
I know you said "I don't want to make a big copy," but this only does the same copy you were doing with getBytes:length: in your example, so this may be okay to you.
There's also replaceBytesInRange:withBytes:length:, which you might use like this:
[source setLength:myStart + myLength];
[source replaceBytesInRange:NSMakeRange(0, myStart)
withBytes:NULL
length:0];
But the docs don't say how that method works (no performance characteristics), and source needs to be an NSMutableData.

Depending on the context, the solutions can be different. I will assume that you need a method that returns an autoreleased NSData object with the specified range:
- (NSData *)getSubData:(NSData *)source withRange:(NSRange)range
{
    // note: a stack (VLA) buffer this size is only safe for modest ranges
    UInt8 bytes[range.length];
    [source getBytes:bytes range:range];
    NSData *result = [[NSData alloc] initWithBytes:bytes length:sizeof(bytes)];
    return [result autorelease];
}
Of course, you can make it a class method and put it into some kind of "utils" class or create an extension over NSData...

If you want to avoid copying memory blocks, you can use dataWithBytesNoCopy:length:freeWhenDone: to reuse the old buffer at an offset. Pass freeWhenDone:NO, because the bytes are still owned by source (and free() can't be called on an interior pointer anyway), which means source must stay alive for as long as the new object is in use. In this example we "remove" the first 2 bytes:
source = [NSData dataWithBytesNoCopy:(char *)source.bytes + 2 length:source.length - 2 freeWhenDone:NO];
For the sake of example simplicity, the boundary check is skipped; please add one as convenient for you.
Available in iOS 2.0 and later.

There's also an NSData method, -subdataWithRange:, that could do the trick. I have no idea what the performance looks like (I'd imagine it does a copy or two, but I don't know for certain). It can be used like:
NSData *destination = [source subdataWithRange:NSMakeRange(0, lengthIWant)];


How to get byte[] into capnp::Data

On the official website there is a nice and relatively comprehensive example of how one could use Cap'n Proto for C++ serialisation. What is missing is how to handle the second Blob type, capnp::Data, as only capnp::Text is covered.
Just for completeness, here is what the Schema Language says about the blob type:
Blobs: Text, Data
...
Text is always UTF-8 encoded and NUL-terminated.
Data is a completely arbitrary sequence of bytes.
So, if I have the following schema
struct Tiding {
id #0 :Text;
payload #1 :Data;
}
I can start building my message like this
::capnp::MallocMessageBuilder message;
Tiding::Builder tiding = message.initRoot<Tiding>();
tiding.setId("1");
At this point I got stuck. I can't do this:
typedef unsigned char byte;
byte data[100];
... //populate the array
tiding.setPayload(data)
//error: no viable conversion from 'byte [100]' to '::capnp::Data::Reader'
So I mucked around a bit and saw that capnp::Data is wrapping kj::ArrayPtr<const byte>, but I was unable to somehow get a hold of an ArrayPtr, much less use it to set the Payload field for my message.
I saw that there is a way to set the default value for the type Data (i.e. payload #5 :Data = 0x"a1 40 33";), but the schema language doesn't really translate to C++ in this case, so that also didn't help me.
I'd be grateful if somebody could point out what I am missing here. Also, how would I do this if I had List(Data) instead of just Data as the Payload in my schema?
A kj::ArrayPtr is fundamentally a pair of a pointer and a size.
You can create one by calling kj::arrayPtr(), which takes two arguments: a pointer, and the array size. Example:
byte buffer[256];
kj::ArrayPtr<byte> bufferPtr = kj::arrayPtr(buffer, sizeof(buffer));
kj::ArrayPtr has begin() and end() methods which return pointers, and a size() method. So you can convert back to pointer/size like:
byte* ptr = bufferPtr.begin();
size_t size = bufferPtr.size();
Putting it all together, in your example, you want:
tiding.setPayload(kj::arrayPtr(data, sizeof(data)));

Dynamically allocate many small pieces of memory

I think this is a very common problem. Let me give an example.
I have a file, which contains many many lines (e.g. one million lines), and each line is of the following form: first comes a number X, and then follows a string of length X.
Now I want to read the file and store all the strings (for whatever reason). Usually, what I will do is: for every line I read the length X, and use malloc (in C) or new (in C++) to allocate X bytes, and then read the string.
The reason that I don't like this method: it might happen that most of the strings are very short, say under 8 bytes. In that case, according to my understanding, the allocation will be very wasteful, both in time and in space.
(First question here: am I understanding correctly, that allocating small pieces of memory is wasteful?)
I have thought about the following optimization: every time, I allocate a big chunk, say 1024 bytes, and whenever a small piece is needed, I just cut it from the big chunk. The problem with this method is that deallocation becomes almost impossible...
It might sound like I want to do the memory management myself... but still, I would like to know if there exists a better method? If needed, I don't mind using some data structure to do the management.
If you have some good idea that only works conditionally (e.g. with the knowledge that most pieces are small), I will also be happy to know it.
The "natural" way to do memory allocation is to ensure that every memory block is at least big enough to contain a pointer and a size, or some similar book-keeping that's sufficient to maintain a structure of free nodes. The details vary, but you can observe the overhead experimentally by looking at the actual addresses you get back from your allocator when you make small allocations.
This is the sense in which small allocations are "wasty". Actually with most C or C++ implementations all blocks get rounded to a multiple of some power of 2 (the power depending on the allocator and sometimes on the order of magnitude size of the allocation). So all allocations are wasty, but proportionally speaking there's more waste if a lot of 1 and 2 byte allocations are padded out to 16 bytes, than if a lot of 113 and 114 byte allocations are padded out to 128 bytes.
If you're willing to do away with the ability to free and reuse just a single allocation (which is fine, for example, if you're planning to free it all together once you're done worrying about the contents of this file) then sure, you can allocate lots of small strings in a more compact way. For example, put them all end to end in one or a few big allocations, each string nul-terminated, and deal in pointers to the first byte of each. The overhead is either 1 or 0 bytes per string depending on how you count the nul. This can work particularly neatly in the case of splitting a file into lines, if you just overwrite the linebreaks with nul bytes. Obviously you'd need to not mind that the linebreak has been removed from each line!
If you need freeing and re-use, and you know that all allocations are the same size, then you can do away with the size from the book-keeping, and write your own allocator (or, in practice, find an existing pool allocator you're happy with). The minimum allocated size could be one pointer. But that's only an easy win if all the strings are below the size of a pointer; "most" isn't so straightforward.
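The end-to-end packing idea can be sketched like this (`StringPool` is an invented name; this is a sketch of the trade-off described above, not a production allocator):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// All strings live end to end in one growing block, each nul-terminated.
// Individual strings can never be freed, but the whole pool is released
// at once when the object goes away.
class StringPool {
    std::vector<char> storage_;          // one big allocation (amortized)
    std::vector<std::size_t> offsets_;   // start of each string in storage_
public:
    std::size_t add(const char* s, std::size_t len) {
        offsets_.push_back(storage_.size());
        storage_.insert(storage_.end(), s, s + len);
        storage_.push_back('\0');        // 1 byte of overhead per string
        return offsets_.size() - 1;
    }
    const char* get(std::size_t i) const { return storage_.data() + offsets_[i]; }
    std::size_t count() const { return offsets_.size(); }
};
```

Note that offsets are stored rather than raw pointers, because the vector may reallocate as it grows, which would invalidate any pointers into it.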
Yes, statically-allocating a large-ish buffer and reading into that is the usual way to read data.
Say you pick 1KB for the buffer size, because you expect most reads to fit into that.
Are you able to chop rare reads that go above 1KB into multiple reads?
Then do so.
Or not?
Then you can dynamically allocate if and only if necessary. Some simple pointer magic will do the job.
static const unsigned int BUF_SIZE = 1024;
static char buf[BUF_SIZE];

while (something) {
    const unsigned int num_bytes_to_read = foo();
    const char* data = 0;
    if (num_bytes_to_read <= BUF_SIZE) {
        read_into(&buf[0]);
        data = buf;
    }
    else {
        data = new char[num_bytes_to_read];
        read_into(data);
    }
    // use data
    if (num_bytes_to_read > BUF_SIZE)
        delete[] data;
}
This code is a delightful mashup of C, C++ and pseudocode, since you did not specify a language.
If you're actually using C++, just use a vector for goodness' sake; let it grow if needed but otherwise just re-use its storage.
You could count the number of lines of text and their total length first, then allocate a block of memory to store the text and a block to store pointers into it. Fill these blocks by reading the file a second time. Just remember to add terminating zeros.
If the entire file will fit into memory, then why not get the size of the file, allocate that much memory and enough for pointers, then read in the entire file and create an array of pointers to the lines in the file?
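That suggestion might look like the following sketch (assuming the whole file is already in one string; `indexLines` is an invented name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Given the whole file in one block, overwrite each '\n' with '\0' and
// collect a pointer to the start of every line -- one block for the text,
// one for the pointers.
std::vector<char*> indexLines(std::string& text) {
    std::vector<char*> lines;
    if (text.empty()) return lines;
    char* p = &text[0];
    char* end = p + text.size();
    while (p < end) {
        lines.push_back(p);
        char* nl = static_cast<char*>(std::memchr(p, '\n', end - p));
        if (!nl) break;      // last line had no trailing newline
        *nl = '\0';          // terminate the line in place
        p = nl + 1;
    }
    return lines;
}
```

The pointers stay valid as long as the string itself is kept alive and not resized.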
I would store the length "x" inline in the buffer, using the largest buffer I can.
You did not tell us the maximum size of x (i.e. sizeof(x)). Storing it in the buffer is, I think, crucial: it avoids keeping a separate address for each word while still letting you reach each one relatively quickly.
Instead of something like:
char *buffer = "word1\0word2\0word3\0";
while also stocking each word's address (or similar) for 'quick' access, the buffer becomes:
char *buffer = "xx1word1xx2word2xx3word3\0\0\0\0";
As you can see, with x at a fixed size it can be really effective to jump from word to word without the need to store each address: you only need to read x and advance the pointer by x...
x is not converted to characters; the integer is injected and read using its type's size. The words no longer need a terminating \0 this way; only the end of the full buffer is marked (if x == 0 then it's the end).
I am not that good at explaining, thanks to my English, so I'll push you some code as a better explanation:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

void printword(char *buff){
    char *ptr;
    int i;
    union{
        uint16_t x;
        char c[sizeof(uint16_t)];
    }u;
    ptr=buff;
    memcpy(u.c,ptr,sizeof(uint16_t));
    while(u.x){
        ptr+=sizeof(u.x);
        for(i=0;i<u.x;i++)
            printf("%c",buff[i+(ptr-buff)]); /* jump in buff using x */
        printf("\n");
        ptr+=u.x;
        memcpy(u.c,ptr,sizeof(uint16_t));
    }
}

void addword(char *buff,const char *word,uint16_t x){
    char *ptr;
    union{
        uint16_t x;
        char c[sizeof(uint16_t)];
    }u;
    ptr=buff;
    /* reach the end, where x==0 */
    memcpy(u.c,ptr,sizeof(uint16_t));
    while(u.x){ /* can jump easily, word to word */
        ptr+=sizeof(u.x)+u.x;
        memcpy(u.c,ptr,sizeof(uint16_t));
    }
    u.x=x;
    memcpy(ptr,u.c,sizeof(uint16_t));
    ptr+=sizeof(u.x);
    memcpy(ptr,word,u.x);
    ptr+=u.x;
    memset(ptr,0,sizeof(uint16_t)); /* end of buffer: x=0 */
}

int main(void){
    char buffer[1024];
    memset(buffer,0,sizeof(uint16_t)); /* first x=0 because it's empty */
    addword(buffer,"test",4);
    addword(buffer,"yay",3);
    addword(buffer,"chinchin",8);
    printword(buffer);
    return 0;
}

Parsing buffer data in C++

My C++ project has a buffer which could be any size and is filled over Bluetooth. The incoming messages look like 0x43 0x0B 0x00 0x06 0xA2 0x03 0x03 0x00 0x01 0x01 0x0A 0x0B 0x0B 0xE6 0x0D: each message starts with 0x43 and ends with 0x0D. So each time the buffer is filled, its contents can vary while following the message format above.
static const int BufferSize = 1024;
byte buffer[BufferSize];
What is the best way to parse the incoming messages in this buffer?
Since I come from Java and .NET, what is the best way to turn each extracted message into an object? Would a class be the solution?
I have created a separate class for parsing the buffer, like below. Am I going in the right direction?
#include "parsingClass.h"

class A
{
    parsingClass ps;
public:
    void parse() { ps.parse(buffer, BufferSize); }
};

class ReturnMessage{
    char *message;
public:
    char *getMessage(unsigned char *buffer, int count){
        message = new char[count];        // note: the caller ends up owning this
        for(int i = 1; i < count; i++){   // skip the leading 0x43
            message[i-1] = buffer[i];
        }
        message[count-1] = '\0';
        return message;
    }
};

class ParserToMessage{
    static const int BufferSize = 1024;
    unsigned char buffer[BufferSize];
    unsigned int counter;
public:
    char *parse_buffer()
    {
        ReturnMessage rm;
        unsigned char buffByte;
        counter = 0;
        buffByte = blueToothGetByte(); // fictional getchar() kind of function for bluetooth
        if(buffByte == 0x43){
            buffer[counter++] = buffByte;
            // continue until you find 0x0D
            while((buffByte = blueToothGetByte()) != 0x0D){
                buffer[counter++] = buffByte;
            }
        }
        return rm.getMessage(buffer, counter);
    }
};
Can you have the parser as a method of a 'ProtocolUnit' class? The method could take a buffer pointer/length as a parameter and return an int that indicates how many bytes it consumed from the buffer before it correctly assembled a complete protocol unit, or -1 if it needs more bytes from the next buffer.
Once you have a complete ProtocolUnit, you can do what you wish with it, (eg. queue it off to some processing thread), and create a new one for the remaining bytes/next buffer.
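A minimal sketch of that interface, assuming the 0x43 ... 0x0D framing from the question (the class and method names here are invented for illustration):

```cpp
#include <cstddef>
#include <vector>

// consume() scans a buffer for one 0x43 ... 0x0D frame and returns how many
// bytes of the buffer it used, or -1 if the frame is still incomplete and
// needs bytes from the next buffer.
class ProtocolUnit {
    std::vector<unsigned char> frame_;
public:
    int consume(const unsigned char* buf, std::size_t len) {
        std::size_t i = 0;
        if (frame_.empty()) {
            // Not mid-frame yet: skip garbage until the 0x43 start byte.
            while (i < len && buf[i] != 0x43) ++i;
        }
        for (; i < len; ++i) {
            frame_.push_back(buf[i]);
            if (buf[i] == 0x0D)
                return static_cast<int>(i + 1);  // complete frame
        }
        return -1;  // exhausted this buffer, frame still open
    }
    const std::vector<unsigned char>& data() const { return frame_; }
};
```

A caller would feed each received buffer to consume(), hand off the unit once it returns a non-negative count, and start a fresh ProtocolUnit on the remaining bytes.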
My C++ project has a buffer which could be any size
The first thing I notice is that you have hard-coded the buffer size. You are in danger of buffer overflow if an attempt is made to read data bigger than the size you have specified into the buffer.
If possible, keep the buffer size dynamic and create the byte array according to the size of the data to be received. Inform the object that owns your byte array of the incoming buffer size before you create the array:
int nBufferSize = GetBufferSize();
UCHAR* szByteArray = new UCHAR[nBufferSize];
What is the best way to parse the incoming messages in this buffer?
You are on the right lines, in that you have created and are using a parser class. I would suggest using memcpy to copy the individual data items one at a time, from the buffer to a variable of your choice. Not knowing the wider context of your intention at this point, I cannot add much to that.
Since I have come from Java and .NET, what is the best way to make each extracted message an object? Could a class be the solution?
Depending on the complexity of the data you are reading from the buffer and what your plans are, you could use a class or a struct. If you do not need to create an object with this data, which provides services to other objects, you could use a struct. Structs are great when your need isn't so complex, whereby a full class might be overkill.
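For instance, if the extracted message only carries data, a plain struct could be as simple as this (field names are illustrative, based on the 0x43 ... 0x0D frame format in the question):

```cpp
#include <vector>

// A passive data carrier: no behaviour, so a struct rather than a full class.
struct Message {
    unsigned char header = 0x43;         // start-of-message byte
    std::vector<unsigned char> payload;  // bytes between header and 0x0D
};
```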
I have created a separate class for parsing the buffer like below, am I going in the right direction?
I think so.
I hope that helps for starters!
The question "how should I parse this" depends largely on how you want to parse the data. Two things are missing from your question:
Exactly how do you receive the data? You mention Bluetooth but what is the programming medium? Are you reading from a socket? Do you have some other kind of API? Do you receive it byte at a time or in blocks?
What are the rules for dealing with the data you are receiving? Most data is delimited in some way or of fixed field length. In your case, you mention that it can be of any length but unless you explain how you want to parse it, I can't help.
One suggestion I would make is to change the type of your buffer to use std::vector :
std::vector<unsigned char> buffer(normalSize);
You should choose normalSize to be something around the most frequently observed size of your incoming message. A vector will grow as you push items onto it so, unlike the array you created, you won't need to worry about buffer overrun if you get a large message. However, if you do go above normalSize, under the covers the vector will reallocate enough memory to cope with your extended requirements. This can be expensive, so you don't want to do it too often.
You use a vector in pretty much the same way as your array. One key difference is that you can simply push elements onto the end of the vector, rather than having to keep a running pointer. So imagine you received a single int at a time from the Bluetooth source; your code might look something like this:
// Clear out the previous contents of the buffer.
buffer.clear();
int elem(0);
// Find the start of your message. Throw away elements
// that we don't need.
while ( 0x43 != ( elem = getNextBluetoothInt() ) );
// Push elements of the message into the buffer until
// we hit the end.
do
{
    buffer.push_back( elem );
} while ( 0x0D != ( elem = getNextBluetoothInt() ) );
buffer.push_back( elem ); // Remember to add on the last one.
The key benefit is that the vector will resize itself automatically, without you having to do it, whether the number of characters pushed on is 10 or 10,000.

C++ char array move null terminator properly?

Hi, my problem is kind of difficult to explain, so I'll just post my code section here and explain the problem with an example.
This code has a big and a small array; the big array gets split up into small parts that are stored in the small array, and the small array outputs its content to the screen. Afterwards I free the allocated memory of the small array and initialize it again with the next part of the big array:
//this code is in a loop that runs until all of the big array has been copied
char* splitArray = new char[50];
strncpy(splitArray, bigArray+startPoint, 50); //startPoint is calculated with every loop run, it marks the next point in the array for copying
//output of splitArray on the screen here
delete splitArray;
//repeat loop here
Now my problem is that the outputted string every time has some random symbols at the end, for example "some_characters_here...last_char_hereRANDOM_CHARS_HERE".
After looking deeper into it I found out that splitArray actually doesn't have a size of 50 but of 64, with the null terminator at 64.
So when I copy from bigArray into splitArray, there are still 14 random characters left after the real string, and of course I don't want to output them.
A simple solution would be to manually set the null terminator in splitArray at [50], but then the program fails to delete the array again.
Can anybody help me find a solution for this? Preferably with some example code, thanks.
How does the program "fail to delete the array again" if you just set splitArray[49] = 0? Don't forget, an array of length 50 is indexed from 0 through 49. splitArray[50] = 0 is writing to memory outside that allocated for splitArray, with all the consequences that entails.
When you allocate memory for splitArray, the memory is not filled with NULL characters; you need to do that explicitly. Because of this, your string is not properly NULL-terminated. You can write char* splitArray = new char[51](); to zero-initialize at the time of allocation itself (note that I am allocating 51 chars to have room for the extra NULL character at the end). Also note that you need to use delete[] splitArray; and not delete splitArray;.
The function strncpy has the disadvantage that it doesn't terminate the destination string if the source string contains 50 or more chars. Seems like it does in your case!
If this really is C++, you can do it with std::string splitArray(bigArray+startPoint, 50).
I see a couple of problems with your code:
If you allocate by using new [], you need to free with delete [] (not delete)
Why are you using the free store anyway? From what I can see you might as well use a local array.
If you want to store 50 characters in an array, you need 51 for the terminating null character.
You wanted some code:
while(/* condition */)
{
    // your logic
    char splitArray[51];
    strncpy(splitArray, bigArray+startPoint, 50);
    splitArray[50] = '\0';
    // do stuff with splitArray
    // no delete
}
Just doing this will be sufficient:
char* splitArray = new char[50 + 1];
strncpy(splitArray, bigArray+startPoint, 50);
splitArray[50] = '\0';
I'd really question why you're doing this anyway though. This is much cleaner:
std::string split(bigArray+startPoint, 50);
it still does the copy, but handles (de)allocation and termination for you. You can get the underlying character pointer like so:
char const *s = split.c_str();
it'll be correctly nul-terminated, and have the same lifetime as the string object (ie, you don't need to free or delete it).
NB. I haven't changed your original code, but losing the magic integer literals would also be a good idea.

boost memorybuffer and char array

I'm currently unpacking one of Blizzard's .mpq files for reading.
For accessing the unpacked char buffer, I'm using a boost::interprocess::stream::memorybuffer.
Because .mpq files have a chunked structure always beginning with a version header (usually 12 bytes, see http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format#2.2_Archive_Header), the char* array representation seems to truncate at the first \0, even though the file size (about 1.6 MB) remains constant and (probably) fully allocated.
The result is a streambuffer with an effective length of 4 ('REVM', and byte no. 5 is \0). When attempting to read further, an exception is thrown. Here is an example:
// (somewhere in the code)
{
    MPQFile curAdt(FilePath);
    size_t size = curAdt.getSize(); // roughly 1.6 MB
    bufferstream memorybuf((char*)curAdt.getBuffer(), curAdt.getSize());
    // bufferstream.m_buf.m_buffer is now 'REVM\0' (Debugger says so),
    // but internal length field still at 1.6 MB
}

//////////////////////////////////////////////////////////////////////////////
// wrapper around a file of the mpq_archive of libmpq
MPQFile::MPQFile(const char* filename) // I apologize for my inconsistent naming convention :P
{
    for(ArchiveSet::iterator i=gOpenArchives.begin(); i!=gOpenArchives.end(); ++i)
    {
        // gOpenArchives points to MPQArchive, wrapper around the mpq_archive,
        // has mpq_archive * mpq_a as member
        mpq_archive &mpq_a = (*i)->mpq_a;
        // if the file exists in that archive (tested via the hash table in the
        // file; not important here, scroll down if you want)
        mpq_hash hash = (*i)->GetHashEntry(filename);
        uint32 blockindex = hash.blockindex;
        if ((blockindex == 0xFFFFFFFF) || (blockindex == 0)) {
            continue; // file not found
        }
        uint32 fileno = blockindex;
        // Found!
        size = libmpq_file_info(&mpq_a, LIBMPQ_FILE_UNCOMPRESSED_SIZE, fileno);
        // HACK: in patch.mpq some files don't want to open and give 1 for filesize
        if (size <= 1) {
            eof = true;
            buffer = 0;
            return;
        }
        buffer = new char[size]; // note: size is 1.6 MB at this time
        // Now here comes the tricky part... if I step over the libmpq_file_getdata
        // function, I'll get my truncated char array, which I absolutely don't want^^
        libmpq_file_getdata(&mpq_a, hash, fileno, (unsigned char*)buffer);
        return;
    }
}
Maybe someone could help me. I'm really new to STL and boost programming, and also inexperienced in C++ programming anyway :P I hope to get a convenient answer (please don't suggest rewriting libmpq and the underlying zlib architecture^^).
The MPQFile class and the underlying uncompress methods are actually taken from a working project, so the mistake is either somewhere in the use of the buffer with the streambuffer class or something internal to char array arithmetic that I haven't a clue about.
By the way, what is the difference between using signed/unsigned chars as data buffers? Does it have anything to do with my problem? (You might see that in the code char* and unsigned char* are used interchangeably as function arguments.)
If you need more infos, feel free to ask :)
How are you determining that your char* array is being 'truncated' as you call it? If you're printing it or viewing it in a debugger it will look truncated because it will be treated like a string, which is terminated by \0. The data in 'buffer' however (assuming libmpq_file_getdata() does what it's supposed to do) will contain the whole file or data chunk or whatever.
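The difference between the two views of the same bytes can be shown in a couple of lines (the buffer contents here are illustrative, standing in for the data from libmpq_file_getdata):

```cpp
#include <cstddef>
#include <string>

// C-string conventions stop at the first '\0'; an explicit length does not.
std::string asCString(const char* buf) {
    return std::string(buf);             // stops at the first '\0'
}
std::string byLength(const char* buf, std::size_t n) {
    return std::string(buf, n);          // keeps all n bytes
}
```

So a debugger or printf showing 'REVM' is only displaying up to the first \0; the rest of the 1.6 MB is still there if you read it by length.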
Sorry, messed up a bit with these terms (not memorybuffer actually, streambuffer is meant as in the code)
Yeah, you were right... I had a mistake in my exception handling. Right after that first bit of code comes this:
// check if the file has been open
//if (!mpf.is_open())
pair<char*, size_t> temp = memorybuf.buffer();
if(temp.first)
throw AdtException(ADT_PARSEERR_EFILE);//Can't open the File
Notice the missing ! before temp.first. I was surprised by the exception thrown, looked at the streambuffer's internal buffer, and was confused by its length (C# background :P).
Sorry for that, it's working as expected now.