boost memorybuffer and char array - c++

I'm currently unpacking one of blizzard's .mpq file for reading.
For accessing the unpacked char buffer, I'm using a boost::interprocess::stream::memorybuffer.
Because .mpq files have a chunked structure always beginning with a version header (usually 12 bytes, see http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format#2.2_Archive_Header), the char* array representation seems to truncate at the first \0, even if the filesize (something about 1.6mb) remains constant and (probably) always allocated.
The result is a streambuffer with an effective length of 4 ('REVM' and byte nr.5 is \0). When attempting to read further, an exception is thrown. Here an example:
// (somewhere in the code)
{
MPQFile curAdt(FilePath);
size_t size = curAdt.getSize(); // roughly 1.6 mb
bufferstream memorybuf((char*)curAdt.getBuffer(), curAdt.getSize());
// bufferstream.m_buf.m_buffer is now 'REVM\0' (Debugger says so),
// but internal length field still at 1.6 mb
}
//////////////////////////////////////////////////////////////////////////////
// wrapper around a file oof the mpq_archive of libmpq
MPQFile::MPQFile(const char* filename) // I apologize my naming inconsistent convention :P
{
for(ArchiveSet::iterator i=gOpenArchives.begin(); i!=gOpenArchives.end();++i)
{
// gOpenArchives points to MPQArchive, wrapper around the mpq_archive, has mpq_archive * mpq_a as member
mpq_archive &mpq_a = (*i)->mpq_a;
// if file exists in that archive, tested via hash table in file, not important here, scroll down if you want
mpq_hash hash = (*i)->GetHashEntry(filename);
uint32 blockindex = hash.blockindex;
if ((blockindex == 0xFFFFFFFF) || (blockindex == 0)) {
continue; //file not found
}
uint32 fileno = blockindex;
// Found!
size = libmpq_file_info(&mpq_a, LIBMPQ_FILE_UNCOMPRESSED_SIZE, fileno);
// HACK: in patch.mpq some files don't want to open and give 1 for filesize
if (size<=1) {
eof = true;
buffer = 0;
return;
}
buffer = new char[size]; // note: size is 1.6 mb at this time
// Now here comes the tricky part... if I step over the libmpq_file_getdata
// function, I'll get my truncated char array, which I absolutely don't want^^
libmpq_file_getdata(&mpq_a, hash, fileno, (unsigned char*)buffer);
return;
}
}
Maybe someone could help me. I'm really new to STL and boost programming and also inexperienced in C++ programming anyways :P Hope to get a convenient answer (plz not suggest to rewrite libmpq and the underlying zlib architecture^^).
The MPQFile class and the underlying uncompress methods are acutally taken from a working project, so the mistake is either somewhere in the use of the buffer with the streambuffer class or something internal with char array arithmetic I haven't a clue of.
By the way, what is the difference between using signed/unsigned chars as data buffers? Has it anything to do with my problem (you might see, that in the code randomly char* unsigned char* is taken as function arguments)
If you need more infos, feel free to ask :)

How are you determining that your char* array is being 'truncated' as you call it? If you're printing it or viewing it in a debugger it will look truncated because it will be treated like a string, which is terminated by \0. The data in 'buffer' however (assuming libmpq_file_getdata() does what it's supposed to do) will contain the whole file or data chunk or whatever.

Sorry, messed up a bit with these terms (not memorybuffer actually, streambuffer is meant as in the code)
Yeah you where right... I had a mistake in my exception handling. Right after that first bit of code comes this:
// check if the file has been open
//if (!mpf.is_open())
pair<char*, size_t> temp = memorybuf.buffer();
if(temp.first)
throw AdtException(ADT_PARSEERR_EFILE);//Can't open the File
notice the missing ! before temp.first . I was surprized by the exception thrown, looked at the streambuffer .. internal buffer at was confused of its length (C# background :P).
Sorry for that, it's working as expected now.

Related

Reading contents of file into dynamically allocated char* array- can I read into std::string instead?

I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 where std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
It is undefined behavior to write into the const char* returned by std::string::data(). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
And pointed by John Zwinck, .data() return a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() return a pointer to char (in this case), not to const char.
size() and resize() are working in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permit this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), 7); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.

Windows readfile with dynamic buffer

I am trying to write a wrapper around Windows file functions, one would read num bytes amount of data from the file and retrun it. For some reason I fail to allocate the memory properly, but I just can't find the reason why:
PBYTE Read(int num_bytes, HANDLER hFile){
PBYTE bBuffer;
DWORD new_size = sizeof(BYTE)*num_bytes;
//after the allocation the debugger already displays a 16 char wide placeholder
bBuffer = (PBYTE)malloc(new_size);
OVERLAPPED o = { 0 };
o.Offset = 0;
BOOL bReadDone = ReadFile(hFile, (LPVOID)bBuffer, sizeof(BYTE)*num_bytes, NULL, &o);
return bBuffer;
}
Data gets copied, but the allocated buffer is always too wide and contains extra wierd filler characters. Can sby please explain what am I doing wrong?
"what am I doing wrong?"
sizeof(BYTE) is 1 so you can remove it everywhere and eliminate the redundant new_size variable.
You tagged your question C++ but used malloc to allocate the buffer. Your design makes the caller responsible for freeing the buffer, which is a poor design approach, and even more so by using malloc/free in C++ program. A good C++ solution to this quandry would be to return a
std::vector.
It is vital that you provide the lpNumberOfBytesRead parameter to ReadFile. Without it you don't know how many bytes were read. And if you don't know how many bytes were read you can't tell the difference between "extra wierd filler characters" and unused memory at the end of the buffer. If the data is characters then character-oriented output routines (and debugger tools) don't know the difference either, since there is no null terminator at the end of the data that was actually read. You could use NumberOfBytesRead to put in a nul terminator so you and the debugger don't read beyond the real data.

C++ Parsing data bytes into struct

I am new in here and I have a question
I have a struct, let's say overall size is 8 bytes, here the struct:
struct Header
{
int ID; // 4 bytes
char Title [4]; // 4 bytes too
}; // so it 8 bytes right?
and I have a file with 8 bytes too...
I just want to ask, how to parse data on that file into the struct of that
I have tried this one:
Header* ParseHeader(char* filename)
{
char* buffer = new char[8];
fstream fs(filename);
if (fs.is_open() != true)
throw new exception("Couldn't Open file for Parsing Header.");
fs.read(buffer, 8);
if (!fs)
{
delete[] buffer;
throw new exception("Couldn't Read header OJN file.\nHeader data was corrupted");
}
Header* header = (Header*)((void*)buffer);
delete[] buffer;
fs.close();
return header;
}
but it fail, and return invalid data than what I was expect (I can make you sure, this is not file fault, the file structured correctly)
Can someone help me?
Thanks
Seems like you do everything fine until this point:
Header* header = (Header*)((void*)buffer);
delete[] buffer;
fs.close();
notice you delete the buffer after the casting, meaning that header points to a deleted location -> junk, you need to either not delete or copy the data if you like to still use it.
Also, to be quite honest, I don't understand how your code compiles, your function states it returns a Header, while you return a Header*..
You are deleting the data that is being returned. Therefore header is no longer accessible.
I think you meant the line to be:
Header header = *(Header*)((void*)buffer);
This will actually copy the header.
The fact that your 8 bytes file correctly maps to your struct Header is mere luck as far as C++ is involved. The structure could have internal padding that make it bigger than 8 bytes, and the data endianness could be different between your file and your CPU.
I realize your code works with your particular compiler version, on your operating system and on your CPU but you should not get into the habit of coding like that, otherwise you'll probably get into big trouble as soon as you change any of those parameters (or maybe even just some compiler flags). In other words, what you are doing is extremely bad practice. In C++ you don't even have the guarantee that an int is actually 4 bytes.
The Right Way™ to load such binary data from a file is to load each field individually and ensure proper endianness conversion depending on the CPU you're using (eg. through hton* / ntoh* or similar functions). Using a fixed-size type like int32_t also helps.
Just define your structure in 1-byte boundary as:
#pragma pack(1)
struct Header
{
int ID; // 4 bytes
char Title [4]; // 4 bytes too
};
#pragma pack()
First pack statement instructs the compiler to use one-byte padding for members in structure. This way size of Header will be 8 bytes. Second pack statement instructs to go back to default setting. You may need to use push and pop instructions (See enter link description here) - but this isn't required for you.
Secondly, and more importantly, you should not use hard-code values like 8. Always use sizeof to read or write a structure. Also, this statement is absolutely not needed:
char* buffer = new char[8];
...
Just declare Header variable itself, and read on it:
Header header;
...
fs.read(&header, sizeof(Header));

Read .part files and concatenate them all

So I am writing my own custom FTP client for a school project. I managed to get everything to work with the swarming FTP client and am down to one last small part...reading the .part files into the main file. I need to do two things. (1) Get this to read each file and write to the final file properly (2) The command to delete the part files after I am done with each one.
Can someone please help me to fix my concatenate function I wrote below? I thought I had it right to read each file until the EOF and then go on to the next.
In this case *numOfThreads is 17. Ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out);
for (int i = 0; i < 17; ++i)
{
std::ifstream in;
std::ostringstream convert;
convert << i;
std::string t = s + ".part" + convert.str();
in.open(t.c_str(), std::ios::in | std::ios::binary);
int size = 32*1024;
char *tempBuffer = new char[size];
if (in.good())
{
while (in.read(tempBuffer, size))
out.write(tempBuffer, in.gcount());
}
delete [] tempBuffer;
in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. Not much more to say than that.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
write(amount that was read);
In C++ that will typically be something like:
while (infile.read(buffer, buffer_size))
outfile.write(buffer, infile.gcount());
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.

Is there any way to determine how many characters will be written by sprintf?

I'm working in C++.
I want to write a potentially very long formatted string using sprintf (specifically a secure counted version like _snprintf_s, but the idea is the same). The approximate length is unknown at compile time so I'll have to use some dynamically allocated memory rather than relying on a big static buffer. Is there any way to determine how many characters will be needed for a particular sprintf call so I can always be sure I've got a big enough buffer?
My fallback is I'll just take the length of the format string, double it, and try that. If it works, great, if it doesn't I'll just double the size of the buffer and try again. Repeat until it fits. Not exactly the cleverest solution.
It looks like C99 supports passing NULL to snprintf to get the length. I suppose I could create a module to wrap that functionality if nothing else, but I'm not crazy about that idea.
Maybe an fprintf to "/dev/null"/"nul" might work instead? Any other ideas?
EDIT: Alternatively, is there any way to "chunk" the sprintf so it picks up mid-write? If that's possible it could fill the buffer, process it, then start refilling from where it left off.
The man page for snprintf says:
Return value
Upon successful return, these functions return the number of
characters printed (not including the trailing '\0' used to end
output to strings). The functions snprintf and vsnprintf do not
write more than size bytes (including the trailing '\0'). If
the output was truncated due to this limit then the return value
is the number of characters (not including the trailing '\0')
which would have been written to the final string if enough
space had been available. Thus, a return value of size or more
means that the output was truncated. (See also below under
NOTES.) If an output error is encountered, a negative value is
returned.
What this means is that you can call snprintf with a size of 0. Nothing will get written, and the return value will tell you how much space you need to allocate to your string:
int how_much_space = snprintf(NULL, 0, fmt_string, param0, param1, ...);
As others have mentioned, snprintf() will return the number of characters required in a buffer to prevent the output from being truncated. You can simply call it with a 0 buffer length parameter to get the required size then use an appropriately sized buffer.
For a slight improvement in efficiency, you can call it with a buffer that's large enough for the normal case and only do a second call to snprintf() if the output is truncated. In order to make sure the buffer(s) are properly freed in that case, I'll often use an auto_buffer<> object that handles the dynamic memory for me (and has the default buffer on the stack to avoid a heap allocation in the normal case).
If you're using a Microsoft compiler, MS has a non-standard _snprintf() that has serious limitations of not always null terminating the buffer and not indicating how big the buffer should be.
To work around Microsoft's non-support, I use a nearly public domain snprintf() from Holger Weiss.
Of course if your non-MS C or C++ compiler is missing snprintf(), the code from the above link should work just as well.
I would use a two-stage approach. Generally, a large percentage of output strings will be under a certain threshold and only a few will be larger.
Stage 1, use a reasonable sized static buffer such as 4K. Since snprintf() can restrict how many characters are written, you won't get a buffer overflow. What you will get returned from snprintf() is the number of characters it would have written if your buffer had been big enough.
If your call to snprintf() returns less than 4K, then use the buffer and exit. As stated, the vast majority of calls should just do that.
Some will not and that's when you enter stage 2. If the call to snprintf() won't fit in the 4K buffer, you at least now know how big a buffer you need.
Allocate, with malloc(), a buffer big enough to hold it then snprintf() it again to that new buffer. When you're done with the buffer, free it.
We worked on a system in the days before snprintf() and we acheived the same result by having a file handle connected to /dev/null and using fprintf() with that. /dev/null was always guaranteed to take as much data as you give it so we would actually get the size from that, then allocate a buffer if necessary.
Keep in kind that not all systems have snprintf() (for example, I understand it's _snprintf() in Microsoft C) so you may have to find the function that does the same job, or revert to the fprintf /dev/null solution.
Also be careful if the data can be changed between the size-checking snprintf() and the actual snprintf() to the buffer (i.e., wathch out for threads). If the sizes increase, you'll get buffer overflow corruption.
If you follow the rule that data, once handed to a function, belongs to that function exclusively until handed back, this won't be a problem.
For what it's worth, asprintf is a GNU extension that manages this functionality. It accepts a pointer as an output argument, along with a format string and a variable number of arguments, and writes back to the pointer the address of a properly-allocated buffer containing the result.
You can use it like so:
#define _GNU_SOURCE
#include <stdio.h>
int main(int argc, char const *argv[])
{
char *hi = "hello"; // these could be really long
char *everyone = "world";
char *message;
asprintf(&message, "%s %s", hi, everyone);
puts(message);
free(message);
return 0;
}
Hope this helps someone!
Take a look at CodeProject: CString-clone Using Standard C++. It uses solution you suggested with enlarging buffer size.
// -------------------------------------------------------------------------
// FUNCTION: FormatV
// void FormatV(PCSTR szFormat, va_list, argList);
//
// DESCRIPTION:
// This function formats the string with sprintf style format-specs.
// It makes a general guess at required buffer size and then tries
// successively larger buffers until it finds one big enough or a
// threshold (MAX_FMT_TRIES) is exceeded.
//
// PARAMETERS:
// szFormat - a PCSTR holding the format of the output
// argList - a Microsoft specific va_list for variable argument lists
//
// RETURN VALUE:
// -------------------------------------------------------------------------
void FormatV(const CT* szFormat, va_list argList)
{
#ifdef SS_ANSI
int nLen = sslen(szFormat) + STD_BUF_SIZE;
ssvsprintf(GetBuffer(nLen), nLen-1, szFormat, argList);
ReleaseBuffer();
#else
CT* pBuf = NULL;
int nChars = 1;
int nUsed = 0;
size_type nActual = 0;
int nTry = 0;
do
{
// Grow more than linearly (e.g. 512, 1536, 3072, etc)
nChars += ((nTry+1) * FMT_BLOCK_SIZE);
pBuf = reinterpret_cast<CT*>(_alloca(sizeof(CT)*nChars));
nUsed = ssnprintf(pBuf, nChars-1, szFormat, argList);
// Ensure proper NULL termination.
nActual = nUsed == -1 ? nChars-1 : SSMIN(nUsed, nChars-1);
pBuf[nActual+1]= '\0';
} while ( nUsed < 0 && nTry++ < MAX_FMT_TRIES );
// assign whatever we managed to format
this->assign(pBuf, nActual);
#endif
}
I've looked for the same functionality you're talking about, but as far as I know, something as simple as the C99 method is not available in C++, because C++ does not currently incorporate the features added in C99 (such as snprintf).
Your best bet is probably to use a stringstream object. It's a bit more cumbersome than a clearly written sprintf call, but it will work.
Since you're using C++, there's really no need to use any version of sprintf. The simplest thing to do is use a std::ostringstream.
std::ostringstream oss;
oss << a << " " << b << std::endl;
oss.str() returns a std::string with the contents of what you've written to oss. Use oss.str().c_str() to get a const char *. It's going to be a lot easier to deal with in the long run and eliminates memory leaks or buffer overruns. Generally, if you're worrying about memory issues like that in C++, you're not using the language to its full potential, and you should rethink your design.