So I am writing my own custom FTP client for a school project. I managed to get everything working with the swarming FTP client and am down to one last small part...reading the .part files into the main file. I need to do two things: (1) get this to read each part file and write it to the final file properly, and (2) delete each part file once I'm done with it.
Can someone please help me to fix my concatenate function I wrote below? I thought I had it right to read each file until the EOF and then go on to the next.
In this case *numOfThreads is 17. Ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out | std::ios::binary); // binary, to match the part files
for (int i = 0; i < 17; ++i)
{
    std::ifstream in;
    std::ostringstream convert;
    convert << i;
    std::string t = s + ".part" + convert.str();
    in.open(t.c_str(), std::ios::in | std::ios::binary);
    int size = 32*1024;
    char *tempBuffer = new char[size];
    if (in.good())
    {
        while (in.read(tempBuffer, size))
            out.write(tempBuffer, in.gcount());
        out.write(tempBuffer, in.gcount()); // don't drop the final partial chunk
    }
    delete [] tempBuffer;
    in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. eof() only becomes true after a read has already failed, so the loop body executes one extra time, operating on whatever stale data is left in the buffer.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This is the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
    write(amount that was read);
In C++ that will typically be something like:
while (infile.read(buffer, buffer_size))
    outfile.write(buffer, infile.gcount());
One caveat: when the last read fills only part of the buffer, the stream tests false even though gcount() bytes were extracted, so follow the loop with one more write of gcount() bytes to pick up that final partial chunk.
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.
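Putting the pieces together, here's a minimal sketch of the whole concatenation using std::vector for the buffer (no new/delete to leak); the file name, part count, and ".partN" naming scheme are assumed from the question, and std::remove from &lt;cstdio&gt; handles part (2), deleting each part file after it has been copied:

#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    const std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
    std::ofstream out(s.c_str(), std::ios::out | std::ios::binary);
    std::vector<char> buffer(32 * 1024); // freed automatically when it goes out of scope

    for (int i = 0; i < 17; ++i)
    {
        std::ostringstream convert;
        convert << i;
        std::string t = s + ".part" + convert.str();
        {
            std::ifstream in(t.c_str(), std::ios::in | std::ios::binary);
            while (in.read(&buffer[0], buffer.size()))
                out.write(&buffer[0], in.gcount());
            out.write(&buffer[0], in.gcount()); // final partial chunk
        } // part file closes here...
        std::remove(t.c_str()); // ...so it can be deleted
    }
}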
I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
    // Focus on the following code here
    char *buf = new char[7];
    char buf2[]{"MYFILET"}; // just some random string
                            // if we see this it's a good indication
                            // the rest of the file will be in the
                            // expected format (unlikely to see this
                            // sequence in a "random file", but don't
                            // worry too much about this)
    iofile.read(buf, 7);
    if(memcmp(buf, buf2, 7) == 0) // I am confident this works
    {
        // carry on processing file ...
        // ...
        // ...
    }
}
else
    cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file that has some specified format (which I've dictated). We do an initial check to make sure the string "MYFILET" is at the start of the file, because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 were std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data(), which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting that we should use an iterator of some sort? I think in Visual Studio (for Windows users, which I am not) this may return a checked iterator anyway, which may give warnings or errors -- I'm not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
It is undefined behavior to write into the const char* returned by std::string::data() (before C++17; C++17 added a non-const data() overload). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
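For instance, a sketch assuming a text format in which '\0' never occurs in the payload:

std::string buf;
std::getline(iofile, buf, '\0'); // grows buf automatically, reading until '\0' or EOF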
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
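A minimal sketch of that vector-based variant, reusing the question's iofile and 7-byte signature (needs &lt;vector&gt; and &lt;algorithm&gt;):

std::vector<char> buf(7);
iofile.read(buf.data(), buf.size()); // vector::data() is non-const, so writing here is well-defined
const char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
bool ok = iofile.gcount() == 7 && std::equal(buf.begin(), buf.end(), sig);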
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
And as pointed out by John Zwinck, .data() returns a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong), .data() returns a pointer to char (in this case), not to const char.
size() and resize() work the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permits this.
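read() indeed doesn't. For what it's worth, a sketch with std::istreambuf_iterator (from &lt;iterator&gt;) and std::copy_n (from &lt;algorithm&gt;) lets the string grow by itself; it assumes the stream really holds at least 7 characters:

std::string buf;
std::copy_n(std::istreambuf_iterator<char>(iofile), 7,
            std::back_inserter(buf)); // buf sizes itself as characters arrive
// caveat: copy_n increments the iterator only n-1 times, so whether the 7th
// character has been consumed from the stream afterward is a known gotcha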
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf into a string) and without allocating from the heap...
// terminating null not included; needs <algorithm> and <iterator> for std::all_of / std::begin
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = std::all_of(std::begin(sig), std::end(sig),
                      [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
    std::string buf;
    buf.resize(count);
    src.read(&buf.front(), count); // was hardcoded 7; in C++17 make it buf.data()
    return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, but I wasn't completely satisfied that there was an answer that would produce undefined-behaviour-free code under both C++11 and C++17. I currently write most of my code in C++11, because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I can add a new compiler or change machines.
However, it does seem to me a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
    char *const buffer = new char[n + 1];
    src.read(buffer, n);
    buffer[src.gcount()] = '\0'; // terminate after what was actually read, in case of a short read
    std::string ret(buffer);     // note: an embedded '\0' in the data still truncates here
    delete [] buffer;
    return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.
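For what it's worth, here's a sketch of a variant that's already well-defined in C++11 and avoids the extra copy: std::string storage is guaranteed contiguous there, and non-const operator[] gives writable access, so the string itself can serve as the buffer (error handling still omitted, as above):

std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
    std::string ret(n, '\0');  // allocate n bytes inside the string itself
    src.read(&ret[0], n);      // writing through &ret[0] is legal in C++11
    ret.resize(src.gcount());  // shrink to what was actually read
    return ret;
}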
I apologize if this doesn't make sense. I am not sure what to google.
Let's say I have two arrays:
string a_1[16];
string a_2[20];
I need to output these to a file with a function: first a_1[0] through a_1[n], then the a_2's.
It's also possible to run the function again to add more a_1's and a_2's to the output file.
so the format will be:
//output_file.txt
a_1[0].....a_1[n]
a_2[0].....a_2[M]
a_1[n+1]...a_1[16]
a_2[M+1]...a_2[20]
My question is: is there a way to read output_file.txt back in so that all of the a_1's are read in order, a_1[0] to a_1[16],
and then a_2[0] to a_2[20] are read in?
Maybe just put "something" between each group, so that when "something" is read, the reader knows to stop reading a_1's and switch to reading in the a_2's....
What the OP calls "Something" is typically called a Sentinel or Canary value. To be used as a sentinel, you have to find a pattern that cannot exist in the data stream. This is hard because pretty much anything can be in a string. If you use, say, "XxXxXx" as your sentinel, then you have to be very careful that it is never written to the file.
The concept of Escape Characters (Look it up) can be used here, but a better approach could be to store a count of stored strings at the beginning of the file. Consider an output file that looks like
4
string a1_1
string a1_2
string a1_3
string a1_4
2
string a2_1
string a2_2
Read the count, four, and then read count strings; then read the next count, two, and read count more strings.
OK, so you're thinking this sucks: I can't just insert a new string into a1 without also changing the number at the front of the file.
Well, good luck with inserting data into the middle of a file without totally smurfing up the file. It can be done, but only after moving everything after the insertion over by the size of the insertion, and that's not as trivial as it sounds. At the point in a programming career where this is the sort of task to which you are assigned, and you have to ask for help, you are pretty much doomed to reading the file into memory, inserting the new values, and writing the file back out again, so just go with it.
So what does this look like in code? First we ditch the arrays in favour of std::vector. Vectors are smart. They grow to fit. They know how much stuff is in them. They look after themselves so there is no unnecessary new and delete nonsense. You gotta be stupid not to use them.
Reading:
std::ifstream infile(file name);
std::vector<std::string> input;
int count;
if (infile >> count)
{
    infile.ignore(); // discard end of line
    std::string line;
    while (input.size() < static_cast<size_t>(count) && getline(infile, line))
    {
        input.push_back(line);
    }
    if (input.size() != static_cast<size_t>(count)) // cast avoids a signed/unsigned comparison
    {
        // handle bad file
    }
}
else
{
    // handle bad file
}
and writing
std::ofstream outfile(file name);
if (outfile << output.size() << '\n') // newline after the count, so the first string starts its own line
{
    for (std::string & out: output)
    {
        if (!(outfile << out << '\n')) // parenthesized: ! would otherwise bind before <<
        {
            // handle write error
        }
    }
}
else
{
    // handle write error
}
But this looks like homework, so OP's probably not allowed to use one. In that case, the logic is the same, but you have to
std::unique_ptr<std::string[]> inarray(new std::string[count]);
or
std::string * inarray = new std::string[count];
to allocate storage for the strings you are reading in. The second one looks like less work than the first. Looks are deceiving. The first one looks after your memory for you. The second requires at least one delete[] in your code, at the right place, to put the memory away. Miss it and you have a memory leak.
You also need to have a variable keeping track of the size of the array, because pointers don't know how big whatever they are pointing at is. This makes the write for loop less convenient.
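For what it's worth, a sketch of the array version (same count-first file layout as above; needs &lt;memory&gt; for std::unique_ptr):

int count;
infile >> count;  // same count-first format as before
infile.ignore();  // discard end of line

std::unique_ptr<std::string[]> inarray(new std::string[count]);
int read_in = 0;
while (read_in < count && getline(infile, inarray[read_in]))
    ++read_in;
if (read_in != count)
{
    // handle bad file
}
// from here on, count has to travel alongside inarray;
// the pointer alone doesn't know how many strings it owns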
I use the following code (taken from https://codereview.stackexchange.com/questions/22901/reading-all-bytes-from-a-file) to efficiently read a file into an array in C++:
static std::vector<char> ReadAllBytes(char const* filename)
{
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate);
    std::ifstream::pos_type pos = ifs.tellg();
    std::vector<char> result(pos);
    ifs.seekg(0, std::ios::beg);
    ifs.read(&result[0], pos);
    for (unsigned int i = 0; i < result.size(); i++)
        if (result.at(i) != 0)
            return result; // [1]
    //if (!ifs.good()) // Commenting out changes contents of result
    //    return result; // [2]
    return result; // [3]
}
Everything works perfectly: a breakpoint at [1] fires and the function returns the data (the loop is just for debugging, since I had been getting 0-filled returns that should hold data). However, as soon as I remove the if-statement at [2], breakpoint [3] fires and the array is empty (the size is correct, but the array is filled with zeros).
How can code which is never executed actually change the behavior of the function? I figured it might have something to do with the stack layout and the fact that I hold the stream and the data as local variables, but manually creating them on the heap leads to the exact same situation.
I'm completely baffled. I have never seen anything quite like this before. How can this be possible?
PS: I should add that the file contents are binary and the file is about 32 MB in size.
Are you compiling with optimizations (release mode)? If so, then if I had to guess, I would say it's reordering your code. First off, notice that none of the code after the read actually matters. All it's doing is returning result in all cases and not changing anything in the array. Many of the functions could be inlined which means a good compiler could know this and remove all that code.
So that could easily explain the breakpoint behavior. But adding the if (!ifs.good()) and having the data be empty - that's tough to explain. Perhaps this tidbit about the read function provides insight:
Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to true). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
Notice that read contains a check on good(), which means that perhaps the compiler is combining the two checks of good() (the one in the read, after inlining, and the one in your code) and using that to skip the read entirely(?). That seems really unlikely, but perhaps it points you in a direction to debug.
I will say that I've seen this problem before with the optimizer. Not this specific problem with ifstreams but the more general problem that adding seemingly unused code changes the behavior. And the reason always ends up being that it's free to reorder code that doesn't change the results.
For anyone wondering: I could not pinpoint the reason for the strange behavior of the C++ compiler here (maybe I did indeed find a bug?).
However, as soon as the file is read in chunks, everything works flawlessly. I'll post the code for your convenience:
static bool ReadAllBytes(char const* filename, char** result, size_t* size)
{
    const size_t BUFFER_SIZE = 32;
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate | std::ios::in);
    *size = ifs.tellg();
    *result = new char[*size];
    ifs.seekg(0, std::ios::beg);
    size_t operations = *size / BUFFER_SIZE; // number of full chunks
    size_t chunks = 0;
    for (; chunks < operations && ifs.good(); chunks++)
        ifs.read(*result + chunks * BUFFER_SIZE, BUFFER_SIZE);
    if (ifs.good() && *size % BUFFER_SIZE > 0) // trailing partial chunk
        ifs.read(*result + chunks * BUFFER_SIZE, *size % BUFFER_SIZE);
    return (ifs.good());
}
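For comparison, a sketch of the single-read version with the error checks the original was missing; whether this sidesteps the optimizer behavior observed above is untested:

static std::vector<char> ReadAllBytes(char const* filename)
{
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate);
    if (!ifs)
        return std::vector<char>(); // open failed

    std::ifstream::pos_type pos = ifs.tellg();
    std::vector<char> result(pos);
    ifs.seekg(0, std::ios::beg);
    if (!result.empty() && !ifs.read(&result[0], result.size()))
        result.clear(); // read failed; don't return garbage
    return result;
}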
I have a function designed to get a file's contents:
bool getFileContents(std::string loc, std::string &code) {
    std::ifstream file(loc.c_str());
    if(!file.is_open())
        return err("Source file could not be read");
    int length;
    file.seekg(0, std::ios::end);
    length = file.tellg();
    file.seekg(0, std::ios::beg);
    char *buffer = new char[length];
    file.read(buffer, length);
    code = buffer;
    delete[] buffer;
    file.close();
    return true;
}
When I run this function, the file's length is always retrieved accurately. However, if I call the function once with a file, call it again with a nonexistent file, and then call it once more with the original file, the character string 'buffer' ends up larger than the int 'length'.
Well, that may not be accurate; rather, when the string 'buffer' is copied to the string 'code', 'code' ends up longer than 'length'. 'code' is, in each instance, instantiated immediately before the call to getFileContents, so it's not a matter of a leftover previous value.
This also seems to occur if I retrieve the contents of a file, subsequently add or remove some text from the file, and retrieve the same file's contents again.
I have little experience with character strings, and figure that I'm not using them correctly, but the code I'm using came from an example, and I can't for the life of me find anything wrong with it.
Thanks for any help,
Wyatt
Well, the problem is that code = buffer relies on a NUL (\0) character to know where the buffer ends. You may be getting the NUL character by chance sometimes (esp. when the program has just started), but not always. Hence the intermittent behaviour.
Try replacing code = buffer with code = std::string(buffer, length).
Apart from the \0 problem described by aix, you do a double allocation, which is unnecessary here and unsafe (an exception might be thrown before the delete, and you'll have a memory leak). Instead, you can allocate the buffer inside the string, as follows:
code.resize(length);
file.read(&code[0], length);
And don't forget to check the return value of read. It is not guaranteed that all length bytes will be read in one step.
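Putting both fixes together, a sketch of the corrected function (keeping the question's err helper; opening in binary so tellg's byte count matches what read delivers on Windows, and shrinking to gcount() in case of a short read):

bool getFileContents(std::string loc, std::string &code) {
    std::ifstream file(loc.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open())
        return err("Source file could not be read");

    file.seekg(0, std::ios::end);
    std::streamsize length = file.tellg();
    file.seekg(0, std::ios::beg);

    code.resize(length);
    file.read(&code[0], length);
    code.resize(file.gcount()); // in case fewer than length bytes arrived
    return true;
}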
I'm currently unpacking one of Blizzard's .mpq files for reading.
For accessing the unpacked char buffer, I'm using a boost::interprocess::stream::memorybuffer.
Because .mpq files have a chunked structure, always beginning with a version header (usually 12 bytes, see http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format#2.2_Archive_Header), the char* array representation seems to truncate at the first \0, even though the file size (about 1.6 MB) remains constant and (probably) stays fully allocated.
The result is a stream buffer with an effective length of 4 ('REVM', and byte no. 5 is \0). When attempting to read further, an exception is thrown. Here's an example:
// (somewhere in the code)
{
    MPQFile curAdt(FilePath);
    size_t size = curAdt.getSize(); // roughly 1.6 mb
    bufferstream memorybuf((char*)curAdt.getBuffer(), curAdt.getSize());
    // the bufferstream's m_buf.m_buffer is now 'REVM\0' (the debugger says so),
    // but the internal length field is still at 1.6 mb
}

//////////////////////////////////////////////////////////////////////////////
// wrapper around a file of the mpq_archive of libmpq
MPQFile::MPQFile(const char* filename) // I apologize for my inconsistent naming convention :P
{
    for(ArchiveSet::iterator i=gOpenArchives.begin(); i!=gOpenArchives.end();++i)
    {
        // gOpenArchives points to MPQArchive, a wrapper around the mpq_archive,
        // which has mpq_archive * mpq_a as a member
        mpq_archive &mpq_a = (*i)->mpq_a;

        // check whether the file exists in that archive, via the hash table;
        // not important here, scroll down if you want
        mpq_hash hash = (*i)->GetHashEntry(filename);
        uint32 blockindex = hash.blockindex;
        if ((blockindex == 0xFFFFFFFF) || (blockindex == 0)) {
            continue; // file not found
        }
        uint32 fileno = blockindex;

        // Found!
        size = libmpq_file_info(&mpq_a, LIBMPQ_FILE_UNCOMPRESSED_SIZE, fileno);
        // HACK: in patch.mpq some files don't want to open and give 1 for filesize
        if (size<=1) {
            eof = true;
            buffer = 0;
            return;
        }
        buffer = new char[size]; // note: size is 1.6 mb at this time

        // Now here comes the tricky part... if I step over the libmpq_file_getdata
        // function, I'll get my truncated char array, which I absolutely don't want^^
        libmpq_file_getdata(&mpq_a, hash, fileno, (unsigned char*)buffer);
        return;
    }
}
Maybe someone could help me. I'm really new to STL and Boost programming, and inexperienced in C++ programming generally :P I hope to get a usable answer (please don't suggest rewriting libmpq and the underlying zlib architecture^^).
The MPQFile class and the underlying uncompress methods are actually taken from a working project, so the mistake is either somewhere in my use of the buffer with the streambuffer class, or something internal to char array arithmetic that I haven't a clue about.
By the way, what is the difference between using signed and unsigned chars as data buffers? Does it have anything to do with my problem? (You might see that in the code, char* and unsigned char* are used interchangeably as function arguments.)
If you need more info, feel free to ask :)
How are you determining that your char* array is being 'truncated', as you call it? If you're printing it or viewing it in a debugger, it will look truncated because it is being treated like a string, which is terminated by \0. The data in 'buffer', however (assuming libmpq_file_getdata() does what it's supposed to do), will contain the whole file or data chunk or whatever.
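A quick way to convince yourself (a sketch; buffer and size as in the question's code, using strlen from &lt;cstring&gt; and std::cout from &lt;iostream&gt; purely for demonstration):

std::cout << "strlen sees: " << strlen(buffer) << " bytes\n"; // stops at the first '\0', e.g. 4 for "REVM"
std::cout << "actual size: " << size << " bytes\n";           // ~1.6 MB, all still allocated
std::cout.write(buffer, size);                                // writes every byte, embedded '\0's included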
Sorry, I messed up a bit with these terms (not memorybuffer, actually; streambuffer is meant, as in the code).
Yeah, you were right... I had a mistake in my exception handling. Right after that first bit of code comes this:
// check if the file has been opened
//if (!mpf.is_open())
pair<char*, size_t> temp = memorybuf.buffer();
if(temp.first)
    throw AdtException(ADT_PARSEERR_EFILE); // can't open the file
Notice the missing ! before temp.first. I was surprised by the exception thrown, looked at the streambuffer's internal buffer, and was confused by its length (C# background :P).
Sorry for that, it's working as expected now.