File loader problems - c++

i have a text file which contains authors and books lists, i need to load it to my program, here is the code of the method which should load it:
void Loader::loadFile(const char* path)
{
FILE* file = fopen(path, "r");
char* bufferString;
while (feof(file) != 1) {
fgets(bufferString, 1000, file);
printf("%s", bufferString);
}
}
I use it in my main file:
int main(int argc, char** argv) {
Loader* loader = new Loader();
loader->loadFile("/home/terayon/prog/parser/data.txt");
return 0;
}
And I get data.txt file is not completely printed.
What I should do to get data completed?

fgets reads into the memory pointed to by the pointer passed as first parameter, bufferString on your case.
But your bufferString is an uninitialised pointer (leading to undefined behaviour):
char * bufferString;
// not initialised,
// and definitely not pointing to valid memory
So you need to provide some memory to read into, e.g by making it an array:
char bufferString[1000];
// that's a bit large to store on the stack
As a side note: Your code is not idiomatic C++. You're using the IO functions provided by the C standard library, which is possible, but using the facilities of the C++ STL would be more appropriate.

You have undefined behavior, you have a pointer bufferString but you never actually make int point anywhere. Since it's not initialized its value will be indeterminate and will seem to be random, meaning you will write to unallocated memory in the fgets call.
It's easy to solve though, declare it as an array, and use the array size when calling fgets:
char bufferString[500];
...
fgets(bufferString, sizeof(bufferString), file);
Besides the problem detailed above, you should not do while(!feof(file)), it will not work as you expect it to. The reason is that the EOF flag is not set until you try to read from beyond the file, leading the loop to iterate once to many.
You should instead do e.g. while (fgets(...) != NULL)
The code you have is not very C++-ish, instead it's using the old C functions for file handling. Instead I suggest you read more about the C++ standard I/O library and std::string which is a auto-expanding string class that won't have the limits of C arrays, and won't suffer from potential buffer overflows in the same way.
The code could then look something like this
std::ifstream input_file(path);
std::string input_buffer;
while (std::getline(input_file, input_buffer))
std::cout << input_buffer << '\n';

Related

Reading contents of file into dynamically allocated char* array- can I read into std::string instead?

I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 where std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
It is undefined behavior to write into the const char* returned by std::string::data(). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
And pointed by John Zwinck, .data() return a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() return a pointer to char (in this case), not to const char.
size() and resize() are working in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permit this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), 7); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.

Converting file to char* Issue

I am a beginner at C++ programming, and I encountered an issue.
I want to be able to convert the contents of a file to a char*, and I used file and string streams. However, it's not working.
This is my function that does the work:
char* fileToChar(std::string const& file){
std::ifstream in(file);
if (!in){
std::cout << "Error: file does not exist\n";
exit(EXIT_FAILURE);
}
std::stringstream buffer;
buffer << in.rdbuf() << std::flush;
in.close();
return const_cast<char *>(buffer.str().c_str());
}
However, when I test the method out by outputting its contents into another file like this:
std::ofstream file("test.txt");
file << fileToChar("fileTest.txt");
I just get tons of strange characters like this:
îþîþîþîþîþîþîþîþîþîþîþîþîþîþîþîþîþ[...etc]
What exactly is going on here? Is there anything I missed?
And if there's a better way to do this, I would be glad to know!
return const_cast<char *>(buffer.str().c_str());
returns a pointer to the internal char buffer of a temporary copy of the internal buffer of the local stringstream. Long story short: As soon as you exit the function, this pointer points to garbage.
Btw, even if that was not a problem, the const_cast would be dangerous nonsense, you are not allowed to write through the pointer std::string::c_str returns. Legitimate uses of const_cast are extremely rare.
And for the better way: The best and easiest way would be returning std::string. Only if this is not allowed, a std::vector<char> (preferred) or new char[somelength] (frowned on) would be viable solutions.
char* fileToChar(std::string const& file){
This line already shows that something is going into the wrong direction. You return a pointer to some string, and it's completely unclear to the user of the function who is responsible for releasing the allocated memory, if it has to be released at all, if nullptr can be returned, and so on.
If you want a string, then by all means use std::string!
std::string fileToChar(std::string const& file){
return const_cast<char *>(buffer.str().c_str());
Another line that should make all alarms go off. const_cast is always a workaround to some underlying problem (or some problem with external code).
There is usually a good reason why something is const. By forcing the compiler to turn off the security check and allowing it to attempt modifications of unmodifiable data, you typically turn compilation errors into hard-to-diagnose run-time errors.
Even if this function worked correctly, any attempt to modify the result would be undefined behaviour:
char* file_contents = fileToChar("foo.txt");
file_contents[0] = 'x'; // undefined behaviour
But it does not work correctly anyway. buffer.str() returns a temporary std::string object. c_str() returns a pointer to that temporary object's internally managed memory. The object's lifetime ends when the full expression return const_cast<char *>(buffer.str().c_str()) has been evaluated. Using the resulting pointer is therefore undefined behaviour, too.
The problems sound complicated, but the fix is easy. Make the function return std::string and turn the last statement into return buffer.str();.
If your question is, how to read the content of a file into an buffer, consider my following suggestion. But take care that buffer is big enough for the file content. A file size check and preallocation of the memory is advised before calling fileToChar().
bool fileToChar(std::string const& file, char* buffer, unsigned int &buffer_size )
{
FILE *f = fopen( file.c_str(), "rb" );
if( f == nullptr )
{
return false;
}
fseek(f , 0, SEEK_END );
const int size = ftell( f );
rewind( f );
fread( buffer, 1, size, f );
fclose( f );
return true;
}

Memory Managing C

I wrote a function, using the C header stdio.h that returns content of a file (text or html). Can anyone please go through it and suggest if I have done the memory management efficiently. I shall be so pleased to hear better suggestions that I can improve my codes.
char *getFileContent(const char *filePath)
{
//Prepare read file
FILE *pReadFile;
long bufferReadSize;
char *bufferReadFileHtml;
size_t readFileSize;
char readFilePath[50];
sprintf_s(readFilePath, "%s", filePath);
pReadFile = fopen (readFilePath, "rb");
if (pReadFile != NULL)
{
// Get file size.
fseek (pReadFile , 0 , SEEK_END);
bufferReadSize = ftell (pReadFile);
rewind (pReadFile);
// Allocate RAM to contain the whole file:
bufferReadFileHtml = (char*) malloc (sizeof(char) * bufferReadSize);
if (bufferReadFileHtml != NULL)
{
// Copy the file into the buffer:
readFileSize = fread (bufferReadFileHtml, sizeof(char), bufferReadSize, pReadFile);
if (readFileSize == bufferReadSize)
{
return bufferReadFileHtml;
} else {
char errorBuffer[50];
sprintf_s(errorBuffer, "Error! Buffer overflow for file: %s", readFilePath);
}
} else {
char errorBuffer[50];
sprintf_s(errorBuffer, "Error! Insufficient RAM for file: %s", readFilePath);
}
fclose (pReadFile);
free (bufferReadFileHtml);
} else {
char errorBuffer[50];
sprintf_s(errorBuffer, "Error! Unable to open file: %s", readFilePath);
}
}
This looks like a C program, not a C++ program. While it will compile using most C++ compilers, it doesn't take advantage of any C++ features (e.g. new/new[], delete/delete[], explicit casting, stream operators, strings, nullptr etc.)
Your code almost looks like a safe C function, although I think sprintf_s is a Microsoft-only function, and thus probably won't compile using GCC, Clang, Intel, etc. as it isn't a part of the standard.
Your function should also return a value at all times. Turn compiler warnings on to catch these kinds of things; they make debugging a lot easier :)
There's not much to be said without knowing how you will be using the buffer you've created. Here are some possible considerations:
1) When you are reading the file into a buffer, your processor is doing nothing else for this program. It might be better to read and begin analyzing the already read portion in parallel.
2) If you need really fast-efficient-low memory file IO, consider converting your program to a state machine and forget about the buffer altogether.
3) If you don't really have a very demanding application, you are killing yourself by writing in C. C#, python, etc-- almost any other language has better string manipulation libraries.
Btw, you should use snprintf for portability and safety, as others have pointed out.

Read .part files and concatenate them all

So I am writing my own custom FTP client for a school project. I managed to get everything to work with the swarming FTP client and am down to one last small part...reading the .part files into the main file. I need to do two things. (1) Get this to read each file and write to the final file properly (2) The command to delete the part files after I am done with each one.
Can someone please help me to fix my concatenate function I wrote below? I thought I had it right to read each file until the EOF and then go on to the next.
In this case *numOfThreads is 17. Ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out);
for (int i = 0; i < 17; ++i)
{
std::ifstream in;
std::ostringstream convert;
convert << i;
std::string t = s + ".part" + convert.str();
in.open(t.c_str(), std::ios::in | std::ios::binary);
int size = 32*1024;
char *tempBuffer = new char[size];
if (in.good())
{
while (in.read(tempBuffer, size))
out.write(tempBuffer, in.gcount());
}
delete [] tempBuffer;
in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. Not much more to say than that.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
write(amount that was read);
In C++ that will typically be something like:
while (infile.read(buffer, buffer_size))
outfile.write(buffer, infile.gcount());
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.

Is it possible to use an std::string for read()?

Is it possible to use an std::string for read() ?
Example :
std::string data;
read(fd, data, 42);
Normaly, we have to use char* but is it possible to directly use a std::string ? (I prefer don't create a char* for store the result)
Thank's
Well, you'll need to create a char* somehow, since that's what the
function requires. (BTW: you are talking about the Posix function
read, aren't you, and not std::istream::read?) The problem isn't
the char*, it's what the char* points to (which I suspect is what
you actually meant).
The simplest and usual solution here would be to use a local array:
char buffer[43];
int len = read(fd, buffer, 42);
if ( len < 0 ) {
// read error...
} else if ( len == 0 ) {
// eof...
} else {
std::string data(buffer, len);
}
If you want to capture directly into an std::string, however, this is
possible (although not necessarily a good idea):
std::string data;
data.resize( 42 );
int len = read( fd, &data[0], data.size() );
// error handling as above...
data.resize( len ); // If no error...
This avoids the copy, but quite frankly... The copy is insignificant
compared to the time necessary for the actual read and for the
allocation of the memory in the string. This also has the (probably
negligible) disadvantage of the resulting string having an actual buffer
of 42 bytes (rounded up to whatever), rather than just the minimum
necessary for the characters actually read.
(And since people sometimes raise the issue, with regards to the
contiguity of the memory in std:;string: this was an issue ten or more
years ago. The original specifications for std::string were designed
expressedly to allow non-contiguous implementations, along the lines of
the then popular rope class. In practice, no implementor found this
to be useful, and people did start assuming contiguity. At which point,
the standards committee decided to align the standard with existing
practice, and require contiguity. So... no implementation has ever not
been contiguous, and no future implementation will forego contiguity,
given the requirements in C++11.)
No, you cannot and you should not. Usually, std::string implementations internally store other information such as the size of the allocated memory and the length of the actual string. C++ documentation explicitly states that modifying values returned by c_str() or data() results in undefined behaviour.
If the read function requires a char *, then no. You could use the address of the first element of a std::vector of char as long as it's been resized first. I don't think old (pre C++11) strings are guarenteed to have contiguous memory otherwise you could do something similar with the string.
No, but
std::string data;
cin >> data;
works just fine. If you really want the behaviour of read(2), then you need to allocate and manage your own buffer of chars.
Because read() is intended for raw data input, std::string is actually a bad choice, because std::string handles text. std::vector seems like the right choice to handle raw data.
Using std::getline from the strings library - see cplusplus.com - can read from an stream and write directly into a string object. Example (again ripped from cplusplus.com - 1st hit on google for getline):
int main () {
string str;
cout << "Please enter full name: ";
getline (cin,str);
cout << "Thank you, " << str << ".\n";
}
So will work when reading from stdin (cin) and from a file (ifstream).