Converting file to char* Issue - c++

I am a beginner at C++ programming, and I encountered an issue.
I want to be able to convert the contents of a file to a char*, and I used file and string streams. However, it's not working.
This is my function that does the work:
char* fileToChar(std::string const& file){
std::ifstream in(file);
if (!in){
std::cout << "Error: file does not exist\n";
exit(EXIT_FAILURE);
}
std::stringstream buffer;
buffer << in.rdbuf() << std::flush;
in.close();
return const_cast<char *>(buffer.str().c_str());
}
However, when I test the method out by outputting its contents into another file like this:
std::ofstream file("test.txt");
file << fileToChar("fileTest.txt");
I just get tons of strange characters like this:
îþîþîþîþîþîþîþîþîþîþîþîþîþîþîþîþîþ[...etc]
What exactly is going on here? Is there anything I missed?
And if there's a better way to do this, I would be glad to know!

return const_cast<char *>(buffer.str().c_str());
returns a pointer to the internal char buffer of a temporary copy of the internal buffer of the local stringstream. Long story short: As soon as you exit the function, this pointer points to garbage.
By the way, even if that were not a problem, the const_cast would be dangerous nonsense: you are not allowed to write through the pointer std::string::c_str returns. Legitimate uses of const_cast are extremely rare.
And for the better way: The best and easiest way would be returning std::string. Only if this is not allowed, a std::vector<char> (preferred) or new char[somelength] (frowned on) would be viable solutions.
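As a minimal sketch of the std::vector<char> alternative: the name fileToVector is made up for illustration, and throwing an exception instead of calling exit is an assumption, not part of the original function.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file into a vector that owns its bytes.
std::vector<char> fileToVector(std::string const& file) {
    std::ifstream in(file, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + file);
    }
    // istreambuf_iterator reads raw characters without skipping whitespace.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The vector owns its memory, so the caller can write through data() for as long as the vector is alive, and nothing needs to be released by hand.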

char* fileToChar(std::string const& file){
This line already shows that something is going in the wrong direction. You return a pointer to some string, and it's completely unclear to the user of the function who is responsible for releasing the allocated memory, if it has to be released at all, if nullptr can be returned, and so on.
If you want a string, then by all means use std::string!
std::string fileToChar(std::string const& file){
return const_cast<char *>(buffer.str().c_str());
Another line that should make all alarms go off. const_cast is always a workaround to some underlying problem (or some problem with external code).
There is usually a good reason why something is const. By forcing the compiler to turn off the security check and allowing it to attempt modifications of unmodifiable data, you typically turn compilation errors into hard-to-diagnose run-time errors.
Even if this function worked correctly, any attempt to modify the result would be undefined behaviour:
char* file_contents = fileToChar("foo.txt");
file_contents[0] = 'x'; // undefined behaviour
But it does not work correctly anyway. buffer.str() returns a temporary std::string object. c_str() returns a pointer to that temporary object's internally managed memory. The object's lifetime ends when the full expression return const_cast<char *>(buffer.str().c_str()) has been evaluated. Using the resulting pointer is therefore undefined behaviour, too.
The problems sound complicated, but the fix is easy. Make the function return std::string and turn the last statement into return buffer.str();.
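Put together, the fix described here might look like the sketch below. The names fileToString and copyFile are hypothetical, and reporting a missing file with an exception is an assumption made for the example.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Returns the file contents by value, so the caller owns the data.
std::string fileToString(std::string const& file) {
    std::ifstream in(file);
    if (!in) {
        throw std::runtime_error("cannot open " + file);
    }
    std::stringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Mirrors the test from the question: copy one file into another.
void copyFile(std::string const& from, std::string const& to) {
    std::string const contents = fileToString(from);
    std::ofstream out(to);
    // contents.c_str() stays valid here because contents is still alive.
    out << contents.c_str();
}
```

If a C API really needs a const char*, calling c_str() on a std::string that is still in scope is safe. The pointer only dangles when it outlives the string, as it did in the original function.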

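If your question is how to read the contents of a file into a buffer, consider the following suggestion. The caller must preallocate buffer and pass its capacity in buffer_size. A file size check before calling fileToChar() is still advised, because the function returns false if the file does not fit. On success, buffer_size holds the number of bytes read.
#include <cstdio>
#include <string>
bool fileToChar(std::string const& file, char* buffer, unsigned int &buffer_size )
{
    FILE *f = fopen( file.c_str(), "rb" );
    if( f == nullptr )
    {
        return false;
    }
    fseek( f, 0, SEEK_END );
    const long size = ftell( f );
    rewind( f );
    // Refuse to read if the size is unknown or larger than the caller's buffer.
    if( size < 0 || static_cast<unsigned long>( size ) > buffer_size )
    {
        fclose( f );
        return false;
    }
    // Report the number of bytes actually read back through buffer_size.
    buffer_size = static_cast<unsigned int>( fread( buffer, 1, static_cast<size_t>( size ), f ) );
    fclose( f );
    return true;
}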

Related

a write function throwing the exception of read access violation in visual studio c++

I tried writing to a file, but the write fails with "Access violation reading location 0x0000000F". I was able to isolate the problem; here is the sample code:
void test_1() {
std::fstream fio{ "xyz.dat",std::ios::in | std::ios::out | std::ios::binary | std::ios::app };
if (!fio) {
std::cerr << "sorry no file";
return;
}
std::string s_test{ "xyz hii \n workf" };
fio.write( ( char* )(s_test.length()), sizeof( size_t ) ); //the write causing issue
}
( char* )(s_test.length()) treats the length of the string as though it were a pointer and passes address 15 into write. Since there is no valid character at address 15, this triggers undefined behaviour, and in this case the result is that the program crashes.
This is always a problem when forced to use such a wide cast to force re-interpretation of a type. You can screw up horribly and all the compiler's defenses have been turned off. I don't have a good solution for this.
You need to pass in a legitimate address containing the length for write to operate on. To get this, you'll need to create a variable you can take the address of. &s_test.length(); isn't good enough here because you cannot take the address of a prvalue returned by a function.
auto len = s_test.length();
fio.write( reinterpret_cast<const char*>(&len), sizeof( len ) );
Note that writing a variable of automatically deduced type or a variable of a type that can change between compiler implementations is risky. It's hard to be sure how many bytes you're going to need to read at the other side.
uint32_t len = s_test.length();
Would be safer, and probably more compact, but at the risk of overflow with strings greater than about 4.29 billion characters in length. That's a risk I'm willing to put up with.
Another concern is endianness. It's not as common a problem as it used to be, but both the writer and reader need to agree on the byte order of the integer. htonl and ntohl can help mitigate this threat by guaranteeing a byte order.
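To make the fixed-width and byte-order points concrete, here is a sketch that writes a 32-bit length prefix byte by byte in big-endian order, followed by the string. It assembles the bytes by hand instead of calling htonl so that it needs no platform header. The names write_prefixed and read_prefixed are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Writes a 32-bit big-endian length, then the characters of s.
bool write_prefixed(std::ostream& out, std::string const& s) {
    if (s.size() > std::numeric_limits<std::uint32_t>::max()) {
        return false;  // length does not fit into the 4-byte prefix
    }
    std::uint32_t const len = static_cast<std::uint32_t>(s.size());
    char const header[4] = {
        static_cast<char>((len >> 24) & 0xFF),
        static_cast<char>((len >> 16) & 0xFF),
        static_cast<char>((len >> 8) & 0xFF),
        static_cast<char>(len & 0xFF)};
    out.write(header, sizeof header);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
    return static_cast<bool>(out);
}

// Reads back what write_prefixed produced; a real reader should also
// sanity-check len before allocating that many bytes.
bool read_prefixed(std::istream& in, std::string& s) {
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), sizeof header)) {
        return false;
    }
    std::uint32_t const len = (std::uint32_t(header[0]) << 24) |
                              (std::uint32_t(header[1]) << 16) |
                              (std::uint32_t(header[2]) << 8) |
                              std::uint32_t(header[3]);
    std::string result(len, '\0');
    if (len != 0 && !in.read(&result[0], static_cast<std::streamsize>(len))) {
        return false;
    }
    s.swap(result);
    return true;
}
```

Because the prefix is always four bytes in a fixed order, the reader does not depend on the writer's size_t width or native byte order.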
Assuming that you are trying to write the length of the string to your output file, you can do it like this:
size_t len = s_test.length();
fio.write( ( const char * ) &len, sizeof( size_t ) );

Reading contents of file into dynamically allocated char* array- can I read into std::string instead?

I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 were std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
Before C++17, std::string::data() returns a const char*, and writing through it is undefined behavior (C++17 added a non-const overload). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
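A sketch of the std::vector<char> approach for the signature check follows. The function name starts_with_signature is hypothetical. The buffer length is derived from the signature itself, so the number 7 is never repeated.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads signature.size() bytes from the stream and compares them to signature.
bool starts_with_signature(std::istream& in, std::string const& signature) {
    std::vector<char> buf(signature.size());
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    // gcount() tells us whether the stream really delivered all the bytes.
    if (in.gcount() != static_cast<std::streamsize>(buf.size())) {
        return false;
    }
    return std::equal(buf.begin(), buf.end(), signature.begin());
}
```

Checking gcount() also covers files that are shorter than the signature, which the original length test handled separately.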
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
As pointed out by John Zwinck, .data() returns a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() returns a pointer to char (in this case), not to const char.
size() and resize() are working in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permits this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), count); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.
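For comparison, here is a sketch that reads straight into the string's own storage, which avoids the temporary new[] buffer. Since C++11 a std::string stores its characters contiguously, so &buf[0] is a writable pointer to size() characters. This version takes a std::istream& rather than a std::fstream&, which is a change made for the example. Note that constructing a std::string from a char* stops at the first null byte, so binary data containing '\0' would be cut short; resizing to gcount() avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads up to n bytes from src and returns exactly the bytes that were read.
std::string read_string(std::istream& src, std::size_t n) {
    std::string buf(n, '\0');
    if (n == 0) {
        return buf;  // nothing to read, and &buf[0] is not needed
    }
    src.read(&buf[0], static_cast<std::streamsize>(n));
    // Shrink to the number of bytes actually delivered (short reads, EOF).
    buf.resize(static_cast<std::size_t>(src.gcount()));
    return buf;
}
```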

File loader problems

I have a text file which contains lists of authors and books. I need to load it into my program; here is the code of the method which should load it:
void Loader::loadFile(const char* path)
{
FILE* file = fopen(path, "r");
char* bufferString;
while (feof(file) != 1) {
fgets(bufferString, 1000, file);
printf("%s", bufferString);
}
}
I use it in my main file:
int main(int argc, char** argv) {
Loader* loader = new Loader();
loader->loadFile("/home/terayon/prog/parser/data.txt");
return 0;
}
But the data.txt file is not printed completely.
What should I do to get the complete data?
fgets reads into the memory pointed to by the pointer passed as first parameter, bufferString in your case.
But your bufferString is an uninitialised pointer (leading to undefined behaviour):
char * bufferString;
// not initialised,
// and definitely not pointing to valid memory
So you need to provide some memory to read into, e.g. by making it an array:
char bufferString[1000];
// that's a bit large to store on the stack
As a side note: Your code is not idiomatic C++. You're using the IO functions provided by the C standard library, which is possible, but using the facilities of the C++ STL would be more appropriate.
You have undefined behavior: you have a pointer bufferString but you never actually make it point anywhere. Since it's not initialized, its value will be indeterminate and will seem to be random, meaning you will write to unallocated memory in the fgets call.
It's easy to solve though, declare it as an array, and use the array size when calling fgets:
char bufferString[500];
...
fgets(bufferString, sizeof(bufferString), file);
Besides the problem detailed above, you should not do while(!feof(file)); it will not work as you expect it to. The reason is that the EOF flag is not set until you try to read from beyond the end of the file, leading the loop to iterate once too many.
You should instead do e.g. while (fgets(...) != NULL)
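The two fixes above (a real array as the buffer, and testing the return value of fgets instead of feof) can be combined as in this sketch. It is a hypothetical free-function variant of loadFile that collects the text into a std::string instead of printing it, which is an assumption made so the result can be inspected.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Appends the file's text to contents; returns false if the file cannot be opened.
bool loadFile(const char* path, std::string& contents) {
    std::FILE* file = std::fopen(path, "r");
    if (file == nullptr) {
        return false;
    }
    char line[500];
    // fgets returns a null pointer at end of file or on error, which ends the loop.
    while (std::fgets(line, sizeof line, file) != nullptr) {
        contents += line;
    }
    std::fclose(file);
    return true;
}
```

Lines longer than the array are simply delivered in several pieces, so nothing is lost or overrun.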
The code you have is not very C++-ish, instead it's using the old C functions for file handling. Instead I suggest you read more about the C++ standard I/O library and std::string which is an auto-expanding string class that won't have the limits of C arrays, and won't suffer from potential buffer overflows in the same way.
The code could then look something like this
std::ifstream input_file(path);
std::string input_buffer;
while (std::getline(input_file, input_buffer))
std::cout << input_buffer << '\n';

fopen mlock access violation

I'm repeatedly getting the same error exception for the following method.
Unhandled exception at 0x77a8f4e1 in AST.exe: 0xC0000005: Access violation reading location 0x29919ed9.
bool package::write(char * buf, size_t size, const char *fname)
{
//makeDir(fname);
FILE * output = fopen(fname, "wb");//break point
if (output == NULL)//break point
{
perror("ERROR: ");
return false;
}
fwrite(buf, size, sizeof(char), output);
fclose(output);
return true;
}
It has something to do with the fopen; I know that because of breakpoints. But it only throws the exception the fourth time it's used, no matter what I do. I've changed the fname repeatedly, but it always crashes the fourth time it's used. And for some reason, after I click "Break", I end up at the 345th line of mlock.c.
I'd really appreciate any help in fixing this really annoying headache of an error.
Here is a simple complete example written in C that can be trivially converted to C++ showing the correct way to write your function.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int write( const char * buffer, size_t bytes, const char * name )
{
FILE * output = fopen( name, "wb");
if ( !output )
{
perror( "ERROR" );
return 0;
}
fwrite( buffer, sizeof( char ), bytes, output);
fclose( output );
return 1;
}
int main( )
{
write( "Hello world!", strlen( "Hello world!" ), "output.txt" );
return 0;
}
You were calling fwrite() with the arguments in the wrong order. The 2nd parameter is the size in bytes of each element to be written from your buffer. The 3rd parameter is the number of elements to be written. Swapping the two still writes the same total number of bytes, so it mainly affects the return value. For reference, see fwrite().
You should specify an extension when creating a file with fopen(). Otherwise, the system will create a file of generic type.
It isn't necessary to add a colon and a space after the string passed as an argument to perror(). The function does this for you. For reference, see perror().
Break the problem down.
Write a program that calls your function four times in a row. If that does not fail then you know the problem is somewhere else.
You can either add more of your original program to the test program, or begin subtracting pieces of your original program. Back it up first, or commit it to source code control.
When you have things that act so strangely like this seems to be doing, it is very likely that the problem is somewhere completely different. Your program might be writing into memory that it doesn't own. For example, if part of the program holds a pointer to an object and that object gets deleted, and then mlock allocates and reuses that memory AND THEN the program uses that old pointer, it would be writing over mlock information, causing the crash.
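The isolation test suggested above could look like the following sketch. The function names and the file names are made up for illustration. It also checks the return values of fwrite and fclose, so a failed write is reported instead of silently ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Stand-alone version of the write function, free of any other program state.
bool write_file(const char* buf, std::size_t size, const char* fname) {
    std::FILE* output = std::fopen(fname, "wb");
    if (output == nullptr) {
        std::perror("ERROR");
        return false;
    }
    // Element size 1 and element count `size`: fwrite returns the bytes written.
    std::size_t const written = std::fwrite(buf, 1, size, output);
    bool const closed = std::fclose(output) == 0;
    return written == size && closed;
}

// Calls write_file four times in a row and counts how many calls succeeded.
int count_successful_writes() {
    const char* text = "Hello world!";
    int ok = 0;
    for (int i = 1; i <= 4; ++i) {
        std::string const name = "gr_write_test_" + std::to_string(i) + ".txt";
        if (write_file(text, std::strlen(text), name.c_str())) {
            ++ok;
        }
    }
    return ok;
}
```

If all four calls succeed in isolation, the function itself is probably fine and the crash is more likely caused by memory corruption elsewhere in the original program.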

Bind temporary to non-const reference

Rationale
I try to avoid assignments in C++ code completely. That is, I use only initialisations and declare local variables as const whenever possible (i.e. always except for loop variables or accumulators).
Now, I’ve found a case where this doesn’t work. I believe this is a general pattern but in particular it arises in the following situation:
Problem Description
Let’s say I have a program that loads the contents of an input file into a string. You can either call the tool by providing a filename (tool filename) or by using the standard input stream (cat filename | tool). Now, how do I initialise the string?
The following doesn’t work:
bool const use_stdin = argc == 1;
std::string const input = slurp(use_stdin ? static_cast<std::istream&>(std::cin)
: std::ifstream(argv[1]));
Why doesn’t this work? Because the prototype of slurp needs to look as follows:
std::string slurp(std::istream&);
That is, the argument is non-const and as a consequence I cannot bind it to a temporary. There doesn’t seem to be a way around this using a separate variable either.
Ugly Workaround
At the moment, I use the following solution:
std::string input;
if (use_stdin)
input = slurp(std::cin);
else {
std::ifstream in(argv[1]);
input = slurp(in);
}
But this is rubbing me the wrong way. First of all it’s more code (in SLOCs) but it’s also using an if instead of the (here) more logical conditional expression, and it’s using assignment after declaration which I want to avoid.
Is there a good way to avoid this indirect style of initialisation? The problem can likely be generalised to all cases where you need to mutate a temporary object. Aren’t streams in a way ill-designed to cope with such cases (a const stream makes no sense, and yet working on a temporary stream does make sense)?
Why not simply overload slurp?
std::string slurp(char const* filename) {
std::ifstream in(filename);
return slurp(in);
}
int main(int argc, char* argv[]) {
bool const use_stdin = argc == 1;
std::string const input = use_stdin ? slurp(std::cin) : slurp(argv[1]);
}
It is a general solution with the conditional operator.
The solution with the if is more or less the standard solution when
dealing with argv:
if ( argc == 1 ) {
    process( std::cin );
} else {
    for ( int i = 1; i != argc; ++ i ) {
        std::ifstream in( argv[i] );
        if ( in.is_open() ) {
            process( in );
        } else {
            std::cerr << "cannot open " << argv[i] << std::endl;
        }
    }
}
This doesn't handle your case, however, since your primary concern is to
obtain a string, not to "process" the filename args.
In my own code, I use a MultiFileInputStream that I've written, which
takes a list of filenames in the constructor, and only returns EOF when
the last has been read: if the list is empty, it reads std::cin. This
provides an elegant and simple solution to your problem:
MultiFileInputStream in(
std::vector<std::string>( argv + 1, argv + argc ) );
std::string const input = slurp( in );
This class is worth writing, as it is generally useful if you often
write Unix-like utility programs. It is definitely not trivial, however,
and may be a lot of work if this is a one-time need.
A more general solution is based on the fact that you can call a
non-const member function on a temporary, and the fact that most of the
member functions of std::istream return a std::istream&—a
non const-reference which will then bind to a non const reference. So
you can always write something like:
std::string const input = slurp(
use_stdin
? std::cin.ignore( 0 )
: std::ifstream( argv[1] ).ignore( 0 ) );
I'd consider this a bit of a hack, however, and it has the more general
problem that you can't check whether the open (called by the constructor
of std::ifstream) worked.
More generally, although I understand what you're trying to achieve, I
think you'll find that IO will almost always represent an exception.
You can't read an int without having defined it first, and you can't
read a line without having defined the std::string first. I agree
that it's not as elegant as it could be, but then, code which correctly
handles errors is rarely as elegant as one might like. (One solution
here would be to derive from std::ifstream to throw an exception if
the open didn't work; all you'd need is a constructor which checked for
is_open() in the constructor body.)
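The derived stream mentioned in the parenthesis could be sketched as follows. The class name checked_ifstream and the choice of std::runtime_error are assumptions made for this example.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// An ifstream that refuses to exist in a "failed to open" state.
class checked_ifstream : public std::ifstream {
public:
    explicit checked_ifstream(char const* filename)
        : std::ifstream(filename) {
        if (!is_open()) {
            throw std::runtime_error(std::string("cannot open ") + filename);
        }
    }
};
```

With such a class, a temporary stream can be used inside an expression without losing the error check, because a failed open never yields a usable object.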
All SSA-style languages need to have phi nodes to be usable, realistically. You would run into the same problem in any case where you need to construct from two different types depending on the value of the condition. The ternary operator cannot handle such cases. Of course, in C++11 there are other tricks, like moving the stream or suchlike, or using a lambda, and the design of IOstreams is virtually the exact antithesis of what you're trying to do, so in my opinion, you would just have to make an exception.
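The lambda trick mentioned here can be sketched like this: an immediately invoked lambda contains the if, and its result initialises the const string in a single step. The wrapper load_input and its standard_input parameter are hypothetical and exist only to make the sketch testable. The body of slurp is an assumption as well, since the question shows only its prototype.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Assumed implementation: reads everything from the stream into a string.
std::string slurp(std::istream& in) {
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}

std::string load_input(bool use_stdin, char const* filename,
                       std::istream& standard_input = std::cin) {
    // The lambda runs immediately, so input is initialised once and stays const.
    std::string const input = [&]() -> std::string {
        if (use_stdin) {
            return slurp(standard_input);
        }
        std::ifstream in(filename);
        return slurp(in);
    }();
    return input;
}
```

The if is still there, but it is confined to the lambda body, so the variable itself never needs an assignment after its declaration.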
Another option might be an intermediate variable to hold the stream:
std::istream&& is = argc==1? std::move(cin) : std::ifstream(argv[1]);
std::string const input = slurp(is);
Taking advantage of the fact that named rvalue references are lvalues.