Passing a file pointer to a function - c++

If I pass a FILE pointer to a function, is it updated?
Can I do something like the following?
FILE* fp;
size_t read, len;
char *key;
fp=fopen((tmpDir+"/"+filename).c_str(),"r");
while((read=getline(&key,&len,fp))!=-1){
if (header_section){
processHeader(fp);
}else{
processBody(fp);
}
}
fclose(fp);
void processHeader(FILE* fp){
size_t read, len;
char *key;
while((read=getline(&key,&len,fp))!=-1){
... do header processing ...
if(strcmp(key,"end_of_header")==0){
return;
}
}
}
void processBody(FILE* fp){
size_t read, len;
char *key;
while((read=getline(&key,&len,fp))!=-1){
... process body data ...
}
}
The above code doesn't work (I get a Segmentation Fault). Is there a way to process parts of a text file in different functions according to the section of the file?

Yes, it is possible to pass a FILE * to a function. After all, various standard C I/O functions accept an argument which is a pointer.
However, FILE is an opaque type. Whether the FILE * points at something (e.g. a data structure) which is updated is implementation defined. But if your code is doing things that are valid on a FILE * (e.g. passing it to C I/O functions) then that would not explain a segmentation fault.
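The partial code you have supplied is not sufficient to pin down the cause of your "segmentation fault" with certainty. Odds are, if the program is crashing, some code in your program is exhibiting undefined behaviour. One candidate is visible in the snippet: key and len are passed to getline() uninitialized, whereas POSIX getline() requires *lineptr to be either a null pointer or a buffer obtained from malloc(), with *n giving its size. But the simple act of passing a FILE *, obtained as a return value from fopen(), as an argument to a function would not be the cause. You need to look at the rest of your code as well.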
And, in C++, you would be better off using C++ streams than C I/O functions. But, at most, that will only change the symptom. If other code is the cause of your undefined behaviour, changing the method of I/O (assuming you do it correctly) won't fix that.
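As a rough sketch of the stream-based approach, the header and body functions can share one std::istream passed by reference; each function simply continues reading where the previous one stopped. The ParsedFile struct and the parse() helper are hypothetical names invented for this illustration, and the "end_of_header" marker is taken from the question.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two sections of the file.
struct ParsedFile {
    std::vector<std::string> header;
    std::vector<std::string> body;
};

// Reads lines until the "end_of_header" marker (or end of input).
// The stream is taken by reference, so the caller sees the advanced position.
void processHeader(std::istream& in, ParsedFile& out) {
    std::string line;
    while (std::getline(in, line)) {
        if (line == "end_of_header") {
            return;
        }
        out.header.push_back(line);
    }
}

// Reads every remaining line as body data.
void processBody(std::istream& in, ParsedFile& out) {
    std::string line;
    while (std::getline(in, line)) {
        out.body.push_back(line);
    }
}

// Hypothetical driver: header first, then whatever is left.
ParsedFile parse(std::istream& in) {
    ParsedFile result;
    processHeader(in, result);
    processBody(in, result);
    return result;
}
```

Because std::getline manages the std::string's storage itself, there is no buffer pointer to initialize or free. The same functions work with a std::ifstream opened on the real file.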

Related

How does read and write function work in C++ file handling?

I'm learning file handling in C++ on my own from the internet. I came across the read and write functions, but the parameters they take confused me.
So, I found the syntax as
fstream fout;
fout.write( (char *) &obj, sizeof(obj) );
and
fstream fin;
fin.read( (char *) &obj, sizeof(obj) );
In both of these, what is the function of char*?
And how does it read and write the file?
The function fstream::read has the following function signature:
istream& read (char* s, streamsize n);
You need to cast your arguments to the correct type. (char*) tells the compiler to pretend &obj is the correct type. Usually, this is a really bad idea.
Instead, you should do it this way:
// C++ program to demonstrate getline() function
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str;
ifstream fin("input.txt"); // hypothetical file name; the stream must be opened before reading
getline(fin, str); // use cin instead to read from stdin
return 0;
}
Source: https://www.geeksforgeeks.org/getline-string-c/
The char * cast used with read and write treats the obj variable as a generic, contiguous sequence of characters (ignoring any structure).
The read function will read from the stream directly into the obj variable, without any byte translation or mapping to data members (fields). Note, pointers in classes or structures will be replaced with whatever value comes from the stream (which means the pointer will probably point to an invalid or improper location). Beware of padding issues.
The write function will write the entire area of memory occupied by obj to the stream. Any padding between structure or class members will also be written. Values of pointers will be written to the stream, not the item that the pointer points to.
Note: these functions work "as-is". There are no conversions or translations of the data. For example, no conversion between Big Endian and Little Endian; no processing of the "end of line" or "end of file" characters. Basically mirror image data transfers.
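To make the "mirror image" behaviour concrete, here is a minimal sketch of a round trip through write and read. The Record struct and the two helper functions are hypothetical names made up for this example; the static_assert reflects the assumption that raw byte copies are only meaningful for trivially copyable types (no pointers to owned data, no std::string members).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical plain-data record used only for this illustration.
struct Record {
    std::int32_t id;
    double score;
};

static_assert(std::is_trivially_copyable<Record>::value,
              "raw byte I/O is only safe for trivially copyable types");

// Copies the object's bytes (including any padding) to the stream unchanged.
void writeRecord(std::ostream& out, const Record& r) {
    out.write(reinterpret_cast<const char*>(&r), sizeof(r));
}

// Copies sizeof(Record) bytes from the stream over the object.
// Returns false if fewer bytes were available.
bool readRecord(std::istream& in, Record& r) {
    in.read(reinterpret_cast<char*>(&r), sizeof(r));
    return in.gcount() == static_cast<std::streamsize>(sizeof(r));
}
```

A file written this way is only readable by a program with the same struct layout, padding, and byte order, which is why the approach does not give a portable file format.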

Segmentation fault std::vector<std::string>

I wrote a function that returns a vector and call it from other functions. From some functions it runs fine, but from others it causes a segmentation fault.
This is my functionA, it returns a vector
std::vector<std::string> functionA(std::vector<std::string> l){
...
char *list_inorder;
int nnn = sprintf(list_inorder,"%-5d%-35s%-20s%-8s\n",(i),(s.at(1)).c_str(),(s.at(2)).c_str(), (s.at(0)).c_str());
...
}
return result;
}
This is how I call it from other functions. I use the same call everywhere, but some calls work and some do not.
std::vector<std::string> vectorA=functionA(vectorB);
You never point list_inorder to anything before you sprintf to it.
That means undefined behavior. Intermittent crashes are expected from undefined behavior.
Since you are using streams everywhere else, why not use a stream for the output instead of sprintf? std::ostringstream (from <sstream>) is the usual choice.
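A simple fix would be to change char *list_inorder; to a fixed-size array such as char list_inorder[128]; and to call snprintf(list_inorder, sizeof(list_inorder), ...) so the output cannot overrun it. Note that the field widths in your format string (5 + 35 + 20 + 8, plus the newline) already produce at least 69 characters, so a 50-byte buffer would overflow. This is not ideal, and using streams would be better.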
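A minimal sketch of the stream-based alternative, assuming the same column widths as the format string in the question ("%-5d%-35s%-20s%-8s\n"). The function name formatRow and its parameter names are invented for illustration.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Builds one left-aligned, fixed-width row without any manual buffer.
// std::setw applies only to the next inserted item, so it is repeated per column.
std::string formatRow(int index,
                      const std::string& second,
                      const std::string& third,
                      const std::string& first) {
    std::ostringstream out;
    out << std::left
        << std::setw(5)  << index
        << std::setw(35) << second
        << std::setw(20) << third
        << std::setw(8)  << first
        << '\n';
    return out.str();
}
```

Like the printf widths, std::setw sets a minimum width and does not truncate, so longer fields simply make the row longer instead of overflowing anything.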

File loader problems

I have a text file which contains lists of authors and books. I need to load it into my program. Here is the code of the method which should load it:
void Loader::loadFile(const char* path)
{
FILE* file = fopen(path, "r");
char* bufferString;
while (feof(file) != 1) {
fgets(bufferString, 1000, file);
printf("%s", bufferString);
}
}
I use it in my main file:
int main(int argc, char** argv) {
Loader* loader = new Loader();
loader->loadFile("/home/terayon/prog/parser/data.txt");
return 0;
}
But the data.txt file is not printed completely.
What should I do to get all of the data?
fgets reads into the memory pointed to by the pointer passed as first parameter, bufferString in your case.
But your bufferString is an uninitialised pointer (leading to undefined behaviour):
char * bufferString;
// not initialised,
// and definitely not pointing to valid memory
So you need to provide some memory to read into, e.g. by making it an array:
char bufferString[1000];
// that's a bit large to store on the stack
As a side note: Your code is not idiomatic C++. You're using the IO functions provided by the C standard library, which is possible, but using the facilities of the C++ standard library would be more appropriate.
You have undefined behavior: you have a pointer bufferString but you never actually make it point anywhere. Since it's not initialized, its value is indeterminate and will seem random, meaning you will write to unallocated memory in the fgets call.
It's easy to solve though, declare it as an array, and use the array size when calling fgets:
char bufferString[500];
...
fgets(bufferString, sizeof(bufferString), file);
Besides the problem detailed above, you should not do while(!feof(file)); it will not work as you expect. The EOF flag is not set until you try to read beyond the end of the file, so the loop iterates once too many.
You should instead do e.g. while (fgets(...) != NULL)
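Putting the array buffer and the fgets-driven loop together might look roughly like this. The function name readAllLines and the 256-byte buffer size are arbitrary choices for the sketch; it collects the text into a std::string instead of printing it so the result is easy to inspect.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reads everything from an already opened FILE*, one chunk per fgets call.
// The loop ends when fgets returns a null pointer (end of file or read error),
// so no separate feof() check is needed to control it.
std::string readAllLines(std::FILE* file) {
    std::string contents;
    char buffer[256];  // real storage, unlike an uninitialized char*
    while (std::fgets(buffer, sizeof(buffer), file) != nullptr) {
        contents += buffer;  // a line longer than the buffer arrives in several pieces
    }
    return contents;
}
```

The caller remains responsible for checking that fopen succeeded before calling this and for calling fclose afterwards.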
The code you have is not very C++-ish; it uses the old C functions for file handling. I suggest you read more about the C++ standard I/O library and std::string, which is an auto-expanding string class that won't have the limits of C arrays and won't suffer from potential buffer overflows in the same way.
The code could then look something like this
std::ifstream input_file(path);
std::string input_buffer;
while (std::getline(input_file, input_buffer))
std::cout << input_buffer << '\n';

What unexpected behaviour can returning a pointer to a char array member cause?

Okay, so. I've been working on a class project (we haven't covered std::string and std::vector yet though obviously I know about them) to construct a time clock of sorts. The main portion of the program expects time and date values as formatted c-strings (e.g. "12:45:45", "12/12/12" etc.), and I probably could have kept things simple by storing them the same way in my basic class. But, I didn't.
Instead I did this:
class UsageEntry {
public:
....
typedef time_t TimeType;
typedef int IDType;
...
// none of these getters are thread safe
// furthermore, the char* the getters return should be used immediately
// and then discarded: its contents will be modified on the next call
// to any of these functions.
const char* getUserID();
const char* getDate();
const char* getTimeIn();
const char* getTimeOut();
private:
IDType m_id;
TimeType m_timeIn;
TimeType m_timeOut;
char m_buf[LEN_MAX];
};
And one of the getters (they all do basically the same thing):
const char* UsageEntry::getDate()
{
strftime(m_buf, LEN_OF_DATE, "%D", localtime(&m_timeIn));
return m_buf;
}
And here is a function that uses this pointer:
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
// because of the m_buf kludge, this cannot be a single
// call to fprintf
fprintf(outFile, "%s,", m_data[index].getUserID());
fprintf(outFile, "%s,", m_data[index].getDate());
fprintf(outFile, "%s,", m_data[index].getTimeIn());
fprintf(outFile, "%s\n", m_data[index].getTimeOut());
fflush(outFile);
} // end of TDataSet::writeOut
How much trouble will this cause? Or to look at it from another angle, what other sorts of interesting and !!FUN!! behaviour can this cause? And, finally, what can be done to fix it (besides the obvious solution of using strings/vectors instead)?
Somewhat related: How do the C++ library functions that do similar things handle this? e.g. localtime() returns a pointer to a struct tm object, which somehow survives the end of that function call at least long enough to be used by strftime.
There is not enough information to determine if it will cause trouble because you do not show how you use it. As long as you document the caveats and keep them in mind when using your class, there won't be issues.
There are some common gotchas to watch out for, but hopefully these are common sense:
Deleting the UsageEntry will invalidate the pointers returned by your getters, since those buffers will be deleted too. (This is especially easy to run into if using locally declared UsageEntrys, as in MadScienceDream's example.) If this is a risk, callers should create their own copy of the string. Document this.
It does not look like m_timeIn is const, and therefore it may change. Calling the getter will modify the internal buffer and these changes will be visible to anything that has that pointer. If this is a risk, callers should create their own copy of the string. Document this.
Your getters are neither reentrant nor thread-safe. Document this.
It would be safer to have the caller supply a destination buffer and length as a parameter. The function can return a pointer to that buffer for convenience. This is how e.g. read works.
A strong API can avoid issues. Failing that, good documentation and common sense can also reduce the chance of issues. Behavior is only unexpected if nobody expects it, which is why documenting the behavior is important: it generally eliminates unexpected behavior.
Think of it like the "CAUTION: HOT SURFACE" warning on top of a toaster oven. You could design the toaster oven with insulation on top so that an accident can't happen. Failing that, the least you can do is put a warning label on it and there probably won't be an accident. If there's neither insulation nor a warning, eventually somebody will burn themselves.
Now that you've edited your question to show some documentation in the header, many of the initial risks have been reduced. This was a good change to make.
Here is an example of how your usage would change if user-supplied buffers were used (and a pointer to that buffer returned):
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
char userId[LEN_MAX], date[LEN_MAX], timeIn[LEN_MAX], timeOut[LEN_MAX];
fprintf(outFile, "%s,%s,%s,%s\n",
m_data[index].getUserID(userId, sizeof(userId)),
m_data[index].getDate(date, sizeof(date)),
m_data[index].getTimeIn(timeIn, sizeof(timeIn)),
m_data[index].getTimeOut(timeOut, sizeof(timeOut))
);
fflush(outFile);
} // end of TDataSet::writeOut
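A getter in that style could be implemented along the following lines. This is only a sketch: it is written as a free function named formatDate (a made-up name) rather than as a UsageEntry member, and returning an empty string on failure is an assumption chosen so the result can always be passed to a "%s" conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>

// Formats `when` as a date into the caller's buffer and returns that buffer.
// On any failure the result is an empty string, never a null pointer.
const char* formatDate(std::time_t when, char* buf, std::size_t len)
{
    if (buf == nullptr || len == 0) {
        return "";
    }
    const std::tm* local = std::localtime(&when);
    // strftime returns 0 if the text plus its terminator does not fit in len bytes,
    // in which case the buffer contents are unspecified, so reset it.
    if (local == nullptr || std::strftime(buf, len, "%D", local) == 0) {
        buf[0] = '\0';
    }
    return buf;
}
```

Each call now writes into storage owned by the caller, so four results can coexist in one fprintf call. The function still goes through std::localtime, whose internal static result is shared, so this alone does not make it thread-safe.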
How much trouble will this cause? Or to look at it from another angle,
what other sorts of interesting and !!FUN!! behaviour can this cause?
And, finally, what can be done to fix it (besides the obvious solution
of using strings/vectors instead)?
Well there is nothing very FUN here, it just means that the results of your getter cannot outlive the corresponding instance of UsageEntry or you have a dangling pointer.
How do the C++ library functions that do similar things handle this?
e.g. localtime() returns a pointer to a struct tm object, which
somehow survives the end of that function call at least long enough to
be used by strftime.
The documentation of localtime says:
Return value
pointer to a static internal std::tm object on success, or NULL otherwise. The structure may be shared between
std::gmtime, std::localtime, and std::ctime, and may be overwritten on
each invocation.
The main problem here, as the main problem with most pointer based code, is the issue of ownership. The problem is the following:
const char* val;
{
UsageEntry ue;
val = ue.getDate();
}//ue goes out of scope
std::cout << val << std::endl;//SEGFAULT (maybe, really nasal demons)
Because val is actually owned by ue, you shoot yourself in the foot if they exist in different scopes. You COULD document this, but it is oh-so-much simpler to pass the buffer in as an argument (just like the strftime function does).
(Thanks to odedsh below for pointing this one out)
Another issue is that subsequent calls will blow away the info gained. The example odedsh used was
fprintf(outFile, "%s\n%s",ue.getUserID(), ue.getDate());
but the problem is more pervasive:
const char* id = ue.getUserID();
const char* date = ue.getDate();//Changes id!
This violates the "Principle of Least Astonishment" because...well, it's weird.
This design also breaks the rule-of-thumb that each class should do exactly one thing. In this case, UsageEntry both provides accessors to get the formatted time as a string, AND manages that string's buffer.

Multi level pointer as argument c++

I will use a pseudo example here, though I have noticed this behaviour in several APIs, like sqlite3 or windows.
Say a function is declared like so:
void fu(some_identifier **ppBar);
and I do this in my code:
some_identifier **ppFubar;
fu(ppFubar);
It is my understanding that this would work and indeed it does in my own functions. Yet my program crashes after a buffer overflow when I do this with some APIs.
If I do this:
some_identifier *pFubar;
fu(&pFubar);
everything is fine.
Do ppFubar and &pFubar not evaluate to the exact same thing?
EDIT:
A concrete example would be (fourth argument):
int sqlite3_prepare(
sqlite3 *db, /* Database handle */
const char *zSql, /* SQL statement, UTF-8 encoded */
int nByte, /* Maximum length of zSql in bytes. */
sqlite3_stmt **ppStmt, /* OUT: Statement handle */
const char **pzTail /* OUT: Pointer to unused portion of zSql */
);
Your understanding is wrong.
If a function takes a some_identifier **ppBar parameter, it will most likely write a some_identifier* through that pointer (an "out" parameter), so the pointer you pass must itself point at valid storage.
If you call it with an uninitialized some_identifier **ppFubar; you are giving it a pointer to garbage. If the function does anything with it (for example, dereferences it, either once or twice), you incur undefined behaviour (most likely, it will crash).
Pass a correctly initialized pointer to the function, for example the address of a some_identifier* variable, as in fu(&pFubar).
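The out-parameter pattern can be sketched with a toy API. Widget, createWidget, destroyWidget and correctCall are all hypothetical names invented here; they only imitate the shape of functions such as sqlite3_prepare and are not part of any real library.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for some_identifier.
struct Widget {
    int value;
};

// Imitates an API with an OUT parameter: the new handle is stored in *ppOut.
// Returns 0 on success and -1 on failure.
int createWidget(Widget** ppOut)
{
    if (ppOut == nullptr) {
        return -1;
    }
    // This write is why ppOut must point at a real Widget* variable.
    *ppOut = new (std::nothrow) Widget{7};
    return *ppOut != nullptr ? 0 : -1;
}

void destroyWidget(Widget* p)
{
    delete p;
}

// Correct call: the caller owns a Widget* and passes its address.
int correctCall()
{
    Widget* pWidget = nullptr;
    if (createWidget(&pWidget) != 0) {
        return -1;
    }
    int result = pWidget->value;
    destroyWidget(pWidget);
    return result;
}
```

Declaring Widget** ppWidget; and passing it directly would make the assignment inside createWidget store the handle at whatever address the uninitialized variable happens to contain.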