I am reading a file header using ifstream.
Edit: I was asked to put the full minimal program, so here it is.
#include <iostream>
#include <fstream>
using namespace std;
#pragma pack(push,2)
struct Header
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
int main(int argc,char* argv[])
{
string fileName;
fileName = "test";
string fileInName = fileName + ".dst";
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
if (!fileIn)
{
cout << "File Not Found" << endl;
return 0;
}
Header h={};
if (fileIn.is_open()) {
cout << "\n" << endl;
fileIn.read(reinterpret_cast<char *>(&h.label), sizeof(h.label));
cout << "Label: " << h.label << endl;
fileIn.read(reinterpret_cast<char *>(&h.st), sizeof(h.st));
cout << "Stitches: " << h.st << endl;
fileIn.read(reinterpret_cast<char *>(&h.co), sizeof(h.co));
cout << "Colour Count: " << h.co << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusXExtends),sizeof(h.plusXExtends));
cout << "Extends: " << h.plusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.minusXExtends),sizeof(h.minusXExtends));
cout << "Extends: " << h.minusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusYExtends),sizeof(h.plusYExtends));
cout << "Extends: " << h.plusYExtends << endl;
// This will output corrupted
cout << endl << endl;
cout << "Label: " << h.label << endl;
cout << "Stitches: " << h.st << endl;
cout << "Colour Count: " << h.co << endl;
cout << "Extends: " << h.plusXExtends << endl;
cout << "Extends: " << h.minusXExtends << endl;
cout << "Extends: " << h.plusYExtends << endl;
}
fileIn.close();
cout << "\n";
//cin.get();
return 0;
}
Then I use a struct to store the header items
The actual struct is longer than this. I shortened it because I didn't need the whole struct for the question.
Anyway as I read the struct I do a cout to see what I am getting. This part is fine.
As expected my cout shows the Label, Stitches, Colour Count no problem.
The problem is that if I do another cout after the header has been read, the output is corrupted. For instance, when I repeat the same cout lines after all the reads (the block following the "This will output corrupted" comment in the program above), I do not get the same result.
Instead of seeing Label, Stitches and Colour Count I get strange symbols and corrupt output. Sometimes you can see the output of h.label with some corruption, but the Label and Stitches lines are written over, sometimes with strange symbols and sometimes with text from the previous cout. I think either the data in the struct is getting corrupted, or the cout output is getting corrupted, and I don't know why. The longer the header, the more apparent the problem becomes. I would really like to do all the couts after the whole header has been read, but if I do that I see a big mess instead of the expected output.
My question is why is my cout becoming corrupted?
Using arrays to store strings is dangerous because if you allocate 20 characters to store the label and the label happens to be 20 characters long, then there is no room to store a NUL (0) terminating character. Once the bytes are stored in the array there's nothing to tell code that expects null-terminated strings (like cout's operator<< for char arrays) where the end of the string is.
Your label has 20 chars. That's enough to store the first 20 letters of the alphabet:
ABCDEFGHIJKLMNOPQRST
But this is not a null-terminated string. This is just an array of characters. In fact, in memory, the byte right after the T will be the first byte of the next field, which happens to be your 11-character st array. Let's say those 11 characters are: abcdefghijk.
Now the bytes in memory look like this:
ABCDEFGHIJKLMNOPQRSTabcdefghijk
There's no way to tell where label ends and st begins. When you pass a pointer to the first byte of the array that is intended to be interpreted as a null-terminated string by convention, the implementation will happily start scanning until it finds a null terminating character (0). Which, on subsequent reuses of the structure, it may not! There's a serious risk of overrunning the buffer (reading past the end of the buffer), and potentially even the end of your virtual memory block, ultimately causing an access violation / segmentation fault.
When your program first ran, the memory of the header structure was all zeros (because you initialized with {}) and so after reading the label field from disk, the bytes after the T were already zero, so your first cout worked correctly. There happened to be a terminating null character at st[0]. You then overwrite this when you read the st field from disk. When you come back to output label again, the terminator is gone, and some characters of st will get interpreted as belonging to the string.
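The effect can be reproduced without any file. The following is a minimal sketch, not the asker's program: TwoFields, bytesUntilNul, afterFirstRead and afterSecondRead are hypothetical names, and it assumes the two char arrays sit back to back with no padding (which holds for char members). It scans the struct's bytes the way a C-string consumer would, but caps the scan at the end of the struct so the sketch itself stays in bounds.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical two-field cut-down of the header struct.
struct TwoFields {
    char label[20];
    char st[11];
};

// Returns the bytes a NUL-terminated read starting at label would see,
// capped at the end of the struct so that nothing out of bounds is read.
std::string bytesUntilNul(const TwoFields& f) {
    const char* begin = reinterpret_cast<const char*>(&f);
    const void* nul = std::memchr(begin, '\0', sizeof f);
    std::size_t len = nul
        ? static_cast<std::size_t>(static_cast<const char*>(nul) - begin)
        : sizeof f;
    return std::string(begin, len);
}

// Fills label only, like the first read: the zeroed st[0] acts as terminator.
std::string afterFirstRead() {
    TwoFields f = {};
    std::memcpy(f.label, "ABCDEFGHIJKLMNOPQRST", sizeof f.label);
    return bytesUntilNul(f);
}

// Fills st as well, like the second read: the accidental terminator is gone.
std::string afterSecondRead() {
    TwoFields f = {};
    std::memcpy(f.label, "ABCDEFGHIJKLMNOPQRST", sizeof f.label);
    std::memcpy(f.st, "abcdefghijk", sizeof f.st);
    return bytesUntilNul(f);
}
```

After the first read the scan stops after 20 characters; after the second it runs straight through label into st.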
To fix the problem you probably want to use a different, more practical data structure to store your strings that allows for convenient string functions. And use your raw header structure just to represent the file format.
You can still read the data from disk into memory using fixed sized buffers, this is just for staging purposes (to get it into memory) but then store the data into a different structure that uses std::string variables for convenience and later use by your program.
For this you'll want these two structures:
#pragma pack(push,2)
struct RawHeader // only for file IO
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
struct Header // A much more practical Header struct than the raw one
{
std::string label;
std::string st;
std::string co;
std::string plusXExtends;
std::string minusXExtends;
std::string plusYExtends;
};
After you read the first structure, you'll transfer the fields by assigning the variables. Here's a helper function to do it.
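#include <algorithm>
#include <cstddef>
#include <string>
// Copies at most n characters, stopping early at the first NUL if there is one.
// (std::find is used instead of strnlen_s, which is an optional C11 function
// that not every standard library provides.)
template <std::size_t n> std::string arrayToString(const char(&raw)[n]) {
return std::string(raw, std::find(raw, raw + n, '\0'));
}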
In your function:
Header h;
RawHeader raw;
fileIn.read((char*)&raw, sizeof(raw));
// Now marshal all the fields from the raw header over to the practical header.
h.label = arrayToString(raw.label);
h.st = arrayToString(raw.st);
h.co = arrayToString(raw.co);
h.plusXExtends = arrayToString(raw.plusXExtends);
h.minusXExtends = arrayToString(raw.minusXExtends);
h.plusYExtends = arrayToString(raw.plusYExtends);
It's worth mentioning that you also have the option of keeping the raw structure around and not copying your raw char arrays to std::strings when you read the file. But you must then be certain that whenever you use the data, you always compute and pass the lengths of the strings to the functions that will treat those buffers as string data. (Similar to what my arrayToString helper does anyway.)
Related
In this minimal example there is a weird mix-up between the input to a stringstream and the content of a previously used cout:
online gdb:
https://onlinegdb.com/itO69QGAE
code:
#include <string>
#include <iostream>
#include <sstream>
using namespace std;
const char sepa[] = {':', ' '};
const char crlf[] = {'\r', '\n'};
int main()
{
cout<<"Hello World" << endl;
stringstream s;
string test1 = "test_01";
string test2 = "test_02";
s << test1;
cout << s.str() << endl;
// works as expected
// expecting: "test_01"
// output: "test_01"
s << sepa;
cout << s.str() << endl;
// messing up with previous cout output
// expecting: "test_01: "
// output: "test_01: \nHello World"
s << test2;
cout << s.str() << endl;
// s seems to be polluted
// expecting: "test_01: test_02"
// output: "test_01: \nHello Worldtest_02"
s << crlf;
cout << s.str() << endl;
// once again messing up with the cout content
// expecting: "test_01: test_02\r\n"
// output: "test_01: Hello Worldtest_02\r\nHello World"
return 0;
}
So I am wondering: why is this happening?
As it only happens when a char array is pushed into the stringstream, it's likely about this... but according to the reference the stringstream's << operator can/should handle char* (which is what the name of this array stands for).
Besides that, there seems to be a (hidden, or at least not obvious) relation between stringstream and cout. So why does the cout content leak into the stringstream?
Is there any wrong/foolish usage in this example, or where is the dog buried (a German idiom :P)?
Best regards and thanks
Damian
P.S. My question is not about "fixing" this issue, e.g. by using a string instead of the char array (this will work)... it's about comprehending the internal mechanics and why this is actually happening, because for me this is just unexpected behaviour.
The std::stringstream::str() function returns a string containing all characters previously written into the stream, in all previous calls to operator<< (or other output functions). However it seems that you expect that only the last output operation will be returned - this is not the case.
This is analogous to how e.g. std::cout works: each invocation of std::cout << appends the string to standard output; it does not clear the console's screen.
To achieve what you want, you either need to use a separate std::stringstream instance every time:
std::stringstream s1;
s1 << test1;
std::cout << s1.str() << std::endl;
std::stringstream s2;
s2 << sepa;
std::cout << s2.str() << std::endl;
Or better, clear the contents of the std::stringstream using the single argument overload of the str() function:
std::stringstream s;
s << test1;
std::cout << s.str() << std::endl;
// reset the contents of s to an empty string
s.str("");
s << sepa;
std::cout << s.str() << std::endl;
The s.str("") call effectively discards all characters previously written into the stream.
Note, that even though std::stringstream contains a clear() function that would seem a better candidate, it's not analogous to e.g. std::string::clear() or std::vector::clear() and won't yield the effect desired in your case.
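To make the difference concrete, here is a small sketch (resetStream, clearOnly and fullReset are hypothetical names): clear() only resets the stream's error flags, while str("") replaces the buffered characters. Calling both is a common way to make a stream fully reusable.

```cpp
#include <sstream>
#include <string>

// Empties the buffer and resets the error flags so the stream can be reused.
void resetStream(std::stringstream& s) {
    s.str("");   // discard the characters written so far
    s.clear();   // reset eofbit/failbit/badbit; does not touch the contents
}

// clear() alone leaves the accumulated text in place.
std::string clearOnly() {
    std::stringstream s;
    s << "test_01";
    s.clear();
    s << ": ";
    return s.str();
}

// After a full reset only the newly written text remains.
std::string fullReset() {
    std::stringstream s;
    s << "test_01";
    resetStream(s);
    s << ": ";
    return s.str();
}
```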
Here I am again,
Thanks to "Some programmer dude"'s comment I think I figured it out:
As neither char array has a (null) terminator, the stringstream's << operator keeps inserting characters until it stumbles over a null terminator '\0'.
Either extending both arrays with a '\0' (e.g. const char sepa[] = {':', ' ', '\0'}) or bounding the length with e.g. s << string(sepa, 2) produces the expected output.
In the specific case above the data seems to be laid out in memory such that the next null terminator is found after the "Hello World" literal used in the cout statement. As this layout is not guaranteed, reading past the end of an unterminated array is undefined behaviour.
So two additional "terminating" arrays, e.g. const char sepa[] = {':', ' '}; const char end_of_sepa[] = {'\0'}; declared right after the mentioned arrays, will also result in the expected output even when the rest is left unchanged... but this is not guaranteed and depends on the layout in memory.
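A third option that keeps the arrays exactly as declared is to pass their size along explicitly. The sketch below assumes a hypothetical helper named appendArray; it relies on ostream::write, which inserts a given number of bytes and never searches for a terminator.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Inserts all N bytes of a char array; no terminator is needed or looked for.
// Note: for a string literal, N would include the trailing '\0'.
template <std::size_t N>
std::ostream& appendArray(std::ostream& out, const char (&arr)[N]) {
    return out.write(arr, static_cast<std::streamsize>(N));
}

const char sepa[] = {':', ' '};
const char crlf[] = {'\r', '\n'};

// Rebuilds the line from the question without reading past either array.
std::string buildLine() {
    std::stringstream s;
    s << "test_01";
    appendArray(s, sepa);
    s << "test_02";
    appendArray(s, crlf);
    return s.str();
}
```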
P.S. As previously written this issue is not about fixing but comprehension. So please feel free to confirm or correct my assumption.
EDIT: Corrected the bold code section.
I am currently learning to program in C++. I am making my way through a programming project I found online, trying to recreate it line by line and looking up why certain things work the way they do. The project is a simple hotel booking system that has a menu system and saves the user's input, i.e. name, address, phone number etc.
I have been trying to understand what certain parts of this code do. I want to take a user's input and save it to a .dat file; however, it doesn't seem to work and I'm not sure why. Is there a better way to read and write to a text file?
This is the function that deals with checking if a room is free or reserved:
#include <fstream>
#include "Hotel.h"
int Hotel::check_availabilty(int room_type){
int flag = 0;
std::ifstream room_check("Room_Bookings.dat",std::ios::in);
while(!room_check.eof()){
room_check.read((char*)this, sizeof(Hotel));
//if room is already taken
if(room_no == room_type){
flag = 1;
break;
}
}
room_check.close();//close the ifstream
return(flag);//return result
}
This is the code that books a room:
#include "Hotel.h"
#include "check_availability.cpp"
void Hotel::book_a_room()
{
system("CLS");//this clears the screen
int flag;
int room_type;
std::ofstream room_Booking("Room_Bookings.dat");
std::cout << "\t\t" << "***********************" << "\n";
std::cout << "\t\t " << "THE GREATEST HOTEL" << "\n";
std::cout << "\t\t" << "***********************" << "\n";
std::cout << "\t\t " <<"Type of Rooms "<< "\t\t Room Number" "\n";
std::cout << "\t\t" << " Standard" << "\t\t 1 - 30" "\n";
std::cout << "\t\t" << " Luxury" << "\t\t\t 31 - 45" "\n";
std::cout << "\t\t" << " Royal" << "\t\t\t 46 - 50" "\n";
std::cout << "Please enter room number: ";
std::cin >> room_type;
flag = check_availabilty(room_type);
if(flag){
std::cout << "\n Sorry, that room isn't available";
}
else{
room_no = room_type;
std::cout<<" Name: ";
std::cin>>name;
std::cout<<" Address: ";
std::cin>>address;
std::cout<<" Phone No: ";
std::cin>>phone;
room_Booking.write((char*)this,sizeof(Hotel));
std::cout << "Your room is booked!\n";
}
std::cout << "Press any key to continue...";
getch();
room_Booking.close();
}
And this is the Hotel.h file
class Hotel
{
int room_no;
char name[30];
char address[50];
char phone[10];
public:
void main_menu();
void book_a_room();
int check_availabilty(int);
void display_details();
};
I don't fully understand what this part of the while loop does:
room_check.read((char*)this, sizeof(Hotel));
If you need any more info, please ask.
Any hints and tips towards making this better would be welcomed.
Hotel is an entirely self-contained type with no heap allocations or references to external objects. Therefore, one can serialize its state by simply writing out the object's representation in memory, and deserialize by doing the opposite.
room_check.read((char*)this, sizeof(Hotel));
This line of code asks the room_check input stream to read sizeof(Hotel) bytes and store them directly where the Hotel object pointed to by this lives. Effectively, you're restoring the memory contents as they were before being written to disk.
(Note that (char*)this is better written as reinterpret_cast<char *>(this) in C++.)
That's the inverse of this operation:
room_Booking.write((char*)this,sizeof(Hotel));
There are some advantages to serializing this way instead of designing your own on-disk format:
It's really easy; with one line of code you can serialize, and with another you can deserialize.
Serialization and deserialization is very fast since there is no parsing or conversion happening.
However, there are also some disadvantages:
The on-disk format is dictated by the layout of objects in memory. If you reorder or change any class data members, old serialized objects will not load correctly any more. Moreover, the read operation will succeed but you'll be left with a garbage object state.
You depend on the endianness of the host machine for number types. A data file created on a little-endian machine will be useless on a big-endian machine.
It's very easy to accidentally create a security vulnerability. For example, with just this code, an attacker could easily craft a .dat file that causes out-of-bounds reads when you go to read the "string" (character array) members by simply not NUL-terminating any of those arrays.
Using a different serialization mechanism, such as leveraging JSON, XML, protocol buffers, etc. requires more work but the results are more portable because your data structure on disk is no longer tied to the object's layout in memory.
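As a rough illustration of decoupling the file format from the memory layout without pulling in a library, the sketch below writes one booking per line as tab-separated text. Booking, writeBooking and readBooking are hypothetical names, and the format assumes that no field contains a tab or a newline; it is a sketch of the idea, not a replacement for a real serialization format.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record type; field names mirror the Hotel members.
struct Booking {
    int room_no = 0;
    std::string name;
    std::string address;
    std::string phone;
};

// Writes one booking per line as tab-separated text.
// Assumes no field contains a tab or a newline.
void writeBooking(std::ostream& out, const Booking& b) {
    out << b.room_no << '\t' << b.name << '\t'
        << b.address << '\t' << b.phone << '\n';
}

// Reads one line written by writeBooking; returns false at end of input
// or if the line is malformed.
bool readBooking(std::istream& in, Booking& b) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;
    }
    std::istringstream fields(line);
    std::string room;
    if (!std::getline(fields, room, '\t') ||
        !std::getline(fields, b.name, '\t') ||
        !std::getline(fields, b.address, '\t')) {
        return false;
    }
    std::getline(fields, b.phone);   // the rest of the line; may be empty
    std::istringstream roomStream(room);
    return static_cast<bool>(roomStream >> b.room_no);
}
```

Because the fields are std::string, names and addresses containing spaces survive the round trip, and the file no longer depends on padding or endianness.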
when doing
room_check.read((char*)this, sizeof(Hotel));
you are reading from the stream and writing into the buffer, and you need to specify how many characters must be read.
room_check contains the data as a stream. room_check.read((char*)this, sizeof(Hotel)); reads the data from the stream and stores it in the current instance of Hotel (this). sizeof(Hotel) tells the function how many bytes should be read from the stream.
room_check contains the data of the class Hotel in the order it is listed in the class declaration:
int room_no;
char name[30];
char address[50];
char phone[10];
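With this declaration the size of a Hotel instance is known at compile time: sizeof(Hotel), which is at least sizeof(int) + 30 + 50 + 10 bytes (the compiler may add padding, so with a 4-byte int it is typically 96 rather than 94). The content of the .dat file is stored in the members of the current instance of Hotel.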
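To see how the object size relates to the member sizes on a given compiler, something like the following sketch can be used. HotelLayout, memberBytes and paddingBytes are hypothetical names; the struct just repeats the data members publicly so they can be inspected.

```cpp
#include <cstddef>

// Hypothetical copy of the Hotel data members, public for inspection.
struct HotelLayout {
    int  room_no;
    char name[30];
    char address[50];
    char phone[10];
};

// Sum of the member sizes: sizeof(int) + 30 + 50 + 10.
constexpr std::size_t memberBytes =
    sizeof(int) + sizeof(HotelLayout::name) +
    sizeof(HotelLayout::address) + sizeof(HotelLayout::phone);

// The object may be larger than the sum of its members because the
// compiler can add padding to keep the int aligned in arrays of objects.
static_assert(sizeof(HotelLayout) >= memberBytes, "padding can only add bytes");

// Number of padding bytes on this compiler (implementation-specific).
constexpr std::size_t paddingBytes = sizeof(HotelLayout) - memberBytes;
```

Since read and write both use sizeof(Hotel), any padding bytes are written to and read back from the file as well.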
I need to create a string capable of holding the entire book 'The Hunger Games' which comes out to around 100500 words. My code can capture samples of the txt, but anytime I exceed a string size of 36603(tested), I receive a 'stack overflow' error.
I can successfully capture anything below 36603 elements and can output them perfectly.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
int i;
char set[100];
string fullFile[100000]; // this will not execute if set to over 36603
ifstream myfile("HungerGames.txt");
if (myfile.is_open())
{
// saves 'i limiter' words from the .txt to fullFile
for (i = 0; i < 100000; i++) {
//each word is saparated by a space
myfile.getline(set, 100, ' ');
fullFile[i] = set;
}
myfile.close();
}
else cout << "Unable to open file";
//prints 'i limiter' words to window
for (i = 0; i < 100000; ++i) {
cout << fullFile[i] << ' ';
}
What is causing the 'stack overflow' and how can I successfully capture the txt? I will later be doing a word counter and word frequency counter, so I need it in "word per element" form.
There's a limit on how much stack is used in a function; Use std::vector instead.
More here and here. The default in Visual studio is 1MB (more info here) and you can change it with /F, but this is a bad idea generally.
My system is Lubuntu 18.04, with g++ 7.3. The following snippet shows some "implementation details" of my system, and how to report them on yours. It would help you to understand what your system provides ...
void foo1()
{
int i; // Lubuntu
cout << "\n sizeof(i) " << sizeof(i) << endl; // 4 bytes
char c1[100];
cout << "\n sizeof(c1) " << sizeof(c1) << endl; // 100 bytes
string s1; // empty string
cout << "\n s1.size() " << s1.size() // 0 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
s1 = "1234567890"; // now has 10 chars
cout << "\n s1.size() " << s1.size() // 10 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
string fullFile[100000]; // this is an array of 100,000 strings
cout << "\n sizeof(fullFile) " // total is vvvvvvvvv
<< sops.digiComma(sizeof(fullFile)) << endl; // 3,200,000 bytes
uint64_t totalChars = 0;
for( auto ff : fullFile ) totalChars += ff.size();
cout << "\n total chars in all strings " << totalChars << endl;
}
What is causing the 'stack overflow' and how can I successfully
capture the txt?
The fullFile array is an unfortunate choice ... because each std::string, even when empty, consumes 32 bytes of automatic memory (~stack), for a total of 3,200,000 bytes, and this is with no data in the strings! This will stack overflow your system when the stack is smaller than the automatic var space.
On Lubuntu the default automatic-memory size (lately) is 10 M Bytes, so not a problem for me. But you will have to check on what your version of your target os defaults to. I think Windows defaults down near 1 M Byte. (Sorry, I don't know how to check Windows automatic-memory size.)
How can I make a string capable of capturing my entire .txt file.
The answer is -- you don't need to make your own. (unless you have some unstated requirement)
Also, you really should look at en.cppreference.com/w/cpp/string/basic_string/append.
In my 1st snippet above, you should take notice that the sizeof(string) reports 32 bytes, regardless of how many chars are in it.
Think on that a while ... if you put 1000 chars into a string, where do they go? The object stays at 32 bytes! You might guess or read that the string object handles memory management on your behalf, and puts the characters into dynamic memory (heap).
On my system, heap is about 4 G bytes. That's a lot more than stack.
In summary, every single std::string expands auto-magically, using heap, so if your text input will fit in heap, it will fit into '1 std::string'.
While browsing around in the cppreference, check out the 'string::reserve()' command.
Conclusion:
Any std::string you declare can auto-magically 'grow' to support your need, and will thus hold the entire text (if it will fit in memory).
Operationally, you simply get a line of text from the file, then append it to the single string, until the entire file is contained. You only need the one array, which std::string provides.
With this new idea ... I suggest you change fullFile from an array to a string.
string fullFile; // the string will expand to handle append actions
// to the limit of available heap.
string line; // holds one line of the file at a time
// open file ... check status
// getline returns the stream, which tests false at end of file or on error,
// so the loop body only runs for lines that were actually read
while (getline(myfile, line)) { // fetch line of text up through the line feed
// Note that getline does not put the \n into 'line'
// tbd - line += '\n';
// you may need the line feed in your fullFile string?
fullFile += line; // append the line
}
// ... other file cleanup.
foo1() output on Lubuntu 18.04, g++ v7.3
sizeof(i) 4
sizeof(c1) 100
s1.size() 0 sizeof(s1) 32
s1.size() 10 sizeof(s1) 32
sizeof(fullFile) 3,200,000
total chars in all strings 0
Example slurp() :
string slurp(ifstream& sIn)
{
stringstream ss;
ss << sIn.rdbuf();
dtbAssert(!sIn.bad());
if(sIn.bad())
throw "\n DTB::slurp(sIn) 'ss << sIn.rdbuf()' is bad";
ss.clear(); // clear flags
return ss.str();
}
I am currently reading a binary file whose structure I know, and I am trying to read it into a struct. When I print the struct members individually they seem to come out right, but on the fourth read the new field's contents appear appended to the member from the previous read.
Here is the code, which probably makes more sense than my explanation:
Struct
#pragma pack(push, r1, 1)
struct header
{
char headers[13];
unsigned int number;
char date[19];
char fws[16];
char collectversion[12];
unsigned int seiral;
char gain[12];
char padding[16];
};
Main
header head;
int index = 0;
fstream data;
data.open(argv[1], ios::in | ios::binary);
if(data.fail())
{
cout << "Unable to open the data file!!!" << endl;
cout << "It looks Like Someone Has Deleted the file!"<<endl<<endl<<endl;
return 0;
}
//check the size of head
cout << "Size:" << endl;
cout << sizeof(head) << endl;
data.seekg(0,std::ios::beg);
data.read( (char*)(&head.headers), sizeof(head.headers));
data.read( (char*)(&head.number), sizeof(head.number));
data.read( (char*)(&head.date), sizeof(head.date));
data.read( (char*)head.fws, sizeof(head.fws));
//Here im just testing to see if the correct data went in.
cout<<head.headers<< endl;
cout<<head.number<< endl;
cout<<head.date<< endl;
cout<<head.fws<< endl;
data.close();
return 0;
Output
Size:
96
CF001 D 01.00
0
15/11/2013 12:16:56CF10001001002000
CF10001001002000
For some reason fws seems to get appended to head.date, but when I take out the line that reads head.fws I get a date with nothing added.
I also know there is more data to read for the header, but I wanted to check that the data up to what I have written is correct.
cheers
1. Your date is declared as:
char date[19];
2. Your date format is exactly 19-characters long:
15/11/2013 12:16:56
3. And you print it this way:
cout<<head.date
In short, you print a fixed-size char[] through its address, which means that it will be interpreted as a null-terminated C string. Is it null-terminated? No.
To solve this problem, declare date as:
char date[20];
And after you fill it (still reading only the 19 bytes the file holds, not sizeof(head.date), which is now 20), append a null terminator:
head.date[19] = 0;
This applies to all members that will be interpreted as C strings.
You have char date[19] filled with 15/11/2013 12:16:56 which is exactly 19 valid characters. This leaves no space for a terminating null and so doing cout << head.date outputs your 19 valid characters and then a load of garbage.
I am new to C++. I generally program in C#, so I'm having trouble with arrays and loops. When I try to print the contents of a dynamic array using a loop, it reports a corrupted requested area. For example, the code below recognizes the condition that uses the array's contents but doesn't print them:
// Array.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
using namespace std;
void main()
{
int size=3;
int *p;
int myarray[10];
myarray[3]=4;
p=new int[size];
p[2]=3;
if(myarray[3]==4){
cout << myarray[3] +"/n";
cout << "Why?";
}
else
cout << "Not equal " << endl;
cin.get();
delete [] p;
}
The code looks fine, except that it should be
cout << myarray[3] << "\n";
Not +
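The problem is the expression myarray[3] + "/n" (note also that the newline escape is "\n", not "/n").
A string literal such as "\n" evaluates to the memory location of its first character.
You are adding 4 to that location and printing whatever is found there. Since the literal is only a few bytes long, this reads past its end, which is undefined behaviour: it may give you junk data or a hardware exception (resulting in a core dump) if you end up accessing a protected memory location.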
To get what (I think) you are asking for, do:
cout << myarray[3] << '\n';
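The sketch below contrasts the two meanings of +. skipChars and formatLine are hypothetical names: the first shows what an integer plus a string literal actually computes (a pointer offset, valid only inside the literal), the second converts the integer to text first so that + really is concatenation.

```cpp
#include <string>

// What `number + "text"` really does: it advances the pointer to the
// literal by `number` characters. Only offsets inside the literal are valid.
const char* skipChars(const char* text, int count) {
    return text + count;
}

// Builds the intended text by converting the integer to a string first.
std::string formatLine(int value) {
    return std::to_string(value) + "\n";
}
```

With a long enough literal the offset is harmless, for example skipChars("Hello, world", 4) points at "o, world". With a two-character literal such as "/n", an offset of 4 lands outside the array.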
While a solution has been given:
cout << myarray[3] << "\n"
the point to get is that myarray[3] is an integer while "\n" is a string, and the only way to "add" them together as strings is to first turn the integer into a string. The << operator handles the work of converting myarray[3] into text, nothing special, and then the second << writes a new line after it. I personally prefer the following style and find it more flexible, but it may be more than you're looking for at this stage of learning:
printf("%i\n", myarray[3]);
where printf scans the format string for conversion specifiers such as %i, formats the remaining arguments accordingly, and outputs everything in one call.