How to read a string from a binary file? - c++

I am trying to read a string (ver) from a binary file. The number of characters (numc) in the string is also read from the file. This is how I read the file:
uint32_t numc;
inFile.read((char*)&numc, sizeof(numc));
char* ver = new char[numc];
inFile.read(ver, numc);
cout << "the version is: " << ver << endl;
What I get is the string that I expect plus some other symbols. How can I solve this problem?

A char* string is a nul terminated sequence of characters. Your code ignores the nul termination part. Here's how it should look
uint32_t numc;
inFile.read((char*)&numc, sizeof(numc));
char* ver = new char[numc + 1]; // allocate one extra character for the nul terminator
inFile.read(ver, numc);
ver[numc] = '\0'; // add the nul terminator
cout << "the version is: " << ver << endl;
Also, it should be sizeof(numc), not size(numc), although maybe that's a typo.

Related

A 'stack overflow' error returns upon any array size I enter above 36603. How can I make a string capable of capturing my entire .txt file?

I need to create a string capable of holding the entire book 'The Hunger Games' which comes out to around 100500 words. My code can capture samples of the txt, but anytime I exceed a string size of 36603(tested), I receive a 'stack overflow' error.
I can successfully capture anything below 36603 elements and can output them perfectly.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
int i;
char set[100];
string fullFile[100000]; // this will not execute if set to over 36603
ifstream myfile("HungerGames.txt");
if (myfile.is_open())
{
// saves 'i limiter' words from the .txt to fullFile
for (i = 0; i < 100000; i++) {
// each word is separated by a space
myfile.getline(set, 100, ' ');
fullFile[i] = set;
}
myfile.close();
}
else cout << "Unable to open file";
//prints 'i limiter' words to window
for (i = 0; i < 100000; ++i) {
cout << fullFile[i] << ' ';
}
What is causing the 'stack overflow' and how can I successfully capture the txt? I will later be doing a word counter and word frequency counter, so I need it in "word per element" form.
There is a limit on how much stack a function can use; use std::vector instead, which keeps its elements on the heap.
The default stack size in Visual Studio is 1 MB. You can change it with /F, but this is generally a bad idea.
My system is Lubuntu 18.04, with g++ 7.3. The following snippet shows some "implementation details" of my system, and how to report them on yours. It would help you to understand what your system provides ...
void foo1()
{
int i; // Lubuntu
cout << "\n sizeof(i) " << sizeof(i) << endl; // 4 bytes
char c1[100];
cout << "\n sizeof(c1) " << sizeof(c1) << endl; // 100 bytes
string s1; // empty string
cout << "\n s1.size() " << s1.size() // 0 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
s1 = "1234567890"; // now has 10 chars
cout << "\n s1.size() " << s1.size() // 10 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
string fullFile[100000]; // this is an array of 100,000 strings
cout << "\n sizeof(fullFile) " // total is vvvvvvvvv
<< sops.digiComma(sizeof(fullFile)) << endl; // 3,200,000 bytes
uint64_t totalChars = 0;
for( auto ff : fullFile ) totalChars += ff.size();
cout << "\n total chars in all strings " << totalChars << endl;
}
What is causing the 'stack overflow' and how can I successfully
capture the txt?
The fullFile array is an unfortunate choice ... because each std::string, even when empty, consumes 32 bytes of automatic memory (~stack), for a total of 3,200,000 bytes, and this is with no data in the strings! This will stack overflow your system when the stack is smaller than the automatic var space.
On Lubuntu the default automatic-memory size (lately) is 10 M Bytes, so not a problem for me. But you will have to check on what your version of your target os defaults to. I think Windows defaults down near 1 M Byte. (Sorry, I don't know how to check Windows automatic-memory size.)
How can I make a string capable of capturing my entire .txt file.
The answer is -- you don't need to make your own. (unless you have some unstated requirement)
Also, you really should look at en.cppreference.com/w/cpp/string/basic_string/append.
In my 1st snippet above, you should take notice that the sizeof(string) reports 32 bytes, regardless of how many chars are in it.
Think on that a while ... if you put 1000 chars into a string, where do they go? The object stays at 32 bytes! You might guess or read that the string object handles memory management on your behalf, and puts the characters into dynamic memory (heap).
On my system, heap is about 4 G bytes. That's a lot more than stack.
In summary, every single std::string expands auto-magically, using heap, so if your text input will fit in heap, it will fit into '1 std::string'.
While browsing around in the cppreference, check out the 'string::reserve()' command.
Conclusion:
Any std::string you declare can auto-magically 'grow' to support your need, and will thus hold the entire text (if it will fit in memory).
Operationally, you simply get a line of text from the file, then append it to the single string, until the entire file is contained. You only need the one array, which std::string provides.
With this new idea ... I suggest you change fullFile from an array to a string.
string fullFile; // grows as needed on append,
                 // up to the limit of available heap.
string line;     // holds one line of text at a time
// open file ... check status
while (getline(myfile, line)) { // fetch a line of text up to the line feed;
                                // the loop ends at eof or on a read error
    // Note that getline does not put the \n into 'line'
    fullFile += line; // append the line
    fullFile += '\n'; // restore the line feed, if you need it in fullFile
}
// ... other file cleanup.
foo1() output on Lubuntu 18.04, g++ v7.3
sizeof(i) 4
sizeof(c1) 100
s1.size() 0 sizeof(s1) 32
s1.size() 10 sizeof(s1) 32
sizeof(fullFile) 3,200,000
total chars in all strings 0
Example slurp() :
string slurp(ifstream& sIn)
{
stringstream ss;
ss << sIn.rdbuf();
dtbAssert(!sIn.bad());
if(sIn.bad())
throw "\n DTB::slurp(sIn) 'ss << sIn.rdbuf()' is bad";
ss.clear(); // clear flags
return ss.str();
}

Netbeans C++ not printing UTF-8 characters

Here is the very simple C++ code:
char a00 = 'Z';
char a01 = '\u0444';
char a02[5] = {'H','e','l','l','o'};
char a03[] = {'W','o','r','l','d','\0','Z','Z'};
cout << "Simple char: " << a00
<< "\nUTF-8 char: " << a01
<< "\nFull char array: " << a02
<< "\n2nd in char array: " << a02[1]
<< "\nWith null character: " << a03 << endl;
My problem is when Netbeans 8.1 tries to show the output of such a program, it does not create the UTF-8 character.
The character should look like this: ф (see: link)
Instead, I get the following output:
(image)
I have tried adding -J-Dfile.encoding=UTF-8 to netbeans_default-options inside the netbeans.conf file located in the etc folder. It made no difference.
UTF-8 is a variable-length encoding: only ASCII characters fit in one byte, and every other character occupies two to four bytes. So a single char cannot hold ф, which needs two bytes in UTF-8.
You can store them in a string like this:
std::string s = "\u0444";

Extra characters on cstring when cout

I have a char[4] dataLabel that when I say
wav.read(dataLabel, sizeof(dataLabel));//Read data label
cout << "Data label:" <<dataLabel << "\n";
I get the output Data label:data� but when I loop through each char I get the correct output, which should be "data".
for (int i = 0; i < sizeof(dataLabel); ++i) {
cout << "Data label " << i << " " << dataLabel[i] << "\n";
}
The sizeof returns 4. I'm at a loss for what the issue is.
EDIT: What confuses me more is that essentially the same code from earlier in my program works perfectly.
ifstream wav;
wav.open("../../Desktop/hello.wav", ios::binary);
char riff[4]; //Char to hold RIFF header
if (wav.is_open()) {
wav.read(riff, sizeof(riff));//Read RIFF header
if ((strcmp(riff, "RIFF"))!=0) {
fprintf(stderr, "Not a wav file");
exit(1);
}
else {
cout << "RIFF:" << riff << "\n";
This prints RIFF:RIFF as intended.
You are missing a null terminator on your character array. Try making it 5 characters and making the last character '\0'. This lets the program know that your string is done without needing to know the size.
What is a null-terminated string?
The overload of operator<< for std::ostream for char const* expects a null terminated string. You are giving it an array of 4 characters.
Use the standard library string class instead:
std::string dataLabel;
See the documentation for istream::read; it doesn't append a null terminator, and you're telling it to read exactly 4 characters. As others have indicated, the << operator is looking for a null terminator so it's continuing to read past the end of the array until it finds one.
I concur with the other suggested answer of using std::string instead of char[].
Your char[] array is not null-terminated, but the << operator that accepts char* input requires a null terminator.
char dataLabel[5];
wav.read(dataLabel, 4); //Read data label
dataLabel[4] = 0;
cout << "Data label:" << dataLabel << "\n";
The variable dataLabel is defined as
char dataLabel[4];
so it has only four characters, which were filled with { 'd', 'a', 't', 'a' } by the statement
wav.read(dataLabel, sizeof(dataLabel));//
So this character array does not have the terminating zero that is required for the operator << when its argument is a character array.
Thus in this statement
cout << "Data label:" <<dataLabel << "\n";
the program has undefined behaviour.
Change it to
std::cout << "Data label: ";
std::cout.write( dataLabel, sizeof( dataLabel ) ) << "\n";

Read into std::string using scanf

As the title said, I'm curious if there is a way to read a C++ string with scanf.
I know that I can read each char and insert it in the deserved string, but I'd want something like:
string a;
scanf("%SOMETHING", &a);
gets() also doesn't work.
Thanks in advance!
this can work
char tmp[101];
scanf("%100s", tmp);
string a = tmp;
There is no situation under which gets() is to be used! It is always wrong to use gets() and it is removed from C11 and being removed from C++14.
scanf() doesn't support any C++ classes. However, you can store the result from scanf() into a std::string:
Editor's note: The following code is wrong, as explained in the comments. See the answers by Patato, tom, and Daniel Trugman for correct approaches.
std::string str(100, ' ');
if (1 == scanf("%*s", &str[0], str.size())) {
// ...
}
I'm not entirely sure about the way to specify that buffer length in scanf() and in which order the parameters go (there is a chance that the parameters &str[0] and str.size() need to be reversed and I may be missing a . in the format string). Note that the resulting std::string will contain a terminating null character and it won't have changed its size.
Of course, I would just use if (std::cin >> str) { ... } but that's a different question.
Problem explained:
You CAN populate the underlying buffer of an std::string using scanf, but(!) the managed std::string object will NOT be aware of the change.
const char *line="Daniel 1337"; // The line we're gonna parse
std::string token;
token.reserve(64); // You should always make sure the buffer is big enough
sscanf(line, "%s %*u", token.data());
std::cout << "Managed string: " << token
<< " (size = " << token.size() << ")" << std::endl;
std::cout << "Underlying buffer: " << token.data()
<< " (size = " << strlen(token.data()) << ")" << std::endl;
Outputs:
Managed string: (size = 0)
Underlying buffer: Daniel (size = 6)
So, what happened here?
The object std::string is not aware of changes not performed through the exported, official, API.
When we write to the object through the underlying buffer, the data changes, but the string object is not aware of that.
If we were to replace the original call token.reserve(64) with token.resize(64), a call that changes the size of the managed string, the results would have been different:
const char *line="Daniel 1337"; // The line we're gonna parse
std::string token;
token.resize(64); // You should always make sure the buffer is big enough
sscanf(line, "%s %*u", token.data());
std::cout << "Managed string: " << token
<< " (size = " << token.size() << ")" << std::endl;
std::cout << "Underlying buffer: " << token.data()
<< " (size = " << strlen(token.data()) << ")" << std::endl;
Outputs:
Managed string: Daniel (size = 64)
Underlying buffer: Daniel (size = 6)
Once again, the result is sub-optimal. The output is correct, but the size isn't.
Solution:
If you really want to do this, follow these steps:
Call resize to make sure your buffer is big enough. Use a #define for the maximal length (see step 2 to understand why):
std::string buffer;
buffer.resize(MAX_TOKEN_LENGTH);
Use scanf while limiting the size of the scanned string using "width modifiers" and check the return value (return value is the number of tokens scanned):
#define STR(x) #x         // turns its argument into a string literal
#define XSTR(x) STR(x)    // expands the argument first, then stringifies it
...
int rv = scanf("%" XSTR(MAX_TOKEN_LENGTH) "s", &buffer[0]);
Reset the managed string size to the actual size in a safe manner:
buffer.resize(strnlen(buffer.data(), MAX_TOKEN_LENGTH));
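The snippet below works as long as the input token is at most 100 characters. Note that s.size() stays at 100 afterwards.
string s(100, '\0');
scanf("%100s", &s[0]); // write through a non-const pointer; c_str() is const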
Here is a version without a length limit (for when the length of the input is unknown).
std::string read_string() {
std::string s; unsigned int uc; int c;
// The ASCII code of space is 32, and all codes less than or equal to 32 are invisible.
// EOF, being negative, becomes larger than 32 after conversion to unsigned.
while ((uc = (unsigned int)getchar()) <= 32u);
if (uc < 256u) s.push_back((char)uc);
while ((c = getchar()) > 32) s.push_back((char)c);
return s;
}
For performance consideration, getchar is definitely faster than scanf, and std::string::reserve could pre-allocate buffers to prevent frequent reallocation.
You can construct an std::string of an appropriate size and read into its underlying character storage:
std::string str(100, ' ');
scanf("%100s", &str[0]);
str.resize(strlen(str.c_str()));
The call to str.resize() is critical, otherwise the length of the std::string object will not be updated. Thanks to Daniel Trugman for pointing this out.
(There is no off-by-one error with the size reserved for the string versus the width passed to scanf, because since C++11 it is guaranteed that the character data of std::string is followed by a null terminator so there is room for size+1 characters.)
const int n = 15;          // you are going to scan no more than n characters
std::string str(n, '\0');  // the buffer holds n characters plus the implicit terminator
scanf("%15s", &str[0]);    // scanf only changes the content, as if the string were an array
str = str.c_str();         // trim the string to its real length; without this, size() stays n

Read binary file c++

I'm trying to read an image into a char array. Here is my try:
ifstream file ("htdocs/image.png", ios::in | ios::binary | ios::ate);
ifstream::pos_type fileSize;
char* fileContents;
if(file.is_open())
{
fileSize = file.tellg();
fileContents = new char[fileSize];
file.seekg(0, ios::beg);
if(!file.read(fileContents, fileSize))
{
cout << "fail to read" << endl;
}
file.close();
cout << "size: " << fileSize << endl;
cout << "sizeof: " << sizeof(fileContents) << endl;
cout << "length: " << strlen(fileContents) << endl;
cout << "random: " << fileContents[55] << endl;
cout << fileContents << endl;
}
And this is the output:
size: 1944
sizeof: 8
length: 8
random: ?
?PNG
Can anyone explain this to me? Is there an end-of-file char at position 8? This example was taken from cplusplus.com
Running Mac OS X and compiling with XCode.
Prints the size of the file. Your image.png is 1944 bytes.
cout << "size: " << fileSize << endl;
Prints sizeof(char*), which is 8 in your environment. Note that sizeof reports the size of the pointer itself, not of the memory it points to, so the result is the same however large the file is.
cout << "sizeof: " << sizeof(fileContents) << endl;
The file you are reading is a binary file, so it may contain 0 as valid data. strlen counts characters until it encounters a 0, which in your file happens after 8 characters.
cout << "length: " << strlen(fileContents) << endl;
Returns the character at the 56th location (remember array indexing starts from 0) from start of file.
cout << "random: " << fileContents[55] << endl;
A suggestion:
Do remember to deallocate the dynamic memory allocation for fileContents using:
delete[] fileContents;
if you don't, you will end up creating a memory leak.
fileSize - the number of bytes in the file.
sizeof( fileContents ) - returns the size of a char* pointer.
strlen( fileContents) - counts the number of characters until a character with a value of 0 ('\0') is found. That is apparently after just 8 characters - since you are reading BINARY data this is not an unexpected result.
cout << fileContents - like strlen, cout writes out characters until one with a value of 0 is found. From the output it looks like some of the characters aren't printable.
Your example has some other issues - it doesn't free the memory used, for example. Here's a slightly more robust version:
vector< char > fileContents;
{
ifstream file("htdocs/image.png", ios::in | ios::binary | ios::ate);
if(!file.is_open())
throw runtime_error("couldn't open htdocs/image.png");
fileContents.resize(file.tellg());
file.seekg(0, ios::beg);
if(!file.read(&fileContents[ 0 ], fileContents.size()))
throw runtime_error("failed to read from htdocs/image.png");
}
cout << "size: " << fileContents.size() << endl;
cout << "data:" << endl;
for( unsigned i = 0; i != fileContents.size(); ++i )
{
if( i % 65 == 0 )
cout << '\n';
cout << fileContents[ i ];
}
This answer of mine to another question should be exactly what you are looking for (especially the second part about reading it into a vector<char>, which you should prefer to an array).
As for your output:
sizeof(fileContents) returns the size of a char *, which is 8 on your system (64 bit I guess)
strlen stops at the first '\0', just as the output operator does.
What do you expect? png files are binary so they may contain '\0' character (character having numeric value 0) somewhere.
If you treat the png file contents as string ('\0' terminated array of characters) and print it as string then it will stop after encountering the first '\0' character.
So there is nothing wrong with the code; fileContents correctly contains the png file (with size 1944 bytes)
size: 1944 // the png is 1944 bytes
sizeof: 8 // sizeof(fileContents) is the sizeof a pointer (fileContents type is char*) which is 8 bytes
length: 8 // the 9th character in the png file is '\0' (numeric 0)
random: ? // the 56th character in the png file
?PNG // the 5th-8th character is not printable, the 9th character is '\0' so cout stop here
It's good practice to use unsigned char for binary data.
The character randomly selected might not be displayed properly in the console window due to the limitations in the fonts supported. Also you can verify the same thing by printing it in hexadecimal and open the same file in a hex editor to verify it. Please don't forget to delete the memory allocated after use.