Reading and writing to files C++ [duplicate] - c++

This question already has an answer here:
feof() and fscanf() stop working after scanning byte 1b as a char. Is it because it is 'ESC' in ascii? What can I do?
(1 answer)
Closed 5 years ago.
I've got a problem regarding file input/output.
Here is my program:
#include <bits/stdc++.h>
using namespace std;
int main()
{
FILE * out;
out=fopen("tmp.txt", "w");
for(int i=0; i<256; i++)
{
fprintf(out, "%c", char(i));
}
fclose(out);
FILE * in;
in=fopen("tmp.txt", "r");
while(!feof(in))
{
char a=fgetc(in);
cout<<int(a)<<endl;
}
fclose(in);
}
and here is the output:
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-1
Why is it stopping so quickly?
Does that mean char(26) is EOF?
How could I write to a file (of any type) to overcome this problem?
What I'm looking for is a way to freely write values (of any range, can be char, int or something else) to a file and then read them back.

Works for me *), however a few remarks:
1. You should not use #include <bits/stdc++.h>; it is an internal header intended for the compiler's own use, not to be included from client code.
2. As some characters are translated (e.g. EOL) or specifically interpreted in text mode (the default), you should open the files in binary mode.
3. Reading as (signed) char and converting to int will result in negative values past 127.
4. As fgetc already returns int, you do not need the conversion to signed char and back at all.
See here the code with the corrections.
*) As mentioned in other comments, it might not work on Windows in text mode (see point 2).
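The remarks above can be sketched roughly as follows. This is only an illustration, not the code the answer linked to, and the helper names write_all_bytes and read_all_bytes are made up for this example. Both files are opened in binary mode ("wb" / "rb"), and the loop tests the int returned by fgetc against EOF instead of calling feof first. For context, the Microsoft C runtime treats byte 26 (Ctrl+Z) as end of file when reading in text mode, which would match the question's output stopping after 25.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the byte values 0..255 to `path` in binary mode.
// Returns false if the file could not be opened or closed cleanly.
bool write_all_bytes(const char* path) {
    std::FILE* out = std::fopen(path, "wb");  // "b": no text-mode translation
    if (!out) return false;
    for (int i = 0; i < 256; ++i)
        std::fputc(i, out);
    return std::fclose(out) == 0;
}

// Hypothetical helper: reads every byte of `path` back as an int in 0..255.
// Returns an empty vector if the file could not be opened.
std::vector<int> read_all_bytes(const char* path) {
    std::vector<int> values;
    std::FILE* in = std::fopen(path, "rb");
    if (!in) return values;
    int c;  // int, not char, so EOF stays distinguishable from byte 255
    while ((c = std::fgetc(in)) != EOF)
        values.push_back(c);
    std::fclose(in);
    return values;
}
```

Calling write_all_bytes("tmp.txt") and then printing each element of read_all_bytes("tmp.txt") should list all 256 values without the trailing -1, because the loop ends as soon as fgetc reports EOF.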

What I'm looking for is a way to freely write values (of any range, can be char, int or sth else) to a file and then reading it.
In this case you must:
Separate the individual values with a delimiter, such as space or new-line symbol.
Read back integers rather than individual separate characters / bytes.
The easiest is to use C++ std::fstream for that. E.g.:
#include <fstream>
#include <iostream>
int main() {
{
std::ofstream out("tmp.txt");
for(int i=0; i<256; i++)
out << i << '\n';
// out destructor flushes and closes the stream.
}
{
std::ifstream in("tmp.txt");
for(int c; in >> c;)
std::cout << c << '\n';
}
}

Related

C++ binary files I/O, data lost when writing

I am learning C++ with the "Programming: Principles and Practice Using C++" book from Bjarne Stroustrup. I am currently studying chapter 11 and I found an example on how to read and write binary files of integers (section 11.3.2). I played around with the example and used a .txt file (input.txt) with a sentence which I read and wrote to another file (output.txt) (text_to_binary fnc) and then read and wrote back to the original file (input.txt) (binary_to_text fnc).
#include<fstream>
#include<iostream>
using namespace std;
void text_to_binary(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs.read(as_bytes(x), sizeof(char));)
{
ofs << x << '\n';
}
ofs.close();
ifs.close();
}
void binary_to_text(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs >> x;)
{
ofs.write(as_bytes(x), sizeof(char));
}
ifs.close();
ofs.close();
}
int main()
{
string iname = "./chapter_11/input.txt";
string oname = "./chapter_11/output.txt";
ifstream ifs{iname, ios_base::binary};
ofstream ofs{oname, ios_base::binary};
text_to_binary(ifs, ofs);
ifstream ifs2{oname, ios_base::binary};
ofstream ofs2{iname, ios_base::binary};
binary_to_text(ifs2, ofs2);
return 0;
}
I figured out that I have to use sizeof(char) rather than sizeof(int) in the .read and .write calls. If I use sizeof(int), some chars of the .txt file go missing when I write them back to text. Funnily enough, chars only go missing if
x%4 != 0 (x = nb of chars in .txt file)
example with sizeof(int):
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..asd
(text_to_binary fnc) results in:
output.txt:
1819043176
1752440943
1763734377
1851859059
1634558240
1735289210
1936028704
824192628
540291890
1629516649
1836412448
544367970
1919252069
1768453241
1696622446
543519596
1936027492
544483182
1953784173
774795877
(binary_to_text fnc) results back in:
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..
asd went missing.
Now to my question: why does this happen? Is it because ints are stored as 4 bytes?
Bonus question: Out of interest, is there a simpler/more efficient way of doing this?
edit: updated the question with the results to make it hopefully more clear
When the remaining data is shorter than the requested size, the read attempts to go beyond the end of the file, and both eofbit and failbit are set for the stream. The stream then evaluates to false in the loop condition, so the loop ends without processing the last, partial read.
You need to check gcount of the stream after the loop to see if any bytes were actually read into the variable x.
But note that a partial read will only write to part of the variable x, leaving the rest indeterminate. Exactly which part depends on the system endianness, and using the variable with its indeterminate bits will lead to undefined behavior.
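The gcount check could look something like the sketch below. The struct ChunkedRead and the function read_int_chunks are hypothetical names chosen for this example, and the code reads into a char buffer rather than straight into an int so that a partial read never leaves an int half-filled.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ChunkedRead {
    std::vector<int> whole;  // values from complete sizeof(int) reads
    std::string leftover;    // bytes delivered by the final partial read, if any
};

// Hypothetical helper: reads sizeof(int)-sized chunks until a read fails,
// then uses gcount() to recover the bytes of the last, incomplete chunk.
ChunkedRead read_int_chunks(std::istream& in) {
    ChunkedRead result;
    char buf[sizeof(int)];
    while (in.read(buf, sizeof buf)) {
        int x;
        std::memcpy(&x, buf, sizeof x);  // copy bytes into a fully written int
        result.whole.push_back(x);
    }
    // The failed read still reports how many bytes it managed to extract.
    result.leftover.assign(buf, static_cast<std::size_t>(in.gcount()));
    return result;
}
```

With 4-byte ints and the 83-character sentence from the question, this would yield 20 whole values and the three leftover bytes "asd", which the caller can then write out separately.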

cout chr(10) adds a superfluous chr(13) before it [duplicate]

This question already has answers here:
c++: how to print new line without carriage return [duplicate]
(2 answers)
Closed 2 years ago.
I have two c++ exes communicating over iostream. First exe sends a stream of chars (or bytes) and second intercepts this and decodes them.
exe1.exe emits chars:
void main()
{
for (int i = 0; i < 256; ++i)
cout << static_cast<char>(i);
}
exe2.exe takes them in:
void main()
{
FILE* pipe = _popen("exe1.exe", "rb");
while (!feof(pipe))
cout << static_cast<int>(fgetc(pipe)) << endl;
_pclose(pipe);
}
One would expect to receive 256 values in serial order as so:
0,1,2,3,4,5,6,7,8,9,10,11,12,13...
But one gets
0,1,2,3,4,5,6,7,8,9,13,10,11,12,13...
There is a problem at 10, where you can see an additional 13 before it. Possibly cout wants to be helpful by adding an extra carriage return before a \n char. But it is annoying when one wants to transfer pure bytes between two processes. Yes, cout is for human readability, but is there a way to tell cout or printf to not do that? Or to use another stream which is not intended for humans to read?
Character 10 is the ASCII LF, which is treated as a line break on most platforms. On Windows specifically, the standard line break is a 13 10 (CRLF) sequence. C++ stream implementations are smart enough to know that and will convert character 10 to 13 10 on output when operating in text mode. If you don't want that to happen, you have to put the output stream into binary mode instead.
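As a rough sketch of what putting the output stream into binary mode can mean in practice: with the Microsoft C runtime (which the question appears to use, given _popen), _setmode can switch stdout to binary. The wrapper name set_stdout_binary is made up for this example, and the non-Windows branch is only there so the snippet builds elsewhere.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode
#endif

// Hypothetical helper: switches stdout to binary mode where the text/binary
// distinction exists. Returns true on success.
bool set_stdout_binary() {
#ifdef _WIN32
    // _setmode returns -1 on failure, otherwise the previous mode.
    return _setmode(_fileno(stdout), _O_BINARY) != -1;
#else
    return true;  // POSIX systems do not translate '\n' on output
#endif
}
```

The call would go at the very start of exe1's main, before anything is written to cout, so that no output has already been produced in text mode.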

String.length() shows incorrect value [duplicate]

This question already has answers here:
Read whole ASCII file into C++ std::string [duplicate]
(9 answers)
Closed 3 years ago.
I am trying the following code in C++
#include <iostream>
#include <string>
using namespace std;
void Strlength(const string& s) {
cout << s.length() << endl;
}
int main() {
string s;
cin >> s;
cout << s.length() << endl;
Strlength(s);
return 0;
}
The string which I am giving as input is 100,000 characters long and is like "xxxxxxx...xxxxabcde"(fill the ... with remaining x)
This gives me the output as
4095
4095
I am expecting the output to be 100000. What am I doing wrong?
This relates to one of the HackerRank problems (Test case 10): String Similarity
Assuming you describe the input correctly, that is it is one single "word", then the issue is not in your code. The issue must be in the environment which runs the code. It has some kind of mechanism to feed the standard input to your program. Either that has a limitation on total input length, or it has a limitation of line length. 4 kilobytes is 4096 bytes, so perhaps your input is limited by that: 4095 chars of the word plus a newline character (or terminating 0 byte of string, or whatever).
If you are running this under some kind of web interface in a browser, the problem could even be that the input field in the web page has that limitation.
If you need to dig into this, try to read char by char and see what you get, how many chars and how many newlines. Also examine cin.fail(), cin.eof(), cin.bad() and cin.good(). For the question code, you should expect failbit to be false, and eofbit might be true or false depending on how the input was truncated.
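The char-by-char check suggested above might be sketched like this. InputReport and inspect_input are hypothetical names for this example. Note that this loop reads until the stream is exhausted, so unlike the question's single cin >> s, failbit will be set at the end because the final get fails.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed to try the helper on an in-memory stream

// Hypothetical report type for this sketch.
struct InputReport {
    std::size_t chars = 0;     // total characters read, newlines included
    std::size_t newlines = 0;  // how many of them were '\n'
    bool eof = false;
    bool fail = false;
    bool bad = false;
};

// Hypothetical helper: consumes the whole stream one char at a time and
// records the stream state once reading stops.
InputReport inspect_input(std::istream& in) {
    InputReport r;
    char ch;
    while (in.get(ch)) {
        ++r.chars;
        if (ch == '\n') ++r.newlines;
    }
    r.eof = in.eof();
    r.fail = in.fail();
    r.bad = in.bad();
    return r;
}
```

Running inspect_input(std::cin) in the same environment and printing the fields would show whether the program really receives 100,000 characters or whether the input is cut off before it arrives.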

C++ ifstream will read some values then stop

I am trying to write a program that reads 940 4-byte long values of binary data [hex] from a bin file, and output the values to console. I have ifstream::read, cout and seekg operations in a loop.
It will work for the first 10 or so iterations, and then in one iteration skip the read and write operations, perform the seekg operation, and continue on reading and writing. Also the last 200 lines or so are coming out as the same value.
It will work properly for 12 iterations, then it will start outputting the wrong numbers. At this point it goes from address 0x230 to 0x28B when it should be at 0x260. It looks like read and cout are not called in this particular iteration.
The last correct value reads 3f4fc938. The next value should be 3ef646c1.
Does anyone know why this would fail? Any help is appreciated.
This is the program:
int main(int argc, char* argv[]) {
fstream in;
uint32_t buffer;
in.open(argv[1]);
in.seekg(0x6500,in.beg);
for(int i = 0; i < 940; i++) {
in.read(reinterpret_cast<char*> (&buffer),4);
cout << hex << buffer << endl;
in.seekg(0x2c,in.cur);
}
}
You have opened your file in text mode. Text mode means that operations on the file will interpret a Byte sequence that matches the platform-specific representation of a newline as a single '\n' character. If you're on Windows, for example, newlines are represented as the Byte sequence 0D 0A. So on Windows, whatever you do in your file will work well up to the point where your file happens to have a Byte with value 13 followed by a Byte with value 10. Once you reach that point, that 13 followed by 10 will be interpreted as a single character. Essentially, text mode will just swallow any Byte with value 13 if it happens to appear right before a Byte with value 10. Your application will never see the 13 and anything beyond the point where the 13 appeared will end up "shifted" by one Byte. On other platforms, other newline representations are common. If you wanna work with binary data, you will generally want to open your file in binary mode, for example
fstream in(argv[1], std::ios::in | std::ios::binary);
or
in.open(argv[1], std::ios::in | std::ios::binary);
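Putting this together with the loop from the question, a binary-mode version could look roughly like the sketch below. The function name read_strided and its parameters are invented for this example. It uses std::ifstream, which always adds the input flag itself, and it checks every read so that a failed read ends the loop instead of printing stale data.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

// Hypothetical helper: reads up to `count` 4-byte values starting at offset
// `start`, skipping `gap` bytes after each one. Stops early if a read fails.
std::vector<std::uint32_t> read_strided(const char* path, std::streamoff start,
                                        int count, std::streamoff gap) {
    std::vector<std::uint32_t> values;
    std::ifstream in(path, std::ios::binary);  // binary: bytes 0D 0A stay as-is
    if (!in) return values;
    in.seekg(start, std::ios::beg);
    for (int i = 0; i < count; ++i) {
        char raw[4];
        if (!in.read(raw, sizeof raw)) break;  // check every read
        std::uint32_t v;
        std::memcpy(&v, raw, sizeof v);  // host byte order, as in the question
        values.push_back(v);
        in.seekg(gap, std::ios::cur);
    }
    return values;
}
```

For the question's layout this would be called as read_strided(argv[1], 0x6500, 940, 0x2c), printing each element with std::hex afterwards.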

C++ reading a file in binary mode. Problems with END OF FILE

I am learning C++ and I have to read a file in binary mode. Here's how I do it (following the C++ reference):
unsigned values[255];
unsigned total;
ifstream in ("test.txt", ifstream::binary);
while(in.good()){
unsigned val = in.get();
if(in.good()){
values[val]++;
total++;
cout << val <<endl;
}
}
in.close();
So, I am reading the file byte by byte for as long as in.good() is true. I put a cout at the end of the while body in order to understand what's happening, and here is the output:
marco@iceland:~/workspace/huffman$ ./main
97
97
97
97
10
98
98
10
99
99
99
99
10
100
100
10
101
101
10
221497852
marco@iceland:~/workspace/huffman$
Now, the input file "test.txt" is just:
aaaa
bb
cccc
dd
ee
So everything works perfectly till the end, where there's that 221497852. I guess it's something about the end of file, but I can't figure the problem out.
I am using gedit & g++ on a debian machine(64bit).
Any help will be appreciated.
Many thanks,
Marco
fstream::get returns an int-value. This is one of the problems.
Secondly, you are reading in binary, so you shouldn't use formatted streams. You should use fstream::read:
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
std::cout << "Reading " << length << " characters... ";
// read data as a block:
is.read (buffer,length);
if (is)
std::cout << "all characters read successfully.";
else
std::cout << "error: only " << is.gcount() << " could be read";
is.close();
// ...buffer contains the entire file...
delete[] buffer;
}
return 0;
}
This isn't the way istream::get() was designed to be used.
The classical idiom for using this function would be:
for ( int val = in.get(); val != EOF; val = in.get() ) {
// ...
}
or even more idiomatic:
char ch;
while ( in.get( ch ) ) {
// ...
}
The first loop is really inherited from C, where in.get() is
the equivalent of fgetc().
Still, as far as I can tell, the code you give should work.
It's not idiomatic, and it's not
The C++ standard is unclear what it should return if the
character value read is negative. fgetc() requires a value in
the range [0...UCHAR_MAX], and I think it safe to assume that
this is the intent here. It is, at least, what every
implementation I've used does. But this doesn't affect your
input. Depending on how the implementation interprets the
standard, the return value of in.get() must be in the range
[0...UCHAR_MAX] or [CHAR_MIN...CHAR_MAX], or it must be EOF
(typically -1). (The reason I'm fairly sure that the intent is
to require [0...UCHAR_MAX] is because otherwise, you may not
be able to distinguish end of file from a valid character.)
And if the return value is EOF (almost always
-1), failbit should be set, so in.good() would return
false. There is no case where in.get() would be allowed
to return 221497852. The only explanation I can possibly think
of for your results is that your file has some character with
bit 7 set at the end of the file, that the implementation is
returning a negative number for this (but not end of file,
because it is a character), which results in an out of bounds
index in values[val], and that this out of bounds index
somehow ends up modifying val. Or that your implementation is
broken, and is not setting failbit when it returns end of
file.
To be certain, I'd be interested in knowing what you get from
the following:
std::ifstream in( "test.txt", std::ios_base::binary );
int ch = in.get();
while ( ch != std::istream::traits_type::eof() ) {
std::cout << ch << std::endl;
ch = in.get();
}
This avoids any issues of a possibly invalid index, and any type
conversions (although the conversion int to unsigned is well
defined). Also, out of curiosity (since I can only access VC++
here), you might try replacing in as follows:
std::istringstream in( "\n\xE5" );
I would expect to get:
10
233
(Assuming 8 bit bytes and an ASCII based code set. Both of
which are almost, but not quite universal today.)
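For the byte-counting task itself, a sketch using the in.get(ch) idiom might look like the following. The function name count_bytes is made up for this example. Compared with the question's code, the table has 256 slots (the question declares 255) and starts zeroed (the question's values and total are never initialized); whether either of those is related to the stray number in the output is not established here.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>  // only needed to try the helper on an in-memory stream

// Hypothetical helper: counts how often each byte value occurs in a stream.
std::array<std::size_t, 256> count_bytes(std::istream& in) {
    std::array<std::size_t, 256> counts{};  // value-initialized: all zeros
    char ch;
    while (in.get(ch)) {
        // Go through unsigned char so bytes above 127 index 128..255
        // instead of becoming negative.
        ++counts[static_cast<unsigned char>(ch)];
    }
    return counts;
}
```

It can be fed an std::ifstream opened with std::ios_base::binary, and the total number of bytes is simply the sum of the table.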
I've eventually figured this out.
Apparently the problem wasn't due to any code. The problem was gedit. It always appends a newline character at the end of the file. This also happens with other editors, such as vim. Some editors can be configured not to append anything, but in gedit this is apparently not possible. https://askubuntu.com/questions/13317/how-to-stop-gedit-gvim-vim-nano-from-adding-end-of-file-newline-char
Cheers to everyone who asked me,
Marco