I saved the entire contents of an exe file to a char buffer.
When I tried:
string bufferStr=(string)buffer;
cout<<bufferStr.length();
I found that bufferStr is much shorter than buffer, so I suspect that, since I was reading an exe file, I read a null character ("\0") or something similar somewhere in there.
How can I use buffer with cout, or even write it to a file, without the data being cut off at such characters?
Thanks
The string constructor you used doesn't know anything about the length of your data and assumes that it is a 0-terminated string. You should use
string bufferStr=string(buffer, bufferSize);
cout<<bufferStr.length();
With this constructor the string also stores the \0 bytes.
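To illustrate the difference, here is a minimal sketch. The helper names bytesToString and writeBytes are made up for this example, and it assumes you know the buffer size from the read. It shows that the (pointer, length) constructor keeps embedded \0 bytes, and that writing with write() in binary mode sends every byte out unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: copies exactly `size` bytes, embedded '\0' included.
std::string bytesToString(const char* data, std::size_t size) {
    assert(data != nullptr);
    return std::string(data, size);
}

// Hypothetical helper: writes the raw bytes unchanged.
// std::ios::binary disables newline translation on Windows.
bool writeBytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

void embeddedNulDemo() {
    const char raw[] = {'M', 'Z', '\0', '\0', 'P', 'E', '\0'};
    assert(std::string(raw).length() == 2);                // stops at the first '\0'
    assert(bytesToString(raw, sizeof raw).length() == 7);  // keeps every byte
}
```

For console output, cout.write(bufferStr.data(), bufferStr.size()) follows the same idea, although printing raw exe bytes to a terminal is rarely useful.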
Related
I am trying to read a file into a string, either with the getline function or with fileContents.assign( (istreambuf_iterator<char>(myFile)), (istreambuf_iterator<char>()));
Either way gives me the output shown in the image.
First way:
string fileContents;
ifstream myFile("textFile.txt");
while(getline(myFile,fileContents))
cout<<fileContents<<endl;
Alternate way:
string fileContents;
ifstream myFile(fileName.c_str());
if (myFile.is_open())
{
fileContents.assign( (istreambuf_iterator<char>(myFile) ),
(istreambuf_iterator<char>() ) );
cout<<fileContents;
}
The file begins with those characters, most likely a BOM to tell you what the encoding of the file is.
You probably are not able to see them in Windows Notepad because Notepad hides the encoding bytes. Get a decent text editor that lets you see the binary of the file and you will see those characters.
Your file starts with a UTF-8 BOM (bytes 0xEF 0xBB 0xBF). You are reading the file's raw bytes as-is and outputting them to a display that is using an OEM font for codepage 437. To handle text files properly, especially Unicode-encoded text files, you need to read the first few bytes, check for a BOM (and there are several you can look for), and if detected then seek past the BOM and interpret the remaining bytes of the file in the specified encoding, in this case UTF-8.
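A rough sketch of the "check for a BOM and skip it" step, limited to the UTF-8 BOM only (the other BOMs and the actual UTF-8 decoding are not handled). The function names stripUtf8Bom and readTextWithoutBom are invented for this example.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Removes a leading UTF-8 BOM (bytes EF BB BF) from `text` if one is present.
std::string stripUtf8Bom(std::string text) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0)
        text.erase(0, bom.size());
    return text;
}

// Reads the raw bytes of a file and drops the UTF-8 BOM, if any.
std::string readTextWithoutBom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    return stripUtf8Bom(text);
}

void stripUtf8BomDemo() {
    assert(stripUtf8Bom(std::string("\xEF\xBB\xBF") + "This") == "This");
    assert(stripUtf8Bom("This") == "This");  // text without a BOM is untouched
}
```

The remaining bytes are still UTF-8, so non-ASCII characters will only display correctly if the console is set up for UTF-8.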
Consider I created a file using this way:
std::ofstream osf("MyTextFile.txt");
string buffer="spam\neggs\n";
osf.write(buffer.c_str(),buffer.length());
osf.close();
When I tried to read that file in the following way, I realized that more characters were read than the file contains.
std::ifstream is("MyTextFile.txt");
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
is.read (buffer,length);
//work with buffer
delete[] buffer;
For example, if the file contains spam\neggs\n, then this procedure reads 12 characters instead of 10. The first 10 chars are spam\neggs\n as expected but there are 2 more, which have the integer value 65533.
Moreover, this problem happens only when \n is present in the file. For example, there is no problem if file contains spam\teggs\t instead.
The question is;
Am I doing anything wrong? Or doesn't this procedure work as it should do?
Bonus Q: Can you suggest an alternative for reading the whole file at once?
Note: I found this way here.
The problem is that you wrote the string
"spam\neggs\n"
initially to an ofstream without setting the std::ios::binary flag when opening it (or in the constructor). This causes the runtime to translate to the "native text format", i.e., to convert each \n to \r\n on output (as you are on Windows). So, after being written, the contents of your file were actually:
"spam\r\neggs\r\n"
(i. e., 12 chars). That was returned by
int length = is.tellg();
But, when you tried to read 12 chars you got
"spam\neggs\n"
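back, because the runtime converted each \r\n back to \n. Only 10 characters were stored, so the last 2 elements of buffer were never written and hold garbage.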
As a final piece of advice: please, please don't use new char[length]. Use std::string and resize it to the file length, so you won't leak memory etc. Also, if your file can be very big, slurping the whole file into memory at once may not be a good idea.
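Something along these lines might do, as a sketch only. The name readWholeFile and the demo file name are made up here. It opens the file in binary mode so that tellg() and read() agree on the byte count, and it lets std::string own the memory instead of new char[].

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns the file's bytes, or an empty string on failure.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary: no \r\n translation
    if (!in) return std::string();
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0) return std::string();
    std::string contents(static_cast<std::size_t>(size), '\0');
    in.read(&contents[0], size);
    // Keep only what was actually read, in case the read came up short.
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}

void readWholeFileDemo() {
    { std::ofstream out("demo_spam.txt", std::ios::binary); out << "spam\neggs\n"; }
    assert(readWholeFile("demo_spam.txt").size() == 10);
}
```

If the file was written in text mode on Windows, the string will contain the \r\n pairs as stored on disk.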
Just an idea, since the number 2 corresponds to the count of \ns: Are you doing this on Windows? It might have something to do with the file actually containing \r\n. What happens if you open the file in binary mode (std::ios::binary)?
Can you suggest an alternative for reading the whole file at once?
Yes:
std::ifstream is("MyTextFile.txt");
std::string str( std::istreambuf_iterator<char>{is}, {} ); // requires <iterator>
str now contains the file. Does this solve your problem?
I'm trying to read a text file, and I put each word into a node of a binary search tree. However, the first word is always read as "ï»¿" + first word. For example, if my first word is "This", then the first word that is inserted into my node is "ï»¿This". I've been searching the forum for a solution. There was one post asking about the same problem in Java, but no one has addressed it in C++. Would anyone help me fix it? Thank you.
I found a simple solution. I opened the file in Notepad and saved it as ANSI. After that, the file is read and passed into the binary search tree correctly.
That's UTF-8's BOM
You need to read the file as UTF-8. If you don't need Unicode and only use the 128 ASCII code points, then save the file as ASCII or as UTF-8 without a BOM.
This is Byte Order Mark (BOM). It's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.
In C++, you can use the following function to convert a UTF-8 BOM file to ANSI.
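// Needs <fstream>, <string>, <cstring> and "using namespace std;".
// change_encoding_from_UTF8_to_ANSI() is not shown in this answer and must be defined elsewhere.
void change_encoding_from_UTF8BOM_to_ANSI(const char* filename)
{
    ifstream infile;
    string strLine="";
    string strResult="";
    infile.open(filename);
    if (infile)
    {
        // the first 3 bytes (ef bb bf) are the UTF-8 BOM;
        // all the others are assumed to be single-byte ASCII codes.
        // drop these 3 bytes from the output, but only if they are really there
        if (getline(infile, strLine))
        {
            if (strLine.compare(0, 3, "\xEF\xBB\xBF") == 0)
                strLine.erase(0, 3);
            strResult += strLine+"\n";
        }
        // test getline() itself instead of eof(), so a failed read is never appended
        while (getline(infile, strLine))
        {
            strResult += strLine+"\n";
        }
    }
    infile.close();
    // +1 leaves room for the terminating '\0' that strcpy writes
    char* changeTemp=new char[strResult.length()+1];
    strcpy(changeTemp, strResult.c_str());
    char* changeResult = change_encoding_from_UTF8_to_ANSI(changeTemp);
    strResult=changeResult;
    // who owns changeResult depends on the helper, which is not shown
    delete[] changeTemp;
    ofstream outfile;
    outfile.open(filename);
    outfile.write(strResult.c_str(),strResult.length());
    outfile.flush();
    outfile.close();
}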
In debug mode, find out which symbol the special character is and then replace it (Java syntax shown):
content.replaceAll("\uFEFF", "");
I'm doing some file io and created the test below, but I thought testoutput2.txt would be the same as testinputdata.txt after running it?
testinputdata.txt:
some plain
text
data with
a number
42.0
testoutput2.txt (in some editors it's on separate lines, but in others it's all on one line):
some plain
ऀ琀攀砀琀ഀഀ
data with
愀 渀甀洀戀攀爀ഀഀ
42.0
int main()
{
//Read plain text data
std::ifstream filein("testinputdata.txt");
filein.seekg(0,std::ios::end);
std::streampos length = filein.tellg();
filein.seekg(0,std::ios::beg);
std::vector<char> datain(length);
filein.read(&datain[0], length);
filein.close();
//Write data
std::ofstream fileoutBinary("testoutput.dat");
fileoutBinary.write(&datain[0], datain.size());
fileoutBinary.close();
//Read file
std::ifstream filein2("testoutput.dat");
std::vector<char> datain2;
filein2.seekg(0,std::ios::end);
length = filein2.tellg();
filein2.seekg(0,std::ios::beg);
datain2.resize(length);
filein2.read(&datain2[0], datain2.size());
filein2.close();
//Write data
std::ofstream fileout("testoutput2.txt");
fileout.write(&datain2[0], datain2.size());
fileout.close();
}
It's working fine on my side. I ran your program with VC++ 6.0 and checked the output in Notepad and MS Word. Can you specify the name of the editor where you are facing the problem?
You can't read Unicode text into a std::vector<char>. The char data type only works with narrow strings, and my guess is that the text file you're reading in (testinputdata.txt) is saved with either UTF-8 or UTF-16 encoding.
Try using the wchar_t type for your characters, instead. It is specifically designed to work with "wide" (or Unicode) characters.
Thou shalt verify thy input was successful! Although this would sort you out, you should also note that the number of bytes in the file has no direct relationship to the number of characters being read: there can be fewer characters than bytes (think of a Unicode character that takes multiple bytes when encoded as UTF-8) or vice versa (although the latter doesn't happen with any of the Unicode encodings). All you are experiencing is that read() couldn't read as many characters as you'd asked for, but write() happily wrote the junk you gave it.
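Verifying the input could look roughly like this. The function name readExactly is invented for this sketch. It asks the stream how many characters actually arrived (gcount()) and shrinks the vector accordingly, so nothing uninitialized gets written out later.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Tries to read `wanted` chars; shrinks `data` to what was actually read.
// Returns true only if the full amount arrived.
bool readExactly(std::istream& in, std::vector<char>& data, std::size_t wanted) {
    data.resize(wanted);
    if (wanted == 0) return true;
    in.read(data.data(), static_cast<std::streamsize>(wanted));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    data.resize(got);
    return got == wanted;
}

void readExactlyDemo() {
    std::istringstream in("spam\neggs\n");  // 10 chars available
    std::vector<char> data;
    assert(!readExactly(in, data, 12));     // asked for more than there is
    assert(data.size() == 10);              // only the real data is kept
}
```

With a file stream opened in text mode on Windows, the same short read occurs whenever the file contains \r\n pairs, because tellg() reports the on-disk size.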
I want to create a series of files under a "log" directory, with each file named after the time of execution. In each of these files, I want to store some log info for my program, such as the prototype of the function that runs, etc.
Usually I hard-code the name, as in fopen("log/***","a"), which does not work for this purpose. So far I have just written a timestamp function:
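char* timeStamp(char* txt){
    char timestamp[16];
    time_t rawtime = time(0);
    // check for failure before using the time; txt is returned unchanged in that case
    if(rawtime != (time_t)-1) {
        tm *now = localtime(&rawtime);
        // "%y%m%d_%H%M%S" produces 13 characters plus the terminating nul
        if(now != NULL && strftime(timestamp,16,"%y%m%d_%H%M%S",now) != 0)
            strcat(txt,timestamp); // txt must have room for 13 more characters
    }
    return txt;
}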
But I don't know what to do next. Please help me with this!
Declare a char array big enough to hold 16 + "log/" (so 20 characters total) and initialize it to "log/", then use strcat() or something related to add the time string returned by your function to the end of your array. And there you go!
Note how the string addition works: Your char array is 16 characters, which means you can put in 15 characters plus a nul byte. It's important not to forget that. If you need a 16 character string, you need to declare it as char timestamp[17] instead. Note that "log/" is a 4 character string, so it takes up 5 characters (one for the nul byte at the end), but strcat() will overwrite starting at the nul byte at the end, so you'll end up with the right number. Don't count the nul terminator twice, but more importantly, don't forget about it. Debugging that is a much bigger problem.
EDIT: While we're at it, I misread your code. I thought it just returned a string with the time, but it appears that it adds the time to a string passed in. This is probably better than what I thought you were doing. However, if you wanted, you could just make the function do all the work - it puts "log/" in the string before it puts the timestamp. It's not that hard.
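If C++ is an option, letting one function do all the work could be sketched as follows. The name logFileName and the "log/unknown" fallback are assumptions made for this example. std::string takes care of the nul terminator and the buffer size for the final name.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: builds "log/<yymmdd_HHMMSS>" for the given time.
std::string logFileName(std::time_t when) {
    char stamp[16];  // "%y%m%d_%H%M%S" yields 13 chars plus the nul byte
    const std::tm* local = std::localtime(&when);
    if (local == nullptr ||
        std::strftime(stamp, sizeof stamp, "%y%m%d_%H%M%S", local) == 0)
        return "log/unknown";  // assumed fallback, pick whatever suits you
    return std::string("log/") + stamp;
}

void logFileNameDemo() {
    const std::string name = logFileName(std::time(nullptr));
    assert(name.size() == 17);  // 4 for "log/" plus 13 for the stamp
}
```

The result can be passed to fopen(name.c_str(), "a"). The "log" directory has to exist already, since fopen does not create directories.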
What about this:
#include <stdio.h>
#include <time.h>
#define LOGNAME_FORMAT "log/%Y%m%d_%H%M%S"
#define LOGNAME_SIZE 20
FILE *logfile(void)
{
static char name[LOGNAME_SIZE];
time_t now = time(0);
strftime(name, sizeof(name), LOGNAME_FORMAT, localtime(&now));
return fopen(name, "ab");
}
You'd use it like this:
FILE *file = logfile();
// do logging
fclose(file);
Keep in mind that localtime() is not thread-safe!
Steps to create (or write to) a sequential access file in C++:
1. Declare a stream variable name:
ofstream fout; //each file has its own stream buffer
ofstream is short for output file stream
fout is the stream variable name
(and may be any legal C++ variable name.)
Naming the stream variable "fout" is helpful in remembering
that the information is going "out" to the file.
2. Open the file:
fout.open("scores.dat", ios::out);
fout is the stream variable name previously declared
"scores.dat" is the name of the file
ios::out is the stream operation mode
(your compiler may not require that you specify
the stream operation mode.)
3. Write data to the file:
fout<<grade<<endl;
fout<<"Mr";
The data must be separated with space characters or end-of-line characters (newlines), or the data will run together in the file and be unreadable. Try to save the data to the file in the same manner that you would display it on the screen.
If the <iomanip> header is included, you will be able to use familiar formatting commands with file output.
fout<<setprecision(2);
fout<<setw(10)<<3.14159;
4. Close the file:
fout.close( );
Closing the file writes any data remaining in the buffer to the file, releases the file from the program, and updates the file directory to reflect the file's new size. As soon as your program is finished accessing the file, the file should be closed. Most systems close any data files when a program terminates. Should data remain in the buffer when the program terminates, you may lose that data. Don't take the chance: close the file!
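Put together, the four steps above could look like the sketch below. The function name writeScores and the sample values are made up for illustration. std::fixed is added so that setprecision(2) means two digits after the decimal point.

```cpp
#include <cassert>
#include <fstream>
#include <iomanip>
#include <string>

// Hypothetical helper that walks through the four steps for one grade.
bool writeScores(const std::string& filename, double grade) {
    std::ofstream fout;                  // 1. declare the stream variable
    fout.open(filename, std::ios::out);  // 2. open the file
    if (!fout) return false;             //    opening can fail, so check
    fout << std::fixed << std::setprecision(2);
    fout << grade << '\n';               // 3. write data, one item per line
    fout << "Mr" << '\n';
    fout.close();                        // 4. close the file (flushes the buffer)
    return !fout.fail();
}

void writeScoresDemo() {
    assert(writeScores("scores.dat", 91.5));
}
```

The '\n' after each item keeps the values from running together in the file.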
Sounds like you have mostly solved it already - to create a file like you describe:
char filename[256] = "log/";
timeStamp( filename );
f = fopen( filename, "a" );
Or do you wish to do something more?