I have a large document that has two pieces. The first is a header, which uses standard characters and ends with [END]. The second part is in binary, and looks something like: NUL DLE NUL DC1 NUL. I am attempting to read in this document using an ifstream. My code is:
std::string filename = "file.txt";
std::ifstream originalFile;
originalFile.open(filename,std::ios::binary);
std::streampos fsize = 0;
fsize = originalFile.tellg();
originalFile.seekg(0,std::ios::end);
fsize = originalFile.tellg() - fsize;
char * buffer = new char [int(fsize)];
originalFile.seekg(0,std::ios::beg);
originalFile.read(buffer,fsize);
std::cout << fsize << std::endl;
std::cout << buffer << std::endl;
When I run it, the program outputs the entire header of my file, and then ends. It does not access or print any of the binary data. Is this the right command to be using? If not, is there something else I can try?
Your dump of the file data (which presumably really looks like std::cout << buffer << std::endl;) stops when it hits the first NUL character, which operator<< treats as the end of a C-style string. The data has been read; it just isn't all being printed.
Preface: I am an inexperienced coder, so it's probably an obvious error. Also, pretty much all of this code is stolen and slapped together, so I claim no ownership of it.
System: I am using windows 10 64 bit. I write my code in Notepad++ and compile with MinGW G++.
What I'm trying to do: I am trying to read an entire file (BMP format) into a variable and return a pointer to that variable as the return of a function.
What's happening: The variable is only storing the first char of the file.
char* raw_data(std::string filename){
//100% non-stolen
std::ifstream is (filename, std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
std::cout << is.tellg() << "\n";
char * buffer = new char [length];
std::cout << "Reading " << length << " characters... \n";
// read data as a block:
is.read (buffer,length);
std::cout << "\n\n" << *buffer << "\n\n";
if (is)
{std::cout << "all characters read successfully.";}
else
{std::cout << "error: only " << is.gcount() << " could be read";}
is.close();
// ...buffer contains the entire file...
//101% non-stolen
return {buffer};
}
return {};
}
The code calling the function is
char * image_data = new char [image_size];
image_data = raw_data("Bitmap.bmp");
This compiles fine and the EXE outputs
0
Reading 2665949 characters...
B
all characters read successfully.
The file Bitmap.bmp starts:
BM¶ƒ 6 ( € ‰ €ƒ Δ Δ ¨δό¨δό¨δό¨
As you can see, the variable buffer only stores the first char of Bitmap.bmp (if I change the 1st char it also changes)
Any help would be appreciated.
Thank you for your time.
std::cout << "\n\n" << *buffer << "\n\n";
buffer is a char*, so by dereferencing it you get a single char, which in your case is B. If you want to output the whole data that you read, just don't dereference the pointer; in C/C++, char* gets special treatment when outputting with std::cout, printf and such: it is printed as a C-string, up to the first null character.
std::cout << "\n\n" << buffer << "\n\n";
Keep in mind that, by convention, C-strings held in a char* should be null-terminated. Yours is not, and the caller of your function has no effective way to check how long it is: that information is lost, because functions like strlen also expect the C-string to be null-terminated. You should look at std::vector<char> or std::string for holding such data, as they carry the size with them and clean up after themselves.
I am trying to write the string s.substr(space_pos) to a file, or save it in a vector, as fast as possible. I tried writing it to a file with ofstream and printing it with cout, but both take a long time. The size of the text file is 130 MB.
This is the code:
fstream f(legitfiles.c_str(), fstream::in );
string s;
while(getline(f, s)){
size_t space_pos = s.rfind(" ") + 1;
cout << s.substr(space_pos) << endl;
ofstream results("results.c_str()");
results << s.substr(space_pos) << endl;
results.close();
}
cout << s << endl;
f.close();
Is there a way to write or print the string in a faster way?
Uncouple the C++ stream from the C stream:
std::ios_base::sync_with_stdio(false);
Remove the coupling between cin and cout
std::cin.tie(NULL);
Now don't use std::endl: it needlessly flushes the stream buffer after every line, and flushing is expensive. Use the newline escape character \n instead and leave the buffer flushing to the stream.
Also, don't build an extra string you don't need. Use a std::string_view (C++17), which avoids the copy:
s.substr(space_pos)
//replace with:
std::string_view view(s);
view.substr(space_pos);
If your compiler predates C++17, just use a C-string pointer into the existing string:
s.data() + space_pos
You are duplicating the substring. I suggest creating a temporary:
ofstream results("results.c_str()");
while(getline(f, s)){
size_t space_pos = s.rfind(" ") + 1;
const std::string sub_string(s.substr(space_pos));
cout << sub_string << "\n";
results << sub_string << "\n";
}
results.close();
You'll need to profile to see if the next code fragment is faster:
while(getline(f, s))
{
static const char newline[] = "\n";
size_t space_pos = s.rfind(" ") + 1;
const std::string sub_string(s.substr(space_pos));
const size_t length(sub_string.length());
cout.write(sub_string.c_str(), length);
cout.write(newline, 1);
results.write(sub_string.c_str(), length);
results.write(newline, 1);
}
The idea behind the 2nd fragment is that you bypass the formatting process and write the contents of the string directly to the output stream. You'll need to measure both fragments to see which is faster: start a clock, run each fragment for at least 1E6 iterations, stop the clock, and take the average.
If you want to speed up the file writing, remove the writing to std::cout.
Edit 1: multiple threads
You may be able to get some more efficiency out of this by using multiple threads: "Read Thread", "Processing Thread" and "Writing Thread".
The "Read Thread" reads the lines and appends to a buffer. Start this one first.
After a delay, the "Processing Thread" performs the substr method on all the strings.
After N strings have been processed, the "Writing Thread" starts and writes the substr strings to the file.
This technique uses double buffering. One thread reads and places data into the buffer. When the buffer is full, the Processing Thread should start processing and placing results into a second buffer. When the 2nd buffer is full, the Writing Thread starts and writes the buffer to the results file. There should be at least 2 "read" buffers and 2 "write" buffers. The amount and size of the buffers should be adjusted to get the best performance from your program.
//Edit: Please note that this answer solves a different problem than that stated in the question. It will copy each line skipping everything from the beginning of the line up to and including the first whitespace.
It might be faster to read the first word of a line and throw it away before getline()ing the rest of it, instead of using std::string::find() and std::string::substr(). Also, you should avoid opening and closing the output file on every iteration.
#include <string>
#include <fstream>
int main()
{
std::ifstream is{ "input" };
std::ofstream os{ "output" };
std::string str;
str.reserve(1024); // change 1024 to your estimated line length.
while (is.peek() == ' ' || is >> str, std::getline(is, str)) {
str += '\n'; // save an additional call to operator<<(char)
os << str.data() + 1; // +1 ... skip the space
// os.write(str.data() + 1, str.length() - 1); // might be even faster
}
}
I am having difficulty reading a line of text from a file into a char array. Assume that I have a text file named "sample.txt" that contains only a few words per line. Here is my code:
char buffer[100];
ifstream file;
file.open("sample.txt");
file >> buffer;
This stops reading after the first space. I also tried:
file.getline(buffer,100);
but this does not give me the correct text. After the text, some random symbols appear in the remainder of the array.
Any help would be deeply appreciated!
Edit:
This char array is a temporary array. I'm passing text to this array, and then passing it on to a class data member.
For every and any input operation, you must check the return value; otherwise you cannot know whether the operation succeeded and what it did.
If you are reading into a fixed buffer, you need to check the stream object and the count of extracted characters:
file.getline(buffer, sizeof buffer);
auto n = file.gcount();
if (file) {
std::cout << "Read line with " << n << " characters: '";
std::copy_n(buffer, n, std::ostream_iterator<char>(std::cout));
std::cout << "'\n";
} else if (n > 0) {
std::cout << "Read incomplete line with prefix '";
std::copy_n(buffer, n, std::ostream_iterator<char>(std::cout));
std::cout << "'.\n";
file.clear();
} else {
std::cout << "Did not read any lines.\n";
}
Note that for a complete line the extracted count (file.gcount()) includes the newline delimiter, which basic_istream::getline extracts but does not store; it writes a null terminator into the output buffer in its place. (So the maximal length of a line that can be read completely is sizeof(buffer) - 1.)
Alternatively, you can read into a dynamic string. This means that memory will be automatically allocated to hold each complete line, but it's a lot easier to reason about:
for (std::string line; std::getline(file, line); ) {
std::cout << "Read one line: '" << line << "'\n";
}
Here we only check the success of the input operation, and we do this inside the loop condition. The number of characters in the line (the newline delimiter is extracted but not stored) is precisely line.size() after the successful read.
It would be much easier to just use a std::string object to get the line content and later convert it to a char array:
std::string str;
std::getline(file,str);
const char* c = str.c_str();
Why is this happening
Try printing out the array before doing getline. You'll notice that the zany symbols populate the whole array.
getline() will plop characters into your array from the input stream until either 99 characters have been stored (leaving room for the terminator) or the delimiting character is reached, and then writes a null terminator. If the delimiting character is reached early, the rest of your length-100 array is left untouched, so it still holds whatever uninitialized garbage was there before.
If you want to interact with streams in this way, (using a char array instead of a string) you'll have to decide how you want the remainder of the array to be initialized if you'd like to avoid nonsense.
Using fstreams I'm attempting to read single characters from a specified location in a file and append them onto a string. For some reason, reading in these characters returns special characters. I've tried numerous things, but the more curious thing that I found while debugging was that changing the initial value of the char temp; will cause the whole string to change to that value.
int Class::numbers(int number, string& buffer) {
char temp;
if (number < 0 || buffer.length() > size) {
exit(0);
}
string fname = name + ".txt";
int start = number * size;
ifstream readin(fname.c_str());
readin.open(fname.c_str(), ios::in);
readin.seekg(start);
for (int i = 0; i < size; ++i) {
readin.get(temp);
buffer += temp;
}
cout << buffer << endl;
readin.close();
return 0;
}
Here is an example screenshot of the special characters being outputted: http://i.imgur.com/6HCI7TT.png
Could the issue be where I'm starting using seekg? It seems to start in the appropriate position. Another thing I've considered is that maybe I'm reading some invalid place into the stream and it's just giving me junk characters from memory.
Any thoughts?
WORKING SOLUTION:
int Class::numbers(int number, string& buffer) {
char temp;
if (number < 0 || buffer.length() > size) {
exit(0);
}
string fname = name + ".txt";
int start = number * size;
ifstream readin(fname.c_str()); // the constructor already opens the file; no second open() call
readin.seekg(start);
for (int i = 0; i < size; ++i) {
readin.get(temp);
buffer += temp;
}
cout << buffer << endl;
readin.close();
return 0;
}
Here is the working solution. In my program I had already opened this file, so opening it a second time was likely what caused the issue. I will do some further testing on this in my own time.
For characters with a numeric value greater than 127 (that is, outside the ASCII range), the actual character rendered on screen depends on the code page of the system you are currently using.
What is likely happening is that you are not getting a single "character" as you think you are.
First, to debug this, use your existing code to just open and print out an entire text file. Is your program capable of doing this? If not, it's likely that the "text" file you are opening isn't ASCII but UTF-16 or some other encoding. That means when you read a "character" (most likely 8 bits), you're just reading half of a 16-bit "wide character", and the result is meaningless to you.
For example, the gedit application will automatically render "Hello World" on screen as I'd expect, regardless of character encoding. However, in a hex editor, a UTF8 encoded file looks like:
UTF8 Raw text:
0000000: 4865 6c6c 6f20 776f 726c 642e 0a Hello world..
While UTF16 looks like:
0000000: fffe 4800 6500 6c00 6c00 6f00 2000 7700 ..H.e.l.l.o. .w.
0000010: 6f00 7200 6c00 6400 2e00 0a00 o.r.l.d.....
This is what your program sees. C/C++ narrow streams hand you raw bytes and do no encoding conversion for you. If you want to handle other encodings, it's up to your program to accommodate them manually or by using a third-party library.
Also, you aren't testing to see if you've exceeded the length of the file. You could just be grabbing random garbage.
Using a simple text file just containing the string "Hello World", can your program do this:
Code Listing
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
#include <string.h>
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
// allocate memory:
char * buffer = new char [length];
// read data as a block:
is.read (buffer,length);
// print content:
std::cout.write (buffer,length);
std::cout << std::endl;
// repeat at arbitrary locations:
for (int i = 0; i < length; i++ )
{
memset(buffer, 0x00, length);
is.seekg (i, is.beg);
is.read(buffer, length-i);
// print content:
std::cout.write (buffer,length);
std::cout << std::endl;
}
is.close();
delete[] buffer;
}
return 0;
}
Sample Output
Hello World
Hello World
ello World
llo World
lo World
o World
World
World
orld
rld
ld
d
I'm currently trying to read the contents of a file into a char array.
For instance, I have the following text in a file, 42 bytes:
{
type: "Backup",
name: "BackupJob"
}
This file was created in Windows, and I'm using Visual Studio C++, so there are no OS compatibility issues.
However, executing the following code, at the completion of the for loop, I get Index: 39, with no 13 displayed prior to the 10's.
// Create the file stream and open the file for reading
ifstream fs;
fs.open("task.txt", ifstream::in);
int index = 0;
int ch = fs.get();
while (fs.good()) {
cout << ch << endl;
ch = fs.get();
index++;
}
cout << "----------------------------";
cout << "Index: " << index << endl;
return;
However, when I create a char array the length of the file, reading the file size as shown below counts the 3 additional CR chars, so that length equals 42, and the end of the array is left with dodgy bytes.
// Create the file stream and open the file for reading
ifstream fs;
fs.seekg(0, std::ios::end);
length = fs.tellg();
fs.seekg(0, std::ios::beg);
// Create the buffer to read the file
char* buffer = new char[length];
fs.read(buffer, length);
buffer[length] = '\0';
// Close the stream
fs.close();
Using a hex viewer, I have confirmed that file does indeed contain the CRLF (13 10) bytes in the file.
There seems to be a disparity with getting the end of the file, and what the get() and read() methods actually return.
Could anyone please help with this?
Cheers,
Justin
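You should open your file in binary mode. In text mode on Windows, each CRLF pair is translated to a single LF as you read, so get() and read() deliver fewer characters than the file size reported by tellg(). Binary mode stops read dropping the CR.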
fs.open("task.txt", ifstream::in|ifstream::binary);