I need to use a string instead of a char so I can read several characters at once.
I want the first loop to take data from the file, and a second loop to walk through the string up to \0,
since in the future I want to receive 2 or 4 characters at a time.
Can I implement this so that .get works with a string?
fstream fs("file.txt", fstream::in | fstream::out | ios::binary);
for (string i; fs.get(i);) {
cout << i;
}
istream::get with a C-string reads up to n characters or until a delimiter, newline by default (very similar to istream::getline, except that get leaves the delimiter in the stream, while getline consumes it).
To read fixed-length blocks regardless of content there is istream::read, and istream::gcount tells you how much was actually read. Unfortunately, neither has an overload for std::string specifically, the main downside being that you have to size (and thus initialize) the string first.
Putting them together you can get something like:
std::string buffer;
std::fstream is("file.txt", std::ios::in | std::ios::binary);
while (is)
{
    buffer.resize(128); // Whatever size you want
    is.read(buffer.data(), buffer.size()); // Read into buffer; note: *does not null-terminate* (non-const data() is C++17)
    //is.read(&buffer[0], buffer.size()); // Older C++
    buffer.resize(is.gcount()); // Actual amount read. Might be less than requested, or even zero at the end or on a read failure.
    std::cout << "Read " << buffer.size() << " characters." << std::endl;
    std::cout << buffer << std::endl;
}
For getline specifically, there is std::getline which handles std::string for you:
std::string buffer;
std::fstream is("file.txt", std::ios::in | std::ios::binary);
while (std::getline(is, buffer))
{
    std::cout << "Line: " << buffer << std::endl;
}
Note that both get and getline can use some other delimiter, so it doesn't have to be "lines".
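For example, a minimal sketch reading comma-separated fields with std::getline and a custom delimiter:
std::fstream is("file.txt", std::ios::in | std::ios::binary);
for (std::string field; std::getline(is, field, ','); )
{
    std::cout << "Field: " << field << std::endl;
}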
I have a binary file compressed in gz which I wish to stream using boost::iostreams. After searching the web for the past few hours, I found a nice code snippet that does what I want, except for std::getline:
try
{
    std::ifstream file("../data.txt.gz", std::ios_base::in | std::ios_base::binary);
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::gzip_decompressor());
    in.push(file);
    std::vector<std::byte> buffer;
    for (std::string str; std::getline(in, str); )
    {
        std::cout << "str length: " << str.length() << '\n';
        for (auto c : str) {
            buffer.push_back(std::byte(c));
        }
        std::cout << "buffer size: " << buffer.size() << '\n';
        // process buffer
        // ...
        // ...
    }
}
catch (const boost::iostreams::gzip_error& e) {
    std::cout << e.what() << '\n';
}
I want to read the file, store it into some intermediary buffer, and fill up the buffer as I stream the file. However, std::getline uses the \n delimiter and does not include the delimiter in the output string.
Is there a way I could read, for instance, 2048 bytes of data at a time?
Uncompressing the gzip stream the way you want isn't exactly straightforward. One option is using boost::iostreams::copy to uncompress the gzip stream into the vector, but since you want to decompress the stream in chunks (2k mentioned in your post) that may not be an option.
Now normally with an input stream it's as simple as calling the read() function on the stream, specifying the buffer and the number of bytes to read, and then calling gcount() to determine how many bytes were actually read. Unfortunately it seems that there is a bug in either filtering_istream or gzip_decompressor, or possibly that gcount is not supported (it should be), as it always seems to return the number of bytes requested instead of the number actually read. As you might imagine this can cause problems when reading the last few bytes of the file unless you know ahead of time how many bytes to read.
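For reference, given an istream in, the usual chunked-read pattern looks like this; it is the gcount() step that the gzip-filtered stream reportedly gets wrong:
std::vector<char> chunk(2048);
while (in.read(chunk.data(), chunk.size()) || in.gcount() > 0)
{
    std::streamsize got = in.gcount(); // bytes actually read this pass
    // process chunk[0 .. got)
}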
Fortunately the size of the uncompressed data is stored at the end of the gzip file which means we can account for that but we just have to work a little bit harder in the decompression loop.
Below is the code I came up with to handle uncompressing the stream in the way you would like. It creates two vectors: one for decompressing each 2k chunk and one for the final buffer. It's quite basic and I haven't done anything to really optimize memory usage on the vectors, but if that's an issue I suggest switching to a single vector, resizing it to the length of the uncompressed data, and calling read with an offset into the vector data for the 2k chunk being read (a sketch of that variant follows the code).
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    namespace io = boost::iostreams;
    std::ifstream file("../data.txt.gz", std::ios_base::in | std::ios_base::binary);

    // Get the uncompressed size from the gzip trailer (ISIZE is stored
    // little endian; this assumes a little-endian machine)
    uint32_t dataLeft;
    file.seekg(-4, std::ios_base::end);
    file.read(reinterpret_cast<char*>(&dataLeft), sizeof(dataLeft));
    file.seekg(0);

    // Set up the gzip stream
    io::filtering_istream in;
    in.push(io::gzip_decompressor());
    in.push(file);

    std::vector<std::byte> buffer, tmp(2048);
    for (auto toRead(std::min<std::size_t>(tmp.size(), dataLeft));
         dataLeft && in.read(reinterpret_cast<char*>(tmp.data()), toRead);
         dataLeft -= toRead, toRead = std::min<std::size_t>(tmp.size(), dataLeft))
    {
        tmp.resize(toRead);
        buffer.insert(buffer.end(), tmp.begin(), tmp.end());
        std::cout << "buffer size: " << buffer.size() << '\n';
    }
}
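And a sketch of the single-vector variant mentioned above (untested, same assumptions as the code above):
std::vector<std::byte> buffer(dataLeft); // dataLeft read from the gzip trailer as before
for (std::size_t offset = 0; offset < buffer.size(); )
{
    const std::size_t toRead = std::min<std::size_t>(2048, buffer.size() - offset);
    if (!in.read(reinterpret_cast<char*>(buffer.data() + offset), toRead))
        break; // short read: stop rather than loop forever
    offset += toRead;
    std::cout << "buffer filled: " << offset << '\n';
}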
I was trying to write the string s.substr(space_pos) to a file, or save it in a vector, as fast as possible. I tried writing it to a file with ofstream and printing it with cout, but both take a long time. The size of the text file is 130 MB.
This is the code:
fstream f(legitfiles.c_str(), fstream::in);
string s;
while (getline(f, s)) {
    size_t space_pos = s.rfind(" ") + 1;
    cout << s.substr(space_pos) << endl;
    ofstream results("results.c_str()");
    results << s.substr(space_pos) << endl;
    results.close();
}
cout << s << endl;
f.close();
Is there a way to write or print the string in a faster way?
Uncouple the C++ stream from the C stream:
std::ios_base::sync_with_stdio(false);
Remove the coupling between cin and cout
std::cin.tie(NULL);
Now, don't use std::endl: it needlessly flushes the stream buffer after every line, and flushing is expensive. Use the newline escape character \n instead and leave buffer flushing to the stream.
Also, don't build an extra string you don't need. Use a std::string_view (which avoids the copy):
s.substr(space_pos)
//replace with:
std::string_view view(s);
view.substr(space_pos);
If you don't have a modern compiler, just use a C-string:
s.c_str() + space_pos
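Putting the earlier suggestions together, a minimal sketch might look like this (C++17 for string_view; the file names are placeholders for the ones in the question):
#include <fstream>
#include <iostream>
#include <string>
#include <string_view>

int main()
{
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);

    std::ifstream f("legitfile.txt");     // placeholder input name
    std::ofstream results("results.txt"); // placeholder output name

    for (std::string s; std::getline(f, s); )
    {
        std::string_view view(s);
        view = view.substr(view.rfind(' ') + 1); // a view, no allocation or copy
        results << view << '\n';                 // '\n', not std::endl
    }
}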
You are duplicating the substring. I suggest creating a temporary, and opening the results file once, outside the loop (results_filename below is a stand-in for the intended output path; note that the original "results.c_str()" in quotes creates a file literally named results.c_str()):
ofstream results(results_filename.c_str());
while (getline(f, s)) {
    size_t space_pos = s.rfind(" ") + 1;
    const std::string sub_string(s.substr(space_pos));
    cout << sub_string << "\n";
    results << sub_string << "\n";
}
results.close();
You'll need to profile to see if the next code fragment is faster:
while (getline(f, s))
{
    static const char newline[] = "\n";
    size_t space_pos = s.rfind(" ") + 1;
    const std::string sub_string(s.substr(space_pos));
    const size_t length(sub_string.length());
    cout.write(sub_string.c_str(), length);
    cout.write(newline, 1);
    results.write(sub_string.c_str(), length);
    results.write(newline, 1);
}
The idea behind the second fragment is that you bypass the formatting layer and write the contents of the string directly to the output stream. You'll need to measure both fragments to see which is faster: start a clock, run at least 1e6 iterations, stop the clock, and take the average.
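For instance, a minimal timing harness might look like this (a sketch; the loop body and iteration count are placeholders):
#include <chrono>
#include <iostream>

int main()
{
    const long iterations = 1000000;
    const auto start = std::chrono::steady_clock::now();

    for (long i = 0; i < iterations; ++i)
    {
        // ... fragment under test ...
    }

    const auto stop = std::chrono::steady_clock::now();
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "average: " << us.count() / double(iterations) << " us per iteration\n";
}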
If you want to speed up the file writing, remove the writing to std::cout.
Edit 1: multiple threads
You may be able to get some more efficiency out of this by using multiple threads: a read thread, a processing thread, and a writing thread.
The read thread reads the lines and appends them to a buffer. Start this one first.
After a delay, the processing thread performs the substr operation on all the strings.
After about N strings have been processed, the writing thread starts and writes the substrings to the file.
This technique uses double buffering. One thread reads and places data into a buffer. When the buffer is full, the processing thread starts processing it and placing results into a second buffer. When the second buffer is full, the writing thread writes it to the results file. There should be at least 2 "read" buffers and 2 "write" buffers. The number and size of the buffers should be adjusted to get the best performance from your program. A heavily simplified sketch of the idea follows.
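Here, reading and processing are merged into one thread, and a single shared queue stands in for real double buffering (file names are placeholders):
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main()
{
    std::queue<std::string> ready; // processed lines waiting to be written
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Writing thread: drains the queue and writes to the results file.
    std::thread writer([&] {
        std::ofstream results("results.txt");
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !ready.empty() || done; });
            if (ready.empty() && done)
                break;
            std::string line = std::move(ready.front());
            ready.pop();
            lock.unlock();
            results << line << '\n';
        }
    });

    // Read + process: extract the substring and hand it to the writer.
    std::ifstream f("input.txt");
    for (std::string s; std::getline(f, s); )
    {
        std::string sub = s.substr(s.rfind(' ') + 1);
        {
            std::lock_guard<std::mutex> lock(m);
            ready.push(std::move(sub));
        }
        cv.notify_one();
    }

    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    writer.join();
}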
Edit: Please note that this answer solves a different problem than the one stated in the question. It will copy each line, skipping everything from the beginning of the line up to and including the first whitespace.
It might be faster to read the first word of a line and throw it away before getline()ing the rest, instead of using string::rfind() and string::substr(). Also, you should avoid opening and closing the output file on every iteration.
#include <string>
#include <fstream>

int main()
{
    std::ifstream is{ "input" };
    std::ofstream os{ "output" };
    std::string str;
    str.reserve(1024); // change 1024 to your estimated line length.
    // If the line starts with a space there is no first word to discard;
    // otherwise is >> str consumes (and discards) the first word, then
    // getline reads the rest of the line, starting at the space.
    while (is.peek() == ' ' || is >> str, std::getline(is, str)) {
        str += '\n'; // save an additional call to operator<<(char)
        os << str.data() + 1; // +1 ... skip the space
        // os.write(str.data() + 1, str.length() - 1); // might be even faster
    }
}
I have the following code:
std::string line;
std::ifstream myfile ("text.txt");
if (myfile.is_open())
{
    while ( myfile.good() )
    {
        getline (myfile, line);
        std::cout << line << std::endl;
    }
    myfile.close();
}
Is there a way to do this using char* instead of string?
Yes, if you really insist. There's a version of getline that's a member of std::istream that will do it:
char buffer[1024];
std::ifstream myfile("text.txt");
while (myfile.getline(buffer, sizeof(buffer)))
    std::cout << buffer << "\n";
myfile.close();
Note, however, that most C++ programmers would consider this obsolescent at best. Oh, and for the record, the loop in your question isn't really correct either. Using string, you'd typically want something like:
std::string line;
std::ifstream myfile("text.txt");
while (std::getline(myfile, line))
    std::cout << line << "\n";
myfile.close();
or, you could use the line proxy from one of my previous answers, in which case it becomes simpler still:
std::copy(std::istream_iterator<line>(myfile),
          std::istream_iterator<line>(),
          std::ostream_iterator<std::string>(std::cout, "\n"));
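The line proxy referenced there isn't shown in this excerpt; a common formulation looks roughly like this (a sketch, not necessarily the exact class from that earlier answer):
#include <istream>
#include <string>

class line {
    std::string data;
public:
    friend std::istream& operator>>(std::istream& is, line& l) {
        return std::getline(is, l.data); // a line reads itself via getline
    }
    operator std::string() const { return data; } // convertible for output
};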
So you're looking for a more "C-like" solution?
#include <cstdio>

#define ENOUGH 1000

int main() {
    char buffer[ENOUGH];
    FILE* f = fopen("text.txt", "r");
    while (true) {
        if (fgets(buffer, ENOUGH, f) == NULL) break;
        fputs(buffer, stdout); // fgets keeps the '\n'; puts() would add a second one
    }
    fclose(f);
    return 0;
}
...plus some check of whether the file was opened correctly. In this case, you use fgets() on the file f, reading into the char* buffer. However, buffer has only ENOUGH space allocated, and this limit is also an important parameter to the fgets() function. It will stop reading a line after ENOUGH - 1 characters, so you should make sure the ENOUGH constant is large enough.
But if you didn't mean to solve this in a "C-like" way and are still going to use <iostream>, then you probably just want to know that the c_str() method of std::string returns a const char* representation of that std::string. For example:
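A minimal sketch mixing the two:
#include <cstdio>
#include <fstream>
#include <string>

int main()
{
    std::ifstream myfile("text.txt");
    std::string line;
    while (std::getline(myfile, line))
        std::puts(line.c_str()); // c_str() yields a NUL-terminated const char*
}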
I want to put each byte into a char array and rewrite the text file, removing the first 100,000 characters.
int fs = 0;
ifstream nm, nm1;
nm1.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
if (nm1.is_open())
{
    nm1.seekg(0, ios::end);
    fs = nm1.tellg();
}
nm1.close();

char ss[5000000];

nm.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
nm.read(ss, fs-1);
nm.close();

ofstream om;
om.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
for (int i = 100000; i < fs-1; i++) {
    om << ss[i];
}
om.close();
The problem is that I can't make the character array 5 million in size. I also tried using a vector:
vector<char> ss(5000000);
int w = 0;
ifstream in2("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", ios::binary);
unsigned char c2;
while (in2.read((char *)&c2, 1))
{
    in2 >> ss[w];
    w++;
}
Here the final value of w is almost half of fs, and a lot of characters are missing.
How do I do this correctly?
In most implementations, char ss[5000000] tries to allocate on the stack, and the stack is limited in size compared to overall memory. You can often allocate larger arrays on the heap than on the stack, like this:
char *ss = new char [5000000];
// Use ss as usual
delete[] ss; // Do not forget to delete
Note that if the file size fs is larger than 5000000, you will write past the end of the buffer. You should limit the amount of data that you read:
nm.read(ss,min(5000000,fs-1));
This part is not correct:
while (in2.read((char *)&c2, 1))
{
    in2 >> ss[w];
    w++;
}
because you first read one character into c2 and, if that succeeds, read another character into ss[w] (and operator>> skips whitespace by default, dropping those characters entirely).
I'm not at all surprised that you lose about half the characters here!
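A minimal fix keeps the byte that read() already fetched instead of extracting a second one:
while (in2.read((char *)&c2, 1))
{
    ss[w] = c2; // store the byte read() fetched; no second extraction
    w++;
}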
The best way to solve your problem is to use the facilities of the standard library. That way, you also don't have to care about buffer overflows.
The following code is untested.
std::fstream file("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", std::ios_base::in);
if (!file)
{
std::cerr << "could not open file C:\\Dev-Cpp\\DCS\\Decom\\a.txt for reading\n";
exit(1);
}
std::vector<char> ss; // do *not* give a size here
ss.reserve(5000000); // *expected* size
// if the file is too large, the capacity will automatically be extended
std::copy(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>(),
std::back_inserter(ss));
file.close();
file.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", std::ios_base::out | std::ios_base::trunc);
if (!file)
{
std::cerr << "could not open C:\\Dev-Cpp\\DCS\\Decom\\a.txt for writing\n";
exit(1);
}
if (ss.size() > 100000) // only if the file actually contained more than 100000 characters
std::copy(ss.begin()+100000, ss.end(), std::ostreambuf_iterator<char>(file));
file.close();
I'm currently using std::ofstream as follows:
std::ofstream outFile;
outFile.open(output_file);
Then I attempt to pass a std::stringstream object to outFile as follows:
GetHolesResults(..., std::ofstream &outFile)
{
    float x = 1234;
    std::stringstream ss;
    ss << x << std::endl;
    outFile << ss;
}
Now my outFile contains nothing but garbage: "0012E708" repeated all over.
In GetHolesResults I can write
outFile << "Foo" << std:endl;
and it will output correctly in outFile.
Any suggestion on what I'm doing wrong?
You can do this, which doesn't require creating an intermediate string. It makes the output stream read out the contents of the stream on the right-hand side (usable with any streams).
outFile << ss.rdbuf();
If you are using std::ostringstream and wondering why nothing gets written with ss.rdbuf(), use the .str() function instead:
outFile << oStream.str();
When passing a stringstream's rdbuf to a stream, newlines are not translated. The input text can contain \n, so find-and-replace won't work. The old code wrote to an fstream, and switching it to a stringstream loses the endl translation.
I'd rather write ss.str() instead of ss.rdbuf() (and use a stringstream).
If you use ss.rdbuf(), the format flags of outFile (such as the field width) will be ignored for that insertion, rendering your code non-reusable.
I.e., the caller of GetHolesResults(..., std::ofstream &outFile) might want to write something like this to display the result in a table:
outFile << std::setw(12) << GetHolesResults ...
...and wonder why the width is ignored.
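A small demonstration of that caveat (an assumed example, not from the original post):
#include <iomanip>
#include <iostream>
#include <sstream>

int main()
{
    std::stringstream ss;
    ss << 1234.0f;

    std::cout << '[' << std::setw(12) << ss.rdbuf() << "]\n"; // prints [1234] - width ignored
    std::cout << '[' << std::setw(12) << ss.str() << "]\n";   // prints [        1234]
}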