Detect and Read Entire UTF-8 File using C++11? - c++

I know of this traditional way:
#include <fstream>
#include <string>
#include <cerrno>
#include <iostream>
int main()
{
std::ifstream in("file.txt", std::ios::in | std::ios::binary);
if (in)
{
std::string contents;
in.seekg(0, std::ios::end);
contents.resize((size_t)in.tellg()); // Allocate buffer
in.seekg(0, std::ios::beg);
// Read the file
in.read(&contents[0], contents.size());
// ... do something ..
// Close
in.close();
}
else
throw(errno);
}
In order to detect whether it's an ANSI or a UTF-8 file, do I need to read the first three bytes and check whether they match the BOM, or is there a better C++11 way using codecvt? How can I make the codecvt approach work for the entire file?
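A BOM check alone cannot settle the question. The BOM is optional in UTF-8 and many UTF-8 files have none, and a pure ASCII file is valid both as "ANSI" and as UTF-8. A common heuristic is to look for the BOM and, if it is absent, check whether the bytes form well-formed UTF-8, since non-ASCII text in a Windows code page rarely does. Note also that std::codecvt_utf8 was deprecated in C++17. The following is a minimal sketch under those assumptions; the names read_all_bytes, has_utf8_bom and is_valid_utf8 are my own, not a library API.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Reads the whole file as raw bytes (binary mode: no newline translation).
std::string read_all_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// True if the data starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& data)
{
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Checks that every byte sequence is well-formed UTF-8: rejects stray
// continuation bytes, truncated sequences, overlong forms, surrogates
// (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& data)
{
    std::size_t i = 0;
    const std::size_t n = data.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }                          // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;                  // continuation or invalid lead byte
        if (i + len > n) return false;      // sequence cut off at end of data
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;   // must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

For example, after std::string data = read_all_bytes("file.txt");, has_utf8_bom(data) tells you whether to skip the first three bytes, and a false result from is_valid_utf8(data) strongly suggests a legacy 8-bit encoding.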

Related

How to find the file pointer position in c++?

I am currently working with files in c++ and I want to read a file after a certain position. I read online that you can't open a file to read and write simultaneously. Is there a way to return the position of the file pointer at a certain moment and use it to extract the information after it?
What you read was wrong.
#include <fstream>
#include <string>
int main()
{
std::fstream file("test.txt", std::ios::in | std::ios::out | std::ios::binary);
std::string s;
file >> s;
// get current read position
auto read_pos = file.tellg();
// set current write position
file.seekp(read_pos, std::ios::beg);
static const char data[] = "aaa";
// write some data
file.write(data, 3);
}
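To make the answer's point concrete for the reading case: tellg() returns a std::streampos that you can store and later pass back to seekg() to resume reading from exactly that offset. A minimal sketch, assuming the file is opened in binary mode (so offsets are plain byte counts) and using a hypothetical helper name, rest_after_first_word:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the first word, remembers where the read
// position ended up, and later returns everything after that point.
std::string rest_after_first_word(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    std::string first;
    file >> first;                      // consume the first word
    std::streampos pos = file.tellg();  // offset just past that word
    if (pos == std::streampos(-1))      // tellg failed, e.g. extraction hit end of file
        return std::string();
    // ... other reads or seeks could happen here ...
    file.seekg(pos);                    // return to the remembered offset
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```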

Read a binary file into a std::vector<uint16_t> instead of std::vector<char>

I want to read a binary file containing uint16_t values. What I've done so far is:
std::ifstream is;
std::vector<char> rawfilebuffer; /* should be std::vector<uint16_t> */
is.open("uint16_t_file.raw", std::ios::binary);
is.seekg(0, std::ios::end);
size_t filesize=is.tellg();
is.seekg(0, std::ios::beg);
rawfilebuffer.reserve(filesize);
rawfilebuffer.assign(std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>());
Using std::istreambuf_iterator<char> does not work (error: no matching conversion for functional-style cast from 'std::ifstream').
Is it possible to cast istreambuf_iterator to uint16_t?
With C++11, you can use the data() member of std::vector and read the whole file in one call (or in big chunks, if the file is too big).
Something like
#include <cstdint> // for std::uint16_t
#include <vector>
#include <fstream>
#include <iostream>
using myType = std::uint16_t;
int main ()
{
std::ifstream is;
std::vector<myType> rawfilebuffer;
is.open("uint16_t_file.raw", std::ios::binary);
is.seekg(0, std::ios::end);
size_t filesize=is.tellg();
is.seekg(0, std::ios::beg);
rawfilebuffer.resize(filesize/sizeof(myType));
is.read(reinterpret_cast<char *>(rawfilebuffer.data()), filesize); // read() takes a char*
for ( auto const & ui : rawfilebuffer )
std::cout << '[' << ui << ']';
std::cout << '\n';
return 0;
}
Pay attention to the file size. If it's an exact multiple of sizeof(myType), all is well.
Otherwise the vector is too small for the read above, which writes filesize bytes, so you should modify your resize instruction this way:
rawfilebuffer.resize(filesize/sizeof(myType)+(filesize%sizeof(myType)?1U:0U));
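A sketch combining both points, assuming native byte order is what you want: the raw bytes are reinterpreted as-is, so a file written on a machine with the other endianness would need byte swapping. The rounded-up size keeps a trailing odd byte (the zero-initialized element absorbs it), and gcount() confirms the whole file was read. read_u16_file is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a raw file into 16-bit values in the machine's native byte order.
std::vector<std::uint16_t> read_u16_file(const std::string& path)
{
    std::ifstream is(path.c_str(), std::ios::binary);
    if (!is) throw std::runtime_error("cannot open " + path);
    is.seekg(0, std::ios::end);
    const std::streamoff filesize = is.tellg();
    is.seekg(0, std::ios::beg);
    if (filesize < 0) throw std::runtime_error("cannot determine size of " + path);
    const std::size_t bytes = static_cast<std::size_t>(filesize);
    // Round up so a partial final element still fits; new elements are zero.
    std::vector<std::uint16_t> values(
        (bytes + sizeof(std::uint16_t) - 1) / sizeof(std::uint16_t));
    if (bytes > 0)
        is.read(reinterpret_cast<char*>(values.data()),
                static_cast<std::streamsize>(bytes));
    if (static_cast<std::size_t>(is.gcount()) != bytes)
        throw std::runtime_error("short read from " + path);
    return values;
}
```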

Read a file and write its contents to another C++

So I'm writing a function to read a file and put its content into another file. Here's what I've got so far:
void myFile::printWords(string input, string output) {
ifstream file(input.c_str());
ofstream file_out(output.c_str());
string word;
if(!file.is_open())
{
printf("File can't be opened\n");
exit(1);
}
while(file >> word) {
cout<< word << '\n';
}
file.close();
}
The question is: how do I proceed with writing to the output file?
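Keeping the question's word-by-word loop, the missing step is to send each word to file_out instead of cout. A minimal sketch as a free function (copy_words is a hypothetical stand-in for myFile::printWords); it writes one word per line and therefore does not preserve the original spacing:

```cpp
#include <fstream>
#include <string>

// Copies input to output one whitespace-delimited word per line.
// Returns false if either file cannot be opened or a write fails.
bool copy_words(const std::string& input, const std::string& output)
{
    std::ifstream file(input.c_str());
    std::ofstream file_out(output.c_str());
    if (!file.is_open() || !file_out.is_open())
        return false;
    std::string word;
    while (file >> word)
        file_out << word << '\n';   // the line the question sent to cout
    return static_cast<bool>(file_out);
}
```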
You don't quite need iostreams to copy files; you just need raw stream buffers. For example, here's a complete copy program:
#include <algorithm> // for std::copy
#include <cstdlib> // for EXIT_FAILURE
#include <fstream> // for std::filebuf
#include <iterator> // for std::{i,o}streambuf_iterator
int main(int argc, char *argv[])
{
if (argc != 3) { return EXIT_FAILURE; }
std::filebuf infile, outfile;
infile.open(argv[1], std::ios::in | std::ios::binary);
outfile.open(argv[2], std::ios::out | std::ios::binary);
std::copy(std::istreambuf_iterator<char>(&infile), {},
std::ostreambuf_iterator<char>(&outfile));
}
Rather than doing this on a word-by-word basis, which doesn't work well with whitespace, you could (if you really want to use C++) read the whole file into a char[] buffer and write it out in one go:
std::fstream ifile(input.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
std::fstream ofile(output.c_str(), std::ios::out | std::ios::binary);
if (!(ifile.is_open() && ofile.is_open())) { handle_error(); }
size_t size = ifile.tellg();
char* buffer = new char[size];
ifile.seekg(0, std::ios::beg);
ifile.read(buffer, size);
ofile.write(buffer, size);
delete[] buffer; // release the memory allocated with new[]
ifile.close();
ofile.close();
Still, it would make much more sense to use your OS's own file-copy functionality.

Copy data from fstream to stringstream with no buffer?

Is there any way I can transfer data from an fstream (a file) to a stringstream (a stream in memory)?
Currently, I'm using a buffer, but this requires double the memory, because you need to copy the data to a buffer, then copy the buffer to the stringstream, and until you delete the buffer, the data is duplicated in the memory.
std::fstream fWrite(fName,std::ios::binary | std::ios::in | std::ios::out);
fWrite.seekg(0,std::ios::end); //Seek to the end
int fLen = fWrite.tellg(); //Get length of file
fWrite.seekg(0,std::ios::beg); //Seek back to beginning
char* fileBuffer = new char[fLen];
fWrite.read(fileBuffer,fLen);
Write(fileBuffer,fLen); //This writes the buffer to the stringstream
delete[] fileBuffer; // memory from new[] must be freed with delete[]
Does anyone know how I can write a whole file to a stringstream without using an in-between buffer?
ifstream f(fName);
stringstream s;
if (f) {
s << f.rdbuf();
f.close();
}
// need to include <algorithm> and <iterator>, and of course <fstream> and <sstream>
ifstream fin("input.txt");
ostringstream sout;
copy(istreambuf_iterator<char>(fin),
istreambuf_iterator<char>(),
ostreambuf_iterator<char>(sout));
In the documentation for ostream, there are several overloads of operator<<. One of them takes a streambuf* and reads the stream buffer's entire contents.
Here is a sample use (compiled and tested):
#include <cstdlib> // for EXIT_FAILURE
#include <exception>
#include <iostream>
#include <fstream>
#include <sstream>
#include <stdexcept> // for std::runtime_error
int main ( int, char ** )
try
{
// Will hold file contents.
std::stringstream contents;
// Open the file for the shortest time possible.
{ std::ifstream file("/path/to/file", std::ios::binary);
// Make sure we have something to read.
if ( !file.is_open() ) {
throw std::runtime_error("Could not open file."); // std::exception has no standard string constructor
}
// Copy contents "as efficiently as possible".
contents << file.rdbuf();
}
// Do something "useful" with the file contents.
std::cout << contents.rdbuf();
}
catch ( const std::exception& error )
{
std::cerr << error.what() << std::endl;
return (EXIT_FAILURE);
}
The only way to do this with the C++ standard library is to use an ostrstream instead of a stringstream.
You can construct an ostrstream object over your own char buffer, and it will write directly into that buffer (so no more copying is needed).
Note, however, that the strstream header is deprecated (though it's still part of C++03 and will most likely remain available in most standard library implementations), and you will get into big trouble if you forget to null-terminate the data supplied to the ostrstream. This also applies to the stream operators, e.g. ostrstream_object << some_data << std::ends; (std::ends null-terminates the data).
If you're using Poco, this is simply:
#include <Poco/StreamCopier.h>
ifstream ifs(filename);
string output;
Poco::StreamCopier::copyToString(ifs, output);

C++ Reading Files

What is the minimum code required to read a file and assign its contents to a string in C++?
I read a lot of tutorials that worked, but they were all different in some way, so I am trying to understand why. If you could include some explanatory comments, that would be great.
Related: What is the best way to read an entire file into a std::string in C++?
#include <fstream>
#include <string>
int main()
{
std::ifstream file("myfile.txt"); // open the file
std::string line, whole_file;
// Read one line at a time from 'file' and store the result
// in the string called 'line'.
while (std::getline(file, line))
{
// Append each line together so the entire file will
// be in one string.
whole_file += line;
whole_file += '\n';
}
return 0;
// 'file' is closed automatically when the object goes out of scope.
}
A couple of things to note here. getline() returns a reference to the stream object, which fails the while-test if anything bad happens or if you reach the end of the file. Also, the trailing newline is not included in the string, so you have to append it manually.
The shortest code (not efficient):
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <fstream>
int main()
{
std::ifstream f("plop");
std::string buffer;
std::copy(std::istreambuf_iterator<char>(f),
std::istreambuf_iterator<char>(),
std::back_inserter(buffer));
}
How I would probably do it:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <fstream>
int main()
{
// Find the size of the file
std::ifstream file("Plop");
file.seekg(0,std::ios_base::end);
std::streampos size = file.tellg();
// Read the file in one go.
file.seekg(0);
std::vector<char> buffer(size); // pre-size the vector.
file.read(&buffer[0],size);
// or
// Before C++11 the standard didn't guarantee contiguous storage for std::string,
// but all the implementations I know of used it, and C++11 requires it, so this works.
file.seekg(0);
std::string buffer1(static_cast<std::string::size_type>(size), '\0'); // std::string has no size-only constructor
file.read(&buffer1[0],size);
}
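A variant of the sized read with the error checks the answer leaves out, assuming C++11 (where std::string is guaranteed contiguous, so &contents[0] is a valid write target). Opening in binary mode matters: in text mode on Windows, the byte count reported by tellg() can exceed the number of characters read() delivers. read_file_sized is a hypothetical name.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Reads an entire file into a std::string using one sized read.
std::string read_file_sized(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    file.seekg(0, std::ios::beg);
    // Allocate exactly the needed size, then read straight into the string.
    std::string contents(static_cast<std::string::size_type>(size), '\0');
    if (size > 0 && !file.read(&contents[0], static_cast<std::streamsize>(size)))
        throw std::runtime_error("short read from " + path);
    return contents;
}
```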
I'm not seeing as much:
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
int main() {
ifstream ifs("filename");
stringstream ss;
ss << ifs.rdbuf();
string s = ss.str();
}
... as I'd expect. You'd want some error-checking too.
Konrad Rudolph gave this as the answer to the related question linked above. I suppose this isn't a duplicate, since this one asks for the shortest code, but the answer is the same either way, so I'm reposting it here as a community wiki.
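For the "you'd want some error-checking too" part, here is one way to add it to the rdbuf() idiom. One subtlety: operator<< with a streambuf* sets failbit on the destination stream when it inserts no characters, so an empty file would otherwise look like an error. slurp is a hypothetical name, and this is a sketch rather than the linked answer's code.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// The stringstream/rdbuf() idiom with error checking added.
std::string slurp(const std::string& filename)
{
    std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary);
    if (!ifs)
        throw std::runtime_error("cannot open " + filename);
    std::stringstream ss;
    // Only insert when there is at least one character, so that an
    // empty file does not set failbit on ss.
    if (ifs.peek() != std::ifstream::traits_type::eof())
        ss << ifs.rdbuf();
    if (ss.fail())
        throw std::runtime_error("error reading " + filename);
    return ss.str();
}
```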
I am reading a word from each line.
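#include<fstream>
#include<string>
using namespace std;
int main(int argc, char **argv)
{
fstream inFile("file.txt", ios::in); // open once, before the loop
string str;
// Test the extraction itself: checking eof() first runs the body once more
// after the last word, and reopening the file on each pass restarts it.
while(inFile >> str)
{
// ... use str ...
}
inFile.close();
return 0;
}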
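Note that operator>> reads whitespace-separated words, not lines, so the loop above reads every word in the file. If the goal is literally the first word of each line, one approach (a sketch; first_words is a hypothetical name) is to read line by line with std::getline and extract from each line separately:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Returns the first whitespace-delimited word of every line in the file.
// Lines containing no word contribute nothing.
std::vector<std::string> first_words(const std::string& path)
{
    std::ifstream in(path.c_str());
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {      // one line per iteration
        std::istringstream fields(line);
        std::string word;
        if (fields >> word)               // skip blank lines
            words.push_back(word);
    }
    return words;
}
```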
This is longer than the short solutions, but is possibly slightly more efficient as it does a bit less copying - I haven't done any timing comparisons though:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
unsigned int FileRead( istream & is, vector <char> & buff ) {
is.read( &buff[0], buff.size() );
return is.gcount();
}
int main() {
ifstream ifs( "afile.dat", ios::binary );
const unsigned int BUFSIZE = 64 * 1024;
std::vector <char> buffer( BUFSIZE );
unsigned int n;
string s;
while( (n = FileRead( ifs, buffer )) > 0 ) { // stop when a read returns no bytes
s.append( &buffer[0], n );
}
cout << s;
}
If you know that your file contains text, then you can use STLSoft's platformstl::memory_mapped_file:
platformstl::memory_mapped_file file("your-file-name");
std::string contents(static_cast<char const*>(file.memory()), file.size());
or
platformstl::memory_mapped_file file("your-file-name");
std::wstring contents(static_cast<wchar_t const*>(file.memory()),
file.size() / sizeof(wchar_t));
On Windows, that will leave your string containing \r\n sequences, so you could instead use the winstl::load_text_file() function:
std::string contents;
winstl::load_text_file("your-file-name", contents);
If you want it loaded into a collection of lines, then use platformstl::read_lines():
platformstl::basic_file_lines<char> lines("your-file-name");
size_t n = lines.size();
std::string line3 = lines[3];
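If STLSoft isn't available, a plain standard-library stand-in for the line collection is a std::vector<std::string> filled with std::getline. This sketch (read_lines is a hypothetical name, not STLSoft's API) also strips a trailing '\r', since a CRLF file read in text mode on a non-Windows system keeps it:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Loads a file as a collection of lines, without line terminators.
std::vector<std::string> read_lines(const std::string& path)
{
    std::ifstream in(path.c_str());
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a leftover '\r' from CRLF line endings.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        lines.push_back(line);
    }
    return lines;
}
```

As with the STLSoft example, lines[3] is then the fourth line.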