C++ iostream UTF-16 file I/O with CR LF translation

C++ iostream UTF-16 file I/O with CR LF translation - c++

I want to read and write utf-16 files which use CR LF line separators (L"\r\n"). Using C++ (Microsoft Visual Studio 2010) iostreams. I want every L"\n" written to the stream to be translated to L"\r\n" transparently. Using the codecvt_utf16 locale facet requires to open the fstream in ios::binary mode, losing the usual text mode \n to \r\n translation.
std::wofstream wofs;
wofs.open("try_utf16.txt", std::ios::binary);
wofs.imbue(
std::locale(
wofs.getloc(),
new std::codecvt_utf16<wchar_t, 0x10ffff, std::generate_header>));
wofs << L"Hi!\n"; // i want a '\r' to be inserted before the '\n' in the output file
wofs.close();++
I want a solution without needing extra libraries like BOOST.

I think I've found a solution myself, I want to share it. Your comments are welcome!
#include <iostream>
#include <fstream>
class wcrlf_filebuf : public std::basic_filebuf<wchar_t>
{
typedef std::basic_filebuf<wchar_t> BASE;
wchar_t awch[128];
bool bBomWritten;
public:
wcrlf_filebuf()
: bBomWritten(false)
{ memset(awch, 0, sizeof awch); }
wcrlf_filebuf(const wchar_t *wszFilespec,
std::ios_base::open_mode _Mode = std::ios_base::out)
: bBomWritten(false)
{
memset(awch, 0, sizeof awch);
BASE::open(wszFilespec, _Mode | std::ios_base::binary);
pubsetbuf(awch, _countof(awch));
}
wcrlf_filebuf *open(const wchar_t *wszFilespec,
std::ios_base::open_mode _Mode = std::ios_base::out)
{
BASE::open(wszFilespec, _Mode | std::ios_base::binary);
pubsetbuf(awch, _countof(awch));
return this;
}
virtual int_type overflow(int_type ch = traits_type::eof())
{
if (!bBomWritten) {
bBomWritten = true;
int_type iRet = BASE::overflow(0xfeff);
if (iRet != traits_type::not_eof(0xfeff)) return iRet;
}
if (ch == '\n') {
int_type iRet = BASE::overflow('\r');
if (iRet != traits_type::not_eof('\r')) return iRet;
}
return BASE::overflow(ch);
}
};
class wcrlfofstream : public std::wostream
{
typedef std::wostream BASE;
public:
wcrlfofstream(const wchar_t *wszFilespec,
std::ios_base::open_mode _Mode = std::ios_base::out)
: std::wostream(new wcrlf_filebuf(wszFilespec, _Mode))
{}
wcrlf_filebuf* rdbuf()
{
return dynamic_cast<wcrlf_filebuf*>(std::wostream::rdbuf());
}
void close()
{
rdbuf()->close();
}
};

Related

Can't read file with cyrillic path in C++

I'm trying to read file, which contains Cyrillic characters in their path, and got ifstream.is_open() == false
This is my code:
std::string ReadFile(const std::string &path) {
std::string newLine, fileContent;
std::ifstream in(path.c_str(), std::ios::in);
if (!in.is_open()) {
return std::string("isn't opened");
}
while (in.good()) {
getline(in, newLine);
fileContent += newLine;
}
in.close();
return fileContent;
}
int main() {
std::string path = "C:\\test\\документ.txt";
std::string content = ReadFile(path);
std::cout << content << std::endl;
return 0;
}
Specified file exists
I'm trying to find solution in google, but I got nothing
Here is links, which I saw:
I don't need wstring
The same as previous
no answer here
is not about C++
has no answer too
P.S. I need to get file's content in string, not in wstring
THIS IS ENCODING SETTINGS OF MY IDE (CLION 2017.1)

You'll need an up-to-date compiler or Boost. std::filesystem::path can handle these names, but it's new in the C++17 standard. Your compiler may still have it as std::experimental::filesystem::path, or else you'd use the third-party boost::filesystem::path. The interfaces are pretty comparable as the Boost version served as the inspiration.

The definition for std::string is std::basic_string, so your Cyrillic chararecters are not stored as intended. Atleast, try to use std::wstring to store your file path and then you can read from file using std::string.

First of all, set your project settings to use UTF-8 encoding instead of windows-1251. Until standard library gets really good (not any time soon) you basically can not rely on it if you want to deal with io properly. To make input stream read from files on Windows you need to write your own custom input stream buffer that opens files using 2-byte wide chars or rely on some third-party implementations of such routines. Here is some incomplete (but sufficient for your example) implementation:
// assuming that usual Windows SDK macros such as _UNICODE, WIN32_LEAN_AND_MEAN are defined above
#include <Windows.h>
#include <string>
#include <iostream>
#include <system_error>
#include <memory>
#include <utility>
#include <cstdlib>
#include <cstdio>
static_assert(2 == sizeof(wchar_t), "wchar_t size must be 2 bytes");
using namespace ::std;
class MyStreamBuf final: public streambuf
{
#pragma region Fields
private: ::HANDLE const m_file_handle;
private: char m_buffer; // typically buffer should be much bigger
#pragma endregion
public: explicit
MyStreamBuf(wchar_t const * psz_file_path)
: m_file_handle(::CreateFileW(psz_file_path, FILE_GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL))
, m_buffer{}
{
if(INVALID_HANDLE_VALUE == m_file_handle)
{
auto const error_code{::GetLastError()};
throw(system_error(static_cast< int >(error_code), system_category(), "::CreateFileW call failed"));
}
}
public:
~MyStreamBuf(void)
{
auto const closed{::CloseHandle(m_file_handle)};
if(FALSE == closed)
{
auto const error_code{::GetLastError()};
//throw(::std::system_error(static_cast< int >(error_code), system_category(), "::CloseHandle call failed"));
// throwing in destructor is kinda wrong
// but if CloseHandle returned false then our program is in inconsistent state
// and must be terminated anyway
(void) error_code; // not used
abort();
}
}
private: auto
underflow(void) -> int_type override
{
::DWORD bytes_count_to_read{1};
::DWORD read_bytes_count{};
{
auto const succeeded{::ReadFile(m_file_handle, addressof(m_buffer), bytes_count_to_read, addressof(read_bytes_count), nullptr)};
if(FALSE == succeeded)
{
auto const error_code{::GetLastError()};
setg(nullptr, nullptr, nullptr);
throw(system_error(static_cast< int >(error_code), system_category(), "::ReadFile call failed"));
}
}
if(0 == read_bytes_count)
{
setg(nullptr, nullptr, nullptr);
return(EOF);
}
setg(addressof(m_buffer), addressof(m_buffer), addressof(m_buffer) + 1);
return(m_buffer);
}
};
string
MyReadFile(wchar_t const * psz_file_path)
{
istream in(new MyStreamBuf(psz_file_path)); // note that we create normal stream
string new_line;
string file_content;
while(in.good())
{
getline(in, new_line);
file_content += new_line;
}
return(::std::move(file_content));
}
int
main(void)
{
string content = MyReadFile(L"C:\\test\\документ.txt"); // note that path is a wide string
cout << content << endl;
return 0;
}

Change your code to use wstring and save your file using Unicode encoding (non UTF8 one, use USC-2, UTF16 or something like that). MSVC has non-standard overload specifically for this reason to be able to handle non-ascii chars in filenames:
std::string ReadFile(const std::wstring &path)
{
std::string newLine, fileContent;
std::ifstream in(path.c_str(), std::ios::in);
if (!in)
return std::string("isn't opened");
while (getline(in, newLine))
fileContent += newLine;
return fileContent;
}
int main()
{
std::wstring path = L"C:\\test\\документ.txt";
std::string content = ReadFile(path);
std::cout << content << std::endl;
}
Also, note corrected ReadFile code.

Writing C++ program compressing/decompressing data

I have to write C++ program that like gzip can
*Take input from file or from char stream like compression below
gzip file
type file | gzip
*Program have file or char stream output like decompression below
gzip -d file.gz
gzip -dc file.gz
I don't know how to take to the task and what techniques have to use and how to create classes buffering input and output. I have classes buffering input and output and read/write data from/to file.
DataBuffer.h (taking uncompressed data from file):
#ifndef DataBuffer_h
#define DataBuffer_h
#include <fstream>
#include <string>
enum DataBufferState
{
DATABUFFER_OK = 0,
DATABUFFER_EOF = 1
};
class DataBuffer
{
std::fstream file;
std::string buffer;
unsigned int maxBufferSize;
public:
DataBuffer(const std::string& filename, unsigned int maxBuffSize);
~DataBuffer();
bool OpenFile(const std::string& filename);
void SetMaxBufferSize(unsigned int maxBuffSize);
DataBufferState FullBufferWithDataOld();
DataBufferState FullBufferWithData();
std::string GetDataBuffer();
};
#endif
DataBuffer.cpp:
#include "DataBuffer.h"
using namespace std;
DataBuffer::DataBuffer(const string& filename, unsigned int maxBuffSize)
{
OpenFile(filename);
SetMaxBufferSize(maxBuffSize);
}
DataBuffer::~DataBuffer()
{
file.close();
}
bool DataBuffer::OpenFile(const string& filename)
{
file.open(filename.c_str(),ios::in);
if(!file.is_open())
return false;
return true;
}
void DataBuffer::SetMaxBufferSize(unsigned int maxBuffSize)
{
maxBufferSize = maxBuffSize;
}
DataBufferState DataBuffer::FullBufferWithDataOld()
{
while(true)
{
string line;
streampos pos = file.tellg(); // Zapamietaj polozenie przed pobraniem linii
getline(file,line);
if( buffer.size()+line.size()>maxBufferSize )
{
// Cofnac wskaznik pliku
file.seekg(pos,ios::beg); // Przywroc polozenie sprzed pobrania linii
break;
}
buffer += line + "\n";
if(file.eof())
return DATABUFFER_EOF;
}
return DATABUFFER_OK;
}
DataBufferState DataBuffer::FullBufferWithData()
{
char c;
for(unsigned int i=0;i<maxBufferSize;++i)
{
c = file.get();
if(file.eof()) break;
buffer += c;
}
if(file.eof())
return DATABUFFER_EOF;
return DATABUFFER_OK;
}
string DataBuffer::GetDataBuffer()
{
string buf = buffer;
buffer.clear();
return buf;
}
BufferWriter.h (Save uncompressed data into file):
#ifndef BufferWriter_h
#define BufferWriter_h
#include <string>
#include <fstream>
class BufferWriter
{
std::string filename;
std::fstream file;
public:
BufferWriter(const std::string& filename_);
~BufferWriter();
bool OpenFile(const std::string& filename, bool appending);
void SendBufferToFile(std::string& buffer);
};
#endif
BufferWriter.cpp
#include "BufferWriter.h"
using namespace std;
BufferWriter::BufferWriter(const string& filename_)
{
filename = filename_;
OpenFile(filename.c_str(),false);
file.close();
}
BufferWriter::~BufferWriter()
{
file.close();
}
bool BufferWriter::OpenFile(const string& filename, bool appending)
{
if(appending)
file.open(filename.c_str(),ios::out | ios::app);
else
file.open(filename.c_str(),ios::out);
if(!file.is_open())
return false;
return true;
}
void BufferWriter::SendBufferToFile(string& buffer)
{
OpenFile(filename,true);
file.write(buffer.c_str(),buffer.size());
file.close();
}
Can you give me some hints how to improve code for input and output mechanisms?
Assume that I have class presented below, how to use istream or iterators to fill buffer with data from file or standard input. What classes from std or boost? What parameters? Somelike to support definition of class with this functionality.
[EDIT]:
#ifndef StreamBuffer_h
#define StreamBuffer_h
#include <string>
using namespace std;
enum DataBufferState
{
DATABUFFER_OK = 0,
DATABUFFER_EOF = 1
};
// gzip plik
// type plik | gzip -d
// gzip -d plik.gz
// gzip -dc plik.gz
// Parametr konstruktora to strumien z ktorego chcemy czytac i dlugosc bufora
class StreamBuffer
{
int maxBufferSize;
std::string buffer;
StreamBuffer(int maxBuffSize)
{
SetMaxBufferSize(maxBuffSize);
}
~StreamBuffer()
{
}
void SetMaxBufferSize(unsigned int maxBuffSize)
{
maxBufferSize = maxBuffSize;
}
DataBufferState FullBufferWithData()
{
// What to use what to do in this method to read part of file or standard char input to buffer?
}
std::string GetDataBuffer()
{
return buffer;
}
};
#endif
[EDIT2]:
I want to do the same thing as in this thread: Read from file or stdin, but in C++.

In general you read input from a source and write it to a sink. The simplest case is when you simply write what you read. You, however, want to apply a transformation (or filter) to the data that you read. Seeing as you're after "the c++ way," I'd suggest taking a look at boost::iostreams which abstracts the task in terms of sources/sinks.
Boost defines an abstract source by:
struct Source {
typedef char char_type;
typedef source_tag category;
std::streamsize read(char* s, std::streamsize n)
{
// Read up to n characters from the input
// sequence into the buffer s, returning
// the number of characters read, or -1
// to indicate end-of-sequence.
}
};
And sinks are defined in a similar way (with a write instead of a read, of course). The benefit of this is that the details of the source/sink is irrelevant - you can read/write to file, to a network adapter, or whatever, without any structural changes.
To apply filters I'd again suggest looking at boost::iostreams, although they do abstract a lot which somewhat complicates implementation..

Can anyone explain why my crypto++ decrypted file is 16 bytes short?

In order that I might feed AES encrypted text as an std::istream to a parser component I am trying to create a std::streambuf implementation wrapping the vanilla crypto++ encryption/decryption.
The main() function calls the following functions to compare my wrapper with the vanilla implementation:
EncryptFile() - encrypt file using my streambuf implementation
DecryptFile() - decrypt file using my streambuf implementation
EncryptFileVanilla() - encrypt file using vanilla crypto++
DecryptFileVanilla() - decrypt file using vanilla crypto++
The problem is that whilst the encrypted files created by EncryptFile() and EncryptFileVanilla() are identical. The decrypted file created by DecryptFile() is incorrect being 16 bytes short of that created by DecryptFileVanilla(). Probably not coincidentally the block size is also 16.
I think the issue must be in CryptStreamBuffer::GetNextChar(), but I've been staring at it and the crypto++ documentation for hours.
Can anybody help/explain?
Any other comments about how crummy or naive my std::streambuf implementation are also welcome ;-)
Thanks,
Tom
// Runtime Includes
#include <iostream>
// Crypto++ Includes
#include "aes.h"
#include "modes.h" // xxx_Mode< >
#include "filters.h" // StringSource and
// StreamTransformation
#include "files.h"
using namespace std;
class CryptStreamBuffer: public std::streambuf {
public:
CryptStreamBuffer(istream& encryptedInput, CryptoPP::StreamTransformation& c);
CryptStreamBuffer(ostream& encryptedOutput, CryptoPP::StreamTransformation& c);
~CryptStreamBuffer();
protected:
virtual int_type overflow(int_type ch = traits_type::eof());
virtual int_type uflow();
virtual int_type underflow();
virtual int_type pbackfail(int_type ch);
virtual int sync();
private:
int GetNextChar();
int m_NextChar; // Buffered character
CryptoPP::StreamTransformationFilter* m_StreamTransformationFilter;
CryptoPP::FileSource* m_Source;
CryptoPP::FileSink* m_Sink;
}; // class CryptStreamBuffer
CryptStreamBuffer::CryptStreamBuffer(istream& encryptedInput, CryptoPP::StreamTransformation& c) :
m_NextChar(traits_type::eof()),
m_StreamTransformationFilter(0),
m_Source(0),
m_Sink(0) {
m_StreamTransformationFilter = new CryptoPP::StreamTransformationFilter(c, 0, CryptoPP::BlockPaddingSchemeDef::PKCS_PADDING);
m_Source = new CryptoPP::FileSource(encryptedInput, false, m_StreamTransformationFilter);
}
CryptStreamBuffer::CryptStreamBuffer(ostream& encryptedOutput, CryptoPP::StreamTransformation& c) :
m_NextChar(traits_type::eof()),
m_StreamTransformationFilter(0),
m_Source(0),
m_Sink(0) {
m_Sink = new CryptoPP::FileSink(encryptedOutput);
m_StreamTransformationFilter = new CryptoPP::StreamTransformationFilter(c, m_Sink, CryptoPP::BlockPaddingSchemeDef::PKCS_PADDING);
}
CryptStreamBuffer::~CryptStreamBuffer() {
if (m_Sink) {
delete m_StreamTransformationFilter;
// m_StreamTransformationFilter owns and deletes m_Sink.
}
if (m_Source) {
delete m_Source;
// m_Source owns and deletes m_StreamTransformationFilter.
}
}
CryptStreamBuffer::int_type CryptStreamBuffer::overflow(int_type ch) {
return m_StreamTransformationFilter->Put((byte)ch);
}
CryptStreamBuffer::int_type CryptStreamBuffer::uflow() {
int_type result = GetNextChar();
// Reset the buffered character
m_NextChar = traits_type::eof();
return result;
}
CryptStreamBuffer::int_type CryptStreamBuffer::underflow() {
return GetNextChar();
}
CryptStreamBuffer::int_type CryptStreamBuffer::pbackfail(int_type ch) {
return traits_type::eof();
}
int CryptStreamBuffer::sync() {
// TODO: Not sure sync is the correct place to be doing this.
// Should it be in the destructor?
if (m_Sink) {
m_StreamTransformationFilter->MessageEnd();
// m_StreamTransformationFilter->Flush(true);
}
return 0;
}
int CryptStreamBuffer::GetNextChar() {
// If we have a buffered character do nothing
if (m_NextChar != traits_type::eof()) {
return m_NextChar;
}
// If there are no more bytes currently available then pump the source
if (m_StreamTransformationFilter->MaxRetrievable() == 0) {
m_Source->Pump(1024);
}
// Retrieve the next byte
byte nextByte;
size_t noBytes = m_StreamTransformationFilter->Get(nextByte);
if (0 == noBytes) {
return traits_type::eof();
}
// Buffer up the next character
m_NextChar = nextByte;
return m_NextChar;
}
void InitKey(byte key[]) {
key[0] = -62;
key[1] = 102;
key[2] = 78;
key[3] = 75;
key[4] = -96;
key[5] = 125;
key[6] = 66;
key[7] = 125;
key[8] = -95;
key[9] = -66;
key[10] = 114;
key[11] = 22;
key[12] = 48;
key[13] = 111;
key[14] = -51;
key[15] = 112;
}
/** Decrypt using my CryptStreamBuffer */
void DecryptFile(const char* sourceFileName, const char* destFileName) {
ifstream ifs(sourceFileName, ios::in | ios::binary);
ofstream ofs(destFileName, ios::out | ios::binary);
byte key[CryptoPP::AES::DEFAULT_KEYLENGTH];
InitKey(key);
CryptoPP::ECB_Mode<CryptoPP::AES>::Decryption decryptor(key, sizeof(key));
if (ifs) {
if (ofs) {
CryptStreamBuffer cryptBuf(ifs, decryptor);
std::istream decrypt(&cryptBuf);
int c;
while (EOF != (c = decrypt.get())) {
ofs << (char)c;
}
ofs.flush();
}
else {
std::cerr << "Failed to open file '" << destFileName << "'." << endl;
}
}
else {
std::cerr << "Failed to open file '" << sourceFileName << "'." << endl;
}
}
/** Encrypt using my CryptStreamBuffer */
void EncryptFile(const char* sourceFileName, const char* destFileName) {
ifstream ifs(sourceFileName, ios::in | ios::binary);
ofstream ofs(destFileName, ios::out | ios::binary);
byte key[CryptoPP::AES::DEFAULT_KEYLENGTH];
InitKey(key);
CryptoPP::ECB_Mode<CryptoPP::AES>::Encryption encryptor(key, sizeof(key));
if (ifs) {
if (ofs) {
CryptStreamBuffer cryptBuf(ofs, encryptor);
std::ostream encrypt(&cryptBuf);
int c;
while (EOF != (c = ifs.get())) {
encrypt << (char)c;
}
encrypt.flush();
}
else {
std::cerr << "Failed to open file '" << destFileName << "'." << endl;
}
}
else {
std::cerr << "Failed to open file '" << sourceFileName << "'." << endl;
}
}
/** Decrypt using vanilla crypto++ */
void DecryptFileVanilla(const char* sourceFileName, const char* destFileName) {
byte key[CryptoPP::AES::DEFAULT_KEYLENGTH];
InitKey(key);
CryptoPP::ECB_Mode<CryptoPP::AES>::Decryption decryptor(key, sizeof(key));
CryptoPP::FileSource(sourceFileName, true,
new CryptoPP::StreamTransformationFilter(decryptor,
new CryptoPP::FileSink(destFileName), CryptoPP::BlockPaddingSchemeDef::PKCS_PADDING
) // StreamTransformationFilter
); // FileSource
}
/** Encrypt using vanilla crypto++ */
void EncryptFileVanilla(const char* sourceFileName, const char* destFileName) {
byte key[CryptoPP::AES::DEFAULT_KEYLENGTH];
InitKey(key);
CryptoPP::ECB_Mode<CryptoPP::AES>::Encryption encryptor(key, sizeof(key));
CryptoPP::FileSource(sourceFileName, true,
new CryptoPP::StreamTransformationFilter(encryptor,
new CryptoPP::FileSink(destFileName), CryptoPP::BlockPaddingSchemeDef::PKCS_PADDING
) // StreamTransformationFilter
); // FileSource
}
int main(int argc, char* argv[])
{
EncryptFile(argv[1], "encrypted.out");
DecryptFile("encrypted.out", "decrypted.out");
EncryptFileVanilla(argv[1], "encrypted_vanilla.out");
DecryptFileVanilla("encrypted_vanilla.out", "decrypted_vanilla.out");
return 0;
}

After working with a debug build of crypto++ it turns out that what was missing was a call to the StreamTransformationFilter advising it that there would be nothing more coming from the Source and that it should wrap up the processing of the final few bytes, including the padding.
In CryptStreamBuffer::GetNextChar():
Replace:
// If there are no more bytes currently available then pump the source
if (m_StreamTransformationFilter->MaxRetrievable() == 0) {
m_Source->Pump(1024);
}
With:
// If there are no more bytes currently available from the filter then
// pump the source.
if (m_StreamTransformationFilter->MaxRetrievable() == 0) {
if (0 == m_Source->Pump(1024)) {
// This seems to be required to ensure the final bytes are readable
// from the filter.
m_StreamTransformationFilter->ChannelMessageEnd(CryptoPP::DEFAULT_CHANNEL);
}
}
I make no claims that this is the best solution, just one I discovered by trial and error that appears to work.

If your input buffer is not a multiplicity of a 16-byte block, you need to stuff the last block with dummy bytes. If the last block is less than 16 bytes it is dropped by crypto++ and not encrypted. When decrypting, you need to truncate the dummy bytes.
That 'another way' you are referring to, already does the addition and truncation for you.
So what should be the dummy bytes, to know how many of them there is, thus should be truncated? I use the following pattern: fill each byte with the value of dummies count.
Examples: You need to add 8 bytes? set them to 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08. You need to add 3 bytes? set them to 0x03, 0x03, 0x03 etc.
When decrypting, get the value of last byte of the output buffer. Assume it is N. Check, if the values last N bytes are equal to N. Truncate, if true.
UPDATE:
CryptStreamBuffer::CryptStreamBuffer(istream& encryptedInput, CryptoPP::StreamTransformation& c) :
m_NextChar(traits_type::eof()),
m_StreamTransformationFilter(0),
m_Source(0),
m_Sink(0) {
m_StreamTransformationFilter = new CryptoPP::StreamTransformationFilter(c, 0, CryptoPP::BlockPaddingSchemeDef::ZEROS_PADDING);
m_Source = new CryptoPP::FileSource(encryptedInput, false, m_StreamTransformationFilter);
}
CryptStreamBuffer::CryptStreamBuffer(ostream& encryptedOutput, CryptoPP::StreamTransformation& c) :
m_NextChar(traits_type::eof()),
m_StreamTransformationFilter(0),
m_Source(0),
m_Sink(0) {
m_Sink = new CryptoPP::FileSink(encryptedOutput);
m_StreamTransformationFilter = new CryptoPP::StreamTransformationFilter(c, m_Sink, CryptoPP::BlockPaddingSchemeDef::ZEROS_PADDING);
}
Setting the ZEROS_PADDING made your code working (tested on text files). However why it does not work with DEFAULT_PADDING - I did not find the cause yet.

std::ostream interface to an OLE IStream

I have a Visual Studio 2008 C++ application using IStreams. I would like to use the IStream connection in a std::ostream. Something like this:
IStream* stream = /*create valid IStream instance...*/;
IStreamBuf< WIN32_FIND_DATA > sb( stream );
std::ostream os( &sb );
WIN32_FIND_DATA d = { 0 };
// send the structure along the IStream
os << d;
To accomplish this, I've implemented the following code:
template< class _CharT, class _Traits >
inline std::basic_ostream< _CharT, _Traits >&
operator<<( std::basic_ostream< _CharT, _Traits >& os, const WIN32_FIND_DATA& i )
{
const _CharT* c = reinterpret_cast< const _CharT* >( &i );
const _CharT* const end = c + sizeof( WIN32_FIND_DATA ) / sizeof( _CharT );
for( c; c < end; ++c ) os << *c;
return os;
}
template< typename T >
class IStreamBuf : public std::streambuf
{
public:
IStreamBuf( IStream* stream ) : stream_( stream )
{
setp( reinterpret_cast< char* >( &buffer_ ),
reinterpret_cast< char* >( &buffer_ ) + sizeof( buffer_ ) );
};
virtual ~IStreamBuf()
{
sync();
};
protected:
traits_type::int_type FlushBuffer()
{
int bytes = std::min< int >( pptr() - pbase(), sizeof( buffer_ ) );
DWORD written = 0;
HRESULT hr = stream_->Write( &buffer_, bytes, &written );
if( FAILED( hr ) )
{
return traits_type::eof();
}
pbump( -bytes );
return bytes;
};
virtual int sync()
{
if( FlushBuffer() == traits_type::eof() )
return -1;
return 0;
};
traits_type::int_type overflow( traits_type::int_type ch )
{
if( FlushBuffer() == traits_type::eof() )
return traits_type::eof();
if( ch != traits_type::eof() )
{
*pptr() = ch;
pbump( 1 );
}
return ch;
};
private:
/// data queued up to be sent
T buffer_;
/// output stream
IStream* stream_;
}; // class IStreamBuf
Yes, the code compiles and seems to work, but I've not had the pleasure of implementing a std::streambuf before. So, I'd just like to know if it's correct and complete.
Thanks,
PaulH

Here's an implementation I wrote a while ago: https://github.com/senderista/UnbufferedOLEStreamBuf
For the most part you just need to mechanically translate between STL and Windows/COM types. This translation is tedious and error-prone, though, so you would probably be best off using the implementation I linked. It's only 150 lines, it was used in a production application, and there are few places for bugs to hide.
Since apparently links in answers are cause for deletion, here's the code in its entirety:
//Copyright (c) 2010 Tobin Baker
//
//Permission is hereby granted, free of charge, to any person obtaining a copy
//of this software and associated documentation files (the "Software"), to deal
//in the Software without restriction, including without limitation the rights
//to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
//copies of the Software, and to permit persons to whom the Software is
//furnished to do so, subject to the following conditions:
//
//The above copyright notice and this permission notice shall be included in
//all copies or substantial portions of the Software.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
//IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
//FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
//AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
//LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
//OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
//THE SOFTWARE.
#include <ios>
#include <streambuf>
#include <windows.h>
// Read-write unbuffered streambuf implementation which uses no
// internal buffers (conventional wisdom says this can't be done
// except for write-only streams, but I adapted Matt Austern's example
// from http://www.drdobbs.com/184401305).
class UnbufferedOLEStreamBuf : public std::streambuf {
public:
UnbufferedOLEStreamBuf(IStream *stream, std::ios_base::openmode which = std::ios_base::in | std::ios_base::out);
~UnbufferedOLEStreamBuf();
protected:
virtual int overflow(int ch = traits_type::eof());
virtual int underflow();
virtual int uflow();
virtual int pbackfail(int ch = traits_type::eof());
virtual int sync();
virtual std::streampos seekpos(std::streampos sp, std::ios_base::openmode which = std::ios_base::in | std::ios_base::out);
virtual std::streampos seekoff(std::streamoff off, std::ios_base::seekdir way, std::ios_base::openmode which = std::ios_base::in | std::ios_base::out);
virtual std::streamsize xsgetn(char *s, std::streamsize n);
virtual std::streamsize xsputn(const char *s, std::streamsize n);
virtual std::streamsize showmanyc();
private:
IStream *stream_;
bool readOnly_;
bool backup();
};
UnbufferedOLEStreamBuf::UnbufferedOLEStreamBuf(IStream *stream, std::ios_base::openmode which)
: std::streambuf(), stream_(stream), readOnly_(!(which & std::ios_base::out)) {}
UnbufferedOLEStreamBuf::~UnbufferedOLEStreamBuf() { if (!readOnly_) UnbufferedOLEStreamBuf::sync(); }
bool UnbufferedOLEStreamBuf::backup() {
LARGE_INTEGER liMove;
liMove.QuadPart = -1LL;
return SUCCEEDED(stream_->Seek(liMove, STREAM_SEEK_CUR, NULL));
}
int UnbufferedOLEStreamBuf::overflow(int ch) {
if (ch != traits_type::eof()) {
if (SUCCEEDED(stream_->Write(&ch, 1, NULL))) {
return ch;
}
}
return traits_type::eof();
}
int UnbufferedOLEStreamBuf::underflow() {
char ch = UnbufferedOLEStreamBuf::uflow();
if (ch != traits_type::eof()) {
ch = backup() ? ch : traits_type::eof();
}
return ch;
}
int UnbufferedOLEStreamBuf::uflow() {
char ch;
ULONG cbRead;
// S_FALSE indicates we've reached end of stream
return (S_OK == stream_->Read(&ch, 1, &cbRead))
? ch : traits_type::eof();
}
int UnbufferedOLEStreamBuf::pbackfail(int ch) {
if (ch != traits_type::eof()) {
ch = backup() ? ch : traits_type::eof();
}
return ch;
}
int UnbufferedOLEStreamBuf::sync() {
return SUCCEEDED(stream_->Commit(STGC_DEFAULT)) ? 0 : -1;
}
std::ios::streampos UnbufferedOLEStreamBuf::seekpos(std::ios::streampos sp, std::ios_base::openmode which) {
LARGE_INTEGER liMove;
liMove.QuadPart = sp;
return SUCCEEDED(stream_->Seek(liMove, STREAM_SEEK_SET, NULL)) ? sp : -1;
}
std::streampos UnbufferedOLEStreamBuf::seekoff(std::streamoff off, std::ios_base::seekdir way, std::ios_base::openmode which) {
STREAM_SEEK sk;
switch (way) {
case std::ios_base::beg: sk = STREAM_SEEK_SET; break;
case std::ios_base::cur: sk = STREAM_SEEK_CUR; break;
case std::ios_base::end: sk = STREAM_SEEK_END; break;
default: return -1;
}
LARGE_INTEGER liMove;
liMove.QuadPart = static_cast<LONGLONG>(off);
ULARGE_INTEGER uliNewPos;
return SUCCEEDED(stream_->Seek(liMove, sk, &uliNewPos))
? static_cast<std::streampos>(uliNewPos.QuadPart) : -1;
}
std::streamsize UnbufferedOLEStreamBuf::xsgetn(char *s, std::streamsize n) {
ULONG cbRead;
return SUCCEEDED(stream_->Read(s, static_cast<ULONG>(n), &cbRead))
? static_cast<std::streamsize>(cbRead) : 0;
}
std::streamsize UnbufferedOLEStreamBuf::xsputn(const char *s, std::streamsize n) {
ULONG cbWritten;
return SUCCEEDED(stream_->Write(s, static_cast<ULONG>(n), &cbWritten))
? static_cast<std::streamsize>(cbWritten) : 0;
}
std::streamsize UnbufferedOLEStreamBuf::showmanyc() {
STATSTG stat;
if (SUCCEEDED(stream_->Stat(&stat, STATFLAG_NONAME))) {
std::streampos lastPos = static_cast<std::streampos>(stat.cbSize.QuadPart - 1);
LARGE_INTEGER liMove;
liMove.QuadPart = 0LL;
ULARGE_INTEGER uliNewPos;
if (SUCCEEDED(stream_->Seek(liMove, STREAM_SEEK_CUR, &uliNewPos))) {
return std::max<std::streamsize>(lastPos - static_cast<std::streampos>(uliNewPos.QuadPart), 0);
}
}
return 0;
}
std::streambuf* StdStreamBufFromOLEStream(IStream *pStream, std::ios_base::openmode which)
{
return new (std::nothrow) UnbufferedOLEStreamBuf(pStream, which);
}
Hope this helps.

My suggestion would be to use Boost.IOStreams Device to wrap the IStream. It takes care of all of the C++ IO streams stuff so that you do not have to.

UTF-8 output on Windows console

The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):
#include <iostream>
#include "Windows.h"
int main() {
SetConsoleOutputCP( CP_UTF8 );
std::cout << "\xc3\xbc";
int fail = std::cout.fail() ? '1': '0';
fputc( fail, stdout );
fputs( "\xc3\xbc", stdout );
}
I simply compiled with cl /EHsc test.cpp.
Windows XP: Output in a console window is
Ã¼0Ã¼ (translated to Codepage 1252, originally shows some line drawing
charachters in the default Codepage, perhaps 437). When I change the settings
of the console window to use the "Lucida Console" character set and run my
test.exe again, output is changed to 1ü, which means
the character ü can be written using fputs and its UTF-8 encoding C3 BC
std::cout does not work for whatever reason
the streams failbit is setting after trying to write the character
Windows 7: Output using Consolas is ��0ü. Even more interesting. The correct bytes are written, probably (at least when redirecting the output to a file) and the stream state is ok, but the two bytes are written as separate characters).
I tried to raise this issue on "Microsoft Connect" (see here),
but MS has not been very helpful. You might as well look here
as something similar has been asked before.
Can you reproduce this problem?
What am I doing wrong? Shouldn't the std::cout and the fputs have the same
effect?
SOLVED: (sort of) Following mike.dld's idea I implemented a std::stringbuf doing the conversion from UTF-8 to Windows-1252 in sync() and replaced the streambuf of std::cout with this converter (see my comment on mike.dld's answer).

I understand the question is quite old, but if someone would still be interested, below is my solution. I've implemented a quite simple std::streambuf descendant and then passed it to each of standard streams on the very beginning of program execution.
This allows you to use UTF-8 everywhere in your program. On input, data is taken from console in Unicode and then converted and returned to you in UTF-8. On output the opposite is done, taking data from you in UTF-8, converting it to Unicode and sending to console. No issues found so far.
Also note, that this solution doesn't require any codepage modification, with either SetConsoleCP, SetConsoleOutputCP or chcp, or something else.
That's the stream buffer:
class ConsoleStreamBufWin32 : public std::streambuf
{
public:
ConsoleStreamBufWin32(DWORD handleId, bool isInput);
protected:
// std::basic_streambuf
virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
virtual int sync();
virtual int_type underflow();
virtual int_type overflow(int_type c = traits_type::eof());
private:
HANDLE const m_handle;
bool const m_isInput;
std::string m_buffer;
};
ConsoleStreamBufWin32::ConsoleStreamBufWin32(DWORD handleId, bool isInput) :
m_handle(::GetStdHandle(handleId)),
m_isInput(isInput),
m_buffer()
{
if (m_isInput)
{
setg(0, 0, 0);
}
}
std::streambuf* ConsoleStreamBufWin32::setbuf(char_type* /*s*/, std::streamsize /*n*/)
{
return 0;
}
int ConsoleStreamBufWin32::sync()
{
if (m_isInput)
{
::FlushConsoleInputBuffer(m_handle);
setg(0, 0, 0);
}
else
{
if (m_buffer.empty())
{
return 0;
}
std::wstring const wideBuffer = utf8_to_wstring(m_buffer);
DWORD writtenSize;
::WriteConsoleW(m_handle, wideBuffer.c_str(), wideBuffer.size(), &writtenSize, NULL);
}
m_buffer.clear();
return 0;
}
ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::underflow()
{
if (!m_isInput)
{
return traits_type::eof();
}
if (gptr() >= egptr())
{
wchar_t wideBuffer[128];
DWORD readSize;
if (!::ReadConsoleW(m_handle, wideBuffer, ARRAYSIZE(wideBuffer) - 1, &readSize, NULL))
{
return traits_type::eof();
}
wideBuffer[readSize] = L'\0';
m_buffer = wstring_to_utf8(wideBuffer);
setg(&m_buffer[0], &m_buffer[0], &m_buffer[0] + m_buffer.size());
if (gptr() >= egptr())
{
return traits_type::eof();
}
}
return sgetc();
}
ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::overflow(int_type c)
{
if (m_isInput)
{
return traits_type::eof();
}
m_buffer += traits_type::to_char_type(c);
return traits_type::not_eof(c);
}
The usage then is as follows:
template<typename StreamT>
inline void FixStdStream(DWORD handleId, bool isInput, StreamT& stream)
{
if (::GetFileType(::GetStdHandle(handleId)) == FILE_TYPE_CHAR)
{
stream.rdbuf(new ConsoleStreamBufWin32(handleId, isInput));
}
}
// ...
int main()
{
FixStdStream(STD_INPUT_HANDLE, true, std::cin);
FixStdStream(STD_OUTPUT_HANDLE, false, std::cout);
FixStdStream(STD_ERROR_HANDLE, false, std::cerr);
// ...
std::cout << "\xc3\xbc" << std::endl;
// ...
}
Left out wstring_to_utf8 and utf8_to_wstring could easily be implemented with WideCharToMultiByte and MultiByteToWideChar WinAPI functions.

Oi. Congratulations on finding a way to change the code page of the console from inside your program. I didn't know about that call, I always had to use chcp.
I'm guessing the C++ default locale is getting involved. By default it will use the code page provide by GetThreadLocale() to determine the text encoding of non-wstring stuff. This generally defaults to CP1252. You could try using SetThreadLocale() to get to UTF-8 (if it even does that, can't recall), with the hope that std::locale defaults to something that can handle your UTF-8 encoding.

It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.
My current knowledge is: Windows XP console in UTF-8 codepage does not work with C++ iostreams.
Windows XP is getting out of fashion now and so does VS 2008. I'd be interested to hear if the problem still exists on newer Windows systems.
On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to Properly print utf8 characters in windows console, UTF-8 output fails with C stdio when printing one byte after after another like putc('\xc3'); putc('\xbc'); as well. Perhaps this is what C++ streams do here.

I just follow mike.dld's answer in this question, and add the printf support for the UTF-8 string.
As mkluwe mentioned in his answer that by default, printf function will output to the console one by one byte, while the console can't handle single byte correctly. My method is quite simple, I use the snprintf function to print the whole content to a internal string buffer, and then dump the buffer to std::cout.
Here is the full testing code:
#include <iostream>
#include <locale>
#include <windows.h>
#include <cstdlib>
using namespace std;
// https://stackoverflow.com/questions/4358870/convert-wstring-to-string-encoded-in-utf-8
#include <codecvt>
#include <string>
// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str)
{
std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
return myconv.from_bytes(str);
}
// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str)
{
std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
return myconv.to_bytes(str);
}
// https://stackoverflow.com/questions/1660492/utf-8-output-on-windows-console
// mike.dld's answer
class ConsoleStreamBufWin32 : public std::streambuf
{
public:
ConsoleStreamBufWin32(DWORD handleId, bool isInput);
protected:
// std::basic_streambuf
virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
virtual int sync();
virtual int_type underflow();
virtual int_type overflow(int_type c = traits_type::eof());
private:
HANDLE const m_handle;
bool const m_isInput;
std::string m_buffer;
};
ConsoleStreamBufWin32::ConsoleStreamBufWin32(DWORD handleId, bool isInput) :
m_handle(::GetStdHandle(handleId)),
m_isInput(isInput),
m_buffer()
{
if (m_isInput)
{
setg(0, 0, 0);
}
}
std::streambuf* ConsoleStreamBufWin32::setbuf(char_type* /*s*/, std::streamsize /*n*/)
{
return 0;
}
int ConsoleStreamBufWin32::sync()
{
if (m_isInput)
{
::FlushConsoleInputBuffer(m_handle);
setg(0, 0, 0);
}
else
{
if (m_buffer.empty())
{
return 0;
}
std::wstring const wideBuffer = utf8_to_wstring(m_buffer);
DWORD writtenSize;
::WriteConsoleW(m_handle, wideBuffer.c_str(), wideBuffer.size(), &writtenSize, NULL);
}
m_buffer.clear();
return 0;
}
ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::underflow()
{
if (!m_isInput)
{
return traits_type::eof();
}
if (gptr() >= egptr())
{
wchar_t wideBuffer[128];
DWORD readSize;
if (!::ReadConsoleW(m_handle, wideBuffer, ARRAYSIZE(wideBuffer) - 1, &readSize, NULL))
{
return traits_type::eof();
}
wideBuffer[readSize] = L'\0';
m_buffer = wstring_to_utf8(wideBuffer);
setg(&m_buffer[0], &m_buffer[0], &m_buffer[0] + m_buffer.size());
if (gptr() >= egptr())
{
return traits_type::eof();
}
}
return sgetc();
}
ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::overflow(int_type c)
{
if (m_isInput)
{
return traits_type::eof();
}
m_buffer += traits_type::to_char_type(c);
return traits_type::not_eof(c);
}
template<typename StreamT>
inline void FixStdStream(DWORD handleId, bool isInput, StreamT& stream)
{
if (::GetFileType(::GetStdHandle(handleId)) == FILE_TYPE_CHAR)
{
stream.rdbuf(new ConsoleStreamBufWin32(handleId, isInput));
}
}
// some code are from this blog
// https://blog.csdn.net/witton/article/details/108087135
#define printf(fmt, ...) __fprint(stdout, fmt, ##__VA_ARGS__ )
int __vfprint(FILE *fp, const char *fmt, va_list va)
{
// https://stackoverflow.com/questions/7315936/which-of-sprintf-snprintf-is-more-secure
size_t nbytes = snprintf(NULL, 0, fmt, va) + 1; /* +1 for the '\0' */
char *str = (char*)malloc(nbytes);
snprintf(str, nbytes, fmt, va);
std::cout << str;
free(str);
return nbytes;
}
int __fprint(FILE *fp, const char *fmt, ...)
{
va_list va;
va_start(va, fmt);
int n = __vfprint(fp, fmt, va);
va_end(va);
return n;
}
int main()
{
FixStdStream(STD_INPUT_HANDLE, true, std::cin);
FixStdStream(STD_OUTPUT_HANDLE, false, std::cout);
FixStdStream(STD_ERROR_HANDLE, false, std::cerr);
// ...
std::cout << "\xc3\xbc" << std::endl;
printf("\xc3\xbc");
// ...
return 0;
}
The source code is saved in UTF-8 format, and build under Msys2's GCC and run under Windows 7 64bit. Here is the result
ü
ü

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ iostream UTF-16 file I/O with CR LF translation - c++

Related

Can't read file with cyrillic path in C++

Writing C++ program compressing/decompressing data

Can anyone explain why my crypto++ decrypted file is 16 bytes short?

std::ostream interface to an OLE IStream

UTF-8 output on Windows console

Categories

Resources