I am using zlib to compress data for a game I am making. Here is the code I have been using
#include <SFML/Graphics.hpp>
#include <Windows.h>
#include <fstream>
#include <iostream>
#include "zlib.h"
#include "zconf.h"
using namespace std;
void compress(Bytef* toWrite, int bufferSize, char* filename)
{
uLongf comprLen = compressBound(bufferSize);
Bytef* data = new Bytef[comprLen];
compress(data, &comprLen, &toWrite[0], bufferSize);
ofstream file(filename);
file.write((char*) data, comprLen);
file.close();
cout<<comprLen;
}
int main()
{
const int X_BLOCKS = 1700;
const int Y_BLOCKS = 19;
int bufferSize = X_BLOCKS * Y_BLOCKS;
Bytef world[X_BLOCKS][Y_BLOCKS];
//fill world with integer values
compress(&world[0][0], bufferSize, "Level.lvl");
while(2);
return EXIT_SUCCESS;
}
Now I would have expected the program to simply compress the array world and save it to a file. However, I noticed a weird behavior. When I printed the value of 'comprLen', it was a different length than the created file. I couldn't understand where the extra bytes in the file were coming from.
You need to open the file in binary mode:
std::ofstream file(filename, std::ios_base::binary);
Without the std::ios_base::binary flag, the system replaces end-of-line characters (\n) with the platform's end-of-line sequence (\r\n on Windows). Suppressing this conversion is the only purpose of the std::ios_base::binary flag.
Note that the conversion is made on the bytes written to the stream. That is, the number of actually written bytes will increase compared to the second argument to write(). Also note, that you need to make sure that you are using the "C" locale rather than some locale with a non-trivial code conversion facet (since you don't explicitly set the global std::locale in your code you should get the default which is the "C" locale).
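As a minimal sketch of the fix, the helper below writes a buffer through a stream opened with std::ios_base::binary and then reports the size of the file on disk. The name write_binary and its return convention are made up for illustration and are not part of the question's code. On Windows, dropping the flag would make the file one byte longer for every \n in the buffer; on POSIX systems the flag has no visible effect.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the bytes unchanged and returns the resulting
// file size, or -1 if the file could not be written or reopened.
long long write_binary(const std::string& filename,
                       const std::vector<unsigned char>& data)
{
    {
        std::ofstream file(filename, std::ios_base::binary);
        if (!file) return -1;
        file.write(reinterpret_cast<const char*>(data.data()),
                   static_cast<std::streamsize>(data.size()));
        if (!file) return -1;
    } // the stream is flushed and closed here

    // Reopen positioned at the end to measure what was actually stored.
    std::ifstream check(filename, std::ios_base::binary | std::ios_base::ate);
    if (!check) return -1;
    const std::streamoff size = check.tellg();
    return static_cast<long long>(size);
}
```

With this, the value returned for the compressed buffer should match the comprLen reported by zlib.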
Related
How can I read a Unicode (UTF-8) file into wstring(s) on the Windows platform?
With C++11 support, you can use std::codecvt_utf8 facet which encapsulates conversion between a UTF-8 encoded byte string and UCS2 or UCS4 character string and which can be used to read and write UTF-8 files, both text and binary.
To use a facet, you usually create a locale object, which encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. Once you have a locale object, you can imbue your stream buffer with it:
#include <sstream>
#include <fstream>
#include <codecvt>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
which can be used like this:
std::wstring wstr = readFile("a.txt");
Alternatively you can set the global C++ locale before you work with string streams which causes all future calls to the std::locale default constructor to return a copy of the global C++ locale (you don't need to explicitly imbue stream buffers with it then):
std::locale::global(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
According to a comment by @Hans Passant, the simplest way is to use _wfopen_s. Open the file with mode "rt, ccs=UTF-8".
Here is another pure C++ solution that works at least with VC++ 2010:
#include <locale>
#include <codecvt>
#include <string>
#include <fstream>
#include <cstdlib>
int main() {
const std::locale empty_locale = std::locale::empty();
typedef std::codecvt_utf8<wchar_t> converter_type;
const converter_type* converter = new converter_type;
const std::locale utf8_locale = std::locale(empty_locale, converter);
std::wifstream stream(L"test.txt");
stream.imbue(utf8_locale);
std::wstring line;
std::getline(stream, line);
std::system("pause");
}
Except for locale::empty() (here locale::global() might work as well) and the wchar_t* overload of the basic_ifstream constructor, this should even be pretty standard-compliant (where “standard” means C++0x, of course).
Here's a platform-specific function for Windows only:
size_t GetSizeOfFile(const std::wstring& path)
{
struct _stat fileinfo;
_wstat(path.c_str(), &fileinfo);
return fileinfo.st_size;
}
std::wstring LoadUtf8FileToString(const std::wstring& filename)
{
std::wstring buffer; // stores file contents
FILE* f = _wfopen(filename.c_str(), L"rtS, ccs=UTF-8");
// Failed to open file
if (f == NULL)
{
// ...handle some error...
return buffer;
}
size_t filesize = GetSizeOfFile(filename);
// Read entire file contents in to memory
if (filesize > 0)
{
buffer.resize(filesize);
size_t wchars_read = fread(&(buffer.front()), sizeof(wchar_t), filesize, f);
buffer.resize(wchars_read);
buffer.shrink_to_fit();
}
fclose(f);
return buffer;
}
Use like so:
std::wstring mytext = LoadUtf8FileToString(L"C:\\MyUtf8File.txt");
Note that the entire file is loaded into memory, so you might not want to use it for very large files.
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <cstdlib>
int main()
{
std::wifstream wif("filename.txt");
wif.imbue(std::locale("zh_CN.UTF-8"));
std::wcout.imbue(std::locale("zh_CN.UTF-8"));
std::wcout << wif.rdbuf();
}
This question was addressed in "Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI". In sum, on Windows wchar_t is 16 bits wide, so std::wstring holds UTF-16 code units; UTF-16 is the successor of UCS-2, which is a strictly two-byte encoding. Arabic lies in the Basic Multilingual Plane, so each Arabic character fits in a single two-byte unit.
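To make the two-byte point concrete, here is a small sketch that counts code points in a UTF-16 string. It uses std::u16string rather than std::wstring so that it behaves the same on every platform, and the function name count_code_points is hypothetical. It assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in well-formed UTF-16 text.
// A character outside the Basic Multilingual Plane is stored as a
// surrogate pair, so the second half of each pair must not be counted.
std::size_t count_code_points(const std::u16string& text)
{
    std::size_t n = 0;
    for (char16_t unit : text) {
        // Low surrogates occupy 0xDC00..0xDFFF.
        if (unit < 0xDC00 || unit > 0xDFFF) ++n;
    }
    return n;
}
```

An Arabic word such as u"\u0633\u0644\u0627\u0645" has four code units and four code points, while u"\U0001F600" has two code units but only one code point.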
I recently dealt with all of these encodings and solved it this way. It is better to use std::u32string, as its character size is the same on all platforms, and most fonts work with the UTF-32 format. (The file itself should still be in UTF-8.)
std::u32string readFile(std::string filename) {
std::basic_ifstream<char32_t> fin(filename);
std::u32string str{};
std::getline(fin, str, U'\0');
return str;
}
For this approach to work across platforms when you need to read a file piece by piece, use only the getline function to move between lines (or to find a certain character). Remember to pass a separator: without one, the function throws std::bad_cast. You can save a line's position with tellg and return to it with seekg. Don't move between individual characters in the stream; use substr on the string you have read instead.
All other methods of reading files in the standard library that I have found are not able to work adequately with files with dynamic character sizes.
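If the char32_t stream is not usable on a given toolchain, one alternative is to read the raw bytes and decode the UTF-8 by hand. The sketch below is not the method from this answer; decode_utf8 and read_utf8_file are hypothetical names, and the decoder is deliberately simplified: malformed or truncated sequences become U+FFFD, overlong encodings and surrogate values are not rejected, and a leading byte order mark is not stripped.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points (simplified, see the notes above).
std::u32string decode_utf8(const std::string& bytes)
{
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }

        // All continuation bytes must exist and have the form 10xxxxxx.
        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(replacement); ++i; }
    }
    return out;
}

// Reads the whole file as bytes and decodes it; returns an empty string
// if the file cannot be opened.
std::u32string read_utf8_file(const std::string& filename)
{
    std::ifstream in(filename, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return decode_utf8(bytes);
}
```

Because the result is a std::u32string, indexing and substr operate on whole code points, which is the property the answer relies on.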
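This is a bit raw, but how about reading the file as plain old bytes and then casting the byte buffer to wchar_t*? Note that this only gives sensible text if the file is already stored as wchar_t code units (for example UTF-16 on Windows); it does not convert UTF-8.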
Something like:
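#include <fstream>
#include <string>
std::wstring ReadFileIntoWstring(const std::wstring& filepath)
{
// Opening a stream with a wchar_t path is an MSVC extension.
std::ifstream file(filepath.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
if (!file)
return std::wstring();
const size_t size = static_cast<size_t>(file.tellg());
file.seekg(0, std::ios::beg);
// The bytes are copied as-is, so the file must already hold wchar_t code units
// (for example UTF-16LE on Windows); UTF-8 content is not converted.
std::wstring wstr(size / sizeof(wchar_t), L'\0');
// Read straight into the string so the length is known; the data is not null-terminated.
file.read(reinterpret_cast<char*>(&wstr[0]), wstr.size() * sizeof(wchar_t));
return wstr;
}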
I am new to Linux programming in C++, and I am trying to write the contents of a binary file (.dll, .exe, etc.) to a .txt file to test and see the results of the operation. The code runs and writes some of the binary data into the .txt file, but when I open the file the full binary content is not there. As far as I know, the problem is due to invalid Unicode.
Text output when opening the .txt file:
MZ\90\00\00\00\00\00
And here is the code i am using (reproducible example):
#include <algorithm>
#include <array>
#include <chrono>
#include <cstring>
#include <fstream>
#include <functional>
#include <iostream>
#include <memory>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>
std::vector<char> buffer;
bool read_file(std::string name, std::vector<char>& out)
{
std::ifstream file(name.c_str(), std::ios::binary);
if (!file.good())
{
return false;
}
file.unsetf(std::ios::skipws);
file.seekg(0, std::ios::end);
const size_t size = file.tellg();
file.seekg(0, std::ios::beg);
out.resize(size);
file.read(out.data(), size);
file.close();
return true;
}
void write_text_to_log_file(char* text)
{
std::ofstream log_file("log_file.txt", std::ios_base::out | std::ios_base::app );
log_file.write(text, sizeof(text));
}
int main(int argc, char* argv[])
{
read_file("bin.dll", buffer);
printf("Image Array: %s\r\n", buffer.data());
printf("Image Size: %zu\r\n", buffer.size());
write_text_to_log_file(buffer.data());
}
Any help is appreciated. I am trying to do exactly the same as PHP's file_get_contents and then write the file from the raw binary buffer, for example writing the raw binary out as a .dll, .exe, .png, etc.
log_file.write(text, sizeof(text));
sizeof is a compile time constant that gives you the size of the object. text is a char *, so this gives you a grand total of 4 or 8, depending on whether you compiled a 32bit or a 64bit binary. It doesn't matter whether text points to just a few bytes, or the entire contents of "Harry Potter And The Deathly Hallows". This sizeof will always produce either a 4 or an 8 for you, no matter what's in text.
You need to pass an additional parameter here that comes from the buffer.size() of the std::vector where the data is stored, and use that here. sizeof() is not the same thing as a method of std::vector that's called "size".
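A minimal sketch of that change follows. The function name write_bytes_to_file is hypothetical, and two details are assumptions beyond what the answer states: the file is opened in binary mode so the bytes are stored unchanged on every platform, and it is truncated rather than appended to so the output is an exact copy.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch of the corrected writer: the caller passes the byte count explicitly
// instead of relying on sizeof, which only measures the pointer.
bool write_bytes_to_file(const std::string& path, const char* data, std::size_t size)
{
    // Binary mode and truncation are additions made for this sketch.
    std::ofstream out(path, std::ios_base::binary | std::ios_base::trunc);
    if (!out) return false;
    out.write(data, static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}
```

With the question's code, the call would become write_bytes_to_file("log_file.txt", buffer.data(), buffer.size()).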
Question
I have a few structures I want to write to a binary file. They consist of integers from cstdint, for example uint64_t. Is there a way to write those to a binary file that does not involve me manually splitting them into arrays of char and using the fstream.write() functions?
What I've tried
My naive idea was that c++ would figure out that I have a file in binary mode and << would write the integers to that binary file. So I tried this:
#include <iostream>
#include <fstream>
#include <cstdint>
using namespace std;
int main() {
fstream file;
uint64_t myuint = 0xFFFF;
file.open("test.bin", ios::app | ios::binary);
file << myuint;
file.close();
return 0;
}
However, this wrote the string "65535" to the file.
Can I somehow tell the fstream to switch to binary mode, like how I can change the display format with << std::hex?
Failing all that above I'd need a function that turns arbitrary cstdint types into char arrays.
I'm not really concerned about endianness, as I'd use the same program to also read those (in a next step), so it would cancel out.
Yes you can, this is what std::fstream::write is for:
#include <iostream>
#include <fstream>
#include <cstdint>
int main() {
std::fstream file;
uint64_t myuint = 0xFFFF;
file.open("test.bin", std::ios::app | std::ios::binary);
file.write(reinterpret_cast<const char*>(&myuint), sizeof(myuint)); // write() takes a const char* and a byte count
}
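Since the question mentions whole structures and reading them back with the same program, here is a hedged sketch of a round trip. The helpers write_raw and read_raw and the Record structure are made up for illustration. The approach copies the object representation as-is, so it assumes trivially copyable types and that the reader runs on the same platform with the same compiler (same endianness and padding).

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <type_traits>

// Hypothetical helper: writes the raw bytes of a trivially copyable value.
template <typename T>
bool write_raw(std::ostream& out, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw I/O needs a trivially copyable type");
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the raw bytes back into a value of the same type.
template <typename T>
bool read_raw(std::istream& in, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw I/O needs a trivially copyable type");
    in.read(reinterpret_cast<char*>(&value), sizeof(value));
    return static_cast<bool>(in);
}

// Illustrative structure made only of fixed-width integers.
struct Record {
    std::uint64_t id;
    std::uint32_t flags;
    std::uint32_t count;
};
```

Both streams should be opened with std::ios::binary, for example std::ofstream out("test.bin", std::ios::binary) followed by write_raw(out, record).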
I am using the filtering_istream type from 'boost/iostreams/filtering_stream.hpp' to hold the decompressed contents of a file. But I want to cast it to the ifstream type. Is there any way to do it? Many thanks!
The code is as follows:
#include <istream>
#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
int main(){
std::ifstream file("test_data.dat.gz");
boost::iostreams::filtering_istream in;
in.push(boost::iostreams::gzip_decompressor());
in.push(file);
/* add code to convert filtering_istream 'in' into ifstream 'pfile' */
/* It seems that the following code returns a pointer NULL */
// std::ifstream* pfile = in.component<std::ifstream>(1);
return 0;
}
After trying boost::ref and boost::wrapper proposed by zett42, the ifstream really works. The only problem is that it doesn't give the phrases wanted.
In my text of .gz file, I wrote:
THIS IS A DATA FILE!
8 plus 8 is 16
But using the ifstream, I got:
is_open: 1
\213<\373Xtest_data.dat\361\360V"G\307G7OWE.\205\202\234\322b\205\314bC3.\327+>\314$
I am not sure what happened here, and can I do something to recover it?
From the reference of filtering_stream:
filtering_stream derives from std::basic_istream, std::basic_ostream
or std::basic_iostream, depending on its Mode parameter.
So no, you can't cast a filtering_stream directly to an ifstream because there is no inheritance relationship between the two.
What you can do instead, if your filter chain ends with a device that is an ifstream, is grab that device by calling filtering_stream::component(). For streams this function returns a boost::iostreams::detail::mode_adapter (you can see the type by calling in.component_type(1)).
It's probably not a good idea to depend on an internal Boost type (indicated by the namespace "detail"), which could change with the next Boost version, so one workaround is to use boost::reference_wrapper instead.
#include <iostream>
#include <istream>
#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/core/ref.hpp>
int main(){
std::ifstream file("test_data.dat.gz", std::ios_base::in | std::ios_base::binary); // compressed data must be read in binary mode
boost::iostreams::filtering_istream in;
in.push(boost::iostreams::gzip_decompressor());
in.push(boost::ref(file));
if( auto pfile = in.component<boost::reference_wrapper<std::ifstream>>( 1 ) )
{
std::ifstream& rfile = *pfile;
std::cout << "is_open: " << rfile.is_open() << "\n";
}
}