Read Unicode files C++

I have a simple question to ask. I have a UTF-16 text file to read which starts with FFFE. What are the C++ tools to deal with this kind of file? I just want to read it, filter some lines, and display the result.
It looks simple, but I only have experience working with plain ASCII files and I'm in a hurry. I'm using VS C++, but I don't want to work with managed C++.
Regards
Here I put a very simple example
wifstream file;
file.open("C:\\appLog.txt", ios::in);
wchar_t buffer[2048];
file.seekg(2);
file.getline(buffer, bSize-1);
wprintf(L"%s\n", buffer);
file.close();

You can use fgetws, which reads 16-bit characters (on Windows, where wchar_t is 16 bits wide). Your file is in little-endian byte order. Since x86 machines are also little-endian, you should be able to handle the file without much trouble. When you want to do output, use fwprintf.
Also, I agree more information could be useful. For instance, you may be using a library that abstracts away some of this.

Since you are in a hurry, use ifstream in binary mode and do your job. I had the same problem as you and this saved my day. (It is not a recommended solution, of course; it's just a hack.)
ifstream file;
file.open("k:/test.txt", ifstream::in|ifstream::binary);
wchar_t buffer[2048] = {0}; // zero-filled so the text read below stays null-terminated
file.seekg(2); // skip the 2-byte BOM
file.read((char*)buffer, line_length); // line_length: byte count to read, must be less than sizeof(buffer)
wprintf(L"%s\n", buffer);
file.close();

For what it's worth, I think I've read you have to use a Microsoft function which allows you to specify the encoding.
http://msdn.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx

The FFFE is just the initial BOM (byte order mark). Just read from the file like you normally do, but into a wide char buffer.
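For anyone who wants the binary-mode approach spelled out, here is a minimal sketch. It assumes a C++11 compiler (for std::u16string), which is newer than the Visual Studio of that era, and the helper names decode_utf16le and read_utf16le_lines are made up for illustration. It reads the raw bytes, skips the FF FE mark if present, and assembles little-endian 16-bit code units by hand, so it does not depend on the size of wchar_t.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Decode raw UTF-16LE bytes into 16-bit code units, skipping a leading
// FF FE byte order mark if present. A trailing odd byte is ignored.
std::u16string decode_utf16le(const std::string& bytes)
{
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE)
        i = 2;

    std::u16string out;
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Read a whole file in binary mode and split the decoded text into lines.
// If the file cannot be opened, the result is simply empty.
std::vector<std::u16string> read_utf16le_lines(const char* path)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());
    const std::u16string text = decode_utf16le(bytes);

    std::vector<std::u16string> lines;
    std::u16string current;
    for (char16_t c : text) {
        if (c == u'\n') {
            lines.push_back(current);
            current.clear();
        } else if (c != u'\r') {   // drop the CR of a CR LF pair
            current.push_back(c);
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Filtering is then an ordinary loop over the returned vector, for example keeping only the lines where line.find(u"ERROR") is not std::u16string::npos.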

Related

Read and write image data C++

I've just started learning C++, and I'm working on a program that is supposed to grab an image from the hard disk and then save it under another name. The original image should still remain. I've got it working with text files, because with those I can just do it like this:
ifstream fin("C:\\test.txt");
ofstream fout("C:\\new.txt");
char ch;
while(fin.get(ch)) // stop as soon as a read fails, so the last char is not written twice
{
fout.put(ch);
}
fin.close();
fout.close();
But I suppose that it's not like this with images. Do I have to install a lib or something like that to get it to work? Or can I "just" use the included libraries? I know I'm not really an expert in C++, so please tell me if I'm totally wrong.
I hope someone can and wants to help me! Thanks in advance!
Btw, the image is a .png format.
You can use the std streams, but use the ios::binary argument when you open the stream. It's well documented and there are several examples around the internet.
You are apparently using MS Windows: Windows distinguishes between "text" and "binary" files by different handling of line separators. For a binary file, you do not want it to translate \r\n to \n on reading. To prevent it, use the ios::binary mode when opening the file, as @Emil tells you.
BTW, you do not have to use \\ in paths under windows. Just use forward slashes:
ifstream fin("C:/test.txt");
This worked even back in WWII using MS-DOS.
If the goal is just to copy a file, then CopyFile is probably a better choice than doing it manually.
#include <Windows.h>
// ...
BOOL const copySuccess = CopyFile("source.png", "dest.png", failIfExists);
// TODO: handle errors.
If using the Windows API is not an option, then copying a file one char at a time like you have done is a very inefficient way of doing this. As others have noted, you need to open the files as binary to avoid the I/O layer messing with line endings. A simpler and more efficient way than one char at a time is this:
#include <fstream>
// ...
std::ifstream fin("source.png", std::ios::binary);
std::ofstream fout("dest.png", std::ios::binary);
// TODO: handle errors.
fout << fin.rdbuf();

Can't read unicode (japanese) from a file

Hi, I have a file containing Japanese text, saved as a Unicode file.
I need to read from the file and display the information on the standard output.
I am using Visual Studio 2008.
int main()
{
wstring line;
wifstream myfile("D:\sample.txt"); //file containing japanese characters, saved as unicode file
//myfile.imbue(locale("Japanese_Japan"));
if(!myfile)
cout<<"While opening a file an error is encountered"<<endl;
else
cout << "File is successfully opened" << endl;
//wcout.imbue (locale("Japanese_Japan"));
while ( myfile.good() )
{
getline(myfile,line);
wcout << line << endl;
}
myfile.close();
system("PAUSE");
return 0;
}
This program generates some random output and I don't see any japanese text on the screen.
Oh boy. Welcome to the Fun, Fun world of character encodings.
The first thing you need to know is that your console is not unicode on windows. The only way you'll ever see Japanese characters in a console application is if you set your non-unicode (ANSI) locale to Japanese. Which will also make backslashes look like yen symbols and break paths containing european accented characters for programs using the ANSI Windows API (which was supposed to have been deprecated when Windows XP came around, but people still use to this day...)
So first thing you'll want to do is build a GUI program instead. But I'll leave that as an exercise to the interested reader.
Second, there are a lot of ways to represent text. You first need to figure out the encoding in use. Is it UTF-8? UTF-16 (and if so, little or big endian?) Shift-JIS? EUC-JP? You can only use a wstream to read directly if the file is in little-endian UTF-16. And even then you need to futz with its internal buffer. Anything other than UTF-16 and you'll get unreadable junk. And this is all only the case on Windows as well! Other OSes may have a different wstream representation. It's best not to use wstreams at all really.
So, let's assume it's not UTF-16 (for full generality). In this case you must read it as a char stream - not using a wstream. You must then convert this character string into UTF-16 (assuming you're using windows! Other OSes tend to use UTF-8 char*s). On windows this can be done with MultiByteToWideChar. Make sure you pass in the right code page value, and CP_ACP or CP_OEMCP are almost always the wrong answer.
Now, you may be wondering how to determine which code page (ie, character encoding) is correct. The short answer is you don't. There is no prima facie way of looking at a text string and saying which encoding it is. Sure, there may be hints - eg, if you see a byte order mark, chances are it's whatever variant of unicode makes that mark. But in general, you have to be told by the user, or make an attempt to guess, relying on the user to correct you if you're wrong, or you have to select a fixed character set and don't attempt to support any others.
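To make the "byte order mark as a hint" idea concrete, here is a small sketch of a BOM check on the first few bytes of a file. The Bom enum and the detect_bom name are invented for this example, and it assumes a C++11 compiler. It only recognises the UTF-8 and UTF-16 marks; a missing mark tells you nothing, so Bom::None still means "ask the user or guess".

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file for a byte order mark.
// Note: a UTF-32LE file starts with FF FE 00 00 and would be reported
// as Utf16LE here; this sketch does not try to tell the two apart.
Bom detect_bom(const std::string& bytes)
{
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };

    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;   // no mark: the encoding must come from somewhere else
}
```

In practice you would read the first three bytes of the file in binary mode, pass them in, and fall back to a user-supplied code page when the result is Bom::None.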
Someone here had the same problem with Russian characters (he's using basic_ifstream<wchar_t>, which should be the same as wifstream according to this page). In the comments of that question they also link to this, which should help you further.
If I understood everything correctly, it seems that wifstream reads the characters correctly but your program tries to convert them to whatever locale your program is running in.
Two errors:
std::wifstream(L"D:\\sample.txt");
And do not mix cout and wcout.
Also check that your file is encoded in UTF-16, Little-Endian. If not so, you will be in trouble reading it.
wfstream uses wfilebuf for the actual reading and writing of the data. wfilebuf defaults to using a char buffer internally which means that the text in the file is assumed narrow, and converted to wide before you see it. Since the text was actually wide, you get a mess.
The solution is to replace the wfilebuf buffer with a wide one.
You probably also need to open the file as binary.
const size_t bufsize = 128;
wchar_t buffer[bufsize];
wifstream myfile("D:\\sample.txt", ios::binary);
myfile.rdbuf()->pubsetbuf(buffer, bufsize);
Make sure the buffer outlives the stream object!
See details here: http://msdn.microsoft.com/en-us/library/tzf8k3z8(v=VS.80).aspx

Reading bmp file for steganography

I am trying to read a BMP file in C++ (Turbo), but I'm not able to print the binary stream.
I want to encode a txt file into it and decrypt it.
How can I do this? I read that the BMP file header is 54 bytes. But how and where should I append the txt file in the BMP file?
I only know Turbo C++, so it would be helpful if you could provide a solution or suggestion related to the topic for the same.
int main()
{
ifstream fr; //reads
ofstream fw; // writes to file
char c;
int random;
clrscr();
char file[2][100]={"s.bmp","s.txt"};
fr.open(file[0],ios::binary);//file name, mode of open, here input mode i.e. read only
if(!fr)
cout<<"File can not be opened.";
fw.open(file[1],ios::app);//file will be appended
if(!fw)
cout<<"File can not be opened";
while(!fr)
cout<<fr.get(); // error should be here. but not able to find out what error is it
fr.close();
fw.close();
getch();
}
This code runs fine when I pass a txt file in binary mode.
EDIT :
while(!fr)
cout<<fr.get();
I am not able to see the binary data in the console.
This was working fine for text when I was passing a character parameter in fr.get(c).
I think your question is already answered:
Print an int in binary representation using C
convert your char to an int and you are done (at least for the output part)
With steganography, what little I know about it, you're not "appending" text. You're making subtle changes to the pixels (shading, etc.) to hide something that's not visually obvious, but can be recovered by examining the pixels. It should not have anything to do with the header.
So anyway, the point of my otherwise non-helpful answer is to encourage you to go and learn about the topic on which you seek answers, so that you can design your solution, and THEN come and ask for specifics about implementation.
You need to modify the bit pattern, not append any text to the file.
One simple example:
Read the bitmap content (after the header), and sacrifice one bit from each byte to hold your content.
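A rough sketch of that "sacrifice one bit per byte" idea, written for a C++11 compiler rather than Turbo C++. The function names embed_lsb and extract_lsb are made up for this example, and pixels is assumed to already hold the bytes that follow the BMP header. Each message bit replaces the least significant bit of one pixel byte, most significant message bit first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hide each bit of `message` in the least significant bit of one pixel byte.
// Returns false (and changes nothing) if there are fewer than 8 pixel bytes
// per message character.
bool embed_lsb(std::vector<std::uint8_t>& pixels, const std::string& message)
{
    if (pixels.size() < message.size() * 8) return false;
    std::size_t p = 0;
    for (unsigned char ch : message)
        for (int bit = 7; bit >= 0; --bit, ++p)
            pixels[p] = static_cast<std::uint8_t>((pixels[p] & 0xFE) | ((ch >> bit) & 1));
    return true;
}

// Rebuild `length` characters from the low bits of the pixel bytes.
// Stops early if the pixel data runs out.
std::string extract_lsb(const std::vector<std::uint8_t>& pixels, std::size_t length)
{
    std::string message;
    for (std::size_t i = 0; i < length && (i + 1) * 8 <= pixels.size(); ++i) {
        unsigned char ch = 0;
        for (int bit = 0; bit < 8; ++bit)
            ch = static_cast<unsigned char>((ch << 1) | (pixels[i * 8 + bit] & 1));
        message.push_back(static_cast<char>(ch));
    }
    return message;
}
```

The receiver has to know the message length somehow; a real scheme would usually hide the length in the first few pixel bytes as well, which this sketch leaves out.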
If on Windows, recode to use CreateFile and see what the real error is. If on Linux, ditto for open(2). Once you have debugged the problem you can probably shift back to iostreams.

UCS-2LE text file parsing

I have a text file which was created using some Microsoft reporting tool. The text file includes the BOM 0xFFFE in the beginning and then ASCII character output with nulls between characters (i.e "F.i.e.l.d.1."). I can use iconv to convert this to UTF-8 using UCS-2LE as an input format and UTF-8 as an output format... it works great.
My problem is that I want to read in lines from the UCS-2LE file into strings and parse out the field values and then write them out to an ASCII text file (i.e. Field1 Field2). I have tried the string and wstring-based versions of getline; while they read the string from the file, functions like substr(start, length) interpret the string as 8-bit values, so the start and length values are off.
How do I read the UCS-2LE data into a C++ String and extract the data values? I have looked at boost and icu as well as numerous google searches but have not found anything that works. What am I missing here? Please help!
My example code looks like this:
wifstream srcFile;
srcFile.open(argv[1], ios_base::in | ios_base::binary);
..
..
wstring srcBuf;
..
..
while( getline(srcFile, srcBuf) )
{
wstring field1;
field1 = srcBuf.substr(12, 12);
...
...
}
So, if, for example, srcBuf contains "W.e. t.h.i.n.k. i.n. g.e.n.e.r.a.l.i.t.i.e.s." then the substr() above returns ".k. i.n. g.e" instead of "g.e.n.e.r.a.l.i.t.i.e.s.".
What I want is to read in the string and process it without having to worry about the multi-byte representation. Does anybody have an example of using boost (or something else) to read these strings from the file and convert them to a fixed width representation for internal use?
BTW, I am on a Mac using Eclipse and gcc.. Is it possible my STL does not understand wide character strings?
Thanks!
Having spent some good hours tackling this question, here are my conclusions:
Reading an UTF-16 (or UCS2-LE) file is apparently manageable in C++11, see How do I write a UTF-8 encoded string to a file in Windows, in C++
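Since C++11 the standard library provides codecvt_utf16 in the <codecvt> header, so one can just use that without Boost.Locale, which is a separate library and not part of the standard (see bullet below for eventual code samples). Note that <codecvt> was deprecated in C++17.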
However, in older compilers (e.g. MSVC 2008), you can use locale and a custom codecvt facet/"recipe", as very nicely exemplified in this answer to Writing UTF16 to file in binary mode
Alternatively, one can also try this method of reading, though it did not work in my case. The output would be missing lines which were replaced by garbage chars.
I wasn't able to get this done in my pre-C++11 compiler and had to resort to scripting it in Ruby and spawning a process (it's just in test, so I think that kind of complication is OK there) to execute my task.
Hope this spares others some time, happy to help.
substr works fine for me on Linux with g++ 4.3.3. The program
#include <string>
#include <iostream>
using namespace std;
int main()
{
wstring s1 = L"Hello, world";
wstring s2 = s1.substr(3,5);
wcout << s2 << endl;
}
prints "lo, w" as it should.
However, the file reading probably does something different from what you expect. It converts the file from the locale encoding to wchar_t, which results in each byte becoming its own wchar_t. I don't think the standard library supports reading UTF-16 into wchar_t.
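If the standard streams will not do the conversion for you, one workaround is to read the file as raw bytes and convert it yourself. The following is a sketch, assuming a C++11 compiler; utf16le_to_utf8 and append_utf8 are hypothetical helper names, not library functions. It turns UTF-16LE bytes into a UTF-8 std::string, so for files that only contain ASCII characters substr(start, length) then counts characters as expected. Unpaired surrogates are not validated and are passed through as-is.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Convert raw UTF-16LE bytes (optionally starting with the FF FE mark)
// into a UTF-8 string. A trailing odd byte is ignored.
std::string utf16le_to_utf8(const std::string& bytes)
{
    // Assemble the little-endian 16-bit code unit that starts at byte i.
    auto unit = [&](std::size_t i) -> std::uint32_t {
        const std::uint32_t lo = static_cast<unsigned char>(bytes[i]);
        const std::uint32_t hi = static_cast<unsigned char>(bytes[i + 1]);
        return lo | (hi << 8);
    };

    std::size_t i = 0;
    if (bytes.size() >= 2 && unit(0) == 0xFEFF) i = 2;   // skip the BOM

    std::string out;
    while (i + 1 < bytes.size()) {
        std::uint32_t cp = unit(i);
        i += 2;
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < bytes.size()) {
            const std::uint32_t lo = unit(i);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Read the file with a plain ifstream opened with ios_base::binary, pass the bytes through this function, and then split and substr the resulting std::string as usual.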

Parse config file in C/C++

I'm a newbie looking for a fast and easy way to parse a text file in C or C++ (wxWidgets)
The file will look something like this (A main category with "sub-objects") which will appear in a list box
[CategoryA]
[SubCat]
Str1 = Test
Str2 = Description
[SubCat] [End]
[SubCat]
Str1 = Othertest
...
[CategoryA] [End]
Any suggestions?
Sounds like you want to parse a file that's pretty close to an ini file.
There are at least a few INI parser libraries out there: minIni, iniParser, libini, for instance.
It should be fairly easy to write your own parser for this if you use streams. You can read a file using an std::ifstream:
std::ifstream ifs("filename.ext");
if(!ifs.good()) throw my_exceptions("cannot open file");
read_file(ifs);
Since it seems line-oriented, you would then first read lines, and then process these:
void read_file(std::istream& is)
{
for(;;) {
std::string line;
std::getline(is, line);
if(!is) break;
std::istringstream iss(line);
// read from iss
}
if(!is.eof()) throw my_exceptions("error reading file");
}
For the actual parsing, you could first peek at the first character. If that's a [, pop it from the stream, and use std::getline(is,identifier,']') to read whatever is within '[' and ']'. If it isn't a [, use std::getline(is, key, '=') to read the left side of a key-value pair, and then std::getline(is, value) to read the right side.
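The peek-then-getline steps above can be sketched roughly as follows. ParsedLine, trim and parse_line are names invented for this example, and the whitespace trimming is an addition that the description above does not mention. It works on one line at a time, which is the part marked "read from iss" in read_file.

```cpp
#include <sstream>
#include <string>

struct ParsedLine {
    bool is_section;     // true for "[Name]" lines
    std::string name;    // section name, or the key of a key-value pair
    std::string value;   // right-hand side; empty for sections
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parse a single line. Anything after the first "]" on a section line
// (such as a trailing "[End]") is ignored by this sketch.
ParsedLine parse_line(const std::string& line)
{
    ParsedLine result = { false, "", "" };
    std::istringstream iss(line);
    iss >> std::ws;                            // skip leading whitespace
    if (iss.peek() == '[') {
        iss.get();                             // pop the '['
        std::getline(iss, result.name, ']');   // text between '[' and ']'
        result.is_section = true;
    } else {
        std::getline(iss, result.name, '=');   // left side of the pair
        std::getline(iss, result.value);       // right side of the pair
    }
    result.name = trim(result.name);
    result.value = trim(result.value);
    return result;
}
```

Handling the "[SubCat] [End]" closing markers and building the category tree would sit on top of this, for example by keeping a stack of the currently open section names.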
Note: Stream input, unfortunately, is usually not exactly lightning fast. (This doesn't have to be that way, but in practice this often is.) However, it is really easy to do and it is fairly easy to do it right, once you know a very few patterns to work with its peculiarities (like if(strm.good()) not being the same as if(strm) and not being the opposite of if(strm.bad()) and a few other things you'll have to get used to). For something as performance-critical (har har!) as reading an ini file from disk, it should be fast enough in 999,999 out of 1,000,000 cases.
You may want to try Boost.Program_Options. However, it has slightly different formatting, closer to INI files. Subcategories are done like this:
[CategoryA]
Option = Data
[CategoryB.Subcategory1]
Option = Data
[CategoryB.Subcategory2]
Option = Data
Also it has some other features so it is actually very useful IMO.
Try Configurator. It's an easy-to-use and flexible C++ library for configuration file parsing (from the simplest INI to complex files with arbitrary nesting and semantic checking). Header-only and cross-platform. Uses the Boost C++ libraries.
See: http://opensource.dshevchenko.biz/configurator
It looks more straightforward to implement your own parser than to try to adapt an existing one you are unfamiliar with.
Your structure seems - from your example - to be line-based. This makes parsing it easy.
It generally makes sense to load your file into a tree, and then walk around it as necessary.
On Windows only, GetPrivateProfileSection does this. It's deprecated in favor of the registry but it's still here and it still works.
How about trying to make a simple XML file? There are plenty of libraries that can help you read it, and the added bonus is that a lot of other programs/languages can read it too.
If you're using wxWidgets I would consider wxFileConfig. I'm not using wxWidgets, but the class seems to support categories with sub-categories.
If you are using GTK, you are in luck.
You can use the GLib KeyFile functions save_to_file and load_from_file.
https://docs.gtk.org/glib/struct.KeyFile.html
Or when using Gtkmm (C++).
See: https://developer-old.gnome.org/glibmm/stable/classGlib_1_1KeyFile.html
Example in C++ with load_from_file:
#include <glibmm.h>
#include <string>
Glib::KeyFile keyfile;
keyfile.load_from_file(file_path);
std::string path = keyfile.get_string("General", "Path");
bool is_enabled = keyfile.get_boolean("General", "IsEnabled");
Saving is as easy as calling save_to_file:
Glib::KeyFile keyfile;
keyfile.set_string("General", "Path", path);
keyfile.set_boolean("General", "IsEnabled", is_enabled);
keyfile.save_to_file(file_path);