How to read UTF-8 file data in C++?

I have a list of IPA (UTF-8) symbols in a text file called ipa.txt, each with a number assigned to it. How do I cross-reference it with a source file (also a text file) that contains a list of words and their corresponding IPA, so that I produce one text file per name, named after that name, containing the numbers that correspond to its IPA symbols?
Below is what I've tried, but it didn't work: the output was mostly 000000.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>

int main()
{
    std::unordered_map<wchar_t, int> map;
    std::wifstream file;
    file.open("ipa.txt");
    if (file.is_open()) {
        std::cout << "opened ipa file";
    }
    wchar_t from;
    int to;
    while (file >> from >> to) {
        map.insert(std::make_pair(from, to));
    }
    std::wifstream outfile;
    outfile.open("source.txt");
    if (outfile.is_open()) {
        std::cout << "opened source file";
    }
    std::wstring id;
    std::wstring name;
    while (outfile >> id >> name) {
        std::ofstream outputfile;
        outputfile.open(id + L".txt");
        // map[c] inserts (and returns) 0 for any character missing from ipa.txt
        for (wchar_t c : name) outputfile << map[c];
    }
    system("pause");
    return 0;
}

I believe you are using the wrong type for c in the loop over name. Since c is used as the key for the map and name is a wstring, you should use:
for (wchar_t c : name) outputfile << map[c];
instead of:
for (char c : name) outputfile << map[c];
Isn't that right?
Hope this helps, Stefano

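First thought:
map <- std::unordered_map<std::string, int>
open ipa.txt:
    for each line in file:
        map[line[0]] = line[1]
open source.txt:
    for each line in file:
        create and open line[0].txt:
            for each symbol in line[1]:
                write map[symbol] to line[0].txt
Regarding the actual C++ implementation, UTF-8 text can be stored in an ordinary std::string and read with a plain std::ifstream, so you don't need wchar_t or std::wifstream. Keep in mind, though, that any non-ASCII symbol (which includes most IPA characters) is encoded as 2 to 4 bytes, so a single symbol does not fit in one char. That is why the map above is keyed on std::string, and why line[1] has to be split into symbols at UTF-8 character boundaries rather than byte by byte. If you need UTF-8 string literals, use the u8 prefix: u8"literal" (since C++20 these have type const char8_t[] rather than const char[]). Everything else should be standard file IO.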
EDIT: Here are some links to the relevant documentation to help you get started:
ifstream (for reading from files)
ofstream (for writing to files)
unordered_map (for mapping 'keys' to 'values')
Outside of that it will probably just take a little Googling. File IO is very common so I'm sure you can find some good examples online. As long as your file format is consistent you shouldn't have too much trouble with the file parsing. Then the rest of it is just storing values in the map and then looking them up when you need them, which is pretty simple.
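The steps above can be sketched in C++ as follows. This is a minimal sketch, assuming ipa.txt holds whitespace-separated "symbol number" pairs and that both files are UTF-8; the helper names splitUtf8, loadTable and encode are hypothetical, not library functions. The key point is that the map is keyed on std::string, so a multi-byte IPA symbol is looked up as a whole:

```cpp
#include <cstddef>
#include <istream>
#include <string>
#include <unordered_map>
#include <vector>

// Splits a UTF-8 byte string into one std::string per code point, using the
// lead byte to determine the sequence length. Assumes well-formed UTF-8;
// bytes that are not valid lead bytes are treated as single-byte characters.
std::vector<std::string> splitUtf8(const std::string& s)
{
    std::vector<std::string> symbols;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;                      // 0xxxxxxx (ASCII)
        if ((lead & 0xE0) == 0xC0) len = 2;       // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx
        symbols.push_back(s.substr(i, len));      // substr stops at end of string
        i += len;
    }
    return symbols;
}

// Reads whitespace-separated "symbol number" pairs (the assumed ipa.txt layout).
std::unordered_map<std::string, int> loadTable(std::istream& in)
{
    std::unordered_map<std::string, int> table;
    std::string symbol;
    int number = 0;
    while (in >> symbol >> number) table[symbol] = number;
    return table;
}

// Converts an IPA string to its numbers. Unlike operator[], find() does not
// insert a 0 for unknown symbols; they are written as "?" instead.
std::string encode(const std::string& ipa,
                   const std::unordered_map<std::string, int>& table)
{
    std::string result;
    for (const std::string& symbol : splitUtf8(ipa)) {
        const auto it = table.find(symbol);
        result += (it != table.end()) ? std::to_string(it->second) : std::string("?");
    }
    return result;
}
```

To drive it, you would open ipa.txt with std::ifstream, call loadTable on it, then for each id/name pair read from source.txt write encode(name, table) to std::ofstream(id + ".txt"). Using find() rather than operator[] also avoids the silent zeros that operator[] produces for symbols missing from the table.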

Related

Set integer variable through file read

I know how to read strings from a text file. In a previous project I read in strings, tested whether each one was "t" or "f", and used the result to set a variable to true or false.
Now I am wondering whether there is an efficient way to read numbers from a text file into an int. All I can think of is checking for the string "1" and returning 1 from a function, but that would have to be done for every possible integer my program could expect, which is not a practical solution.
For context, I am trying to make a save system for a game, and ints/floats that are read in would be variables such as player health, how much of an item they have, etc.
If you already know how to read a string str from a text file, reading numbers is not that difficult: just read the string as you did and use std::stoi() to convert it into an int, or std::stof() / std::stod() to convert it into a float / double.
int i; double d;
i = std::stoi(str); d = std::stod(str2);
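Note that std::stoi throws std::invalid_argument when the text does not start with a number and std::out_of_range when the value does not fit in an int, so a save-file loader should be prepared for both. A minimal sketch; the helper name toIntOr and its fallback policy are my own choices, not part of the standard library:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts text to int with std::stoi, returning `fallback` on bad input.
// std::stoi skips leading whitespace but silently ignores trailing junk
// ("42abc" -> 42), so `pos` is checked to reject partially numeric text.
int toIntOr(const std::string& text, int fallback)
{
    try {
        std::size_t pos = 0;
        const int value = std::stoi(text, &pos);
        return pos == text.size() ? value : fallback;
    } catch (const std::invalid_argument&) {  // no digits at all
        return fallback;
    } catch (const std::out_of_range&) {      // too large for an int
        return fallback;
    }
}
```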
Another technique is to use file streams to read or write from a file exactly as you would do from cin and cout:
ifstream file("mytext.txt");
file>>i>>d;
The previous method doesn't care so much about lines. So still another technique is to read a string, convert it into a string stream and use the stringstream as you would with cin:
if (getline(file, str)){ // read a full line
stringstream sst(str);
sst>>i>>d;
}
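Putting the getline + stringstream technique together for the save-game case, here is a sketch that assumes a hypothetical save format of one "key value" pair per line; the SaveData struct, the key names and loadSave are illustrative only:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical save-game state; the field names are illustrative only.
struct SaveData {
    int health = 0;
    int potions = 0;
    float speed = 0.0f;
};

// Reads lines of the assumed form "key value" and fills in `data`.
// Blank lines and unknown keys are ignored; a value that fails to parse
// makes the function return false.
bool loadSave(std::istream& in, SaveData& data)
{
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key)) continue;  // blank line
        if (key == "health") {
            if (!(fields >> data.health)) return false;
        } else if (key == "potions") {
            if (!(fields >> data.potions)) return false;
        } else if (key == "speed") {
            if (!(fields >> data.speed)) return false;
        }
    }
    return true;
}
```

In the game you would pass an std::ifstream opened on the save file; any std::istream works, which also makes the function easy to try out with a string stream.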
Use std::fstream. You can open a file and stream input or output, depending on how you opened it.
Example:
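#include <iostream>
#include <fstream>
int main(int argc, char** argv)
{
    // Pretend we are passed the file location as a command-line argument to our program:
    if (argc < 2)
    {
        std::cout << "Usage: program <file>" << std::endl;
        return 1;
    }
    std::fstream file { argv[1], std::ios::in };
    if (file.is_open())
    {
        int value;
        // operator>> fails (and the if is skipped) when the file does not start with an integer
        if (file >> value)
            std::cout << value << std::endl;
        else
            std::cout << "File does not start with an integer" << std::endl;
    }
    else
    {
        std::cout << "Could not open file " << argv[1] << std::endl;
    }
}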
Provided that the information is correctly formatted in the file, this should work.
I didn't run it, so there might be syntax errors, but the basics are there. Check out cppreference for some help, they will have further examples.
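For a save system you also need the writing side. Here is a sketch of a round trip with std::ofstream and std::ifstream, assuming the values are stored whitespace-separated in a plain text file; the function names saveValues and loadValues are hypothetical:

```cpp
#include <fstream>
#include <iomanip>
#include <limits>
#include <string>

// Writes the values as text, separated by whitespace so operator>> can read
// them back. max_digits10 significant digits let a float round-trip exactly.
bool saveValues(const std::string& path, int health, float speed)
{
    std::ofstream out(path);
    out << std::setprecision(std::numeric_limits<float>::max_digits10)
        << health << ' ' << speed << '\n';
    return static_cast<bool>(out);  // false if opening or writing failed
}

// Reads the two values back; the stream's state reports whether both parsed.
bool loadValues(const std::string& path, int& health, float& speed)
{
    std::ifstream in(path);
    return static_cast<bool>(in >> health >> speed);
}
```

With the default precision of 6 significant digits, some float values would come back slightly different, which is why the precision is raised before writing.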

Name *.bin file after string

I'm trying to get a string from the user through stdin and save it to the variable InputString, then create a binary file with the same name as the value of InputString. This is the code I've written so far:
std::string InputString;
getline(std::cin, InputString);
std::cout << InputString << std::endl;
// The code above works.
// Errors start below. :(
void Printi(std::string filename)
{
std::ofstream Printi(filename".bin");
Printi((char*)&Hans, sizeof(Person)); // Hans is an instance of my class Person.
Printi.close();
}
Printi(InputString);
I get the following errors (translated into English from my localized compiler):
"Printi": Local function definition is not allowed
Missing ")" (in line std::ofstream Printi..)
How can I solve this problem using only standard C++ libraries?
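std::ofstream Printi(filename".bin") needs to be std::ofstream Printi(filename + ".bin"). The + operator concatenates the strings, appending .bin to the end of the supplied file name.
The other error, "Local function definition is not allowed", occurs because Printi is defined inside another function (presumably main); C++ does not allow nested function definitions, so move the definition of Printi above main. Also, Printi((char*)&Hans, sizeof(Person)) calls the stream object as if it were a function; the member function you want is Printi.write((char*)&Hans, sizeof(Person)), and since you are writing raw bytes, open the file with std::ios::binary.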
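A corrected version of the whole snippet might look like the following. It is a sketch: Person here is a hypothetical stand-in with only trivially copyable members, because dumping an object's raw bytes only works for such types (a class holding a std::string, for example, would store its internal pointer rather than the text). savePerson and loadPerson are illustrative names:

```cpp
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the asker's Person class.
struct Person {
    int age;
    double height;
};
static_assert(std::is_trivially_copyable<Person>::value,
              "raw binary I/O requires a trivially copyable type");

// Defined at namespace scope, not inside main: C++ has no local functions.
bool savePerson(const std::string& name, const Person& p)
{
    std::ofstream out(name + ".bin", std::ios::binary);  // concatenate with +
    out.write(reinterpret_cast<const char*>(&p), sizeof p);
    return static_cast<bool>(out);
}

// Reads the bytes back into an existing Person.
bool loadPerson(const std::string& name, Person& p)
{
    std::ifstream in(name + ".bin", std::ios::binary);
    in.read(reinterpret_cast<char*>(&p), sizeof p);
    return static_cast<bool>(in);
}
```

With the original code, the call would become savePerson(InputString, Hans). The resulting file depends on the compiler and platform (struct padding, byte order), so it is only suitable for reading back on the same setup.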

opening output filestreams with string names

Hi, I have some C++ code that uses user-defined input to generate file names for some output files:
std::string outputName = fileName;
for(int i = 0; i < 4; i++)
{
outputName.pop_back();
}
std::string outputName1 = outputName;
std::string outputName2 = outputName;
outputName.append(".fasta");
outputName1.append("_Ploid1.fasta");
outputName2.append("_Ploid2.fasta");
where fileName can be any name the user chooses, followed by .csv, e.g. '~/Desktop/mytest.csv'.
The code chops .csv off and builds three file names/paths for the three output streams.
It then creates them and attempts to open them:
std::ofstream outputFile;
outputFile.open(outputName.c_str());
std::ofstream outputFile1;
outputFile1.open(outputName1.c_str());
std::ofstream outputFile2;
outputFile2.open(outputName2.c_str());
I made sure to pass the names to open as const char* with the c_str method, however if I test my code by adding the following line:
std::cout << outputFile.is_open() << " " << outputFile1.is_open() << " " << outputFile2.is_open() << std::endl;
I then compile, set fileName to "test.csv", and run the program successfully; however, three zeros are printed to the screen, showing that the three output file streams are not in fact open. Why are they not opening? I know passing strings as filenames does not work which is why I thought conversion with c_str() would be sufficient.
Thanks,
Ben W.
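Your issue is likely due to the path beginning with ~, which isn't expanded to /{home,Users}/${LOGNAME}. Tilde expansion is performed by the shell, not by the C++ library or the operating system's file-opening calls, so the program looks for a directory literally named ~.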
See also: ifstream open file C++
This answer to How to create a folder in the home directory? may be of use to you.
Unfortunately, there is no standard, portable way of finding out exactly why open() failed:
Detecting reason for failure to open an ofstream when fail() is true
I know passing strings as filenames does not work which is why I thought conversion with c_str() would be sufficient.
std::basic_ofstream::open() does accept a const std::string & (since C++11)!
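If the path comes from the user, one option is to expand a leading ~/ yourself before opening the streams. A minimal sketch, assuming a POSIX-style environment where HOME is set; expandHome and withSuffix are hypothetical helpers, and withSuffix also replaces the pop_back loop with a check that the name really ends in .csv:

```cpp
#include <cstdlib>
#include <string>

// Expands a leading "~/" using $HOME, since the shell (not the C++ library)
// normally performs tilde expansion. Other paths are returned unchanged.
std::string expandHome(const std::string& path)
{
    if (path.size() >= 2 && path[0] == '~' && path[1] == '/') {
        if (const char* home = std::getenv("HOME")) {
            return std::string(home) + path.substr(1);
        }
    }
    return path;
}

// Replaces a trailing ".csv" (if present) with `suffix`,
// e.g. withSuffix("mytest.csv", "_Ploid1.fasta") -> "mytest_Ploid1.fasta".
std::string withSuffix(const std::string& csvPath, const std::string& suffix)
{
    const std::string ext = ".csv";
    std::string base = csvPath;
    if (base.size() >= ext.size() &&
        base.compare(base.size() - ext.size(), ext.size(), ext) == 0) {
        base.erase(base.size() - ext.size());
    }
    return base + suffix;
}
```

The streams could then be opened as std::ofstream outputFile1(expandHome(withSuffix(fileName, "_Ploid1.fasta"))); this still fails if the target directory does not exist.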

Reading a string from a file in C++

I'm trying to store strings directly in a file to be read later in C++ (for the full scope, I'm trying to store an array of objects with string members in a file, and those members will be read through something like object[0].string). However, every time I try to read the string variables, the system gives me a jumbled-up error. The following code is a basic version of what I'm trying.
#include <iostream>
#include <fstream>
using namespace std;
/*
//this is run first to create the file and store the string
int main(){
string reed;
reed = "sees";
ofstream ofs("filrsee.txt", ios::out|ios::binary);
ofs.write(reinterpret_cast<char*>(&reed), sizeof(reed));
ofs.close();
}*/
//this is run after that to open the file and read the string
int main(){
string ghhh;
ifstream ifs("filrsee.txt", ios::in|ios::binary);
ifs.read(reinterpret_cast<char*>(&ghhh), sizeof(ghhh));
cout<<ghhh;
ifs.close();
return 0;
}
The second part is where things go haywire when I try to read it.
Sorry if it's been asked before, I've taken a look around for similar questions but most of them are a bit different from what I'm trying to do or I don't really understand what they're trying to do (still quite new to this).
What am I doing wrong?
You are reading from a file and trying to put the data in the string structure itself, overwriting it, which is plain wrong.
As you can verify at http://www.cplusplus.com/reference/iostream/istream/read/, the types you used are wrong, and you know it, because you had to force the std::string into a char * using a reinterpret_cast.
C++ Hint: using a reinterpret_cast in C++ is (almost) always a sign you did something wrong.
Why is it so complicated to read a file?
A long time ago, reading a file was easy. In some Basic-like language, you used the function LOAD, and voilà!, you had your file.
So why can't we do it now?
Because you don't know what's in a file.
It could be a string.
It could be a serialized array of structs with raw data dumped from memory.
It could even be a live stream, that is, a file which is appended continuously (a log file, the stdin, whatever).
You could want to read the data word by word
... or line by line...
Or the file is so large it doesn't fit in a string, so you want to read it by parts.
etc.
The most generic solution is to read the file (thus, in C++, an fstream) byte by byte using the function get (see http://www.cplusplus.com/reference/iostream/istream/get/), transform the bytes yourself into the type you expect, and stop at EOF.
The std::istream interface has all the functions you need to read the file in different ways (see http://www.cplusplus.com/reference/iostream/istream/), and there is also a non-member function for std::string, std::getline, that reads until a delimiter is found (usually "\n", but it could be anything; see http://www.cplusplus.com/reference/string/getline/).
But I want a "load" function for a std::string!!!
Ok, I get it.
We assume that what you put in the file is the content of a std::string, but keeping it compatible with a C-style string, that is, the \0 character marks the end of the string (if not, we would need to load the file until reaching the EOF).
And we assume you want the whole file content fully loaded once the function loadFile returns.
So, here's the loadFile function:
#include <iostream>
#include <fstream>
#include <string>
bool loadFile(const std::string & p_name, std::string & p_content)
{
// We create the file object, saying I want to read it
std::fstream file(p_name.c_str(), std::fstream::in) ;
// We verify if the file was successfully opened
if(file.is_open())
{
// We use the standard getline function to read the file into
// a std::string, stopping only at "\0"
std::getline(file, p_content, '\0') ;
// We return the success of the operation
return ! file.bad() ;
}
// The file was not successfully opened, so returning false
return false ;
}
If you are using a C++11-enabled compiler, you can add this overloaded function, which will cost you nothing (while in C++03, barring optimizations, it could have cost you a temporary object):
std::string loadFile(const std::string & p_name)
{
std::string content ;
loadFile(p_name, content) ;
return content ;
}
Now, for completeness' sake, I wrote the corresponding saveFile function:
bool saveFile(const std::string & p_name, const std::string & p_content)
{
std::fstream file(p_name.c_str(), std::fstream::out) ;
if(file.is_open())
{
file.write(p_content.c_str(), p_content.length()) ;
return ! file.bad() ;
}
return false ;
}
And here, the "main" I used to test those functions:
int main()
{
const std::string name(".//myFile.txt") ;
const std::string content("AAA BBB CCC\nDDD EEE FFF\n\n") ;
{
const bool success = saveFile(name, content) ;
std::cout << "saveFile(\"" << name << "\", \"" << content << "\")\n\n"
<< "result is: " << success << "\n" ;
}
{
std::string myContent ;
const bool success = loadFile(name, myContent) ;
std::cout << "loadFile(\"" << name << "\", \"" << content << "\")\n\n"
<< "result is: " << success << "\n"
<< "content is: [" << myContent << "]\n"
<< "content ok is: " << (myContent == content)<< "\n" ;
}
}
More?
If you want to do more than that, then you will need to explore the C++ IOStreams library API, at http://www.cplusplus.com/reference/iostream/
You can't use std::istream::read() to read into a std::string object. What you could do is to determine the size of the file, create a string of suitable size, and read the data into the string's character array:
std::string str;
std::ifstream file("whatever");
std::string::size_type size = determine_size_of(file);
str.resize(size);
file.read(&str[0], size);
The tricky bit is determining the size the string should have. Given that the character sequence may get translated while reading, e.g., because line end sequences are transformed, this pretty much amounts to reading the string in the general case. Thus, I would recommend against doing it this way. Instead, I would read the string using something like this:
std::string str;
std::ifstream file("whatever");
if (std::getline(file, str, '\0')) {
...
}
This works OK for text strings and is about as fast as it gets on most systems. If the file can contain null characters, e.g., because it contains binary data, this doesn't quite work. If this is the case, I'd use an intermediate std::ostringstream:
std::ostringstream out;
std::ifstream file("whatever");
out << file.rdbuf();
std::string str = out.str();
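For completeness, the determine_size_of placeholder above could be filled in with seekg/tellg. This is a hypothetical sketch that opens the file in std::ios::binary, where no line-ending translation takes place, so on common platforms the size reported by tellg matches the number of bytes read; readAll is an illustrative name:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical implementation of the determine_size_of() placeholder:
// seek to the end, ask for the position, then rewind.
std::streamsize determine_size_of(std::ifstream& file)
{
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();  // -1 on failure
    file.seekg(0, std::ios::beg);
    return size < 0 ? 0 : static_cast<std::streamsize>(size);
}

// Reads a whole file into `str` using the size determined above.
bool readAll(const std::string& path, std::string& str)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    str.resize(static_cast<std::size_t>(determine_size_of(file)));
    file.read(&str[0], static_cast<std::streamsize>(str.size()));
    return static_cast<bool>(file);
}
```

Because this works on raw bytes, it also handles files containing null characters, at the cost of skipping any text-mode line-ending conversion.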
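A string object is not a mere char array; the line
ifs.read(reinterpret_cast<char*>(&ghhh), sizeof(ghhh));
is probably the root of your problems. The same applies to the matching write in your first program: it stores the internal bytes of the std::string object, not its characters, so write the characters instead, e.g. ofs.write(reed.data(), reed.size()).
Try applying the following changes:
const std::size_t BUFF_LEN = 256; // arbitrary buffer size
char ghhh[BUFF_LEN];
....
ifs.read(ghhh, BUFF_LEN);
std::cout.write(ghhh, ifs.gcount()); // print only the bytes actually read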
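If the goal really is to keep strings inside a binary file (for example as members of saved objects), a common approach, not taken from the answers above, is to write each string's length followed by its characters, never the std::string object itself. A sketch, assuming a 32-bit length stored in host byte order; writeString and readString are hypothetical names. Casting the address of a plain integer like this is fine, unlike casting a std::string:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Writes the string's length, then its characters. The length is stored in
// host byte order, so the file is not portable across endianness.
bool writeString(std::ostream& out, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
    return static_cast<bool>(out);
}

// Reads back one string written by writeString. The length is not validated,
// so a corrupt file could request a huge allocation.
bool readString(std::istream& in, std::string& s)
{
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    s.resize(len);
    return len == 0 || static_cast<bool>(in.read(&s[0], len));
}
```

Both streams should be opened with std::ios::binary, and an array of objects can be saved by calling writeString once per string member, in a fixed order.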

How can ofstream write NULL to a file in binary mode?

I am maintaining a C++ method which one of my clients is hitting an issue with. The method is supposed to write out a series of identifiers to a file delimited by a new line. However on their machine somehow the method is writing a series of NULL's out to the file. Opening the file in a binary editor shows that it contains all zeros.
I can't understand why this is happening. I've tried assigning empty strings and strings with the first character set to 0. There is no problem creating the file, just writing the identifiers to it.
Here is the method:
void writeIdentifiers(std::vector<std::string> IDs, std::string filename)
{
std::ofstream out (filename.c_str(), std::ofstream::binary);
if (out.is_open())
{
for (std::vector<std::string>::iterator it = IDs.begin();
it != IDs.end();
it++)
{
out << *it << "\n";
}
}
out.close();
}
My question: is there any possible input you can provide to that method that will create a file with NULL values in it?
Yeah, the following code quite clearly writes a series of NULL bytes:
std::vector<std::string> ids;
std::string nullstring;
nullstring.assign("\0\0\0\0\0\0\0\0\0\0", 10);
ids.push_back(nullstring);
writeIdentifiers(ids, "test.dat");
Because the std::string container stores the string length, it can't necessarily be used in the same way as an ordinary C (null-terminated) string. Here, I assign a string containing 10 NULL bytes. Those are then output because the string length is 10.
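To guard against such input, here is a hypothetical variant of the method that skips identifiers containing NUL bytes and reports how many it skipped. It is a sketch, not the original method: it writes to any std::ostream, such as the std::ofstream used above, and the names containsNul and writeCheckedIdentifiers are illustrative:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// True if the string holds an embedded NUL byte. std::string tracks its own
// length, so '\0' is just another character to it.
bool containsNul(const std::string& s)
{
    return s.find('\0') != std::string::npos;
}

// Writes one identifier per line, skipping any that contain NUL bytes.
// Returns the number of identifiers skipped.
std::size_t writeCheckedIdentifiers(std::ostream& out,
                                    const std::vector<std::string>& ids)
{
    std::size_t skipped = 0;
    for (const std::string& id : ids) {
        if (containsNul(id)) {
            ++skipped;
            continue;
        }
        out << id << '\n';
    }
    return skipped;
}
```

Note that std::string("\0\0\0") (without the length argument) is empty, because that constructor stops at the first NUL; only the (pointer, length) form, like the assign call above, keeps the bytes.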