I/O: ASCII codes to foreign characters - C++

Using cout << "\n\u00f3\n" << endl, I can print ó surrounded by newlines at the Unix command line. Once I start reading files and printing strings that contain these escape sequences, I see the literal text \n\u00f3\n instead.
I am not sure whether this is because the file-reading code uses character arrays or because of some other nuance I am not aware of.
Any ideas?
Thanks!
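#include <cstdio>
#include <string>
int main() {
    const char *filename = "spanish_project_sample1.txt";
    FILE *file = fopen(filename, "r");
    if (file == NULL) return 1;  // could not open the file
    int c;             // int, not char, so that EOF can be told apart from real data
    std::string data;  // grows as characters are appended
    while ((c = fgetc(file)) != EOF) {
        data += static_cast<char>(c);
    }
    fclose(file);
    printf("%s", data.c_str());
    return 0;
}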

I looked at the JavaScript solutions to a similar problem (e.g. String.fromCharCode) and found this code online:
https://ideone.com/Udo3hN
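#include <cstdarg>
#include <iostream>
#include <string>
using namespace std;

// Builds a string from `num` character codes passed as extra int arguments,
// similar in spirit to JavaScript's String.fromCharCode.
string FromCharCode(int num, ...)
{
    va_list arguments;
    string s;
    va_start(arguments, num);
    for (int x = 0; x < num; x++)
    {
        char ch = static_cast<char>(va_arg(arguments, int)); // char arguments are promoted to int
        s += ch;
    }
    va_end(arguments);
    return s;
}

int main()
{
    cout << FromCharCode(10, 73, 78, 68, 69, 83, 73, 71, 78, 33, 33) << endl; // prints INDESIGN!!
    return 0;
}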
Specifically, the reading step seems to be the issue: at run time, instead of reading '\n' as the single value 10, for example, the character array records two values, [92, 110] (the codes for '\' and 'n').
With a hardcoded string literal, the compiler translates the escape sequences into the desired values.
Any suggestions or solutions still welcome.

The C++ idiom for reading a file line by line is:
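#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char **argv)
{
    if (argc < 2) return 1;     // expects the file name as the first argument
    string line;
    ifstream ifs(argv[1]);
    while (getline(ifs, line))  // stops at end of file or on a read error
        cout << line << endl;
}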
Try that.
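Your problem is probably one of interpretation, though. If your file contains the text "\n\u00f3\n", that is literally what is read and printed: a backslash, an n, a backslash, a u, and so on. If the file contains the character ó itself (code point U+00F3, stored as the bytes C3 B3 in UTF-8 or as the 16-bit unit 00F3 in UTF-16), you will get what you want, provided your terminal uses the same encoding. The I/O routines don't do any conversion; escape sequences are translated only by the compiler, inside string literals.
You also need to know whether your file is in UTF-8 or UTF-16 so that you can read it properly.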
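If the file really does contain escape sequences such as \n and \u00f3 as plain text, the program has to translate them itself at run time, the way the compiler does for string literals. Below is a minimal sketch, assuming the output should be UTF-8 and that only \n, \t, \\ and four-digit \uXXXX escapes need handling; decodeEscapes, appendUtf8 and hexValue are hypothetical helper names, not library functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append code point cp to out, encoded as UTF-8.
// Only handles code points up to U+FFFF (what a 4-digit \uXXXX escape can hold);
// surrogate pairs are not combined.
void appendUtf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Returns the value of a hex digit, or -1 if h is not one.
int hexValue(char h) {
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'a' && h <= 'f') return h - 'a' + 10;
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    return -1;
}

// Hypothetical helper: turns the two-character text \n into a real newline,
// the text \u00f3 into the UTF-8 bytes for ó, and so on.
// Unknown or malformed escapes are copied through unchanged.
std::string decodeEscapes(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 == in.size()) {  // ordinary character
            out += in[i];
            continue;
        }
        const char next = in[i + 1];
        if (next == 'n') { out += '\n'; ++i; }
        else if (next == 't') { out += '\t'; ++i; }
        else if (next == '\\') { out += '\\'; ++i; }
        else if (next == 'u' && i + 5 < in.size()) {
            unsigned cp = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {  // the four hex digits
                const int v = hexValue(in[k]);
                if (v < 0) { ok = false; break; }
                cp = cp * 16 + static_cast<unsigned>(v);
            }
            if (ok) { appendUtf8(out, cp); i += 5; }
            else out += in[i];  // not a valid \uXXXX: keep the backslash
        }
        else out += in[i];      // unknown escape: keep the backslash
    }
    return out;
}
```

Each line read with getline can then be printed as cout << decodeEscapes(line); the ó only shows up correctly if the terminal expects UTF-8.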

Related

C++: Interpret Unicode white space

I have a file which contains text (ASCII + Unicode) and I am trying to count the total number of words in it using a C++ program. It is a requirement that I read the file line by line (using getline) and then process each line to count the words in it.
So I have written the following simple program:
#include <cstdint>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
int main(int argc, char* argv[]) {
    std::uint64_t ct = 0;
    std::string line;
    std::ifstream infile(argv[1]);
    while (std::getline(infile, line)) {
        std::stringstream inputStream(line);
        std::string token;
        while (inputStream >> token) {
            ++ct;
        }
    }
    std::cout << ct << std::endl;
    return 0;
}
However, the above program outputs a smaller number than the wc -w command does. To narrow down the problem, I modified the program to simply output whatever it reads. So now the program becomes:
int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream infile(argv[1]);
    while (std::getline(infile, line)) {
        std::stringstream inputStream(line);
        std::string token;
        while (inputStream >> token) {
            std::cout << token << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
I redirected the output of this program to another file. When I run wc -w on this new file, the number is the same as when I run wc -w on the original file. This means I am reading all the words (i.e., "words" as defined by wc) in my program. Hence a reasonable explanation is that some token read with inputStream >> token contains a Unicode character that wc interprets as white space. So how do I change my program to also treat such Unicode white-space characters as separators?
You can go by either:
A. Java's definition of Unicode whitespace (which excludes the non-breaking spaces).
or
B. Wikipedia's list of all 25 Unicode code points defined as whitespace.
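Following option B, one way to make the count agree is to decode each line from UTF-8 and split on the 25 code points that have the Unicode White_Space property, instead of relying on operator>>, which only knows the single-byte whitespace of the current C++ locale. This is a minimal sketch, assuming the file is UTF-8; isUnicodeSpace, countWordsInLine and countWords are hypothetical names, and whether your wc -w uses exactly this set depends on its implementation and locale.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <string>

// True for the 25 code points with the Unicode White_Space property.
bool isUnicodeSpace(char32_t cp) {
    return (cp >= 0x09 && cp <= 0x0D) || cp == 0x20 || cp == 0x85 ||
           cp == 0xA0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F || cp == 0x205F ||
           cp == 0x3000;
}

// Counts words in one line of UTF-8 text, where a word is a maximal run of
// code points that are not Unicode whitespace. Malformed bytes count as non-space.
std::uint64_t countWordsInLine(const std::string& line) {
    std::uint64_t count = 0;
    bool inWord = false;
    std::size_t i = 0;
    while (i < line.size()) {
        const unsigned char b = static_cast<unsigned char>(line[i]);
        std::size_t len = 1;
        // ASCII bytes are their own code point; a stray byte >= 0x80 becomes U+FFFD.
        char32_t cp = (b < 0x80) ? char32_t(b) : char32_t(0xFFFD);
        if (b >= 0xC0 && b <= 0xDF) { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF7) { len = 4; cp = b & 0x07; }
        if (i + len > line.size()) { len = 1; cp = 0xFFFD; }  // truncated sequence
        for (std::size_t k = 1; k < len; ++k)  // add 6 bits per continuation byte
            cp = (cp << 6) | (static_cast<unsigned char>(line[i + k]) & 0x3F);
        if (isUnicodeSpace(cp)) {
            inWord = false;
        } else if (!inWord) {  // first code point of a new word
            inWord = true;
            ++count;
        }
        i += len;
    }
    return count;
}

// Reads the stream line by line with std::getline, as the assignment requires.
std::uint64_t countWords(std::istream& in) {
    std::uint64_t total = 0;
    for (std::string line; std::getline(in, line); )
        total += countWordsInLine(line);
    return total;
}
```

In the original program, countWords(infile) would replace the inner stringstream loop.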

How to read a .txt file using c-strings?

I'm working on a project for school and I need to read in text from a file.
Sounds easy peasy, except my professor put a restriction on the project: NO STRINGS
("No string data types or the string library are allowed.")
I've been getting around this problem by using char arrays; however, I'm not sure how to use char arrays to read in from a file.
This is an example from another website on how to read in a file with strings.
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
    string line;
    ifstream myfile ("example.txt");
    if (myfile.is_open())
    {
        while ( getline (myfile,line) )
        {
            cout << line << '\n';
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
The important line here is while ( getline (myfile,line) ).
std::getline accepts a stream (here an ifstream) and a std::string, not a char array.
Any help is appreciated!
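Use the member function getline (as in cin.getline), which reads into a char array; an ifstream has the same member. Refer to the reference documentation for cin.getline for the format.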
You can write something like this:
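ifstream x("example.txt");
char arr[105];
// reads at most sizeof(arr) - 1 characters per line; a longer line sets failbit and ends the loop
while (x.getline(arr, sizeof(arr), '\n')) {
    cout << arr << '\n';
}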
ifstream has a method named read() that reads raw characters from the file into a char array. read() takes, as parameters, a pointer to the array and the number of characters to read, and fills the array up to that size if possible. (The get(char*, streamsize) overload is similar, but it stops at a newline.)
After read() returns, use the gcount() method to determine how many characters were actually read.
You can then use a simple loop to read the file repeatedly, in chunks of that size, and collect all the chunks into a single array or a std::vector.
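The chunked approach can be sketched with istream::read and gcount(); read() is used because, unlike the get(char*, streamsize) overload, it does not stop at a newline. This is a minimal sketch, assuming std::vector is allowed under the "no strings" rule; readAll is a hypothetical name, and the stream parameter lets it work on an ifstream or, for testing, an istringstream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only for trying the function on an in-memory stream
#include <vector>

// Reads everything from `in` into a vector of chars, chunkSize characters at a time.
// Uses istream::read plus gcount(); no std::string involved.
std::vector<char> readAll(std::istream& in, std::size_t chunkSize = 4096) {
    if (chunkSize == 0) chunkSize = 1;  // a zero-size read would never make progress
    std::vector<char> data;
    std::vector<char> chunk(chunkSize);
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();  // may be less than chunkSize at the end
        data.insert(data.end(), chunk.begin(), chunk.begin() + got);
    }
    return data;
}
```

To print the result as a C-string, push_back('\0') first and pass data.data() to cout. If std::vector is also off-limits, the same loop works with a fixed char array and a manual length counter.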
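You can use int i = 0; while (scanf("%c", &str[i]) != EOF) ++i; to detect the end of the text input. str is the char array (newlines included) that you want, and i is the number of characters read; str must be large enough to hold the whole input, and you can add str[i] = '\0' afterwards to use it as a C-string.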
You can also use while (cin.getline(...)) to read one line per loop iteration, C++ style, with
istream& getline (char* s, streamsize n, char delim ); (the delim argument defaults to '\n'), as below:
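#include <cstdio>
#include <iostream>
using namespace std;
const int SIZE = 100;   // maximum number of lines
const int MSIZE = 100;  // maximum length of a line, including the '\0'
int main() {
    if (!freopen("in.txt", "r", stdin)) return 1;  // read stdin from the file
    char str[SIZE][MSIZE];
    int n = 0;
    while (n < SIZE && cin.getline(str[n], MSIZE)) {
        printf("input string is [%s]\n", str[n]);
        ++n;
    }
}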

Trying to return the size of an input file in C++, but I receive an error when I convert the char variable to a string

I am trying to count the characters in a file. Initially my variable "words" was a char and the file was read just fine. But when I tried to determine the length of the variable, .length() didn't work. Can you explain how I can make my "words" variable a string so that words.length() executes correctly?
The compiler error, which comes from the comparison if(words!= EOF) after words = readFile.get();, is:
no match for ‘operator!=’ in ‘words != -0x00000000000000001’
#include <iostream>
#include <cmath>
#include <fstream>
#include <cstdlib>
#include <string>
#include <stdio.h>
#include <math.h>
using namespace std;
int main() {
    //buff array to hold char words in the input text file
    string words;
    //char words;
    //read file
    ifstream readFile("TextFile1.txt");
    //notify user if the file didn't transfer into the system
    if (!readFile)
        cout <<"I am sorry but we could not process your file."<<endl;
    //read and output the file
    while (readFile)
    {
        words = readFile.get();
        if(words!= EOF)
            cout <<words;
    }
    cout << "The size of the file is: " << words.length() << " bytes. \n";
    return 0;
}
// Read one char at a time and append it; std::string has no insert(char) overload.
char c;
while (readFile.get(c))
{
    words += c;
}
Of course, if you were solely doing this to count the number of characters (and were intent on using std::istream::get) you'd probably be better off just doing this:
int NumChars = 0;
// compare with EOF: a bare while (readFile.get()) never stops, because EOF (-1) is "true"
while (readFile.get() != EOF)
{
    NumChars++;
}
Oh, and by the way, you might want to close the file after you're done with it (an ifstream also closes it automatically when it goes out of scope).
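If all you need is the file's size, reading it character by character is not necessary. A minimal sketch, assuming the file is opened in binary mode (std::ios::binary) so that the offsets are byte counts on every platform; streamSize is a hypothetical helper name. Since C++17, std::filesystem::file_size(path) from <filesystem> is another option.

```cpp
#include <ios>
#include <istream>
#include <sstream>  // only for trying the function on an in-memory stream

// Returns the total size in bytes of a seekable stream, or -1 if it cannot seek.
// Leaves the read position where it was.
std::streamoff streamSize(std::istream& in) {
    const std::streampos start = in.tellg();     // remember the current position
    if (start == std::streampos(-1)) return -1;  // not seekable, or already failed
    in.seekg(0, std::ios::end);
    const std::streampos end = in.tellg();       // offset of the end = size in bytes
    in.seekg(start);                             // go back to where we were
    return static_cast<std::streamoff>(end);
}
```

For example: std::ifstream f("TextFile1.txt", std::ios::binary); std::cout << streamSize(f) << " bytes\n";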
You should read some reference documentation: try cppreference.com and look up std::istream::get.
I'm not sure what you want, but if you just want to count words, you can do something like this:
std::ifstream InFile(/*filename*/);
if (!InFile) {
    // file not found
}
std::string s;
int numWords = 0;
while (InFile >> s)  // operator>> reads one whitespace-separated word at a time
    numWords++;
std::cout << numWords;
Or, if you want to know how many characters are in the file, change std::string s to char s and use std::ifstream::get instead:
std::ifstream InFile(/*filename*/);
if (!InFile) {
    // file not found
}
char s;
int numCharacters = 0;
while (InFile.get(s)) // this will read one character after another until EOF
    numCharacters++;
std::cout << numCharacters;
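The second approach is easier, but note that get(char) reads bytes, not characters:
If the file uses ASCII, numCharacters == fileSize (in bytes);
If it uses a Unicode encoding, a character stored in several bytes (UTF-8, UTF-16) is counted once per byte, so numCharacters is still the byte count, not the number of characters. (On Windows, a file opened in text mode turns "\r\n" into "\n", so the count can be smaller than the file size.)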
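If the file is UTF-8 and you want the number of characters (strictly, code points) rather than bytes, one option is to skip UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. A minimal sketch, assuming valid UTF-8 and a file opened with std::ios::binary; countCodePoints is a hypothetical name. A user-perceived character can still consist of several code points (e.g. a letter plus a combining accent).

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // only for trying the function on an in-memory stream

// Counts UTF-8 code points by counting every byte that is not a
// continuation byte (10xxxxxx). Assumes the input is valid UTF-8.
std::uint64_t countCodePoints(std::istream& in) {
    std::uint64_t count = 0;
    char c;
    while (in.get(c)) {
        const unsigned char b = static_cast<unsigned char>(c);
        if ((b & 0xC0) != 0x80) ++count;  // a lead byte or a plain ASCII byte
    }
    return count;
}
```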
get() returns an int. To do what you're doing, check that int against EOF before appending it to "words", instead of comparing words with EOF, e.g.:
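...
//read and output the file
while (readFile)
{
    const int w = readFile.get();
    if (w != EOF) {
        words += static_cast<char>(w);  // append the character
        cout << static_cast<char>(w);   // print only the new character, not the whole string
    }
}
...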

How to read the whole lines from a file (with spaces)?

I am using the STL. I need to read lines from a text file. How can I read up to the first \n rather than stopping at the first ' ' (space)?
For example, my text file contains:
Hello world
Hey there
If I write like this:
ifstream file("FileWithGreetings.txt");
string str("");
file >> str;
then str will contain only "Hello" but I need "Hello world" (till the first \n).
I thought I could use the method getline(), but it requires the number of characters to read. In my case, I do not know how many characters I should read.
You can use the free function std::getline, which reads into a std::string and does not need a length:
#include <string>
#include <iostream>
int main() {
    std::string line;
    if (getline(std::cin, line)) {
        // line is the whole line
    }
}
Using the getline function is one option;
alternatively, use getc to read each char in a do-while loop.
If the file consists of numbers, this may be a better way to read it:
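// Assumes: FILE *in = fopen(...); and std::vector<int> list; (needs <cstdio>, <vector>)
// Reads the integers on one line, treating any non-digit character as a separator.
int c;
do {
    int item = 0, pos = 0;
    c = getc(in);
    while ((c >= '0') && (c <= '9')) {
        item *= 10;
        item += c - '0';
        c = getc(in);
        pos++;
    }
    if (pos) list.push_back(item);
} while (c != '\n' && c != EOF);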
Try modifying this method if your file consists of strings.
Thanks to all of the people who answered me. I made new code for my program, which works:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(int argc, char** argv)
{
    ifstream ifile(argv[1]);
    // ...
    while (!ifile.eof())
    {
        string line("");
        if (getline(ifile, line))
        {
            // the line is a whole line
        }
        // ...
    }
    ifile.close();
    return 0;
}
I suggest:
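#include <fstream>
#include <string>
std::ifstream reader(filename, std::ios_base::in); // filename: path to your file
if (reader) { // confirm stream is in a good state
    char buffer[256]; // how long: pick any chunk size you like
    while (reader) {
        reader.read(buffer, sizeof(buffer));
        std::string text(buffer, static_cast<std::size_t>(reader.gcount())); // gcount(): characters actually read
        // Then process the std::string as described below
    }
}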
Any variable name will do for the std::string, and for how long (the chunk size), whatever you feel is appropriate; or use std::getline as above.
To process the text, iterate over the std::string from begin() to end() (both return a std::string::iterator)
and examine it character by character until you find the '\n' or ' ' you are looking for.
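Processing the text with iterators, as suggested above, could look like the following minimal sketch, which uses std::find to cut a std::string at each '\n' while keeping spaces inside the lines. splitLines is a hypothetical helper, and it assumes the whole text is already in one string: if the text was read in fixed-size chunks, a line can straddle two chunks, which is one reason std::getline is usually the simpler choice.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Splits text at each '\n' using iterators; spaces stay inside the lines.
// A trailing '\n' does not produce an extra empty line.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::string::const_iterator it = text.begin();
    while (it != text.end()) {
        std::string::const_iterator nl = std::find(it, text.end(), '\n');
        lines.emplace_back(it, nl);   // characters in [it, nl)
        if (nl == text.end()) break;  // last line had no '\n'
        it = nl + 1;                  // skip past the '\n'
    }
    return lines;
}
```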

Incorrect char from file

I have the following .txt file:
test.txt
1,2,5,6
I pass it to a small C++ program I made, via the command line, as follows:
./test test.txt
Source is as follows:
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char **argv)
{
    int temp =0;
    ifstream file;
    file.open(argv[1]);
    while(!file.eof())
    {
        temp=file.get();
        file.ignore(1,',');
        cout<<temp<<' ';
    }
    return 0;
}
For some reason my output is not 1 2 5 6 but 49 50 53 54. What gives?
UPDATE:
Also, I noticed there is another overload of get(): if I declare char temp, I can call file.get(temp), which also saves me from converting the ASCII representation. However, I like using while (file >> temp), so I will go with that. Thanks.
temp is an int, so you see the character codes (ASCII values) of the digits rather than the digits themselves.
49 is the ASCII code for the digit '1' (49 - 48 = 1, because '0' is 48).
get() gives you a character (its character code), not a parsed number.
By the way, eof() only becomes true after a failed read attempt, so the code you show,
while(!file.eof())
{
    temp=file.get();
    file.ignore(1,',');
    cout<<temp<<' ';
}
will possibly display one extraneous value at the end (typically -1, the EOF value returned by the failed get()).
The conventional loop is
while( file >> temp )
{
    cout << temp << ' ';
}
where the expression file >> temp reads in one number and produces a reference to file, and where that file object is converted to bool as if you had written
while( !(file >> temp).fail() )
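With input like 1,2,5,6, a loop such as while( file >> temp ) on an int temp stops at the first comma, because ',' cannot be part of a number. A minimal sketch that reads the integers and skips the comma after each one; readCommaSeparated and digitValue are hypothetical names, and the sketch assumes each comma directly follows a number (whitespace after the comma is fine, since operator>> skips it).

```cpp
#include <istream>
#include <sstream>  // only for trying the function on an in-memory stream
#include <vector>

// Reads integers separated by commas, e.g. "1,2,5,6". operator>> on an int
// converts the digits, so you get 1 rather than the character code 49.
std::vector<int> readCommaSeparated(std::istream& in) {
    std::vector<int> values;
    int n;
    while (in >> n) {          // stops at end of file or at a non-number
        values.push_back(n);
        if (in.peek() == ',')  // skip the comma that follows a number, if any
            in.ignore();
    }
    return values;
}

// Converting a single digit character yourself: the C++ standard guarantees
// that '0'..'9' are contiguous, so '1' - '0' == 1.
int digitValue(char c) { return c - '0'; }
```

In the question's program, readCommaSeparated(file) would replace the whole while(!file.eof()) loop.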
This does not do what you think it does:
while(!file.eof())
This is covered in the question "Why is iostream::eof inside a loop condition considered wrong?", so I won't repeat it in this answer.
Try:
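char c;
while (file >> c)
{
    // [...]
}
...instead. Reading into a char rather than an int also means the digit is printed as 1 rather than as its ASCII code 49; if you need its numeric value, use c - '0' (ASCII value 49 minus 48 is 1, etc.). Note that this loop also reads the commas as characters.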
For the record, and despite this being the nth duplicate, here's how this code might look in idiomatic C++:
// needs <iostream>, <sstream> and <string>
for (std::string line; std::getline(file, line); )
{
    std::istringstream iss(line);
    std::cout << "We read:";
    for (std::string n; std::getline(iss, n, ','); )  // read each comma-separated field into n
    {
        std::cout << " " << n;
        // now use e.g. std::stoi(n)
    }
    std::cout << "\n";
}
If you don't care about lines or just have one line, you can skip the outer loop.