fstream not working properly with Russian text?

I work with Russian a lot and I've been trying to read data from a file with an input stream. Here's the code; it's supposed to output only the words that contain no more than 5 characters.
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
using namespace std;
int main()
{
setlocale(LC_ALL, "ru_ru.utf8");
ifstream input{ "in_text.txt" };
if (!input) {
cerr << "Ошибка при открытии файла" << endl;
return 1;
}
cout << "Вывод содержимого файла: " << "\n\n";
string line{};
while (input >> line) {
if (line.size() <= 5)
cout << line << endl;
}
cout << endl;
input.close();
return 0;
}
Here's the problem:
I noticed the output didn't pick up all of the words that actually contain no more than 5 characters. So I did a simple test with the word "Test" in English and its translation "тест" in Russian, which has the same number of characters. My text file looks like this:
Test тест
I used the debugger to see how the program would run, and it printed the English word and skipped the Russian one. I can't understand why this is happening.
P.S. When I changed the code to if (line.size() <= 8) it printed out both of them. Very odd.
I think I messed up my system locale somehow, I don't know. I did once try to use std::locale without really understanding it; maybe that did something to my PC, I'm not really sure. Please help.

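std::string::size() returns the number of bytes (char elements), not the number of characters. In UTF-8 each Cyrillic letter takes two bytes, so "тест" has size 8 while "Test" has size 4, which is why the <= 8 check printed both. I'm very unsure about this, but converting to UTF-32 with codecvt_utf8 and wstring_convert and checking that length seems to work: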
#include <codecvt> // codecvt_utf8
#include <string>
#include <iostream>
#include <locale> // std::wstring_convert
int main() {
// ...
while (input >> line) {
// convert the utf8 encoded `line` to utf32 encoding:
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> u8_to_u32;
std::u32string u32s = u8_to_u32.from_bytes(line);
if (u32s.size() <= 5) // check the utf32 length
std::cout << line << '\n'; // but print the utf8 encoded string
}
// ...
}
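As an alternative, here is a minimal sketch that avoids std::wstring_convert and std::codecvt_utf8, both of which are deprecated since C++17. It counts the bytes that start a code point instead of converting the string. The name utf8_length is a hypothetical helper, not something from the question, and the sketch assumes the input is valid UTF-8. It counts code points, so a letter followed by a combining mark would count as two.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded string.
// Assumes `s` is valid UTF-8; continuation bytes have the bit pattern 10xxxxxx.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) // not a continuation byte, so it starts a code point
            ++count;
    }
    return count;
}
```

In the loop from the question, if (utf8_length(line) <= 5) would then take the place of if (line.size() <= 5).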

Related

How to skip blank line when reading file in C++?

I want to skip blank lines when reading a file.
I've tried if(buffer == "\n") and if(buffer.empty()), but it doesn't work. I did it like this:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream file_pointer;
file_pointer.open("rules.txt", ios::in);
if(!file_pointer.is_open())
{
cout << "failed to read rule file." << endl;
return 0;
}
string buffer;
while(getline(file_pointer, buffer))
{
if(buffer.empty())
{
continue;
}
if(buffer == "\n")
{
continue;
}
cout << buffer << endl;
}
file_pointer.close();
return 0;
}
The problem is that a “blank” line need not be “empty”: it may still contain spaces, tabs, or a stray carriage return.
#include <algorithm> // std::all_of
#include <cctype> // std::isspace
#include <fstream>
#include <iostream>
#include <string>
//using namespace std;
bool is_blank( const std::string & s )
{
return std::all_of( s.begin(), s.end(), []( unsigned char c )
{
return std::isspace( c );
} );
}
int main()
{
std::ifstream rules_file("rules.txt");
if(!rules_file)
{
std::cerr << "failed to read rule file." << std::endl;
return 1;
}
std::string line;
while(getline(rules_file, line))
{
if(is_blank(line))
{
continue;
}
std::cout << line << "\n";
}
return 0;
}
A few notes.
Get used to writing std:: in front of things from the Standard Library. Importing everything en masse with using namespace std is almost always a bad idea.
C++ file streams are not pointers. Moreover, be descriptive with your names! It makes reading your code easier for your future self. Honest!
Open a file at the file stream object creation. Let it close at object destruction (which you did).
Report errors to standard error and signal program failure by returning 1 from main().
Print normal output to standard output and signal program success by returning 0 from main().
std::all_of() and lambdas are probably not something you have studied yet. There are all kinds of ways that is_blank() could have been written:
bool is_blank( const std::string & s )
{
for (char c : s)
if (!std::isspace( (unsigned char)c ))
return false;
return true;
}
Or:
bool is_blank( const std::string & s )
{
return s.find_first_not_of( " \f\n\r\t\v" ) == s.npos;
}
Etc.
The reason that the checking for newline didn’t work is that getline() removes the newline character(s) from the input stream but does not store it/them in the target string. (Unlike fgets(), which does store the newline so that you know that you got an entire line of text from the user.) C++ is much more convenient in this respect.
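To see this concretely, here is a small sketch that collects lines exactly as std::getline stores them. The name read_lines is a hypothetical helper, and the usage below assumes an in-memory std::istringstream rather than rules.txt.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line from `in` and returns the strings
// exactly as std::getline stored them (the '\n' delimiter is never included).
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Fed the text "one\n\n  \ntwo\n", this yields four strings: "one", an empty string, a string of two spaces, and "two". None of them equals "\n", and only the second is caught by empty(), which is why a whitespace-aware check such as is_blank() is needed.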
Overall, you look to be off to a good start. I really recommend you make yourself familiar with a good reference and look up the functions you wish to use. Even now, after 30+ years of this, I still look them up when I use them.
One way to find good stuff is to just type the name of the function in at Google: “cppreference.com getline” will take you to the ur-reference site.
https://en.cppreference.com — “the” C and C++ reference site
https://cplusplus.com/reference/ — also good, often an easier read for beginners than cppreference.com
https://www.learncpp.com/ — a good site to learn how to do things in C++
You can skip blank lines when reading a file in C++ by using the getline() function and checking the length of the resulting string. Here is an example of how you can do this:
#include <fstream>
#include <string>
int main() {
std::ifstream file("myfile.txt");
std::string line;
while (std::getline(file, line)) {
if (line.length() == 0) { // check if the line is empty
continue; // skip the iteration
}
// process the non-empty line
}
file.close();
return 0;
}
You can also use the std::stringstream class to skip blank lines, here is an example:
#include <fstream>
#include <sstream>
#include <string>
int main() {
std::ifstream file("myfile.txt");
std::string line;
while (std::getline(file, line)) {
std::stringstream ss(line);
std::string token;
if (ss >> token) { // true only if the line contains a non-whitespace token
// process the non-blank line (still intact in `line`)
}
}
file.close();
return 0;
}
(1) Here's a solution using the ws manipulator in conjunction with the getline function to ignore leading white-space while reading lines of input from the stream. ws is a manipulator that skips whitespace characters, including newlines, so blank lines are consumed before getline runs.
#include <iostream>
#include <string>
int main()
{
using namespace std;
string line;
while (getline(cin >> ws, line))
cout << "got: " << line << endl;
return 0;
}
Note that the leading spaces are removed even if the line is not empty (" abc " becomes "abc ").
(2) If this is a problem, you could use:
while (getline(cin, line))
if (line.find_first_not_of(" \t") != string::npos)
cout << "got: " << line << endl;

getline() not reading first lines

I am a C++ beginner and this is for school.
I am trying to read a file about 28 KB big. The program works, but it doesn't print the first 41 lines. It works fine with a smaller file.
At first I was reading into a char array, then I switched to strings.
I also tried changing the log buffer, but apparently it should be big enough.
I feel like this should be very simple, but I just can't figure it out.
Any help will be greatly appreciated.
Thanks!
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <cstdio>
#include <cerrno>
using namespace std;
struct espion
{
char nom[30];
char pays[20];
char emploi[29];
};
int main()
{
const int MAX_NOM = 30, MAX_PAYS = 20, MAX_EMPLOI = 29;
char nomFichier[50] = "espion.txt";
ifstream aLire;
aLire.open(nomFichier, ios::in|ios::binary);
if(!aLire.is_open()){
exit(EXIT_FAILURE);
}
std::string infoEspion;
while(aLire)
{
infoEspion.clear();
std::getline(aLire, infoEspion);
cout << infoEspion ;
}
aLire.close();
system("pause");
return 0;
}
From the system("pause"), it looks like you're running on Windows. With ios::binary, the end-of-line marker is not translated, and the cout << infoEspion; statement prints these "raw" lines in such a way that all of the lines are written on top of each other. (More specifically, each line will end with a return but no newline, so the cursor goes back to the start of the same line after executing each cout statement.) If you take out the ios::binary, you will echo all of the input on a single, very long line. Changing the statement to cout << infoEspion << endl; will echo all of the lines.
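If the file has to stay open in binary mode, another option is to drop the leftover carriage return after each std::getline call. This is a minimal sketch under that assumption; read_lines_strip_cr is a hypothetical helper name, not part of the original program.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines and drops a trailing '\r', so files with
// Windows (\r\n) line endings look the same whether or not the stream
// translated the line endings itself.
std::vector<std::string> read_lines_strip_cr(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Each returned string can then be printed with << '\n' (or << endl) without the cursor jumping back to the start of the line.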

Printing out blank spaces from a text file in C++

#include <iostream>
#include <cstdlib>
#include <cctype>
#include <cmath>
#include <string>
#include <iomanip>
#include <fstream>
#include <stdio.h>
using namespace std;
int main()
{
ifstream file;
string filename;
char character;
int letters[153] = {};
cout << "Enter text file name: ";
cin >> filename;
file.open(filename.c_str());
if (! file.is_open())
{
cout << "Error opening file. Check file name. Exiting program." << endl;
exit(0);
}
while (file.peek() != EOF)
{
file >> character;
if(!file.fail())
{
letters[static_cast<int>(character)]++;
}
}
for (int i = 0; i <= 153; i++)
{
if (letters[i] > 0)
{
cout << static_cast<char>(i) << " " << letters[i] << endl;
}
}
exit(0);
}
#endif
Hi everyone, my current code counts the frequency of each letter from a text file. However, it does not count the number of blank spaces. Is there a simple way to print out the number of blank spaces in a .txt file?
Also, how come when I'm trying to access a vector item, I run into a seg fault?
For example, if I use:
cout << " " + letters[i] << endl;, it displays a segfault. Any ideas?
Thank you so much.
By default, iostreams formatted input extraction operations (those using >>) skip past all whitespace characters to get to the first non-whitespace character. Perhaps surprisingly, this includes the extraction operator for char. In order to treat whitespace characters as characters to be processed as usual, you should apply the noskipws manipulator to the input stream before processing:
file >> std::noskipws;
Don't forget to set it back on later:
file >> std::skipws;
What if you're one of those crazy people who wants to make a function that leaves this aspect (or even all aspects) of the stream state as it was before it exits? Naturally, C++ provides a discouragingly ugly way to achieve this:
std::ios_base::fmtflags old_fmt = file.flags();
file >> std::noskipws;
... // Do your thang
file.flags(old_fmt);
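The save-and-restore pattern can also be wrapped in a small RAII object so the flags come back even on an early return. The following is only a sketch: FlagsGuard and count_spaces are hypothetical names introduced for illustration, and the counting function looks only at the ' ' character, not at tabs or newlines.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical RAII helper: remembers a stream's format flags and restores
// them when it goes out of scope.
class FlagsGuard {
public:
    explicit FlagsGuard(std::ios_base& stream)
        : stream_(stream), saved_(stream.flags()) {}
    ~FlagsGuard() { stream_.flags(saved_); }
    FlagsGuard(const FlagsGuard&) = delete;
    FlagsGuard& operator=(const FlagsGuard&) = delete;
private:
    std::ios_base& stream_;
    std::ios_base::fmtflags saved_;
};

// Counts ' ' characters; skipws is switched off only inside this function.
std::size_t count_spaces(std::istream& in)
{
    FlagsGuard guard(in);
    in >> std::noskipws;
    std::size_t spaces = 0;
    char c;
    while (in >> c)
        if (c == ' ')
            ++spaces;
    return spaces;
}
```

Calling count_spaces on a std::istringstream or on the opened std::ifstream leaves the stream's skipws setting as it was before the call.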
I'm only posting this as an alternative way of doing what you're apparently trying. This uses the same lookup table approach you use in your code, but uses an istreambuf_iterator for slurping unformatted (and unfiltered) raw characters out of the stream buffer directly.
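#include <algorithm> // std::for_each
#include <cctype> // std::isprint
#include <climits> // CHAR_BIT
#include <cstddef> // std::size_t
#include <cstdlib> // EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <iterator> // std::istreambuf_iterator
int main(int argc, char *argv[])
{
if (argc < 2)
return EXIT_FAILURE;
std::ifstream inf(argv[1]);
std::istreambuf_iterator<char> it_inf(inf), it_eof;
unsigned int arr[1 << CHAR_BIT] = {};
// Go through unsigned char so bytes above 127 never produce an out-of-range index.
std::for_each(it_inf, it_eof,
[&arr](char c){ ++arr[static_cast<unsigned char>(c)];});
for (std::size_t i=0;i<sizeof(arr)/sizeof(arr[0]);++i)
{
if (std::isprint(static_cast<int>(i)) && arr[i])
std::cout << static_cast<char>(i) << ':' << arr[i] << std::endl;
}
return 0;
}
Executing this on a source file like the one above generates output along these lines (the first entry, with nothing visible before the colon, is the space character; exact counts depend on the file):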
:124
#:4
&:3
':2
(:13
):13
*:1
+:4
,:4
/:1
0:3
1:2
2:1
::13
;:10
<:19
=:2
>:7
A:2
B:1
C:1
E:2
F:1
H:1
I:3
L:1
R:2
T:2
U:1
X:1
[:8
]:8
_:10
a:27
b:1
c:19
d:13
e:20
f:15
g:6
h:5
i:42
l:6
m:6
n:22
o:10
p:1
r:37
s:20
t:34
u:10
v:2
z:2
{:4
}:4
Just a different way to do it, but hopefully it is clear that the C++ standard library usually offers elegant ways to do what you want if you dig deep enough to find what's in there. Wishing you good luck.

Unable to write a std::wstring into wofstream

I'm using Qt/C++ on a Linux system. I need to convert a QLineEdit's text to std::wstring and write it into a std::wofstream. It works correctly for ASCII strings, but when I enter any other character (Arabic or Uzbek) nothing is written to the file (the file size is 0 bytes).
This is my code:
wofstream customersFile;
customersFile.open("./customers.txt");
std::wstring ws = lne_address_customer->text().toStdWString();
customersFile << ws << ws.length() << std::endl;
The output for John Smith entered in the line edit is John Smith10, but for Unicode strings, nothing.
At first I thought the problem was with QString::toStdWString(), but customersFile << ws.length(); writes the correct length for all strings. So I guess I'm doing something wrong when writing the wstring to the file.
EDIT:
I rewrote it in Eclipse and compiled it with g++ 4.5. The result is the same:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
cout << "" << endl; // prints
wstring ws = L"سلام"; // this is an Arabic "Hello"
wofstream wf("new.txt");
if (!wf.bad())
wf << ws;
else
cerr << "some problem";
return 0;
}
Add
#include <locale>
and at the start of main,
std::locale::global(std::locale(""));
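The reason this helps, as far as I can tell: a wide file stream converts wchar_t to bytes through its locale, and the default "C" locale cannot encode Arabic, so the stream goes into a failed state. A sketch of the same idea that imbues a locale on the stream itself instead of changing the global one follows. The name write_wide is a hypothetical helper, and which locale names are available is an assumption about the system it runs on.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path`, converting wide characters
// with the supplied locale. Returns false if opening or conversion failed.
bool write_wide(const std::string& path, const std::wstring& text,
                const std::locale& loc)
{
    std::wofstream out;
    out.imbue(loc);  // set the locale before any output happens
    out.open(path);
    if (!out)
        return false;
    out << text;
    out.flush();     // force the conversion so errors show up here
    return static_cast<bool>(out);
}
```

A call such as write_wide("new.txt", ws, std::locale("")) would use the locale from the user's environment. Note that std::locale("") throws std::runtime_error if that locale is not available, so checking the return value alone is not enough in that case.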

Most Compact Way to Count Number of Lines in a File in C++

What's the most compact way to compute the number of lines of a file?
I need this information to create/initialize a matrix data structure.
Later I have to go through the file again and store the information inside a matrix.
Update: Based on Dave Gamble's answer. But why doesn't this compile?
Note that the file could be very large, so I'm trying to avoid using a container in order to save memory.
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
using namespace std;
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string line;
ifstream myfile (arg_vec[1]);
FILE *f=fopen(myfile,"rb");
int c=0,b;
while ((b=fgetc(f))!=EOF) c+=(b==10)?1:0;
fseek(f,0,SEEK_SET);
return 0;
}
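I think this might do it (needs <algorithm>, <fstream> and <iterator>; the + 1 accounts for a last line with no trailing newline, so drop it if the file ends with '\n'):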
std::ifstream file(f);
int n = std::count(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>(), '\n') + 1;
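A variant of the same idea that handles both cases, with or without a trailing newline, could look like the sketch below. The name count_lines is a hypothetical helper; it reads through the stream buffer one character at a time and stores nothing, so memory use stays constant for large files.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: counts lines in a stream without storing them.
// A final line that lacks a trailing '\n' is still counted; an empty
// stream has zero lines.
std::size_t count_lines(std::istream& in)
{
    std::size_t lines = 0;
    char last = '\n';
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        last = *it;
        if (last == '\n')
            ++lines;
    }
    if (last != '\n') // unterminated final line
        ++lines;
    return lines;
}
```

To read the file a second time afterwards, the same std::ifstream can be rewound with clear() followed by seekg(0).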
If the reason you need to "go back again" is because you cannot continue without the size, try re-ordering your setup.
That is, read through the file, storing each line in a std::vector<string> or something. Then you have the size, along with the lines in the file:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main(void)
{
std::fstream file("main.cpp");
std::vector<std::string> fileData;
// read in each line
std::string dummy;
while (getline(file, dummy))
{
fileData.push_back(dummy);
}
// and size is available, along with the file
// being in memory (faster than hard drive)
size_t fileLines = fileData.size();
std::cout << "Number of lines: " << fileLines << std::endl;
}
Here is a solution without the container:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main(void)
{
std::fstream file("main.cpp");
size_t fileLines = 0;
// read in each line
std::string dummy;
while (getline(file, dummy))
{
++fileLines;
}
std::cout << "Number of lines: " << fileLines << std::endl;
}
Though I doubt that's the most efficient way. The benefit of this method was the ability to store the lines in memory as you went.
FILE *f=fopen(filename,"rb");
int c=0,b;while ((b=fgetc(f))!=EOF) c+=(b==10)?1:0;fseek(f,0,SEEK_SET);
Answer in c.
That kind of compact?
#include <stdlib.h>
int main(void) { system("wc -l plainfile.txt"); }
Count the number of instances of '\n'. This works for *nix (\n) and DOS/Windows (\r\n) line endings, but not for old-skool Mac (System 9 or maybe before that), which used just \r. I've never seen a case come up with just \r as line endings, so I wouldn't worry about it unless you know it's going to be an issue.
Edit: If your input is not ASCII, then you could run into encoding problems as well. What's your input look like?