c++ read Arabic text from file - c++

In C++, I have a text file that contains Arabic text like:
شكلك بتعرف تقرأ عربي يا ابن الذين
and I want to parse each line of this file into a string and use string functions on it (like substr, length, at...etc.) then print some parts of it to an output file.
I tried doing it but it prints some garbage characters like "\'c7\'e1\'de\'d1\"
Is there any library to support Arabic characters?
edit: just adding the code:
#include <iostream>
#include <fstream>
using namespace std;
int main(){
ifstream ip;
ip.open("d.rtf");
if(ip.is_open() != true){
cout<<"open failed"<<endl;
return 0;
}
string l;
while(!ip.eof()){
getline(ip, l);
cout<<l<<endl;
}
return 0;
}
Note: I still need to add some processing code like
if(l == "كلام بالعربي"){
string s = l.substr(0, 4);
cout<<s<<" is what you are looking for"<<endl;
}

You need to find out which text encoding the file uses. For example, to read a UTF-8 file into a std::wstring you can do the following (C++11):
std::wifstream fin("text.txt");
fin.imbue(std::locale("en_US.UTF-8"));
std::wstring line;
std::getline(fin, line);
std::wcout << line << std::endl;

The best way to deal with this, in my opinion, is to use a Unicode helper library. Strings in C, and even in C++, are just arrays of bytes. When you call, for example, strlen() [C] or somestring.length() [C++], you get the number of bytes in the string, not the number of characters.
Some auxiliary functions, such as mbstowcs(), can help, but in my opinion they are dated and hard to use.
Another way is to use C++11, which in theory supports many things related to UTF-8. But I have never seen it work perfectly, at least when you need to be multi-platform.
The best solution I have found is the ICU library. With it I can work on UTF-8 strings easily, with the same "charm" as a regular std::string. You get a string class with methods for length, substrings and so on, and it is very portable. I use it on Windows, Mac and Linux.
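To make the bytes-versus-characters point concrete without pulling in a library, here is a minimal sketch that counts code points in a UTF-8 std::string. The function name utf8_length is made up for illustration, and it assumes the input is valid UTF-8. It counts code points, which is still not the same as user-perceived characters when combining marks are involved; a library such as ICU handles that properly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is done.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {  // not a continuation byte
            ++count;
        }
    }
    return count;
}
```

For a string holding three Arabic letters encoded as UTF-8, size() reports 6 while utf8_length() reports 3.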

You can use Qt too.
Simple example:
#include <QDebug>
#include <QTextStream>
#include <QFile>
int main()
{
QFile file("test.txt");
file.open(QIODevice::ReadOnly | QIODevice::Text);
QTextStream stream(&file);
QString text=stream.readAll();
if(text == "شكلك بتعرف تقرأ عربي يا ابن الذين")
qDebug()<<",,,, ";
}

It is better to process Arabic text line by line. To read all lines of Arabic text from a file, try this:
std::wifstream fin("arabictext.txt");
fin.imbue(std::locale("en_US.UTF-8"));
std::wstring line;
std::wstring text;
while ( std::getline(fin, line) )
{
text= text+ line + L"\n";
}
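One caveat with the locale-based snippets: std::locale("en_US.UTF-8") throws std::runtime_error when that locale name is not available on the system. As a rough alternative sketch, assuming the file really is UTF-8, you can read plain bytes with std::ifstream and decode each line by hand into a std::u32string, where length(), substr() and at() work per code point. The function name utf8_to_utf32 is hypothetical, and the decoder does not validate its input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Assumes well-formed input; malformed sequences are not detected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

Each line returned by std::getline on a std::ifstream can be passed through this function before any substr or comparison logic runs.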

Related

is_open() function in C++ always return 0 value and getLine(myFile, line) does not return anything

I am trying to read a file in C++ using fstream, but is_open() always returns 0 and getline() does not read anything. Please see the code snippet below.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
string line;
ifstream myfile("D:\xx\xx\xx\xx\testdata\task1.in.1");
if (myfile.is_open()) {
while (getline(myfile, line)) {
cout << line << '\n';
}
myfile.close();
}
else
cout << "Unable to open file";
return 0;
}
You think you're opening D:\<somepath>\testdata\task1.in.1,
but in fact you're trying to open D:\<somepath><tab character>estdata<tab character>ask1.in.1, since \t is interpreted as a tab
(just as \n is a newline in printf("hello world\n");).
(\x is special too. By the way, that can't be your real path, or you would have gotten a different error, "\x used with no following hex digits", which might have been a better hint.)
You have to escape the backslashes like this:
D:\\xx\\xx\\xx\\xx\\testdata\\task1.in.1
Windows also accepts forward slashes, which is more convenient, unless you need to generate batch scripts with cd commands or the like, which require backslashes (/ is the option switch in batch commands):
D:/xx/xx/xx/xx/testdata/task1.in.1
As NathanOliver stated, you can use a raw string literal if your compiler has C++11 mode enabled (for example with -std=c++11):
R"(D:\xx\xx\xx\xx\testdata\task1.in.1)"
Last word, a dirty way of getting away with it:
D:\Xx\Xx\Xx\Xx\Testdata\Task1.in.1
Using uppercase may appear to work in that case, because Windows is case-insensitive and \X and \T are not recognized escape sequences, so some compilers leave them alone.
But that's mere luck: unrecognized escapes are not portable, and compilers typically warn about them. A lot of people do this without realizing they're very close to a bug.
By the way, a lot of people capitalize Windows paths (as seen a lot on this site) because they noticed that their paths wouldn't work in lowercase, without knowing why.
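The difference between the spellings can be checked directly. This is a small sketch with a shortened, made-up path; the names broken, escaped, raw and has_tab are only for illustration.

```cpp
#include <string>

// Three spellings of the same (made-up) path. Only the first is wrong:
// "\t" inside an ordinary string literal is a single tab character.
const std::string broken  = "D:\testdata\task1.in.1";
const std::string escaped = "D:\\testdata\\task1.in.1";
const std::string raw     = R"(D:\testdata\task1.in.1)";

// Returns true if the string contains a tab character.
bool has_tab(const std::string& s) {
    return s.find('\t') != std::string::npos;
}
```

Here has_tab(broken) is true, while escaped and raw compare equal and contain real backslashes.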

How to write from file to string

I am new to C++ and I'm having trouble understanding how to import text from a file. I have a .txt file that I am inputting from and I want to put all of the text from that file into a string. To read the text file I am using the following code:
ifstream textFile("information.txt");
This just opens a text file named information.txt. I made a string named text and initialized it to "". My problem is with the following code, which I am trying to use to put the text from the .txt file into the string:
while (textFile >> text)
text += textFile;
I am clearly doing something wrong, although I'm not sure what it is.
while (textFile >> text) won't preserve whitespace. If you want to keep the spaces in your string, use another function such as textFile.get().
Example:
#include <iostream>
#include <string>
#include <fstream>
int main() {
std::ifstream textFile("information.txt");
std::string text;
char c;
// get(c) reads one character at a time, whitespace included;
// the loop ends when no more characters can be read
while (textFile.get(c)) {
text += c;
}
std::cout << text;
return 0;
}
while (textFile >> text) text += textFile;
You're trying to add the file to a string, which I assume will be a compiler error.
If you want to do it your way, you'll need two strings, e.g.
string text;
string tmp;
while(textFile >> tmp) text += tmp;
Note that this may omit spaces, so you may need to manually re-add them.
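If the word-by-word approach is what you want, re-adding the spaces could look like the sketch below. The function name join_words is made up; it takes any std::istream, so the same code works with an std::ifstream. Note that it normalizes every run of whitespace (including newlines) to a single space, so the original layout is not preserved.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper: reads whitespace-separated words from `in` and
// joins them with single spaces.
std::string join_words(std::istream& in) {
    std::string text;
    std::string word;
    while (in >> word) {
        if (!text.empty()) {
            text += ' ';
        }
        text += word;
    }
    return text;
}
```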

replace and write to file c++

I want write code to find words in a file and replace words.
I open file, next I find word. I have a problem with replace words.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string contain_of_file,a="car";
string::size_type position;
ifstream NewFile;
NewFile.open("plik1.txt");
while(NewFile.good())
{
getline(NewFile, contain_of_file);
position=contain_of_file.find("Zuzia");
if(position!=string::npos)
{
NewFile<<contain_of_file.replace(position,5, a );
}
}
NewFile.close();
cin.get();
return 0;
}
How can I improve my code?
Lose the using namespace std;
Don't declare variables before they are needed;
I think the English word you were looking for was "contents", but I am not a native English speaker;
getline already returns the stream, which is false in a boolean context once a read fails, so the separate NewFile.good() check is unnecessary;
There is no need to close NewFile explicitly;
I would change the casing of the NewFile variable;
You cannot write to an ifstream, and you need to decide how you are going to replace the contents of the file...
My version would be like:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
int main() {
std::rename("plik1.txt", "plik1.txt~");
std::ifstream old_file("plik1.txt~");
std::ofstream new_file("plik1.txt");
for( std::string contents_of_file; std::getline(old_file, contents_of_file); ) {
std::string::size_type position = contents_of_file.find("Zuzia");
if( position != std::string::npos )
contents_of_file.replace(position, 5, "car");
new_file << contents_of_file << '\n';
}
return 0;
}
There are at least two issues with your code:
1. Overwriting text in a file.
2. Writing to an ifstream (the i is for input, not output).
The File object
Imagine a file as many little boxes that contain characters. The boxes are glued front to back in an endless line.
You can take letters out of boxes and put into other boxes, but since they are glued, you can't put new boxes between existing boxes.
Replacing Text
You can replace text in a file as long as the replacement text is the same length as the original text. If the replacement text is longer, you overwrite the text that follows. If it is shorter, residual text is left in the file. Neither outcome is good.
To replace (overwrite) the text, open the file as fstream and use the ios::in and ios::out modes.
Input versus Output
The common technique for replacing text is to open the original file for input and a new file for output.
Copy the existing data, up to your target text, to the new file.
Copy the replacement text to the new file.
Copy any remaining text to the new file.
Close all files.
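The copy steps above can be sketched as one function over generic streams. The name replace_all_copy is hypothetical. It assumes the search text is non-empty and does not span lines, and it always ends the last line with '\n'. In a real program, in would be an std::ifstream on the original file and out an std::ofstream on the new one.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // only needed to try this with in-memory streams
#include <string>

// Hypothetical helper: copies `in` to `out` line by line, replacing every
// occurrence of `from` with `to`. Assumes `from` is not empty.
void replace_all_copy(std::istream& in, std::ostream& out,
                      const std::string& from, const std::string& to) {
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type pos = 0;
        while ((pos = line.find(from, pos)) != std::string::npos) {
            line.replace(pos, from.size(), to);
            pos += to.size();  // continue searching after the inserted text
        }
        out << line << '\n';
    }
}
```

Because the output goes to a separate stream, the replacement may be longer or shorter than the original text without corrupting anything.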

can't read file c++

There is a file in a directory and I'm trying to read it, but I can't. What is wrong with my code? The example is taken from http://www.cplusplus.com/forum/beginner/37208/
#include <iostream>
#include <fstream>
#include <string>
#define MAX_LEN 100
using namespace std;
string inlasning ()
{
string text;
string temp; // Added this line
ifstream file;
file.open ("D:\education\Third course\semestr 2\security\lab1.2\secret_msg.txt");
while (!file.eof())
{
getline (file, temp);
text.append (temp); // Added this line
}
cout << "THE FILE, FOR TESTING:\n" // For testing
<< text << "\n";
file.close();
return text;
}
void main ()
{
inlasning();
}
Change \ to \\ in the file path (or use /).
In string literals \ is used as an escape character.
You have to write \\.
Note: you should check the open call.
On failure, the failbit flag is set (which can be checked with member
fail), and depending on the value set with exceptions an exception may
be thrown.
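A minimal sketch of checking the open call before reading follows. The function name read_file and its signature are made up for illustration.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `text`.
// Returns false, leaving `text` untouched, if the file cannot be opened.
bool read_file(const std::string& path, std::string& text) {
    std::ifstream file(path);
    if (!file) {  // equivalent to file.fail(): the open set failbit
        return false;
    }
    std::ostringstream buffer;
    buffer << file.rdbuf();
    text = buffer.str();
    return true;
}
```

With the unescaped path from the question, a check like this would report the failure immediately instead of looping on a stream that never opened.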
\ is an escape character, so you need to use \\ to obtain the result you intend.
This is true almost everywhere, even here on Stack Overflow, where you need to use it in order to write something like this, for example:
*A*
(just put a \ before each *); otherwise (if you don't use the \) Stack Overflow will interpret the text and output an italic A, like this:
A
The same is true for bold (two asterisks... two backslashes):
**A**
instead of
A
:)
... or maybe you cannot read it because it is a "secret_msg" :P (LOL)

using fstream to read every character including spaces and newline

I wanted to use fstream to read a txt file.
I am using inFile >> characterToConvert, but the problem is that this omits any spaces and newline.
I am writing an encryption program so I need to include the spaces and newlines.
What would be the proper way to go about accomplishing this?
Probably the best way is to read the entire file's contents into a string, which can be done very easily using ifstream's rdbuf() method:
std::ifstream in("myfile");
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());
You can then use regular string manipulation now that you've got everything from the file.
While Tomek was asking about reading a text file, the same approach will work for reading binary data, though the std::ios::binary flag needs to be provided when creating the input file stream.
For encryption, you're better off opening your file in binary mode. Use something like this to put the bytes of a file into a vector:
std::ifstream ifs("foobar.txt", std::ios::binary);
ifs.seekg(0, std::ios::end);
std::ifstream::pos_type filesize = ifs.tellg();
ifs.seekg(0, std::ios::beg);
std::vector<char> bytes(filesize);
ifs.read(&bytes[0], filesize);
Edit: fixed a subtle bug as per the comments.
I haven't tested this, but I believe you need to clear the "skip whitespace" flag:
inFile.unsetf(ios_base::skipws);
I use the following reference for C++ streams:
IOstream Library
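For what it's worth, the effect of clearing skipws can be tried on an in-memory stream. This is a small sketch; read_with_noskipws is a made-up name, and the same call works on an std::ifstream.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying this without a file
#include <string>

// Hypothetical helper: reads every character with operator>>, after
// turning whitespace skipping off on the stream.
std::string read_with_noskipws(std::istream& in) {
    in.unsetf(std::ios_base::skipws);
    std::string out;
    char c;
    while (in >> c) {
        out += c;
    }
    return out;
}
```

Without the unsetf call, the same loop would drop the spaces and newlines.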
std::ifstream ifs( "filename.txt" );
std::string str( ( std::istreambuf_iterator<char>( ifs ) ),
std::istreambuf_iterator<char>()
);
The following c++ code will read an entire file...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
string line;
ifstream myfile ("foo.txt");
if (myfile.is_open()){
while (getline (myfile,line)){
cout << line << endl;
}
myfile.close();
}
return 0;
}
post your code and I can give you more specific help to your problem...
A lot of the benefit of the istream layer is that it provides basic formatting and parsing for simple types to and from a stream. For the purposes you describe, none of this is really important; you are just interested in the file as a stream of bytes.
For these purposes you may be better off just using the basic_streambuf interface provided by a filebuf. The 'skip whitespace' behaviour is part of the istream interface functionality that you just don't need.
filebuf underlies an ifstream, but it is perfectly valid to use it directly.
std::filebuf myfile;
myfile.open( "myfile.dat", std::ios_base::in | std::ios_base::binary );
// gets next char, then moves 'get' pointer to next char in the file
int ch = myfile.sbumpc();
// get (up to) the next n chars from the stream
std::streamsize getcount = myfile.sgetn( char_array, n );
Also have a look at the functions snextc (moves the 'get' pointer forward and then returns the current char), sgetc (gets the current char but doesn't move the 'get' pointer) and sungetc (backs up the 'get' pointer by one position if possible).
When you don't need any of the insertion and extraction operators provided by an istream class and just need a basic byte interface, often the streambuf interface (filebuf, stringbuf) is more appropriate than an istream interface (ifstream, istringstream).
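As a rough illustration of the streambuf interface, the sketch below drains any stream buffer with sbumpc(). The function name drain is hypothetical. It accepts a std::filebuf opened as shown above just as well as a std::stringbuf.

```cpp
#include <sstream>    // std::stringbuf, for trying this without a file
#include <streambuf>
#include <string>

// Hypothetical helper: pulls every byte out of a stream buffer.
// sbumpc() returns the current character and advances the get pointer,
// or traits::eof() when nothing is left.
std::string drain(std::streambuf& buf) {
    using traits = std::char_traits<char>;
    std::string out;
    for (traits::int_type ch = buf.sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf.sbumpc()) {
        out += traits::to_char_type(ch);
    }
    return out;
}
```

No whitespace handling is involved at this level, so spaces and newlines come through like any other byte.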
You can call int fstream::get(), which will read a single character from the stream. You can also use istream& fstream::read(char*, streamsize), which does the same operation as get(), just over multiple characters. The given links include examples of using each method.
I also recommend reading and writing in binary mode. This allows ASCII control characters to be properly read from and written to files. Otherwise, an encrypt/decrypt operation pair might result in non-identical files. To do this, you open the filestream with the ios::binary flag. With a binary file, you want to use the read() method.
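Another way, arguably better, is to use istreambuf_iterator. Sample code is below (the extra parentheses around the first argument keep the line from being parsed as a function declaration):
ifstream inputFile("test.data");
string fileData((istreambuf_iterator<char>(inputFile)), istreambuf_iterator<char>());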
For encryption, you should probably use read(). Encryption algorithms usually deal with fixed-size blocks. Oh, and to open in binary mode (no translation from \r\n to \n), pass ios_base::binary as the second parameter to the constructor or open() call.
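Reading fixed-size blocks with read() might look like the following sketch. The name read_blocks is made up, and it assumes block_size is greater than zero. For a real file, in would be an std::ifstream opened with std::ios_base::binary.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed to try this with an in-memory stream
#include <string>
#include <vector>

// Hypothetical helper: splits a stream into blocks of `block_size` bytes.
// The last block may be shorter. Assumes block_size > 0.
std::vector<std::string> read_blocks(std::istream& in, std::size_t block_size) {
    std::vector<std::string> blocks;
    std::string block(block_size, '\0');
    // read() fails on a short final block, but gcount() still reports
    // how many bytes were actually stored, so that block is kept too.
    while (in.read(&block[0], static_cast<std::streamsize>(block_size)) ||
           in.gcount() > 0) {
        blocks.emplace_back(block.data(),
                            static_cast<std::size_t>(in.gcount()));
    }
    return blocks;
}
```

An encryption routine would then process each block in turn, padding the short final block if the algorithm requires it.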
Simple:
#include <fstream>
#include <iomanip>
std::ifstream ifs("file");
ifs >> std::noskipws;
That's all.
ifstream ifile(path);
std::string contents((std::istreambuf_iterator<char>(ifile)), std::istreambuf_iterator<char>());
ifile.close();
I also find that the get() method of an ifstream object can read all the characters of the file, without requiring you to unset std::ios_base::skipws. Quote from C++ Primer:
Several of the unformatted operations deal with a stream one byte at a time. These operations, which are described in Table 17.19, read rather than ignore whitespace.
These operations are listed below:
is.get(), os.put(), is.putback(), is.unget() and is.peek().
Below is a minimal working example:
#include <iostream>
#include <fstream>
#include <string>
int main(){
std::ifstream in_file("input.txt");
char s;
if (in_file.is_open()){
int count = 0;
while (in_file.get(s)){
std::cout << count << ": "<< (int)s <<'\n';
count++;
}
}
else{
std::cout << "Unable to open input.txt.\n";
}
in_file.close();
return 0;
}
The content of the input file (cat input.txt) is
ab cd
ef gh
The output of the program is:
0: 97
1: 98
2: 32
3: 99
4: 100
5: 10
6: 101
7: 102
8: 32
9: 103
10: 104
11: 32
12: 10
10 and 32 are the decimal codes of the newline and space characters. Clearly, all characters have been read.
As Charles Bailey correctly pointed out, you don't need fstream's services just to read bytes. So forget this iostream silliness, use fopen/fread and be done with it. C stdio is part of C++, you know ;)