How do I read a text file containing Unicode escape codes in C++?

I initialize a string using the following code.
std::string unicode8String = "\u00C1 M\u00F3ti S\u00F3l";
Printing it using cout, the output is Á Móti Sól.
But when I read the same string from a text file using ifstream, store it in a std::string, and print it, the output is \u00C1 M\u00F3ti S\u00F3l.
The content of my file is \u00C1 M\u00F3ti S\u00F3l and I want to print it as Á Móti Sól. Is there any way to do this?

Off the top of my head (completely untested)
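#include <cctype>
#include <string>
std::string convert_string(const std::string& in)
{
// Numeric value of one hex digit ('0'-'9', 'a'-'f', 'A'-'F')
auto hexval = [](char c) -> int {
return std::isdigit((unsigned char)c) ? c - '0' : std::tolower((unsigned char)c) - 'a' + 10;
};
std::string out;
for (size_t i = 0; i < in.size(); )
{
// Only handles escapes whose first two hex digits are 00
if (i + 5 < in.size() && in[i] == '\\' && in[i+1] == 'u' &&
in[i+2] == '0' && in[i+3] == '0' &&
std::isxdigit((unsigned char)in[i+4]) && std::isxdigit((unsigned char)in[i+5]))
{
// Combine the digit values (not their ASCII codes) into one byte
out += (char)(16*hexval(in[i+4]) + hexval(in[i+5]));
i += 6;
}
else
{
out += in[i];
++i;
}
}
return out;
}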
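But this only handles escapes of the form \u00XX, and it stores each result as a single byte (Latin-1), so Á will display correctly only on a Latin-1 terminal. It won't work with any Unicode values above 255 (e.g. \u1234), because such a character doesn't fit in one 8-bit char: Unicode code points go up to U+10FFFF (21 bits), so they have to be written in a multi-byte encoding such as UTF-8.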
As I said completely untested, but I'm sure you get the idea.
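Extending the idea past 255 means encoding each decoded code point as UTF-8. Below is a minimal sketch, assuming every escape is a backslash, a 'u' and exactly four hex digits (as in \u00C1) and that your terminal expects UTF-8; decode_escapes and append_utf8 are hypothetical helper names, and surrogate pairs such as \uD83D\uDE00 are not combined into one character.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Append code point cp (assumed <= 0x10FFFF) to out, encoded as UTF-8.
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace every backslash-u escape with exactly four hex digits by its
// UTF-8 bytes; everything else is copied unchanged.
std::string decode_escapes(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        bool is_escape = i + 5 < in.size() && in[i] == '\\' && in[i + 1] == 'u';
        for (std::size_t k = 2; is_escape && k < 6; ++k)
            is_escape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;
        if (is_escape) {
            append_utf8(out, std::stoul(in.substr(i + 2, 4), nullptr, 16));
            i += 6;
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

Reading each line with std::getline and printing decode_escapes(line) should then show Á Móti Sól on a UTF-8 terminal.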

Can you try printing using std::wcout?

Unicode characters have a different representation in a text file: there is no \u escape, only the encoded bytes (UTF-8 on a typical Linux system).
To check this:
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
int main()
{
// Write
{
std::string s = "\u00C1 M\u00F3ti S\u00F3l";
std::ofstream out("/tmp/test.txt");
out << s;
}
// Read Text
{
std::string s;
std::ifstream in("/tmp/test.txt");
std::getline(in, s);
std::cout << "Result: " << s << std::endl;
}
// Read Binary (print every byte in hex)
{
std::ifstream in("/tmp/test.txt", std::ios::binary);
in.unsetf(std::ios_base::skipws); // otherwise the spaces (0x20) would be skipped
std::istream_iterator<unsigned char> first(in);
std::istream_iterator<unsigned char> last;
std::vector<unsigned char> v(first, last);
std::cout << "Result: ";
for(unsigned c: v) std::cout << std::hex << c << ' ';
std::cout << std::endl;
}
return 0;
}
On Linux with a UTF-8 locale:
Result: Á Móti Sól
Result: c3 81 20 4d c3 b3 74 69 20 53 c3 b3 6c
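The unsetf(skipws) call is needed because istream_iterator performs formatted extraction. If you only want the raw bytes of a whole file, std::istreambuf_iterator never skips anything. A minimal sketch; read_file is a hypothetical helper name:

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read the whole file into a std::string, byte for byte: istreambuf_iterator
// does no whitespace skipping, and std::ios::binary avoids newline translation.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The resulting string holds exactly the bytes shown in the hex dump above, so it can be printed or searched directly.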

Related

How to convert hex representation from URL (%) to std::string (Chinese text)?

Intro
I have some input that I need to convert to the correct Chinese characters, but I think I'm stuck at the final number-to-string conversion. I have checked with an online hex-to-text converter that e6b9af corresponds to the text 湯.
MWE
Here is a minimal example that I made to illustrate the problem. The input is "%e6%b9%af" (obtained from a URL somewhere else).
#include <iostream>
#include <string>
std::string attempt(std::string path)
{
std::size_t i = path.find("%");
while (i != std::string::npos)
{
std::string sub = path.substr(i, 9);
sub.erase(i + 6, 1);
sub.erase(i + 3, 1);
sub.erase(i, 1);
std::size_t s = std::stoul(sub, nullptr, 16);
path.replace(i, 9, std::to_string(s));
i = path.find("%");
}
return path;
}
int main()
{
std::string input = "%E6%B9%AF";
std::string goal = "湯";
// convert input to goal
input = attempt(input);
std::cout << goal << " and " << input << (input == goal ? " are the same" : " are not the same") << std::endl;
return 0;
}
Output
湯 and 15120815 are not the same
Expected output
湯 and 湯 are the same
Additional question
Are all characters in foreign languages represented in 3 bytes or is that just for Chinese? Since my attempt assumes blocks of 3 bytes, is that a good assumption?
Based on your suggestions, and adapting an example from another post, this is what I came up with:
#include <iostream>
#include <string>
#include <sstream>
#include <cstdio>
std::string decode_url(const std::string& path)
{
std::stringstream decoded;
for (std::size_t i = 0; i < path.size(); i++)
{
if (path[i] != '%')
{
if (path[i] == '+')
decoded << ' ';
else
decoded << path[i];
}
else
{
unsigned int j = 0; // stays 0 if the two characters are not hex digits
sscanf(path.substr(i + 1, 2).c_str(), "%x", &j);
decoded << static_cast<char>(j);
i += 2;
}
}
return decoded.str();
}
int main()
{
std::string input = "%E6%B9%AF";
std::string goal = "湯";
// convert input to goal
input = decode_url(input);
std::cout << goal << " and " << input << (input == goal ? " are the same" : " are not the same") << std::endl;
return 0;
}
Output
湯 and 湯 are the same
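On the additional question: three bytes is not a safe assumption in general. UTF-8 uses 1 to 4 bytes per code point: 1 for ASCII, 2 for most accented Latin letters (Á is C3 81), 3 for most Chinese characters such as 湯 (E6 B9 AF), and 4 for characters outside the Basic Multilingual Plane, such as most emoji. decode_url above doesn't depend on this because it decodes each %XX separately. If you do need the sequence length, it can be read from the lead byte; a sketch with hypothetical helper names utf8_length and utf8_count:

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with lead byte b,
// or 0 if b cannot start a sequence.
std::size_t utf8_length(unsigned char b)
{
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // 10xxxxxx (continuation) or 11111xxx
}

// Count code points in a string that is assumed to be valid UTF-8.
std::size_t utf8_count(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        std::size_t len = utf8_length(static_cast<unsigned char>(s[i]));
        i += len ? len : 1;  // step over a stray byte instead of looping forever
    }
    return count;
}
```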

Extract data from text file in formatted/sequential way

I have a .txt file with this structure:
000010109000010309000010409
I need to read it like this:
00001 01 09
00001 03 09
00001 04 09
I have a structure like this:
struct A{
string number; // first 5 digits
int day; // next 2 digits
int month; // last 2 digits
};
I tried this, but it doesn't work:
char number[6];
char day[3];
char month[3];
ifstream read("file.txt");
for (int i = 0; i < 100; i++) {
read.get(number,6);
structure[i].number= number;
read.get(day,3);
structure[i].day= atoi(day);
read.get(month,3);
structure[i].month= atoi(month);
}
How do I read the data from the file and store it in this structure?
And how do I check whether the same number has several days in the same month?
Thanks a lot.
You could try something like this:
FILE *fp = fopen("file.txt", "r");
A a;
char x[6], y[3], z[3];
fscanf(fp, "%5s%2s%2s", x, y, z);
a.number = string(x);
a.day = atoi(y);
a.month = atoi(z);
EDIT:
It is valid C++ code (with <cstdio> and <cstdlib>), but if you want a more idiomatic C++ version, you could try something like this:
// read from file.txt
std::ifstream inFile("file.txt", std::ios::in);
char number[6], day[3], month[3];
int cnt = 0;
while (inFile.get(number, 6) && inFile.get(day, 3) && inFile.get(month, 3)) {
// Let's assume that structure[] is array of struct A
structure[cnt].number = number;
structure[cnt].day = atoi(day);
structure[cnt].month = atoi(month);
++cnt;
}
// store to file2.txt
std::ofstream outFile("file2.txt", std::ios::out);
for (int i = 0; i < cnt; ++i) {
outFile << structure[i].number << ' ' << structure[i].day << ' ' << structure[i].month << std::endl;
// If you need fixed-width for day/month, use std::setw and std::setfill
}
// compare first two
if (cnt >= 2) {
if (structure[0].number == structure[1].number &&
structure[0].month == structure[1].month &&
structure[0].day == structure[1].day) {
std::cout << "Same!" << std::endl;
} else {
std::cout << "Different!" << std::endl;
}
}
I'm using a stringstream here as an example, but the same works with an fstream. Convert number, day and month from char arrays to the required data types as necessary (all are '\0'-terminated).
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::string s = "000010109000010309000010409";
std::istringstream iss(s);
char number[6], day[3], month[3];
while (iss.get(number, 6) && iss.get(day, 3) && iss.get(month, 3))
std::cout << number << " " << day << " " << month << std::endl;
return 0;
}
https://ideone.com/tyBShW
00001 01 09
00001 03 09
00001 04 09
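For the second part of the question (does the same number have several days in the same month?), comparing only the first two records is not enough. One possible sketch, assuming the records have already been read into a std::vector<A> and that A has the three members from the question, groups the days by (number, month) with a std::map; days_per_month is a hypothetical name:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct A {
    std::string number; // first 5 digits
    int day;            // next 2 digits
    int month;          // last 2 digits
};

// Map each (number, month) pair to the set of distinct days seen for it.
std::map<std::pair<std::string, int>, std::set<int>>
days_per_month(const std::vector<A>& records)
{
    std::map<std::pair<std::string, int>, std::set<int>> groups;
    for (const A& r : records)
        groups[{r.number, r.month}].insert(r.day);
    return groups;
}
```

Any entry whose set has more than one element is a number with several days in the same month.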

How to add commas to a string using recursion

I'm a beginner at programming. I'm coding a school assignment, and it's asking me to add commas to a string using recursion. I have most of it done, but when I input a number greater than a million it doesn't add a comma before the first digit. This is what I have so far:
// commas - Convert a number (n) into a string, with commas
string commas(int n) {
ostringstream converted;
converted << n;
string number = converted.str();
int size = number.length();
if (size < 4 )
{
return number;
}
if (size >= 4 )
{
return number.substr(0, number.size() - 3) + "," + number.substr(number.size() - 3, number.length());
}
}
Any help would be greatly appreciated!
The algorithm is fairly simple. It is very similar to your solution, except that I added the part necessary for recursion. To understand how it works, try removing tack_on; the output then becomes:
1
10
100
These are the first groups that are returned when the terminating condition is reached (s.size() < 4). Then the rest of the groups are prefixed with a comma and "tacked on". The entire string is built using recursion. This is important because if you left number.substr(0, number.size() - 3) in, your output would look like this:
11,000
1010,000
100100,000
11,0001000,000
I use std::to_string, which requires C++11:
#include <iostream>
#include <string>
std::string addCommas(int n)
{
std::string s = std::to_string(n);
if (s.size() < 4) return s;
else
{
std::string tack_on = "," + s.substr(s.size() - 3, s.size());
return addCommas(n / 1000) + tack_on;
}
}
You only need to make minimal changes for the C++03/stringstream version:
#include <sstream>
std::ostringstream oss;
std::string addCommas(int n)
{
oss.str(""); // reset the shared stream; otherwise digits from earlier calls accumulate and the recursion never terminates
oss << n;
std::string s = oss.str();
// etc
}
Testing:
int main()
{
std::cout << addCommas(1) << "\n";
std::cout << addCommas(10) << "\n";
std::cout << addCommas(100) << "\n";
std::cout << addCommas(1000) << "\n";
std::cout << addCommas(10000) << "\n";
std::cout << addCommas(100000) << "\n";
std::cout << addCommas(1000000) << "\n";
return 0;
}
Output:
1
10
100
1,000
10,000
100,000
1,000,000
I think this one is a bit simpler and easier to follow:
std::string commas(int n)
{
std::string s = std::to_string(n%1000);
if ((n/1000) == 0) return s;
else
{
// Add zeros if required
while(s.size() < 3)
{
s = "0" + s;
}
return commas(n / 1000) + "," + s;
}
}
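The version above relies on n % 1000, which goes wrong for negative input: -1234 % 1000 is -234, so the result is "-1,-234". One possible sketch that keeps the sign separate and recurses on the unsigned magnitude (widened first so that INT_MIN is safe); group and commas_signed are hypothetical names:

```cpp
#include <climits>
#include <string>

// Group the digits of a non-negative magnitude into threes, recursively.
std::string group(unsigned long long m)
{
    std::string s = std::to_string(m % 1000);
    if (m < 1000) return s;          // leading group: no padding
    s.insert(0, 3 - s.size(), '0');  // inner groups are padded to three digits
    return group(m / 1000) + "," + s;
}

std::string commas_signed(int n)
{
    long long v = n;  // widen before negating so INT_MIN does not overflow
    if (v < 0) return "-" + group(static_cast<unsigned long long>(-v));
    return group(static_cast<unsigned long long>(v));
}
```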
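An alternative approach without recursion, using a std::numpunct facet that groups digits in threes:
#include <locale>
#include <sstream>
#include <string>
class Grouping3 : public std::numpunct< char >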
{
protected:
std::string do_grouping() const { return "\003"; }
};
std::string commas( int n )
{
std::ostringstream converted;
converted.imbue( std::locale( converted.getloc(), new Grouping3 ) );
converted << n;
return converted.str();
}
std::numpunct and std::locale are declared in <locale>, so include it explicitly rather than relying on other headers to pull it in.
A possible solution for the assignment could be:
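#include <sstream>
#include <string>
#include <utility>
std::string commas( std::string&& str )
{
return str.length() > 3?
commas( str.substr( 0, str.length()-3 ) ) + "," + str.substr( str.length()-3 ):
str;
}
std::string commas( int n )
{
std::ostringstream converted;
converted << n;
std::string s = converted.str();
// Keep a leading minus sign out of the grouping, otherwise -100 becomes "-,100"
if ( s[0] == '-' )
return "-" + commas( s.substr( 1 ) );
return commas( std::move( s ) );
}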

Extract integer from a string

I have a string like "y.x-name", where y and x are numbers ranging from 0 to 100. From this string, what would be the best method to extract x into an integer variable in C++?
You could split the string on '.' and convert each piece to an integer directly. The second number produced by the while loop is the one you want; see this sample code:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
template<typename T>
T stringToDecimal(const std::string& s)
{
T t = T();
std::stringstream ss(s);
ss >> t; // stops at the first non-digit, so "3-name" yields 3
return t;
}
int func()
{
std::string s("100.3-name");
std::vector<int> v;
std::stringstream ss(s);
std::string line;
while(std::getline(ss, line, '.'))
{
v.push_back(stringToDecimal<int>(line));
}
std::cout << v.back() << std::endl;
return v.back();
}
It will output: 3
It seems that this thread has a problem similar to yours; it might help:
Simple string parsing with C++
You can achieve it with boost::lexical_cast, which uses streams internally, like billz' answer.
The code would look like this (it does not check whether '.' and '-' are actually present):
#include <boost/lexical_cast.hpp>
#include <string>
std::string yxString = "56.74-name";
size_t xStart = yxString.find(".") + 1;
size_t xLength = yxString.find("-") - xStart;
int x = boost::lexical_cast<int>( yxString.substr( xStart, xLength ) );
Parsing errors can be handled by catching the boost::bad_lexical_cast exception that lexical_cast throws.
For more flexible / powerful text matching I suggest boost::regex.
Use two calls to unsigned long strtoul( const char *str, char **str_end, int base ), e.g.:
#include <cstdlib>
#include <iostream>
using namespace std;
int main(){
char const * s = "1.99-name";
char *endp;
unsigned long l1 = strtoul(s,&endp,10);
if (endp == s || *endp != '.') {
cerr << "Bad parse" << endl;
return EXIT_FAILURE;
}
s = endp + 1;
unsigned long l2 = strtoul(s,&endp,10);
if (endp == s || *endp != '-') {
cerr << "Bad parse" << endl;
return EXIT_FAILURE;
}
cout << "num 1 = " << l1 << "; num 2 = " << l2 << endl;
return EXIT_SUCCESS;
}
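With C++11, a shorter alternative is std::stoi, which parses the leading digits of a string and ignores whatever follows (so "74-name" gives 74). A sketch assuming the input always has the y.x-name shape; extract_x is a hypothetical name, and malformed input surfaces as std::invalid_argument or std::out_of_range:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Return x from a string of the form "y.x-name".
int extract_x(const std::string& s)
{
    std::size_t dot = s.find('.');
    if (dot == std::string::npos)
        throw std::invalid_argument("no '.' in " + s);
    // std::stoi stops at the first non-digit, i.e. at the '-'
    return std::stoi(s.substr(dot + 1));
}
```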

Return fixed length std::string from integer value

Problem: return a fixed-length string through a std::string*.
Target machine: Fedora 11.
I have to write a function that accepts an integer value and returns a fixed-length string through a string pointer.
For example, the int values are in the range 0 to -127,
so for the value 0 it should return 000,
for the value -9 it should return -009,
for the value -50 it should return -050,
for the value -110 it should return -110.
In short, the length should be the same in all cases.
What I have done: I have defined the function according to the requirement, as shown below.
Where I need help: I have written the function, but I am not sure if this is the correct approach. When I test it standalone on Windows, the exe stops working after some time, but when I include this function in the overall project on the Linux machine, it works flawlessly.
/* function(s)to implement fixed Length Rssi */
std::string convertString( const int numberRssi, std::string addedPrecison="" )
{
const std::string delimiter = "-";
stringstream ss;
ss << numberRssi ;
std::string tempString = ss.str();
std::string::size_type found = tempString.find( delimiter );
if( found == std::string::npos )// not found
{
tempString = "000";
}
else
{
tempString = tempString.substr( found+1 );
tempString = "-" +addedPrecison+tempString ;
}
return tempString;
}
std::string stringFixedLenght( const int number )
{
std::string str;
if( (number <= 0) && (number >= -9) )
{
str = convertString( number, "00");
}
else if( (number <= -10) && (number >= -99) )
{
str = convertString( number, "0");
}
else
{
str= convertString(number, "");
}
return str;
}
// somewhere in the project calling the function
ErrorCode A::GetNowString( std::string macAddress, std::string *pString )
{
ErrorCode result = ok;
int lvalue;
//some more code like iopening file and reading file
//..bla
// ..bla
// already got the value in lvalue ;
if( result == ok )
{
*pString = stringFixedLenght( lValue );
}
// some more code
return result;
}
You can use I/O manipulators to set the width that you need, and fill with zeros. For example, this program prints 00123:
#include <iostream>
#include <iomanip>
using namespace std;
int main() {
cout << setfill('0') << setw(5) << 123 << endl;
return 0;
}
You have to take care of the negative values yourself, though: cout << setfill('0') << setw(5) << -123 << endl prints 0-123, not -0123. Check if the value is negative, set the width to N-1, and add a minus in front.
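The recipe in the last sentence (check the sign, use width N-1, put the minus in front) might look like this; pad is a hypothetical name, and n is taken to be the total length including the sign:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Zero-pad value to a total of n characters; a minus sign counts toward n.
std::string pad(int value, int n)
{
    std::ostringstream os;
    long long mag = value;  // widened so that negating INT_MIN is safe
    if (mag < 0) {
        os << '-';
        mag = -mag;
        --n;  // the sign uses one of the n characters
    }
    os << std::setfill('0') << std::setw(n) << mag;
    return os.str();
}
```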
How about using std::ostringstream and the standard output formatting manipulators?
#include <iomanip>
#include <sstream>
#include <string>
std::string makeFixedLength(const int i, const int length)
{
std::ostringstream ostr;
if (i < 0)
ostr << '-';
ostr << std::setfill('0') << std::setw(length) << (i < 0 ? -i : i);
return ostr.str();
}
Note that your examples contradict your description: if the value is -9,
and the fixed length is 3, should the output be "-009" (as in your
example), or "-09" (as you describe)? If the former, the obvious
solution is to just use the formatting flags on std::ostringstream:
std::string
fixedWidth( int value, int width )
{
std::ostringstream results;
results.fill( '0' );
results.setf( std::ios_base::internal, std::ios_base::adjustfield );
results << std::setw( value < 0 ? width + 1 : width ) << value;
return results.str();
}
For the latter, just drop the conditional in the std::setw, and pass
width.
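For the "-09" reading, the function reduces to the following (fixedWidthTotal is a hypothetical name, chosen so it can sit next to fixedWidth); with std::ios_base::internal the fill characters go between the sign and the digits:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Total width includes the sign: -9 with width 3 becomes "-09".
std::string fixedWidthTotal(int value, int width)
{
    std::ostringstream results;
    results.fill('0');
    results.setf(std::ios_base::internal, std::ios_base::adjustfield);
    results << std::setw(width) << value;
    return results.str();
}
```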
For the record, although I would avoid it, this is one of the rare cases
where printf does something better than ostream. Using snprintf:
std::string
fixedWidth( int value, int width )
{
char buffer[100];
snprintf( buffer, sizeof(buffer), "%.*d", width, value );
return buffer;
}
You'd probably want to capture the return value of snprintf and add
some error handling after it, just in case (but 100 chars is
sufficient for most current machines).
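Following that suggestion, here is one hedged way to check the snprintf return value; fixedWidthChecked is a hypothetical name, and throwing is just one possible policy (retrying with a larger buffer would be another):

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// "%.*d": the precision is the minimum number of digits, so -9 with width 3 gives "-009".
std::string fixedWidthChecked(int value, int width)
{
    char buffer[100];
    int n = std::snprintf(buffer, sizeof(buffer), "%.*d", width, value);
    if (n < 0)
        throw std::runtime_error("snprintf failed");
    if (static_cast<std::size_t>(n) >= sizeof(buffer))  // output was truncated
        throw std::length_error("result does not fit in buffer");
    return std::string(buffer, static_cast<std::size_t>(n));
}
```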
I have nothing against the versions that use streams, but you can do it all yourself more simply than your code:
#include <algorithm>
#include <string>
std::string fixedLength(int value, int digits = 3) {
unsigned int uvalue = value;
if (value < 0) {
uvalue = -uvalue;
}
std::string result;
while (digits-- > 0) {
result += ('0' + uvalue % 10);
uvalue /= 10;
}
if (value < 0) {
result += '-';
}
std::reverse(result.begin(), result.end());
return result;
}
Something like this?
#include <cstdlib>
#include <string>
template <typename T>
std::string meh (T x)
{
const char* sign = x < 0 ? "-" : "";
const auto mag = std::abs (x);
if (mag < 10) return sign + std::string ("00" + std::to_string(mag));
if (mag < 100) return sign + std::string ("0" + std::to_string(mag));
return std::to_string(x);
}
#include <iostream>
int main () {
std::cout << meh(4) << ' '
<< meh(40) << ' '
<< meh(400) << ' '
<< meh(4000) << '\n';
std::cout << meh(-4) << ' '
<< meh(-40) << ' '
<< meh(-400) << ' '
<< meh(-4000) << '\n';
}
004 040 400 4000
-004 -040 -400 -4000