Process foreign language in C++

Hello, I want to process Arabic text in my C++ code.
My code:
int main() {
wchar_t s[] = "ﺳﺄﻟﺘﻚِ ﺏﺎﻠﻠﻫ - ﻳﺎ ﻥﻭﺭ ﻊﻴﻨﻳ -\nﺏﺩﻮﻨﻳ ﻫﻞ ﻒﻳ ﺍﻟﺤﻴﺎﺓِ ﺣﻴﺎﺓ ؟\nﺖﻋﺎﻠ";
for (auto ch : s) {
cout << ch;
}
}
I received this error:
error: int-array initialized from non-wide string
I also tried std::wstring:
int main() {
wstring s = "ﺳﺄﻟﺘﻚِ ﺏﺎﻠﻠﻫ - ﻳﺎ ﻥﻭﺭ ﻊﻴﻨﻳ -\nﺏﺩﻮﻨﻳ ﻫﻞ ﻒﻳ ﺍﻟﺤﻴﺎﺓِ ﺣﻴﺎﺓ ؟\nﺖﻋﺎﻠ";
for (wchar_t ch : s) {
cout << ch;
}
}
but received this error:
conversion from ‘const char [148]’ to non-scalar type ‘std::__cxx11::wstring {aka std::__cxx11::basic_string<wchar_t>}’ requested

You could do it like this:
wchar_t s[] =L"ﺳﺄﻟﺘﻚِ ﺏﺎﻠﻠﻫ - ﻳﺎ ﻥﻭﺭ ﻊﻴﻨﻳ -\nﺏﺩﻮﻨﻳ ﻫﻞ ﻒﻳ ﺍﻟﺤﻴﺎﺓِ ﺣﻴﺎﺓ ؟\nﺖﻋﺎﻠ";
The whole program is here:
#include <iostream>
int main() {
wchar_t s[] =L"ﺳﺄﻟﺘﻚِ ﺏﺎﻠﻠﻫ - ﻳﺎ ﻥﻭﺭ ﻊﻴﻨﻳ -\nﺏﺩﻮﻨﻳ ﻫﻞ ﻒﻳ ﺍﻟﺤﻴﺎﺓِ ﺣﻴﺎﺓ ؟\nﺖﻋﺎﻠ";
for (auto ch : s) {
std::cout << ch;
}
}
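The output of the execution is a list of numeric wchar_t codes, printed back to back: std::cout has no overload that prints a wchar_t as a character, so each one is printed as an integer (since C++20 this insertion is ill-formed and no longer compiles). The trailing 0 is the string's null terminator, which the range-for loop over the array also visits: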
6520365156652476517665242161632651676516665248652486525932453265267651663265253652616519732652266526865256652673245106516765193652626525665267326525965246326523465267326516565247651886526865166651711616326518765268651666517132156710651746522765166652480

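Wide character literals and wide character string literals in C and C++ must be prefixed by L, for example L'A' and L"hello". The same applies to std::wstring, which cannot be initialized from a narrow string literal (that is the second error). Also, in this particular case you will likely need to write to std::wcout rather than std::cout.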

Related

Section character (§) overflows char type (C++)

When trying to define the char:
char q = '§';
CLion reports an error: "Character too large for enclosing character literal type". This seems odd, because if I look up the ASCII code of § it is just 167.
If I use:
char c;
std::string q = "§";
for (char el:q) {
c = el;
std::cout << c;
}
the output reads: §
and:
int c;
std::string q = "§";
for (char el:q) {
c = (int) el;
std::cout << c;
}
outputs: -62-89
So it seems that the character overflows the char type.
I am implementing RSA encryption using unsigned long long int instead of int, and the overflow still occurs, which corrupts the decrypted data. How can I convert this character, and potentially others that may overflow the char type, into their respective ASCII values (for this example, (char)'§' should return 167)?
Conversion with unsigned long long int:
#define ull unsigned long long int
int main() {
ull c;
std::string q = "§";
for (char el:q) {
c = (ull) el;
std::cout << c;
}
}
output: 1844674407370955155418446744073709551527
using wchar_t also did not fix the issue.
One way around it is to use a Unicode string literal:
auto q = u"\u00A7";
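Unicode string literals (u for UTF-16, U for UTF-32) can in general be used like ordinary strings (std::u16string and std::u32string are the matching std::basic_string types), but when you iterate over them or index into them, each element has the corresponding character type: char16_t or char32_t. Here q is a const char16_t*, and q[0] holds the value 167. As for the original output: § is not an ASCII character. 167 (U+00A7) is its code point in Latin-1 and Unicode, but your source file is evidently saved as UTF-8, where § is stored as the two bytes 0xC2 0xA7. Read through a signed char, those bytes are -62 and -89, and converting them to unsigned long long wraps around to the huge values shown above.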

Why can't I use the atoi() function on a "string" instead of a char pointer in C++?

I tried the following code and it gives me an error.
int main() {
string String = "1235";
int num = atoi(String);
cout << num << endl;
return 0;
}
/*
error: cannot convert 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}' to 'const char*' for argument '1' to 'int atoi(const char*)'
int num = atoi(String);
^
mingw32-make.exe[1]: *** [Debug/main.cpp.o] Error 1
mingw32-make.exe: *** [All] Error 2
*/
But if I use the following code it works perfectly fine.
int main() {
const char* String = "1235"; // string literals are const in C++
int num = atoi(String);
cout << num << endl;
return 0;
}
//prints out 1235
I know I can solve my problem using the stoi() function.
int main() {
string String = "1235";
int num = stoi(String);
cout << num << endl;
return 0;
}
//prints out 1235
I can solve my problem by using a char pointer instead of a string, but I want to know why I can't pass the string itself to atoi(). How does atoi() work internally?
Because there is no implicit conversion from std::string to const char*, passing a std::string to atoi() causes an error.
If you still want to use std::string:
int main() {
string String = "1235";
int num = atoi(String.c_str());
cout << num << endl;
return 0;
}
see this ref.
While std::stoi accepts std::string as input, ::atoi does not.
Note: std::string is a C++ class type, whereas const char* is a plain pointer type. std::string does, however, have a member function .c_str(), which returns its C-style representation as a const char*.
Prototype declarations of std::stoi in <string>:
int stoi( const std::string& str, std::size_t* pos = nullptr, int base = 10 );
int stoi( const std::wstring& str, std::size_t* pos = nullptr, int base = 10);
Prototype declaration of ::atoi in <stdlib.h>:
int atoi (const char *__nptr);

expected primary expression before ; token

I am trying to write a function that will split a string based on a given character and return a vector of the resulting strings, but I am getting compilation errors on the lines of my for loops. Any ideas why? I should be able to assign str[0] to a char pointer, correct?
/*
splits string by a given character and returns a vector of each segment
if string = "ab cd ef" and split_char = " " it will return a vector with
"ab" in first location "cd" in second location and "ef" in third location
*/
vector<string> split_string(string string_to_split, const char split_char)
{
//deletes leading split characters
int num_leading_split_char = 0;
for (char * c = string_to_split[0]; c* == split_char; c++)
{
num_leading_split_char++;
}
string_to_split.erase(0, num_leading_split_char);
//makes the split string vector
vector<string> split_string;
string temp_string = "";
for (char * c = string_to_split[0]; c*; c++)
{
if (*c == split_char)
{
split_string.push_back(temp_string); //add string to vector
temp_string = ""; //reset temp string
}
else
{
temp_string += *c; //adds char to temp string
}
}
return split_string;
}
Error message:
pgm5.cpp: In function ‘std::vector<std::__cxx11::basic_string<char> > split_string(std::__cxx11::string, char)’:
pgm5.cpp:257:34: error: invalid conversion from ‘__gnu_cxx::__alloc_traits<std::allocator<char> >::value_type {aka char}’ to ‘char*’ [-fpermissive]
for (char * c = string_to_split[0]; c* == split_char; c++)
                                 ^
pgm5.cpp:257:40: error: expected primary-expression before ‘==’ token
for (char * c = string_to_split[0]; c* == split_char; c++)
                                       ^~
pgm5.cpp:269:34: error: invalid conversion from ‘__gnu_cxx::__alloc_traits<std::allocator<char> >::value_type {aka char}’ to ‘char*’ [-fpermissive]
for (char * c = string_to_split[0]; c*; c++)
                                 ^
pgm5.cpp:269:39: error: expected primary-expression before ‘;’ token
for (char * c = string_to_split[0]; c*; c++)
                                      ^
Compilation failed.
I should be able to assign std::string str[0] to a char* pointer, correct?
No. str[0] is a char (more precisely, a reference to one), not a char* pointer, and that is exactly what your compiler is telling you. The "expected primary-expression" errors come from writing c* instead of *c: dereferencing a pointer uses a prefix *. Since you're already using std::string, why not simply leverage some of the nice operations it provides, like substr and find, so you don't need to reinvent the wheel (or maybe it's a flat tire in your case; kidding). Also, it's a good idea to pass non-POD types that you're not modifying as const references to avoid unnecessary copies, i.e. const std::string &. I know your code has an erase operation, but in this example there is no need to modify the string being passed in.
std::vector<std::string> split_string(const std::string &string_to_split, char delim)
{
std::vector<std::string> results;
size_t match = 0;
size_t found = 0;
// match each part of string_to_split on delim and tokenize into results
// if delim char is never found then return empty vector
while ((found = string_to_split.find(delim, match)) != std::string::npos)
{
results.push_back(string_to_split.substr(match, found - match));
match = found + 1; // start again at next character
}
// after the loop, if any match was found store the last token
if (match != 0)
{
results.push_back(string_to_split.substr(match));
}
return results;
}
If you are tokenizing on spaces, you can use it like this:
std::string test("This is a test.");
std::vector<std::string> tokenized = split_string(test, ' ');
for (const auto& s : tokenized)
{
std::cout << "s=" << s << std::endl;
}
This yields the following results:
s=This
s=is
s=a
s=test.
Try this (you can run it at http://cpp.sh/8r6ze):
#include <sstream> // std::istringstream
#include <string>
#include <vector>
#include <iostream>
std::vector<std::string> split_string(const std::string string_to_split,
const char delimiter)
{
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream(string_to_split);
while (std::getline(tokenStream, token, delimiter))
{
tokens.push_back(token);
}
return tokens;
}
int main ()
{
const std::string theString{"thy with this"};
for (const auto &s:split_string(theString,' ')){
std::cout << s <<std::endl;
}
return 0;
}

C++ strange function behavior

I've been working in C++ a bit lately and only using a small subset of the language (I'd call it C with classes), so I've been messing around trying to learn about some of the other features of the language. To this end I was going to write a simple JSON parser, and almost immediately hit a roadblock I cannot decipher. Here's the code:
//json.hpp
#include <cstring>
namespace JSON
{
const char WS[] = {0x20,0x09,0x0A,0x0D};
const char CONTROL[] = {0x5B,0x7B,0x5D,0x7D,0x3A,0x2C};
bool is_whitespace(char c) {
if (strchr(WS, c) != nullptr)
return true;
return false;
}
bool is_control_char(char c) {
if (strchr(CONTROL, c) != nullptr)
return true;
return false;
}
}
And here's main.cpp:
#include <iostream>
#include "json.hpp"
using namespace std;
int main(int argc, char **argv) {
for(int i=0; i < 127; i++) {
if(JSON::is_whitespace((char) i)) {
cout << (char) i << " is whitespace." << endl;
}
if(JSON::is_control_char((char) i)) {
cout << (char) i << " is a control char." << endl;
}
}
return 0;
}
I'm just trying to check whether a char is valid JSON whitespace or a valid JSON control character. This is the output:
is whitespace.
is a control char.
is whitespace.
is whitespace.
is whitespace.
is whitespace.
, is whitespace.
, is a control char.
: is whitespace.
: is a control char.
[ is whitespace.
[ is a control char.
] is whitespace.
] is a control char.
{ is whitespace.
{ is a control char.
} is whitespace.
} is a control char.
I've been staring at this for a while now. I don't even know what search term to put into Google to describe this error (or feature?)... any explanations would be greatly appreciated.
If you read the requirements on strchr:
const char* strchr( const char* str, int ch );
str - pointer to the null-terminated byte string to be analyzed
Whereas you are passing in:
const char WS[] = {0x20,0x09,0x0A,0x0D};
const char CONTROL[] = {0x5B,0x7B,0x5D,0x7D,0x3A,0x2C};
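Neither of those is a null-terminated byte string, so strchr keeps reading past the end of the array, which is undefined behavior. In your run it evidently continued into CONTROL, which is why ',', ':', '[' and so on were also reported as whitespace. You could manually add a 0: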
const char WS[] = {0x20,0x09,0x0A,0x0D, 0x0};
const char CONTROL[] = {0x5B,0x7B,0x5D,0x7D,0x3A,0x2C, 0x0};
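Even then, strchr treats the terminating null character as part of the string, so is_whitespace('\0') would still return true. Better yet, don't rely on null termination at all: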
template <size_t N>
bool contains(const char (&arr)[N], char c) {
return std::find(arr, arr+N, c) != (arr+N);
}
bool is_whitespace(char c) { return contains(WS, c); }
bool is_control_char(char c) { return contains(CONTROL, c); }
In C++11:
template <size_t N>
bool contains(const char (&arr)[N], char c) {
return std::find(std::begin(arr), std::end(arr), c) !=
std::end(arr);
}

AscW equivalent from VB in C++

I am using the function AscW (in VB6) to convert a Unicode character into its character code.
I would like to know if there is an equivalent to this function in C++.
For example, I would like to get the value 32 from the character " ".
I would like to do the following:
wstring wstringToLower(wstring u)
{
wstring s;
for (int i=0;i<u.size();i++)
{
wstring sChar;
sChar=u.substr(i,1);
int iChar=static_cast<int>(sChar);
int iNewChar=charCodeToLower(iChar);
wstring sNewChar;
sNewChar=wstring(iNewChar,1);
s+=sNewChar;
}
return s;
}
The error "No conversion function found for std::wstring to int" is raised on the line
int iChar=static_cast<int>(sChar);
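A std::wstring cannot be cast to int, but each of its elements is a wchar_t, which can. Index into the string and cast the element to int:
#include <iostream>
#include <string>
int main()
{
std::wstring u = L"abc";
std::wstring sChar = u.substr(1, 1);
for (std::wstring::size_type i = 0; i < sChar.size(); ++i)
std::cout << static_cast<int>(sChar[i]); // sChar[i] is a wchar_t; prints 98 for 'b'
}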