String literal to basic_string<unsigned char> - c++

When it comes to internationalization & Unicode, I'm an idiot American programmer. Here's the deal.
#include <string>
using namespace std;
typedef basic_string<unsigned char> ustring;
int main()
{
    static const ustring my_str = "Hello, UTF-8!"; // <== error here
    return 0;
}
This emits a not-unexpected complaint:
cannot convert from 'const char [14]' to 'std::basic_string<_Elem>'
Maybe I've had the wrong portion of coffee today. How do I fix this? Can I keep the basic structure:
ustring something = {insert magic incantation here};
?

Narrow string literals have type const char[N], and there are no unsigned string literals[1], so you'll have to cast:
ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");
Of course you can put that long thing into an inline function:
inline const unsigned char *uc_str(const char *s){
    return reinterpret_cast<const unsigned char*>(s);
}
ustring s = uc_str("Hello, UTF-8");
Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.
[1] Even where plain char happens to be unsigned (which is implementation-defined, blah, blah), char and unsigned char are still distinct types, so the cast is needed regardless.

Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.
A few helper functions to the rescue:
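inline ustring convert(const std::string& sys_enc) {
    return ustring( sys_enc.begin(), sys_enc.end() );
}
template< std::size_t N >
inline ustring convert(const char (&array)[N]) {
    // N counts the terminating '\0'; leave it out of the string.
    // Caveat: for string literals, overload resolution prefers the
    // non-template const char* overload below, so this one is rarely chosen.
    return ustring( array, array+N-1 );
}
inline ustring convert(const char* pstr) {
    return ustring( reinterpret_cast<const ustring::value_type*>(pstr) );
}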
Of course, none of these transcode anything: they copy the bytes unchanged, so they silently produce garbage whenever the input contains non-ASCII characters in an encoding other than UTF-8.

Related

Idiomatically convert signed to unsigned types if you know that value will be positive

Consider the following code. It's clear from the context that second always points past first, so the result of std::distance() is positive. How do I idiomatically convert from signed to unsigned without the compiler complaining about narrowing?
#include <cstdio>
#include <iterator>
#include <string_view>
#include <cstring>
std::string_view foo(const char* token)
{
    const char* first = std::strchr(token, 'H');
    const char* second = std::strchr(first+1, 'W');
    return std::string_view { first, std::distance(first, second) }; // <-- line in question
}
int main()
{
    const char* hello = "Hello World!";
    const auto view = foo(hello);
    printf("%.*s\n", static_cast<int>(view.size()), view.data());
}
Warning:
<source>: In function 'std::string_view foo(const char*)':
<source>:10:51: warning: narrowing conversion of 'std::distance<const char*>(first, second)' from 'std::iterator_traits<const char*>::difference_type' {aka 'long int'} to 'std::basic_string_view<char>::size_type' {aka 'long unsigned int'} [-Wnarrowing]
   10 |     return std::string_view { first, std::distance(first, second) };
      |                                      ~~~~~~~~~~~~~^~~~~~~~~~~~~~~
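You get this diagnostic because you are using list initialization, and in that context a narrowing conversion is ill-formed (GCC reports it as a warning when the value is not a constant expression; Clang reports it as an error). There are a couple of ways to fix this. You can switch from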
return std::string_view { first, std::distance(first, second) };
to
return std::string_view(first, std::distance(first, second));
which no longer uses list initialization, so the implicit conversion is permitted. Or you can be explicit and use static_cast:
return std::string_view{first, static_cast<std::size_t>(std::distance(first, second))};

How do I initialize a C++ vector in a header's constructor to fill it with pointers to a static char const?

header file:
#include <vector>
#ifndef FOO_H
#define FOO_H
class Foo
{
public:
    Foo(int s) : pegs (s, *TOKEN_EMPTY){}
    static char const TOKEN_EMPTY=' ';
protected:
    std::vector<char*> pegs;
};
#endif
When trying to build I get the error:
error: invalid type argument of unary ‘*’ (have ‘char’)
20 | Foo(int s) : pegs (s, *TOKEN_EMPTY){}
In case it isn't clear, I just want to initialize a C++ vector of pointers that point to a static char const variable.
I'm still checking StackOverflow and online, but I am hopeful someone can help me or lead me in the right direction!
Adding a few more points to the answer provided by @Roger:
*TOKEN_EMPTY is invalid: unary * dereferences a pointer, but TOKEN_EMPTY is a char. What you need is the address of TOKEN_EMPTY, so apply the & operator to the variable.
TOKEN_EMPTY is a single const char (not a string), so &TOKEN_EMPTY has type const char*. You therefore need to change the vector to hold const char*.
Alternatively, you could use const_cast to convert the const char* to char*, but modifying a const object through the resulting pointer is undefined behavior.
Here is some sample code:
#include <iostream>
#include <vector>
class Foo
{
public:
    Foo(int s) : pegs (s, &TOKEN_EMPTY){}
    static char const TOKEN_EMPTY=' ';
protected:
    std::vector<const char*> pegs;
};
int main() {
    // use class Foo
    return 0;
}
Changed to &TOKEN_EMPTY and had to use:
std::vector<const char*> pegs;

Why passing char* to a string argument generates a compilation error? [duplicate]

This question already has answers here: Pass a string in C++ (8 answers). Closed 5 years ago.
I was prototyping with the following code.
#include <vector>
#include <iostream>
#include <algorithm>
#include <string>
template<class container, class value> class Add {
public:
    Add(){};
    ~Add(){};
    void add_value(container& cont, value& val){
        std::for_each(cont.begin(), cont.end(), [&val](value& v){v +=val;});
    };
};
int main(int argc, char const *argv[])
{
    Add<std::vector<std::string>, std::string> a;
    std::vector<std::string> vec = {"a", "b", "c", "d"};
    std::string foo= "1";
    a.add_value(vec, foo); // compiles fine
    a.add_value(vec, "1"); // generates an error
    return 0;
}
and I got the following error:
template.cpp:28:25: error: invalid initialization of non-const reference of type ‘std::__cxx11::basic_string<char>&’ from an rvalue of type ‘std::__cxx11::basic_string<char>’
Why isn't it possible to pass a char* to a string argument?
As far as I know, an implicit conversion converts the char* to a std::string, and the result is passed to the function.
You defined the add_value as following:
void add_value(container& cont, value& val)
Because val is a non-const lvalue reference, it can only bind to an lvalue, i.e. an existing object (such as a named variable) that the function is allowed to modify.
When you pass "1" (a const char[2]), the compiler can convert it to a std::string, but only by creating a temporary, which is an rvalue. A non-const lvalue reference cannot bind to a temporary, which is exactly what the error message says. That is why your code does not compile.
If you take the parameter by reference to const instead, the temporary can bind to it and the code compiles:
void add_value(container& cont, const value& val)

Invalid conversion from ‘const char*’ to ‘unsigned char*’

A simple C++ program:
int main(){
    unsigned char* t="123";
}
gives the following error when compiled with g++:
invalid conversion from ‘const char*’ to ‘unsigned char*’ [-fpermissive]
Why?
In C++, string literals have the type "array of const char". For example, the string literal "123" has type const char[4].
With rare exceptions, arrays used in expressions are converted to pointers to their first elements.
So in this declaration
unsigned char* t="123";
the initializer has type const char *. There is no implicit conversion from const char * to unsigned char *: the pointee types differ, and the conversion would also discard const.
You could write
const unsigned char* t = reinterpret_cast<const unsigned char *>( "123" );
Another approach, which gets you a modifiable unsigned char array as you originally wanted, is:
#include <cstdlib>
#include <iostream>
using std::cout;
using std::endl;
int main()
{
    unsigned char ta[] = "123";
    unsigned char* t = ta;
    cout << t << endl; // Or ta.
    return EXIT_SUCCESS;
}
You can add const to both declarations if you wish, to get const unsigned char without an explicit cast.
Simply use char (or char*) in place of unsigned char (or unsigned char*) in the declaration:
char t[MAX_SIZE] = "123"; // MAX_SIZE should be defined earlier
and copy with the time-tested strcpy() and strncpy() functions.
Conversions from one type to another are easy when you use self-defined macros. Here is a set of macros you can use on any platform (Windows, Linux, Solaris, AIX, etc.):
#define M_ToCharPtr(p) reinterpret_cast<char*>(p) // Cast to char*
#define M_ToWCharPtr(p) reinterpret_cast<wchar_t*>(p) // Cast to wchar_t*
#define M_ToConstCharPtr(p) reinterpret_cast<const char*>(p) // Cast to const char*
#define M_ToConstWCharPtr(p) reinterpret_cast<const wchar_t*>(p) // Cast to const wchar_t*
#define M_ToUCharPtr(p) reinterpret_cast<unsigned char*>(p) // Cast to unsigned char*
#define M_ToConstUCharPtr(p) reinterpret_cast<const unsigned char*>(p) // Cast to const unsigned char*
#define M_ToVoidPtr(p) reinterpret_cast<void*>(p) // Cast to void*
#define M_ToConstVoidPtr(p) reinterpret_cast<const void*>(p) // Cast to const void*
#define M_ToIntPtr(n) reinterpret_cast<int*>(n) // Cast to int*
#define M_ToConstIntPtr(p) reinterpret_cast<const int*>(p) // Cast to const int*
#define M_ToDoublePtr(n) reinterpret_cast<double*>(n) // Cast to double*
#define M_ToConstDoublePtr(n) reinterpret_cast<const double*>(n) // Cast to const double*
#define M_ToBoolPtr(n) reinterpret_cast<bool*>(n) // Cast to bool*
#define M_ToConstBoolPtr(n) reinterpret_cast<const bool*>(n) // Cast to const bool*
// General Cast
#define M_To(T, p) reinterpret_cast<T>(p) // Cast to T
In your case
const unsigned char* t = reinterpret_cast<const unsigned char *>("UCHAR TO CONST UCHAR");
is equivalent to
const unsigned char* t = M_ToConstUCharPtr("UCHAR TO CONST UCHAR");

SWIG-generated Lua<-->C++ Wrapper mishandling primitive types renamed by typedef

I use SWIG to generate a C++ <--> Lua wrapper for a work project.
My main problem is that, at the base of this project, there are type definitions for each platform. For example, for Win32 there is a header Win32Types.h in which things like
typedef char Char;
typedef char TChar;
typedef signed int Int;
typedef unsigned int UInt;
typedef signed char Int8;
typedef unsigned char UInt8;
...
are defined.
The problem is that, with an example class like
class Named
{
public:
    Named();
    virtual ~Named();
    void SetName(const Char *name);
    const Char* GetName() const;
};
the SetName method generated in the SWIG wrapper looks something like this:
static int _wrap_Named_SetName(lua_State* L) {
    int SWIG_arg = 0;
    Named *arg1 = (Named *) 0 ;
    Char *arg2 = (Char *) 0 ;
    SWIG_check_num_args("Named::SetName",2,2)
    if(!SWIG_isptrtype(L,1))
        SWIG_fail_arg("Named::SetName",1,"Named *");
    if(!SWIG_isptrtype(L,2))
        SWIG_fail_arg("Named::SetName",2,"Char const *");
    if (!SWIG_IsOK(SWIG_ConvertPtr(L,1,(void**)&arg1,SWIGTYPE_p_Named,0))){
        SWIG_fail_ptr("Named_SetName",1,SWIGTYPE_p_Named);
    }
    if (!SWIG_IsOK(SWIG_ConvertPtr(L,2,(void**)&arg2,SWIGTYPE_p_Char,0))){
        SWIG_fail_ptr("Named_SetName",2,SWIGTYPE_p_Char);
    }
    ...
}
The problem here is that the wrapper treats Char * as a pointer to an opaque class type, although Char is just char renamed by a typedef.
Is there any way to circumvent this behaviour?
I tried to write a typemap like
%typemap(in) Char {
$1 = lua_tostring($input);
}
but I'm not sure I did it the right way...
There are two easier ways to do this:
Show SWIG the typedefs for that platform, probably using %include, so that it knows Char is just char.
Tell SWIG to reuse the normal char * typemap using %apply:
%apply const char * { const Char * };