here's what I am trying to do:
typedef uint16_t uchar16_t;
uchar16_t buf[32];
// buf will contain timezone information like GMT-6, Eastern Daylight Time, etc
char * str = "Test";
for (int i = 0; i <= strlen(str); i++)
buf[i] = str[i];
I guess that's not correct since uchar16_t would contain 2 bytes and str contains 1 byte.
What is it that I am supposed to do ?
Strlen? buf[32]? Trying to destroy the universe?
You want to use a wstringstream.
std::wstringstream lols;
lols << "Test";
std::wstring cakes;
lols >> cakes;
Edit#Comment:
You shouldn't use strlen because any decent string system allows embedded zeros, and strlen is seriously slow. In addition, you didn't resize your buffer as needed, so if you had a string of size > 31 you would get a buffer overflow. In addition, you would have to (if you did dynamically size your buffer) manually free it afterwards. Both of these things are serious failings of the C string system. My example code makes your standard library writer do all the work and avoid all these problems for you.
That's actually OK if your string will always be ASCII. To do it correctly, the portable function is mbstowcs which assumes you're converting from the default locale or if you're on Windows then there's API functions that let you specify the source code page explicitly.
Your code will work, as long as str is ASCII; calling strlen() in the loop condition is probably a bad idea, though. It might be easier to just use swprintf() if it's available on your system:
uchar16_t buf[32];
char *str = "Test";
swprintf(buf, sizeof buf, "%s", str);
Have a look here.
Also, is there a good reason you are defining your own type?
If you have a (narrow) char string, you cannot convert it to
a wchar_t string by setting your locale to "C" and then passing
the string through mbstowcs(). That's because the "C" locale specifies
a -particular- character encoding, and that encoding might not match
the encoding of the execution character set, so mbstowcs() might
map the characters to something unexpected, or could even fail
(if the execution character set happened to use encodings that
were incompatible with the encoding structure for the C locale
character set.)
Thus, in order to convert a char
string into a wider string, you have
to copy the chars one by one into an
array of wchar_t . If you need to work
with Unicode or utf-16 or whatever
after that, then wcstombs() is what
you should look at.
Related
Before(using ASCII) i was using std::string as buffer like this:
std::string test = "";
int value = 6;
test.append("some string");
test.append((char*)value, 4);
test.append("some string");
with expected value in test:
"some srtring\x6\x0\x0\x0somestring"
Now i am tring to use Unicode and i wanna keep the same "code" but trubles happens:
std::wstring test = "";
int value = 6;
test.append("some string");
test.append((wchar_t*)value, 4); (buffer overflow cause reading 8 bytes)
test.append("some string");
How can i append bytes like in std::string?
Doing:
std::wstring test = "";
int value = 6;
test.append("some string");
test.append((wchar_t*)value, 2);
test.append("some string");
Solve partially the problem cause after i can't append bools.
EDIT:
i can even use wstringstream if a binary copy is applied.(normally not)
You're confusing unicode and character encodings. An std::string can represent unicode code points just fine, using the UTF-8 encoding.
Windows uses the UTF-16LE (or UTF-16 with a BOM, I believe) encoding to represent unicode glyphs. Most others use UTF-8.
An std::string which is encoded in UTF-8 and which uses only ASCII characters can actually be interpreted as an ASCII string. This is the beauty of UTF-8. It's a natural extension.
Anyway,
i need a "binary" dynamic buffer, where i can add the real size of types(bool 1, int 4 etc)
An std::vector<uint8_t> is probably more suitable for this task. It communicates that it is not something human-readable, per se. If you need to embed strings into this buffer, make sure that sizeof(char) == sizeof(uint8_t) on the platform, and then just write the data as-is to this buffer.
If you're saving this buffer on one machine and try to read it on another machine, you have to take care of endianness too.
You make a function that reads the stuff you want to put:
void putBytes(std::wstring& s, char* c, int numBytes)
{
while (numBytes-- > 0)
s += (wchar_t)*c++;
}
Then you can call it:
int value = 65;
putBytes(s, reinterpret_cast<char*>(&value), sizeof(value));
I think a IStream is a proper way to do this...i'll make an interface to handle different types. I was abusing std::string for an easy "dynamic binary array", with std::wstring this is not possible,for many reasons but most silly one is that require at least 2 bytes, so no room for a bool
I need to convert utf16 text to utf8. The actual conversion code is simple:
std::wstring in(...);
std::string out = boost::locale::conv::utf_to_utf<char, wchar_t>(in);
However the issue is that the UTF16 is read from a file and it may or may not contain BOM. My code needs to be portable (minimum is windows/osx/linux). I'm really struggling to figure out how to create a wstring from the byte sequence.
EDIT: this is not a duplicate of the linked question, as in that question the OP needs to convert a wide string into an array of bytes - and I need to convert the other way around.
You should not use wide types at all in your case.
Assuming you can get a char * from your vector<char>, you can stick to bytes by using the following code:
char * utf16_buffer = &my_vector_of_chars[0];
char * buffer_end = &my_vector_of_chars[vector.size()];
std::string utf8_str = boost::locale::conv::between(utf16_buffer, buffer_end, "UTF-8", "UTF-16");
between operates on 8-bit characters and allows you to avoid conversion to 16-bit characters altogether.
It is necessary to use the between overload that uses the pointer to the buffer's end, because by default, between will stop at the first '\0' character in the string, which will be almost immediately because the input is UTF-16.
I have a wide char variable which I want to initialize with a size of string.
I tried following but didn't worked.
std::string s = "aaaaaaaaaaaaaaaaaaaaa"; //this could be any length
const int Strl = s.length();
wchar_t wStr[Strl ]; // This throws error message as constant expression expected.
what option do i have to achieve this? will malloc work in this case?
Since this is C++, use new instead of malloc.
It doesn't work because C++ doesn't support VLA's. (variable-length arrays)
The size of the array must be a compile-time constant.
wchar_t* wStr = new wchar_t[Strl];
//free the memory
delete[] wStr;
First of all, you can't just copy a string to a wide character array - everything is going to go berserk on you.
A std::string is built with char, a std::wstring is built with wchar_t. Copying a string to a wchar_t[] is not going to work - you'll get gibberish back. Read up on UTF8 and UTF16 for more info.
That said, as Luchian says, VLAs can't be done in C++ and his heap allocation will do the trick.
However, I must ask why are you doing this? If you're using std::string you shouldn't (almost) ever need to use a character array. I assume you're trying to pass the string to a function that takes a character array/pointer as a parameter - do you know about the .c_str() function of a string that will return a pointer to the contents?
std::wstring ws;
ws.resize(s.length());
this will give you a wchar_t container that will serve the purpose , and be conceptually a variable length container. And try to stay away from C style arrays in C++ as much as possible, the standard containers fit the bill in every circumstance, including interfacing with C api libraries. If you need to convert your string from char to wchar_t , c++11 introduced some string conversion functions to convert from wchar_t to char, but Im not sure if they work the other way around.
std::strlen doesn't handle c strings that are not \0 terminated. Is there a safe version of it?
PS I know that in c++ std::string should be used instead of c strings, but in this case my string is stored in a shared memory.
EDIT
Ok, I need to add some explanation.
My application is getting a string from a shared memory (which is of some length), therefore it could be represented as an array of characters. If there is a bug in the library writing this string, then the string would not be zero terminated, and the strlen could fail.
You've added that the string is in shared memory. That's guaranteed readable, and of fixed size. You can therefore use size_t MaxPossibleSize = startOfSharedMemory + sizeOfSharedMemory - input; strnlen(input, MaxPossibleSize) (mind the extra n in strnlen).
This will return MaxPossibleSize if there's no \0 in the shared memory following input, or the string length if there is. (The maximal possible string length is of course MaxPossibleSize-1, in case the last byte of shared memory is the first \0)
C strings that are not null-terminated are not C strings, they are simply arrays of characters, and there is no way of finding their length.
If you define a c-string as
char* cowSays = "moo";
then you autmagically get the '\0' at the end and strlen would return 3. If you define it like:
char iDoThis[1024] = {0};
you get an empty buffer (and array of characters, all of which are null characters). You can then fill it with what you like as long as you don't over-run the buffer length. At the start strlen would return 0, and once you have written something you would also get the correct number from strlen.
You could also do this:
char uhoh[100];
int len = strlen(uhoh);
but that would be bad, because you have no idea what is in that array. It could hit a null character you might not. The point is that the null character is the defined standard manner to declare that the string is finished.
Not having a null character means by definition that the string is not finished. Changing that will break the paradigm of how the string works. What you want to do is make up your own rules. C++ will let you do that, but you will have to write a lot of code yourself.
EDIT
From your newly added info, what you want to do is loop over the array and check for the null character by hand. You should also do some validation if you are expecting ASCII characters only (especially if you are expecting alpha-numeric characters). This assumes that you know the maximum size.
If you do not need to validate the content of the string then you could use one of the strnlen family of functions:
http://msdn.microsoft.com/en-us/library/z50ty2zh%28v=vs.80%29.aspx
http://linux.about.com/library/cmd/blcmdl3_strnlen.htm
size_t safe_strlen(const char *str, size_t max_len)
{
const char * end = (const char *)memchr(str, '\0', max_len);
if (end == NULL)
return max_len;
else
return end - str;
}
Yes, since C11:
size_t strnlen_s( const char *str, size_t strsz );
Located in <string.h>
Get a better library, or verify the one you have - if you can't trust you library to do what it says it will, then how the h%^&l do you expect your program to?
Thats said, Assuming you know the length of the buiffer the string resides, what about
buffer[-1+sizeof(buffer)]=0 ;
x = strlen(buffer) ;
make buffer bigger than needed and you can then test the lib.
assert(x<-1+sizeof(buffer));
C11 includes "safe" functions such as strnlen_s. strnlen_s takes an extra maximum length argument (a size_t). This argument is returned if a null character isn't found after checking that many characters. It also returns the second argument if a null pointer is provided.
size_t strnlen_s(const char *, size_t);
While part of C11, it is recommended that you check that your compiler supports these bounds-checking "safe" functions via its definition of __STDC_LIB_EXT1__. Furthermore, a user must also set another macro, __STDC_WANT_LIB_EXT1__, to 1, before including string.h, if they intend to use such functions. See here for some Stack Overflow commentary on the origins of these functions, and here for C++ documentation.
GCC and Clang also support the POSIX function strnlen, and provide it within string.h. Microsoft too provide strnlen which can also be found within string.h.
You will need to encode your string. For example:
struct string
{
size_t len;
char *data;
} __attribute__(packed);
You can then accept any array of characters if you know the first sizeof(size_t) bytes of the shared memory location is the size of the char array. It gets tricky when you want to chain arrays this way.
It's better to trust your other end to terminate it's strings or roll your own strlen that does not go outside the bounderies of the shared memory segment (providing you know at least the size of that segment).
If you need to get the size of shared memory, try to use
// get memory size
struct shmid_ds shm_info;
size_t shm_size;
int shm_rc;
if((shm_rc = shmctl(shmid, IPC_STAT, &shm_info)) < 0)
exit(101);
shm_size = shm_info.shm_segsz;
Instead of using strlen you can use shm_size - 1 if you are sure that it is null terminated. Otherwise you can null terminate it by data[shm_size - 1] = '\0'; then use strlen(data);
a simple solution:
buff[BUFF_SIZE -1] = '\0'
ofc this will not tell you if the string originally was exactly BUFF_SIZE-1 long or it was just not terminated... so you need xtra logic for that.
How about this portable nugget:
int safeStrlen(char *buf, int max)
{
int i;
for(i=0;buf[i] && i<max; i++){};
return i;
}
As Neil Butterworth already said in his answer above: C-Strings which are not terminated by a \0 character, are no C-Strings!
The only chance you do have is to write an immutable Adaptor or something which creates a valid copy of the C-String with a \0 terminating character. Of course, if the input is wrong and there is an C-String defined like:
char cstring[3] = {'1','2','3'};
will indeed result in unexpected behavior, because there can be something like 123#4x\0 in the memory now. So the result of of strlen() for example is now 6 and not 3 as expected.
The following approach shows how to create a safe C-String in any case:
char *createSafeCString(char cStringToCheck[]) {
//Cast size_t to integer
int size = static_cast<int>(strlen(cStringToCheck)) ;
//Initialize new array out of the stack of the method
char *pszCString = new char[size + 1];
//Copy data from one char array to the new
strncpy(pszCString, cStringToCheck, size);
//set last character to the \0 termination character
pszCString[size] = '\0';
return pszCString;
}
This ensures that if you manipulate the C-String to not write on the memory of something else.
But this is not what you wanted. I know, but there is no other way to achieve the length of a char array without termination. This isn't even an approach. It just ensures that even if the User (or Dev) is inserting ***** to work fine.
I'm trying to make changes to some legacy code. I need to fill a char[] ext with a file extension gotten using filename.Right(3). Problem is that I don't know how to convert from a CStringT to a char[].
There has to be a really easy solution that I'm just not realizing...
TIA.
If you have access to ATL, which I imagine you do if you're using CString, then you can look into the ATL conversion classes like CT2CA.
CString fileExt = _T ("txt");
CT2CA fileExtA (fileExt);
If a conversion needs to be performed (as when compiling for Unicode), then CT2CA allocates some internal memory and performs the conversion, destroying the memory in its destructor. If compiling for ANSI, no conversion needs to be performed, so it just hangs on to a pointer to the original string. It also provides an implicit conversion to const char * so you can use it like any C-style string.
This makes conversions really easy, with the caveat that if you need to hang on to the string after the CT2CA goes out of scope, then you need to copy the string into a buffer under your control (not just store a pointer to it). Otherwise, the CT2CA cleans up the converted buffer and you have a dangling reference.
Well you can always do this even in unicode
char str[4];
strcpy( str, CStringA( cString.Right( 3 ) ).GetString() );
If you know you AREN'T using unicode then you could just do
char str[4];
strcpy( str, cString.Right( 3 ).GetString() );
All the original code block does is transfer the last 3 characters into a non unicode string (CStringA, CStringW is definitely unicode and CStringT depends on whether the UNICODE define is set) and then gets the string as a simple char string.
First use CStringA to make sure you're getting char and not wchar_t. Then just cast it to (const char *) to get a pointer to the string, and use strcpy or something similar to copy to your destination.
If you're completely sure that you'll always be copying 3 characters, you could just do it the simple way.
ext[0] = filename[filename.Length()-3];
ext[1] = filename[filename.Length()-2];
ext[2] = filename[filename.Length()-1];
ext[3] = 0;
I believe this is what you are looking for:
CString theString( "This is a test" );
char* mychar = new char[theString.GetLength()+1];
_tcscpy(mychar, theString);
If I remember my old school MS C++.
You do not specify where is the CStringT type from. It could be anything, including your own implementation of string handling class. Assuming it is CStringT from MFC/ATL library available in Visual C++, you have a few options:
It's not been said if you compile with or without Unicode, so presenting using TCHAR not char:
CStringT
<
TCHAR,
StrTraitMFC
<
TCHAR,
ChTraitsCRT<TCHAR>
>
> file(TEXT("test.txt"));
TCHAR* file1 = new TCHAR[file.GetLength() + 1];
_tcscpy(file1, file);
If you use CStringT specialised for ANSI string, then
std::string file1(CStringA(file));
char const* pfile = file1.c_str(); // to copy to char[] buffer