Given below is my sample code :
int function1(unsigned char *out, int length){
unsigned long crypto_out_len = 16;
unsigned char crypto_out[16] = {0};
.......
//produces 16 bytes output & stores in crypto_out
crypto_function(crypto_out, crypto_out_len);
//lets say crypto_output contents after are : "abcdefghijklmnop"
.......
memcpy(out, crypto_out,length);
return 0;
}
void function2(){
unsigned char out[10] = {0};
function1(out, 10);
std::pair<unsigned char *,int> map_entry;
map_entry.first = out;
map_entry.second = 10;
}
Now, map_entry.first should contain : "abcdefghij", right?
But it contains "abcdefghij#$%f1^", some garbage associated with it. How should I avoid such unexpected behaviour so that map_entry.first should contain exactly "abcdefghij".
Since you haven't pasted the whole code, I can't be 100% sure but I think I know what's wrong. memcpy() is behaving correctly here, and everything is 100% defined behavior.
In this case, out is a 10-character string without a null terminator. You assign it to unsigned char* that contains no length information, and I suspect you simply don't use the number ten when you are referring to map_entry.first.
If you print it as unsigned char* or construct a std::string with it, C++ expects it to be a null-terminated string. Therefore, it reads it up until the first null character. Now, since out didn't have one it just runs over and starts reading characters on the stack after out which happen to be what you see as garbage.
What you need to do, is make sure that either the string is null-terminated or make sure that you always refer to it specifying the correct length. For the former, you'd want to make out 11-byte long, and leave the last byte as 0:
void function2(){
unsigned char out[11] = {0};
function1(out, 10);
std::pair<unsigned char *,int> map_entry;
map_entry.first = out;
map_entry.second = 10;
}
Please also note that C++ will actually stop at the first null character it encounters. If your crypto_function() may output zero bytes in the middle of the string, you should be aware that the string will be truncated at that point.
For the latter, you'd have to use functions that actually allow you to specify the string length, and always pass the length of 10 to those. If you always work with it like this, you don't have to worry about zero bytes from crypto_function().
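A minimal sketch of the second option, assuming the bytes should end up in a std::string. The helper name bytes_to_string is made up for illustration. It uses the std::string constructor that takes a pointer and an explicit length, so no terminator is needed and zero bytes in the middle are kept:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly `length` bytes, so no terminator is
// required and embedded zero bytes are preserved.
std::string bytes_to_string(const unsigned char* data, std::size_t length) {
    return std::string(reinterpret_cast<const char*>(data), length);
}
```

With the code from the question, bytes_to_string(map_entry.first, map_entry.second) would give a string of exactly 10 characters.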
You are confusing char[] with strings. out does contain your expected data, but it's not 0 terminated, so if you try to display it as a string it may look like it contains extra data. If the data is actually strings, you need to correctly 0 terminate them.
I am using the WinAPI for one of the first times, and I have a function that returns a UCHAR*, but I need it as a std::string, because when I try printing it as a UCHAR* it prints a lot of gibberish. There must be some easy way to fix this problem. I Googled this but could not find anything. I don't even know what a UCHAR* is, although it seems to act as some kind of string. I heard that it is a pointer to an unsigned string, but I am not quite sure what that means.
This should work
char temp[5];
memcpy(temp, battery_info.Chemistry, 4);
temp[4] = '\0'; // add nul terminator
std::string s = temp; // convert to string
Because your source data does not necessarily have the usual nul terminator, I've copied the data to a temporary char array, added a nul terminator to make sure, then converted to a std::string.
Since the members of that structure are not null terminated:
std::string chemistry(battery_info.Chemistry, battery_info.Chemistry + 4);
This will get you the behavior you want without having to do a memcpy.
Could someone explain why those calls are not returning the same expected result?
unsigned int GetDigit(const string& s, unsigned int pos)
{
// Works as intended
char c = s[pos];
return atoi(&c);
// doesn't give expected results
return atoi(&s[pos]);
return atoi(&static_cast<char>(s[pos]));
return atoi(&char(s[pos]));
}
Remark: I'm not looking for the best way to convert a char to an int.
None of your attempts are correct, including the "works as intended" one (it just happened to work by accident). For starters, atoi() requires a NUL-terminated string, which you are not providing.
How about the following:
unsigned int GetDigit(const string& s, unsigned int pos)
{
return s[pos] - '0';
}
This assumes that you know that s[pos] is a valid decimal digit. If you don't, some error checking is in order.
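The error checking mentioned above could look roughly like this. It is only a sketch: the name GetDigitChecked and the choice of -1 as the error value are assumptions, not part of the original code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the digit value at `pos`, or -1 if `pos` is out of range or the
// character there is not a decimal digit. (-1 as the error value is an
// assumption of this sketch.)
int GetDigitChecked(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return -1;
    // std::isdigit must not be called with a negative char value.
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    if (!std::isdigit(c)) return -1;
    return c - '0';
}
```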
What you are doing is taking a std::string, getting one character from its internal representation and feeding a pointer to it into atoi, which expects a const char* that points to a NUL-terminated string. Before C++11, a std::string was not guaranteed to store its characters with a terminating zero, so it's just luck that your C++ implementation seems to do this.
The correct way would be to ask std::string for a zero-terminated version of its contents using s.c_str(), then call atoi using a pointer to it.
Your code contains another problem, you are casting the result of atoi to an unsigned int, while atoi returns a signed int. What if your string is "-123"?
Since int atoi(const char* s) accepts a pointer to a field of characters, your last three uses return a number corresponding to the consecutive digits beginning with &s[pos], e.g. it can give 123 for a string like "123", starting at position 0. Since the data inside a std::string are not required to be null-terminated, the answer can be anything else on some implementation, i.e. undefined behaviour.
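To see the "consecutive digits" effect without relying on how the string stores its data, here is a small sketch that goes through c_str(), which is always zero-terminated. ParseFrom is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Lets atoi parse from position `pos` onward. atoi skips leading whitespace,
// accepts an optional sign, then consumes digits until the first non-digit
// character or the terminating '\0' supplied by c_str().
int ParseFrom(const std::string& s, std::size_t pos) {
    if (pos > s.size()) return 0;  // out of range: nothing to parse
    return std::atoi(s.c_str() + pos);
}
```

For "123", position 0 yields 123 and position 1 yields 23, which is why this is not a way to read a single digit.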
Your "working" approach also uses undefined behaviour.
It's different from the other attempts since it copies the value of s[pos] to another location.
It seems to work only as long as the adjacent byte in memory next to character c accidentally happens to be a zero or a non-digit character, which is not guaranteed. So follow the advice given by @aix.
To make it work really you could do the following:
char c[2] = { s[pos], '\0' };
return atoi(c);
If you want to access the data as a C string, use s.c_str() and then pass it to atoi.
atoi expects a C-style string; std::string is a C++ class with different behavior and characteristics. For starters, it doesn't have to be NUL-terminated.
atoi takes a pointer to char as its argument. In the first try, when you are using the char c, it gets a pointer to a single character, which happens to give the answer you want. However, in the other attempts what you pass is a pointer to a char that is the beginning of a string of chars, so I assume what you are getting from atoi in the later attempts is a number converted from the chars in positions pos, pos+1, pos+2 and so on, up to the end of the string s.
If you really want to convert just a single char in the string at the position (as opposed to a substring starting at that position and ending at the end of the string), you can do it these ways:
int GetDigit(const string& s, const size_t& pos) {
return atoi(string(1, s[pos]).c_str());
}
int GetDigit2(const string& s, const size_t& pos) {
const char n[2] = {s[pos], '\0'};
return atoi(n);
}
for example.
I was working with a program that uses a function to set a new value in the registry, I used a const char * to get the value. However, the size of the value is only four bytes. I've tried to use std::string as a parameter instead, it didn't work.
I have a small example to show you what I'm talking about, and rather than solving my problem with the function I'd like to know the reason it does this.
#include <iostream>
void test(const char * input)
{
std::cout << input;
std::cout << "\n" << sizeof("THIS IS A TEST") << "\n" << sizeof(input) << "\n";
/* The code above prints out the size of an explicit string (THIS IS A TEST), which is 15. */
/* It then prints out the size of input, which is 4.*/
int sum = 0;
for(int i = 0; i < 15; i++) //Printed out each character, added the size of each to sum and printed it out.
//The result was 15.
{
sum += sizeof(input[i]);
std::cout << input[i];
}
std::cout << "\n" << sum;
}
int main(int argc, char * argv[])
{
test("THIS IS A TEST");
std::cin.get();
return 0;
}
Output:
THIS IS A TEST
15
4
THIS IS A TEST
15
What's the correct way to get string parameters? Do I have to loop through the whole array of characters and print each to a string (the value in the registry was only the first four bytes of the char)? Or can I use std::string as a parameter instead?
I wasn't sure if this was SO material, but I decided to post here as I consider this to be one of my best sources for programming related information.
sizeof(input) is the size of a const char*. What you want is strlen(input) + 1.
sizeof("THIS IS A TEST") is the size of a const char[15]. sizeof gives the size of the array when passed an array type, which is why it is 15.
For std::string use length()
sizeof gives a size based on the type you give it as a parameter. If you use the name of a variable, sizeof still only bases its result on the type of that variable. In the case of char *whatever, it's telling you the size of a pointer to char, not the size of the zero-terminated buffer it points at. If you want the latter, you can use strlen instead. Note that strlen tells you the length of the content of the string, not including the terminating '\0'. As such, if (for example) you want to allocate space to duplicate a string, you need to add 1 to the result to tell you the total space occupied by the string.
Yes, as a rule in C++ you normally want to use std::string instead of pointers to char. In this case, you can use your_string.size() (or, equivalently, your_string.length()).
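The difference between the three measurements can be put side by side. This is just an illustrative sketch; the function names are made up.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// sizeof applied to a pointer: the size of the pointer itself
// (commonly 4 or 8 bytes), unrelated to the text it points at.
std::size_t pointer_size(const char* input) { return sizeof(input); }

// strlen: number of characters before the terminating '\0'.
std::size_t content_length(const char* input) { return std::strlen(input); }

// Bytes needed to store a copy, including the terminator.
std::size_t storage_needed(const char* input) { return std::strlen(input) + 1; }
```

For "THIS IS A TEST", content_length gives 14 and storage_needed gives 15, which matches sizeof applied to the literal itself.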
std::string is a C++ object, which cannot be passed to most APIs. Most APIs take char*, as you noticed, which is very different from a std::string. However, since this is a common need, std::string has a function for that: c_str.
std::string input;
const char* ptr = input.c_str(); //note, is const
In C++11, it is now also safe-ish to do this:
char* ptr = &input[0]; //nonconst
and you can alter the characters, but the size is fixed, and the pointer is invalidated if you call any mutating member of the std::string.
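A rough sketch of both directions, assuming a C-style function that fills a caller-provided buffer. fill_buffer is a hypothetical stand-in for such an API, not a real library call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C API that writes into a caller-provided buffer
// (hypothetical, for illustration only).
void fill_buffer(char* dest, std::size_t size) {
    std::memset(dest, 'x', size);
}

// Writing into a std::string through a non-const pointer.
std::string read_via_c_api(std::size_t size) {
    std::string result(size, '\0');  // fix the size up front
    if (!result.empty())
        fill_buffer(&result[0], result.size());  // storage is contiguous since C++11
    return result;
}

// Passing a std::string to an API that takes a const char*.
std::size_t c_length(const std::string& input) {
    return std::strlen(input.c_str());
}
```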
As for the code you posted, "THIS IS A TEST" has the type of const char[15], which has a size of 15 bytes. The char* input however, has a type char* (obviously), which has a size of 4 on your system. (Might be other sizes on other systems)
To find the size of a C string pointed at by a char* pointer, you can call strlen(...) if it is NUL-terminated. It will return the number of characters before the first NUL character.
If the registry you speak of is the Windows registry, it may be an issue of Unicode vs. ASCII.
Modern Windows stores almost all strings as UTF-16, which uses 2 bytes for most characters.
If you try to put such a string into a std::string through a char pointer, every other byte of ASCII-range text is a 0 (null), which byte-oriented string code treats as "end of string."
You may try using a std::wstring (wide string) or vector< wchar_t > (wide character type). These can store strings of two-byte characters.
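The truncation effect can be reproduced with a small sketch. It uses char16_t as a stand-in for a two-byte wide character (an assumption made for portability), and the exact narrow length depends on byte order: 1 on little-endian machines, 0 on big-endian ones.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length seen by byte-oriented code that is handed two-byte characters.
// Inspecting an object's bytes through a char pointer is permitted.
std::size_t narrow_view_length(const char16_t* text) {
    return std::strlen(reinterpret_cast<const char*>(text));
}

// Length in two-byte units, counted up to the two-byte terminator.
std::size_t wide_length(const char16_t* text) {
    return std::char_traits<char16_t>::length(text);
}
```

For u"TEST", wide_length gives 4 while narrow_view_length stops at the first zero byte.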
sizeof() is also not giving you the value you may think it is giving you. Your system probably runs 32-bit Windows -- that "4" value is the size of the pointer to the first character of that string.
If this doesn't help, please post the specific results that occur when you use std::string or std::wstring (more than saying that it doesn't work).
To put it simply, the size of a const char * != the size of a const char[] (if they are equal, it's by coincidence). The former is a pointer. A pointer, in the case of your system, is 4 bytes REGARDLESS of the datatype. It could be int, char, float, whatever. This is because a pointer is always a memory address, and is numeric. Print out the value of your pointer and you'll see it's actually 4 bytes. const char[] now, is the array itself and will return the length of the array when requested.
std::strlen doesn't handle c strings that are not \0 terminated. Is there a safe version of it?
PS I know that in c++ std::string should be used instead of c strings, but in this case my string is stored in a shared memory.
EDIT
Ok, I need to add some explanation.
My application is getting a string from a shared memory (which is of some length), therefore it could be represented as an array of characters. If there is a bug in the library writing this string, then the string would not be zero terminated, and the strlen could fail.
You've added that the string is in shared memory. That's guaranteed readable, and of fixed size. You can therefore use size_t MaxPossibleSize = startOfSharedMemory + sizeOfSharedMemory - input; strnlen(input, MaxPossibleSize) (mind the extra n in strnlen).
This will return MaxPossibleSize if there's no \0 in the shared memory following input, or the string length if there is. (The maximal possible string length is of course MaxPossibleSize-1, in case the last byte of shared memory is the first \0)
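Put together, the bounded call might look like the sketch below. The parameter names are assumptions: region and region_size describe the shared memory block, and input must point somewhere inside it. strnlen is POSIX rather than standard C++, but it is available on common platforms.

```cpp
#include <cstddef>
#include <string.h>  // strnlen (POSIX; also provided by Microsoft's CRT)

// `region` and `region_size` describe the shared memory block; `input`
// must point inside it. All three names are assumptions of this sketch.
std::size_t bounded_length(const char* region, std::size_t region_size,
                           const char* input) {
    const std::size_t max_possible =
        static_cast<std::size_t>(region + region_size - input);
    // Returns max_possible when no '\0' exists before the end of the region.
    return strnlen(input, max_possible);
}
```

A result equal to the remaining size of the region signals that the writer never terminated the string.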
C strings that are not null-terminated are not C strings, they are simply arrays of characters, and there is no way of finding their length.
If you define a c-string as
const char* cowSays = "moo";
then you automatically get the '\0' at the end and strlen would return 3. If you define it like:
char iDoThis[1024] = {0};
you get an empty buffer (an array of characters, all of which are null characters). You can then fill it with what you like as long as you don't overrun the buffer length. At the start strlen would return 0, and once you have written something you would also get the correct number from strlen.
You could also do this:
char uhoh[100];
int len = strlen(uhoh);
but that would be bad, because you have no idea what is in that array. It might hit a null character, or it might not. The point is that the null character is the standard way to mark the end of the string.
Not having a null character means by definition that the string is not finished. Changing that will break the paradigm of how the string works. What you want to do is make up your own rules. C++ will let you do that, but you will have to write a lot of code yourself.
EDIT
From your newly added info, what you want to do is loop over the array and check for the null character by hand. You should also do some validation if you are expecting ASCII characters only (especially if you are expecting alpha-numeric characters). This assumes that you know the maximum size.
If you do not need to validate the content of the string then you could use one of the strnlen family of functions:
http://msdn.microsoft.com/en-us/library/z50ty2zh%28v=vs.80%29.aspx
http://linux.about.com/library/cmd/blcmdl3_strnlen.htm
size_t safe_strlen(const char *str, size_t max_len)
{
const char * end = (const char *)memchr(str, '\0', max_len);
if (end == NULL)
return max_len;
else
return end - str;
}
Yes, since C11:
size_t strnlen_s( const char *str, size_t strsz );
Located in <string.h>
Get a better library, or verify the one you have. If you can't trust your library to do what it says it will, then how the h%^&l do you expect your program to?
That said, assuming you know the length of the buffer the string resides in, what about
buffer[-1+sizeof(buffer)]=0 ;
x = strlen(buffer) ;
make buffer bigger than needed and you can then test the lib.
assert(x<-1+sizeof(buffer));
C11 includes "safe" functions such as strnlen_s. strnlen_s takes an extra maximum length argument (a size_t). This argument is returned if a null character isn't found after checking that many characters. If a null pointer is provided, it returns zero.
size_t strnlen_s(const char *, size_t);
While part of C11, it is recommended that you check that your compiler supports these bounds-checking "safe" functions via its definition of __STDC_LIB_EXT1__. Furthermore, a user must also set another macro, __STDC_WANT_LIB_EXT1__, to 1, before including string.h, if they intend to use such functions. See here for some Stack Overflow commentary on the origins of these functions, and here for C++ documentation.
GCC and Clang also support the POSIX function strnlen, and provide it within string.h. Microsoft also provides strnlen, which can likewise be found within string.h.
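A sketch of how the macro check could be combined with a fallback. The wrapper name bounded_strlen is hypothetical. Where the library does not advertise the bounds-checking functions, it falls back to memchr, which gives the same results for the cases described above.

```cpp
#define __STDC_WANT_LIB_EXT1__ 1  // must come before the include
#include <string.h>
#include <stddef.h>

// Hypothetical wrapper name. Uses strnlen_s where the library advertises
// support for it and falls back to memchr otherwise.
size_t bounded_strlen(const char* str, size_t max_len) {
#ifdef __STDC_LIB_EXT1__
    return strnlen_s(str, max_len);
#else
    if (str == NULL) return 0;  // mirrors strnlen_s for a null pointer
    const void* end = memchr(str, '\0', max_len);
    return end ? static_cast<size_t>(static_cast<const char*>(end) - str)
               : max_len;
#endif
}
```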
You will need to encode your string. For example:
struct string
{
size_t len;
char *data;
} __attribute__((packed));
You can then accept any array of characters if you know the first sizeof(size_t) bytes of the shared memory location is the size of the char array. It gets tricky when you want to chain arrays this way.
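Reading such a length-prefixed string could be sketched as follows. The layout is an assumption: a size_t length stored at the start of the region, immediately followed by that many bytes (rather than a pointer, which would not be meaningful across processes). read_prefixed is a made-up name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Assumed layout: a size_t length, immediately followed by that many bytes.
// Returns false if the region is too small for the header or if the stored
// length would run past the end of the region.
bool read_prefixed(const unsigned char* region, std::size_t region_size,
                   std::string& out) {
    std::size_t len = 0;
    if (region_size < sizeof len) return false;
    std::memcpy(&len, region, sizeof len);  // avoids unaligned reads
    if (len > region_size - sizeof len) return false;
    out.assign(reinterpret_cast<const char*>(region + sizeof len), len);
    return true;
}
```

The bounds check on the stored length matters as much here as the terminator does for strlen: a corrupted prefix must not be trusted.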
It's better to trust your other end to terminate its strings or roll your own strlen that does not go outside the boundaries of the shared memory segment (providing you know at least the size of that segment).
If you need to get the size of shared memory, try to use
// get memory size
struct shmid_ds shm_info;
size_t shm_size;
int shm_rc;
if((shm_rc = shmctl(shmid, IPC_STAT, &shm_info)) < 0)
exit(101);
shm_size = shm_info.shm_segsz;
Instead of using strlen you can use shm_size - 1 if you are sure that it is null terminated. Otherwise you can null terminate it by data[shm_size - 1] = '\0'; then use strlen(data);
A simple solution:
buff[BUFF_SIZE - 1] = '\0';
Of course, this will not tell you whether the string was originally exactly BUFF_SIZE-1 long or was just not terminated, so you need extra logic for that.
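One possible shape for that extra logic, as a sketch: look for an existing terminator before forcing one. The function name terminate_buffer is made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Forces a terminator into the last byte and reports whether a terminator
// was already present somewhere in the buffer beforehand.
bool terminate_buffer(char* buff, std::size_t buff_size) {
    if (buff_size == 0) return false;
    const bool was_terminated = std::memchr(buff, '\0', buff_size) != nullptr;
    buff[buff_size - 1] = '\0';
    return was_terminated;
}
```

A return value of false means the last byte held data that has now been overwritten, so the caller can treat the string as truncated.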
How about this portable nugget:
int safeStrlen(char *buf, int max)
{
int i;
for(i=0; i<max && buf[i]; i++){} // check the bound first so buf[max] is never read
return i;
}
As Neil Butterworth already said in his answer above: C-Strings which are not terminated by a \0 character, are no C-Strings!
The only chance you have is to write an immutable adaptor or something similar that creates a valid copy of the C-String with a terminating \0 character. Of course, if the input is wrong and there is a C-String defined like:
char cstring[3] = {'1','2','3'};
This will indeed result in unexpected behavior, because memory may now contain something like 123#4x\0. The result of strlen(), for example, is then 6 and not 3 as expected.
The following approach shows how to create a safe C-String in any case:
char *createSafeCString(char cStringToCheck[]) {
//Cast size_t to integer
int size = static_cast<int>(strlen(cStringToCheck)) ;
//Allocate a new array on the heap (the caller must delete[] it)
char *pszCString = new char[size + 1];
//Copy data from one char array to the new
strncpy(pszCString, cStringToCheck, size);
//set last character to the \0 termination character
pszCString[size] = '\0';
return pszCString;
}
This ensures that later manipulation of the C-String does not write into memory that belongs to something else.
But this is not what you wanted, I know. There is no other way to obtain the length of a char array without termination, and this isn't even an approach to that. It just ensures that things keep working even if the user (or dev) passes in *****.
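If the capacity of the array is known, a copy can be made without ever calling strlen on the unterminated input. The following is only a sketch under that assumption, and to_safe_string is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Copies at most `capacity` characters, stopping early at the first '\0'.
// Never reads outside [buf, buf + capacity).
std::string to_safe_string(const char* buf, std::size_t capacity) {
    const char* end = std::find(buf, buf + capacity, '\0');
    return std::string(buf, end);
}
```

For the cstring[3] array above, to_safe_string(cstring, sizeof cstring) yields "123", and the resulting std::string manages its own terminated storage.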
Suppose that I have a unsigned char*, let's call it: some_data
unsigned char* some_data;
And some_data has url-like data in it. for example:
"aasdASDASsdfasdfasdf&Foo=cow&asdfasasdfadsfdsafasd"
I have a function that can grab the value of 'foo' as follows:
// looks for the value of 'foo'
bool grabFooValue(const std::string& p_string, std::string& p_foo_value)
{
size_t start = p_string.find("Foo="), end;
if(start == std::string::npos)
return false;
start += 4;
end = p_string.find_first_of("& ", start);
p_foo_value = p_string.substr(start, end - start);
return true;
}
The trouble is that I need a string to pass to this function, or at least a char* (which can be converted to a string no problem).
I can solve this problem by casting:
reinterpret_cast<char *>(some_data)
And then pass it to the function all okie-dokie
...
Until I used valgrind and found out that this can lead to a subtle memory leak.
Conditional jump or move depends on uninitialised value(s) __GI_strlen
From what I gathered, it has to do with the reinterpret casting messing up the null indicating the end of the string. Thus when C++ tries to figure out the length of the string, things get screwy.
Given that I can't change the fact that some_data is represented by an unsigned char*, is there a way to go about using my grabFooValue function without having these subtle problems?
I'd prefer to keep the value-finding function that I already have, unless there is clearly a better way to rip the foo-value out of this (sometimes large) unsigned char*.
And despite the unsigned char* some_data 's varying, and sometimes large size, I can assume that the value of 'foo' will be somewhere early on, so my thoughts were to try and get a char* of the first X characters of the unsigned char*. This could potentially get rid of the string-length issue by having me set where the char* ends.
I tried using a combination of strncpy and casting but so far no dice. Any thoughts?
You need to know the length of the data your unsigned char * points to, since it isn't 0-terminated.
Then, use e.g:
std::string s((char *) some_data, (char *) some_data + len);
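Combined with the idea from the question of only looking at the first X characters, this could be sketched as below. The function name and the max_prefix parameter are assumptions. The search logic follows the question's grabFooValue, with the not-found case for the closing delimiter spelled out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Builds a std::string from at most `max_prefix` bytes of the data, then
// searches it. `len` is the known size of the data; `max_prefix` is an
// assumed upper bound on where the value can appear.
bool grabFooValueFromBytes(const unsigned char* data, std::size_t len,
                           std::size_t max_prefix, std::string& foo_value) {
    const std::size_t n = std::min(len, max_prefix);
    const std::string s(reinterpret_cast<const char*>(data), n);
    const std::size_t key = s.find("Foo=");
    if (key == std::string::npos) return false;
    const std::size_t start = key + 4;
    const std::size_t end = s.find_first_of("& ", start);
    // substr clamps the count, so npos means "to the end of the string".
    foo_value = s.substr(start, end == std::string::npos ? end : end - start);
    return true;
}
```

Because the length is passed explicitly, nothing ever calls strlen on the raw data, which is what valgrind was complaining about.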