this is one area of C/C++ that i have never been good at.
my problem is that i have a string that will need to eventually contain some null characters. treating everything as a char array (or string) won't work, as things tend to crap out when they find the first null. so i thought, ok, i'll switch over to uint8_t, so everything is just a number. i can move things around as needed, and cast it back to a char when i'm ready.
my main question right now is: how can i copy a portion of a string to an uint8_t buffer?
effectively, i'd like to do something like:
std::string s = "abcdefghi";
uint8_t *val = (uint8_t*)malloc(s.length() + 1);
memset(val, 0, s.length() + 1);
// Assume offset is just some number
memcpy(val + offset, s.substr(1, 5).c_str(), 5);
obviously, i get an error when i try this. there is probably some sort of trickery that can be done in the first argument of the memcpy (i see stuff like (*(uint8_t*)) online, and have no clue what that means).
any help on what to do?
and while i am here, how can i easily cast this back to a char array? just static_cast the uint8_t pointer to a char pointer?
thanks a lot.
i thought, ok, i'll switch over to uint8_t, so everything is just a number.
That's not going to make algorithms that look for a '\0' suddenly stop doing it, nor do algorithms that use char have to pay attention to '\0'. Signaling the end with a null character is a convention of C strings, not of char arrays. On most platforms uint8_t is just a typedef for unsigned char anyway.
As Nicol Bolas points out, std::string is already capable of storing strings that contain the null character without treating the null character specially.
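To make that concrete, here is a minimal sketch. The helper names make_with_null and count_nulls are made up for illustration; the only real API used is the std::string (pointer, length) constructor.

```cpp
#include <cstddef>
#include <string>

// Builds the 5-byte string "ab\0cd". The (pointer, length) constructor is
// used because constructing from the bare literal would stop at the '\0'.
std::string make_with_null() {
    return std::string("ab\0cd", 5);
}

// Counts the zero bytes stored inside the string (hypothetical helper).
std::size_t count_nulls(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == '\0') ++n;
    }
    return n;
}
```

size() reports all five bytes, and push_back('\0') appends one more like any other character. Only functions that take a bare const char* (such as printf with %s) stop at the first null.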
As for your question, I'm not sure what error you're referring to, as the following works just fine:
#include <iostream>
#include <string>
#include <cstdint>
#include <cstdlib>
#include <cstring>
int main() {
    std::string s = "abcdefghi";
    std::uint8_t *val = (std::uint8_t*)std::malloc(s.length() + 1);
    std::memset(val, 0, s.length() + 1);
    int offset = 2;
    std::memcpy(val + offset, s.substr(1, 5).c_str(), 5);
    std::cout << (val+offset) << '\n';
    std::free(val); // release the buffer obtained from malloc
}
The memcpy line takes the second through sixth characters from the string s and copies them into val. The line with cout then prints "bcdef".
Of course this is C++, so if you want to manually allocate some memory and zero it out you can do so like:
std::unique_ptr<uint8_t[]> val(new uint8_t[s.length()+1]());
or use a vector:
std::vector<uint8_t> val(s.length()+1,0);
To cast from an array of uint8_t you could (but typically shouldn't) do the following:
char *c = reinterpret_cast<char*>(val);
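If you go the vector route, the copy itself can be done without the temporary that substr creates. This is only a sketch: copy_part is a hypothetical helper name, and throwing on a bad range is just one possible policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Copies `count` characters of `s`, starting at `pos`, into `buf` at `offset`.
// Throws std::out_of_range instead of reading or writing out of bounds.
void copy_part(const std::string& s, std::size_t pos, std::size_t count,
               std::vector<std::uint8_t>& buf, std::size_t offset) {
    if (pos > s.size() || count > s.size() - pos)
        throw std::out_of_range("source range is outside the string");
    if (offset > buf.size() || count > buf.size() - offset)
        throw std::out_of_range("destination range is outside the buffer");
    std::copy_n(s.begin() + static_cast<std::ptrdiff_t>(pos), count,
                buf.begin() + static_cast<std::ptrdiff_t>(offset));
}
```

With s = "abcdefghi" and a zero-filled buffer of s.size() + 1 bytes, copy_part(s, 1, 5, buf, 2) writes "bcdef" into positions 2 through 6 and leaves the rest at zero.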
Well, the code works ok, it copies the substring in val. However, you will have 0s on all the positions until the offset.
e.g. for offset=2 val would be {0, 0, b, c, d, e, f, 0, 0, 0}
If you print this, it will show nothing because the string is null terminated on the first position (I guess this is the error you were talking about...).
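One way around that, sketched below, is to carry the length along instead of relying on a terminator. bytes_to_string is a hypothetical helper, not something from the question.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Copies every byte into a std::string, zero bytes included, because the
// length is passed explicitly instead of being found by scanning for '\0'.
std::string bytes_to_string(const std::vector<std::uint8_t>& bytes) {
    return std::string(reinterpret_cast<const char*>(bytes.data()), bytes.size());
}
```

The result can then be written in full with std::cout.write(str.data(), str.size()), which does not stop at the leading zeros the way operator<< on a char pointer does.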
I want to calculate a hash of the structure by passing it as a string. Although the vlanId values are different, the hash value is still the same. The StringHash() function calculates the hash. I haven't assigned any value to portId and vsi.
#include <stdio.h>
#include <functional>
#include <cstring>
#include <string>
using namespace std;
unsigned long StringHash(unsigned char *Arr)
{
hash<string> str_hash;
string Str((const char *)Arr);
unsigned long str_hash_value = str_hash(Str);
printf("Hash=%lu\n", str_hash_value);
return str_hash_value;
}
typedef struct
{
unsigned char portId;
unsigned short vlanId;
unsigned short vsi;
}VlanConfig;
int main()
{
VlanConfig v1;
memset(&v1,0,sizeof(VlanConfig));
unsigned char *index = (unsigned char *)&v1 + sizeof(unsigned char);
v1.vlanId = 10;
StringHash(index);
StringHash((unsigned char *)&v1);
v1.vlanId = 12;
StringHash(index);
StringHash((unsigned char *)&v1);
return 0;
}
Output:
Hash=6142509188972423790
Hash=6142509188972423790
Hash=6142509188972423790
Hash=6142509188972423790
You pass the bytes of your structure to a function expecting a zero terminated string. Well, the first byte of your structure already is zero, so you calculate the same hash every time.
Now, that is the explanation why, but not the solution to your problem. Passing a random sequence of bytes to a function expecting a zero-terminated sequence of characters is going to fail spectacularly, no matter how you do it.
Find another way to hash your structure. You are already using hash<>, why not use it for your case:
namespace std
{
template<> struct hash<VlanConfig>
{
std::size_t operator()(VlanConfig const& c) const noexcept
{
std::size_t h1 = std::hash<char>{}(c.portId);
std::size_t h2 = std::hash<short>{}(c.vlanId);
std::size_t h3 = std::hash<short>{}(c.vsi);
return h1 ^ (h2 << 1) ^ (h3 << 2); // or use boost::hash_combine
}
};
}
Then you can do this:
VlanConfig myVariable;
// fill myVariable
std::cout << std::hash<VlanConfig>{}(myVariable) << std::endl;
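For what the comment about hash_combine could look like without Boost, here is a rough sketch. The mixing step is modeled on the one boost::hash_combine has traditionally used; hash_config is a made-up name, and the struct is repeated only so the snippet stands on its own.

```cpp
#include <cstddef>
#include <functional>

struct VlanConfig {
    unsigned char portId;
    unsigned short vlanId;
    unsigned short vsi;
};

// Mixes the hash of `value` into `seed`. Unlike a plain XOR of shifted
// hashes, the result depends on the order in which fields are combined.
template <class T>
void hash_combine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes the struct field by field, so padding bytes never take part.
std::size_t hash_config(const VlanConfig& c) {
    std::size_t seed = 0;
    hash_combine(seed, c.portId);
    hash_combine(seed, c.vlanId);
    hash_combine(seed, c.vsi);
    return seed;
}
```

Two configurations with equal fields always hash equally; different fields will usually, though not provably always, give different values.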
I can't say for certain, but most likely your issue is structure padding. Unless explicitly told to pack members and ignore alignment, most compilers will lay out the struct as follows:
Byte 0: portId
Byte 1: padding
Bytes 2,3: vlanId
Bytes 4,5: vsi
So when you figure the address of index, it'll point to the padding byte, which is always zero. Thus you're always hashing an empty string.
You should be able to check this in a debugger by inspecting index and comparing it to the address of vlanId.
-- Edit --
After giving this some more thought, I have to say that in my extremely humble opinion, this isn't a good way to get a hash value. Trying to treat several numeric values that might, or might not, be contiguous in memory as a std::string, has too many possibilities for error.
Start with the fact that even if you do get the address correct, consider what happens when you hash two different configurations, one of which has vlanId set to 256, while the other has it set to 512. Assuming a little endian machine, both of those will have a zero byte as the first character of the string, and so you're right back here again.
Worse yet is the case when all four bytes in vlanId and vsi are non zero. In that case, you'll read right off the end of your struct, and keep on going, reading who knows what. There's no way that's going to end well.
One possible solution is to figure the size of data, and use the following ctor for std::string: string (char const *s, size_t n); which has the advantage of forcing the string to exactly the size you want.
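A sketch of that last idea, under the assumption that only vlanId and vsi should take part in the hash. vlan_key and hash_vlan are hypothetical names, and the struct is repeated so the snippet is self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>

struct VlanConfig {
    unsigned char portId;
    unsigned short vlanId;
    unsigned short vsi;
};

// Serializes vlanId and vsi into a string of fixed length. The length is
// passed explicitly, so zero bytes inside the values are kept and nothing
// is read past the end of the fields.
std::string vlan_key(const VlanConfig& c) {
    char buf[sizeof c.vlanId + sizeof c.vsi];
    std::memcpy(buf, &c.vlanId, sizeof c.vlanId);
    std::memcpy(buf + sizeof c.vlanId, &c.vsi, sizeof c.vsi);
    return std::string(buf, sizeof buf);
}

std::size_t hash_vlan(const VlanConfig& c) {
    return std::hash<std::string>{}(vlan_key(c));
}
```

With this, vlanId values of 256 and 512 produce different keys even on a little-endian machine, because the key is not cut short at the first zero byte.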
How do I convert const char* to char[256]?
const char *s = std::string("x").c_str();
char c[256] = /* ??? */
You cannot initialise an array with a character pointer. But you can copy the string. For example:
const char *s = get_the_string();
char c[256]{};
auto n = std::strlen(s);
if (std::size(c) <= n)
throw std::runtime_error(
"The buffer is too small. Contact your local C++ maintainer");
std::memcpy(c, s, n);
The obvious problem with using an array of constant size is that you need to consider how to handle the situation where the input string doesn't fit. Here, I've used an exception, but you can use the error handling of your choice if this is not an option for you.
You cannot copy from a const char *s = std::string("x").c_str(); though, because the pointer is dangling, and attempting to access the pointed data would have undefined behaviour.
Copying the contents from the const type to an editable one is really your only recourse for dropping the const. I'm guessing you are given a const because something has marked it "not ok for you to change" ie read only.
The trouble with a pure * though is you need to know how long it is. For null-terminated strings, strlen can get you that size (and so it works with strncpy).
strncpy(c,s,256);
If the const char * were just bytes though, you'd need another way.
There are many different ways to copy a const char* into a char[]:
#include <cstring>
const char *s = "x";
char c[256]{};
std::strncpy(c, s, 255);
#include <algorithm>
#include <cstring>
const char *s = "x";
char c[256]{};
std::copy_n(s, std::min<std::size_t>(std::strlen(s), 255), c);
#include <string>
const char *s = "x";
char c[256]{};
std::string(s).copy(c, 255);
#include <sstream>
const char *s = "x";
char c[256]{};
std::istringstream iss(s);
iss.read(c, 255);
//or: iss.get(c, 256, '\0');
strncpy(c, s, 256);
It works for me.
As others have pointed out
const char *s = std::string("x").c_str();
Is bad code. It effectively creates a temporary string, puts "x" in it, takes a pointer to its contents, and then destroys the string at the end of the statement. So s is now a dangling pointer, and reading through it is undefined behaviour.
If you were not creating the string in that line it would be safe. For example
const auto t = std::string("x");
const char *s = t.c_str();
Now t will be valid until the current scope exits and so will s
As for the copy to an array of 256 characters the arguably optimal solution is
char c[256];
std::strncpy(c, s, 255);
c[255] = '\0';
Why?
This line
char c[256];
allocates space on the stack for 256 bytes and does nothing else.
This line
std::strncpy(c, s, 255);
if s is less than 255 characters it copies those characters into c then writes out zeros to pad out the rest of the first 255 elements (through c[254])
if s is 255 characters or more it just copies the first 255 characters
This line puts a null terminating zero at the end
c[255] = '\0';
Let's compare to other solutions
This one
char c[256];
std::strncpy(c, s, 256);
Problem with this answer is if s is more than 255 characters there will be no terminating 0 at the end of c. Whether that's important or not is really up to you but 999 times out of 1000 it probably is important.
This one
char c[256]{};
std::strncpy(c, s, 255);
Is safe but slower. The difference is the {} at the end of char c[256]{}. Without that {} the c array is only allocated. With it c is not only allocated but also initialized to 0 for all 256 characters. That means for every character copied from s to c there was a wasted effort clearing the character to zero at the beginning. That's potentially double the work
const char *s = get_the_string();
char c[256]{};
auto n = std::strlen(s);
if (std::size(c) <= n)
throw std::runtime_error(
"The buffer is too small. Contact your local C++ maintainer");
std::memcpy(c, s, n);
Same as above, does double the work though it is good to point out that you must choose how to handle s being too big to fit in c.
All of the examples using char c[256]{} instead of char c[256] are potentially doing double the work. Doing double the work is not necessarily bad but given the optimal version is simple there's no reason not to use it.
One other issue is using magic numbers. Unfortunately C++ didn't add an array size function until C++ 17 (std::size) so we're left to make our own
template<class T, size_t N>
constexpr size_t array_size(T (&)[N]) { return N; }
so then we can do this
char c[256];
std::strncpy(c, s, array_size(c) - 1);
c[array_size(c) - 1] = '\0';
so now if we change the size of c the code will still work.
The standard version for getting the number of elements in an array is std::size added in C++ 17 but C++ 17 is apparently still rare, none of the online C++ compilers I tried (first several hits in Google) supported it.
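Pulling those points together, here is one possible sketch that deduces the array size from the argument and writes only the bytes it needs, so there is neither a magic number nor zero padding. copy_truncated is a hypothetical name, and returning a bool for "did it fit" is just one design choice.

```cpp
#include <cstddef>
#include <cstring>

// Copies at most N-1 characters of `src` into `dst` and always writes a
// terminating '\0'. Returns true if the whole string fit, false if it
// had to be truncated. N is deduced from the array argument.
template <std::size_t N>
bool copy_truncated(char (&dst)[N], const char* src) {
    const std::size_t len = std::strlen(src);
    const std::size_t n = len < N ? len : N - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return len < N;
}
```

For example, with char c[256]; the call copy_truncated(c, s) keeps working unchanged if the size of c is later changed.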
Consider these two pieces of code. They're converting base10 number to baseN number, where N is the number of characters in given alphabet. Actually, they generate permutations of letters of given alphabet. It's assumed that 1 is equal to first letter of the alphabet.
#include <iostream>
#include <string>
typedef unsigned long long ull;
using namespace std;
void conv(ull num, const string alpha, string *word){
int base=alpha.size();
*word="";
while (num) {
*word+=alpha[(num-1)%base];
num=(num-1)/base;
}
}
int main(){
ull nu;
const string alpha="abcdef";
string word;
for (nu=1;nu<=10;++nu) {
conv(nu,alpha,&word);
cout << word << endl;
}
return 0;
}
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef unsigned long long ull;
void conv(ull num, const char* alpha, char *word){
int base=strlen(alpha);
while (num) {
(*word++)=alpha[(num-1)%base];
num=(num-1)/base;
}
}
int main() {
char *a=calloc(10,sizeof(char));
const char *alpha="abcdef";
ull h;
for (h=1;h<=10;++h) {
conv(h,alpha,a);
printf("%s\n", a);
}
}
Output is the same:
a
b
c
d
e
f
aa
ba
ca
da
No, I didn't forget to reverse the strings, reversal was removed for code clarification.
For some reason speed is very important for me. I've tested the speed of executables compiled from the examples above and noticed that the one written in C++ using string is more than 10 times slower than the one written in C using char *.
Each executable was compiled with -O2 flag of GCC. I was running tests using much bigger numbers to convert, such as 1e8 and more.
The question is: why is string slower than char * in that case?
Your code snippets are not equivalent. *a='n' does not append to the char array. It changes the first char in the array to 'n'.
In C++, std::strings should be preferred to char arrays, because they're a lot easier to use, for example appending is done simply with the += operator.
Also they automatically manage their memory for you which char arrays don't do. That being said, std::strings are much less error prone than the manually managed char arrays.
Doing a trace of your code you get:
*a='n';
// 'n0000'
//  ^
//  a
++a;
// 'n0000'
//   ^
//   a
*a='o';
// 'no000'
//   ^
//   a
In the end, a points to its original address + 1, which is o. If you print a you will get 'o'.
Anyway, what if you need 'nothing' instead of 'no'? It won't fit in 5 chars, and you will need to reallocate memory, etc. That kind of thing is what the string class does for you behind the scenes, and fast enough that it's not a problem in almost every scenario.
It's possible to use both char * and string to handle some text in C++. It seems to me that string addition is much slower than pointer addition. Why does this happen?
That is because when you use a char array or deal with a pointer to it (char*), the memory is only allocated once. What you describe as "addition" is only advancing the pointer through the array. So it's just moving a pointer.
// Both allocate memory one time:
char test[4];
char* ptrTest = new char[4];
// This will just set the values which already exist in the array and will
// not append anything.
*(ptrTest++) = 't';
*(ptrTest++) = 'e';
*(ptrTest++) = 's';
*(ptrTest) = 't';
When you use a string instead, the += operator actually appends characters to the end of your string. To do this, the string has to check its capacity on every append and, whenever the capacity is exhausted, allocate a larger block and copy the existing characters over. Typical implementations grow the capacity geometrically, so this does not happen on every append, but the bookkeeping still costs more than just advancing a pointer.
// Each += checks the capacity and may reallocate when it is exhausted
std::string test;
test += 't';
test += 'e';
test += 's';
test += 't';
std::string a(2,' ');
a[0] = 'n';
a[1] = 'o';
Change the size of your string in the constructor or use the reserve, resize methods, that is your choice.
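Applied to the conversion function from the question, that could look roughly like the sketch below. conv_reuse is a made-up name; the assumptions are that alpha is non-empty and that clear() keeps the already allocated capacity, which common implementations do but the standard does not promise.

```cpp
#include <string>

typedef unsigned long long ull;

// Same conversion as conv() in the question, with two changes:
// `alpha` is taken by const reference (no copy per call), and `word` is
// cleared instead of being assigned "", so its buffer can be reused.
void conv_reuse(ull num, const std::string& alpha, std::string& word) {
    const ull base = alpha.size();  // assumed to be non-zero
    word.clear();
    while (num) {
        word += alpha[(num - 1) % base];
        num = (num - 1) / base;
    }
}
```

Calling word.reserve(64) once before the loop in main means the appends should not need to allocate at all for 64-bit inputs.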
You are mixing different things in your question. One is a raw representation of bytes that can be interpreted as a string, with no semantics or checks; the other is an abstraction of a string with checks. Believe me, security and avoiding segfaults that can lead to code injection and privilege escalation are a lot more important than 2 ms.
From the std::string documentation you can see that
basic_string& operator+=(charT c)
is equivalent to calling push_back(c) on that string, so
string a;
a+='n';
a+='o';
is equivalent to:
string a;
a.push_back('n');
a.push_back('o');
The push_back does take care of a lot more than the raw pointer operations and is thus slower. It for instance takes care of automatic memory management of the string class.
I'm so new to C++ and I just can't figure out how to use multidimensional arrays. I want to do something like this:
input number of product: number; //the products' name can be 7 with NULL char. (max 6)
char arr[number][7];
That works. But when I want to do that in a for loop(i):
cin>>arr[i][7];
and I don't know what the hell is compiler doing?
I just want that:
arr[0][7]=apple;
arr[1][7]=orange;
So please how can I do that?
#include <string>
#include <vector>
Since everybody is recommending it, I thought I'd sketch the options for you.
Note that you would have gotten this kind of answer in 10 milli-seconds by 3 different persons, if you had supplied a short, working sample code snippet (translating code 1:1 is more efficient than 'thinking up' examples that you might recognize)
Here you go:
std::vector<std::string> strings;
strings.push_back("apple");
strings.push_back("banana");
// or
std::string s;
std::cin >> s; // a word
strings.push_back(s);
// or
std::getline(std::cin, s); // a whole line
strings.push_back(s);
// or:
// add #include <iterator>
// add #include <algorithm>
std::copy(std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::back_inserter(strings));
Direct addressing is also possible:
std::vector<std::string> strings(10); // 10 empty strings
strings[7] = "seventh";
Edit in response to comments:
const char* eighth = "eighth";
if (strings[7] != eighth)
{
// not equal
}
// If you really **must** (read, no you don't) you can get a const char* for the string:
const char* sz = strings[7].c_str(); // warning:
// invalidated when `strings[7]` is modified/destructed
Unless you have a real reason (which you mustn't hide from us), do as Björn says and use a vector of strings. You can even do away with the initial request for the total size:
#include <string>
#include <vector>
#include <iostream>
std::vector<std::string> fruits;
std::string line;
while (std::getline(std::cin, line))
{
fruits.push_back(line);
}
Let's test:
std::cout << "You entered the following items:\n";
for (auto const & f : fruits) std::cout << "* " << f << "\n";
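The same loop can be wrapped in a function that works on any input stream, which also makes it easy to try without typing at a console. This is only a sketch; read_lines and lines_from_text are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads the stream line by line until end of input.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> items;
    std::string line;
    while (std::getline(in, line)) {
        items.push_back(line);
    }
    return items;
}

// Convenience wrapper for text that is already in memory.
std::vector<std::string> lines_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_lines(in);
}
```

read_lines(std::cin) behaves like the loop above, while lines_from_text("apple\norange\n") yields the two items without any user input.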
Because arr[i][7] is a single char, and in fact one past the last element of the row, which means you may get a memory access error.
What you probably want is cin>>arr[i];.
However, this is not a very good idea: unless you set a field width, you cannot control how many characters are read from input, which can easily cause a buffer overrun.
The easy way would be using std::vector<std::string> as others have suggested.
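If you do have to stay with the char array, std::setw can be used to cap how much operator>> stores. A minimal sketch, where read_names and kNameLen are made-up names:

```cpp
#include <iomanip>
#include <istream>
#include <sstream>  // only needed for the in-memory usage described below

const int kNameLen = 7;  // 6 characters plus the terminating '\0'

// Reads up to `count` whitespace-separated names into `arr`.
// std::setw(kNameLen) limits each read to kNameLen - 1 characters, so a
// long word cannot overrun its row. Returns the number of names read.
int read_names(std::istream& in, char (*arr)[kNameLen], int count) {
    int i = 0;
    while (i < count && (in >> std::setw(kNameLen) >> arr[i])) {
        ++i;
    }
    return i;
}
```

Note that an over-long word is split rather than rejected: reading "pineapple" stores "pineap" and leaves "ple" in the stream for the next read. You can try this with a std::istringstream in place of std::cin.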
strcpy(arr[0], "apple");
strcpy(arr[1], "orange");
But in C++ it is better to use std::vector<std::string> for an array of strings.
You have a two dimensional array of char
char arr[number][7];
And then trying to assign a string (char* or const char*) to them which will not work. What you can do here is assign a character, for example:
arr[0][1] = 'a';
If you can I would recommend using std::vector and std::string it would make things much clearer. In your case you could do
cin>>arr[i];
But I would not recommend it as you could only store up to 6 character char* strings (plus the null terminator). You can also have an array of char*
char* arr[number];
then dynamically allocate memory to store the strings.
Using std::vector and std::string will usually save you headaches once you understand them. Since you are brand new to C++, it might be useful to understand what is going on with two-dimensional arrays anyhow.
When you say
char array[N][M];
With N and M being constants, not variables, you are telling the compiler to allocate N*M items of type char. There will be a block of memory dedicated to that array of size N*M*sizeof(char). (You can declare an array of anything, not just char. Since sizeof(char) is 1, the memory will be N*M bytes long.) If you looked at raw memory, the first byte in the memory would be where array[0][0] is. The second byte would be where array[0][1] is, and so on, for M bytes. Then you would see array[1][0]. This is called row-major order.
As @jbat100 mentioned, when you say array[i][j] you are referring to a single character. When you say array[i], you are referring to the address of row i in the array. There is no pointer actually stored in memory, but when you say array[i] the compiler knows that you mean that you want the address of row i in the array:
char* row_i = array[i];
Now if i>0, then row_i points to somewhere in the middle of that block of memory dedicated to the array. This would do the same thing:
char* row_i = &array[i][0];
If you have a string, "orange" and you know that the length of it is less than M, you can store it in a given row in the array, like this:
strcpy(array[i], "orange"); // or
array[i][0] = 'o'; array[i][1] = 'r'; ... array[i][6] = 0;
Or you could have said row_i instead of array[i]. This copies 7 bytes into the array in the location of row_i. The strcpy() also copies an extra byte which is a 0, and this is the convention for terminating a character string in C and C++. So the 7 bytes are six bytes, 'o', 'r', 'a', 'n', 'g', and 'e', plus a 0 byte. Now strcmp(row_i, "orange") == 0.
Beware that if your string does not fit in M chars including the terminating 0, the strcpy and the simple char assignments will probably not produce a compile error, but you will end up writing part of your string into the next row (or, for the last row, past the end of the array).
Read about pointers and arrays in a good C/C++ book.
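The row-major layout described above can be checked with a few lines of code. This is only a sketch: kRows, kCols, flat_offset and matches_row_major are names invented for the example.

```cpp
#include <cstddef>

const std::size_t kRows = 3;  // plays the role of N
const std::size_t kCols = 7;  // plays the role of M

// Offset, in chars, of array[i][j] from the start of the block.
std::size_t flat_offset(std::size_t i, std::size_t j) {
    return i * kCols + j;
}

// True if &array[i][j] sits exactly flat_offset(i, j) bytes after the
// first byte of the array, which is what row-major order means.
bool matches_row_major(char (&array)[kRows][kCols],
                       std::size_t i, std::size_t j) {
    const char* base = reinterpret_cast<const char*>(&array);
    return &array[i][j] == base + flat_offset(i, j);
}
```

So array[1][0] is 7 bytes after array[0][0], right behind the last char of row 0, which is why an over-long strcpy into row 0 spills into row 1.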
I want to copy a string into a char array, and not overrun the buffer.
So if I have a char array of size 5, then I want to copy a maximum of 5 bytes from a string into it.
what's the code to do that?
This is exactly what std::string's copy function does.
#include <string>
#include <iostream>
int main()
{
char test[5];
std::string str( "Hello, world" );
str.copy(test, 5);
std::cout.write(test, 5);
std::cout.put('\n');
return 0;
}
If you need null termination you should do something like this:
str.copy(test, 4);
test[4] = '\0';
First of all, strncpy is almost certainly not what you want. strncpy was designed for a fairly specific purpose. It's in the standard library almost exclusively because it already exists, not because it's generally useful.
Probably the simplest way to do what you want is with something like:
sprintf(buffer, "%.4s", your_string.c_str());
Unlike strncpy, this guarantees that the result will be NUL terminated, but does not fill in extra data in the target if the source is shorter than specified (though the latter isn't a major issue when the target length is 5).
Use the function strlcpy if your implementation provides one (the function is not in the standard C library, yet it is rather widely accepted as a de-facto standard name for a "safe" limited-length copying function for zero-terminated strings).
If your implementation does not provide strlcpy function, implement one yourself. For example, something like this might work for you
char *my_strlcpy(char *dst, const char *src, size_t n)
{
assert(dst != NULL && src != NULL);
if (n > 0)
{
char *pd;
const char *ps;
for (--n, pd = dst, ps = src; n > 0 && *ps != '\0'; --n, ++pd, ++ps)
*pd = *ps;
*pd = '\0';
}
return dst;
}
(Actually, the de-facto accepted strlcpy returns size_t, so you might prefer to implement the accepted specification instead of what I did above).
Beware of the answers that recommend using strncpy for that purpose. strncpy is not a safe limited-length string copying function and is not supposed to be used for that purpose. While you can force strncpy to "work" for that purpose, it is still akin to driving woodscrews with a hammer.
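For completeness, a variant that follows the usual strlcpy contract of returning the length of the source could look like the sketch below. strlcpy_sketch is a made-up name chosen to avoid clashing with a real strlcpy, and this is not the OpenBSD source.

```cpp
#include <cstddef>
#include <cstring>

// Copies up to n-1 characters of `src` into `dst`, null-terminates when
// n > 0, and returns strlen(src). A return value >= n therefore means
// the copy was truncated.
std::size_t strlcpy_sketch(char* dst, const char* src, std::size_t n) {
    const std::size_t len = std::strlen(src);
    if (n > 0) {
        const std::size_t count = len < n ? len : n - 1;
        std::memcpy(dst, src, count);
        dst[count] = '\0';
    }
    return len;
}
```

A caller can then write if (strlcpy_sketch(buf, s, sizeof buf) >= sizeof buf) to detect that s did not fit.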
Update: Thought I would try to tie together some of the answers, answers which have convinced me that my own original knee-jerk strncpy response was poor.
First, as AndreyT noted in the comments to this question, truncation methods (snprintf, strlcpy, and strncpy) are often not a good solution. It's often better to check the size of the string (string.size()) against the buffer length and return/throw an error or resize the buffer.
If truncation is OK in your situation, IMHO, strlcpy is the best solution, being the fastest/least overhead method that ensures null termination. Unfortunately, it's not in many standard distributions and so is not portable. If you are doing a lot of these, it may be worth providing your own implementation; AndreyT gave an example, which runs in O(result length). Also, the reference specification returns the length of the source string, so a return value greater than or equal to the buffer size tells you the source was truncated.
Other good solutions are sprintf and snprintf. They are standard, and so are portable and provide a null-terminated result. They have more overhead than strlcpy (parsing the format string specifier and variable argument list), but unless you are doing a lot of these you probably won't notice the difference. Note that snprintf is always bounded by the size you pass, whereas sprintf may overflow if you get the format specifier wrong (as others have noted, the format string should be "%.<N>s" not "%<N>s"). Both return a character count that excludes the terminator: sprintf returns the number written, and snprintf returns the number that would have been written had the buffer been large enough, which lets you detect truncation.
A special case solution is strncpy. It runs in O(buffer length), because if it reaches the end of the src it zeros out the remainder of the buffer. Only useful if you need to zero the tail of the buffer or are confident that destination and source string lengths are the same. Also note that it is not safe in that it doesn't necessarily null terminate the string. If the source is truncated, then null will not be appended, so call in sequence with a null assignment to ensure null termination: strncpy(buffer, str.c_str(), BUFFER_LAST); buffer[BUFFER_LAST] = '\0';
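As an illustration of the snprintf option with truncation detection, here is a small sketch. copy_with_snprintf is a hypothetical name, and it assumes the string has no embedded null characters, since "%s" stops at the first one.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Copies `src` into `dst` (capacity `n`, with n > 0) and always
// null-terminates. Returns true if the whole string fit and false if it
// was truncated, based on the length snprintf reports it needed.
bool copy_with_snprintf(char* dst, std::size_t n, const std::string& src) {
    const int needed = std::snprintf(dst, n, "%s", src.c_str());
    return needed >= 0 && static_cast<std::size_t>(needed) < n;
}
```

With a buffer of size 5 and the source "Hello, world", the buffer ends up holding "Hell" and the function returns false, so the caller can decide whether truncation is acceptable.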
Some nice libc versions provide non-standard but great replacement for strcpy(3)/strncpy(3) - strlcpy(3).
If yours doesn't, the source code is freely available from the OpenBSD repository.
void stringChange(string var){
char strArray[100];
strcpy(strArray, var.c_str());
}
I guess this should work. It'll copy from a string to a char array, as long as the string is shorter than 100 characters; strcpy does no bounds checking.
I think snprintf() is much safer and the simplest:
snprintf ( buffer, 100, "The half of %d is %d", 60, 60/2 );
The null character is appended at the end automatically :)
The most popular answer is fine but the null-termination is not generic. The generic way to null-terminate the char-buffer is:
std::string aString = "foo";
const size_t BUF_LEN = 5;
char buf[BUF_LEN];
size_t len = aString.copy(buf, BUF_LEN-1); // leave one char for the null-termination
buf[len] = '\0';
len is the number of chars copied so it's between 0 and BUF_LEN-1.
std::string my_string("something");
char* my_char_array = new char[5];
strncpy(my_char_array, my_string.c_str(), 4);
my_char_array[4] = '\0'; // my_char_array contains "some"
With strncpy, you can copy at most n characters from the source to the destination. However, note that if the source string is n or more characters long, the destination will not be null terminated; you must put the terminating null character into it yourself.
A char array with a length of 5 can contain at most a string of 4 characters, since the 5th must be the terminating null character. Hence in the above code, n = 4.
std::string str = "Your string";
char buffer[5];
strncpy(buffer, str.c_str(), sizeof(buffer));
buffer[sizeof(buffer)-1] = '\0';
The last line is required because strncpy isn't guaranteed to NUL terminate the string (there has been a discussion about the motivation yesterday).
If you used wide strings, instead of sizeof(buffer) you'd use sizeof(buffer)/sizeof(*buffer), or, even better, a macro like
#define ARRSIZE(arr) (sizeof(arr)/sizeof(*(arr)))
/* ... */
buffer[ARRSIZE(buffer)-1]='\0';
char mystring[101]; // a 100 character string plus terminator
const char *any_input = "Example";
int iterate = 0;
while ( any_input[iterate] != '\0' && iterate < 100) {
mystring[iterate] = any_input[iterate];
iterate++;
}
mystring[iterate] = '\0';
This is the basic efficient design.
If you always have a buffer of size 5, then you could do:
std::string s = "Your string";
char buffer[5]={s[0],s[1],s[2],s[3],'\0'};
Edit:
Of course, assuming that your std::string is large enough.