Char has a different size than a string - c++

I was working on a program that uses a function to set a new value in the registry. I used a const char * for the value parameter. However, the size of the stored value is only four bytes. I've tried to use std::string as a parameter instead, but it didn't work.
I have a small example to show you what I'm talking about, and rather than solving my problem with the function I'd like to know the reason it does this.
#include <iostream>
void test(const char * input)
{
std::cout << input;
std::cout << "\n" << sizeof("THIS IS A TEST") << "\n" << sizeof(input) << "\n";
/* The code above prints out the size of an explicit string (THIS IS A TEST), which is 15. */
/* It then prints out the size of input, which is 4.*/
int sum = 0;
for(int i = 0; i < 15; i++) //Printed out each character, added the size of each to sum and printed it out.
//The result was 15.
{
sum += sizeof(input[i]);
std::cout << input[i];
}
std::cout << "\n" << sum;
}
int main(int argc, char * argv[])
{
test("THIS IS A TEST");
std::cin.get();
return 0;
}
Output:
THIS IS A TEST
15
4
THIS IS A TEST
15
What's the correct way to get string parameters? Do I have to loop through the whole array of characters and print each to a string (the value in the registry was only the first four bytes of the char)? Or can I use std::string as a parameter instead?
I wasn't sure if this was SO material, but I decided to post here as I consider this to be one of my best sources for programming related information.

sizeof(input) is the size of a const char*. What you want is strlen(input) + 1 (the characters plus the terminating '\0').
sizeof("THIS IS A TEST") is the size of a const char[15]. When sizeof is given an array type, it yields the size of the whole array, which is why it is 15.
For std::string use length()

sizeof gives a size based on the type you give it as a parameter. If you use the name of a variable, sizeof still only bases its result on the type of that variable. In the case of char *whatever, it's telling you the size of a pointer to char, not the size of the zero-terminated buffer it's pointing at. If you want the latter, you can use strlen instead. Note that strlen tells you the length of the content of the string, not including the terminating '\0'. As such, if (for example) you want to allocate space to duplicate a string, you need to add 1 to the result to get the total space occupied by the string.
Yes, as a rule in C++ you normally want to use std::string instead of pointers to char. In this case, you can use your_string.size() (or, equivalently, your_string.length()).

std::string is a C++ object, which cannot be passed to most APIs. Most APIs take char* as you noticed, which is very different from a std::string. However, since this is a common need, std::string has a member function for that: c_str.
std::string input;
const char* ptr = input.c_str(); //note, is const
In C++11, it is now also safe-ish to do this:
char* ptr = &input[0]; //nonconst
and you can alter the characters, but the size is fixed, and the pointer is invalidated if you call any mutating member of the std::string.
As for the code you posted, "THIS IS A TEST" has the type const char[15], which has a size of 15 bytes. The parameter input, however, has the type const char*, which has a size of 4 on your system (other systems may use other sizes, e.g. 8 bytes on 64-bit builds).
To find the length of a C string pointed to by a char* pointer, you can call strlen(...) if it is null-terminated. It returns the number of characters before the first null character ('\0').

If the registry you speak of is the Windows registry, it may be an issue of Unicode vs. ASCII.
Modern Windows stores almost all strings as Unicode (UTF-16), which uses 2 bytes per character for ordinary text.
If you try to put a Unicode string into a std::string, it may end up containing 0 (null) bytes, which C-style string handling treats as "end of string."
You may try using a std::wstring (wide string) or vector< wchar_t > (wide character type). On Windows these store strings of two-byte characters.
sizeof() is also not giving you the value you may think it is giving you. Your system probably runs 32-bit Windows -- that "4" value is the size of the pointer to the first character of that string.
If this doesn't help, please post the specific results that occur when you use std::string or std::wstring (more than saying that it doesn't work).

To put it simply, the size of a const char * != the size of a const char[] (if they are equal, it's by coincidence). The former is a pointer. A pointer, in the case of your system, is 4 bytes REGARDLESS of the datatype it points to. It could be int, char, float, whatever. This is because a pointer holds a memory address, which is a number. Print out the value of your pointer and you'll see an address that is 4 bytes (8 hex digits) wide. A const char[], on the other hand, is the array itself, so sizeof yields the size of the whole array.

Related

C++ calculate size of array of strings from file [duplicate]

#include <cstdlib>
#include <iostream>
int main(int argc, char *argv[])
{
cout << "size of String " << sizeof( string );
system("PAUSE");
return EXIT_SUCCESS;
}
Output:
size of String = 4
Does that mean that, since sizeof(char) = 1 Byte (0 to 255), string can only hold 4 characters?
It isn't clear from your example what 'string' is. If you have:
#include <string>
using namespace std;
then string is std::string, and sizeof(std::string) gives you the size of the class instance and its data members, not the length of the string. To get that, use:
string s;
cout << s.size();
When string is defined as:
char *string;
sizeof(string) tells you the size of the pointer: 4 bytes (you're on a 32-bit machine). You've allocated no memory yet to hold text. You want a 10-char string? string = static_cast<char*>(malloc(10)); (the cast is required in C++). Now string points to a 10-byte buffer you can put characters in.
sizeof(*string) will be 1. The size of what string is pointing to, a char.
If you instead did
char string[10];
sizeof(string) would be 10. It's a 10-char array.
sizeof(*string) would be 1 still.
It'd be worth looking up and understanding the _countof macro (MSVC-specific; since C++17, std::size does the same for arrays).
Update: oh, yeah, NOW include the headers :) 'string' is a class whose instances take up 4 bytes, that's all that means. Those 4 bytes could point to something far more useful, such as a memory area holding more than 4 characters.
You can do things like:
string s = "12345";
cout << "length of String " << s.length();
sizeof(char) is always 1 byte. A byte, however, need not be 8 bits: there are architectures where a byte is 32 bits, 24 bits and so on (CHAR_BIT tells you how many). The size of any other type is measured in multiples of sizeof(char), which is 1 by definition.
The next important thing to note is that C++ has three character types: plain char, signed char and unsigned char. A plain char is either signed or unsigned. So it is wrong to assume that char can have only values from 0 to 255. This is true only when a char is 8-bits, and plain char is unsigned.
Having said that, and assuming that 'string' is 'std::string', sizeof(string) == 4 means that the size of the 'std::string' class is 4 bytes: it occupies 4 times the number of bytes that a 'char' takes on that machine. (Note that signed T and unsigned T always have the same size.) It does not mean that the actual buffer of characters (which is called a string in common parlance) is only 4 bytes. Inside the 'std::string' class there is a non-static member pointer to a dynamically allocated buffer that holds the characters. That buffer can have as many elements as the system allows (up to max_size()). Since this implementation's 'std::string' holds only the pointer to that buffer, sizeof(std::string) is the same as the size of a pointer on the given architecture, which on your system is 4. Other implementations also store the length, the capacity or a small inline buffer, so values such as 24 or 32 are common on 64-bit systems; in every case the value is fixed and does not depend on the string's contents.
I know a lot of people had answered your question, but here are some points:
It's not the size of the string or the capacity of the string; this value is the structural size of the string class, which depends on its implementation (and can change from implementation to implementation). In yours, it is a single pointer;
As sizeof(string) is the size of the class structure, you get the size of that one internal pointer, which in your case is 4 bytes (because you are on a 32-bit machine; this can change from platform to platform too);
This pointer inside the string class points to a memory buffer where the class holds the real string data; this buffer is reallocated as needed as you append, erase or assign text;
If you want the real size of the string, call the size() member function, which returns the number of characters stored (which isn't the same as the size of the allocated buffer; that is capacity()).
I think your problem is your understanding of sizeof: it reports the size of a type or object, not the size of any memory that object points to.
Not at all. It is the size of the class's own structure, which doesn't include the dynamic memory it manages. std::string will expand dynamically to meet any required size.
s.max_size() // will give the true maximum size
s.capacity() // will tell you how much it can hold before resizing again
s.size() // tells you how much it currently holds
The 4 you get from sizeof is likely a pointer of some kind to the larger structure. Some implementations also store short strings directly inside the object (the small-string optimization) until they grow too large to fit.
No, it means that the sizeof the class string is 4.
It does not mean that a string can be contained in 4 bytes of memory. Not at all. But you have to distinguish between the dynamic memory used to hold the characters a string is made of, and the memory occupied by the address of the first of those characters.
Try to see it like this:
contents --------> |h|e|l|l|o| |w|o|r|l|d|\0|
sizeof 4 refers to the memory occupied by contents. What does contents hold? Just a pointer to (the address of) the first character in the char array.
How many characters can a string contain? Ideally, one character per byte of available memory.
How many characters does a string actually have? Well, there's a member function called size() that will tell you just that:
size_type size() const
See more on the SGI page!
A string object contains a pointer to a buffer on the heap that contains the actual string data. (It can also contain other implementation-specific meta-information, but yours apparently doesn't.) So you're getting the size of that pointer, not the size of the array it points to.
You can also use std::string and find its length with the length() member function. Look at the code below:
// Finding length of a string in C++
#include<iostream>
#include<string>
using namespace std;
size_t count(const string& s);
int main()
{
string str;
cout << "Enter a string: ";
getline(cin,str);
cout << "\nString: " << str << endl;
cout << count(str) << endl;
return 0;
}
// length() already returns 0 for an empty string and 1 for a single
// character, so no special cases are needed.
size_t count(const string& s){
return s.length();
}
You can get the details from:
http://www.programmingtunes.com/finding-length-of-a-string-in-c/
size() of a string gives the number of characters in the string, whereas sizeof applied to a string gives the size of the std::string object itself (8 in the output below, but implementation-defined), regardless of its contents. strlen() of a character array gives the number of characters before the terminating null character, and keep in mind the size of a char is 1 byte. sizeof on a char array gives the size declared for the array.
string str="hello";
char arr[x]="hello";
cout<<str.size()<<endl<<sizeof(str)<<endl;
cout<<strlen(arr)<<endl<<sizeof(arr)<<endl;
The output is 5, 8, 5 and x (the 8 depends on the std::string implementation).

how character pointer could be used to point a string in c++?

First of all, I am a beginner in C++. I was trying to learn about type casting in C++ with strings and character pointers. Is it possible to point to a string with a character pointer?
#include <iostream>
#include <string>
using namespace std;
int main() {
string data="LetsTry";
cout<<(&data)<<"\n";
cout<<data<<"\n"<<"size "<<sizeof(data)<<"\n";
//char *ptr = static_cast<char*>(data);
//char *ptr=(char*)data;
char *ptr = reinterpret_cast<char*>(&data);
cout<<(ptr)<<"\n";
cout<<*ptr;
}
The above code yields the following output:
0x7ffea4a06150
LetsTry
size 32
`a���
`
I understood that ptr should output the address 0x7ffea4a06150.
Historically, in the C language, strings were just memory areas filled with characters. Consequently, when a string was passed to a function, it was passed as a pointer to its very first character, of type char * for mutable strings, or char const * if the function had no intent to modify the string's contents. Such strings were delimited with a zero character ((char)0, a.k.a. '\0') at the end, so for a string of length 3 you had to allocate at least four bytes of memory (three characters of the string itself plus the zero terminator); and if you only had a pointer to a string's start, to know the size of the string you had to iterate over it to find how far away the zero char is (the standard function strlen does this). Some standard functions accepted an extra parameter for the string size if you knew it in advance (those starting with strn or, more primitive and efficient, those starting with mem); others did not. To concatenate two strings, you first had to allocate a buffer large enough to contain the result, and so on.
The standard functions that process char pointers can still be found in the C++ standard library, under the <cstring> header: https://en.cppreference.com/w/cpp/header/cstring, and std::string has the equivalent member functions c_str() and data() that return char pointers to its contents, should you need them.
When you write a program in C++, its main function can have the signature int main(int argc, char *argv[]), where argv is an array of char pointers holding any command-line arguments your program was run with.
Inefficient as it is, this scheme could still be regarded as an advantage over strings of limited capacity or plain fixed-size character arrays, for instance in the mid-nineties, when Borland introduced the PChar type in Turbo Pascal and added a unit that exported Pascal implementations of functions from C's string.h.
std::string and const char* are different types. reinterpret_cast<char*>(&data) means: take the address of the std::string object and treat the bytes located there as characters. Those bytes are the object's internal fields (pointers, sizes), not the text, which is not what we want in this case.
So, assuming we have type A and type B:
A a;
B b;
the following are conversions:
a = (A)b; // C style
// and
a = A(b);
// and
a = static_cast<A>(b); // C++ style
the following are bit reinterpretations:
a = *(A*)&b; // C style
// and
a = *reinterpret_cast<A*>(&b); // C++ style
Finally, this should work:
#include <iostream>
#include <string>
using namespace std;
int main() {
string data = "LetsTry";
const char *ptr = data.c_str();
cout<< ptr << "\n";
}
Bit reinterpretation is sometimes used, for example when doing bit manipulation of a floating-point number, but there are rules to follow, such as the one discussed in "What is the strict aliasing rule?"
Also note that cout << ptr << "\n"; is a special case: inserting a pointer into std::cout usually prints the address stored in the pointer, but std::cout treats char* specially and prints the contents of the char array it points to instead (up to the terminating '\0').
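In C++, string is a class, and what you are doing is creating a string object. So, to use it through a char pointer, you need to go through c_str(), which returns a const char*. If you need a modifiable char*, copy the contents into a char array.
You can refer to the code below:
#include <cstring>
#include <string>
int main() {
std::string data = "LetsTry";
// allocate room for the characters plus the terminating '\0'
char * cstr = new char [data.length()+1];
// copy the contents of the string to the char array
std::strcpy (cstr, data.c_str());
// ... use cstr ...
delete[] cstr; // the array was allocated with new[], so release it with delete[]
}
Now you can use the char * to access your data.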

How do you best utilize wcsdup?

I'm writing code and a good portion of it requires returning wchar_t arrays. Returning wstrings isn't really an option (although I can use them), and I know I can pass a pointer as an argument and populate that, but I'm looking specifically to return a pointer to this array of wide chars. In the first few iterations, I found that the arrays were returned alright, but by the time they were processed and printed, the memory had been overwritten and I was left with gibberish. To fix this, I started using wcsdup, which fixed everything, but I'm struggling to grasp exactly what is happening, and thus when it should be called so that it works and I leak no memory. As it is, I pretty much use wcsdup every time I return a string and every time a string is returned, which I know leaks memory. Here is what I'm doing. Where and why should I use wcsdup, or is there a better solution than wcsdup altogether?
wchar_t *intToWChar(int toConvert, int base)
{
wchar_t converted[12];
/* Conversion happens... */
return converted;
}
wchar_t *intToHexWChar(int toConvert)
{
/* Largest int is 8 hex digits, plus "0x", plus \0 is 11 characters. */
wchar_t converted[11];
/* Prefix with "0x" for hex string. */
converted[0] = L'0';
converted[1] = L'x';
/* Populate the rest of converted with the number in hex. */
wchar_t *hexString = intToWChar(toConvert, 16);
wcscpy((converted + 2), hexString);
return converted;
}
int main()
{
wchar_t *hexConversion = intToHexWChar(12345);
/* Other code. */
/* Without wcsdup calls, this spits out gibberish. */
wcout << "12345 in Hex is " << hexConversion << endl;
}
wchar_t *intToWChar(int toConvert, int base)
{
wchar_t converted[12];
/* Conversion happens... */
return converted;
}
This returns a pointer to a local variable.
wchar_t *hexString = intToWChar(toConvert, 16);
After this line, hexString points to invalid memory and using it is undefined behavior (it may still hold the value or may be garbage!).
You do the same thing with the return from intToHexWChar.
Solutions:
use std::wstring
use std::vector<wchar_t>
pass in an array to the function for it to use
use smart pointers
use dynamic memory allocation (please don't!)
Note: you might also need to change to wcout instead of cout
Since you tagged your question with 'C++' the answer is a resounding: no, you should not use wcsdup at all. Instead, for passing arrays of wchar_t values around, use std::vector<wchar_t>.
If needed, you can turn those into a wchar_t* by taking the address of the first element (since vectors are guaranteed to be stored in contiguous memory), as long as the vector ends with a terminating L'\0', e.g.
wcout << L"12345 in Hex is " << &hexConversion[0] << endl;

C++ how to count correctly the characters in a const char?

I have a const char* which is made by concatenation like this:
const char *fileName = "background1";
std::stringstream sstm;
sstm << fileName << "-hd.png";
fileName = sstm.str().c_str(); // caution: str() returns a temporary, so this pointer dangles after this line
My problem is that the following instruction:
printf("const char = %s size = %d", fileName, sizeof(fileName));
returns:
"const char = background1-hd.png size = 4"
whereas I would expect that it returns:
"const char = background1-hd.png size = 19"
For example, the following gives the expected result (as there is no concatenation):
const char *fileName2 = "background1-hd";
printf("const char = %s size = %d", fileName2, sizeof(fileName2));
returns:
"const char = background1-hd.png size = 19"
How can I avoid this issue and make sure the characters in my concatenated string are counted correctly?
Thanks !!
sizeof() returns the number of bytes the variable occupies in memory (in this case returns the size of the pointer fileName).
strlen() returns the length of the string (which is what you need).
You could as well try something like:
#include <cstdio>
#include <iostream>
#include <string>
int main()
{
std::string fileName("background1");
fileName.append("-hd.png");
// %zu is the printf conversion for a std::size_t such as length()
printf("const char = %s size = %zu", fileName.c_str(), fileName.length());
return 0;
}
sizeof returns the size of the variable you give to it; it's evaluated at compile time. The "4" is the size of a pointer on your system. You want to use strlen() to determine the length of a string.
The result of sizeof(fileName) is related to fileName being a pointer, not an array. It literally returns the size of a pointer to a constant character string, and on a 32-bit system, all pointers are 32 bits (so sizeof == 4).
What you should use instead is strlen or similar, which will count the characters in the string, up to the trailing null, and return that. The results with strlen in place of sizeof will be about what you expect.
On a related note, with const char strings there is only ever one character per "cell" (actually byte). There are character sets which use multiple bytes per character, but packing multiple characters into a single byte is quite rare, at least in C-family languages.
sizeof calculates the size of the data type in bytes and not the size of its contents (what it points to). In your example you are calculating the sizeof char* which is 4 bytes on your system. To get the length of a C string use strlen.
There is a distinction in the language between arrays and pointers, even if this distinction seems diluted both by implicit conversions (arrays tend to decay into pointers quite easily), and common statements that they are the same.
How does this even relate to your code?
Well, a string literal is actually an array of constant characters, not a pointer to character(s). In the initialization const char *fileName = "background1"; you are creating a pointer variable that points to the first element of the array ("background1" is decaying into a pointer to the first element), and from there on the variable you are managing is pointer and not the literal.
If you mix this with the fact that sizeof will tell you the size of the variable, you get that in a platform with 32bit pointers and 8 bit chars, sizeof( const char* ) is always 4, regardless of the object that is pointed by that pointer (if there is even one).
Now, if you were treating the literal as what it actually is you would be having a bit more luck there:
const char filename[] = "background1";
assert( sizeof filename == 12 ); // note: NUL character is counted!
const char *fname = filename;
assert( sizeof fname == sizeof( void* ) );
In real code, you are not so lucky: in many cases the literals have decayed into pointers well before you get a chance to obtain the compile-time size of the literal, so you cannot ask the compiler for it. In that case you need to calculate the length of the C-style string, which can be done by calling strlen.
strlen has been suggested a number of times already, and for this case it's probably perfectly reasonable.
There is an alternative that will let you use sizeof though:
char fileName[] = "background1";
std::cout << sizeof(fileName) << "\n";
Since you're making fileName an array, it has all the characteristics of an array -- including the fact that your later attempt at assigning to it:
fileName = sstm.str().c_str();
...would fail (won't even compile when fileName is defined as an array). I should add, however, that it seems to me that you'd be better off just using std::string throughout:
std::string fileName("background1");
std::stringstream sstm;
sstm << fileName << "-hd.png";
fileName = sstm.str();
In this case, you can use string's size() or length() member.
