Looping over arrays using pointers - C++

char ch[] = {'h', 'e', 'l', 'l', 'o'};
char* p = ch;
while(*p){
std::cout << *p << std::endl;
++p;
}
This prints the elements of the array plus some garbage, but
int ch[] = {1, 2, 3, 4};
int* p = ch;
while(*p){
std::cout << *p << std::endl;
++p;
}
This prints just the elements of the array, with no garbage.
I am just a beginner in C++ and I can't figure out the reason behind this behaviour.
Need some help with this.

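while (*p) { keeps going until it hits a value of 0. In neither case have you ensured that there is a 0 terminating your arrays. Reading past the end of an array is undefined behaviour, and it's just luck that one worked while the other didn't (with the int array, the memory right after it happened to contain a 0).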

Because your string has no defined end, the loop keeps iterating beyond the intended array of characters and outputs whatever data sits in the adjacent memory. In C and C++, character strings are conventionally terminated by a null character ('\0'), which is what lets a loop iterate until *p == '\0'.
If you add one more element to the array and set that last element to '\0', the loop as written (which runs until *p == '\0') will produce the desired output.
This prevents the loop from overrunning your intended bounds. That is functionally important and also good security practice, since leaking adjacent memory can sometimes be exploited in applications.
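A minimal sketch of two fixes, assuming you're free to change the arrays. The first adds the '\0' sentinel the answers describe; the second avoids needing a sentinel by walking from the first element to one past the last with std::begin/std::end. The function names print_chars and print_ints and the std::ostream parameter are illustrative only.

```cpp
#include <iostream>
#include <iterator>
#include <sstream>  // std::ostringstream can be passed in to capture the output

// Fix 1: add an explicit '\0' so `while (*p)` has a guaranteed stopping point.
void print_chars(std::ostream& out) {
    const char ch[] = {'h', 'e', 'l', 'l', 'o', '\0'};  // same contents as "hello"
    const char* p = ch;
    while (*p) {            // stops at the '\0' added above
        out << *p << '\n';
        ++p;
    }
}

// Fix 2: skip the sentinel entirely and walk from the first element to
// one past the last; this works even when the data itself contains 0.
void print_ints(std::ostream& out) {
    const int nums[] = {1, 2, 0, 4};   // the 0 here does not end the loop
    for (const int* p = std::begin(nums); p != std::end(nums); ++p) {
        out << *p << '\n';
    }
}
```

Call print_chars(std::cout) or print_ints(std::cout) to print to the console. The second style is the safer default for non-string data, where 0 may be a legitimate value.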

Related

Why do I get garbage value when output character array to console?

I always get a garbage value like this 'Íýýýý««««««««îþîþ' at the end when I output my array. What am I doing wrong?
void func()
{
const int size = 100;
char * buffer = new char[size];
for(int i=0; i<size; i++)
buffer[i] = ' ';
cout<<buffer;
}
However if I use a for loop to output the buffer, there is no garbage value.
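Because you don't null-terminate your buffer, the operator<< overload for char* keeps reading until it finds a \0 to use as the terminating character.
As pointed out in the comments, feel free to append that \0 to the end of your buffer :). Note that all 100 bytes are already in use, so you need to allocate size + 1 bytes and set buffer[size] = '\0'.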
ScarletAmaranth is right. C-style strings (arrays of char) must end with the char '\0'. That's how functions that work with char arrays (cout in this case) know where the string ends in memory. If you forget the '\0' at the end of the array, cout prints all the chars in the array and then keeps printing whatever data follows the array in memory. That other data is rubbish, which is why you see these strange characters.
If you use std::string (the more C++ style) instead of char arrays, you won't have this problem, because std::string stores its own length and doesn't rely on '\0' as a delimiter.
On the other hand, you don't see rubbish when you use a loop to iterate over the 100 elements of the array, because you print exactly those 100 chars. You control what you print and when to stop, instead of letting cout figure out when to stop.
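A sketch of the three fixes mentioned in these answers, applied to the question's 100-space buffer: reserve an extra byte for '\0', tell the stream the exact count with std::ostream::write, or let std::string track the length. The function names and the std::ostream parameter are illustrative; the sketch also adds the delete[] that the original func() was missing.

```cpp
#include <iostream>
#include <sstream>   // std::ostringstream can be passed in to capture the output
#include <string>

// Option 1: allocate one extra byte and store the '\0' terminator in it.
void print_terminated(std::ostream& out) {
    const int size = 100;
    char* buffer = new char[size + 1];   // +1 for the terminator
    for (int i = 0; i < size; i++)
        buffer[i] = ' ';
    buffer[size] = '\0';                 // operator<< now knows where to stop
    out << buffer;
    delete[] buffer;                     // the original code leaked this block
}

// Option 2: write exactly `size` characters; no terminator is needed.
void print_counted(std::ostream& out) {
    const int size = 100;
    char* buffer = new char[size];
    for (int i = 0; i < size; i++)
        buffer[i] = ' ';
    out.write(buffer, size);
    delete[] buffer;
}

// Option 3: std::string stores its length, so nothing can run past the end.
void print_string(std::ostream& out) {
    std::string buffer(100, ' ');        // 100 spaces
    out << buffer;
}
```

In most C++ code, option 3 is the simplest choice because it also removes the manual new/delete.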

How come std::cout works fine for [] = "12345" but not for [] = {'1','2','3','4','5'}?

I've noticed a weird discrepancy in C++.
Say I have this code:
const char myChars[] = "12345";
std::cout << myChars;
The output is: 12345
However, if I then change it to:
const char myChars[] = {'1','2','3','4','5'};
std::cout << myChars;
Then the output is: 12345__SOME_RANDOM_DATA_IN_MEMORY__
Why is it that cout appears to know the length of the first version but not the length of the second version? Also, does cout even know the length?
Thanks.
There is no null terminator in your second example.
const char myChars[] = {'1','2','3','4','5', 0};
would work fine.
C strings require a null terminator to indicate the end of the string.
See this stackoverflow answer for more detailed information: https://stackoverflow.com/a/2037245/507793
As for your first example, when you make a string literal using quotes like "Hello", that is roughly equivalent to {'H', 'e', 'l', 'l', 'o', 0}, as the null-terminator is implicit when using quotes.
Ask the compiler for sizeof(myChars) and notice the difference. The end of a string is marked by a null character, implicit when you use "".
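A small compile-time check of the sizeof difference described above, assuming a C++11 compiler for static_assert. The array names are illustrative. Note that std::strlen is only safe on the arrays that actually contain a '\0'.

```cpp
#include <cstring>  // std::strlen: only safe on arrays that contain a '\0'

const char fromLiteral[] = "12345";                        // compiler appends '\0'
const char fromBraces[]  = {'1', '2', '3', '4', '5'};       // no terminator
const char withZero[]    = {'1', '2', '3', '4', '5', '\0'}; // explicit terminator

// sizeof counts every element of the array, including any terminator.
static_assert(sizeof(fromLiteral) == 6, "5 digits plus the implicit terminator");
static_assert(sizeof(fromBraces) == 5, "5 digits and nothing else");
static_assert(sizeof(withZero) == 6, "5 digits plus the explicit terminator");
```

std::strlen(fromLiteral) and std::strlen(withZero) both return 5, while calling it on fromBraces would read past the end of the array, which is exactly what std::cout does in the question.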
When you initialize myChars[] with "12345", a '\0' is implicitly added at the end, telling the program that this is the end of the string. You don't get that with {'1','2','3','4','5'}.
C strings are implemented as char arrays that end with the special character \0.
String literals include it implicitly, but initializing a char array with curly braces doesn't add it.
You need to add it manually:
const char myChars[] = {'1','2','3','4','5', '\0'};
or simply
const char myChars[] = {'1','2','3','4','5', 0};
Since '\0' == 0 numerically.
const char myChars[] = {'1','2','3','4','5','\0'};
Don't forget to add the null terminator.

Can you change the size of what a pointer points to?

For example, say a pointer points to an array of chars that reads "Hello how are you?" and you only want the pointer to point to "Hello". I am passing in a char pointer, and when I cout it, it prints the entire array. I tried to cut down the size using a for loop that breaks when it hits a ' ', but I'm not having any luck figuring it out. Any ideas?
const char *infile(char * file )
{
cout<<file<<endl; //this prints out the entire array
int j;
for(j=0;j<500; j++)
{
if(file[j]==' ')
break;
}
strncpy(file, file, j);
cout<<file<<endl; //how to get this to print out only the first word
}
strncpy() does not append a null terminator if there isn't one in the first j bytes of your source string, and in your case there isn't.
I think what you want to do is manually change the first space to a \0:
for (j = 0; j < 500; j++) {
if (file[j] == ' ') {
file[j] = '\0';
break;
}
}
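The same in-place truncation can be sketched with std::strchr, which stops at the string's own terminator, so there is no need for a hard-coded 500-character limit. The function name keep_first_word is hypothetical; like the loop above, it modifies the caller's buffer.

```cpp
#include <cstring>

// Overwrites the first space in `file` with '\0', so printing `file`
// afterwards shows only the first word. `file` must point to writable
// memory (a char array), not to a string literal.
void keep_first_word(char* file) {
    char* space = std::strchr(file, ' ');  // nullptr if there is no space
    if (space != nullptr)
        *space = '\0';
}
```

For example, after char s[] = "Hello how are you?"; keep_first_word(s);, printing s shows just Hello.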
First, avoid strtok (like the plague that it mostly is). It's unpleasant but sometimes justifiable in C. I've yet to see what I'd call justification for using it in C++ though.
Second, probably the easiest way to handle this (given that you're using C++) is to use a stringstream:
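#include <iostream>
#include <sstream>
#include <string>

void infile(char const *file)
{
    std::istringstream buffer(file);
    std::string word;
    buffer >> word;   // extracts characters up to the first whitespace
    std::cout << word;
}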
Another possibility would be to use some of the functions built into std::string:
void infile(char const *file) {
std::string f(file);
std::cout << std::string(f, 0, f.find(" "));
}
...which, now that I think about it, is probably a bit simpler than the stringstream version of things.
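If you have C++17, a similar approach (a swap-in, not part of the answer above) uses std::string_view, which refers to the caller's characters without copying or modifying them. This is a sketch; the name first_word is hypothetical, and the returned view must not outlive the buffer it points into.

```cpp
#include <string_view>

// Returns a view of the first word of `file`. If there is no space,
// find returns npos and substr keeps the whole string.
std::string_view first_word(const char* file) {
    std::string_view s(file);
    return s.substr(0, s.find(' '));
}
```

std::cout << first_word("Hello how are you?"); prints Hello, and the original string is left untouched.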
A char* pointer actually just points to a single char object. If that object happens to be the first (or any) element of a string, you can use pointer arithmetic to access the other elements of that string -- which is how strings (C-style strings, not C++-style std::string objects) are generally accessed.
A (C-style) string is simply a sequence of characters terminated by a null character ('\0'). (Anything after the '\0' terminator isn't part of the string.) So a string "foo bar" consists of this sequence of characters:
{ 'f', 'o', 'o', ' ', 'b', 'a', 'r', '\0' }
If you want to change the string from "foo bar" to just "foo", one way to do it is simply to replace the space character with a null character:
{ 'f', 'o', 'o', '\0', ... }
The ... is not part of the syntax; it represents characters that are still there ('b', 'a', 'r', '\0'), but are no longer part of the string.
Since you're using C++, you'd probably be much better off using std::string; it's much more powerful and flexible, and frees you from having to worry about terminators, memory allocation, and other details. (Unless the point of this exercise is to learn how C-style strings work, of course.)
Note that this modifies the string pointed to by file, and that change will be visible to the caller. You can avoid that by making a local copy of the string (which requires allocating space for it, and later freeing that space). Again, std::string makes this kind of thing much easier.
Finally, this:
strncpy(file, file, j);
is bad on several levels. Calling strncpy() with an overlapping source and destination like this has undefined behavior; literally anything can happen. And strncpy() doesn't necessarily provide a proper NUL terminator in the destination. In a sense, strncpy() isn't really a string function. You're probably better off pretending it doesn't exist.
See my rant on the topic.
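The "local copy" option from the answer above can be sketched like this: allocate room for the string plus its terminator, copy it, then truncate the copy. The name first_word_copy is hypothetical, and std::unique_ptr<char[]> is used here so the copy is freed automatically instead of with a manual delete[].

```cpp
#include <cstring>
#include <memory>

// Returns a private, truncated copy of `file`; the caller's string is
// not modified. The unique_ptr frees the copy when it goes out of scope.
std::unique_ptr<char[]> first_word_copy(const char* file) {
    std::size_t len = std::strlen(file);
    std::unique_ptr<char[]> copy(new char[len + 1]);  // +1 for '\0'
    std::memcpy(copy.get(), file, len + 1);           // copies the terminator too
    if (char* space = std::strchr(copy.get(), ' '))
        *space = '\0';                                // truncate the copy only
    return copy;
}
```

Usage: auto w = first_word_copy(file); std::cout << w.get();. As the answer notes, std::string does all of this for you.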
Doing this would be much easier:
if(file[j]==' ') {
file[j] = 0; // end the string at the first space
break;
}
// strncpy(file, file, j); // no longer needed
Using strtok might make your life much easier.
Split up the string with ' ' as a delimiter, then print the first element you get from strtok.
Use 'strtok', see e.g. http://www.cplusplus.com/reference/clibrary/cstring/strtok/
If what you're asking is "can I dynamically resize the memory block pointed to by this pointer" then... not really, no. (You have to create a new block of the desired size, then copy the bytes over, delete the first block, etc.)
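The "new block, copy, delete the old one" steps above can be sketched for a string allocated with new[]. The name shrink_string is hypothetical; it assumes old_block came from new[] and holds a '\0'-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Allocates a new block holding at most `new_len` characters plus '\0',
// copies what fits, frees the old block, and returns the new one.
// The caller owns the result and must delete[] it.
char* shrink_string(char* old_block, std::size_t new_len) {
    char* new_block = new char[new_len + 1];
    std::size_t old_len = std::strlen(old_block);
    std::size_t n = old_len < new_len ? old_len : new_len;
    std::memcpy(new_block, old_block, n);
    new_block[n] = '\0';
    delete[] old_block;
    return new_block;
}
```

Note that old_block is invalid after the call, so always use the returned pointer: s = shrink_string(s, 5);.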
If you're just trying to "print the first word", set the character at the position of the space to 0. Then, when you output the file pointer, you'll get just the first word (everything up to the \0). Read up on null-terminated strings for more information on why that works.
But this all depends on how much of what you're doing is an example to demonstrate the problem you're trying to solve. If you're really 'splitting up strings', then you'll at least want to look into using strtok.
Why not just output one character at a time and break once you hit a space?
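void infile(const char * file) // returns nothing, so void rather than const char*
{
cout<<file<<endl; //this prints out the entire array
for(int j=0; j<500 && file[j]!='\0'; j++) // also stop at the end of the string
{
if(file[j]==' ')
break;
cout<<file[j];
}
cout<<endl;
}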
This has nothing to do with the size of the pointer. A pointer always has the same size for a particular type.
strtok might be the best solution. This code uses strtok to break the string into substrings every time it meets a space, a comma, a dot or a "-":
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
Source : CPP STRTOK
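Since strtok modifies its input and keeps hidden state between calls, here is a sketch of the same split done with std::string's find_first_of and find_first_not_of instead (a swapped-in technique, not the strtok approach). The name split_any is hypothetical; like strtok, it skips empty tokens between consecutive delimiters.

```cpp
#include <string>
#include <vector>

// Splits `text` on any character in `delims`, skipping empty tokens.
// The input string is not modified.
std::vector<std::string> split_any(const std::string& text, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = text.find_first_of(delims, start);
        // If end is npos, end - start is huge and substr takes the rest.
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);  // npos once past the end
    }
    return tokens;
}
```

split_any("- This, a sample string.", " ,.-") yields the same four tokens as the strtok loop: This, a, sample, string.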
std::copy(file, std::find_if(file, file+500, [](char c){ return c == ' ' || c == '\0'; }), // stop at a space or at the end of the string
std::ostream_iterator<char>(std::cout, ""));
If you allocated the space that a char * points to using malloc, you can change the size using realloc.
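char * pzFile = (char *)malloc(sizeof("Hello how are you?")); // sizeof a string literal already counts its '\0'
strcpy(pzFile, "Hello how are you?");
pzFile[5] = '\0'; // terminate right after "Hello"
pzFile = (char *)realloc(pzFile, 6); // realloc may move the block, so keep its return value (and check it for NULL in real code)
Note that if you do not set the null terminator, using the string can cause problems.
If you were just trying to shorten the string, all you had to do was set the null terminator at index 5 (where the space is). The allocated space is then larger than needed, but that's OK as long as it's not too small.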
I strongly advise that mostly what you want to do is COPY the string up to the space.
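char * pzInput = (char *)malloc(sizeof("Hello how are you?")); // room for the text plus its '\0'
strcpy(pzInput, "Hello how are you?");
char * pzBuffer = (char *)malloc(BUFFER_SIZE); // BUFFER_SIZE must be big enough for the first word plus '\0'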
char * pzSrc = pzInput;
char * pzDst = pzBuffer;
while (*pzSrc && ' ' != *pzSrc)
*(pzDst++) = *(pzSrc++);
*pzDst = '\0';
This also ends up with pzSrc pointing at the rest of the string for later use!
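The copy loop above does not check how much room pzBuffer has. A bounded variant can be sketched as follows; the name copy_word and the dst_size parameter are hypothetical, and dst_size is assumed to be at least 1.

```cpp
#include <cstddef>

// Copies characters from `src` into `dst` until a space or the end of
// `src`, writing at most `dst_size` bytes including the '\0'.
// Returns where copying stopped in `src` (the rest of the string).
const char* copy_word(const char* src, char* dst, std::size_t dst_size) {
    char* const last = dst + dst_size - 1;  // reserve the final byte for '\0'
    while (*src && *src != ' ' && dst != last)
        *dst++ = *src++;
    *dst = '\0';
    return src;
}
```

With char buf[16];, copy_word("Hello how are you?", buf, sizeof buf) leaves "Hello" in buf and returns a pointer to " how are you?". A too-small buffer simply gets a truncated, still-terminated word.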

Sending C strings and vectors to MurmurHash gives inconsistent results

I'm trying to use MurmurHash (returning 64-bit hashes on a 64-bit computer) and have sent it the simple 3-letter string "yes" as follows:
char* charptr = "yes";
cout << MurmurHash64A(charptr, 3, 10);
(where 3 is the length and 10 is the seed)
This gives a 64-bit hash as expected. I can set up more pointers to C strings holding "yes", and they all return the same hash value.
But if I try sending it a C++ string:
string mystring = "yes";
string* strptr = &mystring;
cout << MurmurHash64A(strptr, 3, 10);
...I get a different result to the C string method, what's more if I set up several of these strings in the same way, they all give different results.
This suggests to me that strings are maybe not stored in contiguous memory locations, some Googling backed this up.
So I then tried to set up a vector in dynamic memory, as this was the only way I could think of to force contiguous memory.
Just like the C++ string method it returned a different result from the C string method and when I set up several they all return a different result from each other. I set them up like follows:
char yes[3] = {'y', 'e', 's'};
vector<char> *charvec = new vector<char>;
void* myvecptr3 = &charvec;
charvec->reserve(3);
charvec->push_back(yes[0]);
charvec->push_back(yes[1]);
charvec->push_back(yes[2]);
As I understand it my char vector will start at the address the vector is given and fill consecutive bytes with my three characters in the same way as a C string.
I am confused about why I'm getting different results. Any help appreciated.
Thanks
C
&mystring points at the string object. You want to use mystring.c_str() to get a pointer to a raw character array.
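For the vector, you want &(*charvec)[0] (or charvec->data() in C++11). But you probably don't want to use new; you could just declare vector<char> charvec; and, after pushing the characters in, use void *myvecptr3 = &charvec[0];. (Taking &charvec[0] of an empty vector is undefined.)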
The reason is that std::string itself stores a pointer to the char array. Try
string mystring = "yes";
cout << MurmurHash64A(mystring.c_str(), 3, 10);
And then you wouldn't need the char vector at all.
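MurmurHash64A is not part of the standard library, so this sketch substitutes a 64-bit FNV-1a hash (named fnv1a64 here, purely as a stand-in with no seed parameter) to show the point the answers make: a byte-oriented hash gives the same result for a C string, std::string::c_str(), std::string::data() and std::vector<char>::data(), as long as you pass a pointer to the characters rather than to the container object.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for MurmurHash64A: 64-bit FNV-1a over `len` bytes at `key`.
// Like MurmurHash64A, it only sees raw bytes, never the container.
std::uint64_t fnv1a64(const void* key, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(key);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return h;
}
```

Passing &mystring instead would hash the bytes of the std::string object itself (its internal pointer, size and so on), which is why each string gave a different result. Using mystring.size() as the length, rather than a hard-coded 3, also keeps the call correct for other strings.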

C++ Static Array Initialization - Memory Issue

I have a header file which contains a member variable declaration of a static char array:
class ABC
{
public:
static char newArray[4];
// other variables / functions
private:
void setArray(int i, char * ptr);
}
In the CPP file, I have the array initialized to NULL:
char ABC::newArray[4] = {0};
In the ABC constructor, I need to overwrite this value with a value constructed at runtime, such as the encoding of an integer:
ABC::ABC()
{
int i; //some int value defined at runtime
memset(newArray, 0, 4); // not sure if this is necessary
setArray(i,newArray);
}
...
void setArray(int i, char * value)
{
// encoding i to set value[0] ... value [3]
}
When I return from this function, and print the modified newArray value, it prints out many more characters than the 4 specified in the array declaration.
Any ideas why this is the case.
I just want to set the char array to 4 characters and nothing further.
Thanks...
How are you printing it? In C++ (and C), strings are terminated with a nul (\0). If you're doing something like:
char arr[4] = {'u', 'h', 'o', 'h'};
std::cout << arr;
It's going to print "uhoh" along with anything else it runs across until it gets to a \0. You might want to do something like:
for (unsigned i = 0; i < 4; ++i)
std::cout << arr[i];
(Having a static tied to instances of a class doesn't really make sense, by the way. Also, you can just do = {}, though it's not needed since static variables are zero-initialized anyway. Lastly, no it doesn't make sense to memset something then rewrite the contents anyway.)
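cout.write(arr, count_of(arr));
count_of isn't a standard function. If your headers don't provide one, you can define it yourself (or, in C++17, use std::size from <iterator> instead):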
template<typename T, size_t N>
inline size_t count_of(T (&array)[N]) { return N; }
Are you printing it using something like
printf("%s", newArray); //or:
cout << newArray;
? If so, you need to leave space for the nul terminator at the end of the string. C strings are just arrays of characters, so there's no indication of the length of the string; standard library functions that deal with strings expect them to end in a nul (0) character to mark the ending, so they'll keep reading from memory until they find one. If your string needs to hold 4 characters, the array needs to be 5 bytes wide so you can store the \0 in the fifth byte.
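A sketch of the 5-byte approach, assuming a placeholder encoding because the question doesn't say how i is encoded: here the hypothetical encode4 writes i (mod 10000) as four decimal digits followed by '\0', using std::snprintf so the terminator is always written.

```cpp
#include <cstdio>

// Hypothetical encoding: four decimal digits of i (mod 10000) plus '\0'.
// The array is 5 bytes so the terminator has somewhere to go, and
// std::cout << out then prints exactly 4 characters.
void encode4(int i, char (&out)[5]) {
    unsigned v = static_cast<unsigned>(i) % 10000u;
    std::snprintf(out, sizeof out, "%04u", v);  // writes at most 5 bytes
}
```

With the class from the question, this would mean declaring static char newArray[5]; (and char ABC::newArray[5]; in the .cpp file) and replacing the body of setArray with your real encoding.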
You'll need a 5th character with a 0 byte to mark the end of the 4 character string, unless you use custom char-array output methods. If you set value[3] to something other than 0, you'll start printing bytes next to newArray in the static data area.
There's also no need to explicitly 0 initialize static data.
You can best catch those kinds of errors with valgrind's memcheck tool.
It is printing out a string that starts at the address &newArray[0] and ends at the first 0 in memory thereafter (called the null terminator).
char strArr[] = {"Hello"};
char strArr[] = "Hello";
const char* strArr = "Hello"; // careful, this points to a string literal, which is read-only, so C++ requires const here
...are all null terminated, because anything in double quotes gets the null terminator tacked on at the end
char strArr[] = {'H', 'e', 'l', 'l', 'o'};
...is not null terminated, single quotes contain a single character and do not add a null terminator
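Here are examples of writing a null terminator into one of the char arrays above (each ends the string at index 3, leaving "Hel"):
strArr[3] = '\0';
strArr[3] = 0;
(Avoid strArr[3] = NULL; NULL is meant for pointers, not characters, and some compilers warn about it.)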
At a slight cost in performance, you can make it fit in 4 bytes, C-style.
Print either 4 characters or until \0 is reached:
#include <cstdio>
#include <cstring>
...
//calculate length
size_t totalLength = sizeof(ABC::newArray) / sizeof(ABC::newArray[0]);
char* arrayEnd = (char*)memchr(ABC::newArray, '\0', totalLength);
size_t textLength = arrayEnd != 0 ?
arrayEnd-ABC::newArray : totalLength;
//print
fwrite(
ABC::newArray, //source array
sizeof(ABC::newArray[0]), //one item's size
textLength, //item count
stdout); //destination stream
By the way, try to use std::string and std::cout.