Understanding multidimensional string array in C++ - c++

I am new to C++. I cannot understand why the following code prints "r". I think it should be an array of 2x3x4 elements, so by accessing arr[0][0][0] I would expect the first char in the first string of the first array, i.e. 'a', but this prints abcd. Can anyone explain it?
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string arr [2] [3] [4]={
        {"abcd","efgh","ijkl"},
        {"mnop","qrst","xywz"}
    };
    cout<<arr [1] [0] [1] [1]<<endl;
    return 0;
}
Edit:
What confuses me is the behavior in Python. The following Python code prints a:
arr=[["abcd","efgh","ijkl"],["mnop","qrst","xwyz"]]
print(arr[0][0][0])
It refers to the first letter of the first string in the first list.
I would think that the equivalent of this in C++ would be:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string arr [2] [3] [4]={
        {"abcd","efgh","ijkl"},
        {"mnop","qrst","xywz"}
    };
    cout<<arr[0][0][0]<<endl;
    return 0;
}
That is, arr[0][0][0] should refer to the first letter in the first string of the first array. But this prints the whole first string, abcd. My question is: why do I need another [0] to get to the a?

Your initializer populates the array as follows:
arr[0][0][0] = "abcd";
arr[0][0][1] = "efgh";
arr[0][0][2] = "ijkl";
arr[1][0][0] = "mnop";
arr[1][0][1] = "qrst";
arr[1][0][2] = "xywz";
All other elements are default-initialized to empty string.
Thus, arr[1][0][1] is the string containing "qrst", and arr[1][0][1][1] is the second character of that string, namely 'r'.

You've confused the standard library string object with the concept of a C string/string literal, and you haven't helped yourself by omitting the std:: prefix. If we add it, things start to make more sense:
std::string arr [2] [3] [4]={
{"abcd","efgh","ijkl"},
{"mnop","qrst","xywz"}
};
What you are declaring here is an array of 2 x 3 x 4 instances of std::string. But what you wrote looks like you thought you were declaring character arrays:
char arr [2] [3] [4] = {
{"abcd","efgh","ijkl"},
{"mnop","qrst","xywz"}
};
would almost have the effect you were trying to achieve: in this case arr[0][0][0] is the character 'a' rather than a string.
Unfortunately, the problem here is that you've specified a final dimension of 4 and then supplied string literals that each need 5 characters. (C silently drops the terminator when it doesn't fit; C++ rejects the initializer as too long.) Remember:
"abcd"
is equivalent to
{ 'a', 'b', 'c', 'd', 0 }
because c-strings are nul-terminated. So you would need to write
char arr [2] [3] [5] = {
{"abcd","efgh","ijkl"},
{"mnop","qrst","xywz"}
};
or, if what you actually want is specifically arrays of characters, not nul-terminated c-strings:
char arr[2][3][4] = {
{ { 'a', 'b', 'c', 'd' }, { 'e', 'f', 'g', 'h' }, ...
std::string is a discrete object, not an alias for a c-string.
#include <iostream>
#include <string>
int main() {
    std::string arr[2][3] = {
        { "abcd", "efgh", "ijkl" },
        { "mnop", "qrst", "wxyz" }, // who needs 'u' or 'v'?
    };
    std::cout << "arr[0][0] = " << arr[0][0] << "\n";
    std::cout << "arr[0][0][0] = " << arr[0][0][0] << "\n";
}
http://ideone.com/JQrDxr

The array you initialized is probably not what you wanted to initialize.
One-dimensional string array
string arr [2]= {"abcd","efgh"};
Two-dimensional string array
string arr [2][2]= {{"abcd","efgh"}, {"ijkl","mnop"}};
Three-dimensional string array
string arr [2][2][2]= {
    {
        {"abcd","qwer"},
        {"efgh","tyui"}
    },
    {
        {"ijkl","zxcv"},
        {"mnop","bnmo"}
    }
};
So, with this initializer, arr[1][0][1] is "zxcv", and cout<<arr [1] [0] [1] [1]<<endl; will output x.

You can't compare Python to C++ here, because in Python a list and a string behave almost the same, while in C++ they're completely different. In Python, you can keep subscripting until you reach a single letter whether you write
arr=[["abcd","efgh","ijkl"],["mnop","qrst","xwyz"]]
or
arr=[["a","b","c","d","e","f","g","h","i","j","k","l"],
["m", "n","o","p","q","r","s","t","x","w","y","z"]]
because a string can be indexed just like a list of one-character strings. C++, instead, treats a string as one "container", which leads to the behavior you observed: the initializer assigns these whole "containers" to the first indices instead of the individual chars. What you can do instead is
char arr[2][12] = {
    {'a','b','c','d','e','f','g','h','i','j','k','l'},
    {'m','n','o','p','q','r','s','t','x','w','y','z'}
};

Related

Missing elements when returning a std::vector

I am writing a function in C++ which should, in theory, take user input, split this input into segments according to whitespace and return these segments as a vector.
What I am currently doing is using strtok() on the input string to separate the words by whitespace. I push each "word" onto the buffer vector, and after iterating over all the words I return the vector.
So this is the code I have thus far:
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
std::vector<char*> tokenize(std::string input_, char const* delims=" \t\r\n\a")
{
    char* input = (char*)input_.c_str();
    std::vector<char*> tk_stream;
    char* tk = strtok(input, delims);
    while(tk != NULL) {
        tk_stream.push_back(tk);
        tk = strtok(NULL, delims);
    }
    return tk_stream;
}
int main(int argc, char** argv)
{
    while (true) {
        std::string input;
        std::getline(std::cin, input);
        if (input.empty()) {
            continue;
        }
        std::vector<char*> tks = tokenize(input);
        for (char* el : tks) {
            std::cout << el << std::endl;
        }
    }
    return 0;
}
So what is supposed to happen? Well, if I have an input of "1 2 3 4", it should print each of those numbers on a separate line. This actually works with that input. But when the input string is longer, for example "1 2 3 4 5 6 7 8 9", the output is different:
1 2 3 4 5 6 7 8 9
5
6
7
8
9
It is missing the first 4 numbers! This also happens for any longer string, and the number of missing numbers is constant. I also noticed this happens with longer sentences. For example, "hello everyone this is a test" gives:
hello everyone this is a test
0��
this
is
a
test
I have already done some digging with gdb and found something interesting. With the input "1 2 3 4 5 6 7 8 9", I set a breakpoint just before 'tk_stream' is returned and checked its value:
(gdb) print tk_stream
$1 = std::vector of length 9, capacity 16 = {0x6176c0 "1", 0x6176c2 "2", 0x6176c4 "3", 0x6176c6 "4", 0x6176c8 "5", 0x6176ca "6", 0x6176cc "7", 0x6176ce "8", 0x6176d0 "9"}
This seems correct. But after stepping a few lines past the return and checking the value of 'tks' (the vector that should contain the return value of the 'tokenize' function), I get this:
(gdb) print tks
$2 = std::vector of length 9, capacity 16 = {0x6176c0 "", 0x6176c2 "a", 0x6176c4 "", 0x6176c6 "", 0x6176c8 "5", 0x6176ca "6", 0x6176cc "7", 0x6176ce "8", 0x6176d0 "9"}
which has lost the first 4 entries, with the 2nd one garbled.
So something must happen in the returning of the 'tk_stream' vector.
What is the reason for this abnormal behavior? How can I fix this so no elements of the vector are deleted?
You don't want to be using raw pointers like char*; use std::string instead.
Something like:
std::vector<std::string> tokenize(std::string input_, const std::string delims=" \t\r\n\a")
{
    std::string input = input_;
    std::vector<std::string> tk_stream;
    // ...
You're passing your string by value into your tokenize function. You then call c_str() on that local string object and store pointers into that space in your vector. Your function then returns, and with it goes the storage of the local string object. This means that all of the pointers you've stored in the vector are now dangling pointers. Dereferencing any of them is Undefined Behaviour.
It "appears to work" for short strings (likely strings shorter than 16 characters) because of something called the Short String Optimization. Many implementations of std::string have a small buffer inside the std::string object itself (a common size is 16 bytes, but it is not specified by the Standard). Once the string gets longer than that, std::string dynamically allocates a buffer to hold it. With a short string, your dangling pointers point into your stack, where your data has not yet been overwritten. With a long string, your pointers point into freed heap memory, which may already have been reused and overwritten by something else.
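Oh, and to fix it, pass your std::string by reference: const std::string & input_. The tokens then point into the caller's string, so they stay valid only as long as that string is alive and unmodified. Be aware, too, that strtok writes '\0' characters into the buffer it is given, and writing through the pointer returned by c_str() is undefined behaviour in its own right, which is another reason to prefer returning std::string tokens.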

C Char Array Creation Differences

In C I've noticed that there are a few ways to declare a char array. What's the difference between:
char arr[10] = "abcdefghij";
char* arr2[10] = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"};
gcc says I need the star after char in 2 and not in 1.
When printing 1, I can use printf("%s\n", arr); and it prints abcdefghij#
When printing 2, I have to use a for loop.
Why are they different?
Because 1 is a char array, while 2 is an array of pointers to char, each pointing to a string literal: "a" is in fact an array of 2 chars, 'a' and '\0'.
arr is an array of 10 chars, and arr2 is an array of 10 char pointers. By contrast,
char arr2[10] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};
is equivalent to arr.
In the first one you are just declaring a string, but in the second one you are creating 10 pointers to 10 strings, as if you had written:
char* arr2_0 = "a" ;
char* arr2_1 = "b" ;
char* arr2_2 = "c" ;
char* arr2_3 = "d" ;
char* arr2_4 = "e" ;
char* arr2_5 = "f" ;
char* arr2_6 = "g" ;
char* arr2_7 = "h" ;
char* arr2_8 = "i" ;
char* arr2_9 = "j" ;
To understand this behavior, you have to know the difference between string literals and chars. The former are written with " and can contain several chars, while the latter are written with ' and represent only one char. Since you used " in your array initialisation, you created an array of strings, not chars. When a string literal is used to initialize a pointer, it decays to a pointer to its first char. Therefore the compiler wanted you to create an array of pointers to chars instead of an array of chars.

console outputs smiley face

I have this code:
#include "stdafx.h"
#include <iostream>
typedef struct{
    int s1;
    int s2;
    char c1;
    char* arr;
}inner_struc;
int _tmain(int argc, _TCHAR* argv[])
{
    inner_struc myinner_struct;
    myinner_struct.s1 = myinner_struct.s2 = 3;
    myinner_struct.c1 = 'D';
    char arr[3] = {1,2,3};
    myinner_struct.arr = arr;
    std::cout << "first array element: " << myinner_struct.arr[1] << std::endl;
    return 0;
}
I wonder why I am getting a smiley face instead of the first array element! What am I doing wrong here? It compiles and runs fine; however, the output is
first array element: "smiley face"
What does this mean?
I am using Visual Studio 2010. Thanks
You are outputting the second element in this array:
char arr[3] = {1,2,3};
You have assigned it the values 1, 2 and 3, but the element type is char, so each value is interpreted as a character code. Code 2 is a non-printable ASCII control character, and the Windows console's default code page (437) displays it as a smiley face. So it is indeed doing what you asked it to do.
http://mathbits.com/MathBits/CompSci/Introduction/ASCIIch.jpg
What are you expecting the output to be? If you wanted it to be a number, then you will have to put the character representation of that number into the array, i.e. instead of 1, 2 or 3 use '1', '2' and '3'.
As far as I can see, you are trying to output the first array element, but you are printing the second one instead (arrays are indexed starting from 0, not 1). The second element is 2. If you look at a table of the console's character set, you will see that character code 2 is displayed as a smiley face. The problem is that you are outputting the character with code 2, not '2'. To output the digit 2, make your array look like this:
char arr[3] = {'1','2','3'};
myinner_struct.arr is declared as char*, which means it points to an array of characters (bytes). Do you want an array that holds the numbers 1, 2, 3? If so, use int. If you want letters, initialize the array with:
char arr[3] = { 'a', 'b', 'c' };
KC

Translating std::string to vector<char>

I'm trying to convert a std::string to a char* (copying rather than casting) due to having to pass some data to a rather dated API.
On the face of it, there are a number of ways to do this, but it was suggested that I do this as a vector which seemed sensible. However, when I tried this the result was garbled. The code is like:
const string rawStr("My dog has no nose.");
vector<char> str(rawStr.begin(), rawStr.end());
cout << "\"" << (char*)(&str) << "\"" << endl;
(Note the unpleasant C cast - using static_cast does not work which is probably telling me something)
When I run this I get:
"P/"
Clearly not right. I took a look at the vector in gdb
(gdb) print str
$1 = std::vector of length 19, capacity 19 = {77 'M', 121 'y', 32 ' ', 100 'd', 111 'o',
103 'g', 32 ' ', 104 'h', 97 'a', 115 's', 32 ' ', 110 'n', 111 'o', 32 ' ', 110 'n',
111 'o', 115 's', 101 'e', 46 '.'}
Which looks correct although there's no null terminator at the end, which is concerning. The size of the vector (sizeof(str)) is 24 which suggests the characters are being stored as 8-bits.
Where am I going wrong?
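The instance of std::vector is not itself an array of characters; it points to an array. (That is also why sizeof(str) is 24: it measures the vector object itself, typically three pointers on a 64-bit system, not its elements.) Rather than (char*)(&str), try &str[0].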
Judging from your gdb output you'll also want to push a zero onto the end of the vector before passing it to your legacy API.
First, the std::string does not contain the null termination as an element within the range covered by [begin(), end()). Second, the address of the vector is not the address of the first element of the vector's data. For this you need &str[0] or str.data():
#include <vector>
#include <string>
#include <iostream>
int main()
{
    const std::string rawStr("My dog has no nose.");
    std::vector<char> str(rawStr.begin(), rawStr.end());
    str.push_back('\0');
    std::cout << "\"" << &str[0] << "\"" << std::endl;
    std::cout << "\"" << str.data() << "\"" << std::endl; // C++11
}
Two things you need to do:
1) take the address of the first character in the vector using &str[0]; This is absolutely fine (if a little contrived) since the standard guarantees the vector memory is contiguous. You can't simply write &str as that is the address of the vector which is not necessarily the address of the first data element.
2) inject a null terminator at the end of your vector if you want to display the characters as a string using the standard c-like functions. I might be wrong on this second point; does rawStr.end() point at an implicit null terminator associated with "My dog has no nose."?
The &str gets you a pointer to the vector object, not to the contained string of characters.
If you wish to print it as a C string, you'll need to push a 0 onto the end and then output &str[0] (which gives you the address of the beginning of the contained array).
This is very ugly, though. You are much better off either creating your own string vector class which inherits std::vector or using a function crafted to iterate through a vector, printing each element literally.
Edit:
If you have C++11 available, for_each with a lambda (from <algorithm>) could be used here in a clean way:
std::for_each(str.begin(), str.end(), [](char i) -> void {std::cout << i;});

Assign a fixed length character array to a string

I have a fixed-length character array that I want to assign to a string. The problem is that if the character array is full (no '\0'), the assign fails. I thought of using the overload of assign where you can supply n, but that ignores \0s. For example:
std::string str;
char test1[4] = {'T', 'e', 's', 't'};
str.assign(test1); // BAD "Test2" (or some random extra characters)
str.assign(test1, 4); // GOOD "Test"
size_t len1 = strlen(test1); // BAD 5
char test2[4] = {'T', 'e', '\0', 't'};
str.assign(test2); // GOOD "Te"
str.assign(test2, 4); // BAD "Tet"
size_t len2 = strlen(test2); // GOOD 2
How can I assign a fixed length character array to a string correctly for both cases?
Use the "pair of iterators" form of assign.
str.assign(test1, std::find(test1, test1 + 4, '\0'));
Character buffers in C++ are either-or: either they are null terminated or they are not (and fixed-length). Mixing them in the way you do is thus not recommended. If you absolutely need this, there seems to be no alternative to manual copying until either the maximum length or a null terminator is reached.
for (char const* i = test1; i != test1 + length and *i != '\0'; ++i)
str += *i;
You want both null termination and fixed length? This is highly unusual and not recommended. You'll have to write your own function and push_back each individual character.
For the first case, when you do str.assign(test1) and str.assign(test2), you have to have \0 in your array; otherwise it is not a null-terminated C string and you can't assign it to std::string like this.
Saw your serialization comment: use std::vector<char>, std::array<char,4>, or just a 4-char array or container.
Your second 'bad' example, the one which prints out "Tet", actually does work, but you have to be careful about how you check it:
str.assign(test2, 4); // BAD "Tet"
cout << "\"" << str << "\"" << endl;
This does copy exactly four characters. If you pipe the output through od (octal dump) on Linux, e.g. my.exe | od -c, you get:
0000000 " T e \0 t " \n
0000007