Indexing an integer array through the characters of a string - C++

I was working on a dynamic programming problem: printing the distinct subsequences of a given string. In that code I came across something unfamiliar to me: the elements of an integer array (actually a vector of int) were accessed using the characters of a string as indices. So I tried the same thing in new code of my own. It produced some output, but I didn't understand it.
I have tried this code on my PC but couldn't understand the output. I want to know the logic behind the output, and whether indexing through the characters of a string is possible at all.
#include<bits/stdc++.h>
using namespace std;
int main(){
    string s;
    cin>>s;
    int* last = new int[1000];
    for(int i=0;i<s.length();i++){
        cout<<last[s[i]];
    }
}
For example, when I enter "abcdefgh", it prints "00000000".
Why, and what is this output? I don't know what output to expect.

Let me explain this to you.
But first, a recommendation: if you are writing real code and not competitive-programming solutions, please never use
#include<bits/stdc++.h>
using namespace std;
Now for the basics. To your eyes, a string consists of characters or letters. But the computer cannot store letters in its memory; it only knows bits and bytes, in other words, numbers.
An encoding defines which number is associated with which character. One such encoding is ASCII.
When the computer sends those numbers to an output or printing device, they are converted back into readable letters.
In reality, a string is an array of numbers, just nicely wrapped for you. A string has an index operator []: s[0] gives you the first character, which is a number. You can check this by casting the character value to an integer: try std::cout << static_cast<int>(s[0]); and you will see a number.
Now you know that s[i] gives you a number. Then you have an array of ints, "last", and you use the index operator [k] to get the element at index k.
If you write last[s[i]], the inner value s[i] is evaluated first. Let us assume it is the character 'A', which is the number 65. This results in last[65], so you read the 66th element of last (indices start at 0).
That is the important point to understand.
Now to the array last. (By the way, never use raw arrays or "new" in real code.)
int* last = new int[1000];
"new" allocates a contiguous memory area of 1000 ints (4000 bytes on today's typical systems, where an int is 4 bytes) on the heap, that is, somewhere in memory. Where exactly is out of our control. The memory area is not initialized, so the values have indeterminate (effectively random) content, and reading them is undefined behavior. (In my opinion this is wrong; we should initialize everything.)
In your case, the memory accidentally contained many 0s, but sometimes it will contain other values.
And if you now enter the string "Hello", this is equivalent to the numbers 72, 101, 108, 108, 111. So you display the integer values at indices 72, 101, 108, 108 and 111 of the array last.
Hope this makes things clear.

Related

Write a function that crosses out up to n-1 letters of an n-letter word, and returns the number of unique strings created that way

Let's say we have the string* abcd, which I will refer to as "word". The program should return the number of unique strings* that are made by crossing out zero or more letters (but not all of them) from "abcd". In this particular example, these unique strings* are "abcd", "abc", "abd", "acd", "bcd", "ab", "ac", "ad", "bc", "bd", "cd", "a", "b", "c", "d". Therefore, in this case, the program should return 15. Using vectors and strings in this assignment is forbidden, so I have to use char[] arrays instead. I use the word string* above only to avoid complicating an already complicated task; by string I mean char[].
So far my idea is to create arrays that store the strings of the same length. I find the number of such strings using the binomial coefficient (for example, the number of 3-letter strings from a 4-letter word is C(4,3) = 4). So in a for-loop I create the arrays needed to hold the strings and add only those strings that aren't already in that array. Then I return the number of elements in the array.
//size is the size of the word, i is the number of crossed out letters
int total=0; //stores number of all possible little strings
for(int i=1; i<size; i++){
    int sizeOfSubstring=binomial(size-i, size);
    char substrings[sizeOfSubstring][size-i];
    //populate the substrings array and return the number of char[] arrays added to it.
    //Then add that number to total.
}
However, as you can clearly see, this is already complicated. I get around the C++ requirement for constant array sizes by using GCC, which supports variable-length arrays as an extension. But it gets even worse when you have to populate the array of strings. For example, we need to add "abc", "abd", "acd", "bcd" to substrings[4][3], and then do the same for substrings[6][2], etc. This requires a function like
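void addSubstring(char crossedOutIndexes[], char word[], char substr[][]){
    //I haven't implemented that yet
    //(note: not valid C++ as written; in a multidimensional array parameter,
    // every dimension except the first needs a constant size)
}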
I am asking this question because I already have great difficulties with this problem and I don't know how to implement the addSubstring() function. Is this even the right idea to solve the problem?
I recommend first solving the problem without the complicated cases. For instance, assume your input will never have repeated letters. Then solve it manually on paper for a string of 2 characters, then 3, then 4, until you see a pattern in the algorithm you use by hand. Then code that and get it working. Once that's working, solve the next problem: duplicates. There are a couple of options there: don't add a word if it's already in your list, or remove all duplicates from the list afterwards.
As for variable-length arrays, your initial solution of using a compiler that supports them is fine for the first version. Another possibility is simply over-allocating an array of strings, for instance an array of 10000 empty strings. While memory-inefficient, it's fine for learning. Once you have a working solution, you can always move to a standard container like std::vector.
One thing that happens here is that learners often get great advice, but since they're still learning, the advice just overwhelms them. There's nothing wrong with using a compiler that supports variable-length arrays; it just locks you into that tool.
I do recommend finding a development environment with a great debugger, one that lets you step through the code line by line and see what's happening. Visual Studio Community is the free one I'm familiar with, but I know there are others; I just don't know what they are.

C++ string copy() gives me extra characters at the end, why is that?

I am studying C++. A blog introduced the copy function, but when I tried the same code on my system, the result did not match what I expected. Please let me know what I did wrong here in the code below.
#include <iostream>
#include <string>
int main(){
    std::string statement = "I like to work in Google";
    char compName[6];
    statement.copy(compName, 6, 18);
    std::cout<<compName;
}
I expected Google but actual output is Googlex
I am using Windows (MinGW.org GCC-6.3.0-1).
You are confusing three different things: a sequence of characters, a C-style string, and std::string. Let's break them down:
A sequence of characters is just that: one character after another in some container (in your case a C-style array). To a human, several characters may look like a string, but there is nothing in your code to make them one.
A C-style string is an array of characters terminated by the null character '\0'. It is a carry-over from C, and many functions and operators (such as operator<< for char pointers) simply assume, without checking, that an array of characters is such a string.
A C++ string (std::string) is a class (a specialization of the std::basic_string class template) that stores strings. You don't need to worry about how it does so internally. Although there are functions for interoperability with the first two categories, it is a completely different thing.
Now, let's figure out how a compiler sees your code:
char compName[6];
This creates an array of characters with space for 6 symbols. You can write C-style strings into it as long as they are 5 symbols or less, since you also need to write '\0' at the end. C-style arrays in C++ are not bounds-checked, so nothing stops you from writing more characters into them, but you cannot predict where those extra characters will end up in memory (or even whether your program will keep running): the behavior is undefined. You can also read past the end of the array, but there is no way to know where that data comes from, unless you are simply experimenting with your compiler. Never do that in your code.
statement.copy(compName, 6, 18);
This line writes 6 characters. std::string::copy does not append a '\0', so the result is not a C-style string; it is simply 6 characters in an array.
std::cout<<compName;
You are trying to output a C-style string to the console, but you have not provided one. operator<< receives a pointer to char, assumes you knew what you were doing, and treats it as a C string: it displays one character after another until it reaches '\0'. When will it reach such a character? There is no way to know, since you never wrote one. Because C-style arrays are unsafe, it has no problem reading past the end of the array and interpreting whatever bytes follow as a continuation of your non-existent C-style string.
Here you got "lucky": only a single extra byte appeared, as an 'x', followed by a byte containing 0, so the output stopped. If you run your program at a different time, with a different compiler, or with different optimisation settings, you might see completely different data.
So what should you have done?
You can try this:
#include <iostream>
#include <string>
int main()
{
    std::string statement = "I like to work in Google";
    char compName[7]{};
    statement.copy(compName, 6, 18);
    std::cout<<compName;
    return 0;
}
What did I change? I made the array able to hold 7 characters (leaving enough space for a C-style string of 6 characters), and I provided an empty initializer list {}, which fills the array with '\0' characters. This means that when you replace the first 6 of them with your data, there is still a terminating character at the very end.
Another approach would be to do this:
#include <iostream>
#include <string>
int main()
{
    std::string statement = "I like to work in Google";
    char compName[7];
    auto length = statement.copy(compName, 6, 18);
    compName[length] = '\0';
    std::cout<<compName;
    return 0;
}
Here I do not initialise the array; instead I take the number of characters written, which the .copy method returns, and add the needed terminator at the correct position.
Which approach is best depends on your particular application.
When you insert a pointer to a character into an output stream with the stream insertion operator, the pointer is required to point to a null-terminated string.
compName does not contain a null terminator character. Therefore, inserting (a pointer to an element of) it into a character stream violates the requirement above.
Please let me know what I did wrong here
You violate the requirement above. As a consequence, the behaviour of your program is undefined.
I expected Google but actual output is Googlex
This is because the behaviour of the program is undefined.
How to terminate it?
Firstly, make sure that there is room in the array for the null terminator character:
char compName[7];
Then, assign the null terminator character:
compName[6] = '\0';

What is the advantage of using gets(a) instead of cin.getline(a,20)?

We will have to define an array for storing the string either way.
char a[10];
So suppose I want to store smcck in this array. What is the advantage of using gets(a)? My teacher said that the extra space in the array is wasted when we use cin.getline(a, 20), but that applies to gets(a) too, right?
Also, an extra question: what exactly is stored in the empty "boxes" of an array?
gets() is a C function. It does no bounds checking and is considered dangerous; it was kept for years only for compatibility, and it has since been removed from both languages (from C in C11 and from C++ in C++14).
You can check the following link to clear up your doubt:
http://www.gidnetwork.com/b-56.html
Don't mix C features into C++. Even where a C feature is available in C++, using it is not recommended. If you are working in C++, you should avoid gets(); use getline() instead.
Well, I don't think gets(a) is better, because it does not check the size of the string. If you try to read a long string with it, it may cause a buffer overflow: it will use all 10 positions you allocated and then keep writing into memory used by other variables, which can corrupt your data or make your application crash.
cin.getline() takes a size parameter that tells it not to read more than the expected number of characters: it stores at most size - 1 characters followed by '\0'. But if you allocate an array with only 10 positions and tell getline to read up to 20 characters, it will cause the same problem I described for gets().
About the representation of strings in memory: if you put "smcck" into an array
char v[10];
the word takes the first 5 positions (0 to 4), and position 5 is taken by a null character (written '\0') that marks the end of the string. What comes after that in the array usually does not matter; it keeps whatever it contained before: indeterminate garbage for an uninitialized local array, or zeros for a global/static array or one initialized with {}. The null terminating character marks where the string ends, so you can work with it safely.

What is the difference between "length" and "size" of a sequence?

I'm currently doing C++ stack problems, and I am having trouble understanding the meaning of these two instructions. Can someone help explain the difference between length and size in this context?
Read in a sequence of positive integers from the keyboard, one per line, and terminated by any negative integer;
Output a blank line, followed by a line with the length of the sequence, followed by a line with the sum of the values in the sequence, followed by another blank line; To determine the size of the sequence you must use the stack size function;
Here's what I think it means:
2 // one sequence?
3 // second sequence?
4 // third sequence?
length of sequence: 3?
Sum: 9
Stack Size: 3??? // isn't stack size just the size of sequence? confused?
IMHO, you're supposed to read integers into a std::stack<int> in a for (or while, or do-while) loop that terminates on a negative input. Then you should print to stdout the length (= size) of the sequence, i.e. the number of elements, as given by std::stack::size(), followed by the sum. Since std::stack provides no iterators, the simplest way to get the sum is to accumulate it while reading.
I think the assignment is pretty clear, but perhaps your ability to read and understand plain English could be improved?
Based on how I understand the assignment, size is used to mean length.
The length of a C++ container is the number of elements, not its storage capacity.

What does it mean to be "terminated by a zero"?

I am getting into C/C++ and a lot of terms are popping up unfamiliar to me. One of them is a variable or pointer that is terminated by a zero. What does it mean for a space in memory to be terminated by a zero?
Take the string Hi in ASCII. Its simplest representation in memory is two bytes:
0x48
0x69
But where does that piece of memory end? Unless you're also prepared to pass around the number of bytes in the string, you don't know - pieces of memory don't intrinsically have a length.
So C has a standard that strings end with a zero byte, also known as a NUL character:
0x48
0x69
0x00
The string is now unambiguously two characters long, because there are two characters before the NUL.
It's a reserved value to indicate the end of a sequence of (for example) characters in a string.
More correctly known as null (or NUL) terminated. This is because the value used is zero, rather than the character code for '0' (which is 48, or 0x30, in ASCII). To clarify the distinction, check out a table of the ASCII character set.
This is necessary because languages like C have a char data type, but no string data type. Therefore it is left to the developer to decide how to manage strings in their application. The usual way of doing this is to have an array of chars with a null value used to terminate (i.e. signify the end of) the string.
Note that there is a distinction between the length of the string, and the length of the char array that was originally declared.
char name[50];
This declares an array of 50 characters. However, these values will be uninitialised. So if I want to store the string "Hello" (5 characters long) I really don't want to bother setting the remaining 45 characters to spaces (or some other value). Instead I store a NUL value after the last character in my string.
Other languages, such as Pascal, Java and C#, have a specific string type defined. These have a header value that stores the number of characters in the string. This has a couple of benefits: first, you don't need to walk to the end of the string to find its length; second, your string can contain null characters.
Wikipedia has further information in the String (computer science) entry.
In C, arrays and strings are handled through pointers to a memory location. From the pointer you can find the start of the array, but its end is not recorded anywhere. The end of a character array holding a string is marked by a zero byte.
So, in memory string hello is written as:
68 65 6c 6c 6f 00 |hello|
It refers to how C strings are stored in memory. The NUL character, represented by \0 in string literals, is present at the end of a C string in memory. There is no other metadata associated with a C string, such as its length. Note the different spelling between the NUL character and the NULL pointer.
There are two common ways to handle arrays that can have varying-length contents (like strings). The first is to keep the length of the data stored in the array separately. Languages like Fortran and Ada, and C++'s std::string, do this. The disadvantage is that you somehow have to pass that extra information to everything that deals with your array.
The other way is to reserve an extra non-data element at the end of the array to serve as a sentinel. For the sentinel you use a value that should never appear in the actual data. For strings, 0 (or "NUL") is a good choice, as it is unprintable and serves no other purpose in ASCII. So what C (and many languages derived from C) does is assume that all strings end with (or "are terminated by") a 0.
There are several drawbacks to this. For one thing, it is slow: any time a routine needs to know the length of the string, it is an O(n) operation (searching through the entire string for the 0). Another problem is that you may one day want to put a 0 in your string for some reason, so now you need a whole second set of routines that ignore the null and use a separate length anyway (e.g. memcpy() and memcmp()). The third big problem is that if someone forgets to put that 0 at the end (or it gets wiped out somehow), the next string operation that does a length check will march merrily through memory until it either happens to find another 0 by chance, crashes, or the user loses patience and kills it. Such bugs can be a serious PITA to track down.
For all these reasons, the C approach is generally viewed with disfavor.
C-style strings are terminated by a NUL character ('\0'). This provides a marker for functions that operate on strings (e.g. strlen, strcpy) to use to identify the end of the string.
While the classic example of "terminated by a zero" is that of strings in C, the concept is more general. It can be applied to any list of things stored in an array, the size of which is not known explicitly.
The trick is simply to avoid passing around an array size by appending a sentinel value to the end of the array. Typically some form of zero is used, but it can be anything else (like a NaN if the array contains floating-point values).
Here are three examples of this concept:
C strings, of course. A single zero character is appended to the string: "Hello" is encoded as 48 65 6c 6c 6f 00.
Arrays of pointers naturally allow zero termination, because the null pointer is guaranteed never to point to a valid object. As such, you might find code like this:
Foo* list[] = { somePointer, anotherPointer, NULL };
bar(list);
instead of
Foo* list[] = { somePointer, anotherPointer };
bar(sizeof(list)/sizeof(*list), list);
This is why execvpe() needs only three arguments, two of which pass arrays of user-defined length. Since all that's passed to execvpe() are (possibly lots of) strings, this little function actually sports two levels of zero termination: null pointers terminating the string lists, and null characters terminating the strings themselves.
Even when the element type of the array is a more complex struct, it may still be zero terminated. In many cases, one of the struct members is defined to be the one that signals the end of the list. I have seen such function definitions, but I can't unearth a good example of this right now, sorry. Anyway, the calling code would look something like this:
Foo list[] = {
{ someValue, somePointer },
{ anotherValue, anotherPointer },
{ 0, NULL }
};
bar(list);
or even
Foo list[] = {
{ someValue, somePointer },
{ anotherValue, anotherPointer },
{} // empty initializer: zero-fills the element in C++ and C23 (older C compilers accept it only as an extension)
};
bar(list);