Order of precedence in C++: & or ()?

Provided that texts is an array of 3 strings, what's the difference between &texts[3] and (&texts)[3]?

The [] subscript operator has a higher precedence than the & address-of operator.
&texts[3] is the same as &(texts[3]): the array is indexed first, giving the (nonexistent) 4th element, and then the address of that element is taken. Assuming the array is declared as string texts[3], that produces a string* pointer pointing one past the last element of the array (the same address as texts + 3), i.e. similar to an end iterator of a std::array or std::vector.
----------------------------
| string | string | string |
----------------------------
                            ^
                            &texts[3]
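(&texts)[3], on the other hand, takes the address of the array itself, producing a string(*)[3] pointer, advances that pointer by 3 whole string[3] arrays, and then dereferences it; it is *(&texts + 3). So, again assuming string texts[3], the result is an lvalue of type string[3] (which decays to a string* when used as a pointer) located 9 strings past the start of texts, WAY beyond the end boundary of the array. Merely computing it is undefined behavior.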
---------------------------- ---------------------------- ----------------------------
| string | string | string | | string | string | string | | string | string | string |
---------------------------- ---------------------------- ----------------------------
                            ^                                                         ^
                            &texts[3]                                                 (&texts)[3]
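The two parses can be checked at compile time. This is an illustrative sketch: the global array `texts` and the helper `one_past_end` are hypothetical, and the type of `(&texts)[3]` is only inspected through `decltype`, an unevaluated context, because actually evaluating that expression would be undefined behavior.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical array matching the question: string texts[3].
std::string texts[3] = {"one", "two", "three"};

// &texts[3] parses as &(texts[3]): a pointer to std::string.
static_assert(std::is_same<decltype(&texts[3]), std::string*>::value,
              "&texts[3] is a pointer to std::string");

// (&texts)[3] parses as *(&texts + 3): an lvalue of type std::string[3],
// so decltype reports a reference to an array of three strings.
static_assert(std::is_same<decltype((&texts)[3]), std::string (&)[3]>::value,
              "(&texts)[3] is an array of three strings");

// *&texts is the whole array, so &texts + 1 steps over three strings at once.
static_assert(sizeof(*&texts) == 3 * sizeof(std::string),
              "one step of &texts covers the whole array");

// A safe way to obtain the one-past-the-end pointer that &texts[3] denotes.
std::string* one_past_end() {
    return texts + 3;  // same address as std::end(texts)
}
```

`one_past_end()` compares equal to `std::end(texts)`, and `one_past_end()[-1]` is the last element, `"three"`.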


Test if all characters in string are not alphanumeric

The string below is probably the result of a bad API call:
_±êµÂ’¥÷“_¡“__‘_Ó ’¥Ï“ùü’ÄÛ“_« “_Ô“Ü“ù÷ “Ïã“_÷’¥Ï “µÏ“ÄÅ“ù÷ “Á¡ê±«“ùã ê¡Û“_ã “__’
I am not sure which rows contain non-alphanumeric characters, and my task is to identify the problematic rows.
Another problem is that some non-alphanumeric characters appear in strings that I would still like to keep and search, like:
This sentence is fine and searchable, but a few non-alphanumeric äóî donäó»t popup
Is there a way to test if the entire contents of a string are non-alphanumeric?
You can use a regular expression to find all rows with only standard alphabetic and numeric characters including commas, periods, exclamation and question marks as well as spaces:
clear
input str168 var1
"_±êµÂ’¥÷“_¡“__‘_Ó ’¥Ï“ùü’ÄÛ“_« “_Ô“Ü“ù÷ “Ïã“_÷’¥Ï “µÏ“ÄÅ“ù÷ “Á¡ê±«“ùã ê¡Û“_ã “__’"
"This sentence is fine and searchable, but a few non unicode äóî donäó»t popup"
" This is a regular sentence of course"
" another sentence, but with comma"
" but what happens with question marks?"
" or perhaps an exclamation mark!"
end
generate tag = ustrregexm(var1, "^[A-Za-z0-9 ,.?!]*$")
list tag, separator(0)
+-----+
| tag |
|-----|
1. | 0 |
2. | 0 |
3. | 1 |
4. | 1 |
5. | 1 |
6. | 1 |
+-----+
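Outside Stata, the same whitelist test can be sketched in C++ with `std::regex`. This is a translation for illustration only, not Stata's `ustrregexm()`, and the function name `only_plain_chars` is hypothetical. `std::regex` compares individual bytes here, so every byte of a UTF-8 encoded character such as "ä" falls outside the class, which gives the same verdict as the Stata example for rows like the ones above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when every character is an ASCII letter, digit, space, comma,
// period, question mark or exclamation mark (the class used in the
// Stata pattern "^[A-Za-z0-9 ,.?!]*$").
bool only_plain_chars(const std::string& row) {
    // regex_match must match the entire string, so ^ and $ are implied.
    static const std::regex plain("[A-Za-z0-9 ,.?!]*");
    return std::regex_match(row, plain);
}
```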
Another possibility is to use a regular expression to flag the rows that contain no alphabetic or numeric characters at all (tag = 1), a solution which in this case covers both required cases:
clear
input str168 var1
"_±êµÂ’¥÷“_¡“__‘_Ó ’¥Ï“ùü’ÄÛ“_« “_Ô“Ü“ù÷ “Ïã“_÷’¥Ï “µÏ“ÄÅ“ù÷ “Á¡ê±«“ùã ê¡Û“_ã “__’"
"This sentence is fine and searchable, but a few non unicode äóî donäó»t popup"
" This is a regular sentence of course"
" another sentence, but with comma"
" but what happens with question marks?"
" or perhaps an exclamantion mark!"
"¥Ï“ùü’ÄÛ“_« “_Ô“Ü“ù÷ "
"¥Ï“ùü’ÄÛ hihuo"
end
generate tag = ustrregexm(var1, "^[^A-Za-z0-9]*$")
list tag, separator(0)
+-----+
| tag |
|-----|
1. | 1 |
2. | 0 |
3. | 0 |
4. | 0 |
5. | 0 |
6. | 0 |
7. | 1 |
8. | 0 |
+-----+
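The second test, "contains no ASCII letter or digit", needs no regular expression in C++. A minimal sketch, assuming rows are byte strings in an ASCII-compatible encoding; `is_ascii_alnum` and `has_no_ascii_alnum` are hypothetical helper names. The explicit range checks avoid `std::isalnum`, whose result depends on the current locale and which must not be called with negative `char` values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// True for 'A'-'Z', 'a'-'z' and '0'-'9' only (assumes ASCII letter ranges).
bool is_ascii_alnum(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

// Mirrors the Stata pattern "^[^A-Za-z0-9]*$": true when the row contains
// no ASCII letter or digit at all (an empty row also counts).
bool has_no_ascii_alnum(const std::string& row) {
    return std::none_of(row.begin(), row.end(), is_ascii_alnum);
}
```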

Puts and Gets in c++ [closed]

I know what puts and gets do, but I don't understand the meaning of this code.
int main(void) {
char s[20];
gets(s); //Helloworld
gets(s+2);//dog
sort(s+1,s+7);
puts(s+4);
}
Could you please help me to understand?
Draw it on paper, along these lines.
At first, twenty uninitialised elements:
| | | | | | | | | | | | | | | | | | | | |
gets(s):
|H|e|l|l|o|w|o|r|l|d|0| | | | | | | | | |
gets(s+2):
|H|e|d|o|g|0|o|r|l|d|0| | | | | | | | | |
     ^
     |
     s+2
sort(s+1, s+7):
|H|0|d|e|g|o|o|r|l|d|0| | | | | | | | | |
   ^           ^
   |           |
   s+1         s+7
puts(s+4):
|H|0|d|e|g|o|o|r|l|d|0| | | | | | | | | |
         ^
         |
         s+4
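The paper walkthrough can be replayed with fixed inputs. This sketch is a stand-in, not the original program: `gets` no longer exists in standard C++ (it was deprecated in C++11 and removed in C++14), so `std::strcpy` plays its role, and the function name `replay` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Replays the question's four steps with fixed strings instead of keyboard
// input and returns what puts(s+4) would print (without puts' newline).
std::string replay(const char* first_line, const char* second_line) {
    assert(std::strlen(first_line) < 20);   // must fit in s[20]
    assert(std::strlen(second_line) < 18);  // must fit in s[2]..s[19]
    char s[20] = {};                        // zero-filled; the original was not
    std::strcpy(s, first_line);             // stands in for gets(s)
    std::strcpy(s + 2, second_line);        // stands in for gets(s+2)
    std::sort(s + 1, s + 7);                // sorts s[1]..s[6]; '\0' sorts first
    return std::string(s + 4);              // stops at the next '\0'
}
```

With the inputs from the comments, `replay("Helloworld", "dog")` yields `"goorld"`, matching the last row of the drawing.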
The best thing to say about the code is that it is very bad. Luckily, it is short, but it is vulnerable, unmaintainable and error-prone.
However, since the previous sentence is not really an answer, let's go through the code, assuming the standard headers were included and "using namespace std;" was used:
char s[20];
This declares an array of 20 characters, intended to hold a null-terminated string. If the string somehow becomes longer than 19 characters (plus the terminating '\0'), you're in trouble.
gets(s); //Helloworld
This reads a line from stdin into s. No checks can be done on the size. The comment assumes it will read Helloworld, which fits in s.
gets(s+2);//dog
This reads a second string from stdin, but it overwrites the previous string starting from the third character. So if the comment is right, s will contain the null-terminated string "Hedog".
sort(s+1,s+7);
This sorts the characters in ascending ASCII order, from the second up to and including the seventh character (s[1] through s[6]). With the given input, we already have a problem: the null character is in the sixth position, so it takes part in the sort and ends up second, which makes the null-terminated string just "H".
puts(s+4);
Writes out the string starting at the fifth position (s[4]), up to the null character that originally terminated "Helloworld"; the characters in between were partly overwritten and half-sorted. With the given input it prints "goorld". Of course, the input can be anything, so expect surprises.
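Since the unchecked size is what makes `gets` dangerous, here is a hedged sketch of a bounded read into the same kind of `char` array, using `std::istream::getline(char*, std::streamsize)`, which stores at most `size - 1` characters plus a terminating `'\0'`. The helper name `read_line_bounded` is hypothetical; in new code, a `std::string` with `std::getline` avoids the fixed-size buffer entirely.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one line from `in` into buf, storing at most size-1 characters and
// always a terminating '\0'. Returns false if nothing could be read or the
// line did not fit (istream::getline sets failbit in both cases).
bool read_line_bounded(std::istream& in, char* buf, std::streamsize size) {
    return static_cast<bool>(in.getline(buf, size));
}
```

A line that is too long is truncated and reported as a failure instead of overrunning the buffer.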
gets(s); //Helloworld -- reads a line from the keyboard into s
gets(s+2);//dog -- reads a line from the keyboard into s, starting at s[2]
sort(s+1,s+7); -- sorts the half-open range [1, 7), i.e. s[1] through s[6]
puts(s+4); -- writes s to the console, starting at s[4]
gets(s); //Helloworld --> s=Helloworld
gets(s+2);//dog --> s=Hedog (the bytes after its '\0' still hold orld)
sort(s+1,s+7); --> bytes are H,\0,d,e,g,o,o,r,l,d,\0 so s=H
puts(s+4); --> console=goorld

Memory map of what happens when we use command line arguments? [duplicate]

This question already has answers here: What does int argc, char *argv[] mean?
What I understand is that argc holds the total number of arguments. Suppose my program takes 1 argument apart from the program name. What does argv hold then? Two pointers, e.g. 123 and 130, or "./hello\0" and "5"? If it holds 123, how does it know it has read one argument? Does it know because of the \0?
If all of the above is wrong, can someone help me understand it using a memory map?
The argv array is an array of strings (each entry in the array is of type char*). Each of those strings is, itself, NUL-terminated. The argv array is terminated too: both C and C++ guarantee that argv[argc] is a null pointer. So argc is not strictly needed to find the end of argv, but it gives you the count directly.
How those arrays are constructed in the first place depends on the calling program. Typically, the caller is a shell (such as bash), where arguments are separated by whitespace (with various quoting options that allow arguments to include whitespace). Regardless of how the argc and argv parameters are constructed, the operating system provides routines for executing a program with them as its inputs (e.g. on UNIX, one of the variants of exec, often paired with a call to fork).
To make this a bit more concrete, suppose you ran:
./myprog "arg"
Here is an example of how this might look in memory (using completely fake addresses):
Address | Value | Comment
========================
0058 | 2 | argc
0060 | 02100 | argv (value is the memory address of "argv[0]")
...
02100 | 02116 | argv[0] (value is the memory address of "argv[0][0]")
02104 | 02300 | argv[1] (value is the memory address of "argv[1][0]")
02108 | 0 | argv[2] (a null pointer, since argv[argc] is always NULL)
...
02116 | '.' | argv[0][0]
02117 | '/' | argv[0][1]
02118 | 'm' | argv[0][2]
02119 | 'y' | argv[0][3]
02120 | 'p' | argv[0][4]
02121 | 'r' | argv[0][5]
02122 | 'o' | argv[0][6]
02123 | 'g' | argv[0][7]
02124 | '\0' | argv[0][8]
...
02300 | 'a' | argv[1][0]
02301 | 'r' | argv[1][1]
02302 | 'g' | argv[1][2]
02303 | '\0' | argv[1][3]
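That memory map can be walked from C++ using only what the standard guarantees: each `argv[i]` points to a NUL-terminated string and `argv[argc]` is a null pointer. A sketch with hypothetical helpers `count_args` and `collect_args`; in a real program you would pass them the parameters of `main(int argc, char* argv[])`, while the check below builds a fake `argv` instead of launching a process.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts arguments by walking argv until the terminating null pointer;
// for the real main() the result equals argc.
int count_args(char** argv) {
    int n = 0;
    while (argv[n] != nullptr) ++n;
    return n;
}

// Copies each NUL-terminated argv[i] into a std::string.
std::vector<std::string> collect_args(int argc, char** argv) {
    assert(argv[argc] == nullptr);  // guaranteed for the real main()
    return std::vector<std::string>(argv, argv + argc);
}
```

For `./myprog "arg"`, `count_args` returns 2 and `collect_args` yields `{"./myprog", "arg"}`.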

Explaining a string trimming function

I came across the code below but need some help understanding it. Assume that the string s has spaces on either side.
string trim(string const& s){
auto front = find_if_not(begin(s), end(s), isspace);
auto back = find_if_not(rbegin(s), rend(s), isspace);
return string { front, back.base() };
}
The author stated that back points to the end of the last space whereas the front points to the first non-white space character. So back.base() was called but I don't understand why.
Also what do the curly braces, following string in the return statement, represent?
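The braces are the C++11 brace (uniform) initialisation syntax: string { front, back.base() } constructs a new string from the characters in the iterator range [front, back.base()).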
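The braces deserve a little care, because `std::string` also has a constructor taking `std::initializer_list<char>`. With two iterators that constructor is not viable (iterators do not convert to `char`), so `string { front, back.base() }` calls the iterator-range constructor; with two integer-like values, though, the braces build a list of characters instead. A small sketch with hypothetical function names:

```cpp
#include <cassert>
#include <string>

// Braces with two iterators: the iterator-range constructor copies
// [first, last). Assumes s.size() >= 4.
std::string middle_with_braces(const std::string& s) {
    return std::string{s.begin() + 2, s.end() - 2};
}

// Parentheses with (count, ch): three copies of 'x'.
std::string count_with_parens() { return std::string(3, 'x'); }

// Braces with (3, 'x'): an initializer_list<char> holding '\3' and 'x'.
std::string count_with_braces() { return std::string{3, 'x'}; }
```

So `middle_with_braces("  AB  ")` is `"AB"`, `count_with_parens()` is `"xxx"`, but `count_with_braces()` has only two characters.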
.base() and reverse iterators
The .base() call gets back the underlying iterator (back is a reverse_iterator), so that the new string is constructed from a valid range of two iterators of the same type.
A picture. Normal iterator positions of a string (it is a little more complex than this regarding how rend() works, but conceptually anyway...)
    begin                              end
    v                                  v
  -------------------------------------
  | sp | sp | A | B | C | D | sp | sp |
  -------------------------------------
^                                  ^
rend                               rbegin
Once your two find loops finish, the result of those iterators in this sequence will be positioned at:
              front
              v
  -------------------------------------
  | sp | sp | A | B | C | D | sp | sp |
  -------------------------------------
                          ^
                          back
Were we to take just those iterators and construct a sequence from them (which we can't, as they're not of matching types, but suppose we could), the result would be "copy starting at A, stopping at D", but it would not include D in the resulting data.
Enter the base() member of a reverse iterator. It returns the underlying forward (non-reverse) iterator, which is positioned one element to the right of the element the reverse iterator refers to; i.e.
              front
              v
  -------------------------------------
  | sp | sp | A | B | C | D | sp | sp |
  -------------------------------------
                              ^
                              back.base()
Now when we copy our range { front, back.base() }, we copy starting at A and stopping at the first trailing space (but not including it), thereby including the D we would otherwise have missed.
It's actually a slick little piece of code, btw.
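The iterator relationship in the pictures can be checked directly. A minimal sketch; `is_space_char` and `trailing_cut` are hypothetical helpers, and `is_space_char` casts to `unsigned char` because passing a negative `char` to `std::isspace` is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Safe wrapper around std::isspace for any char value.
bool is_space_char(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Index that back.base() refers to: one past the last non-space character.
// For "  ABCD  " that is 6, the position of the first trailing space.
std::size_t trailing_cut(const std::string& s) {
    auto back = std::find_if_not(s.rbegin(), s.rend(), is_space_char);
    // *back is the last non-space character; back.base() is the forward
    // iterator one position to its right.
    return static_cast<std::size_t>(back.base() - s.begin());
}
```

For `"  ABCD  "`, `*back` is `'D'`, `*back.base()` is the space after it, and `std::prev(back.base())` points back at `'D'`.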
Some additional checking
Added some basic checks to the original code.
In keeping with the spirit of the original code (C++1y/C++14 usage), here are some basic checks for empty and whitespace-only strings:
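#include <algorithm>
#include <iterator>
#include <locale>
#include <string>
using namespace std;

string trim_check(string const& s)
{
    auto is_space = [](char c) { return isspace(c, locale()); };
    auto front = find_if_not(begin(s), end(s), is_space);
    // search backwards only as far as front, so empty and all-space strings give an empty range
    auto back = find_if_not(rbegin(s), make_reverse_iterator(front), is_space);
    return string { front, back.base() };
}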

Will delete[] after strcpy cause memory leak?

char* myChar=new char[20];
const char* myChar2="123"; // string literals are const; plain char* is ill-formed since C++11
strcpy(myChar, myChar2);
...
delete[] myChar;
My question is if strcpy puts a '\0' at the end of "123", then will delete[] myChar only delete the first 3 chars and fail to delete the rest of myChar?
Thank you...
No, delete [] deallocates all the memory allocated by new [] as long as you pass the same address to delete [] that was returned by new [].
It just correctly remembers how much memory was allocated irrespective of what is placed at that memory.
Your delete[] deallocates all 20 chars, not only the 3+1 that you actually used.
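In modern C++ the pairing of `new[]` with `delete[]` is usually left to `std::unique_ptr<char[]>`, which calls `delete[]` on the whole block when it goes out of scope. A sketch, with the hypothetical helper `copy_into_buffer`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Allocates `capacity` chars with new[] and copies the C string src into them.
// The returned unique_ptr<char[]> releases all `capacity` chars with delete[],
// however short the copied string is.
std::unique_ptr<char[]> copy_into_buffer(const char* src, std::size_t capacity) {
    assert(std::strlen(src) < capacity);  // leave room for the '\0'
    std::unique_ptr<char[]> buf(new char[capacity]);
    std::strcpy(buf.get(), src);
    return buf;
}
```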
delete[] doesn't look for a '\0' (or a '\n') when deallocating a character array, and nothing scans the contents when allocating it either: new char[20] allocates exactly the 20 chars you asked for, and the implementation keeps track of the size of that block so that delete[] can release all of it.
Note that myChar2 must not be deleted at all: it points to a string literal, which was never allocated with new, so calling delete[] on it is undefined behavior.
For myChar, this means there is no memory leak in your situation.
This is a fundamental aspect of C++ that needs understanding, and it causes confusion for good reason. Look at this example:
char* myChar1 = new char[20];
char* myChar2 = (char*)malloc(20);
Although both pointers have the same type, you must use different functions to release the objects they point to:
delete [] myChar1;
free(myChar2);
Note that if you do:
char *tmp = myChar1;
myChar1 = myChar2;
myChar2 = tmp;
After that you need:
delete [] myChar2;
free(myChar1);
You need to track the object itself (i.e. how it was allocated), not the place where you keep a pointer to this object. And release the object that you want to release, not the place that stores info about this object.
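One way to make "how it was allocated" part of the type, so that the swap above cannot silently mix up the release functions, is to wrap each pointer in a `std::unique_ptr` with the matching deleter. A sketch; `FreeDeleter`, `NewBuffer`, `MallocBuffer` and the two factory functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <type_traits>

// Releases memory obtained from std::malloc.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};

using NewBuffer = std::unique_ptr<char[]>;                // released with delete[]
using MallocBuffer = std::unique_ptr<char, FreeDeleter>;  // released with free()

// The two owner types differ, so assigning one to the other does not compile.
static_assert(!std::is_same<NewBuffer, MallocBuffer>::value, "distinct owners");

NewBuffer make_new_buffer(std::size_t n) { return NewBuffer(new char[n]); }

MallocBuffer make_malloc_buffer(std::size_t n) {
    return MallocBuffer(static_cast<char*>(std::malloc(n)));  // may be null
}
```

Each object now carries its own release function, which is exactly the "track the object, not the variable" rule.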
char* myChar=new char[20]; // allocate space for 20 chars
          +-----------------+
myChar -> | x | x | ... | x | // x = uninitialized char
          +-----------------+
const char* myChar2="123"; // string literals are const in C++
           +----------------+
myChar2 -> | 1 | 2 | 3 | \0 | // myChar2 points to the string literal
           +----------------+
strcpy(myChar, myChar2); // copy the string into the 20-char block
// strcpy copies char by char up to and including the \0, i.e. 4 chars
// in this case
          +----------------------------------+
myChar -> | 1 | 2 | 3 | \0 | x | x | ... | x |
          +----------------------------------+
// note that the characters after the string 123\0 are
// still uninitialized
delete[] myChar;
// all 20 chars have been freed
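The diagram can be reproduced with a small sketch. Uninitialized memory cannot be inspected portably, so this hypothetical helper pre-fills the block with 'x' to stand in for it, then shows that `strcpy` writes exactly four chars and leaves the rest untouched.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns all 20 bytes of a buffer after copying "123" into it.
// The buffer is pre-filled with 'x' to stand in for "uninitialized".
std::string buffer_after_strcpy() {
    char buf[20];
    std::memset(buf, 'x', sizeof buf);
    std::strcpy(buf, "123");              // writes '1', '2', '3', '\0'
    return std::string(buf, sizeof buf);  // keep every byte, including '\0'
}
```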