C++ string and string literal comparison - c++

I am trying to do a simple std::string == "string-literal" comparison, which would normally work just fine, except that I am creating my string with
std::string str(strCreateFrom, 0, strCreateFrom.find(' '));
and find returns string::npos. Both strings contain "submit", yet == returns false. I have narrowed this down to the sizes being "different" even though they really aren't: str.size() is 7 and strlen("submit") is 6. Is this why == is failing? I assume it is, but I don't see why. Shouldn't it check whether the last char of the difference is \0, as is the case in this situation?
And is there any way to get around this without having to use compare and specify the length to compare, or change my string?
Edit:
std::string instruction(unparsed, 0, unparsed.find(' '));
boost::algorithm::to_lower(instruction);
for(int i = 0; i < instruction.size(); i++){
std::cout << "create from " << (int) unparsed[i] << std::endl;
std::cout << "instruction " << (int) instruction[i] << std::endl;
std::cout << "literal " << (int) "submit"[i] << std::endl;
}
std::cout << (instruction == "submit") << std::endl;
prints
create from 83
instruction 115
literal 115
create from 117
instruction 117
literal 117
create from 98
instruction 98
literal 98
create from 77
instruction 109
literal 109
create from 105
instruction 105
literal 105
create from 116
instruction 116
literal 116
create from 0
instruction 0
literal 0
0
EDIT:
For more clarification as to why I'm confused I read the basic_string.h header and saw this:
/**
* @brief Compare to a C string.
* @param s C string to compare against.
* @return Integer < 0, 0, or > 0.
*
* Returns an integer < 0 if this string is ordered before @a s, 0 if
* their values are equivalent, or > 0 if this string is ordered after
* @a s. Determines the effective length rlen of the strings to
* compare as the smallest of size() and the length of a string
* constructed from @a s. The function then compares the two strings
* by calling traits::compare(data(),s,rlen). If the result of the
* comparison is nonzero returns it, otherwise the shorter one is
* ordered first.
*/
int
compare(const _CharT* __s) const;
This is called from operator==, so I am trying to find out why the size difference matters.

I didn't quite understand your question and more details may be needed, but you can use the C comparison functions, which shouldn't have issues with counting the null terminator.
You could use:
bool same = (0 == strcmp(strLiteral, stdTypeString.c_str()));
strncmp can also be used to compare only a given number of chars in a char array.
Or try to fix the creation of the std::string.
Your unparsed std::string is already bad. It already contains the extra null in the string, so what you should look at is how it is being created.
As I mentioned before, mystring[mystring.size() - 1] is the last character, not the terminating null. So if you see a '\0' there, as you do in your output, it means the null is treated as part of the string.
Try to trace back your parsed input and keep making sure that mystring[mystring.size() - 1] is not '\0'.
To answer your size diff question:
The two strings are not the same: the literal is shorter and doesn't have a null as part of its contents.
Memory of std::string->c_str() [S,u,b,m,i,t,\0,\0] length = 7, memory size = 8;
Memory of literal [S,u,b,m,i,t,\0] length = 6, memory size = 7;
Compare stops comparing when it reaches the terminating null in the literal, but it uses the stored size for the std::string, which is 7. Seeing that the literal terminated at 6 but the std::string has size 7, it reports that the std::string is larger.
I think if you do the following it will report that the strings are the same, because it creates a std::string with an extra null on the right side as well (_countof is MSVC-specific; sizeof("submit") gives the same value, 7):
std::cout << (instruction == std::string("submit", _countof("submit"))) << std::endl;
PS: This is a common error made when taking a char* and making a std::string out of it. Frequently the array size itself is used as the length, but that includes the terminating zero, which std::string will add anyway. I believe something like this is happening to your input somewhere, and if you add a -1 wherever that is, everything will work as expected.
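To make the size mismatch concrete, here is a minimal sketch. Both function names are hypothetical and made up for illustration; the sketch assumes the stray characters are always '\0' bytes at the end of the string, as in the question's output.

```cpp
#include <cassert>
#include <string>

// Hypothetical: builds a string that really stores '\0' as its 7th
// character, mimicking the `instruction` string from the question.
std::string make_padded_submit() {
    // The literal "submit" occupies 7 bytes including its terminator,
    // so asking for 7 characters copies the '\0' into the string.
    return std::string("submit", 7);
}

// Hypothetical: drops any '\0' characters stored at the end of s.
std::string strip_trailing_nuls(std::string s) {
    while (!s.empty() && s.back() == '\0') {
        s.pop_back();
    }
    assert(s.empty() || s.back() != '\0');
    return s;
}
```

With these, make_padded_submit() == "submit" is false because the sizes are 7 and 6, while strip_trailing_nuls(make_padded_submit()) == "submit" is true. Fixing the place where the extra null gets in is still the better cure.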

Related

In what case std::basic_string::find with a count argument greater than the string length can be useful?

One of the signatures of std::basic_string::find method is:
size_type find( const CharT* s, size_type pos, size_type count ) const;
The parameters are the following:
pos    - position at which to start the search
count - length of substring to search for
s         - pointer to a character string to search for
The description of the behavior of the method for this overload is:
Finds the first substring equal to the range [s, s+count). This range may contain null characters.
I would like to know in what case it can be useful to have a range that contains null characters. For instance:
s.find("A", 0, 2);
Here, s corresponds to a string with a length of 1. Because count is 2, the range [s, s+count) contains a null character. What is the point?
There is a false premise that you didn't spell out, but combining the title and the question it is:
The null character indicates the end of a std::string.
This is wrong. std::strings can contain null characters at any position. One has to be cautious with functions that expect a null-terminated c-string, but find is so nice that it explicitly reminds you that it also works in the general case.
C-Strings are null terminated, hence this:
std::string x("My\0str0ing\0with\0null\0characters");
std::cout << x.size() << '\n';
Prints: 2, i.e. only the characters before the first \0 are used to construct the std::string.
However, this
std::string s("Hello world");
s[5] = '\0';
std::cout << s << '\n';
Prints Helloworld (because \0 is not printable). Also, char arrays can contain \0 at any position. Usually this is interpreted as the terminating character of the string. However, as std::strings can contain null characters at any position, it is only consistent to also provide an overload that takes a pointer to a character array that can contain null characters in the middle. An example of the usage of that overload (s is the string from above):
std::string f;
f.push_back('\0');
f.push_back('w');
std::cout << s.find(f.c_str()) << '\n';
std::cout << s.find("") << '\n';
std::cout << s.find(f.c_str(),0,2) << '\n';
Output:
0
0
5
The overload without the count parameter assumes a null-terminated C-string, hence s.find(f.c_str()) is the same as s.find(""). Only with the overload that has the count parameter is the substring \0w found at index 5.
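The same idea applies to construction: passing the length explicitly keeps embedded nulls that the plain const char* constructor would cut off. A small sketch, with hypothetical function names chosen only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: builds a 9-character string "My", '\0', "string".
// The explicit count stops the constructor from treating the '\0'
// as the end of the input.
std::string with_embedded_null() {
    std::string s("My\0string", 9);
    assert(s.size() == 9);
    return s;
}

// Hypothetical: searches for the two-character sequence '\0','s'
// using the find overload that takes a count.
std::size_t find_null_then_s(const std::string& s) {
    return s.find("\0s", 0, 2);
}
```

Here find_null_then_s(with_embedded_null()) yields 2, the position of the embedded null, whereas std::string("My\0string") without a count has size 2.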

error : Vector subscript out of range error

I have this code in C++ that uses vectors, but I get this error:
error: Vector subscript out of range error.
Can someone help me with this issue?
int const TN = 4;
vector <uint32_t> totalBytesReceived(TN);
void ReceivePacket(string context, Ptr <const Packet> p)
{
totalBytesReceived[context.at(10)] += p->GetSize();
}
void CalculateThroughput()
{
double mbs[TN];
for (int f = 0; f<TN; f++)
{
// mbs = ((totalBytesReceived*8.0)/100000);
mbs[f] = ((totalBytesReceived[f] * 8.0) / 100000);
//totalBytesReceived =0;
rdTrace << Simulator::Now().GetSeconds() << "\t" << mbs[f] << "\n";
Simulator::Schedule(Seconds(0.1), &CalculateThroughput);
}
}
It seems like
totalBytesReceived[context.at(10)] += p->GetSize();
throws the exception because the value of the char at position 10 of context is out of range for the vector. Since you use it to index the vector, it has to be in the range 0 to 3.
Looking at the content of context you posted:
"/NodeList/" 1 "/DeviceList/*/$ns3::WifiNetDevice/Mac/MacRx"
^ ^ ^
0 10 12
If you want to extract the 1 and use it as an index, you need to use:
char c = context.at(12); // Extract the char.
int index = c - '0'; // Convert the character '1' to the integer 1.
This is because of the ASCII standard which determines how characters are stored as numbers.
Probably the real issue is that you get the character '1' and use its ASCII value as index to the vector instead of the intended integer value 1.
This out of bounds access is then undefined behaviour, which in your case leads to an exception.
The following is not the cause, leaving it for reference:
The exception is probably coming from this expression:
context.at(10)
This is the only operation (*) involved that is actually performing bounds checking. The vector operator[] isn't doing that, and neither does a C array check its bounds.
So: Are you sure the string context is never shorter than 11 characters?
(*) Accessing a vector out of bounds is undefined behaviour, and throwing an exception is within the possible outcomes of that. Thanks to Beta Carotin and Benjamin Lindley for that.
This is the real thing:
Also note that a vector isn't resized like map when accessing an out of bounds index using operator[], so unless you can guarantee that the characters in the string are between 0 and 3 inclusive this will be your next issue.
And this means (size_t)0 and (size_t)3, not the characters '0' and '3'.
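The conversion and the range check can be combined in one place. This is only a sketch: node_index is a hypothetical helper, not part of ns-3, and pos is whatever position holds the digit in your actual string (10 if the context contains no literal quote characters, 12 in the layout drawn above). It also assumes a single-digit node number.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the decimal digit at `pos` in `context` and
// returns it as an index, throwing if it cannot index a table of
// `table_size` elements.
std::size_t node_index(const std::string& context, std::size_t pos,
                       std::size_t table_size) {
    assert(table_size > 0);
    const char c = context.at(pos);  // throws std::out_of_range if pos is too large
    if (c < '0' || c > '9') {
        throw std::invalid_argument("expected a decimal digit");
    }
    // '0'..'9' are guaranteed to be consecutive, so this yields 0..9.
    const std::size_t index = static_cast<std::size_t>(c - '0');
    if (index >= table_size) {
        throw std::out_of_range("node index exceeds table size");
    }
    return index;
}
```

In ReceivePacket this would be used as totalBytesReceived[node_index(context, 10, totalBytesReceived.size())] += p->GetSize(); so a bad context string produces a clear exception instead of undefined behaviour.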

c++ dynamic allocation initial values

I'm trying to concatenate two strings into a new one (finalString) like this:
finalString = string1 + '&' + string2
First, I allocate the memory for finalString, then I use strcat().
finalString = new char[strlen(string1 ) + strlen(string2) + 2];
cout << finalString << endl;
finalString = strcat(finalString , string1 );
finalString = strcat(finalString , "&");
finalString = strcat(finalString , string2);
cout << finalString << endl;
I'll suppose that string1 is "Mixt" and string2 is "Supermarket".
The output looks like this:
═════════════════řřřř //(which has 21 characters)
═════════════════řřřřMixt&Supermarket
I know that if I use round brackets in "new char" the string will be initialized to 0 and I'll get the desired result, but my question is: why does the first output have 21 characters, given that I allocated only 17? And even so, why does the final string length exceed the initial allocation size (21 > 17)?
Thanks in advance!
Two words for you "buffer overrun"
The reason you have 21 characters initially is that the first '\0' (also called null) character happens to be the 22nd byte counting from the memory address that finalString points to. This may or may not be consistent, depending on what is in your memory.
As for why the result is longer than what you wanted: again, you wrote outside the initial buffer into random memory. You did not crash because you did not write over something important.
strcat takes the destination address given, finds the first '\0' there, and from that place on copies the data from the second pointer you provide, up to and including the first '\0' it finds there.
What you are doing is VERY DANGEROUS. If you do not hit a '\0' before you hit something vital, you will cause a crash or at minimum bad behavior.
Understand that in C/C++ a char* is just a pointer to the memory location of the first element. THERE ARE NO SAFEGUARDS! You alone must be careful with that.
If you set finalString[0] = 0 before the first strcat, the logic will work as intended.
As a different answer, why not use std::string:
std::string a, b, c;
a = "part1";
b = "part2";
c = a + " & " + b;
std::cout << c << '\n';
part1 & part2
Live example: http://ideone.com/pjqz9T
It will make your life easier! You should prefer the standard library types in C++ whenever you can.
If you really do need a char * then at the end you can do c.c_str().
Your string is not initialized, which leads to undefined behavior. strcat appends starting at the first null character it finds in the destination.
So, as others already mentioned, either you can do
finalString[0] = 0;
or in place of your first strcat use strcpy. This will copy the first string and put a null character at the end.
why 21 characters?
This is due to undefined behavior. It will keep on printing until it finds a null, or else it will crash as soon as it tries to access illegal memory.
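Putting the strcpy suggestion together, a minimal sketch of the raw-buffer version might look like this. join_with_ampersand is a hypothetical name, and the sketch assumes both inputs are valid null-terminated strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a new[]-allocated "a&b".
// The caller owns the result and must delete[] it.
char* join_with_ampersand(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    // +2 leaves room for the '&' and the terminating '\0'.
    const std::size_t len = std::strlen(a) + std::strlen(b) + 2;
    char* out = new char[len];
    std::strcpy(out, a);    // copies a and writes a '\0', so the buffer is now a valid string
    std::strcat(out, "&");  // safe: the destination is null-terminated
    std::strcat(out, b);
    return out;
}
```

For "Mixt" and "Supermarket" this produces "Mixt&Supermarket", which is 16 characters and fits the 17-byte allocation. The std::string version above remains the simpler choice unless a raw buffer is really required.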

array in C++ inside forloop

What is happening when I write array[i] = '\0' inside a for loop?
char arrayPin[256];
for(int i = 0; i<256; i++)
{
arrayPin[i] = '\0';
}
The program attempts to access memory at the location of <base address of 'array'> + (<sizeof array element> * 'i') and assign the value 0 to it (binary 0, not character '0'). This operation may or may not succeed, and may even crash the application, depending upon the state of 'array' and 'i'.
If your array is of type char* or char[] and the assignment operation succeeds, then inserting the binary 0 at position 'i' will truncate the string at that position when it is used with things that understand C-style strings (printf() being one example).
So if you do this in a for loop across the entire length of the string, you will wipe out any existing data in the string and cause it to be interpreted as an empty/zero-length string by things that process C-style strings.
char arrayPin[256];
After the line above, arrayPin is an uninitialized array whose contents are unknown (assuming it is not a global).
----------------------------
|?|?|?|?|?|?|?|?|?|?|...|? |
----------------------------
byte: 0 1 2 3 4 5 6 7 8 9 255
Following code:
for(int i = 0; i<256; i++)
{
arrayPin[i] = '\0';
}
initializes every arrayPin element to 0:
----------------------------
|0|0|0|0|0|0|0|0|0|0|...|0 |
----------------------------
byte: 0 1 2 3 4 5 6 7 8 9 255
I suppose you have something like char *array. In this case it will write the character with code 0x00 into the i-th position.
This is quite useful when you work with ANSI strings. \0 indicates the end of the string. For example:
char str[] = "Hello world";
cout << str << endl; // Output "Hello world"
str[5] = '\0';
cout << str << endl; // Output just "Hello"
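For reference, the zeroing loop from the question can also be written without an explicit loop. This is a sketch of two common alternatives, not something the question requires; all_zero and demo_zeroing are hypothetical names used only to check the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: true if the first n bytes of buf are all '\0'.
bool all_zero(const char* buf, std::size_t n) {
    assert(buf != nullptr);
    return std::all_of(buf, buf + n, [](char c) { return c == '\0'; });
}

// Hypothetical demo: zeroes two arrays in two different ways.
bool demo_zeroing() {
    char a[256] = {};  // empty initializer: every element starts as '\0'
    char b[256];       // uninitialized, like arrayPin in the question
    std::fill(std::begin(b), std::end(b), '\0');  // same effect as the for loop
    return all_zero(a, sizeof a) && all_zero(b, sizeof b);
}
```

Either form leaves the array reading as an empty C-style string, exactly like the loop.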

Loop efficiency - C++

Beginners question, on loop efficiency. I've started programming in C++ (my first language) and have been using 'Principles and Practice Using C++' by Bjarne Stroustrup. I've been making my way through the earlier chapters and have just been introduced to the concept of loops.
The first exercise regarding loops asks of me the following:
The character 'b' is char('a'+1), 'c' is char('a'+2), etc. Use a loop to write out
a table of characters with their corresponding integer values:
a 97, b 98, ..., z 122
Although, I used uppercase, I created the following:
int number = 64; //integer value for @ sign, character before A
char letter = number;//converts integer to char value
int i = 0;
while (i<=25){
cout << ++letter << "\t" << ++number << endl;
++i;
}
Should I aim to have only 'i' present in the loop, or is that simply not possible when converting between types? I can't really think of any other way to do the above, apart from converting the character value to its integer counterpart (i.e. the opposite of the current method), or not having the conversion at all and having letter store '@'.
Following on from jk you could even use the letter itself in the loop (letter <= 'z'). I'd also use a for loop but that's just me.
for( char letter = 'a'; letter <= 'z'; ++letter )
std::cout << letter << "\t" << static_cast<int>( letter ) << std::endl;
You should aim for clarity first, whereas you are trying to micro-optimize instead. You could rewrite that more clearly as a for loop:
const int offsetToA = 65;
const int numberOfCharacters = 26;
for( int i = 0; i < numberOfCharacters; ++i ) {
const int characterValue = i + offsetToA;
cout << static_cast<char>( characterValue ) << '\t' << characterValue << endl;
}
and you can convert between different types - that's called casting (the static_cast construct in the code above).
That's not a bad way to do it, but you can do it with only one loop variable like this:
char letter = 65;
while(letter <= 65+25){
printf("%c\t%d\n", letter, letter);
++letter;
}
There is nothing particularly inefficient about the way you are doing it, but it certainly is possible to just convert between chars and ints (a char is an integer type). This would mean you only need to store 1 counter rather than the 3 (i, letter and number) you currently have.
Also, for looping from a fixed start to a fixed end, a 'for' loop is perhaps more idiomatic (though it's possible you haven't met this yet!).
If you are concerned about the efficiency of your loop, I would urge you to try this:
Get this code compiled and running under an IDE, such as Visual Studio, and set a break point at the beginning. When you get there, switch to the disassembly view (instruction view) and start hitting the F11 (single-step) key, and keep a mental count of how many times you are hitting it.
You will see that it enters the loop, compares i against 25, and then starts doing the code for the cout line. That involves incrementing letter, and then going into the << routine for cout. It does a number of things in there, possibly going deeper into subroutines, etc., and finally comes back out, returning an object. Then it pushes "\t" as an argument and passes it to that object, and goes back in and does all the stuff it did before. Then it takes number, increments it, and passes it to the cout::<< routine that accepts an integer, calls a function to convert it to a string (which involves a loop), then does all the stuff it did before to loop that string into the output buffer and return.
Tired? You're not done yet. The endl has to be output, and when that happens, not only does it put "\n" in the buffer, but it calls the system routine to flush that buffer to the file or console where you are sending the I/O. You probably can't F11 into that, but rest assured it takes lots of cycles and doesn't return until the I/O is done.
By now, your F11-count should be in the vicinity of several thousand, more or less.
Finally, you come out and get to the ++i statement, which takes 1 or 2 instructions, and jumps back to the top of the loop to start the next iteration.
NOW, are you still worried about the efficiency of the loop?
There's an easier way to make this point, and it's just as instructive. Wrap an infinite loop around your entire code so it runs forever. While it's running, hit the "pause" button in the IDE, and look at the call stack. (This is called a "stackshot".) If you do this several times you get a good idea of how it spends time. Here's an example:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
MSVCRTD! 1021bed3()
MSVCRTD! 1021bd59()
MSVCRTD! 10218833()
MSVCRTD! 1023a500()
std::_Fputc() line 42 + 18 bytes
std::basic_filebuf<char,std::char_traits<char> >::overflow() line 108 + 25 bytes
std::basic_streambuf<char,std::char_traits<char> >::sputc() line 85 + 94 bytes
std::ostreambuf_iterator<char,std::char_traits<char> >::operator=() line 304 + 24 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::_Putc() line 633 + 32 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::_Iput() line 615 + 25 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::do_put() line 481 + 71 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::put() line 444 + 44 bytes
std::basic_ostream<char,std::char_traits<char> >::operator<<() line 115 + 114 bytes
main() line 43 + 96 bytes
mainCRTStartup() line 338 + 17 bytes
I did this a bunch of times, and not ONCE did it stop in the code for the outer i<=25 loop. So optimizing that loop is like someone's great metaphor: "getting a haircut to lose weight".
Since no one else mentioned it: Having a fixed amount of iterations, this is also a candidate for post-condition iteration with do..while.
char letter = 'a';
do {
std::cout << letter << "\t" << static_cast<int>( letter ) << std::endl;
} while ( ++letter <= 'z' );
However, as shown in Patrick's answer the for idiom is often shorter (in number of lines in this case).
You can promote char to int...
//characters and their corresponding integer values
#include"../../std_lib_facilities.h"
int main()
{
char a = 'a';
while(a<='z'){
cout<<a<<'\t'<<a*1<<'\n'; //a*1 => char operand promoted to integer!
++a;
}
cout<<endl;
}
Incrementing three separate variables is probably a little confusing. Here's a possibility:
for (int i = 0; i != 26; ++i)
{
int chr = 'a' + i;
std::cout << static_cast<char>(chr) << ":\t" << chr << std::endl;
}
Note that using a for loop keeps all the logic of setting up, testing and incrementing the loop variable in one place.
At this point, I wouldn't worry about micro-optimizations such as an efficient way to write a small loop like this. What you have allows a for loop to do the job nicely, but if you are more comfortable with while, you should use that. But I am not sure if that is your question.
I don't think you have understood the question properly. You are writing the code, knowing that 'A' is 65. The whole point of the exercise is to print the value of 'A' to 'Z' on your system, without knowing what value they have.
Now, to get an integer value for a character c, you can do: static_cast<int>(c). I believe that is what you're asking.
I haven't written any code because it should be more fun for you to do so.
Question for the experts: In C, I know that 'a'...'z' need not have contiguous values (same for 'A'...'Z'). Is the same true for C++? I would think so, but then it seems highly unlikely that Stroustrup's book assumes that.
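On the contiguity question: the C++ standard guarantees consecutive values only for the digits '0' to '9', not for letters (EBCDIC is the usual counterexample). If that matters, one way to avoid the assumption is to walk an explicit alphabet string. A minimal sketch, where letter_table is a hypothetical name:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical: builds the "letter<TAB>value" table by iterating over an
// explicit alphabet, so it never relies on 'a' + 1 being 'b'.
std::string letter_table() {
    const std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
    assert(alphabet.size() == 26);
    std::ostringstream out;
    for (char c : alphabet) {
        // static_cast<int> prints the character's value on this system.
        out << c << '\t' << static_cast<int>(c) << '\n';
    }
    return out.str();
}
```

Printing the result with std::cout << letter_table(); gives one line per letter. On an ASCII system the values happen to be 97 through 122, but the code does not depend on that.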
Thanks for the help. All I wrote down was
int main()
{
char letter = 96;
int number = letter;
int i = 0;
while(i <26)
{
cout <<++letter <<":" <<++number <<" ";
++i;
}
}
It works great and is pretty simple to understand now.
I've tried this and it worked fine:
char a = 'a';
int i = a; //represent char a as an int
while (a <= 'z') {
cout << a << '\t' << i << '\n';
++a;
++i;
}
Programming Principles and Practice using C++ (2nd Edition) | Bjarne Stroustrup
Chapter 4 - Computation (Try this #3 - Character Loop)
The character 'b' is char('a'+1), 'c' is char('a'+2), etc. Use
a loop to write out a table of characters with their corresponding integer values:
a 97 b 98 . . . z 122
This is how I solved the problem (from 10 years ago :D)
I am a freshman, by the way, so I have only just started reading this book and wanted to share my solution.
#include <iostream>
using namespace std;
int main()
{
int i = 0;
while (i < 26) {
cout << char('a' + i) << '\t' << int(97 + i) << '\n';
++i;
}
}
I solved it by first analyzing the problem, which comes down to knowing that the char values run from 97 for 'a' up to 122 for 'z', according to this ASCII table:
https://www.ascii-code.com/#:~:text=ASCII%20printable%20characters%20%28character%20code%2032-127%29%20Codes%2032-127,digits%2C%20punctuation%20marks%2C%20and%20a%20few%20miscellaneous%20symbols.
Now, we have a clearer understanding on how to solve the said problem.