I recently migrated from C to C++, and I'm a little confused about strings. Strings just aren't what they used to be any more, as in, not just char arrays with a terminating '\0'.
I haven't found a real answer to this question, so how far can you treat the std::string class like C-Strings?
For example: If I know there's a number somewhere in a string, let the string be ireallylike314, in C I could use strtol(string + 10, NULL, 10) to just get that number.
And, if this doesn't work, is there a way to use std::string like C-strings?
Use c_str() (note that the digits start at offset 11, since "ireallylike" is 11 characters long):
strtol(string.c_str() + 11, NULL, 10);
If you want to get a C-style string from a std::string, then, as mentioned, use the c_str() method. But another solution to this specific problem would be to use std::stol instead of strtol.
While stol doesn't (in itself) support what you want, I think I'd use it in conjunction with substr to get the required result:
std::string in = "ireallylike314";
// extract number and print it out multiplied by 2 to show we got a number
std::cout << 2 * stol(in.substr(11));
Result:
628
This has both good and bad points though. On the bad side, it creates a whole new string object to hold the digits out of the input string. On the good side, it gives a little more control over the number of digits to convert, so if (for example) you only wanted to convert the first two digits from the string (even if, as in this case, they're followed by more digits) you can do that pretty easily too:
std::cout << 2 * stol(in.substr(11, 2));
Result:
62
In quite a few cases, the degree to which this is likely to be practical for you will depend heavily upon whether your implementation includes the short string optimization. If it does, creating a (small) string is often cheap enough to make this perfectly reasonable. If it doesn't, the heap allocation to create the temporary string object as the return value from substr may be a higher price than you want to pay.
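If the cost of the temporary string is a concern, one alternative (not the approach this answer uses) is C++17's std::from_chars, which parses directly from a character range, so no temporary std::string is created. A minimal sketch, assuming a C++17 compiler; ParseDigits is a hypothetical helper name:

```cpp
#include <algorithm>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: parses a base-10 long from at most `count` characters
// starting at `offset`, reading straight from the string's own buffer
// (std::from_chars requires C++17).
std::optional<long> ParseDigits(const std::string& s, std::size_t offset,
                                std::size_t count = std::string::npos)
{
    if (offset > s.size()) return std::nullopt;
    std::size_t n = std::min(count, s.size() - offset);  // clamp to the end
    const char* first = s.data() + offset;
    long value = 0;
    std::from_chars_result r = std::from_chars(first, first + n, value);
    if (r.ec != std::errc()) return std::nullopt;  // no digits, or overflow
    return value;
}
```

ParseDigits(in, 11) and ParseDigits(in, 11, 2) mirror the two substr calls above (314 and 31). Unlike strtol, from_chars does not skip leading whitespace or accept a leading '+'.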
The C-like way:
long n = std::strtol( string.c_str() + offset, nullptr, 10 );
// on overflow, sets errno to ERANGE and returns LONG_MAX or LONG_MIN (saturating);
// returns 0 if no digits were found.
The Java-ish way:
long n = std::stol( string.substr( offset, std::string::npos ) );
// throws std::invalid_argument if nothing can be converted, std::out_of_range on overflow
// (errno may also be set, since stol calls strtol).
The streams way:
long n = 0;
std::istringstream( string ).ignore( offset ) >> n;
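// on error the stream's failbit is set; since C++11, n is then set to 0 (or to the
// max/min value on overflow), whereas before C++11 it was left unmodified.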
The locales way:
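long n = 0;
std::istringstream fmt; // any std::ios_base object: default flags (base 10) and the global locale
std::ios::iostate err = std::ios::goodbit;
// The facet's destructor is protected, so derive from it to create one directly.
struct num_parser : std::num_get< char, std::string::const_iterator > {};
num_parser parser;
parser.get( string.cbegin() + offset, string.cend(), fmt, err, n );
// err has std::ios::failbit set on error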
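None of these one-liners checks for failure on its own. As a sketch of what checking looks like for the C-like way, assuming C++17 for std::optional (StrtolAt is a hypothetical name): strtol's end pointer tells you whether any digits were read, and errno tells you about overflow.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: parse a base-10 long starting at `offset` in `s`.
// strtol reports "no digits" through its end pointer and overflow through
// errno == ERANGE, so both are checked.
std::optional<long> StrtolAt(const std::string& s, std::size_t offset)
{
    if (offset > s.size()) return std::nullopt;
    const char* start = s.c_str() + offset;  // still null-terminated
    char* end = nullptr;
    errno = 0;                                // strtol only sets errno on error
    long value = std::strtol(start, &end, 10);
    if (end == start) return std::nullopt;    // nothing was converted
    if (errno == ERANGE) return std::nullopt; // value did not fit in a long
    return value;
}
```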
This is maybe beyond the scope of the question, but since you are migrating to C++ and seem confused about std::string, you'll likely find the following useful.
The point of having std::string is not to use it like C strings (of course you can, as the previous answers showed). You can take much more advantage of std::string's capabilities. For example, it is a C++ container, and there are member functions to get substrings, to compare strings, etc.
String manipulation is generally a lot easier with std::string than with C strings.
See for example http://www.cplusplus.com/reference/string/string/ for its capabilities.
Strings just aren't what they used to be any more, as in, not just
char arrays with a terminating '\0'.
You are wrong. In C++ strings are defined the same way. In both languages strings are defined like
A string is a contiguous sequence of characters terminated by and
including the first null character.
You are confusing strings with the class std::string (or std::basic_string); they are not the same thing.
For example: If I know there's a number somewhere in a string, let the
string be ireallylike314, in C I could use strtol(string[10], NULL,
10) to just get that number
You are mistaken. The valid function call will look like
strtol( &string[11], NULL, 10)
or
strtol( string + 11, NULL, 10)
You can call the same function for an object of class std::string by using the member function c_str() or (starting with C++11) data()
For example
std::string s( "ireallylike314" );
auto x = std::strtol( s.c_str() + 11, NULL, 10 );
or
auto x = std::strtol( s.data() + 11, NULL, 10 );
This is an old problem that I have run into in the past, so I thought I'd get a clarification once and for all. There are many standard/orthodox C library functions that deal only with C-style strings. For example, my current implementation looks like this:
std::string TimeStamp (const time_t seconds) // version-1
{
auto tm = *std::localtime(&seconds); // <ctime>
char readable[30] = {};
std::strftime(&readable[0], sizeof(readable) - 1, "%Y-%h-%d %H:%M:%S:", &tm);
return readable;
}
The above works as expected. But as you can see, readable is copied from a stack array into a std::string, and this function is called very frequently for logging and other purposes.
Hence, I converted it to following:
std::string TimeStamp (const time_t seconds) // version-2
{
auto tm = *std::localtime(&seconds); // <ctime>
std::string readable(30,0);
std::strftime(&readable[0], readable.length(), "%Y-%h-%d %H:%M:%S:", &tm);
return readable;
}
At the unit-test level it seems to work. But in the overall logging of my much larger code, something gets messed up: a newline character appears after this output, and many of the output strings written outside this function are not printed. The issue happens only when "version-1" is changed to "version-2".
Even the following modification doesn't help:
readable.resize(1 + std::strftime(&readable[0], readable.length(), "%Y-%h-%d %H:%M:%S:", &tm));
Is there anything wrong in my code? What is the correct way of directly using std::string in the C-style string functions?
Your first function is correct. There is no point mucking around with the troublesome details in the second function because even once you get it right, it is no improvement on the first function.
In fact it might even perform worse, because of the need to over-allocate the string and resize it down. For example, the size 30 might exceed the Small String Optimization limit while the actual length of the data doesn't.
std::string can have \0 in it.
so
std::string s1 = "ab\0\0cd"; // s1 contains "ab" -> size = 2
std::string s2{"ab\0\0cd", 6}; // s2 contains "ab\0\0cd" -> size = 6
Your first snippet uses the first kind of construction (from a null-terminated char array, which stops at the first \0), whereas the second is similar to s2 (a string of size 30 filled with \0).
So you have to resize your string correctly to avoid trailing \0 characters.
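A minimal sketch of version-2 with the trailing '\0' characters removed, assuming a 64-character buffer is always enough (FormatTime is a hypothetical helper; TimeStamp keeps the question's format string). strftime returns the number of characters written, not counting the terminator, so resizing to exactly that value leaves no '\0' inside the std::string. As the other answer notes, this is not necessarily faster than version-1.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: format `tm` into a std::string via strftime, then
// shrink the string to the count strftime reports (excluding its '\0').
std::string FormatTime(const std::tm& tm, const char* fmt)
{
    std::string readable(64, '\0');  // 64 is an assumed upper bound
    std::size_t n = std::strftime(&readable[0], readable.size(), fmt, &tm);
    readable.resize(n);  // strftime returns 0 if the result plus '\0' didn't fit
    return readable;
}

std::string TimeStamp(std::time_t seconds)
{
    std::tm tm = *std::localtime(&seconds);
    return FormatTime(tm, "%Y-%h-%d %H:%M:%S:");
}
```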
Possible Duplicate:
How to convert a single char into an int
Well, I'm writing a basic program which handles some input like:
2+2
So, I need to add 2 + 2.
I did something like:
string mys = "2+2";
fir = mys[0];
sec = mys[2];
But now I want to add "fir" to "sec", so I need to convert them to Int.
I tried "int(fir)" but it didn't work.
There are multiple ways of converting a string to an int.
Solution 1: Using legacy C functionality
#include <cstdio>
#include <cstdlib>
int main()
{
//char hello[5];
//hello = "12345"; ---> This won't compile: arrays can't be assigned
char hello[] = "12345";
std::printf("My number is: %d\n", std::atoi(hello));
return 0;
}
Solution 2: Using boost::lexical_cast (most appropriate and simplest)
#include <boost/lexical_cast.hpp>
int x = boost::lexical_cast<int>("12345");
Solution 3: Using C++ Streams
std::string hello("123");
std::stringstream str(hello);
int x;
str >> x;
if (!str)
{
// The conversion failed.
}
Alright, first a little background on why what you attempted didn't work. int(fir) is a functional-style cast, which means exactly the same as the C-style cast (int)fir. If fir were a std::string, the cast would not compile at all, because there is no conversion from std::string to int. In your code, though, fir = mys[0] gives you a char, and casting a char to int yields its character code, not the digit it depicts. That code depends on the character encoding you're using (ASCII, UTF-8, etc.): for instance, if fir contains '2', you get 0x32 (50) with ASCII, not 2. More generally, a C-style cast in C++ tries the available casts (const_cast, static_cast, reinterpret_cast and combinations of them) and uses the first one that works, which can silently do something you didn't intend. You should really avoid C-style casts; about the only place they're safe is conversions between numeric types.
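The C and C++ standards guarantee that the character codes for '0' through '9' are contiguous and increasing, so for a single digit char you can subtract '0' instead of casting. A small sketch; DigitValue is a hypothetical name:

```cpp
#include <stdexcept>

// Hypothetical helper: convert a single digit character such as '2' to the
// integer 2. Works in any conforming encoding because '0'..'9' are contiguous.
int DigitValue(char c)
{
    if (c < '0' || c > '9')
        throw std::invalid_argument("not a decimal digit");
    return c - '0';
}
```

With the question's input, DigitValue(mys[0]) + DigitValue(mys[2]) gives 4.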
If you're given a string like the one in your example, first you should separate the string into the relevant sequences of characters (tokens) using a function like strtok (which needs a modifiable char buffer, not the pointer returned by c_str()). In this simple example that would be "2", "+" and "2". Once you've done that, you can simply call a function such as atoi on the strings you want converted to integers.
Example:
string str = "2";
int i = atoi(str.c_str()); //value of 2
However, this will get slightly more complicated if you want to be able to handle non-integer numbers as well. In that case, your best bet is to split on the operator (+ - / * etc.), and then search the numeric strings for a decimal point. If you find one, you can treat the number as a double and use atof instead of atoi; if you don't, just stick with atoi.
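A sketch of the split-on-the-operator idea, with one swap: it uses std::stod (C++11) for both operands instead of choosing between atoi and atof, since stod also accepts integers. EvaluateSimple is a hypothetical name, and it assumes exactly one binary operator and no exponent notation (the '-' in 1e-5 would be mistaken for the operator).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: evaluates "<number><op><number>" such as "2+2".
// The search starts at position 1 so a leading minus sign stays with the
// first operand.
double EvaluateSimple(const std::string& expr)
{
    std::size_t pos = expr.find_first_of("+-*/", 1);
    if (pos == std::string::npos)
        throw std::invalid_argument("no operator found");
    double lhs = std::stod(expr.substr(0, pos));   // throws on garbage
    double rhs = std::stod(expr.substr(pos + 1));
    switch (expr[pos]) {
        case '+': return lhs + rhs;
        case '-': return lhs - rhs;
        case '*': return lhs * rhs;
        default:  return lhs / rhs;                // '/'
    }
}
```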
Have you tried atoi or boost lexical cast?
What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only and not in STL string.
Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 103};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or using memcpy (or memmove, or bcopy, but in this case there is no benefit in doing that).
memcpy(s, "string", 6);
or strncpy
strncpy(s, "string", 6);
What should be understood is that there is no such thing as a string type in C (in C++ there are string objects, but that's another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but just a kind of numeric type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers and such, and byte implies 8 bits.
As a char will very often be used to store a character code, the C designers provided a simpler way than storing a number in a char: you can put a letter between single quotes, and the compiler understands that it must store that character's code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. There is a name for this kind of string representation, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name, it usually means that its content is a zero-terminated string.
"C sz strings" are not a type at all, just an array of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many, many others) understand and use the 0-ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't call any of these functions, as they will likely do something wrong (or you must be extra careful and use functions with an n in their name, like strncpy).
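To illustrate the point about length-aware functions: with an array that has no '\0', you have to carry the length yourself. A sketch in C++ (std::string_view needs C++17); kWord, EqualsWord and WordView are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// A char array holding "string" with no terminating '\0'.
const char kWord[6] = {'s', 't', 'r', 'i', 'n', 'g'};

// Length-aware comparison: std::strcmp would read past the end of kWord
// looking for a '\0', while std::memcmp reads only the bytes we specify.
bool EqualsWord(const char* other, std::size_t other_len)
{
    return other_len == sizeof(kWord) &&
           std::memcmp(kWord, other, sizeof(kWord)) == 0;
}

// std::string_view stores the length next to the pointer, so it never
// needs a terminator either.
std::string_view WordView()
{
    return std::string_view(kWord, sizeof(kWord));
}
```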
The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a zero-terminated string. If you had kept the size, you could just jump to the end of the string; with the sz convention, you have to scan it char by char. Other kinds of problems occur when dealing with encoded Unicode and such. But at the time C was created, this convention was very simple and did the job perfectly.
Nowadays, the letters between double quotes, like "string", are not plain char arrays as in the past: in C++ a string literal has type const char[7] (an array of const char, which decays to const char *). That means what the pointer points to is a constant that should not be modified (if you want to modify it, you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time. (In C, the literal's type is still char[7], but modifying it is undefined behavior.)
The terminating null is there to terminate the string. Without it, you need some other method to determine its length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You can use std::string (in C++), although I know you're not interested in std::string.
Preferably you would use some pre-existing technology that handles unicode, or at least understands string encoding (i.e., wchar.h).
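As a sketch of consuming the Pascal-style layout shown above, where the first byte holds the length and there is no terminator (FromPascal is a hypothetical helper):

```cpp
#include <string>

// Pascal-style string: the first byte is the length; no '\0' at the end.
const unsigned char kPascal[7] = {6, 's', 't', 'r', 'i', 'n', 'g'};

// Hypothetical helper: build a std::string from a length-prefixed buffer.
// A one-byte prefix limits such strings to 255 characters with 8-bit chars.
std::string FromPascal(const unsigned char* p)
{
    return std::string(reinterpret_cast<const char*>(p + 1), p[0]);
}
```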
And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
char characters[10];
} ThisIsNotACString;
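C++ std::strings are not defined by NUL termination: they store their length separately and may contain embedded NULs. (Since C++11, though, c_str() and data() do return a pointer to a NUL-terminated buffer.)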
P.S.: NULL is a macro [1]. NUL is \0. Don't mix them up.
[1]: C.2.2.3 Macro NULL
The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>,
<ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International
Standard (18.1).
In C++ you can use the string class and not deal with the null char at all.
Just for the sake of completeness, and to nail this down completely:
vector<char>
Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).
In C there generally won't be an easier solution. You could possibly do what Pascal did and put the length of the string in the first character, but this is a bit of a pain and will limit your string length to the largest value that fits in a char (255 with an 8-bit unsigned char).
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.
The reason for the null termination is so that the handler of the string can determine its length. If you don't use null termination, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, as long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.
Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.
OWASP says:
"C library functions such as strcpy(), strcat(), sprintf() and vsprintf() operate on null terminated strings and perform no bounds checking."
sprintf writes formatted data to string
int sprintf ( char * str, const char * format, ... );
Example:
sprintf(str, "%s", message); // assume declaration and
// initialization of variables
If I understand OWASP's comment, then the dangers of using sprintf are that
1) if message's length > str's length, there's a buffer overflow
and
2) if message does not null-terminate with \0, then message could get copied into str beyond the memory address of message, causing a buffer overflow
Please confirm/deny. Thanks
You're correct on both problems, though they're really both the same problem (which is accessing data beyond the boundaries of an array).
A solution to your first problem is to instead use std::snprintf, which accepts a buffer size as an argument.
A solution to your second problem is to give a maximum length (a precision) for the %s conversion in the format string. For example:
char buffer[128];
std::snprintf(buffer, sizeof(buffer), "This is a %.4s\n", "testGARBAGE DATA");
// std::strcmp(buffer, "This is a test\n") == 0
If you want to store the entire string (e.g. in the case where sizeof(buffer) is too small), run snprintf twice:
int length = std::snprintf(nullptr, 0, "This is a %.4s\n", "testGARBAGE DATA");
++length; // +1 for null terminator
char *buffer = new char[length];
std::snprintf(buffer, length, "This is a %.4s\n", "testGARBAGE DATA");
// ... use buffer ...
delete[] buffer;
(You can probably fit this into a function using va or variadic templates.)
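One way to package this, sketched with a variadic template (C++11); FormatString is a hypothetical name. It sizes a std::string from the first snprintf call instead of using new[], so nothing needs to be deleted. The arguments must still match the format string (and must not be std::string objects), because snprintf does no type checking.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: "run snprintf twice" wrapped in a variadic template.
template <typename... Args>
std::string FormatString(const char* fmt, Args... args)
{
    // First pass: compute the length only (excludes the terminating '\0').
    int length = std::snprintf(nullptr, 0, fmt, args...);
    if (length < 0)
        throw std::runtime_error("formatting error");
    // Second pass: write into a buffer with room for the '\0' too.
    std::string result(static_cast<std::size_t>(length) + 1, '\0');
    std::snprintf(&result[0], result.size(), fmt, args...);
    result.resize(static_cast<std::size_t>(length));  // drop snprintf's '\0'
    return result;
}
```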
Both of your assertions are correct.
There's an additional problem not mentioned. There is no type checking on the parameters. If you mismatch the format string and the parameters, undefined and undesirable behavior could result. For example:
char buf[1024] = {0};
float f = 42.0f;
sprintf(buf, "%s", f); // `f` isn't a string. the sun may explode here
This can be particularly nasty to debug.
All of the above leads many C++ developers to the conclusion that you should never use sprintf and its brethren. Indeed, there are facilities you can use to avoid all of the above problems. One, streams, is built right into the standard library:
#include <sstream>
#include <string>
// ...
float f = 42.0f;
std::stringstream ss;
ss << f;
std::string s = ss.str();
...and another popular choice for those who, like me, still prefer to use sprintf comes from the boost Format libraries:
#include <string>
#include <boost/format.hpp>
// ...
float f = 42.0f;
std::string s = (boost::format("%1%") % f).str();
Should you adopt the "never use sprintf" mantra? Decide for yourself. There's usually a best tool for the job and depending on what you're doing, sprintf just might be it.
Yes, it is mostly a matter of buffer overflows. However, those are quite serious business nowadays, since buffer overflows are the prime attack vector used by system crackers to circumvent software or system security. If you expose something like this to user input, there's a very good chance you are handing the keys to your program (or even your computer itself) to the crackers.
From OWASP's perspective, let's pretend we are writing a web server, and we use sprintf to copy the input that a browser passes us into a fixed-size buffer.
Now let's suppose someone malicious out there passes our web server a string far larger than will fit in the buffer we chose. His extra data will instead overwrite nearby memory. If he crafts it carefully, some of his data can overwrite control data, such as the function's return address, so that execution jumps to code he supplied. Now he can get our web server to execute his code.
Your 2 numbered conclusions are correct, but incomplete.
There is an additional risk:
char* format = 0;
char buf[128];
sprintf(buf, format, "hello");
Here, format is a null pointer, not a valid null-terminated string, and sprintf() doesn't check that either.
Your interpretation seems to be correct. However, your case #2 isn't really a buffer overflow. It's more of a memory access violation. That's just terminology though, it's still a major problem.
The sprintf function, when used with certain format specifiers, poses two types of security risk: (1) writing memory it shouldn't; (2) reading memory it shouldn't. If snprintf is used with a size parameter that matches the buffer, it won't write anything it shouldn't. Depending upon the parameters, it may still read stuff it shouldn't. Depending upon the operating environment and what else a program is doing, the danger from improper reads may or may not be less severe than that from improper writes.
It is very important to remember that sprintf() adds the ASCII 0 character as string terminator at the end of each string. Therefore, the destination buffer must have at least n+1 bytes (To print the word "HELLO", a 6-byte buffer is required, NOT 5)
In the example below, it may not be obvious, but in the 2-byte destination buffer, the second byte will be overwritten by the ASCII 0 character. If only 1 byte was allocated for the buffer, this would cause a buffer overrun.
char buf[2] = {'1', '2'};
int n = sprintf(buf, "A");
Also note that the return value of sprintf() does NOT include the null-terminating character. In the example above, 2 bytes were written, but the function returns 1.
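In the example below, the terminating null character written by sprintf() would overwrite the first byte of member variable i, since buf has no room for it (assuming a 4-byte int and no padding between buf and i, which is typical). Strictly speaking, this is undefined behavior.
#include <cstdio>
struct S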
{
char buf[4];
int i;
};
int main()
{
struct S s = { };
s.i = 12345;
int num = sprintf(s.buf, "ABCD");
// The value of s.i is NOT 12345 anymore !
return 0;
}
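For comparison, snprintf never writes more than the size you pass (terminator included, when the size is nonzero), and because its return value is the length the full output would have had, you can detect truncation. A sketch; FormatFits is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copy `text` into buf[0..size) with snprintf and report
// whether it fit. A return value >= size means the output was truncated.
bool FormatFits(char* buf, std::size_t size, const char* text)
{
    int n = std::snprintf(buf, size, "%s", text);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```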
I have put together a small example of how you could get rid of the buffer size declaration for sprintf (if you intend to, of course!) with no snprintf involved.
Note: this is an append/concatenation example; take a look here.
I am stumped by the behaviour of the following in my Win32 (ANSI) function:
(Multi-Byte Character Set NOT UNICODE)
void sOut( HWND hwnd, string sText ) // Add new text to EDIT Control
{
int len;
string sBuf, sDisplay;
len = GetWindowTextLength( GetDlgItem(hwnd, IDC_EDIT_RESULTS) );
if(len > 0)
{
// HERE:
sBuf.resize(len+1, 0); // Create a string big enough for the data
GetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sBuf.data(), len+1 );
} // MessageBox(hwnd, (LPSTR)sBuf.c_str(), "Debug", MB_OK);
sDisplay = sBuf + sText;
sDisplay = sDisplay + "\n\0"; // terminate the string
SetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sDisplay.c_str() );
}
This should append text to the control with each call.
Instead, all string concatenation fails after the call to GetDlgItemText(), I am assuming because of the typecast?
I have used three string variables to make it really obvious. If sBuf is affected then sDisplay should not be affected.
(Also, why is len 1 char less than the length in the buffer?)
GetDlgItemText() correctly returns the content of the EDIT control, and SetDlgItemText() will correctly set any text in sDisplay, but the concatenation in between is just not happening.
Is this a "hidden feature" of the string class?
Added:
Yes it looks like the problem is a terminating NUL in the middle. Now I understand why the len +1. The function ensures the last char is a NUL.
Using sBuf.resize(len); will chop it off and all is good.
Added:
Charles,
Leaving aside the quirky return length of this particular function, and talking about using a string as a buffer:
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself.
That's precisely what's needed isn't it?
Further, it requires that the program must not alter any of the values of that array.
As I understand it, that is going to change, along with the guarantee that all bytes are contiguous. I forget where I read a long article on this, but it asserted that MS already implements this.
What I don't like about using a vector is that the bytes are copied twice before I can return them: once into the vector and again into the string. I also need to instantiate a vector object and a string object. That is a lot of overhead. If there were some string-friendly way of working with vectors (or CStrings) without resorting to old C functions or copying characters one by one, I would use it. The string is very syntax-friendly in that way.
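The data() function on a std::string returns a const char*. You are not allowed to write into the buffer returned by it; it may be a duplicated buffer. (This describes the pre-C++11 rules; since C++17, the non-const overload of data() returns char* and may be written through.)
What you could do instead is use a std::vector<char> as a temporary buffer.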
E.g. (untested)
std::vector<char> sBuf( len + 1 );
GetDlgItemText( /* ... */, &sBuf[0], len + 1 );
std::string newText( &sBuf[0] );
newText += sText;
Also, the string you pass to SetDlgItemText should be \0 terminated, so you should use c_str(), not data(), for this.
SetDlgItemText( /* ... */, newText.c_str() );
Edit:
OK, I've just checked the contract for GetWindowTextLength and GetDlgItemText. Check my edits above. Both will include the space for a null terminator so you need to chop it off the end of your string otherwise concatenation of the two strings will include a null terminator in the middle of the string and the SetDlgItemText call will only use the first part of the string.
There is a further complication in that GetWindowTextLength isn't guaranteed to be accurate, it only guarantees to return a number big enough for a program to create a buffer for storing the result. It is extremely unlikely that this will actually affect a dialog box item owned by the calling code but in other situations the actual text may be shorter than the returned length. For this reason you should search for the first \0 in the returned text in any case.
I've opted to just use the std::string constructor that takes a const char* so that it finds the first \0 correctly.
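Since C++11, std::string storage is guaranteed to be contiguous, so the intermediate vector can also be avoided: let the API write through &s[0], then resize to the count it reports. Because GetDlgItemText can't run outside Windows, this sketch uses a hypothetical stand-in, FakeGetText, assumed to behave like it (copies at most size - 1 characters, null-terminates, and returns the count without the '\0'). ReadThenAppend is likewise a hypothetical name.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical stand-in for a Win32-style call such as GetDlgItemText.
int FakeGetText(char* buf, int size)
{
    const char text[] = "Hello, ";
    if (size <= 0) return 0;
    int n = std::min<int>(size - 1, static_cast<int>(std::strlen(text)));
    std::memcpy(buf, text, static_cast<std::size_t>(n));
    buf[n] = '\0';
    return n;  // characters copied, not counting the '\0'
}

// Let the API write into the string, then resize to the reported count so
// no '\0' is left in the middle when more text is appended.
std::string ReadThenAppend(int len, const std::string& extra)
{
    std::string buf(static_cast<std::size_t>(len) + 1, '\0');
    int copied = FakeGetText(&buf[0], len + 1);
    buf.resize(static_cast<std::size_t>(copied));
    return buf + extra;
}
```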
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself. Further, it requires that the program must not alter any of the values of that array. This means that the return value of data() may or may not be a copy of the string's internal representation and even if it isn't a copy you still aren't allowed to write to it.
I am far away from the Win32 API and its string nightmare, but there is something in the code that you can check. Standard C++ strings do not need to be null terminated, and nulls can appear anywhere within the string. I won't comment on the fact that you are casting away constness with your C-style cast, which is a problem of its own, but rather on the strange effect you are seeing.
When you initially create the string, you allocate extra space for the null (and initialize all elements to '\0'), and then you copy the elements. At that point your string is len+1 in size and the last element is a null. After that you append some other string, and what you get is a string that still has a null character at position len. When you retrieve the data with either data() (which, before C++11, did not guarantee null termination!) or c_str(), the returned buffer will still have the null character at position len. If that is passed to a function that stops on null (takes a C-style string), then even though the string is complete, the function will process only the first len characters and ignore the rest.
#include <algorithm>
#include <string>
#include <cstdio>
#include <iostream>
int main()
{
const char hi[] = "Hello, ";
const char all[] = "world!";
std::string result;
result.resize( sizeof(hi), 0 );
// simulate GetDlgItemText call
std::copy( hi, hi+sizeof(hi), const_cast<char*>(result.data()) ); // this is what your C-style cast is probably doing
// append
result.append( all );
std::cout << "size: " << result.size() // 14
<< ", contents: " << result // "Hello, \0world!" - dump to a file and inspect with a binary editor
<< std::endl;
std::printf( "%s\n", result.c_str() ); // "Hello, "
}
As you can see, printf expects a C-style string and will stop when the first null character is found, so that it can seem as if the append operation never took place. On the other hand, c++ streams do work properly with std::string and will dump the whole content, checking that the strings were actually appended.
A patch to your append operation disappearing would be removing the '\0' from the initial string (reserve only len space in the string). But that is not really a good solution, you should never use const_cast (there are really few places where it can be required and this is not one of them), the fact that you don't see it is even worse: using C style casts is making your code look nicer than it is.
You have commented on another answer that you do not want to add std::vector (which would provide a correct solution, as &v[0] is a proper mutable pointer into the buffer), of course without adding the extra space for the '\0'. Consider that this is part of an implementation file, and whether or not you use std::vector will not extend beyond this single compilation unit. Since you are already using some STL features, you are not adding any extra requirement to your system. So to me that would be the way to go. The solution provided by Charles Bailey should work provided that you remove the extra null character.
This is NOT an answer. I have added it here as an answer only so that I can use formatting in a long going discussion about const_cast.
This is an example where using const_cast can break a running application:
#include <iostream>
#include <map>
typedef std::map<int,int> map_type;
void dump( map_type const & m ); // implemented somewhere else for concision
int main() {
map_type m;
m[1] = 10;
m[2] = 20;
m[3] = 30;
map_type::iterator it = m.find(2);
const_cast<int&>(it->first) = 10;
// At this point the order invariant of the container is broken:
dump(m); // (1,10),(10,20),(3,30) !!! unordered by key!!!!
// This happens with g++-4.0.1 in MacOSX 10.5
if ( m.find(3) == m.end() ) std::cout << "key 3 not found!!!" << std::endl;
}
That is the danger of using const_cast. You can get away in some situations, but in others it will bite back, and probably hard. Try to debug in thousands of lines where the element with key 3 was removed from the container. And good luck in your search, for it was never removed.
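If the goal was actually to change an element's key, C++17 offers a way that keeps the map's ordering intact: extract the node, modify its key, and insert it back. A sketch; RekeyElement is a hypothetical name. If new_key is already present, the insert fails and the extracted node is destroyed along with the returned handle.

```cpp
#include <map>
#include <utility>

typedef std::map<int, int> map_type;

// Hypothetical helper (C++17): extract() unlinks the node from the tree,
// key() is writable on the detached node handle, and insert() places it
// back in the correct sorted position.
void RekeyElement(map_type& m, int old_key, int new_key)
{
    auto node = m.extract(old_key);
    if (node.empty()) return;  // old_key was not present
    node.key() = new_key;
    m.insert(std::move(node));
}
```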