Why is the last character of the character array getting excluded? - c++

#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n);
cin.ignore();
cout<<arr;
return 0;
}
Input:
11
of the year
Output:
of the yea
I'm already providing n+1 for the null character. Then why is the last character getting excluded?

You allocated n+1 characters for your array, but then you told getline that there were only n characters available. It should be like this:
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n+1); // change here
cin.ignore();
cout<<arr;

Per cppreference.com:
https://en.cppreference.com/w/cpp/io/basic_istream/getline
Behaves as UnformattedInputFunction. After constructing and checking the sentry object, extracts characters from *this and stores them in successive locations of the array whose first element is pointed to by s, until any of the following occurs (tested in the order shown):
end of file condition occurs in the input sequence (in which case setstate(eofbit) is executed)
the next available character c is the delimiter, as determined by Traits::eq(c, delim). The delimiter is extracted (unlike basic_istream::get()) and counted towards gcount(), but is not stored.
count-1 characters have been extracted (in which case setstate(failbit) is executed).
If the function extracts no characters (e.g. if count < 1), setstate(failbit) is executed.
In any case, if count > 0, it then stores a null character CharT() into the next successive location of the array and updates gcount().
In your case, n=11. You are allocating n+1 (12) chars, but telling getline() that only n (11) chars are available, so it reads only n-1 (10) chars into the array and then terminates the array with '\0' in the 11th char. That is why you are missing the last character.
of the year
         ^
         10th char, stops here
You need to +1 when calling getline(), to match your actual array size:
cin.getline(arr,n+1);

john's answer should fix your issue. Variable-length arrays (your char arr[n+1]) are not part of the C++ standard, for justified reasons. Yet I've taken a few hours of my time to go way out of the question's scope and create the...
Student's guide to C++ I/O
...and I/O in general, with an emphasis on the I part. Fear not, do it the C++ way! The following snippets should be compiled with a standard-conforming C++ compiler.
C++ I/O & standard library
Textual input
This is the recommended way of reading UTF-8 encoded strings in C++, the most widespread text encoding. We will use std::string for storage, which is the de-facto way for holding UTF-8 encoded strings, and std::getline for the reading itself.
#include <iostream> // std::cin, std::cout, std::ws
#include <string> // std::string, std::getline
int main() {
int size;
// std::ws ignores all whitespace in the stream,
// until the first non-whitespace character.
// it's prettier and handles cases a simple .ignore() does not.
std::cin >> size >> std::ws;
std::string input;
std::getline(std::cin, input);
// This condition will most certainly be true (output will be 1).
std::cout << (size == input.size()) << '\n';
}
std::string is dynamically allocated, or, as you may hear, on the heap. This is a broad subject, so feel free to venture on your own, from this given starting point! How does this help us? We can store strings of sizes unknown ahead of time on the heap, because we can always reallocate a bigger buffer. std::getline allocates and reallocates as it reads the input until a newline is reached, so you can read without knowing a size beforehand. Your size variable will most probably be equal to the size of the string, under the assumption that this is a school exercise where the input length is provided because you're probably not taught about dynamic memory yet. For good reason, though - it's complex and would needlessly distract from the actual subject (algorithms, data structures etc.). Good to keep in mind: std::string, unlike a C-style string, does not rely on null termination, but you can get a null-terminated C-style string from an std::string by calling the .c_str() method.
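As a quick, self-contained illustration of that last point (not part of the exercise above), this is how an std::string is handed to an API that expects a null-terminated C-style string:
#include <cstdio>  // std::fputs - a C-style API expecting a char*
#include <string>  // std::string

int main() {
    std::string line = "of the year";
    // .c_str() returns a pointer to a null-terminated view of the contents,
    // valid until the string is modified or destroyed.
    std::fputs(line.c_str(), stdout);
    std::fputs("\n", stdout);
}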
Binary data
What's binary data? Everything that's not text: images, videos, music, 2003 MS Word documents (the .doc ones, wait 'til you see what .docx is) and many others. It's customary to store binary data as raw bytes, which is a fancy way of saying numbers. unsigned char is the C/C++ type used to represent these raw bytes (C++17 introduces std::byte for this purpose). To work with data from binary input we need to store it somewhere in memory - either on the stack, or on the heap. We could store the whole input at once, but binary files are usually considered too large for this (and, really, are - think about the size of a movie!), so we read in chunks - that is, we read only a finite part at a time (say 256 characters, that's our buffer), and we keep reading until we reach the end of the input (usually called end-of-file or, short, EOF). As a rule of thumb, when a buffer is small and static (doesn't need to be resized, like our string above), we can store it on the stack. If either of those conditions is not met, it goes on the heap. Note that the notions of small and large are quite context dependent - compiler, OS, hardware, runtime environment (see this thread on stack size limits and embedded systems). The buffer size you choose is also task-specific, so there is no universal rule there either. Let's see some code now!
#include <array> // std::array
#include <fstream> // std::ifstream, std::ofstream
int main() {
// We open this file in binary mode.
// The default mode may modify the input.
std::ifstream input{"some_image.jpg", std::ios::binary};
// 256 is our buffer size, unsigned char is the array type.
// This is the C++ way of `unsigned char buffer[256]`.
std::array<unsigned char, 256> buffer;
while (input.read(reinterpret_cast<char*>(buffer.data()), buffer.size())) {
// Buffer is filled, do something with it
}
// At this point, either EOF is reached or an error occurred.
if (input.eof()) {
// Fewer characters than the buffer's size have been read.
// .gcount() returns the number of characters read by
// the last operation.
const std::streamsize chunk_size = input.gcount();
// Do something with these characters, as in the loop.
// Valid range to access in the buffer is [0, chunk_size).
// chunk_size can be 0, too. In that case, there is no more data
// to handle.
} else {
// Some other failure, handle error.
}
}
This snippet is reading through a file using a small, stack-allocated buffer of 256 bytes. std::array makes usage convenient and safe with its methods - read the linked docs! If we want to use a large buffer (say, 16MB), we replace the std::array with an std::vector:
std::vector<unsigned char> buffer(1 << 24); // 1 << 24 is 16 MiB
Rest is the same. You could also use std::string here, too, as std::string does not imply/force UTF-8 encoding of input. It's useful to have a convention that easily differentiates between binary and text data, in code.
Something to note is that reading in smaller chunks uses less space, but takes more time - taking bytes from a file involves making OS system calls and moving disks or electrons, when reading from a hard drive or an SSD, respectively. C++'s fstream objects already do buffering for you to speed up reads, which is usually a much-needed optimization. You'll know if this affects you.
Another thing to note is the EOF and error handling, using the .eof() method. We omitted error handling in the textual input retrieval, but here we are forced into it if we don't want to lose data. When EOF is reached, usually fewer bytes than the buffer size have been read, so we need a way to know how much of the buffer was filled with data. This is what .gcount() tells us. Depending on the program you're making, you may treat EOF as "unexpected" if the buffer is only partially filled (.gcount() returns a non-zero value) - for example, when the data read is incomplete according to the format it was supposed to follow, or in other words, the end of file was reached before the data was supposed to end. Other than that, EOF is simply the condition every file is in after being fully read.
C-style I/O
This may look closer to what's taught in school. As we've explained the general concepts above, this section will be richer in coding and explanations of code. We still use C++ as a language, so the C++ version of the C headers and the std namespace will be used - to have the code that follows work in a C compiler, replace the <csomething> headers with <something.h> and remove the std:: namespace prefix from types and functions. Let's dive into it!
Textual input
The equivalent of a C++ stream (std::cin, std::fstream etc.) in C is the std::FILE. FILEs are buffered by default, as are C++ streams. We'll use std::fscanf for reading the size of the input, which is just scanf but it takes as parameter the stream you read from, and std::fgets for reading the text line.
#include <cstdio> // std::FILE, std::fscanf, std::fgets, stdin
#include <cstring> // std::strcspn
// discard_whitespace does what std::ws did above.
// It consumes all whitespace before a non-whitespace
// character from stream f.
void discard_whitespace(std::FILE* f) {
// A whitespace directive in the format string tells fscanf
// to consume whitespace and stop at the first non-whitespace
// character, which is left in the stream.
std::fscanf(f, " ");
}
int main() {
int size;
// stdin is a macro, doesn't have a namespace,
// hence no std:: prefix.
std::fscanf(stdin, "%d", &size);
// Like std::cin >> above, fscanf leaves the trailing newline in the stream.
discard_whitespace(stdin);
// Your school exercise will probably have a size limit for the input.
// We consider it to be 256.
const int SIZE_UPPER_BOUND = 256;
// We add some extra bytes so the maximum-length input can be accommodated.
// 1 is added for the null terminator of C-style strings.
// The other 2 are because `fgets` will also read the newline,
// which can be \n or \r\n, depending on the OS. See the explanation after the code.
char input[SIZE_UPPER_BOUND + 3];
// The actual read - sizeof gets the size of our input buffer,
// we don't have to write it twice.
std::fgets(input, sizeof input, stdin);
// fgets also reads the newline, unlike `std::getline` or
// `std::cin.getline` - we have to remove it ourselves.
input[std::strcspn(input, "\r\n")] = '\0';
// This condition will be true, as in the C++ example.
std::fprintf(stdout, "%d\n", (int)std::strlen(input) == size);
}
Let's unpack that newline removal. std::strcspn finds the first position of any of the given characters in the input. We provide both \r and \n, to support UNIX (\n) and Windows (\r\n) newline terminators - yeah, they're different, see Wikipedia, on "Newline". By writing the null terminator, '\0', over the newline, we move the end of the string to where the newline was, basically "removing" it. If this is a school assignment, we can assume the input is correct, so we could have used size directly instead of std::strcspn - the line has size characters, so the newline sits at index size:
input[size] = '\0';
This doesn't work when we don't know the input size or the input may be invalid.
As an optimization trick, observe that std::strcspn returns the line length, in this case. When you don't know the size, but you need it for later, you can save the result of std::strcspn in a variable before, and then use it instead of std::strlen:
// std::size_t is an unsigned integral type, used to represent
// array sizes and indexes in C/C++
const std::size_t input_size = std::strcspn(input, "\r\n");
input[input_size] = '\0';
You'll see some people use 0 or NULL for the terminator. I recommend against it - unlike the '\0' literal, which is of char type, the other two variants are implicitly converted to char. If you read the linked documentation, you'll realize NULL is even incorrect, according to the spec, as it's meant to be used only in contexts that require pointers.
An alternative method to fgets is fscanf, again. Tread carefully, though - while a simple %s may do it, it makes your code vulnerable to buffer overflow exploits. See this StackOverflow thread on the disadvantages of scanf, too. Let's see the (safe) code:
std::fscanf(stdin, " %256[^\r\n]", input);
That number limits the input size to our SIZE_UPPER_BOUND, and [^\r\n] tells fscanf to read all characters up to \r or \n. The leading space in the format string consumes leading whitespace (unlike %s, the %[ conversion does not skip it on its own), so you can drop the discard_whitespace call. A downside to fscanf is that you have to keep the size limit in the format string and the buffer size in sync: you have no way to specify the input size dynamically other than building the format string dynamically, which is overkill for a school assignment. This is a problem in more sizable codebases, but for a one-file, one-time school assignment it's not a big deal, so you may prefer fscanf over fgets, as it's less work. fscanf doesn't read the newline into the buffer, either.
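For completeness, here is a rough sketch of what "building the format string dynamically" would look like; the max_len constant is a stand-in for whatever limit your exercise uses, and it's usually not worth the trouble:
#include <cstdio>

int main() {
    const int max_len = 256;        // assumed size limit
    char input[max_len + 1];        // +1 for the null terminator
    char fmt[32];
    // Produce a format string such as " %256[^\r\n]" at run time,
    // so the width in the format always matches the buffer size.
    std::snprintf(fmt, sizeof fmt, " %%%d[^\r\n]", max_len);
    std::fscanf(stdin, fmt, input);
    std::printf("%s\n", input);
}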
Binary data
The equivalent of C++'s std::cin.read in C world is std::fread. Code will resemble its C++ counterpart:
#include <cstdio>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
unsigned char buffer[256];
std::size_t chunk_size;
while ((chunk_size = std::fread(buffer, sizeof buffer[0], sizeof buffer, f)) > 0) {
// chunk_size == sizeof buffer, do something with the buffer
}
if (std::feof(f)) {
// chunk_size != sizeof buffer, do something with buffer
// or handle as error
} else {
// an error occurred, handle it
}
// We need to close the file, unlike in C++, where it is closed automatically.
std::fclose(f);
}
The arguments to std::fread are hairy: read the documentation. Everything else looks very similar to the C++ way, from the loop to the error handling. Why? Because it's literally the same thing - we're just using different (standard) libraries. Another similarity is that C I/O is also buffered by default, just like C++'s. What's different is the line at the end - the call to std::fclose. Aren't we doing anything similar in the C++ code? We are, just implicitly. Remember that C++ classes have constructors and destructors, functions that are automatically called at the beginning and at the end of a variable's lifetime, respectively. These two allow us to implement the RAII technique, which does the resource management automatically (opening the file in the constructor, closing it in the destructor). RAII is used inside std::string and std::vector (and other containers, smart pointers & others). In other words, the destructor of std::ifstream closes the file at the end of main(), just as we are doing here manually.
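To make the RAII point concrete, here is a minimal sketch of a wrapper that owns a std::FILE* the same way std::ifstream owns its file (illustrative only; in real code you would just use the stream):
#include <cstdio>

// Minimal RAII wrapper: the constructor acquires the resource,
// the destructor releases it, no matter how the scope is left.
class File {
public:
    File(const char* path, const char* mode) : f_(std::fopen(path, mode)) {}
    ~File() { if (f_) std::fclose(f_); }
    // Non-copyable: two owners would close the same FILE twice.
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    std::FILE* get() const { return f_; }
private:
    std::FILE* f_;
};

int main() {
    File image{"some_image.jpg", "rb"};   // opened here
    // ... use image.get() with std::fread ...
}   // closed automatically here, like std::ifstream's destructor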
Hybrid approach (??)
Would you ever want to combine the two? So it seems. Let's talk drawbacks:
The C++ I/O library, due to the way it's built, takes more care to use in a performant manner compared to C's (virtual function calls and extra function calls in general, especially when using the << and >> operators & stream manipulators - each of these is a function call, compared to a single plain function call per operation with the C library). See this StackOverflow thread on i/ostream speed, too. The C++ library is also more verbose, especially when outputting (ever heard of "chevron hell"?).
The C I/O library is easy to use improperly/unsafely, its terse, shorthand namings make code difficult to follow, and output cannot be extended to support custom types (this is a problem when using C-style I/O in C++). Handling dynamic buffers correctly also takes great care, given that the only way of managing heap memory in C is malloc and free.
Some schools may crucify you if any trace of std::string is left in sight (or so I've heard)
Using C-style types (char[N] instead of std::array<char, N>, for example), is easier - no headers to include, as the types are builtin primitives and less to type. May be preferred in short, throwaway programs like algorithmic exercises at school.
With these in mind, we can take a look at how to conveniently combine the two when reading text and binary!
Textual input
We will take advantage of the terseness of C-style types and the ease of use of C++'s I/O library:
#include <iostream>
int main() {
int size;
std::cin >> size >> std::ws;
const int SIZE_UPPER_BOUND = 256;
char input[SIZE_UPPER_BOUND + 1];
std::cin.getline(input, sizeof input);
// Input done, solve the problem.
}
Teachers don't have to scratch their heads at the presence of std::string and std::getline and all the standard library shenanigans you start using after diving into this rabbit hole. You, the programmer, don't have to clean up newline endings or memorize arcane format specifiers just to read a string and an int. Focus on the code and solve problems without ever having to debug the input-reading logic - it just works!
Binary data
The convoluted hierarchical tree of C++'s I/O library types scares you, the clean assembly output enjoyer, just like Linus Torvalds. You're still somehow afraid to manually manage memory, so you choose this solution:
#include <cstdio>
#include <vector>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
std::vector<unsigned char> buffer(1 << 24);
std::size_t chunk_size;
while ((chunk_size = std::fread(buffer.data(), sizeof buffer[0], buffer.size(), f)) > 0) {
// use the buffer
}
if (std::feof(f)) {
// handle EOF
} else {
// handle error
}
std::fclose(f);
}
Weird choice, given that you still manage the file's lifetime manually. While this may not be the best example, using C++ RAII containers together with C libraries is not uncommon - memory safety is crucial.
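If you want the file's lifetime handled automatically while keeping the C calls, one common option (a sketch, not the only way) is std::unique_ptr with a small deleter that calls std::fclose:
#include <cstdio>
#include <memory>
#include <vector>

// Deleter that closes the FILE when the unique_ptr goes out of scope.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};

int main() {
    std::unique_ptr<std::FILE, FileCloser> f{std::fopen("some_image.jpg", "rb")};
    if (!f) return 1;   // fopen failed

    std::vector<unsigned char> buffer(1 << 24);
    std::size_t chunk_size;
    while ((chunk_size = std::fread(buffer.data(), 1, buffer.size(), f.get())) > 0) {
        // use buffer[0 .. chunk_size)
    }
}   // no manual fclose: the deleter runs here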
Trivia
as usual, weigh your decision of using namespace std;
Cool things you won't need:
speed up C++ I/O using a single line at the beginning of the program (but be careful) - see the sketch after this list
disable C I/O buffering
disable C++ I/O buffering
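For the curious, the "single line" usually refers to turning off synchronization with C stdio (plus untying cin from cout); a sketch, with the usual caveat that you must then not mix C and C++ I/O on the same streams:
#include <iostream>

int main() {
    // Decouple C++ streams from C stdio buffers; after this,
    // mixing printf/scanf with cout/cin on the same stream is unsafe.
    std::ios_base::sync_with_stdio(false);
    // Don't flush std::cout before every read from std::cin.
    std::cin.tie(nullptr);

    int n;
    std::cin >> n;
    std::cout << n << '\n';
}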
Conclusion
I/O is the crowded junction of fundamental CS concepts, hardware & software inner workings and C++'s features and quirks. Take in what you can at a time & focus on what matters, and make sure you're building on sturdy fundamentals.

Related

Implementation of getline ( istream& is, string& str )

My Question is very simple, how is getline(istream, string) implemented?
How can you solve the problem of having fixed size char arrays like with getline (char* s, streamsize n ) ?
Are they using temporary buffers and many calls to new char[length] or another neat structure?
getline(istream&, string&) is implemented such that it reads characters until it finds a line terminator. There is no single definitive implementation; each standard library probably differs from the others.
Possible implementation:
istream& getline(istream& stream, string& str)
{
char ch;
str.clear();
while (stream.get(ch) && ch != '\n')
str.push_back(ch);
return stream;
}
@SethCarnegie is right: more than one implementation is possible. The C++ standard does not say which should be used.
However, the question is still interesting. It's a classic computer-science problem. Where, and how, does one allocate memory when one does not know in advance how much memory to allocate?
One solution is to record the string's characters as a linked list of individual characters. This is neither memory-efficient nor fast, but it works, is robust, and is relatively simple to program. However, a standard library is unlikely to be implemented this way.
A second solution is to allocate a buffer of some fixed length, such as 128 characters. When the buffer overflows, you allocate a new buffer of double length, 256 characters, then copy the old characters over to the new storage, then release the old. When the new buffer overflows, you allocate an even newer buffer of double length again, 512 characters, then repeat the process; and so on.
A third solution combines the first two. A linked list of character arrays is maintained. The first two members of the list store (say) 128 characters each. The third stores 256. The fourth stores 512, and so on. This requires more programming than the others, but may be preferable to either, depending on the application.
And the list of possible implementations goes on.
Regarding standard-library implementations, @SteveJessop adds that "[a] standard library's string isn't permitted to be implemented as (1), because of the complexity requirement of operator[] for strings. In C++11 it's not permitted to be implemented as (3) either, because of the contiguity requirement for strings. The C++ committee expressed the belief that no active C++ implementation did (3) at the time they added the contiguity requirement. Of course, getline can do what it likes temporarily with the characters before adding them all to the string, but the standard does say a lot about what string can do."
The addition is relevant because, although getline could temporarily store its data in any of several ways, if the data's ultimate target is a string, this may be relevant to getline's implementation. @SteveJessop further adds, "For string itself, implementations are pretty much required to be (2) except that they can choose their own rate of expansion; they don't have to double each time as long as they multiply by some constant."
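A toy sketch of scheme (2) above, reading one line with a manually doubled buffer (illustrative only; a real string class adds exception safety and a growth factor of its choosing):
#include <cstring>   // std::memcpy
#include <iostream>

int main() {
    std::size_t capacity = 128;            // initial buffer
    std::size_t size = 0;
    char* data = new char[capacity];

    char c;
    while (std::cin.get(c) && c != '\n') {
        if (size == capacity) {
            // Buffer full: allocate double the space, copy, release the old one.
            char* bigger = new char[capacity * 2];
            std::memcpy(bigger, data, size);
            delete[] data;
            data = bigger;
            capacity *= 2;
        }
        data[size++] = c;
    }

    std::cout.write(data, size);
    std::cout << '\n';
    delete[] data;
}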
As @3bdalla said, thb's implementation doesn't behave like the GNU implementation. So, I wrote my own implementation, which works like GNU's. I don't know how this variant behaves with errors, so it needs to be tested.
My implementation of getline:
std::istream& getline(std::istream& is, std::string& s, char delim = '\n'){
s.clear();
char c;
std::string temp;
if(is.get(c)){
temp.push_back(c);
while((is.get(c)) && (c != delim))
temp.push_back(c);
if(!is.bad())
s = temp;
if(!is.bad() && is.eof())
is.clear(std::ios_base::eofbit);
}
return is;
}

How to read the standard istream buffer in c++?

I have the following problem. I have to implement a class that has an attribute that is a char pointer meant to point to the object's "code", as follows:
class foo{
private:
char* cod;
...
public:
foo();
void getVal();
...
};
So on, so forth. getVal() is a method that takes the code from the standard istream and fills in all the information, including the code. The thing is, the "code" that identifies the object can't be longer than a certain number of characters. This has to be done without using customized buffers for the method getVal(), so I can't do the following:
//suppose the maximum number of characters is 50
void foo::getVal()
{
char buffer[100];
cin >> buffer;
if (strlen(buffer) > 50) // I'm not sure this would work considering how the stream
                         // of characters would be copied to buffer and how strlen
                         // works, but suppose this tells me how long the stream of
                         // characters was.
{
throw "Exception";
}
...
}
This is forbidden. I also can't use a customized istream, nor the boost library.
I thought I could find the place where istream keeps its information rather easily, but I can't find it. All I've found were mentions to other types of stream.
Can somebody tell me if this can be done or where the stream keeps its buffered information?
Thanks
Yes, using strlen would definitely work. You can write a sample program:
#include <iostream>
#include <cstring> // std::strlen
int main()
{
char buffer[10];
std::cout << "enter buffer:" ;
std::cin >> buffer;
if (std::strlen(buffer) > 6)
std::cout << "size > 6";
}
For inputs longer than 6 characters it will display size > 6
uhm .... >> reads up to the first blank, while strlen counts up to the first null. They can be mixed if you know for sure there are no blanks in the middle of the string you're going to read and that there are no more than 100 consecutive characters. If not, you will overrun the buffer before throwing.
Also, accessing the buffer does not guarantee that the whole string is already there (the string can go past the buffer space, requiring you to partially read and refill the buffer...)
If blanks are separators, why not just read into an std::string and react to its final state? All the dynamics above are already handled inside >> for std::string.
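A sketch of that suggestion, assuming the 50-character limit from the question (the function name and the exception type are placeholders):
#include <iostream>
#include <stdexcept>
#include <string>

void getVal_sketch() {
    std::string code;
    std::cin >> code;                 // >> grows the string as needed, no fixed buffer
    if (code.size() > 50)             // react to the final state
        throw std::length_error("code longer than 50 characters");
    // ... copy into the char* member here if the assignment insists on one ...
}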
[EDIT after the comments below]
The only way to store a sequence of unknown size is to dynamically allocate the space and make it grow as required. This is, no more - no less, what string and vector do.
Whether you use them or write your own code to allocate and reallocate where more space is required, doesn't change the substance.
I'm starting to think the only reason for those requirements is to test your ability to write your own string class. So... just write it:
declare a class holding a pointer, a size and a capacity; allocate some space, track how much you store, and when no room is left, allocate another wider store, copy the old one over, destroy it, and adjust the data members accordingly.
Accessing directly the file buffer is not the way, since you don't control how the file buffer is filled in.
An istream uses a streambuf.
I find that www.cppreference.com is a pretty good place for quick C++ references. You can go there to see how to use a streambuf or its derivative filebuf.

Reading a fixed number of chars with << on an istream

I was trying out a few file reading strategies in C++ and I came across this.
ifstream ifsw1("c:\\trys\\str3.txt");
char ifsw1w[3];
do {
ifsw1 >> ifsw1w;
if (ifsw1.eof())
break;
cout << ifsw1w << flush << endl;
} while (1);
ifsw1.close();
The contents of the file were
firstfirst firstsecond
secondfirst secondsecond
When I see the output it is printed as
firstfirst
firstsecond
secondfirst
I expected the output to be something like:
fir
stf
irs
tfi
.....
Moreover I see that "secondsecond" has not been printed. I guess that the last read has met the eof and the cout might not have been executed. But the first behavior is not understandable.
The extraction operator has no concept of the size of the ifsw1w variable, and (by default) is going to extract characters until it hits whitespace, null, or eof. These are likely being stored in the memory locations after your ifsw1w variable, which would cause bad bugs if you had additional variables defined.
To get the desired behavior, you should be able to use
ifsw1.width(3);
to limit the number of characters to extract.
It's virtually impossible to use std::istream& operator>>(std::istream&, char *) safely -- it's like gets in this regard -- there's no way for you to specify the buffer size. The stream just writes to your buffer, going off the end. (Your example above invokes undefined behavior). Either use the overloads accepting a std::string, or use std::getline(std::istream&, std::string).
Checking eof() is incorrect. You want fail() instead. You really don't care if the stream is at the end of the file, you care only if you have failed to extract information.
For something like this you're probably better off just reading the whole file into a string and using string operations from that point. You can do that using a stringstream:
#include <string> //For string
#include <sstream> //For stringstream
#include <fstream> //For ifstream
#include <iostream> //As before
std::ifstream myFile(...);
std::stringstream ss;
ss << myFile.rdbuf(); //Read the file into the stringstream.
std::string fileContents = ss.str(); //Now you have a string, no loops!
You're trashing the memory... it's reading past the 3 chars you defined (it's reading until a space or a newline is met...).
Read char by char to achieve the output you had mentioned.
Edit : Irritate is right, this works too (with some fixes and not getting the exact result, but that's the spirit):
char ifsw1w[4];
do{
ifsw1.width(4);
ifsw1 >> ifsw1w;
if(ifsw1.eof()) break;
cout << ifsw1w << flush << endl;
}while(1);
ifsw1.close();
The code has undefined behavior. When you do something like this:
char ifsw1w[3];
ifsw1 >> ifsw1w;
The operator>> receives a pointer to the buffer, but has no idea of the buffer's actual size. As such, it has no way to know that it should stop reading after two characters (and note that it should be 2, not 3 -- it needs space for a '\0' to terminate the string).
Bottom line: in your exploration of ways to read data, this code is probably best ignored. About all you can learn from code like this is a few things you should avoid. It's generally easier, however, to just follow a few rules of thumb than try to study all the problems that can arise.
Use std::string to read strings.
Only use fixed-size buffers for fixed-size data.
When you do use fixed buffers, pass their size to limit how much is read.
When you want to read all the data in a file, std::copy can avoid a lot of errors:
#include <algorithm> // std::copy
#include <iterator> // std::istream_iterator, std::back_inserter
#include <string>
#include <vector>
std::vector<std::string> strings;
std::copy(std::istream_iterator<std::string>(myFile),
std::istream_iterator<std::string>(),
std::back_inserter(strings));
To read the whitespace, you could use "noskipws"; it will not skip whitespace.
ifsw1 >> noskipws >> ifsw1w;
But if you want to read a fixed number of characters, I suggest you use the get method - note that get(buf, n) stores at most n-1 characters and appends a '\0':
ifsw1.get(ifsw1w,3); // with the char[3] buffer above, this reads at most 2 characters
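For the chunked output the question expected (three characters at a time, whitespace included), unformatted read() is the closer fit; a sketch, assuming the same file as above:
#include <fstream>
#include <iostream>

int main() {
    std::ifstream ifsw1("c:\\trys\\str3.txt");
    char chunk[4];                        // 3 characters + '\0'
    while (ifsw1.read(chunk, 3)) {        // reads exactly 3 chars, including spaces/newlines
        chunk[3] = '\0';
        std::cout << chunk << std::endl;
    }
    // A final partial chunk (fewer than 3 characters) is still in the buffer:
    if (ifsw1.gcount() > 0) {
        chunk[ifsw1.gcount()] = '\0';
        std::cout << chunk << std::endl;
    }
}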

understanding the dangers of sprintf(...)

OWASP says:
"C library functions such as strcpy
(), strcat (), sprintf () and vsprintf
() operate on null terminated strings
and perform no bounds checking."
sprintf writes formatted data to string
int sprintf ( char * str, const char * format, ... );
Example:
sprintf(str, "%s", message); // assume declaration and
// initialization of variables
If I understand OWASP's comment, then the dangers of using sprintf are that
1) if message's length > str's length, there's a buffer overflow
and
2) if message does not null-terminate with \0, then message could get copied into str beyond the memory address of message, causing a buffer overflow
Please confirm/deny. Thanks
You're correct on both problems, though they're really both the same problem (which is accessing data beyond the boundaries of an array).
A solution to your first problem is to instead use std::snprintf, which accepts a buffer size as an argument.
A solution to your second problem is to give a maximum field precision in the format string (e.g. %.4s), so only that many characters are read from the source. For example:
char buffer[128];
std::snprintf(buffer, sizeof(buffer), "This is a %.4s\n", "testGARBAGE DATA");
// std::strcmp(buffer, "This is a test\n") == 0
If you want to store the entire string (e.g. in the case sizeof(buffer) is too small), run snprintf twice:
int length = std::snprintf(nullptr, 0, "This is a %.4s\n", "testGARBAGE DATA");
++length; // +1 for null terminator
char *buffer = new char[length];
std::snprintf(buffer, length, "This is a %.4s\n", "testGARBAGE DATA");
(You can probably fit this into a function using va or variadic templates.)
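As the answer hints, the two-call pattern folds naturally into a variadic template; a sketch (C++11 or later, and the arguments must be types printf can accept):
#include <cstdio>
#include <string>

// Format into a std::string using the measure-then-write snprintf idiom.
template <typename... Args>
std::string format_string(const char* fmt, Args... args) {
    int length = std::snprintf(nullptr, 0, fmt, args...);
    if (length < 0) return std::string();             // encoding error
    std::string result(length + 1, '\0');             // room for the terminator
    std::snprintf(&result[0], result.size(), fmt, args...);
    result.resize(length);                            // drop the embedded '\0'
    return result;
}

// Usage:
//   std::string s = format_string("This is a %.4s\n", "testGARBAGE DATA");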
Both of your assertions are correct.
There's an additional problem not mentioned. There is no type checking on the parameters. If you mismatch the format string and the parameters, undefined and undesirable behavior could result. For example:
char buf[1024] = {0};
float f = 42.0f;
sprintf(buf, "%s", f); // `f` isn't a string. the sun may explode here
This can be particularly nasty to debug.
All of the above lead many C++ developers to the conclusion that you should never use sprintf and its brethren. Indeed, there are facilities you can use to avoid all of the above problems. One, streams, is built right in to the language:
#include <sstream>
#include <string>
// ...
float f = 42.0f;
std::stringstream ss;
ss << f;
std::string s = ss.str();
...and another popular choice for those who, like me, still prefer to use sprintf comes from the boost Format libraries:
#include <string>
#include <boost/format.hpp>
// ...
float f = 42.0f;
std::string s = (boost::format("%1%") % f).str();
Should you adopt the "never use sprintf" mantra? Decide for yourself. There's usually a best tool for the job and depending on what you're doing, sprintf just might be it.
Yes, it is mostly a matter of buffer overflows. However, those are quite serious business nowadays, since buffer overflows are the prime attack vector used by system crackers to circumvent software or system security. If you expose something like this to user input, there's a very good chance you are handing the keys to your program (or even your computer itself) to the crackers.
From OWASP's perspective, let's pretend we are writing a web server, and we use sprintf to parse the input that a browser passes us.
Now let's suppose someone malicious out there passes our web server a string far larger than will fit in the buffer we chose. His extra data will instead overwrite nearby data. If he makes it large enough, some of his data will get copied over the webserver's instructions rather than its data. Now he can get our webserver to execute his code.
Your 2 numbered conclusions are correct, but incomplete.
There is an additional risk:
char* format = 0;
char buf[128];
sprintf(buf, format, "hello");
Here, format is a null pointer rather than a valid format string; sprintf() doesn't check for that either.
Your interpretation seems to be correct. However, your case #2 isn't really a buffer overflow. It's more of a memory access violation. That's just terminology though, it's still a major problem.
The sprintf function, when used with certain format specifiers, poses two types of security risk: (1) writing memory it shouldn't; (2) reading memory it shouldn't. If snprintf is used with a size parameter that matches the buffer, it won't write anything it shouldn't. Depending upon the parameters, it may still read stuff it shouldn't. Depending upon the operating environment and what else a program is doing, the danger from improper reads may or may not be less severe than that from improper writes.
It is very important to remember that sprintf() adds the ASCII 0 character as string terminator at the end of each string. Therefore, the destination buffer must have at least n+1 bytes (To print the word "HELLO", a 6-byte buffer is required, NOT 5)
In the example below, it may not be obvious, but in the 2-byte destination buffer, the second byte will be overwritten by ASCII 0 character. If only 1 byte was allocated for the buffer, this would cause buffer overrun.
char buf[3] = {'1', '2'};
int n = sprintf(buf, "A");
Also note that the return value of sprintf() does NOT include the null-terminating character. In the example above, 2 bytes were written, but the function returns 1.
In the example below, the first byte of the class member variable 'i' would be overwritten by sprintf()'s terminating '\0' (on a typical 32-bit system with no padding between the members).
struct S
{
char buf[4];
int i;
};
int main()
{
struct S s = { };
s.i = 12345;
int num = sprintf(s.buf, "ABCD");
// The value of s.i is NOT 12345 anymore !
return 0;
}
I have put together a small example of how you could get rid of the buffer size declaration for sprintf (if you intended to, of course!), with no snprintf involved...
Note: This is an APPEND/CONCATENATION example, take a look here

Is there any way to determine how many characters will be written by sprintf?

I'm working in C++.
I want to write a potentially very long formatted string using sprintf (specifically a secure counted version like _snprintf_s, but the idea is the same). The approximate length is unknown at compile time so I'll have to use some dynamically allocated memory rather than relying on a big static buffer. Is there any way to determine how many characters will be needed for a particular sprintf call so I can always be sure I've got a big enough buffer?
My fallback is I'll just take the length of the format string, double it, and try that. If it works, great, if it doesn't I'll just double the size of the buffer and try again. Repeat until it fits. Not exactly the cleverest solution.
It looks like C99 supports passing NULL to snprintf to get the length. I suppose I could create a module to wrap that functionality if nothing else, but I'm not crazy about that idea.
Maybe an fprintf to "/dev/null"/"nul" might work instead? Any other ideas?
EDIT: Alternatively, is there any way to "chunk" the sprintf so it picks up mid-write? If that's possible it could fill the buffer, process it, then start refilling from where it left off.
The man page for snprintf says:
Return value
Upon successful return, these functions return the number of
characters printed (not including the trailing '\0' used to end
output to strings). The functions snprintf and vsnprintf do not
write more than size bytes (including the trailing '\0'). If
the output was truncated due to this limit then the return value
is the number of characters (not including the trailing '\0')
which would have been written to the final string if enough
space had been available. Thus, a return value of size or more
means that the output was truncated. (See also below under
NOTES.) If an output error is encountered, a negative value is
returned.
What this means is that you can call snprintf with a size of 0. Nothing will get written, and the return value will tell you how much space you need to allocate to your string:
int how_much_space = snprintf(NULL, 0, fmt_string, param0, param1, ...);
As others have mentioned, snprintf() will return the number of characters required in a buffer to prevent the output from being truncated. You can simply call it with a 0 buffer length parameter to get the required size then use an appropriately sized buffer.
For a slight improvement in efficiency, you can call it with a buffer that's large enough for the normal case and only do a second call to snprintf() if the output is truncated. In order to make sure the buffer(s) are properly freed in that case, I'll often use an auto_buffer<> object that handles the dynamic memory for me (and has the default buffer on the stack to avoid a heap allocation in the normal case).
If you're using a Microsoft compiler, MS has a non-standard _snprintf() that has serious limitations of not always null terminating the buffer and not indicating how big the buffer should be.
To work around Microsoft's non-support, I use a nearly public domain snprintf() from Holger Weiss.
Of course if your non-MS C or C++ compiler is missing snprintf(), the code from the above link should work just as well.
I would use a two-stage approach. Generally, a large percentage of output strings will be under a certain threshold and only a few will be larger.
Stage 1, use a reasonable sized static buffer such as 4K. Since snprintf() can restrict how many characters are written, you won't get a buffer overflow. What you will get returned from snprintf() is the number of characters it would have written if your buffer had been big enough.
If your call to snprintf() returns less than 4K, then use the buffer and exit. As stated, the vast majority of calls should just do that.
Some will not and that's when you enter stage 2. If the call to snprintf() won't fit in the 4K buffer, you at least now know how big a buffer you need.
Allocate, with malloc(), a buffer big enough to hold it then snprintf() it again to that new buffer. When you're done with the buffer, free it.
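A sketch of that two-stage approach, with the 4K threshold and malloc() as described (error handling kept minimal):
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Stage 1: try a 4K stack buffer. Stage 2: if truncated, allocate exactly
// the size snprintf reported and format again.
void print_formatted(const char* fmt, ...)
{
    char small[4096];
    va_list args;

    va_start(args, fmt);
    int needed = vsnprintf(small, sizeof small, fmt, args);
    va_end(args);

    if (needed < 0) return;                        // formatting error
    if ((size_t)needed < sizeof small) {           // it fit in the 4K buffer
        fputs(small, stdout);
        return;
    }

    char* big = (char*)malloc((size_t)needed + 1); // +1 for the terminator
    if (!big) return;
    va_start(args, fmt);                           // the va_list must be restarted
    vsnprintf(big, (size_t)needed + 1, fmt, args);
    va_end(args);
    fputs(big, stdout);
    free(big);
}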
We worked on a system in the days before snprintf() and we achieved the same result by having a file handle connected to /dev/null and using fprintf() with that. /dev/null was always guaranteed to take as much data as you give it, so we would actually get the size from that, then allocate a buffer if necessary.
Keep in mind that not all systems have snprintf() (for example, I understand it's _snprintf() in Microsoft C), so you may have to find the function that does the same job, or revert to the fprintf /dev/null solution.
Also be careful if the data can be changed between the size-checking snprintf() and the actual snprintf() to the buffer (i.e., watch out for threads). If the size increases, you'll get buffer overflow corruption.
If you follow the rule that data, once handed to a function, belongs to that function exclusively until handed back, this won't be a problem.
For what it's worth, asprintf is a GNU extension that manages this functionality. It accepts a pointer as an output argument, along with a format string and a variable number of arguments, and writes back to the pointer the address of a properly-allocated buffer containing the result.
You can use it like so:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h> // free
int main(int argc, char const *argv[])
{
char *hi = "hello"; // these could be really long
char *everyone = "world";
char *message;
asprintf(&message, "%s %s", hi, everyone);
puts(message);
free(message);
return 0;
}
Hope this helps someone!
Take a look at CodeProject: CString-clone Using Standard C++. It uses solution you suggested with enlarging buffer size.
// -------------------------------------------------------------------------
// FUNCTION: FormatV
// void FormatV(PCSTR szFormat, va_list, argList);
//
// DESCRIPTION:
// This function formats the string with sprintf style format-specs.
// It makes a general guess at required buffer size and then tries
// successively larger buffers until it finds one big enough or a
// threshold (MAX_FMT_TRIES) is exceeded.
//
// PARAMETERS:
// szFormat - a PCSTR holding the format of the output
// argList - a Microsoft specific va_list for variable argument lists
//
// RETURN VALUE:
// -------------------------------------------------------------------------
void FormatV(const CT* szFormat, va_list argList)
{
#ifdef SS_ANSI
int nLen = sslen(szFormat) + STD_BUF_SIZE;
ssvsprintf(GetBuffer(nLen), nLen-1, szFormat, argList);
ReleaseBuffer();
#else
CT* pBuf = NULL;
int nChars = 1;
int nUsed = 0;
size_type nActual = 0;
int nTry = 0;
do
{
// Grow more than linearly (e.g. 512, 1536, 3072, etc)
nChars += ((nTry+1) * FMT_BLOCK_SIZE);
pBuf = reinterpret_cast<CT*>(_alloca(sizeof(CT)*nChars));
nUsed = ssnprintf(pBuf, nChars-1, szFormat, argList);
// Ensure proper NULL termination.
nActual = nUsed == -1 ? nChars-1 : SSMIN(nUsed, nChars-1);
pBuf[nActual+1]= '\0';
} while ( nUsed < 0 && nTry++ < MAX_FMT_TRIES );
// assign whatever we managed to format
this->assign(pBuf, nActual);
#endif
}
I've looked for the same functionality you're talking about, but as far as I know, something as simple as the C99 method is not available in C++, because C++ does not currently incorporate the features added in C99 (such as snprintf).
Your best bet is probably to use a stringstream object. It's a bit more cumbersome than a clearly written sprintf call, but it will work.
Since you're using C++, there's really no need to use any version of sprintf. The simplest thing to do is use a std::ostringstream.
std::ostringstream oss;
oss << a << " " << b << std::endl;
oss.str() returns a std::string with the contents of what you've written to oss. Use oss.str().c_str() to get a const char *. It's going to be a lot easier to deal with in the long run and eliminates memory leaks or buffer overruns. Generally, if you're worrying about memory issues like that in C++, you're not using the language to its full potential, and you should rethink your design.
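One small pitfall worth sketching: oss.str() returns the string by value, so store it in a named variable before calling .c_str() on it:
#include <cstdio>
#include <sstream>
#include <string>

int main() {
    std::ostringstream oss;
    oss << 42 << " " << 3.14;
    std::string s = oss.str();     // keep the std::string alive...
    const char* p = s.c_str();     // ...so this pointer remains valid
    std::printf("%s\n", p);
    // Storing oss.str().c_str() in a pointer for later use would dangle:
    // the temporary string is destroyed at the end of that statement.
}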