How to get characters out of stringstream without copy? - c++

What is the proper c++11 way to extract a set of characters out of a stringstream without using boost?
I want to do it without copying, if possible, because where this is used is in a critical data loop. It seems, though, std::string does not allow direct access to the data.
For example, the code below performs a substring copy out of a stringstream:
inline std::string left(std::stringstream ss, uint32_t count) {
char* buffer = new char[count];
ss.get(buffer, count);
std::string str(buffer); // Second copy performed here
delete buffer;
return str;
}
Should I even be using char *buffer according to c++11?
How do I get around making a second copy?
My understanding is that vectors initialize every character, so I want to avoid that.
Also, this needs to be passed into a function which accepts const char *, so now after this runs I am forced to do a .c_str(). Does this also make a copy?
It would be nice to be able to pass back a const char *, but that seems to go against the "proper" c++11 style.
To understand what I am trying to do, here is "effectively" what I want to use it for:
fprintf( stderr, "Data: [%s]...", left(ststream, 255) );
But the c++11 forces:
fprintf( stderr, "Data: [%s]...", left(str_data, 255).c_str() );
How many copies of that string am I making here?
How can I reduce it to only a single copy out of the stringstream?

You could use something like described in this link: How to create a std::string directly from a char* array without copying?
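Basically, create a string, call resize() on it with the size that is passed to your function, and then pass a pointer to the string's first character to the stringstream's read() method. read() is used here rather than get(), because get(buf, n) stores at most n - 1 characters plus a terminating null and also stops at a newline. You will end up with only one copy.
inline std::string left(std::stringstream& ss, uint32_t count) {
std::string str;
str.resize(count);
ss.read(&str[0], count);
str.resize(static_cast<std::size_t>(ss.gcount())); // keep only the characters actually read
return str;
}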
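On the c_str() part of the question: in C++11, c_str() returns a pointer into the string's own storage, so calling it does not make another copy. A minimal sketch of the whole path follows, assuming a hypothetical helper name left_n (not from the thread) and that a short read should simply yield a shorter string:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (the name is an assumption): reads up to `count`
// characters from `in` directly into the string's own storage.
inline std::string left_n(std::istream& in, std::size_t count) {
    std::string str(count, '\0');   // one allocation, zero-filled
    in.read(&str[0], static_cast<std::streamsize>(count));
    // gcount() reports how many characters read() actually stored.
    str.resize(static_cast<std::size_t>(in.gcount()));
    return str;                     // moved or elided, not deep-copied
}
```

A call such as fprintf(stderr, "Data: [%s]...", left_n(ss, 255).c_str()); then involves one copy out of the stream; the temporary string lives until the end of the full expression, so the pointer stays valid for the duration of the fprintf call.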

My suggestion:
Create the std::string to be returned by giving it the size.
Read the characters one by one from the stringstream and set the values in the std::string.
Here's what the function looks like:
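inline std::string left(std::stringstream& ss, uint32_t count) { // streams are not copyable, so take a reference
std::string str(count, '\0');
uint32_t i = 0;
for ( ; i < count; ++i )
{
int c = ss.get(); // std::istream provides get(), not getc()
if ( c != EOF )
{
str[i] = static_cast<char>(c);
}
else
{
break;
}
}
str.resize(i); // drop the unused tail
return str;
}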

R Sahu, this I like! Obvious now that I see it done. ;-)
I do have one mod though (as well as passed a shared_ptr of stream which is what I actually had in my version):
In your initializer, you are filling the whole string with nulls. That isn't needed, because std::string maintains its own terminator, so I propose this tweak:
inline std::string left(std::shared_ptr<std::stringstream> ss, uint32_t count) {
std::string str;
str.reserve(count); // allocate once, but initialize nothing
for(uint32_t i = 0; i < count; ++i) {
int c = ss->get();
if(c != EOF) {
str.push_back(static_cast<char>(c)); // writing str[i] beyond size() would be undefined behavior
} else {
break;
}
}
return str;
}
Now nothing is filled with nulls up front.
Thanks R Sahu!

If the purpose of this function is solely for passing to fprintf or another C-style stream, then you could avoid allocation completely by doing the following:
void left(FILE *out, std::stringstream &in, size_t count)
{
in.seekg(0);
char ch;
while ( count-- && in.get(ch) )
fputc(static_cast<unsigned char>(ch), out); // fputc takes the character first, then the FILE*
}
Usage:
fprintf( stderr, "Data: [" );
left(stderr, stream, 255);
fprintf( stderr, "] ...\n");
Bear in mind that another seekg will be required if you try to use the stream reading functions on the stringstream later; and it would not surprise me if this is the same speed or slower than the options involving str().

Related

sprintf buffer issue, wrong assignment to char array

I have an issue with a sprintf buffer.
As you can see in the code below, I use sprintf to write a file name into buffer so that fopen can check whether a file with that name exists in the folder. If it does, the buffer value is assigned to timecycles[numCycles] and numCycles is incremented. Example: timecycles[0] = "timecyc1.dat". This works well, and as you can see in the console output it recognizes that only timecyc1.dat and timecyc5.dat are in the folder. But when I read timecycles back with a for loop, both entries have the value "timecyc9.dat", even though they should be "timecyc1.dat" for timecycles[0] and "timecyc5.dat" for timecycles[1].
Second, how can I write the code so that readTimecycles() returns the array, so that I could initialize it in main with something like char* timecycles[9] = readTimecycles()?
Console output
#include <iostream>
#include <cstdio>
char* timecycles[9];
void readTimecycles()
{
char buffer[256];
int numCycles = 0;
FILE* pFile = NULL;
for (int i = 1; i < 10; i++)
{
sprintf(buffer, "timecyc%d.dat", i);
pFile = fopen(buffer, "r");
if (pFile != NULL)
{
timecycles[numCycles] = buffer;
numCycles++;
std::cout << buffer << std::endl; //to see if the buffer is correct
}
}
for (int i = 0; i < numCycles; i++)
{
std::cout << timecycles[i] << std::endl; //here's the issue with timecyc9.dat
}
}
int main()
{
readTimecycles();
return 0;
}
With the assignment
timecycles[numCycles] = buffer;
you make all pointers point to the same buffer, since you only have a single buffer.
Since you're programming in C++ you could easily solve your problem by using std::string instead.
If I were to rework your code into something a little more C++-ish and less C-ish, it could look something like
#include <array>
#include <fstream>
#include <string>
std::array<std::string, 9> readTimeCycles()
{
std::array<std::string, 9> timecycles;
for (size_t i = 0; i < timecycles.size(); ++i)
{
// Format the file-name
std::string filename = "timecyc" + std::to_string(i + 1) + ".dat";
std::ifstream file(filename);
if (file)
{
// File was opened okay
timecycles[i] = filename;
}
}
return timecycles;
}
References:
std::array
std::string
std::to_string
std::ifstream
The fundamental problem is that your notion of a string doesn't match what a char array is in C++. In particular, you think that the assignment timecycles[numCycles] = buffer; copies the chars of the array. But in C++ all that is copied is a pointer, so timecycles ends up with multiple pointers to the same buffer. And that's not to mention the problem you will have when you exit the readTimecycles function: at that point you will have multiple pointers to a buffer which no longer exists, because it is destroyed when the function returns.
The way to fix this is to use C++ code that does match your expectations. In particular a std::string will copy in the way you expect it to. Here's how you can change your code to use std::string
#include <string>
std::string timecycles[9];
timecycles[numCycles] = buffer; // now this really does copy a string
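To make the copying behaviour concrete, here is a minimal sketch. The helper name candidateNames is hypothetical, and the file-existence check from the question is left out so that only the copy semantics are shown:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats the nine candidate names into one reused
// char buffer, but stores each as its own std::string, so later
// iterations cannot overwrite earlier entries.
std::vector<std::string> candidateNames() {
    std::vector<std::string> names;
    char buffer[256];
    for (int i = 1; i < 10; ++i) {
        std::snprintf(buffer, sizeof buffer, "timecyc%d.dat", i);
        names.push_back(buffer);   // copies the characters out of buffer
    }
    return names;                  // safe to return: no pointers into buffer remain
}
```

Returning the container by value also covers the second part of the question: main can simply write std::vector<std::string> timecycles = candidateNames();.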

Malformed output when converting string to char* in C++

I've got a function that splits up a string into various sections and then parses them, but when converting a string to char* I get a malformed output.
int parseJob(char * buffer)
{ // Parse raw data, should return individual jobs
const char* p;
int rows = 0;
for (p = strtok( buffer, "~" ); p; p = strtok( NULL, "~" )) {
string jobR(p);
char* job = &jobR[0];
parseJobParameters(job); // At this point, the data is still in good condition
}
return (1);
}
int parseJobParameters(char * buffer)
{ // Parse raw data, should return individual job parameters
const char* p;
int rows = 0;
for (p = strtok( buffer, "|" ); p; p = strtok( NULL, "|" )) { cout<<p; } // At this point, the data is malformed.
return (1);
}
I don't know what happens between the first function calling the second one, but it malforms the data.
As you can see from the code example given, the same method to convert string to char* is used and it works fine.
I'm using Visual Studio 2012/C++, any guidance and code examples will be greatly appreciated.
The "physical" reason your code does not work has nothing to do with std::string or C++. It wouldn't work in pure C either. strtok is a function that stores its intermediate parsing state in some global variable. This immediately means that you cannot use strtok to parse more than one string at a time. Starting the second parse session before finishing the first would override the internal data stored by the first parse session, thus ruining it beyond repair. In other words, strtok parse sessions must not overlap. In your code they do overlap.
Also, in C++03 the idea of using std::string with strtok directly is doomed from the start. The internal sequence stored in std::string is not guaranteed to be null-terminated. This means that generally &jobR[0] is not a C-string. It can't be used with strtok. To convert a std::string to a C-string you have to use c_str(). But C-string returned by c_str() is non-modifiable.
In C++11 the situation is better: c_str() and data() are required to return a pointer p such that p + i == &operator[](i) for every i in [0, size()], so the characters and the terminator are stored contiguously and &jobR[0] does point to a null-terminated sequence. The pointer returned by c_str() or data() is still non-modifiable, so &jobR[0] is the only way to obtain a modifiable buffer for strtok.
You cannot use strtok() to parse multiple strings at the same time, like you are doing. The first call to parseJobParameters() in the first loop iteration of parseJob() will alter the internal buffer that strtok() points to, thus the second loop iteration of parseJob() will not be processing the original data anymore. You need to rewrite your code to not use nested calls to strtok() anymore, eg:
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
void split(std::string s, const char *delims, std::vector<std::string> &vec)
{
// s is a private copy, so strtok may modify it in place (c_str() is const and cannot be used here)
// alternatively, use s.find_first_of() and s.substr() instead...
for (const char* p = std::strtok(&s[0], delims); p != NULL; p = std::strtok(NULL, delims))
{
vec.push_back(p);
}
}
int parseJobParameters(const char * buffer)
{
std::vector<std::string> params;
split(buffer, "|", params);
for (std::vector<std::string>::iterator i = params.begin(); i != params.end(); ++i)
{
std::cout << *i;
}
return (1);
}
int parseJob(const char * buffer)
{
std::vector<std::string> jobs;
split(buffer, "~", jobs); // this strtok session is finished before any parameters are parsed
for (std::vector<std::string>::iterator i = jobs.begin(); i != jobs.end(); ++i)
{
parseJobParameters(i->c_str());
}
return (1);
}
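Whilst char* job = &jobR[0]; will give you the address of the first character in the string, before C++11 it is not guaranteed to give you a valid C-style string. You should use jobR.c_str() to get one. Note that c_str() returns a const char*, so its result cannot be passed to strtok, which modifies its input.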
I'm fairly sure that will solve your problem, but there could of course be something wrong with the way you read the buffer that is passed to parseJob in as well.
Edit: of course, you are also calling strtok from a function that uses strtok. Inside strtok looks a bit like this:
char *strtok(char *str, char *separators)
{
static char *last;
char *found = NULL;
if (!str) str = last;
... do searching for needle, set found to beginning of non-separators ...
if (found)
{
*str = 0; // mark end of string.
}
last = str;
return found;
}
Since "last" gets overwritten when you call parseJobParameters, you can't use strtok(NULL, ... ) when you get back to parseJob.
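Because the trouble comes from strtok's hidden static state, one way around it is to avoid strtok altogether. The following is a minimal sketch using std::string::find. The names splitOn and parseJobs are hypothetical, and it assumes single-character delimiters and that empty fields should be skipped, as strtok does:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on one delimiter character and skips
// empty fields. It keeps no state between calls, so calls can be nested.
std::vector<std::string> splitOn(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t end = s.find(delim, start);
        if (end == std::string::npos) end = s.size();
        if (end > start) parts.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}

// Nested parsing: jobs are separated by '~', parameters by '|'.
std::vector<std::vector<std::string>> parseJobs(const std::string& raw) {
    std::vector<std::vector<std::string>> jobs;
    for (const std::string& job : splitOn(raw, '~'))
        jobs.push_back(splitOn(job, '|'));
    return jobs;
}
```

Since every call works on its own arguments, the inner split cannot disturb the outer one.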

c++ delay printf until needed

In C++, is it possible to write to some kind of buffer with printf (or similar) and then later in the program either write the buffer to the screen or discard it, depending on the outcome?
I want to do this because I have a recursive function and only want to see the things printed throughout the recursion if the result is of interest.
The class std::ostringstream is what you are looking for.
In C++, formatted IO is done (preferably) through the <iostream> library. This is the famous cout << variable << endl.
cout outputs directly to the standard output. If you want to buffer instead, you can redirect your output to a std::ostringstream instance that you can later redirect to the standard out:
#include <iostream>
#include <sstream>
[...]
ostringstream buf;
buf << myVar1 << "MyStr" << endl;
[...] // some time later
cout << buf.str();
If you prefer the printf way of doing things, you can use sprintf (though I won't recommend it). It's a bit more complex because you need to know the size of the buffer in advance.
char myBuf[10000]; // up to you to do the proper bounds checking
sprintf(myBuf, "format %d", myvar);
[...] // you may want to use strcat and such for more complex operations
printf("%s", myBuf); // never pass a data buffer as the format string
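The std::ostringstream approach can be applied to the recursive case from the question roughly as follows. This is a sketch only: factTraced and traceIfInteresting are hypothetical names, and "of interest" is arbitrarily taken to mean that the result exceeds a threshold:

```cpp
#include <sstream>
#include <string>

// Hypothetical example: a recursive factorial that logs each step into
// `trace` instead of printing it immediately.
long factTraced(int n, std::ostringstream& trace) {
    trace << "fact(" << n << ")\n";
    return n <= 1 ? 1 : n * factTraced(n - 1, trace);
}

// Returns the buffered trace when the result exceeds `threshold`
// (an arbitrary stand-in for "of interest"); otherwise discards it.
std::string traceIfInteresting(int n, long threshold) {
    std::ostringstream trace;
    long result = factTraced(n, trace);
    return result > threshold ? trace.str() : std::string();
}
```

The caller can then write std::cout << traceIfInteresting(5, 100); and nothing reaches the screen unless the result was worth showing.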
Certainly. You can leverage the power of vsnprintf for that purpose. I'd suggest some sort of class wrapping an std::string or std::vector<char> (essentially the same in C++11):
#include <cstdarg>
#include <cstdio>
#include <string>
class Formatter
{
std::string buf;
public:
void add(char const * fmt, ...)
{
std::va_list ap, aq;
va_start(ap, fmt);
va_copy(aq, ap);
int ret1 = std::vsnprintf(NULL, 0, fmt, ap);
// check ret1 != -1
std::size_t cur = buf.size();
buf.resize(cur + ret1 + 1);
int ret2 = std::vsnprintf(&buf[cur], ret1 + 1, fmt, aq);
// check ret2 != -1
buf.resize(cur + ret1);
va_end(aq);
va_end(ap);
}
std::string const & str() const { return buf; }
};
Now you can say:
Formatter f;
f.add("Hello, %s", "world");
f.add("%i%i%i", 1, 2, 3);
std::cout << f.str() << std::endl;
If you're very concerned about performance, you can try to preallocate some space for the print operation and maintain a separate "end" position, in the hope that you'll never have to run the vsnprintf call more than once.
What about using a string?
Or a string array, or a collection?
Gather all the data you need and print it only if needed.
You could use the sprintf function, which does the same thing as printf but writes into a char buffer. But you shouldn't: these old C-style functions are discouraged in C++, where streams are preferred. It looks like std::stringstream fits your needs.
For a recursing function, the best way would be to delay getting the result, not printing it, so instead of this:
int fact( int n )
{
printf("%d", n);
if( n!=1 )
return n * fact(n - 1);
else return 1;
};
<....>
fact( 5 );
you might use this:
int fact( int n )
{
if( n!=1 )
return n * fact(n - 1);
else return 1;
};
<....>
int result = fact( 5 );
printf("%d", result);
Basically, print it only when it's ready. If for some reason you can't do that directly, save the result into some kind of buffer variable and access it after the function ends.

Environment Variables are in a char* how to get it to a std::string

I am retrieving the environment variables in win32 using GetEnvironmentStrings(). It returns a char*.
I want to search this string (char pointer) for a specific environment variable (yes, I know I can use GetEnvironmentVariable(), but I am doing it this way because I also want to print all the environment variables on the console as well; I am just fiddling around).
So I thought I would convert the char* to an std::string & use find on it (I know I can also use a c_string find function but I am more concerned about trying to copy a char* into a std::string). But the following code seems to not copy all of the char* into the std::string (it makes me think there is a \0 character in the char* but its not actually the end).
char* a = GetEnvironmentStrings();
string b = string(a, sizeof(a));
printf( "%s", b.c_str() ); // prints =::=
Is there a way to copy a char* into a std::string (I know I can use strcpy() to copy a const char* into a string but not a char*).
You do not want to use sizeof() in this context- you can just pass the value into the constructor. char* trivially becomes const char* and you don't want to use strcpy or printf either.
That's for conventional C-strings- however GetEnvironmentStrings() returns a bit of a strange format and you will probably need to insert it manually.
const char* a = GetEnvironmentStrings();
std::size_t prev = 0;
std::vector<std::string> env_strings;
for(std::size_t i = 0; ; i++) {
if (a[i] == '\0') {
env_strings.push_back(std::string(a + prev, a + i));
prev = i + 1; // the next string starts after this terminator
if (a[i + 1] == '\0') {
break;
}
}
}
for(std::size_t i = 0; i < env_strings.size(); i++) {
std::cout << env_strings[i] << "\n";
}
sizeof(a) in what you have above will return the size of char*, i.e. a pointer (32 or 64bits usually). You were looking for function strlen there. And it's not actually required at all:
std::string b(a);
should be enough to get the first environment variable pair.
The result of GetEnvironmentStrings() points to memory containing all environment strings. Similar to the solution of Puppy it will be put into a vector of string, where each string contains just one environment variable ("key=value")
std::vector<std::string> env_strings;
LPTCH a = GetEnvironmentStrings();
As example we will have 2 environment variables defined:
"A=ABC"
"X=XYZ"
LPTCH a will be:
A=ABC\0X=XYZ\0\0
Each variable is '\0' - terminated and finally the complete environment string (a) will be terminated with an additional '\0'.
strlen will return the size to the first occurrence of the termination character '\0'. The last string will always be empty.
for (std::size_t len = strlen(a); len > 0; len = strlen(a)) // a declaration cannot be wrapped in parentheses inside a while condition
{
env_strings.push_back(std::string(a, len));
a += len + 1;
}
Wide characters
For wide characters it works the same way:
LPWCH env = GetEnvironmentStringsW();
LPWCH a = env;
std::vector<std::wstring> env_strings;
for (std::size_t len = wcslen(a); len > 0; len = wcslen(a))
{
env_strings.push_back(std::wstring(a, len));
a += len + 1;
}
FreeEnvironmentStringsW(env); // free the original pointer, not the advanced one
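The double-null-terminated layout described above can be tried out without the Windows API. Here is a minimal sketch, assuming a narrow-character block and a hypothetical helper name splitEnvBlock:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a block laid out as "A=1\0B=2\0\0" into
// separate strings. `block` must end with an empty string, i.e. two
// consecutive '\0' characters (or a single '\0' for an empty block).
std::vector<std::string> splitEnvBlock(const char* block) {
    std::vector<std::string> vars;
    for (const char* p = block; *p != '\0'; ) {
        std::size_t len = std::strlen(p);
        vars.push_back(std::string(p, len));
        p += len + 1;   // skip past this string's terminator
    }
    return vars;
}
```

With the real API, the pointer returned by GetEnvironmentStrings() would be passed in and then released with FreeEnvironmentStrings() once the vector has been built.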
Does the following cause any problems?
char* a = GetEnvironmentStrings();
string b;
b=a;
printf( "%s", b.c_str() );
When you say:
string b = string(a, sizeof(a));
you are getting the size of a, which is a pointer and is probably 4. So you will get the first 4 characters. I'm not sure what you are really trying to do, but you should be able just to say:
string b( a );
char* a = ...;
string str(a);
string b;
b = a;
I assume you mean the Windows API GetEnvironmentStrings function. So, test the result against nullptr and perform simple assignment:
char* env = ::GetEnvironmentStrings();
if (0 != env)
{
std::string senv = env;
// use senv to find variables
}
else
{
// report problem or ignore
}

Can getline() be used to get a char array from a fstream

I want to add a new (fstream) function in a program that already uses char arrays to process strings.
The problem is that the code below yields strings, and the only way I can think of to get this to work would be an intermediary function that copies the strings, char by char, into a new char array, passes these on to the functions in the program, gets back the results, and then copies the results char by char back into the string.
Surely (hopefully) there must be a better way?
Thanks!
void translateStream(ifstream &input, ostream& cout) {
string inputStr;
string translated;
getline(input, inputStr, ' ');
while (!input.eof()) {
translateWord(inputStr, translated);
cout << translated;
getline(input, inputStr, ' ');
}
cout << inputStr;
the translateWord func:
void translateWord(char orig[], char pig[]) {
bool dropCap = false;
int len = strlen(orig)-1;
int firstVowel = findFirstVowel(orig);
char tempStr[len];
strcpy(pig, orig);
if (isdigit(orig[0])) return;
//remember if dropped cap
if (isupper(orig[0])) dropCap = true;
if (firstVowel == -1) {
strcat(pig, "ay");
// return;
}
if (isVowel(orig[0], 0, len)) {
strcat(pig, "way");
// return;
} else {
splitString(pig,tempStr,firstVowel);
strcat(tempStr, pig);
strcat(tempStr, "ay");
strcpy(pig,tempStr);
}
if (dropCap) {
pig[0] = toupper(pig[0]);
}
}
You can pass a string as the first parameter to translateWord by making the first parameter a const char *. Then you call the function with inputStr.c_str() as the first parameter. To deal with the second (output) parameter, though, you need to either completely rewrite translateWord to use std::string (the best solution, IMHO), or pass a suitably sized array of char as the second parameter.
Also, what you have posted is not actually C++ - for example:
char tempStr[len];
is not supported by C++ - it is an extension of g++, taken from C99.
You can use the member function ifstream::getline. It takes a char* buffer as the first parameter, and a size argument as the second.