Getting address of string in stringstream object - c++

Is it possible to get the pointer to the string stored in string stream object . Basically i want to copy the string from that location into another buffer ..
I found that i can get the length from below code
myStringStreamObj.seekg(0, ios::end);
lengthForStringInmyStringStreamObj = myStringStreamObj.tellg();
I know i can always domyStringStreamObj.str().c_str(). However my profiler tells me this code is taking time and i want to avoid it . hence i need some good alternative to get pointer to that string.
My profiler also tells me that another part of code where is do myStringStreamObj.str(std::string()) is slow too . Can some one guide me on this too .
Please , I cant avoid stringstream as its part of a big code which i cant change / dont have permission to change .

The answer is "no". The only documented API to obtain the formatted string contents is the str() method.
Of course, there's always a small possibility that whatever the compiler or platform you're using might have its own specific non-standard and/or non-documented methods for accessing the internals of a stringstream object; which might be faster. Because your question did not specify any particular compiler or implementation, I must conclude that you are looking for a portable, standards-compliant answer; so the answer in that case is a pretty much a "no".
I would actually be surprised if any particular compiler or platform, even some who might have a "reputation" for poisonings language standards <cough>, would offer any alternatives. I would expect that all implementations would prefer to keep the internal stringstream moving gears private, so that they can be tweaked and fiddled with, in future releases, without breaking binary ABI compatibility.
The only other possibility you might want to investigate is to obtain the contents of the stringstream using its iterators. This is, of course, an entirely different mechanism for pulling out what's in a stringstream; and is not as straightforward as just calling one method that hands you the string, on a silver platter; so it's likely to involve a fairly significant rewrite of your code that works with the returned string. But it is possible that iterating over what's in the stringstream might turn out to be faster since, presumably, there will not be any need for the implementation to allocate a new std::string instance just for str()'s benefit, and jamming everything inside it.
Whether or not iterating will be faster in your case depends on how your implementation's stringstream works, and how efficient the compiler is. The only way to find out is to go ahead and do it, then profile the results.

I can not provide a portable, standards compliant method.
Although you can't get the internal buffer you can provide your own.
According to the standard setting the internal buffer of a std::stringbuf object has implementation defined behaviour.
cplusplus.com: std::stringbuf::setbuf()
As it happens in my implementation of GCC 4.8.2 the behaviour is to use the external buffer you provide instead if its internal std::string.
So you can do this:
int main()
{
std::ostringstream oss;
char buf[1024]; // this is where the data ends up when you write to oss
oss.rdbuf()->pubsetbuf(buf, sizeof(buf));
oss << "Testing" << 1 << 2 << 3.2 << '\0';
std::cout << buf; // see the data
}
But I strongly advise that you only do stuff like this as a (very) temporary measure while you sort out something more efficient that is portable according to the standard.
Having said all that I looked at how my implementation implements std::stringstream::str() and it basically returns its internal std::string so you get direct access and with optimization turned on the function calls should be completely optimized away. So this should be the preferred method.

Related

Function to call #include macro from a string variable argument?

Is it possible to have a function like this:
const char* load(const char* filename_){
return
#include filename_
;
};
so you wouldn't have to hardcode the #include file?
Maybe with a some macro?
I'm drawing a blank, guys. I can't tell if it's flat out not possible or if it just has a weird solution.
EDIT:
Also, the ideal is to have this as a compile time operation, otherwise I know there's more standard ways to read a file. Hence thinking about #include in the first place.
This is absolutely impossible.
The reason is - as Justin already said in a comment - that #include is evaluated at compile time.
To include files during run time would require a complete compiler "on board" of the program. A lot of script languages support things like that, but C++ is a compiled language and works different: Compile and run time are strictly separated.
You cannot use #include to do what you want to do.
The C++ way of implementing such a function is:
Find out the size of the file.
Allocate memory for the contents of the file.
Read the contents of the file into the allocated memory.
Return the contents of the file to the calling function.
It will better to change the return type to std::string to ease the burden of dealing with dynamically allocated memory.
std::string load(const char* filename)
{
std::string contents;
// Open the file
std::ifstream in(filename);
// If there is a problem in opening the file, deal with it.
if ( !in )
{
// Problem. Figure out what to do with it.
}
// Move to the end of the file.
in.seekg(0, std::ifstream::end);
auto size = in.tellg();
// Allocate memory for the contents.
// Add an additional character for the terminating null character.
contents.resize(size+1);
// Rewind the file.
in.seekg(0);
// Read the contents
auto n = in.read(contents.data(), size);
if ( n != size )
{
// Problem. Figure out what to do with it.
}
contents[size] = '\0';
return contents;
};
PS
Using a terminating null character in the returned object is necessary only if you need to treat the contents of the returned object as a null terminated string for some reason. Otherwise, it maybe omitted.
I can't tell if it's flat out not possible
I can. It's flat out not possible.
Contents of the filename_ string are not determined until runtime - the content is unknown when the pre processor is run. Pre-processor macros are processed before compilation (or as first step of compilation depending on your perspective).
When the choice of the filename is determined at runtime, the file must also be read at runtime (for example using a fstream).
Also, the ideal is to have this as a compile time operation
The latest time you can affect the choice of included file is when the preprocessor runs. What you can use to affect the file is a pre-processor macro:
#define filename_ "path/to/file"
// ...
return
#include filename_
;
it is theoretically possible.
In practice, you're asking to write a PHP construct using C++. It can be done, as too many things can, but you need some awkward prerequisites.
a compiler has to be linked into your executable. Because the operation you call "hardcoding" is essential for the code to be executed.
a (probably very fussy) linker again into your executable, to merge the new code and resolve any function calls etc. in both directions.
Also, the newly imported code would not be reachable by the rest of the program which was not written (and certainly not compiled!) with that information in mind. So you would need an entry point and a means of exchanging information. Then in this block of information you could even put pointers to code to be called.
Not all architectures and OSes will support this, because "data" and "code" are two concerns best left separate. Code is potentially harmful; think of it as nitric acid. External data is fluid and slippery, like glycerine. And handling nitroglycerine is, as I said, possible. Practical and safe are something completely different.
Once the prerequisites were met, you would have two or three nice extra functions and could write:
void *load(const char* filename, void *data) {
// some "don't load twice" functionality is probably needed
void *code = compile_source(filename);
if (NULL == code) {
// a get_last_compiler_error() would be useful
return NULL;
}
if (EXIT_SUCCESS != invoke_code(code, data)) {
// a get_last_runtime_error() would also be useful
release_code(code);
return NULL;
}
// it is now the caller's responsibility to release the code.
return code;
}
And of course it would be a security nightmare, with source code left lying around and being imported into a running application.
Maintaining the code would be a different, but equally scary nightmare, because you'd be needing two toolchains - one to build the executable, one embedded inside said executable - and they wouldn't necessarily be automatically compatible. You'd be crying loud for all the bugs of the realm to come and rejoice.
What problem would be solved?
Implementing require_once in C++ might be fun, but you thought it could answer a problem you have. Which is it exactly? Maybe it can be solved in a more C++ish way.
A better alternative, considering also performances etc., to compile a loadable module beforehand, and load it at runtime.
If you need to perform small tunings to the executable, place parameters into an external configuration file and provide a mechanism to reload it. Once the modules conform to a fixed specification, you can even provide "plugins" that weren't available when the executable was first developed.

What is the right way to parse large amount of input for main function in C++

Suppose there are 30 numbers I had to input into an executable, because of the large amount of input, it is not reasonable to input them via command line. One standard way is to save them into a single XML file and use XML parser like tinyxml2 to parse them. The problem is if I use tinyxml2 to parse the input directly I will have a very bloated main function, which seems to contradict the common good practice.
For example:
int main(int argc, char **argv){
int a[30];
tinyxml2::XMLDocument doc_xml;
if (doc_xml.LoadFile(argv[1])){
std::cerr << "failed to load input file";
}
else {
tinyxml2::XMLHandle xml(&doc_xml);
tinyxml2::XMLHandle a0_xml =
xml.FirstChildElement("INPUT").FirstChildElement("A0");
if (a0_xml.ToElement()) {
a0_xml.ToElement()->QueryIntText(&a[0]);
}
else {
std::cerr << "A0 missing";
}
tinyxml2::XMLHandle a1_xml =
xml.FirstChildElement("INPUT").FirstChildElement("A1");
if (a1_xml.ToElement()) {
a1_xml.ToElement()->QueryIntText(&a[1]);
}
else {
std::cerr << "A1 missing";
}
// parsing all the way to A29 ...
}
// do something with a
return 0;
}
But on the other hand, if I write an extra class just to parse these specific type of input in order to shorten the main function, it doesn't seem to be right either, because this extra class will be useless unless it's used in conjunction with this main function since it can't be reused elsewhere.
int main(int argc, char **argv){
int a[30];
ParseXMLJustForThisExeClass ParseXMLJustForThisExeClass_obj;
ParseXMLJustForThisExeClass_obj.Run(argv[1], a);
// do something with a
return 0;
}
What is the best way to deal with it?
Note, besides reading XML files you can also pass lots of data through stdin. It's pretty common practice to use e.g. mycomplexcmd | hexdump -C, where hexdump is reading from stdin through the pipe.
Now up to the rest of the question: there's a reason to go with the your multiple-functions example (here it's not very important whether they're constructors or usual functions). It's pretty much the same as why would you want any function to be smaller — readability. That said, I don't know about the "common good practice", and I've seen many terminal utilities with very big main().
Imagine someone new is reading 1-st variant of main(). They'd be going through the hoops of figuring out all these handles, queries, children, parents — when all they wanted is to just look at the part after // do something with a. It's because they don't know if it's relevant to their problem or not. But in the 2-nd variant they'll quickly figure it out "Aha, it's the parsing logic, it's not what I am looking for".
That said, of course you can break the logic with detailed comments. But now imagine something went wrong, someone is debugging the code, and they pinned down the problem to this function (alright, it's funny given the function is main(), maybe they just started debugging). The bug turned out to be very subtle, unclear, and one is checking everything in the function. Now, because you're dealing with mutable language, you'd often find yourself in situation where you think "oh, may be it's something with this variable, where it's being changed?"; and you first look up every use of the variable through this large function, then conditions that could lead to blocks where it's changed; then you figuring out what does this another big block, relevant to the condition, that could've been extracted to a separate function, what variables are used in there; and to the moment you figured out what it's doing you already forgot half of what you were looking before!
Of course sometimes big functions are unavoidable. But if you asked the question, it's probably not your case.
Rule of thumb: you see a function doing two different things having little in common, you want to break it to 2 separate functions. In your case it's parsing XML and "doing something with a". Though if that 2-nd part is a few lines, probably not worth extracting — speculate a bit. Don't worry about the overhead, compilers are good at optimizing. You can either use LTO, or you can declare a function in .cpp file only as static (non-class static), and depending on optimization options a compiler may inline the code.
P.S.: you seem to be in the state where it's very useful to learn'n'play with some Haskell. You don't have to use it for real serious projects, but insights you'd get can be applied anywhere. It forces you into better design, in particular you'd quickly start feeling when it's necessary to break a function (aside of many other things).

std::stringstream with direct output buffer / string result access, avoiding copy?

Is there a canonical / public / free implementations variant of std::stringstream where I don't pay for a full string copy each time I call str()? (Possibly through providing a direct c_str() member in the osteam class?)
I've found two questions here:
C++ stl stringstream direct buffer access (Yeah, it's basically the same title, but note that it's accepted answer doesn't fit this here question at all.)
Stream from std::string without making a copy? (Again, accepted answer doesn't match this question.)
And "of course" the deprecated std::strstream class does allow for direct buffer access, although it's interface is really quirky (apart from it being deprecated).
It also seems one can find several code samples that do explain how one can customize std::streambuf to allow direct access to the buffer -- I haven't tried it in practice, but it seems quite easily implemented.
My question here is really two fold:
Is there any deeper reason why std::[o]stringstream (or, rather, basic_stringbuf) does not allow direct buffer access, but only access through an (expensive) copy of the whole buffer?
Given that it seems easy, but not trivial to implement this, is there any varaint available via boost or other sources, that package this functionality?
Note: The performance hit of the copy that str() makes is very measurable(*), so it seems weird to have to pay for this when the use cases I have seen so far really never need a copy returned from the stringstream. (And if I'd need a copy I could always make it at the "client side".)
(*): With our platform (VS 2005), the results I measure in the release version are:
// tested in a tight loop:
// variant stream: run time : 100%
std::stringstream msg;
msg << "Error " << GetDetailedErrorMsg() << " while testing!";
DoLogErrorMsg(msg.str().c_str());
// variant string: run time: *** 60% ***
std::string msg;
((msg += "Error ") += GetDetailedErrorMsg()) += " while testing!";
DoLogErrorMsg(msg.c_str());
So using a std::string with += (which obviously only works when I don't need custom/number formatting is 40% faster that the stream version, and as far as I can tell this is only due to the complete superfluous copy that str() makes.
I will try to provide an answer to my first bullet,
Is there any deeper reason why std::ostringstream does not allow direct buffer access
Looking at how a streambuf / stringbuf is defined, we can see that the buffer character sequence is not NULL terminated.
As far as I can see, a (hypothetical) const char* std::ostringstream::c_str() const; function, providing direct read-only buffer access, can only make sense when the valid buffer range would always be NULL terminated -- i.e. (I think) when sputc would always make sure that it inserts a terminating NULL after the character it inserts.
I wouldn't think that this is a technical hindrance per se, but given the complexity of the basic_streambuf interface, I'm totally not sure whether it is correct in all cases.
As for the second bullet
Given that it seems easy, but not trivial to implement this, is there
any variant available via boost or other sources, that package this
functionality?
There is Boost.Iostreams and it even contains an example of how to implement an (o)stream Sink with a string.
I came up with a little test implementation to measure it:
#include <string>
#include <boost/iostreams/stream.hpp>
#include <libs/iostreams/example/container_device.hpp> // container_sink
namespace io = boost::iostreams;
namespace ex = boost::iostreams::example;
typedef ex::container_sink<std::wstring> wstring_sink;
struct my_boost_ostr : public io::stream<wstring_sink> {
typedef io::stream<wstring_sink> BaseT;
std::wstring result;
my_boost_ostr() : BaseT(result)
{ }
// Note: This is non-const for flush.
// Suboptimal, but OK for this test.
const wchar_t* c_str() {
flush();
return result.c_str();
}
};
In the tests I did, using this with it's c_str()helper ran slightly faster than a normal ostringstream with it's copying str().c_str() version.
I do not include measuring code. Performance in this area is very brittle, make sure to measure your use case yourself! (For example, the constructor overhead of a string stream is non-negligible.)

Basic C++ Design

I have a question re my C++ class design. Often I face little issues like this and I'd like some advice as to what is better accepted.
I have some class which monitors the temperature of some device over UDP. If the device receives a packet of data, it is to print "x\n" to stdout to show that it received something. Then it is to check the data in that packet and verify that the data doesn't show that the temperature of the device is too high. If it is too high, I must call some function. If it isn't, I must call some other function.
I'm not sure if I should do this:
enum temperature {TEMPERATURE_FINE, TEMPERATURE_EXCEEDED};
int main(int argc, char* argv[])
{
std::vector<std::string> args(argv+1, argv + argc);
if(!args.size())
cout << "No parameters entered.\n";
else
{
CTemperatureMonitor tempMonitor(args);
if(tempMonitor.MonitorTemperature() == TEMPERATURE_EXCEEDED)
tempMonitor.ActivateAlarm();
else
tempMonitor.DisableAlarm();
}
return 0;
}
where tempMonitor.MonitorTemperature() calls std::cout << "x\n". So std::cout << "x\n" is built into the class.
Or:
enum temperature {TEMPERATURE_FINE, TEMPERATURE_EXCEEDED};
int main(int argc, char* argv[])
{
std::vector<std::string> args(argv+1, argv + argc);
if(!args.size())
cout << "No parameters entered.\n";
else
{
CTemperatureMonitor tempMonitor(args);
temperature tempExceeded = tempMonitor.MonitorTemperature();
std::cout << "x\n";
if(tempExceeded == TEMPERATURE_EXCEEDED)
tempMonitor.ActivateAlarm();
else
tempMonitor.DisableAlarm();
}
return 0;
}
where std::cout << "x\n" is not included in the class.
The std::cout << "x\n" must occur before calling CTemperatureMonitor::ActivateAlarm() and CTemperatureMonitor::DisableAlarm().
I know this might seem really minor and simplistic, but I'm often wondering what exactly should be part of the class. Should the class be outputting to stdout? Does it make any difference whether I do one or the other? Am I being to pedantic about this?
Also, as an aside, I know global variable are considered poor practise. I use the temperature enum in both the main and the class. Should I declare it twice, once in main and once in the CTemperatureMonitor class, or once globally? Although this question seems rather specific, it would actually clear a whole lot of other things up for me.
Thanks.
First off, I'd like to point out that there are various size of projects, and that depending on the size (and criticity), the advices will actually be different. So first a rule of thumb:
The size of the "framework" that you put in place (Logger, Option Parser, etc...) should probably not exceed 10% of the total program. After this point, it's just overkill. Unless it's the goal of the exercise!
That being said, we can start have a look at your actual questions.
Also, as an aside, I know global variable are considered poor practice. I use the temperature enum in both the main and the class. Should I declare it twice, once in main and once in the CTemperatureMonitor class, or once globally?
You are actually mistaking variables and types here. temperature is a type (of enum kind).
In general, Types are used as bridges between various parts of the program, and to do so it is important that all those parts share the same definition of the type. Therefore, for types, it would be bad practice to actually declare it twice.
Furthermore, not all globals are evil. Global variables are (shared state), but global constants are fine, and usually play a role similar to types.
I know this might seem really minor and simplistic, but I'm often wondering what exactly should be part of the class. Should the class be outputting to stdout? Does it make any difference whether I do one or the other? Am I being to pedantic about this?
There are two kinds of output:
logging output, which is used to diagnose issues when they are encountered
real output, which is what the program does
Depending on programs, you might have either, both or none.
From a pedantic point of view, you would generally prefer not mix those. For example, you could perfectly send the logging to a file, or stderr when it's serious, and use stdout for the "useful" stuff.
This actually drives the design somewhat, as then you need two sinks: one for each output.
Being that you have a quite simplistic program, the easiest way might be to simply pass two different std::ostream& to your class upon construction. Or, even simpler, just have two generic functions and use the (evil) global variables.
In larger program, you would probably design a Logger class that would have various log levels and provide specific macros to register (automatically) the function name, file name and line number of the log line. You would also probably have a lightweight logging mechanism that would allow you to disable logging DEBUG/DEV level traces in the Release builds.
The single level of abstraction principle would favor doing all I/O in the same method instead of doing some at a high level and some at a low level of abstraction.
In other words, if you believe in that principle, keeping both input and output via cin/cout in the same method instead of showing some and hiding some is a good idea. It tends to give more readable code with less dependencies in each class.
According to the Single responsibility principle, the second option is preferred (any class should have exactly one responsibility, which is monitoring the temperature in your case, not outputting the results,) though you might want to set up another class to handle the temperature monitoring results (e.g. to write the results to a certain log file or something).
It's normal to have some way of logging information about your program.
It's also normal for this way to be, well, global.
Otherwise things just get too complicated when calling a method.
The only thing you should do to improve this is:
Have a logger class (or use an existing one) that can have its output stream set to whatever you choose (including std::out and an empty stream that doesn't print anything).
Ultimately, have a logger than can be hidden behind a #define so that it does not slow down code running in Release mode.
In my opinion in such cases you should not include cout into class, as maybe sometimes later you will need to output to the file or don't output at all. So if the cout is not put to the class you will have opportunity to reuse your class code without any changes.
Neither. Responsibility of main is to start the application with the command line arguments and to give a return value. Everything else should not be there. You might want to take a look at a book like "Object Design, Roles, Responsibilities, and Collaborations"

Issue with using std::copy

I am getting warning when using the std copy function.
I have a byte array that I declare.
byte *tstArray = new byte[length];
Then I have a couple other byte arrays that are declared and initialized with some hex values that i would like to use depending on some initial user input.
I have a series of if statements that I use to basically parse out the original input, and based on some string, I choose which byte array to use and in doing so copy the results to the original tstArray.
For example:
if(substr1 == "15")
{
std::cout<<"Using byte array rated 15"<<std::endl;
std::copy(ratedArray15,ratedArray15+length,tstArray);
}
The warning i get is
warning C4996: 'std::copy': Function call with parameters
that may be unsafe
- this call relies on the caller to check that the passed
values are correct.
A possible solution is to to disable this warning is by useing -D_SCL_SECURE_NO_WARNINGS, I think. Well, that is what I am researching.
But, I am not sure if this means that my code is really unsafe and I actually needed to do some checking?
C4996 means you're using a function that was marked as __declspec(deprecated). Probably using D_SCL_SECURE_NO_WARNINGS will just #ifdef out the deprecation. You could go read the header file to know for sure.
But the question is why is it deprecated? MSDN doesn't seem to say anything about it on the std::copy() page, but I may be looking at the wrong one. Typically this was done for all "unsafe string manipulation functions" during the great security push of XPSP2. Since you aren't passing the length of your destination buffer to std::copy, if you try to write too much data to it it will happily write past the end of the buffer.
To say whether or not your usage is unsafe would require us to review your entire code. Usually there is a safer version they recommend when they deprecate a function in this manner. You could just copy the strings in some other way. This article seems to go in depth. They seem to imply you should be using a std::checked_array_iterator instead of a regular OutputIterator.
Something like:
stdext::checked_array_iterator<char *> chkd_test_array(tstArray, length);
std::copy(ratedArray15, ratedArray15+length, chkd_test_array);
(If I understand your code right.)
Basically, what this warning tells you is that you have to be absolutely sure that tstArray points to an array that is large enough to hold "length" elements, as std::copy does not check that.
Well, I assume Microsoft's unilateral deprecation of the stdlib also includes passing char* to std::copy. (They've messed with a whole range of functions actually.)
I suppose parts of it has some merit (fopen() touches global ERRNO, so it's not thread-safe) but other decisions do not seem very rational. (I'd say they took a too big swathe at the whole thing. There should be levels, such as non-threadsafe, non-checkable, etc)
I'd recommend reading the MS-doc on each function if you want to know the issues about each case though, it's pretty well documented why each function has that warning, and the cause is usually different in each case.
At least it seems that VC++ 2010 RC does not emit that warning at the default warning level.