What is the best way to find wide string headaches such as L"%s"? - c++

Here is an example of one of the headaches I mean:
We have a multiplatform project that uses mostly Unicode strings for rendering text to the screen. On windows in VC++ the line:
swprintf(swWideDest, LEN, L"%s is a wide string", swSomeWideString);
compiles fine and prints the wide string into the other wide string.
However, this should really be:
swprintf(swWideDest, LEN, L"%ls is a wide string", swSomeWideString);
Without replacing the '%s' with a '%ls' this will not work on other platforms. Because testing in our environment on Windows is easier, quicker, and far simpler to debug, this kind of bug can easily go unnoticed.
I know that the best solution is to write correct code in the first place, but under pressure simple mistakes are made, and in this particular case, the mistake can easily go unnoticed for a long time.
I suspect there are many variations on this sort of bug that we are yet to enjoy.
Does anyone have a nice and neat way of finding this kind of bug?
: D

You might want to have a look at FastFormat in case Boost.Format is too slow for your needs.
Compared to stringstreams and Boost.Format:
IOStreams: FastFormat.Format is faster than IOStreams, by between ~100-900%, in all cases
Boost.Format: FastFormat.Format is faster than Boost.Format, by between ~400-1650%, in all cases

As none of the functions of the *printf family are typesafe, you can either
search for probable errors via regular expressions and fix them manually, or
use another approach that is typesafe, maybe based on stringstreams or boost.format.
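As a rough sketch of the regular-expression route, the helper below flags source lines where a wide string literal contains a bare %s. The function name findSuspectWideFormats and the pattern are assumptions made for illustration. It is a heuristic only: it ignores comments, raw strings, literals split across lines, and conversions with flags or widths such as %-10s.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based numbers of the lines that
// contain a wide string literal (L"...") with a bare %s inside it.
// Hits are candidates for manual review, not confirmed bugs.
std::vector<std::size_t> findSuspectWideFormats(const std::vector<std::string>& lines) {
    // L" followed by non-quote characters (or escaped characters), then %s.
    // %ls does not match because the 'l' sits between '%' and 's'.
    static const std::regex suspect(R"re(L"(?:[^"\\]|\\.)*%s)re");
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (std::regex_search(lines[i], suspect)) hits.push_back(i + 1);
    }
    return hits;
}
```

Feeding it the two swprintf lines from the question should flag only the first one. A plain grep with a similar pattern would do the same job from the command line.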

Related

C++ sprintf boxing... how?

I want to make my own string formatting function in C++ and I'd like to have the function figure out how big types are on its own without specifying it, like sprintf. Does it work by cheating or is it something I can do as well?
Output of objects to strings should probably be done with the regular stringstream stuff, so that you can use all the wonderful features that exist in C++ without reinventing the wheel.
Quite a lot of time has been spent getting that working well, and I'm not sure about the rationale for repeating that effort. Perhaps if you told us the real requirement for bypassing what's already there (and "I just want to do it myself" is rarely a good reason), we could come up with more targeted or applicable answers.
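For what it's worth, sprintf does not work out the types itself: the format string tells it what to read from its variable arguments. A stringstream-based function can let the compiler pick the right operator<< for each argument instead. Here is a minimal sketch, assuming a C++11 compiler; the names appendAll and concat are made up for this example.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void appendAll(std::ostringstream&) {}

// Appends the first argument with operator<<, then recurses on the rest.
// Overload resolution selects the right operator<< per type, so the
// caller never states a size or a type specifier.
template <typename T, typename... Rest>
void appendAll(std::ostringstream& out, const T& first, const Rest&... rest) {
    out << first;
    appendAll(out, rest...);
}

// Hypothetical helper: concatenates any streamable values into a string.
template <typename... Args>
std::string concat(const Args&... args) {
    std::ostringstream out;
    appendAll(out, args...);
    return out.str();
}
```

A call such as concat("x=", 42, ", y=", 1.5) builds the text without any format specifiers. Any user-defined type with an operator<< overload works the same way.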

Most efficient way to write to the console?

I want to make an iostream type class. I would like to find the most efficient way to write a set of characters to the screen.
Ideas:
printf - I don't want the type formatting; I need to do that myself.
WriteConsole - I read that it is slower than printf. True or false?
*Assembly - I don't know how.
Other?
*My main concern is whether I could find out how to do it. I'm not in any rush as far as time goes.
EDIT: for some reason WriteConsole is slower.
Use "fwrite":
fwrite( buffer, size, 1, stderr );
This will be much faster than you will ever need. And you have a bonus that you can then make your iostream class be able to write not just to the console but to files too.
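A small wrapper along those lines might look like the sketch below. The name writeChars is an assumption for illustration. Unlike the call above, it passes an element size of 1 so that the return value of fwrite is a byte count that can be checked.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `size` bytes from `buffer` to `stream`
// with no formatting. Returns true only if every byte was accepted.
bool writeChars(std::FILE* stream, const char* buffer, std::size_t size) {
    // With element size 1 and count `size`, fwrite returns the number
    // of bytes actually written.
    return std::fwrite(buffer, 1, size, stream) == size;
}
```

Because the stream is a parameter, the same function serves stdout, stderr, or a file opened with fopen.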
I would suggest trying a few methods (you've mentioned a couple there) and benchmarking the results. You may be surprised by your results, but even if they're as you expect, you can at least be certain you're doing the best you can. For the record though, I would be surprised if you find anything much faster than printf.
The most pragmatic way to code (in my experience) goes along these lines:
1. Get something that functionally performs.
2. Set up a benchmark to test whether your solution is fast enough.
3. If it's not fast enough, try something else, then go back to 2.
4. If it's fast enough, you're done!
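The benchmark step could start from a timing helper like this sketch. The name averageMicros is hypothetical, and a serious benchmark would also need warm-up runs and repeated measurements.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the
// average wall-clock time per call in microseconds.
template <typename Work>
double averageMicros(Work work, int iterations) {
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) work();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

Passing a lambda that calls printf and another that calls fwrite with the same buffer gives two numbers to compare on the target machine.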
From your question, it sounds like you've not even started designing or coding. Beware premature optimisation...
I found that on Windows, WriteConsoleOutputCharacter() averages about the same as fwrite() to stdout, and requires one fewer header to include if you're not otherwise using <stdio.h>. Both are very fast, though. I did not test FillConsoleOutputCharacter(), and I probably didn't use that great a benchmark either. As for premature optimization, I had to tackle this problem first when creating a cool little library for the console window that more or less turned it into a windows-based environment with an overarching system managing it. I used this system for college and personal text-based games. For logging and similar behaviour, using cout and friends does the job just as well, despite being slow(er).

HSP to C++: Language conversion of a large codebase

I have a large codebase written in HSP (Wikipedia article - think "BASIC", but Japanese).
By "large" I mean it has 151352 lines of code, 60 source files with total code size of 4.5 megabytes. Also, it has plenty of spaghetti code, no comments and badly needs refactoring. The good thing is that it has a lot of text messages, so not all of those lines represent actual program logic.
I'd like to convert this codebase to C++, while retaining my sanity. "I'd like" means that I'm not required to do it, but I'd strongly prefer to find a method to do it.
What's a good way to do it? Obviously, I can't just rewrite it all in C++ (well, I could do it in theory, but it would take up to 2 years, and I would introduce many bugs in the process), so (I think) a reasonable decision would be to implement a code recompiler/preprocessor that would allow me to convert the source code into messy C++ (HSP is much simpler than C++, so it should be possible) and then start refactoring/documenting the result.
Unfortunately, I'm not entirely sure how to approach building the recompiler efficiently. While I know there are Lex/Yacc/Bison/Boost::spirit, I haven't used them personally.
So can you recommend a good way to perform such a conversion?
Any free tool ("free" as in "free beer") that is available on the Windows platform is allowed, as long as it doesn't affect the license of the original source code.
Yacc is targeted at efficiently handling more complex tasks, and it's complex to learn; I think it's overkill.
Spirit should be a better choice. If you already know it, go with it; personally I would use Prolog for this task.
Prolog has built-in syntax analysis, the so-called DCG. For a language as simple as Basic, I'm pretty sure there are no practical problems in the grammar, and modern Prologs (I'm thinking of SWI-Prolog, specifically) can handle complex character encodings in the source very well.
Also, in Prolog you could try a naive approach to unrolling the spaghetti code. Doing it in general is a complex task, but it could be easy if you have just a small number of patterns, repeated many times.
Pattern matching is key in such problems...
Well, if you really want to go this way and forget about the advice in the comments, you should probably have a good look at the openhsp compiler, and mostly the codegen file:
http://dev.onionsoft.net/trac/browser/trunk/hspcmp/codegen.cpp
and also keep the token definitions in front of you:
http://dev.onionsoft.net/trac/browser/trunk/hspcmp/token.h
http://dev.onionsoft.net/trac/browser/trunk/hspcmp/token.cpp
It seems that HSP is not that complicated, and you can skip the AST step, though you could get good optimizations out of it. Don't forget also to prepare a C++ lib to embed your generated code in, so you can manage HSP oddities (like globals and dynamic typing).
If you can hack something out of that, you'll also have to remove most of what this compiler does (creating the executable, linkage and so on). Don't forget, it's a really long and hard task that may not be faster or easier than a full rewrite. But if you're ready, you'll find it out the hard way :)
According to the original owner of the codebase, HSP starting with version 3 includes an HSP-to-C code converter. This information is not verified due to lack of time, but this blog article documents a tool called hspcnv which is supposed to convert HSP code into C code. The article is in Japanese.

Searching a data file: coding in python vs c++

First off, this isn't a homework assignment!!! :p What I want to do is this:
Given a data file (be it text or numbers), e.g. one saved on the desktop, I want to be able to search that file, pull out only the data I want, and print it to the screen. I may want to do other stuff with it but I have no idea what options there are.
Also, would Python or C++ be more appropriate? I'm not very familiar with Python and it's been years since I've picked up C++, but I've heard that Python is more efficient, and although this program's efficiency may or may not be a big deal, I have heard Python is much easier to understand.
Examples, code, templates (<-- would be awesome)
Thanks all!
This is a bit difficult to answer without knowing how you want to specify the data you want.
If you can specify the necessary data using regexes, Python will probably be about equally efficient, and a bit quicker to write -- but you may be able to do the job with something like grep even more easily.
If it'll take a lot more processing to figure out what data to display, Python may start to get quite a bit slower -- it can be quite fast as long as the Python part is mostly a fairly "thin" shell and most of the heavy lifting is done by various libraries. It can get quite a bit slower if you're doing serious/significant processing in Python itself.
If you write it in C++, you'll get more or less the opposite situation -- as long as you're reasonably careful, chances are pretty good that performance won't be an issue. The real question will be how much work it takes to produce what you want. Without knowing anything about what data you're looking for, how you want to display it, etc., it's nearly impossible to guess about that though.
edit based on comment: A pattern like Data = #### sounds like pretty much a classic case for a regular expression, for which grep will work just fine.
This is also something Python can probably do perfectly well, but if you did decide to do your own in C++, it could look something like this:
#include <iostream>
#include <string>
#include <regex>
#include <fstream>

int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "Usage: searched <filename>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::string line;
    std::regex pat("Data = [0-9]+");
    // Print every line that contains the pattern anywhere in it.
    while (std::getline(in, line))
        if (std::regex_search(line, pat))
            std::cout << line << "\n";
    return 0;
}
This assumes you're looking for the Data = # pattern happening somewhere in the line. If you want to only consider it a match if that's the whole line, change the regex_search to regex_match instead.
The other assumption is that you're using a relatively recent compiler that includes the standard regular expression classes. This is the case with VS 2010 and with gcc 4.9 or later (earlier gcc releases shipped a <regex> header that was largely unimplemented). Some older compilers may name it std::tr1::regex instead, and some that are older still won't have it at all.
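To make the difference between regex_search and regex_match concrete, here is a small sketch using the same pattern. The function names containsData and isExactlyData are made up for this example.

```cpp
#include <regex>
#include <string>

// True if "Data = <digits>" occurs anywhere in the line.
bool containsData(const std::string& line) {
    static const std::regex pat("Data = [0-9]+");
    return std::regex_search(line, pat);
}

// True only if the whole line consists of "Data = <digits>".
bool isExactlyData(const std::string& line) {
    static const std::regex pat("Data = [0-9]+");
    return std::regex_match(line, pat);
}
```

A line such as "x Data = 42 y" satisfies the first function but not the second, while "Data = 42" satisfies both.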
C++ will be faster (maybe, if you write it well), but it will be harder to write, though easier to start with since you already know it.
Python will take some time to get used to, and it will probably run a wee bit slower, but it will be easier (once you learn the language).
This is a very easy problem that has been solved numerous times, so which language you pick really doesn't matter.
If you like a GUI, then look at GUI libraries.
Python will be much better for this task:
with open("path/to/file.txt") as f:
    for line in f:
        print(line, end="")
The equivalent C++ is much more involved.

Is it a good idea to apply some basic macros to simplify code in a large project?

I've been working on a foundational C++ library for some time now, and there are a variety of ideas I've had that could really simplify the process of writing and managing the code. One of these is the concept of introducing some macros to help simplify statements that appear very often, but are a bit more complicated than should be necessary.
For example, I've come up with this basic macro to simplify the most common type of for loop:
#define loop(v,n) for(unsigned long v=0; v<n; ++v)
This would enable you to replace those clunky for loops you see so much of:
for (int i = 0; i < max_things; i++)
With something much easier to write, and even slightly more efficient:
loop (i, max_things)
Is it a good idea to use conventions like this? Are there any problems you might run into with different types of compilers? Would it just be too confusing for someone unfamiliar with the macro(s)?
IMHO this is generally a bad idea. You are essentially changing well known and understood syntax to something of your own invention. Before long you may find that you have re-invented the language. :)
No, not a good idea.
int max = 23;
loop(i, ++max)...
It is, however, a good idea to refactor commonly used code into reusable components and then reuse instead of copy. You should do this through writing functions similar to the standard algorithms like std::find(). For instance:
template < typename Function >
void loop(size_t count, Function f)
{
    for (size_t i = 0; i < count; ++i) f();
}
This is a much safer approach:
int max = 23;
loop(++max, boost::bind(....));
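The difference comes from how often the bound expression is evaluated. The sketch below counts evaluations for both forms; the names LOOP_MACRO, loopN and the two counting functions are assumptions for illustration, and the macro is renamed only so it does not clash with the function template.

```cpp
#include <cstddef>

// The macro from the question under a different name: `n` is pasted
// into the loop condition, so it is re-evaluated on every test.
#define LOOP_MACRO(v, n) for (unsigned long v = 0; v < n; ++v)

// Function-template version: `count` is evaluated once, at the call site.
template <typename Function>
void loopN(std::size_t count, Function f) {
    for (std::size_t i = 0; i < count; ++i) f();
}

// Counts how often the bound expression runs when the macro is used.
int boundEvaluationsWithMacro() {
    int evaluations = 0;
    auto bound = [&evaluations]() -> unsigned long { ++evaluations; return 3; };
    LOOP_MACRO(i, bound()) { /* body runs three times */ }
    return evaluations;  // one evaluation per loop-condition test
}

// Counts how often the bound expression runs when the function is used.
int boundEvaluationsWithFunction() {
    int evaluations = 0;
    auto bound = [&evaluations]() -> std::size_t { ++evaluations; return 3; };
    loopN(bound(), [] {});
    return evaluations;
}
```

With a bound of 3, the macro version evaluates the expression four times (three passing tests and the final failing one), while the function version evaluates it once. An expression with side effects, such as ++max, therefore changes the loop's behaviour under the macro.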
I think you've provided one strong argument against this macro with your example usage. You changed the loop iterator type from int to unsigned long. That has nothing to do with how much typing you want to do, so why change it?
That cumbersome for loop specifies the start value, end value, type and name of the iterator. Even if we assume the final part will always be ++name, and we're happy to stick to that, you have two choices - remove some of the flexibility or type it all out every time. You've opted to remove flexibility, but you also seem to be using that flexibility in your code base.
I would say it depends upon whether you expect anyone else to ever have to make sense of your code. If it's only ever going to be you in there, then I don't see a problem with the macros.
If anyone else is ever going to have to look at this code, then the macros are going to cause problems. The other person won't know what they are or what they do (no matter how readable and obvious they seem to you) and will have to go hunting for them when they first run across them. The result will be to make your code unreadable to anyone but yourself - anyone using it will essentially have to learn a new language and program at the same time.
And since the chances of it just being you dealing with the code are pretty much nil if you hope for the code to be a library used for more than just your personal purposes, I'd go with don't.
In Unix, I find that by the time I want to create an alias for a command I use all the time, the command is on my fingers, and I'd have a harder time remembering the syntax of my alias than the original command.
The same applies here -- by the time you use an idiom so much that you want to create a macro for it, the idiom will be on your fingers, and the macro will cause you more pain than just typing out the code.
Getting rid of the for loops is generally a good idea -- but replacing them with macros is not. I'd take a long, hard look at the standard library algorithms instead.
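As a small illustration of that suggestion, the sketch below replaces two typical hand-written loops with std::accumulate and std::count_if. The function names total and countAbove are made up for this example, and a C++11 compiler is assumed for the lambda.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums the elements without an explicit loop.
int total(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Counts the elements strictly greater than `threshold`.
std::ptrdiff_t countAbove(const std::vector<int>& values, int threshold) {
    return std::count_if(values.begin(), values.end(),
                         [threshold](int v) { return v > threshold; });
}
```

The algorithm name states the intent, and there is no index variable or bound to get wrong.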
Apart from the maintenance/comprehension problems mentioned by others, you'll also have a hard time setting breakpoints in and single-stepping through macro code.
One area where I think macros might be acceptable would be for populating large data structures with constants/literals (when it can save an excessive amount of typing). You normally would not single-step through such code.
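One common form of this is the so-called X-macro, where a single table of constants is expanded into several definitions. The sketch below is only an illustration; the table contents and the names COLOR_TABLE, Color, ColorInfo and kColorInfo are all made up.

```cpp
// Hypothetical table: each row is X(enumerator, display name, numeric code).
#define COLOR_TABLE(X)           \
    X(Red,   "red",   0xFF0000)  \
    X(Green, "green", 0x00FF00)  \
    X(Blue,  "blue",  0x0000FF)

// Expand the table once into an enum...
#define AS_ENUM(id, name, code) id,
enum Color { COLOR_TABLE(AS_ENUM) ColorCount };
#undef AS_ENUM

struct ColorInfo {
    const char* name;
    unsigned long code;
};

// ...and once into a matching array, so the two cannot drift apart.
#define AS_INFO(id, name, code) { name, code },
static const ColorInfo kColorInfo[] = { COLOR_TABLE(AS_INFO) };
#undef AS_INFO
```

Adding a row to the table updates both the enum and the array, and kColorInfo[Blue] can be indexed directly by enumerator.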
Steve Jessop makes a good point. Macros have their uses. If I may expand on his statements, I would go so far as to say that the argument for or against macros comes down to "It depends". If you make your macros without careful thought, you risk making future maintainers' lives harder. On the other hand, using the wxWidgets library requires using library-provided macros to connect your code with the GUI library. In this case, the macros lower the barrier to entry for using the library, as magic whose innards are irrelevant to understanding how to work with the library is hidden away from the user. The user is saved from having to understand things they really don't need to know about, and it can be argued that this is a "Good" use of macros. Also, wxWidgets clearly documents how these macros are supposed to be used. So make sure that what you hide isn't something that is going to need to be understood by someone else coming in.
Or, if it's just for your use, knock yourself out.
It's a question of where you're getting your value. Is typing those 15 extra characters in your loops really what's slowing your development down? Probably not. If you've got multiple lines of confusing, unavoidable boilerplate popping up all over the place, then you can and should look for ways to avoid repeating yourself, such as creating useful functions, cleaning up your class hierarchies, or using templates.
But the same optimization rules apply to writing code as to running it: optimizing small things with little effect is not really a good use of time or energy.