Binary diff with masking - Needed to watch consistency of builds - build

I am looking for a binary diff tool which allows me to mask parts of the binary files so they won't be taken into account.
The challenge I am facing is the following: Once we finished actively developing a branch of our embedded software we put it into maintenance. This means it is still build once a week to ensure that it doesn't erode and is not buildable any more. This is necessary so that once we urgently need to fix a bug or add a little feature we can concentrate on this and not need to solve other problems as well.
With our current approach we know that the branch still builds without errors but we do not know whether the produced binaries are is still the same. So I thought comparing a “golden master” binary against what was build and decide whether they are identical would be an approach to this. Anyhow, the files will never be identical to the bit because e.g. build dates are part of the binaries.
Does anyone have an idea or has even solved this puzzle before? I guess I am not the first developer facing this challenge.
Regards,
Mark

The Linux from Scratch project ran into this problem when they were transitioning from Glibc 2.2 to Glibc 2.3, I think in 2001. They developed some scripts and techniques to reduce the amount of noise, but it was still only reducing the amount of noise. They still needed to manually inspect the remaining differences. Anyway, if you dig around the mailinglist archives from around 2001 to 2003, you should find some pointers.
However, aren't you going at this the wrong way? I mean, the interesting question is not so much whether it still builds identical binaries, but rather whether it still works, isn't it? So, it is much more interesting whether it still passes the testsuite rather than whether it is still the same binary.
After all, nobody cares whether it is still the same binary, if there is no obversable difference in behavior. And if it's broken, then nobody is going to take "Yes, I know, but it is broken exactly the same way it was last week, and I can prove it!" as an excuse.

Related

Tips for preventing Xcode 9.4 from Indexing indefinitely

For fun, I have started to develop games with Unreal and with that comes learning C++ and using an actual IDE. My past experience has been with web development, so something like Atom or Sublime text was all that was needed to get the job done.
Something that has been a nuisance is the indefinite indexing that can occur after builds in XCode. I realize that this is a little out of my control, since it would require Apple to fix these issues. Maybe they will and maybe they won't, but until then I would like to spend more of my time coding and less time waiting for XCode to reboot.
For reference, the reboot is being done because the CLANG process (from my understanding it is the complier responsible for the indexing in XCode) is eating up at least 95% of my CPU.
I would like to code and create game worlds more efficiently, and not have to deal with this indexing issue so much. Since I can't fix the issue then maybe there is a way to avoid it. I was hoping that some insight could be shared in this regard. These are the two things that I have noticed that can set it off:
If there is an error or a warning during the build, then this can
trigger the indexing to run indefinitely. I can fix the issue,
re-initialize the build, and then the indexing continues to run
indefinitely :(. If there are no issues or errors during the build,
then indexing would actually complete in a timely manner. For me, I
don't see any avoidance other than don't make errors or create
warnings (which I can tell you, is unavoidable because I will make
errors).
The second, which seems to be easier to avoid, is that if I do any
clicking, button pushing, etc. in Xcode while it is building then
this can also set off the indefinite indexing.
I have read several posts, forum discussions, etc. on this issue and tried several of the suggestion, i.e. removing the DerivedData from Xcode. It looks like you can even turn indexing off. This shuts down the auto-complete and refactoring features, which might in the end be worth it since (Refactor -> Extract Function) hasn't exactly been kind either.
Any workflow suggestions on things to do and things NOT to do is this kind of scenario would be appreciated!
Long post, but I thought this could be good for anyone else in similar shoes, so I wanted to include details.
When this happened to me, I thought it might be because iCloud Drive was stalling for some reason (as i mentioned in my comment). I didn't really need it to be synced, so I just moved the project directory to outside the iCloud Drive, then the infinite indexing problem went away.
I'm not sure if you're using iCloud or not, but hopefully this answer helps someone anyway.

How to approach debugging a huge not so familiar code base?

Seldom during working on large scale projects, suddenly you are moved on to a project which is already in maintainance phase.You end up with having a huge code C/C++ code base on your hands, with not much doccumentation about the design.The last person who could give you some knowledge transfer about the code has left the company already and to add to your horrors there is not enough time to get acquainted with the code and develop an understanding of the overall module/s.In this scenario when you are expected to fix bugs(core dumps,functionality,performance problems etc) on the module/s what is the approach that you will take?
So the question is:
What are your usual steps for debugging a not so familiar C/C++ code base when trying to fix a bug?
EDIT: Enviornment is Linux, but code is ported on Windows too so suggestions for both will be helpful.
If possible, step through it from main() to the problematic area, and follow the execution path. Along the way you'll get a good idea of how the different parts play together.
It could also be helpful to use a static code analysis tool, like CppDepends or even Doxygen, to figure out the relations between modules and be able to view them graphically.
Use a pen and paper, or images/graphs/charts in general, to figure out which parts belong where and draw some arrows and so on.
This helps you build and see the image that will then be refined in your mind as you become more comfortable with it.
I used a similar approach attacking a hellish system that had 10 singletons all #including each other. I had to redraw it a few times in order to fit everything, but seeing it in front of you helps.
It might also be useful to use Graphviz when constructing dependency graphs. That way you only have to list everything (in a text file) and then the tool will draw the (often unsightly) picture. (This is what I did for the #include dependencies in above syste,)
As others have already suggested, writing unit-tests is a great way to get into the codebase. There are a number of advantages to this approach:
It allows you to test your
assumptions about how the code
works. Adding a passing test proves
that your assumptions about that
small piece of code that you are
testing are correct. The more
passing tests you write, the better
you understand the code.
A failing unit test that reproduces
the bug you want to fix will pass
when you fix the bug and you know
that you have succeeded.
The unit tests that you write act as
documentation for the future.
The unit tests you write act as
regression tests as more bugs are
fixed.
Of course adding unit tests to legacy code is not always an easy task. Happily, a gentleman by the name of Michael Feathers has written an excellent book on the subject, which includes some great 'recipes' on adding tests to code bases without unit tests.
Some pointers:
Debug from the part which seems more
relevant to the workflow.
Use debug
strings
Get appropriate .pdb and attach the
core dump in debuggers like Windbg
or debugdiag to analyze it.
Get a person's help in your
organization who is good at
debugging. Even if he is new to your
codebase, he could be very helpful.
I had prior experience. They would
give you valuable pointers.
Per Assaf Lavie's advice, you could use static code analyzers.
The most important thing: as you
explore and debug, document
everything as you progress. At least
the person succeeding you would
suffer less.
Three things i don't see yet:
write some unit tests which use the libraries/interfaces. demonstrate/verify your understanding of them and promote their maintainability.
sometimes it is nice to create an special assertion macro to check that the other engineer's assumptions are in line with yours. you could:
not commit their uses
commit their uses, converting them to 'real' assertions after a given period
commit their uses, allowing another engineer (more familiar with the project) to dispose or promote them to real assertions
refactoring can also help. code that is difficult to read is an indication.
The first step should be try to read the code. Try to see the code where the bug is. Follow the code from main to that point ans try to see what could be wrong. Read the comments from the code(if any). Normally the function names are useful. Understand what each function does.
Once you get some idea of the code then you can start debugging the code. Put breakpoints where you don't understand the code or where you think the error can be. Start following the code line by line. Debugging is like sex. Initially painful, but slowly you start to enjoy it.
cscope + ctags are available on both Linux and Windows (via Cygwin). If you give them a chance, these tools will become indispensable to you. Although, IDEs like Visual Studio also do an excellent job with code browsing facilities as well.
In a situation like yours, because of time constraints, you are driven by symptoms. I mean that you don't have time to reconstruct the big picture / design / architecture. So you focus on the symptoms and work outwards, and each time reconstruct as much of the big picture as you need for that particular problem. But do not make "local" decisions in a hurry. Have the patience to see as much of the big picture as needed to make a good quality decision. And don't get caught in the band-aid syndrome i.e. put any old fix in that will work. It is your job to preserve the underlying architecture / design (if there is one, and to whatever extent that you can discover it).
It will be a struggle at first, as your mind "hunts" excessively. But soon the main themes in the design / architecture will emerge, and all of it will start to make sense. Think, by not thinking, grasshoppa :)
You have to have a fully reliable IDE which has a lot of debbugging tools (breakpoints, watches, and the like). The best way to familiarize yourself with a huge code is to play around with it and see how data is passed from one method to another. Also, you can reverse engineer the code so could see the relationship of the classes. :D Good Luck!
For me, there is only one way to get to know a process - Interaction. Identify the interfaces of the process/system. Then identify the input/output relationship (these steps maybe not linear). Once you do that, you can start tinkering at the code with a fair amount of confidence because you know what it is "supposed to do" then it's just a matter of finding out "how it is actually being done". For me though, getting to know the interface (Not necessarily the user interface) of the system is the key. To put it bluntly - Never touch the code first!!!
Not sure about C/C++, but coming from Java and C#, unit testing will help. In Java there's JUnit and TestNG libraries for unit testing, in C# there's NUnit and mstest. Not sure about C/C++.
Read the book 'Refactoring: Improving the Design of Existing Code' by Martin Fowler, Kent Beck, et al. Will be quite a few tips in there I'm sure that will help, and give you some guidance to improving the code.
One tip: if it aint broke, don't fix it. Don't bother trying to fix some library or really complicated function if it works. Focus on parts where there's bugs.
Write a unit test to reproduce the scenario where the code should work. The test will fail at first. Fix the code until the unit test passes successfully. Repeat :)
Once a majority of your code, the important bits that are too complex to manually debug and fix, is under automated unit tests, you'll have a safety harness of regression tests that'll make you feel more confident at changing the existing code base.
while (!codeUnderstood)
{
Breakpoints();
Run();
StepInto();
if(needed)
{
StepOver();
}
}
I don't try to get an overview of the whole system as suggested by many here. If there is something which needs fixing I learn the smallest part of the code I can to fix the bug. The next time there is an issue I'm a little more familiar and a little less daunted and I learn a little more. Eventually I'm able to support the whole shebang.
If management suggests I do a major change to something I'm not familiar with I make sure they understand the time scales and if things a really messy suggest a rewrite.
Usually the program in question will produce some kind of output ( log, console printout, dialog box ).
Find the closest place to your
problem in the program output
Search through the code base and look for the text in that output
Start putting your own printouts, nothing fancy, just printf( "Calling xxx\n" );, so you can pinpoint exactly to the point where the problem starts.
Once you pinpointed the problem spot, put a breakpoint
When you hit the breakpoint, print a stacktrace
Now you can see what players you have and start the analysis of how you've got to the wrong place.
Hopefully the names of the methods on the call stack are more meaningful than a, b and c ( seen this ), and there is some sort of comments, method documentation more meaningful than calling a ( seen this many times ).
If the source is poorly documented, don't be afraid to leave your comments once you have figured out what's going on. If program design permits it create a unit test for the problem you've fixed.
Thanks for the nice answers, quite a number of points to take up. I have worked on such situation a number of times and here is the usual procedure i follow:
Check the crash log or trace log. Check relevant trace if just a simple developer mistake if cannot evaluate in one go, then move on to 2.
Reproduce the bug! This is the most important thing to do. Some bugs are rare to occur and if you get to reproduce the bug nothing like it. It means you have a better % of cracking it.
If you cant reproduce a bug, find a alternative use case, situation where in you can actually reproduce the bug. Being able to actually debug a scenario is much more useful than just the crash log.
Head to version control! Check if the same buggy behavior exists on previous few SW versions. If NOT..Voila! You can find between what two versions the bug got introduced and You can easily get the code difference of the two versions and target the relevant area.(Sometimes it is not the newly added code which has the bug but it exposes some old leftovers.Well, We atleast have a start I would say!)
Enable the debug traces. Run the use case of the bug, check if you can find some additional information useful for investigation.
Get hold of the relevant code area through the trace log. Check out there for some code introducing the bug.
Put some breakpoints in the relevant code. Study the flow. Check the data flows.Lookout for pointers(usual culprits). Repeat till you get a hold of the flow.
If you have a SW version which does not reproduce the bug, compare what is different in the flows. Ask yourself, Whats the difference?
Still no Luck!- Arghh...My tricks have exhausted..Need to head the old way. Understand the code..and understand the code and understand it till you know what is happening in the code when that particular use case is being executed.
With newly developed understanding try debugging the code and sure the solution is around the corner.
Most important - Document the understanding you have developed about the module/s. Even small knitty gritty things. It is sure going to help you or someone just like you, someday..sometime!
You can try GNU cFlow tool (http://www.gnu.org/software/cflow/).
It will give you graph, charting control flow within program.

Updating a codebase to meet standards

If you've got a codebase which is a bit messy in respect to coding standards - a mix of different conventions from different people - is it reasonable to give one person the task of going through every file and bringing it up to meet standards?
As well as being tremendously dull, you're going to get a mass of changes in SVN (or whatever) which can make comparing versions harder. Is it sensible to set someone on the whole codebase, or is it considered stupid to touch a file only to make it meet standards? Should files be left alone until some 'real' change is needed, and then updated?
Tagged as C++ since I think different languages have different automated tools for this.
Should files be left alone until some 'real' change is needed, and then updated?
This is what I would do.
Even if it's primarily text layout changes, doing it by a manual process on a large scale risks breaking code that was working.
Treat it as a refactor and do it locally whenever code has to be touched for some other reason. Add tests if they're missing to improve your chances of not breaking the code.
If your code is already well covered by tests, you might get away with something global, but I still wouldn't advocate it.
I also think this is pretty much language-agnostic.
It also depends on what kind of changes you are planning to make in order to bring it up to your coding standard. Everyone's definition of coding standard is different.
More specifically:
Can your proposed changes be made to the project with 100% guarantee that the entire project will work identically the same as before? For example, changes that only affect comments, line breaks and whitespaces should be fine.
If you do not have 100% guarantee, then there is a risk that should not be taken unless it can be balanced with a benefit. For example, is there a need to gain a deeper understanding of the current code base in order to continue its development, or fix its bugs? Is the jumble of coding conventions preventing these initiatives? If so, evaluate the costs and benefits and decide whether a makeover is justified.
If you need to understand the current code base, here is a technique: tracing.
Make a copy of the code base. Note that tracing involves adding code, so it should not be performed on the production copy.
In the new copy, insert many fprintf (trace) statements into any functions considered critical. It may be possible to automate this.
Run the project with various inputs and collect those tracing results. This will help everyone understand the current project's design.
Another technique for understanding the current code base is to document the dependencies in the project.
Some kinds of dependencies (interface dependency, C++ include dependency, C++ typedef / identifier dependency) can be extracted by automated tools.
Run-time dependency can only be extracted through tracing, or by profiling tools.
I was thinking it's a task you might give a work-experience kid or put out onto RentaCoder
This depends mainly on the codebase's size.
I've seen three trainees given the task to go through a 2MLoC codebase (several thousand source files) in order to insert one new line into the standard disclaimer at the top of all the source files (with the line's content depending on the file's name and path). It took them several days. One of the three used most of that time to write a script that would do it and later only fixed the files where the script had failed to insert the line correctly, the other two just ploughed through the files. (The one who wrote the script later got a job at that company.)
The job of manually adapting all those files in that codebase to certain coding standards would probably have to be measured in man-years.
OTOH, if it's just a few dozen files, it's certainly doable.
Your codebase is very likely somewhere in between, so your best bet might be to set a "work-experience kid" to find out whether there's a tool that can do this to your satisfaction and, if so, make it work.
Should files be left alone until some 'real' change is needed, and then updated?
I'd strongly advice against this. If you do this, you will have "real" changes intermingled with whatever reformatting took place, making it nigh impossible to see the "real" changes in the diff.
You can address the formatting aspect of coding style fairly easily. There are a number of tools that can auto-format your code. I recommend hooking one of these up to your version control tool's "check in" feature. This way, people can use whatever format they want while editing their code, but when it gets checked in, it's reformatted to the official style.
In general, I think it's best if you can do the big change all at once. In the past, we've done the following:
1. have a time dedicated to the reformatting when most people aren't working (e.g. at night or on the weekend
2. have a person check out as many files as possible at that time, reformat them, and check them in again
With a reformatting-only revision, you don't have to figure out what has changed in addition to the formatting.

Are C++ static code analyis tools worth it?

Our management has recently been talking to some people selling C++ static analysis tools. Of course the sales people say they will find tons of bugs, but I'm skeptical.
How do such tools work in the real world? Do they find real bugs? Do they help more junior programmers learn?
Are they worth the trouble?
Static code analysis is almost always worth it. The issue with an existing code base is that it will probably report far too many errors to make it useful out of the box.
I once worked on a project that had 100,000+ warnings from the compiler... no point in running Lint tools on that code base.
Using Lint tools "right" means buying into a better process (which is a good thing). One of the best jobs I had was working at a research lab where we were not allowed to check in code with warnings.
So, yes the tools are worth it... in the long term. In the short term turn your compiler warnings up to the max and see what it reports. If the code is "clean" then the time to look at lint tools is now. If the code has many warnings... prioritize and fix them. Once the code has none (or at least very few) warnings then look at Lint tools.
So, Lint tools are not going to help a poor code base, but once you have a good codebase it can help you keep it good.
Edit:
In the case of the 100,000+ warning product, it was broken down into about 60 Visual Studio projects. As each project had all of the warnings removed it was changed so that the warnings were errors, that prevented new warnings from being added to projects that had been cleaned up (or rather it let my co-worker righteously yell at any developer that checked in code without compiling it first :-)
In my experience with a couple of employers, Coverity Prevent for C/C++ was decidedly worth it, finding some bugs even in good developers’ code, and a lot of bugs in the worst developers’ code. Others have already covered technical aspects, so I’ll focus on the political difficulties.
First, the developers whose code need static analysis the most, are the least likely to use it voluntarily. So I’m afraid you’ll need strong management backing, in practice as well as in theory; otherwise it might end up as just a checklist item, to produce impressive metrics without actually getting bugs fixed. Any static analysis tool is going to produce false positives; you’re probably going to need to dedicate somebody to minimizing the annoyance from them, e.g., by triaging defects, prioritizing the checkers, and tweaking the settings. (A commercial tool should be extremely good at never showing a false positive more than once; that alone may be worth the price.) Even the genuine defects are likely to generate annoyance; my advice on this is not to worry about, e.g., check-in comments grumbling that obviously destructive bugs are “minor.”
My biggest piece of advice is a corollary to my first law, above: Take the cheap shots first, and look at the painfully obvious bugs from your worst developers. Some of these might even have been found by compiler warnings, but a lot of bugs can slip through those cracks, e.g., when they’re suppressed by command-line options. Really blatant bugs can be politically useful, e.g., with a Top Ten List of the funniest defects, which can concentrate minds wonderfully, if used carefully.
As a couple people remarked, if you run a static analysis tool full bore on most applications, you will get a lot of warnings, some of them may be false positives or may not lead to an exploitable defect. It is that experience that leads to a perception that these types of tools are noisy and perhaps a waste of time. However, there are warnings that will highlight a real and potentially dangerous defects that can lead to security, reliability, or correctness issues and for many teams, those issues are important to fix and may be nearly impossible to discover via testing.
That said, static analysis tools can be profoundly helpful, but applying them to an existing codebase requires a little strategy. Here are a couple of tips that might help you..
1) Don't turn everything on at once, decide on an initial set of defects, turn those analyses on and fix them across your code base.
2) When you are addressing a class of defects, help your entire development team to understand what the defect is, why it's important and how to code to defend against that defect.
3) Work to clear the codebase completely of that class of defects.
4) Once this class of issues have been fixed, introduce a mechanism to stay in that zero issue state. Luckily, it is much easier make sure you are not re-introducing an error if you are at a baseline has no errors.
It does help. I'd suggest taking a trial version and running it through a part of your codebase which you think is neglected. These tools generate a lot of false positives. Once you've waded through these, you're likely to find a buffer overrun or two that can save a lot of grief in near future. Also, try at least two/three varieties (and also some of the OpenSource stuff).
I've used them - PC-Lint, for example, and they did find some things. Typically they are configurable and you can tell them 'stop bothering me about xyz', if you determine that xyz really isn't an issue.
I don't know that they help junior programmers learn a lot, but they can be used as a mechanism to help tighten up the code.
I've found that a second set of (skeptical, probing for bugs) eyes and unit testing is typically where I've seen more bug catching take place.
Those tools do help. lint has been a great tool for C developers.
But one objection that I have is that they're batch processes that run after you've written a fair amount of code and potentially generate a lot of messages.
I think a better approach is to build such a thing into your IDE and have it point out the problem while you're writing it so you can correct it right away. Don't let those problems get into the code base in the first place.
That's the difference between the FindBugs static analysis tool for Java and IntelliJ's Inspector. I greatly prefer the latter.
You are probably going to have to deal with a good amount of false positives, particularly if your code base is large.
Most static analysis tools work using "intra-procedural analysis", which means that they consider each procedure in isolation, as opposed to "whole-program analysis" which considers the entire program.
They typically use "intra-procedural" analysis because "whole-program analysis" has to consider many paths through a program that won't actually ever happen in practice, and thus can often generate false positive results.
Intra-procedural analysis eliminates those problems by just focusing on a single procedure. In order to work, however, they usually need to introduce an "annotation language" that you use to describe meta-data for procedure arguments, return types, and object fields. For C++ those things are usually implemented via macros that you decorate things with. The annotations then describe things like "this field is never null", "this string buffer is guarded by this integer value", "this field can only be accessed by the thread labeled 'background'", etc.
The analysis tool will then take the annotations you supply and verify that the code you wrote actually conforms to the annotations. For example, if you could potentially pass a null off to something that is marked as not null, it will flag an error.
In the absence of annotations, the tool needs to assume the worst, and so will report a lot of errors that aren't really errors.
Since it appears you are not using such a tool already, you should assume you are going to have to spend a considerably amount of time annotating your code to get rid of all the false positives that will initially be reported. I would run the tool initially, and count the number of errors. That should give you an estimate of how much time you will need to adopt it in your code base.
Wether or not the tool is worth it depends on your organization. What are the kinds of bugs you are bit by the most? Are they buffer overrun bugs? Are they null-dereference or memory-leak bugs? Are they threading issues? Are they "oops we didn't consider that scenario", or "we didn't test a Chineese version of our product running on a Lithuanian version of Windows 98?".
Once you figure out what the issues are, then you should know if it's worth the effort.
The tool will probably help with buffer overflow, null dereference, and memory leak bugs. There's a chance that it may help with threading bugs if it has support for "thread coloring", "effects", or "permissions" analysis. However, those types of analysis are pretty cutting-edge, and have HUGE notational burdens, so they do come with some expense. The tool probably won't help with any other type of bugs.
So, it really depends on what kind of software you write, and what kind of bugs you run into most frequently.
I think static code analysis is well worth, if you are using the right tool. Recently, we tried the Coverity Tool ( bit expensive). Its awesome, it brought out many critical defects,which were not detected by lint or purify.
Also we found that, we could have avoided 35% of the customer Field defects, if we had used coverity earlier.
Now, Coverity is rolled out in my company and when ever we get a customer TR in old software version, we are running coverity against it to bring out the possible canditates for the fault before we start the analysis in a susbsytem.
Paying for most static analysis tools is probably unnecessary when there's some very good-quality free ones (unless you need some very special or specific feature provided by a commercial version). For example, see this answer I gave on another question about cppcheck.
I guess it depends quite a bit on your programming style. If you are mostly writing C code (with the occasional C++ feature) then these tools will likely be able to help (e.g. memory management, buffer overruns, ...). But if you are using more sophisticated C++ features, then the tools might get confused when trying to parse your source code (or just won't find many issues because C++ facilities are usually safer to use).
As with everything the answer depends ... if you are the sole developer working on a knitting-pattern-pretty-printer for you grandma you'll probably do not want to buy any static analysis tools. If you are having a medium sized project for software that will go into something important and maybe on top of that you have a tight schedule, you might want to invest a little bit now that saves you much more later on.
I recently wrote a general rant on this: http://www.redlizards.com/blog/?p=29
I should write part 2 as soon as time permits, but in general do some rough calculations whether it is worth it for you:
how much time spent on debugging?
how many resources bound?
what percentage could have been found by static analysis?
costs for tool setup?
purchase price?
peace of mind? :-)
My personal take is also:
get static analysis in early
early in the project
early in the development cycle
early as in really early (before nightly build and subsequent testing)
provide the developer with the ability to use static analysis himself
nobody likes to be told by test engineers or some anonymous tool
what they did wrong yesterday
less debugging makes a developer happy :-)
provides a good way of learning about (subtle) pitfalls without embarrassment
This rather amazing result was accomplished using Elsa and Oink.
http://www.cs.berkeley.edu/~daw/papers/fmtstr-plas07.pdf
"Large-Scale Analysis of Format String Vulnerabilities in Debian Linux"
by Karl Chen, David Wagner,
UC Berkeley,
{quarl, daw}#cs.berkeley.edu
Abstract:
Format-string bugs are a relatively common security vulnerability, and can lead to arbitrary code execution. In collaboration with others, we designed and implemented a system to eliminate format string vulnerabilities from an entire Linux distribution, using typequalifier inference, a static analysis technique that can find taint violations. We successfully analyze 66% of C/C++ source packages in the Debian 3.1 Linux distribution. Our system finds 1,533 format string taint warnings. We estimate that 85% of these are true positives, i.e., real bugs; ignoring duplicates from libraries, about 75% are real bugs. We suggest that the technology exists to render format string vulnerabilities extinct in the near future.
Categories and Subject Descriptors D.4.6 [Operating Systems]: Security and Protection—Invasive Software;
General Terms: Security, Languages;
Keywords: Format string vulnerability, Large-scale analysis, Typequalifier inference
Static analysis that finds real bugs is worth it regardless of whether it's C++ or not. Some tend to be quite noisy, but if they can catch subtle bugs like signed/unsigned comparisons causing optimizations that break your code or out of bounds array accesses, they are definitely worth the effort.
At a former employer we had Insure++.
It helped to pinpoint random behaviour (use of uninitialized stuff) which Valgrind could not find. But most important: it helpd to remove mistakes which were not known as errors yet.
Insure++ is good, but pricey, that's why we bought one user license only.

Improving and publishing an application. Need some advice

Last term (August - December 2008) me and some class mates wrote an application in C++. Nothing spectacular, it is an ORM for Sqlite3. We implemented some stuff like reflection to make it work and release the end user from the ugly stuff. Personally, i think we made a nice job, and that our ORM could actually be useful for someone (even though its writen specifically for Sqlite3, its easily adaptable for oter databases).
Consequently, i`ve come to the conclusion that it should be published somewhere (sourceforge most likely) as an open source project. But, as it was a term project, there are some things that need to be addresesed before doing that. Namely, it has some memory leaks that should be fixed, and some parts of the code could be refactored to make everyone´s life easier in the future.
I would like to know more experienced C++ programmers opinion on some issues:
Is it worth rewriting some parts to
apply new techonologies (for example,
boost).
Should our ORM be adapted to latest
C++ standard? Is there any benefit in
doing this?
How will we know when our code is
ready for release?
What are the chances that this ORM
will be forgotten into the mists of
the internet? (i.e is it worth
publishing it beyond personal pride
as a programmer?)
Right now i can`t think of many more questions, but i would like to read on similar experiences.
EDIT: I should probably translate my code + comments to english right? (self question)
Thanks in advance.
I guess I am "more experienced" with regard to your particular question. I co-developed an open source web application language & template system a lot like ColdFusion back in the early days of web design before Java or ASP were around. You can still see it at http://www.steelblue.com/ if you are interested. It's still used at the company I was at when it was developed, but I don't think anywhere else.
What I found is that unless you are already well connected and people are watching what you are doing, getting people to use your open source code is just about as hard as selling somone your closed source program. You really need to advocate for your project and it should have some kind of unique selling proposition that distinguishes it from the compitition.
So, that's the unsolicited advice. Here are some specific answers to the questions you had...all purely my opinion, of course.
I wouldn't rewrite any code unless you have a featuer you want to put in. That feature might be compatibility with a specific platforms or compilers. It might be to support a new db datatype or smarter indicies or whatever. If you are going to put some more serious work into the applicaiton, think about a roadmap of what you can realistically accomplish in the next iteration and what choices will make the app the "most better" at the end of your cycle.
Release the code as soon as it is usable for a specific purpose, any purpose. Two reasons. First off, there might be someone who wants it for that purpose right now. If it's not available, they will use something else. Also, if it's open source, they might contribute back to the project. Second, the sooner you find out how much people want to use the code, the better. Either it will be more popular than you expect and you can get excited about continuing the development....or....you will find that no one is even visiting your web page to see what you've got. In either case, better to know sooner than later what people really want from your project so you can take that into account when planning new releases.
About the "forgotten into the mists." I think most projects are. I don't want to be a downer, but looking at Wikipedia, there were 5 C++ ORM tools popular enough to get mention and they were all open source. As I said above, unless you can sell your idea to people, they are going to go with another proven open source solution. For someone to choose you over them, three things have to happen: 1. They need a feature you have that the others don't. 2. They find your project web site and it demonstrates the superiority of your code. 3. They trust your code enough to give it a shot.
On the other hand, if you are in this for the long haul and want to continue development thigns get easier over time. Eventually the project will get all the basics covered and you can start developing those new featuers that aren't in the other solutions. Also, the longer you are in active development the more trustworthy the project will seem. Finally, you will get more experience in the nitch. 2 years from now you will be better positioned to say where your effort will have the most impact on bettering the project.
A final thought: If you are enjoying it, learning from it, and it's not getting in the way of you keeping food on the table, it's a good use of your time.
Good luck!
-Al
Regarding the open source part:
If you really want to make it an open source project, you really should publish it regardless of it's current state - fully working and debugged - or half working and full of memory leaks.
Just, if it's state is bad, make sure to document it, and give it a suitable version number (less than one?). then others may view your code, suggest improving, join your team, etc...
My--rather random--thoughts on the matter (in the order I think is most important):
How will we know when our code is ready for release?
Like Liran Orevi said: if you're going open source release early. Document it reasonable well, and take the time to provide a road map of planned or hoped for future improvements (these are a invitation for people to help you, so note which ones have no one working on them).
Is it worth rewriting some parts to apply new technologies (for example, boost).
Should our ORM be adapted to latest C++ standard? Is there any benefit in doing this?
SQLite relies on a fairly limited base. Maybe you don't want your tool to demand a much heavier environment. If the code in not currently a tangled and unmaintainable mess, you might want to avoid boost and newest frills. Once you have a stable release (1.0 at least) you can starting thinking about the improvements that can be made for version 2.
What are the chances that this ORM will be forgotten into the mists of the internet? (i.e is it worth publishing it beyond personal pride as a programmer?)
Most things end up in the big /dev/null in the sky, and there is only one way to find out... If it goes anywhere at all, you win. If it doesn't it was a modest investment, and maybe you learned something while you were at it.