Why should one delete sstate-cache when deleting tmp in a Yocto build?

Section 5.2.7, build/tmp/, of the Yocto Reference Manual says:
As a last resort, to clean up a build and start it from scratch (other than the downloads), you can remove everything in the tmp directory or get rid of the directory completely. If you do, you should also completely remove the build/sstate-cache directory.
Does this mean that if you delete tmp you should always delete sstate-cache as well, and things break if you don't? Or is it just a confusing formulation, meaning the build won't really be from scratch if sstate-cache is still there?
And if it is the former, what is the reason?

It's a confusing formulation. If you want a build from absolutely clean, you need to wipe sstate-cache as well, as otherwise it won't be from scratch. Other than that, you can delete tmp as much as you want and keep the sstate (personally, my tmp is in a tmpfs so it gets emptied several times a day, but my sstate-cache is years old).
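For what it's worth, the two cases look roughly like this from inside the build directory (the image name is just an example, and this assumes you've already sourced oe-init-build-env):

    # wipe only tmp; the next build is largely reassembled from the sstate-cache
    rm -rf tmp
    bitbake core-image-minimal

    # genuinely from scratch, keeping only the downloads
    rm -rf tmp sstate-cache
    bitbake core-image-minimal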

Related

What is the proper workflow for major changes in git?

I am currently working on a major overhaul to an email system within a small Django project.
As it stands, I will need to make huge modifications to almost every aspect of the system, and I think it would be easier to start from scratch and delete most of the old files.
1) Should I comment out old code or overwrite it?
2) Should I delete old files or rename them to something unused?
3) What is common practice when it comes to major overhauls in git?
You stated that starting from scratch might be easier, but you might consider if a more incremental approach is possible. Unless it is a very small program, your design will probably be broken up into several sub-systems/components. Maybe there are one or two without significant dependencies you can start with; first get them working with unit tests and then refactor the old code base to use them. When refactoring, delete all the old cruft being replaced with the new logic. As stated elsewhere, your repository should provide the history; unused or commented-out code makes your code base more confusing.
Taking the time to refactor like this might seem like a waste of time, but it will likely improve the design. While the old code base might be a huge mess, presumably it does work. Refactoring it to use the new components as you develop them might reveal oversights in your design much earlier than if you just started from scratch. It might even make it easier to think about the new design as a set of components, since you're not trying to get everything working all at once.
It depends. If you're sure you won't want to use your old code in the future, make a new branch and change everything. But if you're not sure, I would create a new repository; it will be easier. You can also rely on reverts. It's a personal choice; I would create a new repo.
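If you do go the branch route, a minimal sketch might look like this (branch and file names are placeholders):

    # do the overhaul on its own branch; history keeps the old versions reachable
    git checkout -b email-rewrite
    # delete files you are replacing instead of commenting them out
    git rm email/legacy_sender.py
    git commit -m "Remove legacy sender ahead of rewrite"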

Handling deferred code

A bit of functionality that has been mostly implemented has been shelved until a future development phase. The code is not wanted in the current phase, however it will be needed later, so simply removing it isn't an attractive option. Finishing the code is also considered an unattractive option. I'm trying to work out the best way of putting this functionality into cold storage without:
leaving clutter in the source files
removing it altogether (as per the wishes of my team)
I don't immediately see the best way of handling this "temporarily" redundant code. Part of me just wants to tag the code base and rip out the offending code. My rationale for this is:
when (and if!) we ever go back to this functionality it'll likely need a fair few changes anyway as everything else would have moved on
littering the code with (what can only become more) broken / incomplete code wrapped in #if 0 feels wrong wrong wrong
having a tagged point in source control which has context is much more useful should this functionality be reimplemented later
Is there anything I'm missing here?
I would create a branch pointing to the revision with the unwanted code, continue development in master, and merge the two branches later, when the code becomes useful again.
Excuse the git vocabulary; the concept is easily ported to other VCSes.
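In git terms, a minimal sketch of that (branch, file and message names are placeholders):

    # keep a named pointer to the last revision that still contains the code
    git branch shelved-deferred-feature
    # (an annotated tag works too, and carries a message for context:
    #  git tag -a deferred-feature -m "Parked until a later phase")
    # then remove the unwanted code from master in an ordinary commit
    git rm src/deferred_module.c
    git commit -m "Remove deferred feature; it lives on shelved-deferred-feature"
    # later, when the phase arrives, bring it back
    git merge shelved-deferred-feature    # or cherry-pick the relevant commits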

What is the best approach for coding in a slow compilation environment?

I used to code in C# in a TDD style: write or change a small chunk of code, recompile the whole solution in 10 seconds, re-run the tests, and repeat. Easy...
That development methodology worked very well for me for a few years, until last year, when I had to go back to C++ coding, and it really feels as though my productivity has dropped dramatically since. C++ as a language is not the problem - I have quite a lot of C++ development experience... but it was a while ago.
My productivity is still OK for small projects, but it gets worse as the project grows, and once compilation time hits 10+ minutes it gets really bad. And if I find an error I have to start the compilation again, and so on. That is just purely frustrating.
Thus I concluded that working in small chunks (as before) is no longer acceptable. Any recommendations on how I can get back into the long-gone habit of coding for an hour or so, reviewing the code manually (without relying on a fast C# compiler), and only recompiling/re-running the unit tests once every couple of hours?
With C# and TDD it was very easy to write code in an evolutionary way: after a dozen iterations, whatever crap I started with ended up as good code. That just does not work for me anymore (in a slow compilation environment).
Would really appreciate your inputs and recos.
p.s. not sure how to tag the question - anyone is welcome to re-tag the question appropriately.
Cheers.
I've found that recompiling and testing sort of pulls me out of the "zone", so in order to have the benefits of TDD, I commit fairly often into a git repository, and run a background process that checks out any new commit, runs the full test suite and annotates the commit object in git with the result. When I get around to it (usually in the evening), I then go back to the test results, fix any issues and "rewrite history", then re-run the tests on the new history. This way I don't have to interrupt my work even for the short times it takes to recompile (most of) my projects.
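A rough sketch of that background loop, assuming make check is the test entry point and using a dedicated notes ref (all of these names are assumptions, not part of any standard tooling):

    # test every recent commit that doesn't have a result note yet
    for c in $(git rev-list HEAD~20..HEAD); do
        git notes --ref=ci show "$c" >/dev/null 2>&1 && continue   # already tested
        git worktree add --detach "/tmp/ci-$c" "$c"
        ( cd "/tmp/ci-$c"
          if make check > result.log 2>&1; then
              git notes --ref=ci add -m "PASS" "$c"
          else
              git notes --ref=ci add -F result.log "$c"
          fi )
        git worktree remove --force "/tmp/ci-$c"
    done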
Sometimes you can avoid the long compile. Aside from improving the quality of your build files/process, you may be able to pick just a small thing to build. If the file you're working on is a .cpp file, just compile that one TU and unit-test it in isolation from the rest of the project. If it's a header (perhaps containing inline functions and templates), do the same with a small number of TUs that between them reference most of the functionality (if no such set of TUs exists, write unit tests for the header file and use those). This lets you quickly detect obvious stupid errors (like typos) that don't compile, and runs the subset of tests you believe to be relevant to the changes you're making. Once you have something that might vaguely work, do a proper build/test of the project to ensure you haven't broken anything you didn't realise was relevant.
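Concretely, that might look something like this (paths, C++ standard and test driver are assumptions):

    # compile only the translation unit you are editing
    g++ -std=c++17 -Wall -Iinclude -c src/parser.cpp -o build/parser.o
    # link it against a small test driver and run just those tests
    g++ -std=c++17 -Iinclude tests/parser_test.cpp build/parser.o -o build/parser_test
    ./build/parser_test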
Where a long compile/test cycle is unavoidable, I work on two things at once. For this to be efficient, one of them needs to be simple enough that it can just be dropped when the main task is ready to be resumed, and picked up again immediately when the main task's compile/test cycle is finished. This takes a bit of planning. And of course the secondary task has its own build/test cycle, so sometimes you want to work in separate checked-out copies of the source so that errors in one don't block the other.
The secondary task could for example be, "speed up the partial compilation time of the main task by reducing inter-component dependencies". Even so you may have hit a hard limit once it's taking 10 minutes just to link your program's executable, since splitting the thing into multiple dlls just as a development hack probably isn't a good idea. The key thing to avoid is for the secondary task to be, "hit SO", or this.
Since a simple change triggers a 10-minute recompilation, you have a bad build system. Your build should recompile only the changed files and the files that depend on them.
Other than that, there are further techniques to speed up the build time (for example, remove unneeded includes, and use forward declarations instead of including a header where possible), but the speed-up from these is not as important as what gets recompiled on a change.
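To illustrate the principle (a real build system such as make or ninja does this properly, including header dependencies via compiler-generated depfiles; this is only a sketch with assumed paths):

    mkdir -p build
    for src in src/*.cpp; do
        obj="build/$(basename "${src%.cpp}").o"
        # recompile a TU only if its source is newer than its object file
        if [ ! -f "$obj" ] || [ "$src" -nt "$obj" ]; then
            g++ -MMD -MP -c "$src" -o "$obj"
        fi
    done
    g++ build/*.o -o app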
I don't see why you can't use TDD with C++. I used CppUnit back in 2001, so I assume it's still in place.
You don't say what IDE or build tool you're using, so I can't comment on how those affect your pace. But small, incremental compiles and running unit tests are both still possible.
Perhaps looking into CruiseControl, TeamCity, or another hands-off build-and-test process would be your cup of tea. You can check in as fast as you can and let the automated build happen on another server.

Updating a codebase to meet standards

If you've got a codebase which is a bit messy in respect to coding standards - a mix of different conventions from different people - is it reasonable to give one person the task of going through every file and bringing it up to meet standards?
As well as being tremendously dull, this will produce a mass of changes in SVN (or whatever), which can make comparing versions harder. Is it sensible to set someone on the whole codebase, or is it considered stupid to touch a file only to make it meet standards? Should files be left alone until some 'real' change is needed, and then updated?
Tagged as C++ since I think different languages have different automated tools for this.
Should files be left alone until some 'real' change is needed, and then updated?
This is what I would do.
Even if it's primarily text layout changes, doing it by a manual process on a large scale risks breaking code that was working.
Treat it as a refactor and do it locally whenever code has to be touched for some other reason. Add tests if they're missing to improve your chances of not breaking the code.
If your code is already well covered by tests, you might get away with something global, but I still wouldn't advocate it.
I also think this is pretty much language-agnostic.
It also depends on what kind of changes you are planning to make in order to bring it up to your coding standard. Everyone's definition of coding standard is different.
More specifically:
Can your proposed changes be made to the project with a 100% guarantee that the entire project will work exactly the same as before? For example, changes that only affect comments, line breaks and whitespace should be fine.
If you do not have 100% guarantee, then there is a risk that should not be taken unless it can be balanced with a benefit. For example, is there a need to gain a deeper understanding of the current code base in order to continue its development, or fix its bugs? Is the jumble of coding conventions preventing these initiatives? If so, evaluate the costs and benefits and decide whether a makeover is justified.
If you need to understand the current code base, here is a technique: tracing.
Make a copy of the code base. Note that tracing involves adding code, so it should not be performed on the production copy.
In the new copy, insert many fprintf (trace) statements into any functions considered critical. It may be possible to automate this.
Run the project with various inputs and collect those tracing results. This will help everyone understand the current project's design.
Another technique for understanding the current code base is to document the dependencies in the project.
Some kinds of dependencies (interface dependency, C++ include dependency, C++ typedef / identifier dependency) can be extracted by automated tools.
Run-time dependency can only be extracted through tracing, or by profiling tools.
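For the include-dependency part, the compiler itself can do the extraction, for example (assuming g++ and an include/ directory):

    # print, for each translation unit, the user headers it pulls in
    g++ -MM -Iinclude src/*.cpp > include_deps.txt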
I was thinking it's a task you might give a work-experience kid or put out onto RentaCoder
This depends mainly on the codebase's size.
I've seen three trainees given the task to go through a 2MLoC codebase (several thousand source files) in order to insert one new line into the standard disclaimer at the top of all the source files (with the line's content depending on the file's name and path). It took them several days. One of the three used most of that time to write a script that would do it and later only fixed the files where the script had failed to insert the line correctly, the other two just ploughed through the files. (The one who wrote the script later got a job at that company.)
The job of manually adapting all those files in that codebase to certain coding standards would probably have to be measured in man-years.
OTOH, if it's just a few dozen files, it's certainly doable.
Your codebase is very likely somewhere in between, so your best bet might be to set a "work-experience kid" to find out whether there's a tool that can do this to your satisfaction and, if so, make it work.
Should files be left alone until some 'real' change is needed, and then updated?
I'd strongly advise against this. If you do this, you will have "real" changes intermingled with whatever reformatting took place, making it nigh impossible to see the "real" changes in the diff.
You can address the formatting aspect of coding style fairly easily. There are a number of tools that can auto-format your code. I recommend hooking one of these up to your version control tool's "check in" feature. This way, people can use whatever format they want while editing their code, but when it gets checked in, it's reformatted to the official style.
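As a sketch, a git pre-commit hook driving clang-format could look like this (the hook path and the presence of a .clang-format file are assumptions; SVN offers analogous server-side pre-commit hooks):

    #!/bin/sh
    # .git/hooks/pre-commit: reformat staged C++ files to the official style
    for f in $(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(cpp|h|hpp)$'); do
        clang-format -i "$f"    # uses the project's .clang-format
        git add "$f"            # re-stage the reformatted file
    done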
In general, I think it's best if you can do the big change all at once. In the past, we've done the following:
1. have a time dedicated to the reformatting when most people aren't working (e.g. at night or on the weekend)
2. have a person check out as many files as possible at that time, reformat them, and check them in again
With a reformatting-only revision, you don't have to figure out what has changed in addition to the formatting.
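The one-shot variant can be as simple as this (assuming clang-format and a git checkout; the svn equivalent is the same idea):

    # reformat every tracked C++ file in one go, then commit only that
    clang-format -i $(git ls-files '*.cpp' '*.h' '*.hpp')
    git commit -am "Reformat to the coding standard; no functional changes"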

How can I get C-function-based diffs?

Our team uses svn to manage our source. When performing a re-factor on a C file, I occasionally both change functions and move them within the file. Generally I try to avoid moving functions, because it makes the default svn diff get a bit addled about what's going on, and it often provides a diff which is more confusing than it needs to be.
Nonetheless, occasionally I do make both changes to where functions sit in the file and changes to their internal code. Another place this comes up is in branch merging, when the file is in conflict and either or both branches have moves as well as intra-function changes.
So, what I am looking for is a semantically aware diff tool that could tell me the diffs at two levels: function arrangement, and detail (intra-function). I tried using the "-p" option to diff (-x -p to svn diff), but that's not what it's intended for, and it certainly didn't do what I wanted.
Another option I just thought of is using a diff program designed to catch code-copying such as a university might use for checking assignments, but nothing obvious came up in a quick search.
One way to do it with the tools you have is to move the functions first, check them in, then change them. Or have two enlistments (working copies), and when you see this happening, move the functions in one, svn up the other, and resolve the merge issue. It moves the work onto you, but it makes code reviews easier.
I make cosmetic changes (moving functions around) and functional changes in different commits, and put "cosmetics" in the commit message. That way, the huge and uninteresting diff for cosmetics work is ignored, and you have a concise diff for the functional changes.
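With svn that is simply two commits, for example (file and function names are placeholders):

    # first commit: pure function moves, no behaviour change
    svn commit -m "cosmetics: move helpers next to their callers, no code changes" src/foo.c
    # ...now make the functional edits...
    svn commit -m "Fix off-by-one in parse_header()" src/foo.c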
Since you increased the level of difficulty of the problem a bit with your last edit:
There are limits to what svn can do; that's the reason why git was written. The answer to your problem is basically "no, there are no tools that can track code at a semantic level with svn".
(Actually, there are no semantic tracking tools for git either; it tracks content.)
You could try to do that refactoring with git:
And when using git, the whole 'keep code movement separate from changes' has an even more fundamental reason: git can track code movement (again, whether moving a whole file or just a function between files), and doing a 'git blame -C' will actually follow code movement between files. It does that by similarity analysis, but it does mean that if you both move the code and change it at the same time, git cannot see that 'oh, that function came originally from that other file', and now you get worse annotations about where code actually originated.
So the idea would be to initialize a git repository and replay all the relevant svn commits into that repository. After that, use git to find out which content moved where.
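A sketch of that, assuming a standard trunk/branches/tags layout and a placeholder URL:

    # mirror the svn history into a git repository
    git svn clone --stdlayout https://svn.example.com/repo proj-git
    cd proj-git
    # annotate a file, following code copied or moved from other files
    git blame -C -C src/foo.c
    # or show the history of one function's body
    git log -L :parse_header:src/foo.c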