SVN tracking changes across files

I am working on a legacy C++ project where many classes live in a single file, and some files run to 8k+ lines. I am planning to move some of these classes into separate files. The project uses SVN. I am wondering whether SVN can track history across such a change, because I may not do the refactoring if it can't.
As an example, I have old.cpp with 3 classes A, B and C. I want to refactor this into A.cpp, B.cpp and C.cpp. When I view the history of C.cpp, I want to see that the class used to be in old.cpp, along with all the changes made to C while it lived there.

I have never tried it with single files (only with complete folders, which is essentially branching), but in TortoiseSVN you can drag and drop the file using the right mouse button. When you drop the file (within the same working folder), select "SVN copy and rename versioned item here" in the context menu that appears. After that (and any other changes), commit the working copy.
Another option is to do the same thing from the repository browser. The difference from the previous option is that here each operation automatically becomes a new revision. I prefer the first option.
After the file is copied, edit it to remove the extraneous content. With the first option you can do this in the same revision as the copy.
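On the command line, the equivalent of the TortoiseSVN steps above is `svn copy`. Here is a rough sketch that only builds the commands for the old.cpp example from the question; the helper name `svn_split_commands` and the commit message are made up for illustration, and nothing is executed against a repository.

```python
import shlex


def svn_split_commands(source, new_files):
    """Build the svn commands that copy `source` to each new file.

    `svn copy` records the copy in the repository, so each new file
    keeps the history of the file it was copied from.
    """
    commands = [["svn", "copy", source, target] for target in new_files]
    # A single commit records all the copies (and any trimming done
    # in the working copy beforehand) as one revision.
    message = "Split {} into {}".format(source, ", ".join(new_files))
    commands.append(["svn", "commit", "-m", message])
    return commands


if __name__ == "__main__":
    for argv in svn_split_commands("old.cpp", ["A.cpp", "B.cpp", "C.cpp"]):
        print(shlex.join(argv))
```

After the commit, `svn log C.cpp` should also list the older revisions of old.cpp, because `svn log` follows copies unless you pass `--stop-on-copy`.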

Since your project is in SVN, you will always be able to return to the original version.
I strongly recommend writing unit tests before the change, so you can track the sanity of the program after each change.
I have done a change like this in the past. Here is how I did it:
1. Create a separate branch in SVN.
2. Check out the source from this branch.
3. For each class, do the following:
   a. Move the class to a separate (new) file without changing it. From this point on, every change to the class is tracked in SVN against the new file.
   b. Commit the change to SVN. This way you know that nothing changed in the contents of the class itself.
   c. Make the necessary changes to the newly created file.
   d. Run the unit tests.
   e. Commit the change.

Related

GoldenFiles testing and TFS server workspaces

Our product (C++ windows application, Google Test as testing framework, VS2015 as IDE) has a number of file-based interfaces to external products, i.e., we generate a file which is then imported into an external product. For testing these interfaces, we have chosen a golden file approach:
1. Invoke the code that produces an interface file and save the resulting file for later reference (this is our golden file; we assume here that the current state of the interface code is correct).
2. Commit the golden file to the TFS repository.
3. Make changes to the interface code.
4. Invoke the code and compare the resulting file with the corresponding golden file.
5. If the files are equal, the test passes (the change was a refactoring). Otherwise:
   a. Enable the refresh mode, which makes sure that the golden file is overwritten by the file resulting from invoking the interface code.
   b. Invoke the interface code (thus refreshing the golden file).
   c. Investigate the outgoing changes in VS's Team Explorer. If the changes are what we intended with our code changes from step 3, commit the code changes and the golden file. Otherwise, go back to step 3.
This approach works great for us, but it has one drawback: VS only recognizes that the golden files have changed (and thus allows us to investigate the changes) if we use a local workspace. If we use a server workspace, programmatically remove the read-only flag from the golden files and refresh them as described above, VS still does not recognize that the files have changed.
So my question is: Is there any way to make our golden file testing approach work with server workspaces, e.g. by telling VS that some files have changed?
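For readers who want the compare-or-refresh step spelled out, here is a minimal sketch. It is written in Python rather than the C++/Google Test setup described above, and the function name `check_golden` and its `refresh` flag are illustrative assumptions, not part of our actual test code.

```python
from pathlib import Path


def check_golden(produced: str, golden_path, refresh: bool = False) -> bool:
    """Return True if `produced` matches the golden file.

    In refresh mode a mismatch overwrites the golden file, so the
    difference shows up as a pending change to review before commit.
    """
    golden_path = Path(golden_path)
    if golden_path.exists() and golden_path.read_text() == produced:
        return True  # output unchanged: the code change was a refactoring
    if refresh:
        # This write is the step that needs a writable file; in a server
        # workspace the file is read-only until it is checked out.
        golden_path.write_text(produced)
    return False
```

A test would call `check_golden(output, path)` in normal runs and `check_golden(output, path, refresh=True)` only when the refresh mode is switched on.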

I can think of two ways.
First approach is to run a tf checkout instead of removing the Read-Only attribute.
This carries an intrinsic risk, as one may inadvertently check in the generated file; this should be prevented by restricting check-in permissions on those files. You may also need to run tf undo to clean up the local state.
Another approach would be to map the golden files in a different directory and use a local diff tool instead of relying on Visual Studio builtin tool. This is less risky than the other solution, but may be cumbersome. Do not forget that you can "clone" a workspace (e.g. Import Visual Studio TFS workspaces).

Change stored macro SAS

In SAS, the SASMSTORE option lets me specify where the SASMACR catalog lives. This catalog will hold some macros.
At some point I may need to change a macro, and that may happen while the macro, and therefore the catalog, is in use by another user. The catalog will then be locked and cannot be modified.
How can I avoid such a situation?

If you're using a SAS Macro catalog as a public catalog that is shared among colleagues, a few options exist.
First, use SVN or a similar source control option so that you and your colleagues each have a local copy of the macro catalog. This is my preferred option. I'd do this, and I'd also probably not use stored compiled macros (SCMs); personally, I'd just set things up as autocall macros, because that makes it easy to resolve conflicts (you have a separate file for each macro). With SCMs you won't be able to resolve conflicts, so you'll have to make sure everyone is very well behaved about always downloading the newest copy before making any changes, and discusses any changes so you don't have two competing changes made at about the same time. If SCMs are important for your particular use case, you could version control the macros that create the SCMs and build the SCM yourself every time you refresh your local copy of the sources.
Second, you could and should separate development from production here. Even if you have a shared library located on a shared network folder, you should have a development copy as well that is explicitly not locked by anyone except when developing a new macro for it (or updating a currently used macro). Then make your changes there, and on a consistent schedule push them out once they've been tested and verified (preferably in a test environment, so you have the classic three: dev, test, and prod environments). Something like this:
Changes in Dev are pushed to Test on Wednesdays. Anyone who's got something ready to go by Wednesday 3pm puts it in a folder (the macro source code, that is), and it's compiled into the test SCM automatically.
Test is then verified Thursday and Friday. Anything that is verified in Test by 3pm Friday is pushed to the Dev source code folder at that time, paying attention to any potential conflicts in other new code in test (nothing's pushed to dev if something currently in test but not verified could conflict with it).
Production then is run at 3pm Friday. Everyone has to be out of the SCM by then.
I suggest not using Friday for prod if you have something that runs over the weekend, of course, as it risks you having to fix something over the weekend.

Create two folders, e.g. maclib1 and maclib2, and a dataset which stores the current library number.
When you want to rebuild your library, query the current number, increment (or reset to 1 if it's already 2), assign your macro library path to the corresponding folder, compile your macros, and then update the dataset with the new library number.
When it comes to assigning your library, query the current library number from the dataset, and assign the library path accordingly.
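The switching logic is small enough to sketch. This is Python rather than SAS, with a plain text file standing in for the dataset that stores the current library number; `rebuild`, `active_library` and the `compile_into` callback are hypothetical names used only for illustration.

```python
from pathlib import Path

# Folder names taken from the description above.
LIBRARIES = {1: "maclib1", 2: "maclib2"}


def next_number(current: int) -> int:
    """Increment the library number, or reset to 1 if it is already 2."""
    return 1 if current >= 2 else current + 1


def active_library(state_file: Path) -> str:
    """Folder that sessions should assign as their macro library."""
    return LIBRARIES[int(state_file.read_text().strip())]


def rebuild(state_file: Path, compile_into) -> str:
    """Compile into the inactive folder, then make it the active one."""
    new_number = next_number(int(state_file.read_text().strip()))
    target = LIBRARIES[new_number]
    compile_into(target)  # users are still reading the other folder
    # Update the stored number only after compilation has finished.
    state_file.write_text(str(new_number))
    return target
```

The point of the design is that the folder being rebuilt is never the one that new sessions are pointed at, so the lock held by current users does not block the rebuild.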

Should these auxiliary files be under Git version control?

I decided to start using the Git version control system for my C++ project. I'm new to version control. For the trunk things are simple, I just commit all the project versions I have. I kept each version as a separate folder because I knew I'd very soon use Git. But I encountered a problem with my branches.
At some stage of the development, I decided there's one class I want to develop in a branch. Without version control, I had to make a "manual" branch. I copied the most recent header file and source file of that class to a separate folder and started working there. I made several versions there to work with simultaneously. One version was the first prototype of the class according to the plan (for which I made the "branch"). Then I added another file, in which I copied the first one but removed things that seemed unnecessary. This way I have 2 versions: one with all my ideas and features, the other with just what I really use in my code, without what's not in use at the moment.
But then I added more. As development went on, I decided it may be a good idea to make that class a template. So I added a third version, which is just like the second one, but now some functionality implemented using polymorphism is implemented using a template. And I can't tell yet which version is the best, as it's too early to tell, so I want to have all 3 together.
Then I made another special file: A copy of the third version header file, in which each line can be marked or not marked. Marked means I use that specific method or I'm sure it's going to be in use very soon, otherwise the line isn't marked.
Then, some time later, I started a new branch. And for that branch I needed a new version of that class developed in the first branch. So I just copied one of the versions to the new branch's folder and started working there. Now again I had some kind of auxiliary file: I had 2 files, one from which I delete class methods I use, and one into which I write new methods I need to have.
Now I want to start using Git and I wonder: For all the project's text files, plans, diagrams, etc., it's obvious - I keep them outside the Git repo. Whenever collaborative editing is needed I can set up a wiki or something like that. But for all those copies of the same header file, and for those auxiliary "marked" files, what do I do with them? I mean, it's fine by me to have them all in a branch, but what happens when I merge a branch into the trunk? I don't want to have all these copies and versions and lists, just the one final class file I've made.
On one hand, these are C++ source files used while coding. On the other hand they're not part of the pure source code of the software package, they just help me while I work but will never get compiled because in the end there's just the final version of the class which I chose to merge, and all other aux files, lists, etc. are kept just for reference.
What would be the best thing to do?
Thanks for reading my long story :)
EDIT: It's a local repo on my personal computer

Always keep documentation in the same repository as the source code. If you do not, your documentation will rot. This is because documentation is written against some version of your software, so it has to evolve the same way the software does.
If your documentation is automatically generated or compiled into another format, commit only the source data, the makefile and the generator configuration, just like you do with source code.

What you describe is the normal use of branches: You have your master branch ("official", as it were) and a branch to develop a new feature (it doesn't really have to live in a separate directory, if I understand you correctly). Periodically you synchronize the feature branch with the master, either by rebasing it on the master or merging its changes in. In turn, you can well have subordinate branches in which you try out approaches to developing the feature, handled with respect to the feature branch just as that one is with respect to the master. But in that case you have to be careful whenever you rebase.
You should keep any data that isn't easy to recreate in the repository, be it source code, documentation or even design sketches. Stuff that can be recreated (object code, automatically formatted documentation, ...) should be kept out (any change there will create a difference to be checked in). Your repository (particularly unpublished branches) is your own workspace; it can be as messy as you like.
Take a look at the book mentioned at the git homepage.

Well, that’s clearly documentation and not source code, so you should separate it from your source code. As your documentation seems to be branch dependent, you should still check it into the repo, but in a separate doc directory.
About the merging: How a merge works is up to you in the end. Git just has a default merge strategy which is what most people want most of the time. But if you say that a merge into the main branch should just bring the code and not the docu, then that’s fine. Just merge that way:
git merge --no-ff --no-commit mybranch
rm -rf docu-dir
git add -A
git commit

Avoiding unnecessary recompilations using "branchy" development model

I'm using Mercurial for development of quite a large C++ project which takes about 30 minutes to build from scratch (while incremental builds are very quick).
I usually try to implement each new feature in a new branch (using "hg clone"). I may develop several new features during the day, and it quickly gets very boring to wait for each new feature branch to build.
Are there any recipes to somehow re-use object files from other already built branches?
P.S. in git there are named branches within the same repository which make re-usage of the existing object files possible for the build system, however I prefer the simpler Mercurial separate branches model...

I suggest using ccache as a way to speed up compilation of (mostly) the same code tree. It works as follows:
You define the location of the cache with the CCACHE_DIR environment variable (the maximum cache size is set separately, for example with ccache -M)
Your compiler should be set to ccache ${CC} or ccache ${CXX}
ccache takes the output of ${CC} -E and the compilation flags and uses that as a base for its hash. As long as the compiler flags, source file and the headers are all unchanged, the object file will be taken from cache, saving valuable compilation time.
Note that this method speeds up compilation of any source file that eventually produces the same hash. If you share source files across projects, ccache will handle them as well.
If you already use distcc and wish to use it with ccache, set the CCACHE_PREFIX environment variable to distcc.
Using ccache sped up our source tree compilation around tenfold.
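To make the hashing idea concrete, here is a deliberately simplified sketch of a cache keyed on preprocessed source plus compiler flags. It is not ccache's actual implementation (the real tool also takes the compiler itself and other inputs into account); `cache_key` and `ObjectCache` are made-up names, and the "compiler" is just a callback.

```python
import hashlib


def cache_key(preprocessed_source: bytes, compiler_flags) -> str:
    """Hash the preprocessor output together with the compiler flags."""
    h = hashlib.sha256()
    h.update(preprocessed_source)
    h.update(b"\0")  # separator so source and flags cannot run together
    h.update("\0".join(compiler_flags).encode())
    return h.hexdigest()


class ObjectCache:
    """In-memory stand-in for the on-disk cache directory."""

    def __init__(self):
        self._objects = {}

    def compile(self, preprocessed_source, compiler_flags, compiler):
        key = cache_key(preprocessed_source, compiler_flags)
        if key not in self._objects:
            # Cache miss: run the real compiler and remember the result.
            self._objects[key] = compiler(preprocessed_source, compiler_flags)
        return self._objects[key]
```

Because the key depends only on the preprocessed text and the flags, an identical file in a second clone of the repository hits the same cache entry, which is what makes this useful for the clone-per-feature workflow in the question.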

A simple way to speed up your builds could be to use a local "build directory" on your disk. This way you can checkout into this directory and start the build. The first time it will take the full time, but after that it will (hopefully) only rebuild the files where the source code changed.

My Localbranch extension was designed partly around this use case. It uses a single working directory, but I think it's simpler than git. It's essentially a mechanism for maintaining multiple repository clones under one working directory, where only one is active at a given time.

Woops, I missed your P.S. where you don't like having multiple named branches in the same repo and that you prefer separate clones.. sorry about that.
I too have somewhat large C++ projects and the clone-per-feature workflow didn't work for me very well. Firstly, I had to close down my Vim session and then reopen (many of the same) files once I've created the clone. Secondly, like you said, a lot of code must be recompiled unnecessarily. Thirdly, I have to keep track of where I've pushed to and pulled from - gets confusing when you start a new feature and then get sidetracked onto a new one. Before you know it you have many clones and not sure which ones need to be pushed back to your main.
You definitely don't want to use named branches (as I'm sure you know) to handle this as they are quite permanent.
What you need are bookmarks: https://www.mercurial-scm.org/wiki/BookmarksExtension
Bookmarks allow you to create lightweight (and otherwise anonymous) branches per feature by facilitating the naming of heads in your repo. These heads would normally be unnamed and you would have to look at the output of 'hg log' or use some graphical tool to find the revision numbers for the tip of your feature-branch. With bookmarks you can name them descriptive names like 'my-cool-feature' or 'bugfix-392'.
If you like the idea of bookmarks, I'd also recommend my own extension called 'tasks': http://bitbucket.org/alu/hgtasks. This extension works like bookmarks but adds some more functionality. It allows you to create feature-branches (now called tasks) and suppress the pushing of incomplete tasks. This is handy when you have a few feature-branches at once. You may not be ready to push your 'my-cool-feature' task, but 'bugfix-392' is ready to go. Because tasks track a set of changesets (and not just one 'tip' changeset) there are some things you can do with tasks that you can't with bookmarks. See an example workflow here: http://x.zpuppet.org/2009/03/09/mercurial-tasks-extension/.

Mercurial also has local named branches, see the hg branch command.
If you insist on using hg clone to do branchy development, I guess you could try creating a folder link (shortcut under windows) in your repo to a shared obj folder. This will work with hg clone, but I'm not sure your build tool will pick it up.
Otherwise, you probably keep all your repos in one folder - just put your obj folder there (it shouldn't be under source control anyways, imo). Use relative paths to refer to it.

A word of warning: many .o symbol tables (or equivalent) contain the full path name of the source file. If that other file changes (or if the path is not visible from the new directory) you may encounter weirdness when debugging.

C++ Directory Restructuring

I have a source code of about 500 files in about 10 directories. I need to refactor the directory structure - this includes changing the directory hierarchy or renaming some directories.
I am using svn version control. There are two ways to refactor: one preserving svn history (using svn move command) and the other without preserving. I think refactoring preserving svn history is a lot easier using eclipse CDT and SVN plugin (visual studio does not fit at all for directory restructuring).
But right now since the code is not released, we have the option to not preserve history.
Still there remains the task of changing the include directives of header files wherever they are included. I am thinking of writing a small script using python - receives a map from current filename to new filename, and makes the rename wherever needed (using something like sed). Has anyone done this kind of directory refactoring? Do you know of good related tools?
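A minimal sketch of the kind of script I have in mind, assuming the map is keyed by the path exactly as it appears inside the quotes of each #include. The name `rewrite_includes` is just for illustration, and angle-bracket includes are deliberately left alone.

```python
import re

# Matches lines such as:   #include "old/path.h"
# Group 2 is the quoted path; groups 1 and 3 are kept unchanged.
INCLUDE_RE = re.compile(r'^([ \t]*#[ \t]*include[ \t]*")([^"]+)(")', re.MULTILINE)


def rewrite_includes(source: str, renames: dict) -> str:
    """Replace quoted include paths according to `renames` (old -> new)."""

    def replace(match):
        old_path = match.group(2)
        # Paths that are not in the map are written back unchanged.
        return match.group(1) + renames.get(old_path, old_path) + match.group(3)

    return INCLUDE_RE.sub(replace, source)


if __name__ == "__main__":
    example = '#include "util/foo.h"\n#include <vector>\n'
    print(rewrite_includes(example, {"util/foo.h": "core/foo.h"}))
```

Running it over the tree would then be a loop that reads each source file, calls `rewrite_includes`, and writes the file back only if the text changed.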

If you're having to rewrite the #includes to do this, you did it wrong. Change all your #includes to use a very simple directory structure, at most two levels deep and only using a second level to organize around architecture or OS dependencies (like sys/types.h).
Then change your make files to use -I include paths.
Voila. You'll never have to hack the code again for this, and compiles will blow up instantly if something goes wrong.
As far as the history part, I personally find it easier to make a clean start when doing this sort of thing; archive the old one, make a new repository v2, go from there. The counterargument is when there is a whole lot of history of changes, or lots of open issues against the existing code.
Oh, and you do have good tests, and you're not doing this with a release coming right up, right?

I would preserve the history, even if it takes a small amount of extra time. There's a lot of value in being able to read through commit logs and understand why function X is written in a weird way, or that this really is an off-by-one error because it was written by Oliver, who always gets that wrong.
The argument against preserving the history can be made for the following users:
your code might have embarrassing things, like profanity and fighting among developers
you don't care about the commit history of your code, because it's not going to change or be maintained in the future
I did some directory refactoring like this last year on our code base. If your code is reasonably structured at the beginning, you can do about 75-90% of the work using scripts written in your language of choice (I used Perl). In my case, we were moving from a set of files all in one big directory to a series of nested directories based on namespaces. So, a file that declared the class protocols::serialization::SerializerBase was located in src/protocols/serialization/SerializerBase. The mapping from the old name to the new name was trivial, so doing a find and replace on #includes in every source file in the tree was straightforward, although it was a big change. There were a couple of weird edge cases that we had to fix by hand, but that seemed a lot better than either having to do everything by hand or having to write our own C++ parser.
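The namespace-to-directory mapping really is a one-liner. Here is a sketch in Python rather than the Perl I actually used; the function name and the `root` and `suffix` parameters are assumptions made for the example.

```python
from pathlib import PurePosixPath


def path_for_class(qualified_name: str, root: str = "src", suffix: str = "") -> str:
    """Map a fully qualified C++ class name to its new file location."""
    parts = qualified_name.split("::")
    # Each namespace becomes a directory; the class name is the file stem.
    return str(PurePosixPath(root, *parts)) + suffix


if __name__ == "__main__":
    print(path_for_class("protocols::serialization::SerializerBase", suffix=".h"))
```

Building the old-name to new-name table for the find and replace is then a matter of applying this to every class in the tree.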

Hacking up a shell script to do the svn moves is trivial. In tcsh it's foreach F ( $FILES ) ... end to adjust a set of files. Perl & Python offer better utility.
It really is worth saving the history. Especially when trying to track down some exotic bug. Those who do not learn from history are doomed to repeat it, or some such junk...
As for altering all the files... There was a similar question just the other day over at:
https://stackoverflow.com/questions/573430/c-include-header-path-change-windows-to-linux/573531#573531
