Has anyone come up with a good solution for debugging executables between different code versions using git? Basically, I want to be able to generate side-by-side executables between my recent set of commits and an older set of commits, as well as keep track of the relevant code to go with it, without everything overwriting when jumping between the commits.
Currently my method of doing this is to make/compile the current code and rename the executable and whatever code I'm trying to analyze in debugger (gdb in my case). Now, to the old code, I like to checkout a new branch before checking out the older commit because I'm paranoid and it builds the reflog for extra safety and check-pointing (I know it's not really necessary). I checkout old code and run the make file on the older commit. Now I have my side-by-side executables that I can run with gdb. It's all a bit tedious though, especially if a few headers/files have changed (I like to break up my code) and also if I now want to make changes and recompile (I've got tons of references/includes and I basically just have to start the process over again). Anyone have a better method?
I recently had a similar problem when trying to debug/upgrade a library which needed to be properly installed on the system (using make install) and make jumps between last stable and development versions.
Since version 2.6.0 (about three years from now), you now can use
git worktree
It basicaly enables yourself to turn any directory, at any place, into an annex of an official git local repository. It does by filling a .git textfile (instead of the regular subdirectory) at its top level, containing informations that point the original one. On the repository side, this annex is declared as so and a new branch is created.
Consequently, you now have the possibility to perform git checkout onto two different revisions simultaneously (one per working directory). And you may add as many annexes as you need. When you're done, simply delete all useless annexes, then call git worktree prune to make it dismiss everything that no longer exists (you may use git worktree lock to prevent some of them to be deleted if their annex directory is to be unavailable sometimes, with removable devices for instance). This will let you compile two different versions of the same application at the same time.
If, however, you need to retrieve a particular revision of your soft but you can't tell which one before you have compiled it, you may want to use
git bisect run
…instead, which will automatically call a script that tells if a revision is good or bad. It's useful when a compilation is rather long and when the whole search can span over several days.
Here's a link to the documentation page related to git worktree.
Related
The problem is that the source code distribution is not exactly the code that runs after installation. The installer, which runs when the site is accessed for the first time, generates a lot of code. Also, a running system stores some data in php source code (e.g. user profiles - under the /user_privileges directory) rather than in the database. So, I have the following unsatisfactory possibilities.
(1) Put the original source code under VC and edit it. In this case I have to do a fresh install and run the installer every time to see how my changes are working.
(2) Put the installed source code (after the installer has run) under VC, and edit it. In this case I have immediate feedback, but I can't use that code for new installations. I also have to exclude everything that the running system writes in the source tree from the VC.
Any suggestions?
I am working with Vtiger CRM version 6.0Beta, but any tips relevant to version 5 would help.
Thanks.
Choice 1 is appropriate. VC must always track the source code, not the products of any interpreter or processing. I feel your pain. It is so easy to tweak that Vtiger source code, and VC tends to be left by the wayside.
Get familiar with GIT. Really, it is what you want. Look here, I already did it.
Copy the original code in one branch
Copy the modified code into another branch
Make a diff or better, run git format-patch
Install (checkout) your new Version
Check the patches and apply them, if necessary.
Bonusses
Have a private and a public remote for your repo, so that you can keep track on the crumpy files in user_privileges and friends in private, but share code with others
Have an absolutely beautiful backup with daily rollback by just setting up a branch, a remote and a cronjob.
Beeing able to replicate the live situation within minutes for local development
Painfree Updates !!
I know, this is no easy task, but once done, it will make your live pretty much easier.
It seems that a build system best-practice is to have a single script that can build all source and package the releases. See Joel Test #2
How do you account for non-portable dependencies? For example, if you code for .net 4, then you need .net 4 installed on the box. Standard MS release .net 4 is not xcopy deployable (unless I'm mistaken?). I can see a few avenues:
the dependencies are clearly stated in some resource file (wiki, txt, whatever). When you
call the build script, the build will fail if you don't have the dependency installed. This is an acceptable outcome.
The build script is responsible for setting up the environment. So if you require .net 4 and its not on the box then it installs it for you.
A flavor of #2 - instead of installing dependencies, the script spawns a pre-packaged image (virtual machine, Amazon EC2 AMI) that is setup with all dependencies
???
For implementing a build script you have to ask yourself, how much work you want/can spent on it. This leads to the question how often you have to set up the build environment. I can see #2 would be the perfect solution, but i would need a lot of work, since usually you have more than one non portable dependency.
So we use #1 one. And it works quite well. The most important thing is, that the build script is starting with some sort of self-test. It looks for everything which is needed to build the whole software and gives an error if something is not found. And it gives a clear error message, so that any new guy knows what to do to make it running. Of course as with a lot of software it is nearly never finished and gets extended by needs. The drawback that this test can take some seconds is insignificant when whole build process needs more than minutes.
A wiki (or even sth. else) with the setup solution was not a good solution for us, since after three month nobody knows where this was, but the build script is used every day.
The build script itself is a set of a lot of different things, which where chosen by needs. It is starting with a batch (we are using Windows) which invokes a lot of other things. Other batches, MSBuild, home grown tools. Each step by it self is checking for its own dependencies, to have the problem local and you can see three lines later why this special thing is needed.
Number 2 states "Can you make a build in one step?" As described this means for a development team to be effective the build process must be as simple as possible to reduce errors in the build process and insure consistency. This is especially important as a team gets larger. You want to make sure everyone is building the same thing. (What is done with that package should also be simple, but it is not as important IMHO.) Msbuild is great at this; they provide the facilities to set up a build server that access the source control system independently so the developers actions can't corrupt the build environment. I highly recommend setting up a build server using TFS -- many build issues will go away and you will have the 1-click build Joel describes.
As for your points about what that package does for deployment -- you have many options with MS, but the more "one click" you can make it the better. I believe this is slightly different than Joel's #2. In his example he describes changing what software he will use for the install not because one performs with fewer steps, but instead because one can be incorporated into a one step build.
I've been looking into using Mercurial for version management and wanted to try it out. I'm planning on doing a project on my own to help me learn C++ coming from Java and thought it would be a good idea to try version management with it. It would probably be a game or some sort of simple application, not sure with a GUI or not.
The problem is I don't know much about version management. When in a project is a good time to make the repository? Should I just do it right away? How often should I make commits?
I'll be using Netbeans for my development since it seems to be pretty good, also has Mercurial controls built in, but what should I set in the .hgignore for my project? I assume the repository would be the whole folder Netbeans creates but what should I make it ignore?
Bonus questions: I'll be using Bitbucket for hosting/backup, should I make the project public? Why/why not? Also would that make it open source? Should I bother with attaching a license? Because I have no idea how the licenses work and such. Then again, I sincerely doubt I'll be making anything anyone will want to copy too badly.
On repos:
You should use a VCS repository as soon as possible. Not only does it allow for off-site backups, but it allows you to track changes. To find out when a bug was introduced.
Having a VCS, particularly a distributed VCS like Mercurial, will change the way you code. You won't have to be careful not to break things, because you always have the old version that you can revert to. If, after a few weeks or months, you decide that a particular course of development was a bad idea, you can rewind all the way back to some prior point.
You should generally commit either every day, or every time you finish a task. Like, you might have a 4-hour task of "write this class and get it working". You commit after that. When you're done for the day, you commit. I wouldn't suggest committing in a non-compiling state, though, so you should try to stop for the day only when everything still builds.
As for ignoring, you should ignore things you don't intend to commit. Things that you wouldn't want in the repo. Generated files, temporary files, the stuff that you wouldn't need.
What should be in the repo is 100% of the information necessary to build the project from scratch (minus external dependencies).
On licenses:
I would say that it is very rude to put up a public repository without some kind of license attached. Without a license, that means that anyone even pulling from the repo (which is what you're inviting by making it public) is a violation of your copyright without direct, explicit permission from you.
So either look at how copyright works and pick a license to release your stuff under, or keep your repos private.
The instant you conceive of it, usually. There is no penalty for putting everything you have into a VCS right away, and who knows, you may want that super-duper prototype version later on down the road. Unless you're going to be checking in large media files (don't), my answer is: right now.
As for how often to commit, that's a personal preference. I use git, and as far as I'm concerned there is no problem with checking in every time you make some unit of significant change. The more snapshots along the way you have, the easier it becomes to track down bugs later on.
Your .hgignore is up to you. You need to decide what to version and what not to. As a rule of thumb, any files generated during compilation probably shouldn't be checked in.
Also, if you release it to the public, please attach a license. It doesn't matter what it is, but it makes all the bosses and lawyers in the world happy when they know what they can and can't do with your code. Finally, please don't release something to the public unless you think someone else will benefit from it. The internet already suffers from information overload.
Commit early, and commit often.
With distributed VCS like hg or git there's really no reason not to start a repo before you even write any code. If you decide to scrap your project, you can always delete your repo before you post it to whatever hosting site you're using. The more often you commit, the easier it will be to hone in on where you messed up later on (you will).
Stuff to put in your ignore file would be build files and anything that your IDE generates periodically, like user preferences and tag caches. You don't want to foist your preferences on someone else and you don't want to have to create a commit every time you change the position of a window in the IDE. What's in the repo should be the minimum that someone would need to be able to work on your project.
If you're just doing a small project to learn, then I wouldn't worry about licenses or having a private repo. You can always add a license later. Alternatively, just pick a permissive license and go with it.
Based on my experience with git, I would recommend the following:
Create repository along with project.
Commit as often as you can with detailed comments
When you want to test a new idea - create a branch. If the idea is successful, merge your code to the master branch. If the idea is not successful, you can just discard the branch. Probably this is the most important idea.
Put the names of object files, executables and editor backup files in the ignore list.
In my company we are using one SVN repository to hold our C++ code. The code base is composed from a common part (infrastructure and applications), and client projects (developed as plugins).
The repository layout looks like this:
Infrastructure
App1
App2
App3
project-for-client-1
App1-plugin
App2-plugin
Configuration
project-for-client-2
App1-plugin
App2-plugin
Configuration
A typical release for a client project includes the project data and every project that is used by it (e.g. Infrastructure).
The actual layout of each directory is -
Infrastructure
branches
tags
trunk
project-for-client-2
branches
tags
trunk
And the same goes for the rest of the projects.
We have several problems with the layout above:
It's hard to start a fresh development environment for a client project, since one has to checkout all of the involved projects (for example: Infrastructure, App1, App2, project-for-client-1).
It's hard to tag a release in a client projects, for the same reason as above.
In case a client project needs to change some common code (e.g. Infrastructure), we sometimes use a branch. It's hard to keep track which branches are used in projects.
Is there any way in SVN to solve any of the above? I thought of using svn:externals in the client projects, but after reading this post I understand it might not be right choice.
You could handle this with svn:externals. This is the url to a spot in an svn repo
This lets you pull in parts of a different repository (or the same one). One way to use this is under project-for-client2, you add an svn:externals link to the branch of infrastructure you need, the branch of app1 you need, etc. So when you check out project-for-client2, you get all of the correct pieces.
The svn:externals links are versioned along with everything else, so as project-for-client1 get tagged, branched, and updated the correct external branches will always get pulled in.
A suggestion is to change directory layout from
Infrastructure
branches
tags
trunk
project-for-client-1
branches
tags
trunk
project-for-client-2
branches
tags
trunk
to
branches
feature-1
Infrastructure
project-for-client-1
project-for-client-2
tags
trunk
Infrastructure
project-for-client-1
project-for-client-2
There are some problems with this layout too. Branches become massive, but at least it's easier to tag specific places in your code.
To work with the code, one would simply checkout the trunk and work with that. Then you don't need the scripts that check out all the different projects. They just refer to Infrastructure with "../Infrastructure". Another problem with this layout is that you need to checkout several copies if you want to work on projects completely independently. Otherwise a change in Infrastructure for one project might cause another project not to compile until that is updated too.
This might make releases a little more cumbersome too, and separating the code for different projects.
Yeah, it sucks. We do the same thing, but I can't really think of a better layout.
So, what we have is a set of scripts that can automate everything subversion related. Each customer's project will contain a file called project.list, which contains all of the subversion projects/paths needed to build that customer. For example:
Infrastructure/trunk
LibraryA/trunk
LibraryB/branches/foo
CustomerC/trunk
Each script then looks something like this:
for PROJ in $(cat project.list); do
# execute commands here
done
Where the commands might be a checkout, update or tag. It is a bit more complicated than that, but it does mean that everything is consistent, checking out, updating and tagging becomes a single command.
And of course, we try to branch as little as possible, which is the most important suggestion I can possibly make. If we need to branch something, we will try to either work off of the trunk or the previously-tagged version of as many of the dependencies as possible.
First, I don't agree that externals are evil. Although they aren't perfect.
At the moment you're doing multiple checkouts to build up a working copy. If you used externals it would do exactly this, but automatically and consistently each time.
If you point your externals to tags (and or specific revisions) within the target projects, you only need to tag the current project per release (as that tag would indicate exactly what external you were pointing to). You'd also have a record within your project of exactly when you changed your externals references to use a new version of a particular library.
Externals aren't a panacea - and as the post shows there can be problems. I'm sure there's something better than externals, but I haven't found it yet (even conceptually). Certainly, the structure you're using can yield a great deal of information and control in your development process, using externals can add to that. However, the problems he had weren't fundamental corruption issues - a clean get would resolve everything, and are pretty rare (are you really unable to create a new branch of a library in your repo?).
Points to consider - using recursive externals. I'm not sold on either the yes or no of this and tend to go with a pragmatic approach.
Consider using piston as the article suggests, I've not seen it in action so can't really comment, it may do the same job as externals in a better way.
From my experience I think it's more useful to have a repository for each individual project. Otherwise you have the problems you say and furthermore, revision numbers change if other projects change which could be confusing.
Only when there's a relation between individual projects, such as software, hardware schematics, documentation, etc. We use a single repository so the revision number serves to get the whole pack to a known state.
OK, so I have a Django project. I was wondering if I'm supposed to put each app in its own git repository, or is it better to just put the whole project into a git repository, or whether I should have a git repo for each app and also a git repo for the project?
Thanks.
It really depends on whether those reusable apps are actually going to be reused outside of the project or not?
The only reason you might need it in a different repo is if other projects with seperate repos might need to use it. But you can always split it out later. Making a git repo is cheap so it's one of those things you can do later if it becomes necessary. Making things complicated up front is just going to frustrate you later so feel free to wait till you know it's necessary
YAGNI it's for more than just code.
I like Tom's or Alex's answers, except they lack real rationale behind them
("easier to have one repository per development team" ?
"substantial numbers of people might be interested in pulling out (or watching changes to) separate apps"
why ?)
First of all, one or several repositories is an server-side administration decision (in term of server resources).
SVN easily sets up several repositories behind one server, whereas ClearCase will have its own "vob_server" process per VOB, meaning: you do not want more than a hundred VOB per (Unix) server (or more than 20-30 on a Windows server).
Git is particular: setting a repository is cheap, accessing it can be easy (through a simple shared path), or can involved a process (a git daemon). The latest solution means: "not to much repositories accessed directly from outside". (they can be accessed indirectly through submodules referenced by a super-project)
Then there is the client-side administration: how complex will the configuration of a workspace be, when one or several repositories are involved. How can the client references the right configuration? (list of labels needed to reference the correct files).
SVN would use externals, git "submodules", and in both case, that adds complexity.
That is why Tom's answer can be correct.
Finally, there is the configuration management aspect (referenced in Alex's answer). Can you tag part of a repo of not ?
If you can (like in SVN where you actually make a svn-copy of part of a repo), that means you can have a component approach, where several groups of files have their own life cycle, and their own tags (set at their own individual pace).
But in Git, that is not possible: a tag references a commit which always concerns the all repository.
That means a "system-based" approach, where you only want the all project anyway (as opposed to the "watching separate apps - I've never observed it in real life" bit from Alex's answer). If that is the case (if you want the all system anyway), that is not important.
But for those of us who think in term of "independent groups of files", that means a git repository actually represents an individual group of files (with its own rhythm in term of evolutions and tagging), with potentially a super-project referencing those repositories as submodules.
That is not your everyday setup, so for independent projects, I would recommend only few or one Git repo.
But for more complex interdependent set of projects... you need to realize that by the very way Git has been conceived, a "repository" represents a coherent set of files supposed to evolve at the same pace as a all. And not "all" can always fit in one set of files (if "all" is complex enough). Hence, several repositories would be required in this case.
And a complex set of inter-dependent projects does happen in real life too ;)
Almost always easier to have one repository per development team, no matter how many products you have. Eventually you will want to share code between projects (even several seperate DJango websites!) and it is much easier to do with only one repository.
By all means set up a nice folder structure WITHIN that repository. However, a single checkout from git (probably from a subfolder) should give you all the files you need for your test website (python, templates, css, jpegs...), and then you just copy httpd.conf and so on to complete the installation.
I'm no git expert, but I don't believe the appropriate strategy would be any different from one for hg, or even, in this case, svn, so I'm going to give my 2 cents' worth anyway;-).
A repo per app makes any sense only if substantial numbers of people might be interested in pulling out (or watching changes to) separate apps; this could theoretically be the case but I've never observed it in real life -- mostly, people who are interested in the project are interested in it as a whole. Therefore, I would avoid complications and make the whole proj into one repo.
Do those individual apps utilize shared code and libraries? If so, they really belong in the same repository .. which enables you to see the impact of new changes when running a single testing suite.
If they are completely separate and agnostic, it really doesn't matter. Just for sanity, ease of building and ease of packaging, I prefer to keep a whole project in a combined repository. However, we typically work in very small teams. So, if no shared code is involved, its really just a question of which method is most convenient and efficient for all concerned.
If you have reusable apps, put them in a seperate repo.
If you are worried about the number of private repositories have a look at bitbucket (if you decide to make them open source, I advice github)
There is a nice way of including your own apps in to the project, even making use of version tags:
I recently found out a way to do this with buildout and git tags
(I used svn:externals to include an app, but switched from svn to git recently).
I tried mr.developer first, but while not getting this to work i found an alternative for mr.developer:
I found out gp.vcsdevlop is very easy to use for this purpose.
see https://pypi.python.org/pypi/gp.vcsdevelop
I ended up, putting this in my buildout file and got it working at once (I had to add a pip requirements.txt to my app to get it working, but that's a good thing after all):
vcs-update = True
extensions =
gp.vcsdevelop
buildout-versions
develop-dir=./local_checkouts
vcs-extend-develop=git+git#bitbucket.org:<my bitbucket username>/<the app I want to include>.git#0.1.38#egg=<the appname in django>
develop = .
in this case it chacks out my app and clone it to version tag 0.1.38 in to the project in the subdit ./local_checkouts/
and develops it when running bin/buildout
Edit: remark 26 august 2013
While using this solution and editing in this local checkout of the app used in a project. I found out this:
you will get this warning when trying the normal 'git push origin master' command:
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.
EDIT 28 august 2013
About working in the local_checkouts for shared apps included by gp.vcsdevelop:
(and handle the warning discussed in the remakr above)
git push origin +master seems to screw up commit history for the shared code
So the way to work in the local_checkout dir is like this:
go in to the local checkout (after a bin/buildout so the checkout is the master):
cd localcheckouts/<shared appname>
Create a new branch and swith to it like this:
use git checkout -b 'issue_nr1' (e.g. the name of the branch is here named after the issue you are working on)
and after you are done working in this branche, use: (after the usuals git add and git commit)
git push origin issue_nr1
when tested and completed merge the branch back in the master:
first checkout the master:
git checkout master
update (probably only neede when other commits in the mean time)
git pull
and merge to the master (where you are in right now):
git merge issue_nr1
and finaly push this merge:
git push origin master
(with sepcial thanks to this simplified guide to git: http://rogerdudler.github.io/git-guide/)
and after a while, to clean up the branches, you might want to delete this branch
git branch -d issue_nr1