I have a build process which includes making changes to files and committing them to source control.
I am moving to Mercurial and trying to figure out the best way to do this.
The problem is that the process pulls from the main repository at the start and pushes back at the end (about two hours later). If someone pushes changes to the repository during that time, the final push will fail because it would create another head.
The obvious solution is to pull and merge before pushing, but in theory someone could still make changes even in this smaller time window.
What is the best way to handle this situation?
In general, source control is for human output, not build artifacts. Consider having the build artifacts go into a dedicated artifact repository or a separate repo. If that's not workable, maybe have the build merge default into a ci branch, do the build, and commit to ci. Then you'll always have the commit from the build right after the code that went into it, and you can pull from the ci branch on your deploys.
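A minimal sketch of that ci-branch approach in Mercurial (the branch name and repo URL are illustrative; the ci branch has to be created once beforehand with hg branch ci):

# Pull everything and merge default into the dedicated build branch.
hg pull https://hg.example.com/main
hg update ci
hg merge default
hg commit -m "Merge default into ci"

# ... run the long build, which modifies files ...

hg addremove
hg commit -m "Build output for $(hg id -i)"

# Pushing only adds commits on the ci branch, so concurrent pushes
# to default no longer make this push fail with a new remote head.
hg push https://hg.example.com/main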
The problem is that the source code distribution is not exactly the code that runs after installation. The installer, which runs when the site is accessed for the first time, generates a lot of code. Also, a running system stores some data in PHP source code (e.g. user profiles, under the /user_privileges directory) rather than in the database. So, I have the following unsatisfactory possibilities.
(1) Put the original source code under VC and edit it. In this case I have to do a fresh install and run the installer every time to see how my changes are working.
(2) Put the installed source code (after the installer has run) under VC, and edit it. In this case I have immediate feedback, but I can't use that code for new installations. I also have to exclude everything that the running system writes in the source tree from the VC.
Any suggestions?
I am working with Vtiger CRM version 6.0Beta, but any tips relevant to version 5 would help.
Thanks.
Choice 1 is appropriate. VC must always track the source code, not the products of any interpreter or processing. I feel your pain. It is so easy to tweak that Vtiger source code, and VC tends to be left by the wayside.
Get familiar with Git. Really, it is what you want. Look here, I already did it:
Copy the original code into one branch
Copy the modified code into another branch
Make a diff or, better, run git format-patch
Install (check out) your new version
Check the patches and apply them, if necessary (a command-level sketch follows).
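A hedged sketch of those steps (branch names and the exact patch workflow are illustrative):

# Branch 'vanilla' holds the untouched distribution; 'custom' holds your edits.
git init vtiger && cd vtiger
git checkout -b vanilla
# ... copy in the original source ...
git add -A && git commit -m "Vanilla source"
git checkout -b custom
# ... copy in your modified tree ...
git add -A && git commit -m "Our customizations"

# Turn every commit on 'custom' into a numbered patch file:
git format-patch vanilla..custom

# After installing the new version, commit it onto 'vanilla',
# branch again, and replay the patches with 3-way merging:
git checkout -b custom-new
git am --3way 0001-*.patch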
Bonuses
Have a private and a public remote for your repo, so that you can keep track of the messy files in user_privileges and friends in private, but share code with others
Have an absolutely beautiful backup with daily rollback by just setting up a branch, a remote and a cronjob.
Being able to replicate the live situation within minutes for local development
Pain-free updates!
I know, this is no easy task, but once done, it will make your life much easier.
Our software is split into multiple components.
MSBuild scripts automate our build, and batch scripts invoke them.
We currently do a daily build of our components, even when only small changes have been made.
We want to move to continuous integration so that whenever a check-in happens a build is triggered.
Our MSBuild scripts are written in such a way that they build all the .sln files for a component.
In continuous integration, do I only need to build the .sln files that were modified?
If I only build the changed items, do I have to write an MSBuild script for each .sln?
Can I simply use the existing msbuild script in teamcity?
I have done both, and there are advantages and disadvantages to both:
Incremental Build (only what's changed)
This is the default behavior of MSBuild. Even when you build in Visual Studio, it only builds what changed (unless you choose "Rebuild All").
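For reference, the difference is just the MSBuild target you invoke (the solution name here is hypothetical):

# Incremental: compiles only projects whose inputs changed.
msbuild MyComponent.sln /t:Build /p:Configuration=Release

# Full rebuild: cleans first, then compiles everything from scratch.
msbuild MyComponent.sln /t:Rebuild /p:Configuration=Release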
Pros
Faster. The faster your feedback loop, the better.
Less load on the build server's disk and network, since you don't have to check everything out of source control each time. This may be important in high-load build clusters.
Cons
Possible source control hiccups. We have had issues where a complicated rename or restructuring of our source tree caused the checkout to fail. We were using Subversion, so it was the update that failed. Had we been doing a clean checkout, this would not have happened (of course, a clean checkout means we must do a full rebuild).
Chance of false positives. We had a case where we weren't completely wiping the build agent's source files and checking out fresh copies from source. Someone changed the build file such that it didn't copy the binaries properly before testing them, but since the old binaries were still there on the disk, it was running the tests from them. The build was broken, but it was running old binary tests, so we didn't realize we had introduced bugs until I spotted the problem a week later.
Full Checkout and Rebuild
Pros
More robust. You rule out source control update issues and the chance of false positives because you are starting with a clean slate each and every time. This removes the possibility that a previous build could affect this build.
Cons
Much slower. This involves waiting for a full checkout from source control, which may take a long time if the project is large. Plus it requires building the entire source tree from scratch.
Higher load on build agent, disk & network. Because you are checking out and rebuilding everything, you will tax the CPU and disk of the build agent more, and also the network (and your source control system as well).
Third Option: Incremental Checkout and Rebuild
This is the case where you only pull incremental changes from source control, but perform a full rebuild.
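In command terms, the CI step would look roughly like this (illustrative; the solution name is hypothetical):

# Pull only the changes into the existing working copy...
svn update
# ...but still rebuild the whole tree from scratch.
msbuild MyComponent.sln /t:Rebuild /p:Configuration=Release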
Pros
Faster than full checkout and is actually only slightly slower than incrementally building. Build times are usually small compared to the time it takes to do a full checkout, or run your tests.
Somewhat more robust since you rebuild all the source, but false positives can still sneak in.
Cons
Possible source control hiccups. See my comments on the incremental build.
Chance of false positives. See my comments on the incremental build.
Which is best?
That depends on the balance between your need for fast feedback (speed), and the resource constraints on your network and build agent(s). If you had lots of resources and wanted both the fast feedback and the robustness of a full rebuild, you could build twice. Perhaps on checkin, do an incremental build, but nightly perform a full checkout and rebuild.
Both, sort of.
We do a scheduled daily full build, and incremental builds during the day (gated check-in).
It all depends on how big and complex your build really is, but you should err towards a full build if you are choosing one or the other.
Whatever your granularity of build is, my bias would be to do a full, clean build of that whole component. If that build time is excessively long for a CI loop, consider breaking the component into smaller parts that depend on each other's binaries. In that case, you would build what changed, plus any other components that depend on the component that changed. In the .NET world, I'm hearing that NuGet is getting increasingly popular for managing those binary dependencies.
I talk a lot more about this strategy in Practice 2 of my CI pain relief series.
We have a lot of small, usually one-off, projects that all get put as apps inside a Django project, and I'd like to remove them from the codebase to keep it clean, and also to not have to worry about year-old projects when doing upgrades to our existing codebase.
Should I just flat out do a git rm src/clients/my_project/ (and remove all references) together with a git commit -m "Removed my_project"? It seems like it would be less obvious that a whole project had been removed if it's just another commit message that disappears in the noise.
In a few cases we want to recover an old codebase, as some clients request their projects to be re-run or we are adapting something that already exists, but that is definitely not the majority of cases. How do I make it reasonably obvious that there's an old project that can be recovered?
I suppose one solution would be to switch to one git repo per project, but these projects are very small and don't seem to warrant the overhead of setting up GitHub, Jenkins and the servers for the deploys, etc.
Has anyone solved this in their own organisation?
It probably is a good idea to use a separate repo per project. You get much cleaner histories, you don't have to worry about checking out lots of code for projects that are irrelevant, and it's just generally more flexible.
That said, to solve your immediate problem of losing the projects in the noise, you could create a tag for every deletion.
git commit -m 'Removed my_project'
git tag -a deletes/my_project -m 'Deletion of my_project'
This way you can see all the project deletions in your tag list, and you can find them again easily if you need to reference the project (just remember to look at the parent of deletes/my_project to actually get that project's code).
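Listing and recovering then looks like this (the project path is hypothetical):

# See every deleted project at a glance:
git tag -l 'deletes/*'

# Restore a project's files from the commit just before the deletion:
git checkout deletes/my_project^ -- src/clients/my_project/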
"Small one off projects" sounds like a perfect candidate for topic branches. Especially if they share lots of common code with your main projects. Branches are what git is really good at.
Branching off small projects means you can use the full power of git to manage your projects. You can easily merge or rebase related bug fixes from your main projects into your small projects. You can easily diff between projects. You can branch off a sub-project from a similar project. And if a project turns out to be useful you can merge it back to your master branch.
Also, it makes it easy to deploy projects independently. Just check out the different projects to different servers (or different vhosts on the same server).
When a project becomes big enough to warrant its own repo you can easily convert that branch into its own repo by cloning it again. If you don't want the old history you can get rid of it by either cloning with depth 1 or squashing its history.
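A hedged sketch of that split (paths and branch names are hypothetical):

# Keep the branch's full history in a standalone repository:
git clone --branch my_project --single-branch /path/to/main-repo my_project

# Or drop the history and keep only the latest snapshot:
git clone --depth 1 --branch my_project file:///path/to/main-repo my_project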
We've had problems recently where developers commit code to SVN that doesn't pass unit tests, fails to compile on all platforms, or even fails to compile on their own platform. While this is all picked up by our CI server (Cruise Control), and we've instituted processes to try to stop it from happening, we'd really like to be able to stop the rogue commits from happening in the first place.
Based on a few other questions around here, it seems to be a Bad Idea™ to force this as a pre-commit hook on the server side mostly due to the length of time required to build + run the tests. I did some Googling and found this (all devs use TortoiseSVN):
http://cf-bill.blogspot.com/2010/03/pre-commit-force-unit-tests-without.html
Which would solve at least two of the problems (it wouldn't build on Unix), but it doesn't reject the commit if it fails. So my questions:
Is there a way to make a pre-commit hook in TortoiseSVN cause the commit to fail?
Is there a better way to do what I'm trying to do in general?
There is absolutely no reason why your pre-commit hook can't run the Unit tests! All your pre-commit hook has to do is:
Checkout the code to a working directory
Compile everything
Run all the unit tests
Then fail the hook if the unit tests fail.
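For illustration only, such a hook might look like the sketch below. It assumes a Unix shell and a make-based build; materializing the transaction's tree is the fiddly part (here, HEAD is checked out and the changed files are overlaid via svnlook):

#!/bin/sh
# pre-commit: $1 = repository path, $2 = transaction id
REPOS="$1"
TXN="$2"
WC=$(mktemp -d)
trap 'rm -rf "$WC"' EXIT

# Start from the current HEAD...
svn checkout -q "file://$REPOS" "$WC" || exit 1

# ...then overlay every path touched by this transaction.
svnlook changed -t "$TXN" "$REPOS" | while read -r action path; do
    case "$action" in D*) rm -rf "$WC/$path"; continue ;; esac
    case "$path" in
        */) mkdir -p "$WC/$path" ;;                     # added directory
        *)  mkdir -p "$WC/$(dirname "$path")"
            svnlook cat -t "$TXN" "$REPOS" "$path" > "$WC/$path" ;;
    esac
done

# Build and test; a nonzero exit (plus a message on stderr) rejects the commit.
cd "$WC" && make && make test || { echo "Build or tests failed." >&2; exit 1; }
exit 0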
It's completely possible to do. And, afterwards, everyone in your development shop will hate your guts.
Remember that in a pre-commit hook, the entire hook has to complete before it can allow the commit to take place and control can be returned to the user.
How long does it take to do a build and run through the unit tests? 10 minutes? Imagine doing a commit and sitting there for 10 minutes waiting for your commit to take place. That's the reason why you're told not to do it.
Your continuous integration server is a great place to do your unit testing. I prefer Hudson or Jenkins over CruiseControl. They're easier to set up, and their web pages are more user-friendly. Even better, they have a variety of plugins that can help.
Developers don't like it to be known that they broke the build. Imagine if everyone in your group got an email stating you committed bad code. Wouldn't you make sure your code was good before you committed it?
Hudson/Jenkins have some nice graphs that show you the results of the unit testing, so you can see from the webpage what tests passed and failed, so it's very clear exactly what happened. (CruiseControl's webpage is harder for the average eye to parse, so these things aren't as obvious).
One of my favorite Hudson/Jenkins plugins is the Continuous Integration Game. In this plugin, users are given points for good builds, fixing unit tests, and creating more passing unit tests. They lose points for bad builds and breaking unit tests. There's a scoreboard that shows all the developers' points.
I was surprised by how seriously developers took it. Once they realized that their CI game scores were public, they became very competitive. They would complain when the build server itself failed for some odd reason and they lost 10 points for a bad build. However, the number of failed unit tests dropped way, way down, and the number of unit tests that were written soared.
There are two approaches:
Discipline
Tools
In my experience, #1 can only get you so far.
So the solution is probably tools. In your case, the obstacle is Subversion. Replace it with a DVCS like Mercurial or Git. That will allow every developer to work on their own branch without the merge nightmares of Subversion.
Every once in a while, a developer will mark a feature or branch as "complete". That is the time to merge the feature branch into the main branch. Push that into a "staging" repository which your CI server watches. The CI server can then pull the last commit(s), compile and test them and only if this passes, push them to the main repository.
So the loop is: main repo -> developer -> staging -> main.
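A hedged sketch of the gatekeeper step on the CI server (the repo URLs and the test command are hypothetical):

# Triggered by a push to the staging repository:
hg pull https://hg.example.com/staging
hg update tip
# Promote to the main repository only if the build is green:
make test && hg push https://hg.example.com/main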
There are many answers here which give you the details. Start here: Mercurial workflow for ~15 developers - Should we use named branches?
[EDIT] So you say you don't have the time to solve the major problems in your development process ... I'll let you guess how that sounds to anyone... ;-)
Anyway ... Use hg convert to get a Mercurial repo out of your Subversion tree. If you have a standard setup, that shouldn't take much of your time (it will just need a lot of time on your computer but it's automatic).
Clone that repo to get a work repo. The process works like this:
Develop in your second clone. Create feature branches for that.
If you need changes from someone, run the conversion again into the first clone, then pull from it into your second clone (that way, you always have a "clean" copy from Subversion just in case you mess up).
Now merge the Subversion branch (default) and your feature branch. That should work much better than with Subversion.
When the merge is OK (all the tests run for you), create a patch from a diff between the two branches.
Apply the patch to a local checkout from Subversion. It should apply without problems. If it doesn't, you can clean your local checkout and repeat. No chance to lose work here.
Commit the changes in Subversion, convert them back into repo #1 and pull them into repo #2.
This sounds like a lot of work but within a week, you'll come up with a script or two to do most of the work.
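A hedged sketch of that workflow (URLs, paths and branch names are all illustrative):

# One-time setup; re-running 'hg convert' later is incremental.
hg convert http://svn.example.com/trunk svn-mirror
hg clone svn-mirror work

# Develop on a feature branch in the work clone:
cd work
hg branch my-feature
# ... edit, hg commit ...

# Sync: refresh the mirror, pull, then merge default into the feature branch.
(cd ../svn-mirror && hg convert http://svn.example.com/trunk .)
hg pull ../svn-mirror
hg merge default
hg commit -m "Merge default into my-feature"

# Export the feature as a single patch and apply it to a clean SVN checkout.
hg diff -r default -r my-feature > ../feature.diff
cd ../svn-checkout && svn update
patch -p1 < ../feature.diff
svn commit -m "My feature"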
When you notice someone broke the build (the tests aren't passing for you anymore), undo the uncommitted merge (hg update -C) and continue to work on your working feature branch.
When your colleagues complain that someone broke the build, tell them that you don't have a problem. When people start to notice that your productivity is much better despite all the hoops you've got to jump through, mention "it would be much simpler if we scrapped SVN".
The best thing to do is to work to improve the culture of your team, so that each developer feels enough of a commitment to the process that they'd be ashamed to check in without making sure it works properly, in whatever ways you've all agreed.
OK, so I have a Django project. I was wondering if I'm supposed to put each app in its own git repository, or is it better to just put the whole project into a git repository, or whether I should have a git repo for each app and also a git repo for the project?
Thanks.
It really depends on whether those reusable apps are actually going to be reused outside of the project.
The only reason you might need a separate repo is if other projects with separate repos might need to use an app. But you can always split it out later; making a git repo is cheap, so it's one of those things you can do when it becomes necessary. Making things complicated up front will just frustrate you later, so feel free to wait until you know it's necessary.
YAGNI: it applies to more than just code.
I like Tom's and Alex's answers, except they lack the real rationale behind them ("easier to have one repository per development team"? "substantial numbers of people might be interested in pulling out (or watching changes to) separate apps"? Why?).
First of all, one or several repositories is a server-side administration decision (in terms of server resources).
SVN easily sets up several repositories behind one server, whereas ClearCase will have its own "vob_server" process per VOB, meaning you do not want more than a hundred VOBs per (Unix) server (or more than 20-30 on a Windows server).
Git is particular: setting up a repository is cheap, and accessing it can be easy (through a simple shared path) or can involve a process (a git daemon). The latter means: not too many repositories accessed directly from the outside. (They can be accessed indirectly through submodules referenced by a super-project.)
Then there is the client-side administration: how complex will the configuration of a workspace be when one or several repositories are involved? How can the client reference the right configuration (the list of labels needed to reference the correct files)?
SVN would use externals, Git submodules; in both cases, that adds complexity.
That is why Tom's answer can be correct.
Finally, there is the configuration management aspect (referenced in Alex's answer). Can you tag part of a repo or not?
If you can (as in SVN, where you actually make an svn copy of part of a repo), that means you can have a component approach, where several groups of files have their own life cycle and their own tags (set at their own individual pace).
But in Git, that is not possible: a tag references a commit, which always concerns the whole repository.
That means a "system-based" approach, where you only ever want the whole project anyway (as opposed to the "watching separate apps - I've never observed it in real life" bit from Alex's answer). If that is the case (if you want the whole system anyway), the limitation is not important.
But for those of us who think in terms of "independent groups of files", it means a git repository actually represents an individual group of files (with its own rhythm in terms of evolution and tagging), with potentially a super-project referencing those repositories as submodules.
That is not your everyday setup, so for independent projects, I would recommend a single Git repo, or only a few.
But for a more complex, interdependent set of projects... you need to realize that by the very way Git has been conceived, a "repository" represents a coherent set of files supposed to evolve at the same pace, as a whole. And not everything can always fit in one set of files (if the "all" is complex enough). Hence, several repositories would be required in this case.
And a complex set of inter-dependent projects does happen in real life too ;)
It is almost always easier to have one repository per development team, no matter how many products you have. Eventually you will want to share code between projects (even several separate Django websites!) and it is much easier to do with only one repository.
By all means set up a nice folder structure WITHIN that repository. However, a single checkout from git (probably from a subfolder) should give you all the files you need for your test website (python, templates, css, jpegs...), and then you just copy httpd.conf and so on to complete the installation.
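Git doesn't natively check out just a subfolder the way SVN does, but newer versions can narrow the working tree if you only want one project's files on a machine (a sketch; the URL and path are hypothetical, and sparse-checkout needs Git 2.25+):

git clone --no-checkout git@example.com:team/main-repo.git
cd main-repo
git sparse-checkout set websites/mysite
git checkout master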
I'm no git expert, but I don't believe the appropriate strategy would be any different from one for hg, or even, in this case, svn, so I'm going to give my 2 cents' worth anyway;-).
A repo per app makes any sense only if substantial numbers of people might be interested in pulling out (or watching changes to) separate apps; this could theoretically be the case but I've never observed it in real life -- mostly, people who are interested in the project are interested in it as a whole. Therefore, I would avoid complications and make the whole proj into one repo.
Do those individual apps utilize shared code and libraries? If so, they really belong in the same repository, which enables you to see the impact of new changes when running a single testing suite.
If they are completely separate and agnostic, it really doesn't matter. Just for sanity, ease of building and ease of packaging, I prefer to keep a whole project in a combined repository. However, we typically work in very small teams. So, if no shared code is involved, it's really just a question of which method is most convenient and efficient for all concerned.
If you have reusable apps, put them in a separate repo.
If you are worried about the number of private repositories, have a look at Bitbucket (if you decide to make them open source, I advise GitHub).
There is a nice way of including your own apps into the project, even making use of version tags:
I recently found a way to do this with buildout and git tags
(I used svn:externals to include an app, but switched from svn to git recently).
I tried mr.developer first, but while failing to get that to work I found an alternative to mr.developer:
I found gp.vcsdevelop very easy to use for this purpose.
See https://pypi.python.org/pypi/gp.vcsdevelop
I ended up putting this in my buildout file and got it working at once (I had to add a pip requirements.txt to my app to get it working, but that's a good thing after all):
[buildout]
vcs-update = True
extensions =
    gp.vcsdevelop
    buildout-versions
develop-dir = ./local_checkouts
vcs-extend-develop = git+git@bitbucket.org:<my bitbucket username>/<the app I want to include>.git@0.1.38#egg=<the appname in django>
develop = .
In this case it checks out my app at version tag 0.1.38 into the project, in the subdirectory ./local_checkouts/, and develops it when running bin/buildout.
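Typical invocation, assuming the project uses the classic bootstrap.py setup:

python bootstrap.py
bin/buildout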
Edit: remark, 26 August 2013
While using this solution and editing in this local checkout of the app used in a project, I found the following:
you will get this warning when trying the normal 'git push origin master' command:
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.
EDIT 28 August 2013
About working in the local_checkouts for shared apps included by gp.vcsdevelop
(and handling the warning discussed in the remark above):
A forced git push origin +master seems to screw up the commit history for the shared code.
So the way to work in the local_checkout dir is like this:
go into the local checkout (after a bin/buildout, so the checkout is at master):
cd localcheckouts/<shared appname>
Create a new branch and switch to it like this:
use git checkout -b issue_nr1 (here the branch is named after the issue you are working on)
and after you are done working in this branch (after the usual git add and git commit), use:
git push origin issue_nr1
when tested and complete, merge the branch back into master:
first check out master:
git checkout master
update (probably only needed when there have been other commits in the meantime):
git pull
and merge into master (where you are right now):
git merge issue_nr1
and finally push this merge:
git push origin master
(with special thanks to this simplified guide to git: http://rogerdudler.github.io/git-guide/)
and after a while, to clean up the branches, you might want to delete this branch:
git branch -d issue_nr1