Handling relations between multiple subversion projects

Handling relations between multiple subversion projects - c++

In my company we are using one SVN repository to hold our C++ code. The code base is composed from a common part (infrastructure and applications), and client projects (developed as plugins).
The repository layout looks like this:
Infrastructure
App1
App2
App3
project-for-client-1
App1-plugin
App2-plugin
Configuration
project-for-client-2
App1-plugin
App2-plugin
Configuration
A typical release for a client project includes the project data and every project that is used by it (e.g. Infrastructure).
The actual layout of each directory is -
Infrastructure
branches
tags
trunk
project-for-client-2
branches
tags
trunk
And the same goes for the rest of the projects.
We have several problems with the layout above:
It's hard to start a fresh development environment for a client project, since one has to checkout all of the involved projects (for example: Infrastructure, App1, App2, project-for-client-1).
It's hard to tag a release in a client projects, for the same reason as above.
In case a client project needs to change some common code (e.g. Infrastructure), we sometimes use a branch. It's hard to keep track which branches are used in projects.
Is there any way in SVN to solve any of the above? I thought of using svn:externals in the client projects, but after reading this post I understand it might not be right choice.

You could handle this with svn:externals. This is the url to a spot in an svn repo
This lets you pull in parts of a different repository (or the same one). One way to use this is under project-for-client2, you add an svn:externals link to the branch of infrastructure you need, the branch of app1 you need, etc. So when you check out project-for-client2, you get all of the correct pieces.
The svn:externals links are versioned along with everything else, so as project-for-client1 get tagged, branched, and updated the correct external branches will always get pulled in.

A suggestion is to change directory layout from
Infrastructure
branches
tags
trunk
project-for-client-1
branches
tags
trunk
project-for-client-2
branches
tags
trunk
to
branches
feature-1
Infrastructure
project-for-client-1
project-for-client-2
tags
trunk
Infrastructure
project-for-client-1
project-for-client-2
There are some problems with this layout too. Branches become massive, but at least it's easier to tag specific places in your code.
To work with the code, one would simply checkout the trunk and work with that. Then you don't need the scripts that check out all the different projects. They just refer to Infrastructure with "../Infrastructure". Another problem with this layout is that you need to checkout several copies if you want to work on projects completely independently. Otherwise a change in Infrastructure for one project might cause another project not to compile until that is updated too.
This might make releases a little more cumbersome too, and separating the code for different projects.

Yeah, it sucks. We do the same thing, but I can't really think of a better layout.
So, what we have is a set of scripts that can automate everything subversion related. Each customer's project will contain a file called project.list, which contains all of the subversion projects/paths needed to build that customer. For example:
Infrastructure/trunk
LibraryA/trunk
LibraryB/branches/foo
CustomerC/trunk
Each script then looks something like this:
for PROJ in $(cat project.list); do
# execute commands here
done
Where the commands might be a checkout, update or tag. It is a bit more complicated than that, but it does mean that everything is consistent, checking out, updating and tagging becomes a single command.
And of course, we try to branch as little as possible, which is the most important suggestion I can possibly make. If we need to branch something, we will try to either work off of the trunk or the previously-tagged version of as many of the dependencies as possible.

First, I don't agree that externals are evil. Although they aren't perfect.
At the moment you're doing multiple checkouts to build up a working copy. If you used externals it would do exactly this, but automatically and consistently each time.
If you point your externals to tags (and or specific revisions) within the target projects, you only need to tag the current project per release (as that tag would indicate exactly what external you were pointing to). You'd also have a record within your project of exactly when you changed your externals references to use a new version of a particular library.
Externals aren't a panacea - and as the post shows there can be problems. I'm sure there's something better than externals, but I haven't found it yet (even conceptually). Certainly, the structure you're using can yield a great deal of information and control in your development process, using externals can add to that. However, the problems he had weren't fundamental corruption issues - a clean get would resolve everything, and are pretty rare (are you really unable to create a new branch of a library in your repo?).
Points to consider - using recursive externals. I'm not sold on either the yes or no of this and tend to go with a pragmatic approach.
Consider using piston as the article suggests, I've not seen it in action so can't really comment, it may do the same job as externals in a better way.

From my experience I think it's more useful to have a repository for each individual project. Otherwise you have the problems you say and furthermore, revision numbers change if other projects change which could be confusing.
Only when there's a relation between individual projects, such as software, hardware schematics, documentation, etc. We use a single repository so the revision number serves to get the whole pack to a known state.

Related

How and where should I put a version number in my Django project?

I'm making a Django project consisting of several apps and I want to use a version number for the whole project, which would be useful for tracking the status of the project between each time it comes to production.
I've read and googled and I've found how to put a version number for each django app of mine, but not for a whole project.
I assume that the settings.py (in my case it would be base.py, because the settings are inherited for each environment: developmente, pre-production, production) would be the ideal file for storing it, but I would like to know good practices from other Django programmers, because I haven't found any.
Thank you in advance

I don't think I've ever needed to do this, but the two obvious choices would be either the settings file, as you state, or alternatively the __init__.py in the main project app.

You don't need it to relate to django, you can tag a commit in your source control to provide a marker of a particular version (as well as a separate branch for releases).
From the docs for git tagging
Git has the ability to tag specific points in history as being important. Typically people use this functionality to mark release points (v1.0, and so on).
You could use the same versioning number system as google if you so wish which relates to
year.month.day.optional_revision # i.e 2016.05.03 for today
Doing this would make it easier to track back to previous versions since it won't be overwritten in source code by newer version numbers.

What's good practice for archiving a portion of your unused code to be pulled back in later if needed?

We have a lot of small usually one-of projects that all get put as apps inside of a Django project, and I'd like to remove the them from the code to keep it clean, and also to not have to worry about year old projects when doing upgrades to our existing codebase.
Should I just flat out do a git rm src/clients/my_project/ (and remove all references) together with a git commit -m "Removed my_project". It seems like it would be less obvious that a whole project had been removed if it's just another commit message and would disappear in the noise.
In a few cases we want to recover an old codebase as some clients requests their projects to be re-run or we are doing an adaptation upon something already existing, but definitely in the majority of cases. How do I make it reasonable and obvious there's an old project that can be recovered?
I suppose one solution would be to change to having one git repo per-project, however these projects are very small and doesn't seem to warrant the overhead of setting up github, jenkins and the servers for the deploys etc.
Have anyone solved this in their own organisation?

It probably is a good idea to use a separate repo per project. You get much cleaner histories, you don't have to worry about checking out lots of code for projects that are irrelevant, and it's just generally more flexible.
That said, to solve your immediate problem of losing the projects in the noise, you could create a tag for every deletion.
git commit -m 'Removed my_project'
git tag -a deletes/my_project -m 'Deletion of my_project'
This way you can see all the project deletions in your tag list, and you can find them again easily if you need to reference the project (just remember to look at the parent of deletes/my_project to actually get that project's code).

"Small one off projects" sounds like a perfect candidate for topic branches. Especially if they share lots of common code with your main projects. Branches are what git is really good at.
Branching off small projects means you can use the full power of git to manage your projects. You can easily merge or rebase related bug fixes from your main projects into your small projects. You can easily diff between projects. You can branch off a sub-project from a similar project. And if a project turns out to be useful you can merge it back to your master branch.
Also, it makes it easy to deploy projects independently. Just check out the different projects to different servers (or different vhosts on the same server).
When a project becomes big enough to warrant its own repo you can easily convert that branch into its own repo by cloning it again. If you don't want the old history you can get rid of it by either cloning with depth 1 or squashing its history.

One project, Multiple customers with git?

Im new to GIT and dont know yet how much it will fit my needs, but it looks impressive.
I have a single webapp that i use for differents customers (django+javascript)
I plan to use GIT to handle these differents customers version as branches. Each customer can have custom files, folders and settings, improved versions... but the should share the same 'core'. We are a small team and suscribed a github account.
Is the branch the good way to handle this case ?
About the settings file, how would you proceed ? Would you .gitignore the customer specific settings file and add a settings.xml.sample file for example is the repo ?
Also, is there any way to prevent some files to be merged into master ? (but commited to the customer branch). For example, id like to save some customer data to the customer branch but prevent from to being commited to master.
Is the .gitignore file branch-specific ? YES
EDIT
After reading all your answers (thanks!) i decided to first refactor my django project structure to isolate the core and my differents apps in an apps subfolder. Doing this makes a cleaner project, and tweaking the .gitignore file makes it easy to use git branches to manage the differents customers and settings !
Ju.

In addition to cpharmston's answer, it sounds like you need to do some refactoring to separate out what is truly custom for each client and what isn't. Then you may consider adding additional repositories to track the customizations for each client (entirely new repos, not branches). Then your deployment can pull your "core" from your main repo, and the client-specific stuff from that repo.

I would not use branches to accomplish what you are trying to do.
In source control, branches are intended to be used for things that are meant to be merged back into trunk. For example, Alex Gaynor spent his summer of code working on a branch of Django that allows for support for multiple databases, with the goal of eventually merging it back into the Django trunk.
Checkouts (or clones, in Git's case) might better suit what you are trying to do. You would create a repo containing all of the project's base files (and .sample files, if you will), and clone the repo to all the various locations you wish to deploy the code. Then manually create the configuration and customization files at each deployment (take care not to add them to the repo). Whenever you update the code in the repo, run a pull on each deployment to update the code. Viola!

Other answers are correct that you'll be in the best shape for maintenance to the extent that you separate out your core code from the custom per-client code. However, I'll break from the crowd and say that if you're unable to do that (say because you need to add extra functionality to core code for a certain client), DVCS branches would work just fine for what you want to do. Though I'd probably recommend per-directory branches rather than in-repo branches for this purpose (git can do per-directory branches as well, it's nothing but a cloned repo that diverges).
I use hg, not git, but all of my Django projects are cloned from the same base "project template" repo that has utility scripts, a basic common set of INSTALLED_APPS, etc. This means when I make changes to that project template, I can easily merge those common updates into existing projects. This isn't exactly the same as what you're planning, but is similar. You will occasionally have to deal with merge conflicts, if you modify the same area of code in the core that you've already customized for a specific client.

Matthew Talbert is correct, you really need to separate the custom stuff from non-custom stuff. If you can refactor all the core code to be contained in one directory, your clients can use it as a read-only git submodule. The added benefit is that you lock them into an explicit version of the core code. This means that they would have to consciously update to a newer revision, which is what you want for production code.

After reading all your answers (thanks!) i decided to first refactor my django project structure to isolate the core and my differents apps in an apps subfolder. Doing this makes a cleaner project, and tweaking the .gitignore in the differents branches file makes it easy to use git branches to manage the differents customers and settings !

Best practice for managing project variants in Git?

I have to develop two Django projects which share 90% of the same code, but have some variations in several applications, templates and within the model itself.
I'm using Git for distributed source-control.
My requirements are that :
common code for both projects is developed in one place (Project1's development environment)
periodically this is merged into the development environment of the second project (Project2)
the variations are not easily encapsulated within apps. (eg. there are apps. such as "profiles" which vary between Project1 and Project2 but for which there's also an ongoing common evolution)
both Project1 and Project2 have public repositories, so that I can collaborate with others
similarly Project1 and Project2 ought to have development, demo, staging and production servers.
however, the public repository is not on the same server in both cases. So, for example, when I'm developing in Project1, I want to be able to "push" to my github server, but not have Project2 stuff going there.
there are files such as local_settings.py which are completely different between Project1 and Project2, but should be shared between multiple developers of each Project
So what's the best way of managing this situation?
What would seem to be ideal would be something like a "filtered pull" where instead of .gitignore saying "ignore this file entirely", I can say "ignore this file when pulling from that repo" I couldn't see anything quite like that in the documentation, but might there be something like that?

Move the common code to its own library and make it a dependency of those two projects. This is not a question of version control, but a question of code reuse, design and eliminating duplication.

Considering it is a Django / Pinax site, with variants scattered around inside several different applications, I would not recommend using submodules.
The variants should be managed independently in project1 branch and project2 branch, removing the need to "filter" a gitignore result.
If you identify some really common codes, they may end up in a third repo you could then "subtree merged" to project1 and project2 repositories (the meaning of subtree merge strategy being illustrated in this SO answer)

You can use two different git branches for development. When you make changes in one that are common to the other, just git-cherrypick them over. You can push and pull specific branches, too, so that no one ever need know you're working on both of them at the same time.

I would try submodules: http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#submodules

I would make a third repo where I would place the code that Projects share. Then Project1 and Project2 would have their own repo and they can pull from that "shared" third repo.
I think that your idea of "filtered pull" will make it difficult to mantein.

Do you version control the invidual apps or the whole project or both?

OK, so I have a Django project. I was wondering if I'm supposed to put each app in its own git repository, or is it better to just put the whole project into a git repository, or whether I should have a git repo for each app and also a git repo for the project?
Thanks.

It really depends on whether those reusable apps are actually going to be reused outside of the project or not?
The only reason you might need it in a different repo is if other projects with seperate repos might need to use it. But you can always split it out later. Making a git repo is cheap so it's one of those things you can do later if it becomes necessary. Making things complicated up front is just going to frustrate you later so feel free to wait till you know it's necessary
YAGNI it's for more than just code.

I like Tom's or Alex's answers, except they lack real rationale behind them
("easier to have one repository per development team" ?
"substantial numbers of people might be interested in pulling out (or watching changes to) separate apps"
why ?)
First of all, one or several repositories is an server-side administration decision (in term of server resources).
SVN easily sets up several repositories behind one server, whereas ClearCase will have its own "vob_server" process per VOB, meaning: you do not want more than a hundred VOB per (Unix) server (or more than 20-30 on a Windows server).
Git is particular: setting a repository is cheap, accessing it can be easy (through a simple shared path), or can involved a process (a git daemon). The latest solution means: "not to much repositories accessed directly from outside". (they can be accessed indirectly through submodules referenced by a super-project)
Then there is the client-side administration: how complex will the configuration of a workspace be, when one or several repositories are involved. How can the client references the right configuration? (list of labels needed to reference the correct files).
SVN would use externals, git "submodules", and in both case, that adds complexity.
That is why Tom's answer can be correct.
Finally, there is the configuration management aspect (referenced in Alex's answer). Can you tag part of a repo of not ?
If you can (like in SVN where you actually make a svn-copy of part of a repo), that means you can have a component approach, where several groups of files have their own life cycle, and their own tags (set at their own individual pace).
But in Git, that is not possible: a tag references a commit which always concerns the all repository.
That means a "system-based" approach, where you only want the all project anyway (as opposed to the "watching separate apps - I've never observed it in real life" bit from Alex's answer). If that is the case (if you want the all system anyway), that is not important.
But for those of us who think in term of "independent groups of files", that means a git repository actually represents an individual group of files (with its own rhythm in term of evolutions and tagging), with potentially a super-project referencing those repositories as submodules.
That is not your everyday setup, so for independent projects, I would recommend only few or one Git repo.
But for more complex interdependent set of projects... you need to realize that by the very way Git has been conceived, a "repository" represents a coherent set of files supposed to evolve at the same pace as a all. And not "all" can always fit in one set of files (if "all" is complex enough). Hence, several repositories would be required in this case.
And a complex set of inter-dependent projects does happen in real life too ;)

Almost always easier to have one repository per development team, no matter how many products you have. Eventually you will want to share code between projects (even several seperate DJango websites!) and it is much easier to do with only one repository.
By all means set up a nice folder structure WITHIN that repository. However, a single checkout from git (probably from a subfolder) should give you all the files you need for your test website (python, templates, css, jpegs...), and then you just copy httpd.conf and so on to complete the installation.

I'm no git expert, but I don't believe the appropriate strategy would be any different from one for hg, or even, in this case, svn, so I'm going to give my 2 cents' worth anyway;-).
A repo per app makes any sense only if substantial numbers of people might be interested in pulling out (or watching changes to) separate apps; this could theoretically be the case but I've never observed it in real life -- mostly, people who are interested in the project are interested in it as a whole. Therefore, I would avoid complications and make the whole proj into one repo.

Do those individual apps utilize shared code and libraries? If so, they really belong in the same repository .. which enables you to see the impact of new changes when running a single testing suite.
If they are completely separate and agnostic, it really doesn't matter. Just for sanity, ease of building and ease of packaging, I prefer to keep a whole project in a combined repository. However, we typically work in very small teams. So, if no shared code is involved, its really just a question of which method is most convenient and efficient for all concerned.

If you have reusable apps, put them in a seperate repo.
If you are worried about the number of private repositories have a look at bitbucket (if you decide to make them open source, I advice github)
There is a nice way of including your own apps in to the project, even making use of version tags:
I recently found out a way to do this with buildout and git tags
(I used svn:externals to include an app, but switched from svn to git recently).
I tried mr.developer first, but while not getting this to work i found an alternative for mr.developer:
I found out gp.vcsdevlop is very easy to use for this purpose.
see https://pypi.python.org/pypi/gp.vcsdevelop
I ended up, putting this in my buildout file and got it working at once (I had to add a pip requirements.txt to my app to get it working, but that's a good thing after all):
vcs-update = True
extensions =
gp.vcsdevelop
buildout-versions
develop-dir=./local_checkouts
vcs-extend-develop=git+git#bitbucket.org:<my bitbucket username>/<the app I want to include>.git#0.1.38#egg=<the appname in django>
develop = .
in this case it chacks out my app and clone it to version tag 0.1.38 in to the project in the subdit ./local_checkouts/
and develops it when running bin/buildout
Edit: remark 26 august 2013
While using this solution and editing in this local checkout of the app used in a project. I found out this:
you will get this warning when trying the normal 'git push origin master' command:
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.
EDIT 28 august 2013
About working in the local_checkouts for shared apps included by gp.vcsdevelop:
(and handle the warning discussed in the remakr above)
git push origin +master seems to screw up commit history for the shared code
So the way to work in the local_checkout dir is like this:
go in to the local checkout (after a bin/buildout so the checkout is the master):
cd localcheckouts/<shared appname>
Create a new branch and swith to it like this:
use git checkout -b 'issue_nr1' (e.g. the name of the branch is here named after the issue you are working on)
and after you are done working in this branche, use: (after the usuals git add and git commit)
git push origin issue_nr1
when tested and completed merge the branch back in the master:
first checkout the master:
git checkout master
update (probably only neede when other commits in the mean time)
git pull
and merge to the master (where you are in right now):
git merge issue_nr1
and finaly push this merge:
git push origin master
(with sepcial thanks to this simplified guide to git: http://rogerdudler.github.io/git-guide/)
and after a while, to clean up the branches, you might want to delete this branch
git branch -d issue_nr1

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js