Our team is struggling with issues around the idea of a library repository. I have used Maven, and I am familiar with Ivy. We have little doubt that for third-party jars, a common repository that is integrated into the build system has tremendous advantages.
The struggle is around how to handle artifacts that are strictly internal. We have many artifacts built with a handful of different build tools, including Maven, but essentially they are all part of one product (one team responsible for all of it). Unfortunately, we are not currently producing a single artifact per project, but we're headed in that direction. Developers do and will check out all the projects.
We see two options:
1) Treat all artifacts, even internal ones, like any third-party jar. Each jar gets built and published to the repository, and other projects refer to the repository for all of their dependencies.
2) Each project refers to other "sibling" projects directly. There is a "master project" that triggers the builds for all other projects in an appropriate dependency order. In the IDE (Eclipse), each project refers to its dependent projects (source) directly. The build tools reference the .jar produced by the sibling project.
It's very clear that the open-source world is moving towards the repository model. But it seems to us that their needs may be different. Most such projects are very independent and we strongly suspect users are seldom making changes across projects. There are frequent upgrades that are now easier for clients to track and be aware of.
However, it does add a burden in that you have to separately publish changes. In our case, we just want to commit to source control (something we do 20-50 times a day).
I'm aware that Maven might solve all these problems, but the team is NOT going to convert everything to Maven. Other than Maven, what do you recommend (and why)?
It's not necessary to choose only one of your options. I successfully use both in combination. If a project consists of multiple modules, they are all built together and then delivered to the repository. However, the upload only happens for "official" or "release" builds. The ongoing development builds are done on the developers' machines. You don't have to use Maven for this. Ivy would work, or even a manual process. The "artifact repository" could be one of the excellent products available or just a filesystem mount point.
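If you went the Ivy route, a minimal sketch of this setup might look like the following; the repository path, module name, and target names are only placeholders, not a prescription:

    <!-- ivysettings.xml: the "repository" is just a shared mount point -->
    <ivysettings>
        <settings defaultResolver="internal"/>
        <resolvers>
            <filesystem name="internal">
                <ivy pattern="/mnt/repo/[organisation]/[module]/[revision]/ivy.xml"/>
                <artifact pattern="/mnt/repo/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
            </filesystem>
        </resolvers>
    </ivysettings>

    <!-- build.xml fragment: only the release target publishes to the repository;
         development builds stop at the local jar in dist/ -->
    <project name="mymodule" default="release" xmlns:ivy="antlib:org.apache.ivy.ant">
        <target name="release">
            <ivy:resolve/>
            <ivy:publish resolver="internal" pubrevision="1.0.0" status="release" overwrite="true">
                <artifacts pattern="dist/[artifact].[ext]"/>
            </ivy:publish>
        </target>
    </project>

The point is that the publish step is an explicit, occasional action, while the day-to-day commits and builds carry on unchanged.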
It's all about figuring out the appropriate component boundaries.
When you say "developers do and will check out all projects", that's fine. You never know when you might want to make a change; there's no harm in having a working copy ready. But do you really want to make every developer to build every artifact locally, even if they do not need to change it? Really, is every developer changing every single artifact?
Or maybe you just don't have a very big product, or a very big team. There's nothing wrong with developing in a single project with many sub-projects (which may themselves have sub-modules). Everybody works on everything together. A single "build all" from the top does the job. Simple, works, fast (enough). So what would you use a shared repository for in this case? Probably nothing.
Maybe you need to wait until your project/team is bigger before you see any benefit from splitting things up.
But I guarantee you this: you have some fundamental components, which are depended on (directly or indirectly) by many projects but do not themselves depend on any internal artifacts. Ideally, these components wouldn't change very much in a typical development cycle. My advice to you is twofold:
set up an internal repository since you already agree that you will benefit from doing so, if only for third-party jars.
consider separating the most fundamental project into a separate build, and delivering its artifact(s) to the repository, then having the rest of the system reference it as if it were a third-party artifact.
If a split like this works, you'll find that you're rebuilding that fundamental piece only when needed, and the build cycle for the rest of the project (for your developers) is faster by a corresponding amount: win-win. Also, you'll be able to use a different delivery schedule (if desired) for the fundamental component(s), making the changes to it more deliberate and controlled (which is as it should be for such a component). This helps facilitate growth.
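For the consuming projects, the fundamental component then looks exactly like a third-party jar in the dependency file. A minimal sketch with Ivy (the organisation and module names below are invented):

    <ivy-module version="2.0">
        <info organisation="com.yourcompany" module="some-app"/>
        <dependencies>
            <!-- the fundamental internal component, resolved from the internal repository -->
            <dependency org="com.yourcompany" name="core" rev="1.2.+"/>
            <!-- a third-party jar, declared in exactly the same way -->
            <dependency org="commons-lang" name="commons-lang" rev="2.6"/>
        </dependencies>
    </ivy-module>

The same idea works with Maven coordinates or any other dependency manager; what matters is that the core artifact is versioned and consumed, not rebuilt, by the rest of the system.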
If a project produces multiple jar files (and many do) I would carefully consider (case by case) whether or not each jar will ever be reused by other projects.
If yes, that jar should go into the repository as a library, because it facilitates reuse by allowing you to specify it as a dependency.
If no, it would be a so-called "multi-project" in which the whole project is built from its parts. The individual jars probably do not need to be stored individually in the repo. Just the final completed artifact.
Gradle is definitely a candidate for you: It can use Ant tasks and call Ant scripts, understands Maven pom files, and handles multi-projects well. Maven can do much of this as well, and if you have lots of patience, Ant+Ivy can.
I've got a solution with many projects that are dependent on one another (a large program, about 200 projects).
A lot of these projects are compiled as static libs and are linked into other projects that use link-time code generation.
Now, let's say I want to test something and change a single .cpp file somewhere, and I don't want to re-install the whole thing, so I just want to replace the DLLs that are affected by the change.
How do I find all the DLLs that were re-created and are affected by the change?
If you're using a version control system (which you probably are), and you check in DLLs before deployment (which you possibly don't), you can ask the VCS what DLLs have changed.
Because that's probably the right place in your workflow for this intelligence: since you want a compact deployment, you need to create a checkpoint each time you deploy (in this case, by checking in your deployable objects).
Hi, I am trying to find out when a full build is required and when a partial build is sufficient.
There are many articles, but I am not able to find specific answers.
Below are my thoughts:
Full build is required when:
1. The way dependent modules are built changes:
---a change in build options or in the optimization techniques used.
2. The object layout changes:
---any change in a header file, such as adding or deleting methods in a class.
---changing the object size by adding or removing variables or virtual functions.
---data alignment changes using #pragma pack.
3. There is any change in global variables.
Partial build is sufficient when:
1. There is a change in the logic, as long as it does not alter the specified interface.
2. There is a change in a stack variable.
In an ideal world a full build should never be necessary, because the build tools would automatically detect when one of their dependencies has changed.
But this is true only in the ideal world. In practice, build tools are written by humans, and humans
make mistakes, so the tools may not take every possible change into account,
are lazy, so the tools may not take some changes into account at all.
For you this means you need some experience with your build tools. A well-written makefile may take everything into account, and you will rarely have to do a full build. But in the 21st century a makefile is not really state of the art any more, and makefiles become complex very quickly. Today's development environments do a fairly good job of finding dependencies, but for larger projects you may have dependencies that are hard to fit into your development environment's model, and you will end up writing a script.
So there is no real answer to your question. In practice it is good to do a full rebuild for every release, and this rebuild should be possible by pressing just one button. Do a partial build for daily work, since nobody wants to wait two hours to see whether the code compiles or not. But even in daily work a full rebuild is sometimes necessary, because the linker/compiler/(your choice of tool here) did not recognize even the simplest change.
Using TDD, I'm considering creating a (throw-away) empty project as a test harness/container for each new class I create, so that it exists in a little private bubble.
When I have a dependency and need to get something else from the wider project, I have to do some work to add it to my clean project file, and I'm forced to think about that dependency. Assuming my class has a single responsibility, I ought not to have to do this very much.
Another benefit is an almost instant compile / test / edit cycle.
Once I'm happy with the class, I can then add it to the main project/solution.
Has anyone done anything similar before or is this crazy?
I have not generally done this (creating an empty project to test a new class), although it could happen if I don't want to modify the current projects in my editor.
The advantages could be:
you are sure not to modify the main project, or commit it by accident
you know with certainty that there are no dependencies
The drawbacks could be:
it costs some time ...
as soon as you want to add one dependency on your main project, you instantly get all the classes in that project ... not what you want
thinking about dependencies is normal; we don't usually need an empty project to do so
some tools check your project dependencies to verify they follow a set of rules; it could be better to use one of those (as they can be used not only when starting a class, but also later on).
the private-bubble concept can also be found in the form of import statements.
current development environments on current machines already give you extra-fast operations ... if not, you could do something about it (tell us more ...)
when you are done, you would need to copy your main class and your test class to your regular project. This can cost you time, especially as the package might not be adequate (the simplest possible one early on because your project is empty, but one suited to your regular project later).
Overall, I'm afraid this would not be a timesaver... :-(
I have been to a presentation on using Endeavour. One of the concepts they relied on heavily was decoupling, as you suggest:
each service in a separate solution with its own testing harness
Endeavour is, in a nutshell, a powerful development environment/plugin for VS which helps achieve these things. Among a lot of other things, it also hooks into/creates a nightly build from SourceSafe to determine which DLLs build, and places those in a shared folder.
When you create code which depends on another service, you don't reference the VS project but the compiled DLL in the shared folder.
By doing this a few of the drawbacks suggested by KLE are resolved:
Projects depending on your code reference the DLL instead of your project (build time win)
When your project fails to build it will not break integration; the others depend upon a DLL which is still available from the last working build
All classes visible - nope
Middle ground:
You REALLY have to think about dependencies, more than in 'simple' setups.
Still costs time
But of course there is also a downside:
it's not easy to detect circular dependencies
I am currently thinking about how to achieve the benefits of this setup without the full-blown install of Endeavour, because it's a pretty massive product that does a great deal (not all of which you will need).
I would like to know how you normally deal with this situation:
I have a set of utility functions, say 5 to 10 files. Technically they are a static library, cross-platform: SConscript/SConstruct plus a Visual Studio project (not a solution).
Those utility functions are used in multiple small projects (15+, and the number increases over time). Each project has a copy of a few files or of the entire library, not a link to one central place. Sometimes a project uses one file, sometimes two, and some use everything. Normally, the utility functions are included as a copy of every file plus the SConscript/SConstruct or Visual Studio project (depending on the situation). Each project has a separate git repository. Sometimes one project is derived from another, sometimes it isn't.
You work on every one of them, in random order. There are no other people (to make things simpler).
The problem arises when while working on one project you modify those utility function files.
Because each project has its own copy of a file, any modification introduces a new version, which leads to a mess when you later (a week later, for example) try to work out which version has the most complete functionality (i.e. you added a function to a.cpp in one project and another function to a.cpp in another project, which created a version fork).
How would you handle this situation to avoid "version hell"?
One way I can think of is using symbolic links/hard links, but it isn't perfect - if you delete the one central storage, it will all go to hell. And hard links won't work on a dual-boot system (although symbolic links will).
It looks like what I need is something like an advanced git repository, where the code for the project is stored in one local repository but is synchronized with multiple external repositories. But I'm not sure how to do that, or whether it is even possible with git.
So, what do you think?
The normal simple way would be to have the library as a project in your version control, and if there is a modification, to edit only this project.
Then, other projects that need the library can get the needed files from the library project.
It is not completely clear to me what you want, but git submodules might help: http://git-scm.com/docs/git-submodule
In Subversion you can use externals (it's not Git, I know, but these tips might still help). This is how it works:
Split the application specific code (\MYAPP) from the common code (\COMMON)
Remove all duplicates from the applications; they should only use the common-code
Bring the common code into the applications by adding \COMMON as an external in \MYAPP
You will also probably have versions of your application. Also introduce versions in the common code. So your application will have the following folders in the repository:
\MYAPP\TRUNK
\MYAPP\V1
\MYAPP\V2
Similarly, add versions to the common-code, either using version numbers, like this:
\COMMON\TRUNK
\COMMON\V1
\COMMON\V2
Or using dates, like this:
\COMMON\TRUNK
\COMMON\2010JAN01
\COMMON\2010MAR28
The externals of \MYAPP\TRUNK should point to \COMMON\TRUNK, that's obvious.
Try to synchronize the versions of the common code with the versions of the applications, so that every time an application version is fixed, the common code is fixed as well, and the application version points to the relevant common-code external.
E.g. the externals of \MYAPP\V1 may point to \COMMON\2010JAN01.
The advantage of this approach is that every developer can now extend, improve, and debug the common code. The disadvantage is that the compilation time of the applications will increase as the common code grows.
The alternative (putting libraries in your version system) has the disadvantage that the management (extending, improving, debugging) of the common code is always done separately from the management of the applications, which may prevent developers from writing generic common code at all (and everyone starts to write their own versions of 'generic' classes).
On the other hand, if you have a clear and flexible team solely responsible for the common code, the common code will be under much better control in the last alternative.
In general configuration management terms, a solution is to have multiple trunks (branches):
Release
Integration
Development
Release
This trunk/branch contains software that has passed Quality Assurance and can be released to a customer. After release, all files are marked as "read-only". They are given a label that identifies the files with the release number.
Periodically, or on demand, the testing gurus will take the latest (tip) version from the Integration trunk and submit it to grueling quality tests. This is how an integration version is promoted to a release version.
Integration
This trunk contains the latest working code. It contains bug fixes and new features. The files should be labeled after each bug fix or new feature.
Code is moved into the integration branch after the bug has passed the quality testing or the new feature is fully developed (and tested). A good idea here is to label the integration version with a temporary label before integrating developer's code.
Development
These are branches made by developers for fixing bugs or developing new features. This can be a copy of all the files moved onto their local machine or only the files that need to be modified (with links to the Integration trunk for all other files).
When moving between trunks, the code must pass qualification testing, and there must be permission to move to the trunk. For example, unwarranted new features should not be put into the integration branch without authorization.
In your case, the files either need to be checked back into the Integration trunk after they have been modified, OR put into a whole new branch or trunk if the code is too different from the previous version (such as when adding new features).
I've been studying Git and SourceSafe trying to figure out how to implement this scheme. The scheme is easy to implement in the bigger Configuration Management applications like PVCS and ClearCase. It looks like for Git, duplicated repositories are necessary (one repository for each trunk). SourceSafe clearly states that it only allows one label per version, so files that have not changed will lose label information.
I'm working on a large C++ system built with ant+cpptasks. It works well enough, but the build.xml file is getting out of hand, because the standard operating procedure for adding a new library or executable target is to copy and paste another lib/exe's rules (which are already quite large). If this were "proper code", it'd be screaming out for refactoring, but being an Ant newbie (more used to make or Visual Studio solutions) I'm not sure what the options are.
What are Ant users' best practices for stopping Ant build files from exploding?
One obvious option would be to produce the build.xml via XSLT, defining our own tags for commonly recurring patterns. Does anyone do that, or are there better ways?
You may be interested in:
<import>
<macrodef>
<subant>
Also check this article on "ant features for big projects".
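For example, the repeated per-library rules from the question could be collapsed into a single <macrodef> kept in a shared file and pulled into each build.xml with <import>. A rough sketch, assuming cpptasks is on Ant's classpath; the macro and attribute names are made up:

    <!-- common-cpp.xml, pulled in with <import file="common-cpp.xml"/> -->
    <project name="common-cpp">
        <taskdef resource="cpptasks.tasks"/>
        <!-- one reusable rule for building a static library from a source tree -->
        <macrodef name="static-lib">
            <attribute name="name"/>
            <attribute name="srcdir"/>
            <sequential>
                <mkdir dir="build/@{name}"/>
                <cc outtype="static" objdir="build/@{name}" outfile="lib/@{name}">
                    <fileset dir="@{srcdir}" includes="**/*.cpp"/>
                </cc>
            </sequential>
        </macrodef>
    </project>

Each new library then becomes a single line such as <static-lib name="utils" srcdir="src/utils"/> instead of another copy-and-pasted block.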
If the rules are repetitive then you can factor them into an ant macro using macrodef and reuse that macro.
If it is the sheer size of the file that is unmanageable, then you can perhaps break it into smaller files and have the main build.xml call targets within those files.
If it's neither of these, then you may want to consider a different build system. Even though I have not used Maven myself, I hear it can solve many issues of large and unmanageable build files.
Generally, if your build file is large and complex, it is a clear indication that the way you have your code laid out, in terms of folders and packages, is too complicated. I find that a complex Ant script is a clear smell of a poorly laid out code base.
To fix this, think about how your code is laid out. How many projects do you have? Do those projects know how to build themselves, with a master build script that knows how to bundle the individual projects/apps/components together into a larger whole?
When you are refactoring code, you are looking at ways of breaking things down so that they are easier to understand--smaller methods, smaller classes, methods and classes that do one thing. You need to apply the same principles to your code base as well.
Create smaller components that are functionally cohesive and very loosely coupled to the rest of the code. Use a build script to build each component into a library. Do this with the rest of your code. Now create a master build script that knows how to bundle up all of your libraries and build them into your application. If you have several applications, then create a build script for each app and a master one that knows how to bundle the apps into distributables.
You should be able to see and understand the layout and structure of your code base just by looking at your build scripts. If they are not clean and understandable, then neither is your source code.
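As a rough illustration of the master build script idea (the component names here are invented), it can be little more than a list of the component builds in dependency order:

    <project name="master" default="build-all">
        <target name="build-all">
            <!-- each component has its own build.xml; the list order expresses the dependencies -->
            <subant target="dist">
                <filelist dir=".">
                    <file name="core/build.xml"/>
                    <file name="services/build.xml"/>
                    <file name="app/build.xml"/>
                </filelist>
            </subant>
        </target>
    </project>

The master file stays small because all the knowledge about how to build a component lives in that component's own script.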
Use Antlib files. It's a very clean way to
remove copy/pasted code
define default values
If you want to see an example, you can take a look at some of the build scripts I'm writing for my sandbox projects.
I would try Ant + Ivy, the agile dependency manager. We have recently started using it for some of our more complex systems and it works like a charm. The advantage here is that you don't get the overhead and transition cost of moving to Maven (it uses Ant targets, so it will work with your current setup). Here is a comparison between the two.
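To give an idea of how small the footprint is, the usual integration is just an extra target in the existing build.xml; the retrieve pattern below is only one common choice, not the only one:

    <project name="myproject" default="compile" xmlns:ivy="antlib:org.apache.ivy.ant">
        <target name="resolve">
            <!-- pulls the dependencies declared in ivy.xml into lib/ -->
            <ivy:retrieve pattern="lib/[artifact]-[revision].[ext]"/>
        </target>
        <target name="compile" depends="resolve">
            <!-- the existing compile steps stay as they are -->
        </target>
    </project>

Everything else in the build stays Ant; Ivy only takes over the job of fetching jars from the repository.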