I am trying to upgrade Apache Superset from version 0.35 to 0.37, or to 0.38, which is about to be released.
Our version of Superset is heavily modified, both by adding code to the stock Superset files and by creating new ones. We are talking about hundreds of changed files, each with anywhere from 1 to 1000+ added lines. As expected, the upgrade fails with a lot of conflicts.
I would like to find a way to upgrade to the newest version while keeping our modifications, and to make the process easier for future upgrades.
So far (up to 0.35) we have managed to upgrade to each new version and resolve the conflicts, but it has become more and more difficult.
The modifications range from front-end JSX files to CSS to Python files.
What I tried without success:
Making a patch file using
diff <0.35 files> <our modified 0.35>
and applying the patch to 0.37. This did not work because the files differ too much between versions: the line numbers have changed drastically and the folder structure is different in the newer versions.
Is there any way to keep our modifications separate and make the process easier for future upgrades?
The more you fork, and the more the main branch evolves, the harder it gets to upgrade; that is simply to be expected when forking software. Given the current high velocity and general pace of change in the Superset code base, and the pre-1.0 status of the software, you can pretty safely assume that any work you do in a fork will need to be fully revisited every time there's a major upgrade.
The way you prevent that is:
extending only the things that were designed to be extended (plugins, extensions, configuration hooks)
contributing your changes back into the main branch
Many of us have learned that the hard way in our careers. The typical realization is that over time, the progress in the main branch offers much more value than your customizations, and upgrading and reapplying your customizations becomes hard or simply impossible. The usual solution is to look into your customizations, identify the really important ones, and try to contribute them back so that they can evolve with the software.
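As a starting point for identifying which customizations matter, it can help to inventory how much each file in the fork differs from the pristine release it was based on. The following is only a minimal sketch: the function name and the idea of keeping an untouched copy of the upstream tree next to the fork are assumptions for illustration, not anything Superset provides.

```python
import difflib
from pathlib import Path


def inventory_customizations(upstream_dir, fork_dir):
    """Map each new or changed fork file to its number of added or changed lines.

    Files that exist only in the fork count all of their lines.
    Files where the fork only deleted lines are not reported.
    """
    upstream, fork = Path(upstream_dir), Path(fork_dir)
    report = {}
    for fork_file in sorted(p for p in fork.rglob("*") if p.is_file()):
        rel = fork_file.relative_to(fork)
        fork_lines = fork_file.read_text(errors="replace").splitlines()
        base_file = upstream / rel
        if not base_file.is_file():
            report[rel.as_posix()] = len(fork_lines)  # the whole file is custom
            continue
        base_lines = base_file.read_text(errors="replace").splitlines()
        matcher = difflib.SequenceMatcher(a=base_lines, b=fork_lines, autojunk=False)
        added = sum(
            j2 - j1
            for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
            if tag in ("insert", "replace")
        )
        if added:
            report[rel.as_posix()] = added
    return report
```

Sorting the result by line count gives a rough list of where the fork has drifted furthest, which is where contributing changes back or moving them into an extension point is likely to pay off most.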
We have an application with 10 million lines of code in a 4GL (Progress) and a database, also OpenEdge, with 300 tables. My boss says we should migrate it to a new programming language and a new database management system.
My questions are:
Do you think we should migrate it? Do you think Progress has a "future"?
If we should migrate it, how, are there any tools? Or should we begin with programming from scratch?
Thank you for the help.
Ablo
Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites.
http://www.joelonsoftware.com/articles/fog0000000069.html
Yes, Progress has a future. They probably will never be as sexy an option as Microsoft or Oracle or whatever the cool kids are using this week. But they have been around for 30 years and they will still be here when you and your boss retire.
There are those who will rain down scorn on Progress because it isn't X or it doesn't have Y. Maybe they can rewrite your 10 million lines of code next weekend and prove just how right they are. I would not, however, pay them for those efforts until after the user acceptance tests are passed and the implementation is completed.
A couple of years later (the original post being from 2014 and the answers from 2014 to 2015):
The post that has received the most votes basically makes a two-fold argument:
a. Progress (Openedge) has been around for a long time and is not going anywhere soon
b. Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites: http://www.joelonsoftware.com/articles/fog0000000069.html
With regard to a:
Yes, the Progress OpenEdge stack is still around. But in my experience, finding experienced and skilled OpenEdge developers has become even more difficult.
There is also an important factor here which, I think, has become much more important since this discussion started:
The available open source stacks for application development have improved many times over, both in out-of-the-box functionality and in quality, and have moved decisively in the direction of RAD.
I am thinking for instance of Spring Boot, but not only that; see https://stackshare.io/spring-boot/alternatives. In the Java realm Spring Boot is certainly unique. For the development of rich web UIs, many very valid options have also emerged that address RAD requirements. Just some "arbitrary" examples: https://vaadin.com for Java, but also https://www.polymer-project.org for JavaScript, which interestingly are converging in https://vaadin.com/flow.
Many of the available stacks are still evolving strongly, but all have making life easier for the developer as a strong driver. In terms of architecture, you will also find that many of these stacks are converging on the same basic building blocks and principles: separation of interfaces from implementation, REST APIs for remote communication, object-relational mapping technologies, NoSQL/JSON approaches, and so on.
So yes, the open source stacks are getting very efficient in terms of development. It must also be mentioned that the scope of these stacks does not stop at development: deployment, operational aspects and naturally also testing are a strong focus, which in the end also makes the developer's life easier.
Generally one can say that a well-chosen mix and match of open source stacks has a very strong value proposition, also against the background of RAD requirements, which a proprietary stack will have difficulty matching in the long run, at least from my point of view.
With regard to b:
Interestingly enough, I was recently with a customer who is looking to do exactly this: rewrite their application. The irony: they are migrating from Progress to Progress OpenEdge, with several additional OpenEdge-compliant tools. The reason is two-fold: their code is getting very difficult to maintain, and it would need refactoring in order to address requirements coming from web front ends. Also interesting: they are not finding enough qualified developers.
Basically: code is sound and alive when it can be refactored and when it can evolve with new requirements. Unfortunately there are many examples, at least in my experience, to the contrary.
Additionally, the end of life of software can force a company to "rewrite" at least layers of its software. And this doesn't necessarily have to be bad or impossible. I worked on a project that migrated over 300 Oracle Forms forms to a Java-based UI in less than two years. This migration from a 2-tier to a 3-tier architecture actually positioned the company to evolve its architecture to address the needs of web UIs. So in the end this "rewrite" had a strong return of value, also from the business perspective.
So to cut a (very;-)) long story short:
One way or another, it is easy to go wrong with generalizations.
You need not begin programming from scratch. There is help available online and yes, you can contact Progress Technical Support if you run into difficulties. Generally, ABL code from a previous version should work with only minor changes. Here are a few things that you need to do in order to migrate your application:
Backup databases
Backup source code and .r files
Truncate DB bi files
Convert your databases
Recompile ABL code and test
The articles at http://knowledgebase.progress.com will help you with this. If you are migrating from an older version such as 9, you will find a good set of new features. You can try them, but only after you are done with your conversion.
If you are migrating from 32-bit to 64-bit and you are using 32-bit libraries, you need to replace them with 64-bit ones.
The first question I'd come back with is 'why'? If the application is not measuring up that's one thing, and the question needs to be looked at from that perspective.
If the perception is that Progress is somehow a "lesser" application development and operating environment, and the desire is only to move to a different development and operating environment - you'll end up with a lot of resources in time, effort, and money invested - not to mention the opportunity cost - and for what? To run on a different database platform? Will migrating result in a lower TCO? Faster development turn-around time? Quicker time to market? What's the expected advantage in moving from Progress, and how long will it take to recover the migration cost - if ever?
Somewhere out there is a company who had similar thoughts and tried to move off of Progress and the ABL. The effort failed to meet their target performance and functionality metrics, so they eventually gave up on the migration, threw in the towel, and stayed with Progress - after spending $25M on the project.
Can your company afford that kind of risk / reward ratio?
Progress (Openedge) has been around for a long time and is not going anywhere soon. And rewriting 10 Million lines of code in any language just to use the current flavor of the month would never be worth it unless your current application is not doing what you need. Even then bringing it up to current needs would normally be a better solution.
If you need to migrate your current application to the latest version of OpenEdge (Progress), you would normally just make a copy of your database(s), convert it/them to the new version of OpenEdge, compile your code against the new databases and shake the bugs out. You may have some keyword issues, but these are usually pretty minor.
If you need help with programming I would suggest contacting Progress Software and attending the yearly trade show or going to https://community.progress.com/ and asking/looking for local user groups. The local user groups would be a stellar place to find local programming talent.
Hope this helps.....
I need your recommendations for continuous build products for a large (1-2MLOC) software development project. Characteristics:
ClearCase revision control
Approx 80% C++; 15% Java; 5% script or low-level
Compiles for Green Hills Integrity OS, but also some windows and JVM chunks
Mostly an embedded system; also includes some UI pieces and some development support (simulation tools, config tools, etc...)
Each notional "version" of the deliverable includes deployment images for a number of boards, UI machines, etc... (~10 separate images; 5 distinct operating systems)
Need to maintain/track many simultaneous versions which, notably, are built for a variety of different board support packages
Build cycle time is a major issue on the project, need support for whatever features help address this (mostly need to manage a large farm of build machines, I guess..)
Operates in a secure environment (this is a gov't program) (Edited to add: This is a classified program; outsourcing the build infrastructure is a non-starter.)
Interested in any best practices or peripheral guidance you might offer. The build automation issue is one of several overlapping best practices that appear to be missing on the program, but try to keep your answers focused on the build infrastructure piece and observations directly related to it.
Cost is not the driving concern. Scalability and ease of retrofitting onto an existing infrastructure are key.
(Edited to address @Dan's comment. ;-)
From my experience with similar systems, there are approximately two parts to this problem:
A repeatable method for checking out sources, building the software, and testing it (if you want to do continual testing as well as building), using a small number of command-line invocations.
A means of calling these command lines on various servers in the build farm.
For the latter, we've been using BuildBot, which seems to work pretty well.
For the former, we have a homegrown solution that started out as a simple bash shell script and grew ... rather substantially. From experience, I'd suggest starting out in python rather than bash -- you'll spend far more code in handling setup and configuration than in actually invoking programs. (Also, it's probably easier to run it on Windows if you're doing that.)
The things I've found to be really key in our script's usefulness are:
Ironclad repeatability. We have a standard set of build tools, and the scripts start out by scrubbing environment variables. There are very few command-line options; everything goes into configuration files, and those go in version control.
Logging. We produce a log of every command that the build script executes.
Configuration file inheritance. Each variant of our software gets a configuration file, and those files can include more-general settings (which include even-more-general settings).
Extensibility. When we add a new source component, it's pretty easy to add a set of instructions for building that component (and the instructions can be arbitrary bash code). The "can be arbitrary code" part is probably key here; no way is a pre-existing product going to be able to do all of the quirky things that you need for a large complex real-world system.
You can get started with a reasonably simple script and let it grow organically as the need arises; honestly, although ours is a bit messy, I think we got a much more usable result that way than we would have with heavy top-down design.
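To make the repeatability, logging and configuration-inheritance points concrete, here is a minimal sketch of such a script's core in Python. It is not our actual script: the JSON format, the "include" and "env" keys and all function names are assumptions chosen for illustration.

```python
import json
import logging
import subprocess
from pathlib import Path


def load_config(path):
    """Load a JSON config, first merging in the file named by its optional "include" key."""
    path = Path(path)
    data = json.loads(path.read_text())
    parent = data.pop("include", None)
    if parent is None:
        return data
    merged = load_config(path.parent / parent)  # more-general settings first
    merged.update(data)  # shallow merge: the more specific file wins
    return merged


def scrubbed_env(config):
    """Build the environment from the config only, ignoring the caller's variables."""
    return dict(config.get("env", {}))


def run_logged(argv, env, log=logging.getLogger("build")):
    """Log the exact command line, then run it with the scrubbed environment."""
    log.info("RUN %s", " ".join(argv))
    return subprocess.run(argv, env=env, check=True)
```

A variant file such as `{"include": "base.json", "opt": "-O2"}` then inherits everything from `base.json` and overrides only what differs. Because the environment comes solely from the version-controlled config, two machines running the same revision should execute the same commands with the same variables.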
Cost isn't an object? I've worked for GreenHills, and they've solved these issues for their in-house build/test farms. Ask them to do the same for you.
When I see emphasis on things like scalability and security in a build system, I start thinking that you might be a candidate for the enterprise class build systems / CI systems. Conveniently, it sounds like you can afford them as well. A year old SD Times article provides a basic breakdown between the enterprise and team level build tools.
My company makes AnthillPro and we've worked with a number of companies on large embedded projects as well as highly secure projects. IBM is probably the largest other player in the space with BuildForge.
AnthillPro puts some extra emphasis on what you do with the images in the minutes/hours/days post build (do you install them onto simulators / hardware and run automated tests? stage them? promote them?) but we also see folks using it for just build.
I've heard more than one person say that if your build process is clicking the build button, then your build process is broken. Frequently this is accompanied by advice to use things like make, cmake, nmake, MSBuild, etc. What exactly do these tools offer that justifies manually maintaining a separate configuration file?
EDIT: I'm most interested in answers that would apply to a single developer working on a ~20k line C++ project, but I'm interested in the general case as well.
EDIT2: It doesn't look like there's one good answer to this question, so I've gone ahead and made it CW. In response to those talking about Continuous Integration: yes, I understand completely that when you have many developers on a project, having CI is nice. However, that's an advantage of CI, not of maintaining separate build scripts. They are orthogonal: for example, Team Foundation Build is a CI solution that uses Visual Studio's project files as its configuration.
Aside from continuous integration needs which everyone else has already addressed, you may also simply want to automate some other aspects of your build process. Maybe it's something as simple as incrementing a version number on a production build, or running your unit tests, or resetting and verifying your test environment, or running FxCop or a custom script that automates a code review for corporate standards compliance. A build script is just a way to automate something in addition to your simple code compile. However, most of these sorts of things can also be accomplished via pre-compile/post-compile actions that nearly every modern IDE allows you to set up.
Truthfully, unless you have lots of developers committing to your source control system, or have lots of systems or applications relying on shared libraries and need to do CI, using a build script is probably overkill compared to simpler alternatives. But if you are in one of those aforementioned situations, a dedicated build server that pulls from source control and does automated builds should be an essential part of your team's arsenal, and the easiest way to set one up is to use make, MSBuild, Ant, etc.
One reason for using a build system that I'm surprised nobody else has mentioned is flexibility. In the past, I also used my IDE's built-in build system to compile my code. I ran into a big problem, however, when the IDE I was using was discontinued. My ability to compile my code was tied to my IDE, so I was forced to re-do my entire build system. The second time around, though, I didn't make the same mistake. I implemented my build system via makefiles so that I could switch compilers and IDEs at will without needing to re-implement the build system yet again.
I encountered a similar problem at work. We had an in-house utility that was built as a Visual Studio project. It's a fairly simple utility and hasn't needed updating for years, but we recently found a rare bug that needed fixing. To our dismay, we found out that the utility was built using a version of Visual Studio that was 5-6 versions older than what we currently have. The new VS wouldn't read the old-version project file correctly, and we had to re-create the project from scratch. Even though we were still using the same IDE, version differences broke our build system.
When you use a separate build system, you are completely in control of it. Changing IDEs or versions of IDEs won't break anything. If your build system is based on an open-source tool like make, you also don't have to worry about your build tools being discontinued or abandoned because you can always re-build them from source (plus fix bugs) if needed. Relying on your IDE's build system introduces a single point of failure (especially on platforms like Visual Studio that also integrate the compiler), and in my mind that's been enough of a reason for me to separate my build system and IDE.
On a more philosophical level, I'm a firm believer that it's not a good thing to automate away something that you don't understand. It's good to use automation to make yourself more productive, but only if you have a firm understanding of what's going on under the hood (so that you're not stuck when the automation breaks, if for no other reason). I used my IDE's built-in build system when I first started programming because it was easy and automatic. I later started to become more aware that I didn't really understand what was happening when I clicked the "compile" button. I did a little reading and started to put together a simple build script from scratch, comparing my output to that of the IDE's build system. After a while I realized that I now had the power to do all sorts of things that were difficult or impossible through the IDE. Customizing the compiler's command-line options beyond what the IDE provided, I was able to produce a smaller, slightly faster output. More importantly, I became a better programmer by having real knowledge of the entire development process from writing code all the way down through the generation of machine language. Understanding and controlling the entire end-to-end process allows me to optimize and customize all of it to the needs of whatever project I'm currently working on.
If you have a hands-off, continuous integration build process it's going to be driven by an Ant or make-style script. Your CI process will check the code out of version control when changes are detected onto a separate build machine, compile, test, package, deploy, and create a summary report.
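The stage-by-stage flow described above can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular CI product works: the function name and the idea of passing each stage as a command line are hypothetical.

```python
import subprocess


def run_pipeline(stages):
    """Run (name, argv) stages in order and stop at the first failure.

    Returns a summary report as a list of (stage_name, succeeded) pairs.
    """
    report = []
    for name, argv in stages:
        result = subprocess.run(argv, capture_output=True, text=True)
        succeeded = result.returncode == 0
        report.append((name, succeeded))
        if not succeeded:
            break  # later stages make no sense after a failure
    return report
```

In a real setup the stage list would hold the checkout, compile, test, package and deploy commands for your project, and the returned report is what gets mailed out or shown on a dashboard.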
Let's say you have 5 people working on the same set of code. Each of those 5 people is making updates to the same set of files. Now you may click the build button and know that your code works, but what about when you integrate it with everyone else's? The only way you'll know is to get everyone else's code and try. This is easy every once in a while, but it quickly becomes tiresome to do over and over again.
With a build server that does it automatically, it checks whether the code compiles for everyone all the time. Everyone always knows if something is wrong with the build, and what the problem is, and no one has to do any work to figure it out. Small things add up: it may take only a couple of minutes to pull down the latest code and try to compile it, but doing that 10-20 times a day quickly becomes a waste of time, especially if you have multiple people doing it. Sure, you can get by without it, but it is so much easier to let an automated process do the same thing over and over again than to have a real person do it.
Here's another cool thing too. Our process is set up to test all the SQL scripts as well. You can't do that by pressing the build button. It reloads snapshots of all the databases it needs to apply patches to and runs the patches to make sure that they all work, and that they run in the order they are supposed to. The build server is also smart enough to run all the unit tests/automation tests and return the results. Making sure the code can compile is fine, but an automation server can handle many, many steps automatically that would take a person maybe an hour to do.
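As an illustration of that SQL check, here is a minimal sketch that uses SQLite from the Python standard library in place of a real database server. The function name and the patch names in the usage note are hypothetical, and a real setup would restore an actual snapshot rather than a schema string.

```python
import sqlite3


def apply_patches(snapshot_sql, patches):
    """Load a snapshot into a fresh in-memory database, then apply (name, sql) patches in order.

    Returns None if every patch applies, otherwise a message naming the first failing patch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(snapshot_sql)
        for name, sql in patches:
            try:
                conn.executescript(sql)
            except sqlite3.Error as exc:
                return f"{name}: {exc}"
        return None
    finally:
        conn.close()
```

Applying a patch that creates an index before the patch that adds the indexed column makes the function report the index patch as the failure, which is exactly the ordering mistake this kind of check is meant to catch.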
Taking this a step further, if you have an automated deployment process along with the build server, the deployment is automatic. Anyone who can press a button to run the process and deploy can move code to qa or production. This means that a programmer doesn't have to spend time doing it manually, which is error prone. When we didn't have the process, it was always a crap shoot as to whether or not everything would be installed correctly, and generally it was a network admin or a programmer who had to do it, because they had to know how to configure IIS and move the files. Now even our most junior qa person can refresh the server, because all they need to know is what button to push.
The IDE build systems I've used are all usable from things like automated build / CI tools, so there is no need to have a separate build script as such.
However on top of that build system you need to automate testing, versioning, source control tagging, and deployment (and anything else you need to release your product).
So you create scripts that extend your IDE build and do the extras.
One practical reason why IDE-managed build descriptions are not always ideal has to do with version control and the need to integrate with changes made by other developers (ie. merge).
If your IDE uses a single flat file, it can be very hard (if not impossible) to merge two project files into one. It may be using a text-based format, like XML, but XML is notoriously hard to handle with standard diff/merge tools. Just the fact that people are using a GUI to make edits makes it more likely that you end up with unnecessary changes in the project files.
With distributed, smaller build scripts (CMake files, Makefiles, etc.), it can be easier to reconcile changes to project structure just like you would merge two source files. Some people prefer IDE project generation (using CMake, for example) for this reason, even if everyone is working with the same tools on the same platform.
I haven't worked for very large organizations and I've never worked for a company that had a "Build Server".
What is their purpose?
Why aren't the developers building the project on their local machines, or are they?
Are some projects so large that more powerful machines are needed to build it in a reasonable amount of time?
The only place I see a Build Server being useful is for continuous integration, with the build server constantly building what is committed to the repository. Is it that I have just not worked on projects large enough?
Someone, please enlighten me: What is the purpose of a build server?
The reason given is actually a huge benefit. Builds that go to QA should only ever come from a system that builds only from the repository. This way build packages are reproducible and traceable. Developers manually building code for anything except their own testing is dangerous. Too much risk of stuff not getting checked in, being out of date with other people's changes, etc. etc.
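One way to make "reproducible and traceable" tangible is to have the build server write a manifest next to every package it produces, recording the repository revision and a checksum of each artifact. The sketch below is an assumption about how that could look; the function name and the JSON layout are invented for illustration.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(revision, artifact_dir):
    """Return a JSON manifest with the source revision and a SHA-256 per artifact file."""
    artifact_dir = Path(artifact_dir)
    artifacts = {
        p.relative_to(artifact_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(artifact_dir.rglob("*"))
        if p.is_file()
    }
    return json.dumps({"revision": revision, "artifacts": artifacts}, indent=2, sort_keys=True)
```

When QA reports a problem, the manifest tells you exactly which revision to check out, and the checksums show whether the binaries under test are the ones the server actually built.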
Joel Spolsky on this matter.
Build servers are important for several reasons.
They isolate the environment. The local Code Monkey developer says "It compiles on my machine" when it won't compile on yours. This can mean out-of-sync check-ins or it could mean a dependent library is missing. Jar hell isn't nearly as bad as .dll hell; either way, using a build server is cheap insurance that your builds won't mysteriously fail or package the wrong libraries by mistake.
They focus the tasks associated with builds. This includes updating the build tag, creating any distribution packaging, running automated tests, creating and distributing build reports. Automation is the key.
They coordinate (distributed) development. The standard case is where multiple developers are working on the same code base. The version control system is the heart of this sort of distributed development but, depending on the tool, the developers may not interact with each other's code much. Instead of forcing developers to risk bad builds or worry about merging code overly aggressively, design the build process so that the automated build can see the appropriate code and process the build artifacts in a predictable way. That way when a developer commits something with a problem, like not checking in a new file dependency, they can be notified quickly. Doing this in a staged area lets you flag the code that has built so that developers don't pull code that would break their local build. PVCS did this quite well using the idea of promotion groups. ClearCase could do it too using labels, but would require more process administration than a lot of shops care to provide.
What is their purpose?
Take load off developer machines, provide a stable, reproducible environment for builds.
Why aren't the developers building the project on their local machines, or are they?
Because with complex software, an amazing number of things can go wrong when just "compiling through". Problems I have actually encountered:
Incomplete dependency checks of different kinds, resulting in binaries not being updated.
Publish commands failing silently, with the error message in the log ignored.
Builds including local sources not yet committed to source control
(fortunately, no "damn customers" message boxes yet..).
When trying to avoid the above problem by building from another folder, some files being picked up from the wrong folder.
The target folder where binaries are aggregated containing additional stale developer files that should not be included in the release.
We've got an amazing stability increase since all public releases start with a get from source control onto an empty folder. Before, there were lots of "funny problems" that "went away when Joe gave me a new DLL".
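The stale-file problem in particular is easy to check for mechanically. The sketch below compares what is in the target folder against the list of files the release is expected to contain; the function name and the idea of maintaining such a list are assumptions for illustration.

```python
from pathlib import Path


def unexpected_files(target_dir, expected):
    """Return files present under target_dir that are not in the expected list.

    Paths are relative to target_dir and use forward slashes.
    """
    target = Path(target_dir)
    present = {p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()}
    return sorted(present - set(expected))
```

Starting from an empty folder, as described above, makes this list empty by construction; the check is mainly useful as a safety net that fails the release build if anything unexpected shows up.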
Are some projects so large that more powerful machines are needed to build it in a reasonable amount of time?
What's "reasonable"? If I run a batch build on my local machine, there are many things I can't do while it runs. Rather than pay developers to wait for builds to complete, pay IT to buy a real build machine already.
Is it that I have just not worked on projects large enough?
Size is certainly one factor, but not the only one.
A build server is a distinct concept to a Continuous Integration server. The CI server exists to build your projects when changes are made. By contrast a Build server exists to build the project (typically a release, against a tagged revision) on a clean environment. It ensures that no developer hacks, tweaks, unapproved config/artifact versions or uncommitted code makes it into the released code.
The build server is used to build everyone's code when it is checked in. Your code may compile locally, but you most likely won't have all the changes made by everyone else all the time.
To add on what has already been said :
An ex-colleague worked on the Microsoft Office team and told me a complete build sometimes took 9 hours. That would suck to do it on YOUR machine, wouldn't it?
It's necessary to have a "clean" environment free of artifacts of previous versions (and configuration changes) in order to ensure that builds and tests work and don't depend on the artifacts. An effective way to isolate is to create a separate build server.
I agree with the answers so far in regard to stability, traceability, and reproducibility. (Lots of 'ity's, right?). Having ONLY ever worked for large companies (Health Care, Finance) with MANY build servers, I would add that it's also about security. Ever seen the movie Office Space? If a disgruntled developer builds a banking application on his local machine and no one else looks at it or tests it... BOOM. Superman III.
These machines are used for several reasons, all trying to help you provide a superior product.
One use is to simulate a typical end user configuration. The product might work on your computer, with all your development tools and libraries set up, but the end user most likely won't have the same configuration as you. For that matter, other developers won't have the exact same setup as you either. If you have a hardcoded path somewhere in your code, it will probably work on your machine, but when Dev El O'per tries to build the same code, it won't work.
They can also be used to monitor who broke the product last, with what update, and where the product regressed. Whenever new code is checked in, the build server builds it, and if the build fails, it's clear that something is wrong and the user who committed last is at fault.
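Since the server builds on every check-in, finding the offending change is a simple walk back through the build history. Here is a minimal sketch, assuming the history is available as a list of (commit, passed) pairs in check-in order; that data shape and the function name are hypothetical.

```python
def first_breaking_commit(history):
    """Return the commit that started the current run of failed builds.

    history is a list of (commit_id, passed) pairs in check-in order.
    Returns None if the history is empty or the latest build passed.
    """
    if not history or history[-1][1]:
        return None
    culprit = None
    for commit, passed in reversed(history):
        if passed:
            break  # reached the last good build
        culprit = commit
    return culprit
```

Note that this points at the first failing check-in after the last good build, which is not necessarily the most recent committer if several people checked in on top of an already broken build.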
For consistent quality and to get the build 'off your machine' to spot environment errors and so that any files you forget to check in to source control also show up as build errors.
I also use it to create installers as these take a lot of time to do on the desktop with code signing etc.
We use one so that we know that the production/test boxes have the same libraries and versions of those libraries installed as what is available on the build server.
It's about management and testing for us. With a build server we always know that we can build our main "trunk" line from version control. We can create a master install with one-click and publish it to the web. We can run all of our unit tests each time code is checked in to make sure it works. By collecting all these tasks into a single machine it makes it easier to get it right repeatedly.
You are right that developers could build on their own machines.
But these are some of the things our build server buys us, and we're hardly sophisticated build makers:
Version control issues (some have been mentioned in earlier responses)
Efficiency. Devs don't have to stop to make builds locally. They can kick it off on the server and get on to the next task. If builds are large, then that is even more time the dev's machine is not occupied. For those doing continuous integration and automated testing, even better.
Centralization. Our build machine has scripts that make the build, distribute it to UAT environments, and even to production staging. Keeping them in one place reduces the hassle of keeping them in sync.
Security. We don't do much special here, but I'm sure a sysadmin can make it such that production migration tools can only be accessed on a build server by certain authorized entities.
Maybe I'm the only one...
I think everyone agrees that one should
use a file repository
do builds from the repository (and in a clean environment)
use a continuous testing server (e.g. CruiseControl) to see if anything is broken after your "fixes"
But no one cares about automatically built versions.
When something was broken in an automatic build, but it's not anymore - who cares? It's a work in progress. Someone fixed it.
When you want to do a release version, you run a build from the repository. And I'm pretty sure you want to tag the version in the repository at that time, and not every six hours when the server does its work.
So, maybe a "build server" is just a misnomer and it's actually a "continuous test server". Otherwise it sounds pretty much useless.
A build server gets you a sort of second opinion of your code. When you check it in, the code is checked. If it works, the code has a minimum quality.
Additionally, remember that low level languages take much longer to compile than high level languages. It's easy to think "Well look, my .Net project compiles in a couple of seconds! What's the big deal?" Awhile back I had to mess with some C code and I had forgotten how much longer it takes to compile.
A build server is used to schedule compile tasks (e.g. nightly builds) of usually large projects located in a repository that can sometimes take more than a couple of hours.
A build server also gives you a basis for escrow, being able to capture all the parts necessary to reproduce a build in the case that others may have rights to take ownership.