We have an application with 10 million lines of code in Progress 4GL and a database, also OpenEdge, with 300 tables. My boss says we should migrate it to a new programming language and a new database management system.
My questions are:
Do you think we should migrate it? Do you think Progress has a "future"?
If we should migrate it, how? Are there any tools? Or should we start programming from scratch?
Thank you for the help.
Ablo
Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites.
http://www.joelonsoftware.com/articles/fog0000000069.html
Yes, Progress has a future. They probably will never be as sexy an option as Microsoft or Oracle or whatever the cool kids are using this week. But they have been around for 30 years and they will still be here when you and your boss retire.
There are those who will rain down scorn on Progress because it isn't X or it doesn't have Y. Maybe they can rewrite your 10 million lines of code next weekend and prove just how right they are. I would not, however, pay them for those efforts until after the user acceptance tests are passed and the implementation is completed.
A couple of years later (the original post is from 2014 and the answers are from 2014 to 2015):
The post that has gotten the most votes basically makes two arguments:
a. Progress (Openedge) has been around for a long time and is not going anywhere soon
b. Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites: http://www.joelonsoftware.com/articles/fog0000000069.html
With regard to a:
Yes, the Progress OpenEdge stack is still around. But in my experience, finding experienced and skilled OpenEdge developers has gotten even more difficult.
There is also an important factor here which I think has become much more significant since this discussion started:
The available open source stacks for application development have improved severalfold, both in out-of-the-box functionality and in quality, and have moved decisively in the direction of RAD.
I am thinking for instance of Spring Boot, but not only that; see https://stackshare.io/spring-boot/alternatives. In the Java realm Spring Boot is certainly unique. For the development of rich web UIs, many valid options have also emerged that address RAD requirements. Just some "arbitrary" examples: https://vaadin.com for Java, but also https://www.polymer-project.org for JavaScript, and interestingly the two are converging with https://vaadin.com/flow.
Many of the available stacks are still evolving strongly, but all of them have making life easier for the developer as a strong driver. In terms of architecture, you will also find many of these stacks converging on the same basic building blocks and principles: separation of interfaces from implementation, REST APIs for remote communication, object-relational mapping technologies, NoSQL / JSON approaches, etc.
So yes, the open source stacks are getting very efficient in terms of development. It must also be mentioned that the scope of these stacks does not stop at development: deployment, operational aspects and naturally also testing are a strong focus, which in the end also makes the developer's life easier.
Generally one can say that a well-chosen mix and match of open source stacks has a very strong value proposition, also against the background of RAD requirements, which a proprietary stack will in the long run have difficulty matching - at least from my point of view.
With regard to b:
Interestingly enough, I was just recently with a customer who is looking to do exactly this: rewrite their application. The irony: they are migrating from Progress to Progress OpenEdge, with several additional OpenEdge-compliant tools. The reason is twofold: their code is getting very difficult to maintain, and it would need refactoring in order to address requirements coming from web frontends. Also interesting: they are not finding enough qualified developers.
Basically: code is sound and lives when it can be refactored and when it can evolve with new requirements. Unfortunately there are many examples - at least in my experience - to the contrary.
Additionally, end-of-lifecycle of software can force a company to "rewrite" at least layers of its software. And this doesn't necessarily have to be bad or impossible. I worked on a project which migrated over 300 Oracle Forms forms to a Java-based UI within less than two years. This migration from a 2-tier to a 3-tier architecture actually positioned the company to evolve its architecture to address the needs of web UIs. So in the end this "rewrite" had a strong return of value, also from the business perspective.
So to cut a (very;-)) long story short:
One way or another, it is easy to go wrong with generalizations.
You need not begin programming from scratch. There is help available online, and you can contact Progress Technical Support if you run into difficulties. Generally, ABL code from a previous version should work with only minor changes. Here are a few things that you need to do in order to migrate your application:
Backup databases
Backup source code and .r files
Truncate DB bi files
Convert your databases
Recompile ABL code and test
The articles at http://knowledgebase.progress.com will help you with this. If you are migrating from an older version such as 9, you will find a good set of new features. You can try them, but only after you are done with your conversion.
If you are migrating from 32-bit to 64-bit and you are using 32-bit libraries, you need to replace them with 64-bit versions.
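To make the order of the database steps concrete, here is a rough sketch in Python that only builds the command lines and does not run anything. `probkup` and `proutil` are the standard OpenEdge utilities; the function name, the backup path layout and the database name are made up for illustration, and the conversion qualifier (for example `conv1011` when going from OpenEdge 10 to 11) depends on your source and target versions, so check the knowledge base for the right one.

```python
def migration_commands(db_name, backup_dir, conversion_qualifier):
    """Return the database-side upgrade commands, in the order to run them.

    Hypothetical helper for illustration only: it builds argument lists and
    never executes them. Source code and .r files are backed up separately
    (plain file copy), and the ABL recompile happens after the last step.
    """
    # Assumed naming scheme for the backup file; adjust to your own layout.
    backup_file = f"{backup_dir}/{db_name}.bak"
    return [
        # 1. Back up the database (run with the OLD version's utilities).
        ["probkup", db_name, backup_file],
        # 2. Truncate the before-image (BI) file.
        ["proutil", db_name, "-C", "truncate", "bi"],
        # 3. Convert the database (run with the NEW version's proutil).
        ["proutil", db_name, "-C", conversion_qualifier],
    ]


if __name__ == "__main__":
    for command in migration_commands("mydb", "/backups", "conv1011"):
        print(" ".join(command))
```

Printing the plan first and running each command by hand is deliberate: every step should be checked before moving on to the next one.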
The first question I'd come back with is 'why'? If the application is not measuring up that's one thing, and the question needs to be looked at from that perspective.
If the perception is that Progress is somehow a "lesser" application development and operating environment, and the desire is only to move to a different development and operating environment - you'll end up with a lot of resources in time, effort, and money invested - not to mention the opportunity cost - and for what? To run on a different database platform? Will migrating result in a lower TCO? Faster development turn-around time? Quicker time to market? What's the expected advantage in moving from Progress, and how long will it take to recover the migration cost - if ever?
Somewhere out there is a company that had similar thoughts and tried to move off of Progress and the ABL. The effort failed to meet their target performance and functionality metrics, so they eventually gave up on the migration, threw in the towel, and stayed with Progress - after spending $25M on the project.
Can your company afford that kind of risk / reward ratio?
Progress (Openedge) has been around for a long time and is not going anywhere soon. And rewriting 10 Million lines of code in any language just to use the current flavor of the month would never be worth it unless your current application is not doing what you need. Even then bringing it up to current needs would normally be a better solution.
If you need to migrate your current application to the latest version of OpenEdge (Progress), you would normally just make a copy of your database(s), convert it/them to the new version of OpenEdge, compile your code against the new databases and shake the bugs out. You may have some keyword issues, but this is usually pretty minor.
If you need help with programming I would suggest contacting Progress Software and attending the yearly trade show or going to https://community.progress.com/ and asking/looking for local user groups. The local user groups would be a stellar place to find local programming talent.
Hope this helps.....
There are two main schools of thought for doing A/B (Split) Testing:
Javascript-based solutions such as Optimizely, Google Analytics Content Experiments.
Server-side solutions such as Django-AB, Splango, and django-lean. (Also, writing your own.)
My understanding is that Javascript-based solutions are spectacular for "which color button converts better," but not so great for switching out entire page layouts, and completely unworkable for trying out large functional changes such as the sequence of pages in a funnel.
That leads me towards a server-side solution. I'm not crazy about coding my own, and will do so only if there is no other option. I'm trying to add value by improving the core functionality of my site, not by creating a better split-testing framework.
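For reference, the core of server-side assignment is small. A minimal sketch, assuming each visitor has a stable identifier (a user id or session key); `assign_variant`, the experiment name and the funnel page names are hypothetical and not taken from any of the packages named above. Recording conversions and reporting, which is the part the packages are really for, is not shown.

```python
import hashlib


def assign_variant(experiment, user_id, variants=("control", "variant")):
    """Deterministically map a visitor to one variant of an experiment.

    Hashing experiment name + visitor id means the same visitor always gets
    the same variant without storing anything, and different experiments
    split visitors independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical use: choose the page sequence of a funnel on the server.
FUNNELS = {
    "control": ["cart", "shipping", "payment", "confirm"],
    "variant": ["cart", "payment", "shipping", "confirm"],
}


def funnel_for(user_id):
    return FUNNELS[assign_variant("funnel-order", user_id)]
```

Because the choice is made before the response is rendered, the same function can switch templates, views or whole page sequences, which is the case the Javascript tools handle poorly.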
The Django apps I've found for split testing are various mixtures of unmaintained, undocumented, documented incorrectly, and incompatible with Django 1.5. This surprises me, because the Django and Python communities seem to have a strong focus on good documentation. I'm also very surprised that none of the testing frameworks I've tried has been compatible with Django 1.5 -- is testing not as core a part of the philosophy in the Django/Python world as it is in Rails?
Here's what I've found:
Splango https://github.com/shimon/Splango -- Not compatible with Django 1.5 (although most compatibility bugs I found were trivial to fix). Largely untouched since October 2010, except for a fix in August 2012 which claims to make sure templates get included in the install. Since templates don't get included in the install when Splango is installed via PyPI, either the fix didn't work or it didn't get submitted to PyPI. Documentation is largely accurate, but doesn't completely cover how to set up tests and get reports. It tells you how to configure the template to gather the data, but there appear to be additional steps required in the admin interface which are completely undocumented, and I'm not sure I've done them properly.
Django-lean. Original at https://bitbucket.org/akoha/django-lean has not been updated since July 2010. There is an apparently "blessed" fork at https://github.com/anandhenry2002/django-lean which has not been changed since May 2012, when it was copied over from the original. The original's documentation is incorrect in ways that make following the examples impossible. (Though you can probably muddle your way through, as I did.) The new version's documentation has formatting problems that make it difficult to read on github. (This appears to be because it's the unchanged documentation from the old project, and BitBucket syntax doesn't work on Github.) The django-lean Google Group has not had a message since July 2012.
django-mini-lean https://github.com/DanAncona/django-mini-lean -- Updated as recently as February 2013, but undocumented.
Leaner - https://bitbucket.org/brianjinwright/leaner -- Last updated July 2012, and no docs.
Django-AB -- Last updated May 2009. Is not a package, and can't be installed via PIP or PyPI. After placing the checkout in my django app folder (and renaming the folder to ab) and following the installation instructions, I get an error loading the template loader that I have not tracked down further.
So far Splango appears to be the winner, as I've actually been able to get it more-or-less working (by manually installing the templates, and then editing them to fix Django 1.5 incompatibilities).
Can anyone point me to anything I've missed?
You have missed this app: https://github.com/mixcloud/django-experiments + https://github.com/disqus/gargoyle/
And then there's waffle: http://waffle.readthedocs.org/
It's simple, updated and maintained, but not very feature-rich: it doesn't have any analytics/reporting integrated. But then again, a Google Analytics or Mixpanel type of service is better for this.
I first looked at Django-AB and that is almost what I wanted, but I couldn't get it to work either. After looking at django-experiments and deciding I didn't want to mess around with redis yet, I decided to roll my own. I've tried to package it up nicely and make it easy to use for the beginner. It's super basic.
https://github.com/crobertsbmw/RobertsAB
You can swap out entirely different page layouts with Google Analytics Experiments (their default experiment setup will redirect users to a different URL for each variation you have), although in general it's much easier to interpret why something is more successful if you test smaller things against each other.
You are right that testing different funnels and user flows against each other using Google Analytics would require a lot of manual setup; although theoretically you could do it by swapping out different links and tracking your users with UTM campaigns.
For smaller A/B tests within the same page, I ended up using Google Analytics Experiments and writing a custom Django CMS plugin for adding a few variant options to a template, which queries the Google Analytics API and displays the correct variant using Javascript.
What is the best framework/library/reusable app for caching model instances in Django?
(This approach is also known as transparent object cache, ORM cache,
row-level object cache, object-level cache.)
There are reusable apps implementing this. The problem is there are too many of them!
Here is what I found (more probably exist):
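To pin down what is meant by row-level caching, here is a minimal framework-free sketch. `RowCache` and its loader callable are hypothetical stand-ins and not the API of any of the apps discussed in this thread; in a real Django project the loader would be a query by primary key, and the invalidation would typically hang off the `post_save` and `post_delete` signals.

```python
class RowCache:
    """Cache objects by (model name, primary key); evict on change."""

    def __init__(self, loader):
        # loader(model, pk) -> object; stands in for a database query.
        self._loader = loader
        self._store = {}
        self.misses = 0

    def get(self, model, pk):
        key = (model, pk)
        if key not in self._store:
            # Cache miss: fall through to the "database" and remember the row.
            self.misses += 1
            self._store[key] = self._loader(model, pk)
        return self._store[key]

    def invalidate(self, model, pk):
        # Must be called whenever the row is saved or deleted, otherwise
        # readers keep getting the stale copy.
        self._store.pop((model, pk), None)


if __name__ == "__main__":
    rows = {("Article", 1): "first draft"}
    cache = RowCache(lambda model, pk: rows[(model, pk)])
    print(cache.get("Article", 1))   # loaded from rows
    rows[("Article", 1)] = "second draft"
    print(cache.get("Article", 1))   # still the cached copy
    cache.invalidate("Article", 1)
    print(cache.get("Article", 1))   # reloaded after invalidation
```

The lookup is the easy part. What the libraries differ in is how reliably they invalidate, especially for bulk updates and related objects.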
http://github.com/mmalone/django-caching/
http://github.com/dcramer/django-orm-cache
http://github.com/dziegler/django-cachebot
http://bitbucket.org/jmoiron/johnny-cache
http://github.com/jbalogh/django-cache-machine
http://github.com/SeanHayes/django-query-caching
I do not want to test every library; I just want to pick one that does the job and solves more
problems than it creates ("There are only two hard problems in Computer Science: cache invalidation and naming things").
Please share your experience.
About a year ago I had the same question. I checked about a dozen solutions and finally narrowed it down to johnny-cache and django-cache-machine. I went with the latter for no particular reason; both are stable and good enough.
I've just gone through this same consideration, and settled on django-cache-machine, because it supports django 1.5 currently (Summer 2013), and johnny cache has open pull requests for django 1.5 support that have not been merged in. YMMV.
I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was on Mark Cassidy's blogpost http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed in this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it and found that it speeds up item creation and deletion at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that only has the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect that there are a lot of other things I could look to remove/disable in order to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however. You will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's a thing you could try that I'd been considering trying out myself, but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay.
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?
The story so far:
Decided to go with Xapian as search backend because it has all search-engine features I was looking for, knows about Unicode, stemming, has few dependencies and requires no bloated app-server installation on top of it.
Tried Django and Haystack (plus xapian-haystack, the backend glue code to tie Haystack to Xapian) because it was advertised on quite some blogs as "working". Did not work. Neither django-haystack nor the xapian-haystack project provide a version combination that actually works together. MASTER from both projects yields an error from Xapian, so it's not stable at all. Haystack 1.0.1 and xapian-haystack 1.0.x/1.1.0 are not API-compatible. Plus, in a minimally working installation of Haystack 1.0.1 and xapian-haystack MASTER, any complex query yields zero results due to errors in either django-haystack or xapian-haystack (I double-verified this), maybe because the unit-tests actually test very simple cases, and no edge-cases at all.
Tried Djapian. The source-code is riddled with spelling errors (mind you, in variable names, not comments), documentation is also riddled with ambiguities and outdated information that will never lead to a working installation. Not surprisingly, users rarely ask for features but how to get it working in the first place.
Next on the plate: exploring Solr (installing a Java environment plus Tomcat gives me headaches, the machine is RAM- and CPU-constrained), or Lucene (slightly fewer headaches, but still).
Before I proceed spending more time with a solution that might or might not work as advertised, I'd like to know: Did anyone ever get an actual, real-world search solution working in Django? I'm serious. I find it really frustrating reading about "large problems mostly solved", and then realizing that you will never get a working installation from the source-code because, actually, all bloggers dealing with those "mostly solved problems" never went past basic installation and copy-pasting the official tutorials.
So here are the requirements:
must be able to search for 10-100 terms in one query
must handle + (term must be present) and - (term must not be present), AND/OR
must handle arbitrary grouping (i.e. parentheses around AND/OR)
must allow for Django-ORM filtering before or after fulltext-search (i.e. pre-/post-processing of results with the full set of filters that Django knows about)
alternatively, there must be a facility to bulk-fetch the result set and transform it into a QuerySet
should be light on the machine, so preferably no humongous JVM and Java-based app-server installation
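To make the bulk-fetch requirement concrete, this is the shape of the glue I have in mind, as a minimal sketch. The search engine returns primary keys in relevance order, the objects are fetched in one go, and the ranking is restored afterwards. `hits_in_rank_order` and `fetch_by_ids` are hypothetical names; with the Django ORM, `fetch_by_ids` could be something like `MyModel.objects.filter(...).in_bulk`, which returns a dict mapping primary key to object. Note that the result is a plain list in rank order, not a true QuerySet.

```python
def hits_in_rank_order(ranked_ids, fetch_by_ids):
    """Turn ranked primary keys from a search engine into ranked objects.

    fetch_by_ids(ids) must return a dict {pk: object}. Hits missing from the
    dict (filtered out by the ORM, or deleted since indexing) are dropped.
    """
    objects = fetch_by_ids(ranked_ids)
    return [objects[pk] for pk in ranked_ids if pk in objects]


if __name__ == "__main__":
    # Stand-in for the database: pk -> object.
    table = {1: "post one", 2: "post two", 3: "post three"}

    def fetch(ids):
        # Stand-in for a single ORM query restricted to the given ids.
        return {pk: table[pk] for pk in ids if pk in table}

    # Search engine says 3 is the best hit; 7 no longer exists in the table.
    print(hits_in_rank_order([3, 7, 1], fetch))
```

Filtering in the ORM query behind `fetch_by_ids` gives post-filtering of search hits; pre-filtering would have to be expressed in the search query itself.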
Is there anything out there that does this? I'm not interested in anecdotal evidence, or references to some blog posts that claim it should be working. I'd like to hear from someone who actually has a fully-functional setup working in the real world, under real conditions, with real queries.
EDIT:
Let me repeat again that I'm not so much interested in anecdotal evidence that someone, somewhere has a somewhat running installation working with unspecified properties. I already went there, I read all the blog posts, mailing lists, I contacted the authors, but when it came to actual implementation of real-world scenarios, nothing ever worked as advertised.
Also, and a user below brought that point up as well, considering the TCO of any project, I'm definitely not interested in hearing that someone, somewhere was able to pull it off once a vendor parachuted in an unknown number of specialists to monkey-patch the whole installation with specific domain-knowledge that's documented nowhere.
So, please, if you claim you have a working installation that actually satisfies minimum requirements for a full-fledged search (see requirements above), please provide the following so that we can all benefit from a search solution for Django that actually solves the problem:
exact Linux distribution, release version,
exact release version of Haystack (or equivalent) and release version of search backend,
exact release version of the search engine
publicly (!) available documentation how to set up all components exactly in the way that your installation was set up such that the minimal requirements above are met.
Thank you.
I have developed some Django applications with xapian support too. The biggest of them has a xapian database with an index of 8G storing 2.4M documents (including forum posts, wiki entries, planet entries and blog entries) - still growing.
Overall I am quite happy with xapian. It performs extremely well and is easy to use. The only thing I don't like is that xapian won't work with mod_wsgi (except in global mode) because of a deadlock. So you are forced to use fastcgi (or connect to xapian-tcpsrv or write your own service).
I recommend using the xapian-bindings directly. Xapian nowadays offers quite a lot of useful helpers (TermGenerator, QueryParser etc.), which make both indexing and querying simple. In fact, there is nothing I can imagine which would justify an additional library. In my opinion they are all more complicated and don't allow you to index efficiently.
The only thing you need is some understanding of how xapian works. (What are terms? What are values? What is stemming and where should I use it? And so on.) You can find all those topics on the xapian website, and as soon as you understand those concepts, dealing with xapian will become easy.
Also, the xapian API is extremely stable. I started using it a long time before the 1.0 release and never had any problems with API changes or version conflicts. The only thing which has changed is that all those helpers (query parser, tokenizer, etc.) I once wrote for my Django project are now useless, because similar classes have made their way into the xapian core.
So, to summarize, just give the direct usage of xapian-bindings a try.
I can vouch for Django-Haystack with the Xapian backend (In the interest of full disclosure, I am the author of the xapian-haystack backend) in a real life, production environment. We currently use Haystack/Xapian on several sites, the largest of which has more than 20,000 registered users and a Xapian database with 20,000+ documents containing more than 143,000 unique terms for a total size of ~141mb.
As for not being able to get any combination of Haystack and the Xapian backend running, I'll admit that I was not as diligent as I should have been with my tagging, and so there is some confusion with the versions. You should, however, be able to use the current master of both codebases without any issue. If this is not the case, I'd be more than happy to assist with problems. You'll need to be a little bit more specific about the issue though. Simply saying "it did not work" is not enough information.
Daniel and I both do our best to respond to any issues opened on Github within a timely manner. Also, we're both usually available on the #haystack IRC channel during the day and the django-haystack Google Group.
Versions used:
Haystack 1.0BETA with Xapian-Haystack 1.1.0BETA
Haystack 1.0.1FINAL with Xapian-Haystack 1.1.3BETA
Most of the sites we've deployed with Haystack have been running Ubuntu 8.04 LTS with Xapian 1.0.5
Short answer: No.
We bailed and went with a Google Custom Search. Although the site has over 10,000 possible page views, we keep the sitemap feed down to the main 4,000 pages or so and it costs $250/year, which is about 2 hours of my time. The customer is happy and he feels comfortable with the results.
I'd love to see someone come up with a good FOSS solution, but in a commercial situation the TCO has got to make economic sense.
The details you requested.
exact Linux distribution, release version - Ubuntu 9.04 & 9.10
exact release version of Haystack (or equivalent) - Haystack 1.0 as well as master
release version of search backend - The Solr & Whoosh backends included with Haystack
exact release version of the search engine - Solr 1.3, Solr 1.4 & Whoosh 0.3.15
publicly (!) available documentation how to set up all components exactly in the way that your installation was set up such that the minimal requirements above are met.
http://docs.haystacksearch.org/dev/installing_search_engines.html#solr (or #whoosh)
Beyond this, it's the standard configuration bits from the tutorial, plus any additional overrides from (which I can't link to, thanks Stack Overflow) as needed.
As the maintainer of Haystack, I'm actively running all of the setups above. The smallest Haystack installation (Haystack 1.0 + Whoosh) is ~600 documents. A slightly larger one (Haystack master + Solr 1.4) is ~4000 documents. The largest deployment I'm aware of (Haystack master + Solr 1.4) is ~3 million documents.
I generally try to avoid Stack Overflow, so don't be surprised if you see nothing further from me. The mailing list is the best place for support, but given your responses thus far, I'm sure you'd rather just trash me here.
I (and my colleagues) have successfully used Haystack to achieve a fairly good search functionality.
It is easy to start with Haystack and the Whoosh backend, and to change to the Apache Solr backend when the performance of Whoosh is no longer acceptable.
We really need to get around to writing a detailed post about it, with links to the projects where it works.
For now I can suggest you have a look at this search: http://www.webdevjobshq.com/search/?q=rails implemented using Haystack with the Apache Solr backend. Or this: http://www.govbuddy.com/search/?q=Roy
Have you considered Sphinx? What are you using as your data store? It has a MySQL engine that works terrifically. I think it meets most of your requirements, except I'm not exactly certain how nicely it can be tied into the Django ORM.
I'm heavily considering using Sphinx in one of my own Django apps to improve performance on an auto-suggest field that does a prefix and infix search on a corpus of 3.5 million records. But I haven't got around to implementing it yet, so I can't speak to Django+Sphinx integration. My only Sphinx experience is with the MySQL engine and directly querying MySQL.
I use Djapian. It was quite simple to install and works great. There is an actual tutorial that covers basic use cases and shows the entire integration process.
Yes, it has some ambiguities, but the issue tracker is open and the authors rapidly fix bugs and add features.