This isn't exactly a programming question but it's relevant to programmers, and I'm looking for a data-backed, specific answer.
I'm working in a field where the size of the files we create is doubling roughly every 100 days. Nielsen's Law suggests that connection speeds are increasing by about 50% every year, which would mean doubling every 600 days. There is some talk of doing our processing on AWS or some other cloud computing service.
To me, this seems implausible, since the time required to upload the data will soon dwarf the time for processing. However, Nielsen's Law (original article by Nielsen) was made for end user connection speeds, so I'm not sure I can make my point with that.
Does anyone know of a public resource on AWS connection speeds, or institutional (e.g. university or corporation, not residential) connection speeds, over time? I want to know whether institutional bandwidth is simply larger than residential but still increasing at the same rate, or whether for some reason connection speeds to institutional customers might be increasing faster than Nielsen's Law predicts. Any help in finding evidence on the trend over time is appreciated.
I'd think just coming up with a nice pretty graph plotting estimated data download time into the future ought to make your point.
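To make that graph concrete, here is a minimal sketch of the projection, assuming the doubling rates from the question (data every 100 days, bandwidth every 600 days per Nielsen's Law). The starting values - a 100 GB file and a 100 Mbit/s link - are made-up placeholders; substitute your own measurements before showing this to anyone.

```python
# Placeholder starting points (assumptions, not figures from the question).
FILE_SIZE_GB_TODAY = 100.0        # current size of one output file, in GB
LINK_SPEED_MBPS_TODAY = 100.0     # current upload bandwidth, in Mbit/s

DATA_DOUBLING_DAYS = 100          # file size doubles roughly every 100 days
SPEED_DOUBLING_DAYS = 600         # ~50%/year growth = doubling every ~600 days

def upload_hours(days_from_now: float) -> float:
    """Projected hours to upload one file this many days in the future."""
    size_gb = FILE_SIZE_GB_TODAY * 2 ** (days_from_now / DATA_DOUBLING_DAYS)
    speed_mbps = LINK_SPEED_MBPS_TODAY * 2 ** (days_from_now / SPEED_DOUBLING_DAYS)
    size_megabits = size_gb * 8 * 1000            # GB -> megabits (decimal units)
    return size_megabits / speed_mbps / 3600      # seconds -> hours

for years in range(6):
    print(f"year {years}: ~{upload_hours(years * 365):,.1f} h per upload")
```

With those doubling rates the upload time grows by roughly a factor of 8 per year (2^(365/100) / 2^(365/600)) no matter what starting values you plug in, which is exactly the point the graph would make.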
I can't speak to Nielsen's Law, but your question in terms of AWS is nearly impossible to answer (in practical, "how should we spend this money" terms).
AWS's connection speeds vary depending on who you are, how much money you have to spend, and where you're located. For example, can you colocate a rack in Reston, VA? How about another datacenter provider used by Amazon? There are three per AZ. If you can, you can likely negotiate much more bandwidth between your rack and theirs than would be available normally. Are you a customer the size of Foursquare? Are you planning on running your jobs on 20,000 instances? I'm sure Amazon network engineering will help you squeeze out every drop of bandwidth you can, and probably write a white paper about it, too. There's rumor of dedicated, non-public-internet pipes between eu-west and us-east. The network map of today's cloud won't likely resemble 2017's at all.
This isn't to suggest your question is wrong or bad, just that it's difficult to answer. As a thought experiment, it's fascinating. As an argument in a discussion about long-term capacity planning and capital outlay, I'm not sure it's as useful.
I've been using GeocodeFarm for several years now, but it's been a struggle lately. Their support is very slow and not at all helpful, and the technical issues they are having have started to affect our business - for example, to get a result now you sometimes need to query their server 2 to 10 times before you actually get it, and there's no ETA on fixing it.
I'm now looking for a good replacement for it. Unfortunately, we cannot use Google Maps as they restrict usage (even pay-as-you-go) for commercial projects, and I'm not able to pay for premium services as we are not that big.
Are there any reasonably priced and reliable alternatives to GeocodeFarm? We are currently paying £100/month for 25k requests and are OK with that price.
geocode.xyz ($100 per month no limit cap)
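In case it helps evaluation, here is a rough sketch of a forward-geocoding call against geocode.xyz. The `json=1` parameter, the `auth` key parameter, and the `latt`/`longt` response fields are recalled from the service's public docs and may have changed, so verify against the current documentation before building on this.

```python
import requests  # third-party: pip install requests

def geocode(address, api_key=None):
    """Return (latitude, longitude) for an address via geocode.xyz (sketch)."""
    params = {"json": 1}
    if api_key:
        params["auth"] = api_key  # a paid key lifts the throttling on anonymous requests
    resp = requests.get(f"https://geocode.xyz/{address}", params=params, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return float(data["latt"]), float(data["longt"])

if __name__ == "__main__":
    print(geocode("10 Downing Street, London"))
```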
We have an application with 10 million lines of code in 4GL (Progress) and a database, also OpenEdge, with 300 tables. My boss says we should migrate it to a new programming language and a new database management system.
My questions are:
Do you think we should migrate it? Do you think Progress has a "future"?
If we should migrate it, how? Are there any tools, or should we begin programming from scratch?
Thank you for the help.
Ablo
Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites.
http://www.joelonsoftware.com/articles/fog0000000069.html
Yes, Progress has a future. They probably will never be as sexy an option as Microsoft or Oracle or whatever the cool kids are using this week. But they have been around for 30 years and they will still be here when you and your boss retire.
There are those who will rain down scorn on Progress because it isn't X or it doesn't have Y. Maybe they can rewrite your 10 million lines of code next weekend and prove just how right they are. I would not, however, pay them for those efforts until after the user acceptance tests are passed and the implementation is completed.
A couple of years later (the original post being from 2014 and the answers from 2014 to 2015):
The answer which has gotten the most votes basically argues two things:
a. Progress (Openedge) has been around for a long time and is not going anywhere soon
b. Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites: http://www.joelonsoftware.com/articles/fog0000000069.html
With regard to a:
Yes, the Progress OpenEdge stack is still around. But in my experience, finding experienced and skilled OpenEdge developers has become even more difficult.
There is also an important factor here which I think has grown to much greater importance since this discussion started:
The available open-source stacks for application development have become dramatically better, both in terms of out-of-the-box functionality and quality, and have moved decisively in the direction of RAD.
I am thinking, for instance, of Spring Boot, but not only that; see https://stackshare.io/spring-boot/alternatives. In the Java realm Spring Boot is certainly unique. For the development of rich web UIs, many very valid options have also emerged which certainly address RAD requirements; some "arbitrary" examples are https://vaadin.com for Java and https://www.polymer-project.org for JavaScript, which are interestingly converging in https://vaadin.com/flow.
Many of the available stacks are still evolving strongly, but all of them have making life easier for the developer as a strong driver. In terms of architecture you will also find a convergence among many of these stacks with regard to basic building blocks and principles: separation of interface from implementation, REST APIs for remote communication, object-relational mapping technologies, NoSQL/JSON approaches, and so on.
So yes, the open-source stacks are getting very efficient in terms of development. It must also be mentioned that the scope of these stacks does not stop with development: deployment, operational aspects and naturally also testing are a strong focus, which in the end also makes the developer's life easier.
Generally one can say that a well-chosen mix and match of open-source stacks has a very strong value proposition, also against the background of RAD requirements, which a proprietary stack will have difficulty matching in the long run - at least from my point of view.
With regard to b:
Interestingly enough, I was recently with a customer who is looking to do exactly this: rewrite their application. The irony: they are migrating from Progress to Progress OpenEdge, with several additional OpenEdge-compliant tools. The reason is twofold: their code has become very difficult to maintain and would need refactoring in order to address requirements coming from web frontends. Also interesting: they are not finding enough qualified developers.
Basically: code is sound and lives when it can be refactored and when it can evolve with new requirements. Unfortunately there are many examples - at least in my experience - to the contrary.
Additionally, the end of life of software can force a company to "rewrite" at least layers of its software. And this doesn't necessarily have to be bad or impossible. I worked on a project which migrated over 300 Oracle Forms forms to a Java-based UI in less than two years. This migration from a two-tier to a three-tier architecture actually positioned the company to evolve its architecture to address the needs of web UIs. So in the end this "rewrite" delivered a strong return of value, also from the business perspective.
So to cut a (very;-)) long story short:
One way or another, it is easy to go wrong with generalizations.
You need not begin programming from scratch. There is help available online and yes, you can contact Progress Technical Support if you run into difficulties. Generally, ABL code from a previous version should work with only minor changes. Here are a few things that you need to do in order to migrate your application (a scripted sketch of these steps follows below):
Backup databases
Backup source code and .r files
Truncate DB bi files
Convert your databases
Recompile ABL code and test
Articles on http://knowledgebase.progress.com will help you with this. If you are migrating from an older version like 9, you will find a good set of new features. You can try them, but only after you are done with your conversion.
If you are migrating from 32-bit to 64-bit and you are using 32-bit libraries, you need to replace them with 64-bit versions.
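If you want to script those steps, a hedged sketch using Python's subprocess module might look like the following. probkup and proutil are the standard OpenEdge utilities for backup, BI truncation and conversion, but the exact options - in particular the conversion qualifier, shown here as conv1011 for a 10-to-11 move, and the database and backup paths - depend on your versions and environment, so treat this as an outline rather than a recipe.

```python
import subprocess

DB = "sports"                    # hypothetical database name
BACKUP = "/backups/sports.bck"   # hypothetical backup target

def run(*cmd):
    """Echo and run one OpenEdge utility command, stopping on failure."""
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Back up the database (source code and .r files are an ordinary file copy).
run("probkup", DB, BACKUP)

# 2. Truncate the before-image (BI) file so the database is in a clean state.
run("proutil", DB, "-C", "truncate", "bi")

# 3. Convert the database to the new OpenEdge version (the qualifier depends on
#    the source/target versions; conv1011 is only an example).
run("proutil", DB, "-C", "conv1011")

# 4. Recompile the ABL code against the converted database (typically done
#    from an ABL session with COMPILE ... SAVE) and test.
```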
The first question I'd come back with is 'why'? If the application is not measuring up that's one thing, and the question needs to be looked at from that perspective.
If the perception is that Progress is somehow a "lesser" application development and operating environment, and the desire is only to move to a different development and operating environment, you'll end up investing a lot of resources in time, effort, and money - not to mention the opportunity cost - and for what? To run on a different database platform? Will migrating result in a lower TCO? Faster development turn-around time? Quicker time to market? What's the expected advantage in moving from Progress, and how long will it take to recover the migration cost - if ever?
Somewhere out there is a company who had similar thoughts and tried to move off of Progress and the ABL. The effort failed to meet their target performance and functionality metrics, so they eventually gave up on the migration, threw in the towel, and stayed with Progress - after spending $25M on the project.
Can your company afford that kind of risk / reward ratio?
Progress (Openedge) has been around for a long time and is not going anywhere soon. And rewriting 10 Million lines of code in any language just to use the current flavor of the month would never be worth it unless your current application is not doing what you need. Even then bringing it up to current needs would normally be a better solution.
If you need to migrate your current application to the latest version of OpenEdge (Progress), you would normally just make a copy of your database(s), convert it/them to the new version of OpenEdge, compile your code against the new databases and shake the bugs out. You may have some keyword issues, but this is usually pretty minor.
If you need help with programming I would suggest contacting Progress Software and attending the yearly trade show or going to https://community.progress.com/ and asking/looking for local user groups. The local user groups would be a stellar place to find local programming talent.
Hope this helps.....
My company is at the very end of the development process of an (awesome :)) web application. This app will be available as an online service for (hopefully) a significant number of users. This is our biggest Django release so far, and as we are preparing to release, some questions about deployment have to be answered.
Q1: how to determine required server parameters for predicted X number of users/Y hits per minute or other factor?
Q2: what hosting solution (shared/vps/dedicated) is worth considering?
Q3: what optimizations should be done first?
I know that this is very subjective and depends on the size of the site, code quality and other factors, but I'm very interested in your experiences with Django (and not only Django) deployment. Any hints, links and advice are kindly welcome. Thanks in advance.
Which hosting solution you want also depends on how much of your server you want to take care of yourself (from upgrades to backups...), and you should decide whether you want that responsibility or would rather leave it to someone else.
I think you can only really determine the necessary requirements and bottlenecks in your application through testing with the estimated load! Try to simulate as many requests as you expect, and think about caching (where memcached is the best option you have). If you try to cache things, one great tool is the Django Debug Toolbar (http://github.com/robhudson/django-debug-toolbar), which can also show you a lot about how many database hits you have (don't take the timings it shows for granted, but analyse them and keep an eye on the number of hits) and, e.g., how many templates are rendered.
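To make the memcached suggestion concrete, here is a minimal caching sketch for a Django project. The backend class name varies by Django version (older releases ship MemcachedCache, newer ones PyMemcacheCache), and the product_list view is a hypothetical example, not something from the question.

```python
# settings.py -- point Django's cache framework at a local memcached instance
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# views.py -- cache an expensive view's rendered response for five minutes
from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)
def product_list(request):
    # ...expensive queries and template rendering would normally go here...
    return HttpResponse("this response is cached for five minutes")
```

Per-view caching like this pairs well with the debug toolbar: watch the query count drop on the second request to confirm the cache is actually being hit.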
If your system grows, you can first of all think about serving your static media files from another location! Coming to the web server, I have had great experiences using lighttpd instead of the fat Apache, but I guess you have to evaluate that for yourself!
Also take into consideration which database backend to use; in shared environments there is in most cases a bigger load on the MySQL servers than on the PostgreSQL servers, but again, evaluate what works best for you!
You can get some guesses here, but to get a halfway reasonable performance estimate you have to measure the performance of your application yourself. You should then be able to roughly extrapolate the performance on different hardware.
Most of the time the bottleneck is the database; you should get enough RAM to keep it in memory if possible.
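For "measure it yourself", even something as small as the following sketch will give you rough latency and throughput numbers to extrapolate from; dedicated tools (ab, siege, JMeter) do the same job better. The URL, request count and concurrency level are placeholders, and the requests library is a third-party dependency.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party: pip install requests

URL = "http://localhost:8000/"   # placeholder: point at a staging deployment
REQUESTS = 200
CONCURRENCY = 20

def timed_get(_):
    """Fetch the URL once and return the elapsed time in seconds."""
    start = time.perf_counter()
    requests.get(URL, timeout=30)
    return time.perf_counter() - start

wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_get, range(REQUESTS)))
wall = time.perf_counter() - wall_start

print(f"median latency: {statistics.median(latencies) * 1000:.0f} ms")
print(f"p95 latency:    {sorted(latencies)[int(len(latencies) * 0.95)] * 1000:.0f} ms")
print(f"throughput:     {REQUESTS / wall:.1f} req/s at concurrency {CONCURRENCY}")
```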
"Web application" can encompass so many different things, we can really do no more than guess here.
As for optimization, if it fits your needs, implement some caching (e.g. with memcached); that can give you huge speed improvements.
We are a startup company and haven't yet invested in hardware resources to prepare our dev and testing environments. The suggestion is to buy a high-end server, install VMware ESX, and deploy multiple VMs for build, TFS, database, ... for the testing, staging and dev environments.
We are still not sure what specs to go with, e.g. RAM, whether a SAN is needed, HD, processor, etc.
Please advise.
You haven't really given much information to go on. It all depends on what type of applications you're developing, resource usage, need to configure different environments, etc.
Virtualization provides cost savings when you're looking to consolidate underutilized hardware. If each environment is sitting idle most of the time, then it makes sense to virtualize them.
However, if each of your build/TFS/testing/staging/dev environments will be heavily used by all developers simultaneously during the working day, then there might not be as many cost savings in virtualizing everything.
My advice would be if you're not sure, then don't do it. You can always virtualize later and reuse the hardware.
Your hardware requirements will somewhat depend on what kind of reliability you want for this stuff. If you're using this to run everything, I'd recommend having at least two machines you split the VMs over, and if you're using N servers normally, you should be able to get by on N-1 of them for the time it takes your vendor to replace the bad parts.
At the low-end, that's 2 servers. If you want higher reliability (ie. less downtime), then a SAN of some kind to store the data on is going to be required (all the live migration stuff I've seen is SAN-based). If you can live with the 'manual' method (power down both servers, move drives from server1 to server2, power up server2, reconfigure VMs to use less memory and start up), then you don't really need the SAN route.
At the end of the day, your biggest sizing requirement will be HD and RAM. Your HD footprint will be relatively fixed (at least in most kinds of a dev/test environment), and your RAM footprint should be relatively fixed as well (though extra here is always nice). CPU is usually one thing you can skimp on a little bit if you have to, so long as you're willing to wait for builds and the like.
The other nice thing about going all virtualized is that you can start with a pair of big servers and grow out as your needs change. Need to give your dev environment more power? Get another server and split the VMs up. Need to simulate a 4-node cluster? Lower the memory usage of the existing node and spin up 3 copies.
At this point, unless I needed very high-end performance (i.e. I needed to consider clustering high-end physical servers), I'd go with a virtualized environment. With the extensions on modern CPUs and OS/hypervisor support for them, the hit is not that big if done correctly.
This is a very open ended question that really has a best answer of ... "It depends".
If you have the money to get individual machines for everything you need then go that route. You can scale back a little on the hardware with this option.
If you don't have the money to get individual machines, then you may want to look at a top-end server for this. If this is your route, I would look at a quad machine with at least 8 GB RAM and multiple NICs. You can go with a server box that has multiple hard drive bays on which you can set up multiple RAID arrays. I recommend RAID 5 so that you have redundancy.
With something like this you can run multiple VMWare sessions without much of a problem.
I set up a 10 TB box at my last job. It had 2 NICs, 8 GB of RAM, and was a quad machine. Everything included, it cost about 9.5K.
If you can't afford to buy the single machines, then you probably are not in a good position to make a reusable start with virtualisation.
One way you can do it is to take the minimum requirements for all your systems, i.e. TFS, mail, web etc., and add them all together; that will give you an idea of half the minimum server you need to host all those systems. Double it and you will be near what will get you by; if you have spare cash, double or triple the RAM. Most OSes run better with more RAM, up to a particular ceiling. Think about buying expandable storage of some kind and aim for it being half populated to start with, which will keep the initial cost per GB down and allow some expansion at lower cost in the future.
You can also buy servers which take multiple CPUs but only put in the minimum number of CPUs. Also go for as many cores per CPU as you can get, for thermal, physical and licensing efficiency.
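As a trivial illustration of the sizing rule of thumb above (sum the per-system minimums, double the total, and pad the RAM further if the budget allows), with made-up example figures rather than recommendations:

```python
# Hypothetical per-VM RAM minimums in GB; replace with your own requirements.
minimum_ram_gb = {"TFS": 2, "build": 2, "database": 4, "web/test": 2, "mail": 1}

half_minimum = sum(minimum_ram_gb.values())   # "half the minimum server you need"
get_you_by = half_minimum * 2                 # "double it"
comfortable = get_you_by * 2                  # "if you have spare cash, double the RAM"

print(f"sum of minimums: {half_minimum} GB")
print(f"gets you by:     {get_you_by} GB")
print(f"comfortable:     {comfortable} GB")
```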
I appreciate this is a very late reply, but as I didn't see many ESX answers here I wanted to post one, though it relates equally to Hyper-V etc.
Greetings!
I'd like to build an apache web server, running on debian lenny.
It will primarily be used for hosting a web shop, so it should have some light DB I/O and lots of image serving (item previews/thumbs/etc.).
It's tough to put a finger on the exact number of concurrent requests that I'll get hit with, but I'd say that a non-professional setup should be enough to handle them.
By non-professional I mean that I don't need to invest into purchasing blades, a rack or something of the like. Just regular desktop PC tweaked for web server performance.
Which exposes my current problem: I have no idea whatsoever what kind of a machine I should be looking for.
If I wanted to build a gaming rig, no problem - there are at least a million sites out there with performance benchmarks, from bleeding-edge graphics card reviews to flat-panel LCD contrast/response-time charts. But when it comes to finding recommendations for a web server build, I'm having a hard time finding a good RECENT review.
So, at least I've managed to gather this so far - these are the priorities I should be attending to:
1) Lots of memory (preferably fast)
2) A pair of fast HDDs
3) As many cores as I can get
4) As fast processor as I can get
5) A MB with good I/O
So, memory and HDDs aren't that big of a deal, you can't go wrong here (I guess).
With RAM prices these days, it's pretty affordable to pump 8+ GB into a machine.
The only question here is whether it would be worth it to buy a tiny (<=32 GB) SSD and place all my web stuff and the OS onto it. My entire web server is just a couple of megs in size, and the database will fit really neatly onto it with space to boot.
As for the graphics card, I'll just plug in any old PCI Ex card I can whip up, and the same goes for any peripherals. I don't need a display of any kind - I'll be logging in remotely for most of the time.
OK - and now for the most important question: Which Proc and MB to buy.
As far as I've gathered, it would be better to have 10 cores running at 100 MHz each than only one running at 2 GHz, taking the nature of the machine into consideration.
So I'll most likely have to get a quad core, right? The question is which... :/
There are several affordable ones... My budget is around US $800. This is, again, for just the proc, the MB, and the memory. I have the HDDs. If I take a small SSD, add $100 to that budget.
AMD Phenom or Intel Core 2? Which MB to go with it? I'm totally lost here.
If this starts an AMD vs. Intel flame war, I'm truly sorry, for that is not my intention - but if you could at least point me to a good recent review for a web server build, I would be grateful.
On the one hand you say you don't need that much performance, but on the other you are talking about adding as many cores as you can. A quad-core CPU from either AMD or Intel will be more than sufficient. It gets into the category of "religious war", but I prefer the Intel chips; I usually buy Xeon processors. As for SSDs, I wouldn't bother. Look into a good RAID setup with a 3ware controller; either RAID 1+0 or RAID 5 (obviously, there will be a religious anti-RAID-5 crowd, though I prefer it... at least until RAID 6 is more widespread). As much memory as you can afford is ideal, although anything more than 8 GB is probably overkill from what you have said. Probably the main departure from what you have already listed is that I wouldn't even bother with the SSD. Depending on your usage patterns, you may actually hurt performance with it, and any benefit for your use cases would not be worth the cost. Wait for the research to catch up for SSDs to really be beneficial in terms of performance. :)
If this is a business server, I recommend buying one pre-configured from IBM, Dell, or whatever major manufacturer is your preference (I prefer IBM).
This is really a stretch for the "right" kind of question for SO; it only touches on "implementation" by degrees.
Pre-configured "Server" machines can often be more cost-beneficial. But, if you'd still prefer to build your own...
Considering just your budget ($800) for MB, Proc, and Memory...
RAM - DDR2 800 ($200/4GB, and cheaper)
MB - 1333/1066MHz FSB ($250)
CPU - Dual Core ($150)
Quad Core can still be too expensive for the benefit -- but, that's up to you to judge.
But, follow the links, and use the Advanced Search to cross out unnecessary features, and you should be able to reduce the list of items fairly easily.
Have you considered shared, dedicated, or virtual hosting? If I were you, I'd go with SliceHost for the virtual server, then use Amazon S3 for serving up images and other large static files. The combination has worked well for me in the past. I've found that, especially when it comes to hosting, don't take on more responsibility than you absolutely have to.
I use MediaTemple for my websites. They have a lot of professional organizations hosted on their servers. I'd probably go with them if I were you.
My dad thought the server route would be easy and we found out differently the hard way. If you don't have a friend or an employee that really knows what he's doing, I'd be careful. Anyways, good luck.
If you're not planning on running the next Amazon, I'd say that your choice of CPU/chipset is irrelevant. Find a motherboard with the features you need (4+ RAM slots, plenty of SATA headers, etc.) that suits your budget and then buy an upper-midrange multicore CPU to suit. Get a PCI Express RAID card and a meaty UPS too.
Get a vanilla hard drive for the OS, and a pair of fast drives (WD Velociraptors, etc) and put them in RAID 1 for the webserver for redundancy.
Then, after a year or so of restarting the server every other day, migrate everything to a hosting company.