Digital Ocean or upgrade physical computers? - digital-ocean

Not sure where to post this question, but I was looking at DO and it seemed pretty compelling.
My coworker and I need new computers at the office pretty soon, mainly to improve quality of life when we do analysis or wrangle data on large files/queries.
I could purchase two physical computers, which will become outdated (the options we have already look fairly old, actually; we are stuck with Dell), or perhaps we can 'move to the cloud'. If we chose the physical machines, the upfront cost would be several thousand dollars and we would wait a few months before receiving them.
A challenge we have had is sharing tasks (like a script that cleans data and loads it into the database) and sharing the raw data files (prior to adding them to the database). It would be nice to centralize all of this, because there has been growing fragmentation, and it makes it hard to travel for long periods in case something breaks while we're away.
Is it possible to centralize some files and scripts on a DO instance so that both of us can access that machine and add new scripts, pull data, or make alterations as needed? Our computers at the office would just be for email and the like; all of the heavy lifting could be done on DO.

Related

How did these big companies start from these three main points?

So this is something that I've been thinking about lately, and it is basically: how did big music web apps or websites like Spotify, YouTube, or Anghami (if you know that one) start? I was actually thinking about three things. First: how did they get these huge music libraries? Second: did each of those big companies need to buy a special server to hold the website data and music library, and if yes, how much does a special server cost in this case? And the third question is: how did they solve the copyrights with all of these creators or authors or publishers or whatever they're called, the copyright owners in this case...?
1. They are uploaded by the artists/creators. I'd imagine pre-release Spotify would have had a library already put together by working with the artists.
2. Yes. They cost a lot. There are hundreds of millions of users and terabytes upon terabytes of data, spread around the world. Server costs will be in the millions. Starting out the upfront cost to set up infrastructure would be very high too.
3. This is definitely not the place to ask this kind of question. I would Google information on how copyrights with artists usually work.

Achieving high concurrency in app that performs reading/writing to database?

I'm working on designing a middle layer for an application that will receive up to ~5000 requests every few seconds and needs to retrieve information from a database. I've been looking at using the Play Framework (I use Scala for my REST API design) as they say it's fully async and built on Akka. However, the main bottleneck of any solution seems to be the reads/writes to the database; many databases cannot support simultaneous reads/writes at such a scale. How is such high concurrency achieved for an app like this, then? I would guess Facebook/Twitter/(name other big company) have achieved this for their applications, as millions of people may be using them concurrently.
As Tim's comment says, caching may or may not be able to help in your case. If not, I would also recommend looking into horizontally scalable databases, for example CockroachDB if you want a transactional SQL DB. Otherwise there are many NoSQL choices, MongoDB among them. And if you really want to stick to traditional SQL systems, you'll have to vertically scale your servers (buy the most expensive hardware) and work with read replicas.
A huge component is your data model and query access pattern. If each query increments a shared counter that has to be synchronized, there will be a ton of contention; but if each query touches completely separate data, at the other end of the spectrum, there will be a lot less contention.
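To illustrate that difference, here is a hedged sketch (the table names and connection details are hypothetical, not from the question) contrasting a shared counter with per-user rows:

    import java.sql.DriverManager

    object ContentionDemo {
      def main(args: Array[String]): Unit = {
        // Hypothetical connection details, for illustration only.
        val conn = DriverManager.getConnection(
          "jdbc:postgresql://localhost:5432/app", "app", "secret")

        // High contention: every request serializes on the lock for row id = 1.
        val hot = conn.prepareStatement(
          "UPDATE counters SET n = n + 1 WHERE id = 1")
        hot.executeUpdate()

        // Low contention: each request touches its own row, so locks rarely
        // collide and throughput scales far better.
        val cool = conn.prepareStatement(
          "UPDATE user_stats SET hits = hits + 1 WHERE user_id = ?")
        cool.setLong(1, 42L)
        cool.executeUpdate()

        conn.close()
      }
    }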
I think there are a couple of dimensions I would consider:
Data Schema and Access Patterns (discussed above)
Language Choice
This is important because in a web server context using prefork, by default each process may have its own connection to the database. In an environment like Python or Ruby you may need hundreds of processes to handle your load. Contrast this with Akka or another async networking runtime (Node, Python gevent/asyncio, Go, etc.), where a single instance with a small thread pool can handle a large number of requests. Each has its tradeoffs.
Distributed Systems
Depending on your data schema and access patterns, 5000 requests per second to an RDBMS is completely achievable. It would probably require relatively beefy hardware, but I've personally done it a number of times. Getting to larger scales requires more computers in order to distribute the work/load. If your workload is read heavy and you can tolerate potentially stale reads, a read replica is one option: with another machine in the mix, reads are distributed over two machines, but writes are still directed at a single machine (the leader). Caching is another option.
At much higher workloads some sort of partitioning needs to occur in order to overcome the constraints of a single machine; see for example https://github.com/vitessio/vitess.
Many of the big contenders have solutions for horizontally scaling their databases. This has many drawbacks as well and will require careful planning.
The one thing I'd recommend: if 5000 requests per second is only projected for the near future, start with the minimal amount of hardware necessary (a single instance). Query patterns and operations get considerably more complicated with a distributed database.
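To make the read-replica option concrete, here is a minimal sketch in Scala over plain JDBC: writes always go to the leader, and potentially stale reads go to a replica. The hostnames, credentials, and PostgreSQL driver are assumptions for illustration (nothing from the original question), and a real deployment would use a connection pool rather than opening a connection per call.

    import java.sql.{Connection, DriverManager}

    object Db {
      // Assumed hostnames/credentials, placeholders only.
      private val leaderUrl  = "jdbc:postgresql://leader:5432/app"
      private val replicaUrl = "jdbc:postgresql://replica:5432/app"

      def withLeader[A](f: Connection => A): A = {
        val c = DriverManager.getConnection(leaderUrl, "app", "secret")
        try f(c) finally c.close()
      }

      def withReplica[A](f: Connection => A): A = {
        val c = DriverManager.getConnection(replicaUrl, "app", "secret")
        try f(c) finally c.close()
      }
    }

    // Usage: writes hit the leader, reads hit the replica.
    // Db.withLeader(c => c.createStatement().executeUpdate("UPDATE ..."))
    // Db.withReplica(c => c.createStatement().executeQuery("SELECT ..."))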

OpenEdge 11.3 Application Migration

We have an application with 10 million lines of code in 4GL (Progress) and an OpenEdge database with 300 tables. My boss says we should migrate it to a new programming language and a new database management system.
My questions are:
Do you think we should migrate it? Do you think Progress has a "future"?
If we should migrate it, how? Are there any tools, or should we begin programming from scratch?
Thank you for the help.
Ablo
Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites.
http://www.joelonsoftware.com/articles/fog0000000069.html
Yes, Progress has a future. They probably will never be as sexy an option as Microsoft or Oracle or whatever the cool kids are using this week. But they have been around for 30 years and they will still be here when you and your boss retire.
There are those who will rain down scorn on Progress because it isn't X or it doesn't have Y. Maybe they can rewrite your 10 million lines of code next weekend and prove just how right they are. I would not, however, pay them for those efforts until after the user acceptance tests are passed and the implementation is completed.
A couple of years later (the original post is from 2014 and the answers from 2014 to 2015):
The answer which has gotten the most votes argues basically two things:
a. Progress (OpenEdge) has been around for a long time and is not going anywhere soon
b. Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites: http://www.joelonsoftware.com/articles/fog0000000069.html
With regard to a:
Yes, the Progress OpenEdge stack is still around. But in my experience, finding experienced and skilled OpenEdge developers has gotten even more difficult.
There is also an important factor here, which I think has grown to much greater importance since this discussion started:
The available open source stacks for application development have gotten dramatically better, both in terms of out-of-the-box functionality and quality, and have moved decisively in the direction of RAD.
I am thinking for instance of Spring Boot, but not only that; see https://stackshare.io/spring-boot/alternatives. In the Java realm Spring Boot is certainly unique. Also for the development of rich web UIs many very valid options have emerged which certainly address RAD requirements; just some "arbitrary" examples are https://vaadin.com for Java, but also https://www.polymer-project.org for JavaScript, which are interestingly converging in https://vaadin.com/flow.
Many of the available stacks are still evolving strongly, but all have making life easier for the developer as a strong driver. Also in terms of architecture you will find a convergence among many of these stacks with regard to basic building blocks and principles: separation of interfaces from implementation, REST APIs for remote communication, object-relational mapping technologies, NoSQL/JSON approaches, etc.
So yes, the open source stacks are getting very efficient in terms of development. And it must also be mentioned that the scope of these stacks does not stop with development: deployment, operational aspects, and naturally also testing are a strong focus, which in the end also makes the developer's life easier.
Generally one can say that a well-chosen mix and match of open source stacks has a very strong value proposition, also against the background of RAD requirements, which a proprietary stack will have difficulty matching in the long run - at least from my point of view.
With regard to b:
Interestingly enough, I was recently with a customer who is looking to do exactly this: rewrite their application. The irony: they are migrating from Progress to Progress OpenEdge, with several additional OpenEdge-compliant tools. The reason is twofold: their code is getting very difficult to maintain and would require refactoring in order to address requirements coming from web frontends. Also interesting: they are not finding enough qualified developers.
Basically: code is sound and alive when it can be refactored and when it can evolve with new requirements. Unfortunately there are many examples - at least from my experience - to the contrary.
Additionally, end-of-life of software can force a company to "rewrite" at least layers of its software. And this doesn't necessarily have to be bad or impossible. I worked on a project which migrated over 300 Oracle Forms forms to a Java-based UI in less than two years. This migration from a two-tier to a three-tier architecture actually positioned the company to evolve its architecture to address the needs of web UIs. So in the end this "rewrite" brought a strong return of value, also from the business perspective.
So to cut a (very;-)) long story short:
One way or another, it is easy to go wrong with generalizations.
You need not begin programming from scratch. There is help available online, and yes, you can contact Progress Technical Support if you run into difficulties. Generally, ABL code from a previous version should work with only small changes. Here are a few things you need to do in order to migrate your application:
Backup databases
Backup source code and .r files
Truncate DB bi files
Convert your databases
Recompile ABL code and test
Articles at http://knowledgebase.progress.com will help you with this. If you are migrating from an older version like 9, you will find a good set of new features; you can try them, but only after you are done with your conversion.
If you are migrating from 32-bit to 64-bit and you are using 32-bit libraries, you need to replace them with 64-bit versions.
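For orientation, the steps above often look roughly like this from the command line. The database name is a placeholder, and the exact conversion utility depends on your source version (conv1011 converts a 10.x database to 11.x), so check the knowledgebase for the path that matches your setup before running anything:

    probkup mydb mydb.bck           # step 1: back up the database
    proutil mydb -C truncate bi     # step 3: truncate the before-image file
    proutil mydb -C conv1011        # step 4: convert a 10.x database to 11.x
    # step 5: recompile your ABL code against the converted database,
    # e.g. from the Procedure Editor: COMPILE myprog.p SAVE.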
The first question I'd come back with is 'why'? If the application is not measuring up that's one thing, and the question needs to be looked at from that perspective.
If the perception is that Progress is somehow a "lesser" application development and operating environment, and the desire is only to move to a different one, you'll end up with a lot of time, effort, and money invested - not to mention the opportunity cost - and for what? To run on a different database platform? Will migrating result in a lower TCO? Faster development turnaround? Quicker time to market? What's the expected advantage in moving from Progress, and how long will it take to recover the migration cost - if ever?
Somewhere out there is a company who had similar thoughts and tried to move off of Progress and the ABL. The effort failed to meet their target performance and functionality metrics, so they eventually gave up on the migration, threw in the towel, and stayed with Progress - after spending $25M on the project.
Can your company afford that kind of risk / reward ratio?
Progress (OpenEdge) has been around for a long time and is not going anywhere soon. And rewriting 10 million lines of code in any language just to use the current flavor of the month would never be worth it unless your current application is not doing what you need. Even then, bringing it up to current needs would normally be a better solution.
If you need to migrate your current application to the latest version of OpenEdge (Progress), you would normally just make a copy of your database(s), convert it/them to the new version of OpenEdge, compile your code against the new databases, and shake the bugs out. You may have some keyword issues, but these are usually pretty minor.
If you need help with programming I would suggest contacting Progress Software and attending the yearly trade show or going to https://community.progress.com/ and asking/looking for local user groups. The local user groups would be a stellar place to find local programming talent.
Hope this helps.....

Sync the same Code::Blocks project across multiple computers

I have begun coding on a separate computer from my home one, and I am running Code::Blocks off of a USB drive. But I have come across a problem when moving back to my home PC to use the same project: syncing the files that have been updated on the USB back to my computer. Doing it manually is beginning to take longer and longer as the number of files in my project increases.
So my question is: is there some way (in Code::Blocks preferably, but if there is another IDE that does it then I guess I will use that) to sync an old project with a new one and have it overwrite all the old code?
Use a version control system. Git would be a good choice, since you can set up a bare repository on the USB stick and sync with it from each machine.
Git has a hefty learning curve, but the huge benefits to your workflow will be worth the effort many times over.
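A minimal sketch of that workflow, with placeholder paths (adjust the USB mount point or drive letter, and the branch name, to your setup):

    # One-time: create a bare repository on the USB stick.
    git init --bare /media/usb/myproject.git

    # On each computer: clone once, then work in the local copy.
    git clone /media/usb/myproject.git ~/myproject
    cd ~/myproject

    # After editing, record the changes and publish them to the stick:
    git add .
    git commit -m "Describe what changed"
    git push origin master   # or 'main', depending on your git version

    # On the other machine, pick up the latest work:
    git pull origin master

With this in place there is nothing to overwrite by hand; git merges the two machines' histories for you.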
Another choice is Mercurial. I strongly dislike Mercurial's branching model(s) (which I consider to be broken) and the lack of a staging area, but many projects have had good success with it, so it's worth considering. Mercurial also has the advantages of being easier to learn and better supported on Windows than git.

What are the recommended HW specs for virtualization?

We are a startup company and haven't yet invested in HW resources to prepare our dev and testing environment. The suggestion is to buy a high-end server, install VMware ESX, and deploy multiple VMs for build, TFS, database, ... for the testing, staging, and dev environments.
We are still not sure what specs to go with, e.g. RAM, whether a SAN is needed, HD, processor, etc.
Please advise.
You haven't really given much information to go on. It all depends on what type of applications you're developing, resource usage, need to configure different environments, etc.
Virtualization provides cost savings when you're looking to consolidate underutilized hardware. If each environment is sitting idle most of the time, then it makes sense to virtualize them.
However, if each of your build/TFS/testing/staging/dev environments will be heavily used by all developers simultaneously during the working day, then there might not be as many cost savings from virtualizing everything.
My advice would be if you're not sure, then don't do it. You can always virtualize later and reuse the hardware.
Your hardware requirements will somewhat depend on what kind of reliability you want for this stuff. If you're using this to run everything, I'd recommend having at least two machines you split the VMs over, and if you're using N servers normally, you should be able to get by on N-1 of them for the time it takes your vendor to replace the bad parts.
At the low-end, that's 2 servers. If you want higher reliability (ie. less downtime), then a SAN of some kind to store the data on is going to be required (all the live migration stuff I've seen is SAN-based). If you can live with the 'manual' method (power down both servers, move drives from server1 to server2, power up server2, reconfigure VMs to use less memory and start up), then you don't really need the SAN route.
At the end of the day, your biggest sizing requirement will be HD and RAM. Your HD footprint will be relatively fixed (at least in most kinds of a dev/test environment), and your RAM footprint should be relatively fixed as well (though extra here is always nice). CPU is usually one thing you can skimp on a little bit if you have to, so long as you're willing to wait for builds and the like.
The other nice thing about going all virtualized is that you can start with a pair of big servers and grow out as your needs change. Need to give your dev environment more power? Get another server and split the VMs up. Need to simulate a 4-node cluster? Lower the memory usage of the existing node and spin up 3 copies.
At this point, unless I needed very high-end performance (i.e. I need to consider clustering high-end physical servers for performance needs), I'd go with a virtualized environment. With the extensions on modern CPUs and OS/hypervisor support for them, the hit is not that big if done correctly.
This is a very open ended question that really has a best answer of ... "It depends".
If you have the money to get individual machines for everything you need then go that route. You can scale back a little on the hardware with this option.
If you don't have the money to get individual machines, then you may want to look at a top-end server. If this is your route, I would look at a quad machine with at least 8GB RAM and multiple NICs. You can go with a server box that has multiple hard drive bays and set up multiple RAID arrays on them. I recommend RAID 5 so that you have redundancy.
With something like this you can run multiple VMWare sessions without much of a problem.
I setup a 10TB box at my last job. It had 2 NICs, 8GB, and was a quad machine. Everything included cost about 9.5K
If you can't afford to buy the individual machines, then you are probably not in a good position to start with virtualisation either.
One way you can do it is to take the minimum requirements for all your systems, i.e. TFS, mail, web, etc., and add them all together; that gives you an idea of half the minimum server you need to host all those systems. Double it and you will be near what will get you by; if you have spare cash, double or triple the RAM. Most OSes run better with more RAM, up to a particular ceiling. Think about buying expandable storage of some kind and aim for it to be half populated to start with, which will keep the initial cost per GB down and allow for expansion at lower cost in the future.
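As a rough worked example (numbers made up for illustration, not from the question): if TFS needs 2 GB of RAM, the database 4 GB, the build server 2 GB, and mail and web 1 GB each, the minimums sum to 10 GB; doubling that suggests a 20 GB host, and with spare cash you would jump to 32 GB or more.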
You can also buy servers which take multiple CPU sockets but only populate the minimum number of CPUs to start. Also go for as many cores per CPU as you can get, for thermal, physical, and licensing efficiency.
I appreciate this is a very late reply, but as I didn't see many ESX answers here I wanted to post one, though my reply relates equally to Hyper-V etc.