Solr or SolrCloud in AWS? - amazon-web-services

I have to add a Solr search server on an AWS EC2 instance. Right now I have Solr installed on an AWS EC2 instance with 8 GB of RAM and 50 GB of disk space. It's working fine, but I was wondering if changing to SolrCloud would improve performance. Should I go for regular Solr, or should I go for SolrCloud? If SolrCloud, why?

It's impossible to say; both will work. "Regular" Solr allows you to scale your infrastructure by adding replicas for your cores, while SolrCloud adds hidden complexity in exchange for easier handling of replication and query distribution.
If everything works now, I wouldn't fret it. Keep track of your query times and re-evaluate if you run into issues where you need to add instances to your cluster quickly. A regular, simple Solr setup with HTTP replication will in almost all cases do just fine.

Both would work, but if you were starting afresh, go with SolrCloud. Here are a few reasons:
There is little or no meaningful application-development overhead to going with SolrCloud. If you are writing your integration / application to talk to Solr, just do so against SolrCloud from the start.
SolrCloud is seeing more and more adoption, and a lot of new development and capabilities are released there first.
As you scale, you are already on a path to simple scalability, instead of trying to figure out high availability and/or data partitioning later. You can use the SolrCloud Collections API to add replicas and so on (see the sketch below).
Not much to lose from a feature/functionality perspective.
If you want to try starting a high-availability cluster or a single-node deployment of SolrCloud on AWS, you can sign up for a free trial at https://searchstax.measuredsearch.com/freetrial/
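For illustration, here's a rough sketch of adding a replica through the Collections API from Python. The host, collection, shard, and node names below are placeholders, not anything from the question:

    import requests

    SOLR = "http://localhost:8983/solr"  # placeholder SolrCloud node

    # Ask the Collections API to add a replica of shard1 on a given node.
    resp = requests.get(
        f"{SOLR}/admin/collections",
        params={
            "action": "ADDREPLICA",
            "collection": "mycollection",  # placeholder collection name
            "shard": "shard1",
            "node": "10.0.0.2:8983_solr",  # placeholder target node
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["responseHeader"]["status"])  # 0 means success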

Related

upgrading postgresql/redis databases without downtime on GCP

I'm creating a web app in React with a Node.js backend. I'm hosting all this on the Google Cloud Platform. I'm using a PostgreSQL database and a Redis database, and because my knowledge of these databases is very limited, I'm using the managed options (Cloud SQL and Cloud Memorystore).
These are not the cheapest solutions, but for now, they'll do what I want them to do.
My question now is: I'm using the managed options. Imagine my web app has success and grows bigger; I'll probably want my own self-managed solution (like a Redis cluster on Compute Engine or a PostgreSQL cluster on Compute Engine). Will I be able to migrate my managed databases to the Compute Engine solution without downtime or loss of data?
If things get bigger, I'll probably hire someone with more knowledge about PostgreSQL/Redis; that's not the problem. The only thing I want to know is: is it possible to upgrade from a GCP managed solution to an unmanaged solution on Compute Engine without loss of data and without downtime? I do not want any loss of data at all; a little downtime would not be a problem.
Using a managed solution is, in fact, the better approach for handling databases. GCP takes over updates, management, and maintenance of the database and provides reliable tools for backup and scaling.
But to answer your question: yes, it is possible to migrate with minimal downtime. You would need to configure primary/replica (formerly called master/slave) synchronous replication, with the replica running on Compute Engine. Once the replica is in sync, you can promote it to be your primary database. That gives you basically the minimum possible downtime.
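As a hedged sketch of the cutover check (host and credentials are placeholders), you can watch pg_stat_replication on the current primary and only promote the Compute Engine replica once it reports sync_state 'sync':

    import psycopg2

    # Connect to the current primary; host and credentials are placeholders.
    conn = psycopg2.connect(host="10.0.0.5", dbname="postgres",
                            user="admin", password="...")
    with conn, conn.cursor() as cur:
        # pg_stat_replication lists connected standbys; with synchronous
        # replication configured, the Compute Engine replica should
        # report sync_state = 'sync' before you promote it.
        cur.execute("SELECT application_name, state, sync_state"
                    " FROM pg_stat_replication")
        for name, state, sync_state in cur.fetchall():
            print(name, state, sync_state)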

Can AWS improve the performance of long-running queries?

We are a data warehouse team, so we deal with millions of records in and out on a daily basis. We have jobs running every day that load SQL Server Flex clones from an Oracle DB through ETL loads. Because we are dealing with huge amounts of data and complex queries, queries run pretty long, sometimes for hours. So we are looking toward using AWS. We want to set up our own licensed Microsoft SQL Server on EC2. But I was wondering how this will improve the performance of a long-running query. What would be the main reason that the same query takes longer on our own servers and executes faster on AWS? Or did I misunderstand the concept? (Just letting you know, I am in a learning phase.)
PS: We are still in an R&D phase. Any thoughts or opinions regarding AWS for long-running queries would be greatly appreciated.
You need to provide more details in your question:
What is your query?
How big are the tables?
What is the bottleneck? CPU? IO? RAM?
AWS is just infrastructure.
It does make your life easier, because you can scale your machine up or down with a few clicks; a sketch of that follows below.
Well, I guess you can crank up your machine to however big you want, but even so, nothing will fix a bad query or a bad architecture.
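For example, a minimal boto3 sketch of that scale-up (instance ID, region, and target type are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
    instance_id = "i-0123456789abcdef0"                 # placeholder ID

    # EBS-backed instances must be stopped before changing the type.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "r5.2xlarge"},  # e.g. a memory-optimized type
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])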
Keep in mind that EC2 comes with two types of disk: EBS and ephemeral.
EBS is network-attached storage, like a SAN; ephemeral storage is attached to the EC2 instance itself.
Ephemeral will of course be much faster, but the downside is that when you shut your EC2 instance down and start it up again, all of the data on that drive is wiped clean.
As for licensing (Windows and SQL Server), it is baked into the pre-built EC2 AMIs (Amazon Machine Images).
I've never used my own license in EC2.
With the same DB and the same hardware configuration, a query will perform similarly on AWS and on premises. You need to check whether you have configured your DB, indexes, etc. optimally; the sketch below shows one way to surface index suggestions. Also, think about replicating the data to another database that is optimized for querying huge amounts of data.
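As one hedged example of that check, SQL Server keeps index suggestions in its missing-index DMVs; here's a small Python/pyodbc sketch to surface them (the connection string is a placeholder):

    import pyodbc

    # Connection string is a placeholder; point it at your SQL Server.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
        "DATABASE=mydw;UID=etl_user;PWD=..."
    )

    # SQL Server records suggested indexes in the missing-index DMVs;
    # high avg_user_impact on a frequently run query is a strong hint.
    rows = conn.execute("""
        SELECT TOP 10 d.statement, d.equality_columns, d.inequality_columns,
               s.user_seeks, s.avg_user_impact
        FROM sys.dm_db_missing_index_details d
        JOIN sys.dm_db_missing_index_groups g
             ON g.index_handle = d.index_handle
        JOIN sys.dm_db_missing_index_group_stats s
             ON s.group_handle = g.index_group_handle
        ORDER BY s.avg_user_impact * s.user_seeks DESC
    """).fetchall()
    for r in rows:
        print(r)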

How to convert a WAMP-stack app running on a VPS to a scalable AWS app?

I have a web app running on PHP, MySQL, and Apache on a virtual Windows server. I want to redesign it so it is scalable (for fun, so I can learn new things) on AWS.
I can see how to setup an EC2 and dump it all in there but I want to make it scalable and take advantage of all the cool features on AWS.
I've tried googling but just can't find a simple guide (note: I have no command-line experience with Linux).
Can anyone direct me to detailed resources that can lead me through the steps and teach me? Or alternatively, summarise the steps in an answer so I can research based on what you say.
Thanks
AWS is growing and changing all the time, so there aren't a lot of books to help. Amazon offers training that's excellent. I took their three-day class on Architecting with AWS, and it seems to be just what you're looking for.
Of course, not everyone can afford to spend the travel time and money to attend a class. The AWS re:Invent conference in November 2012 had a lot of sessions related to what you want, and most (maybe all) of the sessions have videos available online for free. Building Web Scale Applications With AWS is probably relevant (slides and video available), as is Dissecting an Internet-Scale Application (slides and video available).
A great way to understand these options better is by fiddling with your existing application on AWS. It will be easy to just move it to an EC2 instance in AWS, then start taking more advantage of what's available. The first thing I'd do is get rid of the MySQL server on your own machine and use one offered by RDS. Once that's stable, create one or more read replicas in RDS, and change your application to read from them for most operations, reading from the main (writable) database only when you need completely current results.
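A minimal boto3 sketch of creating such a read replica (both identifiers are placeholders):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")  # placeholder region

    # Create a read replica of an existing RDS MySQL instance.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="myapp-replica-1",      # placeholder name
        SourceDBInstanceIdentifier="myapp-primary",  # placeholder source
    )
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier="myapp-replica-1"
    )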
Does your application keep any data on the web server, other than in the database? If so, get rid of all local storage by moving that data off the EC2 instance. Some of it might go to the database; some (like big files) might be suitable for S3. DynamoDB is a good place for things like session data.
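A rough sketch of both moves with boto3 (the bucket, table, paths, and item layout are all illustrative, not prescriptive):

    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")

    # Move a big file off local disk into S3 (bucket/key are placeholders).
    s3.upload_file("/var/www/uploads/report.pdf", "myapp-assets",
                   "uploads/report.pdf")

    # Store session state in DynamoDB instead of on the web server; the
    # table name and item schema here are made up for illustration.
    sessions = dynamodb.Table("myapp-sessions")
    sessions.put_item(Item={"session_id": "abc123", "user_id": "42",
                            "cart": ["sku-1", "sku-2"]})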
All of the above reduces the load on the web server to just your application code, which helps with scalability. And now that you keep no state on the web server, you can use ELB and Auto-scaling to automatically run multiple web servers (and even automatically launch more as needed) to handle greater load.
Does the application have any long-running, intensive operations that you currently perform on demand from a web request? Consider not performing the operation when asked, but instead queueing the request with SQS and just telling the user you'll get to it. Then have long-running processes (or cron jobs or scheduled tasks) check the queue regularly, run the requested operation, and email the result back to the user (using SES). To really scale up, you can move those jobs off your web server to dedicated machines, and again use auto-scaling if needed.
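Here's a hedged sketch of that pattern with boto3; the queue name, message format, and email addresses are all made up for illustration:

    import json
    import boto3

    sqs = boto3.client("sqs")
    ses = boto3.client("ses")
    queue_url = sqs.get_queue_url(QueueName="report-jobs")["QueueUrl"]

    # Web tier: enqueue the request instead of running it inline.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"report_id": 17,
                                "email": "user@example.com"}),
    )

    # Worker (cron job or long-running process): poll, work, email result.
    while True:
        msgs = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for m in msgs.get("Messages", []):
            job = json.loads(m["Body"])
            # ... run the expensive operation here ...
            ses.send_email(
                Source="noreply@example.com",  # must be verified in SES
                Destination={"ToAddresses": [job["email"]]},
                Message={"Subject": {"Data": "Your report is ready"},
                         "Body": {"Text": {"Data": "All done!"}}},
            )
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=m["ReceiptHandle"])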
Do you need bigger machines, or could you perhaps live with smaller ones? CloudWatch metrics can show you how much IO, memory, and CPU are used over time. You can use provisioned IOPS with EC2 or RDS instances to improve performance (at a cost) as needed, and use different-sized instances for more memory or CPU.
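For instance, a small boto3 sketch that pulls a day of CPU metrics for one instance (the instance ID is a placeholder):

    from datetime import datetime, timedelta
    import boto3

    cw = boto3.client("cloudwatch")

    # Average CPU for one instance over the last day, in 5-minute buckets.
    stats = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId",
                     "Value": "i-0123456789abcdef0"}],  # placeholder
        StartTime=datetime.utcnow() - timedelta(days=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], round(point["Average"], 1))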
All this AWS setup and configuration can be done with the AWS web console, command-line tools, or SDKs available in many languages (Python's boto library is great). After learning the basics, look into CloudFormation to automate it better (I've written a couple of posts about that so far).
That's a bit of the 10,000-foot view of one approach. You'll need to discover the details of each AWS service when you try to use them. AWS has good documentation about all of them.
Depending on how you look at it, this is more of a comment than it is an answer, but it was too long to write as a comment.
What you're asking really can't be answered on SO; it's a huge, complex question. You're basically asking, "How do I design a highly scalable, durable application that can be deployed on a cloud-based platform?" The answer depends largely on:
The specifics of your application--what does it do and how does it work?
Your tolerance for downtime balanced against your budget
Your present development and deployment workflow
The resources/skill sets you have on-staff to support the application
What your launch time frame looks like.
I run a software consulting company that specializes in consulting on Amazon Web Services architecture. About 80% of our business is investigating and answering these questions for our clients. It's a multi-week long project each time.
However, to get you pointed in the right direction, I'd recommend that you look at Elastic Beanstalk. It's a PaaS-like service that abstracts away the underlying AWS resources, making AWS easier to use for developers who don't have a lot of sysadmin experience. Think of it as "training wheels" for designing an autoscaling application on AWS.
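If you want to poke at it programmatically, here's a hedged boto3 sketch (the application and environment names are placeholders, and the solution stack string must match one your account actually offers):

    import boto3

    eb = boto3.client("elasticbeanstalk")

    # See which platform strings are currently available.
    print(eb.list_available_solution_stacks()["SolutionStacks"][:5])

    eb.create_application(ApplicationName="myapp")  # placeholder name
    eb.create_environment(
        ApplicationName="myapp",
        EnvironmentName="myapp-prod",
        # Placeholder: pick a real stack from the list printed above.
        SolutionStackName="64bit Amazon Linux 2 v3.5.0 running PHP 8.1",
    )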

Migrate hosted LAMP site to AWS

Is there an easy way to migrate a hosted LAMP site to Amazon Web Services? I have hobby sites and sites for family members where we're spending far too much per month compared to what we would be paying on AWS.
Typical el cheapo example of what I'd like to move over to AWS:
GoDaddy domain
site hosted at 1&1 or MochaHost
a handful of PHP files within a certain directory structure
a small MySQL database
.htaccess file for URL rewriting and the like
The tutorials I've found online necessitate PuTTY, Linux commands, etc. While these aren't the most cumbersome hurdles imaginable, it seems overly complicated. What's the easiest way to do this?
The ideal solution would be something like what you do to set up a web host: point GoDaddy to it, upload files, import database, done. (Bonus points for phpMyAdmin being already installed but certainly not necessary.)
It would seem the AWS Marketplace now has a solution for your problem:
https://aws.amazon.com/marketplace/pp/B0078UIFF2/
Or from their own site
http://www.turnkeylinux.org/lampstack
A full LAMP stack, including phpMyAdmin, with no setup required.
As for your site and database migration itself (which should require no more than copying files and a database backup/restore), the only way to make it less cumbersome is to have someone else do it for you...
Dinah,
Running a web development company, I've experienced an unreal number of hosting companies. I've also been very closely involved in investigating cloud hosting solutions for sites on the LAMP and Windows stacks.
You've quoted GoDaddy, 1&1, and MochaHost for micro-sized Linux sites, so I'm guessing you're using a benchmark of $2 - $4 per month, per site. It sounds like you have a "few" sites (5-ish?) and need at least one database.
I've yet to see any tool that will move anything but the most basic (i.e. file-only, no DB) websites into cloud hosting. As most people are suggesting, there isn't much you can do to avoid the initial environment setup. (You should factor in your time, too. If you spend 10 hours doing this, you could have billed clients 10 x your hourly rate and simply bought the hosting for your friends and family.)
When you look at AWS (or anyone) remember these things:
Compute cycles are only where it starts. When you buy hosting from traditional ISPs, they are selling you cycles, disk space AND database hosting. Their default allowances for cycles, database size, and traffic are also typically much higher before you are stopped or charged for overage.
Factor in the cost of your one database, and consider how likely it is that you will need more. Database hosting charges can increase cloud costs very quickly.
While you are likely going to need few CCs (compute cycles) for your basic sites, the free-tier maximums are still pretty low. Anticipate breaking past the free tier and being charged monthly.
Disk space is also billed. Factor in your costs for CCs, DB, and HDD using their pricing estimator: http://calculator.s3.amazonaws.com/calc5.html
If your friends and family want access to the system, they won't get it unless you use a hosting company that allows "white labeling" and provides a way to split your main account into smaller mini hosting accounts. Those can even be set up with self-admin and direct-billing options if you go with a host like www.rackspace.com. The problem is that you don't sound like you want to bill anyone, and their minimum account is likely way too big for your needs.
Remember that GoDaddy (and others) frequently give away a year of hosting with even simple domain registrations. Before I got my own servers I used to take HUGE advantage of these; I've probably been given 40+ free hosting accounts in my lifetime as a client. (I still register a ton of domains through them. I also resell their hosting.)
If you aren't already, consider using a CMS that supports portaling (one instance serving many websites under different domains). While I personally prefer DotNetNuke, I'm sure one of its LAMP-stack competitors can do the same for you. This will keep you on only one database and simplify your needs further.
I hope this helps you make a well-educated choice. I think it'll be a fine line between benefits and costs. Only knowing the exact size of every site, every database, and the typical traffic would allow this to be determined in advance. Database count and traffic will be your main "enemies". Optimize files to reduce both your disk-space needs AND your traffic levels in terms of data transferred.
Best of luck.
Actually, it depends on your server architecture: you can migrate your whole LAMP stack to Amazon EC2, or use different Amazon Web Services for different server components, like Amazon S3 for storage and Amazon RDS for the MySQL database, and so on.
In case you are going with LAMP on EC2: this tutorial will at least give you a head start.
Either way, you still have to go through the essential steps of setting up the AMI and installing LAMP over SSH.

Django redundancy and replication over two VPS accounts

I'm slowly getting to the position where one of my Django sites needs some robustness behind it. It's currently running on a single VPS with a SQLite database and memcached. It's about as un-scaled as things can get.
If I bought another VPS account, what would I want to do?
Move to MySQL/PostgreSQL with replication? What's easiest? Does replication protect me from one server exploding? Are there concurrency downsides?
How do I load-balance between the two servers?
I'd put memcached on the new server too. If I put both IPs into the configuration, would that keep a copy of the data on both servers? (I'm thinking of what happens to session data, which is currently stored in memcached.)
I'm currently using Cherokee as the httpd - I'm sure this has its own set of issues. If you've any tips, let me know.
Am I going at this the wrong way? Is there an easier way to have faster, more robust django sites?
First step: switch from SQLite to a real production database (I like Postgres). This should happen long before you even think about a second VPS. SQLite essentially does not support concurrency at all. Personally, I wouldn't even consider deploying a live site on SQLite in the first place.
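As a sketch, the switch is mostly a settings change plus a data dump/load; the host and credentials below are placeholders:

    # settings.py -- swap the default SQLite backend for PostgreSQL.
    # Requires the psycopg2 driver to be installed.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "mysite",
            "USER": "mysite",
            "PASSWORD": "...",    # placeholder
            "HOST": "127.0.0.1",  # later: the DB server's private IP
            "PORT": "5432",
        }
    }

You can move the existing data with manage.py dumpdata against the old database and manage.py loaddata against the new one.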
If your site is running on SQLite and is functioning, my guess is you are still quite a long ways from actually outgrowing your single VPS (unless it's already heavily loaded otherwise).
If/when you do need to add a second server, how you configure things depends on where you're actually seeing a bottleneck. Chances are it'll be the database, in which case a good step might be simply moving the database onto its own server (presuming you can guarantee low latency between the two VPSes) and loading the database server with as much RAM as you can afford. In general disk performance suffers most in a VPS, so another step to consider might be putting the DB onto raw metal.
I'd probably look at those steps before I'd think about DB replication or multiple web-tier servers, but it really depends on profiling your actual case (and how you value performance vs reliability).
Watching the Django Deployment Workshop by Jacob Kaplan-Moss should give you a good overview.
MySQL supports master-slave and master-master setups; I don't use PostgreSQL.
You can use nginx as your load balancer; HAProxy is an option too (SO uses it).
Memcached distributes objects across the servers; if one crashes, its share of the data is lost. A sketch of the two-server cache configuration follows.
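To make the earlier question concrete: listing both IPs in Django's cache settings gives you one hash ring, not two copies. A sketch (the IPs are placeholders, and the backend class name varies by Django version; older releases use MemcachedCache):

    # settings.py -- one memcached pool spanning both VPSes.
    # Keys are hashed across the LOCATION entries, so each key lives on
    # exactly one server; a crashed server loses only its share.
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
            "LOCATION": ["10.0.0.1:11211", "10.0.0.2:11211"],
        }
    }
    SESSION_ENGINE = "django.contrib.sessions.backends.cache"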
I don't know Cherokee, but nginx is great.