Riak CS: Stanchion failover

From the Riak CS docs, I understand that only one Stanchion server should be used for a cluster. In my cluster I'm spinning up identical machines that all have the same setup, so that theoretically any machine could take over if needed.
Ideally I would like to have Stanchion also running on all servers, and if the main Stanchion server fails, I want another server to take over that role. Can this work, or do I need to isolate the Stanchion server on a separate machine (with a failover machine)? Or can I configure a list of IPs for the Stanchion server, so that if the main one becomes unavailable, the next one is automatically tried?

You don't have to isolate Stanchion on a separate box. The latest version of Riak CS has a riak-cs-stanchion command to switch the Stanchion host and port to new values manually. Make sure the former Stanchion is down before running this command on all CS nodes. (Apologies for the very late answer.)
http://docs.basho.com/riakcs/latest/cookbooks/command-line-tools/#riak-cs-stanchion
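If memory serves, the switch looks something like the following on each Riak CS node (the hostname is a placeholder, 8085 is just the default Stanchion port, and the exact syntax may differ by version, so check the linked page):

riak-cs-stanchion switch stanchion2.example.com 8085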

Spawn EC2 instance via Python

Can someone help me to understand the basics of spawning EC2 instances and deploying AMIs and how to configure them properly?
Current situation:
In my company we have 1 server and a few clients which run calculations and return the results when they are done. The system is written in Python, but sometimes we run out of machine power, so I am considering supporting the clients with additional EC2 instances on demand. The clients connect to the server via an internal IP which is set in a config file.
Question I:
Am I right in assuming that I just create an AMI where our Python client sits in autostart, and once it's started it connects to the public IP and picks up new tasks? Is that the entire magic, or am I missing some really great features in this concept?
Question II:
While spawning a new instance, can I start it with updated configuration or meta information, or do I have to update my AMI every time I make a small change?
If you want to stick with just plain spawning of EC2 instances, here are the answers to your questions:
Question I - This is one of the valid approaches, and yes, if your Python client is configured properly, it will 'just work'.
Question II - Yes, you can achieve that with user data, which is explained very well here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html. There's also another option: store your configuration somewhere else and fetch it when the instance starts.
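For illustration, here's a minimal boto3 sketch of both halves (the AMI ID, instance type, and the SERVER_IP value are placeholders I've made up): launching an instance with user data, and the couple of lines the Python client could run at startup to read that user data back from the instance metadata service.

import boto3
import urllib.request

# 1) Launch an instance, passing the server address as user data (all IDs/values are placeholders).
ec2 = boto3.resource("ec2")
ec2.create_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData="SERVER_IP=203.0.113.10",
)

# 2) On the instance itself, the client reads the user data back at startup
#    from the instance metadata service and connects to that address.
raw = urllib.request.urlopen(
    "http://169.254.169.254/latest/user-data", timeout=2
).read().decode()
server_ip = raw.split("=", 1)[1].strip()

Changing the user data between launches means small configuration tweaks don't require rebaking the AMI.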

How to get IP address Jetty is running on

I'm developing a Spring MVC cluster website and emulating a cluster on one developer machine, running several instances of Jetty 9.2.2 on different local addresses:
127.0.0.10
127.0.0.11
127.0.0.12
and so on. To use the CometD clustering solution, I need to know at runtime the IP address of the Jetty server that is serving this particular instance, i.e. whether it is 127.0.0.10 or 127.0.0.12. I set this parameter in start.ini:
jetty.host=127.0.0.N
where N is different for each of the 5 instances.
So, how do I know it at runtime?
The CometD Oort cluster supports three modes of discovering other nodes: automatic, static and manual.
The automatic way is based on multicast, so if you have multicast working on the hosts the problem should be solved.
With the static way, you just need one "well known" server to be up and running, and point all other nodes to that "well known" server.
With the manual way, you can use other discovery mechanisms (for example, look up jetty.host in the System properties) and initialize the Oort instances with the discovered values.
It is all explained in the documentation.

Using Amazon AWS as a development server.

I'm still cheap.
I have a software development environment which is a bog-standard Ubuntu 11.04 plus a pile of updates from Canonical. I would like to set it up such that I can use an Amazon EC2 instance for the 2 hours per week when I need to do full system testing on a server "in the wild".
Is there a way to set up an Amazon EC2 server image (Ubuntu 11.04) so that whenever I fire it up, it starts, automatically downloads code updates (or conversely accepts git push updates), and then has me ready to fire up an instance of the application server? Is it also possible to tie that server to a URL (e.g. ec2.1.mydomain.com) so that I can hit my web app with a browser?
Furthermore, is there a way that I can run a command line utility to fire up my instance when I'm ready to test, and then to shut it down when I'm done? Using this model, I would be able to allocate one or more development servers to each developer and only pay for them when they are being used.
Yes, yes, and more yes. Here are some good things to google/hunt down on SO and SF:
--EC2 command line tools,
--making your own AMIs from running instances (to save tedious and time-consuming startup gumf),
--Route53 APIs for doing DNS magic (see the sketch after this list),
--Ubuntu cloud-init for startup scripts,
--32-bit micro instances are your friend for dev work, as they fall in the free usage bracket
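To make the command-line and Route53 items a bit more concrete, here is a rough boto3 sketch (the instance ID and hosted zone ID are placeholders; the classic ec2-api-tools do the same thing from a shell): start the instance on demand, point a DNS name at its new public IP, and stop it when you're done.

import boto3

INSTANCE_ID = "i-0123456789abcdef0"   # placeholder: your dev server's instance ID
HOSTED_ZONE_ID = "Z0000000000000"     # placeholder: your Route53 hosted zone ID
HOSTNAME = "ec2.1.mydomain.com."

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")

def start_dev_server():
    # Boot the stopped instance and wait until it is running.
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
    # The public IP changes across stop/start cycles, so look up the new one...
    reservations = ec2.describe_instances(InstanceIds=[INSTANCE_ID])["Reservations"]
    ip = reservations[0]["Instances"][0]["PublicIpAddress"]
    # ...and point the dev hostname at it.
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": HOSTNAME,
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )
    return ip

def stop_dev_server():
    # Stop (not terminate) so the root volume and your code survive for next time.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])

Pulling the latest code on boot can then be handled by a cloud-init/user-data script that does a git pull before starting the application server.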
All of what James said is good. If you're looking for something requiring less technical know-how and research, I'd also consider:
juju (sudo apt-get install -y juju). This lets you start up a series of instances. Basic tutorial is here: https://juju.ubuntu.com/docs/user-tutorial.html

Is there a way for the cache to stay up without a timeout after a crash in AppFabric Cache?

First my setup that is used for testing purpose:
3 Virtual Machines running with the following configuration:
MS Windows 2008 Server Standard Edition
Latest version of AppFabric Cache
Each one has a local network share where the config file is stored (I have added all the machines in each config)
The cache is distributed but not high availability (we don't have the Enterprise version of Windows)
Each host is configured as lead, so according to the documentation at least one host should be allowed to crash.
Each machine has the website I'm testing installed, and local cache configured
One Linux machine is used as a proxy (running Varnish) to distribute the traffic for testing purposes.
That's the setup, and now on to the problem. The scenario I am testing simulates one of the servers crashing and then bringing it back into the cluster. I have problems both with the server crashing and with bringing it back up. Steps I am using to test it:
Direct the traffic with Varnish on the linux machine to one server only.
Log in to make sure there is something in the cache.
Unplug the network cable for one of the other servers (simulates that server crashing)
Now I get a cache timeout and a service error. I want the application to stay up on the servers that didn't crash, but it takes some time for the cache to come back up on the remaining servers. Is that how it should be? Plugging the network cable back in and starting the host causes a similar problem.
So my question is whether I have missed something. What I would like to see happen is that if one server crashes, the cache should still remain up since a majority of the leads are still up, and starting the crashed server again should bring it back gracefully into the cluster without causing any problems on the other hosts. But that might not be how it works?
I ran through a similar test scenario a few months ago where I had a test client generating load on a 3 lead-server cluster with a variety of Puts, Gets, and Removes. I rebooted one of the servers multiple times while the load test was running, and the cache stayed online. If I remember correctly, there were a limited number of errors as that server rebooted, but overall the cache appeared to remain healthy.
I'm not sure why you're not seeing similar results, but I would try removing the Varnish proxy from your test and see if that helps.

Move to 2 Django physical servers (front and backend) from a single production server?

I currently have a growing Django production server that has all of the front end and backend services running on it. I could keep growing that server larger and larger, but instead I want to try and leave that main server as my backend server and create multiple front end servers that would run apache/nginx and remotely connect to the main production backend server.
I'm using Slicehost now, so I don't think I can benefit from having the multiple servers run on an intranet. How do I do this?
The first step in scaling your server is usually to separate the database server. I'm assuming this is all you meant by "backend services", since you haven't given any more details.
All this needs is a change to your settings file. Change DATABASE_HOST from localhost to the new IP of your database server.
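On Django 1.2+ the same change lives in the DATABASES dict; a minimal sketch (engine, names, password, and the IP below are placeholders):

# settings.py on the web/front-end server (placeholder values)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": "change-me",
        "HOST": "10.0.0.5",  # IP of the separate database server instead of localhost
        "PORT": "5432",
    }
}

The database server also has to accept remote connections (for PostgreSQL, that means listen_addresses and pg_hba.conf).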
If your site is heavy on static content, creating a separate media server could help. You may even look into a CDN.
The first step usually is to separate the server running the actual Python code from the database server. Any background jobs that do processing would probably run on the database server. I assume that when you say front-end server, you actually mean a server running Python code.
Now, as every request will have to do a number of database queries, latency between the web server and the database server is very important. I don't know if Slicehost has some feature that allows you to create two virtual machines that are "close" in terms of network latency (a quick Google search did not find anything). They seem like nice guys, so maybe you could ask them if they have such a service or could make an exception.
Anyway, when you do have two machines on Slicehost, you can check the latency between them by simply pinging from one to the other. When you have the result, you will probably know whether this is feasible at all.
Further steps depend on your application. If it is media-heavy, then maybe using a separate media server would make sense. Otherwise the normal step is to add more web servers.
--
As a side note, I personally think it makes more sense to invest in real dedicated servers with dedicated network equipment for this kind of setup. This of course depends on what budget you are on.
I would also suggest looking into Amazon EC2 where you can provision servers that are magically close to each other.