AWS Docker Interpreter with PyCharm

I'm having difficulty setting this up correctly, and I'm burning through AWS server time while I try to make it work. I have segmentation code that is heavily memory-intensive, and I'd like to temporarily spin up an AWS server with 192 GB of RAM. I understand this is possible using Docker, but the PyCharm instructions are non-existent with respect to the Docker setup needed to tie it together (they reference existing code rather than showing how to assemble it from scratch). What would the docker run command on the server look like to enable a connection on port 2375?
EDIT: I am using PyCharm Professional.

UPD: Checking the PyCharm options I found that there is an option to use Docker Machine. This seems to be exactly what you need. With Docker Machine you can have Docker spin up an EC2 instance for you with proper security out of the box. Read the official Docker Machine documentation on how to get started, and the AWS driver options to learn how to set the EC2 instance type, AMI, and other settings.
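For reference, a minimal sketch of the Docker Machine route from your workstation (the region, machine name, and the 192 GiB r5.6xlarge instance type are my assumptions, not something PyCharm requires):
docker-machine create --driver amazonec2 --amazonec2-region us-east-1 --amazonec2-instance-type r5.6xlarge segmentation-box
eval "$(docker-machine env segmentation-box)"
docker info
The eval line points your local Docker client (and therefore PyCharm's Docker integration) at the remote daemon; docker info should then report the EC2 host.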
Original post:
To enable this feature you have to run the Docker daemon with the '-H' option:
sudo dockerd -H tcp://0.0.0.0:2375
You may read more on that in the Docker docs: https://docs.docker.com/engine/reference/commandline/dockerd/ .
Beware though: for EC2 you may also need to open that port in the instance's security group https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html .
I also want to add that what you want to achieve isn't good from a security perspective. Exposing the Docker socket like that is an invitation for bad guys to throw a party on your EC2 instance. But since you mentioned that this is temporary...
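If you do open port 2375 temporarily, at least restrict the inbound rule to your own IP rather than 0.0.0.0/0. A hedged AWS CLI sketch (the security group ID and CIDR below are placeholders):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 2375 --cidr 203.0.113.10/32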

Related

Deploy a C++ application to the Google Cloud Platform Kubernetes engine

As far as my understanding goes, the Kubernetes engine is meant for deploying applications that can be load balanced, for example, having an application which unhashes a string. If pod-a is on high load, it would be offloaded to pod-b. Correct me if I am wrong here, since if this is false, my following question will not make sense.
After exploring it for a few hours I can't seem to figure out how to deploy a C++ application to the Kubernetes cluster. How would I do so?
What I tried:
I tried to follow the guide: Interactive Tutorial - Deploying an App, however, I couldn't understand how I would get my C++ app as an image that could be deployed.
What the C++ application is:
At the moment it proxies TCP traffic to another HOST designated by clients' HOSTNAME. It is pretty much a reverse proxy, however, this is NOT an HTTP application.
Is Kubernetes the right choice?
Kubernetes is really useful for load-balancing workloads, providing high availability in case of failure, speeding up test processes, increasing safety during production rollouts through different deployment strategies, and increasing security through segregation.
However, not every kind of workload can take advantage of all the features Kubernetes introduces.
For example, if your application needs a stable amount of RAM and CPU, the code itself is very stable, and you need only one replica, then Kubernetes and containers may not be the best choice (even though you could perfectly well use them), and you might be better off implementing everything on one big monolithic server/virtual machine.
But if you need to deploy it across different cloud providers, or it should run for only a few hours every day, then it may well benefit from those features. If you are willing to add a layer, make sure you actually need the features it introduces; otherwise it is merely overhead.
Note that Kubernetes is not capable of splitting your workload on its own. So regarding "If pod-a is on high load, it would be offloaded to pod-b": yes, something like that is possible, but you have to instruct Kubernetes to do it.
Kubernetes takes care of running your pods, making sure they are scheduled on nodes with enough memory and CPU available according to your specification; you can also set up autoscaling to handle high-workload periods, or even scale the cluster itself. Your application should be designed to support a divide-and-conquer pattern, otherwise you will likely end up with three nodes, one pod running on one node, two nodes idle, and an overhead you could have avoided.
If your C++ application pod unhashes strings and a single request could consume all the resources of a node, Kubernetes will not "split" that workload for you and will not create more pods and schedule them across the cluster on its own! Of course you can achieve something similar, but it will not come for free, and you will likely need to modify your C++ code.
You can certainly take advantage of Kubernetes, and running your application on it is pretty easy, but you may have to modify the architecture somewhat to take full advantage of those features.
Deploy the C++ application
The process to deploy your application in Kubernetes is pretty standard. Develop it locally, create a Docker image with all the libraries and components you need, test it locally, push it to the registry, and create the deployment in Kubernetes.
Let's say that you have all the resources needed to run your application, plus your executable file, in a local folder. Create the Dockerfile.
The following is only an example to show the syntax; adapt it to your own application:
# Download base image, Ubuntu 16.04 (Xenial Xerus)
FROM ubuntu:16.04
# Update software repository
RUN apt-get update
# Install nginx, php-fpm and supervisord from the Ubuntu repository
RUN apt-get install -y nginx php7.0-fpm supervisor
# Define the environment variable
ENV nginx_vhost /etc/nginx/sites-available/default
[...]
# Enable php-fpm on the nginx virtualhost configuration
COPY default ${nginx_vhost}
[...]
RUN chown -R www-data:www-data /var/www/html
# Volume configuration
VOLUME ["/etc/nginx/sites-enabled", "/etc/nginx/certs", "/etc/nginx/conf.d", "/var/log/nginx", "/var/www/html"]
# Configure services and port
COPY start.sh /start.sh
CMD ["./start.sh"]
EXPOSE 80 443
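Since your application is C++ rather than PHP/nginx, a more relevant sketch might be a multi-stage Dockerfile that compiles the proxy and ships only the binary; the source layout, binary name, and listening port here are assumptions:
# Build stage: compile the proxy from source
FROM gcc:8 AS build
COPY . /src
WORKDIR /src
RUN g++ -O2 -o tcp-proxy main.cpp
# Runtime stage: ship only the compiled binary
FROM ubuntu:16.04
COPY --from=build /src/tcp-proxy /usr/local/bin/tcp-proxy
EXPOSE 9000
CMD ["tcp-proxy"]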
Build, push, and deploy it by running:
export PROJECT_ID="$(gcloud config get-value project -q)"
docker build -t gcr.io/${PROJECT_ID}/hello-app:v1 .
gcloud docker -- push gcr.io/${PROJECT_ID}/hello-app:v1
kubectl run hello --image=gcr.io/${PROJECT_ID}/hello-app:v1 --port [port number if needed]
More information is here.
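If you prefer a declarative setup to kubectl run, a minimal Deployment plus Service manifest could look roughly like this (names, the image tag, and the TCP port are assumptions; since this is raw TCP rather than HTTP, a LoadBalancer Service exposes it directly):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcp-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tcp-proxy
  template:
    metadata:
      labels:
        app: tcp-proxy
    spec:
      containers:
      - name: tcp-proxy
        image: gcr.io/YOUR_PROJECT/hello-app:v1
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: tcp-proxy
spec:
  type: LoadBalancer
  selector:
    app: tcp-proxy
  ports:
  - port: 9000
    targetPort: 9000
Apply it with kubectl apply -f deployment.yaml.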

How to configure Elastic IP with django app in aws?

I am building an app using Django on an EC2 Ubuntu instance, and I have associated an Elastic IP with the instance.
I have done the following steps:
1. First created an Ubuntu instance in the EC2 free tier.
2. Installed Python.
3. Installed pip.
4. Installed Django.
5. Created a Django project using django-admin startproject.
6. Ran the server using this command: python manage.py runserver 0.0.0.0:80
7. Created an Elastic IP and associated it with the instance.
8. Configured security inbound settings to allow HTTP on port 80 from 0.0.0.0/0.
9. Able to reach my project from any browser.
But the problem is that when I close the PuTTY session in which I ran the runserver command, the Django project also stops. I did not stop it manually.
Please help me keep it running after closing the PuTTY session as well.
Thanks,
Kripa Sharma
Take a look at this Answer
I highly recommend that you start using Elastic Beanstalk (Python instance) to take care of all these steps for you. Very simple to setup, and no need to worry about any of the steps you listed.
You can use these instructions to see how you can deploy a Django app in less than 5 minutes.
The problem
You are trying to persist the debug server for a remotely deployed application.
You probably need to review the runserver command documentation. Here are the relevant parts:
django-admin runserver [addrport]
Starts a lightweight development Web server on the local machine. By default, the server runs on port 8000 on the IP address 127.0.0.1. You can pass in an IP address and port number explicitly.
...
DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that’s how it’s gonna stay. We’re in the business of making Web frameworks, not Web servers, so improving this server to be able to handle a production environment is outside the scope of Django.)
A webserver
Having skimmed the above docs, you may want to look at the "How to deploy with WSGI" section, which gives a few recommendations for commonly used web servers. My favorite, Gunicorn, includes a usage example:
$ pip install gunicorn
$ gunicorn myproject.wsgi
Having decided on, and installed, a web server, you'd need to "daemonize" it and expose it to the world.
The former is usually done by creating a service on your OS, for ubuntu it would be either upstart or systemd depending on the version. Gunicorn docs have examples for both.
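A rough sketch of the systemd variant (the unit path, user, and project paths are assumptions; adjust to your layout):
# /etc/systemd/system/gunicorn.service
[Unit]
Description=gunicorn daemon for myproject
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/myproject
ExecStart=/usr/local/bin/gunicorn --bind 127.0.0.1:8000 myproject.wsgi

[Install]
WantedBy=multi-user.target
Enable and start it with: sudo systemctl enable --now gunicorn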
The latter is usually achieved with an http-server/proxy such as nginx or apache httpd. And again, Gunicorn has an example for us.
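And a hedged nginx snippet for that proxy step (the server name is a placeholder, and the upstream port matches the gunicorn bind above):
# /etc/nginx/sites-available/myproject
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}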
You can see why I like it so much ☺️
Epilogue
While it is technically possible to run the debug server as a service, or even in a terminal multiplexer such as GNU screen or tmux, it's not a recommended or stable long-term solution.
That said, these tools are very useful to know about, so read up on them and learn to use them; they will be invaluable in your toolset in the future, for example to avoid accidentally terminating a long-running command (such as a migration).

Deploy Docker from Travis to AWS (or any other SSH able server)

My deployment process is missing one important piece: pushing the code up to the server.
I'm banging my head over whether to:
1 - Create/Build the Docker image on Travis and then somehow push it to AWS
OR
2 - Try to ssh (from travis script) into my AWS and run a command set there, including the Docker image build and initialization.
I'm definitely in doubt, and I see problems in both solutions proposed above. What would be the usual mechanism here?
I will answer myself here:
Option 1 is not a good idea; it's definitely wrong. If you build the Docker image on the Travis side and (think you can) transfer it to AWS, you'd be hammering the network and your deployment could take ages.
Option 2 is more or less the way to go (there could be others). In my case I SSH'd from Travis into my server and executed a set of commands remotely, among them docker build and docker run.
It's worth mentioning that whatever command you execute remotely on your server, its output is transferred nicely into your Travis log.
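As a hedged sketch of what the Travis side could look like (the host, user, paths, and image name are placeholders; the SSH key would come from Travis's encrypted files or settings, not from the repo):
# .travis.yml (deployment fragment)
after_success:
  - chmod 600 deploy_key   # key decrypted earlier via Travis encrypted files
  - ssh -i deploy_key -o StrictHostKeyChecking=no ubuntu@my-ec2-host "cd /srv/myapp && git pull && docker build -t myapp . && (docker rm -f myapp || true) && docker run -d --name myapp -p 80:8080 myapp"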

WebHCat on Amazon's EMR?

Is it possible or advisable to run WebHCat on an Amazon Elastic MapReduce cluster?
I'm new to this technology and I was wondering whether it is possible to use WebHCat as a REST interface to run Hive queries. The cluster in question is running Hive.
I wasn't able to get it working but WebHCat is actually installed by default on Amazon's EMR instance.
To get it running you have to do the following,
chmod u+x /home/hadoop/hive/hcatalog/bin/hcat
chmod u+x /home/hadoop/hive/hcatalog/sbin/webhcat_server.sh
export TEMPLETON_HOME=/home/hadoop/.versions/hive-0.11.0/hcatalog/
export HCAT_PREFIX=/home/hadoop/.versions/hive-0.11.0/hcatalog/
/home/hadoop/hive/hcatalog/sbin/webhcat_server.sh start
You can then confirm that it's running on port 50111 using curl,
curl -i http://localhost:50111/templeton/v1/status
To hit 50111 on other machines you have to open the port up in the EC2 EMR security group.
You then have to configure the users you are going to "proxy" as when you run queries through HCatalog. I didn't actually save this configuration, but it is outlined in the WebHCat documentation. I wish they had some concrete examples there, but basically I ended up configuring the local 'hadoop' user as the one that runs the queries; not the most secure thing to do, I'm sure, but I was just trying to get it up and running.
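For reference, the proxy-user settings live in webhcat-site.xml; a hedged sketch of the kind of properties involved for the local 'hadoop' user (the wildcards are an assumption and leave this wide open, which again is not secure):
<property>
  <name>webhcat.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>webhcat.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>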
Attempting a query then gave me this error,
{"error":"Server IPC version 9 cannot communicate with client version 4"}
The workaround was to switch off the latest EMR image (AMI 3.0.4 with Hadoop 2.2.0) and onto a Hadoop 1.0 image (AMI 2.4.2 with Hadoop 1.0.3).
I then hit another issue where it couldn't find the Hive jar properly. After struggling with the configuration some more, I decided I had dumped enough time into trying to get this to work and opted to communicate with Hive directly (using RBHive for Ruby and JDBC for the JVM).
To answer my own question: it is possible to run WebHCat on EMR, but it's not documented at all (Googling led me nowhere, which is why I created this question in the first place; it's currently the first hit when you search "WebHCat EMR"), and the WebHCat documentation leaves a lot to be desired. Getting it to work seems like a pain, though my hope is that by writing up the initial steps, someone will come along, take it the rest of the way, and post a complete answer.
I did not test it, but it should be doable.
EMR allows you to customise bootstrap actions, i.e. the scripts run when the nodes are started. You can use bootstrap actions to install additional software and to change the configuration of applications on the cluster.
See more details at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html.
I would create a shell script to install WebHCat, and test the script on a regular EC2 instance first (outside the context of EMR, just to ensure the script is OK).
You can use EC2's user-data properties to test your script, typically:
#!/bin/bash
curl http://path_to_your_install_script.sh | sh
Then, once you know the script is working, make it available to the cluster in an S3 bucket and follow these instructions to include your script as a custom bootstrap action of your cluster.
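A hedged AWS CLI sketch of attaching such a script (the bucket, script name, AMI version, and instance settings are placeholders):
aws emr create-cluster --name "webhcat-test" --ami-version 2.4.2 --applications Name=Hive --instance-type m1.large --instance-count 3 --bootstrap-actions Path=s3://my-bucket/install-webhcat.sh,Name=InstallWebHCat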
--Seb

Configuring AmazonLinux AMI instances

I am trying to set up an AMI such that, when booted, it will auto-configure itself with a defined "configuration" stored somewhere on a server. I came across Chef and Puppet. Considering Puppet, I was able to run through their examples, but couldn't find one for auto-configuration from a master. I found out that Puppet Enterprise is not supported on "Amazon Linux". The team chose Amazon Linux and would like to keep it rather than moving to another OS just because one tool doesn't support it. Can someone please give me some idea of how I could achieve this? (I am trying to stay away from home-grown shell scripts in favour of a well-adopted industry tool, for maintainability.)
What I have done in the past is to copy /etc/rc.local to /etc/rc.local.orig, and then configure /etc/rc.local to kick off a puppet run and then pave over itself.
/etc/rc.local:
#!/bin/bash
##
#add pre-puppeting stuff here, I add the hostname in "User-data" when creating the VM so I can set the hostname before checking in
##
/usr/bin/puppet agent --test
/bin/cp -f /etc/rc.local.orig /etc/rc.local
/sbin/init 6
AWS CloudFormation is one of Amazon's recommended ways to provision servers (and other cloud resources, too). You declare all the resources you need in a JSON file, and specify how to provision each server by declaring packages to install, services to run, files to create, and commands to run when the server is created. See the user guide for more information. I also wrote a couple of blog posts about getting started with it.
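A hedged sketch of the relevant fragment of a template: an EC2 instance whose metadata declares a package and a service, applied at boot by cfn-init (the AMI ID, key name, and the httpd package are placeholders):
"WebServer": {
  "Type": "AWS::EC2::Instance",
  "Metadata": {
    "AWS::CloudFormation::Init": {
      "config": {
        "packages": { "yum": { "httpd": [] } },
        "services": { "sysvinit": { "httpd": { "enabled": "true", "ensureRunning": "true" } } }
      }
    }
  },
  "Properties": {
    "ImageId": "ami-12345678",
    "InstanceType": "t2.micro",
    "KeyName": "my-key",
    "UserData": { "Fn::Base64": { "Fn::Join": ["", [
      "#!/bin/bash\n",
      "/opt/aws/bin/cfn-init -s ", { "Ref": "AWS::StackName" }, " -r WebServer --region ", { "Ref": "AWS::Region" }, "\n"
    ] ] } }
  }
}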