Reverse SSH tunnel into AWS Batch array jobs - amazon-web-services

What exactly would it take to reverse tunnel into an AWS Batch array job from the local computer submitting the job (e.g. via AWS CLI)? Unlike the typical reverse tunneling scenario, the remote nodes here do not share a local network with the local computer. Motivation: https://github.com/mschubert/clustermq/issues/208. Related: ssh into AWS Batch jobs.
And yes, I am aware that SSH is easier in pure EC2, but Batch is preferable because of its support for arbitrary Docker images, easy job monitoring, and automatic spot pricing.

Use an unmanaged compute environment. Then you can SSH into your EC2 instances as you normally would, since they are under your control. A managed compute environment abstracts the underlying EC2 instances away from you, so you cannot SSH into them. To find out which instance a job is running on, you can use the metadata endpoint.
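As a rough sketch of what that lookup could look like with the AWS CLI (untested; the ECS cluster name, SSH user and port are placeholders, and the instance's security group must allow SSH from your machine):

#!/usr/bin/env bash
# Map a submitted Batch job to the EC2 instance it landed on, then open a
# reverse tunnel from that instance back to the submitting machine.
JOB_ID="$1"        # job ID returned by `aws batch submit-job`
ECS_CLUSTER="$2"   # ECS cluster backing the (unmanaged) compute environment

# The Batch job description includes the ECS container instance it was placed on.
CI_ARN=$(aws batch describe-jobs --jobs "$JOB_ID" \
  --query 'jobs[0].container.containerInstanceArn' --output text)

# Resolve the container instance to its EC2 instance ID, then to a public IP.
INSTANCE_ID=$(aws ecs describe-container-instances --cluster "$ECS_CLUSTER" \
  --container-instances "$CI_ARN" \
  --query 'containerInstances[0].ec2InstanceId' --output text)
PUBLIC_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)

# Reverse tunnel: make local port 6000 reachable on the job's host as port 6000.
ssh -N -R 6000:localhost:6000 "ec2-user@$PUBLIC_IP"

The ECS cluster name can be read from aws batch describe-compute-environments (the ecsClusterArn field).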

Related

GCP service to ssh and run a script on 10 Virtual Machines in GCE without using a bastion VM

In a GCP project, I have 10 virtual machines in GCE (each running sshd).
I need to run a script on each of the 10 virtual machines (in GCE) once an hour. I would like this to be centralized, because the number of VMs will grow over time and I do not want to have to do this on every single VM. In addition, I want to analyze the data I get back in a central place.
However, I do not want to use a bastion VM, because I would like a cloud-native solution that does not require maintaining yet another virtual machine.
Which GCP service can do this?
I have looked into Cloud Run and Cloud Composer. I was not able to do this with Cloud Run, although that may be my own lack of familiarity with the product. Cloud Composer seems like overkill.
As @JohnHanley mentioned, you will need to write code or scripts to launch commands on the VMs dynamically, because GCP doesn't have the type of service you require.
You may want to consider Cloud Identity-Aware Proxy (IAP) as it can be used for building your solution:
IAP helps to protect SSH access to your VMs without needing to provide your VMs with public IP addresses, and without having to set up bastion hosts.
For instance, you can check the enable IAP on Compute Engine guide.
You can also create a feature request for Google to consider implementing this solution.
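For illustration, a minimal sketch of that IAP-based approach (untested; it assumes all VMs are in a single zone and that the script to run already exists at /usr/local/bin/hourly.sh on each VM - the project, zone and path are placeholders):

#!/usr/bin/env bash
# Run a script on every GCE instance over IAP-tunnelled SSH, with no bastion
# host and no public IPs required.
PROJECT="my-project"
ZONE="us-central1-a"

for VM in $(gcloud compute instances list --project "$PROJECT" --format='value(name)'); do
  gcloud compute ssh "$VM" --project "$PROJECT" --zone "$ZONE" \
    --tunnel-through-iap --command='sudo /usr/local/bin/hourly.sh'
done

Something central still has to run this loop once an hour; the IAP tunnel only removes the need for a bastion host and public IPs.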
Your best solution, with no additional charges, would be to:
Use a start-up script on each GCE instance to set up a cron entry that executes your script.
crontab.guru can help you find the cron expression; hourly is 0 * * * *.
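A minimal sketch of such a startup script (the script path and log file are placeholders for your own):

#!/bin/bash
# GCE startup-script that installs an hourly cron entry.
# /usr/local/bin/hourly.sh stands in for your own script.
cat >/etc/cron.d/hourly-task <<'EOF'
0 * * * * root /usr/local/bin/hourly.sh >> /var/log/hourly-task.log 2>&1
EOF
chmod 644 /etc/cron.d/hourly-task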

Can I create a share between an EC2 instance and my local machine?

Just to give you some context... I'm new to the AWS world and all the services it provides.
I have a legacy application and I need to share some binaries with a client, so I was trying to use an EC2 instance (Amazon Linux AMI) with Samba, mapped onto a local Windows machine.
I was able to establish a connection with another EC2 instance (same VPC), just as a tryout. But I wasn't able to do so with my Windows machine or even with a Linux VM I have.
The inbound rules for this proof-of-concept EC2 instance were fully open (all traffic allowed).
Main question
Is it possible to share a file system between an EC2 instance and a local machine over the internet?
Just saying:
S3 storage isn't an option.
And FSx isn't available in my region yet, and for latency reasons it's a no-go.
Please ask as many questions as you want, I'll try to answer them as fast as I can.
Kind regards.
TL;DR - it's possible, but there's no 'simple' solution (in my opinion).
I thought of two possible solutions that you can implement, here we go ...
1: AWS EFS, AWS Direct Connect and Docker
A possible solution would be using AWS Elastic File System (EFS), AWS Direct Connect and a Docker Linux container.
Drawbacks
If this is the first time you've encountered the above AWS services or Docker, it's going to be a bit of a journey to learn about them
EFS pricing - it's not so cheap, and you also need to consider inbound and outbound traffic; it's best to use the calculator on the pricing page
EFS performance - if you only share files then it should be okay, but if you expect high speeds, remember that it's not an EBS volume, so for higher speeds you need to pay more money
AWS Direct Connect pricing - you also need to take that into consideration
Security - I'm not sure how sensitive your data is, but you need to make sure you create a very strict VPC, with Security Groups and Network Access List rules - read about the VPC Security Best Practices
Steps to implement the solution
Follow the Walkthrough: Create and Mount a File System On-Premises with AWS Direct Connect and VPN; also, here are the steps on how to combine it with Docker
(Optional) To make it a bit easier - for Windows to "support" the Linux file-system, you should use Windows Git Bash. If you're not sure how to install 3rd-party apps in Windows Git Bash (like aws-vault), then read this blog post
Create an EFS in AWS, and mount it to your EC2 instance, read more about it here
Use AWS Direct Connect to connect to your VPC from your local Windows machine
Install Docker for Windows on your local machine
Create a Docker Volume, and mount the same EFS to that volume - a good example for this step
Test it - SSH to your EC2 instance, create a file on the EFS volume and then check in your local Docker Linux container that this file appears on the EFS volume
I omitted the security steps because it's up to you how strict you want your solution to be.
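A rough sketch of the EFS and Docker-volume steps above (untested; the file-system ID, mount-target IP and image are placeholders):

# On the EC2 instance (Amazon Linux), mount the EFS file system at /mnt/efs:
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-12345678:/ /mnt/efs

# On the local machine, with Docker for Windows and Direct Connect/VPN in place,
# create an NFS-backed volume that points at an EFS mount-target IP:
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=10.0.1.25,nfsvers=4.1,rw \
  --opt device=:/ \
  efs-share

# Run a Linux container with the shared volume mounted at /data:
docker run -it --rm -v efs-share:/data amazonlinux:2 bash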
2: Using S3 as a shared file-system
You can try out this tool s3fs-fuse, but you'll still need to use a Docker Linux container since you're on Windows. I haven't tested it but it looks promising. You can read this blog post, it's a step-by-step tutorial on how to do it, and also shares some other possible solutions.
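For reference, the basic s3fs-fuse invocation inside the Linux container would look roughly like this (untested; bucket name, credentials and mount point are placeholders):

# Store credentials for s3fs, then mount the bucket as a file system.
echo 'AKIAEXAMPLEKEY:exampleSecretKey' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
mkdir -p /mnt/s3
s3fs my-shared-bucket /mnt/s3 -o passwd_file=~/.passwd-s3fs -o use_cache=/tmp/s3fs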

Accessing files in EC2 from Lambda

I have a few EC2 servers in AWS. Whenever the disk space exceeds a limit, I want to delete some files (maybe the logs folder) on an EC2 instance automatically. I am planning to use Lambda and CloudWatch for this. Can I use Lambda to interact with EC2? If that's not possible, what is the alternative approach to achieve this functionality?
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
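As an illustration (untested; AWS-RunShellScript is the standard document, but the tag, path and retention below are placeholders):

# Trigger a log-cleanup command on every instance carrying a given tag.
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Role,Values=app-server" \
  --parameters '{"commands":["find /var/log/myapp -name \"*.log\" -mtime +7 -delete"]}' \
  --comment "Scheduled log cleanup"

The same call can be issued from a Lambda function via the SDK or triggered on a schedule by a CloudWatch Events rule.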
AWS Lambda can reach your instances if they are accessible over the internet. If they are not, you can still give the Lambda function access by attaching it to your VPC (with a NAT gateway or NAT instance for outbound traffic).
The problem is: access to your instance does not mean access to the instance's filesystem. To delete the files from Lambda you have two alternatives:
Configure a network file-system service on your instances and connect to it from your Lambda function. On Windows you would just "share" your disks, but in that case you would need some SMB library in your Lambda code, which (I think) has no native SMB support. Just keep in mind that your security guy will scream out loud when you propose this alternative.
Create an "agent" on your EC2 instances, keep it running as a Windows service, and call this agent from your Lambda function. In that case, the Lambda function starts the agent, which is responsible for the file deletion.
Another option is to follow Ramesh's suggestion: create a PowerShell script and configure a scheduled job. To make this easier, you can create an image with the PowerShell script baked in and use that image to initialize each instance. The same applies to the "agent" solution among the Lambda alternatives.
I think that, in any case, you will need to change something on your 150 servers. Using a customized image can help simplify this a little, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside an EC2 VM unless you expose them using some other mechanism.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, the answer is no, you cannot access those files.
However, as a solution to your problem, you can use a scheduled job to clean up your files depending on your usage. You can use a service or a cron job.

AWS Lambda run command on EC2 instance and get result

I have an EC2 instance that is running a few processes. I also have a Lambda script that is triggered through various means. I would like this Lambda script to talk to my EC2 instance and get a list of running processes from it (Essentially run ps aux on the EC2 box, and read the output).
Now this is easy enough with just one instance and its instance-id. Just SSH in, run the command, get the output, and be on my way. However, I would like to scale this to multiple EC2 instances, for which only the instance-id is known and SSH keys may not be given.
Is such a configuration possible with Lambda and Boto (or other libraries)? Or do I just have to run a small server on each of my instances that replies with the given information (something I'm really trying to avoid)?
You can do this easily with AWS Systems Manager - Run Command
AWS Systems Manager provides you safe, secure remote management of your instances at scale without logging into your servers, replacing the need for bastion hosts, SSH, or remote PowerShell.
Specifically:
Use the send-command API from the Lambda function to get the list of all processes on a group of instances. You can do this by providing a list of instance IDs or even a tag query
You can also use CloudWatch Events to trigger a Run Command directly
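A minimal sketch of that flow using the CLI (untested; from Lambda you would make the same two calls through the SDK, e.g. boto3's send_command and get_command_invocation; the instance ID is a placeholder):

INSTANCE_ID="i-0123456789abcdef0"

# Ask Run Command to execute `ps aux` on the instance.
COMMAND_ID=$(aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids "$INSTANCE_ID" \
  --parameters '{"commands":["ps aux"]}' \
  --query 'Command.CommandId' --output text)

# Crude wait; in real code poll the invocation status instead.
sleep 5

# Read back the captured stdout of the command.
aws ssm get-command-invocation \
  --command-id "$COMMAND_ID" \
  --instance-id "$INSTANCE_ID" \
  --query 'StandardOutputContent' --output text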
I don't think there is something available out of the box for this scenario.
Instead of querying the instances, try an alternate approach: install an agent on all EC2 instances that reports the required information to a central service or, say, a DynamoDB table with the instance ID as the hash key.
You may want to bake this script into the AMI itself as a cron job (executed perhaps hourly).
With this implementation, you reduce the complexity of managing and running a separate web service on each EC2 instance.
Query the DynamoDB table on demand. There will be a lag, since the data is not real time, but you can always shorten the cron interval to suit your needs.
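A minimal sketch of such an agent (untested; the table name and the fields reported are illustrative, and the metadata call assumes IMDSv1):

#!/bin/bash
# Cron-driven agent: report basic per-instance data to a DynamoDB table keyed
# on the instance ID. The table name "InstanceReports" is a placeholder.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
PROC_COUNT=$(ps aux | wc -l)

aws dynamodb put-item --table-name InstanceReports --item "{
  \"InstanceId\":   {\"S\": \"$INSTANCE_ID\"},
  \"ReportedAt\":   {\"S\": \"$(date -u +%FT%TZ)\"},
  \"ProcessCount\": {\"N\": \"$PROC_COUNT\"}
}"

Querying the table then needs nothing more than a GetItem on the instance ID.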
Like Yeshodhan mentioned, there is no direct approach for this.
However, there is one more approach:
Save your private key file to an S3 bucket, create a Lambda function, and use the Python Fabric module to log in to the remote machines from the Lambda function and execute commands.
The above-mentioned approach is possible, but I highly recommend launching a separate machine and using a configuration management system (preferably Ansible) to collect the results from the remote machines.

How to understand Amazon ECS cluster

I recently tried to deploy Docker containers using an AWS task definition. Along the way, I came across the following questions.
How do I add an instance to a cluster? When creating a new cluster using the Amazon ECS console, how do I add a new EC2 instance to it? In other words, when launching a new EC2 instance, what configuration is needed to allocate it to a user-created cluster under Amazon ECS?
How many ECS instances are needed in a cluster, and what are the factors?
Suppose I have two instances (ins1, ins2) in a cluster, and my webapp and db containers are running on ins1. After I update the running service (through http://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html), I can see the newly created service running on "ins2" before the old service on "ins1" is drained. My question is that after my webapp container is allocated to another instance, the access IP address becomes the other instance's IP. How do I prevent this, or what is the solution so that the same IP address still reaches the webapp? And not only the IP - what about the data after changing to a new instance?
These are really three fairly different questions, so it might be best to split them into different questions here accordingly - I'll try to provide an answer regardless:
Amazon ECS Container Instances are added indirectly; it's the job of the Amazon ECS Container Agent on each instance to register itself with the cluster created and named by you, see concepts and lifecycle for details. For this to work, you need to follow the steps outlined in Launching an Amazon ECS Container Instance, be it manually or via automation. Be aware of step 10.:
By default, your container instance launches into your default cluster. If you want to launch into your own cluster instead of the default, choose the Advanced Details list and paste the following script into the User data field, replacing your_cluster_name with the name of your cluster.
#!/bin/bash
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
You only need a single instance for ECS to work as such, because the cluster itself is managed by AWS on your behalf. This wouldn't be sufficient for high availability scenarios though:
Because the container hosts are just regular Amazon EC2 instances, you would need to follow AWS best practices and spread them over two or three Availability Zones (AZs), so that a (rare) outage of an AZ doesn't impact your cluster; ECS can migrate your containers to a different host instance (provided your cluster has sufficient spare capacity).
Many advanced clustering technologies that facilitate containers have their own service orchestration layers and usually require an odd number (>= 3) of (service) instances for a high-availability setup. You can read more about this in the section Optimal Cluster Size within Administration, for example (see also Running CoreOS with AWS EC2 Container Service).
This refers back to the high-availability and service-orchestration topics already mentioned in 2.; more precisely, you are facing the problem of service discovery, which becomes ever more prevalent when using container technologies in general and micro-services in particular:
To get familiar with this, I recommend Jeff Lindsay's Understanding Modern Service Discovery with Docker for an excellent overview specifically focused on your use case.
Jeff also maintains a containerized version of the increasingly popular Consul, which makes it simple for services to register themselves and to discover other services via a DNS or HTTP interface (see Running Consul in Docker and gliderlabs/docker-consul).
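If you want to experiment with Consul locally, a rough sketch using the official image rather than the gliderlabs one mentioned above (untested; the service name "webapp" is a placeholder):

# Single-node Consul server for experimentation; registered services can then be
# discovered over DNS (port 8600) or the HTTP catalog API (port 8500).
docker run -d --name=consul -p 8500:8500 -p 8600:8600/udp \
  hashicorp/consul agent -server -bootstrap -ui -client=0.0.0.0

# Look up a registered service, e.g. "webapp", via the HTTP catalog API:
curl http://localhost:8500/v1/catalog/service/webapp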