AWS: Mount S3 Bucket to an EC2 instance. (Later FTP Tunneling) - amazon-web-services

what do I want to do?
Step1: Mount a S3 Bucket to an EC2 Instance.
Step2: Install a FTP Server on the EC2 Instance and tunnel ftp-requests to files in the bucket.
What did I do so far?
create bucket
create security group with open input ports (FTP:20,21 - SSH:22 - some more)
connect to ec2
And the following code:
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/s3fs/s3fs-1.74.tar.gz
tar -xvzf s3fs-1.74.tar.gz
yum update all
yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel curl-devel libxml2-devel openssl-devel mailcap
cd s3fs-1.74
./configure --prefix=/usr
make
make install
vi /etc/passwd-s3fs # set access:secret keys
chmod 640 /etc/passwd-s3fs
mkdir /s3bucket
cd /s3bucket
And cd anwers: Transport endpoint is not connected
Dunno what's wrong. Maybe I am using the wrong user? But currently I only have one user (for test reasons) except for root.
Next step would be the ftp tunnel, but for now I'd like getting this to work.

I followed these instructions now. https://github.com/s3fs-fuse/s3fs-fuse
I guess they are calling the API in background too, but it works as I wished.

One possible solution to mount S3 to an EC2 instance is to use the new file gateway.
Check out this:
https://aws.amazon.com/about-aws/whats-new/2017/02/aws-storage-gateway-supports-running-file-gateway-in-ec2-and-adds-file-share-security-options/
http://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html

Point 1
Whilst the other answerer is correct in saying that S3 is not built for this, it's not true to say a bucket cannot be mounted (I'd seriously consider finding a better way to solve your problem however).
That being said, you can use s3fuse to mount S3 buckets within EC2. There's plenty of good reasons not to do this, detailed here.
Point 2
From there it's just a case of setting up a standard FTP server, since the bucket now appears to your system as if it is any other file system (mostly).
vsftpd could be good choice for this. I'd have a go at both and then post separate questions with any specific problems you run into, but this should give you a rough outline to work from. (Well, in reality I'd have a go at neither and use S3 via app code consuming the API, but still).

Related

AWS EC2 User Data not working (Tried Installing and starting httpd via User Data)

The Following is my EC2 User Data:
#!/bin/bash
sudo yum update -y
sudo yum install -y httpd
sudo systemctl start httpd
sudo systemctl enable httpd
In Security Group SSH 22 Port and HTTP 80 Port is Open.
Yet when I try accessing http://public_ip_of_instance the HTTP Apache page doesn't load.
Also, on the Instance Apache is not installed when I checked sudo systemctl status httpd.
I then manually tried it on the EC2 Server and it worked. Then I removed it through yum remove as I wanted to see whether User Data works.
I stopped the Instance and started again but I observed that the User Data Script doesn't work as I am unable to access http page through browser and also on Instance http is not installed.
Where is the actual issue? Some months back this same thing worked on another instance I remember.
Your user data is correct. Whatever is happening with your website is not due to the user data code that you provided.
There could be many reasons it does not work. Public IP of the instance has changed, as always happens when you stop/start the instance. Instance may have per-existing software that clashes with httpd.
Here's some general advice on running UserData once or each startup.
Short answer as John mentioned in the comments EC2's only run the UserData (aka Bootstrap) script once on initalization.
The user data Bash/Powershell is Infrastructure-As-Code. You deploy the script and it installs and configures the machine.
This causes confusion with everyone starting AWS. When you think about it though it doesn't make sense to run the UserData script each time when the PCs already been configured.
What people do often instead is make "Golden Images" (aka Amazon Machine Images - AMI's) of pre-setup EC2s, typically for PCs that take long time to install/configure. The beauty of this is you can setup AutoScaleGroups to use the images which saves any long installation during a scale up event.
Pro Tip: When developing an UserData script run through and test it manually on the EC2. Trust me its far quicker than troubleshooting unattended EC2 UserData errors.
Long answer: you can run the UserData on each boot of the machine using Mime multi-part file. A mime multi-part file allows your script to override how frequently user data is run in the cloud-init package.
https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/
For all those who will run into this problem, first of all check the log with the command:
sudo cat /var/log/cloud-init-output.log
then if you notice connection errors to the various repositories, the reason is because you don't have an internet connection. However, if once inside your EC2 you manage to launch the update and install commands, then the reason why they fail in the UserData is because your EC2 takes a few seconds to get the Internet connection and executes the commands before having it. So to solve this problem, just add this command after #!/bin/bash
#!/bin/bash
until ping -c1 8.8.8.8 &>/dev/null; do :; done
sudo yum update -y
...
This will prevent your EC2 from executing commands before an internet connection is established

How do I download files within a Sagemaker notebook instance programatically?

We have a notebook instance within Sagemaker which contains many Jupyter Python scripts. I'd like to write a program which downloads these various scripts each day (i.e. so that I could back them up). Unfortunately I don't see any reference to this in the AWS CLI API.
Is this achievable?
It's not exactly that you want, but looks like VCS can fit your needs. You can use Github(if you already use it) or CodeCommit(free privat repos) Details and additional ways like sync target dir with S3 bucket - https://aws.amazon.com/blogs/machine-learning/how-to-use-common-workflows-on-amazon-sagemaker-notebook-instances/
Semi automatic way:
conda install -y -c conda-forge zip
!zip -r -X folder.zip folder-to-zip
Then download that zipfile.

How to sync folders in Ubuntu 16.04 on AWS

I want to do a two directional sync between /var/www/html/data and /var/www/data so that if there are changes on any of those folder they are automatically the same automatically. what's the best way to do this?
The easiest way would be to create a symbolic link.
ln -s /var/www/html/data /var/www/data
This will basically expose the same content in different paths, but the content is the same. In many cases when deploying web apps normally a symlink is created after the deploy process has finished.
If you want duplicated copies you could use rsync.
rsync -av /var/www/html/data /var/www/data

How can I upload files to Amazon EC2 instance?

I managed to configure my website on a Linux ec2 instance with Drupal. But I don't know where I need to modify the files of the server. I already have a fully functional website on my local host and would like to upload it in my ec2 instance.
Can I upload my site somewhere in Drupal? I also tried without Drupal, I installed Apache, and everything but I can't add files on /var/www/ folder because I don't have the necessary permission.
Can you please give me some suggestions or tutorials that might help me?
You can change file permissions using terminal command.
As super user use:
chmod -R 777 var/www/
The -R makes it recursive. For security reasons it's not a good practice to give everyone access to var/www folder. Really consider do you want your filesystem to be so accessible.
My suggestion is to make sub folder and temporary give it full access while you migrate your site from local server.
chmod -R 777 var/www/folder_for_your_drupal_site
After you done with your local site migration you should change back permissions on Drupal default settings (pay attention on settings.php file).
For further info related with migration of local Drupal site check my answer here.
Hope this helps.

Vagrant Rsync Error before provisioning

So I'm having some adventures with the vagrant-aws plugin, and I'm now stuck on the issue of syncing folders. This is necessary to provision the machines, which is the ultimate goal. However, running vagrant provision on my machine yields
[root#vagrant-puppet-minimal vagrant]# vagrant provision
[default] Rsyncing folder: /home/vagrant/ => /vagrant
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
mkdir -p '/vagrant'
I'm almost positive the error is caused because ssh-ing manually and running that command yields 'permission denied' (obviously, a non-root user is trying to make a directory in the root directory). I tried ssh-ing as root but it seems like bad practice. (and amazon doesn't like it) How can I change the folder to be rsynced with vagrant-aws? I can't seem to find the setting for that. Thanks!
Most likely you are running into the known vagrant-aws issue #72: Failing with EC2 Amazon Linux Images.
Edit 3 (Feb 2014): Vagrant 1.4.0 (released Dec 2013) and later versions now support the boolean configuration parameter config.ssh.pty. Set the parameter to true to force Vagrant to use a PTY for provisioning. Vagrant creator Mitchell Hashimoto points out that you must not set config.ssh.pty on the global config, you must set it on the node config directly.
This new setting should fix the problem, and you shouldn't need the workarounds listed below anymore. (But note that I haven't tested it myself yet.) See Vagrant's CHANGELOG for details -- unfortunately the config.ssh.pty option is not yet documented under SSH Settings in the Vagrant docs.
Edit 2: Bad news. It looks as if even a boothook will not be "faster" to run (to update /etc/sudoers.d/ for !requiretty) than Vagrant is trying to rsync. During my testing today I started seeing sporadic "mkdir -p /vagrant" errors again when running vagrant up --no-provision. So we're back to the previous point where the most reliable fix seems to be a custom AMI image that already includes the applied patch to /etc/sudoers.d.
Edit: Looks like I found a more reliable way to fix the problem. Use a boothook to perform the fix. I manually confirmed that a script passed as a boothook is executed before Vagrant's rsync phase starts. So far it has been working reliably for me, and I don't need to create a custom AMI image.
Extra tip: And if you are relying on cloud-config, too, you can create a Mime Multi Part Archive to combine the boothook and the cloud-config. You can get the latest version of the write-mime-multipart helper script from GitHub.
Usage sketch:
$ cd /tmp
$ wget https://raw.github.com/lovelysystems/cloud-init/master/tools/write-mime-multipart
$ chmod +x write-mime-multipart
$ cat boothook.sh
#!/bin/bash
SUDOERS_FILE=/etc/sudoers.d/999-vagrant-cloud-init-requiretty
echo "Defaults:ec2-user !requiretty" > $SUDOERS_FILE
echo "Defaults:root !requiretty" >> $SUDOERS_FILE
chmod 440 $SUDOERS_FILE
$ cat cloud-config
#cloud-config
packages:
- puppet
- git
- python-boto
$ ./write-mime-multipart boothook.sh cloud-config > combined.txt
You can then pass the contents of 'combined.txt' to aws.user_data, for instance via:
aws.user_data = File.read("/tmp/combined.txt")
Sorry for not mentioning this earlier, but I am literally troubleshooting this right now myself. :)
Original answer (see above for a better approach)
TL;DR: The most reliable fix is to "patch" a stock Amazon Linux AMI image, save it and then use the customized AMI image in your Vagrantfile. See below for details.
Background
A potential workaround is described (and linked in the bug report above) at https://github.com/mitchellh/vagrant-aws/pull/70/files. In a nutshell, add the following to your Vagrantfile:
aws.user_data = "#!/bin/bash\necho 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty && chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty\nyum install -y puppet\n"
Most importantly this will configure the OS to not require a tty for user ec2-user, which seems to be the root of the problem. I /think/ that the additional installation of the puppet package is not required for the actual fix (although Vagrant may use Puppet for provisioning the machine later, depending on how you configured Vagrant).
My experience with the described workaround
I have tried this workaround but Vagrant still occasionally fails with the same error. It might be a "race condition" where Vagrant happens to run its rsync phase faster than cloud-init (which is what aws.user_data is passing information to) can prepare the workaround for #72 on the machine for Vagrant. If Vagrant is faster you will see the same error; if cloud-init is faster it works.
What will work (but requires more effort on your side)
What definitely works is to run the command on a stock Amazon Linux AMI image, and then save the modified image (= create an image snapshot) as a custom AMI image of yours.
# Start an EC2 instance with a stock Amazon Linux AMI image and ssh-connect to it
$ sudo su - root
$ echo 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty
$ chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty
# Note: Installing puppet is mentioned in the #72 bug report but I /think/ you do not need it
# to fix the described Vagrant problem.
$ yum install -y puppet
You must then use this custom AMI image in your Vagrantfile instead of the stock Amazon one. The obvious drawback is that you are not using a stock Amazon AMI image anymore -- whether this is a concern for you or not depends on your requirements.
What I tried but didn't work out
For the record: I also tried to pass a cloud-config to aws.user_data that included a bootcmd to set !requiretty in the same way as the embedded shell script above. According to the cloud-init docs bootcmd is run "very early" in the startup cycle for an EC2 instance -- the idea being that bootcmd instructions would be run earlier than Vagrant would try to run its rsync phase. But unfortunately I discovered that the bootcmd feature is not implemented in the outdated cloud-init version of current Amazon's Linux AMIs (e.g. ami-05355a6c has cloud-init 0.5.15-69.amzn1 but bootcmd was only introduced in 0.6.1).