aws s3 sync not working as expected - amazon-web-services

I'm trying to set up a one-way directory sync process from one local PC to an AWS EC2 instance via S3.
Both machines are Windows.
I tried using the command line interface.
On the local machine:
aws s3 sync source_dir s3://bucket --region eu-central-1
This command seems to work well. If there is nothing new, nothing is synced. So far so good.
On the AWS instance:
aws s3 sync s3://bucket target_dir --region eu-central-1
With this command, I have an issue. Whenever I run it, there is always something to download (it seems to be the same set of files every time; possibly all of them, but it looks like a subset). My expectation was that once in sync, running the command again would produce no downloads.
I granted these permissions in the policy:
"Action": [
"s3:GetObject",
"s3:GetObjectAcl",
"s3:ListBucket",
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": [
"arn:aws:s3:::bucket_name",
"arn:aws:s3:::bucket_name/*"
]
Am I missing anything in this setup, so that no files are downloaded on the second sync when there is nothing new to download?

You appear to be doing two separate syncs:
From a local machine to Amazon S3
From Amazon S3 to an Amazon EC2 instance
The problem might be related to timestamps. Amazon EC2 instances always operate in UTC, which might differ from the timezone of the originating local machine.
If you run the S3->EC2 sync and then run it again immediately, there should be no files copied the second time. If files ARE copied, try updating your AWS CLI to the latest version. If problems persist, try syncing from EC2->S3 and then try S3->EC2 again.
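If the re-downloads do turn out to be timestamp-related, one workaround (not part of the original answer; --size-only is a standard AWS CLI flag) is to have sync compare by file size only:
aws s3 sync s3://bucket target_dir --region eu-central-1 --size-only
The caveat is that a changed file whose size happens to stay the same will be skipped, so only use this if that trade-off is acceptable.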

Related

AWS Cloudformation Windows 2016 EC2 S3 silent install

I have an architecture created using CloudFormation, written in JSON, utilizing a Windows 2016 EC2 server and S3. I have 7 executables uploaded onto my S3 bucket. I can manually silently install everything from a PowerShell for AWS prompt once I remote into the EC2 instance. I can do it one at a time, and even put it in a .ps1 file and run it in PowerShell for AWS, and it runs correctly.
I am now trying to get this to install silently when the EC2 instance is created. I just can't do it and I can't understand why. The JSON code looks correct. As you can see, I first download everything from the S3 bucket, switch to the c:\TEMP directory where they were all downloaded, then run the executables in unattended install mode. I don't get any errors in my CloudFormation template. It runs "successfully." The problem is that nothing happens. Is it a permissions thing? Any help is welcome and appreciated. Thanks!
Under the AWS::EC2::Instance section I have the UserData section looking something like this (I shortened the executable names below):
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"<powershell>\n",
"copy-S3Object -BucketName mySilentInstallBucket -KeyPrefix * -LocalFolder c:\\TEMP\\",
"\n",
"cd c:\\TEMP\\",
"\n",
"firefox.exe -S ",
"\n",
"notepadpp.exe /S",
"\n",
"Git.exe /SILENT",
"\n",
"</powershell>"
]]}}
This troubleshooting doc will cover the various reasons you may not be able to connect to S3: https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-access-s3-bucket/
To connect to your S3 buckets from your EC2 instances, you need to do the following:
Create an AWS Identity and Access Management (IAM) role that grants access to Amazon S3.
Attach the IAM instance profile to the instance.
Validate permissions on your S3 bucket.
Validate network connectivity from the EC2 instance to Amazon S3.
Validate access to S3 buckets.
The CloudFormation template won't fail based on UserData execution exceptions.
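As a rough sketch of the first two items with the AWS CLI (the role, profile, and policy names, the trust-policy file, and the instance ID below are placeholders, not from the original question):
aws iam create-role --role-name EC2-S3-Access --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name EC2-S3-Access --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam create-instance-profile --instance-profile-name EC2-S3-Access
aws iam add-role-to-instance-profile --instance-profile-name EC2-S3-Access --role-name EC2-S3-Access
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=EC2-S3-Access
On Windows Server 2016 the UserData output is typically logged to C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log, which is usually the quickest way to see why "nothing happens."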

How do I get a Docker Swarm manager to pull images from AWS ECR using IAM Role permissions?

I'm having trouble pulling images from AWS ECR, running Docker Swarm. It's been working ok for years, but my swarm manager nodes were changed to new EC2 instances. Now my services fail to deploy:
~ $ docker stack deploy -c dkr_compose_geo_site:3.2.0 --with-registry-auth geo_stack
The manager node log shows "no basic auth credentials":
May 19 21:21:12 ip-172-31-3-108 root: time="2020-05-19T21:21:12.857007050Z" level=error msg="pulling image failed" error="Get https://445523.dkr.ecr.us-west-2.amazonaws.com/v2/geo_site/manifests/sha256:da5820742cd0ecd52e3a2c61179a039ce80996564604b70465e3966087380a09: no basic auth credentials" module=node/agent/taskmanager node.id=eix8c6orbunemismg03ib1rih service.id=smilb788pets7y5rgbu3aze9l task.id=zd3ozdpr9exphwlz318pa9lpe
May 19 21:21:12 ip-172-31-3-108 root: time="2020-05-19T21:21:12.857701347Z" level=error msg="fatal task error" error="No such image: 445523.dkr.ecr.us-west-2.amazonaws.com/geo_site#sha256:da5820742cd0ecd52e3a2c61179a039ce80996564604b70465e3966087380a09" module=node/agent/taskmanager node.id=eix8c6orbunemismg03ib1rih service.id=smilb788pets7y5rgbu3aze9l task.id=zd3ozdpr9exphwlz318pa9lpe
This manager node is running on an EC2 Instance with an IAM Role; the IAM Role has an ECR policy that appears to grant permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
From reading the AWS/Docker docs, I thought docker commands run on a manager node should adopt the instance IAM role and access the ECR repo using the associated policy permissions. It's always seemed to work that way, but now it's looking like there might have been some config file hidden on the old manager node; I'm on a new instance and it doesn't work. I don't run the AWS CLI on these manager nodes, so there's no aws ecr get-login to log in manually. How do I get this new manager node to authenticate with ECR?
Thanks!
My solution, based on a comment by Luigi Lopez and the amazon-ecr-credential-helper:
The AWS IAM Role allows authentication, but the docker cli must still present credentials to the ECR, as Luigi pointed out in his comment.
This is a Docker Swarm implementation, with nodes running the Alpine OS. There is an aws-cli package available for Alpine, but the installation took a lot of fussing around and in the end the binary crashed anyway.
The Amazon ECR Credential Helper is a better long-term solution in any case because you don't need to get new tokens every 12 hours or set up a proxy server, etc. It uses the recommended IAM Role authentication, with no credentials stored on the machine or leaking into log files.
So under Alpine I followed the instructions in the link above to build from sources.
I installed go, git, and make, and then built the credential-helper as described. I set up the PATH as described, created a config file, and then my deployment worked. There's no docker login required.
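For reference, the config file referred to above is Docker's ~/.docker/config.json. Per the amazon-ecr-credential-helper README, a minimal version that routes all registry authentication through the helper looks like this (a registry-specific credHelpers entry also works):
{
  "credsStore": "ecr-login"
}
With that in place, docker pull from the ECR registry (and docker stack deploy --with-registry-auth) should authenticate via the instance IAM role, with no docker login step.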

AWS Elastic Beanstalk and Secret Manager

Does anyone know if it is possible to pass a secret value as an environment variable in Elastic Beanstalk?
The alternative obviously is to use the SDK in our codebase, but I want to explore the environment variable approach first.
Cheers
Damien
Per @Ali's answer, it is not built-in at this point. However, it is relatively easy to use .ebextensions and the AWS CLI. Here is an example that extracts a secret to a file, according to a MY_ENV environment variable. This value could then be set as an environment variable, but keep in mind environment variables are specific to the shell; you'd need to pass them in to anything you are launching.
10-extract-htpasswd:
  env:
    MY_ENV:
      "Fn::GetOptionSetting":
        Namespace: "aws:elasticbeanstalk:application:environment"
        OptionName: MY_ENV
  command: |
    aws secretsmanager get-secret-value --secret-id myproj/$MY_ENV/htpasswd --region=us-east-1 --query=SecretString --output text > /etc/nginx/.htpasswd
    chmod o-rwx /etc/nginx/.htpasswd
    chgrp nginx /etc/nginx/.htpasswd
This also requires giving the EB service role IAM permissions to the secrets, i.e. a policy like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "xxxxxxxxxx",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:xxxxxxxxxxxx:secret:myproj*"
    }
  ]
}
As the above answers mention, there is still no built-in solution if you want to do this in Elastic Beanstalk. However, a workaround is to use a "platform hook". Unfortunately it is poorly documented at this point.
To store your secret, the best solution is to create a custom secret in AWS Secrets Manager. In Secrets Manager you can create a new secret by clicking "Store a new secret", then selecting "Other type of secret" and entering your secret key/value. At the next step you need to provide a secret name (say "your_secret_name") and you can leave everything else at its default settings.
Then, you need to allow Elastic Beanstalk to get this secret. You can do it by creating a new IAM policy, for instance with this content:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Getsecretvalue",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "your-secret-arn"
    }
  ]
}
You need to replace "your-secret-arn" with your secret's ARN, which you can get from the AWS Secrets Manager console. Then, you need to add the policy you created to the EB roles (it should be either "aws-elasticbeanstalk-ec2-role" or "aws-elasticbeanstalk-service-role").
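If you prefer the CLI, a hedged equivalent (the account ID and policy name are placeholders for the policy you just created):
aws iam attach-role-policy --role-name aws-elasticbeanstalk-ec2-role --policy-arn arn:aws:iam::123456789012:policy/Getsecretvalue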
Finally you need to add a hook file in your application. From the root of your application, the location should be ".platform/hooks/prebuild/your_hook.sh". The content of your file can be something like this:
#!/bin/sh
export your_secret_key=$(aws secretsmanager get-secret-value --secret-id your-secret-name --region us-east-1 | jq -r '.SecretString' | jq -r '.your_secret_key')
touch .env
{
printf "SECRET_KEY=%s\n" "$your_secret_key"
# printf whatever other variable you want to pass
} >> .env
Obviously you need to replace "your_secret_name" and the other variables with your own values and set the region to the region where your secret is stored (if it is not us-east-1). And don't forget to make the hook executable ("chmod +x your_hook.sh").
This assumes that your application can load its env from a .env file (which works fine with docker / docker-compose for example).
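For example, a minimal docker-compose service (the service and image names are made up here) that picks up that .env file could look like:
services:
  app:
    image: your-app-image:latest
    env_file:
      - .env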
Another option is to store the variable in an ".ebextensions" config file, but unfortunately that doesn't seem to work with the new Amazon Linux 2 platform. What's more, you should not store sensitive information such as credentials directly in your application build: builds of the application can be accessed by anyone with Elastic Beanstalk read access, and they are also stored unencrypted on S3.
With the hook approach, the secret is only stored locally on your Elastic Beanstalk underlying EC2 instances, and you can (should!) restrict direct SSH access to them.
Unfortunately, EB doesn't support secrets at this point; this might be added down the road. You can use them in your environment variables as the documentation suggests, but they will appear in plain text in the console. Another, and IMO better, approach would be to use ebextensions and AWS CLI commands to grab secrets from Secrets Manager, which needs some setup (e.g. having the AWS CLI installed and having your secrets stored in SM). You can set these as environment variables in the same eb configuration. Hope this helps!
I'm just adding to @kaliatech's answer because, while very helpful, it had a few gaps that left me unable to get this working for a few days. Basically you need to add a config file to the .ebextensions directory of your EB app, which uses a container_commands section to retrieve your secret (in JSON format) and output it as a .env file into the /var/app/current directory of the EC2 instances where your app's code lives:
# .ebextensions/setup-env.config
container_commands:
  01-extract-env:
    env:
      AWS_SECRET_ID:
        "Fn::GetOptionSetting":
          Namespace: "aws:elasticbeanstalk:application:environment"
          OptionName: AWS_SECRET_ID
      AWS_REGION: {"Ref" : "AWS::Region"}
      ENVFILE: .env
    command: >
      aws secretsmanager get-secret-value --secret-id $AWS_SECRET_ID --region $AWS_REGION |
      jq -r '.SecretString' |
      jq -r 'to_entries|map("\(.key)=\(.value|tostring)")|.[]' > $ENVFILE
Note: this assumes the AWS_SECRET_ID is configured in the app environment, but it can easily be hardcoded here as well.
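If it isn't configured yet, one way to set it (assuming you use the EB CLI; the secret name is a placeholder) is:
eb setenv AWS_SECRET_ID=my-secret-name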
All the utils needed for this script to work are already baked into the EC2 Linux image, but you'll need to grant permissions to the IamInstanceProfile role (usually named aws-elasticbeanstalk-ec2-role), which is assumed by EC2, to allow it to access Secrets Manager:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SecretManagerAccess",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:ap-southeast-2:xxxxxxxxxxxx:secret:my-secret-name*"
    }
  ]
}
Finally, to debug any issues encountered during EC2 instance bootstrap, download the EB logs and check the EC2 log files at /var/log/cfn-init.log and /var/log/cfn-init-cmd.log.
This answer only applies if you're using CodePipeline.
I think you can add a secret in the environment variables section now
If you use AWS CodeBuild, add the following commands to the pre_build phase of your project's buildspec.yml to retrieve your environment variables from AWS Secrets Manager, use sed to do some substituting/formatting, and append them to the aws:elasticbeanstalk:application:environment namespace of .ebextensions/options.config:
phases:
  pre_build:
    commands:
      - secret=$(aws secretsmanager get-secret-value --secret-id foo-123 --region=bar-xyz --query=SecretString --output text)
      - regex=$(cat ./sed_substitute)
      - echo $secret | sed "${regex}" >> .ebextensions/options.config
Bit of a hack, but the sed_substitute used in the commands above to get the correct indentation/formatting that .ebextensions/options.config demands was:
s/",/\n /g; s/":/": /g; s/{"/ /g; s/"}//g; s/"//g;

Problems mounting a S3 bucket with s3fs

I am trying to mount an S3 bucket on an AWS EC2 instance following these instructions. I was able to install the dependencies via yum, followed by cloning the git repository, and then making and installing the s3fs tool.
Furthermore, I ensured my AWSACCESSKEYID and AWSSECRETACCESSKEY values were in several locations (because I could not get the tool to work, and searching for an answer suggested placing the file in different locations).
~/.passwd-s3fs
/etc/.passwd-s3fs
~/.bash_profile
For the .passwd-s3fs I have set the permissions as follows.
chmod 600 ~/.passwd-s3fs
chmod 640 /etc/.passwd-s3fs
Additionally, the .passwd-s3fs files have the content as suggested in this format: AWSACCESSKEYID:AWSSECRETACCESSKEY.
I have also logged out and in just to make sure the changes take effect. When I execute this command /usr/bin/s3fs bucketname /mnt, I get the following response.
s3fs: MOUNTPOINT: /mnt permission denied.
When I run the same command with sudo, e.g. sudo /usr/bin/s3fs mybucket /mnt, I get the following message.
s3fs: could not determine how to establish security credentials.
I am using s3fs v1.84 on the following AMI ami-0ff8a91507f77f867 (Amazon Linux AMI 2018.03.0.20180811 x86_64 HVM GP2). From the AWS Console for S3, my bucket's name is NOT mybucket but something just as simple (I am wondering if there's anything special I have to do with naming).
Additionally, my AWS access and secret key pair is generated from the IAM web interface and placed into the admin group (having AdministratorAccess policy) defined below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
Any ideas on what's going on? Did I miss a step?
After tinkering a bit, I found the following helps.
/usr/bin/s3fs mybucket /mnt -o passwd_file=.passwd-s3fs -o allow_other
Note that I specify the .passwd-s3fs file's location. And also note that I allow others to view the mount. Additionally, I had to modify /etc/fuse.conf to enable user_allow_other.
# mount_max = 1000
user_allow_other
To test, I typed in touch /mnt/README.md and then observed the file in my S3 bucket (web UI).
I am a little disappointed that this problem is not better documented. I would have expected the default home location or /etc to be where the tool looks for the .passwd-s3fs file, but that's not the case. Additionally, sudo (as suggested by a link I did not bookmark) forces the tool to look in ~/home/root, which does not exist.
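If you want the mount to come back after a reboot, a sketch of an /etc/fstab entry (bucket name and passwd file path are placeholders; this is not from the original answer) would be:
mybucket /mnt fuse.s3fs _netdev,allow_other,passwd_file=/home/ec2-user/.passwd-s3fs 0 0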
For me it was a mismatch between the IAM role specified while mounting and the IAM role of the EC2 server.
EC2 was launched with role2 and I was mounting with
/usr/local/bin/s3fs -o allow_other mybucket /mnt/s3fs/mybucketfolder -o iam_role='role1'
which did not throw any error, but did not mount.
PS
I do not have any access keys or an S3 password file on the EC2 server.
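In that situation a sketch of the fix (assuming the instance really is running with role2, as above) is to point iam_role at the attached role, or let s3fs discover it with iam_role=auto:
/usr/local/bin/s3fs -o allow_other mybucket /mnt/s3fs/mybucketfolder -o iam_role='role2'
/usr/local/bin/s3fs -o allow_other mybucket /mnt/s3fs/mybucketfolder -o iam_role='auto'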

How to download Zeppelin Notebook from AWS EMR

I am running a pre-installed Zeppelin Sandbox on AWS EMR 4.3 with Spark.
I've created a Notebook on Zeppelin (on the EMR cluster) and I now want to export that notebook so that I can quickly run it the next time I spin up an EMR cluster.
It turns out that Zeppelin doesn't support the export of a notebook as yet (?).
This is fine because apparently, if you can access the folder Zeppelin is 'installed' in, then you can save the folder containing the notebook and then presumably place the folder in a Zeppelin installation on another computer to access the notebook.
(All this is from http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/)
Trouble is I can't find where the 'Installation folder' for Zeppelin is on EMR.
ps - 'Installation Folder' may be slightly incorrect, according to the post above I should be looking in /opt/zeppelin, which doesn't exist in the Master of my EMR cluster.
Edit: Zeppelin now supports export of the notebook in JSON format from the web interface itself! There is a small icon at the top center of the page which allows you to export the notebook.
Zeppelin Notebooks can be found under /var/lib/zeppelin/notebook in an AWS EMR cluster with Zeppelin Sandbox. The notebooks are contained within folders in this directory.
These folders have random names and do not correspond to the name of the Notebook.
ls /var/lib/zeppelin/notebook/
2A94M5J1Y 2A94M5J1Z 2AZU1YEZE 2B3D826UD
There's a note.json file within each folder (which represents a Notebook) that contains the name of the Notebook and all other details.
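A quick way to map folder IDs to notebook names (assuming the note.json layout described above) is:
grep -H '"name"' /var/lib/zeppelin/notebook/*/note.json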
To export a Notebook, choose the notebook folder which corresponds to the notebook you are looking for and copy that folder onto the new Zeppelin installation where you want the notebook to be available.
The above instructions are from: http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/
Just that in an AWS setup the Zeppelin notebooks will be found in /var/lib/zeppelin/notebook
Another solution is to create a step in your EMR cluster to back up all your Notebooks, since going through them one by one is a bit tedious.
s3://{s3_bucket}/notebook/notebook_backup.sh
#!/bin/bash
# - Upload Notebooks backups.
aws s3 cp /var/lib/zeppelin/notebook/ s3://{s3_bucket}/notebook/`date +"%Y/%m/%d"` --recursive
# - Update latest folder with latest Notebooks versions.
aws s3 rm s3://{s3_bucket}/notebook/latest --recursive
aws s3 cp /var/lib/zeppelin/notebook/ s3://{s3_bucket}/notebook/latest --recursive
Then in your EMR add a Step to run your own script.
s3://elasticmapreduce/libs/script-runner/script-runner.jar will allow you to run scripts from S3.
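For example (the cluster ID below is a placeholder), such a step could be added with the CLI:
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps Type=CUSTOM_JAR,Name=NotebookBackup,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://{s3_bucket}/notebook/notebook_backup.sh"]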
Zeppelin release 0.5.6 and later, which is included in Amazon EMR release 4.4.0 and later, supports using a configuration JSON file to set the notebook storage.
https://aws.amazon.com/blogs/big-data/import-zeppelin-notes-from-github-or-json-in-zeppelin-0-5-6-on-amazon-emr/
You need to create a directory in your S3 bucket called /user/notebook
(user is the name as per the config below)
So if your S3 bucket is
s3://my-zeppelin-bucket-name
you need:
s3://my-zeppelin-bucket-name/user/notebook
and in the config below you don't include the s3:// prefix.
You save this as a .json file and then store it in an S3 bucket, and when you go to launch your cluster, there's a section for Configuration where you point it to this file. Then when the cluster launches, the pieces of the configuration are injected into the various configs for the different Hadoop tools on EMR. In this case zeppelin-env is edited at launch, before Zeppelin is installed.
Once you've run a cluster once, you can then clone it and it will remember this config, or use CloudFormation or something like Ansible to script this so your clusters always start up with notebook storage on S3.
[
  {
    "Classification": "zeppelin-env",
    "Properties": {
    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE":"org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET":"my-zeppelin-bucket-name",
          "ZEPPELIN_NOTEBOOK_USER":"user"
        },
        "Configurations": [
        ]
      }
    ]
  }
]
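As a sketch, pointing a new cluster at that file from the CLI could look like this (the file name, key pair, and instance settings are placeholders):
aws emr create-cluster --name zeppelin-s3-notebooks --release-label emr-4.4.0 --applications Name=Zeppelin Name=Spark --configurations file://zeppelin-s3-config.json --instance-type m3.xlarge --instance-count 3 --use-default-roles --ec2-attributes KeyName=my-key-pair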