Using an AWS profile with fs.S3FileSystem - amazon-web-services

I'm trying to use a specific AWS profile when using Apache PyArrow. The documentation shows no option to pass a profile name when instantiating S3FileSystem using pyarrow.fs [https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html]
I tried to get around this by creating a session with boto3 and using that:
# include mfa profile
session = boto3.session.Session(profile_name="custom_profile")
# create filesystem with session
bucket = fs.S3FileSystem(session_name=session)
bucket.get_file_info(fs.FileSelector('bucket_name', recursive=True))
but this too fails:
OSError: When listing objects under key '' in bucket 'bucket_name': AWS Error [code 15]: Access Denied
Is it possible to use fs with a custom AWS profile?
~/.aws/credentials :
[default]
aws_access_key_id = <access_key>
aws_secret_access_key = <secret_key>
[custom_profile]
aws_access_key_id = <access_key>
aws_secret_access_key = <secret_key>
aws_session_token = <token>
Additional context: all user actions require MFA. The custom AWS profile in the credentials file stores the token generated after MFA-based authentication on the CLI; I need to use that profile in the script.

I think it is better this way:
session = boto3.session.Session(profile_name="custom_profile")
credentials = session.get_credentials()
s3_files = fs.S3FileSystem(
    secret_key=credentials.secret_key,
    access_key=credentials.access_key,
    region=session.region_name,
    session_token=credentials.token)
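With the filesystem built from the profile's credentials, the listing call from the question should then work unchanged, e.g.:
s3_files.get_file_info(fs.FileSelector('bucket_name', recursive=True))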

One can specify a token, but must also specify an access key and secret key:
s3 = fs.S3FileSystem(access_key="",
                     secret_key="",
                     session_token="")
One would also have to implement some method to parse the ~/.aws/credentials file to get access to these values, or do it manually each time; a sketch of that parsing follows.
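If you want to avoid pulling in boto3 just for credential loading, here is a minimal sketch of that manual parsing using configparser from the standard library (it assumes the profile name from the question and the standard ~/.aws/credentials location):
import configparser
import os

# read the shared credentials file and pick the desired profile
config = configparser.ConfigParser()
config.read(os.path.expanduser("~/.aws/credentials"))
profile = config["custom_profile"]

s3 = fs.S3FileSystem(
    access_key=profile["aws_access_key_id"],
    secret_key=profile["aws_secret_access_key"],
    session_token=profile.get("aws_session_token"))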

Related

Is it possible to open the AWS Management Console website from AWS CLI?

Let's say I'm logged in to my AWS CLI tool with a particular account and I can execute commands like
aws ecr describe-repositories
Is there an AWS CLI command which opens up the AWS Management Console website in the default browser, already logged in to the same account?
E.g.: something like
aws web
Thanks!
While there is no such command built into the AWS CLI, you can provide users with direct access to the AWS Management Console if they have valid STS session IAM credentials (access and secret keys). You can read about the process of using the getSigninToken action to generate a pre-signed AWS Console URL in exchange for your IAM credentials here.
The Python code example:
import urllib, json, sys
import requests # 'pip install requests'
import boto3 # AWS SDK for Python (Boto3) 'pip install boto3'
# Step 1: Authenticate user in your own identity system.
# Step 2: Using the access keys for an IAM user in your AWS account,
# call "AssumeRole" to get temporary access keys for the federated user
# Note: Calls to AWS STS AssumeRole must be signed using the access key ID
# and secret access key of an IAM user or using existing temporary credentials.
# The credentials can be in Amazon EC2 instance metadata, in environment variables,
# or in a configuration file, and will be discovered automatically by the
# client('sts') function. For more information, see the Python SDK docs:
# http://boto3.readthedocs.io/en/latest/reference/services/sts.html
# http://boto3.readthedocs.io/en/latest/reference/services/sts.html#STS.Client.assume_role
sts_connection = boto3.client('sts')
assumed_role_object = sts_connection.assume_role(
    RoleArn="arn:aws:iam::account-id:role/ROLE-NAME",
    RoleSessionName="AssumeRoleSession",
)
# Step 3: Format resulting temporary credentials into JSON
url_credentials = {}
url_credentials['sessionId'] = assumed_role_object.get('Credentials').get('AccessKeyId')
url_credentials['sessionKey'] = assumed_role_object.get('Credentials').get('SecretAccessKey')
url_credentials['sessionToken'] = assumed_role_object.get('Credentials').get('SessionToken')
json_string_with_temp_credentials = json.dumps(url_credentials)
# Step 4. Make request to AWS federation endpoint to get sign-in token. Construct the parameter string with
# the sign-in action request, a 12-hour session duration, and the JSON document with temporary credentials
# as parameters.
request_parameters = "?Action=getSigninToken"
request_parameters += "&SessionDuration=43200"
if sys.version_info[0] < 3:
    def quote_plus_function(s):
        return urllib.quote_plus(s)
else:
    def quote_plus_function(s):
        return urllib.parse.quote_plus(s)
request_parameters += "&Session=" + quote_plus_function(json_string_with_temp_credentials)
request_url = "https://signin.aws.amazon.com/federation" + request_parameters
r = requests.get(request_url)
# Returns a JSON document with a single element named SigninToken.
signin_token = json.loads(r.text)
# Step 5: Create URL where users can use the sign-in token to sign in to
# the console. This URL must be used within 15 minutes after the
# sign-in token was issued.
request_parameters = "?Action=login"
request_parameters += "&Issuer=Example.org"
request_parameters += "&Destination=" + quote_plus_function("https://console.aws.amazon.com/")
request_parameters += "&SigninToken=" + signin_token["SigninToken"]
request_url = "https://signin.aws.amazon.com/federation" + request_parameters
# Send final URL to stdout
print(request_url)
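If you want the script to open the default browser directly instead of only printing the URL (an optional addition, not part of the original example), the standard library can do it:
import webbrowser
# open the federated sign-in URL generated above in the default browser
webbrowser.open(request_url)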
I've also written an AWS plugin back in the day that does exactly what you need, but it does not work with AWS CLI v2:
https://github.com/b-b3rn4rd/awscli-console-plugin
This does not exist. Credentials used for AWS CLI and for Console access are different.
For the CLI you use an access key and secret key.
For Console access (through a web browser) you use a username and password.
It is possible that in an AWS account you have programmatic access but do not have console access.

ClientError: Failed to download data. Please check your s3 objects and ensure that there is no object that is both a folder as well as a file

How are you?
I'm trying to execute a SageMaker job but I get this error:
ClientError: Failed to download data. Cannot download s3://pocaaml/sagemaker/xsell_sc1_test/model/model_lgb.tar.gz, a previously downloaded file/folder clashes with it. Please check your s3 objects and ensure that there is no object that is both a folder as well as a file.
I have that model_lgb.tar.gz on that S3 path, as you can see here:
This is my code:
# imports implied by the snippet (assumed from the notebook environment)
from time import gmtime, strftime
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

project_name = 'xsell_sc1_test'
s3_bucket = "pocaaml"
prefix = "sagemaker/"+project_name
account_id = "029294541817"
s3_bucket_base_uri = "{}{}".format("s3://", s3_bucket)
dev = "dev-{}".format(strftime("%y-%m-%d-%H-%M", gmtime()))
region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))
# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()
boto3.setup_default_session(region_name=region)
boto_session = boto3.Session(region_name=region)
s3_client = boto3.client("s3", region_name=region)
sagemaker_boto_client = boto_session.client("sagemaker")  # is this needed?
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_boto_client
)
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1", role=role, instance_type='ml.m5.4xlarge', instance_count=1
)
PREPROCESSING_SCRIPT_LOCATION = 'funciones_altas.py'
preprocessing_input_code = sagemaker_session.upload_data(
    PREPROCESSING_SCRIPT_LOCATION,
    bucket=s3_bucket,
    key_prefix="{}/{}".format(prefix, "code")
)
preprocessing_input_data = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "data")
preprocessing_input_model = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "model")
preprocessing_output = "{}/{}/{}/{}/{}".format(s3_bucket_base_uri, prefix, dev, "preprocessing" ,"output")
processing_job_name = params["project_name"].replace("_", "-")+"-preprocess-{}".format(strftime("%d-%H-%M-%S", gmtime()))
sklearn_processor.run(
    code=preprocessing_input_code,
    job_name=processing_job_name,
    inputs=[ProcessingInput(input_name="data",
                            source=preprocessing_input_data,
                            destination="/opt/ml/processing/input/data"),
            ProcessingInput(input_name="model",
                            source=preprocessing_input_model,
                            destination="/opt/ml/processing/input/model")],
    outputs=[
        ProcessingOutput(output_name="output",
                         destination=preprocessing_output,
                         source="/opt/ml/processing/output")],
    wait=False,
)
preprocessing_job_description = sklearn_processor.jobs[-1].describe()
And in funciones_altas.py I'm using ohe_altas.tar.gz and not model_lgb.tar.gz, which makes this error super weird.
Can you help me?
Looks like you are using the SageMaker-generated execution role, and the error is related to S3 permissions.
Here are a couple of things you can do:
Make sure the policies on the role grant access to your bucket.
Check whether the objects in your bucket are encrypted; if so, also attach a KMS policy to the role you are linking to the job. https://aws.amazon.com/premiumsupport/knowledge-center/s3-403-forbidden-error/
You can always create your own role as well and pass its ARN to the code that runs the processing job.
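For example, a minimal sketch of passing your own role ARN (the role name here is hypothetical; it would need S3 and, if applicable, KMS permissions on the bucket):
# hypothetical custom role with access to the pocaaml bucket
custom_role_arn = "arn:aws:iam::029294541817:role/my-processing-role"
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=custom_role_arn,
    instance_type='ml.m5.4xlarge',
    instance_count=1
)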

Django Storage and Boto3 not retrieving Media from AWS S3

I am using a development server to test uploading and retrieving static files from AWS S3 using Django storages and Boto3. The file upload worked but I cannot retrieve the files.
This is what I get:
And when I check out the URL in another tab I get this:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>IllegalLocationConstraintException</Code>
<Message>The me-south-1 location constraint is incompatible for the region specific endpoint this request was sent to.</Message>
<RequestId></RequestId>
<HostId></HostId>
</Error>
Also I configured the settings.py with my own credentials and IAM user
AWS_ACCESS_KEY_ID = <key>
AWS_SECRET_ACCESS_KEY = <secret-key>
AWS_STORAGE_BUCKET_NAME = <bucket-name>
AWS_DEFAULT_ACL = None
AWS_S3_FILE_OVERWRITE = False
AWS_S3_REGION_NAME = 'me-south-1'
AWS_S3_USE_SSL = True
AWS_S3_VERIFY = False
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
Please check in your AWS Identity & Access Management (IAM) Console whether your access keys have proper S3 permissions assigned to them.
Also, make sure you have installed the AWS CLI and set up your credentials on your machine.
You can try running the below command and verify it.
$ aws s3 ls
2018-12-11 17:08:50 my-bucket
2018-12-14 14:55:44 my-bucket2
Reference: https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html
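You can also verify the same credentials directly from Python with a quick boto3 check (a minimal sketch, reusing the values already defined in settings.py):
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_S3_REGION_NAME)
# prints the number of objects visible with these credentials
print(s3.list_objects_v2(Bucket=AWS_STORAGE_BUCKET_NAME).get("KeyCount"))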

How to download data from AWS in python

I am new to AWS and boto. The data I want to download is on AWS, and I have the access key and the secret key. My problem is I do not understand the approaches I found. For instance, this code:
import boto
import boto.s3.connection

def download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path):
    conn = boto.connect_s3(aws_access_key_id=access_key,
                           aws_secret_access_key=secret_key,
                           host='s3-{}.amazonaws.com'.format(region),
                           calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = 'us-west-1'
access_key = # the key here
secret_key = # the secret key here
bucket_name = 'temp_name'
key = '<folder…/filename>'  # the unique identifier of the object
local_path = # local path

download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path)
What I don't understand is the 'key', 'bucket_name', and 'local_path'. What is the 'key' compared to the access key and secret key? I was not given a 'key'. Also, is 'bucket_name' the name of the bucket on AWS (I was not provided with the bucket name), and is 'local_path' the directory where I want to save the data?
You are right.
bucket_name = the name of your S3 bucket
key = the object key, i.e. the full path of the file inside the bucket (e.g. if you have a file named a.txt in folder x, the key is x/a.txt). Refer to this link.
local_path = where you want to save the data on your local machine
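If you can use the newer boto3 SDK instead of the legacy boto shown in the question, the equivalent download is shorter; a minimal sketch, where the variables have the same meaning as above:
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name=region)
# download the object identified by `key` from the bucket to a local file
s3.download_file(bucket_name, key, local_path)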
It sounds like the data is stored in Amazon S3.
You can use the AWS Command-Line Interface (CLI) to access Amazon S3.
To view the list of buckets in that account:
aws s3 ls
To view the contents of a bucket:
aws s3 ls bucket-name
To copy a file from a bucket to the current directory:
aws s3 cp s3://bucket-name/filename.txt .
Or sync a whole folder:
aws s3 sync s3://bucket-name/folder/ local-folder/

AWS Python script vs AWS CLI

I downloaded the AWS CLI and was able to successfully list objects from my bucket, but doing the same from a Python script does not work; the error is a Forbidden error.
How should I configure boto to use the same default AWS credentials (as used by the AWS CLI)?
Thank you
import logging
import sys
import time
import urllib, subprocess, boto, boto.utils, boto.s3

logger = logging.getLogger("test")
formatter = logging.Formatter('%(asctime)s %(message)s')
file_handler = logging.FileHandler("test.log")
file_handler.setFormatter(formatter)
stream_handler = logging.StreamHandler(sys.stderr)
logger.addHandler(file_handler)
logger.addHandler(stream_handler)
logger.setLevel(logging.INFO)

# wait until user data is available
while True:
    logger.info('**************************** Test starts *******************************')
    userData = boto.utils.get_instance_userdata()
    if userData:
        break
    time.sleep(5)

bucketName = ''
deploymentDomainName = ''

if bucketName:
    from boto.s3.key import Key
    s3Conn = boto.connect_s3('us-east-1')
    logger.info(s3Conn)
    bucket = s3Conn.get_bucket('testbucket')
    key = Key(bucket)  # the original snippet set key.key without creating a Key object
    key.key = 'test.py'
    key.get_contents_to_filename('test.py')
The CLI command is:
aws s3api get-object --bucket testbucket --key test.py my.py
Is it possible to use the latest Python SDK from Amazon (Boto 3)? If so, set up your credentials as outlined here: Boto 3 Quickstart.
Also, you might check your environment variables. If they don't exist, that is okay. If they don't match those on your account, then that could be the problem, as some AWS SDKs and other tools will use environment variables over the config files.
*nix:
echo $AWS_ACCESS_KEY_ID && echo $AWS_SECRET_ACCESS_KEY
Windows:
echo %AWS_ACCESS_KEY_ID% & echo %AWS_SECRET_ACCESS_KEY%
(sorry if my windows-foo is weak)
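If you do move to boto3, it picks up the same default credentials the CLI uses (from ~/.aws/credentials or environment variables) without any extra configuration; a minimal sketch of the equivalent of the aws s3api get-object call above:
import boto3

# uses the same default credential chain as the AWS CLI
s3 = boto3.client("s3")
s3.download_file("testbucket", "test.py", "my.py")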
When you use the CLI, by default it takes credentials from the ~/.aws/credentials file, but when running boto you will have to specify the access key and secret key in your Python script.
import boto
import boto.s3.connection
access_key = 'put your access key here!'
secret_key = 'put your secret key here!'
conn = boto.connect_s3(
    aws_access_key_id = access_key,
    aws_secret_access_key = secret_key,
    host = 'bucketname.s3.amazonaws.com',
    #is_secure=False, # uncomment if you are not using ssl
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)