'foo.bar.com.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com' - django

I'm using django, and I store things like images on S3 (for this I'm using boto), but recently I got this error:
'foo.bar.com.s3.amazonaws.com' doesn't match either of
'*.s3.amazonaws.com', 's3.amazonaws.com'
I've been searching for a possible solution for about two days, but the only thing suggested is to change boto's source code, which I can't do in production.
Edit: Using Django 1.5.8, Boto 2.38.0
Any help would be appreciated.
Thx in advance.

As has been said, the problem occurs with buckets whose names contain dots. Below is an example that works around this.
import boto
from boto.s3.connection import VHostCallingFormat

c = boto.connect_s3(aws_access_key_id='your-access-key',
                    is_secure=False,
                    aws_secret_access_key='your-secret-access',
                    calling_format=VHostCallingFormat())
b = c.get_bucket(bucket_name='your.bucket.with.dots', validate=True)
print(b)

You can use this monkey patch in your connection.py file (boto/connection.py):
import ssl

_old_match_hostname = ssl.match_hostname

def _new_match_hostname(cert, hostname):
    if hostname.endswith('.s3.amazonaws.com'):
        pos = hostname.find('.s3.amazonaws.com')
        hostname = hostname[:pos].replace('.', '') + hostname[pos:]
    return _old_match_hostname(cert, hostname)

ssl.match_hostname = _new_match_hostname
(source)
Another solution is below:

It's a known issue: #2836. It's due to the dots in your bucket name.
I had this issue a few days ago. A user seems to have succeeded in fixing it by setting:
AWS_S3_HOST = 's3-eu-central-1.amazonaws.com'
AWS_S3_CALLING_FORMAT = 'boto.s3.connection.OrdinaryCallingFormat'
But it didn't work for me.
Otherwise, you can create a bucket without dots (e.g. foo-bar-com). It will work. This is what I did to temporarily fix this issue.
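For completeness, here is a hedged sketch of the same idea applied directly with boto; OrdinaryCallingFormat puts the bucket name in the path instead of the hostname, which avoids the wildcard-certificate mismatch (the region host below is only an example):
import boto
from boto.s3.connection import OrdinaryCallingFormat

# hedged sketch -- replace the host with your bucket's region endpoint
conn = boto.connect_s3(aws_access_key_id='your-access-key',
                       aws_secret_access_key='your-secret-access',
                       host='s3-eu-central-1.amazonaws.com',
                       calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('your.bucket.with.dots')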

I had the same error when I moved from python3.4 to python3.5 (boto version remained the same).
The problem was solved by moving from boto to boto3 package.
Example:
import boto3

s3_client = boto3.client("s3", aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)
body = open(local_filename, 'rb').read()
resp = s3_client.put_object(ACL="private", Body=body, Bucket=your_bucket_name, Key=path_to_the_file_inside_the_bucket)
if resp["ResponseMetadata"]["HTTPStatusCode"] != 200:
    raise Exception("Something went wrong...")


AttributeError: 'DataTransferServiceClient' object has no attribute 'project_transfer_config_path'

Got the following code:
import time
from google.protobuf.timestamp_pb2 import Timestamp
from google.cloud import bigquery_datatransfer_v1

def runQuery(parent, requested_run_time):
    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    projectid = '[enter your projectId here]'    # Enter your projectID here
    transferid = '[enter your transferId here]'  # Enter your transferId here
    parent = client.project_transfer_config_path(projectid, transferid)
    start_time = bigquery_datatransfer_v1.types.Timestamp(seconds=int(time.time() + 10))
    response = client.start_manual_transfer_runs(parent, requested_run_time=start_time)
    print(response)
We used it in a few different projects and cases and everything worked fine. Today I deployed another function using this code and keep getting the following error:
AttributeError: 'DataTransferServiceClient' object has no attribute
'project_transfer_config_path'
What am I missing?
Thank you!
You are probably using a newer version (2.0.0 or 2.1.0) of the google-cloud-bigquery-datatransfer client library. In these versions, most utility methods have been removed, one of them being project_transfer_config_path.
You can use the method transfer_config_path of the client to achieve the same result.
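A hedged sketch of that change, reusing the projectid and transferid variables from the question:
from google.cloud import bigquery_datatransfer_v1

client = bigquery_datatransfer_v1.DataTransferServiceClient()
# transfer_config_path replaces the removed project_transfer_config_path helper
parent = client.transfer_config_path(projectid, transferid)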
I would strongly suggest that you study the Migration Guide to 2.0.0 as there might be other changes that you need to make too.
In case you are using version 2.0.0 and not 2.1.0, I would recommend upgrading to the latest since there are breaking changes between them, for example the import paths that were changed in 2.0.0 have been reverted in 2.1.0.

How to download a snappy.parquet file from s3 using Boto in Python

I'm new to this, and I'm trying to download a snappy.parquet file from Amazon S3 that I can later convert to a CSV file.
I tried working with the following example I found online, and I get an empty folder. Can anyone please help me?
import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError

DOWNLOAD_LOCATION_PATH = ""
BUCKET_NAME = ""
AWS_ACCESS_KEY_ID = ""
AWS_ACCESS_SECRET_KEY = ""

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)

# go through the list of files
bucket_list = bucket.list()
for l in bucket_list:
    key_string = str(l.key)
    s3_path = DOWNLOAD_LOCATION_PATH + key_string
    try:
        print ("Current File is ", s3_path)
        l.get_contents_to_filename(s3_path)
    except (OSError, S3ResponseError) as e:
        pass
        # check if the file has been downloaded locally
        if not os.path.exists(s3_path):
            try:
                os.makedirs(s3_path)
            except OSError as exc:
                # guard against race conditions
                import errno
                if exc.errno != errno.EEXIST:
                    raise
The script you are using appears to recursively download the contents of the specified S3 bucket (BUCKET_NAME) to the specified local directory (DOWNLOAD_LOCATION_PATH). FWIW, I notice this script looks like it comes from here.
The "Current File is ..." output line should show you the progress of these files being written. One problem you might be having is due to this line:
s3_path = DOWNLOAD_LOCATION_PATH + key_string
If you had specified DOWNLOAD_LOCATION_PATH at the top as a directory without a trailing '/' character, e.g. like this:
DOWNLOAD_LOCATION_PATH = '/tmp/my_dir'
then the files being downloaded would be written not underneath the /tmp/my_dir directory, but directly in /tmp/ with a my_dir prefix on each filename! You can fix this by changing this line to:
s3_path = os.path.join(DOWNLOAD_LOCATION_PATH, key_string)
Other than that, the script appears to work alright. You may want to add this line at the very top:
from __future__ import print_function
if you are still using Python 2.x, otherwise the print output will look a bit odd (print will think you are printing a 2-Tuple).
Your question also makes it sound like you really only want/need to download a single file from the bucket -- if so, this isn't really a great script to be using, since it's downloading everything.
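If that is the case, a minimal hedged sketch with boto (the key name and local path below are just placeholders) would be:
import boto

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)
# 'path/in/bucket/file.snappy.parquet' is a hypothetical key name
key = bucket.get_key('path/in/bucket/file.snappy.parquet')
key.get_contents_to_filename('/tmp/file.snappy.parquet')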

django boto3: NoCredentialsError -- Unable to locate credentials

I am trying to use boto3 in my django project to upload files to Amazon S3. Credentials are defined in settings.py:
AWS_ACCESS_KEY = xxxxxxxx
AWS_SECRET_KEY = xxxxxxxx
S3_BUCKET = xxxxxxx
In views.py:
import boto3
s3 = boto3.client('s3')
path = os.path.dirname(os.path.realpath(__file__))
s3.upload_file(path+'/myphoto.png', S3_BUCKET, 'myphoto.png')
The system complains about Unable to locate credentials. I have two questions:
(a) It seems that I am supposed to create a credential file ~/.aws/credentials. But in a django project, where do I have to put it?
(b) The s3 method upload_file takes a file path/name as its first argument. Is it possible that I provide a file stream obtained by a form input element <input type="file" name="fileToUpload">?
This is what I use for a direct upload, I hope it provides some assistance.
import boto
from boto.exception import S3CreateError
from boto.s3.connection import S3Connection

conn = S3Connection(settings.AWS_ACCESS_KEY,
                    settings.AWS_SECRET_KEY,
                    is_secure=True)
try:
    bucket = conn.create_bucket(settings.S3_BUCKET)
except S3CreateError as e:
    bucket = conn.get_bucket(settings.S3_BUCKET)

k = boto.s3.key.Key(bucket)
k.key = filename
k.set_contents_from_filename(filepath)
Not sure about (a), but Django is very flexible with file management.
Regarding (b), you can also sign the upload and do it directly from the client to reduce bandwidth usage; it's quite sneaky and secure too. You need to use some JavaScript to manage the upload. If you want details I can include them here.
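A hedged sketch of that idea with boto3 (the key name is a placeholder; passing the keys explicitly also sidesteps the "Unable to locate credentials" error from (a)):
import boto3
from django.conf import settings

s3 = boto3.client('s3',
                  aws_access_key_id=settings.AWS_ACCESS_KEY,
                  aws_secret_access_key=settings.AWS_SECRET_KEY)

# Generate a short-lived presigned POST the browser can use to upload straight to S3
presigned = s3.generate_presigned_post(Bucket=settings.S3_BUCKET,
                                       Key='uploads/myphoto.png',
                                       ExpiresIn=3600)
# presigned['url'] and presigned['fields'] feed the client-side form/JavaScript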

Collectstatic and S3: not finding updated files

I am using Amazon S3 to store static files for a Django project, but collectstatic is not finding updated files - only new ones.
I've been looking for an answer for ages, and my guess is that I have something configured incorrectly. I followed this blog post to help get everything set up.
I also ran into this question which seems identical to my problem, but I have tried all the solutions already.
I even tried using this plugin which is suggested in this question.
Here is some information that might be useful:
settings.py
...
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'django.contrib.staticfiles.finders.DefaultStorageFinder',
)
...
# S3 Settings
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = S3_URL
AWS_PRELOAD_METADATA = False
requirements.txt
...
Django==1.5.1
boto==2.10.0
django-storages==1.1.8
python-dateutil==2.1
Edit1:
I apologize if this question is too specific to my own circumstances to be of help to a large audience. Nonetheless, this has been hampering my productivity for a long time and I have wasted many hours looking for solutions, so I am starting a bounty to reward anyone who can help troubleshoot this problem.
Edit2:
I just ran across a similar problem somewhere. I am in a different timezone than the location of my AWS bucket. If collectstatic uses timestamps by default, could this interfere with the process?
Thanks
I think I solved this problem. Like you, I have spent so many hours on this problem. I am also on the bug report you found on bitbucket. Here is what I just accomplished.
I had
django-storages==1.1.8
Collectfast==0.1.11
This did not work at all. Deleting everything once did not work either; after that, it could not pick up modifications and refused to update anything.
The problem is our time zone. S3 reports that the files it has were last modified later than the ones we want to upload, so django collectstatic will not try to copy the new ones over at all, and will call the files "unmodified". For example, here is what I saw before my fix:
Collected static files in 0:00:45.292022.
Skipped 407 already synced files.
0 static files copied, 1 unmodified.
My solution is: to hell with modified time! Besides the time zone problem we are solving here, what if I made a mistake and need to roll back? It would refuse to deploy the old static files and leave my website broken.
Here is my pull request to Collectfast https://github.com/FundedByMe/collectfast/pull/11 . I still left a flag so if you really want to check modified time, you can still do it. Before it got merged, just use my code at https://github.com/sunshineo/collectfast
You have a nice day!
--Gordon
PS: Stayed up till 4:40am for this. My day is ruined for sure.
After hours of digging around, I found this bug report.
I changed my requirements to revert to a previous version of Django storages.
django-storages==1.1.5
You might want to consider using this plugin written by antonagestam on Github:
https://github.com/FundedByMe/collectfast
It compares the checksum of the files, which is a guaranteed way of determining when a file has changed. It's the accepted answer at this other stackoverflow question: Faster alternative to manage.py collectstatic (w/ s3boto storage backend) to sync static files to s3?
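If you go that route, here is a hedged sketch of the older Collectfast setup (check the Collectfast README for your version, since the exact settings have changed across releases):
# settings.py -- rough sketch of the older Collectfast setup
AWS_PRELOAD_METADATA = True

INSTALLED_APPS = (
    # ...
    'collectfast',  # newer versions want this listed before 'django.contrib.staticfiles'
    'django.contrib.staticfiles',
)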
There are some good answers here but I spent some time on this today so figured I'd contribute one more in case it helps someone in the future. Following advice found in other threads I confirmed that, for me, this was indeed caused by a difference in time zone. My django time wasn't incorrect but was set to EST and S3 was set to GMT. In testing, I reverted to django-storages 1.1.5 which did seem to get collectstatic working. Partially due to personal preference, I was unwilling to a) roll back three versions of django-storages and lose any potential bug fixes or b) alter time zones for components of my project for what essentially boils down to a convenience function (albeit an important one).
I wrote a short script to do the same job as collectstatic without the aforementioned alterations. It will need a little modification for your app, but should work for standard cases if it is placed at the app level and static_dirs is replaced with the names of your project's apps. It is run from the terminal with 'python whatever_you_call_it.py -e environment_name' (set this to your aws bucket).
import sys, os, subprocess
import boto3
import botocore
from boto3.session import Session
import argparse
import os.path, time
from datetime import datetime, timedelta
import pytz

utc = pytz.UTC

DEV_BUCKET_NAME = 'dev-homfield-media-root'
PROD_BUCKET_NAME = 'homfield-media-root'
static_dirs = ['accounts', 'messaging', 'payments', 'search', 'sitewide']

def main():
    try:
        parser = argparse.ArgumentParser(description='Homfield Collectstatic. Our version of collectstatic to fix django-storages bug.\n')
        parser.add_argument('-e', '--environment', type=str, required=True, help='Name of environment (dev/prod)')
        args = parser.parse_args()
        vargs = vars(args)
        if vargs['environment'] == 'dev':
            selected_bucket = DEV_BUCKET_NAME
            print "\nAre you sure? You're about to push to the DEV bucket. (Y/n)"
        elif vargs['environment'] == 'prod':
            selected_bucket = PROD_BUCKET_NAME
            print "Are you sure? You're about to push to the PROD bucket. (Y/n)"
        else:
            raise ValueError

        acceptable = ['Y', 'y', 'N', 'n']
        confirmation = raw_input().strip()
        while confirmation not in acceptable:
            print "That's an invalid response. (Y/n)"
            confirmation = raw_input().strip()

        if confirmation == 'Y' or confirmation == 'y':
            run(selected_bucket)
        else:
            print "Collectstatic aborted."
    except Exception as e:
        print type(e)
        print "An error occurred. S3 staticfiles may not have been updated."

def run(bucket_name):
    # open a session with S3
    session = Session(aws_access_key_id='{aws_access_key_id}',
                      aws_secret_access_key='{aws_secret_access_key}',
                      region_name='us-east-1')
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_name)

    # loop through static directories
    for directory in static_dirs:
        rootDir = './' + directory + "/static"
        print('Checking directory: %s' % rootDir)

        # loop through subdirectories
        for dirName, subdirList, fileList in os.walk(rootDir):
            # loop through all files in the subdirectory
            for fname in fileList:
                try:
                    if fname == '.DS_Store':
                        continue

                    # find and qualify the file's last modified time
                    full_path = dirName + "/" + fname
                    last_mod_string = time.ctime(os.path.getmtime(full_path))
                    file_last_mod = datetime.strptime(last_mod_string, "%a %b %d %H:%M:%S %Y") + timedelta(hours=5)
                    file_last_mod = utc.localize(file_last_mod)

                    # truncate the path for the S3 loop and find the object; delete and re-upload it if it has been updated
                    s3_path = full_path[full_path.find('static'):]
                    found = False
                    for key in bucket.objects.all():
                        if key.key == s3_path:
                            found = True
                            last_mode_date = key.last_modified
                            if last_mode_date < file_last_mod:
                                key.delete()
                                s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'r'), ContentType=get_mime_type(full_path))
                                print "\tUpdated : " + full_path
                    if not found:
                        # if the file was not found in S3 it is new, so send it up
                        print "\tFound a new file. Uploading : " + full_path
                        s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'r'), ContentType=get_mime_type(full_path))
                except:
                    print "ALERT: Big time problems with: " + full_path + ". I'm bowin' out dawg, this shitz on u."

def get_mime_type(full_path):
    try:
        last_index = full_path.rfind('.')
        if last_index < 0:
            return 'application/octet-stream'
        extension = full_path[last_index:]
        return {
            '.js': 'application/javascript',
            '.css': 'text/css',
            '.txt': 'text/plain',
            '.png': 'image/png',
            '.jpg': 'image/jpeg',
            '.jpeg': 'image/jpeg',
            '.eot': 'application/vnd.ms-fontobject',
            '.svg': 'image/svg+xml',
            '.ttf': 'application/octet-stream',
            '.woff': 'application/x-font-woff',
            '.woff2': 'application/octet-stream'
        }[extension]
    except KeyError:
        # fall back to a generic type for unknown extensions
        print 'ALERT: Couldn\'t match mime type for ' + full_path + '. Sending to S3 as application/octet-stream.'
        return 'application/octet-stream'

if __name__ == '__main__':
    main()
I had a similar problem pushing new files to an S3 bucket (it had previously worked well), but it was not a problem with django or python; on my end I fixed the issue by deleting my local repository and cloning it again.

How do you set "Content-Type" when saving to S3 using django-storages with S3boto backend?

I am using django-storages with s3boto as a backend.
I have one bucket with two folders - one for static and one for media. I achieve this using django-s3-folder-storage.
As well as saving to S3 using a model, I also want to implement an image-resize-and-cache function that saves files to S3. To do this I interact directly with my S3 bucket. The code works, but the Content-Type isn't set correctly on S3.
in iPython:
In [2]: from s3_folder_storage.s3 import DefaultStorage
In [3]: s3media = DefaultStorage()
In [4]: s3media
Out[4]: <s3_folder_storage.s3.DefaultStorage at 0x4788780>
Test we're accessing the right bucket - storage_test is one I created earlier:
In [5]: s3media.exists('storage_test')
Out[5]: True
In [6]: s3media.open("test.txt", "w")
Out[6]: <S3BotoStorageFile: test.txt>
In [7]: test = s3media.open("test.txt", "w")
In [8]: test
Out[8]: <S3BotoStorageFile: test.txt>
In [9]: test.key.content_type = "text/plain"
In [10]: test.write("...")
In [11]: test.close()
In [12]: test = s3media.open("test.txt", "w")
In [13]: test.key.content_type
Out[13]: 'binary/octet-stream'
I've also tried, instead of In [9], using test.key.metadata and test.key.set_metadata. Neither of them works.
How do I set the correct Content-Type?
If you go through the source code of the S3BotoStorageFile class and its write function, the headers are updated from only two places:
upload_headers.update(self._storage.headers), where self._storage.headers is taken from AWS_HEADERS
self._storage.default_acl
And in the function _flush_write_buffer only self._storage.headers is considered; check for the line headers = self._storage.headers.copy().
So updating test.key.content_type will not work.
Instead of test.key.content_type = "text/plain" at In [9], try using test._storage.headers['Content-Type'] = 'text/plain'; it should work.
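A hedged sketch of that suggestion, reusing the names from the question:
test = s3media.open("test.txt", "w")
test._storage.headers['Content-Type'] = 'text/plain'  # set the header on the storage before writing
test.write("...")
test.close()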
This is for Boto3 ONLY, not Boto. If you would like to set those headers, you will need to access the object like so (file_ refers to a FileField whose storage is set up to use Boto3 from django-storages):
file_.storage.object_parameters = { 'ContentType': 'text/plain' }
NOTE: it requires header names to be camel case, so Content-Type becomes ContentType, Content-Disposition becomes ContentDisposition, etc. Hope this helps!
Now you can use django-storages >= 1.4 and it automatically guesses the mime types.
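A hedged sketch of the minimal settings for that approach (the storage path depends on whether you use the boto or boto3 backend; the bucket name is hypothetical):
# settings.py -- hedged sketch; with a recent django-storages the backend
# guesses the Content-Type from the filename, so no header tweaking is needed
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'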
According to this answer, the Content-Type isn't metadata but rather a header that you set when you upload the file.
I've had a similar issue: I wanted to set my header for all the files uploaded to S3 using django-storages, without relying on the default library approach, which guesses the mime type based on the filename.
Please note that you can tweak the way the header is set; you don't have to hard-code it like I did (my case was specific).
This is what worked for me:
Implement a custom storage class:
import os

from storages.backends.s3boto3 import S3Boto3Storage

class ManagedS3BotoS3Storage(S3Boto3Storage):
    def _save(self, name, content):
        cleaned_name = self._clean_name(name)
        name = self._normalize_name(cleaned_name)
        params = self._get_write_parameters(name, content)
        content_type = "application/octet-stream"  # Content-Type that I wanted to have for each file
        params["ContentType"] = content_type
        encoded_name = self._encode_name(name)
        obj = self.bucket.Object(encoded_name)
        if self.preload_metadata:
            self._entries[encoded_name] = obj
        content.seek(0, os.SEEK_SET)
        obj.upload_fileobj(content, ExtraArgs=params)
        return cleaned_name
Use ManagedS3BotoS3Storage in model:
class SomeCoolModel(models.Model):
    file = models.FileField(
        storage=ManagedS3BotoS3Storage(bucket="my-awesome-S3-bucket"),
        upload_to="my_great_path_to_file",
    )
Run python manage.py makemigrations.
That's it. After this, all the files I had were uploaded with Content-Type: "application/octet-stream".