S3 versioning with django-storages - django

I use django-storages in my app, and I want to use S3 versioning just out of the box. I mean, I want to store and retrieve different versions of the same file without implementing any additional mechanism.
By reading the docs, I understand that:
Retrieving an object is as easy as adding a versionId parameter to the GET request.
Uploading an object is handled by S3 itself; you just need to enable versioning on your bucket.
But, going to the code. If I have this:
from django.db import models
from django_s3_storage.storage import S3Storage

storage = S3Storage(aws_s3_bucket_name='test_bucket')

class Document(models.Model):
    name = models.CharField(max_length=255)
    s3_file = models.FileField(storage=storage)
How can I get one specific version of a Document? Something like:
doc = Document.objects.get(pk=XX)
doc.s3_file.read(version=XXXXXXXXXX) # Something like this?
I've been reading the official documentation, but can't find how to:
Get available versions of an object
Retrieve one specific version
EDIT: Reading the source code, I understand I could pass extra parameters to the url() call, but I'm not sure which parameter to use (version?) or how to get the existing versions of an object.
Any help is appreciated.

OK, the comment from @dirkgroten was pretty accurate. So:
get-object, using version-id, to get a specific version of an object
list-object-versions to get all available versions of an object
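For reference, a minimal boto3 sketch of those two calls (the bucket name and key are illustrative; for a Document the key would be whatever doc.s3_file.name holds):

import boto3

s3 = boto3.client('s3')

# List all available versions of one object
response = s3.list_object_versions(Bucket='test_bucket', Prefix='documents/report.pdf')
for version in response.get('Versions', []):
    print(version['VersionId'], version['LastModified'], version['IsLatest'])

# Retrieve one specific version of that object
obj = s3.get_object(
    Bucket='test_bucket',
    Key='documents/report.pdf',
    VersionId='PUT_A_REAL_VERSION_ID_HERE',
)
data = obj['Body'].read()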

Related

Django: Insert image to PostgreSQL (PgAdmin4)

I'm currently working on an e-shop. My idea is to store images with Django models in PgAdmin4. As I saw in older posts, methods like bytea('D:\image.jpg') and so on just convert the string constant to its binary representation.
So my question is whether there is a newer method to store the actual image, or whether it is possible to grab the image via a path?
models.py
image = models.ImageField(null=True, blank=True)
PgAdmin4
INSERT INTO product_images (id, image)
VALUES (SERIAL, ?); -- how to insert image?
There are several options for storing images. The first is to use a storage service like S3, which is what I recommend. You can read this article for more detailed information, and I can also recommend a third-party package I have used that makes S3 ready to use with Django. With this option, the ImageField will store only the path to the file in S3.
Another option, if you are using only one server, is to keep the images on that server's local storage. Again, the ImageField will store only the path.
If you want to keep the image directly in the database, you can follow this link. Currently, there is no newer method for it.
But I have to say that I think using a storage service like S3 is the best way under all circumstances.
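If you go with S3 and django-storages, a minimal configuration sketch could look like this (bucket name and credentials are placeholders; check the django-storages docs for your version, since newer Django releases configure storages slightly differently):

# settings.py
INSTALLED_APPS = [
    # ...
    'storages',
]

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = 'my-shop-media'
AWS_ACCESS_KEY_ID = '<your access key>'
AWS_SECRET_ACCESS_KEY = '<your secret key>'

# models.py -- the ImageField then stores only the file's path/key in S3
from django.db import models

class ProductImage(models.Model):
    image = models.ImageField(upload_to='products/', null=True, blank=True)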

Does emrfs support custom query parameters in s3 url?

Is it possible to add custom query parameters to an S3 URL?
We would like to add some custom metadata to S3 objects, but we would like it to be transparent to EMRFS.
Something like:
s3://bucket-name/object-name?x-amz-meta-tag=magic-tag
Then in our PySpark or hadoop job, we would like to write:
data.write.csv('s3://bucket-name/object-name?x-amz-meta-tag=magic-tag')
Trying this with EMRFS shows that it treats "object-name?x-amz-meta-tag=magic-tag" as the entire object name instead of ignoring the query parameters.
I can't speak for the closed-source EMRFS, but for the ASF S3 connectors the answer is "no". It's an interesting proposal, though; maybe you should think about contributing it to the ASF. Of course, that adds a new problem: what about existing users who create files with ? in their names, and how do you retain compatibility?
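For what it's worth, custom metadata is normally attached at write time through the S3 API rather than through the URL. A small boto3 sketch (bucket, key, and values are illustrative):

import boto3

s3 = boto3.client('s3')

# Custom metadata is stored on the object as x-amz-meta-* headers
s3.put_object(
    Bucket='bucket-name',
    Key='object-name',
    Body=b'example data',
    Metadata={'tag': 'magic-tag'},  # sent as x-amz-meta-tag
)

# Reading it back
head = s3.head_object(Bucket='bucket-name', Key='object-name')
print(head['Metadata'])  # {'tag': 'magic-tag'}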

AWS S3 error with PFFiles after importing the exported Parse data

It looks like Parse.com stores PFFile objects on AWS S3 and only keeps a reference to the actual S3 files in Parse for the PFFile object types.
So my problem is that I only get an AWS S3 link for my PFFile if I export the data using the out-of-the-box Parse.com export functionality. After I import the same data into my Parse application, the security settings on those PFFiles on S3 are somehow changed, so none of the PFFiles are accessible to me after the import due to a security error.
My question is, does anyone know how the security is being set on the PFFiles? Here's a link to PFFile https://parse.com/docs/osx/api/Classes/PFFile.html but I guess this is rather an advanced topic and wasn't revealed on this page.
While looking for a solution to this, all I found is this from their forum:
In this case, the PFFiles are stored in a different app. You might
need to download these files and upload them again to the new app and
update the pointers. I know this is not a great answer but we're
working on making this process more straightforward.
https://www.parse.com/questions/import-pffile-object-not-working-in-iphone-application

python boto for aws s3, how to get sorted and limited files list in bucket?

If there are too many files in a bucket and I want to get only the 100 newest files, how can I get only that list?
s3.bucket.list doesn't seem to have that function. Does anybody know how to do this?
Please let me know. Thanks.
There is no way to do this type of filtering on the service side. The S3 API does not support it. You might be able to accomplish something like this by using prefixes in your object names. For example, if you named all of your objects using a pattern like this:
YYYYMMDD/<objectname>
20140618/foobar (as an example)
you could use the prefix parameter of the ListBucket request in S3 to return only the objects that were stored today. In boto, this would look like:
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
for key in bucket.list(prefix='20140618'):
    # do something with the key object
    print(key.name)
You would still have to retrieve all of the objects with that prefix and then sort them locally by their last_modified date, but that would be much easier than listing all of the objects in the bucket and then sorting.
The other option would be to store metadata about the S3 objects in a database like DynamoDB and then query that database to find the objects to retrieve from S3.
You can find out more about hierarchical listing in S3 here
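Putting the two steps together, a rough sketch of the prefix-plus-local-sort approach described above (bucket name and prefix are illustrative):

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

# List only today's prefix, then sort locally by last_modified (newest first).
# last_modified is an ISO-8601 string, so it sorts chronologically as text.
keys = list(bucket.list(prefix='20140618'))
keys.sort(key=lambda k: k.last_modified, reverse=True)

for key in keys[:100]:  # the 100 newest objects under that prefix
    print(key.name, key.last_modified)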
Can you try this code? It worked for me.
import boto
import time

con = boto.connect_s3()
bucket = con.get_bucket('<your bucket name>')
bucket_keys = bucket.get_all_keys()

# Pair each key name with its parsed last-modified timestamp
key_repo = []
for obj in bucket_keys:
    t = (obj.key, time.strptime(obj.last_modified[:19], "%Y-%m-%dT%H:%M:%S"))
    key_repo.append(t)

# Sort newest first and take the top of the list
key_repo.sort(key=lambda item: item[1], reverse=True)
for key in key_repo[:10]:  # top 10 items in the list
    print(key[0], key[1])
PS: I am a beginner in Python, so the code might not be optimized. Feel free to edit the answer to improve it.

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to either store a file to S3 directly using a FileField or give the user the option to input a URL directly. To handle that, I used two fields in my model: one FileField and one URLField. In the template I could then use 'or' to see which one exists and use it, like {{ instance.filefield or instance.url }}.
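A minimal sketch of that two-field model (field names are illustrative):

from django.db import models

class Upload(models.Model):
    # Populated when the file is uploaded through Django/django-storages
    filefield = models.FileField(upload_to='uploads/', blank=True)
    # Populated when only a direct URL (e.g. an S3 link) is known
    url = models.URLField(blank=True)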
This is untested, but you should be able to use:
from django.core.files.storage import default_storage
f = default_storage.open('name_you_expect_in_s3', 'r')
#f is an instance of S3BotoStorageFile, and can be assigned to a field
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. It looks like it plays nicely with django-storages.
I've never used Django, so YMMV :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively, you could subclass S3BotoStorage, override the _save method, and allow for an optional filepath kwarg that sidesteps all the other saving logic and just returns the cleaned name.
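A rough sketch of that subclass idea, assuming the older storages.backends.s3boto backend where _save(name, content) uploads the content and returns the cleaned key. The filepath kwarg is my own addition and _clean_name is a private internal that varies between django-storages versions, so you would have to call _save yourself to use this path:

from storages.backends.s3boto import S3BotoStorage

class DirectS3BotoStorage(S3BotoStorage):
    def _save(self, name, content, filepath=None):
        # If the file already exists in S3 (uploaded directly from the
        # browser), skip the upload and just return its cleaned key so
        # the FileField ends up pointing at the existing object.
        if filepath is not None:
            return self._clean_name(filepath)
        return super(DirectS3BotoStorage, self)._save(name, content)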