Amazon Mechanical Turk ExternalQuestions Image URL - amazon-web-services

I'm trying to make an ExternalQuestion HIT and was wondering how I could pass S3 image URLs to the hit and display them.
I considered passing the image URL as a URL parameter, but that didn't seem to make sense since the value is itself a URL.
Is it possible to do something like this?
<ExternalQuestion xmlns="[the ExternalQuestion schema URL]">
<ExternalURL>http://tictactoe.amazon.com/gamesurvey.cgi?gameid=01523</ExternalURL>
<FrameHeight>400</FrameHeight>
<ImageURL1>[image_url]</ImageURL1>
<ImageURL2>[image_url]</ImageURL2>
....
</ExternalQuestion>

It looks like you're trying to do something like what the Requester User Interface does when creating a batch, which involves building an HTMLQuestion out of a template HTML page with some named variables and a CSV file containing the values for each variable.
The ExternalQuestion only contains the ExternalURL and FrameHeight parameters. If you want to display different images, those need to be on their own pages (i.e., your server needs to decide what image to display to a given worker OR you have to create multiple HITs, one for each page that contains a distinct URL).
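A minimal sketch of the "one HIT per page" route, assuming the page behind ExternalURL reads the images from its query string. The parameter name image_url and the helper build_external_question are hypothetical, and the schema URL is left as the same placeholder used in the question. URL-encoding the image URL is what makes it safe to carry inside another URL.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Placeholder kept from the question; substitute the real schema URL.
SCHEMA_URL = "[the ExternalQuestion schema URL]"


def build_external_question(base_url, image_urls, frame_height=400,
                            schema_url=SCHEMA_URL):
    """Return ExternalQuestion XML whose ExternalURL carries the image URLs.

    Each image URL is percent-encoded and appended as a repeated
    ``image_url`` query parameter (a hypothetical name).
    """
    query = urlencode([("image_url", url) for url in image_urls])
    separator = "&" if "?" in base_url else "?"
    external_url = f"{base_url}{separator}{query}"
    return "\n".join([
        f'<ExternalQuestion xmlns="{schema_url}">',
        # escape() turns "&" into "&amp;" so the XML stays well-formed.
        f"  <ExternalURL>{escape(external_url)}</ExternalURL>",
        f"  <FrameHeight>{frame_height}</FrameHeight>",
        "</ExternalQuestion>",
    ])
```

Calling this once per image (or per group of images) gives one question document per HIT; the page served at ExternalURL is then responsible for decoding the parameter and rendering the image.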

Related

Informatica Power center-- How to pass multiple values from transformation to HTTP transformation Base URL

Trying to pass multiple values in the HTTP transformation URL(Put method) but unable to do it. URL doesn't support query strings
Example:
URL: http://example.com/**page**
need to pass multiple values into the page(please see the URL) from another transformation. URL does not support the query string values. Any Idea how to pass multiple values to the URL.
Page values like "1234", "5678", "891". So the URL will be as shown below.
http://example.com/1234,
http://example.com/5678,
http://example.com/891
Thank you
If the page numbers are finite (and few), you can create that many HTTP transformations in a mapping.
If the page numbers are unbounded or numerous, you need to:
1. Create a list of pages.
2. Create a mapping with an HTTP transformation whose URL is http://example.com/$pageno
3. Create a shell script that iterates through the list above, creates a param file with $pageno=1234 etc., and kicks off the mapping with this param file.
4. Schedule this shell script using another workflow.
HTH
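The param-file step can be sketched as follows (in Python rather than shell). This is only an illustration: the folder, workflow and session names are hypothetical, the parameter name $pageno is copied from the answer, and actually starting the workflow with each file is left out.

```python
def render_param_file(folder, workflow, session, page, param_name="$pageno"):
    """Return the text of a parameter file that sets one page value.

    The section heading follows the [folder.WF:workflow.ST:session] form;
    all three names are placeholders for whatever the repository uses.
    """
    header = f"[{folder}.WF:{workflow}.ST:{session}]"
    return f"{header}\n{param_name}={page}\n"


def param_files(pages, folder, workflow, session):
    """Yield (file name, file content) pairs, one per page in the list."""
    for page in pages:
        yield f"page_{page}.par", render_param_file(folder, workflow, session, page)
```

For the pages in the question, `list(param_files(["1234", "5678", "891"], "MyFolder", "wf_pages", "s_pages"))` produces three files, each of which would be passed to one run of the mapping.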

How to reference other collections in postman

Is there a way to reference other collections from a specific collection. For example, if I have a file upload collection (something that uploads a file), I want to be able to use that from other collections. How would I reference the file upload?
Here's an example of what I'm talking about.
I have a collection where a file is uploaded and a calculation needs to be performed. The test or collection would go something like this where each step is a POST, GET, etc
Upload and run calculation:
1. Generate a token
   - make call
   - copy/save token value
2. Upload specific file (these would be 3 individual requests)
   - Upload file
   - Monitor upload status
   - Return ID of file uploaded
3. Run calculation
   - use ID to pass as parameter
   - pass other values to set up calculation
   - monitor run
   - validate results
In another collection I need to validate uploaded files metadata is correct. Not directly related to the one above, but has some similarities
1. Generate a token
   - make call
   - copy/save token value
2. Upload specific file (these would be 3 individual requests)
   - Upload file
   - Monitor upload status
   - Get final result and return ID of file uploaded
3. Get me
   - validate metadata is correct.
Steps 1 and 2 are common functionality, there would be no difference there. How could I extract those two steps as modular components or functionality so I can reference them from any collection?
For additional clarity, we use ReadyAPI and are able to do 'Run Test Case', which can obviously run another test case. We've separated the token and file upload functionality into its own test case and use it as a modular component. I'd like to achieve something similar with Postman.
Unfortunately, Postman collections work a little differently.
But you can merge your two collections into a single one and execute it as one collection.
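One way to do the merge outside the Postman UI is to combine the exported JSON files. The sketch below assumes both exports use the Collection v2.1 format (an "info" object plus an "item" list) and puts each source collection into its own folder; collection-level variables, auth and scripts are ignored here, and merge_collections is a hypothetical helper name.

```python
V21_SCHEMA = "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"


def merge_collections(name, *collections):
    """Combine exported collections into one, one folder per source.

    Each argument is the parsed JSON (a dict) of an exported collection.
    Only the request tree is carried over.
    """
    merged = {"info": {"name": name, "schema": V21_SCHEMA}, "item": []}
    for collection in collections:
        merged["item"].append({
            # A folder is an item that has its own nested "item" list.
            "name": collection["info"]["name"],
            "item": list(collection.get("item", [])),
        })
    return merged
```

Loading two exports with `json.load`, merging them, and writing the result with `json.dump` gives a file that can be imported back as a single collection, with the shared token and upload requests placed first so the later folders can rely on them.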

How to use Django ImageField, and why use it at all?

Up until now, I've been storing my image filenames in a CharField and saving the actual file directly to S3. This was a fine solution for my own usage. I'd like to reconsider using an ImageField, since now there will be other users and file input validation would be appropriate.
I have a couple of questions that weren't exactly answered after reading the docs and the source code for FileField (which appears to be essentially ImageField minus the Pillow check and dimension field updating functionality).
1) Why use an ImageField at all? Or rather, why use a FileField? Sure, it's convenient for quick-and-easy forms and convenient for inserting to Django templates. But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?
2) How to write to the field file? If it is correct that the file can be read by instance.imagefield (or is it instance.imagefield.file?), if I want to write to it can I simply do the following?
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    instance.imagefield = process_image(instance.imagefield)
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists? For example with my code right now I do this, how can it be done with ImageField? I want to do it at the model layer, because if I do repeated tries at the view layer then the pre_save processing would run again which is ghetto (even though it's unlikely that it'll have a second try ever in the lifetime of the service).
for i in range(tries):
    try:
        name = generate_random_name()
        media_storage.save(name + '.jpg', ContentFile(final_bytes))
        break
    except:
        pass
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request? i.e. I want to know if a new image is incoming to be saved, or if there is no image (some other field in the object is being updated and the image itself remains unchanged).
I don't see any advantage of FileField or ImageField over what you are doing today. In fact, as I see it, the proper/modern/scalable way to deal with uploads is to have the client (browser) upload files directly to S3.
If done correctly (from a security standpoint), this scheme lets you scale enormously without adding more computing power on your side. As an example, consider 100 people uploading a picture at the same time. Your server would need to receive all of that data, only to upload it again to S3. By contrast, you can have 1000 people uploading at the same time, and I can assure you AWS can handle it. Your server only needs to handle the signing of the URL, which is a lot less work.
Take a look at fine-uploader, as a good technology to use to handle the efficient upload to s3 (loading in chunks, error checking, etc): http://docs.fineuploader.com/endpoint_handlers/amazon-s3.html. Google "django fineuploader" to find a sample application for Django.
In my case, I use a Model with a couple CharFields (bucket, key) plus a few other things specific to my application. My data flow is as follows:
Django services a page with the fine-uploader widget, configured based on my settings.
Fineuploader requests a signed URL from the django server (endpoint), and uses that to upload to S3 directly.
When the upload is complete, fineUploader makes another request to my server to register the completion of the upload, at which time, I create my object on the database. In this case, if the upload fails, I never create an object on the database.
On the AWS side, S3 triggers a Lambda function, which I use to create a thumbnail, and store it back to S3. So, I don't even use my own CPU (e.g. Celery) for resizing. So you see, not only can I have thousands of users uploading at the same time, but I can resize those thousand pictures in parallel, and for less than what an EC2 worker will cost me.
My Django Model is also used as a wrapper to manage the business logic (e.g. functions like get_original_url() and get_thumbnail_url()), so after the uploads, it is easy for my templates to get the signed read-only URLs.
In short, you can implement your own version of Fineuploader if you want, or use one of the many alternatives, but assuming you follow the recommended security best practices on the AWS side (e.g. create a special IAM identity with only write permission for the client, even if you are using signed URLs), this, IMO, is the best practice for dealing with uploads, especially if you are using S3 or similar to store these files.
Sorry if I am only really answering question 1, but questions 2 and 3 don't apply if you accept my answer for 1.
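As a rough illustration of the signing step in the data flow above, here is a sketch using a boto3 S3 client's generate_presigned_url. It is a generic presigned PUT URL, not the signature protocol Fine Uploader itself expects, and sign_upload, the bucket name and the five-minute expiry are assumptions made for the example.

```python
def sign_upload(s3_client, bucket, key, content_type, expires_in=300):
    """Return a short-lived URL the browser can PUT the file to.

    ``s3_client`` is expected to be a boto3 S3 client. The server never
    touches the file bytes; it only signs the request.
    """
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_in,
    )
```

With boto3 installed, a Django view would call something like `sign_upload(boto3.client("s3"), "my-bucket", "uploads/photo.jpg", "image/jpeg")` and return the URL to the browser, creating the database object only once the client reports that the upload finished.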
1) Why use an ImageField at all? Or rather, why use a FileField?
It's convenient for quick-and-easy forms and convenient for inserting
to Django templates.
But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?
Yes. I daresay your own code probably does it too, but for a newbie, using the FileField will probably ensure that your important system files are not overwritten by a malicious upload.
2) How to write to the field file?
In your situation you would need to use a special storage backend that makes it possible to write directly to Amazon S3. As you know, the storage backends for FileField and ImageField are pluggable. Here is one example plugin: http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
Its documentation includes sample code that demonstrates how to write to it, so I will not go into that.
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists?
ImageField and FileField take care of this for you automatically: a new filename is created if the old one already exists. The code in my answer here did that automatically when I called it over and over again. Here are some sample filenames it produced (the input being bada.png):
"4", "media/bada.png"
"5", "media/bada_aH0gV7t.png"
"7", "media/bada_XkzthgK.png"
"8", "media/bada_YzZuwDi.png"
"9", "media/bada_wpkasI3.png"
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request?
If this is a new image upload, instance.pk will be None in pre_save.
If this is a modification to an existing object, the PK will already be set.
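For question 3, the renaming behaviour visible in the sample filenames above (an underscore plus seven random alphanumeric characters before the extension) can be approximated in plain Python. This is a simplified stand-in for illustration, not Django's own code; available_name and the exists callback are hypothetical names.

```python
import os
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def available_name(name, exists, suffix_length=7):
    """Return ``name`` unchanged if it is free, else a suffixed variant.

    ``exists`` is any callable that reports whether a name is taken,
    for example a storage backend's ``exists`` method.
    """
    root, ext = os.path.splitext(name)
    candidate = name
    while exists(candidate):
        suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
        candidate = f"{root}_{suffix}{ext}"
    return candidate
```

Because the retry happens while choosing the name, before anything is written, the model-level processing does not have to run again when a collision occurs.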
Took me forever to learn how to save an image using ImageField. Turns out it's crazy easy -- once you know how to do it, it is, at least. I mean, it all comes together sensibly after you see it.
So basically, you're working with a FileField. I already looked into the differences between ImageField and FileField:
ImageField takes everything FileField takes in terms of attributes, but ImageField also accepts optional width_field and height_field arguments.
ImageField, unlike FileField, validates an upload, making sure it's an image.
Using ImageField comes down to most of the same constructs as FileField does. The biggest things to remember:
request.FILES['name_of_form_field']
So a form is generated from something in forms.py (or wherever your forms are) like this:
imgfile = forms.ImageField(label='Choose your image',
                           help_text='The image should be cool.')
In the model, you might have this in correspondence:
imgfile = models.ImageField(upload_to='images/%m/%d')
So there will be a POST request from the user (when the user completes the form). That request will contain basically a dictionary of data. The dictionary holds the submitted files. To focus the request on the file from the field (in our case, an ImageField), you would use:
request.FILES['imgfile']
You would use that when you construct the model object (instantiating your model class):
newPic = ImageModel(imgfile = request.FILES['imgfile'])
To save that the simple way, you'd just use the save() method bestowed upon your object (because Django is that awesome):
if form.is_valid():
    newPic = ImageModel(imgfile=request.FILES['imgfile'])
    newPic.save()
Your image will be stored, by default, to the directory you indicate for MEDIA_ROOT in settings.py.
The tough part, which isn't really so tough when you catch on, is accessing the image.
In your template, you could have something like this:
<img src="{{ MEDIA_URL }}{{ image.imgfile.name }}"></img>
Where {{ MEDIA_URL }} is something like /media/, as indicated in settings.py and {{ image.imgfile.name }} is the name of the file and the subdirectory you indicated in the model. "image" in this case is just the current image in a loop of images you might create to access each image in the database:
{% for image in images %}
    <img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
{% endfor %}
Make SURE you configure your urls properly to handle the image or the image won't work. Add this to your urls:
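# Older Django only: patterns() and string view references were removed in Django 1.10.
from django.conf import settings
from django.conf.urls import patterns, url

urlpatterns += patterns('',
    url(r'^media/(?P<path>.*)$', 'django.views.static.serve', {
        'document_root': settings.MEDIA_ROOT,
    }),
)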

Amazon S3 different image dimensions with query string parameters

I need to store different dimensions of an image on S3. I am new to AWS Services, so it is taking some time for me to figure out how I can accomplish this.
Say I have an image called abc.png. I want to get different versions of this image using query string parameters, e.g. abc.png?s=medium for medium (400x400) and abc.png?s=large for large (1200x1200). I do not want to do any preprocessing or on-the-fly resizing.
Is there a way to do this on the S3 level only?
The S3 API doesn't allow selecting a different file through the query string. Put the size in the image path instead: /medium/abc.png or /abc.png/medium. Don't forget to set the Content-Type of each S3 object to the appropriate MIME type, which should allow the browser to render the images correctly.
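A small sketch of that naming scheme, assuming the size goes in front of the file name and that each version is stored as its own object. The SIZES table uses the dimensions from the question; variant_key and upload_args are hypothetical helpers, and the returned dict simply mirrors the Key and ContentType values an upload call would need.

```python
import mimetypes

# Size names and dimensions taken from the question.
SIZES = {"medium": (400, 400), "large": (1200, 1200)}


def variant_key(filename, size):
    """Return the object key for one size of an image, e.g. medium/abc.png."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    return f"{size}/{filename}"


def upload_args(filename, size):
    """Return the key plus a Content-Type guessed from the file extension."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        "Key": variant_key(filename, size),
        "ContentType": content_type or "application/octet-stream",
    }
```

The client then requests medium/abc.png instead of abc.png?s=medium; the lookup stays a plain key fetch with no logic on the S3 side.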
Try this service, they work with s3 as well I believe
http://cloudinary.com/

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to store a file on S3 either directly through a FileField or by letting the user enter the URL directly. To handle that, I used two fields in my model, a FileField and a URLField. In the template I could then use 'or' to pick whichever exists, like {{ instance.filefield or instance.url }}.
This is untested, but you should be able to use:
from django.core.files.storage import default_storage
f = default_storage.open('name_you_expect_in_s3', 'r')
#f is an instance of S3BotoStorageFile, and can be assigned to a field
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. It looks like it plays nicely with django-storages.
I've never used django, so ymmv :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively you could subclass S3BotoStorage, override the _save method and allow for an optional kwarg of filepath which sidesteps all the other saving stuff and just returns the cleaned_name.