Retrieving results from Mturk Sandbox - amazon-web-services

I'm trying to retrieve my HIT results from my local computer. I followed the get_results.py template, entered my access key id and secret access key correctly, and installed xmltodict, but I got an error message. Could anyone help me figure out why? Here is my HIT address in case anyone needs the format of my HIT: https://workersandbox.mturk.com/mturk/preview?groupId=3MKP0VNPM2VVY0K5UTNZX9OO9Q8RJE
import boto3

# The sandbox endpoint must be defined before it is passed to the client;
# leaving it out raises NameError: name 'MTURK_SANDBOX' is not defined,
# which is a likely culprit if only this part of get_results.py was pasted.
MTURK_SANDBOX = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

mturk = boto3.client(
    'mturk',
    aws_access_key_id="PASTE_YOUR_IAM_USER_ACCESS_KEY",
    aws_secret_access_key="PASTE_YOUR_IAM_USER_SECRET_KEY",
    region_name='us-east-1',
    endpoint_url=MTURK_SANDBOX,
)

# You will need the following library to help parse the XML answers
# supplied by MTurk. Install it in your local environment with:
#   pip install xmltodict
import xmltodict

# Use the hit_id previously created
hit_id = 'PASTE_IN_YOUR_HIT_ID'

# We are only publishing this task to one Worker, so we will get back
# an array with one item if it has been completed
worker_results = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=['Submitted'])
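Each returned assignment carries its answers in an `Answer` field containing an XML document in the QuestionFormAnswers schema; the template parses it with xmltodict, but the same extraction can be sketched with only the standard library (the question identifier and value below are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A sample payload in the shape MTurk returns in assignment['Answer'];
# the 'sentiment'/'positive' values are hypothetical.
sample_answer = """<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">
  <Answer>
    <QuestionIdentifier>sentiment</QuestionIdentifier>
    <FreeText>positive</FreeText>
  </Answer>
</QuestionFormAnswers>"""

def parse_free_text(answer_xml):
    """Map each QuestionIdentifier to its FreeText value."""
    root = ET.fromstring(answer_xml)
    results = {}
    # '{*}' matches any XML namespace (Python 3.8+)
    for answer in root.findall('{*}Answer'):
        qid = answer.find('{*}QuestionIdentifier').text
        results[qid] = answer.find('{*}FreeText').text
    return results

print(parse_free_text(sample_answer))  # {'sentiment': 'positive'}
```

In the real script you would call `parse_free_text(assignment['Answer'])` for each item in `worker_results['Assignments']`.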

Related

Trouble authenticating and writing to database locally

I'm having trouble authenticating and writing data to a Spanner database locally. All imports are up to date - google.cloud, google.oauth2, etc. I have had someone else run this and it works fine for them, so the problem seems to be something on my end - something wrong or misconfigured on my computer, maybe where the credentials are stored or something?
Anyone have any ideas?
from google.cloud import spanner
from google.api_core.exceptions import GoogleAPICallError
from google.api_core.datetime_helpers import DatetimeWithNanoseconds
import datetime
from google.oauth2 import service_account


def write_to(database):
    record = [[
        1041613562310836275,
        'test_name'
    ]]
    columns = ("id", "name")
    insert_errors = []
    try:
        with database.batch() as batch:
            batch.insert_or_update(
                table="guild",
                columns=columns,
                values=record,
            )
    except GoogleAPICallError as e:
        print(f'error: {e}')
        insert_errors.append(e.message)
    return insert_errors


if __name__ == "__main__":
    credentials = service_account.Credentials.from_service_account_file(r'path\to\a.json')
    instance_id = 'instance-name'
    database_id = 'database-name'
    spanner_client = spanner.Client(project='project-name', credentials=credentials)
    print(f'spanner creds: {spanner_client.credentials}')
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)
    insert_errors = write_to(database)
Some credential tests:

>>> creds = service_account.Credentials.from_service_account_file(a_json)
>>> creds
<google.oauth2.service_account.Credentials at 0x...>
>>> spanner_client.credentials
<google.auth.credentials.AnonymousCredentials at 0x...>
>>> spanner_client.credentials.signer_email
AttributeError: 'AnonymousCredentials' object has no attribute 'signer_email'
>>> creds.signer_email
'...#....iam.gserviceaccount.com'
>>> spanner.Client().from_service_account_json(a_json).credentials
<google.auth.credentials.AnonymousCredentials object at 0x...>
The most common reason for this is that you have accidentally set (or forgotten to unset) the environment variable SPANNER_EMULATOR_HOST. If this environment variable is set, the client library will try to connect to the emulator instead of Cloud Spanner. This causes the client library to wait for a long time while trying to connect to the emulator (assuming the emulator is not running on your machine). Unset the environment variable to fix this problem.
Note: This environment variable only affects Cloud Spanner client libraries, which is why other Google Cloud products work on the same machine. The script will also in most cases work on other machines, as they are unlikely to have this environment variable set.
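A quick pre-flight check like the following can make the failure mode obvious; the environment variable name is the real one the client library inspects, while the helper itself is just a sketch:

```python
import os

def emulator_host_set() -> bool:
    """Return True if the Spanner client library would target the emulator."""
    return bool(os.environ.get("SPANNER_EMULATOR_HOST"))

# Warn before creating the spanner.Client, so a stale variable is caught early.
if emulator_host_set():
    print("SPANNER_EMULATOR_HOST is set; unset it to reach Cloud Spanner.")
```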

SageMaker PyTorch estimator.fit freezes when running in local mode from EC2

I am trying to train a PyTorch model through SageMaker. I am running a script, main.py (a minimum working example is posted below), which calls a PyTorch Estimator. The code for training my model lives in a separate script, train.py, which is passed to the entry_point parameter of the Estimator. These scripts are hosted on an EC2 instance in the same AWS region as my SageMaker domain.
When I run this with instance_type = "ml.m5.4xlarge", it works fine. However, I am unable to debug any problems in train.py: any bug in that file simply gives me the error 'AlgorithmError: ExecuteUserScriptError', and I cannot set breakpoint() lines in train.py (encountering a breakpoint throws the above error).
Instead, I am trying to run in local mode, which I believe does allow breakpoints. However, when execution reaches estimator.fit(inputs), it hangs on that line indefinitely with no output. Print statements at the start of the main function in train.py are never reached, no matter what code I put in train.py. It also did not throw an error when I had an illegal underscore in the base_job_name parameter of the estimator, which suggests that it does not even create the estimator instance.
Below is a minimum example that replicates the issue on my instance. Any help would be appreciated.
### File structure
main.py
customcode/
|
|_ train.py
### main.py
import sagemaker
from sagemaker.pytorch import PyTorch
import boto3

try:
    # When running on Studio.
    sess = sagemaker.session.Session()
    bucket = sess.default_bucket()
    role = sagemaker.get_execution_role()
except ValueError:
    # When running from EC2 or a local machine.
    print('Performing manual setup.')
    bucket = 'MY-BUCKET-NAME'
    region = 'us-east-2'
    role = 'arn:aws:iam::MY-ACCOUNT-NUMBER:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXXXX'
    iam = boto3.client("iam")
    sagemaker_client = boto3.client("sagemaker")
    boto3.setup_default_session(region_name=region, profile_name="default")
    sess = sagemaker.Session(sagemaker_client=sagemaker_client, default_bucket=bucket)

hyperparameters = {'epochs': 10}
inputs = {'data': f's3://{bucket}/features'}
train_instance_type = 'local'

hosted_estimator = PyTorch(
    source_dir='customcode',
    entry_point='train.py',
    instance_type=train_instance_type,
    instance_count=1,
    hyperparameters=hyperparameters,
    role=role,
    base_job_name='mwe-train',
    framework_version='1.12',
    py_version='py38',
    input_mode='FastFile',
)

hosted_estimator.fit(inputs)  # This is the line that freezes
### train.py
def main():
    breakpoint()  # Throws an error in non-local mode.
    return


if __name__ == '__main__':
    print('Reached')  # Never reached in local mode.
    main()
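One thing worth ruling out: local mode runs the training container through Docker, so Docker (with compose support) must be installed and working on the EC2 instance, or fit can hang with no output. Separately, the unflagged underscore in base_job_name can be caught with a local pre-flight check. The regex below is the TrainingJobName pattern from the CreateTrainingJob API (base_job_name is used as the job-name prefix); the helper itself is just a sketch, not part of the SageMaker SDK:

```python
import re

# TrainingJobName pattern per the SageMaker CreateTrainingJob API:
# alphanumeric characters and hyphens, starting/ending alphanumeric, <= 63 chars.
_JOB_NAME_RE = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$')

def valid_base_job_name(name: str) -> bool:
    """Pre-flight check; local mode may not validate this for you."""
    return bool(_JOB_NAME_RE.match(name))

print(valid_base_job_name('mwe-train'))  # True
print(valid_base_job_name('mwe_train'))  # False: underscores are illegal
```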

How to configure firebase admin at Django server?

I am trying to add custom tokens for user authentication (phone number and password), and as per the reference documents I would like to configure the server to generate custom tokens.
I have installed: $ sudo pip install firebase-admin
and also set up an environment variable: export GOOGLE_APPLICATION_CREDENTIALS="[to json file at my server]"
I am using a Django project on my server, where I have created all my APIs.
I am stuck at the point where it says to initialize the app:
default_app = firebase_admin.initialize_app()
Where should I write the above statement within the Django files? And how should I generate an endpoint to get the custom token?
Regards,
PD
pip install firebase-admin
The credentials.json file includes some private keys, so you can't add it to your project directly. If you're using the git version system and you want to keep this file in your project folder, you must add its name to your ".gitignore".
Set the environment variable in your operating system. On macOS or Linux distributions you can use the export command below; to set a variable on Windows, see https://www.computerhope.com/issues/ch000549.htm.
$ export GOOGLE_APPLICATION_CREDENTIALS='/path/to/credentials.json'
This part is important: the google package (which comes with the firebase_admin package) looks in several places for credentials, one of which is os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'). If you set this variable, you don't need to do anything else to initialize Firebase; otherwise you have to supply the credentials manually.
To initialize Firebase, look at the set-up configuration docs (https://firebase.google.com/docs/firestore/quickstart#initialize).
Create a file named "firebase.py".
$ touch firebase.py
Now we can use the "firebase_admin" package for querying. Our firebase.py looks like this:
import time
from datetime import timedelta
from uuid import uuid4

from firebase_admin import firestore, initialize_app

__all__ = ['send_to_firebase', 'update_firebase_snapshot']

initialize_app()


def send_to_firebase(raw_notification):
    db = firestore.client()
    start = time.time()
    db.collection('notifications').document(str(uuid4())).create(raw_notification)
    end = time.time()
    spend_time = timedelta(seconds=end - start)
    return spend_time


def update_firebase_snapshot(snapshot_id):
    start = time.time()
    db = firestore.client()
    db.collection('notifications').document(snapshot_id).update(
        {'is_read': True}
    )
    end = time.time()
    spend_time = timedelta(seconds=end - start)
    return spend_time
You can refer to this link: https://medium.com/@canadiyaman/how-to-use-firebase-with-django-project-34578516bafe
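For the custom-token endpoint itself, a minimal sketch of a Django view is below, assuming firebase_admin was initialized as shown above. auth.create_custom_token is the real Admin SDK call; the uid placeholder and the serialize_token helper are hypothetical, and the imports are deferred into the view so the helper can be used standalone:

```python
def serialize_token(token):
    # create_custom_token returns bytes; JSON responses need str.
    return token.decode('utf-8') if isinstance(token, bytes) else token

def custom_token_view(request):
    # Deferred imports: requires django and firebase-admin to be installed.
    from django.http import JsonResponse
    from firebase_admin import auth

    # Sketch: verify the phone number/password with your own logic first,
    # then mint a token for that user's uid ('some-uid' is a placeholder).
    uid = 'some-uid'
    token = auth.create_custom_token(uid)
    return JsonResponse({'token': serialize_token(token)})
```

Wire the view into urls.py as usual and have the client exchange the returned token via signInWithCustomToken.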

Do any AWS API scripts exist in a repository somewhere?

I want to start 10 instances, get their instance id's and get their private IP addresses.
I know this can be done using AWS CLI, I'm wondering if there are any such scripts already written so I don't have to reinvent the wheel.
Thanks
I recommend using Python and the boto package for this kind of automation; Python is clearer than bash. You can use the following page as a starting point: http://boto.readthedocs.org/en/latest/ec2_tut.html
In the off chance that someone in the future comes across my question, I thought I'd give my (somewhat) final solution.
Using python and the Boto package that was suggested, I have the following python script.
It's pretty well commented but feel free to ask if you have any questions.
import boto
import time
import sys

IMAGE = 'ami-xxxxxxxx'
KEY_NAME = 'xxxxx'
INSTANCE_TYPE = 't1.micro'
SECURITY_GROUPS = ['xxxxxx']  # If multiple, separate by commas
COUNT = 2  # number of servers to start

private_dns = []  # will be populated with the private DNS name of each instance

print 'Connecting to AWS'
conn = boto.connect_ec2()

print 'Starting instances'
# start the instances
reservation = conn.run_instances(IMAGE, instance_type=INSTANCE_TYPE, key_name=KEY_NAME,
                                 security_groups=SECURITY_GROUPS, min_count=COUNT,
                                 max_count=COUNT)  # , dry_run=True)
# print reservation  # debug

print 'Waiting for instances to start'
# ONLY CHECKS IF RUNNING, MAY NOT BE SSH READY
for instance in reservation.instances:  # doing this for every instance we started
    while not instance.update() == 'running':  # while it's not running (probably 'pending')
        print '.',  # trailing comma is intentional, to print on the same line
        sys.stdout.flush()  # print immediately instead of buffering
        time.sleep(2)  # let the instance start up
print 'Done\n'

for instance in reservation.instances:
    instance.add_tag("Name", "Hadoop Ecosystem")  # tag the instance
    private_dns.append(instance.private_dns_name)  # add its DNS name to the array
    print instance, 'is ready at', instance.private_dns_name  # print to console

print private_dns
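The classic boto package above is long deprecated; the same workflow in boto3 might look like the sketch below. The helper extracts private IPs from the run_instances response dict (shape per the EC2 API), while the live calls are left commented out since they need AWS credentials and real AMI/key/group names:

```python
def private_ips(run_instances_response):
    """Pull private IPv4 addresses out of a run_instances response dict."""
    return [inst.get('PrivateIpAddress')
            for inst in run_instances_response.get('Instances', [])]

# Sketch of the equivalent boto3 workflow (placeholder identifiers):
# import boto3
# ec2 = boto3.client('ec2')
# resp = ec2.run_instances(ImageId='ami-xxxxxxxx', InstanceType='t3.micro',
#                          KeyName='xxxxx', MinCount=2, MaxCount=2)
# ec2.get_waiter('instance_running').wait(
#     InstanceIds=[i['InstanceId'] for i in resp['Instances']])
# print(private_ips(resp))
```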

Collectstatic and S3: not finding updated files

I am using Amazon S3 to store static files for a Django project, but collectstatic is not finding updated files - only new ones.
I've been looking for an answer for ages, and my guess is that I have something configured incorrectly. I followed this blog post to help get everything set up.
I also ran into this question which seems identical to my problem, but I have tried all the solutions already.
I even tried using this plugin which is suggested in this question.
Here is some information that might be useful:
settings.py
...
STATICFILES_FINDERS = (
'django.contrib.staticfiles.finders.FileSystemFinder',
'django.contrib.staticfiles.finders.AppDirectoriesFinder',
'django.contrib.staticfiles.finders.DefaultStorageFinder',
)
...
# S3 Settings
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = S3_URL
AWS_PRELOAD_METADATA = False
requirements.txt
...
Django==1.5.1
boto==2.10.0
django-storages==1.1.8
python-dateutil==2.1
Edit1:
I apologize if this question is too specific to my own circumstances to help a larger audience. Nonetheless, this has been hampering my productivity for a long time and I have wasted many hours looking for solutions, so I am starting a bounty to reward anyone who can help troubleshoot it.
Edit2:
I just ran across a similar problem somewhere. I am in a different time zone than the location of my AWS bucket. If collectstatic uses timestamps by default, could this interfere with the process?
Thanks
I think I solved this problem. Like you, I have spent so many hours on this problem. I am also on the bug report you found on bitbucket. Here is what I just accomplished.
I had
django-storages==1.1.8
Collectfast==0.1.11
This combination does not work at all. Deleting everything and starting over does not work either; after that, it cannot pick up modifications and refuses to update anything.
The problem is our time zone. S3 will say the files it has were last modified later than the ones we want to upload, so django collectstatic will not try to copy the new ones over at all and will report the files as "unmodified". For example, here is what I saw before my fix:
Collected static files in 0:00:45.292022.
Skipped 407 already synced files.
0 static files copied, 1 unmodified.
My solution is: to hell with modified time! Besides the time zone problem we are solving here, what if I made a mistake and need to roll back? It would refuse to deploy the old static files and leave my website broken.
Here is my pull request to Collectfast: https://github.com/FundedByMe/collectfast/pull/11 . I left a flag in, so if you really want to check modified time, you still can. Until it gets merged, just use my code at https://github.com/sunshineo/collectfast
You have a nice day!
--Gordon
PS: Stayed up till 4:40am for this. My day is ruined for sure.
After hours of digging around, I found this bug report.
I changed my requirements to revert to a previous version of Django storages.
django-storages==1.1.5
You might want to consider using this plugin written by antonagestam on Github:
https://github.com/FundedByMe/collectfast
It compares the checksum of the files, which is a guaranteed way of determining when a file has changed. It's the accepted answer at this other stackoverflow question: Faster alternative to manage.py collectstatic (w/ s3boto storage backend) to sync static files to s3?
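A Collectfast-style checksum comparison can be sketched as follows: compare the local file's MD5 with the S3 ETag (for non-multipart uploads the ETag is the MD5 hex digest). This is an illustration of the idea, not Collectfast's actual code:

```python
import hashlib

def md5_hexdigest(data):
    """MD5 hex of file contents; matches a non-multipart S3 ETag."""
    return hashlib.md5(data).hexdigest()

def needs_upload(local_bytes, s3_etag):
    # S3 returns the ETag wrapped in double quotes, so strip them first.
    return md5_hexdigest(local_bytes) != s3_etag.strip('"')
```

Unlike timestamps, this comparison is immune to time-zone mismatches and correctly detects rollbacks, since any content change yields a different digest.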
There are some good answers here, but I spent some time on this today, so I figured I'd contribute one more in case it helps someone in the future. Following advice found in other threads, I confirmed that, for me, this was indeed caused by a time zone difference. My Django time wasn't incorrect, but it was set to EST while S3 was set to GMT. In testing, I reverted to django-storages 1.1.5, which did seem to get collectstatic working. Partially due to personal preference, I was unwilling to a) roll back three versions of django-storages and lose any potential bug fixes, or b) alter time zones for components of my project for what essentially boils down to a convenience function (albeit an important one).
I wrote a short script to do the same job as collectstatic without the aforementioned alterations. It will need a little modifying for your app, but it should work for standard cases if it is placed at the app level and 'static_dirs' is replaced with the names of your project's apps. It is run from the terminal with 'python whatever_you_call_it.py -e environment_name' (set this to your AWS bucket).
import sys, os, subprocess
import boto3
import botocore
from boto3.session import Session
import argparse
import os.path, time
from datetime import datetime, timedelta
import pytz

utc = pytz.UTC

DEV_BUCKET_NAME = 'dev-homfield-media-root'
PROD_BUCKET_NAME = 'homfield-media-root'
static_dirs = ['accounts', 'messaging', 'payments', 'search', 'sitewide']


def main():
    try:
        parser = argparse.ArgumentParser(description='Homfield Collectstatic. Our version of collectstatic to fix the django-storages bug.\n')
        parser.add_argument('-e', '--environment', type=str, required=True, help='Name of environment (dev/prod)')
        args = parser.parse_args()
        vargs = vars(args)
        if vargs['environment'] == 'dev':
            selected_bucket = DEV_BUCKET_NAME
            print "\nAre you sure? You're about to push to the DEV bucket. (Y/n)"
        elif vargs['environment'] == 'prod':
            selected_bucket = PROD_BUCKET_NAME
            print "Are you sure? You're about to push to the PROD bucket. (Y/n)"
        else:
            raise ValueError
        acceptable = ['Y', 'y', 'N', 'n']
        confirmation = raw_input().strip()
        while confirmation not in acceptable:
            print "That's an invalid response. (Y/n)"
            confirmation = raw_input().strip()
        if confirmation == 'Y' or confirmation == 'y':
            run(selected_bucket)
        else:
            print "Collectstatic aborted."
    except Exception as e:
        print type(e)
        print "An error occurred. S3 staticfiles may not have been updated."


def run(bucket_name):
    # open a session with S3
    session = Session(aws_access_key_id='{aws_access_key_id}',
                      aws_secret_access_key='{aws_secret_access_key}',
                      region_name='us-east-1')
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_name)
    # loop through the static directories
    for directory in static_dirs:
        rootDir = './' + directory + "/static"
        print('Checking directory: %s' % rootDir)
        # loop through subdirectories
        for dirName, subdirList, fileList in os.walk(rootDir):
            # loop through all files in the subdirectory
            for fname in fileList:
                try:
                    if fname == '.DS_Store':
                        continue
                    # find and qualify the file's last-modified time
                    full_path = dirName + "/" + fname
                    last_mod_string = time.ctime(os.path.getmtime(full_path))
                    file_last_mod = datetime.strptime(last_mod_string, "%a %b %d %H:%M:%S %Y") + timedelta(hours=5)
                    file_last_mod = utc.localize(file_last_mod)
                    # truncate the path for the S3 loop; find the object,
                    # then delete and re-upload it if it has been updated
                    s3_path = full_path[full_path.find('static'):]
                    found = False
                    for key in bucket.objects.all():
                        if key.key == s3_path:
                            found = True
                            last_mod_date = key.last_modified
                            if last_mod_date < file_last_mod:
                                key.delete()
                                s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'r'), ContentType=get_mime_type(full_path))
                                print "\tUpdated : " + full_path
                    if not found:
                        # the file was not found in S3, so it is new; upload it
                        print "\tFound a new file. Uploading : " + full_path
                        s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'r'), ContentType=get_mime_type(full_path))
                except:
                    print "ALERT: Big time problems with: " + full_path + ". I'm bowin' out dawg, this shitz on u."


def get_mime_type(full_path):
    try:
        last_index = full_path.rfind('.')
        if last_index < 0:
            return 'application/octet-stream'
        extension = full_path[last_index:]
        return {
            '.js': 'application/javascript',
            '.css': 'text/css',
            '.txt': 'text/plain',
            '.png': 'image/png',
            '.jpg': 'image/jpeg',
            '.jpeg': 'image/jpeg',
            '.eot': 'application/vnd.ms-fontobject',
            '.svg': 'image/svg+xml',
            '.ttf': 'application/octet-stream',
            '.woff': 'application/x-font-woff',
            '.woff2': 'application/octet-stream'
        }[extension]
    except KeyError:
        # original version built this message but never printed or returned;
        # fall back to a sensible default content type
        print 'ALERT: Couldn\'t match a mime type for ' + full_path + '. Sending to S3 as application/octet-stream.'
        return 'application/octet-stream'


if __name__ == '__main__':
    main()
I had a similar problem pushing new files to an S3 bucket (it had previously worked well), but it was not a problem with Django or Python; on my end, I fixed the issue by deleting my local repository and cloning it again.