Getting Lowest Used Price with Amazon API - django

I am using Django with the Amazon API to retrieve book pricing. I NEED the Used Book Lowest Price, but I keep getting the New Lowest Price instead. Below is the code my developer used. She said there is no way to get the Used Lowest Price. I have tried multiple different variations of Offers/OfferSummary etc. Still no change. I also tried adding response elements, but no change... From what I read, Amazon always returns the Lowest New Price first and then the Lowest Used Price.
I am not great with coding, but it looks like BeautifulSoup is just pulling the first amount that appears. Any thoughts on how I can achieve this? I spent all day on changes and got nowhere getting the Used prices...
price_response = amazon.ItemLookup(ItemId=isbn, IdType="ISBN", SearchIndex="Books",
                                   MerchantID="All", ResponseGroup="Small,OfferSummary")
try:
    desc_soup = BeautifulSoup(price_response, "html.parser")
    book_price = desc_soup.amount.get_text()
    book_price = float(book_price) / 100
    book_price = decimal.Decimal(book_price)
    price_range = []

Currently the code is just finding the first amount that it can. This should work:
desc_soup = BeautifulSoup(price_response)
book_price = desc_soup.lowestusedprice.amount.get_text()
You really shouldn't be using html.parser either given that the API response is XML. You should do this instead:
desc_soup = BeautifulSoup(price_response, 'xml')
book_price = desc_soup.LowestUsedPrice.Amount.get_text()
(Note the capitalisation of the tree objects changes. You also need lxml to be installed for this to work).
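Putting it together, here's a minimal sketch of the full lookup (same amazon client as above; note that LowestUsedPrice can be absent when an item has no used offers, so it's guarded here):
import decimal
from bs4 import BeautifulSoup  # the 'xml' parser requires lxml

price_response = amazon.ItemLookup(ItemId=isbn, IdType="ISBN", SearchIndex="Books",
                                   MerchantID="All", ResponseGroup="Small,OfferSummary")
desc_soup = BeautifulSoup(price_response, "xml")
used = desc_soup.LowestUsedPrice  # None when there are no used offers
if used is not None:
    # Amount is in cents; dividing as a Decimal avoids float rounding noise
    book_price = decimal.Decimal(used.Amount.get_text()) / 100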

Getting an anomaly score for every datapoint in SageMaker?

I'm very new to SageMaker, and I've run into a bit of confusion as to how to achieve the output I am looking for. I am currently attempting to use the built-in RCF algorithm to perform anomaly detection on a list of stock volumes, like this:
apple_stock_volumes = [123412, 465125, 237564, 238172]
I have created a training job, model, and endpoint, and I'm trying now to invoke the endpoint using boto3. My current code looks like this:
import json
import boto3

apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')
    body = " ".join(str(v) for v in apple_stock_volumes)
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    inference = json.loads(response['Body'].read())
    print(inference)

inference()
What I wanted was to get an anomaly score for every datapoint, and then to alert if the anomaly score was a few standard deviations above the mean. However, what I'm actually receiving is just a single anomaly score. The following is my output:
{'scores': [{'score': 0.7164874384}]}
Can anyone explain to me what's going on here? Is this an average anomaly score? Why can't I seem to get SageMaker to output a list of anomaly scores corresponding to my data? Thanks in advance!
Edit: I have already trained the model on a csv of historical volume data for the last year, and I have created an endpoint to hit.
Edit 2: I've accepted @maafk's answer, although the actual answer to my question was provided in one of his comments. The piece I was missing was that each data point must be on a new line in your csv input to the endpoint. Once I replaced body = " ".join(apple_stock_volumes) with body = "\n".join(apple_stock_volumes), everything worked as expected.
In your case, you'll want to compute the mean and standard deviation of the anomaly scores for your historical stock volumes, and treat a record as anomalous when its score is more than 3 standard deviations above the mean.
Update your code to do inference on multiple records at once:
apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')
    body = "\n".join(str(v) for v in apple_stock_volumes)  # new line for each record
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    inference = json.loads(response['Body'].read())
    print(inference)

inference()
This will return a list of scores, one per record.
Assuming apple_stock_volumes_df has your volumes and the scores (after running inference on each record):
score_mean = apple_stock_volumes_df['score'].mean()
score_std = apple_stock_volumes_df['score'].std()
score_cutoff = score_mean + 3*score_std
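To then flag anomalous records, a one-line sketch against the same assumed DataFrame:
anomalies = apple_stock_volumes_df[apple_stock_volumes_df['score'] > score_cutoff]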
There is a great example here showing this.

Searching a many to many database using Google Cloud Datastore

I am quite new to Google App Engine. I know the Google Datastore is not SQL, but I am trying to get many-to-many relationship behaviour in it. As you can see below, I have Gif entities and Tag entities. I want my application to search Gif entities by related tag. Here is what I have done:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)

class Tag(ndb.Model):
    name = ndb.StringProperty()

class TagGifPair(ndb.Model):
    tag_id = ndb.IntegerProperty()
    gif_id = ndb.IntegerProperty()

    @classmethod
    def search_gif_by_tag(cls, tag_name):
        query = cls.query(name=tag_name)
        # I am stuck here ...
Is this a correct start? If so, how can I finish it? If not, how should I do it?
You can use repeated properties: https://developers.google.com/appengine/docs/python/ndb/properties#repeated. The sample in the link uses tags on an entity as its example, but for your exact use case it will look like this:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)
    # you store an array of tag keys here; you could also just make this
    # a StringProperty(repeated=True)
    tag = ndb.KeyProperty(repeated=True)

    @classmethod
    def get_by_tag(cls, tag_name):
        # a query on a repeated property works the same as if it were a single value
        return cls.query(cls.tag == ndb.Key(Tag, tag_name)).fetch()

# we will put the tag_name as its key.id()
# you only really need this if you want to keep records of your tags;
# you could simply keep the tags as strings too
class Tag(ndb.Model):
    gif_count = ndb.IntegerProperty(indexed=False)
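A quick usage sketch of the above (the link and tag names are made up for illustration):
# attach tags when creating a gif, keyed by tag name
gif = Gif(link='http://example.com/cat.gif',
          tag=[ndb.Key(Tag, 'cats'), ndb.Key(Tag, 'funny')])
gif.put()

# later, fetch every gif tagged 'cats'
cat_gifs = Gif.get_by_tag('cats')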
Maybe you want to use a list? I would do something like this if you only need to search gifs by tags. I'm using db since I'm not familiar with ndb.
class Gif(db.Model):
    author = db.UserProperty()
    link = db.StringProperty(indexed=False)
    tags = db.StringListProperty(indexed=True)
Query like this:
Gif.all().filter('tags =', tag).fetch(1000)
There are different ways of doing many-to-many relationships. Using ListProperties is one way. The limitation to keep in mind with ListProperties is that there's a limit to the number of indexes per entity, and a limit to the total entity size. This means that there's a limit to the number of entries in the list (depending on whether you hit the index count or the entity size first). See the bottom of this page: https://developers.google.com/appengine/docs/python/datastore/overview
If you believe the number of references will stay within this limit, this is a good way to go. Considering that you're not going to have thousands of tags per gif, this is probably the right way.
The other way is to have an intermediate entity that has reference properties to both sides of your many-to-many. This method will let you scale much higher, but because of all the extra entity writes and reads, this is much more expensive.
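For illustration, a minimal ndb sketch of that intermediate-entity approach (the names here are invented, not from the original post):
class TagGifPair(ndb.Model):
    tag = ndb.KeyProperty(kind='Tag')
    gif = ndb.KeyProperty(kind='Gif')

def gifs_for_tag(tag_name):
    # find the pair entities for this tag, then batch-fetch the gifs
    pairs = TagGifPair.query(TagGifPair.tag == ndb.Key(Tag, tag_name)).fetch()
    return ndb.get_multi([pair.gif for pair in pairs])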

Get most commented posts in django with django comments

I'm trying to get the ten most commented posts in my django app, but I'm unable to do it because I can't think of a proper way.
I'm currently using the django comments framework, and I've seen a possibility of doing this with aggregate or annotate, but I can't figure out how.
The thing would be:
Get all the posts
Calculate the number of comments per post (I have a comment_count method for that)
Order the posts from most commented to least
Get the first 10 (for example)
Is there any "simple" or "pythonic" way to do this? I'm a bit lost, since the comments framework is only accessible via template tags, and not directly from the code (unless you want to modify it).
Any help is appreciated
You're right that you need to use the annotation and aggregation features. What you need to do is group by and get a count of the object_pk of the Comment model:
from django.contrib.comments.models import Comment
from django.db.models import Count
o_list = Comment.objects.values('object_pk').annotate(ocount=Count('object_pk'))
This will assign something like the following to o_list:
[{'object_pk': '123', 'ocount': 56},
{'object_pk': '321', 'ocount': 47},
...etc...]
You could then sort the list in descending order and slice the top 10:
top_ten_objects = sorted(o_list, key=lambda k: k['ocount'], reverse=True)[:10]
You can then use the values in object_pk to retrieve the objects that the comments are attached to.
Annotate is going to be the preferred way, partly because it reduces db queries and it's basically a one-liner. While your theoretical loop would work, I bet your comment_count method relies on querying comments for a given post, which would be one query per post that you loop over. Nasty!
posts_by_score = Comment.objects.filter(is_public=True).values('object_pk').annotate(
    score=Count('id')).order_by('-score')
post_ids = [int(obj['object_pk']) for obj in posts_by_score]
top_posts = Post.objects.in_bulk(post_ids)
This code is shamelessly adapted from Django-Blog-Zinnia (no affiliation).
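One caveat: in_bulk returns an unordered dict keyed by id, so to get the ten posts in ranked order you need to reorder them yourself. A small sketch:
top_ids = [int(obj['object_pk']) for obj in posts_by_score[:10]]
posts_by_id = Post.objects.in_bulk(top_ids)
top_ten_posts = [posts_by_id[pk] for pk in top_ids if pk in posts_by_id]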

I'm confused about how distinct() works with Django queries

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want to know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
    user = models.ForeignKey(User)
    location = models.ForeignKey(Location)
    time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
    filter(time__range=[start, end], location=checkin.location).\
    annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a different way to read out what comes back in the dict object? I just need to know how many unique users checked into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).\
    values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think what you were trying to achieve got confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyway):
User.objects.filter(checkinact__location=location).distinct().count()
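Equivalently, a quick sketch counting distinct users straight off the CheckinAct queryset, if you want it scoped to the date range as well:
unique_users = CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).values('user').distinct().count()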
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But this will generate N additional queries, where N is the total number of checkins (one query each time the user attribute is accessed). Normally I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside of non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
    filter(time__range=[start, end], location=checkin.location).\
    annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the output as "dcount"? As a result, isn't it just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
    filter(time__range=[start, end], location=checkin.location).\
    annotate(Count('user')).order_by()
(The trailing order_by() is to clear any built-in ordering that you may already have at the model level; not sure if you have anything like that, but it doesn't hurt to ask...)

Bulk updating a table

I want to update a customer table with a spreadsheet from our accounting system. Unfortunately I can't just clear out the data and reload all of it, because there are a few records in the table that are not in the imported data (don't ask).
For 2000 records this is taking about 5 minutes, and I wondered if there was a better way of doing it.
for row in data:
    try:
        try:
            customer = models.Retailer.objects.get(shared_id=row['Customer'])
        except models.Retailer.DoesNotExist:
            customer = models.Retailer()
            customer.shared_id = row['Customer']
        customer.name = row['Name 1']
        customer.address01 = row['Street']
        customer.address02 = row['Street 2']
        customer.postcode = row['Postl Code']
        customer.city = row['City']
        customer.save()
    except:
        print formatExceptionInfo("Error with Customer ID: " + str(row['Customer']))
Look at my answer here: Django: form that updates X amount of models
The QuerySet has an update() method; the rest is explained in the link above.
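A rough sketch of what that could look like for this table (an update-then-create fallback; field names are taken from the loop in the question, and this is untested):
for row in data:
    fields = dict(
        name=row['Name 1'],
        address01=row['Street'],
        address02=row['Street 2'],
        postcode=row['Postl Code'],
        city=row['City'],
    )
    # update() returns the number of rows matched; 0 means the customer is new
    updated = models.Retailer.objects.filter(
        shared_id=row['Customer']).update(**fields)
    if not updated:
        models.Retailer.objects.create(shared_id=row['Customer'], **fields)
This avoids loading each object before saving it; wrapping the whole loop in a single transaction is usually the bigger win.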
I've had some success using this bulk update snippet:
http://djangosnippets.org/snippets/446/
It's a bit outdated, but it worked on django 1.1, so I suppose you can still make it work. If you are looking for a quick way to do a one time bulk insert, this is the quickest (I'm not sure I'd trust it for regular use without seriously testing performance).
I've made a terribly crude attempt at a solution for this problem, but it's not finished yet and it doesn't support working with Django ORM objects directly yet.
http://pypi.python.org/pypi/dse/0.1.0
It's not been properly tested, so let me know if you have any suggestions on how to improve it. Using the Django ORM to do stuff like this is terrible.
Thomas