it seems that Boto is the official Amazon API module for Python, and this one is for Tornado, so here is my questions:
does it offer pagination (requests only 10 products, since amazon offers 10 products per page, then i want only to get the first page...), then how (sample code?)
how then to parse the product parse, i've used python-amazon-simple-product-api but sadly it dont offer pagination, so all the offers keep iterating.
generally, pagination is performed by the client requesting the api. To do this in boto, you'll need to cut your systems up. So for instance, say you make a call to AWS via boto, using the get_all_instances def; you'll need to store those somehow and then keep track of which servers have been displayed, and which not. To my knowledge, boto does not have the LIMIT functionality most dev's are used to from MySQL. Personally, I scan all my instances and stash them in mongo like so:
for r in conn.get_all_instances(): # loop through all reservations
groups = [g.name for g in r.groups] # get a list of groups for this reservation
for x in r.instances: # loop through all instances with-in reservation
groups = ','.join(groups) # join the groups into a comma separated list
name = x.tags.get('Name',''); # get instance name from the 'Name' tag
new_record = { "tagname":name, "ip_address":x.private_ip_address,
"external_ip_nat":x.ip_address, "type":x.instance_type,
"state":x.state, "base_image":x.image_id, "placement":x.placement,
"public_ec2_dns":x.public_dns_name,
"launch_time":x.launch_time, "parent": ObjectId(account['_id'])}
new_record['groups'] = groups
systems_coll.update({'_id':x.id},{"$set":new_record},upsert=True)
error = db.error()
if error != None:
print "err:%s:" % str(error)
You could also wrap these in try/catch blocks. Up to you. Once you get them out of boto, should be trivial to do the cut up work.
-- Jess
Related
I would like to find a list of dynamic thing groups. I can see the type of field when I go to one of the thing groups in AWS IoT Core. How do I search and find a list of thing groups which has a Type as Dynamic?
e.g.
When I visit one of the Thing Group present in IoT Core.
You do not have a description for the thing group yet.
Created
Jul 26, 2019 11:21:44 AM -0700
Type
Static
0 Attributes
I tried a few variants, but they did not work.
Type: Dynamic
attributes.Type: Dynamic
Type == Dynamic
Thanks in advance for any suggestions.
Configure a Thing Group index on Fleet Indexing.
For every dynamic Thing Group created, add an attribute to distinguish it as a dynamic thing group, i.e. attribute.dynamic: true
Call SearchIndex on the index with the query attributes.dynamic: true and that will return all dynamic Thing Groups.
It looks like it is not straight-forward. Thanks to my colleague, I created a script to get that list.
import boto3
client = boto3.client('iot')
list_thing_groups = client.list_thing_groups()
while True:
for thing_group in list_thing_groups['thingGroups']:
name = thing_group['groupName']
response = client.describe_thing_group(
thingGroupName=name
)
query = response.get('queryString')
if query:
print(name)
if list_thing_groups.get('nextToken'):
list_thing_groups = client.list_thing_groups(nextToken=list_thing_groups.get('nextToken'))
else:
break
The idea is queryString for dynamic Thing Group won't be null.
I am working with Tweepy (python's REST API client) and I'm trying to find tweets by several keywords and without url included in tweet.
But search results are not up to our satisfaction. Looks like query has erros and was stopped. Additionally we had observed that results were returned one-by-one not (as previously) in bulk packs of 100.
Could you please tell me why this search does not work properly?
We wanted to get all tweets mentioning 'Amazon' without any URL links in the text.
We used search shown below. Search results were still containing tweets with URLs or without 'Amazon' keyword.
Could you please let us know what we are doing wrong?
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
searchQuery = 'Amazon OR AMAZON OR amazon filter:-links' # Keyword
new_tweets = api.search(q=searchQuery, count=100,
result_type = "recent",
max_id = sinceId,
lang = "en")
The minus sign should be put before "filter", not before "links", like this:
searchQuery = 'Amazon OR AMAZON OR amazon -filter:links'
Also, I doubt that the count = 100 option is a valid one, since it is not listed on the API documentation (which may not be very up-to-date, though). Try to replace that with rpp = 100 to get tweets in bulk packs.
I am not sure why some of the tweets you find do not contain the "Amazon" keyword, but a possibility is that "Amazon" is contained within the username of the poster. I do not know if you can filter that directly in the query, or even if you would want to filter it, since it would mean you would reject tweets from the official Amazon accounts. I would suggest that, for each tweet the query returns, you check it to make sure it does contain "Amazon".
Currently I have a method that retrieves all ~119,000 gmail accounts and writes them to a csv file using python code below and the enabled admin.sdk + auth 2.0:
def get_accounts(self):
students = []
page_token = None
params = {'customer': 'my_customer'}
while True:
try:
if page_token:
params['pageToken'] = page_token
current_page = self.dir_api.users().list(**params).execute()
students.extend(current_page['users'])
# write each page of data to a file
csv_file = CSVWriter(students, self.output_file)
csv_file.write_file()
# clear the list for the next page of data
del students[:]
page_token = current_page.get('nextPageToken')
if not page_token:
break
except errors.HttpError as error:
break
I would like to retrieve all 119,000 as a lump sum, that is, without having to loop or as a batch call. Is this possible and if so, can you provide example python code? I have run into communication issues and have to rerun the process multiple times to obtain the ~119,000 accts successfully (takes about 10 minutes to download). Would like to minimize communication errors. Please advise if better method exists or non-looping method also is possible.
There's no way to do this as a batch because you need to know each pageToken and those are only given as the page is retrieved. However, you can increase your performance somewhat by getting larger pages:
params = {'customer': 'my_customer', 'maxResults': 500}
since the default page size when maxResults is not set is 100, adding maxResults: 500 will reduce the number of API calls by an order of 5. While each call may take slightly longer, you should notice performance increases because you're making far fewer API calls and HTTP round trips.
You should also look at using the fields parameter to only specify user attributes you need to read in the list. That way you're not wasting time and bandwidth retrieving details about your users that your app never uses. Try something like:
my_fields = 'nextPageToken,users(primaryEmail,name,suspended)'
params = {
'customer': 'my_customer',
maxResults': 500,
fields: my_fields
}
Last of all, if your app retrieves the list of users fairly frequently, turning on caching may help.
I have 2 servers with Haystack:
Server1: This has elasticsearch installed
Server2: This doesn't have elasticsearch, the queries are made to Server1
My issue is about pagination when I make queries from Server2 to Server1:
Server2 makes query to Server1
Server1 send all the results back to Server2
Server2 makes the pagination
But this is not optimal, if the query return 10.000 objects, the query will be slow.
I know that you can send to elasticsearch some values in the query (size, from and to) but I don't know if this is possible using Haystack, I've checked documentation and googled it and found nothing.
How could I configure the query in Haystack to receive the results 10 by 10 ?
Edit
Is possible that if I make SearchQuerySet()[10000:10010] it will only ask for this 10 items ?
Or it will ask for all the items and then filter them ?
Edit2
I found this on Haystack Docs:
SearchQuery API - set_limits
it seems a function to do whatt I'm trying to do:
Restricts the query by altering either the start, end or both offsets.
And then I tried to do:
from haystack.query import SearchQuerySet
sqs = SearchQuerySet()
sqs.query.set_limits(low=0, high=4)
sqs.filter(content='anything')
The result is the full list, like I never add the set_limit line
Why is not working ?
Haystack works kinda different from Django ORM. After limiting the queryset, you should call get_results() in order to get limited results. This is actually smart, because it avoids multiple requests from Elastic.
Example:
# Assume you have 800 records.
sqs = SearchQuerySet()
sqs.query.set_limits(low=0, high=4)
len(sqs) # Will return 800 records
len(sqs.get_results()) # Will return first 4 records.
Hope that it helps.
Adding on to the Yigit answer, if you want to have these offsets on filtered records just add filter condition when you form the SearchQuerySet.
Also remember once the limits are set you can't change them by setting them again. You would need to form the SearchQuerySet() again, or there is a method to clear the limits.
results = SearchQuerySet().filter(content="keyword")
#we have a filtered resultSet now let's find specific records
results.query.set_limits(0,4)
return results.query.get_results()
When using Amazon's web service to get any product's information, is there a direct way to get the Average Customer Rating (1-5 stars)? Here are the parameters I'm using:
Service=AWSECommerceService
Version=2011-08-01
Operation=ItemSearch
SearchIndex=Books
Title=A Game of Thrones
ResponseGroup=Large
I would expect it to have a customer rating of 4.5 and total reviews of 2177. But instead I get the following in the response.
<CustomerReviews><IFrameURL>http://www.amazon.com/reviews/iframe?...</IFrameURL></CustomerReviews>
Is there a way to get the overall customer rating, besides for reading the <IFrameURL/> value, making another HTTP request for that page of reviews, and then screen scraping the HTML? That approach is fragile since Amazon could easily change the reviews page structure which would bust my application.
You can scrape from here. Just replace the asin with what you need.
http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=B000P0ZSHK
As far as i know, Amazon changed it's API so its not possible anymore to get the reviewrank information. If you check this Link the note sais:
As of November 8, 2010, only the iframe URL is returned in the request
content.
However, testing with the params you used to get the Iframe it seems that now even the Iframe dosn't work anymore. Thus, even in the latest API Reference in the chapter "Motivating Customers to Buy" the part "reviews" is compleatly missing.
However: Since i'm also very interested if its still possible somehow to get the reviewrank information - maybe even not using amazon API but a competitors API to get review rank informations - i'll set up a bounty if anybody can provide something helpful on that. Bounty will be set in this topic in two days.
You can grab the iframe review url and then use css to position it so only the star rating shows. It's not ideal since you're not getting raw data, but it's an easy way to add the rating to your page.
Sample of this in action - http://spamtech.co.uk/positioning-content-inside-an-iframe/
Here is a VBS script that would scrape the rating. Paste the code below to a text file, rename it to Test.vbs and double click to run on Windows.
sAsin = InputBox("What is your ASIN?", "Amazon Standard Identification Number (ASIN)", "B000P0ZSHK")
if sAsin <> "" Then
sHtml = SendData("http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=" & sAsin)
sRating = ExtractHtml(sHtml, "<span class=""a-size-base a-color-secondary"">(.*?)<\/span>")
sReviews = ExtractHtml(sHtml, "<a class=""a-size-small a-link-emphasis"".*?>.*?See all(.*?)<\/a>")
MsgBox sRating & vbCrLf & sReviews
End If
Function ExtractHtml(sHtml,sPattern)
Set oRegExp = New RegExp
oRegExp.Pattern = sPattern
oRegExp.IgnoreCase = True
Set oMatch = oRegExp.Execute(sHtml)
If oMatch.Count = 1 Then
ExtractHtml = Trim(oMatch.Item(0).SubMatches(0))
End If
End Function
Function SendData(sUrl)
Dim oHttp 'As XMLHTTP30
Set oHttp = CreateObject("Msxml2.XMLHTTP")
oHttp.open "GET", sUrl, False
oHttp.send
SendData = Replace(oHttp.responseText,vbLf,"")
End Function
Amazon has completely removed support for accessing rating/review information from their API. The docs mention a Response Element in the form of customer rating, but that doesn't work either.
Google shopping using Viewpoints for some reviews and other sources
This is not possible from PAPI. You either need to scrape it by yourself, or you can use other free/cheaper third-party alternatives for that.
We use the amazon-price API from RapidAPI for this, it supports price/rating/review count fetching for up to 1000 products in a single request.