Getting users' latest tweets with Django

I want to create a function which grabs every user's latest tweet from a specific group. So, if a user is in the 'authors' group, I want to grab their latest tweet and then cache the result for the day, so we only do the crazy leg work once.
def latest_tweets(self):
    g = Group.objects.get(name='author')
    users = []
    for u in g.user_set.all():
        acc = u.get_profile().twitter_account
        users.append('http://twitter.com/statuses/user_timeline/' + acc + '.rss')
    return users
That is where I am so far, but I'm at a complete loss as to how to parse the RSS to get their latest tweet. Can anyone help me out here? If there is a better way to do this, any suggestions are welcome! I'm sure someone will suggest using django-twitter or other such libraries, but I'd like to do this manually if possible.
Cheers

Why reinvent the wheel? You can download/install/import python-twitter and do something like:
tweet = twitter.Api().GetUserTimeline( u.get_profile().twitter_account )[0]
http://code.google.com/p/python-twitter/
an example: http://www.omh.cc/blog/2008/aug/4/adding-your-twitter-status-django-site/

RSS can be parsed by any XML parser. I've used the built-in module htmllib before for a different task and found it easy to deal with. If all you're doing is RSS parsing, though, I'd recommend feedparser. I haven't used it before, but it seems pretty straightforward.
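For the manual route the question asks about, the RSS is plain XML, so the standard library alone can do it. A minimal sketch, assuming the usual rss/channel/item feed layout (the sample feed below is made up):

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>user timeline</title>
    <item><title>newest tweet</title></item>
    <item><title>older tweet</title></item>
  </channel>
</rss>"""

def latest_item_title(rss_text):
    """Return the title of the first <item> in an RSS document
    (Twitter's RSS timelines listed the newest entry first)."""
    root = ET.fromstring(rss_text)
    first_item = root.find('./channel/item')
    return first_item.findtext('title') if first_item is not None else None

print(latest_item_title(SAMPLE_RSS))  # -> newest tweet
```

In real use you would first fetch each URL from the `users` list (e.g. with `urllib2.urlopen(url).read()`) and pass the body to this function.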

If you go with python-twitter it is pretty simple. This is from memory so forgive me if I make a mistake here.
from django.core.cache import cache
from datetime import datetime
import twitter

TWITTER_USER = 'username'
TWITTER_TIMEOUT = 3600

def latest_tweet(request):
    tweet = cache.get('tweet')
    if tweet:
        return {"tweet": tweet}
    api = twitter.Api()
    tweets = api.GetUserTimeline(TWITTER_USER)
    tweet = tweets[0]
    tweet.date = datetime.strptime(
        tweet.created_at, "%a %b %d %H:%M:%S +0000 %Y"
    )
    cache.set('tweet', tweet, TWITTER_TIMEOUT)
    return {"tweet": tweet}

Related

How to mix multiple querysets into one and re order them by time created?

I am learning Django and am still a beginner. For practice, I am trying to make a demo social media website. In my project, users can create groups, then post and comment in them. On the home page, I am trying to add a 'recent activities' section where a user can see recent activity on the website, like "John created a group 'Javascript', Tim posted a comment in 'Python', Sarah posted in 'CSS'". So far I have made some queries like:
groups = Group.objects.all().order_by('-created')[0:5]
posts = Post.objects.all().order_by('-created')[0:5]
comments = Comment.objects.all().order_by('-created')[0:5]
I want to mix them all into a single queryset, then order them by the time they were created. I know it's a silly question, but I have been stuck here since morning. Can you help me and show me the process, please?
You can chain these together and order by the created field with:
from operator import attrgetter

groups = Group.objects.order_by('-created')[:5]
posts = Post.objects.order_by('-created')[:5]
comments = Comment.objects.order_by('-created')[:5]

all_items = sorted(
    [*groups, *posts, *comments],
    key=attrgetter('created'),
    reverse=True
)
Now all_items is a heterogeneous list with different types of objects. This will make the rendering process a bit more complicated, since a Comment probably has different fields than a Post, for example.
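One way to keep the rendering manageable is to tag each merged item with its model name, so a template (or any other consumer) can dispatch on it. A standalone sketch with stand-in classes; the real Group/Post/Comment models would work the same way:

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter

# Stand-ins for the real Django models
@dataclass
class Group:
    created: datetime

@dataclass
class Post:
    created: datetime

items = [Post(created=datetime(2023, 1, 2)), Group(created=datetime(2023, 1, 3))]
merged = sorted(items, key=attrgetter('created'), reverse=True)

# Each entry carries its type name, usable e.g. in a template as
# {% if item.kind == 'Group' %} ... {% endif %}
labelled = [{'kind': obj.__class__.__name__, 'obj': obj} for obj in merged]
print([entry['kind'] for entry in labelled])  # -> ['Group', 'Post']
```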
You can also use the chain function from the itertools module to combine the querysets, and then sort them in reverse order using the created field as the key.
from itertools import chain

groups = Group.objects.order_by('-created')[0:5]
posts = Post.objects.order_by('-created')[0:5]
comments = Comment.objects.order_by('-created')[0:5]

queryset = sorted(
    chain(groups, posts, comments),
    key=lambda instance: instance.created,
    reverse=True
)

ElasticSearch - bulk indexing for a completion suggester in python

I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elasticsearch 5.2.x and elasticsearch-dsl). After trying to figure this out for a long time, I have not yet been able to work out how to bulk index the suggester. Here's my code:
class SchoolIndex(DocType):
    name = Text()
    school_type = Keyword()
    name_suggest = Completion()
Bulk indexing as follows:
def bulk_indexing():
    SchoolIndex.init(index="school_index")
    es = Elasticsearch()
    bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And have defined an indexing method in models.py:
def indexing(self):
    obj = SchoolIndex(
        meta={'id': self.pk},
        name=self.name,
        school_type=self.school_type,
        name_suggest={'input': self.name}  # <--- what goes in here?
    )
    obj.save(index="school_index")
    return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest = statement above that will match the corresponding field when searched. But my question is: how do I do that with a ton of records? I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions, for example, using each word in the phrase as a term. I could come up with something like that on my own (by breaking each phrase into words), but it seems counter-intuitive to do that myself, since I'd guess there would already be a default way that the user could further tweak if needed. But I couldn't find anything like that on SO/blogs/ES docs/elasticsearch-dsl docs after searching for quite some time. (This post by Adam Wattis was very helpful in getting me started, though.) Will appreciate any pointers.
I think I figured it out (phew!).
In the indexing function, I need to use the following to enable the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!
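For the custom {'input': ...} case, one common use is supplying each word of the name as a separate input, so that mid-phrase words also prefix-match (the behavior speculated about in the question). A pure-Python sketch of building such a payload; the helper name is mine, not part of elasticsearch-dsl:

```python
def suggest_inputs(name):
    """Build a completion-suggester payload: the full name plus each
    individual word, so 'High' also prefix-matches 'Boston High School'."""
    words = name.split()
    return {'input': [name] + [w for w in words if w != name]}

payload = suggest_inputs('Boston High School')
print(payload['input'])  # -> ['Boston High School', 'Boston', 'High', 'School']
```

The resulting dict would then be assigned to name_suggest in the indexing method instead of the bare string.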

GQL Queries - Retrieving specific data from query object

I'm building a database using Google Datastore. Here is my model...
class UserInfo(db.Model):
    name = db.StringProperty(required=True)
    password = db.StringProperty(required=True)
    email = db.StringProperty(required=False)
...and below is my GQL query. How would I go about retrieving the user's password and ID from the user_data object? I've gone through all the Google documentation but find it hard to follow, and I've spent ages trying various things I've read online, but nothing has helped! I'm on Python 2.7.
user_data = db.GqlQuery('SELECT * FROM UserInfo WHERE name=:1', name_in)
user_info = user_data.get()
This is basic Python.
From the query, you get a UserInfo instance, which you have stored in the user_info variable. You can access the data of an instance via dot notation: user_info.password and user_info.email.
If this isn't clear, you really should do a basic Python tutorial before going any further.
You are almost there. Treat the result like any other object instance:
name = user_info.name
Documentation on queries here gives some examples
Here are some Python tips that might help you:
dir(user_info)
help(user_info)
You can also print almost anything, like
print user_info[0]
print user_info[0].name
Set up logging for your app
Logging and python etc

Django-haystack (xapian) autocomplete giving incomplete results

I have a Django site running django-haystack with Xapian as a backend. I got my autocomplete working, but it's giving back weird results: the results coming back from the SearchQuerySet are incomplete.
For example, I have the following data...
['test', 'test 1', 'test 2']
And if I type in 't', 'te', or 'tes' I get nothing back. However, if I type in 'test' I get back all of the results, as would be expected.
I have something looking like this...
results = SearchQuerySet().autocomplete(auto=q).values('auto')
And my search index looks like this...
class FacilityIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    created = DateTimeField(model_attr='created')
    auto = EdgeNgramField(model_attr='name')

    def get_model(self):
        return Facility

    def index_queryset(self):
        return self.get_model().objects.filter(created__lte=datetime.datetime.now())
Any tips are appreciated. Thanks.
A bit late, but you need to check the minimum ngram size that is being indexed. It is most likely 4 characters, so it won't match on anything with fewer characters than that. I am not a Xapian user, though, so I don't know how to change this configuration option for that backend.
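The symptom is easy to reproduce in isolation. A toy sketch of edge n-gram generation (not the actual Xapian implementation, just an illustration of what a minimum gram size does):

```python
def edge_ngrams(term, min_gram=4, max_gram=15):
    """All prefixes of term between min_gram and max_gram characters long."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

print(edge_ngrams('test'))              # -> ['test']
print(edge_ngrams('test', min_gram=2))  # -> ['te', 'tes', 'test']
```

With a minimum of 4, the index holds no gram that 't', 'te', or 'tes' can match, which is exactly the behavior described in the question.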

Is there an Amazon.com API to retrieve product reviews? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
Do any of the AWS APIs/Services provide access to the product reviews for items sold by Amazon? I'm interested in looking up reviews by (ASIN, user_id) tuple. I can see that the Product Advertising API returns a URL to a page (for embedding in an IFRAME) containing the URLs, but I am interested in a machine-readable format of the review data, if possible.
Update 2:
Please see @jpillora's comment. It's probably the most relevant regarding Update 1.
I just tried out the Product Advertising API (as of 2014-09-17), it seems that this API only returns a URL pointing to an iframe containing just the reviews. I guess you'd have to screen scrape - though I imagine that would break Amazon's TOS.
Update 1:
Maybe. I wrote the original answer below earlier. I don't have time to look into this right now because I'm no longer on a project concerned with Amazon reviews, but their webpage at Product Advertising API states "The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews..." as of 2011-12-08. So I hope someone looks into it and posts back here; feel free to edit this answer.
Original:
Nope.
Here is an interesting forum discussion about this, including theories as to why: http://forums.digitalpoint.com/showthread.php?t=1932326
If I'm wrong, please post what you find. I'm interested in getting the reviews content, as well as allowing submitting reviews to Amazon, if possible.
You might want to check this link: http://reviewazon.com/. I just stumbled across it and haven't looked into it, but I'm surprised I don't see any mention on their site about the update concerning the drop of Reviews from the Amazon Products Advertising API posted at: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Here's my quick take on it. You can retrieve the reviews themselves with a bit more work:
import urllib2

countries = ['com', 'co.uk', 'ca', 'de']
books = [
    '''http://www.amazon.%s/Glass-House-Climate-Millennium-ebook/dp/B005U3U69C''',
    '''http://www.amazon.%s/The-Japanese-Observer-ebook/dp/B0078FMYD6''',
    '''http://www.amazon.%s/Falling-Through-Water-ebook/dp/B009VJ1622''',
]

for book in books:
    print '-' * 40
    print book.split('%s/')[1]
    for country in countries:
        asin = book.split('/')[-1]
        url = '''http://www.amazon.%s/product-reviews/%s''' % (country, asin)
        try:
            page = urllib2.urlopen(url).read().lower()
        except urllib2.URLError:
            page = ""
        print '%s=%s' % (country, page.count('member-review'))
print '-' * 40
According to the Amazon Product Advertising API License Agreement (https://affiliate-program.amazon.com/gp/advertising/api/detail/agreement.html), and specifically its point 4.b.iii:
You will use Product Advertising Content only ... to send end users to and drive sales on the Amazon Site.
which means it's prohibited for you to show Amazon product reviews taken through their API to sell products on your own site. You are only allowed to redirect your site visitors to Amazon and earn the affiliate commissions.
I would use something like the answer of @mfs above. Unfortunately, his/her answer would only work for up to 10 reviews, since this is the maximum that can be displayed on one page.
You may consider the following code:
import re
import requests

nreviews_re = {'com':   re.compile('\d[\d,]+(?= customer review)'),
               'co.uk': re.compile('\d[\d,]+(?= customer review)'),
               'de':    re.compile('\d[\d\.]+(?= Kundenrezens\w\w)')}
no_reviews_re = {'com':   re.compile('no customer reviews'),
                 'co.uk': re.compile('no customer reviews'),
                 'de':    re.compile('Noch keine Kundenrezensionen')}

def get_number_of_reviews(asin, country='com'):
    url = 'http://www.amazon.{country}/product-reviews/{asin}'.format(country=country, asin=asin)
    html = requests.get(url).text
    try:
        return int(re.compile('\D').sub('', nreviews_re[country].search(html).group(0)))
    except:
        if no_reviews_re[country].search(html):
            return 0
        else:
            return None  # to distinguish from 0, and handle more cases if necessary
Running this with 1433524767 (which has a significantly different number of reviews for the three countries of interest) I get:
>>> print get_number_of_reviews('1433524767', 'com')
3185
>>> print get_number_of_reviews('1433524767', 'co.uk')
378
>>> print get_number_of_reviews('1433524767', 'de')
16
Hope it helps
As said by others above, Amazon has discontinued providing reviews in its API. However, I found this nice tutorial to do the same with Python. Here is the code he gives, and it works for me! He uses Python 2.7.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written as part of https://www.scrapehero.com/how-to-scrape-amazon-product-reviews-using-python/
from lxml import html
import json, re
import requests
from dateutil import parser as dateparser
from time import sleep

def ParseReviews(asin):
    # This script has only been tested with Amazon.com
    amazon_url = 'http://www.amazon.com/dp/' + asin
    # Add some recent user agent to prevent amazon from blocking the request
    # Find some chrome user agent strings here https://udger.com/resources/ua-list/browser-detail?browser=Chrome
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    page = requests.get(amazon_url, headers=headers).text

    parser = html.fromstring(page)
    XPATH_AGGREGATE = '//span[@id="acrCustomerReviewText"]'
    XPATH_REVIEW_SECTION = '//div[@id="revMHRL"]/div'
    XPATH_AGGREGATE_RATING = '//table[@id="histogramTable"]//tr'
    XPATH_PRODUCT_NAME = '//h1//span[@id="productTitle"]//text()'
    XPATH_PRODUCT_PRICE = '//span[@id="priceblock_ourprice"]/text()'

    raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
    product_price = ''.join(raw_product_price).replace(',', '')
    raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
    product_name = ''.join(raw_product_name).strip()
    total_ratings = parser.xpath(XPATH_AGGREGATE_RATING)
    reviews = parser.xpath(XPATH_REVIEW_SECTION)

    ratings_dict = {}
    reviews_list = []

    # grabbing the rating section in product page
    for ratings in total_ratings:
        extracted_rating = ratings.xpath('./td//a//text()')
        if extracted_rating:
            rating_key = extracted_rating[0]
            raw_rating_value = extracted_rating[1]
            rating_value = raw_rating_value
            if rating_key:
                ratings_dict.update({rating_key: rating_value})

    # Parsing individual reviews
    for review in reviews:
        XPATH_RATING = './div//div//i//text()'
        XPATH_REVIEW_HEADER = './div//div//span[contains(@class,"text-bold")]//text()'
        XPATH_REVIEW_POSTED_DATE = './/a[contains(@href,"/profile/")]/parent::span/following-sibling::span/text()'
        XPATH_REVIEW_TEXT_1 = './/div//span[@class="MHRHead"]//text()'
        XPATH_REVIEW_TEXT_2 = './/div//span[@data-action="columnbalancing-showfullreview"]/@data-columnbalancing-showfullreview'
        XPATH_REVIEW_COMMENTS = './/a[contains(@class,"commentStripe")]/text()'
        XPATH_AUTHOR = './/a[contains(@href,"/profile/")]/parent::span//text()'
        XPATH_REVIEW_TEXT_3 = './/div[contains(@id,"dpReviews")]/div/text()'

        raw_review_author = review.xpath(XPATH_AUTHOR)
        raw_review_rating = review.xpath(XPATH_RATING)
        raw_review_header = review.xpath(XPATH_REVIEW_HEADER)
        raw_review_posted_date = review.xpath(XPATH_REVIEW_POSTED_DATE)
        raw_review_text1 = review.xpath(XPATH_REVIEW_TEXT_1)
        raw_review_text2 = review.xpath(XPATH_REVIEW_TEXT_2)
        raw_review_text3 = review.xpath(XPATH_REVIEW_TEXT_3)

        author = ' '.join(' '.join(raw_review_author).split()).strip('By')
        # cleaning data
        review_rating = ''.join(raw_review_rating).replace('out of 5 stars', '')
        review_header = ' '.join(' '.join(raw_review_header).split())
        review_posted_date = dateparser.parse(''.join(raw_review_posted_date)).strftime('%d %b %Y')
        review_text = ' '.join(' '.join(raw_review_text1).split())

        # grabbing hidden comments if present
        if raw_review_text2:
            json_loaded_review_data = json.loads(raw_review_text2[0])
            json_loaded_review_data_text = json_loaded_review_data['rest']
            cleaned_json_loaded_review_data_text = re.sub('<.*?>', '', json_loaded_review_data_text)
            full_review_text = review_text + cleaned_json_loaded_review_data_text
        else:
            full_review_text = review_text
        if not raw_review_text1:
            full_review_text = ' '.join(' '.join(raw_review_text3).split())

        raw_review_comments = review.xpath(XPATH_REVIEW_COMMENTS)
        review_comments = ''.join(raw_review_comments)
        review_comments = re.sub('[A-Za-z]', '', review_comments).strip()

        review_dict = {
            'review_comment_count': review_comments,
            'review_text': full_review_text,
            'review_posted_date': review_posted_date,
            'review_header': review_header,
            'review_rating': review_rating,
            'review_author': author
        }
        reviews_list.append(review_dict)

    data = {
        'ratings': ratings_dict,
        'reviews': reviews_list,
        'url': amazon_url,
        'price': product_price,
        'name': product_name
    }
    return data

def ReadAsin():
    # Add your own ASINs here
    AsinList = ['B01ETPUQ6E', 'B017HW9DEW']
    extracted_data = []
    for asin in AsinList:
        print "Downloading and processing page http://www.amazon.com/dp/" + asin
        extracted_data.append(ParseReviews(asin))
        sleep(5)
    f = open('data.json', 'w')
    json.dump(extracted_data, f, indent=4)

if __name__ == '__main__':
    ReadAsin()
Here is the link to his website: reviews scraping with Python 2.7.
Unfortunately, you can only get an iframe URL with the reviews; the content itself is not accessible.
Source: http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/DG/CHAP_MotivatingCustomerstoBuy.html#GettingCustomerReviews
Check out RapidAPI: https://rapidapi.com/blog/amazon-product-reviews-api/
By using this API we can get Amazon product reviews.
You can use the Amazon Product Advertising API. It has a Response Group 'Reviews' which you can use with the 'ItemLookup' operation. You need to know the ASIN, i.e. the unique item ID of the product.
Once you set all the parameters and execute the signed URL, you will receive an XML response which contains a link to the customer reviews under the "IFrameURL" tag.
Use this URL, and use pattern searching in the HTML returned from it to extract the reviews. Each review in the HTML has a unique review ID, and under that you can get all the data for that particular review.
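Pulling the IFrameURL out of the ItemLookup response is plain XML handling. A sketch against a made-up response fragment (real responses are namespaced, which is why the search below strips namespaces rather than using a fixed path):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ItemLookupResponse>
  <Items><Item>
    <CustomerReviews>
      <IFrameURL>http://www.amazon.com/reviews/iframe?asin=XYZ</IFrameURL>
      <HasReviews>true</HasReviews>
    </CustomerReviews>
  </Item></Items>
</ItemLookupResponse>"""

def iframe_url(xml_text):
    """Find the first IFrameURL element at any depth, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.split('}')[-1] == 'IFrameURL':
            return elem.text
    return None

print(iframe_url(SAMPLE))  # -> http://www.amazon.com/reviews/iframe?asin=XYZ
```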