Reevoo API filtering - python-2.7

I am using Python to query the Reevoo API. As far as I can tell, the options for filtering are somewhat limited and the docs are an exhaustive list of what query parameters you can use. I was wondering if anybody had found a way to filter customer experience reviews with a date range?
Currently my hack solution is to use a generator which calls the API page by page and yields the review if its publish_date is after a certain date, which is obviously really inefficient. It doesn't help that the API returns the results slightly out of order, so I can't break/return as soon as I find one review that's out of range.
for i in range(number_of_pages, 0, -1):
# API call wrapper
page_of_reviews = self.reevoo.get_customer_experience_review_list(self.trkref, older_reviews=True,
page=i, per_page=30)
page_of_reviews = json.loads(page_of_reviews.text.replace('\r\n', ''))
customer_experience_reviews = page_of_reviews.get('customer_experience_reviews')
processed_reviews = self.process_customer_experience_reviews(customer_experience_reviews)
for item in processed_reviews['review_list']:
if from_dt:
if datetime.strptime(item['publish_date'], '%Y-%m-%d') >= datetime.strptime(from_dt, '%Y-%m-%d'):
yield item
else:
yield item
I've scoured the docs and Reevoo's GitHub page and haven't found anything, but in the hopes that some random person on the Internet has found a workaround... Does anyone have any ideas?

I emailed Reevoo to ask about date filtering and the short answer is that there is no way to filter or sort by date.
Explanation from the email:
Unfortunately, we cannot filter reviews by date as when we display the reviews, they are not necessarily in date order. For example, reviews with written content come before those which don't have written content as they have more value to the consumer. We would also prefer that you refreshed everything at least once a day, because older reviews sometimes have to be renewed or customers may sometimes request that there reviews be amended.
I understand why you would lie to do date filtering but at the moment, if you are caching reviews on your server, this is the way we prefer you to do it.

Related

Django Designing Models for Dating App Matches

I’m working on a dating app for a hackathon project. We have a series of questions that users fill out, and then every few days we are going to send suggested matches. If anyone has a good tutorial for these kinds of matching algorithms, it would be very appreciated. One idea is to assign a point value for each question and then to do a
def comparison(person_a, person_b) function where you iterate through these questions, and where there’s a common answer, you add in a point. So the higher the score, the better the match. I understand this so far, but I’m struggling to see how to save this data in the database.
In python, I could take each user and then iterate through all the other users with this comparison function and make a dictionary for each person that lists all the other users and a score for them. And then to suggest matches, I iterate through the dictionary list and if that person hasn’t been matched up already with that person, then make a match.
person1_dictionary_of_matches = {‘person2’: 3, ‘person3’: 5, ‘person4’: 10, ‘person5’: 12, ‘person6’: 2,……,‘person200’:10}
person_1_list_of_prior_matches = [‘person3’, 'person4']
I'm struggling on how to represent this in django. I could have a bunch of users and make a Match model like:
class Match(Model):
person1 = models.ForeignKey(User)
person2 = models.ForeignKey(User)
score = models.PositiveIntegerField()
Where I do the iteration and save all the pairwise scores.
and then do
person_matches = Match.objectsfilter(person1=sarah, person2!=sarah).order_by('score').exclude(person2 in list_of_past_matches)
But I’m worried with 1000 users, I will have 1000000 rows in my table if do this. Will this be brutal to have to save all these pairwise scores for each user in the database? Or does this not matter if I run it at like Sunday night at 1am or just cache these responses once and use the comparisons for a period of months? Is there a better way to do this than matching everyone up pairwise? Should I use some other data structure to capture the people and their compatibility score? Thanks so much for any guidance!
Interesting question. In machine learning's current paradigm you work with sparse matrices that means that you would not have to perform every single match evaluation. The sparsity may come from two alternatives:
Create a batch offline analysis of your data to perform some clustering (fancy solution).
Filter the individuals by some key attributes: a) gender/sexual preference, b) geographical location, c) dating status etc. (simple solution)
After the filtering you could perform a function for estimating appropriate matches for the new user. Based on the selected choices of the user adscribe selected matches into the database for future queries. However, if you get serious about this problem I suggest you give Spark a try. This is not a problem for an SQL database but for a Big Data Engine.

Feed Algorithm + Database: Either too many rows or too slow retrieval

Say I have a general website that allows someone to download their feed in a small amount of time. A user can be subscribed to many different pages, and the user's feed must be returned to the user from the server with only N of the most recent posts between all of the pages subscribed to. Originally when a user queried the server for a feed, the algorithm was as follows:
look at all of the pages a user subscribed to
getting the N most recent posts from each page
sorting all of the posts
return the N most recent posts to the user as their feed
As it turns out, doing this EVERY TIME a user tried to refresh a feed was really slow. Thus, I changed the database to have a table of feedposts, which simply has a foreign key to a user and a foreign key to the post. Every time a page makes a new post, it creates a feed post for each of its subscribing followers. That way, when a user wants their feed, it is already created and does not have to be created upon retrieval.
The way I am doing this is creating far too many rows and simply does not seem scalable. For instance, if a single page makes 1 post & has 1,000,000 followers, we just created 1,000,000 new rows in our feedpost table.
Please help!
How do companies such as facebook handle this problem? Do they generate the feed upon request? Are my database relationships terrible?
It's not that the original schema itself would be inherently wrong, at least not based on the high-level description you have provided. The slowness stems from the fact that you're not accessing the database in a way relational databases should be accessed.
In general, when querying a relational database, you should use JOINs and in-database ordering where possible, instead of fetching a bunch of data, and then trying to connect related objects and sort them in your code. If you let the database do all this for you, it will be much faster, because it can take advantage of indices, and only access those objects that are actually needed.
As a rule of thumb, if you need to sort the results of a QuerySet in your Python code, or loop through multiple querysets and combine them somehow, you're most likely doing something wrong and you should figure out how to let the database do it for you. Of course, it's not true every single time, but certainly often enough.
Let me try to illustrate with a simple piece of code. Assume you have the following models:
class Page(models.Model):
name = models.CharField(max_length=47)
followers = models.ManyToManyField('auth.User', related_name='followed_pages')
class Post(models.Model):
title = models.CharField(max_length=147)
page = models.ForeignKey(Page, related_name='posts')
content = models.TextField()
time_published = models.DateTimeField(auto_now_add=True)
You could, for example, get the list of the last 20 posts posted to pages followed by the currently logged in user with the following single line of code:
latest_posts = Post.objects.filter(page__followers=request.user).order_by('-time_published')[:20]
This runs a single SQL query against your database, which only returns the (up to) 20 results that match, and nothing else. And since you're joining on primary keys of all tables involved, it will conveniently use indices for all joins, making it really fast. In fact, this is exactly the kind of operation relational databases were designed to perform efficiently.
Caching will be the solution here.
You will have to reduce the database reads, which are much slower as compared to cache reads.
You can use something like Redis to cache the post.
Here is an amazing answer for better understanding
Is Redis just a cache
Each page can be assigned a key, and you can pull all of the posts for that page under that key.
you need not to cache everything , just cache resent M posts, where M>>N and safe enough to reduce the database calls.Now if in case user requests for posts beyond the latesd M ones, then they can be directly fetched from the DB.
Now when you have to generate the feed you can make a DB call to get all of the subscribed pages(or you can put in the cache as well) and then just get the required number of post's from the cache.
The problem here would be keeping the cache up-to date.
For that you can use something like django-signals. Whenever a new post is added, add it to the cache as well using the signal.
So for each DB write you will have to write to cache as well.
But then you will not have to read from DB and as Redis is a in memory datastore it is pretty fast as compared to standard relational databases.
Edit:
These are a few more articles which can help for better understanding
Does Stack Exchange use caching and if so, how
How Twitter Uses Redis to Scale - 105TB RAM, 39MM QPS, 10,000+ Instances

Musicbrainz querying artist and release

I am trying to get an artist and their albums. So reading this page https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2 i created the following query to get Michael Jackson's albums
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson?inc=releases+recordings
My understanding is to add ?inc=releases+recordings at the end of the URL which should return Michael Jackson's albums however this doesnt seem to return the correct results or i cant seem to narrow down the results? I then thought to use the {MBID} but again thats not returned in the artists query (which is why im trying to use inc in my query)
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson
Can anyone suggest where im going wrong with this?
You're not searching for the correct Entity. What you want is to get the discography, not artist's infos. Additionally, query fields syntax is not correct (you must use Lucene Search Syntax).
Here is what you're looking for:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album"
We're targeting the release-group entity to get the albums, searching for a specific artist and filtering the results to limit them to albums. (accepted values are: album, single, ep, other)
There are more options to fit your needs, for example you can filter the type of albums using the secondarytype parameter. Here is the query to retrieve only live albums:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album" AND secondarytype="live"
Here is the doc:
https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2/Search
Note that to be able to use MB's API you need to understand how it is structured, especially, the relations between release_group, release and medium.

Querying earliest post of a Facebook user using Facebook Graph API or FQL

I would like to query earliest posts of a Facebook user using FQL or Graph API. The big issue is by default, Facebook limit return items, which are ordered by descending time.
I know I can limit my query by until, but I don't know what date to put in, because I have no idea when my user become Facebook member. I have to do search like:
find post until Jan 2006
if null, then find post until Jan 2007
if null, then find post until Jan 2008
....
which I hate so much.
Is there a smarter way to find out earliest posts by user?
First off, it's near impossible to have an all encompassing program that determines when a user joined Facebook, to put it quite bluntly. I know from your past questions, you have been trying but many have tried before you, it's not possible.
For example what happens if no one decides to write anything on my wall from the date I joined to 1 year after? That indicator becomes pretty inaccurate now does it?
Anything smarter is based on assumptions that may or may not hold true.
e.g.
Assumption 1: Every Facebook user would publish a post on or near when they joined
this give an initial guess based on A1
Assumptions 2: Given A1, any post by a friend on a user's wall that is posted before the unix time returned by A1 will be earlier in date
this will always be true as long as A1 holds.
All of this falls when there is a year between actual activity and join date.
You can minimize the set returned by calling less data per item and more items overall
/me/feed?fields=created_time&limit=200
Then you page until there is no next paging parameter left.
If you are indeed trying to find when did a user join Facebook, I agree with phwd's answer.
The best way I have been able to find out (which is also cheaper than having to reiterate through tons of posts) is accessing the earliest "profile pictures" of the user. This is making the assumption that a user would post a profile picture soon after creating their account.
Once you can get access to "Profile Pictures" album, you might be able to use created_time field for the album (or sort Profile Pictures by created_time for individual photos).
Even if the earliest photo was deleted, what are the chances that the user stays without any profile picture for a long time?
Reference:
https://developers.facebook.com/docs/graph-api/reference/v2.0/album

Find User's First Post?

Using the Graph API or FQL, is there a way to efficiently find a user's first post or status? As in, the first one they ever made?
The slow way, I assume, would be to paginate through the feed, but for users like me who joined in 2005 or earlier, that would take a very long time with a huge amount of API calls.
From what I have found, we cannot obtain the date the user registered with Facebook for a good starting point, and we cannot sort by date ascending (not outside of the single page of data returned) to get the oldest post on top.
Is there any reasonable way to do this?
you can use facebook query language (FQL) to get first post information.
Please refer below query for more details :-
SELECT message, time FROM status WHERE uid= me() ORDER BY time ASC LIMIT 1
Please check and let me know in case of any issue.
Thanks and Regards
Durgaprasad
I think the Public API is limited to the depth of information it is allowed to query. Facebook probably put in these constraints for performance and cost concerns. Maybe they've changed it. When I tried to go backwards thru a person's stream about 4 months ago, there seemed to be a limit as to how far back I could go. Maybe it's a time limit or a # posts back limit. If you know when your user first posted, then getting to it should be fairly quick using the since/until time stamps in your queries.