I am trying to extract all the posts from a private FB group of which I am an admin. I am using a Python script to access the data. Whether I use the Graph API Explorer via the web, or my Python script, I am having the exact same problem. I am able to gather the first 6 pages of the feed, each page containing 25 posts. The very first request looks like this:
https://graph.facebook.com/<groupID>/feed?access_token=<accessToken>
That will return, as I stated, the latest 25 posts on the group page.
At the bottom of the JSON that is returned for each request is a section like this:
"paging": {
"previous": "https://graph.facebook.com/v13.0/<pageID>/feed?access_token=<tokenID>&pretty=0&until&__previous=1&since=1649789940&__paging_token=<paging_token>",
"next": "https://graph.facebook.com/v13.0/<pageID>/feed?access_token=<tokenID>&pretty=0&until=1647885515&since&__paging_token=<paging_token>&__previous"
}
I use the value in next to launch the next query. This works until I get to the 6th request. At that point when I request the URL in next it spins for about 15 seconds and then I get the following error:
{
"error": {
"code": 1,
"message": "Please reduce the amount of data you're asking for, then retry your request"
}
}
How exactly do I reduce the amount of data I'm requesting? I've tried adding feed.limit() to the request, and it works for the very first request, but that limit is never included in the next URL. Adding it myself in the script still always returns 25 posts, not the limit I set on the first request. So if I set feed.limit(7), the first request returns 7 posts, but when I follow the next link I get 25.
I've set the limit to 100: the first request works and next works the first time, but not the second. If I set the limit to 120, the first query works, but next doesn't. So there seems to be a built-in barrier at 125 posts; it won't give me any more data than that. Any help would be greatly appreciated.
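For reference, the paging loop described in this question can be sketched roughly as follows. This is only an illustration of following paging.next, not a fix for the error; the names fetch_json and iter_feed_pages are made up for this sketch, and the fetch function is injectable so the loop can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (hypothetical helper).

    Note: urlopen raises HTTPError for non-2xx responses, so error
    bodies sent with an HTTP error status surface as exceptions here.
    """
    with urlopen(url) as resp:
        return json.load(resp)


def iter_feed_pages(first_url, fetch=fetch_json, max_pages=None):
    """Yield each page of a feed by following the paging.next links."""
    url = first_url
    count = 0
    while url:
        page = fetch(url)
        # Stop loudly if the response body carries an "error" object.
        if "error" in page:
            message = page["error"].get("message", "unknown Graph API error")
            raise RuntimeError(message)
        yield page
        count += 1
        if max_pages is not None and count >= max_pages:
            break
        # A missing "next" link means there are no more pages.
        url = page.get("paging", {}).get("next")
```

Because it is a generator, the caller can stop iterating at any point and no further requests are made.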
I am trying to do some unit tests using elasticsearch. I first start by using the index API about 100 times to add new data to my index. Then I use the search API with aggs. The problem is if I don't pause for 1 second after adding data 100 times, I get random results. If I wait 1 second I always get the same result.
I'd rather not have to wait x amount of time in my tests, that seems like bad practice. Is there a way to know when the data is ready?
I am already waiting for a success response from the Elasticsearch index API, but that does not seem to be enough.
First, I'd suggest indexing your documents with a single bulk request: it saves time because there is less HTTP/TCP overhead.
To answer your question, consider using the refresh=true parameter (or refresh=wait_for) when indexing your 100 documents.
As stated in the documentation, refresh=true will:
Refresh the relevant primary and replica shards (not the whole index)
immediately after the operation occurs, so that the updated document
appears in search results immediately
More about it here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
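A minimal sketch of combining both suggestions, assuming the request is sent by whatever HTTP client the tests already use: it builds the newline-delimited body for a single bulk request and the path with the refresh parameter. The helper names build_bulk_body and bulk_path are made up for this illustration.

```python
import json


def build_bulk_body(index_name, docs):
    """Build the newline-delimited JSON body for one bulk index request.

    Each document is preceded by an action line; the body must end
    with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def bulk_path(refresh="wait_for"):
    """Path for the bulk endpoint with the refresh behaviour spelled out."""
    return "/_bulk?refresh=" + refresh


# Example: 100 documents in one request instead of 100 index calls.
docs = [{"value": i} for i in range(100)]
body = build_bulk_body("my-test-index", docs)  # index name is a placeholder
path = bulk_path("wait_for")
```

The body would be POSTed to that path with a Content-Type of application/x-ndjson; once the response arrives, the documents should be visible to the search that follows, so the test no longer needs a fixed sleep.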
I have a list of football team names, my_team_list = ['Bayern Munich', 'Chelsea FC', 'Manchester United', ...], and I am trying to search for each team's official Facebook page to get its fan_count using the Python facebook-api. This is my code so far:
club_list = []
for team in my_team_list:
    # team is already the name string; team[0] would be only its first letter
    data = graph.request('/pages/search?q=' + team)
    for i in data['data']:
        # one extra call per search hit to read the verification flag and fan count
        likes = graph.get_object(id=i['id'], fields='id,name,fan_count,is_verified')
        if likes['is_verified'] is True:
            club_list.append([team, likes['name'], likes['fan_count'], likes['id']])
However, because my list contains over 3000 clubs, with this code I will get the following rate limit error:
GraphAPIError: (#4) Application request limit reached
How can I reduce the number of calls, for example by fetching more than one club's page per call (batch calls)?
As the comment on the OP states, batching will not save you. You need to be actively watching the rate limiting:
"All responses to calls made to the Graph API include an X-App-Usage HTTP header. This header contains the current percentage of usage for your app. This percentage is equal to the usage shown to you in the rate limiting graphs. Use this number to dynamically balance your call load to avoid being throttled."
https://developers.facebook.com/docs/graph-api/advanced/rate-limiting#application-level-rate-limiting
On your first run through, you should save all of the valid page ids so in the future you can just query those ids instead of doing a search.
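A rough sketch of watching that header, assuming it carries a small JSON object with percentage fields named call_count, total_time and total_cputime, and that the response headers are available as a dict-like object keyed exactly "X-App-Usage". The function names and the 80% threshold are arbitrary choices for this illustration.

```python
import json


def app_usage_percent(headers):
    """Return the highest usage percentage reported in X-App-Usage.

    Returns 0 when the header is absent.
    """
    raw = headers.get("X-App-Usage")
    if not raw:
        return 0
    usage = json.loads(raw)
    return max(
        usage.get("call_count", 0),
        usage.get("total_time", 0),
        usage.get("total_cputime", 0),
    )


def pause_seconds(percent, threshold=80, max_pause=300):
    """Scale the pause linearly from 0 at the threshold to max_pause at 100%."""
    if percent < threshold:
        return 0
    span = 100 - threshold
    return min(max_pause, max_pause * (percent - threshold) / span)
```

After each call, the script would sleep for pause_seconds(app_usage_percent(response_headers)) before sending the next request, so the load tapers off before the limit is hit.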
Currently I am using the following API call to retrieve post likes and post comments for a Facebook page (PageId). With the call below I make only one API call and retrieve all posts along with their total comment counts.
1). https://graph.facebook.com/PageId/posts?access_token=xyz&method=GET&format=json
But as per the "July 2013 Breaking Changes", comment counts are no longer available with the above API call. So, as per the roadmap documentation, I am using the following API call to retrieve the comment count ('total_count') for a particular post ID.
2). https://graph.facebook.com/post_ID/?summary=true&access_token=xyz&method=GET&format=json
So with the second API call I am able to retrieve the comment count for each post. But as you can see, I need to iterate through the posts, retrieve the comment count for each post ID one by one, and then sum them all to get the total comment count. That requires too many API calls.
My question is: is it possible to retrieve Page -> Posts -> total count of all comments in a single API call, given the 10 July breaking changes?
Is there any alternative to my second API call for retrieving the total comment count across a Facebook page's posts?
Hmm, well, I don't believe there is a way to bundle this all into a single API call. But you can batch requests to get this in what is effectively one HTTP request (which will save time), although each batched call still counts against your rate limits separately. (My example below would count as 4 calls against the limits.)
Example batch call (JSON encoded), where I'm storing the post ID in the PHP variable $postId:
[{"method":"GET","relative_url":"' . $postId . '"},
{"method":"GET","relative_url":"' . $postId . '/likes?limit=1000&summary=true"},
{"method":"GET","relative_url":"' . $postId . '/comments?filter=stream&limit=1000&summary=true"},
{"method":"GET","relative_url":"' . $postId . '/insights"}]
I'm batching 4 queries in this single call: the first gets the post info, the second gets the likes (up to 1000, plus the total count), the third gets the comments (up to 1000, plus the summary count), and finally the fourth gets the insights (if it's one of the page's own posts).
You can drastically simplify this batch call if you don't want all the details I'm pulling.
In this case you still need to iterate through all the posts. But Facebook allows you to bundle up to 50 calls per batch request, I believe, so you could request multiple post IDs in the same batch call to speed things up too.
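As a rough Python sketch of that idea, assuming the 50-calls-per-batch figure mentioned above holds: split the post IDs into chunks and build one JSON-encoded batch array per chunk, each entry asking only for the comments summary. The function name build_comment_count_batches is made up for this illustration, and each returned string would be sent as the batch parameter of a POST to the Graph API.

```python
import json

MAX_BATCH = 50  # per-batch limit as stated in the answer (unverified)


def build_comment_count_batches(post_ids, batch_size=MAX_BATCH):
    """Return a list of JSON strings, one per batch request."""
    batches = []
    for start in range(0, len(post_ids), batch_size):
        chunk = post_ids[start:start + batch_size]
        operations = [
            {
                "method": "GET",
                # Same comments edge as in the PHP example, without the limit.
                "relative_url": f"{post_id}/comments?filter=stream&summary=true",
            }
            for post_id in chunk
        ]
        batches.append(json.dumps(operations))
    return batches
```

With 120 post IDs this yields three batch payloads (50, 50 and 20 operations), so three HTTP requests instead of 120, although every operation still counts against the rate limit individually.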
I have two otherwise identical posts on a Facebook page that I administer. One post we'll call "full" returns the full range of insight values (31) I'd expect even when the values are zero, while the other which we'll call "subset" returns only a very limited subset of values (7). See below for the actual values returned.
Note that I've confirmed this is the case by using both the GUI-driven export to Excel and the Facebook Graph API Explorer (https://developers.facebook.com/tools/explorer).
My first thought was that the API suppresses certain values such as post_negative_feedback if they are zero (i.e., nobody has clicked hide or report as spam/abusive), but this is not the case. The "full" post has no such reports (or at the very least, the return values for all the post_negative_* fields are zero).
I've even tried intentionally reporting the post with no negative return values as spam and then re-pulling what I thought was a real-time field (i.e., post_negative_feedback), but the data still comes back empty:
{
"data": [
],
(paging data)
}
What gives?
Here is the more limited subset returned for the problematic post:
post_engaged_users
post_impressions
post_impressions_fan
post_impressions_fan_unique
post_impressions_organic
post_impressions_organic_unique
post_impressions_unique
And here is the full set returned for most other posts (with asterisks added to show the subset returned above):
post_consumptions
post_consumptions_by_type
post_consumptions_by_type_unique
post_consumptions_unique
*post_engaged_users
*post_impressions
post_impressions_by_story_type
post_impressions_by_story_type_unique
*post_impressions_fan
post_impressions_fan_paid
post_impressions_fan_paid_unique
*post_impressions_fan_unique
*post_impressions_organic
*post_impressions_organic_unique
post_impressions_paid
post_impressions_paid_unique
*post_impressions_unique
post_impressions_viral
post_impressions_viral_unique
post_negative_feedback
post_negative_feedback_by_type
post_negative_feedback_by_type_unique
post_negative_feedback_unique
post_stories
post_stories_by_action_type
post_story_adds
post_story_adds_by_action_type
post_story_adds_by_action_type_unique
post_story_adds_unique
post_storytellers
post_storytellers_by_action_type
The issue (besides "why does this happen?") is that I've tried giving negative feedback to the post that fails to report any count whatsoever for this, and I still receive no data (I would expect "1" or something around there). I started out waiting the obligatory 15 minutes (real-time field), and when that didn't work I gave it a full 24 hours. What gives?
I am trying to use the Graph API with limit and since.
I think the highest limit is 5000, so I am using that (I want to make the fewest calls).
I am also trying to look one month back.
So I try:
https://graph.facebook.com/[ID of page]/feed?access_token=[accesstoken]&limit=5000&since=11-12-24
and I get 207 results, the earliest dated December 24th. This is all fine; it's telling me there are only 207 results in the last month. The problem is that there is a next link:
"next": "https://graph.facebook.com/[id of page]/feed?limit=5000&until=1324702511"
If I fetch this page, I start getting posts from before December 24th.
So my question is: how can I be sure I get all posts after a given date with the fewest calls?
The kludge I am thinking of is to set since on the first call to one day earlier. Then, if I get a post with that date, I know I got them all; if not, I paginate. 5000 posts in one month is a lot, but I think it's possible.
It seems like Facebook should provide a way to use since with the highest limit possible. I read this http://developers.facebook.com/blog/post/478/ but I'm still confused.
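One way to make the stopping condition explicit, sketched under the assumptions that pages arrive newest-first and that each post has a created_time string such as "2011-12-24T05:55:11+0000": walk the pages and stop at the first post older than the cutoff, instead of relying on the next link to honour since. The names parse_created and posts_since are made up for this illustration.

```python
from datetime import datetime, timezone


def parse_created(timestamp):
    """Parse a created_time string like '2011-12-24T05:55:11+0000'."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z")


def posts_since(pages, cutoff):
    """Collect posts created at or after cutoff from newest-first pages.

    `pages` can be any iterable of decoded page dicts, including a lazy
    generator, so no further page is consumed once an older post is seen.
    """
    kept = []
    for page in pages:
        for post in page.get("data", []):
            if parse_created(post["created_time"]) < cutoff:
                return kept
            kept.append(post)
    return kept


# Example cutoff matching the question: midnight UTC on 24 December 2011.
cutoff = datetime(2011, 12, 24, tzinfo=timezone.utc)
```

If the first page already contains a post older than the cutoff, no second request is needed, which matches the one-day-earlier kludge described above.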