Read entire FB ad account structure - facebook-graph-api

API v2.5 (using Java to implement the HTTP/JSON REST layer)
What is the most efficient way to read the entire structure of an FB ad account, by which I mean all of its campaigns, ad sets, ads and ad creatives?
Here is one way which, as a relative newcomer, I assume is not the most efficient. It is essentially a breadth-first crawl:
read the account data, e.g. act_123?fields=id,name,owner...
read the campaign data, e.g. act_123/campaigns?fields=id,name...
for each campaign, read the ad set data
for each ad set, read the ad data
for each ad, read the ad creative data
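For what it's worth, here is a bare-bones Java sketch of this breadth-first crawl over plain HTTP (JDK 11+ HttpClient). The access token, field lists and JSON parsing are placeholders, and the edge names should be verified against the Graph API version you target:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AdAccountCrawler {
    private static final String BASE = "https://graph.facebook.com/v2.5/";
    private static final String ACCESS_TOKEN = "..."; // placeholder

    private final HttpClient client = HttpClient.newHttpClient();

    // Helper: GET a Graph API path that already contains a query string.
    private String get(String pathAndQuery) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE + pathAndQuery + "&access_token=" + ACCESS_TOKEN))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public void crawl(String accountId) throws Exception {
        String account   = get("act_" + accountId + "?fields=id,name");
        String campaigns = get("act_" + accountId + "/campaigns?fields=id,name");
        // For each campaign id parsed out of `campaigns` (JSON parsing left to your library of choice):
        //   String adSets = get(campaignId + "/adsets?fields=id,name");
        //   For each ad set id:
        //     String ads = get(adSetId + "/ads?fields=id,name,creative");
        //     For each ad id:
        //       String creatives = get(adId + "/adcreatives?fields=id,name,body");
        // Remember to follow the paging.next cursors in each response.
    }
}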
I'm thinking that those of you who've been at this a while and need to do something similar have figured out the best strategy for doing this in a time-efficient manner while staying on good terms with the FB servers servicing the API calls (avoiding rate limits, too many calls in too short a time, etc.).
Even if the entire account structure must be crawled, perhaps going depth first is better than breadth first; in other words, for each campaign, request the campaign data (using a nested fields param) to fetch the ad set data, the ad data, etc.?
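For reference, the Graph API's field expansion syntax allows nesting edges inside the fields parameter, so a single request can pull several levels at once; the exact field and edge names below are illustrative and should be checked against the API reference:

act_123/campaigns?fields=name,objective,adsets{name,ads{name,creative{id,name,body}}}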
Any tips, advice or guidance would be most appreciated.
Thanks

Regarding staying on good terms with the FB servers: the cost of your API calls is not constant but depends on the complexity of each call, so making many calls should not be more expensive in that sense than retrieving everything in one big call.
I also think that the way you are doing it is the way it is supposed to be done. You could, for example, get all your creatives directly at the ad account level, but as far as I know you cannot fetch the entire hierarchy in one request.
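For example, the ad account exposes an adcreatives edge, so a request along these lines should return every creative in the account directly (field names illustrative):

act_123/adcreatives?fields=id,name,body,image_url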

There is a Java SDK recently released by Facebook (still in beta); you might find further information there.
I'm actually looking for an answer to this issue as well.

Related

Is there a way to retrieve all targetable cities in the Ads API?

The autocomplete API allows us to retrieve lists of all countries, regions, and locales by leaving out the query string and setting the result limit to a large number, but this feature isn't available at the city level.
Is there a way that we can retrieve a full list of all targetable cities and their IDs? If not, can we cache the autocomplete data for cities to build up such a list?
That functionality is probably not supported because of the massive amount of data that would be returned by fetching all the cities in the world, even with paging. Limiting the response by country (using country_list=["ca"]) and then fetching all of that country's cities doesn't sound too far-fetched, but it is not implemented either.
To me, it sounds like you have two options.
Create a bug report using our bug tool to request a wishlist feature (this doesn't guarantee anything, but at least we can track it if we choose to implement it, and it can serve as a way to gauge interest in the feature).
IANAL, but part 2 of section 2 of the FB Platform Policies states:
You may cache data you receive through use of the Facebook API in order to improve your application’s user experience, but you should try to keep the data up to date. This permission does not give you any rights to such data.
Which sounds like you can cache the autocomplete data, since it will improve the UX of your app; however, just remember that you do not have the rights to that data. I would be cautious about this, as it would really suck to work hard on building all the caching functionality only to have FB say that it's not allowed. I would advise consulting some experts before pursuing this path.

RESTful API - handling large amounts of data

I have written my own RESTful API and am wondering about the best way to deal with large numbers of records returned from the API.
For example, if I send a GET request to myapi.co.uk/messages/, it will bring back the XML for all message records, which in some cases could be in the thousands. This makes using the API very sluggish.
Can anyone suggest the best way of dealing with this? Is it standard to return results in batches and to specify batch size in the request?
You can change your API to accept additional parameters that limit the scope of the data returned.
For instance, you could add limit and offset parameters to fetch just a small part of the collection; this is how pagination can be done in accordance with REST. A request like the following would fetch 10 resources from the messages collection, from the 21st to the 30th, so the client can ask for a specific portion of a huge data set:
myapi.co.uk/messages?limit=10&offset=20
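For illustration, a minimal server-side sketch of such an endpoint, written here with JAX-RS; the in-memory message list stands in for a real data store, and the response format is left as JSON for simplicity:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/messages")
public class MessagesResource {

    // Stand-in for the real message store.
    private static final List<String> MESSAGES = IntStream.rangeClosed(1, 5000)
            .mapToObj(i -> "message " + i)
            .collect(Collectors.toList());

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<String> list(@QueryParam("limit")  @DefaultValue("10") int limit,
                             @QueryParam("offset") @DefaultValue("0")  int offset) {
        int pageSize = Math.min(Math.max(limit, 1), 100); // cap the page size
        int from = Math.min(Math.max(offset, 0), MESSAGES.size());
        int to = Math.min(from + pageSize, MESSAGES.size());
        return MESSAGES.subList(from, to);
    }
}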
Another way to decrease the payload would be to only ask for certain parts of your resources' representation. Here's how Facebook does it:
/joe.smith/friends?fields=id,name,picture
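The same idea can be sketched on the server by whitelisting the requested fields before serializing the representation; the map-based representation here is just for illustration:

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public final class FieldFilter {

    // Returns a copy of the full representation containing only the fields the
    // client asked for via ?fields=id,name,picture (order of request preserved).
    public static Map<String, Object> apply(Map<String, Object> full, String fieldsParam) {
        if (fieldsParam == null || fieldsParam.isBlank()) {
            return full; // no filter requested: return the complete representation
        }
        Set<String> wanted = new LinkedHashSet<>(Arrays.asList(fieldsParam.split(",")));
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (String field : wanted) {
            String key = field.trim();
            if (full.containsKey(key)) {
                filtered.put(key, full.get(key));
            }
        }
        return filtered;
    }
}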
Remember that while using either of these methods, you have to provide a way for the client to discover each of the resources. You can't assume they'll just look at the parameters and start changing them in search of data. That would be a violation of the REST paradigm. Provide them with the necessary hyperlinks to avoid it.
I strongly recommend viewing this presentation on RESTful API design by Apigee (the screencast is called "Teach a Dog to REST"). Good practices and neat ideas for approaching everyday problems are discussed there.
EDIT: The video has been updated a number of times since I posted this answer; you can check out the 3rd edition from January 2013.
In general, there are several ways to improve API performance, including for large response sizes. Each of the topics below can be explored in depth.
Reduce Size With Pagination
Organizing Using Hypermedia
Exactly What a User Needs With Schema Filtering
Defining Specific Responses Using the Prefer Header
Using Caching To Make Responses More Efficient
More Efficiency Through Compression
Breaking Things Down With Chunked Responses
Switch To Providing More Streaming Responses
Moving Forward With HTTP/2
Source: https://apievangelist.com/2018/04/20/delivering-large-api-responses-as-efficiently-as-possible/
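To pick one item from that list, compression is usually cheap to add. Here is a minimal sketch using the JDK's built-in com.sun.net.httpserver server; the path, port and payload are made up, and a real service should honour the client's Accept-Encoding header rather than always compressing:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressedApi {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/messages", exchange -> {
            byte[] body = "<messages>...large payload...</messages>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/xml");
            exchange.getResponseHeaders().set("Content-Encoding", "gzip");
            exchange.sendResponseHeaders(200, 0); // 0 = length unknown, response is chunked
            // Wrap the response body in a gzip stream so the payload goes out compressed.
            try (OutputStream out = new GZIPOutputStream(exchange.getResponseBody())) {
                out.write(body);
            }
        });
        server.start();
    }
}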
If you are using .NET Core, try the Microsoft.AspNetCore.ResponseCompression package.
In ConfigureServices in your Startup file, add:
services.AddResponseCompression();
Then in the Configure method:
app.UseResponseCompression();

Facebook-style like system in MODX CMS (PHP)

I'm trying to build a simple like system in MODX (which uses PHP snippets of code). I just need a button that logged-in users can press which adds a 'like' to a resource.
Would it be best to update a custom table or a TV? My thought is that if it is a template variable, I can use getResources to sort by the number of likes.
Any thoughts on the best way to approach or build this would help. My PHP knowledge is limited.
It depends on how you are going to use it afterwards and whether you are storing more data than just a 'like' count. TVs are expensive in terms of resources (even more so if you are going to whip through the entire resource set with getResources), so if you are going to do a lot of processing after the fact, I would either look at a custom table or explore using property sets on your pages (I think it should be pretty easy to write a plugin that updates a page property).
I'd definitely go for a custom table.
While you could simply increment a numeric TV to count the amount of likes, you will come to a situation where anyone may be able to keep on liking a resource without limit - while you didn't specify the exact concept, that hardly can be desired. Using a custom table you could throw in a relational alias to the user ID that liked the resource, add a timestamp so you know when it happened, and let your fantasy run wild on additional features that are now open to you.
While not a hard requirement for custom tables, you will probably want to take the time to learn xPDO, which is the database abstraction layer MODX is based on. There's a great tutorial on the RTFM which walks you through it.

Determine unique visitors to site

I'm creating a Django website with Apache2 as the server. I need a way to determine the number of unique visitors to my website (specifically to each individual page) in a foolproof way. Unfortunately, users will have strong incentives to try to "game" the tracking system, so I'm trying to make it foolproof.
Is there any way of doing this?
Currently I'm trying to use IP & Cookies to determine unique visitors, but this system can be easily fooled with a headless browser.
Unless it's necessary that the data be integrated into your Django database, I'd strongly recommend "outsourcing" your traffic analysis to another provider. I'm very happy with Google Analytics.
Failing that, there's really little you can do to keep someone from gaming the system. You could limit based on IP address but then of course you run into the problem that often many unique visitors share IPs (say, via a university, organization, or work site). Cookies are very easy to clear out, so if you go that route then it's very easy to game.
One thing that's harder to get rid of is files stored in the appcache, so one possible solution that would work on modern browsers is to store a file in the appcache. You'd count the first time it was loaded in as the unique visit, and after that since it's cached they don't get counted again.
Of course, since you presumably need this to be backwards compatible, it leaves the system open to exactly the sorts of tools most likely to be used for gaming it, such as curl.
You can certainly block non-browserlike user agents, which makes it slightly more difficult if some gamers don't know about spoofing browser agent strings (which most will quickly learn).
Really, the best solution might be to ask: what is the desired outcome of a visit to a page? If it is, for example, selling a product, then don't reward the people who have the most page views; reward the people whose visits generate the most sales, or whatever time-consuming action someone might take on the page.
Possible solution:
If you're willing to ignore people with JavaScript disabled, you could choose to count only people who access the page and then stay on it for a given window of time (say, one minute), verified by an Ajax request back to the server after that time has elapsed. If someone tried to game this by changing their cookie and loading multiple tabs at once, it wouldn't work, because they'd need to keep the same cookie in order to register that they'd been on the page long enough. I actually think this might work; I honestly can't see a way to game it.
Basically, on the server side you store a dictionary called stay_until in request.session, with a key for each unique page, and after a minute or so the page fires an Ajax call back to the server. If the value of stay_until[page_id] is less than or equal to the current time, they count as a real visitor; otherwise they don't. This means it will take someone at least 20 minutes to generate 20 unique visits, and as long as you make the payoff worth less than the time consumed, that is a strong disincentive.
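The suggestion above is Django-specific (request.session), but the core check is framework-agnostic. Here is a rough sketch of the same idea expressed in Java servlet terms, with hypothetical parameter names, a one-minute window, and the actual counter increment left as a placeholder:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class StayUntilServlet extends HttpServlet {

    // Page render: remember the earliest time at which a follow-up Ajax ping
    // for this page may count as a real visit for this session.
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        HttpSession session = req.getSession(true);
        @SuppressWarnings("unchecked")
        Map<String, Long> stayUntil = (Map<String, Long>) session.getAttribute("stay_until");
        if (stayUntil == null) {
            stayUntil = new HashMap<>();
            session.setAttribute("stay_until", stayUntil);
        }
        stayUntil.put(req.getParameter("page_id"), System.currentTimeMillis() + 60_000);
        resp.setStatus(HttpServletResponse.SC_OK);
    }

    // Ajax ping roughly a minute later: only count the visit if the stored
    // deadline for this page has already passed for this same session.
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        HttpSession session = req.getSession(false);
        if (session == null) {
            resp.setStatus(HttpServletResponse.SC_OK);
            return;
        }
        @SuppressWarnings("unchecked")
        Map<String, Long> stayUntil = (Map<String, Long>) session.getAttribute("stay_until");
        String pageId = req.getParameter("page_id");
        if (stayUntil != null && stayUntil.containsKey(pageId)
                && stayUntil.get(pageId) <= System.currentTimeMillis()) {
            // countUniqueVisit(pageId); // hypothetical: increment the per-page counter
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}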
I'd even make it more explicit: at the bottom of the page, in a noscript tag, put "Your access was not counted. Turn on JavaScript to be counted", along with a page that lays out the tracking process.
As HTTP requests are stateless and you have no control over the user's behavior on the client side, there is no bulletproof way.
The only way you're going to be able to track "unique" visitors in a fool-proof way is to make it contingent on some controlled factor such as a login. Anything else can and will fail to be completely accurate.

Storing MVC2 entity model in cache

I'm getting data from a web service (a one-time, time-limited password is used for login).
Data only needs to be read, no updates.
I'm still looking for the best framework to put this in without making the small-medium site too heavy.
If I only get my data from the web service once and put it into several objects...
Would it make sense to store this in cache and reuse it on other information pages?
Using MVC2, would it be sensible to put the entire entity model in HttpRuntime.Cache?
(I guess session is out of the question..)
thanks,
nakori
You can use the EF caching provider, which will make your entities come from Velocity. That said, it's a lot easier to put stuff into a cache than to know when to expire it. Look into the CQRS approach for overall architecture.