What's the IP address range of Facebook's Open Graph crawler? - facebook-graph-api

In order to test the Open Graph API on our preview environment, we need to poke a hole in our firewall to allow Facebook to scrape our object pages. What IP ranges should we allow?

EDIT
Facebook has been showing some love and now makes the IP block public for everyone:
http://developers.facebook.com/docs/ApplicationSecurity/#facebook_scraper
https://developers.facebook.com/docs/sharing/best-practices#crawl
Facebook Scraper
A number of Platform services such as Social Plugins and the Open
Graph require our systems to be able to reach your Web Pages. We
recognize that there are situations where you might not want these
pages on the public Internet, during testing or for other security
reasons.
To facilitate this, you should make exceptions in your security
systems to allow Facebook to scrape these pages by adding the
following IP ranges, accurate as of April 2012.
31.13.24.0/21
31.13.64.0/18
66.220.144.0/20
69.63.176.0/20
69.171.224.0/19
74.119.76.0/22
103.4.96.0/22
173.252.64.0/18
204.15.20.0/22
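As a quick sanity check, here is a minimal Python sketch (the range list is just copied from above, and the function name is mine) that tests whether a given source IP falls inside one of these published ranges:
import ipaddress

# Facebook scraper ranges as published in April 2012 (copied from the docs above)
FACEBOOK_RANGES = [
    "31.13.24.0/21", "31.13.64.0/18", "66.220.144.0/20",
    "69.63.176.0/20", "69.171.224.0/19", "74.119.76.0/22",
    "103.4.96.0/22", "173.252.64.0/18", "204.15.20.0/22",
]

def is_facebook_scraper(ip):
    # True if the address falls inside any of the published CIDR blocks
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in FACEBOOK_RANGES)

print(is_facebook_scraper("66.220.144.10"))  # True
print(is_facebook_scraper("8.8.8.8"))        # False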
Instead of the IP ranges, you can also filter on the crawler's user agent in your firewall.
http://developers.facebook.com/docs/reference/plugins/like/
When does Facebook scrape my page?
Facebook needs to scrape your page to know how to display it around
the site.
Facebook scrapes your page every 24 hours to ensure the properties are
up to date. The page is also scraped when an admin for the Open Graph
page clicks the Like button and when the URL is entered into the
Facebook URL Linter. Facebook observes cache headers on your URLs - it
will look at "Expires" and "Cache-Control" in order of preference.
However, even if you specify a longer time, Facebook will scrape your
page every 24 hours.
The user agent of the scraper is: "facebookexternalhit/1.1
(+http://www.facebook.com/externalhit_uatext.php)"
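If you filter on the user agent instead, here is a minimal Python sketch of that check (the regex and function name are mine; note that user agents can be spoofed, so this is a weaker check than the IP ranges):
import re

# Matches the scraper UA quoted above, e.g. "facebookexternalhit/1.1 (...)"
FACEBOOK_UA = re.compile(r"facebookexternalhit/[\d.]+")

def is_facebook_ua(user_agent):
    return bool(FACEBOOK_UA.search(user_agent or ""))

print(is_facebook_ua(
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
))  # True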

Run whois -h whois.radb.net -- '-i origin AS32934' | grep ^route to see all ranges:

66.220.144.0/20
66.220.144.0/21
66.220.152.0/21
66.220.159.0/24
69.63.176.0/20
69.63.176.0/21
69.63.176.0/24
69.63.184.0/21
69.171.224.0/19
69.171.224.0/20
69.171.239.0/24
69.171.240.0/20
69.171.255.0/24
74.119.76.0/22
103.4.96.0/22
173.252.64.0/18
173.252.64.0/19
173.252.70.0/24
173.252.96.0/19
204.15.20.0/22
31.13.24.0/21
31.13.64.0/18
31.13.64.0/19
31.13.64.0/24
31.13.65.0/24
31.13.66.0/24
31.13.67.0/24
31.13.68.0/24
31.13.69.0/24
31.13.70.0/24
31.13.71.0/24
31.13.72.0/24
31.13.73.0/24
31.13.74.0/24
31.13.75.0/24
31.13.76.0/24
31.13.77.0/24
31.13.96.0/19
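The same lookup can be scripted; a rough Python sketch that just shells out to the whois client (so whois must be installed), equivalent to the one-liner above:
import subprocess

def facebook_prefixes():
    # Query RADB for all route objects originated by Facebook's AS32934
    out = subprocess.run(
        ["whois", "-h", "whois.radb.net", "--", "-i origin AS32934"],
        capture_output=True, text=True, check=True,
    ).stdout
    # "route:" lines are IPv4 prefixes, "route6:" lines are IPv6 prefixes
    return [line.split()[-1] for line in out.splitlines()
            if line.startswith(("route:", "route6:"))]

print("\n".join(facebook_prefixes()))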

Facebook now publishes their IP range.
As of April 2012, it is:
31.13.24.0/21
31.13.64.0/18
66.220.144.0/20
69.63.176.0/20
69.171.224.0/19
74.119.76.0/22
103.4.96.0/22
173.252.64.0/18
204.15.20.0/22

New information is listed at the following URL, and yes, they do make this info public.
https://developers.facebook.com/docs/sharing/webmasters
Run this command to get a current list of IP addresses the crawler
uses.
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route
Such as
# For example only - over 100 in total
31.13.24.0/21
66.220.144.0/20
2401:db00::/32
2620:0:1c00::/40
2a03:2880::/32
So yeah, the ranges mentioned by DMCS stand correct. Just wanted to verify and found this info.
Thanks

Facebook does not publish their crawler source address range officially, but you can look at the list of all their IP ranges in the publicly available BGP routing table:
http://www.robtex.com/as/as32934.html#bgp
We're currently using this list:
69.171.224.0/19
74.119.76.0/22
204.15.20.0/22
66.220.144.0/20
69.63.176.0/20
173.252.64.0/18

Related

link preview not working (shows 301) on shortened bitly links on Facebook

When posting a link on Facebook, after setting the Open Graph meta tags (e.g. og:title, og:image) it successfully shows a link preview as I intend it to. However, after shortening the link with bit.ly, when I post it on Facebook the link preview becomes "301 Moved Permanently" with no image. I get the same with TinyURL; are there any specific tags I should be adding here? I have tried refreshing with the Sharing Debugger, re-crawling, and trying variations of the URL with http and https, and it's all the same result with the short URL.
Answering my own question here for all those in the future who may also have this problem: I had to go into my site settings, turn HTTPS redirecting off, and enable both http and https.

GET request 200 OK but 'failed to load response data' for links

I made a personal website (http://www.soyoungpark.online) using a domain bought from GoDaddy and hosted on AWS S3. I set up everything and thought things were working until I put a simple link to my LinkedIn profile. When I check the network panel, I see that the status code is 200 OK but for the response... there is nothing. The code itself doesn't seem to be problematic; it is simply an <a> with an href of the desired link. So I am guessing something could be wrong with my AWS S3 settings? Anyone with similar experience?
It's likely that these services include a response header called "X-Frame-Options" that, for security, prevents them from being loaded within another site:
The X-Frame-Options HTTP response header can be used to indicate whether or not a browser should be allowed to render a page in a <frame>, <iframe> or <object> . Sites can use this to avoid clickjacking attacks, by ensuring that their content is not embedded into other sites. Source: X-Frame-Options
This does look to be the case when attempting to view Linkedin per your example:
Refused to display 'https://www.linkedin.com/in/exampleuser' in a frame because it set 'X-Frame-Options' to 'sameorigin'.
That said, applying a target attribute to each <a> so it opens in a new tab or window should allow these outside services to be navigated to.
e.g:
<a href="https://www.linkedin.com/in/exampleuser" target="_blank">
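To confirm whether a site sends that header, a quick Python sketch using the third-party requests library (the LinkedIn URL is the example from above; the exact response will vary):
import requests

resp = requests.get("https://www.linkedin.com/in/exampleuser", timeout=10)
# A value like "sameorigin" or "deny" means the page refuses to render in an <iframe>
print(resp.status_code, resp.headers.get("X-Frame-Options"))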

Google: Permission denied to generate login hint for target domain NOT on localhost

I am trying to create a Google sign-in and getting the error:
Permission denied to generate login hint for target domain
Before you mark this a duplicate, this is not the same as the question asked at Google sign in website Error : Permission denied to generate login hint for target domain because in that case the questioner was on localhost, whereas I am getting this error on the server.
Specifically, I have included the url of the server in the Authorized Javascript Origins, as in the following image:
and when I get the error, the request shows that the same url was sent, as in the following image:
Is there something else I should be putting in my Restrictions page? Is there any way to figure out what is going on here? Is there a log at the developer console that can tell me what is happening?
Okay, I figured this out. I was using an IP address (as in "http://175.132.64.120") for the redirect uri, as this was a test site on the live server, and Google only accepts actual urls (as in "http://mycompany.com" or "http://localhost") as redirect uris.
Which, you know, THEY COULD HAVE SAID SOMEWHERE IN THE DOCUMENTATION, but whatever.
I know this is an old question, but it's the first result when you look for the problem via Google, so I'll share my solution with you guys.
When deploying a Google OAuth service on a private network, namely some IP that can't be accessed via the Internet, you should use a magic DNS service like xip.io, which will give you a URL that your browser will resolve to your internal IP. You see, Google needs to be able to reach your authorized origin via your browser; that's why setting localhost works if you're serving it on your computer, but it won't work when you're deploying outside the Internet, as in a VPN, intranet, or with a tunnel.
So, the steps:
get your IP address, the one you're deploying at (not a public domain); let's say it's 10.0.0.1 as an example.
add http://10.0.0.1.xip.io to your Authorized Javascript Origins on the Google Developer Console.
open your site by visiting http://10.0.0.1.xip.io
clear your cache for the site, if necessary.
Log in with Google, and voilà.
I got to this solution using this answer in another question.
If you are using http://127.0.0.1/projects/testplateform, change it to http://localhost/projects/testplateform, and it will work just fine.
If you are testing on your machine (locally), then don't use the IP address (i.e. http://127.0.0.1:8888) in the Client ID configuration, but use localhost instead and it should work.
Example: http://localhost:8888
To allow an IP address to be used as a valid JavaScript origin, first add an entry to your /etc/hosts file
10.0.0.1 mydevserver.com
and then add this domain mydevserver.com in Authorized Javascript Origins. If you are using some nonstandard port, then specify it with your domain in Authorized Javascript Origins.
Note: Clear your cache and it will work.
Just ran across this same issue on an external test server, without a DNS entry yet. If you have permission on your local machine just edit your /etc/hosts file:
175.132.64.120 www.jimboweb.com
And use http://www.jimboweb.com as an authorized domain.
I have a server on a private network, IP 172.16.X.X.
The problem was solved by SSH-forwarding the app's port to my localhost port.
Now I am able to use the deployed app with Google OAuth by browsing to localhost.
ssh -N -L8081:localhost:8080 ${user}@${host}
I also added localhost:8081 to "Authorized URI redirect" and "Authorized JavaScript sources" in console.developers.google.com:
(screenshot: Google Developers Console)
After battling with it for a few hours, I found out that my config in the Google Cloud console was all correct and similar to the answers provided. Due to caching issues or something, I had to recreate a OAuth Client ID and then it suddenly started working.
It's a pretty old issue, but I encountered it and there wasn't any helpful resource, so I am posting my solution.
For me the issue arose when I hosted my web app locally, using google-auth for logging in.
The URL I was trying to hit was: http://127.0.0.1:8000/master
I just changed from the IP to http://localhost:8000/master/
And it worked. I was able to log in to the website using Google Auth.
Hope this helps someone someday.
Install XAMPP and run the Apache server,
put your files (index and co) in a folder in the XAMPP dir (c:\xampp\htdocs\yourfolder).
Type this into your browser URL bar: http://localhost/yourfolder/index.html

OpenGraph Debugger reporting bad HTTP response codes

For a number of sites that are functioning normally, when I run them through the OpenGraph debugger at developers.facebook.com/tools/debug, Facebook reports that the server returned a 502 or 503 response code.
These sites are clearly working fine on servers that are not under heavy load. URLs I've tried include but are not limited to:
http://ac.mediatemple.net
http://freespeechforpeople.org
These are in fact all sites hosted by MediaTemple. After talking to people at MediaTemple, though, they've insisted that it must be a bug in the API and is not an issue on their end. Anyone else getting unexpected 500/502/503 HTTP response codes from the Facebook Debug tool, with sites hosted by MediaTemple or anyone else? Is there a fix?
Note that I've reviewed the Apache logs on one of these and could find no evidence of Apache receiving the request from Facebook, or of a 502 response etc.
Got this response from them:
At this time, it would appear that (mt) Media Temple servers are returning 200 response codes to all requests from Facebook, including the debugger. This can be confirmed by searching your access logs for hits from the debugger. For additional information regarding viewing access logs, please review the following KnowledgeBase article:
Where are the access_log and error_log files for my server?
http://kb.mediatemple.net/questions/732/Where+are+the+access_log+and+error_log+files+for+my+server%3F#gs
You can check your access logs for hits from Facebook by using the following command:
cat <name of access log> | grep 'facebook'
This will return all hits from Facebook. In general, the debugger will specify the user-agent 'facebookplatform/1.0 (+http://developers.facebook.com),' while general hits from Facebook will specify 'facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php).'
Using this information, you can perform even further testing by using 'curl' to emulate a request from Facebook, like so:
curl -Iv -A "facebookplatform/1.0 (+http://developers.facebook.com)" http://domain.com
This should return a 200 or 206 response code.
In summary, all indications are that our servers are returning 200 response codes, so it would seem that the issue is with the way that the debugger is interpreting this response code. Bug reports have been filed with Facebook, and we are still working to obtain more information regarding this issue. We will be sure to update you as more information becomes available.
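For what it's worth, the curl test they describe can also be scripted; a rough Python sketch using the third-party requests library (swap http://domain.com for the page you want to check):
import requests

# Send a HEAD request with the debugger's user agent, mirroring the curl -Iv -A command above
headers = {"User-Agent": "facebookplatform/1.0 (+http://developers.facebook.com)"}
resp = requests.head("http://domain.com", headers=headers, allow_redirects=True, timeout=10)
print(resp.status_code)  # expect 200 or 206 if nothing is blocking the crawler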
So the good news is that they are busy solving it. The bad news: it's out of our control.
There's a forum post on the matter here:
https://forum.mediatemple.net/topic/6759-facebook-503-502-same-html-different-servers-different-results/
With more than 800 views, and recent activity, it states that they are working hard on it.
I noticed that https MT sites don't even give a return code:
Error parsing input URL, no data was scraped.
RESOLUTION
MT admitted it was their fault and fixed it:
During our investigation of the Facebook debugger issue, we have found that multiple IPs used by this tool were being filtered by our firewall due to malformed requests. We have whitelisted the range of IP addresses used by the Facebook debugger tool at this time, as listed on their website, which should prevent this from occurring again.
We believe our auto-banning system has been blocking several Facebook IP addresses. This was not immediately clear upon our initial investigation and we apologize this was not caught earlier.
The reason API requests may intermittently fail is because only a handful of the many Facebook IP addresses were blocked. The API is load-balanced across several IP ranges. When our system picks up abusive patterns, like HTTP requests resulting in 404 responses or invalid PUT requests, a global firewall rule is added to mitigate the behavior. More often than not, this system works wonderfully and protects our customers from constant threats.
So, that being said, we've been in the process of whitelisting the Facebook API ranges today and confirming our system is no longer blocking these requests. We'd still like those affected to confirm if the issue persists. If for any reason you're still having problems, please open up or respond to your existing support request.

Secure Browsing Method of Getting Facebook Photos Using APIs

Using the facebook graph you can get photo information as follows:
https://graph.facebook.com/20531316728
However, the links they provide to actually grab the photos are not secure and use http:
http://profile.ak.fbcdn.net/hprofile-ak-snc4/174597_20531316728_2866555_s.jpg
Replacing http with https doesn't do the trick because you get a security warning:
https://profile.ak.fbcdn.net/hprofile-ak-snc4/174597_20531316728_2866555_s.jpg
Facebook is insisting that all apps use secure browsing and use https. However my app uses facebook photos, which cannot be accessed because they begin with http.
Does anyone know how to get around this problem?
I found the answer to my own question. You can add the return_ssl_resources parameter to get https image URLs back:
https://graph.facebook.com/20531316728?return_ssl_resources=1
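For reference, the same request made programmatically; a Python sketch using the third-party requests library, assuming the endpoint still honors this parameter:
import requests

# Ask the Graph API for the object, requesting https (SSL) image URLs in the response
resp = requests.get(
    "https://graph.facebook.com/20531316728",
    params={"return_ssl_resources": 1},
    timeout=10,
)
print(resp.json())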
I've never come across a way to ask the API for valid https versions of the images other than for profile pictures. That is done by https://graph.facebook.com/{userId/Name}/picture
Here's Zuck: https://graph.facebook.com/4/picture and https://graph.facebook.com/zuck/picture
If you're using the PHP SDK, this was a F***ing life-saver (where $album['cover_photo'] is the id of a photo):
$this->facebook->api($album['cover_photo'],'GET',array('return_ssl_resources'=>1));
Whenever I would simply add &return_ssl_resources=1 to the end of the query itself, my server would throw a 500 error. I found another thread that showed that you can pass this argument in an array.