Retrieve information from a URL to share it on my website - web-services

I am about to develop a new feature on my website that allows the user to give me a URL; I would then use this URL to get the site's title, description, and image(s) so that I can store this information on my website. I need to know if there is a script that can do that, or a web service that would take the URL and give me the information I need, or whether I should develop this from scratch.
Also, I would like to know if there is any kind of standard used in content-sharing mechanisms, as I want to allow the user to share a video or photo from the web.

There is no single script that can extract information from all sites, because the source HTML for most websites is different. You will need to write code specifically for the sites you are scraping.
As for syndicating the content, you can use RSS (Really Simple Syndication), which is an XML format commonly used for sharing content.
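That said, most pages expose a <title> tag, often a meta description, and <img> tags, so a generic first pass is possible before you write any per-site code. Below is a minimal sketch in Python using only the standard library; the URL is a placeholder, and real pages will need error handling, encoding detection, and some judgment about which image matters.

    # Fetch a page and pull out its title, meta description, and image URLs.
    # A sketch only; real-world pages need more care (encodings, errors).
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class PageInfoParser(HTMLParser):
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.title = ""
            self.description = ""
            self.images = []
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "title":
                self._in_title = True
            elif tag == "meta" and attrs.get("name") == "description":
                self.description = attrs.get("content", "")
            elif tag == "img" and "src" in attrs:
                self.images.append(urljoin(self.base_url, attrs["src"]))

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, data):
            if self._in_title:
                self.title += data

    url = "https://example.com"  # placeholder
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    parser = PageInfoParser(url)
    parser.feed(html)
    print(parser.title, parser.description, parser.images[:5])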

Related

Getting specific data from a web page with boost/asio

I am learning boost/asio. I can create an endpoint and active and passive sockets. Now I want to write something like a simple client application that will get specific data from web pages.
So I have a few questions about that:
If I have created a socket connected to a web page, how can I specify some content on the page? For example, I want to get an image. There are many images on the page (and not only images), and I want to identify a specific one. How can I do that (maybe by an "id" from the HTML, or in some other way)?
After that, I want to get that specific image onto my PC. How can I download and save it?
If it is not an image (if I want to work with an audio file, a video file, text, a hyperlink, etc.), how do I generalize this for any type of content?
How can I follow links on a web page?
You may also use boost/beast in your answer to this question.
(Off topic: I know C++ is not a good idea for dealing with this kind of task.)
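In outline, the flow being asked about is the same whatever socket library you use: request the page, parse the HTML to find the element you care about (for example, an <img> with a known id), then make a second request for its src; "following links" just means collecting <a href> values the same way, and audio or video files are downloaded with the same second GET. Here is a sketch of that flow, in Python for brevity since the HTTP and HTML mechanics are identical; the URL and the id are hypothetical, and with Boost you would replace the two fetches with asio or beast requests.

    # 1) fetch the page, 2) find a specific <img> by its id, 3) download its src.
    # The URL and the id are hypothetical placeholders.
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class ImgFinder(HTMLParser):
        def __init__(self, wanted_id):
            super().__init__()
            self.wanted_id = wanted_id
            self.src = None
            self.links = []  # hrefs, if you want to follow links afterwards

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "img" and attrs.get("id") == self.wanted_id:
                self.src = attrs.get("src")
            elif tag == "a" and "href" in attrs:
                self.links.append(attrs["href"])

    page_url = "https://example.com/gallery"  # placeholder
    html = urllib.request.urlopen(page_url).read().decode("utf-8", errors="replace")
    finder = ImgFinder("hero-image")  # placeholder id
    finder.feed(html)

    if finder.src:
        image_url = urljoin(page_url, finder.src)
        # The second GET works the same for images, audio, or video.
        with urllib.request.urlopen(image_url) as resp, open("image.jpg", "wb") as out:
            out.write(resp.read())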

How to get ondemand ids for Vimeo API

I am trying to work with the Vimeo API and I cannot figure out how to access the ondemand data.
The endpoint and parameters in the docs require an ondemand_id to work correctly. I assumed this ID would come from any official ondemand page within Vimeo. But whenever I search the ondemand pages of Vimeo and click on a resource, the URL does not contain any numerical ID.
It only contains the root path for the Vimeo website with /ondemand_page_name at the end. This value cannot be the ID since it is a string and not a number. I have looked through the entire page plenty of different times to try to find the ID but cannot seem to find it.
For example, when you visit a normal video page on Vimeo, the URL looks something like this:
https://vimeo.com/272976101
where the number 272976101 is the video_id that can be used within the API to get all the data about this particular video. Instead of this format, the ondemand pages have the format:
https://vimeo.com/ondemand/nebula
where there is no numerical ID within the URL. This is the issue I am having. How would I retrieve the public data about this ondemand page through the API?
I feel like there may be a very simple solution/explanation to this issue and any help would be much appreciated.
Also, right now I am not using any SDK to access this data. I am strictly trying to figure out how the API works through the built-in client provided within the documentation.
It's undocumented, but you can use the On Demand custom URL path as the ondemand_id.
So for your On Demand video at https://vimeo.com/ondemand/nebula, you can make an API request to this path: https://api.vimeo.com/ondemand/pages/nebula.
In the response, you'll see the "uri" value "/ondemand/pages/203314", which you can log on your end and use as the ondemand_id instead of /nebula.
Also note, this should be the same URL as your On Demand settings page: https://vimeo.com/ondemand/203314/settings
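For example, with any HTTP client the lookup is a single GET; sketched here in Python, with ACCESS_TOKEN as a placeholder for your own API token:

    # Resolve an On Demand custom URL ("nebula") to its numeric ondemand_id.
    import json
    import urllib.request

    req = urllib.request.Request(
        "https://api.vimeo.com/ondemand/pages/nebula",
        headers={"Authorization": "Bearer ACCESS_TOKEN"},  # placeholder token
    )
    with urllib.request.urlopen(req) as resp:
        page = json.load(resp)

    print(page["uri"])  # e.g. "/ondemand/pages/203314"; 203314 is the ondemand_id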
I hope this information helps!

Track users using Google Analytics for one URL on a website

I have a requirement where I need to store users' IP, device information, user agent, etc. for one URL on my site. How do I go about this?
This data will be used later for stats (which devices hit the page most, which locations, etc.).
I can see that Google Analytics helps with tracking an entire site.
How do I enable it to track only one specific URL on my site, and capture all the information mentioned above?
If you add your tracking code only to the one web page you wish to track, you should be able to accomplish your goal. To clarify: if you have two web pages, trackme.html and donottrackme.html, you would place the Google Analytics tracking code only on trackme.html. Device information, user agent details (browser and OS), and location will then be visible within your dashboard; note that Google Analytics derives location from the IP address but does not expose raw IPs.
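If you need the raw values themselves (for example, the actual IP addresses), you can also log them server-side for just that one URL. A minimal sketch using Flask; the route name is hypothetical, and in production you would write to a database rather than the log:

    # Log raw request details for a single URL, server-side.
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/trackme")  # hypothetical route; the one URL you care about
    def trackme():
        entry = {
            "ip": request.remote_addr,
            "user_agent": request.headers.get("User-Agent", ""),
            "referrer": request.referrer,
        }
        app.logger.info("visit: %s", entry)
        return "ok"

    if __name__ == "__main__":
        app.run()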

Extracting out the important information from a web page when provided with only the URL

What I'm referring to is what apps like Facebook and Twitter do when someone posts a link. They are able to convert that link into a title, an important image and (sometimes) a short summary.
What I'm asking is: is there some trick to this using tags, RSS, or metadata, or do you have to either sign up for a web service that does it for you, or write the code yourself, downloading the HTML and parsing it to extract a best guess at the components you want?
http://ogp.me - They all use the Open Graph protocol or similar standards. The answers are in the meta tags.
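For example, reading the Open Graph properties takes only a few lines; a sketch in Python with the standard library (the URL is a placeholder):

    # Collect Open Graph properties (og:title, og:description, og:image, ...)
    # from a page's meta tags. A sketch, not production code.
    import urllib.request
    from html.parser import HTMLParser

    class OGParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.og = {}

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            prop = attrs.get("property", "")
            if tag == "meta" and prop.startswith("og:") and "content" in attrs:
                self.og[prop] = attrs["content"]

    url = "https://example.com/some-page"  # placeholder
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    parser = OGParser()
    parser.feed(html)
    print(parser.og.get("og:title"), parser.og.get("og:image"))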

Pulling Facebook photos onto an external website

I'm doing a job for Company A. I've just built their website in Django, but now they want to add a social photo-management aspect to the site (such that other people can upload photos).
The only way I know of doing this (having done it before) is through Flickr: you can set up a group, have it so anybody can add photos to it, and pull out the latest with RSS. But let's be honest, Facebook is far more popular, and my client wants this feature heavily used by his clientèle.
They have a Facebook page and the power to open it up so anybody can add their photos to it... But how can I pull those photos back to the website?
Facebook's query language (FQL) can do this for users of FB Pages (much as RSS sends data to you from Flickr), and there are some JavaScript snippets for making the photos viewable and interactive on external web pages.
For example:
http://www.codeofaninja.com/2011/06/display-facebook-photos-to-your-website.html
http://www.alexanderinteractive.com/blog/2012/03/display-facebook-photos-on-your-website-with-galleria/
Good luck!
Terry
I'm not sure if this will help, but I recall that the Flock web browser had the capability of loading a stream of new videos/photos at the top of the browser's media stream bar; perhaps you can sneak a peek into the inner workings it uses to accomplish this task.
I know that you can now read the RSS feed of a Facebook Page itself; perhaps just a little parsing is all you need: http://www.allfacebook.com/facebook-pages-rss-2010-01
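If you go the RSS route, the parsing side is small. A sketch with Python's standard library; the feed URL format follows the one described in that article and PAGE_ID is a placeholder, so treat both as assumptions:

    # Pull item titles and links out of a Page's RSS feed.
    # FEED_URL format is the one described in the article above (an assumption);
    # PAGE_ID is a placeholder.
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://www.facebook.com/feeds/page.php?id=PAGE_ID&format=rss20"

    with urllib.request.urlopen(FEED_URL) as resp:
        tree = ET.parse(resp)

    for item in tree.getroot().iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        print(title, link)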