Want General Screen scraping application ( like facebook has ) for django

Want General Screen scraping application ( like facebook has ) for django - django

The use case of what I want is -
The user enters the link url ( like you might be doing in your facebook stat update box )
And a short description of this url with its title and a thumbnail appear.
( Yeah, the basic process is "sharing a link" )
How would you go about doing it ?
Call a django app which scrapes urls ( if yes then which one ? )
Do it using javascript? Is it possible ?

If you want a thumbnail have a look at python-webkit2png.

Have a look at BeautifulSoup, a very potent DOM crawler. With it, you could scrape URL's for tags and display them. The same goes for page titles.

Related

Django enable URL link in template

I have a blog and users can comment any articles. For now, when they post an URL, we can not follow the link by a clic. Because there is no tag.
I know we can display HTML by using the filter "|safe". But I don't want a user using all html tag for security reasons.
I just want to make URL like "http://google.fr" as active and cliquable.
Any idea ?

The urlize filter does exactly what you want.

regex and scrapy in web crawling

I do a web crawling use scrapy. currently, it can extract the start url but not crawl later.
start_urls = ['https://cloud.cubecontentgovernance.com/retention/document_types.aspx']
allowed_domains = ['cubecontentgovernance.com']
rules = (
Rule(LinkExtractor(allow=("document_type_retention.aspx?dtid=1054456",)),
callback='parse_item', follow=True),
)
And the link i want to extract in the develop tool is:<a id="ctl00_body_ListView1_ctrl0_hyperNameLink" href="document_type_retention.aspx?dtid=1054456"> pricing </a>
the corresponding url is https://cloud.cubecontentgovernance.com/retention/document_type_retention.aspx?dtid=1054456
so what the allow field should be? thanks a lot

When I try to open the site of your start URL I get a login window.
Did you try to print response.body in the simple parse method for your start URL? I guess your Scrapy instance gets the same login window which does not have the URL you want to extract with the LinkExtractor.

How do I access my query when using Haystack/Elasticsearch?

I originally followed this tutorial (https://django-haystack.readthedocs.org/en/latest/tutorial.html), and have so far been able to highlight my query within my returned results. However, I want to highlight this same query when visiting the next page that I load with a separate template. Is there any way to save/access this query so that I can highlight the same results within this other template?
Whenever I try and include a statement like this, I get an error, which I'm thinking is because I'm not trying to access the query properly.
{% highlight section.body with query html_tag "span" css_class "highlighted" %}

You have to send to the next page, the information that you use to highlight the results in the first page. You can use the request.session to store the data and call it in the next page, or you can send the sqs by the url to the next page.
If you want to know how to manage the search query set, and how to edit that kind of stuff, I recommend you to read the views.py forms.py and the elasticsearch_backend in the haystack folder at: "/usr/local/lib/python2.7/dist-packages/haystack"
This is the url for the documentation of Django Session: Django Session
This is the url for the documentation to pass parameters trhough url: URL dispatcher

Get comment count on posts across site

I have a site with lots of pages of content and a Facebook comment box (social plugin) on each one. Say, http://subdomain.site.org
I want to build a widget that searches that subdomain for the pages with the most comments on them and list them. Is there a way to do this?
Thanks!

its pretty easy to get the count of comments for each url
use fql:
SELECT url, comment_count, FROM link_stat WHERE url = " http://subdomain.site.org"
you can also get the comment count on all the urls at the same time (depending on how many urls)
SELECT url, comment_count, FROM link_stat WHERE url in ("url1", "url2", "url3" ... )
from facebook -FQL can handle simple math, basic boolean operators, AND or NOT logical operators, and ORDER BY and LIMIT clauses.
fql does not support CONTAINS() or anything else like that to use for searching all url's from a subdomain.
see here for the link_stat table - you can only query on column's that are indexable

How can I add a "trending topics" section to my site's homepage?

I would like to add a custom "trending topics" section to my news website at the top of the homepage. An existing example that shows precisely what I am looking for is the Daily Beast homepage.
I would like to do this with custom code or with a plugin, but not as a widget. Does anyone know how I can do this in a flexible way that can easily customize to style and look of my website.
My site is a Spanish language 24/7 news website called Yasta.pr. Thx!

it depends on how you are storing your data in your database but you can try this:
add a div section in the header of your webpage above the menu bar.
then you can dynamically add the most trending topic to it automatically when the page starts by inserting this code inside it:
<div id="*id_name">
$query = "SELECT * FROM *table_name ORDER BY *most_trending_column LIMIT 1";
mysql_select_db(*database_name, *connection);
$result = mysql_query($query) or die(mysql_error());
$data = mysql_fetch_assoc($result);
echo "<h3 id='*id_name'>" . $data['*title_column_name'] . "</h3>";
</div>
dont forget to connect to your database first.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Want General Screen scraping application ( like facebook has ) for django - django

If you want a thumbnail have a look at python-webkit2png.

Have a look at BeautifulSoup, a very potent DOM crawler. With it, you could scrape URL's for tags and display them. The same goes for page titles.

Related

Django enable URL link in template

regex and scrapy in web crawling

How do I access my query when using Haystack/Elasticsearch?

Get comment count on posts across site

How can I add a "trending topics" section to my site's homepage?

Categories

Resources