django-dynamic-scraper RANGE_FUNCT pagination - django

I am using django-dynamic-scraper in one of my applications. I have gone through the docs, and this is my setup.
The object class URL I am using is: http://www.example.com/products/brandname_products.html
The pagination on the site looks like this:
page 1: http://www.example.com/products/brandname_products.html
page 2: http://www.example.com/products/brandname_products2.html
page 3: http://www.example.com/products/brandname_products3.html
page 4: http://www.example.com/products/brandname_products4.html
The brandname in the URLs above is dynamic and depends on the brand's products page. I cannot have a different scraper for each brand, as there are over 10,000 brands, so I am trying to use a single scraper object.
In the scraper object I am using, I have defined the pagination options as follows:
pagination_type: RANGE_FUNCT
pagination_append_str: _products{page}.html
pagination_page_replace: 1,100,2
But the scraper requests the following pagination URLs:
http://www.example.com/products/brandname_products.html_products2.html
http://www.example.com/products/brandname_products.html_products3.html
http://www.example.com/products/brandname_products.html_products4.html
instead of:
http://www.example.com/products/brandname_products2.html
http://www.example.com/products/brandname_products3.html
http://www.example.com/products/brandname_products4.html
Q: Why is it appending the string to the end of the URL instead of replacing _products.html in the object class URL? What am I doing wrong, and how can I fix this?

The pagination_append_str option is named that way because the string is appended to the base URL; it does not replace any part of it! :-)
So everything is working correctly. You just have to remove _products.html from your base URL, so that the final URL is built without doubling URL parts.
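To see why the URLs double up, here is a small Python model of the behaviour the answer describes: the append string, with {page} filled in, is simply concatenated to the base URL. This is a simplified sketch, not django-dynamic-scraper's actual code; the function name build_pagination_urls is hypothetical, and the range "2,5,1" is chosen only so the output matches pages 2 to 4 from the question.

```python
# Simplified model of RANGE_FUNCT pagination (an assumption, not DDS's source):
# pagination_append_str, with {page} filled from range(start, stop, step),
# is concatenated onto the end of the base URL.


def build_pagination_urls(base_url, append_str, page_replace):
    """Return the URLs a RANGE_FUNCT-style paginator would request.

    page_replace is a "start,stop,step" string that is fed to range().
    """
    start, stop, step = (int(part) for part in page_replace.split(","))
    return [base_url + append_str.format(page=page) for page in range(start, stop, step)]


# Base URL still ends in _products.html: the suffix gets doubled.
doubled = build_pagination_urls(
    "http://www.example.com/products/brandname_products.html",
    "_products{page}.html",
    "2,5,1",
)

# Base URL trimmed to the brand prefix: the pieces join cleanly.
fixed = build_pagination_urls(
    "http://www.example.com/products/brandname",
    "_products{page}.html",
    "2,5,1",
)

for url in doubled + fixed:
    print(url)
```

The only difference between the two calls is the base URL, which is exactly the change the answer recommends.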

Related

Django pagination - nice urls

I implemented pagination in my Django project. Everything works perfectly, but my URLs look terrible, like:
host:8000/?page=1
How can I create nice URLs like these?
host:8000/page/2/ or host:8000/2/
I use the standard Paginator class via ListView.
How can I do this without third-party code?
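If you define a URL pattern like this (with no leading slash, since Django strips it from the path before matching):
url(r'^page/(?P<page>\d+)/$', 'myapp.views.list_view'),
then ListView will read the page URL keyword argument and pass it to the paginator.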
Note:
Each path segment is supposed to identify a valid resource, so it's not clear what you would display at the bare /page/ URL.
Django's pagination assumes by default that the page number comes from the URL query string (?page=2), so it's recommended to keep it there; it is also more explicit about what the parameter means.
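The keyword-argument behaviour can be modelled in plain Python. Django's MultipleObjectMixin.paginate_queryset takes the page from the URL keyword arguments first, then from request.GET, and otherwise defaults to 1. This sketch mimics that order with the re module and a plain dict standing in for request.GET; the names PAGE_PATTERN and resolve_page are illustrative only.

```python
import re

# Django matches urlpatterns against the path with its leading "/" removed,
# so the pattern starts with "page/" rather than "/page/".
PAGE_PATTERN = re.compile(r"^page/(?P<page>\d+)/$")


def resolve_page(path, query):
    """Simplified model of how ListView picks the page number.

    Order: the "page" keyword captured from the URL, then ?page= from the
    query string (a dict here, standing in for request.GET), then page 1.
    """
    match = PAGE_PATTERN.match(path.lstrip("/"))
    url_kwargs = match.groupdict() if match else {}
    return url_kwargs.get("page") or query.get("page") or 1


print(resolve_page("/page/2/", {}))      # nice URL
print(resolve_page("/", {"page": "3"}))  # query-string URL still works
print(resolve_page("/", {}))             # falls back to the first page
```

Because the query-string lookup is still the fallback, adding the nice URL pattern does not break existing ?page= links.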

django url parse formatted url

I'm in the design stages of a single-page web app, and I would like a user to be able to click on a formatted URL and have the requested data load in the page.
For example, a url of http://www.mysite.com/?category=some_cat will trigger the Category view with the relevant data.
My intention is to parse the URL, gather the data, then pass it to the index.html template for rendering on page load. Once the page has loaded, a JavaScript trigger will activate the appropriate button to load the client view.
However, I'm having trouble setting up the URL parser: the following pattern does not match the example URL above.
from app.views import app_views, photo_views, user_views, admin_views
urlpatterns = patterns("",
url(r'^/(?P<category>\d+)/$', app_views.index)
)
You're confusing two things: sending information through your URL as GET parameters, and formatting your URLs with arguments for the view functions. Say I am visiting a site called http://www.mysite.com/ and the page has a form that looks like this:
<form>
<input type='text' name='category' id='category'>
<button type='submit'>Send!</button>
</form>
Upon submitting, the URL will automatically change to http://www.mysite.com/?category=<value of input>, because a form without a method attribute uses GET. The ? marks that everything after it should be treated as GET data, with the syntax <name>=<value> (the input's name attribute, not its id). You can then access it like so:
def response(request):
    # request.GET is a QueryDict; .get() returns None instead of raising if 'category' is missing
    category = request.GET.get('category')
Formatting URLs is different, because it means matching patterns that are part of the URL path itself. For example, a pattern like r'^(?P<category>\d+)/$' will look for http://www.mysite.com/<category>/ (Django strips the leading slash before matching, so the pattern should not start with /), and it will pass the captured value to your view as an additional argument like so:
def response(request, category):
    ...
The regex defines how that part of the URL is recognized. For example, the \d+ you're using means that category must be a number, so some_cat would never match; a pattern such as [\w-]+ also accepts letters, underscores and hyphens. You can look up regex syntax to define patterns that fit your needs.
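The difference between the two approaches can be shown with the standard library alone. This sketch uses urllib.parse to read the query string (roughly what Django puts in request.GET) and re to capture a value from the path (roughly what a URLconf does); the pattern names and helper functions are hypothetical, for illustration only.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Only digits, like the \d+ in the question.
DIGIT_PATTERN = re.compile(r"^(?P<category>\d+)/$")
# Letters, digits, underscores and hyphens, so "some_cat" is accepted.
SLUG_PATTERN = re.compile(r"^(?P<category>[\w-]+)/$")


def query_category(url):
    """GET style: read ?category=... from the query string."""
    values = parse_qs(urlsplit(url).query).get("category")
    return values[0] if values else None


def path_category(url, pattern):
    """URLconf style: capture category from the path, leading slash stripped."""
    match = pattern.match(urlsplit(url).path.lstrip("/"))
    return match.group("category") if match else None


print(query_category("http://www.mysite.com/?category=some_cat"))
print(path_category("http://www.mysite.com/?category=some_cat", SLUG_PATTERN))
print(path_category("http://www.mysite.com/some_cat/", DIGIT_PATTERN))
print(path_category("http://www.mysite.com/some_cat/", SLUG_PATTERN))
```

The second call returns None because the example URL's path is just "/": the category lives in the query string, which no URL pattern ever sees.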
Note that with GET (and no action attribute on the form) you are sending the data to the same view function that rendered the page you are currently visiting, while using a different URL means your urls.py decides where the request goes (usually a different function). Does that make things a bit clearer?

How do I access my query when using Haystack/Elasticsearch?

I originally followed this tutorial (https://django-haystack.readthedocs.org/en/latest/tutorial.html), and have so far been able to highlight my query within my returned results. However, I want to highlight this same query when visiting the next page that I load with a separate template. Is there any way to save/access this query so that I can highlight the same results within this other template?
Whenever I try to include a statement like this, I get an error, which I think is because I'm not accessing the query properly:
{% highlight section.body with query html_tag "span" css_class "highlighted" %}
You have to send the next page the information you used to highlight the results on the first page. You can store the query in request.session and read it back on the next page, or you can send the search query to the next page through the URL.
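For the URL option, the main thing is to URL-encode the query so spaces and symbols survive the round trip. A minimal sketch with urllib.parse follows; the path "/articles/42/" and the parameter name q are hypothetical, and the helper names are illustrative. On the next page, the view would read the value back and put it in the template context as query, so the highlight tag has something to use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def detail_url_with_query(detail_path, query):
    """Build a link to the next page that carries the search query along.

    detail_path is a hypothetical URL such as "/articles/42/".
    """
    return f"{detail_path}?{urlencode({'q': query})}"


def query_from_url(url):
    """Read the query back on the next page (in Django: request.GET.get('q', ''))."""
    values = parse_qs(urlsplit(url).query).get("q")
    return values[0] if values else ""


link = detail_url_with_query("/articles/42/", "foo bar & baz")
print(link)
print(query_from_url(link))
```

The session approach avoids the longer URL, but the query-parameter approach keeps the highlighted page bookmarkable.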
If you want to know how the SearchQuerySet is managed and how to customize that kind of thing, I recommend reading views.py, forms.py and the elasticsearch_backend module in the haystack folder at: "/usr/local/lib/python2.7/dist-packages/haystack"
Documentation for Django sessions: Django Session
Documentation for passing parameters through the URL: URL dispatcher

'Hiding' form query from URL (Django 1.3)

I have a form with 6-7 fields. After user input, my webapp searches for those fields in a database and displays the results.
The issue is that the URL ends up containing all the form field names and their values:
result/?name=lorem&class=arc&course=ipsum
With that many fields, the URL ends up looking ugly.
Is there a Django technique to 'hide' these from the URL? I put 'hide' in quotes because I'd also be okay with a completely different way of passing the values from the form to my database query.
Use a POST request. Here are the Django docs on forms and a specific example using POST. HTML-wise, all you need to do is change the method on the form tag to method="post"; in Django you also need to include {% csrf_token %} inside the form.
I do not recommend using POST requests for search. With GET it is easier for the user: they can bookmark the link to save the search, or share the search results with friends.
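The bookmarking argument follows from how GET forms work: every field is serialized into the query string, so the link alone is enough to rebuild the search. A small sketch with urllib.parse, using the field values from the question's example URL; the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(path, fields):
    """Build the URL a GET form submission produces: each field becomes name=value."""
    return f"{path}?{urlencode(fields)}"


def search_from_url(url):
    """Recover the search fields from a bookmarked or shared link.

    Note: parse_qs drops fields with empty values unless keep_blank_values=True.
    """
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


url = search_url("result/", {"name": "lorem", "class": "arc", "course": "ipsum"})
print(url)
print(search_from_url(url))
```

With POST, none of this information is in the URL, which is why the results page can no longer be bookmarked or shared.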

How to create a WordPress like URL naming convention in Django?

I'm a newbie in Django. In WordPress, if you create a post called "hello world", the URL by default will be like:
wordpress.com/2012/07/05/hello-world/
and if you create another post with the same name, it will be:
wordpress.com/2012/07/05/hello-world-2/
I want to achieve the same in Django, and I was thinking of creating a URLconf like this:
(r'^articles/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<name>[\w-]+)/$', 'article.views.article_detail')
and then, in the view, break down the name, iterate through all the items and match the name.
But the problem with this is that I won't be able to reference posts dynamically. For example, if I wanted to link to a hello world post, I would need to find out how many posts with the same name already exist and then append the additional number to it, which is inefficient.
So what's the best way to do this in Django?
See the documentation for Django's {% url %} template tag. It lets you pass a view name and parameters, and it automatically generates the correct URL for you.
You can take care of appending numbers to each post's name in the function that generates its slug, so the number is stored with the post instead of being worked out every time you link to it. You could have a look at django-autoslug.
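Generating the slug once, when the post is saved, is what removes the counting problem. Here is a minimal pure-Python sketch of WordPress-style numbering, assuming the slugs already in use (for the same date, say) are available as a set. simple_slugify is a rough, hypothetical stand-in for django.utils.text.slugify, and this is not how django-autoslug is implemented.

```python
import re


def simple_slugify(title):
    """Rough stand-in for django.utils.text.slugify: lowercase, hyphen-separated."""
    slug = re.sub(r"[^\w\s-]", "", title.lower()).strip()
    return re.sub(r"[\s_-]+", "-", slug)


def unique_slug(title, existing_slugs):
    """Return a slug not in existing_slugs, adding -2, -3, ... like WordPress."""
    base = simple_slugify(title)
    slug, counter = base, 2
    while slug in existing_slugs:
        slug = f"{base}-{counter}"
        counter += 1
    return slug


print(unique_slug("Hello World", set()))
print(unique_slug("Hello World", {"hello-world"}))
print(unique_slug("Hello World", {"hello-world", "hello-world-2"}))
```

In a Django model you would call something like this from save() and store the result in a SlugField; {% url %} can then use the stored slug directly, with no counting at link time.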