I am working on a tiny movie manager using the out-of-the-box admin module in Django.
I added a "Play" link on the movie admin page to play a movie by passing its id, so the backend looks something like this:
import subprocess

from django.http import HttpResponse, HttpResponseRedirect

from core.models import Movie  # the app is "core", per the admin URL below

def play(request, movie_id):
    try:
        m = Movie.objects.get(pk=movie_id)
        # PLAYER_PATH is the path to the player binary, defined elsewhere.
        subprocess.Popen([PLAYER_PATH, m.path + '/' + m.name])
        return HttpResponseRedirect("/admin/core/movie")
    except Movie.DoesNotExist:
        return HttpResponse(u"The movie does not exist!")
As the code above shows, every time I click the "Play" link the page is refreshed to /admin/core/movie, the movie admin page. I don't want the backend to do this, because I may be using the "Search" function provided by the admin module, so the URL before clicking "Play" may be something like /admin/core/movie/?q=gun; if that response takes effect, the query criteria are lost.
So my thought is: can I suppress the HttpResponse, so that I can stay on the current page?
Any suggestions on this issue ?
Thanks in advance.
I used a custom action in the admin to implement this function.
So in the end I came to see actions as something like procedures, which have no return values, and views as something like methods, which do have return values...
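For reference, a minimal sketch of what such an action might look like (PLAYER_PATH and the Movie fields are assumptions carried over from the question):

import subprocess

from django.contrib import admin

from core.models import Movie

PLAYER_PATH = "/usr/bin/mplayer"  # placeholder for the player binary

def play_movies(modeladmin, request, queryset):
    # Launch the player for each selected movie.
    for m in queryset:
        subprocess.Popen([PLAYER_PATH, m.path + '/' + m.name])
play_movies.short_description = "Play the selected movies"

class MovieAdmin(admin.ModelAdmin):
    actions = [play_movies]

admin.site.register(Movie, MovieAdmin)

Because the action returns None, the admin simply redisplays the current change list, query string included, which gives exactly the "stay on the current page" behaviour described above.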
Thanks !
I want an effect to be applied when a user enters my website. I therefore want to detect when a user is coming from outside my website, so that the effect isn't applied while the user is navigating between URLs inside the site, only when they arrive from outside.
You can't really check where a user has come from specifically. You can check whether the user has just arrived on your site by setting a session variable when they load one of your pages: check for it before you set it, and if it isn't there, the user has just arrived and you can apply your effect. There are some good examples of how sessions work here: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Sessions
There are a couple of ways to handle this. If you are using function-based views, you can just create a separate util function and call it at the top of every view, e.g.:
utils.py
def first_visit(request):
    """Returns the answer to the question 'is this the first visit of the session?'
    Make sure SESSION_EXPIRE_AT_BROWSER_CLOSE is set to False in settings for persistence."""
    if request.session.get('first_visit'):
        # This is not the first visit, because the session variable is already set.
        return False
    else:
        # This is the first visit.
        ...  # do something
        # Set the session variable so you only do the above once.
        request.session['first_visit'] = True
        return True
views.py
from utils import first_visit

def show_page(request):
    is_first_visit = first_visit(request)
This approach gives you some control. For example, you may not want to run it on pages that require login, because you will already have run it on the login page.
Otherwise, the best approach depends on what should happen on the first visit. If you just want to update a template (e.g., to show a message or run a script on the page), you can use a context processor, which gives you extra context for your templates. If you want to interrupt the request, perhaps to redirect it to a separate page, you can create a simple piece of middleware.
docs for middleware
docs for context processors
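For instance, a minimal context processor built on the first_visit helper above might look like this (the module path and names here are hypothetical):

context_processors.py

from utils import first_visit

def first_visit_flag(request):
    # Exposes {{ is_first_visit }} to every template.
    return {'is_first_visit': first_visit(request)}

You would then add the dotted path to this function to the context_processors list in the TEMPLATES setting (or TEMPLATE_CONTEXT_PROCESSORS on older Django versions).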
You may also be able to handle this entirely with JavaScript. The snippet below uses localStorage to record whether this is the user's first visit to the site, and displays the loading area for 5 seconds if there is nothing in localStorage. You can include it in your base template so it runs on every page.
function showMain() {
  document.getElementById("loading").style.display = "none";
  document.getElementById("main").style.display = "block";
}

const secondVisit = localStorage.getItem("secondVisit");

if (!secondVisit) {
  // Show the loading screen for 5 seconds on the first visit.
  document.getElementById("loading").style.display = "block";
  document.getElementById("main").style.display = "none";
  setTimeout(showMain, 5000);
  localStorage.setItem("secondVisit", "true");
} else {
  showMain();
}
I am trying to scrape some information from flipkart.com; for this purpose I am using Scrapy. The information I need is for every product on Flipkart.
I have used the following code for my spider
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import HtmlXPathSelector
from tutorial.items import FlipkartItem

class WebCrawler(CrawlSpider):
    name = "flipkart"
    allowed_domains = ['flipkart.com']
    start_urls = ['http://www.flipkart.com/store-directory']
    rules = [
        # Product pages: scrape them.
        Rule(LinkExtractor(allow=['/(.*?)/p/(.*?)']), 'parse_flipkart', cb_kwargs=None, follow=True),
        # Category pages: just follow them.
        Rule(LinkExtractor(allow=['/(.*?)/pr?(.*?)']), follow=True)
    ]

    @staticmethod
    def parse_flipkart(response):
        hxs = HtmlXPathSelector(response)
        item = FlipkartItem()
        item['featureKey'] = hxs.select('//td[@class="specsKey"]/text()').extract()
        yield item
My intent is to crawl through every product category page (specified by the second rule) and follow the product pages (first rule) within each category page to scrape data from the product pages.
One problem is that I cannot find a way to control the crawling and scraping.
Second, Flipkart uses AJAX on its category pages and displays more products when a user scrolls to the bottom.
I have read other answers and gathered that Selenium might help solve the issue, but I cannot find a proper way to fit it into this structure.
Suggestions are welcome..:)
ADDITIONAL DETAILS
I had earlier used a similar approach
the second rule I used was
Rule(LinkExtractor(allow=['/(.*?)/pr?(.*?)']), 'parse_category', follow=True)
@staticmethod
def parse_category(response):
    hxs = HtmlXPathSelector(response)
    count = int(hxs.select('//td[@class="no_of_items"]/text()').extract()[0])
    for num in range(1, count, 15):
        ajax_url = response.url + "&start=" + str(num) + "&ajax=true"
        yield Request(ajax_url, callback="parse_category")
Now I was confused about what to use for the callback: "parse_category" or "parse_flipkart".
Thank you for your patience
Not sure what you mean when you say that you can't find a way to control the crawling and scraping. Creating a spider for this purpose is already taking it under control, isn't it? If you create proper rules and parse the responses properly, that is all you need. In case you are referring to the actual order in which the pages are scraped, you most likely don't need to do this. You can parse the items in whichever order they come, and gather their location in the category hierarchy by parsing the breadcrumb information above the item title. You can use something like this to get the breadcrumb as a list:
response.css(".clp-breadcrumb").xpath('./ul/li//text()').extract()
You don't actually need Selenium, and I believe it would be overkill for this simple issue. Using your browser (I'm using Chrome currently), press F12 to open the developer tools. Go to one of the category pages, and open the Network tab in the developer window. If there is anything there, click the Clear button to tidy things up a bit. Now scroll down until you see that additional items are being loaded, and you will see additional requests listed in the Network panel. Filter them by Documents and click on a request in the left pane; you will see the URL for the request and the query parameters that you need to send. Note the start parameter, which is the most important one, since you will have to issue this request multiple times while increasing its value to get new items. Checking the response in the Preview pane, you will see that what comes back from the server is exactly what you need: more items. The rule you use for the items should pick up those links too.
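As a rough sketch (the page size of 15, the upper bound, and the parse_items callback are assumptions for illustration, not values confirmed by Flipkart), the paginated AJAX requests could be generated along these lines:

from scrapy.http import Request

def parse_category(self, response):
    # One request per page of 15 items; parse_items is a hypothetical
    # callback that extracts products from each AJAX response.
    for start in range(15, 300, 15):  # 300 is an arbitrary cut-off for the sketch
        ajax_url = response.url + "&start=%d&ajax=true" % start
        yield Request(ajax_url, callback=self.parse_items)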
For a more detailed overview of scraping with Firebug, you can check out the official documentation.
Since there is no need to use Selenium for your purpose, I won't cover it beyond adding a few links that show how to use Selenium with Scrapy, should the need ever arise:
https://gist.github.com/cheekybastard/4944914
https://gist.github.com/irfani/1045108
http://snipplr.com/view/66998/
I'm very new to programming and especially to Django, but I can't work out how to use any previous answers to my advantage...
Apologies if my question is too vague, but essentially I have two different apps, let's call them AppA and AppB, with data in two different databases, but both apps contain information about the same individual item.
I want to edit this information on my 'edit details' page while keeping the apps as separate as possible (well, AppB can know about functions in AppA but not vice versa)... I guess what I really want is a signal which works like so:
A 'submit' view within AppA which is called when I submit changes to the data (using text boxes). The data for AppA is then saved.
A signal is then sent to AppB, which would ideally update its data before the HttpResponseRedirect is performed.
Unfortunately I can't really get this to work. My problem is that if I put 'request' into the arguments for save_details, I get errors like "save_details() takes exactly 3 arguments (2 given)"....does anyone know a clever way of getting something like this to work?
My submit function in AppA looks something like this...
def submit(self, request, id):
    signal_received.send(sender=self, id=id)
    q = get_object_or_404(AppA, pk=id)
    q.blah = request.POST.get('wibble from the form')
    ...
    return Http.....
In my AppB signals.py file, I have put:
signal_received = django.dispatch.Signal(providing_args=['id'])

def save_details(sender, uid, **kwargs):
    p = AppB.objects.get(id=id)
    p.wobble = request.POST.get('wobble from the form')
    ...

signal_received.connect(save_details)
Obviously the above doesn't mention request in its arguments, which seems to be necessary, but if I add it, I get problems with the number of arguments.
(I have imported all the right things at the top of each file I think...hence me leaving that off.)
Any pointers about the above would be appreciated... e.g. does "request" need to be the first argument? It didn't seem to like me using "self" before, but I have tried to copy as much as possible the example at the bottom of the documentation (https://docs.djangoproject.com/en/dev/topics/signals/); the extra functionality I need in the signal-receiving function is flummoxing me.
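For what it's worth, a receiver's signature has to match the keyword arguments the sender passes, so one way this could plausibly be wired (keeping the names from the snippets above, and passing the request along explicitly) is:

signal_received = django.dispatch.Signal(providing_args=['id', 'request'])

def save_details(sender, id, request, **kwargs):
    # id and request arrive as keyword arguments from send() below.
    p = AppB.objects.get(id=id)
    p.wobble = request.POST.get('wobble from the form')
    p.save()

signal_received.connect(save_details)

with AppA's submit view sending signal_received.send(sender=None, id=id, request=request).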
Thanks in advance...
Morning,
With my django website, I want to make the urls as short as possible.
So instead of,
/user/john
/user/ronald
I just want it like /john and /ronald
So in my routes I have it configured so that all remaining requests go to one view:
urlpatterns = patterns('',
    ....
    (r'^about/$', 'frontend.views.about'),
    (r'^(.*?)/$', 'users.views.index')
)
which basically means that all requests will be handled by the users controller if not handled elsewhere, which isn't too bad.
But I want to do the same for cakes.
so instead of /cakes/chocolate-coated-cake just have /chocolate-coated-cake
So really, it'd be nice if in my users view, instead of raising a 404, I could somehow say "try the next route", so that it's conditional on a DB field.
Make sense?
In this case I'd prefer to have a separate dispatcher view (not in users, because it is not only about users). There you could set up an ordered list of models, iterate through it until the first success, and call the appropriate view (user view, cake view) with the result as an argument.
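A rough sketch of what I mean (the model and view names are placeholders, not taken from your project):

from django.http import Http404

from users.models import User
from cakes.models import Cake
from users import views as user_views
from cakes import views as cake_views

# Order matters: the first model with a matching name wins.
DISPATCH_ORDER = [
    (User, user_views.index),
    (Cake, cake_views.index),
]

def dispatch(request, name):
    for model, view in DISPATCH_ORDER:
        if model.objects.filter(name=name).exists():
            return view(request, name)
    raise Http404

You would then point r'^(.*?)/$' at this dispatcher instead of users.views.index.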
P.S. Hope that you will not have a user and a cake with the same name :)
Admin actions can act on the selected objects in the list page.
Is it possible to act on all the filtered objects?
For example, suppose the admin searches for product names that start with "T-shirt", which results in 400 products, and wants to increase the price of all of them by 10%.
If the admin can only modify a single page of results at a time, it will take a lot of effort.
Thanks
The custom actions are supposed to be used on a group of selected objects, so I don't think there is a standard way of doing what you want.
But I think I have a hack that might work for you... (meaning: use at your own risk and it is untested)
In your action function the request.GET will contain the q parameter used in the admin search. So if you type "T-Shirt" in the search, you should see request.GET look something like:
<QueryDict: {u'q': [u'T-Shirt']}>
You could completely disregard the queryset argument that your custom action function receives and build your own queryset based on that q parameter in request.GET. Something like:
def increase_price_10_percent(modeladmin, request, queryset):
    q = request.GET.get('q')
    if not q:
        return  # add some error handling here
    queryset = Product.objects.filter(name__contains=q)
    # Your code to increase the price by 10% goes here.
increase_price_10_percent.short_description = "Increase the price by 10% for all products in the search result"
I would make sure to forbid any requests where q is empty. And where you read name__contains, you should mimic whatever filter you created for the admin of your Product object (so, if the search only looks at the name field, name__contains might suffice; if it looks at the name and the description, you would need a correspondingly more complex filter here in the action function too).
I would also, maybe, add an intermediate page stating what models will be affected and have the user click on "I really know what I'm doing" confirmation button. Look at the code for django.contrib.admin.actions for an example of how to list what objects are being deleted. It should point you in the right direction.
NOTE: the users would still have to select something in the admin page, otherwise the action function would never get called.
This is a more generic solution. It is not fully tested (and it's pretty naive), so it might break with strange filters; for me it works with date filters, foreign key filters, and boolean filters.
def publish(modeladmin, request, queryset):
    # Turn every querystring parameter (the active admin filters)
    # into a queryset filter.
    kwargs = {}
    for filter, arg in request.GET.items():
        kwargs.update({filter: arg})
    queryset = queryset.filter(**kwargs)
    queryset.update(published=True)
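To hook the action up (a sketch; ProductAdmin and Product are assumed, not from the original post):

from django.contrib import admin

from .models import Product

class ProductAdmin(admin.ModelAdmin):
    actions = [publish]

admin.site.register(Product, ProductAdmin)

As noted above, a user still has to select at least one row for the action to fire; the queryset is then narrowed by the active filters.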