Django notification observe model (watching for product results) - django

I've been using django-notification (https://github.com/jtauber/django-notification.git) but the documentation is a little brief for a beginner.
I want to be able to have users keep a watch on searches (a results page with product listings) that have no results at the time of search. Then if a record is added that matches the search, the user should be notified.
I can't find any online explanation of how to use 'observe', which I think is what i'd need to use to watch for records appearing (in search results)? Perhaps, this is the wrong approach (using django-notification) as I need a signal to await the occurrence of a filter result that would initially contain no objects...
(the project is too developed to consider an option like Pinax to provide a template for things like this)
I suppose I need to evaluate
f=Products.objects.filter({search_request_args})
if f:
notification.send([request.user], "product_match", {"from_user": settings.FROM_DEFAULT})
Perhaps as a chron job?

It looks like you want to use django signals (see: https://docs.djangoproject.com/en/dev/topics/signals/)
let's say you want to watch the creation of Product objects
from django.db.models.signals import post_save
from my_app.models import Product
def new_product(sender, instance, created, **kwargs):
# short-circuit the function if it isn't a new product (it's
# being updated not created)
if not created: return
# note: instance is the newly saved Product object
if (check_if_the_new_product_matches_searches_here):
notification.send(...)
post_save.connect(new_product, sender=Product)

Related

Testing triggers for fulltext search in Django

I'm adding a search engine to a Django project, and thus set up SearchVectorFields on several models, with custom triggers.
I would like to unit-test that my columns of type TSVECTOR are updated when the instance of a Model changes.
However, I've been unable to find any information on how to test the content of a SearchVectorField ... I can't compare my_document.search to SearchVector(Value("document content")) or similar, because the first one seems to be string-like, while the latter is an object.
TL;DR
More precisely, with the model:
from django.db import models
class Document(models.Model):
...
content = TextField()
search = SearchVectorField()
and trigger:
-- create trigger function
CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
begin
NEW.search := to_tsvector(COALESCE(NEW.content, ''))
return NEW;
end
$$ LANGUAGE plpgsql;
-- add trigger on insert
DROP TRIGGER IF EXISTS search_trigger ON myapp_document;
CREATE TRIGGER search_trigger
BEFORE INSERT
ON myapp_document
FOR EACH ROW
EXECUTE PROCEDURE search_trigger();
-- add trigger on update
DROP TRIGGER IF EXISTS search_trigger_update ON myapp_document;
CREATE TRIGGER search_trigger_update
BEFORE UPDATE OF content
ON myapp_document
FOR EACH ROW
WHEN (OLD.content IS DISTINCT FROM NEW.content)
EXECUTE PROCEDURE search_trigger();
How can I test that when I create a new Document instance, its search field is populated with the right values ? Same question for updating an existing Document instance, but the answer should be fairly similar.
Thanks for any hint ;)
I think you can compare string representation of your SearchVectorField values:
from django.test import TestCase
from .models import Document
class DocumentTest(TestCase):
def setUp(self):
Document.objects.create(content='Pizza Recipes')
def test_document_search(self):
document_list = list(Document.objects.values_list('search', flat=True))
search_list = ["'pizza':1 'recip':2"]
self.assertSequenceEqual(document_list, search_list)

Scraping data off flipkart using scrapy

I am trying to scrape some information from flipkart.com for this purpose I am using Scrapy. The information I need is for every product on flipkart.
I have used the following code for my spider
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import HtmlXPathSelector
from tutorial.items import TutorialItem
class WebCrawler(CrawlSpider):
name = "flipkart"
allowed_domains = ['flipkart.com']
start_urls = ['http://www.flipkart.com/store-directory']
rules = [
Rule(LinkExtractor(allow=['/(.*?)/p/(.*?)']), 'parse_flipkart', cb_kwargs=None, follow=True),
Rule(LinkExtractor(allow=['/(.*?)/pr?(.*?)']), follow=True)
]
#staticmethod
def parse_flipkart(response):
hxs = HtmlXPathSelector(response)
item = FlipkartItem()
item['featureKey'] = hxs.select('//td[#class="specsKey"]/text()').extract()
yield item
What my intent is to crawl through every product category page(specified by the second rule) and follow the product page(first rule) within the category page to scrape data from the products page.
One problem is that I cannot find a way to control the crawling and scrapping.
Second flipkart uses ajax on its category page and displays more products when a user scrolls to the bottom.
I have read other answers and assessed that selenium might help solve the issue. But I cannot find a proper way to implement it into this structure.
Suggestions are welcome..:)
ADDITIONAL DETAILS
I had earlier used a similar approach
the second rule I used was
Rule(LinkExtractor(allow=['/(.?)/pr?(.?)']),'parse_category', follow=True)
#staticmethod
def parse_category(response):
hxs = HtmlXPathSelector(response)
count = hxs.select('//td[#class="no_of_items"]/text()').extract()
for page num in range(1,count,15):
ajax_url = response.url+"&start="+num+"&ajax=true"
return Request(ajax_url,callback="parse_category")
Now i was confused on what to use for callback "parse_category" or "parse_flipkart"
Thank you for your patience
Not sure what you mean when you say that you can't find a way to control the crawling and scraping. Creating a spider for this purpose is already taking it under control, isn't it? If you create proper rules and parse the responses properly, that is all you need. In case you are referring to the actual order in which the pages are scraped, you most likely don't need to do this. You can just parse all the items in whichever order, but gather their location in the category hierarchy by parsing the breadcrumb information above the item title. You can use something like this to get the breadcrumb in a list:
response.css(".clp-breadcrumb").xpath('./ul/li//text()').extract()
You don't actually need Selenium, and I believe it would be an overkill for this simple issue. Using your browser (I'm using Chrome currently), press F12 to open the developer tools. Go to one of the category pages, and open the Network tab in the developer window. If there is anything here, click the Clear button to clear things up a bit. Now scroll down until you see that additional items are being loaded, and you will see additional requests listed in the Network panel. Filter them by Documents (1) and click on the request in the left pane (2). You can see the URL for the request (3) and the query parameters that you need to send (4). Note the start parameter which will be the most important since you will have to call this request multiple times while increasing this value to get new items. You can check the response in the Preview pane (5), and you will see that the request from the server is exactly what you need, more items. The rule you use for the items should pick up those links too.
For a more detail overview of scraping with Firebug, you can check out the official documentation.
Since there is no need to use Selenium for your purpose, I shall not cover this point more than adding a few links that show how to use Selenium with Scrapy, if the need ever occurs:
https://gist.github.com/cheekybastard/4944914
https://gist.github.com/irfani/1045108
http://snipplr.com/view/66998/

Custom Django signal receiver getting data

I'm very new to programming and especially to Django but can't work out how to use any previous answers to my advantage....
Apologies if my question is too vague but essentially, I have two different apps, let's call them app A and app B, with data on two different databases but apps contain information on the same individual item.
I want to edit this information on my 'edit details' page while keeping the apps as separate as possible (well AppB can know about functions in AppA but not vice-versa)...I guess what I really want is a signal which works like so:
A 'submit' view within AppA which is called when I submit changes to the data (using text boxes). The data for AppA is then saved..
And a signal then sent to AppB which ideally would update its data, before the HttpResponseRedirect is performed.
Unfortunately I can't really get this to work. My problem is that if I put 'request' into the arguments for save_details, I get errors like "save_details() takes exactly 3 arguments (2 given)"....does anyone know a clever way of getting something like this to work?
My submit function in AppA looks something like this...
def submit(self, request, id):
signal_received.send(sender=self, id=id)
q = get_object_or_404(AppA, pk=id)
q.blah = request.POST.get('wibble from the form')
...
return Http.....
in my AppB signals.py file, I have put.
signal_received = django.dispatch.Signal(providing_args=['id'])
def save_details(sender, uid, **kwargs):
p = AppB.objects.get(id=id)
p.wobble = request.POST.get('wobble from the form')
...
signal.received.connect(save_details)
Obviously the above doesn't mention request in its arguments which seems to be necessary but if I add that, I get problems with the number of arguments.
(I have imported all the right things at the top of each file I think...hence me leaving that off.)
Any point about the above would be appreciated....e.g. does "request" need to be the first argument? It didn't seem to like me using "self" before but I have tried to copy as much as possible the example at the bottom of the documentation (https://docs.djangoproject.com/en/dev/topics/signals/) but the extra functionality I need in the signal receiving function is flumoxing me.
Thanks in advance...

django admin actions on all the filtered objects

Admin actions can act on the selected objects in the list page.
Is it possible to act on all the filtered objects?
For example if the admin search for Product names that start with "T-shirt" which results with 400 products and want to increase the price of all of them by 10%.
If the admin can only modify a single page of result at a time it will take a lot of effort.
Thanks
The custom actions are supposed to be used on a group of selected objects, so I don't think there is a standard way of doing what you want.
But I think I have a hack that might work for you... (meaning: use at your own risk and it is untested)
In your action function the request.GET will contain the q parameter used in the admin search. So if you type "T-Shirt" in the search, you should see request.GET look something like:
<QueryDict: {u'q': [u'T-Shirt']}>
You could completely disregard the querystring parameter that your custom action function receives and build your own queryset based on that request.GET's q parameter. Something like:
def increase_price_10_percent(modeladmin, request, queryset):
if request.GET['q'] is None:
# Add some error handling
queryset=Product.objects.filter(name__contains=request.GET['q'])
# Your code to increase price in 10%
increase_price_10_percent.short_description = "Increases price 10% for all products in the search result"
I would make sure to forbid any requests where q is empty. And where you read name__contains you should be mimicking whatever filter you created for the admin of your product object (so, if the search is only looking at the name field, name__contains might suffice; if it looks at the name and description, you would have a more complex filter here in the action function too).
I would also, maybe, add an intermediate page stating what models will be affected and have the user click on "I really know what I'm doing" confirmation button. Look at the code for django.contrib.admin.actions for an example of how to list what objects are being deleted. It should point you in the right direction.
NOTE: the users would still have to select something in the admin page, otherwise the action function would never get called.
This is a more generic solution, is not fully tested(and its pretty naive), so it might break with strange filters. For me works with date filters, foreign key filters, boolean filters.
def publish(modeladmin,request,queryset):
kwargs = {}
for filter,arg in request.GET.items():
kwargs.update({filter:arg})
queryset = queryset.filter(**kwargs)
queryset.update(published=True)

Any thoughts on A/B testing in Django based project? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
We just now started doing the A/B testing for our Django based project. Can I get some information on best practices or useful insights about this A/B testing.
Ideally each new testing page will be differentiated with a single parameter(just like Gmail). mysite.com/?ui=2 should give a different page. So for every view I need to write a decorator to load different templates based on the 'ui' parameter value. And I dont want to hard code any template names in decorators. So how would urls.py url pattern will be?
It's useful to take a step back and abstract what A/B testing is trying to do before diving into the code. What exactly will we need to conduct a test?
A Goal that has a Condition
At least two distinct Paths to meet the Goal's Condition
A system for sending viewers down one of the Paths
A system for recording the Results of the test
With this in mind let's think about implementation.
The Goal
When we think about a Goal on the web usually we mean that a user reaches a certain page or that they complete a specific action, for example successfully registering as a user or getting to the checkout page.
In Django we could model that in a couple of ways - perhaps naively inside a view, calling a function whenever a Goal has been reached:
def checkout(request):
a_b_goal_complete(request)
...
But that doesn't help because we'll have to add that code everywhere we need it - plus if we're using any pluggable apps we'd prefer not to edit their code to add our A/B test.
How can we introduce A/B Goals without directly editing view code? What about a Middleware?
class ABMiddleware:
def process_request(self, request):
if a_b_goal_conditions_met(request):
a_b_goal_complete(request)
That would allow us to track A/B Goals anywhere on the site.
How do we know that a Goal's conditions has been met? For ease of implementation I'll suggest that we know a Goal has had it's conditions met when a user reaches a specific URL path. As a bonus we can measure this without getting our hands dirty inside a view. To go back to our example of registering a user we could say that this goal has been met when the user reaches the URL path:
/registration/complete
So we define a_b_goal_conditions_met:
a_b_goal_conditions_met(request):
return request.path == "/registration/complete":
Paths
When thinking about Paths in Django it's natural to jump to the idea of using different templates. Whether there is another way remains to be explored. In A/B testing you make small differences between two pages and measure the results. Therefore it should be a best practice to define a single base Path template from which all Paths to the Goal should extend.
How should render these templates? A decorator is probably a good start - it's a best practice in Django to include a parameter template_name to your views a decorator could alter this parameter at runtime.
#a_b
def registration(request, extra_context=None, template_name="reg/reg.html"):
...
You could see this decorator either introspecting the wrapped function and modifying the template_name argument or looking up the correct templates from somewhere (like a Model). If we didn't want to add the decorator to every function we could implement this as part of our ABMiddleware:
class ABMiddleware:
...
def process_view(self, request, view_func, view_args, view_kwargs):
if should_do_a_b_test(...) and "template_name" in view_kwargs:
# Modify the template name to one of our Path templates
view_kwargs["template_name"] = get_a_b_path_for_view(view_func)
response = view_func(view_args, view_kwargs)
return response
We'd need also need to add some way to keep track of which views have A/B tests running etc.
A system for sending viewers down a Path
In theory this is easy but there are lot of different implementations so it's not clear which one is best. We know a good system should divide users evenly down the path - Some hash method must be used - Maybe you could use the modulus of memcache counter divided by the number of Paths - maybe there is a better way.
A system for recording the Results of the Test
We need to record how many users went down what Path - we'll also need access to this information when the user reaches the goal (we need to be able to say what Path they came down to met the Condition of the Goal) - we'll use some kind of Model(s) to record the data and either Django Sessions or Cookies to persist the Path information until the user meets the Goal condition.
Closing Thoughts
I've given a lot of pseudo code for implementing A/B testing in Django - the above is by no means a complete solution but a good start towards creating a reusable framework for A/B testing in Django.
For reference you may want to look at Paul Mar's Seven Minute A/Bs on GitHub - it's the ROR version of the above!
http://github.com/paulmars/seven_minute_abs/tree/master
Update
On further reflection and investigation of Google Website Optimizer it's apparent that there are gaping holes in the above logic. By using different templates to represent Paths you break all caching on the view (or if the view is cached it will always serve the same path!). Instead, of using Paths, I would instead steal GWO terminology and use the idea of Combinations - that is one specific part of a template changing - for instance, changing the <h1> tag of a site.
The solution would involve template tags which would render down to JavaScript. When the page is loaded in the browser the JavaScript makes a request to your server which fetches one of the possible Combinations.
This way you can test multiple combinations per page while preserving caching!
Update
There still is room for template switching - say for example you introduce an entirely new homepage and want to test it's performance against the old homepage - you'd still want to use the template switching technique. The thing to keep in mind is your going to have to figure out some way to switch between X number of cached versions of the page. To do this you'd need to override the standard cached middleware to see if their is a A/B test running on the requested URL. Then it could choose the correct cached version to show!!!
Update
Using the ideas described above I've implemented a pluggable app for basic A/B testing Django. You can get it off Github:
http://github.com/johnboxall/django-ab/tree/master
If you use the GET parameters like you suggsted (?ui=2), then you shouldn't have to touch urls.py at all. Your decorator can inspect request.GET['ui'] and find what it needs.
To avoid hardcoding template names, maybe you could wrap the return value from the view function? Instead of returning the output of render_to_response, you could return a tuple of (template_name, context) and let the decorator mangle the template name. How about something like this? WARNING: I haven't tested this code
def ab_test(view):
def wrapped_view(request, *args, **kwargs):
template_name, context = view(request, *args, **kwargs)
if 'ui' in request.GET:
template_name = '%s_%s' % (template_name, request.GET['ui'])
# ie, 'folder/template.html' becomes 'folder/template.html_2'
return render_to_response(template_name, context)
return wrapped_view
This is a really basic example, but I hope it gets the idea across. You could modify several other things about the response, such as adding information to the template context. You could use those context variables to integrate with your site analytics, like Google Analytics, for example.
As a bonus, you could refactor this decorator in the future if you decide to stop using GET parameters and move to something based on cookies, etc.
Update If you already have a lot of views written, and you don't want to modify them all, you could write your own version of render_to_response.
def render_to_response(template_list, dictionary, context_instance, mimetype):
return (template_list, dictionary, context_instance, mimetype)
def ab_test(view):
from django.shortcuts import render_to_response as old_render_to_response
def wrapped_view(request, *args, **kwargs):
template_name, context, context_instance, mimetype = view(request, *args, **kwargs)
if 'ui' in request.GET:
template_name = '%s_%s' % (template_name, request.GET['ui'])
# ie, 'folder/template.html' becomes 'folder/template.html_2'
return old_render_to_response(template_name, context, context_instance=context_instance, mimetype=mimetype)
return wrapped_view
#ab_test
def my_legacy_view(request, param):
return render_to_response('mytemplate.html', {'param': param})
Justin's response is right... I recommend you vote for that one, as he was first. His approach is particularly useful if you have multiple views that need this A/B adjustment.
Note, however, that you don't need a decorator, or alterations to urls.py, if you have just a handful of views. If you left your urls.py file as is...
(r'^foo/', my.view.here),
... you can use request.GET to determine the view variant requested:
def here(request):
variant = request.GET.get('ui', some_default)
If you want to avoid hardcoding template names for the individual A/B/C/etc views, just make them a convention in your template naming scheme (as Justin's approach also recommends):
def here(request):
variant = request.GET.get('ui', some_default)
template_name = 'heretemplates/page%s.html' % variant
try:
return render_to_response(template_name)
except TemplateDoesNotExist:
return render_to_response('oops.html')
A code based on the one by Justin Voss:
def ab_test(force = None):
def _ab_test(view):
def wrapped_view(request, *args, **kwargs):
request, template_name, cont = view(request, *args, **kwargs)
if 'ui' in request.GET:
request.session['ui'] = request.GET['ui']
if 'ui' in request.session:
cont['ui'] = request.session['ui']
else:
if force is None:
cont['ui'] = '0'
else:
return redirect_to(request, force)
return direct_to_template(request, template_name, extra_context = cont)
return wrapped_view
return _ab_test
example function using the code:
#ab_test()
def index1(request):
return (request,'website/index.html', locals())
#ab_test('?ui=33')
def index2(request):
return (request,'website/index.html', locals())
What happens here:
1. The passed UI parameter is stored in the session variable
2. The same template loads every time, but a context variable {{ui}} stores the UI id (you can use it to modify the template)
3. If user enters the page without ?ui=xx then in case of index2 he's redirected to '?ui=33', in case of index1 the UI variable is set to 0.
I use 3 to redirect from the main page to Google Website Optimizer which in turn redirects back to the main page with a proper ?ui parameter.
You can also A/B test using Google Optimize. To do so you'll have to add Google Analytics to your site and then when you create a Google Optimize experiment each user will get a cookie with a different experiment variant (according to the weight for each variant). You can then extract the variant from the cookie and display various versions of your application. You can use the following snippet to extract the variant:
ga_exp = self.request.COOKIES.get("_gaexp")
parts = ga_exp.split(".")
experiments_part = ".".join(parts[2:])
experiments = experiments_part.split("!")
for experiment_str in experiments:
experiment_parts = experiment_str.split(".")
experiment_id = experiment_parts[0]
variation_id = int(experiment_parts[2])
experiment_variations[experiment_id] = variation_id
However there is a django package that integrates well with Google Optimize: https://github.com/adinhodovic/django-google-optimize/.
And here is a blog post on how to use the package and how Google Optimize works: https://hodovi.cc/blog/django-b-testing-google-optimize/.