Django TDD tests seem to skip certain parts of the view code

I'm writing some tests for a site using Django TDD.
The problem is that when I manually go to the test server, fill in the form and submit it, everything works fine. But when I run the test with manage.py test wiki, it seems to skip parts of the code within the view. The Page parts all work fine, but the PageModification parts of the code, and even a write() I added just to see what was going on, seem to be ignored.
I have no idea what could be causing this and can't find a solution. Any ideas?
This is the code:
test.py
#imports
class WikiSiteTest(LiveServerTestCase):
    ....
    def test_wiki_links(self):
        '''Go to the site, and check a few links'''
        #creating a few objects which will be used later
        .....
        #some code to get to where I want:
        .....
        #testing the link to see if the tester can add pages
        link = self.browser.find_element_by_link_text('Add page (for testing only. delete this later)')
        link.click()
        #filling in the form
        template_field = self.browser.find_element_by_name('template')
        template_field.send_keys('homepage')
        slug_field = self.browser.find_element_by_name('slug')
        slug_field.send_keys('this-is-a-slug')
        title_field = self.browser.find_element_by_name('title')
        title_field.send_keys('this is a title')
        meta_field = self.browser.find_element_by_name('meta_description')
        meta_field.send_keys('this is a meta')
        content_field = self.browser.find_element_by_name('content')
        content_field.send_keys('this is content')
        #submitting the filled form so that it can be processed
        s_button = self.browser.find_element_by_css_selector("input[value='Submit']")
        s_button.click()
        # now the view is called
# now the view is called
and a view:
views.py
def page_add(request):
    '''This function does one of these 3 things:
    - Prepares an empty form
    - Checks the formdata it got. If it's ok then it will save it and create and save
      a copy in the form of a PageModification.
    - Checks the formdata it got. If it's not ok then it will redirect the user back'''
    .....
    if request.method == 'POST':
        form = PageForm(request.POST)
        if form.is_valid():
            user = request.user.get_profile()
            page = form.save(commit=False)
            page.partner = user.partner
            page.save() #works
            #Gets ignored
            pagemod = PageModification()
            pagemod.template = page.template
            pagemod.parent = page.parent
            pagemod.page = Page.objects.get(slug=page.slug)
            pagemod.title = page.title
            pagemod.meta_description = page.meta_description
            pagemod.content = page.content
            pagemod.author = request.user.get_profile()
            pagemod.save()
            f = open("/location/log.txt", "w", True)
            f.write('are you reaching this line?')
            f.close()
            #/gets ignored
    #a render to response
Then later I do:
test.py
print '###############Data check##################'
print Page.objects.all()
print PageModification.objects.all()
print '###############End data check##############'
And get:
terminal:
###############Data check##################
[<Page: this is a title 2012-10-01 14:39:21.739966+00:00>]
[]
###############End data check##############
All the imports are fine. Putting the page.save() after the ignored code makes no difference.
This only happens when running it through the TDD test.
Thanks in advance.

How very strange. Could it be that the view is somehow erroring at the PageModification stage? Have you got any checks later on in your test that assert that the response from the view is coming through correctly, i.e. that a 500 error is not being returned instead?
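For example, one quick way to surface a server-side error in a Selenium-driven LiveServerTestCase is to inspect the rendered page right after the click; this is only a rough sketch, and the 'Server Error' / 'Traceback' strings are assumptions about what Django's error page would show in your setup:
s_button.click()
# if the view raised an exception, the browser will be showing Django's error
# page instead of the normal response, so fail loudly here
self.assertNotIn('Server Error', self.browser.page_source)
self.assertNotIn('Traceback', self.browser.page_source)
# then assert the database state the view should have produced
self.assertEqual(PageModification.objects.count(), 1)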

Now, this was a long time ago.
It was solved, but the solution was a little embarrassing: basically, it was me being stupid. I can't remember the exact details, but I believe a different view was being called instead of the one I showed here. That view had the same code except for the "skipped" part.
My apologies to anyone who took the time to look into this.

Related

Flask session cookie reverting

I am sure I am probably being stupid, but I'm struggling to wrap my head around this one.
I have a Flask website and I am setting up a checkout page for it so users can add their items to the cart, etc. Everything was going great: I was able to add items to the cart and get a total (using sessions). However, when I tried to implement the ability for users to update the cart on the checkout page, the session data only survives the initial load when my form posts. The print statement shows the data I am collecting is fine, and the session cookie is set initially, as everything updates; but the moment I change page, it reverts to whatever it was before I made the update.
@views.route("/shopping-cart", methods=['GET', 'POST'])
def to_cart():
    clear_cart = 'views.to_clear_cart'
    if 'shopping' in session:
        shopping_list = session['shopping']
        sum_list = []
        for quantity, price in shopping_list.values():
            sum_list.append(quantity * price)
        total = sum(sum_list)
        if request.method == "POST":
            new_quantity = int(request.form.get('quantity'))
            product_id = request.form.get('issue')
            unit_price = int(request.form.get('price'))
            print(new_quantity, product_id, unit_price)
            shopping_list[f'{product_id}'] = [new_quantity, unit_price]
            return redirect(url_for('views.to_cart'))
        return render_template("cart.html",
                               shopping_list=shopping_list,
                               total=total,
                               clear_cart=clear_cart,
                               )
    else:
        return render_template("cart.html",
                               clear_cart=clear_cart
                               )
I just do not really understand why it is not updating, because from what I can tell the code runs fine and it does update, but then the session cookie simply reverts itself to whatever it was before (I'm using browser-side cookies for this, for testing).
Any help appreciated!!
After much confusion, since everything seemed to be working absolutely fine after I rewrote this in about 5 different ways and printed half the app in the console, I finally found the answer, and it is indeed me being an idiot.
It turns out that if you modify a session value in place, rather than creating or deleting it, Flask does not automatically save the session state; you need to state explicitly that it has been modified.
The answer was as simple as this line of code:
session.modified = True
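In context, that means flagging the session right after mutating the stored cart in the POST branch; a minimal sketch based on the view above:
        if request.method == "POST":
            new_quantity = int(request.form.get('quantity'))
            product_id = request.form.get('issue')
            unit_price = int(request.form.get('price'))
            shopping_list[f'{product_id}'] = [new_quantity, unit_price]
            # mutating a dict stored inside the session does not mark the
            # session as dirty, so Flask would not re-save the cookie otherwise
            session.modified = True
            return redirect(url_for('views.to_cart'))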

Why does scrapy miss some links?

I am scraping the website "www.accell-group.com" using the "scrapy" library for Python. The site is scraped completely; in total 131 pages (text/html) and 2 documents (application/pdf) are identified. Scrapy did not throw any warnings or errors. My algorithm is supposed to scrape every single link. I use CrawlSpider.
However, when I look into the page "http://www.accell-group.com/nl/investor-relations/jaarverslagen/jaarverslagen-van-accell-group.htm", which is reported by "scrapy" as scraped/processed, I see that there are more PDF documents on it, for example "http://www.accell-group.com/files/4/5/0/1/Jaarverslag2014.pdf". I cannot find any reason for it not to be scraped. There is no dynamic/JavaScript content on this page. It is not forbidden in "http://www.airproducts.com/robots.txt".
Do you maybe have any idea why this can happen?
Is it maybe because the "files" folder is not in "http://www.accell-group.com/sitemap.xml"?
Thanks in advance!
My code:
class PyscrappSpider(CrawlSpider):
    """This is the Pyscrapp spider"""
    name = "PyscrappSpider"

    def __init__(self, *a, **kw):
        # Get the passed URL
        originalURL = kw.get('originalURL')
        logger.debug('Original url = {}'.format(originalURL))
        # Add a protocol, if needed
        startURL = 'http://{}/'.format(originalURL)
        self.start_urls = [startURL]
        self.in_redirect = {}
        self.allowed_domains = [urlparse(i).hostname.strip() for i in self.start_urls]
        self.pattern = r""
        self.rules = (Rule(LinkExtractor(deny=[r"accessdenied"]), callback="parse_data", follow=True), )
        # Get WARC writer
        self.warcHandler = kw.get('warcHandler')
        # Initialise the base constructor
        super(PyscrappSpider, self).__init__(*a, **kw)

    def parse_start_url(self, response):
        if response.request.meta.has_key("redirect_urls"):
            original_url = response.request.meta["redirect_urls"][0]
            if (not self.in_redirect.has_key(original_url)) or (not self.in_redirect[original_url]):
                self.in_redirect[original_url] = True
                self.allowed_domains.append(original_url)
        return self.parse_data(response)

    def parse_data(self, response):
        """This function extracts data from the page."""
        self.warcHandler.write_response(response)
        pattern = self.pattern
        # Check if we are interested in the current page
        if (not response.request.headers.get('Referer')
                or re.search(pattern, self.ensure_not_null(response.meta.get('link_text')), re.IGNORECASE)
                or re.search(r"/(" + pattern + r")", self.ensure_not_null(response.url), re.IGNORECASE)):
            logging.debug("This page gets processed = %(url)s", {'url': response.url})
            sel = Selector(response)
            item = PyscrappItem()
            item['url'] = response.url
            return item
        else:
            logging.warning("This page does NOT get processed = %(url)s", {'url': response.url})
            return response.request
Remove, or expand appropriately, your allowed_domains variable and you should be fine. All the URLs the spider follows are, by default, restricted by allowed_domains.
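For instance, loosening the restriction from the exact start hostname to the registered domain keeps same-site links (such as the /files/... URLs) crawlable; the snippet below is only an illustration of that idea, not code from the spider:
# assumption: allow the whole registered domain instead of a single hostname
self.allowed_domains = ['accell-group.com']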
EDIT: This case concerns PDFs in particular. PDFs are explicitly excluded as extensions per the default value of deny_extensions (see here), which is IGNORED_EXTENSIONS (see here).
To allow your spider to crawl PDFs, all you have to do is exclude them from IGNORED_EXTENSIONS by setting the value of deny_extensions explicitly:
from scrapy.linkextractors import IGNORED_EXTENSIONS

self.rules = (
    Rule(
        LinkExtractor(deny=[r"accessdenied"],
                      deny_extensions=set(IGNORED_EXTENSIONS) - set(['pdf'])),
        callback="parse_data",
        follow=True,
    ),
)
So, I'm afraid, this is the answer to the question "Why does Scrapy miss some links?". As you will likely see, it just opens the door to further questions, like "how do I handle those PDFs", but I guess that is the subject of another question.

Scrapy webcrawler gets caught in infinite loop, despite initially working.

Alright, so I'm working on a Scrapy-based web crawler with some simple functionality. The bot is supposed to go from page to page, parsing and then downloading. I've gotten the parser to work, and I've gotten the downloading to work; I can't get the crawling to work. I've read the documentation on the Spider class and on how parse is supposed to work. I've tried returning vs. yielding, and I'm still nowhere. I have no idea where my code is going wrong. What seems to happen, from a debug script I wrote, is the following: the code will run, it will grab page 1 just fine, it'll get the link to page 2, it'll go to page 2, and then it will happily stay on page 2, not grabbing page 3 at all. I don't know where the mistake in my code is, or how to alter it to fix it, so any help would be appreciated. I'm sure the mistake is basic, but I can't figure out what's going on.
import scrapy

class ParadiseSpider(scrapy.Spider):
    name = "testcrawl2"
    start_urls = [
        "http://forums.somethingawful.com/showthread.php?threadid=3755369&pagenumber=1",
    ]

    def __init__(self):
        self.found = 0
        self.goto = "no"

    def parse(self, response):
        urlthing = response.xpath("//a[@title='Next page']").extract()
        urlthing = urlthing.pop()
        newurl = urlthing.split()
        print newurl
        url = newurl[1]
        url = url.replace("href=", "")
        url = url.replace('"', "")
        url = "http://forums.somethingawful.com/" + url
        print url
        self.goto = url
        return scrapy.Request(self.goto, callback=self.parse_save, dont_filter=True)

    def parse_save(self, response):
        nfound = str(self.found)
        print "Testing" + nfound
        self.found = self.found + 1
        return scrapy.Request(self.goto, callback=self.parse, dont_filter=True)
Use Scrapy's rule engine so that you don't need to write the next-page crawling code in the parse function. Just pass the XPath for the next-page link in restrict_xpaths and the callback will get the response of each crawled page:
rules = (Rule(LinkExtractor(restrict_xpaths=['//a[contains(text(),"Next")]']), follow=True),)

def parse(self, response):
    response.url
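Put together, a minimal CrawlSpider version of the crawler might look like the sketch below; the spider name, the callback name and its placeholder body are assumptions, since the original parsing/downloading code isn't shown (CrawlSpider reserves parse for its own plumbing, so the callback uses a different name):
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ParadiseCrawlSpider(CrawlSpider):
    name = "testcrawl3"
    start_urls = [
        "http://forums.somethingawful.com/showthread.php?threadid=3755369&pagenumber=1",
    ]
    # follow every "Next page" link automatically; no manual next-page logic needed
    rules = (
        Rule(LinkExtractor(restrict_xpaths=['//a[@title="Next page"]']),
             callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        # placeholder callback: the existing parsing/downloading code would go here
        self.logger.info("Visited %s", response.url)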

Multiple POSTs in a Django unit test

I am writing unit tests to validate a profile avatar module. So, I have a form that allows a user to upload an avatar. If one exists, it simply replaces the current one.
In my test, I do the following (the class setup logs a user in - not shown here):
f = open('testfile1.jpg')
data = {'image':f}
response = self.client.post('/profile/uploadavatar/',data)
self.assertEqual(response.status_code, 200)
self.assertEqual(self.user1.get_profile().avatar.image.name, u'uploads/images/testfile1.jpg')
f.close()
f = open('testfile2.jpg')
data = {'image':f}
response = self.client.post('/profile/uploadavatar/',data)
self.assertEqual(response.status_code, 200)
self.assertEqual(self.user1.get_profile().avatar.image.name, u'uploads/images/testfile2.jpg')
f.close()
The second assertEqual, checking the avatar image name, always fails because the name is still set to the first filename (testfile1.jpg). However, when I test this manually, the code does what I think it should: it replaces the old avatar with the new one.
Am I doing something wrong? I'm new to Django unit tests, so I may be missing something very simple...
Any ideas would be appreciated.
Thanks in advance!
The "self.user1" object, along with the profile, are cached at the beginning.
Reload the user/profile objects between actions to see updated data.
(Pulled up from the comments.)
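One way to do that here is to re-fetch the user between the two POST blocks, so get_profile() is not read from the stale cached instance; a sketch assuming the default django.contrib.auth User model:
from django.contrib.auth.models import User

# drop the cached user/profile and reload fresh state from the database
self.user1 = User.objects.get(pk=self.user1.pk)
self.assertEqual(self.user1.get_profile().avatar.image.name,
                 u'uploads/images/testfile2.jpg')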

Django: Passing a request directly (inline) to a second view

I'm trying to call a view directly from another (if this is at all possible). I have a view:
def product_add(request, order_id=None):
    # Works. Handles a normal POST check and form submission and redirects
    # to another page if the form is properly validated.
Then I have a second view that queries the DB for the product data and should call the first one.
def product_copy_from_history(request, order_id=None, product_id=None):
    product = Product.objects.get(owner=request.user, pk=product_id)
    # I need to somehow set up a form with the product data so that the first
    # view thinks it gets a POST request.
    second_response = product_add(request, order_id)
    return second_response
Since the second view needs to add the product just as the first view does, I was wondering if I could simply call the first view from the second one.
What I'm aiming for is passing the request object on to the first view and returning the obtained response object back to the client.
Any help greatly appreciated, criticism as well if this is a bad way to do it, but then some pointers on how to avoid repeating myself (DRY).
Thanks!
Gerard.
My god, what was I thinking. This would of course be the cleanest solution:
def product_add_from_history(request, order_id=None, product_id=None):
    """ Add existing product to current order
    """
    order = get_object_or_404(Order, pk=order_id, owner=request.user)
    product = Product.objects.get(owner=request.user, pk=product_id)
    newproduct = Product(
        owner=request.user,
        order=order,
        name=product.name,
        amount=product.amount,
        unit_price=product.unit_price,
    )
    newproduct.save()
    return HttpResponseRedirect(reverse('order-detail', args=[order_id]))
A view is a regular Python function; you can of course call one from another, provided you pass the proper arguments and handle the result correctly (like a 404...). Whether it is good practice I don't know. I would myself extract a utility function and call it from both views.
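As a rough sketch of that suggestion (the helper name is made up, and the fields are taken from the answer above), the shared logic could live in one function that both views call:
def _add_product_to_order(user, order, name, amount, unit_price):
    # shared helper: both views delegate the actual creation to this function
    product = Product(owner=user, order=order, name=name,
                      amount=amount, unit_price=unit_price)
    product.save()
    return product

def product_copy_from_history(request, order_id=None, product_id=None):
    order = get_object_or_404(Order, pk=order_id, owner=request.user)
    source = Product.objects.get(owner=request.user, pk=product_id)
    _add_product_to_order(request.user, order, source.name,
                          source.amount, source.unit_price)
    return HttpResponseRedirect(reverse('order-detail', args=[order_id]))
product_add would then call the same helper from its form-handling branch with the form's cleaned data.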
If you are fine with the overhead of calling your API through HTTP, you can use urllib to POST a request to your product_add request handler.
As far as I know, this could cause some trouble if you develop with the dev server that comes with Django, as it only handles one request at a time and will block indefinitely (see trac, google groups).