Is it possible to create an integration test for a Scrapy pipeline? I can't figure out how to do this. In particular, I am trying to write a test for the FilesPipeline, and I also want it to persist my mocked response to Amazon S3.
Here is my test:
def _mocked_download_func(request, info):
    # use request.url here; there is no `response` in scope at this point
    return Response(url=request.url, status=200, body="test", request=request)


class FilesPipelineTests(unittest.TestCase):

    def setUp(self):
        self.settings = get_project_settings()
        crawler = Crawler(self.settings)
        crawler.configure()
        self.pipeline = FilesPipeline.from_crawler(crawler)
        self.pipeline.open_spider(None)
        self.pipeline.download_func = _mocked_download_func

    @defer.inlineCallbacks
    def test_file_should_be_directly_available_from_s3_when_processed(self):
        item = CrawlResult()
        item['id'] = "test"
        item['file_urls'] = ['http://localhost/test']
        result = yield self.pipeline.process_item(item, None)
        self.assertEquals(result['files'][0]['path'], "full/002338a87aab86c6b37ffa22100504ad1262f21b")
I always run into the following error:
DirtyReactorAggregateError: Reactor was unclean.
How do I create a proper test using twisted and scrapy?
Up to now I have done my pipeline tests without the call to from_crawler, so they are not ideal (they do not test the functionality of from_crawler), but they work.
I do them by using an empty Spider instance:
from scrapy.spiders import Spider
# some other imports for my own stuff and standard libs


@pytest.fixture
def mqtt_client():
    client = mock.Mock()
    return client


def test_mqtt_pipeline_does_return_item_after_process(mqtt_client):
    spider = Spider(name='spider')
    pipeline = MqttOutputPipeline(mqtt_client, 'dummy-namespace')

    item = BasicItem()
    item['url'] = 'http://example.com/'
    item['source'] = 'dummy source'

    ret = pipeline.process_item(item, spider)
    assert ret is not None
(in fact, I forgot to call open_spider())
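If you want to mirror the real pipeline lifecycle, the missing call is a one-liner (assuming your MqttOutputPipeline actually implements open_spider):

pipeline.open_spider(spider)  # call before process_item to mimic the real lifecycle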
You can also have a look at how scrapy itself does the testing of pipelines, e.g. for MediaPipeline:
class BaseMediaPipelineTestCase(unittest.TestCase):

    pipeline_class = MediaPipeline
    settings = None

    def setUp(self):
        self.spider = Spider('media.com')
        self.pipe = self.pipeline_class(download_func=_mocked_download_func,
                                        settings=Settings(self.settings))
        self.pipe.open_spider(self.spider)
        self.info = self.pipe.spiderinfo

    def test_default_media_to_download(self):
        request = Request('http://url')
        assert self.pipe.media_to_download(request, self.info) is None
You can also have a look through their other unit tests. For me, these are always a good source of inspiration for how to unit test Scrapy components.
If you want to test the from_crawler function too, you could have a look at their middleware tests. In these tests, they often use from_crawler to create middlewares, e.g. for OffsiteMiddleware:
from scrapy.spiders import Spider
from scrapy.utils.test import get_crawler


class TestOffsiteMiddleware(TestCase):

    def setUp(self):
        crawler = get_crawler(Spider)
        self.spider = crawler._create_spider(**self._get_spiderargs())
        self.mw = OffsiteMiddleware.from_crawler(crawler)
        self.mw.spider_opened(self.spider)
I assume the key component here is the call to get_crawler from scrapy.utils.test. It seems they factored out some of the calls you need in order to have a proper testing environment.
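Applied to the FilesPipeline from the question, a rough sketch could look like this (assuming a recent Scrapy version; FILES_STORE must be set for the pipeline to be created, and the bucket URI below is purely hypothetical):

from scrapy.spiders import Spider
from scrapy.utils.test import get_crawler
from scrapy.pipelines.files import FilesPipeline


def make_files_pipeline(settings_dict=None):
    # get_crawler() wires up a crawler with test-friendly defaults
    crawler = get_crawler(Spider, settings_dict)
    pipeline = FilesPipeline.from_crawler(crawler)
    spider = crawler._create_spider('test-spider')
    pipeline.open_spider(spider)
    return pipeline, spider


pipeline, spider = make_files_pipeline({'FILES_STORE': 's3://my-test-bucket/files/'})  # hypothetical bucket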
Related
I keep hitting a method scan_file() that should be mocked. It happens while calling self.client.post() in a Django API test. The app setup is below; I've tried to mock the imported name with patch("myapp.views.scan_file") as well as the source location with patch("myapp.utils.scan_file"), but neither works.
# myapp.views.py
from myapp.utils import scan_file

class MyViewset():
    def scan(self):
        scan_file(file)  # <- this should be mocked but it's entering the code


# myapp.utils.py
def scan_file(file) -> bool:
    boolean_result = call_api(file)
    return boolean_result


# test_api.py
class MyAppTest():
    def test_scan_endpoint(self):
        patcher = patch("myapp.views.scan_file")
        MockedScan = patcher.start()
        MockedScan.return_value = True

        # This post hits the scan_file code during the API
        # call, but it should be mocked.
        resp = self.client.post(
            SCAN_ENDPOINT,
            data={
                "file": self.text_file,
                "field1": "Progress"
            }
        )
I've also tried the following syntax for mocking, and tried including it in the test's setUp() as well:
self.patcher = patch("myapp.views.scan_file")
self.MockedScan = self.patcher.start()
self.MockedScan.return_value = True
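For reference, a common way to apply this kind of patch in a Django test is the decorator form of mock.patch, which keeps the patch active for the whole request cycle and undoes it automatically (a sketch, reusing the hypothetical module layout and names from the question):

from unittest import mock
from django.test import TestCase


class MyAppTest(TestCase):
    @mock.patch("myapp.views.scan_file", return_value=True)
    def test_scan_endpoint(self, mocked_scan):
        # the patch targets the name as it is imported in myapp.views,
        # i.e. the module that actually calls scan_file
        resp = self.client.post(SCAN_ENDPOINT, data={"file": self.text_file})
        mocked_scan.assert_called_once()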
Folks,
I have the following situation:
tested_library:
class foo(object):
    def __init__(self):
        self.bar = BuiltIn().get_library_instance('bar')

    def get_status(self):
        trace_out = self.bar.some_method_from_mocked_lib()
        trace = re.findall(r'([A-Z\d]+)\s*(on|off)', trace_out)
Here, BuiltIn is a Robot Framework library that does not work outside Robot's environment, so Python is not able to import the tested library because of BuiltIn, and I had to mock it (below). trace is still used later, and get_status() returns something else.
test:
import pytest
from _pytest.monkeypatch import MonkeyPatch
from mock import MagicMock

monkeypatch = MonkeyPatch()


def import_lib(monkeypatch):
    monkeypatch.setattr('robot.libraries.BuiltIn.BuiltIn.get_library_instance', MagicMock())
    import tested_library as lib
    return lib.foo()  # return an instance so get_status() can be called on it


def test_getting_status():
    test_object = import_lib(monkeypatch)
    test_object.get_status()
But it returns:
pattern = '([A-Z\\d]+)\\s*(on|off)', string = <MagicMock name='mock().some_method_from_mocked_lib()' id='140623304433104'>, flags = 0
And the question is: how do I mock this so that it returns a string and not a mocked object with a method?
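One way to do this is to give the patched get_library_instance a fake 'bar' library whose method returns a canned string (a sketch; the trace text is a made-up example matching the regex from the question):

from mock import MagicMock


def import_lib(monkeypatch):
    fake_bar = MagicMock()
    # make the mocked library method return a plain string instead of another mock
    fake_bar.some_method_from_mocked_lib.return_value = "PORT1 on\nPORT2 off"
    monkeypatch.setattr('robot.libraries.BuiltIn.BuiltIn.get_library_instance',
                        MagicMock(return_value=fake_bar))
    import tested_library as lib
    return lib.foo()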
I have a Django project that pulls data from a legacy database (read-only connection) into its own database, and when I run integration tests, it tries to read from test_account on the legacy connection.
(1049, "Unknown database 'test_account'")
Is there a way to tell Django to leave the legacy connection alone for reading from the test database?
I actually wrote something that lets you create integration tests in djenga (available on PyPI), if you want to take a look at how to create a separate integration test framework.
Here is the test runner I use when using the django unit test framework:
from django.test.runner import DiscoverRunner
from django.apps import apps
import sys


class UnManagedModelTestRunner(DiscoverRunner):
    """
    Test runner that uses a legacy database connection for the duration of the test run.
    Many thanks to the Caktus Group: https://www.caktusgroup.com/blog/2013/10/02/skipping-test-db-creation/
    """

    def __init__(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).__init__(*args, **kwargs)
        self.unmanaged_models = None
        self.test_connection = None
        self.live_connection = None
        self.old_names = None

    def setup_databases(self, **kwargs):
        # override keepdb so that we don't accidentally overwrite our existing legacy database
        self.keepdb = True
        # set the test DB name to the current DB name, which makes this more of an
        # integration test, but HEY, at least it's a start
        DATABASES['legacy']['TEST'] = {'NAME': DATABASES['legacy']['NAME']}
        result = super(UnManagedModelTestRunner, self).setup_databases(**kwargs)
        return result
# Set Django's test runner to the custom class defined above
TEST_RUNNER = 'config.settings.test_settings.UnManagedModelTestRunner'
TEST_NON_SERIALIZED_APPS = [ 'legacy_app' ]
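With that runner and those settings in place, a test that reads from the legacy alias could look roughly like this (the app and model names are hypothetical; the databases attribute needs Django 2.2+, older versions use multi_db = True):

from django.test import TestCase
from legacy_app.models import Account  # hypothetical unmanaged legacy model


class LegacyReadTests(TestCase):
    databases = {'default', 'legacy'}  # Django 2.2+; older versions set multi_db = True

    def test_reads_existing_legacy_data(self):
        # the custom runner pointed the 'legacy' test DB at the real database,
        # so this query runs against the live, read-only data
        self.assertTrue(Account.objects.using('legacy').exists())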
from django.test import TestCase, override_settings


@override_settings(LOGIN_URL='/other/login/')
class LoginTestCase(TestCase):

    def test_login(self):
        response = self.client.get('/sekrit/')
        self.assertRedirects(response, '/other/login/?next=/sekrit/')
https://docs.djangoproject.com/en/1.10/topics/testing/tools/
You should theoretically be able to use the override settings here and switch to a different configuration for the duration of a test.
I've not done any twisted work for a couple of years now and have started using the newer Agent style of client HTTP calls. Using Agent has been OK, but testing is confusing me (it's twisted after all).
I've been through the https://twistedmatrix.com/documents/current/core/howto/trial.html docs and the APIs on trial tools and Agent itself. Also numerous searches.
I've gone with faking out Agent, as I don't need to test that. But then, because of the steps needed to handle the processing and response of an Agent request, my test code has gotten nasty, implementing the nested layers of the Agent, protocol, etc. Where should I draw the line here, and are there some utils I haven't found?
Here's a minimal example (naive implementation of SUT):
from twisted.web.client import Agent, readBody
from twisted.internet import reactor
import json


class SystemUnderTest(object):

    def __init__(self, url):
        self.url = url

    def action(self):
        d = self._makeAgent().request("GET", self.url)
        d.addCallback(self._cbSuccess)
        return d

    def _makeAgent(self):
        ''' Its own method so it can be overridden in tests '''
        return Agent(reactor)

    def _cbSuccess(self, response):
        d = readBody(response)
        d.addCallback(self._cbParse)
        return d

    def _cbParse(self, data):
        self.result = json.loads(data)
        print self.result
with the test module:
from twisted.trial import unittest
from twisted.internet import defer
from twisted.test import proto_helpers

from sut import SystemUnderTest


class Test(unittest.TestCase):

    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        d.addCallback(self._checks, s_u_t)
        return d

    def _checks(self, result, s_u_t):
        print result
        self.assertEqual({'one': 1}, s_u_t.result)


class ExtendedSystemUnderTest(SystemUnderTest):

    def _makeAgent(self):
        # valid JSON so that json.loads() in the SUT can parse it
        return FakeSuccessfulAgent('{"one": 1}')


## Getting ridiculous below here...

class FakeReason(object):

    def check(self, _):
        return False

    def __str__(self):
        return "It's my reason"


class FakeResponse(object):
    ''' Implementation of IResponse '''

    def __init__(self, content):
        self.content = content
        self.prot = proto_helpers.StringTransport()
        self.code = 200
        self.phrase = ''

    def deliverBody(self, prot):
        prot.makeConnection(self.prot)
        prot.dataReceived(self.content)
        # reason = FakeReason()
        # prot.connectionLost(reason)


class FakeSuccessfulAgent(object):
    ''' Implementation of IAgent '''

    def __init__(self, response):
        self.response = response

    def request(self, method, url):
        return defer.succeed(FakeResponse(self.response))
but testing is confusing me (it's twisted after all).
Hilarious.
class ExtendedSystemUnderTest(SystemUnderTest):
    def _makeAgent(self):
        return FakeSuccessfulAgent('{"one": 1}')
I suggest you make the agent a normal parameter instead. This is more convenient than a private method like _makeAgent. Composition is great; inheritance is meh.
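A sketch of what that could look like (the constructor default is an assumption; the callback methods stay as they are, and any way of passing the agent in works):

from twisted.web.client import Agent
from twisted.internet import reactor


class SystemUnderTest(object):

    def __init__(self, url, agent=None):
        self.url = url
        # let tests inject a fake agent; fall back to a real one otherwise
        self.agent = agent if agent is not None else Agent(reactor)

    def action(self):
        d = self.agent.request("GET", self.url)
        d.addCallback(self._cbSuccess)
        return d

The test can then build SystemUnderTest(None, agent=FakeSuccessfulAgent('{"one": 1}')) directly, with no subclass needed.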
class FakeReason(object):
...
There's no reason to make a fake of this. Just use twisted.python.failure.Failure. You don't have to fake every object in the test. Just the ones that get in your way if you don't fake them.
class FakeResponse(object):
...
This is likely good and necessary.
class FakeSuccessfulAgent(object):
...
This is most likely necessary as well. You should make it actually be more like an IAgent implementation, though: declare that it implements the interface, use zope.interface.verify.verify{Class,Object} to make sure you get the implementation right, etc. (e.g. request has the wrong signature now).
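A sketch of what that might look like (the request signature shown is the one from twisted.web.iweb.IAgent; verifyObject should flag mismatches such as the missing headers and bodyProducer arguments):

from zope.interface import implementer
from zope.interface.verify import verifyObject
from twisted.internet import defer
from twisted.web.iweb import IAgent


@implementer(IAgent)
class FakeSuccessfulAgent(object):

    def __init__(self, response):
        self.response = response

    def request(self, method, uri, headers=None, bodyProducer=None):
        # reuses the FakeResponse class from the question
        return defer.succeed(FakeResponse(self.response))


# in a test, check that the fake really satisfies the interface
verifyObject(IAgent, FakeSuccessfulAgent('{"one": 1}'))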
There's actually a ticket for adding all of these testing tools to Twisted itself - https://twistedmatrix.com/trac/ticket/4024. So I don't think you're actually confused, you're basically on the same track as the project itself. You're just suffering from the fact that Twisted hasn't already done all of this work for you.
Also, note that instead of:
class Test(unittest.TestCase):

    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        d.addCallback(self._checks, s_u_t)
        return d
You can write something like this instead (and it is preferable):
class Test(unittest.TestCase):

    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        self._checks(self.successResultOf(d), s_u_t)
This is because your fake implementation of IAgent is synchronous. You know it is synchronous. By the time request returns, the Deferred it returns already has a result. Writing the test this way means you can simplify your code a bit (i.e., you can ignore the asynchronousness to some extent, because there isn't any), and it avoids running the global reactor, which is what returning a Deferred from a test method in trial does.
I'm using nose to test a Flask app. I'm trying to use the with_setup decorator to keep my tests DRY, without having to repeat the setup in every test function, but it doesn't seem to run the @with_setup phase. As the docs state, I'm using it with test functions and not test classes. Some code:
from flask import *
from app import app
from nose.tools import eq_, assert_true
from nose import with_setup

testapp = app.test_client()


def setup():
    app.config['TESTING'] = True
    RUNNING_LOCAL = True
    RUN_FOLDER = os.path.dirname(os.path.realpath(__file__))
    fixture = {'html_hash': 'aaaa'}  # mocking the hash


def teardown():
    app.config['TESTING'] = False
    RUNNING_LOCAL = False


@with_setup(setup, teardown)
def test_scrape_wellformed_html():
    # RUN_FOLDER = os.path.dirname(os.path.realpath(__file__))  # if it is here instead of inside the setup the code works..
    # fixture = {'html_hash': 'aaaa'}  # mocking the hash; if it is here the code works
    fixture['gush_id'] = 'current.fixed'
    data = scrape_gush(fixture, RUN_FOLDER)
    # various assertions
For example, if I create the fixture dict inside the @with_setup setup function, instead of inside each specific test function (and in every one of them), I'll get a NameError (or something similar).
I guess I'm missing something, just not sure what.
Thanks for the help!
The issue is that the names RUN_FOLDER and fixture are scoped to the function setup and so will not be available to test_scrape_wellformed_html. If you look at the code for with_setup, you will see that it does not do anything to alter the test function's environment.
In order to do what you want to do you need to make your fixtures global variables:
testapp = app.test_client()
RUN_FOLDER = os.path.dirname(os.path.realpath(__file__))
fixture = None


def setup():
    global fixture
    app.config['TESTING'] = True
    fixture = {'html_hash': 'aaaa'}  # mocking the hash


def teardown():
    global fixture
    app.config['TESTING'] = False
    fixture = None


@with_setup(setup, teardown)
def test_scrape_wellformed_html():
    # run test here