One of my methods in a project I'm working on looks like this:
from django.core.cache import cache
from app import models


def _get_active_children(parent_id, timestamp):
    # Read and write must use the same key, otherwise the cache never hits.
    cache_key = f"active_children_{parent_id}"
    children = cache.get(cache_key)
    if children is None:
        children = models.Children.objects.filter(parent_id=parent_id).active(
            dt=timestamp
        )
        cache.set(cache_key, children, 60 * 10)  # cache for 10 minutes
    return children
The issue is that I don't want caching to occur when this method is called from the command line (it's inside a task). So I'm wondering whether there's a way to disable caching of this form.
Ideally I want to use a context manager so that any cache calls inside the context are ignored (or routed to a DummyCache/LocMem cache, which wouldn't affect my main Redis cache).
I've considered passing skip_cache=True through the methods, but this is pretty brittle and I'm sure there's a more elegant solution. Additionally, I've tried using mock.patch, but I'm not sure this works outside of test classes.
My ideal solution would look something like:
def task():
    ...
    _get_active_children(parent_id, timestamp)

with no_cache:
    task()
I have a solution (but I think there's a better one out there):
from importlib import import_module
from unittest.mock import patch

from django.core.cache.backends.dummy import DummyCache


def no_cache(module_str, cache_object_str='cache'):
    """Example usage: with no_cache('app.tasks', 'cache'): ..."""
    # import_module loads the module itself; import_string expects a dotted
    # path to an attribute and only finds a submodule that is already imported.
    module_ = import_module(module_str)
    return patch.object(module_, cache_object_str, DummyCache('mock', {}))
Inspired by this.
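On the doubt about mock.patch outside test classes: patch.object returns an ordinary context manager, so it can be used in a task or management command as well. Below is a minimal sketch that runs without Django. FakeCache, NullCache, the tasks namespace and get_children are hypothetical stand-ins for the Redis-backed cache, Django's DummyCache, the app.tasks module and the method from the question.

```python
from types import SimpleNamespace
from unittest.mock import patch


class FakeCache:
    """Hypothetical stand-in for the real (e.g. Redis-backed) cache."""

    def __init__(self):
        self.store = {}

    def get(self, key, default=None):
        return self.store.get(key, default)

    def set(self, key, value, timeout=None):
        self.store[key] = value


class NullCache:
    """Hypothetical stand-in for DummyCache: accepts calls, stores nothing."""

    def get(self, key, default=None):
        return default

    def set(self, key, value, timeout=None):
        pass


# Stand-in for a module such as app.tasks that holds a module-level `cache`.
tasks = SimpleNamespace(cache=FakeCache())


def get_children(parent_id):
    key = f"active_children_{parent_id}"
    children = tasks.cache.get(key)
    if children is None:
        children = [f"child-of-{parent_id}"]  # pretend database query
        tasks.cache.set(key, children, 60 * 10)
    return children


def no_cache(namespace, name="cache"):
    # patch.object is a plain context manager; it restores the attribute on exit.
    return patch.object(namespace, name, NullCache())


real_cache = tasks.cache
with no_cache(tasks):
    get_children(1)  # runs the "query" but writes nothing
assert real_cache.store == {}

get_children(1)  # outside the context the real cache is used again
assert "active_children_1" in real_cache.store
```

One caveat with this approach: only code that looks up cache through the patched module at call time sees the replacement. Another module that did its own import of cache keeps the real backend.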
I'm trying to add a custom Plotly graphic to a Wagtail homepage.
I got this far: I'm overriding the Wagtail Page model by altering the context returned to the template. Am I doing this the right way, and is this possible in models.py?
Thanks in advance.
from django.db import models
from wagtail.models import Page
from wagtail.fields import RichTextField
from wagtail.admin.panels import FieldPanel
import psycopg2
from psycopg2 import sql
import pandas as pd
import plotly.graph_objs as go
from plotly.offline import plot


class CasPage(Page):
    body = RichTextField(blank=True)
    content_panels = Page.content_panels + [
        FieldPanel('body'),
    ]

    def get_connection(self):
        try:
            return psycopg2.connect(
                database="xxxx",
                user="xxxx",
                password="xxxx",
                host="xxxxxxxxxxxxx",
                port=xxxxx,
            )
        except:
            return False

    conn = get_connection()
    cursor = conn.cursor()
    strquery = (f'''SELECT t.datum, t.grwaarde - LAG(t.grwaarde,1) OVER (ORDER BY datum) AS
    gebruiktgas
    FROM XXX
    ''')
    data = pd.read_sql(strquery, conn)
    fig1 = go.Figure(
        data = data,
        layout=go.Layout(
            title="Gas-verbruik",
            yaxis_title="aantal M3")
    )
    output = plotly.plot(fig1, output_type='div', include_plotlyjs=False)

    # https://stackoverflow.com/questions/32626815/wagtail-views-extra-context
    def get_context(self, request):
        context = super(CasPage, self).get_context(request)
        context['output'] = output
        return context
Kind of the right track. You should move all the plot code into its own method though. At the moment, it runs the plot code when the site initialises then stays stored in memory.
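To make the "runs once, then stays in memory" point concrete, here is a small framework-free sketch. build_plot, EagerPage and LazyPage are hypothetical stand-ins for the query-plus-Plotly code and the page class. Statements in a class body execute once at import time, while a method or property runs on every access.

```python
import itertools

_calls = itertools.count(1)


def build_plot():
    """Hypothetical stand-in for the query + Plotly rendering."""
    return f"<div>plot #{next(_calls)}</div>"


class EagerPage:
    # Class-body statements execute once, when the module is imported.
    output = build_plot()


class LazyPage:
    @property
    def plot(self):
        # Runs on every access, so each request sees fresh data.
        return build_plot()


eager_a, eager_b = EagerPage(), EagerPage()
lazy = LazyPage()

print(eager_a.output == eager_b.output)  # True: same stored string
print(lazy.plot != lazy.plot)            # True: rebuilt on each access
```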
There are three usual ways to get the plot onto the rendered page:
1. As you've done, with context
2. As a property or method of the page class
3. As a template tag called from the template
The first two have more or less the same effect, except that the second makes the property available anywhere, not just in the template. The context method runs before the page starts rendering; the other two run during rendering. The only real difference is with template caching: the context code runs every time the page is loaded, whereas the other two only run when the cache is invalid or when the code sits outside the cached fragment (for fragment caching).
To call the plot as a property of your page class, you'd just pull the code out into a def with the @property decorator:
class CasPage(Page):
    ....

    @property
    def plot(self):
        try:
            conn = psycopg2.connect(
                database="xxxx",
                user="xxxx",
                password="xxxx",
                host="xxxxxxxxxxxxx",
                port=xxxxx,
            )
            cursor = conn.cursor()
            strquery = (f'''SELECT t.datum, t.grwaarde - LAG(t.grwaarde,1) OVER (ORDER BY datum) AS
                gebruiktgas FROM XXX''')
            data = pd.read_sql(strquery, conn)
            fig1 = go.Figure(
                # go.Figure expects traces, not a DataFrame; a line trace is assumed here
                data=[go.Scatter(x=data["datum"], y=data["gebruiktgas"])],
                layout=go.Layout(
                    title="Gas-verbruik",
                    yaxis_title="aantal M3")
            )
            # plot is the function imported from plotly.offline
            return plot(fig1, output_type='div', include_plotlyjs=False)
        except Exception as e:
            print(f"{type(e).__name__} at line {e.__traceback__.tb_lineno} of {__file__}: {e}")
            return None
^ I haven't tried this code ... it should work as is, but no guarantees I didn't make a typo ;)
Now you can access your plot with {{ self.plot }} in the template.
If you want to stick with context, then you'd stay with the def above but just amend your output line to
context['output'] = self.plot
Template tags are more useful when they're being used in StructBlocks and not part of a page class like this, or where you have code that you want to re-use in multiple templates.
Then you'd move all that plot code into a template tag file, register it and call it in the template with {% plot %}. Wagtail template tags work the same as Django: https://docs.djangoproject.com/en/4.1/howto/custom-template-tags/
Is the plot data outside of the site database? If not, you could probably get the data via the ORM if it was defined as a model. If so, it's probably worth writing a view (or stored procedure if you want to pass parameters) on the db server and calling that rather than hard coding the SQL into your python.
The other consideration is the page load time - if the dataset is big, this could take a while and prevent the page from loading. You'd probably want a front-end solution in that case.
Abstract of the question:
I am trying to extend (not replace) a method of one of Django's built-in classes from my custom module, in a way that lets other modules extend the already-extended (overridden) method further.
Let's say I have two custom modules, mod1 and mod2, and both want to override the same method of some Django class, say get_app_list of AdminSite. In mod1 I want to add a line that says hello; in mod2 it should say hi.
Desired output:
Nothing should be said if neither module is installed,
it should say hello if mod1 is installed and hi if mod2 is installed,
and both hello and hi if both are installed.
Question with real example:
Just as an example, I need to modify the implementation of AdminSite.get_app_list as follows:
From
app['models'].sort(key=lambda x: x['name'])
To
app['models'] = sort_with_name_length(app['models'])  # my own method
Expected/desired approach: I assumed this would be achievable by just writing the following code in the models.py or sites.py file of any of my custom modules:
class MyAdminSite(AdminSite):
    def get_app_list(self, request):
        app_list = super().get_app_list(request)
        for app in app_list:
            # app['models'].sort(key=lambda x: x['name'])
            app['models'] = sort_with_name_length(app['models'])  # an example change I need
        return app_list
But the code above does nothing; it is never executed unless I use monkey patching, as guided by this answer.
What I could achieve
from django.contrib.admin import AdminSite


class MyAdminSite(AdminSite):
    def get_app_list(self, request):
        # res = super(MyAdminSite, self).get_app_list(request)  # gives the following error:
        # super(type, obj): obj must be an instance or subtype of type
        # So I have to rewrite the complete method in my module, like this:
        app_dict = self._build_app_dict(request)
        app_list = sorted(app_dict.values(), key=lambda x: x['name'].lower())
        for app in app_list:
            # app['models'].sort(key=lambda x: x['name'])
            app['models'] = sort_with_name_length(app['models'])
        return app_list


AdminSite.get_app_list = MyAdminSite.get_app_list
Problem being faced: the above does what I need, but in a totally undesired way. This solution has two problems:
It will not allow multilevel inheritance (I would not be able to have a child and a grandchild).
It is not actually overriding at all; it just replaces the implementation, since using super raises an error.
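The super error can be reproduced without Django. In this sketch Base and Child are hypothetical stand-ins for AdminSite and MyAdminSite. The function copied onto the parent still has its super() bound to Child, but it is now called with a plain Base instance, which super() rejects.

```python
class Base:
    def greet(self):
        return ["base"]


class Child(Base):
    def greet(self):
        # super() here is bound to Child, so `self` must be a Child instance.
        return super().greet() + ["child"]


# Subclassing alone changes nothing for plain Base instances.
print(Base().greet())   # ['base']
print(Child().greet())  # ['base', 'child']

# Copying the function onto Base makes Base instances run Child's code,
# and super() then rejects `self` because it is not a Child.
Base.greet = Child.greet
try:
    Base().greet()
except TypeError as exc:
    print("TypeError:", exc)
```

This is also why the first attempt appeared to do nothing: defining a subclass does not affect code that keeps instantiating the parent class.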
Just for elaboration, the following is an example of the same kind of overriding in Odoo, which is what I want to achieve with Django.
Odoo offers exactly the expected/desired behavior.
You can see the get_auth_signup_qcontext method of auth_oauth's main controller:
https://github.com/odoo/odoo/blob/14.0/addons/auth_oauth/controllers/main.py
https://github.com/odoo/odoo/blob/14.0/addons/auth_signup/controllers/main.py
What it does is this: if the auth_oauth module is installed, then wherever we call get_auth_signup_qcontext, the call first goes to the child (auth_oauth) get_auth_signup_qcontext method, which calls super inside it. But if auth_oauth is not installed, a call to get_auth_signup_qcontext directly hits auth_signup's method.
Method 1. Use your own admin site's urls
One can simply use their own admin site's urls (reference):
from django.contrib.admin import AdminSite


class MyAdminSite(AdminSite):
    def get_app_list(self, request):
        app_list = super().get_app_list(request)
        for app in app_list:
            # app['models'].sort(key=lambda x: x['name'])
            app['models'] = sort_with_name_length(app['models'])  # an example change I need
        return app_list


admin_site = MyAdminSite(name='myadmin')
admin_site.register(MyModel)  # registering models
Then in your urls:
from django.urls import path
from myapp.admin import admin_site

urlpatterns = [
    path('myadmin/', admin_site.urls),
]
Method 2. Overriding the default admin site
You can override the admin site:
In apps.py:
from django.contrib.admin.apps import AdminConfig


class MyAdminConfig(AdminConfig):
    default_site = 'myproject.admin.MyAdminSite'
In settings.py:
INSTALLED_APPS = [
    ...
    'myproject.apps.MyAdminConfig',  # replaces 'django.contrib.admin'
    ...
]
A rather awkward and hacky, but completely independent and loosely coupled, way to really override/extend (not replace) a method of one of Django's built-in classes is the following.
Caution: read the complete answer before using the code :)
Add the following code to the models.py of any of your custom (installed) apps:
from django.contrib.admin import AdminSite

original_get_app_list = AdminSite.get_app_list


class AdminSiteExtension1(AdminSite):
    def get_app_list(self, request):
        # do something here to manipulate things before calling the parent
        # The following is the solution line
        app_list = original_get_app_list(self, request)
        for app in app_list:
            app['name'] = app['name'] + '_my_app'
        return app_list


AdminSite.get_app_list = AdminSiteExtension1.get_app_list
AdminSite is subclassed only so that self can be used.
It is really loosely coupled, as you need to do nothing anywhere else. You can even extend it further in any other module, and you will not need AdminSiteExtension1 in that module, because the changes to get_app_list live on the original AdminSite for as long as the app containing AdminSiteExtension1 is installed.
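The further-extension claim can be checked with a small sketch that needs no Django. Site is a hypothetical stand-in for AdminSite, and install_mod1 / install_mod2 stand in for the module-level patching code in two separate apps. Each one captures whatever method is installed at that moment and wraps it, so the extensions stack in load order.

```python
class Site:
    """Hypothetical stand-in for django.contrib.admin.AdminSite."""

    def get_app_list(self, request):
        return []


def install_mod1():
    original = Site.get_app_list  # whatever is installed right now

    def get_app_list(self, request):
        return original(self, request) + ["hello"]

    Site.get_app_list = get_app_list


def install_mod2():
    original = Site.get_app_list

    def get_app_list(self, request):
        return original(self, request) + ["hi"]

    Site.get_app_list = get_app_list


print(Site().get_app_list(None))  # []
install_mod1()
print(Site().get_app_list(None))  # ['hello']
install_mod2()
print(Site().get_app_list(None))  # ['hello', 'hi']
```

One thing to watch for: if the patching code runs twice (for example because the module is reloaded), the method gets wrapped twice and the extension is applied twice.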
Disclosure: I am not a very experienced programmer, so I cannot think of a situation where this solution would cause a problem. If someone can point one out, that would be welcome; otherwise, anyone using it has to take care themselves.
I'm successfully scraping a page that returns a single item. I don't want to save the scraped item either to the database or to a file; I need to get it inside a Django view.
My view is as follows:
def start_crawl(process_number, court):
    """
    Starts the crawler.

    Args:
        process_number (str): Process number to be found.
        court (str): Court of the process.
    """
    runner = CrawlerRunner(get_project_settings())
    results = list()

    def crawler_results(sender, parse_result, **kwargs):
        results.append(parse_result)

    dispatcher.connect(crawler_results, signal=signals.item_passed)
    process_info = runner.crawl(MySpider, process_number=process_number, court=court)
    return results
I followed this solution, but the results list is always empty.
I read something about creating a custom middleware and getting the results in the process_spider_output method.
How can I get the desired result?
Thanks!
I managed to implement something like that in one of my projects. It is a mini-project and I was looking for a quick solution. You might need to modify it, or add multi-threading support etc., if you put it in a production environment.
Overview
I created an ItemPipeline that just adds the items to an InMemoryItemStore helper. Then, in my __main__ code, I wait for the crawler to finish and pop all the items out of the InMemoryItemStore. Then I can manipulate the items as I wish.
Code
items_store.py
Hacky in-memory store. It is not very elegant but it got the job done for me. Modify and improve if you wish. I've implemented that as a simple class object so I can simply import it anywhere in the project and use it without passing its instance around.
class InMemoryItemStore(object):
    __ITEM_STORE = None

    @classmethod
    def pop_items(cls):
        items = cls.__ITEM_STORE or []
        cls.__ITEM_STORE = None
        return items

    @classmethod
    def add_item(cls, item):
        if not cls.__ITEM_STORE:
            cls.__ITEM_STORE = []
        cls.__ITEM_STORE.append(item)
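Since multi-threading support was mentioned as something a production setup might need, here is one possible lock-protected variant, as a sketch only. ThreadSafeItemStore is a hypothetical name and not part of Scrapy; the lock only matters if items are added and popped from different threads.

```python
import threading


class ThreadSafeItemStore:
    """Hypothetical lock-protected variant of the in-memory store."""

    _lock = threading.Lock()
    _items = []

    @classmethod
    def add_item(cls, item):
        with cls._lock:
            cls._items.append(item)

    @classmethod
    def pop_items(cls):
        # Swap the list out under the lock so no item is lost or returned twice.
        with cls._lock:
            items, cls._items = cls._items, []
        return items


def worker(n):
    for i in range(100):
        ThreadSafeItemStore.add_item((n, i))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = ThreadSafeItemStore.pop_items()
print(len(items))                       # 400
print(ThreadSafeItemStore.pop_items())  # []
```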
pipelines.py
This pipeline will store the objects in the in-memory store from the snippet above. All items are simply returned to keep the regular pipeline flow intact. If you don't want to pass some items down to the other pipelines, simply change process_item so that it doesn't return all items.
from <your-project>.items_store import InMemoryItemStore


class StoreInMemoryPipeline(object):
    """Add items to the in-memory item store."""

    def process_item(self, item, spider):
        InMemoryItemStore.add_item(item)
        return item
settings.py
Now add the StoreInMemoryPipeline in the scraper settings. If you change the process_item method above, make sure you set the proper priority here (changing the 100 down here).
ITEM_PIPELINES = {
    ...
    '<your-project-name>.pipelines.StoreInMemoryPipeline': 100,
    ...
}
main.py
This is where I tie all these things together. I clean the in-memory store, run the crawler, and fetch all the items.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from <your-project>.items_store import InMemoryItemStore
from <your-project>.spiders.your_spider import YourSpider


def get_crawler_items(**kwargs):
    InMemoryItemStore.pop_items()
    process = CrawlerProcess(get_project_settings())
    process.crawl(YourSpider, **kwargs)
    process.start()  # the script will block here until the crawling is finished
    process.stop()
    return InMemoryItemStore.pop_items()


if __name__ == "__main__":
    items = get_crawler_items()
If you really want to collect all the data in a "special" object, store the data in a separate pipeline like https://doc.scrapy.org/en/latest/topics/item-pipeline.html#duplicates-filter and, in close_spider (https://doc.scrapy.org/en/latest/topics/item-pipeline.html?highlight=close_spider#close_spider), open your Django object.
How to work with Django models inside Airflow tasks?
According to the official Airflow documentation, Airflow provides hooks for interacting with databases (such as MySqlHook, PostgresHook, etc.) that can later be used in operators for raw query execution. The core code fragments are attached below:
Copy from https://airflow.apache.org/_modules/mysql_hook.html
class MySqlHook(DbApiHook):
    conn_name_attr = 'mysql_conn_id'
    default_conn_name = 'mysql_default'
    supports_autocommit = True

    def get_conn(self):
        """
        Returns a mysql connection object
        """
        conn = self.get_connection(self.mysql_conn_id)
        conn_config = {
            "user": conn.login,
            "passwd": conn.password or ''
        }
        conn_config["host"] = conn.host or 'localhost'
        conn_config["db"] = conn.schema or ''
        conn = MySQLdb.connect(**conn_config)
        return conn
Copy from https://airflow.apache.org/_modules/mysql_operator.html
class MySqlOperator(BaseOperator):
    @apply_defaults
    def __init__(
            self, sql, mysql_conn_id='mysql_default', parameters=None,
            autocommit=False, *args, **kwargs):
        super(MySqlOperator, self).__init__(*args, **kwargs)
        self.mysql_conn_id = mysql_conn_id
        self.sql = sql
        self.autocommit = autocommit
        self.parameters = parameters

    def execute(self, context):
        logging.info('Executing: ' + str(self.sql))
        hook = MySqlHook(mysql_conn_id=self.mysql_conn_id)
        hook.run(
            self.sql,
            autocommit=self.autocommit,
            parameters=self.parameters)
As we can see, the hook encapsulates the connection configuration, while the operator provides the ability to execute custom queries.
The problem:
It's very convenient to use an ORM for fetching and processing database objects instead of raw SQL, for the following reasons:
In straightforward cases, an ORM can be a much more convenient solution; see ORM definitions.
Assume there are already established systems, like Django, with defined models and their methods. Every time the schemas of these models change, the raw SQL queries in Airflow need to be rewritten. An ORM provides a unified interface for working with such models.
For some reason, there are no examples of working with an ORM in Airflow tasks in terms of hooks and operators. According to the question "Using Django database layer outside of Django?", you need to set up a connection configuration for the database and then simply execute queries through the ORM, but doing that outside the appropriate hooks/operators breaks Airflow principles. It's like calling BashOperator with a "python work_with_django_models.py" command.
Finally, we want this:
So what are the best practices in this case? Are there any shared hooks/operators for the Django ORM or other ORMs? The goal is to make the following code real (treat it as pseudo-code!):
import os
import django

os.environ.setdefault(
    "DJANGO_SETTINGS_MODULE",
    "myapp.settings"
)
django.setup()

from your_app import models


def get_and_modify_models(ds, **kwargs):
    all_objects = models.MyModel.objects.filter(my_str_field='abc')
    # Fetch the row once: indexing an unevaluated queryset twice returns two
    # separate instances, so the change would be lost on save().
    obj = all_objects[15]
    obj.my_int_field = 25
    obj.save()
    return list(all_objects)


django_op = DjangoOperator(task_id='get_and_modify_models', owner='airflow')
instead of implementing this functionality in raw SQL.
I think it's a pretty important topic, as a whole bunch of ORM-based frameworks and processes cannot move into Airflow because of this.
Thanks in advance!
I agree we should continue to have this discussion, as having access to the Django ORM can significantly reduce the complexity of solutions.
My approach has been to 1) create a DjangoOperator
import os, sys
from airflow.models import BaseOperator


def setup_django_for_airflow():
    # Add Django project root to path
    sys.path.append('./project_root/')
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")
    import django
    django.setup()


class DjangoOperator(BaseOperator):
    def pre_execute(self, *args, **kwargs):
        setup_django_for_airflow()
and 2) extend that DjangoOperator for logic/operators that would benefit from having access to the ORM
from .base import DjangoOperator


class DjangoExampleOperator(DjangoOperator):
    def execute(self, context):
        from myApp.models import model
        model.objects.get_or_create()
With this strategy, you can then distinguish between operators that use raw SQL and those that use the ORM. Also note that, for the Django operator, all Django model imports need to be within the execution context, as demonstrated above.
Here's my view (simplified):
@login_required(login_url='/try_again')
def change_bar(request):
    foo_id = request.POST['fid']
    bar_id = request.POST['bid']
    foo = models.Foo.objects.get(id=foo_id)
    if foo.value > 42:
        bar = models.Bar.objects.get(id=bar_id)
        bar.value = foo.value
        bar.save()
    return other_view(request)
Now I'd like to check if this view works properly (in this simplified model, if Bar instance changes value when it should). How do I go about it?
I'm going to assume you mean automated testing rather than just checking that the post request seems to work. If you do mean the latter, just check by executing the request and checking the values of the relevant Foo and Bar in a shell or in the admin.
The best way to go about sending POST requests is using a Client. Assuming the name of the view is my_view:
from django.test import Client
from django.urls import reverse
c = Client()
c.post(reverse('my_view'), data={'fid':43, 'bid':20})
But you still need some initial data in the database, and you need to check if the changes you expected to be made got made. This is where you could use a TestCase:
from django.test import TestCase, Client
from django.urls import reverse


class FooBarTestCase(TestCase):
    def setUp(self):
        # create some foo and bar data, using Foo.objects.create etc.
        # this runs before each test - the database is rolled back in between tests
        # the view is login_required, so also log the test client in here
        # (e.g. self.client.force_login(user))
        pass

    def test_bar_not_changed(self):
        # write a post request which you expect not to change the value
        # of a bar instance, then check that the values didn't change
        self.assertEqual(bar.value, old_bar.value)

    def test_bar_changes(self):
        # write a post request which you expect to change the value of
        # a bar instance, then assert that it changed as expected
        self.assertEqual(foo.value, bar.value)
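To show the shape of the two tests with the placeholders filled in, here is a framework-free sketch that runs with the standard library only. SimpleNamespace objects stand in for Foo and Bar rows, and apply_change is a hypothetical extraction of the view's rule; in a real project each test would post through the test client and reload bar from the database instead.

```python
import unittest
from types import SimpleNamespace


def apply_change(foo, bar):
    """Hypothetical extraction of the view's core rule."""
    if foo.value > 42:
        bar.value = foo.value


class ChangeBarRuleTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so each test starts from the same data.
        self.bar = SimpleNamespace(value=1)

    def test_bar_not_changed(self):
        # 42 is the boundary: the rule only fires for values above it.
        apply_change(SimpleNamespace(value=42), self.bar)
        self.assertEqual(self.bar.value, 1)

    def test_bar_changes(self):
        apply_change(SimpleNamespace(value=43), self.bar)
        self.assertEqual(self.bar.value, 43)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChangeBarRuleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Testing the boundary value (42) alongside a value just above it is a cheap way to catch an accidental >= in the condition.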
A library I find useful for making it easier to set up test data is FactoryBoy. It reduces the boilerplate when it comes to creating new instances of Foo or Bar for testing purposes. Another option is to write fixtures, but I find that less flexible if your models change.
I'd also recommend this book if you want to know more about testing in python. It's django-oriented, but the principles apply to other frameworks and contexts.
edit: added advice about factoryboy and link to book
You can try putting print statements in the code to see whether the correct value is saved. Also, for the update, instead of querying with get and then saving (bar.save()), you can use the filter and update methods.
@login_required(login_url='/try_again')
def change_bar(request):
    foo_id = request.POST['fid']
    bar_id = request.POST['bid']
    foo = models.Foo.objects.get(id=foo_id)
    if foo.value > 42:
        models.Bar.objects.filter(id=bar_id).update(value=foo.value)
        # bar.value = foo.value
        # bar.save()
    return other_view(request)