I have a Django app where multiple teams upload content that will be parsed. The app keeps track of certain common information in the parsed content. The issue here is that each team has content that needs to be parsed by a different parser because the content is in a different format (e.g. some teams have XML content, some have text, some have JSON, etc...). Each of the teams has provided a parser (a python module) that grabs the necessary info that is placed into corresponding Django models after parsing.
My question is: what is the best way to architect this in Django where each team can have their own parser setup correctly? It can be purely backend, no need for a user form or anything like that. My first thought was that I would create a Parser model with ForeignKey to each team like so:
class Parser(models.Model):
team = models.ForeignKey('Team')
module_path = models.CharField(max_length=..., blank=False)
and that module_path would be something like "parsers.teamA.XMLparser" which would reside in my app code in that path like so:
parsers/
teamA/
__init__.py
XMLparser.py
teamB/
Then when my app is coming to parse the uploaded content, I would have this:
team = Team.objects.get(id=team_id)
parser = Parser.objects.get(team=team)
theParser = __import__(parser.module_path)
theParser.parse(theStuffToBeParsed)
What problems does anyone see with this? The only other option that I can think of is to create separate Django apps for each parser, but how would I reference which team uses which app in the database (same as done here?)?
The approach you're taking seems valid to me. Keep in mind that you are effectively running arbitrary python code so I would never be using something like this on a public facing site and ensure you trust the teams writing their parsers.
You might make this a bit nicer by writing a custom Field to represent the module path that will remove the need to handle the import everytime and will instead handle the import for you and return the parse method (or maybe even better a Parser object where you could tell the teams to implement an interface)
The best example might be to look at the source for django's ImageField or even CharField. Instead of having having your model have a CharField, you'd have a "ModuleField": parser = ModuleField(). The database stored value would indeed be the path to the module (so simply make it a subclass of CharField), but override the to_python method. In your new to_python method handle importing the module and return a python object.
That python object could be anything you want it to be, from your example you could return theParser.parse. This would result in if you have a Parser instance foo you could the_parser_method = foo.parser
Related
I am creating an electronic reference library for my school, where I have created a custom admin page (not the Django admin page). And in my requirements there needs to be a specific 'settings' page where I can store small information such as
How frequently to create reports
How frequently users should change their passwords etc.
These kinds of custom settings are not available in Django by default. But is there a way I can implement it in my app? And how should I store this information? Should I create another table to store these information?
Please let me know how to tackle such a situation.
Thanks!
Yes, the best way would probably be just to create another table for your settings.
That is because:
a. Django provides good tools for DB management, so it is less to worry about the infra.
b. You will usually have a DB access. You cannot say the same about other resources, depends on your server.
Just do something like this:
Then you may want to ensure somehow you only have one record, you can look here for example.
class MySettings(models.Model):
setting1 = models.IntegerField(default=10)
setting2 = models.IntegerField(default=11)
There are alternatives, of course, for example creating a simple dict containing your settings, and add some methods to serialize it to the local disc, in a well known location. Something like this:
import json
MY_SETTINGS_LOCATION = "/path/to/my/settings.json"
def load_settings() {
with open(MY_SETTINGS_LOCATION, "r") as f:
return json.load(f)
}
def save_settings(settings) {
with open(MY_SETTINGS_LOCATION, "w") as f:
return json.dump(settings, f)
}
I have created custom Django model-field subclasses based on CharField but which use to_python() to ensure that the model objects returned have more complex objects (some are lists, some are dicts with a specific format, etc.) -- I'm using MySQL so some of the PostGreSql field types are not available.
All is working great, but Pylint believes that all values in these fields will be strings and thus I get a lot of "unsupported-membership-test" and "unsubscriptable-object" warnings on code that uses these models. I can disable these individually, but I would prefer to let Pylint know that these models return certain object types. Type hints are not helping, e.g.:
class MealPrefs(models.Model):
user = ...foreign key...
prefs: dict[str, list[str]] = \
custom_fields.DictOfListsExtendsCharField(
default={'breakfast': ['cereal', 'toast'],
'lunch': ['sandwich']},
)
I know that certain built-in Django fields return correct types for Pylint (CharField, IntegerField) and certain other extensions have figured out ways of specifying their type so Pylint is happy (MultiSelectField) but digging into their code, I can't figure out where the "magic" specifying the type returned would be.
(note: this question is not related to the INPUT:type of Django form fields)
Thanks!
I had a look at this out of curiosity, and I think most of the "magic" actually comes for pytest-django.
In the Django source code, e.g. for CharField, there is nothing that could really give a type hinter the notion that this is a string. And since the class inherits only from Field, which is also the parent of other non-string fields, the knowledge needs to be encoded elsewhere.
On the other hand, digging through the source code for pylint-django, though, I found where this most likely happens:
in pylint_django.transforms.fields, several fields are hardcoded in a similar fashion:
_STR_FIELDS = ('CharField', 'SlugField', 'URLField', 'TextField', 'EmailField',
'CommaSeparatedIntegerField', 'FilePathField', 'GenericIPAddressField',
'IPAddressField', 'RegexField', 'SlugField')
Further below, a suspiciously named function apply_type_shim, adds information to the class based on the type of field it is (either 'str', 'int', 'dict', 'list', etc.)
This additional information is passed to inference_tip, which according to the astroid docs, is used to add inference info (emphasis mine):
astroid can be used as more than an AST library, it also offers some
basic support of inference, it can infer what names might mean in a
given context, it can be used to solve attributes in a highly complex
class hierarchy, etc. We call this mechanism generally inference
throughout the project.
astroid is the underlying library used by Pylint to represent Python code, so I'm pretty sure that's how the information gets passed to Pylint. If you follow what happens when you import the plugin, you'll find this interesting bit in pylint_django/.plugin, where it actually imports the transforms, effectively adding the inference tip to the AST node.
I think if you want to achieve the same with your own classes, you could either:
Directly derive from another Django model class that already has the associated type you're looking for.
Create, and register an equivalent pylint plugin, that would also use Astroid to add information to the class so that Pylint know what to do with it.
I thought initially that you use a plugin pylint-django, but maybe you explicitly use prospector that automatically installs pylint-django if it finds Django.
The checker pylint neither its plugin doesn't check the code by use information from Python type annotations (PEP 484). It can parse a code with annotations without understanding them and e.g. not to warn about "unused-import" if a name is used in annotations only. The message unsupported-membership-test is reported in a line with expression something in object_A simply if the class A() doesn't have a method __contains__. Similarly the message unsubscriptable-object is related to method __getitem__.
You can patch pylint-django for your custom fields this way:
Add a function:
def my_apply_type_shim(cls, _context=None): # noqa
if cls.name == 'MyListField':
base_nodes = scoped_nodes.builtin_lookup('list')
elif cls.name == 'MyDictField':
base_nodes = scoped_nodes.builtin_lookup('dict')
else:
return apply_type_shim(cls, _context)
base_nodes = [n for n in base_nodes[1] if not isinstance(n, nodes.ImportFrom)]
return iter([cls] + base_nodes)
into pylint_django/transforms/fields.py
and also replace apply_type_shim by my_apply_type_shim in the same file at this line:
def add_transforms(manager):
manager.register_transform(nodes.ClassDef, inference_tip(my_apply_type_shim), is_model_or_form_field)
This adds base classes list or dict respectively, with their magic methods explained above, to your custom field classes if they are used in a Model or FormView.
Notes:
I thought also about a plugin stub solution that does the same, but the alternative with "prospector" seems so complicated for SO that I prefer to simply patch the source after installation.
Classes Model or FormView are the only classes created by metaclasses, used in Django. It is a great idea to emulate a metaclass by a plugin code and to control the analysis simple attributes. If I remember, MyPy, referenced in some comment here, has also a plugin mypy-django for Django, but only for FormView, because writing annotations for django.db is more complicated than to work with attributes. - I was trying to work on it for one week.
Can anyone guide me towards the right direction as to where I should place a script solely for loading data into ndb. As I wish to upload all data into the gae ndb so that the application could perform query on it.
Right now, the loading of data is in my application. I wish to placed it separately from the main application.
Should it be edited in the yaml file?
EDITED
This is a snippet of the entity and the handler to upload the data into GAE ndb.
I wish to placed this chunk of code separately from my main application .py. Reason being the uploading of this data won't be done frequently and to keep the codes in the main application "cleaner".
class TagTrend_refine(ndb.Model):
tag = ndb.StringProperty()
trendData = ndb.BlobProperty(compressed=True)
class MigrateData(webapp2.RequestHandler):
def get(self):
listOfEntities = []
f = open("tagTrend_refine.txt")
lines = f.readlines()
f.close()
for line in lines:
temp = line.strip().split("\t")
data = TagTrend_refine(
tag = temp[0],
trendData = temp[1]
)
listOfEntities.append(data)
ndb.put_multi(listOfEntities)
For example if I placed the above code in a file called dataLoader.py, where should I call this script to invoke?
In app.yaml alongside my main application(knowledgeGraph.application)?
- url: /.*
script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledge.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
So the class TagTrend_refine should be in a separate file (usually called models.py) so that both your dataloader.py, and your knowledge.py, can import models to access it (and models.py will need its own import of ndb of course). (Then of course access to the entity class will be as models.TagTrend_refine -- very basic Python).
Next, you'll complete dataloader.py by defining a WSGI app, e.g, at end of file,
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
In app.yaml, as the first URL, before that /.*, you'll have:
url: /migratedata
script: dataloader.app
Given all this, when you visit '/migratedata', your handler will read the "tagTrend_refine.txt" file that you uploaded together with your .py, .yaml, and so on, files in your overall GAE application, and unconditionally create one entity per line of that file (assuming you fix the multiple indentation problems in your code as displayed above, but, again, this is just super-elementary Python -- presumably you've used both tabs and spaces and they show up OK in your editor, but not here on SO... I recommend you use strictly, only spaces, never tabs, in Python code).
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do and why you'd like to do something different. For POST vs GET in particular, just study When should I use GET or POST method? What's the difference between them? ...
Alex's solution will work, as long as all you data can be loaded in under 1 minute, as that's the timeout for an app engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.
I have made a django app that creates models and database tables on the fly. This is, as far as I can tell, the only viable way of doing what I need. The problem arises of how to pass a dynamically created model between pages.
I can think of a few ways of doing such but they all sound horrible. The methods I can think of are:
Use global variables within views.py. This seems like a horrible hack and likely to cause conflicts if there are multiple simultaneous users.
Pass a reference in the URL and use some eval hackery to try and refind the model. This is probably stupid as the model could potentially be garbage collected en route.
Use a place-holder app. This seems like a bad idea due to conflicts between multiple users.
Having an invisible form that posts the model when a link is clicked. Again very hacky.
Is there a good way of doing this, and if not, is one of these methods more viable than the others?
P.S. In case it helps my app receives data (as a json string) from a pre-existing database, and then caches it locally (i.e. on the webserver) creating an appropriate model and table on the fly. The idea is then to present this data and do various filtering and drill downs on it with-out placing undue strain on the main database (as each query returns a few hundred results out of a database of hundreds of millions of data points.) W.R.T. 3, the tables are named based on a hash of the query and time stamp, however a place-holder app would have a predetermined name.
Thanks,
jhoyla
EDITED TO ADD: Thanks guys, I have now solved this problem. I ended up using both answers together to give a complete answer. As I can only accept one I am going to accept the contenttypes one, sadly I don't have the reputation to give upvotes yet, however if/when I ever do I will endeavor to return and upvote appropriately.
The solution in it's totality,
from django.contrib.contenttypes.models import ContentType
view_a(request):
model = create_model(...)
request.session['model'] = ContentType.objects.get_for_model(model)
...
view_b(request):
ctmodel = request.session.get('model', None)
if not ctmodel:
return Http404
model = ctmodel.model_class()
...
My first thought would be to use content types and to pass the type/model information via the url.
You could also use Django's sessions framework, e.g.
def view_a(request):
your_model = request.session.get('your_model', None)
if type(your_model) == YourModel
your_model.name = 'something_else'
request.session['your_model'] = your_model
...
def view_b(request):
your_model = request.session.get('your_model', None)
...
You can store almost anything in the session dictionary, and managing it is also easy:
del request.session['your_model']
I am working on a Django app and I have a class which reads the contents of a file and returns a Django model. My question is where do I store this class in the file system? All this does is reads the file, populates a Django model and returns it.
Thanks
There is nothing special about a Django application: it's just a Python package. Technically you can put the class anywhere you can import.
With that being said, it's best to keep related code bundled together. It sounds like a good place for this particular class is in the file that declares the Model it returns.
On the other hand it might be logical to throw it into the application's __init__.py file.
You could also make a utils, etc, admin, scripts . . . folder/package to put utility classes and scripts if it's meant to be used for administration and site maintenance.
In the end it's more about how you want to organize your project, but technically it can live just about anywhere.