Restrict FilePathField to only existing files - django

FilePathField is described in Django's doc as:
A CharField whose choices are limited to the filenames in a certain directory on the filesystem.
Then I assume it checks if the existence of the file. But actually it doesn't:
class Unit(models.Model):
path = FilePathField(path="/home/jason/")
In IPython shell:
unit = Unit(path="non_exist_file_name")
unit.save()
No exception raised. So I have to check os.path.isfile myself or I am not using FilePathField correctly for my need (restrict only to existing files when creating a Unit)?

If I'm not mistaken, the validation for the FilePathField is not run if you don't go trough a Form. Try to call unit.clean_fields() before saving.

Related

Delete Django ImageField and FileField records whose files are not found on the server

I've recently lost some files in my media folder, and I want to delete the image field/FileField objects whose files have been removed, across all models of my application.
I've tried django-cleanup, but it appears to be doing the inverse operation, i.e. deleting files on the server whose objects have been removed from the database.
You can write a management command for this, here is a way how to handle this
Class cleanup(BaseCommand):
def handle(self,options):
for obj in Files.objects.all():
if os.path.exists(settings.MEDIA_DIR+ obj.filename): continue
obj.delete()
Note that a FileField will be falsy if the file is not there, so you can use this simple solution:
for instance in ModelWithFileField.objects.all():
if bool(instance.file_field):
continue
instance.delete()
You can even do it in a django shell, so you do not have to write a script for it.

Django: Save ContentFile (or some kind of virtual file) to database

In my django app I create a string which I have to save to my database as a File.
If i understand correctly, django is supposed to automatically upload the file when I save the entry:
class Foo(models.Model):
bar1=models.ForeignKey(Table1)
bar2=models.ForeignKey(Table2)
res=models.FileField(upload_to='MYPATH')
The problem is that to create an instance of Foo, I have to first create a physical file on the server's disk which would mean two copies would be created (one by me in order to create a database entry, one by django when saving the entry).
As far as I can see, I must create the file myself in 'MYPATH' and instead of using a FileField, I have to save a reference in the database (essentially what django is doing ????). However I have doubts that this is the best method as
It doesn't strike me as Pythonesque.
I won't have access to the same methods as when using a real FileField. For instance when calling it, I won't have a FieldFile but just a reference string.
Basically, what I wanted to do was: String --> ContentFile (or some form of "virtual" file) --> Physical File handled by Django when saving entry in the database.
entry = Foo(bar1=var1, bar2=var2, res=ContentFile(XMLSTRING))
entry.save()
This doesn't work, but it shows what I want to achieve.
So, please show me one of the following three:
How to save a file to the database without physically creating it (using a ContentFile doesn't create a physical file after saving the entry which is what I want to do)
Make django not upload the given file but use the already uploaded version whilst maintaining all the methods provided by FileField
What I don't understand.
I apologize for [my english, my lack of understanding, the lack of clarity]
Anything you need to know, I'd happy to specify.
EDIT: I looked at this thread, but there, the urlretrieve creates a temp file, which is something I don't really want to do. Maybe I should be doing that, but is there a better way?

Use validation to prevent duplicate file _name_ being uploaded

How can I detect that the name of a file that a user has provided for upload (via a django.forms.ModelForm using a FileField field) is a duplicate of one that exists, and thus decide to fail validation on the form?
I'm finding this particularly challenging, because from within the form, I don't see how I can find out what the value of upload_to is for this FileField, so I can't go looking myself in the file system to see if that file is there already.
As i see it you have 2 options:
Set a value in your settings.py to hold your 'upload_to' and then use it to check when you are validating.
Something like this to verify would work (you need to change your upload_to ofc):
from django.conf import settings
if settings.UPLOAD_TO:
# Do something
Issue with that is that you can't have subfolders or anything complex there.
A second option would be, as mentioned in your comments, to add a new column to your model that holds a hash for your file. This approach should work better. As someone mentioned in your comments, to avoid uploading a big file, checking, failing, uploading another big file, etc, you can try to hash it in the client and verify it via ajax first (you will verify it again in the server, but this can make things go faster for your users).
Older question, but Django 1.11 now supports the unique option on FileField. Set unique=True on your field declaration on your model.
It shouldn't matter what you are setting upload_to to. The file name will still be stored in the database.
Changed in Django 1.11:
In older versions, unique=True can’t be used on FileField.
https://docs.djangoproject.com/en/1.11/ref/models/fields/#unique

In Django, how can I tell my project to read/write in distinct log directories at runtime?

I am working on a django project that, along with having a database for its models and relations, writes to a log directory called activity_logs outside of the project directory to keep track of formatted user activities, one file for each user. This is an alternative file-structure-based solution to having a database table carry this information along, because this offloads some storage from the DB and is relatively easy to format and express such activities. Perhaps some of you may recommend storing this kind of data in the database, which is fine, but I still believe there is question from all of this that I need help answering.
This django project has multiple apps that have an extensive test suite, one for each app. Additionally, there is a logging.py file that encapsulates the logging functionality (writing/reading activities to/from log files), and so both the test cases within the test suites as well as the view functions (and various other utility functions) all utilize these logging functions in order to store these user activities and retrieve them based on model relationships to emulate a user notification system. Since the logging module takes care of this logging, it needs to know where to write to, and so we have a directory structure called activity_logs to which it writes user log files, creating one for a new user and deleting one for a user removed from the database. One of the newest changes we would like to make in this project is to create a separate logging directory for testing this logging functionality, something like test_activity_logs, so that it would never be confused when writing to the test directory for test users or the regular activity log directory for real users.
My problem is this: at runtime, how can I tell the system, at whichever startpoint of execution (whether it be from a view function call through the django test Client object, a test case, an actual HTTPrequest made via a URL, etc.), when to look inside the activity_logs or the test_activity_logs directory? It solely depends on whether I am generating new information for a real user or a test user, but a User is a User in our system, and I'm facing some trouble trying to tell these functions that call some logging functions to write to the test log directory vs. the regular one. For example, one approach I am trying is to pass a keyword argument (kwarg) to the logging functions so that they can be made aware of which directory to read/write to/from, like so:
self.assertTrue(activity_has_been_logged(ACTIVITY_ACCOUNT_CREATED, user.get_profile(), use_test_activity_log_directory=True) == True)
the kwarg called use_test_activity_log_directory=True will tell the logging function called activity_has_been_logged to read the test activity log directory. Unfortunately, apart from being a little inflexible (but tolerable), this doesn't solve the situation where the django test client object sends a GET or POST request via a URL to a view function that writes activities to log files:
response = client.post(propose_match_url, post) #Can't write to test_log_directory if by default it writes to regular directory!
How do I let the client pass on this kwarg to those view functions? I think that it should totally be possible to do this, but I'm not sure if fiddling with these kwargs is the best way, or maybe create a global variable in the project settings file, but maybe that might cause some trouble with race conditions with a shared mutable variable.
Your help would be great. Thanks in advance!
So I just solved this problem. The logging file hosting all logging functionality is really the only place that needs to know where to look (either test_activity_logs or activity_logs), since all other components will invoke functions from the logging module to write/read to/from these directories. I gave an additional field to the model class of the UserProfile class called is_test that is a boolean field to determine whether to look in the test_activity_logs if is_test=True, or activity_logs if is_test=False. That way, the logging module needs only to check the input parameter of type UserProfile and its new field to determine where to perform its logging functionalities. Problem solved!
Check out daemontools if you're on a *nix box or launchd on OS X. Both can make sure your Django instance stays running in whatever mode you prefer (daemontools has a few more options for that) and can isolate a directory for logging stdout/stderr.
You can set environment variables for each instance to help other log files and temporary files know where to be created, which you then get from os.environment or simply use the current working directory as a base if using daemontools.
The directory is automatically created for you using daemontools.

Django: Model-dependent Apps to Choose From Based on User

I have a Django app where multiple teams upload content that will be parsed. The app keeps track of certain common information in the parsed content. The issue here is that each team has content that needs to be parsed by a different parser because the content is in a different format (e.g. some teams have XML content, some have text, some have JSON, etc...). Each of the teams has provided a parser (a python module) that grabs the necessary info that is placed into corresponding Django models after parsing.
My question is: what is the best way to architect this in Django where each team can have their own parser setup correctly? It can be purely backend, no need for a user form or anything like that. My first thought was that I would create a Parser model with ForeignKey to each team like so:
class Parser(models.Model):
team = models.ForeignKey('Team')
module_path = models.CharField(max_length=..., blank=False)
and that module_path would be something like "parsers.teamA.XMLparser" which would reside in my app code in that path like so:
parsers/
teamA/
__init__.py
XMLparser.py
teamB/
Then when my app is coming to parse the uploaded content, I would have this:
team = Team.objects.get(id=team_id)
parser = Parser.objects.get(team=team)
theParser = __import__(parser.module_path)
theParser.parse(theStuffToBeParsed)
What problems does anyone see with this? The only other option that I can think of is to create separate Django apps for each parser, but how would I reference which team uses which app in the database (same as done here?)?
The approach you're taking seems valid to me. Keep in mind that you are effectively running arbitrary python code so I would never be using something like this on a public facing site and ensure you trust the teams writing their parsers.
You might make this a bit nicer by writing a custom Field to represent the module path that will remove the need to handle the import everytime and will instead handle the import for you and return the parse method (or maybe even better a Parser object where you could tell the teams to implement an interface)
The best example might be to look at the source for django's ImageField or even CharField. Instead of having having your model have a CharField, you'd have a "ModuleField": parser = ModuleField(). The database stored value would indeed be the path to the module (so simply make it a subclass of CharField), but override the to_python method. In your new to_python method handle importing the module and return a python object.
That python object could be anything you want it to be, from your example you could return theParser.parse. This would result in if you have a Parser instance foo you could the_parser_method = foo.parser