Use validation to prevent duplicate file _name_ being uploaded - django

How can I detect that the name of a file that a user has provided for upload (via a django.forms.ModelForm using a FileField field) is a duplicate of one that exists, and thus decide to fail validation on the form?
I'm finding this particularly challenging, because from within the form, I don't see how I can find out what the value of upload_to is for this FileField, so I can't go looking myself in the file system to see if that file is there already.

As I see it, you have 2 options:
Set a value in your settings.py to hold your 'upload_to', and then use it for the check when you are validating.
Something like this would work for the check (you need to adapt it to your own upload_to, of course):
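import os
from django import forms
from django.conf import settings
# UPLOAD_TO is a setting you define yourself; uploaded_name is the submitted file's name
if os.path.exists(os.path.join(settings.MEDIA_ROOT, settings.UPLOAD_TO, uploaded_name)):
    raise forms.ValidationError("A file with this name already exists.")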
The issue with that is that you can't have subfolders or anything complex there.
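If you would rather not mirror upload_to in settings (which is what rules out subfolders and callables), a minimal sketch, assuming Django 1.10 or later, is to ask the model field itself where the upload would be stored. FileField.generate_filename(instance, filename) and the field's storage.exists(name) are Django APIs; upload_name_taken is a hypothetical helper name.

```python
def upload_name_taken(model_field, instance, filename):
    """Return True if the field's storage already has a file at the name
    this upload would be saved under.

    model_field: the model's FileField (e.g. via Model._meta.get_field(...))
    instance:    the model instance being created or edited
    filename:    the name of the uploaded file
    """
    # generate_filename() applies upload_to (a plain string, a strftime
    # pattern or a callable) plus the storage's own name cleaning, so
    # subfolders and callables are handled the same way as on save.
    name = model_field.generate_filename(instance, filename)
    # Ask the same storage backend the field will eventually save to.
    return model_field.storage.exists(name)
```

In a ModelForm you might call it from clean_document() (where "document" is a hypothetical field name), passing self._meta.model._meta.get_field("document"), self.instance and the uploaded file's .name, and raise forms.ValidationError when it returns True. The check and the later save are not atomic, and Django's default storage renames a clashing file rather than overwriting it, so this only decides whether to reject the upload.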
A second option would be, as mentioned in the comments, to add a new column to your model that holds a hash of your file. This approach should work better. As someone mentioned in the comments, to avoid uploading a big file, checking it, failing, uploading another big file, and so on, you can try to hash it in the client and verify it via AJAX first (you will still verify it again on the server, but this can make things faster for your users).
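A minimal server-side sketch of the hash-column idea, assuming a hypothetical CharField(max_length=64, unique=True) named sha256 on your model. The helper works on any seekable binary file-like object, which includes Django's uploaded-file objects, and reads in chunks so a large upload is never held in memory all at once.

```python
import hashlib
from typing import BinaryIO


def sha256_of_file(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a seekable binary file-like object."""
    digest = hashlib.sha256()
    fileobj.seek(0)  # hash from the start, whatever was read before
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    fileobj.seek(0)  # rewind so the file can still be saved afterwards
    return digest.hexdigest()
```

In the form's clean_<field>() you would compute the digest, reject the upload if a row with that sha256 already exists, and store the digest on the instance. Note that this detects identical content regardless of the file name.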

Older question, but Django 1.11 now supports the unique option on FileField. Set unique=True on your field declaration on your model.
It shouldn't matter what you are setting upload_to to. The file name will still be stored in the database.
Changed in Django 1.11:
In older versions, unique=True can’t be used on FileField.
https://docs.djangoproject.com/en/1.11/ref/models/fields/#unique

Related

Why doesn't Django's `full_clean` work when using a non-default database?

Note, this is a rewrite of a deleted question. I'd have edited it, but there were no comments, so I never even knew it was deleted until I went to add new details today.
Background: I'm modifying our code-base to make use of multiple databases for validation of user-submitted load files. This is for automated checks of that user data so that they can fix simple issues with their files, like checks on uniqueness and such.
Our original code (not written by me) has some fundamental issues with loading: it has side effects. It should have used transaction.atomic but didn't, and simply adding it broke the codebase. So, while a refactor will eventually fix that properly, to reduce effort...
Question: I created a second database (with the alias "validation") and inserted .using(db) and .save(using=db) in all the necessary places in our code, so that loading data can be tested without risking the production data.
Everything works as expected with the 2 databases except calls to full_clean(). Take this example:
new_compound_dict = {
    "name": "test",
    "formula": "C1H4",
    "hmdb_id": "HMBD0000001",
}
new_compound = Compound(**new_compound_dict)
new_compound.full_clean()
new_compound.save(using="validation")
It gives me this error:
django.core.exceptions.ValidationError: {'name': ['Compound with this Name already exists.'], 'hmdb_id': ['Compound with this HMDB ID already exists.']}
I get the same error with this code:
new_compound, inserted = Compound.objects.using("validation").get_or_create(**new_compound_dict)
if inserted:
    new_compound.full_clean()
Both examples above work without a problem on the default database.
I looked up full_clean in the docs for Django 3.2, but I don't see a way to have it run against a database other than default, which I'm assuming is what I would need to do to fix this. I can't find any mention of potential issues related to a non-default database. I had expected the docs to show a using parameter for full_clean (like the one for .save(using=db)), but there's no such parameter.
I debugged the above examples with this before and after each example block:
Compound.objects.using("default").filter(**new_compound_dict).count()
Compound.objects.using("validation").filter(**new_compound_dict).count()
For the default database, the counts are 0 before and 1 after with no error. For the validation database, the counts are 0 and 1, but with the error mentioned above.
At this point, I'm confounded. How can I run full_clean on a database other than the default? Does full_clean just fundamentally not support non-default databases?
Footnote: The compound loading data in the example above is never validated in a user submission. It is necessary that the compound data be in both databases in order to validate the data submitted by the user, so the compound load script is one of 2 scripts that loads data into both databases (so that it's in the validation DB when the user submits their data). The default load always happens before the validation load and when I load the validation database, the test compound is always present in the default database and is only present after the validation load in the validation database.

Django: Save ContentFile (or some kind of virtual file) to database

In my django app I create a string which I have to save to my database as a File.
If I understand correctly, Django is supposed to automatically upload the file when I save the entry:
from django.db import models

class Foo(models.Model):
    bar1 = models.ForeignKey(Table1, on_delete=models.CASCADE)
    bar2 = models.ForeignKey(Table2, on_delete=models.CASCADE)
    res = models.FileField(upload_to='MYPATH')
The problem is that to create an instance of Foo, I have to first create a physical file on the server's disk which would mean two copies would be created (one by me in order to create a database entry, one by django when saving the entry).
As far as I can see, I must create the file myself in 'MYPATH' and, instead of using a FileField, save a reference in the database (essentially what Django is doing?). However, I have doubts that this is the best method, as:
It doesn't strike me as Pythonic.
I won't have access to the same methods as when using a real FileField. For instance when calling it, I won't have a FieldFile but just a reference string.
Basically, what I wanted to do was: String --> ContentFile (or some form of "virtual" file) --> Physical File handled by Django when saving entry in the database.
entry = Foo(bar1=var1, bar2=var2, res=ContentFile(XMLSTRING))
entry.save()
This doesn't work, but it shows what I want to achieve.
So, please show me one of the following three:
How to save a file to the database without physically creating it myself (using a ContentFile doesn't create a physical file after saving the entry, which is what I want to happen)
Make django not upload the given file but use the already uploaded version whilst maintaining all the methods provided by FileField
What I don't understand.
I apologize for [my English, my lack of understanding, the lack of clarity].
Anything you need to know, I'd be happy to specify.
EDIT: I looked at this thread, but there, the urlretrieve creates a temp file, which is something I don't really want to do. Maybe I should be doing that, but is there a better way?

Making a username unique for database insertion

I am inserting unique names into a database table that have been submitted by users as their username.
When a name is submitted via a form, my ColdFusion code checks the database to see if that name already exists. If it does, it makes the username unique by adding a sequential number to it.
My issue is that while checking the database for a name conflict is easy enough, I also don't want the name to conflict with the name of any folder, .cfm file, or .html file in my site.
At the moment I am using a simple ListFindNoCase('folder1,folder2,folderN', username) function to check for conflicts, but this list is maintained manually. Whenever I add a new file or folder to the site, I have to add it to this list. It's not a good way to do it.
How can I get a list of all the contents in my site and make it into a delimited list and then do the ListFindNoCase() function to check if the username is in that list of contents? Is this even a pragmatic way to go about it?
Turn your 'folder1,folder2,folderN' into a getter function that returns a list of folders.
Then you can decide how to gather that list of folders.
Here are several ways I can think of:
some global config file, or, if you use ColdBox, ColdBox's config.cfc settings
do a directoryList() and figure it out dynamically, and optionally cache the result
store the forbidden folder names in the DB and check against the DB using SQL

Django doesn't read from database – no error

I just set up the environment for an existing Django project, on a new Mac. I know for certain there is nothing wrong with the code itself (just cloned the repo), but for some reason, Django can't seem to retrieve data from the database.
I know the correct tables and data are in the DB.
I know the codebase is as it should be.
I can make queries using the Django shell.
Django doesn't throw any errors despite the data missing on the web page.
I realize that it's hard to debug this without further information, but I would really appreciate a finger pointing me to the right direction. I can't seem to find any useful logs.
EDIT:
I just realized the problem lies elsewhere. Unfortunately I can't delete this post with the bounty still open.
Without seeing any code, I can only suggest some general advice that might help you debug your problem. Please add a link to your repository if you can or some snippets of your database settings, the view which includes the database queries etc...
Debugging the view
The first thing I would recommend is using the Python debugger inside the view which queries the database. If you've not used pdb before, it's a life saver: it allows you to set breakpoints in your Python script and then interactively execute code inside the interpreter.
# inside the view, just before or after the queries you want to inspect
import pdb
pdb.set_trace()
# execution pauses here; at the (Pdb) prompt, look at the results of your queries
If you are using the Django ORM, the QuerySet returned from the query should have all the data you expect.
If it doesn't then you need to look into your database configuration in settings.py.
If it does, then you might not be returning that object to the template. That's unlikely, as you said the code was the same, but double-check the objects you pass with your HttpResponse object.
Debugging the database settings
If you can query the database from the Django shell using the project's settings.py, it sounds unlikely that there is a problem with those settings, but, like everything, double-check.
You said that you've set up an existing project on a new Mac. Was it on a different operating system before? Maybe there is a problem with the paths now. To make your project platform-independent, remember to use the os.path.join() method when working with file paths.
And what about the database username and password details?
Debugging the template
Maybe your template is referencing the wrong object variable name or object attribute. You mentioned that
Django doesn't throw any errors despite the data missing on the web page.
This doesn't really tell us much. To quote the Django docs:
If you use a variable that doesn’t exist, the template system will insert the value of the TEMPLATE_STRING_IF_INVALID setting, which is set to '' (the empty string) by default.
So to check all the variables available to your template, you could use the debug template tag:
{% debug %}
Probably even better, though, is to use django-debug-toolbar, which will also let you examine the SQL queries your view is making.
Missing Modules
I would expect this to raise an exception if this were the problem, but have you checked that you have the psycopg module on your new machine?

Restrict FilePathField to only existing files

FilePathField is described in Django's doc as:
A CharField whose choices are limited to the filenames in a certain directory on the filesystem.
So I assumed it checks whether the file exists. But actually it doesn't:
from django.db import models

class Unit(models.Model):
    path = models.FilePathField(path="/home/jason/")
In the IPython shell:
unit = Unit(path="non_exist_file_name")
unit.save()
No exception is raised. So do I have to check os.path.isfile myself, or am I not using FilePathField correctly for my need (allowing only existing files when creating a Unit)?
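If I'm not mistaken, the validation for FilePathField is not run if you don't go through a form. Try calling unit.clean_fields() before saving. Be aware, though, that the directory listing is checked by the form field (django.forms.FilePathField); if the model field on its own only enforces max_length, clean_fields() will not catch a missing file either, and you will need a ModelForm or your own os.path.isfile check.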
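If you want that check without going through a form, the form field's behaviour can be approximated with the standard library. This is a sketch, not Django's own code: it assumes the non-recursive, files-only defaults (recursive=False, allow_files=True, allow_folders=False), and that valid values are full paths (the directory joined with the filename) rather than bare names. filepath_choices and is_valid_filepath are hypothetical helper names.

```python
import os
import re
from typing import List, Optional


def filepath_choices(path: str, match: Optional[str] = None) -> List[str]:
    """Full paths of regular files directly inside `path` (not recursive),
    optionally filtered by a regex searched against the bare filename."""
    pattern = re.compile(match) if match else None
    choices = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip directories; keep files whose name matches the pattern.
            if entry.is_file() and (pattern is None or pattern.search(entry.name)):
                choices.append(entry.path)
    return sorted(choices)


def is_valid_filepath(value: str, path: str, match: Optional[str] = None) -> bool:
    """True if `value` is one of the paths a files-only FilePathField would offer."""
    return value in filepath_choices(path, match)
```

You might call is_valid_filepath(self.path, "/home/jason/") from the model's clean() and raise django.core.exceptions.ValidationError when it returns False. Keep in mind that full_clean() calls clean(), but save() does not call full_clean().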