Find and delete similar records in Postgres - django

TL;DR: Find and delete rows in Postgres that have the same date but a different time, leaving one record per date.
Long version:
At some point we migrated our app's backend, a Django application, from Python 2 / Django 1.8 to Python 3 / Django 4, and with this update we changed the backend timezone from UTC+2 to UTC+3. Now strange things happen. Records that used to be read from the DB successfully with the queryset StatChildVisit.objects.filter(date=day, garden_group=garden_group) (where day is a Python date, not a datetime) now return an empty queryset after the update, although the records for that day are still in the DB. What's more, newly created records store a different time: records created under the old timezone look like 2022-12-28 22:00:00.000000 +00:00, while new records look like 2022-12-28 21:00:00.000000 +00:00.
It seems the bug happened because the date field in the Django model was declared as a DateTimeField:
class StatChildVisit(models.Model):
    child = models.ForeignKey(Child, on_delete=models.CASCADE)
    date = models.DateTimeField(default=timezone.now)
    visit = models.BooleanField(_('Atended'), default=True)
    disease = models.BooleanField(_('Sick'), default=False)
    other_approved = models.BooleanField(_('Other approved'), default=False)
    garden_group = models.ForeignKey(GardenGroup, verbose_name=_('Garden group'), editable=False, blank=True, null=True, on_delete=models.CASCADE)
    rossecure_visit = models.ForeignKey('rossecure.Visits', editable=False, null=True, blank=True, on_delete=models.CASCADE)

    class Meta:
        verbose_name = _('Attendence')
        verbose_name_plural = _('Attendence')
        index_together = (
            ('date', 'garden_group'),
        )
        unique_together = (
            ('date', 'child'),
        )
All records are always created by passing a date only (not a datetime) to the constructor.
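The plain-Python sketch below illustrates the mismatch without using Django. It shows what a bare date turns into when it lands in a timestamp with time zone column: midnight in the active timezone, converted to UTC. As far as I know, Django's DateTimeField converts a bare date the same way, to midnight in the default time zone, and warns about a naive datetime when USE_TZ is on. That is why an exact filter(date=day) built under UTC+3 stops matching rows written under UTC+2. The fixed UTC+2 and UTC+3 offsets are assumptions, and stored_instant is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumed fixed offsets for the old and new backend TIME_ZONE settings.
OLD_TZ = timezone(timedelta(hours=2))
NEW_TZ = timezone(timedelta(hours=3))


def stored_instant(day, tz):
    """Hypothetical helper: midnight of `day` in `tz`, expressed in UTC,
    i.e. the value a timestamptz column ends up holding."""
    return datetime.combine(day, time.min, tzinfo=tz).astimezone(timezone.utc)


day = date(2022, 12, 29)
old = stored_instant(day, OLD_TZ)  # how rows were written before the upgrade
new = stored_instant(day, NEW_TZ)  # what an exact lookup for `day` targets now
print(old)         # 2022-12-28 22:00:00+00:00
print(new)         # 2022-12-28 21:00:00+00:00
print(old == new)  # False, so filter(date=day) no longer matches the old rows
```

Converted back to UTC+3, the two values are 01:00 and 00:00 on 2022-12-29. Grouping rows by their calendar day in the new zone therefore puts old and new records for the same day together.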
So we decided to migrate this field to a DateField, but after the migration the column type in the DB is still 'timestamp with time zone'. Besides, because this is a production database, users who noticed that some data seemed to be lost re-created part of those records.
So now we have multiple records for the same day with different times, and the extra ones need to be deleted. Because of the unique constraint, the column cannot simply be altered with ALTER TABLE reports_statchildvisit ALTER COLUMN date TYPE date;
Because the table has a rather large number of records (about 4 million), I think this should be solved on the SQL side rather than the Django side. My plan is to delete the duplicates and then change the column type to date.
I tried to normalize the records with:
update reports_statchildvisit
set date = date(date) + '21:00:00'::time
but because I only tried this after users had re-created some records, the script failed with ERROR: duplicate key value violates unique constraint
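The failure above is an ordering problem. While two rows for the same child still collapse onto the same normalized timestamp, the unique (date, child_id) constraint rejects the UPDATE, so the duplicates have to go first. The sketch below reproduces this with an in-memory SQLite table, standing in for Postgres, with timestamps stored as UTC text. It first shows the UPDATE failing. It then keeps one row per child and UTC+3 calendar day and normalizes every row to UTC+3 midnight. Keeping the lowest id and using a fixed +3 hour offset are assumptions. The table is trimmed to the relevant columns, and any real statements should be tried on a copy of the table first.

```python
import sqlite3

# Calendar day of a UTC timestamp in the new zone (assumption: fixed UTC+3, no DST).
LOCAL_DAY = "date(date, '+3 hours')"


def dedupe_then_normalize():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visit (id INTEGER PRIMARY KEY, child_id INTEGER NOT NULL,"
        " date TEXT NOT NULL, UNIQUE (date, child_id))"
    )
    conn.executemany(
        "INSERT INTO visit (id, child_id, date) VALUES (?, ?, ?)",
        [
            (1, 10, "2022-12-28 22:00:00"),  # written under UTC+2
            (2, 10, "2022-12-28 21:00:00"),  # re-created under UTC+3, same day
            (3, 11, "2022-12-28 22:00:00"),  # no duplicate for this child
        ],
    )
    # Move every row to midnight (UTC+3) of its local day, stored as UTC.
    normalize = f"UPDATE visit SET date = datetime({LOCAL_DAY}, '-3 hours')"

    try:
        conn.execute(normalize)  # rows 1 and 2 collide on (date, child_id)
        first_attempt = "ok"
    except sqlite3.IntegrityError:
        first_attempt = "duplicate key"

    # Keep one row (the lowest id) per child and local day; delete the rest.
    conn.execute(
        "DELETE FROM visit WHERE id NOT IN ("
        f" SELECT MIN(id) FROM visit GROUP BY child_id, {LOCAL_DAY})"
    )
    conn.execute(normalize)  # succeeds once the duplicates are gone
    rows = conn.execute("SELECT id, child_id, date FROM visit ORDER BY id").fetchall()
    return first_attempt, rows


print(dedupe_then_normalize())
```

In Postgres, a timestamptz-to-date cast is evaluated in the session's TimeZone setting. The final ALTER COLUMN ... TYPE date (optionally with a USING clause) should therefore run in the same zone used for grouping; otherwise the stored days can shift by one.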
UPD: DDL on the SQL side looks like this:
create table public.reports_statchildvisit
(
    id serial primary key,
    date timestamp with time zone not null,
    visit boolean not null,
    disease boolean not null,
    child_id integer not null
        constraint reports_statchil_child_id_30fbdf92a34d3fea_fk_children_child_id
            references public.children_child
            deferrable initially deferred,
    garden_group_id integer
        constraint repor_garden_group_id_ef61dd52421b5d2_fk_project_gardengroup_id
            references public.project_gardengroup
            deferrable initially deferred,
    other_approved boolean not null,
    rossecure_visit_id integer
        constraint repo_rossecure_visit_id_488614f59207663f_fk_rossecure_visits_id
            references public.rossecure_visits
            deferrable initially deferred,
    constraint reports_statchildvisit_date_3d6916481fe1e727_uniq
        unique (date, child_id)
);
alter table public.reports_statchildvisit
    owner to django;
create index reports_statchildvisit_10e12719
    on public.reports_statchildvisit (garden_group_id);
create index reports_statchildvisit_42d2af72
    on public.reports_statchildvisit (rossecure_visit_id);
create index reports_statchildvisit_date_66064e65c46d4137_idx
    on public.reports_statchildvisit (date, garden_group_id);
create index reports_statchildvisit_f36263a3
    on public.reports_statchildvisit (child_id);

Related

Django Many to Many migration can't insert null into id

I'm trying to migrate a column from a Char field to a Many-to-Many field running Django 1.8.2. I'm doing a custom Data Migration, to move the data properly. When I try to migrate, I get a database error, can't insert null into the many to many table id column.
My models, simplified:
class LicenseArea(models.Model):
    #appraisal_account = models.CharField(max_length=17, null=True, db_index=True)
    appraisal_account = models.ManyToManyField(TaxAccount, db_table='LicAreaTaxAccount', related_name='accounts_for_license_area', related_query_name='license_area_for_account', null=True)

class TaxAccount(models.Model):
    account = models.CharField(max_length=17, db_index=True)
So I first create TaxAccount objects in a RunPython block, then remove the old field and add the new one, like so:
migrations.RunPython(create_tax_account_objects),
migrations.RemoveField(
    model_name='licensearea',
    name='appraisal_account',
),
migrations.AddField(
    model_name='licensearea',
    name='appraisal_account',
    field=models.ManyToManyField(related_query_name='license_area_for_account', related_name='accounts_for_license_area', db_table='licenses_LicAreaTaxAccount', to='licenses.TaxAccount'),
),
All of that works. My issue comes when I try to migrate the data, relating each LicenseArea object to its corresponding TaxAccount object. In another RunPython block I tried the code below in both directions. acct.licensearea_set.add reports that the TaxAccount model has no licensearea_set attribute, while la.appraisal_account.add gives me an IntegrityError (ORA-01400) saying I can't insert null into the ID column:
for la in LicenseArea.objects.all():
    acct = TaxAccount.objects.get(account=la.appraisal_account_temp)
    #acct.licensearea_set.add(la)
    #la.appraisal_account.add(acct)
How do I solve this? Thanks in advance.

Django ManyToManyField not present in Sqlite3

I'm new to Django and I have some issues with a ManyToMany relationship.
I'm working on automating blastn runs, and here are my classes:
class Annotation(models.Model):
    sequence = models.IntegerField()
    annotation = models.TextField()
    start = models.IntegerField()
    end = models.IntegerField()

class Blast(models.Model):
    sequence = models.ManyToManyField(Annotation, through="AnnotBlast")
    expectValue = models.IntegerField()

class AnnotBlast(models.Model):
    id_blast = models.ForeignKey(Blast, to_field="id")
    id_annot = models.ForeignKey(Annotation, to_field="id")

class Hit(models.Model):
    id_hit = models.ForeignKey(Blast, to_field="id")
    length = models.IntegerField()
    evalue = models.IntegerField()
    start_seq = models.IntegerField()
    end_seq = models.IntegerField()
In a view, I want to access Annotation's data from the rest of the model via this many-to-many field and then apply filters based on a form. But when I run syncdb, the "sequence" field of the Blast class disappears:
In Sqlite3 :
.schema myApp_blast
CREATE TABLE "myApp_blast" (
"id" integer not null primary key,
"expectValue" integer not null
);
So I can't load data into this table the way I want. I don't understand why this field disappears during syncdb. How can I link the first class to the others (and then be able to merge the data in a template)?
A ManyToManyField isn't itself a column in the database. It's represented only by an element in the joining table, which you have here defined explicitly as AnnotBlast (note that since you're not defining any extra fields on the relationship, you didn't actually need to define a through table - Django would have done it automatically if you hadn't).
So to add data to your models, you add data to the AnnotBlast table pointing at the relevant Blast and Annotation rows.
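Here is a minimal illustration of this point, using an in-memory SQLite database. The Blast table has no column for the relationship; the link lives entirely in the through table, and reading it back is a join. The table and column names follow Django's usual defaults (app-label prefix, _id suffix on foreign-key columns) but are assumptions here. The Annotation table is trimmed to the columns the example needs, and the sample rows are made up.

```python
import sqlite3


def annotations_for_blast(blast_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE myApp_annotation (id INTEGER PRIMARY KEY, annotation TEXT NOT NULL);
        CREATE TABLE myApp_blast (id INTEGER PRIMARY KEY, expectValue INTEGER NOT NULL);
        -- The through table: one row per (blast, annotation) link.
        CREATE TABLE myApp_annotblast (
            id INTEGER PRIMARY KEY,
            id_blast_id INTEGER NOT NULL REFERENCES myApp_blast (id),
            id_annot_id INTEGER NOT NULL REFERENCES myApp_annotation (id)
        );
        INSERT INTO myApp_annotation (id, annotation)
            VALUES (1, 'gene A'), (2, 'gene B'), (3, 'gene C');
        INSERT INTO myApp_blast (id, expectValue) VALUES (7, 0);
        INSERT INTO myApp_annotblast (id_blast_id, id_annot_id) VALUES (7, 1), (7, 3);
        """
    )
    # Follow the links from one Blast row to its Annotation rows.
    rows = conn.execute(
        "SELECT a.annotation FROM myApp_annotation AS a"
        " JOIN myApp_annotblast AS ab ON ab.id_annot_id = a.id"
        " WHERE ab.id_blast_id = ? ORDER BY a.id",
        (blast_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(annotations_for_blast(7))  # ['gene A', 'gene C']
```

In the ORM, the same read would be blast.sequence.all(). On the Django versions of that era, add() is not available when a custom through model is used, so links are created with AnnotBlast.objects.create(id_blast=blast, id_annot=annotation).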
For a many-to-many relationship an intermediate join table is created, see documentation here: https://docs.djangoproject.com/en/1.2/ref/models/fields/#id1

Adding values in my database via a ManyToMany relationship represented in admin.py

I've got a tiny little problem that, unfortunately, is taking all my time.
It is really simple: I already have my database, and I created and then modified models.py and admin.py. Some staff users, who will need to enter values in my database, need the simplest possible form to do so.
Here is my database :
-- Table NGSdb.line
CREATE TABLE IF NOT EXISTS `NGSdb`.`line` (
`id` INT NOT NULL AUTO_INCREMENT ,
`value` INT NOT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE UNIQUE INDEX `value_UNIQUE` ON `NGSdb`.`line` (`value` ASC) ;
-- Table NGSdb.run_has_sample_lines
CREATE TABLE IF NOT EXISTS `NGSdb`.`run_has_sample_lines` (
`line_id` INT NOT NULL ,
`runhassample_id` INT NOT NULL ,
PRIMARY KEY (`line_id`, `runhassample_id`) ,
CONSTRAINT `fk_sample_has_line_line1`
FOREIGN KEY (`line_id` )
REFERENCES `NGSdb`.`line` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_sample_has_line_run_has_sample1`
FOREIGN KEY (`runhassample_id` )
REFERENCES `NGSdb`.`run_has_sample` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
-- Table NGSdb.run_has_sample
CREATE TABLE IF NOT EXISTS `NGSdb`.`run_has_sample` (
`id` INT NOT NULL AUTO_INCREMENT ,
`run_id` INT NOT NULL ,
`sample_id` INT NOT NULL ,
`dna_quantification_ng_per_ul` FLOAT NULL ,
PRIMARY KEY (`id`, `run_id`, `sample_id`) ,
CONSTRAINT `fk_run_has_sample_run1`
FOREIGN KEY (`run_id` )
REFERENCES `NGSdb`.`run` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_run_has_sample_sample1`
FOREIGN KEY (`sample_id` )
REFERENCES `NGSdb`.`sample` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
Here is my models.py :
class Run(models.Model):
    id = models.AutoField(primary_key=True)
    start_date = models.DateField(null=True, blank=True, verbose_name='start date')
    end_date = models.DateField(null=True, blank=True, verbose_name='end date')
    project = models.ForeignKey(Project)
    sequencing_type = models.ForeignKey(SequencingType)

    def __unicode__(self):
        return u"run started %s from the project %s" % (self.start_date, self.project)

class Line(models.Model):
    id = models.AutoField(primary_key=True)
    value = models.IntegerField()

    def __unicode__(self):
        return u"%s" % str(self.value)

class RunHasSample(models.Model):
    id = models.AutoField(primary_key=True)
    run = models.ForeignKey(Run)
    sample = models.ForeignKey(Sample)
    dna_quantification_ng_per_ul = models.FloatField(null=True, blank=True)
    lines = models.ManyToManyField(Line)

    def __unicode__(self):
        return u"Sample %s from run %s" % (self.sample, self.run)
And here is my admin.py :
class RunHasSamplesInLine(admin.TabularInline):
    model = RunHasSample
    fields = ['sample', 'dna_quantification_ng_per_ul', 'lines']
    extra = 6

class RunAdmin(admin.ModelAdmin):
    fields = ['project', 'start_date', 'end_date', 'sequencing_type']
    inlines = [RunHasSamplesInLine]
    list_display = ('project', 'start_date', 'end_date', 'sequencing_type')
As you can see, my samples are displayed inline in the run form so that the staff can easily fill in the database.
When I try to fill in the database, I get this error:
(1054, "Unknown column 'run_has_sample_lines.id' in 'field list'")
Of course there is no "lines" field in my database! It is a many-to-many field, so I already created my intermediate table!
Okay, so I tried creating a model for the intermediate table (run_has_sample_lines) and adding a "through" to the ManyToManyField in the RunHasSample model. But once I add the "through" manually, I can no longer use the ManyToMany field directly. The only way to add lines in the admin view is then to stack them as inlines, and since the samples are already inlines, it is impossible to nest new inlines inside them.
Finally, I looked at what Django generates with manage.py sqlall, and I see this:
CREATE TABLE `run_has_sample_lines` (
`id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
`runhassample_id` integer NOT NULL,
`line_id` integer NOT NULL,
UNIQUE (`runhassample_id`, `line_id`)
)
;
ALTER TABLE `run_has_sample_lines` ADD CONSTRAINT `line_id_refs_id_4f0766aa` FOREIGN KEY (`line_id`) REFERENCES `line` (`id`);
It seems that there is no foreign key to the run_has_sample table, whereas I created one in the database in the first place. I guess the problem comes from here, but I cannot resolve it and I really hope that you can...
Thank you very much !
You may wish to try a 'through' attribute on the many-to-many relationship and declare your intermediate table in Django.
I found where the problem is...
It is not a problem with the ManyToManyField but with the intermediate table: Django requires the intermediate table to have its own unique id column.
In the SQL that Django generated, it automatically created a unique id named "id", but I hadn't created one in my database (because the pair of foreign keys is usually enough).
Next time, I'll be more careful.
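For an existing database in this situation, one option is to rebuild the hand-made join table into the layout shown by the sqlall output above. That layout has a surrogate id primary key plus a UNIQUE constraint on the pair, and the existing links are copied across. The sketch below performs the rebuild in SQLite through Python's sqlite3 module, purely to show the steps. The original post uses MySQL, where the statements would differ. The foreign-key constraints are left out for brevity, and rebuild_join_table is a hypothetical helper.

```python
import sqlite3


def rebuild_join_table(conn):
    """Hypothetical helper: give run_has_sample_lines the surrogate `id`
    column Django expects, keeping every (runhassample_id, line_id) pair."""
    conn.executescript(
        """
        ALTER TABLE run_has_sample_lines RENAME TO run_has_sample_lines_old;
        CREATE TABLE run_has_sample_lines (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            runhassample_id INTEGER NOT NULL,
            line_id INTEGER NOT NULL,
            UNIQUE (runhassample_id, line_id)
        );
        INSERT INTO run_has_sample_lines (runhassample_id, line_id)
            SELECT runhassample_id, line_id FROM run_has_sample_lines_old
            ORDER BY runhassample_id, line_id;
        DROP TABLE run_has_sample_lines_old;
        """
    )


# Hand-made layout from the question: composite primary key, no id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_has_sample_lines (line_id INTEGER NOT NULL,"
    " runhassample_id INTEGER NOT NULL, PRIMARY KEY (line_id, runhassample_id))"
)
conn.executemany(
    "INSERT INTO run_has_sample_lines (line_id, runhassample_id) VALUES (?, ?)",
    [(6, 1), (5, 1)],
)
rebuild_join_table(conn)
rows = conn.execute(
    "SELECT id, runhassample_id, line_id FROM run_has_sample_lines ORDER BY id"
).fetchall()
print(rows)  # [(1, 1, 5), (2, 1, 6)]
```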

Django unique_together on postgres: enforced by ORM or DB?

As I look at the sqlall for a models.py that contains unique_together statements, I don't notice anything that looks like enforcement.
In my mind, I can imagine that this knowledge might help the database optimize a query, like so:
"I have already found a row with spam 42 and eggs 91, so in my search for eggs 91, I no longer need to check rows with spam 42."
Am I right that this knowledge can be helpful to the DB?
Am I right that it is not enforced this way (ie, it is only enforced by the ORM)?
If yes to both, is this a flaw?
Here's an example how this should look. Assume that you have model:
class UserConnectionRequest(models.Model):
    sender = models.ForeignKey(UserProfile, related_name='sent_requests')
    recipient = models.ForeignKey(UserProfile, related_name='received_requests')
    connection_type = models.PositiveIntegerField(verbose_name=_(u'Connection type'),
                                                  choices=UserConnectionType.choices())

    class Meta:
        unique_together = (("sender", "recipient", "connection_type"),)
Running sqlall returns:
CREATE TABLE "users_userconnectionrequest" (
"id" serial NOT NULL PRIMARY KEY,
"sender_id" integer NOT NULL REFERENCES "users_userprofile" ("id") DEFERRABLE INITIALLY DEFERRED,
"recipient_id" integer NOT NULL REFERENCES "users_userprofile" ("id") DEFERRABLE INITIALLY DEFERRED,
"connection_type" integer,
UNIQUE ("sender_id", "recipient_id", "connection_type")
)
Once this model is synced to the database, the table has a unique constraint (Postgres):
CONSTRAINT users_userconnectionrequest_sender_id_2eec26867fa22bfa_uniq
UNIQUE (sender_id, recipient_id, connection_type),
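To confirm that the constraint lives in the database rather than in the ORM, you can bypass Django entirely and insert the same triple twice with raw SQL. The sketch below uses an in-memory SQLite table that mirrors the sqlall output above, with the foreign keys dropped for brevity. Postgres behaves the same way in this case and raises a unique-violation error. The try_insert helper is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users_userconnectionrequest (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL,
        recipient_id INTEGER NOT NULL,
        connection_type INTEGER,
        UNIQUE (sender_id, recipient_id, connection_type)
    )
    """
)
INSERT = (
    "INSERT INTO users_userconnectionrequest"
    " (sender_id, recipient_id, connection_type) VALUES (?, ?, ?)"
)


def try_insert(sender, recipient, connection_type):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        conn.execute(INSERT, (sender, recipient, connection_type))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 2, 0))  # True
print(try_insert(1, 2, 0))  # False: same triple, rejected by the UNIQUE constraint
print(try_insert(1, 2, 1))  # True: different connection_type
```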

django recursive relationships

My Django app uses categories to generate the navigation and to group content into those categories.
There are two types of categories:
ParentCategories (top categories)
ChildCategories (sub categories that have a ParentCategory as a parent)
Because these two kinds of categories are so similar, I don't want to use two different models.
This is my category model:
class Category(models.Model):
    name = models.CharField(max_length=60)
    slug = models.SlugField(max_length=80, blank=True)
    is_parent = models.BooleanField()
    parent = models.ForeignKey('self', null=True, blank=True)
In my Django admin, the parent field isn't shown.
If I run python manage.py sql, I get:
CREATE TABLE "catalog_category" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(60) NOT NULL,
"slug" varchar(80) NOT NULL,
"is_parent" bool NOT NULL
)
;
So the parent relationship isn't even created.
Is there a handy way of fixing this?
I know I could just alter the table, but I'm flushing/deleting the database quite a lot because the app changes rapidly, and I don't want to alter the table manually every time.
BTW: my dev DB is of course SQLite3; on the server we'll use PostgreSQL.
Something else is going on - that definition of parent is fine. If I run manage.py sql on an app with that model copy-pasted in, I get:
BEGIN;
CREATE TABLE "bar_category" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(60) NOT NULL,
"slug" varchar(80) NOT NULL,
"is_parent" bool NOT NULL,
"parent_id" integer
)
;
COMMIT;
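For reference, at the SQL level a ForeignKey('self', null=True) needs nothing more than a nullable, self-referencing parent_id column. The sketch below uses SQLite via Python's sqlite3 module with a trimmed version of the catalog_category table. It shows top categories with NULL parents and a query for one category's children. The sample rows and the children_of helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catalog_category (
        id INTEGER PRIMARY KEY,
        name VARCHAR(60) NOT NULL,
        parent_id INTEGER REFERENCES catalog_category (id)  -- NULL for top categories
    );
    INSERT INTO catalog_category (id, name, parent_id) VALUES
        (1, 'Books', NULL),
        (2, 'Fiction', 1),
        (3, 'Science', 1),
        (4, 'Music', NULL);
    """
)


def children_of(parent_id):
    """Names of the direct subcategories of `parent_id`."""
    rows = conn.execute(
        "SELECT name FROM catalog_category WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    ).fetchall()
    return [name for (name,) in rows]


top = [name for (name,) in conn.execute(
    "SELECT name FROM catalog_category WHERE parent_id IS NULL ORDER BY id")]
print(top)             # ['Books', 'Music']
print(children_of(1))  # ['Fiction', 'Science']
```

On the ORM side, the rough equivalents would be Category.objects.filter(parent__isnull=True) and category.category_set.all(), since no related_name is set on parent.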