app-engine ndb delete data - python-2.7

I'm new to App Engine (Python 2.7).
I would like to delete entities from my NDB datastore (currently I don't care whether it is one by one or all at once, since neither works for me).
Version 1 based on this Q:
ps_ancestors = req_query.fetch()
for ps_ancestor in ps_ancestors:
    self.response.write(ps_ancestor.key)
    ps_ancestor.key.delete()
It continues to print the same data without actually deleting anything
Version 2:
[myId currently has only the values 1, 2, 3]
ndb.Key(myId, 1).delete()
ndb.Key(myId, 2).delete()
ndb.Key(myId, 3).delete()
The model:
class tmpReport(ndb.Model):
    myId = ndb.IntegerProperty()
    hisId = ndb.IntegerProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
What am I missing?

k = users.query(users.name == 'abhinav')
for l in k.fetch(limit=1):
    l.key.delete()

First of all, you should not define your entity key as an IntegerProperty. Take a look at this documentation: NDB Entities and Keys
In order to delete an entity from the datastore, you should first retrieve it, either with a query or by its ID. I recommend using a "key name" when creating your entities (so that it serves as your custom ID):
# Model declaration
class tmpReport(ndb.Model):
    hisId = ndb.IntegerProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
# Store the entity
report = tmpReport(id=1, hisId=5)
report.put()
Then to retrieve and delete the previous entity use:
# Retrieve entity
report = ndb.Key("tmpReport", 1).get()
# Delete the entity
report.key.delete()
Hope it helps.

query1 = tmpReport.query()
query2 = query1.filter(tmpReport.myId == int(myId))
#Add other filters as necessary
query3 = query2.fetch()
query3[0].key.delete()
This removes the first entity returned, assuming myId is unique so that there is only one element in the list.

Related

How to change data type of column in DynamoDb?

Initially I inserted integer values, so the column was created with type number; later, string values were also inserted into the same column. Now I am facing issues when fetching values. I need to change the column type from number to string.
Well, there are no columns in DynamoDB, and even if you think of attributes as columns (which they are not), they don't enforce a specific type, except for the primary key. Therefore you can't change the type of a column.
If you are asking how to change the type of a specific attribute for all items in a table, then you need to run an update on every one of those items. DynamoDB unfortunately doesn't support a batch update operation, so you need to fetch the keys of all the items that need updating, loop through that list, and update each item separately.
I recently had to do this. Here is the script I used.
Assume that 'timestamp' is the name of the attribute you need to change from string to number:
import boto3
# TABLE_NAME is expected to be defined elsewhere with the name of your table
db_client = boto3.resource('dynamodb', region_name="eu-west-3")
table_res = db_client.Table(TABLE_NAME)
not_finished = True
ret = table_res.scan()
while not_finished:
    for item in ret['Items']:
        if 'timestamp' in item and isinstance(item['timestamp'], str):
            # Remember the old value so the log line shows both sides
            old_value = item['timestamp']
            item['timestamp'] = int(float(old_value))
            print("fixing {}, {} -> {}".format(item['SK'], old_value, item['timestamp']))
            # put_item overwrites the stored item with the converted one
            table_res.put_item(Item=item)
    if "LastEvaluatedKey" in ret:
        # The scan was paginated; continue from where it stopped
        last_key = ret['LastEvaluatedKey']
        ret = table_res.scan(ExclusiveStartKey=last_key)
    else:
        not_finished = False
I do understand you probably don't need this anymore, but I still hope this will help somebody.
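The conversion step in the script above can be pulled out into a small pure function, which makes it easy to check before running anything against a real table. This is only a sketch: convert_timestamp is a hypothetical helper name (not part of boto3), and it assumes the attribute is called 'timestamp' as in the script. It returns a new dict instead of mutating the scanned item, so the old value stays available for logging.

```python
def convert_timestamp(item, attr="timestamp"):
    """Return a copy of item with a string-valued attr converted to int.

    Items whose attribute is missing or is not a string come back
    unchanged (as a shallow copy).
    """
    new_item = dict(item)  # shallow copy; the scanned item is left untouched
    value = new_item.get(attr)
    if isinstance(value, str):
        # int(float(...)) also accepts strings such as "1609459200.0"
        new_item[attr] = int(float(value))
    return new_item


old = {"SK": "row#1", "timestamp": "1609459200.0"}
new = convert_timestamp(old)
print(old["timestamp"], "->", new["timestamp"])
```

The returned dict could then be passed to put_item in place of the mutated item.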

NDB - Most Efficient Way to Delete a List of Keys

I believe I need to use ndb.delete_multi but I am confused as to how to make it work for a specific set of keys and whether it is the most efficient way to go. Using Python 2.7 with Google App Engine.
First, I am collecting the keys that I want to delete. I don't want to delete everything, only those entries that are 1 hour or more old. Toward that end I first collect the list of keys that meet this criteria.
cs = ChannelStore()
delMsgKeys = []
for x in cs.allMessages():
    current = datetime.datetime.now()
    recordTime = x.channelMessageCreated
    timeDiffSecs = (current - recordTime).total_seconds()
    timeDiff = (timeDiffSecs/60)/60
    if timeDiff >= 1:
        delMsgKeys.append(x.key.id())
ndb.delete_multi(?????)
The definition for cs.allMessages():
def allMessages(self):
    return ChannelStore.query().fetch()
First, is this overall the most efficient approach? Second, how do I use the list of keys I created with the ndb.delete_multi statement?
---Update----
The issue with the ndb.delete_multi has to do with the keys I was passing it. In the code I posted above the keys should have been stored as follows:
delMsgKeys.append(x.key)
With the above change, ndb.delete_multi works.
Per the NDB documentation, you can just pass in a list of keys to ndb.delete_multi, so based on your code, this should work:
ndb.delete_multi(delMsgKeys)
I'm not sure what the limit is for the number of keys that you can pass in a single ndb.delete_multi() call, though.
For this query:
ChannelStore.query().fetch()
You can add an entity property to store a timestamp when you create/update the entity by adding auto_now = True (more documentation here). Then with the timestamp property you can query for something like this:
sixty_mins_ago = datetime.datetime.now()- datetime.timedelta(minutes = 60)
qry = ChannelStore.query()
list_of_keys = qry.filter(ChannelStore.timestamp < sixty_mins_ago).fetch(keys_only = True)
Since you don't need the entities, a keys_only fetch will be cheaper. Of course this code is assuming your ChannelStore model has a timestamp property, so your model will have to be something like this:
class ChannelStore(ndb.Model):
    # other properties go here
    timestamp = ndb.DateTimeProperty(auto_now=True)
Putting it all together, something like this could work for the code block you have above:
from models import ChannelStore
from google.appengine.ext import ndb
from datetime import datetime, timedelta
# other imports
def delete_old_entities():
    sixty_mins_ago = datetime.now() - timedelta(minutes=60)
    qry = ChannelStore.query()
    qry = qry.filter(ChannelStore.timestamp < sixty_mins_ago)
    list_of_keys = qry.fetch(keys_only=True)
    ndb.delete_multi(list_of_keys)
In case you have to delete a lot of keys and are running into some kind of API limit with the ndb.delete_multi call, you can change the delete_old_entities() method to the following:
def delete_old_entities():
    # Uses the same "from datetime import datetime, timedelta" import as above
    sixty_mins_ago = datetime.now() - timedelta(minutes=60)
    qry = ChannelStore.query()
    qry = qry.filter(ChannelStore.timestamp < sixty_mins_ago)
    list_of_keys = qry.fetch(keys_only=True)
    while list_of_keys:
        # delete 100 at a time
        ndb.delete_multi(list_of_keys[:100])
        list_of_keys = list_of_keys[100:]
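The "delete 100 at a time" loop above can also be written with a small batching helper. The sketch below is an illustration rather than part of NDB: chunked and delete_in_batches are hypothetical names, and a plain list stands in for ndb.delete_multi so the code runs without App Engine.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(keys, delete_fn, batch_size=100):
    """Call delete_fn once per batch and return the number of batches."""
    batches = 0
    for batch in chunked(keys, batch_size):
        delete_fn(batch)  # in an App Engine app this could be ndb.delete_multi
        batches += 1
    return batches


# Stand-in for ndb.delete_multi: record what would have been deleted.
deleted = []
n_batches = delete_in_batches(list(range(250)), deleted.extend)
print(n_batches, len(deleted))
```

With 250 keys and a batch size of 100, the helper makes three calls (100, 100, and 50 keys).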

how to store a field in the database after querying

views.py:
q3 = KEBReading.objects.filter(datetime_reading__month=a).filter(datetime_reading__year=selected_year).values("signed")
for item in q3:
    item["signed"] = "signed"
    print item["signed"]
q3.save()
How do I save a field into the database? I'm trying to save the field called "signed" with a value. If I do q3.save() it gives a error as it is a queryset. I'm doing a query from the database and then, based on the result, want to set a value to a field and save it.
prevdate=KEBReading.objects.filter(datetime_reading__lt=date)
I am getting all the rows from the database with a date earlier than the given date, but I want only the latest of them. For example, if I enter 2012-06-03, I want the record whose date is just before that date. Can somebody help?
q3 = KEBReading.objects.filter(datetime_reading__month=a,
                               datetime_reading__year=selected_year)
for item in q3:
    item.signed = True
    item.save()
q3=KEBReading.objects.filter(...)
returns a QuerySet of objects. Any instance of a Django model is an object, and all fields of the instance are attributes of that object. That means you must access them using dot (.) notation,
like:
item.signed = "signed"
If your object is a dictionary or a class derived from dictionary, then you can use named-index like:
item["signed"] = "signed"
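In your situation that usage does not do what you want: without .values() the items are model instances rather than dictionaries, and with .values() they are plain dictionaries whose changes are never written back to the database.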
You can either call an update query:
KEBReading.objects.filter(...).update(signed="signed")
or set the new value in a loop and save each object:
for item in q3:
    item.signed = "signed"
    item.save()
but in your situation, the update query is the better approach since it executes fewer database calls.
Try using update query:
If signed is a booleanfield:
q3 = KEBReading.objects.filter(datetime_reading__month = a).filter(datetime_reading__year = selected_year).update(signed = True)
If it is a charfield:
q3 = KEBReading.objects.filter(datetime_reading__month = a).filter(datetime_reading__year = selected_year).update(signed = "True")
Update for comments:
If you want to fetch records based on the month of datetime_reading, you can do so by providing the month as a number. For example, 2 for February:
q3 = KEBReading.objects.filter(datetime_reading__month = 2).order_by('datetime_reading')
And if you want to fetch records with signed = True, you can do it with:
q3 = KEBReading.objects.filter(signed = True)
If you want to fetch only the records from the day before a given date, you can use:
prevdate = KEBReading.objects.filter(datetime_reading = (date - datetime.timedelta(days = 1)))
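The question also asks for the single latest record before a given date, which is a slightly different query: filter on "less than", order newest first, and take one row. In Django that would typically look like filter(datetime_reading__lt=date).order_by('-datetime_reading')[:1]. The sketch below shows the same logic against an in-memory SQLite table so it can be run on its own; the table name reading and the helper latest_before are assumptions made for the illustration, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, datetime_reading TEXT)")
conn.executemany(
    "INSERT INTO reading (datetime_reading) VALUES (?)",
    [("2012-05-30",), ("2012-06-01",), ("2012-06-03",), ("2012-06-05",)],
)


def latest_before(conn, date):
    """Return the newest datetime_reading strictly earlier than date, or None."""
    row = conn.execute(
        "SELECT datetime_reading FROM reading "
        "WHERE datetime_reading < ? "
        "ORDER BY datetime_reading DESC LIMIT 1",
        (date,),
    ).fetchone()
    return row[0] if row else None


print(latest_before(conn, "2012-06-03"))
```

ISO-formatted date strings sort in chronological order, which is why a plain text comparison is enough in this sketch.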

How to 'bulk update' with Django?

I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
list = ModelClass.objects.filter(name = 'bar')
for obj in list:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 version now has a bulk_update.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from the project's README file)
import random
from bulk_update.helper import bulk_update
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once, though it is suitable for smaller batches of tens to hundreds. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 version now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
from django.db import transaction
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, the best you can do is batch the updates by value. Say you have 1000 rows where you want to set a column to one of X values: you can prepare the batches beforehand and then run only X update queries (each essentially having the form of the first example above) plus the initial SELECT query.
If every row requires a unique value there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.
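The batching idea described above (one update query per distinct target value) comes down to grouping row ids by the value they should receive. Here is a minimal sketch of that grouping step in plain Python; group_ids_by_value and the "high"/"low" rule are hypothetical, chosen only to make the example concrete.

```python
from collections import defaultdict


def group_ids_by_value(rows, choose_value):
    """Map each target value to the list of row ids that should receive it.

    rows is an iterable of (id, current_value) pairs; choose_value decides
    the new value for a row from its current value.
    """
    groups = defaultdict(list)
    for row_id, current in rows:
        groups[choose_value(current)].append(row_id)
    return dict(groups)


rows = [(1, 5), (2, 12), (3, 7), (4, 30)]
batches = group_ids_by_value(rows, lambda v: "high" if v >= 10 else "low")
for value, ids in sorted(batches.items()):
    # In Django each batch would become one query, along the lines of
    # ModelClass.objects.filter(pk__in=ids).update(foo=value)
    print(value, ids)
```

Four rows collapse into two batches here, so two update queries would be enough instead of four.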
Here is some useful content I found on the internet regarding the above question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name = 'bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name = 'bar').update(name="foo") # for single value 'foo' or add loop
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name = 'bar')
for model_obj in model_qs:
    model_obj.name = "foo"  # or whatever the value is; for simplicity I'm using "foo" only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list, ['name'])
Using an atomic transaction
from django.db import transaction
with transaction.atomic():
    model_qs = ModelClass.objects.filter(name = 'bar')
    for obj in model_qs:
        # Each save() is still its own query, but all run in one transaction
        obj.name = 'foo'
        obj.save()
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
obj_list = ModelClass.objects.filter(name = 'bar')
obj_to_be_update = []
for obj in obj_list:
    obj.name = "Dear " + obj.name
    obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
This does not call save() (or send its signals) for every object; instead we collect all the objects to be updated in a list and write them in one bulk operation.
The update() call returns the number of rows it matched:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer to this link for more information on bulk update and create.
Bulk update and Create
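To see why a single update() call is cheap, it helps to look at what happens at the SQL level: one UPDATE statement changes every matching row, and the driver reports how many rows were affected. The sketch below runs the raw SQL from the question against an in-memory SQLite database. It illustrates the statement itself and is not Django's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO tbl_name (name) VALUES (?)",
                 [("bar",), ("bar",), ("baz",)])

# One statement updates every matching row; no per-row round trips.
cursor = conn.execute("UPDATE tbl_name SET name = ? WHERE name = ?", ("foo", "bar"))
updated = cursor.rowcount
names = [row[0] for row in conn.execute("SELECT name FROM tbl_name ORDER BY id")]
print(updated, names)
```

The per-object loop from the question would instead issue one UPDATE per row.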

Retrieving untagged objects with django-tagging

What I'm looking for is a QuerySet containing any objects not tagged.
The solution I've come up with so far looks overly complicated to me:
# Get all tags for model
tags = Location.tags.all().order_by('name')
# Get a list of tagged location id's
tag_list = tags.values_list('name', flat=True)
tag_names = ', '.join(tag_list)
tagged_locations = Location.tagged.with_any(tag_names) \
                                  .values_list('id', flat=True)
untagged_locations = []
for location in Location.objects.all():
    if location.id not in tagged_locations:
        untagged_locations.append(location)
Any ideas for improvement? Thanks!
There is some good information in this post, so I don't feel that it should be deleted, but there is a much, much simpler solution
I took a quick peek at the source code for django-tagging. It looks like they use the ContentType framework and generic relations to pull it off.
Because of this, you should be able to create a generic reverse relation on your Location class to get easy access to the TaggedItem objects for a given location, if you haven't already done so:
from django.contrib.contenttypes import generic
from tagging.models import TaggedItem
class Location(models.Model):
    ...
    tagged_items = generic.GenericRelation(TaggedItem,
                                          object_id_field="object_id",
                                          content_type_field="content_type")
    ...
Clarification
My original answer suggested to do this:
untagged_locs = Location.objects.filter(tagged_items__isnull=True)
Although this would work for a 'normal join', this actually doesn't work here because the content type framework throws an additional check on content_type_id into the SQL for isnull:
SELECT [snip] FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
WHERE (`tagging_taggeditem`.`id` IS NULL
AND `tagging_taggeditem`.`content_type_id` = 4 )
You can hack-around it by reversing it like this:
untagged_locs = Location.objects.exclude(tagged_items__isnull=False)
But that doesn't quite feel right.
I also proposed this, but it was pointed out that annotations don't work as expected with the content types framework.
from django.db.models import Count
untagged_locs = Location.objects.annotate(
    num_tags=Count('tagged_items')).filter(num_tags=0)
The above code works for me in my limited test case, but it could be buggy if you have other 'taggable' objects in your model. The reason being that it doesn't check the content_type_id as outlined in the ticket. It generated the following SQL:
SELECT [snip], COUNT(`tagging_taggeditem`.`id`) AS `num_tags`
FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
GROUP BY `sotest_location`.`id` HAVING COUNT(`tagging_taggeditem`.`id`) = 0
ORDER BY NULL
If Location is your only taggable object, then the above would work.
Proposed Workaround
Short of getting the annotation mechanism to work, here's what I would do in the meantime:
untagged_locs_e = Location.objects.extra(
    where=["""NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
              INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
              WHERE ct.model = 'location'
              AND ti.object_id = myapp_location.id)"""]
)
This adds an additional WHERE clause to the SQL:
SELECT [snip] FROM `myapp_location`
WHERE NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
WHERE ct.model = 'location'
AND ti.object_id = myapp_location.id)
It joins to the django_content_type table to ensure that you're looking at the appropriate content type for your model in the case where you have more than one taggable model type.
Change myapp_location.id to match your table name. There's probably a way to avoid hard-coding the table names, but you can figure that out if it's important to you.
Adjust accordingly if you're not using MySQL.
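The effect of the content-type check in the NOT EXISTS workaround can be tried out on a toy schema. The sketch below uses an in-memory SQLite database with heavily simplified versions of the three tables; the real django-tagging and contenttypes tables have more columns, and the sample rows, including the second 'event' content type, are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE django_content_type (id INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE myapp_location (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tagging_taggeditem (
    id INTEGER PRIMARY KEY, content_type_id INTEGER, object_id INTEGER);
INSERT INTO django_content_type VALUES (4, 'location'), (5, 'event');
INSERT INTO myapp_location VALUES (1, 'tagged'), (2, 'untagged'), (3, 'also untagged');
-- Location 1 is tagged; the row for object_id 2 belongs to an 'event', not a location.
INSERT INTO tagging_taggeditem VALUES (10, 4, 1), (11, 5, 2);
""")

untagged = [row[0] for row in conn.execute("""
    SELECT id FROM myapp_location
    WHERE NOT EXISTS (SELECT 1 FROM tagging_taggeditem ti
                      INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
                      WHERE ct.model = 'location'
                        AND ti.object_id = myapp_location.id)
    ORDER BY id
""")]
print(untagged)
```

Location 2 counts as untagged even though a tagged item with object_id 2 exists, because that item belongs to a different content type. That is the case the plain annotate/Count query cannot tell apart.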
Try this:
[location for location in Location.objects.all() if location.tags.count() == 0]
Assuming your Location class uses the tagging.fields.TagField utility.
from tagging.fields import TagField
class Location(models.Model):
    tags = TagField()
You can just do this:
Location.objects.filter(tags='')