I have a model with a JSONField:
class SDReport(models.Model):
    summary = models.JSONField()
summary field data example:
{
    "1": {
        "stm": {
            "1": []
        },
        "non_stm": {
            "1": ["3419250", "3205437"]
        }
    },
    "2": {
        "stm": {
            "1": []
        }
    }
}
How can I select the data (expected result ["3419250", "3205437"], default value []) from the path '1' > 'non_stm' > '1' with the ORM?
UPD. This works:
SDReport.objects.annotate(lst=RawSQL("(summary->'1'->'non_stm'->>'1')", ())).first()
But lst is a string. Is it possible to convert it from a string to a list in the query?
Why can't you treat this as a standard Python dictionary?
summary["1"]["non_stm"]["1"] will return: ["3419250", "3205437"]
If it's not a dictionary and is just a string of text, try the JSON standard library (import json), first calling:
dictionary = json.loads(stringVersionOfDictionary)
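Following up on the UPD in the question: if the goal is to get a real list back from the query itself, one option (a minimal sketch, assuming PostgreSQL with a jsonb column and Django 3.1+) is to keep the -> operator, so the value stays JSON, and tell Django how to decode it via output_field:
from django.db.models import JSONField
from django.db.models.expressions import RawSQL

# -> keeps the value as jsonb (unlike ->>, which casts to text);
# COALESCE supplies the [] default when the path is missing,
# and output_field=JSONField() makes Django decode the result into a Python list.
report = SDReport.objects.annotate(
    lst=RawSQL(
        "COALESCE(summary->'1'->'non_stm'->'1', '[]'::jsonb)",
        (),
        output_field=JSONField(),
    )
).first()

# report.lst should now be ["3419250", "3205437"] for the sample data above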
Related
Given a Django JSONField that is structured as a list of dictionaries:
# JSONField "materials" on MyModel:
[
{"some_id": 123, "someprop": "foo"},
{"some_id": 456, "someprop": "bar"},
{"some_id": 789, "someprop": "baz"},
]
and given a list of values to look for:
myids = [123, 789]
I want to query for all MyModel instances that have a matching some_id anywhere in those lists of dictionaries. I can do this to search in dictionaries one at a time:
# Search inside the third dictionary in each list:
MyModel.objects.filter(materials__2__some_id__in=myids)
But I can't seem to construct a query to search in all dictionaries at once. Is this possible?
Given the clue here from Davit Tovmasyan to do this by iterating through the match_targets and building up a set of Q queries, I wrote this function that takes a field name to search, a property name to search against, and a list of target matches. It returns a new list containing the matching dictionaries and the source objects they come from.
from iris.apps.claims.models import Claim
from django.db.models import Q
def json_list_search(
    json_field_name: str,
    property_name: str,
    match_targets: list
) -> list:
    """
    Args:
        json_field_name: Name of the JSONField to search in
        property_name: Name of the dictionary key to search against
        match_targets: List of possible values that should constitute a match

    Returns:
        List of dictionaries: [
            {"claim_id": 123, "json_obj": {"foo": "y"}},
            {"claim_id": 456, "json_obj": {"foo": "z"}},
        ]

    Example:
        results = json_list_search(
            json_field_name="materials_data",
            property_name="material_id",
            match_targets=[1, 22]
        )
        # (results truncated):
        [
            {
                "claim_id": 1,
                "json_obj": {
                    "category": "category_kmimsg",
                    "material_id": 1,
                },
            },
            {
                "claim_id": 2,
                "json_obj": {
                    "category": "category_kmimsg",
                    "material_id": 23,
                },
            },
        ]
    """
    q_keys = Q()
    for match_target in match_targets:
        kwargs = {
            f"{json_field_name}__contains": [{property_name: match_target}]
        }
        q_keys |= Q(**kwargs)
    claims = Claim.objects.filter(q_keys)
    # Now we know which ORM objects contain references to any of the match_targets
    # in any of their dictionaries. Extract *relevant* objects and return them
    # with references to the source claim.
    results = []
    for claim in claims:
        data = getattr(claim, json_field_name)
        for datum in data:
            if datum.get(property_name) and datum.get(property_name) in match_targets:
                results.append({"claim_id": claim.id, "json_obj": datum})
    return results
The __contains lookup might help you. It should be something like this:
q_keys = Q()
for _id in myids:
    q_keys |= Q(materials__contains={'some_id': _id})

MyModel.objects.filter(q_keys)
Using Apache Beam (Python 2.7 SDK), I am trying to write JSON files as entities into Google Cloud Datastore.
Sample JSON:
{
    "CustId": "005056B81111",
    "Name": "John Smith",
    "Phone": "827188111",
    "Email": "john#xxx.com",
    "addresses": [
        {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
        {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
    ]
}
I have written an Apache Beam pipeline with mainly three steps:
beam.io.ReadFromText(input_file_path)
beam.ParDo(CreateEntities())
WriteToDatastore(PROJECT)
In step 2, I am converting the JSON object (dict) into an entity:
class CreateEntities(beam.DoFn):
    def process(self, element):
        element = element.encode('ascii', 'ignore')
        element = json.loads(element)
        Id = element.pop('CustId')
        entity = entity_pb2.Entity()
        datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
        datastore_helper.add_properties(entity, element)
        return [entity]
This works fine for basic properties. However, since addresses is a dict object itself, it fails.
I have read a similar post, but it did not give the exact code to convert a dict into an entity.
I tried the line below to set the addresses element as an entity, but it does not work:
element['addresses'] = entity_pb2.Entity()
Other References:
https://www.the-swamp.info/blog/uploading-data-cloud-datastore-using-dataflow/
https://gcloud-python.readthedocs.io/en/latest/datastore/entities.html
Are you trying to store this as a repeated structured property?
ndb.StructuredProperty values appear in Dataflow with the keys flattened, and for repeated structured properties, each individual property within the structured property object becomes an array. So I think you would need to write it like this:
datastore_helper.add_properties(entity, {
...
"addresses.type": ["Billing", "Shipping"],
"addresses.streetAddress": ["Street 7", "Street 6"],
"addresses.city": ["Malmo", "Stockholm"],
"addresses.postalCode": ["CR0 4UZ", "YYT IKO"],
})
Alternatively, if you're trying to save this as an ndb.JsonProperty, you can do this:
datastore_helper.add_properties(entity, {
...
"addresses": json.dumps(element['addresses']),
})
I know this is an old question, but I had a similar issue (although with Python 3.6 and NDB) and wrote a function to convert all dicts inside a dict into an Entity. It uses recursion to go through all nodes, converting as necessary:
from google.cloud import datastore  # assumed import: the Cloud Datastore client that provides Entity

def dict_to_entity(data):
    # the data can be a dict or a list, and they are iterated over differently
    # also create a new object to store the child objects
    if type(data) == dict:
        childiterator = data.items()
        new_data = {}
    elif type(data) == list:
        childiterator = enumerate(data)
        new_data = []
    else:
        return
    for i, child in childiterator:
        # if the child is a dict or a list, continue drilling...
        if type(child) in [dict, list]:
            new_child = dict_to_entity(child)
        else:
            new_child = child
        # add the child data to the new object
        if type(data) == dict:
            new_data[i] = new_child
        else:
            new_data.append(new_child)
    # convert the new object to Entity if needed
    if type(data) == dict:
        child_entity = datastore.Entity()
        child_entity.update(new_data)
        return child_entity
    else:
        return new_data
If I have:
class Tag(models.Model):
    number = models.IntegerField()

config = {
    "data": 1,
    "field": "number"
}
How do I do the following?
record = Tag(config["field"]=config["data"])
record.save()
You can unpack a dict into keyword arguments using the ** syntax. For example:
config = {
"number": 1
}
record = Tag(**config)
record.save()
This will create a new Tag instance with number=1.
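For the config shape in the question, where the field name and the value live under separate keys, a small sketch along the same lines (using the Tag model and config dict from the question) could be:
# Build {"number": 1} from config, then unpack it into the model constructor
record = Tag(**{config["field"]: config["data"]})
record.save()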
I have a field of phone numbers where a random variety of separators have been used, such as:
932-555-1515
951.555.1255
(952) 555-1414
I would like to go through each field that already exists and remove the non-numeric characters.
Is that possible?
I don't care whether it gets stored as an integer or as a string of digits; it will only be used for display purposes.
You'll have to iterate over all your docs in code and use a regex replace to clean up the strings.
Here's how you'd do it in the mongo shell for a test collection with a phone field that needs to be cleaned up.
db.test.find().forEach(function(doc) {
doc.phone = doc.phone.replace(/[^0-9]/g, '');
db.test.save(doc);
});
Based on the previous example by @JohnnyHK, I also added a regex to the find query:
/*
MongoDB: Find by regular expression and run regex replace on results
*/
db.test.find({"url": { $regex: 'http:\/\/' }}).forEach(function(doc) {
doc.url = doc.url.replace(/http:\/\/www\.url\.com/g, 'http://another.url.com');
db.test.save(doc);
});
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.
Coupled with the improvements made to db.collection.update() in Mongo 4.2, which can now accept an aggregation pipeline (allowing a field to be updated based on its own value), we can manipulate and update a field in ways the language doesn't easily permit and avoid an inefficient find/foreach pattern:
// { "x" : "932-555-1515", "y" : 3 }
// { "x" : "951.555.1255", "y" : 7 }
// { "x" : "(952) 555-1414", "y" : 6 }
db.collection.updateMany(
{ "x": { $regex: /[^0-9]/g } },
[{ $set:
{ "x":
{ $function: {
body: function(x) { return x.replace(/[^0-9]/g, ''); },
args: ["$x"],
lang: "js"
}}
}
}
])
// { "x" : "9325551515", "y" : 3 }
// { "x" : "9515551255", "y" : 7 }
// { "x" : "9525551414", "y" : 6 }
The update consists of:
a match query { "x": { $regex: /[^0-9]/g } }, filtering the documents to update (in our case, any document that contains non-numeric characters in the field we're interested in updating).
an update aggregation pipeline [{ $set: { "x": { $function: { ... } } } }] (note the square brackets signifying the use of an aggregation pipeline). $set is a new aggregation operator and an alias for $addFields.
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the string to modify. The function here simply consists of replacing characters matching the regex with an empty string.
args, which contains the fields from the record that the body function takes as parameters. In our case, "$x".
lang, which is the language in which the body function is written. Only js is currently available.
In MongoDB 4.2 you have the $regexFind operator, which can be used together with $substr inside an aggregation ($project) stage, without looping through all the documents on the client.
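No code was given for that approach; as a rough illustration of the same idea (doing the cleanup server-side in an aggregation instead of looping over documents on the client), here is a hedged sketch that uses the related $regexFindAll operator with $reduce, written with PyMongo. The connection, database, collection, and field names are assumptions:
from pymongo import MongoClient

# Hypothetical connection/collection; adjust to your environment
coll = MongoClient()["mydb"]["test"]

pipeline = [
    {"$project": {
        "phone": {
            "$reduce": {
                # $regexFindAll (MongoDB 4.2+) returns every run of digits;
                # $map keeps just the matched text of each result
                "input": {"$map": {
                    "input": {"$regexFindAll": {"input": "$phone", "regex": "[0-9]+"}},
                    "in": "$$this.match",
                }},
                # $reduce concatenates the digit runs back into one string
                "initialValue": "",
                "in": {"$concat": ["$$value", "$$this"]},
            }
        }
    }}
]

for doc in coll.aggregate(pipeline):
    print(doc["phone"])  # e.g. "9325551515"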
I need to populate a Map so that:
The Key is a String
The Value is a List of Strings
The process is to go through all the records in a table that has two text fields: "parameter" and "value". "Parameter" is not unique and has many duplicates. So what I intend to do is:
def all = MyTable.findAll()
def mymap = [:]
all.each {
// add to mymap the element "it.value" to the list that has "it.parameter" as key
}
Any clues?
Thanks
There is, IMHO, a slightly simpler way of doing this, by using 'withDefault', which was introduced in Groovy 1.7:
all = [
[parameter: 'foo', value: 'aaa'],
[parameter: 'foo', value: 'bbb'],
[parameter: 'bar', value: 'ccc'],
[parameter: 'baz', value: 'ddd']
]
def myMap = [:].withDefault { [] }
all.each {
myMap[it.parameter] << it.value
}
assert myMap.size() == 3
assert myMap.foo == ['aaa','bbb']
assert myMap.bar == ['ccc']
assert myMap.baz == ['ddd']
You can use the groupBy method, which will split the collection into a map of groups based on the passed-in closure. Here's a full example, which also calls collect to make each parameter point to just the values:
all = [
[parameter: 'foo', value: 'aaa'],
[parameter: 'foo', value: 'bbb'],
[parameter: 'bar', value: 'ccc'],
[parameter: 'baz', value: 'ddd']
]
tmpMap = all.groupBy{it.parameter}
myMap = [:].putAll(tmpMap.collect{k, v -> [k, v.value] as MapEntry})
assert myMap == [foo: ['aaa', 'bbb'], bar: ['ccc'], baz:['ddd']]