I am using Django 3.2 with Postgres as the database.
I have a model with JSONField:
class MyModel(models.Model):
    data = models.JSONField(default=dict, blank=True)
There are a lot of records in this table, and the data field holds a JSON object in some rows and a JSON list in others:
{
    "0:00:00 A": "text",
    "0:01:00 B": "text",
    "0:02:00 C": "text"
}
[
    {"time": "0:00:00", "type": "A", "description": "text"},
    {"time": "0:01:00", "type": "B", "description": "text"},
    {"time": "0:02:00", "type": "C", "description": "text"}
]
I need to filter all records which have JSON objects as values.
What I tried is has_key with the timestamp "0:00:00":
result = MyModel.objects.filter(data__has_key="0:00:00 A")
But I can't really use it, because I don't know exactly what the full key with the timestamp looks like.
Any ideas on how to filter JSONField values by their structure?
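One possible approach (a sketch, not from the original post): Django's JSONField is stored as jsonb on Postgres, so you can expose Postgres's jsonb_typeof() through a custom Func expression and filter on its result:

from django.db.models import CharField, Func

class JsonbTypeof(Func):
    # Wraps Postgres's jsonb_typeof(), which returns 'object', 'array', etc.
    function = "jsonb_typeof"
    output_field = CharField()

# Keep only the rows whose data column holds a JSON object
objects_only = MyModel.objects.annotate(
    data_type=JsonbTypeof("data")
).filter(data_type="object")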
I have a Django app that requests data from an external API, and my goal is to convert that data, which is returned in list/dictionary format, into a new REST API in GeoJSON format.
I came across django-rest-framework-gis, but I don't know whether I can use it without having a Model. If so, how?
I think the best way is to use the Python library geojson:
pip install geojson
If you do not have a Model, as in GeoDjango, you have to describe the geometry explicitly from the data you have.
from geojson import Point, Feature, FeatureCollection

data = [
    {
        "id": 1,
        "address": "742 Evergreen Terrace",
        "city": "Springfield",
        "lon": -123.02,
        "lat": 44.04
    },
    {
        "id": 2,
        "address": "111 Spring Terrace",
        "city": "New Mexico",
        "lon": -124.02,
        "lat": 45.04
    }
]

def to_geojson(entries):
    features = []
    for entry in entries:
        point = Point((entry["lon"], entry["lat"]))
        del entry["lon"]
        del entry["lat"]
        feature = Feature(geometry=point, properties=entry)
        features.append(feature)
    return FeatureCollection(features)

if __name__ == '__main__':
    my_geojson = to_geojson(data)
    print(my_geojson)
Create the Point geometry from lon and lat (it could also be another geometry type).
Create a Feature with the created geometry and the dictionary as properties. Note that I delete the lon and lat entries from the dictionary so they do not show up as properties.
Create a FeatureCollection from the multiple Features.
Result:
{"features": [{"geometry": {"coordinates": [-123.02, 44.04], "type":
"Point"}, "properties": {"address": "742 Evergreen Terrace", "city":
"Springfield", "id": 1}, "type": "Feature"}, {"geometry":
{"coordinates": [-124.02, 45.04], "type": "Point"}, "properties":
{"address": "111 Spring Terrace", "city": "New Mexico", "id": 2},
"type": "Feature"}], "type": "FeatureCollection"}
More info here: geojson library documentation
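To address the original question about django-rest-framework-gis: for this use case you don't need it, and you don't need a Model either. Here is a minimal sketch of exposing the converted data through a plain Django REST Framework view (fetch_external_data is a hypothetical stand-in for the external API call):

from rest_framework.response import Response
from rest_framework.views import APIView

class AddressGeojsonView(APIView):
    # Model-less endpoint that returns the external API data as GeoJSON
    def get(self, request):
        entries = fetch_external_data()  # hypothetical helper wrapping the external API
        # FeatureCollection is a dict subclass, so DRF serializes it as-is
        return Response(to_geojson(entries))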
I am trying to import a CSV file into Amazon Personalize.
My schema looks like this:
{
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "AUTHOR", "type": "string", "categorical": true},
        {"name": "COUNTRY", "type": "string", "categorical": true},
        {"name": "CITY", "type": "string", "categorical": true},
        {"name": "STYLES", "type": "string", "categorical": true},
        {"name": "CATEGORIES", "type": "string", "categorical": true}
    ],
    "version": "1.0"
}
The first few rows of data look like this:
ITEM_ID,AUTHOR,COUNTRY,CITY,STYLES,CATEGORIES
5b4253a7e12434f55875381e,5acd193f48ed4b9b3add5be6,US,city_us_austin,5ad45bc575eb016f3cdb562b|571aa21888a4fd9934f0fd7b|571aa21888a4fd9934f0fd79|5ad45e8c75eb016f3cdb563f|5b4ea35abaa12285687a1f47,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
5b4253a7e12434f55875381f,5acd193f48ed4b9b3add5be6,US,city_us_jackson,571aa21888a4fd9934f0fd82|57600e419e4959cd069658eb|5ad45c3a75eb016f3cdb5631|571aa21888a4fd9934f0fd7b|57aaa7094a393f531ace43f0|575e6d8e34ca56f742bea1c8|571aa21888a4fd9934f0fd8f,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
I get the error:
Failed to create a data import job for item dataset.
Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
How can I figure out what is wrong with the CSV? It's thousands of lines long, so I have no idea whether it's a general mistake or a problem on a specific line.
In my experience, as long as the dataset has no more than about 250 thousand records, you can still use Excel to check the data using filters and the corresponding search functions. If it's more than that, look into using Notepad++ and RegEx. Your problem may be one of the following things:
(1) There's a missing comma. This would misalign your data and keep it from being processed.
(2) There's a missing ITEM_ID value. For Items, Personalize requires ITEM_ID and at least one metadata field. It might give this error if there is an instance where you are missing ITEM_ID or have ITEM_ID but no other metadata field values.
(3) STYLES and/or CATEGORIES exceeds 256 characters. There is probably a limit on string length, but I can't get a clear answer on this from the developer's guide. I would guess it's 256 characters. If I were betting money, this would be my guess on your problem (see the pandas sketch below for a quick check).
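If you want to narrow this down programmatically rather than in Excel, here is a quick pandas sketch (items.csv is a placeholder filename, and 256 is the guessed limit from point (3), not a documented value):

import pandas as pd

df = pd.read_csv("items.csv")  # placeholder filename

# (2): rows with a missing ITEM_ID
print(df[df["ITEM_ID"].isna()])

# (3): rows where STYLES or CATEGORIES exceed the guessed length limit
for col in ("STYLES", "CATEGORIES"):
    too_long = df[df[col].astype(str).str.len() > 256]
    print(col, too_long.index.tolist())

A misaligned row from a missing comma, point (1), will usually surface here too, because read_csv raises a tokenizing error for rows with too many fields.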
Here is a different approach to the problem that may be useful for other cases. I had the same issue, but when dealing with int columns containing null values. Pandas by default converts such columns to the float data type, which an AWS Personalize dataset import job will not accept if you have defined these columns as int or long. Long story short, converting these columns to a nullable int type solves the problem:
df["column_name"] = df["column_name"].astype(pd.Int32Dtype())  # nullable integer dtype
I need to store a list of maps in Cassandra. Is that possible?
This is a JSON representation of my data:
{
    "deviceId": "261e92b8-91af-40da-8ba4-c39d821472ec",
    "sensors": [
        {
            "fieldSensorId": "sensorID",
            "name": "sensorName",
            "location": "sensor location",
            "unit": "value units",
            "notes": "notes"
        },
        {
            "fieldSensorId": "sensorID 2",
            "name": "sensorName 2",
            "location": "sensor location 2",
            "unit": "value units",
            "notes": "notes"
        }
    ]
}
CQL:
CREATE TABLE device_sensors (
    device_id text,
    sensors list<frozen<map<text, text>>>,
    time timeuuid,
    PRIMARY KEY (device_id)
)
Still, I'm not able to insert any data. What is the right way of storing such data in Cassandra? Later I will need to query the sensors list.
Is it maybe wiser to create a separate sensors table and reference the sensors from it?
I think that the problem is that you declare device_id as text in CQL, but you have declared it as UUID in the source code, and Spring maps it to the corresponding type when trying to insert data. Try adding @CassandraType(type = Name.TEXT) to the deviceId declaration. You can also remove the @Column declaration; the @PrimaryKeyColumn should be enough.
Or you can change the table definition to declare device_id as UUID.
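To rule out the table definition itself, here is a sketch using the Python cassandra-driver instead of the Spring stack (contact point and keyspace are assumptions); the driver converts a Python list of dicts to list<frozen<map<text, text>>> automatically:

from cassandra.cluster import Cluster

# Contact point and keyspace are assumptions for illustration
session = Cluster(["127.0.0.1"]).connect("my_keyspace")

session.execute(
    "INSERT INTO device_sensors (device_id, sensors, time) "
    "VALUES (%s, %s, now())",
    (
        "261e92b8-91af-40da-8ba4-c39d821472ec",  # text, matching the CQL schema
        [
            {"fieldSensorId": "sensorID", "name": "sensorName"},
            {"fieldSensorId": "sensorID 2", "name": "sensorName 2"},
        ],
    ),
)

If this insert succeeds, the schema is fine and the problem is in the Spring type mapping described above.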
I have a Cloudant database with objects that use the following format:
{
    "_id": "0ea1ac7d5ef28860abc7030444515c4c",
    "_rev": "1-362058dda0b8680a818b38e9c68c5389",
    "text": "text-data",
    "time-data": "1452988105",
    "time-text": "3:48 PM - 16 Jan 2016",
    "link": "http://url/to/website"
}
I want to fetch objects where the text attribute is distinct. There will be objects with duplicate text and I want Cloudant to handle removing them from a query.
How do I go about creating a MapReduce view that will do this for me? I'm completely new to MapReduce and I'm having difficulty understanding the relationship between the map and reduce functions. I tried tinkering with the built-in COUNT function and writing my own view, but they've failed catastrophically, haha.
Anyways, would it be easier to just delete the duplicates? If so, how do I do that?
While I'm trying to study this and find ELI5s, would anyone help me out? Thanks in advance! I appreciate it.
I'm not sure a MapReduce view is what you are looking for. A MapReduce view will essentially allow you to get the text and the number of docs with that same text, but you really won't be able to get the rest of the fields in the doc (because MapReduce has no idea which doc to return when multiple docs match the text). Here is a sample MapReduce view:
{
    "_id": "_design/textObjects",
    "views": {
        "by_text": {
            "map": "function (doc) { if (doc.text) { emit(doc.text, 1); }}",
            "reduce": "_count"
        }
    },
    "language": "javascript"
}
What this is doing:
The map part of the MapReduce takes each doc and maps it into a row that looks like this:
{"key":"text-data", "value":1}
So, if you had 7 docs, 2 where text="text-data" and 5 where text="other-text-data", the data would look like this:
{"key":"text-data", "value":1}
{"key":"text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
The reduce part of the MapReduce ("reduce": "_count") groups the docs above by the key and returns the count:
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
You can query this view on your Cloudant instance:
https://<yourcloudantinstance>/<databasename>/_design/textObjects/_view/by_text?group=true
This will result in something similar to the following:
{"rows":[
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
]}
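The same query from Python, as a sketch (the requests library and basic auth are assumptions; the placeholder host and database are kept as-is):

import requests

url = ("https://<yourcloudantinstance>/<databasename>"
       "/_design/textObjects/_view/by_text")
resp = requests.get(url, params={"group": "true"}, auth=("user", "password"))
for row in resp.json()["rows"]:
    print(row["key"], row["value"])  # e.g. text-data 2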
If this is not what you are looking for, and instead you just want to keep the latest info for a specific text value, then you can simply find an existing document that matches that text and update it with new values:
Add an index on text:
{
    "index": {
        "fields": [
            "text"
        ]
    },
    "type": "json"
}
Whenever you add a new document, first find the document with that same exact text:
{
    "selector": {
        "text": "text-value"
    },
    "fields": [
        "_id",
        "text"
    ]
}
If it exists, update it. If not, insert a new document.
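A sketch of that find-then-upsert step over Cloudant's HTTP API (requests and basic auth are assumptions; the placeholder host and database are kept from above):

import requests

BASE = "https://<yourcloudantinstance>/<databasename>"  # placeholders kept
AUTH = ("user", "password")  # assumption

def upsert_by_text(doc):
    # Look up an existing doc with the same text via the _find endpoint
    hits = requests.post(BASE + "/_find", auth=AUTH, json={
        "selector": {"text": doc["text"]},
        "fields": ["_id", "_rev"],
    }).json()["docs"]
    if hits:
        # Reusing _id and _rev turns the POST below into an update
        doc["_id"], doc["_rev"] = hits[0]["_id"], hits[0]["_rev"]
    requests.post(BASE, auth=AUTH, json=doc)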
Finally, if you want to keep multiple docs with the same text value but just want to be able to query the latest, you could do something like this:
Add a property called latest or similar to your docs.
Add an index on text and latest:
{
    "index": {
        "fields": [
            "text",
            "latest"
        ]
    },
    "type": "json"
}
Whenever you add a new document, find the document with that same exact text where latest == true:
{
    "selector": {
        "text": "text-value",
        "latest": true
    },
    "fields": [
        "_id",
        "text",
        "latest"
    ]
}
Set latest = false on the existing document (if one exists).
Insert the new document with latest = true (see the sketch after the query below).
This query will find the latest doc for all text values:
{
    "selector": {
        "text": {"$gt": null},
        "latest": true
    },
    "fields": [
        "_id",
        "text",
        "latest"
    ]
}
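And a sketch of the two latest-flag steps above (same assumptions as the previous snippet):

import requests

BASE = "https://<yourcloudantinstance>/<databasename>"  # placeholders kept
AUTH = ("user", "password")  # assumption

def insert_latest(doc):
    # Fetch the full current "latest" doc(s) for this text; no "fields"
    # filter, so the whole document comes back and nothing is lost on update
    prev = requests.post(BASE + "/_find", auth=AUTH, json={
        "selector": {"text": doc["text"], "latest": True},
    }).json()["docs"]
    for old in prev:
        old["latest"] = False
        requests.post(BASE, auth=AUTH, json=old)  # step 1: set latest = false
    doc["latest"] = True
    requests.post(BASE, auth=AUTH, json=doc)      # step 2: insert as latest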
I was trying to define a hasMany relation from a model to the same model itself. My model is ticket, and the relations defined on it are parentTickets and childTickets, which are arrays of tickets. I made a mapping table 'ticketRelation' for the hasMany relationship. My models are the following:
ticket model:
"relations":{
"parentTickets":{
"type":"hasMany",
"model":"ticket",
"foreignKey":"childId",
"through":"ticketRelation"
},
"childTickets":{
"type":"hasMany",
"model":"ticket",
"foreignKey":"parentId",
"through":"ticketRelation"
}
}
ticketRelation:
"relations": {
    "pticket": {
        "type": "belongsTo",
        "model": "ticket",
        "foreignKey": "parentId"
    },
    "ticket": {
        "type": "belongsTo",
        "model": "ticket",
        "foreignKey": "childId"
    }
}
My sample data is:
ticket id = 1 has child tickets with ids 2 and 3
So when I try to include childTickets on the ticket model via the following URL
http://localhost:3000/api/tickets?filter[include]=childTickets
it gives me the correct result, i.e. ticket id = 1, childTickets = 2, 3.
But whenever I try to include parentTickets for a ticket via the following URL, it does not give me the correct result:
http://localhost:3000/api/tickets?filter[include]=parentTickets
The data retrieved is:
ticket id = 1, parentTickets = 1
So the problem I noticed might be that LoopBack expects the relation name we specify in the mapping table (ticketRelation) to be the same as the model name given in the relation, in order to retrieve the data.
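One thing worth trying, based on LoopBack's documented hasMany-through options (a sketch, not a confirmed fix for this exact setup): a self-referential through relation usually needs keyThrough to tell LoopBack which foreign key on the through model points at the target, since it cannot derive it from the model name when both sides are ticket:

"relations": {
    "parentTickets": {
        "type": "hasMany",
        "model": "ticket",
        "foreignKey": "childId",
        "keyThrough": "parentId",
        "through": "ticketRelation"
    },
    "childTickets": {
        "type": "hasMany",
        "model": "ticket",
        "foreignKey": "parentId",
        "keyThrough": "childId",
        "through": "ticketRelation"
    }
}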