Cassandra store list of objects - list

I need to store a list of maps in Cassandra. Is that possible?
This is a JSON representation of my data:
{
"deviceId" : "261e92b8-91af-40da-8ba4-c39d821472ec",
"sensors": [
{
"fieldSensorId": "sensorID",
"name": "sensorName",
"location": "sensor location",
"unit": "value units",
"notes": "notes"
},
{
"fieldSensorId": "sensorID 2",
"name": "sensorName 2",
"location": "sensor location 2",
"unit": "value units",
"notes": "notes"
}
]
}
CQL:
CREATE TABLE device_sensors (
device_id text,
sensors list<frozen <map<text,text>>>,
time timeuuid,
PRIMARY KEY (device_id)
)
Still, I'm not able to insert any data. What is the right way of storing such data in Cassandra? Later I will need to query the sensors list.
Is it maybe wiser to create a sensors table and use sensor > to reference the sensors?

I think the problem is that you declare device_id as text in CQL but as UUID in the source code, so Spring maps it to the corresponding type when it tries to insert the data. Can you try adding @CassandraType(type = Name.TEXT) to the deviceId declaration? You can also remove the @Column declaration; @PrimaryKeyColumn should be enough.
Or you can change the table definition to declare device_id as UUID.
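The same type-matching idea can be sketched outside Spring. This is only an illustration in Python, assuming the usual DataStax driver mapping (str for text, list of dict for list<frozen<map<text,text>>>, a time-based uuid.UUID for timeuuid); the helper name to_insert_params is hypothetical.

```python
import uuid


def to_insert_params(payload):
    """Turn the JSON payload into bind values for device_sensors.

    Returns (device_id, sensors, time), matching the column types
    text, list<frozen<map<text,text>>> and timeuuid.
    """
    device_id = str(payload["deviceId"])  # text column, so keep it a str
    sensors = [
        {str(key): str(value) for key, value in sensor.items()}
        for sensor in payload["sensors"]
    ]
    return device_id, sensors, uuid.uuid1()  # uuid1 is time-based


# Statement text for a driver that uses %s placeholders (an assumption).
INSERT_CQL = (
    "INSERT INTO device_sensors (device_id, sensors, time) "
    "VALUES (%s, %s, %s)"
)

payload = {
    "deviceId": "261e92b8-91af-40da-8ba4-c39d821472ec",
    "sensors": [{"fieldSensorId": "sensorID", "name": "sensorName"}],
}
params = to_insert_params(payload)
print(type(params[0]).__name__, len(params[1]))
```

If device_id were declared as uuid in the table instead, the first value would have to be a uuid.UUID rather than a str.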

Django differ JSONField values between lists and objects

I am using django 3.2 with Postgres as DB.
I have a model with JSONField:
class MyModel(models.Model):
data = models.JSONField(default=dict, blank=True)
In the database there are a lot of records in this table; some hold a JSON object in data and others hold a list:
{
"0:00:00 A": "text",
"0:01:00 B": "text",
"0:02:00 C": "text",
}
[
{"time": "0:00:00", "type": "A", "description": "text"},
{"time": "0:01:00", "type": "B", "description": "text"},
{"time": "0:02:00", "type": "C", "description": "text"},
]
I need to filter all records which have JSON values as objects.
What I tried is to use has_key with the time frame "0:00:00":
result = MyModel.objects.filter(data__has_key="0:00:00 A")
But I really can't use it, because I don't know what the complete key with the time frame looks like.
Any ideas how to filter JSONField values by object structure?

How to add map to map array in AWS DynamoDB only when id is not existed?

Here is my DynamoDB structure.
{"books": [
{
"name": "Hello World 1",
"id": "1234"
},
{
"name": "Hello World 2",
"id": "5678"
}
]}
I want to set a ConditionExpression to check whether the id already exists before adding new items to the books array. Here is my ConditionExpression. I am using API Gateway to access DynamoDB.
"ConditionExpression": "NOT contains(#lu.books.id,:id)",
"ExpressionAttributeValues": {":id": {
"S": "$input.path('$.id')"
}
}
Result when I test the API: whether or not the id already exists, the item is added to the array successfully.
Any suggestions on how to do it? Thanks!
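Unfortunately, you can't. contains() on a list only matches a whole element, so it cannot test a single attribute of the maps inside the list. However, there is a workaround.
Store the books in separate rows. For example:
PK              SK
BOOK_LU#<ID>    BOOK_NAME#<book name>#BOOK_ID#<BOOK_ID>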
Now you can use the attribute_not_exists condition function (if_not_exists is only valid inside an update expression), so that the write fails when the item already exists:
"ConditionExpression": "attribute_not_exists(PK)"
The con is that if you were previously fetching the list as part of another object, you will have to change that.
The pro is that you can now easily work with the books, and you won't hit the maximum item size limit (400 KB) if the list of books grows too large.
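As a rough sketch of the one-row-per-book idea, the request for a conditional put could be built like this. The function name and table name are hypothetical, and the condition uses attribute_not_exists, which is the DynamoDB condition function for "reject the write if this item is already there".

```python
def build_put_book_request(table_name, lu_id, book_id, book_name):
    """Build keyword arguments for a DynamoDB PutItem call.

    The PK / SK layout follows the table sketched above; the function
    name and the table name are assumptions for illustration.
    """
    return {
        "TableName": table_name,
        "Item": {
            "PK": {"S": f"BOOK_LU#{lu_id}"},
            "SK": {"S": f"BOOK_NAME#{book_name}#BOOK_ID#{book_id}"},
            "name": {"S": book_name},
            "id": {"S": book_id},
        },
        # Reject the write if an item with this PK + SK already exists.
        "ConditionExpression": "attribute_not_exists(PK)",
    }


request = build_put_book_request("books_table", "lu-1", "1234", "Hello World 1")
print(request["ConditionExpression"])
```

With boto3 the dictionary could be passed as client.put_item(**request); a duplicate should then surface as a ConditionalCheckFailedException instead of silently overwriting the row.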

Is there a way to interpolate OutputPath's JsonPath using state's input in AWS step function?

Basically, I have the following input:
{
"name": "abc",
"choice": "choice1"
}
My DynamoDB table has the following structure:
Partition key - "name"
Complex json with choices:
{
"choices":
{
"choice1": ......,
"choice2": ......
}
}
I want to directly read from dynamodb, and get a subitem under the relevant choice:
{
"StartAt": "Read Next Message from DynamoDB",
"States": {
"Read Next Message from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "my_table",
"Key": {
"customerName": {"S.$": "$.name"}
}
},
"OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
"Next": "World"
},
"World": {
"Type": "Pass",
"End": true
}
}
}
Basically, I want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", taking one of the output's keys from the input. Is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know, Step Functions allows only path references through the $, . and [] operators (Reference Path).
I don't know how much control you have over the DynamoDB table's data, but I think your problem can be solved easily if your choice types are modeled in the following way:
{
"choices": [{
"choiceType": "choice1",
........
},
{
"choiceType": "choice2",
........
}]
}
Now you can use the Map state to iterate over the choices array. Don't forget to pass the expected choiceType to each iteration.
The first state of the Map iterator can be a Choice state that compares choiceType and moves to the appropriate next state. So, basically, the rest of your workflow is modeled as the iterator of that Map state.
If you don't have control over the DynamoDB table, you can process the query result in an AWS Lambda.
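A minimal sketch of that Lambda, assuming the getItem task keeps the original input next to its result (for example via a ResultPath), so the function receives both "choice" and the raw "Item". The event shape and the handler name are assumptions, not something Step Functions dictates.

```python
def handler(event, context=None):
    """Return the string stored under choices.<choice>.myvalue.

    Assumed event shape:
    {"choice": "choice1", "Item": <raw DynamoDB getItem item>}
    """
    choice = event["choice"]
    choices = event["Item"]["choices"]["M"]
    if choice not in choices:
        raise KeyError(f"unknown choice: {choice}")
    # Walk the DynamoDB-typed JSON: map (M) -> attribute -> string (S).
    return choices[choice]["M"]["myvalue"]["S"]


example_event = {
    "name": "abc",
    "choice": "choice1",
    "Item": {
        "choices": {
            "M": {
                "choice1": {"M": {"myvalue": {"S": "first"}}},
                "choice2": {"M": {"myvalue": {"S": "second"}}},
            }
        }
    },
}
print(handler(example_event))
```

The lookup that the JsonPath cannot express ("use the value of $.choice as a key") becomes an ordinary dictionary access here.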

Error when importing CSV file into Amazon Personalize

I am trying to import a CSV file into Amazon Personalize.
My schema looks like this:
{
"type": "record",
"name": "Items",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "AUTHOR",
"type": "string",
"categorical": true
},
{
"name": "COUNTRY",
"type": "string",
"categorical": true
},
{
"name": "CITY",
"type": "string",
"categorical": true
},
{
"name": "STYLES",
"type": "string",
"categorical": true
},
{
"name": "CATEGORIES",
"type": "string",
"categorical": true
}
],
"version": "1.0"
}
The first few rows of data look like this:
ITEM_ID,AUTHOR,COUNTRY,CITY,STYLES,CATEGORIES
5b4253a7e12434f55875381e,5acd193f48ed4b9b3add5be6,US,city_us_austin,5ad45bc575eb016f3cdb562b|571aa21888a4fd9934f0fd7b|571aa21888a4fd9934f0fd79|5ad45e8c75eb016f3cdb563f|5b4ea35abaa12285687a1f47,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
5b4253a7e12434f55875381f,5acd193f48ed4b9b3add5be6,US,city_us_jackson,571aa21888a4fd9934f0fd82|57600e419e4959cd069658eb|5ad45c3a75eb016f3cdb5631|571aa21888a4fd9934f0fd7b|57aaa7094a393f531ace43f0|575e6d8e34ca56f742bea1c8|571aa21888a4fd9934f0fd8f,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
I get the error:
Failed to create a data import job for item dataset.
Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
How can I figure out what is wrong with the CSV? It's thousands of lines long, so I have no idea whether it's a general mistake or something wrong on a specific line.
In my experience, as long as the dataset has no more than about 250 thousand records, you can still use Excel to check the data with data filters and the corresponding search functions. If it's larger than that, look into using Notepad++ and RegEx. Your problem may be one of the following:
(1) There's a missing comma. This would misalign your data and keep it from being processed.
(2) There's a missing ITEM_ID value. For Items, Personalize requires ITEM_ID and at least one metadata field. It might give this error if a row is missing ITEM_ID, or has ITEM_ID but no other metadata values.
(3) STYLES and/or CATEGORIES exceeds 256 characters. There is probably a limit on string length, but I can't get a clear answer on this from the developer guide. I would guess it's 256 characters. If I were betting money, this would be my guess at your problem.
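For a file that is too long to eyeball, the three checks above could be scripted. This is only a sketch: find_bad_rows is a hypothetical helper, and max_len=256 is the guessed limit from point (3), not a documented one.

```python
import csv
import io


def find_bad_rows(csv_text, max_len=256):
    """Return (line_number, reason) pairs for rows that look malformed.

    max_len is a guessed string limit, not a documented one.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    id_index = header.index("ITEM_ID")
    problems = []
    for row in reader:
        line_number = reader.line_num
        # (1) A missing or extra comma changes the column count.
        if len(row) != len(header):
            problems.append((line_number, "wrong number of columns"))
            continue
        # (2) ITEM_ID must be present, plus at least one metadata value.
        if not row[id_index].strip():
            problems.append((line_number, "missing ITEM_ID"))
        elif not any(v.strip() for i, v in enumerate(row) if i != id_index):
            problems.append((line_number, "no metadata values"))
        # (3) Flag suspiciously long values.
        for name, value in zip(header, row):
            if len(value) > max_len:
                problems.append(
                    (line_number, f"{name} longer than {max_len} characters")
                )
    return problems


sample = (
    "ITEM_ID,AUTHOR,COUNTRY\n"
    "a1,author1,US\n"
    "a2,author2\n"
    ",author3,US\n"
)
for line_number, reason in find_bad_rows(sample):
    print(line_number, reason)
```

For a real file, the text could come from open(path, newline="").read(); the reported line numbers then point at the rows worth inspecting by hand.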
Here is a different approach to the problem, which may be useful in other cases. I had the same issue when dealing with int columns that contain null values. Pandas by default converts such columns to the float data type, which an AWS Personalize dataset import job will not accept if you have defined these columns as int or long. Long story short, converting these columns to a nullable integer type solves the problem:
df.column_name = df.column_name.astype(pd.Int32Dtype())

Fetching With Distinct/Unique Values

I have a Cloudant database with objects that use the following format:
{
"_id": "0ea1ac7d5ef28860abc7030444515c4c",
"_rev": "1-362058dda0b8680a818b38e9c68c5389",
"text": "text-data",
"time-data": "1452988105",
"time-text": "3:48 PM - 16 Jan 2016",
"link": "http://url/to/website"
}
I want to fetch objects where the text attribute is distinct. There will be objects with duplicate text and I want Cloudant to handle removing them from a query.
How do I go about creating a MapReduce view that will do this for me? I'm completely new to MapReduce and I'm having difficulty understanding the relationship between the map and reduce functions. I tried tinkering with the built-in COUNT function and writing my own view, but they've failed catastrophically, haha.
Anyways, would it be easier to just delete the duplicates? If so, how do I do that?
While I'm trying to study this and find ELI5s, would anyone help me out? Thanks in advance! I appreciate it.
I'm not sure a MapReduce view is what you are looking for. A MapReduce view will essentially allow you to get the text and the number of docs with that same text, but you really won't be able to get the rest of the fields in the doc (because MapReduce has no idea which doc to return when multiple docs match the text). Here is a sample MapReduce view:
{
"_id": "_design/textObjects",
"views": {
"by_text": {
"map": "function (doc) { if (doc.text) { emit(doc.text, 1); }}",
"reduce": "_count"
}
},
"language": "javascript"
}
What this is doing:
The map part of the MapReduce takes each doc and maps it into a row that looks like this:
{"key":"text-data", "value":1}
So, if you had 7 docs, 2 where text="text-data" and 5 where text="other-text-data" the data would look like this:
{"key":"text-data", "value":1}
{"key":"text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
The reduce part of the MapReduce ("reduce": "_count") groups the docs above by the key and returns the count:
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
You can query this view on your Cloudant instance:
https://<yourcloudantinstance>/<databasename>/_design/textObjects/_view/by_text?group=true
This will result in something similar to the following:
{"rows":[
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
]}
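To make the map/reduce relationship concrete, here is a small local simulation of the view above. It only mimics the behaviour in plain Python (row ordering is not modeled), and the function names are made up for the illustration.

```python
from collections import Counter


def map_doc(doc):
    """Mirror the view's map function: emit (doc.text, 1) when text is set."""
    if doc.get("text"):
        yield doc["text"], 1


def count_by_key(docs):
    """Mirror "reduce": "_count" with group=true: count emitted rows per key."""
    counts = Counter()
    for doc in docs:
        for key, _value in map_doc(doc):
            counts[key] += 1
    return [{"key": key, "value": value} for key, value in counts.items()]


docs = [{"text": "text-data"}] * 2 + [{"text": "other-text-data"}] * 5
print(count_by_key(docs))
```

Note that only the key and the count survive the reduce step, which is why the other fields of the docs cannot be returned this way.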
If this is not what you are looking for, and you just want to keep the latest info for a specific text value, you can simply find the existing document that matches that text and update it with new values:
Add an index on text:
{
"index": {
"fields": [
"text"
]
},
"type": "json"
}
Whenever you add a new document, find the document with that same exact text:
{
"selector": {
"text": "text-value"
},
"fields": [
"_id",
"text"
]
}
If it exists update it. If not then insert a new document.
Finally, if you want to keep multiple docs with the same text value but only want to query the latest one, you could do something like this:
Add a property called latest (or similar) to your docs.
Add an index on text and latest:
{
"index": {
"fields": [
"text",
"latest"
]
},
"type": "json"
}
Whenever you add a new document, find the document with that same exact text where latest == true:
{
"selector": {
"text": "text-value",
"latest" : true
},
"fields": [
"_id",
"text",
"latest"
]
}
Set latest = false on the existing document (if one exists)
Insert the new document with latest = true
This query will find the latest doc for all text values:
{
"selector": {
"text": {"$gt":null},
"latest" : true
},
"fields": [
"_id",
"text",
"latest"
]
}
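The latest-flag bookkeeping can be sketched against an in-memory list standing in for the database. The helper names are hypothetical; against Cloudant, each step would be a find followed by an update and an insert.

```python
def add_document(docs, new_doc):
    """Insert new_doc as the latest for its text, demoting any previous latest."""
    for doc in docs:
        if doc["text"] == new_doc["text"] and doc.get("latest"):
            doc["latest"] = False
    docs.append({**new_doc, "latest": True})


def latest_documents(docs):
    """Return the docs that a selector with "latest": true would match."""
    return [doc for doc in docs if doc.get("latest")]


store = []
add_document(store, {"_id": "a", "text": "text-data"})
add_document(store, {"_id": "b", "text": "other-text-data"})
add_document(store, {"_id": "c", "text": "text-data"})
print([doc["_id"] for doc in latest_documents(store)])
```

All three documents stay in the store; only the flag decides which one is returned for each text value.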