Entity-attribute-value model or JSON in database?

I have a database schema where the number of attributes is unlimited. I can build this structure in two ways.
1. Use the entity-attribute-value model:
table 1
id
entity
table 2
entityid
attribute-name
attribute-value
2. Use JSON, like:
table1
id
entity
json-attribute {"name":"value-pair"}
My question is: which way is better and more efficient?

I am not familiar with a DBMS that would let you efficiently find all entities where someAttribute = x, if the entities were stored in a non-deconstructed canonical JSON representation. (But I would be eager to know about any.)
The first approach, using (at least) two tables, can accomplish this task, and it is therefore the more capable and more flexible approach; a JSON representation of the entity can always be constructed from the database recordset:
-- all entities having a particular attribute
select entityid, attributeName, attributeValue
from ENTITIES INNER JOIN ENTITYATTRIBUTES
on ENTITIES.id = ENTITYATTRIBUTES.entityid
where ENTITIES.id IN
(
select distinct entityid from ENTITYATTRIBUTES
where attributeName = ? and attributeValue = ?
)
OR
-- the attributes for a specified entity
select attributeName, attributeValue
from ENTITIES INNER JOIN ENTITYATTRIBUTES
on ENTITIES.id = ENTITYATTRIBUTES.entityid
where ENTITIES.id = ?
Complexity could enter, of course, if attributes could themselves contain entities. Nesting of objects is possible in the JSON representation but in the database it requires either a multi-table relational mapping or an OODBMS that supports nested tables.
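As a rough illustration of the two-table approach, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made up for this example; they are not from the original schema. It shows looking up entities by attribute value and rebuilding a JSON document from the attribute rows.

```python
import json
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, entity TEXT NOT NULL);
    CREATE TABLE entity_attributes (
        entityid INTEGER NOT NULL REFERENCES entities(id),
        attribute_name TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (entityid, attribute_name)
    );
""")
conn.executemany("INSERT INTO entities VALUES (?, ?)", [(1, "chair"), (2, "table")])
conn.executemany(
    "INSERT INTO entity_attributes VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "legs", "4"), (2, "color", "blue")],
)


def find_entities(name, value):
    """Return ids of entities whose attribute `name` equals `value`."""
    rows = conn.execute(
        "SELECT entityid FROM entity_attributes "
        "WHERE attribute_name = ? AND attribute_value = ? ORDER BY entityid",
        (name, value),
    )
    return [row[0] for row in rows]


def entity_as_json(entity_id):
    """Rebuild a JSON document for one entity from its attribute rows."""
    rows = conn.execute(
        "SELECT attribute_name, attribute_value FROM entity_attributes "
        "WHERE entityid = ? ORDER BY attribute_name",
        (entity_id,),
    )
    return json.dumps(dict(rows.fetchall()))
```

Note that every value is stored as text here, so numeric attributes would need their own typing convention.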

I chose the JSON solution.
Why?
It avoids writing complex queries to fetch the data.
What if I need to load one particular attribute?
With the JSON solution I have to load all attributes from the database and then filter for that particular attribute.
But in my case I will be loading all attributes every time.
If I had needed to load particular attributes, I might have chosen the entity-attribute-value schema.
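For comparison, the JSON-column approach might look like the sketch below, again with Python's sqlite3 module. The table layout and sample data are assumptions for illustration. The whole attribute document is loaded in one simple query, and picking out a single attribute happens in application code.

```python
import json
import sqlite3

# In-memory database; the schema below is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (id INTEGER PRIMARY KEY, entity TEXT, json_attribute TEXT)"
)
conn.execute(
    "INSERT INTO entities VALUES (?, ?, ?)",
    (1, "chair", json.dumps({"color": "red", "legs": 4})),
)


def load_attributes(entity_id):
    """Fetch the JSON column for one entity and decode it into a dict."""
    row = conn.execute(
        "SELECT json_attribute FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


# All attributes arrive together; a single one is selected after loading.
attributes = load_attributes(1)
color = attributes.get("color")
```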

Related

Is there a way to address nested properties in AWS DynamoDB for the purpose of a documentClient.query() call?

I am currently testing how to design a query for the AWS.DynamoDB.DocumentClient query() call, which takes params: DocumentClient.QueryInput and is used for retrieving a collection of data from a table in DynamoDB.
The query is simple and works fine with indexes of type String or Number only. What I am not able to build is a query that uses a valid index and filters on an attribute that is nested (please see my data structure).
I am using FilterExpression, where the filtering logic can be defined. It works fine in all cases except when I try to filter on a nested attribute.
These are the parameters I am currently passing to query():
var params = {
  TableName: 'myTable',
  ProjectionExpression: 'HashKey, RangeKey, Artist, #SpecialStatus, Message, Track, Statistics',
  ExpressionAttributeNames: { '#SpecialStatus': 'Status' },
  IndexName: 'Artist-index',
  KeyConditionExpression: 'Artist = :ArtistName',
  ExpressionAttributeValues: {
    ':ArtistName': 'Blind Guardian',
    ':Track': 'Mirror Mirror'
  },
  FilterExpression: 'Track = :Track'
};
Data structure in DynamoDB's table:
{
'Artist' : 'Blind Guardian',
..
'Track': 'Mirror Mirror',
'Statistics' : [
{
'Sales': 42,
'WrittenBy' : 'Kursch'
}
]
}
Let's assume we want to select entries from the DB by using Artist in KeyConditionExpression. We can achieve this by feeding Artist with :ArtistName. Now the question: how do I retrieve records filtered on WrittenBy, which is nested in Statistics?
To the best of my knowledge, we cannot use any type other than String, Number or Binary for secondary index keys. I've been experimenting with secondary indexes and sort keys as well, but without luck.
I've tried documentClient.scan(), same story. Still no luck with accessing nested attributes in a List (FilterExpression just won't accept it).
I am aware of the possibility of filtering the result on the "application" side once the records are retrieved (by Artist, for instance), but I am interested in filtering it in FilterExpression.
If I understand your problem correctly, you'd like to create a query that filters on the value of a complex attribute (in this case, a list of objects).
You can filter on the contents of a list by indexing into the list:
var params = {
TableName: "myTable",
FilterExpression: "Statistics[0].WrittenBy = :writtenBy",
ExpressionAttributeValues: {
":writtenBy": 'Kursch'
}
};
Of course, if you don't know the specific index, this won't really help you.
Alternatively, you could use the contains function to test whether the object exists in your list. The contains function requires all the attributes in the object to match the condition. In this case, you'd need to provide both Sales and WrittenBy, which probably doesn't solve your problem here.
The shape of your data is making your access pattern difficult to implement, but that is often the case with DDB. You are asking DDB to support a query on a list of objects, where the object has a specific attribute with a specific value. As you've seen, this is quite tricky to do. As you know, getting the data model to correctly support your access patterns is critical to your success with DDB. It can also be difficult to get right!
A couple of ideas that would make your access pattern easier to implement:
Move WrittenBy out of the complex attribute and put it alongside the other top-level attributes. This would allow you to use a simple FilterExpression on the WrittenBy attribute.
If the WrittenBy attribute must stay within the Statistics list, make it stand alone (e.g. [{WrittenBy: Kursch}, {Sales: 42},...]). This way, you'd be able to use the contains function in your search.
Create a secondary index with the WrittenBy field in either the PK or SK (whichever makes sense for your data model and access patterns).
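To make the difference between the two matching rules concrete, here is a small plain-Python model. It does not call DynamoDB; the function names are hypothetical and only mimic the behavior described above: a whole-element match (what a list contains test needs) versus a match on one attribute of any element (what an application-side filter can do after the records are retrieved).

```python
def list_contains(statistics, element):
    """Whole-element match: every attribute of the element must be equal."""
    return element in statistics


def any_written_by(statistics, author):
    """Attribute match: true if any element has WrittenBy equal to author."""
    return any(entry.get("WrittenBy") == author for entry in statistics)


# Sample item shaped like the record in the question.
item = {
    "Artist": "Blind Guardian",
    "Track": "Mirror Mirror",
    "Statistics": [{"Sales": 42, "WrittenBy": "Kursch"}],
}

# A partial object does not match as a whole element ...
partial_match = list_contains(item["Statistics"], {"WrittenBy": "Kursch"})
# ... but the complete object does, and so does the attribute-level check.
full_match = list_contains(item["Statistics"], {"Sales": 42, "WrittenBy": "Kursch"})
author_match = any_written_by(item["Statistics"], "Kursch")
```

This is why splitting WrittenBy into its own list element makes the whole-element test usable.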

Retrieving arrays of nested information in AppSync schema

I have worked out a fairly complex chain of DynamoDB resolvers on a GraphQL AppSync query. What I am curious to know is if I could have possibly designed this in a way to require fewer DynamoDB queries.
Here is my GraphQL Schema:
type Tag {
PartitionKey: ID!
SortKey: ID!
TagName: String!
TagType: String
}
type Model {
PartitionKey: ID!
Name: String
Version: Int
FBX: String
# ms since epoch
CreatedAt: AWSTimestamp
Description: String
Tags: [Tag]
}
type Query {
GetAllModels(count: Int, nextToken: String): PaginatedModels!
}
This is the query that I am doing:
query GetAllModels{
GetAllModels {
Models {
PartitionKey
Name
Version
CreatedAt
Description
Tags {
TagName
TagType
}
}
}
}
My DynamoDB table is set up as so:
PartitionKey | SortKey   | TagName | TagType | ModelName | Description
Model-0      | Model-0   |         |         | ModelZero | Blah Blah
Model-0      | Tag-Pine  |         |         |           |
Model-0      | Tag-Apple |         |         |           |
Tag-Pine     | Tag-Pine  | Pine    | Tree    |           |
Tag-Apple    | Tag-Apple | Apple   | Fruit   |           |
So this is what my resolvers do:
GetAllModels scans with two filters: one for PartitionKey beginning with 'Model-' and another for SortKey beginning with 'Model-'. This gets all Models.
Next there is a resolver attached to 'Tags' in the Model object. It queries with two expressions: one for PartitionKey = source.PartitionKey and a second for SortKey begins_with 'Tag-'. This gets me all of the tags on a model.
Next there are two resolvers on the Tag object, one on TagName and another on TagType. Each does a direct GetItem to get its value, with PartitionKey = source.SortKey and SortKey = source.SortKey set as the keys.
So each scanned Model ends up firing off 3 more queries to DynamoDB. This seems a bit excessive to me, but I cannot see any other way to do it. Is there some way to get both TagName and TagType in one query?
Is there a better way to approach this?
I see a few things that I would personally change. The first is that I would avoid the nested DynamoDB scan operations. At least one of these can be replaced with a much faster query operation. The second is that I would consider rethinking how you are storing the data. Currently, there is no good way to list model objects.
Why is there no good way to list model objects?
Assuming each model object will have multiple tags then you are going to have a table that is sparsely populated by model objects. i.e. out of 100 rows you may have 20 - 50 models depending on how many tags the average model has. In DynamoDB, a table is split up based on the partition key causing rows that share the same partition key to be stored near each other to speed up query operations. With your setup where the Partition Key is essentially the unique id of a single model object this means that we can easily get a single model object. You can also quickly get the tags for a single object since those records are nearby as well.
The issue.
The DynamoDB scan operation looks at each partition one at a time, reads as many records as the request's limit allows (or all of them if the limit is sufficiently large), and only then applies the filter expression before returning the final result. This means you may ask for the first 10 models but, since the limit is applied before the scan filter, you may very well get back only 1 model (if that one model had 9 or more tags, which would exhaust the limit while DynamoDB was reading the first partition). This may seem strange when coming from many other database systems, and it is an important consideration of DynamoDB's design.
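The limit-before-filter behavior can be sketched with a small plain-Python model. This is not the DynamoDB API; `scan_page` and the sample rows are hypothetical and only imitate the ordering described above: read up to the limit first, filter afterwards.

```python
def scan_page(rows, limit, keep, start=0):
    """Model one scan page: read up to `limit` rows, then apply the filter."""
    page = rows[start:start + limit]
    matches = [row for row in page if keep(row)]
    # A continuation position is returned only when unread rows remain.
    next_start = start + limit if start + limit < len(rows) else None
    return matches, next_start


def is_model(row):
    """Filter used in the question: both keys begin with 'Model-'."""
    return row[0].startswith("Model-") and row[1].startswith("Model-")


# One model with nine tags, followed by a second model with one tag.
rows = [("Model-0", "Model-0")]
rows += [("Model-0", f"Tag-{n}") for n in range(9)]
rows += [("Model-1", "Model-1"), ("Model-1", "Tag-0")]

# Asking for 10 rows yields a single model, because the nine tag rows
# consumed the rest of the limit before the filter ran.
first_page, token = scan_page(rows, 10, is_model)
second_page, end_token = scan_page(rows, 10, is_model, start=token)
```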
Here are two solutions to address this concern:
1. Store Models in one table and Tags in another.
NoSQL databases like DynamoDB allow you to store many types of data in the same table, but there is nothing wrong with splitting them out. Traditionally it can be a pain to work with multiple tables in a NoSQL database that lacks a join operation or something similar, but fortunately we can use GraphQL to "join" data for us. With this approach, the Model table has a single partition key named "id" and your GetAllModels resolver is still a scan, but this time on the Model table. This way the table is not sparse and you will get 10 models when you ask for 10 models. The Tag table should have a partition key of modelId and a sort key of tagId. You would then have a resolver on the Model.tags field that does a query against the Tag table and looks for rows with modelId == $ctx.source.id.
This is essentially how @model and @connection work in the new GraphQL Transform tooling launched as part of the Amplify CLI. You can see more here, although the docs are, as of writing, still being improved. https://aws-amplify.github.io/amplify-js/media/api_guide
2. Store Models and Tags in the same table but change the key structure.
This approach works if you can reliably say that you will have less than 10GB of data per data type (e.g. Model and Tag). For this approach you have a single table with a partition key of Type and a sort key of id. When you create objects, you create them with a Type (e.g. "Tag" or "Model") and a unique id (like a uuid). To list objects of the same type, you do a DynamoDB query operation on the partition key of the type to list, e.g. "Tag" or "Model". You can then use GSIs to efficiently look up related objects. In your case you would store a "modelId" in every Tag object. You would then make a GSI using "modelId" as the partition key. To list all the tags for a given model, you could then do a DynamoDB query operation against that GSI.
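The key structure of the second solution can be pictured with an in-memory Python model. Nothing here talks to DynamoDB; the dictionaries, function names and sample items are assumptions chosen to mirror the description: the main table is keyed by Type then id, and a second mapping plays the role of the GSI keyed on modelId.

```python
from collections import defaultdict

# Main table: partition key "Type", sort key "id".
table = defaultdict(dict)
# Stand-in for a GSI whose partition key is "modelId".
tags_by_model = defaultdict(dict)


def put_item(item):
    """Store an item and keep the modelId index in step for Tag items."""
    table[item["Type"]][item["id"]] = item
    if item["Type"] == "Tag":
        tags_by_model[item["modelId"]][item["id"]] = item


def list_by_type(item_type):
    """Models a query on the table's partition key (the type)."""
    return list(table[item_type].values())


def tags_for_model(model_id):
    """Models a query against the modelId index."""
    return list(tags_by_model[model_id].values())


put_item({"Type": "Model", "id": "m1", "Name": "ModelZero"})
put_item({"Type": "Tag", "id": "t1", "modelId": "m1", "TagName": "Pine", "TagType": "Tree"})
put_item({"Type": "Tag", "id": "t2", "modelId": "m1", "TagName": "Apple", "TagType": "Fruit"})
```

Listing models then touches only Model items, and each tag lookup returns TagName and TagType together in one read.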
I'm sure there are many more ways to do this but hopefully this helps point in the right direction.

Define unique columns on ManyToMany in Doctrine

I'm trying to add unique columns on a pivot table created via a ManyToMany association.
I found this page of the documentation explaining how to generate a database unique constraint on some columns with this example:
/**
 * @Entity
 * @Table(name="ecommerce_products",uniqueConstraints={@UniqueConstraint(name="search_idx", columns={"name", "email"})})
 */
class ECommerceProduct
{
}
But this only works if I create the pivot table via a third entity and, in my case, I created the pivot table using a ManyToMany relation (in the same fashion as this code).
Is there a way to add unique columns on pivot table while still using ManyToMany or do I need to rely on a third entity?
While the @Table annotation offers a uniqueConstraints option, @JoinTable does not. Thus, if you want to add a unique constraint on your association table, you will have to explicitly create another entity.
That being said, the default join table should not need anything more than the default configuration set up by Doctrine. Currently, when adding a ManyToMany association, the join table is composed of two fields and a composite primary key relying on both fields is created.
If your association table only contains the two basic fields referring to both sides of your association (which is necessarily the case if you use @ManyToMany), the composite primary key should be all you need.
Here is the generated SQL for the basic example where a User has a ManyToMany association with Group (from this section of the documentation):
CREATE TABLE users_groups (
user_id INT NOT NULL,
group_id INT NOT NULL,
PRIMARY KEY(user_id, group_id)
) ENGINE = InnoDB;
ALTER TABLE users_groups ADD FOREIGN KEY (user_id) REFERENCES User(id);
ALTER TABLE users_groups ADD FOREIGN KEY (group_id) REFERENCES Group(id);
As you can see, everything is properly set up with a composite primary key which will ensure that there can't be duplicate entries for the couple (user_id, group_id).
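The effect of the composite primary key can be checked with a short sketch using Python's sqlite3 module. The table mirrors the generated SQL above, without the foreign keys; the `link` helper is a hypothetical name for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_groups ("
    " user_id INTEGER NOT NULL,"
    " group_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, group_id))"
)


def link(user_id, group_id):
    """Insert one association row; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users_groups VALUES (?, ?)", (user_id, group_id)
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate (user_id, group_id).
        return False
```

Calling `link(1, 1)` twice succeeds the first time and is rejected the second time, while `link(1, 2)` is still accepted.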
Of course there is another alternative, Alan!
If you need a zero-to-zero relationship, the only alternative is to define a unique constraint on each pk in the aggregated table, so that Doctrine can figure out the zero-to-zero relationship.
The problem is that the Doctrine developers didn't consider zero-to-zero relationships, so the only alternative is a ManyToMany relationship with one unique constraint per pk.
If you have doubts about the final state of the Doctrine implementation of your E-R model, I strongly recommend mysql-workbench-schema-exporter. With this PHP tool, you can easily export your MySQL Workbench E-R schema to working Doctrine classes, so you can easily explore all your alternatives ;-)
Hope this helps

DynamoDB : Global Secondary Index utilisation in queries

I come from an RDBMS background and started using DynamoDB recently.
I have the following DynamoDB table with three Global Secondary Indexes (GSI):
Id (primary key), user_id (GSI), event_type (GSI), product_id (GSI), rate, create_date
I have the following four query patterns:
a) WHERE event_type=?
b) WHERE event_type=? AND product_id=?
c) WHERE product_id=?
d) WHERE product_id=? AND user_id=?
I know in MySQL I need to create following indexes to optimize above queries :
composite index (event_type,product_id) : for queries "a" and "b"
composite index (product_id,user_id) : for queries "c" and "d"
My question is: if I create three GSIs for the 'event_type', 'product_id' and 'user_id' fields in DynamoDB, will query patterns "b" and "d" utilize these three independent GSIs?
First, unlike an RDBMS, DynamoDB doesn't choose a GSI based on the fields used in the filter expression (there is no SQL optimizer to pick an appropriate index based on the fields used in the query).
You have to query the GSI directly to get the data. You can refer to the GSI query page to understand more about this.
You can create two GSIs:
1) Event type
2) Product id
Make sure to include the other required fields in each GSI, especially product id, user id and any other fields you need. This way, when you query the GSI, you get all the fields required to fulfill the use case. As long as the query uses the GSI's key field, you can include the other fields in the filter expression to filter the data. This ensures that you don't create unnecessary GSIs, which require additional space and cost.
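The "query one index, filter on the rest" idea can be sketched with a plain-Python model. It is not the DynamoDB API; `build_index`, `query_index` and the sample events are hypothetical. Each index is chosen explicitly by the caller, and the remaining conditions are applied as a filter on the rows that index returns.

```python
events = [
    {"Id": 1, "user_id": "u1", "event_type": "view", "product_id": "p1"},
    {"Id": 2, "user_id": "u2", "event_type": "view", "product_id": "p2"},
    {"Id": 3, "user_id": "u1", "event_type": "buy", "product_id": "p1"},
]


def build_index(items, key):
    """Group items by one attribute, standing in for a GSI on that key."""
    index = {}
    for item in items:
        index.setdefault(item[key], []).append(item)
    return index


def query_index(index, key_value, **filters):
    """Look up by the index key, then filter on the other attributes."""
    candidates = index.get(key_value, [])
    return [
        item for item in candidates
        if all(item.get(name) == value for name, value in filters.items())
    ]


event_type_index = build_index(events, "event_type")
product_index = build_index(events, "product_id")

# Pattern b: event_type = ? AND product_id = ?
pattern_b = query_index(event_type_index, "view", product_id="p1")
# Pattern d: product_id = ? AND user_id = ?
pattern_d = query_index(product_index, "p1", user_id="u1")
```

The filter step still reads every row under the chosen key, so it pays off mainly when that key already narrows the result well.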

Open JPA how do I get back results from foreign key relations

Good morning. I have been looking all over trying to answer this question.
If you have a table that has foreign keys to another table, and you want results from both tables, in basic SQL you would do an inner join on the foreign key and get all the information you requested. When you generate your JPA entities, you get a @OneToOne, @OneToMany, @ManyToMany, @ManyToOne, etc. annotation over your foreign key columns. I have @OneToMany over the foreign keys and a corresponding @ManyToOne over the primary key in the related table column. I also have a @joinedON annotation over the correct column... I also have a basic named query that selects everything from the first table.
Will I need to do a join to get the information from both tables, as I would in basic SQL? Or will the fact that I have those annotations pull those records back for me? To be clear, if I have table A, which is related to table B by a foreign key relationship, and I want the records from both tables, I would join table A to B on the foreign key:
Select * From A inner Join B on A.column2 = B.column1
Or other some-such non-sense (Pardon my sql if it is not exactly correct, but you get the idea)...
That query would have selected all columns from A and B where those two selected columns...
Here is my named query that I am using....
@NamedQuery(name="getQuickLaunch", query = "SELECT q FROM QuickLaunch q")
This is how I am calling that in my stateless session bean...
try
{
System.out.println("testing 1..2..3");
listQL = emf.createNamedQuery("getQuickLaunch").getResultList();
System.out.println("What is the size of this list: number "+listQL.size());
qLaunchArr = listQL.toArray(new QuickLaunch[listQL.size()]);
}
Now that call returns all the columns of table A, but it lacks the columns of table B. My first instinct would be to change the query to join the two tables... But that makes me wonder what the point of using JPA is if I am just writing the same queries I would be writing anyway, only in a different place. Plus, I don't want to overlook something simple. So what say you, Stack Overflow enthusiasts? How does one get back all the data of a joined query using JPA?
Suppose you have a Person entity with a OneToMany association to the Contact entity.
When you get a Person from the entityManager, calling any method on its collection of contacts will lazily load the list of contacts of that person:
person.getContacts().size();
// triggers a query select * from contact c where c.personId = ?
If you want to use a single query to load a person and all its contacts, you need a fetch join in the JPQL query:
select p from Person p
left join fetch p.contacts
where ...
You can also mark the association itself as eager-loaded, using @OneToMany(fetch = FetchType.EAGER), but then every time a person is loaded (via em.find() or any query), its contacts will also be loaded.
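At the SQL level, the difference between lazy loading and a fetch join comes down to how many statements are issued. The sketch below uses Python's sqlite3 module rather than JPA, with made-up person and contact tables, to contrast one query per person with a single left join; the function names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, personId INTEGER, value TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO contact VALUES (1, 1, 'ann@example.com'), (2, 1, '555-0100');
""")


def load_lazily():
    """One query for the persons, then one extra query per person."""
    queries = 1
    result = {}
    persons = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
    for person_id, name in persons:
        rows = conn.execute(
            "SELECT value FROM contact WHERE personId = ? ORDER BY id",
            (person_id,),
        )
        result[name] = [row[0] for row in rows]
        queries += 1
    return result, queries


def load_with_join():
    """A single left join returns persons together with their contacts."""
    result = {}
    rows = conn.execute(
        "SELECT p.name, c.value FROM person p "
        "LEFT JOIN contact c ON c.personId = p.id ORDER BY p.id, c.id"
    )
    for name, value in rows:
        result.setdefault(name, [])
        if value is not None:  # persons without contacts yield a NULL value
            result[name].append(value)
    return result, 1
```

Both functions return the same data; the first issues one statement per person on top of the initial query, the second issues exactly one.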