Most Efficient Way to Store and Access Data with Multiple Attributes

I've been racking my brain over the past several days trying to find a solution to a storage/access problem.
I currently have 300 unique items with 10 attributes each (all attributes are currently stored as strings, although some of them are numerical). I am trying to store them programmatically and access their attributes efficiently by item ID. I have tried string arrays, vectors, maps, and multimaps with no success.
Goal: to be able to access an item and one of its attributes quickly and efficiently by unique identifier.
The closest I have come to success is:
string item1[] = {"attrib1","attrib2","attrib3","attrib4","attrib5","attrib6","attrib7","attrib8","attrib9","attrib10"};
I was then able to access an element on demand by calling item1[0]; but this is VERY inefficient (particularly when trying to loop through 300 items) and was very hard to work with.
Is there a better way to approach this?

If I understand your question correctly, it sounds like you should have some sort of class to hold the attributes, which you would put into a map that has the item ID as the key.
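A minimal sketch of that idea, assuming integer item IDs and a handful of made-up attribute names (name, category, quantity, price are placeholders, not taken from the question). std::unordered_map is used here for average constant-time lookup; std::map would work the same way if ordered iteration matters.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical attribute set: the field names are placeholders, not from the question.
struct Item {
    std::string name;
    std::string category;
    int quantity;    // numerical attributes can be stored as numbers, not strings
    double price;
};

// Table keyed by the unique item ID.
using ItemTable = std::unordered_map<int, Item>;

// Returns a pointer to the item with the given ID, or nullptr if it is absent.
const Item* find_item(const ItemTable& table, int id) {
    auto it = table.find(id);
    return it == table.end() ? nullptr : &it->second;
}

// Builds a small sample table (made-up data for illustration only).
ItemTable make_sample_table() {
    ItemTable table;
    table[17] = Item{"widget", "hardware", 4, 2.5};
    table[42] = Item{"gadget", "tools", 1, 19.99};
    return table;
}
```

With this layout, find_item(table, 42)->price reads one attribute of one item directly, and looping over all 300 items is a plain range-based for over the table.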

Related

AWS Elasticsearch exceeded limit of total fields in index

I'm running Elasticsearch on AWS, and haven't quite understood how to properly address this issue.
Right now I have the items stored in DynamoDB and use DynamoDB Streams to send the items to a Lambda function, which then uses dynamodb-stream-elasticsearch to send them to Elasticsearch when they are created or updated.
Some properties can be objects with many nested properties, which can themselves be objects. I first started getting this error when these new fields were added. Due to the nature of these items, these new properties will need to be searchable in the future.
Initially the default index limit had not been changed. After my first search on how to fix this, I increased the limit to 5000, and I have now had to increase it to 12000. The instance type is t2.small.elasticsearch. The AWS Elasticsearch console is already reporting the instance health as yellow after I increased the index limit.
Which is the best way to tackle this sort of situation?
Does moving to a larger instance type fix it, or is it a matter of breaking up the item and having multiple separate indexes? If the solution is the latter, is there a good tutorial/guide on how to do this with this set-up (AWS DynamoDB/Elasticsearch)?

By default, the maximum number of fields in an index is 1000, but you can increase that by changing the index.mapping.total_fields.limit index setting.
See other settings to prevent mappings explosion: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/mapping.html#mapping-limit-settings
Which is the best way to tackle this sort of situation?
Using the flattened datatype could be a solution:
The nested type is a specialised version of the object datatype that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened datatype, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened datatype for this use case is a better option.

How to store my data?

I am currently looking for a way to store my data so that I can look it up quickly.
What currently seems to be the best idea is to use a hash map.
Reasons:
To identify which item I am looking for, one needs to extract a certain set of features from the data, which can be used as a key to retrieve the item.
The problem here is that the features could contain noise, making the key <-> item match not 100% perfect.
In such a case I need to find the key which matches the feature set the best,
or a set of feature sets that match the feature set the best.
But how do I do that?
The feature vector is 512 long.
Would sorting be a possibility? But how should I sort such a matrix?
And given the complications I am facing, is using a hash map the correct choice, or is there something better?
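Since noisy features rule out exact key matching, one baseline to compare any other structure against is a plain nearest-neighbour scan over the stored feature vectors. The sketch below assumes float features and squared Euclidean distance as the similarity measure; both are assumptions, as the question does not say how "matches best" is defined, and the names Feature, squared_distance and nearest_index are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t kDim = 512;  // feature length stated in the question
using Feature = std::array<float, kDim>;

// Squared Euclidean distance between two feature vectors
// (assumed metric; the question does not specify one).
float squared_distance(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < kDim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the index of the stored feature closest to `query`,
// or stored.size() if `stored` is empty.
std::size_t nearest_index(const std::vector<Feature>& stored, const Feature& query) {
    std::size_t best = stored.size();
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < stored.size(); ++i) {
        const float d = squared_distance(stored[i], query);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

The scan costs one distance computation per stored item, so it only illustrates the matching step; whether it is fast enough depends on how many items are stored.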

Should I use a relational database or write my own search tree

Basically my whole career is based on reading questions here, but now I'm stuck since I do not even know how to ask this correctly.
I'm designing an SQLite database which is meant for constructing data sheets out of existing data sheets. People like reusing stuff and I want to manage this with a DB and an interface. A data sheet has reusable elements like pictures, text, formulas, sections, lists, front pages and variables. Sections can contain elements; this can be handled with recursive CTEs (thanks to "mu is too short" for that hint). Texts, formulas, lists etc. can contain variables. In the end I want to be able to manage variables, which must be unique per data sheet, and manage elements, which form an ordered list making up the data sheet. So when selecting a data sheet I must know which elements it contains and which variables are used within those elements. I must be able to create a new data sheet by reusing elements and/or creating new ones if desired.
So far I have come up with (see also the link to the screenshot at the bottom):
a list of variables
which (several of them) can be contained in elements
a list of elements
elements make up the data sheets
a list of data sheets
Reading examples like
Store array in SQLite that is referenced in another table
How to store a list in a column of a database table
already gives me helpful hints, such as that I need to create, for each data sheet, a new atomic list containing the elements and their positions. The same goes for the variables which are referenced by each element. But the trouble starts when I want to keep it consistent and actually query it.
How do I connect the variables which are contained within elements and the elements that are contained within the data sheets? How do I check, when one element or variable is modified, which data sheets need to be recompiled because they use the same variables and/or elements?
The more I think about this, the more it sounds like I need to write my own search tree based on an object-oriented inheritance class structure and must not use databases. Can somebody convince me that a database is the right tool for my issue?
I learned about databases once, but that was quite some time ago, and to be honest the university lectures were not good, since we never created a database of our own but only worked on existing ones.
To be more specific, my knowledge has led to the following solution so far, but I do not know how to correctly query for a list of data sheets when changing the content of one value, since the reference is a text containing the name of a table:
screenshot since I'm a greenhorn
Update:
I think I have to look for unique connections, so it would end up as many-to-many tables. I'm not perfectly happy with it, but I think I can go on with it.
Still a greenhorn: how are you guys getting correct highlighting for SQL?

Data structure for managing contact-collision info between particles

I am performing some particle simulations in C++ and I need to keep a list of contact info between particles. A contact is a data struct containing some data related to the contact. Each particle is identified by a unique ID. Once a contact is lost, it is deleted from the list. The bottleneck of the simulation is computing the force (a routine inside the contacts), and I have found that the way the contact list is organised has an important impact on overall performance.
Currently, I am using a C++ unordered_map (hash map), whose key is a single integer obtained from a pairing function applied to the two unique IDs of the particles, and whose value is the contact itself.
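For concreteness, the keying scheme could look roughly like the sketch below. It assumes 32-bit particle IDs and uses bit-packing as the pairing function; the actual pairing function is not given here, and Contact, contact_key, key_to_ids and ContactMap are illustrative names.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical contact payload; the real struct holds whatever the force routine needs.
struct Contact {
    double overlap;
};

// Packs two 32-bit particle IDs into one 64-bit key. Ordering the IDs first
// makes the key symmetric, so (a, b) and (b, a) refer to the same contact.
std::uint64_t contact_key(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t lo = std::min(a, b);
    const std::uint32_t hi = std::max(a, b);
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Recovers the two particle IDs from a key, smaller ID first.
std::pair<std::uint32_t, std::uint32_t> key_to_ids(std::uint64_t key) {
    return {static_cast<std::uint32_t>(key & 0xFFFFFFFFu),
            static_cast<std::uint32_t>(key >> 32)};
}

// Contact list: key built from the particle pair, value is the contact itself.
using ContactMap = std::unordered_map<std::uint64_t, Contact>;
```

Insertion is contacts[contact_key(i, j)] = contact, and a lost contact is removed with contacts.erase(contact_key(i, j)).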
I would like to know if there is a better approach to this problem (efficiently organising the list of contacts while keeping the info of the particles they are related to), since I chose my approach just because I read that a hash map is fast for both insertion and deletion.
Thanks in advance.

Add to list within document MongoDB

I have a database where I store player names that belong to people who have certain items.
The items have IDs and subIDs.
The way I am currently storing everything is:
Each ID has its own collection
Within the collection there is a document for each subID.
The document for each subID is laid out like so:
{
"itemID":itemID,
"playerNames":[playerName1, playerName2, playerName3]
}
I need to be able to add to the list of playerNames quickly and efficiently.
Thank you!

If I understood your question correctly, you need to add items to the "playerNames" array. You can use one of the following operators:
If the playerNames array should contain only unique elements, use $addToSet
Otherwise, use $push
