How does DocumentDB index existing documents with a new index? - amazon-web-services

I have 70k documents and created a new index. Calling explain("executionStats") with the new index's field in the match condition shows the following:
Or may it just take some time? It was created 8 hours ago. Can I somehow check the progress of the indexing?

Are you using $elemMatch? If so, DocumentDB will default to a collection scan: https://docs.aws.amazon.com/documentdb/latest/developerguide/functional-differences.html#functional-differences.indexes
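Separately, on checking progress: DocumentDB index builds can be monitored through currentOp(). Below is a minimal sketch that parses a currentOp response; the exact shape of the "inprog" entries and the "msg" text is an assumption based on MongoDB-style output, so verify against your cluster's actual response.

```python
# Sketch: extracting index-build progress from a currentOp() response.
# The "inprog"/"msg" shape here is an assumption (MongoDB-style output);
# check it against what your DocumentDB cluster actually returns.

def index_build_progress(current_op_result):
    """Return the status message of any in-flight index build, or None."""
    for op in current_op_result.get("inprog", []):
        msg = op.get("msg", "")
        if "Index Build" in msg:
            return msg
    return None

# With a live cluster (pymongo assumed installed) you would run:
#   from pymongo import MongoClient
#   client = MongoClient("<your-docdb-endpoint>")
#   status = index_build_progress(client.admin.command("currentOp"))

# Offline demonstration with a stubbed currentOp response:
stub = {"inprog": [{"msg": "Index Build: inserting keys 35000/70000 50%"}]}
print(index_build_progress(stub))
```

If nothing matching shows up in currentOp long after the build started, the index may have finished but the planner may still be choosing a collection scan for other reasons (such as the $elemMatch limitation above).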

Related

DynamoDB update one column of all items

We have a huge DynamoDB table (~4 billion items), and one of the columns is a kind of category (string). We would like to either map this column to a new one, category_id (integer), or update the existing one from string to int. Is there a way to do this efficiently without creating a new table and populating it from the beginning? In other words, can we update the existing table in place?
Is there a way to do this efficiently
Not in DynamoDB; that use case is not what it's designed for.
Also note that unless you're talking about the hash or sort key (of the table or of an existing index), DDB doesn't have columns, only attributes.
You'd run Scan() (in a loop, since each call returns at most 1 MB of data), then Update each item one at a time. (Note: BatchWriteItem can write 25 items per call, but that only saves network overhead, and it does full puts rather than in-place updates.)
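The Scan-then-Update loop described above can be sketched as follows. The table, key, and attribute names (Products, pk, category, category_id) and the string-to-int mapping are hypothetical; the client is injected so the pagination logic can be exercised without AWS credentials.

```python
# Sketch of the Scan-then-Update loop with a boto3-style client.
# Names are illustrative; pass a real boto3 DynamoDB client in production.

def migrate_category(client, table_name, category_ids):
    """Scan every item (following 1 MB pagination) and write a numeric
    category_id derived from the string `category`, one item at a time."""
    kwargs = {"TableName": table_name}
    while True:
        page = client.scan(**kwargs)
        for item in page["Items"]:
            cat = item["category"]["S"]
            client.update_item(
                TableName=table_name,
                Key={"pk": item["pk"]},
                UpdateExpression="SET category_id = :id",
                ExpressionAttributeValues={":id": {"N": str(category_ids[cat])}},
            )
        if "LastEvaluatedKey" not in page:   # no more pages
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# Offline demonstration with a stubbed two-page scan:
class _StubClient:
    def __init__(self):
        self.updates = []
        self._pages = [
            {"Items": [{"pk": {"S": "1"}, "category": {"S": "books"}}],
             "LastEvaluatedKey": {"pk": {"S": "1"}}},
            {"Items": [{"pk": {"S": "2"}, "category": {"S": "toys"}}]},
        ]
    def scan(self, **kwargs):
        return self._pages.pop(0)
    def update_item(self, **kwargs):
        self.updates.append(kwargs)

stub = _StubClient()
migrate_category(stub, "Products", {"books": 1, "toys": 2})
print(len(stub.updates))  # items from both pages were updated
```

On a 4-billion-item table this is a long, write-heavy job, which is why the streams-based migration below is usually the better option.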
If the attribute in question is used as a key in the table or in an existing index, then a new table is your only option. Here's a good article with a strategy for migrating a production table:
1. Create a new table (let's call it NewTable) with the desired key structure, LSIs, and GSIs.
2. Enable DynamoDB Streams on the original table.
3. Associate a Lambda with the stream that pushes each record into NewTable. (This Lambda should trim off the migration flag from Step 5.)
4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has the attributes: primary key and Migrated (see Step 5).
5. Scan the GSI created in the previous step (or the entire table) using the filter FilterExpression = "attribute_not_exists(Migrated)", and update each item with a migration flag (e.g. "Migrated": { "S": "0" }), which sends it to DynamoDB Streams. Use the UpdateItem API to ensure no data loss occurs.
NOTE: You may want to increase the write capacity units on the table during the updates.
6. The Lambda will pick up all items, trim off the Migrated flag, and push them into NewTable.
7. Once all items have been migrated, repoint the code to the new table.
8. Remove the original table and the Lambda function once you're happy that all is good.
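The stream-triggered Lambda from step 3 can be sketched roughly like this, assuming a standard DynamoDB Streams event shape (NewImage under the dynamodb key). The target table is injected so the handler can be exercised offline; in AWS you would pass a boto3 Table resource.

```python
# Sketch of the step-3 Lambda: copy each stream record into NewTable,
# dropping the "Migrated" flag added in step 5. Event shape follows
# DynamoDB Streams; the table object is injected for offline testing.

def make_handler(new_table):
    def handler(event, context=None):
        for record in event["Records"]:
            if record["eventName"] == "REMOVE":
                continue  # deletions could be mirrored separately if needed
            image = dict(record["dynamodb"]["NewImage"])
            image.pop("Migrated", None)   # trim the migration flag
            new_table.put_item(Item=image)
    return handler

# Offline demonstration:
class _StubTable:
    def __init__(self):
        self.items = []
    def put_item(self, Item):
        self.items.append(Item)

table = _StubTable()
event = {"Records": [{"eventName": "MODIFY",
                      "dynamodb": {"NewImage": {"pk": {"S": "1"},
                                                "Migrated": {"S": "0"}}}}]}
make_handler(table)(event)
print(table.items[0])  # the flag is gone: {'pk': {'S': '1'}}
```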

How to increase aws dynamodb index limit from 5

Is it possible to increase the index from 5 to 15?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
Secondary Indexes Per Table You can define a maximum of 5 local
secondary indexes and 5 global secondary indexes per table.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
The Query operation finds items based on primary key values. You can
query any table or secondary index that has a composite primary key (a
partition key and a sort key).
If I understood this correctly, you can set one main hash key, 5 local secondary indexes, and another 5 global ones... and you can only query against an index.
We are thinking about using DynamoDB for a NoSQL database, but we are completely stumped by this. In Mongo, Elastic, or Solr you can query by pretty much any document attribute you want.
In this app we already have 15 attributes we know we will want to query against, but DynamoDB only offers the ability to index 5... unless I am mistaken. Is there another way to query aside from against a preset index?
You can define a maximum of 5 local secondary indexes.
There is an initial quota of 20 global secondary indexes per table. To request a service quota increase, see https://aws.amazon.com/support.
Source: Secondary Indexes # Developer Guide
Unfortunately, the service quota of 5 local secondary indexes (LSIs) cannot be extended.
When you have more than 20 attributes to query, DynamoDB cannot do it efficiently. You would have to use Scan, which evaluates every item in the table. The alternatives are to move to a different database or to use Amazon Elasticsearch Service to index the attributes for searching.
The limit on the number of global secondary indexes per table has been increased to 20.
You can cut a support case in case you need more than 20 global secondary indexes for a DynamoDB table.
https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dynamodb-increases-the-number-of-global-secondary-indexes-and-projected-index-attributes-you-can-create-per-table/
It turns out that the answer was to wait. DynamoDB now supports 20 global secondary indexes. According to:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
There is an initial limit of 20 global secondary indexes per table. To
request a service limit increase see https://aws.amazon.com/support.
Docker image still has the 5 GSI limit :(
https://hub.docker.com/r/amazon/dynamodb-local
There is a way to work around the limit on the number of indexes by overloading an indexed column. For example, you may store multiple data attributes in the same partition key or sort key. A problem arises when the values of those different attributes can overlap; in that case you can prepend a prefix that distinguishes between the attributes.
Let's look at an example. Say we have a data set with attributes like user-name, employee-name, and company-name, and we want to store them all in the same indexed column (say, the partition key of a global secondary index). Some values for the attributes may overlap, so we "tag" them with a prefix: user#name, employee#name, and company#name.
This allows us to query with a condition like begins_with("user#") without mixing up the different attributes, while still having them all indexed.
More information is in the official AWS documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
Hope this helps.
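A minimal sketch of the prefix scheme described above, with illustrative names; the matches() helper stands in for what the server-side begins_with condition does over the overloaded index column.

```python
# Sketch of GSI key overloading: several logical attributes share one
# indexed column, disambiguated by a "kind#" prefix. Names are made up.

def overloaded_key(kind, value):
    return f"{kind}#{value}"           # e.g. "user#alice"

def matches(kind, key):
    # Mirrors begins_with(indexed_column, "user#") done by DynamoDB
    return key.startswith(kind + "#")

keys = [overloaded_key("user", "alice"),
        overloaded_key("employee", "alice"),
        overloaded_key("company", "acme")]

users = [k for k in keys if matches("user", k)]
print(users)  # only the user entry, despite the overlapping value "alice"
```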

Update all values in a DynamoDB attribute

Is there an approach for updating all items in an attribute (column)?
I'm updating the values one by one in a for loop, but it takes a while. I can easily update a whole row in my table thanks to the DynamoDB mapper, but I cannot find similar functionality for an attribute.
No, the only way is to scan over the hash space and update each item.

How can i query to get the multiple values in SimpleDB (AWS)

[Screenshot of the SimpleDB domain, with one attribute highlighted.]
In that picture I have colored one part: an attribute called "deviceModel" that contains more than one value. I want to write a query against my domain that returns the itemName()s whose deviceModel attribute contains more than one value.
Thanks,
Senthil Raja
There is no direct approach to get what you are asking for; you need to handle it by writing your own piece of code. Running a SELECT query gives you the items' attribute-value pairs, so you need to traverse each itemName() and count the values of the attribute you're interested in.
I think what you are referring to is called multi-valued attributes. When you put a value into an attribute without replacing the existing value, the values accumulate, giving you an array of values under that attribute name.
How you create them depends on the SDK/language you are using for your REST calls; look for the Replace=true/false flag when you set the attribute's value.
Here is the documentation page on retrieving them: http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/ (look under Using Amazon SimpleDB -> Using Select to Create Amazon SimpleDB Queries -> Queries on Attributes with Multiple Values)
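A small sketch of the traversal both answers describe. The result shape here (itemName mapped to repeated name/value pairs) is an assumption about how your SDK hands back Select results; adjust the unpacking to match yours.

```python
# Sketch: SimpleDB returns a multi-valued attribute as repeated
# (name, value) pairs per item. Keep only the items whose "deviceModel"
# carries more than one value. The result shape is illustrative.

def items_with_multiple(results, attr="deviceModel"):
    hits = []
    for item_name, pairs in results.items():
        values = [v for (name, v) in pairs if name == attr]
        if len(values) > 1:
            hits.append(item_name)
    return hits

results = {
    "item1": [("deviceModel", "iPhone"), ("deviceModel", "iPad")],
    "item2": [("deviceModel", "Galaxy")],
}
print(items_with_multiple(results))  # ['item1']
```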

How do I update a value in a row in MySQL using Connector/C++

I have a simple database and want to update an int value. I initially do a query and get back a ResultSet (sql::ResultSet). For each of the entries in the result set I want to modify a value that is in one particular column of a table, then write it back out to the database/update that entry in that row.
It is not clear to me from the documentation how to do that. I keep seeing "insert" statements along with updates, but I don't think that is what I want: I want to keep most of the row of data intact and just update one column.
Can someone point me to some sample code or other clear reference/resource?
EDIT:
Alternatively, is there a way to tell the database to increment an int value in a particular field (row/column) by some amount?
EDIT:
So what is the typical way people use MySQL from C++? The C API, or MySQL++? I guess I chose the wrong API...
From a quick scan of the docs, it appears Connector/C++ is a partial implementation of the Java JDBC API for C++. I didn't find any reference to updatable result sets, so this might not be possible. In Java JDBC, the ResultSet interface supports updating the current row if the statement was created with ResultSet.CONCUR_UPDATABLE concurrency.
You should investigate whether Connector/C++ supports updatable result sets.
EDIT: To update a row you will need a PreparedStatement containing a SQL UPDATE, and then the statement's executeUpdate() method. With this approach you must identify the record to be updated with a WHERE clause. For example
update users set userName='John Doe' where userID=?
Then you would create the PreparedStatement, set the parameter value, and call executeUpdate().
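Connector/C++ follows that same prepared-statement pattern. For a runnable illustration, the sketch below uses Python's standard-library sqlite3 instead (the table and column names are made up). The second statement also addresses the follow-up edit: the database can increment a column in place, with no read-modify-write round trip.

```python
# Sketch of the prepared UPDATE flow, illustrated with stdlib sqlite3.
# Table/column names are hypothetical; the pattern is what transfers
# to Connector/C++'s PreparedStatement + executeUpdate().
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userID INTEGER PRIMARY KEY,"
             " userName TEXT, visits INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'Jane', 0)")

# Parameterized UPDATE, same shape as the example above
conn.execute("UPDATE users SET userName = ? WHERE userID = ?",
             ("John Doe", 1))

# Atomic in-database increment of a single column
conn.execute("UPDATE users SET visits = visits + 1 WHERE userID = ?", (1,))

row = conn.execute("SELECT userName, visits FROM users"
                   " WHERE userID = 1").fetchone()
print(row)  # ('John Doe', 1)
```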