DynamoDB Search with HashKey

I have a table with the following structure:
HashKey: Place Name
Attribute: Place Data Class
Now I would like to search by Place Name. For example, I have the data below:
Newyork {lat: x.xxxxx, lng: x.xxxxxx}
Newzealand {lat: x.xxxxx, lng: x.xxxxxx}
IndialaPolis {lat: x.xxxxx, lng: x.xxxxxx}
When I search with the keyword "new", it should return Newyork and Newzealand. I searched Google for this, and I found that we can only fetch records by their exact HashKey.

When doing a Query, you can only have exact matches on the HashKey.
However, there is also the Scan operation, which you can use along with a FilterExpression. Just note that Query and Scan perform and consume capacity differently; see the differences here.
Here are example parameters you could use on a Scan with begins_with:
{
  table_name: tableName,
  filter_expression: "begins_with(#country, :search)",
  expression_attribute_names: {
    "#country" => "country"
  },
  expression_attribute_values: {
    ":search" => "new"
  }
}
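The same idea in Node.js with the AWS SDK DocumentClient would look roughly like this (a sketch; the table and attribute names are assumptions). One caveat worth noting: begins_with is case-sensitive, so a search value of "new" will not match "Newyork" unless you also store a lowercased copy of the name to filter against.
const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

// A Scan reads (and bills for) the whole table; the filter is applied after the read
documentClient.scan({
  TableName: 'Places',                                  // assumed table name
  FilterExpression: 'begins_with(#name, :search)',
  ExpressionAttributeNames: { '#name': 'PlaceName' },   // assumed attribute name
  ExpressionAttributeValues: { ':search': 'new' }
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Items);
});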
This article is a good place to start.

Thank you @mark-b and @bruno-buccolo. As you said, it is not possible to do a partial-match search on the HashKey field, so I created Elasticsearch indexes for that table manually and update them each time the original record updates.


Cannot create index on non-empty table

I'm currently using AWS Lambda (NodeJS) with AWS QLDB.
The scenario is like this:
I had the first table and its indexes when I deployed the service, so the table and indexes were created. My problem is that once I need to add a new table and its indexes, it can't create the index because there's an existing table.
My workaround to be able to create a new table even if there's an existing table in my ledger is to query the list of tables I have:
const getTables = async (transactionExecutor: TransactionExecutor) => {
  const statement = `SELECT name FROM information_schema.user_tables`;
  return await transactionExecutor.execute(statement);
};
Then I have this condition to check whether the table already exists:
const tables = JSON.stringify(result.getResultList());
if (
  !JSON.parse(tables).some((object): boolean => object.name === process.env.TABLE_NAME)
) {
  console.log('TABLE A NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME);
}
if (
  !JSON.parse(tables).some(
    (object): boolean => object.name === process.env.TABLE_NAME_1,
  )
) {
  console.log('TABLE B NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME_1);
}
I don't know how to do the same check for indexes; I tried using SQL commands in QLDB but it's not working.
I hope you can help me.
Thank you.
I'm not quite sure what your question is (the post title and body hint at different things), but I'm going to do my best to answer.
First, QLDB stores data in Ion, not JSON, so please use the Ion APIs to parse data rather than the JSON ones. The reason your code works at all is that Ion is a superset of JSON and this result set doesn't include types that are unknown to JSON. If, for example, the result set were changed to include an Ion Timestamp, your code would break.
Next, actually getting a list of tables has first class support in the driver. Simply use driver.getTableNames.
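For example, with the Node.js driver (the ledger name here is an assumption):
const { QldbDriver } = require('amazon-qldb-driver-nodejs');

const driver = new QldbDriver('my-ledger'); // assumed ledger name

const listTables = async () => {
  // Resolves to the list of active table names; no PartiQL needed
  const tableNames = await driver.getTableNames();
  console.log(tableNames); // e.g. [ 'TableA', 'TableB' ]
};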
Third, I think you have a question "can I add an index to a non-empty table?". The answer is "no". This is planned functionality and I will update this answer when it is available. UPDATE: Now you can! https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-qldb-launches-index-improvements/
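With that launch, adding an index to an existing table is just another statement. A sketch in the same style as the question's helper (the table and field names are placeholders):
const createIndex = async (transactionExecutor, tableName, field) => {
  // QLDB's PartiQL dialect: CREATE INDEX ON <table> (<field>)
  const statement = `CREATE INDEX ON ${tableName} (${field})`;
  return await transactionExecutor.execute(statement);
};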
Finally, I think you're also asking if there is a way to list indexes on a table in the same way as you can list tables in a ledger. The answer to that is 'yes'. The documents returned in information_schema.user_tables look like this:
{
  tableId: "...",
  name: "THE_TABLE_NAME",
  indexes: [
    {
      expr: "[THE_FIELD_BEING_INDEXED]"
    }
  ],
  status: "ACTIVE"
}
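So a helper in the same style as getTables above can return each table's indexes (a sketch):
const getTablesWithIndexes = async (transactionExecutor) => {
  // Each returned document includes name, status and the indexes array shown above
  const statement = `SELECT name, indexes FROM information_schema.user_tables`;
  return await transactionExecutor.execute(statement);
};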

Azure Maps Search: unique ID and normalized name

I use Azure Maps for city autocomplete on my site.
I use this API method: https://learn.microsoft.com/en-us/rest/api/maps/search/getsearchaddress
Request: https://atlas.microsoft.com/search/address/json?params with these parameters:
query=mosco - I'm looking for Moscow
typeahead=true
api-version=1.0
subscription-key=...my key...
Result is
{
  ...
  results: [
    {
      type: "Geography",
      id: "RU/GEO/p0/116970",
      ...
      address: {
        municipality: "Moscow",
        countryCode: "RU",
        freeformAddress: "Moscow"
      }
    },
    ...
  ],
}
OK, it's Moscow.
But I have a few questions.
What is id? The docs say it is a "property id". Is it persistent? Will Moscow always be "116970"?
How can I get the normalized name of a city?
I can write "Москва" (Moscow in Russian) and it works and the id is the same, but the names in the address object differ (Москва, Moscow).
If I write "mos", the id is the same but the address is "Moskva" (instead of Moscow).
Can I get the name of a geo object by its id?
This is a unique id but is not guaranteed to be persistent. The main purpose of this id is for debugging purposes.
We are aware of the "en" issue and are updating the docs.
I'm sure this is a unique ID, but I want proof from the documentation :)
Problem solved by the parameter language=en-GB; now the result is always "Moscow". I was misled by the manual when I specified only en (it leads to an error). https://learn.microsoft.com/en-us/azure/azure-maps/supported-languages
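A sketch of the fixed request as a hypothetical helper (only language=en-GB is the actual fix; everything else mirrors the parameters above):
// language=en-GB pins the response names to English ("Moscow" rather than "Moskva")
const searchAddress = async (query, subscriptionKey) => {
  const url = 'https://atlas.microsoft.com/search/address/json'
    + '?api-version=1.0'
    + '&typeahead=true'
    + '&language=en-GB'
    + '&query=' + encodeURIComponent(query)
    + '&subscription-key=' + subscriptionKey;
  const response = await fetch(url);
  return response.json();
};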

How can I do a "where in" type query using ember-data

How can I perform a where-in type query using ember-data?
Say I have a list of tags - how can I use the store to query the API to get all relevant records where they have one of the tags present?
Something like this:
return this.store.find('tags', {
  name: {
    "in": ['tag1', 'tag2', 'tag3']
  }
})
There isn't built-in support for something like that, and I don't think it's needed.
The result that you are after can be obtained in two steps.
return this.store.find('posts'); // I guess it's a blog
and then in your controller you use a computed property:
filteredPosts: Ember.computed('model', function() {
  var tags = ['tag1', 'tag2', 'tag3'];
  return this.get('model').filter(function(post) {
    // assumes each post exposes a `tags` array attribute
    return post.get('tags').any(function(tag) {
      return tags.indexOf(tag) !== -1;
    });
  });
})
Update: What if there are tens of thousands of tags?!
Another option is to send the list of tags as a single argument to the back end. You'll have to do a bit of data processing before sending the request and before querying.
return this.store.find('tags', {
  tags: ['tag1', 'tag2', 'tag3'].join(', ')
})
In your API you'll know that the tags argument needs to be converted back into an array before querying the DB (a sketch of this follows below).
So this is better, because you avoid the very expensive nested loop caused by the use of filter (expensive !== bad; it has its benefits).
It is a concern to think that there will be tens of thousands of tags; if those are going to be available in your Ember app they'll have a big memory footprint, and maybe something much more advanced is needed in terms of app design.
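Server-side, that conversion could look something like this (a sketch assuming an Express-style handler; findPostsWithAnyTag is a hypothetical data-access helper):
app.get('/posts', async (req, res) => {
  // "tag1, tag2, tag3" -> ["tag1", "tag2", "tag3"]
  const tags = (req.query.tags || '').split(',').map((t) => t.trim());
  const posts = await findPostsWithAnyTag(tags); // hypothetical DB helper
  res.json(posts);
});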

Drop column in Dynamo DB table

I've been looking through the AWS DynamoDB documentation and the Amazon DynamoDB interface, and it seems like there's no way to remove a column from a table, outside of deleting the entire table with its contents and starting over. Is that true?
If so, why would Amazon not support this?
Try removing all data from that column; it will automatically remove that column.
Using the document client with JavaScript, we can do this:
const paramsUpdate = {
  TableName: tableName,
  Key: { HashKey: 'hashKey' },
  UpdateExpression: 'remove #c',
  ExpressionAttributeNames: { '#c': 'columnName' }
};
documentClient.update(paramsUpdate, (errUpdate) => {
  if (errUpdate) log.error(errUpdate);
});
Here the UpdateExpression uses a remove clause to delete the attribute from that one item.
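To strip the attribute from every item, here's a sketch that pages through a Scan and issues the same update per item (the table name, key shape and attribute name are assumptions):
const removeAttributeFromAllItems = async () => {
  let lastKey;
  do {
    // Fetch one page of keys (assumes HashKey is the only key attribute)
    const page = await documentClient.scan({
      TableName: tableName,
      ProjectionExpression: '#k',
      ExpressionAttributeNames: { '#k': 'HashKey' },
      ExclusiveStartKey: lastKey
    }).promise();
    for (const item of page.Items) {
      await documentClient.update({
        TableName: tableName,
        Key: { HashKey: item.HashKey },
        UpdateExpression: 'remove #c',
        ExpressionAttributeNames: { '#c': 'columnName' }
      }).promise();
    }
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
};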
There is a REMOVE action in the DynamoDB API.
DynamoDB does not have a schema definition, and so there is no such thing as a "column". It also means there is no way to delete all attributes with the same name without iterating over each record.
A solution I recommend is to keep these attributes, and to make your code refer to that same data using a fresh attribute name.
For example, attribute content could become content_v2. It might not look so clean, but it's cheap, quick and your old data would be backed up.
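A minimal sketch of that read path, using the content/content_v2 example names above:
// Prefer the new attribute name, fall back to the legacy one on old items
const getContent = (item) =>
  item.content_v2 !== undefined ? item.content_v2 : item.content;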
Setting all instances of the column value to null clears the column.
In C#, this method does the trick using the persistence framework:
static void RemoveColumn()
{
    // Scan every item of this type (null = no scan conditions)
    var myItems = context.ScanAsync<MyObjectType>(null).GetRemainingAsync().Result;
    // For each item, null out the property and save;
    // the object persistence model then drops the attribute
    myItems.ForEach(myObject =>
    {
        myObject.UnwantedColumn = null;
        context.Save(myObject);
    });
}
Just remove all the data for that one column. On my end it refreshed automatically, though you might have to refresh the page.

Should I denormalize or run multiple queries in DocumentDb?

I'm learning about data modeling in DocumentDb, and here's where I need some advice.
Please see what my documents look like below.
I can take two approaches here both with pros and cons.
Scenario 1:
If I keep the data denormalized (see my documents below) by keeping project team member information (first name, last name, email, etc.) in the same document as the project, I can get the information I need in one query, BUT when Jane Doe gets married and her last name changes, I'd have to update a lot of documents in the Projects collection. I'd also have to be extremely careful to make sure that all collections with documents that contain employee information get updated as well. If, for example, I update Jane Doe's name in the Projects collection but forget to update the TimeSheets collection, I'd be in trouble!
Scenario 2:
If I keep data somewhat normalized and keep only EmployeeId in the project documents, I can then run three queries whenever I want to get a projects list:
Query 1 returns projects list
Query 2 would give me the EmployeeIds of all project team members that appear in the first query
Query 3 for employee information i.e. first, last name, email, etc. I'd use the result of Query 2 to run this one
I can then combine all the data in my application.
The problem here is that DocumentDb seems to have a lot of limitations right now. I may be reading hundreds of projects with hundreds of employees in project teams, and there looks to be no efficient way to get all the employee information whose ids appear in my second query. Again, please keep in mind that I may need to pull hundreds of employee records here. If the following SQL query is what I'd use for employee data, I may have to run it several times to get all the information I need, because I don't think I can have hundreds of OR statements:
SELECT e.Id, e.firstName, e.lastName, e.emailAddress
FROM Employees e
WHERE e.Id = 1111 OR e.Id = 2222
I understand that DocumentDb is still in preview and some of these limitations will be fixed. With that said, how should I approach this problem? How can I efficiently both store/manage and retrieve all project data I need -- including project team information? Is Scenario 1 a better solution or Scenario 2 or is there a better third option?
Here's what my documents look like. First, the project document:
{
  id: 789,
  projectName: "My first project",
  startDate: "9/6/2014",
  projectTeam: [
    { id: 1111, firstName: "John", lastName: "Smith", position: "Sr. Engineer" },
    { id: 2222, firstName: "Jane", lastName: "Doe", position: "Project Manager" }
  ]
}
And here are two employee documents which reside in the Employees collection:
{
  id: 1111,
  firstName: "John",
  lastName: "Smith",
  dateOfBirth: "1/1/1967",
  emailAddresses: [
    { email: "jsmith@domain1.com", isPrimary: "true" },
    { email: "john.smith@domain2.com", isPrimary: "false" }
  ]
},
{
  id: 2222,
  firstName: "Jane",
  lastName: "Doe",
  dateOfBirth: "3/8/1975",
  emailAddresses: [
    { email: "jane@domain1.com", isPrimary: "true" }
  ]
}
I believe you're on the right track in considering the trade-offs between normalizing or de-normalizing your project and employee data. As you've mentioned:
Scenario 1) If you de-normalize your data model (couple projects and employee data together) - you may find yourself having to update many projects when you update an employee.
Scenario 2) If you normalize your data model (decouple projects and employee data) - you would have to query for projects to retrieve employeeIds and then query for the employees if you wanted to get the list of employees belonging to a project.
I would pick the appropriate trade-off given your application's use case. In general, I prefer de-normalizing when you have a read-heavy application and normalizing when you have a write-heavy application.
Note that you can avoid having to make multiple round trips between your application and the database by leveraging DocumentDB's stored procedures (the queries would be performed on the DocumentDB server side).
Here's an example stored procedure for retrieving the employees belonging to a specific projectId:
function(projectId) {
  /* the context object can be accessed inside stored procedures and triggers */
  var context = getContext();
  /* access all database operations - CRUD, query against documents in the current collection */
  var collection = context.getCollection();
  /* access the HTTP response body and headers from the procedure */
  var response = context.getResponse();

  /* Callback for processing the query on projectId */
  var projectHandler = function(documents) {
    var i;
    for (i = 0; i < documents[0].projectTeam.length; i++) {
      // Query for the employees
      queryOnId(documents[0].projectTeam[i].id, employeeHandler);
    }
  };

  /* Callback for processing the query on employeeId */
  var employeeHandler = function(documents) {
    response.setBody(response.getBody() + JSON.stringify(documents[0]));
  };

  /* Query on a single id and call back */
  var queryOnId = function(id, callbackHandler) {
    collection.queryDocuments(collection.getSelfLink(),
      'SELECT * FROM c WHERE c.id = "' + id + '"', {},
      function(err, documents) {
        if (err) {
          throw new Error('Error' + err.message);
        }
        if (documents.length < 1) {
          throw 'Unable to find id';
        }
        callbackHandler(documents);
      }
    );
  };

  // Query on the projectId
  queryOnId(projectId, projectHandler);
}
Even though DocumentDB supports only limited OR statements during the preview, you can still get relatively good performance by splitting the employeeId lookups into a bunch of asynchronous server-side queries.
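For completeness, a sketch of invoking that stored procedure from the Node.js documentdb client (the endpoint, key and stored procedure link are placeholders):
var DocumentClient = require('documentdb').DocumentClient;
var client = new DocumentClient('https://myaccount.documents.azure.com:443/', { masterKey: '...' }); // assumed endpoint and key

// sprocLink is the self-link of the stored procedure registered above,
// e.g. 'dbs/<db>/colls/<coll>/sprocs/<name>' (placeholder)
client.executeStoredProcedure(sprocLink, ['789'], function (err, result) {
  if (err) throw err;
  console.log(result); // the concatenated employee documents set via response.setBody
});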