AWS Step Functions - check if multiple entries in DynamoDB are updated - amazon-web-services

I have a Step Function for calculating the operating cost of departments. Each department consists of employees. Each employee is an individual record in DynamoDB with the fields EmpId, Salary, and Salary_Status (Processed, Pending, Not_Eligible).
Example of the department operating cost Step Function (it runs once per department):
Start -> Step 1: updateEmployeeSalaryLambda -> Step 2: wait for employee salaries to be updated (DynamoDB) -> doFoo() -> End
Is there a way to do the DynamoDB check from the Step Function directly, i.e. check whether all employees in a particular department have Salary_Status == Paid?
Thanks for all the help

There are two ways of doing it.
Introduce a sort key. A possible sort key design for your application is [Department]#[Salary_Status]. You can query it from another Lambda within your Step Function. If you know how many employees are in the department, you can check whether that number matches the count of paid employees found. Another option is to count how many employees don't have the Paid status (refer to this).
If you need a more complex query, you can also leverage a secondary index.
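As a rough sketch of the count comparison described above, the Lambda could run two COUNT queries and compare them. This assumes a hypothetical global secondary index named dept-status-index whose partition key is Department and whose sort key, DeptStatus, holds the [Department]#[Salary_Status] value. Those names are illustrative only and do not come from the question. The table argument is anything with a query method that accepts DynamoDB Query parameters, such as a boto3 Table resource.

```python
def count_employees(table, department, status=None, index_name="dept-status-index"):
    """Count a department's items, optionally restricted to one Salary_Status.

    Index and attribute names are hypothetical; adjust them to your schema.
    """
    expression = "Department = :d"
    values = {":d": department}
    if status is not None:
        expression += " AND DeptStatus = :s"
        values[":s"] = f"{department}#{status}"

    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": expression,
        "ExpressionAttributeValues": values,
        "Select": "COUNT",  # return only counts, not the items themselves
    }
    total = 0
    while True:
        page = table.query(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return total
        # A single Query call reads at most 1 MB, so keep paging.
        kwargs["ExclusiveStartKey"] = last_key


def all_employees_paid(table, department, done_status="Paid"):
    """True when the department has employees and every one has done_status."""
    total = count_employees(table, department)
    paid = count_employees(table, department, done_status)
    return total > 0 and total == paid
```

The Lambda can return this boolean in its output, and a Choice state in the Step Function can then either proceed to doFoo() or go back to a Wait state and retry.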

Related

How to query data in AWS AppSync in a specific range then sort its result by another key?

I created a table named BlogAuthor in AWS DynamoDB with the following structure:
authorId | orgId | age | name
Later I need to run a query like this: get all authors from the organization with id = orgId123 and age between 30 and 50, then sort them by name in alphabetical order.
I'm not sure it's possible to perform such a query in DynamoDB (later I'll apply it in AppSync), so my first solution was to create an index (GSI) with partitionKey=orgId and sortKey=age (the final name is orgId-age-index).
Next, when I query DynamoDB with partitionKey orgId=orgId123, sortKey age=[30;50] and no filter, I get a list of authors. However, there is no way to sort that list by name in the same query.
I then tried another solution: create a new index with partitionKey=orgId and sortKey=name, then query (not scan) DynamoDB with partitionKey orgId=orgId123, leave the sortKey value empty (because we only want to sort by name rather than get a specific name), and filter on age in the range [30;50]. This solution seems to work, but I noticed that the filter is applied to the result list. For example, the result list has 100 items, but after the age filter is applied there may be 70 items remaining, or none. I would like it to always return 100 items.
Could you please tell me whether there is anything wrong with my approaches? Or is it possible to make such a query in DynamoDB?
Another (small) question concerns connecting that table to an AppSync API: if such a query is not possible in DynamoDB, is it also not possible in AppSync?
You are not going to be able to do everything you want in a single DynamoDB query.
Option 1:
You can do what you want as long as you are ok with sorting objects on the client. This would work for organizations with a relatively small number of people.
Pros:
Allows you to efficiently query users in a particular organization within an age range.
Cons:
Results are not sorted by name on the server.
Option 2:
Pros:
Allows you to paginate through users at an organization that are ordered by the name.
Cons:
You cannot efficiently get all users in an organization within an age range. You would effectively be scanning the index and would need multiple round trip calls.
Option 3:
A third option would be to stream information from DynamoDB into Elasticsearch using DynamoDB Streams and AWS Lambda. Once the data is in Elasticsearch, you can do much more advanced queries. You can see more information on the Elasticsearch search APIs here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html.
Pros:
Much more powerful query engine.
Cons:
More overhead with the DynamoDB stream and AWS Lambda function.
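Option 1 might look roughly like the sketch below: query the orgId-age-index from the question for the age range, collect every page, and sort by name on the client. The function name is made up for illustration, and table is assumed to be an object with a DynamoDB-style query method, such as a boto3 Table resource.

```python
def authors_by_age_sorted_by_name(table, org_id, min_age, max_age,
                                  index_name="orgId-age-index"):
    """Fetch authors of one organization within an age range, sorted by name."""
    kwargs = {
        "IndexName": index_name,
        # Placeholders avoid any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#org = :org AND #age BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#org": "orgId", "#age": "age"},
        "ExpressionAttributeValues": {":org": org_id, ":lo": min_age, ":hi": max_age},
    }
    authors = []
    while True:
        page = table.query(**kwargs)
        authors.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key

    # The index returns items ordered by age, so name ordering happens here.
    return sorted(authors, key=lambda author: author["name"].casefold())
```

Because the whole range has to be loaded before it can be sorted, this only suits organizations where the matching set stays small, as noted under Option 1.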

How to solve "hot" hash key issue (space skewed data) in DynamoDB?

For example, I am using DynamoDB to store product purchase records. The hash key is product ID and the range key is purchase time.
Some popular products can have a lot of purchase records (space skewed) so that read/write requests can get throttled for "hot" partitions while other partitions are not using full throughput.
How to solve this problem and still be able to get latest purchase records? Thanks!
You can use a cache solution in order to achieve this.
You can follow the guidelines when designing a table to cache the popular items:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.CachePopularItems
My solution for this is to use ElastiCache (Redis). You can create a list that represents the latest purchases per product and trim it to the latest 100 purchases for each product, for example:
LPUSH product:100 2016-08-13:purchaseId
LTRIM product:100 0 99
This trims the list to the latest 100 items.
I hope this helps.
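The same LPUSH and LTRIM pattern, together with a read path for the latest purchases, could be wrapped roughly as follows. This is a sketch that assumes client is a redis-py style client exposing lpush, ltrim and lrange. The function names and the keep and limit parameters are made up for illustration.

```python
def record_purchase(client, product_id, purchased_on, purchase_id, keep=100):
    """Push a purchase onto the product's list and keep only the newest entries."""
    key = f"product:{product_id}"
    client.lpush(key, f"{purchased_on}:{purchase_id}")
    client.ltrim(key, 0, keep - 1)  # indexes are inclusive, so 0..keep-1


def latest_purchases(client, product_id, limit=10):
    """Return up to `limit` of the most recent purchases, newest first."""
    return client.lrange(f"product:{product_id}", 0, limit - 1)
```

Reads of the latest purchases for a popular product then hit Redis instead of the hot DynamoDB partition. With a real redis-py client, the returned values are bytes unless the client was created with decode_responses=True.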

map with list as value in SOQL

I'm working on a small project in Salesforce where a customer can be the master in a master-detail relationship to more than one project object.
Projects have a currency field 'Max_Budget__c', and what I want to do is calculate the max budget per customer by adding up the Max_Budget__c of each project. This means that I need to collect a List per customer but also need the customer stored so I can update its Max_Budget__c field.
I was thinking about storing it in a Map with Customer__c as key and a Set as value, but I'm not sure how to do this in a single SOQL query, or whether I'm even looking in the right direction. Any ideas?
Thanks in advance
Let's suppose you have multiple project objects, such as project1, project2, etc.
Each has a many-to-one relationship with customer,
i.e. many project1 records to a single customer record,
many project2 records to a single customer record, and so on.
So each customer would have many project1 records, many project2 records, etc.
To calculate the max budget per customer, you need to calculate the max budget from all project1 records, the max budget from all project2 records, etc. for each customer, and add them together.
So the requirement is simple: write a roll-up summary for each child (each project object) and add all the roll-up summaries into a custom field.
I hope this helps. Please let me know if anything else is needed.
Regards,
Naveen

Django webapp - tracking financial account information

I need some coding advice as I am worried that I am creating, well, bloated code that is inefficient.
I have a webapp that keeps track of a company's financial data. I have a table called Accounts with a collection of records corresponding to the typical financial accounts such as revenue, cash, accounts payable, accounts receivable, and so on. These records are simply name holders to be pointed at as foreign keys.
I also have a table called Account_Transaction which records all the transactions of money in and out of all the accounts in Accounts. Essentially, the Account_Transaction table does all the heavy lifting while pointing to the various accounts being altered.
For example, when a sale is made, two records are created in the Account_Transaction table. One record to increase the cash balance and a second record to increase the revenue balance.
Trans Record 1:
Acct: Cash
Amt: 50.00
Date: Nov 1, 2011
Trans Record 2:
Acct: Revenue
Amt: 50.00
Date: Nov 1, 2011
So now I have two records, but they each point to a different account. Now if I want to view my cash balance, I have to look at each Account_Transaction record and check if the record deals with Cash. If so, add/subtract the amount of that record and move to the next.
During a typical business day, there may be upwards of 200-300 transactions like the one above. As such, the Account_Transaction table will grow pretty quickly. After a few months, the table could have a few thousand records. Granted this isn't much for a database, however, every time the user wants to know the current balance of, say, accounts receivable, I have to traverse the entire Account_Transaction table to sum up all records that deal with the account name "Accounts Receivable".
I'm not sure I have designed this in the most optimal manner. I had considered creating a distinct table for each account (one for "Cash", another for "Accounts Receivable" another for "Revenue" etc...), but with that approach I was creating 15-20 tables with the exact same parameters, other than their name. This seemed like poor design so I went with this Account_Transaction idea.
Does this seem like an appropriate way to handle this kind of data? Is there a better way to do this that I should really be adopting?
Thanks!
Why do you need to iterate through all the records to figure out the balance of Accounts Receivable? Am I missing something, or can't you just use .filter in the Django ORM to select only the records you need?
As your records grow, you could add some date filtering to your reports. In most cases, your accountant will only want numbers for this quarter, month, etc., not the entire history.
Add an index to that column to optimize selection, and then check out Django's aggregation to sum up values in your database.
Finally, you could do some conservative caching to speed things up for "quick view" style reports where you just want a total number very quickly. You need to be careful not to serve stale totals, though, so resetting that cache on any change to the records would be a must.
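To illustrate the filter, index and aggregate advice without needing a Django project, here is a minimal sketch using Python's built-in sqlite3 module instead of the ORM. The table and column names are assumptions, and amounts are stored as integer cents to avoid rounding issues. In Django, the equivalent would be a .filter(...) on the transaction model followed by .aggregate(Sum(...)) on the amount field.

```python
import sqlite3

# Hypothetical schema mirroring the Accounts / Account_Transaction tables.
SCHEMA = """
CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE account_transaction (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account(id),
    amount_cents INTEGER NOT NULL,
    date TEXT NOT NULL
);
-- Index on the filtered columns so the database does not scan every row.
CREATE INDEX idx_txn_account_date ON account_transaction (account_id, date);
"""


def account_balance(conn, account_name, since=None):
    """Sum an account's transactions in the database, optionally from a date on."""
    sql = (
        "SELECT COALESCE(SUM(t.amount_cents), 0) "
        "FROM account_transaction AS t "
        "JOIN account AS a ON a.id = t.account_id "
        "WHERE a.name = ?"
    )
    params = [account_name]
    if since is not None:
        sql += " AND t.date >= ?"
        params.append(since)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO account (id, name) VALUES (?, ?)",
                 [(1, "Cash"), (2, "Revenue")])
# Illustrative data: the 50.00 sale from the question plus one later entry.
conn.executemany(
    "INSERT INTO account_transaction (account_id, amount_cents, date) VALUES (?, ?, ?)",
    [(1, 5000, "2011-11-01"), (2, 5000, "2011-11-01"), (1, 2500, "2011-12-15")],
)

print(account_balance(conn, "Cash"))                      # 7500
print(account_balance(conn, "Cash", since="2011-12-01"))  # 2500
```

The summing happens inside the database, so the application never loops over the transaction rows itself.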
Why don't you keep track of the current balance in the Accounts table? The Account_Transaction table could then be used only to view the transaction history.

Sorting by price with Amazon AWS

As part of an ItemSearch operation with Amazon AWS, one can sort the results by price. Does anyone know which actual price the sort is performed by? Sale Price? Regular Price?
Assuming you're sorting them in lowest-to-highest order, it lists them by the lowest available price per item. There's list price, new price (sale price), and used price to choose from. If I remember correctly, you can customize your query to include any or all of these prices. Whichever of these three is lowest determines the sort placement for an item. See the results on ThriftCart.com (generated by AWS query) for an example.