Pentaho: send mail to different groups based on columns - Kettle

I have a requirement to send mail to different users based on two columns in a DB table (System Owner and System Owner Email).
The table will contain columns like ProcessID, SystemOwner, Email.
Once every month, Pentaho needs to read the table, group all ProcessIDs that belong to a specific system owner, and send an email.
There are 300+ system owners, and creating a Filter Rows branch for every one of them is not feasible.
Any suggestion on how to perform the above with a loop, without creating filter rows per system owner, and send the mail?

I have prepared a solution for you. You can download all the .ktr & job files from Here.
The main job is 'mainJOb.ktr'. You will need to fill in your own SMTP credentials.
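If it helps to see the logic outside Kettle, here is a minimal Python sketch of what a job like this boils down to: read the table once, group ProcessIDs per (SystemOwner, Email) pair, and send one mail per owner. The database connection, table name, SMTP host and credentials below are placeholders, not values from the actual job.

import smtplib
import sqlite3  # stand-in for your real DB driver
from collections import defaultdict
from email.message import EmailMessage

conn = sqlite3.connect("processes.db")  # placeholder database
rows = conn.execute("SELECT ProcessID, SystemOwner, Email FROM ProcessTable")

groups = defaultdict(list)  # (owner, email) -> [ProcessID, ...]
for process_id, owner, email in rows:
    groups[(owner, email)].append(process_id)

with smtplib.SMTP("smtp.example.com", 587) as smtp:  # placeholder SMTP host
    smtp.starttls()
    smtp.login("user", "password")  # placeholder credentials
    for (owner, email), process_ids in groups.items():
        msg = EmailMessage()
        msg["From"] = "reports@example.com"
        msg["To"] = email
        msg["Subject"] = f"Monthly process report for {owner}"
        msg.set_content("Your processes:\n" + "\n".join(map(str, process_ids)))
        smtp.send_message(msg)

In Kettle itself this shape is typically a Table Input feeding Sort rows and Group By steps (concatenating the ProcessIDs per owner), with the grouped rows copied to the result and a mail step executed once per result row, which avoids creating a Filter Rows step per system owner.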

Related

How to partition DynamoDB table with time-series data from users of different organizations?

I have an application being built using AWS AppSync, with a primary focus of sending telemetry data from a mobile application. I am stuck on how to partition and structure the DynamoDB tables for this, as the users of the application belong to different organizations; within those organizations there will be admins who are able to view the data specific to their organization.
OrganizationA
-->Admin # View all the telemetry data
---->User # Send the telemetry data from their mobile application
Based on some research from these resources:
Link 1.
Link 2.
The advised approach is to create tables for individual periods, i.e., a table for every day with the telemetry readings.
Example (not sure what pk is in this example):
The way I am planning to separate the users with AWS Cognito is by attaching custom attributes when the user signs up, such as Organization and Role (Admin or User) as per this answer, and then using a Pre-Signup Lambda Trigger.
How should I achieve this?
Since you really don't need users from one organization to read data from another organization, and for all your access patterns you will always know the organization id, that attribute should be a factor in partitioning: either at the table level, or at the partition key level.
Then you have to determine whether you can simply use the organization id as a partition key, or whether you need to partition further -- say, by concatenating the organization id and the hour value for each sample. This will depend on the amount of data you expect each organization to generate in a given day. The tradeoff is more granular partitioning vs. the cost of querying for data.
If organizations generate small amounts of data each day (say, a few events an hour), then just use the organization id as the partition key. Otherwise, partition the data further.
In all of the above, the sort key should probably be the timestamp of the events, with either second or millisecond precision depending on your needs. That way your queries can retrieve ordered time-series data.
Keep in mind that when you make queries, you may need to execute multiple queries and stitch the results together in your application, as the requested range may span multiple partitions, or even multiple tables.
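To make the second variant concrete, here is a minimal Python (boto3) sketch, assuming a partition key pk of the form "<orgId>#<hour>" and a numeric millisecond timestamp as the sort key sk; the table and attribute names are illustrative, not from the question.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Telemetry")  # placeholder table name

def query_org_hour(org_id, hour, start_ms, end_ms):
    # One partition holds one organization-hour; a longer time range means
    # one query per hour, with the results stitched together by the caller.
    resp = table.query(
        KeyConditionExpression=Key("pk").eq(f"{org_id}#{hour}")
        & Key("sk").between(start_ms, end_ms)
    )
    return resp["Items"]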

AWS DynamoDB Multi-Tenant Table Schema

I want to create a SaaS solution using AWS for multiple tenants, each having multiple users.
Each of the users (e.g., Admin, Manager, Supervisor) has to upload their department users' data (e.g., Name, SurName, Email, Phone, etc.; these user attributes together are identified by a HashKey).
In short, I have to store all users' information across multiple departments of multiple companies, including each user's HashKey.
How can this be done using DynamoDB? Can someone help in creating a table schema?
The query pattern mostly used is: a tenant will provide a HashKey and will want to fetch all the user information under it, or part of it, by providing the HashKey and some fields.
You can use just one table and "separate" the data using a DynamoDB index.
Create an index that is responsible for storing the tenant hash (id, whatever) and then use it to fetch the data when needed.
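A minimal sketch of that idea in Python (boto3): the base table is keyed by a record id, and a GSI partitioned on tenantId keeps each query confined to one tenant's rows. All table, index and attribute names here are illustrative assumptions.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("TenantUsers")  # placeholder table

# Store one department user; the HashKey from the question is kept as a
# plain attribute alongside the tenant id.
table.put_item(Item={
    "recordId": "r-0001",                # base-table partition key
    "tenantId": "acme",                  # GSI partition key
    "hashKey": "a1b2c3",                 # GSI sort key (the question's HashKey)
    "Name": "Jane", "SurName": "Doe",
    "Email": "jane@example.com", "Phone": "555-0100",
})

# Tenant "acme" fetches all user information stored under one HashKey.
resp = table.query(
    IndexName="tenantId-hashKey-index",  # illustrative GSI name
    KeyConditionExpression=Key("tenantId").eq("acme") & Key("hashKey").eq("a1b2c3"),
)
print(resp["Items"])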

DynamoDB table/index schema design for querying multi-valued attributes

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
email: "foo#foo.com",
... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "main", // sort key
email: "foo#foo.com",
... other attributes ...
}
If the user adds a second, third, etc. email, then add a new item for each email, like this:
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "Email-2", // sort key
email: "bar#bar.com"
// no more attributes
}
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
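For example, a lookup against that index could look like this Python (boto3) sketch; the table and index names are illustrative assumptions:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Users")  # placeholder table name

def find_user_id_by_email(email):
    resp = table.query(
        IndexName="email-index",  # assumed GSI with email as partition key
        KeyConditionExpression=Key("email").eq(email),
    )
    # Both the "main" item and any "Email-N" item carry the same userId.
    items = resp["Items"]
    return items[0]["userId"] if items else None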
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
The same approach (new items with the same userId but different sort keys) could be used for other one-user-has-many-values data that needs to be Query-able.
Is this a good way to do it? Is there a better way?
Justin, for searching on attributes I would strongly advise against using DynamoDB. I am not saying you can't achieve this; however, I see a few problems that will eventually come into your path if you go down this route.
Using a sort key per email id will result in duplicate records for the same user, i.e., if a user has registered 5 emails, that implies 5 records in your table with the same schema and attributes except for the email id attribute.
What if a new use case comes along in the future where you also want to search for a user based on some other attribute (for example, cell phone number, assuming a user may have more than one cell phone number)?
DynamoDB has a hard limit on the number of secondary indexes you can create for a table, i.e., 5.
Thus, with more and more search criteria, this solution will easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, I can suggest a few options that you may choose from, based on your requirements/budget, to address this problem using a combination of databases.
Option 1. DynamoDB as the primary store and AWS Elasticsearch as secondary storage [preferred]
Store the user records in a DynamoDB table (let's call it UserTable) as and when a user registers.
Enable DynamoDB streams on the UserTable table.
Build an AWS Lambda function that reads from the table's stream and persists the records in AWS Elasticsearch (a sketch of this function follows these steps).
Now, in your application, use DynamoDB for fetching user records by id. For all other search criteria (like searching on emailId, phone number, zip code, location, etc.), fetch the records from AWS Elasticsearch. AWS Elasticsearch by default indexes all the attributes of your record, so you can search on any field with millisecond latency.
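A minimal Python sketch of such a Lambda handler, assuming the stream key is named userId; the Elasticsearch endpoint and index name are placeholders, and a production version would also sign its requests (e.g., SigV4) and batch its writes.

import json
import urllib.request

ES_ENDPOINT = "https://search-users-demo.example.com"  # placeholder endpoint
INDEX = "users"

def handler(event, context):
    # Each stream record describes one INSERT/MODIFY/REMOVE on UserTable.
    for record in event["Records"]:
        user_id = record["dynamodb"]["Keys"]["userId"]["S"]  # assumed key name
        url = f"{ES_ENDPOINT}/{INDEX}/_doc/{user_id}"
        if record["eventName"] == "REMOVE":
            req = urllib.request.Request(url, method="DELETE")
        else:
            # NewImage is the item in DynamoDB attribute-value format,
            # e.g. {"email": {"S": "foo@foo.com"}} -- flatten it naively.
            image = record["dynamodb"]["NewImage"]
            doc = {k: next(iter(v.values())) for k, v in image.items()}
            req = urllib.request.Request(
                url,
                data=json.dumps(doc).encode(),
                headers={"Content-Type": "application/json"},
                method="PUT",
            )
        urllib.request.urlopen(req)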
Option 2. Use AWS Aurora [less preferred]
If your application has a relational use case where the data are related, you may consider this option. Just to call it out, Aurora is a SQL database.
Since this is relational storage, you can organize the records in multiple tables and join them on the primary keys of those tables.
I suggest the 1st option because:
DynamoDB will provide you with durable, highly available, low-latency primary storage for your application.
AWS Elasticsearch will act as secondary storage, which is also durable, scalable and low-latency.
With AWS Elasticsearch, you can run any search query against your data. You can also do analytics on it. A Kibana UI is provided out of the box, which you can use to plot analytical data on a dashboard (how user growth is trending, how many users belong to a specific location, user distribution by city/state/country, etc.).
With DynamoDB streams and AWS Lambda, you will be syncing these two databases in near real time [within a few milliseconds].
Your application will be scalable, and the search feature can further be enhanced to filter on multi-level attributes. [One such example: search all users who belong to a given city.]
Having said that, now I will leave this up to you to decide. 😊

Cross-service references in DB

I am building a service-oriented system, with multiple services and applications.
Currently I am not sure how to handle DB references between resources from multiple services and databases.
For example, I have a users service, where I can define all users and their roles.
Next I have a products service, where I can define my products, their prices and other information.
I also have an invoicing service, which is used to create invoices. This service will use information from the previous two services, linking products and users to invoices. Now I am not sure what the best approach for this is.
Do I just save the product ID and user ID that I got from the other two services, without any referential integrity?
If I do this, then I will have a problem when generating reports, because at generation time I will need to send a lot of requests to the products service to get the names and prices of the products on an invoice. Same for users.
Or do I create a products table in my invoicing application, and store the name and price of a product at the moment of invoice creation?
If I go with this approach, then in case the price or name of a product changes, I will have inconsistent data across my applications?
Is there some well-known pattern for this kind of problem? That is, what is the best solution?
Cross-service references in the DB are a common challenge for data integrity between multiple web services, especially when we are talking about real-time access.
There are two approaches for your case:
1- Database replication across your servers
I suppose that you have each application hosted on a separate server, so I can name your servers Users_server, Products_server and Invoices_server.
In your example, your invoice web service needs to grab data from the Users & Products servers; in this case you can create replicas of your Users database and Products database on your Invoices_server.
This way you can run your join queries on the same server and get data from multiple databases.
Query example :
SELECT *
FROM UsersDB.User u
JOIN InvoicesDB.Invoice i ON u.Id = i.ClientId
2- Main database replication
As a 1st step, you have to replicate all your databases onto one main server, which we can call Base_server, and which basically contains all the databases from all your services.
Then you can build an internal web service for your application that provides the needed data in just one call; this answers your question about generating reports.
In other words, you will make one call to the main Base service instead of making 2 or 3 calls to your separate services.
Note: as backend developers we use this organization as a best practice while building large bundle-based applications; we create a base bundle and then create service bundles which rely on the base bundle.
If your services are already live, we may need more details about the technologies and database types you are using in order to give you a more accurate solution.
Just because you are using SOA doesn't mean you abandon database integrity. Continue to use referential integrity where your database design requires it.
At the service level, you can have each service be responsible for returning identity information for the entities which it owns. This identity information may or may not be the actual primary key from the database, but it will be used by the clients of the service as though it were the actual primary key.
When a client wants to create an invoice, it will call the User service and receive a User entity, which will contain a User Identifier. It will call the Product service and receive a set of products, each with a product identifier. It will then call the Invoice service to create an invoice, passing the user identifier and the product identifiers. This will likely return an invoice identifier.
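A rough Python sketch of that client flow, with hypothetical endpoints and payload shapes (none of these URLs or field names come from the question):

import requests  # third-party HTTP client, assumed available

USERS = "https://users.internal/api"      # hypothetical service endpoints
PRODUCTS = "https://products.internal/api"
INVOICES = "https://invoices.internal/api"

# Resolve identifiers from the services that own the entities...
user = requests.get(f"{USERS}/users/42").json()
products = requests.get(f"{PRODUCTS}/products?ids=7,9").json()

# ...then hand only those identifiers to the invoicing service, which
# returns an invoice identifier of its own.
invoice = requests.post(f"{INVOICES}/invoices", json={
    "userId": user["id"],
    "productIds": [p["id"] for p in products],
}).json()
print(invoice["id"])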
You can (and probably should) enforce integrity by making productId and userId foreign keys in your invoice table; then your DB makes sure the referenced entities exist. Reports should join tables, not query services for each item. This assumes a central DB shared across the system.

How to execute a user-defined stored procedure using Sync Framework

My database has 6-7 tables on the server side. I want only a list of 10-50 customers, which I get from a stored procedure (selecting records by joining the 6-7 tables).
I created an application (used in both online & offline environments) which syncs tables from server to client and vice versa, and which displays those customer names in a combo box (records from the stored procedure).
I am using Sync Framework, but these 6-7 tables contain a huge number of records, around 67k. I don't want to sync those 6-7 tables; I want to sync only the list of customers for the logged-in user.
I created one table like:
Customer_List (user_Id, Customer_Name, customer_Id)
and the stored procedure returns the list of customers matching the above table structure.
I want to sync this table with my stored procedure using Sync Framework.
How can I do this?
There is no publicly available API in Sync Framework for you to specify/invoke a custom stored proc.
It seems like what your SP does can be represented as a filter,
e.g., [side].[CustomerId] IN (SELECT CustomerId FROM Customer_List WHERE User_Id = @User_Id)