How can you delete Contacts that don't have any recent Cases using Bulk Deletion Jobs? - microsoft-dynamics

I'm trying to figure out the criteria for a Bulk Deletion Job in Dynamics 365 to enforce GDPR.
Essentially, I need to delete Contact records that haven't been used for more than 13 months.
The main part of the Criteria I've been using is where 'Modified On is Older Than 13 months'.
The problem with this is that the 'Modified On' field only updates when you change the details of the Contact record.
For example, a Contact might've been created in 2019 and the details have remained the same since then - therefore the Modified On date is 2019. However, this Contact only just emailed us last week and so their Contact is still in use but this isn't reflected in the Modified On date.
The criteria I have come up with to try and get around this is as follows:
Unfortunately, this is still returning Contacts that don't match the criteria I need.
Is it possible to get this criteria using the advanced find functionality, or will it require something external?

First of all, the usual recommendation is not to delete the records or you'll lose historic data, and you will end up with several records in the system with broken relationships.
If you have already backed up the historic data, or it has no value to the business, then, unless someone can correct me, I think what you are trying to do won't work with Advanced Find. This is because of Advanced Find's limitations with related entities and the fact that we cannot add conditions to the "Does not contain data" relationship option.
Your criteria is not giving the results you expect because it is requesting the following (assuming you have selected the Contact table in advanced find):
Select Contacts whose modifiedOn date is older than 13 months and that
they are linked as the customer of a case whose modifiedOn date is
older than 13 months and these cases have a related email whose
modifiedOn date is older than 13 months
So, what you are retrieving are old Contacts that had at least one Case some time ago, not Contacts with no recent activity (a Contact can have an old Case and may or may not also have a recent one).
Another thing to consider is that you are using the "Customer" column [Cases(Customer)] from the Case table. Since the Customer column can hold Accounts or Contacts, and depending on how your organization handles this column, you might want to use "primarycontactid" [Cases(Contact)] or another custom column (I've seen some designs where a custom column is used to track the Contact).
Last year I had a request from an organization to automatically merge thousands of Contacts following some rules. I ended up writing a console application, and one of its steps was to check whether the Contacts had any interactions (Leads, Opportunities, Cases and Activities) and count them, so that when the merge was performed I could choose the Contact with the most related records as the main Contact. A similar approach could be used in your scenario.
You can create an efficient query with QueryExpressions, but if you are not used to them you can use this FetchXML in the console application:
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="true">
  <entity name="contact">
    <attribute name="fullname" />
    <attribute name="telephone1" />
    <attribute name="contactid" />
    <order attribute="fullname" descending="false" />
    <link-entity name="incident" from="primarycontactid" to="contactid" link-type="outer" alias="case">
      <attribute name="incidentid" />
      <filter type="and">
        <condition attribute="modifiedon" operator="last-x-months" value="13" />
      </filter>
    </link-entity>
  </entity>
</fetch>
It will give you a list of Contacts. If a Contact has a Case that was modified in the last 13 months, you'll get a GUID in the case.incidentid column; a null value in case.incidentid means there are no recent Cases, so the Contact is a candidate for deletion.
Keep in mind that:
You might need to update the FetchXML to your needs.
You'll need to handle paging on the query results if there are more than 5,000 Contacts.
Depending on the number of Contacts in the system you'll want to create different batches to process them because it can take a while to complete.
It would be a good idea to create a report to validate the Contacts before deleting them.
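As a rough illustration of the filtering step described above, the sketch below takes rows shaped like the FetchXML result (one dict per row, with a "case.incidentid" key that is None when no recent Case matched) and picks out the Contacts that are candidates for deletion. The row shape, the sample data and the helper name contacts_without_recent_cases are assumptions for illustration only; retrieving and paging the rows from Dynamics is not shown.

```python
from typing import Iterable


def contacts_without_recent_cases(rows: Iterable[dict]) -> list[str]:
    """Return contact IDs that never appear with a non-null case.incidentid."""
    has_recent: dict[str, bool] = {}  # contact ID -> saw a recent Case?
    for row in rows:
        contact_id = row["contactid"]
        # A missing key is treated the same as a null value.
        recent = row.get("case.incidentid") is not None
        has_recent[contact_id] = has_recent.get(contact_id, False) or recent
    # dicts keep insertion order, so the result follows first-seen order.
    return [cid for cid, recent in has_recent.items() if not recent]


# Hypothetical rows, as they might look after flattening the query result.
sample_rows = [
    {"contactid": "c1", "fullname": "Ann", "case.incidentid": "i1"},
    {"contactid": "c2", "fullname": "Bob", "case.incidentid": None},
    {"contactid": "c1", "fullname": "Ann", "case.incidentid": "i2"},
    {"contactid": "c3", "fullname": "Cy"},
]

candidates = contacts_without_recent_cases(sample_rows)
print(candidates)
```

Writing the candidate list to a report first, as suggested above, gives someone a chance to review it before anything is deleted.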


DynamoDB query all users sorted by name

I am modelling the data of my application to use DynamoDB.
My data model is rather simple:
I have users and projects
Each user can have multiple projects
There can be millions of users, and thousands of projects per user.
My access pattern is also rather simple:
Get a user by id
Get a list of paginated users sorted by name or creation date
Get a project by id
Get projects by user sorted by date
My single table for this data model is the following:
I can easily implement all my access patterns using table PK/SK and GSIs, but I have issues with number 2.
According to the documentation and best practices, to get a sorted list of paginated users:
I can't use a scan, as sorting is not supported
I should not use a GSI with a PK that would put all my users in the same partition (e.g. GSI PK = "sorted_user", SK = "name"), as that would make my single partition hot and would not scale
I can't create a new entity of type "organisation", put all users in there, and query by PK = "org", as that would have the same hot partition issue as above
I could bucket users and use write sharding, but I don't really know how I could practically query paginated sorted users, as bucket PKs would possibly need to be random, and I would have to query all buckets to be able to sort all users together. I also thought that bucket PKs could be alphabetical letters, but that could create hot partitions as well, as the letter "A" would probably be hit quite hard.
My application model is rather simple. However, after having read all docs and best practices and watched many online videos, I find myself stuck with the most basic use case that DynamoDB does not seem to be supporting well. I suppose it must be quite common to have to get lists of users in some sort of admin panel for practically any modern application.
What would others do in this case? I would really like to use DynamoDB for all the benefits it gives, especially in terms of cost.
Since I have been asked, in my app the main use case for 2) is something like this:
As to the sizing, it needs to scale well, at least to the tens of thousands.
I also thought that bucket PKs could be alphabetical letters, but
that could create hot partitions as well, as the letter "A" would
probably be hit quite hard.
I think this sounds like a reasonable approach.
The US Social Security Administration publishes data about names on its website. You can download the list of name data from as far back as 1879! I stumbled upon a website from data scientist and linguist Joshua Falk that charted the baby name data from the SSA, which can give us a hint of how names are distributed by their first letter.
Your users may not all be from the US, but this can give us an understanding of how names might be distributed if partitioned by the first letter.
While not exactly evenly distributed, perhaps it's close enough for your use case? If not, you could further distribute the data by using the first two (or three, or four...) letters of the name as your partition key.
1 million names likely amount to no more than a few MBs of data, which isn't very much. Partitioning based on name prefixes seems like a reasonable way to proceed.
You might also consider using a tool like Elasticsearch, which could support your second access pattern and more.
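To make the name-prefix idea a bit more concrete, here is a minimal in-memory sketch of how paginated, sorted reads could walk prefix buckets in order. It is not DynamoDB code: each bucket stands in for one partition key whose items are sorted by name, and the helper names, the lowercase normalisation and the "last name seen" cursor are all assumptions for illustration.

```python
from typing import Optional


def build_buckets(names: list[str], prefix_len: int = 1) -> dict[str, list[str]]:
    """Group normalised names by their first prefix_len characters."""
    buckets: dict[str, list[str]] = {}
    for name in sorted(n.lower() for n in names):
        buckets.setdefault(name[:prefix_len], []).append(name)
    return buckets


def page_sorted_users(
    buckets: dict[str, list[str]],
    page_size: int,
    after: Optional[str] = None,
    prefix_len: int = 1,
) -> list[str]:
    """Return the next page of names, in order, following the cursor `after`."""
    page: list[str] = []
    start_key = after[:prefix_len] if after else ""
    for key in sorted(buckets):
        if key < start_key:
            continue  # this whole bucket sorts before the cursor
        # In a real table this loop would be one query per bucket,
        # with a sort-key condition of "name greater than the cursor".
        for name in buckets[key]:
            if after is not None and name <= after:
                continue
            page.append(name)
            if len(page) == page_size:
                return page
    return page


buckets = build_buckets(["Alice", "bob", "Aaron", "Carol", "Bea"])
page1 = page_sorted_users(buckets, 2)
page2 = page_sorted_users(buckets, 2, after=page1[-1])
page3 = page_sorted_users(buckets, 2, after=page2[-1])
print(page1, page2, page3)
```

Because bucket keys are prefixes of the sort value, visiting buckets in key order yields a globally sorted result without merging, which is what makes prefix buckets easier to paginate than random shards.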

Model Post and Topic through DynamoDB

Here's the relation I'm trying to model in DynamoDB:
My service contains posts and topics. A post may belong to multiple topics. A topic may have multiple posts. All posts have an interest value which would be adjusted based on a combination of likes and time since posted, interest measures the popularity of a post at the current moment. If a post gets too old, its interest value will be 0 and stay that way forever (archival).
The REST api end points work like this:
GET /posts/{id} returns a post object containing the title, text, author name, a link to the author's REST endpoint (doesn't matter for this example) and the number of likes (the interest value is not included)
GET /topics/{name} should return an object with both a list with the N newest posts of the topics as well as one for the N currently most interesting posts
POST /posts/ creates a new post where multiple topics can be specified
POST /topics/ creates a new topic
POST /likes/ creates a like for a specified post (does not actually create an object, just adds the user to the given post-object's list of likers, which is invisible to the users)
The problem now becomes: how do I create a relationship between topics and posts in DynamoDB (NoSQL)?
I thought about adding a list of copies of posts to tag entries in DynamoDB, where every tag has a list of both the newest and the most interesting posts.
One way I could do this is by creating a CloudWatch job that would run every 10 minutes and loop through every topic object, finding both the most interesting and newest entries and then replacing the old lists of the topic.
Another job would also have to regularly update the "interest" value of every non archived post (keep in mind both likes and time have an effect on the interest value).
One problem with this is that a lot of posts in the tag list would be out of date for up to 10 minutes when the user changes or deletes a post. Likes will also not be properly tracked on the tag's post list. This could perhaps be solved with transactions, although DynamoDB is limited to 10 objects per transaction.
Another problem is that it would require the add-posts-to-tags job to load all the non-archived posts into memory in order to manually sort them by both time and interest, split them up by tag and then add the first N of both sets to the tag lists every 10 minutes.
I also had another idea: by limiting the number of tags allowed per post to 1, I could use the tag as the partition key, with the post time as the sort key, and use a GSI to add interest as a second sort key.
This does have several downsides though:
Very popular tags may be limited to a single partition since all the posts share a single partition key
Tag limit is 1
A CloudWatch job to adjust the interest value of posts may still be required
It would require use of a GSI which may lead to dangerous race conditions
But it would have the advantage that there are no replications of the post objects aside from the GSI. It would also allow basically infinite paging of all posts by date instead of being limited to just the N newest posts.
So what is a good approach here? It seems both of my solutions have horrible dealbreakers. Is this just one of those problems that NoSQL simply can't solve?
You are trying to model relational data in a non-relational DB.
To do this I would use two types of DB.
I would store the post information in DynamoDB.
In your example that would cover:
GET /posts/{id}
POST /posts/
POST /likes/
For the topic-related information I would use Elasticsearch (Amazon Elasticsearch Service).
GET /topics/{name}: the search index would store the full topic info as well as the post IDs and the relevant fields you want to search on (in your case, the update date to get the most recent posts).
This entails a background process (with DynamoDB this can be done via Streams) that takes changes to the DynamoDB table, such as new posts and updates to the like count, and populates the search index.
Note: this can also be solved using a graph DB, but for scaling purposes it is better to separate the source of the data (posts) from the data relations (topics).
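The split described in this answer can be sketched with a small in-memory stand-in for the search index. This is only an illustration of the data flow: the class name TopicIndex, the document fields (id, topics, created, interest) and the idea that each change event carries the full post are assumptions, and a real setup would write to a search service rather than a Python dict.

```python
from collections import defaultdict


class TopicIndex:
    """In-memory stand-in for a search index fed by post change events."""

    def __init__(self) -> None:
        self._posts: dict[str, dict] = {}  # post ID -> latest document
        self._by_topic: dict[str, set] = defaultdict(set)

    def apply_change(self, post: dict) -> None:
        """Insert or update a post, keeping topic membership in sync."""
        post_id = post["id"]
        old = self._posts.get(post_id)
        if old is not None:
            for topic in old["topics"]:
                self._by_topic[topic].discard(post_id)
        self._posts[post_id] = post
        for topic in post["topics"]:
            self._by_topic[topic].add(post_id)

    def topic_view(self, topic: str, n: int) -> dict:
        """Return the n newest and n most interesting post IDs for a topic."""
        posts = [self._posts[pid] for pid in self._by_topic[topic]]
        newest = sorted(posts, key=lambda p: p["created"], reverse=True)[:n]
        hottest = sorted(posts, key=lambda p: p["interest"], reverse=True)[:n]
        return {
            "newest": [p["id"] for p in newest],
            "most_interesting": [p["id"] for p in hottest],
        }


index = TopicIndex()
index.apply_change({"id": "p1", "topics": ["aws"], "created": 1, "interest": 5})
index.apply_change({"id": "p2", "topics": ["aws", "db"], "created": 2, "interest": 1})
index.apply_change({"id": "p3", "topics": ["db"], "created": 3, "interest": 9})
# A later change event for p2, e.g. after it received likes.
index.apply_change({"id": "p2", "topics": ["aws", "db"], "created": 2, "interest": 7})

aws_view = index.topic_view("aws", 1)
db_view = index.topic_view("db", 2)
print(aws_view, db_view)
```

The point of the design is that a post can carry any number of topics, and both orderings are answered by the index at read time instead of by a periodic job rewriting per-topic lists.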

Designing an Oracle APEX DB for an Application - Mental Road Block

I need some help getting past a road block I've come across in creating my application in APEX.
This application will be to track financial disbursements from a company. It will utilize a one to many relationship. One associate to many different transaction details.
Using Quick SQL in APEX 19.2 I have created a couple tables. DISB and DISB_DTLS
Assignor vc
Processor vc
RCVD_DA date
PROC_DA date
ACT_NO number
AMT number
The problem I'm having is that I want the primary table DISB to be for the associate, hence "One Associate to Many Disbursements". However, we have so many details that the interactive grid APEX uses would be way too big and squished in a Master Detail form. Yet the only way to modify two tables or a view would be a Master Detail form. That's why I put some disbursement info in the primary table DISB and not in the DTLS table.
I know there are some creative applications out there, and need some help discovering what I can do in regards to updating multiple tables from one form, if possible. Or alternatives. I want to make this process easy for the associates. This was all in one spreadsheet at one point.
I recommend you don't compromise Database design over the UI.
What you can do in this case is filter segmentation.
Complete your Master-Detail as initially thought.
Some detail columns can be logically grouped, so I would put some filters on the page where the user selects which logical group of columns to display. That way you hide/show columns to ensure they fit on the screen. Think of the filters as radio buttons or even checkboxes: let the user choose what shows on the screen.

Kibana: can I store "Time" as a variable and run a consecutive search?

I want to combine a few searches into one. Here are the steps:
Search in Kibana for this ID:"b2c729b5-6440-4829-8562-abd81991e2a0" which will return me a bunch of logs. Of these logs I need to take the first and the last timestamp:
I would now like to store these two values, FROM: September 3rd 2019, 21:28:22.155 and TO: September 3rd 2019, 21:28:23.524, in 2 variables
Run a second search in Kibana for the word "fail" between these two time variables
How to automate the whole process without need of copy/paste and running a second query?
SHORT STORY LONG: I work in a company that produces software for autonomous vehicles.
SCENARIO: A booking is rejected and we need to understand why.
WHERE IS THE PROBLEM: I need to monitor just a few seconds of logs on 3 different machines. Each log is completely separate; there is no relation between the logs, so I cannot write a single query in Discover. I need to run 3 separate queries.
A booking was rejected, so I open Chrome and I search on "" for the BookingID:"b2c729b5-6440-4829-8562-abd81991e2a0" and I get a dozen logs back within a range of 2 seconds (FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524).
Now I need to know what was happening on the car, so I open a new Chrome tab and I search on "" for the CarID: "Tesla-45-OU" on the time range FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524
Now I need to know why the server which calculates the matching rejected the booking, so I open a new Chrome tab and I search for the word CalculationMatrix, again on the time range FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524
CONCLUSION: I want to stop opening Chrome tabs by hand and automate the whole thing. I have no idea around what time the booking was made, so I first need to search for the BookingID "b2c729b5-6440-4829-8562-abd81991e2a0", then store the timestamps of the first and last log and run a second and third query based on those timestamps.
There is no relation between the 3 logs I search so there is no way to filter from Discover; I need to automate 3 different queries.
Here is how I would do it. First of all, from what I understand, you have three different indexes:
one for "bookings"
one for "cars"
one for "matchings"
First, in Discover, I would create three Saved Searches, one per index pattern. Then in Visualize, I would create a Vertical bar chart on the bookings saved search (Bucket X-Axis by date_histogram on the timestamp field, leave the rest as is). You'll get a nice histogram of all your booking events bucketed by time.
Finally, I would create a dashboard and add the vertical bar chart + those three saved searches inside it.
When done, the way I would search according to the process you've described above is as follows:
Search for the booking ID b2c729b5-6440-4829-8562-abd81991e2a0 in the top filter bar. In the bar chart histogram (bookings), you will see all documents related to the selected booking. On that chart, you can select the exact period from when the very first booking document happened to the very last. This will adapt the main time picker at the top and the start/end time will be "remembered" by Kibana
Remove the booking ID from the top filter (since we now know the time range and Kibana stores it). Search for Tesla-45-OU in the top filter bar. The bar histogram + the booking saved search + the matchings saved search will be empty, but you'll have data inside the second list, the one for cars. Find whatever you need to find in there and go to the next step.
Remove the car ID from the top filter and search for CalculationMatrix. Now the third saved search is going to show you whatever documents you need to see within that time range.
I'm lacking realistic data to try this out, but I definitely think this is possible as I've laid out above, probably with some adaptations.
Kibana does work like this (any order is ok):
Select time filter:
Add additional criteria for search like for example field s is b2c729b5-6440-4829-8562-abd81991e2a0.
Add additional criteria for search like for example field x is Fail.
Additionally you can view surrounding documents
This is how Kibana works.
You can prepare some filters beforehand, save them and then use them if you want to automate the discovery process somehow.
You can do that in the Discover tab in Kibana using the New/Save/Open options.
I do not think you can achieve what you need in Kibana. As I mentioned earlier, one option is to change the data that is coming into Elasticsearch so you can search for it via Discover in Kibana. Another option could be building, for example, a Java application that uses Elasticsearch; then you can write an algorithm that returns the data that you want. But I think it's a big overhead and I recommend checking the data first.
Edit: To clarify, you can create an external Java application (let's say Spring Boot) that uses Elasticsearch, since all the data that you need is inside it.
But with this option you will not use Kibana at all.
You can export the result to CSV or whatever you want in the code.
A Spring Boot application can ask Elasticsearch for whatever it needs, and it would then be easy to store these time variables in the Java code.
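The answer has Java in mind, but the "external application" idea can be sketched just as well in a few lines of Python. The sketch below only shows the two pieces of logic involved: deriving the time window from the hits of the first search, and building the request body for the follow-up search. The field name "@timestamp", the sample hits and the helper names are assumptions; sending the requests to Elasticsearch is left out, and the timestamps are assumed to be ISO-8601 strings in one format and time zone so that they sort as plain strings.

```python
def time_window(hits: list[dict]) -> tuple[str, str]:
    """Return (first, last) timestamp found in the hits of the first search."""
    stamps = [hit["_source"]["@timestamp"] for hit in hits]
    return min(stamps), max(stamps)


def follow_up_query(text: str, start: str, end: str) -> dict:
    """Build a search body matching `text` only inside [start, end]."""
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": text}}],
                "filter": [
                    {"range": {"@timestamp": {"gte": start, "lte": end}}}
                ],
            }
        },
        "sort": [{"@timestamp": "asc"}],
    }


# Hypothetical hits, shaped like the "hits.hits" array of a search response
# for the booking ID.
booking_hits = [
    {"_source": {"@timestamp": "2019-09-03T21:28:23.524Z"}},
    {"_source": {"@timestamp": "2019-09-03T21:28:22.155Z"}},
    {"_source": {"@timestamp": "2019-09-03T21:28:22.900Z"}},
]

start, end = time_window(booking_hits)
car_query = follow_up_query('"Tesla-45-OU"', start, end)
matching_query = follow_up_query("CalculationMatrix", start, end)
print(start, end)
```

The same window is reused for every follow-up search, which is the step that otherwise has to be done by copy/paste between browser tabs.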
EDIT: After OP edited question to change it dramatically:
#FrancescoMantovani Well, the edited version is very different from what you first posted here, which was "How to automate the whole process without need of copy/paste and running a second query?" and searching for the word fail in a single shot. In the accepted answer you are still using three filters one at a time, so it is not one search but three.
What's more, if you used one index and sent data from multiple hosts via Filebeat, you wouldn't even have to create this dashboard to do that. You can select the exact period from when the very first document happened to the very last for a given filter, then remove that filter and add another one that you need. It's as simple as that. Before, you were writing about one query,
How to automate the whole process without need of copy/paste and
running a second query?
not three. And you don't need to open a new tab in Chrome each time you want to change the filter; just organize the data, for example by using Filebeat as mentioned before.
There is no relation between the 3 logs
From what you wrote, the relation exists and it is time.
If the data is in, for example, three different indices (because the documents don't have much similar data) you can do it like this:
You can change them easily in Discover, see:
You can go to Discover, select index 1, search, and select the time range that you need. When you change the index, the time range is still the one you selected; you only need to change the filter and you will get what you need.

Designing MySQL table for Achievements system

I am creating a database for an achievement system (like something you would see in a Blizzard game). I would like to have a GUI that displays the current progress of all achievements in the game which means I will need to query the progress of all achievements for a user in order to populate the GUI. I plan on having somewhere around 100 achievements.
This brings about a design question. What is the best way to design the database and querying code to query the progress of ~100 bit fields?
It seems like the brute force method would be to get the entire row of achievements and then for each field in the row do some hardcoded string comparison to determine which achievement we are dealing with.
Another possible solution may be to have a big switch statement based on the column index of the table and handle each achievement for each case (requires not modifying the table or you have to refactor a lot of C++ code).
I'm curious to hear any other designs you guys may have for this.
I suggest building a solution using 3 tables: users, achievements and user_achievements. A user would be identified by a u_id in the users table. An achievement would be identified by an a_id in the achievements table. You would then keep track of a user's achievements by inserting a row in the user_achievements table that includes a u_id to identify the user and an a_id to identify the achievement. The user_achievements table would also contain a column specifying the % completion of that achievement for the given user.
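A minimal sketch of this three-table layout follows, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The column names beyond u_id and a_id (name, title, pct_complete) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (u_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE achievements (a_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE user_achievements (
        u_id INTEGER NOT NULL REFERENCES users(u_id),
        a_id INTEGER NOT NULL REFERENCES achievements(a_id),
        pct_complete INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (u_id, a_id)
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO achievements VALUES (1, 'Explorer'), (2, 'Collector'), (3, 'Champion');
    INSERT INTO user_achievements VALUES (1, 1, 100), (1, 2, 40);
    """
)


def progress_for_user(conn: sqlite3.Connection, u_id: int) -> list[tuple[str, int]]:
    """Return (title, percent complete) for every achievement, 0 if not started."""
    cursor = conn.execute(
        """
        SELECT a.title, COALESCE(ua.pct_complete, 0)
        FROM achievements AS a
        LEFT JOIN user_achievements AS ua
               ON ua.a_id = a.a_id AND ua.u_id = ?
        ORDER BY a.a_id
        """,
        (u_id,),
    )
    return cursor.fetchall()


progress = progress_for_user(conn, 1)
print(progress)
```

One query returns the whole GUI's worth of progress as rows, so the client code iterates over results instead of switching on column names or indexes, and adding an achievement is an INSERT rather than a schema change.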
Came across this question and even though it's 5 years old, perhaps someone would be interested in following approach.
Achievements usually break down into numbers (the rest, like the Name and Description of each achievement, can be kept in the site/app core to avoid bloating the DB).
Let's keep it simple: we are not FB and don't need a separate table for them, so in the "users" table we add just one column, "Achievements", as a varchar(50). The number in brackets (50) will depend on how much data you actually need to store in this column.
So you end up with a numerical sequence in each cell of the Achievements column: 10982039482084109384
Read this line of digits from left to right as follows: the user has reached "1098 profile views", received "2039 likes", etc. Optionally, add a separator for easier distinction and to handle cases where a user first had 25 likes, then 125, then 2039 (2 digits, 3 digits, 4 digits). An alternative is to zero-pad, using 0025, then 0125, then 2039, given you know the maximum is 4 digits per achievement. But let's say we decide to use a separator, i.e. a comma:
Then, once you need the data, just SELECT the achievements belonging to a specific userID and subsequently (if you added a separator)
explode(',', $achievements)
Your site's PHP core then knows that the first 4 digits stand for "profile views", and let's say this means the user has a level 10 badge for profile views (1 badge per 100 views).
From there, you can easily do further operations with no need for more SQL queries. For example, if the user wants to know their progress towards a level 20 badge, you display a progress of 1098/2000 (or 55%).
The achievement Description, Name and level information is stored in the site core, while the percentage is calculated on the fly.
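The decoding and progress arithmetic from this answer can be sketched as follows. The answer itself uses PHP; this is the same logic in Python, and the field order in ACHIEVEMENT_FIELDS, the two-value sample string and the helper names are assumptions for illustration.

```python
# Hypothetical field order: the position in the string defines its meaning.
ACHIEVEMENT_FIELDS = ["profile_views", "likes"]


def parse_achievements(packed: str) -> dict[str, int]:
    """Split a comma-separated counter string into named integer values."""
    values = [int(part) for part in packed.split(",")]
    return dict(zip(ACHIEVEMENT_FIELDS, values))


def badge_progress(count: int, per_level: int, target_level: int) -> tuple[int, int, int]:
    """Return (current level, count needed for target level, percent towards it)."""
    current_level = count // per_level
    goal = per_level * target_level
    percent = round(100 * count / goal)
    return current_level, goal, percent


counters = parse_achievements("1098,2039")
level, goal, percent = badge_progress(counters["profile_views"], 100, 20)
print(counters, level, goal, percent)
```

With 1098 profile views and one badge level per 100 views, this gives level 10 and 1098/2000, which rounds to 55% of the way to level 20, matching the figures in the answer.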
Hope the logic is clear and may be useful to anyone in the community out there.