DynamoDB one to many to many relationship - amazon-web-services

While the question of how to model 1:many relationships is well answered on Stack Overflow, I couldn't find any information on hierarchical lookups where every intermediate level must be queryable.
Let's assume the following entities: Accounts, Groups, Instances, InstanceProviders.
One account has multiple groups. One account has configured multiple InstanceProvider accounts. One group has access to multiple instances, one instance is assigned to one group only. The group name can be chosen freely and is tied to the account. Hence it must be unique on the account level.
The external instance name is provided by the InstanceProvider, uniquely within Account-InstanceProvider-InstanceId.
Now I need to answer the following read patterns:
Read instance with id
Read instance with external id from provider
Read all instances in a group (which depends on an account)
Read all instances in an account
Read all instances from a provider in an account
Read all instances from a provider in a group (which depends on an account)
...
Restrictions:
Group name unique within an account
One instance assigned to one group, not multiple
External ID unique within Account-Provider combination (avoid duplicates for the same external id)
The "Read all" part is where I am struggling. These lookups would require one GSI per level, since every sub-level depends on the level before it.
For example, for one Instance:
PK=ACCOUNT#123#INSTANCE#11b14ba1 SK=ACCOUNT#123#INSTANCE#11b14ba1
GSI1PK=ACCOUNT#123 GSI1SK=INSTANCE#11b14ba1
GSI2PK=ACCOUNT#123#PROVIDER#GoodCompany GSI2SK=GROUP#AdminGroup#INSTANCE#11b14ba1
GSI3PK=ACCOUNT#123#GROUP#AdminGroup GSI3SK=PROVIDER#GoodCompany#INSTANCE#11b14ba1
Here it's basically one GSI per attribute "chain". Is there a better way?
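For reference, the key layout above can be expressed as a small helper (plain Python, no AWS SDK; the function name and argument names are illustrative):

```python
# Sketch of the single-table key scheme from the question.
# The function name and arguments are illustrative, not from any SDK.

def instance_item(account_id, instance_id, provider, group):
    """Build the main key and the three GSI keys for one Instance item."""
    pk = f"ACCOUNT#{account_id}#INSTANCE#{instance_id}"
    return {
        "PK": pk,
        "SK": pk,  # the item lives alone in its partition
        # GSI1: all instances in an account
        "GSI1PK": f"ACCOUNT#{account_id}",
        "GSI1SK": f"INSTANCE#{instance_id}",
        # GSI2: all instances from a provider in an account; a
        # begins_with(GSI2SK, "GROUP#<name>#") condition narrows it
        # to that provider within one group
        "GSI2PK": f"ACCOUNT#{account_id}#PROVIDER#{provider}",
        "GSI2SK": f"GROUP#{group}#INSTANCE#{instance_id}",
        # GSI3: all instances in a group; begins_with(GSI3SK,
        # "PROVIDER#<name>#") narrows it to a single provider
        "GSI3PK": f"ACCOUNT#{account_id}#GROUP#{group}",
        "GSI3SK": f"PROVIDER#{provider}#INSTANCE#{instance_id}",
    }
```

Because the sort keys are composed, each GSI answers two read patterns: a Query on the partition key alone returns the broader set, and a begins_with condition on the sort key returns the narrower one, so two GSIs plus the base table cover all six lookups.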

According to AWS's best practices for managing many-to-many relationships, I think you are doing fine.
From my experience, what you are designing is the adjacency list design pattern.

Related

BigQuery - Inheritance of Permissions

TL;DR
We are looking for a way to let a service account inherit BQ read permissions from more than one other service account. Impersonation only works with one.
The scenario
Our company follows a data mesh approach in which our product teams are responsible for integrating their data into BigQuery. The product owner is also considered the owner of the data, so it is the product owner who decides who gets access to it.
Working in an analytical team, we usually combine data from multiple source systems in our BigQuery queries. Our ETL processes run on a Kubernetes cluster, each process using a separate service account. This gives us fine-grained access control, as each process can only access the very objects it really needs. This design also helps us with debugging and cost control. On the other hand, it leads to an issue on the source side:
The problem
Every time we design a new process, we need to ask the data owner for permission. They have already agreed that our product team / system as a whole may access their data, so this authorization process is quite cumbersome and confuses the data owner.
We'd prefer to have just one "proxy" service account for each source object that holds the necessary BQ read permissions. The processes' service accounts would then be set up to inherit the rights from the proxy service accounts of those BQ sources they need to access.
Using impersonation only helps if there is just one source system, but our queries often use more than one.
Using Google Groups does not help
We discussed a solution in which we set up a Google Group for each source system we want to read from. The BigQuery Data Reader role would then be assigned to this group, and service accounts that require those rights would in turn be added to the group. However, company policy does not allow adding service accounts to Google Groups. Also, Google Groups cannot be created or managed by our product teams themselves, so this approach lacks flexibility.
Implementing a coarse-grained approach
One approach is to use more coarse-grained access control, i.e. just one service account for all ETL processes. We could add the process name as a label to the query to cover the debugging and cost control part. However, if possible, we'd prefer an approach in which each process can access as few data objects as possible.
There is no easy solution.
Data governance is in place to control the quality, the source, and the access to the data. It's normal to have to ask for access to the data.
Special groups could be granted access to all the data sources (after a request to the data governance team of each data mesh instance).
However, groups containing service accounts aren't allowed in your case.
The only solution that I see is to use a single service account, authorized on all the data mesh instances, and impersonate it to access all the sources.
It's not ideal for traceability, but I don't see any other good solution for this.

Change AWS SageMaker LogGroup Prefix?

We have applications for multiple tenants on our AWS account and would like to distinguish between them in different IAM roles. In most places this is already possible by limiting resource access based on naming patterns.
For CloudWatch log groups of SageMaker training jobs however I have not seen a working solution yet. The tenants can choose the job name arbitrarily, and hence the only part of the LogGroup name that is available for pattern matching would be the prefix before the job name. This prefix however seems to be fixed to /aws/sagemaker/TrainingJobs.
Is there a way to change or extend this prefix in order to make such limiting possible? Say, for example /aws/sagemaker/TrainingJobs/<product>-<stage>-<component>/<training-job-name>-... so that a resource limitation like /aws/sagemaker/TrainingJobs/<product>-* becomes possible?
I don't think it is possible to change the log group or log stream names for any of the SageMaker services.

AWS: Is it possible to share DynamoDB items across multiple users?

By looking at the documentation on DynamoDB, I was able to find some examples of restricting item access for users based on the table's primary key. However, all of these examples only cover restricting access to a single user. Is there a way to allow access only for a group of users? From what I've read, this would come down to creating IAM groups/roles, but there is a limit on how many of each can be created, and it doesn't seem like doing so programmatically for each item would work well.
Your guess is correct; you would need an IAM policy per shared row.
There are no substitution variables currently available as far as I know to get the group(s) a user is part of, so no single IAM policy will be able to cover your use case.
Not only that, only the partition key can be matched with conditions in the IAM policy, so unless your partition key has a group name as part of it (which implies that users can never change groups) you will require, as you imply, an IAM policy per row in the database, which won't scale.
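For illustration, a per-row policy of that shape would pin the partition key value with the `dynamodb:LeadingKeys` condition key; shown here as a Python dict, with the region, account ID, table name, and key value all being placeholders:

```python
# Illustrative per-shared-row IAM policy, written as a Python dict.
# dynamodb:LeadingKeys is a real condition key; every other concrete
# value (ARN, account ID, table name, key value) is a placeholder.
shared_row_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/SharedItems",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Only the partition key can be restricted this way,
                    # which is why one policy per shared row is needed.
                    "dynamodb:LeadingKeys": ["ITEM#42"]
                }
            },
        }
    ],
}
```

Multiplying this across every shared item (and every group change) is what makes the approach unscalable.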
It could be acceptable if you have controls in place to limit the number of shared items, and are aggressive about cleaning up the policies for items that are no longer shared.
I don't think using AWS's built-in access controls to allow group access is going to work very well, though, and you'll be better off building a higher-level abstraction on top that does have the access control you need (using AWS Lambda, for example).
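A minimal sketch of such an application-level check, assuming each item carries a list of the groups it is shared with (all names here are hypothetical):

```python
# Hypothetical service-layer check (e.g. inside a Lambda handler): the item
# itself records which groups may read it, and the service enforces that
# before returning data. No per-item IAM policy is needed.

def can_access(item, user_groups):
    """True if the user belongs to at least one group the item is shared with."""
    allowed = set(item.get("shared_with_groups", []))
    return bool(allowed.intersection(user_groups))

doc = {"pk": "DOC#42", "shared_with_groups": ["analysts", "admins"]}
```

Here `can_access(doc, ["analysts"])` passes while `can_access(doc, ["interns"])` does not, and changing who can see an item is a single attribute update rather than an IAM policy change.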

Is it possible for DescribeInstances to return multiple instances when given instance ID filters?

If I pass an instance ID as a filter to DescribeInstances, is it ever possible to get multiple reservations or instances? I am checking whether I can simplify my code to just access index 0 of the returned reservations or instances rather than using a for loop.
No, you will always receive zero or one instance when you are filtering by an instance ID.
Instance IDs are unique and you'll never receive a duplicate id. (More information: Resource IDs Guide)
Even in cases where you have similar instance IDs across multiple instances, like i-abc123 and i-abc12345, querying the API will return only the instance whose ID exactly matches your filter.
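Given that guarantee, the nested lists can be flattened without a loop. A sketch in plain Python (the response shape mirrors what boto3's `describe_instances` returns; the helper name is made up):

```python
# Helper for a DescribeInstances response that was filtered by instance ID:
# there is at most one reservation containing at most one instance.

def single_instance(response):
    """Return the lone instance dict, or None if the ID matched nothing."""
    reservations = response.get("Reservations", [])
    if not reservations:
        return None
    return reservations[0]["Instances"][0]

# Example response shaped like boto3's ec2.describe_instances output:
sample = {"Reservations": [{"Instances": [{"InstanceId": "i-abc12345"}]}]}
```

The `None` branch matters: filtering by a nonexistent instance ID returns an empty `Reservations` list rather than raising an error.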

How do I count the "role" instances in a cluster using RapidMiner

I have a RapidMiner flow that takes a dataset and clusters it. In the output I can see my role, but I can't figure out a way to count the role per cluster. How can I count the number of roles per cluster? I've looked at the Aggregate node but my role isn't an available attribute.
Essentially, I'm trying to figure out whether the clusters say anything about the role. I also use Weka, which calls this "Classes to clusters evaluation". It basically shows the class (or role) breakdown per cluster.
My current flow:
Only two attributes are available. My role isn't one of them.
There are 34 total attributes. I want to aggregate by ret_zpc
RapidMiner has the concept of roles. An attribute can be one of regular, id, cluster or label (and some others). There's even an operator, Set Role that allows the role to be changed. Outside RapidMiner, role, label and class get used interchangeably.
For your question, the Aggregate operator is what you need. Assuming you have an attribute in your example set with role Cluster and another with role Label you select these attributes as the ones to group by. For aggregation attribute, choose another attribute and select count as the aggregation function.
In your case, the attributes you want are not being populated in the drop-downs, but they can still be used: just type them in manually and explicitly add them to the selection criteria. This absence of attributes can happen when RapidMiner cannot see any metadata for them. If you change the Read CSV operator so that it has an explicit mapping, you should find that the attributes appear for selection.