How do I count the "role" instances in a cluster using RapidMiner / Weka

I have a RapidMiner flow that takes a dataset and clusters it. In the output I can see my role, but I can't figure out a way to count the role per cluster. How can I count the number of role instances per cluster? I've looked at the Aggregate operator, but my role isn't an available attribute.
Essentially, I'm trying to figure out whether the clusters say anything about the role. I also use Weka, which calls this "Classes to clusters evaluation". It basically shows the class (or role) breakdown per cluster.
My current flow:
Only two attributes are available. My role isn't one of them.
There are 34 total attributes. I want to aggregate by ret_zpc

RapidMiner has the concept of roles. An attribute can have one of the roles regular, id, cluster or label (among others). There's even an operator, Set Role, that allows the role to be changed. Outside RapidMiner, role, label and class get used interchangeably.
For your question, the Aggregate operator is what you need. Assuming you have an attribute in your example set with role Cluster and another with role Label you select these attributes as the ones to group by. For aggregation attribute, choose another attribute and select count as the aggregation function.
In your case, the attributes you want are not being populated in the drop-downs, but they can still be used: type them in manually and explicitly add them to the selection criteria. This absence of attributes can happen when RapidMiner cannot see any metadata for them. If you change the Read CSV operator so that it has an explicit mapping, you should find that the attributes appear for selection.
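For a quick sanity check outside RapidMiner, the same classes-to-clusters count can be sketched in plain Python on the exported example set. The cluster and role values below are made up for illustration:

```python
from collections import Counter

def role_counts_per_cluster(rows):
    """Count label values per cluster, like Weka's
    "Classes to clusters evaluation" table.

    rows: iterable of (cluster, role) pairs, e.g. read
    from the clustered example set exported as CSV.
    """
    counts = {}
    for cluster, role in rows:
        counts.setdefault(cluster, Counter())[role] += 1
    return counts

# Illustrative data (hypothetical cluster and role values).
rows = [
    ("cluster_0", "admin"),
    ("cluster_0", "user"),
    ("cluster_1", "user"),
    ("cluster_1", "user"),
    ("cluster_1", "admin"),
]
for cluster, counter in sorted(role_counts_per_cluster(rows).items()):
    print(cluster, dict(counter))
```

This is essentially what the Aggregate operator does when you group by the cluster and label attributes and count another attribute.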

Related

DynamoDB one to many to many relationship

While the question of how to model one-to-many relationships is well answered on Stack Overflow, I couldn't find any information on hierarchical lookups where every intermediate level must be accessible.
Let's assume the following entities: Accounts, Groups, Instances, InstanceProviders.
One account has multiple groups. One account has configured multiple InstanceProvider accounts. One group has access to multiple instances; one instance is assigned to one group only. The group name can be chosen freely and is tied to the account, hence it must be unique at the account level.
The external instance name is provided by the InstanceProvider, uniquely within Account-InstanceProvider-InstanceId.
Now I need to answer the following read patterns:
Read instance with id
Read instance with external id from provider
Read all instances in a group (which depends on an account)
Read all instances in an account
Read all instances from a provider in an account
Read all instances from a provider in a group (which depends on an account)
...
Restrictions:
Group name unique within an account
One instance assigned to one group, not multiple
External ID unique within Account-Provider combination (avoid duplicates for the same external id)
The "Read all" part is where I am struggling. These lookups would require a GSI per level, since every sub-level depends on the level before it.
For example, for one instance:
PK=ACCOUNT#123#INSTANCE#11b14ba1 SK=ACCOUNT#123#INSTANCE#11b14ba1
GSI1PK=ACCOUNT#123 GSI1SK=INSTANCE#11b14ba1
GSI2PK=ACCOUNT#123#PROVIDER#GoodCompany GSI2SK=GROUP#AdminGroup#INSTANCE#11b14ba1
GSI3PK=ACCOUNT#123#GROUP#AdminGroup GSI3SK=PROVIDER#GoodCompany#INSTANCE#11b14ba1
Here it's basically one GSI per attribute "chain". Is there a better way?
According to best practices for managing many-to-many relationships by AWS, I think you are doing great.
From my experience, I think you are applying the adjacency list design pattern.
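The key layout shown in the question can be captured in a small helper; the function and attribute names below are hypothetical, mirroring the example item:

```python
# Hypothetical builder for the single-table adjacency-list item shown
# above: one base key plus one GSI key pair per lookup "chain".

def instance_item(account_id, instance_id, provider, group):
    pk = f"ACCOUNT#{account_id}#INSTANCE#{instance_id}"
    return {
        "PK": pk,
        "SK": pk,  # item keyed to itself for direct lookup by instance id
        "GSI1PK": f"ACCOUNT#{account_id}",
        "GSI1SK": f"INSTANCE#{instance_id}",
        "GSI2PK": f"ACCOUNT#{account_id}#PROVIDER#{provider}",
        "GSI2SK": f"GROUP#{group}#INSTANCE#{instance_id}",
        "GSI3PK": f"ACCOUNT#{account_id}#GROUP#{group}",
        "GSI3SK": f"PROVIDER#{provider}#INSTANCE#{instance_id}",
    }

item = instance_item("123", "11b14ba1", "GoodCompany", "AdminGroup")
print(item["GSI3PK"], "/", item["GSI3SK"])
```

With this shape, "read all instances from a provider in a group" becomes a single Query on GSI3 with `GSI3PK = ACCOUNT#123#GROUP#AdminGroup` and a `begins_with(GSI3SK, "PROVIDER#GoodCompany")` key condition, which is why the one-GSI-per-chain layout works.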

How to assign column-level restriction on BigQuery table in asia-east1 location

I want to restrict access to certain PII columns of my BigQuery tables. My tables are present in location: asia-east1. The BigQuery 'Policy Tag' feature can create policy tags for enforcing column restrictions only in 'US' and 'EU' regions. When I try to assign these policy tags to my asia-east1 tables, it fails with error:
BigQuery error in update operation: Policy tag reference projects/project-id/locations/us/taxonomies/taxonomy-id/policyTags/policytag-id should contain a location that is in the same region as the dataset.
Any idea on how I can implement this column level restriction for my asia-east1 BigQuery tables?
Summarising our discussion from the comment section.
According to the documentation, BigQuery provides fine-grained access to sensitive data based on the type or classification of the data. To achieve this, you can use Data Catalog to create a taxonomy and policy tags for your data.
Regarding the location of the policy tags, asia-east1: currently this feature is in Beta, a launch stage where the product is available for broader testing and use while new features and updates may still be taking place. For this reason, Data Catalog locations are limited to the ones listed here. As shown in the link, the asia-east1 endpoint has Taiwan as its region.
As additional information, here is a how-to guide to implement policy tags in BigQuery.
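For reference, the resource-name pattern seen in the error message can be sketched as below (the project, taxonomy and tag ids are placeholders); the `locations/...` segment is the part that must match the dataset's region:

```python
# Placeholder ids throughout; only the resource-name *pattern* is shown.
# The actual taxonomy/tag creation goes through the Data Catalog API
# (e.g. the google-cloud-datacatalog client), not sketched here.

def taxonomy_parent(project_id, location):
    return f"projects/{project_id}/locations/{location}"

def policy_tag_name(project_id, location, taxonomy_id, policy_tag_id):
    # This is the string BigQuery validated in the error above: its
    # location must equal the dataset's region.
    return (f"{taxonomy_parent(project_id, location)}"
            f"/taxonomies/{taxonomy_id}/policyTags/{policy_tag_id}")

print(policy_tag_name("my-project", "us", "taxonomy-id", "policytag-id"))
```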

How to get a filtered list of AWS EMR clusters?

I want to get a list of EMR clusters that have a specific tag value.
I looked up the ListClusters API, but it does not allow adding custom filters.
How can I apply a filter in the API call, or implement a two-step solution (first get all clusters and then filter them)?
The service API doesn't expose a tags parameter in the request or response, so you would need to call ListClusters first and then follow with DescribeCluster for every cluster id to expose the tags. An alternative would be to embed any tags or data in the cluster name so the list of clusters can be filtered by name after the first step, but this is probably not a suitable approach, as tags may change.
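The two-step approach could be sketched like this. The boto3 calls are shown only in comments (assumptions, not executed here); the filtering helpers operate on the DescribeCluster response shape, so they work on any list of cluster dicts:

```python
# Filter EMR clusters by tag, given "Cluster" dicts in the shape of
# emr.describe_cluster(...)["Cluster"] (each with a "Tags" list of
# {"Key": ..., "Value": ...} entries).

def has_tag(cluster, key, value):
    return any(t["Key"] == key and t["Value"] == value
               for t in cluster.get("Tags", []))

def filter_by_tag(clusters, key, value):
    return [c for c in clusters if has_tag(c, key, value)]

# With boto3 (sketch, untested -- standard EMR client assumed):
#   emr = boto3.client("emr")
#   ids = [c["Id"] for page in emr.get_paginator("list_clusters").paginate()
#          for c in page["Clusters"]]
#   clusters = [emr.describe_cluster(ClusterId=i)["Cluster"] for i in ids]
#   matching = filter_by_tag(clusters, "team", "data-eng")
```

Note that DescribeCluster is one call per cluster id, so for accounts with many clusters it is worth passing `ClusterStates` to ListClusters first to narrow the candidate set.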

AWS: Is it possible to share DynamoDB items across multiple users?

By looking at the documentation on DynamoDB, I was able to find some examples of restricting item access for users based on the table's primary key. However, all of these examples only cover restricting access to a single user. Is there a way to allow access only for a group of users? From what I've read, this would come down to creating IAM groups/roles, but there is a limit on how many of each can be created, and it doesn't seem like doing so programmatically for each item would work well.
Your guess is correct; you would need an IAM policy per shared row.
There are no substitution variables currently available, as far as I know, to get the group(s) a user is part of, so no single IAM policy will be able to cover your use case.
Not only that, but only the partition key can be matched with conditions in the IAM policy. Unless your partition key has a group name as part of it (which implies that users can never change groups), you will require, as you imply, an IAM policy per row in the database, which won't scale.
It could be acceptable if you have controls in place to limit the number of shared items, and are aggressive about cleaning up the policies for items that are no longer shared.
I don't think using AWS's built-in access controls to allow group access is going to work very well, though, and you'll be better off building a higher-level abstraction on top that does have the access control you need (using AWS Lambda, for example).
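To make the per-row approach concrete, a single shared item would need a statement like the hypothetical sketch below, pinning the allowed partition key values with the `dynamodb:LeadingKeys` condition key (table ARN and key values are placeholders):

```python
import json

def shared_item_policy(table_arn, partition_keys):
    """One Allow statement scoped to specific partition key values.

    One such policy (or statement) per shared item is exactly the
    scaling problem described above.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Only rows whose partition key is listed are visible.
                    "dynamodb:LeadingKeys": partition_keys
                }
            },
        }],
    }

print(json.dumps(shared_item_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/SharedItems",
    ["item-42"]), indent=2))
```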

Base access to DynamoDB records off of an attribute in a record

I am looking to limit access to my DynamoDB records based on an attribute in a table. From the documentation I've read, there does not seem to be a straightforward way to do this, unless I have made a huge error. Here is an example of what I would like to accomplish:
Table 1
userid
groups <-- (a list of groupids that the user belongs to)
Table 2
groupid
data
someotherdata
I would like the user in Table 1 to be able to have full access to the information in Table 2. Is this possible via DynamoDB?
Thanks
If you manage your groups in IAM, you can set a fine-grained access control policy on Table 1 that restricts the visibility of, and access to, items and attributes within items to particular groups.
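As a reference point, AWS's documented fine-grained access pattern combines `dynamodb:LeadingKeys` (row scoping) with `dynamodb:Attributes` (column scoping). The sketch below uses the per-user web identity substitution variable from the docs; the table ARN and attribute names are placeholders, and adapting this to groups runs into the limits discussed in the previous answer:

```python
import json

# Documented condition keys for DynamoDB fine-grained access control;
# everything identifier-like here is a placeholder.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Table1",
        "Condition": {
            "ForAllValues:StringEquals": {
                # Rows: partition key must equal the caller's identity.
                "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                # Columns: only these attributes may be requested.
                "dynamodb:Attributes": ["userid", "groups"],
            },
            # Force callers to request specific attributes, not ALL.
            "StringEqualsIfExists": {
                "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
            },
        },
    }],
}
print(json.dumps(policy, indent=2))
```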