AWS Glue Column-Level Access Control

How do I give column-level access to particular roles in the Glue Catalog? I want to give Role_A permissions to only column_1 and column_2 of table XYZ, and Role_B access to all columns of table XYZ.

AWS Glue offers fine-grained access control only at the database and table level. If you want to restrict users to only a few columns, you have to use AWS Lake Formation, which supports column-level permissions; the Lake Formation documentation on granting permissions has examples.
For example, if you want to give access to only the two columns prodcode and location, you can achieve it as shown below:
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "SELECT" --resource '{ "TableWithColumns": {"DatabaseName":"retail", "Name":"inventory", "ColumnNames": ["prodcode","location"]}}'
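The same grant can be issued from code. A minimal boto3 sketch (the principal ARN, database, table, and column names below are the placeholders from the CLI example, not real resources):

```python
def grant_column_select(lf_client, principal_arn, database, table, columns):
    """Grant SELECT on specific columns of a Glue table via Lake Formation.

    lf_client is a boto3 lakeformation client, e.g. boto3.client("lakeformation").
    """
    return lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        Permissions=["SELECT"],
        Resource={
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become visible to the principal
                "ColumnNames": columns,
            }
        },
    )
```

Calling this for Role_A with `["column_1", "column_2"]` and for Role_B with the full column list (or a table-level grant) gives the split described in the question.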

Related

Ways to backup AWS Athena views

In an AWS Athena instance we have several user-created views.
We would like to back up the views.
We have been experimenting with the AWS CLI:
aws athena start-query-execution --query-string "SHOW VIEWS..."
and for each view
aws athena start-query-execution --query-string "SHOW CREATE VIEW..."
and then
aws athena get-query-execution --query-execution-id...
to get the S3 location of the CREATE VIEW code.
Looking for ways to get the view definitions backed up. If the AWS CLI is the best suggestion, then I will create a Lambda to do the backup.
I think SHOW VIEWS is the best option.
Then you can get the Data Definition Language (DDL) with SHOW CREATE VIEW.
There are a couple of ways to back the views up. You could use Git (AWS offers CodeCommit), and you could leverage CodeCommit from a Lambda function using Boto3.
In fact, just by fetching the DDL, you are already backing it up to S3.
Consider the following DDL:
CREATE EXTERNAL TABLE default.dogs (
  `breed_id` int,
  `breed_name` string,
  `category` string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
LOCATION 's3://stack-exchange/48836509'
TBLPROPERTIES ('skip.header.line.count'='1')
and the following view based on it.
CREATE VIEW default.vdogs AS SELECT * FROM default.dogs;
When we show the DDL:
$ aws athena start-query-execution --query-string "SHOW CREATE VIEW default.vdogs" --result-configuration OutputLocation=s3://stack-exchange/66620228/
{
"QueryExecutionId": "ab21599f-d2f3-49ce-89fb-c1327245129e"
}
The result is written to S3, just like any Athena query:
$ cat ab21599f-d2f3-49ce-89fb-c1327245129e.txt
CREATE VIEW default.vdogs AS
SELECT *
FROM
default.dogs
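Putting the CLI steps together, the backup loop might look like this in boto3. This is a sketch: the polling is simplified (no sleep/backoff), and the Athena client is passed in so the logic can be exercised without AWS credentials:

```python
def fetch_view_ddl(athena, database, view_name, output_location):
    """Run SHOW CREATE VIEW and return the DDL text from the query results.

    athena is a boto3 Athena client; output_location is an s3:// prefix
    where Athena writes its result files.
    """
    qid = athena.start_query_execution(
        QueryString=f"SHOW CREATE VIEW {database}.{view_name}",
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until the query finishes (production code should sleep/back off here)
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} ended in state {state}")
    # Each result row holds one line of the DDL in its first column
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return "\n".join(r["Data"][0]["VarCharValue"] for r in rows)
```

A Lambda could iterate the names returned by SHOW VIEWS, call this per view, and commit the strings to CodeCommit or write them to a versioned S3 bucket.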

Restrict AppSync permissions in AWS CDK

I am trying to build an AppSync API connected to a DynamoDB table in AWS using the CDK in Python. I want this to be a read-only API with no create, update, or delete operations. In my stack I add the AppSync API:
# AppSync API for data and data catalogue queries
api = _appsync.GraphqlApi(self,
    'DataQueryAPI',
    name='dataqueryapi',
    log_config=_appsync.LogConfig(field_log_level=_appsync.FieldLogLevel.ALL),
    schema=_appsync.Schema.from_asset('graphql/schema.graphql')
)
I then add the DynamoDB table as a data source as follows:
# Data Catalogue DynamoDB Table as AppSync Data Source
data_catalogue_api_ds = api.add_dynamo_db_data_source(
    'DataCatalogueDS',
    data_catalogue_dynamodb
)
I later add some resolvers with mapping templates, but even after just the above, running cdk diff shows permission changes that appear to grant AppSync full access to the DynamoDB table.
I only want this to be a read only API and so the question is how can I restrict permissions so that the AppSync API can only read from the table?
What I have tried is adding a role that explicitly grants query permissions, in the hope that this would prevent the creation of the wider set of permissions, but it didn't have that effect, and I'm not really sure it was on the right track:
role = _iam.Role(self,
    "Role",
    assumed_by=_iam.ServicePrincipal("appsync.amazonaws.com")
)
api.grant_query(role, "getData")
Following a comment on this question, I have swapped add_dynamo_db_data_source for DynamoDbDataSource, since it has a read_only_access parameter. I am now using:
data_catalogue_api_ds = _appsync.DynamoDbDataSource(self,
    'DataCatalogueDS',
    table=data_catalogue_dynamodb,
    read_only_access=True,
    api=api
)
This then seems to grant only read permissions.
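With read_only_access=True, the construct grants the data source role only DynamoDB read actions. The resulting IAM statement has roughly this shape; the exact action list below is an illustrative assumption, not copied from the CDK's output:

```python
# Illustrative set of read-only DynamoDB actions; the CDK's actual list
# (from grant_read_data) may include a few more read/describe actions.
READ_ONLY_ACTIONS = [
    "dynamodb:BatchGetItem",
    "dynamodb:GetItem",
    "dynamodb:Query",
    "dynamodb:Scan",
]

def read_only_statement(table_arn):
    """Build an IAM policy statement granting only read actions on one table."""
    return {
        "Effect": "Allow",
        "Action": READ_ONLY_ACTIONS,
        # Cover the table and its secondary indexes
        "Resource": [table_arn, f"{table_arn}/index/*"],
    }
```

Comparing this shape against the cdk diff output is a quick way to confirm no write actions (PutItem, UpdateItem, DeleteItem) slipped into the generated policy.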

Restrict user from executing INSERT queries on athena

I want to restrict a user from executing INSERT queries on a master table (not a CTAS table) in Athena.
Is there a way I can achieve this?
The user will be executing queries from Lambda.
Athena's IAM actions, such as StartQueryExecution and StopQueryExecution, do not differentiate which type of SQL command (DDL, DML) is being executed.
However, you can work around this by denying permissions on Glue and S3, so that Athena queries that try to execute INSERTs will fail:
Glue permissions can be managed at the catalog, database, and table level; examples can be found in AWS's Identity-Based Policies (IAM Policies) for Access Control for Glue. Relevant Glue actions to deny: BatchCreatePartition, CreatePartition, UpdatePartition (see Actions, resources, and condition keys for AWS Glue).
On S3 you need to deny PutObject (or Put*) for the S3 location of the specific table (see Actions defined by Amazon S3); again, this can be scoped to an object prefix within a bucket.
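Combining both denies, the policy document could be sketched as follows. The table's S3 prefix ARN is a placeholder, and scoping the Glue deny to specific catalog resources is left out for brevity:

```python
def insert_deny_policy(table_s3_prefix_arn):
    """Deny the Glue partition writes and S3 object writes an INSERT needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Block partition creation/updates in the Glue catalog
                "Sid": "DenyPartitionWrites",
                "Effect": "Deny",
                "Action": [
                    "glue:BatchCreatePartition",
                    "glue:CreatePartition",
                    "glue:UpdatePartition",
                ],
                "Resource": "*",
            },
            {
                # Block writing new data files under the table's S3 location
                "Sid": "DenyTableDataWrites",
                "Effect": "Deny",
                "Action": "s3:PutObject",
                "Resource": table_s3_prefix_arn,
            },
        ],
    }
```

Attaching this to the Lambda's execution role leaves SELECTs working while INSERTs fail when Athena tries to write data or register partitions.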

"aws dynamodb list-tables" not showing the tables present

When I use:
aws dynamodb list-tables
I am getting:
{
"TableNames": []
}
I used the same default region here that I set during aws configure.
I also tried with a specific region name.
When I check the AWS console I also don't see any DynamoDB tables, but I am able to access the table programmatically, and I can add and modify items as well.
Still, aws dynamodb list-tables returns no result, and no tables appear when I check the console.
This is clearly a result of the commands looking in the wrong place.
DynamoDB tables are stored in an account within a region. So, if there is definitely a table but none are showing, then the credentials being used either belong to a different AWS Account or the command is being sent to the wrong region.
You can specify a region like this:
aws dynamodb list-tables --region ap-southeast-2
If you are able to access the table programmatically, then make sure the same credentials being used by your program are also being used by the AWS CLI.
If the table lives in DynamoDB Local (common when DynamoDB is accessed programmatically from a web app during development), you need to point the CLI at the same endpoint.
This command will work:
aws dynamodb list-tables --endpoint-url http://localhost:8080 --region us-west-2
Check the region you set up in AWS configuration vs what is displayed at the top of the AWS console. I had my app configured to us-east-2 but the AWS console had us-east-1 as the default. I was able to view my table once the correct region was selected in the AWS console.
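To double-check which region the CLI will resolve for a profile, you can read the same ini-style file the CLI does (~/.aws/config). A small sketch using only the standard library; note that real resolution also considers environment variables like AWS_DEFAULT_REGION:

```python
import configparser

def region_for_profile(config_text, profile="default"):
    """Return the region an aws-config-style file sets for a profile.

    In ~/.aws/config, non-default profiles use a "profile <name>" section
    header, while the default profile is just "[default]".
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = profile if profile == "default" else f"profile {profile}"
    return parser.get(section, "region", fallback=None)
```

If the region this returns differs from the one selected in the console (or hard-coded in your app), that mismatch explains the empty TableNames list.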

Athena queries between tables in different accounts

I can individually access two different Athena tables using two different IAM roles because each lie in different accounts.
Is there a way to run a single query that pulls from both (ie. INNER JOIN)?
Under the hood, Athena table data is in S3 bucket.
Athena supports cross-account S3 bucket access.
Assume you have AccountA and AccountB, with Athena tables TableA and TableB respectively.
Steps to run the query from AccountA's Athena (accessing cross-account data):
1. Grant the AccountA IAM role read access in the AccountB S3 bucket policy (where TableB's data resides).
2. Create TableB in AccountA's Athena, referring to the S3 bucket data in AccountB.
3. Use TableA and TableB in AccountA and do the inner join.
Ref (cross-account access): https://aws.amazon.com/athena/faqs/
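The bucket policy for the first step could be sketched like this; the role ARN and bucket name are placeholders:

```python
def cross_account_read_policy(reader_role_arn, bucket):
    """S3 bucket policy letting another account's role read the table data."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCrossAccountAthenaRead",
            "Effect": "Allow",
            # The AccountA role that Athena queries will run under
            "Principal": {"AWS": reader_role_arn},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",       # ListBucket applies to the bucket
                f"arn:aws:s3:::{bucket}/*",     # GetObject applies to the objects
            ],
        }],
    }
```

AccountB attaches this to the bucket holding TableB's data; after that, the TableB definition created in AccountA's catalog can point its LOCATION at that bucket and the join runs entirely in AccountA.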