What precautions do I need to take when sharing an AWS Amplify project publicly?

I'm creating a security camera IoT project that uploads images to S3 and will soon offer a UI to review those images. AWS Amplify is being used to make this happen quickly.
As I get started on the Amplify side of things, I'm noticing config files with very specifically named attributes and values. The team-provider-info.json file in particular, which is not ignored by git, is very specific:
{
  "dev": {
    "awscloudformation": {
      "AuthRoleName": "amplify-twintigersecurityweb-dev-123456-authRole",
      "UnauthRoleArn": "arn:aws:iam::111164163333:role/amplify-twintigersecurityweb-dev-123456-unauthRole",
      "AuthRoleArn": "arn:aws:iam::111164163333:role/amplify-twintigersecurityweb-dev-123456-authRole",
      "Region": "us-east-1",
      "DeploymentBucketName": "amplify-twintigersecurityweb-dev-123456-deployment",
      "UnauthRoleName": "amplify-twintigersecurityweb-dev-123456-unauthRole",
      "StackName": "amplify-twintigersecurityweb-dev-123456",
      "StackId": "arn:aws:cloudformation:us-east-1:111164163333:stack/amplify-twintigersecurityweb-dev-123456/88888888-8888-8888-8888-888838f58888",
      "AmplifyAppId": "dddd7dx2zipppp"
    }
  }
}
May I post this to my public repository without worry? Is there a chance for conflict in naming? How would one pull this in for use in their new project?

Per AWS Amplify documentation:
If you want to share a project publicly and open source your serverless infrastructure, you should remove or put the amplify/team-provider-info.json file in gitignore file.
At a glance, everything else generated by amplify init that is NOT in the .gitignore file is OK to share, e.g. project-config.json and backend-config.json.
Add this to .gitignore:
# not to share if public
amplify/team-provider-info.json

Related

Google Dataprep copy flows from one project to another

I have two Google projects: dev and prod. I also import data from different storage buckets located in these projects: dev-bucket and prod-bucket.
After I have made and tested changes in the dev environment, how can I smoothly apply (deploy/copy) the changes to prod as well?
What I do now is export the flow from dev and then re-import it into prod. However, each time I need to manually do the following in the prod flows:
Change the datasets that serve as inputs in the flow
Replace the manual and scheduled destinations with the right BigQuery dataset (dev-dataset-bigquery and prod-dataset-bigquery)
How can this be done more smoothly?
If you want to copy data between Google Cloud Storage (GCS) buckets dev-bucket and prod-bucket, Google provides a Storage Transfer Service with this functionality (https://cloud.google.com/storage-transfer/docs/create-manage-transfer-console). You can either manually trigger data to be copied from one bucket to another or have it run on a schedule.
For the second part, it sounds like both dev-dataset-bigquery and prod-dataset-bigquery are loaded from files in GCS? If this is the case, the BigQuery Transfer Service may be of use (https://cloud.google.com/bigquery/docs/cloud-storage-transfer). You can trigger a transfer job manually, or have it run on a schedule.
As others have said in the comments, if you need to verify data before initiating transfers from dev to prod, a CI system such as Spinnaker may help. If the verification can be automated, a system such as Apache Airflow (running on Cloud Composer, if you want a hosted version) provides more flexibility than the transfer services.
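If a one-off programmatic copy is enough rather than a managed transfer job, a minimal sketch using the google-cloud-storage client library could look like the following; the object prefix is a placeholder assumption, and the bucket names come from the question.
import requests  # not needed here; see the Dataprep sketch below
from google.cloud import storage

client = storage.Client()
src = client.bucket("dev-bucket")
dst = client.bucket("prod-bucket")

# Copy every object under an assumed prefix from the dev bucket to the prod bucket
for blob in client.list_blobs("dev-bucket", prefix="exports/"):
    src.copy_blob(blob, dst, blob.name)
    print(f"Copied gs://dev-bucket/{blob.name} -> gs://prod-bucket/{blob.name}")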
Follow the procedure below to move a plan from one environment to another using the API, and to update the dataset and the output for the new environment.
1) Export a plan
GET https://api.clouddataprep.com/v4/plans/<plan_id>/package
2) Import the plan
POST https://api.clouddataprep.com/v4/plans/package
3) Update the input dataset
PUT https://api.clouddataprep.com/v4/importedDatasets/<dataset_id>
{
  "name": "<new_dataset_name>",
  "bucket": "<bucket_name>",
  "path": "<bucket_file_name>"
}
4) Update the output
PATCH https://api.clouddataprep.com/v4/outputObjects/<output_id>
{
  "publications": [
    {
      "path": [
        "<project_name>",
        "<dataset_name>"
      ],
      "tableName": "<table_name>",
      "targetType": "bigquery",
      "action": "create"
    }
  ]
}
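For illustration, the same sequence could be scripted in Python with the requests library, roughly as sketched below. The Bearer-token header, the multipart upload of the exported package, and all IDs and names are assumptions for the example; check the Dataprep API documentation for the exact auth and payload formats.
import requests

BASE = "https://api.clouddataprep.com/v4"
HEADERS = {"Authorization": "Bearer <access_token>"}  # assumed token-based auth

# 1) Export the plan package from the dev environment (assumed to be returned as a ZIP)
package = requests.get(f"{BASE}/plans/<plan_id>/package", headers=HEADERS)
package.raise_for_status()

# 2) Import the package into the prod environment (assumed multipart upload)
imported = requests.post(
    f"{BASE}/plans/package",
    headers=HEADERS,
    files={"file": ("plan.zip", package.content)},
)
imported.raise_for_status()

# 3) Point the imported dataset at the prod bucket and file
requests.put(
    f"{BASE}/importedDatasets/<dataset_id>",
    headers=HEADERS,
    json={
        "name": "<new_dataset_name>",
        "bucket": "prod-bucket",
        "path": "<bucket_file_name>",
    },
).raise_for_status()

# 4) Repoint the output publication at the prod BigQuery dataset
requests.patch(
    f"{BASE}/outputObjects/<output_id>",
    headers=HEADERS,
    json={
        "publications": [
            {
                "path": ["<project_name>", "prod-dataset-bigquery"],
                "tableName": "<table_name>",
                "targetType": "bigquery",
                "action": "create",
            }
        ]
    },
).raise_for_status()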

Permissions Issue with Google Cloud Data Fusion

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine, until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google managed Service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
    "reason" : "forbidden"
  } ],
  "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine though, I've no idea why the pipeline startup code is trying to create a bucket there, it does successfully create temporary buckets ( one called df-xxx and one called dataproc-xxx) in my project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding storage/admin roles to the various service accounts to no avail but that was before I realized it was attempting to access a different project entirely.
I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the Dataset Project ID by default, instead of your own project as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>".
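If you prefer to create that bucket from code rather than the console, a minimal sketch with the google-cloud-storage client library might look like this; the project ID, bucket name, and location are placeholders.
from google.cloud import storage

# Placeholders: use your own project ID and a globally unique bucket name
client = storage.Client(project="<your-project-id>")
bucket = client.create_bucket("<your-bucket-name>", location="US")
print(f"Created gs://{bucket.name}; set it as the Temporary Bucket Name in the BigQuery Source")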
You are missing the permission-setup steps after you create an instance. The instructions for giving your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance

Is there a way to get info at runtime about the Spark metrics configuration

I added a metrics.properties file to the resource directory (Maven project) with a CSV sink. Everything is fine when I run the Spark app locally - metrics appear. But when I deploy the same fat jar to Amazon EMR, I do not see any attempt to put metrics into the CSV sink. So I want to check at runtime what settings are loaded for the Spark metrics subsystem. Is there any possibility to do this?
I looked into SparkEnv.get.metricsSystem but didn't find anything.
That is basically because Spark on EMR is not picking up your custom metrics.properties file from the resources dir of the fat jar.
For EMR, the preferred way to configure this is through the EMR Configurations API, in which you pass the classification and properties as embedded JSON.
For the Spark metrics subsystem, here is an example that modifies a couple of metrics properties:
[
  {
    "Classification": "spark-metrics",
    "Properties": {
      "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
      "*.sink.csv.period": "1"
    }
  }
]
You can use this JSON when creating the EMR cluster through the Amazon Console or through the SDK.
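As a rough sketch of the SDK route using boto3 (the release label, instance types, and roles below are placeholder assumptions; only the Configurations entry carries the metrics settings):
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Placeholder cluster settings; the Configurations entry is what wires up the CSV sink
response = emr.run_job_flow(
    Name="spark-metrics-example",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    Configurations=[
        {
            "Classification": "spark-metrics",
            "Properties": {
                "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
                "*.sink.csv.period": "1",
            },
        }
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])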

terraform apply keeps changing things even though no tf files have changed

I have a moderately complex terraform setup with
a module directory containing a main.tf, variables.tf and input.tf
and an environments directory containing foo.tf, variables.tf and vars.tf
I can successfully run terraform apply and everything succeeds.
But, if I immediately run terraform apply again it makes changes.
The changes it keeps making are to resources in the module...resources that get attributes from variables in the environments tf files. I'm creating an MQ broker and a dashboard to monitor it.
In the environments directory
top.tf
module "broker" {
source = "modules/broker"
dashboard = "...."
}
In the modules directory
input.tf
variable "dashboard" {
}
amazonmq.tf
resource "aws_cloudwatch_dashboard" "mydash" {
dashboard_name = "foo"
dashboard_body = "${dashboard}"
}
Every time I run terraform apply it says it needs to change the dashboard. Any hints on what I'm doing wrong? (I've tried running with TF_LOG=DEBUG but I can't see anything that says why a change is needed). Thanks in advance.
This seems to be an issue with the Terraform provider code itself. The dashboard_body property should have the computed flag attached to it, to allow you to provide it but ignore any incoming changes from AWS.
I've opened up an issue on the github page. You'll find it here: https://github.com/terraform-providers/terraform-provider-aws/issues/5729

What does 'Logging' do in Dockerrun.aws.json

I'm struggling to work out what the Logging tag does in the Dockerrun.aws.json file for a Single Container Docker Configuration. All the official docs say about it is: "Logging – Maps the log directory inside the container."
This sounds like they essentially create a volume from /var/log on the EC2 instance to a directory in the docker filesystem as specified by Logging. I have the following Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  ...
  "Logging": "/var/log/supervisor"
}
However, when I go to the AWS Console and request the logs for my instance, none of my custom log files located in /var/log/supervisor are in the log bundles. Can anyone explain to me what the purpose of this Logging tag is and how I may use it (or not) to retrieve my custom logs?
EDIT
Here are the Volume mappings for my container (didn't think to check that):
"Volumes": {
"/var/cache/nginx": "/var/lib/docker/vfs/dir/ff6ecc190ba3413660a946c557f14a104f26d33ecd13a1a08d079a91d2b5158e",
"/var/log/supervisor": "/var/log/eb-docker/containers/eb-current-app"
},
"VolumesRW": {
"/var/cache/nginx": true,
"/var/log/supervisor": true
}
It turns out that /var/log/supervisor is mapping to /var/log/eb-docker/containers/eb-current-app rather than /var/log as I originally suspected. It'd be nice if this was clearer in the documentation.
But it also turns out that I was running the wrong Docker Image which explains why my log files weren't appearing anywhere! Doh!