I need to check the list of aws_vpc_endpoint_service_allowed_principal from a specific aws_vpc_endpoint_service.
The aws_vpc_endpoint_service data source does not return the list of allowed_principals.
Does anyone know how I can retrieve that information?
Since a data source for that resource does not exist, you can use the external data source with a custom script to query the required information.
Here's an example script (get_vpc_endpoint_service_permissions.sh) that fetches the required information:
#!/bin/bash
# Query the allowed principals for the endpoint service and emit them as a
# single JSON object, which is the shape the external data source expects.
sep=$(aws ec2 describe-vpc-endpoint-service-permissions --service-id vpce-svc-03d5ebb7d9579a2b3 --query 'AllowedPrincipals')
jq -n --arg sep "$sep" '{"sep":$sep}'
And here's how you consume it in Terraform:
data "external" "vpc_endpoint_service_permissions" {
program = ["bash", "get_vpc_endpoint_service_permissions.sh"]
}
output "vpc_endpoint_service_permissions" {
value = data.external.vpc_endpoint_service_permissions.result.sep
}
data.external.vpc_endpoint_service_permissions.result.sep contains the output of the bash script: a JSON array encoded as a string (the external data source only returns string values), which you can access/manipulate as needed.
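If you need the principals back as a Terraform value (a list of objects) rather than a string, a minimal sketch, assuming Terraform 0.12+ where jsondecode is available:

locals {
  # decode the JSON string returned by the external data source
  allowed_principals = jsondecode(data.external.vpc_endpoint_service_permissions.result.sep)
}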
I have a CSV file in AWS S3.
The file is very large: 2.5 gigabytes.
The file has a single column of strings, over 120 million of them:
apc.com
xyz.com
ggg.com
dddd.com
...
How can I query the file to determine if the string xyz.com is located in the file?
I only need to know whether the string is there or not; I don't need to return the file.
It would also be great if I could pass multiple strings to search for and get back only the ones that were found in the file.
For example:
Query => ['xyz.com','fds.com','ggg.com']
Will return => ['xyz.com','ggg.com']
The "S3 Select" SelectObjectContent API enables applications to retrieve only a subset of data from an object by using simple SQL expressions. Here's a Python example:
import boto3

client = boto3.client("s3")

res = client.select_object_content(
    Bucket="my-bucket",
    Key="my.csv",
    ExpressionType="SQL",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},  # or IGNORE, USE
    OutputSerialization={"JSON": {}},
    Expression="SELECT * FROM S3Object s WHERE _1 IN ['xyz.com', 'ggg.com']",  # _1 refers to the first column
)
See this AWS blog post for an example with output parsing.
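If you prefer to parse the response yourself, a minimal sketch of reading the event stream returned by select_object_content above (matching rows come back as JSON lines inside Records events):

matches = []
for event in res["Payload"]:
    if "Records" in event:
        # each Records event carries a chunk of the query output as bytes
        matches.append(event["Records"]["Payload"].decode("utf-8"))
print("".join(matches))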
If you use the aws s3 cp command you can send the output to stdout:
aws s3 cp s3://yourbucket/foo.csv - | grep 'apc.com'
The dash (-) will send the output to stdout.
Here are two examples of grep checking for multiple patterns:
aws s3 cp s3://yourbucket/foo.csv - | grep -e 'apc.com' -e 'dddd.com'
aws s3 cp s3://yourbucket/foo.csv - | grep 'apc.com\|dddd.com'
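If you have a long list of strings to check, you can also keep them in a file, one per line, and pass that to grep (patterns.txt is a hypothetical file):

aws s3 cp s3://yourbucket/foo.csv - | grep -F -f patterns.txt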
To learn more about grep, please look at the manual: GNU Grep 3.7
What are some of the options to back up BigQuery DDLs - particularly views, stored procedure and function code?
We have a significant amount of code in BigQuery and we want to automatically back this up and preferably version it as well. Wondering how others are doing this.
Appreciate any help.
Thanks!
In order to keep and track our BigQuery structure and code, we're using Terraform to manage every resource in BigQuery.
More specifically to your question, we use the google_bigquery_routine resource so that changes are reviewed by other team members, along with every other benefit you get from working with VCS.
Another important part of our Terraform code is that we version our BigQuery module (via GitHub releases/tags), which includes the table structures and routines, and use it across multiple environments.
Looks something like:
main.tf
module "bigquery" {
source = "github.com/sample-org/terraform-modules.git?ref=0.0.2/bigquery"
project_id = var.project_id
...
... other vars for the module
...
}
terraform-modules/bigquery/main.tf
resource "google_bigquery_dataset" "test" {
dataset_id = "dataset_id"
project_id = var.project_name
}
resource "google_bigquery_routine" "sproc" {
dataset_id = google_bigquery_dataset.test.dataset_id
routine_id = "routine_id"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = "CREATE FUNCTION Add(x FLOAT64, y FLOAT64) RETURNS FLOAT64 AS (x + y);"
}
This helps us upgrade our infrastructure across all environments without additional code changes.
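Cutting a new module version is then just a matter of tagging the modules repository and bumping the ref in each environment when you are ready (the tag name is illustrative):

git tag 0.0.3
git push origin 0.0.3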
We finally ended up backing up DDLs and routines using INFORMATION_SCHEMA. A scheduled job extracts the relevant metadata and then uploads the content into GCS.
Example SQLs:
select * from <schema>.INFORMATION_SCHEMA.ROUTINES;
select * from <schema>.INFORMATION_SCHEMA.VIEWS;
select *, DDL from <schema>.INFORMATION_SCHEMA.TABLES;
You have to explicitly specify DDL in the column list for the table DDLs to show up.
Please check the documentation as these things evolve rapidly.
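For example, a minimal sketch of the extract-and-upload step (the dataset, bucket, and file names are illustrative):

bq query --use_legacy_sql=false --format=prettyjson \
  'SELECT * FROM mydataset.INFORMATION_SCHEMA.ROUTINES' > routines_backup.json
gsutil cp routines_backup.json gs://my-backup-bucket/routines_backup.json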
I write a tables/views and a routines (stored procedures and functions) definition file nightly to Cloud Storage using Cloud Run. See this tutorial about setting it up. Cloud Run exposes an HTTP endpoint that is scheduled with Cloud Scheduler. It essentially runs this script:
#!/usr/bin/env bash
set -eo pipefail

GCLOUD_REPORT_BUCKET="myproject-code/backups"
objects_report="gs://${GCLOUD_REPORT_BUCKET}/objects-backup-report-$(date +%s).txt"
routines_report="gs://${GCLOUD_REPORT_BUCKET}/routines-backup-report-$(date +%s).txt"
project_id="myproject-dw"
table_defs=""
routine_defs=""

# get list of datasets and table definitions
datasets=$(bq ls --max_results=1000 | grep -v -e "fivetran*" | awk '{print $1}' | tail -n +3)
for dataset in $datasets
do
  echo "${project_id}:${dataset}"

  # collect table and view definitions
  tables=$(bq ls --max_results 1000 "${project_id}:${dataset}" | awk '{print $1}' | tail -n +3)
  for table in $tables
  do
    echo "${project_id}:${dataset}.${table}"
    table_defs+="$(bq show --format=prettyjson "${project_id}:${dataset}.${table}")"
  done

  # collect routine (stored proc and function) definitions
  routines=$(bq ls --max_results 1000 --routines=true "${project_id}:${dataset}" | awk '{print $1}' | tail -n +3)
  for routine in $routines
  do
    echo "${project_id}:${dataset}.${routine}"
    routine_defs+="$(bq show --format=prettyjson --routine=true "${project_id}:${dataset}.${routine}")"
  done
done

echo "$table_defs" | jq '.' | gsutil -q cp -J - "${objects_report}"
echo "$routine_defs" | jq '.' | gsutil -q cp -J - "${routines_report}"

# /dev/stderr is sent to Cloud Logging.
echo "objects-backup-report: wrote to ${objects_report}" >&2
echo "Wrote objects report to ${objects_report}"
echo "routines-backup-report: wrote to ${routines_report}" >&2
echo "Wrote routines report to ${routines_report}"
The output is essentially the same as running bq ls and bq show for all datasets, with the results piped to a text file with a date. I may add this to git, but the file includes a timestamp, so you know the state of BigQuery by reviewing the file for a certain date.
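The Cloud Scheduler side is just an HTTP job pointing at the Cloud Run endpoint; a minimal sketch (the job name, schedule, and URL are placeholders):

gcloud scheduler jobs create http bq-ddl-backup \
  --schedule="0 3 * * *" \
  --uri="https://my-backup-service-xyz-uc.a.run.app/" \
  --http-method=POST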
I have >100 files where each line is a JSON object. It looks something like this (no commas & no []):
{"one":"one","two":{"tree":...}}
{"one":"one","two":{"tree":...}}
...
{"one":"one","two":{"tree":...}}
To be able to use aws firehose put-record-batch, the file needs to be in the format:
[
{
"Data": blob
},
{
"Data": blob
},
...
]
I want to send all of these files to AWS Firehose from the terminal.
I'm looking to write a shell script that looks something like this:
for file in files
do
aws firehose put-record-batch --delivery-stream-name <name> --records file://$file
done
So there are two questions:
How to transform the files into the applicable format?
And how to iterate through all the files?
for file in *.json;
do
  jq -s . "${file}" > "${file}.tmp" && mv "${file}.tmp" "$file"
done
This will read every JSON file in the current directory, change it into the desired array form, and save it back to the same file.
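If you also need each record wrapped under a Data key, as in the put-record-batch format shown in the question, a sketch of the same jq loop (assuming each original line should be passed as a JSON string blob):

for file in *.json;
do
  # tojson re-serializes each object as a string so it can sit under "Data"
  jq -s 'map({Data: tojson})' "${file}" > "${file}.tmp" && mv "${file}.tmp" "$file"
done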
Or, if you do not have jq, here is an alternative using Python's json module (with awk adding the surrounding brackets and the commas between records):
for file in *.json; do
  while read -r line; do
    # pretty-print/validate each line as JSON
    echo "$line" | python -m json.tool
  done < "${file}" | awk 'BEGIN{print "["} /^{/ && NR>1 {print ","} {print} END{print "]"}'
done
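And if you want that same Data wrapper without jq, a pure-Python sketch (the file handling is illustrative):

import json
import sys

# read newline-delimited JSON from the file given on the command line and
# print a put-record-batch style array
records = []
with open(sys.argv[1]) as f:
    for line in f:
        line = line.strip()
        if line:
            json.loads(line)  # validate that the line is JSON
            records.append({"Data": line})
print(json.dumps(records, indent=2))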
Can someone help me out with handling a dynamic "ParameterValue" in a parameters.json file?
I'm running cloudformation create-stack and passing a parameters.json file via --parameters. A few "ParameterValue" entries in the file need to be dynamic, for example a timestamp or index values appended in a loop. How can I modify the parameters.json file to handle dynamic values?
An alternative I could go with is to not use the parameters.json file at all and instead pass the key and value like below to the create-stack command inside the loop in the script:
--parameters ParameterKey="XYZ",ParameterValue="${someval}${index}"
I would create a parameters.json.template file to hold the values in their parameterized form, like you show:
[
  {
    "ParameterKey": "XYZ",
    "ParameterValue": "{someval}{index}"
  },
  {
    "ParameterKey": "ABC",
    "ParameterValue": "staticval-{suffix}"
  }
]
I am assuming you are doing this on the CLI, based on the use of the --parameters flag. In that case, I would create a script that merges the template file with the values (into a generated file) and calls the create-stack CLI command after that.
Something like this on Linux:
#!/bin/bash

# create the output file from the template
cp templates/parameters.json.template generated/parameters.json

# merge dynamic values into the templated file
sed -i "s/{someval}/$SOME_VAL/g" generated/parameters.json
sed -i "s/{index}/$INDEX/g" generated/parameters.json
sed -i "s/{suffix}/$SUFFIX/g" generated/parameters.json

aws cloudformation create-stack ... --parameters file://generated/parameters.json ...
This of course assumes your script has access to your dynamic values.
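For example, if you are creating one stack per index inside a loop, the whole thing might look like this (the stack name, template file, and values are illustrative):

#!/bin/bash
SOME_VAL="myapp"
SUFFIX="dev"
for INDEX in 1 2 3; do
  # render a parameters file for this index from the template
  cp templates/parameters.json.template "generated/parameters-${INDEX}.json"
  sed -i "s/{someval}/${SOME_VAL}/g; s/{index}/${INDEX}/g; s/{suffix}/${SUFFIX}/g" "generated/parameters-${INDEX}.json"
  aws cloudformation create-stack \
    --stack-name "mystack-${INDEX}" \
    --template-body file://template.yaml \
    --parameters "file://generated/parameters-${INDEX}.json"
done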
I am using 'terraform apply' in a shell script to create multiple EC2 instances. I need to output the list of generated IPs to a script variable and use the list in another sub-script. I have defined an output variable for the IPs in a Terraform config file - 'instance_ips':
output "instance_ips" {
value = [
"${aws_instance.gocd_master.private_ip}",
"${aws_instance.gocd_agent.*.private_ip}"
]
}
However, the terraform apply command is printing the entire EC2 creation output, not just the output variables.
terraform init \
-backend-config="region=$AWS_DEFAULT_REGION" \
-backend-config="bucket=$TERRAFORM_STATE_BUCKET_NAME" \
-backend-config="role_arn=$PROVISIONING_ROLE" \
-reconfigure \
"$TERRAFORM_DIR"
OUTPUT=$(terraform apply \
  <input variables, e.g. -var="aws_region=$AWS_DEFAULT_REGION"> \
  -auto-approve \
  -input=false \
  "$TERRAFORM_DIR"
)
terraform output instance_ips
So the 'OUTPUT' script variable content is
Terraform command: apply
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically use this backend unless the backend configuration changes.
Initializing provider plugins...
Terraform has been successfully initialized!
.
.
.
aws_route53_record.gocd_agent_dns_entry[2]: Creation complete after 52s (ID: <zone ............................)
aws_route53_record.gocd_master_dns_entry: Creation complete after 52s (ID: <zone ............................)
aws_route53_record.gocd_agent_dns_entry[1]: Creation complete after 53s (ID: <zone ............................)
Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
Outputs:
instance_ips = [ 10.39.209.155, 10.39.208.44, 10.39.208.251, 10.39.209.227 ]
instead of just the EC2 IPs.
Running 'terraform output instance_ips' throws an 'Initialization Required' error, which I understand means 'terraform init' is required.
Is there any way to suppress the EC2 creation output and print just the output variables? If not, how can I retrieve the IPs using the 'terraform output' command without needing to do a terraform init?
If I understood the context correctly, you can actually create a file in that directory and have your sub-shell script use that file. You can do it by using a null_resource or a local_file resource.
Here is how we can use it in a modularized structure -
Using null_resource -
resource "null_resource" "instance_ips" {
triggers {
ip_file = "${sha1(file("${path.module}/instance_ips.txt"))}"
}
provisioner "local-exec" {
command = "echo ${module.ec2.instance_ips} >> instance_ips.txt"
}
}
Using local_file -
resource "local_file" "instance_ips" {
content = "${module.ec2.instance_ips}"
filename = "${path.module}/instance_ips.txt"
}
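Either way, the sub-shell script can then read the addresses from the file, for example (a sketch that assumes the file ends up with one IP per line):

while read -r ip; do
  echo "configuring ${ip}"
done < instance_ips.txt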