DM create bigquery view then authorize it on dataset - google-cloud-platform

Using Google Deployment Manager, has anybody found a way to first create a view in BigQuery and then authorize that view on one or more datasets used by the view, where those datasets are sometimes in different projects and were not created/managed by Deployment Manager? Creating a dataset with a view wasn't too challenging. Here is the jinja template, named inventoryServices_bigquery_territory_views.jinja:
resources:
- name: territory-{{properties["OU"]}}
  type: gcp-types/bigquery-v2:datasets
  properties:
    datasetReference:
      datasetId: territory_{{properties["OU"]}}
- name: files
  type: gcp-types/bigquery-v2:tables
  properties:
    datasetId: $(ref.territory-{{properties["OU"]}}.datasetReference.datasetId)
    tableReference:
      tableId: files
    view:
      query: >
        SELECT DATE(DAY) DAY, ou, email, name, mimeType
        FROM `{{properties["files_table_id"]}}`
        WHERE LOWER(SPLIT(ou, "/")[SAFE_OFFSET(1)]) = "{{properties["OU"]}}"
      useLegacySql: false
The deployment configuration references the above template like this:
imports:
- path: inventoryServices_bigquery_territory_views.jinja

resources:
- name: inventoryServices_bigquery_territory_views
  type: inventoryServices_bigquery_territory_views.jinja
In the example above, files_table_id is the project.dataset.table that needs the newly created view authorized.
I have seen some examples of managing IAM at the project/folder/org level, but my need is at the dataset level, not the project level. Looking at the resource representation of a dataset, it seems like I can update access with a view entry for the newly created view, but I am a bit lost on how I would do that without removing the existing access entries, and for datasets in projects other than the one the new view is created in. Any help is appreciated.
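For reference, in the dataset resource representation the access field is a flat list where each entry is either a role grant or an authorized-view reference, roughly like the sketch below (the values are placeholders, and my understanding is that an update replaces the whole list rather than merging into it):
access:
- role: OWNER
  userByEmail: existing-owner@example.com   # placeholder for an existing grant that has to be carried over
- role: READER
  specialGroup: projectReaders              # placeholder for an existing grant that has to be carried over
- view:                                     # authorized-view entry, no role field
    projectId: my-project                   # placeholder project hosting the view
    datasetId: territory_emea               # placeholder dataset created by the template
    tableId: files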
Edit:
I tried adding the dataset which needs the view authorized like so, then deploy in preview mode just to see how it interprets the config:
-name: files-source
  type: gcp-types/bigquery-v2:datasets
  properties:
    datasetReference:
      datasetId: {{properties["files_table_id"]}}
    access:
      view:
        projectId: {{env['project']}}
        datasetId: $(ref.territory-{{properties["OU"]}}.datasetReference.datasetId)
        tableId: $(ref.territory_files.tableReference.tableId)
But when I deploy in preview mode it throws this error:
errors:
- code: MANIFEST_EXPANSION_USER_ERROR
  location: /deployments/inventoryservices-bigquery-territory-views-us/manifests/manifest-1582283242420
  message: |-
    Manifest expansion encountered the following errors: mapping values are not allowed here
      in "<unicode string>", line 26, column 7:
          type: gcp-types/bigquery-v2:datasets
          ^ Resource: config
This is strange to me; it's hard to make much sense of that error, since the line/column it points to is formatted exactly the same as the other dataset in the config. Maybe it doesn't like that the files-source dataset already exists and was created outside of Deployment Manager.

Related

Google Workflow insert a bigquery job that queries a federated Google Drive table

I am working on an ELT using Workflows. So far, very good. However, one of my tables is based on a Google Sheet, and that job fails with "Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials."
I know I need to add the https://www.googleapis.com/auth/drive scope to the request and the service account that is used by the workflow needs access to the sheet. The access is correct and if I do an authenticated insert using curl it works fine.
My logic is that I should add the drive scope. However I do not know where/how to add it. Am I missing something?
The step in the Workflow:
call: googleapis.bigquery.v2.jobs.insert
args:
  projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
  body:
    configuration:
      query:
        query: select * from `*****.domains_sheet_view`
        destinationTable:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          datasetId: ***
          tableId: domains
        create_disposition: CREATE_IF_NEEDED
        write_disposition: WRITE_TRUNCATE
        allowLargeResults: true
        useLegacySql: false
AFAIK, for connectors you cannot customize the scope parameter, but you can if you put the HTTP call together yourself.
Add the service account as a viewer on the Google Docs file, then run the workflow. Here is my program:
#workflow entrypoint
main:
  steps:
    - initialize:
        assign:
          - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
    - makeBQJob:
        call: BQJobsInsertJobWithSheets
        args:
          project: ${project}
          configuration:
            query:
              query: SELECT * FROM `ndc.autoritati_publice` LIMIT 10
              destinationTable:
                projectId: ${project}
                datasetId: ndc
                tableId: autoritati_destination
              create_disposition: CREATE_IF_NEEDED
              write_disposition: WRITE_TRUNCATE
              allowLargeResults: true
              useLegacySql: false
        result: res
    - final:
        return: ${res}

#subworkflow definitions
BQJobsInsertJobWithSheets:
  params: [project, configuration]
  steps:
    - runJob:
        try:
          call: http.post
          args:
            url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs"}
            headers:
              Content-type: "application/json"
            auth:
              type: OAuth2
              scope: ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
            body:
              configuration: ${configuration}
          result: queryResult
        except:
          as: e
          steps:
            - UnhandledException:
                raise: ${e}
        next: queryCompleted
    - pageNotFound:
        return: "Page not found."
    - authError:
        return: "Authentication error."
    - queryCompleted:
        return: ${queryResult.body}

Google Deployment Manager error when using manual IP allocation in NAT (HTTP 400)

Context
I am trying to associate serverless egress with a static IP address (GCP Docs). I have been able to set this up manually through the gcp-console, and now I am trying to implement it with Deployment Manager. However, once I add the NAT config on top of the IP address and the router, I get 400s, "Request contains an invalid argument.", which does not give me enough information to fix the problem.
# config.yaml
resources:
# addresses spec: https://cloud.google.com/compute/docs/reference/rest/v1/addresses
- name: serverless-egress-address
  type: compute.v1.address
  properties:
    region: europe-west3
    addressType: EXTERNAL
    networkTier: PREMIUM
# router spec: https://cloud.google.com/compute/docs/reference/rest/v1/routers
- name: serverless-egress-router
  type: compute.v1.router
  properties:
    network: projects/<project-id>/global/networks/default
    region: europe-west3
    nats:
    - name: serverless-egress-nat
      natIpAllocateOption: MANUAL_ONLY
      sourceSubnetworkIpRangesToNat: ALL_SUBNETWORKS_ALL_IP_RANGES
      natIPs:
      - $(ref.serverless-egress-address.selfLink)
# error response
code: RESOURCE_ERROR
location: /deployments/<deployment-name>/resources/serverless-egress-router
message: '{
  "ResourceType":"compute.v1.router",
  "ResourceErrorCode":"400",
  "ResourceErrorMessage":{
    "code":400,
    "message":"Request contains an invalid argument.",
    "status":"INVALID_ARGUMENT",
    "statusMessage":"Bad Request",
    "requestPath":"https://compute.googleapis.com/compute/v1/projects/<project-id>/regions/europe-west3/routers/serverless-egress-router",
    "httpMethod":"PUT"
  }}'
Notably, if I remove the 'natIPs' array and set 'natIpAllocateOption' to 'AUTO_ONLY', it goes through without errors. While this is not the configuration I need, it does narrow the problem down to these config options.
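For comparison, this is the variant of the nats block that deploys cleanly (automatic allocation, no natIPs):
nats:
- name: serverless-egress-nat
  natIpAllocateOption: AUTO_ONLY
  sourceSubnetworkIpRangesToNat: ALL_SUBNETWORKS_ALL_IP_RANGES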
Question
Which is the invalid argument?
Are there things outside of the YAML which I should check? In the docs it says the following, which makes me wonder if there are other caveats like it:
Note that if this field contains ALL_SUBNETWORKS_ALL_IP_RANGES or ALL_SUBNETWORKS_ALL_PRIMARY_IP_RANGES, then there should not be any other Router.Nat section in any Router for this network in this region.
I checked the API reference and passing the values that you used should work. Furthermore, if you talk directly to the API using a JSON payload with these values, it returns 200:
{
  "name": "nat",
  "network": "https://www.googleapis.com/compute/v1/projects/project/global/networks/nat1",
  "nats": [
    {
      "natIps": [
        "https://www.googleapis.com/compute/v1/projects/project/regions/us-central1/addresses/test"
      ],
      "name": "nat1",
      "natIpAllocateOption": "MANUAL_ONLY",
      "sourceSubnetworkIpRangesToNat": "ALL_SUBNETWORKS_ALL_IP_RANGES"
    }
  ]
}
From what I can see the request is correctly formed using methods other than Deployment Manager so there might be an issue in the tool.
I have filed an issue about this on Google's Issue Tracker for them to take a look at it.
The DM team might be able to shed light on what's happening here.

AWS CloudFormation & Service Catalog - Can I require tags with user values?

Our problem seems very basic, and I would expect it to be common.
We have tags that must always be applied (for billing). However, the tag values are only known at the time the stack is deployed... We don't know what the tag values will be when developing the stack, or when creating the product in the Service Catalog...
We don't want to wait until AFTER the resource is deployed to discover the tag is missing, so as cool as AWS Config may be, we don't want to rely on its rules if we don't have to.
So things like Tag Options don't work, because it appears they expect us to know the tag value months prior to deployment (which isn't the case).
Is there any way to mandate tags be used for a cloudformation template when it is deployed? Better yet, can we have service catalog query for a tag value when deploying? Tags like "system" or "project", for instance, come and go over time and are not known up-front for many types of cloudformation templates we develop.
Isn't this a common scenario?
I am worried that I am missing something very, very simple and basic which mandates tags be used up-front, but I can't seem to figure out what. Thank you in advance. I really did Google a lot before asking, without finding a satisfying answer.
I don't know anything about Service Catalog, but you can create Conditions and then use them to conditionally create (or even fail) your resources. For example, conditional resource creation:
Parameters:
  ResourceTag:
    Type: String
    Default: ''

Conditions:
  isTagEmpty:
    !Equals [!Ref ResourceTag, '']
  isTagNotEmpty:
    !Not [!Equals [!Ref ResourceTag, '']]

Resources:
  DBInstance:
    Type: AWS::RDS::DBInstance
    Condition: isTagNotEmpty
    Properties:
      DBInstanceClass: <DB Instance Type>
Here the RDS DB instance will only be created if the tag is non-empty, but CloudFormation will still return success either way.
Alternatively, you can make the resource creation itself fail.
Resources:
  DBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: !If [isTagEmpty, !Ref "AWS::NoValue", <DB instance type>]
I haven't tried this, but it should fail because DBInstanceClass resolves to AWS::NoValue (i.e., is omitted) when the tag is empty.
Edit: You can also create your stack using the createStack CFN API. Write some code to read & validate the input (e.g. read from service catalog) & call the createStack API. I am doing the same from Lambda (nodejs) reading some input from Parameter Store. Sample code -
// Assumes the AWS SDK for JavaScript v2, with SSM and CloudFormation clients.
const AWS = require('aws-sdk');
const ssm = new AWS.SSM();
const cfn = new AWS.CloudFormation();

module.exports.create = async (event, context, callback) => {
  let request = JSON.parse(event.body);
  let subnetids = await ssm.getParameter({
    Name: '/vpc/public-subnets'
  }).promise();
  let securitygroups = await ssm.getParameter({
    Name: '/vpc/lambda-security-group'
  }).promise();
  let params = {
    StackName: request.customerName, /* required */
    Capabilities: [
      'CAPABILITY_IAM',
      'CAPABILITY_NAMED_IAM',
      'CAPABILITY_AUTO_EXPAND',
      /* more items */
    ],
    ClientRequestToken: 'qwdfghjk3912',
    EnableTerminationProtection: false,
    OnFailure: request.onfailure,
    Parameters: [
      {
        ParameterKey: "SubnetIds",
        ParameterValue: subnetids.Parameter.Value,
      },
      {
        ParameterKey: 'SecurityGroupIds',
        ParameterValue: securitygroups.Parameter.Value,
      },
      {
        ParameterKey: 'OpsPoolArnList',
        ParameterValue: request.userPoolList,
      },
      /* more items */
    ],
    TemplateURL: request.templateUrl,
  };
  cfn.config.region = request.region;
  let result = await cfn.createStack(params).promise();
  console.log(result);
};
Another option: add an AWS Custom Resource backed by Lambda. Check the tags in that custom resource and return a failure if they don't satisfy your constraints. Make all other resources depend on this resource, so that they are only created if your checks pass (see the sketch below). The link also contains an example. You will also have to add handling for stack updates and deletion (like a default success). I think this is your best bet as of now.
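A minimal sketch of how that could be wired into the template, assuming a pre-existing validator Lambda whose ARN is passed in via the hypothetical TagValidatorFunctionArn parameter:
Parameters:
  ProjectTag:
    Type: String                 # tag value supplied at deploy time
  TagValidatorFunctionArn:
    Type: String                 # hypothetical ARN of the tag-checking Lambda

Resources:
  TagCheck:
    Type: Custom::TagCheck       # custom resource backed by the validator Lambda
    Properties:
      ServiceToken: !Ref TagValidatorFunctionArn
      project: !Ref ProjectTag   # tag value the Lambda should validate
  DBInstance:
    Type: AWS::RDS::DBInstance
    DependsOn: TagCheck          # only created if the tag check succeeds
    Properties:
      DBInstanceClass: <DB instance type>
      Tags:
        - Key: project
          Value: !Ref ProjectTag
If the Lambda signals FAILED for a missing or empty tag, the stack rolls back before the tagged resources are created.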

Error when trying to create a serviceaccount key in deployment manager

The error is below:
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1544517871651-57cbb1716c8b8-4fa66ff2-9980028f]: errors:
- code: MISSING_REQUIRED_FIELD
  location: /deployments/infrastructure/resources/projects/resources-practice/serviceAccounts/storage-buckets-backend/keys/json->$.properties->$.parent
  message: |-
    Missing required field 'parent' with schema:
    {
      "type" : "string"
    }
Below is my jinja template content:
resources:
- name: {{ name }}-keys
  type: iam.v1.serviceAccounts.key
  properties:
    name: projects/{{ properties["projectID"] }}/serviceAccounts/{{ serviceAccount["name"] }}/keys/json
    privateKeyType: enum(TYPE_GOOGLE_CREDENTIALS_FILE)
    keyAlgorithm: enum(KEY_ALG_RSA_2048)
P.S.
My reference for the properties is based on https://cloud.google.com/iam/reference/rest/v1/projects.serviceAccounts.keys
I will post the response of @John as the answer for the benefit of the community.
The parent was missing, needing an existing service account:
projects/{PROJECT_ID}/serviceAccounts/{ACCOUNT}
where ACCOUNT value can be the email or the uniqueID of the service account.
Regarding the template, please remove the enum() wrapping the privateKeyType and keyAlgorithm values.
This deployment creates service account credentials for an existing service account. To retrieve the downloadable JSON key file, expose it in the outputs section using the publicKeyData property, then base64-decode it.
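Putting those fixes together, a corrected version of the question's resource might look roughly like this (a sketch only: it adds the parent field the error asks for and drops the enum() wrappers; I have not verified whether the name property is still needed alongside parent):
resources:
- name: {{ name }}-keys
  type: iam.v1.serviceAccounts.key
  properties:
    # parent references the existing service account that will own the key
    parent: projects/{{ properties["projectID"] }}/serviceAccounts/{{ serviceAccount["name"] }}
    privateKeyType: TYPE_GOOGLE_CREDENTIALS_FILE
    keyAlgorithm: KEY_ALG_RSA_2048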

Composite index required does not exist, yet defined in index.yaml

I have some IoT devices that are sending some data into a Google Cloud Datastore.
The Datastore is setup as Cloud Firestore in Datastore mode.
Each row has the following fields:
Name/ID
current_temperature
data
device_id
event
gc_pub_sub_id
published_at
target_temperature
And these are all under the ParticleEvent kind.
I wish to run the following query: select current_temperature, target_temperature from ParticleEvent where device_id = 'abc123' order by published_at desc.
I get the below error when I try to run that query:
GQL query error: Your Datastore does not have the composite index (developer-supplied) required for this query.
So I set up an index.yaml file with the following contents:
indexes:
- kind: ParticleEvent
  properties:
  - name: data
  - name: device_id
  - name: published_at
    direction: desc
- kind: ParticleEvent
  properties:
  - name: current_temperature
  - name: target_temperature
  - name: device_id
  - name: published_at
    direction: desc
I used the gcloud tool to send this successfully up to the datastore and I can see both indexes in the indexes tab.
However I still get the above error when I try to run the query.
What do I need to add/change to my indexes to get this query to work?
Though in the comment I simply suggested select * (that's the best way, I think), there is a way to make your query work:
- kind: ParticleEvent
  properties:
  - name: device_id
  - name: published_at
    direction: desc
  - name: current_temperature
  - name: target_temperature
The reason is that the projection (select) is applied last, so current_temperature and target_temperature need to sit lower in the index, after the filter and sort properties.
The reason I don't suggest this approach is that, as your data grows and you need more index combinations just to select specific columns, your index size will grow exponentially.
But if you are sure you will only use this once and will always query the data like this, then feel free to index it this way.
The same goes if the connection bandwidth between your computer and Google Cloud is so small that downloading the extra columns causes noticeable lag.