I have some IoT devices that are sending some data into a Google Cloud Datastore.
The Datastore is set up as Cloud Firestore in Datastore mode.
Each row has the following fields:
Name/ID
current_temperature
data
device_id
event
gc_pub_sub_id
published_at
target_temperature
And these are all under the ParticleEvent kind.
I wish to run the following query: select current_temperature, target_temperature from ParticleEvent where device_id = 'abc123' order by published_at desc
I get the below error when I try to run that query:
GQL query error: Your Datastore does not have the composite index (developer-supplied) required for this query.
So I set up an index.yaml file with the following contents:
indexes:
- kind: ParticleEvent
  properties:
  - name: data
  - name: device_id
  - name: published_at
    direction: desc
- kind: ParticleEvent
  properties:
  - name: current_temperature
  - name: target_temperature
  - name: device_id
  - name: published_at
    direction: desc
I used the gcloud tool to upload this to Datastore successfully, and I can see both indexes in the Indexes tab. However, I still get the above error when I try to run the query.
What do I need to add/change to my indexes to get this query to work?
Though in the comments I simply suggested select * (which I still think is the best approach), there is a way to make your query work:
- kind: ParticleEvent
  properties:
  - name: device_id
  - name: published_at
    direction: desc
  - name: current_temperature
  - name: target_temperature
The reason is that the projection (the select) is applied last, so current_temperature and target_temperature have to appear after the filter and sort properties in the index.
The reason I don't recommend this approach is that, as your data grows and you need more combinations of indexes just to select specific columns, the number of indexes you need, and their total size, grows very quickly.
But if you are sure you will only ever query the data this way, then feel free to index it like this. The same applies if the bandwidth between your machine and Google Cloud is so limited that downloading the extra columns causes noticeable lag.
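For completeness, if you go with select * instead, a single composite index covering just the filter and the sort should be enough; a minimal sketch of that index.yaml:
indexes:
- kind: ParticleEvent
  properties:
  - name: device_id
  - name: published_at
    direction: desc
Datastore can then serve select * from ParticleEvent where device_id = 'abc123' order by published_at desc from this one index, and you do not need a new composite index every time you want different columns back.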
I am trying to automate the process of updating IPs to help engineers whitelist IPs in an AWS WAF IP set. aws waf-regional update-ip-set returns a ChangeToken which has to be used in the next run of the update-ip-set command.
I am trying to achieve this automation through a Rundeck job (Community edition). Ideally engineers should not have access to the output of the previous job to retrieve the ChangeToken. What's the best way to accomplish this?
You can hide the step output using the "Mask Log Output by Regex" output filter.
Take a look at the following job definition example; the first step is just a simulation of getting the token, and its output is hidden by the filter.
- defaultTab: nodes
  description: ''
  executionEnabled: true
  id: fcf8cf5d-697c-42a1-affb-9cda02183fdd
  loglevel: INFO
  name: TokenWorkflow
  nodeFilterEditable: false
  plugins:
    ExecutionLifecycle: null
  scheduleEnabled: true
  sequence:
    commands:
    - exec: echo "abc123"
      plugins:
        LogFilter:
        - config:
            invalidKeyPattern: \s|\$|\{|\}|\\
            logData: 'false'
            name: mytoken
            regex: s*([^\s]+?)\s*
            type: key-value-data
        - config:
            maskOnlyValue: 'false'
            regex: .*
            replacement: '[SECURE]'
            type: mask-log-output-regex
    - exec: echo ${data.mytoken}
    keepgoing: false
    strategy: node-first
  uuid: fcf8cf5d-697c-42a1-affb-9cda02183fdd
The second step uses that token (to demonstrate the data passing between steps it simply prints the value generated in the first step; in your case the token would of course be consumed by another command).
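For instance, here is a sketch of what the first step could look like with a real token fetch instead of the simulated echo, assuming the AWS CLI is installed and configured on the Rundeck node:
- exec: aws waf-regional get-change-token --query ChangeToken --output text
  # reuse the same key-value-data and mask-log-output-regex LogFilter configs shown above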
Update (passing the data value to another job)
Just use the job reference step and put the data variable name on the remote job option as an argument.
Check the following example:
The first job generates the token (or gets it from your service, hiding the result like in the first example). Then, it calls another job that "receives" that data in an option (Job Reference Step > Arguments) using this format:
-token ${data.mytoken}
Where -token is the target job option name, and ${data.mytoken} is the current data variable name.
- defaultTab: nodes
  description: ''
  executionEnabled: true
  id: fcf8cf5d-697c-42a1-affb-9cda02183fdd
  loglevel: INFO
  name: TokenWorkflow
  nodeFilterEditable: false
  plugins:
    ExecutionLifecycle: null
  scheduleEnabled: true
  sequence:
    commands:
    - exec: echo "abc123"
      plugins:
        LogFilter:
        - config:
            invalidKeyPattern: \s|\$|\{|\}|\\
            logData: 'false'
            name: mytoken
            regex: s*([^\s]+?)\s*
            type: key-value-data
        - config:
            maskOnlyValue: 'false'
            regex: .*
            replacement: '[SECURE]'
            type: mask-log-output-regex
    - jobref:
        args: -token ${data.mytoken}
        group: ''
        name: ChangeRules
        nodeStep: 'true'
        uuid: b6975bbf-d6d0-411e-98a6-8ecb4c3f7431
    keepgoing: false
    strategy: node-first
  uuid: fcf8cf5d-697c-42a1-affb-9cda02183fdd
This is the job that receives the token and does something with it; the example just echoes the token, but the idea is to use it internally to perform some action, as in the first example (see the sketch after this job definition).
- defaultTab: nodes
  description: ''
  executionEnabled: true
  id: b6975bbf-d6d0-411e-98a6-8ecb4c3f7431
  loglevel: INFO
  name: ChangeRules
  nodeFilterEditable: false
  options:
  - name: token
  plugins:
    ExecutionLifecycle: null
  scheduleEnabled: true
  sequence:
    commands:
    - exec: echo ${option.token}
    keepgoing: false
    strategy: node-first
  uuid: b6975bbf-d6d0-411e-98a6-8ecb4c3f7431
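In your scenario, the echo step inside ChangeRules would be replaced by the actual WAF update. A rough sketch, where the IP set ID and the IP value are placeholders and the --updates shorthand is only an assumption about how you build your change set:
- exec: >-
    aws waf-regional update-ip-set
    --ip-set-id YOUR_IP_SET_ID
    --change-token ${option.token}
    --updates 'Action=INSERT,IPSetDescriptor={Type=IPV4,Value=192.0.2.44/32}'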
I am working on an ELT pipeline using Workflows. So far, so good. However, one of my tables is based on a Google Sheet, and that job fails with "Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials."
I know I need to add the https://www.googleapis.com/auth/drive scope to the request and the service account that is used by the workflow needs access to the sheet. The access is correct and if I do an authenticated insert using curl it works fine.
My logic is that I should add the drive scope. However I do not know where/how to add it. Am I missing something?
The step in the Workflow:
call: googleapis.bigquery.v2.jobs.insert
args:
  projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
  body:
    configuration:
      query:
        query: select * from `*****.domains_sheet_view`
        destinationTable:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          datasetId: ***
          tableId: domains
        create_disposition: CREATE_IF_NEEDED
        write_disposition: WRITE_TRUNCATE
        allowLargeResults: true
        useLegacySql: false
AFAIK, for connectors you cannot customize the scope parameter, but you can if you put together the HTTP call yourself.
Add the service account as a Viewer on the Google Sheet, then run the workflow. Here is my program:
# workflow entrypoint
main:
  steps:
  - initialize:
      assign:
      - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
  - makeBQJob:
      call: BQJobsInsertJobWithSheets
      args:
        project: ${project}
        configuration:
          query:
            query: SELECT * FROM `ndc.autoritati_publice` LIMIT 10
            destinationTable:
              projectId: ${project}
              datasetId: ndc
              tableId: autoritati_destination
            create_disposition: CREATE_IF_NEEDED
            write_disposition: WRITE_TRUNCATE
            allowLargeResults: true
            useLegacySql: false
      result: res
  - final:
      return: ${res}

# subworkflow definitions
BQJobsInsertJobWithSheets:
  params: [project, configuration]
  steps:
  - runJob:
      try:
        call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs"}
          headers:
            Content-type: "application/json"
          auth:
            type: OAuth2
            scope: ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
          body:
            configuration: ${configuration}
        result: queryResult
      except:
        as: e
        steps:
        - UnhandledException:
            raise: ${e}
      next: queryCompleted
  - pageNotFound:
      return: "Page not found."
  - authError:
      return: "Authentication error."
  - queryCompleted:
      return: ${queryResult.body}
Using Google Deployment Manager, has anybody found a way to first create a view in BigQuery and then authorize it on one or more datasets used by the view, when those datasets are sometimes in different projects and were not created/managed by Deployment Manager? Creating a dataset with a view wasn't too challenging. Here is the Jinja template, named inventoryServices_bigquery_territory_views.jinja:
resources:
- name: territory-{{properties["OU"]}}
  type: gcp-types/bigquery-v2:datasets
  properties:
    datasetReference:
      datasetId: territory_{{properties["OU"]}}
- name: files
  type: gcp-types/bigquery-v2:tables
  properties:
    datasetId: $(ref.territory-{{properties["OU"]}}.datasetReference.datasetId)
    tableReference:
      tableId: files
    view:
      query: >
        SELECT DATE(DAY) DAY, ou, email, name, mimeType
        FROM `{{properties["files_table_id"]}}`
        WHERE LOWER(SPLIT(ou, "/")[SAFE_OFFSET(1)]) = "{{properties["OU"]}}"
      useLegacySql: false
The deployment configuration references the above template like this:
imports:
- path: inventoryServices_bigquery_territory_views.jinja
resources:
- name: inventoryServices_bigquery_territory_views
  type: inventoryServices_bigquery_territory_views.jinja
In the example above, files_table_id is the project.dataset.table that needs the newly created view authorized.
I have seen some examples of managing IAM at the project/folder/org level, but my need is at the dataset level, not the project level. Looking at the resource representation of a dataset, it seems like I could update access.view with the newly created view, but I am a bit lost on how to do that without removing existing access levels, and for datasets in projects other than the one the new view is created in. Any help appreciated.
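For reference, the access list in the dataset resource representation I am referring to mixes authorized views with the other entries, roughly like this (a sketch of the API shape, not my actual configuration):
access:
- role: OWNER
  userByEmail: someone@example.com
- view:
    projectId: other-project
    datasetId: territory_x
    tableId: files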
Edit:
I tried adding the dataset which needs the view authorized like so, then deployed in preview mode just to see how it interprets the config:
-name: files-source
type: gcp-types/bigquery-v2:datasets
properties:
datasetReference:
datasetId: {{properties["files_table_id"]}}
access:
view:
projectId: {{env['project']}}
datasetId: $(ref.territory-{{properties["OU"]}}.datasetReference.datasetId)
tableId: $(ref.territory_files.tableReference.tableId)
But when I deploy in preview mode it throws this error:
errors:
- code: MANIFEST_EXPANSION_USER_ERROR
location: /deployments/inventoryservices-bigquery-territory-views-us/manifests/manifest-1582283242420
message: |-
Manifest expansion encountered the following errors: mapping values are not allowed here
in "<unicode string>", line 26, column 7:
type: gcp-types/bigquery-v2:datasets
^ Resource: config
This is strange to me; it's hard to make much sense of that error, since the line/column it points to is formatted exactly the same as the other dataset in the config. Maybe it just doesn't like that the files-source dataset already exists and was created outside of Deployment Manager.
Consider the following config for ansible's gcp_compute inventory plugin:
plugin: gcp_compute
projects:
- myproj
scopes:
- https://www.googleapis.com/auth/compute
filters:
- ''
groups:
  connect: '"connect" in list"'
  gcp: 'True'
auth_kind: serviceaccount
service_account_file: ~/.gsutil/key.json
This works for me, and will put all hosts in the gcp group as expected. So far so good.
However, I'd like to group my machines based on certain substrings appearing in their names. How can I do this?
Or, more broadly, how can I find a description of the various variables available to the jinja expressions in the groups dictionary?
The variables available are the keys inside each item of the instances.list response, as listed here: https://cloud.google.com/compute/docs/reference/rest/v1/instances/list
So, for my example:
plugin: gcp_compute
projects:
- myproj
scopes:
- https://www.googleapis.com/auth/compute
filters:
- ''
groups:
  connect: "'connect' in name"
  gcp: 'True'
auth_kind: serviceaccount
service_account_file: ~/.gsutil/key.json
To complement the accurate answer above: to select machines based on certain substrings appearing in their names, you can also add an expression to the filters parameter, for example:
filters:
- 'name = gke*'
This filter lists only the instances whose names start with gke.
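If you want one group per name pattern rather than a fixed yes/no group, the plugin's keyed_groups option (from the constructed inventory interface) can also help. A sketch, assuming instance names like gke-node-1 where the segment before the first dash is meaningful:
plugin: gcp_compute
projects:
- myproj
auth_kind: serviceaccount
service_account_file: ~/.gsutil/key.json
keyed_groups:
# builds groups such as name_gke from the first segment of each instance name
- key: name.split('-')[0]
  prefix: name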
I would like to understand what is the best approach for modeling an action on a resource using RAML.
E.g. I have the following resource definition in RAML:
/orders:
  type: collection
  get:
    description: Gets all orders
  post:
    description: Creates a new order
  /{orderId}:
    type: element
    get:
      description: Gets an order
    put:
      description: Updates an order
    delete:
      description: Deletes an order
Now for an order I would like to model an "approve" action. Is there a best practice for doing this with RAML?
You could use PUT or PATCH to set some "approval" field to true in your model.
You could think about the approval as a resource. For example:
/orders:
  type: collection
  get:
  post:
  /{orderId}:
    type: element
    get:
    put:
    delete:
    /approval:
      post:
      get:
      ...
This is not really a RAML best-practice question; it's more about how you represent your model in REST.
You could use a PATCH request with a "patch document" that raises the approved flag on an order.
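A minimal RAML sketch of that PATCH variant (the approved field is only an assumption about your order model):
/orders:
  /{orderId}:
    patch:
      description: Partially updates an order, e.g. raises the approved flag
      body:
        application/json:
          example: |
            { "approved": true }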