Environment variables in Google Cloud Build - google-cloud-platform

We want to migrate from Bitbucket Pipelines to Google Cloud Build to test, build and push Docker images.
How can we use environment variables without a CryptoKey? For example:
- printf "https://registry.npmjs.org/:_authToken=${NPM_TOKEN}\nregistry=https://registry.npmjs.org" > ~/.npmrc

To use environment variables in the args portion of your build steps you need:
"a shell to resolve environment variables with $$" (as mentioned in the example code here)
and you also need to be careful with your usage of quotes (use single quotes)
See below the break for a more detailed explanation of these two points.
While the Using encrypted resources docs that David Bendory also linked to (and which you probably based your assumption on) show how to do this using an encrypted environment variable specified via secretEnv, this is not a requirement and it works with normal environment variables too.
In your specific case you'll need to modify your build step to look something like this:
# you didn't show us which builder you're using - this is just one example of
# how you can get a shell using one of the supported builder images
- name: 'gcr.io/cloud-builders/docker'
  entrypoint: 'bash'
  args: ['-c', 'printf "https://registry.npmjs.org/:_authToken=%s\nregistry=https://registry.npmjs.org" $$NPM_TOKEN > ~/.npmrc']
Note the usage of %s in the string to be formatted and how the environment variable is passed as an argument to printf. I'm not aware of a way that you can include an environment variable value directly in the format string.
Alternatively you could use echo as follows (bash's builtin echo needs -e to turn \n into a newline):
  args: ['-c', 'echo -e "https://registry.npmjs.org/:_authToken=$${NPM_TOKEN}\nregistry=https://registry.npmjs.org" > ~/.npmrc']
Detailed explanation:
My first point at the top can actually be split in two:
1. you need a shell to resolve environment variables, and
2. you need to escape the $ character so that Cloud Build doesn't try to perform a substitution here
If you don't do 2. your build will fail with an error like: Error merging substitutions and validating build: Error validating build: key in the template "NPM_TOKEN" is not a valid built-in substitution
You should read through the Substituting variable values docs and make sure that you understand how that works. Then you need to realise that you are not performing a substitution here, at least not a Cloud Build substitution. You're asking the shell to perform a substitution.
In that context, 2. is actually the only useful piece of information that you'll get from the Substituting variable values docs (that $$ evaluates to the literal character $).
My second point at the top may be obvious if you're used to working with the shell a lot. The reason for needing to use single quotes is well explained by these two questions. Basically: "You need to use single quotes to prevent interpolation happening in your calling shell."
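For comparison, here is a minimal Python sketch of the same step, assuming the builder image you pick has Python available. It reads the token from the process environment, so the script itself contains no $ for Cloud Build to treat as a substitution and no shell quoting to get wrong. The names render_npmrc and write_npmrc and the example-token fallback are made up for illustration, and the sketch appends a trailing newline that the printf version does not.

```python
import os
from pathlib import Path

REGISTRY = "https://registry.npmjs.org"


def render_npmrc(token: str) -> str:
    """Return the two .npmrc lines from the question with the token filled in."""
    return f"{REGISTRY}/:_authToken={token}\nregistry={REGISTRY}\n"


def write_npmrc(path: Path, token: str) -> None:
    """Write the rendered content to the given path (e.g. Path.home() / '.npmrc')."""
    path.write_text(render_npmrc(token))


# Demo only: print what would be written. "example-token" is a placeholder
# used when NPM_TOKEN is not set in the environment.
token = os.environ.get("NPM_TOKEN", "example-token")
print(render_npmrc(token), end="")
```

In a real build you would call write_npmrc(Path.home() / ".npmrc", os.environ["NPM_TOKEN"]) instead of printing.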

That sounds like you want to use Encrypted Secrets: https://cloud.google.com/cloud-build/docs/securing-builds/use-encrypted-secrets-credentials


Treat capture groups with custom function before using them in the replace part

I have some markdown files with broken relative links. I wish to fix them.
For instance I have this (very short) example file:
Please refer to [this first ressource](wrong/path/to/file) and [this other ressource](non/existing/text).
You can also search on [this website](https://example.net).
Note that there can be multiple links on the same line, and that there are also external links, which should stay untouched.
On a first approach, if we forget about external links, and if the correct resources pointed by the relative links were only placed in a sub-folder, we could have this approach:
sed -i -E "s/(\[.*\])\(([^\)]*)\)/\1(subfolder\/\2)/g" document.md
which would turn my example document into this:
Please refer to [this first ressource](subfolder/wrong/path/to/file) and [this other ressource](subfolder/non/existing/text).
You can also search on [this website](subfolder/https://example.net).
But we have 2 problems here:
External links are messed up
Correct resources pointed by the links are not simply in some sub-folders. The correct path can be determined from the wrong one though, but we have to go through the entire folder in order to do that. This could be done quite easily in a bash script for instance.
So I need a way to apply a function to my capture group before sed uses it to build the replacement string. Here, the capture group I need to process is \2.
Any solution even without sed is acceptable.
Thanks.
Concerning your second bullet point,
Correct resources pointed by the links are not simply in some sub-folders. The correct path can be determined from the wrong one though, but we have to go through the entire folder in order to do that. This could be done quite easily in a bash script for instance.
you have not mentioned what this bash script should do. In the following I assume you have defined a correct_path function which does the job:
correct_path () { echo "${1/#@(wrong|non\/existing)/correct}"; } # this example assumes shopt -s extglob
With that function defined, you can run the following command, which
executes a sed command to change each address to $(correct_path address) and encloses the result in double quotes to preserve the newlines, "$(…)";
evals the line made up of echo, ", the "$(…)" generated above, and another ".
eval echo '"'"$(< os sed -E 's/(\[[^]]*\])\(([^)]*)\)/\1($(correct_path \2))/g')"'"' > os
However eval is evil.
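Since the question accepts non-sed solutions, one way to avoid eval altogether is a regex replacement with a callback, sketched here in Python. The correct_path function below is only a stand-in that mirrors the bash example (the real lookup logic is not given in the question), and the test for external links (a URL scheme followed by ://) is an assumption about what counts as external.

```python
import re

# Matches [text](target); group 1 is the bracketed text, group 2 the target.
LINK = re.compile(r"(\[[^\]]*\])\(([^)]*)\)")

# Assumption: a target starting with "scheme://" is an external link.
EXTERNAL = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://")


def correct_path(path: str) -> str:
    """Hypothetical stand-in: swap a leading 'wrong' or 'non/existing' for 'correct'."""
    return re.sub(r"^(wrong|non/existing)", "correct", path)


def fix_link(match: re.Match) -> str:
    """Called once per link; returns the text that replaces the whole match."""
    text, target = match.group(1), match.group(2)
    if EXTERNAL.match(target):
        return match.group(0)  # leave external links untouched
    return f"{text}({correct_path(target)})"


def fix_links(markdown: str) -> str:
    return LINK.sub(fix_link, markdown)


sample = (
    "Please refer to [this first ressource](wrong/path/to/file) "
    "and [this other ressource](non/existing/text).\n"
    "You can also search on [this website](https://example.net).\n"
)
print(fix_links(sample), end="")
```

Because the callback receives the match object, correct_path can do arbitrary work (walking a folder, for instance) without the link text ever being evaluated as shell code.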

Executing a Dataflow job with multiple inputs/outputs using gcloud cli

I've designed a data transformation in Dataprep and am now attempting to run it by using the template in Dataflow. My flow has several inputs and outputs - the dataflow template provides them as a json object with key/value pairs for each input & location. They look like this (line breaks added for easy reading):
{
"location1": "project:bq_dataset.bq_table1",
#...
"location10": "project:bq_dataset.bq_table10",
"location17": "project:bq_dataset.bq_table17"
}
I have 17 inputs (mostly lookups) and 2 outputs (one csv, one bigquery). I'm passing these to the gcloud CLI like this:
gcloud dataflow jobs run job-201807301630 \
  --gcs-location=gs://bucketname/dataprep/dataprep_template \
  --parameters inputLocations={"location1":"project..."},outputLocations={"location1":"gs://bucketname/output.csv"}
But I'm getting an error:
ERROR: (gcloud.dataflow.jobs.run) unrecognized arguments:
inputLocations=location1:project:bq_dataset.bq_table1,outputLocations=location2:project:bq_dataset.bq_output1
inputLocations=location10:project:bq_dataset.bq_table10,outputLocations=location1:gs://bucketname/output.csv
From the error message, it looks to be merging the inputs and outputs so that as I have two outputs, each two inputs are paired with the two outputs:
input1:output1
input2:output2
input3:output1
input4:output2
input5:output1
input6:output2
...
I've tried quoting the input/output objects (single and double, plus removing the quotes in the object), wrapping them in [], using tildes but no joy. Has anyone managed to execute a dataflow job with multiple inputs?
I finally found a solution for this via a huge process of trial and error. There are several steps involved.
Format of --parameters
The --parameters argument is a dictionary-type argument. There are details on these in a document you can read by typing gcloud topic escaping in the CLI, but in short it means you'll need an = between --parameters and the arguments, and then the format is key=value pairs with the value enclosed in quote marks ("):
--parameters=inputLocations="object",outputLocations="object"
Escape the objects
Then, the objects need the quotes escaping to avoid ending the value prematurely, so
{"location1":"gcs://bucket/whatever"...
Becomes
{\"location1\":\"gcs://bucket/whatever\"...
Choose a different separator
Next, the CLI gets confused because while the key=value pairs are separated by a comma, the values also have commas in the objects. So you can define a different separator by putting it between carets (^) at the start of the argument and between the key=value pairs:
--parameters=^*^inputLocations="{\"location1\":\"...\"}"*outputLocations="{\"location1\":\"...\"}"
I used * because ; didn't work - maybe because it marks the end of the CLI command? Who knows.
Note also that the gcloud topic escaping info says:
In cmd.exe and PowerShell on Windows, ^ is a special character and
you must escape it by repeating it. In the following examples, every time
you see ^, replace it with ^^^^.
Don't forget customGcsTempLocation
After all that, I'd forgotten that customGcsTempLocation needs adding to the key=value pairs in the --parameters argument. Don't forget to separate it from the others with a * and enclose it in quote marks again:
...}*customGcsTempLocation="gs://bucket/whatever"
Pretty much none of this is explained in the online documentation, so that's several days of my life I won't get back - hopefully I've helped someone else with this.
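The steps above can also be assembled programmatically, which sidesteps the manual quote escaping. The following is a sketch only: build_parameters_flag is a made-up helper, the bucket and table names are the placeholders used in this thread, and it assumes that gcloud receives the same text it would get after the shell strips the quotes and backslashes in the hand-written form. It prints the command rather than running it.

```python
import json
import shlex


def build_parameters_flag(params, delimiter="*"):
    """Join key=value pairs into one --parameters flag with a custom delimiter."""
    parts = []
    for key, value in params.items():
        # Dicts become compact JSON objects; plain strings pass through as-is.
        if isinstance(value, str):
            text = value
        else:
            text = json.dumps(value, separators=(",", ":"))
        if delimiter in text:
            raise ValueError(f"delimiter {delimiter!r} occurs in the value for {key}")
        parts.append(f"{key}={text}")
    # The ^*^ prefix declares which delimiter separates the pairs.
    return f"--parameters=^{delimiter}^" + delimiter.join(parts)


argv = [
    "gcloud", "dataflow", "jobs", "run", "job-201807301630",
    "--gcs-location=gs://bucketname/dataprep/dataprep_template",
    build_parameters_flag({
        "inputLocations": {"location1": "project:bq_dataset.bq_table1"},
        "outputLocations": {"location1": "gs://bucketname/output.csv"},
        "customGcsTempLocation": "gs://bucket/whatever",
    }),
]

# Print a copy-pasteable POSIX shell command; shlex.join adds the quoting.
print(shlex.join(argv))
```

Passing argv as a list to subprocess.run would skip the shell entirely, so no escaping of the JSON quotes would be needed at all.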

How do I configure Jenkins to strip the leading "origin/" in git branch parameter?

I'm using Jenkins with a branch parameter to specify the branch to build from. Other stuff downstream needs the branch name to not have the leading "origin/" -- just "feature/blahblah" or "bugfix/12345" or similar. The advanced settings for the parameter let me specify a branch filter via regex, but I'm a regex newbie and the solutions I've found in searching are language-dependent. The Jenkins documentation for this is sparse.
When a user clicks on "build with parameters", for the branch I want to see branch names that omit the leading "origin/". I'm not sure how to write a regex for Jenkins that will "consume" that part of the branch name before setting the parameter value.
I solved this problem once before, I'm pretty sure using Stack Overflow, but I can't find those hints now.
For the git branch parameter, set Branch Filter to:
origin/(.*)
I found the parentheses to be counter-intuitive, because if you don't specify a filter you get:
.*
(No parens.) If you are filtering stuff out, you use parens to indicate the part to keep.
I usually use a groovy script evaluated before the job, like:
def map = [:]
map['GIT_BRANCH'] = GIT_BRANCH - 'origin/'
return map
This uses the EnvInject plugin, as described in gitlab-plugin issue 444.
If you need to match several branch patterns while dropping the origin/ prefix, try the following:
origin/(develop.*|feature.*|bugfix.*)
This will list the develop, feature and bugfix branches without the leading origin/.
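To see what the parentheses do, here is a small Python model of the behaviour described in these answers. It is not the plugin's code: Jenkins uses Java regular expressions, and the rules assumed here (the filter must match the whole branch name, and the first capture group, if there is one, becomes the displayed value) are an interpretation of "use parens to indicate the part to keep". The function name and branch list are invented.

```python
import re


def apply_branch_filter(branches, branch_filter):
    """Return the names a filter would show, under the assumptions stated above."""
    pattern = re.compile(branch_filter)
    shown = []
    for branch in branches:
        match = pattern.fullmatch(branch)
        if match is None:
            continue  # branch is filtered out
        # With a capture group, keep only the captured part; otherwise keep everything.
        shown.append(match.group(1) if pattern.groups else match.group(0))
    return shown


branches = [
    "origin/develop",
    "origin/feature/blahblah",
    "origin/bugfix/12345",
    "origin/experiment",
]

for branch_filter in (r".*", r"origin/(.*)", r"origin/(develop.*|feature.*|bugfix.*)"):
    print(branch_filter, "->", apply_branch_filter(branches, branch_filter))
```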

How to robustly set Django secret key as environment variable

My Django project's secret key contains special characters such as #, ^, * etc. I'm trying to set this as an env variable at /etc/environment.
I include the following in the file:
export SECRET_KEY='zotpbek!*t_abkrfdpo!*^##plg6qt-x6(%dg)9p(qoj_r45y8'
I.e. I included single quotes around the string since it contains special characters (also prescribed by this SO post). I exit the file and do source /etc/environment. Next I type env in the terminal: SECRET_KEY correctly shows.
I log out and log back in. I type env again.
This time SECRET_KEY still shows, but is cut off beyond the # character. It's excluding everything beyond (and including) the # character.
How do I fix this issue? Trying with double quotes didn't alleviate anything either. My OS is Ubuntu 14.04 LTS.
p.s. I'm aware environment variables don't support access control; there's a bunch of reasons not to set the Django secret key as an env var. For the purposes of this question, let's put that on the back burner.
This isn't a Django problem per se. According to the question "Escape hash mark (#) in /etc/environment", you can't use a "#" in /etc/environment.
I would recommend that you keep regenerating your secret key until you get one without #s -- that should fix the problem. Django Extensions has a command generate_secret_key for this. The side effect of changing the secret key is that current sessions become invalid, along with anything else in your application that depends on the key.
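The easiest way is to generate one with python3 in your Linux terminal, using the following inline script:
python3 -c 'import random; rng = random.SystemRandom(); print("".join(rng.choice("abcdefghijklmnopqrstuvwxyz0123456789!%^&*-_") for _ in range(50)))'
This generates a secret key without the unsafe # character. random.SystemRandom draws from the operating system's secure random source, which suits a secret better than the default generator.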
As per the django-environ documentation, you can use unsafe characters in a .env file.
https://django-environ.readthedocs.io/en/latest/index.html#tips
To use unsafe characters you have to encode them with urllib.parse.quote before you put the value into the .env file.
Example: admin#123 becomes admin%23123
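A short sketch of that encoding step, using the key from the question. The function that does percent-encoding in the standard library is urllib.parse.quote. Note the assumption here: whatever reads the value back has to decode it again, which this sketch does explicitly with unquote. Whether a given settings loader decodes plain string values for you is something to check rather than take for granted.

```python
from urllib.parse import quote, unquote

secret_key = "zotpbek!*t_abkrfdpo!*^##plg6qt-x6(%dg)9p(qoj_r45y8"

# safe="" percent-encodes every reserved character, including "#" and "/".
encoded = quote(secret_key, safe="")
print(encoded)

# The encoded form contains no "#", so nothing after it can be cut off
# by a parser that treats "#" as the start of a comment.
assert "#" not in encoded

# Decoding restores the original key exactly.
assert unquote(encoded) == secret_key
```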

Regex to identify reference to environment variables in bash

We know that 'eval' can do evil things, like executing code.
I need a way in BASH to be able to identify in a string of characters where an environment variable is used, and ultimately replace it with its actual value.
e.g. This is a very simple example of something much more complex.
File "x.dat" contains:
$MYDIR/file.txt
Environment
export MYDIR=/tmp/somefolder
Script "x.sh"
...
fileToProcess=$(cat x.dat)
realFileToProcess=$(eval echo $fileToProcess)
echo $realFileToProcess
...
Keep in mind that referenced environment variables in a string can also be:
${MYDIR}_txt
$MYDIR-txt
${MYDIR:0:3}:txt
${MYDIR:5}.txt
Not an answer yet, but some remarks about the question.
It seems what you need is not variable expansion but token replacement in a template; depending on the use case, printf may be sufficient.
Variable expansion also depends on context. For example, the following are not replaced:
# single quotes
echo '${MYDIR}'
# ANSI-C quotes
echo $'${MYDIR}'
# heredoc with end marker enclosed between single quotes
cat << 'END'
${MYDIR}
END
It should also be noted that a variable expansion may execute arbitrary code:
echo ${X[`echo hi >&2`]}
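The token-replacement idea can be sketched without any shell evaluation, for example with Python's string.Template. This is a deliberately limited illustration, not a full answer to the question: it only handles the $NAME and ${NAME} forms, leaves substring forms such as ${MYDIR:0:3} untouched, and turns a doubled $$ into a single $. The function name expand_env_refs is made up.

```python
from string import Template


def expand_env_refs(text, env):
    """Replace $NAME and ${NAME} using env; anything else is left as written."""
    # safe_substitute never raises on unknown or malformed references
    # and never executes anything found in the text.
    return Template(text).safe_substitute(env)


env = {"MYDIR": "/tmp/somefolder"}

for line in (
    "$MYDIR/file.txt",
    "${MYDIR}_txt",
    "$MYDIR-txt",
    "${MYDIR:0:3}:txt",         # substring form: not supported, left unchanged
    "${X[`echo hi >&2`]}",      # the backquoted command is never run
):
    print(expand_env_refs(line, env))
```

Passing dict(os.environ) as env would use the real environment instead of the fixed mapping shown here.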