I am trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 1 via a Python script and encountered an HTTP 401 error while following Triggering DAGs with Cloud Functions and Access the Airflow REST API.
The service account has the following list of permissions:
roles/iam.serviceAccountUser (Service Account User)
roles/composer.user (Composer User)
roles/iap.httpsResourceAccessor (IAP-Secured Web App User, added when the application returned a 403, which was unusual as the guides did not specify the need for such a permission)
I am not sure what is wrong with my configuration; I have tried giving the service account the Editor role and roles/iap.tunnelResourceAccessor (IAP-Secured Tunnel User) & roles/composer.admin (Composer Administrator), but to no avail.
EDIT:
I found the source of my problems: The Airflow Database did not have the credentials of the service account in the users table. However, this is unusual as I currently have a service account (the first I created) whose details were added automatically to the table. Subsequent service accounts were not added to the users table when they tried to initially access the REST API, thus returning the 401. I am not sure of a way to create users without passwords since the Airflow web server is protected by IAP.
Thanks to the answers posted by @Adrien Bennadji and @ewertonvsilva, I was able to diagnose the HTTP 401 issue.
The email field in some of Airflow's user-related database tables has a limit of 64 characters (type: character varying(64)), as noted in Understanding the Airflow Metadata Database.
Coincidentally, my first service account had an email whose character length was just under 64 characters, which is why it was added automatically.
When I tried running the command gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account as suggested by @ewertonvsilva to add my other service accounts, it failed with the following error: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(64).
As a result, I created new service accounts with shorter emails, and these were authenticated automatically. I was also able to add these new service accounts with shorter emails to Airflow manually via the gcloud command and authenticate them. I also discovered that the failure to add the user upon first access to the REST API was actually logged in Cloud Logging. However, at that time I was not aware of how Cloud Composer handled new users accessing the REST API, so the HTTP 401 error was a red herring.
Thus, the solution is to ensure that your service account's email is less than 64 characters long.
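As a quick guard, a script can check the length up front before wiring a service account into Composer. A minimal sketch; the email below is a made-up example:

# Minimal sketch: fail fast if a service account email would overflow
# Airflow's character varying(64) email column.
sa_email = "my-short-sa@my-project.iam.gserviceaccount.com"  # hypothetical

if len(sa_email) >= 64:
    raise ValueError(
        f"Service account email is {len(sa_email)} characters; "
        "keep it under 64 so Cloud Composer can register it in Airflow."
    )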
ewertonvsilva's solution worked for me (manually adding the service account to Airflow using gcloud composer environments run <instance-name> --location=<location> users -- create ... )
At first it didn't work but changing the username to accounts.google.com:<service_accounts_uid> made it work.
Sorry for not commenting, not enough reputation.
Based on @Adrien Bennadji's feedback, I'm posting the final answer.
Create the service accounts with the proper permissions for Cloud Composer;
Via the gcloud CLI, add the users to the Airflow database manually:
gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account
And then, list the users with: gcloud composer environments run <env_name> --location=<env_loc> users -- list
Use accounts.google.com:<service_accounts_uid> for the username.
Copying my answer from https://stackoverflow.com/a/70217282/9583820
It looks like instead of creating Airflow accounts with
gcloud composer environments run
You can just use GCP service accounts with email length <64 symbols.
It will work automatically under those conditions:
TL;DR version:
In order to make Airflow Stable API work at GCP Composer:
Set "api-auth_backend" to "airflow.composer.api.backend.composer_auth"
Make sure your service account email length is <64 symbols
Make sure your service account has required permissions (Composer User role should be sufficient)
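For reference, the override can be applied with a standard gcloud command along these lines (environment name and location are placeholders):

gcloud composer environments update <env-name> --location=<location> \
    --update-airflow-configs=api-auth_backend=airflow.composer.api.backend.composer_auth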
Longread:
We have been using Airflow for a while now, having started with version 1.x.x and the "experimental" (now deprecated) APIs.
To authorize, we use a "Bearer" token obtained with a service account:
import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# Obtain an OpenID Connect (OIDC) token from the metadata server or using a service account.
google_open_id_connect_token = id_token.fetch_id_token(Request(), client_id)

# Fetch the Identity-Aware Proxy-protected URL, including an
# Authorization header containing "Bearer " followed by a
# Google-issued OpenID Connect token for the service account.
resp = requests.request(
    method, url,
    headers={'Authorization': 'Bearer {}'.format(google_open_id_connect_token)},
    **kwargs)
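For completeness, the free variables in that snippet (client_id, method, url, kwargs) would be bound roughly as follows; all values below are placeholders, with the IAP OAuth client ID coming from your own environment:

# Hypothetical bindings for the snippet above.
client_id = "1234567890-example.apps.googleusercontent.com"  # IAP OAuth client ID
method = "POST"
url = "https://<webserver>/api/experimental/dags/my_dag/dag_runs"  # Airflow 1.x experimental endpoint
kwargs = {"json": {}}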
Now we are migrating to Airflow 2.x.x and have faced the exact same issue:
403 FORBIDDEN.
Our environment details are:
composer-1.17.3-airflow-2.1.2 (Google Cloud Platform)
"api-auth_backend" is set to "airflow.api.auth.backend.default".
Documentation claims that:
After you set the api-auth_backend configuration option to airflow.api.auth.backend.default, the Airflow web server accepts all API requests without authentication.
However, this does not seem to be true.
Through experimentation, we found that if "api-auth_backend" is set to "airflow.composer.api.backend.composer_auth", the Stable REST API (Airflow 2.x.x) starts to work.
But there is another caveat: for us, some of our service accounts did work, and some did not.
The ones that did not work were throwing "401 Unauthorized" error.
We figured out that accounts with an email longer than 64 characters were throwing the error. The same was observed in this answer.
So after setting "api-auth_backend" to "airflow.composer.api.backend.composer_auth" and making sure that our service account emails were shorter than 64 characters, our old Airflow 1.x.x code started to work for authentication. After updating the API URLs and response handling, the stable Airflow 2.x.x API started to work for us in the same way as the experimental one did for Airflow 1.x.x.
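Concretely, the migrated call looked roughly like the following sketch; the client ID, webserver URL, and DAG ID are placeholders, and the endpoint paths are the documented experimental vs. stable ones:

import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# Placeholders -- substitute your IAP client ID, webserver URL, and DAG ID.
CLIENT_ID = "1234567890-example.apps.googleusercontent.com"
WEBSERVER_URL = "https://<webserver>"
DAG_ID = "my_dag"

# Old: POST {webserver}/api/experimental/dags/{dag_id}/dag_runs
# New: POST {webserver}/api/v1/dags/{dag_id}/dagRuns
token = id_token.fetch_id_token(Request(), CLIENT_ID)
resp = requests.post(
    f"{WEBSERVER_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    headers={"Authorization": f"Bearer {token}"},
    json={"conf": {}},
)
resp.raise_for_status()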
UPD: this is a defect in Airflow and will be fixed here:
https://github.com/apache/airflow/pull/19932
I was trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 2 via a Python script and encountered an HTTP 401 error while following Triggering DAGs with Cloud Functions and accessing the Airflow REST API.
I used this image version: composer-2.1.2-airflow-2.3.4
I also followed these 2 guides:
Triggering Cloud Composer DAGs with Cloud Functions (Composer 2 + Airflow 2)
Access the Airflow REST API Cloud Composer 2
But I was always stuck with Error 401 when I tried to run the DAG via the Cloud Function.
However, when the DAG was executed from the Airflow UI, it was successful (Trigger DAG in the Airflow UI).
For me the following solution worked:
In the airflow.cfg, set the following settings:
[api] auth_backends = airflow.composer.api.backend.composer_auth,airflow.api.auth.backend.session
[api] composer_auth_user_registration_role = Op (default)
[api] enable_experimental_api = False (default)
[webserver] rbac_user_registration_role = Op (default)
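If you manage these via gcloud rather than editing airflow.cfg directly, a sketch of the override follows; note the ^;^ prefix (gcloud's alternate-delimiter escaping, see gcloud topic escaping), needed because the auth_backends value itself contains a comma. Environment name and location are placeholders:

gcloud composer environments update <env-name> --location=<location> \
    --update-airflow-configs=^;^api-auth_backends=airflow.composer.api.backend.composer_auth,airflow.api.auth.backend.session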
Service Account:
The service account email's total length is less than 64 characters.
The account has these roles:
Cloud Composer v2 API Service Agent Extension, Composer User
Airflow UI
Add the service account to the Airflow users via the Airflow UI
(Security -> List Users), with username accounts.google.com:<service account uid>, and assign the role of Op to it.
You can get the UID via a Cloud Shell command (see above), or just navigate to the IAM & Admin page on Google Cloud -> Service Accounts, click on the service account, and read the Unique ID from the details page, or use the one-liner below.
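For completeness, the UID can also be read with a single standard gcloud command (the email is a placeholder):

gcloud iam service-accounts describe <service-account-email> --format="value(uniqueId)"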
And now, IMPORTANT!: SET THE ACCOUNT ACTIVE! (In the Airflow UI, check the box "is Active?" to true).
This last step to set it active was not described anywhere, and for a long time I just assumed the account would become active when there was an open session (when it makes the calls), but that is not the case. The account has to be set active manually.
After that, everything worked fine :)
Other remarks: As I joined a new company, I also had to check some other stuff (maybe this is not related to your problem, but it's good to know anyway; maybe others can use this). I use Cloud Build to deploy the Cloud Functions and the DAGs in Airflow, so I also had to check the following:
Cloud Source Repository (https://source.cloud.google.com/) is in sync with the GitHub Repository. If not: Disconnect the repository and reconnect again.
The GCS bucket which is created when the Composer 2 environment is set up for the very first time has a subfolder "/dags/". I had to manually add the subfolder "/dags/dataflow/" so the deployed Dataflow pipeline code could be uploaded to "/dags/dataflow/", as sketched below.
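Creating that prefix is just an upload with gsutil, along these lines (bucket name and file are placeholders; GCS "folders" come into existence with the first object under the prefix):

gsutil cp dataflow_pipeline.py gs://<composer-bucket>/dags/dataflow/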
Related
I'm new to GCP and we use IaC with Terraform. I've managed to build most of the initial organisation config in Terraform without problems, with the exception of Cloud Identity and Organisation Policies. I'm using gcloud-provided login credentials. Whenever I try to build for those two services I get this:
Error creating Group: googleapi: Error 403: Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the cloudidentity.googleapis.com. We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/. If you are getting this error with curl or similar tools, you may need to specify 'X-Goog-User-Project' HTTP header for quota and billing purposes. For more information regarding 'X-Goog-User-Project' header, please check https://cloud.google.com/apis/docs/system-parameters.
So in this case I'm using the Google Cloud SDK, and the error makes sense. However, the two options it presents don't work:
Setting a quota project makes no difference
I can't create a service account at the organisational level (and when I create one within a project it can't configure these organisational level constructs)
So how do I go about Terraforming these services?
Thanks.
I have a service account with Owner permissions on the project (Just for testing this out)
Still, I am not able to create API keys using that service account via gcloud. It says "Permission Denied".
I am using the following commands.
1.
gcloud auth activate-service-account <Service-account>@<project-id>.iam.gserviceaccount.com --key-file=<key-file>.json
2.
gcloud auth list  # gives the service account name
3.
gcloud alpha services api-keys create --display-name=dummy
The above command works if I authenticate as a normal user with Owner permission, but with the service account it doesn't seem to work. Am I missing something? Please help.
The API Keys API has a strange history. Released in beta about a year ago, it has now gone back to alpha. There is no public documentation (in reality, it has been removed), and if you know this API, you found it on SO or in an old tutorial.
Anyway, all this is to say that it's not a reliable API, and automating things on top of it (with service account calls) is not a good idea. In addition, some APIs don't allow service account calls and require user credentials instead. That was previously the case with the quota APIs, but they were updated recently (summer 2020).
Finally, Google Cloud doesn't recommend using API keys, for security reasons (we can discuss this more if you want). Thus, I don't think it is in its (security and best practice) strategy to promote an API that allows API key automation.
TL;DR - Can I authenticate to AutoML API by impersonating a service account (SA) with my application default credentials (ADC) or must I actually use SA authentication?
I would like to be able to authenticate to the AutoML API using ADC when making batch predictions on a deployed model. This is just for development purposes, so as not to create a new SA for each developer and data scientist. I know AutoML requires an SA for authentication, so I would like to use the --impersonate-service-account flag or the auth/impersonate_service_account setting. I have followed instructions from this Medium post but am still getting an error about using end user credentials. So my question is: am I just doing something wrong, or must AutoML use true SA authentication without impersonation?
The output of gcloud config list is -
[auth]
impersonate_service_account = abcdefghijklmnop@my-project.iam.gserviceaccount.com
[compute]
region = us-east1
zone = us-east1-b
[core]
account = first.last@domain.com
disable_usage_reporting = False
project = my-project
Your active configuration is: [default]
Here is the error returned by AutoML -
google.api_core.exceptions.PermissionDenied: 403 Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the automl.googleapis.com. We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.
And from the AutoML docs -
Service accounts are the only authentication option available with the AutoML API.
Thanks,
Zach
Have you tried specifying which service account to use for impersonation[1] by running "gcloud config set auth/impersonate_service_account"?
In order to impersonate, your original credentials need to be granted roles/iam.serviceAccountTokenCreator on the target service account[1]; an example binding is shown below.
[1] https://cloud.google.com/storage/docs/gsutil/addlhelp/CredentialTypesSupportingVariousUseCases
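Granting that binding might look like the following, reusing the account names from the question's config output as placeholders:

gcloud iam service-accounts add-iam-policy-binding \
    abcdefghijklmnop@my-project.iam.gserviceaccount.com \
    --member="user:first.last@domain.com" \
    --role="roles/iam.serviceAccountTokenCreator"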
I have tested with several other services and service account impersonation seems to work for them. It appears Google AutoML requires a service account and impersonation will not work.
TL;DR
Can anyone suggest a way that I can run tests as a named user that isn't a service account within a CI pipeline?
Detail:
We provide an application running on GCP, the stakeholders of which are all users internal to our organisation. Those users' identities are mastered in our on-premises ADFS implementation and are synced to GCP so that they can be used for authentication.
We authorize access to various parts of the application using ADFS groups that those identities can be added to and again, we sync those group memberships to GCP. The groups are available as Google Groups (https://groups.google.com). At runtime we check whether the authenticated user is a member of a given group. This all works fine.
I would like to run tests in our CI pipeline that ensure that an authenticated user can do the things they're allowed to do and can't do the things that they're not allowed to do. Here are some of the tests I would like to run:
* Ensure gsutil ls allowed-bucket returns an expected list of blobs
* Ensure gsutil ls non-allowed-bucket cannot access the named bucket
* Ensure a cloud function can be called via gcloud functions call function-name
Thus I need to programmatically (i.e. without human intervention) authenticate as a user that is a member of a given Google group. I came up with two possibilities:
Using a service account and authenticating using gcloud auth activate-service-account --key-file /path/to/keyfile.json but I cannot find a way to add a service account to the appropriate Google Group - so that option won't work
Preparing a non-privileged user in our on-premises ADFS that we then add to the appropriate Google Group. Unfortunately I don't think there's a way to authenticate as that user without human intervention because gcloud auth login uses a web-based authorization workflow. If it requires human intervention then I can't use it in a CI pipeline.
Here is the error:
Your application has authenticated using end user credentials from the
Google Cloud SDK or Google Cloud Shell which are not supported by the
dialogflow.googleapis.com. We recommend that most server applications
use service accounts instead. For more information about service
accounts and how to use them in your application, see
https://cloud.google.com/docs/authentication/.
Many of the client libraries pull from the Application Default Credentials; a summary of how they're checked is provided at that link. Essentially, the chain checks environment variables for a path and pulls credentials from that location. This error message means you're using a user account, not a service account.
Most commonly, you logged in once using gcloud auth login, and even though you provided your service account, it's still pulling from the Application Default location.
As you did, the method to associate a specific service account is gcloud auth activate-service-account --key-file <path>
Alternatively, to use the true application default, you can run gcloud auth application-default login.
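As a quick way to see which credentials the ADC chain actually resolves to, here is a minimal sketch using the google-auth library (not specific to Dialogflow or any other service):

import google.auth

# google.auth.default() walks the ADC chain: the GOOGLE_APPLICATION_CREDENTIALS
# environment variable, then the gcloud ADC file, then the metadata server.
credentials, project_id = google.auth.default()
print(type(credentials).__name__, project_id)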