Getting logs from a file with Ops Agent - google-cloud-platform

I have a Python script on a VM that writes logs to a file, and I want to use them in Google Cloud Logging.
I tried this config YAML:
logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/messages
      - /var/log/syslog
    etl-error-logs:
      type: files
      include_paths:
      - /home/user/test_logging/err_*
    etl-info-logs:
      type: files
      include_paths:
      - /home/user/test_logging/out_*
  processors:
    etl_log_processor:
      type: parse_regex
      field: message
      regex: "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s(?<severity>INFO|ERROR)\s(?<message>.*)"
      time_key: time
      time_format: "%Y-%m-%d %H:%M:%S"
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
      error_pipeline:
        receivers: [etl-error-logs]
        processors: [etl_log_processor]
        log_level: error
      info_pipeline:
        receivers: [etl-info-logs]
        processors: [etl_log_processor]
        log_level: info
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern: []
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
      error_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
      info_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
and this is an example of the logs:
2021-11-22 11:15:44 INFO testing normal
I didn't fully understand the Google docs, so I created the YAML as best I could, using their main example as a reference, but I have no idea why it doesn't work.
Environment: GCE VM
You want to use those logs in GCP Log Viewer: yes
Which docs did you follow: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#logging-receivers
How did you install Ops Agent: in GCE I went into each VM instance's Observability tab, where there was an option to install the Ops Agent via Cloud Shell
What logs do you want to save: I want to save all of the logs that are being written to my log file, live.
Specific application logs: it's an ETL process that runs in Python and saves its logs to a local file on the VM

Try out this command; it should show you the (almost) exact error:
sudo journalctl -xe | grep "google_cloud_ops_agent_engine"
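As far as I can tell, log_level is not a documented field under service.pipelines in the Ops Agent config (pipelines only take receivers and processors), so the agent may be rejecting the whole file; the journalctl output above should confirm that. A minimal sketch of the logging section with those fields removed, assuming your paths and regex are otherwise what you want:

logging:
  receivers:
    etl-error-logs:
      type: files
      include_paths:
      - /home/user/test_logging/err_*
    etl-info-logs:
      type: files
      include_paths:
      - /home/user/test_logging/out_*
  processors:
    etl_log_processor:
      type: parse_regex
      field: message
      regex: "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s(?<severity>INFO|ERROR)\s(?<message>.*)"
      time_key: time
      time_format: "%Y-%m-%d %H:%M:%S"
  service:
    pipelines:
      error_pipeline:
        receivers: [etl-error-logs]
        processors: [etl_log_processor]
      info_pipeline:
        receivers: [etl-info-logs]
        processors: [etl_log_processor]

The config belongs in /etc/google-cloud-ops-agent/config.yaml, and the agent only picks up changes after a restart:

sudo systemctl restart google-cloud-ops-agent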


Google Cloud: ERROR: Reachability Check failed

I followed this answer already, but it didn't help. I also re-installed the gcloud CLI, but now I am not able to install the CLI anymore because of the following error.
Here is my output for ./google-cloud-sdk/bin/gcloud init
ERROR: Reachability Check failed.
Cannot reach https://cloudresourcemanager.googleapis.com/v1beta1/projects with httplib2 (SSLCertVerificationError)
Cannot reach https://www.googleapis.com/auth/cloud-platform with httplib2 (SSLCertVerificationError)
Cannot reach https://cloudresourcemanager.googleapis.com/v1beta1/projects with requests (SSLError)
Cannot reach https://www.googleapis.com/auth/cloud-platform with requests (SSLError)
Network connection problems may be due to proxy or firewall settings.
Also, I am not behind any corporate proxy.
It was working perfectly a few days ago, until today. I did not change any settings or install any new services.
Output for ./google-cloud-sdk/bin/gcloud info.
./google-cloud-sdk/bin/gcloud info
Google Cloud SDK [354.0.0]
Python Version: [3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08) [Clang 6.0 (clang-600.0.57)]]
Python Location: [/Users/myname/.config/gcloud/virtenv/bin/python3]
Site Packages: [Enabled]
Installation Root: [/Users/myname/Downloads/google-cloud-sdk]
Installed Components:
gsutil: [4.67]
core: [2021.08.20]
bq: [2.0.71]
System PATH: [/Users/myname/.config/gcloud/virtenv/bin:/Users/myname/Downloads/apache-maven-3.8.4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/munki:/usr/local/opt/go/libexec/bin:/Users/myname/go/bin]
Python PATH: [/Users/myname/Downloads/./google-cloud-sdk/lib/third_party:/Users/myname/Downloads/google-cloud-sdk/lib:/Library/Frameworks/Python.framework/Versions/3.7/lib/python37.zip:/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7:/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload:/Users/myname/.config/gcloud/virtenv/lib/python3.7/site-packages]
Cloud SDK on PATH: [False]
Kubectl on PATH: [/usr/local/bin/kubectl]
Installation Properties: [/Users/myname/Downloads/google-cloud-sdk/properties]
User Config Directory: [/Users/myname/.config/gcloud]
Active Configuration Name: [default]
Active Configuration Path: [/Users/myname/.config/gcloud/configurations/config_default]
Account: [None]
Project: [None]
Current Properties:
[core]
disable_usage_reporting: [True]
Logs Directory: [/Users/myname/.config/gcloud/logs]
Last Log File: [/Users/myname/.config/gcloud/logs/2022.08.10/15.35.06.807614.log]
git: [git version 2.32.0 (Apple Git-132)]
ssh: [OpenSSH_8.1p1, LibreSSL 2.7.3]
Update on this: just disable SSL validation and everything will work.
gcloud config set auth/disable_ssl_validation True
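Note that disabling SSL validation works around the symptom but leaves gcloud open to interception. If the underlying cause is a TLS-inspecting proxy or an endpoint-security product, a safer option is to point gcloud at that product's CA bundle instead; a sketch, where /path/to/ca.pem is a placeholder for your actual certificate file:

gcloud config set core/custom_ca_certs_file /path/to/ca.pem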

SSM Automation Run Command does not run longer than the default 3600 seconds

I have been working with AWS Systems Manager and I have created a Document to run a command, but it appears there is no way to override the timeout for a Run Command in SSM.
I have changed the execution timeout here in the parameters, but it does not work.
I also added a timeoutSeconds to my Document, and it doesn't work either.
This is my Document (I'm using schema version 2.2):
schemaVersion: "2.2"
description: "Runs a Python command"
parameters:
  Params:
    type: "String"
    description: "Params after the python3 keyword."
mainSteps:
- action: "aws:runShellScript"
  name: "Python3"
  inputs:
    timeoutSeconds: '300000'
    runCommand:
    - "sudo /usr/bin/python3 /opt/python/current/app/{{Params}}"
1: The setting that’s displayed in your screenshot in the Other parameters section is the Delivery Timeout, which is different from the execution timeout.
You must specify the execution timeout value in the Execution Timeout field, if available. Not all SSM documents require that you specify an execution timeout. If a Systems Manager document doesn't require that you explicitly specify an execution timeout value, then Systems Manager enforces the hard-coded default execution timeout.
2: In your document, the timeoutSeconds attribute is in the wrong place. It needs to be on the same level as the action.
...
mainSteps:
- action: "aws:runShellScript"
  timeoutSeconds: 300000
  name: "Python3"
  inputs:
    runCommand:
    - "sudo /usr/bin/python3 /opt/python/current/app/{{Params}}"
timeoutSeconds: '300000'
Isn't this a string rather than an integer?
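For comparison: with the stock AWS-RunShellScript document, the execution timeout is passed as a document parameter (a string, in seconds) rather than as a field on the step. A sketch with the AWS CLI; the instance ID and commands are placeholders:

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids "i-0123456789abcdef0" \
  --parameters '{"commands":["sleep 7200"],"executionTimeout":["10800"]}'

If I understand the console behavior correctly, a custom document needs to declare a similar executionTimeout parameter for the Execution Timeout field to have anything to bind to.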

How to check if the latest Cloud Run revision is ready to serve

I've been using Cloud Run for a while and the entire user experience is simply amazing!
Currently I'm using Cloud Build to build the container image, push the image to GCR, and then create a new Cloud Run revision.
Now I want to call a script to purge caches from the CDN after the latest revision is successfully deployed to Cloud Run; however, the $ gcloud run deploy command can't tell you whether traffic has started pointing to the latest revision.
Is there any command, or an event that I can subscribe to, to make sure no traffic is pointing to the old revision, so that I can safely purge all caches?
Dustin's answer is correct; however, "status" messages are an indirect result of Route configuration, as those things are updated separately (you might see a few seconds of delay between them). The status message will still be able to tell you the revision has been taken out of rotation, if you don't mind this delay.
To answer this specific question (emphasis mine) using API objects directly:
Is there any command or the event that I can subscribe to to make sure no traffic is pointing to the old revision?
You need to look at Route objects on the API. This is a Knative API (it's available on Cloud Run) but it doesn't have a gcloud command: https://cloud.google.com/run/docs/reference/rest/v1/namespaces.routes
For example, assume you did 50%-50% traffic split on your Cloud Run service. When you do this, you’ll find your Service object (which you can see on Cloud Console → Cloud Run → YAML tab) has the following spec.traffic field:
spec:
  traffic:
  - revisionName: hello-00002-mob
    percent: 50
  - revisionName: hello-00001-vat
    percent: 50
This is the "desired configuration", and it might not reflect the actual status definitively. Changing this field updates the Route object, which decides how the traffic is split.
To see the Route object under the covers (sadly with curl, because there is no gcloud command for this):
TOKEN="$(gcloud auth print-access-token)"
curl -vH "Authorization: Bearer $TOKEN" \
https://us-central1-run.googleapis.com/apis/serving.knative.dev/v1/namespaces/GCP_PROJECT/routes/SERVICE_NAME
This command will show you the output:
"spec": {
"traffic": [
{
"revisionName": "hello-00002-mob",
"percent": 50
},
{
"revisionName": "hello-00001-vat",
"percent": 50
}
]
},
(which you might notice is identical to the Service's spec.traffic, because it's copied from there). This tells you definitively which revisions are currently serving traffic for that particular Service.
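If you'd rather check the observed state than the desired one, the Route object also carries a status.traffic list. A sketch that extracts it with jq (GCP_PROJECT and SERVICE_NAME are placeholders, as above):

TOKEN="$(gcloud auth print-access-token)"
curl -sH "Authorization: Bearer $TOKEN" \
  https://us-central1-run.googleapis.com/apis/serving.knative.dev/v1/namespaces/GCP_PROJECT/routes/SERVICE_NAME \
  | jq -r '.status.traffic[] | "\(.revisionName) \(.percent)%"'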
You can use gcloud run revisions list to get a list of all revisions:
$ gcloud run revisions list --service helloworld

   REVISION          ACTIVE  SERVICE     DEPLOYED                 DEPLOYED BY
✔  helloworld-00009  yes     helloworld  2019-08-17 02:09:01 UTC  email@email.com
✔  helloworld-00008          helloworld  2019-08-17 01:59:38 UTC  email@email.com
✔  helloworld-00007          helloworld  2019-08-13 22:58:18 UTC  email@email.com
✔  helloworld-00006          helloworld  2019-08-13 22:51:18 UTC  email@email.com
✔  helloworld-00005          helloworld  2019-08-13 22:46:14 UTC  email@email.com
✔  helloworld-00004          helloworld  2019-08-13 22:41:44 UTC  email@email.com
✔  helloworld-00003          helloworld  2019-08-13 22:39:16 UTC  email@email.com
✔  helloworld-00002          helloworld  2019-08-13 22:36:06 UTC  email@email.com
✔  helloworld-00001          helloworld  2019-08-13 22:30:03 UTC  email@email.com
You can also use gcloud run revisions describe to get details about a specific revision, which will contain a status field. For example, an active revision:
$ gcloud run revisions describe helloworld-00009
...
status:
  conditions:
  - lastTransitionTime: '2019-08-17T02:09:07.871Z'
    status: 'True'
    type: Ready
  - lastTransitionTime: '2019-08-17T02:09:14.027Z'
    status: 'True'
    type: Active
  - lastTransitionTime: '2019-08-17T02:09:07.871Z'
    status: 'True'
    type: ContainerHealthy
  - lastTransitionTime: '2019-08-17T02:09:05.483Z'
    status: 'True'
    type: ResourcesAvailable
And an inactive revision:
$ gcloud run revisions describe helloworld-00008
...
status:
  conditions:
  - lastTransitionTime: '2019-08-17T01:59:45.713Z'
    status: 'True'
    type: Ready
  - lastTransitionTime: '2019-08-17T02:39:46.975Z'
    message: Revision retired.
    reason: Retired
    status: 'False'
    type: Active
  - lastTransitionTime: '2019-08-17T01:59:45.713Z'
    status: 'True'
    type: ContainerHealthy
  - lastTransitionTime: '2019-08-17T01:59:43.142Z'
    status: 'True'
    type: ResourcesAvailable
You'll specifically want to check the type: Active condition.
This is all available via the Cloud Run REST API as well: https://cloud.google.com/run/docs/reference/rest/v1/namespaces.revisions
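To script that check, one option is to pull the Active condition out of the describe output; a sketch, assuming jq is available:

gcloud run revisions describe helloworld-00008 --format=json \
  | jq -r '.status.conditions[] | select(.type == "Active") | .status'

This prints True while the revision is still serving and False once it has been retired.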
By default, the traffic is routed to the latest revision. You can see this in the logs.
Deploying container to Cloud Run service [SERVICE_NAME] in project [YOUR_PROJECT] region [YOUR_REGION]
✓ Deploying... Done.
✓ Creating Revision...
✓ Routing traffic...
Done.
Service [SERVICE_NAME] revision [SERVICE_NAME-00012-yic] has been deployed and is serving 100 percent of traffic at https://SERVICE_NAME-vqg64v3fcq-uc.a.run.app
If you want to be sure, you can explicitly call the update-traffic command:
gcloud run services update-traffic --platform=managed --region=YOUR_REGION --to-latest YOUR_SERVICE
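If the goal is to purge the CDN only once the new revision has taken all the traffic, a small polling loop after the deploy is one option. A sketch, assuming a fully managed service and jq on the build machine; SERVICE, REGION, and the purge step are placeholders:

SERVICE=my-service
REGION=us-central1
LATEST=$(gcloud run services describe "$SERVICE" --region "$REGION" --platform managed \
  --format="value(status.latestReadyRevisionName)")
# Wait until the latest ready revision serves 100 percent of the traffic.
until [ "$(gcloud run services describe "$SERVICE" --region "$REGION" --platform managed --format=json \
  | jq -r --arg rev "$LATEST" '[.status.traffic[] | select(.revisionName == $rev) | .percent] | add')" = "100" ]; do
  sleep 5
done
./purge-cdn.sh   # placeholder for your cache purge script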

Configuring Concourse CI to use AWS Secrets Manager

I have been trying to figure out how to configure the Docker version of Concourse (https://github.com/concourse/concourse-docker) to use AWS Secrets Manager. I added the following environment variables to the docker-compose file, but from the logs it doesn't look like it ever reaches out to AWS to fetch the creds. Am I missing something, or should this happen automatically when these environment variables are added under environment in the docker-compose file? Here are the docs I have been looking at: https://concourse-ci.org/aws-asm-credential-manager.html
version: '3'

services:
  concourse-db:
    image: postgres
    environment:
      POSTGRES_DB: concourse
      POSTGRES_PASSWORD: concourse_pass
      POSTGRES_USER: concourse_user
      PGDATA: /database

  concourse:
    image: concourse/concourse
    command: quickstart
    privileged: true
    depends_on: [concourse-db]
    ports: ["9090:8080"]
    environment:
      CONCOURSE_POSTGRES_HOST: concourse-db
      CONCOURSE_POSTGRES_USER: concourse_user
      CONCOURSE_POSTGRES_PASSWORD: concourse_pass
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_EXTERNAL_URL: http://XXX.XXX.XXX.XXX:9090
      CONCOURSE_ADD_LOCAL_USER: test:test
      CONCOURSE_MAIN_TEAM_LOCAL_USER: test
      CONCOURSE_WORKER_BAGGAGECLAIM_DRIVER: overlay
      CONCOURSE_AWS_SECRETSMANAGER_REGION: us-east-1
      CONCOURSE_AWS_SECRETSMANAGER_ACCESS_KEY: <XXXX>
      CONCOURSE_AWS_SECRETSMANAGER_SECRET_KEY: <XXXX>
      CONCOURSE_AWS_SECRETSMANAGER_TEAM_SECRET_TEMPLATE: /concourse/{{.Secret}}
      CONCOURSE_AWS_SECRETSMANAGER_PIPELINE_SECRET_TEMPLATE: /concourse/{{.Secret}}
pipeline.yml example:
jobs:
- name: build-ui
  plan:
  - get: web-ui
    trigger: true
  - get: resource-ui
  - task: build-task
    file: web-ui/ci/build/task.yml
  - put: resource-ui
    params:
      repository: updated-ui
      force: true
  - task: e2e-task
    file: web-ui/ci/e2e/task.yml
    params:
      UI_USERNAME: ((ui-username))
      UI_PASSWORD: ((ui-password))

resources:
- name: cf
  type: cf-cli-resource
  source:
    api: https://api.run.pivotal.io
    username: ((cf-username))
    password: ((cf-password))
    org: Blah
- name: web-ui
  type: git
  source:
    uri: git@github.com:blah/blah.git
    branch: master
    private_key: ((git-private-key))
When storing parameters for Concourse pipelines in AWS Secrets Manager, they must follow this syntax:
/concourse/TEAM_NAME/PIPELINE_NAME/PARAMETER_NAME
If you have common parameters that are used across the team in multiple pipelines, use this syntax to avoid creating redundant parameters in Secrets Manager:
/concourse/TEAM_NAME/PARAMETER_NAME
The highest level that is supported is the Concourse team level; global parameters are not possible. Thus these variables in your compose environment will not be supported:
CONCOURSE_AWS_SECRETSMANAGER_TEAM_SECRET_TEMPLATE: /concourse/{{.Secret}}
CONCOURSE_AWS_SECRETSMANAGER_PIPELINE_SECRET_TEMPLATE: /concourse/{{.Secret}}
Unless you want to change the /concourse prefix, these parameters should be left at their defaults.
And when retrieving these parameters in the pipeline, no changes are required in the template. Just pass the PARAMETER_NAME; Concourse will handle the lookup in Secrets Manager based on the team and pipeline name.
...
params:
  UI_USERNAME: ((ui-username))
  UI_PASSWORD: ((ui-password))
...
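For instance, to make ((ui-username)) resolvable, the secret name has to encode the team and pipeline. Assuming the team is main and the pipeline is named build-ui (both assumptions based on the job above), creating it with the AWS CLI would look like:

aws secretsmanager create-secret \
  --region us-east-1 \
  --name "/concourse/main/build-ui/ui-username" \
  --secret-string "some-user"

Dropping the pipeline segment (/concourse/main/ui-username) would make the same secret visible to every pipeline in the team.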

AWS CodeDeploy deployment failed: run as user root failed

I have been using CodePipeline and CodeDeploy to deploy my ASP.NET application from GitHub. Every time, the deployment fails with this error message in the event log:
"Script at specified location: scripts/stop_service run as user root failed with exit code 5"
I have installed the CodeDeploy agent on the EC2 instance, and here is a sample from the appspec file:
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
CodeDeploy is trying to run the script scripts/stop_server and failing, e.g. because the script is not present at that location.
If the script is at that location, check what's wrong with the execution by inspecting the log at:
/opt/codedeploy-agent/deployment-root/[deployment-group-ID]/[deployment-ID]/logs/scripts.log
Replace '[deployment-group-ID]' and '[deployment-ID]' with actual Ids of your deployment.
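A quick way to pull up the most recent script log on the instance (a sketch, assuming a default agent install on Linux; run it on the EC2 instance itself):

sudo sh -c 'tail -n 50 $(ls -t /opt/codedeploy-agent/deployment-root/*/*/logs/scripts.log | head -1)'

If the hooks never even start, the agent's own log at /var/log/aws/codedeploy-agent/codedeploy-agent.log is the next place to look. Also note that the exit code 5 in the error message is the exit status of the stop script itself, so any debug output you add to that script will show up in scripts.log.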