Elastic Beanstalk CloudWatch Log streaming stops working – How to debug - amazon-web-services

My Elastic Beanstalk environment is stopping streaming node.js events to CloudWatch Logs. Streaming works fine for a view minutes on a new instance. After a view minutes no more logs show up in CloudWatch.
I set up AWS Elastic Beanstalk to stream logs to CloudWatch under Configuration > Software Configuration > CloudWatch Logs > Log Streaming (true). I deactivated log streaming and reactivated it as a test. Taking a look at cloudwatch
Last eb-activity log is about 10 minutes old
Error log is not available (on neither of the instances)
nginx/access.log is a view seconds old
nodejs.log is about an hour old (short after relaunching instance)
Every health check writes an log entry every view seconds into nodejs.log though.
I did not find any logs on the ec2 instance regarding log streaming.
Has anyone similar issues?
How do I make Elastic Beanstalk stream nodejs logs to CloudWatch logs.
--- EDIT
[ec2-user#ip-###-##-##-## log]$ cat /var/log/awslogs.log
2017-03-07 11:01:05,928 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Detected file rotation, notifying reader
2017-03-07 11:01:05,928 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Reader is still alive.
2017-03-07 11:01:05,928 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:05,928 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:06,052 - cwlogs.push.reader - INFO - 31861 - Thread-8 - No data is left. Reader is leaving.
2017-03-07 11:01:10,929 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Removing dead reader [2177a5cce5ed29525de329bfdc292ff1, /var/log/nginx/access.log]
2017-03-07 11:01:10,929 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Starting reader for [92257964a10edeb586f084f4f2ba35de, /var/log/nginx/access.log]
2017-03-07 11:01:10,930 - cwlogs.push.reader - INFO - 31861 - Thread-11 - Start reading file from 0.
2017-03-07 11:01:10,930 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:10,930 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:15,931 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:15,931 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:16,788 - cwlogs.push.publisher - INFO - 31861 - Thread-7 - Log group: /aws/elasticbeanstalk/production/var/log/nginx/access.log, log stream: i-0bd24767864801e2c, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1488884470930, 'start_position': 0L, 'end_position': 114L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1488884472931, 'start_position': 341L, 'end_position': 454L}, 'source_id': '92257964a10edeb586f084f4f2ba35de', 'num_of_events': 4, 'batch_size_in_bytes': 554}
2017-03-07 11:01:20,932 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:20,932 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:25,933 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:25,933 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:27,881 - cwlogs.push.publisher - INFO - 31861 - Thread-7 - Log group: /aws/elasticbeanstalk/production/var/log/nginx/access.log, log stream: i-0bd24767864801e2c, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1488884481933, 'start_position': 454L, 'end_position': 568L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1488884482934, 'start_position': 568L, 'end_position': 681L}, 'source_id': '92257964a10edeb586f084f4f2ba35de', 'num_of_events': 2, 'batch_size_in_bytes': 277}

When Andrew (#andrew-ferk) and myself activated log streaming, it created all the log groups and streams in CloudWatch with the current log. After we deployed again, we noticed the logs stopped. This is because aws hashes the first line of the log. If it has seen that hash before it will treat that file like it's already been processed
If you are using npm start the first lines will be your application's name with version.
You can add a CMD date && npm start to your dockerfile to trigger a different first line each time or run npm in silent mode (as long as your first output is unique).
Also according to their docs you should add some policy to your elastic-beanstalk before enabling the feature AWS-Docs
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:GetLogEvents",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:PutRetentionPolicy"
],
"Resource": [
"*"
]
}
]
}

The following FAQs might be helpful:
CloudWatch Logs Agent FAQs
Why can’t I push log data to CloudWatch Logs with the awslogs agent?
Some things to check if you are streaming custom log files:
eb ssh into the instance and look at /var/log/awslogs.log. If that doesn't even mention "Loading additional configs from (your awslogs config file)", make sure you are installing your config file correct as well as restarting the awslogs service after installing it (presumably using .ebextensions. See "Custom Log File Streaming" in Using Elastic Beanstalk with Amazon CloudWatch Logs. See the commands section in logs-streamtocloudwatch-linux.config for how to restart the awslogs service.
The CloudWatch Logs Agent is stateful. If the first few lines of your log file are blank or never change, you may need to set file_fingerprint_lines. See CloudWatch Logs Agent Reference.

Related

Google Cloud Build keeps on giving me 'Can't reach database server at'

So I have been at this for days now almost and it is driving me crazy. Based on other posts, I have set up the following cloudbuild.yaml :
steps:
- name: gcr.io/cloud-builders/docker
args:
- build
- -t
- gcr.io/${INSTANCE_NAME}
- .
- name: gcr.io/cloud-builders/docker
args:
- push
- gcr.io/${INSTANCE_NAME}
- name: 'gcr.io/${INSTANCE_NAME}'
entrypoint: sh
env:
- DATABASE_URL=postgresql://USER:PASSWORD#localhost/DATABASE?host=/cloudsql/CONNECTION_NAME
args:
- -c
- |
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
chmod +x cloud_sql_proxy
./cloud_sql_proxy -instances=CONNECTION_NAME=tcp:5432 & sleep 3
npx prisma migrate deploy
- name: gcr.io/google.com/cloudsdktool/cloud-sdk
entrypoint: gcloud
args:
- run
- deploy
- backend
- --image
- gcr.io/${INSTANCE_NAME}
- --region
- europe-west1
images:
- gcr.io/${INSTANCE_NAME}
When running this, I am greeted by:
Step #2: 2023/02/05 13:00:49 Listening on 127.0.0.1:5432 for CONNECTION_NAME
Step #2: 2023/02/05 13:00:49 Ready for new connections
Step #2: 2023/02/05 13:00:49 Generated RSA key in 118.117245ms
Step #2: npm WARN exec The following package was not found and will be installed: prisma#4.9.0
Step #2: Prisma schema loaded from prisma/schema.prisma
Step #2: Datasource "db": PostgreSQL database "develop", schema "public" at "localhost"
Step #2:
Step #2: Error: P1001: Can't reach database server at `/cloudsql/CONNECTION_NAME`:`5432`
Step #2:
Step #2: Please make sure your database server is running at `/cloudsql/CONNECTION_NAME`:`5432`.
So even with using the database url hardcoded and with the Cloud SQL proxy working, i am STILL getting this error. What am I missing?
Check for the container-name in .env file and change it to postgres as it would replace name in connection string as discussed here
Or try the following format if you don’t want to hardcode IP address
DB_USER=dbuser
DB_PASS=dbpass
DB_HOST=localhost
DB_PORT=5432
CLOUD_SQL_CONNECTION_NAME=/cloudsql/gcp-project-id:europe-west3:db-instance-name
DATABASE_URL=postgres://${DB_USER}:${DB_PASS}#${DB_HOST}:${DB_PORT}/${DB_BASE}?host=${CLOUD_SQL_CONNECTION_NAME}
If you have public IP try connecting by unix socket

getting logs from a file with Ops Agent

I have a python script on a vm that writes logs to a file and I want to use them in the google logging.
I tried this config yaml:
logging:
receivers:
syslog:
type: files
include_paths:
- /var/log/messages
- /var/log/syslog
etl-error-logs:
type: files
include_paths:
- /home/user/test_logging/err_*
etl-info-logs:
type: files
include_paths:
- /home/user/test_logging/out_*
processors:
etl_log_processor:
type: parse_regex
field: message
regex: "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s(?<severity>INFO|ERROR)\s(?<message>.*)"
time_key: time
time_format: "%Y-%m-%d %H:%M:%S"
service:
pipelines:
default_pipeline:
receivers: [syslog]
error_pipeline:
receivers: [etl-error-logs]
processors: [etl_log_processor]
log_level: error
info_pipeline:
receivers: [etl-info-logs]
processors: [etl_log_processor]
log_level: info
metrics:
receivers:
hostmetrics:
type: hostmetrics
collection_interval: 60s
processors:
metrics_filter:
type: exclude_metrics
metrics_pattern: []
service:
pipelines:
default_pipeline:
receivers: [hostmetrics]
processors: [metrics_filter]
error_pipeline:
receivers: [hostmetrics]
processors: [metrics_filter]
info_pipeline:
receivers: [hostmetrics]
processors: [metrics_filter]
and this is an example of the logs: 2021-11-22 11:15:44 INFO testing normal
I didn't fully understand the google docs so I created the yaml as best as I understood and with a reference to their main example but I have no idea why it doesn't work
environmen:GCE VM
You want to use those logs in GCP Log Viewer: yes
Which docs did you follow: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#logging-receivers
How did you install OpsAgent: in gce I entered each vm instance went to observability and there was the option to install ops agent in cloud shell
What logs you want to save: I want to save all of the logs that are being written to my log file live.
specific applications logs: its an etl process that runs in python and saves its logs to a local file on the vm
sudo journalctl -xe | grep "google_cloud_ops_agent_engine"
Try out this command it should show you the exact(almost) error

gcloud alpha run deploy --set-secrets flag does not work: "should be either `latest` or a positive integer"

I fail when trying to use injecting my cloud secret to my cloud run services as environment variable. I followed the documentation at https://cloud.google.com/sdk/gcloud/reference/alpha/run/deploy#--set-secrets
Here is the relevant portion of the cloudbuild.yml file:
steps:
- name: 'gcr.io/cloud-builders/docker'
args: [ 'build', '-t', 'eu.gcr.io/$PROJECT_ID/backend:$BUILD_ID', '.' ]
- name: 'gcr.io/cloud-builders/docker'
args: [ 'push', 'eu.gcr.io/$PROJECT_ID/backend:$BUILD_ID' ]
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: gcloud
args:
- 'alpha'
- 'run'
- 'deploy'
- 'backend'
- '--image=eu.gcr.io/$PROJECT_ID/backend:$BUILD_ID'
- '--concurrency=10'
- '--cpu=1'
- '--memory=512Mi'
- '--region=europe-west4'
- '--min-instances=1'
- '--max-instances=2'
- '--platform=managed'
- '--port=8080'
- '--timeout=3000'
- '--set-env-vars=SQL_CONNECTION=10.0.0.3, SQL_USER=root, SQL_PASSWORD=root, SQL_DATABASE=immobilien'
- '--set-env-vars=^#^SPRING_PROFILES_ACTIVE=prod'
- '--set-env-vars=MAIL_SMTP_HOST=smtp.foo.com'
- '--set-env-vars=MAIL_SMTP_PORT=993'
- '--set-env-vars=MAIL_SMTP_USER=root'
- '--set-secrets=[MAIL_SMTP_PASSWORD=mail_smtp_password:1]'
- '--ingress=internal'
- '--vpc-connector=cloud-run'
- '--vpc-egress=private-ranges-only'
- '--set-cloudsql-instances=abc-binder-3423:europe-west4:data'
This is the error output:
Step #2: Status: Downloaded newer image for gcr.io/google.com/cloudsdktool/cloud-sdk:latest
Step #2: gcr.io/google.com/cloudsdktool/cloud-sdk:latest
Step #2: Skipped validating Cloud SQL API and Cloud SQL Admin API enablement due to an issue contacting the Service Usage API. Please ensure the Cloud SQL API and Cloud SQL Admin API are activated (see https://console.cloud.google.com/apis/dashboard).
Step #2: Deploying container to Cloud Run service [backend] in project [abc-binder-3423] region [europe-west4]
Step #2: Deploying...
Step #2: failed
Step #2: Deployment failed
Step #2: ERROR: (gcloud.alpha.run.deploy) should be either `latest` or a positive integer
Finished Step #2
ERROR
ERROR: build step 2 "gcr.io/google.com/cloudsdktool/cloud-sdk:latest" failed: step exited with non-zero status: 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ERROR: (gcloud.builds.submit) build caf4fae8-daae-49d4-9349-6995b1f275e8 completed with status "FAILURE"
I do not understand what is meant by
should be either `latest` or a positive integer
Don't surround the value with brackets.
--set-secrets=MAIL_SMTP_PASSWORD=mail_smtp_password:1

Why doesn't Dask dashboard update when I run some code?

I'm trying to recreate the behaviour of the Dask dashboard as illustrated in this Youtube video https://www.youtube.com/watch?time_continue=1086&v=N_GqzcuGLCY. I can see my dashboard, but it doesn't update when I run a computation.
I'm running everything on my local machine (Kubuntu 18.04).
I used anaconda to set up my environment, including
python 2.7.14
dask 0.17.4
dask-core 0.17.4
bokeh 1.0.4
tornado 4.5.1
I set up my scheduler from the command line
dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://192.168.1.204:8786
distributed.scheduler - INFO - bokeh at: :8787
distributed.scheduler - INFO - Local Directory: /tmp/scheduler-bYQe2p
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:35007
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:35007
...and a worker too.
dask-worker localhost:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://127.0.0.1:36345'
distributed.worker - INFO - Start worker at: tcp://127.0.0.1:44033
distributed.worker - INFO - Listening to: tcp://127.0.0.1:44033
distributed.worker - INFO - bokeh at: 127.0.0.1:8789
distributed.worker - INFO - nanny at: 127.0.0.1:36345
distributed.worker - INFO - Waiting to connect to: tcp://localhost:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 4
distributed.worker - INFO - Memory: 16.70 GB
distributed.worker - INFO - Local Directory: /home/fergal/orbital/repos/projects/safegraph/dask/dask-worker-space/worker-QjJ1ke
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://localhost:8786
distributed.worker - INFO - -------------------------------------------------
Then my code, borrowed from the video, is
from dask.distributed import Client
import dask.array as da
client = Client(processes=False)
print(client)
x = da.random.random((10000, 10000, 10), chunks=(1000,1000,5))
y = da.random.random((10000, 10000, 10), chunks=(1000,1000,5))
z = (da.arcsin(x) + da.arcsin(y)).sum(axis=(1,2))
z.visualize('eg.svg')
z.compute()
The code runs, and produces a graph via graph-viz. The bokeh server is accessible at 127.0.0.1:8787/status, and displays a big blue block at the top right, as per the first few seconds of the video. But when I run my code, the webpage doesn't update to show a running computation, nor does it show any results when the computation is finished. Iwould expect to see something like what I see around time 1:20 on the video.
I'm undoubtedly neglecting to set something up properly, but I can't find any clues in either the documentation or on Stack Overflow. So what am I doing wrong?
I found a solution.
Update dask to 1.1.5, shutdown the dask-scheduler (and dask-worker). I'm guessing my problem was that the version of dask from the default conda channel was out of date. I downloaded the newer version from conda-forge

Slow upload speed in aws deploy push command

I am trying to use AWS CodeDeploy. I use aws deploy push --debug command. The file to be uploaded is around 250 KB. But upload doesn't finish. Following is the logs displayed.
2017-10-27 11:11:40,601 - MainThread - botocore.auth - DEBUG - CanonicalRequest:
PUT
/frontend-deployer/business-services-0.0.1-SNAPSHOT-classes.jar
partNumber=39&uploadId=.olvaJkxreDZf1ObaHCMtHmkQ5DFE.uZ9Om0sxZB08YG3tqRWBxmGLTFWSYQaj9mHl26LPJk..Stv_vPB5NMaV.zAqsYX6fZz_S3.uN5J4FlxHZFXoeTkMiBSYQB2C.g
content-md5:EDXgvJ8Tt5tHYZ6Nkh7epg==
host:s3.us-east-2.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20171027T081140Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
...
2017-10-27 11:12:12,035 - MainThread - botocore.endpoint - DEBUG - Sending http request: <PreparedRequest [PUT]>
2017-10-27 11:12:12,035 - MainThread - botocore.awsrequest - DEBUG - Waiting for 100 Continue response.
2017-10-27 11:12:12,189 - MainThread - botocore.awsrequest - DEBUG - 100 Continue response seen, now sending request body.
Even though the file is fairly small (250 KB), upload doesn't finish.
On the other hand, upload via aws s3 cp command lasts 1 second.
How can I increase the upload speed in aws deploy push command?