Regarding the pod issue in AWS EKS

I have created a cluster in AWS EKS with 2 nodes. It runs a web-based application, and I am using EBS volumes on the Pods. I can reach the initial login page, communicate with the server, and proceed with the other steps as well. But sometimes one particular module throws an "internal server error: 500", and I have been using the browser's inspect option to try to work out whether it comes from the development code or from the AWS side.
The issue occurs only for one particular Pod (service). I cannot tell whether it is a network-related issue where the communication is not happening, or whether the data simply cannot be pulled from the server.
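One way to narrow this down is to look at the failing Pod's own logs and at the Kubernetes events for it: an application-side 500 usually shows up as a stack trace in the logs, while infrastructure problems (EBS attach failures, probe failures, networking) tend to show up as events. Below is a minimal sketch using the official Python kubernetes client; the namespace and the label selector app=mymodule are hypothetical placeholders for the failing service, and it assumes a working kubeconfig for the EKS cluster.

```python
# Sketch: pull recent logs and events for the failing Pod.
# Assumes a working kubeconfig for the EKS cluster; the namespace and
# label selector below are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config()
v1 = client.CoreV1Api()

namespace = "default"              # placeholder
selector = "app=mymodule"          # placeholder for the failing service

pods = v1.list_namespaced_pod(namespace, label_selector=selector)
for pod in pods.items:
    print(f"--- {pod.metadata.name} ({pod.status.phase}) ---")
    # Application-side 500s usually show a stack trace here.
    print(v1.read_namespaced_pod_log(pod.metadata.name, namespace, tail_lines=100))

# Infrastructure problems (volume attach failures, probe failures, etc.)
# tend to surface as events rather than application logs.
events = v1.list_namespaced_event(namespace)
for ev in events.items:
    print(ev.last_timestamp, ev.reason, ev.message)
```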

Related

AWS EMR jupyter error 403 Forbidden (Workspace is not attached to cluster)

I have a simple notebook in EMR. I have no running clusters. From the notebook open page itself I request a new cluster so my expectation is that all params necessary to ensure a good workbook-cluster connection are in place. I observe that the release is emr-5.36.0 and that applications Hadoop, Spark, Livy, Hive, JupyterEnterpriseGateway are all included. I am using default security groups.
Both the cluster and the workbook hosts start, but upon opening Jupyter (or JupyterLab) the kernel launch fails with the message Error 403: Workbook is not attached to cluster. All attempts at "jiggling" the kernel -- choosing a different one, doing a start/stop, etc. -- yield the same error.
There are a number of docs plus answers here on SO, but these tend to revolve around trying to use EC2 instances instead of EMR, messing with master vs. core nodes, forgetting the JupyterGateway, and the like. Again, you'd think that a cluster launched directly from the notebook would work.
Any clues?
I have done this many times before and it always works, with the create new cluster option, and default security groups are not an issue.
Here is an image of one from before:
One thing that could cause this error, and which you have not made clear, is that it will not let you open the notebook as root. So do not use the root AWS account to create the cluster / notebook. Create and use an IAM user that has permissions to launch the cluster.
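For reference, a minimal boto3 sketch of creating such an IAM user and attaching an EMR managed policy; the user name is a placeholder, and AmazonEMRFullAccessPolicy_v2 is just one of the managed policies that grants cluster-launch permissions.

```python
# Sketch: create a non-root IAM user for launching EMR clusters/notebooks.
# The user name is a placeholder; AmazonEMRFullAccessPolicy_v2 is one of
# the AWS managed policies that allows launching clusters.
import boto3

iam = boto3.client("iam")

iam.create_user(UserName="emr-notebook-user")   # placeholder name
iam.attach_user_policy(
    UserName="emr-notebook-user",
    PolicyArn="arn:aws:iam::aws:policy/AmazonEMRFullAccessPolicy_v2",
)
```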
I tried with the admin policy attached.

AWS EB can't talk to the app servers... what could block that

We had a working env yesterday morning. One person was working with some VPC peering stuff, and somehow that seems to have blocked the ELB from being able to talk to the application servers. So we can't deploy.
We get this error
Command failed on instance. An unexpected error has occurred [ErrorCode: 0000000001].
And EB can't retrieve any logs from the app server either, so clearly something is being blocked. The instance's IAM role has AWSElasticBeanstalkWebTier and is the same role used in another env that works.
Also, I can RDP to the app instance from my laptop, though I added my IP to one of the security groups for the instance. I audited the SGs for the working env and don't see anything specific for EB.
Everything points to something with the VPC... but what should I look at?
So it turns out that this error message doesn't always mean EB can't talk to the instance. We had updated the AMI a while back and used the wrong one... it wasn't compatible with the platform. So nothing worked right, including getting the logs. The fix was to sync up the AMI and the EB platform.
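If you want to check the same thing on your own environment, here is a rough boto3 sketch that reads the environment's custom ImageId setting and removes it so the environment falls back to the platform's default AMI; the application and environment names are placeholders.

```python
# Sketch: check whether an Elastic Beanstalk environment pins a custom AMI,
# and optionally remove it so the platform's default AMI is used again.
# Application and environment names below are placeholders.
import boto3

eb = boto3.client("elasticbeanstalk")
app, env = "my-app", "my-env"      # placeholders

settings = eb.describe_configuration_settings(
    ApplicationName=app, EnvironmentName=env
)
for option in settings["ConfigurationSettings"][0]["OptionSettings"]:
    if option["Namespace"] == "aws:autoscaling:launchconfiguration" \
            and option["OptionName"] == "ImageId":
        print("Custom AMI in use:", option.get("Value"))

# Removing the override makes EB fall back to the AMI that matches the platform.
eb.update_environment(
    EnvironmentName=env,
    OptionsToRemove=[{
        "Namespace": "aws:autoscaling:launchconfiguration",
        "OptionName": "ImageId",
    }],
)
```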

AWS Datapipeline RDS to S3 Activity Error: Unable to establish connection to jdbc://mysql:

I am currently setting up an AWS Data Pipeline using the RDStoRedshift template. During the first RDStoS3Copy activity I am receiving the following error:
"[ERROR] (TaskRunnerService-resource:df-04186821HX5MK8S5WVBU_#Ec2Instance_2021-02-09T18:09:17-0) df-04186821HX5MK8S5WVBU amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc://mysql:/myhostname:3306/mydb No suitable driver found for jdbc://mysql:/myhostname:3306/mydb"
I'm relatively new to AWS services, but it seems that the copy activity spins up an EC2 instance to do the copy. The error clearly states there isn't a driver available. Do I need to stand up an EC2 instance for AWS Data Pipeline to use and install the driver there?
Typically when you are coding a solution that interacts with a MySQL RDS instance, especially a Java solution such as a Lambda function written with the Java runtime API or a cloud-based web app (i.e., a Spring Boot web app), you specify the driver file using a POM/Gradle dependency.
For this use case, there seems to be information here about a Driver file: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html
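For what it's worth, the JdbcDatabase object described on that page takes a driver class and the S3 location of a driver jar. Below is a rough sketch of what that object could look like when pushed with boto3's put_pipeline_definition; the S3 jar path, hostname and credentials are placeholders (the pipeline ID comes from the error message above), and the field names are those listed in the linked documentation. Note that the usual MySQL JDBC URL form is jdbc:mysql://host:3306/db rather than jdbc://mysql:/....

```python
# Sketch: a JdbcDatabase pipeline object with an explicit MySQL driver jar,
# pushed via boto3. The S3 jar path, hostname and credentials are placeholders;
# field names come from the JdbcDatabase documentation linked above.
import boto3

dp = boto3.client("datapipeline")

jdbc_database = {
    "id": "MyRdsDatabase",
    "name": "MyRdsDatabase",
    "fields": [
        {"key": "type", "stringValue": "JdbcDatabase"},
        # Standard MySQL JDBC URL format: jdbc:mysql://<host>:3306/<db>
        {"key": "connectionString", "stringValue": "jdbc:mysql://myhostname:3306/mydb"},
        {"key": "jdbcDriverClass", "stringValue": "com.mysql.jdbc.Driver"},
        # Driver jar uploaded to S3 so the Ec2Resource running the copy can load it.
        {"key": "jdbcDriverJarUri", "stringValue": "s3://my-bucket/mysql-connector-java-5.1.48.jar"},
        {"key": "username", "stringValue": "mydbuser"},
        {"key": "*password", "stringValue": "mydbpassword"},
    ],
}

dp.put_pipeline_definition(
    pipelineId="df-04186821HX5MK8S5WVBU",   # from the error message above
    pipelineObjects=[jdbc_database],        # plus the rest of the template's objects
)
```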

AWS Fargate 503 Service Temporarily Unavailable

I'm trying to deploy a backend application to AWS Fargate using CloudFormation templates that I found. When I was using the docker image training/webapp I was able to deploy it successfully and access it with the externalUrl from the networking stack for the app.
When I try to deploy our backend image I can see the stacks deploying correctly, but when I go to the externalUrl I get 503 Service Temporarily Unavailable and I'm unable to see the app... Another thing I've noticed is that on Docker Hub I can see the image being pulled continuously while the CloudFormation services are running...
The backend is some kind of Maven project; I don't know exactly what, but I know it works locally. However, starting the container with this backend image takes about 8 minutes... I'm not sure if this affects Fargate?? Any idea how to get it working?
It sounds like you need to find the actual error that you're experiencing; the 503 isn't enough information. Can you provide some other context?
I'm not familiar with Fargate but have been using ECS quite a bit this year, and I generally find the cause by going to (on the dashboard) ECS -> cluster -> service -> events. The events tab gives more specific errors as to what is happening (see the sketch after the list below for pulling the same events with the SDK).
My ECS deployment problems generally come down to:
1. the container is not exposing the same port as in the definition; this could be the case if you're deploying from a stack written by someone else.
2. the task definition memory/cpu restrictions don't grant enough space for the application and it has trouble placing (probably a problem with ECS more than Fargate, but you never know).
3. your timeout in the task definition is not set to 8 minutes: see this question, it has a lot of this covered.
4. your start command in the task definition does not work as expected with the container you're trying to deploy.
If it is pulling from Docker Hub continuously, my bet would be that it's 1, 3 or 4, and it's attempting to pull the image over and over again.
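As mentioned above, the service events are usually the most direct clue. If you'd rather pull them programmatically than through the console, here is a small boto3 sketch; the cluster and service names are placeholders.

```python
# Sketch: print the most recent ECS service events (the same information
# shown in the console's Events tab). Cluster and service names are placeholders.
import boto3

ecs = boto3.client("ecs")

resp = ecs.describe_services(cluster="my-cluster", services=["my-backend-service"])
for event in resp["services"][0]["events"][:20]:
    print(event["createdAt"], event["message"])
```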
Try adding a health check grace period of 60 seconds by going to ECS -> cluster -> service -> update, Network Access section.
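The same setting can be applied with the SDK if you prefer; a rough sketch follows (cluster and service names are placeholders, and it assumes the service sits behind a load balancer).

```python
# Sketch: set a 60-second health check grace period on an existing service,
# equivalent to the console change described above. Names are placeholders.
import boto3

ecs = boto3.client("ecs")
ecs.update_service(
    cluster="my-cluster",
    service="my-backend-service",
    healthCheckGracePeriodSeconds=60,
)
```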

S3 to EC2 problems

I'm trying to transfer a project from TeamCity to my EC2 server using CodeDeploy. In the process we had a problem, and the file isn't being transferred from S3 to our EC2 instance. The error message is:
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The team decided that the best way to solve this problem was to read the server logs, but in the process we noticed that the server keeps shutting down on its own, which was a huge problem. We thought that could be diagnosed with the logs, so we tried to get them using CloudWatch (our team created the correct IAM role and installed the agent on the server with a Run Command). Sadly, this only gave us shutdown and start-up information in the logs.
At this moment we don't know how to solve this problem, but our plan was to get all the logs and then see what is wrong.
I've been stuck at this part since September; can someone help me?
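One way to see what actually failed on each instance, without depending on the instance staying up long enough to collect its logs, is to ask CodeDeploy for the per-instance lifecycle event diagnostics. A rough boto3 sketch is below; the deployment ID is a placeholder (it is shown in the CodeDeploy console for the failed deployment).

```python
# Sketch: dump per-instance lifecycle event diagnostics for a failed deployment.
# The deployment ID is a placeholder; it is shown in the CodeDeploy console.
import boto3

cd = boto3.client("codedeploy")
deployment_id = "d-XXXXXXXXX"      # placeholder

for instance_id in cd.list_deployment_instances(deploymentId=deployment_id)["instancesList"]:
    summary = cd.get_deployment_instance(
        deploymentId=deployment_id, instanceId=instance_id
    )["instanceSummary"]
    print(instance_id, summary["status"])
    for event in summary.get("lifecycleEvents", []):
        diag = event.get("diagnostics") or {}
        print(" ", event["lifecycleEventName"], event.get("status"),
              diag.get("errorCode", ""), diag.get("logTail", "")[:200])
```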