Kafka Connect with Elasticsearch in AWS

I am trying to connect Kafka (MSK) in AWS to Elasticsearch in AWS. I set it up but am currently getting an error. Here are the steps:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
wget https://packages.confluent.io/archive/5.2/confluent-5.2.0-2.11.tar.gz
tar -xf confluent-5.2.0-2.11.tar.gz
confluent-hub install confluentinc/kafka-connect-elasticsearch:11.0.0
export PATH=/home/ubuntu/confluent-5.2.0/bin:${PATH};
I then updated the connect-standalone.properties config file:
bootstrap.servers=b-1.xx.xx.c8.kafka.us-east-1.amazonaws.com:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=share/java,/home/ubuntu/confluent-5.2.0/share/confluent-hub-components
Then I created another config file for the sink connector.
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=sampleTopic
topic.index.map=logs:logs_index
connection.url=https://xxxx.us-east-1.es.amazonaws.com:443
type.name=log
key.ignore=true
schema.ignore=true
Then I run the confluent standalone command to connect.
bin/connect-standalone etc/kafka/connect-standalone.properties etc/kafka/elasticsearch-connect.properties
It runs but eventually throws an error whose cause I cannot figure out. Below is the error:
ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:119)
java.lang.NoClassDefFoundError: org/apache/kafka/common/config/ConfigDef$CaseInsensitiveValidString
Any help or input would be great. Thank you.
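A NoClassDefFoundError for a class under org/apache/kafka/common means the Kafka jars on the worker's classpath do not contain that class. One plausible cause, which is an assumption and not confirmed in this thread, is that the connector release was built against a newer Kafka client library than the one bundled with Confluent 5.2.0. A minimal sketch for checking which kafka-clients version the worker ships follows; the function name is hypothetical, and the share/java/kafka location is an assumption about the tarball layout.

```shell
#!/bin/sh
# Hypothetical helper: print the version suffix of every kafka-clients jar
# found directly under the given directory.
kafka_clients_versions() {
    dir=$1
    for jar in "$dir"/kafka-clients-*.jar; do
        # Skip the unexpanded glob pattern when no jar matches.
        [ -e "$jar" ] || continue
        name=${jar##*/}               # strip the directory part
        name=${name#kafka-clients-}   # strip the artifact prefix
        printf '%s\n' "${name%.jar}"  # strip the extension
    done
}

# Example call (the path is an assumption about the Confluent tarball layout):
# kafka_clients_versions /home/ubuntu/confluent-5.2.0/share/java/kafka
```

If the printed version is older than the Kafka version the connector release expects, trying a connector release that matches the installed platform, or a newer platform, would be the next thing to test.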

Related

Kubectl against GKE Cluster through Terraform's local-exec?

I am trying to automate the migration of workloads between two node pools in a GKE cluster. I am running Terraform in a GitLab pipeline. When the new node pool is created, the local-exec runs, and I want to cordon and drain the old node so that the pods are rescheduled on the new one. I am using the registry.gitlab.com/gitlab-org/terraform-images/releases/1.1:v0.43.0 image for my GitLab jobs. python3 is installed with apk add, and the gcloud CLI is set up by downloading the tar and using the gcloud binary from the google-cloud-sdk/bin directory.
I am able to use commands like ./google-cloud-sdk/bin/gcloud auth activate-service-account --key-file=<key here>.
The problem is that I am not able to use kubectl against my cluster.
I have installed the gke-gcloud-auth-plugin with ./google-cloud-sdk/bin/gcloud components install gke-gcloud-auth-plugin --quiet, once in the CI job and a second time in the local-exec script in the HCL code, but I still get the following errors:
module.create_gke_app_cluster.null_resource.node_pool_provisioner (local-exec): E0112 16:52:04.854219 259 memcache.go:238] couldn't get current server API group list: Get "https://<IP>/api?timeout=32s": getting credentials: exec: executable <hidden>/google-cloud-sdk/bin/gke-gcloud-auth-plugin failed with exit code 1
module.create_gke_app_cluster.null_resource.node_pool_provisioner (local-exec): Unable to connect to the server: getting credentials: exec: executable <hidden>/google-cloud-sdk/bin/gke-gcloud-auth-plugin failed with exit code 1
When I check the version of the gke-gcloud-auth-plugin with gke-gcloud-auth-plugin --version
I am getting the following error:
/bin/sh: eval: line 253: gke-gcloud-auth-plugin: not found
This suggests that the plugin is not installed, or at least is not on the PATH.
The image that I am using is based on alpine for which there is no way to install the plugin via package manager, unfortunately.
Edit: gcloud components list shows gke-gcloud-auth-plugin as installed too.
The solution was to switch to the google/cloud-sdk image, install Terraform in it, and use that image for the job in question.
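A "not found" message from the shell only says that the name could not be resolved through PATH; it does not by itself prove the binary is absent. A small sketch that separates the two cases is shown here. The helper name is hypothetical, and the SDK location is assumed from the relative path used in the question.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be resolved via PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Assumed SDK location, taken from the relative path used in the question.
SDK_BIN="$PWD/google-cloud-sdk/bin"

if on_path gke-gcloud-auth-plugin; then
    echo "gke-gcloud-auth-plugin resolves via PATH"
elif [ -x "$SDK_BIN/gke-gcloud-auth-plugin" ]; then
    echo "plugin exists in $SDK_BIN but that directory is not on PATH"
else
    echo "plugin binary not found"
fi
```

Note that the kubectl errors above name the plugin by its full path and report exit code 1, so a PATH fix alone may not have been enough on that image.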

How to install deb package in aws codebuild worker container

I'm trying to build a Docker container and push it to an ECR repository. Everything works fine locally, but once moved to AWS I get this error:
dpkg-deb: error: 'docker-ce_20.10.3_3-0_ubuntu-bionic_amd64.deb' is not a Debian format archive
dpkg: error processing archive docker-ce_20.10.3_3-0_ubuntu-bionic_amd64.deb (--install):
dpkg-deb --control subprocess returned error exit status 2
It comes from the following commands in the Dockerfile:
COPY docker-assets/docker-ce_20.10.3_3-0_ubuntu-bionic_amd64.deb /home/folder/
RUN dpkg -i docker-ce_20.10.3_3-0_ubuntu-bionic_amd64.deb
Has anyone hit the same issue, or can anyone help me out?
I worked around this by using a URL instead.
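The dpkg-deb message says the file's content is not a .deb at all. A .deb is an ar archive, and ar archives begin with the bytes !<arch>, so a quick check of the file inside the build environment can show whether the copy that reached CodeBuild differs from the local one. The helper name below is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the ar archive magic,
# which every .deb package carries at offset 0.
is_ar_archive() {
    [ "$(head -c 7 "$1" 2>/dev/null)" = '!<arch>' ]
}

# Example call, using the file name from the question:
# is_ar_archive docker-assets/docker-ce_20.10.3_3-0_ubuntu-bionic_amd64.deb \
#     || echo "file is not a real .deb in this environment"
```

If the check fails only on the build worker, the file was most likely altered or replaced somewhere between the repository and the build context, which would also explain why fetching it from a URL works.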

What is the correct way of installing a JDBC driver on EMR for Sqoop to use?

I am running Sqoop 1.4.7 on AWS EMR 5.21.1 and am trying to import data from a database. I have successfully been able to do this manually where I create an EMR instance with Sqoop installed via the EMR Console.
Here are the preliminary steps that I performed in order to run Sqoop on EMR:
Download the JDBC Driver
Move the JDBC driver to the /usr/lib/sqoop/lib directory
I was able to run a Sqoop import successfully when I was SSH'd into an EMR cluster, after running these commands:
wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar
sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/
When I try to run these commands from an EMR bootstrap script, however, I get the error:
usr/lib/sqoop/lib/ No such file or directory
After some investigation I realized this is because "Bootstrap actions execute before core services, such as Hadoop or Spark, are installed".
So the /usr/lib/sqoop/lib directory does not exist yet when my bootstrap steps run.
Here are some solutions that work but feel like workarounds:
Create the /usr/lib/sqoop/lib directory in my bootstrap script and then place the jar in it
Add the jar to this directory as an EMR step. (It turns out this is the correct approach; see the accepted answer below.)
What is the correct way of installing this JDBC driver on EMR?
The second option is the correct way to do it. The documentation explains how to run bash scripts as an EMR step.
You can also use command-runner.jar as the step's jar, with the following arguments:
bash -c "wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar;sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/"
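For a script that runs as an EMR step, the same two commands can be wrapped so that a missing Sqoop directory produces a clear error instead of a failed mv. This is only a sketch: the function name and the SUDO override variable are hypothetical, while the jar URL and target directory are the ones from the question.

```shell
#!/bin/bash
# Hypothetical helper: move a downloaded driver jar into Sqoop's lib directory.
# Fails early with a clear message if the directory does not exist yet,
# which is the situation a bootstrap action runs into.
install_jdbc_driver() {
    local jar=$1 lib_dir=$2
    if [ ! -d "$lib_dir" ]; then
        echo "missing $lib_dir: is Sqoop installed yet?" >&2
        return 1
    fi
    # SUDO defaults to "sudo"; set SUDO="" to run without it (for example as root).
    ${SUDO-sudo} mv "$jar" "$lib_dir/"
}

# Example step body, using the URL and directory from the question:
# wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar \
#     && install_jdbc_driver mssql-jdbc.jar /usr/lib/sqoop/lib
```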

AWS Ubuntu 18.04 AMI package installation failed

Whenever an AWS Auto Scaling group launches a new Ubuntu instance and I try to install any package on that instance, it gives me the following error:
[stderr]E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[stderr]E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend),
Is there another process using it?
I looked for a solution and fixed it manually, but I don't know why the error appears every time the Auto Scaling group launches a new Ubuntu instance.
Any command that updates Ubuntu or installs a new application locks dpkg (the Debian package manager).
To identify the problem, look at the logs:
If your system is installing updates, you may find entries in the journal with journalctl -u apt-daily.service. This usually happens when the system is set to update itself; you can spot such activity with ps -ef | grep apt.systemd.daily and check the settings in the file /etc/apt/apt.conf.d/20auto-upgrades.
Check /var/log/dpkg.log* (the log may have been rotated) to find which packages were being installed.
Once you have identified the problem, you can solve it with these methods:
If the system is updating, wait by running a sleep command in the user data (--user-data) of your bootstrapping script.
If the first installation of a service or application is blocking another one, add a condition to wait or sleep until the first service is up, and so on for the rest of the services you are installing.
This was a common problem in Ubuntu 16.04 LTS; the same issue, with solution code, is discussed at https://forums.aws.amazon.com/thread.jspa?threadID=251663
A snippet of code from the referenced link:
until service codedeploy-agent status >/dev/null 2>&1; do
    sleep 60
    rm -f install
    wget https://aws-codedeploy-us-west-2.s3.amazonaws.com/latest/install
    chmod +x ./install
    sudo ./install auto
    service codedeploy-agent restart
done
SSH into the instance before/while the UserData is running and check which process has acquired the lock:
$ lsof /var/lib/dpkg/lock-frontend
Also, try enabling the CodeDeploy agent as the last step, after performing all other steps in UserData, like:
https://gist.github.com/say8425/8344d19911dba20fab5538b85006bd31
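The wait-until-free idea described above can be sketched as a small polling loop around the same lsof check. The function name, the 5-second interval, and the default timeout are assumptions chosen for illustration. lsof only sees other users' processes when run as root, which is the case for EC2 user data scripts.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process holds the given lock file open,
# or give up after a timeout in seconds (default 300).
wait_for_lock() {
    lock_file=$1
    timeout=${2:-300}
    waited=0
    # lsof exits 0 while at least one process has the file open.
    while lsof "$lock_file" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still locked after ${timeout}s: $lock_file" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Example for user data (the package name is a placeholder):
# wait_for_lock /var/lib/dpkg/lock-frontend 600 && apt-get install -y <package>
```

There is still a small window between the check and the install in which another process could take the lock, so wrapping the install itself in a retry may be needed as well.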

AWS Elastic Beanstalk commands return no output

I am very new to Amazon Web Services and have been taking a learn-by-doing approach.
In summary, I was trying to set up Git with the Elastic Beanstalk command line interface for my web app. However, I wanted to use my SSH key pair to authenticate and, in my naivety and ignorance, supplied the SSH key files where the CLI asked for the aws-access-id and secret. Now I can't get it to work. Details are below.
I have my project directory with Git set up so that it works. I then open the git bash window MINGW64 (I am on Windows 10) and attempt to set up eb.
$ eb init
It then told me that my credentials were not set up and asked me for the aws-access-id and the secret. I had just set up the SSH key pair, so I tried entering those files; what's the harm in trying? EB failure, it turns out. The instances still seem to run fine, judging by their status in the AWS console. However, whatever I type into bash:
$ eb init
$ eb status
$ eb deploy
$
There is no output. Not even an error. It just silently returns to awaiting a new command from me.
When using the --debug option with these commands, a long list of operations is returned, ending with
botocore.parsers.ResponseParserError: Unable to parse response (no element found: line 1, column 0), invalid XML received:
b''
I thought I would be able to log out or something like that, so that I could enter the proper credentials that I had messed up at the beginning. I restarted the web app from the AWS web console and restarted my PC. No success.
Thanks in advance.
EDIT:
I also tried reinstalling awscli and awsebcli:
pip uninstall awsebcli
pip uninstall awscli
pip install awscli
pip install awsebcli --upgrade --user
Problem persists, but now there is one output (previously seen only upon --debug option):
$ eb init
ERROR: ResponseParserError - Unable to parse response (no element found: line 1, column 0), invalid XML received:
b''
$
It sounds like you have replaced your AWS credentials in ~/.aws/credentials and/or ~/.aws/config file(s) with your SSH key. You could manually replace these or execute aws configure if you have the AWS CLI installed.
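To see whether one of those files still holds something other than an access key, a rough heuristic check can help. The sketch below assumes that access key IDs are typically 20 uppercase letters and digits, which is a heuristic and not a guarantee; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains an aws_access_key_id line
# whose value looks like a typical access key ID (20 uppercase letters/digits).
# The length rule is a heuristic assumption, not an official validation.
has_plausible_access_key() {
    grep -Eq '^[[:space:]]*aws_access_key_id[[:space:]]*=[[:space:]]*[A-Z0-9]{20}[[:space:]]*$' "$1"
}

# Example calls against the files named in the answer:
# has_plausible_access_key ~/.aws/credentials || echo "check ~/.aws/credentials"
# has_plausible_access_key ~/.aws/config      || echo "check ~/.aws/config"
```

A file path or SSH key text stored as the key ID fails this check, which would point to the entry that needs to be replaced by hand or through aws configure.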