AWS EMR Impala daemon issue

I've just created an EMR cluster and am trying to create my first Impala table, but I get this error: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore. Any suggestions, please? I did everything as documented by Amazon.
[ip-10-72-69-85.ec2.internal:21000] > connect localhost;
Connected to localhost:21000
Server version: impalad version 1.2.1 RELEASE (build d0bf3eae1df0f437bb4d0e44649293756ccdc76c)
[localhost:21000] > show tables;
Query: show tables
ERROR: AnalysisException: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore.
[localhost:21000] >

I had the same error. After a lot of trouble, I found a simple solution:
A. Check that the impala-state-store and impala-catalog daemons are running:
sudo service impala-state-store status
sudo service impala-catalog status
If they are not running, check the logs and be sure to start them.
B. If they are running, simply type in your impala-shell:
invalidate metadata;
This command will update your catalog from the state store.
Then, you are ready to start!
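Steps A and B above could be scripted roughly like this. It is only a sketch: check_impala_daemons is a hypothetical helper name, and it assumes the init scripts return a non-zero exit code from "status" when a daemon is stopped.

```shell
#!/bin/bash
# Hypothetical helper: report whether the two daemons from step A are running.
# Assumes "service NAME status" exits non-zero when the daemon is not running.
check_impala_daemons() {
    local failed=0 svc
    for svc in impala-state-store impala-catalog; do
        if sudo service "$svc" status >/dev/null 2>&1; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            failed=1
        fi
    done
    # Non-zero return means at least one daemon needs attention.
    return "$failed"
}
```

If both daemons are up, step B can follow in the same line, for example: check_impala_daemons && impala-shell -q 'invalidate metadata'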

Run the following commands in the order shown, then reopen the Impala browser:
sudo /etc/init.d/hive-metastore start
sudo /etc/init.d/impala-state-store start
And
sudo /etc/init.d/impala-catalog start
sudo /etc/init.d/impala-server start

I actually found that the solution to this problem might be to just wait. I had this problem and restarted everything Impala-related with no luck. I even tried stopping all Impala services and starting them in the recommended order (statestore first). Nothing helped, but after being left alone for a while it started to work. I'm not sure exactly how long it took, but it was more than 5 minutes and less than an hour.

I would first recommend you check the logs at /mnt/var/log/apps. The error is likely related to the state-store, which can be restarted with the command below.
sudo service impala-state-store restart

I ran into the same error. The tutorial skipped a couple steps. Once in an impala-shell, create a database, then use the database, then create a table.
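As a rough illustration of those three steps, the statements could be put in a file and handed to impala-shell. The database, table and column names here (demo_db, first_table, id, name) are made up for the example, and write_first_table_sql is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the statements for the three steps
# (create a database, switch to it, create a table).
# demo_db and first_table are placeholder names, not from the tutorial.
write_first_table_sql() {
    cat <<'EOF'
CREATE DATABASE IF NOT EXISTS demo_db;
USE demo_db;
CREATE TABLE IF NOT EXISTS first_table (id INT, name STRING);
SHOW TABLES;
EOF
}
```

Usage would be something like: write_first_table_sql > /tmp/first_table.sql && impala-shell -f /tmp/first_table.sql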

Related

AWS EC2 User Data not working (Tried Installing and starting httpd via User Data)

The Following is my EC2 User Data:
#!/bin/bash
sudo yum update -y
sudo yum install -y httpd
sudo systemctl start httpd
sudo systemctl enable httpd
In the Security Group, SSH port 22 and HTTP port 80 are open.
Yet when I try to access http://public_ip_of_instance, the Apache HTTP page doesn't load.
Also, Apache is not installed on the instance, as I saw when I checked sudo systemctl status httpd.
I then ran the commands manually on the EC2 server and it worked. Then I removed httpd with yum remove, as I wanted to see whether User Data works.
I stopped the instance and started it again, but the User Data script doesn't seem to run: I am unable to access the HTTP page through a browser, and httpd is not installed on the instance.
Where is the actual issue? I remember this same thing working on another instance some months back.
Your user data is correct. Whatever is happening with your website is not due to the user data code that you provided.
There could be many reasons it does not work. The public IP of the instance has changed, as happens whenever you stop and start an instance that has no Elastic IP. The instance may also have pre-existing software that clashes with httpd.
Here's some general advice on running UserData once or each startup.
Short answer: as John mentioned in the comments, EC2 instances only run the UserData (aka bootstrap) script once, on initialization.
The user data Bash/PowerShell script is Infrastructure-as-Code. You deploy the script and it installs and configures the machine.
This confuses everyone starting out with AWS. When you think about it, though, it doesn't make sense to run the UserData script on every boot when the machine has already been configured.
What people often do instead is make "Golden Images" (aka Amazon Machine Images, AMIs) of pre-configured EC2 instances, typically for machines that take a long time to install and configure. The beauty of this is that you can set up Auto Scaling groups to use the images, which avoids any long installation during a scale-up event.
Pro tip: when developing a UserData script, run through it and test it manually on the EC2 instance. Trust me, it's far quicker than troubleshooting unattended EC2 UserData errors.
Long answer: you can run the UserData on each boot of the machine using a MIME multi-part file. A MIME multi-part file allows your script to override how frequently user data is run in the cloud-init package.
https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/
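For reference, here is a sketch of what such a MIME multi-part user data file can look like, written as a small generator that wraps an existing script. The layout follows my understanding of the approach on the linked page (a cloud-config part that sets scripts-user to "always", then the script itself). make_every_boot_userdata and the attachment file names are just illustrative, so compare against the linked page before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: wrap a user data script in a MIME multi-part document
# whose cloud-config part asks cloud-init to run user scripts on every boot.
make_every_boot_userdata() {
    local script_file=$1
    cat <<'EOF'
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

EOF
    # Append the original script, then the closing boundary.
    cat "$script_file"
    printf '\n--//--\n'
}
```

Something like make_every_boot_userdata my-script.sh > userdata.txt then gives a file you can paste in as the instance's user data (my-script.sh being whatever script you already have).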
For anyone who runs into this problem, first check the log with the command:
sudo cat /var/log/cloud-init-output.log
If you notice connection errors to the various repositories, the instance had no internet connection at that point. However, if the update and install commands succeed once you are logged in to your EC2 instance, then they fail in the UserData because the instance takes a few seconds to get its internet connection and executes the commands before it is available. To solve this problem, add this command after #!/bin/bash:
#!/bin/bash
until ping -c1 8.8.8.8 &>/dev/null; do :; done
sudo yum update -y
...
This will prevent your EC2 from executing commands before an internet connection is established
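One caveat with the loop above is that it spins forever if the instance never gets a route out (or if ICMP is blocked). A bounded variant might look like the sketch below. wait_for_network and its default values are my own choices, not from the answer; -W is the per-ping timeout flag of the Linux (iputils) ping.

```shell
#!/bin/bash
# Hypothetical helper: wait for connectivity, but give up after a fixed
# number of attempts instead of looping forever.
# $1 = max attempts (default 30), $2 = seconds between attempts (default 2)
wait_for_network() {
    local max_attempts=${1:-30} delay=${2:-2} attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # Same probe as in the answer above: one ping to 8.8.8.8.
        if ping -c1 -W2 8.8.8.8 >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    echo "no network after $max_attempts attempts" >&2
    return 1
}
```

In the user data you would then call wait_for_network || exit 1 before the yum commands, so a failure shows up clearly in /var/log/cloud-init-output.log instead of hanging the boot script.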

AWS Ubuntu 18.04 AMI package installation failed

Whenever an AWS Auto Scaling group launches a new Ubuntu instance and I try to install any package on that instance, it gives me the following error:
[stderr]E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[stderr]E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend),
Is there another process using it?
I looked for a solution and fixed it manually, but I don't know why this error occurs whenever the Auto Scaling group launches a new Ubuntu instance.
When any command updates Ubuntu or installs a new application, it locks dpkg (the Debian package manager).
To identify the problem, look at the logs:
If your system is installing updates, you may find entries in the journal with journalctl -u apt-daily.service. This usually happens when the system is set to update itself; you can spot such activity with ps -ef | grep apt.systemd.daily, and you can check these settings in the file /etc/apt/apt.conf.d/20auto-upgrades
Check /var/log/dpkg.log* (the log may have been rotated) to find which packages were being installed.
Once you have identified the problem, you can solve it with these methods:
If the system is updating, wait by running a sleep command in the --user-data of your bootstrapping script.
If your first installation of a service/application is blocking another one, add a condition to wait/sleep until the first service is up, and so on for the rest of the services you are installing.
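Instead of a fixed sleep, the wait can be tied to the lock itself. The following is only a sketch: wait_for_dpkg_lock and its defaults are hypothetical, and it assumes lsof is installed and that the script runs as root (as user data does), so that lsof can see the process holding the lock.

```shell
#!/bin/bash
# Hypothetical helper: block until no process has the dpkg frontend lock
# open, or give up after a timeout.
# $1 = timeout in seconds (default 300), $2 = poll interval (default 5)
wait_for_dpkg_lock() {
    local timeout=${1:-300} interval=${2:-5} waited=0
    local lock=/var/lib/dpkg/lock-frontend
    # lsof exits 0 while some process still has the lock file open.
    while lsof "$lock" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "dpkg lock still held after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

You would call wait_for_dpkg_lock right before each apt-get install in the user data. Note there is still a small race: another process can grab the lock between the check and your install, so retrying the install on failure is still worth considering.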
This was a common problem in Ubuntu 16.04 LTS; the same issue, with solution code, is described at https://forums.aws.amazon.com/thread.jspa?threadID=251663
A snippet of code from the referenced link:
# Keep reinstalling the CodeDeploy agent until its service reports a healthy status
until service codedeploy-agent status >/dev/null 2>&1; do
    sleep 60
    rm -f install
    wget https://aws-codedeploy-us-west-2.s3.amazonaws.com/latest/install
    chmod +x ./install
    sudo ./install auto
    service codedeploy-agent restart
done
SSH into the instance before/while the UserData is running and check which process has acquired the lock:
$ lsof /var/lib/dpkg/lock-frontend
Also, try enabling the CodeDeploy agent as the last step, after performing all other steps in UserData, like:
https://gist.github.com/say8425/8344d19911dba20fab5538b85006bd31

How to access UI in Airflow 1.10?

To start with, I am trying to upgrade from version 1.9 to 1.10, so my setup contains two VMs running different versions of Airflow with different port forwarding.
I can access the UI on the VM running 1.9, but I am not able to access the UI on the one running 1.10.
To debug, I want to confirm whether the Airflow webserver is running. If I execute
sudo systemctl start airflow-webserver
it throws no error, but when I look at netstat I do not see any process listening on port 8080 (the default).
Also, I have not created any user, as I do not need RBAC authentication. Can that be a problem?
As requested by @kaxil, below is the output of ps aux | grep airflow
Can someone provide some suggestions on how to fix this problem? If you need any further information, I can provide it. I am not sure what is relevant here.
Output of journalctl -u airflow-webserver.service -b
The error message shows that there is an issue with the airflow.cfg file, i.e. there might be a character in your airflow.cfg that is causing the issue. Recheck your config file; if you don't find an issue, post it in your question and we will try to figure it out.
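One quick thing to try when rechecking the file is to look for stray non-ASCII or control characters, which are easy to paste in without noticing. This is just a sketch of that one check, not a full config validation: find_suspect_chars is a hypothetical helper, and it assumes a grep that supports -a (GNU or BSD grep).

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) every line of a file that
# contains a byte which is neither printable ASCII nor plain whitespace.
# LC_ALL=C makes grep work byte by byte; -a stops it treating the file as binary.
find_suspect_chars() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}
```

For example: find_suspect_chars "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg" (that is the default location; adjust the path if yours lives elsewhere). No output means no such characters were found, in which case the problem is somewhere else in the file.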

Cloud Composer GKE Node upgrade results in Airflow task randomly failing

The problem:
I have a managed Cloud composer environment, under a 1.9.7-gke.6 Kubernetes cluster master.
I tried to upgrade it (as well as the default-pool nodes) to 1.10.7-gke.1, since an upgrade was available.
Since then, Airflow has been acting randomly. Tasks that were working properly are failing for no given reason. This makes Airflow unusable, since the scheduling becomes unreliable.
Here is an example of a task that runs every 15 minutes and for which the behavior is very visible right after the upgrade:
[Image: airflow_tree_view]
On hover on a failing task, it only shows an Operator: null message (null_operator). Also, there is no log at all for that task.
I have been able to reproduce the situation with another Composer environment in order to ensure that the upgrade is the cause of the dysfunction.
What I have tried so far :
I assumed the upgrade might have screwed up either the scheduler or Celery (Cloud composer defaults to CeleryExecutor).
I tried restarting the scheduler with the following command:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
I also tried to restart Celery from inside the workers, with
kubectl exec -it airflow-worker-799dc94759-7vck4 -- sudo celery multi restart 1
Celery restarts, but it doesn't fix the issue.
So I tried restarting Airflow completely, the same way I did with airflow-scheduler.
None of these fixed the issue.
Side note: I can't access Flower to monitor Celery when following this tutorial (Google Cloud - Connecting to Flower). Connecting to localhost:5555 stays in the 'waiting' state forever. I don't know if it is related.
Let me know if I'm missing something!
1.10.7-gke.2 is available now [1]. Can you further upgrade to 1.10.7-gke.2 to see if the issue persists?
[1] https://cloud.google.com/kubernetes-engine/release-notes

InfluxDB Cannot see databases from localhost:8083 + Cannot access Command Line Interface

Please feel free to redirect me to any other place if this isn't the right one for this question.
Problem: When I log in to the administration panel at "localhost:8083" with "root" "root", I cannot see the existing databases or the data in them. Also, I have no way to access InfluxDB from the command line.
Also the line sudo /etc/init.d/influxdb start does not work for my setup. I have to go into /etc/init.d/ and run sudo ./influxdb start -config=config.toml in order to get the server running.
I've installed influxDB v0.8 from https://influxdb.com/docs/v0.8/introduction/installation.html for Ubuntu 14.04.
I've been developing a Clojure program using the Capacitor API just to get started and interact with InfluxDB. It runs well: I can create, delete, insert into and query a database without problems.
netstat -anp | grep LISTEN confirms that ports 8083, 8086, 8090 and 8099 are listening.
I've been Googling all around but cannot manage to get a solution.
Thanks for the support and enjoy building things !
Problem solved: the databases weren't visible in Firefox, but everything is visible in Chromium!
Why couldn't I access the CLI? I was expecting v0.8 to behave exactly like v0.9.
Your help was appreciated anyway!
For InfluxDB 0.9 the CLI could be started with:
/opt/influxdb/influx
then you can display available databases:
Connected to http://localhost:8086 version 0.9.1
InfluxDB shell 0.9.1
> show databases
name: databases
---------------
name
collectd
graphite
> use collectd
Using database collectd
> show series limit 5
You can try creating new database from CLI:
> CREATE DATABASE mydb
or with curl command:
curl -G 'http://localhost:8086/query' --data-urlencode "q=CREATE DATABASE mydb"
Web UI should be available on http://localhost:8083