Error launching Phoenix on HBase - mapreduce

I installed Phoenix on my system and copied phoenix-4.10.0-HBase-0.98-server.jar to the HBase lib directory. Then I restarted HBase. When I try to execute sqlline.py to launch Phoenix on HBase, it throws the error below. Please advise.

The issue was resolved after I reinstalled HBase.

Related

Problems Integrating Hadoop 3.x on Flink cluster

I am facing some issues while trying to integrate Hadoop 3.x on a Flink cluster. My goal is to use HDFS as persistent storage and to store checkpoints. I am currently using Flink 1.13.1 and HDFS 3.3.1. The error I get when trying to submit a job is that HDFS is not supported as a file system. In the standalone version, this error was solved by specifying HADOOP_CLASSPATH on my local machine. As a next step, I applied that solution on all the machines used in my cluster, and in standalone mode I managed to submit my jobs on all of them without any issues. However, when I started modifying the configurations to set up my cluster (by specifying the IPs of my machines), the problem came up again. What am I missing?
For Hadoop 2.x there are pre-bundled jar files on the official Flink download page that solved similar issues in the past, but that is not the case with the Hadoop 3.x versions.
It should be enough to set HADOOP_CLASSPATH on every machine in the cluster.
For anyone still struggling with a similar issue, the answer proposed by David worked for me in the end. The detail that I was missing was in the definition of the environment variables.
In my initial attempts, I was using the .bashrc script to permanently define my environment variables. This works on a standalone cluster but not on a distributed cluster, because of the scope of the script. What actually worked for me was defining my variables (including $HADOOP_CLASSPATH) in /etc/profile, as sketched below.
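For reference, a minimal sketch of the kind of entry that would go into /etc/profile on every node; the HADOOP_HOME path is an assumption and should point at wherever Hadoop 3.3.1 is installed on your machines:
# added to /etc/profile on every machine in the cluster (HADOOP_HOME path is an assumption)
export HADOOP_HOME=/opt/hadoop-3.3.1
export PATH="$PATH:$HADOOP_HOME/bin"
export HADOOP_CLASSPATH="$("$HADOOP_HOME"/bin/hadoop classpath)"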
I also managed to find another solution while I was struggling with HADOOP_CLASSPATH. As I mentioned in my initial post, for Hadoop 2.x there are pre-bundled jar files on the official Flink download page to support HDFS integration, which is not the case for Hadoop 3.x. I found the Maven repository page linked below and, after testing all of the available jars, found one that worked in my case. To be more precise, for Hadoop 3.3.1 the 3.1.1.7.2.8.0-224-9.0 jar worked once placed in $FLINK_HOME/lib (placement sketched after the link). While it is not an "official" solution, it seems to solve the issue.
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber?repo=cloudera-repos
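A rough sketch of the placement step, assuming the jar has already been downloaded from the repository above and that $FLINK_HOME points at the Flink 1.13.1 installation on each node:
# copy the shaded Hadoop uber jar into Flink's lib directory and restart the cluster
cp flink-shaded-hadoop-3-uber-3.1.1.7.2.8.0-224-9.0.jar "$FLINK_HOME/lib/"
"$FLINK_HOME/bin/stop-cluster.sh"
"$FLINK_HOME/bin/start-cluster.sh"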

Debug PySpark on EMR using PyCharm

Does anyone have experience debugging PySpark running on AWS EMR using PyCharm?
I couldn't find any good guides or existing threads regarding this.
I know how to debug Scala Spark with IntelliJ against EMR, but I have no experience doing this with Python.
I am aware that I can connect to the remote server (the EMR master) using SSH, and that with the Professional edition I could perhaps use the remote deployment feature to run my Spark job from PyCharm, but I'm not sure it will work. I want to know if anyone has tried it before I go with PyCharm Pro.
I managed to debug PySpark on EMR the way I wanted to.
Please look at this Medium blog post that describes how to do so:
https://medium.com/explorium-ai/debugging-pyspark-with-pycharm-and-aws-emr-d50f90077c92
It describes how to use the PyCharm Pro remote deployment feature to debug your PySpark program.
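As a rough illustration only (the blog post covers the actual PyCharm-side configuration), once the project has been synced to the EMR master via remote deployment, the run being debugged typically comes down to something like the command below; the project path and script name are assumptions:
# launch the synced script on the EMR master in YARN client mode (paths are placeholders)
spark-submit --master yarn --deploy-mode client /home/hadoop/my_project/my_job.py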

Issue when running a Spark program on EMR 5.13.0

We are able to run our Spark programs on EMR 5.9.0 without any issues, but we get the error below when running on EMR 5.13.0.
19/11/12 07:09:43 ERROR SparkContext: Error initializing SparkContext.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
I have added the dependency below in Maven, but I am still getting the same issue. Can anyone please help fix it?
<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>
Thanks
EMR 5.13.0 has Spark 2.3.0, while EMR 5.9.0 has 2.2.0. Try upgrading the Spark version in your jar.
I was able to fix the issue after adding the following:
--jars xercesImpl-2.11.0.jar,xml-apis-1.4.01.jar
Thanks
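For context, here is a sketch of how those jars might be passed to spark-submit; the jar locations, main class, and application jar name below are placeholders rather than the actual values used:
# jar paths, main class, and application jar are placeholders
spark-submit \
  --jars /home/hadoop/jars/xercesImpl-2.11.0.jar,/home/hadoop/jars/xml-apis-1.4.01.jar \
  --class com.example.MySparkJob \
  my-spark-job.jar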

How to create a working JDBC connection in Google Cloud Composer?

To get the JDBC hook working, I first added the jaydebeapi package via the PyPI packages page in Composer.
However, that alone does not allow a JDBC connection to work:
1) How do I specify the .jar driver path for the JDBC driver I have?
I was thinking it would be something like "/home/airflow/gcs/drivers/xxx.jar" (assuming I've created a drivers folder in the gcs directory)... but I haven't been able to verify or find documentation on this.
2) How do I install or point to a Java JRE? On Ubuntu I run this command to install a JRE: sudo apt-get install default-jre libc6-i386. Is a JRE, or the ability to install one, available in Cloud Composer? This is the current error message I get in the Ad Hoc Query window with the JDBC connection: [Errno 2] No such file or directory: '/usr/lib/jvm'
If either of the above options is not currently available, are there any workarounds to get a JDBC connection working with Composer?
There are known issues with JDBC in Airflow 1.9 (https://github.com/apache/incubator-airflow/pull/3257); hopefully, we will be able to backport these fixes into Composer by GA!

Zeppelin: how to download or save the Zeppelin notebook?

I am using the Zeppelin sandbox with AWS EMR.
Is there a way to download or save the Zeppelin notebook so that it can be imported into another Zeppelin server?
As noted in the comments above, this feature is available starting in version 0.5.6. You can find more details in the release notes. Downloading and installing this version would solve that issue.
Given that you are using EMR, it looks like you will have to work with the version available. As Samuel mentioned above, you can back up the contents of the incubator-zeppelin/notebook folder and make the transfer, as sketched below.
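A minimal sketch of that transfer, assuming the default incubator-zeppelin/notebook location and a reachable target host (the notebook path, user, and host name are assumptions):
# archive the notebook storage directory on the source Zeppelin host
tar czf zeppelin-notebooks.tar.gz -C ~/incubator-zeppelin notebook
# copy the archive to the target Zeppelin server; unpack it into that server's
# Zeppelin directory and restart Zeppelin so the imported notebooks are picked up
scp zeppelin-notebooks.tar.gz hadoop@target-zeppelin-host:/tmp/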