Integration between Hive and HBase - MapReduce

I'm using Hive over HBase to do some BI.
I have already configured Hive and HBase, but when I run the query "select count(*) from hbase_table_2" in Hive (hbase_table_2 is a Hive table that refers to a table in HBase), this exception occurs:
# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201212171838_0009_m_000000
java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation#7d858aa0 closed
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
I don't know where the problem is. Can anyone help me?
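The "Session 0x0 for server null" warning usually means the client could not reach any ZooKeeper server at the address it is configured with (often the localhost default when the quorum is not set), and the missing hive-site.xml on the CLASSPATH points in the same direction. As a side check, a small standalone program like the following sketch, using the plain HBase client API, can confirm whether the quorum and port you expect Hive to use are actually reachable (the host name and the underlying HBase table name are placeholders, not values from this setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Standalone sketch: verify that the HBase/ZooKeeper quorum Hive is supposed
// to use is reachable. "hbase-host" and "underlying_hbase_table" are
// placeholders, not values taken from the question.
public class HBaseQuorumCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hbase-host");            // placeholder
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection cnx = ConnectionFactory.createConnection(conf);
             Admin admin = cnx.getAdmin()) {
            // If this call fails with ConnectionLoss, the Hive map tasks will
            // fail the same way when they read the HBase-backed table.
            System.out.println("Table exists: "
                    + admin.tableExists(TableName.valueOf("underlying_hbase_table")));
        }
    }
}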

Related

Database Migration Task fails to load the data into the target database

I have created a PostgreSQL (target) RDS instance on AWS, did the schema conversion using SCT, and now I am trying to move data with a Database Migration task from a DB2 database on an EC2 instance (source) to the target DB. The data is not loading and the task gives the following error:
Last Error ODBC general error. Task error notification received from subtask 1, thread 0 [reptask/replicationtask.c:2800] [1022502] Error executing source loop; Stream component failed at subtask 1, component st_1_5D3OUPDVTS3BLNMSQGEXI7ARKY ; Stream component 'st_1_5D3OUPDVTS3BLNMSQGEXI7ARKY' terminated [reptask/replicationtask.c:2807] [1022502] Stop Reason RECOVERABLE_ERROR Error Level RECOVERABLE
I was getting the same error, and the issue was related to the database user rights for REPLICATION CLIENT and REPLICATION SLAVE, as mentioned in the AWS documentation:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html#CHAP_Source.MySQL.Prerequisites
I resolved it by granting the above-mentioned REPLICATION rights using the following statements in MySQL (replacing {dbusername} with the actual database user name used in the DMS endpoint):
GRANT REPLICATION CLIENT ON *.* to {dbusername}@'%';
GRANT REPLICATION SLAVE ON *.* to {dbusername}@'%';

ODBC: ERROR [HY000] [Microsoft][DriverSupport] (1170)

Getting this error when connecting Power BI to Azure Databricks through the built-in Spark connector:
Details: "ODBC: ERROR [HY000] [Microsoft][DriverSupport] (1170)
Unexpected response received from server. Please ensure the server
host and port specified for the connection are correct."
I have checked the host and port of the Databricks cluster many times, and also tried again after restarting the cluster.
Guide for the connection:
https://docs.azuredatabricks.net/user-guide/bi/power-bi.html
Got the same problem today. I followed these instructions and it worked.
The user was not able to import SQL data into Power BI and was getting this error, while testing the connection in ODBC was successful.
It turned out that he had old credentials stored in Power BI, and that caused authentication issues. Purging the cached data sources (Power BI: Home > Edit Queries > Data source settings) resolved the issue.

AWS EMR HBase Bulk Load

I developed a MapReduce program to do HBase bulk loading using the technique explained in this Cloudera article: https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/.
On our previous on-prem Cloudera Hadoop cluster it worked very well. Now we are moving to AWS, and I can't get this program to work on an AWS EMR cluster.
EMR details:
Release label: emr-5.16.0
Hadoop distribution: Amazon 2.8.4
Applications: Spark 2.3.1, HBase 1.4.4
Master: m4.4xlarge
Nodes: 12 x m4.4xlarge
Here is the code of my driver:
Job job = Job.getInstance(getConf());
job.setJobName("My job");
job.setJarByClass(getClass());
// Input
FileInputFormat.setInputPaths(job, input);
// Mapper
job.setMapperClass(MyMapper.class);
job.setInputFormatClass(ExampleInputFormat.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
// Reducer : Auto configure partitioner and reducer
Table table = HBaseCnx.getConnection().getTable(TABLE_NAME);
RegionLocator regionLocator = HBaseCnx.getConnection().getRegionLocator(TABLE_NAME);
HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator);
// Output
Path out = new Path(output);
FileOutputFormat.setOutputPath(job, out);
// Launch the MR job
logger.debug("Start - Map Reduce job to produce HFiles");
boolean b = job.waitForCompletion(true);
if (!b) throw new RuntimeException("FAIL - Produce HFiles for HBase bulk load");
logger.debug("End - Map Reduce job to produce HFiles");
// Make the output HFiles usable by HBase (permissions)
logger.debug("Start - Set the permissions for HBase in the output dir " + out.toString());
//fs.setPermission(outputPath, new FsPermission(ALL, ALL, ALL)); => not recursive
FsShell shell = new FsShell(getConf());
shell.run(new String[]{"-chmod", "-R", "777", out.toString()});
logger.debug("End - Set the permissions for HBase in the output dir " + out.toString());
// Run complete bulk load
logger.debug("Start - HBase Complete Bulk Load");
LoadIncrementalHFiles loadIncrementalHFiles = new LoadIncrementalHFiles(getConf());
int loadIncrementalHFilesOutput = loadIncrementalHFiles.run(new String[]{out.toString(), TABLE_NAME.toString()});
if (loadIncrementalHFilesOutput != 0) {
    throw new RuntimeException("Problem in LoadIncrementalHFiles. Return code is " + loadIncrementalHFilesOutput);
}
logger.debug("End - HBase Complete Bulk Load");
My mapper reads Parquet files and emits (see the sketch below):
the key, which is the row key of the Put, as an ImmutableBytesWritable
the value, which is the HBase Put itself
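This is not the original mapper, but a minimal sketch of what such a mapper can look like, assuming a Parquet schema with string columns "id" and "value" and a column family "cf" (all placeholders):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.parquet.example.data.Group;

// Sketch of a mapper that turns Parquet records (read via ExampleInputFormat)
// into HBase Puts for HFileOutputFormat2. Field and family names are
// placeholders, not the original schema.
public class MyMapper extends Mapper<Void, Group, ImmutableBytesWritable, Put> {

    private static final byte[] CF = Bytes.toBytes("cf"); // assumed column family

    @Override
    protected void map(Void key, Group record, Context context)
            throws IOException, InterruptedException {
        // Build the row key from an assumed "id" column of the Parquet record
        byte[] rowKey = Bytes.toBytes(record.getString("id", 0));
        Put put = new Put(rowKey);
        // Copy an assumed "value" column into cf:value
        put.addColumn(CF, Bytes.toBytes("value"), Bytes.toBytes(record.getString("value", 0)));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}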
The issue happens in the Reduce step. In each reducer's syslog, I get errors that seem related to socket connections. Here is a piece of the syslog:
2018-09-04 08:21:39,085 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-04 08:21:39,086 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
2018-09-04 08:21:55,705 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2018-09-04 08:21:55,705 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x3ecedf210x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2018-09-04 08:21:55,706 ERROR [main] org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection-0x3ecedf210x0, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2018-09-04 08:21:55,706 WARN [main] org.apache.hadoop.hbase.client.ZooKeeperRegistry: Can't retrieve clusterId from Zookeeper
After several Google searches, I found several posts advising to set the quorum IP directly in the Java code. I did that as well, but it did not work. Here is how I currently get the HBase connection:
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
// Attempts to set the quorum IP directly in the Java code; this did not work
//conf.clear();
//conf.set("hbase.zookeeper.quorum", "...ip...");
//conf.set("hbase.zookeeper.property.clientPort", "2181");
Connection cnx = ConnectionFactory.createConnection(conf);
What I don't understand is that everything else works. I can programmatically create tables and query the table (Scan or Get). I can even use an MR job that inserts data with TableMapReduceUtil.initTableReducerJob("my_table", IdentityTableReducer.class, job);. But that is of course much slower than the HBase complete bulk load technique, which directly writes HFiles split according to the existing regions.
Thank you for your help
I've been working on a similar migration. The issue is that the reducer runs in a separate process so you need to set the quorum on the job's configuration instead. That will make the value available to the reducer.
job.getConfiguration().set("hbase.zookeeper.quorum", "...ip...");
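For context, here is a sketch of where that setting could sit in the driver shown in the question (the quorum address is a placeholder, not a value from the question):

// Fragment of the driver above, not the original code. Set the quorum on the
// job's configuration before the job is submitted, so it ships with the job
// configuration and is visible to the reduce tasks spawned for HFileOutputFormat2.
Job job = Job.getInstance(getConf());
job.setJobName("My job");
job.getConfiguration().set("hbase.zookeeper.quorum", "ip-of-quorum");        // placeholder
job.getConfiguration().set("hbase.zookeeper.property.clientPort", "2181");
// ... input, mapper and output setup as in the driver above ...
HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator);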

HdfsRpcException: Failed to invoke RPC call "getFsStats" on server

I've installed a single-node Hadoop cluster on an EC2 instance. I then stored some test data on HDFS, and I'm trying to load the HDFS data into SAP Vora. I'm using SAP Vora 2.0 for this project.
To create the table and load the data into Vora, this is the query I'm running:
drop table if exists dims;
CREATE TABLE dims(teamid int, team string)
USING com.sap.spark.engines.relational
OPTIONS (
hdfsnamenode "namenode.example.com:50070",
files "/path/to/file.csv",
storagebackend "hdfs");
When I run the above query, I get this error message:
com.sap.vora.jdbc.VoraException: HL(9): Runtime error.
(could not handle api call, failure reason : execution of scheduler plan failed:
found error: :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped.
Next level: v2 HDFS Plugin: Exception at opening
hdfs://namenode.example.com:50070/path/to/file.csv:
HdfsRpcException: Failed to invoke RPC call "getFsStats" on server
"namenode.example.com:50070" for node id 20
with error code 0, status ERROR_STATUS
Hadoop and Vora are running on different nodes.
You should specify the HDFS NameNode RPC port, which is typically 8020, so the option would read hdfsnamenode "namenode.example.com:8020". 50070 is the port of the NameNode web UI. See e.g. "Default Namenode port of HDFS is 50070. But I have come across at some places 8020 or 9000".

DMS Source connection issue

Error Details: [errType=ERROR_RESPONSE, status=1022506,
errMessage=Failed to connect Network error has occurred, errDetails=
RetCode: SQL_ERROR SqlState: HYT00 NativeError: 0 Message:
[unixODBC][Microsoft][ODBC Driver 13 for SQL Server]Login timeout
expired ODBC general error.
I am getting this while creating the DMS source endpoint. I have also created firewall inbound rules. The target was tested successfully.
My goal is to migrate an on-premises SQL Server DB to a SQL Server instance installed on AWS EC2.
Can anyone please help me?
Thanks in advance.