Newbie here. I am trying to run the DFS datastore code (using Pail) from Nathan Marz's book Big Data. I am trying to connect to HDFS on a VM, and I also tried replacing hdfs with file. What am I doing wrong? Any help appreciated.
public class AppTest
{
    private App app = new App();
    private String path = "hdfs:////192.168.0.101:8080/mypail";

    @Before
    public void init() throws IllegalArgumentException, IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.delete(new Path(path), true);
    }

    @Test
    public void testAppAccess() throws IOException {
        Pail pail = Pail.create(path);
        TypedRecordOutputStream os = pail.openWrite();
        os.writeObject(new byte[] {1, 2, 3});
        os.writeObject(new byte[] {1, 2, 3, 4});
        os.writeObject(new byte[] {1, 2, 3, 4, 5});
        os.close();
    }
}
I get this error:
java.lang.IllegalArgumentException: Wrong FS: hdfs:/192.168.0.101:8080/mypail, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
On replacing hdfs with file (as file:///), I get:
java.io.IOException: Mkdirs failed to create file:/192.168.0.101:8080/mypail (exists=false, cwd=file:/Users/joshi/git/projectcsr/projectcsr)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
at
I came across the same problem and I solved it! You should add your core-site.xml to the Hadoop Configuration object; something like this should work:
Configuration cfg = new Configuration();
Path core_site_path = new Path("path/to/your/core-site.xml");
cfg.addResource(core_site_path);
FileSystem fs = FileSystem.get(cfg);
I guess you could also do the same programmatically by adding the property fs.defaultFS to the cfg object.
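For example, a minimal sketch of that programmatic route (the address below is a placeholder; use whatever fs.defaultFS says in your cluster's core-site.xml, and note that on older Hadoop versions the key is fs.default.name):
Configuration cfg = new Configuration();
// Hypothetical NameNode address: this must be the NameNode RPC host:port
// (often 8020 or 9000), matching fs.defaultFS in the cluster's core-site.xml.
cfg.set("fs.defaultFS", "hdfs://192.168.0.101:8020");
FileSystem fs = FileSystem.get(cfg);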
Source:
http://opensourceconnections.com/blog/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception/
I want to develop an IOTA application, but not a messaging application or coin-based system. I want a simple example of how to store data in IOTA. For example, I want to build an SCM app or even a simple login/registration app. Can anyone guide me? Is there any sample application? I tried to run https://github.com/domschiener/leaderboard-example but I am getting the same error as https://github.com/domschiener/leaderboard-example/issues/6. How do I run this?
Storing text data on the tangle is not that difficult. The following are snippets from my tangle-based app. I used IOTA's Java API wrapper library, Jota.
1) Connect to an IOTA node. You can find a list of public nodes at https://nodes.iota.works. You can also set up your own full node and use it instead of an external one.
final String protocol = "https";
final String host = "tuna.iotasalad.org";
final String port = "14265";
IotaAPI iotaServer = new IotaAPI.Builder().protocol(protocol).host(host).port(port).build();
2) Convert your text into trytes
String trytes = TrytesConverter.toTrytes("my text");
3) Prepare and send the transaction to the tangle
private static final String SEED = "IHDEENZYITYVYSPKAURUZAQKGVJERUZDJMYTANNZZGPZ9GKWTEOJJ9AAMXOGZNQLSNMFDSQOTZAEETA99";//just a random one
private static final int MIN_WEIGHT_MAGNITUDE = 14;
private static final int DEPTH = 9;
private static final String TAG = "mytag"; //optional
String tangleHash = prepareTransfer(createAddress(), trytes);
public String createAddress() throws ArgumentException {
    GetNewAddressResponse res = iotaServer.getNewAddress(SEED, 2, 0, false, 1, false);
    return res.getAddresses().get(0);
}

public String prepareTransfer(String address_seclevel_2, String trytes) throws ArgumentException {
    List<Transfer> transfers = new ArrayList<Transfer>();
    transfers.add(new Transfer(address_seclevel_2, 0, trytes, TAG));
    SendTransferResponse str = iotaServer.sendTransfer(SEED, 2, DEPTH, MIN_WEIGHT_MAGNITUDE, transfers, null,
            null, false, false);
    if (str.getSuccessfully() != null) {
        // Transfer succeeded!
        for (Transaction tx : str.getTransactions()) {
            return tx.getHash();
        }
    }
    return "Handle error here. Something went wrong!";
}
Environment: HDP 2.3 Sandbox
Problem: I have created a table in Hive with just 2 columns. Now I want to read this in my MR code using HCatalog integration. The MR job fails to read the table from the MySQL metastore. It uses Derby for some reason and hence fails with a "table not found" message.
Job Client code:
public class HCatalogMRJob extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputTableName = args[0];
        String outputTableName = args[1];
        String dbName = null;
        Job job = new Job(conf, "HCatalogMRJob");
        HCatInputFormat.setInput(job, dbName, inputTableName);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setJarByClass(HCatalogMRJob.class);
        job.setMapperClass(HCatalogMapper.class);
        job.setReducerClass(HCatalogReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, outputTableName, null));
        HCatSchema s = HCatOutputFormat.getTableSchema(conf);
        System.err.println("INFO: output schema explicitly set for writing: " + s);
        HCatOutputFormat.setSchema(job, s);
        job.setOutputFormatClass(HCatOutputFormat.class);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HCatalogMRJob(), args);
        System.exit(exitCode);
    }
}
Job Run Command:
hadoop jar mr-hcat.jar input_table out_table
Before running this command, I set the necessary HCatalog and Hive jars on the classpath using the HADOOP_CLASSPATH variable.
Question:
Now, how do I make the job use the hive-site.xml correctly?
I tried setting this on the classpath using the same HADOOP_CLASSPATH as mentioned above, but it still fails.
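In case it helps, here is a hedged sketch (untested) of pointing the job at the right metastore from inside run(), assuming hive-site.xml sits in the usual /etc/hive/conf location on the Sandbox and the metastore runs as the Thrift service on port 9083:
Configuration conf = getConf();
// Load the Hive client configuration so HCatalog talks to the MySQL-backed metastore
// instead of silently falling back to a local Derby instance.
conf.addResource(new Path("/etc/hive/conf/hive-site.xml"));
// Or set the metastore URI directly (hypothetical host name):
// conf.set("hive.metastore.uris", "thrift://sandbox.hortonworks.com:9083");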
I am trying to use the ZipFiles() utility method and it's producing an empty zip file. I am using Sitecore 6.5. There are no errors, permission-related or otherwise.
Any thoughts? Here is the code.
public void CreateZipFile(string zipfileName, List<string> files)
{
    var zipfile = string.Format("{0}/{1}/{2}", TempFolder.Folder, "myfolder", zipfileName);
    var fileArray = files.ToArray();
    var x = FileUtil.ZipFiles(zipfile, fileArray);
}
EDIT:
I am passing the files like this
var files = new List<string> { FileUtil.MapPath("/temp/sample.xlf") };
The proper usage of the FileUtil.ZipFiles method is:
FileUtil.ZipFiles("/test.zip", new []{"/web.config", "/otherfile.txt"})
Sitecore automatically maps paths. The zip file will be created in your web app root.
EDIT AFTER COMMENT
If you want to create a zip file outside the web root and with a flat structure inside, you can use the Sitecore ZipWriter class like this:
public static string ZipFiles(string absolutePathToZipfile, string[] files)
{
    using (ZipWriter zipWriter = new ZipWriter(absolutePathToZipfile))
    {
        foreach (string path in files)
        {
            using (FileStream fileStream = System.IO.File.OpenRead(path.StartsWith("/") ? FileUtil.MapPath(path) : path))
                zipWriter.AddEntry(FileUtil.GetFileName(path), fileStream);
        }
    }
    return absolutePathToZipfile;
}
I need to implement an MR job which accesses data from both an HBase table and HDFS files. E.g., the mapper reads data from the HBase table and from HDFS files; these data share the same primary key but have different schemas. A reducer then joins all columns (from the HBase table and the HDFS files) together.
I looked online and could not find a way to run an MR job with such mixed data sources. MultipleInputs seems to work only for multiple HDFS data sources. Please let me know if you have some ideas. Sample code would be great.
After a few days of investigation (and help from the HBase user mailing list), I finally figured out how to do it. Here is the source code:
public class MixMR {

    public static class Map extends Mapper<Object, Text, Text, Text> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String s = value.toString();
            String[] sa = s.split(",");
            if (sa.length == 2) {
                context.write(new Text(sa[0]), new Text(sa[1]));
            }
        }
    }

    public static class TableMap extends TableMapper<Text, Text> {
        public static final byte[] CF = "cf".getBytes();
        public static final byte[] ATTR1 = "c1".getBytes();

        public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
            String key = Bytes.toString(row.get());
            String val = new String(value.getValue(CF, ATTR1));
            context.write(new Text(key), new Text(val));
        }
    }

    public static class Reduce extends Reducer<Object, Text, Object, Text> {

        public void reduce(Object key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String ks = key.toString();
            for (Text val : values) {
                context.write(new Text(ks), val);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path inputPath1 = new Path(args[0]);
        Path inputPath2 = new Path(args[1]);
        Path outputPath = new Path(args[2]);

        String tableName = "test";

        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "ExampleRead");
        job.setJarByClass(MixMR.class); // class that contains mapper

        Scan scan = new Scan();
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        scan.addFamily(Bytes.toBytes("cf"));

        TableMapReduceUtil.initTableMapperJob(
                tableName,      // input HBase table name
                scan,           // Scan instance to control CF and attribute selection
                TableMap.class, // mapper
                Text.class,     // mapper output key
                Text.class,     // mapper output value
                job);

        job.setReducerClass(Reduce.class); // reducer class
        job.setOutputFormatClass(TextOutputFormat.class);

        // inputPath1 here has no effect for the HBase table
        MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, Map.class);
        MultipleInputs.addInputPath(job, inputPath2, TableInputFormat.class, TableMap.class);

        FileOutputFormat.setOutputPath(job, outputPath);
        job.waitForCompletion(true);
    }
}
There is no OOTB feature that supports this. A possible workaround could be to Scan your HBase table and write the Results to an HDFS file first, and then do the reduce-side join using MultipleInputs, as sketched below. But this will incur some additional I/O overhead.
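A rough sketch of the first (dump) step, reusing the TableMap class from the answer above; the table name and output path are hypothetical placeholders:
Configuration conf = HBaseConfiguration.create();
Job dumpJob = new Job(conf, "DumpHBaseToHdfs");
dumpJob.setJarByClass(MixMR.class);

Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("cf"));

TableMapReduceUtil.initTableMapperJob(
        "test",          // source HBase table (hypothetical)
        scan,
        TableMap.class,  // emits (row key, cell value) as Text pairs
        Text.class,
        Text.class,
        dumpJob);

dumpJob.setNumReduceTasks(0); // map-only: just dump the rows as text
dumpJob.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(dumpJob, new Path("/tmp/hbase_dump")); // hypothetical path
dumpJob.waitForCompletion(true);
The second step is then an ordinary reduce-side join with MultipleInputs over two plain text paths (the dump directory plus your original HDFS file). Keep in mind that TextOutputFormat separates key and value with a tab, so the text-side mapper has to split on that separator (or you make both inputs use the same one) for the join keys to line up.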
A Pig script or Hive query can do that easily.
Sample Pig script:
tbl = LOAD 'hbase://SampleTable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'info:* ...', '-loadKey true -limit 5')
      AS (id:bytearray, info_map:map[], ...);
fle = LOAD '/somefile' USING PigStorage(',') AS (id:bytearray, ...);
joined = JOIN tbl BY id, fle BY id;
STORE joined INTO ...
In Hadoop 0.20.2, one can add input/output compression to the JobConf in the following way:
jobConf.setBoolean("mapred.output.compress", true);
jobConf.setClass("mapred.output.compression.codec", BZip2Codec.class, CompressionCodec.class);
JobConf is deprecated and Job should be used instead. How can I add compression/decompression there? In particular, how can I change the WordCount example to take bzip2 files as input:
public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "Example Hadoop 0.20.1 WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(TokenCounterReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Use the Configuration class as below when submitting the Job:
Configuration conf = new Configuration();
conf.set("mapred.output.compression.codec",
"org.apache.hadoop.io.compress.BZip2Codec");
Job job = new Job(conf);
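With the new org.apache.hadoop.mapreduce API you can also do this through the FileOutputFormat helpers instead of raw configuration keys. A minimal sketch based on the WordCount code from the question (otherArgs as there); note that nothing is needed on the input side, because TextInputFormat decompresses .bz2 input files automatically based on the file extension:
Configuration conf = new Configuration();
Job job = new Job(conf, "WordCount with bzip2 output");
job.setJarByClass(WordCount.class);
// ... mapper/reducer/key/value setup exactly as in the question ...

// Input: .bz2 files are decompressed transparently, nothing to configure here.
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

// Output: compress the result with bzip2 via the new-API FileOutputFormat.
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);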
This is the way I found to compress the output:
Job job = new Job(conf, "FromToWordStatistics");
job.setJarByClass(FromToWordStatistics.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(20);
SequenceFileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
SequenceFileOutputFormat.setCompressOutput(job, true);
SequenceFileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);