Akka scheduled job questions - akka

I have been experimenting with Play 2.0 and using Akka for a recurring scheduled job. I would like the job to run every 5 minutes. I have this really basic test and it works for the most part. Based on this test it should create one PDF file every 5 minutes. Instead, I get four files written every 5 minutes, and sometimes more, and I am not exactly sure why. Below is my code.
package models;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;
import javax.persistence.*;
import play.libs.*;
import play.db.ebean.*;
import akka.util.*;
import static java.util.concurrent.TimeUnit.*;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

@Entity
public class EmailService extends Model {

    public EmailService() {
        // Run the Service every 5 minutes
        Akka.system().scheduler().schedule(
            Duration.create(0, MILLISECONDS),
            Duration.create(5, MINUTES),
            new Runnable() {
                public void run() {
                    try {
                        // TEST
                        com.itextpdf.text.Document document = new com.itextpdf.text.Document();
                        PdfWriter.getInstance(document, new FileOutputStream(UUID.randomUUID().toString() + ".pdf"));
                        document.open();
                        document.add(new Paragraph("Hello World!"));
                        document.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        );
    }
}
Any ideas why it runs multiple times?
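For comparison, here is a minimal sketch of the same 5-minute schedule registered once at application start, in Play 2.0's Global settings class rather than in an entity constructor (the GlobalSettings/onStart hook is standard Play 2.0; the body of the Runnable is only a placeholder, and this is not presented as a confirmed diagnosis of the duplicate runs):

import play.Application;
import play.GlobalSettings;
import play.libs.Akka;
import akka.util.Duration;
import static java.util.concurrent.TimeUnit.*;

public class Global extends GlobalSettings {
    @Override
    public void onStart(Application app) {
        // Registered exactly once, when the application starts.
        Akka.system().scheduler().schedule(
            Duration.create(0, MILLISECONDS),
            Duration.create(5, MINUTES),
            new Runnable() {
                public void run() {
                    // generate the PDF here
                }
            }
        );
    }
}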

Related

Running MapReduce on an HBase exported table throws "Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'"

I took a backup of an HBase table using the HBase Export utility tool:
hbase org.apache.hadoop.hbase.mapreduce.Export "FinancialLineItem" "/project/fricadev/ESGTRF/EXPORT"
This kicked off a MapReduce job and transferred all of my table data into the output folder.
According to the documentation, the format of the output file is a sequence file.
So I ran the code below to extract my key and value from the file.
Now I want to run MapReduce to read the key and value from the output file, but I get the exception below:
java.lang.Exception: java.io.IOException: Could not find a
deserializer for the Value class:
'org.apache.hadoop.hbase.client.Result'. Please ensure that the
configuration 'io.serializations' is properly configured, if you're
using custom serialization.
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please
ensure that the configuration 'io.serializations' is properly
configured, if you're using custom serialization.
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1760)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1774)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
Here is my driver code
package SEQ;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SeqDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SeqDriver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s needs two arguments files\n",
                    getClass().getSimpleName());
            return -1;
        }
        String outputPath = args[1];

        FileSystem hfs = FileSystem.get(getConf());
        Job job = new Job();
        job.setJarByClass(SeqDriver.class);
        job.setJobName("SequenceFileReader");

        HDFSUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(MySeqMapper.class);
        job.setNumReduceTasks(0);

        int returnValue = job.waitForCompletion(true) ? 0 : 1;

        if (job.isSuccessful()) {
            System.out.println("Job was successful");
        } else if (!job.isSuccessful()) {
            System.out.println("Job was not successful");
        }
        return returnValue;
    }
}
Here is my mapper code
package SEQ;

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MySeqMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
    }
}
So I will answer my own question.
Here is what was needed to make it work.
Because we use HBase to store our data and this job works with HBase Result values, Hadoop is telling us that it does not know how to serialize our data, so we need to help it. Inside setUp(), set the io.serializations variable:
hbaseConf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
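For context, a minimal sketch of where this could also be applied in the driver above, on the job's Configuration before the Job is created. The serialization class is given here as a string, and the exact classes to register depend on your HBase version, so treat this as an assumption to verify rather than the definitive fix:

// In SeqDriver.run(), before constructing the Job
// (requires import org.apache.hadoop.conf.Configuration):
Configuration conf = getConf();
conf.setStrings("io.serializations",
        conf.get("io.serializations"),
        "org.apache.hadoop.hbase.mapreduce.ResultSerialization");
Job job = new Job(conf);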

Tests pass with Play Framework 1.2.x but fail with Play Framework 1.4.x

I am migrating my application from Play 1.2 + Java 7 to Play 1.4 + Java 8.
With Play 1.2 + Java 7 my test passes.
With Play 1.4 + Java 8 my test fails.
I have reduced the code to the minimum that reproduces the problem. Here is the main code.
The model is
package models;

import play.db.jpa.Model;
import javax.persistence.Entity;

@Entity
public class Token extends Model {
    public String name;
    public String role;
}
The controller is
package controllers;

import models.Token;
import play.mvc.Controller;

public class Application extends Controller {
    public static void index() {
        renderJSON(Token.all().fetch());
    }
}
The DB test configuration is
%test.application.mode=dev
%test.db.url=jdbc:h2:mem:play;MODE=MYSQL;LOCK_MODE=0
%test.jpa.ddl=create
The test is
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import org.junit.*;
import org.junit.Before;
import play.test.*;
import play.mvc.*;
import play.mvc.Http.*;
import models.*;

public class ApplicationTest extends FunctionalTest {

    @Before
    public void before() {
        Token.deleteAll();
    }

    @Test
    public void testThatIndexPageWorks() {
        {
            Response response = GET("/");
            assertIsOk(response);
            String content = getContent(response);
            System.out.println(content);
            assertFalse(content.contains("le nom"));
            assertFalse(content.contains("identifier"));
        }
        Token t = new Token();
        t.name = "le nom";
        t.role = "identifier";
        t.save();
        {
            Response response = GET("/");
            assertIsOk(response);
            String content = getContent(response);
            System.out.println(content);
            assertTrue(content.contains("le nom"));
            assertTrue(content.contains("identifier"));
        }
    }
}
The behaviour is not predictable. It seems that entities saved in the tests are committed asynchronously, and what the controller sees depends on thread timing, which was not the case in release 1.2.
I can provide the whole project if necessary.
As I do not want to use fixtures, I have to sync the DB manually: the test's call to model.save() is done within a local transaction, and that transaction is not yet closed when GET is called, so the data has not been flushed.
I thought that this was covered by
jpa FlushModeType COMMIT
It seems that this is the case in 1.2.x, but not in 1.4.x.
I modified the test by adding the code snippet below after save() and deleteAll(), and it works fine:
if (play.db.jpa.JPA.em().getTransaction().isActive()) {
    play.db.jpa.JPA.em().getTransaction().commit();
    play.db.jpa.JPA.em().getTransaction().begin();
}
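The same commit-and-reopen can be factored into a small helper inside the test class; this is just a sketch, and the helper name flushTransaction is made up for illustration:

// Hypothetical helper in ApplicationTest: flush pending writes by committing
// the current transaction and opening a fresh one.
private static void flushTransaction() {
    if (play.db.jpa.JPA.em().getTransaction().isActive()) {
        play.db.jpa.JPA.em().getTransaction().commit();
        play.db.jpa.JPA.em().getTransaction().begin();
    }
}

It can then be called right after Token.deleteAll() in before() and after t.save() in the test.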

About the GenericOptionsParser getRemainingArgs method

package com.ibm.dw61;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import com.ibm.dw61.MaxTempReducer;
import com.ibm.dw61.MaxTempMapper;

public class MaxMonthlyTemp {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] programArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (programArgs.length != 2) {
            System.err.println("Usage: MaxTemp <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Monthly Max Temp");
        job.setJarByClass(MaxMonthlyTemp.class);
        job.setMapperClass(MaxTempMapper.class);
        job.setCombinerClass(MaxTempReducer.class);
        job.setReducerClass(MaxTempReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(programArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(programArgs[1]));

        // Submit the job and wait for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Questions:
1) This is MapReduce code to extract the maximum temperature for each month. The coder is getting the non-generic options using the getRemainingArgs method. The next line says that if the number of non-generic options is not 2, there is an error and the program aborts immediately. I couldn't figure out the coder's logic here. Would anyone be kind enough to explain?
2) In another example, WordCount, the coder didn't perform this step of getting the non-generic options. So under what circumstances do we have to perform this step and test whether the number of non-generic options is 2?
As you can see in the Hadoop API documentation, the purpose of the getRemainingArgs method is to extract the application-specific arguments, those that are not related to the Hadoop framework. In this code you are expected to pass exactly two such arguments, first your input and then your output, as the Usage message shows.
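To make the split concrete, a short sketch (the jar name, option, and paths below are illustrative only):

// Suppose the job is launched as:
//   hadoop jar maxtemp.jar com.ibm.dw61.MaxMonthlyTemp -D mapreduce.job.reduces=1 /in /out
Configuration conf = new Configuration();
String[] programArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
// The -D pair is a generic Hadoop option: GenericOptionsParser consumes it and
// folds it into conf, so conf.get("mapreduce.job.reduces") returns "1".
// programArgs keeps only the application-specific arguments, {"/in", "/out"},
// which is why the code checks that there are exactly two of them.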

Play Framework: IntegrationSpec ignoring configuration provided to FakeApplication when running play test

I am using Play 2.2 and Specs2, and I have the following test:
import org.specs2.mutable.Specification
import org.specs2.runner.JUnitRunner
import play.api.test.Helpers.running
import play.api.test.{FakeApplication, TestBrowser, TestServer}
import java.util.concurrent.TimeUnit
import org.openqa.selenium.firefox.FirefoxDriver
import org.fluentlenium.core.domain.{FluentList, FluentWebElement}
import org.openqa.selenium.NoSuchElementException

"Application" should {
  "work from within a browser" in {
    running(TestServer(port, application = FakeApplication(additionalConfiguration = Map("configParam.value" -> 2))), classOf[FirefoxDriver]) {
      .....
    }
  }
}
configParam.value is accessed in the application in the following way:
import scala.concurrent.Future
import play.api.libs.json._
import play.api.Play._
import play.api.libs.ws.Response
import play.api.libs.json.JsObject

object Configuration {
  val configParamValue = current.configuration.getString("configParam.value").get
}
When running play test the configParam.value being used is the one from application.conf instead of the one passed in FakeApplication.
What am I doing wrong here?
The problem is probably with the Map passed to additionalConfiguration.
You're passing an Int and trying to get a String with "getString"
Try changing to this:
running(TestServer(port, application = FakeApplication(additionalConfiguration = Map("configParam.value" -> "2"))), classOf[FirefoxDriver]) {
Notice the " around the 2.

How to force an Apache Mahout application to read directly from HDFS

I have implemented an Apache Mahout application (attached below) that does some basic computations. To do so it needs to load the dataset from my local machine. The application is packaged as a jar file and is executed within a Hadoop pseudo-distributed cluster. The terminal command for that is:
$ hadoop jar /home/eualin/ApacheMahout/tdunning-MiA-5b8956f/target/mia-0.1-jar-with-dependencies.jar mia.recommender.ch03.IREvaluatorBooleanPrefIntro2 "/home/eualin/Desktop/links-final"
Now, my question is how to do the same thing, but this time reading the dataset from HDFS (assuming, of course, that the dataset is already stored in HDFS, e.g. in /user/eualin/output/links-final). What should change in that case? This might help: hdfs://localhost:50010/user/eualin/output/links-final
package mia.recommender.ch03;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import java.io.File;

public class IREvaluatorBooleanPrefIntro2 {

    private IREvaluatorBooleanPrefIntro2() {
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("give file's HDFS path");
            System.exit(1);
        }

        DataModel model = new GenericBooleanPrefDataModel(
            GenericBooleanPrefDataModel.toDataMap(
                new GenericBooleanPrefDataModel(new FileDataModel(new File(args[0])))));

        RecommenderIRStatsEvaluator evaluator =
            new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood =
                    new NearestNUserNeighborhood(10, similarity, model);
                return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(
                    GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        IRStatistics stats = evaluator.evaluate(
            recommenderBuilder, modelBuilder, model, null, 10,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
            1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}
You can't, directly, since the non-distributed code has no knowledge of HDFS. Instead, copy the file to a local location in setup() and then read it from a local file.
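A minimal sketch of that copy-then-read-locally idea, shown here at the start of main() in the class above rather than in a Mapper setup(); the HDFS path is assumed to arrive in args[0] and the local temporary path is arbitrary (the class already imports Configuration, FileSystem, Path and File):

// Copy the dataset from HDFS to the local filesystem first, then hand the
// local copy to FileDataModel exactly as before.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path hdfsPath = new Path(args[0]);              // e.g. /user/eualin/output/links-final
Path localCopy = new Path("/tmp/links-final");  // arbitrary local location
fs.copyToLocalFile(hdfsPath, localCopy);
DataModel model = new GenericBooleanPrefDataModel(
    GenericBooleanPrefDataModel.toDataMap(
        new GenericBooleanPrefDataModel(
            new FileDataModel(new File(localCopy.toString())))));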