Executing multiple jobs in a Map/Reduce program in Hadoop

public int run(String[] args) throws Exception {
    /*****************Job for id-title mapping*******************/
    JobConf conf_idTitle = new JobConf(PageRank.class);
    conf_idTitle.setJobName("first_idTitleMapping");
    ...
    FileInputFormat.setInputPaths(conf_idTitle, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf_idTitle, new Path(pathforIdTitle));
    JobClient.runJob(conf_idTitle);
    /*****************Job for linkgraph mapping*******************/
    JobConf conf_linkgraph = new JobConf(PageRank.class);
    conf_linkgraph.setJobName("second_linkgraphBuilding");
    ...
    FileInputFormat.setInputPaths(conf_linkgraph, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf_linkgraph, new Path(pathforLinkgraph));
    JobClient.runJob(conf_linkgraph);
    return 0;
}

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new PageRank(), args);
    System.exit(res);
}
This code executes two jobs in a Map/Reduce program serially, not in parallel.
I use different mappers and reducers for each job, so I defined two JobConf objects. Because the output of the first job should be the input of the second job, I defined the second JobConf object after 'JobClient.runJob(conf_idTitle);'.
However, I got an exception like the one below:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
Why did this error occur? Thanks in advance.

public JobConf(Configuration conf)
This constructor is commonly used when your application already has constructed a JobConf
object and wants a copy to use for an alternate job. The configuration in conf is copied into the
new JobConf object.


Error: RefNonZero When Returning a Unique_Ptr to ClientReader in GRPC

After defining a method of the following form:
std::unique_ptr<ClientReader<FlowCellPositionResponse> > method(FlowCellPositionsRequest request)
{
...
ClientContext context;
return stub->some_method(&context, request); // Also tried std::move
}
within a file and accessing this method via another file's method like so:
FlowCellPositionsRequest request;
FlowCellPositionsResponse response;
std::unique_ptr<ClientReader<FlowCellPositionResponse> > reader = file.method(request);
while(reader->Read(&response)) { // Error raised here
...
}
Status status = reader->Finish();
I get the following error:
Assertion failed: (prior > 0), function RefNonZero, file ref_counted.h, line 119.
[1] 2450 abort ./program
If I move this logic back into method, it runs fine, but I wanted to create this abstraction. I'm still quite new to both C++ and GRPC and I was just wondering what I'm doing wrong?
The ClientContext is going out of scope when method() returns, but that object needs to outlive the ClientReader<> object that you're returning.
I think what you probably want here is an object to hold all of the state needed for the RPC, including both the ClientContext and the ClientReader<>. Then you can return that object from method().

Waiting for an external event before continue in unit test

Context:
I'm writing unit tests for a gRPC service. I want to verify that a method of a mock on the server side is called. I'm using EasyMock. To be sure the gRPC response has arrived (whatever it is), I need to suspend the test thread before EasyMock verifies the calls.
So I tried something like this using LockSupport:
@Test
public void alphaMethodTest() throws Exception
{
    Dummy dummy = createNiceMock(Dummy.class);
    dummy.alphaMethod(anyBoolean());
    expectLastCall().once();
    EasyMock.replay(dummy);
    DummyServiceGrpcImpl dummyServiceGrpc = new DummyServiceGrpcImpl();
    dummyServiceGrpc.setDummy(dummy);
    DummyServiceGrpc.DummyServiceStub stub = setupDummyServiceStub();
    Thread thread = Thread.currentThread();
    stub.alphaMethod(emptyRequest, new StreamObserver<X>(){
        @Override
        public void onNext(X value) {
            LockSupport.unpark(thread);
        }
    });
    Instant expirationTime = Instant.now().plus(pDuration);
    LockSupport.parkUntil(expirationTime.toEpochMilli());
    verify(dummy);
}
But I have many tests like this one (around 40) and I suspect a threading issue. Usually one or two fail at the verify step; sometimes all of them pass. I tried using a ReentrantLock with a Condition instead, but again some tests fail (IllegalMonitorStateException on the signalAll call):
@Test
public void alphaMethodTest() throws Exception
{
    Dummy dummy = createNiceMock(Dummy.class);
    dummy.alphaMethod(anyBoolean());
    expectLastCall().once();
    EasyMock.replay(dummy);
    DummyServiceGrpcImpl dummyServiceGrpc = new DummyServiceGrpcImpl();
    dummyServiceGrpc.setDummy(dummy);
    DummyServiceGrpc.DummyServiceStub stub = setupDummyServiceStub();
    ReentrantLock lock = new ReentrantLock();
    Condition conditionPromiseTerminated = lock.newCondition();
    stub.alphaMethod(emptyRequest, new StreamObserver<X>(){
        @Override
        public void onNext(X value) {
            conditionPromiseTerminated.signalAll();
        }
    });
    Instant expirationTime = Instant.now().plus(pDuration);
    conditionPromiseTerminated.awaitUntil(new Date(expirationTime.toEpochMilli()));
    verify(dummy);
}
I'm sorry for not providing a runnable example; my current code uses a private API.
Do you think LockSupport may cause trouble because multiple tests are running? Am I missing something in how I use LockSupport or ReentrantLock? Can you think of any other class in the concurrency API that would better suit my needs?
LockSupport is a bit dangerous. If you read the documentation for park closely, you will find that one of the ways it can return is:
The call spuriously (that is, for no reason) returns.
So when you expect your code to wait, the call might simply return immediately, and there can be several reasons for that.
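Because park can return early, the usual defence is to park in a loop that re-checks an explicit flag. The following is a minimal sketch of that idea, not the asker's code: the class name ParkUntilSignalled, the helper awaitFlag, and the plain thread standing in for the gRPC callback are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: waits on an explicit flag instead of trusting park() alone.
class ParkUntilSignalled {

    /**
     * Parks the calling thread until the flag becomes true, the timeout elapses,
     * or the thread is interrupted. Returns the final value of the flag,
     * so false means "gave up without being signalled".
     */
    static boolean awaitFlag(AtomicBoolean done, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!done.get()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0 || Thread.currentThread().isInterrupted()) {
                break; // timed out or interrupted
            }
            // May return early for no reason; the loop re-checks the flag.
            LockSupport.parkNanos(remaining);
        }
        return done.get();
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread waiter = Thread.currentThread();

        // Stand-in for the callback thread (an assumption for this sketch).
        Thread callback = new Thread(() -> {
            done.set(true);             // publish the state first
            LockSupport.unpark(waiter); // then wake the waiter
        });
        callback.start();

        System.out.println("signalled: " + awaitFlag(done, 5_000));
    }
}
```

The flag, not the return of park, decides whether the wait is over, so a spurious return or a stale permit only costs one extra loop iteration.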
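When using ReentrantLock, all of them should fail with IllegalMonitorStateException, because you never acquire the lock via ReentrantLock::lock before calling awaitUntil or signalAll. Also, there is no need to build a java.util.Date just for the deadline: Condition::await(long, TimeUnit) takes a timeout directly.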
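For comparison, here is a sketch of how a Condition is normally used: both the waiting side and the signalling side hold the lock, and the waiter loops on a guarded flag. The class ConditionSignal and its method names are made up for illustration and are not part of the question's code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper showing lock-protected await/signal.
class ConditionSignal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition responded = lock.newCondition();
    private boolean done = false; // guarded by lock

    /** Called from the callback thread. */
    void markDone() {
        lock.lock();
        try {
            done = true;
            responded.signalAll(); // legal only while holding the lock
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if markDone() was called before the timeout elapsed. */
    boolean awaitDone(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!done) {
                if (nanos <= 0L) {
                    return false; // timed out
                }
                // awaitNanos returns an estimate of the time still left to wait.
                nanos = responded.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConditionSignal signal = new ConditionSignal();
        new Thread(signal::markDone).start();
        System.out.println("signalled: " + signal.awaitDone(5, TimeUnit.SECONDS));
    }
}
```

Because the flag is set under the lock, a signal that arrives before the waiter starts waiting is not lost: the waiter sees done == true and returns without blocking.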
I think you are over-complicating things. You could do the same signaling with a plain lock. A simplified example:
public static void main(String[] args) {
    Object lock = new Object();
    Thread first = new Thread(() -> {
        synchronized (lock) {
            System.out.println("Locked");
            try {
                System.out.println("Sleeping");
                lock.wait();
                System.out.println("Waked up");
            } catch (InterruptedException e) {
                // these are your tests, no one should interrupt
                // unless it's yourself
                throw new RuntimeException(e);
            }
        }
    });
    first.start();
    sleepOneSecond();
    Thread second = new Thread(() -> {
        synchronized (lock) {
            System.out.println("notifying waiting threads");
            lock.notify();
        }
    });
    second.start();
}

private static void sleepOneSecond() {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Notice the output:
Locked
Sleeping
notifying waiting threads
Waked up
It should be obvious how the "communication" (signaling) between threads happens.
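On the question of which other class from java.util.concurrent might fit, one option not discussed above is CountDownLatch, which handles the "wait for one callback, with a timeout" case without explicit locking. This is only a sketch: LatchExample, callAsync, and callAndWait are hypothetical names, and callAsync merely imitates an asynchronous stub by running the callback on another thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchExample {

    /** Hypothetical stand-in for an async stub call that invokes a callback on another thread. */
    static void callAsync(Runnable onResponse) {
        new Thread(onResponse).start();
    }

    /** Returns true if the callback fired within the timeout. */
    static boolean callAndWait(long timeoutMillis) throws InterruptedException {
        CountDownLatch responseReceived = new CountDownLatch(1);
        // In a real test, countDown() would go inside the observer's callback methods.
        callAsync(responseReceived::countDown);
        // await returns immediately if the count is already zero, so an early callback is not lost.
        return responseReceived.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("response received: " + callAndWait(5_000));
    }
}
```

In a test, the boolean returned by await can be asserted before the mock is verified, so a timeout shows up as its own failure rather than as a confusing verification error.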

How to resend a message after an actor is restarted by the supervisor strategy

I have parent actor (A) with two child actors (B).
I made a supervisor strategy in A, so in case a specific exception happens in B, B will be restarted.
How can I resend the message which caused the exception in B to B again?
What I've done in B is to send the message again to B in preRestart, see code below.
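@Override
public void preRestart(final Throwable reason, final scala.Option<Object> message) throws Exception
{
    // message is empty when the failure did not happen while processing a message
    if (message.isDefined())
    {
        getSelf().tell(message.get(), getSender());
    }
}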
To ensure I don't end in an endless loop, I configure the supervisor strategy in A as follows:
private final SupervisorStrategy strategy = new OneForOneStrategy(3, Duration.Inf(),
    new Function<Throwable, SupervisorStrategy.Directive>()
    {
        @Override
        public Directive apply(final Throwable t) throws Exception
        {
            if (t instanceof SpecificException)
            {
                return SupervisorStrategy.restart();
            }
            return SupervisorStrategy.escalate();
        }
    });
This should guarantee that the problematic message is resent at most three times. Could somebody advise whether this is good practice, or point me to a better solution?

Returning Promise from AWS.SWF Workflow

It seems that according to swf-docs the following code:
@Workflow
@WorkflowRegistrationOptions(
    defaultExecutionStartToCloseTimeoutSeconds = 60,
    defaultTaskStartToCloseTimeoutSeconds = 10)
public interface MyWorkflow
{
    @Execute(version = "1.0")
    Promise<String> startMyWF(int a, String b);
}
Should generate MyWorkflowClientExternal that returns a Promise<String>; i.e.:
Promise<String> startMyWF(int a, String b);
However, instead a void method is generated for both MyWorkflowClientExternal and MyWorkflowClientExternalImpl:
void startMyWF(int a, String b) ...
The internal client types MyWorkflowClient and MyWorkflowClientImpl do return the Promise object as expected:
Promise<String> startMyWF(int a, String b);
I would like to use ExternalClient; but it does not seem to return the Promise object. I would very much appreciate clarifications.
Thank you.
I posted this question on the AWS SWF developer forum, and @maxim-fateev kindly pointed out several approaches:
The return value of a workflow is very useful for child workflows
because they are modeled as asynchronous calls. For standalone
workflows, you can use one of the following options to retrieve the
results:
1) Get it from the workflow history using SWF API
GetWorkflowExecutionHistory (the result is in the
WorkflowExecutionCompleted event). You can also inspect the history
using the SWF console.
2) Design your workflow to put the result somewhere, for example you
can add an activity at the end to put the result in a store and have
the application look there periodically.
3) Host an activity in the program that starts the workflow execution.
The workflow starter program now becomes part of the workflow and the
activity it hosts can be passed the result of the workflow.
You may use the first option in manually operated tools. However, it
is not recommended as a general mechanism for applications to retrieve
workflow results because it effectively requires you to poll SWF to
check for workflow completion and goes against our long polling
design.
I went with the approach #2; here is the gist of it (if you think there is a better way; please do let me know).
Created NotificationActivitiesImpl:
public class NotificationActivitiesImpl implements NotificationActivities {
    // volatile so the polling thread sees the value written by the activity thread
    private volatile Object notification;

    public NotificationActivitiesImpl() {
        this.notification = null;
    }

    @Override
    public void notify(Object obj) {
        this.notification = obj;
    }

    /**
     * @return notification (will block until it is available)
     */
    @Override
    public Object getNotification() {
        while (notification == null) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        return notification;
    }
}
In the WorkflowImpl added:
notificationClient.notify(obj); // obj is whatever you want to pass back to your app
In the App (which starts the workflow and the notification activity worker) added the following:
workflowWorker.start();
notificationWorker.start();
NotificationActivitiesImpl notificationImpl = (NotificationActivitiesImpl) notificationWorker.getActivitiesImplementations().iterator().next();
Object notification = notificationImpl.getNotification();
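getNotification() above polls a field once per second. As a possible variation, the same hand-off could be built on CompletableFuture, which blocks without polling and supports a timeout. This is a standalone sketch: BlockingNotification and notifyResult are hypothetical names, and the class deliberately leaves out the SWF NotificationActivities interface so that it runs without the AWS SDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical holder for a single result passed from one thread to another.
class BlockingNotification {
    private final CompletableFuture<Object> notification = new CompletableFuture<>();

    /** Called by the activity thread when the workflow result is available. */
    void notifyResult(Object obj) {
        notification.complete(obj);
    }

    /** Blocks until a result arrives or the timeout elapses (TimeoutException). */
    Object getNotification(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return notification.get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        BlockingNotification holder = new BlockingNotification();
        // Stand-in for the activity worker thread delivering the result.
        new Thread(() -> holder.notifyResult("workflow result")).start();
        System.out.println(holder.getNotification(5, TimeUnit.SECONDS));
    }
}
```

The future also takes care of visibility between threads, so no volatile field or sleep loop is needed.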

Using NUnit to test HTTP status of a WebFaultException

I want to write a unit test to ensure that a particular method throws a WebFaultException with a 404 status code.
Checking for the WebFaultException itself is easy:
[Test]
[ExpectedException(typeof(WebFaultException))]
public void CheckForWebFaultException()
{
    var myClass = new MyClass();
    myClass.MyMethod();
}
However this could be a 404, a 400, 401 or any other of the myriad of other http codes.
I could do a try/catch/Assert.True but this feels like a hack. Is there a way to Assert against a property of the thrown exception?
Something like
Assert.Throws(typeof(WebFaultException), myClass.MyMethod(), wfx => wfx.StatusCode == HttpStatusCode.NotFound);
I was on the right lines: Assert.Throws actually returns the exception that was thrown.
[Test]
public void CheckForWebFaultException()
{
    var myClass = new MyClass();
    var ex = Assert.Throws<WebFaultException>(() => myClass.MyMethod());
    Assert.AreEqual(HttpStatusCode.NotFound, ex.StatusCode);
}
Note that I've taken out the [ExpectedException(typeof(WebFaultException))] as the exception is now handled and the test will fail if this is left in.
Assert.Throws ensures that the exception was thrown by myClass.MyMethod() and the second assert checks the status code.