MRUnit with HBase and MapReduce gets an error with serialization - mapreduce

I'm trying to test my MapReduce job with MRUnit. The integration test works, but I also have some unit tests that I want to pass.
My MRUnit driver and MapReduce class are:
MapDriver<ImmutableBytesWritable, Result, ImmutableBytesWritable, KeyValue>
public final class HashMapper extends
TableMapper<ImmutableBytesWritable, KeyValue>
When I define the input I get an error:
mapDriver.withInput(new ImmutableBytesWritable(Bytes.toBytes("query")), new Result(kvs1));
java.lang.NullPointerException
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:73)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:91)
at org.apache.hadoop.mrunit.internal.output.MockOutputCollector.collect(MockOutputCollector.java:48)
at org.apache.hadoop.mrunit.internal.mapreduce.AbstractMockContextWrapper$4.answer(AbstractMockContextWrapper.java:90)
at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:29)
at org.mockito.internal.MockHandler.handle(MockHandler.java:95)
at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
I guess it's because it doesn't like the Result and KeyValue objects since they're not Writable, but then I don't understand why the integration test works. It was working before with HBase 0.94, when all these objects implemented Writable; now I'm working with HBase 0.96. Any clue how I should use MRUnit here?

As of HBase 0.96 some classes no longer implement Writable, but the HBase project provides new serialization classes for them.
So the solution is to indicate in the Configuration which serialization classes MRUnit must use.
The property is called io.serializations
The different serializers are:
Result class: org.apache.hadoop.hbase.mapreduce.ResultSerialization
KeyValue class: org.apache.hadoop.hbase.mapreduce.KeyValueSerialization
Put & Get classes: org.apache.hadoop.hbase.mapreduce.MutationSerialization
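A minimal sketch of how this can look in the test setup (the driver and input are the ones from the question; the existing value of io.serializations is read back first so the default Writable serialization is kept):

Configuration conf = mapDriver.getConfiguration();
conf.setStrings("io.serializations", conf.get("io.serializations"),
        "org.apache.hadoop.hbase.mapreduce.ResultSerialization",
        "org.apache.hadoop.hbase.mapreduce.KeyValueSerialization",
        "org.apache.hadoop.hbase.mapreduce.MutationSerialization");
// then define the input as before:
mapDriver.withInput(new ImmutableBytesWritable(Bytes.toBytes("query")), new Result(kvs1));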

EnableNeo4jRepositories.sessionFactoryRef is ignored / does nothing

I'm trying to configure a Spring Boot 1.5.9 project with multiple data sources, of which some are Neo4j.
The version of spring-data-neo4j I'm using is 4.2.9.
My goal is to use a different SessionFactory for different repositories, using a different Configuration class for each.
I've got this all working with Mongo but it seems that, even though the sessionFactoryRef is available on @EnableNeo4jRepositories, it simply does not get acted upon.
Abbreviated version of my configuration, with the general concepts:
@org.springframework.context.annotation.Configuration
@EnableNeo4jRepositories(basePackages = "<repo-package-name>", sessionFactoryRef = NEO4J_SESSIONFACTORY_NAME)
public class MyConfiguration {

    protected static final String NEO4J_SESSIONFACTORY_NAME = "mySessionFactory";

    @Bean(NEO4J_SESSIONFACTORY_NAME)
    public SessionFactory mySessionFactory() {
        SessionFactory sessionFactory = ...
        // passing entity package corresponding to repository
        return sessionFactory;
    }
As mentioned, this construct works fine with spring-data-mongodb; with Neo4j, however, it fails right away at startup with an error:
***************************
APPLICATION FAILED TO START
***************************
Description:
A component required a bean named 'getSessionFactory' that could not be found.
Action:
Consider defining a bean named 'getSessionFactory' in your configuration.
Turning on debug logging and a look through the code led me to SessionBeanDefinitionRegistrarPostProcessor, which contains the following code to get the sessionFactory:
private static String getSessionFactoryBeanRef(ConfigurableListableBeanFactory beanFactory) {
    return beanFactory.containsBeanDefinition("sessionFactory") ? "sessionFactory" : "getSessionFactory";
}
Hmmm... hardcoded names for a bean, no sign of customisability.
I then proceeded to name my bean twice, @Bean({"sessionFactory", NEO4J_SESSIONFACTORY_NAME}), so the above code would pass.
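For clarity, that double-naming workaround looks roughly like this (a sketch only; the entity package is a placeholder and NEO4J_SESSIONFACTORY_NAME is the constant from the configuration above):

@Bean({"sessionFactory", NEO4J_SESSIONFACTORY_NAME})
public SessionFactory mySessionFactory() {
    // Same factory as before, now also registered under the hardcoded
    // "sessionFactory" name that SessionBeanDefinitionRegistrarPostProcessor looks up.
    return new SessionFactory("com.example.domain");   // placeholder entity package
}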
The application started, but the problem is that the repositories get wired with whatever bean is called sessionFactory, effectively not using the sessionFactoryRef on the annotation.
To test this, I changed the name on the annotation to a non-existing bean and it continued to start (if I do this with the mongo-annotation, the application quits because the bean mentioned in mongoTemplateRef isn't available).
I dug a little deeper and found that, for mongo, it retrieves the bean reference in this class. The equivalent neo4j implementation has no such thing. It could of course be an implementation detail but I wasn't able to find any reference to the sessionFactoryRef attribute other than the annotation and the xml-schema.
There are also other places in the config classes that expect only one SessionFactory to be available.
So, in short, it seems to me that EnableNeo4jRepositories.sessionFactoryRef has no implementation and therefore simply doesn't do anything.
As a result, with the current code a single bean "sessionFactory" must be present and all repositories will be wired with this bean, regardless of the value of sessionFactoryRef.
Anybody else with a similar experience or any idea how to file a bug for this?

WebSphere does not commit JPA transaction

Could someone explain to me why WebSphere Application Server 8.5.5 does not commit (or even begin?) transactions in JTA mode.
I have a DAO class annotated with
@Stateless
@TransactionManagement(value = TransactionManagementType.CONTAINER)
and I have a method annotated with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW). The method simply inserts some entities into the database (if they do not exist yet).
for (MyEntity entity : entities) {
    if (validate(entity)) { // Programmatic bean validation, returns true when ok
        getEntityManager().persist(entity);
    }
}
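Put together, the bean and method described above look roughly like this (a sketch; the class and method names are placeholders, while the annotations and the loop are the ones from the question):

@Stateless
@TransactionManagement(value = TransactionManagementType.CONTAINER)
public class MyEntityDao {   // placeholder class name

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void saveAll(List<MyEntity> entities) {   // placeholder method name
        for (MyEntity entity : entities) {
            if (validate(entity)) {   // programmatic bean validation
                getEntityManager().persist(entity);
            }
        }
        // No explicit begin()/commit(): with container-managed transactions
        // the container is supposed to handle the JTA transaction boundaries.
    }
}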
Tests run with Arquillian in embedded GlassFish work perfectly. I can stop the code at a breakpoint in Eclipse (Luna & Kepler) after this method completes and check in the database that the data is there. The data used in the tests is identical to the data used when deployed on WAS. (Validation errors are shown correctly when tested separately.)
According to instructions (http://docs.oracle.com/javaee/6/tutorial/doc/bncij.html)
The code does not include statements that begin and end the transaction...
I probably don't understand this correctly, as I have to explicitly wrap the method contents with these:
getEntityManager().getTransaction().begin();
... The persist loop ...
getEntityManager().getTransaction().commit();
...to make the persisting work.
If I do not do this, there is nothing put in to the database.
I also injected an extra resource for checking the transaction status
@Resource
private TransactionSynchronizationRegistry tsr;
and put this at the end of the method
System.out.println("Transaction status: " + tsr.getTransactionStatus());
getEntityManager().flush();
The output was this:
Transaction status: 0
where 0 = Status.STATUS_ACTIVE
However, at the 'flush' an exception was thrown:
javax.persistence.TransactionRequiredException:
Exception Description: No transaction is currently active
I spent days trying to figure this out on WAS, while I had it all the time working with the embedded GlassFish (v3) tests.
Both use Java EE 6 (and Java 6), though for debugging in Eclipse I have to switch to Java EE 7 + Java 7.
Prior to this, in another project, I wrote similar code on GlassFish v4 without any kind of problems.
So could someone clarify whether there are some WAS-specific requirements to make this work, or do I just need to do the exact opposite with WAS of what the instructions say and of how I understand things should work?
I have already the following configuration on WAS:
(admin console)
server > server types > WebSphere application servers > server1 > Container Services > Default Java Persistence API settings > Default JTA data source JNDI name = 'jdbc/kr' (the same as configured in my persistence.xml)
resources > JDBC > JDBC providers > Oracle JDBC Driver (pings ok)
(When this was created) the 'Implementation type' was set to 'Connection pool data source', but I also tried it using the 'XA data source' type.
// UPDATE
The getEntityManager-method simply returns the injected entity manager from the super class.
public abstract class GenericDAO<T extends GenericEntity> {

    @PersistenceContext
    private EntityManager em;

    ...

    public EntityManager getEntityManager() {
        return this.em;
    }
}
// GenericEntity is an interface to force the entities to have the "get all" named query.
The class uses the generic DAO pattern (you get the idea from "Single DAO & generic CRUD methods (JPA/Hibernate + Spring)", though I have my own modifications, as it's an abstract class with default CRUD methods).
When the getEntityManager method is used instead of accessing the resource directly, it's possible to override the entity manager used in the superclass if the concrete DAO class decides to use its own. The superclass itself also calls getEntityManager, so if you override it in the implementing class, the abstract class works against the same EntityManager as the actual implementing class. This method is also usable in tests, where you can get the EntityManager and evict data when needed.
Also this way you can easily add logging when em is accessed (logging interceptor).
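To illustrate the overriding described above, a concrete DAO might look roughly like this (a sketch; the class name and persistence unit are placeholders, and it assumes MyEntity implements GenericEntity):

@Stateless
public class MyEntityDao extends GenericDAO<MyEntity> {

    @PersistenceContext(unitName = "otherUnit")   // placeholder unit name
    private EntityManager ownEm;

    @Override
    public EntityManager getEntityManager() {
        // Both this class and the inherited CRUD methods of GenericDAO now use this EntityManager.
        return this.ownEm;
    }
}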
// UPDATE 2
It occurred to me that there is a separate resource manager used to get remote resources (EJBs). This is so that the location of the EJB is configurable from a property file. However, the inner injection still works within the EJB of this service of mine.
I started wondering whether this could somehow cause the container to lose its transaction-handling ability.
I also noted that there is a @Singleton-scoped bean along the path that uses the actual transactional resources. I could not find a clear explanation of what scope the beans should have (probably there is no requirement of any kind), but I came to understand that the DAO should be @Stateless.
In Java EE 7 this is much clearer, as there is the @Transactional annotation for indicating this.
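For comparison, a rough Java EE 7 sketch of the same idea with javax.transaction.Transactional (not applicable to the WAS 8.5.5 / Java EE 6 setup above; names are placeholders):

@Transactional(Transactional.TxType.REQUIRES_NEW)
public void saveAll(List<MyEntity> entities) {
    for (MyEntity entity : entities) {
        getEntityManager().persist(entity);   // still no manual begin()/commit()
    }
}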

iOS Unit Testing: XCTestSuite, XCTest, XCTestRun, XCTestCase methods

In my daily unit test coding with Xcode, I only use XCTestCase. There are also these other classes that don't seem to get used much, such as XCTestSuite, XCTest, and XCTestRun.
What are XCTestSuite, XCTest, XCTestRun for? When do you use them?
Also, XCTestCase header has a few methods such as:
defaultTestSuite
invokeTest
testCaseWithInvocation:
testCaseWithSelector:
How and when to use the above?
I am having trouble finding documentation on the above XCTest-classes and methods.
Well, this question is pretty good, and I'm just wondering why it is being ignored.
As the documentation says:
XCTestCase is a concrete subclass of XCTest that should be the override point for
most developers creating tests for their projects. A test case subclass can have
multiple test methods and supports setup and tear down that executes for every test
method as well as class level setup and tear down.
On the other hand, this is what XCTestSuite defined:
A concrete subclass of XCTest, XCTestSuite is a collection of test cases. Alternatively, a test suite can extract the tests to be run automatically.
Well, with XCTestSuite, you can construct your own test suite for a specific subset of test cases, instead of the default suite ( [XCTestCase defaultTestSuite] ), which has all test cases.
Actually, the default XCTestSuite is composed of every test case found in the runtime environment - all methods with no parameters, returning no value, and prefixed with ‘test’ in all subclasses of XCTestCase.
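As an illustration, a custom suite for a subset of test cases can be put together roughly like this (a Swift sketch; MyFastTests and its test methods are placeholders):

import XCTest

// MyFastTests is a placeholder XCTestCase subclass with testParsing/testFormatting methods.
let suite = XCTestSuite(name: "Fast tests only")
suite.addTest(MyFastTests(selector: #selector(MyFastTests.testParsing)))
suite.addTest(MyFastTests(selector: #selector(MyFastTests.testFormatting)))
suite.run()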
What about the XCTestRun class?
A test run collects information about the execution of a test. Failures in explicit
test assertions are classified as "expected", while failures from unrelated or
uncaught exceptions are classified as "unexpected".
With XCTestRun, you can record information like startDate, totalDuration, and failureCount while the test is running, or things like hasSucceeded when it is done, so you get the result of running a test. XCTestRun gives you the ability to see what is happening, or has happened, during the test.
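For example, after running a single test case you can read those values off its test run (again a sketch, using the same placeholder class as above):

let test = MyFastTests(selector: #selector(MyFastTests.testParsing))
test.run()
if let run = test.testRun {
    print(run.startDate as Any, run.totalDuration, run.failureCount, run.hasSucceeded)
}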
Back to XCTestCase: you will notice that there are methods named testCaseWithInvocation: and testCaseWithSelector: if you read the headers, and I recommend you do so for more digging.
How do they work together?
I've found that there is an awesome explanation in Quick's QuickSpec source file.
XCTest automatically compiles a list of XCTestCase subclasses included
in the test target. It iterates over each class in that list, and creates
a new instance of that class for each test method. It then creates an
"invocation" to execute that test method. The invocation is an instance of
NSInvocation, which represents a single message send in Objective-C.
The invocation is set on the XCTestCase instance, and the test is run.
Some links:
http://modocache.io/probing-sentestingkit
https://github.com/Quick/Quick/blob/master/Sources/Quick/QuickSpec.swift
https://developer.apple.com/reference/xctest/xctest?language=objc
Launch Xcode and use Cmd + Shift + O to open the Open Quickly dialog; just type 'XCTest' and you will find some related files, such as XCTest.h and XCTestCase.h. You need to go inside these files to check out the interfaces they offer.
There is a good website about XCTest: http://iosunittesting.com/xctest-assertions/

Unit Testing DbContext

I've researched some information about techniques I could use to unit test a DbContext. I would like to add some in-memory data to the context so that my tests could run against it. I'm using Database-First approach.
The two articles I've found most useful were this and this.
That approach relies on creating an IContext interface that both MyContext and FakeContext will implement, allowing to Mock the context.
However, I'm trying to avoid using repositories to abstract EF, as pointed out by some people, since EF 4.1 already implements the repository and unit of work patterns through DbSet and DbContext, and I really would like to preserve all the features implemented by the EF team without having to maintain them myself with a generic repository, as I already did in another project (and it was kind of painful).
Working with an IContext will lead me to the same path (or won't it?).
I thought about creating a FakeContext that inherits from main MyContext and thus take advantage of the DbContext underneath it to run my tests without hitting the database.
I couldn't find similar implementations, so I'm hoping someone can help me on this.
Am I doing something wrong, or could this lead me to some problems that I'm not anticipating?
Ask yourself a single question: What are you going to test?
You mentioned FakeContext and mocking the context - why use both? Those are just different ways to do the same thing - provide a test-only implementation of the context.
There is one more, bigger problem - faking or mocking the context or set has only one result: you are not testing your real code any more.
Simple example:
public interface IContext : IDisposable
{
    IDbSet<MyEntity> MyEntities { get; }
}

public class MyEntity
{
    public int Id { get; set; }
    public string Path { get; set; }
}
public class MyService
{
    private bool MyVerySpecialNetMethod(MyEntity e)
    {
        // A local .NET method - fine in LINQ to Objects, not translatable to SQL.
        return File.Exists(e.Path);
    }

    public IEnumerable<MyEntity> GetMyEntities()
    {
        using (IContext context = CreateContext())
        {
            return context.MyEntities
                          .Where(e => MyVerySpecialNetMethod(e))
                          .ToList();
        }
    }
}
Now imagine that you have this in your SUT (system under test - in the case of a unit test, the unit is usually a method). In the test code you provide a FakeContext and FakeSet and it will work - you will have a green test. Now in the production code you provide another, derived DbContext and DbSet and you will get an exception at runtime.
Why? Because by using FakeContext you have also changed the LINQ provider: instead of LINQ to Entities you are running LINQ to Objects, so calling local .NET methods which cannot be converted to SQL works, as do many other LINQ features which are not available in LINQ to Entities! There are other issues you can find with data modification as well - referential integrity, cascade deletes, etc. That is the reason why I believe that code dealing with the context / LINQ to Entities should be covered by integration tests and executed against the real database.
I am developing an open-source library to solve this problem.
http://effort.codeplex.com
A little teaser:
You don't have to add any boilerplate code, just simply call the appropriate API of the library, for example:
var context = Effort.ObjectContextFactory.CreateTransient<MyContext>();
At first this might seem to be magic, but the created ObjectContext object will communicate with an in-memory database and will not talk to the original real database at all. The term "transient" refers to the lifecycle of this database: it only lives as long as the created ObjectContext object. Concurrently created ObjectContext objects communicate with dedicated database instances; the data is not shared across them. This makes it easy to write automated tests.
The library provides various features to customize the creation: share data across instances, set the initial data of the database, create the fake database on different data layers... check out the project site for more info.
As of EF 4.3, you can unit test your code by injecting a fake DefaultConnectionFactory before creating the context.
Entity Framework 4.1 is close to being able to be mocked up in tests, but requires a little extra effort. The T4 template provides you with a DbContext-derived class that contains DbSet properties. The two things that I think you need to mock are the DbSet objects that these properties return and the properties and methods you're using on the DbContext-derived class. Both can be achieved by modifying the T4 template.
Brent McKendrick has shown the types of modifications that need to be made in this post, but not the T4 template modifications that can achieve this. Roughly, these are:
Convert the DbSet properties on the DbContext derived class into IDbSet properties.
Add a section that generates an interface for the DbContext derived class containing the IDbSet properties and any other methods (such as SaveChanges) that you'll need to mock.
Implement the new interface in the DbContext derived class.
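The result of those T4 changes might look roughly like this (a sketch only; IMyContext is a placeholder name, MyEntity is the entity from the example above, and the exact template edits are in the linked post):

using System;
using System.Data.Entity;

public interface IMyContext : IDisposable
{
    IDbSet<MyEntity> MyEntities { get; }
    int SaveChanges();
}

public class MyContext : DbContext, IMyContext
{
    public IDbSet<MyEntity> MyEntities { get; set; }   // was DbSet<MyEntity>
}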

Entity Framework 4 Unit Testing and Mocking

I am very new to unit testing when it comes to databases, and especially Entity Framework, and I am now stuck. I am using NUnit to test, mocking the entities used, and working with a generic repository. My Entity Framework model has a full set of POCO classes, and the bit I am currently testing looks like this:
campaignRepoMock = new DynamicMock(typeof(IRepository<Campaign>));
campaignRepoMock.ExpectAndReturn("First", testCampaign, new Func<Campaign, bool>(c => c.CampaignID == testCampaign.CampaignID));
CampaignService campaignService = new CampaignService((IRepository<Campaign>)campaignRepoMock.MockInstance);
Campaign campaign = campaignService.GetCampaign(testCampaign.Key, ProjectId);
Assert.AreEqual(testCampaign, campaign);
testCampaign is a single POCO campaign test object. The method "First" in the IRepository looks like the following:
public T First(Func<T, bool> predicate)
{
return _objectSet.FirstOrDefault<T>(predicate);
}
The error I am getting from NUnit is:
CampaignServiceTests.Campaign_Get_Campaign:
Expected: <System.Func`2[Campaign,System.Boolean]>
But was: <System.Func`2[Campaign,System.Boolean]>
So it is basically saying that it is getting what it is expecting, but it's throwing an error? Maybe my understanding of this is all wrong; I just want to test searching for a Campaign based on its key and the project it is linked to. The GetCampaign method just searches the repository sent to it for a campaign that has both of those items.
Can anyone point me to what I am doing wrong? Thanks in advance.
If I understand your code, over here
campaignRepoMock.ExpectAndReturn("First", testCampaign, new Func<Campaign, bool>(c => c.CampaignID == testCampaign.CampaignID));
you are setting up your mock object to expect "First" to be called with that exact Func instance as its argument (and to return testCampaign).
The argument check works like Assert.AreEqual(): the expected Func and the Func your service actually passes are of the same type and have the same content, but they refer to different delegate objects, so the expectation fails - which is why the "Expected" and "But was" lines in the error look identical.
What mocking framework are you using? It looks pretty complicated and confusing to me. For starters, I would recommend something like Moq.
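For reference, the same test set up with Moq might look roughly like this (a sketch using the names from the question; It.IsAny matches any Func, which sidesteps the delegate-equality problem described above):

// using Moq;
var campaignRepoMock = new Mock<IRepository<Campaign>>();
campaignRepoMock
    .Setup(r => r.First(It.IsAny<Func<Campaign, bool>>()))
    .Returns(testCampaign);

var campaignService = new CampaignService(campaignRepoMock.Object);
Campaign campaign = campaignService.GetCampaign(testCampaign.Key, ProjectId);

Assert.AreEqual(testCampaign, campaign);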