mrunit - Using mrunit.mapreduce.MapDriver not calling customised Record Reader - mapreduce

I am modifying the Record Reader of a MapReduce program and want to write a test case in which the mapper goes through my customised InputFormat / Record Reader. I have written test cases for the record reader itself, but they are not MRUnit ones.
How can I get the customised Record Reader called from the Mapper, given that no withInputFormat function is listed on the driver returned by MapDriver.newMapDriver?
Please find a snippet of my code:
private MapDriver<Object, Text, Text, Text> mapDriver;

@Before
public void setUp() {
    mapDriver = MapDriver.newMapDriver(new myMapper());
}

@Test
public void testFunction1() throws IOException {
    mapDriver.withInput(...).withOutput(...).runTest();
}
Thanks

I was checking the MRUnit API. If you want to use a custom RecordReader, I would guess you must also have a custom InputFormat, because the custom RecordReader has to be returned from the createRecordReader method of the custom InputFormat class.
The MRUnit API lets you assign a custom InputFormat along with a custom OutputFormat:
public MapDriver<K1,V1,K2,V2> withOutputFormat(Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormatClass,
                                               Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)

    Configure Mapper to output with a real OutputFormat. Set InputFormat to read
    output back in for use with run* methods.
    Parameters:
        outputFormatClass -
        inputFormatClass -
    Returns:
        this for fluent style
Based on this, you can call mapDriver.withOutputFormat(customOutputFormat.class, customInputFormat.class). That way your custom RecordReader will be used during the test.
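To make that concrete, here is a minimal sketch of how the driver setup might look. MyMapper, MyInputFormat and MyOutputFormat are placeholder names standing in for your own classes, and the input/output values are made up; this is an illustration of the withOutputFormat call, not code from the question.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class MyMapperTest {

    // Assumes MyMapper extends Mapper<Object, Text, Text, Text>
    private MapDriver<Object, Text, Text, Text> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new MyMapper());
        // Run a real OutputFormat and read the output back through the custom InputFormat,
        // so the custom RecordReader is exercised when runTest()/run() is called.
        mapDriver.withOutputFormat(MyOutputFormat.class, MyInputFormat.class);
    }

    @Test
    public void testFunction1() throws IOException {
        mapDriver.withInput(new Text("key"), new Text("value"))
                 .withOutput(new Text("key"), new Text("value"))
                 .runTest();
    }
}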

Related

Kafka Streams - Unit test for ProductionExceptionHandler implementation

I am developing a Kafka Streams application that uses Spring Cloud Stream for configuration.
In order to handle RecordTooLargeException I have implemented a custom ProductionExceptionHandler as below.
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class CustomProductionExceptionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
I have added the following property in my application.yml
spring.kafka:
  streams:
    properties:
      default.production.exception.handler: "com.fd.acquisition.product.availability.product.exception.StreamsRecordProducerErrorHandler"
The code works fine as the default failing behavior is overridden and the processing is continued.
I am trying to write unit test cases to simulate this behaviour.
I am making use of TopologyTestDriver and it has the following configuration.
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "TopologyTestDriver");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
streamsConfiguration.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "200");
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, (10 * 1024));
streamsConfiguration.put(StreamsConfig.SEND_BUFFER_CONFIG, 10);
streamsConfiguration.put(StreamsConfig.RECEIVE_BUFFER_CONFIG, 10);
final String tempDirectory = Files.createTempDirectory("kafka-streams").toAbsolutePath().toString();
streamsConfiguration.setProperty(StreamsConfig.STATE_DIR_CONFIG, tempDirectory);
streamsConfiguration.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
        CustomProductionExceptionHandler.class);
Ideally, pushing a large record should result in a RecordTooLargeException.
But the MockProducer class is used instead of KafkaProducer, and hence the ensureValidRecordSize() method is never called.
Is there any way to override this behaviour or any other way to simulate the RecordTooLargeException ?
I think this is basically a feature missing from MockProducer. So the first option would be to contribute a PR back to Kafka that adds the missing check to MockProducer; it looks like it should be fairly easy to add. If you are not a developer, you can alternatively create an issue and hope somebody from the community picks it up.
Either way, you will have to wait for the next Kafka release to get the feature, so you have to decide whether you can live without the unit test until then. There could be ways to hack this (use reflection to make TopologyTestDriver.producer accessible / non-final and replace it with an "improved" version of MockProducer), but that is probably going to be harder than just contributing the fix to Kafka.
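For illustration only, a rough sketch of that reflection hack could look like the following. It assumes TopologyTestDriver keeps its MockProducer in a private instance field named producer, which may differ between Kafka versions, and writing to a final field via reflection may fail on some JDKs; treat this as a fragile workaround, not a recommendation.

import java.lang.reflect.Field;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.streams.TopologyTestDriver;

final class TestDriverProducerHack {

    // Replaces the driver's internal MockProducer with one that rejects oversized records,
    // roughly imitating what KafkaProducer.ensureValidRecordSize() would do.
    static void installSizeCheckingProducer(TopologyTestDriver driver, int maxRequestSize) throws Exception {
        MockProducer<byte[], byte[]> checking =
                new MockProducer<byte[], byte[]>(true, new ByteArraySerializer(), new ByteArraySerializer()) {
                    @Override
                    public synchronized Future<RecordMetadata> send(ProducerRecord<byte[], byte[]> record, Callback callback) {
                        int size = (record.key() == null ? 0 : record.key().length)
                                 + (record.value() == null ? 0 : record.value().length);
                        if (size > maxRequestSize) {
                            throw new RecordTooLargeException("Record is " + size + " bytes, limit is " + maxRequestSize);
                        }
                        return super.send(record, callback);
                    }
                };

        // "producer" is the assumed field name inside TopologyTestDriver; it is not public API.
        Field producerField = TopologyTestDriver.class.getDeclaredField("producer");
        producerField.setAccessible(true);
        producerField.set(driver, checking);
    }
}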

aem-mocks - properly test a servlet

Trying to write some proper AEM integration tests using the aem-mocks framework. The goal is to test a servlet by calling its path.
E.g. an AEM servlet:
@SlingServlet(
        paths = {"/bin/utils/emailSignUp"},
        methods = {"POST"},
        selectors = {"form"}
)
public class EmailSignUpFormServlet extends SlingAllMethodsServlet {

    @Reference
    SubmissionAgent submissionAgent;

    @Reference
    XSSFilter xssFilter;

    public EmailSignUpFormServlet() {
    }

    public EmailSignUpFormServlet(SubmissionAgent submissionAgent, XSSFilter xssFilter) {
        this.submissionAgent = submissionAgent;
        this.xssFilter = xssFilter;
    }

    @Override
    public void doPost(SlingHttpServletRequest request, SlingHttpServletResponse response) throws IOException {
        String email = request.getParameter("email");
        submissionAgent.saveForm(xssFilter.filter(email));
    }
}
Here is the corresponding test to try and do the integration testing. Notice how I've called the servlet's 'doPost' method, instead of 'POST'ing via some API.
public class EmailSignUpFormServletTest {

    @Rule
    public final AemContext context = new AemContext();

    @Mock
    SubmissionAgent submissionAgent;

    @Mock
    XSSFilter xssFilter;

    private EmailSignUpFormServlet emailSignUpFormServlet;

    @Before
    public void setup() {
        MockitoAnnotations.initMocks(this);
        Map<String, String> report = new HashMap<>();
        report.put("statusCode", "302");
        when(submissionAgent.saveForm(any(String.class))).thenReturn(report);
    }

    @Test
    public void emailSignUpFormDoesNotRequireRecaptchaChallenge() throws IOException {
        // Setup test email value
        context.request().setQueryString("email=test.only@mail.com");
        //===================================================================
        /*
         * WHAT I END UP DOING:
         */
        // instantiate a new instance of the servlet
        emailSignUpFormServlet = new EmailSignUpFormServlet(submissionAgent, xssFilter);
        // call the post method (simulate the POST call)
        emailSignUpFormServlet.doPost(context.request(), context.response());
        /*
         * WHAT I WOULD LIKE TO DO:
         */
        // send request using some API that allows me to do a POST to the framework
        // Example:
        // context.request().POST("/bin/utils/emailSignUp") <--- doesn't exist!
        //===================================================================
        // assert response is internally redirected, hence expected status is a 302
        assertEquals(302, context.response().getStatus());
    }
}
I've done a lot of research on how this could be done (here) and (here), and those links show a lot about how you can set various parameters on the context.request() object. However, they just don't show how to finally execute the 'POST' call.
What you are trying to do is mix a unit test (UT) with an integration test (IT), so this won't be easy, at least with the aem-mocks framework. Let me explain why.
Assuming that you are able to call your required code
/*
* WHAT I WOULD LIKE TO DO:
*/
// send request using some API that allows me to do post to the framework
// Example:
// context.request().POST("/bin/utils/emailSignUp") <--- doesn't exist!
//===================================================================
Your test will end up executing all the logic in the SlingAllMethodsServlet class and its parent classes. I am assuming that this is not what you want to test, as those classes are not part of your logic and they already have their own UTs/ITs (under the respective Apache projects) to cater for testing requirements.
Also, looking at your code, the bulk of your core logic resides in the following snippet:
String email = request.getParameter("email");
submissionAgent.saveForm(xssFilter.filter(email));
Your UT criterion is already met by the following line of your code:
emailSignUpFormServlet.doPost(context.request(),context.response());
as it covers most of that logic.
Now, if you are looking for a proper IT that posts the parameters and passes them all the way down to the doPost method, then aem-mocks is not the framework for that, because it does not provide this in a simple way.
You can, in theory, mock all the layers (resource resolver, resource provider and the Sling servlet executors) to pass the parameters all the way down to your core logic. This can work, but it won't benefit your cause because:
Most of the code is already tested via other UTs.
Too many internal mocking dependencies might make the tests flaky or version-dependent.
If you really want to do a pure IT, then it will be easier to host the servlet in a running instance and access it via HttpClient (a rough sketch follows below). This ensures that all the layers are hit. A lot of tests are done this way, but it feels a bit heavy-handed for the functionality you want to test, and there are better ways of doing it.
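For what it's worth, that HttpClient-based approach might look something like the sketch below. The host, port, credentials and the ".form" selector in the URL are assumptions about a typical local AEM author instance, not details taken from the question.

import java.util.Arrays;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class EmailSignUpFormServletIT {

    @Test
    public void postsToTheRealServlet() throws Exception {
        // Assumed: a local AEM author instance with default admin credentials.
        BasicCredentialsProvider credentials = new BasicCredentialsProvider();
        credentials.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("admin", "admin"));

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultCredentialsProvider(credentials)
                .build()) {
            // Servlet path plus the registered "form" selector; the exact URL shape is an assumption.
            HttpPost post = new HttpPost("http://localhost:4502/bin/utils/emailSignUp.form");
            post.setEntity(new UrlEncodedFormEntity(
                    Arrays.asList(new BasicNameValuePair("email", "test.only@mail.com"))));

            try (CloseableHttpResponse response = client.execute(post)) {
                assertEquals(302, response.getStatusLine().getStatusCode());
            }
        }
    }
}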
Also, the reason context.request().POST doesn't exist is that context.request() is a mocked state for the sake of testing. You would actually need to bind and mock HTTP POST operations in a way that resolves to your servlet, and that is not supported by the framework.
Hope this helps.

Symfony Doctrine entity manager and repository

I understand the benefit of the repository pattern, but I just can't understand why in Symfony3 Doctrine there are both Doctrine\ORM\EntityManager and \Doctrine\ORM\EntityRepository.
What is the difference between the two?
Should the repository or the entity manager be injected into the controller?
Edit
The correct question should be: what's the proper way to access a repository from a controller?
Should a repository be injected into a controller as a service?
Should a repository be injected into another service as a service?
Should the entity manager contain any queries at all?
Edit
The correct question should be: should a service contain a query at all? Which @MateuszSip already explained: it can be done by injecting the entity manager.
Should a custom function like getAvailableManagers be put in the repository or in a service? (Where manager is a repository and there is some logic in determining the available managers.)
How about a more generic function like findAllManager, should it be in the repository or the entity manager?
Currently I'm using Symfony3. Thank you very much.
Cheers,
Edit
Talking to @MateuszSip (thanks, mate), I decided to make my question clearer with an example below. Please note that the code below does not represent a real problem.
controller
class ManagementController
{
    public function assignManager($projectType)
    {
        // Grabbing a service
        $s = $this->get('mycompany_management_management_service');
        $managers = $s->findAvailableManagers();
        $managers = $s->checkCapability($managers, $projectType);

        return $managers;
    }
}
repository
class ManagerRepository extends \Doctrine\ORM\EntityRepository
{
    public function findAvailableManagers()
    {
        ...
        return $managers;
    }

    public function checkCapability($managers, $type)
    {
        ...
        return $capableManagers;
    }
}
services
class ManagementService
{
    ... I am not sure what should be here.
}
EntityManager is used to manage doctrine-related objects, so:
you can persist an entity object (it's now managed by Doctrine and ready to be saved)
you can remove an entity object (so it'll be deleted later)
you can flush, and it'll trigger the pending operations
you can get a repository (to fetch the objects you need) or use a generic API to get an object by its primary key
etc.
It's a class that manages the state of objects and their relation to the database.
Repository is a pattern that standardizes access to the entities.
If your app is complex, you should inject separate service(s) into your controller. So there's, for example, a UserSaver service that uses the EntityManager to create/update a user, and a UserFinder (or something well-named) using UserRepository, which is responsible for fetching users by defined criteria.
You can create a query using the entity manager, but the entity manager itself cannot contain queries.
In my opinion, define a method inside a service, and a corresponding method in your UserRepository. At the moment everything you want should be fetched from the database, but that can change later.
In the repository. Methods like findByRole(role=manager), findIsActive, findOneBySecurityNumber belong in a repository.

Mock SomeClass.getClassLoader() when called in method (Mockito / PowerMockito)

I have inherited some code that isn't tested and which loads a resource using a method like:
SomeClass.class.getClassLoader().getResource("somefile");
I've written the following test, but there are 0 interactions with the mock class loader I've created. Can anyone comment on whether this type of test is possible?
public enum SomeClass {
    INSTANCE;

    public boolean someMethod() {
        URL pathToLicense = SomeClass.class.getClassLoader().getResource("somefile");
        return false;
    }
}
@Test
public void testLicenseWorkflow() throws Exception {
    ClassLoader cl = PowerMockito.mock(ClassLoader.class);
    File f = new File("someFile");
    assertTrue(f.exists());
    logger.info(f.getCanonicalPath());
    when(cl.getResource("somefile")).thenReturn(f.toURL());
    verify(cl).getResource("somefile");
    assertTrue(SomeClass.INSTANCE.someMethod());
}
Update - Adding a resource via the ClassLoader
I've also tried the following, but someMethod doesn't seem to pick it up either:
new URLClassLoader(((URLClassLoader) SomeClass.INSTANCE.getClass().getClassLoader()).getURLs()) {
    @Override
    public void addURL(URL url) {
        super.addURL(url);
        logger.info("Calling add URL");
    }
}.addURL(f.toURI().toURL());
You are not passing cl to anything. You prepare a mock for a classloader but then proceed to load the resource with another classloader, the one that loaded SomeClass. That is why you have 0 interactions in your mock.
And about your first question: it is possible if you somehow pass your mocked classloader to the method that actually loads the resource. For example, something like:
public boolean someMethod(ClassLoader loader) {
    URL pathToLicense = loader.getResource("somefile");
    return false;
}
But I have to say that, IMO, this test is not very useful; you should be mocking your own components, not Java classes. If your goal in mocking the classloader is to inject a different file when testing, a better approach is to change your code to receive a stream: in production, inject a stream connected to the file, and in testing, inject a stream connected to an in-memory element.
In other words, resources make for bad testing when they need to be changed at test time.
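As a rough illustration of that stream-based approach (the SomeService and readLicense names are made up for this sketch, not taken from the question):

import java.io.IOException;
import java.io.InputStream;

public class SomeService {

    // The method works on a stream, so it no longer cares where the bytes come from.
    public boolean readLicense(InputStream in) throws IOException {
        return in.read() != -1; // placeholder check: "the resource is not empty"
    }

    public boolean readLicenseFromClasspath() throws IOException {
        // Production: the stream comes from the classpath resource.
        try (InputStream in = SomeService.class.getClassLoader().getResourceAsStream("somefile")) {
            return in != null && readLicense(in);
        }
    }
}

// In a test, no classloader mocking is needed; just hand the method an in-memory stream:
//
// @Test
// public void readsLicenseFromStream() throws IOException {
//     InputStream in = new ByteArrayInputStream("license-content".getBytes(StandardCharsets.UTF_8));
//     assertTrue(new SomeService().readLicense(in));
// }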

Can the Map function of Map Reduce call a (external) web service?

I have to write a MapReduce job in Java which uses an external web service. My question is whether it is allowed to invoke an HTTP request from the Map function of a MapReduce job, i.e.:
public class GeoLocator {

    private static String genderCheck = "female";

    public static class Map extends MapReduceBase implements Mapper {
        /* CALL EXTERNAL WEB SERVICE HERE */
    }
    ..
}
If so, how can I invoke the web service?
I think you are using the old API, since I see this line in your class:
public static class Map extends MapReduceBase implements Mapper {}
With respect to that, first you need to set the web service URL in your JobConf object, and later retrieve the same URL from the JobConf instance in the mapper.
Furthermore, make sure your web service URL is accessible from all the Hadoop nodes.
For e.g. -
In the main() function, you can do something like:
JobConf job = (JobConf) getConf();
job.set("webservice.url", "http://your_address_whatsoever");
And in the Mapper class:
String url = null;

public void configure(JobConf job) {
    url = (String) job.get("webservice.url");
}
But I suggest you use the new API to make things easier.
In main() you can simply go:
Configuration conf = getConf();
conf.set("webservice.url", "http://your_address_whatsoever");
And in the mapper class's map() function simply do:
String url = context.getConfiguration().get("webservice.url");
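As for actually invoking the web service from the mapper, a hedged sketch with the new API and plain HttpURLConnection could look like the following. The GeoLocationMapper name, the query-parameter format and the response handling are illustrative assumptions, not code from the question; in practice you would also batch or cache calls so every map task doesn't hammer the service.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GeoLocationMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String serviceUrl;

    @Override
    protected void setup(Context context) {
        // Read the URL that the driver put into the job configuration.
        serviceUrl = context.getConfiguration().get("webservice.url");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One HTTP GET per record, purely for illustration.
        HttpURLConnection connection =
                (HttpURLConnection) new URL(serviceUrl + "?q=" + value.toString()).openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String firstLine = reader.readLine();
            context.write(value, new Text(firstLine == null ? "" : firstLine));
        } finally {
            connection.disconnect();
        }
    }
}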