Should the Mapper and Reducer be inner classes?

In the O'Reilly Hadoop guide, implementations of the mapper and reducer classes are first introduced in separate files. Pages later, it shows them as inner classes.
In the jobs that I've written and worked on, I've found that implementations with the mapper and reducer bundled together in one class are much harder to work with. So my convention is to write separate, top-level classes. Is this correct?

From a Java perspective, there are certainly some differences between inner classes and top-level classes. One of the main advantages of a nested class is its special relationship with the enclosing class: it can access all of that class's members (fields and methods), including private ones.
In a MapReduce program, the mapper and reducer run independently of each other, so using inner classes does not provide any programmatic advantage.
The main reason inner classes are used in the book is readability and ease of use. A newcomer copying the code from the book can paste (in the case of the e-book) the entire program into a single Java file and run it.
Just FYI:
Refer to this link to know when to use inner classes.

You can write them both as inner classes or as separate classes. Separate classes are much better practice.

Related

Implement concatenative inheritance in C++

Is it possible to implement concatenative inheritance, or at least mixins, in C++?
It feels like it is impossible to do in C++, but I cannot prove it.
Thank you.

According to this article:
Concatenative inheritance is the process of combining the properties
of one or more source objects into a new destination object.
Are we speaking of class inheritance?
This is the basic way public inheritance works in C++. Thanks to multiple inheritance, you can even combine several base classes.
There might be some constraints, however: name conflicts between different sources have to be resolved, depending on the use case you might need virtual functions, and you might need to explicitly write a combined constructor.
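As a minimal sketch of the class-level approach, the snippet below combines two small "source" classes into one destination type through public multiple inheritance. The names `Named`, `Positioned`, `Combined` and `Player` are made up for illustration; they do not come from the question or the article.

```cpp
#include <string>

// Hypothetical source classes; each one contributes a single capability.
struct Named {
    std::string name = "unnamed";
    std::string describe() const { return "name=" + name; }
};

struct Positioned {
    int x = 0;
    int y = 0;
    void move_by(int dx, int dy) { x += dx; y += dy; }
};

// Generic "concatenation": the result inherits publicly from every part,
// so it exposes the members of all of them.
template <typename... Parts>
struct Combined : Parts... {};

// A destination type built from the two sources above.
using Player = Combined<Named, Positioned>;
```

A `Player` can then call both `describe()` and `move_by()`. If two parts declared a member with the same name, access through `Player` would be ambiguous and would have to be qualified (for example `p.Named::describe()`), which is the name-conflict constraint mentioned above.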
Or is inheritance from instantiated objects meant?
If it's really about objects and not classes, the story is different. You cannot clone and combine objects of arbitrary types with each other, since C++ is a statically, strongly typed language.
But first, let's correct the misleading wording. It's not really about concatenative inheritance, since inheritance is for classes. It's rather "concatenative prototyping", since you create new objects by taking over the values and behaviors of existing objects.
To realize some kind of "concatenative prototyping" in C++, you therefore need to design it yourself, based on the principle of composition, using a set of well-defined "concatenable" (i.e. composable) base classes. This can be achieved using the prototype design pattern together with the entity-component-system architecture.
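One possible shape for this, purely as a sketch: components that can clone themselves (the prototype pattern) stored in an entity that is just a bag of named components. All names here (`Component`, `Health`, `Speed`, `Entity`, `concat`) are hypothetical, and a real entity-component-system would be organised quite differently.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical component base: every composable part knows how to copy itself.
struct Component {
    virtual ~Component() = default;
    virtual std::unique_ptr<Component> clone() const = 0;
};

struct Health : Component {
    int points;
    explicit Health(int p) : points(p) {}
    std::unique_ptr<Component> clone() const override {
        return std::unique_ptr<Component>(new Health(*this));
    }
};

struct Speed : Component {
    int value;
    explicit Speed(int v) : value(v) {}
    std::unique_ptr<Component> clone() const override {
        return std::unique_ptr<Component>(new Speed(*this));
    }
};

// An entity is just a bag of named components.
class Entity {
public:
    void set(const std::string& key, std::unique_ptr<Component> part) {
        parts_[key] = std::move(part);
    }

    // "Concatenate" a prototype into this entity by copying each of its
    // components. A component with the same key replaces the existing one.
    void concat(const Entity& prototype) {
        for (const auto& entry : prototype.parts_) {
            parts_[entry.first] = entry.second->clone();
        }
    }

    // Returns nullptr if the key is missing or holds another component type.
    template <typename T>
    T* get(const std::string& key) const {
        auto it = parts_.find(key);
        return it == parts_.end() ? nullptr : dynamic_cast<T*>(it->second.get());
    }

private:
    std::map<std::string, std::unique_ptr<Component>> parts_;
};
```

Calling `concat` with several prototypes in turn gives a new object that carries copies of all their components, and later changes to the new object do not affect the prototypes.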
What's the purpose?
You are currently looking for this kind of construct, probably because you used it heavily in a dynamically typed language.
So keep in mind the popular quote (Mark Twain? Maslow?):
If you have a hammer in your hand, every problem looks like a nail.
So the question is what you are really looking for and what problem you intend to solve. IMHO, it cannot be ruled out that other idioms are more suitable in the C++ world for achieving the same objective.

Should I separate model classes or have them as a single unit?

My game logic model consists of multiple connected classes: Board, Cell, Character, etc. A Character can be placed in (and moved between) Cells (a one-to-one relationship).
There are two approaches:
1. Make each model class implement an interface so that it can be mocked and each class can be tested independently. This forces each implementation not to rely on another. In practice, though, it's hard to keep Board from knowing too much about Cells, and Characters from knowing how Cell's storage mechanism works. I have Character.Cell and Cell.CurrentCharacter properties. For the setters to work correctly (without calling each other recursively), they have to rely on each other's implementation. It feels like the model logic should be treated as a single unit.
2. Make all public members return interfaces but use the concrete classes internally (which can involve some downcasting). The drawback is that I have to test the whole model as a single unit and can't use mocking to test different parts independently. There is also no point in using dependency injection inside the model, only in getting another full model implementation from the controller.
So what to do?
UPDATE
You can propose other options.

Why are these the only 2 options?
If you intend to have different versions or types of the classes, then interfaces or abstract base classes are a good way to enforce shared behaviour and generalize many operations. However, the idea of building the classes independently, without any knowledge of each other, is ridiculous.
It is always a good idea to keep storage and behaviour in the class or layer where they belong (e.g. no business logic in the data layer), but the classes need to know about each other in order to function properly. If you make everything independent and based on interfaces, you run the risk of over-generalizing the application and reducing your efficiency.
Basically, if you think you will ever need to downcast incoming objects to more than one type, it's a good idea to look at the design and see whether you are gaining anything for the performance loss and nasty casting code you are about to write. If you have to handle every type of downcast object, you have not gained anything, and using polymorphism and a base class is a much better way to go.
Using interfaces does not eliminate your testing trouble. You will still have to instantiate some version of the objects to test most of the functions on the cell and board anyway, and full regression testing will require you to test each character's interaction with both.
Don't get me wrong: your character class should most likely have a base class or an interface. All characters will (I'm sure) share many actions and can benefit from this design. E.g. moving a character on the board is a fairly generic operation and can be made independent of the concrete character, except for a few pieces of information (such as how the character moves, whether it is allowed to move, etc.), which should be part of said base class or interface.
When it is reasonable, design classes independently so that they can be tested on their own, but do not use testing as a reason to write bad code. Simple stubs or basic testing instances can be created to help with component testing, and they take far less time and effort than fixing unnecessarily complex code.
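To make that concrete, here is a rough C++ sketch (the question does not say which language the game uses, so treat the language and every name below as assumptions). The board's move operation depends only on a small character interface, and a trivial stub stands in for real characters when testing the board.

```cpp
// Hypothetical position type used by the board.
struct Position {
    int row;
    int col;
};

// Hypothetical interface: only the information the board needs from a character.
class Character {
public:
    virtual ~Character() = default;
    virtual bool can_move(Position from, Position to) const = 0;
};

// The move operation is generic; it does not know the concrete character type.
class Board {
public:
    Board(int rows, int cols) : rows_(rows), cols_(cols) {}

    // Updates `current` and returns true only if the target is on the board
    // and the character allows the move.
    bool try_move(const Character& who, Position& current, Position target) const {
        const bool inside = target.row >= 0 && target.row < rows_ &&
                            target.col >= 0 && target.col < cols_;
        if (!inside || !who.can_move(current, target)) {
            return false;
        }
        current = target;
        return true;
    }

private:
    int rows_;
    int cols_;
};

// Simple stub for testing Board without any real character logic.
class StubCharacter : public Character {
public:
    explicit StubCharacter(bool allowed) : allowed_(allowed) {}
    bool can_move(Position, Position) const override { return allowed_; }

private:
    bool allowed_;
};
```

The stub is a few lines long, which is the point: the board can be exercised on its own without forcing every model class behind an interface.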
Interfaces have a purpose, but if you will not be treating 2 classes the same... that is not it.
*Using MVC gives you a leg up on testing as well. If done correctly you should be able to swap out any of the layers to ease your testing of a single layer.

Organizing large C++ project

Should all C++ code in a project be encapsulated in a single class, with main simply calling that class? Or should the main function declare variables and classes?

If you are going to build a large project in C++, you should at the very least read Large-Scale C++ Software Design by John Lakos. It's a little old, but it sounds like you could benefit from the fundamentals in it.
Keep in mind that building a large scale system in any language is a challenge and requires skill and discipline to prevent it falling to pieces very quickly. Don't take it lightly.
That said, if your definition of "large" is different than mine then I may have alternative advice to give you. I'm assuming you're talking about a project where the word "million" will be mentioned in sentences that also contain the words "lines of code".

For large C++ projects, you should create many classes!
main should just kick things off (maybe doing a few housekeeping things) and then call into a class that will fire up the rest of the system.
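A rough sketch of that layout, with made-up class names (`Logger`, `Engine`, `Application`): the top-level class owns the subordinate classes and exposes a single `run()` entry point.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical subsystem: collects log lines in memory.
class Logger {
public:
    void log(const std::string& message) { lines_.push_back(message); }
    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
};

// Hypothetical subsystem: does the actual work and reports through the logger.
class Engine {
public:
    explicit Engine(Logger& logger) : logger_(logger) {}

    int process(const std::vector<std::string>& args) {
        logger_.log("processing " + std::to_string(args.size()) + " argument(s)");
        return 0;  // exit status
    }

private:
    Logger& logger_;
};

// Top-level class: wires the subsystems together and starts them.
class Application {
public:
    explicit Application(std::vector<std::string> args)
        : args_(std::move(args)), engine_(logger_) {}

    int run() {
        logger_.log("starting");
        const int status = engine_.process(args_);
        logger_.log("done");
        return status;
    }

    const Logger& logger() const { return logger_; }

private:
    std::vector<std::string> args_;
    Logger logger_;
    Engine engine_;  // declared after logger_, so logger_ is constructed first
};
```

With this in place, main would shrink to building the argument vector from argc/argv and returning the result of `Application(args).run()`.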

If it's a class that really makes sense, sure -- but at least IME, that's a fairly rare exception, not the general rule.
Here, I'm presuming that you don't really mean all the code is in one class, but that there's a single top-level class, so essentially all main does is instantiate and use it. That class, in turn, will presumably instantiate and use other subordinate classes.
If you really mean "should all the code be contained in a single class?", then the answer is almost certainly a resounding "no", except for truly minuscule projects. Much of the design of classes in C++ (and most other OO languages) is completely pointless if all the code is in one class.

If you can put your entire project in one class without going insane, your definition of "large" may be different from most people's here. Which is fine -- just keep in mind that when you ask people about a "large" C++ project, they will assume you're talking about something that takes multiple person-years to create.
That said, the same principles of encapsulation apply no matter what the size of the project. Break your logic and data into units that make sense and are not too tied together and then organize your class(es) around those divisions. Don't be afraid to try one organization and then refactor it into another organization if you find yourself copy-pasting code, or if you find one class depending too heavily on another. (Or if you find yourself with too many classes and you're creating many objects to accomplish one task where a single object would be cleaner and easier on you.)
Have fun and don't be afraid to experiment a little.

In C++ you should avoid putting an entire project in one class, whether it is big or small. At most, you can try putting it in one or two namespaces (which can be split across files).
The advantages of having multiple classes are:
Better maintainability of your code.
Putting classes in multiple .h and .cpp files (i.e. small modules) helps you debug faster.
If all the code is in one class and a change is made somewhere, the whole project has to be recompiled. If the project is split across modules, only the module where the change was made needs to be recompiled. That saves a lot of time.

No! Each header/implementation file pair should represent a single class. Placing a large project in one file is a surefire route to disaster: the project becomes unmaintainable and compiling takes ages. Break up your code into appropriately sized pieces.
The main function should not declare the classes. Rather, the file that contains it (often named something like main.cpp, driver.cpp, or projectname.cpp) should use #include directives to make the compiler read the declarations in the header files. Read up on C++'s separate compilation model for more info.
Some newcomers to C++ find the compilation model, as well as the error messages generated when you get it wrong, incomprehensible or intimidating, and give up thinking it's not worth it. Don't let this be you. Learn how to properly organize your code.

Finding Implicit Communication in Classes

I am currently refactoring a very useful but poorly designed class in C++, and I'm running into a problem with the design: rather than passing data around using arguments to methods, the data is passed around by setting private state variables in the class. This makes it very difficult for me to diagram how data moves through functions. It's my weekend task to remove this style of passing data around as much as possible, as it makes the program nearly impossible to understand from the method signatures alone; the signatures tell only part of the story.
My current approach to testing whether a method communicates through private class-level variables is the following:
1. Edit the method and make it a function rather than a method, which removes its access to the state variables in the class.
2. Edit all of the calls to the method so that they call the function rather than the method.
3. Compile and see if anything breaks. Make a list of accessors to add to the original class.
4. Run the unit tests to see if I've broken anything in a very subtle way.
Is there a better way of doing this, perhaps one that can be easily automated? Is this refactoring a well-known technique that I can cite if I show it to other people?
The only mention of this problem that I've found so far is this quote from Coders at Work via the Object-oriented programming Wikipedia entry:
"The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong
Edit in response to a good question from Oli Charlesworth:
I understand that part of the point of OOP is to sometimes communicate through the state variables of the class. The difficulty with my current case is that there are currently 78 different data members in the class, many of which are key-value maps from strings to other data types, and there are undocumented implicit dependencies on the order in which they need to be initialized. It's possible that, for a sufficiently smart programmer, working with this class would be easy, but it's currently very difficult for me. I think that several of these data types could be abstracted into their own classes, but before I can do that I need to understand more clearly how the data members interact with each other.

Given the clarification in the question my "are you sure it's not just that you don't like the other programmer's style" comment dies a death ;)
Personally I'd just refactor normally. That is, with 78 data members and lots of bits that are related but not in a class of their own I'd start by grouping the related data and extracting the functionality that works on it. There's no need, IMHO, to go through a stage where you explicitly pass the data into the functions in the existing class. Just pick a group of related data items, come up with a decent name, extract them and work out where they were used and how you need to move functionality into the new class.
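To illustrate the kind of extraction I mean, here is a small sketch. The members (`host_`, `port_`) and the class names are invented, since the question doesn't show the real class; the idea is only that a group of related data moves into its own class together with the functionality that works on it.

```cpp
#include <string>
#include <utility>

// Extracted class: related members that used to sit loose in the big class,
// grouped under one name together with the code that operates on them.
class ConnectionSettings {
public:
    ConnectionSettings(std::string host, int port)
        : host_(std::move(host)), port_(port) {}

    std::string endpoint() const { return host_ + ":" + std::to_string(port_); }

    bool is_valid() const {
        return !host_.empty() && port_ > 0 && port_ <= 65535;
    }

private:
    std::string host_;
    int port_;
};

// What remains of the original (hypothetical) class: it now holds one
// well-named member instead of several loosely related ones.
class Client {
public:
    explicit Client(ConnectionSettings settings) : settings_(std::move(settings)) {}

    std::string describe() const { return "client -> " + settings_.endpoint(); }

private:
    ConnectionSettings settings_;
};
```

Each extraction like this removes a handful of data members from the big class and gives the initialization-order dependencies a single constructor to live in.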
Ideally, I'd start writing unit tests for the main class and the new broken out classes as I went along...

Instead of making all of the method's callers call the function, a smaller intermediate change would be to leave the method in place for all callers, and have it simply delegate by calling the function. Later you can inline the method call so all callers are directly calling the function.
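A minimal sketch of that intermediate step, with hypothetical names (`Report`, `label`, `format_label`): the logic moves into a free function whose signature shows every input, and the method stays behind as a one-line delegate.

```cpp
#include <string>
#include <utility>

// Free function: it cannot see any class state, so everything it uses
// has to appear in its parameter list.
inline std::string format_label(const std::string& prefix, int count) {
    return prefix + ":" + std::to_string(count);
}

class Report {
public:
    Report(std::string prefix, int count)
        : prefix_(std::move(prefix)), count_(count) {}

    // The method stays in place for existing callers and simply delegates.
    std::string label() const { return format_label(prefix_, count_); }

private:
    std::string prefix_;
    int count_;
};
```

Existing callers keep compiling unchanged, and the delegate's argument list documents exactly which members the old method was reading.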
Also, from your description it sounds like you are approaching this with manual testing. You will have better success (easier refactoring with reduced risk of error) with comprehensive unit tests in place, although of course the code you describe would be hard to unit test. Nevertheless, work toward more test automation.

Guidelines for writing flexible software?

I've been developing an interpreter in C++ for my (esoteric, if you want) programming language for some time now. One of the main things I have noticed: I start with a flexible concept, and the further I code (Tokenizer->Parser->Interpreter), the less flexible the whole system gets.
For example: I didn't implement an include function at first, yet the interpreter was already up and running. I had extreme difficulty implementing it afterwards, and it felt like "patching something out" later on. My system had lost flexibility very quickly.
How can I learn to keep relatively small C++ projects as flexible and extensible as possible during development?

If you need to keep
C++ projects as flexible and extensible as possible during development
then you haven't got a product specification, you have no real goal and no way of defining a finished product.
For a commercial product this is the worst situation to be in. To paraphrase one well known blogger (can't remember who) "you haven't got a product until you define what you aren't going to do."
For personal projects this might not be a problem. Chalk it up to experience and remember for future reference. Refactor and move on.

1. Define the structure of the project before you start coding. Outline your main objectives and think about how you can achieve them.
2. Code the headers.
3. Check whether it's possible to implement every feature using this set of interfaces.
4. If no -> go back to (2).
5. If yes -> code the .cpp files.
6. Enjoy.
Of course, this doesn't apply to really large projects. But if your design is modular, there shouldn't be any problem dividing the project into separate parts.

Don't fear Evolution (Refactoring).
If there are many classes that fit a theme, create a common base class.
Instead of hard-coding concrete types for data members, use pointers to an abstract base class.
For example, instead of using std::ifstream, use std::istream.
In my project, I have abstract classes for Reading and Writing. Classes that support reading and writing use these interfaces. I can pass specialized readers to these classes without changing any code. A database reader would inherit from the base Reader class, and thus can be used anywhere a reader is used.
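The std::istream point can be shown in a few lines. This is only a sketch; `count_words` and `count_words_in_text` are made-up names, and a real tokenizer would do more than count words.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Depends only on the abstract std::istream, so a file stream, a string
// stream or std::cin can all be passed in without changing this function.
inline int count_words(std::istream& in) {
    int count = 0;
    std::string word;
    while (in >> word) {
        ++count;
    }
    return count;
}

// One possible caller: wraps in-memory text in a std::istringstream.
inline int count_words_in_text(const std::string& text) {
    std::istringstream in(text);
    return count_words(in);
}
```

For an interpreter, the same idea would let a tokenizer read an included file, an in-memory string, or standard input through one code path.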
