How to unit test a method that creates multiple objects in a switch statement? How to mock them? - unit-testing

Another question: is there a better way to write this method?
public decimal CalculateTotalPrice(List<Product> items)
{
    decimal totalPrice = 0m;
    foreach (Product p in items)
    {
        // Assuming the calculators share a common interface, e.g. IOfferCalculator
        IOfferCalculator calc;
        if (p.Offer == "")
            calc = new DefaultCalc();
        else if (p.Offer == "BuyOneGetOneFree")
            calc = new BuyOneGetOneFreeCalc();
        else if (p.Offer == "ThreeInPriceOfTwo")
            calc = new ThreeInPriceOfTwoCalc();
        else
            throw new ArgumentException("Unknown offer: " + p.Offer);
        totalPrice += calc.Calculate(p.Quantity, p.UnitPrice);
    }
    return totalPrice;
}

You should probably review Polly Want a Message, by Sandi Metz.
How to unit test a method that is having multiple object creation in switch statement?
An important thing to notice here is that the switch statement is an implementation detail. From the point of view of the caller, this thing is just a function:
public decimal CalculateTotalPrice(List<Product> items);
If the pricing computations are fixed, you can just use the usual example based tests:
assertEquals(expectedPrice, CalculateTotalPrice(items));
But if they aren't fixed, you can still test against the properties of the method. Scott Wlaschin has a really good introduction to property-based testing. Based on the logic you show here, there are some things we can promise about prices without knowing anything about the strategies in use (see the sketch after this list):
the price should always be greater than zero.
the price of a list of items is the same as the sum of the prices of the individual items.
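The second property, for instance, can be checked with a minimal hand-rolled sketch, no PBT library required. Note that GenerateRandomProducts is a hypothetical helper you would write to produce products with valid offers:
[Test]
public void TotalEqualsSumOfIndividualPrices()
{
    var rng = new Random(42);
    for (int trial = 0; trial < 100; trial++)
    {
        // Hypothetical helper: random quantities, unit prices, and known offers.
        List<Product> items = GenerateRandomProducts(rng);
        decimal whole = CalculateTotalPrice(items);
        decimal sumOfParts = items.Sum(p => CalculateTotalPrice(new List<Product> { p }));
        Assert.AreEqual(sumOfParts, whole);
    }
}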
Is there any better way to write this method?
You might separate choosing the pricing strategy from using the strategy. As Sandi notes, that sort of construct often shows up in more than one place.
foreach (Product p in items)
{
    var calc = pricing(p.Offer);
    totalPrice += calc.Calculate(p.Quantity, p.UnitPrice);
}
"pricing" would then become something that you pass into this function (either as an argument, or as a dependency).
In effect, you would end up with three different kinds of test.
Checks that pricing returns the right pricing strategy for each offer (sketched just below).
Checks that each strategy performs its own calculation correctly.
Checks that CalculateTotalPrice computes the sum correctly.
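The first kind might look like this, reusing the illustrative PricingCatalog and the FluentAssertions style that appears later in this thread:
PricingCatalog.For("").Should().BeOfType<DefaultCalc>();
PricingCatalog.For("BuyOneGetOneFree").Should().BeOfType<BuyOneGetOneFreeCalc>();
PricingCatalog.For("ThreeInPriceOfTwo").Should().BeOfType<ThreeInPriceOfTwoCalc>();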
Personally, I prefer to treat the test subject as a single large black box, but there are good counterarguments. Horses for courses.

Constructors cannot be mocked (at least not with the free mocking frameworks).
Write tests without mocking as long as your tests run fast and test case setup is not overly complicated.
In your particular case you should be able to write tests without mocking.
// Prepare data
var products = new List<Product>
{
    new Product { Quantity = 10, UnitPrice = 5.0m, Offer = "" },
    new Product { Quantity = 2, UnitPrice = 3.0m, Offer = "BuyOneGetOneFree" },
    new Product { Quantity = 3, UnitPrice = 2.0m, Offer = "ThreeInPriceOfTwo" },
};
// Prepare expected total:
// 10 * 5.0 (no offer) + 1 * 3.0 (buy one, get one free) + 2 * 2.0 (three for two)
var expected = 57.0m;
// Run the test
var actual = CalculateTotalPrice(products);
actual.Should().Be(expected); // passes
With this approach, tests do not depend on implementation details.
You will be able to play freely with designs without rewriting tests every time you change your implementation logic.

The other answers are technically fine, but I would suggest one thing:
if (p.Offer == "")
    calc = new DefaultCalc();
else if (p.Offer == "BuyOneGetOneFree")
    calc = new BuyOneGetOneFreeCalc();
else if (p.Offer == "ThreeInPriceOfTwo")
    calc = new ThreeInPriceOfTwoCalc();
should absolutely go into its own method/scope/whatever.
You are mapping a string to a specific calculator. That should happen in one place, and one place only. You see, first you do that here. Then some other method comes along that needs the same mapping. So you start duplicating.


Google Sheets / Google Data Studio - RegEx

I have cells in a Google Sheet which contain combined data to track workout progress. They look something like this:
80kg-3x5, 100kg-1x3
For a given exercise (e.g. the hang snatch above), this records the actual work loads I did for that exercise on a given date, with the weight and the related sets x reps separated by commas. So for one exercise I might have only one work load, or several (which are then comma separated). I keep them in a single cell to keep the data tidy and to reduce entry time after a workout.
Now to analyze the data, I need to somehow separate the comma-separated values. An example using the sample cell data above would be the total volume for that exercise, with an expression like this:
Sum( (digit before 'kg') * (digit before 'x') * (digit after 'x') + Same expression before, if comma ',' exists after first expression (multiple loads for the exercise) )
It should be a trivial task, but I haven't used the functions in Google Sheets or Data Studio much, and I had a surprisingly difficult time figuring out a way to loop through the content of a cell with an appropriate regex, or otherwise. I could do this easily in Python and then use any other visualization software, but the point of using the Drive tools is that it saves a lot of time (if it works...). I can implement it either in Google Sheets, or in Data Studio as a new calculated column on the import, whichever makes it possible.
If you are looking to write a custom function, something like this may do the trick (though it needs work for better error-handling):
function workoutProgress(string) {
  if (string == '' || string == null || string == undefined) { return 'error'; }
  // Split the cell into individual work loads, e.g. "80kg-3x5" and "100kg-1x3"
  var stringArray = string.split(",");
  var sum = 0;
  var digitsArray, digitsProduct;
  if (stringArray.length > 0) {
    for (var element in stringArray) {
      // Pull out the numbers in each work load: weight, sets, reps
      digitsArray = stringArray[element].match(/\d{1,}/g);
      // Multiply them together: weight * sets * reps
      digitsProduct = digitsArray.reduce(function(product, digit) { return product * digit; });
      sum += digitsProduct;
    }
  }
  return sum;
}
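For the sample cell above, this would return 80*3*5 + 100*1*3 = 1500. Called from a sheet cell (assuming the function is saved in the sheet's Apps Script project), that looks like:
=workoutProgress("80kg-3x5, 100kg-1x3")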
It can be achieved in Data Studio using the RegEx Calculated Field below, where Field represents the respective field name. Each row represents a single work load (for example 80kg-3x5), so the expression below accounts for 5 work loads; more can be added, for example a 6th, by copy-pasting the 5th line and incrementing the number in curly brackets by one (that is, changing {4} to {5}):
(CAST(REGEXP_EXTRACT(Field,"^(\\d+)kg")AS NUMBER) * CAST(REGEXP_EXTRACT(Field,"^\\d+kg-(\\d+)")AS NUMBER) * CAST(REGEXP_EXTRACT(Field,"^\\d+kg-\\d+x(\\d+)")AS NUMBER)) +
(NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){1}(\\d+)kg")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){1}\\d+kg-(\\d+)")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){1}\\d+kg-\\d+x(\\d+)")AS NUMBER),0)) +
(NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){2}(\\d+)kg")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){2}\\d+kg-(\\d+)")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){2}\\d+kg-\\d+x(\\d+)")AS NUMBER),0)) +
(NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){3}(\\d+)kg")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){3}\\d+kg-(\\d+)")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){3}\\d+kg-\\d+x(\\d+)")AS NUMBER),0)) +
(NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){4}(\\d+)kg")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){4}\\d+kg-(\\d+)")AS NUMBER),0) * NARY_MAX(CAST(REGEXP_EXTRACT(Field,"^(?:\\d+kg-\\d+x\\d+,\\s){4}\\d+kg-\\d+x(\\d+)")AS NUMBER),0))
Linked are an editable Google Data Studio report, an embedded data source, an editable data set (Google Sheets) and a GIF to elaborate; feel free to change the name of the field (at the data source) to adapt it to the Calculated Field.

thinkscript if statement failure

The thinkscript if statement fails to branch as expected in some cases. The following test case can be used to reproduce this bug/defect.
It is shared via a Grid containing the chart and script.
To cut a long story short, a possible workaround in some cases is to use the if-expression, which is a function; it may be slower, potentially leading to a script execution timeout in scans.
This fairly nasty bug in thinkscript prevents me from writing some scans and studies the way I need to.
The following sample code shows the problem on a chart.
input price = close;
input smoothPeriods = 20;
def output = Average(price, smoothPeriods);
# Get the current offset from the right edge from BarNumber()
# BarNumber(): the current bar number. On a chart, we can see that the number increases
# from 1 at the left to the number of bars, e.g. 140, at the right edge.
def barNumber = BarNumber();
def barCount = HighestAll(barNumber);
# rightOffset: 0 at the right edge, i.e. at the rightmost bar,
# increasing from right to left.
def rightOffset = barCount - barNumber;
# Prepare a lookup table:
def lookup;
if (barNumber == 1) {
    lookup = -1;
} else {
    lookup = 53;
}
# This script gets the minimum value from data in the offset range between startIndex
# and endIndex. It serves as a functional but not direct replacement for the
# GetMinValueOffset function where a dynamic range is required. Expect it to be slow.
script getMinValueBetween {
    input data = low;
    input startIndex = 0;
    input endIndex = 0;
    plot minValue = fold index = startIndex to endIndex with minRunning = Double.POSITIVE_INFINITY do Min(GetValue(data, index), minRunning);
}
# Call this only once at the last bar.
script buildValue {
    input lookup = close;
    input offsetLast = 0;
    # Do an indirect lookup
    def lookupPosn = 23;
    def indirectLookupPosn = GetValue(lookup, lookupPosn);
    # lowAtIndirectLookupPosn is assigned incorrectly. The if statement APPEARS to be executed
    # as if indirectLookupPosn were 0, but indirectLookupPosn is NOT 0, so the condition
    # for the first branch should be met!
    def lowAtIndirectLookupPosn;
    if (indirectLookupPosn > offsetLast) {
        lowAtIndirectLookupPosn = getMinValueBetween(low, offsetLast, indirectLookupPosn);
    } else {
        lowAtIndirectLookupPosn = close[offsetLast];
    }
    plot testResult = lowAtIndirectLookupPosn;
}
plot debugLower;
if (rightOffset == 0) {
    debugLower = buildValue(lookup);
} else {
    debugLower = 0;
}
declare lower;
To prepare the chart for the stock ADT, please set a custom time frame:
10/09/18 to 10/09/19, aggregation period 1 day.
The aim of the script is to find the low value of 4.25 on 08/14/2019.
I DO know that there are various methods to do this in thinkscript, such as GetMinValueOffset().
Let us please not discuss alternative methods of achieving the objective, or alternatives to the attached script, because I am not asking for help achieving the objective. I am reporting a bug, and I want to know what goes wrong and perhaps how to fix it. In other words, finding the low here is just an example to make the script easier to follow. It could be anything else that one wants a script to compute.
Please let me describe the script.
First it does some smoothing with a moving average. The result is:
def output;
Then the script defines the distance from the right edge so we can work with offsets:
def rightOffset;
Then the script builds a lookup table:
def lookup;
script getMinValueBetween {} is a little function that finds the low between two offset positions, in a dynamic way. It is needed because GetMinValueOffset() does not accept dynamic parameters.
Then we have script buildValue {}
This is where the error occurs. This script is executed at the right edge.
buildValue {} does an indirect lookup as follows:
First it goes into lookup where it finds the value 53 at lookupPosn = 23.
With 53, it finds the low between offset 53 and 0 by calling the script function getMinValueBetween().
It stores the value in def lowAtIndirectLookupPosn;
As you can see, this is very simple indeed - only 38 lines of code!
The problem is that lowAtIndirectLookupPosn contains the wrong value, as if the wrong branch of the if statement had been executed.
plot testResult should put out the low of 4.25. Instead it puts out close[offsetLast], which is 6.26.
Quite honestly, this is a disaster, because it is impossible to predict which if statement in your program will fail.
In a limited number of cases, the if-expression can be used instead of the if statement. However, the if-expression covers only a subset of use cases, and it may execute with lower performance in scans. More importantly,
it defeats the purpose of the if statement in an important case: it supports conditional assignment but not conditional execution. In other words, it evaluates both branches before assigning one of the two values.
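For reference, a minimal sketch contrasting the two forms (my illustration, not from the original report; cond, x and y are placeholders):
# if statement: conditional execution, only the matching branch runs
def a;
if (cond) {
    a = x;
} else {
    a = y;
}
# if-expression and if() function: conditional assignment, both x and y are evaluated
def b = if cond then x else y;
def c = if(cond, x, y);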

Should I reimplement the logic in a property-based test?

Let's say there is a function to determine if a button should be visible.
fun isButtonVisible(filters: List<Filters>, results: List<Shop>, isLoading: Boolean): Boolean {
    return filters.isNotEmpty() && results.isEmpty() && !isLoading
}
Now I would like to test this function using PBT, like:
"the button should be visible if filters is not empty and results is empty and is not loading" {
    forAll { filters: List<Filters>, results: List<Shop>, isLoading: Boolean ->
        val actual = isButtonVisible(filters, results, isLoading)
        // Here I reimplement the logic
        val expected = filters.isNotEmpty() && results.isEmpty() && !isLoading
        assertThat(actual).isEqualTo(expected)
    }
}
It seems that I am just reimplementing the logic in my test. Is this correct? If not, how can I come up with other properties when the logic is just a simple combination of several flags?
That is not right.
You should not have to calculate the expected value during the test; you should know what the result should be, set it as such, and compare it against the actual result.
Tests work by calling the method you want to test and comparing the result against an already known, expected value.
"the button should be visible when filters are not empty, results is empty, isLoading is false " {
forAll { filters: List<Filters>, results: List<Shop>, isLoading: Boolean ->
val actualVisibleFlag = isButtonVisible(filters, results, isLoading)
val expectedVisibleFlag = true
assertThat(actualVisibleFlag ).isEqual(expectedVisibleFlag )
}
}
Your expected value is known; this is the point I am trying to make.
For each combination of inputs, you create a new test.
The idea here is that when you have a bug, you can easily see which existing test fails, or you can add a new one which highlights the bug.
If you call a method to give you the result you think you should get, well, how do you know that method is correct anyway? How do you know it works correctly for every combination?
You might get away with fewer tests if you reduce your number of flags; do you really need all of them?
Now, each language/framework has (or should have) support for a matrix kind of thing, so you can easily write out the values of every combination; a sketch follows.
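A framework-agnostic sketch of such a combination table in Kotlin (someFilter and someShop are placeholders you would construct from your real domain types):
// Placeholders: replace with real instances of your domain types.
val someFilter: Filters = TODO("construct a Filters instance")
val someShop: Shop = TODO("construct a Shop instance")

// Each row pairs the inputs with the known expected result.
val cases = listOf(
    Triple(listOf(someFilter), emptyList<Shop>(), false) to true,    // visible
    Triple(emptyList<Filters>(), emptyList<Shop>(), false) to false, // no filters
    Triple(listOf(someFilter), listOf(someShop), false) to false,    // has results
    Triple(listOf(someFilter), emptyList<Shop>(), true) to false     // still loading
)

for ((input, expected) in cases) {
    val (filters, results, isLoading) = input
    check(isButtonVisible(filters, results, isLoading) == expected)
}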

CouchDB View - filter keys before grouping

I have a CouchDB database which has documents with the following format:
{ createdBy: 'userId', at: 123456, type: 'action_type' }
I want to write a view that will give me how many actions of each type were created by which user. I was able to do that by creating a view like this:
emit([doc.createdBy, doc.type, doc.at], 1);
With the reduce function "sum" and consuming the view in this way:
/_design/userActionsDoc/_view/userActions?group_level=2
This returns a result with rows just the way I want:
"rows":[ {"key":["userId","ACTION_1"],"value":20}, ...
The problem is that now I want to filter the results for a given time period, i.e. the exact same information but only considering actions which happened within that period.
I can filter the documents by at if I emit the fields in a different order:
emit([doc.at, doc.type, doc.createdBy], 1);
?group_level=3&startkey=[149328316160]&endkey=[1493283161647,{},{}]
But then I won't get the results grouped by userId and actionType. Is there a way to have both? Maybe by writing my own reduce function?
I feel your pain. I have done two different things in the past to attempt to solve similar issues.
The first pattern is a pain and may work great or may not work at all; I've experienced both. Your map function would look something like this:
function(doc) {
  var obj = {};
  obj[doc.createdBy] = {};
  obj[doc.createdBy][doc.type] = 1;
  emit(doc.at, obj);
  // Ignore this for now
  // emit(doc.at, JSON.stringify(obj));
}
Then your reduce function looks like this:
function(key, values, rereduce) {
  var output = {};
  values.forEach(function(v) {
    // Ignore this for now
    // v = JSON.parse(v);
    for (var user in v) {
      // Initialize the per-user object on first sight of this user
      output[user] = output[user] || {};
      for (var action in v[user]) {
        output[user][action] = (output[user][action] || 0) + v[user][action];
      }
    }
  });
  return output;
  // Ignore this for now
  // return JSON.stringify(output);
}
With large datasets, this usually results in a couch error stating that your reduce function is not shrinking its output fast enough. In that case, you may be able to stringify/parse the objects as shown in the "ignore" comments in the code.
The reasoning behind this is that CouchDB ultimately wants you to output a simple value like a string or integer from a reduce function. In my experience, it doesn't seem to matter that the string gets longer, as long as it remains a string. If you output an object, at some point the function errors because you have added too many properties to that object.
The second pattern is potentially better, but requires that your time periods are "defined" ahead of time. If your time period requirements can be locked down to a specific year, month, day, quarter, etc., you just emit multiple times in your map function. Below I assume the at property is epoch milliseconds, or at least something that the Date constructor can accurately parse.
function(doc) {
  var time_key;
  var my_date = new Date(doc.at);

  //// Used for filtering results in a given year
  //// e.g. startkey=["2017"]&endkey=["2017",{}]
  time_key = my_date.toISOString().substr(0, 4);
  emit([time_key, doc.createdBy, doc.type], 1);

  //// Used for filtering results in a given month
  //// e.g. startkey=["2017-01"]&endkey=["2017-01",{}]
  time_key = my_date.toISOString().substr(0, 7);
  emit([time_key, doc.createdBy, doc.type], 1);

  //// Used for filtering results in a given quarter
  //// e.g. startkey=["2017Q1"]&endkey=["2017Q1",{}]
  //// Note the + 1, so that Jan-Mar maps to Q1 rather than Q0
  time_key = my_date.toISOString().substr(0, 4) + 'Q' + (Math.floor(my_date.getMonth() / 3) + 1).toString();
  emit([time_key, doc.createdBy, doc.type], 1);
}
Then, your reduce function is the same as in your original. Essentially you're just trying to define a constant value for the first item in your key that corresponds to a defined time period. This works well for business reporting, but not so much for flexible time periods.
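For example, to get the per-user, per-type counts for the first quarter of 2017, the query follows the same shape as the original (the design doc and view names here are illustrative):
/_design/userActionsDoc/_view/userActionsByPeriod?group_level=3&startkey=["2017Q1"]&endkey=["2017Q1",{}]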

MongoDB MapReduce update in place how to

Basically, I'm trying to order objects by their score over the last hour.
I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:
{
  _id: ObjectId,
  score: int,
  hourly-score: int, <- need to update this value so I can order by it
  recently-voted: boolean,
  votes: {
    "4e4634821dff6f103c040000": { <- key is the __toString of the voter ObjectId
      "_id": ObjectId("4e4634821dff6f103c040000"), <- voter ObjectId
      "a": 1, <- vote amount
      "ca": ISODate("2011-08-16T00:01:34.975Z"), <- created-at MongoDate
      "ts": 1313452894 <- created-at timestamp
    },
    ... repeat ...
  }
}
This question is actually related to a question I asked a couple of days ago: Best way to model a voting system in MongoDB.
How would I (can I?) run a MapReduce command to do the following:
Only run on objects with recently-voted = true OR hourly-score > 0.
Calculate the sum of the votes created in the last hour.
Update hourly-score = the sum calculated above, and recently-voted = false.
I also read here that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?
Are in-place updates even possible with Mongo MapReduce?
You can definitely do this. I'll address your questions one at a time:
1.
You can specify a query along with your map-reduce, which filters the set of objects which will be passed into the map phase. In the mongo shell, this would look like (assuming m and r are the names of your mapper and reducer functions, respectively):
> db.coll.mapReduce(m, r, {query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]}})
2.
Step #1 lets you run your mapper on all documents with at least one vote in the last hour (or with recently-voted set to true), but not all of their votes will have been cast in the last hour. So you'll need to filter the votes in your mapper, and only emit those you wish to count:
function m() {
  // "ts" is stored as a Unix timestamp in seconds (see the schema above),
  // so compute the cutoff in seconds as well
  var hour_ago = Date.now() / 1000 - 3600;
  // "votes" is an object keyed by voter id, not an array, so iterate its keys
  for (var voter in this.votes) {
    var vote = this.votes[voter];
    if (vote.ts > hour_ago) {
      emit(/* your key */, vote.a);
    }
  }
}
And to reduce:
function r(key, values) {
  var sum = 0;
  values.forEach(function(value) { sum += value; });
  return sum;
}
3.
To update the hourly scores, you can use the reduce output mode of map-reduce, which will call your reducer with both the newly emitted values and the previously saved value in the output collection (if any). The result of that pass is saved into the output collection. This looks like:
> db.coll.mapReduce(m, r, {query: ..., out: {reduce: "output_coll"}})
In addition to re-reducing output, you can use merge, which overwrites documents in the output collection with newly created ones (but leaves behind any documents whose _id differs from the _ids created by your M/R job); replace, which is effectively a drop-and-create operation and is the default; or {inline: 1}, which returns the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit in the size allowed for a single document (16MB in recent MongoDB releases).
(4.)
You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.
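Putting the pieces together, a run on a secondary might look like this sketch (the collection name is illustrative; setSlaveOk is the call mentioned in the question, and inline output avoids writes):
// Allow reads on this secondary, then run the M/R with inline output
db.getMongo().setSlaveOk()
db.coll.mapReduce(m, r, {
    query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]},
    out: {inline: 1}
})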