We are using DynamoDB to count user actions. An item must be either inserted or updated, depending on whether it already exists, and the code must also update a counter. Right now we do this in two steps:
using (var client = AWSClientFactory.CreateAmazonDynamoDBClient(RegionEndpoint.USEast1))
{
    var table = Table.LoadTable(client, TableName);
    var item = await table.GetItemAsync(id);
    if (item == null)
    {
        // item does not exist -> insert it and return 1
        var document = new Document();
        document["Id"] = id;
        document["Counter"] = 1;
        await table.PutItemAsync(document);
        return 1;
    }
    // item exists -> increment the counter and update
    var counter = item["Counter"].AsInt();
    item["Counter"] = counter + 1;
    await table.UpdateItemAsync(item);
    return counter + 1;
}
The problem with this code is that it adds latency and server load. I would prefer to do this in a single operation. I think this should be possible with conditional or update expressions, but I cannot figure out how to do it with the .NET SDK.
Be careful about incrementing counters yourself, as you could have race conditions if multiple instances of your app can increment the counter. Instead, use DynamoDB Atomic Counters. For example, my Ruby code calls the UpdateItem API with the following (older) way of incrementing counters:
{"counter" => {value: {n: "1"}, action: "ADD"}}
The newer way is to use an Update Expression, which I haven't implemented yet. Also, if the counter/item doesn't already exist, it will assume the value is 0 and increment the counter to 1.
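For the .NET SDK specifically, here is a minimal sketch of the same idea using the low-level client and an update expression (TableName and id are the ones from the question's code; I'm assuming the Id key is a string and an SDK version that supports update expressions and the Task-based UpdateItemAsync):
// Requires: using Amazon; using Amazon.DynamoDBv2; using Amazon.DynamoDBv2.Model; using System.Collections.Generic;
using (var client = AWSClientFactory.CreateAmazonDynamoDBClient(RegionEndpoint.USEast1))
{
    var request = new UpdateItemRequest
    {
        TableName = TableName,
        // Assumes Id is a string key; use new AttributeValue { N = id.ToString() } for a numeric key.
        Key = new Dictionary<string, AttributeValue> { { "Id", new AttributeValue { S = id } } },
        // ADD creates the item (and the Counter attribute, starting from 0) if it does not exist yet.
        UpdateExpression = "ADD #counter :inc",
        // The expression attribute name avoids any clash with DynamoDB's reserved words.
        ExpressionAttributeNames = new Dictionary<string, string> { { "#counter", "Counter" } },
        ExpressionAttributeValues = new Dictionary<string, AttributeValue> { { ":inc", new AttributeValue { N = "1" } } },
        ReturnValues = ReturnValue.UPDATED_NEW
    };

    var response = await client.UpdateItemAsync(request);
    // The incremented value comes back in the same round trip, so no separate read is needed.
    return int.Parse(response.Attributes["Counter"].N);
}
This is a single, atomic request that behaves as an upsert, so the separate get and put/update calls from the question are no longer needed.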
You have a race condition in your code: it's possible that two different workers create the item at the same time.
The recommended pattern for what you are trying to do is:
a create-if-not-exists operation for the item, followed by
an atomic counter update on "Count".
So instead of the current get-then-put/update flow, which also has a race condition, you end up with two operations and the correct behavior.
Hope this helps.
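If you want the explicit create-if-not-exists step from this answer (useful when the item carries more attributes than just the counter), here is a sketch using a condition expression, with the same client, TableName, and string Id assumptions as the example above; the counter update is then the same atomic ADD shown there:
// Create the item only if it does not already exist; a failed condition just means another worker won the race.
try
{
    await client.PutItemAsync(new PutItemRequest
    {
        TableName = TableName,
        Item = new Dictionary<string, AttributeValue>
        {
            { "Id", new AttributeValue { S = id } },
            { "Counter", new AttributeValue { N = "0" } }
        },
        ConditionExpression = "attribute_not_exists(Id)"
    });
}
catch (ConditionalCheckFailedException)
{
    // Item already exists - nothing to do, move on to the atomic counter update.
}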
I am currently building a map/reduce script in NetSuite which passes the results of a saved search from the getInputData stage to the map stage. This is done by first running a do-while loop in the getInputData stage to obtain the internal id of each entry, inserting them into an array, and then passing that array to the map stage. Like so:
// Run the saved search - unlimited rows from the saved search.
do {
    var subresults = invoiceSearch.run().getRange({ start: start, end: start + pageSize });
    results = results.concat(subresults);
    count = subresults.length;
    start += pageSize; // getRange's end index is exclusive, so advance by pageSize (not pageSize + 1)
} while (count == pageSize);

var invSearchArray = [];
if (invoiceSearch) {
    // NOTE: .run().each has a limit of 4,000 results, hence the do-while loop above.
    for (var i = 0; i < results.length; i++) {
        var invObj = {};
        invObj['invID'] = results[i].getValue({ name: 'internalid' });
        invSearchArray.push(invObj);
    }
}
return invSearchArray;
I implemented it this way because I feared there would be result restrictions, just as the ".run().each" function has (limited to 4,000 results).
I assumed that passing the search object directly from getInputData to map would also be restricted to 4,000 results. Can someone offer clarity on whether there are such restrictions? Am I right to fear the script halting prematurely because search results cannot be processed beyond 4,000 in the getInputData stage of a map/reduce script?
Any example to aid me in understanding how a search object is processed in a map/reduce script would be most appreciated.
Thanks
If you simply return the Search instance, all results will be passed along to map, beyond the 1000 or 4000 limits of the getRange and each methods.
If the Search has 8500 results, all 8500 will get passed to map.
function getInputData() {
return search.load(...); // alternatively search.create(...)
}
Another question is whether there is any better way to write this method:
public decimal CalculateTotalPrice(List<Product> items)
{
    decimal totalPrice = 0m;
    foreach (Product p in items)
    {
        ICalc calc = null; // assuming the calculators share a common interface (called ICalc here); the original snippet leaves calc undeclared
        if (p.Offer == "")
            calc = new DefaultCalc();
        else if (p.Offer == "BuyOneGetOneFree")
            calc = new BuyOneGetOneFreeCalc();
        else if (p.Offer == "ThreeInPriceOfTwo")
            calc = new ThreeInPriceOfTwoCalc();
        totalPrice += calc.Calculate(p.Quantity, p.UnitPrice);
    }
    return totalPrice;
}
You should probably review Polly Want a Message, by Sandi Metz
How to unit test a method that creates multiple objects in a switch statement?
An important thing to notice here is that the switch statement is an implementation detail. From the point of view of the caller, this thing is just a function:
public decimal CalculateTotalPrice(List<Product> items);
If the pricing computations are fixed, you can just use the usual example-based tests:
assertEquals(expectedPrice, CalculateTotalPrice(items));
But if they aren't fixed, you can still test based on the properties of the method. Scott Wlaschin has a really good introduction to property-based testing. Based on the logic you show here, there are some things we can promise about prices without knowing anything about the strategies in use:
the price should always be greater than zero;
the price of a list of items is the same as the sum of the prices of the individual items.
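To make the second property concrete, here is a minimal hand-rolled property check (a sketch only: it assumes the Product class, the offer strings, and CalculateTotalPrice from the question, and uses an xUnit-style assertion, so adapt it to your test framework):
// Requires: using System; using System.Collections.Generic; using System.Linq; using Xunit;
// (Product, CalculateTotalPrice, and the offer strings are the ones from the question.)
[Fact]
public void TotalOfCartEqualsSumOfSingleItemCarts()
{
    var rng = new Random(12345); // fixed seed keeps the test deterministic
    var offers = new[] { "", "BuyOneGetOneFree", "ThreeInPriceOfTwo" };

    for (int run = 0; run < 100; run++)
    {
        // Build a random cart from the offers used in the question.
        var items = Enumerable.Range(0, rng.Next(1, 10))
            .Select(_ => new Product
            {
                Quantity = rng.Next(1, 20),
                UnitPrice = rng.Next(1, 1000) / 100m,
                Offer = offers[rng.Next(offers.Length)]
            })
            .ToList();

        // The whole cart should cost the same as pricing each item on its own.
        var whole = CalculateTotalPrice(items);
        var sumOfParts = items.Sum(p => CalculateTotalPrice(new List<Product> { p }));

        Assert.Equal(sumOfParts, whole);
    }
}
A property-based library such as FsCheck would generate and shrink the inputs for you, but the hand-rolled loop shows the idea.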
if there is any better way to write this method?
You might separate choosing the pricing strategy from using the strategy. As Sandi notes, that sort of construct often shows up in more than one place.
foreach(Product p in items)
{
calc = pricing(p.Offer);
totalPrice += calc.Calculate(p.Quantity, p.UnitPrice);
}
"pricing" would then become something that you pass into this function (either as an argument, or as a dependency).
In effect, you would end up with three different kinds of test.
Checks that pricing returns the right pricing strategy for each offer.
Checks that each strategy performs its own calculation correctly.
Checks that CalculateTotalPrice computes the sum correctly.
Personally, I prefer to treat the test subject as a single large black box, but there are good counterarguments. Horses for courses.
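As a rough sketch of the separation suggested above, with the offer-to-calculator mapping living in one object and passed in as a dependency (ICalc and PricingCatalog are invented names here; the concrete calculator classes are the ones from the question):
// Requires: using System; using System.Collections.Generic;
public interface ICalc // assumed common interface of DefaultCalc, BuyOneGetOneFreeCalc and ThreeInPriceOfTwoCalc
{
    decimal Calculate(int quantity, decimal unitPrice);
}

public class PricingCatalog
{
    // The string-to-calculator mapping lives in exactly one place.
    private readonly Dictionary<string, ICalc> _byOffer = new Dictionary<string, ICalc>
    {
        { "", new DefaultCalc() },
        { "BuyOneGetOneFree", new BuyOneGetOneFreeCalc() },
        { "ThreeInPriceOfTwo", new ThreeInPriceOfTwoCalc() }
    };

    public ICalc ForOffer(string offer)
    {
        ICalc calc;
        return _byOffer.TryGetValue(offer, out calc) ? calc : new DefaultCalc();
    }
}

// CalculateTotalPrice then takes the mapping as a dependency instead of owning it.
public decimal CalculateTotalPrice(List<Product> items, Func<string, ICalc> pricing)
{
    decimal totalPrice = 0m;
    foreach (Product p in items)
    {
        totalPrice += pricing(p.Offer).Calculate(p.Quantity, p.UnitPrice);
    }
    return totalPrice;
}
A call site might then look like CalculateTotalPrice(items, new PricingCatalog().ForOffer), which lets each of the three kinds of test above target one piece in isolation.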
Constructors cannot be mocked (at least not with the free mocking frameworks).
Write tests without mocking as long as your tests run fast and the test-case setup is not overly complicated.
In your particular case you should be able to write tests without mocking.
// Prepare the data
var products = new List<Product>
{
    new Product { Quantity = 10, UnitPrice = 5.0m, Offer = "" },
    new Product { Quantity = 2, UnitPrice = 3.0m, Offer = "BuyOneGetOneFree" },
    new Product { Quantity = 3, UnitPrice = 2.0m, Offer = "ThreeInPriceOfTwo" },
};
// Prepare the expected total
var expected = 57.0m; // 10 * 5.0 + 1 * 3.0 + 2 * 2.0
// Run the test
var actual = CalculateTotalPrice(products);
actual.Should().Be(expected); // passes
With this approach tests will not depend on implementation details.
You will be able to freely play with designs without rewriting tests every time you change your implementation logic.
The other answers are technically fine, but I would suggest one thing:
if (p.Offer == "")
    calc = new DefaultCalc();
else if (p.Offer == "BuyOneGetOneFree")
    calc = new BuyOneGetOneFreeCalc();
else if (p.Offer == "ThreeInPriceOfTwo")
    calc = new ThreeInPriceOfTwoCalc();
should absolutely go into its own method/scope/whatever.
You are mapping a string to a specific calculator. That should happen in one place, and one place only. You see, first you do it here. Then some other method comes along that needs the same mapping. So you start duplicating.
I have a CouchDB database which has documents with the following format:
{ createdBy: 'userId', at: 123456, type: 'action_type' }
I want to write a view that will give me how many actions of each type were created by which user. I was able to do that by creating a view that does this:
emit([doc.createdBy, doc.type, doc.at], 1);
With the reduce function "sum" and consuming the view in this way:
/_design/userActionsDoc/_view/userActions?group_level=2
This returns a result with rows just the way I want:
"rows":[ {"key":["userId","ACTION_1"],"value":20}, ...
The problem is that now I want to filter the results for a given time period. So I want the exact same information, but only considering actions which happened within that time period.
I can filter the documents by "at" if I emit the fields in a different order:
emit([doc.at, doc.type, doc.createdBy], 1);
?group_level=3&startkey=[149328316160]&endkey=[1493283161647,{},{}]
but then I won't get the results grouped by userId and actionType. Is there a way to have both? Maybe by writing my own reduce function?
I feel your pain. I have done two different things in the past to attempt to solve similar issues.
The first pattern is a pain and may work great or may not work at all. I've experienced both. Your map function looks something like this:
function(doc) {
    var obj = {};
    obj[doc.createdBy] = {};
    obj[doc.createdBy][doc.type] = 1;
    emit(doc.at, obj);
    // Ignore this for now
    // emit(doc.at, JSON.stringify(obj));
}
Then your reduce function looks like this:
function(key, values, rereduce) {
    var output = {};
    values.forEach(function(v) {
        // Ignore this for now
        // v = JSON.parse(v);
        for (var user in v) {
            output[user] = output[user] || {}; // make sure the per-user object exists before adding to it
            for (var action in v[user]) {
                output[user][action] = (output[user][action] || 0) + v[user][action];
            }
        }
    });
    return output;
    // Ignore this for now
    // return JSON.stringify(output);
}
With large datasets, this usually results in a CouchDB error stating that your reduce function is not shrinking its output fast enough. In that case, you may be able to stringify/parse the objects as shown in the "ignore this for now" comments in the code.
The reasoning behind this is that CouchDB ultimately wants a reduce function to output a simple value, like a string or a number. In my experience, it doesn't seem to matter that the string gets longer, as long as it remains a string. If you output an object, at some point the function errors out because you have added too many properties to that object.
The second pattern is potentially better, but it requires that your time periods are "defined" ahead of time. If your time period requirements can be locked down to a specific year, month, day, quarter, etc., you just emit multiple times in your map function. Below I assume the at property is epoch milliseconds, or at least something that the Date constructor can accurately parse.
function(doc) {
    var time_key;
    var my_date = new Date(doc.at);
    //// Used for filtering results in a given year
    //// e.g. startkey=["2017"]&endkey=["2017",{}]
    time_key = my_date.toISOString().substr(0, 4);
    emit([time_key, doc.createdBy, doc.type], 1);
    //// Used for filtering results in a given month
    //// e.g. startkey=["2017-01"]&endkey=["2017-01",{}]
    time_key = my_date.toISOString().substr(0, 7);
    emit([time_key, doc.createdBy, doc.type], 1);
    //// Used for filtering results in a given quarter
    //// e.g. startkey=["2017Q1"]&endkey=["2017Q1",{}]
    // getUTCMonth() is 0-11, so add 1 after dividing to get quarters 1-4 (matching the example keys above)
    time_key = my_date.toISOString().substr(0, 4) + 'Q' + (Math.floor(my_date.getUTCMonth() / 3) + 1).toString();
    emit([time_key, doc.createdBy, doc.type], 1);
}
Then your reduce function is the same as in your original. Essentially, you're just trying to define a constant value for the first element of your key that corresponds to a defined time period. This works well for business reporting, but not so much for flexible time periods.
I am trying to understand why it is taking so long to execute a simple query.
On my local machine it takes 10 seconds, but in production it takes 1 minute.
(I imported the production database into my local database.)
select *
from JobHistory
where dbo.LikeInList(InstanceID, 'E218553D-AAD1-47A8-931C-87B52E98A494') = 1
The table DataHistory is not indexed and it has 217,302 rows
public partial class UserDefinedFunctions
{
    [SqlFunction]
    public static bool LikeInList([SqlFacet(MaxSize = -1)] SqlString value, [SqlFacet(MaxSize = -1)] SqlString list)
    {
        foreach (string val in list.Value.Split(new char[] { ',' }, StringSplitOptions.None))
        {
            Regex re = new Regex("^.*" + val.Trim() + ".*$", RegexOptions.IgnoreCase);
            if (re.IsMatch(value.Value))
            {
                return (true);
            }
        }
        return (false);
    }
};
And the issue is that if the table has 217k rows, I will be calling that function 217,000 times! I'm not sure how I can rewrite this.
Thank you
There are several issues with this code (a revised sketch incorporating these points follows the list):
Missing (IsDeterministic = true, IsPrecise = true) in the [SqlFunction] attribute. Setting this (mainly the IsDeterministic = true part) allows the SQLCLR UDF to participate in parallel execution plans. Without IsDeterministic = true, this function will prevent parallel plans, just like T-SQL UDFs do.
The return type is bool instead of SqlBoolean.
The RegEx call is inefficient: constructing a Regex instance to use it once is expensive. Switch to the static Regex.IsMatch instead.
The RegEx pattern is very inefficient: wrapping the search string in "^.*" and ".*$" forces the RegEx engine to parse, and retain in memory as the "match", the entire contents of the value input parameter on every single iteration of the foreach. Yet simply using val.Trim() as the entire pattern would yield the exact same result.
(Optional) If neither input parameter will ever be over 4000 characters, specify a MaxSize of 4000 instead of -1, since NVARCHAR(4000) is much faster than NVARCHAR(MAX) for passing data into, and out of, SQLCLR objects.
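Putting those points together, a revised version might look roughly like this (a sketch, not a drop-in replacement: the null handling and the 4000-character assumption are mine, and it keeps the suggestion of using the trimmed list entry as the whole pattern):
// Requires: using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean LikeInList(
        [SqlFacet(MaxSize = 4000)] SqlString value,   // assumes inputs never exceed 4000 characters
        [SqlFacet(MaxSize = 4000)] SqlString list)
    {
        if (value.IsNull || list.IsNull)
        {
            return SqlBoolean.Null;
        }

        foreach (string val in list.Value.Split(new char[] { ',' }, StringSplitOptions.None))
        {
            // The "^.*" and ".*$" wrappers are unnecessary; the trimmed entry alone matches the same rows.
            // (Wrap it in Regex.Escape(...) if list entries may contain regex metacharacters.)
            if (Regex.IsMatch(value.Value, val.Trim(), RegexOptions.IgnoreCase))
            {
                return SqlBoolean.True;
            }
        }
        return SqlBoolean.False;
    }
}
Even with these changes the function is still called once per row, so the gains are in per-call cost rather than in the number of rows the query plan has to touch.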
Basically I'm trying to order objects by their score over the last hour.
I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:
{
    _id: ObjectId
    score: int
    hourly-score: int <- need to update this value so I can order by it
    recently-voted: boolean
    votes: {
        "4e4634821dff6f103c040000": { <- Key is __toString of voter ObjectId
            "_id": ObjectId("4e4634821dff6f103c040000"), <- Voter ObjectId
            "a": 1, <- Vote amount
            "ca": ISODate("2011-08-16T00:01:34.975Z"), <- Created at MongoDate
            "ts": 1313452894 <- Created at timestamp (epoch seconds)
        },
        ... repeat ...
    }
}
This question is actually related to a question I asked a couple of days ago: Best way to model a voting system in MongoDB.
How would I (can I?) run a MapReduce command to do the following:
Only run on objects with recently-voted = true OR hourly-score > 0.
Calculate the sum of the votes created in the last hour.
Update hourly-score = the sum calculated above, and recently-voted = false.
I also read here that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?
Are in-place updates even possible with Mongo MapReduce?
You can definitely do this. I'll address your questions one at a time:
1.
You can specify a query along with your map-reduce, which filters the set of objects which will be passed into the map phase. In the mongo shell, this would look like (assuming m and r are the names of your mapper and reducer functions, respectively):
> db.coll.mapReduce(m, r, {query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]}})
2.
Step #1 will let you use your mapper on all documents with at least one vote in the last hour (or with recently-voted set to true), but not all the votes will have been in the last hour. So you'll need to filter the list in your mapper, and only emit those votes you wish to count:
function m() {
    // "ts" is stored in epoch seconds (see the schema above), so compare in seconds
    var hour_ago = Math.floor(new Date().getTime() / 1000) - 3600;
    // "votes" is an object keyed by voter id, not an array, so iterate its keys
    for (var voterId in this.votes) {
        var vote = this.votes[voterId];
        if (vote.ts > hour_ago) {
            emit(/* your key */, vote.a);
        }
    }
}
And to reduce:
function r(key, values) {
    var sum = 0;
    values.forEach(function(value) { sum += value; });
    return sum;
}
3.
To update the hourly-scores collection, you can use the reduce output mode of map-reduce, which will call your reducer with both the newly emitted values and the previously saved value in the output collection (if any). The result of that pass is saved back into the output collection. This looks like:
> db.coll.mapReduce(m, r, {query: ..., out: {reduce: "output_coll"}})
In addition to re-reducing output, you can use merge, which overwrites documents in the output collection with newly created ones (but leaves behind any documents whose _id differs from the _ids created by your map-reduce job); replace, which is effectively a drop-and-create operation and is the default; or {inline: 1}, which returns the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit in the size allowed for a single document (16MB in recent MongoDB releases).
4.
You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.