SuiteScript 2.0: Are there any search result limitations when executing a saved search via "getInputData" stage of map/reduce script?

I am currently building a map/reduce script in NetSuite which passes the results of a saved search from the getInputData stage to the map stage. This is done by first running a while loop in the getInputData stage to obtain the internal id of each entry, inserting it into an array, and then passing that array to the map stage. Like so:
// Run the saved search - page through all rows (getRange returns at most 1,000 results per call).
do {
    var subresults = invoiceSearch.run().getRange({ start: start, end: start + pageSize });
    results = results.concat(subresults);
    count = subresults.length;
    start += pageSize; // the end index is exclusive, so advance by pageSize rather than pageSize + 1
} while (count == pageSize);
var invSearchArray = [];
if (invoiceSearch) {
    // NOTE: .run().each has a limit of 4,000 results, hence the do-while loop above.
    for (var i = 0; i < results.length; i++) {
        var invObj = new Object();
        invObj['invID'] = results[i].getValue({ name: 'internalid' });
        invSearchArray.push(invObj);
    }
}
return invSearchArray;
I implemented it this way because I feared there would be result restrictions, just as the ".run().each" function has (limited to 4,000 results).
I assumed that passing the search object directly from getInputData to map would also be restricted to 4,000 results. Can someone offer clarity on whether there are such restrictions? Am I right to fear the script halting prematurely because search results cannot be processed beyond 4,000 in the getInputData stage of a map/reduce script?
Any example to aid me in understanding how a search object is processed in a map/reduce script would be most appreciated.
Thanks

If you simply return the Search instance, all results will be passed along to map, beyond the 1000 or 4000 limits of the getRange and each methods.
If the Search has 8500 results, all 8500 will get passed to map.
function getInputData() {
    return search.load(...); // alternatively search.create(...)
}
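For context, here is a minimal sketch (not part of the original answer) of what the corresponding map entry point could look like. Each search result arrives in map as a JSON string on context.value; the "amount" column is an assumption for illustration:
// Hedged sketch of a map stage, assuming getInputData returned the Search instance.
function map(context) {
    var result = JSON.parse(context.value); // one serialized search result per invocation
    var invoiceId = result.id;              // internal id of the underlying record
    var amount = result.values.amount;      // a column assumed to exist on the saved search
    context.write({ key: invoiceId, value: amount });
}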

Related

Dynamic pre-request script Postman

I have this pre-request script, and I am using the Collection Runner to send bulk requests each second:
const moment = require('moment');
postman.setEnvironmentVariable("requestid", moment().format("0223YYYYMMDDHHmmss000000"));
I need the “requestid” to be unique every time.
first request: "022320221115102036000001"
second request: "022320221115102037000002"
third request: "022320221115102038000003"
...and so on, until let's say 1000 requests.
Basically, I need to make the last 6 digits dynamic.
Your answer can be found on this Postman request I've created for you. There are many ways to achieve this; given the little information provided, I've defaulted to:
Set a prefix (everything before the last 6 digits).
Give a baseline start number for the last 6 digits.
If there is NOT a previously stored variable, initialize it with the values above.
If there IS a previously stored variable, just increment it by one.
The variable date is your final result; current is just the increment.
Running the collection then produces sequential request ids. Here is the code, but I would test this directly on the request I've provided above:
// The initial 6-digit number to start with.
// A number starting with 9xxxxx makes the string/number conversions easier.
const BASELINE = '900000'
const PREFIX = '022320221115102036'

// The previously used value, if any
let current = pm.collectionVariables.get('current')

// If there's a previous number, increment it; otherwise start from the baseline
if (isNaN(current)) {
    current = BASELINE
} else {
    current = Number(current) + 1
}

// Final number you want to use
const date = PREFIX + current
pm.collectionVariables.set('current', current)
pm.collectionVariables.set('date', date)

console.log(current)
console.log(date)
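The request itself can then reference the stored collection variable wherever the id is needed, for example (a hedged illustration; the endpoint is a placeholder):
GET https://example.com/api/orders?requestid={{date}}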

CouchDB View - filter keys before grouping

I have a CouchDB database which has documents with the following format:
{ createdBy: 'userId', at: 123456, type: 'action_type' }
I want to write a view that will give me how many actions of each type were created by which user. I was able to do that by creating a view that does this:
emit([doc.createdBy, doc.type, doc.at], 1);
With the reduce function "sum" and consuming the view in this way:
/_design/userActionsDoc/_view/userActions?group_level=2
This returns rows just the way I want:
"rows":[ {"key":["userId","ACTION_1"],"value":20}, ...
The problem is that now I want to filter the results for a given time period, i.e. the exact same information but only considering actions which happened within that period.
I can filter the documents by at if I emit the fields in a different order:
emit([doc.at, doc.type, doc.createdBy], 1);
?group_level=3&startkey=[149328316160]&endkey=[1493283161647,{},{}]
but then I won't get the results grouped by userId and actionType. Is there a way to have both? Maybe by writing my own reduce function?
I feel your pain. I have done two different things in the past to attempt to solve similar issues.
The first pattern is a pain and may work great or may not work at all. I've experienced both. Your map function looks something like this:
function(doc) {
    var obj = {};
    obj[doc.createdBy] = {};
    obj[doc.createdBy][doc.type] = 1;
    emit(doc.at, obj);
    // Ignore this for now
    // emit(doc.at, JSON.stringify(obj));
}
Then your reduce function looks like this:
function(key, values, rereduce) {
    var output = {};
    values.forEach(function(v) {
        // Ignore this for now
        // v = JSON.parse(v);
        for (var user in v) {
            output[user] = output[user] || {}; // initialize the per-user object before summing
            for (var action in v[user]) {
                output[user][action] = (output[user][action] || 0) + v[user][action];
            }
        }
    });
    return output;
    // Ignore this for now
    // return JSON.stringify(output);
}
With large datasets, this usually results in a couch error stating that your reduce function is not shrinking fast enough. In that case, you may be able to stringify/parse the objects as shown in the "ignore" comments in the code.
The reasoning behind this is that CouchDB ultimately wants you to output a simple value, like a string or integer, from a reduce function. In my experience, it doesn't seem to matter that the string gets longer, as long as it remains a string. If you output an object, at some point the function errors because you have added too many props to that object.
The second pattern is potentially better, but requires that your time periods are "defined" ahead of time. If your time period requirements can be locked down to a specific year, month, day, quarter, etc., you just emit multiple times in your map function. Below I assume the at property is epoch milliseconds, or at least something the Date constructor can accurately parse.
function(doc) {
    var time_key;
    var my_date = new Date(doc.at);

    //// Used for filtering results in a given year
    //// e.g. startkey=["2017"]&endkey=["2017",{}]
    time_key = my_date.toISOString().substr(0, 4);
    emit([time_key, doc.createdBy, doc.type], 1);

    //// Used for filtering results in a given month
    //// e.g. startkey=["2017-01"]&endkey=["2017-01",{}]
    time_key = my_date.toISOString().substr(0, 7);
    emit([time_key, doc.createdBy, doc.type], 1);

    //// Used for filtering results in a given quarter
    //// e.g. startkey=["2017Q1"]&endkey=["2017Q1",{}]
    time_key = my_date.toISOString().substr(0, 4) + 'Q' + (Math.floor(my_date.getMonth() / 3) + 1); // getMonth() is 0-based, so add 1 to get Q1-Q4
    emit([time_key, doc.createdBy, doc.type], 1);
}
Then, your reduce function is the same as in your original. Essentially you're just trying to define a constant value for the first item in your key that corresponds to a defined time period. Works well for business reporting, but not so much for allowing for flexible time periods.
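For example, a quarter-level query against this view, grouped down to user and action, might look like the following (a hedged illustration: the design document and view names are placeholders, and the reduce function is assumed to be the built-in _sum):
/_design/userActionsDoc/_view/userActionsByPeriod?group_level=3&startkey=["2017Q1"]&endkey=["2017Q1",{}]
With group_level=3 the rows come back keyed by [quarter, userId, actionType], which matches the original grouping once the time component is fixed.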

Perform INSERT OR UPDATE as a single operation with DynamoDB

We are using DynamoDB for counting user actions, and an item must be either inserted or updated depending on whether it already exists. The code must also update a counter. Right now we do this in two steps:
using (var client = AWSClientFactory.CreateAmazonDynamoDBClient(RegionEndpoint.USEast1))
{
    var table = Table.LoadTable(client, TableName);
    var item = await table.GetItemAsync(id);
    if (item == null)
    {
        // row not exists -> insert & return 1
        var document = new Document();
        document["Id"] = id;
        document["Counter"] = 1;
        await table.PutItemAsync(document);
        return 1;
    }
    // row exists -> increment counter & update
    var counter = item["Counter"].AsInt();
    item["Counter"] = counter + 1;
    await table.UpdateItemAsync(item);
    return counter + 1;
}
The problem with this code is that it increases latency and server load. I would prefer to do this with a single operation. I think this should be possible with conditional expressions, but I cannot figure out how to do it using the .NET SDK.
Be careful about incrementing counters yourself, as you could have race conditions if multiple instances of your app can increment the counter. Instead, use DynamoDB atomic counters. For example, my Ruby code calls the UpdateItem API with the following (older) way of incrementing counters:
{"counter" => {value: {n: "1"}, action: "ADD"}}
The newer way is to use an Update Expression, which I haven't implemented yet. Also, if the counter/item doesn't already exist, it will assume the value is 0 and increment the counter to 1.
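As a hedged sketch of that newer approach (shown here with the AWS SDK for JavaScript rather than the asker's .NET SDK; the table and attribute names are assumptions), a single UpdateItem call with an ADD update expression both creates the item if it is missing and atomically increments the counter:
// Hedged sketch: AWS SDK for JavaScript v2 DocumentClient.
// Table/attribute names ("UserActions", "Id", "Counter") are assumed, not taken from the question.
var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

function incrementCounter(id) {
    return ddb.update({
        TableName: 'UserActions',
        Key: { Id: id },
        UpdateExpression: 'ADD #c :inc',               // creates the attribute (and item) if missing
        ExpressionAttributeNames: { '#c': 'Counter' },
        ExpressionAttributeValues: { ':inc': 1 },
        ReturnValues: 'UPDATED_NEW'                    // return only the updated attribute
    }).promise().then(function (res) {
        return res.Attributes.Counter;                 // the counter value after the atomic increment
    });
}
The same UpdateExpression should map onto the low-level UpdateItem request in the .NET SDK as well.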
You have a race condition in your code: it's possible that 2 different workers create the item at the same time.
The recommended pattern for what you are trying to do is:
a create-if-not-exists operation for the item (sketched below);
an atomic counter update on "Count".
So instead of 3 operations (get, put, update), which also have a race condition, you will only have 2 operations (and the correct behavior).
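A hedged sketch of the create-if-not-exists step (again using the AWS SDK for JavaScript, with the same assumed names as above) is a conditional put that is silently ignored when the item already exists:
// Hedged sketch: only creates the item when no item with this Id exists yet.
ddb.put({
    TableName: 'UserActions',
    Item: { Id: id, Counter: 0 },
    ConditionExpression: 'attribute_not_exists(Id)'
}).promise().catch(function (err) {
    // ConditionalCheckFailedException just means the item already exists, which is fine here
    if (err.code !== 'ConditionalCheckFailedException') throw err;
});
The atomic counter update can then run unconditionally, since the item is guaranteed to exist.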
Hope this helps.

Regex with SQL Server 2008 CLR performance issues

I am trying to understand why it is taking so long to execute a simple query.
On my local machine it takes 10 seconds, but in production it takes 1 minute.
(I imported the database from production into my local database.)
select *
from JobHistory
where dbo.LikeInList(InstanceID, 'E218553D-AAD1-47A8-931C-87B52E98A494') = 1
The table DataHistory is not indexed and it has 217,302 rows
public partial class UserDefinedFunctions
{
    [SqlFunction]
    public static bool LikeInList([SqlFacet(MaxSize = -1)] SqlString value, [SqlFacet(MaxSize = -1)] SqlString list)
    {
        foreach (string val in list.Value.Split(new char[] { ',' }, StringSplitOptions.None))
        {
            Regex re = new Regex("^.*" + val.Trim() + ".*$", RegexOptions.IgnoreCase);
            if (re.IsMatch(value.Value))
            {
                return (true);
            }
        }
        return (false);
    }
};
And the issue is that if the table has 217k rows, then I will be calling that function 217,000 times! Not sure how I can rewrite this thing.
Thank you
There are several issues with this code:
Missing (IsDeterministic = true, IsPrecise = true) in [SqlFunction] attribute. Doing this (mainly just the IsDeterministic = true part) will allow the SQLCLR UDF to participate in parallel execution plans. Without setting IsDeterministic = true, this function will prevent parallel plans, just like T-SQL UDFs do.
Return type is bool instead of SqlBoolean
The RegEx call is inefficient: creating a new Regex instance for every value on every row is expensive. Switch to the static Regex.IsMatch method instead, which caches recently used patterns.
The RegEx pattern is very inefficient: wrapping the search string in "^.*" and ".*$" forces the RegEx engine to parse, and retain in memory as the "match", the entire contents of the value input parameter on every single iteration of the foreach. Yet the behavior of Regular Expressions is such that simply using val.Trim() as the entire pattern would yield the exact same result.
(optional) If neither input parameter will ever be over 4000 characters, then specify a MaxSize of 4000 instead of -1 since NVARCHAR(4000) is much faster than NVARCHAR(MAX) for passing data into, and out of, SQLCLR objects.

Get all HITs with a certain status

The SearchHITs function seems almost useless for doing any actual searching. It merely pulls a listing of your HITs and doesn't allow for any filters. Is the only way to search to iterate through all the results? For example:
my_reviewable_hits = []
for page in range(5, 50):
    res = m.conn.search_hits(sort_direction='Descending', page_size=100, page_number=page)
    for hit in res:
        if hit.HITStatus == 'Reviewable':
            my_reviewable_hits.append(hit)
Yes. You have to iterate through all of them.