Scala function takes 2 hours with 2 million values - list

I would be grateful for any ideas to speed it up!
case class Pair(aa:String, bb:String)
case class OutputRow(bb:String, aa:String, bb_2:String, aa_2:String)
def startSearch(
_1_sorted: Array[Pair] ,
_2_hashmap: HashMap[String, String] ) : ArrayBuffer[OutputRow] = {
var outputTableListBuffer = ArrayBuffer[OutputRow]()
var searchComparisionFlag = false
var storeMain = Pair("0","0") //Initialize with Dummy data
var i = 0
def search(xxxx_1: Pair): Unit = {
if (searchComparisionFlag==true) {
var _2_exists = _2_hashmap.exists(_._1 == xxxx_1.aa)
if (_2_exists) {
val _2_xxxx = _2_hashmap(xxxx_1.aa)
outputTableListBuffer.append(OutputRow(storeMain.aa, storeMain.bb,_2_xxxx, xxxx_1.aa))
i = i + 1
if (i % 1000 == 0) println("In recursive search storeMain: ", storeMain)
var storePair = Pair(_2_xxxx,xxxx_1.aa)
search(storePair)
} else {
searchComparisionFlag = false
return
}
} else {
var _2_exists = _2_hashmap.exists(_._1 == xxxx_1.aa)
if (_2_exists) {
val _2_xxxx = _2_hashmap(xxxx_1.aa)
searchComparisionFlag = true
outputTableListBuffer.append(OutputRow(xxxx_1.aa, xxxx_1.bb,_2_xxxx, xxxx_1.aa))
var store = Pair(_2_xxxx,xxxx_1.aa)
search(store)
}
}
}
_1_sorted.foreach{ aa_1 =>
val store = Pair(aa_1.aa, aa_1.bb)
storeMain = store
search(store)
}
outputTableListBuffer
}
The above function takes 2 hours with 1 million values in _1_sorted and about 1 million entries in the hashmap.
Any ideas to speed this up?
This is a recursive logic function

The biggest problem is this:
_2_hashmap.exists(_._1 == xxxx_1.aa)
This is checking every single element of the hashmap on every call. Instead, use get:
_2_hashmap.get(xxxx_1.aa) match {
case Some(_2_xxxx) => // Found
???
case None => // Not found
???
}
Other code issues:
Don't use return
Pass flags down through recursive call rather than using global var
Use val wherever possible
Don't start variable names with _

Related

Initializing Lists to Avoid Potential Errors

This function
List<int> _calculateTrips() {
List<int> trips = [];
trips = List.generate(
30,
(index) {
var counter = 0;
var aDay = DateTime.now().subtract(Duration(days: index));
for (var aWalk in walks) {
if ((aDay.month == aWalk.month) && (aDay.day == aWalk.day)) {
counter++;
}
}
trips.add(counter);
},
);
return trips;
}
produces the error "The body might complete normally, causing null to be returned, but the return type is a potentially non-nullable type. Try adding either a return or a throw statement at the end." I'm struggling a bit to understand the message because (a) I thought I initialized the list at the beginning of the function and (b) I thought I had a return statement at the end.
The issue is with the function you pass to List.generate(). It expects an E Function(int), where E is the type of the element, for example:
final evenNumbers = List.generate(10, (index) {
return index * 2;
});
Your issue comes from trips.add(counter):
List<int> trips = [];
trips = List.generate(30, (index) {
final trip = calculateTrip(index);
trips.add(trip);
});
The inner function needs to be an int Function(int) (i.e. a function that takes an int, and returns an int), because your list is a List<int>.
However, your inner function never returns anything.
Simply replace trips.add(counter); with return counter; and it should solve this error. You may also want to refactor your function a little:
List<int> _calculateTrips() => List.generate(30, (index) {
var counter = 0;
var aDay = DateTime.now().subtract(Duration(days: index));
for (var aWalk in walks) {
if ((aDay.month == aWalk.month) && (aDay.day == aWalk.day)) {
counter++;
}
}
return counter;
});

How to update a variable maintaining the length of a list in a cascade-like structure?

class CoinData {
var _controller = StreamController<Map<String,dynamic>>();
.....
Stream<Map<String,dynamic>> getAllCurrentRates() {
int numAssets = 0;
int counter = 0;
this.listAllAssets()
.then((list) {
if (list != null) {
numAssets = list.length;
List<Map<String, dynamic>>.from(list)
.where((map) => map["type_is_crypto"] == 1)
.take(3)
.map((e) => e["asset_id"].toString())
.forEach((bitCoin) {
this.getCurrentRate(bitCoin)
.then((rate) => _controller.sink.add(Map<String,dynamic>.from(rate)))
.whenComplete(() {
if (++counter >= numAssets) _controller.close();
});
});
}
});
return _controller.stream;
}
.....
}
The returned list has around 2500 items, and numAssets is set to that length. However, as you can see, the list is filtered afterwards, so fewer items are actually processed and the check (++counter >= numAssets) is incorrect. Is it possible to fix this code while keeping its current structure?
.take(3) is temporary; it will be removed later.

How to alter this script to add another condition

I have this piece of code which has been working great for me, however, I need a minor alteration to it and don't know how to proceed.
I would like for 'Multiple Use' to be added as another condition, alongside 'Yes' for the onEdit() to work.
function numberToLetter(number){
// converts the column number to a letter
var temp = "";
var letter = "";
while (number > 0){
temp = (number - 1) % 26;
letter = String.fromCharCode(temp + 65) + letter;
number = (number - temp - 1) / 26;
}
return letter;
}
function obtainFirstBlankRow() {
var sheet = SpreadsheetApp.getActive().getSheetByName('Aug2019');
// search for first blank row
var col = sheet.getRange('A:A');
var vals = col.getValues();
var count = 0;
while (vals[count][0] != "") {
count++;
}
return count + 1;
}
function onEdit(e) {
var ss = SpreadsheetApp.getActiveSheet();
if (ss.getName() == 'ProspectiveSites' && e.range.getColumn() == 26) {
if (e.range.getValue() != 'Yes'){
Logger.log('test');
return;
}
var sourceSheet = SpreadsheetApp.getActive().getSheetByName('ProspectiveSites');
var targetSheet = SpreadsheetApp.getActive().getSheetByName('Aug2019');
//Logger.log('O' + e.getRow() + ':O' + e.getRow());
Logger.log(e);
Logger.log(e.range.getValue());
var cell15 = sourceSheet.getRange('O' + e.range.getRow() + ':O' + e.range.getRow()).getValue();
var cell24 = sourceSheet.getRange('X' + e.range.getRow() + ':X' + e.range.getRow()).getValue();
Logger.log(cell15);
Logger.log(cell24);
var row = obtainFirstBlankRow();
targetSheet.getRange(row, 1).setValue(cell15);
targetSheet.getRange(row, 2).setValue(cell24);
}
}
Solution
What stops you from adding another condition for the if statement? Please, take some time to research JS documentation, it will greatly help you in the long run (see useful links after the sample).
Modifications
This modification assumes that you need to exit if the value is equal to neither "Multiple Use" nor "Yes". Also, note that a few additional changes were made for optimization purposes (I changed all comparison operators to strict as well).
Sample
/**
* onEdit simple trigger;
* @param {Object} e event object;
*/
function onEdit(e) {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var actS = ss.getActiveSheet(); //active sheet === source sheet;
//access event object params;
var range = e.range;
var row = range.getRow();
var column = range.getColumn();
var value = range.getValue();
if (actS.getName()==='ProspectiveSites' && column===26) {
if (value!=='Yes'&&value!=='Multiple Use') {
Logger.log('test');
return;
}
var augS = ss.getSheetByName('Aug2019'); //target sheet;
var cell15val = actS.getRange('O'+row+':O'+row).getValue();
var cell24val = actS.getRange('X'+row+':X'+row).getValue();
var rowBlank = obtainFirstBlankRow();
var target = augS.getRange(rowBlank,1,1,2); //idx, first col, 1 row, 2 cols;
target.setValues([[ cell15val , cell24val ]]);
}
}
Useful links
if..else statement reference on MDN;
Comparison operators reference on MDN;
getRange() method reference;
setValues() method reference;
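If more trigger values are likely to be added later, the check in the sample could be driven by a list instead of a growing chain of comparisons. This is only a minimal sketch: TRIGGER_VALUES and shouldCopyRow are hypothetical names introduced for illustration and are not part of the Apps Script API.

```javascript
// Hypothetical list of cell values that should trigger the copy.
var TRIGGER_VALUES = ['Yes', 'Multiple Use'];

// Hypothetical helper: returns true when the edited cell's value
// is one of the accepted trigger values (strict, case-sensitive match).
function shouldCopyRow(value) {
  return TRIGGER_VALUES.indexOf(value) !== -1;
}
```

Inside onEdit the early exit would then read if (!shouldCopyRow(value)) { return; }, and adding a third condition only means adding one more string to the array.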

CouchDB list view error when no key requested

I'm having trouble with a list function I wrote using CouchApp. It takes items from a view, where each row is a name followed by a hash of ids and values, and creates a CSV file for the user.
function(head, req) {
// set headers
start({ "headers": { "Content-Type": "text/csv" }});
// set arrays
var snps = {};
var test = {};
var inds = [];
// get data to associative array
while(row = getRow()) {
for (var i in row.value) {
// add individual to list
if (!test[i]) {
test[i] = 1;
inds.push(i);
}
// add to snps hash
if (snps[row.key]) {
if (snps[row.key][i]) {
// multiple call
} else {
snps[row.key][i] = row.value[i];
}
} else {
snps[row.key] = {};
snps[row.key][i] = row.value[i];
}
//send(row.key+" => "+i+" => "+snps[row.key][i]+'\n');
}
}
// if there are individuals to write
if (inds.length > 0) {
// sort keys in array
inds.sort();
// print header if first
var header = "variant,"+inds.join(",")+"\n";
send(header);
// for each SNP requested
for (var j in snps) {
// build row
var row = j;
for (var k in inds) {
// if snp[rs_num][individual] is set, add to row string
// else add ?
if (snps[j][inds[k]]) {
row = row+","+snps[j][inds[k]];
} else {
row = row+",?";
}
}
// send row
send(row+'\n');
}
} else {
send('No results found.');
}
}
If I request _list/mylist/myview (where mylist is the list function above and the view returns rows as described above) with ?key="something" or ?keys=["something", "another"], it works, but if I remove the query string I get the error below:
{"code":500,"error":"render_error","reason":"function raised error: (new SyntaxError(\"JSON.parse\", \"/usr/local/share/couchdb/server/main.js\", 865)) \nstacktrace: getRow()#/usr/local/share/couchdb/server/main.js:865\n([object Object],[object Object])#:14\nrunList(function (head, req) {var snps = {};var test = {};var inds = [];while ((row = getRow())) {for (var i in row.value) {if (!test[i]) {test[i] = 1;inds.push(i);}if (snps[row.key]) {if (snps[row.key][i]) {} else {snps[row.key][i] = row.value[i];}} else {snps[row.key] = {};snps[row.key][i] = row.value[i];}}}if (inds.length > 0) {inds.sort();var header = \"variant,\" + inds.join(\",\") + \"\\n\";send(header);for (var j in snps) {var row = j;for (var k in inds) {if (snps[j][inds[k]]) {row = row + \",\" + snps[j][inds[k]];} else {row = row + \",?\";}}send(row + \"\\n\");}} else {send(\"No results found.\");}},[object Object],[object Array])#/usr/local/share/couchdb/server/main.js:979\n(function (head, req) {var snps = {};var test = {};var inds = [];while ((row = getRow())) {for (var i in row.value) {if (!test[i]) {test[i] = 1;inds.push(i);}if (snps[row.key]) {if (snps[row.key][i]) {} else {snps[row.key][i] = row.value[i];}} else {snps[row.key] = {};snps[row.key][i] = row.value[i];}}}if (inds.length > 0) {inds.sort();var header = \"variant,\" + inds.join(\",\") + \"\\n\";send(header);for (var j in snps) {var row = j;for (var k in inds) {if (snps[j][inds[k]]) {row = row + \",\" + snps[j][inds[k]];} else {row = row + \",?\";}}send(row + \"\\n\");}} else {send(\"No results found.\");}},[object Object],[object Array])#/usr/local/share/couchdb/server/main.js:1024\n(\"_design/kbio\",[object Array],[object Array])#/usr/local/share/couchdb/server/main.js:1492\n()#/usr/local/share/couchdb/server/main.js:1535\n#/usr/local/share/couchdb/server/main.js:1546\n"}
I can't say for sure since you gave little detail. However, a probable source of problems is the use of arrays to collect data from every row: it consumes an unpredictable amount of memory. This may explain why it works when you query for a few records and fails when you query for all records.
You should try to arrange the data in a way that eliminates the need to collect all values before sending output to the client. Keep in mind that while map and reduce results are saved on disk, list functions are executed on every single query. If you don't keep the list function fast and lean, you'll have problems.
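One way this could look is sketched below, under two assumptions that are not in the question: the sorted list of individuals (inds) is known before the rows are read, and each key appears in only one row, so nothing needs to be merged. formatCsvRow and streamCsv are hypothetical names; getRow and send are passed in as parameters only so the sketch can run outside CouchDB, where they are the globals already used in the list function above.

```javascript
// Hypothetical helper: formats one view row as a CSV line.
// `inds` is the sorted list of individual ids, assumed known up front.
function formatCsvRow(key, value, inds) {
  var cells = [key];
  for (var k = 0; k < inds.length; k++) {
    var call = value ? value[inds[k]] : undefined;
    // Missing calls are written as "?", as in the original function.
    cells.push(call !== undefined && call !== null ? call : "?");
  }
  return cells.join(",") + "\n";
}

// Sends each row as soon as it is read instead of buffering every row.
// Returns the number of rows written.
function streamCsv(getRow, send, inds) {
  send("variant," + inds.join(",") + "\n");
  var row;
  var count = 0;
  while ((row = getRow())) {
    send(formatCsvRow(row.key, row.value, inds));
    count++;
  }
  return count;
}
```

With this shape, memory use depends on the size of one row rather than on the size of the whole view. If the individuals cannot be known in advance, the view itself would have to be rearranged, which is what the advice above is pointing at.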

Fuzzy Matches on dijit.form.ComboBox / dijit.form.FilteringSelect Subclass

I am trying to extend dijit.form.FilteringSelect with the requirement that all instances of it should match input regardless of where the characters are in the inputted text, and should also ignore whitespace and punctuation (mainly periods and dashes).
For example if an option is "J.P. Morgan" I would want to be able to select that option after typing "JP" or "P Morgan".
Now I know that the part about matching anywhere in the string can be accomplished by passing in queryExpr: "*${0}*" when creating the instance.
What I haven't figured out is how to make it ignore whitespace, periods, and dashes. I have an example of where I'm at here - http://jsfiddle.net/mNYw2/2/. Any help would be appreciated.
The thing to master in this case is the store's fetch query strings. The widget calls a function in the attached store to pull out any matching items, so if you have a value entered in the autofilling input field, it will eventually end up similar to this in the code:
var query = {};
query[this.searchAttr] = this.get("value"); // simplified; this is not entirely accurate
this._fetchHandle = this.store.query(query, options);
this._fetchHandle.then( showResultsFunction );
So, when you define the select, override _setStoreAttr to make changes to the store's query API:
dojo.declare('CustomFilteringSelect', [FilteringSelect], {
constructor: function() {
//???
},
_setStoreAttr: function(store) {
this.inherited(arguments); // allow for comboboxmixin to modify it
// above line eventually calls this._set("store", store);
// so now, 'this' has 'store' set already
// override here
this.store.query = function(query, options) {
// note that some (Memory) stores have no 'fetch' wrapper
};
}
});
EDIT: override queryEngine function as opposed to query function
Take a look at the file SimpleQueryEngine.js under dojo/store/util. This is essentially what filters the received Array items on the given String query from the FilteringSelect. Ok, it goes like this:
var MyEngine = function(query, options) {
// create our matching query function
switch(typeof query){
default:
throw new Error("Can not query with a " + typeof query);
case "object": case "undefined":
var queryObject = query;
query = function(object){
for(var key in queryObject){
var required = queryObject[key];
if(required && required.test){
if(!required.test(object[key])){
return false;
}
}else if(required != object[key]){
return false;
}
}
return true;
};
break;
case "string":
/// HERE is most likely where you can play with the regexp matcher.
// named query
if(!this[query]){
throw new Error("No filter function " + query + " was found in store");
}
query = this[query];
// fall through
case "function":
// fall through
}
function execute(array){
// execute the whole query, first we filter
var results = arrayUtil.filter(array, query);
// next we sort
if(options && options.sort){
results.sort(function(a, b){
for(var sort, i=0; sort = options.sort[i]; i++){
var aValue = a[sort.attribute];
var bValue = b[sort.attribute];
if (aValue != bValue) {
return !!sort.descending == aValue > bValue ? -1 : 1;
}
}
return 0;
});
}
// now we paginate
if(options && (options.start || options.count)){
var total = results.length;
results = results.slice(options.start || 0, (options.start || 0) + (options.count || Infinity));
results.total = total;
}
return results;
}
execute.matches = query;
return execute;
};
new Store( { queryEngine: MyEngine });
When execute.matches is set at the bottom of this function, the query gets run against each item. Each item has a property, named by Select.searchAttr, which is tested by a RegExp like so: new RegExp(query).test(item[searchAttr]); or, perhaps a bit simpler to understand: item[searchAttr].match(query);
I have no testing environment, but locate the inline comment above and start using console.debug.
Example:
Store.data = [
{ id:'WS', name: 'Will F. Smith' },
{ id:'RD', name:'Robert O. Dinero' },
{ id:'CP', name:'Cle O. Patra' }
];
Select.searchAttr = "name";
Select.value = "Robert Din"; // keyup->autocomplete->query
Select.query will become Select.queryExpr.replace("${0}", Select.value); with queryExpr set to "*${0}*" as in your case, that is '*Robert Din*'. From here the matching gets fuzzy, and it is up to you to fill in the regular expression. Here's something to start with:
query = query.substr(1,query.length-2); // '*' be gone
var words = query.split(" ");
var exp = "";
dojo.forEach(words, function(word, idx) {
// check if last word
var nextWord = words[idx+1] ? words[idx+1] : null;
// postfix 'match-all-but-first-letter-of-nextWord'
exp += word + (nextWord ? "[^" + nextWord[0] + "]*" : "");
});
// exp should now be "Robert[^D]*Din";
// put back '*'
query = '*' + exp + '*';
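An alternative to building a regular expression is to strip the ignored characters from both the option label and the typed text and then do a plain substring test. The sketch below is plain JavaScript and untested against Dojo; normalize, fuzzyMatches and filterOptions are hypothetical names, and wiring them into the "string" or "object" branch of a custom query engine like MyEngine is an assumption left to you.

```javascript
// Hypothetical helper: lower-cases the text and removes whitespace,
// periods and dashes, so "J.P. Morgan" becomes "jpmorgan".
function normalize(text) {
  return String(text).toLowerCase().replace(/[\s.\-]+/g, "");
}

// True when the typed text occurs anywhere in the label once both
// have been normalized. Empty input matches every label.
function fuzzyMatches(label, typed) {
  return normalize(label).indexOf(normalize(typed)) !== -1;
}

// Hypothetical filter over store-like items, e.g. the Store.data
// array above with searchAttr set to "name".
function filterOptions(items, searchAttr, typed) {
  var results = [];
  for (var i = 0; i < items.length; i++) {
    if (fuzzyMatches(items[i][searchAttr], typed)) {
      results.push(items[i]);
    }
  }
  return results;
}
```

With this approach both "JP" and "P Morgan" select "J.P. Morgan", because the comparison happens on "jpmorgan". The trade-off is that word boundaries are ignored entirely, so text that spans two words without a space also matches.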