I would be grateful for any ideas to speed this up!
case class Pair(aa:String, bb:String)
case class OutputRow(bb:String, aa:String, bb_2:String, aa_2:String)
def startSearch(
_1_sorted: Array[Pair] ,
_2_hashmap: HashMap[String, String] ) : ArrayBuffer[OutputRow] = {
var outputTableListBuffer = ArrayBuffer[OutputRow]()
var searchComparisionFlag = false
var storeMain = Pair("0","0") //Initialize with Dummy data
var i = 0
def search(xxxx_1: Pair): Unit = {
if (searchComparisionFlag==true) {
var _2_exists = _2_hashmap.exists(_._1 == xxxx_1.aa)
if (_2_exists) {
val _2_xxxx = _2_hashmap(xxxx_1.aa)
outputTableListBuffer.append(OutputRow(storeMain.aa, storeMain.bb,_2_xxxx, xxxx_1.aa))
i = i + 1
if (i % 1000 == 0) println("In recursive search storeMain: ", storeMain)
var storePair = Pair(_2_xxxx,xxxx_1.aa)
search(storePair)
} else {
searchComparisionFlag = false
return
}
} else {
var _2_exists = _2_hashmap.exists(_._1 == xxxx_1.aa)
if (_2_exists) {
val _2_xxxx = _2_hashmap(xxxx_1.aa)
searchComparisionFlag = true
outputTableListBuffer.append(OutputRow(xxxx_1.aa, xxxx_1.bb,_2_xxxx, xxxx_1.aa))
var store = Pair(_2_xxxx,xxxx_1.aa)
search(store)
}
}
}
_1_sorted.foreach{ aa_1 =>
val store = Pair(aa_1.aa, aa_1.bb)
storeMain = store
search(store)
}
outputTableListBuffer
}
The above function takes 2 hours with 1 million values in _1_sorted and a good 1 million entries to look up in the hashmap.
Any ideas to speed this up?
The function is recursive.
The biggest problem is this:
_2_hashmap.exists(_._1 == xxxx_1.aa)
This checks every single element of the hashmap on every call, so each lookup is linear in the size of the map instead of a constant-time hash lookup. Instead, use get:
_2_hashmap.get(xxxx_1.aa) match {
case Some(_2_xxxx) => // Found
???
case None => // Not found
???
}
Other code issues:
Don't use return
Pass flags down through recursive call rather than using global var
Use val wherever possible
Don't start variable names with _
Is there a way, possibly using a ppx extension or similar, to use the functional update syntax { record with key = value } with a nested record?
For instance, in the following example program I'm functionally updating only the outermost record, when I really want to target an "inner" one.
type outer = {
a : float;
b : inner
}
and inner = {
c : float;
}
let item = { a = 0.4; b = { c = 0.7 } }
let () = ignore { item with b = { c = 0.8 } }
It becomes less convenient if inner has more than one field.
I'd like to be able to write something like the following (strawman syntax):
let () = ignore { item with b.c = 0.8 }
You can write this in straight OCaml:
{ item with b = { item.b with c = 0.8 } }
I assume you're using ignore just for the examples; it doesn't make sense to ignore the result of a functional record update.
At the moment I'm trying to use a regular expression to find usernames. The following condition is what I need:
"Username matches the search term with a maximum of 3 wrong characters"
For example,
Database content:
"MyUsername"
Search command -> returning match:
search("Username") -> "MyUsername"
search("Us3rname") -> "MyUsername"
search("userName") -> "MyUsername"
search("MyUser") -> none (4 characters wrong)
search("My Us3r N#me") -> none (4 characters wrong)
I can build my regex dynamically and push it into a database query; I just can't get a grip on the regex itself. Could you help me with this, or tell me whether it is even possible?
You can't do this with a regular expression. You need a similarity algorithm that measures how close two strings are.
A good and easy starting point is the Levenshtein distance.
In short: it calculates how many insert/update/delete operations are needed to transform string A into string B.
I did this in JavaScript some years ago, but it should be easy in nearly every programming language. Here is a working example:
// http://ejohn.org/blog/fast-javascript-maxmin/
Array.max = function( array ){
return Math.max.apply( Math, array );
};
Array.min = function( array ){
return Math.min.apply( Math, array );
};
// Levenshtein Distance Calculation
function levenshtein_distance (t1, t2) {
var countI = t1.length+1;
var countJ = t2.length+1;
// build empty 'matrix'
var matrix = new Array (countI);
for (var i=0;i<countI;i++) {
matrix[i] = new Array (countJ);
}
// initialize the matrix;
// set m(0,0) = 0;
// m(0,0<=j<countJ) = j
// m(0<=i<countI, 0) = i
matrix[0][0] = 0;
for (var j=1;j<matrix[0].length;j++) {
matrix[0][j] = j;
}
for (var i=1;i<matrix.length;i++) {
matrix[i][0] = i;
}
// calculate the matrix
for (var i=1;i<matrix.length;i++) {
for (var j=1;j<matrix[i].length;j++) {
var costs = new Array ();
if (t1.charAt(i-1) == t2.charAt(j-1)) {
costs.push (matrix[i-1][j-1]);
}
costs.push (matrix[i-1][j-1] + 1);
costs.push (matrix[i][j-1] + 1);
costs.push (matrix[i-1][j] + 1);
matrix[i][j] = Array.min(costs);
}
}
// resultMatrix = matrix;
var result = new Object
result.distance = matrix[countI-1][countJ-1];
result.matrix = matrix;
return result;
}
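To tie this back to the question, here is a minimal sketch of how such a distance could serve as the "maximum of 3 wrong characters" filter. It uses its own compact two-row variant of the calculation so that it runs on its own. The helper name fuzzySearch and the case-insensitive comparison are assumptions (the latter inferred from the "userName" example), and the filtering happens in application code rather than inside the database query.

```javascript
// Two-row Levenshtein distance: number of single-character inserts,
// deletes and substitutions needed to turn `a` into `b`.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr[j] = Math.min(substitution, prev[j] + 1, curr[j - 1] + 1);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: keep the names within `maxDistance` edits of `term`.
// Lowercasing both sides is an assumption taken from the "userName" example.
function fuzzySearch(usernames, term, maxDistance = 3) {
  const needle = term.toLowerCase();
  return usernames.filter(
    (name) => editDistance(needle, name.toLowerCase()) <= maxDistance
  );
}

const db = ["MyUsername"];
console.log(fuzzySearch(db, "Us3rname")); // [ 'MyUsername' ]
console.log(fuzzySearch(db, "MyUser"));   // []
```

Comparing the search term against every stored name is linear in the number of users, so for a large table some pre-filtering would probably be needed before this step.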
I have a situation where I'm refactoring some code: I pulled a lot of code out of a loop and put it into a component. Each activity is a method, and they are all called within a loop.
When it runs, the second time through the loop it fails to resolve a.id at the line "var b = b( i, a.id );".
If I do a writeOutput() at each line, I see my values at the start of the loop and at each line, until the last time.
function a() {
//do thing
return id;
}
function b() {
//do thing
return id;
}
function bigOne() {
for( var i=1; i<=2; i++ ) {
var a = a( i );
var b = b( i, a.id );
}
}
Ive tried this too - same issue
function bigOne() {
var a = '';
var b = '';
for( var i=1; i<=2; i++ ) {
a = a( i );
b = b( i, a.id );
}
}
I've read the question "Coldfusion, The symbol you provided [method_name] is not a function", but it's not the same thing: that one deals with getters and setters, so I don't think it applies to my issue.
If I put my output like this (to 'see' it):
function bigOne() {
var loopcount = 1;
for( var i=1; i<=2; i++ ) {
writeOutput( 'loop count = ' & loopcount );
var a = a( i );
writeoutput( 'a.id = ' & a.id );
var b = b( i, a.id );
}
}
I get this:
loop count = 1
a.id = 52978
loop count = 2
then error. ERROR MSG: Entity has incorrect type for being called as a function.
The symbol you provided insStop is not the name of a function.
Functions are pointed to by references just like variables are, so when you do this:
a = a();
you are overwriting the reference that points to the function a with the value returned from it. So the next time you try to call a(), a is no longer your function; it is the value returned the previous time it was called.
When you get an error along the lines of a variable not being able to be used in the way you want to use it... dump it out and look at what it contains. That generally points you in the right direction as to what you're doing wrong.
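The overwriting described above can be reproduced outside ColdFusion as well. The following JavaScript sketch is only an analogy and does not model ColdFusion's scoping rules; makeRecord, clobbered and fixed are hypothetical names, and the id values are made up to echo the output in the question.

```javascript
// Stand-in for the question's function `a`: returns a value carrying an id.
function makeRecord(i) {
  return { id: 52977 + i };
}

// Reusing one name for both the function and its result.
function clobbered() {
  let a = makeRecord; // `a` starts out pointing at the function
  const log = [];
  for (let i = 1; i <= 2; i++) {
    try {
      a = a(i); // pass 1: `a` now holds the returned value
      log.push("a.id = " + a.id);
    } catch (err) {
      log.push(err.constructor.name); // pass 2: `a` is no longer callable
    }
  }
  return log;
}

// Distinct names keep the function reference intact.
function fixed() {
  const ids = [];
  for (let i = 1; i <= 2; i++) {
    const record = makeRecord(i);
    ids.push(record.id);
  }
  return ids;
}

console.log(clobbered()); // [ 'a.id = 52978', 'TypeError' ]
console.log(fixed());     // [ 52978, 52979 ]
```

The simplest fix in either language is the one in fixed(): give the result a name different from the function that produced it.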
To support what Adam says, you can avoid this by scoping your variables inside your function. By default, unscoped variables are placed in the "variables" scope, and this also applies to functions called within the same template. So instead of declaring your function variable as plain "loopcount", you could put it in the "local" scope, i.e.
var local.loopcount = 1
This will make the function var belong only to the function and thus unaffected by anything occurring outside the function and vice-versa.
I need to define a bunch of vector sequences, which are all a series of L,D,R,U for left, down, right, up or x for break. There are optional parts, and either/or parts. I have been using my own invented system for noting it down, but I want to document this for other, potentially non-programmers to read.
I now want to use a subset (I don't plan on using any wildcards, or infinite repetition for example) of regex to define the vector sequence and a script to produce all possible matching strings...
/LDR/ produces ['LDR']
/LDU?R/ produces ['LDR','LDUR']
/R(LD|DR)U/ produces ['RLDU','RDRU']
/DxR[DL]U?RDRU?/ produces ['DxRDRDR','DxRDRDRU','DxRDURDR','DxRDURDRU','DxRLRDR','DxRLRDRU','DxRLURDR','DxRLURDRU']
Is there an existing library I can use to generate all matches?
EDIT
I realised I will only be needing or statements, as optional things can be specified by thing or nothing maybe a, or b, both optional could be (a|b|). Is there another language I could use to define what I am trying to do?
By translating the Java code from the link provided by @Dukeling into JavaScript, I think I have solved my problem...
var Node = function(str){
this.bracket = false;
this.children = [];
this.s = str;
this.next = null;
this.addChild = function(child){
this.children.push(child);
}
}
var printTree = function(root,prefix){
prefix = prefix.replace(/\./g, "");
for(i in root.children){
var child = root.children[i]
printTree(child, prefix + root.s);
}
if(root.children.length < 1){
console.log(prefix + root.s);
}
}
var Stack = function(){
this.arr = []
this.push = function(item){
this.arr.push(item)
}
this.pop = function(){
return this.arr.pop()
}
this.peek = function(){
return this.arr[this.arr.length-1]
}
}
var createTree = function(s){
// this line was causing errors for `a(((b|c)d)e)f` because the `(((` was only
// replacing the first two brackets.
// var s = s.replace(/(\(|\||\))(\(|\||\))/g, "$1.$2");
// this line fixes it
var s = s.replace(/[(|)]+/g, function(x){ return x.split('').join('.') });
var str = s.split('');
var stack = new Stack();
var root = new Node("");
stack.push(root); // start node
var justFinishedBrackets = false;
for(i in str){
var c = str[i]
if(c == '('){
stack.peek().next = new Node("Y"); // node after brackets
stack.peek().bracket = true; // node before brackets
} else if (c == '|' || c == ')'){
var last = stack.peek(); // for (ab|cd)e, remember b / d so we can add child e to it
while (!stack.peek().bracket){ // while not node before brackets
stack.pop();
}
last.addChild(stack.peek().next); // for (b|c)d, add d as child to b / c
} else {
if (justFinishedBrackets){
var next = stack.pop().next;
next.s = "" + c;
stack.push(next);
} else {
var n = new Node(""+c);
stack.peek().addChild(n);
stack.push(n);
}
}
justFinishedBrackets = (c == ')');
}
return root;
}
// Test it out
var str = "a(c|mo(r|l))e";
var root = createTree(str);
printTree(root, "");
// Prints: ace / amore / amole
I only changed one line, to allow more than two consecutive brackets to be handled, and left the original translation in the comments.
I also added a function that returns an array of results instead of printing them...
var getTree = function(root,prefix){
this.out = this.out || []
prefix = prefix.replace(/\./g, "");
for(i in root.children){
var child = root.children[i]
getTree(child, prefix + root.s, out);
}
if(root.children.length < 1){
this.out.push(prefix + root.s);
}
if(!prefix && !root.s){
var out = this.out;
this.out = null
return out;
}
}
// Test it
var str = "a(b|c)d";
var root = createTree(str);
console.log(getTree(root, ""));
// logs ["abd","acd"]
The last part allows for empty strings too, so (ab|c|) means ab or c or nothing, and adds a convenience shortcut so that ab?c is translated into a(b|)c.
var getMatches = function(str){
str = str.replace(/(.)\?/g,"($1|)")
// replace all instances of `(???|)` with `(???|µ)`
// the µ will be stripped out later
str = str.replace(/\|\)/g,"|µ)")
// fix issues where last character is `)` by inserting token `µ`
// which will be stripped out later
str = str+"µ"
var root = createTree(str);
var res = getTree(root, "");
// strip out token µ
for(i in res){
res[i] = res[i].replace(/µ/g,"")
}
// return the array of results
return res
}
getMatches("a(bc|de?)?f");
// Returns: ["abcf","adef","adf","af"]
The last part is a little hack-ish, as it relies on µ not being in the string (not an issue for me). It also works around one bug, where a ) at the end of the input string was causing incorrect output, by appending a µ to the input and then stripping it from the results. I would be happy for someone to suggest a better way to handle these issues, so it can work as a more general solution.
This code as it stands does everything I need. Thanks for all your help!
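One possible way to handle the ? shortcut without a placeholder character is to rewrite each x? into (x|), where x is either a single character or a whole bracketed group. The sketch below is only that rewriting step, under the assumption that the pattern contains nothing but literals, brackets, | and ?. desugarOptional is a hypothetical name, and generating the strings is left to whatever expander is used afterwards.

```javascript
// Rewrites every `x?` into `(x|)`, where x is the preceding literal
// character or the preceding balanced (...) group.
function desugarOptional(pattern) {
  let out = "";
  let lastAtomStart = -1; // index in `out` where the most recent atom begins
  const groupStarts = []; // indices in `out` of the currently open '('

  for (const ch of pattern) {
    if (ch === "?") {
      if (lastAtomStart < 0) throw new Error("'?' has nothing to apply to");
      // Wrap the last atom; the new group starts at the same index,
      // so lastAtomStart stays valid for a following '?'.
      out = out.slice(0, lastAtomStart) + "(" + out.slice(lastAtomStart) + "|)";
    } else if (ch === "(") {
      groupStarts.push(out.length);
      out += ch;
      lastAtomStart = -1;
    } else if (ch === ")") {
      if (groupStarts.length === 0) throw new Error("unbalanced ')'");
      lastAtomStart = groupStarts.pop(); // the whole group is now the last atom
      out += ch;
    } else if (ch === "|") {
      out += ch;
      lastAtomStart = -1;
    } else {
      lastAtomStart = out.length; // a literal character is an atom by itself
      out += ch;
    }
  }
  if (groupStarts.length > 0) throw new Error("unbalanced '('");
  return out;
}

console.log(desugarOptional("LDU?R"));       // LD(U|)R
console.log(desugarOptional("a(bc|de?)?f")); // a((bc|d(e|))|)f
```

Because a group that closes just before the ? is tracked as one unit, this avoids the case where a regex such as /(.)\?/ would only wrap the closing bracket.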
I'd imagine what you're trying is quite easy with a tree (as long as it's only or-statements).
Parse a(b|c)d (or any or-statement) into a tree as follows: a has children b and c, and b and c share a child d. Each of b and c can consist of zero or more nodes. For example, if c were g(e|f)h, that part of the tree would be a -> g -> e/f (2 nodes) -> h -> d. If c were empty, that part of the tree would be a -> d, although an actual physical empty node may simplify things, as you should see when you write the code.
Generation of the tree shouldn't be too difficult with either recursion or a stack.
Once you have a tree, it's trivial to recursively iterate through the whole thing and generate all strings.
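As a rough illustration of the parse-then-walk idea, here is a minimal JavaScript sketch. It is not the tree shape described above with shared child nodes: it builds a nested tree of sequences and alternatives with a small recursive-descent parser, then expands it recursively. parsePattern and expand are hypothetical names, and the sketch assumes the pattern contains only literals, brackets and |.

```javascript
// Grammar assumed here:  alt := seq ('|' seq)*    seq := (char | '(' alt ')')*
function parsePattern(pattern) {
  let pos = 0;

  function parseAlt() {
    const branches = [parseSeq()];
    while (pattern[pos] === "|") {
      pos++; // skip '|'
      branches.push(parseSeq());
    }
    return { type: "alt", branches };
  }

  function parseSeq() {
    const items = [];
    while (pos < pattern.length && pattern[pos] !== "|" && pattern[pos] !== ")") {
      if (pattern[pos] === "(") {
        pos++; // skip '('
        items.push(parseAlt());
        if (pattern[pos] !== ")") throw new Error("missing ')' at index " + pos);
        pos++; // skip ')'
      } else {
        items.push({ type: "lit", text: pattern[pos++] });
      }
    }
    return { type: "seq", items }; // an empty seq stands for "nothing"
  }

  const tree = parseAlt();
  if (pos !== pattern.length) throw new Error("unexpected ')' at index " + pos);
  return tree;
}

// Walk the tree and return every string it can produce.
function expand(node) {
  if (node.type === "lit") return [node.text];
  if (node.type === "alt") return node.branches.flatMap(expand);
  // seq: combine each prefix built so far with each expansion of the next item
  return node.items.reduce((prefixes, item) => {
    const tails = expand(item);
    return prefixes.flatMap((p) => tails.map((t) => p + t));
  }, [""]);
}

console.log(expand(parsePattern("a(c|mo(r|l))e"))); // [ 'ace', 'amore', 'amole' ]
console.log(expand(parsePattern("R(LD|DR)U")));     // [ 'RLDU', 'RDRU' ]
```

An empty branch such as the last one in (a|b|) simply parses to an empty sequence, which expands to the empty string, so no physical placeholder node is needed in this variant.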
Also, here is a link to a similar question, providing a library or two.
EDIT:
"shouldn't be too difficult" - okay, maybe not
Here is a somewhat complicated example (Java) that may require some advanced knowledge about stacks.
Here is a slightly simpler version (Java) thanks to inserting a special character between each ((, )), |(, etc.
Note that neither of these is particularly efficient; the point is just to get the idea across.
Here is a JavaScript example that addresses parsing the (a|b) and (a|b|) possibilities, creates an array of possible substrings, and composes the matches based on this answer.
var regex = /\([RLUD]*\|[RLUD]*\|?\)/,
str = "R(LD|DR)U(R|L|)",
substrings = [], matches = [], str_tmp = str, find
while (find = regex.exec(str_tmp)){
var index = find.index
finds = find[0].split(/\|/)
substrings.push(str_tmp.substr(0, index))
if (find[0].match(/\|/g).length == 1)
substrings.push([finds[0].substr(1), finds[1].replace(/.$/, '')])
else if (find[0].match(/\|/g).length == 2){
substrings.push([finds[0].substr(1), ""])
substrings.push([finds[1], ""])
}
str_tmp = str_tmp.substr(index + find[0].length)
}
if (str_tmp) substrings.push([str_tmp])
console.log(substrings) //>>["R", ["LD", "DR"], "U", ["R", ""], ["L", ""]]
//compose matches
function printBin(tree, soFar, iterations) {
if (iterations == tree.length) matches.push(soFar)
else if (tree[iterations].length == 2){
printBin(tree, soFar + tree[iterations][0], iterations + 1)
printBin(tree, soFar + tree[iterations][1], iterations + 1)
}
else printBin(tree, soFar + tree[iterations], iterations + 1)
}
printBin(substrings, "", 0)
console.log(matches) //>>["RLDURL", "RLDUR", "RLDUL", "RLDU", "RDRURL", "RDRUR", "RDRUL", "RDRU"]