Writing an iterator for a CxxWrap vector - c++
I'm evaluating CxxWrap for a Julia (1.x) project I'm working on. I'd like my CxxWrap code to return a std::vector of a type, and to iterate over that vector in my Julia code. The C++ part looks something like this:
using PointVec = std::vector<Point2D>;
.
.
.
JLCXX_MODULE define_julia_module(jlcxx::Module& types) {
    .
    .
    .
    types.add_type<PointVec>("PointVec")
        .method("length", &PointVec::size)
        .method("getindex", [](const PointVec& vec, size_t index) {
            return vec.at(index);
        });
    .
    .
    .
}
This is based on some searching I've already done. The example that I cribbed from alluded to creating an iterator on the Julia side, but didn't elaborate. The descriptions I've seen of creating Julia iterators are pretty daunting, and it's not at all obvious how to plumb in the CxxWrap type that I'm importing. Any tips would be appreciated.
With a lot of help from Jan Strube on the Julia Discourse site, I came up with an approach that works:
module BoostWrapper
    using CxxWrap
    # Load the compiled C++ wrapper library and generate the Julia bindings
    @wrapmodule("libboost_wrap")
    function __init__()
        @initcxx
    end
    export Point2D, PointVec, getx, gety,
        Polygon2D, PolygonVec, add_vertex, scale_polygon, get_vertices,
        poly_intersection, intersection_point
end
using Main.BoostWrapper
import Base: getindex, length, convert, iterate, size, eltype
iterate(it::PointVec) = length(it) > 0 ? (it[1], 2) : nothing
iterate(it::PointVec, i) = i <= length(it) ? (it[i], i+1) : nothing
length(it::PointVec) = Main.BoostWrapper.size(it)
getindex(it::PointVec, i) = Main.BoostWrapper.at(it, convert(UInt64, i - 1))
eltype(::Type{PointVec}) = Point2D
p1 = Point2D(-10.0, 10.0)
p2 = Point2D(10.0, 10.0)
p3 = Point2D(10.0, -10.0)
p4 = Point2D(-10.0, -10.0)
obstacle = Polygon2D()
add_vertex(obstacle, p1)
add_vertex(obstacle, p2)
add_vertex(obstacle, p3)
add_vertex(obstacle, p4)
pts = get_vertices(obstacle)
for pt in pts
    println("current pt: ", getx(pt), ", ", gety(pt))
end
I'm including a lot of detail, because it turned out there were subtleties involved in name resolution, etc.
Related
Fuzzy matching in Google Sheets
Trying to compare two columns in Google Sheets with this formula in column C: =if(A1=B1,"","Mismatch")
It works fine, but I'm getting a lot of false positives:
A          B          C
MARY JO    Mary Jo
JAY, TIM   TIM JAY    Mismatch
Sam Ron    Sam Ron    Mismatch
Jack *Ma   Jack MA    Mismatch
Any ideas how to work this?
This uses a score-based approach to determine a match. You can decide what is or isn't a match based on that score:
Score Formula = getMatchScore(A1,B1)
Match Formula = if(C1<.7,"mismatch",)
function getMatchScore(strA, strB, ignoreCase=true) {
  strA = String(strA);
  strB = String(strB)
  const toLowerCase = ignoreCase ? str => str.toLowerCase() : str => str;
  const splitWords = str => str.split(/\b/);
  let [maxLenStr, minLenStr] = strA.length > strB.length ? [strA, strB] : [strB, strA];
  maxLenStr = toLowerCase(maxLenStr);
  minLenStr = toLowerCase(minLenStr);
  const maxLength = maxLenStr.length;
  const minLength = minLenStr.length;
  const lenScore = minLength / maxLength;
  const orderScore = Array.from(maxLenStr).reduce(
    (oldItem, nItem, index) => nItem === minLenStr[index] ? oldItem + 1 : oldItem, 0
  ) / maxLength;
  const maxKeyWords = splitWords(maxLenStr);
  const minKeyWords = splitWords(minLenStr);
  const keywordScore = minKeyWords.reduce(({ score, searchWord }, nItem) => {
    const newSearchWord = searchWord?.replace(new RegExp(nItem, ignoreCase ? 'i' : ''), '');
    score += searchWord.length != newSearchWord.length ? 1: 0;
    return { score, searchWord: newSearchWord };
  }, { score: 0, searchWord: maxLenStr }).score / minKeyWords.length;
  const sortedMaxLenStr = Array.from(maxKeyWords.sort().join(''));
  const sortedMinLenStr = Array.from(minKeyWords.sort().join(''));
  const charScore = sortedMaxLenStr.reduce((oldItem, nItem, index) => {
    const surroundingChars = [sortedMinLenStr[index-1], sortedMinLenStr[index], sortedMinLenStr[index+1]]
      .filter(char => char != undefined);
    return surroundingChars.includes(nItem)? oldItem + 1 : oldItem
  }, 0) / maxLength;
  const score = (lenScore * .15) + (orderScore * .25) + (charScore * .25) + (keywordScore * .35);
  return score;
}
try: =ARRAYFORMULA(IFERROR(IF(LEN( REGEXREPLACE(REGEXREPLACE(LOWER(A1:A), "[^a-z ]", ), LOWER("["&B1:B&"]"), ))>0, "mismatch", )))
Implementing fuzzy matching with a Google Sheets formula alone would be difficult. I would recommend a custom formula for this, or a full-blown script (both via Google Apps Script) if you want to populate all rows at once.
Custom Formula:
function fuzzyMatch(string1, string2) {
  string1 = string1.toLowerCase();
  string2 = string2.toLowerCase();
  var n = -1;
  // Each character of string2 must be found in string1 after the previous match.
  for (var i = 0, char; char = string2[i]; i++)
    if (!~(n = string1.indexOf(char, n + 1))) return 'Mismatch';
};
This checks whether the characters of the 2nd string appear in the 1st string in the same order. See the sample data below for a case where it returns a mismatch.
Output: (screenshot not preserved)
Note: The last row is a mismatch because the 2nd string has an r in it that isn't found in the first string, so the required order is not met.
If this doesn't meet your test cases, add a more definitive list showing the expected output of the formula/function so this can be adjusted, or see player0's answer, which uses only Google Sheets formulas and is less strict with the conditions.
Reference: https://stackoverflow.com/a/15252131/17842569
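For readers who want to try the ordering check outside of Apps Script, here is a minimal Python sketch of the same idea. The function name is_subsequence_match is made up for illustration, and it returns a boolean rather than the 'Mismatch' string used by the custom formula above.

```python
def is_subsequence_match(string1, string2):
    """Return True if every character of string2 occurs in string1 in the same order."""
    s1, s2 = string1.lower(), string2.lower()
    pos = -1
    for ch in s2:
        # Search only to the right of the previous match, as indexOf(char, n + 1) does.
        pos = s1.find(ch, pos + 1)
        if pos == -1:
            return False
    return True


if __name__ == "__main__":
    for a, b in [("MARY JO", "Mary Jo"), ("JAY, TIM", "TIM JAY"), ("Jack *Ma", "Jack MA")]:
        print(a, "|", b, "|", "" if is_subsequence_match(a, b) else "Mismatch")
```

Note that the check is order-sensitive, so reordered names such as "JAY, TIM" and "TIM JAY" are still reported as a mismatch.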
The main limitation of traditional fuzzy matching is that it doesn’t take into consideration similarities outside of the strings. Topic clustering requires semantic understanding. Goodlookup is a smart function for spreadsheet users that gets very close to semantic understanding. It’s a pre-trained model that has the intuition of GPT-3 and the join capabilities of fuzzy matching. Use it like vlookup or index match to speed up your topic clustering work in google sheets. https://www.goodlookup.com/
How to unit test that a document has a given permission?
I'm setting up unit testing using marklogic-unit-test and one thing I'd like to do is check a given document has a particular permission. However, when I test my permission against a Sequence of permissions, I get an XDMP-NONMIXEDCOMPLEXCONT error. I assume this has to do with the fact that permissions are complex objects and not something like a simple string, because this works with collections.
const test = require("/test/test-helper.xqy");
let p1 = Sequence.from([xdmp.permission("rest-reader", "read", "element")]);
let p2 = Sequence.from([
  xdmp.permission("rest-reader", "read", "element"),
  xdmp.permission("rest-writer", "update", "element")
]);
test.assertAtLeastOneEqual(p1, p2)
Which returns:
[javascript] XDMP-NONMIXEDCOMPLEXCONT: fn:data(<sec:permission xmlns:sec="http://marklogic.com/xdmp/security"> <sec:capability>...</sec:capability>...</sec:permission>) -- Node has complex type with non-mixed complex content
The best alternative I can come up with is to explicitly loop over the Sequence and do the comparison with fn.deepEqual on each element. Is there a better way?
The test.assertAtLeastOneEqual() function expects atomic values (item() signature). The only test helper function that can handle elements is test.assertEqualXml(), but that looks for exact matches. I think your best bet is to stringify the permissions. Something like this:
const test = require("/test/test-helper.xqy");
let p1 = [xdmp.permission("rest-reader", "read")];
let p2 = [
  xdmp.permission("rest-reader", "read"),
  xdmp.permission("rest-writer", "update")
];
p1 = Sequence.from(p1.map(p => xdmp.roleName(p.roleId) + ':' + p.capability));
p2 = Sequence.from(p2.map(p => xdmp.roleName(p.roleId) + ':' + p.capability));
test.assertAtLeastOneEqual(p1, p2)
Looping in Mata with OLS
I need help with looping in Mata. I have to write a code for Beta coefficients for OLS in Mata using a loop. I am not sure how to call for the variables and create the code. Here is what I have so far. foreach j of local X { if { //for X'X matrix XX = [mata:XX = cross(X,1 , X,1)] XX } else { mata:Xy = cross(X,1 , y,0) Xy } I am getting an error message "invalid syntax".
I'm not sure what you need the loop for. Perhaps you can provide more information about that. However, the following example may help you implement OLS in Mata.
Load example data from bcuse:
ssc install bcuse
clear
bcuse bwght
mata
x = st_data(., ("male", "parity","lfaminc","packs"))
cons = J(rows(x), 1, 1)
X = (x, cons)
y = st_data(., ("lbwght"))
beta_hat = (invsym(X'*X))*(X'*y)
e_hat = y - X * beta_hat
s2 = (1 / (rows(X) - cols(X))) * (e_hat' * e_hat)
B = J(cols(X), cols(X), 0)
n = rows(X)
for (i=1; i<=n; i++) {
    B =B+(e_hat[i,1]*X[i,.])'*(e_hat[i,1]*X[i,.])
}
V_robust = (n/(n-cols(X)))*invsym(X'*X)*B*invsym(X'*X)
se_robust = sqrt(diagonal(V_robust))
V_ols = s2 * invsym(X'*X)
se_ols = sqrt(diagonal(V_ols))
beta_hat
se_robust
end
This is far from the only way to implement OLS using Mata. See the Stata Blog for another example using quadcross. I like my example because it preserves a little more of the matrix algebra in the code.
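For reference, the quantities computed in the Mata code above can be written out as follows. This is only a transcription of that code into notation, with n = rows(X), k = cols(X), and x_i denoting the i-th row of X (a 1 x k row vector); the symbol names are chosen here for illustration.

```latex
\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'y, \qquad \hat{e} = y - X\hat{\beta}, \qquad s^2 = \frac{\hat{e}'\hat{e}}{n-k},\\
\widehat{V}_{\text{ols}} &= s^2 \,(X'X)^{-1},\\
\widehat{V}_{\text{robust}} &= \frac{n}{n-k}\,(X'X)^{-1}\Bigl(\sum_{i=1}^{n} \hat{e}_i^{\,2}\, x_i' x_i\Bigr)(X'X)^{-1}.
\end{aligned}
```

The for loop in the code accumulates the middle sum one observation at a time, which is the only place a loop is needed.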
How to sympify initial conditions for ODE in sympy?
I am passing initial conditions as a string, to be used for solving an ODE in sympy. It is a first-order ODE, so for example, let's take the initial condition as y(0):3. From help:
ics is the set of initial/boundary conditions for the differential equation. It should be given in the form of {f(x0): x1, f(x).diff(x).subs(x, x2): x3}
I need to pass this to sympy.dsolve. But sympify(ic) gives an error for some reason. What other tricks can I use to make this work?
Here is a MWE. The first one shows it works when the initial conditions are not a string (normal mode of operation):
from sympy import *
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics={y(0):3})
gives
sol
Eq(y(x), x**2 + x + 3)
Now the case when ics is a string:
from sympy import *
ic = "y(0):3"
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics={ sympify(ic) })
gives
SympifyError: Sympify of expression 'could not parse 'y(0):3'' failed, because of exception being raised: SyntaxError: invalid syntax (, line 1)
So, looking at the sympify webpage:
sympify(a, locals=None, convert_xor=True, strict=False, rational=False, evaluate=None)
I tried changing the different options shown above, but the syntax error still shows up. I also tried
sol = dsolve(ode,y(x),ics= { eval(ic) } )
But this gives a syntax error as well. Is there a trick to convert this initial-conditions string to something sympy is happy with?
Python 4.7 with sympy 1.5
As a temporary workaround, currently I do this:
from sympy import *
ic = "y(0):3"
ic = ic.split(":")
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics= {S(ic[0]):S(ic[1])} )
Which works. So the problem is with the : initially; it seems sympify (or S) does not handle it.
You can use sympify('{y(0):3}'). I don't know what your actual goal is, but I don't recommend parsing strings like this in general. The format for ICs is actually slightly awkward, so that for a second order ODE it looks like:
ics = '{y(0):3, y(x).diff(x).subs(x, 0):1}'
If you're parsing a string then you can come up with a better syntax than that, like
ics = "y(0)=3, y'(0)=1"
Also, you should use parse_expr rather than converting strings with sympify or S: https://docs.sympy.org/latest/modules/parsing.html#sympy.parsing.sympy_parser.parse_expr
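One way the friendlier syntax could be supported is to translate it into the dict-style string shown above before handing it to sympy. The sketch below uses only the standard library; the helper name ics_to_sympy_string and its var parameter are hypothetical, and it assumes conditions are separated by commas and that values contain no commas themselves.

```python
import re

# Hypothetical helper: matches one condition such as  y(0)=3  or  y'(0)=1 .
_IC = re.compile(r"^\s*([A-Za-z_]\w*)('*)\(([^()]+)\)\s*=\s*(.+?)\s*$")


def ics_to_sympy_string(text, var="x"):
    """Translate "y(0)=3, y'(0)=1" into "{y(0): 3, y(x).diff(x).subs(x, 0): 1}"."""
    entries = []
    for part in text.split(","):
        m = _IC.match(part)
        if m is None:
            raise ValueError(f"cannot parse initial condition: {part!r}")
        func, primes, point, value = m.groups()
        order = len(primes)  # number of primes = order of the derivative
        if order == 0:
            key = f"{func}({point})"
        elif order == 1:
            key = f"{func}({var}).diff({var}).subs({var}, {point})"
        else:
            key = f"{func}({var}).diff({var}, {order}).subs({var}, {point})"
        entries.append(f"{key}: {value}")
    return "{" + ", ".join(entries) + "}"


if __name__ == "__main__":
    print(ics_to_sympy_string("y(0)=3, y'(0)=1"))
```

The resulting string has the same shape as the ics string in the answer, so it could then be parsed on the sympy side; that last step is left out here to keep the sketch dependency-free.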
Scala memory issue on List vs. Vector
I wrote a solution to Project Euler problem #59 in Scala and I do not understand why switching between Vector and List adds what I think is a memory leak. Here is a working, brute force solution using Vectors.
val code = scala.io.Source.fromFile("e59.txt").getLines()
  .flatMap(l => l.split(',')).map(_.toInt).toVector
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector
def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
  code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}
def scoreText(text: Vector[Int]): Int = {
  if (text.contains((c: Int) => (c < 0 || c > 128))) -1
  else {
    val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
    words.length - words.diff(commonWords).length
  }
}
lazy val psswds = for { a <- (97 to 122); b <- (97 to 122); c <- (97 to 122) } yield Vector(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
  .map(text => (text, scoreText(text)))
  .maxBy(_._2)._1.sum
println(ans)
I store the original code (a collection of ordered ints), each password and some common English words as Vectors.
However, if I replace Vector with List, my program slows down with each checked password and eventually crashes:
val code = scala.io.Source.fromFile("e59.txt").getLines()
  .flatMap(l => l.split(',')).map(_.toInt).toList
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList
def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
  println(pswd)
  code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}
def scoreText(text: List[Int]): Int = {
  if (text.contains((c: Int) => (c < 0 || c > 128))) -1
  else {
    val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
    words.length - words.diff(commonWords).length
  }
}
lazy val psswds = for { a <- (97 to 122); b <- (97 to 122); c <- (97 to 122) } yield List(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
  .map(text => (text, scoreText(text)))
  .maxBy(_._2)._1.sum
println(ans)
Error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.valueOf(String.java:2861)
at java.lang.Character.toString(Character.java:4439)
at java.lang.String.valueOf(String.java:2847)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
at scala.collection.immutable.Stream.foreach(Stream.scala:595)
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
at Main$$anon$1.<init>(e59_list.scala:27)
at Main$.main(e59_list.scala:1)
at Main.main(e59_list.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)
Files used:
common_words.txt
a able about across after all almost also am among an and any are as at be because been but by can cannot could dear did do does either else ever every for from get got had has have he her hers him his how however i if in into is it its just least let like likely may me might most must my neither no nor not of off often on only or other our own rather said say says she should since so some than that the their them then there these they this tis to too twas us wants was we were what when where which while who whom why will with would yet you your
e59.txt
79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10
,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73
A large number of Lists puts more load on the GC than the same number of Vectors. But your problem is not about the right choice of collection; it is about the wrong use of Stream. Scala's streams can be very memory-inefficient if used improperly. In your case, I assume you were trying to use Stream to avoid eager computation of the transformed psswds collection, but you actually made things worse (Stream not only memoized your elements, it also added overhead with the Stream wrappers around these elements). All you had to do was replace toStream with view. It creates a collection wrapper which makes nearly all transformations lazy (basically what you tried to achieve).
val ans = psswds.view.map(decode(3)(code))
  .map(text => (text, scoreText(text)))
  .maxBy(_._2)._1.sum
After this tiny fix your program runs fine even with -Xmx5m (I checked). There are also many other things to optimize in your program (try to avoid creating excessive collections), but I'll leave that to you.
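The difference between a memoizing stream and a non-memoizing lazy pipeline is not specific to Scala. As a rough analogy only, the Python sketch below runs the same brute-force search with generators, so each decoded candidate can be discarded as soon as it has been scored. The function names and the simplified scoring (every common word counts, and only values above 127 are rejected) are assumptions made for this illustration, not a port of the code in the question.

```python
from itertools import cycle, product


def decode(code, key):
    """XOR each value of the ciphertext with the repeating key."""
    return [c ^ k for c, k in zip(code, cycle(key))]


def score_text(text, common_words):
    """Count decoded words found in common_words (-1 if any value is not ASCII)."""
    if any(c < 0 or c > 127 for c in text):
        return -1
    words = "".join(map(chr, text)).lower().split(" ")
    return sum(1 for w in words if w in common_words)


def best_plaintext(code, common_words, key_len=3):
    # product() and the generator expression are both lazy: one candidate is
    # decoded at a time, and max() keeps only the best one seen so far.
    keys = product(range(97, 123), repeat=key_len)
    candidates = (decode(code, key) for key in keys)
    return max(candidates, key=lambda text: score_text(text, common_words))


if __name__ == "__main__":
    plain = [ord(ch) for ch in "the cat and the dog"]
    cipher = decode(plain, (97, 98, 99))
    best = best_plaintext(cipher, {"the", "and"})
    print("".join(map(chr, best)))
```

Because nothing holds a reference to earlier candidates, memory use stays flat across the whole search, which is the property that view restores in the Scala version.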