Writing an iterator for a CxxWrap vector - c++

I'm evaluating CxxWrap for a Julia (1.x) project I'm working on. I'm interested in having my CxxWrap code return a std::vector of a type, and iterating over the vector in my Julia code. The c++ part looks something like this:
using PointVec = std::vector<Point2D>;
.
.
.
JLCXX_MODULE define_julia_module(jlcxx::Module& types) {
.
.
.
types.add_type<PointVec>("PointVec")
.method("length", &PointVec::size)
.method("getindex", [](const PointVec& vec, size_t index) {
return vec.at(index);
});
.
.
.
}
This is based on some searching I've already done. The example that I cribbed from alluded to creating an iterator on the Julia side, but didn't elaborate. The descriptions I've seen of creating Julia iterators are pretty daunting, and it's not at all obvious how to plumb in the CxxWrap type that I'm importing. Any tips would be appreciated.

With a lot of help from Jan Strube on the Julia Discourse site, I came up with an approach that works:
module BoostWrapper
using CxxWrap
#wrapmodule("libboost_wrap")
function __init__()
#initcxx
end
export Point2D, PointVec, getx, gety,
Polygon2D, PolygonVec, add_vertex, scale_polygon, get_vertices,
poly_intersection, intersection_point
end
using Main.BoostWrapper
import Base: getindex, length, convert, iterate, size
iterate(it::PointVec) = length(it) > 0 ? (it[1], 2) : nothing
iterate(it::PointVec, i) = i <= length(it) ? (it[i], i+1) : nothing
length(it::PointVec) = Main.BoostWrapper.size(it)
getindex(it::PointVec, i) = Main.BoostWrapper.at(it, convert(UInt64, i - 1))
eltype(::Type{PointVec}) = Point2D
p1 = Point2D(-10.0, 10.0)
p2 = Point2D(10.0, 10.0)
p3 = Point2D(10.0, -10.0)
p4 = Point2D(-10.0, -10.0)
obstacle = Polygon2D()
add_vertex(obstacle, p1)
add_vertex(obstacle, p2)
add_vertex(obstacle, p3)
add_vertex(obstacle, p4)
pts = get_vertices(obstacle)
for pt in pts
println("current pt: ", getx(pt), ", ", gety(pt))
end
I'm including a lot of detail, because it turned out there were subtleties involved in name resolution, etc.

Related

Fuzzy matching in Google Sheets

Trying to compare two columns in GoogleSheets with this formula in Column C:
=if(A1=B1,"","Mismatch")
Works fine, but I'm getting a lot of false positives:
A.
B
C
MARY JO
Mary Jo
JAY, TIM
TIM JAY
Mismatch
Sam Ron
Sam Ron
Mismatch
Jack *Ma
Jack MA
Mismatch
Any ideas how to work this?
This uses a score based approach to determine a match. You can determine what is/isn't a match based on that score:
Score Formula = getMatchScore(A1,B1)
Match Formula = if(C1<.7,"mismatch",)
function getMatchScore(strA, strB, ignoreCase=true) {
strA = String(strA);
strB = String(strB)
const toLowerCase = ignoreCase ? str => str.toLowerCase() : str => str;
const splitWords = str => str.split(/\b/);
let [maxLenStr, minLenStr] = strA.length > strB.length ? [strA, strB] : [strB, strA];
maxLenStr = toLowerCase(maxLenStr);
minLenStr = toLowerCase(minLenStr);
const maxLength = maxLenStr.length;
const minLength = minLenStr.length;
const lenScore = minLength / maxLength;
const orderScore = Array.from(maxLenStr).reduce(
(oldItem, nItem, index) => nItem === minLenStr[index] ? oldItem + 1 : oldItem, 0
) / maxLength;
const maxKeyWords = splitWords(maxLenStr);
const minKeyWords = splitWords(minLenStr);
const keywordScore = minKeyWords.reduce(({ score, searchWord }, nItem) => {
const newSearchWord = searchWord?.replace(new RegExp(nItem, ignoreCase ? 'i' : ''), '');
score += searchWord.length != newSearchWord.length ? 1: 0;
return { score, searchWord: newSearchWord };
}, { score: 0, searchWord: maxLenStr }).score / minKeyWords.length;
const sortedMaxLenStr = Array.from(maxKeyWords.sort().join(''));
const sortedMinLenStr = Array.from(minKeyWords.sort().join(''));
const charScore = sortedMaxLenStr.reduce((oldItem, nItem, index) => {
const surroundingChars = [sortedMinLenStr[index-1], sortedMinLenStr[index], sortedMinLenStr[index+1]]
.filter(char => char != undefined);
return surroundingChars.includes(nItem)? oldItem + 1 : oldItem
}, 0) / maxLength;
const score = (lenScore * .15) + (orderScore * .25) + (charScore * .25) + (keywordScore * .35);
return score;
}
try:
=ARRAYFORMULA(IFERROR(IF(LEN(
REGEXREPLACE(REGEXREPLACE(LOWER(A1:A), "[^a-z ]", ),
LOWER("["&B1:B&"]"), ))>0, "mismatch", )))
Implementing fuzzy matching via Google Sheets formula would be difficult. I would recommend using a custom formula for this one or a full blown script (both via Google Apps Script) if you want to populate all rows at once.
Custom Formula:
function fuzzyMatch(string1, string2) {
string1 = string1.toLowerCase()
string2 = string2.toLowerCase();
var n = -1;
for(i = 0; char = string2[i]; i++)
if (!~(n = string1.indexOf(char, n + 1)))
return 'Mismatch';
};
What this does is compare if the 2nd string's characters order is found in the same order as the first string. See sample data below for the case where it will return mismatch.
Output:
Note:
Last row is a mismatch as 2nd string have r in it that isn't found at the first string thus correct order is not met.
If this didn't meet your test cases, add a more definitive list that will show the expected output of the formula/function so this can be adjusted, or see player0's answer which solely uses Google Sheets formula and is less stricter with the conditions.
Reference:
https://stackoverflow.com/a/15252131/17842569
The main limitation of traditional fuzzy matching is that it doesn’t take into consideration similarities outside of the strings. Topic clustering requires semantic understanding. Goodlookup is a smart function for spreadsheet users that gets very close to semantic understanding. It’s a pre-trained model that has the intuition of GPT-3 and the join capabilities of fuzzy matching. Use it like vlookup or index match to speed up your topic clustering work in google sheets.
https://www.goodlookup.com/

How to unit test that a document has a given permission?

I'm setting up unit testing using marklogic-unit-test and one thing I'd like to do is check a given document has a particular permission. However, when I test my permission against a Sequence of permissions, I get an XDMP-NONMIXEDCOMPLEXCONT error. I assume this has to do with the fact that permissions are complex objects and not something like a simple string, because this works with collections.
const test = require("/test/test-helper.xqy");
let p1 = Sequence.from([xdmp.permission("rest-reader", "read", "element")]);
let p2 = Sequence.from([
xdmp.permission("rest-reader", "read", "element"),
xdmp.permission("rest-writer", "update", "element")
]);
test.assertAtLeastOneEqual(p1, p2)
Which returns:
[javascript] XDMP-NONMIXEDCOMPLEXCONT: fn:data(<sec:permission
xmlns:sec="http://marklogic.com/xdmp/security">
<sec:capability>...</sec:capability>...</sec:permission>)
-- Node has complex type with non-mixed complex content
The best alternative I can come up with is to explicitly loop over the Sequence and do the comparison with fn.deepEqual on each element. Is there a better way?
The test.assertAtLeastOneEqual() function expects atomic values (item() signature). The only test helper function that can handle elements is test.assertEqualXml(), but that looks for exact matches. I think your best bet is to stringify the permissions. Something like this:
const test = require("/test/test-helper.xqy");
let p1 = [xdmp.permission("rest-reader", "read")];
let p2 = [
xdmp.permission("rest-reader", "read"),
xdmp.permission("rest-writer", "update")
];
p1 = Sequence.from(p1.map(p => xdmp.roleName(p.roleId) + ':' + p.capability));
p2 = Sequence.from(p2.map(p => xdmp.roleName(p.roleId) + ':' + p.capability));
test.assertAtLeastOneEqual(p1, p2)

Looping in Mata with OLS

I need help with looping in Mata. I have to write a code for Beta coefficients for OLS in Mata using a loop. I am not sure how to call for the variables and create the code. Here is what I have so far.
foreach j of local X {
if { //for X'X
matrix XX = [mata:XX = cross(X,1 , X,1)]
XX
}
else {
mata:Xy = cross(X,1 , y,0)
Xy
}
I am getting an error message "invalid syntax".
I'm not sure what you need the loop for. Perhaps you can provide more information about that. However the following example may help you implement OLS in mata.
Load example data from bcuse:
ssc install bcuse
clear
bcuse bwght
mata
x = st_data(., ("male", "parity","lfaminc","packs"))
cons = J(rows(x), 1, 1)
X = (x, cons)
y = st_data(., ("lbwght"))
beta_hat = (invsym(X'*X))*(X'*y)
e_hat = y - X * beta_hat
s2 = (1 / (rows(X) - cols(X))) * (e_hat' * e_hat)
B = J(cols(X), cols(X), 0)
n = rows(X)
for (i=1; i<=n; i++) {
B =B+(e_hat[i,1]*X[i,.])'*(e_hat[i,1]*X[i,.])
}
V_robust = (n/(n-cols(X)))*invsym(X'*X)*B*invsym(X'*X)
se_robust = sqrt(diagonal(V_robust))
V_ols = s2 * invsym(X'*X)
se_ols = sqrt(diagonal(V_ols))
beta_hat
se_robust
end
This is far from the only way to implement OLS using mata. See the Stata Blog for another example using quadcross, I like my example because it preserves a little more of the matrix algebra in the code.

How to sympify initial conditions for ODE in sympy?

I am passing initial conditions as string, to be used to solving an ODE in sympy.
It is a first order ode, so for example, lets take initial conditions as y(0):3 for example. From help
ics is the set of initial/boundary conditions for the differential
equation. It should be given in the form of {f(x0): x1,
f(x).diff(x).subs(x, x2): x3}
I need to pass this to sympy.dsolve. But sympify(ic) gives an error for some reason.
What other tricks to use to make this work? Here is MWE. First one shows it works without initial conditions being string (normal mode of operation)
from sympy import *
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics={y(0):3})
gives sol Eq(y(x), x**2 + x + 3)
Now the case when ics is string
from sympy import *
ic = "y(0):3"
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics={ sympify(ic) })
gives
SympifyError: Sympify of expression 'could not parse 'y(0):3'' failed,
because of exception being raised: SyntaxError: invalid syntax
(, line 1)
So looking at sympify webpage
sympify(a, locals=None, convert_xor=True, strict=False, rational=False, evaluate=None)
And tried changing different options as shown above, still the syntax error shows up.
I also tried
sol = dsolve(ode,y(x),ics= { eval(ic) } )
But this gives syntax error as well
Is there a trick to use to convert this initial conditions string to something sympy is happy with?
Python 4.7 with sympy 1.5
As temporary work around, currently I do this
from sympy import *
ic = "y(0):3"
ic = ic.split(":")
x = Symbol('x')
y = Function('y')
ode = Eq(Derivative(y(x),x),1+2*x)
sol = dsolve(ode,y(x),ics= {S(ic[0]):S(ic[1])} )
Which works. So the problem is with : initially, sympify (or S) do not handle : it seems.
You can use sympify('{y(0):3}').
I don't know what your actual goal is but I don't recommend parsing strings like this in general. The format for ICs is actually slightly awkward so that for a second order ODE it looks like:
ics = '{y(0):3, y(x).diff(x).subs(x, 0):1}'
If you're parsing a string then you can come up with a better syntax than that like
ics = "y(0)=3, y'(0)=1"
Also you should use parse_expr rather than converting strings with sympify or S:
https://docs.sympy.org/latest/modules/parsing.html#sympy.parsing.sympy_parser.parse_expr

Scala memory issue on List vs. Vector

I wrote a solution to project Euler problem #59 in Scala and I do not understand why switching between Vector and List adds what I think is a memory leak.
Here is a working, brute force solution using Vectors.
val code = scala.io.Source.fromFile("e59.txt").getLines()
.flatMap(l => l.split(',')).map(_.toInt).toVector
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector
def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}
def scoreText(text: Vector[Int]): Int = {
if (text.contains((c: Int) => (c < 0 || c > 128))) -1
else {
val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
words.length - words.diff(commonWords).length
}
}
lazy val psswds = for {
a <- (97 to 122);
b <- (97 to 122);
c <- (97 to 122)
} yield Vector(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
println(ans)
I store original code (a collection of ordered ints), each password and some common English words as Vectors.
However, if I replace Vector with List, my program slows down with each checked password and eventually crashes:
val code = scala.io.Source.fromFile("e59.txt").getLines()
.flatMap(l => l.split(',')).map(_.toInt).toList
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList
def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
println(pswd)
code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}
def scoreText(text: List[Int]): Int = {
if (text.contains((c: Int) => (c < 0 || c > 128))) -1
else {
val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
words.length - words.diff(commonWords).length
}
}
lazy val psswds = for {
a <- (97 to 122);
b <- (97 to 122);
c <- (97 to 122)
} yield List(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
println(ans)
Error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.valueOf(String.java:2861)
at java.lang.Character.toString(Character.java:4439)
at java.lang.String.valueOf(String.java:2847)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
at scala.collection.immutable.Stream.foreach(Stream.scala:595)
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
at Main$$anon$1.<init>(e59_list.scala:27)
at Main$.main(e59_list.scala:1)
at Main.main(e59_list.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)
Files used:
common_words.txt
a
able
about
across
after
all
almost
also
am
among
an
and
any
are
as
at
be
because
been
but
by
can
cannot
could
dear
did
do
does
either
else
ever
every
for
from
get
got
had
has
have
he
her
hers
him
his
how
however
i
if
in
into
is
it
its
just
least
let
like
likely
may
me
might
most
must
my
neither
no
nor
not
of
off
often
on
only
or
other
our
own
rather
said
say
says
she
should
since
so
some
than
that
the
their
them
then
there
these
they
this
tis
to
too
twas
us
wants
was
we
were
what
when
where
which
while
who
whom
why
will
with
would
yet
you
your
e59.txt
79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73
Large amount of Lists create more load on GC comparing to the same Vectors. But your problem is not about right choice of collections, but about wrong use of Stream.
Scala's streams can be very memory inefficient if used improperly. In your case, I assume, you were trying to use Stream to avoid eager computation of the transformed passwds collection, but you actually made the things worse (as Stream not only memoized your elements, it created extra overhead with Stream wrappers of these elements).
What you had to do is just to replace toStream with view. It will create collection wrapper which makes nearly all transformations lazy (basically what you tried to achieve).
val ans = psswds.view.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
After this tiny fix you program runs fine even with -Xmx5m (I checked).
There are also many other things to optimize in your program (try to avoid creating excessive collections), but I'll leave it to you.