XQuery get Random Text from a List

Suppose I have a list of 100 string elements and I want 50 of these strings to be returned at random.
I tried this:
let $list := ("a","b",..."element number 100")
return xdmp:random(100)
This query returns only one value; I want to get back 50 strings that are distinct from each other.

The easiest way is to order by xdmp:random() and keep the first 50:
(for $x in (1 to 100)
order by xdmp:random()
return $x
)[1 to 50]
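For readers who want to try the idea outside MarkLogic, here is a minimal Python sketch of the same "sort on a random key, keep the first k" technique. The function name sample_by_random_key and the sample data are made up for illustration; they are not part of the original answer.

```python
import random

def sample_by_random_key(items, k, rng=random):
    """Return k items chosen by sorting all items on a fresh random key."""
    # Pair every item with its own random key; the index breaks (very unlikely) ties.
    keyed = [(rng.random(), index, item) for index, item in enumerate(items)]
    keyed.sort()
    # Each position appears exactly once, so the first k picks never repeat a position.
    return [item for _, _, item in keyed[:k]]

strings = [f"element number {n}" for n in range(1, 101)]
picked = sample_by_random_key(strings, 50)
```

Because every position in the list gets exactly one key, the result cannot contain the same position twice, which is what makes this approach safe for picking distinct items.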

xdmp:random() (as well as xdmp:elapsed-time()) returns a different value at each call. It would be rather impractical if it didn't. This is in contrast to, for instance, fn:current-dateTime(), which gives the same value throughout one execution run.
Ghislain is making a good first attempt, but as BenW also pointed out, even though xdmp:random() returns a different result each time, nothing guarantees that those results are unique within one execution run. Collisions on its 64-bit max scale are rare (though still possible), but on a small scale like tens or hundreds some accidental repetition is likely. It is wise to remove each text from the list once it has been chosen.
BenW beat me to posting an alternative to Ghislain's, and it looks similar, but mine uses fewer lines. Posting it anyhow, in the hope someone finds it useful:
declare function local:getRandomTexts($list, $count) {
  if ($count > 0 and exists($list)) then
    let $random := xdmp:random(count($list) - 1) + 1
    let $text := $list[$random]
    return ($text, local:getRandomTexts($list[. != $text], $count - 1))
  else ()
};
let $list :=
  for $i in (1 to 26)
  return fn:codepoints-to-string(64 + $i)
for $t in local:getRandomTexts($list, 100)
order by $t
return $t
HTH!

If you are saying the 50 strings must be distinct from one another, as in, no repeats allowed, then even if xdmp:random() does return different values when called repeatedly in the same query, getting 50 random positions in the same list is not sufficient, because there may be repeats. You need to take each random position from a different list: the list that remains after removing everything picked so far.
declare function local:pickSomeFromList($some as xs:integer, $listIn as xs:string*, $listOut as xs:string*) as xs:string* {
  (: empty() avoids taking the effective boolean value of a multi-item sequence :)
  if ($some = 0 or empty($listIn)) then $listOut
  else
    let $random := xdmp:random(count($listIn) - 1) + 1
    return local:pickSomeFromList(
      $some - 1,
      ($listIn[fn:position() lt $random], $listIn[fn:position() gt $random]),
      ($listOut, $listIn[$random])
    )
};
let $list := ("a","b","c","d","e","f","g","h","i","element number 10")
return local:pickSomeFromList(5, $list, ())
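The pick-and-remove idea is not specific to XQuery. As a rough sketch in Python (the name pick_some and the sample list are only for illustration), each draw takes a random position in the remaining list and removes that item, so it cannot be drawn again:

```python
import random

def pick_some(items, count, rng=random):
    """Pick up to `count` items, removing each chosen item before the next draw."""
    remaining = list(items)  # work on a copy so the caller's list is untouched
    chosen = []
    while count > 0 and remaining:
        position = rng.randrange(len(remaining))  # 0 <= position < len(remaining)
        chosen.append(remaining.pop(position))    # removing it prevents repeats
        count -= 1
    return chosen

letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "element number 10"]
five = pick_some(letters, 5)
```

Asking for more items than the list holds simply returns the whole list in random order, which mirrors the stop condition of the recursive version.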

Assuming it returns a different result at every call (I cannot tell from the documentation of xdmp:random whether that is the case), the following code returns 50 strings picked at random from the list (but not necessarily distinct):
let $list := ("a","b",..."element number 100")
for $i in 1 to 50
let $position := 1 + xdmp:random(99)
return $list[$position]
However, the exact behavior of xdmp:random, that is, whether it returns identical results across calls, depends on how the MarkLogic engine supports or treats nondeterministic behavior, which is outside of the scope of the XQuery specification. Strict adherence to the specification would actually return 50 times the same result with the above query.
XQuery 3.1 provides a random number generator with which you can control the seed. This allows you to generate as many numbers as you want by chaining calls, while only using interoperable behavior and staying within a fully deterministic realm.
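To illustrate what "controlling the seed" buys you, here is a small Python sketch rather than XQuery 3.1 code. It assumes Python's standard random.Random class as a stand-in for a seeded generator; the function name seeded_sample is hypothetical.

```python
import random

def seeded_sample(items, k, seed):
    """Draw k distinct items using a private generator fully determined by `seed`."""
    rng = random.Random(seed)   # same seed, same sequence of draws
    return rng.sample(items, k)

strings = [f"element number {n}" for n in range(1, 101)]
first_run = seeded_sample(strings, 50, seed=42)
second_run = seeded_sample(strings, 50, seed=42)
```

Both runs yield the same 50 strings in the same order, so the whole computation stays deterministic: the randomness comes entirely from the seed you pass in.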
Edit: here is a query (still assuming calls to xdmp:random are made each time) that should make sure that 50 distinct strings from the list are taken following grtjn's remark. It uses a group by clause and relies on a lazy evaluation for taking the first 50.
let $list := ("a","b",..."element number 100")
let $positions := (
  for $i in 1 to 100000 (: can be adjusted to make sure we get 50 distinct :)
  group by $position := 1 + xdmp:random(count($list) - 1)
  return $position
)[position() le 50]
return $list[position() = $positions]
I think hunterhacker's proposal for computing the $positions is even better though.

TACL How to use multiple Arguments

I was wondering if we have any TACL experts out there who can help me answer what is probably a very basic question.
How do you pass multiple arguments into your routine?
This is what I have so far:
[#CASE [#ARGUMENT / VALUE job_id/number /minimum [min_job], maximum [max_job]/
otherwise]
|1|#output Job Number = [job_id]
|otherwise|
#output Bad number - Must be a number between [min_job] & [max_job]
#return
]
I have been told you need to use a second #ARGUMENT statement to get it to work, but I have had no luck with that. And the PDF guides don't help too much.
Any ideas/answers would be great.
Thanks.
The #CASE statement isn't required if your arguments are positional and of one type (i.e. you know what you are getting and in what order). In that case you can just use a sequence of #ARGUMENT statements to get the arguments.
In your example #ARGUMENT accepts either a number in a range or anything else - the OTHERWISE bit. The #CASE statement then tells you which of those two you got, 1 or 2.
#ARGUMENT can do data validation for you (you may recognize the output from some of the TACL routines that come with the operating system).
So you can write something like this:
SINK [#ARGUMENT / VALUE job_id/number /minimum [min_job], maximum [max_job]/]
The SINK just tosses away the expansion of the #ARGUMENT; you don't need it, since you only accept a number and fail otherwise.
I figured out a way, but I don't know if it is the best way to do it.
It seems that, for one, an #ARGUMENT statement always needs to be in a #CASE statement, so basically all I did was mirror the above and alter it for text rather than an integer.
If you know of any other/better ways, let me know :)
I find it best to use #CASE when you have multiple types of argument input to process. I mocked up how I would see multiple argument types being used with the #CASE expression in the context that you shared:
?TACL ROUTINE
#FRAME
#PUSH JOB_ID MIN_JOB MAX_JOB
#SETMANY MIN_JOB MAX_JOB , 1 3
[#DEF VALID_KEYWORDS TEXT |BODY| THISJOB THATJOB SOMEOTHERJOB]
[#CASE
[#ARGUMENT/VALUE JOB_ID/
NUMBER/MINIMUM [MIN_JOB],MAXIMUM [MAX_JOB]/
KEYWORD/WORDLIST [VALID_KEYWORDS]/
STRING
OTHERWISE
]
| 1 |
#OUTPUT VALID JOB NUMBER = [JOB_ID]
| 2 |
#OUTPUT VALID KEYWORD = [JOB_ID]
| 3 |
#OUTPUT VALID STRING = [JOB_ID]
| OTHERWISE |
#OUTPUT NOT A NUMBER, KEYWORD, OR A STRING
#OUTPUT MUST BE ONE OF:
#OUTPUT A NUMBER IN THE RANGE OF: [MIN_JOB] TO [MAX_JOB]
#OUTPUT A KEYWORD IN THIS LIST: [VALID_KEYWORDS]
#OUTPUT OR A STRING OF CHARACTERS
#RETURN
]
#OUTPUT
#OUTPUT NOW WE ARE USING ARGUMENT [JOB_ID] !!!
TIME
#UNFRAME
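If it helps to see the control flow outside TACL, the routine above can be modelled roughly like this in Python. This is only a sketch under the assumption that the alternatives are tried in the order listed and that the result is the number of the first one that accepts the input; classify_argument and its parameters are hypothetical names, not TACL built-ins.

```python
def classify_argument(text, min_job, max_job, keywords):
    """Return the 1-based number of the first alternative that accepts `text`, or 0."""
    # Alternative 1: a number within [min_job, max_job]
    if text.isdigit() and min_job <= int(text) <= max_job:
        return 1
    # Alternative 2: one of the allowed keywords (case-insensitive here by assumption)
    if text.upper() in keywords:
        return 2
    # Alternative 3: any non-empty string
    if text:
        return 3
    # Nothing matched: corresponds to the OTHERWISE branch
    return 0

VALID_KEYWORDS = {"THISJOB", "THATJOB", "SOMEOTHERJOB"}
```

Note that a catch-all alternative such as "any string" accepts almost every input, so in this model an out-of-range number like 7 ends up as alternative 3 rather than in the OTHERWISE branch.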

Check if all elements of list are prime in Raku

my @g = (1,2,3,4);
say reduce {is-prime}, @g; # ==> gives error
say reduce {is-prime *}, @g; # ==> gives error
say reduce {is-prime}, (1,2,3,4); # ==> gives error
say so is-prime @g.all; # ==> gives error
How to check if all elements of list are prime in Raku?
The answers above are all helpful, but they fail to explain why your solution does not work. Basically reduce is not going to apply a function (in your case, is-prime) to every member of a list. You want map for that. The error says
Calling is-prime() will never work with signature of the proto ($, *%)
That is because reduce expects an infix (and thus binary) operator, or a function of two arguments: it applies it to the first pair of elements, then to the result and the third element, and so on. The last statement does not work for a similar reason: you are calling is-prime with a list argument, not a single argument.
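The map-versus-reduce distinction is the same in most languages. A small Python sketch (with a hand-written is_prime helper, which is an assumption of this example and not Raku's built-in) shows map applying a one-argument function to each element, and reduce then folding the results pairwise:

```python
from functools import reduce

def is_prime(n):
    """Trial division; fine for small illustrative inputs."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True

values = [1, 2, 3, 4]
flags = list(map(is_prime, values))               # one call per element
all_prime = reduce(lambda a, b: a and b, flags)   # two arguments per call
```

Here map does the per-element work and reduce only combines two values at a time, which is why passing a one-argument function straight to reduce cannot work.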
You're basically asking: are there any elements in this list which are not prime? I would write that as:
say "not all prime" if @g.first: !*.is-prime;
Please note though, that apparently 1 is not considered prime according to the is-prime function:
say 1.is-prime; # False
so the first would trigger on the 1 in your example, not on the 4.
There are of course many ways to do this. A very explicit way is using a for loop:
for @g -> $g {
    if $g.is-prime {
        say $g;
    }
}
Or with a grep (you could leave the $_ implicit):
@g.grep({ $_.is-prime }).say
Both of the above assume you really want to filter out the primes. Of course you can also check each number and get a boolean:
@g.map({ .is-prime }).say
There is a big problem with this:
say reduce {is-prime}, @g;
You created a lambda:
{ }
The only thing it does is calls a function:
is-prime
You didn't give the function any arguments though.
Is it just supposed to guess what the arguments should be?
If you meant to pass in is-prime as a reference, you should have used &is-prime rather than {is-prime}.
Of course that still wouldn't have worked.
The other problem is that reduce operates by recursively combining values.
It can't do that if it operates on one argument at a time.
The bare block lambda {}, takes zero or one argument, not two or more.
reduce is often combined with map.
It happens so often that there is a Wikipedia page about MapReduce.
say ( map &is-prime, @g ==> reduce { $^a and $^b } );
# False
say ( map &is-prime, 2,3,5 ==> reduce { $^a and $^b } );
# True
I wrote it that way so that map would be in the line before reduce, but perhaps it would be more clear this way:
say reduce {$^a and $^b}, map &is-prime, 2,3,5;
# True
reduce with an infix operator is so common that there is a shorter way to write it.
say [and] map &is-prime, 2,3,5;
# True
Of course it would be better to just find the first value that isn't prime, and say the inverse.
Since if there is even a single value that isn't prime that would mean they can't all be primes.
You have to be careful though, as you may think something like this would always work:
not @g.first: !*.is-prime;
It does happen to work for the values you gave it, but may not always.
first returns Nil if it can't find the value.
not (2,3,5).first: !*.is-prime;
# not Nil === True
not (2,3,4).first: !*.is-prime;
# not 4 === False
not (2,3,0,4).first: !*.is-prime;
# not 0 === True
That last one returned 0 which when combined with not returns True.
You could fix this with defined.
not defined (2,3,0,4).first: !*.is-prime;
# False
This only works if first wouldn't return an undefined element that happens to be in the list.
(Int,Any).first: Real
# Int
defined (Int,Any).first: Real
# False
You could fix that by asking for the index instead of the value.
You of course still need defined.
(Int,Any).first: :k, Real
# 0
defined (Int,Any).first: :k, Real
# True
The other way to fix it is to just use grep.
not (2,3,0,4).grep: !*.is-prime;
# not (0,4) === False
Since grep always returns a List, you don't have to worry about checking for 0 or undefined elements.
(A List is True if it contains any elements, no matter what the values.)
grep is smart enough to know that if you coerce to Bool that it can stop upon finding the first value.
So it short-circuits the same as if you had used first.
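The "found value is itself falsy" trap exists in other languages too. As an illustration only, here is the same pitfall and fix sketched in Python, with a hand-written is_prime and hypothetical helper names:

```python
def is_prime(n):
    """Trial division; fine for small illustrative inputs."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True

def first_non_prime(values):
    """Return the first non-prime value, or None when every value is prime."""
    return next((v for v in values if not is_prime(v)), None)

def all_prime(values):
    # Compare against None explicitly: a found 0 is falsy, so `not 0` would
    # wrongly report that no non-prime was found.
    return first_non_prime(values) is None
```

With the list 2, 3, 0, 4 the search finds 0, and negating that directly gives True; testing "was anything found at all" instead of "is the found value truthy" gives the correct False.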
This results in some fairly funky code, with those two negating operators. So it should be put into a function.
sub all-prime ( +@_ ) {
    # return False if we find any non-prime
    not @_.grep: !*.is-prime
    # grep short-circuits in Bool context, so this will stop early
}
This could still fail if you give it something weird
all-prime 2,3,5, Date.today;
# ERROR: No such method 'is-prime' for invocant of type 'Date'
If you care, add some error handling.
sub all-prime ( +@_ ) {
    # return Nil if there was an error
    CATCH { default { return Nil }}
    # return False if we find any non-prime
    not @_.grep: !*.is-prime
}
all-prime 2,3,5, Date.today;
# Nil
Use the all junction:
say so all @g».is-prime; # False

(XQuery/Conditions) Is it possible to declare variables in an if-statement?

I could not find an example for my problem, so here is my question.
I get an error that else is an unexpected token in the following example:
let $var1 := 'true'
if ($var1 = 'true') then
let $var2 := 1
let $var3 := $var1 + 1
else ()
As you can see, I want to declare variables only if the condition is true. Is this possible in XQuery? I have only seen examples where the value of a single variable depends on a condition. The following does more or less what I want to achieve with the code at the beginning. It works, but it is a little confusing in my opinion, and I actually don't want the variables to be created at all if the condition is not true. Furthermore, you have to think in a roundabout way when you write it like that, especially when more than two variables depend on each other.
let $var1 := 'true'
let $var2 := if ($var1 = 'true') then (1) else (0)
let $var3 := if ($var2 = 1) then ($var2 + 1) else (0)
So my question is: is there a prettier way to do this than my solution?
You could add a return clause to put a full FLWOR expression inside the condition, e.g. something like this:
let $var1 := 'true'
return
  if ($var1 = 'true') then
    let $var2 := 1
    let $var3 := $var2 + 1
    return 0
  else ()
But it would be pointless: the binding of $var2 and $var3 would not extend outside of the scope of the then clause.
XQuery is a declarative and functional language, which means that variables do not get assigned, but only bound within a certain scope. This should be thought about in terms of space, not time, as no time elapses in an XQuery program: a binding is like a ticket that lets you visit one museum but not another.
Let clauses are part of FLWOR (an acronym for for-let-where-order by-return) expressions. A variable bound in a let clause can be used in subsequent clauses, up to and including the return clause. As mholstege explains, beyond the return clause, which is required, the variable is no longer visible, just as nobody would accept your ticket outside the museum.
Since expressions nest in a "well-parenthesized" way according to the XQuery grammar, any attempt to start a let clause inside an if-then-else expression requires that a return clause be present before the then (or else) expression ends. This means that a variable bound this way will never be visible after this if-then-else expression.
In general, when I program in XQuery (as opposed to, say, Java), I try to remind myself continuously that I have to write down what I want, and resist the temptation to describe how I want it computed.
Having said that, XQuery does have scripting extensions that introduce variable assignments as you describe, but this did not get standardized so far -- also, such a scripting extension should only be used when side effects to the outside world happen, meaning that one needs a notion of time and succeeding snapshots.
You could avoid using if/else altogether by defining sequences for your possible values, and a predicate that calculates the position() to select the desired value from the sequence:
The following uses number() to evaluate the numeric value of a boolean (0 for false, 1 for true) and selects either the first or the second item in the sequence of values:
let $var1 := 'true'
let $var2 := (0, 1)[number($var1 = 'true') + 1]
let $var3 := (0, $var2 + 1)[number($var2 eq 1) + 1]
return ($var1, $var2, $var3)
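The same "index into a sequence with a boolean" trick can be sketched in Python, purely as an illustration. Python sequences are 0-based, so the + 1 disappears, and the function name derive is made up for this example:

```python
def derive(var1):
    """Select values by position instead of branching with if/else."""
    # int(False) == 0 and int(True) == 1, so the comparison picks the position.
    var2 = (0, 1)[int(var1 == "true")]
    var3 = (0, var2 + 1)[int(var2 == 1)]
    return var2, var3
```

Calling derive("true") gives (1, 2) and derive("false") gives (0, 0). As in the XQuery version, every candidate value is computed up front, so this only suits cheap, side-effect-free expressions.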

cts:value-match on xs:dateTime() type in Marklogic

I have a variable $yearMonth := "2015-02"
I have to search for this date in an element Date of type xs:dateTime.
I want to use a wildcard pattern to find all documents having a date matching "2015-02-??".
I have a path range index enabled on ModifiedInfo/Date.
I am using the following code, but I get an "Invalid cast" error:
let $result := cts:value-match(cts:path-reference("ModifiedInfo/Date"), xs:dateTime("2015-02-??T??:??:??.????"))
I have also tried the following code and get the same error:
let $result := cts:value-match(cts:path-reference("ModifiedInfo/Date"), xs:dateTime(xs:date("2015-02-??"),xs:time("??:??:??.????")))
Kindly help :)
It seems you are trying to do a wildcard search on a path range index of data type xs:dateTime.
Currently, MarkLogic doesn't support this. There are multiple ways to handle this scenario:
1. You may create a field index.
2. You may change it to a string index, which supports wildcard search.
3. You may run this workaround to support your existing system:
for $x in cts:values(cts:path-reference("ModifiedInfo/Date"))
return if(starts-with(xs:string($x), '2015-02')) then $x else ()
This query fetches the values from the lexicon, after which you can filter for the dates you want.
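You can solve this by combining a couple of cts:element-range-query constructors inside an and-query. The example assumes an element range index of type xs:dateTime on Date; with the path range index from the question, cts:path-range-query("ModifiedInfo/Date", ...) takes the same operator and value arguments:
let $target := "2015-02"
let $low := xs:dateTime($target || "-01T00:00:00")
let $high := $low + xs:yearMonthDuration("P1M")
return
  cts:search(
    fn:doc(),
    cts:and-query((
      cts:element-range-query(xs:QName("Date"), ">=", $low),
      cts:element-range-query(xs:QName("Date"), "<", $high)
    ))
  )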
From the cts:element-range-query documentation:
If you want to constrain on a range of values, you can combine multiple cts:element-range-query constructors together with cts:and-query or any of the other composable cts:query constructors, as in the last part of the example below.
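The range trick boils down to turning "2015-02" into a half-open interval [first of the month, first of the next month). A minimal Python sketch of just that arithmetic, with hypothetical helper names and no MarkLogic involved:

```python
from datetime import datetime

def month_bounds(year_month):
    """Return (low, high) so that low <= value < high covers the whole month."""
    year, month = (int(part) for part in year_month.split("-"))
    low = datetime(year, month, 1)
    # Roll over to January of the next year after December.
    high = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return low, high

def in_month(value, year_month):
    low, high = month_bounds(year_month)
    return low <= value < high
```

Using a strict "less than" on the upper bound avoids having to know how many days the month has or how precise the stored timestamps are.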
You could also consider doing a cts:values with a cts:query param that searches for values between for instance 2015-02-01 and 2015-03-01. Mind though, if multiple dates occur within one document, you will need to post filter manually after all (like in option 3 of Navin), but it could potentially speed up post-filtering a lot..
HTH!

Splitting a list of strings into 2 lists, one containing single values and the other multiples

I have a large list of strings, it's a TStringList and is sorted by key. The structure is ('Key', Obj).
In this list there are single and repeated key values. I'm trying to split them into two separate lists, one for the single values and one for the repeated ones.
If my initial list is {A,A,A,B,B,C,D,E,E,F} then the result should be a list of singles = {C,D,F} and a list of repeats = {A,B,E}.
I've tried many different variations of the same code to try and get it to work but I'm having problems, a lot of them. :)
I'm getting key[i] and key[i+1] and comparing them; if they're the same I save [i+1] into a temp string and set a bool value. I then run some conditional checks to determine which list it should go in, but it is just failing at the moment.
It seems like it should be such an easy thing to accomplish and I am somewhat embarrassed to have to ask. Any help is greatly appreciated, thank you.
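// Assumes List is sorted by key; StartIndex marks the start of the current run of equal keys
if List.Count > 0 then begin
  StartIndex := 0;
  for i := 1 to List.Count - 1 do
    if List[i].Key <> List[StartIndex].Key then begin
      // the run [StartIndex .. i-1] has ended; a run of length 1 is a single
      if i - StartIndex = 1 then
        SingleList.Add(List[StartIndex])
      else
        MultiList.Add(List[StartIndex]);
      StartIndex := i;
    end;
  // check for the last chunk
  if StartIndex = List.Count - 1 then
    SingleList.Add(List[StartIndex])
  else
    MultiList.Add(List[StartIndex]);
end;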
I suspect that the mistake you're making is that you don't select the correct index for the next comparison when key[i] = key[i+1]. So if the input is A,A,A,B,B, you first compare with i := 0, i.e. [0] = [1] (the first two A's), which is True. Then you probably proceed with i := 2, which means that you compare the last A to the first B, and that causes the bug. The solution is to move the index to the first item after the last key that equals the current matching key, i.e. something like
while (list.Count > i) and (list.keys[i] = currentKey) do Inc(i);
at the appropriate place, which should make sure that the next key comparison is made with the correct list items.
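For comparison, the run-based approach can be sketched in a few lines of Python, where itertools.groupby does the "advance past all equal keys" step. This is an illustration with a hypothetical function name, not a translation of the Delphi code, and it assumes the input is already sorted by key:

```python
from itertools import groupby

def split_singles_and_repeats(sorted_keys):
    """Split sorted keys into those occurring once and those occurring several times."""
    singles, repeats = [], []
    for key, run in groupby(sorted_keys):   # one group per run of equal keys
        run_length = sum(1 for _ in run)
        (singles if run_length == 1 else repeats).append(key)
    return singles, repeats
```

For the input A,A,A,B,B,C,D,E,E,F this yields singles C,D,F and repeats A,B,E, matching the expected result from the question.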