Sitecore 7 ContentSearch - Sort by random - sitecore

Using the ContentSearch Linq API in Sitecore 7, how might I go about efficiently taking a random selection of, say, 3 search results from around 1500 potential results?
So far I'm considering using the API to return an entire list of IDs (seeing as 1500 results isn't that large), and then doing the rest in code.
Can somebody point me in the right direction of what I'd need to do to be able to achieve this directly from Lucene?

If you're dealing with a smaller subset of items, it might be easiest for you to randomly shuffle the result set of SkinnyItems using Fisher-Yates or any other shuffling algorithm.
To shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
j ← random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
Source
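As a sketch of that idea (assuming the IDs of the ~1500 results have already been pulled into memory; C++ is used here purely for illustration, and the pickRandom helper is made up for this sketch - in a Sitecore solution you would write the same thing in C# against your list of item IDs), a partial Fisher-Yates shuffle run from the front stops as soon as the first 3 positions are random:

#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Partial Fisher-Yates shuffle: only the first `count` positions are
// randomized, which is all you need when picking `count` random results
// out of the full list of IDs.
std::vector<std::string> pickRandom(std::vector<std::string> ids, std::size_t count)
{
    std::random_device rd;
    std::mt19937 gen(rd());

    count = std::min(count, ids.size());
    for (std::size_t i = 0; i < count; ++i)
    {
        // Swap position i with a uniformly chosen position in [i, n-1].
        std::uniform_int_distribution<std::size_t> pick(i, ids.size() - 1);
        std::swap(ids[i], ids[pick(gen)]);
    }
    ids.resize(count); // keep only the `count` random picks
    return ids;
}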
I'm not too familiar with Sitecore 7 yet, so if there's an easier way to do it I hope someone can provide it.

You can try the custom sort option as described here: Lucene 2.9.2: How to show results in random order?
In our experience, however, this did not perform any better than randomizing all the results...
For that there are several options; see, for example: Linq to Entities, random order.

Stevie, have a read of this question and answer, which might give you some inspiration as to how to go about it.
I'd also recommend reading this article on the Sitecore Community, as suggested by Stephen Pope.

Related

Riddle puzzle in clingo

So in the prolog tag someone wanted to solve "the giant cat army riddle" by Dan Finkel (see the video / link for a description of the puzzle).
Since I want to improve at answer set programming, I hereby challenge you to solve the puzzle more efficiently than me. You will find my solution as an answer. I'll accept the fastest-running answer (unless it uses dirty hacks).
Rules:
Hardcoding the length of the list (or something similar) counts as a dirty hack.
The output has to be in the predicate r/2, where its first argument is the index of the list and the second is its entry.
The time measured is that to the first valid answer.
num(0..59).
%valid operation pairs
op(N*N,N):- N=2..7.
% no need to add operations that start with 14
op(Ori,New):- num(Ori), New = Ori+7, num(New), Ori!=14.
op(Ori,New):- num(Ori), New = Ori+5, num(New), Ori!=14.
%iteratively create new numbers from old numbers
l(0,0).
{l(T+1,New) : op(Old,New)} = 1 :- l(T,Old), num(T+1), op(Old,_).
%no number twice
:- 2 #sum {1,T : l(T,Value)}, num(Value).
%2 before 10 before 14
%linear encoding
reached(T,10) :- l(T,10).
reached(T+1,10) :- reached(T,10), num(T+1).
:- reached(T,10), l(T,2).
:- l(T,14), l(T+1,_).
%looks nicer, but quadratic
%:- l(T2,2), l(T10,10), T10<T2.
%:- l(T14,14), l(T10,10), T14<T10.
%we must have these three numbers in the list somewhere
:- not l(_,2).
:- not l(_,10).
:- not l(_,14).
#show r(T,V) : l(T,V).
#show.
Having a slightly uglier encoding improves grounding a lot (which was your main problem).
I restricted op/2 to not start with 14, as this should be the last element in the list.
I create the list iteratively; this may not be as nice, but at least for the start of the list it already removes values that are impossible to reach via grounding, so you will never have l(1,33) or l(2,45), etc.
List generation also stops when the value 14 is reached, as no further operation is possible or needed.
I also added a linearly scaling version of the "before" section, although it is not really necessary for this short list (but it is a cool trick in general if you have long lists!). This is called "chaining".
Also note that your #show statement is non-trivial and does create some constraints/variables.
I hope this helps; otherwise feel free to ask such questions on our Potassco mailing list ;)
My first attempt is to generate a permutation of numbers and force successor elements to be connected by one of the 3 operations (+5, +7 or sqrt). I predefine the operations to avoid choosing/counting problems. Testing for <60 is not necessary since the output of an operation has to be a number between 0 and 59. The generated list l/2 is forwarded to the output r/2 until the number 14 appears. I guess there is plenty of room to outrun my solution.
num(0..59).
%valid operation pairs
op(N*N,N):- N=2..7.
op(Ori,New):- num(Ori), New = Ori+7, num(New).
op(Ori,New):- num(Ori), New = Ori+5, num(New).
%for each position one number
l(0,0).
{l(T,N):num(N)}==1:-num(T).
{l(T,N):num(T)}==1:-num(N).
% following numbers are connected with an operation until 14
:- l(T,Ori), not op(Ori,New), l(T+1,New), l(End,14), T+1<=End.
% 2 before 10 before 14
:- l(T2,2), l(T10,10), T10<T2.
:- l(T14,14), l(T10,10), T14<T10.
% output
r(T,E):- l(T,E), l(End,14), T<=End.
#show r/2.
First answer:
r(0,0) r(1,5) r(2,12) r(3,19) r(4,26) r(5,31) r(6,36) r(7,6)
r(8,11) r(9,16) r(10,4) r(11,2) r(12,9) r(13,3) r(14,10) r(15,15)
r(16,20) r(17,25) r(18,30) r(19,37) r(20,42) r(21,49) r(22,7) r(23,14)
There are multiple possible lists of different lengths.

Sentiment analysis with association rule mining

I am trying to come up with an algorithm to find the top-3 most frequently used adjectives for a product in the same sentence. I want to use association rule mining (the Apriori algorithm).
For that I am planning on using Twitter data. I can more or less decompose tweets into sentences, and then with filtering I can find product names and the adjectives that go with them.
For instance, after filtering I have data like:
ipad mini, great
ipad mini, horrible
samsung galaxy s2, best
...
etc.
Product names and adjectives are defined in advance, so I have a set of product names and a set of adjectives that I am looking for.
I have read a couple of papers about sentiment analysis and rule mining, and they all say the Apriori algorithm is used. But they don't say how they used it and they don't give details.
Therefore, how can I reduce my problem to an association rule mining problem?
What values should I use for minsup and minconf?
How can I modify Apriori algorithm to solve this problem?
What I'm thinking is:
I should find frequent adjectives separately for each product. Then by sorting I can get the top-3 adjectives. But I do not know if this is correct.
Finding the top-3 most used adjectives for each product is not association rule mining.
For Apriori to yield good results, you must be interested in itemsets of length 4 and more. Apriori pruning starts at length 3 and begins to yield major gains at length 4. At length 2, it is mostly enumerating all pairs. And if you are only interested in pairs (product, adjective), then Apriori is doing much more work than necessary.
Instead, use counting. Use hash tables. If you really have exabytes of data, use approximate counting and heavy-hitter algorithms. (But most likely, you don't have exabytes of data after extracting those pairs...)
Don't bother to investigate association rule mining if you only need to solve this much simpler problem.
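As a rough sketch of the counting approach (plain C++ with a hash table; the sample pairs are just the ones from the question, and the program structure is made up for illustration):

#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Count (product, adjective) pairs in a hash table and print the three most
// frequent adjectives per product.
int main()
{
    std::vector<std::pair<std::string, std::string>> pairs = {
        {"ipad mini", "great"}, {"ipad mini", "horrible"},
        {"ipad mini", "great"}, {"samsung galaxy s2", "best"}};

    // product -> (adjective -> count)
    std::unordered_map<std::string, std::unordered_map<std::string, int>> counts;
    for (const auto& p : pairs)
        ++counts[p.first][p.second];

    for (const auto& entry : counts)
    {
        std::vector<std::pair<std::string, int>> adjectives(entry.second.begin(),
                                                            entry.second.end());
        // Sort this product's adjectives by descending frequency and keep the top 3.
        std::sort(adjectives.begin(), adjectives.end(),
                  [](const auto& a, const auto& b) { return a.second > b.second; });

        std::cout << entry.first << ":";
        for (std::size_t i = 0; i < adjectives.size() && i < 3; ++i)
            std::cout << " " << adjectives[i].first << " (" << adjectives[i].second << ")";
        std::cout << "\n";
    }
}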
Association rule mining is really only for finding patterns such as
pasta, tomato, onion -> basil
and more complex rules. The contribution of Apriori is to reduce the number of candidates when going from length n-1 to length n, for n > 2. And it gets more effective when n > 3.
Reducing your problem to Association Rule Mining (ARM)
Create a feature vector containing all the topics and adjectives. If a feed contains a topic, place a 1 for it in the tuple, else 0. For example, let us assume the topics are Samsung and Apple, the adjectives are good and horrible, and a feed contains "Samsung good". Then the corresponding tuple for it is:
Samsung Apple good horrible
1 0 1 0
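A minimal sketch of that encoding step (C++ here purely for illustration; the vocabularies and the toTuple name are assumptions, not part of any library):

#include <string>
#include <vector>

// Turn one feed's tokens into a 0/1 tuple over the fixed vocabulary:
// topics first, then adjectives (as in the row above).
std::vector<int> toTuple(const std::vector<std::string>& tokens,
                         const std::vector<std::string>& topics,
                         const std::vector<std::string>& adjectives)
{
    std::vector<int> tuple(topics.size() + adjectives.size(), 0);
    for (const auto& token : tokens)
    {
        for (std::size_t i = 0; i < topics.size(); ++i)
            if (token == topics[i]) tuple[i] = 1;
        for (std::size_t i = 0; i < adjectives.size(); ++i)
            if (token == adjectives[i]) tuple[topics.size() + i] = 1;
    }
    return tuple;
}

// toTuple({"Samsung", "good"}, {"Samsung", "Apple"}, {"good", "horrible"})
// yields {1, 0, 1, 0}, matching the row above.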
Modification to the Apriori algorithm required:
Generate association rules of the type 'topic' --> 'adjective' using a constrained Apriori algorithm; 'topic' --> 'adjective' is the constraint.
How to set MinSup and MinConf:
Read the paper entitled "Mining top-k association rules" and implement that with k=3 for the top 3 adjectives.

Simple Curve Fitting Implementation in C++ (SVD Least Squares Fit or similar)

I have been scouring the internet for quite some time now, trying to find a simple, intuitive, and fast way to approximate a 2nd degree polynomial using 5 data points.
I am using VC++ 2008.
I have come across many libraries, such as cminipack, cmpfit, lmfit, etc... but none of them seem very intuitive and I have had a hard time implementing the code.
Ultimately I have a set of discrete values in a 1D array, and I am trying to find the 'virtual max point' by curve fitting the data and then finding the max point of that fit at a non-integer position (whereas just reading the array directly would only give the max at an integer index).
Anyway, if someone has done something similar to this, and can point me to the package they used, and maybe a simple implementation of the package, that would be great!
I am happy to provide some test data and graphs to show you what kind of stuff I'm working with, but I feel my request is pretty straightforward. Thank you so much.
EDIT: Here is the code I wrote, which works!
http://pastebin.com/tUvKmGPn
Change size to change how many inputs are used.
0 0
1 1
2 4
4 16
7 49
a: 1 b: 0 c: 0
Press any key to continue . . .
Thanks for the help!
Assuming that you want to fit a standard parabola of the form
y = ax^2 + bx + c
to your 5 data points, then all you will need to do is solve a 3x3 matrix equation. Take a look at this example http://www.personal.psu.edu/jhm/f90/lectures/lsq2.html - it works through the same problem you seem to be describing (only using more data points). If you have a basic grasp of calculus and are able to invert a 3x3 matrix (or do something nicer numerically - which I am guessing you can, given that you refer specifically to SVD in your question title), then this example will clarify what you need to do.
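Here is a rough, self-contained C++ sketch of that approach, fitting y = ax^2 + bx + c to the five sample points from the question's edit by building and solving the 3x3 normal equations directly (no SVD and no external library; the function name is made up for this sketch):

#include <array>
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

// Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations,
// solved with Gauss-Jordan elimination.
std::array<double, 3> fitParabola(const std::vector<double>& x,
                                  const std::vector<double>& y)
{
    double S[5] = {0, 0, 0, 0, 0}; // S[k] = sum of x^k over all points
    double T[3] = {0, 0, 0};       // T[k] = sum of x^k * y over all points
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        double xp = 1.0;
        for (int k = 0; k < 5; ++k)
        {
            S[k] += xp;
            if (k < 3) T[k] += xp * y[i];
            xp *= x[i];
        }
    }

    // Augmented matrix of the normal equations; the unknowns are (c, b, a).
    double M[3][4] = {{S[0], S[1], S[2], T[0]},
                      {S[1], S[2], S[3], T[1]},
                      {S[2], S[3], S[4], T[2]}};

    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 3; ++col)
    {
        int pivot = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[pivot][col])) pivot = r;
        std::swap(M[col], M[pivot]);
        for (int r = 0; r < 3; ++r)
        {
            if (r == col) continue;
            double f = M[r][col] / M[col][col];
            for (int j = col; j < 4; ++j) M[r][j] -= f * M[col][j];
        }
    }

    double c = M[0][3] / M[0][0];
    double b = M[1][3] / M[1][1];
    double a = M[2][3] / M[2][2];
    return {a, b, c};
}

int main()
{
    // The five sample points from the question's edit (they lie on y = x^2).
    std::vector<double> x = {0, 1, 2, 4, 7}, y = {0, 1, 4, 16, 49};
    std::array<double, 3> coeff = fitParabola(x, y);
    std::cout << "a: " << coeff[0] << " b: " << coeff[1] << " c: " << coeff[2] << "\n";
    // The 'virtual' extremum of the fitted parabola sits at x = -b / (2a).
    std::cout << "extremum at x = " << -coeff[1] / (2 * coeff[0]) << "\n";
}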
Look at this Wikipedia page on Polynomial Regression.

The New Villa ACM solution strategy

I am trying to solve this ACM problem, The New Villa,
and I am not figuring out how to approach it. It is definitely a graph problem, but the doors and the rooms that have switches for other rooms make it confusing to come up with a generic solution. Can somebody help me define a strategy for this problem?
Also, I would like a discussion forum for ACM problems; if you know of one, please share.
Thanks,
A.S
It seems like a pathfinding problem on states.
You can represent each vertex with a binary vector of size n plus an identifier for which room you are in at the moment [n is the number of rooms].
G=(V,E) where V = {all binary vectors of size n plus a record of which room you are in} and E = {(u,v) | you can switch from binary vector u to v by clicking a button in the room you are in, or move to an adjacent room whose light is on}
Now you only need to run a search algorithm on the possible paths.
Possible search algorithms:
BFS - simplest to program, though slowest run time
bi-directional BFS - since there is only one target node, a bi-directional search will work here; it is expected to be much faster than BFS
A* - find an admissible heuristic function and run informed A* on the problem. It is harder to program than the rest - but if you find a good heuristic, it will most likely perform much better.
(*) All of the above are both complete [they will find a solution if one exists] and optimal [they will find the shortest solution, if one exists].
(*) This solution runs in exponential time in the number of rooms, but it should finish in reasonable time for d <= 10, as indicated in the problem.
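As a rough illustration (not a full solution), here is a C++ sketch of the state encoding and a plain BFS over it; the Puzzle structure, the parsing, and the exact goal test used here (ending in the last room with only its light on) are assumptions based on the usual statement of the problem:

#include <cstdint>
#include <map>
#include <queue>
#include <utility>
#include <vector>

struct Puzzle
{
    int rooms;                              // number of rooms (d <= 10)
    std::vector<std::vector<int>> doors;    // doors[r]    = rooms adjacent to r
    std::vector<std::vector<int>> switches; // switches[r] = lights toggleable from r
};

// BFS over (current room, bitmask of lights that are on); at most d * 2^d states.
int shortestStepCount(const Puzzle& p)
{
    const int start = 0, target = p.rooms - 1;
    using State = std::pair<int, std::uint32_t>;

    State init{start, 1u << start};         // only the first room's light is on
    std::map<State, int> dist{{init, 0}};
    std::queue<State> q;
    q.push(init);

    while (!q.empty())
    {
        auto [room, mask] = q.front(); q.pop();
        int d = dist[{room, mask}];

        // Assumed goal: stand in the last room with only its light still on.
        if (room == target && mask == (1u << target)) return d;

        auto visit = [&](const State& next) {
            if (dist.emplace(next, d + 1).second) q.push(next);
        };

        // Move through a door into an adjacent room whose light is on.
        for (int next : p.doors[room])
            if (mask & (1u << next)) visit({next, mask});

        // Toggle a switch, but never the light of the room we are standing in.
        for (int light : p.switches[room])
            if (light != room) visit({room, mask ^ (1u << light)});
    }
    return -1; // the target configuration is unreachable
}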

C++ pathfinding with a-star, optimization

I'm wondering if I can optimize my pathfinding code a bit; let's look at this map:
+ - wall, . - free, S - start, F - finish
.S.............
...............
..........+++..
..........+F+..
..........+++..
...............
A human will look at it and say it's impossible, because the finish is surrounded... But A* MUST check all fields to ascertain that there isn't a possible road. Well, it's not a problem with small maps. But when I have a 256x256 map, it takes a lot of time to check all points. I think that I can stop searching when there are closed nodes all around the finish, I mean:
+ - wall, . - free, S - start, F - finish, X - closed node
.S.............
.........XXXXX.
.........X+++X.
.........X+F+X.
.........X+++X.
.........XXXXX.
And I want to stop in this situation (there is no entrance to the "room" with the finish). I thought about checking h, and finishing when none of the open nodes is getting any closer... but I'm not sure if that's OK; maybe there is a better way?
Thanks for any replies.
First of all, this problem is better solved with breadth-first search, but I will assume you have a good reason to use A* instead. However, I still recommend you first check the connectivity between S and F with some kind of search (breadth-first or depth-first). This will solve your issue.
Assuming the map doesn't change, you can preprocess it by dividing it into connected components. It can be done with a fast disjoint-set data structure. Then, before launching A*, you check in constant time that the source and destination belong to the same component. If not, no path exists; otherwise you run A* to find the path.
The downside is that you will need an additional n bits per cell, where n = ceil(log C) and C is the number of connected components. If you have enough memory and can afford it, then it's OK.
Edit: in case you fix n to be small (e.g. one byte) and have more components than that (e.g. more than 256 for an 8-bit n), then you can assign the same number to multiple components. To achieve the best results, make sure each component id has nearly the same number of cells assigned to it.
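For illustration, here is a minimal C++ sketch of that preprocessing step, labeling components with a flood fill on a grid like the one in the question (a disjoint-set structure would work just as well); the Grid type and function name are assumptions:

#include <queue>
#include <string>
#include <utility>
#include <vector>

// Label every walkable cell with a connected-component id via flood fill
// (4-neighbour connectivity). Grid is assumed to be one string per row,
// '+' for walls, anything else walkable.
using Grid = std::vector<std::string>;

std::vector<std::vector<int>> labelComponents(const Grid& map)
{
    const int h = static_cast<int>(map.size());
    const int w = map.empty() ? 0 : static_cast<int>(map[0].size());
    std::vector<std::vector<int>> label(h, std::vector<int>(w, -1));
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    int nextId = 0;

    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
        {
            if (map[r][c] == '+' || label[r][c] != -1) continue;
            // Flood-fill the component that contains (r, c).
            std::queue<std::pair<int, int>> q;
            label[r][c] = nextId;
            q.push({r, c});
            while (!q.empty())
            {
                auto [cr, cc] = q.front(); q.pop();
                for (int k = 0; k < 4; ++k)
                {
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr < 0 || nr >= h || nc < 0 || nc >= w) continue;
                    if (map[nr][nc] == '+' || label[nr][nc] != -1) continue;
                    label[nr][nc] = nextId;
                    q.push({nr, nc});
                }
            }
            ++nextId;
        }
    return label;
}

// Usage: compute the labels once after loading the map, then before A*:
//   if (label[startRow][startCol] != label[finishRow][finishCol]) { /* no path, skip A* */ }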