Bubble sort with variable number of inputs - bubble-sort

I am working on a bubble sort program for the Little Man Computer and I want it to have a variable number of inputs (like 500), after which the program will stop taking inputs and will sort the values from least to greatest.
Note that zero should be accepted as a number in the bubble sort. So if the inputs are 3, 5, 6, 0 then it should sort them to 0, 3, 5, 6.

The idea is to reserve the very first input for the length of the rest of the input. This way you can know when all the values have been taken. So in your example:
3 5 6 0
The actual input values would have to be
4 3 5 6 0
...where 4 tells us that 4 data values are following.
So this means that the program would start with something like:
INP
BRZ quit ; nothing to do
STA size
; .... other code ....
quit HLT
size DAT
Then the code would need to use this size to initialise a counter, and take the remaining inputs
LDA size
SUB one
loop STA counter
INP ; take the next input
; .... process this value ....
LDA counter ; decrement the counter
SUB one
BRP loop ; while no underflow: repeat
; ... other processing on the collected input ...
quit HLT
counter DAT
When you have several loops, possibly nested, as is the case with bubble sort, you'll have to manage multiple counters.
Applied to Bubble Sort
In this answer you'll find an implementation of Bubble Sort where the input needs to be terminated by a 0. Here I provide you a variation of that solution where 0 no longer serves as an input terminator, but where the first input denotes the length of the array of values that follows in the input.
Note that this makes the code somewhat longer, and as a consequence the space that remains for storing the input array becomes smaller: here only 25 mailboxes remain available for the array. On a standard LMC it would never be possible to store 500 inputs, as there are only 100 mailboxes in total, and code occupies some of these mailboxes.
In the algorithm (after having loaded the input), the outer loop needs to iterate size-1 times, and the inner loop needs to iterate one time less each time the outer loop makes an iteration (this is the standard principle of Bubble Sort).
#input: 10 4 3 2 1 0 9 8 5 6 7
LDA setfirst
STA setcurr1
INP
BRZ zero ; nothing to do
SUB one
STA size ; actually one less
input STA counter1
INP
setcurr1 STA array
LDA setcurr1
ADD one
STA setcurr1
LDA counter1
SUB one
BRP input
LDA size
BRA dec
sort STA counter1
LDA getfirst
STA getcurr1
STA getcurr2
LDA setfirst
STA setcurr2
LDA cmpfirst
STA cmpcurr
LDA counter1
loop STA counter2
LDA getcurr1
ADD one
STA getnext1
STA getnext2
LDA setcurr2
ADD one
STA setnext
getnext1 LDA array
cmpcurr SUB array
BRP inc
getcurr1 LDA array
STA temp
getnext2 LDA array
setcurr2 STA array
LDA temp
setnext STA array
inc LDA getnext1
STA getcurr1
LDA setnext
STA setcurr2
LDA cmpcurr
ADD one
STA cmpcurr
LDA counter2
SUB one
BRP loop
LDA counter1
dec SUB one
BRP sort
LDA size
output STA counter1
getcurr2 LDA array
OUT
LDA getcurr2
ADD one
STA getcurr2
LDA counter1
SUB one
BRP output
zero HLT
one DAT 1
getfirst LDA array
setfirst STA array
cmpfirst SUB array
size DAT
counter1 DAT
counter2 DAT
temp DAT
array DAT
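For readers who want to check the control flow outside an LMC simulator, here is a minimal Python sketch of the same idea: the first input gives the count, zero is an ordinary data value, the outer loop runs size-1 times, and the inner loop shrinks by one on each pass. The function name `sort_length_prefixed` is made up for illustration; this is not a translation of the LMC code instruction by instruction.

```python
def sort_length_prefixed(inputs):
    """Bubble-sort a length-prefixed input sequence (first value = count)."""
    stream = iter(inputs)
    size = next(stream)                           # first input: how many values follow
    values = [next(stream) for _ in range(size)]  # zero is accepted as a data value
    # Outer loop: size-1 passes. Inner loop: one comparison fewer on each pass.
    for last in range(size - 1, 0, -1):
        for i in range(last):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
    return values

print(sort_length_prefixed([4, 3, 5, 6, 0]))  # [0, 3, 5, 6]
```

A count of 0 simply yields an empty list, which corresponds to the `BRZ zero` early exit in the LMC program.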

This is the final code and some basic information.
// Basic Outline
// 1) Initialize (may be empty)
// 2) Input Count
// 3) Handle Special Cases, GoTo 1 (will now be no special cases)
// 4) Input List
// 5) Sort the list (using Bubblesort)
// 6) Output List
// 7) GoTo 1
//
// Program uses an LMCe, same as an LMC except that it has an extra digit.
//The number of memory cells is thus 1000 and the range of values is from 0 to 9999.
//
// Memory Map
//
// 0 – 79 the program
// 80-87 unused (may be used to test sorting in LMCs)
// 88-99 constants and variables
// 100 – 999 the list to be sorted.
//
// INITIALIZE (This section is blank)
//
// INPUT COUNT
//
000 IN 9001 // input count
001 STO 090 3090 // store count
//
// SPECIAL CASES (This section is now blank)
//
// INPUT LIST
//
002 LDA 096 5096 // STO
003 ADD 095 1095 // Determine first location
004 STO 011 3011 // Overwrite STO instruction for list
005 ADD 090 1090
006 STO 092 3092 // Store STO + LOC + Count to determine end
//
// INPUT LIST LOOP
007 LDA 011 5011 // Load manipulated instruction (using as counter)
008 SUB 092 2092 //
009 BRZ 016 7016 // If last count, go to END INPUT LIST
010 IN 9001 //
011 DAT 0 // manipulated instruction (store input in list)
012 LDA 011 5011
013 ADD 098 1098 // increment store instruction (to next list location)
014 STO 011 3011 // Update STO instruction
015 BR 007 6007 // GOTO INPUT LIST LOOP
//
// END INPUT LIST
//
// BUBBLESORT
// Note: the ‘to’ is inclusive.
//
// for I = 0 to count – 1 do (may not be inclusive)
// for j = count – 1 downto I + 1 do (may be inclusive)
// if A[j] < A[j-1]
// then exchange A[j] and A[j-1]
// end do
// end do
//
// If count < 2, then skip bubble sort
016 LDA 098 5098
017 SUB 090 2090 // 1 – count
018 BRP 061 8061 // GO TO END I LOOP
//
// Initialize ‘I’ Counter
019 LDA 099 5099
020 STO 092 3092 // set I to zero (0)
//
// START I LOOP
//
021 LDA 090 5090
022 SUB 098 2098 // COUNT - 1
023 SUB 092 2092 // COUNT -1 – I
024 BRZ 061 7061 // if(I == count - 1) GOTO END I LOOP
//
// Initialize J
025 LDA 090 5090
026 SUB 098 2098
027 STO 093 3093 // J = Count – 1
//
// START J LOOP
//
028 LDA 092 5092 // I
029 SUB 093 2093 // I - J
030 BRP 057 8057 // If I >= J, then GO END J LOOP
//
// Compare A[j] and A[j-1]
//
// Load A[j] into variable
031 LDA 097 5097 // load LDA instruction numeric code
032 ADD 095 1095 // set to LDA 500
033 ADD 093 1093 // set to LDA [500 + j] or A[j]
034 STO 039 3039 // reset instruction
035 SUB 098 2098 // set to LDA [500 + j – 1] or A[j-1]
036 STO 037 3037 // reset instruction
//
// Load and compare A[j] and A[j-1]
037 DAT 0 // load A[j-1] (instruction is manipulated)
038 STO 088 3088
039 DAT 0 // load A[j] (instruction is manipulated)
040 STO 089 3089
041 SUB 088 2088 // A[j] – A[j-1] (swap if negative)
042 BRP 053 8053 // GOTO DECREMENT J
//
// swap the variables
//
// set up the STO variables
043 LDA 096 5096 // load STO instruction code
044 ADD 095 1095 // set to STO 500
045 ADD 093 1093 // set to STO [500 + j]
046 STO 052 3052 // reset instruction
047 SUB 098 2098 // set to STO [500 + j – 1]
048 STO 050 3050 // reset instruction
//
// do the swap (no need for a variable since they are already stored)
049 LDA 089 5089 // load A[j]
050 DAT 0 // Store in A[j-1] (instruction is manipulated)
051 LDA 088 5088 // load A[j-1]
052 DAT 0 // Store in A[j] (instruction is manipulated)
//
// DECREMENT J
//
053 LDA 093 5093
054 SUB 098 2098
055 STO 093 3093 // J = J – 1
056 BR 028 6028 // GOTO START J LOOP
//
// END J LOOP
//
// Increment I
057 LDA 092 5092
058 ADD 098 1098
059 STO 092 3092 // I = I + 1
060 BR 021 6021 // GOTO START I LOOP
//
// END I LOOP (End Bubblesort)
//
// OUTPUT COUNT
//
061 LDA 090 5090 // Count
062 OUT 9002
//
// OUTPUT LIST (now sorted)
// Initialize
063 LDA 097 5097
064 ADD 095 1095 // LDA + LOC
065 STO 071 3071 // set up instruction
066 ADD 090 1090 // LDA + LOC + Count
067 STO 092 3092 // store unreachable instruction
//
// OUTPUT LIST LOOP
068 LDA 071 5071 // load manipulated instruction (used as counter)
069 SUB 092 2092
070 BRZ 077 7077 // GOTO END OUTPUT LOOP
071 DAT 0 // manipulated output
072 OUT 9002
073 LDA 071 5071
074 ADD 098 1098
075 STO 071 3071 // increment manipulated instruction
076 BR 068 6068 // GOTO OUTPUT LIST LOOP
//
// END OUTPUT LOOP
077 BR 0 6000 // Branch to top of loop (embedded)
//
// End of program
078 HLT 0 // (Should never hit this instruction)
//
// Variables
088 DAT 0 // A[j-1] value (also used for swapping)
089 DAT 0 // A[j] value (also used for swapping)
//
090 DAT 0 // count variable (input and output)
091 DAT 0 // unused
092 DAT 0 // ‘I’ counter
093 DAT 0 // ‘j’ counter
//
// Constants
094 DAT 0 // unused
095 DAT 500 // initial list location
096 DAT 3000 // STO instruction
097 DAT 5000 // LDA instruction
098 DAT 1 // one (constant)
099 DAT 0 // zero (constant)
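The program above reaches list elements by adding an opcode constant, the list base and an index, then storing the sum over a placeholder instruction. That arithmetic can be sketched in Python. The helper names `patch` and `decode` are hypothetical; the values 5000, 3000 and 500 are the constants stored at 097, 096 and 095.

```python
LDA, STO = 5000, 3000   # opcode constants, as stored at 097 and 096
BASE = 500              # initial list location, as stored at 095

def patch(opcode, index, base=BASE):
    """Return the numeric instruction that accesses list element `index`."""
    return opcode + base + index

def decode(instruction):
    """Split a 4-digit LMCe instruction into (opcode digit, address)."""
    return divmod(instruction, 1000)

print(patch(LDA, 3))          # 5503, i.e. LDA 503
print(decode(patch(STO, 2)))  # (3, 502), i.e. STO 502
```

Because the patched instruction itself grows by one per element, the input and output loops can compare it against a precomputed end value instead of keeping a separate counter.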

Related

Average over number of variables where number of variables is dictated by separate column

I would like to create a new column whose values equal the average of values in other columns. But the number of columns I am taking the average of is dictated by a variable. My data look like this, with 'length' dictating the number of columns x1-x5 that I want to average:
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
run;
I would like to end up with the below where 'avg' is the average of the specified columns.
data want;
input ID $ length avg;
datalines;
A 5 87
B 4 156.5
C 3 558.3
D 5 39.6
;
run;
Any suggestions? Thanks! Sorry about the awful title, I did my best.
You have to do a little more work since mean(of x[1]-x[length]) is not valid syntax. Instead, save the values to a temporary array and take the mean of it, then reset it at each row. For example:
tmp1 tmp2 tmp3 tmp4 tmp5
8 234 79 36 78
8 26 589 3 .
19 892 764 . .
72 48 65 4 9
data want;
set have;
array x[*] x:;
array tmp[5] _temporary_;
/* Reset the temp array */
call missing(of tmp[*]);
/* Save each value of x to the temp array */
do i = 1 to length;
tmp[i] = x[i];
end;
/* Get the average of the non-missing values in the temp array */
avg = mean(of tmp[*]);
drop i;
run;
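As a cross-check of the expected averages, the same row-wise logic can be sketched in Python: take the first `length` values, ignore missing ones (as SAS's mean function does), and average the rest. The function name `prefix_mean` and the use of `None` for a missing value are assumptions made for this illustration.

```python
def prefix_mean(values, length):
    """Mean of the first `length` entries, skipping missing (None) values."""
    used = [v for v in values[:length] if v is not None]
    return sum(used) / len(used) if used else None

rows = {
    "A": (5, [8, 234, 79, 36, 78]),
    "B": (4, [8, 26, 589, 3, 54]),
    "C": (3, [19, 892, 764, 89, 43]),
    "D": (5, [72, 48, 65, 4, 9]),
}
for row_id, (length, xs) in rows.items():
    print(row_id, length, prefix_mean(xs, length))
```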
Use an array to average it by summing up the array for the length and then dividing by the length.
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
data want;
set have;
array x(5) x1-x5;
sum=0;
do i=1 to length;
sum + x(i);
end;
avg = sum/length;
keep id length avg;
format avg 8.2;
run;
@Reeza's solution is good, but if x contains missing values it will not always produce the desired result. It's better to use the SUM function. The code is also slightly simplified:
data want (drop=i s);
set have;
array a{*} x:;
s=0; nm=0;
do i=1 to length;
if missing(a{i}) then nm+1;
s=sum(s,a{i});
end;
avg=s/(length-nm);
run;
Rather than writing your own code to calculate means you could just calculate all of the possible means and then just use an index into an array to select the one you need.
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
data want;
set have;
array means[5] ;
means[1]=x1;
means[2]=mean(x1,x2);
means[3]=mean(of x1-x3);
means[4]=mean(of x1-x4);
means[5]=mean(of x1-x5);
want = means[length];
run;

Generate stepping numbers up to a given number N

A number is called a stepping number if all adjacent digits in the number have an absolute difference of 1.
Examples of stepping numbers :- 0,1,2,3,4,5,6,7,8,9,10,12,21,23,...
I have to generate stepping numbers up to a given number N. The numbers generated should be in order.
I used the simple method of iterating over all the numbers up to N and checking whether each one is a stepping number. My teacher told me this is brute force and will take more time. Now I have to optimize my approach.
Any suggestions?
Stepping numbers can be generated using a breadth-first-search-like approach.
Example to find all the stepping numbers from 0 to N
-> 0 is a stepping Number and it is in the range
so display it.
-> 1 is a Stepping Number, find neighbors of 1 i.e.,
10 and 12 and push them into the queue
How to get 10 and 12?
Here U is 1 and last Digit is also 1
V = 10 + 0 = 10 ( Adding lastDigit - 1 )
V = 10 + 2 = 12 ( Adding lastDigit + 1 )
Then do the same for 10 and 12 this will result into
101, 123, 121 but these Numbers are out of range.
Now any number transformed from 10 and 12 will result
into a number greater than 21 so no need to explore
their neighbors.
-> 2 is a Stepping Number, find neighbors of 2 i.e.
21, 23.
-> generate stepping numbers till N.
The other stepping numbers will be 3, 4, 5, 6, 7, 8, 9.
C++ code to generate stepping numbers in a given range:
#include<bits/stdc++.h>
using namespace std;
// Prints all stepping numbers reachable from num
// and in range [n, m]
void bfs(int n, int m)
{
// Queue will contain all the stepping Numbers
queue<int> q;
for (int i = 0 ; i <= 9 ; i++)
q.push(i);
while (!q.empty())
{
// Get the front element and pop from the queue
int stepNum = q.front();
q.pop();
// If the Stepping Number is in the range
// [n, m] then display
if (stepNum <= m && stepNum >= n)
cout << stepNum << " ";
// If Stepping Number is 0 or greater than m,
// no need to explore the neighbors
if (stepNum == 0 || stepNum > m)
continue;
// Get the last digit of the currently visited
// Stepping Number
int lastDigit = stepNum % 10;
// There can be 2 cases either digit to be
// appended is lastDigit + 1 or lastDigit - 1
int stepNumA = stepNum * 10 + (lastDigit- 1);
int stepNumB = stepNum * 10 + (lastDigit + 1);
// If lastDigit is 0 then only possible digit
// after 0 can be 1 for a Stepping Number
if (lastDigit == 0)
q.push(stepNumB);
//If lastDigit is 9 then only possible
//digit after 9 can be 8 for a Stepping
//Number
else if (lastDigit == 9)
q.push(stepNumA);
else
{
q.push(stepNumA);
q.push(stepNumB);
}
}
}
//Driver program to test above function
int main()
{
int n = 0, m = 99;
// Display Stepping Numbers in the
// range [n,m]
bfs(n,m);
return 0;
}
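The same breadth-first idea can be sketched in Python. This version seeds the queue with 1 to 9 and handles 0 separately; because each parent is expanded in ascending order, the results come out already sorted, which is what the question asks for. The function name `stepping_numbers` is chosen for this illustration.

```python
from collections import deque

def stepping_numbers(n):
    """Return all stepping numbers in [0, n], in ascending order."""
    if n < 0:
        return []
    result = [0]                    # 0 is a stepping number but has no children
    queue = deque(range(1, 10))
    while queue:
        num = queue.popleft()
        if num > n:
            continue                # children would be even larger
        result.append(num)
        last = num % 10
        if last > 0:
            queue.append(num * 10 + last - 1)
        if last < 9:
            queue.append(num * 10 + last + 1)
    return result

print(stepping_numbers(21))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 21]
```

Only stepping numbers (plus at most two rejected children per accepted number) are ever visited, so the work is proportional to the size of the output rather than to N.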
Visit this link.
The linked page covers both a BFS and a DFS approach.
It provides an explanation and code in different languages for the above problem.
We also can use simple rules to move to the next stepping number and generate them in order to avoid storing "parents".
C.f. OEIS sequence
#include <iostream>
int next_stepping(int n) {
int left = n / 10;
if (left == 0)
return (n + 1); // 6=>7
int last = n % 10;
int leftlast = left % 10;
if (leftlast - last == 1 && last < 8)
return (n + 2); // 32=>34
int nxt = next_stepping(left);
int nxtlast = nxt % 10;
if (nxtlast == 0)
return (nxt * 10 + 1); // to get 101
return (nxt * 10 + nxtlast - 1); //to get 121
}
int main()
{
int t = 0;
for (int i = 1; i < 126; i++, t = next_stepping(t)) {
std::cout << t << "\t";
if (i % 10 == 0)
std::cout << "\n";
}
}
0 1 2 3 4 5 6 7 8 9
10 12 21 23 32 34 43 45 54 56
65 67 76 78 87 89 98 101 121 123
210 212 232 234 321 323 343 345 432 434
454 456 543 545 565 567 654 656 676 678
765 767 787 789 876 878 898 987 989 1010
1012 1210 1212 1232 1234 2101 2121 2123 2321 2323
2343 2345 3210 3212 3232 3234 3432 3434 3454 3456
4321 4323 4343 4345 4543 4545 4565 4567 5432 5434
5454 5456 5654 5656 5676 5678 6543 6545 6565 6567
6765 6767 6787 6789 7654 7656 7676 7678 7876 7878
7898 8765 8767 8787 8789 8987 8989 9876 9878 9898
10101 10121 10123 12101 12121
def steppingNumbers(self, n, m):
def _solve(v):
if v>m: return 0
ans = 1 if n<=v<=m else 0
last = v%10
if last > 0: ans += _solve(v*10 + last-1)
if last < 9: ans += _solve(v*10 + last+1)
return ans
ans = 0 if n>0 else 1
for i in range(1, 10):
ans += _solve(i)
return ans

Unable to execute my hash table correctly/ SAS

I have a data step before the code below called "simulation_tracking3", that outputs something like:
CDFx Allowed_Claims
.06 120
.12 13
.15 1400
I want my hash table to average the Allowed_Claims based on a randomly generated value (from 0 to 1). For example, let's call this Process A, if Px = rand('Uniform',0,1) yields .09, I want it to average between the Allowed_Claims values where Px = .06 and Px = 0.12, which is (120+13)/2
The role of the array is that it dictates how many iterations of Process A I want. The array is
Members {24} _temporary_ (5 6 8 10 12 15 20 25 30 40 50 60 70 80
90 100 125 150 175 200 250 300 400 500);
So when the loop starts, it will perform 5 iterations of Process A, thereby producing 5 averaged "allowed_claims" values. I want the sum of these five claims.
Then, the loop will continue and perform 6 iterations of Process A and produce 6 averaged "allowed_claims" values. Again, I want the sum of these 6 claims.
I want the output table to look like:
Member[i] Average_Expected_Claims
5 (sum of 5 'averaged 'claims)
6 (sum of 6 'averaged' claims)
8 (sum of 8 'averaged' claims)
The code that I currently have is below. My errors occur here:
do rc = hi_iter.first() by 0 until (hi_iter.next()_ ne 0 or CDFx gt rand_value);
rc = hi_iter.prev();
The error says, respectively:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &,
*, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN,
LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.
ERROR: DATA STEP Component Object failure. Aborted during the
COMPILATION phase.
data simulation_members; *simulates allowed claims for each member in member array;
call streaminit(454);
array members [24] _temporary_ (5 6 8 10 12 15 20 25 30 40 50
60 70 80 90 100 125 150 175 200 250 300 400 500); *any number of members here is fine;
if _n_ eq 1 then do; * initialize the hash tables;
if 0 then set simulation_tracking3; * defines the variables used;
declare hash _iter(dataset:'simulation_tracking3', ordered: 'a'); *ordered = ascending - do not need a sort first;
_iter.defineKey('CDFx'); * key is artificial, but has to exist;
_iter.defineData('CDFx','Allowed_Claims'); * data variables to retrieve;
_iter.defineDone();
declare hiter hi_iter('_iter'); * the iterator object;
end;
do _i_member = 1 to dim(members); * iterate over members array;
call missing(claims_simulated);
do _i_simul = 1 to members[_i_member]-1;
rand_value = rand('Uniform',0,1);
do rc = hi_iter.first() by 0 until (hi_iter.next()_ ne 0 or CDFx gt rand_value);
end;
ac_max = allowed_claims;
rc = hi_iter.prev();
ac_min = allowed_claims;
claims_simulated + mean(ac_max,ac_min);
put rand_value= claims_simulated=; *just for logging;
end;
putlog;
output; *drop unnecessary columns;
end;
stop;
run;
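To make the intended behavior of Process A concrete, here is a small Python sketch of the lookup and the per-member sum. It is not a fix for the SAS step above. The names `process_a` and `simulate` are hypothetical, and the handling of random values below the first or above the last CDFx (clamping to the end row) is an assumption, since the question does not specify it.

```python
import bisect
import random

def process_a(cdf, claims, p):
    """Average the Allowed_Claims of the two rows whose CDFx values bracket p."""
    hi = bisect.bisect_right(cdf, p)   # first row with CDFx > p
    lo = hi - 1                        # last row with CDFx <= p
    top = len(cdf) - 1
    hi = min(max(hi, 0), top)          # assumption: clamp at the table ends
    lo = min(max(lo, 0), top)
    return (claims[lo] + claims[hi]) / 2

def simulate(cdf, claims, members, seed=454):
    """For each member count m, sum m draws of Process A."""
    rng = random.Random(seed)
    return {m: sum(process_a(cdf, claims, rng.random()) for _ in range(m))
            for m in members}

cdf = [0.06, 0.12, 0.15]
claims = [120, 13, 1400]
print(process_a(cdf, claims, 0.09))  # 66.5, i.e. (120 + 13) / 2
```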

The indices of non-zero bytes of an SSE/AVX register

If an SSE/AVX register's value is such that all its bytes are either 0 or 1, is there any way to efficiently get the indices of all non zero elements?
For example, if xmm value is
| r0=0 | r1=1 | r2=0 | r3=1 | r4=0 | r5=1 | r6=0 |...| r14=0 | r15=1 |
the result should be something like (1, 3, 5, ... , 15). The result should be placed in another __m128i variable or char[16] array.
If it helps, we can assume that the register's value is such that all bytes are either 0 or some constant nonzero value (not necessarily 1).
I am mostly wondering whether there is an instruction for that, or preferably a C/C++ intrinsic, in any SSE or AVX instruction set.
EDIT 1:
It was correctly observed by @zx485 that the original question was not clear enough. I was looking for any "consecutive" solution.
The example 0 1 0 1 0 1 0 1... above should result in either of the following:
If we assume that indices start from 1, then 0 would be a termination byte and the result might be
002 004 006 008 010 012 014 016 000 000 000 000 000 000 000 000
If we assume that negative byte is a termination byte the result might be
001 003 005 007 009 011 013 015 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
Anything that gives us consecutive bytes which we can interpret as indices of non-zero elements in the original value.
EDIT 2:
Indeed, as @harold and @Peter Cordes suggest in the comments to the original post, one possible solution is to create a mask first (e.g. with pmovmskb) and check for non-zero indices there. But that will lead to a loop.
Your question was unclear about whether you want the result array to be "compressed". What I mean by "compressed" is that the result should be consecutive. So, for example for 0 1 0 1 0 1 0 1..., there are two possibilities:
Non-consecutive:
XMM0: 000 001 000 003 000 005 000 007 000 009 000 011 000 013 000 015
Consecutive:
XMM0: 001 003 005 007 009 011 013 015 000 000 000 000 000 000 000 000
One problem of the consecutive approach is: how do you decide if it's index 0 or a termination value?
I'm offering a simple solution to the first, non-consecutive approach, which should be quite fast:
.data
ddqZeroToFifteen db 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
ddqTestValue: db 0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1
.code
movdqa xmm0, xmmword ptr [ddqTestValue]
pxor xmm1, xmm1 ; zero XMM1
pcmpeqb xmm0, xmm1 ; set to -1 for all matching
pandn xmm0, xmmword ptr [ddqZeroToFifteen] ; invert and apply indices
Just for the sake of completeness: the second, the consecutive approach, is not covered in this answer.
Updated answer: the new solution is slightly more efficient.
You can do this without a loop by using the pext instruction from the Bit Manipulation Instruction Set 2,
in combination with a few other SSE instructions.
/*
gcc -O3 -Wall -m64 -mavx2 -march=broadwell ind_nonz_avx.c
*/
#include <stdio.h>
#include <immintrin.h>
#include <stdint.h>
__m128i nonz_index(__m128i x){
/* Set some constants that will (hopefully) be hoisted out of a loop after inlining. */
uint64_t indx_const = 0xFEDCBA9876543210; /* 16 4-bit integers, all possible indices from 0 to 15 */
__m128i cntr = _mm_set_epi8(64,60,56,52,48,44,40,36,32,28,24,20,16,12,8,4);
__m128i pshufbcnst = _mm_set_epi8(0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80, 0x0E,0x0C,0x0A,0x08,0x06,0x04,0x02,0x00);
__m128i cnst0F = _mm_set1_epi8(0x0F);
__m128i msk = _mm_cmpeq_epi8(x,_mm_setzero_si128()); /* Generate 16x8 bit mask. */
msk = _mm_srli_epi64(msk,4); /* Pack 16x8 bit mask to 16x4 bit mask. */
msk = _mm_shuffle_epi8(msk,pshufbcnst); /* Pack 16x8 bit mask to 16x4 bit mask, continued. */
uint64_t msk64 = ~ _mm_cvtsi128_si64x(msk); /* Move to general purpose register and invert 16x4 bit mask. */
/* Compute the termination byte nonzmsk separately. */
int64_t nnz64 = _mm_popcnt_u64(msk64); /* Count the nonzero bits in msk64. */
__m128i nnz = _mm_set1_epi8(nnz64); /* May generate vmovd + vpbroadcastb if AVX2 is enabled. */
__m128i nonzmsk = _mm_cmpgt_epi8(cntr,nnz); /* nonzmsk is a mask of the form 0xFF, 0xFF, ..., 0xFF, 0, 0, ...,0 to mark the output positions without an index */
uint64_t indx64 = _pext_u64(indx_const,msk64); /* parallel bits extract. pext shuffles indx_const such that indx64 contains the nnz64 4-bit indices that we want.*/
__m128i indx = _mm_cvtsi64x_si128(indx64); /* Use a few integer instructions to unpack 4-bit integers to 8-bit integers. */
__m128i indx_024 = indx; /* Even indices. */
__m128i indx_135 = _mm_srli_epi64(indx,4); /* Odd indices. */
indx = _mm_unpacklo_epi8(indx_024,indx_135); /* Merge odd and even indices. */
indx = _mm_and_si128(indx,cnst0F); /* Mask out the high bits 4,5,6,7 of every byte. */
return _mm_or_si128(indx,nonzmsk); /* Merge indx with nonzmsk . */
}
int main(){
int i;
char w[16],xa[16];
__m128i x;
/* Example with bytes 15, 12, 7, 5, 4, 3, 2, 1, 0 set. */
x = _mm_set_epi8(1,0,0,1, 0,0,0,0, 1,0,1,1, 1,1,1,1);
/* Other examples. */
/*
x = _mm_set_epi8(1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1);
x = _mm_set_epi8(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0);
x = _mm_set_epi8(1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0);
x = _mm_set_epi8(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1);
*/
__m128i indices = nonz_index(x);
_mm_storeu_si128((__m128i *)w,indices);
_mm_storeu_si128((__m128i *)xa,x);
printf("counter 15..0 ");for (i=15;i>-1;i--) printf(" %2d ",i); printf("\n\n");
printf("example xmm: ");for (i=15;i>-1;i--) printf(" %2d ",xa[i]); printf("\n");
printf("result in dec ");for (i=15;i>-1;i--) printf(" %2hhd ",w[i]); printf("\n");
printf("result in hex ");for (i=15;i>-1;i--) printf(" %2hhX ",w[i]); printf("\n");
return 0;
}
It takes about five instructions to get 0xFF (the termination byte) at the unwanted positions.
Note that a function nonz_index that returns the indices and only the position of the termination byte, without actually
inserting the termination byte(s), would be much cheaper to compute and might be as suitable in a particular application.
The position of the first termination byte is nnz64>>2.
The result is:
$ ./a.out
counter 15..0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
example xmm: 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 1
result in dec -1 -1 -1 -1 -1 -1 -1 15 12 7 5 4 3 2 1 0
result in hex FF FF FF FF FF FF FF F C 7 5 4 3 2 1 0
The pext instruction is supported on Intel Haswell processors or newer.
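To see why pext does the compaction, a scalar software model in Python may help. It is only a reference model for checking results, not a fast implementation. `pext` here is a hand-written emulation of the instruction, and `nonzero_indices` is a name chosen for this sketch; the register is modeled as a list of 16 bytes with byte 0 first.

```python
def pext(src, mask):
    """Software model of BMI2 pext: gather the bits of src selected by mask."""
    out, pos = 0, 0
    while mask:
        low = mask & -mask          # lowest set bit of the mask
        if src & low:
            out |= 1 << pos
        pos += 1
        mask &= mask - 1            # clear that bit
    return out

def nonzero_indices(reg):
    """Packed indices of the nonzero bytes of a 16-byte list, padded with 0xFF."""
    mask = 0
    for i, byte in enumerate(reg):
        if byte:
            mask |= 0xF << (4 * i)  # one 4-bit group per nonzero byte
    packed = pext(0xFEDCBA9876543210, mask)
    count = bin(mask).count("1") // 4
    indices = [(packed >> (4 * k)) & 0xF for k in range(count)]
    return indices + [0xFF] * (16 - count)

# Same example as above, listed from byte 0 to byte 15.
reg = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1]
print(nonzero_indices(reg)[:9])  # [0, 1, 2, 3, 4, 5, 7, 12, 15]
```

Nibble i of the constant holds the value i, so selecting the nibbles that belong to nonzero bytes and packing them toward the low end yields the consecutive index list directly.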

Suggest an optimal algorithm for the sum of distances of the first match in two sequences

I have two lists, say L1 and L2, and I need the (minimum) sum of the distances from the start of each list to the first element they have in common.
For Example:
89 145 42 20 4 16 37 58 89
20 4 16 37 58 89
Output : 5
89 145 42 20 4 16 37 58 89
56 678 123 65467
Output : 0
19 82 68 100 1
100 1
Output : 5
Thanks,
PS: My language of choice is C and C++ hence the tag.
Add shorter list to hash (dictionary) key = number, value = index of first instance in list
Iterate through the other list and for each element try a lookup in the hash. When a match is made, add the indices together (value from hash plus current index in the list)
This runs in O(n)
boost::unordered_map or stdex::hash_map could be used for the hash
Here is a linear time algorithm using a hash table.
To start with hash elements of L1 (with element being the hash key and index being the value) if it is not already hashed.
Next, foreach element in L2 see if the element has been hashed, if yes print the sum of the index of the element in L2 and the hash value ( index of the same element in L1) and exit.
If no element of L2 is found in the hash table, print 0 and exit.
Algorithm:
foreach ele N in L1 at position pos
if N not in hash
hash[N] = pos
end-if
end-foreach
foreach ele N in L2 at position pos
if N in hash
print pos + hash[N]
exit
end-if
end-foreach
print 0
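The pseudocode above can be sketched in Python as follows. Positions are counted from 1 here, which is an assumption inferred from the example outputs in the question (20 is 4th in the first list and 1st in the second, giving 5). The function name `first_match_distance` is made up for this illustration.

```python
def first_match_distance(l1, l2):
    """Sum of 1-based positions of the first element of l2 that occurs in l1."""
    first_pos = {}
    for pos, value in enumerate(l1, start=1):
        first_pos.setdefault(value, pos)   # keep only the first occurrence
    for pos, value in enumerate(l2, start=1):
        if value in first_pos:
            return pos + first_pos[value]
    return 0                               # no common element

print(first_match_distance([89, 145, 42, 20, 4, 16, 37, 58, 89],
                           [20, 4, 16, 37, 58, 89]))  # 5
```

Like the pseudocode, this stops at the first hit in the second list. Taking the minimum over all hits would require letting the second loop run to the end and tracking the smallest sum.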
for (int sum = 0; sum < a.length + b.length - 1; sum++)
for (int i = 0; i < a.length && i <= sum; i++)
if (sum - i < b.length && a[i] == b[sum-i]) // guard against reading past the end of b
return sum;
return -1;
This is O(1) in space and worst case O(n^2) in time. And best case O(1) in time! This algorithm is very quick for lists having a match in the first few elements.