Cumulative sum while value I'm summing is unchanged - stata

I have the following data structure
Y   cum_sum
1   1
1   1
1   1
0   1
0   1
1   1
0   1
1   1
1   1
I would like cum_sum to change so that it calculates the cumulative sum within each run of unchanged Y values (restarting whenever Y changes), so that the data becomes:
Y   cum_sum
1   1
1   2
1   3
0   1
0   2
1   1
0   1
1   1
1   2
I'm not sure how to do it, and I've tried searching, but the phrasing I'm using leads me to different questions.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(y cum_sum)
1 1
1 1
1 1
0 1
0 1
1 1
0 1
1 1
1 1
end
replace cum_sum = cum_sum + cum_sum[_n-1] if y == y[_n-1]
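For what it's worth, the logic being asked for is just a running count that resets to 1 whenever y changes. Purely as an illustration of that logic (written in C++ rather than Stata, using the y values from the dataex example above):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // The y values from the dataex example above.
    const std::vector<int> y = {1, 1, 1, 0, 0, 1, 0, 1, 1};
    std::vector<int> cum_sum(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        // Restart the count at 1 on the first row and whenever y changes.
        cum_sum[i] = (i > 0 && y[i] == y[i - 1]) ? cum_sum[i - 1] + 1 : 1;
    for (std::size_t i = 0; i < y.size(); ++i)
        std::cout << y[i] << " " << cum_sum[i] << "\n";
}

This prints exactly the Y / cum_sum pairs from the desired table above.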


Coder for a Hamming code of arbitrary length (binary matrix-vector multiplication, on a CPU, platform independent)

The problem seems quite suited for a GPU, FPGA, etc. (because it's quite parallel); but I'm looking for a CPU-based and somewhat architecture independent solution right now. I think a good answer could be just some unimplemented pseudo-code, but my program is in pure C++20, so the answer should be relevant in that context (e.g., don't assume something very-high-level like Python, don't use compiler specific intrinsics or assembly). I'm not expecting mind-blowing performance, but I do want the answer to be significantly faster than the three implementations I already have (in this file): a very naive approach without generator matrices, and a naive "multiply input vector with dense generator matrix" approach. The answer should work for arbitrary code word and input lengths, but the important code word lengths are under, say, 2000 bits, and small input lengths are not important.
Some preliminaries: the binary numbers in question have addition and multiplication defined as, respectively, the "exclusive-or" (XOR) and "and" logical/bitwise operations. This extends to binary matrix multiplication.
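As an aside, since the target is C++20: a dot product of two bit-packed binary vectors is then a bitwise AND followed by an XOR-reduction of the result bits, i.e. the parity of a popcount. A minimal sketch (gf2_dot is an illustrative name, not something from the existing code):

#include <bit>
#include <cstdint>

// Dot product over GF(2) of two binary vectors packed into 64-bit words:
// multiply bitwise with AND, then add the result bits together with XOR,
// which is the same as taking the parity of the popcount.
bool gf2_dot(std::uint64_t a, std::uint64_t b) {
    return (std::popcount(a & b) & 1) != 0;
}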
Hamming codes are old and well-known binary linear block error detecting/correcting codes. Each code word is a string of bits where some bit-positions are designated as parity bits, used for error detection and correction, while the rest of the bits are data bits, which are just copies of the input bits if there was no error. We are considering only Hamming codes where the parity bits are at traditional power-of-two positions (i.e., with 1-based numbering: bit 1, bit 2, bit 4, bit 8, ...). Thus each possible code can be determined using its length n (number of bits in a code word) or its rank k (number of data bits in a code word). A Hamming code can be referred to as (n, k), e.g., (7, 4) or (40, 34).
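(A small aside, assuming the power-of-two parity positions described above: k follows from n by subtracting the number of power-of-two positions that fit within n. The name data_bits below is purely illustrative.)

#include <cassert>
#include <cstddef>

// Number of data bits k for a Hamming code of length n, with parity bits
// at the power-of-two positions 1, 2, 4, 8, ...
std::size_t data_bits(std::size_t n) {
    std::size_t parity = 0;
    for (std::size_t p = 1; p <= n; p *= 2)  // count power-of-two positions <= n
        ++parity;
    return n - parity;
}

int main() {
    assert(data_bits(7) == 4);      // (7, 4)
    assert(data_bits(40) == 34);    // (40, 34)
    assert(data_bits(150) == 142);  // (150, 142)
}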
Each code has a generator matrix, a binary matrix with which an input vector can be multiplied to obtain a code word. Thus the set of the code words of a certain code is exactly the set of linear combinations of the rows of the generator matrix.
The desired program is basically a coder: it takes as input an (n, k) pair to give the code (yes, this is redundant - only one of the pair is needed in essence) and an arbitrary binary message, divides the message into k-bits long sub-messages and outputs a sequence of n-bits long code words, each encoding one sub-message.
I'm hoping for an answer leveraging properties specific to our generator matrices here (e.g., special representation for the generator matrix and special vector-matrix multiplication algorithm), so here are examples of generator matrices for some codes:
Hamming code (3, 1) (has only the code words 000 and 111):
111
Hamming code (5, 2) (has only the code words 00000, 11100, 10011 and 01111):
11100
10011
Hamming code (6, 3):
111000
100110
010101
(Notice how each generator matrix contains the generator matrices of all the smaller codes.)
Hamming code (150, 142) (all zeros left blank so the ones would stand out more):
[142-row by 150-column generator matrix omitted; the blank-for-zero column alignment did not survive copying]
Notice how there are relatively few ones among all the zeros in most generator matrices, and there's definitely a pattern, shape even, to the matrices.
I'm weak in all relevant areas here, so please try to correct any possible mistakes I made.
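Not the structure-specific speed-up the question is hoping for, but as a baseline sketch under the definitions above: bit-pack each row of the generator matrix into 64-bit words and XOR together the rows selected by the 1-bits of the input. The names PackedRow and encode_block are illustrative, not taken from the linked file.

#include <cstddef>
#include <cstdint>
#include <vector>

// One n-bit row of the generator matrix, packed 64 bits per word.
using PackedRow = std::vector<std::uint64_t>;

// Multiply a k-bit input vector by the k x n generator matrix over GF(2):
// the code word is the XOR of the rows selected by the 1-bits of the input.
PackedRow encode_block(const std::vector<bool>& input,          // k data bits
                       const std::vector<PackedRow>& generator, // k packed rows
                       std::size_t n)                           // code word length in bits
{
    const std::size_t words = (n + 63) / 64;
    PackedRow codeword(words, 0);
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (!input[i])
            continue;                            // a 0 input bit contributes nothing
        const PackedRow& row = generator[i];
        for (std::size_t w = 0; w < words; ++w)  // addition over GF(2) is XOR
            codeword[w] ^= row[w];
    }
    return codeword;
}

For n under roughly 2000 bits each row is at most 32 words, so one block costs at most k passes of 32 XORs; the sparsity visible above suggests that storing only each row's nonzero words could cut that further.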

Regular expression to distinguish between single and multiple digit numbers

I have a regular expression for capturing repeating numerical patterns in a string of numbers. However, it is not able to distinguish between single and multiple digits within a number.
Given a string:
0 5 0 0 0 16 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 11 1 1 1 11 1 1 1 1 1 1 1 2 11 1 4 4 4 16
and regular expression
(\d+)( \1)+
the match result is
0 5 0 0 0 16 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 11 1 1 1 11 1 1 1 1 1 1 1 2 1 1 1 4 4 4 16
The regex is not able to distinguish between 1 and 11.
(Note: 11 could also be a repeating number and maximum 3 digits are possible in a number)
You need to add word boundaries to the regex. For example:
(\b\d+)( \1\b)+
See https://regex101.com/r/ZSCMjF/1
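If you want to check the corrected pattern outside regex101, here is a small sketch using C++ std::regex (whose default ECMAScript grammar supports \b and backreferences), run against the string from the question:

#include <iostream>
#include <regex>
#include <string>

int main() {
    const std::string input =
        "0 5 0 0 0 16 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 11 1 1 1 11 1 1 1 1 1 1 1 2 11 1 4 4 4 16";
    // Word boundaries keep a lone "1" from matching inside "11".
    const std::regex repeats(R"((\b\d+)( \1\b)+)");
    for (std::sregex_iterator it(input.begin(), input.end(), repeats), end; it != end; ++it)
        std::cout << "repeated run: \"" << it->str() << "\"\n";
}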

J how to make a shape of random numbers

I'm trying to make a shape of random numbers (0 or 1 in this case), as I'm trying to create a minesweeper field.
I've tried using the "?" symbol to get random values, but it normally turns into a non-random, repeated pattern, which for my purposes is unsatisfactory:
5 5 $ ? 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
Because of this, I tried other ways like pulling numbers from an index (this is called roll). But this returns random decimals. Other small changes to the code also resulted in these random decimals.
I've done this a few times myself. The key thing is when you apply the ?. You get the result that you want if you apply it after the matrix has been created.
We know that ?2 returns a 1 or a 0 value generated randomly.
? 2
0
? 2
1
? 2
0
So if we create a 5 x 5 matrix of 2s
5 5 $ 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
then when we apply ? to each 2 in the matrix, we get a random 1 or 0 for each position.
? 5 5 $ 2 NB. first 5 X 5 matrix of random 1's and 0's
0 0 0 1 1
1 1 1 0 1
0 0 0 0 1
1 1 1 1 0
1 1 1 0 0
? 5 5 $ 2 NB. different 5 X 5 matrix of random 1's and 0's
0 0 0 1 1
1 0 1 1 0
0 0 0 1 1
1 0 0 1 0
1 1 1 0 0

is clojure.jdbc/insert! done in batch mode or one by one?

After I use clojure.jdbc/insert! to insert some data, it printed many "1"s, so I am wondering whether the insert is done in batch mode, which has better performance, or one by one, which is slow. Ideally it would run like a Java JDBC batch insert.
clojurewerkz.testcom.core=> (time (apply (partial j/insert! postgres-db 'test_clojure [:a :b :c :d :e]) (map #(process-row % constraints) (repeat 10000 row))))
"Elapsed time: 540.111482 msecs"
(1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
clojure.jdbc/insert! eventually calls
(apply db-do-prepared db transaction? (first stmts) (rest stmts))
which calls db-do-execute-prepared-statement, so it appears the answer to your question is yes: it does them in batches.

How to rearrange vector to be cols not rows?

I am solving systems of equations using Armadillo. I make a matrix from one array of doubles, specifying the rows and columns. The problem is that it doesn't read the array the way I laid it out (it's a vector that is then converted to an array), so I need to manipulate the vector.
To be clear, it takes a vector with these values:
2 0 0 0 2 1 1 1 0 1 1 0 3 0 0 1 1 1 1 0 0 1 0 1 2
And it makes this matrix:
2 1 1 1 0
0 1 0 1 1
0 1 3 1 0
0 0 0 1 1
2 1 0 0 2
But I want this matrix:
2 0 0 0 2
1 1 1 0 1
1 0 3 0 0
1 1 1 1 0
0 1 0 1 2
How do I manipulate my vector to make it like this?
It sounds like you are looking for the transpose of a matrix; see the Armadillo documentation for details.
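To make that concrete, here is a small sketch using Armadillo's pointer constructor, which fills matrices column by column; that is why the values above come out transposed, and .t() recovers the intended layout (variable names are illustrative):

#include <armadillo>
#include <vector>

int main() {
    // The 25 values exactly as listed in the question, row by row.
    const std::vector<double> values = {
        2, 0, 0, 0, 2,
        1, 1, 1, 0, 1,
        1, 0, 3, 0, 0,
        1, 1, 1, 1, 0,
        0, 1, 0, 1, 2
    };

    // Armadillo fills column by column, so this is the unwanted matrix above.
    arma::mat A(values.data(), 5, 5);
    A.print("A (column-by-column fill) =");

    // Transposing gives the row-by-row interpretation that was intended.
    arma::mat B = A.t();
    B.print("B = A.t() =");
}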