J, the unfindable verb - list

1 0 0 1 verb 1 2 3 4
result:1 4
The verb drops the items from the list on the right that have a 0 in the list on the left. I can remember seeing this verb in the Vocabulary but I can't find it again. Does anybody know this verb?

It's #.
Explanation: Such verbs (1 or 2 symbols, rarely 3) are called primitives. The # primitive is called Tally as a monad (it counts the items, returning the length of the first dimension) and Copy as a dyad, where it copies each item of the right argument as many times as the corresponding item of the left argument indicates. In the dyadic case, the left and right arguments must have the same length (unless one of them is a scalar).
Example:
1 0 0 1 # 1 2 3 4
1 4
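For readers who do not know J, the dyadic behaviour described above can be sketched in Python. This is only an illustration of the semantics, not how J implements it; the name j_copy is made up for this example, and scalar arguments are not handled.

```python
def j_copy(counts, items):
    """Repeat each item as many times as the matching count says (0 drops it)."""
    if len(counts) != len(items):
        raise ValueError("counts and items must have the same length")
    return [item for n, item in zip(counts, items) for _ in range(n)]

print(j_copy([1, 0, 0, 1], [1, 2, 3, 4]))  # prints [1, 4]
```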

Related

Find and KEEP all DUPLICATE lines (instead of unique lines) in a text file

I am aiming to identify and keep DUPLICATE, TRIPLICATE, etc. lines, i.e., all lines that occur more than once, in Notepad++. In other words, how can I delete only the unique lines?
For example, here are seven (7) separate lists and the desired true duplicate lines of each list (shown as 7 columns; regard each column as an individual list or file!). (The lists are shown side by side only to save space; in real life, each of the 7 lists occurs alone, independently of the others, as a separate file!)
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines of multiple occurrence in the above lists:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions to eliminate duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), but I cannot find solutions to keep duplicates only.
((.*)\R(\2\R)+)*\K.+\R
#Lars Fischer: This script works nearly OK, except the last entry of the (presorted) list needs to be unique line followed by a <CR> empty line. One (suboptimal) workaround is to insert an artificial (helper) unique line (e.g., zzz) followed by an empty line <CR> as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per the stackoverflow "ask a new question" instruction. (#AdrianHHH, #B. Desai, #Paolo Forgia, #greg-449 and #Erik von Asmuth drew the incorrect conclusion that this question is a duplicate of notepad++ delete duplicate and original lines to keep unique lines.) This question is definitely not a duplicate of the one #AdrianHHH et al. quote.
UPDATE 2: #AdrianHHH This question is not less "broad" (in fact, one can hardly be more specific) or less researched than other Notepad++ questions, including the one https://stackoverflow.com/questions/29303148 cited (wrongly) by #AdrianHHH et al. as the same question.
UPDATE:
#AdrianHHH, #B. Desai, #Paolo Forgia, #greg-449, #Erik von Asmuth
This question is different from:
https://stackoverflow.com/questions/29303148
because Q 29303148 (i) does not ask how to identify and keep only the lines of multiple occurrence, and (ii) has no solution for that in its answers. Q 29303148 asks "...I just need the unique lines."
Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e. each duplicated line is immediately followed by its duplicates):
Open the Mark Dialog (Search -> Mark ....)
click Clear all Marks on the right
check Bookmark line
check Wrap around
Find What: ((.*)\R(\2\R?)+)*\K.*
Check regular expression and uncheck . matches newline
Mark All
Click Close
Search -> Bookmark -> Remove Bookmarked Lines
Explanation
The regular expression is made up of three parts:
((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one or more line blocks
the outer ( ... )* matches zero or more such blocks of duplicated lines (if, in your example, the three 4s were followed by two 5s, we would need this notion of a sequence of duplicate blocks)
(.*)\R(\2\R?)+: \2 references the content of (.*), so these are all the duplicates of one line
the second \R is an optional (due to the ?) linebreak. Thus it is possible to match a duplicate in the last line of the file even if that line does not end with a linebreak
If there is a block of duplicated lines after the cursor position from which you start, this will match it.
now \K discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
.* matches the next (unique) line and bookmarks it
Using Mark All we bookmark all such unique lines, so that we can remove them using the entry from the Search -> Bookmark menu.
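If the file can be processed outside Notepad++, the same result can be sketched in a few lines of Python. This is a different technique from the regex above (it counts occurrences instead of matching adjacent duplicates), so the input does not need to be sorted; the function name keep_repeated_lines is made up for this example.

```python
from collections import Counter

def keep_repeated_lines(lines):
    """Return only the lines whose text occurs more than once, in original order."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] > 1]

# list1 from the question
list1 = ["1", "2", "3", "4", "4", "4", "5", "6", "7", "8", "9",
         "abc", "abd", "abd", "abe"]
print(keep_repeated_lines(list1))  # prints ['4', '4', '4', 'abd', 'abd']
```

To apply it to a file, the lines could be read with something like open(path).read().splitlines(), assuming the file fits in memory.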

Regex to check consecutive number with spaces

I am trying to find some Regular Expression that will identify a string:
Question 1
With more than 4 consecutive numbers followed by white space:
e.g.
123456
something 1 2 3 4 5 and 1 2 3 4 5
12 34 5
1234
Question 2
With more than 4 consecutive number words with white space:
e.g.
one two three four
one two three four five
onetwothreefour
Question 3
Is there any smart way to do this for other languages too?
Thanks
If you only want numbers between 0 and 9, these should do:
Q1: (?:[0-9]\s*){4,}
Q2: (?:(?:zero|one|two|three|four|five|six|seven|eight|nine)\s*){4,}
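A quick way to check these two patterns against the examples from the question is Python's re module (the helper names below are made up for this sketch). Note that {4,} means "four or more", so 1234 and "one two three four" match; if strictly more than four is wanted, {5,} would be the quantifier to use.

```python
import re

Q1 = re.compile(r"(?:[0-9]\s*){4,}")
Q2 = re.compile(r"(?:(?:zero|one|two|three|four|five|six|seven|eight|nine)\s*){4,}")

def has_digit_run(text):
    """True if text contains at least four digits separated only by whitespace."""
    return Q1.search(text) is not None

def has_word_run(text):
    """True if text contains at least four number words separated only by whitespace."""
    return Q2.search(text) is not None

for sample in ["123456", "12 34 5", "1234", "12 3"]:
    print(repr(sample), has_digit_run(sample))
for sample in ["one two three four", "onetwothreefour", "one two three"]:
    print(repr(sample), has_word_run(sample))
```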
Can't think of anything smart now, but for example:
1 *2 *3 *4 *(5 *(6 *)?)?
1 *2 *3 *4 *(5 *(6 *)?)?|one *two *three *four *(five *(six *)?)?

Stata: looping over observations

My data set looks like this
x1
1
0
0
1
0
0
1
1
In this data set, the values following a 1 belong to the same group as that 1. For example, the first two zeros belong to group 1, the second two zeros belong to group 2, and so on. I would like to get a final output similar to this. Note that the gap between two 1s is arbitrary:
x1 x2
1 1
0 1
0 1
1 2
0 2
0 2
1 3
1 4
I think I need to write a loop that goes over the observations. But I cannot figure out the logical statements that will accomplish this.
Either
gen x2 = sum(x1)
or
gen x2 = sum(x1 == 1)
is sufficient. As usual in Stata, a loop over observations is implicit there, so you don't need an explicit loop.
In detail, sum() here is a cumulative or running sum. In your case, the first solution is simple and adequate. The reason for mentioning the second solution is because it's more general: we can tag the first observation in each block or spell with 1 and then create a running sum to form blocks of 1s, 2s, and so forth.
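To make the running-sum idea concrete outside Stata, here is a minimal Python sketch of the second form (tag with 1, then accumulate). It illustrates the logic only, it is not Stata code, and the name group_ids is made up for this example.

```python
from itertools import accumulate

def group_ids(flags):
    """Running count of 1s: each 1 starts a new group, following values join it."""
    return list(accumulate(1 if f == 1 else 0 for f in flags))

x1 = [1, 0, 0, 1, 0, 0, 1, 1]
print(group_ids(x1))  # prints [1, 1, 1, 2, 2, 2, 3, 4]
```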

Divide a large image into two non overlapping images whose union is the large image

I am given a large image, composed of smaller images, stored as a matrix. I need to find a boundary dividing the large image into two parts (not necessarily equal, but preferably nearly equal) without cutting through a smaller image.
Each small image is represented by a single integer in the larger image matrix.
Ex:
1 1 2 2 2
1 1 2 2 2
3 3 3 4 4
3 3 3 4 4
is the large image matrix composed of 4 small images.
I need to find one such boundary to separate it into two smaller images such that their sizes don't differ by a very large amount.
This is my solution:
1. Start from considering the 1st row.
2. Using binary search find the start of a boundary. In above example it will be like
1 1 | 2 2 2
1 1 2 2 2
3 3 3 4 4
3 3 3 4 4
3. Proceed down as long as the dividing line doesn't intersect an image. If the end of the large image is reached, stop.
1 1 | 2 2 2
1 1 | 2 2 2
3 3 3 4 4
3 3 3 4 4
4. Repeat steps 1, 2 and 3 on the remaining rows and draw a horizontal line from the old line to the new division line.
1 1 | 2 2 2
1 1 | 2 2 2
--
3 3 3 4 4
3 3 3 4 4
1 1 | 2 2 2
1 1 | 2 2 2
-----
3 3 3 | 4 4
3 3 3 | 4 4
End of large image...Stop.
Of course, if no vertical line can be found in step 2, we can look for a horizontal line first in a similar way, as in the case of:
1 1 1 1 1
1 1 1 1 1
--
3 3 3 2 2
3 3 3 2 2
and then proceed.
How can I improve on this solution?
Are there better solutions and will my algorithm fail anytime?
I will be coding in C++. A heuristic/ greedy solution will be nice as well.
If the image is somehow big enough to make sense then you could get local differences to guide your boundaries selection.
Here is an example implemented in MATLAB for simplicity but you will get the picture:
suppose we create an image similar to the one you defined:
img = [ ones(20,20), 2*ones(20,30); ones(10,20), 2*ones(10,30); 3*ones(20,30), 4*ones(20,20)]
This command creates a 50x50 image containing a 20x30 sub-image 1, a 30x30 sub-image 2, a 30x20 sub-image 3 and a 20x20 sub-image 4 (sizes given as width x height), as depicted graphically below:
Ideally you would like to get the boundaries between these "trays" representing the values 1 to 4. One way to do so is to shift the image one pixel left/right and one pixel top/bottom and subtract it with the original. This will produce another image with values only in the boundary positions.
See for example in MATLAB:
mask=((img-shift(img,1) + img-shift(img',1)')~=0);
This will create a mask by adding the difference of the right-shifted image and the original with the difference of the bottom-shifted image and the original, and, finally, by comparing the result with zero (zero values will be all pixel values except in boundaries). Function shift just shifts values of a matrix right or left. There is no need to put the code here since I just want to show the concept.
So you will end-up with the following mask image:
This mask has been cropped by one pixel at the right and bottom, since the previous subtractions produce a border that is not needed.
In this image, true values (white pixels) are on the last pixel of the previous image, i.e. image 1 ends at the 1st boundary and image 2 begins at the next pixel, so image 1 is bounded by x=20 and y=30, and so on for the other sub-images.
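The same concept can be sketched without MATLAB. The pure-Python version below is an assumption-laden illustration, not a translation of the expression above: instead of adding two differences, it marks a pixel when its right neighbour or its lower neighbour carries a different label, which avoids the two differences cancelling each other. The name boundary_mask is made up for this example.

```python
def boundary_mask(img):
    """Mark pixels whose right or lower neighbour carries a different label."""
    rows, cols = len(img), len(img[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right_differs = c + 1 < cols and img[r][c] != img[r][c + 1]
            below_differs = r + 1 < rows and img[r][c] != img[r + 1][c]
            mask[r][c] = right_differs or below_differs
    return mask

# The 4x5 example matrix from the question
img = [
    [1, 1, 2, 2, 2],
    [1, 1, 2, 2, 2],
    [3, 3, 3, 4, 4],
    [3, 3, 3, 4, 4],
]
for row in boundary_mask(img):
    print("".join("#" if flag else "." for flag in row))
```

As in the answer, a marked pixel is the last pixel of a sub-image before a boundary, so the row of marks under the second image row is the horizontal cut and the marked columns are the vertical ones.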

simulate a deterministic pushdown automaton (PDA) in c++

I was reading a UVA exercise in which I need to simulate a deterministic pushdown automaton to see whether certain strings are accepted by the PDA, given input in the following format:
The first line of input will be an integer C, which indicates the number of test cases. The first line of each test case contains five integers E, T, F, S and C, where E represents the number of states in the automaton, T the number of transitions, F the number of final states, S the initial state and C the number of test strings. The next line will contain F integers, which represent the final states of the automaton. Then come T lines, each with 2 integers I and J and 3 strings L, T and A, where I and J (0 ≤ I, J < E) represent the origin and destination states of a transition respectively. L represents the character read from the tape in the transition, T represents the symbol found at the top of the stack and A the action to perform with the top of the stack at the end of this transition. The character used to represent the bottom of the stack is always Z. The character £ (<alt+156>) is used to represent the end of the string, the unstack (pop) action, or a transition that does not take the top of the stack into account. The alphabet of the stack will be capital letters. For the string A, the symbols are stacked from right to left (in the same way as in the program JFlap, i.e., the new top of the stack will be the leftmost character). Then come C lines, each with an input string. The input strings may contain lowercase letters and numbers (not necessarily present in any transition).
The output in the first line of each test case must display the following string "Case G:", where G represents the number of test case (starting at 1). Then C lines on which to print the word "OK" if the automaton accepts the string or "Reject" otherwise.
For example:
Input:
2
3 5 1 0 5
2
0 0 1 Z XZ
0 0 1 X XX
0 1 0 X X
1 1 1 X £
1 2 £ Z Z
111101111
110111
011111
1010101
11011
4 6 1 0 5
3
1 2 b A £
0 0 a Z AZ
0 1 a A AAA
1 0 a A AA
2 3 £ Z Z
2 2 b A £
aabbb
aaaabbbbbb
c1bbb
abbb
aaaaaabbbbbbbbb
this is the output:
Output:
Case 1:
Accepted
Rejected
Rejected
Rejected
Accepted
Case 2:
Accepted
Accepted
Rejected
Rejected
Accepted
I need some help or ideas on how to simulate this PDA. I am not asking for code that solves the problem, because I want to write my own code (the idea is to learn, right?), but I need some help (an idea or pseudocode) to begin the implementation.
You first need a data structure to keep the transitions. You can use a vector of transition structs, each holding a transition quintuple. Alternatively, you can use the fact that states are integers and create a vector that keeps the transitions from state 0 at index 0, the transitions from state 1 at index 1, and so on. This reduces the time spent searching for the correct transition.
For the stack, you can simply use the stack from the STL. You also need a search function, which could change depending on your implementation. If you use the first method, you can use a function like:
int findIndex(vector<quintuple> v)//which finds the index of correct transition otherwise returns -1
then use the return value to get the new state and the new stack symbol.
Or you can use a for loop over the vector and a bool flag that records whether a transition has been found.
With the second method, you can use a function that takes references to the new state and the new stack symbol and sets them if it finds an appropriate transition.
For the inputs you can use something like vector or vector, depending on personal taste. You can implement your main method with for loops, but if you want an extra challenge you can implement a recursive function. Good luck.
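As a language-neutral illustration of the second layout (transitions grouped by source state) and of the lookup only, here is a small Python sketch. It deliberately leaves out the stack updates, the £ transitions and the acceptance test, so the actual simulation remains to be written. The names Transition, build_table and find_index are hypothetical, chosen for this example.

```python
from collections import namedtuple

# Hypothetical record for one transition quintuple (field names are illustrative).
Transition = namedtuple("Transition", "src dst read top push")

def build_table(num_states, transitions):
    """Group transitions by source state: table[i] holds those leaving state i."""
    table = [[] for _ in range(num_states)]
    for t in transitions:
        table[t.src].append(t)
    return table

def find_index(table, state, symbol, stack_top):
    """Return the index within table[state] of the matching transition, or -1."""
    for i, t in enumerate(table[state]):
        if t.read == symbol and t.top == stack_top:
            return i
    return -1

# Transitions of the first test case in the question (\u00a3 is the pound sign).
case1 = [
    Transition(0, 0, "1", "Z", "XZ"),
    Transition(0, 0, "1", "X", "XX"),
    Transition(0, 1, "0", "X", "X"),
    Transition(1, 1, "1", "X", "\u00a3"),
    Transition(1, 2, "\u00a3", "Z", "Z"),
]
table = build_table(3, case1)
i = find_index(table, 0, "0", "X")
print(table[0][i].dst)  # prints 1
```

Because only the transitions leaving the current state are scanned, the search stays short even when the automaton has many transitions overall.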