Stata - Generate all possible combinations - tuples

I need to find all possible combinations of the following variables, each of which contains a certain number of observations:
Variable Obs
Black 1
Pink 2
Yellow 6
Red 15
Green 17
e.g. (black, pink), (black, pink, yellow), (black, pink, yellow, red), (red, green)....
Order is not important, so combinations that contain the same elements, such as (black, pink) and (pink, black), must be counted only once.
At the end, I also need to calculate the total number of observations for each combination.
What is the fastest method that is also the least error-prone?
I read about tuples, but I am not able to write the code myself.

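You can use tuples (install it with ssc install tuples), as in the example below. The trick is to define one scalar per colour, named exactly like the colour, so that each element of a tuple can be added to a running total by name. Note that I use postfile with a temporary name for the handle and a temporary file for the results. After the loop is complete, I open the temporary file colors and use gsort to sort in descending order of the total.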
tuples black pink yellow red green
// One scalar per colour, named like the colour, holding its number of observations
scalar black=1
scalar pink=2
scalar yellow=6
scalar red=15
scalar green=17
tempname colors_handle
tempfile colors
postfile `colors_handle' str40 colors cnt using `colors', replace
forvalues i = 1/`ntuples' {
    scalar sum = 0
    foreach n of local tuple`i' {
        // scalar() forces the scalar to be used even if a variable has the same (or an abbreviated) name
        scalar sum = scalar(sum) + scalar(`n')
    }
    post `colors_handle' ("`tuple`i''") (scalar(sum))
}
postclose `colors_handle'
use `colors', clear
gsort -cnt
list
Output:
colors cnt
1. black pink yellow red green 41
2. pink yellow red green 40
3. black yellow red green 39
4. yellow red green 38
5. black pink red green 35
6. pink red green 34
7. black red green 33
8. red green 32
9. black pink yellow green 26
10. pink yellow green 25
11. black pink yellow red 24
12. black yellow green 24
13. yellow green 23
14. pink yellow red 23
15. black yellow red 22
16. yellow red 21
17. black pink green 20
18. pink green 19
19. black green 18
20. black pink red 18
21. green 17
22. pink red 17
23. black red 16
24. red 15
25. black pink yellow 9
26. pink yellow 8
27. black yellow 7
28. yellow 6
29. black pink 3
30. pink 2
31. black 1
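If you want to cross-check the Stata listing, the same enumeration can be sketched in Python (an illustration, not part of the tuples approach). itertools.combinations yields each unordered subset exactly once, so (black, pink) and (pink, black) never both appear and no de-duplication step is needed. The counts are the ones from the question; the function name all_combinations is hypothetical.

```python
from itertools import combinations

# Observation counts from the question
counts = {"black": 1, "pink": 2, "yellow": 6, "red": 15, "green": 17}


def all_combinations(counts):
    """Return (combination, total) pairs for every non-empty subset.

    combinations() never repeats a subset in a different order,
    so no de-duplication step is needed.
    """
    names = list(counts)
    results = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            results.append((combo, sum(counts[n] for n in combo)))
    # Largest total first, like gsort -cnt in Stata
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    for combo, total in all_combinations(counts):
        print(" ".join(combo), total)
```

With 5 colours this gives 2^5 - 1 = 31 combinations, matching the 31 rows of the Stata output.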


Reporting Multiple Response questions

I have a CSV dataset like this, where we asked people for their favorite colors:
id q1 q2 q3
1 red blue green
2 blue green .
3 green . .
4 blue . .
5 . . .
Is Power BI able to handle this type of reporting? I've seen recommendations to unpivot the data, which I could do, BUT I would like to keep the percentages based on respondents, NOT on mentions. That means the percentages should be calculated by dividing by 4 (the people who answered a favorite color), so, for example, the result for green should be:
Green = 3/4 = 75% (based on 4 respondents)
Instead of
Green = 3/7 = 43% (based on 7 colors mentioned)
Thanks!
After unpivoting your sample data table looks like this:
ID Attribute Value
1 q1 red
1 q2 blue
1 q3 green
2 q1 blue
2 q2 green
3 q1 green
4 q1 blue
Now you can create this calculated table:
% Colors =
VAR numIDs =
    DISTINCTCOUNT('Table'[ID])
RETURN
    SUMMARIZE(
        'Table',
        'Table'[Value],
        "Pct", DIVIDE(COUNT('Table'[Value]), numIDs)
    )
to get this result:
Value Pct
red 0.25
blue 0.75
green 0.75
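The respondent-based percentage can be sanity-checked with a small pandas sketch (an illustration outside Power BI, not DAX). It assumes "." in the CSV means no answer and drops those rows after unpivoting with melt, so person 5, who named nothing, is excluded from the denominator. The function name pct_of_respondents is hypothetical.

```python
import pandas as pd

# Survey data from the question; "." marks an empty answer (assumption)
raw = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "q1": ["red", "blue", "green", "blue", "."],
    "q2": ["blue", "green", ".", ".", "."],
    "q3": ["green", ".", ".", ".", "."],
})


def pct_of_respondents(df):
    # Unpivot q1..q3 into one row per (id, question, colour)
    long = df.melt(id_vars="id", var_name="Attribute", value_name="Value")
    # Drop empty answers so they count neither as mentions nor as respondents
    long = long[long["Value"] != "."]
    # Denominator: people who named at least one colour
    n_respondents = long["id"].nunique()
    # Numerator: distinct respondents mentioning each colour
    return long.groupby("Value")["id"].nunique() / n_respondents


if __name__ == "__main__":
    print(pct_of_respondents(raw))
```

One difference from the DAX: the numerator counts distinct respondents per colour rather than rows, so a person who listed the same colour twice would be counted once.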

Excel - Drop down list within a formula

I am sure this is an easy formula, but I am struggling. I have the following:
On tab 1, I want to enter a colour multiple times into column A using a drop-down list, and I want to pull the "How Many" value for each colour from a table on another sheet. When I use my formula with XLOOKUP (=XLOOKUP(A2,Sheet2!A2:A7,Sheet2!B2:B7)), it works for the top 4 rows but not for the rest. Can someone help? I have also tried the IF formula etc., but with no success.
A B
Colours How Many
Black 17
Yellow 765
Purple 65
Orange 43
Red #N/A
Green #N/A
Purple #N/A
Orange #N/A
Sheet 2 table:
Colours How Many
Red 34
Black 17
Green 32
Yellow 765
Purple 65
Orange 43
I hope this makes sense.
Thanks in advance
Wayne
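I figured it out:
=VLOOKUP(A2,Sheet2!$A$1:$B$7, 2, FALSE)
What fixes it is the $ signs, which make the lookup range absolute. Without them, the range shifts down one row each time the formula is filled down (A2:A7 becomes A3:A8, A4:A9, and so on), so from row 6 onwards the shifted range no longer includes the row with the colour being looked up. The original XLOOKUP works too once its ranges are absolute: =XLOOKUP(A2,Sheet2!$A$2:$A$7,Sheet2!$B$2:$B$7)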
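The exact pattern in the question (first four rows work, the rest return #N/A) can be reproduced with a toy Python model of filling a formula down. The helpers lookup and fill_down are hypothetical, not Excel functions: each Sheet2 row is keyed by its Excel row number, a relative range moves down one row per filled row, and an absolute range stays put.

```python
# Sheet2 table keyed by Excel row number (row 1 is the header)
SHEET2 = {1: ("Colours", "How Many"), 2: ("Red", 34), 3: ("Black", 17),
          4: ("Green", 32), 5: ("Yellow", 765), 6: ("Purple", 65),
          7: ("Orange", 43)}

# Colours entered in column A, rows 2..9 of tab 1
COLOURS = ["Black", "Yellow", "Purple", "Orange",
           "Red", "Green", "Purple", "Orange"]


def lookup(value, first_row, last_row):
    """Exact-match lookup over Sheet2 rows first_row..last_row."""
    for row in range(first_row, last_row + 1):
        if row in SHEET2 and SHEET2[row][0] == value:
            return SHEET2[row][1]
    return "#N/A"


def fill_down(colours, absolute):
    results = []
    for offset, colour in enumerate(colours):
        # Relative A2:A7 slides down one row per filled row;
        # absolute $A$2:$A$7 never moves.
        shift = 0 if absolute else offset
        results.append(lookup(colour, 2 + shift, 7 + shift))
    return results


if __name__ == "__main__":
    print(fill_down(COLOURS, absolute=False))
    print(fill_down(COLOURS, absolute=True))
```

The relative version returns 17, 765, 65, 43 followed by four #N/A values, which is the same column shown in the question.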

Pandas and regex, decomposing text and numbers into several columns with headings

I have a dataframe with a column containing:
1 Tile 1 up Red 2146 (75) Green 1671 (75)
The number 1 can be up to 10.
up can also be down.
The 2146 and 1671 can be any number up to 9999.
What's the best way to break each of these out into separate columns without using split? I was looking at regex but am not sure how to handle this (especially the white space). I liked the idea of putting the new column names in too, and started with:
Pixel.str.extract(r'(?P<num1>\d)(?P<text>[Tile])(?P<Tile>\d)')
Thanks for any help
To avoid an overly complicated regex pattern, perhaps you can use str.extractall to get all numbers, and then concat to your current df. For up or down, use str.findall:
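import pandas as pd

df = pd.DataFrame({"title":["1 Tile 1 up Red 2146 (75) Green 1671 (75)",
                            "10 Tile 10 down Red 9999 (75) Green 9999 (75)"]})
# Every number becomes its own column (one column per match), joined back onto df
df = pd.concat([df, df["title"].str.extractall(r'(\d+)').unstack().loc[:,0]], axis=1)
# First whole-word "up" or "down" in each row
df["direction"] = df["title"].str.findall(r"\bup\b|\bdown\b").str[0]
print(df)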
Output:
title 0 1 2 3 4 5 direction
0 1 Tile 1 up Red 2146 (75) Green 1671 (75) 1 1 2146 75 1671 75 up
1 10 Tile 10 down Red 9999 (75) Green 9999 (75) 10 10 9999 75 9999 75 down
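If you prefer the named-column approach from the question, a single pattern passed to str.extract can do it, since each named group becomes a column. This sketch assumes every row follows exactly the layout in the question (number, Tile, number, up/down, Red value (count), Green value (count)); the group names are illustrative. \s+ absorbs any run of white space between tokens.

```python
import pandas as pd

# One pattern with named groups; assumes the fixed layout from the question
PATTERN = (
    r"(?P<num1>\d+)\s+Tile\s+(?P<tile>\d+)\s+(?P<direction>up|down)\s+"
    r"Red\s+(?P<red>\d+)\s+\((?P<red_n>\d+)\)\s+"
    r"Green\s+(?P<green>\d+)\s+\((?P<green_n>\d+)\)"
)

df = pd.DataFrame({"title": ["1 Tile 1 up Red 2146 (75) Green 1671 (75)",
                             "10 Tile 10 down Red 9999 (75) Green 9999 (75)"]})

# str.extract returns one column per named group (values are strings)
parts = df["title"].str.extract(PATTERN)
df = df.join(parts)
print(df)
```

Rows that do not match the layout get NaN in every new column, which makes malformed rows easy to spot.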

How to measure the length (in pixel) for each pole in an image

I want to measure the height and width of each individual pole in pixels.
The poles do not always stand straight, but I need each pole's height measured from the horizontal ground. Can anyone guide me on how to handle this?
Note: I might also need to get the angle at which each pole leans later on. I'm not sure I can ask so many questions here, but I would greatly appreciate any help.
The sample image I have is at the link below:
This should give you a good idea how to do it:
#!/usr/local/bin/python3
import cv2

# Open image in greyscale mode
img = cv2.imread('poles.png', cv2.IMREAD_GRAYSCALE)

# Threshold image to pure black and white AND INVERT because findContours looks for WHITE objects on a black background
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Find contours; [-2] picks the contour list in both OpenCV 3 (3 return values) and OpenCV 4 (2 return values)
contours = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]

# Print the bounding box (x, y, width, height) of each contour
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    print(x, y, w, h)
The output is this, where each line corresponds to one vertical bar in your image:
841 334 134 154 <--- bar 6 is 154 pixels tall
190 148 93 340 <--- bar 2 is 340 pixels tall
502 79 93 409 <--- bar 4 is 409 pixels tall
633 55 169 433 <--- bar 5 is 433 pixels tall
1009 48 93 440 <--- bar 7 is 440 pixels tall
348 48 93 440 <--- bar 3 is 440 pixels tall
46 46 93 442 <--- bar 1 is 442 pixels tall (leftmost bar)
The first column is the distance from the left edge of the image, the second is the distance from the top edge, the third is the width, and the last column is the height of the bar in pixels.
As you seem unsure whether you want to do this in Python or C++, you may prefer not to write any code at all, in which case you can simply use ImageMagick, which is included in most Linux distros and is available for macOS and Windows.
Basically, you use "Connected Component" analysis by typing this into the Terminal:
convert poles.png -colorspace gray -threshold 50% \
-define connected-components:verbose=true \
-connected-components 8 null:
Output
Objects (id: bounding-box centroid area mean-color):
0: 1270x488+0+0 697.8,216.0 372566 srgb(255,255,255)
1: 93x442+46+46 92.0,266.5 41106 srgb(0,0,0)
2: 93x440+348+48 394.0,267.5 40920 srgb(0,0,0)
3: 93x440+1009+48 1055.0,267.5 40920 srgb(0,0,0)
4: 169x433+633+55 717.3,271.0 40269 srgb(0,0,0)
5: 93x409+502+79 548.0,283.0 38037 srgb(0,0,0)
6: 93x340+190+148 236.0,317.5 31620 srgb(0,0,0)
7: 134x154+841+334 907.4,410.5 14322 srgb(0,0,0)
That gives you a header line that tells you what all the fields are, then a line for each of the blobs it found in the image. Disregard the first one, because that is the white background; you can see that from the last field, which is srgb(255,255,255).
So, if we look at the last line, it is a blob that is 134 pixels wide and 154 pixels tall, starting at x=841 and y=334 from the top-left corner, i.e. it corresponds to the first contour that OpenCV found.
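Neither answer covers the lean angle mentioned in the question. Note that the bounding-box height h is already the vertical extent of a pole, i.e. its height above horizontal ground even when it leans. For the angle, one option (swapped in here, not taken from the answers) is principal component analysis of each pole's pixel coordinates with NumPy. The sketch assumes a boolean mask containing exactly one pole; measure_pole and the synthetic test helper slanted_pole are hypothetical.

```python
import numpy as np


def measure_pole(mask):
    """Return (height_px, tilt_deg) for one pole in a boolean mask.

    height_px is the vertical extent, i.e. the height above a horizontal
    ground line (the same quantity as h from a bounding box).
    tilt_deg is the angle between the pole's main axis and the vertical,
    estimated by principal component analysis of the pixel coordinates.
    """
    ys, xs = np.nonzero(mask)
    height_px = int(ys.max() - ys.min() + 1)
    # 2x2 covariance matrix of the (x, y) pixel coordinates
    cov = np.cov(np.vstack([xs, ys]).astype(float))
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Eigenvector with the largest eigenvalue = direction of the long axis
    vx, vy = eigvecs[:, np.argmax(eigvals)]
    tilt_deg = float(np.degrees(np.arctan2(abs(vx), abs(vy))))
    return height_px, tilt_deg


def slanted_pole(height=80, width=5, shift_every=4):
    """Synthetic pole that moves 1 px to the right every shift_every rows."""
    mask = np.zeros((height + 20, 60), dtype=bool)
    for r in range(height):
        x0 = 20 + r // shift_every
        mask[10 + r, x0:x0 + width] = True
    return mask


if __name__ == "__main__":
    print(measure_pole(slanted_pole()))
```

On the real image, a per-pole mask could come from labels == i, where labels is the second value returned by cv2.connectedComponents(thresh).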

Excel if statement comparing string values

I want to use the IF formula to return a value if various conditions are met. For example, I have a supplier code, an item description and a rate; the Rate field is populated using VLOOKUP from another table that has only Supplier_code and Rate.
I then want to use a formula to return the Rate in the Actual_Rate column only when the item description doesn't contain certain values (here, Pen or Book).
Supplier_code Item Description Rate
1234 Pen Red 5.00
1234 Pen Blue 5.00
1234 Pen Black 5.00
1234 Book Black 5.00
1234 Book Blue 5.00
1234 Ruler Red 5.00
1234 Ruler Blue 5.00
The formula I'm trying is below; it should only populate the rate if the item is a ruler, but it doesn't work.
=if(and(a2=1234,b2="Book*',b2="Pen*"),"0", C2))
Result expected:
Supplier_code Item Description Rate Actual_Rate
1234 Pen Red 5.00 0
1234 Pen Blue 5.00 0
1234 Pen Black 5.00 0
1234 Book Black 5.00 0
1234 Book Blue 5.00 0
1234 Ruler Red 5.00 5.00
1234 Ruler Blue 5.00 5.00
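I believe your requirement is to populate the rate only for rulers. If that's the case, use the formula below. A plain = comparison does not treat * as a wildcard, so this version compares the first five characters with LEFT instead, returns C2 for rulers and 0 for everything else, and drops the extra closing parenthesis:
=IF(AND(A2=1234,LEFT(B2,5)="Ruler"),C2,0)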
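The expected Actual_Rate column follows a simple rule: keep the rate for supplier 1234 rulers and return 0 otherwise. As a cross-check outside Excel, here is that rule as a pandas sketch using the sample table from the question; str.startswith stands in for the "Ruler*" wildcard, and the column names are taken from the question's headers.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    "Supplier_code": [1234] * 7,
    "Item Description": ["Pen Red", "Pen Blue", "Pen Black", "Book Black",
                         "Book Blue", "Ruler Red", "Ruler Blue"],
    "Rate": [5.00] * 7,
})

# Keep the rate only for supplier 1234 rulers; everything else becomes 0
is_ruler = (df["Supplier_code"] == 1234) & df["Item Description"].str.startswith("Ruler")
df["Actual_Rate"] = df["Rate"].where(is_ruler, 0.0)
print(df)
```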