Return all possible pair combinations of values for a single column in SQL

I would like to get all DISTINCT pair combinations from the following table:
Table Name: Dancer
id name
1 Yaniv
2 Dan
3 Eli
4 Guy
5 Sara
6 Naama
7 Suzi
8 Vered
The results should look like this:
pairs
Yaniv Dan
Yaniv Eli
Yaniv Guy
Yaniv Sara
Yaniv Naama
Yaniv Suzi
Yaniv Vered
Dan Eli
Dan Guy
Dan Sara
Dan Naama
Dan Suzi
Dan Vered
Guy Sara
Guy Naama
Guy Suzi
Guy Vered
Sara Naama
Sara Suzi
Sara Vered
Naama Suzi
Naama Vered
Suzi Vered
I tried a CROSS JOIN with a WHERE clause to eliminate identical-name pairs such as Yaniv Yaniv, Dan Dan, etc.
But I still get mirrored pairs such as:
Yaniv Dan
Dan Yaniv
How can I filter out these mirrored pairs?
This is my SQL code:
Select D2.name + ' ' + D1.name
From Dancer D1
Cross join
Dancer D2
Where D1.name<>D2.name
Hope my question is clear enough.

If your goal is to pair every name in the column with every other name in the same column, that is a Cartesian (cross) join.
It would look something like the following:
Select a.Name + ' ' + b.Name as ResultName
from Dancer as a, Dancer as b
This works if your Name column contains single names as you described: John, Yaniv, Dan would result in John Yaniv, John Dan, Yaniv John, Yaniv Dan, and so on. Note that it returns every ordered pair, so mirrored pairs such as John Yaniv and Yaniv John both appear, as do self-pairs such as John John. If, however, your names have multiple parts, like Sarah T, each full name is paired as a unit with the other names, for example Sarah T John, Sarah T Yaniv, etc. It will not break Sarah and T into separate results.
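To keep each pair only once, one common approach (not part of the answer above) is to join the table to itself on an inequality over a unique column, such as D1.id < D2.id, which drops both self-pairs and mirrored pairs. The following is a minimal sketch using Python's built-in sqlite3 module and the Dancer table from the question. It assumes SQLite, which concatenates strings with || where the question's dialect uses +.

```python
import sqlite3

# In-memory database holding the Dancer table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dancer (id INTEGER PRIMARY KEY, name TEXT)")
names = ["Yaniv", "Dan", "Eli", "Guy", "Sara", "Naama", "Suzi", "Vered"]
conn.executemany(
    "INSERT INTO Dancer (id, name) VALUES (?, ?)",
    list(enumerate(names, start=1)),
)

# D1.id < D2.id keeps exactly one ordering of every pair and no self-pairs.
query = """
SELECT D1.name || ' ' || D2.name AS pairs
FROM Dancer AS D1
JOIN Dancer AS D2 ON D1.id < D2.id
ORDER BY D1.id, D2.id
"""
pairs = [row[0] for row in conn.execute(query)]

print(len(pairs))   # 28 pairs for 8 names
print(pairs[:3])    # ['Yaniv Dan', 'Yaniv Eli', 'Yaniv Guy']
```

Comparing on id rather than name also behaves sensibly if two dancers happen to share a name.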

Pyspark: Create additional column based on Regex

I recently started using PySpark and I'm trying to figure out regex matching.
I've put the regexes in a list; if any item in the list is found in the name column, the added column must be true. The matching must not be case sensitive, as seen in the example below.
I have a Table with the following format:
seqno name
1 john jones
2 John Jones
3 John Stones
4 Mary Wild
5 William Wurt
6 steven wurt
I need to change the Table above to the format of the Table below. This is just a small part of the actual table so hard coding is not going to cut it unfortunately.
seqno name regex
1 john jones True
2 John Jones True
3 John Stones True
4 Mary Wild False
5 William Wurt True
6 steven wurt True
Here is the code to create part of the Table:
regex_list = ['john', 'wurt']
columns = ['seqno', 'name']
data = [('1', 'john jones'),
('2', 'John Jones'),
('3', 'John Stones'),
('4', 'Mary Wild'),
('5', 'William Wurt'),
('6', 'steven wurt')]
df = spark.createDataFrame(data=data, schema=columns)
I've been trying numerous applications with .isin and .rlike but can't seem to make it work. Any help would be gladly appreciated.
Thanks in advance!
Use rlike to check whether any of the listed regexes matches the name. To make the test case-insensitive, upper-case both the patterns in the list and the column before matching. Code below:
from pyspark.sql.functions import col, upper
df.withColumn('regex', upper(col('name')).rlike('|'.join([x.upper() for x in regex_list]))).show()
+-----+------------+-----+
|seqno|        name|regex|
+-----+------------+-----+
|    1|  john jones| true|
|    2|  John Jones| true|
|    3| John Stones| true|
|    4|   Mary Wild|false|
|    5|William Wurt| true|
|    6| steven wurt| true|
+-----+------------+-----+
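The same matching logic can be checked without a Spark session. This is a minimal pure-Python sketch, assuming the regex_list and names from the question. It joins the patterns with | as in the answer, but uses re.IGNORECASE instead of upper-casing both sides, which also avoids changing the meaning of escapes such as \d if a pattern were upper-cased.

```python
import re

regex_list = ["john", "wurt"]
names = [
    "john jones",
    "John Jones",
    "John Stones",
    "Mary Wild",
    "William Wurt",
    "steven wurt",
]

# One alternation pattern; search() finds a match anywhere in the string,
# which mirrors the partial-match behaviour of rlike.
pattern = re.compile("|".join(regex_list), re.IGNORECASE)
flags = [bool(pattern.search(name)) for name in names]

for name, flag in zip(names, flags):
    print(name, flag)
```

Only "Mary Wild" comes out False, matching the table in the question.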

Google Sheets ARRAYFORMULA count preceding rows that meet condition

Let's say I have a spreadsheet that looks something like this:
Name D-List
--------------------- ------
Arnold Schwarzenegger
Bruce Willis
Dolph Lundgren
Dwayne Johnson
Jason Statham
Keanu Reeves
Samuel L. Jackson
Sylvester Stallone
Vin Diesel
For the D-List column, I'd like to count the number of preceding rows (up to and including the current one) that contain the letter "d". If the row doesn't contain a "d", I want it to return an empty string.
For any given row, I can get this to work with the following pseudo formula:
=IF(REGEXMATCH(A<row>, "d"), COUNTIF(A<row>, "*d*"), "")
Name D-List
--------------------- ------
Arnold Schwarzenegger 1
Bruce Willis
Dolph Lundgren 2
Dwayne Johnson 3
Jason Statham
Keanu Reeves
Samuel L. Jackson
Sylvester Stallone
Vin Diesel 4
I can turn this into an expression that can be duplicated between rows by using INDIRECT and ROW:
=IF(REGEXMATCH(A2, "(?i)d"), COUNTIF(INDIRECT("A2:A" & ROW(A2)), "*D*"), "")
Name D-List
--------------------- ------
Arnold Schwarzenegger 1
Bruce Willis
Dolph Lundgren 2
Dwayne Johnson 3
Jason Statham
Keanu Reeves
Samuel L. Jackson
Sylvester Stallone
Vin Diesel 4
However, if I try to stick it in an ARRAYFORMULA, it doesn't work.
=ARRAYFORMULA(IF(REGEXMATCH(A2:A, "D"), COUNTIF(INDIRECT("A2:A" & ROW(A2:A)), "*D*"), ""))
Name D-List
--------------------- ------
Arnold Schwarzenegger
Bruce Willis
Dolph Lundgren 1
Dwayne Johnson 1
Jason Statham
Keanu Reeves
Samuel L. Jackson
Sylvester Stallone
Vin Diesel 1
What am I missing?
Try:
=ARRAYFORMULA(IF(
  REGEXMATCH(A2:A, "(?i)d"),
  COUNTIFS(
    REGEXMATCH(A2:A, "(?i)d"), REGEXMATCH(A2:A, "(?i)d"),
    ROW(A2:A), "<="&ROW(A2:A)), ))
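For each row, the COUNTIFS above counts the rows whose match flag equals that row's flag and whose row number is at most the current one, so for matching rows it works as a running count of matches. A minimal Python sketch of that logic, using the names from the question (the variable names are illustrative only):

```python
import re

names = [
    "Arnold Schwarzenegger",
    "Bruce Willis",
    "Dolph Lundgren",
    "Dwayne Johnson",
    "Jason Statham",
    "Keanu Reeves",
    "Samuel L. Jackson",
    "Sylvester Stallone",
    "Vin Diesel",
]

pattern = re.compile("d", re.IGNORECASE)  # same test as "(?i)d"

d_list = []
running = 0
for name in names:
    if pattern.search(name):
        running += 1            # one more matching row seen so far
        d_list.append(running)
    else:
        d_list.append("")       # non-matching rows stay empty

print(d_list)  # [1, '', 2, 3, '', '', '', '', 4]
```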

Calculated Column to DAX

I have a calculated column using this code:
SalesMember_Cal_Column =
VAR ContextID = Table1[Id]
RETURN
    CONCATENATEX (
        CALCULATETABLE (
            DISTINCT ( Table1[salesmember] ),
            FILTER ( Table1, Table1[Id] = ContextID )
        ),
        Table1[salesmember],
        ","
    )
This works fine, but I would like to use it as a measure.
What can I change to use this as a measure?
Example:
I have data that looks like this:
Company CompanyID SalesMember Role
Walmart 1 Ryan Lead
Walmart 1 Vinnie Lead2
Walmart 1 Danny Lead3
Winco 2 Ryan Lead
Winco 2 Vinnie Lead2
Winco 2 Danny Lead3
Fred Myer 3 Noelle Lead
Kroger 4 Dennis Lead
Albertsons 5 Nate Lead
Safeway 6 Carol Lead
I want to create a measure called SalesMember_Cal_Column that will give me this result:
Company CompanyID SalesMember Role SalesMember_Cal_Column
Walmart 1 Ryan Lead Ryan, Vinnie, Danny
Walmart 1 Vinnie Lead2 Ryan, Vinnie, Danny
Walmart 1 Danny Lead3 Ryan, Vinnie, Danny
Winco 2 Ryan Lead Ryan, Vinnie, Danny
Winco 2 Vinnie Lead2 Ryan, Vinnie, Danny
Winco 2 Danny Lead3 Ryan, Vinnie, Danny
Fred Myer 3 Noelle Lead Noelle
Kroger 4 Dennis Lead Dennis
Albertsons 5 Nate Lead Nate
Safeway 6 Carol Lead Carol
I want to make sure that when I slice to a company and then slice on a sales member, only the associated sales member shows up in the new column.
For example, if I were to slice the above table to Walmart and sales member Ryan the result would look like this:
Company CompanyID SalesMember Role SalesMember_Cal_Column
Walmart 1 Ryan Lead Ryan
I think you should be able to simply replace
VAR ContextID = Table1[Id]
with something like
VAR ContextID = SELECTEDVALUE ( Table1[Id] )
This will return the Id value from the filter context (or return blank if there are multiple).

Using Lookup or Index - If a certain placing, then place the name

I would like to show the name of the competitor who placed first. In other cells, I would like the same for second through fifth place.
The reason is that there are 27 divisions, each on a different worksheet. It would be easier to have the top five placings of every division on one sheet for the announcer and for handing out trophies.
I am unable to provide a picture until I have a rep of 10. Therefore, the data is provided below.
Thank you so much for your time and help!
Competitor Name (column B)   Placings (column C)
Brown, Sam                   5
Simmons, Donald              4
Smith, John                  2
Doe, John                    6
Lee, Joe                     8
Smith, Joey                  7
Smith, Joey                  1
Smith, Joey                  3
I figured out the formula, but beforehand I had to make sure the data was sorted by placing in ascending order:
=LOOKUP(1,C1:C8,B1:B8)
Formula returned - Smith, Joey
=LOOKUP(2,C1:C8,B1:B8)
Formula returned - Smith, John
I figured out another formula so the numbers do not need to be in any particular order:
=INDEX(B1:B8,MATCH(1,C1:C8,0),1)
Formula returned - Smith, Joey
=INDEX(B1:B8,MATCH(2,C1:C8,0),1)
Formula returned - Smith, John
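The INDEX/MATCH pair works in two steps: MATCH finds the position of the placing in column C, and INDEX returns the name at that position in column B. A minimal Python sketch of the same lookup, using the data from the question (the function name name_for_placing is made up for illustration):

```python
names = [
    "Brown, Sam",
    "Simmons, Donald",
    "Smith, John",
    "Doe, John",
    "Lee, Joe",
    "Smith, Joey",
    "Smith, Joey",
    "Smith, Joey",
]
placings = [5, 4, 2, 6, 8, 7, 1, 3]


def name_for_placing(place):
    """Return the competitor with the given placing, in any row order."""
    position = placings.index(place)  # like MATCH(place, C1:C8, 0)
    return names[position]            # like INDEX(B1:B8, position)


top_five = [name_for_placing(p) for p in range(1, 6)]
print(top_five[0])  # Smith, Joey
print(top_five[1])  # Smith, John
```

As with the exact-match MATCH, the placings do not need to be sorted.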

Setting up negative to positive counts for time

Suppose a data set has monthly observations, and each person started a job in a different month. For example:
person date date_started date_count
Tim 1/1/2000 3/1/2000 -2
Tim 2/1/2000 3/1/2000 -1
Tim 3/1/2000 3/1/2000 0
John 1/1/2000 7/1/2000 -6
John 2/1/2000 7/1/2000 -5
John 3/1/2000 7/1/2000 -4
John 4/1/2000 7/1/2000 -3
John 5/1/2000 7/1/2000 -2
John 6/1/2000 7/1/2000 -1
John 7/1/2000 7/1/2000 0
John 8/1/2000 7/1/2000 1
John 9/1/2000 7/1/2000 2
John 10/1/2000 7/1/2000 3
Mary 3/1/2000 3/1/2000 0
Mary 4/1/2000 3/1/2000 1
What is the most efficient way to get the date_count column? I also have a column that is 1 in a person's first month and 0 otherwise. I would rather use that to make date_count.
I don't understand what the difficulty is here. The question seems poorly explained to me.
You mention months, but your example shows daily dates, so the role of months in the problem is a mystery.
The variable you want is just the difference between two daily dates. So long as you have two daily date variables (Dimitriy explains how to get those from string dates), it is just a subtraction.
(Added later) My uncertainty shows what happens when one assumes on an international list that local conventions are universal. There are two conventions easily confused, showing dates as day/month/year and showing dates as month/day/year. Evidently you are using the second convention. If so, the problem is to convert from daily dates to monthly dates using mofd(); then as said it is a subtraction.
I don't know if this is the optimal way, but I think it should work:
/* convert your dates to Stata's date format from strings */
gen date2=daily(date,"MDY")
gen date_started2=daily(date_started,"MDY")
format date2 date_started2 %td
/* this is the main code */
gen before = date_started2>date2
bys person before: egen date_count2 = rank(abs(date_started2 - date2))
replace date_count2 = date_count2 - 1 if before==0
replace date_count2 = -date_count2 if before==1
drop before
Edit:
Mea culpa. I completely misunderstood your question to mean that you wanted a countdown to start date for each person-observation event. You actually want something much simpler:
gen date_count2=mofd(daily(date,"MDY")) - mofd(daily(date_started,"MDY"))
This assumes that date and date_started are stored as string variables. daily() converts them to Stata's daily date format, and mofd() converts those to monthly dates. Then it's just the difference.
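The arithmetic behind the month difference can be sketched outside Stata as well. This is a minimal Python illustration, assuming month/day/year strings as in the question; month_index and date_count are hypothetical helper names, and the month index plays the role of mofd() (any fixed origin gives the same difference).

```python
from datetime import datetime


def month_index(date_string):
    """Number months consecutively so that subtraction counts whole months."""
    d = datetime.strptime(date_string, "%m/%d/%Y")
    return d.year * 12 + d.month


def date_count(date, date_started):
    """Months from the start month; negative before it, 0 in the start month."""
    return month_index(date) - month_index(date_started)


print(date_count("1/1/2000", "3/1/2000"))   # -2 (Tim's first row)
print(date_count("10/1/2000", "7/1/2000"))  # 3 (John's last row)
```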