I've searched far and wide for a solution to my problem over several long weeks now. I've come up with a few partially working solutions, which I'll include at the bottom for those who might know how to modify or extend them to resolve the problem.
Here's what I'm trying to accomplish (the following descriptions are in reference to this screen capture http://imgur.com/oTkbjrw):
1) I am looking to match values from Column G with Column A ($G2 with $A$2:$A$10).
2) If the values match, then extract unique values (excluding blanks) from adjacent Columns B, C, and D, and return them across columns (H through M).
So, as shown in the 'desired results 1' table (G1:M10), if G2 (Vancouver) is found in A2:A10, then extract the indexed values from columns B, C, and D where Vancouver is a match (i.e., Blue, ,Green,Green,Red,Yellow,Teal,Green, , , ,Pink,Pink,Pink, ,) and return only the unique values (excluding blanks) across columns H through M (Blue, Green, Red, Yellow, Teal, Pink).
I've been looking for a formula solution that can be popped into H2 and then dragged horizontally to M2 to reveal the results. I'm not married to this solution, however. An alternative that I've also considered is to return the unique values in comma-delimited format rather than across columns (see 'desired Results 2'). If this is easier to accomplish, I'm all for it.
Note: I will be running this formula on 10,000+ rows. A lean/efficient solution is desirable if possible.
PARTIAL SOLUTIONS I'VE PUT TOGETHER:
1) {Array formula 1}, results can be seen here:
=IFERROR(INDEX($B$2:$B$10, SMALL(IF(COUNTIF($G2,$A$2:$A$10), MATCH(ROW($B$2:$B$10), ROW($B$2:$B$10)), ""), COLUMN(A1))),"")
This formula is only able to index column B, but it successfully matches and returns values across columns. Unfortunately, it doesn't extract unique values and returns blank cells.
2) Re: Justin's comment about a likely VBA solution, figure I should pop up one of the VBA solutions I came across.
Function UNIQUE_PH(Lookupvalue As String, LookupRange As Range, ColumnNumber As Integer)
    Dim i As Long
    Dim Result As String
    For i = 1 To LookupRange.Columns(1).Cells.Count
        If LookupRange.Cells(i, 1) = Lookupvalue Then
            For J = 1 To i - 1
                If LookupRange.Cells(J, 1) = Lookupvalue Then
                    If LookupRange.Cells(J, ColumnNumber) = LookupRange.Cells(i, ColumnNumber) Then
                        GoTo Skip
                    End If
                End If
            Next J
            Result = Result & " " & LookupRange.Cells(i, ColumnNumber) & ","
Skip:
        End If
    Next i
    UNIQUE_PH = Left(Result, Len(Result) - 1)
End Function
'It takes 3 arguments as inputs:
'1. Lookupvalue – A string that we need to look-up in a range of cells.
'2. LookupRange – An array of cells from where we need to fetch the data
'3. ColumnNumber – It is the column number of the table/array from which matching value is to be returned (e.g. 2 for second column).
This basically does what the above formula accomplishes, with the exception that it successfully identifies unique values. It presents the results in one cell as comma delimited (similar to 'desired Results 2'). This isn't able to search through multiple columns and doesn't remove blank cells. I also tried to run it on 10,000 rows of data and it was very slow. I know very little about VBA, so I'm not sure what could be contributing to the lag.
Sample Data solution
'in G2
=A2
'in H2 as an array formula with CSE
=IFERROR(INDEX($B$2:$B$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($B$2:$B$16)), COUNTIF($G2:G2, $B$2:$B$16), 1), 1), 0), 1),
IFERROR(INDEX($C$2:$C$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($C$2:$C$16)), COUNTIF($G2:G2, $C$2:$C$16), 1), 1), 0), 1),
IFERROR(INDEX($D$2:$D$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($D$2:$D$16)), COUNTIF($G2:G2, $D$2:$D$16), 1), 1), 0), 1),
TEXT(,))))
'in G19
=G2
'in H19
=TEXTJOIN(",", TRUE, H2:N2)
Fill H2 right, then fill G2:N2 down as appropriate. Fill G19:H19 down to collate the values above. See footnote ¹ if you get a #NAME? error on the TEXTJOIN function.
10K Rows of Data solution
Array formulas get dramatically more expensive to calculate as the referenced ranges increase in size. The only way to do this as an array formula is to only reference the rows with Vancouver when you are creating a unique list for Vancouver and only reference the rows with Seattle when you are creating a unique list for Seattle. In other words, don't reference rows 2:10000 for Vancouver when Vancouver's data is in rows 2:6.
Note: you will have to sort your data with column A as the primary key. This operation requires it. Unsorted data (even grouped) will not allow the second MATCH function to locate the terminating row.
The rows of data in column B containing Vancouver in column A can be referenced with this.
INDEX(B:B, MATCH("vancouver", A:A, 0)):INDEX(B:B, MATCH("vancouver", A:A))
Now all you have to do is replace all occurrences of $B$2:$B$16 with the above. Adjust the formula and make the replacements for $C$2:$C$16 and $D$2:$D$16 as well. The bonus is that you can get rid of the check to see IF($A$2:$A$16=$G2, ... since by definition, you are only referencing rows with the appropriate city in column A. Make sure you reference G2 and do not hardcode in Vancouver.
'in H2 as an array formula with CSE
=IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
TEXT(,))))
You should be able to live with the calculation time on that array formula even through 10K rows.
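The idea behind the narrowed ranges can also be sketched outside Excel. The Python below is only an illustration under stated assumptions: the function names and the sample rows are hypothetical, bisect stands in for the two MATCH calls on sorted data, and comparisons are case-sensitive (Excel's MATCH is not).

```python
from bisect import bisect_left, bisect_right

def block_bounds(sorted_keys, key):
    """Return (start, stop) of the contiguous run of `key` in a sorted list.

    Plays the role of the exact MATCH (first row) and the approximate
    MATCH (last row) in the formula; requires the keys to be sorted.
    """
    return bisect_left(sorted_keys, key), bisect_right(sorted_keys, key)

def unique_values_for(rows, key):
    """rows: list of (city, b, c, d) tuples sorted by city.

    Only the rows belonging to `key` are scanned. Non-blank values are
    collected column by column (B, then C, then D), keeping the first
    occurrence of each, mirroring the nested IFERROR order.
    """
    keys = [r[0] for r in rows]
    start, stop = block_bounds(keys, key)
    seen = []
    for col in (1, 2, 3):
        for r in rows[start:stop]:
            value = r[col]
            if value != "" and value not in seen:
                seen.append(value)
    return seen

# Hypothetical sample data, sorted by city.
rows = [
    ("Seattle", "Red", "", "Blue"),
    ("Seattle", "", "Red", ""),
    ("Vancouver", "Blue", "Green", ""),
    ("Vancouver", "", "Green", "Pink"),
    ("Vancouver", "Red", "", "Pink"),
]
vancouver = unique_values_for(rows, "Vancouver")
```

The point of the sketch is that the work per city depends on the size of that city's block, not on the total number of rows.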
I'm pretty sure that this is as far as native worksheet functions can go. Further improvements would be using variant memory arrays.
Addendum²
There is one further optimization. A worksheet IF only processes the branch of the formula that applies. If you compare the city names and only process the array portion of the formula when the city name changes, copying the result directly from the row above when it is the same, you should be able to limit the calculations further.
'in H2 as an array formula with CSE
=IF($G2=$G1, H1,
IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
TEXT(,)))))
Addendum³
There was a problem with single city entries and this amendment takes care of that.
You may receive a Circular Reference warning with this. Technically, the warning is true, but you will never actually get a circular reference due to the nested IF structure. In other words, the circular reference is only valid when an IF is false. Excel reports this because it does not examine the conditions, only the possibility of a circular reference, which technically is there but can never actually occur.
'in H2 as an array formula with CSE
=IF($G2=$G1, H1, IF(COUNTIF($A:$A, $G2)=1,
IFERROR(INDEX(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0), MATCH(0, IF(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)<>"", COUNTIF($G2:G2, INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)), 1), 0)), TEXT(,)),
IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
TEXT(,))))))
¹ If your Excel version does not support the TEXTJOIN function, search this site for [excel][textjoin] to find alternatives in ragged-length, delimited string concatenation.
Try this UDF
Function UniquePh(Lookupvalue As String, LookupRange As Range, ValueRng As Range) As String
    Dim dict As Object
    Dim lkpArr() As Variant
    Dim ValArr() As Variant
    Dim i As Long, j As Long
    Dim itm As Variant
    'Trim whole-column references down to the used range
    Set LookupRange = Intersect(LookupRange, LookupRange.Parent.UsedRange)
    Set ValueRng = Intersect(ValueRng, ValueRng.Parent.UsedRange)
    If LookupRange.Rows.Count <> ValueRng.Rows.Count Or LookupRange.Columns.Count > 1 Then Exit Function
    Set dict = CreateObject("Scripting.Dictionary")
    'Read both ranges into memory arrays once
    lkpArr = LookupRange.Value
    ValArr = ValueRng.Value
    For i = LBound(lkpArr, 1) To UBound(lkpArr, 1)
        If lkpArr(i, 1) = Lookupvalue Then
            For j = LBound(ValArr, 2) To UBound(ValArr, 2)
                If ValArr(i, j) <> "" Then
                    'Adding a duplicate key raises an error, which is ignored
                    On Error Resume Next
                    dict.Add ValArr(i, j), ValArr(i, j)
                    On Error GoTo 0
                End If
            Next j
        End If
    Next i
    For Each itm In dict
        UniquePh = UniquePh & itm & ", "
    Next itm
    If Len(UniquePh) > 0 Then
        UniquePh = Left(UniquePh, Len(UniquePh) - 2)
    Else
        UniquePh = ""
    End If
End Function
Put it in a module attached to the workbook. Then call it from the sheet. Put this formula in H2:
=UniquePh(G2,$A$2:$A$10,$B$2:$D$10)
It should run a lot faster.
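The speed-up presumably comes from two things: the ranges are read into memory once, and a dictionary replaces the repeated scan over earlier rows. A minimal Python sketch of the same single-pass idea follows. It is not a translation of the UDF; the name unique_ph and the sample data are hypothetical, and a plain dict (which keeps insertion order in Python 3.7+) stands in for Scripting.Dictionary.

```python
def unique_ph(lookup_value, lookup_col, value_rows):
    """Join the unique non-blank values from rows whose key matches.

    lookup_col: list of keys, one per row (like column A).
    value_rows: list of lists, one list of cells per row (like B:D).
    Returns "" when the two inputs differ in length, as the UDF does
    when the row counts do not match.
    """
    if len(lookup_col) != len(value_rows):
        return ""
    seen = {}  # dict keys act as an insertion-ordered set
    for key, values in zip(lookup_col, value_rows):
        if key == lookup_value:
            for value in values:
                if value != "":
                    seen.setdefault(value, None)  # no-op for duplicates
    return ", ".join(seen)

# Hypothetical sample data.
cities = ["Vancouver", "Seattle", "Vancouver"]
colours = [["Blue", "", "Green"], ["Red", "", ""], ["Green", "Pink", ""]]
result = unique_ph("Vancouver", cities, colours)
```

Each cell is visited once and each membership check is a hash lookup, so the cost grows roughly linearly with the number of cells rather than with the square of the row count.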
I'm trying to use sapply instead of a 'for' loop, but I'm not getting the result I expect. I've tested each line separately and the code works, but it does not when I use sapply. I'm looking for some hints on what might be wrong:
event <- c('Astronomical Low Tide', 'Avalanche', 'Blizzard', 'Coastal Flood',
'Cold/Wind Chill', 'Debris Flow', 'Dense Fog', 'Dense Smoke', 'Drought',
'Dust Devil', 'Dust Storm','Excessive Heat', 'Extreme Cold/Wind Chill',
'Flash Flood', 'Flood', 'Frost/Freeze', 'Funnel Cloud', 'Freezing Fog',
'Hail', 'Heat', 'Heavy Rain', 'Heavy Snow', 'High Surf', 'High Wind',
'Hurricane/Typhoon', 'Ice Storm', 'Lake/Effect Snow', 'Lakeshore Flood',
'Lightning', 'Marine Hail', 'Marine High Wind', 'Marine Strong Wind',
'Marine Thunderstorm Wind', 'Rip Current', 'Seiche', 'Sleet',
'Storm Surge/Tide', 'Strong Wind', 'Thunderstorm Wind', 'Tornado',
'Tropical Depression', 'Tropical Storm', 'Tsunami', 'Volcanic Ash',
'Waterspout', 'Wildfire', 'Winter Storm', 'Winter Weather')
replace <- function(dt, x, col) {
idx <- grep(paste('(?i)', event[x], sep = ''), dt[, col])
dt[idx, col] <- event[x]
}
sapply(1:length(event), function(x) replace(stormdata, x, 8))
Basically, what I'm trying to do is use every value in the event variable as a pattern for the grep function inside the custom replace function. I then get the indices of the rows that matched the pattern and store them in the idx variable. After that, I want to replace the rows in the data frame that correspond to the idx values with the value contained in the event variable.
I'm trying to create a loop with the sapply function that uses every value in the event variable, so I want a loop that runs 48 times, looking for each pattern in the 8th column of the data frame stormdata and replacing the matches. BUT my code does nothing: after running it the data remains the same, with no substitutions. When I run each line separately without the sapply, it works.
I've been looking everywhere and I can't find why it isn't working. Help.
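Your function modifies its own local copy of dt, so stormdata itself is never changed. Try using global assignment, e.g. stormdata[idx, col] <<- event[x], in your function. It is not clean, but it will probably work.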
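A cleaner pattern than global assignment is to have the function return the updated data and reassign the result. The sketch below shows that shape in Python rather than R, purely as an illustration; the name normalize_events and the sample values are hypothetical, and re.escape is an assumption on my part (the original passes the event name to grep as a raw pattern).

```python
import re

def normalize_events(values, events):
    """Return a new list in which any value containing an event name
    (case-insensitively) is replaced by the canonical event name.

    The input list is left untouched; the caller reassigns the result.
    """
    result = list(values)
    for name in events:
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        result = [name if pattern.search(v) else v for v in result]
    return result

# Hypothetical sample column and event names.
column = ["HAIL 1.75", "tornado f2", "Drought"]
cleaned = normalize_events(column, ["Hail", "Tornado"])
```

The same shape applies in R: return dt from the function and assign the returned value back to stormdata on each iteration.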
I have a function that cycles through two separate lists and combines them into one as follows:
spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) },
{i, 1, Length[gld], 1}]
This works fine, and generates answers in the form:
{{2009, 6, 1}, 52.72}
But when I add a subtraction, as follows:
spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) - meanspread },
{i, 1, Length[gld], 1}]
I get answers in the format:
{{2009, 6, 1}, {-20.2896}}
This causes issues when I want to use DateListPlot (all the data is at the extreme right of the graph, and the graph is not usable).
Can anyone suggest what might be happening here and how I may avoid it?
Thanks!
Most likely meanspread is not a number, but a single-item list, such as {1.1}. It's impossible to tell without knowing more details and having a sample of all data/variables you're using.
I can't reproduce this after recreating your inputs as best I can. It really depends on how you're computing meanspread.
(*In[2]:= *)
gld = FinancialData["NYSE:GLD", "Close", {"June 1, 2009", DateString[], "Day"}];
gdx = FinancialData["NYSE:GDX", "Close", {"June 1, 2009", DateString[], "Day"}];
(*In[5]:= *)
First[spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]])}, {i, 1, Length[gld], 1}]]
(*Out[5]= *)
{{2009, 6, 1}, 52.72}
(*In[8]:= *)
meanspread = Mean[spread[[All, 2]]]
(*Out[8]= *)
74.0373
(*In[9]:= *)
First[Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) - meanspread}, {i, 1, Length[gld], 1}]]
(*Out[9]= *)
{{2009, 6, 1}, -21.3173}
I think you would benefit from a simpler construction.
spread = {gld[[All, 1]], gld[[All, 2]] - gdx[[All, 2]] - meanspread}\[Transpose]
As already said, if meanspread is a single numerical value, and not a list, the output should be correct.