Matching two lists in Excel

I am trying to compare two months' sales to each other in Excel in the most automated way possible (so that it will be quicker for future months).
This month's values are all worked out through formulae, and last month's will be copied and pasted into D:E. However, as you can see, there are some customers who made purchases last month but not this month (and vice versa). I basically need to have all the CustomerIDs matching row by row, so that it ends up like this:
Can anyone think of a good way of doing this without having to do it all manually? Thanks

Use the SUMIFS function or VLOOKUP. Like this:
http://screencast.com/t/VTBZrfHjo8tk
You should just have your entire customer list on one sheet and then add up the values associated with them month over month. The design you are describing is going to be a nightmare to maintain over time and serves no purpose. I can understand you would like to see the customers in a row like that, which is why I suggest SUMIFS.
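The single-list idea can also be built with a short macro. The following is only a sketch under assumptions that are not in the question: this month's IDs and values sit in A:B, last month's in D:E, data starts in row 3, and columns G:I are free for the result. BuildAlignedList and AddAmount are hypothetical names, and Scripting.Dictionary is available on Windows only.

```vba
Option Explicit

' Assumed layout (not from the original post): A:B = this month,
' D:E = last month, data from row 3, result written to G:I.
Public Sub BuildAlignedList()
    Dim totals As Object
    Dim ws As Worksheet
    Dim r As Long, lastRow As Long, i As Long
    Dim key As Variant
    Dim amounts As Variant
    Dim output() As Variant

    Set ws = ActiveSheet
    Set totals = CreateObject("Scripting.Dictionary")

    ' Slot 0 collects this month's values per customer.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 3 To lastRow
        AddAmount totals, ws.Cells(r, "A").Value, ws.Cells(r, "B").Value, 0
    Next r

    ' Slot 1 collects last month's values per customer.
    lastRow = ws.Cells(ws.Rows.Count, "D").End(xlUp).Row
    For r = 3 To lastRow
        AddAmount totals, ws.Cells(r, "D").Value, ws.Cells(r, "E").Value, 1
    Next r

    If totals.Count = 0 Then Exit Sub

    ' One output row per customer: ID, this month, last month.
    ReDim output(1 To totals.Count, 1 To 3)
    For Each key In totals.Keys
        i = i + 1
        amounts = totals(key)
        output(i, 1) = key
        output(i, 2) = amounts(0)
        output(i, 3) = amounts(1)
    Next key
    ws.Range("G3").Resize(totals.Count, 3).Value = output
End Sub

' Adds one amount to the given slot (0 or 1) of a customer's pair of totals.
Private Sub AddAmount(ByVal totals As Object, ByVal id As Variant, _
                      ByVal amount As Variant, ByVal slot As Long)
    Dim amounts As Variant
    Dim key As String

    key = Trim$(CStr(id))
    If Len(key) = 0 Then Exit Sub          ' skip blank ID cells

    If totals.Exists(key) Then
        amounts = totals(key)
    Else
        amounts = Array(0#, 0#)
    End If
    If IsNumeric(amount) Then amounts(slot) = amounts(slot) + CDbl(amount)
    totals(key) = amounts                  ' arrays are copied, so store it back
End Sub
```

Customers missing from one of the months get 0 in that month's column, so every CustomerID ends up on exactly one row.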

This option only compares two columns; I think you should approach it another way.
First I would add a date/month column, and then you can append the next month's values below:
Then you can use a simple pivot table to see several months at the same time.
In any case, if you want to align your two columns, you can use this code (you will need to update it with your own references; I used the data from your image example):
Sub OrderMachColumns()
    Dim lastRow As Long 'Long avoids an overflow on sheets with more than 32,767 rows
    Dim sortarray(1 To 2, 1 To 2) As String
    Dim x As Long, y As Long, Z As Long
    Dim TempTxt11 As String
    Dim TempTxt12 As String
    Dim TempTxt21 As String
    Dim TempTxt22 As String
    lastRow = Range("A3").End(xlDown).Row ' I use column A, same as your example
    For x = 3 To lastRow * 2
        Cells(x, 1).Select
        If Cells(x, 1) = "" Then GoTo B
        If Cells(x, 4) = "" Then GoTo A
        If Cells(x, 1) = Cells(x, 4) Then
            'IDs already match on this row, nothing to do
        Else
            If Cells(x, 1).Value = Cells(x - 1, 4).Value Then
                Range(Cells(x - 1, 4), Cells(x - 1, 5)).Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            ElseIf Cells(x, 1).Value = Cells(x + 1, 4).Value Then
                Range(Cells(x, 1), Cells(x, 2)).Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            Else
                'Sort the two IDs to decide which side needs the blank cells
                sortarray(1, 1) = Cells(x, 1).Value
                sortarray(1, 2) = "Cells(" & x & ", 1)"
                sortarray(2, 1) = Cells(x, 4).Value
                sortarray(2, 2) = "Cells(" & x & ", 4)"
                For Z = LBound(sortarray) To UBound(sortarray)
                    For y = Z To UBound(sortarray)
                        If UCase(sortarray(y, 1)) > UCase(sortarray(Z, 1)) Then
                            TempTxt11 = sortarray(Z, 1)
                            TempTxt12 = sortarray(Z, 2)
                            TempTxt21 = sortarray(y, 1)
                            TempTxt22 = sortarray(y, 2)
                            sortarray(Z, 1) = TempTxt21
                            sortarray(y, 1) = TempTxt11
                            sortarray(Z, 2) = TempTxt22
                            sortarray(y, 2) = TempTxt12
                        End If
                    Next y
                Next Z
                Select Case sortarray(1, 2)
                    Case "Cells(" & x & ", 1)"
                        Range(Cells(x, 1), Cells(x, 2)).Select
                    Case "Cells(" & x & ", 4)"
                        Range(Cells(x, 4), Cells(x, 5)).Select
                End Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            End If
        End If
A:
    Next x
B:
End Sub

Excel - Identify unique value patterns and return output in descending order across columns, optimized for 500,000+ rows

This is the third and final remaining problem to a massive data cleaning task I have been working on for over a year. Thank you Stack Overflow community for helping figure out:
Problem 1- Index multiple columns and Match distinct values....
Problem 2- Count unique values that match ID, optimized for 100,000+ cases.
I'm not 100% sure if the following is achievable in excel, but I'll do my best to describe the data cleaning and organization challenge I'm faced with.
I have a series of data markers/attributes that are in a random order across 24 columns, spanning 500,000+ rows. Image 1 below is an example of what the data looks like in raw form, presented across 12 columns and spanning 22 rows for illustrative simplicity. Columns A through L contain the raw data and Columns M through X represent the desired output.
SUMMARY OF THE TASK: What needs to be accomplished is a series of matching functions that search through all indexed columns (in this case columns A through L) to identify unique values (e.g. 1), search for the value in range (in this case A2:L21 range), identify the adjacent values to the unique value (for value 1, adjacent values are 2 and 13-XR), then output them in a descending sequence from most frequently occurring value to least frequently occurring in each row that contains any of the values in question (in this case, 1 occurs 5 times and is placed in M2 through M6; 2 occurs 3 times and is placed in N2 through N6; and 13-XR occurs 2 times and is placed in O2 through O6).
To clarify, below is a step-by-step description using colours to illustrate the pattern matching in the raw data (columns A through L) and how these patterns should then be presented in the output (columns M through X). I've sectioned off each of the following images into the six patterns that are in the raw data.
The above image is the first pattern that would be identified by the VBA solution. It would identify "1" as a unique value and search through the A:L range for number of instances of "1" (highlighted in blue), then identify all the values that can be found adjacent in the same row: "2" in rows 3, 5, and 6 (highlighted in green); and "13-XR" in rows 4 and 5 (highlighted in pink). This would then need to be done for "2", identifying the adjacent values ("1" and "13-XR"), and then for "13-XR", identifying ("1" and "2" as adjacent values). The output would return the unique values with the most frequently occurring in Column M ("1" occurs 5 times), then the second most occurring in Column N ("2" occurs 3 times), and the third most occurring in Column O ("13-XR" occurs 2 times).
The above is a little more complex. The VBA would identify "3" as a unique value, search through the A:L range for other instances of "3" and identify all the values that are adjacent to it (in this case, "4", "7", and "9"). It would then do the same for "4", identifying all adjacent values (only "3"); then for "7", identifying adjacent values ("9", "3", and "12"); then for "9", identifying "7" and "3"; and finally for "12", identifying adjacent values (only "7"). Then for each row where any of these values are present, the output would return a "3" in column M (occurring three times) and a "7" in column N (also occurring three times); if counts are equal, they could be presented in ascending fashion, A to Z or smallest to largest, or just at random, since the ordering of equal counts is arbitrary for my purposes. "9" would be returned in column O as it occurs two times, then "4" in column P and "12" in column Q, as they both occur once but 12 is greater than 4.
The above image represents what is likely to be a common occurrence, where there is only one unique value. Here, "5" is not identified in any other columns in the range. It is thus returned as "5" in column M for each row where a "5" is present.
This will be another of the more common occurrences, where one value may be present in one row and two values present in another row. In this instance "6" is only identified once in the range and "8" is the only adjacent value found. When "8" is searched for it only returns one instance of an adjacent value "6". Here, "8" occurs twice and "6" only once, thus resulting in "8" imputed in column M and "6" imputed in column N wherever an "8" or a "6" are present in the row.
Here "10", "111", "112", "543", "433", "444", and "42-FG" are identified as unique values associated with one another in the A:L range. All values except "10" occur twice, which are returned in columns M through S in descending order.
This final pattern is identified in the same manner as above, just with more unique values (n=10).
FINAL NOTES: I have no idea how to accomplish this within excel, but I'm hoping someone else has the knowledge to move this problem forward. Here are some additional notes about the data that might help towards a resolution:
The first column will always be sorted in ascending order. I can do additional custom sorts if it simplifies things.
Out of the ~500,000 rows, 15% only have one attribute value (one value in column A), 30% have two attribute values (1 value in col A & 1 value in col B), 13% have three attribute values (1 value in col A, B, & C).
I have presented small numbers in this example. The actual raw data values in each cell will be closer to 20 characters in length.
A solution that does everything except present the patterns in descending order would be absolutely cool. The sorting would be great but I can live without it if it causes too much trouble.
If anything in this description needs further clarification, or if I can provide additional information, please let me know and I'll adjust as needed.
Thanks in advance to anyone who can help solve this final challenge of mine.
ADDENDUM:
There was a memory error happening with the full data set. @ambie figured out that the source of the error was adjacent chains (results) numbering in the 1000s (trying to return results across 1000s of columns). It seems the problem is not with the solution or the data, just a limitation within Excel. A possible solution (see image below) is to add two new columns (ATT_COUNT as column M; ATT_ALL as column Z). ATT_COUNT in column M would return the total number of unique values that would ordinarily be returned across columns. Only the top 12 most frequently occurring values would be returned in columns N through Y (ATT_1_CL through ATT_12_CL). To get around the instances where ATT_COUNT is > 12 (and upwards of 1000+), we can return all the unique values in space-delimited format in ATT_ALL (column Z). For example, in the image below, rows 17, 18, 19, and 21 have 17 unique values in the chain. Only the 12 most frequently occurring values are presented in columns N through Y. All 17 values are presented in space-delimited format in column Z.
Here is a link to this mini example test data.
Here is a link to a mid sized sample of test data of ~50k rows.
Here is a link to the full sized sample test data of ~500k rows.
We don't normally provide a 'code for you service' but I know in previous questions you have provided some sample code that you've tried, and I can see how you wouldn't know where to start with this.
For your future coding work, the trick is to break the problem down into individual tasks. For your problem, these would be:
Identify all the unique values and acquire a list of all the adjacent values - fairly simple.
Create a list of 'chains' which link one adjacent value to the next - this is more awkward because, although the list appears sorted, the adjacent values are not, so a value relatively low down in the list might be adjacent to a higher value that is already part of a chain (the 3 in your sample is an example of this). So the simplest thing would be to assign the chains only after all the unique values have been read.
Map each unique value to its appropriate 'chain' - I've done this by creating an index for the chains and assigning the relevant one to the unique value.
Collection objects are ideal for you because they deal with the issue of duplicates, allow you to populate lists of an unknown size and make value mapping easy with their Key property. To make the coding easy to read, I've created a class containing some fields. So first of all, insert a Class Module and call it cItem. The code behind this class would be:
Option Explicit
Public Element As String
Public Frq As Long
Public AdjIndex As Long
Public Adjs As Collection
Private Sub Class_Initialize()
Set Adjs = New Collection
End Sub
In your module, the tasks could be coded as follows:
Dim data As Variant, adj As Variant
Dim uniques As Collection, chains As Collection, chain As Collection
Dim oItem As cItem, oAdj As cItem
Dim r As Long, c As Long, n As Long, i As Long, maxChain As Long
Dim output() As Variant
'Read the data.
'Note: Define range as you need.
With Sheet1
data = .Range(.Cells(2, "A"), _
.Cells(.Rows.Count, "A").End(xlUp)) _
.Resize(, 12) _
.Value2
End With
'Find the unique values
Set uniques = New Collection
For r = 1 To UBound(data, 1)
For c = 1 To UBound(data, 2)
If IsEmpty(data(r, c)) Then Exit For
Set oItem = Nothing: On Error Resume Next
Set oItem = uniques(CStr(data(r, c))): On Error GoTo 0
If oItem Is Nothing Then
Set oItem = New cItem
oItem.Element = CStr(data(r, c))
uniques.Add oItem, oItem.Element
End If
oItem.Frq = oItem.Frq + 1
'Find the left adjacent value
If c > 1 Then
On Error Resume Next
oItem.Adjs.Add uniques(CStr(data(r, c - 1))), CStr(data(r, c - 1))
On Error GoTo 0
End If
'Find the right adjacent value
If c < UBound(data, 2) Then
If Not IsEmpty(data(r, c + 1)) Then
On Error Resume Next
oItem.Adjs.Add uniques(CStr(data(r, c + 1))), CStr(data(r, c + 1))
On Error GoTo 0
End If
End If
Next
Next
'Define the adjacent indexes.
For Each oItem In uniques
'If the item has a chain index, pass it to the adjacents.
If oItem.AdjIndex <> 0 Then
For Each oAdj In oItem.Adjs
oAdj.AdjIndex = oItem.AdjIndex
Next
Else
'If an adjacent has a chain index, pass it to the item.
i = 0
For Each oAdj In oItem.Adjs
If oAdj.AdjIndex <> 0 Then
i = oAdj.AdjIndex
Exit For
End If
Next
If i <> 0 Then
oItem.AdjIndex = i
For Each oAdj In oItem.Adjs
oAdj.AdjIndex = i
Next
End If
'If we're still missing a chain index, create a new one.
If oItem.AdjIndex = 0 Then
n = n + 1
oItem.AdjIndex = n
For Each oAdj In oItem.Adjs
oAdj.AdjIndex = n
Next
End If
End If
Next
'Populate the chain lists.
Set chains = New Collection
For Each oItem In uniques
Set chain = Nothing: On Error Resume Next
Set chain = chains(CStr(oItem.AdjIndex)): On Error GoTo 0
If chain Is Nothing Then
'It's a new chain so create a new collection.
Set chain = New Collection
chain.Add oItem.Element, CStr(oItem.Element)
chains.Add chain, CStr(oItem.AdjIndex)
Else
'It's an existing chain, so find the frequency position (highest first).
Set oAdj = uniques(chain(chain.Count))
If oItem.Frq <= oAdj.Frq Then
chain.Add oItem.Element, CStr(oItem.Element)
Else
For Each adj In chain
Set oAdj = uniques(adj)
If oItem.Frq > oAdj.Frq Then
chain.Add Item:=oItem.Element, Key:=CStr(oItem.Element), Before:=adj
Exit For
End If
Next
End If
End If
'Get the column count of output array
If chain.Count > maxChain Then maxChain = chain.Count
Next
'Populate each row with the relevant chain
ReDim output(1 To UBound(data, 1), 1 To maxChain)
For r = 1 To UBound(data, 1)
Set oItem = uniques(CStr(data(r, 1)))
Set chain = chains(CStr(oItem.AdjIndex))
c = 1
For Each adj In chain
output(r, c) = adj
c = c + 1
Next
Next
'Write the output to sheet.
'Note: adjust range to suit.
Sheet1.Range("M2").Resize(UBound(output, 1), UBound(output, 2)).Value = output
This isn't the most efficient way of doing it, but it does make each task more obvious to you. I'm not sure I understood the full complexities of your data structure, but the code above does reproduce your sample, so it should give you something to work with.
Update
Okay, now I've seen your comments and the real data, below is some revised code which should be quicker and deals with the fact that the apparently 'empty' cells are actually null strings.
First of all create a class called cItem and add code behind:
Option Explicit
Public Name As String
Public Frq As Long
Public Adj As Collection
Private mChainIndex As Long
Public Property Get ChainIndex() As Long
ChainIndex = mChainIndex
End Property
Public Property Let ChainIndex(val As Long)
Dim oItem As cItem
If mChainIndex = 0 Then
mChainIndex = val
For Each oItem In Me.Adj
oItem.ChainIndex = val
Next
End If
End Property
Public Sub AddAdj(oAdj As cItem)
Dim t As cItem
On Error Resume Next
Set t = Me.Adj(oAdj.Name)
On Error GoTo 0
If t Is Nothing Then Me.Adj.Add oAdj, oAdj.Name
End Sub
Private Sub Class_Initialize()
Set Adj = New Collection
End Sub
Now create another class called cChain with code behind as:
Option Explicit
Public Index As Long
Public Members As Collection
Public Sub AddItem(oItem As cItem)
Dim oChainItem As cItem
With Me.Members
Select Case .Count
Case 0 'First item so just add it.
.Add oItem, oItem.Name
Case Is < 12 'Fewer than 12 items, so add to end or in order.
Set oChainItem = .item(.Count)
If oItem.Frq <= oChainItem.Frq Then 'It's last in order so just add it.
.Add oItem, oItem.Name
Else 'Find its place in order.
For Each oChainItem In Me.Members
If oItem.Frq > oChainItem.Frq Then
.Add oItem, oItem.Name, before:=oChainItem.Name
Exit For
End If
Next
End If
Case 12 'Full list, so find place and remove last item.
Set oChainItem = .item(12)
If oItem.Frq > oChainItem.Frq Then
For Each oChainItem In Me.Members
If oItem.Frq > oChainItem.Frq Then
.Add oItem, oItem.Name, before:=oChainItem.Name
.Remove 13
Exit For
End If
Next
End If
End Select
End With
End Sub
Private Sub Class_Initialize()
Set Members = New Collection
End Sub
Finally, your module code would be:
Option Explicit
Public Sub ProcessSheet()
Dim data As Variant
Dim items As Collection, chains As Collection
Dim oItem As cItem, oAdj As cItem
Dim oChain As cChain
Dim txt As String
Dim r As Long, c As Long, n As Long
Dim output() As Variant
Dim pTick As Long, pCount As Long, pTot As Long, pTask As String
'Read the data.
pTask = "Reading data..."
Application.StatusBar = pTask
With Sheet1
data = .Range(.Cells(2, "A"), _
.Cells(.Rows.Count, "A").End(xlUp)) _
.Resize(, 12) _
.Value2
End With
'Collect unique and adjacent values.
pTask = "Finding uniques "
pCount = 0: pTot = UBound(data, 1): pTick = 0
Set items = New Collection
For r = 1 To UBound(data, 1)
If ProgressTicked(pTot, pCount, pTick) Then
Application.StatusBar = pTask & pTick & "%"
DoEvents
End If
For c = 1 To UBound(data, 2)
txt = data(r, c)
If Len(txt) = 0 Then Exit For
Set oItem = GetOrCreateItem(items, txt)
oItem.Frq = oItem.Frq + 1
'Take adjacent on left.
If c > 1 Then
txt = data(r, c - 1)
If Len(txt) > 0 Then
Set oAdj = GetOrCreateItem(items, txt)
oItem.AddAdj oAdj
End If
End If
'Take adjacent on right.
If c < UBound(data, 2) Then
txt = data(r, c + 1)
If Len(txt) > 0 Then
Set oAdj = GetOrCreateItem(items, txt)
oItem.AddAdj oAdj
End If
End If
Next
Next
'Now that we have all the items and their frequencies,
'we can find the adjacent chain indexes by a recursive
'call of the ChainIndex set property.
pTask = "Find chain indexes "
pCount = 0: pTot = items.Count: pTick = 0
Set chains = New Collection
n = 1 'Chain index.
For Each oItem In items
If ProgressTicked(pTot, pCount, pTick) Then
Application.StatusBar = pTask & pTick & "%"
DoEvents
End If
If oItem.ChainIndex = 0 Then
oItem.ChainIndex = n
Set oChain = New cChain
oChain.Index = n
chains.Add oChain, CStr(n)
n = n + 1
End If
Next
'Build the chains.
pTask = "Build chains "
pCount = 0: pTot = items.Count: pTick = 0
For Each oItem In items
If ProgressTicked(pTot, pCount, pTick) Then
Application.StatusBar = pTask & pTick & "%"
DoEvents
End If
Set oChain = chains(CStr(oItem.ChainIndex))
oChain.AddItem oItem
Next
'Write the data to our output array.
pTask = "Populate output "
pCount = 0: pTot = UBound(data, 1): pTick = 0
ReDim output(1 To UBound(data, 1), 1 To 12)
For r = 1 To UBound(data, 1)
If ProgressTicked(pTot, pCount, pTick) Then
Application.StatusBar = pTask & pTick & "%"
DoEvents
End If
Set oItem = items(CStr(data(r, 1))) 'CStr forces a key lookup; a numeric value would be treated as a position
Set oChain = chains(CStr(oItem.ChainIndex))
c = 1
For Each oItem In oChain.Members
output(r, c) = oItem.Name
c = c + 1
Next
Next
'Write the output to sheet.
'Note: adjust range to suit.
pTask = "Writing data..."
Application.StatusBar = pTask
Sheet1.Range("M2").Resize(UBound(output, 1), UBound(output, 2)).Value = output
Application.StatusBar = "Ready"
End Sub
Private Function GetOrCreateItem(col As Collection, key As String) As cItem
Dim obj As cItem
'If the item already exists then return it,
'otherwise create a new item.
On Error Resume Next
Set obj = col(key)
On Error GoTo 0
If obj Is Nothing Then
Set obj = New cItem
obj.Name = key
col.Add obj, key
End If
Set GetOrCreateItem = obj
End Function
Public Function ProgressTicked(ByVal t As Long, ByRef c As Long, ByRef p As Long) As Boolean
c = c + 1
If Int((c / t) * 100) > p Then
p = p + 1
ProgressTicked = True
End If
End Function
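The recursive ChainIndex property above walks each chain through nested calls, which may run deep when a chain has thousands of members. Below is a minimal non-recursive sketch of the same labelling step, assuming the adjacency is held in a Scripting.Dictionary (Windows only). AddLink, LabelChains and DemoLabelChains are hypothetical names and are not part of the code above.

```vba
Option Explicit

' Records that values a and b were found next to each other.
Public Sub AddLink(ByVal adjacency As Object, ByVal a As String, ByVal b As String)
    If Not adjacency.Exists(a) Then Set adjacency(a) = CreateObject("Scripting.Dictionary")
    If Not adjacency.Exists(b) Then Set adjacency(b) = CreateObject("Scripting.Dictionary")
    adjacency.Item(a).Item(b) = True
    adjacency.Item(b).Item(a) = True
End Sub

' adjacency maps each value to a dictionary whose keys are its neighbours.
' Returns a dictionary mapping each value to a 1-based chain index.
Public Function LabelChains(ByVal adjacency As Object) As Object
    Dim labels As Object
    Dim pending As Collection          ' explicit stack instead of recursion
    Dim startKey As Variant, nb As Variant
    Dim current As String
    Dim n As Long

    Set labels = CreateObject("Scripting.Dictionary")
    For Each startKey In adjacency.Keys
        If Not labels.Exists(startKey) Then
            n = n + 1                  ' first unlabelled value starts a new chain
            labels(startKey) = n
            Set pending = New Collection
            pending.Add CStr(startKey)
            Do While pending.Count > 0
                current = pending(pending.Count)
                pending.Remove pending.Count
                If adjacency.Exists(current) Then
                    For Each nb In adjacency.Item(current).Keys
                        If Not labels.Exists(nb) Then
                            labels(nb) = n
                            pending.Add CStr(nb)
                        End If
                    Next nb
                End If
            Loop
        End If
    Next startKey
    Set LabelChains = labels
End Function

Public Sub DemoLabelChains()
    Dim adjacency As Object, labels As Object
    Set adjacency = CreateObject("Scripting.Dictionary")
    AddLink adjacency, "1", "2"
    AddLink adjacency, "1", "13-XR"
    AddLink adjacency, "6", "8"
    Set labels = LabelChains(adjacency)
    ' "1", "2" and "13-XR" share one index; "6" and "8" share another.
    Debug.Print labels("1"), labels("2"), labels("13-XR"), labels("6"), labels("8")
End Sub
```

A value with no neighbours at all (such as the lone "5" in the sample) would need its own empty entry in adjacency before LabelChains is called; the sketch leaves that step out.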

How to stop a second run of the code from overwriting data (regex, VBA)?

The code below splits one cell into 3 or 4 columns based on a pattern of 6 characters, 5 characters, 4 characters, and 5+ characters. It also needs to be available in all open workbooks and to work from the user's selection.
How can I fix the bug where, after a cell has been split once, running the code again by mistake overwrites the data?
Class Module
Option Explicit
'Rename this Class Module cFabric
Private pStyle As String
Private pFabric As String
Private pColour As String
Private pSize As String
Public Property Get Style() As String
Style = pStyle
End Property
Public Property Let Style(Value As String)
pStyle = Value
End Property
Public Property Get Fabric() As String
Fabric = pFabric
End Property
Public Property Let Fabric(Value As String)
pFabric = UCase(Value)
End Property
Public Property Get Colour() As String
Colour = pColour
End Property
Public Property Let Colour(Value As String)
pColour = Value
End Property
Public Property Get Size() As String
Size = pSize
End Property
Public Property Let Size(Value As String)
pSize = Value
End Property
Regular Module
Option Explicit
Sub Fabrics()
Dim wsSrc As Workbook, wsRes As Workbook
Dim vSrc As Variant, vRes As Variant, rRes As Range
Dim RE As Object, MC As Object
Const sPat As String = "^(.{6})\s*(.{5})\s*(.{4})(?:.*1/(\S+))?"
'Group 1 = style
'Group 2 = fabric
'Group 3 = colour
'Group 4 = size
Dim colF As Collection, cF As cFabric
Dim I As Long
Dim S As String
Dim V As Variant
'Set source and results worksheets and ranges
Set wsSrc = ActiveWorkbook
Set wsRes = ActiveWorkbook
Set rRes = wsRes.Application.Selection
'Read source data into array
vSrc = Application.Selection
'Initialize the Collection object
Set colF = New Collection
'Initialize the Regex Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = False
.MultiLine = True
.Pattern = sPat
'Test for single cell
If Not IsArray(vSrc) Then
V = vSrc
ReDim vSrc(1 To 1, 1 To 1)
vSrc(1, 1) = V
End If
'iterate through the list
For I = 1 To UBound(vSrc, 1)
S = vSrc(I, 1)
Set cF = New cFabric
If .test(S) = True Then
Set MC = .Execute(S)
With MC(0)
cF.Style = .submatches(0)
cF.Fabric = .submatches(1)
cF.Colour = .submatches(2)
cF.Size = .submatches(3)
End With
Else
cF.Style = S
End If
colF.Add cF
Next I
End With
'create results array
'Exit if no results
If colF.Count = 0 Then Exit Sub
ReDim vRes(1 To colF.Count, 1 To 4)
'Populate the rest
I = 0
For Each V In colF
I = I + 1
With V
vRes(I, 1) = .Style
vRes(I, 2) = .Fabric
vRes(I, 3) = .Colour
vRes(I, 4) = .Size
End With
Next V
'Write the results
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
rRes.Value = vRes
End Sub
Credit for the above project goes to @Ron Rosenfeld!
One way to tell if the entry has been previously split is as follows:
If the regex test fails on the cell, test the results line (the first three columns joined together).
If the results line passes, then the item has been previously split;
if not, then it is a blank or a malformed entry.
Note that a lot of this could be avoided if you were not overwriting your original data. I would recommend against overwriting your data both for audit and debugging purposes, but the below should help in case you cannot change that.
You just need to make some small changes in the logic where we checked for the malformed entry originally. As well as reading in the "possible" results array into vSrc so that we have the potentially split data to compare:
Option Explicit
Sub Fabrics()
'assume data is in column A
Dim wsSrc As Worksheet, wsRes As Worksheet
Dim vSrc As Variant, vRes As Variant, rRes As Range
Dim RE As Object, MC As Object
Const sPat As String = "^(.{6})\s*(.{5})\s*(.{4})(?:.*1/(\S+))?"
'Group 1 = style
'Group 2 = fabric
'Group 3 = colour
'Group 4 = size
Dim colF As Collection, cF As cFabric
Dim I As Long
Dim S As String
Dim V As Variant
'Set source and results worksheets and ranges
Set wsSrc = ActiveSheet
Set wsRes = ActiveSheet
Set rRes = Selection
'Read source data into array
vSrc = Selection.Resize(columnsize:=4)
'Initialize the Collection object
Set colF = New Collection
'Initialize the Regex Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = False
.MultiLine = True
.Pattern = sPat
'iterate through the list
'Test for single cell
If Not IsArray(vSrc) Then
V = vSrc
ReDim vSrc(1 To 1, 1 To 1)
vSrc(1, 1) = V
End If
For I = 1 To UBound(vSrc, 1)
S = vSrc(I, 1)
Set cF = New cFabric
If .test(S) = True Then
Set MC = .Execute(S)
With MC(0)
cF.Style = .submatches(0)
cF.Fabric = .submatches(1)
cF.Colour = .submatches(2)
cF.Size = .submatches(3)
End With
ElseIf .test(vSrc(I, 1) & vSrc(I, 2) & vSrc(I, 3)) = False Then
cF.Style = S
Else
cF.Style = vSrc(I, 1)
cF.Fabric = vSrc(I, 2)
cF.Colour = vSrc(I, 3)
cF.Size = vSrc(I, 4)
End If
colF.Add cF
Next I
End With
'create results array
'Exit if no results
If colF.Count = 0 Then Exit Sub
ReDim vRes(1 To colF.Count, 1 To 4)
'Populate
I = 0
For Each V In colF
I = I + 1
With V
vRes(I, 1) = .Style
vRes(I, 2) = .Fabric
vRes(I, 3) = .Colour
vRes(I, 4) = .Size
End With
Next V
'Write the results
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
With rRes
.Clear
.NumberFormat = "#"
.Value = vRes
.EntireColumn.AutoFit
End With
End Sub
Disregarding the previous regex/class method,
Option Explicit
Sub Fabrics_part_Deux()
Dim a As Long, b As Long
With Worksheets("Sheet1")
If .AutoFilterMode Then .AutoFilterMode = False
With .Range(.Cells(1, "A"), .Cells(.Rows.Count, "B").End(xlUp).Offset(0, 3))
With .Columns("B")
.Offset(1, 0).Replace what:=Chr(32), replacement:=vbNullString, lookat:=xlPart
End With
.AutoFilter field:=2, Criteria1:="<>"
.AutoFilter field:=3, Criteria1:=""
With .Resize(.Rows.Count - 1, 1).Offset(1, 1)
If CBool(Application.Subtotal(103, .Cells)) Then
With .SpecialCells(xlCellTypeVisible)
For a = 1 To .Areas.Count
With .Areas(a).Cells
.TextToColumns Destination:=.Cells(1), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 1), Array(6, 1), Array(11, 1), Array(15, 2))
For b = 1 To .Rows.Count
.Cells(b, 2) = UCase$(.Cells(b, 2).Value2)
If CBool(InStr(1, .Cells(b, 4).Value2, Chr(47), vbBinaryCompare)) Then
.Cells(b, 4) = Trim(Split(.Cells(b, 4), Chr(47))(1))
End If
Next b
End With
Next a
End With
End If
End With
End With
If .AutoFilterMode Then .AutoFilterMode = False
End With
End Sub
In the code that writes the output to the spreadsheet, you need to check for empty strings:
I = 0
For Each V In colF
    I = I + 1
    With V
        vRes(I, 1) = .Style
        If Len(.Fabric) > 0 Then
            vRes(I, 2) = .Fabric
            vRes(I, 3) = .Colour
            vRes(I, 4) = .Size
        End If
    End With
Next V
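A coarser alternative to the checks above, offered only as a sketch, is to refuse to run when the cells to the right of the selection already hold data. AlreadySplit is a hypothetical helper, and it assumes the selection is the single column of unsplit text with the three columns to its right reserved for the results.

```vba
Option Explicit

' Hypothetical helper: True when any of the three cells to the right of the
' first selected column already contains a value.
Public Function AlreadySplit(ByVal target As Range) As Boolean
    Dim resultCells As Range
    Set resultCells = target.Columns(1).Offset(0, 1).Resize(, 3)
    AlreadySplit = Application.WorksheetFunction.CountA(resultCells) > 0
End Function

' Possible use at the top of Fabrics, before anything is written:
'     If AlreadySplit(Selection) Then
'         MsgBox "These cells look as if they have already been split."
'         Exit Sub
'     End If
```

This does not tell a previous split apart from unrelated data that happens to sit in those columns, so it suits sheets where those columns are used for nothing else.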

How can I do this task using Z-algorithm?

In a question I am asked to find whether a given string s contains two non-overlapping substrings "AB" and "BA" (the substrings can appear in either order).
I have already solved this question, but since I am learning the Z-algorithm, can anyone help me solve it that way?
I know how to count the occurrences of a pattern in a text (by concatenating P and T), but I have no idea how to solve this problem using the Z-algorithm.
To find if T contains P with Z-algorithm:
S = P + '#' + T //extra char not occurring in strings
for i in 0..Length(T) - 1 do
    if Z[i + Length(P) + 1] = Length(P) then
        T contains P at position i
To find if T contains both 'AB' and 'BA' without overlapping:
Sab = 'AB#' + T
Sba = 'BA#' + T
Build Zab and Zba arrays with Z-algo
PosAB_Last = Length(T) + 10 //just a big value, so that i - Pos is negative until a match is seen
PosAB_Prev = PosAB_Last
PosBA_Last = PosAB_Last
PosBA_Prev = PosAB_Last
for i in 0..Length(T) - 1 do
    if Zab[i + 3] = 2 then
        PosAB_Prev = PosAB_Last //keep two last positions of AB in text
        PosAB_Last = i
        //it is enough to compare positions with two last occurrences of 'BA'
        //so algo is linear
        if (i - PosBA_Last > 1) or (i - PosBA_Prev > 1) then
            Success
    else if Zba[i + 3] = 2 then
        PosBA_Prev = PosBA_Last
        PosBA_Last = i
        if (i - PosAB_Last > 1) or (i - PosAB_Prev > 1) then
            Success
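For readers who want something runnable, here is a sketch of the same idea in VBA. It is a simplification and not the exact bookkeeping of the pseudocode: instead of the last two positions it remembers only the first occurrence of each pattern, which is the one furthest from any later match. ZArray and HasABandBA are hypothetical names, and '#' is assumed to be a separator that does not matter for two-character patterns.

```vba
Option Explicit

' Z-function with 0-based indices: z(i) is the length of the longest
' common prefix of s and the suffix of s starting at index i. z(0) stays 0.
Public Function ZArray(ByVal s As String) As Long()
    Dim n As Long, i As Long, l As Long, r As Long
    Dim z() As Long

    n = Len(s)
    If n = 0 Then
        ReDim z(0 To 0)
        ZArray = z
        Exit Function
    End If

    ReDim z(0 To n - 1)
    For i = 1 To n - 1
        ' Reuse what is already known inside the current match window [l, r).
        If i < r Then z(i) = IIf(r - i < z(i - l), r - i, z(i - l))
        ' Extend the match character by character.
        Do While i + z(i) < n
            If Mid$(s, i + z(i) + 1, 1) <> Mid$(s, z(i) + 1, 1) Then Exit Do
            z(i) = z(i) + 1
        Loop
        If i + z(i) > r Then
            l = i
            r = i + z(i)
        End If
    Next i
    ZArray = z
End Function

' True when t contains "AB" and "BA" at positions that do not overlap.
Public Function HasABandBA(ByVal t As String) As Boolean
    Dim zab() As Long, zba() As Long
    Dim i As Long, firstAB As Long, firstBA As Long

    zab = ZArray("AB#" & t)
    zba = ZArray("BA#" & t)
    firstAB = -1
    firstBA = -1
    For i = 0 To Len(t) - 1
        If zab(i + 3) >= 2 Then            ' "AB" starts at position i of t
            If firstBA >= 0 And i - firstBA > 1 Then
                HasABandBA = True
                Exit Function
            End If
            If firstAB < 0 Then firstAB = i
        ElseIf zba(i + 3) >= 2 Then        ' "BA" starts at position i of t
            If firstAB >= 0 And i - firstAB > 1 Then
                HasABandBA = True
                Exit Function
            End If
            If firstBA < 0 Then firstBA = i
        End If
    Next i
End Function
```

For example, HasABandBA("ABA") is False because the only "AB" and "BA" share the middle "B", while HasABandBA("ABBA") is True.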

Split string of digits into individual cells, including digits within parentheses/brackets

I have a column where each cell has a string of digits, ?, -, and digits in parentheses/brackets/curly brackets. A good example would be something like the following:
3????0{1012}?121-2[101]--01221111(01)1
How do I separate the string into different cells by characters, where a 'character' in this case refers to any number, ?, -, and value within the parentheses/brackets/curly brackets (including said parentheses/brackets/curly brackets)?
In essence, the string above would turn into the following (spaced apart to denote a separate cell):
3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
The amount of numbers within the parentheses/brackets/curly brackets vary. There are no letters in any of the strings.
Here you are!
RegEx method:
Sub Test_RegEx()
Dim s, col, m
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(?:\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\})"
For Each m In .Execute(s)
col(col.Count) = m
Next
End With
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Loop method:
Sub Test_Loop()
Dim s, col, q, t, k, i
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
q = "_"
t = True
k = 0
For i = 1 To Len(s)
t = (t Or InStr(1, ")]}", q) > 0) And InStr(1, "([{", q) = 0
q = Mid(s, i, 1)
If t Then k = k + 1
col(k) = col(k) & q
Next
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
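Both macros above only display the tokens in a message box. To place each token in its own cell, the regex can be wrapped in a function and its result written across a row. This is a sketch under assumptions that are not in the answer: the strings are in column A of the active sheet from row 1, and the tokens go to column B onwards. SplitTokens and SplitColumnA are hypothetical names.

```vba
Option Explicit

' Returns the tokens of s as a 0-based array (empty array if nothing matches).
Public Function SplitTokens(ByVal s As String) As Variant
    Dim matches As Object
    Dim result() As String
    Dim i As Long

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\}"
        Set matches = .Execute(s)
    End With

    If matches.Count = 0 Then
        SplitTokens = Array()
        Exit Function
    End If

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches(i).Value
    Next i
    SplitTokens = result
End Function

' Assumption: source strings are in column A of the active sheet, from row 1.
Public Sub SplitColumnA()
    Dim lastRow As Long, r As Long
    Dim tokens As Variant

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        For r = 1 To lastRow
            tokens = SplitTokens(CStr(.Cells(r, "A").Value))
            If UBound(tokens) >= 0 Then
                With .Cells(r, "B").Resize(1, UBound(tokens) + 1)
                    .NumberFormat = "@"    ' text format, so "(01)" is not turned into -1
                    .Value = tokens
                End With
            End If
        Next r
    End With
End Sub
```

The text number format is set before the values are written because Excel would otherwise read an entry such as (01) as a negative number.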
Something else to look at :)
Sub test()
'String to parse through
Dim aStr As String
'final string to print
Dim finalString As String
aStr = "3????0{1012}?121-2[101]--01221111(01)1"
'Loop through string
For i = 1 To Len(aStr)
'The character to look at
char = Mid(aStr, i, 1)
'Check if the character is an opening brace, curly brace, or parenthesis
Dim result As String
Select Case char
Case "["
result = loop_until_end(Mid(aStr, i + 1), "]")
i = i + Len(result)
result = char & result
Case "("
result = loop_until_end(Mid(aStr, i + 1), ")")
i = i + Len(result)
result = char & result
Case "{"
result = loop_until_end(Mid(aStr, i + 1), "}")
i = i + Len(result)
result = char & result
Case Else
result = Mid(aStr, i, 1)
End Select
finalString = finalString & result & " "
Next
Debug.Print (finalString)
End Sub
'Looks for end_char in aStr and returns the substring from the first
'character up to and including end_char (or all of aStr if it is missing)
Function loop_until_end(aStr, end_char)
    Dim idx As Long
    idx = InStr(1, aStr, end_char)
    'Without this guard a missing closing character would never end the search
    If idx = 0 Then idx = Len(aStr)
    loop_until_end = Mid(aStr, 1, idx)
End Function
Assuming the data is in column A starting in row 1, and that you want the results to start in column B and go right for each row of data in column A, here is an alternate method using only worksheet formulas.
In cell B1 use this formula:
=IF(OR(LEFT(A1,1)={"(","[","{"}),LEFT(A1,MIN(FIND({")","]","}"},A1&")]}"))),IFERROR(--LEFT(A1,1),LEFT(A1,1)))
In cell C1 use this formula:
=IF(OR(MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)={"(","[","{"}),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,MIN(FIND({")","]","}"},$A1&")]}",SUMPRODUCT(LEN($B1:B1))+1))-SUMPRODUCT(LEN($B1:B1))),IFERROR(--MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)))
Copy the C1 formula right until it starts giving you blanks (meaning there are no more items left to split out from the string in the A cell). In your example, you need to copy it right to column AA. Then you can copy the formulas down for the rest of your column A data.

VBA: How to list items on rows separated by headers?

I want to list rows from Sheet1 on Sheet2 based on a criterion, move on to the next criterion once there are no more rows to copy for the first one, and separate the copied rows with headers.
Sheet1 contains an unsorted list of projects, where I want to be able to add and remove projects whenever. I also want to categorize projects into different types. Sheet1 would look like this:
ProjectID ProjectName Type Cost
1 ProjectA Development -120
2 ProjectB Development -250
3 ProjectC Maintenance -30
I would then like to copy the data via VBA to Sheet2 in the following format:
Maintenance Projects
ProjectID ProjectName Type Cost
3 ProjectC Maintenance -30
Development Projects
ProjectID ProjectName Type Cost
1 ProjectA Development -120
2 ProjectB Development -250
I've been trying to look for a solution but haven't found one that would suit my need and I am not a very experienced VBA user. Any tips or hints on what method to use here?
This will copy your data from Sheet1 to Sheet2 in the format that you requested, assuming Sheet2 is blank and Sheet1 is the active sheet when the macro starts.
Sub SplitData_Click()
Dim dicType As Object
Set dicType = CreateObject("scripting.dictionary")
Dim i As Long ' Long avoids an overflow past row 32,767
Dim lstRow As Long
Dim val As String
lstRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, 4).End(xlUp).Row
Dim projects() As Variant
ReDim projects(0 To lstRow - 2, 0 To 3) ' I like 0 based arrays
' Populate the dictionary with the unique types
For i = 2 To lstRow
projects(i - 2, 0) = Range("A" & i) ' ProjectID
projects(i - 2, 1) = Range("B" & i) ' ProjectName
projects(i - 2, 2) = Range("C" & i) ' Type
projects(i - 2, 3) = Range("D" & i) ' Cost
val = Range("C" & i)
If dicType.Exists(val) Then
dicType.Item(val) = dicType.Item(val) + 1
Else
dicType.Add val, 1
End If
Next i
Dim header() As Variant
ReDim header(0 To 3)
header(0) = "ProjectId"
header(1) = "ProjectName"
header(2) = "Type"
header(3) = "Cost"
Sheets("Sheet2").Select
' loop through each type and build its structure on sheet 2
Dim key As Variant
For Each key In dicType
If Range("A1") = "" Then
Range("A1").Value = key & " Projects"
Else
lstRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 2
Range("A" & lstRow).Value = key & " Projects"
End If
lstRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 1
Range("A" & lstRow).Value = header(0)
Range("B" & lstRow).Value = header(1)
Range("C" & lstRow).Value = header(2)
Range("D" & lstRow).Value = header(3)
For i = 0 To UBound(projects)
lstRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 1
If projects(i, 2) = key Then
Range("A" & lstRow).Value = projects(i, 0)
Range("B" & lstRow).Value = projects(i, 1)
Range("C" & lstRow).Value = projects(i, 2)
Range("D" & lstRow).Value = projects(i, 3)
End If
Next i
Debug.Print key
Next key
End Sub