VBA - extract values and list once

I have a spreadsheet with two raw data sheets on separate Excel tabs, extracted from a finance system and containing values that represent cost codes. The datasets on both tabs are quite large, and the codes that I want listed just once are repeated many times. I want a macro that will scan the two relevant columns (say column A on both sheets) and list each cost code once, in numerical order, on a third sheet.
I've searched this site but can't seem to find a code that does the above completely.
Thanks in advance

This may not be the fastest implementation possible, as it mostly relies on VBA operations to do the work, except the final sort. Has not been tested.
Sub AppendUnique(ByVal W1 As Worksheet, ByVal W2 As Worksheet, ByVal R1 As Long, ByVal R2 As Long, ByVal C1 As Long, ByVal C2 As Long)
    ' Append values from an unsorted column to a new unique but unsorted column
    Dim V1 As Variant, V2 As Variant
    Dim I As Long
    V1 = W1.Cells(R1, C1).Value
    Do While Not IsEmpty(V1)
        ' Scan the destination column until V1 is found or the first empty cell is reached
        I = R2
        V2 = W2.Cells(I, C2).Value
        Do While Not IsEmpty(V2)
            If V2 = V1 Then Exit Do
            I = I + 1
            V2 = W2.Cells(I, C2).Value
        Loop
        ' Row I either already holds V1 or is the first empty cell, so this write is safe in both cases
        W2.Cells(I, C2).Value = V1
        R1 = R1 + 1
        V1 = W1.Cells(R1, C1).Value
    Loop
End Sub
Sub ListUniqueCodes() ' wrapper name is illustrative; the post showed these statements without a Sub
    Dim W1 As Worksheet, W2 As Worksheet, W3 As Worksheet
    Dim C1 As Long, C2 As Long, C3 As Long
    Dim R1 As Long, R2 As Long, R3 As Long
    Set W1 = Worksheets("Sheet1") ' Source 1
    Set W2 = Worksheets("Sheet2") ' Source 2
    Set W3 = Worksheets("Sheet3") ' Destination
    C1 = 1 ' Column on Sheet1: Source 1
    C2 = 1 ' Column on Sheet2: Source 2
    C3 = 1 ' Column on Sheet3: Destination
    R1 = 1 ' Starting Row on Sheet1: Source 1
    R2 = 1 ' Starting Row on Sheet2: Source 2
    R3 = 1 ' Starting Row on Sheet3: Destination
    AppendUnique W1, W3, R1, R3, C1, C3
    AppendUnique W2, W3, R2, R3, C2, C3
    ' Sort the unique list in ascending order; Header:=xlNo assumes the destination has no header row
    W3.Range(W3.Cells(R3, C3), W3.Cells(R3, C3).End(xlDown)).Sort Key1:=W3.Cells(R3, C3), Order1:=xlAscending, Header:=xlNo
End Sub
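For comparison, here is a minimal sketch of the same idea (collect each code once, then sort) written in Python rather than VBA, since the logic is easier to see without the worksheet plumbing. The function name and the sample codes are made up for illustration. It uses a set for the "have I seen this already?" check instead of re-scanning the destination column for every value, which is the part of the macro above that gets slow on large sheets.

```python
def unique_sorted_codes(*columns):
    """Merge any number of columns and return each cost code once, in ascending order."""
    seen = set()
    for column in columns:
        for value in column:
            # Ignore blank cells
            if value is None or value == "":
                continue
            seen.add(value)
    return sorted(seen)


sheet1_codes = [4010, 1200, 4010, 3300]  # made-up sample data
sheet2_codes = [1200, 5150, 3300, 5150]  # made-up sample data
print(unique_sorted_codes(sheet1_codes, sheet2_codes))
```

In VBA the same effect could probably be had with a Scripting.Dictionary, but that is not shown here.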


Regex - Summing values in a string

We get data from another company in the following formats
These values go into a SQL Server table. The numbers represent electricity usages. I'm working on finding a way to extract the numbers and sum them together.
There is only one condition: the number must be followed by "-KH". If it is followed by "-K1" we don't need to do anything with it.
Upon inputting "65-KH-ON-PEAK|2.1-K1-ON-PEAK|164-KH-OFF-PEAK|27-K1", I need to output 229 which stands for 65 + 164
I'd prefer to find a solution using VBA for Access (for reasons related to the business's current software solutions), but I'm open to other solutions as well.
In [Excel] it can be done like this:
Sub test()
Dim cl As Range, z!, x As Variant, x2 As Variant
For Each cl In [A1:A4]
z = 0
For Each x In Split(cl.Value2, "|")
If x Like "*-KH-*" Then
For Each x2 In Split(x, "-")
If IsNumeric(x2) Then z = z + x2
Next x2
End If
Next x
cl.Offset(, 1).Value = z
Next cl
End Sub
Another variant, without the second loop (using shawnt00's comment below the OP):
Sub test()
Dim cl As Range, z!, x As Variant
For Each cl In [A1:A4]
z = 0
For Each x In Split(cl.Value2, "|")
If x Like "*-KH-*" Then z = z + Left(x, InStr(1, x, "-") - 1)
Next x
cl.Offset(, 1).Value = z
Next cl
End Sub
Using [Access] can be something like this:
Sub test2()
Dim z!, x As Variant
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("SELECT * FROM Table1")
Do Until rs.EOF = True
    z = 0
    For Each x In Split(rs!Field1, "|")
        If x Like "*-KH-*" Then z = z + Left(x, InStr(1, x, "-") - 1)
    Next x
    Debug.Print rs!Field1, z
    rs.MoveNext ' advance to the next record; without this the loop never ends
Loop
rs.Close
End Sub
You would do a single bulk insert into an SQL Server table using | as the field terminator, so you would have fields like f1,f2,f3,f4. Then you can use an expression like:
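-- Parts of this query were missing from the post. The SELECT/CASE keywords, the THEN/ELSE
-- branches and the DECIMAL type are an assumed reconstruction, not the author's exact text.
WITH numerics AS (
    SELECT
        CASE WHEN PATINDEX('%-KH-%', f1) > 0
             THEN CAST(LEFT(f1, PATINDEX('%-KH-%', f1) - 1) AS DECIMAL(18, 4)) ELSE 0
        END AS f1,
        CASE WHEN PATINDEX('%-KH-%', f2) > 0
             THEN CAST(LEFT(f2, PATINDEX('%-KH-%', f2) - 1) AS DECIMAL(18, 4)) ELSE 0
        END AS f2,
        CASE WHEN PATINDEX('%-KH-%', f3) > 0
             THEN CAST(LEFT(f3, PATINDEX('%-KH-%', f3) - 1) AS DECIMAL(18, 4)) ELSE 0
        END AS f3,
        CASE WHEN PATINDEX('%-KH-%', f4) > 0
             THEN CAST(LEFT(f4, PATINDEX('%-KH-%', f4) - 1) AS DECIMAL(18, 4)) ELSE 0
        END AS f4
    FROM myTable )
SELECT f1 + f2 + f3 + f4 AS rowTotal
FROM numerics;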
You could do it with a PowerShell script, which would give you the power of regex to extract and sum the numbers. Something like the example below (I have tested the part that extracts from the file, but not the Access parts, so they may need some tweaking):
$conn = New-Object -ComObject ADODB.Connection
$cmd = New-Object -ComObject ADODB.Command
$conn.Open("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\\path_to\database.accdb")
# Microsoft.Jet.OLEDB.4.0 for older versions of Access
$cmd.ActiveConnection = $conn
(Select-String file.txt -Pattern '[\d.]+(?=-KH)' -AllMatches) | % {
    # Sum every number on this line that is followed by -KH
    # (the body of this inner loop was missing from the post; the cast to [double] is assumed)
    ($_.Matches | % { [double]$_.Value } | Measure-Object -Sum).Sum
} | % {
    # TABLE is the placeholder table name used in the post
    $cmd.CommandText = "INSERT INTO TABLE VALUES($($_))"
    $cmd.Execute() | Out-Null
}
$conn.Close()
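The lookahead pattern from the PowerShell answer can be checked against the sample string from the question. This is only a sketch in Python (not VBA or PowerShell); the names KH_NUMBER and sum_kh are made up for illustration, and the pattern is slightly stricter than '[\d.]+' so that a stray dot cannot match on its own.

```python
import re

# A number (digits with an optional decimal part) immediately followed by "-KH"
KH_NUMBER = re.compile(r"\d+(?:\.\d+)?(?=-KH)")


def sum_kh(usage):
    """Sum every number in the string that is directly followed by -KH."""
    return sum(float(match) for match in KH_NUMBER.findall(usage))


sample = "65-KH-ON-PEAK|2.1-K1-ON-PEAK|164-KH-OFF-PEAK|27-K1"
print(sum_kh(sample))  # 229.0
```

The 2.1 and 27 entries are skipped because they are followed by -K1, so only 65 and 164 are added.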

Matching two lists in excel

I am trying to compare two months' sales to each other in Excel in the most automated way possible (so that it will be quicker in future months).
This month's values are all worked out through formulae, and last month's will be copied and pasted into D:E. However, as you can see, some customers made purchases last month but not this month (and vice versa). I basically need all the CustomerIDs to match row by row, so that it ends up like this:
Can anyone think of a good way of doing this without having to do it all manually? Thanks
Use the SUMIFS function or VLOOKUP. Like this:
You should just have your entire customer list on one sheet and then add up the values associated with them month over month. The design you are describing is going to be a nightmare to maintain over time and serves no purpose. I can understand you would like to see the customers in a row like that, which is why I suggest SUMIFS.
This option compares only two columns; I think you should approach it another way.
First I would add the date/month, and then you can add the next month's values underneath:
Then you can use a simple pivot table to see several months at the same time.
In any case, if you want to format your two columns, you can use this code (you will need to update it with your own references; I used the data from your image example):
Sub OrderMachColumns()
    Dim lastRow As Long
    Dim sortarray(1 To 2, 1 To 2) As String
    Dim x As Long, y As Long, Z As Long
    Dim TempTxt11 As String, TempTxt12 As String
    Dim TempTxt21 As String, TempTxt22 As String
    ' NOTE: the two Else lines and the A:/B: labels were missing from the posted code;
    ' their placement below is an assumption based on the surrounding If/End If structure.
    lastRow = Range("A3").End(xlDown).Row ' column A is used, as in your example
    For x = 3 To lastRow * 2
        Cells(x, 1).Select
        If Cells(x, 1) = "" Then GoTo B
        If Cells(x, 4) = "" Then GoTo A
        If Cells(x, 1) = Cells(x, 4) Then
            ' IDs already match on this row: nothing to do
        Else
            If Cells(x, 1).Value = Cells(x - 1, 4).Value Then
                Range(Cells(x - 1, 4), Cells(x - 1, 5)).Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            ElseIf Cells(x, 1).Value = Cells(x + 1, 4).Value Then
                Range(Cells(x, 1), Cells(x, 2)).Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            Else
                sortarray(1, 1) = Cells(x, 1).Value
                sortarray(1, 2) = "Cells(" & x & ", 1)"
                sortarray(2, 1) = Cells(x, 4).Value
                sortarray(2, 2) = "Cells(" & x & ", 4)"
                For Z = LBound(sortarray) To UBound(sortarray)
                    For y = Z To UBound(sortarray)
                        If UCase(sortarray(y, 1)) > UCase(sortarray(Z, 1)) Then
                            TempTxt11 = sortarray(Z, 1)
                            TempTxt12 = sortarray(Z, 2)
                            TempTxt21 = sortarray(y, 1)
                            TempTxt22 = sortarray(y, 2)
                            sortarray(Z, 1) = TempTxt21
                            sortarray(y, 1) = TempTxt11
                            sortarray(Z, 2) = TempTxt22
                            sortarray(y, 2) = TempTxt12
                        End If
                    Next y
                Next Z
                Select Case sortarray(1, 2)
                    Case "Cells(" & x & ", 1)"
                        Range(Cells(x, 1), Cells(x, 2)).Select
                    Case "Cells(" & x & ", 4)"
                        Range(Cells(x, 4), Cells(x, 5)).Select
                End Select
                Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
            End If
        End If
A:
    Next x
B:
End Sub
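The end result the question asks for (one row per CustomerID, with a blank where a customer bought in only one of the two months) amounts to an outer join on the ID. Here is a rough sketch of that logic in Python rather than VBA; the function name, the customer IDs and the amounts are all made up, and each month is assumed to have been read into a dictionary keyed by CustomerID.

```python
def align_months(this_month, last_month):
    """Return one row per customer ID: (id, this month's value, last month's value)."""
    all_ids = sorted(set(this_month) | set(last_month))
    # A customer missing from one month gets None (a blank cell) for that month
    return [(cid, this_month.get(cid), last_month.get(cid)) for cid in all_ids]


this_month = {"C001": 120.0, "C003": 75.5}  # made-up sample data
last_month = {"C001": 90.0, "C002": 40.0}   # made-up sample data
for row in align_months(this_month, last_month):
    print(row)
```

This is the same thing the SUMIFS/VLOOKUP suggestion does against a single master customer list, without inserting cells to push rows down.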

"if then statement in VBA to fill cell with number"

I'm writing a program that puts a number in a cell in Excel if a variable reaches a certain value. I understand how to declare variables, but I don't know how to tell Excel to write x if A1 = 34. Thanks
Add a listener to your worksheet to capture a Range. You can make the range [A1] if you are only watching a specific column/row, or you can add a range like I have below.
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim KeyCells As Range
    Set KeyCells = Range("A:A")
    If Not Application.Intersect(KeyCells, Range(Target.Address)) _
    Is Nothing Then
        If Target.Value = "34" Then
            Cells(Target.Row, 2) = "X"
        Else
            ' Clear the marker when the value is anything other than 34
            Cells(Target.Row, 2) = ""
        End If
    End If
End Sub
Change "X" to x if you want the variable x and not the literal "X".
If your goal is to change the value of the cell to "X" (Literal X), and you are not having macros run constantly or with each cell change, you can use the following function (or similar) in each cell in which you have a conditional.
See the Microsoft support on this topic https://support.microsoft.com/en-us/kb/213612
It's not clear what you wish to do, but let's say you want to write the current value of your variable, x, into cell B2... if cell A1 is 34.
In the above case, you would do this:
If [a1] = 34 then [b2] = x
Private Sub CommandButton1_Click()
Dim lr As Long
lr = Worksheets("New").Range("A" & Rows.Count).End(xlUp).Row
Worksheets("New").Range("T2").Formula = "=LEFT(B2,2)"
Worksheets("New").Range("T2").AutoFill Destination:=Worksheets("New").Range("T2:T" & lr)
Worksheets("New").Range("U2").Formula = "=(T2&0&0)"
Worksheets("New").Range("U2").AutoFill Destination:=Worksheets("New").Range("U2:U" & lr)
Worksheets("New").Range("V2").Formula = "=IF(AND(A2=A1,U2=U1),"""",A2)" ' quotes inside a VBA string literal must be doubled; the next line fills this formula down
Worksheets("New").Range("V2").AutoFill Destination:=Worksheets("New").Range("V2:V" & lr)
End Sub

Remove the duplicate row in apache pig

I want to remove the duplicate rows in Pig. There are a lot of ways, but I am not sure which one is better.
Here is the data set, the schema is (f0,f1,id,f3,f4):
Rows whose f0, f1 and id are all equal are considered duplicates, and I want to output only the one where f3 is the minimum.
But I also want to output which ids have the duplicates.
That is, I will store or dump two relations.
One of the two relations is:
The other one is the id which has the duplicate rows, the schema is (id,f4)
That is, id=3 has the duplicate data.
Here is my workaround
r1 = LOAD 'data' USING PigStorage(',');
r2 = group r1 by ($0,$1,$2);
r3 = FOREACH r2 GENERATE COUNT(r1) as c, r1;
SPLIT r3 into r4 if c > 1, r5 if c == 1;
r6 = FOREACH r5 GENERATE flatten(r1);
dups_id = FOREACH r4 {
    GENERATE flatten(r1.$2),flatten(r1.$4);
};
r7 = distinct dups_id;
dump r7;
no_dups = FOREACH r4 {
    sorted = ORDER r1 by $3 ASC;
    lim = limit sorted 1;
    GENERATE flatten(lim);
};
r8 = union no_dups, r6;
dump r8;
I think this is a little complicated, and I have doubts about the performance.
Is there a better way to implement this use case?
Here is how I would do it.
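r1 = LOAD 'data' USING PigStorage(',');
r2 = group r1 by ($0,$1,$2);
r3 = FOREACH r2 GENERATE $0.., SIZE($1) AS size;
-- The body of this nested FOREACH was missing from the post; the three lines inside it
-- are an assumed reconstruction: keep the row with the smallest f3 in every group.
r8 = FOREACH r2 {
    sorted = ORDER r1 BY $3 ASC;
    lim = LIMIT sorted 1;
    GENERATE FLATTEN(lim);
};
dups = FILTER r3 BY size > 1L;
-- dups2 was not defined in the post; flattening the grouped bag is assumed here.
dups2 = FOREACH dups GENERATE FLATTEN($1);
dups3 = FOREACH dups2 GENERATE $2, $4;
dups_id = DISTINCT dups3;
dump r8;
dump dups_id;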
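To make the intended result concrete, here is the same logic as a small Python sketch (not Pig): group by (f0, f1, id), keep the row with the smallest f3 from every group, and separately report (id, f4) for the groups that had more than one row. The function name and the sample rows are made up. It also shows why no SPLIT/UNION is needed: taking the minimum works the same for groups of one.

```python
from collections import defaultdict


def dedupe(rows):
    """rows are tuples (f0, f1, id, f3, f4). Returns (kept_rows, duplicate_ids)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[:3]].append(row)  # group by (f0, f1, id)

    kept = []
    dup_ids = set()
    for group in groups.values():
        # Keep the row with the smallest f3, whether or not the group has duplicates
        kept.append(min(group, key=lambda r: r[3]))
        if len(group) > 1:
            # Record (id, f4) for every row in a duplicated group
            dup_ids.update((r[2], r[4]) for r in group)
    return kept, sorted(dup_ids)


rows = [(1, 1, 1, 5, 10), (1, 1, 3, 7, 30), (1, 1, 3, 2, 30), (2, 2, 2, 4, 20)]  # made-up data
kept, dup_ids = dedupe(rows)
print(kept)
print(dup_ids)
```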

Delete similar rows

I have a list of 3-word phrases with 90000 rows. I need to delete every row if any other row contains 2 of the same words. For example:
Word1 word2 word3
word1 word2 word4 - delete
word1 word2 word5 - delete
word1 word6 word7 - keep, only 1 matching word compared to earlier rows
Is there any way to do this?
Step 1. Separate words into three columns (A, B, and C) using Text to Columns or formulas
Step 2. In columns D, E, and F, paste the following formulas to create all two-word combinations:
Step 3. Put the following formula in G1 and fill it through columns H and I and all the rows:
The spreadsheet should now look like this screenshot (besides the two rows I added to the end):
All rows with two words that match two words in a row above will have a value greater than 0 in columns G, H, or I.
Step 4. Finally, filter the entire table so that columns G, H, and I are all equal to 0. You can copy and paste (by value) the words to another sheet if desired.
Are the three-word phrases in separate cells, or are they all in the same cell?
If they are in separate cells, you can use this macro:
Option Explicit
Sub DeleteDups()
Dim colPhrase As Collection
Dim colRows As Collection
Dim V As Variant, vRes() As Variant
Dim I As Long, J As Long
Dim lDupCount As Long
Dim rRes As Range 'results range
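V = Worksheets("sheet1").Range("a1", Worksheets("sheet1").Cells(Worksheets("sheet1").Rows.Count, "C").End(xlUp)) ' Cells qualified so it works when another sheet is active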
Set colPhrase = New Collection
Set colRows = New Collection
Set rRes = Range("e1")
'look for dups
For I = 1 To UBound(V)
lDupCount = 0
On Error Resume Next
For J = 1 To 3
colPhrase.Add Item:=CStr(V(I, J)), Key:=CStr(V(I, J))
If Err.Number <> 0 Then lDupCount = lDupCount + 1: Err.Clear ' clear the error so it is not counted again for the next word
Next J
On Error GoTo 0
If lDupCount < 2 Then colRows.Add Item:=CStr(I)
Next I
ReDim vRes(1 To colRows.Count, 1 To 3)
For I = 1 To colRows.Count
For J = 1 To 3
vRes(I, J) = V(colRows(I), J)
Next J
Next I
Set rRes = rRes.Resize(UBound(vRes), 3)
rRes = vRes
End Sub
If they are in the same cell, depending on how the phrases are separated, you would just need to add a line that separates them into three array elements.
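The two-word-combination idea from the formula answer can also be sketched in a few lines of Python (not VBA). This is only an illustration under some assumptions: the function name is made up, phrases are split on whitespace, matching ignores case, and a row is compared against every earlier row, including rows that were themselves dropped.

```python
from itertools import combinations


def drop_similar(phrases):
    """Keep a phrase only if it shares fewer than two words with every earlier phrase."""
    seen_pairs = set()
    kept = []
    for phrase in phrases:
        # Sorting the words makes each pair order-independent
        words = sorted(set(phrase.lower().split()))
        pairs = set(combinations(words, 2))
        if not (pairs & seen_pairs):
            kept.append(phrase)
        # Remember this row's pairs even when the row itself is dropped
        seen_pairs |= pairs
    return kept


phrases = [
    "Word1 word2 word3",
    "word1 word2 word4",
    "word1 word2 word5",
    "word1 word6 word7",
]
print(drop_similar(phrases))
```

With the example from the question, this keeps the first and last rows and drops the two in between.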