Regex - Summing values in a string - regex

We get data from another company in the following formats:
374-KH-ON-PEAK|807-KH-OFF-PEAK
82.5-KH-TOTAL|8-K1-CURRENT
44.5-KH-TOTAL
65-KH-ON-PEAK|2.1-K1-ON-PEAK|164-KH-OFF-PEAK|27-K1
These values go into a SQL Server table. The numbers represent electricity usages. I'm working on finding a way to extract the numbers and sum them together.
There is only one condition: the number must be followed by "-KH". If it is followed by "-K1" we don't need to do anything with it.
Upon inputting "65-KH-ON-PEAK|2.1-K1-ON-PEAK|164-KH-OFF-PEAK|27-K1", I need to output 229 which stands for 65 + 164
I'd prefer a solution using VBA for Access (for reasons related to the business's current software), but I'm open to other solutions as well.

Using [Excel], it can be done like this:
code:
Sub test()
Dim cl As Range, z!, x As Variant, x2 As Variant
For Each cl In [A1:A4]
z = 0
For Each x In Split(cl.Value2, "|")
If x Like "*-KH-*" Then
For Each x2 In Split(x, "-")
If IsNumeric(x2) Then z = z + x2
Next x2
End If
Next x
cl.Offset(, 1).Value = z
Next cl
End Sub
Another variant, without the second loop (using @shawnt00's comment below the OP):
Sub test()
Dim cl As Range, z!, x As Variant
For Each cl In [A1:A4]
z = 0
For Each x In Split(cl.Value2, "|")
If x Like "*-KH-*" Then z = z + Left(x, InStr(1, x, "-") - 1)
Next x
cl.Offset(, 1).Value = z
Next cl
End Sub
Using [Access], it can be something like this:
Sub test2()
Dim z!, x As Variant
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("SELECT * FROM Table1")
Do Until rs.EOF = True
z = 0
For Each x In Split(rs!Field1, "|")
If x Like "*-KH-*" Then z = z + Left(x, InStr(1, x, "-") - 1)
Next x
Debug.Print rs!Field1, z
rs.MoveNext
Loop
End Sub

You would do a single bulk insert into an SQL Server table using | as the field terminator, so you would have fields like f1,f2,f3,f4. Then you can use an expression like:
-- DECIMAL rather than INT, so values such as 82.5 convert without error
WITH numerics
AS ( SELECT CASE
                WHEN PATINDEX('%-KH-%', f1) > 0
                THEN CAST(SUBSTRING(f1, 1, PATINDEX('%-KH-%', f1) - 1) AS DECIMAL(18, 2))
                ELSE 0
            END AS f1,
            CASE
                WHEN PATINDEX('%-KH-%', f2) > 0
                THEN CAST(SUBSTRING(f2, 1, PATINDEX('%-KH-%', f2) - 1) AS DECIMAL(18, 2))
                ELSE 0
            END AS f2,
            CASE
                WHEN PATINDEX('%-KH-%', f3) > 0
                THEN CAST(SUBSTRING(f3, 1, PATINDEX('%-KH-%', f3) - 1) AS DECIMAL(18, 2))
                ELSE 0
            END AS f3,
            CASE
                WHEN PATINDEX('%-KH-%', f4) > 0
                THEN CAST(SUBSTRING(f4, 1, PATINDEX('%-KH-%', f4) - 1) AS DECIMAL(18, 2))
                ELSE 0
            END AS f4
     FROM myTable )
SELECT f1 + f2 + f3 + f4 AS rowTotal
FROM numerics;

You could do it with a PowerShell script, which gives you the power of regex to extract and sum the numbers. Something like the example below (I have tested the part that extracts from the file but not the Access parts, so they may need some tweaking):
# Microsoft.Jet.OLEDB.4.0 for older versions of Access
$conn = New-Object -ComObject ADODB.Connection
$conn.Open("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\\path_to\database.accdb")
(Select-String file.txt -Pattern '[\d.]+(?=-KH)' -AllMatches) | % {
    # Sum the -KH values found on this line of the file
    ($_.Matches | % {
        [double]$_.Value
    } | Measure-Object -Sum).Sum
} | % {
    # TABLE is a placeholder for the name of the target table
    [void]$conn.Execute("INSERT INTO TABLE VALUES($($_))")
}
$conn.Close()
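For comparison, here is a minimal Python sketch of the same lookahead idea as the '[\d.]+(?=-KH)' pattern above. The function name sum_kh is made up for illustration, and the sketch assumes each record is one pipe-delimited string like the samples in the question; it does not touch Access or SQL Server.

```python
import re

# A number (integer or decimal) that is immediately followed by "-KH".
# The lookahead (?=-KH) checks the suffix without consuming it.
KH_VALUE = re.compile(r"(\d+(?:\.\d+)?)(?=-KH)")


def sum_kh(record: str) -> float:
    """Sum every number in the record that is followed by -KH (hypothetical helper)."""
    return sum((float(value) for value in KH_VALUE.findall(record)), 0.0)


print(sum_kh("65-KH-ON-PEAK|2.1-K1-ON-PEAK|164-KH-OFF-PEAK|27-K1"))  # 229.0
```

Values followed by "-K1" are skipped because the lookahead fails for them, which is the single condition stated in the question.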

Related

Match parts of a string

I have 2 strings that each contain 25 characters. E.g.
X = "0000111111110111111111110"
Y = "0000011111000000000000000"
What would be the most efficient method to determine (true or false) whether every position that has a "1" in string Y also has a "1" in string X? In this example it should return True, as there are 1s in X that match the positions of all 1s in Y.
I could read each character position and do a comparison for all 25 but was hoping some clever person would know of a more elegant way.
The easier way is to use Convert.ToInt32() to parse the string as a binary literal and perform binary AND:
Public Function MatchAsBinary(ByVal x As String, ByVal y As String) As Boolean
Dim x_int = Convert.ToInt32(x, 2)
Dim y_int = Convert.ToInt32(y, 2)
Return (x_int And y_int) = y_int
End Function
The faster (~10 times in release build) way is to compare the chars directly:
Public Function MatchAsChars(ByVal x As String, ByVal y As String) As Boolean
For i As Integer = 0 To y.Length - 1
If y(i) = "1"c AndAlso x(i) = "0"c Then
Return False
End If
Next
Return True
End Function
If you regard the strings as binary numbers, you can convert them to numbers and then use the bitwise and operator, like this:
Module Module1
Sub Main()
Dim X = "0000111111110111111111110"
Dim Y = "0000011111000000000000000"
Dim Xb = Convert.ToInt64(X, 2)
Dim Yb = Convert.ToInt64(Y, 2)
Console.WriteLine((Xb And Yb) = Yb)
Console.ReadLine()
End Sub
End Module
That will output True and work for strings of up to 64 characters.
Or, following on from your comment, you could use Convert.ToInt32 as that would give enough bits for your data.
You can do something similar to what @JoshD said above, but use Convert.ToInt32(Y, 2) to convert from a binary string to an integer.
Xint = Convert.ToInt32(X, 2)
Yint = Convert.ToInt32(Y, 2)
return ((Xint And Yint) = Yint)
This includes what others have shown plus a test for each bit one at a time.
Dim s As String = "0000011111000000000000000"
Dim X As String = "0000111111110111111111110"
Dim Y As String = "0000011111000000000000000"
Dim xi As Integer = Convert.ToInt32(X, 2)
Dim yi As Integer = Convert.ToInt32(Y, 2)
'check each bit
For i As Integer = 0 To 24
Dim msk As Integer = 1 << i
If (msk And xi) = msk AndAlso (msk And yi) = msk Then
Debug.WriteLine("Bit {0} on in both", i)
End If
Next
'all bits
Dim rslt As Integer = xi And yi
s = Convert.ToString(rslt, 2).PadLeft(25, "0"c)
Dim intY As Integer = Convert.ToInt32(Y, 2)
Dim res As Boolean = (Convert.ToInt32(X, 2) And intY) = intY
Convert them to integers (parsing the strings as base 2; CInt would read them as decimal numbers and overflow), get all instances of matching 1's with a bitwise And, then compare to see if Y was changed by that comparison. If the comparison preserved the original Y, the result will be True.
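The bitwise test used in the answers above can be sketched in a few lines of Python. This is only an illustration of the (X And Y) = Y idea; the function name ones_covered is made up, and it assumes both strings contain only "0" and "1".

```python
def ones_covered(x: str, y: str) -> bool:
    """Return True if every '1' position in y is also '1' in x (hypothetical helper)."""
    xi = int(x, 2)  # parse as base-2, like Convert.ToInt32(x, 2)
    yi = int(y, 2)
    # AND keeps only the bits set in both; if that equals y, no bit of y was lost.
    return (xi & yi) == yi


X = "0000111111110111111111110"
Y = "0000011111000000000000000"
print(ones_covered(X, Y))  # True
```

Python integers are not limited to 32 or 64 bits, so the string length is not a concern here as it is with Convert.ToInt32.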

Why can't I extract text between patterns with varying text length

I have a column in a data frame with characters that need to be split into columns. My code seems to break when the string in the column has a length of 12 but it works fine when the string has a length of 11.
S99.ABCD{T}
S99.ABCD{V}
S99.ABCD{W}
S99.ABCD{Y}
Q100.ABCD{A}
Q100.ABCD{C}
Q100.ABCD{D}
Q100.ABCD{E}
An example of the ideal format is on the left, what I'm getting is on the right:
ID WILD RES MUT | ID WILD RES MUT
ABCD S 99 T | ABCD S 99 T
... | ...
ABCD Q 100 A | .ABC Q 100 {
... | ...
My current solution is the following:
data <- data.frame(ID = substr(mdata$substitution,
gregexpr(pattern = "\\.",
mdata$substitution)[[1]] + 1,
gregexpr(pattern = "\\{",
mdata$substitution)[[1]] - 1),
WILD = substr(mdata$substitution, 0, 1),
RES = gsub("[^0-9]","", mdata$substitution),
MUT = substr(mdata$substitution,
gregexpr(pattern = "\\{",
mdata$substitution)[[1]] + 1,
gregexpr(pattern = "\\}",
mdata$substitution)[[1]] - 1))
I'm not sure why my code isn't working, I thought using gregexpr I would be able to find where the pattern was in the string to find out the position of characters I want to extract but it doesn't work when the length of the string changes.
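Your code breaks because gregexpr() returns a list with one element per input string, and [[1]] keeps only the match positions for the first string ("S99.ABCD{T}"). Those positions are then reused for every row, so they are off by one for the 12-character strings. Splitting on the delimiters avoids positions altogether. Using this example you can do what you want: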
test=c("S99.ABCD{T}",
"S99.ABCD{V}",
"S99.ABCD{W}",
"S99.ABCD{Y}",
"Q100.ABCD{A}",
"Q100.ABCD{C}",
"Q100.ABCD{D}",
"Q100.ABCD{E}")
library(stringr)
test=str_remove(test,pattern = "\\}")
testdf=as.data.frame(str_split(test,pattern = "\\.|\\{",simplify = T))
testdf$V4=substring(testdf$V1, 1, 1)
testdf$V5=substring(testdf$V1, 2)
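As a language-neutral illustration of why matching the pattern per string is more robust than fixed positions, here is a small Python sketch. The function name parse_substitution and the exact pattern are assumptions based on the sample values shown in the question (one letter, digits, a dot, an ID, and a value in braces).

```python
import re

# One wild-type letter, the residue number, a dot, the ID, then the mutation in braces.
PATTERN = re.compile(r"^([A-Z])(\d+)\.([^{]+)\{([^}]+)\}$")


def parse_substitution(s: str) -> dict:
    """Split a value such as 'Q100.ABCD{A}' into its parts (hypothetical helper)."""
    m = PATTERN.match(s)
    if m is None:
        raise ValueError(f"unexpected format: {s!r}")
    wild, res, ident, mut = m.groups()
    return {"ID": ident, "WILD": wild, "RES": int(res), "MUT": mut}


print(parse_substitution("S99.ABCD{T}"))
print(parse_substitution("Q100.ABCD{A}"))
```

Because the capture groups are located by the delimiters rather than by character index, an 11-character and a 12-character value are handled the same way.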

Split string of digits into individual cells, including digits within parentheses/brackets

I have a column where each cell has a string of digits, ?, -, and digits in parentheses/brackets/curly brackets. A good example would be something like the following:
3????0{1012}?121-2[101]--01221111(01)1
How do I separate the string into different cells by characters, where a 'character' in this case refers to any number, ?, -, and value within the parentheses/brackets/curly brackets (including said parentheses/brackets/curly brackets)?
In essence, the string above would turn into the following (spaced apart to denote a separate cell):
3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
The amount of numbers within the parentheses/brackets/curly brackets vary. There are no letters in any of the strings.
Here you are!
RegEx method:
Sub Test_RegEx()
Dim s, col, m
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(?:\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\})"
For Each m In .Execute(s)
col(col.Count) = m
Next
End With
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Loop method:
Sub Test_Loop()
Dim s, col, q, t, k, i
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
q = "_"
t = True
k = 0
For i = 1 To Len(s)
t = (t Or InStr(1, ")]}", q) > 0) And InStr(1, "([{", q) = 0
q = Mid(s, i, 1)
If t Then k = k + 1
col(k) = col(k) & q
Next
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Something else to look at :)
Sub test()
'String to parse through
Dim aStr As String
'final string to print
Dim finalString As String
aStr = "3????0{1012}?121-2[101]--01221111(01)1"
'Loop through string
For i = 1 To Len(aStr)
'The character to look at
char = Mid(aStr, i, 1)
'Check if the character is an opening brace, curly brace, or parenthesis
Dim result As String
Select Case char
Case "["
result = loop_until_end(Mid(aStr, i + 1), "]")
i = i + Len(result)
result = char & result
Case "("
result = loop_until_end(Mid(aStr, i + 1), ")")
i = i + Len(result)
result = char & result
Case "{"
result = loop_until_end(Mid(aStr, i + 1), "}")
i = i + Len(result)
result = char & result
Case Else
result = Mid(aStr, i, 1)
End Select
finalString = finalString & result & " "
Next
Debug.Print (finalString)
End Sub
'Loops through aStr until end_char is found
'Returns the substring of aStr up to and including end_char
Function loop_until_end(aStr, end_char)
idx = 1
If (Len(aStr) <= 1) Then
loop_until_end = aStr
Else
char = Mid(aStr, idx, 1)
Do Until (char = end_char)
idx = idx + 1
char = Mid(aStr, idx, 1)
Loop
End If
loop_until_end = Mid(aStr, 1, idx)
End Function
Assuming the data is in column A starting in row 1, and that you want the results to start in column B and go right for each row of data in column A, here is an alternate method using only worksheet formulas.
In cell B1 use this formula:
=IF(OR(LEFT(A1,1)={"(","[","{"}),LEFT(A1,MIN(FIND({")","]","}"},A1&")]}"))),IFERROR(--LEFT(A1,1),LEFT(A1,1)))
In cell C1 use this formula:
=IF(OR(MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)={"(","[","{"}),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,MIN(FIND({")","]","}"},$A1&")]}",SUMPRODUCT(LEN($B1:B1))+1))-SUMPRODUCT(LEN($B1:B1))),IFERROR(--MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)))
Copy the C1 formula right until it starts giving you blanks (there are no more items left to split out from the string in the A cell). In your example, you need to copy it right to column AA. Then you can copy the formulas down for the rest of your column A data.
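The tokenizing rule behind all of the approaches above (a bracketed group is one item, anything else is a single character) can also be checked quickly outside Excel. A minimal Python sketch, with the function name split_cells made up for illustration and the assumption that groups contain only digits, as in the question:

```python
import re

# Try the bracketed groups first, then fall back to a single digit, "?" or "-".
TOKEN = re.compile(r"\(\d+\)|\[\d+\]|\{\d+\}|[\d?-]")


def split_cells(s: str) -> list:
    """Return the items that would each go into their own cell (hypothetical helper)."""
    return TOKEN.findall(s)


cells = split_cells("3????0{1012}?121-2[101]--01221111(01)1")
print(len(cells))       # 26
print(" ".join(cells))  # 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
```

The 26 items correspond to columns B through AA in the worksheet-formula answer.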

Matching two lists in excel

I am trying to compare two months' sales to each other in Excel in the most automated way possible (so that it will be quicker for future months).
This month's values are all worked out through formulae, and last month's will be copied and pasted into D:E. However, as you can see, there are some customers who made purchases last month and then did not this month (and vice versa). I basically need to have all CustomerIDs matching row by row. So, for example, it should end up like this:
Can anyone think of a good way of doing this without having to do it all manually? Thanks
Use the SUMIFS function or VLOOKUP. Like this:
http://screencast.com/t/VTBZrfHjo8tk
You should just have your entire customer list on one sheet and then add up the values associated with them month over month. The design you are describing is going to be a nightmare to maintain over time and serves no purpose. I can understand you would like to see the customers in a row like that, which is why I suggest SUMIFS.
This layout only compares two columns, so I think you should approach it another way.
First, add a date/month column, and then append the next month's values below:
Then you can use a simple pivot table to see several months at the same time.
In any case, if you want to line up your two columns, you can use this code (you will need to update it with your own references; I used the data from your image example):
Sub OrderMachColumns()
Dim lastRow As Long
Dim sortarray(1 To 2, 1 To 2) As String
Dim x As Long, y As Long, Z As Long
Dim TempTxt11 As String
Dim TempTxt12 As String
Dim TempTxt21 As String
Dim TempTxt22 As String
lastRow = Range("A3").End(xlDown).Row ' I use column A, same your example
For x = 3 To lastRow * 2
Cells(x, 1).Select
If Cells(x, 1) = "" Then GoTo B
If Cells(x, 4) = "" Then GoTo A
If Cells(x, 1) = Cells(x, 4) Then
Else
If Cells(x, 1).Value = Cells(x - 1, 4).Value Then
Range(Cells(x - 1, 4), Cells(x - 1, 5)).Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
ElseIf Cells(x, 1).Value = Cells(x + 1, 4).Value Then
Range(Cells(x, 1), Cells(x, 2)).Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Else
sortarray(1, 1) = Cells(x, 1).Value
sortarray(1, 2) = "Cells(" & x & ", 1)"
sortarray(2, 1) = Cells(x, 4).Value
sortarray(2, 2) = "Cells(" & x & ", 4)"
For Z = LBound(sortarray) To UBound(sortarray)
For y = Z To UBound(sortarray)
If UCase(sortarray(y, 1)) > UCase(sortarray(Z, 1)) Then
TempTxt11 = sortarray(Z, 1)
TempTxt12 = sortarray(Z, 2)
TempTxt21 = sortarray(y, 1)
TempTxt22 = sortarray(y, 2)
sortarray(Z, 1) = TempTxt21
sortarray(y, 1) = TempTxt11
sortarray(Z, 2) = TempTxt22
sortarray(y, 2) = TempTxt12
End If
Next y
Next Z
Select Case sortarray(1, 2)
Case "Cells(" & x & ", 1)"
Range(Cells(x, 1), Cells(x, 2)).Select
Case "Cells(" & x & ", 4)"
Range(Cells(x, 4), Cells(x, 5)).Select
End Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
End If
End If
A:
Next x
B:
End Sub
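The one-row-per-customer layout suggested in the SUMIFS answer amounts to taking the union of both months' customer IDs and looking each one up in both lists, defaulting to 0 when a customer is missing. A minimal Python sketch of that logic follows; the customer IDs and amounts are invented for illustration and are not from the question.

```python
def align_months(this_month: dict, last_month: dict) -> list:
    """Return (customer_id, this_month_value, last_month_value) rows (hypothetical helper)."""
    all_ids = sorted(this_month.keys() | last_month.keys())  # union of both lists
    # A customer missing from one month gets 0, like SUMIFS with no matching rows.
    return [(cid, this_month.get(cid, 0), last_month.get(cid, 0)) for cid in all_ids]


# Invented sample data
this_month = {"C001": 120, "C003": 75}
last_month = {"C001": 90, "C002": 40}

for row in align_months(this_month, last_month):
    print(row)
```

Every customer ends up on exactly one row, so no rows need to be inserted or shifted when a customer appears in only one of the two months.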

VBA EXCEL : Pattern creation function replacing numbers with characters

I have written the code below, but it's not functional. Can anyone help?
Explanation:
A 7 or 8 digit number is set. If the number is 8 digits, the first 2 numbers are removed, if the number is 7 digits, the first number is removed. A 6 digit number is left whereby every digit can be repeated without any constraints. So one can have a number between 000001 and 999999. (Zeros on the left are counted).
The code works for the first 3 digits but does not work properly after that, even though I'm using the same logic. The purpose of the code is to generate all possible patterns by translating the numbers into characters.
The constraints:
Letters used are only a, b, c, d, e, and f.
Characters should run in systematic order
Under this logic:
The pattern can range between aaaaaa and abcdef.
The first character is always "a" and the last character could be "f" in case all digits are different from one another.
So, the number 454657 is translated to abacbd, and 123456 is translated to abcdef. (c can't exist if there is no b, and d can't exist if there is no b and c.)
Private Sub CommandButton1_Click()
Dim GSM_Counter, GSM, GSM_Range, a, b, c, d, e, f As String
Dim GSM_length, Num1, Num2, Num3, Num4, Num5, Num6, a1, b1, c1, d1, e1, f1 As integer
GSM_Counter = Application.WorksheetFunction.CountA(Range("A:A"))
For i = 2 To GSM_Counter
GSM_length = Len(Range("A" & i))
Select Case GSM_length
Case Is = 8
Range("B" & i) = Left(Range("A" & i), 2)
Num1 = Right(Left(Range("A" & i), 3), 1)
Num2 = Right(Left(Range("A" & i), 4), 1)
Num3 = Right(Left(Range("A" & i), 5), 1)
Num4 = Right(Left(Range("A" & i), 6), 1)
Num5 = Right(Left(Range("A" & i), 7), 1)
Num6 = Right(Left(Range("A" & i), 8), 1)
Case Is = 7
Range("B" & i) = Left(Range("A" & i), 1)
Num1 = Right(Left(Range("A" & i), 2), 1)
Num2 = Right(Left(Range("A" & i), 3), 1)
Num3 = Right(Left(Range("A" & i), 4), 1)
Num4 = Right(Left(Range("A" & i), 5), 1)
Num5 = Right(Left(Range("A" & i), 6), 1)
Num6 = Right(Left(Range("A" & i), 7), 1)
End Select
Range("C" & i) = Num1
Range("D" & i) = Num2
Range("E" & i) = Num3
Range("F" & i) = Num4
Range("G" & i) = Num5
Range("H" & i) = Num6
Next i
For k = 2 To GSM_Counter
a1 = Range("C" & k)
b1 = Range("D" & k)
c1 = Range("E" & k)
d1 = Range("F" & k)
e1 = Range("G" & k)
f1 = Range("H" & k)
a = "a"
Range("K" & k) = a
If b1 = a1 Then
b = "a"
Else
b = "b"
End If
Range("L" & k) = b
If c1 = a1 Then
c = "a"
ElseIf c1 = b1 Then
c = "b"
Else
c = "c"
End If
Range("M" & k) = c
If d1 = a1 Then
d = "a"
ElseIf d1 = b1 Then
d = "b"
ElseIf d1 = c1 Then
d = "c"
Else
d = "d"
End If
Range("N" & k) = d
If e1 = a1 Then
e = "a"
ElseIf e1 = b1 Then
e = "b"
ElseIf e1 = c1 Then
e = "c"
ElseIf e1 = d1 Then
e = "d"
Else
e = "e"
End If
Range("O" & k) = e
If f1 = a1 Then
f = "a"
ElseIf f1 = b1 Then
f = "b"
ElseIf f1 = c1 Then
f = "c"
ElseIf f1 = d1 Then
f = "d"
ElseIf f1 = e1 Then
f = "e"
Else
f = "f"
End If
Range("P" & k) = f
Next k
End Sub
Here is another way..
'~~> Test Data
Sub Sample()
Dim TestArray(1 To 6) As Long
Dim i As Long
TestArray(1) = 468013: TestArray(2) = 12234455: TestArray(3) = 234523
TestArray(4) = 44444444: TestArray(5) = 123: TestArray(6) = 111222
For i = 1 To 6
Debug.Print TestArray(i) & " --> " & Encrypt(TestArray(i))
Next i
End Sub
'~~> Actual Function
Function Encrypt(n As Long) As String
Dim j As Long, k As Long, sNum As String
sNum = Format(CLng(Right(n, 6)), "000000")
j = 97
For k = 1 To 6
If IsNumeric(Mid(sNum, k, 1)) Then
sNum = Replace(sNum, Mid(sNum, k, 1), Chr(j))
j = j + 1
End If
Next k
Encrypt = sNum
End Function
Output
468013 --> abcdef
12234455 --> abccdd
234523 --> abcdab
44444444 --> aaaaaa
123 --> aaabcd
111222 --> aaabbb
EDIT:
If you are planning to use it as a worksheet function and you are not sure what kind of input will be there then change
Function Encrypt(n As Long) As String
to
Function Encrypt(n As Variant) As String
I would suggest getting to know the Chr() and possibly the Asc() VBA functions along with a general knowledge of how digits and alphabetic characters translate to ASCII code characters. I may be reading things wrong but I thought I saw some contradictions between the examples, your description and the actual code provided. Here is one method putting the pattern generation into a User Defined Function or UDF.
Function num_2_alpha(sNUM As String)
'ASCII 0-9 = 48-57, a-z = 97-122
Dim tmp As String, i As Long, c As Long
sNUM = Right(sNUM, 6)
tmp = Chr(97) ' =a
For i = 2 To 6
If CBool(InStr(1, Left(sNUM, i - 1), Mid(sNUM, i, 1))) Then
tmp = tmp & Mid(tmp, InStr(1, Left(sNUM, i - 1), Mid(sNUM, i, 1)), 1)
Else
'tmp = tmp & Chr(i + 96)
c = c + 1
tmp = tmp & Chr(c + 97) 'alternate (code) method
End If
Next i
num_2_alpha = tmp
End Function
Note that I've offered an alternate method that is commented out. Either that line or the one above it should be active; never both at one time. These were the results generated.
             
Addendum: I believe my recent edit should help conform to the examples you left in comments. Code and image updated.
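The rule described in the question (the first distinct digit becomes "a", the next new digit becomes "b", and a repeated digit reuses its earlier letter) can be written compactly with a lookup table. Here is a minimal Python sketch of that rule; the function name digit_pattern is made up, and the sketch assumes the input is a whole number whose last six digits are the ones to translate, padded with zeros on the left if shorter.

```python
def digit_pattern(number) -> str:
    """Translate the last six digits of number into an a-f pattern (hypothetical helper)."""
    digits = str(number)[-6:].zfill(6)  # drop leading digits, keep zeros on the left
    letters = {}  # digit -> letter, assigned in order of first appearance
    pattern = []
    for d in digits:
        if d not in letters:
            letters[d] = chr(ord("a") + len(letters))
        pattern.append(letters[d])
    return "".join(pattern)


print(digit_pattern(454657))    # abacbd
print(digit_pattern(12234455))  # abccdd
print(digit_pattern(123))       # aaabcd
```

Since at most six distinct digits can occur in six positions, the letters never go past "f".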