VBA transpose column vector by delimiter and include nullstrings - list

Below is an example of my data (column vector):
NAME
A
B
C
A
B
C
[blank cell]
B
C
NAME
A
B
C
Note that the [blank cell] in my document is actually just a blank cell, no formula data etc.
Ultimately, I just need to transpose the data delimited by NAME. NAME cells are the only cells with:
Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2)
My code is breaking my transpose and making a new row for all NAME and all [blank cell]s.
I'm trying to get:
NAME A B C A B C B C
NAME A B C A B C A B C
NAME A B C B C B C
But my code is returning:
NAME A B C A B C
B C
NAME A B C A B C A B C
NAME A B C
B C
B C
Here is the code I'm using:
Sub Dataclean()
Dim lngRowLast As Long, _
lngRowPaste As Long, _
lngColOffset As Long
Dim rngCell As Range, _
rngDataSet As Range
Dim strSourceTab As String, _
strOutputTab As String
'Tab name containing source data. Change to suit.
strSourceTab = "sheet2pull"
'Tab name for data output. Change to suit.
strOutputTab = "transposed"
lngRowLast = Sheets(strSourceTab).Cells(Rows.Count, "A").End(xlUp).Row
'Assumes the original dataset is in Column A and starts at Row 1. Change to suit.
Set rngDataSet = Sheets(strSourceTab).Range("A1:A" & lngRowLast)
Application.ScreenUpdating = False
For Each rngCell In rngDataSet
If Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2) Then
If lngRowPaste = 0 And lngColOffset = 0 Then
lngRowPaste = 1
lngColOffset = 1
Else
lngRowPaste = lngRowPaste + 1
lngColOffset = 1
End If
ElseIf lngRowPaste = 0 And lngColOffset = 0 Then
lngRowPaste = 1
lngColOffset = 1
End If
Sheets(strOutputTab).Cells(lngRowPaste, lngColOffset).Value = rngCell.Value
lngColOffset = lngColOffset + 1
Next rngCell
Application.ScreenUpdating = True
End Sub
Please let me know if I've been unclear or confusing. I tried to be as explicit as possible, but it's often tough to explain! Thank you so much.
I'm a bit new to VBA, but learning.

You error is when the first if is comparing a blank. This will always be true as a blank and uppercase blank aare equal. Replace your first if with the below.
If Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2) And _
Not IsBlank(rngCell.Value) Then

Related

Find starting and ending index of each unique charcters in a string in python

I have a string with characters repeated. My Job is to find starting Index and ending index of each unique characters in that string. Below is my code.
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
mo = re.search(item,x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
Output :
a 0 1
b 3 4
c 7 8
Here the end index of the characters are not correct. I understand why it's happening but how can I pass the character to be matched dynamically to the regex search function. For instance if I hardcode the character in the search function it provides the desired output
x = 'aabbbbccc'
xs = set(x)
mo = re.search("[b]+",x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
output:
b 2 5
The above function is providing correct result but here I can't pass the characters to be matched dynamically.
It will be really a help if someone can let me know how to achieve this any hint will also do. Thanks in advance
String literal formatting to the rescue:
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
# for patterns better use raw strings - and format the letter into it
mo = re.search(fr"{item}+",x) # fr and rf work both :) its a raw formatted literal
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n) # fix upper limit by n-1
Output:
a 0 3 # you do see that the upper limit is off by 1?
b 3 7 # see above for fix
c 7 9
Your pattern does not need the [] around the letter - you are matching just one anyhow.
Without regex1:
x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder
for idx,ch in enumerate(x[1:],1):
if last_ch == ch:
continue
else:
print(last_ch,start_idx, idx-1)
last_ch = ch
start_idx = idx
print(ch,start_idx,idx)
output:
a 0 2 # not off by 1
b 3 6
c 7 8
1RegEx: And now you have 2 problems...
Looking at the output, I'm guessing that another option would be,
import re
x = "aaabbbbcc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
print(output)
Output
a 0 3
b 3 7
c 7 9
I think it'll be in the Order of N, you can likely benchmark it though, if you like.
import re, time
timer_on = time.time()
for i in range(10000000):
x = "aabbbbccc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
timer_off = time.time()
timer_total = timer_off - timer_on
print(timer_total)

Excel | Get all column/row names in which a specific text is as a list

It is difficult for me to describe the problem in the title, so excuse any misleading description.
The easiest way to describe what I need is with an example. I have a table like:
A B C
1 x
2 x x
3 x x
Now what I want is the formula in a cell for every single column and row with each of the column or row name for every x that is placed. In the example like:
A B C
1,2 2,3 3
1 A x
2 A, B x x
3 B, C x x
The column and row names are not equivalent to the excel designation. It works with an easy WHEN statement for single cells (=WHEN(C3="x";C1)), but not for a bunch of them (=WHEN(C3:E3="x";C1:E1)). How should/can such a formula look like?
So I found the answer to my problem. Excel provides the normal CONCATENATE function. What is needed is something like a CONCATENATEIF (in German = verkettenwenn) function. By adding a module in VBA based on a thread from ransi from 2011 on the ms-office-forum.net the function verkettenwenn can be used. The code for the German module looks like:
Option Explicit
Public Function verkettenwenn(Bereich_Kriterium, Kriterium, Bereich_Verketten)
Dim mydic As Object
Dim L As Long
Set mydic = CreateObject("Scripting.Dictionary")
For L = 1 To Bereich_Kriterium.Count
If Bereich_Kriterium(L) = Kriterium Then
mydic(L) = Bereich_Verketten(L)
End If
Next
verkettenwenn = Join(mydic.items, ", ")
End Function
With that module in place one of the formula for the mentioned example looks like: =verkettenwenn(C3:E3;"x";$C$1:$K$1)
The English code for a CONCATENATEIF function should probably be:
Option Explicit
Public Function CONCATENATEIF(Criteria_Area, Criterion, Concate_Area)
Dim mydic As Object
Dim L As Long
Set mydic = CreateObject("Scripting.Dictionary")
For L = 1 To Criteria_Area.Count
If Criteria_Area(L) = Criterion Then
mydic(L) = Concate_Area(L)
End If
Next
CONCATENATEIF = Join(mydic.items, ", ")
End Function

Split string of digits into individual cells, including digits within parentheses/brackets

I have a column where each cell has a string of digits, ?, -, and digits in parentheses/brackets/curly brackets. A good example would be something like the following:
3????0{1012}?121-2[101]--01221111(01)1
How do I separate the string into different cells by characters, where a 'character' in this case refers to any number, ?, -, and value within the parentheses/brackets/curly brackets (including said parentheses/brackets/curly brackets)?
In essence, the string above would turn into the following (spaced apart to denote a separate cell):
3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
The amount of numbers within the parentheses/brackets/curly brackets vary. There are no letters in any of the strings.
Here you are!
RegEx method:
Sub Test_RegEx()
Dim s, col, m
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(?:\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\})"
For Each m In .Execute(s)
col(col.Count) = m
Next
End With
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Loop method:
Sub Test_Loop()
Dim s, col, q, t, k, i
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
q = "_"
t = True
k = 0
For i = 1 To Len(s)
t = (t Or InStr(1, ")]}", q) > 0) And InStr(1, "([{", q) = 0
q = Mid(s, i, 1)
If t Then k = k + 1
col(k) = col(k) & q
Next
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Something else to look at :)
Sub test()
'String to parse through
Dim aStr As String
'final string to print
Dim finalString As String
aStr = "3????0{1012}?121-2[101]--01221111(01)1"
'Loop through string
For i = 1 To Len(aStr)
'The character to look at
char = Mid(aStr, i, 1)
'Check if the character is an opening brace, curly brace, or parenthesis
Dim result As String
Select Case char
Case "["
result = loop_until_end(Mid(aStr, i + 1), "]")
i = i + Len(result)
result = char & result
Case "("
result = loop_until_end(Mid(aStr, i + 1), ")")
i = i + Len(result)
result = char & result
Case "{"
result = loop_until_end(Mid(aStr, i + 1), "}")
i = i + Len(result)
result = char & result
Case Else
result = Mid(aStr, i, 1)
End Select
finalString = finalString & result & " "
Next
Debug.Print (finalString)
End Sub
'Loops through and concatenate to a final string until the end_char is found
'Returns a substring starting from the character after
Function loop_until_end(aStr, end_char)
idx = 1
If (Len(aStr) <= 1) Then
loop_until_end = aStr
Else
char = Mid(aStr, idx, 1)
Do Until (char = end_char)
idx = idx + 1
char = Mid(aStr, idx, 1)
Loop
End If
loop_until_end = Mid(aStr, 1, idx)
End Function
Assuming the data is in column A starting in row 1 and that you want the results start in column B and going right for each row of data in column A, here is alternate method using only worksheet formulas.
In cell B1 use this formula:
=IF(OR(LEFT(A1,1)={"(","[","{"}),LEFT(A1,MIN(FIND({")","]","}"},A1&")]}"))),IFERROR(--LEFT(A1,1),LEFT(A1,1)))
In cell C1 use this formula:
=IF(OR(MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)={"(","[","{"}),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,MIN(FIND({")","]","}"},$A1&")]}",SUMPRODUCT(LEN($B1:B1))+1))-SUMPRODUCT(LEN($B1:B1))),IFERROR(--MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)))
Copy the C1 formula right until it starts giving you blanks (there are no more items left to split out from the string in the A cell). In your example, need to copy it right to column AA. Then you can copy the formulas down for the rest of your Column A data.

Extracting columns with a difference in aligned data

I have some aligned data (something bioinformatic related) as so:
reference_string = 'yearning'
string2 = 'learning'
string3 = 'aligning'
I need to extract only columns showing differences in relation to the reference data.
The output should show only positional information of the columns containing differences in relation to the reference string and the corresponding reference item.
1 2 3 4
y e a r
l
a l i g
My current code does most things okay except that it also reports columns with no difference.
string1 = 'yearning'
string2 = 'learning'
string3 = 'aligning'
string_list = [string1, string2]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
for d in all_diffs:
print str(int(d+1)),
print
for c in reference:
print str(c),
print
for i, s in enumerate(string_list):
for j, c in enumerate(s):
if j in diffs_top[i]:
print str(c),
else:
print str(' '),
print
This code would give:
1 2 3 4
y e a r n i n g
l
a l i g
Any help appreciated.
EDIT: I have picked some section of real data to make the problem as clearer as possible and my attempt at solving it thus far:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
print str(int(d+1)), # retains natural positions of the reference residues
print
for d in all_diffs:
for i, c in enumerate(reference):
if i == d:
print c,
print
The print out will be an output showing the position at which there is any difference to other non-reference strings and the corresponding reference letter.
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
Then the next step is to write a code that will process non reference strings by printing out the difference with the reference (at that position). If there is no difference it will leave blank (' ').
Doing it manually the output will be:
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
Q Q S L R H D
Q R A H T L D T
Q R H A S V A D A
My entire code as an attempt to get to the solution above as been messy to say the least:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
print str(int(d+1)),
print
for d in all_diffs:
for i, c in enumerate(reference):
if i == d:
print c,
print
# this is my attempt to look into non-reference strings
# to check for the difference with the reference, and print an output.
for d in all_diffs:
for i, s in enumerate(string_list):
for j, c in enumerate(s):
if j == d:
print c,
else:
print str(' '),
print
Your code is working perfectly fine (as per your logic).
What is happening , is that while printing the output, when you come across the reference string, Python looks for the corresponding entry in the diffs_top list and because while storing in diff_top, you have no entry stored for the reference string, Python just prints blank spaces for your reference string.
1 2 3 4
y e a r n i n g #prints the reference string, because you've coded in that way
#prints blank as string_list[0] and reference string are the same
l
a l i g
The question here is how exactly do you define your difference for reference string.
Besides, I also found some fundamental flaws in your code implementation. If you try to run your code by setting string_list[1] as your reference string, you would get your output as :
1 2 3 4
l e a r n i n g
y
a l i g
Is this what you need? Please spend some time in properly defining difference for all cases and then try to implement you code.
EDIT:
As per you updated requirements, replace the last block in your code with this:
for i, s in enumerate(string_list):
for d in all_diffs:
if d in diffs_top[i]:
print s[d],
else:
print ' ',
print
Cheers!
I think there is a general problem in your logic. If you need to extract only columns showing difference in relation to the reference data and string1 is the reference the output should be:
1 2 3 4
l
a l i g
So, 'yearning' shouldn't show any character because it has no difference to string1.
If you delete or put the following lines in comments, you will exactly get what I expect is the right answer:
#for c in reference:
# print str(c),
#print
Consider to review your logic if this solution is not what you actually want.
Update
Here is a shorter solution which solves your task:
from itertools import compress, izip_longest
def delta(reference, string):
return [ '' if a == b else b for a, b in izip_longest(reference, string)]
ref_string = 'MAHEWGPQRLAGGQPQAS'
strings = ['MAQQWSLQRLAGRHPQDS',
'MAQRWGAHRLTGGQLQDT',
'MAQRWGPHALSGVQAQDA']
delta_strings = [delta(ref_string, string) for string in strings]
selectors = [1 if any(tup) else 0 for tup in izip_longest(*delta_strings)]
indices = [str(i+1) for i in range(len(selectors))]
output_data = [indices, ref_string] + delta_strings
for line in output_data:
print ''.join(x.rjust(3) for x in compress(line, selectors))
Explanation:
I defined a function delta(reference, string) which returns the delta between the string and the referenced string. For example: delta("ABFF", "AECF") returns the list ['', E, C, ''].
The variable delta_strings holds all the deltas between each string in the list strings and the reference string ref_string.
The variable selector is a list containing only 1 and 0 values, where 0 specifies the collumns which shouldn't be printed and vice versa.

excel/vba: count number of unique male and female students in list

I have a database of students with names and their gender, however the list contains repeats of students on different dates. How do I count the number of unique male students and unique female students on my list?
Here is a sample database:
Name Date Gender
A 8/1/2013 M
B 8/2/2013 F
C 8/2/2013 F
A 9/2/2013 M
A 9/3/2013 M
C 8/31/2013 F
B 8/15/2013 F
D 10/5/2013 M
The total count for unique males should be 2, and unique females should be 2.
I tried to play around with the sum(if(frequency)) variation formula but without luck. I'm not sure how to tie it to using the names.
I don't mind using VBA code either.
Any suggestions would be appreciated.
Thanks!
Assuming that each name will always have the same gender if repeated (!) you can use a formula like this to count different males in the list:
=SUMPRODUCT((C2:C100="M")/COUNTIF(A2:A100,A2:A100&""))
obviously change M to F for female count
Sort the table by name A to Z first and then run this macro. Note that I am hard coding the range A2:A9.
This will only work if you sort your list first. Select all columns and rows and right click and sort A to Z by the name column.
Sub LoopRange()
Dim rCell As Range
Dim rRng As Range
Set rRng = Sheet1.Range("A2:A9")
Dim countM As Integer
Dim countF As Integer
Dim name As String
For Each rCell In rRng.Cells
If rCell.Value > name Then
If rCell.Offset(0, 2).Value = "M" Then
countM = countM + 1
End If
If rCell.Offset(0, 2).Value = "F" Then
countF = countF + 1
End If
End If
name = rCell.Value
Next rCell
MsgBox ("Males = " & countM & ", Females = " & countF)
'Range("E3").Value = countF
'Range("E4").Value = countM
End Sub