Regexmatch + partial vlookup in Google Sheets


I have a table with three columns in Google Sheets:
Column A (raw data): contains different strings from which I need to extract the character "Y" plus a match with a string in column B.
Column B: an unordered list of codes that can occur multiple times as a substring in column A.
Column C: displays the expected outcome.
Column A (raw data):
Row 1: X Y Apple
Row 2: Z Apple
Row 3: K
Row 4: L M Y Orange
Column B (codes to match, unordered):
Row 1: Apple
Row 2: Orange
Row 3: Mango
Row 4: Banana
Column C (expected outcome):
Row 1: Y Apple
Row 2: Apple
Row 3: (empty cell)
Row 4: Y Orange

I'm not 100% clear on what you need, but is it this in cell C1?
=arrayformula(trim(if(regexmatch(A1:A,"Y "),"Y ",)&if(countif(B:B,iferror(regexextract(A1:A,"\ (\w+)$")))>=1,iferror(regexextract(A1:A,"\ (\w+)$")),)))
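If the nested formula is hard to follow, the same per-row logic can be sketched in Python. This is only an illustration, not part of the Sheets solution: the function name extract_y_and_code is made up, and the membership test here is case-sensitive, which may differ from how COUNTIF compares text.

```python
import re

def extract_y_and_code(raw, codes):
    """Mirror the formula: optional 'Y ' prefix plus the last word if it is a known code."""
    # regexmatch(A, "Y "): does the string contain "Y" followed by a space?
    prefix = "Y " if re.search(r"Y ", raw) else ""
    # regexextract(A, "\ (\w+)$"): the last word, if it is preceded by a space
    match = re.search(r" (\w+)$", raw)
    last_word = match.group(1) if match else ""
    # countif(B:B, last_word) >= 1: keep the word only if it is in the code list
    code = last_word if last_word in codes else ""
    # trim(...): drop the trailing space left behind when no code matched
    return (prefix + code).strip()

# Sample data copied from the question
column_a = ["X Y Apple", "Z Apple", "K", "L M Y Orange"]
column_b = {"Apple", "Orange", "Mango", "Banana"}
column_c = [extract_y_and_code(raw, column_b) for raw in column_a]
print(column_c)
```

As in the formula, a row that contains "Y " but ends in a word that is not in column B comes out as just "Y".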

Related

A formula that finds the value in range 1 in range 2 and returns the value in another column at that row position

data1  data2  data2  result
----------------------------------
a      a      apple  apple
c      b      banna  kiwi
b      c      kiwi   banna
c                    kiwi
a                    apple
a                    apple
b                    banna
c                    kiwi
Find the first value, 'a', in the first data2 column.
Using the row position where it was found as the index, take the value from the second data2 column.
Record it in result.
Repeat the process for each value in data1. I want to make this work as a formula!

use:
=INDEX(IFNA(VLOOKUP(AF5:AF; AH:AI; 2; )))
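To show what that formula computes, here is a small Python sketch of an exact-match lookup that returns the first matching row and a blank for misses. The function name lookup_all is hypothetical, and the data is copied from the question's table.

```python
def lookup_all(keys, table):
    """Return the second-column value for each key, or '' when the key is missing."""
    first_match = {}
    for key, value in table:
        # VLOOKUP returns the first matching row, so keep only the first value per key
        first_match.setdefault(key, value)
    # IFNA(...) with no fallback: use an empty string for keys with no match
    return [first_match.get(key, "") for key in keys]

data1 = ["a", "c", "b", "c", "a", "a", "b", "c"]
data2 = [("a", "apple"), ("b", "banna"), ("c", "kiwi")]
result = lookup_all(data1, data2)
print(result)
```

Building the dictionary once and then reading from it corresponds to the array form of the formula, where one VLOOKUP call handles the whole data1 column.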

Combine rows with similar information into 1

I have a table that looks like this example:
Order Bagged Shipped
----------------------------------
1 Y
2 Y
1 Y
3 Y
I want to combine like order numbers into 1 row like below:
Order Bagged Shipped
----------------------------------
1 Y Y
2 Y
3 Y
How can I do this in Power BI Desktop?

Assuming your data really is as simple as your example (values are either null or 'Y', with no conflicts), I suggest something like:
SELECT "Order", MAX(Bagged) AS Bagged, MAX(Shipped) AS Shipped
FROM mytable
GROUP BY "Order"
GROUP BY "Order" gives you one row per order, and MAX on the other columns returns 'Y' if it exists for that order, or null if it does not (aggregates ignore nulls). Order is a reserved word in SQL, so it is quoted here; the quoting style (double quotes, square brackets, or backticks) depends on your database.
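The same "one row per order, keep 'Y' if any row has it" idea can be sketched in plain Python. The function name combine_orders is made up, and the sample rows are an assumption: the question's table does not show which column each 'Y' belongs to, so the placement below is only a guess for illustration.

```python
def combine_orders(rows):
    """Collapse rows sharing an order number, keeping 'Y' in a column if any row has it."""
    combined = {}
    for order, bagged, shipped in rows:
        # One entry per order number, starting with both flags empty
        current = combined.setdefault(order, {"Bagged": None, "Shipped": None})
        if bagged == "Y":
            current["Bagged"] = "Y"
        if shipped == "Y":
            current["Shipped"] = "Y"
    return combined

# Hypothetical sample rows as (Order, Bagged, Shipped); column placement is assumed
rows = [(1, "Y", None), (2, "Y", None), (1, None, "Y"), (3, None, "Y")]
combined = combine_orders(rows)
for order, flags in sorted(combined.items()):
    print(order, flags["Bagged"], flags["Shipped"])
```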

In Power BI, select Transform, then add a group-by step (Table.Group) to your existing query:
#"Grouped Rows" = Table.Group(#"Previous Step", {"Order"}, {
{"Bagged", each if List.Contains([Bagged], "Y") then "Y" else null},
{"Shipped", each if List.Contains([Shipped], "Y") then "Y" else null}
})
in
#"Grouped Rows"

How to solve concatenate issue with .cell()? row=row works, column=column gives error

I am looping through an excel sheet, looking for a specific name. When found, I print the position of the cell and the value.
I would like to find the position and value of a neighbouring cell, however I can't get .cell() to work by adding 2, indicating I would like the cell 2 columns away in the same row.
row= row works, but column= column gives error, and column + 2 gives error. Maybe this is due to me listing columns as 'ABCDEFGHIJ' earlier in my code? (For full code, see below)
print 'Cell position {} has value {}'.format(cell_name, currentSheet[cell_name].value)
print 'Cell position next door TEST {}'.format(currentSheet.cell(row=row, column=column +2))
Full code:
import openpyxl
file = openpyxl.load_workbook('test6.xlsx', read_only = True)
allSheetNames = file.sheetnames
#print("All sheet names {}" .format(file.sheetnames))
for sheet in allSheetNames:
    print('Current sheet name is {}'.format(sheet))
    currentSheet = file[sheet]
    for row in range(1, currentSheet.max_row + 1):
        #print row
        for column in 'ABCDEFGHIJ':
            cell_name = '{}{}'.format(column,row)
            if currentSheet[cell_name].value == 'sign_name':
                print 'Cell position {} has value {}'.format(cell_name, currentSheet[cell_name].value)
                print 'Cell position TEST {}'.format(currentSheet.cell(row=row, column=column +2))
I get this output:
Current sheet name is Sheet1
Current sheet name is Sheet2
Cell position D5 has value sign_name
and:
TypeError: cannot concatenate 'str' and 'int' objects
I get the same error whether I use "column = column" or "column = column +2".
Why does row=row work, but column=column doesn't? And how do I find the cell name of the cell to the right of my resulting D5 cell?

The reason row=row works and column=column doesn't is because your column value is a string (letter from A to J) while the column argument of a cell is expecting an int (A would be 1, B would be 2, Z would be 26, etc.)
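To make the letter-to-number mapping concrete, here is a minimal sketch in plain Python. The helper names column_letter_to_index and index_to_column_letter are made up for this example; openpyxl ships its own equivalents in openpyxl.utils (column_index_from_string and get_column_letter), which are the better choice in real code.

```python
def column_letter_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based column index."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError("invalid column label: %r" % letters)
        # Treat the label as a base-26 number whose digits run from A=1 to Z=26
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def index_to_column_letter(index):
    """Convert a 1-based column index back to its label."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

# The match in the question was found in column D; two columns to the right is F
neighbour = index_to_column_letter(column_letter_to_index("D") + 2)
print(neighbour)
```

With a numeric index in hand, adding 2 is ordinary integer arithmetic, so the TypeError from mixing 'str' and 'int' no longer arises.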
There are a few changes I would make in order to iterate through the cells and find a neighbor more effectively. Firstly, openpyxl offers sheet.iter_rows(), which, given no arguments, provides a generator over all rows that are used in the sheet. So you can iterate with
for row in currentSheet.iter_rows():
    for cell in row:
because each row is a tuple of the cells in that row.
Then in this new nested for loop, you can get the current column index with cell.column (D would give 4; as far as I know, openpyxl releases before 2.6 returned the letter here and exposed the number as cell.col_idx), and the cell to the right (one column over) would be currentSheet.cell(row=cell.row, column=cell.column + 1). Inside this loop row is the tuple of cells, not a row number, which is why cell.row is used. For the cell two columns away, use cell.column + 2.
Note the difference between the two uses: currentSheet.cell() is a request for a specific cell, while cell.column + 1 is the column index of the current cell incremented by 1.
Relevant OpenPyXl documentation:
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.cell.cell.html
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html

use a custom function to find all words in a column

Background
The following question is a variation from Unnest grab keywords/nextwords/beforewords function.
1) I have the following word_list
word_list = ['crayons', 'cars', 'camels']
2) And df1
l = ['there are many crayons, in the blue box crayons that are',
'cars! i like a lot of sports cars because they go fast',
'the camels, in the middle east have many camels to ride ']
df1 = pd.DataFrame(l, columns=['Text'])
df1
Text
0 there are many crayons, in the blue box crayons that are
1 cars! i like a lot of sports cars because they go fast
2 the camels, in the middle east have many camels to ride
3) I also have a function find_next_words which uses word_list to grab words from Text column in df1
def find_next_words(row, word_list):
    sentence = row[0]
    trigger_words = []
    next_words = []
    for keyword in word_list:
        words = sentence.split()
        for index in range(0, len(words) - 1):
            if words[index] == keyword:
                trigger_words.append(keyword)
                next_words.append(words[index + 1:index + 3])
    return pd.Series([trigger_words, next_words], index = ['TriggerWords','NextWords'])
4) And it's pieced together with the following
df2 = df1.join(df1.apply(lambda x: find_next_words(x, word_list), axis=1))
Output
Text TriggerWords NextWords
0 [crayons] [[that, are]]
1 [cars] [[because, they]]
2 [camels] [[to, ride]]
Problem
5) The output misses the following
crayons, from row 0 of Text column df1
cars! from row 1 of Text column df1
camels, from row 2 of Text column df1
Goal
6) Grab all corresponding words from df1 even if the words in df1 have a slight variation e.g. crayons, cars! from the words in word_list
(For this toy example, I know I can easily fix this problem by just adding these word variations to word_list = ['crayons,', 'crayons', 'cars!', 'cars', 'camels,', 'camels']. But this would be impractical to do with my real word_list, which contains ~20K words.)
Desired Output
Text TriggerWords NextWords
0 [crayons, crayons] [[in, the], [that, are]]
1 [cars, cars] [[i,like],[because, they]]
2 [camels, camels] [[in, the], [to, ride]]
Questions
How do I 1) tweak my word_list (e.g. regex?) 2) or find_next_words function to achieve my desired output?

You can use a regex along these lines:
\b(crayons|cars|camels)\b(?:[^a-z\n]*([a-z]*)[^a-z\n]*([a-z]*))
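As a rough sketch of how that pattern could be applied, the snippet below builds the alternation from word_list and collects the trigger words and the two words that follow each one. The function name find_next_words_regex is hypothetical, and the sample sentences are copied from the question.

```python
import re

word_list = ['crayons', 'cars', 'camels']
texts = ['there are many crayons, in the blue box crayons that are',
         'cars! i like a lot of sports cars because they go fast',
         'the camels, in the middle east have many camels to ride ']

# Build the keyword alternation; re.escape guards against special characters
alternation = '|'.join(re.escape(word) for word in word_list)
pattern = re.compile(
    r'\b(' + alternation + r')\b(?:[^a-z\n]*([a-z]*)[^a-z\n]*([a-z]*))')

def find_next_words_regex(sentence):
    """Return (trigger_words, next_words) for every keyword hit in the sentence."""
    matches = pattern.findall(sentence)
    trigger_words = [m[0] for m in matches]
    # Drop empty groups, which occur when fewer than two words follow the keyword
    next_words = [[w for w in m[1:] if w] for m in matches]
    return trigger_words, next_words

for text in texts:
    print(find_next_words_regex(text))
```

One caveat: each match consumes the two following words, so a keyword that appears within two words of a previous keyword is not reported as a trigger of its own.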

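Import nltk and tokenize the sentence instead of splitting on whitespace. Change
words = sentence.split()
to
words = nltk.word_tokenize(sentence)
This yields
'crayons', ','
instead of
'crayons,'
which allows find_next_words to match every word from word_list in the Text column. Note that punctuation marks then count as tokens, so they can also show up in NextWords.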

3 column excel find value cell b2 in column a replace with cell c2

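I have 3 columns and about 100k rows. Column A holds the part description, which includes the part number (it is a webpage title, e.g. containing y400cc). Cell B1 holds the old part number (y400cc) and cell C1 holds the new part number (wpy400cc).
How do I find the value of cell B2 in column A and replace it with the value of cell C2?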

I am not sure I understood your question fully, but here is my proposal:
Column A contains some text (example: cell A2 = "xx456yy")
Column B contains a part number which may or may not be found in A (example: cell B2 = "456")
Column C contains the new part number (example: cell C2 = "900")
Column D holds the following formula, which replaces the column B text found in column A with the column C text, and returns an empty string when there is no match:
=IF(IFERROR(FIND(B2,A2),0)<>0,REPLACE(A2,FIND(B2,A2),LEN(B2),C2),"")
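For comparison, the find-then-replace logic can be sketched in Python. This is only an illustration of the steps, not an Excel feature; the function name replace_part_number is made up.

```python
def replace_part_number(text, old, new):
    """Replace the first occurrence of old in text with new; return '' if old is absent."""
    # str.find returns -1 when the substring is missing (FIND raises an error instead)
    position = text.find(old)
    if position == -1:
        return ""
    # Same idea as REPLACE(text, position, LEN(old), new)
    return text[:position] + new + text[position + len(old):]

print(replace_part_number("xx456yy", "456", "900"))
```

A match at the very start of the text is still a match, which is why the not-found case has to use a value that no real position can take.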
