Delimiting columns very specifically - regex

I've got a column (with many thousands of rows) which I'd like to delimit into multiple rows. I have some experience using regular expressions in Excel, and I have some experience using delimiters in excel, but this one is just a tad too hard..
Let me give you three example-lines:
- 23-12-05: For sale for 2000. 2010-09-09: Not found
- 25-11-09: For sale for 3400. Last date found: 2010-07-08
- 18-06-08: For sale for 5500. 21-07-09: Changed from 5500 to 4900. 16-09-09: Jumped from 4900 to 4700. 2010-02-04: Not found
Most other lines follow these structures. How can I create a new column based on just the first symbols before [COLON]; A second column based on the symbols between the first [COLON] and the first [DOT]. How can I continue to the last IF the text LAST DATE is not found? Finally: How can I use regex (or another way) to use the text 'NOT FOUND' to paste the last date into a new column?
Trust me, I have been at this for quite some time now (sigh). Any help is much appreciated!

you can actually use formulas for this.
Assuming the text is in A1,
B1: =LEFT(A1,FIND(":",A1)-1)
C1: =MID(A1,FIND(":",A1)+1,FIND(".",A1,FIND(":",A1))-FIND(":",A1))
D1: =MID(A1,FIND(".",A1,FIND(":",A1))+1,LEN(A1))
E1: =MID(A1,FIND("Not found",A1)-12,10)
(I'm assuming the date format does not change for the E1)

By the way, this also works for me, to get the last date in a cell:
=LOOKUP(9999999999999999,FIND("**-**-**",A1,ROW($1:$1024)))
Only problem here is: I haven't the slightest clue what exactly I am doing here.
For example, I'd like to use the same code to find the FIRST occurence of a date.
Can anyone explain this code to me? Why am I searching for a very high number? What is it in this code that makes that I find the last occurence? What does it mean that the 'starting number' is "row(1:1024)"?
Anybody knows?

Related

VLOOKUP missing entries?

I have a decently sized spreadsheet of about 6 thousand entries and I am trying to match two columns to make sure they match up using vlookup and that is working fine but I have about 400 or so entries that don't have a match and looking through a few and just using control-f to see if they were somewhere with a different naming convention I found the a matching cell for several of the entries that supposedly don't have a match, any clue why this might be? I checked the length of the values in the cell to see if there was white spacing I was missing but they matched in length and the values in the cell are the exact same. This is the vlookup setup I am using
=VLOOKUP(A1,$B$1:$B$6074,1,0). The CSV is here if anyone wanted to see it. I am not saying they are all wrong but I looked at about 10 or so and they all had matches so I am a bit lost on this.

Arrayformula to check if column contains text and pull the number next to it. Google Sheets

In desperate need of some assistance with this!
Wasn't sure how to title this question...
SAMPLE SHEET - CLICK ME! :)
In SupportingSheet!H1 I have the following formula:
=ArrayFormula(if(G1:G<>"", IF(DASHBOARD!N2<>"", G1:G/DASHBOARD!$P$2-filter(DASHBOARD!O1:O100,REGEXMATCH(DASHBOARD!N1:N100,E1:E100)),G1:G/(DASHBOARD!$M$3)),))
The part I struggle with is:
G1:G/DASHBOARD!$P$2-filter(DASHBOARD!O1:O100,REGEXMATCH(DASHBOARD!N1:N100,E1:E100))
It needs to divide two numbers and then subtract another number. I can't seem to get this formula to pull the correct number.
It needs to check if the text in E1:E100 exist in DASHBOARD!N1:N100, if yes, pull the number from DASHBOARD!O1:O100.
For example, text in SupportingSheet!E1 can be found in DASHBOARD!N2, hence it needs to pull the number from DASHBOARD!O2.
Column SupportingSheet!J has the actual end result that a formula needs to produce.
It doesn't look like Regexmatch works as an Arrayformula and I am not sure how to go about it.
Please note, that text in SupportingSheet!E1:E is not always identical. Often it will have a random number of "space" at the end (long story...). That is why Regexmatch was a perfect option until I realised it didn't work.
Please let me know if further clarification is needed.
Below is an image of the random spaces (non-printable characters) at the end.
use:
=ARRAYFORMULA(IF(G1:G="",,IF(DASHBOARD!N2<>"",
IFNA(G1:G/DASHBOARD!$P$2-VLOOKUP(E1:E1000, DASHBOARD!N1:O100, 2, 0),
G1:G/DASHBOARD!$M$3))))

Avoid duplicate code in Excel IF formula code

I want to avoid duplicate code within excel formulas. Is there a method to repeat a certain code segment?
=IF(A1=1,(A1-B2-C3),(A1-B2-C3)+1)
This would be especially useful when it comes to more complex or longer sections. But: everything must be in ONE formula in ONE cell. Thanks! :-)
EDIT: This is my current code.
=IF(ISNUMBER(SEARCH(".amp",A2)),IFERROR(MID(A2,FIND("#",SUBSTITUTE(A2,"-","#",LEN(A2)-LEN(SUBSTITUTE(A2,"-",""))))+1,SEARCH(".html",A2)-FIND("#",SUBSTITUTE(A2,"-","#",LEN(A2)-LEN(SUBSTITUTE(A2,"-",""))))-5),""),IFERROR(MID(A2,FIND("#",SUBSTITUTE(A2,"-","#",LEN(A2)-LEN(SUBSTITUTE(A2,"-",""))))+1,SEARCH(".html",A2)-FIND("#",SUBSTITUTE(A2,"-","#",LEN(A2)-LEN(SUBSTITUTE(A2,"-",""))))-1),""))
It strips the long ID number out of any URL of a specific CMS. So
FIND("#",SUBSTITUTE(A2,"-","#",LEN(A2)-LEN(SUBSTITUTE(A2,"-","")))
is probably the part which occurs more than once and should be replaced for a code which does not be that duplicate-prone.
EXAMPLE: www.domain.com/path1/path2/this-is-an-article-123-dd-123456789.html --> 1234567890
EXAMPLE: www.domain.com/path1/path2/this-is-an-article-123-dd-1234567890.amp.html ->
1234567890
EXAMPLE: www.domain.com/path1/this-is-an-article-1234567890.html ->
1234567890
In google sheets, you could use REGEXEXTRACT to get what you want:
Formula in B1:
=REGEXEXTRACT(A1,"\d{8,}")
Place the complex common sub-expression in its own cell and refer to that cell.
EDIT#1:
As an alternative, you can use a Named Formula for the sub-expression:
Named Formula
So here is another way of finding the code in Excel:
Here is the formula in Cell B1 which needs to be confirmed by pressing Ctrl+Shift+Enter, then drag it down to apply across board:
{=FILTERXML("<data><a>"&SUBSTITUTE(MID(A1,LARGE(IF(MID(A1,ROW($A$1:INDEX($A:$A,LEN(A1))),1)="-",ROW($A$1:INDEX($A:$A,LEN(A1)))),1)+1,LEN(A1)),".","</a><a>")&"</a></data>","/data/a[1]")}
For the logic behind this formula you may give a read to this article: Extract Words with FILTERXML.
Cheers :)
Ps. it seems that GoogleSheet has out performed Excel in some area already.

Google Sheets Pattern Matching/RegEx for COUNTIF

The documentation for pattern matching for Google Sheets has not been helpful. I've been reading and searching for a while now and can't find this particular issue. Maybe I'm having a hard time finding the correct terms to search for but here is the problem:
I have several numbers (part numbers) that follow this format: ##-####
Categories can be defined by the part numbers, i.e. 50-03## would be one product category, and the remaining 2 digits are specific for a model.
I've been trying to run this:
=countif(E9:E13,"50-03[123][012]*")
(E9:E13 contains the part number formatted as text. If I format it any other way, the values show up screwed up because Google Sheets thinks I'm writing a date or trying to do arithmetic.)
This returns 0 every time, unless I were to change to:
=countif(E9:E13,"50-03*")
So it seems like wildcards work, but pattern matching does not?
As you identified and Wiktor mentioned COUNTIF only supports wildcards.
There are many ways to do what you want though, to name but 2
=ArrayFormula(SUM(--REGEXMATCH(E9:E13, "50-03[123][012]*")))
=COUNTA(FILTER(E9:E13, REGEXMATCH(E9:E13, "50-03[123][012]*")))
This is a really big hammer for a problem like yours, but you can use QUERY to do something like this:
=QUERY(E9:E13, "select count(E) where E matches '50-03[123][012]' label count(E) ''")
The label bit is to prevent QUERY from adding an automatic header to the count() column.
The nice thing about this approach is that you can pull in other columns, too. Say that over in column H, you have a number of orders for each part. Then, you can take two cells and show both the count of parts and the sum of orders:
=QUERY(E9:H13, "select count(E), sum(H) where E matches '50-03[123][012]' label count(E) '', sum(H) ''")
I routinely find this question on $searchEngine and fail to notice that I linked another question with a similar problem and other relevant answers.

Conditional Vlook up without using VBA

I want to convert an input to desired output. Kindly help.
In the output - the columns value should start from most recent (year)
Please click this to see data
Unfortunately VLOOKUP is not able to fulfill that ask. However the INDEX-function can.
Here is a good read on how to use it:
http://fiveminutelessons.com/learn-microsoft-excel/use-index-lookup-multiple-values-list
This will work for you spreedsheet, if your input table starts at A1 without a header and your output table starts at H3 with the first ID.
You get this by copy&pasting the first column of your input table to column H and then remove duplicates.
{=IF(ISERROR(INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3)),"",
INDEX($A$1:$C$7;SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3))}
Let's look at the formula step by step:
The curly brackets tell excel that this is an array formula, the interesting part for you is: when you've inserted the formula (without curly brackets) press shift+ctrl+enter, excel will then know that this is an array formula.
'error at formula?, then blank, else formula
=IF(ISERROR(....),"",...)
When you autofill this formula you probably dont know how many instances of your lookup variable are. So when you put this formula in 4 cells, but there are only 3 entries, this bit will keep the cell blank instead of giving an error.
INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3))
$A$1:$C$7 is your data matrix. Your IDs (in your case 125 and 501) are to be found in $A$1:$A$7. ROW(1:1) is the absolute(!) rowID, 3 the absolute(!) column id. So when you move your input table those values have to be changed.
What exactly SMALL and INDEX do are well described in the link above. (Or at least better than I could.)
Hope that clarified some parts,
Tom