Extract multiple substrings of numbers of a specific length from string in Google Sheets - regex

I need to extract only the 8-digit numbers from a string in Google Sheets.
I've tried SPLIT and REGEXREPLACE, but I can't find a way to get only the numbers of that length; I always get all the numbers in the string.
For example I'm using
=SPLIT(lower(N2),"qwertyuiopasdfghjklzxcvbnm`-=[]\;' ,./!:##$%^&*()")
but I get all the numbers, while I only need the 8-digit numbers.
This may be a test value:
00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471
I only need to extract "62345471" and nothing else!
Could you please help me out?
Many thanks!

Please use the following formula for a single cell.
Drag it down for more cells.
=INDEX(TRANSPOSE(QUERY(TRANSPOSE(IF(LEN(SPLIT(REGEXREPLACE(A2&" ","\D+"," ")," "))=8,
SPLIT(REGEXREPLACE(A2&" ","\D+"," ")," "),"")),"where Col1 is not null ",0)))
Functions used:
QUERY
INDEX
TRANSPOSE
IF
LEN
SPLIT
REGEXREPLACE
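As a rough cross-check of the logic in that formula, here is a minimal Python sketch. Python's re module stands in for REGEXREPLACE, and the function name eight_digit_runs is made up for illustration. It replaces every run of non-digits with a space, splits on spaces, and keeps only the pieces that are exactly 8 characters long.

```python
import re

def eight_digit_runs(text, length=8):
    # Mirrors REGEXREPLACE(A2&" ", "\D+", " "): collapse each non-digit run to one space.
    digits_only = re.sub(r"\D+", " ", text + " ")
    # Mirrors SPLIT(..., " ") followed by the LEN(...)=8 test.
    return [run for run in digits_only.split() if len(run) == length]

sample = "00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471"
print(eight_digit_runs(sample))  # ['62345471']
```

Because letters are treated as separators, this approach would also pick up an 8-digit run that is glued to letters (for example "inv62345471").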

If you only need to do this for one cell (or you have your heart set on dragging the formula down into individual cells), use the following formula:
=REGEXEXTRACT(" "&N2&" ","\s(\d{8})\s")
However, I suspect you want to process the eight-digit number out of all cells running N2:N. If that is the case, clear whatever will be your results column (including any headers) and place the following in the top cell of that otherwise cleared results column:
=ArrayFormula({"Your Header"; IF(N2:N="",,IFERROR(REGEXEXTRACT(" "&N2:N&" ","\s(\d{8})\s")))})
Replace the header text Your Header with whatever you want your actual header text to be. The formula will show that header text and will return all results for all rows where N2:N is not null. Where no eight-digit number is found, null will be returned.
By prepending and appending a space to the N2:N raw strings before processing, spaces before and after string components can be used to determine where only eight digits exist together (as opposed to eight digits within a longer string of digits).
The only assumption here is that there are, in fact, spaces between string components. I did not assume that the eight-digit number will always be in a certain position (e.g., first, last) within the string.
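The padding trick can be sketched in Python as follows. This is only an illustration under the assumption that Python's re behaves like Sheets' regex engine for this simple pattern; the function name first_standalone_number is hypothetical.

```python
import re

def first_standalone_number(text, length=8):
    # Pad with spaces so a number at the very start or end still has whitespace on both sides.
    padded = " " + text + " "
    # Whitespace, exactly `length` digits, whitespace: longer digit runs cannot match.
    match = re.search(r"\s(\d{%d})\s" % length, padded)
    return match.group(1) if match else None

sample = "00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471"
print(first_standalone_number(sample))  # 62345471
```

A 10-digit token such as 1312548896 is skipped because the character after its first eight digits is another digit, not whitespace.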

Try this, take a look at Example sheet
=FILTER(TRANSPOSE(SPLIT(B2," ")),LEN(TRANSPOSE(SPLIT(B2," ")))=8)
Or this to get them all.
=JOIN(" ,",FILTER(TRANSPOSE(SPLIT(B2," ")),LEN(TRANSPOSE(SPLIT(B2," ")))=8))
Explanation
SPLIT with the delimiter set to " " (space), then TRANSPOSE, then FILTER TRANSPOSE(SPLIT(B2," ")) with condition1 set to LEN(TRANSPOSE(SPLIT(B2," ")))=8.
JOIN the output column with " ," to get all occurrences of numbers with a length of 8.
Note: to get the numbers with a length of N, just replace 8 in the FILTER function with a cell reference.
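For comparison, the same split-and-filter idea looks like this in Python (a sketch only; tokens_of_length is a made-up name). Note that, like the FILTER formula, it tests length alone, so an 8-letter word would pass too.

```python
def tokens_of_length(text, length=8):
    # Split on single spaces, then keep tokens whose length matches.
    return [token for token in text.split(" ") if len(token) == length]

sample = "00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471"
print(tokens_of_length(sample))                # ['62345471']
print(tokens_of_length("ABCDEFGH 62345471"))   # ['ABCDEFGH', '62345471']
```

If only numbers should be returned, an extra digits-only check on each token (str.isdigit() in Python) would tighten the filter.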

Using this on a cell worked just fine for me:
(cell_with_data)=REGEXEXTRACT(A1,"[0-9]{8}$")
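A quick Python sketch of what that pattern does, assuming Python's re matches the same way here (last_eight_digits is a hypothetical name). It works when the 8-digit number is the last thing in the string, but it also returns the tail of a longer number.

```python
import re

def last_eight_digits(text):
    # "[0-9]{8}$" only requires that the final eight characters are digits.
    match = re.search(r"[0-9]{8}$", text)
    return match.group(0) if match else None

print(last_eight_digits("ACT inv 62345471"))   # 62345471
print(last_eight_digits("ACT 1312548896"))     # 12548896
print(last_eight_digits("62345471 ACT"))       # None
```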

Related

Pull initials from name field adding periods after each initial

As the title says, given a column containing an arbitrary number of words of arbitrary length, I want a single ArrayFormula that gets the first letters of all words in that column.
I have tried two methods, seen in the sample sheet.
Using SPLIT and ARRAYFORMULA can get it one cell but cannot extend down the column.
Using 2 REGEXEXTRACT, can get for first 2 initials and extend down
But is it possible to get for an arbitrary number of words for the whole column using ArrayFormula?
Is it possible to use REGEXEXTRACT to return the first letters of many words?
I would also like to place a ". " after each initial, for example to turn Ed Williams into E. W.
In addition to player0's solution, this might also work
=ArrayFormula(iferror(if(len(A:A), regexreplace(substitute(A:A&".", " ", ". "), "[^A-Z.\s]",),)))
=ARRAYFORMULA(TRANSPOSE(QUERY(TRANSPOSE(IF(LEN(A1:A),
IFERROR(REGEXEXTRACT(SPLIT(A1:A, " "), "."))&".", )),,999^99)))
=ARRAYFORMULA(UPPER(REGEXREPLACE(A1:A6,"(\w)\S*\s*","$1. ")))
Capture the first letter (\w), followed by any non-space characters (\S*) and any space characters (\s*).
Replace each match with just the capture group ($1) followed by a period and a space.
Convert it to UPPER case, if needed.
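The same replacement can be tried out in Python, where the capture group is written \1 instead of $1. This is an illustrative sketch, not the Sheets formula itself; the function name initials is made up.

```python
import re

def initials(name):
    # Keep the first word character of each word, drop the rest of the word
    # and any following spaces, and append ". " after each kept letter.
    return re.sub(r"(\w)\S*\s*", r"\1. ", name).upper()

print(repr(initials("Ed Williams")))  # 'E. W. '
```

The result carries a trailing space after the last period, which can be trimmed if it matters.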

Power BI - extract number from text string based on conditions

I have done extensive searching and I don't believe this is a repeat, but it is definitely an extension of previous questions. I am attempting to extract numbers from a text string within a Power BI function. I have successfully extracted the numbers from the string into a value using the below:
Text.Combine(
List.RemoveNulls(
List.Transform(
Text.ToList([string_col]),
each if Value.Is(Value.FromText(_), type number)
then _ else null)
)
)
Using this code works great when the number I am interested in is the only number in the string, for example:
"Bring on the 1234567 comments" results in 1234567
However, I can't resolve extracting my number when multiple different numbers occur in the string, for example:
"Bring on on the 1234567 comments with 50 telling me this is a repeat" results in 123456750
What I need to do is pull only the number within the string that meets my conditions (just one condition in my case). For my particular issue, the number I need to extract will always be the only 7-digit number in the string, so I feel like this should have a more straightforward answer.
Is there a way to extract only the 7 digit number using my provided function or something similar? If I am way off base, can someone please set me on the proper path?
As always, the community's help is greatly appreciated.
Diedrich
First, you could use the Text.Select function to extract all numbers.
FirstStep =
Table.AddColumn(Source, "MyNumberColumn", each Text.Select([MyStringColumn], {"0".."9"}))
I found this solution on this blog post from Erik Svensen:
https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi
For your specific requirement, you may need to type the new column as text:
FirstStep =
Table.AddColumn(
Source,
"MyTempNumberColumn",
each Text.Select([MyStringColumn], {"0".."9"}),
type text)
From there, depending on the length of the result, you can test whether each seven-character sequence of the digits-only text appears in the original string, repeating the test as many times as needed until you reach the end of the digits-only text.
SecondStep =
Table.AddColumn(
FirstStep,
"My7numbers",
// Text.Range raises an error if fewer than 7 characters remain from the given offset
each if Text.Length([MyTempNumberColumn]) = 7
then [MyTempNumberColumn]
else if
Text.Contains([MyStringColumn],
Text.Range([MyTempNumberColumn], 0, 7))
then
Text.Range([MyTempNumberColumn], 0, 7)
else if
Text.Contains([MyStringColumn],
Text.Range([MyTempNumberColumn], 1, 7))
then
Text.Range([MyTempNumberColumn], 1, 7)
// add further offsets (2, 3, ...) here as needed
else null)
Depending on how many numbers you can get, it might be worth trying to use List.Generate in a function that would give a list of every 7-digit sequence from [MyTempNumberColumn], whatever its length.
https://learn.microsoft.com/en-us/powerquery-m/list-generate
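To make the sliding-window idea concrete, here is a small Python sketch of the logic (not Power Query M; the function name seven_digit_candidates is hypothetical). It builds the digits-only text, lists every 7-character window, and keeps the windows that appear contiguously in the original string.

```python
def seven_digit_candidates(text, width=7):
    # Equivalent in spirit to Text.Select(text, {"0".."9"}).
    digits = "".join(ch for ch in text if ch in "0123456789")
    # Every contiguous window of `width` digits; empty if there are too few digits.
    windows = [digits[i:i + width] for i in range(len(digits) - width + 1)]
    # A window counts only if it also appears unbroken in the original text.
    return [window for window in windows if window in text]

sample = "Bring on on the 1234567 comments with 50 telling me this is a repeat"
print(seven_digit_candidates(sample))  # ['1234567']
```

One caveat of this approach: a number longer than seven digits would contribute several windows, so it relies on the stated guarantee that the target is the only 7-digit number in the string.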

Google Sheets ArrayFormula to get INITIALS of arbitrary length name

Sample sheet.
As the title says, given a column containing an arbitrary number of words of arbitrary length, I want a single ArrayFormula that gets the first letters of all words in that column.
I have tried two methods, seen in sample sheet.
1) Using SPLIT and ARRAYFORMULA, can get it one cell but cannot extend down column.
2) Using 2 REGEXEXTRACT, can get for first 2 initials and extend down
But is it possible to get the initials for an arbitrary number of words for the whole column using ArrayFormula?
Is it possible to use REGEXEXTRACT to return the first letters of many words?
This replaces every word with the captured first letter
=ARRAYFORMULA(UPPER(REGEXREPLACE(A1:A6,"(\w)\S*\s?","$1")))

Count specific character in a cell in openoffice calc

I have a cell with some text content.
For example: "Red, shirt, size,"
I need to count how many times the comma is used in this cell.
The result should be "3"
Any ideas?
You can use the following formula: LEN(Cell)-LEN(SUBSTITUTE(Cell;"YourCharacter";""))
In your case, the formula would be: LEN(Cell)-LEN(SUBSTITUTE(Cell;",";"")).
LEN(Cell) counts the number of characters in your cell.
LEN(SUBSTITUTE(Cell;",";"")) counts the number of characters in your cell without the character ",". Subtracting the second value from the first gives the number of occurrences of your character.
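The arithmetic behind the formula can be checked with a short Python sketch (count_char is a made-up helper name): remove the character, then compare lengths.

```python
def count_char(text, char):
    # Length before minus length after removing every occurrence of `char`.
    return len(text) - len(text.replace(char, ""))

print(count_char("Red, shirt, size,", ","))  # 3
```

This works as written for a single character; for a longer substring the difference would need to be divided by the substring's length.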
You can use LEN() function as below
=LEN(cell)-LEN(SUBSTITUTE(cell;",";""))

SQLite: How to split a column

I have a column containing two names, which I'd like to extract into two separate columns surname1 and surname2 (I don't need the name nor the initial letter (e.g. N.)).
An example of the content of that column is:
AwyeEaef2012 MS101 N.Lopez-O.Lorenzi.txt
Lopez and Lorenzi are the two names we are looking for in this row.
What is good about my situation is that the first name comes always after the first dot (.) and ends just before the dash (-) and the second name comes just after second dot and ends just before the third dot and txt (.txt).
I know how to write a regex and using LIKE check if that column contains some specific surname but not the opposite way- how to read surnames and write them into two new columns.
Several rows from that column look like below:
WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt
AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt
THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt
So the pattern is as I mentioned *.Name1-[A-Z].Name2.txt
Where * is max 30 characters of capital and small letters and numbers
It could be approached in this manner: in other words, we need to divide the value into substrings separated by dots. The first substring is waste; the second, without its last two characters (a dash and a capital letter, e.g. -O), is the first name; the third substring is the second name; and the fourth is waste again (the former file format).
I'd like to have an output of three columns:
initialColumn, firstName, secondName
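One possible way to get those three columns with SQLite's built-in substr and instr functions is sketched below, run through Python's sqlite3 module so it can be tried directly. The table name files and the intermediate names rest1 and rest2 are assumptions for illustration, and the query assumes the part before the first dot contains no other dots and that the first surname contains no dash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (initialColumn TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [
        ("WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt",),
        ("AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt",),
        ("THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt",),
    ],
)

query = """
WITH after_first_dot AS (
    -- Everything after the first dot, e.g. 'Lopez-O.Lorenzi.txt'
    SELECT initialColumn,
           substr(initialColumn, instr(initialColumn, '.') + 1) AS rest1
    FROM files
),
after_second_dot AS (
    -- First name ends before the dash; rest2 starts after the second dot
    SELECT initialColumn,
           substr(rest1, 1, instr(rest1, '-') - 1) AS firstName,
           substr(rest1, instr(rest1, '.') + 1) AS rest2
    FROM after_first_dot
)
SELECT initialColumn,
       firstName,
       substr(rest2, 1, instr(rest2, '.') - 1) AS secondName
FROM after_second_dot
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each CTE peels off one dot-delimited piece, which keeps the nested instr calls readable compared with a single expression.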
Here is a workaround that I wrote as a formula in Excel. I personally don't love it, but it might be useful for someone in the future.
=MID(A1;FIND(".";A1;1)+1;FIND(".";A1;FIND(".";A1;1)+1)-FIND(".";A1;1)-3)
I was surprised that Excel can manage processing ~0.5mln of records in the blink of an eye.