Power BI - extract number from text string based on conditions - powerbi

I have done extensive searching and I don't believe this is a repeat, but it is definitely an extension of previous questions. I am attempting to extract numbers from a text string within a Power BI function. I have successfully extracted the numbers from the string into a value using the below:
Text.Combine(
List.RemoveNulls(
List.Transform(
Text.ToList([string_col]),
each if Value.Is(Value.FromText(_), type number)
then _ else null)
)
)
Using this code works great when the number I am interested in is the only number in the string, for example:
"Bring on the 1234567 comments" results in 1234567
However, I can't resolve extracting my number when multiple different numbers occur in the string, for example:
"Bring on on the 1234567 comments with 50 telling me this is a repeat" results in 123456750
What I need to do is pull only the number within the string that meets certain conditions (just one condition in my case). For my particular issue, the number I need to extract will always be the only 7-digit number in the string, so I feel like this should have a more straightforward answer.
Is there a way to extract only the 7 digit number using my provided function or something similar? If I am way off base, can someone please set me on the proper path?
As always, the community's help is greatly appreciated.
Diedrich

First, you could use the Text.Select function to extract all numbers.
FirstStep =
Table.AddColumn(Source, "MyNumberColumn", each Text.Select([MyStringColumn], {"0".."9"}))
I found this solution on this blog post from Erik Svensen:
https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi
For your specific requirement, you may need to type the new column as text:
FirstStep =
Table.AddColumn(
Source,
"MyTempNumberColumn",
each Text.Select([MyStringColumn], {"0".."9"}),
type text)
From there, if the digits-only result is longer than seven characters, you can test whether each seven-character slice of it appears in the original string, moving the start position along until you reach the end of the digits-only text.
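SecondStep =
Table.AddColumn(
FirstStep,
"My7numbers",
// Use the digits as-is when there are exactly seven; otherwise try the
// seven-character slices starting at positions 0 and 1.
each if Text.Length([MyTempNumberColumn]) = 7
then [MyTempNumberColumn]
else if Text.Length([MyTempNumberColumn]) > 7
and Text.Contains([MyStringColumn], Text.Range([MyTempNumberColumn], 0, 7))
then Text.Range([MyTempNumberColumn], 0, 7)
else if Text.Length([MyTempNumberColumn]) > 7
and Text.Contains([MyStringColumn], Text.Range([MyTempNumberColumn], 1, 7))
then Text.Range([MyTempNumberColumn], 1, 7)
else null,
type text)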
Depending on how many numbers you can get, it might be worth using List.Generate in a function that returns a list of every 7-digit sequence from [MyTempNumberColumn], whatever its length.
https://learn.microsoft.com/en-us/powerquery-m/list-generate
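The windowing idea above can be illustrated outside Power Query. The following Python is only a sketch of the logic, not M code: the function name seven_digit_candidates and the width parameter are made up for this illustration. It keeps the digits only (like Text.Select), lists every 7-character window of that digit string, and keeps the windows that also occur in the original text.

```python
def seven_digit_candidates(text, width=7):
    """Return every `width`-long window of the digits-only text that also
    appears verbatim in the original string (hypothetical helper)."""
    # Equivalent in spirit to Text.Select(text, {"0".."9"}).
    digits = "".join(ch for ch in text if ch in "0123456789")
    # Every contiguous window of `width` characters in the digit string.
    windows = [digits[i:i + width] for i in range(len(digits) - width + 1)]
    # Keep only windows that are contiguous in the original text too.
    return [w for w in windows if w in text]


example = "Bring on on the 1234567 comments with 50 telling me this is a repeat"
print(seven_digit_candidates(example))
```

Note that a window taken from inside a longer run of digits (for example, an 8-digit number) would also be reported, so this approach relies on the question's guarantee that the 7-digit number is the only one of that length.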

Related

Extract multiple substrings of numbers of a specific length from string in Google Sheets

I'd need to split or extract only numbers made of 8 digits from a string in Google Sheets.
I've tried with SPLIT or REGEXREPLACE but I can't find a way to get only the numbers of that length, I only get all the numbers in the string!
For example I'm using
=SPLIT(lower(N2),"qwertyuiopasdfghjklzxcvbnm`-=[]\;' ,./!:##$%^&*()")
but I get all the numbers while I only need 8 digits numbers.
This may be a test value:
00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471
I only need to extract "62345471" and nothing else!
Could you please help me out?
Many thanks!
Please use the following formula for a single cell.
Drag it down for more cells.
=INDEX(TRANSPOSE(QUERY(TRANSPOSE(IF(LEN(SPLIT(REGEXREPLACE(A2&" ","\D+"," ")," "))=8,
SPLIT(REGEXREPLACE(A2&" ","\D+"," ")," "),"")),"where Col1 is not null ",0)))
Functions used:
QUERY
INDEX
TRANSPOSE
IF
LEN
SPLIT
REGEXREPLACE
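To make the logic of that formula easier to follow, here is a rough Python sketch of the same steps: replace every run of non-digits with a space, split on spaces, and keep the pieces whose length is 8. The function name numbers_of_length is hypothetical, and the tokens stay as text in this sketch.

```python
import re


def numbers_of_length(text, length=8):
    """Return the digit runs in `text` that are exactly `length` long."""
    # Like REGEXREPLACE(A2&" ","\D+"," "): non-digit runs become one space.
    spaced = re.sub(r"\D+", " ", text + " ")
    # Like SPLIT(..., " ") followed by the LEN(...)=8 test.
    return [token for token in spaced.split() if len(token) == length]


sample = "00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471"
print(numbers_of_length(sample))
```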
If you only need to do this for one cell (or you have your heart set on dragging the formula down into individual cells), use the following formula:
=REGEXEXTRACT(" "&N2&" ","\s(\d{8})\s")
However, I suspect you want to process the eight-digit number out of all cells running N2:N. If that is the case, clear whatever will be your results column (including any headers) and place the following in the top cell of that otherwise cleared results column:
=ArrayFormula({"Your Header"; IF(N2:N="",,IFERROR(REGEXEXTRACT(" "&N2:N&" ","\s(\d{8})\s")))})
Replace the header text Your Header with whatever you want your actual header text to be. The formula will show that header text and will return all results for all rows where N2:N is not null. Where no eight-digit number is found, null will be returned.
By prepending and appending a space to the N2:N raw strings before processing, spaces before and after string components can be used to determine where only eight digits exist together (as opposed to eight digits within a longer string of digits).
The only assumption here is that there are, in fact, spaces between string components. I did not assume that the eight-digit number will always be in a certain position (e.g., first, last) within the string.
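The padding trick can be checked with a small Python sketch. It uses the same pattern as the formula; the function name extract_spaced_number is made up for illustration, and Python's re module stands in for REGEXEXTRACT.

```python
import re


def extract_spaced_number(text, digits=8):
    """Return the first run of exactly `digits` digits that is surrounded
    by whitespace (or the string edges), else None."""
    # Pad with spaces so a number at the start or end still has a
    # whitespace character on both sides.
    padded = " " + text + " "
    match = re.search(r"\s(\d{%d})\s" % digits, padded)
    return match.group(1) if match else None


sample = "00150412632BBHBBLD 12458 32354 1312548896 ACT inv 62345471"
print(extract_spaced_number(sample))
```

Because the pattern requires whitespace on both sides, eight digits inside a longer run of digits are not matched.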
Try this, take a look at Example sheet
=FILTER(TRANSPOSE(SPLIT(B2," ")),LEN(TRANSPOSE(SPLIT(B2," ")))=8)
Or this to get them all.
=JOIN(" ,",FILTER(TRANSPOSE(SPLIT(B2," ")),LEN(TRANSPOSE(SPLIT(B2," ")))=8))
Explanation
SPLIT the text with the delimiter set to " " (a space), TRANSPOSE the result, and FILTER TRANSPOSE(SPLIT(B2," ")) with condition1 set to LEN(TRANSPOSE(SPLIT(B2," ")))=8.
JOIN the output column with " ," to get all occurrences of numbers with a length of 8.
Note: to get numbers of length N, replace 8 in the FILTER function with a cell reference.
Using this on a cell worked just fine for me:
(cell_with_data)=REGEXEXTRACT(A1,"[0-9]{8}$")
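One thing worth knowing about this last pattern is that the $ anchor ties the match to the end of the text. A short Python sketch (the function name trailing_digits is hypothetical, and Python's re stands in for REGEXEXTRACT) shows what that implies:

```python
import re


def trailing_digits(text, digits=8):
    """Return the last `digits` characters if they are all digits, else None."""
    match = re.search(r"[0-9]{%d}$" % digits, text)
    return match.group(0) if match else None


print(trailing_digits("1312548896 ACT inv 62345471"))  # number at the end
print(trailing_digits("inv 1312548896"))               # tail of a longer number
print(trailing_digits("62345471 paid"))                # number not at the end
```

So this form fits data where the eight-digit number is always the last thing in the cell, as in the test value; it would also return the last eight digits of a longer number.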

Regex in sqlite

I have been thinking about this a lot.
I want to create a table which contains a password.
The password should be at least 6 characters long and contain at least 2 digits.
My version was:
create table User (
passwort varchar(80) not null check (length(passwort) >= 6 and passwort like '%[0-9]%[0-9]%')
);
The problem with this approach is that the password has to contain the literal text [0-9] twice instead of two actual digits. Does anyone know how to get around that problem?
Thanks in advance.
How about .*?\d.*?\d.*?
This requires at least 2 digits, with zero or more characters (including further digits) before, between, and after them.
That said, I still recommend you split the work in 2 as per my comment, i.e.:
Check the length of the string.
Use the actual expression to check if the string contains 2 numbers.
You can use the following expression: ^(?=.{6,}).*?\d.*?\d.*?$. What it does is look ahead for a minimum of 6 characters and then check that the string contains at least 2 digits, which can be separated by 0 or more characters.
An example of the expression is available here.
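As a rough way to try the expression against SQLite, here is a Python sketch. SQLite's REGEXP operator only works once an application supplies a regexp() function, so this sketch registers one through Python's sqlite3 module. The helper names regexp and is_valid are made up for illustration, and running the check in a SELECT rather than in the table's CHECK constraint is an assumption of this sketch, not part of the answer above.

```python
import re
import sqlite3

# The expression from the answer: at least 6 characters, at least 2 digits.
PASSWORD_PATTERN = r"^(?=.{6,}).*?\d.*?\d.*?$"


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern`; returns 1 on a match, else 0."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
# `X REGEXP Y` is evaluated by SQLite as regexp(Y, X).
conn.create_function("regexp", 2, regexp)


def is_valid(password):
    row = conn.execute("SELECT ? REGEXP ?", (password, PASSWORD_PATTERN)).fetchone()
    return bool(row[0])


for candidate in ("abc12x", "ab12", "abcdef1"):
    print(candidate, is_valid(candidate))
```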

REGEXEXTRACT - Error when trying to get a phone number from string

I am wondering if someone can help me get this formula right in Google Sheets.
After a 2-week event I get a spreadsheet with more than 2000 rows of comments, which include phone numbers here and there. I am trying to extract the phone numbers from those strings.
example string: call at 228-219-4241 after
formula: =IFERROR(REGEXEXTRACT(V133,"^(?(?:\d{3}))?[-.]?(?:\d{3})[-.]?(?:\d{4})$"),"NOT FOUND!!!")
and I get "NOT FOUND!!!".
It only works when the cell contains just the number.
Cheers.
Your regex is too complicated, and you're restricting it to a rule that says the number is the first thing in the string. Change it to this:
=iferror(regexextract(A1,"\d{3}\-\d{3}\-\d{4}"))
In your example the '^' sign means the beginning of the line and '$' means the end, so you're saying the first thing in your string will always be 3 digits and the last will always be 4.
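The effect of the anchors can be seen in a short Python sketch. The anchored pattern below is a simplified stand-in for the one in the question (an assumption, since the original is hard to read as posted), the unanchored one follows the answer without the optional backslashes before the hyphens, and extract_phone is a hypothetical helper name.

```python
import re

# Simplified stand-in for the question's anchored pattern (assumption).
ANCHORED = r"^\d{3}[-.]?\d{3}[-.]?\d{4}$"
# The answer's pattern, written without the optional backslashes.
UNANCHORED = r"\d{3}-\d{3}-\d{4}"


def extract_phone(text, pattern):
    """Return the first match of `pattern` in `text`, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


comment = "call at 228-219-4241 after"
print(extract_phone(comment, ANCHORED))         # anchors reject surrounding text
print(extract_phone("228-219-4241", ANCHORED))  # works for "just the number"
print(extract_phone(comment, UNANCHORED))       # finds the number in a sentence
```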

extract number from string in Oracle

I am trying to extract a specific text from an Outlook subject line. This is required to calculate turn around time for each order entered in SAP. I have a subject line as below
SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ
My final output should be like this: 3032641559
I have been able to do this in MS excel with the formulas like this
=IFERROR(INT(MID([#[Normalized_Subject]],SEARCH(30,[#[Normalized_Subject]]),10)),"Not Found")
In the above formula, [#[Normalized_Subject]] is the name of the column in which the SO number exists. I have been asked to do this in Oracle, but I am very new to it. Your help on this would be greatly appreciated.
Note: in the above subject line the number 30 is common in every subject line.
The last parameter of REGEXP_SUBSTR() indicates the sub-expression you want to pick. In this case you can't just match 30 then some more numbers as the second set of digits might have a 30. So, it's safer to match the following, where x are more digits.
SO# 30xxxxxx
As a regular expression this becomes:
SO#\s30\d+
where \s indicates a space, \d indicates a numeric character, and the + means that you want to match as many digits as there are. To return only the number, we can use the sub-expression argument; in order to do that you need to have sub-expressions, i.e. create groups where you want to split the string:
(SO#\s)(30\d+)
Put this in the function call and you have it:
regexp_substr(str, '(SO#\s)(30\d+)', 1, 1, 'i', 2)
SQL Fiddle
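The grouping idea is not specific to Oracle. As a rough cross-check, the same pattern can be tried in Python, where group(2) plays the role of the final argument 2 and re.IGNORECASE plays the role of the 'i' flag. The function name extract_so_number is made up for this sketch.

```python
import re


def extract_so_number(subject):
    """Return the digits that follow 'SO# ' and start with 30, else None."""
    match = re.search(r"(SO#\s)(30\d+)", subject, re.IGNORECASE)
    # Group 1 is the 'SO# ' prefix; group 2 is the order number itself.
    return match.group(2) if match else None


subject = "SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ"
print(extract_so_number(subject))
```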

Regular Expressions, group elements in a list

I'm parsing an html page - converting it to a full string - and what I get is a list that I would be able to split into sections. Here's what I mean:
chemestry  // subject name
21/10 // date of the exam
6+ // value of the first mark
25/12 // date of the second exam
8 // value of the second mark
physics // subject name [and so on...]
27/11
10
The app I'm building will display a table view representing each subject; if a cell is clicked, the user will see the list of marks that he got in that specific subject.
What I need to do is let the "compiler" know that a string represents the subject name, two integers separated by a "/" represent the mark's date, and a single integer (optionally followed by a + or -) is the mark's value. Also, each time the compiler encounters a new string, it means that a new and different subject is being parsed.
How can I achieve this? Thank you very much for your help in advance
Try this regex:
[a-z ]+\s*(?:[^a-z]+?\d{2}/\d{2}[^a-z]+?\d+½?[+-]?)+
DEMO: https://regex101.com/r/eM3mD4/2
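As a sketch of how that expression could be put to work, the Python below uses it to cut the text into one block per subject, then applies a second, simpler pattern to pull out the date and mark pairs. The second pattern, the function name parse_marks, and the sample text (the list from the question without the // annotations) are assumptions made for this illustration.

```python
import re

# The expression from the answer: a subject name followed by date/mark pairs.
SUBJECT_BLOCK = re.compile(r"[a-z ]+\s*(?:[^a-z]+?\d{2}/\d{2}[^a-z]+?\d+½?[+-]?)+")
# Assumed helper pattern: a dd/mm date, whitespace, then the mark.
DATE_AND_MARK = re.compile(r"(\d{2}/\d{2})\s+(\d+½?[+-]?)")


def parse_marks(text):
    """Map each subject name to a list of (date, mark) tuples."""
    result = {}
    for block in SUBJECT_BLOCK.findall(text):
        # The subject is the leading run of lowercase letters and spaces.
        subject = re.match(r"[a-z ]+", block).group(0).strip()
        result[subject] = DATE_AND_MARK.findall(block)
    return result


sample = "chemestry\n21/10\n6+\n25/12\n8\nphysics\n27/11\n10"
print(parse_marks(sample))
```

Since the answer's pattern only matches lowercase letters in subject names, real input may need the character classes widened or a case-insensitive flag.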