Keep only numbers in a string / remove all non-numbers - powerbi

I have a string column where I only need the digits from each string, e.g.
A-123 -> 123
456 -> 456
7-X89 -> 789
How can this be done in PowerQuery?

Add column. In custom column formula type this one-liner:
= Text.Select( [Column], {"0".."9"} )
where [Column] is a text column containing a mix of digits and other characters. It keeps the digits only. The new column is still a text column, so you have to change its type to a number afterwards.
Edit: if you also need to keep dots and minus signs:
= Text.Select( [Column], {"0".."9", "-", "."} )
Alternatively, you can transform the existing column:
= Table.TransformColumns( #"PreviousStepName" , {{"Column", each Text.Select( _ , {"0".."9","-","."} ) }} )
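To check the expected results outside Power Query, the same filtering idea can be sketched in Python. This is only an illustration of keeping the characters that appear in an allowed set, which is what Text.Select is used for above; the function name keep_chars is hypothetical.

```python
import string


def keep_chars(text: str, allowed: str = string.digits) -> str:
    """Return only the characters of `text` that appear in `allowed`."""
    return "".join(ch for ch in text if ch in allowed)


# The three sample values from the question.
for sample in ["A-123", "456", "7-X89"]:
    print(sample, "->", keep_chars(sample))

# Variant that also keeps minus signs and dots.
print(keep_chars("-1.5kg", string.digits + "-."))
```

Note that once "-" is in the allowed set, a value like A-123 becomes -123 rather than 123, so only allow it if the minus sign is really part of the number.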

An alternative solution is to split each value on every digit and remove the blanks from the resulting list. What remains is the list of non-digit fragments.
These fragments can then be used as the list of delimiters for Splitter.SplitTextByEachDelimiter, which splits the original text again; combining the resulting pieces gives the final result.
Explanation: Splitter.SplitTextByEachDelimiter first splits on the first delimiter in the list, then on the second and so on. Note that this function creates a function that must be called with the original string as parameter, so like S.S(delimiters)(string).
Example code:
let
Source = Table1,
NumbersOnly = Table.TransformColumns(Source,{{"String", (string) => Text.Combine(Splitter.SplitTextByEachDelimiter(List.Select(Text.SplitAny(string,"0123456789"), each _ <> ""))(string))}})
in
NumbersOnly
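The two-pass idea can be sketched in Python to make the intermediate lists visible. This is a rough model, not the M implementation: non_digit_fragments stands in for Text.SplitAny plus List.Select, and split_by_each_delimiter is a hypothetical helper that imitates the sequential behaviour described for Splitter.SplitTextByEachDelimiter.

```python
import re


def non_digit_fragments(text: str) -> list[str]:
    """Split on every digit and drop the empty pieces."""
    return [piece for piece in re.split(r"[0-9]", text) if piece != ""]


def split_by_each_delimiter(text: str, delimiters: list[str]) -> list[str]:
    """Split on the first delimiter, then split the rest on the second, and so on."""
    parts = []
    rest = text
    for delimiter in delimiters:
        head, _, rest = rest.partition(delimiter)
        parts.append(head)
    parts.append(rest)
    return parts


def numbers_only(text: str) -> str:
    """Use the non-digit fragments as delimiters and join what is left."""
    return "".join(split_by_each_delimiter(text, non_digit_fragments(text)))


print(non_digit_fragments("7-X89"))  # the delimiters found in the text
print(numbers_only("7-X89"))
```

Because the delimiters are taken from the text itself, in order, each one is guaranteed to be found when its turn comes.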

First, create a custom function in PowerQuery using New Query - From Other Sources -> Blank Query. Open the Advanced Editor and paste the following code:
(source) =>
let
NumbersOnly = (char) => if Character.ToNumber(char) >=48 and Character.ToNumber(char) < 58 then char else "",
Len = Text.Length(source),
Acc = List.Accumulate(
List.Generate( () => 0, each _ < Len, each _ + 1),
"",
(acc, index) => acc& NumbersOnly(Text.At(source, index))
),
AsNumber = Number.FromText(Acc)
in
AsNumber
Name this query NumbersOnly.
Now in your main query, add another calculated column where you call this NumbersOnly function with the source column, e.g.:
let
Source = Table.FromRecords({[text="A-123"], [text="456"], [text="7-X89"]}),
Result = Table.AddColumn(Source, "Values", each NumbersOnly([text]), Int64.Type)
in
Result

Related

power bi DAX hierarchical table concatenation of names

I have been stuck on a problem for two days and can't solve it, so I've come here to ask for help.
I have this bit of DAX that takes the path of a hierarchical table (integers) and returns the names of the first two items in the path.
The names I use:
'HIERARCHY': the hierarchical table with names, id, path, nbrItems, string
mytable, addedcolumn1, addedcolumn2: the new tables used to emulate the for loop
DisplayPath =
var __Path =PATH(ParentChild[id], ParentChild[parent_id])
var __P1 = PATHITEM(__Path,1)
var __P2 = PATHITEM(__Path,2)
var l1 = LOOKUPVALUE(ParentChild[Place],ParentChild[id],VALUE(__P1))
var l2a = LOOKUPVALUE(ParentChild[Place],ParentChild[id],VALUE(__P2))
var l2 = if(ISBLANK(l2a), "", " -> " & l2a)
return CONCATENATE(l1,l2)
My problem is that I don't know the number of items in my path; it can range from 0 to, I guess, 15.
I've tried a few things but can't figure out a solution.
First I added a new column called nbrItems, which calculates the number of items in the path.
The two columns:
Then I added this bit of code, which emulates a for loop over the number of items in the path. Inside it I'd like to:
get the name of each item
concatenate the names into one string that I can return
string =
var n = 'HIERARCHY'[nbrItems]
var mytable = GENERATESERIES(1, n)
var addedcolumn1 = ADDCOLUMNS(mytable, "nom", /* missing part: get name */)
var addedcolumn2 = ADDCOLUMNS(addedcolumn1, "string", /* missing part: concatenate previous concatenated and new name */)
var mymax = MAXX(addedcolumn2, [Value])
RETURN MAXX(FILTER(addedcolumn2, [Value] = mymax), [string])
Full table:
Thanks for your help in advance!
OK, so after some research and a lot of trial and error, I came up with a nice and simple solution.
The original problem was that I had a hierarchical table, but with all the data in the same table,
like so
What I did was add a new "parent" column with this DAX:
parent =
var a = 'HIERARCHY'[id_parent]
var b = CALCULATE(MIN('HIERARCHY'[libelle]), FILTER(ALL('HIERARCHY'), 'HIERARCHY'[id_h] = a))
RETURN b
This gets the parent name from id_parent (ref. screen).
Then I could just use the PATH function, not on the ids but on the names, like so:
path = PATH('HIERARCHY'[libelle], 'HIERARCHY'[parent])
This made the problem easy, because I no longer needed to replace the ids with their names afterwards.
Finally, to make it look nice, I used a substitution to replace the pipes:
formated_path = SUBSTITUTE('HIERARCHY'[path], "|", " -> ")
final result
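The intended result can be cross-checked with a small Python sketch. It walks the id chain rather than the names, so it illustrates the output and is not a model of PATH itself. The sample rows are invented for illustration; only the column names id_h, id_parent and libelle come from the post. The name-based PATH approach also assumes that every libelle is unique, since the names then act as keys.

```python
def display_path(rows, node_id, separator=" -> "):
    """Walk from a node up to its root and join the names from root to node."""
    by_id = {row["id_h"]: row for row in rows}
    names = []
    current = node_id
    while current is not None:
        row = by_id[current]
        names.append(row["libelle"])
        current = row["id_parent"]
    return separator.join(reversed(names))


# Hypothetical hierarchy: World > Europe > France, and World > Asia.
hierarchy = [
    {"id_h": 1, "id_parent": None, "libelle": "World"},
    {"id_h": 2, "id_parent": 1, "libelle": "Europe"},
    {"id_h": 3, "id_parent": 2, "libelle": "France"},
    {"id_h": 4, "id_parent": 1, "libelle": "Asia"},
]

for row in hierarchy:
    print(display_path(hierarchy, row["id_h"]))
```

The loop handles any depth, which is the part the fixed PATHITEM(…,1) and PATHITEM(…,2) version in the question could not do.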

generate a one-column table that contains hundreds of different categories using M or DAX

I need to split my products into a total of 120 predefined price clusters/buckets. These clusters can overlap and look somewhat like that:
As I don't want to write down all of these strings manually: is there a convenient way to generate them in M or DAX directly with a bit of code?
Thanks in advance!
Dave
With M (Power Query) you can create a function. Open the query editor, right-click and create a blank query. Create the function (ignore the warning) and name it RowGenerator.
Open the Advanced Editor and paste the following code:
let
Bron = (base as number, start as number, end as number) => let
Bron = Table.FromList(List.Generate(() => start, each _ <= end, each _ + 1), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Aangepaste kolom toegevoegd" = Table.AddColumn(Bron, "Aangepast", each Number.ToText(base) & " - " & Number.ToText([Column1]))
in
#"Aangepaste kolom toegevoegd"
in
Bron
This function creates a table where base is the first number of each bucket and start and end define the range of the second number.
Add another blank query, open the Advanced Editor and paste:
let
Bron = List.Generate(() => 0, each _ < 5, each _ + 1),
#"Geconverteerd naar tabel" = Table.FromList(Bron, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Aangeroepen aangepaste functie" = Table.AddColumn(#"Geconverteerd naar tabel", "test", each RowGenerator(_[Column1], _[Column1] + 1, 5)),
#"test uitgevouwen" = Table.ExpandTableColumn(#"Aangeroepen aangepaste functie", "test", {"Column1", "Aangepast"}, {"Column1.1", "Price Cluster"}),
#"Kolommen verwijderd" = Table.RemoveColumns(#"test uitgevouwen",{"Column1", "Column1.1"})
in
#"Kolommen verwijderd"
This first creates a list of 5 rows, then calls the previously created function for each row; the last step expands the nested tables and removes the columns that are no longer needed.
Enjoy:
You can also create these buckets in DAX (New Table):
Table = SELECTCOLUMNS(
GENERATE(SELECTCOLUMNS(GENERATESERIES(0,10,1),"FirstPart",[Value]), SELECTCOLUMNS(GENERATESERIES(0,10,1),"SecondPart",[Value]))
,"Bucket", [FirstPart] & " - " & [SecondPart]
)
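A variant that uses TOPN so that each FirstPart value n is paired only with the n smallest SecondPart values:
Table = SELECTCOLUMNS(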
GENERATE(SELECTCOLUMNS(GENERATESERIES(0,9,1),"FirstPart",[Value]), TOPN([FirstPart], SELECTCOLUMNS(GENERATESERIES(1,9,1),"SecondPart",[Value]), [SecondPart],ASC))
,"Bucket", [FirstPart] & " - " & [SecondPart]
)
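The bucket pattern produced by the M answer (every lower bound paired with each larger upper bound) can be sketched in Python to check the row count before building it in Power BI. The function name price_clusters is hypothetical, and whether this pattern matches the 120 clusters in the question is an assumption, since the original list is not shown.

```python
def price_clusters(upper: int) -> list[str]:
    """Build 'low - high' labels for every pair with 0 <= low < high <= upper."""
    return [
        f"{low} - {high}"
        for low in range(upper)
        for high in range(low + 1, upper + 1)
    ]


small = price_clusters(5)   # same range as the M example: bases 0..4, end 5
print(len(small), small[:6])

print(len(price_clusters(15)))  # number of labels when the upper bound is 15
```

The number of labels is upper * (upper + 1) / 2, so an upper bound of 15 yields exactly 120 labels.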

EXPAND MULTIPLE COLUMNS POWER BI

I've been struggling with this:
My table shows 3 records but when expanding there are like 100 columns. I used this code:
#"Expanded Data" = Table.ExpandTableColumn(#"Source", "Document", List.Union(List.Transform(#"Source"[Document]), each Table.ColumnNames(_))),
but it's not working. How can I expand all columns simultaneously? Also, those columns contain even more nested data: for example, after I expand the first time, the new columns have more records inside.
What could I do? Thanks in advance!
Try this ExpandAllRecords function - it recursively expands every Record-type column:
https://gist.github.com/Mike-Honey/0a252edf66c3c486b69b
This should work for Records Columns.
let
ExpandIt = (TableToExpand as table, optional ColumnName as text) =>
let
ListAllColumns = Table.ColumnNames(TableToExpand),
ColumnsTotal = Table.ColumnCount(TableToExpand),
CurrentColumnIndex = if (ColumnName = null) then 0 else List.PositionOf(ListAllColumns, ColumnName),
CurrentColumnName = ListAllColumns{CurrentColumnIndex},
CurrentColumnContent = Table.Column(TableToExpand, CurrentColumnName),
IsExpandable = if List.IsEmpty(List.Distinct(List.Select(CurrentColumnContent, each _ is record))) then false else true,
FieldsToExpand = if IsExpandable then Record.FieldNames(List.First(List.Select(CurrentColumnContent, each _ is record))) else {},
ColumnNewNames = List.Transform(FieldsToExpand, each CurrentColumnName &"."& _),
ExpandedTable = if IsExpandable then Table.ExpandRecordColumn(TableToExpand, CurrentColumnName, FieldsToExpand, ColumnNewNames) else TableToExpand,
NextColumnIndex = CurrentColumnIndex+1,
NextColumnName = ListAllColumns{NextColumnIndex},
OutputTable = if NextColumnIndex > ColumnsTotal-1 then ExpandedTable else @ExpandIt(ExpandedTable, NextColumnName)
in
OutputTable
in
ExpandIt
The function takes the table to transform as its main argument and then checks the columns one by one: if a column contains records, it expands that column; otherwise it moves on to the next column and checks again.
Once everything has been expanded, it returns the output table.
The function calls itself recursively, once per column.
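The recursion is perhaps easier to follow on a single record. The Python sketch below flattens nested dictionaries and prefixes each nested field with its parent name, mirroring the CurrentColumnName & "." & field naming above. It is an illustration of the idea only: unlike the M function, it descends into every nested level, and the names expand_record and sample are hypothetical.

```python
def expand_record(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, naming fields 'parent.child'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Expandable field: recurse and merge the expanded fields.
            flat.update(expand_record(value, name))
        else:
            # Plain value: keep it as-is.
            flat[name] = value
    return flat


sample = {
    "Id": 1,
    "Document": {"Title": "Invoice", "Owner": {"Name": "Ana", "Team": "Sales"}},
}
print(expand_record(sample))
```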

Skip a record if empty

I've created a function that cleans my data by removing extra columns that contain only null values. There should always be 15 columns after this; occasionally, however, there are more or fewer, and when that happens those tables should simply be removed.
I've tried skipping all the rows of those tables and returning an empty table, but when I try to expand those tables I get the error "Cannot convert the value false to type Number."
(tbl as table) =>
let
ColumnNames = Table.ColumnNames(tbl),
RemoveNullColumns = Table.SelectColumns(tbl, List.Select(ColumnNames, each List.MatchesAny(Table.Column(tbl, _), each _ <> null))),
CheckColumns = Table.Skip(RemoveNullColumns, Table.ColumnCount(RemoveNullColumns) <> 15)
in
CheckColumns
See if this works for you. It removes any column containing a null and returns tbl only if exactly 15 columns remain; otherwise it returns null.
(tbl as table) =>
let ColumnNames = Table.ColumnNames(tbl),
ReplacedValue = Table.ReplaceValue(tbl,null,"imanull",Replacer.ReplaceValue,ColumnNames ),
UnpivotedColumns = Table.UnpivotOtherColumns(ReplacedValue, {}, "Attribute", "Value"),
FilteredRows = Table.SelectRows(UnpivotedColumns, each ([Value] = "imanull")),
NonNullColumns= List.Difference(ColumnNames,List.Distinct(FilteredRows[Attribute])),
Results = if List.Count (NonNullColumns) <> 15 then null else Table.SelectColumns(tbl,NonNullColumns)
in Results
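The "clean, then keep only if 15 columns remain" logic can be sketched in Python as follows. This sketch follows the wording of the question (drop columns that are entirely null) rather than the answer's stricter rule (drop columns containing any null). The function name clean_table and the list-of-dicts table representation are assumptions made for illustration.

```python
def clean_table(rows, expected_columns=15):
    """Drop all-null columns; return None unless exactly `expected_columns` remain."""
    if not rows:
        return None
    columns = list(rows[0].keys())
    kept = [c for c in columns if any(row[c] is not None for row in rows)]
    if len(kept) != expected_columns:
        return None
    return [{c: row[c] for c in kept} for row in rows]


# Tiny example with an expected width of 2 instead of 15.
good = [{"a": 1, "b": None, "c": 3}, {"a": 4, "b": None, "c": None}]
bad = [{"a": 1, "b": None, "c": None}]

tables = [clean_table(t, expected_columns=2) for t in (good, bad)]
usable = [t for t in tables if t is not None]  # skip rejected tables before combining
print(usable)
```

Returning a null marker and filtering it out before the expand step avoids having to expand a table of the wrong shape.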

Spark - remove special characters from rows Dataframe with different column types

Assume I have a DataFrame with many columns: some are of type string, others of type int, and others of type map.
e.g.
field/columns types: stringType|intType|mapType<string,int>|...
|--------------------------------------------------------------------------
| myString1 |myInt1| myMap1 |...
|--------------------------------------------------------------------------
|"this_is_#string"| 123 |{"str11_in#map":1,"str21_in#map":2, "str31_in#map": 31}|...
|"this_is_#string"| 456 |{"str12_in#map":1,"str22_in#map":2, "str32_in#map": 32}|...
|"this_is_#string"| 789 |{"str13_in#map":1,"str23_in#map":2, "str33_in#map": 33}|...
|--------------------------------------------------------------------------
I want to remove some characters like '_' and '#' from all columns of String and Map type
so the result Dataframe/RDD will be:
|------------------------------------------------------------------------
|myString1 |myInt1| myMap1|... |
|------------------------------------------------------------------------
|"thisisstring"| 123 |{"str11inmap":1,"str21inmap":2, "str31inmap": 31}|...
|"thisisstring"| 456 |{"str12inmap":1,"str22inmap":2, "str32inmap": 32}|...
|"thisisstring"| 789 |{"str13inmap":1,"str23inmap":2, "str33inmap": 33}|...
|-------------------------------------------------------------------------
I am not sure whether it's better to convert the DataFrame into an RDD and work with that, or to do the work on the DataFrame directly.
I'm also not sure how best to handle the regexp across different column types (I am using Scala).
I would like to perform this action for all columns of these two types (string and map) without having to name each column, as in:
def cleanRows(mytabledata: DataFrame): RDD[String] = {
//this will do the work for a specific column (myString1) of type string
val oneColumn_clean = mytabledata.withColumn("myString1", regexp_replace(col("myString1"),"[_#]",""))
...
//return type can be RDD or Dataframe...
}
Is there any simple solution to perform this?
Thanks
One option is to define two udfs to handle string type column and Map type column separately:
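import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and $"..."; assumes a SparkSession named spark (already in scope in spark-shell)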
val df = Seq(("this_is#string", 3, Map("str1_in#map" -> 3))).toDF("myString", "myInt", "myMap")
df.show
+--------------+-----+--------------------+
| myString|myInt| myMap|
+--------------+-----+--------------------+
|this_is#string| 3|Map(str1_in#map -...|
+--------------+-----+--------------------+
1) Udf to handle string type columns:
def remove_string: String => String = _.replaceAll("[_#]", "")
def remove_string_udf = udf(remove_string)
2) Udf to handle Map type columns:
def remove_map: Map[String, Int] => Map[String, Int] = _.map{ case (k, v) => k.replaceAll("[_#]", "") -> v }
def remove_map_udf = udf(remove_map)
3) Apply the udfs to the corresponding columns to clean them up:
df.withColumn("myString", remove_string_udf($"myString")).
withColumn("myMap", remove_map_udf($"myMap")).show
+------------+-----+-------------------+
| myString|myInt| myMap|
+------------+-----+-------------------+
|thisisstring| 3|Map(str1inmap -> 3)|
+------------+-----+-------------------+
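The question also asks how to avoid naming each column. The underlying idea, choosing the cleaning rule from the type of the value, can be sketched in plain Python on a single row. This is not Spark code; in Spark the same dispatch would presumably be driven by the column types in the DataFrame schema. The names clean_value and clean_row are hypothetical.

```python
import re

UNWANTED = re.compile(r"[_#]")


def clean_value(value):
    """Strip '_' and '#' from strings and from the keys of maps; leave other types alone."""
    if isinstance(value, str):
        return UNWANTED.sub("", value)
    if isinstance(value, dict):
        return {clean_value(key): item for key, item in value.items()}
    return value


def clean_row(row: dict) -> dict:
    """Apply the type-based rule to every column of one row."""
    return {column: clean_value(value) for column, value in row.items()}


row = {
    "myString1": "this_is_#string",
    "myInt1": 123,
    "myMap1": {"str11_in#map": 1, "str21_in#map": 2},
}
print(clean_row(row))
```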