Add custom column based on string in another column

Add custom column based on string in another column - powerbi

Source data:
Market Platform Web sales $ Mobile sales $ Insured
FR iPhone 1323 8709 Y
IT iPad 12434 7657 N
FR android 234 2352355 N
IT android 12323 23434 Y
Is there a way to evaluate the sales of devices that are insured?
if List.Contains({"iPhone","iPad","iPod"},[Platform]) and ([Insured]="Y") then [Mobile sales] else "error"
Something to that extent, just not sure how to approach it

A direct answer to your question is
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
SumUpSales = Table.AddColumn(Source, "Sales of insured devices", each if List.Contains({"iPhone","iPad","iPod"}, _[Platform]) and Text.Upper(_[Insured]) = "Y" then _[#"Mobile sales $"] else null, type number)
in
SumUpSales
However, I would like to stress you few things.
First, it's better to convert values in [Insured] column to boolean first. That way you can catch errors before they corrupt your data without you noticing. My example doesn't do that, all it does is negating letter case in [Insured], since PowerM is case-sensitive language.
Second, you'd better use null rather than text value error. Then, you can set column type, and do some math with its values, such as summing them up. In case of mixed text and number values you will get an error in this and many other cases.
And last.
It is probably better way to use a pivot table for visualizing data like this. You just need to add a column which groups all Apple (and/or other) devices together based on the same logic, but excluding [Insured]. Pivot tables are more flexible, and I personally like them very much.

Related

Google Sheets: How can I extract partial text from a string based on a column of different options?

Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements changing out to the next cell in Parameters!D column (as in , manually adding $D$5, $D$6 etc) but this seems inefficient for a list thousands of words long. e.g. third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!

Try using custom function:
To create custom function:
1.Create or open a spreadsheet in Google Sheets.
2.Select the menu item Tools > Script editor.
3.Delete any code in the script editor and copy and paste the code below into the script editor.
4.At the top, click Save save.
To use custom function:
1.Click the cell where you want to use the function.
2.Type an equals sign (=) followed by the function name and any input value — for example, =DOUBLE(A1) — and press Enter.
3.The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
var params = p.flat(); //Convert 2d array into 1d
var buildRegex = params.map(i => '(' + i + ')').join('|'); //convert array into series of capturing groups. Example (Dog)|(Puppies)
var regex = new RegExp(buildRegex,"gi");
var results = str.match(regex);
if(results){
// The for loops below will convert the first character of each word to Uppercase
for(var i = 0 ; i < results.length ; i++){
var words = results[i].split(" ");
for (let j = 0; j < words.length; j++) {
words[j] = words[j][0].toUpperCase() + words[j].substr(1);
}
results[i] = words.join(" ");
}
return results.join(","); //return with comma separator
}else{
return ""; //return blank if result is null
}
}
Example Usage:
Parameters:
First Topic:
Second Topic:
Third Topic:
Reference:
Custom Functions

I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.

DAX to Test for Whole Number

I have a Actuals column like so:
ID | Airport
------------
A | 98.4
B | 98.0
C | 95.3
I'm attempting to format the numbers above into percentages for a front-end report. I have this written in a switch statement - for ease I'll just write the logic as an IF boolean.
example_measure =
VAR Nums = SELECTEDVALUES(Table[Actuals])
VAR FormatNums = IF(DIVIDE(ROUND(nums,1), nums) = 1,
format(nums,"0%"),format(nums,"0.0%")
-
RETURN
FormatNums
no matter what I do this always returns a number with a floating point value of 1f
so in my raw data of 98.0 I'm expecting the format to return "98%" rather than "98.0%"
the measures are used on individual cards, and are filtered so they can only ever show one value or blank, meaning they will only ever display one value on their own.
I've toyed with using if(nums = int(nums) which should evaluate to true for 98.0 but it I always get false.

There is a simpler way - just use built-in formatting codes:
Formatted Actuals =
VAR x = SELECTEDVALUE(Data[Actuals])
RETURN
FORMAT(x, "General Number") & "%"
Result:
Built-in style "General Number" handles your situation correctly, you just need to add % symbol to the formatted string. No need to test for whole numbers.

To convert a column/measure into percentage, you can simply divide the column by 100 and then format it as a percentage. Use the following steps:
Create a new column/measure: Perc_value = Table[Actuals]/100
Then go into the modelling tab, select the created column/measure and format it as a % and limit the number of decimal places to 0
This should give the view you are looking for. Hope this helps.
Edit:
You can use the below formula to achieve the desired result:
Column4 = IF('Table'[Column2]-ROUND('Table'[Column2],0)=0,FORMAT('Table'[Column2]/100,"0%"),FORMAT('Table'[Column2]/100,"0.0%"))
Replace Column2 withyour field and you should be good to go.

Stata xtline overlayed plot for multiple groups

I am attempting to produce an overlayed -xtline- plot that distinguishes between males and females (or any number of multiple groups) by displaying different plot styles for each group. I chose to recast the xtline plot as "connected" and show males using circle markers and females as triangle markers. Taking cues from this question on Statalist, I produced code similar to what is below. When I try this solution Stata produces the "too many options" error, which is perhaps predictable given the large number of unique persons. I am aware of this solution which employs combined graphs but that is also not practical given the large number of unique individuals in my data.
Does a more simple solution to this problem exist? Does Stata have the capacity to overlay multiple -xtline- plots like it can -twoway- plots?
The code below, using publicly available data from UCLA's excellent Stata guide shows my basic code and reproduces the error:
use http://www.ats.ucla.edu/stat/stata/examples/alda/data/alcohol1_pp, clear
xtset id age
gsort -male id
qui levelsof id if !male, loc(fidlevs)
qui levelsof id if male, loc(midlevs)
qui levelsof id, loc(alllevs)
tokenize `alllevs'
loc len_f : word count `fidlevs'
loc len_m : word count `midlevs'
loc len_all : word count `alllevs'
loc start_f = `len_all' - `len_f'
forval i = 1/`len_all' {
if `i' < `start_f' {
loc m_plot_opt "`m_plot_opt' plot`i'opts(recast(connected) mcolor(black) msize(medsmall) msymbol(circle) lcolor(black) lwidth(medthin) lpattern(solid))"
}
else if `i' >= `start_f' {
loc f_plot_opt "`f_plot_opt' plot`i'opts(recast(connected) mcolor(black) msize(medsmall) msymbol(triangle) lcolor(black) lwidth(medthin) lpattern(solid))"
}
}
di "xtline alcuse, legend(off) scheme(s1mono) overlay `m_plot_opt' `f_plot_opt'"
xtline alcuse, legend(off) scheme(s1mono) overlay `m_plot_opt' `f_plot_opt'

It is difficult (for me) to separate the programming issue here from statistical or graphical views on what kind of graph works well, or at all. Even with this modest dataset there are 82 distinct identifiers, so any attempt to show them distinctly fails to be useful, if only because the resulting legend takes up most of the real estate.
There is considerable ingenuity in the question code in working through all the identifiers, but a broad-brush approach seems to work as well. Try this:
use http://www.ats.ucla.edu/stat/stata/examples/alda/data/alcohol1_pp, clear
xtset id age
separate alcuse, by(male) veryshortlabel
label var alcuse1 "male"
label var alcuse0 "female"
line alcuse? age, legend(off) sort connect(L)
Key points:
There is nothing very special about xtline. It's just a convenience wrapper. When frustrated by its wired-in choices, people often just reach for line.
To get distinct colours, distinct variables suffice, which is where separate has a role. See also this Tip.
Although the example dataset is well behaved, extra options sort connect(L) will help in some case to remove spurious connections between individuals or panels. (In extreme cases, reach for linkplot (SSC).)
This could be fine too:
line alcuse age if male || line alcuse age if !male, legend(order(1 "male" 2 "female")) sort connect(L)

Extracting dollar amounts from existing sql data?

I have a field with that contains a mix of descriptions and dollar amounts. With TSQL, I would like to extract those dollar amounts, then insert them into a new field for the record.
-- UPDATE --
Some data samples could be:
Used knife set for sale $200.00 or best offer.
$4,500 Persian rug for sale.
Today only, $100 rebate.
Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.
In the set above I was thinking of just grabbing the first occurrence of the dollar figure... that is the simplest.
I'm not trying to remove the amounts from the original text, just get their value, and add them to a new field.
The amounts could/could not contain decimals, and commas.
I'm sure PATINDEX won't cut it and I don't need an extremely RegEx function to accomplish this.
However, looking at The OLE Regex Find (Execute) function here, appears to be the most robust, however when trying to use the function I get the following error message in SSMS:
SQL Server blocked access to procedure 'sys.sp_OACreate' of component
'Ole Automation Procedures' because this component is turned off as
part of the security configuration for this server. A system
administrator can enable the use of 'Ole Automation Procedures' by
using sp_configure. For more information about enabling 'Ole
Automation Procedures', see "Surface Area Configuration" in SQL Server
Books Online.
I don't want to go and changing my server settings just for this function. I have another regex function that works just fine without changes.
I can't imagine this being that complicated to just extract dollar amounts. Any simpler ways?
Thanks.

CREATE FUNCTION dbo.fnGetAmounts(#str nvarchar(max))
RETURNS TABLE
AS
RETURN
(
-- generate all possible starting positions ( 1 to len(#str))
WITH StartingPositions AS
(
SELECT 1 AS Position
UNION ALL
SELECT Position+1
FROM StartingPositions
WHERE Position <= LEN(#str)
)
-- generate possible lengths
, Lengths AS
(
SELECT 1 AS [Length]
UNION ALL
SELECT [Length]+1
FROM Lengths
WHERE [Length] <= 15
)
-- a Cartesian product between StartingPositions and Lengths
-- if the substring is numeric then get it
,PossibleCombinations AS
(
SELECT CASE
WHEN ISNUMERIC(substring(#str,sp.Position,l.Length)) = 1
THEN substring(#str,sp.Position,l.Length)
ELSE null END as Number
,sp.Position
,l.Length
FROM StartingPositions sp, Lengths l
WHERE sp.Position <= LEN(#str)
)
-- get only the numbers that start with Dollar Sign,
-- group by starting position and take the maximum value
-- (ie, from $, $2, $20, $200 etc)
SELECT MAX(convert(money, Number)) as Amount
FROM PossibleCombinations
WHERE Number like '$%'
GROUP BY Position
)
GO
declare #str nvarchar(max) = 'Used knife set for sale $200.00 or best offer.
$4,500 Persian rug for sale.
Today only, $100 rebate.
Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.'
SELECT *
FROM dbo.fnGetAmounts(#str)
OPTION(MAXRECURSION 32767) -- max recursion option is required in the select that uses this function

This link should help.
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/extracting-numbers-with-sql-server
Assuming you are OK with extracting the numeric's, regardless of wether or not there is a $ sign. If that is a strict requirement, some mods will be needed.

django query to calculate time

The following are the timestamps that is present in the timestamp column of the table userdata.My question is that how to write a query such that i get the output as i need the month name i.2,2010-03 and the total time used in the month
userdata.months.filter().order_by('-timestamp')
'2011-03-07 16:03:01'
'2011-03-07 16:07:04'
'2011-03-06 11:03:01'
'2011-03-08 16:03:01'
'2011-03-04 09:03:01'
'2011-05-16 16:03:01'
'2011-05-18 16:03:01'
'2011-07-16 12:03:01'
'2011-07-17 12:03:01'
'2011-07-17 15:03:01'

something like
my_month = 2 (or whatever you need)
userdata.months.filter(timestamp__month=my_month).aggregate(sum('timestamp__time')
i'm not sure about that timestamp__time, you may search a bit about it, but i think it's correct.
otherwise, you can use a raw query (userdata.months.raw('query here'))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js