How to combine ifelse and left_join

I have two data sets. The first (AH) has the columns Account_Number, Account_Name and Market_Value; the second (AH1) has Account_Number, Market_Value and Fund. I am trying to bring Fund into the first data set from the second, but only for the Account_Numbers I want. Say I want the Fund for account numbers that start with 2; if an account number does not start with 2, I want whatever is in the Account_Number column instead. Could you please advise how I would do that?
I tried this syntax:
ifelse(AH$Account_Number == starts_with("2"),left_join(AH,AH1, by = "Account_Number),AH$Account_Number
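One way to read the request is as two separate steps: first look up Fund by Account_Number (the join), then decide row by row which value to keep (the conditional). The sketch below illustrates that logic in plain Python rather than R; the sample rows and the names `ah`, `ah1`, `fund_by_account` and `result` are made up for illustration.

```python
# Hypothetical sample rows standing in for AH and AH1.
ah = [
    {"Account_Number": "2001", "Account_Name": "Alpha", "Market_Value": 100.0},
    {"Account_Number": "3001", "Account_Name": "Beta", "Market_Value": 250.0},
    {"Account_Number": "2002", "Account_Name": "Gamma", "Market_Value": 75.0},
]
ah1 = [
    {"Account_Number": "2001", "Market_Value": 100.0, "Fund": "Growth"},
    {"Account_Number": "3001", "Market_Value": 250.0, "Fund": "Income"},
]

# Step 1: the "join" part, a lookup from account number to fund.
fund_by_account = {row["Account_Number"]: row["Fund"] for row in ah1}

# Step 2: the "ifelse" part, applied row by row once the lookup exists.
result = []
for row in ah:
    account = row["Account_Number"]
    if account.startswith("2"):
        fund = fund_by_account.get(account)  # None when AH1 has no match
    else:
        fund = account  # fall back to the account number itself
    result.append({**row, "Fund": fund})
```

Here account 3001 keeps its own number even though AH1 has a Fund for it, and account 2002 gets an empty Fund because AH1 has no matching row.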


How to replace missing values in panel data?

I am looking at weekly earnings data, which I have split into pre-pandemic and post-pandemic earnings. For some individuals, post-pandemic values are missing, and I want to replace them with their pre-pandemic earnings. However, I am struggling to code this for panel data. I was hoping someone could help me with this. Thanks in advance.
It is always easier if you share example data (see dataex) or at least list what variables you have. The example below will therefore most likely need to be edited.
* Sort the data by individual id and by the time variable
* that indicates whether the obs is pre- or post-pandemic
* (assumes pre-pandemic sorts before post-pandemic)
sort id time
* This replaces a missing earnings value with the value from the
* previous row if the id var is the same as on the previous row,
* i.e. a missing post-pandemic value gets the pre-pandemic value
replace earnings = earnings[_n-1] if id == id[_n-1] & missing(earnings)
This assumes that all individuals are indeed represented in each time period and that you have a unique id variable (id) in your data set.
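The intent described in the question (fill a missing post-pandemic value from the same individual's pre-pandemic row) can also be sketched outside Stata. This is a minimal plain-Python illustration, assuming hypothetical rows where `time` is 0 for pre-pandemic and 1 for post-pandemic; the field names mirror the ones used above, but the data is invented.

```python
# Hypothetical panel: time 0 = pre-pandemic, time 1 = post-pandemic.
rows = [
    {"id": 2, "time": 1, "earnings": 640.0},
    {"id": 1, "time": 0, "earnings": 500.0},
    {"id": 1, "time": 1, "earnings": None},   # missing post-pandemic value
    {"id": 2, "time": 0, "earnings": 620.0},
]

# Sort by individual, then by period, so pre comes right before post.
rows.sort(key=lambda r: (r["id"], r["time"]))

# Walk over neighbouring rows; fill a missing value from the previous
# row only when both rows belong to the same individual.
for prev, cur in zip(rows, rows[1:]):
    if cur["id"] == prev["id"] and cur["earnings"] is None:
        cur["earnings"] = prev["earnings"]
```

The same-id check is what stops one individual's earnings from leaking into the next individual's first row.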

Power Query / Power BI - How to move a cell value to a separate cell the easiest way?

I want to move a single value from column B to column A. What is the simplest way to do this in Power Query / Query Editor (Power BI)?
Please see attached images.
I know I might need to declare a variable so please enlighten me. By the way, I will delete row 1 afterwards, promote my headers, and rename column2 as PERIOD.
Thank you.
This might be along the lines of what you want to do.
If I start with this table named as Table1:
Then I click on the fx to the left of the formula bar:
And type = Table.InsertRows(Source, Table.RowCount(Source), {[Column2 = Source[KP20 rate]{0}, KP20 rate = null, Column4 = null]}) into the formula bar:
I used Table.InsertRows to create a new row in Table1. Source is the name of the latest state of Table1 after it is pulled into Power Query and before this step, so I use Source rather than Table1 as the table name in the Table.InsertRows call. (Each applied step basically results in its own table. You probably know this already, but others may not.) Since I want the new row to appear at the bottom of Source, I enter Table.RowCount(Source) as the row position for the new row. Then I enter each column's name and the value to be added. For Column2, I entered Source[KP20 rate]{0}, which treats column KP20 rate as a list, where {0} points to the first item in the list. To target the second item you would use Source[KP20 rate]{1}. I set the values for the other two columns (KP20 rate and Column4) to null.
The result:
Here's the M code in case you want to see it:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Custom1 = Table.InsertRows(Source, Table.RowCount(Source), {[Column2 = Source[KP20 rate]{0}, KP20 rate = null, Column4 = null]})
in
    Custom1
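For readers less familiar with M, the same idea (read the first value of one column, then add a row at the bottom that carries it in another column) can be sketched in plain Python. The table contents here are invented; only the column names come from the answer above.

```python
# Hypothetical stand-in for Table1: a list of row records.
source = [
    {"Column2": None, "KP20 rate": "2020-01", "Column4": None},
    {"Column2": "a", "KP20 rate": 1.5, "Column4": "x"},
]

# Counterpart of Source[KP20 rate]{0}: take the column as a list,
# then pick its first item.
first_value = [row["KP20 rate"] for row in source][0]

# The new record names every column, like the record passed to Table.InsertRows.
new_row = {"Column2": first_value, "KP20 rate": None, "Column4": None}

# Inserting at position len(source) places the row at the bottom.
# A new list is built, so source itself is left unchanged,
# much like each applied step yields its own table.
position = len(source)
custom1 = source[:position] + [new_row] + source[position:]
```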

Get total count of each distinct value

If I for example have a column of countries that might repeat and the list follows like this: Spain, Spain, Italy, Spain
I want to take the number of times a country appears in the column and divide it by the total number of rows. I have tried:
CountRows = DIVIDE(DISTINCTCOUNT('Report (7)'[Country]); COUNT('Report (7)'[Country]) )
Any suggestions? Do I need a new column for that?
The easiest way to do this type of calculation is to add a calculated column holding the number of occurrences of the row's value divided by the number of rows in the table.
You need the EARLIER function to refer to the current row's value inside the filter.
Assuming a table named Table1 with a column Country, something like:
DIVIDE(COUNTROWS(FILTER(Table1, Table1[Country] = EARLIER(Table1[Country]))), COUNTROWS(Table1))
Don't forget to format the new column as a percentage, or show some decimal places, to see the correct values.
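To make the arithmetic concrete, here is a small plain-Python sketch of the same per-row calculation, using the four example values from the question. The variable names are illustrative only.

```python
from collections import Counter

# The example column from the question.
countries = ["Spain", "Spain", "Italy", "Spain"]

# How many times each distinct value occurs.
counts = Counter(countries)
total = len(countries)

# One share per row: occurrences of that row's country / number of rows.
share_per_row = [counts[country] / total for country in countries]
```

Each Spain row gets 3/4 and the Italy row gets 1/4, which is what the calculated column would show once formatted as a percentage.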

SAS Counting Keywords Across Columns

I have list of keywords; CORPORATE, REAL ESTATE, COMPETITION, TRADE, DISPUTE
I want to count how many times these keywords occur across the columns practice_area1 to practice_area10. That is, I want to scan across 10 columns and create new columns, one per keyword (above), each holding a count as its value, e.g. Corporate 4.
Once this statement has run and we have our new columns, I want to create a new variable “Practice Group” which is populated with the keyword that has the highest count among the new variables we have just created. The dataset below is an example of how the data should look:
Could somebody please offer some advice on the best approach?
Many Thanks
Chris
All you need to do is make an array of all the columns you want to check, then loop through each column, using the count function to count the word you want and adding that count to a running total inside the loop.
The code below checks three columns for three values. You can apply it to as many columns as you want.
data have(drop=i);
    col1 = 'CORPORATE, REAL ESTATE, REAL ESTATE';
    col2 = 'CORPORATE, CORPORATE, TRADE, TRADE, TRADE, REAL ESTATE';
    col3 = 'TRADE, TRADE, DISPUTE,REAL ESTATE';
    array col[*] col1 - col3;
    realestate = 0; /* starting with zero */
    trade = 0;
    corporate = 0;
    do i = 1 to dim(col);
        realestate = realestate + count(col[i], 'REAL ESTATE'); /* adding through the loop */
        trade = trade + count(col[i], 'TRADE');
        corporate = corporate + count(col[i], 'CORPORATE');
    end;
run;
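The SAS step above covers the counting; the question also asks for a "Practice Group" holding the keyword with the highest count, which the answer does not show. A minimal plain-Python sketch of both parts, using the same three sample strings, might look like the following. Picking the first keyword on a tie is an assumption, since the question does not say how ties should be handled.

```python
# Keyword list from the question.
keywords = ["CORPORATE", "REAL ESTATE", "COMPETITION", "TRADE", "DISPUTE"]

# One row of data: the three sample column values from the SAS example.
row = [
    "CORPORATE, REAL ESTATE, REAL ESTATE",
    "CORPORATE, CORPORATE, TRADE, TRADE, TRADE, REAL ESTATE",
    "TRADE, TRADE, DISPUTE,REAL ESTATE",
]

# For each keyword, add up its occurrences across all columns of the row.
counts = {kw: sum(cell.count(kw) for cell in row) for kw in keywords}

# Practice group = keyword with the highest count.
# max() returns the first such keyword if several share the top count (assumption).
practice_group = max(counts, key=counts.get)
```

With this sample row, TRADE occurs five times and becomes the practice group.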

Search through the data with a loop or nested loops in SAS

I am rather a beginner in SAS. I have the following problem. Given is a big data set (my_time) which I imported into SAS looking as follows
I want to implement the following algorithm
for every account look for a status and if it is equal to na then look for the same contract after one year (one year after it gets the status na) and put the information "my_date", "status" and "money" in three new columns "new_my_date", "new_status" and "new_money" like in
I need something like COUNTIFS in Excel. I found loops in SAS such as DO, but none meant for looking through all rows.
I do not even know which keyword to search for.
I would be grateful for any hint.
A simple method is to sort the data, then use the first. automatic variable prefix and the retain statement to get the desired result.
Step 1: Sort by account, date, and status
proc sort data=have;
    by account my_date status;
run;
This guarantees that your data is in the order you need. Since we are only looking for the row one year after status = 'na', anything that happens in between doesn't matter.
Step 2: Use a data step to remember the first year when na happens for that account
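data want;
    set have;
    by account my_date status;
    /* retain keeps these values across rows; first_na_month must be
       retained too, or it resets to missing on every row */
    retain first_na_year first_na_month first_na_account;
    if first.account then call missing(first_na_year, first_na_month, first_na_account);
    /* year and month are assumed to already exist in have */
    if status in ('na', 'tna') then do;
        first_na_year = year;
        first_na_month = month;
        first_na_account = account;
    end;
    if year = first_na_year + 1
       and month = first_na_month
       and account = first_na_account
       and status not in ('na', 'tna')
    then do;
        new_status = status;
        new_my_date = my_date;
        new_money = money;
    end;
    /* keep the row only if none of the three new variables is missing */
    if cmiss(new_status, new_my_date, new_money) = 0;
    drop first:;
run;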
For each row, we check:
Is the status not 'na' (or 'tna')?
Is the year one greater than the last time the status was 'na', and is the month the same?
Is this the same account we're comparing?
If all are true, then we create the three new variables.
What's happening:
SAS is inherently a looping language: the data step runs once per row, so we do not need a do loop here. When SAS moves to a new row, it resets the variables created in the data step to missing in the Program Data Vector (PDV) before processing that row.
Since the SAS data step only goes forwards and doesn't like to go backwards, we want it to remember the first time that na occurs for that account. retain tells SAS not to discard the value of a variable when it reads a new row.
When we are done doing our comparison and we've moved onto the next account, we reset these variables to missing. by group processing allows SAS to know exactly where the first and last occurrence of the account is in the dataset.
At the end, we output only if all 3 of the new variables are not missing. cmiss counts how many of its arguments are missing, so a result of 0 means none of them are. Note that output is always implied before the run statement, so we simply need to use an "if without then" in this case.
The final statement, drop first:;, is a simple shortcut that removes every variable whose name starts with first, which keeps them out of the final dataset.
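The remember-then-compare pattern described above is not specific to SAS. As a rough sketch, the same pass over sorted rows can be written in plain Python, with ordinary variables playing the role of retained values. The sample rows are invented, and year and month are derived here from an ISO-formatted `my_date` string, which is an assumption about the data.

```python
# Hypothetical rows; my_date is assumed to be an ISO "YYYY-MM-DD" string.
rows = [
    {"account": "A", "my_date": "2019-03-01", "status": "na", "money": 0},
    {"account": "A", "my_date": "2019-09-01", "status": "ok", "money": 10},
    {"account": "A", "my_date": "2020-03-01", "status": "ok", "money": 25},
    {"account": "B", "my_date": "2020-03-01", "status": "ok", "money": 40},
]
rows.sort(key=lambda r: (r["account"], r["my_date"]))

want = []
current_account = None
na_year = na_month = None  # the "retained" values

for r in rows:
    year, month = int(r["my_date"][:4]), int(r["my_date"][5:7])
    if r["account"] != current_account:
        # Same role as first.account: reset the remembered values.
        current_account = r["account"]
        na_year = na_month = None
    if r["status"] in ("na", "tna"):
        na_year, na_month = year, month
    elif na_year is not None and year == na_year + 1 and month == na_month:
        # One year after the remembered 'na' row, same account, same month.
        want.append({
            "account": r["account"],
            "new_my_date": r["my_date"],
            "new_status": r["status"],
            "new_money": r["money"],
        })
```

Account B has a row in March 2020 as well, but nothing is remembered for it after the reset, so only account A produces an output row.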