I am trying to help a public school here, but I have very limited knowledge of Power BI, so I hope you can enlighten me on this case:
We have a very simple report with a table and a KPI.
The KPI counts all students.
The table shows the students' grades:
Student Math Portuguese History Science
StD A 6 6 7 8
StD B 6 7 6 7
StD C 8 9 7 8
StD D 6 6 6 6
StD E 6 7 8 8
StD F 8 6 7 7
The rule that must be applied to the KPI (count(Students)) and to the table is to show students only if any of these conditions holds:
at least 2 subjects are equal to or under 6
Portuguese is equal to or under 6
Math is under 6
All the rest should not be shown in the table or counted in the KPI. In this case I would see/count only students A, B, D, E & F.
Any help would be very much appreciated.
To tackle your task try the following:
Create a calculated column in your table with the following DAX code:
isValid =
VAR cond_2_subjects =
    (('Table'[Math] <= 6) + ('Table'[Portuguese] <= 6) + ('Table'[History] <= 6) + ('Table'[Science] <= 6)) >= 2
VAR cond_portuguese = 'Table'[Portuguese] <= 6
VAR cond_math = 'Table'[Math] < 6
RETURN
    -- This will check if any of the given conditions is true
    IF(
        cond_2_subjects || cond_portuguese || cond_math,
        TRUE(),
        FALSE()
    )
The table should then look like this:
The KPI (measure) can then be written like so:
# Students =
CALCULATE(
    COUNT('Table'[Student]),
    -- only count Students where conditions are true (calculated column isValid = True)
    'Table'[isValid] = TRUE()
)
The final result should then look like this:
The table visual on the left has 'Table'[isValid] = TRUE() set as a filter on the visual.
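To sanity-check the rule outside Power BI, the same three conditions can be evaluated against the sample grades in a few lines of Python. This is only a sketch: the function name is_valid and the dictionary layout are chosen here for illustration, and the conditions are combined with OR, as in the isValid column.

```python
# Sample grades copied from the question: (Math, Portuguese, History, Science).
grades = {
    "StD A": (6, 6, 7, 8),
    "StD B": (6, 7, 6, 7),
    "StD C": (8, 9, 7, 8),
    "StD D": (6, 6, 6, 6),
    "StD E": (6, 7, 8, 8),
    "StD F": (8, 6, 7, 7),
}


def is_valid(math, portuguese, history, science):
    """Mirror the three OR-ed conditions of the isValid column."""
    # Booleans count as 0/1, so sum() gives the number of subjects at or under 6.
    at_least_two_low = sum(g <= 6 for g in (math, portuguese, history, science)) >= 2
    return at_least_two_low or portuguese <= 6 or math < 6


valid_students = [name for name, marks in grades.items() if is_valid(*marks)]
print(valid_students)       # ['StD A', 'StD B', 'StD D', 'StD F']
print(len(valid_students))  # 4
```

With the conditions exactly as written, StD E (Math 6, Portuguese 7, History 8, Science 8) meets none of them, so this check returns four students rather than five.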
I have a csv with one of the columns that contains periods:
timespan (string): PnYnMnD, where P is a literal value that starts the expression, nY is the number of years followed by a literal Y, nM is the number of months followed by a literal M, nD is the number of days followed by a literal D, where any of these numbers and corresponding designators may be absent if they are equal to 0, and a minus sign may appear before the P to specify a negative duration.
I want to return a data frame that contains all the data in the csv with parsed timespan column.
So far I have code that parses the periods:
import re
timespan_regex = re.compile(r'P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')
def parse_timespan(timespan):
    # check if the input is a valid timespan
    if not timespan or 'P' not in timespan:
        return None
    # check if timespan is negative and, if so, skip the leading minus sign
    curr_idx = 0
    is_negative = timespan.startswith('-')
    if is_negative:
        curr_idx = 1
    # extract years, months and days with the regex
    match = timespan_regex.match(timespan[curr_idx:])
    if match is None:
        # the string does not start with the 'P' literal
        return None
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    days = int(match.group(3) or 0)
    timespan_days = years * 365 + months * 30 + days
    return timespan_days if not is_negative else -timespan_days
print(parse_timespan(''))
print(parse_timespan('P2Y11M20D'))
print(parse_timespan('-P2Y11M20D'))
print(parse_timespan('P2Y'))
print(parse_timespan('P0Y'))
print(parse_timespan('P2Y4M'))
print(parse_timespan('P16D'))
Output:
None
1080
-1080
730
0
850
16
How do I apply this code to the whole csv column while running the function processing csv?
def do_process_citation_data(f_path):
    global my_ocan
    my_ocan = pd.read_csv(f_path, names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)
    my_ocan['timespan'] = parse_timespan(my_ocan['timespan'])  # I tried like this, but sure it is not working :)
    return my_ocan
Thank you and have a lovely day :)
Like Python's built-in map, pandas has a map method on Series (see the pandas documentation for Series.map). Since your function already takes a single value and returns a value, you just need this:
# Pass each value in the "timespan" column to parse_timespan and store the returned value in that row
my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan)
And here is a generic demo:
import pandas as pd
def demo_func(x):
    # Takes an int or string, prefixes it with 'A' and returns a string.
    return "A" + str(x)
df = pd.DataFrame({"Column_1": [1, 2, 3, 4], "Column_2": [10, 9, 8, 7]})
print(df)
df['Column_1'] = df['Column_1'].map(demo_func)
print("After mapping:\n{}".format(df))
Output:
   Column_1  Column_2
0         1        10
1         2         9
2         3         8
3         4         7
After mapping:
  Column_1  Column_2
0       A1        10
1       A2         9
2       A3         8
3       A4         7
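Applied to the timespan case, the same pattern could look like the sketch below. Several things here are assumptions made for illustration: the tiny in-memory CSV stands in for the real file, the new column name timespan_days is made up, and parse_timespan is a compact variant of the function from the question that returns None for anything that is not a string (such as the NaN pandas produces for an empty cell). Note that the timespan column is read as plain text, without listing it in parse_dates.

```python
import io
import re

import pandas as pd

TIMESPAN_RE = re.compile(r'(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')


def parse_timespan(timespan):
    """Convert '[-]PnYnMnD' to days, using 365-day years and 30-day months as in the question."""
    if not isinstance(timespan, str):
        return None  # e.g. NaN for an empty CSV cell
    match = TIMESPAN_RE.fullmatch(timespan)
    if match is None:
        return None  # not a valid timespan string
    sign, years, months, days = match.groups()
    total = int(years or 0) * 365 + int(months or 0) * 30 + int(days or 0)
    return -total if sign else total


# Hypothetical two-column CSV standing in for the real citation file.
csv_text = "oci,timespan\na,P2Y11M20D\nb,-P2Y11M20D\nc,P16D\n"
df = pd.read_csv(io.StringIO(csv_text))

# map() calls parse_timespan once per cell and collects the results.
df['timespan_days'] = df['timespan'].map(parse_timespan)
print(df)
```

Writing the result to a new column keeps the original strings available for checking; assigning back to df['timespan'] works the same way.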
I want to know which quarter (Q1, Q2, Q3, Q4) the current month belongs to in Python. I'm fetching the current date with the time module as follows:
import time
print("Current date " + time.strftime("%x"))
Any idea how to do it?
Modifying your code, I get this:
import time
month = int(time.strftime("%m")) - 1 # minus one, so month starts at 0 (0 to 11)
quarter = month // 3 + 1 # integer division, then add one so quarter starts at 1 (1 to 4)
quarter_str = "Q" + str(quarter) # convert to the "Qx" format string
print(quarter_str)
Or you could use the bisect module:
import time
import bisect
quarters = range(1, 12, 3) # First month of each quarter: 1, 4, 7, 10
month = int(time.strftime("%m"))
quarter = bisect.bisect(quarters, month)
quarter_str = "Q" + str(quarter)
print(quarter_str)
strftime does not know about quarters, but you can calculate them from the month:
Use time.localtime to retrieve the current time in the current timezone. This function returns a named tuple with year, month, day of month, hour, minute, second, weekday, day of year, and time zone offset. You will only need the month (tm_mon).
Use the month to calculate the quarter. If the first quarter starts with January and ends with March, the second quarter starts with April and ends with June, etc., then this is as easy as subtracting 1, dividing by 3 without remainder, and adding 1 (for months 1..3, (month - 1) // 3 == 0 and 0 + 1 == 1; for months 4..6, (month - 1) // 3 == 1 and 1 + 1 == 2; etc.). If your definition of what a quarter is differs (e.g. companies may choose different start dates for their financial quarters), you have to adjust the calculation accordingly.
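The two steps above can be sketched as follows, assuming calendar quarters that start in January. The helper name quarter_of is made up for this example.

```python
import time


def quarter_of(month):
    """Return the calendar quarter (1-4) for a month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Shift to 0..11, split into groups of three, shift back to 1..4.
    return (month - 1) // 3 + 1


current_month = time.localtime().tm_mon
print("Q{}".format(quarter_of(current_month)))
```

For a fiscal year that starts in another month, the month would first have to be shifted by that offset before applying the same formula.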
I coded a function that calculates the moving average of a stock given a list of dates and prices, but the output is incorrect. I just need a second set of eyes on the code. Here is my code:
def calculate(self, stock_date_price_list, min_days=2):
    '''Calculates the moving average and generates a signal strategy for buy or sell
    strategy given a list of stock date and price. '''
    stock_averages = []
    stock_signals = []
    price_list = [float(n) for n in stock_date_price_list[1::2]]
    days_window = collections.deque(maxlen=min_days)
    rounding_point = 0.01
    for price in price_list:
        days_window.append(price)
        stock_averages.append(0)
        stock_signals.append("")
        if len(days_window) == min_days:
            moving_avg = sum(days_window) / min_days
            stock_averages[-1] = moving_avg
            if price < moving_avg:
                stock_signals[-1] = "SELL"
            elif price > moving_avg:
                if price_list[-2] < stock_averages[-2]:
                    stock_signals[-1] = "BUY"
    stock_averages[:] = ("%.2f" % avg if abs(avg)>=rounding_point else ' ' for avg in stock_averages)
    return stock_averages, stock_signals
The input is a list of stock price and dates in the following format:
[2012-10-10,52.30,2012-10-09,51.60]
The output I get is:
2012-10-01 659.39
2012-10-02 661.31
2012-10-03 671.45
2012-10-04 666.80
2012-10-05 652.59
2012-10-08 638.17
2012-10-09 635.85
2012-10-10 640.91
2012-10-11 628.10
2012-10-12 629.71 648.43 SELL
2012-10-15 634.76 645.97 SELL
2012-10-16 649.79 644.81 BUY
2012-10-17 644.61 642.13 BUY
2012-10-18 632.64 638.71 SELL
2012-10-19 609.84 634.44 SELL
2012-10-22 634.03 634.02 BUY
2012-10-23 613.36 631.77 SELL
2012-10-24 616.83 629.37 SELL
Whereas it should be:
2012-10-01 659.39
2012-10-02 661.31
2012-10-03 671.45
2012-10-04 666.80
2012-10-05 652.59
2012-10-08 638.17
2012-10-09 635.85
2012-10-10 640.91
2012-10-11 628.10
2012-10-12 629.71 648.43
2012-10-15 634.76 645.97
2012-10-16 649.79 644.81 BUY
2012-10-17 644.61 642.13
2012-10-18 632.64 638.71 SELL
2012-10-19 609.84 634.44
2012-10-22 634.03 634.02 BUY
2012-10-23 613.36 631.77 SELL
2012-10-24 616.83 629.37
Parameters for buying/selling:
If the closing price on a particular day has crossed above the simple moving average (i.e., the closing price on that day is above that day's simple moving average, while the previous closing price is not above the previous simple moving average), generate a buy signal.
If the closing price on a particular day has crossed below the simple moving average, generate a sell signal.
Otherwise, generate no signal.
As you state yourself, the condition for buying is not just
price > moving_avg but also that the previous_price < previous_moving_avg.
You do address this with
price_list[-2] < stock_averages[-2]
except that price_list is one big list, and price_list[-2] is always the penultimate item in the big list. It isn't necessarily the previous price relative to where you are in the loop.
Similarly, the signal to sell needs to be not only price < moving_avg but also that previous_price > previous_moving_avg.
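One way to apply both corrections inside a single loop is to carry the previous price and the previous average from one iteration to the next. The sketch below assumes strict comparisons on both sides of the crossover, as in the two conditions above; the function name crossover_signals and its return shape are made up for illustration and are not the original method.

```python
from collections import deque


def crossover_signals(prices, window):
    """Return (averages, signals); averages are None until the window is full."""
    recent = deque(maxlen=window)
    averages, signals = [], []
    prev_price = prev_avg = None
    for price in prices:
        recent.append(price)
        avg = sum(recent) / window if len(recent) == window else None
        signal = ""
        # A crossover needs both today's and yesterday's moving average.
        if avg is not None and prev_avg is not None:
            if price > avg and prev_price < prev_avg:
                signal = "BUY"
            elif price < avg and prev_price > prev_avg:
                signal = "SELL"
        averages.append(avg)
        signals.append(signal)
        # Remember this step so the next iteration can compare against it.
        prev_price, prev_avg = price, avg
    return averages, signals


averages, signals = crossover_signals([1, 2, 3, 1, 0, 5], window=2)
print(signals)  # ['', '', '', 'SELL', '', 'BUY']
```

Because the previous values are tracked explicitly, the comparison always refers to the preceding step of the loop rather than to the end of the full price list.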
There are other (mainly stylistic) problems with calculate.
stock_date_price_list is a required input, but you only use the slice stock_date_price_list[1::2]. If that's the case, you should require the slice as the input, not stock_date_price_list.
price_list is essentially this slice, except that you call float on each item. That implies the data has not been parsed properly. Don't make calculate both a data parser and a data analyzer. It's much better to write simple functions which accomplish one and only one task.
Similarly, calculate should not be in the business of formatting the result:
stock_averages[:] = ("%.2f" % avg if abs(avg)>=rounding_point else ' ' for avg in stock_averages)
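As a small illustration of keeping parsing separate from analysis, a dedicated helper could turn the flat date/price list from the question into two clean lists before any averaging happens. The helper name split_dates_and_prices is hypothetical, and the sketch assumes the input strictly alternates date, price, date, price.

```python
def split_dates_and_prices(flat):
    """Split [date, price, date, price, ...] into (dates, prices), with prices as floats."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating date/price entries")
    dates = list(flat[0::2])
    prices = [float(p) for p in flat[1::2]]
    return dates, prices


dates, prices = split_dates_and_prices(['2012-10-10', '52.30', '2012-10-09', '51.60'])
print(dates)   # ['2012-10-10', '2012-10-09']
print(prices)  # [52.3, 51.6]
```

The analysis function can then accept the list of floats directly, and formatting can be left to whatever prints the report.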
Here is how you could fix the code using pandas:
import pandas as pd
data = [('2012-10-01', 659.38999999999999),
('2012-10-02', 661.30999999999995),
('2012-10-03', 671.45000000000005),
('2012-10-04', 666.79999999999995),
('2012-10-05', 652.59000000000003),
('2012-10-08', 638.16999999999996),
('2012-10-09', 635.85000000000002),
('2012-10-10', 640.90999999999997),
('2012-10-11', 628.10000000000002),
('2012-10-12', 629.71000000000004),
('2012-10-15', 634.75999999999999),
('2012-10-16', 649.78999999999996),
('2012-10-17', 644.61000000000001),
('2012-10-18', 632.63999999999999),
('2012-10-19', 609.84000000000003),
('2012-10-22', 634.02999999999997),
('2012-10-23', 613.36000000000001),
('2012-10-24', 616.83000000000004)]
df = pd.DataFrame(data, columns=['date','price'])
df['average'] = df['price'].rolling(10).mean()
df['prev_price'] = df['price'].shift(1)
df['prev_average'] = df['average'].shift(1)
df['signal'] = ''
buys = (df['price']>df['average']) & (df['prev_price']<df['prev_average'])
sells = (df['price']<df['average']) & (df['prev_price']>df['prev_average'])
df.loc[buys, 'signal'] = 'BUY'
df.loc[sells, 'signal'] = 'SELL'
print(df)
yields
date price average prev_price prev_average signal
0 2012-10-01 659.39 NaN NaN NaN
1 2012-10-02 661.31 NaN 659.39 NaN
2 2012-10-03 671.45 NaN 661.31 NaN
3 2012-10-04 666.80 NaN 671.45 NaN
4 2012-10-05 652.59 NaN 666.80 NaN
5 2012-10-08 638.17 NaN 652.59 NaN
6 2012-10-09 635.85 NaN 638.17 NaN
7 2012-10-10 640.91 NaN 635.85 NaN
8 2012-10-11 628.10 NaN 640.91 NaN
9 2012-10-12 629.71 648.428 628.10 NaN
10 2012-10-15 634.76 645.965 629.71 648.428
11 2012-10-16 649.79 644.813 634.76 645.965 BUY
12 2012-10-17 644.61 642.129 649.79 644.813
13 2012-10-18 632.64 638.713 644.61 642.129 SELL
14 2012-10-19 609.84 634.438 632.64 638.713
15 2012-10-22 634.03 634.024 609.84 634.438 BUY
16 2012-10-23 613.36 631.775 634.03 634.024 SELL
17 2012-10-24 616.83 629.367 613.36 631.775
[18 rows x 6 columns]
Without pandas, you could do something like this:
nan = float('nan')
def calculate(prices, size=2):
    '''Calculates the moving average and generates a signal strategy for buy or sell
    strategy given a list of prices. '''
    averages = [nan]*(size-1) + moving_average(prices, size)
    previous_prices = ([nan] + prices)[:-1]
    previous_averages = ([nan] + averages)[:-1]
    signal = []
    for price, ave, prev_price, prev_ave in zip(
            prices, averages, previous_prices, previous_averages):
        # comparisons involving nan are always False, so no signal is emitted
        # until both the current and the previous average exist
        if price > ave and prev_price < prev_ave:
            signal.append('BUY')
        elif price < ave and prev_price > prev_ave:
            signal.append('SELL')
        else:
            signal.append('')
    return averages, signal
def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the sequence
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    for i in range(len(seq) - n + 1):
        yield tuple(seq[i:i + n])
def moving_average(data, size):
    return [(sum(grp)/len(grp)) for grp in window(data, n=size)]
def report(*args):
    for row in zip(*args):
        print(''.join(map('{:>10}'.format, row)))
dates = ['2012-10-01',
'2012-10-02',
'2012-10-03',
'2012-10-04',
'2012-10-05',
'2012-10-08',
'2012-10-09',
'2012-10-10',
'2012-10-11',
'2012-10-12',
'2012-10-15',
'2012-10-16',
'2012-10-17',
'2012-10-18',
'2012-10-19',
'2012-10-22',
'2012-10-23',
'2012-10-24']
prices = [659.38999999999999,
661.30999999999995,
671.45000000000005,
666.79999999999995,
652.59000000000003,
638.16999999999996,
635.85000000000002,
640.90999999999997,
628.10000000000002,
629.71000000000004,
634.75999999999999,
649.78999999999996,
644.61000000000001,
632.63999999999999,
609.84000000000003,
634.02999999999997,
613.36000000000001,
616.83000000000004]
averages, signals = calculate(prices, size=10)
report(dates, prices, averages, signals)
which yields
2012-10-01 659.39 nan
2012-10-02 661.31 nan
2012-10-03 671.45 nan
2012-10-04 666.8 nan
2012-10-05 652.59 nan
2012-10-08 638.17 nan
2012-10-09 635.85 nan
2012-10-10 640.91 nan
2012-10-11 628.1 nan
2012-10-12 629.71 648.428
2012-10-15 634.76 645.965
2012-10-16 649.79 644.813 BUY
2012-10-17 644.61 642.129
2012-10-18 632.64 638.713 SELL
2012-10-19 609.84 634.438
2012-10-22 634.03 634.024 BUY
2012-10-23 613.36 631.775 SELL
2012-10-24 616.83 629.367
I am new to SML and I have written a program that takes two years and compares them, then takes two months and compares them, and finally two days.
The problem I'm having is that if the second year is older than the first, it should stop and return false, but somehow (I'm not sure if it is my logic or something else) it continues to check the months and then the days before returning true or false.
I want it to check the month only if the year comparison is false, and to check the day only if the month comparison is false.
fun is_older(year1 : int, month1 : int, day1 : int, year2 : int, month2 : int, day2 : int) =
    if year1 < year2 andalso year1 > 0
    then true
    else
        if month1 < month2 andalso month1 > 0 andalso month2 <= 12
        then true
        else
            if day1 < day2 andalso day1 > 0 andalso day2 <= 31
            then true
            else false;
I'm assuming you're trying to compare two dates and return a true/false value. What you did is mostly correct. In your second if statement, you want to check month1 < month2 only if year1 = year2. Otherwise, even if year1 = 2014 and year2 = 2013, you'll get true whenever the months satisfy your second if statement.
Similarly, in your third if statement, you want to check the days only if year1 = year2 andalso month1 = month2.
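The corrected control flow is sketched below in Python rather than SML, purely to show the order of the checks; it translates line by line into nested if/else expressions. The range checks from the question (year1 > 0, month2 <= 12, day2 <= 31) are left out here, which is a simplification.

```python
def is_older(year1, month1, day1, year2, month2, day2):
    """Return True if the first date comes strictly before the second."""
    if year1 != year2:
        # The years differ, so they alone decide; months and days are ignored.
        return year1 < year2
    if month1 != month2:
        # Same year: the months decide.
        return month1 < month2
    # Same year and month: the days decide.
    return day1 < day2


print(is_older(2014, 1, 1, 2013, 12, 31))  # False
print(is_older(2013, 12, 31, 2014, 1, 1))  # True
```

Each later field is only consulted when all earlier fields are equal, which is what stops a later year from being overridden by an earlier month or day.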