Combined Bar plots in python - python-2.7

I have a grouped dataframe which is this:
             Speed (mph)
Label Hour
5     13       18.439730
      14       24.959555
      15       33.912493
7     13       23.397055
      14       18.497228
      15       33.493978
12    13       32.851146
      14       33.187193
      15       32.597150
14    13       14.491841
      14       12.397724
      15       19.581669
21    13       34.985289
      14       34.817009
      15       34.888187
26    13       35.813901
      14       36.622450
      15       36.540348
28    13       33.761174
      14       33.951116
      15       33.736014
29    13       34.545862
      14       34.227974
      15       34.435377
I am trying to plot a bar chart where the bars are grouped by Label and Hour.
An example: the grouped bar chart I have in mind is just an example I found on the internet; I don't really need the lines or the numbers over the bars.
I tried plotting like this:
import matplotlib.pyplot as plt

newdf.plot.bar()
plt.show()
which gives me a single flat row of bars, one per (Label, Hour) combination.
Question
How can I plot the graph so that the bars for Label 5, Hours 13, 14, 15 are close together, then some space, then Label 7, Hours 13, 14, 15 close together, and so on?

It seems you need to unstack, so that Hour moves into the columns and the bars cluster by Label:
df.unstack().plot.bar()
plt.show()
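
For context, here is a minimal runnable sketch of the whole round trip, rebuilding a small slice of the grouped data above (only Labels 5 and 7) so the effect of the unstack step is visible:

import pandas as pd
import matplotlib.pyplot as plt

# Rebuild part of the grouped Series with a (Label, Hour) MultiIndex
index = pd.MultiIndex.from_tuples(
    [(5, 13), (5, 14), (5, 15), (7, 13), (7, 14), (7, 15)],
    names=["Label", "Hour"],
)
speeds = pd.Series(
    [18.439730, 24.959555, 33.912493, 23.397055, 18.497228, 33.493978],
    index=index, name="Speed (mph)",
)

# unstack() pivots Hour into columns, so plot.bar() draws one cluster
# of Hour bars per Label, with spacing between Labels
speeds.unstack().plot.bar()
plt.show()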

Issue generating DAG using CausalFS cpp package

Link to the CausalFS GitHub. I'm using v2.0 of the CausalFS C++ package by Kui Yu.
Upon running the structure learning algorithms, my DAG and Markov blankets (MBs) do not match.
I'm trying to generate a DAG from the data given in CDD/data/data.txt and CDD/data.txt via some of the local-to-global structure learning algorithms mentioned in the manual (PCMB-CSL, STMB-CSL, etc.), running the commands as given by the manual (pg. 18 of 26).
But my resulting DAG is just filled with zeros (for the most part). Given that this is an example dataset, that looks suspicious. Upon checking CDD/mb/mb.out, I find that the Markov blankets for the variables do not agree with the DAG output.
For example, running ./main ./data/data.txt ./data/net.txt 0.01 PCMB-CSL "" "" -1 gives a 1 at position (1,22) (one-indexed) only; relaxing the alpha value to 0.1 (kept at 0.01 in the example) gives just one more 1. However, this doesn't agree with the output MB for each variable, which looks like this (upon running IAMB as ./main ./data/data.txt ./net/net.txt 0.01 IAMB all "" ""):
0 21
1 22 26 28
2 29
3 14 21
4 5 12
5 4 12
6 8 12
7 8 12
8 6 7 12
9 11 15
10 35
11 9 12 15 33
12 4 6 7 8 11 13
13 8 12 14 15 17 30 34
14 3 13 20
15 8 9 11 13 14 17 30
16 15
17 13 15 18 27 30
18 17 19 20 27
19 18 20
20 14 18 21 28
21 0 3 20 26
22 1 21 23 24 28
23 1 22 24
24 5 22 23 25
25 24
26 1 21 22
27 17 18 28 29
28 1 18 21 22 27 29
29 2 27
30 13 14 15 17
31 34
32 15 18 34
33 11 12 32 35 36
34 30 31 32 35 36
35 10 33 34
36 33 34 35
Such an MB profile suggests the DAG should be much more connected.
I would love to hear suggestions from people who've managed to get the package to behave appropriately; I just do not understand where the error on my side is. (I'm running PopOS 20.04.)
Thanks a bunch <3
P.S. The output files are appended to on every rerun of the code, so make sure to delete them between runs.
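
As a quick sanity check on the mismatch, one can count edges in the DAG output and compare against the sizes of the MB sets: every DAG edge implies mutual MB membership, so a near-empty DAG alongside MBs like the above is inconsistent. A minimal sketch, assuming the DAG is written as a whitespace-separated 0/1 adjacency matrix and the MB file has one "variable member member ..." line per variable (hypothetical paths; the package's exact file formats may differ):

import numpy as np

# Hypothetical paths; adjust to wherever the package writes its output
dag = np.loadtxt("dag.out", dtype=int)
print("edges in DAG:", int(dag.sum()))

total_memberships = 0
with open("mb.out") as f:
    for line in f:
        parts = line.split()
        if parts:
            total_memberships += len(parts) - 1  # first token is the variable id
print("total MB memberships:", total_memberships)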

Add a dynamic constant line based on a column in Power BI

I am trying to plot a line chart in Power BI with a reference line based on another column's value. I have data representing the journeys of different cars on different sections of road, and I am plotting the journeys that travel over the same section of road, e.g. RoadId 10001.
Distance JourneyNum Speed ThresholdSpeed RoadId
1 10 50 60 10001
2 10 51 60 10001
3 10 52 60 10001
1 11 45 60 10001
2 11 46 60 10001
3 11 47 60 10001
7 12 20 30 10009
8 12 21 30 10009
9 12 22 30 10009
10 12 23 30 10009
So currently I have:
Distance on x-axis (Axis),
Speed on y-axis (Values),
JourneyNum as the Legend (Legend),
filter to roadId 10001
I want to also add ThresholdSpeed as a reference line (or just as another line would do). Any help?
I don't think it's possible (yet) to pass a measure to a constant line, so you'll need a different approach.
One possibility is to reshape your data so that ThresholdSpeed appears as part of your Legend. You can do this in DAX like so:
Table2 =
VAR NewRows =
    SELECTCOLUMNS (
        Table1,
        "Distance", Table1[Distance],
        "JourneyNum", "Threshold",  // constant legend value for the threshold series
        "Speed", Table1[ThresholdSpeed],  // plot the threshold as the speed
        "ThresholdSpeed", Table1[ThresholdSpeed],
        "RoadId", Table1[RoadId]
    )
RETURN
    UNION ( Table1, DISTINCT ( NewRows ) )
Which results in a table like this:
Distance JourneyNum Speed ThresholdSpeed RoadId
1 10 50 60 10001
2 10 51 60 10001
3 10 52 60 10001
1 11 45 60 10001
2 11 46 60 10001
3 11 47 60 10001
1 Threshold 60 60 10001
2 Threshold 60 60 10001
3 Threshold 60 60 10001
7 12 20 30 10009
8 12 21 30 10009
9 12 22 30 10009
10 12 23 30 10009
7 Threshold 30 30 10009
8 Threshold 30 30 10009
9 Threshold 30 30 10009
10 Threshold 30 30 10009
Then you make a line chart on this table instead, with the "Threshold" series appearing as one more line in the legend.
Note: It's probably preferable to do this transformation in the query editor, though, so you don't have redundant tables.
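
For completeness, if you prepare the data outside Power BI (e.g. in Python before loading), the same reshape is a concatenation of the original rows with deduplicated threshold rows; a sketch using the sample columns above:

import pandas as pd

df = pd.DataFrame({
    "Distance":       [1, 2, 3, 1, 2, 3],
    "JourneyNum":     [10, 10, 10, 11, 11, 11],
    "Speed":          [50, 51, 52, 45, 46, 47],
    "ThresholdSpeed": [60] * 6,
    "RoadId":         [10001] * 6,
})

# Synthetic "Threshold" journey: Speed set to the threshold value
threshold_rows = (
    df.assign(JourneyNum="Threshold", Speed=df["ThresholdSpeed"])
      .drop_duplicates()          # mirrors DISTINCT in the DAX above
)

combined = pd.concat([df, threshold_rows], ignore_index=True)
print(combined)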

Facebook Graph API: page_fans_online, localizing dates to where the business is located

I am trying to replicate the data used in the "When your fans are online" section of a business page's insights dashboard. I am using the following parameters in the /insights/page_fans_online API call, which returns the data I am after:
parameters={'period':'day','since':'2018-10-20T07:00:00','until':'2018-10-21T07:00:00','access_token':page_token['access_token'][0]}
The data returned can be seen below, where:
end_time = the end of the period (based on the since & until dates in the parameters)
name = metric
apiHours = hour of day returned
localDate = localized date (applied manually)
localHours = apiHours minus a 6 hour offset, to localize to Auckland, New Zealand (applied manually to replicate what is seen on the insights dashboard)
fansOnline = number of unique page fans online during that hour
Data:
end_time name apiHours localDate localHours fansOnline
2018-10-21T07:00:00+0000 page_fans_online 0 2018-10-19 18 21
1 2018-10-19 19 29
2 2018-10-19 20 20
3 2018-10-19 21 18
4 2018-10-19 22 20
5 2018-10-19 23 15
6 2018-10-19 0 4
7 2018-10-19 1 6
8 2018-10-19 2 5
9 2018-10-19 3 8
10 2018-10-19 4 17
11 2018-10-19 5 19
12 2018-10-19 6 26
13 2018-10-19 7 24
14 2018-10-19 8 20
15 2018-10-19 9 22
16 2018-10-19 10 19
17 2018-10-19 11 22
18 2018-10-19 12 18
19 2018-10-19 13 18
20 2018-10-19 14 18
21 2018-10-19 15 18
22 2018-10-19 16 21
23 2018-10-19 17 28
It took a while to work out that the data returned when pulling page_fans_online with the parameters specified above is for October 19th, for a New Zealand business page.
If we look at the last row in the data above:
end_time = 2018-10-21
apiHours = 23
localDate = 2018-10-19
localHours = 17
fansOnline = 28
It is saying that on 2018-10-21 at 11 pm there were 28 unique fans online. Once the dates and times are manually localized, this translates to 28 unique fans online on 2018-10-19 at 5 pm (I worked the offset out by checking the "When your fans are online" graphs in the page insights).
That is a -54 hour offset between 2018-10-21 11:00 pm and 2018-10-19 5:00 pm. My question is: what is the logic behind the end_time and hour of day returned by the page_fans_online insights metric, and is there any info on how these should be localized depending on the country the business is located in?
There is only a simple description of page_fans_online in the page/insights docs; it says the hours are in PST/PDT, but that does not help with localizing the date and hour of day:
https://developers.facebook.com/docs/graph-api/reference/v3.1/insights
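
For what it's worth, the documented PST/PDT interpretation can at least be expressed mechanically. Below is a sketch (Python 3.9+ zoneinfo; timezone names and the "day ending at end_time" reading are my assumptions) that treats end_time as the close of a 24-hour Pacific-time day and the hour key as the hour within that day, then converts to Auckland time. Notably, it does not reproduce the dashboard's 2018-10-19 dates, which is exactly the discrepancy in question:

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")   # PST/PDT per the docs
AUCKLAND = ZoneInfo("Pacific/Auckland")

def bucket_to_auckland(end_time: str, api_hour: int) -> datetime:
    # end_time as returned by the API, e.g. "2018-10-21T07:00:00+0000"
    end_utc = datetime.strptime(end_time, "%Y-%m-%dT%H:%M:%S%z")
    # Assume the 24 buckets cover the Pacific-time day ending at end_time
    day_start = end_utc.astimezone(PACIFIC) - timedelta(hours=24)
    return (day_start + timedelta(hours=api_hour)).astimezone(AUCKLAND)

print(bucket_to_auckland("2018-10-21T07:00:00+0000", 23))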

python: obtaining a column of dates from year-month-day columns

Suppose I have a very simple dataframe:
>>> a
Out[158]:
monthE yearE dayE
0 10 2014 15
1 2 2012 15
2 2 2014 15
3 12 2015 15
4 2 2012 15
Suppose that I want to create a date column for every row, using the three integer columns.
For a single date it is enough to do:
>>> datetime.date(1983,11,8)
Out[159]: datetime.date(1983, 11, 8)
But if I try to create a whole column of dates (theoretically a very basic request) like this:
a.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']))
I obtain the following error:
KeyError: ('yearE', u'occurred at index monthE')
Without axis=1, apply passes each column (a Series) into the function, which is why the KeyError mentions the column names. You can first remove the last character E from the column names and then use to_datetime, but then you get pandas Timestamps, not Python dates:
df.columns = df.columns.str[:-1]
df['date'] = pd.to_datetime(df)
#if multiple columns filter by subset
#df['date'] = pd.to_datetime(df[['year','month','day']])
print (df)
month year day date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
datetime64[ns]
print (df.date.iloc[0])
2014-10-15 00:00:00
print (type(df.date.iloc[0]))
<class 'pandas.tslib.Timestamp'>
Thank you MaxU for the solution:
df['date'] = pd.to_datetime(df.rename(columns = lambda x: x[:-1]))
#if another columns in df
#df['date'] = pd.to_datetime(df[['yearE','monthE','dayE']].rename(columns=lambda x: x[:-1]))
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
But if you really need Python dates, add axis=1 to apply; note that the resulting object-dtype column cannot be used with some pandas datetime functionality:
df['date'] =df.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']), axis=1)
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
object
print (df.date.iloc[0])
2014-10-15
print (type(df.date.iloc[0]))
<class 'datetime.date'>
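
A middle ground, if you want Python date objects but prefer the vectorized to_datetime path, is converting afterwards with the .dt.date accessor (this also produces an object-dtype column, so the same caveats apply):

df['date'] = pd.to_datetime(
    df[['yearE', 'monthE', 'dayE']].rename(columns=lambda x: x[:-1])
).dt.date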

If the index is duplicated, add the column value to a running sum

The pandas DataFrame has a datetime index with the price (Last) and the volume traded at that price.
Last Volume
Date_Time
20160907 070000 1.1249 17
20160907 070001 1.1248 12
20160907 070001 1.1249 15
20160907 070002 1.1248 13
20160907 070002 1.1249 20
I want to create a column that keeps a running total (sum) of volume through the sequence whenever the price repeats. I am trying to create a column that would look like this:
Last Volume VolumeCount
1.1249 17 17
1.1248 12 12
1.1249 15 32
1.1248 13 25
1.1249 20 52
I have been working on different functions and loops, but I can't seem to create a column that isn't just the total sum of the group. I would really appreciate any help or suggestions. Thank you.
Try:
DF['VolumeCount'] = DF.groupby('Last')['Volume'].cumsum()
I hope this helps.
You want to accumulate volume over contiguous runs of the same Last value.
Consider this df:
Last Volume
Date_Time
20160907-70000 1.1249 17
20160907-70001 1.1248 12
20160907-70001 1.1248 15
20160907-70002 1.1248 13
20160907-70002 1.1249 20
Then
df.Volume.groupby((df.Last != df.Last.shift()).cumsum()).cumsum()
Date_Time
20160907-70000 17
20160907-70001 12
20160907-70001 27
20160907-70002 40
20160907-70002 20
Name: Volume, dtype: int64
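
For reference, the grouper (df.Last != df.Last.shift()).cumsum() builds a run identifier: the comparison flags each row where the price changes, and the cumulative sum turns those flags into a distinct id per contiguous run. A self-contained sketch of the idea:

import pandas as pd

df = pd.DataFrame({
    "Last":   [1.1249, 1.1248, 1.1248, 1.1248, 1.1249],
    "Volume": [17, 12, 15, 13, 20],
})

new_run = df["Last"] != df["Last"].shift()   # True at the start of each run
run_id = new_run.cumsum()                    # run ids: 1, 2, 2, 2, 3
df["VolumeCount"] = df.groupby(run_id)["Volume"].cumsum()
print(df)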