A way to combine dictionaries with other data to write a csv - list

I have four dictionaries which I need to combine, alongside other data, into a csv file and wondered if anyone knew of a good approach.
Dictionaries
# dict format is "Frequency": Vout
up_50pc = {230: 5.93, 235: 5.96, 240: 5.96, 245: 5.98, 250: 6.0, 255: 5.99, 260: 6.05, 265: 5.97, 270: 5.99}
down_50pc = {270: 6.01, 265: 6.0, 260: 6.02, 255: 5.97, 250: 5.97, 245: 6.0, 240: 6.0, 235: 5.98, 230: 5.95}
up_100pc = {230: 5.92, 235: 5.97, 240: 5.97, 245: 5.97, 250: 5.95, 255: 5.97, 260: 6.04, 265: 6.0, 270: 5.97}
down_100pc = {270: 5.96, 265: 5.99, 260: 6.02, 255: 5.98, 250: 5.97, 245: 5.96, 240: 5.94, 235: 5.95, 230: 5.89}
Other data
Vin = 12.34
duty = 50
dir_up = "UP"
dir_down = "DOWN"
headers = ["Duty", "Direction", "Vin", "Frequency", "Vout"]
I'm not sure whether I need to go through each dict, write the values into a temporary dict, and append the values from the temporary dict to the csv, or whether I should build some sort of list of lists or nested dict.
mydict = {"Duty": "blank",
"Direction": "blank",
"Vin": "blank",
"Frequency": "blank",
"Vout":"blank"}
I know I can combine dicts, e.g. a = {"A":1} and b = {"B":2} using c = {**a, **b}. I just don't know how to begin to think about getting the extra data in so that the csv output looks something like the one below.
Desired output
"Duty", "Direction", "Vin", "Frequency", "Vout"
50, UP, 12.34, 230, 5.93
50, UP, 12.34, 235, 5.96
50, UP, 12.34, 240, 5.96
...
50, UP, 12.34, 270, 5.99
...
50, DOWN, 12.34, 270, 6.01
50, DOWN, 12.34, 265, 6.0
...
50, DOWN, 12.34, 230, 5.95
...
100, UP, 12.34, 230, 5.92
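One possible approach, sketched under some assumptions: pair each dictionary with the duty and direction it belongs to, then flatten everything into a list of rows and hand that to csv.writer. The names sweeps, build_rows and write_csv are made up for this illustration, and the pairing of each dict with a duty of 50 or 100 is inferred from the variable names. Rows come out in dict insertion order, which relies on Python 3.7+.

```python
import csv
import io

# dict format is "Frequency": Vout (data copied from the question)
up_50pc = {230: 5.93, 235: 5.96, 240: 5.96, 245: 5.98, 250: 6.0, 255: 5.99, 260: 6.05, 265: 5.97, 270: 5.99}
down_50pc = {270: 6.01, 265: 6.0, 260: 6.02, 255: 5.97, 250: 5.97, 245: 6.0, 240: 6.0, 235: 5.98, 230: 5.95}
up_100pc = {230: 5.92, 235: 5.97, 240: 5.97, 245: 5.97, 250: 5.95, 255: 5.97, 260: 6.04, 265: 6.0, 270: 5.97}
down_100pc = {270: 5.96, 265: 5.99, 260: 6.02, 255: 5.98, 250: 5.97, 245: 5.96, 240: 5.94, 235: 5.95, 230: 5.89}

Vin = 12.34
headers = ["Duty", "Direction", "Vin", "Frequency", "Vout"]

# Hypothetical structure: (duty, direction, readings) for each sweep.
sweeps = [
    (50, "UP", up_50pc),
    (50, "DOWN", down_50pc),
    (100, "UP", up_100pc),
    (100, "DOWN", down_100pc),
]


def build_rows(sweeps, vin):
    """Flatten every (frequency, vout) pair into one row per measurement."""
    rows = []
    for duty, direction, readings in sweeps:
        for frequency, vout in readings.items():
            rows.append([duty, direction, vin, frequency, vout])
    return rows


def write_csv(rows, headers, stream):
    """Write the header line followed by all rows to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(headers)
    writer.writerows(rows)


rows = build_rows(sweeps, Vin)

# Written to an in-memory buffer here; a real file would be opened with
# open("results.csv", "w", newline="") and passed in the same way.
buffer = io.StringIO()
write_csv(rows, headers, buffer)
csv_text = buffer.getvalue()
```

Keeping the row building separate from the writing means the same rows could also be fed to csv.DictWriter or a DataFrame later without touching the loop.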

Related

Pyplot Barchart: Bars not grouping around xticks properly

I'm trying to group four bars around the xticks in a bar chart. Here's some sample data (mind you, I'm running this in Python 2.7) and my code.
import matplotlib.pyplot as plt
import numpy as np
xps_s1 = range(2008, 2019)
xps_s2 = range(2012, 2019)
xps_s3 = range(2013, 2019)
xps_s4 = range(2014, 2019)
yps_s1 = [94.6, 93.9, 93, 94.7, 94.6, 95.4, 95, 93.6, 93, 93.6, 92.2]
yps_s2 = [81.5, 90.2, 91.5, 94, 95, 94.3, 95.3]
yps_s3 = [83.9, 92.7, 93.3, 94.4, 94.4, 94.6]
yps_s4 = [90.6, 95, 94.8, 94, 93.9]
y_means = [94.6, 93.9, 93, 94.7, np.mean([81.5, 94.6]),
np.mean([83.9, 90.2, 95.4]), np.mean([92.7, 91.5, 95, 90.6]),
np.mean([93.3, 94, 93.6, 95]), np.mean([94.4, 95, 93, 94.8]),
np.mean([94.4, 94.3, 93.6, 94]), np.mean([91.4, 94.6, 95.3, 92.2, 93.9])]
fig = plt.subplots()
ax = plt.axes(xlim=(2007,2019), ylim=(75, 100))
w = 0.2
plt.xticks(np.arange(2008, 2019, step = 1))
rects1 = ax.bar([x-w for x in xps_s1], yps_s1, width=w, align="center",
color='goldenrod', label='Sample1')
rects2 = ax.bar([x-w*2 for x in xps_s2], yps_s2, width=w, align="center",
color='grey', label='Sample2')
rects3 = ax.bar([x+w for x in xps_s3], yps_s3, width=w, align="center",
color='silver', label='Sample3')
rects4 = ax.bar([x+w*2 for x in xps_s4], yps_s4, width=w, align="center",
color='thistle', label='Sample4')
mean_line =ax.plot(xps_s1,y_means, label='Overall',
linestyle='-', color = "indianred")
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
When I had three bars I set w = 0.3 and the bars grouped nicely around the ticks (I had rects1 sit snugly atop the tick, the other two right up against its flanks, and the remaining 0.1 of width set the years apart).
Now with the above code they don't seem to be related to any tick really and they don't group properly.
What am I doing wrong?
Thanks a lot in advance!
I think you want to use align='edge' to simplify the calculations. Is this what you are trying to obtain?
import matplotlib.pyplot as plt
import numpy as np
xps_s1 = range(2008, 2019)
xps_s2 = range(2012, 2019)
xps_s3 = range(2013, 2019)
xps_s4 = range(2014, 2019)
yps_s1 = [94.6, 93.9, 93, 94.7, 94.6, 95.4, 95, 93.6, 93, 93.6, 92.2]
yps_s2 = [81.5, 90.2, 91.5, 94, 95, 94.3, 95.3]
yps_s3 = [83.9, 92.7, 93.3, 94.4, 94.4, 94.6]
yps_s4 = [90.6, 95, 94.8, 94, 93.9]
y_means = [94.6, 93.9, 93, 94.7, np.mean([81.5, 94.6]),
np.mean([83.9, 90.2, 95.4]), np.mean([92.7, 91.5, 95, 90.6]),
np.mean([93.3, 94, 93.6, 95]), np.mean([94.4, 95, 93, 94.8]),
np.mean([94.4, 94.3, 93.6, 94]), np.mean([91.4, 94.6, 95.3, 92.2, 93.9])]
fig, ax = plt.subplots()
ax.set_xlim(2007, 2019)
ax.set_ylim(75, 100)
w = 0.2
plt.xticks(np.arange(2008, 2019, step = 1))
rects1 = ax.bar([x-w for x in xps_s1], yps_s1, width=w, align="edge",
color='goldenrod', label='Sample1')
rects2 = ax.bar([x-w*2 for x in xps_s2], yps_s2, width=w, align="edge",
color='grey', label='Sample2')
rects3 = ax.bar([x for x in xps_s3], yps_s3, width=w, align="edge",
color='silver', label='Sample3')
rects4 = ax.bar([x+w for x in xps_s4], yps_s4, width=w, align="edge",
color='thistle', label='Sample4')
mean_line =ax.plot(xps_s1,y_means, label='Overall',
linestyle='-', color = "indianred")
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
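To see why the original layout drifted away from the ticks: with align="center" and offsets of -2w, -w, +w and +2w, the four bars cover everything from x-2.5w to x+2.5w except a gap of one bar width right at the tick. With w = 0.2 that span is a full unit, so neighbouring groups touch and the only gap sits on the tick itself. The sketch below computes offsets that centre any number of bars on a tick. The helper name group_offsets is made up for this illustration, and it assumes the bars are drawn with align="edge" as in the code above.

```python
def group_offsets(n_bars, width):
    """Left-edge offsets that centre n_bars of the given width on a tick.

    Intended for ax.bar(..., align="edge"); add width / 2 to each offset
    to get the equivalent positions for align="center".
    """
    total = n_bars * width
    return [i * width - total / 2 for i in range(n_bars)]


w = 0.2
offsets = group_offsets(4, w)  # roughly -2w, -w, 0, +w, as used above

# Left edges for one of the samples; each list could be passed as the
# x positions of one ax.bar call with width=w and align="edge".
years = range(2014, 2019)
left_edges = [[year + offset for year in years] for offset in offsets]
```

The group then occupies 4 * w = 0.8 of each unit, leaving 0.2 as spacing between years.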

Force ylim range in subgraph

When plotting a series of subplots with matplotlib, I can't set the ylim range properly.
Here's part of the code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
(...) # loading npy data
titles = ["basestr1", "basestr2", "basestr3", "basestr4", "basestr5"]
labels = ["baselab1", "baselab2", "baselab3", "baselab4", "baselab5"]
linew = 2.24
ms = 10
mw = 2
fc = (1,1,1)
bc = (1,1,1)
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=[(1,0.4,0.4), (0.1,0.6,0.1), (0.04,0.2,0.04)])
mpl.rcParams.update({'font.size': 12})
fig2, _ = plt.subplots(2, 2, figsize=(12,9), facecolor=fc)  # subplots returns (figure, axes)
plt.rc('font', family='serif')
ax0 = plt.subplot(221)
ax1 = plt.subplot(222)
ax2 = plt.subplot(223)
ax3 = plt.subplot(224)
axl = [ax0, ax1, ax2, ax3]
em = []
fp = []
fn = []
gm = []
for c,element in enumerate(elements):
    em.append([i[0] for i in element])
    fp.append([i[1][1] if 1 in i[1] else 0 for i in element]) # red
    fn.append([i[1][2] if 2 in i[1] else 0 for i in element]) # light green
    gm.append([i[1][3] if 3 in i[1] else 0 for i in element]) # dark green
    axl[c].semilogy(em[c], fp[c], "-x", lw=linew, markersize=ms, mew=mw) # red
    axl[c].semilogy(em[c], fn[c], "-x", lw=linew, markersize=ms, mew=mw) # light green
    axl[c].semilogy(em[c], gm[c], "-o", lw=linew, markersize=ms, mew=mw, mfc='None') # dark green
    axl[c].set_ylim([-10, 200]) # <-- Here's the issue; it seems not to work properly.
    axl[c].grid(True,which="both")
    axl[c].set_title(titles[c])
    axl[c].set_xlabel(labels[c])
    axl[c].set_ylabel(r'Count')
plt.legend(['False', 'True', 'Others'], loc=3, bbox_to_anchor=(.62, 0.4), borderaxespad=0.)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
plt.savefig('/home/username/Desktop/figure.png',
facecolor=fig2.get_facecolor(),edgecolor='w',orientation='landscape',papertype=None,
format=None, transparent=False, bbox_inches=None, pad_inches=0.1,
frameon=None)
plt.show() # block=False
Here elements is a list containing 4 arrays.
Each of these arrays looks like:
elements[0]
Out[16]:
array([[1, {0.0: 1252, 1.0: 11, 2.0: 170, 3.0: 11}],
[2, {0.0: 1251, 1.0: 12, 2.0: 163, 3.0: 18}],
[3, {0.0: 1229, 1.0: 34, 2.0: 148, 3.0: 33}],
...,
[6, {0.0: 1164, 1.0: 99, 2.0: 125, 3.0: 56}],
[7, {0.0: 1111, 1.0: 152, 2.0: 105, 3.0: 76}],
[8, {0.0: 1056, 1.0: 207, 2.0: 81, 3.0: 100}]], dtype=object)
Where am I wrong?
I can set any values I want in axl[c].set_ylim([-10, 200]); it doesn't change anything in the output graph.
Update:
OK, it seems it is not possible to use any value other than 1 as the starting y-axis value here.
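A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: semilogy puts the y-axis on a log scale, and a log axis cannot represent zero or negative values, so a lower limit of -10 cannot be honoured. Any strictly positive lower limit should be accepted. The helper below, whose name positive_ylim is made up for this illustration, derives positive limits from the data. The sample counts are the rows of elements[0] visible above, plus one illustrative zero standing in for a missing key.

```python
def positive_ylim(series, pad=2.0):
    """Return (ymin, ymax) that are valid limits for a log-scaled axis.

    Zeros and negatives are skipped because a log axis cannot show them.
    The limits are padded by the factor `pad` on both sides.
    """
    positives = [v for values in series for v in values if v > 0]
    if not positives:
        raise ValueError("a log axis needs at least one positive value")
    return min(positives) / pad, max(positives) * pad


# Counts visible in elements[0]; the trailing 0 is an illustrative
# "missing key" entry, not part of the original data.
fp = [11, 12, 34, 99, 152, 207, 0]
fn = [170, 163, 148, 125, 105, 81]
gm = [11, 18, 33, 56, 76, 100]

ymin, ymax = positive_ylim([fp, fn, gm])

# Inside the plotting loop this would replace the [-10, 200] call:
#     axl[c].set_ylim(ymin, ymax)
```

If the zero counts need to stay visible, a linear axis is the simpler choice, since the points at 0 have no position on a log axis.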

Grouped line charts using pandas and matplotlib

I have a dataset like this:
DataSet image
DataSet can be found here: https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/1tabledatadecoverviewpdf/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.xls
I want to plot a line chart containing a line for every crime rate by year.
Something like this:
Crime Rate graph
But the graph shows fractional years on the x-axis, such as 2005.5 and 2007.5.
Can anyone help, or suggest a better approach to do this? Thanks.
Here is the code:
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import plotly.plotly as py
import seaborn as sns
cd =pd.read_clipboard() #after copying the dataset from given url above
yearRate = cd[['Year','ViolentCrimeRate','MurderRate','RapeRate','RobberyRate','AggravatedAssaultRate','PropertyCrimeRate','BurglaryRate','LarcenyTheftRate','MotorVehicleTheftRate']]
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
plt.figure(figsize=(20,15))
ax = plt.subplot(111)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylim(0,5000)
plt.xlim(1994, 2013)
plt.yticks(fontsize=14)
plt.xticks(fontsize=14)
for y in range(0, 5000, 1000):
    plt.plot(range(1994, 2013), [y] * len(range(1994, 2013)), "--", lw=0.5, color="black", alpha=0)
rates=['ViolentCrimeRate','MurderRate','RapeRate','RobberyRate','AggravatedAssaultRate','PropertyCrimeRate','BurglaryRate','LarcenyTheftRate','MotorVehicleTheftRate']
for rank, column in enumerate(rates):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plt.plot(yearRate.Year.values,yearRate[column.replace("\n", " ")].values,lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = yearRate[column.replace("\n", " ")].values[-1] - 0.5
    if column == "MotorVehicleTheftRate":
        y_pos -= 50
    elif column == "MurderRate":
        y_pos -= 50
    plt.text(2013, y_pos, column, fontsize=14, color=tableau20[rank])
Adding:
plt.xticks(cd['Year'])
solved the issue.

Matplotlib: plotting two legends outside of the axis makes it cutoff by the figure box

Task:
Plot a donut chart with two legends outside of the axis (first legend - on the right side with respect to the figure, second - on the bottom).
Problem:
When saving the figure, part of the 1st legend is cut off [especially when it contains a long text, see example below]
Desired result:
Make a tight layout of the figure by taking into consideration the dimensions of both legends.
Code:
import matplotlib.pyplot as plt
from pylab import *
ioff() # don't show figures
colors = [(102, 194, 165), (252, 141, 98), (141, 160, 203), (231, 138,195),
(166, 216, 84), (255, 217, 47), (171, 197, 233), (252, 205, 229)]
for icol in range(len(colors)):
    red,green,blue = colors[icol]
    colors[icol] = (red / 255., green / 255., blue / 255.)
fig = plt.figure(1, figsize=(8, 8))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
sizes_component_1 = [12, 23, 100, 46]
sizes_component_2 = [15, 30, 45, 10, 44, 45, 50, 70]
component_1 = 'exampleofalongtextthatiscutoff', '2', '3', '4'
component_2 = 'Unix', 'Mac', 'Windows7', 'Windows10', 'WindowsXP', 'Linux', 'FreeBSD', 'Android'
patches1, texts1, autotexts1 = ax.pie(sizes_component_1, radius=1, pctdistance=0.9, colors=colors, autopct='%1.1f%%', shadow=False, startangle=90)
patches2, texts2, autotexts2 = ax.pie(sizes_component_2, radius=0.8, pctdistance=0.6, colors=colors, autopct='%1.1f%%', shadow=False, startangle=90)
# To draw circular donuts
ax.axis('equal')
# Draw white circle
centre_circle = plt.Circle((0,0),0.6,color='black', fc='white')
ax.add_artist(centre_circle)
# Shrink current axis by 20%
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
lgd1=ax.legend(patches1,component_1, frameon=False, loc='center left', bbox_to_anchor=(1.0, 0.8), borderaxespad=0.1)
lgd2=ax.legend(patches2,component_2, frameon=False, loc='center left', ncol=len(patches2)//2, bbox_to_anchor=(0.0, -0.005), borderaxespad=0)  # integer division keeps ncol an int
ax_elem = ax.add_artist(lgd1)
fig.suptitle('Title', fontsize=16)
fig.savefig('donut.png',bbox_extra_artists=(lgd1,lgd2,), bbox_inches='tight')
plt.gcf().clear() # clears buffer
This is a known issue with pie charts: https://github.com/matplotlib/matplotlib/issues/4251
It has not been fixed.

R: regex from string to two dimensional data frame in one command?

I have a string s containing key-value pairs, and I would like to construct a data frame from it:
s="{'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}"
r1<-sapply(strsplit(s, "[^0-9_]+",as.numeric),as.numeric)
r2<-sapply(strsplit(s, "[^A-Z]+",as.numeric),as.character)
d<-data.frame(id=r2,value=r1)
which gives:
r1
[,1]
[1,] NA
[2,] 121
[3,] 938
[4,] 184
[5,] 338
[6,] 52
r2
[,1]
[1,] ""
[2,] "JJ"
[3,] "NN"
[4,] "DT"
[5,] "VB"
[6,] "RB"
d
id value
1 NA
2 JJ 121
3 NN 938
4 DT 184
5 VB 338
6 RB 52
First, I would like to avoid getting the NA and "" after applying the regular expression. I think it should be something like {2,}, meaning match everything from the second occurrence onwards, but I cannot get that to work in R.
Another thing I would like to do: given a data frame with a column like the one below:
m
1 {'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}
2 {'#NN': 168, '#DT': 59, '#VB': 71, '#RB': 5, '#JJ': 35}
3 {'#JJ': 18, '#NN': 100, '#DT': 23, '#VB': 52, '#RB': 11}
4 {'#NN': 156, '#JJ': 39, '#DT': 46, '#VB': 67, '#RB': 21}
5 {'#NN': 112, '#DT': 39, '#VB': 57, '#RB': 8, '#JJ': 32}
6 {'#DT': 236, '#NN': 897, '#VB': 420, '#RB': 122, '#JJ': 240}
7 {'#NN': 316, '#RB': 25, '#DT': 66, '#VB': 112, '#JJ': 81}
8 {'#NN': 198, '#DT': 29, '#VB': 85, '#RB': 37, '#JJ': 44}
9 {'#RB': 30}
10 {'#NN': 373, '#DT': 48, '#VB': 71, '#RB': 21, '#JJ': 36}
11 {'#NN': 49, '#DT': 17, '#VB': 23, '#RB': 11, '#JJ': 8}
12 {'#NN': 807, '#JJ': 135, '#DT': 177, '#VB': 315, '#RB': 69}
I would like to iterate over each row and split its numerical values into columns named by the key.
Example of a few rows showing how I would like it to look:
I would use something that parses JSON, which is what your data seems to be:
s <- "{'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}"
parse.one <- function(s) {
  require(rjson)
  v <- fromJSON(gsub("'", '"', s))
  data.frame(id = gsub("#", "", names(v)),
             value = unlist(v, use.names = FALSE))
}
parse.one(s)
# id value
# 1 JJ 121
# 2 NN 938
# 3 DT 184
# 4 VB 338
# 5 RB 52
For the second part of the question, I would pass a slightly modified version of the parse.one function through lapply, then let plyr's rbind.fill function align the pieces together while filling missing values with NA:
df <- data.frame(m = c(
"{'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}",
"{'#NN': 168, '#DT': 59, '#VB': 71, '#RB': 5, '#JJ': 35}",
"{'#JJ': 18, '#NN': 100, '#DT': 23, '#VB': 52, '#RB': 11}",
"{'#JJ': 12, '#VB': 5}"
))
parse.one <- function(s) {
  require(rjson)
  y <- fromJSON(gsub("'", '"', s))
  names(y) <- gsub("#", "", names(y))
  as.data.frame(y)
}
library(plyr)
rbind.fill(lapply(df$m, parse.one))
# JJ NN DT VB RB
# 1 121 938 184 338 52
# 2 35 168 59 71 5
# 3 18 100 23 52 11
# 4 12 NA NA 5 NA
For now, I'll offer a solution to the first part of your question. Clean up your string and use read.table:
s="{'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}"
read.table(text = gsub(",", "\n", gsub("[{|}|#]", "", s)),
header = FALSE, sep = ":", strip.white=TRUE)
# V1 V2
# 1 JJ 121
# 2 NN 938
# 3 DT 184
# 4 VB 338
# 5 RB 52
For the second part, here's another alternative using concat.split from a package I wrote called "splitstackshape":
Sample data:
df <- data.frame(m = c(
"{'#JJ': 121, '#NN': 938, '#DT': 184, '#VB': 338, '#RB': 52}",
"{'#NN': 168, '#DT': 59, '#VB': 71, '#RB': 5, '#JJ': 35}",
"{'#JJ': 18, '#NN': 100, '#DT': 23, '#VB': 52, '#RB': 11}"
))
Similar cleanup as above, plus add an "id" column.
df$m <- gsub("[{|}|#]", "", df$m)
df$id <- 1:nrow(df)
Load the "splitstackshape" package:
# install.packages("splitstackshape")
library(splitstackshape)
df2 <- concat.split(concat.split.multiple(df, "m", ",", "long"),
"m", ":", drop = TRUE)
## df2 <- df2[complete.cases(df2), ] ##
## ^^ might be necessary if there are NAs in the resulting data.frame
The data are now in a "long" format that is easy to manipulate:
df2
# id time m_1 m_2
# 1 1 1 JJ 121
# 2 2 1 NN 168
# 3 3 1 JJ 18
# 4 1 2 NN 938
# 5 2 2 DT 59
# 6 3 2 NN 100
# 7 1 3 DT 184
# 8 2 3 VB 71
# 9 3 3 DT 23
# 10 1 4 VB 338
# 11 2 4 RB 5
# 12 3 4 VB 52
# 13 1 5 RB 52
# 14 2 5 JJ 35
# 15 3 5 RB 11
Here's an example of manipulating the data, using dcast from the "reshape2" package:
library(reshape2)
dcast(df2, id ~ m_1, value.var="m_2")
# id DT JJ NN RB VB
# 1 1 184 121 938 52 338
# 2 2 59 35 168 5 71
# 3 3 23 18 100 11 52