Reading Data from CSV and fill Empty Values Python - python-2.7

I am reading in a CSV file with the general schema of
,abv,ibu,id,name,style,brewery_id,ounces
14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0
0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0
I am running into problems where fields are missing, such as in object 0, which lacks an IBU. I would like to be able to insert a default value, such as 0.0 for fields that should be floats and an empty string for fields that should be strings.
My code is along the lines of
import csv
import numpy as np

def dataset(path, filter_field, filter_value):
    with open(path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        if filter_field:
            for row in filter(lambda row: row[filter_field]==filter_value, reader):
                yield row

def main(path):
    data = [(row["ibu"], float(row["ibu"])) for row in dataset(path, "style", "American Pale Lager")]
As of right now my code throws an error, since there are empty values in the "ibu" column for object 0.
How should one go about solving this problem?

You can do the following:
pass in a dictionary of default values to use for missing fields,
and update the row under certain conditions, such as when ibu is empty.
This is your implementation changed to do what you need. If I were you I would use pandas ...
import csv, copy

def dataset(path, filter_field, filter_value, default={'brewery_id':-1, 'style': 'unknown style', ' ': -1, 'name': 'unknown name', 'abv':0.0, 'id': -1, 'ounces':-1, 'ibu':0.0}):
    with open(path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # skip rows that do not match the filter
            if row[filter_field].strip() != filter_value:
                continue
            # start from the defaults, then overlay the values read from the file
            default_row = copy.copy(default)
            default_row.update(row)
            # you might want to add conditions for other columns
            if default_row["ibu"] == "":
                default_row["ibu"] = default["ibu"]
            yield default_row

data = [(row["ibu"], float(row["ibu"])) for row in dataset('test.csv', "style", "American Pale Lager")]
print data
>> [(0.0, 0.0)]
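The same idea can be pushed one step further by giving every column both a converter and a default, so that blank fields are filled and non-blank fields are converted in one pass. The following is only a sketch, written for Python 3: the `COLUMNS` table and the `clean_rows` helper are hypothetical names, the default values are assumptions, and the sample text simply repeats the two rows from the question.

```python
import csv
import io

# Hypothetical per-column (converter, default) pairs; the column names come
# from the question's header, the default values are assumptions.
COLUMNS = {
    "abv": (float, 0.0),
    "ibu": (float, 0.0),
    "id": (int, -1),
    "name": (str, ""),
    "style": (str, ""),
    "brewery_id": (int, -1),
    "ounces": (float, 0.0),
}

def clean_rows(csvfile):
    """Yield one dict per row, with blanks replaced by defaults and values converted."""
    # skipinitialspace drops the blank that follows each comma in the sample
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row in reader:
        cleaned = {}
        for column, (convert, default) in COLUMNS.items():
            raw = (row.get(column) or "").strip()
            cleaned[column] = convert(raw) if raw else default
        yield cleaned

sample = io.StringIO(
    ",abv,ibu,id,name,style,brewery_id,ounces\n"
    "14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0\n"
    "0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0\n"
)
rows = list(clean_rows(sample))
lagers = [r for r in rows if r["style"] == "American Pale Lager"]
print([(r["name"], r["ibu"]) for r in lagers])
```

Because the values are stripped before comparison, the filter on style no longer needs its own `.strip()` call.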

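Why don't you use pandas?
import pandas as pd
df = pd.read_csv(data_file)
The following is the result. Note that the missing ibu is read as NaN, which can later be replaced with a default using df.fillna.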
In [13]: df
Out[13]:
   Unnamed: 0    abv   ibu    id          name                    style  \
0          14  0.061  60.0  1979  Bitter Bitch  American Pale Ale (APA)
1           0  0.050   NaN  1436      Pub Beer      American Pale Lager

   brewery_id  ounces
0         177    12.0
1         408    12.0

Simulating your file with a text string:
In [48]: txt=b""" ,abv,ibu,id,name,style,brewery_id,ounces
...: 14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0
...: 0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0
...: """
I can load it with numpy genfromtxt.
In [49]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None,skip_header=1,filling_values=0)
In [50]: data
Out[50]:
array([ (14, 0.061, 60., 1979, b'Bitter Bitch', b'American Pale Ale (APA)', 177, 12.),
( 0, 0.05 , 0., 1436, b' Pub Beer', b' American Pale Lager', 408, 12.)],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<i4'), ('f4', 'S12'), ('f5', 'S23'), ('f6', '<i4'), ('f7', '<f8')])
In [51]:
I had to skip the header line because it is incomplete (a blank for the 1st field). The result is a structured array - a mix of ints, floats and strings (bytestrings in Py3).
After correcting the header line, and using names=True, I get
array([ (14, 0.061, 60., 1979, b'Bitter Bitch', b'American Pale Ale (APA)', 177, 12.),
( 0, 0.05 , 0., 1436, b' Pub Beer', b' American Pale Lager', 408, 12.)],
dtype=[('f0', '<i4'), ('abv', '<f8'), ('ibu', '<f8'), ('id', '<i4'), ('name', 'S12'), ('style', 'S23'), ('brewery_id', '<i4'), ('ounces', '<f8')])
genfromtxt is the most powerful csv reader in numpy. See its docs for more parameters. The pandas reader is faster and more flexible, but of course it produces a data frame, not an array.

Related

Plotting Elevation in Python

I'm trying to create a map of Malawi with altitude shown. Something like this, but of Malawi of course:
I have downloaded some elevation data from here: http://research.jisao.washington.edu/data_sets/elevation/
This is a print of that data after I created a cube:
meters, from 5-min data / (unknown) (time: 1; latitude: 360; longitude: 720)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Attributes:
history:
Elevations calculated from the TBASE 5-minute
latitude-longitude resolution...
invalid_units: meters, from 5-min data
I started with importing my data, forming a cube, removing the extra variables (time and history) and limiting my data to the latitudes and longitudes for Malawi.
import matplotlib.pyplot as plt
import matplotlib.cm as mpl_cm
import numpy as np
import iris
import cartopy
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
import iris.analysis.cartography
def main():
    #bring in altitude data
    Elev = '/exports/csce/datastore/geos/users/s0899345/Climate_Modelling/Actual_Data/elev.0.5-deg.nc'
    Elev = iris.load_cube(Elev)
    #remove variable for time
    del Elev.attributes['history']
    Elev = Elev.collapsed('time', iris.analysis.MEAN)
    Malawi = iris.Constraint(longitude=lambda v: 32.0 <= v <= 36., latitude=lambda v: -17. <= v <= -8.)
    Elev = Elev.extract(Malawi)
    print 'Elevation'
    print Elev.data
    print 'latitude'
    print Elev.coord('latitude')
    print 'longitude'
    print Elev.coord('longitude')
This works well and the output is as follows:
Elevation
[[ 978. 1000. 1408. 1324. 1080. 1370. 1857. 1584.]
[ 1297. 1193. 1452. 1611. 1354. 1480. 1350. 627.]
[ 1418. 1490. 1625. 1486. 1977. 1802. 1226. 482.]
[ 1336. 1326. 1405. 728. 1105. 1559. 1139. 789.]
[ 1368. 1301. 1463. 1389. 671. 942. 947. 970.]
[ 1279. 1116. 1323. 1587. 839. 1014. 1071. 1003.]
[ 1096. 969. 1179. 1246. 855. 979. 927. 638.]
[ 911. 982. 1235. 1324. 681. 813. 814. 707.]
[ 749. 957. 1220. 1198. 613. 688. 832. 858.]
[ 707. 1049. 1037. 907. 624. 771. 1142. 1104.]
[ 836. 1044. 1124. 1120. 682. 711. 1126. 922.]
[ 1050. 1204. 1199. 1161. 777. 569. 999. 828.]
[ 1006. 869. 1183. 1230. 1354. 616. 762. 784.]
[ 838. 607. 883. 1181. 1174. 927. 591. 856.]
[ 561. 402. 626. 775. 1053. 726. 828. 733.]
[ 370. 388. 363. 422. 508. 471. 906. 1104.]
[ 504. 326. 298. 208. 246. 160. 458. 682.]
[ 658. 512. 334. 309. 156. 162. 123. 340.]]
latitude
DimCoord(array([ -8.25, -8.75, -9.25, -9.75, -10.25, -10.75, -11.25, -11.75,
-12.25, -12.75, -13.25, -13.75, -14.25, -14.75, -15.25, -15.75,
-16.25, -16.75], dtype=float32), standard_name='latitude', units=Unit('degrees'), var_name='lat', attributes={'title': 'Latitude'})
longitude
DimCoord(array([ 32.25, 32.75, 33.25, 33.75, 34.25, 34.75, 35.25, 35.75], dtype=float32), standard_name='longitude', units=Unit('degrees'), var_name='lon', attributes={'title': 'Longitude'})
However when I try to plot it, it doesn't work... this is what I did:
#plot map with physical features
ax = plt.axes(projection=cartopy.crs.PlateCarree())
ax.add_feature(cartopy.feature.COASTLINE)
ax.add_feature(cartopy.feature.BORDERS)
ax.add_feature(cartopy.feature.LAKES, alpha=0.5)
ax.add_feature(cartopy.feature.RIVERS)
#plot altitude data
plot=ax.plot(Elev, cmap=mpl_cm.get_cmap('YlGn'), levels=np.arange(0,2000,150), extend='both')
#add colour bar index and a label
plt.colorbar(plot, label='meters above sea level')
#set map boundary
ax.set_extent([32., 36., -8, -17])
#set axis tick marks
ax.set_xticks([33, 34, 35])
ax.set_yticks([-10, -12, -14, -16])
lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
#save the image of the graph and include full legend
plt.savefig('Map_data_boundary', bbox_inches='tight')
plt.show()
The error I get is 'Attribute Error: Unknown property type cmap' and the following map of the whole world...
Any ideas?
I'll prepare the data the same as you, except to remove the time dimension I'll use iris.util.squeeze, which removes any length-1 dimension.
import iris
elev = iris.load_cube('elev.0.5-deg.nc')
elev = iris.util.squeeze(elev)
malawi = iris.Constraint(longitude=lambda v: 32.0 <= v <= 36.,
latitude=lambda v: -17. <= v <= -8.)
elev = elev.extract(malawi)
As @ImportanceOfBeingErnest says, you want a contour plot. When unsure what plotting function to use, I recommend browsing the matplotlib gallery to find something that looks similar to what you want to produce. Click on an image and it shows you the code.
So, to make the contour plot you can use the matplotlib.pyplot.contourf function, but you have to get the relevant data from the cube in the form of numpy arrays:
import matplotlib.pyplot as plt
import matplotlib.cm as mpl_cm
import numpy as np
import cartopy
cmap = mpl_cm.get_cmap('YlGn')
levels = np.arange(0,2000,150)
extend = 'max'
ax = plt.axes(projection=cartopy.crs.PlateCarree())
plt.contourf(elev.coord('longitude').points, elev.coord('latitude').points,
elev.data, cmap=cmap, levels=levels, extend=extend)
However, iris provides a shortcut to the matplotlib.pyplot functions in the form of iris.plot. This automatically sets up an axes instance with the right projection, and passes the data from the cube through to matplotlib.pyplot. So the last two lines can simply become:
import iris.plot as iplt
iplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
There is also iris.quickplot, which is basically the same as iris.plot, except that it automatically adds a colorbar and labels where appropriate:
import iris.quickplot as qplt
qplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
Once plotted, you can get hold of the axes instance and add your other items (for which I simply copied your code):
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
qplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
ax = plt.gca()
ax.add_feature(cartopy.feature.COASTLINE)
ax.add_feature(cartopy.feature.BORDERS)
ax.add_feature(cartopy.feature.LAKES, alpha=0.5)
ax.add_feature(cartopy.feature.RIVERS)
ax.set_xticks([33, 34, 35])
ax.set_yticks([-10, -12, -14, -16])
lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
It seems you want something like a contour plot. So instead of
plot = ax.plot(...)
you probably want to use
plot = ax.contourf(...)
Most probably you also want to give latitude and longitude as arguments to contourf,
plot = ax.contourf(longitude, latitude, Elev, ...)
You can try to add this:
import matplotlib.colors as colors
color = plt.get_cmap('YlGn') # and change cmap=mpl_cm.get_cmap('YlGn') to cmap=color
And also try to update your matplotlib:
pip install --upgrade matplotlib

Regridding? CRU Observed data and CORDEX data in Python Iris

I am trying to compare simulated climate model data from CORDEX to observed data from CRU 4.00. I am doing this in Python using iris. I have managed to get all of my climate models to run, but the observed data won't. I suspect this is because the model data are on a rotated pole grid with x/y axes and 0.44 degree resolution, whereas the observed data are on a linear grid with 0.5 degree resolution.
In order to make them comparable I think I need to regrid them, but I am a bit confused on how to do this, and the iris userguide is confusing me further... I am relatively new to this!
This is the simplified code to create a line graph showing one model output and the CRU data:
import matplotlib.pyplot as plt
import iris
import iris.coord_categorisation as iriscc
import iris.plot as iplt
import iris.quickplot as qplt
import iris.analysis.cartography
import matplotlib.dates as mdates
def main():
#bring in all the files we need and give them a name
CCCma = '/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/AFR_44_tas/ERAINT/1979-2012/tas_AFR-44_ECMWF-ERAINT_evaluation_r1i1p1_CCCma-CanRCM4_r2_mon_198901-200912.nc'
CRU = '/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Actual_Data/cru_ts4.00.1901.2015.tmp.dat.nc'
#Load exactly one cube from given file
CCCma = iris.load_cube(CCCma)
CRU = iris.load_cube(CRU, 'near-surface temperature')
#remove flat latitude and longitude and only use grid latitude and grid longitude
lats = iris.coords.DimCoord(CCCma.coord('latitude').points[:,0], \
standard_name='latitude', units='degrees')
lons = CCCma.coord('longitude').points[0]
for i in range(len(lons)):
if lons[i]>100.:
lons[i] = lons[i]-360.
lons = iris.coords.DimCoord(lons, \
standard_name='longitude', units='degrees')
CCCma.remove_coord('latitude')
CCCma.remove_coord('longitude')
CCCma.remove_coord('grid_latitude')
CCCma.remove_coord('grid_longitude')
CCCma.add_dim_coord(lats, 1)
CCCma.add_dim_coord(lons, 2)
lats = iris.coords.DimCoord(CRU.coord('latitude').points[:,0], \
standard_name='latitude', units='degrees')
lons = CRU.coord('longitude').points[0]
for i in range(len(lons)):
if lons[i]>100.:
lons[i] = lons[i]-360.
lons = iris.coords.DimCoord(lons, \
standard_name='longitude', units='degrees')
CRU.remove_coord('latitude')
CRU.remove_coord('longitude')
CRU.remove_coord('grid_latitude')
CRU.remove_coord('grid_longitude')
CRU.add_dim_coord(lats, 1)
CRU.add_dim_coord(lons, 2)
#we are only interested in the latitude and longitude relevant to Malawi
Malawi = iris.Constraint(longitude=lambda v: 32.5 <= v <= 36., \
latitude=lambda v: -17. <= v <= -9.)
CCCma = CCCma.extract(Malawi)
CRU=CRU.extract(Malawi)
#time constraint to make all series the same
iris.FUTURE.cell_datetime_objects = True
t_constraint = iris.Constraint(time=lambda cell: 1989 <= cell.point.year <= 2008)
CCCma = CCCma.extract(t_constraint)
CRU=CRU.extract(t_constraint)
#data is in Kelvin, but we would like to show it in Celsius
CCCma.convert_units('Celsius')
#CRU.convert_units('Celsius')
#We are interested in plotting the graph with time along the x axis, so we need a mean of all the coordinates, i.e. mean temperature across whole country
iriscc.add_year(CCCma, 'time')
CCCma = CCCma.aggregated_by('year', iris.analysis.MEAN)
CCCma.coord('latitude').guess_bounds()
CCCma.coord('longitude').guess_bounds()
CCCma_grid_areas = iris.analysis.cartography.area_weights(CCCma)
CCCma_mean = CCCma.collapsed(['latitude', 'longitude'],
iris.analysis.MEAN,
weights=CCCma_grid_areas)
iriscc.add_year(CRU, 'time')
CRU = CRU.aggregated_by('year', iris.analysis.MEAN)
CRU.coord('latitude').guess_bounds()
CRU.coord('longitude').guess_bounds()
CRU_grid_areas = iris.analysis.cartography.area_weights(CRU)
CRU_mean = CRU.collapsed(['latitude', 'longitude'],
iris.analysis.MEAN,
weights=CRU_grid_areas)
#set major plot indicators for x-axis
plt.gca().xaxis.set_major_locator(mdates.YearLocator(5))
#assign the line colours
qplt.plot(CCCma_mean, label='CanRCM4_ERAINT', lw=1.5, color='blue')
qplt.plot(CRU_mean, label='Observed', lw=1.5, color='black')
#create a legend and set its location to under the graph
plt.legend(loc="upper center", bbox_to_anchor=(0.5,-0.05), fancybox=True, shadow=True, ncol=2)
#create a title
plt.title('Mean Near Surface Temperature for Malawi 1989-2008', fontsize=11)
#add grid lines
plt.grid()
#show the graph in the console
iplt.show()
if __name__ == '__main__':
main()
And this is the error I get:
runfile('/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images/Line_Graph_Annual_Tas_Play.py', wdir='/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images')
Traceback (most recent call last):
File "<ipython-input-8-2976c65ebce5>", line 1, in <module>
runfile('/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images/Line_Graph_Annual_Tas_Play.py', wdir='/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images')
File "/usr/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "/usr/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images/Line_Graph_Annual_Tas_Play.py", line 124, in <module>
main()
File "/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Python Code and Output Images/Line_Graph_Annual_Tas_Play.py", line 42, in main
lats = iris.coords.DimCoord(CRU.coord('latitude').points[:,0], \
IndexError: too many indices
Thank you!
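So it turns out I didn't need to regrid. The IndexError came from indexing the CRU latitude points with [:,0]: in the CRU file the latitude and longitude coordinates are already one-dimensional, so their points can be used directly. In case anyone else wants to draw a line graph with CRU data in Python with iris, here is the code to do it. In my case I was limiting the lat/lons to only look at Malawi and I was only interested in some years.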
#bring in all the files we need and give them a name
CRU= '/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/Actual_Data/cru_ts4.00.1901.2015.tmp.dat.nc'
#Load exactly one cube from given file
CRU = iris.load_cube(CRU, 'near-surface temperature')
#define the latitude and longitude
lats = iris.coords.DimCoord(CRU.coord('latitude').points, \
standard_name='latitude', units='degrees')
lons = CRU.coord('longitude').points
#we are only interested in the latitude and longitude relevant to Malawi
Malawi = iris.Constraint(longitude=lambda v: 32.5 <= v <= 36., \
latitude=lambda v: -17. <= v <= -9.)
CRU = CRU.extract(Malawi)
#time constraint to make all series the same
iris.FUTURE.cell_datetime_objects = True
t_constraint = iris.Constraint(time=lambda cell: 1950 <= cell.point.year <= 2005)
CRU = CRU.extract(t_constraint)
#We are interested in plotting the graph with time along the x axis, so we need a mean of all the coordinates, i.e. mean temperature across whole country
iriscc.add_year(CRU, 'time')
CRU = CRU.aggregated_by('year', iris.analysis.MEAN)
CRU.coord('latitude').guess_bounds()
CRU.coord('longitude').guess_bounds()
CRU_grid_areas = iris.analysis.cartography.area_weights(CRU)
CRU_mean = CRU.collapsed(['latitude', 'longitude'],
                         iris.analysis.MEAN,
                         weights=CRU_grid_areas)
#set major plot indicators for x-axis
plt.gca().xaxis.set_major_locator(mdates.YearLocator(5))
#assign the line colours
qplt.plot(CRU_mean, label='Observed', lw=1.5, color='black')
#create a legend and set its location to under the graph
plt.legend(loc="upper center", bbox_to_anchor=(0.5,-0.05), fancybox=True, shadow=True, ncol=2)
#create a title
plt.title('Mean Near Surface Temperature for Malawi 1950-2005', fontsize=11)
#add grid lines
plt.grid()
#save the image of the graph and include full legend
plt.savefig('Historical_Temperature_LineGraph_Annual', bbox_inches='tight')
#show the graph in the console
iplt.show()

Bokeh - Grouped Bar chart

Maintainer note: This question as-is is obsolete, since the bokeh.charts API was deprecated and removed years ago. But see the answer below for how to create grouped bar charts with the stable bokeh.plotting API in newer versions of Bokeh
I want to create a simple bar chart (like the one in the official example page).
I tried executing the code in this old answer, Plotting Bar Charts with Bokeh,
but it shows the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-ba53ce344126> in <module>()
11
12 bar = Bar(xyvalues, cat, title="Stacked bars",
---> 13 xlabel="category", ylabel="language")
14
15 output_file("stacked_bar.html")
/usr/local/lib/python2.7/dist-packages/bokeh/charts/builders/bar_builder.pyc in Bar(data, label, values, color, stack, group, agg, xscale, yscale, xgrid, ygrid, continuous_range, **kw)
318 kw['y_range'] = y_range
319
--> 320 chart = create_and_build(BarBuilder, data, **kw)
321
322 # hide x labels if there is a single value, implying stacking only
/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.pyc in create_and_build(builder_class, *data, **kws)
60 # create the new builder
61 builder_kws = {k: v for k, v in kws.items() if k in builder_props}
---> 62 builder = builder_class(*data, **builder_kws)
63
64 # create a chart to return, since there isn't one already
/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.pyc in __init__(self, *args, **kws)
280
281 # handle input attrs and ensure attrs have access to data
--> 282 attributes = self._setup_attrs(data, kws)
283
284 # remove inputs handled by dimensions and chart attributes
/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.pyc in _setup_attrs(self, data, kws)
331 attributes[attr_name].iterable = custom_palette
332
--> 333 attributes[attr_name].setup(data=source, columns=attr)
334
335 else:
/usr/local/lib/python2.7/dist-packages/bokeh/charts/attributes.pyc in setup(self, data, columns)
193
194 if columns is not None and self.data is not None:
--> 195 self.set_columns(columns)
196
197 if self.columns is not None and self.data is not None:
/usr/local/lib/python2.7/dist-packages/bokeh/charts/attributes.pyc in set_columns(self, columns)
185 # assume this is now the iterable at this point
186 self.iterable = columns
--> 187 self._setup_default()
188
189 def setup(self, data=None, columns=None):
/usr/local/lib/python2.7/dist-packages/bokeh/charts/attributes.pyc in _setup_default(self)
142 def _setup_default(self):
143 """Stores the first value of iterable into `default` property."""
--> 144 self.default = next(self._setup_iterable())
145
146 def _setup_iterable(self):
/usr/local/lib/python2.7/dist-packages/bokeh/charts/attributes.pyc in _setup_iterable(self)
320
321 def _setup_iterable(self):
--> 322 return iter(self.items)
323
324 def get_levels(self, columns):
TypeError: 'NoneType' object is not iterable
The official example did work
URL: http://docs.bokeh.org/en/0.11.0/docs/user_guide/charts.html#userguide-charts-data-types
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, label='yr', values='mpg', agg='median', group='origin',
title="Median MPG by YR, grouped by ORIGIN", legend='top_right')
output_file("bar.html")
show(p)
BUT, I don't want to use pandas, I want to use a simple python dictionary like this:
my_simple_dict = {
'Group 1': [22,33,44,55],
'Group 2': [44,66,0,24],
'Group 3': [2,99,33,51]
}
How can I achieve a bar chart that shows the three groups (Group 1, Group 2, Group 3) with the x-axis going from 1 to 4?
NOTE: I am working with python 2.7
The question and other answers are obsolete, as bokeh.charts was deprecated and removed several years ago. However, support for grouped and stacked bar charts using the stable bokeh.plotting API has improved greatly since then:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
Here is a full example:
from bokeh.io import show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
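The `sum(zip(...), ())` line in the example is easy to misread, so here is a small sketch of what it produces, using only the standard library and the same numbers. The names `per_fruit` and `counts_chain` are illustrative and not part of the Bokeh example; `itertools.chain` is shown as an equivalent alternative.

```python
from itertools import chain

data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# zip groups the three yearly values of each fruit into one tuple
per_fruit = list(zip(data['2015'], data['2016'], data['2017']))

# sum(..., ()) concatenates those tuples into a single flat tuple
counts = sum(per_fruit, ())

# itertools.chain builds the same flat sequence
counts_chain = tuple(chain.from_iterable(per_fruit))

print(per_fruit[:2])
print(counts[:6])
```

The flat order (all years of the first fruit, then all years of the second, and so on) matches the order of the `(fruit, year)` pairs in `x`, which is what lets each bar find its value.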
For now the solution I found is changing the dict structure
from bokeh.charts import Bar, output_file, show, hplot
import pandas as pd
my_simple_dict = {
'Group 1': [22,33,44,55],
'Group 2': [44,66,0,24],
'Group 3': [2,99,33,51]
}
my_data_transformed_dict = {}
my_data_transformed_dict['x-axis'] = []
my_data_transformed_dict['value'] = []
my_data_transformed_dict['group-name'] = []
for group, group_list in my_simple_dict.iteritems():
    x_axis = 0
    for item in group_list:
        x_axis += 1
        my_data_transformed_dict['x-axis'].append(x_axis)
        my_data_transformed_dict['value'].append(item)
        my_data_transformed_dict['group-name'].append(group)
my_bar = Bar(my_data_transformed_dict, values='value',label='x-axis',group='group-name',legend='top_right')
output_file("grouped_bar.html")
show(my_bar)
If someone knows a better way please tell me
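One possible alternative, sketched here under assumptions, is to shape the plain dictionary into the two parallel lists that the bokeh.plotting answer passes to `FactorRange(*x)` and `ColumnDataSource(data=dict(x=x, counts=counts))`. The variable names `groups` and `positions` are made up for illustration, and the sketch stops at preparing the data, so it needs neither pandas nor Bokeh to run.

```python
from itertools import chain

my_simple_dict = {
    'Group 1': [22, 33, 44, 55],
    'Group 2': [44, 66, 0, 24],
    'Group 3': [2, 99, 33, 51],
}

groups = sorted(my_simple_dict)            # fixed, repeatable group order
positions = [str(i) for i in range(1, 5)]  # x-axis categories "1" to "4" (factors are strings)

# one (position, group) factor per bar, with the x-axis position outermost
x = [(pos, group) for pos in positions for group in groups]

# values interleaved in the same order as the factors
counts = list(chain.from_iterable(zip(*(my_simple_dict[g] for g in groups))))

print(x[:3])
print(counts[:3])
```

With `x` and `counts` in this form, the rest of the plotting code from the full example should apply unchanged, although that part is not exercised here.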

Need help to read a file in python

I am trying to read a file and file contains various kinds of data.
The example of the file type is given below.
[CIRCUIT1]
CIRCUITNAME=CIRCUIT1
00.12 12/20 2.3 23.6
00.12 12/20 2.3 23.6
00.42 12/20 2.2 23.3
00.42 12/20 2.2 23.3
[CIRCUIT2]
CIRCUITNAME=CIRCUIT2
00.12 12/20 2.2 26.7
00.12 12/20 2.2 26.7
00.42 12/20 2.2 26.5
00.42 12/20 2.2 26.5
00.42 12/20 2.2 26.5
[AMBIENT]
00.42 12/20 8.6
01.42 12/20 8.6
02.42 12/20 8.6
03.42 12/20 8.7
04.42 12/20 8.8
05.42 12/20 8.6
06.42 12/20 8.7
Now, I have defined a function which only returns the 3rd and 4th columns of circuit1; the date and time columns should also be returned, but their formats will be defined later. However, I'm getting an index out of range error.
import numpy as np

def load_ci(filepath):
    fileObj=open(filepath, 'r')
    time_1, time_2 = [], []
    loadCurrent_1, surfaceTemp_1 = [], []
    loadCurrent_2, surfaceTemp_2 = [], []
    ambient = []
    t, ti = 0, 0
    read=0
    for line in fileObj:
        if not line.strip():
            continue
        if read==1:
            if '[AMBIENT]' in line:
                read=3
                continue
            elif 'CIRCUITNAME=CIRCUIT2' in line: read=2
            else:
                if line!='\n' and '[CIRCUIT2]' not in line:
                    point=line.split(' ')
                    t=(float(point[0]))
                    ti=int(t)*3600+(t-int(t))*60*100
                    time_1.append(ti)
                    loadCurrent_1.append(float(point[2]))
                    surfaceTemp_1.append(float(point[3]))
        if read==2:
            if '[AMBIENT]' in line:
                read=3
                continue
            elif 'CIRCUITNAME=CIRCUIT2' in line: read=2
            else:
                if line!='\n' and '[CIRCUIT2]' not in line:
                    point=line.split(' ')
                    t=(float(point[0]))
                    ti=int(t)*3600+(t-int(t))*60*100
                    time_2.append(ti)
                    loadCurrent_2.append(float(point[2]))
                    surfaceTemp_2.append(float(point[3]))
        if read==3:
            if line!='\n':
                point=line.split(' ')
                ambient.append(float(point[2]))
        if 'CIRCUITNAME=CIRCUIT1' in line: read=1
    return np.array(loadCurrent_1),np.array(surfaceTemp_1),np.array(loadCurrent_2),np.array(surfaceTemp_2),np.array(ambient),np.array(time_1),np.array(time_2)
After you detect a line containing [AMBIENT], you need to advance to the next line, while changing your read state to 3. Add a continue statement after read = 3 at two points in your code where you check for [AMBIENT]
Additionally, change your code checking for [CIRCUIT2] from
if line != '\n' and line != '[CIRCUIT2]':
to
if line != '\n' and '[CIRCUIT2]' not in line:
If you want to disregard empty lines, you can add a check at the beginning of your loop like:
if not line.strip():
    continue
I've reworked your code in the question to break out parsing circuit from ambient data, to simplify state management. I pass around the file object, utilizing its iteration state to keep track of where we are in the file at any given point. Sections of the file start with '[...]' and end with a blank line, so I can take advantage of that. I group all of the circuit data into a dictionary for convenience, but this could be rolled into a full fledged class as well if you wanted to.
import numpy as np

def parseCircuit(it, header):
    loadCurrent, surfaceTemp, time = [], [], []
    for line in it:
        line = line.strip()
        if not line:
            break
        elif line.startswith('CIRCUITNAME='):
            name = line[12:]
        else:
            point=line.split(' ')
            h, m = map(int, point[0].split('.'))
            time.append(h * 3600 + m * 60)
            loadCurrent.append(float(point[2]))
            surfaceTemp.append(float(point[3]))
    return {'name': name,
            'surfaceTemp': np.array(surfaceTemp),
            'loadCurrent': np.array(loadCurrent),
            'time': np.array(time)}

def parseAmbient(it, header):
    ambient = []
    for line in it:
        line = line.strip()
        if not line:
            break
        point=line.split(' ')
        ambient.append(float(point[2]))
    return np.array(ambient)

def load_ci(filepath):
    fileObj=open(filepath, 'r')
    circuits = {}
    ambient = None
    for line in fileObj:
        line = line.strip() # remove \n from end of line
        if not line: # skip empty lines
            continue
        if line.startswith('[CIRCUIT'):
            circuit = parseCircuit(fileObj, line)
            circuits[circuit['name']] = circuit
        elif line.startswith('[AMBIENT'):
            ambient = parseAmbient(fileObj, line)
    return circuits, ambient

print load_ci('test.ci')
outputs
({'CIRCUIT2': {'loadCurrent': array([ 2.2, 2.2, 2.2, 2.2, 2.2]), 'surfaceTemp': array([ 26.7, 26.7, 26.5, 26.5, 26.5]), 'name': 'CIRCUIT2', 'time': array([ 720, 720, 2520, 2520, 2520])}, 'CIRCUIT1': {'loadCurrent': array([ 2.3, 2.3, 2.2, 2.2]), 'surfaceTemp': array([ 23.6, 23.6, 23.3, 23.3]), 'name': 'CIRCUIT1', 'time': array([ 720, 720, 2520, 2520])}}, array([ 8.6, 8.6, 8.6, 8.7, 8.8, 8.6, 8.7]))
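The key trick in the reworked parser is that the outer loop and the section parsers all pull lines from the same iterator, so each inner loop continues exactly where the outer one stopped. A stripped-down sketch of just that idea follows. It assumes, as the answer does, that a blank line ends each section; `read_sections` and the sample text are hypothetical and use only the standard library.

```python
import io

def read_sections(lines):
    """Collect the lines under each '[HEADER]' into {header: [lines]}."""
    it = iter(lines)          # one shared iterator for both loops
    sections = {}
    for line in it:
        line = line.strip()
        if line.startswith('[') and line.endswith(']'):
            body = []
            for inner in it:  # resumes right after the header line
                inner = inner.strip()
                if not inner:  # a blank line closes the section
                    break
                body.append(inner)
            sections[line[1:-1]] = body
    return sections

sample = io.StringIO(
    "[CIRCUIT1]\n"
    "00.12 12/20 2.3 23.6\n"
    "00.42 12/20 2.2 23.3\n"
    "\n"
    "[AMBIENT]\n"
    "00.42 12/20 8.6\n"
)
sections = read_sections(sample)
print(sorted(sections))
```

Because no line is ever read twice, no `read` state variable is needed to remember which section is current.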

Opening a text file and then storing the contents into a nested dictionary in python 2.7

I'm fairly new to Python, and computing languages in general. I want to open a text file and then store its contents in a nested dictionary. Here's my code so far:
inputfile = open("Proj 4.txt", "r")
for line in inputfile:
    line = line.strip()
    print line
NGC = {}
inputfile.close()
I know I need to use the add operation for dictionaries; I'm just unsure how to proceed. Here's a copy of the text file:
NGC0224
Name: Andromeda Galaxy
Messier: M31
Distance: 2900
Magnitude: 3.4
NGC6853
Name: Dumbbell Nebula
Messier: M27
Distance: 1.25
Magnitude: 7.4
NGC4826
Name: Black Eye Galaxy
Messier: M64
Distance: 19000
Magnitude: 8.5
NGC4254
Name: Coma Pinwheel Galaxy
Messier: M99
Distance: 60000
Brightness: 9.9 mag
NGC5457
Name: Pinwheel Galaxy
Messier: M101
Distance: 27000
Magnitude: 7.9
NGC4594
Name: Sombrero Galaxy
Messier: M104
Distance: 50000
with open(infilepath) as infile:
    answer = {}
    name = None
    for line in infile:
        line = line.strip()
        if line.startswith("NGC"):
            name = line
            answer[name] = {}
        else:
            var, val = line.split(':', 1)
            answer[name][var.strip()] = val.strip()
Output with your text file:
>>> with open(infilepath) as infile:
...     answer = {}
...     name = None
...     for line in infile:
...         line = line.strip()
...         if line.startswith("NGC"):
...             name = line
...             answer[name] = {}
...         else:
...             var, val = line.split(':', 1)
...             answer[name][var.strip()] = val.strip()
...
>>> answer
{'NGC6853': {'Messier': 'M27', 'Magnitude': '7.4', 'Distance': '1.25', 'Name': 'Dumbbell Nebula'}, 'NGC4254': {'Brightness': '9.9 mag', 'Messier': 'M99', 'Distance': '60000', 'Name': 'Coma Pinwheel Galaxy'}, 'NGC4594': {'Messier': 'M104', 'Distance': '50000', 'Name': 'Sombrero Galaxy'}, 'NGC0224': {'Messier': 'M31', 'Magnitude': '3.4', 'Distance': '2900', 'Name': 'Andromeda Galaxy'}, 'NGC4826': {'Messier': 'M64', 'Magnitude': '8.5', 'Distance': '19000', 'Name': 'Black Eye Galaxy'}, 'NGC5457': {'Messier': 'M101', 'Magnitude': '7.9', 'Distance': '27000', 'Name': 'Pinwheel Galaxy'}}
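Note that every value in the parsed dictionary is still a string. If the numeric fields are wanted as numbers, one hedged option is to try a float conversion and fall back to the original text. The helper `to_number` below is a hypothetical name, and the small `answer` dictionary just repeats two entries from the output above.

```python
def to_number(text):
    """Return float(text) when it parses, otherwise the original string."""
    try:
        return float(text)
    except ValueError:
        return text

answer = {
    'NGC0224': {'Name': 'Andromeda Galaxy', 'Messier': 'M31',
                'Distance': '2900', 'Magnitude': '3.4'},
    'NGC4254': {'Name': 'Coma Pinwheel Galaxy', 'Messier': 'M99',
                'Distance': '60000', 'Brightness': '9.9 mag'},
}

# rebuild the nested dictionary with numeric strings converted
converted = {ngc: {key: to_number(value) for key, value in fields.items()}
             for ngc, fields in answer.items()}

print(converted['NGC0224']['Distance'])
print(converted['NGC4254']['Brightness'])
```

Values such as `'9.9 mag'` stay as strings because they do not parse as a number, so mixed entries need their own handling.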
You have to define better how you want this data mapped to a dictionary. If you can change the file format, it would be nice to reformat it as a standard INI file, which you could read with the ConfigParser module.
But if you really want to go this way, here is a quick and dirty solution:
d = {}
k = ''
for line in open('Proj 4.txt'):
    if ':' in line:
        key, value = line.split(':', 1)
        d[k][key] = value.strip()
    else:
        k = line.strip()
        d[k] = {}
The dict d has the parsed file.
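For the INI route mentioned above, a minimal sketch might look like the following. It assumes the file can be rewritten so that each NGC identifier sits in square brackets, and it is written for Python 3, where the module is named `configparser` (in Python 2 it is `ConfigParser`). The `ini_text` sample and the `catalog` name are illustrative only.

```python
import configparser

# Assumed reformatting of the data: each object becomes an INI section
ini_text = """
[NGC0224]
Name: Andromeda Galaxy
Messier: M31
Distance: 2900
Magnitude: 3.4

[NGC6853]
Name: Dumbbell Nebula
Messier: M27
Distance: 1.25
Magnitude: 7.4
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the original capitalisation of the keys
parser.read_string(ini_text)

# turn the parsed sections into a plain nested dictionary
catalog = {section: dict(parser[section]) for section in parser.sections()}

print(catalog['NGC0224']['Name'])
```

The parser accepts `:` as well as `=` between key and value by default, so the existing "Key: value" lines can stay as they are once the section headers are added.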