What does the output of gen_coeffs.gen_two_diode in PVMismatch mean? - pvlib

I do not understand the Output of gen_coeffs.gen_two_diode, which is:
fjac: array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
... nan, nan, nan, nan, nan, nan, nan, nan]])
fun: array([-5.00391537e-02, -5.32664899e-02, -5.71223793e-02, -5.97740265e-02, -6.18808152e-02, -6.36665900e-02, ... -1.94677609e-07,
-2.92041963e-07, -3.89423353e-07, -4.86821782e-07, -5.35527386e-07])
ipvt: array([1, 2, 3, 4], dtype=int32)
message: 'The cosine of the angle between func(x) and any column of the\n Jacobian is at most 0.000000 in absolute value'
nfev: 1
njev: 1
qtf: array([nan, nan, nan, nan])
status: 4
success: True
x: array([-24.51750581, -13.41004545, 0.06324555, 3.16227766])
Can I find the values I need (Isat1, Isat2, Rs and Rsh) in this output? Or, after using gen_coeffs.gen_two_diode, where can I find the parameters needed for the two-diode model?

The output is the standard result object returned by the SciPy solver. The values you are looking for are in the x array
[-24.51750581, -13.41004545, 0.06324555, 3.16227766], but they are transformed:
the saturation currents are stored as natural logarithms (exponentiate to recover them), and the series and shunt resistances are stored as their square roots (square them to recover the resistances).
import numpy as np

# sol is the result object returned by gen_coeffs.gen_two_diode
isat1 = np.exp(sol.x[0])  # first diode saturation current
isat2 = np.exp(sol.x[1])  # second diode saturation current
rs = sol.x[2] ** 2.0      # series resistance
rsh = sol.x[3] ** 2.0     # shunt resistance
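For the x array in the question, those transformations give roughly the following values (plain arithmetic on the numbers shown above, nothing PVMismatch-specific):
import numpy as np

x = np.array([-24.51750581, -13.41004545, 0.06324555, 3.16227766])
isat1, isat2 = np.exp(x[:2])  # ~2.2e-11 A and ~1.5e-06 A
rs, rsh = x[2:] ** 2.0        # 0.004 ohm and 10.0 ohm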

Related

Change matrix elements in one matrix, given statements in two other matrices, in python

I have two 1D matrices A and B, containing NaN values in some random places. I want to add these matrices element wise (C[i] = A[i] + B[i]) and take the mean of the element sums. This works well and efficiently in the code below:
import numpy as np
# Create some fake matrices
A = np.arange(0,10,0.5)
B = 10.0*np.arange(0,10,0.5)
# Replace some random elements in A and B with NaN
A[15] = np.nan
A[16] = np.nan
A[17] = np.nan
A[18] = np.nan
B[1] = np.nan
B[2] = np.nan
B[17] = np.nan
B[18] = np.nan
# Sum over A and B, element wise, and take the mean of the sums
C = 0.5 * ( np.where(np.isnan(A), B, A + np.nan_to_num(B)) )
But if one of A[i] and B[i] is NaN and the other is not, I don't want to take the mean of the sum, but rather keep the value from the matrix that is not NaN. This I have not been able to solve.
In other words (given A and B) eventually I want C to be:
A
array([ 0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5,
5., 5.5, 6., 6.5, 7., nan, nan, nan, nan, 9.5])
B
array([ 0., nan, nan, 15., 20., 25., 30., 35., 40., 45.,
50., 55., 60., 65., 70., 75., 80., nan, nan, 95.])
# What I eventually want C to be:
C
array([ 0., 0.5, 1. , 8.25, 11., 13.75, 16.5, 19.25, 22., 24.75,
27.5, 30.25, 33., 35.75, 38.5, 75., 80., nan, nan, 52.25])
Does anyone have any (efficient) suggestions how I can do this? (For example, I would like to avoid time consuming loops if possible).
NumPy's nanmean generates warnings when both numbers are np.nan, but it gives the result you want:
C = np.nanmean([A, B], axis=0)
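If the warning for positions where both values are NaN is noisy, one way to silence it is a standard warnings filter; this is just a sketch, and the result C is unchanged:
import warnings
import numpy as np

with warnings.catch_warnings():
    # np.nanmean warns ("Mean of empty slice") where A and B are both NaN;
    # the result there is still NaN, so the warning can safely be ignored
    warnings.simplefilter('ignore', category=RuntimeWarning)
    C = np.nanmean([A, B], axis=0)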

Flatten 3 level MultiIndex Pandas dataframe

I have the following pandas df:
Window 5 15 30 45
feature col0 col1 col2 col0 col1 col2 col0 col1 col2 col0 col1 col2
metric mean std mean std mean std mean std mean std mean std mean std mean std mean std mean std mean std mean std
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 -0.878791 1.453479 -0.265591 0.712361 0.532332 0.894304 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 -0.748535 1.459479 -0.023874 1.250110 0.913094 1.134599 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
It has 3 levels which I would like to flatten to:
col0_5_mean col0_5_std col0_15_mean col0_15_std col0_30_mean col0_30_std col0_45_mean col0_45_std col1_5_mean col1_5_std...
So order should be feature_window_metric.
The df is generated by:
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
print df2
So far I tried the following solution among others:
From Pandas dataframe with multiindex column - merge levels
df2.columns = df2.columns.map('|'.join)
TypeError: sequence item 0: expected string, long found
I appreciate suggestions,
Thanks
Use a list comprehension with str.format (the plain '|'.join approach fails because the window level contains integers, not strings):
In [1914]: df2.columns = ['{1}_{0}_{2}'.format(*c) for c in df2.columns]
In [1915]: df2.columns
Out[1915]:
Index([u'col0_5_mean', u'col0_5_std', u'col1_5_mean', u'col1_5_std',
u'col2_5_mean', u'col2_5_std', u'col0_15_mean', u'col0_15_std',
u'col1_15_mean', u'col1_15_std', u'col2_15_mean', u'col2_15_std',
u'col0_30_mean', u'col0_30_std', u'col1_30_mean', u'col1_30_std',
u'col2_30_mean', u'col2_30_std', u'col0_45_mean', u'col0_45_std',
u'col1_45_mean', u'col1_45_std', u'col2_45_mean', u'col2_45_std'],
dtype='object')
In [1916]: df2.head(2)
Out[1916]:
col0_5_mean col0_5_std col1_5_mean col1_5_std col2_5_mean col2_5_std \
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
col0_15_mean col0_15_std col1_15_mean col1_15_std ... \
0 NaN NaN NaN NaN ...
1 NaN NaN NaN NaN ...
col1_30_mean col1_30_std col2_30_mean col2_30_std col0_45_mean \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
col0_45_std col1_45_mean col1_45_std col2_45_mean col2_45_std
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
[2 rows x 24 columns]
You can still use map with format:
df2.columns = df2.columns.map('{0[0]} | {0[1]} | {0[2]}'.format)
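To get the feature_window_metric order with underscores, as asked for above, the same map/format approach should work with the indices reordered (a sketch applied to the original MultiIndex columns):
df2.columns = df2.columns.map('{0[1]}_{0[0]}_{0[2]}'.format)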

Find Indexes of Non-NaN Values in Pandas DataFrame

I have a very large dataset (roughly 200000x400), however I have it filtered and only a few hundred values remain, the rest are NaN. I would like to create a list of indexes of those remaining values. I can't seem to find a simple enough solution.
0 1 2
0 NaN NaN 1.2
1 NaN NaN NaN
2 NaN 1.1 NaN
3 NaN NaN NaN
4 1.4 NaN 1.01
For instance, I would like a list of [(0,2), (2,1), (4,0), (4,2)].
Convert the dataframe to its equivalent NumPy array representation and build a boolean mask of the non-NaN positions with ~np.isnan. Then numpy.argwhere returns the (row, column) indices where that mask is True. Since the output must be a list of tuples, map tuple over each row of the resulting array.
>>> list(map(tuple, np.argwhere(~np.isnan(df.values))))
[(0, 2), (2, 1), (4, 0), (4, 2)]
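If the frame also contains columns where np.isnan does not apply (e.g. object dtype), a similar sketch using pandas' own null check should work:
import numpy as np

# boolean mask of non-null cells works for any dtype
positions = list(map(tuple, np.argwhere(df.notnull().values)))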
Assuming that your column names are of int dtype:
In [73]: df
Out[73]:
0 1 2
0 NaN NaN 1.20
1 NaN NaN NaN
2 NaN 1.1 NaN
3 NaN NaN NaN
4 1.4 NaN 1.01
In [74]: df.columns.dtype
Out[74]: dtype('int64')
In [75]: df.stack().reset_index().drop(0, 1).apply(tuple, axis=1).tolist()
Out[75]: [(0, 2), (2, 1), (4, 0), (4, 2)]
if your column names are of object dtype:
In [81]: df.columns.dtype
Out[81]: dtype('O')
In [83]: df.stack().reset_index().astype(int).drop(0,1).apply(tuple, axis=1).tolist()
Out[83]: [(0, 2), (2, 1), (4, 0), (4, 2)]
Timing for 50K rows DF:
In [89]: df = pd.concat([df] * 10**4, ignore_index=True)
In [90]: df.shape
Out[90]: (50000, 3)
In [91]: %timeit list(map(tuple, np.argwhere(~np.isnan(df.values))))
10 loops, best of 3: 144 ms per loop
In [92]: %timeit df.stack().reset_index().drop(0, 1).apply(tuple, axis=1).tolist()
1 loop, best of 3: 1.67 s per loop
Conclusion: Nickil Maveli's solution is about 12 times faster for this test DF

How to expand an array?

I am new to programming but trying to do the following:
I have an array of times x=[12,18,27,34] with corresponding flux values in the array y=[34,68,22,81]. I have expanded x so that the new array (x_new) steps by 1 from xmin to xmax.
x_new=[np.min(x)+i for i in range(0,np.max(x)-np.min(x)+1)]
I want to expand my flux array so that it has the same length as x_new, with the original flux values at the positions of the original x values. The values at the remaining points of the expanded flux array can be anything.
Any ideas would be great!
[Edit] Now I see what you are expecting. Please try the following:
import numpy as np

x = [12, 18, 27, 34]  # array of time
y = [34, 68, 22, 81]  # corresponding flux values

if __name__ == '__main__':
    x_new = [np.min(x) + i for i in range(0, np.max(x) - np.min(x) + 1)]
    y_new = []
    for i in xrange(len(x_new)):
        if x_new[i] in x:
            y_new.append(y[x.index(x_new[i])])
        else:
            y_new.append(np.nan)
    print 'x_new = ', x_new
    print 'y_new = ', y_new
Output:
x_new = [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
y_new = [34, nan, nan, nan, nan, nan, 68, nan, nan, nan, nan, nan, nan, nan, nan, 22, nan, nan, nan, nan, nan, nan, 81]
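A vectorized alternative sketch (assuming integer times as in the example) that avoids the Python loop by placing the original fluxes with fancy indexing:
import numpy as np

x = np.array([12, 18, 27, 34])  # times
y = np.array([34, 68, 22, 81])  # fluxes

x_new = np.arange(x.min(), x.max() + 1)  # step of 1 from xmin to xmax
y_new = np.full(x_new.shape, np.nan)     # filler values, can be anything
y_new[x - x.min()] = y                   # original fluxes at the original times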

Selecting data from an HDFStore by floating-point data_column

I have a table in an HDFStore with a column of floats f stored as a data_column. I would like to select a subset of rows where, e.g., f==0.6.
I'm running in to trouble that I'm assuming is related to a floating-point precision mismatch somewhere. Here is an example:
In [1]: f = np.arange(0, 1, 0.1)
In [2]: s = f.astype('S')
In [3]: df = pd.DataFrame({'f': f, 's': s})
In [4]: df
Out[4]:
f s
0 0.0 0.0
1 0.1 0.1
2 0.2 0.2
3 0.3 0.3
4 0.4 0.4
5 0.5 0.5
6 0.6 0.6
7 0.7 0.7
8 0.8 0.8
9 0.9 0.9
[10 rows x 2 columns]
In [5]: with pd.get_store('test.h5', mode='w') as store:
...: store.append('df', df, data_columns=True)
...:
In [6]: with pd.get_store('test.h5', mode='r') as store:
...: selection = store.select('df', 'f=f')
...:
In [7]: selection
Out[7]:
f s
0 0.0 0.0
1 0.1 0.1
2 0.2 0.2
4 0.4 0.4
5 0.5 0.5
8 0.8 0.8
9 0.9 0.9
[7 rows x 2 columns]
I would like the query to return all of the rows but instead several are missing. A query with where='f=0.3' returns an empty table:
In [8]: with pd.get_store('test.h5', mode='r') as store:
selection = store.select('df', 'f=0.3')
...:
In [9]: selection
Out[9]:
Empty DataFrame
Columns: [f, s]
Index: []
[0 rows x 2 columns]
I'm wondering whether this is the intended behavior, and if so, whether there is a simple workaround, such as setting a precision limit for floating-point queries in pandas. I'm using version 0.13.1:
In [10]: pd.__version__
Out[10]: '0.13.1-55-g7d3e41c'
I don't think so, no. Pandas is built around numpy, and I have never seen any tools for approximate float equality except testing utilities like assert_allclose, and that won't help here.
The best you can do is something like:
In [17]: with pd.get_store('test.h5', mode='r') as store:
selection = store.select('df', '(f > 0.2) & (f < 0.4)')
....:
In [18]: selection
Out[18]:
f s
3 0.3 0.3
If this is a common idiom for you, make a function for it. You can even get fancy by incorporating numpy float precision.
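As a sketch of that idea, here is a hypothetical helper (the name select_float and the tolerance handling are mine, not pandas API) that turns an approximate equality into the same kind of range query:
import pandas as pd

def select_float(store, key, column, value, rtol=1e-9, atol=1e-12):
    # HDFStore queries only support exact comparisons, so build a small
    # open interval around `value` using numpy-style rtol/atol
    tol = atol + rtol * abs(value)
    where = '({col} > {lo:.17g}) & ({col} < {hi:.17g})'.format(
        col=column, lo=value - tol, hi=value + tol)
    return store.select(key, where)

# usage, matching the example above
with pd.get_store('test.h5', mode='r') as store:
    selection = select_float(store, 'df', 'f', 0.3)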