Is there a way to compare formatted value by NumberFormat? - unit-testing

I'm creating unit tests for a currency format using intl package last version 0.16.1.
My formatter is: final currencyFormat = NumberFormat.simpleCurrency(locale: 'pt_BR');
My expected result is: var expected = r'R$ 1,00';. And I applied my formatter to value 1.0 in this way: var formatted = currencyFormat.format(1.0) resulting in R$ 1,00.
When I test two values using an expect(formatted, expected), it show me that the results are not the same.
Expected: 'R$ 1,00'
Actual: 'R$ 1,00'
Which: is different.
Expected: R$ 1,00
Actual: R$ 1,00
^
Differ at offset 2
Well, I had take some time to discover that the runes of two strings have one character diff.
expected runes: (82, 36, 32, 49, 44, 48, 48)
formatted runes: (82, 36, 160, 49, 44, 48, 48)
My question is: if is using a Non-breaking space when formatting, how can I avoid this error when there is no documentation talking about it?

Until now, the only way I achieved was put unicode character to my expected string: var expected = 'R\$\u{00A0}1,00';.
This is not a good solution because if we have many formatters inside app and test all that use Non-breaking space we have to put in our expected value the ASCII code.

Related

Aggregating using regex in Google Sheets

I have the following cell in my spreadsheet. It is a string that has the structure of a JSON object.
D2 = {color: white, quantity: 23, size: small}, {color: black, size: medium, Quantity: 40}
For the cell D3, I want the output to be 63, which is the total of the quantities found in D2. Moreover, the following things should be noted:
If something goes wrong, for example, if a quantity is given as a string, then D3 is simply kept empty.
Anytime D2 is updated, D3 is updated as well.
If D3 is updated manually, that result is kept instead of the formula. This will happen for rows where D2 is empty.
Is there any way to do this using Google Sheets? I tried using REGEXEXTRACT, however, it did not produce any result. Any help is greatly appreciated.
Edits:
The order of the attributes is not important. For example, the quantity can appear before or after the size.
The case is also not important.
=SUMPRODUCT(IFERROR(REGEXREPLACE(SPLIT(D22, ":"), "}.*", "")))
=SUMPRODUCT(IFERROR(REGEXREPLACE(SPLIT(SUBSTITUTE(D21, "quantity:", "♦"), "♦"), "}.*", "")))

How to generate conditional words in the text? (Inline code)

I want to have some conditional words in my R Markdown document. Depending of the outcome in some of the calculations from the tables different words should show up in the ordinary text. Please, see my example below:
The table (a chunk):
testtabell <- matrix(c(32, 33, 45, 67, 21, 56, 76, 33, 22), ncol=3,byrow = TRUE)
colnames(testtabell) <- c("1990", "1991", "1992")
rownames(testtabell) <- c("Region1", "Region2", "Region3")
testtabell <- as.table(testtabell)
testtabell
This should be in the inline code and generate different word options in the regular text flow in the RMD:
`r if testtabell[2,2]-[2,1] < testtabell[3,2]-testtabell[3,1] then type "under" or else "above"`
Although the question is ancient, maybe someone can use this solution.
You can use the following inline code:
`r ifelse(testtabell[2,2]-testtabell[2,1] < testtabell[3,2]-testtabell[3,1],"under","above")`

How to see if the user chose a element in a list (python)

I'm trying to use the following code :
blosum = input("pick a matrix:")
x = [30, 40, 50, 100, 75, 70]
while blosum not in x :
blosum = raw_input("Incorrect, pick a valid matrix:")
print ('ok')
I want it to decide whether the user chose one of the options of the list. If the user chose one of them, then the program should keep running, otherwise, it keeps telling the user to pick a valid matrix. But it doesn't work, why?
Go ahead and change raw_input to input in your code like so and cast the user supplied data to an integer like so:
blosum = int(input("pick a matrix: "))
x = [30, 40, 50, 100, 75, 70]
while blosum not in x:
blosum = int(input("Incorrect, pick a valid matrix:"))
print ("OK")
Test
$ python test.py
pick a matrix: 1
Incorrect, pick a valid matrix:2
Incorrect, pick a valid matrix:30
OK
This will work for both Python 2.7+ and 3+, I believe, but you should test it out nonetheless.
See the difference between raw_input and input in questions like these:
How can I read inputs as integers?
What's the difference between raw_input() and input() in python3.x?

Converting Values to Switch Statement Cases

For a programming class, I have to convert a range of values to a switch statement without using if/else ifs. Here are the values that I need to convert to cases:
0 to 149 ............. $10.00
150 to 299 .........$15.00
300 to 449 .........$25.00
550 to 749..........$40.00
750 to 1199........$65.00
2000 and above.....$85.00
I am having difficulty finding a way to separate the values since they are so close in number (like 149 to 150).
I have used plenty of algorithms such as dividing the input by 2000, and then multiplying that by 10 to get a whole number, but they are too close to each other to create a new case for.
The first thing to do is to figure out your granularity. It looks like in your case you do not deal with increments less than 50.
Next, convert each range to a range of integers resulting from dividing the number by the increment (i.e. 50). In your case, this would mean
0, 1, 2 --> 10
3, 4, 5 --> 15
6, 7, 8 --> 25
... // And so on
This maps to a very straightforward switch statement.
Note: This also maps to an array of values, like this:
10, 10, 10, 15, 15, 15, 25, 25, 25, ...
Now you can get the result by doing array[n/50].

Detecting mulicollinear , or columns that have linear combinations while modelling in Python : LinAlgError

I am modelling data for a logit model with 34 dependent variables,and it keep throwing in the singular matrix error , as below -:
Traceback (most recent call last):
File "<pyshell#1116>", line 1, in <module>
test_scores = smf.Logit(m['event'], train_cols,missing='drop').fit()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 1186, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 164, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 357, in fit
hess=hess)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 405, in _fit_mle_newton
newparams = oldparams - np.dot(np.linalg.inv(H),
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv
return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 328, in solve
raise LinAlgError, 'Singular matrix'
LinAlgError: Singular matrix
Which was when I stumpled on this method to reduce the matrix to its independent columns
def independent_columns(A, tol = 0):#1e-05):
"""
Return an array composed of independent columns of A.
Note the answer may not be unique; this function returns one of many
possible answers.
https://stackoverflow.com/q/13312498/190597 (user1812712)
http://math.stackexchange.com/a/199132/1140 (Gerry Myerson)
http://mail.scipy.org/pipermail/numpy-discussion/2008-November/038705.html
(Anne Archibald)
>>> A = np.array([(2,4,1,3),(-1,-2,1,0),(0,0,2,2),(3,6,2,5)])
2 4 1 3
-1 -2 1 0
0 0 2 2
3 6 2 5
# try with checking the rank of matrixs
>>> independent_columns(A)
np.array([[1, 4],
[2, 5],
[3, 6]])
"""
Q, R = linalg.qr(A)
independent = np.where(np.abs(R.diagonal()) > tol)[0]
#print independent
return A[:, independent], independent
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
#train_cols will not be converted back from a df to a matrix object,so doing this explicitly
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
test_scores = smf.Logit(m['event'],A2,missing='drop').fit()
I still get the LinAlgError , though I was hoping I will have the reduced matrix rank now.
Also, I see np.linalg.matrix_rank(train_cols) returns 33 (ie. before calling on the independent_columns function total "x" columns was 34(ie, len(train_cols.ix[0])=34 ), meaning I don't have a full rank matrix), while np.linalg.matrix_rank(A2) returns 33 (meaning I have dropped a columns, and yet I still see the LinAlgError , when I run test_scores = smf.Logit(m['event'],A2,missing='drop').fit() , what am I missing ?
reference to the code above -
How to find degenerate rows/columns in a covariance matrix
I tried to start building the model forward,by introducing each variable at a time, which doesn't give me the singular matrix error, but I would rather have a method that is deterministic, and lets me know, what am I doing wrong & how to eliminate these columns.
Edit (updated post the suggestions by #
user333700 below)
1. You are right, "A2" doesn't have the reduced rank of 33 . ie. len(A2.ix[0]) =34 -> meaning the possibly collinear columns are not dropped - should I increase the "tol", tolerance to get rank of A2 (and the numbers of columns thereof) , as 33. If I change the tol to "1e-05" above, then I do get len(A2.ix[0]) =33, which suggests to me that tol >0 (strictly) is one indicator.
After this I just did the same, test_scores = smf.Logit(m['event'],A2,missing='drop').fit(), without nm to get the convergence.
2. Errors post trying 'nm' method. Strange thing though is that if I take just 20,000 rows, I do get the results. Since it is not showing up Memory error, but "Inverting hessian failed, no bse or cov_params available" - I am assuming, there are multiple nearly-similar records - what would you say ?
m = smf.Logit(data['event_custom'].ix[0:1000000] , train_cols.ix[0:1000000],missing='drop')
test_scores=m.fit(start_params=None,method='nm',maxiter=200,full_output=1)
Warning: Maximum number of iterations has been exceeded
Warning (from warnings module):
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 374
warn(warndoc, Warning)
Warning: Inverting hessian failed, no bse or cov_params available
test_scores.summary()
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
test_scores.summary()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2396, in summary
yname_list)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2253, in summary
use_t=False)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 826, in add_table_params
use_t=use_t)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 447, in summary_params
std_err = results.bse
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/tools/decorators.py", line 95, in __get__
_cachedval = self.fget(obj)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1037, in bse
return np.sqrt(np.diag(self.cov_params()))
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1102, in cov_params
raise ValueError('need covariance of parameters for computing '
ValueError: need covariance of parameters for computing (unnormalized) covariances
Edit 2: (updated post the suggestions by #user333700 below)
Reiterating what I am trying to model - less than about 1% of total
users "convert" (success outcomes) - so I took a balanced sample of
35(+ve) /65 (-ve)
I suspect the model is not robust, though it converges. So, will use "start_params" as the params from earlier iteration, from a different dataset.
This edit is about confirming is the "start_params" can feed into the results as below -:
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
m = smf.Logit(data['event_custom'], A2,missing='drop')
#m = smf.Logit(data['event_custom'], train_cols,missing='drop')#,method='nm').fit()#This doesnt work, so tried 'nm' which work, but used lasso, as nm did not converge.
test_scores=m.fit_regularized(start_params=None, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
a_good_looking_previous_result.params=test_scores.params #storing the parameters of pass1 to feed into pass2
test_scores.params
bidfloor_Quartile_modified_binned_0 0.305765
connectiontype_binned_0 -0.436798
day_custom_binned_Fri -0.040269
day_custom_binned_Mon 0.138599
day_custom_binned_Sat -0.319997
day_custom_binned_Sun -0.236507
day_custom_binned_Thu -0.058922
user_agent_device_family_binned_iPad -10.793270
user_agent_device_family_binned_iPhone -8.483099
user_agent_masterclass_binned_apple 9.038889
user_agent_masterclass_binned_generic -0.760297
user_agent_masterclass_binned_samsung -0.063522
log_height_width 0.593199
log_height_width_ScreenResolution -0.520836
productivity -1.495373
games 0.706340
entertainment -1.806886
IAB24 2.531467
IAB17 0.650327
IAB14 0.414031
utilities 9.968253
IAB1 1.850786
social_networking -2.814148
IAB3 -9.230780
music 0.019584
IAB9 -0.415559
C(time_day_modified)[(6, 12]]:C(country)[AUS] -0.103003
C(time_day_modified)[(0, 6]]:C(country)[HKG] 0.769272
C(time_day_modified)[(6, 12]]:C(country)[HKG] 0.406882
C(time_day_modified)[(0, 6]]:C(country)[IDN] 0.073306
C(time_day_modified)[(6, 12]]:C(country)[IDN] -0.207568
C(time_day_modified)[(0, 6]]:C(country)[IND] 0.033370
... more params here
Now on a different dataset(pass2, for indexing), I model the same as below -:
ie. I read a new dataframe, do all the variable transformation and then model via Logit as earlier .
m_pass2 = smf.Logit(data['event_custom'], A2_pass2,missing='drop')
test_scores_pass2=m_pass2.fit_regularized(start_params=a_good_looking_previous_result.params, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
and, possibly keep iterating by picking up "start_params" from earlier passes.
Several points to this:
You need tol > 0 to detect near perfect collinearity, which might also cause numerical problems in later calculations.
Check the number of columns of A2 to see whether a column has really be dropped.
Logit needs to do some non-linear calculations with the exog, so even if the design matrix is not very close to perfect collinearity, the transformed variables for the log-likelihood, derivative or Hessian calculations might still end up being with numerical problems, like singular Hessian.
(All these are floating point problems when we work near floating point precision, 1e-15, 1e-16. There are sometimes differences in the default thresholds for matrix_rank and similar linalg functions which can imply that in some edge cases one function identifies it as singular and another one doesn't.)
The default optimization method for the discrete models including Logit is a simple Newton method, which is fast in reasonably nice cases, but can fail in cases that are badly conditioned. You could try one of the other optimizers which will be one of those in scipy.optimize, method='nm' is usually very robust but slow, method='bfgs' works well in many cases but also can run into convergence problems.
Nevertheless, even when one of the other optimization methods succeeds, it is still necessary to inspect the results. More often than not, a failure with one method means that the model or estimation problem might not be well defined.
A good way to check whether it is just a problem with bad starting values or a specification problem is to run method='nm' first and then run one of the more accurate methods like newton or bfgs using the nm estimate as starting value, and see whether it succeeds from good starting values.