How can I convert +00:00 timezone to Z timezone - python-2.7

I am comparing the following dates as:
result = 2018-06-29T20:56:41+00:00 <= 2018-06-30T00:38:32Z
But this is giving False. How can I make the two dates compare as True, since clearly the 29th comes before the 30th? Initially I thought it had to do with the timezone, but a Google search suggested that both formats denote UTC. Can anyone confirm that, and help me get this comparison to return True?

Are you actually converting them to datetime objects? E.g.:
In []:
from datetime import datetime

d1 = datetime.strptime("2018-06-29T20:56:41+00:00", "%Y-%m-%dT%H:%M:%S%z")
d2 = datetime.strptime("2018-06-30T00:38:32Z", "%Y-%m-%dT%H:%M:%S%z")
d1 <= d2
Out[]:
True
One caveat: %z accepts "Z" and colon-separated offsets only from Python 3.7 onward, and Python 2.7's strptime does not support %z at all. Note also that in Py3.7 the first of these could be replaced with datetime.fromisoformat().
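For example, a minimal sketch (assuming Python 3.7+; fromisoformat accepts the trailing "Z" only from Python 3.11 onward):
from datetime import datetime

d1 = datetime.fromisoformat("2018-06-29T20:56:41+00:00")  # tzinfo = UTC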
But even the raw string forms compare as True (lexicographic comparison: "2018-06-29..." sorts before "2018-06-30..."), so it's not clear why you are getting False:
In []:
"2018-06-29T20:56:41+00:00" <= "2018-06-30T00:38:32Z"
Out[]:
True
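Since the original question is tagged python-2.7, where strptime does not support %z, here is a minimal sketch using the third-party python-dateutil package (assuming installing it is an option), which handles both offset styles:

# pip install python-dateutil
from dateutil import parser

d1 = parser.parse("2018-06-29T20:56:41+00:00")
d2 = parser.parse("2018-06-30T00:38:32Z")
print(d1 <= d2)  # True; both parse as timezone-aware UTC datetimes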

Related

Trouble using utcoffset with Chart.js

I'm trying to use Chart.js with a datetime x axis, and I need to adjust all my values by subtracting 5 hours. Here's some of my code:
var timeFormat = 'MM/DD HH:mm';
time: {
    format: timeFormat,
    tooltipFormat: 'll',
    parser: function(utcMoment) {
        return moment(utcMoment).utcOffset(5, true);
    }
},
Without the parser function, my values are normal (10:00, January 10, 2021), but with the parser function my values are set back all the way to 2001. Yes, two thousand and one (10:00, January 10, 2001). Note that the time is not actually changed, so there are two errors: 1. the time is not adjusted when it should be, and 2. the year is adjusted when it shouldn't be. Why could this be?
I will assume that the reason you want to roll it back by 5 hours is because of a timezone difference. If that's the case, you should use moment-timezone instead of moment.
With that said, subtracting 5 hours from the current date is actually simpler than what you're doing.
Before feeding a date into moment, you need to convert it to a JS Date object, e.g. new Date('2021-01-10 00:00:00'). Since your parser function receives the date in MM/DD HH:mm format, you need to splice the year in first.
So here is how your code should look:
parser: function(utcMoment) {
    // utcMoment arrives as 'MM/DD HH:mm'; insert the current year so
    // new Date() does not fall back to a default year
    const new_date = utcMoment.split(' ')[0] + '/' + (new Date().getFullYear()) + ' ' + utcMoment.split(' ')[1];
    return moment(new Date(new_date)).subtract({hours: 5});
}

select column with non-zero values from dataframe

I have data like the data below. I would like to only return the columns from the dataframe that contain at least one non-zero value. So in the example below it would be column ALF. Returning non-zero rows doesn’t seem that tricky but selecting the column and records is giving me a little trouble.
print df
Data:
Type ADR ALE ALF AME
Seg0 0.0 0.0 0.0 0.0
Seg1 0.0 0.0 0.5 0.0
When I try something like the link below:
Pandas: How to select columns with non-zero value in a sparse table
m1 = (df['Type'] == 'Seg0')
m2 = (df[m1] != 0).all()
print (df.loc[m1,m2])
I get a KeyError for 'Type'.
You get the KeyError because the first column, 'Type', is the index:
Solution: use DataFrame.any to build a boolean mask of columns with at least one non-zero value, then filter the index by the True entries:
m2 = (df != 0).any()
a = m2.index[m2]
print (a)
Index(['ALF'], dtype='object')
Or, if you need a list:
a = m2.index[m2].tolist()
print (a)
['ALF']
A similar solution is to filter the column names:
a = df.columns[m2]
Detail:
print (m2)
ADR False
ALE False
ALF True
AME False
dtype: bool
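Putting it together, a minimal runnable sketch of the approach above (assuming, per the answer, that 'Type' is the index):

import pandas as pd

df = pd.DataFrame({'ADR': [0.0, 0.0], 'ALE': [0.0, 0.0],
                   'ALF': [0.0, 0.5], 'AME': [0.0, 0.0]},
                  index=['Seg0', 'Seg1'])

m2 = (df != 0).any()    # True for each column that holds a non-zero value
print(df.loc[:, m2])    # keeps only column ALF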

Error searching first occurrence

I have a list in Python and I'm searching for the first occurrence of a value (>= 5000000), but it returns the first value >= 500000 instead.
Here the command:
First_Range_Init_Freq = next(x for x in Freq_Cust if x > '5000000.000')
The strange thing is, it works fine if I search for an exact value in the list with "==", but it returns the wrong value if I search with "bigger or equal" (">=").
Any ideas?
I'm not sure I understand the problem correctly (insufficient data).
Are you trying to:
1) Get the first occurrence of a value >= 5000000 in a list?
2) Or its position?
3) Note that you are comparing an int (>= 5000000) against a string: x > '5000000.000'.
Anyway, if it's the first occurrence you are trying to find, you can use the list index method as below (reproducing the problem with a sample list):
Freq_Cust = ['4800000.000', '5000000.000' , '5000001.000', '50000010.000', '4900000.000', '500000.000' ]
First_Range_Init_Freq = [Freq_Cust.index(x) for x in Freq_Cust if x >= '5000000.000'][0]
print Freq_Cust[First_Range_Init_Freq]
Result:
5000000.000
You can get its position using:
>>> print First_Range_Init_Freq
1
I am assuming you are comparing string with string or int with int; otherwise you will need to balance the comparison with the correct type.
Let me know if this is not what you are looking for. And update question with more details.
EDIT1
Based on your comments, I believe the problem is that you are comparing numbers as text, and hence it's giving an incorrect result.
For example, see below:
>>> '5000000' == '5000000.000'
False
>>> '5000000.000' > '5000000'
True
>>> 5000000 == 5000000.000
True
>>>
The string '5000000' is not equal to the string '5000000.000', whereas the int 5000000 is equal to the float 5000000.000: string comparison is lexicographic, not numeric. So, if you map your list values to float and also do the comparison with a float, it will yield the correct result.
Freq_Cust = ['4800000', '5000000' , '5000001', '50000010', '49000000', '500000' ]
Freq_Cust = map(float,Freq_Cust)
First_Range_Init_Freq = [Freq_Cust.index(x) for x in Freq_Cust if x >= float('5000000.000')][0]
print Freq_Cust[First_Range_Init_Freq]
Result:
5000000.0
Of course, you can just compare using x >= 5000000.000 or x >= 5000000.
If your list values are part int and part float, you may want to map to float as a catch-all solution.
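Tying this back to the original next() call, a minimal sketch (with an assumed sample list) that converts each element to float before comparing:

Freq_Cust = ['4800000.000', '5000000.000', '5000001.000']  # sample data
First_Range_Init_Freq = next(x for x in Freq_Cust if float(x) >= 5000000.0)
print(First_Range_Init_Freq)  # '5000000.000'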

How to compare values in pandas between two different columns?

My Table:
A Country Code1 Code2
626349 US 640AD1237 407223
702747 NaN IO1062123 407255
824316 US NaN NaN
712947 US 00220221 870262123
278147 Canada 721AC31234 109123
278144 Canada NaN 7214234321
278142 Canada 72142QW134 109123AS12
Here in the above table I need to check country and code.
I want a 5th column marking each row as correct or incorrect. Pseudocode:
If 'Country' == 'US' and (length(Code1) OR length(Code2) == 9):
Add values to 5th column as correct.
else:
Add values to 5th column as incorrect.
If 'Country' == 'Canada' and (length(Code1) OR length(Code2) == 10):
Add values to 5th column as correct.
else:
Add values to 5th column as incorrect.
If there are no values in either the Country or the Code columns, then mark it as insufficient information.
I am not able to understand how I should do this in pandas. Please help. Thanks.
I tried to first find the lengths of Code1 and Code2 and store them in a different dataframe, but after that I am not able to compare the different sets of data the way I need to.
Len1 = df.Code1.map(len)
Len2 = df.Code2.map(len)
LengthCode = pd.DataFrame({'Len_Code1': Len1,'Len_Code2': Len2})
Please tell me a better way of doing this in a single dataframe, if possible.
I tried this
df[(df.Country == 'US') & ((df.Code1.str.len() == 9)|(df.Code2.str.len() == 9))|(df.Country == 'Canada') & ((df.Code1.str.len() == 10)|(df.Code2.str.len() == 10))]
But it is getting long and I will not be able to write it out for many countries.
This will give you an 'is_correct' boolean column:
code_lengths = {'US':9, 'Canada':10}
df['correct_code_length'] = df.Country.replace(code_lengths)
df['is_correct'] = ((df.Code1.apply(lambda x: len(str(x))) == df.correct_code_length)
                    | (df.Code2.apply(lambda x: len(str(x))) == df.correct_code_length))
You will need to populate the code_lengths dictionary with more countries as necessary.
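A minimal runnable sketch of this approach on a few of the sample rows (note that rows with missing codes simply come out False here; flagging them separately as "insufficient information" would need an extra mask):

import pandas as pd

df = pd.DataFrame({'Country': ['US', 'US', 'Canada'],
                   'Code1': ['640AD1237', None, '721AC31234'],
                   'Code2': ['407223', None, '109123']})

code_lengths = {'US': 9, 'Canada': 10}
df['correct_code_length'] = df.Country.replace(code_lengths)
df['is_correct'] = ((df.Code1.apply(lambda x: len(str(x))) == df.correct_code_length)
                    | (df.Code2.apply(lambda x: len(str(x))) == df.correct_code_length))
print(df[['Country', 'is_correct']])  # True, False, True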

Find empty or NaN entry in Pandas Dataframe

I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.
Here is a dataframe that I am working with:
cl_id a c d e A1 A2 A3
0 1 -0.419279 0.843832 -0.530827 text76 1.537177 -0.271042
1 2 0.581566 2.257544 0.440485 dafN_6 0.144228 2.362259
2 3 -1.259333 1.074986 1.834653 system 1.100353
3 4 -1.279785 0.272977 0.197011 Fifty -0.031721 1.434273
4 5 0.578348 0.595515 0.553483 channel 0.640708 0.649132
5 6 -1.549588 -0.198588 0.373476 audio -0.508501
6 7 0.172863 1.874987 1.405923 Twenty NaN NaN
7 8 -0.149630 -0.502117 0.315323 file_max NaN NaN
NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.
If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?
np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:
In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))
In [155]: df.iloc[2,7]
Out[155]: nan
In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]
Finding values which are empty strings could be done with applymap:
In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))
Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
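As a hedged alternative sketch, the two checks can also be combined into one boolean mask without the per-cell applymap call (df is assumed to be the question's frame; df.eq('') is still an elementwise comparison, but it stays inside pandas):

import numpy as np

mask = df.isnull() | df.eq('')   # True wherever a cell is NaN or an empty string
rows, cols = np.where(mask)      # row/column positions, as in the example above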
Try this:
df[df['column_name'] == ''].index
and for NaNs you can try:
pd.isna(df['column_name'])
Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together using the bitwise OR operator |.
Sum along axis 0 to find the columns with missing data, then sum along axis 1 to find the index locations of the rows with missing data.
missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)
>>> df2.loc[missing_rows, missing_cols]
A2 A3
2 1.10035
5 -0.508501
6 NaN NaN
7 NaN NaN
I've resorted to
df[ (df[column_name].notnull()) & (df[column_name]!=u'') ].index
lately. That filters out both null and empty-string cells in one go, leaving the index of the rows that actually have content.
In my opinion, don't waste time: just replace the blanks with NaN, then search for all entries that are NaN. (This is reasonable because empty values are missing values anyway.)
import numpy as np # to use np.nan
import pandas as pd # to use replace
df = df.replace('', np.nan) # turn empty strings into NaN (use ' ' as well if cells hold a literal space)
nan_values = df[df.isna().any(axis=1)] # all rows with at least one NaN
nan_values # view the NaN rows only
Partial solution: for a single string column
tmp = df['A1'].fillna(''); isEmpty = tmp==''
gives a boolean Series that is True where there are empty strings or NaN values.
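Continuing that snippet, the matching row labels follow directly:

tmp = df['A1'].fillna('')
isEmpty = tmp == ''
print(df.index[isEmpty])   # index labels of the rows where A1 is empty or NaN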
You can also do something like this:
text_empty = df['column name'].str.len() == 0
df.loc[text_empty].index
The result is the rows whose cell is an empty string, together with their index labels. (str.len() returns NaN for NaN cells, so those rows are not selected here.)
Another option, covering cases where a cell might hold one or more spaces, is the Python isspace() string method:
df[df.col_name.apply(lambda x: not str(x).isspace())] # drops whitespace-only cells
Also excluding NaN values:
df[df.col_name.apply(lambda x: not str(x).isspace()) & (~df.col_name.isna())]
To obtain all the rows that contain an empty cell in a particular column:
DF_new_row=DF_raw.loc[DF_raw['columnname']=='']
This gives the subset of DF_raw that satisfies the check.
You can use string methods with a regex to find cells that contain no word characters (i.e. empty strings):
df[~df.column_name.str.contains(r'\w')].column_name.count()
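One caveat: if the column contains NaN, str.contains returns NaN for those cells and the ~ negation can fail; passing na=False (a standard str.contains parameter) sidesteps that:

df[~df.column_name.str.contains(r'\w', na=False)].column_name.count()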