Time Series manipulation - python-2.7

I have a DataFrame that I load a time series into. The index is the date, and I need to do calculations based on the date.
For example, I have:
XRT_Close
Date
2010-01-04 35.94
2010-01-05 36.17
2010-01-06 36.50
...
2015-02-07 36.60
2015-02-08 36.52
How would I calculate, say, the percentage change from the beginning to the end of each month? How would I construct a loop to cycle through the months?
Any help will be met with huge appreciation. Thank you.

First create year and month columns:
df['year'] = [x.year for x in df.index]
df['month'] = [x.month for x in df.index]
Group by them:
grouped = df.groupby(['year','month'])
Define the function you want to run on the groups:
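def PChange(df):
    # first and last observations within the group (one month)
    begin = df['column_name'].iloc[0]
    end = df['column_name'].iloc[-1]
    # percentage change is measured relative to the starting value
    return (end-begin)/begin*100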
Apply the function to the groups:
grouped.apply(PChange)
Let me know if it works.
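As a rough sketch of the same idea without helper columns, the grouping can be done directly on the index. The sample values and the two-month span below are made up for illustration; only the column name XRT_Close comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the real frame would hold the full XRT_Close history.
df = pd.DataFrame(
    {"XRT_Close": [35.94, 36.17, 36.50, 36.60, 36.52]},
    index=pd.to_datetime(
        ["2010-01-04", "2010-01-05", "2010-01-06", "2010-02-01", "2010-02-02"]
    ),
)

# Group on the index's year and month, then take the first and last close per group.
monthly = df.groupby([df.index.year, df.index.month])["XRT_Close"].agg(["first", "last"])

# Percentage change from the first to the last observation of each month.
monthly["pct_change"] = (monthly["last"] - monthly["first"]) / monthly["first"] * 100
print(monthly)
```

This avoids an explicit loop over months; each row of monthly corresponds to one (year, month) pair.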

Related

For Loop and If Statement not performing as expected

Here's the code:
# Scrape table data
alltable = driver.find_elements_by_id("song-table")
date = date.today()
simple_year_list = []
complex_year_list = []
dateformat1 = re.compile(r"\d\d\d\d")
dateformat2 = re.compile(r"\d\d\d\d-\d\d-\d\d")
for term in alltable:
    simple_year = dateformat1.findall(term.text)
    for year in simple_year:
        if 1800 < int(year) < date.year: # Year can't be above what the current year is or below 1800,
            simple_year_list.append(simple_year) # Might have to be changed if you have a song from before 1800
        else:
            continue
    complex_year = dateformat2.findall(term.text)
    complex_year_list.append(complex_year)
The code uses regular expressions to find four consecutive digits. Since there are multiple 4 digit numbers, I want to narrow it down to between 1800 and 2021 since that's a reasonable time frame. simple_year_list, however, prints out numbers that don't follow the conditions.

You aren't saving the right value here:
simple_year_list.append(simple_year)
You should be saving the year:
simple_year_list.append(year)
I would need more information to help further though. Maybe give us a sample of the data you're working through, and the output you're seeing?

You can do it all in regex.
Add start ^ and end $ anchors, and range restriction via pattern:
dateformat1 = re.compile(r"^(1[89]\d\d|20([01]\d|2[01]))$")
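Here is a minimal sketch that combines the two suggestions, assuming the years sit inside longer text (in which case ^ and $ would only match if the whole string were a year, so word boundaries are used instead). The sample string is invented and stands in for term.text; allowing the current year itself is also an assumption.

```python
import re
from datetime import date

# Hypothetical stand-in for term.text from the scraped table.
text = "Track 1234 released 1999, remastered 2015-06-01, catalog 9876, year 1750"

# \b word boundaries instead of ^/$ so years embedded in longer text still match.
four_digits = re.compile(r"\b\d{4}\b")

current_year = date.today().year
simple_year_list = []
for year in four_digits.findall(text):
    # Keep only plausible years; append the single year, not the whole match list.
    if 1800 < int(year) <= current_year:
        simple_year_list.append(year)

print(simple_year_list)
```

Doing the range check in Python rather than in the pattern keeps the upper bound tied to the current year without editing the regex each year.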

Pandas: SettingWithCopyWarning, trying to understand how to write the code better, not just whether to ignore the warning

I am trying to change all date values in a spreadsheet's Date column where the year is earlier than 1900, to today's date, so I have a slice.
EDIT: previous lines of code:
df=pd.read_excel(filename)#,usecols=['NAME','DATE','EMAIL']
#regex to remove weird characters
df['DATE'] = df['DATE'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
df['DATE'] = pd.to_datetime(df['DATE'])
sample row in dataframe: name, date, email
[u'Public, Jane Q.\xa0' u'01/01/2016\xa0' u'jqpublic#email.com\xa0']
This line of code works.
df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()
Then, all date values are formatted:
df["DATE"] = df["DATE"].map(lambda x: x.strftime("%m/%d/%y"))
But I get an error:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have read the documentation and other posts, where using .loc is suggested.
The following is the recommended solution:
df.loc[row_indexer,col_indexer] = value
but df["DATE"].loc[df["DATE"].dt.year < 1900] = dt.datetime.today() gives me the same error, except that the line number is actually the line number after the last line in the script.
I just don't understand what the documentation is trying to tell me as it relates to my example.
I started messing around with pulling out the slice and assigning to a separate dataframe, but then I'm going to have to bring them together again.

You are producing a view when you select df["DATE"] and then apply the selector [df["DATE"].dt.year < 1900] to it and try to assign to the result.
df["DATE"][df["DATE"].dt.year < 1900] is the view that pandas is complaining about.
Fix it with loc like this:
df.loc[df.DATE.dt.year < 1900, "DATE"] = pd.Timestamp.today()

My thought would be that you could do
df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
df.loc[:, "DATE"] = df.DATE.map(lambda x: x.strftime("%m/%d/%y"))
Not at a computer so I can't test but I think that should do it.
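For reference, a small self-contained sketch of the single .loc assignment is below. The two-row frame is invented, and writing the formatted strings to a separate DATE_STR column (a hypothetical name) is a choice made here so that DATE keeps its datetime type.

```python
import pandas as pd

# Hypothetical data: one placeholder date before 1900, one ordinary date.
df = pd.DataFrame(
    {
        "NAME": ["Public, Jane Q.", "Doe, John"],
        "DATE": pd.to_datetime(["1899-12-30", "2016-01-01"]),
    }
)

today = pd.Timestamp.today().normalize()  # midnight today, as a pandas Timestamp

# One .loc call with a row mask and a column label writes into df itself,
# whereas df["DATE"][mask] = ... chains two indexing steps.
mask = df["DATE"].dt.year < 1900
df.loc[mask, "DATE"] = today

# Format into a new column so DATE stays a datetime column.
df["DATE_STR"] = df["DATE"].dt.strftime("%m/%d/%y")
print(df)
```

The key point is that the row selection and the column selection happen in the same indexing operation, so pandas never has to guess whether an intermediate object is a view or a copy.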

Python Time Series

I am working on a real estate cash-flow simulation.
What I want in the end is a time series where, for every day, I report whether the property is vacant or leased and whether I collected rent.
In my present code, I first create a profit array with values of "Leased", "Vacant" or "Today you collected rent of $1000", and then use it to create my time series:
rng=pd.date_range('6/1/2016', periods=len(profit), freq='D')
ts=pd.Series(profit, index=rng)
To simplify, I assumed I collected rent every 30 days. Now I want to be more specific and collect it on the 5th day of each month (for example), and be flexible about the day the next tenant moves in.
Do you know of commands or a good source where I can learn how to iterate from month to month?
Any help would be appreciated.

You can build a sequence of dates using date_range and .shift() (freq='M' is for month-end frequencies) with pd.datetools.day like so:
date_sequence = pd.date_range(start, end, freq='M').shift(num_of_days, freq=pd.datetools.day)
and then use this sequence to select dates from the DateTimeIndex using
df.loc[date_sequence, 'column_name'] = value
Alternatively, you can use pd.DateOffset() like so:
ts = pd.date_range(start=date(2015, 6, 1), end=date(2015, 12, 1), freq='MS')
DatetimeIndex(['2015-06-01', '2015-07-01', '2015-08-01', '2015-09-01',
'2015-10-01', '2015-11-01', '2015-12-01'],
dtype='datetime64[ns]', freq='MS')
Now add 5 days:
ts + pd.DateOffset(days=5)
to get:
DatetimeIndex(['2015-06-06', '2015-07-06', '2015-08-06', '2015-09-06',
'2015-10-06', '2015-11-06', '2015-12-06'],
dtype='datetime64[ns]', freq=None)
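If the series already has a daily index, as in the question, another option is to mark rent days with a mask on the day of the month. This is only a sketch: the 90-day horizon, the RENT_DAY name and the "Leased" default are assumptions made for illustration.

```python
import pandas as pd

# Daily index as in the question; 90 days is an arbitrary horizon for illustration.
rng = pd.date_range("2016-06-01", periods=90, freq="D")
ts = pd.Series("Leased", index=rng)

RENT_DAY = 5  # hypothetical: rent is due on the 5th of every month

# Boolean mask over the index: True wherever the day-of-month equals RENT_DAY.
is_rent_day = rng.day == RENT_DAY
ts[is_rent_day] = "Today you collected rent of $1000"

print(ts[is_rent_day])
```

Changing RENT_DAY moves the collection date without rebuilding the index, and the same kind of mask can be combined with a move-in date (for example rng >= some start date) to handle a new tenant.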

How do you remove seconds and milliseconds from a date time string in python

How can I convert a date in the format "2013-03-15 05:14:51.327" to "2013-03-15 05:14", i.e. remove the seconds and milliseconds? I don't think there is a way in Robot Framework. Please let me know if anyone has a solution for this in Python.

Try this (Thanks Blender!)
>>> date = "2013-03-15 05:14:51.327"
>>> newdate = date.rpartition(':')[0]
>>> print newdate
2013-03-15 05:14

In Robot Framework the most straightforward way would be to use Split String From Right from the String library:
${datestring}=    Set Variable    2019-03-15 05:14:51.327
${parts}=    Split String From Right    ${datestring}    :    max_split=1
# parts is a list of two elements - everything before the last ":", and everything after it
# take the 1st element, it is what we're after
${no seconds}=    Get From List    ${parts}    0
Log    ${no seconds}    # 2019-03-15 05:14
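In plain Python, an alternative to string splitting is to parse the value and format it again, which also checks that the input really is a timestamp. A minimal sketch, assuming the input always carries fractional seconds:

```python
from datetime import datetime

raw = "2013-03-15 05:14:51.327"

# Parse the full timestamp; %f accepts the fractional-seconds part.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")

# Re-format without seconds; this truncates rather than rounds.
trimmed = parsed.strftime("%Y-%m-%d %H:%M")
print(trimmed)  # 2013-03-15 05:14
```

If the input might lack the fractional part, strptime would raise ValueError with this format string, so that case would need separate handling.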

Regular expression for numeric range

Looking for a regular expression to cover a number range. More specifically, consider a numeric format:
NN-NN
where N is a number. So examples are:
04-11
07-12
06-06
I want to be able to specify a range. For example, anything between:
01-27 and 02-03
When I say range, it is as if the - is not there. So the range:
01-27 to 02-03
would cover:
01-28, 01-29, 01-30, 01-31 and 02-01
I want the regular expression so that I can plug in values for the range very easily. Any ideas?

Validating dates is not where regexes' strengths lie.
For example, how would you validate February with respect to leap years?
The solution is to use the date APIs available in your language.
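As one way to avoid a regex entirely, the values can be compared as (month, day) pairs. This is a sketch only: the helper names are made up, the bounds are treated as inclusive (an assumption, since the question does not say), and it neither validates calendar dates nor handles ranges that wrap past December.

```python
def parse_mmdd(text):
    """Turn 'MM-DD' into a (month, day) tuple of ints."""
    month, day = text.split("-")
    return int(month), int(day)


def in_range(value, start, end):
    """Return True if start <= value <= end, comparing (month, day) tuples."""
    return parse_mmdd(start) <= parse_mmdd(value) <= parse_mmdd(end)


for candidate in ["01-28", "01-31", "02-01", "04-11", "06-06"]:
    print(candidate, in_range(candidate, "01-27", "02-03"))
```

Because tuples compare element by element, the month is checked first and the day only breaks ties, which is exactly the "as if the - is not there" ordering described in the question.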

'0[12]-[0-3][0-9]' would match all of the required dates; however, it would also match dates like 01-03. If you want to match exactly and only the dates in that range, you'll need to do something a little more advanced.
Here's an easily configurable example in Python:
from calendar import monthrange
import re
startdate = (1,27)
enddate = (2,3)
d = startdate
dateList = []
while d != enddate:
    (month, day) = d
    dateList += ['%02i-%02i' % (month, day)]
    daysInMonth = monthrange(2011,month)[1] # took a random non-leap year
                                            # but you might want to take the current year
    day += 1
    if day > daysInMonth:
        day = 1
        month+=1
        if month > 12:
            month = 1
    d = (month,day)
dateRegex = '|'.join(dateList)
testDates = ['01-28', '01-29', '01-30', '01-31', '02-01',
             '04-11', '07-12', '06-06']
isMatch = [re.match(dateRegex,x)!=None for x in testDates]
for i, testDate in enumerate(testDates):
    print testDate, isMatch[i]
dateRegex looks like this:
'01-27|01-28|01-29|01-30|01-31|02-01|02-02'
And the output is:
01-28 True
01-29 True
01-30 True
01-31 True
02-01 True
04-11 False
07-12 False
06-06 False

It's not completely clear to me, and you didn't mention the language either, but in PHP it looks like this:
if (preg_match('~\d{2}-\d{2}~', $input, $matches)) {
// do something here
}
Do you have any use case so we can adjust code to your needs?
