RSI Crossover Back-Testing Issue

While running my RSI crossover back-testing script in Jupyter, I hit an unexpected error: a KeyError on the "rsi21" column. This surprised me, because I define the "rsi21" column in my code and it had been working without issue until now. I have reviewed the code carefully for typos and syntax errors, but found nothing. Can anyone provide insight or assistance in resolving this KeyError on the "rsi21" column?
KeyError Traceback (most recent call last)
File ~\.julia\conda\3\lib\site-packages\pandas\core\indexes\base.py:3803, in Index.get_loc(self, key, method, tolerance)
3802 try:
-> 3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
File ~\.julia\conda\3\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File ~\.julia\conda\3\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'rsi21'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In [31], line 28
26 df_temp = df_5min[df_5min.index.date == index.date()]
27 # if the RSI21 is above the RSI55 and RSI100 and the RSI is above 30 in the 5 minute interval data, place a buy order
---> 28 if (row['rsi21'] > row['rsi55']) and (row['rsi21'] > row['rsi100']) and (df_temp['rsi21'].values[-1] > 30):
29 # place buy order
30 num_trades += 1
31 # calculate profit/loss based on the current price and the buy price
File ~\.julia\conda\3\lib\site-packages\pandas\core\frame.py:3805, in DataFrame.__getitem__(self, key)
3803 if self.columns.nlevels > 1:
3804 return self._getitem_multilevel(key)
-> 3805 indexer = self.columns.get_loc(key)
3806 if is_integer(indexer):
3807 indexer = [indexer]
File ~\.julia\conda\3\lib\site-packages\pandas\core\indexes\base.py:3805, in Index.get_loc(self, key, method, tolerance)
3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
-> 3805 raise KeyError(key) from err
3806 except TypeError:
3807 # If we have a listlike key, _check_indexing_error will raise
3808 # InvalidIndexError. Otherwise we fall through and re-raise
3809 # the TypeError.
3810 self._check_indexing_error(key)
KeyError: 'rsi21'
To resolve the KeyError on the "rsi21" column, I tried a few things: I checked my code for typos and syntax errors, and I re-ran the script in case the error was a temporary glitch. Neither helped. I expected the script to run and the "rsi21" column to be recognized; instead I get the KeyError and the script halts. I am looking for a way to resolve this.
import yfinance as yf

start_date = "2020-01-01"
end_date = "2022-12-31"
stock = yf.Ticker("SPY")
df = stock.history(start=start_date, end=end_date)

def moving_average(df, window):
    return df['Close'].rolling(window=window).mean()

df_5min = stock.history(interval='5m')

def rsi(df, window):
    # create a new column for the difference between the current and previous close
    df['diff'] = df['Close'].diff()
    # create new columns for the gain and loss
    df['gain'] = df['diff'].apply(lambda x: x if x > 0 else 0)
    df['loss'] = df['diff'].apply(lambda x: abs(x) if x < 0 else 0)
    # calculate the average gain and loss over the specified window
    avg_gain = df['gain'].rolling(window=window).mean()
    avg_loss = df['loss'].rolling(window=window).mean()
    # calculate the relative strength
    df['rs'] = avg_gain / avg_loss
    # return the relative strength index
    return 100 - (100 / (1 + df['rs']))

df['rsi21'] = rsi(df, 21)
df['rsi55'] = rsi(df, 55)
df['rsi100'] = rsi(df, 100)

num_trades = 0
total_profit = 0
for index, row in df.iterrows():
    # get the 5-minute interval data for the current date
    df_temp = df_5min[df_5min.index.date == index.date()]
    # if the RSI21 is above the RSI55 and RSI100 and the RSI is above 30 in the 5 minute interval data, place a buy order
    if (row['rsi21'] > row['rsi55']) and (row['rsi21'] > row['rsi100']) and (df_temp['rsi21'].values[-1] > 30):
        # place buy order
        num_trades += 1
        # calculate profit/loss based on the current price and the buy price
        profit = row['Close'] - df_temp['Open'].values[-1]
        total_profit += profit
    # if the RSI21 is below the RSI55 and RSI100 and the RSI is below 70 in the 5 minute interval data, place a sell order
    elif (row['rsi21'] < row['rsi55']) and (row['rsi21'] < row['rsi100']) and (df_temp['rsi21'].values[-1] < 70):
        # place sell order
        num_trades += 1
        # calculate profit/loss based on the current price and the sell price
        profit = df_temp['Open'].values[-1] - row['Close']
        total_profit += profit

# print the total number of trades and total profit/loss
print("Total number of trades:", num_trades)
print("Total profit/loss:", total_profit)

Related

AttributeError: 'NoneType' object has no attribute 'shape' using OpenCV

I am reading images from Google Drive mounted to Google Colab. I have two folders, one with positive COVID-19 chest X-rays and another with normal chest X-rays. I am trying to show these images side by side for comparison. Here is the written code:
Cimages = ('/content/drive/My Drive/Data/Covid')
Nimages = ('/content/drive/My Drive/Data/Normal')
import skimage
from skimage.transform import resize

def plot(i):
    normal = cv2.imread(dataset + 'Normal//' + Nimages[i])
    normal = skimage.transform.resize(normal, (150, 150, 3))
    covid = cv2.imread(dataset + 'Covid//' + Cimages[i])
    covid = skimage.transform.resize(normal, (150, 150, 3), mode = reflect)
    pair = np.concatenate((normal, covid), axis = 1)
    print('Normal vs. Covid')
    plt.figure(figsize=(10,5))
    plt.imshow(pair)
    plt.show()

for i in range(0,3):
    plot(i)
This gives me an error:
AttributeError Traceback (most recent call last)
<ipython-input-52-237aff042641> in <module>()
1 for i in range(0,3):
----> 2 plot(i)
<ipython-input-50-85bb2e03725c> in plot(i)
3 def plot(i):
4 normal = cv2.imread(dataset +'Normal//' + Nimages[i])
----> 5 normal = skimage.transform.resize(normal, (150,150,3))
6 covid = cv2.imread(dataset +'Covid//' + Cimages[i])
7 covid = skimage.transform.resize(normal, (150,150,3), mode = reflect)
/usr/local/lib/python3.6/dist-packages/skimage/transform/_warps.py in resize(image, output_shape, order, mode, cval, clip, preserve_range, anti_aliasing, anti_aliasing_sigma)
89 output_shape = tuple(output_shape)
90 output_ndim = len(output_shape)
---> 91 input_shape = image.shape
92 if output_ndim > image.ndim:
93 # append dimensions to input_shape
AttributeError: 'NoneType' object has no attribute 'shape'
So it seems to be occurring in the skimage.transform.resize line of code. Please help.
The issue is not with the function
skimage.transform.resize
but with the reading of the image:
normal = cv2.imread(dataset + 'Normal//' + Nimages[i])
I'm not sure what you are trying to do there, but Nimages[i] won't give you the first file in a folder; it yields the first character of a string, in your case /. So you end up passing dataset + 'Normal//' + '/', which is basically Normal///, and then try to read an image at that path. There is no image there, so OpenCV returns None (which is basically nothing), and then you try to resize None with skimage, which fails.
A better option would be to read the images directly, or in a loop that could look something like this:
from os import listdir
from os.path import isfile, join

onlyfiles = [f for f in listdir(Nimages) if isfile(join(Nimages, f))]
for image_path in onlyfiles:
    normal = cv2.imread(join(Nimages, image_path))
    normal = skimage.transform.resize(normal, (150, 150, 3))
This assumes that there are only images in the directory you mentioned.
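Since cv2.imread signals a bad path by returning None rather than raising, it may also be worth guarding the loop; a small sketch, assuming cv2, skimage and the imports above are already in place:

for image_path in onlyfiles:
    normal = cv2.imread(join(Nimages, image_path))
    if normal is None:
        # imread returns None instead of raising when the path is wrong or unreadable
        print('Could not read %s' % join(Nimages, image_path))
        continue
    normal = skimage.transform.resize(normal, (150, 150, 3))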

Keras loss is nan when inputting data from csv with numpy

I'm trying to use TensorFlow's Boston housing price example to learn how to use TensorFlow/Keras for regressions, but I keep running into a problem with my own data, even when I make the smallest changes possible. After giving up on writing everything myself, I simply changed the two lines of the code that input the data:
boston_housing = keras.datasets.boston_housing
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()
to something that, after looking online, should also create numpy arrays from my csv:
np_array = genfromtxt('trainingdata.csv', delimiter=',')
np_array = np.delete(np_array, (0), axis=0) # Remove header
test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:] # Separate out test and train data
train_labels = tr_np_array[:, 20] # Get the last column for the labels
test_labels = test_np_array[:, 20]
train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1) # Remove the last column so the data is only the features
Everything I can look at seems right – the shapes of the arrays are all correct, the arrays do seem to be correct-looking numpy arrays, the features do seem to become normalized, etc. and yet when I set verbose to 1 on model.fit(...), the very first lines of output show a problem with loss:
Epoch 1/500
32/2560 [..............................] - ETA: 18s - loss: nan - mean_absolute_error: nan
2016/2560 [======================>.......] - ETA: 0s - loss: nan - mean_absolute_error: nan
2560/2560 [==============================] - 0s 133us/step - loss: nan - mean_absolute_error: nan - val_loss: nan - val_mean_absolute_error: nan
I'm especially confused because every other place on stack overflow where I've seen the "TensorFlow loss is 'NaN'" error, it has generally a) been with a custom loss function, and b) once the model has trained for a while, not (as here) within the first 52 passes. Where that's not the case, it's because the data wasn't normalized, but I do that later in the code, and the normalization works for the housing pricing example and prints out numbers clustered around 0. At this point, my best guess is that it's a problem with the genfromtxt command, but if anyone can see what I'm doing wrong or where I might find my issue, I'd be incredibly appreciative.
Edit:
Here is the full code for the program. Commenting out lines 13 through 26 and uncommenting lines 10 and 11 makes the program work perfectly. Commenting out lines 13 and 14 and uncommenting 16 and 17 was my attempt at using pandas, but that led to the same errors.
import tensorflow as tf
from tensorflow import keras
import numpy as np
from numpy import genfromtxt
import pandas as pd

print(tf.__version__)

# boston_housing = keras.datasets.boston_housing # Line 10
# (train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

np_array = genfromtxt('trainingdata.csv', delimiter=',') # Line 13
np_array = np.delete(np_array, (0), axis=0)

# df = pd.read_csv('trainingdata.csv') # Line 16
# np_array = df.get_values()

test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:]

train_labels = tr_np_array[:, 20]
test_labels = test_np_array[:, 20]

train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1) # Line 26

order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std

labels_mean = train_labels.mean(axis=0)
labels_std = train_labels.std(axis=0)
train_labels = (train_labels - labels_mean) / labels_std
test_labels = (test_labels - labels_mean) / labels_std

def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])
    optimizer = tf.train.RMSPropOptimizer(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])
    return model

model = build_model()
model.summary()

EPOCHS = 500
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1,
                    callbacks=[early_stop])

[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
print("Testing set Mean Abs Error: ${:7.2f}".format(mae * 1000 * labels_std))

Formatting multiple lines of PYsnmp 4.4 print output to rows and columns Python 2.7

I'm new to Python and looking for some assistance with formatting print output into rows and columns. This data will eventually be sent to a csv file.
The script grabs data from multiple hosts. The number of lines is variable, as are the lengths of the interface names and descriptions.
Currently the output looks like this:
hostname IF-MIB::ifDescr.1 = GigabitEthernet0/0/0
hostname IF-MIB::ifAlias.1 = --> InterfaceDesc
hostname IF-MIB::ifOperStatus.1 = 'up'
hostname IF-MIB::ifDescr.2 = GigabitEthernet0/0/1
hostname IF-MIB::ifAlias.2 = --> InterfaceDesc
hostname IF-MIB::ifOperStatus.2 = 'up'
hostname IF-MIB::ifDescr.3 = GigabitEthernet0/0/2
hostname IF-MIB::ifAlias.3 = --> InterfaceDesc
hostname IF-MIB::ifOperStatus.3 = 'up'
I'm trying to format it into the following rows and columns, with column headers (hostname, interface, interface desc, and status).
hostname interface interface desc status
hostname GigabitEthernet0/0/0 InterfaceDesc up
hostname GigabitEthernet0/0/1 InterfaceDesc up
hostname GigabitEthernet0/0/2 InterfaceDesc up
The print code I currently have is here. I want to keep the print statements for errors.
for errorIndication, errorStatus, errorIndex, varBinds in snmp_iter:
    # Check for errors and print out results
    if errorIndication:
        print(errorIndication)
    elif errorStatus:
        print('%s at %s' % (errorStatus.prettyPrint(),
                            errorIndex and varBinds[int(errorIndex) - 1][0] or '?'))
    else:
        for varBind in varBinds:
            print(hostip),
            print(' = '.join([x.prettyPrint() for x in varBind]))
Full script:
from pysnmp.hlapi import *

routers = ["router1"]

# adds routers to bulkCmd
def snmpquery(hostip):
    snmp_iter = bulkCmd(SnmpEngine(),
                        CommunityData('Community'),
                        UdpTransportTarget((hostip, 161)),
                        ContextData(),
                        0, 50,  # fetch up to 50 OIDs
                        ObjectType(ObjectIdentity('IF-MIB', 'ifDescr')),
                        ObjectType(ObjectIdentity('IF-MIB', 'ifAlias')),
                        ObjectType(ObjectIdentity('IF-MIB', 'ifOperStatus')),
                        lexicographicMode=False)  # End bulk request once outside of OID child objects

    for errorIndication, errorStatus, errorIndex, varBinds in snmp_iter:
        # Check for errors and print out results
        if errorIndication:
            print(errorIndication)
        elif errorStatus:
            print('%s at %s' % (errorStatus.prettyPrint(),
                                errorIndex and varBinds[int(errorIndex) - 1][0] or '?'))
        else:
            for rowId, varBind in enumerate(varBindTable):
                oid, value = varBind
                print('%20.20s' % value)
                if not rowId and rowId % 3 == 0:
                    print('\n')

# calls snmpquery for all routers in list
for router in routers:
    snmpquery(router)
Any help you can provide is much appreciated.
Thanks!
Assuming the snmp_iter is initialized with three SNMP table columns:
snmp_iter = bulkCmd(SnmpEngine(),
                    UsmUserData('usr-md5-des', 'authkey1', 'privkey1'),
                    Udp6TransportTarget(('demo.snmplabs.com', 161)),
                    ContextData(),
                    0, 25,
                    ObjectType(ObjectIdentity('IF-MIB', 'ifDescr')),
                    ObjectType(ObjectIdentity('IF-MIB', 'ifAlias')),
                    ObjectType(ObjectIdentity('IF-MIB', 'ifOperStatus')))
you can be sure that (for the GETNEXT and GETBULK commands) pysnmp always returns a rectangular table in a row-by-row fashion.
Knowing the number of columns you requested (3), you should be able to print the output row by row:
for rowId, varBind in enumerate(varBindTable):
    oid, value = varBind
    print('%20.20s' % value)
    if not rowId and rowId % 3 == 0:
        print('\n')
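Since the question mentions the data will eventually be written to a csv file, here is a sketch of the same row-by-row idea going straight to csv (the output filename and the csv.writer setup are assumptions; hostip and snmp_iter are the variables from the script above):

import csv

# Sketch: with three columns requested, each successful iteration of snmp_iter
# yields one table row (ifDescr, ifAlias, ifOperStatus) for one interface.
with open('interfaces.csv', 'wb') as f:  # 'wb' for the csv module on Python 2.7
    writer = csv.writer(f)
    writer.writerow(['hostname', 'interface', 'interface desc', 'status'])
    for errorIndication, errorStatus, errorIndex, varBinds in snmp_iter:
        if errorIndication or errorStatus:
            continue  # or keep the original print-based error handling
        if_descr, if_alias, if_status = [value.prettyPrint() for oid, value in varBinds]
        writer.writerow([hostip, if_descr, if_alias, if_status])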

I am trying to run Dickey-Fuller test in statsmodels in Python but getting error

I am trying to run the Dickey-Fuller test in statsmodels in Python but am getting an error. I am running Python 2.7 with pandas 0.19.2; the dataset is from GitHub and has been imported.
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    print 'Results of Dickey-Fuller Test:'
    dftest = ts.adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print dfoutput

test_stationarity(tr)
This gives me the following error:
Results of Dickey-Fuller Test:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-10ab4b87e558> in <module>()
----> 1 test_stationarity(tr)
<ipython-input-14-d779e1ed35b3> in test_stationarity(timeseries)
19 #Perform Dickey-Fuller test:
20 print 'Results of Dickey-Fuller Test:'
---> 21 dftest = ts.adfuller(timeseries, autolag='AIC' )
22 #dftest = adfuller(timeseries, autolag='AIC')
23 dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
209
210 xdiff = np.diff(x)
--> 211 xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')
212 nobs = xdall.shape[0] # pylint: disable=E1103
213
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\tsatools.pyc in lagmat(x, maxlag, trim, original)
322 if x.ndim == 1:
323 x = x[:,None]
--> 324 nobs, nvar = x.shape
325 if original in ['ex','sep']:
326 dropidx = nvar
ValueError: too many values to unpack
tr must be a 1-d array-like (see the adfuller documentation). I don't know what tr is in your case. Assuming that you defined tr as the DataFrame that contains the time series data, you should do something like this:
tr = tr.iloc[:, 0].values
Then adfuller will be able to read the data.
Just change the line to:
dftest = adfuller(timeseries.iloc[:, 0].values, autolag='AIC')
It will work. adfuller requires a 1-D array; in your case you have a DataFrame, so select the column or edit the line as shown above.
I am assuming that, since you are using the Dickey-Fuller test, you want to keep the time series (i.e. the datetime column) as the index. To do that:
tr = tr.set_index('Month')  # I am assuming here that the time series column name is Month
ts = tr['othercolumnname']  # Just use the other column name here; it might be count or anything
I hope this helps.
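Putting the suggestions above together, a combined sketch could look like this (the 'Month' column name is the same assumption made above, and the data column is taken positionally):

from statsmodels.tsa.stattools import adfuller
import pandas as pd

# Sketch: set the datetime column as the index, then hand adfuller a 1-D array.
tr = tr.set_index('Month')  # 'Month' is an assumed column name
series = tr.iloc[:, 0].values  # adfuller needs 1-D data, not a DataFrame
dftest = adfuller(series, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print dfoutput  # Python 2.7 print statement, per the question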

Django queryset - filter/exclude by sum over column

For a small caching application I have the following problem/question:
Part of the model:
class CachedResource(models.Model):
    ...
    filesize = models.PositiveIntegerField()
    created = models.DateTimeField(auto_now_add=True, editable=False)
    ...
The cache should be limited to e.g. 200 MB, keeping the newest files.
How can I create a queryset like:
CachedResource.objects.order_by('-created').exclude(" summary of filesize < x ")
Any input appreciated!
Example:
created filesize keep/delete?
2014-06-22 15:00 50 keep (sum: 50)
2014-06-22 14:50 100 keep (sum: 150)
2014-06-22 14:40 30 keep (sum: 180)
2014-06-22 14:30 20 keep (sum: 200)
2014-06-22 14:20 50 delete (sum: 250 > 200)
2014-06-22 14:10 10 delete ...
2014-06-22 14:00 200 delete ...
2014-06-22 13:50 10 delete ...
2014-06-22 13:40 2 delete ...
... ... ... ...
Each object in the following queryset will have a 'filesize_sum' attribute holding the sum of the filesizes of all cache resources created at or after that object's creation time.
qs = CachedResource.objects.order_by('-created').extra(select={
    'filesize_sum': """
        SELECT
            SUM(filesize)
        FROM
            CachedResource_table_name as cr
        WHERE
            cr.created >= CachedResource_table_name.created
    """})
Then you can make a loop to do what you want. For example, you could make a loop that breaks on the first object with filesize_sum > 200 MB and runs a delete query against all objects with a creation date earlier than or equal to that object's:
for obj in qs:
    if obj.filesize_sum > 200:
        qs.filter(created__lte=obj.created).delete()
        break
Keep in mind, though, that you probably also want to take some action before inserting a new cache resource, so that adding the new resource does not push the cache over your limit. For example, you could run the above procedure with:
limit = configured_limit - filesize_of_cache_resource_to_insert
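A hedged sketch of what that pre-insert step could look like (MAX_CACHE_SIZE and the helper name are illustrative, not from the original answer; CachedResource is the model from the question):

MAX_CACHE_SIZE = 200  # MB, the example limit from the question

def make_room_for(new_filesize):
    # Delete the oldest cached resources so a new file of new_filesize still fits.
    limit = MAX_CACHE_SIZE - new_filesize
    running_total = 0
    for obj in CachedResource.objects.order_by('-created'):
        running_total += obj.filesize
        if running_total > limit:
            # everything created at or before this object gets evicted
            CachedResource.objects.filter(created__lte=obj.created).delete()
            break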
There is probably a better way to do this:
cachedResources = CachedResource.objects.order_by('-created')
list_of_items = []
size_of_files = 0
for item in cachedResources:
    if size_of_files < 200:
        list_of_items.append(item.id)
        size_of_files += item.filesize  # accumulate the running total of kept files
    else:
        break
cached_resources_by_size = CachedResource.objects.filter(id__in=list_of_items).order_by('-created')
totals = CachedResource.objects.values('id').aggregate(sum=Sum('filesize'), count=Count('id'))
num_to_keep = totals['count'] * min(MAX_FILESIZE / totals['sum'], 1)
while num_to_keep < totals['count']:
    new_sum = CachedResource.objects.filter(id__in=CachedResource.objects.order_by('-created')[:num_to_keep]).aggregate(sum=Sum('filesize'))
    # if <not acceptable approximation>:
    #     adjust approximation
    #     continue
    CachedResource.objects.order_by('-created')[num_to_keep:].delete()
    break
The aggregation in line 1 gets you the total filesize and the number of entries in a single query. Based on these results, it's easy to calculate an approximate number of entries to keep. You can do some extra checking to assert that this approximation falls within certain limits (+/- 20% as you said). Then a simple order_by and a slice give you a queryset of all entries to delete.