Code:
import cv2
import numpy as np
import sys
import webcolors
import time
cam = cv2.VideoCapture('video2.avi')
_, fo = cam.read()
framei = cv2.cvtColor(fo, cv2.COLOR_BGR2GRAY)
bg_avg = np.float32(framei)
video_width = int(cam.get(3))   # cv2.CAP_PROP_FRAME_WIDTH
video_height = int(cam.get(4))  # cv2.CAP_PROP_FRAME_HEIGHT
fr = int(cam.get(5))            # cv2.CAP_PROP_FPS
print("frame rate of stored video:::", fr)
while cam.isOpened():
    f, img = cam.read()
    start_fps = time.time()
    ...
    k = cv2.waitKey(20)
    if k == 27:
        break
    endtime_fps = time.time()
    diff_fps = endtime_fps - start_fps
    print("Frame rate::", 1 / diff_fps)
With every iteration, this prints a different frame rate, e.g. 31.249936670193268, 76.92300920661702, 142.85290010558222, 166.67212398172063, 200.00495922941204, 38.46150460330851, and so on, with some values repeated a few times. The frame rate of the stored video is 25. So what is the actual frame rate at which it is being read?
You can get the FPS (frames per second) using the code below:
import cv2
cam = cv2.VideoCapture('video2.avi')
fps = cam.get(cv2.CAP_PROP_FPS)
I'm not certain, but I think this might come down to your timing method. I don't think Python's time.time() method guarantees enough precision to provide the real-time profiling information you desire.
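As far as I know, when reading from a file cv2.VideoCapture.read() simply decodes frames as fast as it can; it is not paced to the stored 25 fps, which is why the per-frame numbers vary and average well above 25. For a stable estimate of the effective read rate, one option is to average over many frames using time.perf_counter(), which has higher resolution than time.time(). A minimal sketch, reusing the 'video2.avi' name from the question:
import time
import cv2

cam = cv2.VideoCapture('video2.avi')
n_frames = 0
t0 = time.perf_counter()  # higher-resolution clock than time.time()
while True:
    ok, img = cam.read()
    if not ok:  # end of file or read error
        break
    n_frames += 1
elapsed = time.perf_counter() - t0
cam.release()
print("Read %d frames at an average of %.2f fps" % (n_frames, n_frames / elapsed))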
I am trying to show images one after another in a loop in one figure, meaning that after image 1 is shown, image 2 should be shown in the same figure a few seconds later.
I have the following code so far:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

for i in range(1, 4):
    PATH = "kodim01.png"
    N = "%02d" % i
    print(PATH.replace("01", N))
    image = mpimg.imread(PATH)  # images are color images
    plt.show()
    plt.imshow(image)
However, it shows one image (the first image) three times: although the printed path changes, the displayed image does not change.
How can I 1) show all the images in one figure, one after another, with each successive image replacing the previous one (e.g. with a 1-second delay between images), and 2) show all the images, rather than repeating just one?
Thanks
Using the %matplotlib inline backend
The %matplotlib inline backend displays the matplotlib plots as png images. You can use IPython.display to display the image and in this case display another image after some time.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from IPython import display
import time
%matplotlib inline

PATH = "../data/kodim{0:02d}.png"

for i in range(1, 4):
    p = PATH.format(i)
    # print(p)
    image = mpimg.imread(p)  # images are color images
    plt.gca().clear()
    plt.imshow(image)
    display.display(plt.gcf())
    display.clear_output(wait=True)
    time.sleep(1.0)  # wait one second
Note that in this case you are not showing the image in the same figure, but rather replacing the image of the old figure with the image of a new figure.
Using the %matplotlib notebook backend
You may directly use an interactive backend to show your plot. This allows you to animate the plot using matplotlib.animation.FuncAnimation.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.animation as animation
%matplotlib notebook

PATH = "../data/kodim{0:02d}.png"

def update(i):
    p = PATH.format(i)
    # print(p)
    image = mpimg.imread(p)  # images are color images
    plt.gca().clear()
    plt.imshow(image)

ani = animation.FuncAnimation(plt.gcf(), update, range(1, 4), interval=1000, repeat=False)
plt.show()
Here is my solution:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from time import sleep
%matplotlib inline

for i in range(1, 3):
    PATH = "kodim01.png"
    N = "%02d" % i
    print(PATH.replace("01", N))
    image = mpimg.imread(PATH.replace("01", N))
    # plt.show()
    plt.imshow(image)
    sleep(5)  # in seconds
The more coherent way to do this would be to:
use os.listdir() to obtain all the image names in a variable, for example images = os.listdir(path). images will then hold the names of all the images in the directory pointed to by path.
Then loop through the images like so:
for image in images:
    cur_img = mpimg.imread(image)
    ...
    plt.imshow(cur_img)
    sleep(5)  # in seconds
The reason I don't like the string manipulation is that it is a short-term fix. It's nicer to do it this way so that it works in a more general setting; a fuller sketch follows below.
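A self-contained version of that idea, combining os.listdir with the IPython.display approach from the first answer (the images/ directory name here is hypothetical):
import os
import time

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from IPython import display

image_dir = "images"  # hypothetical directory containing the .png files
names = sorted(f for f in os.listdir(image_dir) if f.endswith(".png"))

for name in names:
    cur_img = mpimg.imread(os.path.join(image_dir, name))
    plt.gca().clear()
    plt.imshow(cur_img)
    display.display(plt.gcf())       # render the current figure inline
    display.clear_output(wait=True)  # replace it when the next one arrives
    time.sleep(1.0)                  # delay between images, in seconds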
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

images = []
for img_path in glob.glob('folder/*.png'):
    images.append(mpimg.imread(img_path))

plt.figure(figsize=(20, 10))
columns = 5
for i, image in enumerate(images):
    plt.subplot(len(images) // columns + 1, columns, i + 1)
    plt.imshow(image)
where 'folder' is a directory containing images with the .png extension
Question about framerates on the picamera v2:
According to the documentation of picamera, the following framerates are feasible for this hardware:
#  Resolution  Aspect Ratio  Framerates  Video  Image  FoV      Binning
1  1920x1080   16:9          0.1-30fps   x             Partial  None
2  3280x2464   4:3           0.1-15fps   x      x      Full     None
3  3280x2464   4:3           0.1-15fps   x      x      Full     None
4  1640x1232   4:3           0.1-40fps   x             Full     2x2
5  1640x922    16:9          0.1-40fps   x             Full     2x2
6  1280x720    16:9          40-90fps    x             Partial  2x2
7  640x480     4:3           40-90fps    x             Partial  2x2
However, when gathering images with the capture_sequence method (which in the documentation is referred to as the fastest method) I don't get close to these numbers.
For 1280x720 it maxes out at 25 fps; at 640x480 it maxes out close to 60 fps.
The calculations I'm performing are irrelevant, i.e. commenting them out doesn't make a difference (they are fast enough not to be the cause of the issue).
If somebody sees a flaw in what I'm trying to do, or a way to increase the framerate, I'd appreciate the help.
import io
import time
import picamera
#import multiprocessing
from multiprocessing.pool import ThreadPool
#import threading
import cv2
#from PIL import Image
from referenceimage import ReferenceImage
from detectobject_stream import detectobject_stream
from collections import deque
from datawriter import DataWriter

backgroundimage = ReferenceImage()
threadn = cv2.getNumberOfCPUs()
pool = ThreadPool(processes=threadn)
pending = deque()
Numberofimages = 500
starttime = time.time()
#datawrite = DataWriter()
#datawrite.start()

def outputs():
    stream = io.BytesIO()
    Start = True
    global backgroundimage
    for i in range(Numberofimages):
        yield stream
        #print(time.time() - starttime)
        #start = time.time()
        while len(pending) > 0 and pending[0].ready():
            timestamps = pending.popleft().get()
            #print(timestamps)
        if len(pending) < threadn:
            stream.seek(0)
            task = pool.apply_async(detectobject_stream, (stream.getvalue(), backgroundimage, Start, 0))
            pending.append(task)
            Start = False
        stoptime = time.time()
        print(stoptime - start)
        stream.seek(0)
        stream.truncate()
        #print(i)

with picamera.PiCamera() as camera:
    #camera.resolution = (640, 480)
    camera.resolution = (1280, 720)
    camera.framerate = 60
    camera.start_preview()
    time.sleep(2)
    start = time.time()
    camera.capture_sequence(outputs(), format='bgr', use_video_port=True)
    finish = time.time()
    print('Captured images at %.2ffps' % (Numberofimages / (finish - start)))
Thanks in advance.
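One way to narrow this down would be to time capture_sequence with a generator that does nothing except recycle the stream, which isolates the raw camera throughput from the thread pool and the processing code. A minimal sketch under that assumption, using the same resolution and framerate settings as above:
import io
import time
import picamera

NUM_IMAGES = 200

def outputs(stream):
    # Yield the same in-memory stream repeatedly, discarding each frame.
    for _ in range(NUM_IMAGES):
        yield stream
        stream.seek(0)
        stream.truncate()

with picamera.PiCamera() as camera:
    camera.resolution = (1280, 720)
    camera.framerate = 60
    time.sleep(2)  # let exposure and gain settle
    stream = io.BytesIO()
    start = time.time()
    camera.capture_sequence(outputs(stream), format='bgr', use_video_port=True)
    elapsed = time.time() - start
    print('Raw capture rate: %.2f fps' % (NUM_IMAGES / elapsed))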
I am looking at two scenarios building a model using scikit-learn and I can not figure out why one of them is returning a result that is so fundamentally different than the other. The only thing different between the two cases (that I know of) is that in one case I am one-hot-encoding the categorical variables all at once (on the whole data) and then splitting between training and test. In the second case I am splitting between training and test and then one-hot-encoding both sets based off of the training data.
The latter case is technically better for judging the generalization error of the process, but this case is returning a normalized gini that is dramatically different (and bad: essentially no model) compared to the first case. I know the first case gini (~0.33) is in line with a model built on this data.
Why is the second case returning such a different gini? FYI, the data set contains a mix of numeric and categorical variables.
Method 1 (one-hot encode the entire data set and then split). This returns: Validation Sample Score: 0.3454355044 (normalized gini).
from sklearn.cross_validation import StratifiedKFold, KFold, ShuffleSplit, train_test_split, PredefinedSplit
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
from sklearn.feature_extraction import DictVectorizer as DV
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from scipy.stats import randint, uniform
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

def gini(solution, submission):
    df = zip(solution, submission, range(len(solution)))
    df = sorted(df, key=lambda x: (x[1], -x[2]), reverse=True)
    rand = [float(i + 1) / float(len(df)) for i in range(len(df))]
    totalPos = float(sum([x[0] for x in df]))
    cumPosFound = [df[0][0]]
    for i in range(1, len(df)):
        cumPosFound.append(cumPosFound[len(cumPosFound) - 1] + df[i][0])
    Lorentz = [float(x) / totalPos for x in cumPosFound]
    Gini = [Lorentz[i] - rand[i] for i in range(len(df))]
    return sum(Gini)

def normalized_gini(solution, submission):
    return gini(solution, submission) / gini(solution, solution)

# Normalized gini scorer
gini_scorer = metrics.make_scorer(normalized_gini, greater_is_better=True)

if __name__ == '__main__':
    dat = pd.read_table('/home/jma/Desktop/Data/Kaggle/liberty/train.csv', sep=",")
    y = dat[['Hazard']].values.ravel()
    dat = dat.drop(['Hazard', 'Id'], axis=1)
    folds = train_test_split(range(len(y)), test_size=0.30, random_state=15)  # 30% test
    # First one-hot encode and make a pandas DataFrame
    dat_dict = dat.T.to_dict().values()
    vectorizer = DV(sparse=False)
    vectorizer.fit(dat_dict)
    dat = vectorizer.transform(dat_dict)
    dat = pd.DataFrame(dat)
    train_X = dat.iloc[folds[0], :]
    train_y = y[folds[0]]
    test_X = dat.iloc[folds[1], :]
    test_y = y[folds[1]]
    rf = RandomForestRegressor(n_estimators=1000, n_jobs=1, random_state=15)
    rf.fit(train_X, train_y)
    y_submission = rf.predict(test_X)
    print("Validation Sample Score: {:.10f} (normalized gini).".format(normalized_gini(test_y, y_submission)))
Method 2 (first split and then one-hot encode). This returns: Validation Sample Score: 0.0055124452 (normalized gini).
from sklearn.cross_validation import StratifiedKFold, KFold, ShuffleSplit, train_test_split, PredefinedSplit
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
from sklearn.feature_extraction import DictVectorizer as DV
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from scipy.stats import randint, uniform
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

def gini(solution, submission):
    df = zip(solution, submission, range(len(solution)))
    df = sorted(df, key=lambda x: (x[1], -x[2]), reverse=True)
    rand = [float(i + 1) / float(len(df)) for i in range(len(df))]
    totalPos = float(sum([x[0] for x in df]))
    cumPosFound = [df[0][0]]
    for i in range(1, len(df)):
        cumPosFound.append(cumPosFound[len(cumPosFound) - 1] + df[i][0])
    Lorentz = [float(x) / totalPos for x in cumPosFound]
    Gini = [Lorentz[i] - rand[i] for i in range(len(df))]
    return sum(Gini)

def normalized_gini(solution, submission):
    return gini(solution, submission) / gini(solution, solution)

# Normalized gini scorer
gini_scorer = metrics.make_scorer(normalized_gini, greater_is_better=True)

if __name__ == '__main__':
    dat = pd.read_table('/home/jma/Desktop/Data/Kaggle/liberty/train.csv', sep=",")
    y = dat[['Hazard']].values.ravel()
    dat = dat.drop(['Hazard', 'Id'], axis=1)
    folds = train_test_split(range(len(y)), test_size=0.3, random_state=15)  # 30% test
    # First split
    train_X = dat.iloc[folds[0], :]
    train_y = y[folds[0]]
    test_X = dat.iloc[folds[1], :]
    test_y = y[folds[1]]
    # One-hot encode the training X and transform the test X
    dat_dict = train_X.T.to_dict().values()
    vectorizer = DV(sparse=False)
    vectorizer.fit(dat_dict)
    train_X = vectorizer.transform(dat_dict)
    train_X = pd.DataFrame(train_X)
    dat_dict = test_X.T.to_dict().values()
    test_X = vectorizer.transform(dat_dict)
    test_X = pd.DataFrame(test_X)
    rf = RandomForestRegressor(n_estimators=1000, n_jobs=1, random_state=15)
    rf.fit(train_X, train_y)
    y_submission = rf.predict(test_X)
    print("Validation Sample Score: {:.10f} (normalized gini).".format(normalized_gini(test_y, y_submission)))
While the previous comments correctly suggest it is best to map over your entire feature space first, in your case both the Train and Test contain all of the feature values in all of the columns.
If you compare the vectorizer.vocabulary_ between the two versions, they are exactly the same, so there is no difference in mapping. Hence, it cannot be causing the problem.
The reason Method 2 fails is that your dat_dict gets re-sorted by the original index when you execute this command:
dat_dict=train_X.T.to_dict().values()
In other words, train_X has a shuffled index going into this line of code. When you turn it into a dict, the dict order re-sorts into the numerical order of the original index. This causes your train and test data to become completely de-correlated from y.
Method 1 doesn't suffer from this problem, because you shuffle the data after the mapping.
You can fix the issue by adding a .reset_index() both times you assign the dat_dict in Method 2, e.g.,
dat_dict=train_X.reset_index(drop=True).T.to_dict().values()
This ensures the data order is preserved when converting to a dict.
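A toy illustration of the re-sorting (the DataFrame here is hypothetical; the comments describe Python 2 behaviour, where the original question was run, since Python 3.7+ dicts preserve insertion order):
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30]})
shuffled = df.iloc[[2, 0, 1]]  # row order after a "split": index 2, 0, 1

# The dict is keyed by the *original* index values.
print(list(shuffled.T.to_dict().keys()))
# Python 2 dicts were unordered and iterated small integer keys in
# sorted order, i.e. [0, 1, 2], silently re-sorting the rows and
# de-correlating them from a separately shuffled y.

# reset_index(drop=True) renumbers the rows 0..n-1 in their current
# order, so iteration order can no longer disagree with row order.
print(list(shuffled.reset_index(drop=True).T.to_dict().values()))
# [{'a': 30}, {'a': 10}, {'a': 20}], shuffled order preserved.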
When I add that bit of code, I get the following results:
- Method 1: Validation Sample Score: 0.3454355044 (normalized gini)
- Method 2: Validation Sample Score: 0.3438430991 (normalized gini)
I can't get your code to run, but my guess is that in the test dataset you're not seeing all the levels of some of the categorical variables, so if you calculate your dummy variables on that data alone, you'll actually end up with different columns. Alternatively, you may have the same columns but in a different order.
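For reference, a toy illustration of how DictVectorizer treats a level that appears only in the test split: as long as both splits are transformed with the same fitted vectorizer (which Method 2 already does), the columns stay aligned and an unseen level simply maps to all zeros:
from sklearn.feature_extraction import DictVectorizer

train = [{'color': 'red', 'size': 1}, {'color': 'blue', 'size': 2}]
test = [{'color': 'green', 'size': 3}]  # 'green' never seen in training

v = DictVectorizer(sparse=False)
print(v.fit_transform(train))  # [[0. 1. 1.], [1. 0. 2.]]
print(v.feature_names_)        # ['color=blue', 'color=red', 'size']
print(v.transform(test))       # [[0. 0. 3.]], unseen level dropped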
I am trying to write a program for 1D FDTD wave propagation, and everything is fine except the interval keyword argument of FuncAnimation. Whenever I increase the interval beyond 10 (19, to be precise), the animation figure closes before running (it exits as soon as it pops up). I can easily slow down the animation using time.sleep, but it would be great if I could understand this. Can somebody please explain how the interval argument works? Is it in any way related to the time required by the frame-update function called by FuncAnimation? Also, what is blit for?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

def main():
    # Defining dimensions
    xdim = 720
    time_tot = 500
    xsource = xdim // 2  # integer division, so it can be used as an index

    # Stability factor
    S = 1
    # Speed of light
    c = 1
    epsilon0 = 1
    mu0 = 1

    delta = 1  # Space step
    deltat = S * delta / c  # Time step

    Ez = np.zeros(xdim)  # Arrays to store the electric and magnetic fields
    Hy = np.zeros(xdim)

    epsilon = epsilon0 * np.ones(xdim)  # Permittivity and permeability values
    mu = mu0 * np.ones(xdim)

    fig, axis = plt.subplots(1, 1)
    axis.set_xlim(len(Ez))
    axis.set_ylim(-3, 3)
    axis.set_title("E Field")
    line, = axis.plot([], [])

    def init():
        line.set_data([], [])
        return line,

    def animate(n, *args, **kwargs):
        Hy[0:xdim-1] = Hy[0:xdim-1] + (delta/(delta*mu[0:xdim-1]))*(Ez[1:xdim]-Ez[0:xdim-1])
        Ez[1:xdim] = Ez[1:xdim] + (delta/(delta*epsilon[1:xdim]))*(Hy[1:xdim]-Hy[0:xdim-1])
        #Ez[xsource] = Ez[xsource] + 30.0*(1/np.sqrt(2*np.pi))*np.exp(-(n-80.0)**2/(100))
        Ez[xsource] = np.sin(2*n*np.pi/180)
        ylims = axis.get_ylim()
        if abs(np.amax(Ez)) > ylims[1]:  # Scaling axis
            axis.set_ylim(-(np.amax(Ez)+2), np.amax(Ez)+2)
        line.set_data(np.arange(len(Ez)), Ez)
        return line,

    ani = animation.FuncAnimation(fig, animate, init_func=init, frames=time_tot,
                                  interval=10, blit=False, repeat=False)
    fig.show()

if __name__ == "__main__":
    main()
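For what it's worth, a minimal self-contained sketch of how the two arguments behave: interval is the delay in milliseconds that FuncAnimation waits between drawing frames (independent of the time your update function takes, which runs in addition to it), and blit=True redraws only the artists returned by the update function, typically together with an init_func. Keeping a reference to the returned animation object and calling the blocking plt.show() (rather than the non-blocking fig.show()) also keeps the window from closing immediately:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.2, 1.2)
x = np.linspace(0, 2 * np.pi, 200)
line, = ax.plot([], [])

def init():
    line.set_data([], [])
    return line,

def update(n):
    line.set_data(x, np.sin(x + 0.1 * n))
    return line,  # with blit=True only these artists are redrawn

# interval=1000 -> aim for one frame per second; keep a reference to
# ani, or the animation may be garbage-collected and stop.
ani = animation.FuncAnimation(fig, update, frames=100, init_func=init,
                              interval=1000, blit=True)
plt.show()  # blocks, unlike fig.show(), so the window stays open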
I am trying to identify the type of noise based on that article:
Model selection with Probabilistic (PCA) and Factor Analysis (FA)
I am using scikit-learn-0.14.1.win32-py2.7 on win8 64bit
I know that it refers to version 0.15; however, the version 0.14 documentation mentions that the score method is available for PCA, so I guess it should normally work:
sklearn.decomposition.ProbabilisticPCA
The problem is that no matter which PCA I use with cross_val_score, I always get a TypeError saying that the estimator PCA does not have a score method:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator PCA(copy=True, n_components=None, whiten=False) does not.
Any ideas why that is happening?
Many thanks in advance
Christos
X has 1000 samples of 40 features each. Here is a portion of the code:
import numpy as np
import csv
from scipy import linalg
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.cross_validation import cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.covariance import ShrunkCovariance, LedoitWolf

# Read in the training data
train_path = '<train data path>/train.csv'
reader = csv.reader(open(train_path, "rb"), delimiter=',')
train = list(reader)
X = np.array(train).astype('float')

n_samples = 1000
n_features = 40
n_components = np.arange(0, n_features, 4)

def compute_scores(X):
    pca = PCA()
    pca_scores = []
    for n in n_components:
        pca.n_components = n
        pca_scores.append(np.mean(cross_val_score(pca, X, n_jobs=1)))
    return pca_scores

pca_scores = compute_scores(X)
n_components_pca = n_components[np.argmax(pca_scores)]
OK, I think I found the problem: it is not working with PCA, but it does work with ProbabilisticPCA.
However, by not providing a cv number, cross_val_score automatically uses 3-fold cross-validation,
which created 3 folds with sizes 334, 333 and 333 (my initial training set contains 1000 samples).
Since numpy.mean cannot average the per-fold score arrays when they have different sizes (334 vs. 333), Python raises an exception.
Thanks
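For reference, a minimal sketch of the working variant consistent with the diagnosis above, assuming scikit-learn 0.14, where ProbabilisticPCA exposes a per-sample score() (later releases replaced it with a scalar PCA.score()); with 1000 samples, cv=4 gives four equal folds of 250, avoiding the ragged-size problem when averaging:
import numpy as np
from sklearn.decomposition import ProbabilisticPCA  # scikit-learn 0.14
from sklearn.cross_validation import cross_val_score

n_components = np.arange(0, 40, 4)

def compute_scores(X):
    ppca = ProbabilisticPCA()
    scores = []
    for n in n_components:
        ppca.n_components = n
        # cv=4 -> equal fold sizes, so the per-fold score arrays can
        # be stacked and averaged by np.mean without an exception.
        scores.append(np.mean(cross_val_score(ppca, X, cv=4)))
    return scores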