Vectorizing Files using sklearn

I am trying to read 100 training files and vectorize them using sklearn. The contents of these files are words representing system calls. Once vectorized, I would like to print the vectors.
My first attempt was the following:
import re
import os
import sys
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import numpy as np
import numpy.linalg as LA
trainingdataDir = 'C:\data\Training data'
def readfile():
    for file in os.listdir(trainingdataDir):
        trainingfiles = os.path.join(trainingdataDir, file)
        if os.path.isfile(trainingfiles):
            data = open(trainingfiles, "rb").read()
    return data
train_set = [readfile()]
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
However, this only returns the vector for the last file.
I concluded that the print function should be placed in the for loop. So the second attempt:
def readfile():
    for file in os.listdir(trainingdataDir):
        trainingfiles = os.path.join(trainingdataDir, file)
        if os.path.isfile(trainingfiles):
            data = open(trainingfiles, "rb").read()
            trainVectorizerArray = vectorizer.fit_transform(data).toarray()
            print 'Fit Vectorizer to train set', trainVectorizerArray
However, this does not print anything.
Could you please assist me with this? Why am I not able to see the vectors being printed out?

The issue was that the list of documents passed to the vectorizer was empty (in the second attempt, readfile() is also never called, so nothing is printed). I managed to vectorize a set of 100 files: I first opened each file, read it, and appended its contents to a list. That list is then passed to 'tfidf_vectorizer':
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
trainingdataDir = 'C:\\data\\Training data'
tfidf_vectorizer = TfidfVectorizer()
def readfile(trainingdataDir):
    # Collect the contents of every file into one list (one document per file)
    train_set = []
    for file in os.listdir(trainingdataDir):
        trainingfiles = os.path.join(trainingdataDir, file)
        if os.path.isfile(trainingfiles):
            with open(trainingfiles, 'r') as data:
                # Python 2: decode the raw bytes into a unicode string
                data_set = data.read().decode('utf-8')
            train_set.append(data_set)
    return train_set
# Fit once on the whole list so that each file becomes one row of the matrix
tfidf_matrix_train = tfidf_vectorizer.fit_transform(readfile(trainingdataDir))
print 'Fit Vectorizer to train set', tfidf_matrix_train
print "cosine scores ==> ", cosine_similarity(tfidf_matrix_train[0:1], tfidf_matrix_train)
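For comparison, here is a minimal Python 3 sketch of the same approach (not the answer's exact code): read every file into a list first, then call fit_transform once on the whole list so that each file becomes one row. Passing a single string, as in the second attempt, is not treated as one document: older scikit-learn versions iterate over its characters, and recent versions reject it with a ValueError. The function names are illustrative, files are read in sorted order so rows line up predictably with file names, and UTF-8 file contents are assumed.

```python
import os

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def read_documents(directory):
    """Return (file names, file contents) for every regular file in `directory`."""
    names, docs = [], []
    for name in sorted(os.listdir(directory)):  # sorted: stable row order
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                names.append(name)
                docs.append(f.read())
    return names, docs


def vectorize_directory(directory):
    """Fit TF-IDF on all files at once and score file 0 against every file."""
    names, docs = read_documents(directory)
    vectorizer = TfidfVectorizer()
    # One call on the whole list: each string in `docs` becomes one row.
    matrix = vectorizer.fit_transform(docs)
    scores = cosine_similarity(matrix[0:1], matrix)
    return names, vectorizer, matrix, scores


def print_vectors(names, matrix):
    """Print one dense vector per file, labelled with its file name."""
    for name, row in zip(names, matrix.toarray()):
        print(name, row)
```

Typical use would be names, vectorizer, matrix, scores = vectorize_directory(r'C:\data\Training data') followed by print_vectors(names, matrix). Because TfidfVectorizer L2-normalizes each row by default (norm='l2'), the cosine scores are the same as plain dot products between rows.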

Related

Why is layer.get_weights() returning lists of length 1 and 4?

I am trying to get the weights and biases of all convolutional layers of a ResNet50 model for one of my assignments. I learned that layer.get_weights() returns them as a list of two elements: the weights of the layer are stored at layer.get_weights()[0] and the bias at layer.get_weights()[1]. Here is the code I used:
import tensorflow as tf
import source
from source import models
from source.utils.image import read_image_bgr, preprocess_image, resize_image
from source.utils.visualization import draw_box, draw_caption
from source.utils.colors import label_color
from source.models import retinanet
import warnings
warnings.filterwarnings("ignore")
from tensorflow import ConfigProto
import numpy as np
import os
import argparse
import keras
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Activation, Dropout
from keras.models import Model
ap = argparse.ArgumentParser()
ap.add_argument("-weight", "--weight_file", type=str, default="trained_model.h5", help="Path to the weights file")
ap.add_argument("-backbone", "--backbone", type=str, default="resnet50", help="Backbone model name")
args = vars(ap.parse_args())
# fetching a tensorflow session
def get_session():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))
keras.backend.tensorflow_backend.set_session(get_session())
model = str(args.get("weight_file", False))
backbone = str(args.get("backbone", False))
model = models.load_model(str(model), backbone_name=str(backbone))
# model is the resnet50 model
for layer in model.layers:
    print('layer name', layer.name)
    we = layer.get_weights()
    print('len(we)', len(we))
But in my case, I get a length of 1 for some layers and a length of 4 for others, which is not what I expected. I am really confused at this point; any ideas or suggestions would be really helpful.
Thanks in advance.

The get_weights() function returns both the trainable and the non-trainable parameters of a layer. A BatchNormalization layer has 4 parameter arrays (gamma, beta, moving mean and moving variance), which explains the outputs of length 4, since ResNet blocks contain batch normalization. As far as I am aware, ResNet models do not use a bias term in their convolutional layers because the batch normalization that follows makes it redundant, which would explain the outputs of length 1.
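To see this pattern across the whole model, and to read kernels and optional biases without assuming a fixed list length, a small sketch like the following may help. It only relies on model.layers and layer.get_weights(), which Keras models provide; the function names are hypothetical. Convolution layers built with use_bias=False return one array, layers without weights (such as Activation) return an empty list, and the BatchNormalization unpacking assumes the default center=True and scale=True (otherwise fewer arrays are returned).

```python
from collections import Counter


def weight_array_counts(model):
    """Count layers by (class name, number of arrays returned by get_weights())."""
    counts = Counter()
    for layer in model.layers:
        counts[(type(layer).__name__, len(layer.get_weights()))] += 1
    return counts


def conv_kernel_and_bias(layer):
    """Return (kernel, bias) for a convolution layer; bias is None when it has no bias."""
    weights = layer.get_weights()
    kernel = weights[0]
    bias = weights[1] if len(weights) > 1 else None
    return kernel, bias


def batchnorm_arrays(layer):
    """Name the four arrays of a BatchNormalization layer (default center/scale)."""
    gamma, beta, moving_mean, moving_variance = layer.get_weights()
    return {
        "gamma": gamma,
        "beta": beta,
        "moving_mean": moving_mean,
        "moving_variance": moving_variance,
    }
```

Calling weight_array_counts(model) on the loaded model should show which layer types produce the lengths of 1 and 4.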

Getting "LogicError: explicit_context_dependent failed: invalid device context - no currently active context?" when running TensorRT in ROS

I have inference code written with the TensorRT Python API. I want to run this code in ROS, but I get the error below when trying to allocate buffers:
LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
The code works well outside the ROS package. A ROS node publishes an image, and the code below receives the image and runs inference on it:
#!/usr/bin/env python
# Revision $Id$
import rospy
from std_msgs.msg import String
from cv_bridge import CvBridge
import cv2
import os
import numpy as np
import argparse
import torch
from torch.autograd import Variable
from torchvision import transforms
import torch.nn.functional as F
import torch._utils
from PIL import Image
from sensor_msgs.msg import Image as ImageMsg
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import random
import sys
import common
import shutil
from itertools import chain
TRT_LOGGER = trt.Logger()
# cuda.init()
class ModelData(object):
    def __init__(self):
        self.MODEL_PATH = "./MobileNet_v2_Final.onnx"  ## converted model from pytorch to onnx
        self.batch_size = 1
        self.num_classes = 3
        self.engine = build_int8_engine(self.MODEL_PATH, self.batch_size)
        self.context = self.engine.create_execution_context()
        ### ROS PART
        self.bridge_ROS = CvBridge()
        self.loop_rate = rospy.Rate(1)
        self.pub = rospy.Publisher('Image_Label', String, queue_size=1)
        print('INIT Successfully')
    def callback(self, msg):
        rospy.loginfo('Image received...')
        cv_image = self.bridge_ROS.imgmsg_to_cv2(msg, desired_encoding="passthrough")
        inputs, outputs, bindings, stream = common.allocate_buffers(self.context.engine)
        [output] = common.do_inference(self.context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream, batch_size=self.batch_size)
    def listener(self):
        rospy.Subscriber("chatter", ImageMsg, self.callback)
        while not rospy.is_shutdown():
            rospy.loginfo('Getting image...')
            self.loop_rate.sleep()
def build_int8_engine(model_file, batch_size=32):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = batch_size
        builder.max_workspace_size = common.GiB(1)
        with open(model_file, 'rb') as model:
            parser.parse(model.read())
        return builder.build_cuda_engine(network)
if __name__ == '__main__':
    rospy.init_node("listener", anonymous=True)
    infer = ModelData()
    infer.listener()
The error is raised at stream = cuda.Stream() in the common module below:
#!/usr/bin/env python
# Revision $Id$
from itertools import chain
import argparse
import os
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import tensorrt as trt
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem
    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
    def __repr__(self):
        return self.__str__()
# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    # ctx.pop()  # `ctx` is never defined in this module,
    # del ctx    # so these two lines would raise a NameError
    return inputs, outputs, bindings, stream
# This function is generalized for multiple inputs/outputs.
# inputs and outputs are expected to be lists of HostDeviceMem objects.
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # [cuda.memcpy_htod(inp.device, inp.host) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # context.execute(batch_size=batch_size, bindings=bindings)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # [cuda.memcpy_dtoh(out.host, out.device) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
More info:
TensorRT: 6.1.5
Python: 2.7
rosversion: 1.14.3
rosdistro: melodic

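You need to explicitly create the CUDA device and context in the worker thread, i.e. your callback function, instead of using import pycuda.autoinit in the main thread. A CUDA context is only current on the thread that made it current, and rospy runs subscriber callbacks on their own threads, so the context created by pycuda.autoinit is not active inside your callback. For example: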
import threading
import pycuda.driver as cuda
def callback():
    cuda.init()
    device = cuda.Device(0)  # enter your GPU id here
    ctx = device.make_context()  # creates the context and makes it current in this thread
    try:
        allocate_buffers()  # placeholder: load CUDA buffers or run any other CUDA or TensorRT operations
    finally:
        ctx.pop()  # very important: deactivate the context, even if an operation fails
if __name__ == "__main__":
    worker_thread = threading.Thread(target=callback)  # pass the function itself; do not call it here
    worker_thread.start()
    worker_thread.join()
Note: do not forget to remove import pycuda.autoinit from both modules.
This is also discussed in a question here
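A variation on the approach above (not part of the original answer): instead of creating a new context on every callback, you can create one context at startup, build the engine while it is current, pop it off the main thread, and push it around each callback. The helper below is a minimal sketch of that push/pop bracket. It assumes the object passed in behaves like a pycuda Context (push() makes it current on the calling thread, pop() deactivates the current context); active_cuda_context is a hypothetical name, and the sketch has only been checked with a stand-in object, not on a GPU.

```python
from contextlib import contextmanager


@contextmanager
def active_cuda_context(ctx):
    """Make `ctx` current on the calling thread for the duration of a with-block.

    `ctx` is assumed to be a pycuda.driver.Context created earlier, e.g. with
    cuda.Device(0).make_context(), and then popped so it is not current anywhere.
    """
    ctx.push()  # activate the context on this (callback) thread
    try:
        yield ctx
    finally:
        ctx.pop()  # always deactivate it, even if inference raised


# Hypothetical integration with the ModelData class from the question,
# assuming `import pycuda.autoinit` has been removed:
#
#   def __init__(self):
#       cuda.init()
#       self.cuda_ctx = cuda.Device(0).make_context()  # current on the main thread
#       self.engine = build_int8_engine(self.MODEL_PATH, self.batch_size)
#       self.context = self.engine.create_execution_context()
#       self.cuda_ctx.pop()                             # release it from the main thread
#
#   def callback(self, msg):
#       with active_cuda_context(self.cuda_ctx):
#           inputs, outputs, bindings, stream = common.allocate_buffers(self.engine)
#           ...
#
#   On shutdown: self.cuda_ctx.detach()
```

The try/finally matters: if an exception escaped without the pop(), the context would stay current on that rospy thread.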

Please initialize CUDA, as described in the answer above:
import pycuda.driver as cuda (and call cuda.init()) in main.py, before importing the modules that do the CUDA processing.

How to import data from a .csv file into Django SQLite using import.py

I want to import data from a CSV file, but my RFID hardware produces two separate entries for the same employee: one for Time IN and one for Time OUT.
WorkNames.csv
Because of these double entries, I want only one row per employee in the Employee class in models.py.
So please help with my METHOD 1 by suggesting code that combines the 2 rows into one while transferring the data into SQLite,
OR
help with my METHOD 2 by suggesting code to import the data from output.csv.
METHOD 1: I used import.py to import the data. The data is imported into SQLite, but with double entries.
import os
import sys
import csv
from datetime import datetime
project_dir = 'catalog'
sys.path.append(project_dir)
os.environ['DJANGO_SETTINGS_MODULE'] = 'locallibrary.settings'
import django
django.setup()
from catalog.models import employee1
data = csv.reader(open('locallibrary/WorkNames.csv'), delimiter=',')
i = 1
from datetime import datetime
for row in data:
    if row[0] == '':
        break
    else:
        Employee1 = employee1()
        Employee1.Date = row[0]
        Employee1.Name = row[1]
        Employee1.Number = row[2]
        if row[3] == "" or row[3] == 'Time IN':
            Employee1.Time_IN = '00:00:00 AM'
        else:
            Employee1.Time_IN = str(row[3])
        if row[4] == "" or row[4] == 'Time OUT':
            Employee1.Time_OUT = '00:00:00 AM'
        else:
            Employee1.Time_OUT = str(row[4])
        Employee1.save()
METHOD 2: To get Time IN and Time OUT in one row, I generated a new file, output.csv.
The file is generated properly, but the data is not imported from the CSV into SQLite.
import os
import sys
import csv
from datetime import datetime
project_dir = 'catalog'
sys.path.append(project_dir)
os.environ['DJANGO_SETTINGS_MODULE'] = 'locallibrary.settings'
import django
django.setup()
from catalog.models import employee1
data = csv.reader(open('locallibrary/WorkNames.csv'), delimiter=',')
from datetime import datetime
lines = list(data)
r = 5
c = 5
for i in range(r):
    for j in range(1, 5):
        if lines[0][j] == "":
            break
        if lines[i][2] == lines[j][2]:
            lines[j][3] = lines[i][3]
r = 5
c = 5
for i in range(r):
    for j in range(1, 5):
        if lines[i][3] != "" and lines[i][4] == "":
            lines[i][3] = '00:00:00 AM'
            lines[i][4] = '00:00:00 AM'
            break
writer = csv.writer(open('locallibrary/output.csv', 'w'))
for r in lines:
    writer.writerow(r)
open('locallibrary/output.csv').close()
data1 = csv.reader(open('locallibrary/WorkNames.csv'), delimiter=',')
i = 1
for row in data:
    if row['Date'] == '':
        break
    else:
        Employee1 = employee1()
        Employee1.Date = row['Date']
        Employee1.Name = row['Name']
        Employee1.Number = row['Number']
        Employee1.Time_IN = row['Time IN']
        Employee1.Time_OUT = row['Time OUT']
        Employee1.save()

Suggestion for your METHOD 1:
If there aren't too many rows in your CSV file (say, fewer than 10,000), you could use a Python dict to group the rows by employee number and date.
I modified a small part of your code:
... some code ...
record_dict = {}
for row in data:
    if row[0] == '':
        break
    record_date = row[0]
    emp_number = row[2]
    key = (emp_number, record_date)
    if key not in record_dict:
        record_dict[key] = employee1()
        record_dict[key].Date = row[0]
        record_dict[key].Name = row[1]
        record_dict[key].Number = row[2]
        record_dict[key].Time_IN = None
        record_dict[key].Time_OUT = None
    if row[3] not in ('', 'Time IN'):
        record_dict[key].Time_IN = row[3]
    if row[4] not in ('', 'Time OUT'):
        record_dict[key].Time_OUT = row[4]
for emp in record_dict.values():
    emp.save()
By the way: you should read PEP 8, the Python style guide, which shows the preferred way to write Python code.
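As for METHOD 2, the final loop cannot work as written: it iterates data, which list(data) has already consumed, so the loop body never runs; and even with a fresh csv.reader, each row is a list, so row['Date'] would raise a TypeError. Indexing rows by column name needs csv.DictReader, which takes its keys from the header line. Also, output.csv is written through a file object that is never closed (open('locallibrary/output.csv').close() opens and closes a different handle), so its contents may not be flushed yet when you read it back; writing it inside a with block avoids that. A minimal Python 3 sketch, assuming output.csv keeps the header Date,Name,Number,Time IN,Time OUT that your code indexes; read_punch_rows is a hypothetical helper, and the '00:00:00 AM' placeholder is copied from METHOD 1:

```python
import csv

PLACEHOLDER = "00:00:00 AM"  # same default time as METHOD 1


def read_punch_rows(fileobj):
    """Yield one dict per CSV row, keyed by the employee1 field names."""
    for row in csv.DictReader(fileobj):  # the header line supplies the keys
        if not row.get("Date"):
            break  # stop at the first row without a date, as in METHOD 1
        yield {
            "Date": row["Date"],
            "Name": row["Name"],
            "Number": row["Number"],
            # Missing or empty times fall back to the placeholder
            "Time_IN": row["Time IN"] or PLACEHOLDER,
            "Time_OUT": row["Time OUT"] or PLACEHOLDER,
        }


# Hypothetical use inside import.py, after django.setup():
#
#   with open('locallibrary/output.csv', newline='') as f:
#       for values in read_punch_rows(f):
#           employee1(**values).save()
```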

Show matplotlib plot inline in jupyter when plot is created in external function

I am running an inference algorithm and would like to show the likelihood function after each iteration. However, the plotting function is part of a package that I am importing. I've managed to cobble it together so that the plot is shown in an external GUI window using the tkAgg backend, but is there any way to make it show as an inline plot? Here is what I'm using now:
Minimal Working Example
Jupyter Code
%matplotlib inline
#import matplotlib
#matplotlib.use('tkAgg')
import matplotlib.pyplot as plt
import sys
import numpy as np
sys.path.append('/path/to/file')
#______________________________________________________________
import testclass
a = testclass.test()
a.iterator()
As can be seen below, this should iteratively plot a series of dots, updating the plot one dot at a time. When I run it inline, I only get the full plot after it has finished running.
Class Code
import numpy as np
import matplotlib
matplotlib.use('tkAgg')
import matplotlib.pyplot as plt
import time
class test(object):
    def __init__(self):
        self.x = np.random.randint(0, 50, size=5)
    def iterator(self):
        for i in range(5):
            self.plotter(i)
            st = time.time()
            while (time.time() - st) < 2:
                pass
    def plotter(self, i):
        if not hasattr(self, 'fig'):
            self.fig = plt.figure()
        else:
            plt.close(self.fig)
            self.fig = plt.figure()
        #plt.ion()
        self.fig.gca().plot(self.x[:i], 'o')
        self.fig.show()
Original Code
Notebook Code
import matplotlib
matplotlib.use('tkAgg')
import mypackage
class_instance = mypackage.myclass()
class_instance.fit(n_iterations=100)
The plotting function is a bound method of the class and is called by the fit method.
Plotting Function
def update_plot(self, r, LLst, kkk):
    if not hasattr(self, 'LL_fig'):
        self.LL_fig = plt.figure()
    else:
        plt.close(self.LL_fig)
        self.LL_fig = plt.figure()
    #plt.ion()
    #self.LL_fig.clf()
    ax = self.LL_fig.gca()
    ax.plot(LLst[1:], linestyle='-', marker='.')
    #plt.gca().set_xlim([0,np.max([50,kkk])])
    ax.set_xlim([0, np.max([50, kkk])])
    ax.set_xlabel('EM iter')
    ax.set_ylabel('$\mathcal{L}( \\theta )$')
    seaborn.despine(trim=True, offset=15)
    #plt.draw()
    self.LL_fig.show()
    #display.clear_output(wait=True)
    #display.display(plt.gcf())
    sys.stdout.write("\riter: %s || LL: %s || message: %s" % (kkk, np.round(LLst[-1], decimals=2), r['status']))
    sys.stdout.flush()
Also, if I don't close and 're-initialise' the figure each time, the plot starts coming up empty. Any help would be much appreciated!
Edit:
If I try using %matplotlib inline instead of the tkAgg backend, I get the following warning message:
UserWarning: matplotlib is currently using a non-GUI backend, so cannot show the figure
"matplotlib is currently using a non-GUI backend, "

Use the line magic %matplotlib inline (if you aren't familiar with magics, just place it on a line of its own in one of your cells).
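The commented-out display.clear_output(wait=True) and display.display(plt.gcf()) calls in update_plot point at the usual way to animate under the inline backend: keep a single figure, redraw its axes, and replace the cell output on every iteration instead of calling fig.show(), which appears to be what triggers the non-GUI backend warning. A minimal sketch, assuming IPython is available; plot_progress is a hypothetical name, and time.sleep stands in for the busy-wait loop in the test class:

```python
import time

import matplotlib.pyplot as plt
from IPython.display import clear_output, display


def plot_progress(values, delay=0.5):
    """Redraw a growing plot of `values`, one more point per step, in the cell output."""
    fig, ax = plt.subplots()
    for i in range(1, len(values) + 1):
        ax.clear()                 # reuse the same figure instead of closing it
        ax.plot(values[:i], "o")
        clear_output(wait=True)    # replace the previous frame once the new one is ready
        display(fig)               # render the current state of the figure
        time.sleep(delay)
    plt.close(fig)                 # keep the inline backend from showing it again at cell end
    return fig
```

Clearing and redrawing the axes of one figure also sidesteps the empty-plot problem mentioned in the question, since no new figure has to be created on each iteration.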

How to get the ATR of a smart card from an HID OMNIKEY reader

I want to get the ATR of a smart card. I am using an HID OMNIKEY 5321 reader and following this guide: "http://pyscard.sourceforge.net/user-guide.html#requesting-any-card"
So far I have tried:
>>> from smartcard.CardType import AnyCardType
>>> from smartcard.CardRequest import CardRequest
>>> from smartcard.util import toHexString
>>> cardtype = AnyCardType()
>>> cardrequest = CardRequest( timeout=1, cardType=cardtype )
>>> cardservice = cardrequest.waitforcard()
>>> cardservice.connection.connect()
I am getting an error at
cardservice.connection.connect()
The error looks like this:
raise CardConnectionException('Unable to connect with ptotocol[pcscprotocol] + . '+ScardGetErrorMessage(hresult)
CardConnectionException: Unable to conenct the card with T0 or T1 . Card is not responding to reset.

This happens because you don't specify which reader to connect to:
from smartcard.System import readers
r = readers()
# r[i] selects the i-th reader in the list returned by readers()
cardservice.connection = r[0].createConnection()
cardservice.connection.connect()
A simple example:
from __future__ import print_function
from smartcard.Exceptions import NoCardException
from smartcard.System import readers
from smartcard.util import toHexString
for reader in readers():
    try:
        connection = reader.createConnection()
        connection.connect()
        print(reader, toHexString(connection.getATR()))
    except NoCardException:
        print(reader, 'no card inserted')
import sys
if 'win32' == sys.platform:
    print('press Enter to continue')
    sys.stdin.read(1)
Another example, which lets you select the reader:
from __future__ import print_function
from smartcard.Exceptions import NoCardException
from smartcard.System import readers
from smartcard.util import toHexString
from smartcard.CardType import AnyCardType
from smartcard.CardRequest import CardRequest
cardtype = AnyCardType()
r = readers()
cardrequest = CardRequest(timeout=10, cardType=cardtype)
cardservice = cardrequest.waitforcard()
print('Available Readers:')
for i in range(len(r)):
    print('[', i + 1, ']', r[i])
if len(r) < 1:
    print("\nNO AVAILABLE READERS!\n")
else:
    print("Select your reader: (Ctrl+C to exit)")
    my_input = input()
    # Clamp the 1-based choice to a valid 0-based index into r
    selectReader = max(0, min(int(my_input) - 1, len(r) - 1))
    print('Selected: ', r[selectReader])
    cardservice.connection = r[selectReader].createConnection()
    cardservice.connection.connect()
    try:
        print('Card ATR:', toHexString(cardservice.connection.getATR()))
    except Exception:
        print("Cannot get ATR")
More information:
https://pyscard.sourceforge.io/pyscard-framework.html#framework-samples
https://github.com/LudovicRousseau/pyscard
https://pyscard.sourceforge.io/user-guide.html
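If you want the ATRs as data rather than printed lines, the simple example can be wrapped in a function. A sketch, assuming only the createConnection(), connect(), getATR() and disconnect() methods that pyscard readers and connections provide; collect_atrs is a hypothetical name, and the hex formatting imitates the uppercase, space-separated style of toHexString rather than calling it, so the function can be exercised without a reader attached:

```python
def collect_atrs(reader_list, expected_errors=(Exception,)):
    """Map each reader's name to its ATR as a hex string, or None if no ATR was read."""
    atrs = {}
    for reader in reader_list:
        connection = reader.createConnection()
        try:
            connection.connect()
            atr = connection.getATR()  # list of ints, one per ATR byte
            connection.disconnect()
        except expected_errors:
            # e.g. no card inserted, or the card did not respond to reset
            atrs[str(reader)] = None
            continue
        atrs[str(reader)] = " ".join("%02X" % byte for byte in atr)
    return atrs
```

With real hardware you would likely call collect_atrs(readers(), (NoCardException, CardConnectionException)), importing both exceptions from smartcard.Exceptions, so that unexpected errors still propagate.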

In Python, you can use the pyscard library to interact with smart cards; there is an example that should help you display the ATR at http://pyscard.sourceforge.net/pyscard-framework.html#framework-samples
