fasttext assertion "counts.size() == osz_" failed - c++

I am trying to use fasttext for text classification and I am training on a corpus of 850MB of texts on Windows, but I keep getting the following error:
assertion "counts.size() == osz_" failed: file "src/model.cc", line 206, function: void fasttext::Model::setTargetCounts(const std::vector<long int>&) Aborted (core dumped)
I checked the values of counts.size() and osz_ and found that counts.size = 2515626 and osz_ = 300. When I call in.good() on the input stream in FastText::loadModel i get 0, in.fail()=1 and in.eof()=1.
I am using the following commands to train and test my model:
./fasttext supervised -input fasttextinput -output fasttextmodel -dim 300 -epoch 5 -minCount 5 -wordNgrams 2
./fasttext test fasttextmodel.bin fasttextinput
My input data is properly formatted according to the fasttext github page, so I am wondering if this is some failure of me or a bug.
Thanks for any support on this!

To close this thread:
As #Sixhobbits' pointed out the error was related to https://github.com/facebookresearch/fastText/issues/73 (running out of disk space when saving the fastText supervised model)

Related

Failed to load pre trained onnx models in OpenCV C++

This is my first time with ONNX models and I’m not sure if I’m having a newbie problem so sorry in advance!
I’ve just tried to load a couple of models and I have the same assert always:
[ERROR:0#0.460] global onnx_importer.cpp:1054 cv::dnn::dnn4_v20221220::ONNXImporter::handleNode DNN/ONNX: ERROR during processing node with 3 inputs and 1 outputs: [Concat]:(onnx_node!Concat_2) from domain='ai.onnx'
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.7.0-dev) Error: Unspecified error (> Node [Concat#ai.onnx]:(onnx_node!Concat_2) parse error: OpenCV(4.7.0-dev) C:\GHA-OCV-2\_work\ci-gha-workflow\ci-gha-workflow\opencv\modules\dnn\src\layers\concat_layer.cpp:105: error: (-215:Assertion failed) curShape.size() == outputs[0].size() in function 'cv::dnn::ConcatLayerImpl::getMemoryShapes'
> ) in cv::dnn::dnn4_v20221220::ONNXImporter::handleNode, file C:\GHA-OCV-2\_work\ci-gha-workflow\ci-gha-workflow\opencv\modules\dnn\src\onnx\onnx_importer.cpp, line 1073
Both models come from https://github.com/PeterL1n/RobustVideoMatting and they are “rvm_resnet50_fp32.onnx” and “rvm_mobilenetv3_fp32.onnx”
Obviously I’m loading them with
robustNN = cv::dnn::readNetFromONNX(robustNNPath);
Thank you in advance for any tip!

BigQuery - Where can I find the error stream?

I have uploaded a CSV file with 300K rows from GCS to BigQuery, and received the following error:
Where can I find the error stream?
I've changed the create table configuration to allow 4000 errors and it worked, so it must be a problem with the 3894 rows in the message, but this error message does not tell me much about which rows or why.
Thanks
I'm finally managed to see the error stream by running the following command in the terminal:
bq --format=prettyjson show -j <JobID>
It returns a JSON with more details.
In my case it was:
"message": "Error while reading data, error message: Could not parse '16.66666666666667' as int for field Course_Percentage (position 46) starting at location 1717164"
You should be able to click on Job History in the BigQuery UI, then click the failed load job. I tried loading an invalid CSV file just now, and the errors that I see are:
Errors:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details. (error code: invalid)
Error while reading data, error message: CSV table references column position 1, but line starting at position:0 contains only 1 columns. (error code: invalid)
The first one is just a generic message indicating the failure, but the second error (from the "error stream") is the one that provides more context for the failure, namely CSV table references column position 1, but line starting at position:0 contains only 1 columns.
Edit: given a job ID, you can also use the BigQuery CLI to see complete information about the failure. You would use:
bq --format=prettyjson show -j <job ID>
Using python client it's
from google.api_core.exceptions import BadRequest
job = client.load_table_from_file(*args, **kwargs)
try:
result = job.result()
except BadRequest as ex:
for err in ex.errors:
print(err)
raise
# or alternatively
# job.errors
You could also just do.
try:
load_job.result() # Waits for the job to complete.
except ClientError as e:
print(load_job.errors)
raise e
This will print the errors to screen or you could log them etc.
Following the rest of the answers, you could also see this information in the GCP logs (Stackdriver) tool.
But It might happen that this does not answer your question. It seems like there are detailed errors (such as the one Elliot found) and more imprecise ones. Which gives you no description at all independently of the UI you're using to explore it.

can't load digits trained caffe model with opencv readnetfromcaffe

I've built digits from this tutorial recently, everything is ok and I finally trained my AlexNet model (also trained a SqueezNet so that I can upload the model here) ! the problem is when I download my model from Digits, I can not load it into my program for testing!I have tested my program with GoogleNet downloaded from this link and it's working fine!
I'm using OpenCV readNetFromCaffe in this function to load Caffe model
void deepNetwork::loadModel( cv::String model ,cv::String weight ,string lablesPath,int ps){
patchSize=ps;
labeslPath=lablesPath;
try
{
net = dnn::readNetFromCaffe(weight,model);
cerr<<"loaded succ"<<endl;
}
catch (cv::Exception& e)
{
std::cerr << "Exception: " << e.what() << std::endl;
}}
I get the following error loading my model
OpenCV Error: Assertion failed (pbBlob.raw_data_type() ==
caffe::FLOAT16) in blo
bFromProto, file
/home/nvidia/build-opencv/opencv/modules/dnn/src/caffe/caffe_im
porter.cpp, line 242 Exception:
/home/nvidia/build-opencv/opencv/modules/dnn/src/caffe/caffe_importer
.cpp:242: error: (-215) pbBlob.raw_data_type() == caffe::FLOAT16 in
function blo
bFromProto
OpenCV Error: Requested object was not found (Requested blob "data"
not found) i
n setInput, file
/home/nvidia/build-opencv/opencv/modules/dnn/src/dnn.cpp, line
1606 terminate called after throwing an instance of 'cv::Exception'
what():
/home/nvidia/build-opencv/opencv/modules/dnn/src/dnn.cpp:1606: error:
(-204) Requested blob "data" not found in function setInput
Aborted (core dumped)
any help would be appreciated <3
opencv version 3.3.1 also tested on (3.3.0 ,3.4.1) same error!
testing on a system without Cuda, Cudnn or Caffe just pure c++ and OpenCv...
but i've trained my model on a aws ec2 instance (p3.2xlarge ) with Cuda,Cudnn and caffe !
you can download the trained squeezNet model (.prototxt and .caffemodel) here
finally, I found the problem!
it's a version problem I have digits 6.1.1 working with nvcaffe 0.17.0 for training which is not compatible with previous Caffe and OpenCv libraries ! you have to downgrade NvCaffe to version 0.15.14 and it will open with OpenCv easily!
OpenCV DNN model expect caffemodel in BVLC format. But, NVCaffe stores the caffe model in more efficient format which different than BVLC Caffe.
If you want model compatible with both BVLC/Caffe as well as NVcaffe.
Add this flag in solver.prototxt
store_blobs_in_old_format = true
Please read the DIGITS NVCaffe Documentation.
NVCaffe Documenation - store_blobs_in_old_format

deeplearning4j: cannot use an existing Word2Vec dutchembeddings

I tried to use the dutchembeddings in Word2Vec format with dl4j. But an exception is thrown when loadStaticModel is called: "Unable to guess input file format"
WordVectorSerializer.loadStaticModel(new File(WORD_VECTORS_PATH)
https://github.com/clips/dutchembeddings (I downloaded the wikipedia 160 tar.gz)
How can I get the dutchembeddings in Word2Vec format working with dl4j?
Stacktrace
Loading word vectors and creating DataSetIterators
o.d.m.e.l.WordVectorSerializer - Trying DL4j format...
o.d.m.e.l.WordVectorSerializer - Trying CSVReader...
o.d.m.e.l.WordVectorSerializer - Trying BinaryReader...
Exception in thread "main" java.lang.RuntimeException: Unable to guess input file format
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.loadStaticModel(WordVectorSerializer.java:2646)
at org.deeplearning4j.examples.convolution.sentenceclassification.CnnDutchSentenceClassification.main(CnnDutchSentenceClassification.java:122)
Process finished with exit code 1

Why am I getting an opencv error with size of images when they are as expected?

I was trying to run the opencv gender classification example from this site,
http://docs.opencv.org/modules/contrib/doc/facerec/tutorial/facerec_gender_classification.html
When I try to run the facerec_fisherfaces.cpp it gives an error saying,
OpenCV Error: Unsupported format or combination of formats (In the Fisherfaces method all input samples (training images) must be of equal size! Expected 40000 pixels, but was 0 pixels.) in train, file /home/kavin/opencv/opencv-2.4.10/modules/contrib/src/facerec.cpp, line 564
terminate called after throwing an instance of 'cv::Exception'
what(): /home/kavin/opencv/opencv- 2.4.10/modules/contrib/src/facerec.cpp:564: error: (-210) In the Fisherfaces method all input samples (training images) must be of equal size! Expected 40000 pixels, but was 0 pixels. in function train
Aborted (core dumped)
The following is a csv file entry I have used fir this program:
/home/kavin/Desktop/actors_cropped/female/an1_200_200.jpg;0
I have 50 such entries and images exist in those locations of size 200*200. I have used the same code facerec_fisherfaces.cpp as in the tutorial link i have posted. Please help.
This was because there was an empty row in the csv file and also the encoding had to be UTF-8
There is an invalid image that is included in the training. I solved this by popping the image from the list of images and label.
c = 0
for i in images:
try:
height, width = i.shape[:2]
except:
images.pop(c)
labels.pop(c)
print("An invalid Image has been detected...continuing")
c += 1