deeplearning4j: cannot use an existing Word2Vec model (dutchembeddings)

I tried to use the dutchembeddings in Word2Vec format with dl4j, but an exception is thrown when loadStaticModel is called: "Unable to guess input file format".
WordVectorSerializer.loadStaticModel(new File(WORD_VECTORS_PATH));
https://github.com/clips/dutchembeddings (I downloaded the wikipedia 160 tar.gz)
How can I get the dutchembeddings in Word2Vec format working with dl4j?
Stacktrace
Loading word vectors and creating DataSetIterators
o.d.m.e.l.WordVectorSerializer - Trying DL4j format...
o.d.m.e.l.WordVectorSerializer - Trying CSVReader...
o.d.m.e.l.WordVectorSerializer - Trying BinaryReader...
Exception in thread "main" java.lang.RuntimeException: Unable to guess input file format
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.loadStaticModel(WordVectorSerializer.java:2646)
at org.deeplearning4j.examples.convolution.sentenceclassification.CnnDutchSentenceClassification.main(CnnDutchSentenceClassification.java:122)
Process finished with exit code 1

Related

Failed to load pre-trained ONNX models in OpenCV C++

This is my first time working with ONNX models and I'm not sure if this is a newbie problem, so sorry in advance!
I've tried to load a couple of models and I always hit the same assertion:
[ERROR:0#0.460] global onnx_importer.cpp:1054 cv::dnn::dnn4_v20221220::ONNXImporter::handleNode DNN/ONNX: ERROR during processing node with 3 inputs and 1 outputs: [Concat]:(onnx_node!Concat_2) from domain='ai.onnx'
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.7.0-dev) Error: Unspecified error (> Node [Concat#ai.onnx]:(onnx_node!Concat_2) parse error: OpenCV(4.7.0-dev) C:\GHA-OCV-2\_work\ci-gha-workflow\ci-gha-workflow\opencv\modules\dnn\src\layers\concat_layer.cpp:105: error: (-215:Assertion failed) curShape.size() == outputs[0].size() in function 'cv::dnn::ConcatLayerImpl::getMemoryShapes'
> ) in cv::dnn::dnn4_v20221220::ONNXImporter::handleNode, file C:\GHA-OCV-2\_work\ci-gha-workflow\ci-gha-workflow\opencv\modules\dnn\src\onnx\onnx_importer.cpp, line 1073
Both models come from https://github.com/PeterL1n/RobustVideoMatting and they are “rvm_resnet50_fp32.onnx” and “rvm_mobilenetv3_fp32.onnx”
Obviously I’m loading them with
robustNN = cv::dnn::readNetFromONNX(robustNNPath);
Thank you in advance for any tip!
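Not an answer, but for isolating the failure it can help to wrap the import in a try/catch so the process never reaches the terminate handler and the full importer message stays visible. A minimal sketch, with a placeholder model path, assuming an OpenCV build with the DNN module as in the question:
#include <opencv2/dnn.hpp>
#include <iostream>
#include <string>

int main() {
    const std::string robustNNPath = "rvm_mobilenetv3_fp32.onnx"; // placeholder path
    try {
        // readNetFromONNX parses the whole graph at load time, so shape
        // mismatches in nodes like Concat_2 are reported here.
        cv::dnn::Net robustNN = cv::dnn::readNetFromONNX(robustNNPath);
        std::cout << "Model imported with " << robustNN.getLayerNames().size()
                  << " layers" << std::endl;
    } catch (const cv::Exception& e) {
        // Catching cv::Exception keeps the importer error available
        // instead of aborting via the terminate handler.
        std::cerr << "ONNX import failed: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}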

What does the GridMix input format look like?

I used Rumen to mine job-history files, which produced job-trace.json and job-topology.json.
The GridMix usage looks like:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-gridmix-2.7.3.jar -libjars $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.7.3.jar -Dgridmix.compression-emulation.enable=false <iopath> <trace>
<iopath> means the working directory for GridMix, so I fed it file:///home/hadoop/input; <trace> means the trace file extracted from the log files, so I fed it file:///home/hadoop/rumen/job-trace-1hr.json.
Finally, I get the following exceptions:
2019-03-07 16:37:12,495 ERROR [main] gridmix.Gridmix (Gridmix.java:start(534)) - Startup failed. java.io.IOException: Found no satisfactory file in file:/home//hadoop/input
2019-03-07 16:37:13,040 INFO [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 2
2019-03-07 16:37:13,041 INFO [Thread-1] gridmix.Gridmix (Gridmix.java:run(657)) - Exiting...
So what should this parameter look like, and how should I use it?
Does anyone have any ideas?
Thanks.
I found it was my own incorrect usage.
I checked the GridMix parameter documentation; the problem was that my input data was too small:
gridmix.min.file.size | The minimum size of the input files. The default limit is 128 MiB. Tweak this parameter if you see an error-message like "Found no satisfactory file" while testing GridMix with a relatively-small input data-set.
So I generated a larger input data set, using -generate 10G.
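For completeness, the command ended up looking roughly like this (same paths as above; as far as I understand, -generate goes before <iopath> and <trace> and makes GridMix synthesize 10 GiB of input data under <iopath> first):
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-gridmix-2.7.3.jar -libjars $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.7.3.jar -Dgridmix.compression-emulation.enable=false -generate 10G file:///home/hadoop/input file:///home/hadoop/rumen/job-trace-1hr.json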
Thanks.

fasttext assertion "counts.size() == osz_" failed

I am trying to use fasttext for text classification and I am training on a corpus of 850MB of texts on Windows, but I keep getting the following error:
assertion "counts.size() == osz_" failed: file "src/model.cc", line 206, function: void fasttext::Model::setTargetCounts(const std::vector<long int>&) Aborted (core dumped)
I checked the values of counts.size() and osz_ and found that counts.size() = 2515626 and osz_ = 300. When I call in.good() on the input stream in FastText::loadModel I get 0, in.fail() = 1 and in.eof() = 1.
I am using the following commands to train and test my model:
./fasttext supervised -input fasttextinput -output fasttextmodel -dim 300 -epoch 5 -minCount 5 -wordNgrams 2
./fasttext test fasttextmodel.bin fasttextinput
My input data is properly formatted according to the fastText GitHub page, so I am wondering whether this is a mistake on my part or a bug.
Thanks for any support on this!
To close this thread: as @Sixhobbits pointed out, the error was related to https://github.com/facebookresearch/fastText/issues/73 (running out of disk space when saving the fastText supervised model).

Cascade face detection in C++ with OpenCV 3.0

I am trying to implement the face detection described in this tutorial:
http://docs.opencv.org/3.0-beta/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html#cascade-classifier
I am using OpenCV 3.0 on Ubuntu 14.04.
I downloaded the cascade XML files from here:
https://github.com/opencv/opencv/tree/master/data/haarcascades
When I run the code it gives me this error message:
OpenCV Error: Parsing error (/...../haarcascade_frontalcatface.xml(5): Valid XML should start with '<?xml ...?>') in icvXMLParse, file /home/taleb/opencv3/opencv/modules/core/src/persistence.cpp, line 2220
terminate called after throwing an instance of 'cv::Exception'
what(): /home/taleb/opencv3/opencv/modules/core/src/persistence.cpp:2220: error: (-212) /home/taleb/pythonproject/test1/haarcascade_frontalcatface.xml(5): Valid XML should start with '<?xml ...?>' in function icvXMLParse
Any suggestion?
I found a couple of fixes on Stack Overflow and other websites. They are as follows:
Change the character encoding from UTF-8 to ANSI with Notepad++.
Previous answer:
convert_cascade is for cascades trained by the haartraining application; it does not support the format of cascades trained by the traincascade application.
To do this with traincascade, just run opencv_traincascade again with the same "-data" but set "-numStages" to the point you want to generate up to. The application will load the trained stages, realize that the required number of stages is already there, write the resulting cascade to XML, and finish. Interrupting the process during a stage could result in corrupt data, so you're best off deleting the incomplete stage.
Reference: https://stackoverflow.com/a/25831423/5671364
XML Standard states:
if no encoding declaration is present in the XML document (and no
external encoding declaration mechanism such as the HTTP header is
available), the assumed encoding of an XML document depends on the
presence of the Byte-Order-Mark (BOM).
There are 3 ways to fix this:
Let OpenCV just put the encoding="ASCII" declaration into the top root XML tag.
Leave the top root XML tag, but encode everything as UTF-8 before writing it to file.
Do something else with the Byte-Order-Mark, but keep it to the standard.
Reference: http://code.opencv.org/issues/976
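Whichever of these fixes is applied, it also helps to check the result of CascadeClassifier::load (and to catch cv::Exception) so a malformed XML file is reported instead of terminating the program. A minimal sketch with placeholder paths, assuming OpenCV 3.x as in the question:
#include <opencv2/objdetect.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string cascadePath = "haarcascade_frontalcatface.xml"; // placeholder path
    cv::CascadeClassifier face_cascade;
    try {
        // load() returns false if the file is missing or unusable;
        // a broken XML header may also surface as a cv::Exception.
        if (!face_cascade.load(cascadePath)) {
            std::cerr << "Could not load cascade: " << cascadePath << std::endl;
            return 1;
        }
    } catch (const cv::Exception& e) {
        std::cerr << "Cascade parse error: " << e.what() << std::endl;
        return 1;
    }

    cv::Mat img = cv::imread("face.jpg"); // placeholder image
    if (img.empty()) return 1;

    // Standard preprocessing from the cascade classifier tutorial.
    cv::Mat gray;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    face_cascade.detectMultiScale(gray, faces);
    std::cout << "Detected " << faces.size() << " face(s)" << std::endl;
    return 0;
}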

error retrieving background image from BackgroundSubtractorMOG2

I'm trying to get the background image from BackgroundSubtractorMOG2:
bg->getBackgroundImage(back);
but I get a Thread 1 SIGABRT (which, as a C++ n00b, puzzles me)
and this error:
OpenCV Error: Assertion failed (nchannels == 3) in getBackgroundImage, file /Users/hm/Downloads/OpenCV-2.4.4/modules/video/src/bgfg_gaussmix2.cpp, line 579
libc++abi.dylib: terminate called throwing an exception
(lldb)
I'm not sure what the problem is; I suspect it's something to do with the nmixtures parameter, but I've left that at the default (3). Any hints?
It looks like you need to use 3-channel images rather than grayscale. Make sure the image type you are using is CV_8UC3, or, if you are reading from a file, use cv::imread("path/to/file") with no additional arguments.
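To illustrate, a minimal sketch of feeding 3-channel frames to MOG2 and then reading back the background image; this uses the OpenCV 3.x factory API rather than the 2.4.4 build from the question, and the video path is a placeholder:
#include <opencv2/video.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>

int main() {
    cv::VideoCapture cap("input.avi"); // placeholder video source
    if (!cap.isOpened()) return 1;

    // nmixtures is left at its default; the failing assertion is about the
    // number of image channels, not the number of Gaussian mixtures.
    cv::Ptr<cv::BackgroundSubtractorMOG2> bg = cv::createBackgroundSubtractorMOG2();

    cv::Mat frame, fgMask, back;
    while (cap.read(frame)) {
        // frame stays CV_8UC3 as long as it is not converted to grayscale first
        bg->apply(frame, fgMask);
    }

    bg->getBackgroundImage(back); // model built from 3-channel input, so no nchannels assertion
    std::cout << "Background image: " << back.cols << "x" << back.rows << std::endl;
    return 0;
}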