I've just used Weka to train my SVM classifier under the "Classify" tab.
Now I want to further investigate which data samples are misclassified. I need to study their patterns, but I don't know where to find this in Weka.
Could anyone give me some help please?
Thanks in advance.
You can enable the option to output predictions from the "More options..." dialog in the Classify tab.
You will get the following instance predictions:
=== Predictions on test split ===
inst# actual predicted error prediction
1 2:Iris-ver 2:Iris-ver 0.667
...
16 3:Iris-vir 2:Iris-ver + 0.667
EDIT
As I explained in the comments, you can use the StratifiedRemoveFolds filter to manually split the data and create the 10 folds of the cross-validation.
This Primer from the Weka wiki has some examples of how to invoke Weka from the command line. Here's a sample bash script:
#!/bin/bash
# I assume weka.jar is on the CLASSPATH
# 10-folds CV
for f in $(seq 1 10); do
echo -n "."
# create train/test set for fold=f
java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
-o iris-f$f-train.arff -c last -N 10 -F $f -V
java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
-o iris-f$f-test.arff -c last -N 10 -F $f
# classify using SVM and store predictions of test set
java weka.classifiers.functions.SMO -C 1.0 \
-K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01" \
-t iris-f$f-train.arff -T iris-f$f-test.arff \
-p 0 > f$f-pred.txt
#-i > f$f-perf.txt
done
echo
For each fold, this will create two datasets (train/test) and store the predictions in a text file as well. That way you can match each index with the actual instance in the test set.
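For example, a minimal post-processing sketch (assuming the file names from the script above, and that misclassified rows are the ones flagged with "+" in the error column of the -p output):
# List the misclassified test instances of fold 1.
# The inst# column refers to the order of the instances in iris-f1-test.arff,
# counting from the first line after @data (blank lines/comments not handled).
grep ' + ' f1-pred.txt | awk '{print $1}' | while read idx; do
    awk -v n="$idx" 'd && ++i == n {print; exit} /@data/ {d=1}' iris-f1-test.arff
done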
Of course the same can be done in the GUI if you prefer (only a bit more tedious!)
I would like to run the following command from Jenkins:
ssh -i ~/.ssh/company.pem -o StrictHostKeyChecking=no user@$hostname "supervisorctl start company-$app ; awk -v app=$app '$0 ~ "program:company-"app {p=NR} p && NR==p+6 && /^autostart/ {$0="autostart=true" ; p=0} 1' /etc/supervisord.conf > $$.tmp && sudo mv $$.tmp /etc/supervisord.conf"
This is one of the last steps of a job which creates a CloudFormation stack.
Running the command from the target server's terminal works properly.
In this step, I'd like to ssh to each one of the servers (members of ASG's within the new stack) and search and replace a specific line as shown above in the /etc/supervisord.conf, basically setting one specific service to autostart.
When I run the command I get the following error:
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
I've tried escaping the double quotes but got the same error, any idea what I'm doing wrong?
You are running into this issue because of the way the shell handles nested quotes. This is a use case for a here document (heredoc). A heredoc lets you pass multi-line commands to the remote shell without worrying about nested quotes. The structure is as follows:
$ ssh -t user@server.com <<'END'
command1 | \
command2
END
Note: the -t flag is important to the ssh command, as it tells the shell to behave as if it were being used interactively, and will avoid warnings and unexpected results.
In your specific case, you should try something like:
$ ssh -t -i ~/.ssh/company.pem -o StrictHostKeyChecking=no user@$hostname <<'END'
supervisorctl start company-$app |\
awk -v app=$app '$0 ~ \"program:company-\"app {p=NR} p && NR==p+6 \
&& /^autostart/ {$0="autostart=true" ; p=0} 1' \
/etc/supervisord.conf > $$.tmp && sudo mv $$.tmp /etc/supervisord.conf
END
Just a note: since I can't be sure about the desired output of the command you are running, be advised to keep track of your own " and ' marks, and to escape them in your awk command as you would at an interactive terminal. I notice the double quotes around program:company and I am a bit confused by them. If they are part of the pattern in the string being searched, they will need to be escaped accordingly.
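If it helps, here is a minimal local sketch for sanity-checking the awk program before wrapping it in ssh (the app name and the temporary output path are placeholders, not part of the original command):
# Run the awk program locally with a literal app name; inside single quotes
# the inner double quotes need no escaping.
app=myapp
awk -v app="$app" '$0 ~ "program:company-"app {p=NR}
    p && NR==p+6 && /^autostart/ {$0="autostart=true"; p=0} 1' \
    /etc/supervisord.conf > /tmp/supervisord.conf.new
# Inspect /tmp/supervisord.conf.new, then move it into place with sudo.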
I can split one large MP3 file into several files based on silence using the mp3splt command below
mp3splt -f -t 4.0 -a -d split audio_file.mp3
and I get
split/audio_file_000m_00s_005m_00s.mp3
but how can I get
split/000m_00s_005m_00s_audio_file.mp3
or increment by one in the front
split/000_audio_file_000m_00s_005m_00s.mp3
split/001_audio_file_005m_00s_010m_00s.mp3
I looked at the syntax documentation (http://wiki.librivox.org/index.php/How_To_Split_With_Mp3Splt) but couldn't figure out what needs to change in my command.
I'm using Ubuntu 16.04 64-bit Linux.
You need to set the -o (output format) option.
Try something like:
mp3splt -o @N3_@f -f -t 4.0 -a -d split audio_file.mp3
Giving you:
001_audio_file.mp3,
002_audio_file.mp3,
003_audio_file.mp3…
The man page is a little messy, but it's all there.
I used
mp3splt -o @N3_@mm_@ss_@f -f -t 4.0 -a -d split audio_file.mp3
which gives me
/split/001_000m_00s_audio_file.mp3
/split/002_004m_00s_audio_file.mp3
I am having issues understanding the basics of how to run GIZA++.
I went through the discussion here on Stack Overflow (Is there a tutorial about giza++?) and through the links people provided there. I have downloaded and compiled the latest GIZA++ from the Moses-SMT GitHub:
git clone https://github.com/moses-smt/giza-pp.git
cd giza-pp
make
After the successful compilation, I wrote a simple script for testing purposes.
#!/bin/bash
SRC=french
TRG=english
PREFIX=out
GIZA=../giza-pp
# Cleaning from previous run ...
rm -f *.log
rm -f *.vcb
rm -f *.snt
rm -f *.vcb.classes
rm -f *.vcb.classes.cats
rm -f *.gizacfg
rm -f *.cooc
rm -f ${PREFIX}*
# Converting plain text into sentence format using the "plain2snt.out" tool ...
${GIZA}/GIZA++-v2/plain2snt.out ${SRC} ${TRG}
# Generating word clusters using the "mkcls" tool ...
${GIZA}/mkcls-v2/mkcls -p${SRC} -V${SRC}.vcb.classes
${GIZA}/mkcls-v2/mkcls -p${TRG} -V${TRG}.vcb.classes
# Generating coocurrence using the "snt2cooc" tool ...
${GIZA}/GIZA++-v2/snt2cooc.out ${SRC}.vcb ${TRG}.vcb ${SRC}_${TRG}.snt > ${SRC}_${TRG}.cooc
# Running "GIZA++" ...
${GIZA}/GIZA++-v2/GIZA++ -S ${SRC}.vcb -T ${TRG}.vcb -C ${SRC}_${TRG}.snt -CoocurrenceFile ${SRC}_${TRG}.cooc -o ${PREFIX} >> giza.log 2>&1
Now this is the content of the directory right after I run the script.
jakub@jakub-virtual-machine:~/Master/giza-pp_test$ ls
english french_english.snt out.d3.final out.perp
english_french.snt french.vcb out.d4.final out.t3.final
english.vcb french.vcb.classes out.D4.final out.trn.src.vcb
english.vcb.classes french.vcb.classes.cats out.Decoder.config out.trn.trg.vcb
english.vcb.classes.cats giza.log out.gizacfg out.tst.src.vcb
french out.a3.final out.n3.final out.tst.trg.vcb
french_english.cooc out.A3.final out.p0_3.final run_test.sh
The point is that the output is missing the files listed below, which are important for me.
out.ti.final
out.actual.ti.final
Now I've been looking at GIZA++'s Main.cpp (lines 260-273) and can see the lines that should be creating these files.
cerr << "writing Final tables to Disk \n";
string t_inv_file = Prefix + ".ti.final" ;
if( !FEWDUMPS)
m1.getTTable().printProbTableInverse(t_inv_file.c_str(), m1.getEnglishVocabList(),
m1.getFrenchVocabList(),
m1.getETotalWCount(),
m1.getFTotalWCount());
t_inv_file = Prefix + ".actual.ti.final" ;
if( !FEWDUMPS )
m1.getTTable().printProbTableInverse(t_inv_file.c_str(),
eTrainVcbList.getVocabList(),
fTrainVcbList.getVocabList(),
m1.getETotalWCount(),
m1.getFTotalWCount(), true);
The "cerr" line is also printed in the log, but I just cannot figure out why these files are not present in the output.
jakub@jakub-virtual-machine:~/Master/giza-pp_test$ cat giza.log | tail -n 25
p0_count is 4.0073 and p1 is 5.99635; p0 is 0.400584 p1: 0.599416
Model4: TRAIN CROSS-ENTROPY 0.80096 PERPLEXITY 1.74226
Model4: (10) TRAIN VITERBI CROSS-ENTROPY 0.801289 PERPLEXITY 1.74266
Dumping alignment table (a) to file:out.a3.final
Dumping distortion table (d) to file:out.d3.final
Dumping nTable to: out.n3.final
Model4 Viterbi Iteration : 10 took: 0 seconds
H3333344444 Training Finished at: Fri Oct 23 16:24:44 2015
Entire Viterbi H3333344444 Training took: 0 seconds
==========================================================
writing Final tables to Disk
Writing PERPLEXITY report to: out.perp
Writing source vocabulary list to : out.trn.src.vcb
Writing source vocabulary list to : out.trn.trg.vcb
Writing source vocabulary list to : out.tst.src.vcb
Writing source vocabulary list to : out.tst.trg.vcb
writing decoder configuration file to out.Decoder.config
Entire Training took: 0 seconds
Program Finished at: Fri Oct 23 16:24:44 2015
==========================================================
Did anyone run into a similar problem? Is this some kind of bug, or am I doing something wrong?
Edit:
I have now recompiled the whole GIZA++ without the -DBINARY_SEARCH_FOR_TTABLE option in the CFLAGS of the Makefile, and changed the script so that it doesn't generate and pass the cooccurrence file to GIZA++. After I re-ran the script, the output did contain out.actual.ti.final and out.ti.final. Does anybody know how to explain this behaviour? I thought I would get better alignment and probability estimates using the cooccurrence file; is it needed at all, or is it only there to improve speed?
I faced the same issue before.
I think the missing step is the following:
In the Makefile located at giza-pp/GIZA++-v2/, substitute the line:
CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DBINARY_SEARCH_FOR_TTABLE -DWORDINDEX_WITH_4_BYTE
with the line:
CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DWORDINDEX_WITH_4_BYTE
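If you prefer to script the change, a rough sketch (assuming the giza-pp checkout from the question, and that the Makefile provides the usual clean target):
# Drop the flag from CFLAGS_OPT and rebuild GIZA++.
cd giza-pp/GIZA++-v2
sed -i 's/ -DBINARY_SEARCH_FOR_TTABLE//' Makefile
make clean && make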
Check this out, and good luck.
I'm trying to print the content of an HTML table cell.
I thought the easiest way to do this was with grep, but for some reason the regex works on regexr.com but not within grep.
Maybe it's something with escaping? I tried escaping all the less-than and greater-than (<>) symbols.
This is the code I'm using:
wget -q -O login.html --save-cookies cookies.txt --keep-session-cookies --post-data 'username=sssss&password=fffff' http://ffffff/login
wget -q -O page.html --load-cookies cookies.txt http://ffffff/somepage |grep -P '(?<=<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >)(.*?)(?=</td><td class=list2 align=center><input type=checkbox name=arecs5)' |recode html...ascii
Can anybody help me please? I'm from the Netherlands, so sorry for my English.
I also tried adding the -c option, and it printed 0.
EDIT:
Added my full code. I found one mistake: I didn't have the -O parameter to output the page's HTML. But it still doesn't work; it prints nothing.
Traditional grep doesn't support lookarounds the way you're using it.
Try using grep -P (PCRE):
grep -P 'pattern' file
Consider using ack or ag, which natively support PCRE.
Finally, it works.
I added -qO- to wget; I don't know why, but adding a - after the -O makes it work.
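For reference, a sketch of the working pipeline: with -O - (here combined as -qO-), wget writes the page to standard output instead of a file, so grep actually receives input. URLs, credentials, and the pattern are placeholders carried over from the question.
# Log in and save the session cookie, then fetch the page to stdout and
# extract the cell contents between the placeholder <td> markers.
wget -q -O login.html --save-cookies cookies.txt --keep-session-cookies \
    --post-data 'username=sssss&password=fffff' http://ffffff/login
wget -qO- --load-cookies cookies.txt http://ffffff/somepage \
    | grep -oP '(?<=<td class=list2 >).*?(?=</td>)'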
Original Question
I want to be able to generate a new (fully valid) MP3 file from an existing MP3 file to be used as a preview -- try-before-you-buy style. The new file should only contain the first n seconds of the track.
Now, I know I could just "chop the stream" at n seconds (calculating from the bitrate and header size) when delivering the file, but this is a bit dirty and a real PITA on a VBR track. I'd like to be able to generate a proper MP3 file.
Anyone any ideas?
Answers
Both mp3splt and ffmpeg are good solutions. I chose ffmpeg, as it is commonly installed on Linux servers and is also easily available for Windows. Here are some more useful command-line parameters for generating previews with ffmpeg:
-t <seconds> chop after specified number of seconds
-y force file overwrite
-ab <bitrate> set bitrate e.g. -ab 96k
-ar <rate Hz> set sampling rate e.g. -ar 22050 for 22.05kHz
-map_meta_data <outfile>:<infile> copy track metadata from infile to outfile
Instead of setting -ab and -ar, you can copy the original track settings, as Tim Farley suggests, with:
-acodec copy
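Putting the re-encoding flags above together, a hedged example (file names are placeholders; newer ffmpeg builds spell some of these options differently, e.g. -b:a instead of -ab):
# Illustrative only: a 30-second preview re-encoded at 96 kb/s, 22.05 kHz,
# overwriting the output file if it already exists.
ffmpeg -y -i input.mp3 -t 30 -ab 96k -ar 22050 preview.mp3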
I also recommend ffmpeg, but the command line suggested by John Boker has an unintended side effect: it re-encodes the file to the default bitrate (which is 64 kb/s in the version I have here at least). This might give your customers a false impression of the quality of your sound files, and it also takes longer to do.
Here's a command line that will slice to 30 seconds without transcoding:
ffmpeg -t 30 -i inputfile.mp3 -acodec copy outputfile.mp3
The -acodec switch tells ffmpeg to use the special "copy" codec which does not transcode. It is lightning fast.
NOTE: the command was updated based on comment from Oben Sonne
If you wish to REMOVE the first 30 seconds (and keep the remainder) then use this:
ffmpeg -ss 30 -i inputfile.mp3 -acodec copy outputfile.mp3
try:
ffmpeg -t 30 -i inputfile.mp3 outputfile.mp3
This command also works perfectly.
I cropped my music files from 20 to 40 seconds.
-y : force output file to overwrite.
ffmpeg -i test.mp3 -ss 00:00:20 -to 00:00:40 -c copy -y temp.mp3
You can use cutmp3:
cutmp3 -i foo.mp3 -O 30s.mp3 -a 0:00.0 -b 0:30.0
It's in the Ubuntu repo, so just: sudo apt-get install cutmp3.
You might want to try Mp3Splt.
I've used it before in a C# service that simply wrapped the mp3splt.exe win32 process. I assume something similar could be done in your Linux/PHP scenario.
I got an error while doing the same:
Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argumentStream mapping:
Fix for me was:
ffmpeg -ss 00:02:43.00 -t 00:00:10 -i input.mp3 -codec:a libmp3lame out.mp3
My package medipack is a very simple command-line wrapper over ffmpeg.
You can trim your file using these commands:
medipack trim input.mp3 -s 00:00 -e 00:30 -o output.mp3
medipack trim input.mp3 -s 00:00 -t 00:30 -o output.mp3
You can view the options of the trim subcommand with:
srb@srb-pc:$ medipack trim -h
usage: medipack trim [-h] [-s START] [-e END | -t TIME] [-o OUTPUT] [inp]
positional arguments:
inp input video file ex: input.mp4
optional arguments:
-h, --help show this help message and exit
-s START, --start START
start time for cuting in format hh:mm:ss or mm:ss
-e END, --end END end time for cuting in format hh:mm:ss or mm:ss
-t TIME, --time TIME clip duration in format hh:mm:ss or mm:ss
-o OUTPUT, --output OUTPUT
You could also explore other options using medipack --help:
srb@srb-pc:$ medipack --help
usage: medipack.py [-h] [-v] {trim,crop,resize,extract} ...
positional arguments:
{trim,crop,resize,extract}
optional arguments:
-h, --help show this help message and exit
-v, --version Display version number
You may visit my repo https://github.com/srbcheema1/medipack and check out the examples in the README.