test/training effect on classifier results [closed] - weka

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I'm struggling to understand the effect of the training/test split on my correctly classified instances result.
For example, with naive Bayes: if I use more test data in a percentage split, does the algorithm become more reliable?

The point of splitting your entire data set into training and test is that the model you want to learn (naive Bayes or otherwise) should reflect the true relationship between cause and effect (features and prediction) and not simply the data. For example, you can always fit a curve perfectly to a number of data points, but doing that will likely make it useless for the prediction you were trying to make.
By using a separate test set, the learned model is tested on unseen data. Ideally, the error (or whatever you're measuring) on training and test set would be about the same, suggesting that your model is reasonably general and not overfit to the training data.
If in your case decreasing the size of the training set increases performance on the test set, it suggests that the learned model is too specific and does not generalise. However, instead of changing the training/test split, you should tweak the parameters of your learner. You might also want to consider using cross-validation instead of a simple training/test split, as it will provide more reliable performance estimates.
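To make the cross-validation suggestion concrete, here is a minimal sketch in plain Python. It is not Weka's implementation. The names `k_fold_indices` and `cross_validate` are made up for illustration, and `fit` and `score` are assumed to be callables you supply for your own learner.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Everything outside the current fold is training data.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(n_samples, k, fit, score, seed=0):
    """Average a caller-supplied score over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(n_samples, k, seed):
        model = fit(train_idx)                  # train on k-1 folds
        scores.append(score(model, test_idx))   # evaluate on the held-out fold
    return sum(scores) / len(scores)
```

Every instance is used for testing exactly once, so the averaged score depends less on one lucky or unlucky split than a single percentage split does.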

Related

Matching melodies [closed]

Closed 10 years ago.
I'm trying to figure out an approach to compare two melodies to see how similar they are. Timbre doesn't matter. So if I had two recordings, one of a flute playing Happy Birthday and one of a trumpet playing the same thing at the same pitches and tempo, it should consider them a match.
Is there a .NET or C++ library that can do this? If not, can someone give me an idea of what techniques I would need to do something like this?
Aubio has a C++ interface and several methods for performing pitch detection.
Since you are assuring that pitch and tempo will be the same and you seem to be ruling out harmonies, you can measure pitch over time and compare the two results.
Your comparison algorithm will require trial-and-error refinement. Keep in mind:
Noise, timbre, and volume fluctuations can make the pitch at any given moment ambiguous.
Real-world performers can have similar pitch and tempo, but it's unlikely that they'll be perfectly the same.
The two songs may not start at the same moment in the recording.
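As a rough illustration of "measure pitch over time and compare", the sketch below assumes you already have one pitch estimate in Hz per analysis frame for each recording (for example from a pitch detector such as Aubio), with unvoiced frames already removed. The function names, the 0.5 semitone tolerance, and the shift range are arbitrary choices for illustration, not tuned values.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=440.0):
    """Convert a frequency to semitones relative to a reference pitch."""
    return 12.0 * math.log2(freq_hz / ref_hz)

def contour_distance(a, b, max_shift=20):
    """Smallest mean absolute pitch difference (semitones) over time shifts."""
    best = math.inf
    min_overlap = min(len(a), len(b)) // 2
    for shift in range(-max_shift, max_shift + 1):
        # Pair a[i] with b[i + shift] wherever both frames exist.
        pairs = [(a[i], b[i + shift])
                 for i in range(len(a)) if 0 <= i + shift < len(b)]
        if not pairs or len(pairs) < min_overlap:
            continue  # too little overlap to be meaningful
        mean_diff = sum(abs(x - y) for x, y in pairs) / len(pairs)
        best = min(best, mean_diff)
    return best

def melodies_match(pitches_a_hz, pitches_b_hz, tolerance=0.5, max_shift=20):
    """True if the two pitch contours agree within `tolerance` semitones."""
    a = [hz_to_semitones(f) for f in pitches_a_hz]
    b = [hz_to_semitones(f) for f in pitches_b_hz]
    return contour_distance(a, b, max_shift) <= tolerance
```

Comparing in semitones rather than Hz makes the tolerance behave the same in every octave, and trying several shifts handles recordings that do not start at the same moment. Tempo differences would need something more flexible, such as dynamic time warping.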

Informatica : Sequence generate [closed]

Closed 10 years ago.
How can we generate sequences in an Informatica mapping without using the Sequence Generator?
Thanks
Well, like others said, I would have preferred a specific question about why you are trying to avoid the Sequence Generator. Having said that, if I open myself to the idea of an alternative to the Sequence Generator, some things do come to mind:
If you have a relatively simple mapping, you can embed an Oracle/DB sequence.nextval call inside the Source Qualifier.
You can embed a DB sequence call in a SQL transformation too, but know that it would hurt performance.
You can also achieve Sequence Generator behaviour using a persistent variable, but there are limitations and downsides.
So, again, depending on what you are trying to do and where you are getting stuck, you might want to repost or edit your question and perhaps get a more direct answer.

Insert images as part of a QR code? [closed]

Closed 11 years ago.
I'd like to be able to generate custom QR codes that include image/logo designs as part of the actual QR code, something like what is detailed here: http://mashable.com/2011/07/23/creative-qr-codes/
Where can I find guidelines on how I can manipulate the QR code that's generated for, say a URL, and still leave it 100% readable/scannable?
Steps of what I'd like to be able to do:
Generate a QR code for something, like a web address. (Already doing this with Google Charts.)
Insert a graphic/image into the code itself, like a logo.
Scan it, and get the URL from step 1.
There are no hard and fast rules for what distortion will cause a scanner to fail. While the code itself is standardized, there are no standards for the algorithms that detect a code from an image, including handling real life distortion like perspective and lighting. Different hardware and software perform differently.
The image at the bottom of http://www.swetake.com/qr/qr1_en.html shows some of the "special" areas of the code. It's best to leave those areas, all the black, white, and cyan areas, alone. There are additional reserved areas in all but the smallest codes. These show up first in the lower right quadrant of the code but get repeated multiple times as the code gets larger. Those should also be left intact, particularly for smaller codes.
Changes to the remaining area can generally be corrected if a sufficient error-correction level is used when the code is created but, again, a lot depends on capture conditions. An unchanged code will always have the highest probability of decoding.
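For a rough feel of how much of the data area a logo can cover, the sketch below uses the nominal recovery capacities of the four QR error-correction levels (about 7%, 15%, 25%, and 30% of codewords for L, M, Q, and H). The function name and the `safety_margin` factor are assumptions made for illustration. Real scanners vary, so treat the result as a starting point for testing, not a guarantee.

```python
# Nominal share of codewords each QR error-correction level can restore.
RECOVERY = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}

def logo_within_budget(covered_fraction, level="H", safety_margin=0.5):
    """Rough check that a logo stays inside a conservative damage budget.

    covered_fraction: share of the data area hidden by the logo (0.0 to 1.0).
    safety_margin: hypothetical factor that reserves part of the capacity
    for real-world distortion such as lighting and perspective.
    """
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be between 0 and 1")
    return covered_fraction <= RECOVERY[level] * safety_margin
```

With these assumptions, a logo covering 10% of the data area passes at level H but not at level L. Either way, scan the finished code with several readers before relying on it.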

Large data processing technology & books [closed]

Closed 11 years ago.
I am looking for good resources on how to query large volumes of data efficiently.
Each data item is represented by many different attributes such as quantity, price, history info, etc. The client will provide different query criteria but has no need to change the dataset. Simply storing all the data in MS SQL is not a good approach because MS SQL does not scale well enough. Here we are targeting many terabytes of data and need 200-300 CPU clusters.
I am interested in good resources or books so that I can at least do some research.
Did you consider a NoSQL solution such as MongoDB?
If query speed is not your number one issue, you should see if you could build a solution with ROOT, possibly in conjunction with PROOF. In contrast to a NoSQL solution, here you would trade consistency for some speed.
It is used by the CERN experiments to store and retrieve their experimental data (much more than you require) and if you can find a way to handle the I/O it can be made to scale pretty well.
I have heard it is used by some firms doing quantitative finance.

Generating word library - C or C++ [closed]

Closed 11 years ago.
I need to create a simple application, but speed is very important here. The application is pretty simple.
It will generate all possible character combinations and save them to a text file. The user will enter the length to generate, so the application will use a recursive function with a loop inside.
Will C be faster than C++ in this case, or does it not matter?
Speed is very important because my application needs to generate and save to file 10 million+ words.
It doesn't really matter. Chances are your application will be I/O bound rather than CPU bound unless you have enough RAM to hold all that in memory.
It's much more important that you choose the best algorithm, and the best data structures to back that algorithm up.
Then implement that in the language you're most familiar with. C++ has the advantage of having easy to use containers in its standard libraries, but that's about it. You can write slow code in both, and fast code in both.
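To illustrate the point about algorithm and I/O mattering more than the language, here is a minimal sketch. It is written in Python for brevity, but the same structure carries over to C or C++. It swaps the recursive function the question describes for an iterative Cartesian product, and it writes through a large buffer so the disk is not hit once per word. The function name and the 1 MiB buffer size are arbitrary choices for illustration.

```python
import itertools

def write_words(path, alphabet, length, buffer_size=1 << 20):
    """Write every string of `length` over `alphabet`, one per line.

    Returns the number of words written, which is len(alphabet) ** length.
    """
    count = 0
    # A large buffer batches many small writes into few system calls.
    with open(path, "w", buffering=buffer_size) as out:
        # product() yields combinations lazily, so memory use stays flat.
        for combo in itertools.product(alphabet, repeat=length):
            out.write("".join(combo))
            out.write("\n")
            count += 1
    return count
```

For example, `write_words("words.txt", "abc", 4)` writes 3 ** 4 = 81 lines. The output grows exponentially with the length, so the file size and write throughput will usually dominate the run time in any language.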