What is the fastest algorithm in DNA pattern matching [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
Suppose we have a string S with a length of several millions. The string only contains 'a' 't' 'g' 'c' and we have a pattern W with a length of roughly 20. What could be the fastest algorithm in C++ to find ALL occurrences of W in S? It seems KMP is not fast enough.

KMP is linear in |S| + |W|. In the worst case you can't do asymptotically better than that.
You at least need to read the data, and that is also linear. So even if your algorithm is instant, you still can't do much better than KMP.
I suspect you are doing something wrong when reading the data, or traversing it in a way that defeats caching.
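For reference, here is a minimal sketch of KMP that reports every occurrence of W in S, including overlapping ones. The function name kmp_find_all is made up for this illustration, and the code assumes the whole text is already in memory as a std::string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the start index of every occurrence of
// `pattern` in `text` using Knuth-Morris-Pratt.
std::vector<std::size_t> kmp_find_all(const std::string& text,
                                      const std::string& pattern) {
    std::vector<std::size_t> matches;
    const std::size_t m = pattern.size();
    if (m == 0 || m > text.size()) return matches;

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    std::vector<std::size_t> failure(m, 0);
    for (std::size_t i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        failure[i] = k;
    }

    // Scan the text once; k is the number of pattern characters matched so far.
    for (std::size_t i = 0, k = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = failure[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == m) {
            matches.push_back(i + 1 - m);
            k = failure[k - 1];  // keep going so overlapping matches are found
        }
    }
    return matches;
}
```

Each text character is read once and the index into the text never moves backwards, so the access pattern is sequential and cache friendly.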

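You could try a suffix tree or suffix array, although if you are only searching the string once, building the index costs at least as much as a single KMP pass (a simple suffix array construction takes O(n log n) or more), so KMP is faster for a single search. So if you have multiple distinct patterns W to find, I would go with a suffix tree or suffix array; otherwise KMP is probably your best bet.
From the Wikipedia article on suffix arrays: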
The suffix array of a string can be used as an index to quickly locate
every occurrence of a substring pattern P within the string S. Finding
every occurrence of the pattern is equivalent to finding every suffix
that begins with the substring. Thanks to the lexicographical
ordering, these suffixes will be grouped together in the suffix array
and can be found efficiently with two binary searches.
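The two binary searches from the quote can be sketched as below. This is only an illustration: the names build_suffix_array and find_with_suffix_array are invented here, and the construction simply sorts suffix start positions with std::sort, which is much slower than a dedicated suffix array algorithm and would need replacing for a string of several million characters.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: naive suffix array built by sorting suffix start
// positions lexicographically (simple, but not an efficient construction).
std::vector<int> build_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper: all start positions of `w` in `s`, in increasing order.
std::vector<int> find_with_suffix_array(const std::string& s,
                                        const std::vector<int>& sa,
                                        const std::string& w) {
    // First suffix whose first |w| characters are not less than w.
    auto lo = std::lower_bound(sa.begin(), sa.end(), w,
        [&](int pos, const std::string& pat) {
            return s.compare(pos, pat.size(), pat) < 0;
        });
    // First suffix whose first |w| characters are greater than w.
    auto hi = std::upper_bound(lo, sa.end(), w,
        [&](const std::string& pat, int pos) {
            return s.compare(pos, pat.size(), pat) > 0;
        });
    std::vector<int> result(lo, hi);          // suffixes that start with w
    std::sort(result.begin(), result.end());  // report in text order
    return result;
}
```

The index is built once and can then be reused for many different patterns, which is the case where it pays off compared to running KMP each time.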

Related

Most frequent substring of fixed length - simple solution needed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Please describe (without implementation!) algorithm (possibly fastest)
which receives string of n letters and positive integer k as
arguments, and prints the most frequent substring of length k (if
there are multiple such substrings, algorithm prints any one of them).
String is composed of letters "a" and "b". For example: for string
ababaaaabb and k=3 the answer is "aba", which occurs 2 times (the fact
that they overlap doesn't matter). Describe an algorithm, prove its
correctness and calculate its complexity.
I can use only the most basic features of C++: no vectors, classes, objects, etc. I also don't know about strings, only char arrays. Can someone please explain to me what the algorithm would be, possibly with an implementation in code for easier understanding? This is a question from a university exam, which is why it's so weird.
A simple solution is to try all possible substrings from left to right (i.e. starting at indices i = 0 to n-k) and compare each one to the substrings that follow it (i.e. starting at indices j = i+1 to n-k).
For every i-substring, you count the number of occurrences and keep track of the most frequent so far.
A string comparison costs at worst k character comparisons, and you will perform (n-k)(n-k+1)/2 such comparisons, so the total cost is of order O(k(n-k)²). [In fact the cost can be lower because some of the string comparisons may terminate early, but I am not able to perform the evaluation.]
This solution is simple but probably not efficient.
In theory you can reduce the cost by using a more efficient string matching algorithm, such as Knuth-Morris-Pratt, resulting in O((n-k)(n+k)) operations.
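The simple solution (not the KMP variant) might look like the sketch below. It sticks to plain char arrays and loops, as the question asks. The function name and the count_out parameter are made up for this example.

```cpp
// Hypothetical helper: returns the start index of a most frequent substring
// of length k in s[0..n-1], or -1 if no such substring exists.
// If count_out is not null, it receives the number of occurrences.
int most_frequent_substring(const char* s, int n, int k, int* count_out) {
    if (k <= 0 || k > n) return -1;
    int best_start = -1;
    int best_count = 0;
    for (int i = 0; i + k <= n; ++i) {
        int count = 1;  // the substring at i itself
        for (int j = i + 1; j + k <= n; ++j) {
            // Compare s[i..i+k-1] with s[j..j+k-1] character by character.
            bool same = true;
            for (int t = 0; t < k; ++t) {
                if (s[i + t] != s[j + t]) { same = false; break; }
            }
            if (same) ++count;
        }
        // Only later occurrences are counted, so the first occurrence of each
        // substring sees its full count, and that is enough to find the maximum.
        if (count > best_count) {
            best_count = count;
            best_start = i;
        }
    }
    if (count_out) *count_out = best_count;
    return best_start;
}
```

For the string ababaaaabb with k = 3, this returns index 0 ("aba") with a count of 2; "aaa" also occurs twice, but the earlier substring wins ties.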

Best possible way to search for a given value among N unsorted numbers [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
A friend of mine was asked this question in an interview:
What is the best possible way to search for a given value among N unsorted numbers in an array?
If the array is unsorted, you need to perform a linear scan of the list. This examines (worst case) every element in the array. Such a search is O(n).
Sorting first won't help for a single search, since the best comparison sorts run in O(n log n), which is more work than the O(n) scan.
If it was sorted I'd say std::binary_search, but for unsorted, just go with std::find (unless the container you use has a member find; if it does, then use that as it is probably faster).
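A minimal sketch of the linear scan with std::find; the wrapper name find_index is only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: index of the first element equal to target, or -1.
long find_index(const std::vector<int>& values, int target) {
    auto it = std::find(values.begin(), values.end(), target);
    return it == values.end() ? -1 : static_cast<long>(it - values.begin());
}
```

std::find stops at the first match, so the whole array is examined only when the value is absent or sits in the last position.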
It depends:
- If you just want to search for a single value, the answer given by bush is enough.
- If you know that you will perform repeated queries, it could be better to do some kind of preprocessing first so that later lookups are fast. If you only want to know whether a value is contained in the array, you can use structures like a hash set, a Bloom filter, etc.
In other cases, you would also like to know the position of the items inside the array. In this scenario, you can consider using a hash map.
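As a sketch of the preprocessing idea, a std::unordered_map from value to position can be built once in O(n) and then queried in expected constant time. The name build_index is hypothetical, and keeping only the first position of a duplicated value is a choice made for this example.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each distinct value to the index of its first
// occurrence in the array.
std::unordered_map<int, std::size_t> build_index(const std::vector<int>& values) {
    std::unordered_map<int, std::size_t> index;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // emplace does nothing if the key already exists, so the first
        // position of a duplicated value is the one that is kept.
        index.emplace(values[i], i);
    }
    return index;
}
```

A lookup is then index.find(value), which returns index.end() when the value is not in the array.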

how to parse mathematical functions in C++ [duplicate]

Closed 7 years ago.
Can you give me some ideas about how I can make a simple mathematical expression parser in C?
The user enters a mathematical function as a string, and from the string I want to create the function in C.
e.g. x + sin(2*x)
-> return x + sin(2*x);
Thanks in advance.
You can parse the expression with the shunting-yard algorithm (http://en.wikipedia.org/wiki/Shunting-yard_algorithm). You will need to extend it to handle function calls such as sin, cos, etc.
This is not a simple thing to do at all; in fact, it's a hard thing. You need a full grammar parser, combined with predefined constants and functions (sin, log, pi, etc.).
If you have no extensive previous experience with C I would advise against doing this, but if you really want to, look at recursive descent parsing, which is arguably the easiest way to do it (without putting a burden on the user, like reverse Polish notation).
Last but not least, you say you want to create a C function from the user-generated input. Generating code from user input is almost always the wrong thing to do. Instead, the easiest approach is to pre-process the input into an intermediate representation that can be executed efficiently.
Writing an expression parser and evaluator is one of the usual examples used when discussing parser-writing techniques.
For example, you could look at the documentation for flex/bison or lex/yacc. It will have examples of constructing parsers and expression evaluators.
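To give an idea of what recursive descent looks like, here is a small sketch in C++. The grammar, the class name Parser and the function evaluate are assumptions made for this example. It supports only numbers, the variable x, sin, parentheses, unary minus and the four basic operators. To stay short it evaluates the expression directly while parsing instead of building an intermediate representation.

```cpp
#include <cctype>
#include <cmath>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Assumed grammar for this sketch:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := '-' factor | '(' expr ')' | 'sin' '(' expr ')' | 'x' | number
class Parser {
public:
    Parser(const std::string& text, double x) : s_(text), x_(x) {}

    double parse() {
        double v = expr();
        skip_spaces();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;
    double x_;

    void skip_spaces() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consumes c if it is the next non-space character.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void expect(char c) {
        if (!accept(c)) throw std::runtime_error(std::string("expected '") + c + "'");
    }
    double expr() {
        double v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    double term() {
        double v = factor();
        for (;;) {
            if (accept('*')) v *= factor();
            else if (accept('/')) v /= factor();
            else return v;
        }
    }
    double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) { double v = expr(); expect(')'); return v; }
        if (s_.compare(pos_, 3, "sin") == 0) {
            pos_ += 3;
            expect('(');
            double v = expr();
            expect(')');
            return std::sin(v);
        }
        if (accept('x')) return x_;
        // Anything else must be a numeric literal.
        const char* begin = s_.c_str() + pos_;
        char* end = nullptr;
        double v = std::strtod(begin, &end);
        if (end == begin) throw std::runtime_error("expected a number");
        pos_ += static_cast<std::size_t>(end - begin);
        return v;
    }
};

// Hypothetical convenience wrapper: evaluates `text` for a given value of x.
double evaluate(const std::string& text, double x) {
    return Parser(text, x).parse();
}
```

Each grammar rule becomes one function, and operator precedence falls out of which rule calls which. More functions such as cos or log could be added in factor() in the same way as sin.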
One way to do it is to use reverse Polish notation for the expressions and a stack for the operands.
Some quick pseudo-code:
    if element is operand
        push in stack
    else if element is operation
        pop last 2 elements
        perform operation
        push result in stack
Repeat till end of expression. Final result is the only element in stack.
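The pseudo-code above could be turned into C++ roughly as follows. This sketch assumes the tokens are separated by whitespace and that only the binary operators + - * / are needed; the function name evaluate_rpn is made up.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Hypothetical helper: evaluates a whitespace-separated reverse Polish
// expression such as "3 4 + 2 *".
double evaluate_rpn(const std::string& expression) {
    std::istringstream in(expression);
    std::stack<double> operands;
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            if (operands.size() < 2) throw std::runtime_error("missing operand");
            // The right-hand operand was pushed last, so it is popped first.
            double rhs = operands.top(); operands.pop();
            double lhs = operands.top(); operands.pop();
            if (token == "+") operands.push(lhs + rhs);
            else if (token == "-") operands.push(lhs - rhs);
            else if (token == "*") operands.push(lhs * rhs);
            else operands.push(lhs / rhs);
        } else {
            // std::stod throws std::invalid_argument if the token is not a number.
            operands.push(std::stod(token));
        }
    }
    if (operands.size() != 1) throw std::runtime_error("malformed expression");
    return operands.top();
}
```

For example, evaluate_rpn("3 4 + 2 *") corresponds to (3 + 4) * 2.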

Right sequence of brackets [closed]

Closed 8 years ago.
Please help with writing a program in C++. We have a sequence of brackets of 4 kinds: (), [], {}, <>. We need to find the shortest correctly bracketed sequence of which the initial sequence is a subsequence, i.e. the initial sequence can be obtained from the resulting correct sequence by deleting some (possibly zero) brackets.
Example:
initial sequence <]}} {([])
the answer: <[] {} {} ([]) <>>
Your proposed answer doesn't seem to fit the requirements. For example, it doesn't look (at least to me) like you can generate the }}{ sequence by deleting elements from <[] {} {} ([]) <>>. You also seem to have a completely unnecessary pair of angle brackets. Presumably, your intent is also that the brackets in the generated sequence are balanced--otherwise, the correct answer is to simply leave the original sequence unchanged. With no other requirements, that's clearly the shortest sequence from which you can generate that sequence by deleting (zero) items.
If the requirement for balancing is correct, it looks like your original input has four possible correct results:
<[]{}{}{([])}>
<[]{}{}{}([])>
<>[]{}{}{}([])
<>[]{}{}{([])}
All these are the same length, so I don't see a particular reason to prefer one over the other. This looks enough like homework that I'm not going to just give a direct solution to the problem, but I think the simplest code you could write for the job would probably produce the first of these four solutions (and that may provide at least some guidance about how I'd solve the problem).
I'm reasonably certain this can be done entirely using counters--shouldn't need any sort of "context stacks" (though a stack-based solution is certainly possible as well).
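The sketch below does not construct the shortest sequence; it only checks the two requirements (balanced, and containing the input as a subsequence), which is enough to verify candidate answers such as the four listed above. The function names are invented for this example, and spaces are ignored.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper: true if every bracket in s is correctly matched.
bool is_balanced(const std::string& s) {
    std::stack<char> open;
    for (char c : s) {
        if (c == ' ') continue;
        if (c == '(' || c == '[' || c == '{' || c == '<') {
            open.push(c);
            continue;
        }
        char expected;
        switch (c) {
            case ')': expected = '('; break;
            case ']': expected = '['; break;
            case '}': expected = '{'; break;
            case '>': expected = '<'; break;
            default: return false;  // not a bracket at all
        }
        if (open.empty() || open.top() != expected) return false;
        open.pop();
    }
    return open.empty();
}

// Hypothetical helper: true if `sub` can be obtained from `full` by deleting
// zero or more characters. Spaces in `sub` are skipped.
bool is_subsequence(const std::string& sub, const std::string& full) {
    std::size_t i = 0;
    for (char c : full) {
        while (i < sub.size() && sub[i] == ' ') ++i;
        if (i < sub.size() && sub[i] == c) ++i;
    }
    while (i < sub.size() && sub[i] == ' ') ++i;
    return i == sub.size();
}
```

With the input <]}} {([]) from the question, all four candidates pass both checks, while the answer proposed in the question is balanced but fails the subsequence check.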

Generate prime factors of a number [closed]

Closed 9 years ago.
I'm trying to write a function that, given an Int greater than one, returns a non-decreasing list of the prime factors (with repetition) of that number.
Example: n = 12, the output should be [2,2,3]
I don't know where to start.
There are of course well-known algorithms for what you want to do, so a simple Google search would really solve that.
However, I'd like to show you a simple thinking process that might be helpful in the future.
Since the factors have to appear in ascending order, you might:
1. Start with the lowest prime (2).
2. Check if the number can be divided by it. If it can, do it and go back to 1.
3. If not, replace 2 with the next prime and go back to 2.
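The steps above can be sketched in C++ as follows. Two liberties are taken, both assumptions of this example: instead of restarting from 2 after a successful division, the loop keeps the current candidate (smaller primes can no longer divide the number anyway), and instead of a next-prime function it simply tries the next integer, since a composite candidate never divides the number once its own prime factors have been divided out. The function name prime_factors is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: prime factors of n (n > 1) in non-decreasing order,
// with repetition.
std::vector<long long> prime_factors(long long n) {
    std::vector<long long> factors;
    long long candidate = 2;
    while (n > 1) {
        if (n % candidate == 0) {
            factors.push_back(candidate);  // record the factor and divide it out
            n /= candidate;
        } else {
            ++candidate;  // try the next possible divisor
        }
    }
    return factors;
}
```

For n = 12 this produces 2, 2, 3. As written, the loop may run all the way up to the starting number when that number is prime.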
Now, it's obvious that the biggest prime you will ever check is the number you started with. However, a basic property of division says that if a number n can be divided by a:
n / a = b
Then it can also be divided by b! You can use that fact to further narrow the checking range, but I'll leave it to you to figure (or google) the upper bound.
Actual implementation is of course a part of your homework and thus supplying code wouldn't be a wise idea here. However, I don't think that stuff such as next_prime will be hard for you.