Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
Please help with writing program on C++. We have a sequence of brackets. It consists from 4 kinds - (), [], {}, <>. Required to find the shortest sequence with the right placement of brackets, for which the initial sequence would be a subsequence, i. e. would be obtained from the resulting correct sequence by deleting some (possibly zero) number of brackets.
Example:
initial sequence <]}} {([])
the answer: <[] {} {} ([]) <>>
Your proposed answer doesn't seem to fit the requirements. For example, it doesn't look (at least to me) like you can generate the }}{ sequence by deleting elements from <[] {} {} ([]) <>>. You also seem to have a completely unnecessary pair of angle brackets. Presumably, your intent is also that the brackets in the generated sequence are balanced--otherwise, the correct answer is to simply leave the original sequence unchanged. With no other requirements, that's clearly the shortest sequence from which you can generate that sequence by deleting (zero) items.
If the requirement for balancing is correct, it looks like your original input has four possible correct results:
<[]{}{}{([])}>
<[]{}{}{}([])>
<>[]{}{}{}([])
<>[]{}{}{([])}
All these are the same length, so I don't see a particular reason to prefer one over the other. This looks enough like homework that I'm not going just give a direct solution to the problem, but I think the simplest code you could write for the job would probably produce the first of these four solutions (and that may provide at least some guidance about how I'd solve the problem).
I'm reasonably certain this can be done entirely using counters--shouldn't need any sort of "context stacks" (though a stack-based solution is certainly possible as well).
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I have a data that consists of DNA-sequences, where the words represented as kmers of length 6, and the sentences represented as DNA-sequences. Each DNA-sequence has 80 kmers (words)
The list of kmers I have is around 130,000 kmers, but after removing the duplicate elements, I would have 4500 kmers only. So, this huge gap confused me in regarding removing the duplicate kmers or not. My question is, is it recommended in this case to remove the duplicated kmers in the word2vec algorithm?
Thanks.
Without an example, it's unclear what you mean by "removing the duplicate elements". (Does that mean, when the same token appears twice in a row? Or twice in one "sentence"? Or, as I'm not familiar with what your data looks like in this domain, something else entirely?)
That you say there are 130,000 tokens in the vocabulary, but then 4,500 later, is also confusing. Typically the "vocabulary" size is the number of unique tokens. Removing duplicate tokens couldn't possibly change the number of unique tokens encountered.
In the usual domain of word2vec, natural language, words don't often repeat one-after-another. To the extent they sometimes might – as in say the utterance "it's very very hot in here" – it's not really an important enough case that I've noticed anyone commenting about handling that "very very" differently than any other two words.
(If a corpus had some artificially-duplicated full-sentences, it might be the case that you'd want to try discarding the exact-duplicate-sentences. Word2vec benefits from a variety of different usage-examples. Repeating the same sentence 10 times essentially just overweights those training examples – it's not nearly as good as 10 contrasting, but still valid, examples of the same words' usage.)
You're in a different domain that's not natural language, with different co-occurrence frequencies, and different end-goals. Word2vec might prove useful, but it's unlikely any general rules-of-thumb or recommendations from other domains will be useful. You should test things both ways, evaluate the results on your ultimate task in a robust repeatable way, and choose based on what you discover.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
Please describe (without implementation!) algorithm (possibly fastest)
which receives string of n letters and positive integer k as
arguments, and prints the most frequent substring of length k (if
there are multiple such substrings, algorithm prints any one of them).
String is composed of letters "a" and "b". For example: for string
ababaaaabb and k=3 the answer is "aba", which occurs 2 times (the fact
that they overlap doesn't matter). Describe an algorithm, prove its
correctness and calculate its complexity.
I can use only most basic functions of C++: no vectors, classes, objects etc. I also don't know about strings, only char tables. Can someone please explain to me what the algorithm would be, possibly with implementation in code for easier understanding? That's question from university exam, that's why it's so weird.
A simple solution is by trying all possible substrings from left to right (i.e. starting from indices i=0 to n-k), and comparing each to the next substrings (i.e. starting from indices j=i+1 to n-k).
For every i-substring, you count the number of occurrences, and keep a trace of the most frequentso far.
As a string comparison costs at worst k character comparisons, and you will be performing (n-k-1)(n-k)/2 such comparisons and the total cost is of order O(k(n-k)²). [In fact the cost can be lower because some of the string comparisons may terminate early, but I an not able to perform the evaluation.]
This solution is simple but probably not efficient.
In theory you can reduce the cost using a more efficient string matching algorithm, such as Knuth-Morris-Pratt, resulting in O((n-k)(n+k)) operations.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Can you give me some ideas about how can I make a simple mathematical expression parser in C?
User enters a mathematical function in a string and from the string I want to create the function in C.
eg. x + sin(2*x)
-> return x + sin(2x);
Thanks in advance.
You can parse the expression based "Shunting-Yard Algorithm" http://en.wikipedia.org/wiki/Shunting-yard_algorithm. You will need to extend to handle the function calls such as sin, cos etc...
This is not a simple thing to do at all, in face, it's a hard thing. You need a full grammar parser, combined with pre-defined constants/functions (sin, log, pi, etc).
If you have no extensive previous experience with C I would disrecommend doing this, but if you really want to do this look at recursive descent parsing which is arguably the easiest way to do this (without putting a burden on the user, like reverse polish notation).
Last but not least you say you want to create a C function from the user-generated input. This is almost always a wrong thing to do - generating code from user input, instead the easiest approach is pre-processing to create a intermediate representation that can be efficiently executed.
Writing an expression parser and evaluator is one of the usual examples used when discussions parser writing techniques.
For example you could look the documentation for flex/bison or lex/yacc. That will have examples of constructing parsers/expression evaluators.
One way to do it is to use reverse polish notation for the expressions and a stack for the operands.
Some quick pseudo-code:
if element is operand
push in stack
else if element is operation
pop last 2 elements
perform operation
push result in stack
Repeat till end of expression. Final result is the only element in stack.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Improve this question
I am struggling with a homework problem. I have tried this problem for hours literally. I found a similar question here, but it is not exactly my problem.
The homework problem says 1. (20 points) Construct regular expressions for the following languages.
a) All strings of digits with at most one repeated digit.
The only way I see how this is possible would be to exhaustively somehow take care of every possible case. There are 10 different digits, so it's like A LOT of different cases. I think the max length string can be 11, because after 11, you have to have a second repeated digit. So the number of possible combinations is 10^11. I thought even about writing a DFA and just converting it to a regex, but even that seems like it's impossible.
Does anyone have any advice? We aren't allowed to use non-standard regex features, like groups, lookahead, etc. This is just a plain old regex kind of problem.
Response to comment:
It is not binary. I already asked the teacher.
"Commenters, “regular expression” has one well-defined meaning in computer science. Since this is homework, it’s almost certainly that which is meant (and even more so as it talks of “languages”), and not some specific library. There’s no ambiguity here, and no clarification needed." This is basically what we want. The standard regex stuff often used in theoretical CS classes. As far as what we learned in class, I go to USC if anyone is familiar with that and we only barely talked about this at all. We're onto a completely different topic now.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I'm trying to prove that the RE
s(bs)*[(abb*a*)*ab]*aa(a∪b) where s=b*a*ab
can be simplified to
(a∪b)*abaa(a∪b),
but all the general transformations I know such as (ab)*a=a(ba)* or (a*b)*a*=(a∪b)* seem to have no effect.
So the questions are:
Is a step-by-step transformation process here viable? If so, can you show me how it is done?
If other types of technique are needed for the proof, what are they?
Is there a clear & comprehensive reference explaining techniques used in simplifying formal RE's?
Thanks.
There are lots of ways to do this.
One simple way to do it is to convert the regular expressions to NFAs. To see if two NFAs recognize the same language:
Consider the starting state of both NFAs.
Take the epsilon closure of the states you are considering for each NFA.
If the set of states for one NFA contains at least one accepting state but the set for the other NFA contains no accepting states, then the NFAs are not equal and you are done.
Otherwise, for each symbol, follow that symbol to get a new set of states for each NFA, and go back to step 1.
This will take a finite number of steps because there are a finite number of sets of states to consider.
You could also convert both regular expressions to minimal DFAs and show that they are isormorphic, but this technique is really just the same thing as the above technique but with the steps in a different order.
Illustration
Consider the REs aa* and a*a. It's obvious that they recognize the same language, but we'll use them to illustrate the algorithm.
First we consider the starting states for each NFA. These are {1} for the left RE and {1} for the right RE, which we will write as {1},{1}.
Then we take the epsilon closure, which is {1},{1,2}.
Neither set contains an ending state, so we continue.
The outgoing symbols are only a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}.
Both contain ending states, so we continue.
Again, the outgoing symbols are a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}. This is the same as step #5, so we don't do it again.
Since there is nothing else to examine, we are done, and the two have been proven identical.