I've implemented a "big-int" in scheme as a list, so the first element is the sign of the number (+ or -) and the following are the value of the number itself, first the ones, then the tens etc.
For example: (+ 0 0 1) is for 100, (- 9 2 3 1) is for -1329 etc.
What I need now is to implement addition, subtraction and multiplication for big-ints implemented in this manner. I've done addition and subtraction, can someone help me with multiplication, please?
Break the problem up into smaller pieces. First, write a function that multiplies a Big-int by a single digit. Then, extend this (using Greg Hewgill's hint that you probably know how to do this on paper) to a function that multiplies a Big-int by a list of digits. Finally, wrap this in a function that accepts two Big-ints, strips off the sign, and then calls your previous function.
I strongly suggest that you write test cases for these functions before developing them.
here is a well known divide-and-conquer approach to big-int multiplication:
http://ozark.hendrix.edu/~burch/csbsju/cs/160/notes/31/1.html
This approach cuts the two numbers into halves and deals with them recursively. It is very fast, and it is easy to implement, despite the long explanation it may seem like. I highly recommend it. It takes all of about 10 lines of code in other languages, probably fewer in scheme :).
Related
So I have the following task, where I have to explain whether or not it's possible to construct a regular expression following these constraints:
Numbers N < 1.000.000 which contain digit 1 exactly as often as digit 2
My task is not to actually write an regular expression, but just to reasoning about the fact if it's possible or not.
My conclusion is that it's perfectly possible, but I would have to write down every possible string in my regex, which would be extremely long.
So far I have the following regular expression:
(([3-9]*[1]{n}[3-9]*[2]{n}[3-9]*){0,1000})
Where n is iterating towards it's maximum length of 1.000.000
I know that iterators is not a "thing" in the regular language. But is there an easier way to "iterate" n, instead of having to write it all by hand?
My conclusion is that it's perfectly possible, but I would have to write down every possible string in my regex, which would be extremely long.
For questions about theory, the fact that it would be extremely long is irrelevant. You're correct. Any finite language is regular.
I know that iterators is not a "thing" in the regular language. But is there an easier way to "iterate" n, instead of having to write it all by hand?
No, there isn't. Perhaps it would help to think about how to construct a state machine for this.
One state, the initial state, is the one where the number of 1s seen and the number of 2s seen are equal.
A second state is one where one more 1 has been seen than 2s.
A third state is one where one more 2 has been seen than 1s.
A fourth state is one where two more 1s have been seen than 2s. Etc...
It should be fairly straightforward to convince yourself that you cannot combine any of those states, as there is no overlap in the valid remaining substrings for each state. Therefore, you would need >1000000 states, and as the translation of a regular expression to a state machine is fairly direct, the complexity of a regular expression cannot be less than the complexity of the state machine.
Of course, when programming, you can implement this far more efficiently. You can for instance store the count of 1s and 2s in additional variables. But variables are a feature state machines do not have.
Is there any function in C++ (included in some library or header file), that would count an expression from string?
Let's say we have a string, which equals 2 + 3 * 8 - 5 (but it's taken from user's keyboard, so we don't know what expression exactly it will be while writing the code), and we want this function co count it, but of course in the correct order (1. power / root 2. times / divide 3. increase / decrease).
I've tried to take all the numbers to an array of ints and operators to an array of chars (okay, actually vectors, because I don't know how many numbers and operators it's going to contain), but I'm not sure what to do next.
Note, that I'm asking if there's any function already written for that, if not I'm going to just try it again.
By count, I'm taking that to mean "evaluate the expression".
You'll have to use a parser generator like boost::spirit to do this properly. If you try hand-writing this I guarantee pain and misery for all involved.
Try looking on here for calculator apps, there are several:
http://boost-spirit.com/repository/applications/show_contents.php
There are also some simple calculator-style grammars in the boost::spirit examples.
Quite surprisingly, others before me have not redirected you to this particular algorithm which is easy to implement and converts your string into this special thing called Reverse Polish Notation (RPN). Computing expressions in RPN is easy, the hard part is implementing Shunting Yard but it is something done many times before and you may find many tutorials on the subject.
Quick overview of the algorithms:
PRN - RPN is a way to write expressions that eliminates the need for parentheses, thus it allows easier computation. Practically speaking, to compute one such expression you walk the string from left to right while keeping a stack of operands. Whenever you meet an operand, you push it onto the stack. Whenever you meet an operation token, you calculate its result on the last 2 operands (if the operation is binary ofcourse, on the last only if it is unary) and push it onto the stack. Rinse and repeat until the end of string.
Shunting Yard really is much harder to simply overview and if I do try to accomplish this task, this answer will end up looking much like the wikipedia article I linked above so I'll save that trouble to both of us.
Tl;DR; Go read the links in the first sentence.
You will have to do this yourself as there is no standard library that handles infix equation resolution
If you do, please include the code for what you are trying and we can help you
I've been tasked with creating a simple spell checker for an assignment but have given next to no guidance so was wondering if anyone could help me out. I'm not after someone to do the assignment for me, but any direction or help with the algorithm would be awesome! If what I'm asking is not within the guildlines of the site then I'm sorry and I'll look elsewhere. :)
The project loads correctly spelled lower case words and then needs to make spelling suggestions based on two criteria:
One letter difference (either added or subtracted to get the word the same as a word in the dictionary). For example 'stack' would be a suggestion for 'staick' and 'cool' would be a suggestion for 'coo'.
One letter substitution. So for example, 'bad' would be a suggestion for 'bod'.
So, just to make sure I've explained properly.. I might load in the words [hello, goodbye, fantastic, good, god] and then the suggestions for the (incorrectly spelled) word 'godd' would be [good, god].
Speed is my main consideration here so while I think I know a way to get this work, I'm really not too sure about how efficient it'll be. The way I'm thinking of doing it is to create a map<string, vector<string>> and then for each correctly spelled word that's loaded in, add the correctly spelled work in as a key in the map and the populate the vector to be all the possible 'wrong' permutations of that word.
Then, when I want to look up a word, I'll look through every vector in the map to see if that word is a permutation of one of the correctly spelled word. If it is, I'll add the key as a spelling suggestion.
This seems like it would take up HEAPS of memory though, cause there would surely be thousands of permutations for each word? It also seems like it'd be very very slow if my initial dictionary of correctly spelled words was large?
I was thinking that maybe I could cut down time a bit by only looking in the keys that are similar to the word I'm looking at. But then again, if they're similar in some way then it probably means that the key will be a suggestion meaning I don't need all those permutations!
So yeah, I'm a bit stumped about which direction I should look in. I'd really appreciate any help as I really am not sure how to estimate the speed of the different ways of doing things (we haven't been taught this at all in class).
The simpler way to solve the problem is indeed a precomputed map [bad word] -> [suggestions].
The problem is that while the removal of a letter creates few "bad words", for the addition or substitution you have many candidates.
So I would suggest another solution ;)
Note: the edit distance you are describing is called the Levenshtein Distance
The solution is described in incremental step, normally the search speed should continuously improve at each idea and I have tried to organize them with the simpler ideas (in term of implementation) first. Feel free to stop whenever you're comfortable with the results.
0. Preliminary
Implement the Levenshtein Distance algorithm
Store the dictionnary in a sorted sequence (std::set for example, though a sorted std::deque or std::vector would be better performance wise)
Keys Points:
The Levenshtein Distance compututation uses an array, at each step the next row is computed solely with the previous row
The minimum distance in a row is always superior (or equal) to the minimum in the previous row
The latter property allow a short-circuit implementation: if you want to limit yourself to 2 errors (treshold), then whenever the minimum of the current row is superior to 2, you can abandon the computation. A simple strategy is to return the treshold + 1 as the distance.
1. First Tentative
Let's begin simple.
We'll implement a linear scan: for each word we compute the distance (short-circuited) and we list those words which achieved the smaller distance so far.
It works very well on smallish dictionaries.
2. Improving the data structure
The levenshtein distance is at least equal to the difference of length.
By using as a key the couple (length, word) instead of just word, you can restrict your search to the range of length [length - edit, length + edit] and greatly reduce the search space.
3. Prefixes and pruning
To improve on this, we can remark than when we build the distance matrix, row by row, one world is entirely scanned (the word we look for) but the other (the referent) is not: we only use one letter for each row.
This very important property means that for two referents that share the same initial sequence (prefix), then the first rows of the matrix will be identical.
Remember that I ask you to store the dictionnary sorted ? It means that words that share the same prefix are adjacent.
Suppose that you are checking your word against cartoon and at car you realize it does not work (the distance is already too long), then any word beginning by car won't work either, you can skip words as long as they begin by car.
The skip itself can be done either linearly or with a search (find the first word that has a higher prefix than car):
linear works best if the prefix is long (few words to skip)
binary search works best for short prefix (many words to skip)
How long is "long" depends on your dictionary and you'll have to measure. I would go with the binary search to begin with.
Note: the length partitioning works against the prefix partitioning, but it prunes much more of the search space
4. Prefixes and re-use
Now, we'll also try to re-use the computation as much as possible (and not just the "it does not work" result)
Suppose that you have two words:
cartoon
carwash
You first compute the matrix, row by row, for cartoon. Then when reading carwash you need to determine the length of the common prefix (here car) and you can keep the first 4 rows of the matrix (corresponding to void, c, a, r).
Therefore, when begin to computing carwash, you in fact begin iterating at w.
To do this, simply use an array allocated straight at the beginning of your search, and make it large enough to accommodate the larger reference (you should know what is the largest length in your dictionary).
5. Using a "better" data structure
To have an easier time working with prefixes, you could use a Trie or a Patricia Tree to store the dictionary. However it's not a STL data structure and you would need to augment it to store in each subtree the range of words length that are stored so you'll have to make your own implementation. It's not as easy as it seems because there are memory explosion issues which can kill locality.
This is a last resort option. It's costly to implement.
You should have a look at this explanation of Peter Norvig on how to write a spelling corrector .
How to write a spelling corrector
EveryThing is well explain in this article, as an example the python code for the spell checker looks like this :
import re, collections
def words(text): return re.findall('[a-z]+', text.lower())
def train(features):
model = collections.defaultdict(lambda: 1)
for f in features:
model[f] += 1
return model
NWORDS = train(words(file('big.txt').read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def edits1(word):
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
deletes = [a + b[1:] for a, b in splits if b]
transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
inserts = [a + c + b for a, b in splits for c in alphabet]
return set(deletes + transposes + replaces + inserts)
def known_edits2(word):
return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
def known(words): return set(w for w in words if w in NWORDS)
def correct(word):
candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
return max(candidates, key=NWORDS.get)
Hope you can find what you need on Peter Norvig website.
for spell checker many data structures would be useful for example BK-Tree. Check Damn Cool Algorithms, Part 1: BK-Trees I have done implementation for the same here
My Earlier code link may be misleading, this one is correct for spelling corrector.
off the top of my head, you could split up your suggestions based on length and build a tree structure where children are longer variations of the shorter parent.
should be quite fast but i'm not sure about the code itself, i'm not very well-versed in c++
I want to write a calculator which can take words as input.
for e.g. "two plus five multiply with 7" should give 37 as output.
I won't lie, this is a homework so before doing this, I thought if I can be pointed to something which might be useful for these kinds of things and I am not aware of.
Also, approach n how to do this would be ok too , I guess. It has to be written in C++. No other language would be accepted.
Thanks.
[Edit] -- Thanks for the answers. This is an introductory course. So keeping things as simple as possible would be appreciated. I should have mentioned this earlier.
[Edit 2] -- Reached a stage where I can give input in numbers and get correct output with precedence and all. Just want to see how to convert word to number now. Thanks to everybody who is trying to help. This is why SO rocks.
As long as the acceptable input is strict enough, writing a recursive descent parser should be easy. The grammar for this shouldn't differ much from a grammar for a simple calculator that only accepts digits and symbols.
With an std::istream class and the extraction operator (>>) you can easily break the input into separate words.
Here's a lecture that shows how to do this for a regular calculator with digits and symbols.
The main difference from that in your case is parsing the numbers. Instead of the symbols '0-9', your input will have words like "zero", "one", "nine", "hundreds", "thousands", etc. To parse such a stream of words you can do something like this:
break the words into groups, nested by the "multipliers", like "thousands", "hundreds", "billions", etc; these groups can nest. Tens ("ten", "twenty", "thirty"), teens ("eleven", "twelve", "thirteen", ...), and units ("one", "two", "three", ...) don't nest anything. You turn something like "one hundred and three thousands two hundred and ninety seven" into something like this:
+--------+---------+-----+
/ | | \
thousand hundred ninety seven
/ \ |
hundred three two
|
one
for each group, recursively sum all its components, and then multiply it by its "multiplier":
103297
+--------+------+----+
/ | | \
(* 1000) + (* 100) + 90 + 7
/ \ |
(* 100) + 3 2
|
1
This was some of my favorite stuff in school.
The right way to do this is to implement it as a parser -- write a grammar, tokenize your input, and parse away. Then you can evaluate it recursively.
If none of that sounds familiar, though (i.e. this is an introductory class) then you can hack it together in a much less robust way -- just do it how you'd do it naturally. Convert the sentence to numbers and operations, and then just do the operations.
NOTE: I'm assuming this is an introductory level course, and not a course in compiler theory. If you're being taught about specific things relevant to this (e.g. algorithms other than what I mention), you will almost certainly be expected apply those concepts.
First, you'll have to understand the individual words. For this purpose, you can likely just hack it together - read one word at a time and try to understand that. Gradually build up a function which can read the set of numbers you need to be able to work with.
If the input is simple enough (only expressions of the basic form you provide, no parentheses or anything), you can simply alternate between reading a number and an operator until the input is fully read (but if this is supposed to be robust, you need to stop and display an error if the last thing you read is an operator), so you can write separate methods for numbers and operators.
To understand how the expression needs to be calculated, use the shunting yard algorithm to parse the expression with proper operator precedence. You can likely combine the initial parsing of the words with this by simply using that to supply the tokens to the algorithm.
To actually calculate the result, you'll need to evaluate the parsed expression. To do that, you can simply use a stack for the output: when you would normally push an operator, you instead pop 2 values, apply the operator to them, and push the result. After the shunting yard algorithm is complete, there will be one value on the stack, containing the final result.
I am considering the problem of validating real numbers of various formats, because this is very similar to a problem I am facing in design.
Real numbers may come in different combinations of formats, for example:
1. with/without sign at the front
2. with/without a decimal point (if no decimal point, then perhaps number of decimals can be agreed beforehand)
3. base 10 or base 16
We need to allow for each combination, so there are 2x2x2=8 combinations. You can see that the complexity increases exponentially with each new condition imposed.
In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function. However, with each new condition, you have to double the number of classes required and it soon becomes a nightmare.
In procedural programming, you use 3 flags (i.e. has_sign, has_decimal_point and number_base) to identify the property of the real number you are validating. You have a single function for validation. In there, you would use the flags to control its behaviour.
// This is part of the validation function
if (has_sign)
check_sign();
for (int i = 0; i < len; i++)
{
if (has_decimal_point)
// Check if number[i] is '.' and do something if it is. If not, continue
if (number_base = BASE10)
// number[i] must be between 0-9
else if (number_base = BASE16)
// number[i] must be between 0-9, A-F
}
Again, the complexity soon gets out of hand as the function becomes cluttered with if statements and flags.
I am sure that you have come across design problems of this nature before - a number of independent differences which result in difference in behaviour. I would be very interested to hear how have you been able to implement a solution without making the code completely unmaintainable.
Would something like the bridge pattern have helped?
In OO design, you would normally
allocate a class for each number
format (e.g. in this case, we have 8
classes), and each class would have a
separate validation function.
No no no no no. At most, you'd have a type for representing Numeric Input (in case String doesn't make it); another one for Real Number (in most languages you'd pick a built-in type, but anyway); and a Parser class, which has the knowledge to take a Numeric Input and transform it into a Real Number.
To be more general, one difference of behaviour in and by itself doesn't automatically map to one class. It can just be a property inside a class. Most importantly, behaviours should be treated orthogonally.
If (imagining that you write your own parser) you may have a sign or not, a decimal point or not, and hex or not, you have three independent sources of complexity and it would be ok to find three pieces of code, somewhere, that treat one of these issues each; but it would not be ok to find, anywhere, 2^3 = 8 different pieces of code that treat the different combinations in an explicit way.
Imagine that add a new choice: suddenly, you remember that numbers might have an "e" (such as 2.34e10) and want to be able to support that. With the orthogonal strategy, you'll have one more independent source of complexity, the fourth one. With your strategy, the 8 cases would suddenly become 16! Clearly a no-no.
I don't know why you think that the OO solution would involve a class for each number pattern. My OO solution would be to use a regular expression class. And if I was being procedural, I would probably use the standard library strtod() function.
You're asking for a parser, use one:
http://www.pcre.org/
http://www.complang.org/ragel/
sscanf
boost::lexical_cast
and plenty of other alternatives...
Also: http://en.wikipedia.org/wiki/Parser_generator
Now how do I handle complexity for this kind of problems ? Well if I can, I reformulate.
In your case, using a parser generator (or regular expression) is using a DSL (Domain Specific Language), that is a language more suited to the problem you're dealing with.
Design pattern and OOP are useful, but definitely not the best solution to each and every problem.
Sorry but since i use vb, what i do is a base function then i combine a evaluator function
so ill fake code it out the way i have done it
function getrealnumber(number as int){ return getrealnumber(number.tostring) }
function getrealnumber(number as float){ return getrealnumber(number.tostring) }
function getrealnumber(number as double){ return getrealnumber(number.tostring) }
function getrealnumber(number as string){
if ishex(){ return evaluation()}
if issigned(){ return evaluation()}
if isdecimal(){ return evaluation()}
}
and so forth up to you to figure out how to do binary and octal
You don't kill a fly with a hammer.
I realy feel like using a Object-Oriented solution for your problem is an EXTREME overkill. Just because you can design Object-Oriented solution , doesn't mean you have to force such one to every problem you have.
From my experience , almost every time there is a difficulty in finding an OOD solution to a problem , It probably mean that OOD is not appropiate. OOD is just a tool , its not god itself. It should be used to solve large scale problems , and not problems such one you presented.
So to give you an actual answer (as someone mentioned above) : use regular expression , Every solution beyond that is just an overkill.
If you insist using an OOD solution.... Well , since all formats you presented are orthogonal to each other , I dont see any need to create a class for every possible combination. I would create a class for each format and pass my input through each , in that case the complexity will grow linearly.