How to generate all possible 4 letter words in C++ - c++

I want to write a program that can give me all 4 letter words (from the dictionary or outside the dictionary). I code in C++. So far, I've gotten nowhere.
I'm just a beginner in C++. I can apply the logic, but I haven't been introduced to the advanced features of C++. It doesn't matter if the program takes a long time to finish; I just want the solution.
For example:
abcd
king
ngik
cbda
play
lpay
payl
and so on (just a few of the millions of outputs I hope this program to output).
NOTE: The words generated need not make sense and I do not want to discard any combinations, I want it all.

I suggest looping say i from 0 to 26^4 - 1, each time outputting 'A' + i / (26*26*26), 'A' + i / (26*26) % 26, 'A' + i / 26 % 26, and 'A' + i % 26, then a newline.
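A minimal sketch of that loop, in case it helps to see it spelled out. The names nth_word and print_all_words are made up for illustration; the arithmetic is the base-26 digit extraction described above. Note that 26^4 is 456,976, so the output is long but well short of millions of lines.

```cpp
#include <iostream>
#include <string>

// Returns the i-th four-letter word (0 <= i < 26^4) by treating i as a
// four-digit number in base 26; digit 0 maps to 'A', digit 25 to 'Z'.
std::string nth_word(int i) {
    std::string w(4, 'A');
    w[0] = static_cast<char>('A' + i / (26 * 26 * 26));
    w[1] = static_cast<char>('A' + i / (26 * 26) % 26);
    w[2] = static_cast<char>('A' + i / 26 % 26);
    w[3] = static_cast<char>('A' + i % 26);
    return w;
}

// Writes every four-letter word, one per line, to the given stream.
void print_all_words(std::ostream& out) {
    const int total = 26 * 26 * 26 * 26;  // 456,976 words
    for (int i = 0; i < total; ++i) {
        out << nth_word(i) << '\n';
    }
}
```

Calling print_all_words(std::cout) from main prints AAAA first and ZZZZ last. Swap 'A' for 'a' if you want lowercase.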

Make an array that holds all the possible letters (add numbers and symbols if you want).
Then use four nested for loops, each running an index from 0 to the length of the array minus one.
Let's say the loop variables are a, b, c, d.
In the innermost loop you can output the four characters array[a], array[b], array[c], array[d] one after another. (Don't join them with + if they are chars; in C++ that adds their character codes instead of concatenating.)
This gives all the possible combinations, and you can include numbers and symbols as well.
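Here is a rough sketch of the nested-loop idea. The function name all_words and the choice to collect the results in a vector (rather than print them straight away) are my own assumptions, not part of the answer above.

```cpp
#include <string>
#include <vector>

// Builds every 4-character string over `alphabet` with four nested loops.
// The alphabet can contain letters, digits, symbols, whatever you like.
std::vector<std::string> all_words(const std::string& alphabet) {
    std::vector<std::string> out;
    for (char a : alphabet) {
        for (char b : alphabet) {
            for (char c : alphabet) {
                for (char d : alphabet) {
                    // Brace-initialising a string from chars concatenates them;
                    // writing a + b + c + d would add their integer codes.
                    out.push_back(std::string{a, b, c, d});
                }
            }
        }
    }
    return out;
}
```

With the 26 lowercase letters as the alphabet this produces 456,976 strings, from aaaa to zzzz.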

Use recursion.
Explore a 26-ary tree depth-first and output the letter each time you go down the corresponding branch. Depth = 4.
P.S. Recursive algorithms eat stack memory like hell, so be sure your machine lets programs have enough stack memory. You don't wanna run this on a PIC micro with only a 3-level call stack :-)
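A small sketch of the recursive version, assuming lowercase letters and a hypothetical function named generate that collects the words in a vector. It is one way to write the depth-first walk, not the only one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Depth-first walk of a 26-ary tree. Each level of recursion appends one
// letter to `prefix`; when the prefix reaches `depth` letters it is a leaf.
void generate(std::string& prefix, std::size_t depth,
              std::vector<std::string>& out) {
    if (prefix.size() == depth) {
        out.push_back(prefix);
        return;
    }
    for (char c = 'a'; c <= 'z'; ++c) {
        prefix.push_back(c);           // go down branch c
        generate(prefix, depth, out);
        prefix.pop_back();             // come back up
    }
}
```

Start it with an empty prefix and depth 4. The recursion never gets deeper than the word length plus one call, so on a desktop the stack usage is negligible here.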

Related

If two digits are the same 0s then make 0 condition in C++ and Root

In my code, legends are running within a loop, and I am trying to show a graph with
0-10%
10-20%
and so on. The problem is when I write this code
legend->AddEntry(gr[i], Form("%d0-%d0 %%",i+0,i+1), "lep");
It shows
00-10%
10-20% etc
So how to not show 00, but 0 in the first line?
A small adaptation of the shown statement should be enough; use:
legend->AddEntry(gr[i], Form("%d-%d %%", i*10 , (i+1)*10), "lep");
Explanation:
Form("%d0-%d0 %%",i+0,i+1) seems to be some kind of string formatting, and i your loop variable which runs from 0 to 9, right? The shown Form statement just appends "0" hard-coded to the single digit in i; instead, you can multiply i by 10, resulting in the actual numbers you want printed; and since 10*0 is still 0, this will be a single digit still; so, replace the previous Form(...) call with Form("%d-%d %%", i*10, (i+1)*10) and you should have the result you want!
In case you're worrying that printing i*10 is "less efficient" than printing i with "0" suffix - don't. The formatting and output of the string is most probably orders of magnitude slower than the multiplication anyway, so any overhead of doing multiple multiplications is negligible.
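To check the format string outside of ROOT, here is a small sketch that uses std::snprintf as a stand-in for Form. That substitution is an assumption on my part: it relies on Form following the usual printf conventions, which the %d and %% in the question suggest. The helper name decile_label is made up.

```cpp
#include <cstdio>
#include <string>

// Builds the legend label for bin i (0..9) with the corrected format string.
// std::snprintf stands in for ROOT's Form here (assumed printf-compatible).
std::string decile_label(int i) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d-%d %%", i * 10, (i + 1) * 10);
    return buf;
}
```

decile_label(0) gives "0-10 %" and decile_label(1) gives "10-20 %". The space before the percent sign comes from the format string; drop it there if you don't want it.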

Program is not accessing correct index of array.. Why?

I come to this site in need of help, after struggling with this problem for a few days now. I am trying to program a poem that accepts some data from standard input and then outputs a poem based on that data.
The code seems to be working, but it is not correct! It is giving me the wrong index of the array I am using. I would love extra eyes to help me with my code and let me know what I am doing wrong.
ALSO! For some reason, I am not able to access the third array of the char array... I tried to place "SIZE - 1" in there but it prints nothing... Would love to understand why this is. Does this look right?
// Program that accepts some data from standard input,
#include <iostream>
#include <cstring>
//here... extracted.
for (int sign = 0; sign < poem[line]; sign++)
{
if (line > word_count)
{
std::cout << " ";
print_poem(seed);
}
else
{
print_poem(seed);
}
You haven't mentioned what exactly the task is but I can at least explain to you parts of the problem.
Are the correct syllables being printed?
Let's assess if the correct syllables are being printed. I ran your code on my machine (with the input you provided that is "100 3 1 5 7 5") and got:
nahoewachi
tetsunnunoyasa
munahohuke
The syllable count of each line is fine (5,7,5) so that's not a problem.
The first syllable you have a problem with is chi in nahoewachi. I'm only illustrating why this syllable is being printed. You can apply the same logic to the rest.
Initially, the seed is 100. Before processing the first row, you apply generate_prnd, which gives 223. Before calculating chi, you print 4 other syllables (na, ho, e and wa). This means that you have applied generate_prnd 8 times before calculating the fifth syllable.
Applying generate_prnd 8 times on 223 gives 711. Applying one more time (to get row) gives 822.
822%9 = 3rd row (0 indexed).
Applying one more time (to get column) gives 361. 361%5 = 1st column.
Therefore the index for the fifth syllable is (3,1). The string at the (3,1)th index is "chi". Therefore, the correct syllable is being printed. The indexing is correct. There's a problem with your logic if you want a different syllable to be printed.
Now, let's assess why there aren't any spaces in your output.
In the example you provided, num_lines=3. The word_counts (actually syllable counts) are 5, 7 and 5. You are applying a space when line (which is always less than num_lines) is greater than word_count.
However, line is always less than word_count since the maximum value of line is 2 (num_lines - 1). Therefore, a space will never be printed.
P.S. If you are allocating memory using new, don't forget to deallocate using delete later.

Checking if a string contains an English sentence

As of right now, I decided to take a dictionary and iterate through the entire thing. Every time I see a newline, I make a string containing from that newline to the next newline, then I do string.find() to see if that English word is somewhere in there. This takes a VERY long time, each word taking about 1/2-1/4 a second to verify.
It is working perfectly, but I need to check thousands of words a second. I can run several windows, which doesn't affect the speed (Multithreading), but it still only checks like 10 a second. (I need thousands)
I'm currently writing code to pre-compile a large array containing every word in the English language, which should speed it up a lot, but still not get the speed I want. There has to be a better way to do this.
The strings I'm checking will look like this:
"hithisisastringthatmustbechecked"
but most of them contained complete garbage, just random letters.
I can't check for impossible combinations of letters, because that string would be thrown out because of the 'tm', in between 'thatmust'.
You can speed up the search by employing the Knuth–Morris–Pratt (KMP) algorithm.
Go through every dictionary word, and build a search table for it. You need to do it only once. Now your search for individual words will proceed at faster pace, because the "false starts" will be eliminated.
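A sketch of what that could look like, with hypothetical function names build_table and kmp_contains. The table is built once per dictionary word and reused for every string you check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// KMP failure table: fail[i] is the length of the longest proper prefix of
// word[0..i] that is also a suffix of word[0..i].
std::vector<std::size_t> build_table(const std::string& word) {
    std::vector<std::size_t> fail(word.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < word.size(); ++i) {
        while (k > 0 && word[i] != word[k]) k = fail[k - 1];
        if (word[i] == word[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns true if `word` occurs somewhere in `text`, using a table that was
// built earlier with build_table(word).
bool kmp_contains(const std::string& text, const std::string& word,
                  const std::vector<std::size_t>& fail) {
    if (word.empty()) return true;
    std::size_t k = 0;  // number of characters of `word` matched so far
    for (char ch : text) {
        while (k > 0 && ch != word[k]) k = fail[k - 1];  // skip false starts
        if (ch == word[k]) ++k;
        if (k == word.size()) return true;
    }
    return false;
}
```

Each check is linear in the length of the text, but you still run one search per dictionary word, so this speeds up each search rather than reducing how many you do.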
There are a lot of strategies for doing this quickly.
Idea 1
Take the string you are searching and make a copy of each possible substring beginning at some column and continuing through the whole string. Then store each one in an array indexed by the letter it begins with. (If a letter is used twice, store the longer substring.)
So the array looks like this:
a - substr[0] = "astringthatmustbechecked"
b - substr[1] = "bechecked"
c - substr[2] = "checked"
d - substr[3] = "d"
e - substr[4] = "echecked"
f - substr[5] = null // since there is no 'f' in it
... and so forth
Then, for each word in the dictionary, search in the array element indicated by its first letter. This limits the amount of stuff that has to be searched. Plus you can't ever find a word beginning with, say 'r', anywhere before the first 'r' in the string. And some words won't even do a search if the letter isn't in there at all.
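One way to sketch Idea 1 without copying substrings is to remember only where each letter first appears; the suffix starting there is the longest substring beginning with that letter. The function names and the lowercase a-z restriction are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <string>

// For each letter 'a'..'z', the index of its first occurrence in `text`,
// or std::string::npos if the letter does not occur at all.
std::array<std::size_t, 26> first_positions(const std::string& text) {
    std::array<std::size_t, 26> pos;
    pos.fill(std::string::npos);
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c >= 'a' && c <= 'z' && pos[c - 'a'] == std::string::npos) {
            pos[c - 'a'] = i;
        }
    }
    return pos;
}

// Searches for `word` only from the first occurrence of its first letter.
// Words whose first letter is missing from the text are rejected at once.
bool contains_word(const std::string& text,
                   const std::array<std::size_t, 26>& pos,
                   const std::string& word) {
    if (word.empty() || word[0] < 'a' || word[0] > 'z') return false;
    std::size_t start = pos[word[0] - 'a'];
    if (start == std::string::npos) return false;
    return text.find(word, start) != std::string::npos;
}
```

Build the position table once per string to check, then call contains_word for each dictionary word.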
Idea 2
Expand upon that idea by noting the longest word in the dictionary and get rid of letters from those strings in the arrays that are longer than that distance away.
So you have this in the array:
a - substr[0] = "astringthatmustbechecked"
But if the longest word in the list is 5 letters, there is no need to keep any more than:
a - substr[0] = "astri"
If the letter is present several times you have to keep more letters. So this one has to keep the whole string because the "e" keeps showing up less than 5 letters apart.
e - substr[4] = "echecked"
You can expand upon this by using the longest words starting with any particular letter when condensing the strings.
Idea 3
This has nothing to do with 1 and 2. It's an idea that you could use instead.
You can turn the dictionary into a sort of regular expression stored in a linked data structure. It is possible to write the regular expression too and then apply it.
Assume these are the words in the dictionary:
arun
bob
bill
billy
body
jose
Build this sort of linked structure. (It's a binary tree, really, represented in such a way that I can explain how to use it.)
a -> r -> u -> n -> *
|
b -> i -> l -> l -> *
|    |              |
|    o -> b -> *    y -> *
|         |
|         d -> y -> *
|
j -> o -> s -> e -> *
The arrows denote a letter that has to follow another letter. So "r" has to be after an "a" or it can't match.
The lines going down denote an option. You have the "a or b or j" possible letters and then the "i or o" possible letters after the "b".
The regular expression looks sort of like: /(arun)|(b((ill(y?))|(o(b|dy))))|(jose)/ (though I might have slipped a paren). This gives the gist of creating it as a regex.
Once you build this structure, you apply it to your string starting at the first column. Try to run the match by checking for the alternatives and if one matches, move forward tentatively and try the letter after the arrow and its alternatives. If you reach the star/asterisk, it matches. If you run out of alternatives, including backtracking, you move to the next column.
This is a lot of work but can, sometimes, be handy.
Side note: I built one of these some time back by writing a program that wrote the code that ran the algorithm directly, instead of having code look at the binary tree data structure.
Think of each set of vertical bar options being a switch statement against a particular character column and each arrow turning into a nesting. If there is only one option, you don't need a full switch statement, just an if.
That was some fast character matching and really handy for some reason that eludes me today.
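For comparison, here is a sketch of the same idea as a plain trie with one child slot per letter, instead of the linked binary-tree layout or the generated switch statements described above. The type and function names are hypothetical, and the code assumes lowercase a-z dictionary words.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> next;  // one branch per letter
    bool is_word = false;                            // the "*" in the diagram
};

// Adds one dictionary word to the trie (assumes only 'a'..'z').
void insert(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        auto& child = node->next[c - 'a'];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->is_word = true;
}

// Returns true if any dictionary word starts at some column of `text`.
bool contains_any_word(const TrieNode& root, const std::string& text) {
    for (std::size_t start = 0; start < text.size(); ++start) {
        const TrieNode* node = &root;
        for (std::size_t i = start; i < text.size(); ++i) {
            char c = text[i];
            if (c < 'a' || c > 'z') break;
            node = node->next[c - 'a'].get();
            if (!node) break;                  // no alternative matches
            if (node->is_word) return true;    // reached a "*"
        }
    }
    return false;
}
```

Because every node has a direct slot for each letter, there is no backtracking within a column: a mismatch simply moves the start to the next column.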
How about a Bloom Filter?
A Bloom filter, conceived by Burton Howard Bloom in 1970 is a
space-efficient probabilistic data structure that is used to test
whether an element is a member of a set. False positive matches are
possible, but false negatives are not; i.e. a query returns either
"inside set (may be wrong)" or "definitely not in set". Elements can
be added to the set, but not removed (though this can be addressed
with a "counting" filter). The more elements that are added to the
set, the larger the probability of false positives.
The approach could work as follows: you create the set of words that you want to check against (this is done only once), and then you can quickly run the "in/not-in" check for every sub-string. If the outcome is "not-in", you are safe to continue (Bloom filters do not give false negatives). If the outcome is "in", you then run your more sophisticated check to confirm (Bloom filters can give false positives).
It is my understanding that some spell-checkers rely on bloom filters to quickly test whether your latest word belongs to the dictionary of known words.
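A bare-bones sketch of such a filter follows. Everything specific in it is an assumption for illustration: the class name, the filter size of 2^20 bits, the four probes per word, and deriving the probes from two std::hash values. A real deployment would size the filter from the dictionary size and the false-positive rate you can tolerate.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
public:
    // Marks the bits for s. Elements cannot be removed afterwards.
    void add(const std::string& s) {
        for (std::size_t i = 0; i < kHashes; ++i) bits_[index(s, i)] = true;
    }

    // false: s is definitely not in the set.
    // true:  s is possibly in the set (confirm with an exact check).
    bool possibly_contains(const std::string& s) const {
        for (std::size_t i = 0; i < kHashes; ++i) {
            if (!bits_[index(s, i)]) return false;
        }
        return true;
    }

private:
    static constexpr std::size_t kBits = 1u << 20;  // arbitrary size
    static constexpr std::size_t kHashes = 4;       // arbitrary probe count

    // i-th probe position, derived from two base hashes (double hashing).
    static std::size_t index(const std::string& s, std::size_t i) {
        std::size_t h1 = std::hash<std::string>{}(s);
        std::size_t h2 = std::hash<std::string>{}(s + '#') | 1;
        return (h1 + i * h2) % kBits;
    }

    std::vector<bool> bits_ = std::vector<bool>(kBits);
};
```

Add every dictionary word once, then call possibly_contains on each candidate substring and run the exact lookup only when it returns true.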
This code was modified from How to split text without spaces into list of words?:
from math import log

words = open("english125k.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    costsum = 0
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        costsum += c
        i -= k

    return costsum
Using the same dictionary of that answer and testing your string outputs
>>> infer_spaces("hithisisastringthatmustbechecked")
294.99768817854056
The trick here is finding out what threshold you can use, keeping in mind that using smaller words makes the cost higher (if the algorithm can't find any usable word, it returns inf, since it would split everything to single-letter words).
In theory, I think you should be able to train a Markov model and use that to decide if a string is probably a sentence or probably garbage. There's another question about doing this to recognize words, not sentences: How do I determine if a random string sounds like English?
The only difference for training on sentences is that your probability tables will be a bit larger. In my experience, though, a modern desktop computer has more than enough RAM to handle Markov matrices unless you are training on the entire Library of Congress (which is unnecessary; even 5 or so books by different authors should be enough for very accurate classification).
Since your sentences are mashed together without clear word boundaries, it's a bit tricky, but the good news is that the Markov model doesn't care about words, just about what follows what. So, you can make it ignore spaces, by first stripping all spaces from your training data. If you were going to use Alice in Wonderland as your training text, the first paragraph would, perhaps, look like so:
alicewasbeginningtogetverytiredofsittingbyhersisteronthebankandofhavingnothingtodoonceortwiceshehadpeepedintothebookhersisterwasreadingbutithadnopicturesorconversationsinitandwhatistheuseofabookthoughtalicewithoutpicturesorconversation
It looks weird, but as far as a Markov model is concerned, it's a trivial difference from the classical implementation.
I see that you are concerned about time: Training may take a few minutes (assuming you have already compiled gold standard "sentences" and "random scrambled strings" texts). You only need to train once, you can easily save the "trained" model to disk and reuse it for subsequent runs by loading from disk, which may take a few seconds. Making a call on a string would take a trivially small number of floating point multiplications to get a probability, so after you finish training it, it should be very fast.
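To make the idea concrete, here is a minimal sketch of a first-order (letter pair) model. The class name, the add-one smoothing, and the use of an average log-probability as the score are my assumptions; summing logs is just the numerically safer equivalent of multiplying the probabilities. Choosing the threshold between "sentence" and "garbage" is left to experimentation on your own data.

```cpp
#include <array>
#include <cmath>
#include <string>

class BigramModel {
public:
    // Every letter pair starts with a count of 1 (add-one smoothing), so
    // pairs never seen in training still get a small non-zero probability.
    BigramModel() {
        for (auto& row : counts_) row.fill(1.0);
    }

    // Counts which letter follows which. Anything outside 'a'..'z' is
    // skipped, which has the same effect as stripping spaces first.
    void train(const std::string& text) {
        int prev = -1;
        for (char ch : text) {
            if (ch < 'a' || ch > 'z') continue;
            int cur = ch - 'a';
            if (prev >= 0) counts_[prev][cur] += 1.0;
            prev = cur;
        }
    }

    // Average log-probability per letter transition in s.
    // Closer to zero means more like the training text.
    double score(const std::string& s) const {
        double total = 0.0;
        int transitions = 0;
        int prev = -1;
        for (char ch : s) {
            if (ch < 'a' || ch > 'z') continue;
            int cur = ch - 'a';
            if (prev >= 0) {
                double row_sum = 0.0;
                for (double c : counts_[prev]) row_sum += c;
                total += std::log(counts_[prev][cur] / row_sum);
                ++transitions;
            }
            prev = cur;
        }
        return transitions > 0 ? total / transitions : 0.0;
    }

private:
    std::array<std::array<double, 26>, 26> counts_;
};
```

Train once on your reference text, then call score on each candidate string and compare against a threshold you pick from known-good and known-garbage samples.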

Having trouble grokking multiple exclusive combinations in C++

I have an ordered string that I need to present to a user:
ABCCDDCBBBCBBDDBCAAA
Objects represented by 'B' are tagged, such that 2 Bs will have a '~' after them.
AB~CCDDCB~BBCBBDDBCAAA
AB~CCDDCBB~BCBBDDBCAAA
AB~CCDDCBBB~CBBDDBCAAA
and so on...
I've used the combinations library by Howard Hinnant and it works well for this simple case. My test code uses a vector of locations as ints that was sent through for_each_combination.
However, I'm lost as to what to do when I have multiple tags for B.
For example, 4Bs total need to be tagged, 2 by '~' and 2 by '#'
AB~CCDDCB~B#B#CBBDDBCAAA
AB#CCDDCB~B~B#CBBDDBCAAA
AB~CCDDCB#B~B#CBBDDBCAAA
AB#CCDDCB#B~B~CBBDDBCAAA
ABCCDDCB~B~B#CB#BDDBCAAA
and so on...
The pseudocode I've written out is a cascade. After the first for_each_combination, for each of the resulting combinations, copy every other location to another vector and do another for_each_combination.
Considering the number of combinations I will be working with, I'm hoping there's a better way.
this sounds like homework, so first of all i'm not going to just give you code, and secondly i added the [homework] tag.
now, with four markers to place, a reasonable solution is a set of nested for loops, four of them
the markers are restricted to 7 positions, so just count from 0 to 6, inclusive
then translate that to positions in the string
then output the string with the markers
i've verified that this is simple to do
if you need further help with it, just ask a new question (and show what you've got so far)
I believe I've answered my own question after a bit of looking.
First, I switched from using Howard's library to Hervé's combinations library. The main draw is that using next_combination allows me to chain the combination calculations together, like so:
do {
    do {
        cout << values << endl;
    } while (next_combination(values.begin() + 3, values.begin() + 5, values.end()));
} while (next_combination(values.begin(), values.begin() + 3, values.end()));
I need to massage this into an iterator, but this is exactly what I need.
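For anyone without either combinations library, a different technique that needs only the standard library is to permute a string of labels, one label per B, with std::next_permutation. This is a swapped-in approach, not what the answer above uses. The function name tag_all is hypothetical, and the sketch assumes a space is never used as a real tag because it stands for "untagged".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns every way of attaching the characters in `tags` to distinct
// occurrences of `target` in `s`. Equal tags are interchangeable.
std::vector<std::string> tag_all(const std::string& s, char target,
                                 std::string tags) {
    const std::size_t slots =
        static_cast<std::size_t>(std::count(s.begin(), s.end(), target));
    if (tags.size() > slots) return {};

    // One label per occurrence of target; ' ' means "leave untagged".
    tags.append(slots - tags.size(), ' ');
    std::sort(tags.begin(), tags.end());  // start from the first permutation

    std::vector<std::string> out;
    do {
        std::string line;
        std::size_t slot = 0;
        for (char c : s) {
            line += c;
            if (c == target) {
                if (tags[slot] != ' ') line += tags[slot];
                ++slot;
            }
        }
        out.push_back(line);
    } while (std::next_permutation(tags.begin(), tags.end()));
    return out;
}
```

std::next_permutation skips duplicate arrangements of equal labels, so for the example string with 7 Bs, two '~' and two '#', this yields 7!/(3! 2! 2!) = 210 strings.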

Generating letters of the English alphabet

I have homework due that states that I need to write a program that generates the first 15 letters of the English alphabet. I can't declare and set 15 different variables or constants. The letters must be displayed in a number of columns initially set by the user. The numbers have to be aligned in columns. Can anyone help? The maximum number of columns is 7 and the minimum is 1.
Here's some pseudo-code to get you started. Read it, understand it, then try to implement it.
get numcols from user
if numcols < 1 or numcols > 7:
    print error and exit
ch = 'a'
for count = 1 to 15:
    output ch followed by space
    add 1 to ch
    if count is an integral multiple of numcols:
        output newline
    endif
endfor
if numcols is not equal to 1, 3 or 5:
    output newline
endif
It's pitched at about the level of your homework (no fancy stuff and the smallest hint of awkwardness) and should map reasonably well into C code.
As part of this implementation, you should research:
the fact that character constants like 'a' are really integers in disguise.
remainder or modulus (%) operators and how/why they are useful here.
getting user input with scanf.
putchar for outputting characters.
why you have that final if statement :-)
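If you get stuck, here is one possible shape for it in C++ rather than C. It is a sketch, not a translation of the pseudo-code: the function name letter_grid is made up, it builds a string instead of calling putchar, and it picks the separator after each letter instead of using a final if.

```cpp
#include <string>

// Returns the first `count` lowercase letters laid out in `numcols` columns.
// An out-of-range column count (outside 1..7) yields an empty string.
std::string letter_grid(int numcols, int count = 15) {
    if (numcols < 1 || numcols > 7) return "";
    std::string out;
    char ch = 'a';  // character constants are small integers, so ++ works
    for (int i = 1; i <= count; ++i) {
        out += ch++;
        // End the row after every numcols letters and after the last letter;
        // otherwise separate letters with a space.
        out += (i % numcols == 0 || i == count) ? '\n' : ' ';
    }
    return out;
}
```

Printing letter_grid(7) shows two full rows of seven letters and a last row holding only the letter o.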
Here is a hint:
ASCII code of A is 65, B is 66, C is 67 and so on. You can do it in a loop starting from 65 and going on for 15 iterations.
This can be done with two nested loops, one for the vertical and one for the horizontal. Since the letters are in sequence by value, you can increment the variable for the character each time.
I don't want to give away more than that unless another user says I should. I've already given a lot of help and I'm sure you can figure out the rest.
If you feel you need more help I'll try to not give too much but explain more.