Splitting up cipher text into substrings (Vigenère cipher) - C++

I'm trying to implement an algorithm that guesses the likely key length of a Vigenère cipher.
I'm computing the index of coincidence for each candidate key length, but I can't figure out how to split the cipher text into the substrings.
That is, I'm trying to take a certain cipher text like this
ERTEQSDFPQKCJAORIJARTARTAAFIHGNAPROEOHAGJEOIHJA (this is random text, there's no coded message here)
and split it up into different strings like this:
key length 2: ETQDP... (every second letter starting from position 0)
RESFQ... (every second letter starting from position 1)
key length 3: EEDQ.... (every third letter starting from position 0)
and so on.
Any ideas?
UPDATE
I've tried implementing my own code now, and here's what I've done:
void findKeyLength(string textToTest)
{
size_t length = textToTest.length();
vector<char> vectorTextChar;
//keeping key length to half the size of ciphertext; should be reasonable
for (size_t keylength = 1; keylength < length / 2; keylength++)
{
for (size_t i = keylength; i < keylength ; i++)
{
string subString = "";
for (size_t k = i; k < length; k+=i)
{
vectorTextChar.push_back(textToTest[k]);
}
for (vector<char>::iterator it= vectorTextChar.begin(); it!=vectorTextChar.end(); ++it)
{
subString += *it;
}
cout << subString << endl; //just to see what it looks like
cout << "Key Length : " << keylength << "IC: " << indexOfCoincidence(subString) << endl;
vectorTextChar.clear();
}
}
}
Like I've mentioned below, I'll have output which only reflects the substring that is based
on the first characters (i.e. 1, 3, 5, 7, 9 if the keylength is 2, but not 2, 4, 6, 8, 10...)

Not tested, but you could do something like this:
const std::size_t len = cipherText.length();
std::string text2inspect;
text2inspect.reserve(len / keyLength + 1);
for (std::size_t startIndex = 0; startIndex < keyLength; startIndex++) {
    text2inspect.clear(); // start a fresh substring for this offset
    for (std::size_t index = startIndex; index < len; index += keyLength) {
        text2inspect += cipherText[index]; // every keyLength-th letter
    }
    // add your inspection code here
}
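As a minimal sketch of the whole step, the splitting can be wrapped in a function that returns one substring per key position, next to one common way of computing the index of coincidence. The name splitByKeyLength is hypothetical, and this indexOfCoincidence is only an assumption about what the question's function of the same name does (its body is not shown there); it also assumes plain ASCII letters.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: column k receives every keyLength-th letter,
// starting from position k of the cipher text.
std::vector<std::string> splitByKeyLength(const std::string& cipherText,
                                          std::size_t keyLength) {
    if (keyLength == 0) return {};  // avoid a modulo by zero
    std::vector<std::string> columns(keyLength);
    for (std::size_t i = 0; i < cipherText.size(); ++i) {
        columns[i % keyLength] += cipherText[i];
    }
    return columns;
}

// Assumed definition: sum of n*(n-1) over the letter counts, divided by
// N*(N-1), where N is the number of letters. Non-letters are ignored.
double indexOfCoincidence(const std::string& text) {
    std::array<std::size_t, 26> counts{};
    std::size_t total = 0;
    for (char c : text) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        if (c < 'A' || c > 'Z') continue;
        ++counts[static_cast<std::size_t>(c - 'A')];
        ++total;
    }
    if (total < 2) return 0.0;  // not enough letters to form a pair
    double sum = 0.0;
    for (std::size_t n : counts) {
        if (n > 1) sum += static_cast<double>(n) * static_cast<double>(n - 1);
    }
    return sum / (static_cast<double>(total) * static_cast<double>(total - 1));
}
```

With the sample text from the question, splitByKeyLength(text, 2) yields substrings that begin ETQDP and RESFQ; each one can then be passed to indexOfCoincidence.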

Related

Why does my matrix print in 3 rows and multiple columns when I need the opposite? How can I also decrypt the message?

I am trying to encrypt a message by printing it in 3 columns and as many rows as needed. I also need help with reversing my encryption.
My main difficulty is printing the result. I am working with a Rail Fence Cipher whose key N dictates the number of columns; in my code I hard-coded N = 3.
My conditions are:
- The plaintext is written so that each letter's position across the columns moves right and left in a repeating cycle.
- The ciphertext is then read off in columns.
int main()
{
int num_cols, num_spaces=0 ,f=0;
string message;
// accept the message from the user
cout<<" Enter the message to encrypt : ";
getline(cin,message);
num_cols=3;
// first we need to remove any spaces that might be there in the message
for(int i = 0 ; i<message.size(); ++i)
{
if(message[i] == ' ')
{
++num_spaces;
}
}
remove(message.begin(), message.end(),' ');
message.resize(message.size() - num_spaces);
cout<<"\n The equivalent cipher text would be : "<<endl;
vector<vector<char>> matrix(3,vector<char>(message.size(),' '));
// encrypt the message
for(int i=0,j=0; i<message.size(); i++)
{
matrix[j][i] = message[i];
if(j == 3-1)
{
f=1;
}
else if(j==0)
f=0;
if(f==0)
{
j++;
}
else j--;
}
// Printing the grid
for (int i = 0; i < 3; i++)
{
for (int j = 0; j < message.size(); j++)
{
cout << matrix[i][j];
}
cout << endl;
}
return 0;
// end
}
Your first problem, getting the correct output, is easy to solve; it needs only a small change to the printing loop. The encrypted text is printed with:
// Printing the grid
for (int i = 0; i < 3; i++)
{
for (int j = 0; j < matrix[i].size(); j++)
{
if (matrix[i][j] != ' ')
std::cout << matrix[i][j];
}
}
If you want to see the rails, then please use:
std::cout << "\n\n";
for (int i = 0; i < 3; i++)
{
for (int j = 0; j < matrix[i].size(); j++)
{
std::cout << matrix[i][j];
}
std::cout << '\n';
}
So, with that, the question is answered.
Deciphering is similarly simple.
Create the rails again and walk through them in the same zigzag, for the length of the string, as before. But instead of putting a letter at each position, put a marker, for example a '*'.
Then go rail by rail (row by row) through your rails, and every time you find a marker, replace that marker with the next character from the encrypted text.
Then go zigzag through your rails again and collect the characters at the corresponding positions. The result will be the decrypted text.
I prepared a complete solution for encryption and decryption. The number of rails can be specified at the top. In my example I use 5; please change it to 3 if you want. I kept it straightforward for easier understanding. The code could be optimized further; e.g., there are a lot of repetitions that could be put into functions.
Anyway, it is just an example:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
// Please specify the number of rails:
constexpr size_t NumberOfRails = 5u;
constexpr char EmptyRailPosition = '.';
constexpr char UsedRailPosition = '*';
// Some aliases for easier writing and understanding
using Rail = std::vector<char>;
using Rails = std::vector<Rail>;
std::string clearText{ "abcd efgh ijkl mnopqrst uv wxyz" };
int main() {
// Remove all spaces from string
clearText.erase(std::remove_if(clearText.begin(), clearText.end(), ::isspace), clearText.end());
// Create the rails for the encryption
Rails encryptedRails(NumberOfRails, Rail(clearText.length(), EmptyRailPosition));
// Now encrypt
int railIndexIncrementor{ -1 };
size_t railIndex{};
// Go through all characters in the clear string
for (size_t columnIndexInClearText{}; columnIndexInClearText < clearText.length(); ++columnIndexInClearText) {
// Write character on rail
encryptedRails[railIndex][columnIndexInClearText] = clearText[columnIndexInClearText];
// If we are at the top or at the bottom of our rails, then change direction
if (0 == (columnIndexInClearText % (NumberOfRails-1)))
railIndexIncrementor = -railIndexIncrementor;
// Modify rail index. Increment or decrement. Depending on railIncrementor
railIndex = static_cast<int>(railIndex) + railIndexIncrementor;
}
// Here we will store the encrypted message
std::string encryptedMessage{};
// Show rails
std::cout << "\nRails: \n\n";
for (const Rail& rail : encryptedRails) {
for (const char c : rail) {
std::cout << c;
// Build string with encrypted message
if (c != EmptyRailPosition) encryptedMessage += c;
}
std::cout << '\n';
}
// Show encrypted message
std::cout << "\n\nEncrypted message: \n\n" << encryptedMessage << "\n\n";
// ------------------------------------------------------------------------------------
// ------------------------------------------------------------------------------------
// ------------------------------------------------------------------------------------
// Decryption
// ------------------------------------------------------------------------------------
// Build empty rails
Rails decryptedRails(NumberOfRails, Rail(clearText.length(), '.'));
railIndexIncrementor = -1;
railIndex = 0;
// Go through all necessary positions in the decryption rails
for (size_t columnIndexInEncryptedText{}; columnIndexInEncryptedText < encryptedMessage.length(); ++columnIndexInEncryptedText) {
// Write marker on rail
decryptedRails[railIndex][columnIndexInEncryptedText] = UsedRailPosition;
// If we are at the top or at the bottom of our rails, then change direction
if (0 == (columnIndexInEncryptedText % (NumberOfRails - 1)))
railIndexIncrementor = -railIndexIncrementor;
// Modify rail index. Increment or decrement. Depending on railIncrementor
railIndex = static_cast<int>(railIndex) + railIndexIncrementor;
}
// Show rails with markers
std::cout << "\n\nRails with markers: \n\n";
for (const Rail& rail : decryptedRails) {
for (const char c : rail) std::cout << c;
std::cout << '\n';
}
// Now exchange the markers with the characters of the encrypted string
size_t indexInEncryptedMessage{};
// Now, put a character from the string at positions with marker
for (Rail& rail : decryptedRails)
for (char& c : rail)
if (c == UsedRailPosition) c = encryptedMessage[indexInEncryptedMessage++];
// Show rails with replaced markers
std::cout << "\nRails with replaced markers, so decrypted rails:\n\n";
for (const Rail& rail : decryptedRails) {
for (const char c : rail) std::cout << c;
std::cout << '\n';
}
// Read back string from rails
railIndexIncrementor = -1;
railIndex = 0;
std::string decrytpedMessage{};
// Go through all necessary positions in the decryption rails
for (size_t columnIndexInEncryptedText{}; columnIndexInEncryptedText < encryptedMessage.length(); ++columnIndexInEncryptedText) {
// Read character from rail
decrytpedMessage += decryptedRails[railIndex][columnIndexInEncryptedText];
// If we are at the top or at the bottom of our rails, then change direction
if (0 == (columnIndexInEncryptedText % (NumberOfRails - 1)))
railIndexIncrementor = -railIndexIncrementor;
// Modify rail index. Increment or decrement. Depending on railIncrementor
railIndex = static_cast<int>(railIndex) + railIndexIncrementor;
}
// and, show decrypted message
std::cout << "\n\nDecrypted message:\n\n" << decrytpedMessage << "\n\n";
return 0;
}
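The answer notes that the repeated zigzag walk could be put into functions. A minimal sketch of that idea follows; the names zigzagRails, railEncrypt and railDecrypt are hypothetical, and the sketch assumes spaces have already been removed from the text. Instead of storing a full grid, it records which rail each column falls on and reuses that for both directions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the rail (row) visited at each column of the zigzag.
std::vector<std::size_t> zigzagRails(std::size_t length, std::size_t numRails) {
    std::vector<std::size_t> rails(length, 0);
    if (numRails < 2) return rails;  // fewer than two rails: no zigzag
    std::size_t rail = 0;
    bool goingDown = true;
    for (std::size_t col = 0; col < length; ++col) {
        rails[col] = rail;
        if (rail == 0) goingDown = true;                   // bounce off the top rail
        else if (rail == numRails - 1) goingDown = false;  // bounce off the bottom rail
        rail = goingDown ? rail + 1 : rail - 1;
    }
    return rails;
}

// Read the characters rail by rail to build the cipher text.
std::string railEncrypt(const std::string& clearText, std::size_t numRails) {
    if (numRails < 2) return clearText;
    const std::vector<std::size_t> rails = zigzagRails(clearText.size(), numRails);
    std::string cipher;
    for (std::size_t r = 0; r < numRails; ++r)
        for (std::size_t col = 0; col < rails.size(); ++col)
            if (rails[col] == r) cipher += clearText[col];
    return cipher;
}

// Visit the same positions rail by rail and drop the next cipher
// character into each one; the columns then spell the clear text.
std::string railDecrypt(const std::string& cipherText, std::size_t numRails) {
    if (numRails < 2) return cipherText;
    const std::vector<std::size_t> rails = zigzagRails(cipherText.size(), numRails);
    std::string clear(cipherText.size(), ' ');
    std::size_t next = 0;
    for (std::size_t r = 0; r < numRails; ++r)
        for (std::size_t col = 0; col < rails.size(); ++col)
            if (rails[col] == r) clear[col] = cipherText[next++];
    return clear;
}
```

Calling railDecrypt(railEncrypt(text, n), n) should give back text for any n, which makes a convenient self-check.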

Why is my brute force substring search returning extra counts?

I'm timing different algorithms, but my brute-force implementation, which I have found on numerous sites, sometimes returns more results than, say, Notepad++ search or VSCode search. I'm not sure what I am doing wrong.
The program opens a txt file with a DNA strand string of length 10000000 and searches and counts the number of occurrences of the string passed in via command line.
Algorithm:
int main(int argc, char *argv[]) {
// read in dna strand
ifstream file("dna.txt");
string dna((istreambuf_iterator<char>(file)), istreambuf_iterator<char>());
dna.c_str();
int dnaLength = dna.length();
cout << "DNA Strand Length: " << dnaLength << endl;
string pat = argv[1];
cout << "Pattern: " << pat << endl;
// algorithm
int M = pat.length();
int N = dnaLength;
int localCount = 0;
for (int i = 0; i <= N - M; i++) {
int j;
for (j = 0; j < M; j++) {
if (dna.at(i + j) != pat.at(j)) {
break;
}
}
if (j == M) {
localCount++;
}
}
The difference might be because your algorithm also counts overlapping results, while a quick check with Notepad++ shows that it does not.
Example:
Let dna be "FooFooFooFoo"
And your pattern "FooFoo"
What result do you expect? Notepad++ shows 2 (one starts at position 1, the second at position 7, after the first).
Your algorithm will find 3 (positions 1, 4 and 7).
In your algorithm, the index i increases by 1 on every iteration, so overlapping occurrences of the pattern are all counted. For example, search for ABAB in the text ... ABABABABABAB ...: your method finds it 5 times, but only 3 times if no character may be part of more than one match. Which answer do you want?
To avoid double counting, you can rewrite the loop over i as a while loop:
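int i = 0;
while (i <= N - M) // stop once the pattern no longer fits in the rest of dna
{
    int j;
    for (j = 0; j < M; j++) {
        if (dna.at(i + j) != pat.at(j)) {
            break;
        }
    }
    if (j == M) {
        localCount++;
        i += M; // skip past the whole match so the next one cannot overlap it
    }
    else ++i;
}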
Or, you can use std::string::find(const string& str, size_type pos = 0). The first argument is the pattern to look for, and the second is the position at which to start searching:
std::string::size_type pos = dna.find(pat); // initial search starts from position 0
int count = 0;
while (pos != std::string::npos) { // until no further match is found
    ++count;
    pos = dna.find(pat, pos + M); // resume the search just past this match
}
Running both methods and comparing their counts gives a useful cross-check.
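To make the overlapping versus non-overlapping choice explicit, here is a minimal sketch of both approaches as functions with a flag. The names countBruteForce and countWithFind and the allowOverlap parameter are hypothetical; an empty pattern is treated as matching nowhere, which is an assumption rather than anything the question specifies.

```cpp
#include <cstddef>
#include <string>

// Brute-force scan. After a match, advance by 1 (overlaps counted)
// or by the pattern length (overlaps skipped).
std::size_t countBruteForce(const std::string& text, const std::string& pat,
                            bool allowOverlap) {
    if (pat.empty() || pat.size() > text.size()) return 0;
    std::size_t count = 0;
    std::size_t i = 0;
    while (i + pat.size() <= text.size()) {
        std::size_t j = 0;
        while (j < pat.size() && text[i + j] == pat[j]) ++j;
        if (j == pat.size()) {
            ++count;
            i += allowOverlap ? 1 : pat.size();
        } else {
            ++i;
        }
    }
    return count;
}

// Same count using std::string::find, resuming after each hit.
std::size_t countWithFind(const std::string& text, const std::string& pat,
                          bool allowOverlap) {
    if (pat.empty()) return 0;
    const std::size_t step = allowOverlap ? 1 : pat.size();
    std::size_t count = 0;
    for (std::size_t pos = text.find(pat); pos != std::string::npos;
         pos = text.find(pat, pos + step)) {
        ++count;
    }
    return count;
}
```

For "FooFooFooFoo" and the pattern "FooFoo", both functions give 3 with allowOverlap set to true and 2 with it set to false, which matches the difference seen against Notepad++.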

Search a string for all occurrences of a substring in C++

Write a function countMatches that searches the substring in the given string and returns how many times the substring appears in the string.
I've been stuck on this awhile now (6+ hours) and would really appreciate any help I can get. I would really like to understand this better.
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < small; i++) {
// Output substring stored in string
for (int j = 0; j < large; j++) {
if (comp.substr(i, small) == str.substr(j, large)) {
count++;
}
}
}
cout << count << endl;
return count;
}
When I call this function from main, with countMatches("Hello", "Hello"); I get the output of 5. Which is completely wrong as it should return 1. I just want to know what I'm doing wrong here so I don't repeat the mistake and actually understand what I am doing.
I figured it out. I did not need a nested for loop because I was only comparing the secondary string to that of the string. It also removed the need to take the substring of the first string. SOOO... For those interested, it should have looked like this:
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < large; i++) {
// Output substring stored in string
if (comp == str.substr(i, small)) {
count++;
}
}
cout << count << endl;
return count;
}
The usual approach is to search in place, with str and comp as in the question:
std::string::size_type pos = 0;
int count = 0;
for (;;) {
    pos = str.find(comp, pos);
    if (pos == std::string::npos)
        break;
    ++count;
    ++pos;
}
That can be tweaked if you don't want overlapping matches. Looking for all occurrences of "ll" in the string "llll", the answer could be 3, which the above algorithm gives, or it could be 2 if the next match is not allowed to overlap the previous one. For that, just change ++pos to pos += comp.size() to resume the search after the entire preceding match.
The problem with your function is that you are checking whether:
Hello equals Hello
ello equals ello
llo equals llo
...
Of course this matches 5 times in this case.
What you really need is:
For each position i of str,
check whether the substring of str starting at i with length comp.size() is exactly comp.
The following code should do exactly that:
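size_t countMatches(const string& str, const string& comp)
{
    // Nothing can match if comp is empty or longer than str; this also
    // keeps the unsigned subtraction below from wrapping around.
    if (comp.empty() || comp.size() > str.size())
        return 0;
    size_t count = 0;
    for (size_t j = 0; j < str.size() - comp.size() + 1; j++)
        if (comp == str.substr(j, comp.size()))
            count++;
    return count;
}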
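A possible variant, sketched under the same assumptions, compares in place with std::string::compare instead of building a temporary string with substr at every position. The name countMatchesInPlace is hypothetical, and returning 0 for an empty comp is a choice made here, not something the exercise states.

```cpp
#include <cstddef>
#include <string>

// Count (possibly overlapping) occurrences of comp in str without
// allocating a temporary substring for each candidate position.
std::size_t countMatchesInPlace(const std::string& str, const std::string& comp) {
    if (comp.empty() || comp.size() > str.size()) return 0;
    std::size_t count = 0;
    for (std::size_t j = 0; j + comp.size() <= str.size(); ++j) {
        // compare(pos, len, other) returns 0 when str[pos, pos+len) equals other
        if (str.compare(j, comp.size(), comp) == 0) ++count;
    }
    return count;
}
```

With this version, countMatchesInPlace("Hello", "Hello") gives 1 and countMatchesInPlace("llll", "ll") gives 3.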

Need help understanding a word jumble for loop

This code is part of a program that jumbles a word. I need help understanding how the for loop is working and creating the jumbled word. For example if theWord = "apple" the output would be something like: plpea. So I want to know whats going on in the for loop to make this output.
std::string jumble = theWord;
int length = theWord.size();
for (int i = 0; i < length; i++)
{
int index1 = (rand() % length);
int index2 = (rand() % length);
char temp = jumble[index1];
jumble[index1] = jumble[index2];
jumble[index2] = temp;
}
std::cout << jumble << std::endl;
I'll add comments on each line of the for loop:
for (int i = 0; i < length; i++) // runs once per character in the string
{
    int index1 = (rand() % length); // random index from 0 to length - 1
    int index2 = (rand() % length); // a second random index, chosen the same way
    char temp = jumble[index1]; // save the character at the first index
    jumble[index1] = jumble[index2]; // copy the character at the second index into the first
    jumble[index2] = temp; // put the saved character into the second index
    // The last three lines swap two characters
}
You can think of it like this: once for each character in the string, swap two randomly chosen characters in the string.
Also, % (the modulus operator) gives the remainder of a division, so rand() % length is always between 0 and length - 1.
It's also important to understand that myString[index] returns the character at that index. Ex: "Hello world"[1] == 'e'
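For comparison, here is a sketch of the same idea using the standard <random> facilities. jumbleBySwaps mirrors the loop from the question with std::swap, while jumbleWord swaps in a different technique, std::shuffle, which is not what the original program does. Both function names and the seed parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <utility>

// Same scheme as the question's loop: one random swap per character.
std::string jumbleBySwaps(std::string word, unsigned seed) {
    if (word.empty()) return word;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, word.size() - 1);
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::swap(word[pick(rng)], word[pick(rng)]);  // exchange two random positions
    }
    return word;
}

// Alternative technique: let std::shuffle permute the characters.
std::string jumbleWord(std::string word, unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(word.begin(), word.end(), rng);
    return word;
}
```

Either way the result contains exactly the letters of the input, only in a different (possibly identical) order.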

Longest common substring from more than two strings - C++

I need to compute the longest common substrings from a set of filenames in C++.
Precisely, I have a std::list of std::strings (or the Qt equivalent, also fine):
char const *x[] = {"FirstFileWord.xls", "SecondFileBlue.xls", "ThirdFileWhite.xls", "ForthFileGreen.xls"};
std::list<std::string> files(x, x + sizeof(x) / sizeof(*x));
I need to compute the n distinct longest common substrings of all strings, in this case e.g. for n=2
"File" and ".xls"
If I could compute the longest common substring, I could cut it out and run the algorithm again to get the second longest, so essentially this boils down to:
Is there a (reference?) implementation for computing the LCS of a std::list of std::strings?
This is not a good answer but a dirty solution that I have: brute force on a QList of QUrls, from which only the part after the last "/" is taken. I'd love to replace this with "proper" code.
(I have discovered http://www.icir.org/christian/libstree/, which would help greatly, but I can't get it to compile on my machine. Has anyone used it?)
QString SubstringMatching::getMatchPattern(QList<QUrl> urls)
{
QString a;
int foundPosition = -1;
int foundLength = -1;
for (int i=urls.first().toString().lastIndexOf("/")+1; i<urls.first().toString().length(); i++)
{
bool hit=true;
int xj;
for (int j=0; j<urls.first().toString().length()-i+1; j++ ) // try to match from position i up to the end of the string :: test character at pos. (i+j)
{
if (!hit) break;
QString firstString = urls.first().toString().right( urls.first().toString().length()-i ).left( j ); // this needs to match all k strings
//qDebug() << "SEARCH " << firstString;
for (int k=1; k<urls.length(); k++) // test all other strings, k = test string number
{
if (!hit) break;
//qDebug() << " IN " << urls.at(k).toString().right(urls.at(k).toString().length() - urls.at(k).toString().lastIndexOf("/")+1);
//qDebug() << " RES " << urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1);
if (urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1)<0) {
xj = j;
//qDebug() << "HIT LENGTH " << xj-1 << " : " << firstString;
hit = false;
}
}
}
if (hit) xj = urls.first().toString().length()-i+1; // hit up to the end of the string
if ((xj-2)>foundLength) // have longer match than existing, j=1 is match length
{
foundPosition = i; // at the current position
foundLength = xj-1;
//qDebug() << "Found at " << i << " length " << foundLength;
}
}
a = urls.first().toString().right( urls.first().toString().length()-foundPosition ).left( foundLength );
//qDebug() << a;
return a;
}
If as you say suffix trees are too heavyweight or otherwise impractical, the following
fairly simple brute-force approach may be adequate for your application.
I assume distinct substrings shall be non-overlapping and are picked from
left to right.
Even with these assumptions, there need not be a unique set that comprises
"the N distinct longest common substrings" of a set of strings. Whatever N is,
there might be more than N distinct common substrings all of the same maximal
length and any choice of N from among them would be arbitrary. Accordingly
the solution finds the at-most N *sets* of the longest distinct common
substrings in which all those of the same length are one set.
The algorithm is as follows:
Q is the target quota of lengths.
Strings is the problem set of strings.
Results is an initially empty multimap that maps a length to a set of strings, Results[l] being the set with length l.
N, initially 0, is the number of distinct lengths represented in Results.
1. If Q is 0 or Strings is empty, return Results.
2. Find any shortest member of Strings; keep a copy of it, S, and remove it from Strings. We proceed by comparing the substrings of S with those of Strings because all the common substrings of {Strings, S} must be substrings of S.
3. Iteratively generate all the substrings of S, longest first, using the obvious nested loop controlled by offset and length. For each substring ss of S:
3.1. If ss is not a common substring of Strings, next.
3.2. Iterate over Results[l] for l >= the length of ss until the end of Results or until ss is found to be a substring of the examined result. In the latter case, ss is not distinct from a result already in hand, so next.
3.3. ss is a common substring distinct from any already in hand. Iterate over Results[l] for l < the length of ss, deleting each result that is a substring of ss, because all those are shorter than ss and not distinct from it. ss is now a common substring distinct from any already in hand and all others that remain in hand are distinct from ss.
3.4. For l = the length of ss, check whether Results[l] exists, i.e. if there are any results in hand the same length as ss. If not, call that a NewLength condition.
3.5. Check also if N == Q, i.e. we have already reached the target quota of distinct lengths. If NewLength obtains and also N == Q, call that a StickOrRaise condition.
3.6. If StickOrRaise obtains, then compare the length of ss with l = the length of the shortest results in hand. If ss is shorter than l then it is too short for our quota, so next. If ss is longer than l then all the shortest results in hand are to be ousted in favour of ss, so delete Results[l] and decrement N.
3.7. Insert ss into Results keyed by its length.
3.8. If NewLength obtains, increment N.
3.9. Abandon the inner iteration over substrings of S that have the same offset as ss but are shorter, because none of them are distinct from ss.
3.10. Advance the offset in S for the outer iteration by the length of ss, to the start of the next non-overlapping substring.
4. Return Results.
Here is a program that implements the solution and demonstrates it with
a list of strings:
#include <list>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
// Get a non-const iterator to the shortest string in a list
list<string>::iterator shortest_of(list<string> & strings)
{
auto where = strings.end();
size_t min_len = size_t(-1);
for (auto i = strings.begin(); i != strings.end(); ++i) {
if (i->size() < min_len) {
where = i;
min_len = i->size();
}
}
return where;
}
// Say whether a string is a common substring of a list of strings
bool
is_common_substring_of(
string const & candidate, list<string> const & strings)
{
for (string const & s : strings) {
if (s.find(candidate) == string::npos) {
return false;
}
}
return true;
}
/* Get a multimap whose keys are the at-most `quota` greatest
lengths of common substrings of the list of strings `strings`, each key
multi-mapped to the set of common substrings of that length.
*/
multimap<size_t,string>
n_longest_common_substring_sets(list<string> & strings, unsigned quota)
{
size_t nlengths = 0;
multimap<size_t,string> results;
if (quota == 0) {
return results;
}
auto shortest_i = shortest_of(strings);
if (shortest_i == strings.end()) {
return results;
}
string shortest = *shortest_i;
strings.erase(shortest_i);
for ( size_t start = 0; start < shortest.size();) {
size_t skip = 1;
for (size_t len = shortest.size(); len > 0; --len) {
string subs = shortest.substr(start,len);
if (!is_common_substring_of(subs,strings)) {
continue;
}
auto i = results.lower_bound(subs.size());
for ( ;i != results.end() &&
i->second.find(subs) == string::npos; ++i) {}
if (i != results.end()) {
continue;
}
for (i = results.begin();
i != results.end() && i->first < subs.size(); ) {
if (subs.find(i->second) != string::npos) {
i = results.erase(i);
} else {
++i;
}
}
auto hint = results.lower_bound(subs.size());
bool new_len = hint == results.end() || hint->first != subs.size();
if (new_len && nlengths == quota) {
size_t min_len = results.begin()->first;
if (min_len > subs.size()) {
continue;
}
results.erase(min_len);
--nlengths;
}
nlengths += new_len;
results.emplace_hint(hint,subs.size(),subs);
len = 1;
skip = subs.size();
}
start += skip;
}
return results;
}
// Testing ...
int main()
{
list<string> strings{
"OfBitWordFirstFileWordZ.xls",
"SecondZWordBitWordOfFileBlue.xls",
"ThirdFileZBitWordWhiteOfWord.xls",
"WordFourthWordFileBitGreenZOf.xls"};
auto results = n_longest_common_substring_sets(strings,4);
for (auto const & val : results) {
cout << "length: " << val.first
<< ", substring: " << val.second << endl;
}
return 0;
}
Output:
length: 1, substring: Z
length: 2, substring: Of
length: 3, substring: Bit
length: 4, substring: .xls
length: 4, substring: File
length: 4, substring: Word
(Built with gcc 4.8.1)
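For the simplest case, where only a single longest common substring is wanted, a much shorter brute-force sketch is possible. It takes a std::vector rather than the std::list from the question, the name longestCommonSubstring is hypothetical, and when several substrings tie for the longest length it returns the leftmost one in the shortest input string, which is an arbitrary choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Try every substring of the shortest string, longest first, and return
// the first one that occurs in all the strings.
std::string longestCommonSubstring(const std::vector<std::string>& strings) {
    if (strings.empty()) return {};
    // Any common substring must be a substring of the shortest string.
    const std::string& base = *std::min_element(
        strings.begin(), strings.end(),
        [](const std::string& a, const std::string& b) { return a.size() < b.size(); });
    for (std::size_t len = base.size(); len > 0; --len) {
        for (std::size_t start = 0; start + len <= base.size(); ++start) {
            const std::string candidate = base.substr(start, len);
            const bool common = std::all_of(
                strings.begin(), strings.end(),
                [&candidate](const std::string& s) {
                    return s.find(candidate) != std::string::npos;
                });
            if (common) return candidate;
        }
    }
    return {};  // no character is shared by all strings
}
```

For the four filenames in the question this returns "File"; ".xls" has the same length but starts further to the right in the shortest name.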