Compare two alphanumeric strings - C++

I need to compare strings in the following way. Can anyone provide some insight or an algorithm in C++?
For example:
"a5" < "a11" - because 5 is less than 11
"6xxx" < "007asdf" - because 6 < 7
"00042Q" < "42s" - because Q < s alphabetically
"6 8" < "006 9" - because 8 < 9

I suggest you look at the algorithm strverscmp uses - indeed it might be that this function will do the job for you.
What this function does is the following. If both strings are equal,
return 0. Otherwise find the position between two bytes with the
property that before it both strings are equal, while directly after
it there is a difference. Find the largest consecutive digit strings
containing (or starting at, or ending at) this position. If one or
both of these is empty, then return what strcmp(3) would have
returned (numerical ordering of byte values). Otherwise, compare both
digit strings numerically, where digit strings with one or more
leading zeros are interpreted as if they have a decimal point in front
(so that in particular digit strings with more leading zeros come
before digit strings with fewer leading zeros). Thus, the ordering is
000, 00, 01, 010, 09, 0, 1, 9, 10.

Your examples only show digits, letters, and spaces. So for the moment I'll assume you ignore every other symbol (effectively treat them as spaces). You also seem to want to treat uppercase and lowercase letters as equivalent.
It also appears that you interpret runs of digits as a "term" and runs of letters as a "term", with any transition between a letter and a digit being equivalent to a space. A single space is considered equivalent to any number of spaces.
(Note: You are conspicuously missing an example of what to do in cases like:
"5a" vs "a11"
"a5" vs "11a"
So you have to work out what to do when you face a comparison of a numeric term with a string term. You also don't mention intrinsic equalities...such as should "5 a" == "5a" just because "5 a" < "5b"?)
One clear way of doing this would be to turn the strings into a std::vector of "terms", and then compare these vectors (rather than trying to compare the strings directly). These terms would be either numeric or string. This might help get you started, especially the STL answer:
how to split a string value that contains characters and numbers
Trickier methods that worked on the strings themselves without making an intermediary will be faster in one-off comparisons. But they'll likely be harder to understand and modify, and perhaps slower if you are going to repeatedly compare the same structures.
A nice aspect of parsing into a structure is that you get an intrinsic "cleanup" of the data in the process. Getting the information into a canonical form is often a goal in programs that are tolerating such a variety of inputs.
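As a rough illustration of the "vector of terms" idea, here is a minimal sketch. The names Term, splitTerms, compareTerms and naturalLess are made up for this example, and two choices are assumptions the question leaves open: a numeric term sorts before a word term, and "5 a" is treated as equal to "5a".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One "term": either a run of digits or a run of letters.
struct Term {
    bool numeric;      // true for a digit run
    std::string text;  // digits without leading zeros, or lowercased letters
};

// Split a string into terms; every other character only separates terms.
std::vector<Term> splitTerms(const std::string& s) {
    std::vector<Term> terms;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char ch = static_cast<unsigned char>(s[i]);
        if (std::isdigit(ch)) {
            std::size_t j = i;
            while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) ++j;
            // Drop leading zeros so that "007" and "7" compare equal.
            while (i + 1 < j && s[i] == '0') ++i;
            terms.push_back({true, s.substr(i, j - i)});
            i = j;
        } else if (std::isalpha(ch)) {
            std::string word;
            while (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) {
                word += static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
                ++i;
            }
            terms.push_back({false, word});
        } else {
            ++i;  // spaces and symbols are separators only
        }
    }
    return terms;
}

// Returns a negative, zero, or positive value, like strcmp.
int compareTerms(const Term& a, const Term& b) {
    if (a.numeric != b.numeric)
        return a.numeric ? -1 : 1;  // assumption: numbers sort before words
    if (a.numeric && a.text.size() != b.text.size())
        return a.text.size() < b.text.size() ? -1 : 1;  // fewer digits means smaller
    return a.text.compare(b.text);
}

bool naturalLess(const std::string& lhs, const std::string& rhs) {
    const std::vector<Term> a = splitTerms(lhs);
    const std::vector<Term> b = splitTerms(rhs);
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const int c = compareTerms(a[i], b[i]);
        if (c != 0) return c < 0;
    }
    return a.size() < b.size();  // a proper prefix sorts first
}
```

With these assumptions, naturalLess orders all four pairs from the question as requested. Digit runs are compared as strings (length first, then character by character), so very long numbers do not overflow an integer type.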

I'm assuming that you want the compare to be done in this order: presence of digits in range 1-9; value of digits; number of digits; value of the string after the digits.
It's in C, but you can easily transform it into using the C++ std::string class.
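#include <string.h>   /* strlen */
#include <strings.h>  /* strcasecmp (POSIX) */
/* Deliberately counts only '1'-'9', so zeros are ignored. Note that this
   reuses the name of the standard isdigit(); do not include <ctype.h>. */
int isdigit(int c)
{
return c >= '1' && c <= '9';
}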
int ndigits(const char *s)
{
int i, nd = 0;
int n = strlen(s);
for (i = 0; i < n; i++) {
if (isdigit(s[i]))
nd++;
}
return nd;
}
int compare(const char *s, const char *t)
{
int sd, td;
int i, j;
sd = ndigits(s);
td = ndigits(t);
/* presence of digits */
if (!sd && !td)
return strcasecmp(s, t);
else if (!sd)
return 1;
else if (!td)
return -1;
/* value of digits */
for (i = 0, j = 0; i < sd && j < td; i++, j++) {
while (! isdigit(*s))
s++;
while (! isdigit(*t))
t++;
if (*s != *t)
return *s - *t;
s++;
t++;
}
/* number of digits */
if (i < sd)
return 1;
else if (j < td)
return -1;
/* value of string after last digit */
return strcasecmp(s, t);
}

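Try this and read about std::string::compare:
#include <iostream>
#include <string>
using namespace std;
int main(){
std::string fred = "a5";
std::string joe = "a11";
char x;
// compare() returns a negative value, zero, or a positive value,
// so test the sign rather than treating the result as a boolean.
// Note: the comparison is lexicographic, character by character.
if ( fred.compare( joe ) < 0 )
{
std::cout << "fred is less than joe" << std::endl;
}
else
{
std::cout << "joe is less than or equal to fred" << std::endl;
}
cin >> x; // wait for input before exiting
}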

Related

How to solve Permutation of a phone number problem

The Problem:
A company is distributing phone numbers to its employees to make things easier. The only rule is that a digit cannot be equal to the digit before it; for example, 0223 is not allowed while 2023 is allowed. At least three digits will be excluded every time. Write a function that takes the length of the phone number and the digits that will be excluded. The function should print all possible phone numbers.
I got this question in an interview and I have seen one like it before at my university. It is a permutation problem. My question is what is the best way or decent way to solve this without a million for loops.
I do understand that this is technically how it works
length of phone number = 3;
[0-9], [0-9] excluding the last digit, [0-9] excluding the last digit
but I am unsure of the best way to turn this into code. Any language is accepted!
Thank you.
Also, I might be asking this in the wrong place. Please let me know if I am.
A simple way to solve this problem could be using Recursion. Here's my commented C++ code:
void solve(int depth, int size, vector <int> &curr_seq){
// If the recursion depth is equal to size, that means we've decided size
// numbers, which means that curr_seq.size() == size. In other words, we've
// decided enough numbers at this point to create a complete phone number, so
// we print it and return.
if(depth == size){
for(int item : curr_seq){
cout << item;
}
cout << "\n";
return;
}
// Try appending every possible digit to the current phone number
for(int i = 0; i <= 9; ++i){
// Make sure to only append the digit i if it is not equal to the last digit
// of the phone number. We can also append it, however, if curr_seq
// is empty (because that means that we haven't decided the 1st digit yet).
if(curr_seq.empty() || curr_seq[curr_seq.size() - 1] != i){
curr_seq.push_back(i);
solve(depth + 1, size, curr_seq);
curr_seq.pop_back();
}
}
}
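The recursive answer above does not yet use the excluded digits from the question. Here is a minimal sketch of how they might be added, collecting the numbers into a vector instead of printing them. The names extend and phoneNumbers are hypothetical, and the sketch assumes every excluded digit is in the range 0..9.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Appends every valid number that starts with `current` to `out`.
void extend(std::size_t length, const std::array<bool, 10>& excluded,
            std::string& current, std::vector<std::string>& out) {
    if (current.size() == length) {
        out.push_back(current);
        return;
    }
    for (char d = '0'; d <= '9'; ++d) {
        if (excluded[d - '0']) continue;                        // digit not allowed at all
        if (!current.empty() && current.back() == d) continue;  // would repeat the previous digit
        current.push_back(d);
        extend(length, excluded, current, out);
        current.pop_back();
    }
}

std::vector<std::string> phoneNumbers(std::size_t length,
                                      const std::vector<int>& excludedDigits) {
    std::array<bool, 10> excluded{};                  // all false
    for (int d : excludedDigits) excluded[d] = true;  // assumes each d is in 0..9
    std::vector<std::string> out;
    std::string current;
    extend(length, excluded, current, out);
    return out;
}
```

With k allowed digits and length n >= 1 this produces k * (k - 1)^(n - 1) numbers, so excluding digits 0 to 6 with length 2 gives the six numbers 78, 79, 87, 89, 97 and 98.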
I think I like the recursive solution, but you can also just generate all permutations up to the limit (iterate), filter out any with repeating digits, and print the successful candidates:
#include <iomanip>
#include <iostream>
#include <sstream>
using namespace std;
// Because C/C++ still has no integer power function.
int ipow(int base, int exp) {
int result = 1;
for (;;) {
if (exp & 1)
result *= base;
exp >>= 1;
if (!exp)
return result;
base *= base;
}
}
void noconsec(const int len) {
int lim = ipow(10, len);
// For e.g. len 4 (lim 10000),
// obviously 00xx won't work, so skip anything smaller than lim / 100.
int start = (len <= 2) ? 0 : (lim / 100);
for (int num = start;num < lim;num++) {
// Convert to string.
std::stringstream ss;
ss << std::setw(len) << std::setfill('0') << num;
std::string num_s = ss.str();
// Skip any consecutive digits.
bool is_okay = true;
auto prev_digit = num_s[0];
for (int digit_idx = 1;digit_idx < num_s.length();digit_idx++) {
auto digit = num_s[digit_idx];
if (prev_digit == digit) {
is_okay = false;
}
prev_digit = digit;
}
// Output result.
if (is_okay) {
cout << num_s << "\n";
}
}
}
int main(const int argc, const char * const argv[]) {
noconsec(4);
}
Differences to note: this needs an integer power function to compute the limit, and converting an int to a string and then checking the string is more complex than constructing the string directly. I guess it could be useful if you already have a list of integers, but mostly I did it for fun.

Given two strings S and T, determine a substring of S that has minimum difference with T

I have two strings S and T, where the length of S >= the length of T. I have to determine a substring of S which has the same length as T and has minimum difference with T. Here, the difference between two strings of the same length means the number of indexes where they differ. For example: "ABCD" and "ABCE" differ at the 3rd index, so their difference is 1.
I know I can use KMP(Knuth Morris Pratt) Pattern Searching algorithm to search T within S. But, what if S doesn't contain T as a substring? So, I have coded a brute force approach to solve this:
int main() {
string S, T;
cin >> S >> T;
int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
string ans;
for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substring of S
int CurrentDifference = 0; // and check their difference with T
for (int j = 0; j < SZ_T; j++) { // and store the substring with minimum difference
if (S[i + j] != T[j])
CurrentDifference++;
}
if (CurrentDifference < MinDifference) {
ans = S.substr (i, SZ_T);
MinDifference = CurrentDifference;
}
}
cout << ans << endl;
}
But my approach only works when S and T are short. The problem is that S and T can have lengths as large as 2 * 10^5. How can I approach this?
Let's maximize the number of characters that match instead. We can solve the problem for each character of the alphabet separately and then sum the results for each substring position. To solve the problem for a particular character, represent S and T as sequences of 0s and 1s (1 where the character occurs) and multiply them as polynomials using the FFT (https://en.wikipedia.org/wiki/Fast_Fourier_transform), with one of the two sequences reversed so that each coefficient of the product counts the matches for one alignment.
The complexity is O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).
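A rough sketch of that idea in C++, assuming uppercase letters 'A'..'Z' only and |S| >= |T| >= 1. The names fft and bestWindow are made up for illustration, and the FFT is a textbook iterative radix-2 version on std::complex<double>, not a tuned implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = 2 * pi / static_cast<double>(len) * (invert ? -1 : 1);
        const cd wlen(std::cos(angle), std::sin(angle));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k], v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert)
        for (cd& x : a) x /= static_cast<double>(n);
}

// Returns {start index in S, number of mismatches} for the best window.
std::pair<std::size_t, std::size_t> bestWindow(const std::string& S, const std::string& T) {
    const std::size_t n = S.size(), m = T.size();
    std::size_t size = 1;
    while (size < n + m) size <<= 1;
    std::vector<long long> matches(n - m + 1, 0);
    for (char ch = 'A'; ch <= 'Z'; ++ch) {
        std::vector<cd> fa(size), fb(size);
        for (std::size_t i = 0; i < n; ++i) fa[i] = (S[i] == ch) ? 1.0 : 0.0;
        // T is reversed so that the product lines up each window of S with T.
        for (std::size_t j = 0; j < m; ++j) fb[m - 1 - j] = (T[j] == ch) ? 1.0 : 0.0;
        fft(fa, false);
        fft(fb, false);
        for (std::size_t i = 0; i < size; ++i) fa[i] *= fb[i];
        fft(fa, true);
        // Coefficient i + m - 1 counts positions j with S[i + j] == T[j] == ch.
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += std::llround(fa[i + m - 1].real());
    }
    const std::size_t best = static_cast<std::size_t>(
        std::max_element(matches.begin(), matches.end()) - matches.begin());
    return {best, m - static_cast<std::size_t>(matches[best])};
}
```

The answer is then S.substr(result.first, T.size()). Skipping letters that do not occur in T would save some of the 26 passes.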

How to extract numbers used in string?

I've got a std::string number = "55353" and I want to extract the numbers that I've used in this string (5 and 3). Is there a function to do that? If so, please tell me its name; I've been searching for quite a while now and still haven't found it...
UPD:
I've solved my problem (kinda)
std::string number(std::to_string(num));
std::string mas = "---------";
int k = 0;
for (int i = 0; i < number.size(); i++) {
char check = number[i];
for (int j = 0; j < mas.size(); j++) {
if (check == mas[j])
break;
if (check != mas[j] && check != mas[j+1]) {
mas[k] = check;
k++;
break;
}
}
}
mas.resize(k); mas.shrink_to_fit();
std::string mas will contain numbers that were used in std::string number which is a number converted to std::string using std::to_string().
Try this:
std::string test_data= "55335";
char digit_to_delete = '5';
std::string::size_type position = test_data.find(digit_to_delete);
if (position != std::string::npos)
test_data.erase(position, 1);
cout << "The changed string: " << test_data << "\n";
The algorithm is to find the number (as a character) within the string. The position is then used to erase the digit in the string.
Your question looks like homework, so I can guess what you forgot to tell us.
mas starts with ten -. If you spot a 5, you should replace the 6th (!) dash with a '5'. That "6th" is just an artifact of English. C++ starts to count at zero, not one. The position for zero is mas[0], the first element of the array.
The one tricky bit is to understand that characters in a string aren't numbers. The proper term for them is "(decimal) digits". And to get their numerical value, you have to subtract '0' - the character zero. So '5' - '0' == 5 - the character five minus the character zero is the number 5.
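To make the '5' - '0' trick concrete, here is a small sketch that records which digits have been seen in an array indexed by the digit's numeric value. The function name distinctDigits is made up for this example, and returning the digits in order of first appearance is an assumption about what the question wants.

```cpp
#include <string>

// Returns each decimal digit that occurs in `number`, once,
// in order of first appearance.
std::string distinctDigits(const std::string& number) {
    bool seen[10] = {};  // seen[d] is true once digit d has been emitted
    std::string result;
    for (char ch : number) {
        if (ch < '0' || ch > '9') continue;  // ignore anything that is not a digit
        const int d = ch - '0';              // character -> numeric value
        if (!seen[d]) {
            seen[d] = true;
            result += ch;
        }
    }
    return result;
}
```

For the string from the question, distinctDigits("55353") returns "53".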

String having maximum number of given substrings made after swapping some characters?

So, this is an interview question that I was going through.
I have strings a, b, and c. I want to obtain string k by swapping some letters in a, so that k should contain as many non-overlapping substrings equal either to b or c as possible. Substring of string x is a string formed by consecutive segment of characters from x. Two substrings of string x overlap if there is position i in string x occupied by both of them.
Input: The first line contains string a, the second line contains string b, and the third line contains string c (1 ≤ |a|, |b|, |c| ≤ 10^5, where |s| denotes the length of string s).
All three strings consist only of lowercase English letters.
It is possible that b and c coincide.
Output: Find one of possible strings k.
Example:
I/P
abbbaaccca
ab
aca
O/P
ababacabcc
this optimal solutions has three non-overlaping substrings equal to either b or c on positions 1 – 2 (ab), 3 – 4 (ab), 5 – 7 (aca).
Now, the approach that I could think of was to make a character count array for each of the strings, and then proceed ahead. Basically, iterate over the original string (a), check for occurences of b and c. If not there, swap as many characters as possible to make either b or c (whichever is shorter). But, clearly this is not the optimal approach.
Can anyone suggest something better? (Only pseudocode will be enough)
Thanks!
The first thing you'll need to do is count the number of occurrences of each character in each string. The occurrence counts of a will be your knapsack, which you'll need to fill with as many b's or c's as possible.
Note that when I say knapsack I mean the character count vector of a, and inserting b to a will mean reducing the character count vector of a by the character count vector of b.
I'm a little short on a mathematical proof, but you'll need to:
1. Insert as many b as possible into the knapsack.
2. Insert as many c as possible into the knapsack (in the space left after step 1).
3. If removing one b from the knapsack enables the insertion of more c, remove that b from the knapsack. Otherwise, finish.
4. Insert as many c as you can into the knapsack.
5. Repeat steps 3-4.
Throughout the program count the number of b and c in the knapsack and the output should be:
[b_count times b][c_count times c][char_occurrence_left_in_knapsack_for_char_x times char_x for each char_x in lower_case_english]
This should solve your problem at O(n).
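The greedy steps above come without a proof. A sketch of a different, exhaustive trade-off (a swapped-in technique, not the answer's exact method): try every feasible number of copies of b, compute how many copies of c still fit, and keep the best total. The names countLetters and bestRearrangement are hypothetical, and the sketch assumes lowercase letters and non-empty b and c, as the problem statement guarantees.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

using Counts = std::array<long long, 26>;

Counts countLetters(const std::string& s) {
    Counts counts{};
    for (char ch : s) ++counts[ch - 'a'];  // assumes lowercase 'a'..'z' only
    return counts;
}

std::string bestRearrangement(const std::string& a, const std::string& b,
                              const std::string& c) {
    const Counts ca = countLetters(a), cb = countLetters(b), cc = countLetters(c);
    const long long limit = static_cast<long long>(a.size());
    long long bestB = 0, bestC = 0;
    for (long long nb = 0; nb <= limit; ++nb) {
        long long nc = limit;  // upper bound, lowered below
        bool fits = true;
        for (int ch = 0; ch < 26; ++ch) {
            // Letters of this kind left after reserving nb copies of b.
            const long long left = ca[ch] - nb * cb[ch];
            if (left < 0) { fits = false; break; }
            if (cc[ch] > 0) nc = std::min(nc, left / cc[ch]);
        }
        if (!fits) break;  // more copies of b cannot fit either
        if (nb + nc > bestB + bestC) { bestB = nb; bestC = nc; }
    }
    // Output: the copies of b, the copies of c, then the unused letters.
    std::string k;
    for (long long i = 0; i < bestB; ++i) k += b;
    for (long long i = 0; i < bestC; ++i) k += c;
    for (int ch = 0; ch < 26; ++ch) {
        const long long remaining = ca[ch] - bestB * cb[ch] - bestC * cc[ch];
        k.append(static_cast<std::size_t>(remaining), static_cast<char>('a' + ch));
    }
    return k;
}
```

For the example in the question, bestRearrangement("abbbaaccca", "ab", "aca") returns "ababacabcc", which contains two copies of ab and one of aca. The loop runs at most |a| + 1 times with 26 steps each, so it stays linear in the input size.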
Assuming that allowed characters have ASCII codes 0-127, I would write a function to count the occurrences of each character in a string:
int[] count(String s) {
int[] res = new int[128]; // Java zero-initializes new arrays
for(int i=0; i<s.length(); i++)
res[s.charAt(i)]++;
return res;
}
We can now count occurrences in each string:
int[] aCount = count(a);
int[] bCount = count(b);
int[] cCount = count(c);
We can then write a function to count how many times a string can be carved out of characters of another string:
int carveCount(int[] strCount, int[] subStrCount) {
int min = Integer.MAX_VALUE;
for(int i=0; i<subStrCount.length; i++) {
if (subStrCount[i] == 0)
continue;
// Each copy needs subStrCount[i] of character i, so divide rather than subtract.
min = Math.min(min, strCount[i] / subStrCount[i]);
}
if (min == Integer.MAX_VALUE)
return 0; // empty substring: nothing to carve
// Remove the characters used by the carved copies.
for(int i=0; i<subStrCount.length; i++) {
strCount[i] -= min * subStrCount[i];
}
return min;
}
and call the function:
int bFitCount = carveCount(aCount, bCount);
int cFitCount = carveCount(aCount, cCount);
EDIT: I didn't realize you wanted all characters originally in a, fixing here.
Finally, to produce the output:
StringBuilder sb = new StringBuilder();
for(int i=0; i<bFitCount; i++) {
sb.append(b);
}
for(int i=0; i<cFitCount; i++) {
sb.append(c);
}
for(int i=0; i<aCount.length; i++) {
for(int j=0; j<aCount[i]; j++)
sb.append((char)i);
}
return sb.toString();
One more comment: if the goal is to maximize repetitions(b)+repetitions(c), then you may want to first swap b and c if c is shorter. This way, if they share some characters, you have a better chance of increasing the result.
The algorithm could be optimized further, but as it is it should have complexity O(n), where n is the sum of the length of the three strings.
A related problem is called Knapsack problem.
This is basically the solution described by @Tal Shalti.
I tried to keep everything readable.
My program returns abbcabacac as one of the strings with the most occurrences (3).
To get all permutations without repeating a permutation, I use std::next_permutation from <algorithm>. There is not much happening in the main function. I only store the number of occurrences and the permutation if a higher number of occurrences was achieved.
int main()
{
std::string word = "abbbaaccca";
std::string patternSmall = "ab";
std::string patternLarge = "aca";
unsigned int bestOccurrence = 0;
std::string bestPermutation = "";
do {
// count and remove occurrence
unsigned int occurrences = FindOccurrences(word, patternLarge, patternSmall);
if (occurrences > bestOccurrence) {
bestOccurrence = occurrences;
bestPermutation = word;
std::cout << word << " .. " << occurrences << std::endl;
}
} while (std::next_permutation(word.begin(), word.end()));
std::cout << "Best Permutation " << bestPermutation << " with " << bestOccurrence << " occurrences." << std::endl;
return 0;
}
This function handles the basic algorithm. pattern1 is the longer pattern, so it will be searched for last. If a pattern is found, it will be replaced with the string "##", since this should be very rare in the English language.
The variable occurrenceCounter keeps track of the number of occurrences found.
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2)
{
unsigned int occurrenceCounter = 0;
std::string tmpWord(word);
// '-1' wraps to the largest size_type value, so the ++i in while() starts at 0
std::string::size_type i = -1;
while (FindPattern(tmpWord, pattern2, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern2.size(), "##");
}
i = -1;
while (FindPattern(tmpWord, pattern1, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern1.size(), "##");
}
return occurrenceCounter;
}
This function returns the first position of the found pattern. If the pattern is not found, std::string::npos is returned by string.find(...). Also string.find(...) starts to search for the pattern starting by index i.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i)
{
std::string::size_type foundPosition = word.find(pattern, i);
if (foundPosition == std::string::npos) {
return false;
}
i = foundPosition;
return true;
}

Comparing a char

So, I am trying to figure out the best/simplest way to do this. For my algorithms class we are supposed to read in a string (containing up to 40 characters) from a file and use the first character of the string (data[1]; we are starting the array at 1 and want to use data[0] for something else later) as the number of rotations (up to 26) to rotate the letters that follow (it's a Caesar cipher, basically).
An example of what we are trying to do is read in from a file something like : 2ABCD and output CDEF.
I've definitely made attempts, but I am just not sure how to compare the first letter in the array char[] to see which number, up to 26, it is. This is how I had it implemented (not the entire code, just the part that I'm having issues with):
int rotation = 0;
char data[41];
for(int i = 0; i < 41; i++)
{
data[i] = 0;
}
int j = 0;
while(!infile.eof())
{
infile >> data[j+1];
j++;
}
for(int i = 1; i < 27; i++)
{
if( i == data[1])
{
rotation = i;
cout << rotation;
}
}
My output is always 0 for rotation.
I'm sure the problem lies in the fact that I am trying to compare a char to a number and will probably have to convert to ascii? But I just wanted to ask and see if there was a better approach and get some pointers in the right direction, as I am pretty new to C++ syntax.
Thanks, as always.
Instead of formatted input, use unformatted input. Use
data[j+1] = infile.get();
instead of
infile >> data[j+1];
Also, the comparison of i to data[1] needs to be different.
for(int i = 1; i < 27; i++)
{
if( i == data[1]-'0')
// ^^^ need this to get the number 2 from the character '2'.
{
rotation = i;
std::cout << "Rotation: " << rotation << std::endl;
}
}
You can do this using modulo math, since characters can be treated as numbers.
Let's assume only uppercase letters (which makes the concept easier to understand).
Given:
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const std::string original_text = "MY DOG EATS HOMEWORK";
std::string encrypted_text;
The loop:
for (unsigned int i = 0; i < original_text.size(); ++i)
{
Let's convert the character in the string to a number:
char c = original_text[i];
unsigned int cypher_index = c - 'A';
The cypher_index now contains the alphabetic offset of the letter, e.g. 'A' has index of 0.
Next, we rotate the cypher_index by adding an offset and using modulo arithmetic to "circle around":
cypher_index += (rotation_character - 'A'); // Add in the offset.
cypher_index = cypher_index % (sizeof(letters) - 1); // Wrap around; subtract 1 to exclude the terminating '\0'.
Finally, the new, shifted letter is created by looking it up in the letters array and appending it to the encrypted string:
encrypted_text += letters[cypher_index];
} // End of for loop.
The modulo operation, using the % operator, is great for when a "wrap around" of indices is needed.
With some more arithmetic and arrays, the process can be expanded to handle all letters and also some symbols.
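Putting the modulo steps together, a minimal sketch might look like the following. The function name rotateUppercase is made up for this example; it assumes an ASCII-like character set where 'A'..'Z' are contiguous, and it copies everything that is not an uppercase letter (spaces, for instance) unchanged.

```cpp
#include <string>

// Rotates uppercase letters by `rotation` positions, wrapping from 'Z' back to 'A'.
// Every other character is copied unchanged.
std::string rotateUppercase(const std::string& text, int rotation) {
    const int alphabetSize = 26;
    // Normalise so that negative or large rotations also land in 0..25.
    const int shift = ((rotation % alphabetSize) + alphabetSize) % alphabetSize;
    std::string result;
    result.reserve(text.size());
    for (char c : text) {
        if (c >= 'A' && c <= 'Z') {
            const int index = (c - 'A' + shift) % alphabetSize;  // wrap around
            result += static_cast<char>('A' + index);
        } else {
            result += c;
        }
    }
    return result;
}
```

For the example in the question, rotateUppercase("ABCD", 2) gives "CDEF", and rotateUppercase("XYZ", 3) wraps around to "ABC".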
First of all you have to cast the data chars to int before comparing them, just put (int) before the element of the char array and you will be okay.
Second, keep in mind that the ASCII table doesn't start with letters. There are some funny symbols up until 60-so element. So when you make i to be equal to data[1] you are practically giving it a number way higher than 27 so the loop stops.
The ASCII integer values of uppercase letters range from 65 to 90. In C and its descendants, you can just use 'A' through 'Z' in your for loop:
change
for(int i = 1; i < 27; i++)
to
for(int i = 'A'; i <= 'Z'; i++)
and you'll be comparing uppercase values. The statement
cout << rotation;
will print the ASCII values read from infile.
How much of the standard library are you permitted to use? Something like this would likely work better:
#include <iostream>
#include <string>
#include <sstream>
int main()
{
int rotation = 0;
std::string data;
std::stringstream ss( "2ABCD" );
ss >> rotation;
ss >> data;
for ( int i = 0; i < data.length(); i++ ) {
data[i] += rotation;
}
// C++11
// for ( auto& c : data ) {
// c += rotation;
// }
std::cout << data;
}
I used a stringstream instead of a file stream for this example, so just replace ss with your infile. Also note that I didn't handle the wrap-around case (i.e., Z += 1 isn't going to give you A; you'll need to do some extra handling here), because I wanted to leave that to you :)
The reason your rotation is always 0 is that i is never == data[1]. ASCII character digits do not have the same underlying numeric value as the integers they represent. For example, if data[1] is '5', its integer value is actually 53. Hint: you'll need to know these values when you handle the wrap-around case. Do a quick search for "ASCII table" and you'll see all the different values.
Your determination of the rotation is also flawed in that you're only checking data[1]. What happens if you have a two-digit number, like 10?