Roman Numerals to Arabic (vinculum) - reading characters in a string - c++

I am currently working on a project that converts Roman numerals to Arabic numbers and vice versa.
I am also required to implement concepts like the vinculum: if you put a bar on top of a Roman numeral, the numerals below it are multiplied by 1,000.
The problem I am having is that I can get only one side working, meaning:
I can convert from Roman numerals to Arabic without the vinculum:
ex. I = 1, II = 2
However, when this works, my vinculum code does not work.
Here is a snippet of my code:
int romanToDecimal(char input[], size_t end) {
int roman = 0;
int vroman = 0;
for (int i = 0; i < strlen(input); ++i)
{
int s1 = value(input[i]);
int s2 = value(input[i]);
if (input[i] == '-')
{
for (int j = i - 1; j >= 0; --j)
{
roman = (roman + value(input[j]));
}
roman *= 1000;
for (int k = i + 1; k <= strlen(input); k++)
roman += value(input[k]);
}
else
roman += s1;
}
return roman;
}
We use '-' instead of a bar on top of the characters, because an overbar cannot easily be typed on a computer. So IV- would be 4000, XI- would be 11,000, etc.
I understand that the way I am doing the loop causes some converted numbers to be added twice, because if (input[i] == '-') is tested for each character in the string, one at a time.
OK, so my question is basically: what is the logic to get this to work? If the string contains '-', the number before it should be multiplied by 1000; if the string does not contain '-' at all, it should just convert as normal. Right now I believe that when "if (input[i] == '-')" is false, the else branch still runs and adds the value. How do I keep it from running at all when the string contains '-'?

The posted code seems incomplete, or at least has some unused variables (like end, which, if it represents the length of the string, could be used in place of the repeated strlen(input) calls) and some meaningless ones (like s2).
I can't understand the logic behind your "Vinculum" implementation, but the simple
roman += s1; // Where s1 = value(input[i]);
is clearly not enough to parse a Roman numeral, where the relative position of each symbol is important. Consider e.g. "IV", which is 4 (= 5 - 1), vs. "VI", which is 6 (= 5 + 1).
To parse the "subtractive" notation, you could store a partial result and compare the current digit to the previous one. Something like the following:
#include <stdio.h>
#include <string.h>
int value_of(char ch);
long decimal_from_roman(char const *str, size_t length)
{
long number = 0, partial = 0;
int value = 0, last_value = 0;
for (size_t i = 0; i < length; ++i)
{
if (str[i] == '-')
{
number += partial;
number *= 1000;
partial = 0;
continue;
}
last_value = value;
value = value_of(str[i]);
if (value == 0)
{
fprintf(stderr, "Wrong format.\n");
return 0;
}
if (value > last_value)
{
partial = value - partial;
}
else if (value < last_value)
{
number += partial;
partial = value;
}
else
{
partial += value;
}
}
return number + partial;
}
int main(void)
{
char const *tests[] = {
"I", "L", "XXX", "VI", "IV", "XIV", "XXIII-",
"MCM", "MCMXII", "CCXLVI", "DCCLXXXIX", "MMCDXXI", // 1900, 1912, 246, 789, 2421
"CLX", "CCVII", "MIX", "MLXVI" // 160, 207, 1009, 1066
};
int n_samples = sizeof(tests) / sizeof(*tests);
for (int i = 0; i < n_samples; ++i)
{
long number = decimal_from_roman(tests[i], strlen(tests[i]));
printf("%12ld %s\n", number, tests[i]);
}
return 0;
}
int value_of(char ch)
{
switch (ch)
{
case 'I':
return 1;
case 'V':
return 5;
case 'X':
return 10;
case 'L':
return 50;
case 'C':
return 100;
case 'D':
return 500;
case 'M':
return 1000;
default:
return 0;
}
}
Note that the previous code only checks for wrong characters, but doesn't discard strings like "MMMMMMMMMMIIIIIIIIIIIIIV". Consider it just a starting point and feel free to improve it.
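One possible way to reject malformed strings like the one above is to convert the parsed value back to its canonical spelling and compare it with the input. The following is only a sketch under assumptions: the helper names roman_from_decimal and is_canonical_roman are hypothetical, the range is limited to 1..3999, and the '-' vinculum notation is not handled.

```cpp
#include <string>

// Hypothetical helper: canonical Roman numeral for 1..3999 (no vinculum).
std::string roman_from_decimal(int n)
{
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *symbols[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                    "XL", "X", "IX", "V", "IV", "I"};
    std::string out;
    for (int i = 0; i < 13; ++i) {
        // Greedily take the largest symbol that still fits.
        while (n >= values[i]) {
            out += symbols[i];
            n -= values[i];
        }
    }
    return out;
}

// Hypothetical check: accept `text` only if it is exactly the canonical
// spelling of the value that the parser produced for it.
bool is_canonical_roman(const std::string &text, long parsed)
{
    if (parsed < 1 || parsed > 3999)
        return false; // outside the range this sketch supports
    return roman_from_decimal(static_cast<int>(parsed)) == text;
}
```

A caller would pass the original string together with the result of decimal_from_roman; a string such as "IIII" parses to 4 but is rejected because the canonical form of 4 is "IV".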

Run-length decompression using C++

I have a text file with a string which I encoded.
Let's say it is: aaahhhhiii kkkjjhh ikl wwwwwweeeett
Here the code for encoding, which works perfectly fine:
void Encode(std::string &inputstring, std::string &outputstring)
{
for (int i = 0; i < inputstring.length(); i++) {
int count = 1;
while (inputstring[i] == inputstring[i+1]) {
count++;
i++;
}
if(count <= 1) {
outputstring += inputstring[i];
} else {
outputstring += std::to_string(count);
outputstring += inputstring[i];
}
}
}
Output is as expected: 3a4h3i 3k2j2h ikl 6w4e2t
Now, I'd like to decompress the output - back to the original.
And I have been struggling with this for a couple of days now.
My idea so far:
void Decompress(std::string &compressed, std::string &original)
{
char currentChar = 0;
auto n = compressed.length();
for(int i = 0; i < n; i++) {
currentChar = compressed[i++];
if(compressed[i] <= 1) {
original += compressed[i];
} else if (isalpha(currentChar)) {
//
} else {
//
int number = isnumber(currentChar).....
original += number;
}
}
}
I know my Decompress function seems a bit messy, but I am pretty lost with this one.
Sorry for that.
Maybe there is someone out there at stackoverflow who would like to help a lost and beginner soul.
Thanks for any help, I appreciate it.
Assuming input strings cannot contain digits (your encoding cannot cover them: for example, both "3a" and "aaa" would be encoded as "3a", so how would you ever decode that again?), you can decompress as follows:
unsigned int num = 0;
for(auto c : compressed)
{
if(std::isdigit(static_cast<unsigned char>(c)))
{
num = num * 10 + c - '0';
}
else
{
num += num == 0; // no count was read, so the character occurs once
for(; num > 0; --num) // leaves num at 0, ready for the next run
{
original += c;
}
}
}
Untested code, though...
Characters in a string actually are only numerical values, though. You can consider char (or signed char, unsigned char) as ordinary 8-bit integers as well. And you can store a numerical value in such a byte, too. Usually, you do run length encoding exactly that way: Count up to 255 equal characters, store the count in a single byte and the character in another byte. One single "a" would then be encoded as 0x01 0x61 (the latter being the ASCII value of a), "aa" would get 0x02 0x61, and so on. If you have to store more than 255 equal characters you store two pairs: 0xff 0x61, 0x07 0x61 for a string containing 262 times the character a... Decoding then gets trivial: you read characters pairwise, first byte you interpret as number, second one as character – rest being trivial. And you nicely cover digits that way as well.
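The byte-pair scheme described above could be sketched as follows. This is only an illustration under assumptions: the function names rle_encode and rle_decode are made up here, the count is stored in one byte (1..255), and longer runs are split into several pairs.

```cpp
#include <cstddef>
#include <string>

// Encode each run as two bytes: (count, character), with count in 1..255.
std::string rle_encode(const std::string &input)
{
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t run = 1;
        // Extend the run while the next character matches and the count fits in a byte.
        while (i + run < input.size() && input[i + run] == input[i] && run < 255)
            ++run;
        out += static_cast<char>(static_cast<unsigned char>(run));
        out += input[i];
        i += run;
    }
    return out;
}

// Decode by reading the bytes pairwise: first the count, then the character.
std::string rle_decode(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < encoded.size(); i += 2) {
        std::size_t count = static_cast<unsigned char>(encoded[i]);
        out.append(count, encoded[i + 1]);
    }
    return out;
}
```

Because the count is a raw byte rather than a printed digit, input strings that themselves contain digits round-trip without ambiguity. The encoded string is binary data, so it should not be treated as printable text.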
#include <cctype>
#include <iostream>
#include <string>
void Encode(std::string& inputstring, std::string& outputstring)
{
for (unsigned int i = 0; i < inputstring.length(); i++) {
int count = 1;
while (inputstring[i] == inputstring[i + 1]) {
count++;
i++;
}
if (count <= 1) {
outputstring += inputstring[i];
}
else {
outputstring += std::to_string(count);
outputstring += inputstring[i];
}
}
}
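bool alpha_or_space(const char c)
{
// cast first: passing a negative char value to isalpha is undefined behaviour
return isalpha(static_cast<unsigned char>(c)) || c == ' ';
}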
void Decompress(std::string& compressed, std::string& original)
{
size_t i = 0;
size_t repeat;
while (i < compressed.length())
{
// normal alpha characters
while (alpha_or_space(compressed[i]))
original.push_back(compressed[i++]);
// repeat number
repeat = 0;
while (isdigit(compressed[i]))
repeat = 10 * repeat + (compressed[i++] - '0');
// unroll repeated characters
auto char_to_unroll = compressed[i++];
while (repeat--)
original.push_back(char_to_unroll);
}
}
int main()
{
std::string deco, outp, inp = "aaahhhhiii kkkjjhh ikl wwwwwweeeett";
Encode(inp, outp);
Decompress(outp, deco);
std::cout << inp << std::endl << outp << std::endl<< deco;
return 0;
}
The decompression can't possibly work in an unambiguous way because you didn't define a sentinel character; i.e. given the compressed stream, it's impossible to determine whether a digit is an original character or part of an RLE repeat count. I would suggest using '0' as the sentinel char. While encoding, if you see '0' you just output 010. Any other char X will translate to 0NX, where N is the repeat count stored in one byte. If a run goes over 255, just output a new RLE repeat command.

What's wrong with my dynamic programming algorithm with memoization?

*Sorry about my poor English. If there is anything that you don't understand, please tell me so that I can give you more information that makes sense.
**This is my first time asking a question on Stack Overflow. I've looked up the rules for asking questions correctly here, but there may be something I missed. I welcome all feedback.
I'm currently solving algorithm problems to improve my skills, and I've been struggling with one question for three days. This question is from https://algospot.com/judge/problem/read/RESTORE , but since this page is in KOREAN, I tried to translate it into English.
Question
Given 'k' partial strings, find the shortest string that includes all of them.
All strings consist only of lowercase letters.
If more than one string of the same minimal length satisfies all conditions, choose any of them.
Input
The first line of input gives the number of test cases 'C' (C<=50).
For each test case, the first line gives the number of partial strings 'k' (1<=k<=15), and the next k lines give the partial strings.
The length of each partial string is between 1 and 40.
Output
For each test case, print the shortest string that includes all partial strings.
Sample Input
3
3
geo
oji
jing
2
world
hello
3
abrac
cadabra
dabr
Sample Output
geojing
helloworld
cadabrac
And here is my code. It seems to work perfectly with the sample inputs, and when I made test inputs of my own, everything worked fine. But when I submit this code, the judge says it is 'wrong'.
Please tell me what is wrong with my code. You don't need to give me the whole fixed code; I just need sample inputs that cause errors with my code. I added a code description to make my code easier to understand.
Code Description
Saved all input partial strings in vector 'stringParts'.
Saved current shortest string result in global variable 'answer'.
Used 'cache' array for memoization - to skip repeated function call.
The algorithm I designed to solve this problem is divided into two functions -
restore() & eraseOverlapped().
The restore() function calculates the shortest string that includes all partial strings in 'stringParts'.
The result of restore() is saved in 'answer'.
restore() has three parameters - 'curString', 'selected' and 'last'.
'curString' stands for the string built so far from the selected, overlapped parts.
'selected' stands for the currently selected elements of 'stringParts'. I used a bitmask to make my algorithm concise.
'last' stands for the last selected element of 'stringParts' used for making 'curString'.
The eraseOverlapped() function does preprocessing - before restore() runs, it deletes elements of 'stringParts' that are completely contained in other elements.
#include <algorithm>
#include <iostream>
#include <vector>
#include <cstring>
#include <string>
#define MAX 15
using namespace std;
int k;
string answer; // save shortest result string
vector<string> stringParts;
bool cache[MAX + 1][(1 << MAX) + 1]; //[last selected string][set of selected strings in Bitmask]
void restore(string curString, int selected=0, int last=0) {
//base case 1
if (selected == (1 << k) - 1) {
if (answer.empty() || curString.length() < answer.length())
answer = curString;
return;
}
//base case 2 - memoization
bool& ret = cache[last][selected];
if (ret != false) return;
for (int next = 0; next < k; next++) {
string checkStr = stringParts[next];
if (selected & (1 << next)) continue;
if (curString.empty())
restore(checkStr, selected + (1 << next), next + 1);
else {
int check = false;
//count max overlapping area of two strings and overlap two strings.
for (int i = (checkStr.length() > curString.length() ? curString.length() : checkStr.length())
; i > 0; i--) {
if (curString.substr(curString.size()-i, i) == checkStr.substr(0, i)) {
restore(curString + checkStr.substr(i, checkStr.length()-i), selected + (1 << next), next + 1);
check = true;
break;
}
}
if (!check) { // if there aren't any overlapping area
restore(curString + checkStr, selected + (1 << next), next + 1);
}
}
}
ret = true;
}
//check if there are strings that can be completely included by other strings, and delete that string.
void eraseOverlapped() {
//arranging string vector in ascending order of string length
int vectorLen = stringParts.size();
for (int i = 0; i < vectorLen - 1; i++) {
for (int j = i + 1; j < vectorLen; j++) {
if (stringParts[i].length() < stringParts[j].length()) {
string temp = stringParts[i];
stringParts[i] = stringParts[j];
stringParts[j] = temp;
}
}
}
//deleting included strings
vector<string>::iterator iter;
for (int i = 0; i < vectorLen-1; i++) {
for (int j = i + 1; j < vectorLen; j++) {
if (stringParts[i].find(stringParts[j]) != string::npos) {
iter = stringParts.begin() + j;
stringParts.erase(iter);
j--;
vectorLen--;
}
}
}
}
int main(void) {
int C;
cin >> C; // testcase
for (int testCase = 0; testCase < C; testCase++) {
cin >> k; // number of partial strings
memset(cache, false, sizeof(cache)); // initializing cache to false
string inputStr;
for (int i = 0; i < k; i++) {
cin >> inputStr;
stringParts.push_back(inputStr);
}
eraseOverlapped();
k = stringParts.size();
restore("");
cout << answer << endl;
answer.clear();
stringParts.clear();
}
}
After determining which string-parts can be removed from the list since they are contained in other string-parts, one way to model this problem might be as the "taxicab ripoff problem" problem (or Max TSP), where each potential length reduction by overlap is given a positive weight. Considering that the input size in the question is very small, it seems likely that they expect a near brute-force solution, with possibly some heuristic and backtracking or other form of memoization.
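To make the "length reduction by overlap" concrete, here is a small sketch of how the weight between two string-parts might be computed. The names overlap and merge_overlapped are hypothetical and not taken from the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Length of the longest suffix of `a` that is also a prefix of `b`.
// This is the number of characters saved when `b` is placed right after `a`.
std::size_t overlap(const std::string &a, const std::string &b)
{
    for (std::size_t len = std::min(a.size(), b.size()); len > 0; --len) {
        if (a.compare(a.size() - len, len, b, 0, len) == 0)
            return len;
    }
    return 0;
}

// Append `b` to `a`, writing the shared part only once.
std::string merge_overlapped(const std::string &a, const std::string &b)
{
    return a + b.substr(overlap(a, b));
}
```

With the sample input, merging "geo", "oji" and "jing" in that order gives "geojing". Since the overlap between any two parts does not depend on the rest of the sequence, the values could be precomputed once into a k-by-k table before the search.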
Thanks to everyone who tried to help me solve this problem. I actually solved it with a few changes to my previous algorithm. These are the main changes.
In my previous algorithm I saved the result of restore() in the global variable 'answer', since restore() didn't return anything. In the new algorithm restore() returns the intermediate answer string, so I no longer need 'answer'.
I used a string-type cache instead of a bool-type cache. I found out that using a bool cache for memoization in this algorithm was useless.
I deleted the 'curString' parameter from restore(). Since all we need during a recursive call is the one previously selected partial string, 'last' can take over the role of 'curString'.
CODE
#include <algorithm>
#include <iostream>
#include <vector>
#include <cstring>
#include <string>
#define MAX 15
using namespace std;
int k;
vector<string> stringParts;
string cache[MAX + 1][(1 << MAX) + 1];
string restore(int selected = 0, int last = -1) {
if (selected == (1 << k) - 1) {
return stringParts[last];
}
if (last == -1) {
string ret = "";
for (int next = 0; next < k; next++) {
string resultStr = restore(selected + (1 << next), next);
if (ret.empty() || ret.length() > resultStr.length())
ret = resultStr;
}
return ret;
}
string& ret = cache[last][selected];
if (!ret.empty()) {
// cout << "cache used in [" << last << "][" << selected << "]" << endl; // debug trace; printing it would corrupt the judged output
return ret;
}
string curString = stringParts[last];
for (int next = 0; next < k; next++) {
if (selected & (1 << next)) continue;
string checkStr = restore(selected + (1 << next), next);
int check = false;
string resultStr;
for (int i = (checkStr.length() > curString.length() ? curString.length() : checkStr.length())
; i > 0; i--) {
if (curString.substr(curString.size() - i, i) == checkStr.substr(0, i)) {
resultStr = curString + checkStr.substr(i, checkStr.length() - i);
check = true;
break;
}
}
if (!check)
resultStr = curString + checkStr;
if (ret.empty() || ret.length() > resultStr.length())
ret = resultStr;
}
return ret;
}
void EraseOverlapped() {
int vectorLen = stringParts.size();
for (int i = 0; i < vectorLen - 1; i++) {
for (int j = i + 1; j < vectorLen; j++) {
if (stringParts[i].length() < stringParts[j].length()) {
string temp = stringParts[i];
stringParts[i] = stringParts[j];
stringParts[j] = temp;
}
}
}
vector<string>::iterator iter;
for (int i = 0; i < vectorLen - 1; i++) {
for (int j = i + 1; j < vectorLen; j++) {
if (stringParts[i].find(stringParts[j]) != string::npos) {
iter = stringParts.begin() + j;
stringParts.erase(iter);
j--;
vectorLen--;
}
}
}
}
int main(void) {
int C;
cin >> C;
for (int testCase = 0; testCase < C; testCase++) {
cin >> k;
for (int i = 0; i < MAX + 1; i++) {
for (int j = 0; j < (1 << MAX) + 1; j++)
cache[i][j] = "";
}
string inputStr;
for (int i = 0; i < k; i++) {
cin >> inputStr;
stringParts.push_back(inputStr);
}
EraseOverlapped();
k = stringParts.size();
string resultStr = restore();
cout << resultStr << endl;
stringParts.clear();
}
}
This algorithm is much slower than the 'ideal' algorithm that the book I'm studying suggests, but it was fast enough to pass this question's time limit.

I Am Able To Go Outside Array Bounds

Given two strings, write a method to decide if one is an anagram/permutation of the other. This is my approach:
I wrote this function to check if 2 strings are anagrams (such as dog and god).
In ascii, a to z is 97 - 122.
Basically I have an array of bools that are all initially false. Every time I encounter a char in string1, I mark it as true.
To check if it's an anagram, I check if any chars of string2 are false (should be true if encountered in string1).
I'm not sure how, but this works too: arr[num] = true; (it shouldn't work, because I don't take into account that lowercase ASCII starts at 97, so the index goes out of bounds).
(Side note: is there a better approach than mine?)
EDIT: Thanks for all the responses! Will be carefully reading each one. By the way: not an assignment. This is a problem from a coding interview practice book.
bool permutation(const string &str1, const string &str2)
{
// Cannot be anagrams if sizes are different
if (str1.size() != str2.size())
return false;
bool arr[25] = { false };
for (int i = 0; i < str1.size(); i++) // string 1
{
char ch = (char)tolower(str1[i]); // convert each char to lower
int num = ch; // get ascii
arr[num-97] = true;
}
for (int i = 0; i < str2.size(); i++) // string 2
{
char ch = (char)tolower(str2[i]); // convert char to lower
int num = ch; // get ascii
if (arr[num-97] == false)
return false;
}
return true;
}
There is nothing inherent in C++ arrays that prevents you from writing beyond the end of them. But, in doing so, you violate the contract you have with the compiler and it is therefore free to do what it wishes (undefined behaviour).
You can get bounds checking on "arrays" by using the vector class and its at() member function, if that's what you need; note that the vector's operator[] is not checked either.
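As a small illustration of checked access, std::vector::at throws std::out_of_range for an invalid index instead of silently reading past the end. The helper name read_is_rejected is made up for this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading index `i` through at() was rejected as out of range.
bool read_is_rejected(const std::vector<int> &v, std::size_t i)
{
    try {
        (void)v.at(i); // checked access: throws instead of invoking undefined behaviour
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```

For a vector of 26 counters, index 25 is accepted, while index 97 (the ASCII code of 'a' used directly as an index) is rejected.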
As for a better approach, it's probably better if your array is big enough to cover every possible character (so you don't have to worry about bounds checking) and it shouldn't so much be a truth value as a count, so as to handle duplicate characters within the strings. If it's just a truth value, then aab and abb would be considered anagrams.
Even though you state it's not an assignment, you'll still learn more if you implement it yourself, so it's pseudo-code only from me. The basic idea would be:
def isAnagram (str1, str2):
# Different lengths means no anagram.
if len(str1) not equal to len(str2):
return false
# Initialise character counts to zero.
create count[0..255] (assumes 8-bit char)
for each index 0..255:
set count[index] to zero
# Add 1 for all characters in string 1.
for each char in str1:
increment count[char]
# Subtract 1 for all characters in string 2.
for each char in str2:
decrement count[char]
# Counts will be all zero for an anagram.
for each index 0..255:
if count[index] not equal to 0:
return false
return true
Working approach : with zero additional cost.
bool permutation(const std::string &str1, const std::string &str2)
{
// Cannot be anagrams if sizes are different
if (str1.size() != str2.size())
return false;
int arr[26] = {0 }; // one counter per letter 'a'..'z' (input is assumed to contain letters only)
for (int i = 0; i < str1.size(); i++) // string 1
{
char ch = (char)tolower((unsigned char)str1[i]); // convert each char to lower
int num = ch; // get ascii
arr[num-97] = arr[num-97] + 1 ;
}
for (int i = 0; i < str2.size(); i++) // string 2
{
char ch = (char)tolower((unsigned char)str2[i]); // convert char to lower
int num = ch; // get ascii
arr[num-97] = arr[num-97] - 1 ;
}
for (int i =0; i< 26; i++) {
if (arr[i] != 0) {
return false;
}
}
return true;
}
Yes, neither C nor C++ checks array indices for out-of-bounds access.
It is the duty of the programmer to make sure that the program logic doesn't cross the legitimate limits, and to add any checks for violations.
Improved code:
bool permutation(const string &str1, const string &str2)
{
// Cannot be anagrams if sizes are different
if (str1.size() != str2.size())
return false;
int arr[26] = { 0 }; //<-------- Changed (26 letters, 'a'..'z')
for (int i = 0; i < str1.size(); i++) // string 1
{
char ch = (char)tolower((unsigned char)str1[i]); // convert each char to lower
int num = ch; // get ascii
arr[num-97] += 1; //<-------- Changed
}
for (int i = 0; i < str2.size(); i++) // string 2
{
char ch = (char)tolower((unsigned char)str2[i]); // convert char to lower
int num = ch; // get ascii
arr[num-97] = arr[num-97] - 1 ; //<-------- Changed
}
for (int i =0; i< 26; i++) { //<-------- Changed
if (arr[i] != 0) { //<-------- Changed
return false; //<-------- Changed
}
}
return true;
}

Finding common characters in two strings

I am working on a problem in which we have to count the number of common characters in two strings. The main part of the count goes like this:
for(i=0; i < strlen(s1); i++) {
for(j = 0; j < strlen(s2); j++) {
if(s1[i] == s2[j]) {
count++;
s2[j] = '*';
break;
}
}
}
This is O(n^2) logic. However, I could not think of a better solution than this. Can anyone help me code this with O(n) logic?
This is very simple. Take two int arrays, freq1 and freq2, and initialize all their elements to 0. Then read your strings and store the frequencies of the characters in these arrays. After that, compare the arrays freq1 and freq2 to find the common characters.
It can be done in O(n) time with constant space.
The pseudo code goes like this (assuming lowercase letters only):
int map1[26] = {0}, map2[26] = {0};
int common_chars = 0;
for c1 in string1:
map1[c1 - 'a']++;
for c2 in string2:
map2[c2 - 'a']++;
for i in 0 to 25:
common_chars += min(map1[i], map2[i]);
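Your current code is O(n^3) because the O(n) strlen calls are re-evaluated on every loop iteration, and a version without the s2[j] = '*' marking and the break would produce incorrect results, for example on "aa", "aa" (for which it would return 4).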
This code counts letters in common (each letter being counted at most once) in O(n).
int common(const char *a, const char *b) {
int table[256] = {0};
int result = 0;
for (; *a; a++)table[(unsigned char)*a]++;
for (; *b; b++)result += (table[(unsigned char)*b]-- > 0);
return result;
}
Depending on how you define "letters in common", you may have different logic. Here's some testcases for the definition I'm using (which is size of the multiset intersection).
int main(int argc, char *argv[]) {
struct { const char *a, *b; int want; } cases[] = {
{"a", "a", 1},
{"a", "b", 0},
{"a", "aa", 1},
{"aa", "a", 1},
{"ccc", "cccc", 3},
{"aaa", "aaa", 3},
{"abc", "cba", 3},
{"aasa", "asad", 3},
};
int fail = 0;
for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
int got = common(cases[i].a, cases[i].b);
if (got != cases[i].want) {
fail = 1;
printf("common(%s, %s) = %d, want %d\n",
cases[i].a, cases[i].b, got, cases[i].want);
}
}
return fail;
}
You can do it with 2n:
int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;
for( i=0; i<len1; i++ )
{
allChars[ (unsigned char) s1[i] ] = 1;
}
for( i=0; i<len2; i++ )
{
if( allChars[ (unsigned char) s2[i] ] == 1 )
{
allChars[ (unsigned char) s2[i] ] = 2;
}
}
for( i=0; i<256; i++ )
{
if( allChars[i] == 2 )
{
cout << (char) i << endl; // print the character itself, not the marker value
count++;
}
}
The following code traverses each string only once, so the complexity is O(n). One of the assumptions is that upper and lower case are considered the same.
#include<stdio.h>
int main() {
char a[] = "Hello world";
char b[] = "woowrd";
int x[26] = {0};
int i;
int index;
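for (i = 0; a[i] != '\0'; i++) {
if (a[i] >= 'a' && a[i] <= 'z') {
index = a[i] - 'a';
} else if (a[i] >= 'A' && a[i] <= 'Z') {
//capital char
index = a[i] - 'A';
} else {
continue; //skip spaces and other non-letters
}
x[index]++;
}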
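for (i = 0; b[i] != '\0'; i++) {
if (b[i] >= 'a' && b[i] <= 'z') {
index = b[i] - 'a';
} else if (b[i] >= 'A' && b[i] <= 'Z') {
//capital char
index = b[i] - 'A';
} else {
continue; //skip spaces and other non-letters
}
if (x[index] > 0)
x[index] = -1;
}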
printf("Common characters in '%s' and '%s' are ", a, b);
for (i = 0; i < 26; i++) {
if (x[i] < 0)
printf("%c", 'a'+i);
}
printf("\n");
}
int count(string a, string b)
{
int i,c[26]={0},c1[26]={0};
for(i=0;i<a.length();i++)
{
if(97<=a[i]&&a[i]<=122) // 'a'..'z'
c[a[i]-97]++;
}
for(i=0;i<b.length();i++)
{
if(97<=b[i]&&b[i]<=122) // 'a'..'z'
c1[b[i]-97]++;
}
int s=0;
for(i=0;i<26;i++)
{
// (x + y - |x - y|) / 2 equals min(x, y): the occurrences shared by both strings
s=s+(c[i]+c1[i]-abs(c[i]-c1[i]))/2;
}
return (s);
}
This is a much simpler solution, although each std::find call is linear, which makes it O(n*m) overall:
for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
if (std::find(s2.begin(), s2.end(), *i) != s2.end())
{
dest.push_back(*i);
}
}
taken from here
C implementation to run in O(n) time and constant space.
#define ALPHABETS_COUNT 26
int commonChars(char *s1, char *s2)
{
int c_count = 0, i;
int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};
/* Compute the number of occurrences of each character */
while (*s1) arr1[*s1++-'a'] += 1;
while (*s2) arr2[*s2++-'a'] += 1;
/* Increment count based on match found */
for(i=0; i<ALPHABETS_COUNT; i++) {
if(arr1[i] == arr2[i]) c_count += arr1[i];
else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
}
return c_count;
}
First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.
You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.
An example in C++, assuming:
ASCII characters
All characters included (letters, numbers, special, spaces, etc...)
Case sensitive
std::vector<char> strIntersect(std::string const&s1, std::string const&s2){
std::vector<bool> presents(256, false); //Assuming ASCII
std::vector<char> intersection;
for (auto c : s1) {
presents[static_cast<unsigned char>(c)] = true;
}
for (auto c : s2) {
if (presents[static_cast<unsigned char>(c)]){
intersection.push_back(c);
presents[static_cast<unsigned char>(c)] = false;
}
}
return intersection;
}
int main() {
std::vector<char> result;
std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";
//Expected: "S a i n t R o q u e s d g h l , b c m r z p"
result = strIntersect(s1, s2);
for (auto c : result) {
std::cout << c << " ";
}
std::cout << std::endl;
return 0;
}
There is a better version in C++:
C++ bitset and its application
A bitset is an array of bool in which each Boolean value is not stored separately; instead, bitset optimizes the space so that each bool takes only 1 bit. The space taken by a bitset bs is therefore less than that of bool bs[N] and vector<bool> bs(N). However, a limitation of bitset is that N must be known at compile time, i.e., it must be a constant (this limitation does not apply to vector and dynamic arrays).
Because bitset stores the same information in a compressed manner, operations on a bitset are often faster than on an array or vector. We can access each bit of a bitset individually with the array indexing operator []; that is, bs[3] is the bit at index 3 of bitset bs, just like in a simple array. Remember that bitset indexes its bits from the right: for 10110, the 0s are at indices 0 and 3, whereas the 1s are at indices 1, 2 and 4.
We can construct a bitset from an integer as well as from a binary string via its constructors. The size of a bitset is fixed at compile time; that is, it can't be changed at runtime.
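The right-to-left indexing and the two constructors can be checked with a short sketch. The helper set_positions is a hypothetical name introduced only for this illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Positions of the set bits, counted from the rightmost (least significant) bit.
template <std::size_t N>
std::vector<std::size_t> set_positions(const std::bitset<N> &bits)
{
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i]) // operator[] reads a single bit
            positions.push_back(i);
    }
    return positions;
}
```

For example, std::bitset<5>(std::string("10110")) and std::bitset<5>(22) describe the same bits, and set_positions reports indices 1, 2 and 4 for either of them.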
For more information about bitset visit the site : https://www.geeksforgeeks.org/c-bitset-and-its-application
The code is as follows :
// considering the strings to be of lower case.
#include <bitset>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s1,s2;
cin>>s1>>s2;
//Declaration for bitset type variables
bitset<26> b_s1,b_s2;
// setting the bits in b_s1 for the encountered characters of string s1
for(auto& i : s1)
{
if(!b_s1[i-'a'])
b_s1[i-'a'] = 1;
}
// setting the bits in b_s2 for the encountered characters of string s2
for(auto& i : s2)
{
if(!b_s2[i-'a'])
b_s2[i-'a'] = 1;
}
// counting the number of set bits by the "Logical AND" operation
// between b_s1 and b_s2
cout<<(b_s1&b_s2).count();
}
No need to initialize and keep an array of 26 elements (one number for each letter of the alphabet). Just do the following:
Using a HashMap, store the letter as the key and an integer for the count as the value.
Create a Set of characters.
Iterate through each string's characters and add them to the Set from step 2. If the add() method returns false (meaning that the same character already exists in the Set), then add the character to the map and increment the value.
These steps are written with the Java programming language in mind.
Python Code:
>>>s1='abbc'
>>>s2='abde'
>>>p=list(set(s1).intersection(set(s2)))
>>>print(p)
['a','b']
Hope this helps you, Happy Coding!
can be easily done using the concept of "catching" which is a sub-algorithm of hashing.

testing a string to see if a number is present and assigning that value to a variable while skipping all the non-numeric values?

Given a string, say " a 19 b c d 20", how do I test whether there is a number at a particular position in the string? (Not just the character '1' but the whole number '19' or '20'.)
char s[80];
strcpy(s,"a 19 b c d 20");
int i=0;
int num=0;
int digit=0;
for (i =0;i<strlen(s);i++){
if ((s[i] <= '9') && (s[i] >= '0')){ //how do i test for the whole integer value not just a digit
//if number then convert to integer
digit = s[i]-48;
num = num*10+digit;
}
if (s[i] == ' '){
break; //is this correct here? do nothing
}
if (s[i] == 'a'){
//copy into a temp char
}
}
These are C solutions:
Are you just trying to parse the numbers out of the string? Then you can just walk the string using strtol().
long num = 0;
char *endptr = NULL;
while (*s) {
num = strtol(s, &endptr, 10);
if (endptr == s) { // Not a number here, move on.
s++;
continue;
}
// Found a number and it is in num. Move to next location.
s = endptr;
// Do something with num.
}
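For reference, the same strtol walk can be wrapped into a self-contained function. This is a sketch only: the name extract_numbers and the choice to collect the results in a std::vector are assumptions, not part of the answer above.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Collect every integer that strtol can parse while walking the string.
std::vector<long> extract_numbers(const std::string &text)
{
    std::vector<long> numbers;
    const char *s = text.c_str();
    while (*s) {
        char *end = nullptr;
        long value = std::strtol(s, &end, 10);
        if (end == s) { // no number starts here, move on by one character
            ++s;
            continue;
        }
        numbers.push_back(value);
        s = end; // continue right after the parsed number
    }
    return numbers;
}
```

Note that strtol skips leading whitespace and accepts a sign, so "x-5" yields -5. Values that do not fit in a long are clamped by strtol, which this sketch does not detect.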
If you have a specific location and number to check for you can still do something similar.
For example: Is '19' at position 10?
int pos = 10;
int value = 19;
if (pos >= strlen(s))
return false;
if (value == strtol(s + pos, &endptr, 10) && endptr != s + pos)
return true;
return false;
Are you trying to parse out the numbers without using any library routines?
Note: I haven't tested this...
int num=0;
int sign=1;
while (*s) {
// This could be done with an if, too.
switch (*s) {
case '-':
sign = -1;
case '+':
s++;
if (*s < '0' || *s > '9') {
sign = 1;
break;
}
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
// Parse number, start with zero.
num = 0;
do {
num = (num * 10) + (*s - '0');
s++;
} while (*s >= '0' && *s <= '9');
num *= sign;
// Restore sign, just in case
sign = 1;
// Do something with num.
break;
default:
// Not a number
s++;
}
}
It seems like you want to parse the string and extract all the numbers from it; if so, here's a more "C++" way to do it:
string s = "a 19 b c d 20"; // your char array will work fine here too
istringstream buffer(s);
string token;
int num;
while (!buffer.eof())
{
buffer >> num; // Try to read a number
if (!buffer.fail()) { // if it doesn't work, failbit is set
cout << num << endl; // It's a number, do what you want here
} else {
buffer.clear(); // wasn't a number, clear the failbit
buffer >> token; // pull out the non-numeric token
}
}
This should print out the following:
19
20
The stream extraction operator pulls out space-delimited tokens automatically, so you're saved from having to do any messy character-level operations or manual integer conversion. You'll need to #include <sstream> for the stringstream class.
You can use atoi().
After your if, you need to switch to a while loop to collect subsequent digits until you hit a non-digit.
BUT, more importantly, have you clearly defined your requirements? Will you allow whitespace between the digits? What if there are two numbers, like abc123def456gh?
It's not very clear what you are looking for. Assuming you want to extract all the digits from a string and then form a whole number from the found digits, you can try the following:
int i;
unsigned long num=0; // to hold the whole number.
int digit;
for (i = 0; s[i] != '\0'; i++) {
// if the ith char is a digit, append it to the number
if (isdigit((unsigned char)s[i])) {
num = num * 10 + (s[i] - '0');
}
}
It is assumed that all the digits in your string, when concatenated to form the whole number, will not overflow the unsigned long data type.
There's no way to test for a whole number. Writing a lexer, as you've done is one way to go. Another would be to try and use the C standard library's strtoul function (or some similar function depending on whether the string has floating point numbers etc).
Your code needs to allow for whitespaces and you can use the C library's isdigit to test if the current character is a digit or not:
vector<int> parse(string const& s) {
vector<int> vi;
for (size_t i = 0; i < s.length();) {
while (::isspace((unsigned char)s[ i ])) i++;
if (::isdigit((unsigned char)s[ i ])) {
int num = 0; // the loop below adds every digit, including the first
while (::isdigit((unsigned char)s[ i ])) {
num = num * 10 + (s[ i ] - '0');
++i;
}
vi.push_back(num);
}
....
Another approach will be to use boost::lexical_cast:
vector<string> tokenize(string const& input) {
vector<string> tokens;
size_t off = 0, start = 0;
while ((off = input.find(' ', start)) != string::npos) {
tokens.push_back(input.substr(start, off-start));
start = off + 1;
}
tokens.push_back(input.substr(start)); // last token, after the final space
return tokens;
}
vector<int> getint(vector<string> tokens) {
vector<int> vi;
for (vector<string>::iterator b = tokens.begin(), e = tokens.end(); b != e; ++b) {
try
{
vi.push_back(lexical_cast<short>(*b));
}
catch(bad_lexical_cast &) {}
}
return vi;
}