What's wrong with strcmp and c_str() here? - c++

The following code is my solution for the Largest Number. However it will crash.
If I directly compare tempa and tempb in cmp() by using tempa > tempb instead of strcmp(), it is OK. So what is wrong here?
#include <iostream>
#include <string>
#include <stack>
#include <vector>
#include <climits>
#include <cstdio>
#include <algorithm>
#include <sstream>
#include <cstring>
using namespace std;
bool cmp(string a, string b) {
string tempa = a + b;
string tempb = b + a;
int res = strcmp(tempa.c_str(), tempb.c_str());
if (res < 0) {
return false;
} else return true;
}
class Solution {
public:
string largestNumber(vector<int>& nums) {
vector<string> str;
string res = "0";
if (nums.empty()) {
return res;
}
for (int i = 0; i < nums.size(); ++i) {
stringstream ss;
ss << nums[i];
str.push_back(ss.str());
}
sort(str.begin(), str.end(), cmp);
res = "";
for (int i = 0; i < str.size(); ++i) {
res += str[i];
}
return res[0] == '0' ? "0" : res;
}
};
int main()
{
Solution sol;
int data[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
vector<int> nums(data, data + sizeof(data) / sizeof(data[0]));
string res = sol.largestNumber(nums);
cout << res << endl;
return 0;
}

Your comparison is not equivalent to tempa > tempb. It is equivalent to !(tempa < tempb) or tempa >= tempb. And a "greater than or equal" comparison does not satisfy the requirements of std::sort. Specifically, the comparison needs to be irreflexive, which is to say that for an operand A, cmp(A, A) should be false, but with >=, it is true.
If you want the strcmp equivalent of tempa > tempb, then do:
return strcmp(tempa.c_str(), tempb.c_str()) > 0;
On closer inspection, your comparison is fundamentally broken. Why are you concatenating a and b together to form temporary strings for comparison? As one simple example of what can go wrong there, an empty string will compare equal to every other string, because:
(string("anything") + "") == (string("") + "anything")
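A minimal sketch of a comparator that meets the irreflexivity requirement, assuming the intent is still the tempa > tempb behaviour and that every input is a non-empty string of decimal digits. The names cmpStrict and joinLargest are hypothetical, and std::string's operator> stands in for strcmp here, which is a substitution rather than what the original code did.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical replacement for cmp(): a strict "greater than" on the two
// concatenations, so cmpStrict(x, x) is always false, as std::sort requires.
bool cmpStrict(const std::string& a, const std::string& b) {
    return a + b > b + a;
}

// Hypothetical helper: sorts the decimal strings and joins them.
std::string joinLargest(std::vector<std::string> parts) {
    std::sort(parts.begin(), parts.end(), cmpStrict);
    std::string res;
    for (const std::string& p : parts) {
        res += p;
    }
    // An empty or all-zero input collapses to a single "0".
    return (res.empty() || res[0] == '0') ? "0" : res;
}
```

With this comparator, equal elements such as the hundred "0" strings in the question never compare as "less than" each other, so std::sort stays within its contract.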

Related

Avoiding or improving brute force method: Counting character repetition from all words in a dictionary text file

I wrote this utility function that takes the contents of an alphabetical dictionary file and adds up the repetition count of each letter of the alphabet.
This is what I have so far:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
// this function just generates a map of each of the alphabet's
// character position within the alphabet.
void initCharIndexMap( std::map<unsigned, char>& index ) {
char c = 'a';
for ( unsigned i = 1; i < 27; i++ ) {
index[i] = c;
c++;
}
}
void countCharacterRepetition( std::vector<std::string>& words, const std::map<unsigned, char> index, std::map<char, unsigned>& weights ) {
unsigned count = 0;
for ( auto& s : words ) {
std::transform(s.begin(), s.end(), s.begin(), ::tolower );
for ( std::size_t i = 0; i < s.length(); i++ ) {
using It = std::map<unsigned, char>::const_iterator;
for ( It it = index.cbegin(); it != index.cend(); ++it ) {
if ( s[i] == it->second ) {
count++;
weights[it->second] += count;
}
count = 0;
}
}
}
}
int main() {
std::vector<std::string> words;
std::string line;
std::ifstream file;
file.open( "words_alpha.txt" );
while( std::getline( file, line ) )
words.push_back(line);
std::map<unsigned, char> index;
initCharIndexMap(index);
std::map<char, unsigned> weights;
countCharacterRepetition(words, index, weights);
for (auto& w : weights)
std::cout << w.first << ' ' << w.second << '\n';
return EXIT_SUCCESS;
}
It gives me this output which appears to be valid at first glance:
a 295794
b 63940
c 152980
d 113190
e 376455
f 39238
g 82627
h 92369
i 313008
j 5456
k 26814
l 194915
m 105208
n 251435
o 251596
p 113662
q 5883
r 246141
s 250284
t 230895
u 131495
v 33075
w 22407
x 10493
y 70578
z 14757
The dictionary text file that I am using can be found from this github page.
This appears to be working. It took about 3 minutes to process on my current machine, which isn't horrible. However, this seems like a brute-force approach. Is there a more efficient way of doing a task like this?
If you're just counting how many times each character appears, then all you need is this:
int frequency[26] = {};
for (auto const& str : words) {
for (int i=0; i<str.size(); i++) {
frequency[tolower(str[i]) - 'a']++;
}
}
for (int i=0; i<26; i++) {
cout << char(i + 'a') << " " << frequency[i] << endl;
}
If you want to count uppercase and lowercase characters separately, change the array size to 58 (the span from 'A' to 'z' in ASCII), remove the tolower call, index with str[i] - 'A', and change your print loop so that it prints only if the character is between a and z or A and Z.
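A minimal sketch of the case-sensitive variant, assuming an ASCII-compatible character set. The function name countLettersCaseSensitive is hypothetical. It sizes the array at 58, which is enough to cover 'A' through 'z' when indexing with ch - 'A', and it skips anything that is not a letter instead of indexing out of range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts 'A'..'Z' and 'a'..'z' separately.
// Assumes an ASCII-compatible character set, where 'z' - 'A' == 57.
std::array<std::size_t, 58> countLettersCaseSensitive(
        const std::vector<std::string>& words) {
    std::array<std::size_t, 58> frequency{};  // zero-initialized
    for (const std::string& word : words) {
        for (char ch : word) {
            const bool upper = ch >= 'A' && ch <= 'Z';
            const bool lower = ch >= 'a' && ch <= 'z';
            // Only letters are counted; punctuation and digits are ignored.
            if (upper || lower) {
                ++frequency[static_cast<std::size_t>(ch - 'A')];
            }
        }
    }
    return frequency;
}
```

To read a count back, index with the same offset, for example frequency['a' - 'A'] for lowercase a.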
If you are just going for performance, I would say you still have to read in the file char by char - but I think all the searching is processing that could be optimised.
I would say the following pseudo code should be faster (I'll try and knock up an example later):
#include <array>
#include <cstdio>
void read_dictionary(const char *fileName)
{
// Pre-sized array (faster access)
std::array<int, 26> alphabet_count = {0};
// Open the file
FILE *file = fopen(fileName, "r");
if (file == NULL)
return; //could not open file
// Read through the file; fgetc returns int so that EOF can be told apart from any char
int c;
while ((c = fgetc(file)) != EOF)
{
// If it is a letter a-z
if ((c >= 'a') && (c <= 'z'))
{
// Increment the array value for that letter
++alphabet_count[c - 'a'];
}
// else if letter A-Z
else if ((c >= 'A') && (c <= 'Z'))
{
// Increment the array value for that letter
++alphabet_count[c - 'A'];
}
}
fclose(file);
// Print the totals
for (int i = 0; i < 26; ++i)
printf("%c %d\n", 'a' + i, alphabet_count[i]);
}
The point here is that we are not searching for matches; we use the char value directly as an index into the array and increment the count for that letter.
All of the answers above assume that the letters a through z are contiguous in the character set, and history will tell you that is not always the case (EBCDIC is the classic counterexample). A solution doesn't need to assume this, and can still be efficient.
#include <iostream>
#include <fstream>
#include <iterator>
#include <climits>
#include <cctype>
#include <cstdlib>
int main(int argc, char *argv[])
{
if (argc < 2)
return EXIT_FAILURE;
unsigned int count[1U << CHAR_BIT] {};
std::ifstream inp(argv[1]);
for (std::istream_iterator<char> it(inp), it_eof; it != it_eof; ++it)
++count[ std::tolower(static_cast<unsigned char>(*it)) ];
for (unsigned i=0; i<(1U << CHAR_BIT); ++i)
{
if (std::isalpha(i) && count[i])
std::cout << static_cast<char>(i) << ' ' << count[i] << '\n';
}
}
Output
[~ user]$ clang++ --std=c++14 -O2 -o main main.cpp
[~ user]$ time ./main /usr/share/dict/words
a 199554
b 40433
c 103440
d 68191
e 235331
f 24165
g 47094
h 64356
i 201032
j 3167
k 16158
l 130463
m 70680
n 158743
o 170692
p 78163
q 3734
r 160985
s 139542
t 152831
u 87353
v 20177
w 13864
x 6932
y 51681
z 8460
real 0m0.085s
user 0m0.073s
sys 0m0.005s
That is probably fast enough for your application, whatever it is.
#include <array>
#include <fstream>
#include <iostream>
int main()
{
std::ifstream file;
file.open( "words_alpha.txt" );
char c;
std::array<std::size_t, 26> counts {};
while( file >> c)
++counts[c-'a'];
for(char c = 0; c<26;++c)
std::cout<<'('<<static_cast<char>(c+'a')<<','<<counts[c]<<")\n"; // cast back to char: c+'a' is an int and would print as a number
}
Your version keeps track of words unnecessarily: you're simply counting characters in a file. The separation into words and lines doesn't matter. It's also unnecessary to store the words.
You could aim for readable high-level code and write something like this:
// https://github.com/KubaO/stackoverflown/tree/master/questions/letter-count-56498637
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <utility>
#include <vector>
//*
int main() {
Histogram<char, 'a', 'z'> counts;
std::ifstream file;
file.open("words_alpha.txt");
for (auto ch : make_range<char>(file)) counts.count(tolower(ch));
for (auto c : std::as_const(counts)) std::cout << c.value << ' ' << c.count << '\n';
}
This is the bare minimum of how modern C++ code should look.
This requires the Histogram class, and a make_range adapter for input streams. You can't merely implement std::begin and std::end for std::ifstream, because the member end() function takes precedence and interferes (see this answer). The code below is the fragment marked //* above.
template <typename T>
void saturating_inc(T &val) {
if (val < std::numeric_limits<T>::max()) val++;
}
template <typename T, T min, T max>
class Histogram {
using counter_type = unsigned;
using storage_type = std::vector<counter_type>;
storage_type counts;
public:
template <typename U>
void count(U val) {
if (val >= min && val <= max) saturating_inc(counts[size_t(val - min)]);
}
Histogram() : counts(1 + max - min) {}
struct element {
T value;
counter_type count;
};
class const_iterator {
T val;
storage_type::const_iterator it;
public:
const_iterator(T val, storage_type::const_iterator it) : val(val), it(it) {}
const_iterator &operator++() {
++val;
++it;
return *this;
}
bool operator!=(const const_iterator &o) const { return it != o.it; }
element operator*() const { return {val, *it}; }
};
const_iterator begin() const { return {min, counts.begin()}; }
const_iterator end() const { return {0, counts.end()}; }
};
template <class C, class T>
class istream_range {
C &ref;
public:
istream_range(C &ref) : ref(ref) {}
std::istream_iterator<T> begin() { return {ref}; }
std::istream_iterator<T> end() { return {}; }
};
template <class T, class C>
istream_range<C, T> make_range(C &ref) {
return {ref};
}
This concludes the example.

Performance warning for isspace function, conversion from int to bool

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <iterator>
using namespace std;
bool notSpace(char c) {
return !isspace(c);
}
bool isSpace(char c) {
return isspace(c);
}
vector<string> split(const string& s) {
vector<string> words;
string::const_iterator i = s.begin();
while (i != s.end()) {
i = find_if(i, s.end(), notSpace); // " "
if (i != s.end()) {
string::const_iterator j = i;
j = find_if(i, s.end(), isSpace);
words.push_back(string(i, j));
i = j;
}
}
return words;
}
int main() {
string test = "Hello world, I'm a simple guy";
vector<string> words = split(test);
for (vector<string>::size_type i = 0; i < words.size();i++) {
cout << words[i] << endl;
}
return 0;
}
When I compile the code I get this warning:
warning C4800: 'int': forcing value to bool 'true' or 'false'
(performance warning)
on the return of this function:
bool isSpace(char c) {
return isspace(c);
}
Is it a good habit to change isspace(c) to (isspace(c) != 0)? Or is that just unnecessary fussiness?
Take a look at the code below:
#include <iostream>
using namespace std;
bool f()
{
return 2;
}
int main()
{
cout <<f()<<endl;
return 0;
}
It prints 1 even though the function returns 2: converting the int to bool turns any nonzero value into true. That implicit conversion is what the warning is pointing at.
Someone may think a bool is a kind of small integer, but it isn't.
If you go back to C, there was originally no bool type, which is why many C functions (like isspace) return int. Even the Windows BOOL type is actually an integer and can hold values other than TRUE (1) or FALSE (0).
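A minimal sketch of the explicit comparison the question asks about. Writing != 0 makes the int-to-bool conversion deliberate, so there is no implicit narrowing left for the compiler to warn about. The cast to unsigned char is an addition that the original code does not have: passing a negative char value to isspace is undefined behaviour.

```cpp
#include <cassert>
#include <cctype>

// Explicitly compare the int result against 0 instead of relying on
// an implicit int-to-bool conversion.
bool isSpace(char c) {
    // Cast first: isspace expects a value representable as unsigned char.
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

bool notSpace(char c) {
    return !isSpace(c);
}
```

Both functions keep the bool(char) shape that find_if needs, so they can be dropped into split() unchanged.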

Custom Functor in std::set

#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int order[26];
struct lexcmp
{
bool operator()(const string &s1,const string &s2)
{
int i=0;
int j=min(s1.size(),s2.size());
while(1)
{
if(order[s1[i]-'a']<order[s2[i]-'a'])
return true;
if(order[s1[i]-'a']>order[s2[i]-'a'])
return false;
if(i==j-1)
return false;
i++;
}
}
};
int main()
{
string s;
cin>>s;
for(int i=0;i<s.size();i++)
{
order[s[i]-'a']=i;
}
set<string,lexcmp> store;
int m;
cin>>m;
while(m--)
{
string q;
cin>>q;
store.insert(q);
}
for(auto i=store.begin();i!=store.end();i++)
{
cout<<*i<<endl;
}
return 0;
}
The problem is in writing the custom functor.
I have a new ordering of the letters (instead of plain a-z), which is saved in the order array.
All I want is to sort the given strings based on this new ordering.
For example, if the order is: bacdefghijklmnopqrstuvwxyz
and the strings are ss, aa, bb,
the new ordering will be bb, aa, ss.
The code works fine, except when one string is a prefix of another, such as "pas" and "p".
p should come before pas, but it comes after.
What modifications should I make to the functor?
Here's one approach:
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <algorithm>
#include <numeric>
#include <array>
#include <string>
#include <locale>
struct lexcmp {
lexcmp() { std::iota(order_.begin(), order_.end(), std::int_fast8_t{}); }
explicit lexcmp(std::string const& order) {
assert(order.size() == order_.size());
order_.fill(-1); // -1 marks "no rank assigned yet"
for (std::size_t i{}; i != order_.size(); ++i) {
char const order_letter = order[i];
assert(std::isalpha(order_letter, std::locale::classic()));
assert(std::islower(order_letter, std::locale::classic()));
// each letter may appear only once in the ordering
assert(order_[order_letter - 'a'] == -1);
// store the rank of this letter, because operator() looks ranks up by letter
order_[order_letter - 'a'] = static_cast<std::int_fast8_t>(i);
}
}
bool operator ()(std::string const& a, std::string const& b) const {
auto const a_len = a.size(), b_len = b.size();
std::size_t i{};
for (auto const len = std::min(a_len, b_len); i != len; ++i) {
if (auto const diff = order_[a[i] - 'a'] - order_[b[i] - 'a']) {
return diff < 0;
}
}
return i == a_len && i != b_len;
}
private:
std::array<std::int_fast8_t, 26> order_;
};
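As a shorter alternative sketch, the prefix rule ("p" before "pas") is exactly what std::lexicographical_compare implements, so the functor can delegate to it and supply only the per-letter comparison. The name CustomOrderLess is hypothetical, and the sketch assumes the ordering string is a permutation of the 26 lowercase letters and that all inputs contain only lowercase letters.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical functor: orders strings by a custom alphabet.
struct CustomOrderLess {
    std::array<int, 26> rank{};  // rank[letter - 'a'] = position in the alphabet

    // Assumes `alphabet` is a permutation of the 26 lowercase letters.
    explicit CustomOrderLess(const std::string& alphabet) {
        for (std::size_t i = 0; i < alphabet.size() && i < rank.size(); ++i) {
            rank[static_cast<std::size_t>(alphabet[i] - 'a')] = static_cast<int>(i);
        }
    }

    bool operator()(const std::string& a, const std::string& b) const {
        // lexicographical_compare treats a proper prefix as smaller,
        // which is the "p" before "pas" behaviour asked for.
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [this](char x, char y) {
                return rank[static_cast<std::size_t>(x - 'a')] <
                       rank[static_cast<std::size_t>(y - 'a')];
            });
    }
};
```

Because the functor has no default constructor, an instance has to be passed to the set, for example std::set<std::string, CustomOrderLess> store(CustomOrderLess("bacdefghijklmnopqrstuvwxyz"));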

Deleting two specific Characters

So I have a little problem, I want to achieve this in C++, but I don't know how to do it:
Given is a string containing random numbers, symbols, and letters:
std::string s = "1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9^1dkkns";
Now I'm trying to find all ^ characters, and if those are followed by a number between 0 and 9, delete the ^ and the number, so:
"^1ghhu^7dndn^g"
becomes:
"ghhudndn^g"
I know how to find and replace/erase chars from a string, but I don't know how to check if it's followed by a number in a not hard coded way. Any help is appreciated.
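std::string s = "^1ghhu^7dndn^g";
// i + 1 < s.length() avoids unsigned underflow when s is empty
for (std::size_t i = 0; i + 1 < s.length(); )
{
if (s[i] == '^' && std::isdigit(static_cast<unsigned char>(s[i + 1])))
{
s.erase(i, 2); // stay at i: the next character has shifted into this position
}
else
{
++i;
}
}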
This needs these includes:
#include <string>
#include <cctype>
I would do it this way:
#include <cctype> // std::isdigit
#include <iostream>
#include <string>
#include <utility>
#include <iterator>
template<class Iter, class OutIter>
OutIter remove_escaped_numbers(Iter first, Iter last, OutIter out) {
for ( ; first != last ; )
{
auto c = *first++;
if (c == '^' && first != last)
{
c = *first++;
if (std::isdigit(c))
continue;
else {
*out++ = '^';
*out++ = c;
}
}
else {
*out++ = c;
}
}
return out;
}
int main()
{
using namespace std::literals;
auto input = "^1ghhu^7dndn^g"s;
auto output = std::string{};
remove_escaped_numbers(input.begin(), input.end(), std::back_inserter(output));
std::cout << output << std::endl;
}
or this way:
#include <iostream>
#include <regex>
int main()
{
using namespace std::literals;
auto input = "^1ghhu^7dndn^g"s;
static const auto repl = std::regex { R"___(\^\d)___" };
auto output = std::regex_replace(input, repl, "");
std::cout << output << std::endl;
}
A solution using std::stringstream that builds a copy of the input string with the caret-digit pairs removed.
#include <iostream>
#include <sstream>
#include <cctype>
int t404()
{
std::stringstream ss;
std::string inStr("1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9^1dkkns");
for (size_t i = 0; i<inStr.size(); ++i)
{
if(('^' == inStr[i]) && isdigit(inStr[i+1]))
{
i += 1; // skip over caret followed by single digit
}
else
{
ss << inStr[i];
}
}
std::cout << inStr << std::endl; // compare input
std::cout << ss.str() << std::endl; // to results
return 0;
}
Output:
1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9^1dkkns
1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9dkkns
You can simply loop over the string and copy it while skipping the undesired chars. Here is a possible function to do it:
std::string filterString (std::string& s) {
std::string result = "";
std::string::iterator it = s.begin();
char c;
while (it != s.end()) {
if (*it == '^' && it != s.end() && (it + 1) != s.end()) {
c = *(it + 1);
if(c >= '0' && c <= '9') {
it += 2;
continue;
}
}
result.push_back(*it);
++ it;
}
return result;
}
A robust solution would be to use the regex library that C++11 brings in.
std::string input ("1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9^1dkkns");
std::regex rx ("[\\^][\\d]{1}"); // "[\^][\d]{1}"
std::cout << std::regex_replace(input,rx,"woot");
>> 1653gbdtsr362g2v3f3t52bv^hdtvsbjj;hdfuue,9wootdkkns
This locates a "^" character ([\^]) followed by 1 ({1}) digit ([\d]) and replaces all occurrences with "woot". Pass an empty string as the replacement to delete the matches instead.
Hope this code can solve your problem:
#include <cctype>
#include <iostream>
#include <string>
int main()
{
std::string str = "^1ghhu^7dndn^g";
std::string::size_type pos = 0;
while (pos < str.size())
{
if (str[pos] == '^')
{
// find the end of the run of digits that follows the caret
std::string::size_type end = pos + 1;
while (end < str.size() && std::isdigit(static_cast<unsigned char>(str[end])))
{
++end;
}
if (end > pos + 1)
{
// erase the caret and its digits; indices stay valid after erase, unlike iterators
str.erase(pos, end - pos);
continue; // re-examine the character now at pos
}
}
++pos;
}
std::cout << str << std::endl;
return 0;
}
The output:
$ ./erase
ghhudndn^g

Bit Operation For Finding String Difference

The following code of mine tries to find the difference between two strings.
But it's horribly slow, as it iterates over the length of the string:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
int hd(string s1, string s2) {
// hd stands for "Hamming Distance"
int dif = 0;
for (unsigned i = 0; i < s1.size(); i++ ) {
string b1 = s1.substr(i,1);
string b2 = s2.substr(i,1);
if (b1 != b2) {
dif++;
}
}
return dif;
}
int main() {
string string1 = "AAAAA";
string string2 = "ATATT";
string string3 = "AAAAA";
int theHD12 = hd(string1,string2);
cout << theHD12 << endl;
int theHD13 = hd(string1,string3);
cout << theHD13 << endl;
}
Is there a fast alternative to do that?
In Perl we can have the following approach:
sub hd {
return ($_[0] ^ $_[1]) =~ tr/\001-\255//;
}
which is much, much faster than iterating over each position.
I wonder what's the equivalent of it in C++?
Try replacing the for loop with:
for (unsigned i = 0; i < s1.size(); i++ ) {
if (s1[i] != s2[i]) {
dif++;
}
}
This should be a lot faster because no new strings are created.
Fun with the STL:
#include <numeric> //inner_product
#include <functional> //plus, equal_to, not2
#include <string>
#include <stdexcept>
unsigned int
hd(const std::string& s1, const std::string& s2)
{
// TODO: What should we do if s1.size() != s2.size()?
if (s1.size() != s2.size()){
throw std::invalid_argument(
"Strings passed to hd() must have the same length"
);
}
return std::inner_product(
s1.begin(), s1.end(), s2.begin(),
0, std::plus<unsigned int>(),
std::not2(std::equal_to<std::string::value_type>())
);
}
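std::not2 was deprecated in C++17 and removed in C++20, so the version above no longer compiles under newer standards. A sketch of the same idea for C++14 and later, using std::not_equal_to in place of the negated std::equal_to; the function name hammingDistance is hypothetical.

```cpp
#include <cassert>
#include <functional>  // std::plus, std::not_equal_to
#include <numeric>     // std::inner_product
#include <stdexcept>
#include <string>

// Counts the positions at which the two strings differ.
unsigned int hammingDistance(const std::string& s1, const std::string& s2) {
    if (s1.size() != s2.size()) {
        throw std::invalid_argument(
            "Strings passed to hammingDistance() must have the same length");
    }
    // inner_product "multiplies" each character pair with not_equal_to
    // (true when they differ) and "adds" the results with plus.
    return std::inner_product(s1.begin(), s1.end(), s2.begin(), 0u,
                              std::plus<>(), std::not_equal_to<>());
}
```

For the strings in the question, hammingDistance("AAAAA", "ATATT") yields 3 and hammingDistance("AAAAA", "AAAAA") yields 0.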
Use iterators:
int GetHammingDistance(const std::string &a, const std::string &b)
{
// Hamming distance is not defined for strings of different lengths.
assert(a.length() == b.length()); // standard assert from <cassert>
std::string::const_iterator a_it = a.begin();
std::string::const_iterator b_it = b.begin();
std::string::const_iterator a_end = a.end();
std::string::const_iterator b_end = b.end();
int distance = 0;
while (a_it != a_end && b_it != b_end)
{
if (*a_it != *b_it) ++distance;
++a_it; ++b_it;
}
return distance;
}
Choice 1: Modify your original code to be as efficient as possible.
int hd(string const& s1, string const& s2)
{
// hd stands for "Hamming Distance"
int dif = 0;
for (std::string::size_type i = 0; i < s1.size(); i++ )
{
char b1 = s1[i];
char b2 = s2[i];
dif += (b1 != b2)?1:0;
}
return dif;
}
Choice 2: Use some of the STL algorithms to do the heavy lifting.
struct HammingFunc
{
inline int operator()(char s1,char s2)
{
return s1 == s2?0:1;
}
};
int hd(string const& s1, string const& s2)
{
int diff = std::inner_product(s1.begin(),s1.end(),
s2.begin(),
0,
std::plus<int>(),HammingFunc()
);
return diff;
}
Some obvious points that might make it faster:
Pass the strings as const references, not by value
Use the indexing operator [] to get characters, not a method call
Compile with optimization on
You are using std::string.
As explained in
The hunt for the fastest Hamming Distance C implementation
if you can use char*, my experiments conclude that for GCC 4.7.2 on an Intel Xeon X5650 the fastest general-purpose Hamming distance function for small strings (char arrays) is:
// na = length of both strings
unsigned int HammingDistance(const char* a, unsigned int na, const char* b) {
unsigned int num_mismatches = 0;
while (na) {
if (*a != *b)
++num_mismatches;
--na;
++a;
++b;
}
return num_mismatches;
}
If your problem allows you to set an upper distance limit, so that you don't care about greater distances, and this limit is always less than the strings' length, the above example can be optimized further to:
// na = length of both strings, dist must always be < na
unsigned int HammingDistance(const char* const a, const unsigned int na, const char* const b, const unsigned int dist) {
unsigned int i = 0, num_mismatches = 0;
while(i <= dist)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
while(num_mismatches <= dist && i < na)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
return num_mismatches;
}
I am not sure if const does anything regarding speed, but I use it anyway.