Recently I was asked in an interview to convert the string "aabbbccccddddd" to "a2b3c4d5". The goal is to replace each run of a repeated character with a single occurrence followed by the repeat count. Here 'a' is repeated twice in the input, so it is written as 'a2' in the output. I also had to write a function to reverse the format back to the original (e.g. from the string "a2b3c4d5" to "aabbbccccddddd"). I was free to use either C or C++. I wrote the code below, but the interviewer did not seem very happy with it and asked me to try a smarter way.
In the code below, formatstring() collapses repeated chars by appending the repeat count, and reverseformatstring() converts the result back to the original string.
void formatstring(char* target, const char* source) {
int charRepeatCount = 1;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// Compare the current char with previous one,
// increment repeat count
if (*source == *(source-1)) {
charRepeatCount++;
source++;
} else {
if (charRepeatCount > 1) {
// Convert repeat count to string, append to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
charRepeatCount = 1; // Reset repeat count
}
*target = *source;
source++; target++;
}
}
}
if (charRepeatCount > 1) {
// Convert repeat count to string, append it to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
}
*target = '\0';
}
void reverseformatstring(char* target, const char* source) {
int charRepeatCount = 0;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// If current char is alpha, add it to the target
if (isalpha(*source)) {
*target = *source;
target++; source++;
} else {
// Get repeat count of previous character
while (isdigit(*source)) {
int currentDigit = (*source) - '0';
charRepeatCount = (charRepeatCount == 0) ?
currentDigit : (charRepeatCount * 10 + currentDigit);
source++;
}
// Decrement repeat count as we have already written
// the first unique char to the target
charRepeatCount--;
// Repeat the last char for this count
while (charRepeatCount > 0) {
*target = *(target - 1);
target++;
charRepeatCount--;
}
}
}
}
*target = '\0';
}
I didn't find any issues with above code. Is there any other better way of doing this?
The approach/algorithm is fine, perhaps you could refine and shrink the code a bit (by doing something simpler, there's no need to solve this in an overly complex way). And choose an indentation style that actually makes sense.
A C solution:
void print_transform(const char *input)
{
for (const char *s = input; *s;) {
char current = *s;
size_t count = 1;
while (*++s == current) {
count++;
}
if (count > 1) {
printf("%c%zu", current, count);
} else {
putc(current, stdout);
}
}
putc('\n', stdout);
}
(This can be easily modified so that it returns the transformed string instead, or writes it to a long enough buffer.)
A C++ solution:
std::string transform(const std::string &input)
{
std::stringstream ss;
std::string::const_iterator it = input.begin();
while (it != input.end()) {
char current = *it;
std::size_t count = 1;
while (++it != input.end() && *it == current) {
count++;
}
if (count > 1) {
ss << current << count;
} else {
ss << current;
}
}
return ss.str();
}
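The question also asked for the reverse direction. Here is a minimal sketch of a matching decoder, assuming the original text contains no digit characters of its own and that a character without a count stands for a single occurrence. The name untransform is made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical counterpart to transform(): expands "a2b3" back to "aabbb".
// Assumes the original text never contained digits.
std::string untransform(const std::string &input)
{
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        char current = input[i++];
        std::size_t count = 0;
        bool has_count = false;
        // Accumulate every digit that follows the character
        while (i < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[i]))) {
            count = count * 10 + static_cast<std::size_t>(input[i] - '0');
            has_count = true;
            ++i;
        }
        // A character with no count stands for exactly one occurrence
        out.append(has_count ? count : 1, current);
    }
    return out;
}
```

Under those assumptions, untransform(transform(s)) should give back s for any digit-free s.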
Since several others have suggested very reasonable alternatives, I'd like to offer some opinions on what I think is your underlying question: "He asked me to try a smarter way than this.... Is there any other better way of doing this?"
When I interview a developer, I'm looking for signals that tell me how she approaches a problem:
Most important, as H2CO3 noted, is correctness: will the code work? I'm usually happy to overlook small syntax errors (forgotten semicolons, mismatched parens or braces, and so on) if the algorithm is sensible.
Proper use of the language, especially if the candidate claims expertise or has had extensive experience. Does he understand and use idioms appropriately to write straightforward, uncomplicated code?
Can she explain her train of thought as she formulates her solution? Is it logical and coherent, or is it a shotgun approach? Is she able and willing to communicate well?
Does he account for edge cases? And if so, does the intrinsic algorithm handle them, or is everything a special case? Although I'm happiest if the initial algorithm "just works" for all cases, I think it's perfectly acceptable to start with a verbose approach that covers all cases (or simply to add a "TODO" comment, noting that more work needs to be done), and then simplifying later, when it may be easier to notice patterns or duplicated code.
Does she consider error-handling? Usually, if a candidate starts by asking whether she can assume the input is valid, or with a comment like, "If this were production code, I'd check for x, y, and z problems," I'll ask what she would do, then suggest she focus on a working algorithm for now and (maybe) come back to that later. But I'm disappointed if a candidate doesn't mention it.
Testing, testing, testing! How will the candidate verify his code works? Does he walk through the code and suggest test cases, or do I need to remind him? Are the test cases sensible? Will they cover the edge cases?
Optimization: as a final step, after everything works and has been validated, I'll sometimes ask the candidate if she can improve her code. Bonus points if she suggests it without my prodding; negative points if she spends a lot of effort worrying about it before the code even works.
Applying these ideas to the code you wrote, I'd make these observations:
Using const appropriately is a plus, as it shows familiarity with the language. During an interview I'd probably ask a question or two about why/when to use it.
The proper use of char pointers throughout the code is a good sign. I tend to be pedantic about making the data types explicit within comparisons, particularly during interviews, so I'm happy to see, e.g.
while (*source != '\0') rather than the (common, correct, but IMO less careful) while(*source).
isFirstChar is a bit of a red flag, based on my "edge cases" point. When you declare a boolean to keep track of the code's state, there's often a way of re-framing the problem to handle the condition intrinsically. In this case, you can use charRepeatCount to decide if this is the first character in a possible series, so you won't need to test explicitly for the first character in the string.
By the same token, repeated code can also be a sign that an algorithm can be simplified. One improvement would be to move the conversion of charRepeatCount to a separate function. See below for an even better solution.
It's funny, but I've found that candidates rarely add comments to their code during interviews. Kudos for helpful ones, negative points for those of the ilk "Increment the counter" that add verbosity without information. It's generally accepted that, unless you're doing something weird (in which case you should reconsider what you've written), you should assume the person who reads your code is familiar with the programming language. So comments should explain your thought process, not translate the code back to English.
Excessive levels of nested conditionals or loops can also be a warning. You can eliminate one level of nesting by comparing each character to the next one instead of the previous one. This works even for the last character in the string, because it will be compared to the terminating null character, which won't match and can be treated like any other character.
There are simpler ways to convert charRepeatCount from an int to a string. For example, _snprintf() returns the number of bytes it "prints" to the string, so you can use
target += _snprintf(target, 10, "%i", charRepeatCount);
In the reversing function, you've used the ternary operator perfectly ... but it's not necessary to special-case the zero value: the math is the same regardless of its value. Again, there are also standard utility functions like atoi() that will convert the leading digits of a string into an integer for you.
Experienced developers will often include the increment or decrement operation as part of the condition in a loop, rather than as a separate statement at the bottom: while(charRepeatCount-- > 0). I'd raise an eyebrow but give you a point or two for humor and personality if you wrote this using the slide operator: while (charRepeatCount --> 0). But only if you'd promise not to use it in production.
Good luck with your interviewing!
I think your code is too complex for the task. Here's my approach (using C):
#include <ctype.h>
#include <stdio.h>
void format_str(char *target, char *source) {
int count;
char last;
while (*source != '\0') {
*target = *source;
last = *target;
target++;
source++;
for (count = 1; *source == last; source++, count++)
; /* Intentionally left blank */
if (count > 1)
target += sprintf(target, "%d", count);
}
*target = '\0';
}
void convert_back(char *target, char *source) {
char last;
int val;
while (*source != '\0') {
if (!isdigit((unsigned char) *source)) {
last = *source;
*target = last;
target++;
source++;
}
else {
for (val = 0; isdigit((unsigned char) *source); val = val*10 + *source - '0', source++)
; /* Intentionally left blank */
while (--val) {
*target = last;
target++;
}
}
}
*target = '\0';
}
format_str compresses the string, and convert_back uncompresses it.
Your code "works", but it doesn't adhere to some common patterns used in C++. You should have:
used std::string instead of plain char* arrays;
passed that string as a const reference to avoid modification, since you write the result somewhere else;
used C++11 features such as range-based for loops and lambdas as well.
I think the interviewer's purpose was to test your ability to deal with the C++11 standard, since the algorithm itself was pretty trivial.
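To make those three points concrete, here is one possible sketch (the name rle_encode and the helper lambda flush are mine, not anything the interviewer asked for). It takes a const reference, returns a std::string, and uses a range-based for loop plus a small lambda.

```cpp
#include <cstddef>
#include <string>

// Sketch in C++11 style: std::string in, std::string out.
std::string rle_encode(const std::string &input)
{
    std::string out;
    char previous = '\0';
    std::size_t count = 0;

    // Append a finished run (the character, plus the count if > 1) to out
    auto flush = [&out](char c, std::size_t n) {
        if (n == 0) return;          // nothing seen yet
        out += c;
        if (n > 1) out += std::to_string(n);
    };

    for (char c : input) {
        if (count > 0 && c == previous) {
            ++count;                 // still inside the same run
        } else {
            flush(previous, count);  // the previous run is complete
            previous = c;
            count = 1;
        }
    }
    flush(previous, count);          // the last run
    return out;
}
```

The caller never has to size a buffer, which removes a whole class of overflow questions from the interview.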
Perhaps the interviewer wanted to test your knowledge of existing standard library tools. Here's how my take could look in C++:
#include <string>
#include <sstream>
#include <algorithm>
#include <iostream>
typedef std::string::const_iterator Iter;
std::string foo(Iter first, Iter last)
{
Iter it = first;
std::ostringstream result;
while (it != last) {
it = std::find_if(it, last, [=](char c){ return c != *it; });
result << *first << (it - first);
first = it;
}
return result.str();
}
int main()
{
std::string s = "aaabbbbbbccddde";
std::cout << foo(s.begin(), s.end());
}
An extra check is needed for empty input.
try this
std::string str="aabbbccccddddd";
for(int i=0;i<255;i++)
{
int c=0;
for(int j=0;j<str.length();j++)
{
if(str[j] == i)
c++;
}
if(c>0)
printf("%c%d",i,c);
}
My naive approach:
void pack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
int RepeatCount = 1;
while( '\0' != *Src_Ptr ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
Dst_Ptr += sprintf( Dst_Ptr, "%i", RepeatCount );
RepeatCount = 1;
}
}
*Dst_Ptr = '\0';
};
void unpack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
while( '\0' != *Src_Ptr ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
*Dst_Ptr = '\0';
};
But if the interviewer asks for error handling, then the solution turns out to be much more complex (and ugly). My portable approach:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
// for MSVC
#ifdef _WIN32
#define snprintf sprintf_s
#endif
int pack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
int RepeatCount = 1;
// don't forget about buffers intercrossing
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) ) {
return 1;
}
// source string must contain no digits
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 \
&& !isdigit( *Src_Ptr ) && !Err ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
int res = snprintf( Dst_Ptr, DstBuf_End - Dst_Ptr - 1, "%i" \
, RepeatCount );
if( res < 0 ) {
Err = 1;
} else {
Dst_Ptr += res;
RepeatCount = 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int unpack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
// don't forget about buffers intercrossing
// first character of source string must be non-digit
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) || isdigit( SrcStr[0] ) ) {
return 1;
}
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 && !Err ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
if( !repeat_count || repeat_count - 1 > DstBuf_End - Dst_Ptr - 1 ) {
Err = 1;
} else {
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int main() {
char str[] = "aabbbccccddddd";
char buf1[128] = {0};
char buf2[128] = {0};
pack( str, buf1, 128 );
printf( "pack: %s -> %s\n", str, buf1 );
unpack( buf1, buf2, 128 );
printf( "unpack: %s -> %s\n", buf1, buf2 );
return 0;
}
Test: http://ideone.com/Y7FNE3. Also works in MSVC.
Try to make do with less boilerplate:
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;
template<typename in_iter,class ostream>
void torle(in_iter i, ostream &&o)
{
while (char c = *i++) {
size_t n = 1;
while ( *i == c )
++n, ++i;
o<<c<<n;
}
}
template<class istream, typename out_iter>
void fromrle(istream &&i, out_iter o)
{
char c; size_t n;
while (i>>c>>n)
while (n--) *o++=c;
}
int main()
{
typedef ostream_iterator<char> to;
string line; stringstream converted;
while (getline(cin,line)) {
torle(begin(line),converted);
cout<<converted.str()<<'\n';
fromrle(converted,ostream_iterator<char>(cout));
cout<<'\n';
}
}
I am developing a C/C++ function to trim extra whitespace except 1 blank for very large data set. Here is my function:
void iterative_trim_whitespace(const char* src, char* target){
bool hitspace(*src == ' ');
while (*src != '\x0'){
if (!hitspace){
*target++ = *src++;
}
else{
src++;
}
if (isspace(*src)){
hitspace = true;
}
else{
hitspace = false;
}
}
}
I wrote a recursive function to do the same thing. I can supply it if you wish. However, for very large data with big strings, the recursive function call stack overhead could be prohibitive. Does anyone know the fastest way to do this in C/C++? I am familiar with the Standard Template Library and the Boost template libraries. However, I think native C/C++ would be faster than C++ templates.
I'm going to assume your intent is a little bit different than "trim" would normally imply. "Trim" is usually used to mean removing extra white space from the beginning and/or end of a string, but you seem to mean that each place there's a run of whitespace in the input, you want a single space in the output.
I'm also assuming you're set on a C-like implementation that deals with C-style strings. If that's not a given, then it's going to be a lot simpler and cleaner to just use iterators and standard algorithms.
Assuming that's the case, I think I'd do things more like this:
bool copy_word(char *&dest, char const *&src) {
while (isspace(*(unsigned char *)src))
++src;
while (*src && !isspace((unsigned char)*src)) { // stop at any whitespace, not just ' '
*dest = *src;
++dest;
++src;
}
return *src != '\0';
}
void trim_whitespace(char *dest, char const *src) {
while (copy_word(dest, src))
*dest++ = ' ';
*dest = '\0';
}
There are two major points to keep in mind here: first, when you have a sequence of actions to take (skip white space, then copy non-white space, for example) it's probably cleaner to encode that as a sequence, rather than as different routes through a single loop. Second, when you use isspace, you must[1] cast the operand to some unsigned type to avoid UB.
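For comparison, here is a rough sketch of the std::string alternative mentioned earlier, written as the same skip-then-copy sequence but with std::find_if and std::find_if_not doing the scanning. The name collapse_whitespace is made up, and unlike copy_word this version turns every whitespace run, including a leading or trailing one, into a single space.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Sketch: replace each run of whitespace with a single ' '.
std::string collapse_whitespace(const std::string &input)
{
    // Cast to unsigned char before calling isspace to avoid UB
    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    std::string out;
    out.reserve(input.size());
    auto it = input.begin();
    const auto end = input.end();
    while (it != end) {
        // Copy the (possibly empty) run of non-whitespace characters
        auto word_end = std::find_if(it, end, is_space);
        out.append(it, word_end);
        // Skip the whitespace run that follows, if any
        it = std::find_if_not(word_end, end, is_space);
        if (it != word_end) {
            out += ' ';   // at least one whitespace character was skipped
        }
    }
    return out;
}
```

Whether this is fast enough for very large inputs is something you'd have to measure; it is meant to show the shape of the code, not to win the benchmark.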
Edit: For what it's worth, I put together a little test/benchmark program to see how my code fares vs. the code in the OP's answer.
#include <ctype.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <set>
#include <deque>
#include <iostream>
#include <string>
#include <algorithm>
void iterative_trim_whitespace(const char* src, char* target){
bool firstspace(true);
while (*src != '\x0'){
if (firstspace){
*target++ = *src++;
}
else{
src++;
}
if (firstspace && isspace(*(src - 1))){
firstspace = false;
}
else{
firstspace = true;
}
}
*target = '\x0';
}
struct my_isspace {
bool operator()(char ch) {
return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\v';
}
};
bool copy_word(char *&dest, char const *&src) {
my_isspace check;
while (check((*src)))
++src;
while (*src && !check(*src))
*dest++ = *src++;
return *src != '\0';
}
void trim_whitespace(char *dest, char const *src) {
while (copy_word(dest, src))
*dest++ = ' ';
*dest = '\0';
}
void show(std::string const &label, double t) {
std::cout << "Time for " << label << " " << t << " seconds\n";
}
template <class test, class T>
double timer(test t, T a, T b) {
clock_t start = clock();
t(a, b);
clock_t stop = clock();
return double(stop-start)/CLOCKS_PER_SEC;
}
void generate_string(std::vector<char> &dest, size_t size) {
for (int i=0; i<size; i++) {
if (rand() % 5 == 0)
dest.push_back(' ');
else
dest.push_back(rand() % 26 + 'A');
}
dest.push_back('\0');
}
int main() {
static const int size = 1024 * 1024 * 100;
std::vector<char> src, dest;
generate_string(src, size-2);
dest.resize(size);
show("Original", timer(iterative_trim_whitespace, &src[0], &dest[0]));
show("Jerry's", timer(trim_whitespace, &dest[0], &src[0]));
return 0;
}
At least when I run it, I get:
Time for Original 0.749 seconds
Time for Jerry's 0.468 seconds
I should probably add: as I sort of alluded to in a comment, the implementation of isspace on the compiler I'm using is fairly slow, at least compared to the simple one I've thrown in here. In fairness, however, it wouldn't surprise me (much, anyway) if part of the benefit of this is simply being implemented as a function object, which often makes it quite a bit easier for the compiler to generate inline code for it.
For what it's worth, two other points:
Microsoft's link-time code generation slows both of these quite a bit
Either way, the trimming is quite a bit faster than initially generating the input
[1] Well, technically, it is possible for char to be an unsigned type to start with -- but it's unusual enough that you shouldn't count on it. It's also possible for all your input to fall within the ASCII subset of characters that your char can probably hold, in which case it'll seem to work just fine -- but that's what's pernicious: you can test it (as much as you want) but until you do so with text that contains characters that are encoded as what will be a negative number as a char, it looks fine. Then when your French/Spanish/Norwegian/etc. customer tests it, it falls flat on its face.
This certainly looks reasonable, and a recursive version would be horrible. If these are large strings I'd consider modifying them in place instead of copying, but that's a higher-level design decision. It doesn't affect speed, but could reduce memory consumption.
If you need a really quick solution, don't do this at all. Instead, have an iterator over the input string which skips the extra spaces. Anywhere you need to manipulate a 'trimmed' string, just pass this iterator.
This may or may not be possible depending on how far the development has gone by now.
Your code won't work as written. If the first character is a space, then it won't copy the space, and it won't copy the character AFTER the space. Something like this is more reasonable:
bool hitSpace = false;
while (*src != '\x0')
{
if (isspace(*src))
{
if (hitSpace)
{
src++;
}
else
{
*target++ = *src++;
hitSpace = true;
}
}
else
{
*target++ = *src++;
hitSpace = false;
}
}
First, I would choose iterative (in C or C++) over recursive. The compiler will probably convert a recursive algorithm into a loop anyway, but if it doesn't (or you build in debug mode) then you will overflow your stack for sure. Besides, there's a cost to calling functions and you want to avoid that.
Your basic algorithm looks sound (once the bug Jim spotted is fixed). I would check that isspace is being inlined. If not then replace it with *str == ' '.
A solution involving templates is surely just complicating a simple problem to no advantage.
Jerry Coffin, I just got back from appointment from Lexington. This version has been tested. I apologize for the first version which I hastily wrote in a rush to get to my dentist in Lexington.
void iterative_trim_whitespace(const char* src, char* target){
bool firstspace(true);
while (*src != '\x0'){
if (firstspace){
*target++ = *src++;
}
else{
src++;
}
if (firstspace && isspace(*(src - 1))){
firstspace = false;
}
else{
firstspace = true;
}
}
*target = '\x0';
}
void iterative_trim_whitespace_revised(const char* src, char* target){
bool firstspace(true);
int ct(0);
while (*src != '\x0'){
if (firstspace){
*target++ = *src++;
}
else{
src += ct - 2;
}
if (firstspace){
char *x = (char *)src - 1;
ct = 1;
bool sentinel(false);
while(isspace(*(x + (ct - 1)))){
ct += 1;
sentinel = true;
}
if (sentinel){
firstspace = false;
}
}
else{
ct = 1;
firstspace = true;
}
}
*target = '\x0';
}
void iterative_trim_whitespace_friday_5Timesfaster(const char* src, char* target){
bool firstspace(true);
int ct(0);
while (*src != '\x0'){
if (firstspace){
*target++ = *src++;
}
else{
src += ct - 2;
}
if (firstspace){
char *x = (char *)src - 1;
ct = 1;
bool sentinel(false);
while(*(x + (ct - 1)) == ' '){
ct += 1;
sentinel = true;
}
if (sentinel){
firstspace = false;
}
}
else{
ct = 1;
firstspace = true;
}
}
*target = '\x0';
}
// Here is our ProjectDirector's version from this morning
// TrimLeading() and TrimTrailing are additional inline functions
void iterative_trim_whitespace_ProjectDirector(const char* src, char* target){
int out=0;
for (int i=0;src[i]!= '\x0';i++) {
if (src[i] != ' ' || src[i+1] != ' '){
target[out++]=src[i];
}
}
target[out]= '\x0';
}
I'm reading a string from a file so it's in the form of a char array. I need to tokenize the string and save each char array token as a uint8_t hex value in an array.
char* starting = "001122AABBCC";
// ...
uint8_t[] ending = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
How can I convert from starting to ending? Thanks.
Here is a complete working program. It is based on Rob I's solution, but fixes several problems and has been tested to work.
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
const char* starting = "001122AABBCC";
int main()
{
std::string starting_str = starting;
std::vector<unsigned char> ending;
ending.reserve( starting_str.size());
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 );
ending.push_back(::strtol( pair.c_str(), 0, 16 ));
}
for(int i=0; i<ending.size(); ++i) {
printf("0x%X\n", ending[i]);
}
}
strtoul will convert text in any base you choose into bytes. You have to do a little work to chop the input string into individual digits, or you can convert 32 or 64 bits at a time.
P.S. uint8_t[] ending = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
doesn't mean anything special: you aren't storing the data in a uint8_t as 'hex', you are storing bytes. It's up to you (or your debugger) how to interpret the binary data.
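As a sketch of the "convert 64 bits at a time" variant: for inputs of at most 16 hex digits you can parse the whole string with one std::strtoull call and then peel the bytes off the result. The function name hex_to_bytes is made up, and the code assumes the input is valid hex with an even number of digits.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Sketch: parse up to 16 hex digits in one call, then split into bytes.
// Assumes valid hex input with an even number of digits (at most 16).
std::vector<std::uint8_t> hex_to_bytes(const char *hex)
{
    const std::size_t nbytes = std::strlen(hex) / 2;
    unsigned long long value = std::strtoull(hex, nullptr, 16);

    std::vector<std::uint8_t> bytes(nbytes);
    // Fill from the back: the least significant byte of value
    // corresponds to the last pair of digits in the string.
    for (std::size_t i = nbytes; i-- > 0;) {
        bytes[i] = static_cast<std::uint8_t>(value & 0xFF);
        value >>= 8;
    }
    return bytes;
}
```

For longer inputs you would have to process the string in chunks of 16 digits, at which point converting pair by pair is probably simpler.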
With C++11, you may use std::stoi for that :
std::vector<uint8_t> convert(const std::string& s)
{
if (s.size() % 2 != 0) {
throw std::runtime_error("Bad size argument");
}
std::vector<uint8_t> res;
res.reserve(s.size() / 2);
for (std::size_t i = 0, size = s.size(); i != size; i += 2) {
std::size_t pos = 0;
res.push_back(std::stoi(s.substr(i, 2), &pos, 16));
if (pos != 2) {
throw std::runtime_error("bad character in argument");
}
}
return res;
}
I think any canonical answer (w.r.t. the bounty notes) would involve some distinct phases in the solution:
Error checking for valid input
Length check and
Data content check
Element conversion
Output creation
Given the usefulness of such conversions, the solution should probably include some flexibility w.r.t. the types being used and the locale required.
From the outset, given the date of the request for a "more canonical answer" (circa August 2014) liberal use of C++11 will be applied.
An annotated version of the code, with types corresponding to the OP:
std::vector<std::uint8_t> convert(std::string const& src)
{
// error check on the length
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
auto ishex = [] (decltype(*src.begin()) c) {
return std::isxdigit(c, std::locale()); };
// error check on the data contents
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
// allocate the result, initialised to 0 and size it to the correct length
std::vector<std::uint8_t> result(src.length() / 2, 0);
// run the actual conversion
auto str = src.begin(); // track the location in the string
std::for_each(result.begin(), result.end(), [&str](decltype(*result.begin())& element) {
element = static_cast<std::uint8_t>(std::stoul(std::string(str, str + 2), nullptr, 16));
std::advance(str, 2); // next two elements
});
return result;
}
The template version of the code adds flexibility;
template <typename Int /*= std::uint8_t*/,
typename Char = char,
typename Traits = std::char_traits<Char>,
typename Allocate = std::allocator<Char>,
typename Locale = std::locale>
std::vector<Int> basic_convert(std::basic_string<Char, Traits, Allocate> const& src, Locale locale = Locale())
{
using string_type = std::basic_string<Char, Traits, Allocate>;
auto ishex = [&locale] (decltype(*src.begin()) c) {
return std::isxdigit(c, locale); };
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
std::vector<Int> result(src.length() / 2, 0);
auto str = std::begin(src);
std::for_each(std::begin(result), std::end(result), [&str](decltype(*std::begin(result))& element) {
element = static_cast<Int>(std::stoul(string_type(str, str + 2), nullptr, 16));
std::advance(str, 2);
});
return result;
}
The convert() function can then be based on the basic_convert() as follows:
std::vector<std::uint8_t> convert(std::string const& src)
{
return basic_convert<std::uint8_t>(src, std::locale());
}
uint8_t is typically no more than a typedef of an unsigned char. If you're reading characters from a file, you should be able to read them into an unsigned char array just as easily as a signed char array, and an unsigned char array is a uint8_t array.
I'd try something like this:
std::string starting_str = starting;
uint8_t* ending = new uint8_t[starting_str.length()/2];
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 ); // second argument is a length, not an end index
ending[i/2] = ::strtol( pair.c_str(), 0, 16 );
}
Didn't test it but it looks good to me...
You may add your own conversion from set of char { '0','1',...'E','F' } to uint8_t:
uint8_t ctoa(char c)
{
if( c >= '0' && c <= '9' ) return c - '0';
else if( c >= 'a' && c <= 'f' ) return 0xA + c - 'a';
else if( c >= 'A' && c <= 'F' ) return 0xA + c - 'A';
else return 0;
}
Then it will be easy to convert a string in to array:
uint32_t endingSize = strlen(starting)/2;
uint8_t* ending = new uint8_t[endingSize];
for( uint32_t i=0; i<endingSize; i++ )
{
ending[i] = ( ctoa( starting[i*2] ) << 4 ) + ctoa( starting[i*2+1] );
}
This simple solution should work for your problem
const char* starting = "001122AABBCC";
uint8_t ending[6];
// This algorithm works for any even-length starting string.
// However, you have to make sure that ending has enough space
// (strlen(starting) / 2 bytes).
size_t len = strlen(starting);
size_t i = 0;
while (i + 1 < len)
{
    // copy one pair of hex digits into a small null-terminated string
    char str[3] = { starting[i], starting[i + 1], '\0' };
    // convert the pair from base 16 to a byte (strtol is in <stdlib.h>)
    ending[i / 2] = (uint8_t)strtol(str, NULL, 16);
    i += 2;
}
uint8_t* ending = static_cast<uint8_t*>(starting);
I made this program just out of interest and wanted to make it better. My problem is that I want to make a nested for-loop to carry out the iterations but I can't get my head around it, I have tried many times but my head is melting. Any help would be greatly appreciated. Also for some reason on windows and openSuse (from what I have seen) the program prints out some random characters after the expected output, a solution to this would be a great bonus. Thanks !
Sorry I didn't make it clearer, the point of the code is to be able to theoretically generate every combination of letters from AAAAAAAA to ZZZZZZZZ.
1) No it's not homework
#include <iostream>
using namespace std;
int main()
{
char pass [] = {'A','A','A','A','A','A','A','A'};
while(pass[0] != '[')
{
pass[7]++;
if(pass[7]=='[')
{
pass[6]++;
pass[7] = 'A';
}
if(pass[6] == '[')
{
pass[6] = 'A';
pass[5]++;
}
if(pass[5] == '[')
{
pass[5] = 'A';
pass[4]++;
}
if(pass[4] == '[')
{
pass[4] = 'A';
pass[3]++;
}
if(pass[3] == '[')
{
pass[3] = 'A';
pass[2]++;
}
if(pass[2] == '[')
{
pass[2] = 'A';
pass[1]++;
}
if(pass[1] == '[')
{
pass[1] = 'A';
pass[0]++;
}
cout << pass << endl;
}
return 0;
}
Maybe like this:
const char char_first = 'A';
const char char_last = '[';
const unsigned int passlen = 8;
while (pass[0] != char_last)
{
++pass[passlen - 1];
for (unsigned int i = passlen - 1; i != 0; --i)
{
if (pass[i] == char_last)
{
++pass[i - 1]; // OK, i is always > 0
pass[i] = char_first;
}
}
}
For printing, include <string> and say:
std::cout << std::string(pass, passlen) << std::endl;
I took the liberty of making a few of the magic numbers into constants. If you're ever going to refactor this into a separate function, you'll see the merit of this.
Since (to output it) you use pass as a C string, it should be null terminated. Since it is not, garbage is printed. So you could define it as:
char pass [] = {'A','A','A','A','A','A','A','A','\0'};
or simpler
char pass[] = "AAAAAAAA";
I'd forget about carrying on my own and just convert to/from numbers. What you're doing here is basically printing a number whose digits range from 'A' to 'Z', mappable to 0-25 via the magic of ASCII.
Printing the number of anything then really boils down to
#include <iostream>
#include <string>
using namespace std;
// Render num as an ndigits-long base-26 number with digits 'A'..'Z'
std::string format(long long num, int ndigits) {
    if(ndigits == 0) {
        return "";
    } else {
        char digit = static_cast<char>('A' + num % 26);
        return format(num / 26, ndigits - 1) + digit;
    }
}
int main()
{
    // 26^8 = 208827064576 does not fit in a 32-bit int, so count with long long
    const long long total = 208827064576LL;
    for(long long i = 0 ; i < total ; ++i) {
        cout << format(i, 8) << endl;
    }
}
You may still want to work in a char array instead of producing a billion temporary strings if you're serious about the loop, but the principle stays the same.
First try to find the common parts in the expressions looking like
if(pass[7]=='[')
{
pass[6]++;
pass[7] = 'A';
}
You should think along a line like "There's always the same number here, and a one-lower number there". Then you replace that notion of a number with a variable and find out which range the variable has. KerrekSB gave you a solution; try to arrive at similar code from your own reasoning.
You just have to keep your while loop and put a for-loop inside it in place of the ifs.
Inside while(pass[0] != '['), after incrementing pass[7], add for (i = 6; i >= 0; i--) with i declared as a (signed) int
then you can replace all ifs with only one in the body of that for-loop:
if(pass[i+1] == '[')
{
    pass[i+1] = 'A';
    pass[i]++;
}
How did we come to that conclusion? Well, if you check all your if-statements, all that changes between them is the indices. You can clearly see that pattern, so you just replace the indices with a variable.
For starters, this is definitely not a case for a nested loop. In fact,
your entire code boils down to:
pass = initialPattern();
while ( isValidPattern( pass ) ) {
nextPattern( pass );
std::cout << pass << std::endl;
}
(But I wonder if you don't really mean to do the output before the
increment.)
Now all you have to do is define the type of pass and relevant
functions; you might even consider
putting everything in a class, since all of the functions operate on the
same data instance.
Judging from your code, pass should be an std::string with 8
characters; the initialization could be written:
std::string pass( 8, 'A' );
isValidPattern apparently only looks at the first character. (I'm not
sure that's correct, but that's what your code does.) Something like:
bool
isValidPattern( std::string const& pattern )
{
return pattern[0] != '[';
}
according to your code, but something like:
struct NotIsUpper
{
bool operator()( char ch ) const
{
return ! ::isupper( static_cast<unsigned char>( ch ) );
}
};
bool
isValidPattern( std::string const& pattern )
{
return pattern.size() == 8
&& std::find_if( pattern.begin(), pattern.end(), NotIsUpper() )
== pattern.end();
}
would seem more appropriate. (Of course, if you're doing any sort of
coding with text, you'd already have NotIsUpper and its siblings in
your tool kit.)
Finally, nextPattern seems to be nothing more than a multi-digit
increment, where the data is stored in big-endian order. So the
following (classical) algorithm would seem appropriate:
void
nextPattern( std::string& pattern )
{
static char const firstDigit = 'A';
static char const lastDigit = 'Z';
static std::string const invalidPattern( 1, '[' );
std::string::reverse_iterator current = pattern.rbegin();
std::string::reverse_iterator end = pattern.rend();
while ( current != end && *current == lastDigit ) {
*current = firstDigit;
++ current;
}
if ( current != end ) {
++ *current;
} else {
pattern = invalidPattern;
}
}
Formally, there is no guarantee in the standard that the letters will
be encoded in sequential ascending order, so for maximum portability,
you probably should in fact use an std::vector<int> with values in the
range [0, 26), and map those to letters just before output. This
would be trivial if you put all of these operations in a class, since
the internal representation wouldn't be visible to the client code.
Something like:
class PatternGenerator
{
    // class scope, so that both next() and isValid() can use it
    static int const lastDigit = 26;
    std::vector<int> myData;
public:
    explicit PatternGenerator()
        : myData( 8, 0 )
    {
    }
    void next()
    {
        std::vector<int>::reverse_iterator current = myData.rbegin();
        std::vector<int>::reverse_iterator end = myData.rend();
        while ( current != end && *current == lastDigit - 1 ) {
            *current = 0;
            ++ current;
        }
        if ( current != end ) {
            ++ *current;
        } else {
            // overflow: mark the generator as no longer valid
            myData.front() = lastDigit;
        }
    }
    bool isValid() const
    {
        return myData.front() < lastDigit;
    }
    friend std::ostream& operator<<(
        std::ostream& dest, PatternGenerator const& obj )
    {
        static char const characterMap[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for ( std::vector<int>::const_iterator current = obj.myData.begin();
                current != obj.myData.end();
                ++ current ) {
            dest << characterMap[*current];
        }
        return dest;
    }
};
(Note that things like isValid become simpler, because they can depend on the class invariants.)
Given this, all you have to write is:
int
main()
{
PatternGenerator pass;
while ( pass.isValid() ) {
std::cout << pass << std::endl;
pass.next();
}
return 0;
}
To do nested loops, you need to turn it inside-out.
You've written the code thinking as follows: go through all the possibilities for the last symbol, then change the second-last once and go back, etc. That's like counting up from 1, getting to 10 and putting a 1 in the tens column, etc.
Nested loops work the other way: go through the possibilities for the first symbol, allowing the inner loops to take care of possibilities for the other symbols each time. i.e., "list all those numbers, in order, that start with 0 in the millions place, then the ones that start with 1 etc.". In the outermost loop, you just set that value for the first digit, and the nested loops take care of the rest.
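As a rough illustration of that inside-out view, here is a sketch with only three positions instead of eight to keep it short. The function name all_patterns3 is invented for this example, and the loops assume the letters 'A' to 'Z' are contiguous (true in ASCII).

```cpp
#include <string>
#include <vector>

// Lists every 3-letter pattern over 'A'..'Z', one loop per position.
std::vector<std::string> all_patterns3()
{
    std::vector<std::string> out;
    std::string pass(3, 'A');
    for (char a = 'A'; a <= 'Z'; ++a) {
        pass[0] = a;                 // outermost loop fixes the first symbol
        for (char b = 'A'; b <= 'Z'; ++b) {
            pass[1] = b;             // middle loop handles the second symbol
            for (char c = 'A'; c <= 'Z'; ++c) {
                pass[2] = c;         // innermost loop changes fastest
                out.push_back(pass);
            }
        }
    }
    return out;
}
```

The order of the results is the same as with the carry approach: the last symbol changes fastest. For eight positions you would need eight nested loops (or recursion), which is why the single increment-with-carry loop is usually the more practical choice.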
This is a question in my paper test today, the function signature is
int is_match(char* pattern,char* string)
The pattern is limited to only ASCII chars and the quantifiers * and ?, so it is relatively simple. is_match should return 1 if matched, otherwise 0.
How do I do this?
Brian Kernighan provided a short article on A Regular Expression Matcher that Rob Pike wrote as a demonstration program for a book they were working on. The article is a very nice read explaining a bit about the code and regular expressions in general.
I have played with this code, making a few changes to experiment with some extensions such as to also return where in the string the pattern matches so that the substring matching the pattern can be copied from the original text.
From the article:
I suggested to Rob that we needed to find the smallest regular
expression package that would illustrate the basic ideas while still
recognizing a useful and non-trivial class of patterns. Ideally, the
code would fit on a single page.
Rob disappeared into his office, and at least as I remember it now,
appeared again in no more than an hour or two with the 30 lines of C
code that subsequently appeared in Chapter 9 of TPOP. That code
implements a regular expression matcher that handles these constructs:
c matches any literal character c
. matches any single character
^ matches the beginning of the input string
$ matches the end of the input string
* matches zero or more occurrences of the previous character
This is quite a useful class; in my own experience of using regular
expressions on a day-to-day basis, it easily accounts for 95 percent
of all instances. In many situations, solving the right problem is a
big step on the road to a beautiful program. Rob deserves great credit
for choosing so wisely, from among a wide set of options, a very small
yet important, well-defined and extensible set of features.
Rob's implementation itself is a superb example of beautiful code:
compact, elegant, efficient, and useful. It's one of the best examples
of recursion that I have ever seen, and it shows the power of C
pointers. Although at the time we were most interested in conveying
the important role of a good notation in making a program easier to
use and perhaps easier to write as well, the regular expression code
has also been an excellent way to illustrate algorithms, data
structures, testing, performance enhancement, and other important
topics.
The actual C source code from the article is very very nice.
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
See This Question for a solution you can not submit. See this paper for a description of how to implement a more readable one.
Here is a recursive, extensible implementation. Tested for first order of pattern complexity.
#include <cstring>
#include <string>
#include <vector>
#include <iostream>
struct Match {
Match():_next(0) {}
virtual bool match(const char * pattern, const char * input) const {
return !std::strcmp(pattern, input);
}
bool next(const char * pattern, const char * input) const {
if (!_next) return false;
return _next->match(pattern, input);
}
const Match * _next;
};
class MatchSet: public Match {
typedef std::vector<Match *> Set;
Set toTry;
public:
virtual bool match(const char * pattern, const char * input) const {
for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i) {
if ((*i)->match(pattern, input)) return true;
}
return false;
}
void add(Match * m) {
toTry.push_back(m);
m->_next = this;
}
~MatchSet() {
for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i)
if ((*i)->_next==this) (*i)->_next = 0;
}
};
struct MatchQuestion: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0] != '?')
return false;
if (next(pattern+1, input))
return true;
if (next(pattern+1, input+1))
return true;
return false;
}
};
struct MatchEmpty: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0]==0 && input[0]==0)
return true;
return false;
}
};
struct MatchAsterisk: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0] != '*')
return false;
if (pattern[1] == 0) {
return true;
}
for (int i = 0; input[i] != 0; ++i) {
if (next(pattern+1, input+i))
return true;
}
return false;
}
};
struct MatchSymbol: public Match {
virtual bool match(const char * pattern, const char * input) const {
// TODO: consider cycle here to prevent unnecessary recursion
// Cycle should detect special characters and call next on them
// Current implementation abstracts from that
if (pattern[0] != input[0])
return false;
return next(pattern+1, input+1);
}
};
class DefaultMatch: public MatchSet {
MatchEmpty empty;
MatchQuestion question;
MatchAsterisk asterisk;
MatchSymbol symbol;
public:
DefaultMatch() {
add(&empty);
add(&question);
add(&asterisk);
add(&symbol);
}
void test(const char * p, const char * input) const {
testOneWay(p, input);
if (!std::strcmp(p, input)) return;
testOneWay(input, p);
}
bool testOneWay(const char * p, const char * input) const {
const char * eqStr = " == ";
bool rv = match(p, input);
if (!rv) eqStr = " != ";
std::cout << p << eqStr << input << std::endl;
return rv;
}
};
int main()
{
using namespace std;
typedef vector<string> Strings;
Strings patterns;
patterns.push_back("*");
patterns.push_back("*hw");
patterns.push_back("h*w");
patterns.push_back("hw*");
patterns.push_back("?");
patterns.push_back("?ab");
patterns.push_back("a?b");
patterns.push_back("ab?");
patterns.push_back("c");
patterns.push_back("cab");
patterns.push_back("acb");
patterns.push_back("abc");
patterns.push_back("*this homework?");
patterns.push_back("Is this homework?");
patterns.push_back("This is homework!");
patterns.push_back("How is this homework?");
patterns.push_back("hw");
patterns.push_back("homework");
patterns.push_back("howork");
DefaultMatch d;
for (unsigned i = 0; i < patterns.size(); ++i)
for (unsigned j =i; j < patterns.size(); ++j)
d.test(patterns[i].c_str(), patterns[j].c_str());
return 0;
}
If something is unclear, ask.
Cheat. Use #include <boost/regex.hpp>.
Try to make a list of interesting test cases:
is_match("dummy","dummy") should return true;
is_match("dumm?y","dummy") should return true;
is_match("dum?y","dummy") should return false;
is_match("dum*y","dummy") should return true;
and so on ...
Then see how to make the easiest test pass, then the next one ...
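One way to keep such a list runnable is a small table of cases. The sketch below is only an illustration: TestCase, kCases and count_failures are made-up names, and the is_match here is a simple recursive stand-in that assumes 'x*' means zero or more x and 'x?' means zero or one x. It takes const char* so string literals can be passed directly in C++.

```cpp
#include <cstddef>

// Stand-in matcher so the table can run; swap in your own implementation.
int is_match(const char *pattern, const char *string)
{
    if (!pattern[0]) return !string[0];
    if (pattern[1] == '?')
        return (pattern[0] == string[0] && is_match(pattern + 2, string + 1))
            || is_match(pattern + 2, string);
    if (pattern[1] == '*') {
        // try the rest of the pattern after 0, 1, 2, ... copies of pattern[0]
        for (std::size_t i = 0; ; ++i) {
            if (is_match(pattern + 2, string + i)) return 1;
            if (string[i] != pattern[0]) return 0;
        }
    }
    return pattern[0] == string[0] && is_match(pattern + 1, string + 1);
}

struct TestCase { const char *pattern; const char *text; int expected; };

static const TestCase kCases[] = {
    { "dummy",  "dummy", 1 },
    { "dumm?y", "dummy", 1 },
    { "dum?y",  "dummy", 0 },
    { "dum*y",  "dummy", 1 },
};

// Returns how many cases in the table give an unexpected result.
int count_failures()
{
    int failures = 0;
    for (const TestCase &t : kCases)
        if (is_match(t.pattern, t.text) != t.expected) ++failures;
    return failures;
}
```

Adding a new case is then a one-line change to the table, which makes it easy to grow the list one test at a time.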
Didn't test this, actually code it, or debug it, but this might get you a start...
for each character in the pattern
    if pattern character after the current one is *
        // enter * state
        while current character from target == current pattern char, and not at end
            get next character from target
        skip a char from the pattern
    else if pattern character after the current one is ?
        // enter ? state
        if current character from target == current pattern char
            get next char from target
        skip a char from the pattern
    else
        // enter character state
        if current character from target == current pattern character
            get next character from target
        else
            return false
return true
The full power of regular expressions and finite state machines is not needed to solve this problem. As an alternative there is a relatively simple dynamic programming solution.
Let match(i, j) be 1 if it is possible to match the sub-string string[i..n-1] with the sub-pattern pattern[j..m-1], where n and m are the lengths of string and pattern respectively. Otherwise let match(i, j) be 0.
The base cases are:
match(n, m) = 1, you can match an empty string with an empty pattern;
match(i, m) = 0 for i < n, you can't match a non-empty string with an empty pattern;
The transition is divided into 3 cases depending on whether the current sub-pattern starts with a character followed by a '*', or a character followed by a '?' or just starts with a character with no special symbol after it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int is_match(char* pattern, char* string)
{
int n = strlen(string);
int m = strlen(pattern);
int i, j;
int **match;
match = (int **) malloc((n + 1) * sizeof(int *));
for(i = 0; i <= n; i++) {
match[i] = (int *) malloc((m + 1) * sizeof(int));
}
for(i = n; i >= 0; i--) {
for(j = m; j >= 0; j--) {
if(i == n && j == m) {
match[i][j] = 1;
}
else if(i < n && j == m) {
match[i][j] = 0;
}
else {
match[i][j] = 0;
if(pattern[j + 1] == '*') {
if(match[i][j + 2]) match[i][j] = 1;
if(i < n && pattern[j] == string[i] && match[i + 1][j]) match[i][j] = 1;
}
else if(pattern[j + 1] == '?') {
if(match[i][j + 2]) match[i][j] = 1;
if(i < n && pattern[j] == string[i] && match[i + 1][j + 2]) match[i][j] = 1;
}
else if(i < n && pattern[j] == string[i] && match[i + 1][j + 1]) {
match[i][j] = 1;
}
}
}
}
int result = match[0][0];
for(i = 0; i <= n; i++) {
free(match[i]);
}
free(match);
return result;
}
int main(void)
{
printf("is_match(dummy, dummy) = %d\n", is_match("dummy","dummy"));
printf("is_match(dumm?y, dummy) = %d\n", is_match("dumm?y","dummy"));
printf("is_match(dum?y, dummy) = %d\n", is_match("dum?y","dummy"));
printf("is_match(dum*y, dummy) = %d\n", is_match("dum*y","dummy"));
system("pause");
return 0;
}
The time complexity of this approach is O(n * m). The memory complexity is also O(n * m) but with a simple modification can be reduced to O(m).
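A sketch of that modification, under the same assumptions as the code above ('x*' is zero or more x, 'x?' is zero or one x). Row i of the table only depends on row i and row i + 1, so two rows of m + 1 entries are enough. The name is_match_rolling is made up for this example.

```cpp
#include <cstring>
#include <vector>

int is_match_rolling(const char *pattern, const char *string)
{
    const std::size_t n = std::strlen(string);
    const std::size_t m = std::strlen(pattern);
    // next holds row i + 1 of the table, cur is row i being filled in
    std::vector<char> next(m + 1, 0), cur(m + 1, 0);
    for (std::size_t i = n + 1; i-- > 0; ) {        // i = n, n-1, ..., 0
        for (std::size_t j = m + 1; j-- > 0; ) {    // j = m, m-1, ..., 0
            char ok = 0;
            if (j == m) {
                ok = (i == n);                      // empty pattern matches only empty string
            } else {
                const bool same = i < n && pattern[j] == string[i];
                if (pattern[j + 1] == '*')
                    ok = cur[j + 2] || (same && next[j]);
                else if (pattern[j + 1] == '?')
                    ok = cur[j + 2] || (same && next[j + 2]);
                else
                    ok = same && next[j + 1];
            }
            cur[j] = ok;
        }
        cur.swap(next);  // the row just computed becomes row i + 1
    }
    return next[0];      // after the last swap, next holds row 0
}
```

The time complexity stays O(n * m); only the memory drops to O(m).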
Simple recursive implementation. It's slow but easy to understand:
int is_match(char *pattern, char *string)
{
if (!pattern[0]) {
return !string[0];
} else if (pattern[1] == '?') {
return (pattern[0] == string[0] && is_match(pattern+2, string+1))
|| is_match(pattern+2, string);
} else if (pattern[1] == '*') {
size_t i;
/* try the rest of the pattern after 0, 1, 2, ... copies of pattern[0] */
for (i=0; ; i++) {
if (is_match(pattern+2, string+i)) return 1;
if (string[i] != pattern[0]) return 0;
}
} else {
return pattern[0] == string[0] && is_match(pattern+1, string+1);
}
}
Hope I got it all right.
A C program to find the index at which the sub-string starts in the main string.
#include<stdio.h>
int mystrstr (const char *,const char *);
int mystrcmp(const char *,const char *);
int main()
{
    //s1 is main string and s2 is substring (sample values, replace with your own)
    const char *s1="hello world",*s2="world";
    //print the index of the sub-string, or -1 if it is not found
    printf("Index is %d\n",mystrstr(s1,s2));
    return 0;
}
//search for the sub-string in the main string
int mystrstr (const char *ps1,const char *ps2)
{
    int i=0,j=0,c=0,l,m;
    while(ps1[i])i++; //length of the main string
    while(ps2[j])j++; //length of the sub-string
    char z[j+1]; //one extra element for the terminating '\0'
    for(l=0;l<=i-j;l++)
    {
        for(m=l;m<j+l;m++)
            //store the sub-string of similar size from main string
            z[c++]=ps1[m];
        z[c]='\0';
        c=0;
        if(mystrcmp(z,ps2)==0)
            return l;
    }
    return -1; //sub-string not found
}
int mystrcmp(const char *ps3,const char *ps4) //compare two strings
{
    int i=0;
    while((ps3[i]!=0)&&(ps3[i]==ps4[i]))i++;
    if(ps3[i]==ps4[i])
        return 0;
    if(ps3[i]>ps4[i])
        return +1;
    else
        return -1;
}
I want to output my floats without the ending zeros.
Example: float 3.570000 should be outputted as 3.57
and float 3.00000 should be outputted as 3.0 (so here would be the exception!)
A more efficient and (in my opinion) clearer form of paxdiablo's morphNumericString(). Sorry not compiled or tested.
void morphNumericString( char *s )
{
char *p, *end, *decimal, *nonzero;
// Find the last decimal point and non zero character
end = p = strchr(s,'\0');
decimal = nonzero = NULL;
while( p > s )
{
p--;
if( !nonzero && *p!='0' )
{
nonzero = p;
}
if( !decimal && *p=='.' )
{
decimal = p;
break; // nonzero must also be non NULL, so stop early
}
}
// eg "4.3000" -> "4.3"
if( decimal && nonzero && nonzero>decimal )
*(nonzero+1) = '\0';
// eg if(decimal) "4.0000" -> "4.0"
// if(!decimal) "4" -> "4.0"
else
strcpy( decimal?decimal:end, ".0" );
}
This is not possible with standard printf semantics. In the past, I've had to do this by outputting to a string (with something like "%.20f") then post-processing the string.
Something like this is probably what you're looking for:
#include <stdio.h>
#include <string.h>
void morphNumericString (char *s) {
char *p;
// Find decimal point, if any.
p = strchr (s,'.');
if (p == NULL) {
// No decimal, just add one fractional position.
strcat (s, ".0");
} else {
// Decimal, start stripping off trailing zeros.
while (s[strlen(s)-1] == '0') {
s[strlen(s)-1] = '\0';
}
// If all fractional positions were zero, add one.
if (s[strlen(s)-1] == '.') {
strcat (s, "0");
}
}
}
int main (int argc, char *argv[]) {
char str[100];
int i;
for (i = 1; i < argc; i++) {
strcpy (str, argv[i]);
morphNumericString (str);
printf ("[%s] -> [%s]\n", argv[i], str);
}
return 0;
}
The code runs through each of its arguments, morphing each one in turn. The following transcript shows how it works:
pax> ./qq 3.750000 12 12.507 47.90 56.0000000 76.0 0
[3.750000] -> [3.75]
[12] -> [12.0]
[12.507] -> [12.507]
[47.90] -> [47.9]
[56.0000000] -> [56.0]
[76.0] -> [76.0]
[0] -> [0.0]
You should be aware however that, if using floats or doubles, you'll have to watch out for the normal floating point inaccuracies. On my system 3.57 is actually 3.5699999999999998401, which won't be truncated at all.
Of course, you can get around that problem by using a less-specific number of output digits in the sprintf, something less than the actual accuracy of the floating point type. For example, "%.10f" on my system outputs 3.5700000000 which will be truncated. Adding the following lines to main:
sprintf (str, "%.10f", 3.57);
morphNumericString (str);
printf ("[%.10f] -> [%s]\n", 3.57, str);
sprintf (str, "%.10f", 3.0);
morphNumericString (str);
printf ("[%.10f] -> [%s]\n", 3.0, str);
will result in the following added output:
[3.5700000000] -> [3.57]
[3.0000000000] -> [3.0]
as per your test data.
One other possibility (if your input range and precision can be controlled) is to use the g format specifier. This outputs in either f format where the precision is the maximum number of digits (rather than fixed number like f) or exponential format (e).
Basically, it prefers the non-exponential output format as long as all the information is shown. It will switch to exponential only if that will deliver more information. A simple example is the format string "%.4g" with 3.7 and .0000000004. The former would be printed as 3.7, the latter as 4e-10.
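A small sketch of the "%.4g" behaviour described above. The wrapper name format_g4 is invented for this example; it just calls snprintf with that format string.

```cpp
#include <cstdio>
#include <string>

// Formats value with "%.4g": at most 4 significant digits,
// trailing zeros (and a trailing decimal point) are dropped.
std::string format_g4(double value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.4g", value);
    return buf;
}
```

With this, format_g4(3.7) gives 3.7 and format_g4(.0000000004) gives 4e-10. Note that format_g4(3.0) gives 3 rather than 3.0, so the exception asked for in the question would still need a little post-processing.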
Update: for those more concerned about performance than robustness or readability, you could try the following (unnecessary in my opinion but to each their own):
void morphNumericString (char *s) {
char *p = strchr (s,'.');
if (p == NULL) {
strcat (s, ".0");
return;
}
p = &(p[strlen(p)-1]);
while ((p != s) && (*p == '0') && (*(p-1) != '.'))
*p-- = '\0';
}
It may be faster but, given the extreme optimisations I've seen modern compilers do, you can never be too sure. I tend to code for readability first and only worry about speed when it becomes an issue (YAGNI can apply equally to performance as well as functionality).
My approach is:
implement a function trim(str, c) to eliminate the trailing 'c' characters, for example:
void trim(std::string &str, char c) {
size_t len = str.length();
const char *s = str.c_str();
const char *p = s + len - 1;
while (p != s && *p == c) {
-- p;
}
++ p;
size_t end = p - s;
str = str.substr(0, end);
}
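char buf[32];
snprintf(buf, sizeof buf, "%f", 0.01); // format into buf (printf would write to stdout)
std::string s(buf);
trim(s, '0'); // trim works on the std::string, not on buf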
void tidyFloatRepresentation(char *s)
{
    size_t end = strlen (s);
    size_t i = end ? end - 1 : 0;
    char *lastZero = NULL;
    while (i && s[i] == '0') lastZero = &s[i--];
    while (i && s[i] != '.') i--;
    if (s[i] == '.')
    {
        if (lastZero)
        {
            // keep one zero if the whole fraction was zeros
            if (lastZero == &s[i] + 1)
                *(lastZero + 1) = '\0';
            else
                *lastZero = '\0';
        }
    }
    else
    {
        // no decimal point at all: append ".0" over the terminator
        strcpy (&s[end], ".0");
    }
}
This will fail when the input ends with a decimal point, but for the most part, it will truncate (or append) appropriately.
Here's another string modification function. I did test it.
It's longer than Bill's and Diablo's, but it handles trailing 9's as well as 0's and should perform well.
Leaves a single trailing zero after the dot. You'll have to truncate using sprintf to get proper trailing 9's.
void morphNumericString( char *s ) {
char *point = s;
while ( * point && * point != '.' ) ++ point;
char *last = strchr( point, 0 );
if ( point == last ) {
* point = '.';
++ last;
}
if ( point == last - 1 ) {
* last = '0';
} else {
-- last;
if ( * last == '0' ) {
while ( * last == '0' ) -- last;
} else if ( * last == '9' ) {
while ( * last == '9' || * last == '.' ) {
if ( * last == '9' ) * last = '0';
-- last;
}
( * last ) ++;
}
if ( last < point + 1 ) last = point + 1;
}
* ++ last = 0;
}
Edit: Yikes, this fails on input like 999.999. Left as an exercise to the reader ;v)
It might be easier to calculate the fractional part directly:
double value= -3.57;
double int_part;
double frac= modf(value, &int_part);
int64 int_part_as_int= int_part;
int significant_digits= 10;
int64 fractional_scale= pow(10., significant_digits);
int64 fraction_magnitude= fabs(frac)*fractional_scale + 0.5;
fraction_magnitude/fractional_scale will be the fraction rounded to significant_digits sig figs. Even with doubles, this is guaranteed not to overflow.
Formatting the fraction should be straightforward.
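One possible way to do that formatting is sketched below. It is an illustration only: format_trimmed is a made-up name, std::int64_t replaces the int64 typedef used above, and digits is assumed to be small enough (at most 18) for the scale to fit in 64 bits.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>

// Prints value with up to `digits` fractional digits, then strips
// trailing zeros while always keeping one digit after the point.
std::string format_trimmed(double value, int digits = 10)
{
    double int_part;
    const double frac = std::modf(value, &int_part);
    std::int64_t scale = 1;
    for (int i = 0; i < digits; ++i) scale *= 10;
    std::int64_t whole = static_cast<std::int64_t>(std::fabs(int_part));
    std::int64_t fraction =
        static_cast<std::int64_t>(std::fabs(frac) * scale + 0.5);
    if (fraction >= scale) {      // rounding carried into the integer part
        fraction -= scale;
        ++whole;
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s%lld.%0*lld", value < 0 ? "-" : "",
                  static_cast<long long>(whole), digits,
                  static_cast<long long>(fraction));
    std::string out(buf);
    while (out.size() > 2 && out.back() == '0' && out[out.size() - 2] != '.')
        out.pop_back();           // drop trailing zeros, keep "x.0"
    return out;
}
```

With the default of 10 digits, -3.57 comes out as -3.57 and 3.0 as 3.0.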