Calculate MD5 of a string in C++ - c++

I have a nice example of memory mapped files that calculates the MD5 hash of a file. That works fine with no problems.
I would like to change it to calculate the MD5 hash of a string.
So the example is:
(include #include <openssl/md5.h> to run this code, and also boost stuff if you want to run the one with the file)
unsigned char result[MD5_DIGEST_LENGTH];
boost::iostreams::mapped_file_source src(path);
MD5((unsigned char*)src.data(), src.size(), result);
std::ostringstream sout;
sout<<std::hex<<std::setfill('0');
for(long long c: result)
{
sout<<std::setw(2)<<(long long)c;
}
return sout.str();
The change I made is:
std::string str("Hello");
unsigned char result[MD5_DIGEST_LENGTH];
MD5((unsigned char*)str.c_str(), str.size(), result);
std::ostringstream sout;
sout<<std::hex<<std::setfill('0');
for(long long c: result)
{
sout<<std::setw(2)<<(long long)c;
}
return sout.str();
But this produces the result:
8b1a9953c4611296a827abf8c47804d7
While the command $ md5sum <<< Hello gives the result:
09f7e02f1290be211da707a266f153b3
Why don't the results agree? Which one is wrong?
Thanks.
EDIT:
So I got the right answer which is ticked down there. The correct way to call md5sum from terminal is:
$ printf '%s' "Hello" | md5sum
To avoid the new line being included.

You are passing a final newline to the md5sum program, but not to your code.
You can see that the bash <<< operator adds a newline:
$ od -ta <<<Hello
0000000 H e l l o nl
0000006
To avoid this, use printf:
$ printf '%s' Hello | od -ta
0000000 H e l l o
0000005
$ printf '%s' Hello | md5sum
8b1a9953c4611296a827abf8c47804d7 -
Alternatively, you could include a newline in your program version:
std::string str("Hello\n");

Related

C++: Output file of system command not generated

I am trying to run a shell command from my code. However, the output file (ts.dat) is not generated.
Can somebody let me know how to solve this problem?
string cmd1, input;
cout << "Enter the input file name: ";
cin >> input;
cmd1 = "grep 'DYNA>' input | cut -c9-14 > ts.dat";
system((cmd1).c_str());
Edit this line:
cmd1="grep 'DYNA>' input | cut -c9-14 > ts.dat";
To this:
cmd1="grep 'DYNA>' " + input + " | cut -c9-14 > ts.dat";
You need to actually use the value from the input string. The way you have your code currently, you are just writing the word input in your string and not using the value that is stored in the string.
cmd1="grep 'DYNA>' "+input+" | cut -c9-14 > ts.dat";
Placing input inside the quotes will let the compiler parse it as a string instead of a variable.

Strange ASCII response(chinese) when trying to replicate strlwr in codeblocks 13.12

The following code gives a very strange result:
#include <iostream>
#include <fstream>
using namespace std;
ifstream f("f1.in");
ofstream g("f1.out");
char sir[255];
int i;
char strlwr(char sir[]) //if void nothing changes
{
int i = 0;
for (i = 0; sir[i] != NULL; i++) {
sir[i] = tolower(sir[i]);
}
return 0; //if instead of 0 is 1 it will kinda work , but strlwr(sir) still needs to be displayed
}
int main()
{
f.get(sir, 255);
g << sir << '\n'; // without '\n' strlwr will no more maters
g << strlwr(sir);
g << sir;
return 0;
}
f1.in:
JHON HAS A COW
f1.out:
䡊乏䠠十䄠䌠坏
桪湯栠獡愠挠睯
It shows this only when I am using just CAPS.
I am using Code::Blocks 13.12 on Ubuntu 14, European version.
I would be very interested in knowing why it shows this.
I am interested in knowing if it gives you the same thing.
Congratulations! You've discovered mojibake! Your output text is 100% correct, but whatever your viewing it with is interpreting it as unicode.
If you convert the unicode output into their hex numerical values, the issue will become clear. (Code borrowed from this StackOverflow answer.)
$ cat unicode.txt
䡊乏䠠十䄠䌠坏
桪湯栠獡愠挠睯
$ cat unicode.txt | while IFS= read -r -d '' -n1 c; do printf "%02X\n" "'$c"; done
484A
4E4F
4820
5341
4120
4320
574F
0A
686A
6E6F
6820
7361
6120
6320
776F
0A
The second command reads the file character by character and prints the little endian form in hex. The reason each character is two bytes of data is because the input is understood to be UTF-16, a 2-byte encoding.
If you reinterpret the hex output as single byte ASCII instead (and correct for endianness) you can see that your program did work:
$ cat unicode.txt | while IFS= read -r -d '' -n1 c; do printf "%02X\n" "'$c"; done
484A ; JH
4E4F ; ON
4820 ; H
5341 ; AS
4120 ; A
4320 ; C
574F ; OW
0A ; \n
686A ; jh
6E6F ; on
6820 ; h
7361 ; as
6120 ; a
6320 ; c
776F ; ow
0A ; \n
To determine if the issue is your C++ program or your viewing program, try running the following command xxd f1.out. If it looks like ASCII, then it's your viewing programs fault. Otherwise, it's your program's fault and you should look into setlocale and/or opening your output file in binary mode.
Either way, you should probably change g<<strlwr(sir); to just strlwr(sir);. Currently it's adding a NULL byte to your output which is probably unintended.

Need command line expansion to print all keyboard characters

Is there an awk-like or sed-like command line hack I can issue to generate a list of all keyboard characters (such as a-zA-z0-9!-*, etc)? I'm writing a simple Caesar cipher program in my intro programming class where we do the rotation not through ASCII values, but indexing into an alphabet string, something like this:
String alphabet = "abcdefghijklmnopqrstuvwxyz";
for (int pos = 0; pos < message.length(); pos++) {
char ch = message.charAt(pos);
int chPos = alphabet.indexOf(ch);
char cipherCh = alphabet.charAt(chPos+rotation%alphabet.length());
System.out.print(cipherCh);
}
Clearly I can write a loop in some other language and print all ASCII values, but I'd love something closer to the command line as flashier example.
Is this what you're looking for:
awk 'END {for (i=33; i<=126; i++) printf("%c",i); print ""}' /dev/null
This generates:
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
I chose the range from 33 to 126 as the printable chars. See ascii man page
This is pure shell, no externals:
$ for i in {32..126}; do printf \\$(($i/64*100+$i%64/8*10+$i%8)); done
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
It converts decimals to octals and prints the corresponding character.
It works in Bash and ksh, dash and ash (if you use $(seq 32 126) instead of {32..126}) and zsh (if you use print -n instead of printf).

How can I extract a substring after a match position?

I have a requirement to grep a string or pattern (say around 200 characters before and after the string or pattern) from an extremely long line ed file. The file contains streams of data (market trading data) coming from a remote server and getting appended onto this line of the file.
I know that I can match lines containing a specific pattern using grep (or other tools), but once I have such lines, how can I extract a portion of the line? I want to grab the part of the line with the pattern plus roughly 200 characters before and after the pattern. I would be especially interested in answers using...(supply tools or languages you're comfortable with here).
If what you need is the 200 characters before and after the expression plus the expression itself, then you are looking at:
/.{200}aaa.{200}/
If you need captures for each (allowing you to extract each part as a unit), then you use this regexp:
/(.{200})(aaa)(.{200})/
If your grep has -o then that will output only the matched part.
echo "abc def ghi jkl mno pqr" | egrep -o ".{4}ghi.{4}"
produces:
def ghi jkl
(.{0,200}(pattern).{0,200}), or something?
Is this what you want (in C)?
If it is, feel free to adapt to your specific needs.
#include <stdio.h>
#include <string.h>
void prt_grep(const char *haystack, const char *needle, int padding) {
char *ptr, *start, *finish;
ptr = strstr(haystack, needle);
if (!ptr) return;
start = (ptr - padding);
if (start < haystack) start = haystack;
finish = ptr + strlen(needle) + padding;
if (finish > haystack + strlen(haystack)) finish = haystack + strlen(haystack);
for (ptr = start; ptr < finish; ptr++) putchar(*ptr);
}
int main(void) {
const char *longline = "123456789 ASDF 123456789";
const char *pattern = "ASDF";
prt_grep(longline, pattern, 5); /* you want 200 */
return 0;
}
I think I might approach the problem by matching the part of the string I need, then using the match position as the starting point for the substring extraction. In Perl, once your regex suceeds, the pos built-in tells you where you left off:
if( $long_string = m/$regex/ ) {
$substring = substr( $long_string, pos( $long_string ), 200 );
}
I tend to write my programs in Perl instead of doing everything in the regular expression. There's nothing particularly special about Perl in this case.
I think this may be more basic that everybody is thinking, correct me if I'm wrong...
Do you want to print before and after the string excluding the string?
awk -F "ASDF" '{print "Before ASDF" $1 "\n" "After ASDF" $2}' $FILE
This will print something like:
Before ASDF blablabla
After ASDF blablablabla
Change it to match your needs, remove the "\n" and or the "Before..." and "After..." comments
Do you want to supress the string from the file?
This will replace the string with a blank space, again, change it to whatever you need.
sed -i 's/ASDF/\ /' longstring.txt
HTH

Regex Replacing : to ":" etc

I've got a bunch of strings like:
"Hello, here's a test colon:. Here's a test semi-colon;"
I would like to replace that with
"Hello, here's a test colon:. Here's a test semi-colon;"
And so on for all printable ASCII values.
At present I'm using boost::regex_search to match &#(\d+);, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).
Can anyone think of a better way of doing it? I'm open to non-regex methods, but regex seemed a reasonably sensible approach in this case.
Thanks,
Dom
The big advantage of using a regex is to deal with the tricky cases like &#38; Entity replacement isn't iterative, it's a single step. The regex is also going to be fairly efficient: the two lead characters are fixed, so it will quickly skip anything not starting with &#. Finally, the regex solution is one without a lot of surprises for future maintainers.
I'd say a regex was the right choice.
Is it the best regex, though? You know you need two digits and if you have 3 digits, the first one will be a 1. Printable ASCII is after all -~. For that reason, you could consider &#1?\d\d;.
As for replacing the content, I'd use the basic algorithm described for boost::regex::replace :
For each match // Using regex_iterator<>
Print the prefix of the match
Remove the first 2 and last character of the match (&#;)
lexical_cast the result to int, then truncate to char and append.
Print the suffix of the last match.
This will probably earn me some down votes, seeing as this is not a c++, boost or regex response, but here's a SNOBOL solution. This one works for ASCII. Am working on something for Unicode.
NUMS = '1234567890'
MAIN LINE = INPUT :F(END)
SWAP LINE ? '&#' SPAN(NUMS) . N ';' = CHAR( N ) :S(SWAP)
OUTPUT = LINE :(MAIN)
END
* Repaired SNOBOL4 Solution
* &#38; -> &
digit = '0123456789'
main line = input :f(end)
result =
swap line arb . l
+ '&#' span(digit) . n ';' rem . line :f(out)
result = result l char(n) :(swap)
out output = result line :(main)
end
I don't know about the regex support in boost, but check if it has a replace() method that supports callbacks or lambdas or some such. That's the usual way to do this with regexes in other languages I'd say.
Here's a Python implementation:
s = "Hello, here's a test colon:. Here's a test semi-colon;"
re.sub(r'&#(1?\d\d);', lambda match: chr(int(match.group(1))), s)
Producing:
"Hello, here's a test colon:. Here's a test semi-colon;"
I've looked some at boost now and I see it has a regex_replace function. But C++ really confuses me so I can't figure out if you could use a callback for the replace part. But the string matched by the (\d\d) group should be available in $1 if I read the boost docs correctly. I'd check it out if I were using boost.
The existing SNOBOL solutions don't handle the multiple-patterns case properly, due to there only being one "&". The following solution ought to work better:
dd = "0123456789"
ccp = "#" span(dd) $ n ";" *?(s = s char(n)) fence (*ccp | null)
rdl line = input :f(done)
repl line "&" *?(s = ) ccp = s :s(repl)
output = line :(rdl)
done
end
Ya know, as long as we're off topic here, perl substitution has an 'e' option. As in evaluate expression. E.g.
echo "Hello, here's a test colon:. Here's a test semi-colon; Further test &#65;. abc.~.def." | perl -we 'sub translate { my $x=$_[0]; if ( ($x >= 32) && ($x <= 126) ) { return sprintf("%c",$x); } else { return "&#".$x.";"; } } while (<>) { s/&#(1?\d\d);/&translate($1)/ge; print; }'
Pretty-printing that:
#!/usr/bin/perl -w
sub translate
{
my $x=$_[0];
if ( ($x >= 32) && ($x <= 126) )
{
return sprintf( "%c", $x );
}
else
{
return "&#" . $x . ";" ;
}
}
while (<>)
{
s/&#(1?\d\d);/&translate($1)/ge;
print;
}
Though perl being perl, I'm sure there's a much better way to write that...
Back to C code:
You could also roll your own finite state machine. But that gets messy and troublesome to maintain later on.
Here's another Perl's one-liner (see #mrree's answer):
a test file:
$ cat ent.txt
Hello,  here's a test colon:.
Here's a test semi-colon; 'ƒ'
the one-liner:
$ perl -pe's~&#(1?\d\d);~
> sub{ return chr($1) if (31 < $1 && $1 < 127); $& }->()~eg' ent.txt
or using more specific regex:
$ perl -pe"s~&#(1(?:[01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]);~chr($1)~eg" ent.txt
both one-liners produce the same output:
Hello,  here's a test colon:.
Here's a test semi-colon; 'ƒ'
boost::spirit parser generator framework allows easily to create a parser that transforms desirable NCRs.
// spirit_ncr2a.cpp
#include <iostream>
#include <string>
#include <boost/spirit/include/classic_core.hpp>
int main() {
using namespace BOOST_SPIRIT_CLASSIC_NS;
std::string line;
while (std::getline(std::cin, line)) {
assert(parse(line.begin(), line.end(),
// match "&#(\d+);" where 32 <= $1 <= 126 or any char
*(("&#" >> limit_d(32u, 126u)[uint_p][&putchar] >> ';')
| anychar_p[&putchar])).full);
putchar('\n');
}
}
compile:
$ g++ -I/path/to/boost -o spirit_ncr2a spirit_ncr2a.cpp
run:
$ echo "Hello,  here's a test colon:." | spirit_ncr2a
output:
"Hello,  here's a test colon:."
I did think I was pretty good at regex but I have never seen lambdas been used in regex, please enlighten me!
I'm currently using python and would have solved it with this oneliner:
''.join([x.isdigit() and chr(int(x)) or x for x in re.split('&#(\d+);',THESTRING)])
Does that make any sense?
Here's a NCR scanner created using Flex:
/** ncr2a.y: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
/**recursive: unput(atoi(yytext + 2)); skip '&#'; `atoi()` ignores ';' */
fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
To make an executable:
$ flex ncr2a.y
$ gcc -o ncr2a lex.yy.c -lfl
Example:
$ echo "Hello,  here's a test colon:.
> Here's a test semi-colon; 'ƒ'
> &#59; <-- may be recursive" \
> | ncr2a
It prints for non-recursive version:
Hello,  here's a test colon:.
Here's a test semi-colon; 'ƒ'
; <-- may be recursive
And the recursive one produces:
Hello,  here's a test colon:.
Here's a test semi-colon; 'ƒ'
; <-- may be recursive
This is one of those cases where the original problem statement apparently isn't very complete, it seems, but if you really want to only trigger on cases which produce characters between 32 and 126, that's a trivial change to the solution I posted earlier. Note that my solution also handles the multiple-patterns case (although this first version wouldn't handle cases where some of the adjacent patterns are in-range and others are not).
dd = "0123456789"
ccp = "#" span(dd) $ n *lt(n,127) *ge(n,32) ";" *?(s = s char(n))
+ fence (*ccp | null)
rdl line = input :f(done)
repl line "&" *?(s = ) ccp = s :s(repl)
output = line :(rdl)
done
end
It would not be particularly difficult to handle that case (e.g. ;#131;#58; produces ";#131;:" as well:
dd = "0123456789"
ccp = "#" (span(dd) $ n ";") $ enc
+ *?(s = s (lt(n,127) ge(n,32) char(n), char(10) enc))
+ fence (*ccp | null)
rdl line = input :f(done)
repl line "&" *?(s = ) ccp = s :s(repl)
output = replace(line,char(10),"#") :(rdl)
done
end
Here's a version based on boost::regex_token_iterator. The program replaces decimal NCRs read from stdin by corresponding ASCII characters and prints them to stdout.
#include <cassert>
#include <iostream>
#include <string>
#include <boost/lexical_cast.hpp>
#include <boost/regex.hpp>
int main()
{
boost::regex re("&#(1(?:[01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]);"); // 32..126
const int subs[] = {-1, 1}; // non-match & subexpr
boost::sregex_token_iterator end;
std::string line;
while (std::getline(std::cin, line)) {
boost::sregex_token_iterator tok(line.begin(), line.end(), re, subs);
for (bool isncr = false; tok != end; ++tok, isncr = !isncr) {
if (isncr) { // convert NCR e.g., ':' -> ':'
const int d = boost::lexical_cast<int>(*tok);
assert(32 <= d && d < 127);
std::cout << static_cast<char>(d);
}
else
std::cout << *tok; // output as is
}
std::cout << '\n';
}
}