C++ testing with input cases from text files


How can I test a C++ program with given input test cases?
For example, an input like this:
23 45 78 45 12 67 23 76 56 34
34 65 78 45 74 3 98 34 23 97
I want to be able to feed these numbers from a text file into a program at run time and check the result against the expected output. I cannot type thousands of numbers by hand into a console, so is there any software that allows this kind of testing?
This is already being done by InterviewStreet, which runs a program against given test cases and matches its output to the expected output.
--Edit--
Is there any way I can pass values from a text file into stdin?

You could write a small bash script to run all of your tests. An individual test would look something like this:
#!/bin/bash
# Feed the test case to the program on stdin and capture what it prints.
./testprog < input1.txt > output1.txt
# diff exits with status 0 when the files are identical and 1 when they differ.
if diff expected_output1.txt output1.txt; then echo PASS; else echo FAIL; fi
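For the redirection above to work, the program only has to read from standard input until end of file. A minimal sketch, assuming one test case per line of whitespace-separated integers and a hypothetical task of summing each line (sum_per_line and print_sums are illustrative names, not part of the question):

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical task: sum the integers on each input line.
// Taking std::istream& lets the same code read std::cin (when the shell
// redirects a file into it) or a std::istringstream inside a unit test.
std::vector<long> sum_per_line(std::istream& in) {
    std::vector<long> sums;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        long value = 0;
        long total = 0;
        bool any = false;
        while (fields >> value) {
            total += value;
            any = true;
        }
        if (any) sums.push_back(total);  // blank lines are skipped
    }
    return sums;
}

// Prints one sum per line; main() would call print_sums(std::cin, std::cout).
void print_sums(std::istream& in, std::ostream& out) {
    for (long s : sum_per_line(in)) out << s << '\n';
}
```

Because the logic takes a stream rather than touching std::cin directly, the same function can be driven by the shell script or by an in-process test.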

Related

Reading integers from a file c++

I implemented an LZW compressor which encodes strings into integers with the help of a hash function. I stored the encoded string in a text file. Now I need to decompress it. I am unsure how to tell a two-digit integer from a single-digit integer while reading from the text file.
For example, my dictionary is:
0 c
1 bba
3 aa
5 ac
7 bb
8 aab
9 a
10 b
and so on.
Now, suppose I encoded the string 'aaabbbac' as "9 3 10 7 9 0", which gets stored in the text file as 9310790. How do I tell 1, 0 and 10 apart while reading from the file?

Some options:
Store them in binary format rather than text format. That is a little more work to read and write, but it is worth learning. The drawback is that you cannot inspect the numbers in a text editor, although a hex viewer will show them. Assuming 2 bytes per integer (type short), your example would be, in hex (shown big-endian here; the actual byte order depends on the machine): 00 09 00 03 00 0a 00 07 00 09 00 00
Store them with a fixed width per number. For example, printf("%03d", number) always prints at least 3 digits, padding with leading zeros. Your example would be: 009003010007009000
Use a separator such as a comma or semicolon: 9,3,10,7,9,0
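The separator option can be sketched as follows. This version uses a space instead of a comma, which is an assumption made so that operator>> can split the numbers without extra parsing; write_codes and read_codes are hypothetical helper names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Joins the codes with single spaces so that "10" cannot be confused
// with "1" followed by "0".
std::string write_codes(const std::vector<int>& codes) {
    std::ostringstream out;
    for (std::size_t i = 0; i < codes.size(); ++i) {
        if (i != 0) out << ' ';
        out << codes[i];
    }
    return out.str();
}

// Reads the codes back; operator>> skips whitespace and stops each
// number at the next separator.
std::vector<int> read_codes(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> codes;
    int code = 0;
    while (in >> code) codes.push_back(code);
    return codes;
}
```

The same two functions work on a std::ofstream and std::ifstream if the string streams are swapped for file streams.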

How to write on a specific line in a file?

I want to write some data to an already existing file. The file contains about 8-10 lines of header (# comments) followed by thousands of lines of data values. I want to keep the header unchanged but replace the data values with updated ones. It is quite possible that after the update I will have fewer lines of data values.
So basically I want to erase everything after the last # comment in the header and then start writing the new values from there onwards. Is that possible?
Here is an example:
Original File
#Program
#Date
#Hello
0 23 23 54
1 12 4 2
2 253 786 9887
3 3 23 54
4 1 4 4
5 23 6 81
Updated File
#Program
#Date
#Hello
0 2 23 54
2 253 786 9887
5 23 6 81
The code I am editing uses fopen to read the file and fprintf to write to it. I would prefer answers along these lines so that I don't have to change those two.

The simplest way I came up with is to open the original file and read the header into memory, for example into a string. Then overwrite the whole file by writing the header first and the new data after it.

Write a function that reads the headers from the file and stores them in a class/variable/struct.
Write a function that writes the headers to the file.
Write a function that writes the desired values to the file.
Execute all three functions in that order. The fact that it is the same file that you overwrite is irrelevant; just be sure to close it after reading and before writing back to it.
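The steps above can be sketched with fopen and fprintf, as the question prefers. This is only an illustration under stated assumptions: header lines start with '#' in the first column, every line is shorter than the 4096-byte buffer, and read_header and rewrite_data are hypothetical names.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Collects the leading '#' lines of the file into one string.
// Assumes each line fits in the buffer (shorter than 4096 bytes).
std::string read_header(const char* path) {
    std::string header;
    std::FILE* f = std::fopen(path, "r");
    if (!f) return header;
    char buf[4096];
    while (std::fgets(buf, sizeof buf, f)) {
        if (buf[0] != '#') break;  // first data line ends the header
        header += buf;
    }
    std::fclose(f);  // close before the file is reopened for writing
    if (!header.empty() && header.back() != '\n') header += '\n';
    return header;
}

// Rewrites the file as: original header, then the new data rows.
// Opening with "w" truncates the file, so old rows beyond the new
// data do not survive.
bool rewrite_data(const char* path, const std::vector<std::string>& rows) {
    const std::string header = read_header(path);
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    std::fprintf(f, "%s", header.c_str());
    for (const std::string& row : rows) std::fprintf(f, "%s\n", row.c_str());
    return std::fclose(f) == 0;
}
```

Truncating and rewriting is simpler than seeking to the end of the header, and it handles the case where the new data has fewer lines than the old.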

Line Counter for VS 2010

I'm looking for a line counting tool like Project Line Counter by Oz Solomon. This add-in worked perfectly for me with VS 2005 and 2008. But, unfortunately, Oz has no time to develop it further or to adapt it to VS 2010.
Do you know a good line counting tool for C++ code that preferably (but not necessarily) meets the following requirements:
distinguish between commented lines, blank lines, code only lines etc.
possibility to restrict to certain files/folders (or even VS projects)
list file names
no cost
integrable in VS 2010
Thanks in advance,
Flinsch.

Source Monitor is not integrated with VS2010, but it gives very detailed source code metrics reports.

I'm using Project Line Counter in Visual Studio 2010 SP1 on Windows 7 64-bit. (It also works without SP1.) You need PLC 221 http://www.wndtabs.com/downloads/PLC221.zip plus a modified registry file you can get from my website: http://www.onemanmmo.com/index.php?cmd=newsitem&comment=news.1.41.0

I know this doesn't meet all of your requirements, but I like cloc. It's a simple-to-use command-line tool. Example use and output:
C:\src>cloc --no3 gstreamer
9021 text files.
6495 unique files.
26138 files ignored.
http://cloc.sourceforge.net v 1.09 T=258.0 s (16.7 files/s, 5527.7 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                              1633         142010         131712         617327
C/C++ Header                   1588          50625          75916         216335
Bourne Shell                     52           6830           6060          43742
C++                              73           3937           3928          29514
XML                             262           1500           1117          26552
m4                              117           3209           2064          23008
make                            456           2335            950           8661
HTML                             37             92              8           6695
Python                           14           1437            934           4446
Teamcenter def                   27             30              0           3141
Perl                              6            396            251           2338
yacc                              2            257            114           2021
Assembly                         16            312            356           1782
Objective C                       5            277            159           1001
XSLT                             10            134             42            853
Lisp                              4             91            119            393
IDL                               2             40              0            353
lex                               2             41             11            190
CSS                               2              9              1            153
Bourne Again Shell                4             37             50            146
Tcl/Tk                            1             10             27             46
sed                               2              0              0             16
D                                 1              0              0             15
--------------------------------------------------------------------------------
SUM:                           4316         213609         223819         988728
--------------------------------------------------------------------------------

This is a project based on Project Line Counter by Oz Solomon. It improves on it in some regards, but the line-counting algorithm seems to be the same:
http://www.codeproject.com/KB/macros/LineCounterAddin.aspx
It has an update for VS 2010 made by its users (see the comments):
http://shiz.wussie.nl/LineCounterAddIn2010.zip

"Kloc" can be used to calculate lines of code. It's an independent tool and cannot be integrated with VS. All you have to do is specify the files and folders, and it will calculate the LOC for you.

Maybe you can use the built-in macro __LINE__, which expands to the current line number in the source file, to see how VS counts lines.

Stripping hex bytes with sed - no match

I have a text file with two non-ascii bytes (0xFF and 0xFE):
??58832520.3,ABC
348384,DEF
The hex for this file is:
FF FE 35 38 38 33 32 35 32 30 2E 33 2C 41 42 43 0A 33 34 38 33 38 34 2C 44 45 46
It's coincidental that FF and FE happen to be the leading bytes (they exist throughout my file, although seemingly always at the beginning of a line).
I am trying to strip these bytes out with sed, but nothing I do seems to match them.
$ sed 's/[^a-zA-Z0-9\,]//g' test.csv
??588325203,ABC
348384,DEF
$ sed 's/[a-zA-Z0-9\,]//g' test.csv
??.
Main question: How do I strip these bytes?
Bonus question: The two regexes above are direct negations, so one of them logically has to filter out these bytes, right? Why do both of these regexes fail to match the 0xFF and 0xFE bytes?
Update: the direct approach of stripping out a range of hex bytes (suggested by two answers below) seems to strip out the first "legit" byte from each line and leave the bytes I'm trying to get rid of:
$ sed 's/[\x80-\xff]//' test.csv
??8832520.3,ABC
48384,DEF
FF FE 38 38 33 32 35 32 30 2E 33 2C 41 42 43 0A 34 38 33 38 34 2C 44 45 46 0A
Notice the missing "5" and "3" from the beginning of each line, and the new 0A added to the end of the file.
Bigger Update: This problem seems to be system-specific. The problem was observed on OSX, but the suggestions (including my original sed statement above) work as I expect them to on NetBSD.
A solution: This same task seems easy enough via Perl:
$ perl -pe 's/^\xFF\xFE//' test.csv
58832520.3,ABC
348384,DEF
However, I'll leave this question open since this is only a workaround, and doesn't explain what the problem was with sed.

sed 's/[^ -~]//g'
or as the other answer implies
sed 's/[\x80-\xff]//g'
See section 3.9 of the sed info pages, the chapter entitled "Escapes".
Edit for OS X: the native LANG setting there is en_US.UTF-8, so try
LANG='' sed 's/[^ -~]//g' myfile
This works on an OS X machine here. I'm not entirely sure why it does not work under UTF-8.

This will strip out every occurrence of the byte pair FF FE:
sed -e 's/\xff\xfe//g' hexquestion.txt
The reason that your negated regexes aren't working is that [] specifies a character class, and sed assumes a particular character set, probably ASCII. These bytes in your file aren't 7-bit ASCII characters, since both have the high bit set (their hex values begin with F), so sed doesn't know how to deal with them. The solution above doesn't use character classes, so it should be more portable between platforms and character sets.

The FF and FE bytes at the beginning of your file are what is called a "byte order mark" (BOM). It can appear at the start of Unicode text streams to indicate the endianness of the text. FF FE indicates little-endian UTF-16.
Here's an excerpt from the FAQ:
Q: How I should deal with BOMs?
A: Here are some guidelines to follow:
A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
Some protocols allow optional BOMs in the case of untagged text. In those cases,
Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.
References
unicode.org/FAQ/UTF BOM
See also
Wikipedia/Byte order mark
Wikipedia/Endianness
Related questions
Why would I use a Unicode Signature Byte-Order-Mark (BOM)?
Difference between Big Endian and little Endian Byte order
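The byte order mark described in this answer can be detected by looking at the first two or three bytes of the data. A minimal sketch; Bom, detect_bom and strip_bom are hypothetical names, and UTF-32 marks are deliberately not distinguished (a UTF-32LE mark, FF FE 00 00, would be reported as UTF-16LE here):

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspects only the leading bytes of the buffer.
Bom detect_bom(const std::string& bytes) {
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}

// Returns the data without its leading mark, if there is one.
std::string strip_bom(const std::string& bytes) {
    switch (detect_bom(bytes)) {
        case Bom::Utf8:
            return bytes.substr(3);
        case Bom::Utf16LE:
        case Bom::Utf16BE:
            return bytes.substr(2);
        case Bom::None:
            break;
    }
    return bytes;
}
```

This only handles a mark at the very start of the buffer; applying it line by line would be needed for the case in the question where the bytes recur at the beginning of other lines.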

To show that this isn't an issue of the Unicode BOM, but an issue of eight-bit versus seven-bit characters and tied to the locale, try this:
Show all the bytes:
$ printf '123 abc\xff\xfe\x7f\x80' | hexdump -C
00000000 31 32 33 20 61 62 63 ff fe 7f 80 |123 abc....|
Have sed remove characters that aren't alpha-numeric in the user's locale. Notice that the space and 0x7f are removed:
$ printf '123 abc\xff\xfe\x7f\x80' | sed 's/[^[:alnum:]]//g' | hexdump -C
00000000 31 32 33 61 62 63 ff fe 80 |123abc...|
Have sed remove characters that aren't alpha-numeric in the C locale. Notice that only "123abc" remains:
$ printf '123 abc\xff\xfe\x7f\x80' | LANG=C sed 's/[^[:alnum:]]//g' | hexdump -C
00000000 31 32 33 61 62 63 |123abc|
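The C-locale behaviour shown in the last command can be reproduced in a small program by treating the input as raw bytes. A sketch, assuming the goal is to keep only ASCII letters and digits; keep_ascii_alnum is a hypothetical name:

```cpp
#include <cctype>
#include <string>

// Keeps only bytes that are ASCII letters or digits. The explicit
// "< 0x80" check makes the result independent of the current locale,
// and the cast avoids passing a negative char to std::isalnum.
std::string keep_ascii_alnum(const std::string& bytes) {
    std::string out;
    for (char c : bytes) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80 && std::isalnum(u)) out += c;
    }
    return out;
}
```

Working byte by byte sidesteps the multibyte decoding that a UTF-8 locale applies before a character class is matched.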

On OS X, the byte order mark is probably being read as a single word. Try either sed 's/^\xfffe//g' or sed 's/^\xfeff//g', depending on endianness.

You can match the bytes with the hex escapes \xff and \xfe and replace them with nothing.

As an alternative you may use ed(1):
printf '%s\n' H $'g/[\xff\xfe]/s///g' ',p' | ed -s test.csv
printf '%s\n' H $'g/[\xff\xfe]/s///g' wq | ed -s test.csv # in-place edit

What function was used to code these passwords in AFX?

I am trying to work out the format of a password file which is used by a LOGIN DLL of which the source cannot be found. The admin tool was written in AFX, so I hope that it perhaps gives a clue as to the algorithm used to encode the passwords.
Using the admin tool, we have two passwords that are encoded. The first is "dinosaur1234567890"; the hex values of its encoded form, followed by the plaintext for comparison, are:
00h: 4A 6E 3C 34 29 32 2E 59 51 6B 2B 4E 4F 20 47 75 ; Jn<4)2.YQk+NO Gu
10h: 6A 33 09 ; j3.
20h: 64 69 6E 6F 73 61 75 72 31 32 33 34 35 36 37 38 ; dinosaur12345678
30h: 39 30 ; 90
Another password "gertcha" is encoded as
e8h: 4D 35 4C 46 53 5C 7E ; GROUT M5LFS\~
I've tried looking for a common XOR, but failed to find anything. The encoded passwords are the same length as the originals in the password file, so I assume that this is a reversible encoding (it was of another age!). I'm wondering whether the AFX classes may have had a facility that would be used for this sort of thing?
If anyone can work out the encoding, then that would be great!
Thanks, Matthew
[edit:]
Okay, first, I'm moving on and going to leave the past behind in the new solution. It would have been nice to use the old data still. Indeed, if someone wants to solve it as a puzzle, then I would still like to be able to use it.
For those who want to have a go, I got two passwords done.
All 'a' - a password with 19 a's:
47 7D 47 38 58 57 7C 73 59 2D 50 ; G}G8XW|sY-P
79 68 29 3E 44 52 31 6B 09 ; yh)>DR1k.
All 'b' - a password with 16 b's.
48 7D 2C 71 78 67 4B 46 49 48 5F ; H},qxgKFIH_
69 7D 39 79 5E 09 ; i}9y^.
This convinced me that there is no simple solution involved, and that there is some feedback.

Well, I did a quick cryptanalysis on it. So far, I can tell you that each encoded password appears to start with the first character's ASCII value minus 26 (for example, 'd' is 0x64 and the first encoded byte is 0x4A). The next octet seems to be the difference between the first and second characters of the password, added to the second character's ASCII value. I haven't figured out the third letter yet. I think it's safe to say you are dealing with some kind of feedback cipher, which is why XOR turns up nothing. I think each octet's value depends on the previous one.
I can go on, but this stuff takes a lot of time. Hopefully this may give you a start, or maybe give you a couple of ideas.
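One way to check a guess like this against every sample is to tabulate, position by position, the difference between the plaintext byte and the encoded byte. A minimal sketch; byte_differences is a hypothetical helper, and the sample bytes fed to it would be the ones quoted in the question:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns plain[i] - encoded[i] for each position both inputs share.
// A constant column across samples hints at a fixed offset; a column
// that varies with earlier characters hints at feedback.
std::vector<int> byte_differences(const std::string& plain,
                                  const std::vector<std::uint8_t>& encoded) {
    std::vector<int> diffs;
    const std::size_t n = std::min(plain.size(), encoded.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int p = static_cast<unsigned char>(plain[i]);
        diffs.push_back(p - static_cast<int>(encoded[i]));
    }
    return diffs;
}
```

Run on the first bytes of all four samples from the question, the first difference is 26 each time, while the second position differs between samples.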

But since the output is equal in length with the input this looks like some fixed key cipher. It may be a trivial xor.
I suggest testing the following passwords:
* AAAAAAAA
* aaaaaaaa
* BBBBBBBB
* ABABABAB
* BABABABA
* AAAABBBB
* BBBBAAAA
* AAAAAAAAAAAAAAAA
* AAAAAAAABBBBBBBB
* BBBBBBBBAAAAAAAA
This should maybe allow us to break the cipher without reverse engineering the DLL.

Can the dll encode single character passwords? Or even a zero-character password?
You're going to want to start with the most trivial test cases.

You may be looking at this problem from the wrong angle. I would think that the best way to figure out how the password hashes are created is to reverse engineer the login DLL.
I would recommend IDA Pro for this task. It's well worth the price for the help it gives you in reversing executable code into readable assembler. There are other disassemblers that are free if you don't want to pay, but I haven't come across anything as powerful as IDA Pro. A free static disassembler / debugger that I would recommend is PEBrowse from SmidgeonSoft, as it's good for quickly poking around a live running system and has good PDB support for loading debugging symbols.
