SAS Compare Two Text Files (Unix / Windows) - sas

I need to compare two text files (in different directories) to see if they are different (a binary result is fine). Given a dataset such as the one below, is this possible within a datastep?
Pathname
c:\one\text1.txt
c:\two\text1.txt
c:\one\text2.txt
c:\two\text2.txt
Alternatively, macro code would be fine. A checksum is a possibility. I need the code to run on both Windows and Unix.

Pass it to the command line (via a pipe fileref)
In Windows, use the 'comp' command.
In Unix, use the 'diff' command.

Thanks to Chris J - this worked for me:
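%let root=%sysfunc(pathname(work));
/* Write two small test files into the WORK directory. */
/* The \ separator is Windows-specific; on Unix use / instead. */
data _null_;
file "&root.\x.txt";
put 'xxx';
run;
data _null_;
file "&root.\x2.txt";
put 'xx x';
run;
/* Run diff through a pipe fileref; on Windows without diff, use comp as suggested above. */
filename x pipe "diff &root.\x.txt &root.\x2.txt ";
/* Read the command output; truncover keeps short lines from running into the next one. */
data;
infile x truncover;
input x $1000.;
run;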

Related

How do I add file paths as nodes to a tree or stack in C++

I have a project to search, rename or delete files and folders on a selected drive on the computer using a data structure(A tree, a stack, or a queue). My question is, how do I add file paths and directories as nodes in C++?
Comment in other answer suggests using one of the exec() functions. Then parsing and studying the output.
I approve of that idea, but I find popen() easier to use. popen() and pclose() are part of POSIX and are declared in a C header, so they can be called directly from C++. Windows provides the equivalent _popen() and _pclose().
To clarify,
1) popen() is a function call for your C++ code to invoke.
2) You will also need to build the command strings that make your OS generate the lists you want, and pass them to popen(). The first parameter is the command string.
3) In read mode, the output of your "ls -lsa" or "dir" command is written to the output pipe of the spawned process, and your code needs to read it in. I recommend capturing it in a std::stringstream.
4) After capturing the "dir -r" output, parse the stringstream to extract the directory and file names.
Examples of C++ access to popen:
FILE* m_pipe = nullptr; // popen returns a FILE*
// open in read mode to read the sub-process's stdout
m_pipe = ::popen(m_cmd.c_str(), "r");
// ^^ global scope, because popen is not in a namespace
// or open in write mode to write to the sub-process's stdin
m_pipe = ::popen(m_cmd.c_str(), "w");
// close the pipe and collect the sub-process's exit status
int32_t pcloseStat = ::pclose(m_pipe);
{
    (void)memset(buff, 0, BUFF_SIZE);
    // Reads characters from the stream and stores them as a C string
    // into buff until
    // a) (BUFF_SIZE-1) characters have been read or
    // b) a newline is read or
    // c) the end-of-file is reached,
    // whichever happens first.
    char* stat = fgets(buff, BUFF_SIZE, m_pipe); // returns buff or null; m_pipe was created by popen
    int myErrno = errno; // save errno right away in case fgets returned null
}
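The fragments above can be combined into one helper. This is only a minimal sketch, assuming a POSIX system where popen() and pclose() are declared in <stdio.h>; the function name runCommand is hypothetical and not part of the original answer.

```cpp
#include <stdio.h>     // popen, pclose, fgets (popen/pclose are POSIX, not standard C++)
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: run a shell command and return everything it
// writes to its standard output.
std::string runCommand(const std::string& cmd)
{
    FILE* pipe = ::popen(cmd.c_str(), "r");   // read mode
    if (pipe == nullptr)
        throw std::runtime_error("popen failed for: " + cmd);

    std::stringstream captured;
    char buff[256];
    // fgets stops after a newline, at end-of-file, or after
    // sizeof(buff)-1 characters, whichever comes first.
    while (::fgets(buff, sizeof(buff), pipe) != nullptr)
        captured << buff;

    ::pclose(pipe);   // waits for the child; exit status ignored in this sketch
    return captured.str();
}
```

A call such as runCommand("ls -lsa") would return the whole listing as one string, ready to be split into lines and parsed.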
Example of building a Linux command for popen's first parameter:
std::string md5sumCmd ("head --bytes=1M " + mPFN +" | md5sum");
This command reads the first megabyte of the file named in mPFN (a std::string) and pipes it into md5sum, essentially generating an MD5 checksum of the first megabyte of the file. The md5sum output is what the calling process receives.
You will need to create appropriate commands (to pass to popen) to list directories, folders, file names, and so on.
Whatever works from the command line should be fine, but some options might make parsing the output easier.
For your node-based structures, add a string member that serves as the file path. You might need to replace "\" with "/" in it, however, because the backslash is the escape character in C++ string literals (and in most other languages). For example, in a queue:
struct Node {
    Node* next = nullptr; // next node in the queue
    std::string path;     // file or directory path (requires <string>)
};
And you can create accessors and mutators the same way you would anything else in a class. This will allow you to assign it values and to read the values.
Folders could be used as a parent and the files are children. A tree structure would likely be the easiest way to do this.
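As a rough illustration of the parent/child idea, here is a minimal sketch of a tree in which each node holds one path component. The names PathNode, child and addPath are hypothetical, and the sketch assumes paths have already been normalized to use "/" as the separator.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a directory owns its children; a file has none.
struct PathNode {
    std::string name;                                 // one path component
    std::vector<std::unique_ptr<PathNode>> children;  // entries inside a directory

    explicit PathNode(std::string n) : name(std::move(n)) {}

    // Return the child with the given name, creating it if it is missing.
    PathNode* child(const std::string& n)
    {
        for (auto& c : children)
            if (c->name == n) return c.get();
        children.push_back(std::make_unique<PathNode>(n));
        return children.back().get();
    }
};

// Insert a '/'-separated path below root, one node per component.
void addPath(PathNode& root, const std::string& path)
{
    PathNode* node = &root;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos) end = path.size();
        if (end > start)                        // skip empty components
            node = node->child(path.substr(start, end - start));
        start = end + 1;
    }
}
```

Adding "one/text1.txt" and then "one/text2.txt" reuses the existing "one" node, so files in the same folder end up as siblings under one parent.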

C++ - Missing end of line characters in file read

I am using C++ streams to read a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using the generic ios::binary flag when reading and writing them.
Example code below:
std::fstream inf( "ex.txt", std::ios::in | std::ios::binary);
char c;
while( inf >> c ) {
// writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be as when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler
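When you want to manipulate bytes, use the read() and write() member functions, not the >> and << operators. operator>> skips whitespace, including newlines, by default, even on a stream opened in binary mode.
You can still get the intended behavior from >> by clearing the skipws flag with inf.flags(inf.flags() & ~std::ios_base::skipws);, though.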
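A byte-for-byte copy with read() and write() might look like the following. This is a minimal sketch; the function name copyFile and the 4096-byte buffer size are arbitrary choices, not something taken from the question.

```cpp
#include <fstream>
#include <string>

// Copy src to dst byte for byte; returns false if a file cannot be
// opened or the output stream reports an error.
bool copyFile(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    char buf[4096];
    // read() does no whitespace skipping; gcount() reports how many
    // bytes the last read actually delivered (fewer at end of file).
    while (in.read(buf, sizeof(buf)) || in.gcount() > 0)
        out.write(buf, in.gcount());
    return static_cast<bool>(out);
}
```

Because nothing is interpreted along the way, end-of-line characters survive whether the file holds text or binary data, so there is no need to detect the file type first.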

Reading of text file in Ubuntu has extra \r

I am porting a program created in C++ from MS Studio to Ubuntu. The program works fine except when it reads from a text file.
My text file consists of lines of information separated by the delimiter ':'
General Manager:G001:def
Customer:C001:def:Lim:Tom:Mr:99999999:zor#hotmail.com:Blk 145 B North #03-03 Singapore 111111
Read method
while (getline(afile,line,'\n')) //read line and store string in variable line
{
stringstream ss(line);
string s;
while (getline(ss,s,':'))
{
word.push_back(s);
}
word.clear();
}
On Windows, the last field is stored correctly as def.
On Ubuntu, however, it is stored as def\r.
It works fine for the Customer record but causes a problem for the General Manager record.
I know it has something to do with the carriage return, but I am not sure how to resolve it.
If the text file was created on Windows, its lines end in \r\n, and getline on Linux removes only the \n. You can use the dos2unix command to remove the extra \r characters from the file. The command is simply dos2unix filenamegoeshere
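If converting the file is not an option, the program itself can drop a trailing carriage return before splitting each line. This is a minimal sketch of that alternative; the function name splitRecord is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line on ':' after dropping a trailing '\r', so files with
// Windows (CRLF) line endings parse the same way on Linux.
std::vector<std::string> splitRecord(std::string line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();

    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ':'))
        fields.push_back(field);
    return fields;
}
```

Inside the outer read loop, each line returned by getline(afile, line) would be passed to splitRecord(line) in place of the inner loop.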

File Merger C++

I am developing an algorithm for a file splitter and merger, and I stumbled on a problem: how do I merge the files I split back into a file with the original extension (file format)? My idea is to write the file format at the start of the very first chunk (i.e., if I split a file into three files, store the file format in the 1.bij file). Will this idea work? If you know a better approach, please share it.
Thanks
Why not let the user choose the filename with a command-line argument? You could use a -o option. Bonus points for letting the user redirect to standard output by passing - as the filename, so the output can be piped into another tool. For instance: merger -o - part*.bin | tar zxvf -
You may include a header in each split file with its full filename and also, for example, the original size, a checksum, and so on.
Edit: How to write text to binary stream
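std::fstream f(/* initialize */);
std::string s = "asdf";
// Store the size of the text
auto size = s.size();
// write() takes a const char*, so the address of size must be cast
f.write(reinterpret_cast<const char*>(&size), sizeof(size));
// Store the string itself
f.write(s.c_str(), s.size());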
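To show how the merger could read such a header back, here is a minimal round-trip sketch. The helper names writeString and readString are hypothetical, and the fixed 32-bit length in the host's byte order is an assumption made for brevity, so files written this way are not portable between machines with different endianness.

```cpp
#include <cstdint>
#include <sstream>   // also provides std::istream and std::ostream
#include <string>

// Write a 32-bit length followed by the raw characters.
void writeString(std::ostream& out, const std::string& s)
{
    const std::uint32_t size = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(s.data(), size);
}

// Read back a string written by writeString; returns false on short input.
bool readString(std::istream& in, std::string& s)
{
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof(size)))
        return false;
    s.resize(size);
    return size == 0 || static_cast<bool>(in.read(&s[0], size));
}
```

The splitter could call writeString with the original filename at the start of the first chunk, and the merger could call readString on that chunk before copying the remaining bytes.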

embedding a text file in an exe which can be accessed using fopen

I would like to embed a text file with some data into my program.
let's call it "data.txt".
This text file is usually loaded with a function that takes the text file's name as input and eventually opens it with an fopen() call, something along the lines of
FILE* name = fopen("data.txt", "r");
I can't really change this function and I would like the routine to open this same file every time it runs. I've seen people ask about embedding the file as a header but it seems that I wouldn't be able to call fopen() on a file that I embed into the header.
So my question is: is there a way to embed a text file as a callable file/variable to fopen()?
I am using VS2008.
Yes and No. The easiest way is to transform the content of the text file into an initialized array.
char data_txt[] = {
'd','a','t','a',' ','g','o','e','s',' ','h','e','r','e', //....
};
This transformation is easily done with a small perl script or even a small C program. You then compile and link the resulting module into your program.
An old trick to make this easier to manage with a Makefile is to make the script transform its data into the body of the initializer and write it to a file without the surrounding variable declaration or even the curly braces. If data.txt is transformed to data.inc, then it is used like so:
char data_txt[] = {
#include "data.inc"
};
Update
On many platforms, it is possible to append arbitrary data to the executable file itself. The trick then is to find it at run time. On platforms where this is possible, there will be file header information for the executable that indicates the length of the executable image. That can be used to compute an offset to use with fseek() after you have opened the executable file for reading. That is harder to do in a portable way, since it may not even be possible to learn the actual file name of your executable image at run time in a portable way. (Hint, argv[0] is not required to point to the actual program.)
If you cannot avoid the call to fopen(), then you can still use this trick to keep a copy of the content of data.txt, and put it back in a file at run time. You could even be clever and only write the file if it is missing....
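The "write the file only if it is missing" idea could be sketched as follows. The array content here is a stand-in for whatever is generated from data.txt, and the function name ensureDataFile is hypothetical.

```cpp
#include <cstdio>

// Stand-in for the array generated from data.txt (assumed content).
static const char data_txt[] = "data goes here";

// Create path from the embedded bytes unless it already exists.
// Returns true if the file exists when the function returns.
bool ensureDataFile(const char* path)
{
    if (std::FILE* existing = std::fopen(path, "rb")) {
        std::fclose(existing);           // already there: leave it alone
        return true;
    }
    std::FILE* out = std::fopen(path, "wb");
    if (out == nullptr) return false;
    const std::size_t len = sizeof(data_txt) - 1;   // drop the trailing '\0'
    const bool ok = std::fwrite(data_txt, 1, len, out) == len;
    return std::fclose(out) == 0 && ok;
}
```

Calling ensureDataFile("data.txt") once at startup would let the unchanged loading routine keep using fopen("data.txt", ...) as before.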
If you can drop the call to fopen() but still need a FILE * pointing at the data, then this is likely possible if you are willing to play fast and loose with your C runtime library's implementation of stdio. In the GNU version of libc, functions like sprintf() and sscanf() are actually implemented by creating a "real enough" FILE * that can be passed to a common implementation (vfprintf() and vfscanf(), IIRC). That faked FILE is marked as buffered, and points its buffer to the user's buffer. Some magic is used to make sure the rest of stdio doesn't do anything stupid.
For any kind of file, based on RBerteig's answer, you could do something as simple as this with Python.
This program generates a text.txt.c file that can be compiled and linked with your code, embedding any text or binary file directly in your exe so that it can be read from a variable:
# Generate text.txt.c, which holds the bytes of text.txt as a C array (Python 3)
with open("text.txt", "rb") as f:   # Open the file in binary read mode
    data = f.read()                 # Read the whole file as bytes
# Format every byte as an unsigned hexadecimal literal, separated by commas
body = ",".join(hex(b) for b in data)
s = "unsigned char text_txt_data[] = {" + body + "};"
# Write the resulting code to a file that can be compiled
with open("text.txt.c", "w") as fw:
    fw.write(s)
Will generate something like
unsigned char text_txt_data[] = {0x52,0x61,0x6e,0x64,0x6f,0x6d,0x20,0x6e,0x75...
You can later use your data in another C file through the variable, with code like this:
extern unsigned char text_txt_data[];
Right now I can think of two ways of converting it to readable text: using memory streams or converting it to a C string.
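Both options could be sketched in C++ as follows. The array below is a stand-in for the generated file with illustrative bytes, and the text_txt_size variable is an assumption: the generator above does not emit a length, and one is needed because sizeof cannot be applied to an extern array of unknown size.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Stand-ins for what a generated text.txt.c might contain (bytes are
// illustrative); the length variable is an assumed addition.
unsigned char text_txt_data[] = {0x52,0x61,0x6e,0x64,0x6f,0x6d,0x0a,0x6e,0x75};
const std::size_t text_txt_size = sizeof(text_txt_data);

// Option 1: copy the bytes into a std::string.
std::string embeddedText()
{
    return std::string(reinterpret_cast<const char*>(text_txt_data),
                       text_txt_size);
}

// Option 2: read the copy line by line through a memory stream.
std::string firstLine()
{
    std::istringstream in(embeddedText());
    std::string line;
    std::getline(in, line);
    return line;
}
```

Passing an explicit length to the std::string constructor avoids relying on a terminating '\0', which the generated array does not contain.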