Where did these .tmp files come from? - c++

Firstly, some details:
I am using a combination of C++ (Armadillo library) and R.
I am using Ubuntu as my operating system.
I am not using Rcpp.
Suppose that I have some C++ code called cpp_code which:
Reads, as input from R, an integer,
Performs some calculations,
Saves, as output for R, a spreadsheet "out.csv" (I use .save(name, csv_ascii)).
Some simplified R code would be:
for(i in 1:10000)
{
    system(paste0("echo ", toString(i), " | ./cpp_code")) ## produces out.csv
    output[i,,] <- read.csv("out.csv")                    ## reads out.csv
}
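To make this concrete, a stripped-down sketch of what cpp_code looks like on the C++ side (the calculation here is just a placeholder):

#include <iostream>
#include <armadillo>

int main()
{
    int i = 0;
    std::cin >> i;                              // the integer piped in from R

    arma::mat result(10, 3, arma::fill::randu); // placeholder calculation
    result *= i;

    result.save("out.csv", arma::csv_ascii);    // produces out.csv for R
    return 0;
}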
My Problem:
99% of the time, everything works fine. Every now and then, however, I see unusual .tmp files such as "out.csv.tmp_a0ac9806ff7f0000703a". These .tmp files only appear for a second or so, then suddenly disappear.
Questions:
What is causing this?
Is there a way to stop this from happening?
Please go easy on me since computing is not my main discipline.
Thank you very much for your time.

Many programs write their output to a temporary file and then rename it to the destination file. This is often done to avoid leaving a half-written output file if the process is killed while writing. Because the temporary can be atomically renamed to the output file name, either:
the entire output file is properly written or
no change is made to the output file
Note that there are usually still some race conditions that could result, for example, in the output file being deleted but the temporary file not yet renamed; still, one of the two outcomes above is the general goal.
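As an illustration, here is a minimal sketch of that pattern in portable C++ (the helper name and plain ".tmp" suffix are made up for this example; Armadillo's actual implementation appears below):

#include <cstdio>    // std::rename, std::remove
#include <fstream>
#include <string>

// Write to "<final_name>.tmp" first, then rename into place, so a
// reader never observes a half-written "out.csv".
bool save_atomically(const std::string& final_name, const std::string& data)
{
    const std::string tmp_name = final_name + ".tmp";

    std::ofstream f(tmp_name.c_str());
    if (!f) return false;

    f << data;
    f.close();
    if (!f) { std::remove(tmp_name.c_str()); return false; }

    // On POSIX filesystems, rename() atomically replaces the destination.
    return std::rename(tmp_name.c_str(), final_name.c_str()) == 0;
}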

I believe you're using the .save() function in Armadillo.
http://arma.sourceforge.net/docs.html#save_load_field
There are two functions you should look at in include/armadillo_bits/diskio_meat.hpp. In save_raw_ascii, the data is first stored in the file named by diskio::gen_tmp_name, and if save_okay, the file is renamed with safe_rename. If save_okay or safe_rename fails, you will see the temporary file left behind. The temporary file name is chosen as the original filename + ".tmp_" + some hex values derived from the filename.
//! Save a matrix as raw text (no header, human readable).
//! Matrices can be loaded in Matlab and Octave, as long as they don't have complex elements.
template<typename eT>
inline
bool
diskio::save_raw_ascii(const Mat<eT>& x, const std::string& final_name)
  {
  arma_extra_debug_sigprint();

  const std::string tmp_name = diskio::gen_tmp_name(final_name);

  std::fstream f(tmp_name.c_str(), std::fstream::out);

  bool save_okay = f.is_open();

  if(save_okay == true)
    {
    save_okay = diskio::save_raw_ascii(x, f);

    f.flush();
    f.close();

    if(save_okay == true)
      {
      save_okay = diskio::safe_rename(tmp_name, final_name);
      }
    }

  return save_okay;
  }
//! Append a quasi-random string to the given filename.
//! The rand() function is deliberately not used,
//! as rand() has an internal state that changes
//! from call to call. Such states should not be
//! modified in scientific applications, where the
//! results should be reproducible and not affected
//! by saving data.
inline
std::string
diskio::gen_tmp_name(const std::string& x)
  {
  const std::string* ptr_x     = &x;
  const u8*          ptr_ptr_x = reinterpret_cast<const u8*>(&ptr_x);

  const char* extra      = ".tmp_";
  const uword extra_size = 5;

  const uword tmp_size = 2*sizeof(u8*) + 2*2;
  char        tmp[tmp_size];

  uword char_count = 0;

  for(uword i=0; i<sizeof(u8*); ++i)
    {
    conv_to_hex(&tmp[char_count], ptr_ptr_x[i]);
    char_count += 2;
    }

  const uword x_size = static_cast<uword>(x.size());
  u8 sum = 0;

  for(uword i=0; i<x_size; ++i)
    {
    sum += u8(x[i]);
    }

  conv_to_hex(&tmp[char_count], sum);
  char_count += 2;

  conv_to_hex(&tmp[char_count], u8(x_size));

  std::string out;
  out.resize(x_size + extra_size + tmp_size);

  for(uword i=0; i<x_size; ++i)
    {
    out[i] = x[i];
    }

  for(uword i=0; i<extra_size; ++i)
    {
    out[x_size + i] = extra[i];
    }

  for(uword i=0; i<tmp_size; ++i)
    {
    out[x_size + extra_size + i] = tmp[i];
    }

  return out;
  }

What “Dark Falcon” hypothesises is exactly true: when calling save, Armadillo creates a temporary file to which it saves the data, and then renames the file to the final name.
This can be seen in the source code (this is save_raw_ascii but the other save* functions work analogously):
const std::string tmp_name = diskio::gen_tmp_name(final_name);

std::fstream f(tmp_name.c_str(), std::fstream::out);

bool save_okay = f.is_open();

if(save_okay == true)
  {
  save_okay = diskio::save_raw_ascii(x, f);

  f.flush();
  f.close();

  if(save_okay == true)
    {
    save_okay = diskio::safe_rename(tmp_name, final_name);
    }
  }
The comment on safe_rename says this:
Safely rename a file.
Before renaming, test if we can write to the final file.
This should prevent:
overwriting files that are write protected,
overwriting directories.
It’s worth noting, however, that this does not prevent a race condition.

Related

Reading/writing binary file returns 0xCCCCCCCC

I have a script that dumps class info into a binary file, and another script that retrieves it.
Since binary files only accept chars, I wrote three functions for reading and writing short ints, ints, and floats. I've been experimenting with them, so they're not overloaded properly, but they all look like this:
void writeInt(ofstream& file, int val) {
    file.write(reinterpret_cast<char *>(&val), sizeof(val));
}

int readInt(ifstream& file) {
    int val;
    file.read(reinterpret_cast<char *>(&val), sizeof(val));
    return val;
}
I'll put the class load/save script at the end of the post, but I don't think it'll make too much sense without the rest of the class info.
Anyway, it seems that the file gets saved properly. It has the correct size, and all of the data matches when I load it. However, at some point in the load process, the file.read() function starts returning 0xCCCCCCCC every time. This looks to me like a read error, but I'm not sure why, or how to correct it. Since the file is the correct size, and I don't touch the seekg() function, it doesn't seem likely that it's reaching the end of file prematurely. I can only assume it's an issue with my read/write method, since I did kind of hack it together with limited knowledge. However, if this is the case, why does it read all the data up to a certain point without issue?
The error starts occurring at a random point each run. This may or may not be related to the fact that all the class data is randomly generated.
Does anyone have experience with this? I'm not even sure how to continue debugging it at this point.
Anyway, here are the load/save functions:
void saveToFile(string fileName) {
    ofstream dataFile(fileName.c_str());
    writeInt(dataFile, inputSize);
    writeInt(dataFile, fullSize);
    writeInt(dataFile, outputSize);
    // Skips input nodes - no data needs to be saved for them.
    for (int i = inputSize; i < fullSize; i++) { // Saves each node after inputSize
        writeShortInt(dataFile, nodes[i].size);
        writeShortInt(dataFile, nodes[i].skip);
        writeFloat(dataFile, nodes[i].value);
        //vector<int> connects;
        //vector<float> weights;
        for (int j = 0; j < nodes[i].size; j++) {
            writeInt(dataFile, nodes[i].connects[j]);
            writeFloat(dataFile, nodes[i].weights[j]);
        }
    }
    read(500);
}

void loadFromFile(string fileName) {
    ifstream dataFile(fileName.c_str());
    inputSize = readInt(dataFile);
    fullSize = readInt(dataFile);
    outputSize = readInt(dataFile);
    nodes.resize(fullSize);
    for (int i = 0; i < inputSize; i++) {
        nodes[i].setSize(0); // Sets input nodes
    }
    for (int i = inputSize; i < fullSize; i++) { // Loads each node after inputSize
        int s = readShortInt(dataFile);
        nodes[i].setSize(s);
        nodes[i].skip = readShortInt(dataFile);
        nodes[i].value = readFloat(dataFile);
        //vector<int> connects;
        //vector<float> weights;
        for (int j = 0; j < nodes[i].size; j++) {
            nodes[i].connects[j] = readInt(dataFile); // Error occurs in a random instance of this call of readInt().
            nodes[i].weights[j] = readFloat(dataFile);
        }
        read(i); // Outputs class data to console
    }
    read(500);
}
Thanks in advance!
You have to check the result of the open, read, and write operations.
And you need to open the files (for both reading and writing) as binary. In text mode on Windows, a 0x1A byte in the data is treated as end-of-file and \r\n sequences are translated, so reads start failing at a seemingly random point; readInt then returns an uninitialized val, which the MSVC debug runtime fills with the byte pattern 0xCC, hence the 0xCCCCCCCC you are seeing.
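For illustration, a minimal sketch of both fixes together (binary mode plus checked results); returning a bool is just one way to surface the error:

#include <fstream>
#include <iostream>

bool writeInt(std::ofstream& file, int val) {
    file.write(reinterpret_cast<const char*>(&val), sizeof(val));
    return static_cast<bool>(file);              // false if the write failed
}

bool readInt(std::ifstream& file, int& val) {
    file.read(reinterpret_cast<char*>(&val), sizeof(val));
    return static_cast<bool>(file);              // false on error or short read
}

int main() {
    std::ofstream out("data.bin", std::ios::binary);   // note: binary mode
    if (!out || !writeInt(out, 42)) { std::cerr << "write failed\n"; return 1; }
    out.close();

    std::ifstream in("data.bin", std::ios::binary);
    int v = 0;
    if (!in || !readInt(in, v)) { std::cerr << "read failed\n"; return 1; }
    std::cout << v << '\n';
    return 0;
}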

Adding filenames from a directory to a char* array. c++

I am trying to get filenames from a directory and put them in a char* array for later use, but it doesn't work the way I want: when printing, it shows only the last filename in every slot.
So my question is: how can I add the file names to every slot inside the char*[]?
/* Placed outside */
int i = 0;
char* Files[20] = {};
/* Placed outside */

while (handle != INVALID_HANDLE_VALUE)
{
    char buffer[4100];
    sprintf_s(buffer, "%ls", search_data.cFileName);
    Files[i] = buffer;
    i++;
    if (FindNextFile(handle, &search_data) == FALSE)
        break;
}

/* Printing; I use ImGui */
#define IM_ARRAYSIZE(_ARR) ((int)(sizeof(_ARR)/sizeof(*_ARR)))
static int listbox_item_current = 1;
ImGui::ListBox("", &listbox_item_current, Files, i, 4);
You could use the C++ standard filesystem library, but for that I guess you would need C++17 (or at least VS2015), not really sure.
You would have to include:
#include <experimental/filesystem>
#include <filesystem>
using namespace std::experimental::filesystem::v1;
Using it should be simple:
int i = 0;
const char * directoryToSearch = "C:\\etc\\etc";
for (const auto & file : directory_iterator(directoryToSearch)) {
    files[i] = new char[file.path().stem().string().length() + 1];
    strcpy(files[i], file.path().stem().string().c_str());
    ++i;
}
Indeed, you should clean up the array after you're done using it. Don't forget, not many compilers support this at the moment.
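The cleanup could look like this (a sketch, assuming files holds i entries allocated with new[] as above):

for (int k = 0; k < i; ++k) {
    delete[] files[k];   // matches the new char[...] above
    files[k] = nullptr;
}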
“When printing it only shows the last filename on all spots.” That is just normal: you store the filename on each iteration in the same buffer and only copy the address of that buffer into your array. Moreover, since buffer is an automatic variable declared inside the loop (block scope), using its address outside the loop is Undefined Behaviour, so you end up with an array of dangling pointers.
The correct way would be either to use a 2D array, char Files[20][MAX_PATH];, and store one file name in each slot, or to use dynamic memory allocated with new (or malloc at a lower level). For the second option you can do it by hand, allocating memory for each file name (and remembering to free everything at the end), or you can let the standard library manage it for you by using:
std::vector<std::string> Files;
...
while (...) {
    ...
    Files.push_back(search_data.cFileName);
    ...
}
Dear ImGui provides a ListBox overload that lets you pass an opaque data store along with an extractor callback, and it can be used here:
bool string_vector_items_getter(void* data, int idx, const char** out_text) {
    std::vector<std::string>* v = reinterpret_cast<std::vector<std::string>*>(data);
    if (idx < 0 || idx >= static_cast<int>(v->size())) return false;
    *out_text = (*v)[idx].c_str();
    return true;
}
and then:
ImGui::ListBox("", &listbox_item_current, string_vector_items_getter, &Files, static_cast<int>(Files.size()), 4);
(beware: untested code!)

Another C++ strange segmentation fault by object creation

I've recently encountered a problem with C++ object creation. The problem is somewhat like the one in the question C++ strange segmentation fault by object creation; however, the code here is part of an open-source project and is unlikely to contain trivial errors.
The object creation happens in a method, and the method is called twice in succession.
The class is defined in strtokenizer.h as follows:
class strtokenizer {
protected:
    vector<string> tokens;
    int idx;

public:
    strtokenizer(string str, string seperators = " ");
    void parse(string str, string seperators);
    int count_tokens();
    string next_token();
    void start_scan();
    string token(int i);
};
And in strtokenizer.cpp, it is like this:
using namespace std;

strtokenizer::strtokenizer(string str, string seperators) {
    parse(str, seperators);
}

void strtokenizer::parse(string str, string seperators) {
    int n = str.length();
    int start, stop;
    if (flag) {
        printf("%d\n", n);
    }
    start = str.find_first_not_of(seperators);
    while (start >= 0 && start < n) {
        stop = str.find_first_of(seperators, start);
        if (stop < 0 || stop > n) {
            stop = n;
        }
        tokens.push_back(str.substr(start, stop - start));
        start = str.find_first_not_of(seperators, stop + 1);
    }
    start_scan();
}
int strtokenizer::count_tokens() {
    return tokens.size();
}

void strtokenizer::start_scan() {
    idx = 0;
    return;
}

string strtokenizer::next_token() {
    if (idx >= 0 && idx < tokens.size()) {
        return tokens[idx++];
    } else {
        return "";
    }
}

string strtokenizer::token(int i) {
    if (i >= 0 && i < tokens.size()) {
        return tokens[i];
    } else {
        return "";
    }
}
The method that creates the strtokenizer objects is as follows:
int dataset::read_wordmap(string wordmapfile, mapword2id * pword2id) {
    pword2id->clear();

    FILE * fin = fopen(wordmapfile.c_str(), "r");
    if (!fin) {
        printf("Cannot open file %s to read!\n", wordmapfile.c_str());
        return 1;
    }

    char buff[BUFF_SIZE_SHORT];
    string line;

    fgets(buff, BUFF_SIZE_SHORT - 1, fin);
    int nwords = atoi(buff);

    for (int i = 0; i < nwords; i++) {
        fgets(buff, BUFF_SIZE_SHORT - 1, fin);
        line = buff;

        strtokenizer strtok(line, " \t\r\n");
        if (strtok.count_tokens() != 2) {
            continue;
        }

        pword2id->insert(pair<string, int>(strtok.token(0), atoi(strtok.token(1).c_str())));
    }

    fclose(fin);
    return 0;
}
When the read_wordmap() method runs for the first time (first read_wordmap() call), the strtok object is created about 87k times, and in the second call it is expected to be created more than 88k times. However, at about 86k times into the second call, it raises an error (sometimes 'segmentation fault', sometimes 'memory corruption (fast)') at the line:
strtokenizer strtok(line, " \t\r\n");
And when the object-creation block is revised as below, there are no errors.
strtokenizer *strtok = new strtokenizer(line, " \t\r\n");
printf("line: %s", line.c_str());
if (strtok->count_tokens() != 2) {
    continue;
}
pword2id->insert(pair<string, int>(strtok->token(0), atoi(strtok->token(1).c_str())));
It looks like you have memory corruption in your code. You should consider using a tool like valgrind (http://valgrind.org/) to check that the code does not write out of bounds.
Your revised code uses heap memory instead of stack memory, which may hide the problem (even if it still exists).
Reading your code, there are several missing tests to ensure safe handling in case the provided wordmapfile contains unexpected data.
For example, you do not check the result of fgets, so if the word count at the beginning of the file is bigger than the real number of words, you will have issues.
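For illustration, a sketch of the kind of check meant here, using the question's fin, buff, and nwords:

// Stop (and report) as soon as a line cannot be read, instead of
// silently tokenizing the previous buffer contents again.
for (int i = 0; i < nwords; i++) {
    if (fgets(buff, BUFF_SIZE_SHORT - 1, fin) == NULL) {
        printf("wordmap file ended early: got %d of %d lines\n", i, nwords);
        break;
    }
    // ... tokenize and insert as before ...
}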
I carefully debugged my code following the suggestions of Paul R and other friends, and found that it was because I hadn't freed dynamically allocated memory.
The code posted above is a tiny part of my project, and in the project a Gibbs sampling algorithm is supposed to run for a thousand iterations.
In each iteration, the old matrices are supposed to be freed and new ones "newed out". However, I forgot to free all the matrices and lists, and that's why my program corrupts.
The reason I posted the code above is that the program crashed every time it ran into the line:
strtokenizer strtok(line, " \t\r\n");
The object "strtok" will be run for 1000 * lines in files(with 10000+ lines). So it made me think maybe there are too many objects created and take up all of the stack memory. Even though I found there are no need to manually free them.
When debugged the program in visual studio, the monitor of memory occupancy showed a dramatic growth in each iteration and "bad_alloc" error took place every now and then. These made me realize that I forget to free some large dynamic matrix.
Thanks for you all!
And I apologise for the wrongly described question that takes up your time!

C++: how to output data to multiple .dat files?

I have a research project I'm working on. I am a beginner in C++ and programming in general. I have already made a program that generates interacting particles that move on continuous space as time progresses. The only things my program outputs are the XY coordinates for each particle in each time-step.
I want to visualize my findings, to know if my particles are moving as they should. My professor said that I must use gnuplot. Since I could not find a way to output my data in one file so that gnuplot would recognize it, I thought of the following strategy:
a) For each time-step generate one file with XY coordinates of the form "output_#.dat".
b) Generate a .png file for each one of them in gnuplot.
c) Make a movie of the moving particles with all the .png files.
I am going to worry about b and c later, but up to now, I am able to output all my data in one file using this code:
int main()
{
    int i = 0;
    int t = 0; // time
    int j = 0;
    int ctr = 0;
    double sumX = 0;
    double sumY = 0;
    srand(time(NULL)); // give system time as seed to random generator
    Cell* particles[maxSize]; // create array of pointers of type Cell
    for (i = 0; i < maxSize; i++)
    {
        particles[i] = new Cell(); // initialize in memory
        particles[i]->InitPos();   // give initial positions
    }
    FILE *newFile = fopen("output_00.dat", "w"); // print initial positions
    for (i = 0; i < maxSize; i++)
    {
        fprintf(newFile, "%f ", particles[i]->getCurrX());
        fprintf(newFile, "%f \n", particles[i]->getCurrY());
    }
    fclose(newFile);
    FILE *output = fopen("output_01.dat", "w");
    for (t = 0; t < tMax; t++)
    {
        fprintf(output, "%d ", t);
        for (i = 0; i < maxSize; i++) // for every cell
        {
            sumX = 0;
            sumY = 0;
            for (j = 0; j < maxSize; j++) // for all surrounding cells
            {
                sumX += particles[i]->ForceX(particles[i], particles[j]);
                sumY += particles[i]->ForceY(particles[i], particles[j]);
            }
            particles[i]->setVelX(particles[i]->getPrevVelX() + (sumX)*dt); // update speed
            particles[i]->setVelY(particles[i]->getPrevVelY() + (sumY)*dt);
            particles[i]->setCurrX(particles[i]->getPrevX() + (particles[i]->getVelX())*dt); // update position
            particles[i]->setCurrY(particles[i]->getPrevY() + (particles[i]->getVelY())*dt);
            fprintf(output, " ");
            fprintf(output, "%f ", particles[i]->getCurrX());
            fprintf(output, "%f \n", particles[i]->getCurrY());
        }
    }
    fclose(output);
    return 0;
}
This indeed generates two files, output_00.dat and output_01.dat, the first containing the initial randomly generated positions and the second containing all my results.
My feeling is that in the nested for loop, where I update the speed and position for the XY coordinates, I could have a FILE* that stores the coordinates for each time-step and is closed before incrementing time. That way I would not need multiple file pointers open at the same time. At least that is my intuition.
I do not know how to generate incrementing filenames. I have stumbled upon ofstream, but I don't understand how it works...
I think what I would like my program to do at this point is:
1) Generate a new file name, using a base name and the current loop counter value.
2) Open that file.
3) Write the coordinates for that time-step.
4) Close the file.
5) Repeat.
Any help will be greatly appreciated. Thank you for your time!
Using ofstream instead of fopen would be a better use of the C++ standard library (right now you are using C standard library calls), but there is nothing wrong per se with what you are doing now.
It seems like your real core question is how to generate a filename from an integer so you can use it in a loop:
Here is one way:
// Include these somewhere
#include <string>
#include <sstream>

// Define this function
std::string make_output_filename(size_t index) {
    std::ostringstream ss;
    ss << "output_" << index << ".dat";
    return ss.str();
}

// Use that function with fopen in this way:
for (size_t output_file_number = 0; /* rest of your for loop stuff */) {
    FILE *file = fopen(make_output_filename(output_file_number).c_str(), "w");
    /* use the file */
    fclose(file);
}
This uses a std::ostringstream to build a filename with stream operations and returns the built std::string. When you pass it to fopen, you have to give it a const char * rather than a std::string, so we use the .c_str() member, which exists for just this purpose.
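Since you mentioned ofstream: the same loop with C++ streams might look like this (a sketch; the particle updates are elided, and the file is closed automatically when out goes out of scope):

#include <fstream>

for (int t = 0; t < tMax; t++) {
    std::ofstream out(make_output_filename(t).c_str());
    // ... update velocities and positions as in your loop ...
    for (int i = 0; i < maxSize; i++) {
        out << particles[i]->getCurrX() << ' '
            << particles[i]->getCurrY() << '\n';
    }
}   // "out" is flushed and closed at the end of each iteration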

Sorting a file with 55K rows and varying Columns

I want to find a programmatic solution using C++.
I have 900 files, each about 27 MB (just to convey the scale).
Each file has 55K rows and a varying number of columns, but the header indicates the columns.
I want to sort the rows with respect to a column value.
I wrote a sorting algorithm for this (definitely a newbie attempt, you may say).
The algorithm works for small inputs, but fails for larger ones.
Here is the code for the same:
basic functions I defined to use inside the main code:
int getNumberOfColumns(const string& aline)
{
    int ncols = 0;
    istringstream ss(aline);
    string s1;
    while (ss >> s1) ncols++;
    return ncols;
}

vector<string> getWordsFromSentence(const string& aline)
{
    vector<string> words;
    istringstream ss(aline);
    string tstr;
    while (ss >> tstr) words.push_back(tstr);
    return words;
}

bool findColumnName(vector<string> vs, const string& colName)
{
    vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
    if (it != vs.end())
        return true;
    else return false;
}

int getIndexForColumnName(vector<string> vs, const string& colName)
{
    if (!findColumnName(vs, colName)) return -1;
    else {
        vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
        return it - vs.begin();
    }
}
// I like recursive functions - I tried to create a recursive function here.
// This worked for small values, say 20 rows, but for 55K rows it core dumps.
void sort2D(vector<string> vn, vector<string>& srt, int columnIndex)
{
    vector<double> pVals;
    for (int i = 0; i < vn.size(); i++) {
        vector<string> meancols = getWordsFromSentence(vn[i]);
        pVals.push_back(stringToDouble(meancols[columnIndex]));
    }
    srt.push_back(vn[max_element(pVals.begin(), pVals.end()) - pVals.begin()]);
    if (vn.size() > 1) {
        vn.erase(vn.begin() + (max_element(pVals.begin(), pVals.end()) - pVals.begin()));
        vector<string> vn2 = vn;
        //cout << srt[srt.size() - 1] << endl;
        sort2D(vn2, srt, columnIndex);
    }
}
Now the main code:
for (int i = 0; i < TissueNames.size() - 1; i++)
{
    for (int j = i + 1; j < TissueNames.size(); j++)
    {
        //string fname = path+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
        //string fname2 = sortpath2+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+"Sorted.txt";
        string fname = path+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
        string fname2 = sortpath2+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+"4Columns.txt";

        vector<string> AllLinesInFile;
        BioInputStream fin(fname);
        string aline;
        getline(fin, aline);
        replace(aline.begin(), aline.end(), '"', ' ');

        string headerline = aline;
        vector<string> header = getWordsFromSentence(aline);
        int pindex = getIndexForColumnName(header, "p-raw");
        int xcindex = getIndexForColumnName(header, "xC");
        int xeindex = getIndexForColumnName(header, "xE");
        int prbindex = getIndexForColumnName(header, "X");

        string newheaderline = "X\txC\txE\tp-raw";
        BioOutputStream fsrt(fname2);
        fsrt << newheaderline << endl;

        int newpindex = 3;
        while (getline(fin, aline)) {
            replace(aline.begin(), aline.end(), '"', ' ');
            istringstream ss2(aline);
            string tstr;
            ss2 >> tstr;
            tstr = ss2.str().substr(tstr.length() + 1);
            vector<string> words = getWordsFromSentence(tstr);
            string values = words[prbindex] + "\t" + words[xcindex] + "\t" + words[xeindex] + "\t" + words[pindex];
            AllLinesInFile.push_back(values);
        }

        vector<string> SortedLines;
        sort2D(AllLinesInFile, SortedLines, newpindex);

        for (int si = 0; si < SortedLines.size(); si++)
            fsrt << SortedLines[si] << endl;

        cout << "[" << i << "," << j << "] = " << SortedLines.size() << endl;
    }
}
Can someone suggest a better way of doing this? Why does it fail for larger inputs?
The primary function of interest for this query is the sort2D function.
Thanks for your time and patience.
prasad.
I'm not sure why your code is crashing, but recursion in that case is only going to make the code less readable. I doubt it's a stack overflow, however, because you're not using much stack space in each call.
C++ already has std::sort, why not use that instead? You could do it like this:
// functor to compare 2 strings
class CompareStringByValue : public std::binary_function<string, string, bool>
{
public:
    CompareStringByValue(int columnIndex) : idx_(columnIndex) {}

    bool operator()(const string& s1, const string& s2) const
    {
        double val1 = stringToDouble(getWordsFromSentence(s1)[idx_]);
        double val2 = stringToDouble(getWordsFromSentence(s2)[idx_]);
        return val1 < val2;
    }

private:
    int idx_;
};
To then sort your lines you would call
std::sort(vn.begin(), vn.end(), CompareStringByValue(columnIndex));
Now, there is one problem: this will be slow, because stringToDouble and getWordsFromSentence are called multiple times on the same string. You would probably want to build a separate vector that precalculates the value for each string, and then have CompareStringByValue use that vector as a lookup table.
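One way to precalculate the keys is to decorate each line with its value, sort once, and strip the decoration; a sketch reusing the question's stringToDouble and getWordsFromSentence (needs <algorithm>, <utility>, <vector>, <string>):

// Decorate each line with its sort key, sort once, then strip the keys.
// Pairs compare by the double first, so this orders lines by column value.
std::vector<std::pair<double, std::string> > keyed;
keyed.reserve(vn.size());
for (size_t i = 0; i < vn.size(); ++i) {
    double key = stringToDouble(getWordsFromSentence(vn[i])[columnIndex]);
    keyed.push_back(std::make_pair(key, vn[i]));
}
std::sort(keyed.begin(), keyed.end());
for (size_t i = 0; i < keyed.size(); ++i) {
    vn[i] = keyed[i].second;
}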
Another way to do this is to insert the strings into a std::multimap<double, std::string>. Just insert the entries as (value, str) pairs and then read them out in key order. This is simpler but slower (though it has the same big-O complexity).
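A sketch of the multimap variant, again reusing the question's helpers and writing straight to the output stream:

std::multimap<double, std::string> sorted;
for (size_t i = 0; i < vn.size(); ++i) {
    double key = stringToDouble(getWordsFromSentence(vn[i])[columnIndex]);
    sorted.insert(std::make_pair(key, vn[i]));
}
// Iterating a multimap visits entries in ascending key order.
for (std::multimap<double, std::string>::const_iterator it = sorted.begin();
     it != sorted.end(); ++it) {
    fsrt << it->second << std::endl;
}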
EDIT: Cleaned up some incorrect code and derived from binary_function.
You could try a method that doesn't involve recursion. If your program crashes in the sort2D function with large inputs, then you're probably overflowing the stack (a danger of using recursion with a large number of function calls). Try another sorting method, maybe using a loop.
sort2D crashes because you keep allocating an array of strings to sort and then pass it by value, in effect using O(2*N^2) memory. If you really want to keep your recursive function, simply pass vn by reference and don't bother with vn2. And if you don't want to modify the original vn, move the body of sort2D into another function (say, sort2Drecursive) and call that from sort2D.
You might also want to take another look at sort2D in general, since you are doing O(N^2) work for something that should take O(N + N*log(N)).
The problem is less your code than the tool you chose for the job. This is purely a text-processing problem, so pick a tool that is good at it. On Unix the best tool for the job is Bash with the GNU coreutils; on Windows you can use PowerShell, Python, or Ruby. Python and Ruby will work on any Unix-flavoured machine too, but practically all Unix machines have Bash and the coreutils installed.
Let $FILES hold the list of files to process, delimited by whitespace. Here's the code for Bash:
for FILE in $FILES; do
    echo "Processing file $FILE ..."
    head --lines=1 $FILE > $FILE.tmp            # keep the header line
    tail --lines=+2 $FILE | sort >> $FILE.tmp   # sort the data rows (add options such as -k/-g to sort numerically by a specific column)
    mv $FILE.tmp $FILE
done