Parsing a set of JSON files one by one - C++

I am trying to parse JSON files using the JsonCpp library, but I have run into a problem that I cannot fix. The code shown below works perfectly when I parse a single file, but once I added the part that iterates over the files in a directory, the program crashes.
The first function searches a given directory for JSON files and saves their names in a vector of strings (results).
In main, the program starts by defining the required extension (.json) and then calls the search function. After that, I try to open each file and parse it.
Thanks, I really appreciate any kind of help.
#include "jsoncpp.cpp"
#include <stdio.h>
#include "json.h"
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <dirent.h>
#include <vector>
using namespace std;
vector<string> results; // holds search results
// recursive search algorithm
void search(string curr_directory, string extension){
DIR* dir_point = opendir(curr_directory.c_str());
dirent* entry = readdir(dir_point);
while (entry){ // if !entry then end of directory
if (entry->d_type == DT_DIR){ // if entry is a directory
string fname = entry->d_name;
if (fname != "." && fname != "..")
search(entry->d_name, extension); // search through it
}
else if (entry->d_type == DT_REG){ // if entry is a regular file
string fname = entry->d_name; // filename
// if filename's last characters are extension
if (fname.find(extension, (fname.length() - extension.length())) != string::npos)
results.push_back(fname); // add filename to results vector
}
entry = readdir(dir_point);
}
return;
}
//
//
//
//
int main(int argc, char *argv[])
{
// read Files list
string extension; // type of file to search for
extension = "json";
// setup search parameters
string curr_directory = "/Users/ITSGC_Ready2Go/3dMap";
search(curr_directory, extension);
// loop over files
//if (results.size()){
//std::cout << results.size() << " files were found:" << std::endl;
for (unsigned int z = 0; z < results.size(); ++z){ // used unsigned to appease compiler warnings
// Opening the file using ifstream function from fstream library
cout <<results[z].c_str()<<endl;
Json::Value obj;
Json::Reader reader;
ifstream test(results[z].c_str());
//test.open (results[z].c_str(), std::fstream::in );
// Selection objects inside the file
reader.parse(test,obj);
//test >> obj;
// Parsing ID object and returning its value as integer
// cout << "id :" << stoi(obj["id"].asString()) <<endl;
// Parsing Line object with its internal objects
const Json::Value& lines = obj["lines"];
for (int i=0; i<lines.size();i++){
cout << "index : " << i << endl;
cout << "id:" << lines[i]["id"].asString() <<endl;
cout << "type:" << lines[i]["type"].asString() <<endl;
cout << "function:" << lines[i]["function"].asString() <<endl;
cout << "color:" << lines[i]["color"].asString() <<endl;
const Json::Value& poly = lines[i]["polyPoints"];
for (int j=0; j<poly.size();j++){
cout << "batch#"<<j<<endl;
cout << "latitude" << poly[j]["latitude"].asFloat()<<endl;
cout << "longitude" << poly[j]["longitude"].asFloat()<<endl;
cout << "altitude" << poly[j]["altitude"].asFloat()<<endl;
}
}
// Reading the OccupancyGrid object
// OccupancyGrid object is copied into constant to parse the arrays inside
const Json::Value& occupancyGrid = obj["occupancyGrid"];
cout << occupancyGrid.size() <<endl;
// The size of occupancyGrid is the used as number of iterations (#of rows)
for (int l=0; l<occupancyGrid.size();l++){
// Arrays inside occupancyGrid are copied into constant to parse the elements inside each array
const Json::Value& element = occupancyGrid[l];
// iterations over the size of the array in order to parse every element
cout << "row" << l << "--> ";
for (int k=0;k<element.size();k++){
cout << element[k].asFloat();
if(k<element.size()-1){ cout<< ",";}
}
cout << endl;
}
// Parsing roadSigns object as found in the file
// Need to understand the difference between format in the mail and the 1456 file
const Json::Value& roadsigns = obj["roadSigns"];
cout << "ArrayType: " << roadsigns["_ArrayType_"].asString()<<endl;
const Json::Value& ArraySize = roadsigns["_ArraySize_"];
for(int t=0;t<ArraySize.size();t++){
cout << ArraySize[t].asInt();
if (t<ArraySize.size()-1){ cout << " , ";}
}
cout<< endl;
if (roadsigns["_ArrayData_"].asString().empty()) {
cout << "ArrayData: "<<roadsigns["_ArrayData_"].asFloat(); }
else { cout << "ArrayData: empty "; }
cout <<endl;
test.close();
test.clear();
cout << "Done" << endl;
cout << "...." << endl;
cout << "...." << endl;
}
//else{
// std::cout << "No files ending in '" << extension << "' were found." << std::endl;
//}
}

Without access to the JSON library I can't help you much, but the first obvious place for a potential crash is if (fname.find(extension, (fname.length() - extension.length())) != string::npos). You need to make sure that the file name is at least as long as the extension before making that call, because fname.length() - extension.length() is an unsigned subtraction and wraps around for shorter names.
Also, for extremely deep directory trees you should put a limit on the recursion, and all OSes I know of have some sort of character limit on directory and file names.
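A minimal sketch of the length check described above, using only the standard library. The helper names has_extension and join_path are hypothetical and not part of the question's code. The second helper rests on an assumption worth verifying: search stores bare file names and recurses with entry->d_name alone, so anything outside the current working directory may fail to open unless the name is joined to the directory being scanned.

```cpp
#include <string>

// Hypothetical helper: true only if `name` ends with `ext`.
// The size comparison comes first, so the unsigned subtraction cannot wrap.
bool has_extension(const std::string& name, const std::string& ext) {
    return name.size() >= ext.size() &&
           name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: joins a directory and an entry name with a single '/'.
std::string join_path(const std::string& dir, const std::string& name) {
    if (dir.empty() || dir.back() == '/') {
        return dir + name;
    }
    return dir + "/" + name;
}
```

Inside search, results.push_back(join_path(curr_directory, fname)) and search(join_path(curr_directory, fname), extension) would then keep every stored name usable from any working directory. Checking the result of opendir for a null pointer before calling readdir is a further guard.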

Related

Undeclared Identifier in my function call (C++)

I am a new programmer working in C++. I am trying to write a program that reads information from a file, writes it to an output file, and then runs a search algorithm on the data. I am trying to store the data in an array of structures and then use it from the main program.
For some reason I can't, for the life of me, get my function call to work; I keep getting an undeclared identifier error on inputFile in my function call in the main program. I realize I'm probably doing something fundamentally wrong, so I would really appreciate any help that can be given.
#include <iostream>
#include <iomanip>
#include <string>
#include <cstdlib>
#include <fstream>
using namespace std;
const int MAX_LOG_SIZE = 7584;
const string LOGFILE ="crimes.dat";
const string OUTPUT_FILE ="crimesorted.log";
// Structure of strings based on info from crimes.dat
struct CrimeInfo
{
string Crimedescr;
string Date;
string Time;
string Address;
string Grid;
string Latitude;
string Longitude;
};
CrimeInfo crimeList [MAX_LOG_SIZE];
void openInputFile(ifstream& inputFile, string inputFilename)
// here we open the input file crimes.dat
{
inputFile.open(inputFilename.c_str());
while (inputFile.fail())
{
cout << "Failed to open input file: " << inputFilename << ".\n";
exit(1);
}
};
void getLogEntry(ifstream &LOGFILE, CrimeInfo &entry)
{
getline(LOGFILE, entry.Date);
getline(LOGFILE, entry.Time);
getline(LOGFILE, entry.Address);
getline(LOGFILE, entry.Grid);
getline(LOGFILE, entry.Crimedescr);
getline(LOGFILE, entry.Latitude);
getline(LOGFILE, entry.Longitude);
}
/* opens an output file */
void openOutputFile(ofstream& outputFile, string outputFilename)
{
outputFile.open(outputFilename.c_str());
if (outputFile.fail())
{
cout << "Failed to open output file: " << outputFilename << ".\n";
exit(2);
}
}
void outputLogFile(string outputFilename, CrimeInfo arr[], int size)
{
// open output files
ofstream outputLogFile;
openOutputFile(outputLogFile, outputFilename);
// output the crime file
outputLogFile << "\nCrime log sort ^^:\n\n";
for (int i = 0; i < size; i++)
{
outputLogFile << arr[i].Date << " ";
outputLogFile << arr[i].Address << " (";
outputLogFile << arr[i].Longitude << " ";
outputLogFile << arr[i].Latitude << " ";
outputLogFile << arr[i].Time << " ";
outputLogFile << arr[i].Grid << " ";
outputLogFile << arr[i].Crimedescr << "";
outputLogFile << endl;
}
outputLogFile.close();
}
int main()
{
outputLogFile(OUTPUT_FILE, crimeList, MAX_LOG_SIZE);
for (int i =0; i < MAX_LOG_SIZE; i++)
getLogEntry(inputFile, crimeList[i].Date);
}
There are a lot of problems with your code. To help you out, I went through your code and left a lot of my own comments to tell you some suggestions I had; to make it easy, I deleted your comments so there's no confusion on what was yours and what I put there.
Here are some things I noticed in your code:
using namespace std is generally considered a very bad practice. Instead, just specify the namespace (e.g. std::string instead of just string).
You declared LOGFILE as a string at the top of your program, but then reused the name LOGFILE for an ifstream& parameter in the function getLogEntry. The parameter shadows the global constant, which compiles but is confusing.
Your main method is out of order. I'm assuming you want to load some data into the program from a file and then output that data to another file. The way you have it in your main method is, first, you output information you don't have yet and, second, import information but don't do anything with it.
You have a LOT of functions. As a general rule of thumb, don't make a whole function for opening a file, then a separate one for using it, then a separate one for closing it. There are a lot of big reasons why not to do this. The biggest reasons are that your program becomes very difficult to follow, and no one else will be able to use your code. In real-world applications, your code is only 20% for the computer and 80% for other programmers.
There are various formatting errors and such.
So, here is your original code with my comments...
#include <iostream>
#include <iomanip>
#include <string>
#include <cstdlib> // Declares exit(); keep it rather than relying on other headers to include it
#include <fstream>
using namespace std; // NEVER globally use the entire standard namespace!
const int MAX_LOG_SIZE = 7584; // Can be declared 'constexpr'
const string LOGFILE ="crimes.dat";
const string OUTPUT_FILE ="crimesorted.log";
/*
NOTE:
> It often looks a lot cleaner to have a header part of your code
and then define your functions separately. This is good practice
for when you need to start using header files with big projects
*/
struct CrimeInfo
{ // Can declare all variables by only listing type once if they're all the same type
string Crimedescr;
string Date;
string Time;
string Address;
string Grid;
string Latitude;
string Longitude;
};
CrimeInfo crimeList [MAX_LOG_SIZE]; // This should be in 'main()'
/*
This should not be its own function.
Making too many functions can make things look a bit confusing.
Here, this is only 4 lines of code, so you shouldn't be making
an entire function for it.
*/
void openInputFile(ifstream& inputFile, string inputFilename)
{
inputFile.open(inputFilename.c_str());
while (inputFile.fail())
{
cout << "Failed to open input file: " << inputFilename << ".\n";
exit(1);
}
};
/*
This should also just be written out where its used. There's
no need to make a whole function for a task like this.
NAME CLASH HERE:
> The parameter LOGFILE shadows the global std::string LOGFILE!
*/
void getLogEntry(ifstream &LOGFILE, CrimeInfo &entry)
{
getline(LOGFILE, entry.Date);
getline(LOGFILE, entry.Time);
getline(LOGFILE, entry.Address);
getline(LOGFILE, entry.Grid);
getline(LOGFILE, entry.Crimedescr);
getline(LOGFILE, entry.Latitude);
getline(LOGFILE, entry.Longitude);
}
/*
This should not be its own function.
Making too many functions can make things look a bit confusing.
Here, this is only 4 lines of code, so you shouldn't be making
an entire function for it.
*/
void openOutputFile(ofstream& outputFile, string outputFilename)
{
outputFile.open(outputFilename.c_str());
if (outputFile.fail())
{
cout << "Failed to open output file: " << outputFilename << ".\n";
exit(2);
}
}
// It's a good idea to use some sort of documentation style for functions
void outputLogFile(
// Declare variables const when they aren't modified
/* (const) */ string outputFilename,
/* (const) */ CrimeInfo arr[],
/* (const) */ int size)
{
ofstream outputLogFile;
openOutputFile(outputLogFile, outputFilename); // Just write out the code
outputLogFile << "\nCrime log sort ^^:\n\n";
for (int i = 0; i < size; i++)
{
/*
You only need to declare the name of the stream one time
e.g.
outputLogFile << thing1 << thing2
<< thing3 << thing4 << thing5
<< thing6
<< endl;
*/
outputLogFile << arr[i].Date << " ";
outputLogFile << arr[i].Address << " (";
outputLogFile << arr[i].Longitude << " ";
outputLogFile << arr[i].Latitude << " ";
outputLogFile << arr[i].Time << " ";
outputLogFile << arr[i].Grid << " ";
outputLogFile << arr[i].Crimedescr << ""; // Empty quotes not needed here
outputLogFile << endl;
}
outputLogFile.close();
}
int main()
{
// What data are you outputting?
outputLogFile(OUTPUT_FILE, crimeList, MAX_LOG_SIZE);
// Are you trying to load the data you just outputted?
for (int i =0; i < MAX_LOG_SIZE; i++)
{ // I added these braces, but it's a good idea to always have braces
// You have not declared 'inputFile' anywhere
getLogEntry(inputFile, crimeList[i].Date);
}
}
Instead of leaving you to have to figure all that out on your own (I know how frustrating that can be), I went ahead and wrote your program how I'd do it. I tried to put comments in a lot of places to make it easy to follow along with. If you have any questions about it, feel free to ask me.
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
/*
If you're using C++17, the lines below can just become one line:
using std::cin, std::cout, std::endl, std::ifstream,
std::ofstream, std::string, std::getline;
*/
using std::cin;
using std::cout;
using std::endl;
using std::ifstream;
using std::ofstream;
using std::string;
constexpr int MAX_LOG_SIZE = 7584;
const string LOGFILE_NAME = "crimes.dat";
// I'm assuming: inputFile ^^^
// outputFile vvv
const string OUTPUT_FILE_NAME = "crimesorted.log";
/*
NOTE: If you're trying to export data to "crimesorted.log"
and then load it back into the program through "crimes.dat",
that will be a problem. I say this because that is the order
the main method in your original code had it in.
*/
// [BEGIN] Function Prototypes
// Structure of strings based on info from crimes.dat
struct CrimeInfo
{
string Crimedescr, Date, Time, Address,
Grid, Latitude, Longitude;
};
/** (This is JavaDoc-style documentation)
[Purpose of function here]
@param outputFile [Describe parameter here]
@param arr[] [Describe parameter here]
@param size_of_arr Size of 'arr[]'
*/
void outputLogFile(
ofstream& outputFile, // Changed to 'std::ofstream&' because I declare this in 'main()'
const CrimeInfo arr[],
const int size_of_arr);
// [END] Function Prototypes
int main()
{
// Create std::ifstream and open a file
ifstream file_to_load;
file_to_load.open(LOGFILE_NAME);
// Constructing and using 'crimeList' here allows the size to be known in
// this scope. However, if it's passed to a function, it's passed as a pointer
CrimeInfo crimeList[MAX_LOG_SIZE];
// Check if file was open and do stuff with it
if (file_to_load.is_open())
{ // File was opened
for (int i = 0; i < MAX_LOG_SIZE; i++)
{
getline(file_to_load, crimeList[i].Date);
getline(file_to_load, crimeList[i].Time);
getline(file_to_load, crimeList[i].Address);
getline(file_to_load, crimeList[i].Grid);
getline(file_to_load, crimeList[i].Crimedescr);
getline(file_to_load, crimeList[i].Latitude);
getline(file_to_load, crimeList[i].Longitude);
}
file_to_load.close(); // Close file
}
else
{ // File could not be opened
cout << "Could not open file: " << LOGFILE_NAME << endl;
return 1;
}
// Create std::ofstream and output the log
ofstream outputFile;
outputFile.open(OUTPUT_FILE_NAME);
// Check if 'outputFile' opened OUTPUT_FILE_NAME successfully
if(outputFile.is_open())
{ // File was opened
outputLogFile(outputFile, crimeList, MAX_LOG_SIZE);
outputFile.close();
}
else
{ // File could not be opened
cout << "Could not open file: " << OUTPUT_FILE_NAME << endl;
return 1;
}
}
// Function definition for outputLogFile()
void outputLogFile(
ofstream &outputFile,
const CrimeInfo arr[],
const int size_of_arr)
{
outputFile << "\nCrime log sort ^^:\n\n";
for (int i = 0; i < size_of_arr; i++)
{
outputFile
<< arr[i].Date << '\n' // Newlines may look better than spaces here
<< arr[i].Address << " ("
<< arr[i].Longitude << ", "
<< arr[i].Latitude << ")\n"
<< arr[i].Time << '\n'
<< arr[i].Grid << '\n'
<< arr[i].Crimedescr
<< endl;
}
}

Assertion failure when writing to a text file using ofstream

I am trying to write some string data that I read from the user to a .txt file, but after doing so the program shuts down instead of continuing. When I check the .txt file, I see part of the data and then some gibberish, and the run ends with an assertion failure. Here's the code:
#include "std_lib_facilities.h"
#include <fstream>
using namespace std;
using std::ofstream;
void beginProcess();
string promptForInput();
void writeDataToFile(vector<string>);
string fileName = "links.txt";
ofstream ofs(fileName.c_str(),std::ofstream::out);
int main() {
// ofs.open(fileName.c_str(),std::ofstream::out | std::ofstream::app);
beginProcess();
return 0;
}
void beginProcess() {
vector<string> links;
string result = promptForInput();
while(result == "Y") {
for(int i=0;i <= 5;i++) {
string link = "";
cout << "Paste the link skill #" << i+1 << " below: " << '\n';
cin >> link;
links.push_back(link);
}
writeDataToFile(links);
links.clear(); // erases all of the vector's elements, leaving it with a size of 0
result = promptForInput();
}
std::cout << "Thanks for using the program!" << '\n';
}
string promptForInput() {
string input = "";
std::cout << "Would you like to start/continue the process(Y/N)?" << '\n';
std::cin >> input;
return input;
}
void writeDataToFile(vector<string> links) {
if(!ofs) {
error("Error writing to file!");
} else {
ofs << "new ArrayList<>(Arrays.AsList(" << links[0] << ',' << links[1] << ',' << links[2] << ',' << links[3] << ',' << links[4] << ',' << links[5] << ',' << links[6] << ',' << "));\n";
}
}
The problem probably lies somewhere in the ofstream writing procedure, but I can't figure it out. Any ideas?
You fill the vector with 6 elements, at indices 0-5, but your writeDataToFile function then accesses links[6], which is out of bounds.
Another thing which is unrelated to your problem, but is good practice:
void writeDataToFile(vector<string> links)
declares a function that takes the vector by value, which copies it on every call. Unless you specifically want a copy of the input vector, you most probably want to pass a const reference, like so:
void writeDataToFile(const vector<string>& links)
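Hard-coding the indices is what made the out-of-bounds access possible. A minimal sketch, assuming the goal is one comma-separated line per batch of links, loops over whatever the vector actually holds. The name formatLinks is hypothetical; the surrounding text is copied from the question, except that no trailing comma is written before the closing parentheses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the output line from however many links exist,
// so no index can fall outside the vector.
std::string formatLinks(const std::vector<std::string>& links)
{
    std::string line = "new ArrayList<>(Arrays.AsList(";
    for (std::size_t i = 0; i < links.size(); ++i)
    {
        if (i != 0)
        {
            line += ',';
        }
        line += links[i];
    }
    line += "));\n";
    return line;
}
```

The body of writeDataToFile could then reduce to ofs << formatLinks(links); after the stream check.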

Using to upper incorrectly? Working code until I entered toupper

My program worked like it was supposed to until I added the toupper part into my program. I've tried looking at my error code but it's not really helping. The errors are:
no matching function to call
2 arguments expected, one provided
So I know the error is in those two statements in my while loop. What did I do wrong?
I want to make a name like
john brown
go to
John Brown
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
using namespace std;
int main(){
string firstname[5];
string lastname[5];
ifstream fin( "data_names.txt" );
if (!fin) {
cout << "There is no file" << endl;
}
int i = 0;
while( i < 5 && (fin >> firstname[i]) && (fin >> lastname[i]) ) {
firstname[0] = toupper(firstname[0]);
lastname[0] = toupper(lastname[0]);
i++;
}
cout << firstname[0] << " " << lastname [0] << endl;
cout << firstname[1] << " " << lastname [1] << endl;
cout << firstname[2] << " " << lastname [2] << endl;
cout << firstname[3] << " " << lastname [3] << endl;
cout << firstname[4] << " " << lastname [4] << endl;
return 0;
}
std::toupper works on individual characters, but you are trying to apply it to strings. Besides adding #include <cctype>, you need to modify your while loop's body:
firstname[i][0] = toupper(firstname[i][0]);
lastname[i][0] = toupper(lastname[i][0]);
i++;
Then it should work as expected.
As M.M helpfully pointed out in the comments, you should also check that your strings aren't empty before accessing their first characters, i.e. something like
if (!firstname[i].empty()) firstname[i][0] = toupper(...);
is strongly recommended.
Mind you, you will probably need more sophisticated logic if you get names like McDonald :)
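The two fixes above (index the character, and guard against empty strings) can be combined into one small helper. This is a sketch rather than the answerer's code: the name capitalize is hypothetical, and the cast to unsigned char is an addition, since passing a negative char value to toupper is undefined behaviour.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `name` with its first character uppercased.
// An empty string is returned unchanged.
std::string capitalize(std::string name)
{
    if (!name.empty())
    {
        name[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(name[0])));
    }
    return name;
}
```

Inside the while loop this would read firstname[i] = capitalize(firstname[i]); and likewise for lastname[i].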
You need ctype.h to get the proper definition for toupper(). It is usually implemented not as a function, but an array mapping.
#include <ctype.h>
The program has several flaws: using a string array instead of a string, not iterating through the string correctly, not declaring but using the C definition of toupper(), not exiting when the file does not exist.
Use this instead:
#include <ctype.h>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main ()
{
ifstream fin ("data_names.txt");
if (!fin)
{
cerr << "File missing" << endl;
return 1;
}
// not sure if you were trying to process 5 lines or five words per line
// but this will process the entire file
string s;
while (fin >> s) // stops as soon as a read fails, instead of testing eof() first
{
// uppercase every character of the word
for (string::size_type i = 0; i < s.length(); ++i)
s [i] = toupper (static_cast<unsigned char> (s [i]));
cout << s << endl;
}
return 0;
}

Reading multiple files one by one

I am almost done with my project, but I still have one small thing that needs to be done...I need to run the entire program for each file in the directory. There are about 200 files in total. Below is the main class of the program that needs to run. I'm thinking I will put the entire thing in a do-while loop and run it until there are no more .dat files in the directory, but I'm not sure if that will work. Obviously, I'd like to replace the hard-coded file names with variables...I'm just not sure how to do that, either. Please let me know if you need clarification. I've been working on this project for a while and I'm getting kind of brain-numb. Thanks in advance for your help!
Edit: My test directory is on a Windows machine, but it will be uploaded to a Linux machine at school.
int main() {
NearestNeighbor face;
//string path = "C:\Users\Documents\NetBeansProjects\CSCE350";
//string searchPattern = "*dat";
// string fullSearchPath = path + searchPattern;
/*TEMPLATE DATA*/
/***********************************************************************************/
fstream templateData;
double data = 0.0;
templateData.open("003_template.dat", std::ios::in);
//check that the file is opened
if (!templateData.is_open()) {
std::cerr << "Template: Nooooooo!\n";
exit(0);
}
/*************************************************************************************/
//fill the templateVector with the values from templateData
std::vector<std::vector<double> > templateVector;
std::string line;
while (getline(templateData, line, '\n'))
templateVector.push_back(face.splitData(line));
//testing the contents of the templateVector
// cout << "TemplateVector: ";
// for (unsigned i = 0u; i != templateVector.size(); ++i) {
//
// std::cout << "Index[" << i << "] ";
// for(double value : templateVector[i])
// std::cout << value << " ";
// std::cout << "\n";
// }
/*QUERY DATA*/
/************************************************************************************/
std::ifstream inFile("003_AU01_query.dat", std::ios::in);
std::vector<double> queryVector;
double pixel = 0.0;
// Check that the file opened
if (!inFile.is_open()) {
std::cerr << "Query: Nooooooo!\n";
exit(1);
}
// fill the queryVector with the query data
while (inFile >> pixel) {
queryVector.push_back(pixel);
}
inFile.close();
// testing the content of the query vector
// for (unsigned i =0u; i < pixels.size(); i++){
// std::cout << "Index["<< i << "] " << pixels[i];
// }
// std::cout << "\n";
/*OUTPUT SCALAR PRODUCT*/
/****************************************************************************************/
vector<double> theList;
/*break out each of the vectors from the templateVector and compute the scalar product*/
for (auto& vec : templateVector) {
int i;
cout << "\nscalar_product: Index[" << i << "] " << face.scalar_product(vec, queryVector);
theList.push_back(face.scalar_product(vec, queryVector));//fill theList vector with the computations
i++;
std::cout << "\n";
}
//make sure that the sorted products are output with their original index numbers
vector<pair<int, double> > sorted;
sorted.reserve(theList.size());
for(size_t i = 0.00; i != theList.size(); i++){
sorted.push_back(make_pair(theList[i], i));
}
//sort the scalar products and print out the 10 closest neighbors
face.quickSort(sorted);
cout << "\nVector after sort:\n";
for(size_t i = 0; i < 10; i++){
cout << "idx: " << sorted[i].second << " " << "val: " << sorted[i].first << endl;
}
}
A solution in bash:
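#!/bin/bash
# Let the shell expand the pattern instead of parsing the output of ls,
# and quote the variable so names containing spaces stay in one piece
for file in *.dat
do
./program "$file"
done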
Of course you'd have to modify your main function to take an argument to pass to the fstream constructor:
int main(int argc, char **argv)
{
if (argc != 2)
{
// some error handling code
}
ifstream templateData(argv[1]);
if (!templateData)
{
// more error handling
}
// process the file
}
Judging from your code, you are on Windows.
This code prints the names of all the *.dat files in your folder; instead of printing, do whatever you like with each one.
First you'll need to include:
#include <windows.h>
Now to the code:
const wstring dir = L"C:\\Users\\Documents\\NetBeansProjects\\CSCE350";
const wstring ext = L"dat";
wstring findstr = dir;
findstr += L"\\*.";
findstr += ext;
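WIN32_FIND_DATAW ffd; // explicit wide-character variants, since the paths are wstring
HANDLE hFind = FindFirstFileW(findstr.c_str(),&ffd);
if(hFind == INVALID_HANDLE_VALUE){
// no matching file, or the directory could not be read
wcerr<<L"No matching files found"<<endl;
}else{
do{
if(!(ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)){
wstring path = dir;
path += L"\\";
path += ffd.cFileName;
wcout<< path<<endl;
}
} while (FindNextFileW(hFind, &ffd) != 0);
FindClose(hFind);
}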
EDIT:
On Linux you can pass path/to/dir/*.dat as a parameter to your program; the shell expands it and you get the file names through argv, which may be the better solution.
But if you insist on doing it in code, it goes like this:
Includes:
#include <sys/types.h>
#include <dirent.h>
Now the code:
const string dirname = "path/to/dir";
const string ext = ".dat";
DIR *dir;
struct dirent *de;
if((dir = opendir(dirname.c_str())) == NULL) {
//error... check errno and so on
cerr<<"Error..."<<endl;
}else{
while ((de = readdir(dir)) != NULL) {
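//you can use stat to check whether it is a file or a directory...
string filename(de->d_name);
// compare the tail of the name with ext; the size check keeps substr in range
if(filename.size() >= ext.size() && filename.substr(filename.size()-ext.size()) == ext){
cout<<dirname<<"/"<<filename<<endl;
}
}
closedir(dir);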
}
Good luck
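Putting the Linux pieces together, a minimal sketch that collects the matching paths into a vector instead of printing them might look like this. It assumes a POSIX system with dirent.h; the function name listFilesWithExt is hypothetical, and a directory that cannot be opened simply yields an empty list.

```cpp
#include <dirent.h>
#include <string>
#include <vector>

// Hypothetical helper: returns "dirname/entry" for every entry whose name ends in `ext`.
// Entries are not checked with stat(), so a directory named "x.dat" would also match.
std::vector<std::string> listFilesWithExt(const std::string& dirname,
                                          const std::string& ext)
{
    std::vector<std::string> paths;
    DIR* dir = opendir(dirname.c_str());
    if (dir == nullptr)
    {
        return paths; // could not open the directory
    }
    dirent* de = nullptr;
    while ((de = readdir(dir)) != nullptr)
    {
        const std::string filename(de->d_name);
        // the size check keeps the offset passed to compare() in range
        if (filename.size() >= ext.size() &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0)
        {
            paths.push_back(dirname + "/" + filename);
        }
    }
    closedir(dir);
    return paths;
}
```

Each returned path can be passed straight to std::ifstream, so the body of the question's main could run once per element of the vector.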

C++ - Reading Columns of a CSV files and only keeping ones that start with a specific string

I am trying to figure out how to sort through CSV files to help organize data that I need for an economics paper. The files are massive and there are a lot of them (about 587 MB of zipped files). The files are organized by columns: all the variable names are in the first line and all the data for each variable is below it. My goal is to take only the columns whose names start with a given string (e.g. input: "MC1", get: MC10RT2, MC1WE02, ...) and then save them into a separate file. Does anyone have any advice on what form the code should take?
Just for fun, here is a small program that should work for you. The part you'll be interested in is boost::split(columns, str, boost::is_any_of(","), boost::token_compress_off); which creates a vector of strings from your CSV-style string.
It is a very basic example, but your question was an excuse to play a bit with the Boost string algorithms, which I knew of but had never used...
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <fstream>
#include <iterator> // std::ostream_iterator, used in main()
#include <string>
#include <vector>
#include <set>
// Typedefs for eye candy
typedef std::vector<std::string> Fields;
typedef std::vector<Fields> Results;
typedef std::set<unsigned long> Columns;
// Split the CSV string to a vector of string
Fields split_to_fields(const std::string& str)
{
Fields columns;
boost::split(columns, str, boost::is_any_of(","),
boost::token_compress_off);
return columns;
}
// Read all the wanted columns
Results read_columns_of_csv(std::istream& stream, const Columns& wanted_columns)
{
std::string str;
Results results;
while (getline(stream, str))
{
Fields line{split_to_fields(str)};
Fields fields;
for (unsigned long wanted_column: wanted_columns)
{
if (line.size() <= wanted_column) // wanted_column is a 0-based index
{
std::cerr << "Line " << (results.size() + 1 )
<< " does not contain enough fields: "
<< line.size() << " < " << (wanted_column + 1)
<< std::endl;
}
else
{
fields.push_back(line[wanted_column]);
}
}
results.push_back(fields);
}
return results;
}
// Read the ids of the columns you want to get
Columns read_wanted_columns(unsigned long max_id)
{
Columns wanted_columns;
unsigned long column;
do
{
std::cin >> column;
if ((column < max_id)
&& (column > 0))
{
wanted_columns.insert(column - 1);
}
}
while (column > 0);
return wanted_columns;
}
// Whole read process (header + columns)
Results read_csv(std::istream& stream)
{
std::string str;
if (!getline(stream, str))
{
std::cerr << "Empty file !" << std::endl;
return Results{};
}
// Get the column name
Fields columns{split_to_fields(str)};
// Output the column with id
unsigned long column_id = 1;
std::cout
<< "Select one of the column by entering its id (enter 0 to end): "
<< std::endl;
for (const std::string& elem: columns)
{
std::cout << column_id++ << ": " << elem << std::endl;
}
// Read the chosen cols
return read_columns_of_csv(stream, read_wanted_columns(column_id));
}
int main(int argc, char* argv[])
{
// Manage errors for filename
if (argc < 2)
{
std::cerr << "Please specify a filename" << std::endl;
return -1;
}
std::ifstream file(argv[1]);
if (!file)
{
std::cerr << "Invalid filename: " << argv[1] << std::endl;
return -2;
}
// Process
Results results{read_csv(file)};
// Output
unsigned long line = 1;
std::cout << "Results: " << results.size() << " lines" << std::endl;
for (Fields fields: results)
{
std::cout << line++ << ": ";
std::copy(fields.begin(), fields.end(),
std::ostream_iterator<std::string>(std::cout, ","));
std::cout << std::endl;
}
return 0;
}
I suggest using a vector of structures.
The structure will allow each field in a row to have a different type.
Your program would take on the following structure:
Read the data into the vector.
Extract the necessary fields from each structure in the vector and write them to a new file.
Close all files.