C++: Reading and storing information from a file

I have a text file that has the following data for a simple entry/exit system:
where each line has {time_stamp} {name} {door_entry} {status}
time_stamp - number of seconds since some arbitrary start time
name - the worker's username
door_entry - door number entered/exited
status - whether they entered or exited the door
The text file is large, with about 10,000 entries in this format.
Question: how can I break each line apart and store each piece of information in a variable? For example, I have this Worker class:
#include <string>

class Worker
{
    std::string staffToken;
    int doorEntry;
    std::string status;
public:
    Worker();
};
I want to solve this problem with an array. I know I could use a std::vector or a std::map, but I want to solve it with an array.
I've created an array of pointers to Worker objects:
typedef Worker * WorkPtr;
WorkPtr * workers = new WorkPtr[MAX]; // MAX is some large constant
for (int i = 0; i < MAX; ++i)
{
    workers[i] = new Worker();
}
The goal is simply to check the text file for unusual activity, i.e. a worker who has entered or exited several times in a row.

You can use this template to split a string on a given delimiter:
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

template<typename Out>
void split(const std::string &s, char delim, Out result) {
    std::stringstream ss;
    ss.str(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}
For example:
std::ifstream fin("your_file"); // needs #include <fstream>
std::string str;
while (std::getline(fin, str))
{
    std::vector<std::string> res = split(str, ' ');
    // process the fields in res
}

Related

How do I sort string vectors based on 2nd column/3rd column etc?

I have a vector<string> data organized like this:
//NAME ID AGE
//NAME ID AGE
//NAME ID AGE
//NAME ID AGE
I can sort it alphabetically by name. How can I sort it in ascending order by the 2nd or 3rd column instead? Thank you for any assistance and advice.
One overload of std::sort takes a third parameter, a comparison function, which lets you supply your own ordering logic.
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// get nth token from a string
std::string getnTh(const std::string & str, int n)
{
    std::istringstream strm(str);
    std::string result;
    for (int count = 0; count < n; count++)
    {
        if (!(strm >> result))
        {
            throw std::out_of_range("ran out of tokens before n");
        }
    }
    return result;
}
// get ID, second token, from string
std::string get_ID(const std::string & str)
{
    return getnTh(str, 2);
}
// compare the ID field, second token, in two strings
bool cmp_ID(const std::string &a, const std::string &b)
{
    std::string tokena = get_ID(a);
    std::string tokenb = get_ID(b);
    return tokena < tokenb;
}
int main()
{
    std::vector<std::string> data {"c c c ", "b b b " , "a a a"};
    std::sort (data.begin(), data.end(), cmp_ID);
}
Note: This code could be crunched down a bit. I've broken it down step by step for easy reading.
Note: This is BRUTAL! It is constantly parsing the same strings over and over, a disgusting waste of effort.
Instead you should make a structure to store the already parsed string and store that structure in the std::vector.
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// stores a person
struct person
{
    std::string name;
    std::string ID;
    std::string age;
    // constructor to parse an input string into a new person
    person(const std::string & in)
    {
        std::istringstream strm(in);
        if (!(strm >> name >> ID >> age))
        {
            throw std::runtime_error("invalid person input");
        }
    }
};
// much simpler and faster compare function. All of the parsing is done once ahead of time.
bool cmp_ID(const person &a, const person &b)
{
    return a.ID < b.ID;
}
int main()
{
    // replaces vector<string> data
    std::vector<person> data {{"c c c"}, {"b b b"} , {"a a a"}};
    std::sort (data.begin(), data.end(), cmp_ID);
}
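The person struct above keeps AGE as a string, so sorting on it would be lexicographic ("10" sorts before "9"). If the third column should sort numerically, a sketch assuming AGE is always an integer could parse it as int once and pass a lambda to std::sort. Person, parsePerson and sortByAge are hypothetical names.

```cpp
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Variant of the person struct that stores AGE as an int, so sorting by age
// is numeric ("9" before "10") rather than lexicographic.
struct Person {
    std::string name;
    std::string id;
    int age = 0;
};

// Parses one "NAME ID AGE" line; throws if a field is missing or AGE is not a number.
Person parsePerson(const std::string& line) {
    std::istringstream strm(line);
    Person p;
    if (!(strm >> p.name >> p.id >> p.age)) {
        throw std::runtime_error("invalid person input: " + line);
    }
    return p;
}

// Sorts in place by the third column (AGE), ascending.
void sortByAge(std::vector<Person>& people) {
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.age < b.age; });
}
```

The same lambda pattern works for any column; use std::stable_sort instead if people with equal ages should keep their original relative order.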
You can scan each string character by character until you reach the first (or second) space.
The text after that space is the second (or third) column, which you can then extract and compare.
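A sketch of that scanning idea, using std::string::find to locate the spaces instead of a hand-written character loop. It assumes exactly one space between columns; nthField is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Returns the n-th (1-based) space-separated field of `line`, or an empty
// string if the line has fewer fields. Values of n below 1 behave like 1.
// Assumes single spaces between fields; consecutive spaces yield empty fields.
std::string nthField(const std::string& line, int n) {
    std::size_t start = 0;
    for (int field = 1; field < n; ++field) {
        std::size_t space = line.find(' ', start);
        if (space == std::string::npos) return "";  // ran out of fields
        start = space + 1;
    }
    std::size_t end = line.find(' ', start);
    return line.substr(start, end == std::string::npos ? std::string::npos : end - start);
}
```

A comparator could then be `[](const std::string& a, const std::string& b) { return nthField(a, 2) < nthField(b, 2); }`, though, as noted above, parsing once into a struct avoids re-scanning the strings on every comparison.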

C++ reading in a comma delimited file[with one section as mm/dd/yyyy] by line, placing into a struct securely

To expand on the title, I am reading a file line by line; each line looks like this:
FirstName,LastName,mm/dd/yyyy,SSN,Role,Salary,Zip,Phone
I have this code I just wrote, but I'm having trouble placing the values into my struct because I'm using std::string rather than char[]. I want to keep using std::string for later purposes. Also, please excuse any syntax errors; I haven't written C/C++ in a while. I have also read the question about the most elegant way to iterate over the words of a string, but I am still confused about how to handle the slashes in the date format. SSN and Salary are private members of a struct that will be pushed into a vector for later use. How can I do this using the C++ libraries? To be honest, istringstream confuses me, since the examples put some kind of parser directly inside their struct. Is that really the best way to accomplish what I'm trying to do?
char stringData[150]; //line to be read in
while(fgets(stringData, 150, infile) != NULL) {
    if( currentLine == 1) {
        //column names were just read into stringData | trash
    }
    else {
        lineSize = sscanf(stringData, "%[^,],%[^,],%d/%d/%d,%d,%[^,],%lf,%[^,],%s", temp.firstName,temp.lastName,
            &temp.birthMonth,&temp.birthDay,&temp.birthYear,
            &tempSSN, temp.role, &tempSalary, temp.zip,
            temp.phoneNum);
        if(lineSize != 10) { //error message due to a row being incorrect
            cerr << "/* ERROR: WRONG FORMAT OF INPUT(TOO FEW OR TOO MANY ARGUMENTS) ON LINE: */" << currentLine << '\n';
            exit(1);
        }
        temp.setSSN(tempSSN);
        temp.setSalary(tempSalary);
        vector.push_back(temp);//push Employee temp into the vector and repeat loop
    }
    currentLine++;
}
TL;DR: What is the easiest way to do this using c++ libraries?
As Sam Varshavchik already mentioned, the easiest way is to split the input on ',' and then split the date field again on '/'.
I use the following approach, taken from a well-known question, to split a string:
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

template<typename Out>
void split(const std::string &s, char delim, Out result)
{
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim))
    {
        *(result++) = item;
    }
}
std::vector<std::string> split(const std::string &s, char delim)
{
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}
Assuming this is your structure:
struct info
{
std::string firstName;
std::string lastName;
std::string birthMonth;
std::string birthDay;
std::string birthYear;
std::string tempSSN;
std::string role;
std::string tempSalary;
std::string zip;
std::string phoneNum;
};
I would implement the function you need like this:
#include <fstream>

void parser(const std::string& fileName, std::vector<info> &inf)
{
    std::string line;
    std::ifstream infile(fileName);
    // note: a header line, if present, is parsed as a record too
    while(std::getline(infile, line))
    {
        // at() throws std::out_of_range if a line has too few fields
        std::vector<std::string> comma_separated = split(line, ',');
        info rec;
        rec.firstName = comma_separated.at(0);
        rec.lastName = comma_separated.at(1);
        rec.tempSSN = comma_separated.at(3);
        rec.role = comma_separated.at(4);
        rec.tempSalary = comma_separated.at(5);
        rec.zip = comma_separated.at(6);
        rec.phoneNum = comma_separated.at(7);
        // split the mm/dd/yyyy field a second time, on '/'
        std::vector<std::string> slash_separated = split(comma_separated.at(2), '/');
        rec.birthMonth = slash_separated.at(0);
        rec.birthDay = slash_separated.at(1);
        rec.birthYear = slash_separated.at(2);
        inf.push_back(rec);
    }
}
Then you can use it like this:
int main()
{
std::vector<info> information;
parser("some file", information);
return 0;
}
There you go: the parsed records are now stored in the information vector.
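If you want numeric fields rather than strings, a sketch along the same lines could read each field with std::getline and a delimiter (the date is consumed with '/' as the delimiter, so no second split is needed) and convert with std::stoi and std::stod. It assumes the first line holds column names, as in the question's code, and that every data line is complete. Employee and parseEmployees are hypothetical names, and SSN is kept as text so leading zeros survive.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for FirstName,LastName,mm/dd/yyyy,SSN,Role,Salary,Zip,Phone.
struct Employee {
    std::string firstName, lastName;
    int birthMonth = 0, birthDay = 0, birthYear = 0;
    std::string ssn;  // kept as text to preserve leading zeros
    std::string role;
    double salary = 0.0;
    std::string zip, phone;
};

// Parses every data line of `in`, discarding the first (column-name) line.
// std::stoi / std::stod throw std::invalid_argument on non-numeric fields.
std::vector<Employee> parseEmployees(std::istream& in) {
    std::vector<Employee> out;
    std::string line;
    std::getline(in, line);  // discard header
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string month, day, year, salary;
        Employee e;
        std::getline(ls, e.firstName, ',');
        std::getline(ls, e.lastName, ',');
        std::getline(ls, month, '/');  // date field: mm/dd/yyyy
        std::getline(ls, day, '/');
        std::getline(ls, year, ',');
        std::getline(ls, e.ssn, ',');
        std::getline(ls, e.role, ',');
        std::getline(ls, salary, ',');
        std::getline(ls, e.zip, ',');
        std::getline(ls, e.phone);  // last field runs to end of line
        e.birthMonth = std::stoi(month);
        e.birthDay = std::stoi(day);
        e.birthYear = std::stoi(year);
        e.salary = std::stod(salary);
        out.push_back(e);
    }
    return out;
}
```

With private SSN and Salary members, you would call setters such as the question's setSSN/setSalary with the converted values instead of assigning the fields directly.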

Insert object in c++ list

I have the following C++ program. The goal is to receive some logs (which come from the filedata variable), parse them, and save them to a list of objects.
Parsing works fine. The problem is when I iterate over the parsed vector and try to save some of its elements to the list. I can get the elements (for example, elements.at(0) gives me the first column, the timestamp), but when I save them to my list, the log is saved twice (in this case filedata only has two logs), and both entries are the SAME log.
My C++ program:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
#include <list>
#include <utility>
class UserLogRecord
{
public:
std::string timestamp;
std::string id;
std::string name;
std::string data;
};
std::vector<std::string> explode(std::string const & s, char delim)
{
std::vector<std::string> result;
std::istringstream iss(s); // sequence of characters
for (std::string token; std::getline(iss, token, delim); )
{
result.push_back(std::move(token)); //transfer token to vector
}
return result;
}
int main()
{
const char* filedata =
"1496843100;2017-06-07 13:45:00;000002D8;2600;user1\n"
"9999999999;2017-06-07 13:45:00;000002D9;2800;user2";
std::vector<std::string> lines = explode(filedata,'\n');
std::list<UserLogRecord* > userLogRecords;
UserLogRecord* userLogRecord = new UserLogRecord;
//vector
for(int i = 0; i < lines.size(); i++)
{
std::vector<std::string> elements = explode(lines[i], ';');
userLogRecord->timestamp = elements.at(0);
userLogRecord->id = elements.at(2);
userLogRecord->name = elements.at(4);
userLogRecord->data = elements.at(3);
userLogRecords.push_back(userLogRecord);
}
//list of logs
std::list<UserLogRecord* >::const_iterator itLog = userLogRecords.begin();
while (itLog != userLogRecords.end())
{
std::cout << '\n' + (*itLog)->timestamp + '\n';
std::cout << (*itLog)->id + '\n';
std::cout << (*itLog)->name + '\n';
std::cout << (*itLog)->data + '\n';
++itLog;
}
return 0;
}
Output (when iterating over the list):
9999999999
000002D9
user2
2800
9999999999
000002D9
user2
2800
Expected output:
1496843100
000002D8
user1
2600
9999999999
000002D9
user2
2800
My program saves two logs (that part is right, I only have two logs), but it always saves the same one (curiously, the last log).
You are only using one UserLogRecord and simply modifying its data instead of adding a new element to the list.
std::list::push_back adds a copy of the pointer to UserLogRecord, and that pointer stays the same for every iteration of the loop. The same pointer is pushed into the list multiple times, so all elements show the same data.
The following code fixes the issue by creating a new UserLogRecord on each iteration (remove the single allocation before the loop). You must still remember to free the allocated memory with delete when you no longer need it.
//vector
for(int i = 0; i < lines.size(); i++)
{
UserLogRecord* userLogRecord = new UserLogRecord;
std::vector<std::string> elements = explode(lines[i], ';');
userLogRecord->timestamp = elements.at(0);
userLogRecord->id = elements.at(2);
userLogRecord->name = elements.at(4);
userLogRecord->data = elements.at(3);
userLogRecords.push_back(userLogRecord);
}
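An alternative that avoids new/delete altogether is to store the records by value in a std::list<UserLogRecord>: each push_back then copies (or moves) a distinct object into the list. A sketch under that assumption, reusing the question's class and explode helper; parseLogs is a hypothetical name.

```cpp
#include <list>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class UserLogRecord
{
public:
    std::string timestamp;
    std::string id;
    std::string name;
    std::string data;
};

// Same splitting helper as in the question.
std::vector<std::string> explode(const std::string& s, char delim)
{
    std::vector<std::string> result;
    std::istringstream iss(s);
    for (std::string token; std::getline(iss, token, delim); )
        result.push_back(std::move(token));
    return result;
}

// Builds a fresh UserLogRecord for every line and stores it by value,
// so each list element owns its own data and nothing needs to be deleted.
std::list<UserLogRecord> parseLogs(const std::string& filedata)
{
    std::list<UserLogRecord> records;
    for (const std::string& line : explode(filedata, '\n'))
    {
        std::vector<std::string> elements = explode(line, ';');
        UserLogRecord r;  // a new object on every iteration
        r.timestamp = elements.at(0);
        r.id = elements.at(2);
        r.name = elements.at(4);
        r.data = elements.at(3);
        records.push_back(std::move(r));
    }
    return records;
}
```

If pointers are really needed (for example, for polymorphism), std::list<std::unique_ptr<UserLogRecord>> with std::make_unique gives the same per-iteration allocation with automatic cleanup.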

Attempting to get data from lines in a file

I'm quite new to C++, so sorry if this is a dumb question!
For a project we are given a file with a couple of thousand lines of values, each line having 9 different numbers.
I want to create a for/while loop that, on each iteration, stores the 8th and 9th integers of a line in variables so that I can do some calculations with them. The loop would then move on to the next line, store the 8th and 9th numbers of that line in the same variables so that I can do the same calculation, and end when I've run out of lines.
My problem is not really reading the file; I'm just confused about how to take only the 8th and 9th values from each line.
Thanks for any help, it is greatly appreciated!
This is designed for readability rather than speed, and it does not check that the input file has the correct format.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

template<class T> T ConvertFromString(const std::string& s)
{
    std::istringstream ss(s);
    T x;
    ss >> x;
    return x;
}
std::vector<int> values8;
std::vector<int> values9;
std::ifstream file("myfile.txt");
std::string line;
while (std::getline(file, line))
{
    std::istringstream ss(line);
    for (int i = 0; i < 9; i++)
    {
        std::string token;
        ss >> token;
        switch (i)
        {
        case 7: // 8th value (indices start at 0)
            values8.push_back(ConvertFromString<int>(token));
            break;
        case 8: // 9th value
            values9.push_back(ConvertFromString<int>(token));
            break;
        }
    }
}
First, split the string, then convert the pieces to numbers with std::stoi (or atoi on c_str()). Then take the 8th and 9th values from the resulting vector.
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

//Split string
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}
//new code goes here; line holds one line read from the file (e.g. with std::getline)
std::string line;
std::vector<std::string> lineSplit = split(line, ' ');
std::vector<int> numbers;
for (std::size_t i = 0; i < lineSplit.size(); i++)
    numbers.push_back(std::stoi(lineSplit[i])); // or atoi(lineSplit[i].c_str())
int numb1 = numbers[7]; //8th
int numb2 = numbers[8]; //9th
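Since every value is an integer, you can also skip the string step entirely and extract the numbers with operator>>. A sketch assuming each valid line holds exactly nine whitespace-separated integers; lines with fewer are skipped. readColumns8And9 is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines of 9 whitespace-separated integers from `in` and appends the
// 8th and 9th value of each line to col8 / col9. Lines that do not contain
// 9 integers are skipped.
void readColumns8And9(std::istream& in, std::vector<int>& col8, std::vector<int>& col9) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int v[9];
        int count = 0;
        while (count < 9 && ss >> v[count]) ++count;
        if (count == 9) {
            col8.push_back(v[7]);  // 8th value (index 7)
            col9.push_back(v[8]);  // 9th value (index 8)
        }
    }
}
```

Call it with an std::ifstream opened on your data file, or do the calculation directly inside the `if (count == 9)` branch if you don't need to keep the values.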

C++ Join two pipe divided files on key fields

I am currently trying to write a C++ function that joins two pipe-delimited files, each with over 10,000,000 records, on one or two key fields.
The files look like this:
P2347|John Doe|C1234
P7634|Peter Parker|D2344
P522|Toni Stark|T288
and
P2347|Bruce Wayne|C1234
P1111|Captain America|D534
P522|Terminator|T288
Joining on fields 1 and 3, the expected output is:
P2347|C1234|John Doe|Bruce Wayne
P522|T288|Toni Stark|Terminator
What I am currently thinking about is using a set/array/vector to read in the files and create something like:
P2347|C1234>>John Doe
P522|T288>>Toni Stark
and
P2347|C1234>>Bruce Wayne
P522|T288>>Terminator
Then I would split off the first part as the key and match it against the second set/vector/array.
What I currently have: read in the first file, then match the second file against the set line by line. It takes the whole line and matches it:
#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <ctime>
using namespace std;
int main()
{
    clock_t startTime = clock();
    ifstream inf("test.txt");
    set<string> lines;
    string line;
    for (unsigned int i=1; std::getline(inf,line); ++i)
        lines.insert(line);
    ifstream inf2("test2.txt");
    clock_t midTime = clock();
    ofstream outputFile("output.txt");
    while (getline(inf2, line))
    {
        if (lines.find(line) != lines.end())
            outputFile << line << '\n';
    }
    return 0;
}
I am very happy for any suggestion. I am also happy to change the whole concept if there is any better (faster) way. Speed is critical as there might be even more than 10 million records.
EDIT: Another idea would be to use a map with the key fields as the map key, but this might be a little slower. Any suggestions?
Thanks a lot for any help!
I tried multiple ways to get this task done; none of them has been efficient so far:
Read everything into a set and parse the key fields into the format keys >> values, simulating an array-type set. Parsing took a long time, but memory usage stays relatively low. The code is not fully developed:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <set>
#include <ctime>
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}
std::string getSelectedRecords(std::string record, int position){
    std::string values;
    std::vector<std::string> tokens = split(record, ' ');
    //get position in vector
    for(auto& s: tokens)
        //pick last one or depending on number, not developed
        values = s;
    return values;
}
int main()
{
    clock_t startTime = clock();
    std::ifstream secondaryFile("C:/Users/Batman/Desktop/test/secondary.txt");
    std::set<std::string> secondarySet;
    std::string record;
    for (unsigned int i=1; std::getline(secondaryFile,record); ++i){
        std::string keys = getSelectedRecords(record, 2);
        std::string values = getSelectedRecords(record, 1);
        secondarySet.insert(keys + ">>>" + values);
    }
    clock_t midTime = clock();
    std::ifstream primaryFile("C:/Users/Batman/Desktop/test/primary.txt");
    std::ofstream outputFile("C:/Users/Batman/Desktop/test/output.txt");
    while (getline(primaryFile, record))
    {
        //rewrite find() function to go through set and find all keys (first part until >> ) and output values
        std::string keys = getSelectedRecords(record, 2);
        if (secondarySet.find(keys) != secondarySet.end())
            outputFile << record << '\n';
    }
    return 0;
}
Instead of pipes it currently splits on spaces, but that should not be a problem. Reading the data is very quick, but parsing it takes an awful lot of time.
The other option was a multimap: a similar concept, with the key fields pointing to values, but this one is very slow and memory-intensive.
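Much of that parsing cost typically comes from constructing a std::stringstream per line and copying every token into a new std::string. A sketch of a cheaper splitter, assuming C++17 is available: it returns std::string_view slices that point into the original line and reuses the caller's vector across lines. splitViews is a hypothetical name, and no timing figures are claimed.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `line` on `delim` without copying: each field is a std::string_view
// into `line`, so the underlying string must outlive the views. Reusing `out`
// across calls avoids reallocating the vector for every line.
void splitViews(std::string_view line, char delim, std::vector<std::string_view>& out) {
    out.clear();
    std::size_t start = 0;
    while (true) {
        std::size_t pos = line.find(delim, start);
        if (pos == std::string_view::npos) {
            out.push_back(line.substr(start));  // last field
            return;
        }
        out.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
}
```

In a read loop you would call splitViews(line, '|', fields) after each std::getline and copy only the fields you actually keep (for example, into a map key) into a std::string.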
#include <ctime>
#include <fstream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>
int main()
{
    std::clock_t startTime = clock();
    std::ifstream inf("C:/Users/Batman/Desktop/test/test.txt");
    typedef std::multimap<std::string, std::string> Map;
    Map map;
    std::string line;
    for (unsigned int i=1; std::getline(inf,line); ++i){
        //load tokens into vector
        std::istringstream buffer(line);
        std::istream_iterator<std::string> beg(buffer), end;
        std::vector<std::string> tokens(beg, end);
//get keys
for(auto& s: tokens)
//std::cout >>" second;
outputFile > a;
return 0;
}
Further thoughts: split the pipe-delimited files into separate files, one column each, right when importing the data. That way I won't have to parse anything and can read each column individually.
EDIT: optimized the first example with a recursive split function. It still takes more than 30 seconds for 100,000 records. I would like that to be faster, and the actual find() function is still missing.
Any thoughts?
Thanks!