Finding duplicates in JSON file after parsing with Boost


How can I find duplicates in a JSON file after parsing it out like the code below? I want to count the number of duplicates in the data, where a duplicate is an entry whose first name, last name, and email address all match.
The JSON file is rather huge, so I won't copy and paste it here. But here is a snippet of it:
[
{
"firstName":"Cletus",
"lastName":"Defosses",
"emailAddress":"ea4ad81f-4111-4d8d-8738-ecf857bba992.Defosses#somedomain.org"
},
{
"firstName":"Sherron",
"lastName":"Siverd",
"emailAddress":"51c985c5-381d-4d0e-b5ee-83005f39ce17.Siverd#somedomain.org"
},
{
"firstName":"Garry",
"lastName":"Eirls",
"emailAddress":"cc43c2da-d12c-467f-9318-beb3379f6509.Eirls#somedomain.org"
}]
This is the main.cpp file:
#include <iostream>
#include <string>
#include "Customer.h"
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/foreach.hpp>
using namespace std;
int main()
{
int numOfCustomers;
// parse the JSON file
boost::property_tree::ptree file;
boost::property_tree::read_json("customers.json", file);
cout << "Reading file..." << endl;
numOfCustomers = file.size();
// iterate over each top level entry
BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowPair, file.get_child(""))
{
// rowPair.first == "" and rowPair.second is the subtree with names and emails
// iterate over rows and columns
BOOST_FOREACH(boost::property_tree::ptree::value_type const& itemPair, rowPair.second)
{
// e.g. itemPair.first == "firstName" or "lastName"
cout << itemPair.first << ": ";
// e.g. itemPair.second is the actual names and emails
cout << itemPair.second.get_value<std::string>() << endl;
}
cout << endl;
}
cout << endl;
return 0;
}
The Customer class is just a generic class.
class Customer
{
private:
std::string m_firstName;
std::string m_lastName;
std::string m_emailAddress;
public:
std::string getFirstName();
void setFirstName(std::string firstName);
std::string getLastName();
void setLastName(std::string lastName);
std::string getEmailAddress();
void setEmailAddress(std::string emailAddress);
};

You'd typically insert the customer objects/keys into a std::set or std::map and define a total ordering that spots the duplicates on insertion.
Defining the key function and comparator object:
boost::tuple<string const&, string const&, string const&> key_of(Customer const& c) {
return boost::tie(c.getFirstName(), c.getLastName(), c.getEmailAddress());
}
struct by_key {
bool operator()(Customer const& a, Customer const& b) const {
return key_of(a) < key_of(b);
}
};
Now you can simply insert the objects in a set<Customer, by_key>:
set<Customer, by_key> unique;
// iterate over each top level array
BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowPair, file.get_child(""))
{
Customer current;
current.setFirstName ( rowPair.second.get ( "firstName", "?" ) ) ;
current.setLastName ( rowPair.second.get ( "lastName", "?" ) ) ;
current.setEmailAddress ( rowPair.second.get ( "emailAddress", "?" ) ) ;
if (unique.insert(current).second)
cout << current << "\n";
else
cout << "(duplicate skipped)\n";
}
Full Demo
I've duplicated one entry in your sample JSON to show a duplicate being detected; here is the full program:
#include <iostream>
#include <string>
#include <set>
#include "Customer.h"
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/foreach.hpp>
#include <boost/tuple/tuple_comparison.hpp>
using namespace std;
namespace {
boost::tuple<string const&, string const&, string const&> key_of(Customer const& c) {
return boost::tie(c.getFirstName(), c.getLastName(), c.getEmailAddress());
}
struct by_key {
bool operator()(Customer const& a, Customer const& b) const {
return key_of(a) < key_of(b);
}
};
inline ostream& operator<<(ostream& os, Customer const& c) {
return os << "{ '"
<< c.getFirstName() << "', '"
<< c.getLastName() << "', '"
<< c.getEmailAddress() << " }";
}
}
int main()
{
// parse the JSON file
boost::property_tree::ptree file;
boost::property_tree::read_json("customers.json", file);
cout << "Reading file..." << endl;
set<Customer, by_key> unique;
// iterate over each top level array
BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowPair, file.get_child(""))
{
Customer current;
current.setFirstName ( rowPair.second.get ( "firstName", "?" ) ) ;
current.setLastName ( rowPair.second.get ( "lastName", "?" ) ) ;
current.setEmailAddress ( rowPair.second.get ( "emailAddress", "?" ) ) ;
if (unique.insert(current).second)
cout << current << "\n";
else
cout << "(duplicate skipped)\n";
}
cout << "\n" << (file.size() - unique.size()) << " duplicates were found\n";
}
Prints:
Reading file...
{ 'Sherron', 'Siverd', '51c985c5-381d-4d0e-b5ee-83005f39ce17.Siverd#somedomain.org }
{ 'Cletus', 'Defosses', 'ea4ad81f-4111-4d8d-8738-ecf857bba992.Defosses#somedomain.org }
(duplicate skipped)
{ 'Garry', 'Eirls', 'cc43c2da-d12c-467f-9318-beb3379f6509.Eirls#somedomain.org }
1 duplicates were found
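NOTE: I've adjusted the getters to be less wasteful by returning const&, and made them const member functions so they can be called on the Customer const& that key_of and operator<< receive: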
std::string const& getFirstName() const { return m_firstName; }
std::string const& getLastName() const { return m_lastName; }
std::string const& getEmailAddress() const { return m_emailAddress; }
BONUS
Here's the equivalent program in 26 lines of C++14 code:

Related

Can't iterate through all the words in the file.txt

I have a text file that references two other text files, i.e. main.txt contains eg1.txt and eg2.txt. I have to read the content of those files, find the occurrences of every word, and return a string with each word and the documents it was present in (0 being eg1.txt and 1 being eg2.txt). My program compiles, but I can't get past the first word I encounter. It gives the right result for that word (word: 0 1), since the word is present in both files at the first position, but it doesn't return the other words. Could someone please help me find the error? Thank you.
string func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list (filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>&s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair<string, set<int> >(string(word) , s));
}
}
}
input_file.close();
}
fileNum++;
}

Basically, your function works. It is a little complicated, but it works.
After removing some syntax errors, the main problem is that you return nothing from your function, and there is also no output statement.
Let me show you the corrected function, which produces some output.
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using namespace std;
void func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list(filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>& s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair(string(word), s));
}
}
}
input_file.close();
}
fileNum++;
}
// Show the output
for (const auto& [word, fileNumbers] : invInd) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
return;
}
int main() {
func("files.txt");
}
This works, I tested it. But maybe you want to return the findings to your main function. Then you should write:
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using namespace std;
map<string, set<int> > func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list(filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>& s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair(string(word), s));
}
}
}
input_file.close();
}
fileNum++;
}
return invInd;
}
int main() {
map<string, set<int>> data = func("files.txt");
// Show the output
for (const auto& [word, fileNumbers] : data) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
}
Please enable C++17 in your compiler.
Below is a brushed-up solution: a little cleaner and more compact, with comments and better variable names.
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using WordFileIndicator = std::map<std::string, std::set<int>>;
WordFileIndicator getWordsWithFiles(const std::string& fileNameForFileLists) {
// The resulting output will be stored here
WordFileIndicator wordFileIndicator{};
// Open the file and check, if it could be opened
if (std::ifstream istreamForFileList{ fileNameForFileLists }; istreamForFileList) {
// File number Reference
int fileNumber{};
// Read all filenames from the list of filenames
for (std::string fileName{}; std::getline(istreamForFileList, fileName) and not fileName.empty();) {
// Open the files to read their content. Check, if the file could be opened
if (std::ifstream ifs{ fileName }; ifs) {
// Add word and associated file number to set
for (std::string word{}; ifs >> word; )
wordFileIndicator[word].insert(fileNumber);
}
else std::cerr << "\n*** Error: Could not open '" << fileName << "'\n\n";
// Continue with next file
++fileNumber;
}
}
else std::cerr << "\n*** Error: Could not open '" << fileNameForFileLists << "'\n\n";
return wordFileIndicator;
}
// Some test code
int main() {
// Get the result: all words and the files in which they exist
WordFileIndicator data = getWordsWithFiles("files.txt");
// Show the output
for (const auto& [word, fileNumbers] : data) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
}
A faster solution is likely possible with std::unordered_map and std::unordered_set, which offer average constant-time lookups instead of the logarithmic lookups of std::map and std::set.

Please make sure your code is composed of many small functions. This improves readability, makes it easier to reason about what the code does, and lets parts of the code be reused in other contexts.
Here is a demo of how that can look and why it is better to have small functions:
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>
struct FileData
{
std::filesystem::path path;
int index;
};
bool operator==(const FileData& a, const FileData& b)
{
return a.index == b.index && a.path == b.path;
}
bool operator!=(const FileData& a, const FileData& b)
{
return !(a == b);
}
using WordLocations = std::unordered_map<std::string, std::vector<FileData>>;
template<typename T>
void mergeWordsFrom(WordLocations& loc, const FileData& fileData, T b, T e)
{
for (; b != e; ++b)
{
auto& v = loc[*b];
if (v.empty() || v.back() != fileData)
v.push_back(fileData);
}
}
void mergeWordsFrom(WordLocations& loc, const FileData& fileData, std::istream& in)
{
return mergeWordsFrom(loc, fileData, std::istream_iterator<std::string>{in}, {});
}
void mergeWordsFrom(WordLocations& loc, const FileData& fileData)
{
std::ifstream f{fileData.path};
return mergeWordsFrom(loc, fileData, f);
}
template<typename T>
WordLocations wordLocationsFromFileList(T b, T e)
{
WordLocations loc;
FileData fileData{{}, 0};
for (; b != e; ++b)
{
++fileData.index;
fileData.path = *b;
mergeWordsFrom(loc, fileData);
}
return loc;
}
WordLocations wordLocationsFromFileList(std::istream& in)
{
return wordLocationsFromFileList(std::istream_iterator<std::filesystem::path>{in}, {});
}
WordLocations wordLocationsFromFileList(const std::filesystem::path& p)
{
std::ifstream f{p};
f.exceptions(std::ifstream::badbit);
return wordLocationsFromFileList(f);
}
void printLocations(std::ostream& out, const WordLocations& locations)
{
for (auto& [word, filesData] : locations)
{
out << std::setw(10) << word << ": ";
for (auto& file : filesData)
{
out << std::setw(3) << file.index << ':' << file.path << ", ";
}
out << '\n';
}
}
int main()
{
auto locations = wordLocationsFromFileList("files.txt");
printLocations(std::cout, locations);
}
https://wandbox.org/permlink/nBbqYV986EsqvN3t

creating a class vector that does not delete its content

I am a beginner, so I wanted to ask: can we create a vector/array of class objects that does not delete its content when I close the program? For example, I want to keep customer records, but whenever we restart the program we need to enter the customer details again and again.
How can I prevent that from happening?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class customer{
public:
int balance;
string name;
int password;
};
int main(){
vector <customer> cus;
...
if(choice == 1){
cout << cus[i].balance;
}
return 0;
}

As a complement to Adam's answer, it is possible to encapsulate the serialization in the container class itself. Here is a simplified example:
The header file defining a persistent_vector class that saves its content to a file:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <initializer_list>
namespace {
// Utility functions able to store one element of a trivially copyable type
template <class T>
std::ostream& store1(std::ostream& out, const T& val) {
out.write(reinterpret_cast<const char*>(&val), sizeof(val));
return out;
}
template <class T>
std::istream& load1(std::istream& in, T& val) {
in.read(reinterpret_cast<char*>(&val), sizeof(val));
return in;
}
// Specialization for the std::string type
template <>
std::ostream& store1<std::string>(std::ostream& out, const std::string& val) {
store1<size_t>(out, val.size());
if (out) out.write(val.data(), val.size());
return out;
}
template <>
std::istream& load1<std::string>(std::istream& in, std::string& val) {
size_t len;
load1<size_t>(in, len);
if (in) {
char* data = new char[len];
in.read(data, len);
if (in) val.assign(data, len);
delete[] data;
}
return in;
}
}
template <class T>
class persistent_vector {
const std::string path;
std::vector<T> vec;
// load the vector from a file
void load() {
std::ifstream in(path, std::ios::binary); // raw bytes: avoid newline translation
if (in) {
for (;;) {
T elt;
load1(in, elt);
if (!in) break;
vec.push_back(elt);
}
if (!in.eof()) {
throw std::istream::failure("Read error");
}
in.close();
}
}
// store the vector to a file
void store() {
std::ofstream out(path, std::ios::binary); // raw bytes: avoid newline translation
size_t n = 0;
if (out) {
for (const T& elt : vec) {
store1(out, elt);
if (!out) break;
++n;
}
}
if (!out) {
std::cerr << "Write error after " << n << " elements on " << vec.size() << '\n';
}
}
public:
// a bunch of constructors, first ones load data from the file
persistent_vector(const std::string& path) : path(path) {
load();
}
persistent_vector(const std::string& path, size_t sz) :
path(path), vec(sz) {
load();
};
// last 2 constructors ignore the file because they do receive data
persistent_vector(const std::string& path, size_t sz, const T& val) :
path(path), vec(sz, val) {
};
persistent_vector(const std::string& path, std::initializer_list<T> ini) :
path(path), vec(ini) {
}
// destructor stores the data to the file before actually destroying it
~persistent_vector() {
store();
}
// direct access to the vector (const and non const versions)
std::vector<T>& data() {
return vec;
}
const std::vector<T>& data() const {
return vec;
}
};
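It can, out of the box, handle any trivially copyable type and std::string. The user has to provide specializations of store1 and load1 for other types, such as the question's customer class, which contains a std::string and is therefore not trivially copyable.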
Here is a trivial program using it:
#include <iostream>
#include <limits>
#include <string>
#include "persistent_vector.h"
int main() {
std::cout << "Create new vector (0) or read an existing one (1): ";
int cr;
std::cin >> cr;
if (!std::cin || (cr != 0 && cr != 1)) {
std::cout << "Incorrect input\n";
return 1;
}
if (cr == 0) {
persistent_vector<std::string> v("foo.data", 0, "");
// skip to the end of line...
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
for (;;) {
std::string line;
std::cout << "Enter a string to add to the vector (empty string to end program)\n";
std::getline(std::cin, line);
if (line.empty()) break;
v.data().push_back(line);
}
}
else {
persistent_vector<std::string> v("foo.data");
for (const std::string& i : v.data()) {
std::cout << i << '\n';
}
}
return 0;
}

When programmers create a vector class, they must ensure that the resources it acquires are released when they are no longer needed (see RAII).
C++ Reference : https://en.cppreference.com/w/cpp/language/raii
Wikipedia : https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
Stack Overflow : What is meant by Resource Acquisition is Initialization (RAII)?
Microsoft : https://learn.microsoft.com/en-us/cpp/cpp/object-lifetime-and-resource-management-modern-cpp?view=msvc-170
Before the program closes, all resources must be released
(no leaking resources, memory included).
It is therefore not possible to create a vector class whose contents survive after the program closes: the operating system reclaims a program's memory when the program exits.
If you want the program not to lose customer information after closing, you need to save the information on a persistent (non-volatile) storage device, such as a disk.
As CinCout, 김선달, and Serge Ballesta say, you have to save the customer information to a file, and write the program so that it reads that file when it starts.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
struct customer {
std::string name;
int balance;
int password;
};
int main() {
std::vector <customer> customers;
std::ifstream ifs("info.txt");
{
customer customer{};
while (ifs >> customer.name >> customer.balance >> customer.password)
customers.push_back(customer);
}
for (const auto& [name, balance, password] : customers) {
std::cout <<
"\nName : " << name <<
"\nBalance : " << balance <<
"\nPassword : " << password <<
'\n';
}
std::cout << "\n\nWelcome\n\n";
std::ofstream ofs("info.txt", std::ios_base::app);
char cont{};
do {
customer customer{};
std::cout << "Name : ";
std::cin >> customer.name;
std::cout << "Balance : ";
std::cin >> customer.balance;
std::cout << "Password : ";
std::cin >> customer.password;
ofs << customer.name << ' ' << customer.balance << ' ' << customer.password << '\n';
customers.push_back(customer); // keep the in-memory list in sync with the file
std::cout << "Add another customer? (Y/N) : ";
std::cin >> cont;
} while (cont == 'Y' || cont == 'y');
for (const auto& [name, balance, password] : customers) {
std::cout <<
"\nName : " << name <<
"\nBalance : " << balance <<
"\nPassword : " << password <<
'\n';
}
}
CPlusPlus : https://www.cplusplus.com/doc/tutorial/files/
LearnCpp : https://www.learncpp.com/cpp-tutorial/basic-file-io/
(About File I/O)
This program is a prototype; I left some things incomplete (input validation, user-defined I/O operators, duplicated code, formatting, reallocations of customers, ifs is no longer needed after the initial read, ...).
I suggest you read the book "Programming: Principles and Practice Using C++"; I'm reading it and it has helped me a lot.
(I’m also a beginner)
Edit: I also suggest you use "using namespace std;" only for small projects, examples or simple exercises.
Do not use "using namespace std;" for real projects, large projects, or projects that include other dependencies, because it can lead to naming collisions between names within std and names from other code and libraries.
It’s not good practice to use it all the time.

Overload the subscript operator to call a function based on the type assigned

I have an object that has functions like addString and addInteger. These functions add data to a JSON string. At the end, the JSON string can be obtained and sent out. How can this be made easier by overloading subscript operators to do the following?
jsonBuilder builder;
builder[ "string_value" ] = "Hello";
builder[ "int_value" ] = 5;
builder[ "another_string" ] = "Thank you";

You need to have a proxy class that is returned by the operator[] function and which handles the assignment. The proxy class then overloads the assignment operator to handle strings and integers differently.
Something like this:
#include <iostream>
#include <string>
struct TheMainClass
{
struct AssignmentProxy
{
std::string name;
TheMainClass* main;
AssignmentProxy(std::string const& n, TheMainClass* m)
: name(n), main(m)
{}
TheMainClass& operator=(std::string const& s)
{
main->addString(name, s);
return *main;
}
TheMainClass& operator=(int i)
{
main->addInteger(name, i);
return *main;
}
};
AssignmentProxy operator[](std::string const& name)
{
return AssignmentProxy(name, this);
}
void addString(std::string const& name, std::string const& str)
{
std::cout << "Adding string " << name << " with value \"" << str << "\"\n";
}
void addInteger(std::string const& name, int i)
{
std::cout << "Adding integer " << name << " with value " << i << "\n";
}
};
int main()
{
TheMainClass builder;
builder[ "string_value" ] = "Hello";
builder[ "int_value" ] = 5;
builder[ "another_string" ] = "Thank you";
}

I think this is what you need. I have implemented it for string input; do the same for integers.
#include <iostream>
#include <string>
#include <map>
class jsonBuilder
{
public:
std::map<std::string,std::string> json_container;
std::string& operator[](const std::string& key)
{
// std::map::operator[] inserts an empty value if the key is missing
// and returns a reference to the stored value
return json_container[key];
}
};
int main()
{
jsonBuilder jb;
jb["a"]="b";
std::map<std::string,std::string>::iterator iter=jb.json_container.find(std::string("a"));
std::cout<<"output: "<<iter->second;
}

Reading from file separated with semicolons and storing into array

I am completely lost and have been trying for hours to read from a file named "movies.txt" and store its contents in arrays; the semicolons separating the fields are what I can't handle. Any help? Thanks.
movies.txt:
The Avengers ; 2012 ; 89 ; 623357910.79
Guardians of the Galaxy ; 2014 ; 96 ; 333130696.46
Code:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
int main()
{
const int MAX_SIZE = 100;
Movie movieList[MAX_SIZE];
std::string line;
int i = 0;
std::ifstream movieFile;
movieFile.open("movies.txt");
while (getline(movieFile, line, ';'))
{
movieFile >> movieList[i].name >> movieList[i].year >> movieList[i].rating >> movieList[i].earnings;
i++;
}
movieFile.close();
std::cout << movieList[0].name << " " << movieList[0].year << " " << movieList[0].rating << " " << movieList[0].earnings << std::endl;
std::cout << movieList[1].name << " " << movieList[1].year << " " << movieList[1].rating << " " << movieList[1].earnings << std::endl;
return 0;
}
What I want is to have:
movieList[0].name = "The Avengers";
movieList[0].year = 2012;
movieList[0].rating = 89;
movieList[0].earnings = 623357910.79;
movieList[1].name = "Guardians of the Galaxy";
movieList[1].year = 2014;
movieList[1].rating = 96;
movieList[1].earnings = 333130696.46;

I amended your code.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
std::vector<std::string>
split(const std::string &s, char delim = ',')
{
std::vector<std::string> elems;
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim))
{
elems.push_back(item);
}
return elems;
}
int main()
{
std::vector<Movie> movieList;
std::string line;
std::ifstream movieFile;
movieFile.open("movies.txt");
while (getline(movieFile, line))
{
std::vector<std::string> columns = split(line,';');
if (columns.size() < 4) continue; // skip blank or malformed lines
Movie movie;
movie.name = columns[0];
movie.year = std::stoi(columns[1]);
movie.rating = std::stoi(columns[2]);
movie.earnings = std::stod(columns[3]); // stod keeps double precision
movieList.push_back(movie);
}
movieFile.close();
for (const Movie & m: movieList)
{
std::cout << m.name << " " << m.year << " " << m.rating << " " << m.earnings << std::endl;
}
return 0;
}
Basically, I added a split function that splits each line on ';'. I also use a vector to store the movies rather than a hard-coded array of movies, which is much better.
P.S. Second version without vectors
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
void split(const std::string &s, char delim, std::string elems[], int maxElems = 4)
{
std::stringstream ss(s);
std::string item;
int i = 0;
// stop at maxElems so extra fields cannot overflow the array
while (i < maxElems && std::getline(ss, item, delim))
{
elems[i++] = item;
}
}
int main()
{
//std::vector<Movie> movieList;
const int MAX_SIZE = 100;
Movie movieList[MAX_SIZE];
int movieNo = 0;
std::string line;
std::ifstream movieFile;
movieFile.open("movies.txt");
std::string columns[4];
while (movieNo < MAX_SIZE && getline(movieFile, line))
{
split(line,';', columns);
movieList[movieNo].name = columns[0];
movieList[movieNo].year = std::stoi(columns[1]);
movieList[movieNo].rating = std::stoi(columns[2]);
movieList[movieNo].earnings = std::stod(columns[3]);
++movieNo;
}
movieFile.close();
for (int i =0; i < movieNo; ++i) {
std::cout << movieList[i].name
<< " "
<< movieList[i].year
<< " "
<< movieList[i].rating
<< " "
<< movieList[i].earnings
<< std::endl;
}
return 0;
}

Use getline(movieFile, movie_name, ';') to get the name of the movie up to the ;.
You'll need to figure out how to remove the trailing whitespace from the name if necessary; you can search for examples.
Read the rest of the line using getline(movieFile, line).
Use std::replace to replace every ; in line with a space.
Put line into a std::stringstream.
Then extract the remaining fields from the stringstream using the >> operator.
Put this in a loop, e.g. while (getline(movieFile, movie_name, ';')) { ... }, so that it stops cleanly at the end of the file.
Also, don't hardcode an arbitrary number of movies. Use a std::vector<Movie> and push_back to add new ones.

I think you want to break your line into tokens using something like std::strtok. Check out the std::strtok reference. The example given on that page uses a blank as a separator; you would use a semicolon.

Simple expression parser example using Boost::Spirit?

Is anyone aware of an online resource where I can find out how to write a simple expression parser using Boost::Spirit?
I do not necessarily need to evaluate the expression, but I need to parse it and be able to return a boolean to indicate whether the expression is parsable or not (e.g. brackets not matching etc).
I need the parser to be able to recognise function names (e.g. foo and foobar), so this would also be a useful example to help me learn to write BNF notation.
The expressions will be normal arithmetic expressions, i.e. comprising the following symbols:
opening/closing brackets
arithmetic operators
recognized function names, and check for their required arguments

Here's some old Spirit Classic (V1) prototype code I had lying around:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <exception>
#include <iterator>
#include <sstream>
#include <list>
#include <boost/spirit.hpp>
#include <boost/shared_ptr.hpp>
using namespace std;
using namespace boost::spirit;
using namespace boost;
void g(unsigned int i)
{
cout << "row: " << i << endl;
}
struct u
{
u(const char* c): s(c) {}
void operator()(const char* first, const char* last) const
{
cout << s << ": " << string(first, last) << endl;
}
private:
string s;
};
struct Exp
{
};
struct Range: public Exp
{
};
struct Index: public Exp
{
};
struct String: public Exp
{
};
struct Op
{
virtual ~Op() = 0;
virtual string name() = 0;
};
Op::~Op() {}
struct CountIf: public Op
{
string name() { return "CountIf"; }
};
struct Sum: public Op
{
string name() { return "Sum"; }
};
struct Statement
{
virtual ~Statement() = 0;
virtual void print() = 0;
};
Statement::~Statement() {}
struct Formula: public Statement
{
Formula(const char* first, const char* last): s(first, last), op(new CountIf)
{
typedef rule<phrase_scanner_t> r_t;
r_t r_index = (+alpha_p)[u("col")] >> uint_p[&g];
r_t r_range = r_index >> ':' >> r_index;
r_t r_string = ch_p('\"') >> *alnum_p >> '\"';
r_t r_exp = r_range | r_index | r_string; // will invoke actions for index twice due to range
r_t r_list = !(r_exp[u("arg")] % ',');
r_t r_op = as_lower_d["countif"] | as_lower_d["sum"];
r_t r_formula = r_op >> '(' >> r_list >> ')';
cout << s << ": matched: " << boolalpha << parse(s.c_str(), r_formula, space_p).full << endl;
}
void print() { cout << "Formula: " << s << " / " << op->name() << endl; }
private:
string s;
shared_ptr<Op> op;
list<shared_ptr<Exp> > exp_list;
};
struct Comment: public Statement
{
Comment(const char* first, const char* last): comment(first, last) {}
void print() {cout << "Comment: " << comment << endl; }
private:
string comment;
};
struct MakeFormula
{
MakeFormula(list<shared_ptr<Statement> >& list_): list(list_) {}
void operator()(const char* first, const char* last) const
{
cout << "MakeFormula: " << string(first, last) << endl;
list.push_back(shared_ptr<Statement>(new Formula(first, last)));
}
private:
list<shared_ptr<Statement> >& list;
};
struct MakeComment
{
MakeComment(list<shared_ptr<Statement> >& list_): list(list_) {}
void operator()(const char* first, const char* last) const
{
cout << "MakeComment: " << string(first, last) << endl;
list.push_back(shared_ptr<Statement>(new Comment(first, last)));
}
private:
list<shared_ptr<Statement> >& list;
};
int main(int argc, char* argv[])
try
{
//typedef vector<string> v_t;
//v_t v(argv + 1, argv + argc);
// copy(v.begin(), v.end(), ostream_iterator<v_t::value_type>(cout, "\n"));
string s;
getline(cin, s);
// =COUNTIF(J2:J36, "Abc")
typedef list<shared_ptr<Statement> > list_t;
list_t list;
typedef rule<phrase_scanner_t> r_t;
r_t r_index = (+alpha_p)[u("col")] >> uint_p[&g];
r_t r_range = r_index >> ':' >> r_index;
r_t r_string = ch_p('\"') >> *alnum_p >> '\"';
r_t r_exp = r_range | r_index | r_string; // will invoke actions for index twice due to range
r_t r_list = !(r_exp[u("arg")] % ',');
r_t r_op = as_lower_d["countif"] | as_lower_d["sum"];
r_t r_formula = r_op >> '(' >> r_list >> ')';
r_t r_statement = (ch_p('=') >> r_formula [MakeFormula(list)])
| (ch_p('\'') >> (*anychar_p)[MakeComment(list)])
;
cout << s << ": matched: " << boolalpha << parse(s.c_str(), r_statement, space_p).full << endl;
for (list_t::const_iterator it = list.begin(); it != list.end(); ++it)
{
(*it)->print();
}
}
catch(const exception& ex)
{
cerr << "Error: " << ex.what() << endl;
}
Try running it and entering a line like:
=COUNTIF(J2:J36, "Abc")

The current version of Spirit (V2.x) contains a whole series of calculator examples, from the very simple to a full-fledged mini-C interpreter. You should have a look there, as those are a perfect starting point for writing your own expression parser.

I'm not sure if this qualifies as simple either, but I've used this uri-grammar available at http://code.google.com/p/uri-grammar/source/browse/trunk/src/uri/grammar.hpp. It may not be trivial, but at least it's parsing something that you probably understand already (URIs). When reading these grammars, it's best to read from the bottom up, since that's where the most generic tokens tend to be defined.
