Parsing a text file based on first word C++


I'm trying to write a simple program in C++ that reads information from a text file and prints it to the console. The text file will look similar to this.
thing1 contents1
thing2 contents2
thing3 contents3
thing4 contents4
Is there a way that I can print contents1 to the console by knowing that the preceding word is thing1?

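#include <istream>
#include <string>
#include <vector>
// Collect every word that directly follows `mark` in the stream
std::vector<std::string> getContents(std::istream &stream, const std::string &mark) {
    std::vector<std::string> contents;
    std::string current;
    // Loop on the extraction itself, so a failed read at end of file is never processed
    while (stream >> current) {
        // If mark was found, read the next word as its content (if there is one)
        if (current == mark && stream >> current) {
            contents.push_back(current);
        }
    }
    return contents;
}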
This is a very basic example. I wouldn't suggest using just this, but it does get the job done. What it does: if mark is found in the stream, grab the content that follows it. What it does not do: much checking to make sure the stream is valid, or that the line is valid (i.e. the content could come immediately after mark). This could also probably be done more easily on strings; it's just my personal preference to use streams.
Edit: I thought I saw thing1 = content1. Looking again, it turns out it was thing1 content1, so I edited the code appropriately.
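The word-by-word approach above assumes each contents field is a single word. If the contents could contain spaces, one option is to read the file line by line and split only at the first whitespace. This is a minimal sketch, assuming one `key contents` pair per line; the function name `findContents` is hypothetical and it needs C++17 for `std::optional`:

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the rest of the first line whose first word is `key`.
// Assumes one "key contents" pair per line; the contents may contain spaces.
std::optional<std::string> findContents(std::istream &in, const std::string &key) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first;
        if (!(fields >> first) || first != key)
            continue;  // blank line or a different key
        std::string rest;
        std::getline(fields >> std::ws, rest);  // skip separating whitespace, keep the rest
        return rest;
    }
    return std::nullopt;  // key not found
}
```

With `std::ifstream file("textfile.txt");`, calling `findContents(file, "thing1")` would yield `contents1` for the sample file; an empty optional means the key was not present.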

One way, using std::map
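#include <fstream>
#include <iostream>
#include <map>
#include <string>
int main()
{
    std::ifstream fin("textfile.txt");
    std::string firstw, secondw;
    std::map<std::string, std::string> m;
    // Read "word contents" pairs until an extraction fails
    while (fin >> firstw >> secondw)
    {
        m[firstw] = secondw;
    }
    fin.close();
    std::string input_word = "thing1";
    // Wrap the following in a function; input_word is the search element
    auto it = m.find(input_word);
    if (it != m.end())
    {
        std::cout << it->second;  // reuse the iterator instead of a second lookup
    }
}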

Related

How to loop through vectors for specific strings

I am struggling to declare a loop that takes a field of a vector and checks whether it appears for the first time, or else jumps to the next vector until this field contains a new string.
My input file (.csvx) looks something like:
No.; ID; A; B; C;...;Z;
1;1_380; Value; Value; Value;...; Value;
2;1_380; Value; Value; Value;...; Value;
3;1_380; Value; Value; Value;...; Value;
...
41;2_380; Value; Value; Value;...; Value;
42;2_380; Value; Value; Value;...; Value;
...
400000; 6_392; Value; Value; Value;...; Value;
Note: the file is relatively large....
I managed to parse my file into a vector<vector<string> > and split lines at semicolons to access any field.
Now I would like to access the first "ID", i.e. 1_380, and store the parameters from the same line, then go to the next ID, 2_380, store those parameters again, and so on...
This is my code so far:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <boost/algorithm/string.hpp>
using namespace std;
/*
* CSVX Reader defined to fetch data from
* CSVX file into vectors
*/
class CSVXReader
{
string fileName, delimiter;
public:
CSVXReader(string filename, string delm = ";") :
fileName(filename), delimiter(delm)
{}
vector<vector<string> > getData(); //Function to fetch data
}; //from CSVX file
/*
* Parse through CSVX file line by line
* and return the data in vector of vector
* of strings
*/
vector<vector<string> > CSVXReader::getData()
{
ifstream file(fileName);
vector<vector<string> > dataList; //Vector of vector
//contains all data
string line = "";
while (getline(file, line)) //Iterate through each line
//and split the content
//using delimiter
{
vector<string> vec; //Vector contains a row from
//input file
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
dataList.push_back(vec);
}
file.close();
return dataList;
}
int main(int argc, char** argv)
{
CSVXReader reader("file.csvx"); //Creating an object
//of CSVXReader
vector<vector<string> > dataList = reader.getData();//Get the data from
//CSVX file
for(vector<string> vec : datalist) //Loop to go through
//each line of
//dataList
//(vec1,vec2;vec3...)
if(vec[1] contains "_" && "appears for the first time")
{store parameters...};
else{go to next line};
return 0;
}
As you can see, I have no clue how to declare my loop properly...
To be clear, I want to check the second field of each vector vec: if it is new, store the data from that line; if not, jump to the next line (i.e. the next vector) until a new ID appears.
Looking forward to any advice!

Since you wrote pseudo-code, it is difficult to write real code.
But in general, if you want to detect if an item has occurred already, you can utilize a std::unordered_set to implement the "appears for the first time".
Using your pseudo-code:
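#include <string>
#include <unordered_set>
#include <vector>
//...
std::unordered_set<std::string> stringSet;  // IDs seen so far
//...
for (std::vector<std::string>& vec : dataList)
{
    // Process the row only if its ID contains '_' and has not been seen before
    if (vec.size() > 1 && vec[1].find('_') != std::string::npos && !stringSet.count(vec[1]))
    {
        //... store the parameters from this row
        stringSet.insert(vec[1]);
    }
}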
The condition checks if the item is in the unordered_set. If it is, then we skip, if not, then we process the item and add it to the unordered_set.
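For reference, a compilable version of the same idea. It folds the `count()`/`insert()` pair into one call, since `unordered_set::insert` already returns a pair whose `bool` member is true only when the element was newly inserted. The function name `firstRowPerId` is hypothetical, and the ID is assumed to sit in column 1:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Keep only the first row for each ID in column 1 (0-based), wherever repeats occur.
// insert(...).second is true only if the ID was not already in the set.
std::vector<std::vector<std::string>>
firstRowPerId(const std::vector<std::vector<std::string>> &dataList) {
    std::unordered_set<std::string> seen;
    std::vector<std::vector<std::string>> result;
    for (const auto &row : dataList) {
        if (row.size() > 1 && seen.insert(row[1]).second)
            result.push_back(row);
    }
    return result;
}
```

Because the set remembers every ID seen so far, an ID that reappears much later in the file is still skipped.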

Basically you do not need all the code that the other answers provide. You need just one statement to copy the data to where you want to have them.
Let us assume that you have read your data already in your dataList. And you defined a new std::vector<std::vector<std::string>> parameter{}; where you want to store the unique result.
The algorithm library has a function called std::copy_if. It copies an element only if a predicate (a condition) is true. Your condition is that a line's ID differs from the previous line's ID. Then it is a new line with new data, and you copy it. If a line's ID is equal to the previous line's, you do not copy it.
So, we will remember the important data from the last line. And then compare in the next line the data with the stored value. If it is different, store the parameter. If not, then not. After each check, we assign the current value to the last value. As initial "last Value" we will use an empty string. So the first line will always be different. The statement will then look like this:
std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
So we copy all data from the beginning to the end of dataList into the parameter vector if, and only if, the second string in the source vector (index 1) is different from the previously remembered value.
Rather straightforward.
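A self-contained sketch of the copy_if statement above, wrapped in a hypothetical function `firstOfEachRun` so it can be tried on in-memory rows. It compares each row only with the one directly before it, so it keeps the first row of each consecutive run of equal IDs; an ID that reappears after a different ID is copied again. That is fine if the input is grouped by ID, which this sketch assumes (it also assumes every row has at least two fields):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Copy a row only when its ID (column 1) differs from the ID of the row just before it.
std::vector<std::vector<std::string>>
firstOfEachRun(const std::vector<std::vector<std::string>> &dataList) {
    std::vector<std::vector<std::string>> parameter{};
    std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
                 [lastID = std::string{}](const std::vector<std::string> &sv) mutable {
                     bool result = (lastID != sv[1]);  // new run of IDs?
                     lastID = sv[1];                   // remember for the next row
                     return result;
                 });
    return parameter;
}
```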
An additional optimization would be to filter out the wanted rows immediately while reading, instead of first storing the complete vector with all the data. This can reduce the required memory drastically.
Modify your while loop to:
string line = "";
string oldValue{};
while (getline(file, line)) // Iterate through each line and split the content using the delimiter
{
    vector<string> vec; // Vector contains a row from the input file
    boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
    if (vec.size() < 2)
        continue; // skip lines without an ID field (e.g. an empty last line)
    // Store the row only if its ID differs from the previous row's ID
    if (oldValue != vec[1]) {
        dataList.push_back(vec);
    }
    oldValue = vec[1];
}
With that you get it right from the beginning.
Another solution is shown below:
#include <vector>
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <fstream>
#include <sstream>
#include <algorithm>
std::istringstream testFile{R"(1;1_380; Value1; Value2; Value3; Value4
2;1_380; Value5; Value6; Value7; Value8
3;1_380; Value9 Value10
41;2_380; Value11; Value12; Value13
42;2_380; Value15
42;2_380; Value16
500;3_380; Value99
400000; 6_392; Value17; Value18; Value19; Value20
400001; 6_392; Value21; Value22; Value23; Value24)"
};
class LineAsVector { // Proxy for the input Iterator
public:
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, LineAsVector& lv) {
// Read a line
std::string line; lv.completeLine.clear();
std::getline(is, line);
// The delimiter
const std::regex re(";");
// Split values and copy into resulting vector
std::copy( std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(lv.completeLine));
return is;
}
// Conversion operator: lets a LineAsVector be used as a std::vector<std::string>
operator std::vector<std::string>() const { return completeLine; }
protected:
// Temporary to hold the read vector
std::vector<std::string> completeLine{};
};
int main()
{
// This is the resulting vector which will contain the result
std::vector<std::vector<std::string>> parameter{};
// One copy statement to copy all necessary data from the file to the parameter list
std::copy_if (
std::istream_iterator<LineAsVector>(testFile),
std::istream_iterator<LineAsVector>(),
std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
// For debug purposes: Show result on screen
std::for_each(parameter.begin(), parameter.end(), [](std::vector<std::string> & sv) {
std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, " "));
std::cout << '\n';
}
);
return 0;
}
Please note: in function main, we do everything in one statement: std::copy_if. The source in this case is a std::istream, so it can be a std::ifstream (a file) or whatever you want. On SO I use a std::istringstream because I cannot use files here, but it works the same way: just replace the variable in the std::istream_iterator. We iterate over the file with the std::istream_iterator.
What a pity that nobody will read this...

OK fellas, I was playing around with my code and realized that @Armin's second solution (the modified while loop) doesn't handle unordered lists: if an element shows up again much later, it is compared only with the previous element (oldValue) and inserted again, although it already exists in my container...
After some reading (and more has to come, obviously), I lean toward @Paul's unordered_set. My first question arises right here: why didn't you suggest set instead? From what I found, unordered_set is apparently faster for search operations. With my very limited knowledge this is difficult to understand... but I don't want to dig too deep here.
Is this your reason? Or are there other advantages that I missed?
Despite your suggestion, I tried to use set, which in my situation seems better because it keeps things ordered. And again my code refuses to compile:
set<vector<string> > CSVReader::getData() {
ifstream file(fileName);
set<vector<string> > container;
string line = "";
string uniqueValue{};
while (getline(file, line)) //Iterate through each line and split the content using delimiter
{
//Vector contains a row from RAO file
vector<string> vec;
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
uniqueValue = vec[2];
//Line (or vector) is added to container if the uniqueValue, e.g. 1_380, appears for the first time
if(!container.count(uniqueValue))
{
container.insert(vec);
}
}
file.close();
return container;
}
The error says:
error: no matching function for call to 'std::set<std::vector<std::__cxx11::basic_string<char> > >::count(std::__cxx11::string&)'
if(!localDetails.count(localDetail))
Since I followed your example, what did I do wrong?
PS: I'm just reading about SO policies... I hope this additional question is acceptable.
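The compiler error arises because the key type of `std::set<std::vector<std::string>>` is the whole row, so `count()` expects a `std::vector<std::string>`, not a single string field. A minimal sketch of one way around it keeps the rows in an ordinary vector and tracks the IDs already seen in a separate `std::set<std::string>`; the function name `keepFirstById` and its `idColumn` parameter are hypothetical:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep the first row for each value in column `idColumn`.
// The IDs live in their own std::set<std::string>, so count()/insert() take a string,
// while the rows go into a vector that preserves file order.
std::vector<std::vector<std::string>>
keepFirstById(const std::vector<std::vector<std::string>> &rows, std::size_t idColumn) {
    std::set<std::string> seenIds;
    std::vector<std::vector<std::string>> container;
    for (const auto &vec : rows) {
        if (vec.size() <= idColumn)
            continue;  // skip short or empty lines
        const std::string &uniqueValue = vec[idColumn];
        if (!seenIds.count(uniqueValue)) {
            seenIds.insert(uniqueValue);
            container.push_back(vec);
        }
    }
    return container;
}
```

Swapping `std::set` for `std::unordered_set` here changes only the lookup cost (logarithmic versus average constant time) and whether the IDs are kept sorted, not which rows end up in the result.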

How to make sure the words being read in from the file are how I want them to be C++

If I had to read in a word from a document (one word at a time), and then pass that word into a function until I reach the end of the file, how would I do this?
What also must be kept in mind is that a word is any consecutive string of letters and apostrophes (so can't or rojas' is each one word). Something like bad-day should be two separate words, and something like to-be-husband should be three separate words. I also need to ignore periods (.), semicolons (;), and pretty much anything that isn't part of a word. I have been reading it in using file >> s; and then removing stuff from the string, but it has gotten very complicated. Is there a way to store into s only alphabetic characters and apostrophes, and stop at the end of a word (when a space occurs)?
while (!file.eof()) {
string s;
file >> s; //this is how I am currently reading it
passToFunction(s);
}

Yes, there is a way: simply write the code to do it. Read one character at a time and collect the characters in the string until you get a character that is neither a letter nor an apostrophe. You've now read one word. Wait until you read the next character that's a letter or an apostrophe, and then you take it from the top.
One other thing:
while (!file.eof())
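This is always a bug, and a wrong thing to do: eof() only becomes true after a read has already hit the end of the file, so the loop body can run once more after the last successful read, passing an empty s to your function. Just thought I'd mention this. I suppose that fixing this is going to be your first order of business, before writing the rest of your code.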
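A minimal sketch of the character-by-character approach described above, assuming plain single-byte (ASCII) input; the function name `readWords` is hypothetical. It collects runs of letters and apostrophes, treats every other character as a separator, and its loop tests the read itself rather than `eof()`:

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split a stream into words made of letters and apostrophes.
// Any other character (space, '-', '.', ';', digits, ...) ends the current word.
std::vector<std::string> readWords(std::istream &in) {
    std::vector<std::string> words;
    std::string current;
    char c;
    while (in.get(c)) {  // stops cleanly at end of file
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '\'') {
            current += c;
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);  // last word if the input doesn't end with a separator
    return words;
}

// Usage (works the same with a std::ifstream):
//   std::istringstream in("bad-day can't");
//   for (const auto &w : readWords(in)) passToFunction(w);
```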

OnlyLetterNumAndApp facet for a stream
#include <locale>
#include <string>
#include <fstream>
#include <iostream>
// This facet treats letters/numbers and apostrophe as alpha
// Everything else is treated like a space.
//
// This makes reading words with operator>> very easy to use
// when you want to ignore all the other characters.
class OnlyLetterNumAndApp: public std::ctype<char>
{
public:
typedef std::ctype<char> base;
typedef base::char_type char_type;
OnlyLetterNumAndApp(std::locale const& l)
: base(table)
{
std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);
for(int loop = 0;loop < 256;++loop) {
table[loop] = (defaultCType.is(base::alnum, loop) || loop == '\'')
? base::alpha
: base::space;
}
}
private:
base::mask table[256];
};
Usage
int main()
{
std::ifstream file;
file.imbue(std::locale(std::locale(), new OnlyLetterNumAndApp(std::locale())));
file.open("test.txt");
std::string word;
while(file >> word) {
std::cout << word << "\n";
}
}
Test File
> cat test.txt
This is %%% a test djkhfdkjfd
try another $gh line's
bad-people.Do bad things
Result
> ./a.out
This
is
a
test
djkhfdkjfd
try
another
gh
line's
bad
people
Do
bad
things

iterate over ini file on c++, probably using boost::property_tree::ptree?

My task is trivial: I just need to parse a file like this:
Apple = 1
Orange = 2
XYZ = 3950
But I do not know the set of available keys in advance. I was parsing this file relatively easily using C#; here is the source code:
public static Dictionary<string, string> ReadParametersFromFile(string path)
{
string[] linesDirty = File.ReadAllLines(path);
string[] lines = linesDirty.Where(
str => !String.IsNullOrWhiteSpace(str) && !str.StartsWith("//")).ToArray();
var dict = lines.Select(s => s.Split(new char[] { '=' }))
.ToDictionary(s => s[0].Trim(), s => s[1].Trim());
return dict;
}
Now I just need to do the same thing using C++. I was thinking of using boost::property_tree::ptree; however, it seems I just cannot iterate over an ini file. It's easy to read an ini file:
boost::property_tree::ptree pt;
boost::property_tree::ini_parser::read_ini(path, pt);
But apparently it is not possible to iterate over it; see this question: Boost program options - get all entries in section
The question is: what is the easiest way to write an analog of the C# code above in C++?

To answer your question directly: of course iterating a property tree is possible. In fact it's trivial:
#include <iostream>
#include <string>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/ini_parser.hpp>
int main()
{
using boost::property_tree::ptree;
ptree pt;
read_ini("input.txt", pt);
for (auto& section : pt)
{
std::cout << '[' << section.first << "]\n";
for (auto& key : section.second)
std::cout << key.first << "=" << key.second.get_value<std::string>() << "\n";
}
}
This results in output like:
[Cat1]
name1=100 #skipped
name2=200 \#not \\skipped
name3=dhfj dhjgfd
[Cat_2]
UsagePage=9
Usage=19
Offset=0x1204
[Cat_3]
UsagePage=12
Usage=39
Offset=0x12304
I've written a very full-featured INI file parser using Boost Spirit before:
Cross-platform way to get line number of an INI file where given option was found
It supports comments (single line and block), quotes, escapes etc.
(as a bonus, it optionally records the exact source locations of all the parsed elements, which was the subject of that question).
For your purpose, though, I think I'd recommend Boost Property Tree.

For the moment, I've simplified the problem a bit, leaving out the logic for comments (which looks broken to me anyway).
#include <map>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
typedef std::pair<std::string, std::string> entry;
// This isn't officially allowed (it's an overload, not a specialization) but is
// fine with every compiler of which I'm aware.
namespace std {
std::istream &operator>>(std::istream &is, entry &d) {
std::getline(is, d.first, '=');
std::getline(is, d.second);
return is;
}
}
int main() {
// open an input file.
std::ifstream in("myfile.ini");
// read the file into our map:
std::map<std::string, std::string> dict((std::istream_iterator<entry>(in)),
std::istream_iterator<entry>());
// Show what we read:
for (auto const &e : dict)  // map elements are pair<const string, string>; auto avoids a copy
std::cout << "Key: " << e.first << "\tvalue: " << e.second << "\n";
}
Personally, I think I'd write the comment skipping as a filtering stream buffer, but for those unfamiliar with the C++ standard library, it's arguable that this would be a somewhat roundabout solution. Another possibility would be a comment_iterator that skips the remainder of a line, starting from a designated comment delimiter. I don't like that as much, but it's probably simpler in some ways.
Note that the only code we really write here is to read one, single entry from the file into a pair. The istream_iterator handles pretty much everything from there. As such, there's little real point in writing a direct analog of your function -- we just initialize the map from the iterators, and we're done.
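For comparison, a minimal standard-library sketch that mirrors the C# function more closely, including the blank-line and `//` comment filtering and the key/value trimming that the answer above leaves out. The names `readParameters` and `trim` are hypothetical. Unlike the C# version, lines without an `=` are skipped rather than throwing, and a duplicate key overwrites the earlier value instead of throwing as `ToDictionary` would:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing spaces, tabs, CR and LF.
std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Read "key = value" lines into a map, skipping blank lines and lines starting with "//".
std::map<std::string, std::string> readParameters(std::istream &in) {
    std::map<std::string, std::string> dict;
    std::string line;
    while (std::getline(in, line)) {
        if (trim(line).empty() || line.compare(0, 2, "//") == 0)
            continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue;  // no '=' on this line
        dict[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return dict;
}

// Usage (a std::ifstream opened on the ini file works the same way):
//   std::istringstream in("Apple = 1\nOrange = 2\n");
//   auto dict = readParameters(in);  // dict["Apple"] == "1"
```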

C++, how do I get a particular string in HTML code

I am trying to parse this XML Yahoo feed.
How do I get each record into an array in C++?
For example: create a structure,
then get those variables,
and record each element inside the structure.
In the first place, how do I get the value out?
Thanks

You may want to see if the given page offers output in a JSON format. Then you can simply request the value instead of messing around with HTML. The Yahoo! Finance site may even offer an API that you can use to easily request the value.

If you want to mess with the HTML code directly:
#include <fstream>
#include <iostream>
#include <string>
int main() {
    std::ifstream ifile("in.html");
    std::string line;
    // Opening tag of the element whose text we want
    std::string ndl("<span id=\"yfs_l10_sgdmyr=x\">");
    // Test the read itself instead of ifile.good()
    while (std::getline(ifile, line)) {
        std::size_t spos, epos;
        if ((spos = line.find(ndl)) != std::string::npos) {
            spos += ndl.size();
            // Print everything between the opening tag and the next </span>
            if ((epos = line.find("</span>", spos)) != std::string::npos) {
                std::cout << line.substr(spos, epos - spos) << std::endl;
            }
        }
    }
    return 0;
}
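The same find/substr logic can be factored into a function that works on a single string, which makes it easy to try without a file. A sketch; the name `extractBetween` is hypothetical, it needs C++17 for `std::optional`, and like the code above it only finds a value whose opening and closing tags are on the same line:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Return the text between `open` and the next occurrence of `close` in `line`, if both are present.
std::optional<std::string> extractBetween(const std::string &line,
                                          const std::string &open,
                                          const std::string &close) {
    std::size_t spos = line.find(open);
    if (spos == std::string::npos)
        return std::nullopt;
    spos += open.size();  // start right after the opening tag
    const std::size_t epos = line.find(close, spos);
    if (epos == std::string::npos)
        return std::nullopt;
    return line.substr(spos, epos - spos);
}
```

Inside the `getline` loop, `if (auto v = extractBetween(line, ndl, "</span>")) std::cout << *v << '\n';` would replace the nested `find` calls.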

Is my fstream bad or not good()?

So I have a .cpp file with a function which receives a filename and should return a string with the contents of the file (actually modified contents; I modified the code to make it more understandable, but that doesn't have any effect on my problem). The problem is that f.good() is returning false and the loop which reads the file is not working.
Code:
#include "StdAfx.h"
#include "Form21.h"
#include <string>
#include <fstream>
#include <iostream>
string ReadAndWrite(char* a){
char filename[8];
strcpy_s(filename,a);
string output;
char c;
ifstream f(filename,ios::in);
output+= "Example text"; // <-- this writes and returns just fine!
c = f.get();
while (f.good())
{
output+= c;
c= f.get();
}
return output;
}
Does anyone have an idea on why this is happening?
Does it have something to do with this being a separate .cpp file? (It doesn't even throw an error when I remove #include <fstream>.)
Maybe there is a different way to write the loop?
I'll be very happy to hear any suggestions on how to fix this or maybe a different method on how to achieve my goal.

First, there's really no reason to copy the file name you receive -- you can just use it as-is. Second, almost any loop of the form while (stream.good()), while (!stream.bad()), while (stream), etc., is nearly certain to be buggy. What you normally want to do is check whether reading some data worked.
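Applied to the question's loop, checking whether the read worked could look like this minimal sketch. It takes any `std::istream` (the name `readAll` is hypothetical) and uses `get(c)`, which returns the stream, so the loop condition fails exactly when no character was read:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Append every character of `in` (whitespace included) to the result.
// The condition tests the read itself, so a failed read is never appended.
std::string readAll(std::istream &in) {
    std::string output;
    char c;
    while (in.get(c))
        output += c;
    return output;
}

// Usage (a std::ifstream works the same way; check that it opened first):
//   std::istringstream in("ab c\nd");
//   std::string s = readAll(in);  // "ab c\nd"
```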
Alternatively, you can skip using a loop at all. There are a couple of ways to do this. One that works nicely for shorter files looks like this:
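std::string readfile(std::string const &filename) {
    std::ifstream f(filename.c_str());
    std::ostringstream buffer;  // needs <sstream>; std::string itself has no operator<<
    buffer << f.rdbuf();        // copy the whole stream buffer in one go
    return buffer.str();
}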
That works nicely up to a few tens of kilobytes (or so) of data, but starts to slow down on larger files. In such a case, you usually want to use ifstream::read to get the data, something along this general line:
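std::string readfile(std::string const &filename) {
    // Binary mode, so the size reported by tellg() matches what read() returns
    std::ifstream f(filename.c_str(), std::ios::binary);
    f.seekg(0, std::ios_base::end);
    std::streamoff size = f.tellg();
    if (size <= 0)
        return std::string();  // empty file, or the file could not be opened
    std::string retval(static_cast<std::size_t>(size), ' ');
    f.seekg(0);
    f.read(&retval[0], size);
    retval.resize(static_cast<std::size_t>(f.gcount()));  // in case fewer bytes were read
    return retval;
}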
Edit: If you need to process the individual characters (not just read them) you have a couple of choices. One is to separate it into phases, where you read all the data in one phase, and do the processing in a separate phase. Another possibility (if you just need to look at individual characters during processing) is to use something like std::transform to read data, do the processing, and put the output into a string:
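struct character_processor {
    char operator()(char input) const {
        // do some sort of processing on each character:
        return static_cast<char>(~input);
    }
};
std::string result;
// istreambuf_iterator reads every character; istream_iterator<char> would skip whitespace
std::transform(std::istreambuf_iterator<char>(f),
               std::istreambuf_iterator<char>(),
               std::back_inserter(result),
               character_processor());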

I would check that strlen(a) is not greater than 7...
You might overrun filename and get a file name that doesn't exist.
Unrelated to the problem, I would rewrite the function:
string ReadAndWrite(const string &filename) { // take a string, if you are using C++ already
    string output;
    char c;
    ifstream f(filename.c_str()); // no need for ios::in (pre-C++11, the constructor needs a char *, not a string)
    output += "Example text"; // <-- this writes and returns just fine!
    // Test the read itself instead of f.good(); get(c) also keeps whitespace, which f >> c would skip
    while (f.get(c))
    {
        output += c;
    }
    return output;
}

Might I suggest using fopen? http://www.cplusplus.com/reference/clibrary/cstdio/fopen/ It takes in a filename and returns a file pointer. With that you can use fgets to read the file line by line http://www.cplusplus.com/reference/clibrary/cstdio/fgets/
