Convert text to csv file in C++?

I have a text file that contains many lines like the following:
data[0]: a=123 b=234 c=3456 d=4567 e=123.45 f=234.56
I am trying to extract the numbers and write them to a CSV file so that Excel can import and recognize it.
My idea is to find the " " characters and cut the data out between them, for example between the first " " and the second " ". Is that viable? I have been trying this, but I have not succeeded.
What I actually want is a CSV file like:
a, b, c, d, e, f
123, 234, 3456 .... blablabla
234, 345, 4567 .... blablabla
But this specific task seems quite difficult.
Are there any utilities or better methods that could help me do this?

I suggest you take a look at boost::tokenizer; it is the best approach I have found. You will find several examples on the web. Also have a look at this highly voted question.
Steps, for each line:
1. Split the string in two at the ':' character.
2. Split the right-hand part into several strings at each space.
3. Separate each name from its value at the '=' character, and store the values in a std::vector<std::string>.
4. Write these values to the file.
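The last part can be something like the following. Open the file once, before looping over the lines, and write a comma only between values so that no empty column appears at the end of each row:
std::ofstream f( "myfile.csv" ); // needs <fstream>; open once, outside the per-line loop
// for each input line, after filling vstrings with that line's values:
for( std::size_t i = 0; i < vstrings.size(); ++i )
    f << ( i ? "," : "" ) << vstrings[i]; // comma only between values
f << "\n";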
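Steps 1 to 3 above can be sketched with only the standard library. This version swaps in std::string::find and std::istringstream in place of boost::tokenizer. parse_line is a hypothetical helper name. It assumes every line looks like the sample: a label ending in ':', followed by space-separated name=value pairs.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: turns "data[0]: a=123 b=234" into {("a","123"), ("b","234")}.
std::vector<std::pair<std::string, std::string>> parse_line(const std::string& line)
{
    std::vector<std::pair<std::string, std::string>> fields;

    // Step 1: keep only the part after the ':'.
    std::size_t colon = line.find(':');
    if (colon == std::string::npos)
        return fields;                     // not a data line

    // Step 2: split the right-hand part at whitespace.
    std::istringstream rest(line.substr(colon + 1));
    std::string token;
    while (rest >> token) {
        // Step 3: split each "name=value" token at the '='.
        std::size_t eq = token.find('=');
        if (eq != std::string::npos)
            fields.emplace_back(token.substr(0, eq), token.substr(eq + 1));
    }
    return fields;
}
```

For the sample line this yields six pairs, from a/123 through f/234.56. Writing the .first members once gives the header row a, b, c, d, e, f. The .second members give one CSV row per input line.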

An easy way that uses only the Standard Library is:
std::string line;
while (std::getline(input_stream, line))
{
    std::istringstream iss(line); // needs <sstream>
    std::string word;
    if (iss >> word) // throw away "data[n]:"
    {
        std::string identifier;
        std::string value;
        // identifier receives " a", " b", ...; value receives the number after '='
        while (std::getline(iss, identifier, '=') && iss >> value)
            std::cout << value << ",";
        std::cout << '\n';
    }
}
You can tweak it if the trailing commas cause Excel any trouble. You can also add more sanity checks (e.g. that each value is numeric and that the fields are consistent across all lines), but the basic parsing above is a start.
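Here is a sketch of those tweaks, still using only the Standard Library. to_csv is a hypothetical name. It assumes the first data line defines the columns, and it writes those names as a header row with no trailing commas. It silently skips later lines whose names differ from the header; you might prefer to report them instead. It covers the consistency check but not the numeric check.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Sketch: convert lines like "data[n]: a=1 b=2" from `in` into CSV on `out`.
// The header comes from the names on the first data line; lines whose names
// differ are skipped. Returns the number of data rows written.
std::size_t to_csv(std::istream& in, std::ostream& out)
{
    std::vector<std::string> header;
    std::size_t rows = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string label;                   // "data[n]:", ignored
        if (!(iss >> label))
            continue;                        // blank line
        std::vector<std::string> names, values;
        std::string token;
        while (iss >> token) {
            std::size_t eq = token.find('=');
            if (eq == std::string::npos)
                continue;                    // not a name=value pair
            names.push_back(token.substr(0, eq));
            values.push_back(token.substr(eq + 1));
        }
        if (names.empty())
            continue;
        if (header.empty()) {                // first data line: emit the header
            header = names;
            for (std::size_t i = 0; i < header.size(); ++i)
                out << (i ? "," : "") << header[i];
            out << '\n';
        } else if (names != header) {
            continue;                        // inconsistent fields: skip the line
        }
        for (std::size_t i = 0; i < values.size(); ++i)
            out << (i ? "," : "") << values[i];
        out << '\n';
        ++rows;
    }
    return rows;
}
```

To produce the file, pass an std::ifstream opened on the text file and an std::ofstream opened on the .csv file.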

Related

Parsing Data of data from a file

I have this project due, but I am unsure how to split each line into the word, its part of speech and its definition. I know that I should use the tab spacing to read it, but I have no idea how to implement that. Here is an example of the file:
Recollection n. The power of recalling ideas to the mind, or the period within which things can be recollected; remembrance; memory; as, an event within my recollection.
Nip n. A pinch with the nails or teeth.
Wodegeld n. A geld, or payment, for wood.
Xiphoid a. Of or pertaining to the xiphoid process; xiphoidian.
NB: Each word, together with its part of speech and definition, is on one line of the text file.
If you can be sure that the definition will always follow the first period on a line, you could use an implementation like this. But it will break if there are ever more than 2 periods on a single line.
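string str;
vector<pair<string,string>> v; // <word, definition>
while (getline(fileStream, str, '.')) { // read up to the next '.', e.g. "Nip n"
    str.erase(0, str.find_first_not_of(" \t\r\n")); // drop the newline left by the previous line
    auto sp = str.find_last_of(" \t");
    if (sp == string::npos) break; // nothing but trailing whitespace left (end of file)
    str.erase(sp); // get rid of n, v, etc. from the word
    v.emplace_back(str, ""); // push the word
    getline(fileStream, str, '.'); // grab the next part of the line (the definition)
    str.erase(0, str.find_first_not_of(" \t")); // drop the blank before the definition
    v.back().second = str; // store the definition in the last added element
}
for (const auto& x : v) { // check your results
    cout << "word -> " << x.first << endl;
    cout << "definition -> " << x.second << endl << endl;
}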
The better solution would be to learn regular expressions. It's a complicated topic, but it is absolutely necessary if you want to parse text efficiently and properly:
http://www.cplusplus.com/reference/regex/
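For the dictionary format above, a regular-expression sketch with std::regex (C++11) might look like this. Entry and parse_entry are hypothetical names. The sketch assumes the word and the part of speech contain no whitespace, and that the part of speech is the first token ending in '.'. Because \s matches both spaces and tabs, it works whether the columns are tab- or space-separated. Periods inside the definition do not matter, since everything after the part of speech goes into group 3.

```cpp
#include <regex>
#include <string>

// Hypothetical record for one dictionary line.
struct Entry {
    std::string word, part_of_speech, definition;
};

// Sketch: parse "Word<ws>pos.<ws>definition" with std::regex.
bool parse_entry(const std::string& line, Entry& e)
{
    // group 1: word, group 2: part of speech (without the '.'), group 3: definition
    static const std::regex re(R"(^\s*(\S+)\s+(\S+?)\.\s+(.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, re))
        return false;                       // line does not have the expected shape
    e.word = m[1].str();
    e.part_of_speech = m[2].str();
    e.definition = m[3].str();
    return true;
}
```

Call it once per line read with std::getline. If the file has Windows line endings, strip a trailing '\r' first, because '.' in the pattern may not match it.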

C++ retrieve numerical values in a line of string

Here is the content of the .txt file that I've managed to read:
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether it's possible, but after reading the contents of this .txt file I'm trying to store just the x- and y-axis ranges in two variables, so that I can use them in later functions. Any suggestions? And do I need to use vectors? Here is the code that reads the file:
string configName;
ifstream inFile;
do {
    cout << "Please enter config filename: ";
    cin >> configName;
    inFile.open(configName);
    if (inFile.fail()) {
        cerr << "Error finding file, please re-enter again." << endl;
    }
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)) {
    if (content.compare(0, 2, "//") == 0) continue; // skip lines starting with "//"
    cout << endl << content << endl;
}
It depends on the format of your file. If you are sure the format will never change, you can read the file character by character and implement simple pattern recognition, such as
if (tempStr == "y-axis=")
and then convert the appropriate substring to an integer with a function such as
std::stoi
and store it.
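Here is a sketch of that idea, assuming each line has the form key=lo-hi (read_range is a hypothetical name). Instead of scanning character by character, it checks the prefix with std::string::compare. It then uses the second argument of std::stoi, which reports how many characters were consumed, to locate the '-' between the two numbers. std::stoi throws std::invalid_argument if a number is missing.

```cpp
#include <string>

// Sketch: if `line` starts with `key` (e.g. "y-axis="), parse the "lo-hi"
// range that follows into lo and hi.
bool read_range(const std::string& line, const std::string& key, int& lo, int& hi)
{
    if (line.compare(0, key.size(), key) != 0)
        return false;                            // line does not start with key
    std::string rest = line.substr(key.size()); // e.g. "0-9"
    std::size_t used = 0;
    lo = std::stoi(rest, &used);                 // parses "0"; used is now 1
    if (used >= rest.size() || rest[used] != '-')
        return false;                            // no '-' after the first number
    hi = std::stoi(rest.substr(used + 1));       // parses "9"
    return true;
}
```

Inside the while (getline(inFile, content)) loop, you would call read_range(content, "X-axis=", xLo, xHi) and read_range(content, "y-axis=", yLo, yHi), where the four ints are your own variables.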
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
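#include <string>
#include <vector>
// Splits str at every '\n'; text after the last newline is kept as well.
std::vector<std::string> split(const std::string& str)
{
    std::vector<std::string> ret;
    std::size_t cur_pos = 0;
    std::size_t next_delim = str.find('\n');
    while (next_delim != std::string::npos) {
        ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
        cur_pos = next_delim + 1;
        next_delim = str.find('\n', cur_pos);
    }
    if (cur_pos < str.size()) // last line had no trailing '\n'
        ret.push_back(str.substr(cur_pos));
    return ret;
}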
This splits an input string at newlines. From there, you can begin parsing the strings in that vector. The key functions you'll want to look at are std::string's substr() and find() methods. A quick Google search will get you to the relevant documentation, but here it is, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. You can find the = and then take the substrings before and after that index. The part before is "X-axis" and the part after is "0-9", which tells you that the range "0-9" belongs to the X axis. From there, I think you can figure it out, but I hope this gives you a good idea of where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into a new substring;
std::atoi() can be used to convert a C string (pass content.c_str()) into an integer; std::stoi() accepts a std::string directly.
Together, these three functions let you process content in three steps. (1) Search content for the start/stop delimiters of the first value (= and -) and of the second value (- and string::npos). Start the search for - after the =, because "X-axis" itself contains a -. (2) Extract the values into temporary substrings. (3) Convert the substrings to ints. That gives you what you want.
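The three-step recipe above might look like this. Range and parse_range are hypothetical names. Following the answer, it uses std::atoi, which returns 0 instead of throwing when the text is not a number.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type for one "name=lo-hi" line.
struct Range {
    std::string name;
    int lo = 0;
    int hi = 0;
};

// Sketch of the find/substr/atoi recipe. The '-' is searched for only after
// the '=', because the name "X-axis" itself contains a '-'.
bool parse_range(const std::string& content, Range& r)
{
    std::size_t eq = content.find('=');               // (1) first delimiter
    if (eq == std::string::npos)
        return false;
    std::size_t dash = content.find('-', eq + 1);     // (1) delimiter between values
    if (dash == std::string::npos)
        return false;
    r.name = content.substr(0, eq);                   // (2) "X-axis"
    r.lo = std::atoi(content.substr(eq + 1, dash - eq - 1).c_str()); // (2)+(3) "0"
    r.hi = std::atoi(content.substr(dash + 1).c_str());              // (2)+(3) up to npos
    return true;
}
```

Lines such as location.txt contain no '=', so parse_range returns false for them and they can be handled separately.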

Why can't regex find the "(" in a Japanese string in C++?

I have a huge file of Japanese example sentences. It's set up so that one line is the sentence, and then the next line is comprised of the words used in the sentence separated by {}, () and []. Basically, I want to read a line from the file, find only the words in the (), store them in a separate file, and then remove them from the string.
I'm trying to do this with regexp. Here is the text I'm working with:
は 二十歳(はたち){20歳} になる[01]{になりました}
And here's the code I'm using to find the stuff between ():
std::smatch m;
std::regex e ("\(([^)]+)\)"); // matches things between ( and )
if (std::regex_search (components,m,e)) {
    printToTest(m[0].str(), "what we got"); //Prints to a test file "what we got: " << m[0].str()
    components = m.prefix().str().append(m.suffix().str());
    //components is a string
    printToTest(components, "[COMP_AFTER_REMOVAL]");
    //Prints to test file "[COMP_AFTER_REMOVAL]: " << components
}
Here's what should get printed:
what we got:はたち
[COMP_AFTER_REMOVAL]:は 二十歳(){20歳} になる[01]{になりました}
Here's what gets printed:
what we got:は 二十歳(はたち
[COMP_AFTER_REMOVAL]:){20歳} になる[01]{になりました}
It seems like the は is somehow being confused for a (, which makes the regex match from は to ). I believe it's a problem with the way the line is read in from the file; maybe it's not being read as UTF-8. Here's what I do:
xml_document finalDoc;
string sentence;
string components;
ifstream infile;
infile.open("examples.utf");
unsigned int line = 0;
string linePos;
bool eof = infile.eof();
while (!eof && line < 1){
    getline(infile, sentence);
    getline(infile, components);
    MakeSentences(sentence, components, finalDoc);
    line++;
}
Is something wrong? Any tips? Need more code? Please help. Thanks.
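You forgot to escape your backslashes. In a C++ string literal, \( is not a valid escape sequence, so the compiler (usually with a warning) drops the backslash. It sees "\(([^)]+)\)" as (([^)]+)), which is not the regex you wanted. That pattern matches any run of characters other than ), starting at the beginning of the string, which is why you got は 二十歳(はたち.
You need to type "\\(([^)]+)\\)", or use a raw string literal such as R"(\(([^)]+)\))". Note also that m[0] is the whole match, including the parentheses; m[1] holds just はたち.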
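A sketch of the corrected search, using a raw string literal so the backslashes reach the regex engine unchanged. extract_parenthesized is a hypothetical name. It returns group 1 (the text inside the parentheses) and rebuilds the string with an empty "()" in its place, to match the output the question expected. Plain std::regex on UTF-8 bytes is fine here, because no byte of a multi-byte UTF-8 character equals the ASCII '(' or ')'.

```cpp
#include <regex>
#include <string>

// Sketch: pull the first "(...)" group out of `components` and remove its
// contents, keeping an empty "()" in its place.
std::string extract_parenthesized(std::string& components)
{
    static const std::regex e(R"(\(([^)]+)\))"); // same pattern as "\\(([^)]+)\\)"
    std::smatch m;
    if (!std::regex_search(components, m, e))
        return "";                                // no parenthesized group
    std::string inside = m[1].str();              // m[0] would include the parentheses
    // The right-hand side is built before the assignment, so m is still valid here.
    components = m.prefix().str() + "()" + m.suffix().str();
    return inside;
}
```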

How to read from a text file and split sentences apart in C

I want to read a series of questions from a text file. The questions are separated by commas, so I think I have to check that each character is not a comma before copying it?
The text file looks something like this "Is it red?, Is it bigger than a mailbox?, Is it an animal?"
In case it affects the code, I want to copy each string into a node to put in a tree later on.
while (fgets(stringPtr, 100, filePtr) != ',')
strcpy(stringPtr, treeNode);
Is something like this ok?
Given your description, something like the following (note that this is C++, not C):
std::string question_string;
std::set<std::string> my_tree;
if (std::ifstream file_stream{filename})
{
    while (std::getline(file_stream, question_string, ','))
        my_tree.insert(question_string);
}
else
    std::cerr << "unable to open " << filename << '\n';
You'll need to get the filename from somewhere and include the relevant headers (<fstream>, <iostream>, <set> and <string>).
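Here is a variant of the answer above; read_questions is a hypothetical name. It reads from any std::istream, trims the space that follows each comma and the final newline, and keeps the questions in file order in a std::vector. The std::set in the answer sorts its elements and drops duplicates, which may or may not be what you want when building a tree.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, for the usage example
#include <string>
#include <vector>

// Sketch: split comma-separated questions and trim surrounding whitespace.
std::vector<std::string> read_questions(std::istream& in)
{
    const char* blanks = " \t\r\n";
    std::vector<std::string> questions;
    std::string q;
    while (std::getline(in, q, ',')) {
        std::size_t first = q.find_first_not_of(blanks);
        if (first == std::string::npos)
            continue;                    // empty piece, e.g. after a final comma
        std::size_t last = q.find_last_not_of(blanks);
        questions.push_back(q.substr(first, last - first + 1));
    }
    return questions;
}
```

An std::ifstream can be passed directly. For a quick check, wrap the sample text in std::istringstream; it yields "Is it red?", "Is it bigger than a mailbox?" and "Is it an animal?".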

Program isn't properly extracting info from text files

I could use some help figuring out where the bug is. I have 2 text files from which I need to extract info.
The first is of the form
word1
word2
word3
etc.
and I just want the words put into a std::vector. There are 5000 words in the text file. When I put a little tester line in my code and ran it, I see that it only got 729 words.
The second text file is of the form
a a 0
a b 5
a c 3
etcetera
and I want to put those into a std::map that maps pairs of characters to integers. When I put a little tester line in my code and ran it, I see that it added zero elements to the map.
Here is the relevant code:
class AutoCorrector
{
public:
    AutoCorrector(std::ifstream&, std::ifstream&);
    ~AutoCorrector();
    void suggest(std::string);
private:
    std::vector<std::string> wdvec;
    std::map<std::pair<char,char>,int> kdmap;
};
AutoCorrector::AutoCorrector(std::ifstream& wdfile, std::ifstream& kdfile)
{
    /* Insert 5000 most common English words into a vector.
       The file that is read was edit-copied from
       http://www.englishclub.com/vocabulary/common-words-5000.htm
       and so the numberings must be ignored on each line in order
       to properly extract the words.
    */
    if (wdfile.is_open()) {
        std::string line;
        while (std::getline(kdfile, line))
        {
            std::istringstream ss(line);
            std::string nb, thisWord;
            ss >> nb >> thisWord;
            wdvec.push_back(thisWord);
        }
        // test ---
        std::cout << "wdvec size = " << wdvec.size() << std::endl;
        // -------
    }
    else
    {
        throw("Was not able to open key distance file.\n");
    }
    /* Insert keyboard pairwise distances into a map.
       The file that is read from must have lines of the form
       a a 0
       a b 5
       a c 3
       etcetera,
       indicating the distances between characters on a standard keyboard,
       all lower-case letters and the apostrophe for a total of 27x27=729
       lines in the file.
    */
    if (kdfile.is_open()) {
        std::string line;
        while (std::getline(kdfile, line))
        {
            std::istringstream ss(line);
            char c1, c2;
            int thisInt;
            ss >> c1 >> c2 >> thisInt;
            std::pair<char,char> thisPair(c1, c2);
            kdmap.insert(std::pair<std::pair<char,char>, int> (thisPair, thisInt));
        }
        // test --
        std::cout << "kdmap size = " << kdmap.size() << std::endl;
        // end test
    }
    else
    {
        throw("Was not able to open key distance file.\n");
    }
}
Any help from the Stack Overflow C++ purists is greatly appreciated. I'm open to suggestions on how to simplify my code and make it more elegant. Ultimately I'm trying to make an autocorrector that takes a word and finds the most similar words in a list of the 5000 most common words.
27 * 27 = 729. So your vector ends up with as many elements as the second file has lines. Why? Because in the first loop you're reading from kdfile when you meant to read from wdfile.
while (std::getline(kdfile, line))
                    ^^^^^^
That means you're reading everything out of the pairwise distance file. By the time the second loop runs, kdfile is already at end-of-file, so there is nothing left to extract.
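With the first loop reading from wdfile, the two loaders can be sketched as free functions; load_words and load_distances are hypothetical names. They take std::istream& rather than std::ifstream&, so your std::ifstream objects still work, and std::istringstream can be used for quick tests. Following the original comment, they assume each word line starts with a number that must be skipped. Lines that fail to parse are ignored instead of being inserted with garbage values.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads "number word" lines from the word file only.
std::vector<std::string> load_words(std::istream& wdfile)
{
    std::vector<std::string> words;
    std::string line;
    while (std::getline(wdfile, line)) {        // wdfile, not kdfile
        std::istringstream ss(line);
        std::string nb, word;
        if (ss >> nb >> word)                   // skip the numbering, keep the word
            words.push_back(word);
    }
    return words;
}

// Reads "c1 c2 distance" lines from the key-distance file only.
std::map<std::pair<char, char>, int> load_distances(std::istream& kdfile)
{
    std::map<std::pair<char, char>, int> dist;
    std::string line;
    while (std::getline(kdfile, line)) {
        std::istringstream ss(line);
        char c1, c2;
        int d;
        if (ss >> c1 >> c2 >> d)                // ignore malformed lines
            dist[std::make_pair(c1, c2)] = d;
    }
    return dist;
}
```

In the constructor this becomes wdvec = load_words(wdfile); and kdmap = load_distances(kdfile);, and each stream is read exactly once.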