Reading in multiple text strings of various lengths with C++

I have a bunch of ASCII-based text files that are used as input files to various computer programs, and I need to convert them to a different format. Each line of an input file starts with a four-digit number and is then followed either by further input data or, if that four-digit number begins with a 0 (zero), by a comment. I am developing a C++-based file converter and I would like it to read in the four-digit number and, if that number begins with a zero, read in the comment that follows it. An example is provided below.

C++ can easily read in the numbers as an array or by using std::vector; however, reading in the character strings is much more complex. If each comment line had the same number of words, I could treat each string as if it were filling its own fixed column, but since each comment line has a different number of words, the number of columns to be read in at each line would be different. Is there a simple way to read in the comment lines so that C++ does not treat the space between each word as the end of one column of data and the beginning of another? Generic numbers and data are used in the file example below, but hopefully you can see that the comment lines starting with 0 have different numbers of words following them, making it impossible to read the file in as a series of data columns.
0001 Input File Name
0001 - Description of input file goes here
0001 - PROGRAM name that works on this data
0000 ==========================================
0001 List of references used in the development of this input file
0001 [1] Ref. 1
0001 [2] Ref. 2
0001 [3] Ref. 3
1100 Input line 1: CBRD 1-0220
1101 Core Length (mm): 8.189
1102 Core diameter (mm): 37.81

Use the getline function to read a line from the file into a string, then work on that string to do whatever you want.
Something like: while(getline(file, line)) { ... }
You don't need to know the maximum number of characters per line.
This is simply what I meant:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

int main() {
    std::fstream iFile("Input.txt", std::fstream::in);
    // You might want to check whether it is open
    std::string line;
    int firstNumber;
    std::string word;
    while (getline(iFile, line)) {
        std::stringstream lineStream(line);
        lineStream >> firstNumber;
        if (firstNumber == 0) { // modify based on what you want to do
            while (lineStream >> word) {
                std::cout << word << " ";
            }
        }
    }
    std::cout << std::endl;
    iFile.close();
}

Based on the suggestions provided above, I was able to implement the following solution. I will try to clean it up and make it more generic so it can be applied easily to other problems. I appreciate the suggestions and the help.
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

int main(int argc, const char * argv[]) {
    std::string Input_File("Input.txt");
    const int MAX_CHARS_PER_LINE = 1200;
    const int MAX_TOKENS_PER_LINE = 40;
    const char* const DELIMITER = " ";

    std::ifstream inp(Input_File, std::ios::in | std::ios::binary);
    if (!inp) {
        std::cout << "Cannot Open " << Input_File << std::endl;
        return 1; // Terminate program
    }
    std::ofstream out("Output_File.txt", std::ios::out);

    // read each line of the file
    char buf[MAX_CHARS_PER_LINE];
    while (inp.getline(buf, MAX_CHARS_PER_LINE))
    {
        // parse the line into blank-delimited tokens
        int n = 0; // a for-loop index
        // array to store memory addresses of the tokens in buf
        const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0
        // parse the line
        token[0] = strtok(buf, DELIMITER); // first token
        if (token[0]) // null if line is blank
        {
            for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
            {
                token[n] = strtok(0, DELIMITER); // subsequent tokens
                if (!token[n]) break; // no more tokens
            }
            if (strncmp(token[0], "0", 1) == 0)
            {
                for (int i = 0; i < n; i++) out << token[i] << " ";
            }
            out << std::endl;
        }
    }
    inp.close();
    return 0;
}

Related

(C++) Reading a CSV text file as a vector of integers

I'm a beginner programmer working through the 2019 Advent of Code challenges in C++.
The last piece of the puzzle I'm putting together is actually getting the program to read the input.txt file, which is essentially a long string of values in the form of '10,20,40,23' etc. on a single line.
In the previous puzzle I used the lines
int inputvalue;
std::ifstream file("input.txt");
while(file >> inputvalue){
//
}
to grab lines from the file, but it was formatted as a text file in sequential lines with no comma separation.
ie:
10
20
40
23
What can I do to read through the file using the comma delimiters, and specifically, how can I get those values to be read as integers, instead of as strings or chars, and store them in a vector?
While it would be strange to write a routine that reads just one line from a comma-separated file instead of a general routine that reads all lines (and just takes the first one if you only want one) -- you can take out the parts that read multiple lines into a std::vector<std::vector<int>> and just read one line into a std::vector<int> -- though it only saves a handful of lines of code.
The general approach would be to read the entire line of text with getline(file, line), create a std::stringstream (line), and then use >> to read each integer, followed by a getline (ss, tmpstr, ',') to read the delimiter.
Your program can also take a second command-line argument in addition to the file to read, so you can pass the delimiter as the first character of that argument -- that way there is no reason to re-compile your code to handle delimiters of ';' or ',' or any other single character.
You can put a short bit of code together to do that which could look like the following:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
int main (int argc, char **argv) {
if (argc < 2) { /* validate at least 1 argument given */
std::cerr << "error: insufficient number of arguments.\n"
"usage: " << argv[0] << " <filename>\n";
return 1;
}
std::vector<int> v {}; /* vector<int> */
std::string line {}; /* string to hold each line */
std::ifstream f (argv[1]); /* open file-stream with 1st argument */
const char delim = argc > 2 ? *argv[2] : ','; /* delim is 1st char in */
/* 2nd arg (default ',') */
if (!f.good()) { /* validate file open for reading */
std::cerr << "errro: file open failed '" << argv[1] << "'.\n";
return 1;
}
if (getline (f, line)) { /* read line of input into line */
int itmp; /* temporary integer to fill */
std::stringstream ss (line); /* create stringstream from line */
while (ss >> itmp) { /* read integer value from ss */
std::string stmp {}; /* temporary string to hold delim */
v.push_back(itmp); /* add to vector */
getline (ss, stmp, delim); /* read delimiter */
}
}
for (auto col : v) /* loop over each integer */
std::cout << " " << col; /* output col value */
std::cout << '\n'; /* tidy up with newline */
}
(note: there are relatively few changes needed to read all lines into a vector of vectors; the most notable is simply replacing the if (getline...) with while (getline...) and then filling a temporary vector which, if non-empty, is pushed back into your collection of vectors -- see the sketch below)
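For example, a rough sketch of that variant, reusing the f, line and delim variables declared in the program above, could replace the if (getline...) block:
std::vector<std::vector<int>> rows {};      /* collection of rows to fill */
while (getline (f, line)) {                 /* read every line of input */
    std::vector<int> row {};                /* temporary vector for this line */
    int itmp;                               /* temporary integer to fill */
    std::stringstream ss (line);            /* create stringstream from line */
    while (ss >> itmp) {                    /* read integer value from ss */
        std::string stmp {};                /* temporary string to hold delim */
        row.push_back (itmp);               /* add to this row */
        getline (ss, stmp, delim);          /* read delimiter */
    }
    if (!row.empty())                       /* keep only non-empty rows */
        rows.push_back (row);
}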
Example Input File
With a set of comma separated integers in the file named dat/int-1-10-1-line.txt, e.g.
$ cat dat/int-1-10-1-line.txt
1,2,3,4,5,6,7,8,9,10
Example Use/Output
Your use and results would be:
$ ./bin/read_csv_int-1-line dat/int-1-10-1-line.txt
1 2 3 4 5 6 7 8 9 10
Of course you can change the output format to whatever you need. Look things over and let me know if you have further questions.
You have options. In my opinion, the most straight-forward is to just read a string and then convert to integer. You can use the additional "delimiter" parameter of std::getline to stop when it encounters a comma:
std::string value;
while (std::getline(file, value, ',')) {
int ival = std::stoi(value);
std::cout << ival << std::endl;
}
A common alternative is to read a single character, expecting it to be a comma:
int ival;
while (file >> ival) {
std::cout << ival << std::endl;
// Skip comma (we hope)
char we_sure_hope_this_is_a_comma;
file >> we_sure_hope_this_is_a_comma;
}
If it's possible for whitespace to also be present, you may want a less "hopeful" technique to skip the comma:
// Skip characters up to (and including) next comma
for (char c; file >> c && c != ',';);
Or simply:
// Skip characters up to (and including) next comma
while (file && file.get() != ',');
Or indeed, if you expect only whitespace or a comma, you could do something like:
// Skip comma and any leading whitespace
(file >> std::ws).get();
Of course, all the above are more-or-less clunky ways of doing this:
// Skip characters up to (and including) next comma on next read
file.ignore(std::numeric_limits<std::streamsize>::max(), ',');
All these approaches assume input is a single line. If you expect multiple lines of input with comma-separated values, you'll also need to handle end-of-line occurring without encountering a comma. Otherwise, you might miss the first input on the next line. Except for the "hopeful" approach, which will work but only on a technicality.
For robustness, I generally advise you to read line-based input as a whole string with std::getline, and then use std::istringstream to read individual values out of that line.
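For example, a minimal sketch of that approach, assuming <sstream> is included and file is an open std::ifstream of comma-separated integers, could look like:
// Sketch only: read each line, then pull comma-separated integers out of it.
std::string line;
while (std::getline(file, line)) {
    std::istringstream iss(line);
    std::string value;
    while (std::getline(iss, value, ',')) {
        int ival = std::stoi(value);   // no error checking on the conversion
        std::cout << ival << '\n';
    }
}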
Here is another compact solution using iterators.
#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <fstream>
#include <algorithm>
template <char D>
struct WordDelimiter : public std::string
{};
template <char D>
std::istream &
operator>>(std::istream & is, WordDelimiter<D> & output)
{
// Output gets every comma-separated token
std::getline(is, output, D);
return is;
}
int main() {
// Open a test file with comma-separated tokens
std::ifstream f{"test.txt"};
// every token is appended in the vector
std::vector<std::string> vec{ std::istream_iterator<WordDelimiter<','>>{ f },
std::istream_iterator<WordDelimiter<','>>{} };
// Transform str vector to int vector
// WARNING: no error checking made here
std::vector<int> vecint;
std::transform(std::begin(vec),std::end(vec),std::back_inserter(vecint),[](const auto& s) { return std::stoi(s); });
for (auto val : vecint) {
std::cout << val << std::endl;
}
return 0;
}

Reading a text file and storing data into multiple arrays C++

I am trying to read a database file (as txt) where I want to skip empty lines and skip the column header line within the file, and store each record as an array. I would like to take a stop_id and find the corresponding stop_name, i.e.
if I say give me stop 17, the program will get "Jackson & Kolmar".
The file format is as follows:
17,17,"Jackson & Kolmar","Jackson & Kolmar, Eastbound, Southeast Corner",41.87685748,-87.73934698,0,,1
18,18,"Jackson & Kilbourn","Jackson & Kilbourn, Eastbound, Southeast Corner",41.87688572,-87.73761421,0,,1
19,19,"Jackson & Kostner","Jackson & Kostner, Eastbound, Southeast Corner",41.87691497,-87.73515882,0,,1
So far I am able to get the stop_id values, but now I want to get the stop_name values, and I am fairly new to C++ string manipulation.
mycode.cpp
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
    string filename = "test.txt";
    string line;
    string data;
    int count = 0;
    ifstream infile(filename.c_str());
    while (getline(infile, line))
    {
        int comma = line.find(",");
        data = line.substr(0, comma);
        cout << "Line " << count << " " << "is " << data << endl;
        count++;
    }
    infile.close();
    string sent = "i,am,the,champion";
    return 0;
}
You can call string::find three times to locate the third occurrence of the comma, storing the positions of the last two occurrences found in the line, then pass them to string::substr to get the text you are looking for:
std::string line ("17,17,\"Jackson & Kolmar\",\"Jackson & Kolmar, Eastbound, Southeast Corner\",41.87685748,-87.73934698,0,,1");
std::size_t found=0, foundBack;
int i;
for(i=0;i<3 && found!=std::string::npos;i++){
foundBack = found;
found=line.find(",",found+1);
}
std::cout << line.substr(foundBack+1,found-foundBack-1) << std::endl;
You can read the whole line of the file into a string and then use a stringstream to give you each piece one at a time, up until and excluding each comma. Then you can fill up your containers. I am assuming that you want each line in its own container and that you want an unlimited number of them. The best way to do that is to have a vector of vectors.
std::string Line;
std::vector<std::vector<std::string>> Data;
while (std::getline(infile, Line))
{
    std::stringstream ss;
    ss << Line;
    Data.push_back(std::vector<std::string>());
    std::string Temp;
    while (std::getline(ss, Temp, ','))
    {
        Data[Data.size() - 1].push_back(Temp);
    }
}
This way you will have a vector full of vectors, each of which contains the strings of all your data in that line. To access the strings as numbers, you can use std::stoi(std::string), which converts a string to an integer.
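As a usage sketch only (the wanted id and the column positions are assumptions based on the sample lines, where column 0 is stop_id and column 2 is stop_name), you could then look up a stop name like this:
// Sketch only: find the row whose first field equals the requested stop_id.
int wanted = 17;                                   // hypothetical stop to look up
for (const auto &row : Data) {
    if (!row.empty() && std::stoi(row[0]) == wanted) {
        std::cout << row[2] << '\n';               // prints "Jackson & Kolmar" (quotes included)
        break;
    }
}
Note that a plain split on ',' does not respect quotes, so a quoted field that itself contains commas (like the description column in the sample) will be broken across several entries; the stop_name field in the sample has no embedded comma, so it survives intact.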

Count first digit on each line of a text file

My project takes a filename and opens it. I need to read each line of a .txt file until the first digit occurs, skipping whitespace, chars, zeros, or special chars. My text file could look like this:
1435 //1, nextline
0 //skip, next line
//skip, nextline
(*Hi 245*) 2 //skip until second 2 after comment and count, next line
345 556 //3 and count, next line
4 //4, nextline
My desired output would go all the way up to nine, but I have condensed it here:
Digit Count Frequency
1: 1 .25
2: 1 .25
3: 1 .25
4: 1 .25
My code is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
    int digit = 1;
    int array[8];
    string filename;
    // cout for getting user path
    // the compiler parses string literals differently, so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin, filename);
    ifstream input_file(filename.c_str());
    if (input_file.is_open()) { // if file is open
        cout << "open" << endl; // just a coding check to make sure it works, ignore
        string fileContents; // string to store contents
        string temp;
        while (!input_file.eof()) { // not end of file, I know not best practice
            getline(input_file, temp);
            fileContents.append(temp); // appends file to string
        }
        cout << fileContents << endl; // prints string for test
    }
    else {
        cout << "Error opening file, check path or file extension" << endl;
    }
    return 0;
}
In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.
How do I iterate over the file only finding the first integer and counting it, while ignoring comments?
One way to approach your problem is the following:
Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
Read each line of your file as a std::string using std::getline as shown in this SO answer.
For each line, strip the comments using a function such as this:
std::string& strip_comments(std::string & inp,
std::string const& beg,
std::string const& fin = "") {
std::size_t bpos;
while ((bpos = inp.find(beg)) != std::string::npos) {
if (fin != "") {
std::size_t fpos = inp.find(fin, bpos + beg.length());
if (fpos != std::string::npos) {
inp = inp.erase(bpos, fpos - bpos + fin.length());
} else {
// else don't erase because fin is not found, but break
break;
}
} else {
inp = inp.erase(bpos, inp.length() - bpos);
}
}
return inp;
}
which can be used like this:
std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");
After stripping the comments, use the string member function find_first_of to find the first digit:
std::size_t dpos = line.find_first_of("123456789");
What is returned here is the index location of the first digit in the string. You should check that the returned position is not std::string::npos, which would indicate that no digit was found. If a digit is found, the corresponding character can be extracted with const char c = line[dpos]; and converted to its numeric value with c - '0'.
Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.
After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.
I hope this helps you get started. A minimal sketch of how these steps fit together is shown below; I leave the full program to you. Good luck!
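This is a sketch only, assuming <map> is included, input_file is the open file stream from the question, and strip_comments is the function shown above:
// Sketch only: count the first digit (1-9) on each comment-stripped line.
std::map<int, int> counts;
int total = 0;
std::string line;
while (std::getline(input_file, line)) {
    line = strip_comments(line, "(*", "*)");
    std::size_t dpos = line.find_first_of("123456789");
    if (dpos != std::string::npos) {
        const char c = line[dpos];
        ++counts[c - '0'];            // digit character to its numeric value
        ++total;
    }
}
for (const auto &entry : counts)      // digit, count, frequency
    std::cout << entry.first << ": " << entry.second << " "
              << static_cast<double>(entry.second) / total << '\n';
Digits that never appear simply have no entry in the map, which matches the note above.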

basic lexical analyzer that reads a text file as input and writes all the "tokens" to an output file line by line [closed]

Writing a very basic lexical analyzer in standard C++ (std usage is allowed) that reads a text file as input and writes all the "tokens" to an output file line by line.
● All items that are separated by white space are tokens
● White space includes
● Space(s)
● Newlines
● Tabs
Input file having this
if (x > 50)
x=0
Output file should be like this
if
(
x
>
50
)
x
=
0
Edit 3
OK, so I'm almost there. I have made an input file, and the program takes whatever is written in it, tokenizes it, and shows it on the screen. An output file is generated, but it contains the output "Output tokens 0033F8C8", which I think is the array's address. Now all I need to do is store the output being shown on the screen into the output text file, and on the screen it should print "see your output file, your inputs are tokenized".
code edit 3
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <fstream>
using std::ifstream;
#include <cstring>
using namespace std;
const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";
int main()
{
// create a file-reading object
ifstream infile;
infile.open("input.txt"); // open a file
if (!infile.good())
return 1; // exit if file not found
ofstream outfile;
outfile.open("output.txt");
if (!outfile)
{
cout << "error opening fiLe";
}
// read each line of the file
while (!infile.eof())
{
// read an entire line into memory
char buf[MAX_CHARS_PER_LINE];
infile.getline(buf, MAX_CHARS_PER_LINE);
// parse the line into blank-delimited tokens
int n = 0; // a for-loop index
// array to store memory addresses of the tokens in buf
const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0
// parse the line
token[0] = strtok(buf, DELIMITER); // first token
if (token[0]) // zero if line is blank
{
for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
{
token[n] = strtok(0, DELIMITER); // subsequent tokens
if (!token[n]) break; // no more tokens
}
}
// process (print) the tokens
for (int i = 0; i < n; i++) // n = #of tokens
cout << "Token[" << i << "] = " << token[i] << endl;
cout << endl;
outfile<< "Output tokens" <<token;
}
infile.close();
outfile.close();
system("pause");
return 0;
}
Since you have tagged your question as C++, I recommend using the C++ language features, as they are safer (fewer defects injected) than C.
For example, replace char * with std::string. The std::string class has more features for lexing and searching than the C-style string functions (str*()).
For your project, I don't recommend using regular expressions. Getting them to work correctly may take you longer than implementing a simple lexer.
Edit 1
For more specific help, please edit your question with the syntax rules of the language you are parsing.
One simple technique is to isolate "words" or identifiers from symbols.
Also read up on Backus-Naur notation
Your issue is too complicated (in the big picture) to be answered fully on Stack Overflow.
Edit 2 - Example
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstdlib>
#include <string>

int main(void)
{
    std::ifstream input("input.txt");
    if (!input)
    {
        std::cerr << "Error opening \"input.txt\".\r\n";
        return 1;
    }
    // Read the file.
    std::string input_text;
    while (std::getline(input, input_text))
    {
        static const char white_space[] = " \t";
        static const char identifier_letters[] = "abcdefghijklmnopqrstuvwxyz";
        std::string::size_type position = 0;
        if (input_text.length() == 0)
        {
            continue; // Empty line
        }
        // Skip white space.
        position = input_text.find_first_not_of(white_space, 0);
        if (position == std::string::npos)
        {
            continue; // Blank line.
        }
        // Check if the first (non-whitespace) character is a
        // valid identifier letter
        if (std::isalpha(static_cast<unsigned char>(input_text[position])))
        {
            // First character is a letter.
            // Find the end of the identifier.
            std::string::size_type identifier_end_posn = 0;
            identifier_end_posn = input_text.find_first_not_of(identifier_letters, position);
            if (identifier_end_posn != std::string::npos)
            {
                const int identifier_length = identifier_end_posn - position;
                std::string identifier = input_text.substr(position, identifier_length);
                std::cout << identifier << "\n";
                continue;
            }
        }
    }
    return 0;
}
You will need to augment the above code to look for symbols and numbers; one possible direction is sketched below.
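This is only a rough sketch of that augmentation; the tokenize_line helper is a hypothetical name, and it simply treats any character that is not a letter, digit, or white space as a single-character symbol token.
#include <cctype>
#include <iostream>
#include <string>

// Sketch only: split one line into identifier, number and symbol tokens,
// writing each token on its own line of the given output stream.
void tokenize_line(const std::string & text, std::ostream & out)
{
    std::string::size_type pos = 0;
    while (pos < text.length())
    {
        const unsigned char c = static_cast<unsigned char>(text[pos]);
        if (std::isspace(c)) {
            ++pos;                                   // skip white space
        } else if (std::isalpha(c)) {                // identifier: run of letters
            std::string::size_type end = pos + 1;
            while (end < text.length() && std::isalpha(static_cast<unsigned char>(text[end])))
                ++end;
            out << text.substr(pos, end - pos) << "\n";
            pos = end;
        } else if (std::isdigit(c)) {                // number: run of digits
            std::string::size_type end = pos + 1;
            while (end < text.length() && std::isdigit(static_cast<unsigned char>(text[end])))
                ++end;
            out << text.substr(pos, end - pos) << "\n";
            pos = end;
        } else {                                     // anything else: one-character symbol
            out << text[pos] << "\n";
            ++pos;
        }
    }
}
Calling it once for each line read with std::getline would, for the sample input if (x > 50), produce the token-per-line output shown in the question.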

Incorrect char from file

I have the following .txt file:
test.txt
1,2,5,6
Passing into a small C++ program I made through command line as follows:
./test test.txt
Source is as follows:
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char **argv)
{
int temp =0;
ifstream file;
file.open(argv[1]);
while(!file.eof())
{
temp=file.get();
file.ignore(1,',');
cout<<temp<<' ';
}
return 0;
}
For some reason my output is not 1 2 5 6 but 49 50 53 54. What gives?
UPDATE:
Also, I noticed there is another implementation of get(). If I define char temp then I can do file.get(temp), and that will also save me from converting the ASCII representation. However, I like using while (file >> temp), so I will be going with that. Thanks.
temp is an int, so you see the ASCII codes of the characters rather than the digit values.
49 is the ASCII code for the digit '1' (49 - 48 = 1).
get() gives you a character (character code).
by the way, eof() only becomes true after a failed read attempt, so the code you show,
while(!file.eof())
{
temp=file.get();
file.ignore(1,',');
cout<<temp<<' ';
}
will possibly display one extraneous character at the end.
the conventional loop is
while( file >> temp )
{
cout << temp << ' ';
}
where the expression file >> temp reads in one number and produces a reference to file, and where that file object is converted to bool as if you had written
while( !(file >> temp).fail() )
This does not do what you think it does:
while(!file.eof())
This is covered in Why is iostream::eof inside a loop condition considered wrong?, so I won't cover it in this answer.
Try:
char c;
while (file >> c)
{
// [...]
}
...instead. Reading in a char rather than an int will also save you having to convert the ASCII representation (ASCII value 49 is '1', etc.).
For the record, and despite this being the nth duplicate, here's how this code might look in idiomatic C++:
for (std::string line; std::getline(file, line); )
{
std::istringstream iss(line);
std::cout << "We read:";
for (std::string n; std::getline(iss, n, ','); )
{
std::cout << " " << n;
// now use e.g. std::stoi(n)
}
std::cout << "\n";
}
If you don't care about lines or just have one line, you can skip the outer loop.