Extracting particular data from a CSV file in C++

I have written a program that reads a CSV file, but I'm having trouble extracting data from it in C++. I want to count the number of columns in the 1st row, starting from the 5th column and ending at the last column of that row. I have written the following code to read the CSV file, but I am not sure how to count the columns as described.
I would appreciate it if anyone could tell me how to go about it.
char* substring(char* source, int startIndex, int endIndex)
{
int size = endIndex - startIndex + 1;
char* s = new char[size+1];
strncpy(s, source + startIndex, size); //you can read the documentation of strncpy online
s[size] = '\0'; //make it null-terminated
return s;
}
char** readCSV(const char* csvFileName, int& csvLineCount)
{
ifstream fin(csvFileName);
if (!fin)
{
return nullptr;
}
csvLineCount = 0;
char line[1024];
while(fin.getline(line, 1024))
{
csvLineCount++;
};
char **lines = new char*[csvLineCount];
fin.clear();
fin.seekg(0, ios::beg);
for (int i=0; i<csvLineCount; i++)
{
fin.getline(line, 1024);
lines[i] = new char[strlen(line)+1];
strcpy(lines[i], line);
};
fin.close();
return lines;
}
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,
,Afghanistan,33.0,65.0,0,0,0,0,0,0,0,
,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.

To answer your question:
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
Yeah, not that difficult - that's how I would do it:
#include <iostream>
#include <string>
#include <fstream>
#include <regex>
#define FILENAME "test.csv" //Your filename as a macro
//(The compiler just sees "test.csv" instead of FILENAME)
void read(){
std::string n;
//date format pattern m/dd/yy
std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
//date format pattern mm/dd/yy
std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
std::smatch result1, result2;
std::ifstream file(FILENAME, std::ios::in);
if ( ! file.is_open() )
{
std::cout << "Could not open file!" << '\n';
return; //nothing to read
}
//read one comma-separated field at a time; the loop ends when getline fails
//https://en.cppreference.com/w/cpp/string/basic_string/getline
while(std::getline(file,n,','))
{
//str() is the matched text; the patterns have no capture group, so str(1) would be empty
if(std::regex_search(n,result1,pattern1))
std::cout << result1.str() << std::endl;
if(std::regex_search(n,result2,pattern2))
std::cout << result2.str() << std::endl;
}
file.close();
}
int main ()
{
read();
return 0;
}
The file test.csv contains the following for testing:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0
It actually is pretty simple:
getline takes the open file and stops reading at a so-called delimiter character,
in your case a comma ','.
(That is the best way I have found to read CSV. You can replace the delimiter with any single character you want, for example ';' or ' '.)
After this you get every field separately, one after another, without the commas.
Now you can "filter" out what you need. I use regex, but use whatever you want.
(Just FYI: for C++-tagged questions you shouldn't use C-style functions like strncpy.)
I gave you one pattern for dates like 1/23/20 (m/dd/yy) and a second one for the case that your file contains November or December dates like 12/22/20 (mm/dd/yy). Splitting them into 2 lines makes
the regex patterns easier to read and understand.
You may have to expand the regex patterns if other data in the file somehow matches
your date format; regular expressions are not as complicated as they look.
From that point you can put all the printed values in, for example, a vector (a more convenient array) to handle, pass or return the data. That's up to you.
If you need more explanation I am happy to help you out and/or expand this example, just leave a comment.
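For the count the question actually asks about (the number of date columns after Long in the 1st row), something along these lines might be enough. This is only a sketch: countColumnsAfter and its skip parameter are names made up for illustration, and it assumes the header has no quoted fields containing commas.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts the non-empty fields that follow the first `skip` columns of one
// CSV row. countColumnsAfter and skip are hypothetical names; quoted fields
// with embedded commas are NOT handled (assumption: the header has none).
std::size_t countColumnsAfter(const std::string& row, std::size_t skip = 4)
{
    std::istringstream fields(row);
    std::string field;
    std::size_t index = 0; // zero-based column index of the current field
    std::size_t count = 0;
    while (std::getline(fields, field, ',')) {
        // Drop a carriage return left over from Windows line endings.
        if (!field.empty() && field.back() == '\r')
            field.pop_back();
        // Only count real values that come after the skipped columns.
        if (index >= skip && !field.empty())
            ++count;
        ++index;
    }
    return count;
}
```

You would read the 1st row with std::getline(file, header) and then call countColumnsAfter(header). Empty fields are skipped on purpose, so a trailing comma does not add a column.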

You basically want to search for the separator character within your line (often it is ';').
If you print out your lines they should look like this:
a;b;c;d;e;f;g;h
There are several ways to achieve what you want; I would look for a function that splits on a character. Something along the lines of the example below should work. If you use std::string you can use its find member function instead of a loop.
#include <cstring> // strlen
int rows(const char* line, char separator, int count) {
// advance to the next separator or to the end of the string
size_t i = 0;
while (line[i] != '\0' && line[i] != separator) i++;
count++; // one more field found
// recurse on the remainder after the separator, if there is one
if (line[i] == separator) return rows(line + i + 1, separator, count);
else return count;
}
The recursion can obviously be replaced by one loop ;)
int countSign(const char* line, char sign){
size_t l = strlen(line);
int count = 0;
for (size_t i=0; i < l; i++) {
if (line[i] == sign) count++; // compare single characters with ==, not strcmp
}
return count;
}
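If the line is already in a std::string, the same count can probably be done without a hand-written loop. A minimal sketch using std::count from the standard library; countFields is a made-up name, and it assumes fields never contain the separator themselves:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of fields in a line = number of separators + 1.
// countFields is a hypothetical helper; an empty line is treated as 0 fields.
std::size_t countFields(const std::string& line, char separator)
{
    if (line.empty())
        return 0;
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), separator)) + 1;
}
```

Note that a line ending in the separator counts a final empty field here, so decide whether that matches your data.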

Related

How to delimit and write file contents to vector?

So let's say I have a vector of ints and a text file which looks like this:
1|2|3|4|5
How can I add the numbers to the vector?
First, you would open the file using std::ifstream. There are a few ways you could then read these out, but one example would be to use std::getline with a custom "end of line" character, being your | in this case:
std::vector<int> myVect;
std::ifstream reader("./file.txt"); //Replace with path to your file
for(int i = 0; i < 5; i++) {
std::string item;
std::getline(reader, item, '|'); //The third argument tells it to read until a '|' char
int number = std::stoi(item); //Convert from string to int
myVect.push_back(number);
}
This example relies on you knowing how many elements you want to get, but can be modified to work with an unknown size.

Txt to 2 different arrays c++

I have a txt file with a lot of things in it.
The lines have this pattern: 6 spaces then 1 int, 1 space, then a string.
Also, the 1st line has the amount of lines that the txt has.
I want to put the integers in an array of ints and the string on an array of strings.
I can read it and put it into an array, but only if I treat the ints as chars and put everything into one array of strings. When I try to separate things I have no idea how to do it. Any ideas?
The code I used for putting everything in an array was this:
int size()
{
ifstream sizeX;
int x;
sizeX.open("cities.txt");
sizeX>>x;
return x;
};
int main(void)
{
int size = size();
string words[size];
ifstream file("cities.txt");
file.ignore(100000,'\n');
if(file.is_open())
{
for(int i=0; i<size; i++)
{
getline(file,words[i]);
}
}
}
Just to start I'm going to provide some tips about your code:
int size = size();
Why do you need to open the file, read the first line and then close it? That process can be done opening the file just once.
The code string words[size]; is not legal standard C++. You cannot create a variable-length array in C++: that C feature was not included in the C++ standard, although some compilers accept it as an extension. I suggest you replace it with std::vector, which is the idiomatic C++ choice.
Here is a function that does what you need.
int parse_file(const std::string& filename,
std::vector<std::string>* out_strings,
std::vector<int>* out_integers) {
assert(out_strings != nullptr);
assert(out_integers != nullptr);
std::ifstream file;
file.open(filename, std::ios_base::in);
if (file.fail()) {
// handle the error
return -1;
}
// Local variables
int num_rows;
std::string line;
// parse the first line
std::getline(file, line);
if (line.size() == 0) {
// file empty, handle the error
return -1;
}
num_rows = std::stoi(line);
// reserve memory
out_strings->clear();
out_strings->reserve(num_rows);
out_integers->clear();
out_integers->reserve(num_rows);
for (int row = 0; row < num_rows; ++row) {
// read the line
std::getline(file, line);
if (line.size() == 0) {
// unexpected end of line, handle it
return -1;
}
// get the integer
out_integers->push_back(
std::stoi(line.substr(6, line.find(' ', 6) - 6)));
// get the string
out_strings->push_back(
line.substr(line.find(' ', 6) + 1, std::string::npos));
}
file.close();
return 0;
}
You can definitely improve it, but I think it's a good place to start.
One last suggestion to improve the robustness of your code: you can match each line against a regular expression. That way you can be sure each line is formatted exactly how you need.
For example:
std::regex line_pattern("\\s{6}[0-9]+\\s[^\\n]+");
if (std::regex_match(line, line_pattern) == false) {
// ups... the line is not formatted how you need
// this is an error
}
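The same idea can also do the extraction if the pattern gets capture groups. A rough sketch, where parseCityLine is a made-up name and the pattern is the one above with two groups added:

```cpp
#include <regex>
#include <string>

// Validates a line of the form "      <int> <text>" and extracts both parts.
// parseCityLine is a hypothetical helper; returns false if the line does not
// match. std::stoi can still throw std::out_of_range for very long numbers.
bool parseCityLine(const std::string& line, int& number, std::string& name)
{
    static const std::regex pattern("\\s{6}([0-9]+)\\s(.+)");
    std::smatch match;
    if (!std::regex_match(line, match, pattern))
        return false;
    number = std::stoi(match[1].str()); // first group: the digits
    name = match[2].str();              // second group: rest of the line
    return true;
}
```

Inside the reading loop you could call this instead of the two substr expressions and treat a false return as a malformed line.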

C++ Read until tab detected

This may be a little bit redundant, but is there a short/compact method of reading in a string until a tab is reached in C++? Similar to other questions, but I want to keep reading even if I hit a space. For example if the STDIN is
Cute Kitty is fabulous as always
Then I want to read in Cute Kitty; is; fabulous as always, three times.
I've seen people do this with regex in files, but how would you do this on the stdin in C++? I want to put it in a string class and whenever I try something like
scanf("%s\t", &mystring);
It throws up an error because I'm not using an array of chars.
Thanks, please keep answers easy enough for a noob to understand.
This code seems to work for me. It basically gets the line that was entered from the user via stdin and then reads each character waiting for a tab character (\t), or the end of the line.
#include <iostream>
#include <string>
int main()
{
std::string a;
std::getline(std::cin,a);
std::string::size_type index_holder = 0;
// walk one position past the end so the last token is printed in full
for(std::string::size_type i = 0; i <= a.size(); ++i)
{
if(i == a.size() || a[i] == '\t') {
std::cout << a.substr(index_holder, i - index_holder) << std::endl;
index_holder = i + 1;
}
}
return 0;
}
Have a look at strtok:
char * strtok ( char * str, const char * delimiters );
Split string into tokens. A sequence of calls to this function split str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.
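Since the question asks for std::string rather than char arrays, a string-based alternative to strtok might look like the sketch below. It uses std::getline with '\t' as the delimiter on a std::istringstream; splitOnTabs is a made-up name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line on tab characters; spaces stay inside the tokens.
// splitOnTabs is a hypothetical helper name.
std::vector<std::string> splitOnTabs(const std::string& line)
{
    std::vector<std::string> tokens;
    std::istringstream stream(line);
    std::string token;
    while (std::getline(stream, token, '\t'))
        tokens.push_back(token);
    return tokens;
}
```

To read from stdin you would first do std::getline(std::cin, line) and then pass line to the function. Unlike strtok, this does not modify the input and keeps empty tokens between two consecutive tabs.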

Comparing numbers in strings to numbers in int?

I'm trying to make a program that will open a txt file containing a list of names in this format (ignore the bullets):
3 Mark
4 Ralph
1 Ed
2 Kevin
and will create a file w/ organized names based on the number in front of them:
1 Ed
2 Kevin
3 Mark
4 Ralph
I think I'm experiencing trouble in line 40, where I try to compare the numbers stored in strings with a number stored in an int.
I can't think of any other way to tackle this, any advice would be wonderful!
#include <iostream>
#include <fstream>
#include <vector>
#include <cstdlib>
using namespace std;
int main()
{
ifstream in;
ofstream out;
string line;
string collection[5];
vector <string> lines;
vector <string> newLines;
in.open("infile.txt");
if (in.fail())
{
cout << "Input file opening failed. \n";
exit(1);
}
out.open("outfile.txt");
if (out.fail())
{
cout << "Output file opening failed. \n";
exit(1);
}
while (!in.eof())
{
getline(in, line);
lines.push_back(line);
}
for (int i = 0; i < lines.size(); i++)
{
collection[i] = lines[i];
}
for (int j = 0; j < lines.size(); j++)
{
for (int x = 0; x < lines.size(); x--)
{
if (collection[x][0] == j)
newLines.push_back(collection[x]);
}
}
for (int k = 0; k < newLines.size(); k++)
{
out << newLines[k] << endl;
}
in.close( );
out.close( );
return 0;
}
Using a debugger would tell you where you went wrong, but let me highlight the mistake:
if (collection[x][0] == j)
You're expecting a string like 3 Mark. The first character of this string is '3', but that has the ASCII value 51, and that is the numerical value you'll get when you work with it this way! This will never equal j unless you've got a lot of lines in your file, and then your search will not work at all like you wanted. You need to convert that character into an integer and then do your comparison.
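A small sketch of that conversion, with made-up helper names: subtracting '0' turns a single digit character into its value, and std::stoi parses the whole leading number, so it also copes with more than one digit.

```cpp
#include <string>

// Value of a single digit character, e.g. '3' -> 3.
// Assumes c is between '0' and '9'; digitValue is a hypothetical name.
int digitValue(char c)
{
    return c - '0';
}

// Parses the number at the start of a line such as "12 Ed".
// std::stoi stops at the first non-digit and throws std::invalid_argument
// if the line does not start with a number. leadingNumber is hypothetical.
int leadingNumber(const std::string& line)
{
    return std::stoi(line);
}
```

With this, the comparison in the question could read digitValue(collection[x][0]) == j, keeping in mind that the numbers in the file start at 1 while j starts at 0.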
C++ offers many ways to process data via streams, including parsing simple data files and converting text to numbers and vice versa. Here's a simple standalone function that will read a data file like yours (only with arbitrary text, including spaces, after the number on each line).
#include <algorithm>
#include <istream>
#include <string>
#include <vector>
// snip
struct file_entry { int i; std::string text; };
std::vector<file_entry> parse_file(std::istream& in)
{
std::vector<file_entry> data;
file_entry e;
// stop as soon as a number can no longer be read (end of file or bad line)
while (in >> e.i) // read the first number on the line
{
in.ignore(); // skip the space between the number and the text
std::getline(in, e.text); // read the whole of the rest of the line
data.push_back(e);
}
return data;
}
Because the standard way that >> works involves reading until the next space (or end of line), if you want to read a chunk of text which contains whitespace, it will be much easier to use std::getline to just slurp up the whole of the rest of the current line.
Note: I've made no attempt to handle malformed lines in the textfile, or any number of other possible error conditions. Writing a proper file parser is outside of the scope of this question, but there are plenty of tutorials out there on using C++'s stream functionality appropriately.
Now you have the file in a convenient structure, you can use other standard c++ features to sort it, rather than reinventing the wheel and trying to do it yourself:
bool sort_file_entry(const file_entry& a, const file_entry& b)
{
return a.i < b.i; // true if a should come before b
}
int main()
{
// set up all your streams, etc.
std::vector<file_entry> lines = parse_file(in);
std::sort(lines.begin(), lines.end(), sort_file_entry);
// now you can write the sorted vector back out to disk.
}
Again, a full introduction to how iterators and containers work is well outside the scope of this answer, but the internet has no shortage of introductory C++ guides out there. Good luck!

Getting the nth line of a text file in C++

I need to read the nth line of a text file (e.g. textfile.findline(0) would find the first line of the text file loaded with ifstream textfile). Is this possible?
I don't need to put the contents of the file in an array/vector, I just need to assign a specific line of the text file to a variable (specifically an int).
P.S. I am looking for the simplest solution that would not require me to use any big external library (e.g. Boost)
Thanks in advance.
How about this?
std::string ReadNthLine(const std::string& filename, int N)
{
std::ifstream in(filename.c_str());
std::string s;
//for performance
s.reserve(some_reasonable_max_line_length);
//skip N lines
for(int i = 0; i < N; ++i)
std::getline(in, s);
std::getline(in,s);
return s;
}
If you want to read the start of the nth line, you can use std::istream::ignore to skip over the first n-1 lines, then read from the next line to assign to the variable.
template<typename T>
void readNthLine(istream& in, int n, T& value) {
for (int i = 0; i < n-1; ++i) {
in.ignore(numeric_limits<streamsize>::max(), '\n');
}
in >> value;
}
Armen's solution is the correct answer, but I thought I'd throw out an alternative, based on jweyrich's caching idea. For better or for worse, this reads in the entire file at construction, but only saves the newline positions (doesn't store the entire file, so it plays nice with massive files.) Then you can simply call ReadNthLine, and it will immediately jump to that line, and read in the one line you want. On the other hand, this is only optimal if you want to get only a fraction of the lines at a time, and the line numbers are not known at compile time.
class TextFile {
std::ifstream file_stream;
std::vector<std::ifstream::streampos> linebegins;
TextFile& operator=(TextFile& b) = delete;
public:
TextFile(std::string filename)
:file_stream(filename)
{
//this chunk stolen from Armen's,
std::string s;
//for performance
s.reserve(some_reasonable_max_line_length);
while(file_stream) {
linebegins.push_back(file_stream.tellg());
std::getline(file_stream, s);
}
}
TextFile(TextFile&& b)
:file_stream(std::move(b.file_stream)),
linebegins(std::move(b.linebegins))
{}
TextFile& operator=(TextFile&& b)
{
file_stream = std::move(b.file_stream);
linebegins = std::move(b.linebegins);
return *this;
}
std::string ReadNthLine(int N) {
if (N < 0 || static_cast<std::size_t>(N) + 1 >= linebegins.size())
throw std::runtime_error("File doesn't have that many lines!");
std::string s;
// clear EOF and error flags
file_stream.clear();
file_stream.seekg(linebegins[N]);
std::getline(file_stream, s);
return s;
}
};
It's certainly possible. There are (n-1) '\n' characters preceding the nth line. Read lines until you reach the one you're looking for. You can do this on the fly without storing anything except the current line being considered.