Pull out data from a file and store it in strings in C++ - c++

I have a file which contains records of students in the following format.
Umar|Ejaz|12345|umar#umar.com
Majid|Hussain|12345|majid#majid.com
Ali|Akbar|12345|ali#geeks-inn.com
Mahtab|Maqsood|12345|mahtab#myself.com
Juanid|Asghar|12345|junaid#junaid.com
The data has been stored according to the following format:
firstName|lastName|contactNumber|email
The total number of lines(records) can not exceed the limit 100. In my program, I've defined the following string variables.
#define MAX_SIZE 100
// other code
string firstName[MAX_SIZE];
string lastName[MAX_SIZE];
string contactNumber[MAX_SIZE];
string email[MAX_SIZE];
Now, I want to pull data from the file, and using the delimiter '|', I want to put data in the corresponding strings. I'm using the following strategy to put back data into string variables.
ifstream readFromFile;
readFromFile.open("output.txt");
// other code
int x = 0;
string temp;
while(getline(readFromFile, temp)) {
int charPosition = 0;
while(temp[charPosition] != '|') {
firstName[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != '|') {
lastName[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != '|') {
contactNumber[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != endl) {
email[x] += temp[charPosition];
charPosition++;
}
x++;
}
Is it necessary to attach null character '\0' at the end of each string? And if I do not attach, will it create problems when I will be actually implementing those string variables in my program. I'm a new to C++, and I've come up with this solution. If anybody has better technique, he is surely welcome.
Edit: Also I can't compare a char(acter) with endl, how can I?
Edit: The code that I've written isn't working. It gives me following error.
Segmentation fault (core dumped)
Note: I can only use .txt file. A .csv file can't be used.

There are many techniques to do this. I suggest searching StackOveflow for "[C++] read file" to see some more methods.
Find and Substring
You could use the std::string::find method to find the delimiter and then use std::string::substr to return a substring between the position and the delimiter.
std::string::size_type position = 0;
positition = temp.find('|');
if (position != std::string::npos)
{
firstName[x] = temp.substr(0, position);
}

If you don't terminate a a C-style string with a null character there is no way to determine where the string ends. Thus, you'll need to terminate the strings.
I would personally read the data into std::string objects:
std::string first, last, etc;
while (std::getline(readFromFile, first, '|')
&& std::getline(readFromFile, last, '|')
&& std::getline(readFromFile, etc)) {
// do something with the input
}
std::endl is a manipulator implemented as a function template. You can't compare a char with that. There is also hardly ever a reason to use std::endl because it flushes the stream after adding a newline which makes writing really slow. You probably meant to compare to a newline character, i.e., to '\n'. However, since you read the string with std::getline() the line break character will already be removed! You need to make sure you don't access more than temp.size() characters otherwise.
Your record also contains arrays of strings rather than arrays of characters and you assign individual chars to them. You either wanted to yse char something[SIZE] or you'd store strings!

Related

C++ read different kind of datas from file until there's a string beginning with a number

In C++, I'd like to read from an input file which contains different kind of datas: first the name of a contestant (2 or more strings with whitespaces), then an ID (string without whitespaces, always beginning with a number), then another strings without ws and a numbers (the sports and their achieved places).
For example:
Josh Michael Allen 1063Szinyei running 3 swimming 1 jumping 1
I show you the code what I started to write and then stucked..
void ContestEnor::next()
{
string line;
getline(_f , line);
if( !(_end = _f.fail()) ){
istringstream is(line);
is >> _cur.contestant >> _cur.id; // here I don't know how to go on
_cur.counter = 0;
//...
}
}
Thank you for your help in advance.
You should look into using std::getline with a delimiter. This way, you can delimit on a space character and read until you find a string where the first character in a number. Here is a short code example (this seems rather homework-like, so I don't want to write too much of it for you ;):
std::string temp, id;
while (std::getline(_f, temp, ' ')) {
if (temp[0] >= 0 && temp[0] <= '9') {
id = temp;
}
// you would need to add more code for the rest of the data on that line
}
/* close the file, etc. */
This code should be pretty self-explanatory. The most important thing to know is that you can use std::getline to get data up until a delimiter. The delimiter is consumed, just like the default behavior of delimiting on a newline character. Thus, the name getline isn't entirely accurate - you can still get only part of a line if you need to.

C++ retrieve numerical values in a line of string

Here is the content of txt file that i've managed read.
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether if its possible but after reading the contents of this txt file i'm trying to store just the x and y axis range into 2 variables so that i'll be able to use it for later functions. Any suggestion? And do i need to use vectors? Here is the code for reading of the file.
string configName;
ifstream inFile;
do {
cout << "Please enter config filename: ";
cin >> configName;
inFile.open(configName);
if (inFile.fail()){
cerr << "Error finding file, please re-enter again." << endl;
}
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)){
if (content[0] && content[1] == '/') continue;
cout << endl << content << endl;
depends on the style of your file, if you are always sure that the style will remain unchanged, u can read the file character by character and implement pattern recognition stuff like
if (tempstr == "y-axis=")
and then convert the appropriate substring to integer using functions like
std::stoi
and store it
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
#include <string>
#include <vector>
std::vector<std::string> split(std::string str)
{
std::vector<std::string> ret;
int cur_pos = 0;
int next_delim = str.find("\n");
while (next_delim != -1) {
ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
cur_pos = next_delim + 1;
next_delim = str.find("\n", cur_pos);
}
return ret;
}
Will split an input string by newlines. From there, you can begin parsing the strings in that vector. They key functions you'll want to look at are std::string's substr() and find() methods. A quick google search should get you to the relevant documentation, but here you are, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() can be used to convert a string into an integer.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.

Converting strings to char arrays using cstring

I have a small assignment which partly requires me to take inputs from a file in the form of strings and place them into char arrays so I can check if the string contains any '*' character at the end of it.
I have been able to extract the strings from the files successfully, however i have failed to find a way in which to place them in char arrays so i can process them.
I would be very grateful if someone would let me know how to place a string into char arrays using cstring library. Please keep in mind that the strings are taken from a file and not as user input.
some of the ways i tried is the following:
//Try 1
char CstringArray[] = LineFromFile;
//Try 2
char CstringArray[100] = LineFromFile;
//Try 3
ifstream Test("Test.txt");
Test>>CstringArray;
//Try 4
ifstream Test("Test.txt");
Test>>CstringArray[0];
Thank you very much
Since this is an assignment, your professor will probably not be happy with you using all of C++'s functionality, particularly if you don't understand it, but since it's a one liner I figured I'd tell you how I'd print all strings ending in an asterisks. Given that you have successfully opened the file to ifstream Test you can do:
copy_if(istream_iterator<string>(Test), istream_iterator<string>(), ostream_iterator<string>(cout, " "), [](const auto& i) { return !empty(i) && i.back() == '*'; })
EDIT:
I'm using an istream_iterator to read in each string in Test and istream_iterator, I'm operating on these values immediately, but if you needed to start by saving all the strings to a vector<string> you could also do this: vector<string> CstringArray{ istream_iterator<string>(Test), istream_iterator<string>() }
I'm using an ostream_iterator to directly stream out my selected strings rather than storing them
I'm using copy_if to iterate over all the strings that are streamed in, selecting only those that meet a given criteria
I'm using the lambda: [](const auto& i) { return !empty(i) && i.back() == '*'; } to conditionally select non-empty strings which end with an asterisks character

Can I use 2 or more delimiters in C++ function getline? [duplicate]

This question already has answers here:
How can I read and parse CSV files in C++?
(39 answers)
Closed 4 years ago.
I would like to know how can I use 2 or more delimiters in the getline functon, that's my problem:
The program reads a text file... each line is goning to be like:
New Your, Paris, 100
CityA, CityB, 200
I am using getline(file, line), but I got the whole line, when I want to to get CityA, then CityB and then the number; and if I use ',' delimiter, I won't know when is the next line, so I'm trying to figure out some solution..
Though, how could I use comma and \n as a delimiter?
By the way,I'm manipulating string type,not char, so strtok is not possible :/
some scratch:
string line;
ifstream file("text.txt");
if(file.is_open())
while(!file.eof()){
getline(file, line);
// here I need to get each string before comma and \n
}
You can read a line using std::getline, then pass the line to a std::stringstream and read the comma separated values off it
string line;
ifstream file("text.txt");
if(file.is_open()){
while(getline(file, line)){ // get a whole line
std::stringstream ss(line);
while(getline(ss, line, ',')){
// You now have separate entites here
}
}
No, std::getline() only accepts a single character, to override the default delimiter. std::getline() does not have an option for multiple alternate delimiters.
The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream, and then parse it further, into comma-separate values.
However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.
Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.
An alternative to this is to tokenize like strtok does -- from the beginning of input, handling one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because its is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.
Someone relying on the C++ std library usually ends up using a std::stringstream, along with std::getline to tokenize string input. But, this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So, they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.
But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest to the beginning of the string containing a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.
Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.
But, with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks, and without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize demonstrates how it is used.
The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be that much more complicated than this -- you would just keep track of your current position, use this as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.
To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.
#include <iostream>
#include <string>
#include <utility>
namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
auto space_is_it = [] (char c) {
// A few asks:
// * Suppress criticism WRT localization concerns
// * Avoid jumping to conclusions! And seeing monsters everywhere!
// Things like...ah! Believing "thoughts" that assumptions were made
// regarding character encoding.
// * If an obvious, portable alternative exists within the C++ Standard Library,
// you will see it in 2.0, so no new defect tickets, please.
// * Go ahead and ignore the rumor that using lambdas just to get
// local function definitions is "cheap" or "dumb" or "ignorant."
// That's the latest round of FUD from...*mumble*.
return c > '\0' && c <= ' ';
};
for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
if(!space_is_it(*rit)) {
if(rit != str.rbegin()) {
str.erase(&*rit - &*str.begin() + 1);
}
for(auto fit=str.begin(); fit != str.end(); ++fit) {
if(!space_is_it(*fit)) {
if(fit != str.begin()) {
str.erase(str.begin(), fit);
}
return;
} } } }
str.clear();
}
// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurance of one
// from a set of delimiters. All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
const auto pos = str.find_first_of(std::forward<D>(delimiters));
if(pos == std::string::npos) {
// When none of the delimiters are present,
// clear the string and return its last value.
// This effectively makes the end of a string an
// implied delimiter.
// This behavior is convenient for parsers which
// consume chunks of a string, looping until
// the string is empty.
// Without this feature, it would be possible to
// continue looping forever, when an iteration
// leaves the string unchanged, usually caused by
// a syntax error in the source string.
// So the implied end-of-string delimiter takes
// away the caller's burden of anticipating and
// handling the range of possible errors.
found_delimiter = '\0';
std::string result;
std::swap(result, str);
trim(result);
return result;
}
found_delimiter = str[pos];
auto left = str.substr(0, pos);
trim(left);
str.erase(0, pos + 1);
return left;
}
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
char discarded_delimiter;
return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}
inline std::string pad_right(const std::string& str,
std::string::size_type min_length,
char pad_char=' ')
{
if(str.length() >= min_length ) return str;
return str + std::string(min_length - str.length(), pad_char);
}
inline void tokenize(std::string source) {
std::cout << source << "\n\n";
bool quote_opened = false;
while(!source.empty()) {
// If we just encountered an open-quote, only include the quote character
// in the delimiter set, so that a quoted token may contain any of the
// other delimiters.
const char* delimiter_set = quote_opened ? "'" : ",'{}";
char delimiter;
auto token = get_to(source, delimiter_set, delimiter);
quote_opened = delimiter == '\'' && !quote_opened;
std::cout << " " << pad_right('[' + token + ']', 16)
<< " " << delimiter << '\n';
}
std::cout << '\n';
}
}
int main() {
string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}
This outputs:
{1.5, null, 88, 'hi, {there}!'}
[] {
[1.5] ,
[null] ,
[88] ,
[] '
[hi, {there}!] '
[] }
I don't think that's how you should attack the problem (even if you could do it); instead:
Use what you have to read in each line
Then split up that line by the commas to get the pieces that you want.
If strtok will do the job for #2, you can always convert your string into a char array.

C++ Escape Phrase Substring

I'm trying to parse web data coming from a server, and I'm trying to find a more stl version of what I had.
My old code consisted of a for() loop and checked each character of the string against a set of escape characters and used a stringstream to collect the rest. As I'm sure you can imagine, this sort of loop leads to being a high point of failure when reading web data, as I need strict syntax checking.
I'm trying to instead start using the string::find and string::substr functions, but I'm unsure of the best implementation to do it with.
Basically, I want to read a string of data from a server, different data, separated by a comma. (i.e., first,lastname,email#email.com) and separate it at the commas, but read the data in between.
Can anyone offer any advice?
I'm not sure what kind of data are you parsing, but it's always a good idea to use a multi layer architecture. Each layer should implement an abstract function, and each layer should only do one job (like escaping characters).
The number of layers you use depends on the actual steps needed to decode the stream
for your problem I suggest the following layers:
1st: tokenize by ',' and '\n': convert in to some kind of vector of strings
2nd: resolve escapes: decode escape characers
you should use std::stringstream, and process the characters with a loop. unless your format is REALLY simple (like only a single separator character, without escapes), you can't really use any standard function.
For the learning experience, this is the code I ended up using to parse data into a map. You can use the web_parse_resurn.err to see if an error was hit, or use it for specific error codes.
struct web_parse_return {
map<int,string> parsedata;
int err;
};
web_parse_return* parsewebstring(char* escapechar, char* input, int tokenminimum) {
int err = 0;
map<int,string> datamap;
if(input == "MISSING_INFO") { //a server-side string for data left out in the call
err++;
}
else {
char* nTOKEN;
char* TOKEN = strtok_s(input, escapechar,&nTOKEN);
if(TOKEN != 0) { //if the escape character is found
int tokencount = 0;
while(TOKEN != 0) {//since it finds the next occurrence, keep going
datamap.insert(pair<int,string>(tokencount,TOKEN));
TOKEN = strtok_s(NULL, escapechar,&nTOKEN);
tokencount++;
}
if(tokencount < tokenminimum) //check that the right number was hit
err++; //other wise, up the error count
}
else {
err++;
}
}
web_parse_return* p = new web_parse_return; //initializing a new struct
p->err = err;
p->parsedata = datamap;
return p;
}