Format vector string on stdout - c++

So for this program I'm writing for class I have to format a vector string into standard output. I get how to do it with just strings with the 'printf' function but I don't understand how to do it with this.
Here's what I got:
void put(vector<string> ngram){
while(!(cin.eof())){ ///experimental. trying to read text in string then output in stdout.
printf(ngram, i);///

If I understand the question correctly, you want to print a vector of strings to standard output. That would work like this:
void put(std::vector<std::string> ngram){
for(std::size_t i=0; i<ngram.size(); i++)
{
//for each element in ngram do:
//here you have multiple options:
//I prefer std::cout like this:
std::cout<<ngram.at(i)<<std::endl;
//or if you want to use printf (pass a format string, never the data itself):
printf("%s\n", ngram.at(i).c_str());
}
//done...
return;
}
Is that what you wanted?

If you simply want each item on a single line:
void put(const std::vector<std::string> &ngram) {
// Use an iterator to go over each item in the vector and print it.
for (std::vector<std::string>::const_iterator it = ngram.begin(), end = ngram.end(); it != end; ++it) {
// 'it' is a const_iterator (the vector is const) that gives access to each string in the vector.
// The std::string c_str() method is used to get a c-style character array that printf() can use.
printf("%s\n", it->c_str());
}
}
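For comparison, here is a minimal sketch of the same idea with a range-based for loop (C++11 or later). The names put_lines and lines_to_string are made up for this illustration and are not from the question; the stream parameter is only there so the output can be redirected.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes each string on its own line to `out`
// (standard output by default).
void put_lines(const std::vector<std::string>& ngram, std::ostream& out = std::cout) {
    for (const std::string& word : ngram) {
        out << word << '\n';
    }
}

// Hypothetical wrapper: collects the same output in a string instead of printing it.
std::string lines_to_string(const std::vector<std::string>& ngram) {
    std::ostringstream buffer;
    put_lines(ngram, buffer);
    return buffer.str();
}
```

Calling put_lines(ngram) prints to stdout; passing any other std::ostream sends the lines there instead.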

Related

Input two matrices which didn't specialize size

I need to input two matrices with their sizes unfixed, using a blank row to declare the end of inputting each matrix.
For example, input:
1 2
3 4
(blank row here, end of input matrix 1)
5 6 7
8 9 10
(blank row here, end of input matrix 2)
will get a 2*2 matrix and a 2*3 matrix.
My current idea is to build a matrix large enough (like 1000*1000), then set loops and use cin to input each element (the code only shows how I input matrix 1):
int matx1[1000][1000];
for (i = 0;i < 1000;i++)
{
for (j = 0;j < 1000;j++)
{
temp = getchar();
if (temp == '\n')
{
mat1.col = j;
break;
}
else
{
putchar(temp);
}
cin>>matx1[i][j];
}
temp = getchar();
if (temp == '\n')
{
mat1.row = i;
break;
}
else
{
putchar(temp);
}
}
When I run this in Xcode, it goes wrong: the putchar() calls interrupt my input in the terminal by printing a number each time I press Enter, and the stored input ends up scrambled.
I also tried the following code to avoid use of putchar():
for (i = 0; i < 1000; i++)
{
temp = getchar();
if (temp == '\n')
{
break;
}
else
{
matx1[i][0] = temp;
for (j = 1; j < 1000; j++)
{
cin >> matx1[i][j];
if (getchar() == '\n')
{
break;
}
}
}
}
Still, there are serious problems. The temp variable holds a char, so even if I convert it to an int via its ASCII code, this only works when the first element of each line is smaller than 10; otherwise the first element of the line is stored incorrectly.
So, the main question is:
How do I move to a new row of the same matrix after pressing Enter once, and switch to inputting the next matrix after pressing Enter again?
In other words: how do I detect '\n' without interfering with the rest of the input stream?
There is a more or less standard approach to the problem at hand: you want to read csv-style data.
Your case is a little more difficult because the data has a special format: each row is a list of values separated by " ", and an empty line separates two matrices.
Now, how can this be done? C++ is an object-oriented language with many existing algorithms. You can define a proxy class and overload the extraction operator (operator>>) for it. The proxy class, and especially the extractor, will do all the work.
The extractor is the core of the question and, as said, a little bit more tricky.
In the extractor we first read a complete line from a std::istream using std::getline. The resulting std::string contains "data fields" delimited by spaces. It needs to be split up and the contents of the data fields stored.
Splitting up strings is also called tokenizing, and the content of a data field is called a "token". The C++ standard library has a facility for this purpose: std::sregex_token_iterator.
And because we have something that has been designed for this purpose, we should use it.
It is an iterator for iterating over a std::string, hence the "s" in sregex. The first two arguments define the range of input to operate on, then comes a std::regex describing what should be matched (or not matched) in the input string. The matching strategy is given by the last parameter:
0 --> give me what I defined in the regex (the whole match), and
-1 --> give me what is NOT matched by the regex.
We can use this iterator to store the tokens in a std::vector. The std::vector has a range constructor, which takes two iterators as parameters and copies the data between the first and the second iterator into the std::vector.
The statement
std::vector token(std::sregex_token_iterator(line.begin(), line.end(), separator, -1), {});
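defines a variable "token", splits up the std::string and puts the tokens into the std::vector. Note that with CTAD the element type is deduced as std::ssub_match, which converts implicitly to std::string; write std::vector<std::string> explicitly if you need strings. For your case we will use std::transform to change the tokens into integers.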
Very simple.
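As a small standalone illustration of the tokenizing step described above, the following sketch splits one line on spaces and converts each token to int. The function name split_ints and the " +" separator (one or more spaces) are assumptions made for this example; it also assumes the line is non-empty and has no leading space.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `line` on runs of spaces and convert each token to int.
// Assumes a non-empty line without leading spaces; std::stoi throws on bad tokens.
std::vector<int> split_ints(const std::string& line) {
    static const std::regex separator(" +");
    std::vector<int> values;
    // -1 selects the text BETWEEN separator matches, i.e. the tokens.
    std::transform(std::sregex_token_iterator(line.begin(), line.end(), separator, -1),
                   std::sregex_token_iterator(),
                   std::back_inserter(values),
                   [](const std::string& token) { return std::stoi(token); });
    return values;
}
```

The lambda takes a const std::string& because each std::ssub_match produced by the iterator converts implicitly to std::string.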
Next step: we want to read from a file. The file also contains repeated data of the same kind, namely rows.
As above, we can iterate over such data, whether it comes from a file or anywhere else. For this purpose C++ has std::istream_iterator. It is a template: as template parameter it gets the type of data it should read and, as constructor parameter, a reference to an input stream. It doesn't matter whether the input stream is std::cin, a std::ifstream or a std::istringstream. The behaviour is identical for all kinds of streams.
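To show std::istream_iterator on its own, here is a minimal sketch that reads plain ints rather than whole rows. The helper names read_all_ints and ints_from_text are invented for this example.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every whitespace-separated int from `in`
// using the vector range constructor and a pair of stream iterators.
std::vector<int> read_all_ints(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in), std::istream_iterator<int>());
}

// The same helper works with any stream type; here a string-backed stream is used.
std::vector<int> ints_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_all_ints(in);
}
```

A default-constructed std::istream_iterator<int> is the end-of-stream iterator, which is what an empty pair of braces stands for when it is passed as the second constructor argument.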
And since we do not have files on SO, I use (in the example below) a std::istringstream to hold the input csv file. But of course you can open a file by defining a std::ifstream csvFile(filename). No problem.
We can now read the complete csv-file and split it into tokens and get all data, by simply defining a new variable and use again the range constructor.
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
This very simple one-liner will read the complete csv-file and do all the expected work.
Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").
Additionally, you can see that I do not use the "end()" iterator explicitly.
This iterator is constructed from the empty brace-enclosed initializer list with the correct type, because the std::vector constructor requires both arguments to have the same type, so it is deduced from the first argument.
I hope I could answer your basic question. Please see the full-blown C++ example below:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
std::istringstream testCsv{ R"(1 2
3 4

5 6 7
8 9 10

)" };
// Define Alias for easier Reading
//using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
using Matrix = std::vector<Columns>;
// The delimiter
const std::regex re(" ");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
cp.columns.clear();
if (std::string line; std::getline(is, line)) {
if (!line.empty()) {
// Split values and copy into resulting vector
std::transform(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(cp.columns),
[](const std::string & s) {return std::stoi(s); });
}
else {
// Notify the caller. End of matrix
is.setstate(std::ios::eofbit | std::ios::failbit);
}
}
return is;
}
// Type cast operator overload. Cast the proxy to the type 'Columns' (std::vector<int>)
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main()
{
// Define variable matrix with its range constructor. Read complete CSV in this statement, So, one liner
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
// Reset failbit and eofbit
testCsv.clear();
// Read 2nd matrix
Matrix matrix2(std::istream_iterator<ColumnProxy>(testCsv), {});
return 0;
}
Again:
What a pity that nobody will read this . . .
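For comparison, the same blank-line-terminated format can also be read with a plain loop around std::getline, which answers the "how do I detect the newline" part directly: read a whole line first, then parse the numbers out of that line. This is only a sketch of an alternative, not the approach of the answer above; read_matrix and read_two_matrices are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: read rows until a blank line or the end of input.
Matrix read_matrix(std::istream& in) {
    Matrix matrix;
    std::string line;
    while (std::getline(in, line) && !line.empty()) {
        std::istringstream row_stream(line);  // parse one row from the line
        std::vector<int> row;
        for (int value; row_stream >> value;) {
            row.push_back(value);
        }
        matrix.push_back(row);
    }
    return matrix;
}

// Hypothetical helper: read two consecutive matrices from a block of text.
std::vector<Matrix> read_two_matrices(const std::string& text) {
    std::istringstream in(text);
    Matrix first = read_matrix(in);
    Matrix second = read_matrix(in);
    return {first, second};
}
```

With std::cin, calling read_matrix(std::cin) twice would read matrix 1 and then matrix 2, each ended by an empty line.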

Using regex_match() on iterator with string in c++

I'm working on a C++ project and need to iterate over a string (or a char*, depending on the solution you can offer). So basically I'm doing this:
void Pile::evalExpress(char* expchar){
string express = expchar
regex number {"[+-*/]"};
for(string::iterator it = express.begin(); it!=express.end(); ++it){
if(regex_match(*it,number)){
cout<<*it<<endl;
}
}
}
char expchar[]="234*+";
Pile calcTest;
calcTest.evalExpress(expchar);
The iterator works well (I can put a cout<<*it<<endl above the if statement and get correct output),
but when I try to compile:
error: no matching function for call to 'regex_match(char&, std::__cxx11::regex&)'
if(regex_match(*it,number)){
^
I have no idea why this is happening. I also tried dropping the iterator and indexing expchar[i] directly, but I get the same error from regex_match()...
Regards
Vincent
Read the error message! It tells you that you're trying to pass a single char to regex_match, which is not possible because it requires a string (or other sequence of characters) not a single character.
You could do if (std::regex_match(it, it+1, number)) instead. That says to search the sequence of characters from it to it+1 (i.e. a sequence of length one).
You can also avoid creating a string and iterate over the char* directly
void Pile::evalExpress(const char* expchar) {
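// '-' goes first in the class so it is a literal; "[+-*/]" is an invalid range and throws std::regex_error
std::regex number {"[-+*/]"};
for (const char* p = expchar; *p != '\0'; ++p) {
if (std::regex_match(p, p+1, number)) {
std::cout<<*p<<std::endl;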
}
}
}
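The iterator-range overload mentioned above can be tried in isolation with a small sketch like the following. The function count_operators is a made-up name for this example, and the character class is written as "[-+*/]" so that '-' is taken literally.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: count the characters of `expression` that are one of + - * /.
int count_operators(const std::string& expression) {
    static const std::regex op("[-+*/]");
    int count = 0;
    for (auto it = expression.begin(); it != expression.end(); ++it) {
        // Match the one-character range [it, it + 1) against the regex.
        if (std::regex_match(it, it + 1, op)) {
            ++count;
        }
    }
    return count;
}
```

For a single-character test like this, a plain comparison or std::string::find on "+-*/" would do the same job without a regex; the sketch only demonstrates the overload.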

RapidJSON Looping through a string array?

I am using RapidJSON to parse JSON data except I can't work out how to loop through the members of:
{
"members":{
"0":{
"template":"this is member 1"
},
"1":{
"template":"this is member 2"
}
}
}
I tried the following
e_doc["members"][iString]["template"].GetString()
inside a loop with converting the loop index (i) to a string but it doesn't recognize it as a string.
It works as:
printf("%s", e_doc["members"]["0"]["template"].GetString());
printf("%s", e_doc["members"]["1"]["template"].GetString());
There might be a small issue as you are not iterating over an array, but over an object. However, in the end the code is similar.
const rapidjson::Value& membersObject = e_doc["members"];
for(rapidjson::Value::ConstMemberIterator it=membersObject.MemberBegin(); it != membersObject.MemberEnd(); it++) {
std::cout << it->value["template"].GetString();
}

C++ is mixing my strings?

I have this really simple c++ function I wrote myself.
It should just strip the '-' characters out of my string.
Here's the code
char* FastaManager::stripAlignment(char *seq, int seqLength){
char newSeq[seqLength];
int j=0;
for (int i=0; i<seqLength; i++) {
if (seq[i] != '-') {
newSeq[j++]=seq[i];
}
}
char *retSeq = (char*)malloc((--j)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j+1]='\0'; //WTF it keeps reading from memory without this
return retSeq;
}
I think that comment speaks for itself.
I don't know why, but when I launch the program and print out the result, I get something like
'stripped_sequence''original_sequence'
However, if I try to debug the code to see if there's anything wrong, the flows goes just right, and ends up returning the correct stripped sequence.
I tried to print out the memory of the two variables, and here are the memory readings
memory for seq: http://i.stack.imgur.com/dHI8k.png
memory for *seq: http://i.stack.imgur.com/UqVkX.png
memory for retSeq: http://i.stack.imgur.com/o9uvI.png
memory for *retSeq: http://i.stack.imgur.com/ioFsu.png
(couldn't include links / pics because of spam filter, sorry)
This is the code I'm using to print out the strings
for (int i=0; i<atoi(argv[2]); i++) {
char *seq;
if (usingStructure) {
seq = fm.generateSequenceWithStructure(structure);
}else{
seq = fm.generateSequenceFromProfile();
}
cout<<">Sequence "<<i+1<<": "<<seq<<endl;
}
Now, I have really no idea about what's going on.
If you can use std::string, simply do this:
std::string FastaManager::stripAlignment(const std::string& str)
{
std::string result(str);
result.erase(std::remove(result.begin(), result.end(), '-'), result.end());
return result;
}
This is called "erase-remove idiom".
This happens because you put the terminating zero of a C string outside the allocated space. You should be allocating one extra character at the end of your string copy, and adding '\0' there. Or better yet, you should use std::string.
char *retSeq = (char*)malloc((j+1)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j]='\0';
it keeps reading from memory without this
This is by design: C strings are zero-terminated. '\0' signals to string routines in C that the end of the string has been reached. The same convention holds in C++ when you work with C strings.
Personally, I think you would be best off using std::string unless you have really very good reason otherwise:
std::string FastaManager::stripAlignment(std::string value)
{
value.erase(std::remove(value.begin(), value.end(), '-'), value.end());
return value;
}
When you use C strings you need to keep in mind that they are null-terminated: a C string extends up to the first null character found. The code you posted has an out-of-range assignment: you allocate 'j' elements and then assign to retSeq[j + 1], which is two characters past the end of the allocation (you surely meant retSeq[j] = 0; with room allocated for it, anyway).
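If the C-string interface has to stay, the allocation and terminator rules described above can be put together roughly as follows. This is a sketch under assumptions: strip_alignment_c is a hypothetical free function (not the member function from the question), it relies on the input being null-terminated instead of taking a length, and the caller is responsible for calling free() on the result.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: return a malloc'ed copy of `seq` without '-' characters.
// The caller must free() the result. Returns nullptr if allocation fails.
char* strip_alignment_c(const char* seq) {
    const std::size_t length = std::strlen(seq);
    // Worst case nothing is removed, so reserve length characters plus the terminator.
    char* result = static_cast<char*>(std::malloc(length + 1));
    if (result == nullptr) {
        return nullptr;
    }
    std::size_t j = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (seq[i] != '-') {
            result[j++] = seq[i];
        }
    }
    result[j] = '\0';  // terminator goes right after the last copied character
    return result;
}
```

Allocating length + 1 up front avoids the temporary buffer and the second copy loop, at the cost of a few unused bytes when dashes are removed.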

Cleaning a string of punctuation in C++

Ok so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem read no further and just go on about your business.
Now for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to clear any punctuation in the word (e.g. "can't" would end up as "can" and "that--to" would end up as "that"; the quotes are only there to mark the examples).
The problem I've run into is that I can clean the string fine and insert it into the map we are using, but for some reason my code allows an empty string to be inserted into the map. I've tried everything I can come up with to stop this from happening, and the only thing that works is to use the erase method of the map itself.
So I am looking for two things: a) suggestions on how to fix this without simply erasing the entry, and b) any improvements I could make to the code I have already written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)
A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check whether it's empty, and only increment if it isn't:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there's a few things I'd change, one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
The function 'get_words' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls clean_entry. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'get_words' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++mapz[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.
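One possible way to flesh out that split is sketched below. It is an illustration only: the functions are rewritten from scratch rather than taken from the question, they accept any std::istream instead of only std::ifstream, get_next_clean_word folds the "next word" and "clean word" steps into one loop, and count_words / count_words_in_text are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Return the first run of alphanumeric characters in `raw`, lower-cased
// (empty if there is none), e.g. "can't" gives "can".
std::string clean_entry(const std::string& raw) {
    std::size_t begin = 0;
    while (begin < raw.size() && !std::isalnum(static_cast<unsigned char>(raw[begin]))) ++begin;
    std::size_t end = begin;
    while (end < raw.size() && std::isalnum(static_cast<unsigned char>(raw[end]))) ++end;
    std::string clean = raw.substr(begin, end - begin);
    for (char& c : clean) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return clean;
}

// Fetch the next word that is non-empty after cleaning; false when input is exhausted.
bool get_next_clean_word(std::istream& input, std::string& word) {
    std::string raw;
    while (input >> raw) {
        word = clean_entry(raw);
        if (!word.empty()) return true;
    }
    return false;
}

// Count every clean word from `input` into `counts`; returns the total number of words.
int count_words(std::istream& input, std::map<std::string, int>& counts) {
    int total = 0;
    std::string word;
    while (get_next_clean_word(input, word)) {
        ++counts[word];
        ++total;
    }
    return total;
}

// Convenience wrapper for trying the functions on a string instead of a file.
int count_words_in_text(const std::string& text, std::map<std::string, int>& counts) {
    std::istringstream input(text);
    return count_words(input, counts);
}
```

Because the loop condition is the read itself (input >> raw), there is no separate eof() check, and an empty string never reaches the map.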