Count occurrences of specific strings in C++

I have a C++ project to count the LLOC (logical lines of code) of an input file. The file is produced by a code generator and consists of a sequence of functions denoted F1( ), F2( ), ..., Fn( ), followed by the main program and control structures such as if, while, do, switch, etc. We need to count: main program + functions + semicolons + equations + if statements + switch statements + while statements + for statements. I can easily count, for example, the number of ; using the find function, but how can I count the number of functions? Is there any way to count the substring F*(, that is, every substring that starts with F and ends with (?
Here is my code to count the number of semicolons:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main(int argc, char ** argv) {
    ifstream testfile("program.cpp");
    std::string stringline;
    const std::string str2(";");
    int positioncount = 0;
    // std::getline returns false at end of file (unlike while(!eof())), and it
    // has no fixed-size buffer that long lines could overflow or truncate.
    while (std::getline(testfile, stringline)) {
        // Count every ';' on the line, not just whether the line contains one.
        for (size_t found = stringline.find(str2); found != std::string::npos;
             found = stringline.find(str2, found + str2.size())) {
            positioncount++;
        }
    }
    cout << "\n" << positioncount;
    testfile.close();
    return 0;
}

Since the code is machine generated you can probably make assumptions about it which make life much easier: for example no comments, no strings containing stuff that looks like code, no nested classes, etc.
That may let you get away with basic regular expressions plus counting braces. Modern C++ has built-in regular expressions, you may want to look into that for things like your function names.
Counting occurrences is commonly done with maps (cf. http://www.cplusplus.com/reference/map/map/?kw=map).

Related

I have made a program in C++ to separate the words of a line at spaces and display those words as an array. What's wrong with my code?

Please help me find a bug in this program. It separates a line into words at spaces and displays them as a list.
If the first char of a word is in lower case, it is converted to uppercase.
#include <iostream>
#include <string>
using namespace std;
int main()
{
char line[30]="Hi there buddy",List[10][20];
unsigned int i=0,List_pos=0,no;
int first=0,last;
while(i!=sizeof(line)+1)
{
if(line[i]==' ' or i==sizeof(line))
{
last=i;
no=0;
for(int j=first;j<last;++j)
{
if(no==0)
List[List_pos][no]=toupper(line[j]);
else
List[List_pos][no]=line[j];
++no;
}
++List_pos;
first=last+1;
}
++i;
}
for(unsigned int a=0;a<List_pos;++a)
cout<<"\nList["<<a+1<<"]="<<List[a];
return 0;
}
Expected Output:
List[1]=Hi
List[2]=There
List[3]=Buddy
Actual Output:
List[1]=Hi
List[2]=ThereiXŚm
List[3]=Buddy
I suggest you use a string, as you already included <string>. And List is not really necessary in this situation. Try a single for loop that separates your line into words; in my opinion, when you work with arrays you should use for loops. In your for loop, as you go through the line, you could just add an if statement that determines whether you're at the end of a word. I think the problem in your code is the multiple loops, but I am not sure of it.
Here is code that works. Just adapt it to your display requirements and you will be fine:
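#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string line = "Hi there buddy";
    for (size_t i = 0; i < line.size(); i++) {
        if (line[i] == ' ') {
            // Capitalise the next word, but only if a character actually follows.
            if (i + 1 < line.size())
                line[i + 1] = static_cast<char>(toupper(static_cast<unsigned char>(line[i + 1])));
            cout << '\n';
        } else {
            cout << line[i];
        }
    }
    return 0;
}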
Challenged by the comment from PaulMcKenzie, I implemented a C++ solution with 3 statements:
Define a std::string, with the words to work on
Define a std::regex that finds words only. Whitespaces and other delimiters are ignored
Use the std::transform to transform the input string into output lines
std::transform has 4 parameters.
With what the transformation should begin. In this case, we use the std::sregex_token_iterator. This will look for the regex (so, for the word) and return the first word. That's the begin.
With what the transformation should end. We use the empty std::sregex_token_iterator. That means: Do until all matches (all words) have been read.
The destination. For this we will use the std::ostream_iterator. This will send all transformed results (what the lambda returns) to the given output stream (in our case std::cout). And it will add a delimiter, here a newline ("\n").
The transformation function, implemented as a lambda. Here we get the word from the std::sregex_token_iterator and transform it into a new word according to what we want, that is, a word with a capitalized first letter. We add a little text for the output line, as requested by the OP.
Please check:
#include <algorithm>
#include <cctype>
#include <string>
#include <iostream>
#include <regex>
#include <iterator>
int main()
{
    // 1. This is the string to convert
    std::string line("Hi there buddy");
    // 2. We want to search for complete words
    std::regex word("(\\w+)");
    // 3. Transform the input string to output lines
    std::transform(
        std::sregex_token_iterator(line.begin(), line.end(), word, 1),
        std::sregex_token_iterator(),
        std::ostream_iterator<std::string>(std::cout, "\n"),
        [i = 1](std::string w) mutable {
            // Capitalise the first letter and keep the rest of the word unchanged.
            return std::string("List[") + std::to_string(i++) + "]=" +
                   static_cast<char>(std::toupper(static_cast<unsigned char>(w[0]))) + w.substr(1);
        }
    );
    return 0;
}
This will give us the following output:
List[1]=Hi
List[2]=There
List[3]=Buddy
Please get a feeling for the capabilities of C++.
Here is a solution for your next problem (when the user inputs a sentence, only the first word is displayed): cin >> line stops reading at the first whitespace, so it only ever gets one word. Use std::getline to read the whole sentence:
getline(cin, line);
Instead of
cin>>line;

In C++ STL, how do I remove non-numeric characters from std::string with regex_replace?

Using the C++ Standard Template Library function regex_replace(), how do I remove non-numeric characters from a std::string and return a std::string?
This question is not a duplicate of question 747735, because that question asks how to use TR1/regex, whereas I'm asking how to use standard STL regex, and because the answer given there is merely some very complex documentation links. The C++ regex documentation is extremely hard to understand and poorly documented, in my opinion, so even if a question pointed to the standard C++ regex_replace documentation, it still wouldn't be very useful to new coders.
// assume #include <regex> and <string>
std::string sInput = R"(AA #-0233 338982-FFB /ADR1 2)";
std::string sOutput = std::regex_replace(sInput, std::regex(R"([\D])"), "");
// sOutput now contains only the digits: "023333898212"
Note that the R"(...)" syntax is a raw string literal: escape sequences are not processed the way they are in an ordinary C or C++ string literal. This is very important when writing regular expressions and makes your life easier, since backslashes don't need to be doubled.
Here's a handy list of single-character regular expression raw literal strings for your std::regex() to use for replacement scenarios:
R"([^A-Za-z0-9])" or R"([^A-Za-z\d])" = select non-alphabetic and non-numeric
R"([A-Za-z0-9])" or R"([A-Za-z\d])" = select alphanumeric
R"([0-9])" or R"([\d])" = select numeric
R"([^0-9])" or R"([^\d])" or R"([\D])" = select non-numeric
Regular expressions are overkill here.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
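// Returns true for the characters we want to keep.
inline bool is_digit(char ch) {
    return '0' <= ch && ch <= '9';
}
std::string remove_non_digits(const std::string& input) {
    std::string result;
    // copy_if keeps only the characters for which the predicate returns true.
    std::copy_if(input.begin(), input.end(),
                 std::back_inserter(result),
                 is_digit);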
return result;
}
int main() {
std::string input = "1a2b3c";
std::string result = remove_non_digits(input);
std::cout << "Original: " << input << '\n';
std::cout << "Filtered: " << result << '\n';
return 0;
}
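The accepted answer is fine for the specifics of the given sample.
But it will fail for a number such as "-12.34" (it would result in "1234").
(Note that the sample could contain negative numbers.)
In that case the regex should be:
(-|\+)?(\d)+(\.(\d)+)?
Explanation: (optionally "-" or "+") followed by (a digit, repeated 1 to n times), optionally ending with (a "." followed by (a digit, repeated 1 to n times)). The dot must be escaped as \. because an unescaped . matches any character. Note that this pattern matches the numbers themselves, so use it to search for them rather than as the pattern to remove with regex_replace.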
A bit over-reaching, but I was looking for this and the page showed up first in my search, so I'm adding my answer for future searches.

Big csv file c++ parsing performance

I have a big CSV file (25 MB) that represents a symmetric graph (about 18K x 18K). While parsing it into an array of vectors, I profiled the code (with the VS2012 analyzer), and it shows that the parsing inefficiency (about 19 seconds total) occurs while reading each character (getline::basic_string::operator+=), as shown in the picture below:
This leaves me frustrated, as with Java simple buffered line file reading and tokenizer i achieve it with less than half a second.
My code uses only the STL:
int allColumns = initFirstRow(file,secondRow);
// secondRow has initialized with one value
int column = 1; // dont forget, first column is 0
VertexSet* rows = new VertexSet[allColumns];
rows[1] = secondRow;
string vertexString;
long double vertexDouble;
for (int row = 1; row < allColumns; row ++){
// dont do the last row
for (; column < allColumns; column++){
//dont do the last column
getline(file,vertexString,',');
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(column);
}
}
// do the last in the column
getline(file,vertexString);
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(++column);
}
column = 0;
}
initLastRow(file,rows[allColumns-1],allColumns);
initFirstRow and initLastRow basically do the same thing as the loop above, but initFirstRow also counts the number of columns.
VertexSet is basically a vector of indexes (int). Each vertex value read (separated by ',') is no more than 7 characters long (values are between -1 and 1).
At 25 megabytes, I'm going to guess that your file is machine generated. As such, you (probably) don't need to worry about things like verifying the format (e.g., that every comma is in place).
Given the shape of the file (i.e., each line is quite long) you probably won't impose a lot of overhead by putting each line into a stringstream to parse out the numbers.
Based on those two facts, I'd at least consider writing a ctype facet that treats commas as whitespace, then imbuing the stringstream with a locale using that facet to make it easy to parse out the numbers. Overall code length would be a little greater, but each part of the code would end up pretty simple:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <time.h>
#include <stdlib.h>
#include <locale>
#include <sstream>
#include <algorithm>
#include <iterator>
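class my_ctype : public std::ctype<char> {
    // Builds the table before the base class needs it. Base classes are
    // constructed before members, so passing a member vector's data() to the
    // base constructor would hand it a vector that does not exist yet.
    static const mask* make_table() {
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[','] = (mask)space;   // treat commas as whitespace
        return table.data();
    }
public:
    my_ctype(size_t refs = 0) : std::ctype<char>(make_table(), false, refs) {}
};
template <class T>
class converter {
    std::stringstream buffer;
    my_ctype *m;
    std::locale l;
public:
    // The locale takes ownership of the facet (refs == 0), so no delete is needed.
    converter() : m(new my_ctype), l(std::locale::classic(), m) { buffer.imbue(l); }
    std::vector<T> operator()(std::string const &in) {
        buffer.str(in);    // replace (rather than append to) the previous line
        buffer.clear();    // reset the eof flag left by the previous parse
        return std::vector<T> {std::istream_iterator<T>(buffer),
                               std::istream_iterator<T>()};
    }
};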
int main() {
std::ifstream in("somefile.csv");
std::vector<std::vector<double>> numbers;
std::string line;
converter<double> cvt;
clock_t start=clock();
while (std::getline(in, line))
numbers.push_back(cvt(line));
clock_t stop=clock();
std::cout<<double(stop-start)/CLOCKS_PER_SEC << " seconds\n";
}
To test this, I generated a 1.8K x 1.8K CSV file of pseudo-random doubles like this:
#include <iostream>
#include <stdlib.h>
int main() {
for (int i=0; i<1800; i++) {
for (int j=0; j<1800; j++)
std::cout<<rand()/double(RAND_MAX)<<",";
std::cout << "\n";
}
}
This produced a file around 27 megabytes. After compiling the reading/parsing code with gcc (g++ -O2 trash9.cpp), a quick test on my laptop showed it running in about 0.18 to 0.19 seconds. It never seems to use (even close to) all of one CPU core, indicating that it's I/O bound, so on a desktop/server machine (with a faster hard drive) I'd expect it to run faster still.
The inefficiency here is in Microsoft's implementation of std::getline, which is being used in two places in the code. The key problems with it are:
It reads from the stream one character at a time
It appends to the string one character at a time
The profile in the original post shows that the second of these problems is the biggest issue in this case.
I wrote more about the inefficiency of std::getline here.
GNU's implementation of std::getline, i.e. the version in libstdc++, is much better.
Sadly, if you want your program to be fast and you build it with Visual C++ you'll have to use lower level functions than std::getline.
The debug Runtime Library in VS is very slow because it does a lot of debug checks (for out of bound accesses and things like that) and calls lots of very small functions that are not inlined when you compile in Debug.
Building and running your program in Release mode should remove all these overheads.
My bet for the next bottleneck is string allocation.
I would try to read bigger chunks of memory at once and then parse it all.
For example, read a full line, and then parse that line using pointers and specialized functions.
Hmm good answer here. Took me a while but I had the same problem. After this fix my write and process time went from 38 sec to 6 sec.
Here's what I did.
First get data using boost mmap. Then you can use boost thread to make processing faster on the const char* that boost mmap returns. Something like this: (the multithreading is different depending on your implementation so I excluded that part)
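#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/thread/thread.hpp>     // used by the multithreaded part (omitted)
#include <boost/lockfree/queue.hpp>    // used by the multithreaded part (omitted)
#include <cstdlib>
#include <string>
#include <vector>
std::vector<double> foo(const std::string& path)
{
    boost::iostreams::mapped_file mmap(path, boost::iostreams::mapped_file::readonly);
    auto chars = mmap.const_data();     // pointer to the mapped file contents
    auto eofile = chars + mmap.size();  // used to detect end of file
    std::string next;                   // characters of the current value
    std::vector<double> data;           // store the data
    for (; chars && chars != eofile; chars++) {
        if (chars[0] == ',' || chars[0] == '\n') {  // end of value
            if (!next.empty())          // skip empty fields, e.g. after a trailing comma
                data.push_back(std::atof(next.c_str()));  // add value
            next.clear();               // clear, keeping the allocated capacity
        }
        else
            next += chars[0];           // add to read string
    }
    if (!next.empty())                  // last value if the file has no trailing newline
        data.push_back(std::atof(next.c_str()));
    return data;
}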

Replace a Field in a Space Delimited String C++

I have a single-space delimited string and I want to replace field x.
I can repeatedly use find to locate the x - 1 and x spaces, then use substr to grab the two strings on either side, then concatenate the two sub strings and my replacement text.
But man, that seems like an awful lot of work for something that should be simple. Is there a better solution, one that doesn't require Boost?
Answer
I've cleaned up Domenic Lokies' answer below:
string fieldReplace( const string& input, const string& outputField, int index )
{
    // Builds ^((?:\w+ ){index})\w+ : group 1 captures the `index` fields before the
    // target (index is zero-based), and the following \w+ is the field being replaced.
    const regex pattern( "^((?:\\w+ ){" + to_string( index ) + "})\\w+" );
    return regex_replace( input, pattern, "$1" + outputField );
}
(This version uses std::to_string instead of the MSVC-only _itoa_s and _Ptr, so it is portable. It assumes #include <regex> and <string> and using namespace std.)
You can do it using one of the string::replace methods:
Locate the position of the (x-1)th space. You can do it by calling string::find repeatedly
Locate the position of the x-th space by calling string::find one more time
Calculate the length of the word being replaced by subtracting the first index from the second one
Call string::replace passing the first index, the length, and the replacement string.
Here is how you can implement this:
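#include <iostream>
#include <string>
using namespace std;
int main() {
    string s = "quick brown frog jumps over the lazy dog";
    int cnt = 3; // Word number three (1-based)
    // Walk past the first cnt-1 spaces; start ends on the word's first character.
    size_t start = 0;
    for (int k = 1; k < cnt; ++k) {
        size_t space = s.find(' ', start);
        if (space == string::npos) return 1; // fewer than cnt words
        start = space + 1;
    }
    size_t end = s.find(' ', start);          // npos if it is the last word
    if (end == string::npos) end = s.size();
    s.replace(start, end - start, "fox");
    cout << s << endl;
    return 0;
}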
Since C++11 you should use a Regular Expression for your purposes. If you are not using a compiler which supports C++11, you can take a look at Boost.Regex.
Never combine std::string::find with std::string::replace; that is just not good style in a language like C++.
I have written a short example for you to show you how to use Regular Expressions in C++.
#include <string>
#include <regex>
#include <iostream>
int main()
{
std::string subject = "quick brown frog jumps over the lazy dog";
std::regex pattern("frog");
std::cout << std::regex_replace(subject, pattern, "fox");
}

Generate numbers (C++, WPA2)

I want to generate 20-character WPA2 keys in C++ consisting only of numbers between 1 and 10,000,000,000,000,000,000. Each key must be output in 20-character format, like:
00000000000000000001
00000000000000000002
00000000000000000003
00000000000000000011
12300000000099945611
and so on.
I have this code, but it doesn't print the generated numbers (counting down) in 20-character format:
for (int i=0;i<=10000000000000000000;i++){
cout << 10000000000000000000 -i<<"\n";
}
Those numbers are too big for a (long) integer, so g++ in the Linux shell wouldn't accept it either, because of the size of 10000000000000000000 - i.
In this particular case, why not just keep the number in a string: increment the last (least significant) character, and if it overflows (> '9'), reset it to '0' and increment the next character up. Rinse and repeat until finished.
So, something like this:
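#include <iostream>
#include <string>
int main()
{
    std::string s = "000";                     // counter held as decimal digits
    const std::string::size_type len = s.length();
    while (true)
    {
        std::cout << s << '\n';
        if (s == "999") break;                 // last value printed, stop
        int i = static_cast<int>(len) - 1;
        s[i]++;                                // increment the lowest digit
        while (i > 0 && s[i] > '9')            // carry into the next digit up
        {
            s[i] = '0';
            i--;
            s[i]++;
        }
    }
}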
However, if you have a machine that does one loop of the above code 1,000,000,000 times a second, it will take 317 years to run through your sequence. So I hope you have plenty of time and are eating healthily.
Your compiler needs to support 64-bit integers, if you want to store this as a number. It may be supported as long long data type. Change i to unsigned long long, and change the number literal to 10000000000000000000ULL. Be careful that you don't cast these values down to int (accidentally or otherwise) or you will lose some data.
Looping through all those numbers is going to take years, literally.
If you simply want to generate a random 20 character WPA2 key, you should do something like this instead using built in functions:
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
std::string get_key(std::size_t length = 20) {
    // Define all allowed characters (a WPA2 key can also contain letters).
    const std::string chars =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    // Draw each character independently, so characters may repeat and the key
    // length does not depend on how many characters are in `chars`.
    // Note: std::mt19937 is not a cryptographically secure generator.
    std::mt19937 g(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, chars.size() - 1);
    std::string key;
    for (std::size_t i = 0; i < length; ++i)
        key += chars[pick(g)];
    return key;
}
int main() {
    std::cout << get_key() << std::endl;
}
If you only want a key consisting of numbers then remove all alpha characters from chars.