How do I find the offset of a matching string using RE2?


RE2 is a modern regular expression engine available from Google. I want to use RE2 in a program that is currently using gnuregex. The problem I have relates to finding out what matched. What RE2 returns is the string that matched. I need to know the offset of what matched. My current plan is to take what RE2 returns and then use a find on the C++ string. But this seems wasteful. I've gone through the RE2 manual and can't figure out how to do it. Any ideas?

Store the result in a re2::StringPiece instead of a std::string. The value of .data() will point into the original string.
Consider this program.
In each of the tests, result.data() is a pointer into the original const char* or std::string.
#include <re2/re2.h>
#include <iostream>
#include <string>
int main(void) {
{ // Try it once with character pointers
const char *text[] = { "Once", "in", "Persia", "reigned", "a", "king" };
for(int i = 0; i < 6; i++) {
re2::StringPiece result;
if(RE2::PartialMatch(text[i], "([aeiou])", &result))
std::cout << "First lower-case vowel at " << result.data() - text[i] << "\n";
else
std::cout << "No lower-case vowel\n";
}
}
{ // Try it once with std::string
std::string text[] = { "While", "I", "pondered,", "weak", "and", "weary" };
for(int i = 0; i < 6; i++) {
re2::StringPiece result;
if(RE2::PartialMatch(text[i], "([aeiou])", &result))
std::cout << "First lower-case vowel at " << result.data() - text[i].data() << "\n";
else
std::cout << "No lower-case vowel\n";
}
}
}

Related

Regex character class subtraction in C++

I'm writing a C++ program that will need to take regular expressions that are defined in a XML Schema file and use them to validate XML data. The problem is, the flavor of regular expressions used by XML Schemas does not seem to be directly supported in C++.
For example, there are a couple special character classes \i and \c that are not defined by default and also the XML Schema regex language supports something called "character class subtraction" that does not seem to be supported in C++.
Allowing the use of the \i and \c special character classes is pretty simple, I can just look for "\i" or "\c" in the regular expression and replace them with their expanded versions, but getting character class subtraction to work is a much more daunting problem...
For example, this regular expression that is valid in an XML Schema definition throws an exception in C++ saying it has unbalanced square brackets.
#include <iostream>
#include <regex>
int main()
{
try
{
// Match any lowercase letter that is not a vowel
std::regex rx("[a-z-[aeiuo]]");
}
catch (const std::regex_error& ex)
{
std::cout << ex.what() << std::endl;
}
}
How can I get C++ to recognize character class subtraction within a regex? Or even better, is there a way to just use the XML Schema flavor of regular expressions directly within C++?

Character class subtraction and intersection are not available in any of the grammars supported by std::regex, so you will have to rewrite the expression into one of the supported ones.
The easiest way is to perform the subtraction yourself and pass the resulting set to std::regex, for instance [bcdfghjklmnpqrstvwxyz] for your example.
Another solution is to find either a more featureful regular expression engine or a dedicated XML library that supports XML Schema and its regular expression language.
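Performing the subtraction yourself can be automated for simple cases. The following is only a sketch: subtract_class is a hypothetical helper (not part of std::regex), and it assumes a plain ASCII range of letters or digits so that nothing in the result needs escaping.

```cpp
#include <string>

// Hypothetical helper: expands the range lo..hi minus the characters in
// `removed` into an explicit bracket expression that std::regex accepts.
// Assumption: only ASCII letters or digits, so no escaping is required.
std::string subtract_class(char lo, char hi, const std::string& removed) {
    std::string out = "[";
    for (int c = lo; c <= hi; ++c) {
        const char ch = static_cast<char>(c);
        if (removed.find(ch) == std::string::npos) {
            out += ch;  // keep characters that are not subtracted
        }
    }
    out += ']';
    return out;
}
```

With these assumptions, subtract_class('a', 'z', "aeiou") yields the consonant set shown above, which can then be passed to std::regex.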

Starting from the cppreference examples
#include <iostream>
#include <regex>
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
// greedy match: 2 to 4 consecutive lowercase letters that are not vowels
show_matches("abcdefghi", "(?:(?![aeiou])[a-z]){2,4}");
}
You can test and check the details of the regular expression here.
A non-capturing group (?: ...) is used so that it does not change your group numbering if you use it inside a bigger regular expression.
(?![aeiou]) is a negative lookahead: it succeeds, without consuming input, only if the next character is not in [aeiou]. The [a-z] then matches a lowercase letter. Combining these two conditions is equivalent to your character class subtraction.
The {2,4} is a quantifier that means from 2 to 4 repetitions; it could also be + for one or more, or * for zero or more.
Edit
Reading the comments on the other answer, I understand that you want to support XML Schema.
The next program shows how to use an ECMAScript regular expression to translate character class subtraction into an ECMAScript-compatible format.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
std::string translated_regex(const std::string &pattern){
// pattern to identify character class subtraction
std::regex class_subtraction_re(
"\\[((?:\\\\[\\[\\]]|[^[\\]])*)-\\[((?:\\\\[\\[\\]]|[^[\\]])*)\\]\\]"
);
// translate the regular expression to ECMA compatible
std::string translated = std::regex_replace(pattern,
class_subtraction_re, "(?:(?![$2])[$1])");
return translated;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
std::string re = translated_regex("[a-z-[aeiou]]{2,4}");
show_matches("abcdefghi", re);
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translated_regex(test) << '\n';
}
return 0;
}
Edit: Recursive and Named character classes
The above approach does not work with nested character class subtraction, and there is no way to handle recursive substitutions using only regular expressions. This makes the solution far less straightforward.
The solution has the following levels:
one function scans the regular expression for a [
when a [ is found, another function handles the character class, calling itself recursively whenever -[ is found.
The pattern \p{xxxxx} is handled separately to identify named character classes. The named classes are defined in the specialCharClass map; I filled in two examples.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
#include <map>
std::map<std::string, std::string> specialCharClass = {
{"IsDigit", "0-9"},
{"IsBasicLatin", "a-zA-Z"}
// Feel free to add the character classes you want
};
const std::string getCharClassByName(const std::string &pattern, size_t &pos){
std::string key;
while(++pos < pattern.size() && pattern[pos] != '}'){
key += pattern[pos];
}
++pos;
return specialCharClass[key];
}
std::string translate_char_class(const std::string &pattern, size_t &pos){
std::string positive;
std::string negative;
if(pattern[pos] != '['){
return "";
}
++pos;
while(pos < pattern.size()){
if(pattern[pos] == ']'){
++pos;
if(negative.size() != 0){
return "(?:(?!" + negative + ")[" + positive + "])";
}else{
return "[" + positive + "]";
}
}else if(pattern[pos] == '\\'){
if(pos + 3 < pattern.size() && pattern[pos+1] == 'p'){
positive += getCharClassByName(pattern, pos += 2);
}else{
positive += pattern[pos++];
positive += pattern[pos++];
}
}else if(pattern[pos] == '-' && pos + 1 < pattern.size() && pattern[pos+1] == '['){
if(negative.size() == 0){
negative = translate_char_class(pattern, ++pos);
}else{
negative += '|';
negative += translate_char_class(pattern, ++pos); // append, so the '|' alternative above is not overwritten
}
}else{
positive += pattern[pos++];
}
}
return '[' + positive; // there is an error pass, forward it
}
std::string translate_regex(const std::string &pattern, size_t pos = 0){
std::string r;
while(pos < pattern.size()){
if(pattern[pos] == '\\'){
r += pattern[pos++];
r += pattern[pos++];
}else if(pattern[pos] == '['){
r += translate_char_class(pattern, pos);
}else{
r += pattern[pos++];
}
}
return r;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"[a]",
"[a-z]d",
"[\\p{IsBasicLatin}-[\\p{IsDigit}-[89]]]",
"[a-z-[aeiou]]{2,4}",
"[a-z-[aeiou-[e]]]",
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translate_regex(test) << '\n';
// Construct a regex (validates the syntax)
std::regex(translate_regex(test));
}
std::string re = translate_regex("[a-z-[aeiou-[e]]]{2,10}");
show_matches("abcdefghi", re);
return 0;
}

Try using a function from a library with XML support, like xmlregexp in libxml (a C library). It can handle the XML regexes and apply them to the XML directly:
http://www.xmlsoft.org/html/libxml-xmlregexp.html#xmlRegexp
http://web.mit.edu/outland/share/doc/libxml2-2.4.30/html/libxml-xmlregexp.html
An alternative could have been PugiXML (a C++ library; see "What XML parser should I use in C++?"), but I think it does not implement the XML regex functionality.

Okay after going through the other answers I tried out a few different things and ended up using the xmlRegexp functionality from libxml2.
The xmlRegexp related functions are very poorly documented so I figured I would post an example here because others may find it useful:
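#include <iostream>
#include <libxml/xmlmemory.h>
#include <libxml/xmlregexp.h>
#include <libxml/xmlstring.h>
int main()
{
LIBXML_TEST_VERSION;
xmlChar* str = xmlCharStrdup("bcdfg");
xmlChar* pattern = xmlCharStrdup("[a-z-[aeiou]]+");
xmlRegexp* regex = xmlRegexpCompile(pattern);
// xmlRegexpCompile returns NULL if the pattern is invalid
if (regex != NULL && xmlRegexpExec(regex, str) == 1)
{
std::cout << "Match!" << std::endl;
}
// Release libxml2 objects with libxml2's own deallocators rather than free()
xmlRegFreeRegexp(regex);
xmlFree(pattern);
xmlFree(str);
}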
Output:
Match!
I also attempted to use the XMLString::patternMatch from the Xerces-C++ library but it didn't seem to use an XML Schema compliant regex engine underneath. (Honestly I have no clue what regex engine it uses underneath and the documentation for that was pretty abysmal and I couldn't find any examples online so I just gave up on it.)

What does std::match_results::size return?

I'm a bit confused about the following C++11 code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string haystack("abcdefabcghiabc");
std::regex needle("abc");
std::smatch matches;
std::regex_search(haystack, matches, needle);
std::cout << matches.size() << std::endl;
}
I'd expect it to print out 3 but instead I get 1. Am I missing something?

You get 1 because regex_search returns only 1 match, and size() will return the number of capture groups + the whole match value.
Your matches is...:
Object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any submatches found.
If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regex expression contained sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.
Here is code that will find multiple matches:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
while (regex_search(str, smtch, rgx1)) {
std::cout << i << ": " << smtch[0] << std::endl;
i += 1;
str = smtch.suffix().str();
}
return 0;
}
See IDEONE demo returning abc 3 times.
As this method destroys the input string, here is another alternative based on the std::sregex_iterator (std::wsregex_iterator should be used when your subject is an std::wstring object):
#include <iostream>
#include <regex>
#include <string>
int main() {
std::regex r("ab(c)");
std::string s = "abcdefabcghiabc";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
std::cout << " Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
}
return 0;
}
See IDEONE demo, returning
Match value: abc at Position 0
Capture: c at Position 2
Match value: abc at Position 6
Capture: c at Position 8
Match value: abc at Position 12
Capture: c at Position 14
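If the goal is simply the count of 3 that the question expected, the iterator approach can be wrapped in a small helper. This is only a sketch; count_matches is a hypothetical name, not a standard function.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `re` in `text`
// by walking a std::sregex_iterator from the first match to the end iterator.
std::size_t count_matches(const std::string& text, const std::regex& re) {
    const auto first = std::sregex_iterator(text.begin(), text.end(), re);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Calling count_matches on "abcdefabcghiabc" with the regex abc returns 3, and the input string is left untouched.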

What you're missing is that matches is populated with one entry for each capture group (including the entire matched substring as the 0th capture).
If you write
std::regex needle("a(b)c");
then you'll get matches.size()==2, with matches[0]=="abc", and matches[1]=="b".
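The rule can be checked with a short sketch. The helper name submatch_count is made up for this illustration; it only reports what match_results::size() returns after a single regex_search.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: runs one regex_search and returns match_results::size(),
// which is 0 when nothing matched and otherwise
// 1 (the whole match) + the number of capture groups in the pattern.
std::size_t submatch_count(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    std::regex_search(text, m, re);
    return m.size();
}
```

For the haystack in the question, the pattern abc gives 1 and a(b)c gives 2, regardless of how many times the pattern occurs in the string.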

EDIT: Some people have downvoted this answer. That may be for a variety of reasons, but if it is because it does not apply to the answer I criticized (no one left a comment to explain the decision), they should take note that W. Stribizew changed the code two months after I wrote this, and I was unaware of it until today, 2021-01-18. The rest of the answer is unchanged from when I first wrote it.
@stribizhev's solution has quadratic worst-case complexity for sane regular expressions. For insane ones (e.g. "y*"), it doesn't terminate. In some applications, these issues could be DoS attacks waiting to happen. Here's a fixed version:
string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
auto beg = str.cbegin();
while (regex_search(beg, str.cend(), smtch, rgx1)) {
std::cout << i << ": " << smtch[0] << std::endl;
i += 1;
if ( smtch.length(0) > 0 )
std::advance(beg, smtch.length(0));
else if ( beg != str.cend() )
++beg;
else
break;
}
According to my personal preference, this will find n+1 matches of an empty regex in a string of length n. You might also just exit the loop after an empty match.
If you want to compare the performance for a string with millions of matches, add the following lines after the definition of str (and don't forget to turn on optimizations), once for each version:
for (int j = 0; j < 20; ++j)
str = str + str;

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

This question already has answers here:
How do you search a std::string for a substring in C++?
(6 answers)
Closed 8 years ago.
I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a char called message. To print out a specific part of them, in this case the numbers part of them, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message "MC123, 456, 789!##". "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this assuming I know that every message ends with "!##".

First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
// 0123456789012345678
// ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
// 0123456789012345678
// ^-----------^ desired text
The other answers are OK. If you wish to use std::string find,
consider rfind and find_first_not_of, as in the following code:
#include <iostream>
#include <string>
// forward declaration
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
// 012345678901234567
// |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.

The correct approach is to look for existence / location of the "known termination" string, then take the substring up to (but not including) that substring.
Something like
std::string termination = "!#$";
std::size_t position = message.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE: you asked in your comment how to use the regular expression. Here is a very simple program. Note that this uses C++11, so you need to make sure your compiler supports it.
#include <iostream>
#include <regex>
#include <string>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .* means "zero or more of 'anything'", but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing - I just noticed you mentioned that you might have "more stuff" in your string. This is where using regular expressions quickly becomes the best. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
#include <string>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (m[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
Note the use of the slightly tricky (?:USR)? expression. This says: "this might (that's the trailing ?) be followed by the characters USR; if it is, consume them without capturing them (that's the ?: part) and capture what follows."
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.

If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
if (message.size() < 6)
throw error;
if (message.substr(0,3) != "MCX") //the exact numbers do not match in your example, but you get the point...
throw error;
if (message.substr(message.size()-3) != "!##")
throw error;
string data = message.substr(3, message.size()-6);

Just calculate the offset first.
string str = ...;
size_t start = 3;
size_t end = str.find("!##");
assert(end != string::npos);
return str.substr(start, end - start);

You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.
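Put together, the find-then-substr idea with the npos check might look like the sketch below. The function name extract_payload and its default arguments (a 2-character "MC" prefix and the "!##" terminator) are assumptions chosen to fit the example messages in the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the text between a fixed-length prefix and the
// terminator into `out`. Returns false (leaving `out` untouched) when the
// terminator is not found, so the caller can take some different action.
bool extract_payload(const std::string& message, std::string& out,
                     std::size_t prefix_len = 2,
                     const std::string& terminator = "!##") {
    const std::size_t end = message.find(terminator, prefix_len);
    if (end == std::string::npos) {
        return false;  // terminator missing
    }
    out = message.substr(prefix_len, end - prefix_len);
    return true;
}
```

Because the length passed to substr is computed from the position of the terminator, the same call handles both "MC123, 456!##" and "MC123, 456, 789!##".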

string msg = "MC4,512,541,3123!##";
// Print everything after the 2-character "MC" prefix and before the 3-character "!##" terminator
for (size_t i = 2; i + 3 < msg.length(); i++) {
cout << msg[i];
}
or use char[]
char msg[] = "MC4,123,54!##";
sizeof(msg) - 1; // instead of msg.length()
// -1 for the null byte at the end (each char takes 1 byte so the size -1 == number of chars)

How to truncate a string [formating] ? c++

I want to truncate a string in a cout,
string word = "Very long word";
int i = 1;
cout << word << " " << i;
I want to have as an output of the string a maximum of 8 letters
so in my case, I want to have
Very lon 1
instead of :
Very long word 1
I don't want to use setw(8), since unfortunately it will not truncate my word to the size I want. I also don't want the 'word' string to change its value (I just want to show the user a part of the word, but keep it whole in my variable).

I know you already have a solution, but I thought this was worth mentioning: Yes, you can simply use string::substr, but it's a common practice to use an ellipsis to indicate that a string has been truncated.
If that's something you wanted to incorporate, you could just make a simple truncate function.
#include <iostream>
#include <string>
std::string truncate(std::string str, size_t width, bool show_ellipsis=true)
{
if (str.length() > width) {
if (show_ellipsis)
return str.substr(0, width) + "...";
else
return str.substr(0, width);
}
return str;
}
int main()
{
std::string str = "Very long string";
int i = 1;
std::cout << truncate(str, 8) << "\t" << i << std::endl;
std::cout << truncate(str, 8, false) << "\t" << i << std::endl;
return 0;
}
The output would be:
Very lon... 1
Very lon 1

As Chris Olden mentioned above, using string::substr is a way to truncate a string. However, if you need another way to do that you could simply use string::resize and then add the ellipsis if the string has been truncated.
You may wonder what string::resize does. It changes the size of the string (the memory in use, not the reserved capacity): if the new size n is smaller, every character beyond the first n is removed; if it is larger, the string is extended to the new size, which is straightforward.
Of course, I don't want to suggest a 'new best way' to do it, it's just another way to truncate a std::string.
If you adapt the Chris Olden truncate function, you get something like this:
#include <iostream>
#include <string>
std::string& truncate(std::string& str, size_t width, bool show_ellipsis=true) {
if (str.length() > width) {
if (show_ellipsis) {
str.resize(width);
return str.append("...");
}
else {
str.resize(width);
return str;
}
}
return str;
}
int main() {
std::string str = "Very long string";
int i = 1;
std::cout << truncate(str, 8) << "\t" << i << std::endl;
std::cout << truncate(str, 8, false) << "\t" << i << std::endl;
return 0;
}
Even though this method does basically the same thing, note that it takes and returns a reference, so it modifies the caller's string in place, and the returned reference is only valid while that string is alive. If you don't want to take that risk, just remove the references and the function becomes:
std::string truncate(std::string str, size_t width, bool show_ellipsis=true) {
if (str.length() > width) {
if (show_ellipsis) {
str.resize(width);
return str + "...";
}
else {
str.resize(width);
return str;
}
}
return str;
}
I know it's a little bit late to post this answer. However it might come in handy for future visitors.

how to find number of elements in an array of strings in c++?

I have an array of strings.
std::string str[10] = {"one","two"};
How do I find how many strings are present inside the str[] array? Is there any standard function?

There are ten strings in there despite the fact that you have only initialised two of them:
#include <iostream>
#include <string>
int main (void) {
std::string str[10] = {"one","two"};
std::cout << sizeof(str)/sizeof(*str) << std::endl;
std::cout << str[0] << std::endl;
std::cout << str[1] << std::endl;
std::cout << str[2] << std::endl;
std::cout << "===" << std::endl;
return 0;
}
The output is:
10
one
two
===
If you want to count the non-empty strings:
#include <iostream>
#include <string>
int main (void) {
std::string str[10] = {"one","two"};
size_t count = 0;
for (size_t i = 0; i < sizeof(str)/sizeof(*str); i++)
if (str[i] != "")
count++;
std::cout << count << std::endl;
return 0;
}
This outputs 2 as expected.

If you want to count all elements, the sizeof technique will work, as others pointed out.
If you want to count all non-empty strings, one possible way is to use the standard count_if function.
#include <algorithm>
#include <iostream>
#include <string>
bool IsNotEmpty( const std::string& str )
{
return !str.empty();
}
int main ()
{
std::string str[10] = {"one","two"};
// count the elements of str[0..9] for which IsNotEmpty returns true
int result = std::count_if(str, str + 10, IsNotEmpty);
std::cout << result << std::endl; // it will print "2"
return 0;
}

I don't know that I would use an array of std::strings. If you're already using the STL, why not consider a vector or list? At least that way you could just figure it out with std::vector::size() instead of working ugly sizeof magic. Also, that sizeof magic won't work if all you have is a pointer to the array, for example when it was allocated on the heap with new[].
Just do this:
std::vector<std::string> strings(10);
strings[0] = "one";
strings[1] = "two";
std::cout << "Length = " << strings.size() << std::endl;

You can always use a countof macro to get the number of elements, but again, the memory was allocated for 10 elements and that's the count that you'll get.

The ideal way to do this is
std::string str[] = {"one","two"};
int num_of_elements = sizeof( str ) / sizeof( str[ 0 ] );

Since you know the size, you could do a binary search for the first empty string. This assumes all the non-empty strings sit at the front of the array.
str[9] is empty
str[5] is empty
str[3] is not empty
str[4] is empty
You have 4 items.
I don't really feel like implementing the code, but this would be quite quick.
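The binary search described above can be sketched with std::partition_point, which performs exactly this kind of search. The helper name count_leading_non_empty is made up, and the sketch relies on the stated assumption that every non-empty string comes before every empty one.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: binary-searches for the first empty string.
// Assumption: the array is partitioned, i.e. all non-empty strings come
// first and only empty strings follow. Otherwise the result is meaningless.
template <std::size_t N>
std::size_t count_leading_non_empty(const std::string (&arr)[N]) {
    const auto first_empty = std::partition_point(
        std::begin(arr), std::end(arr),
        [](const std::string& s) { return !s.empty(); });
    return static_cast<std::size_t>(first_empty - std::begin(arr));
}
```

For a 10-element array this needs only a handful of comparisons, although for sizes this small a linear scan is just as practical.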

Simply use this function for 1D string array:
template<typename String, std::size_t SIZE> // String can be 'string' or 'const string'
unsigned int NoOfStrings (String (&arr)[SIZE])
{
unsigned int count = 0;
while(count < SIZE && arr[count] != "")
count ++;
return count;
}
Usage:
std::string s1[] = {"abc", "def" };
int i = NoOfStrings(s1); // i = 2
I am just wondering whether we could write a template metaprogram for this (since everything is known at compile time).

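A simple way to do this is to use the empty() member function of std::string. The array capacity must be passed in as well, otherwise the loop would run past the end of an array with no empty strings:
size_t stringArrSize(const std::string *stringArray, size_t capacity) {
size_t num = 0;
// Stop at the first empty string or at the end of the array, whichever comes first
while (num < capacity && !stringArray[num].empty()) {
++num;
}
return num;
}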
