Using sscanf to extract an int from a string in C++

My function must process strings that look like, say, "hello y(5)" or "data |x(3)|", and I need to extract the integer shown and store it in a separate int variable called address. However, some strings passing through will not contain any integers, and for these the address must default to 0. When a string contains an integer, it will always be between parentheses. I've attempted to use sscanf, but, being very new to sscanf, I'm running into problems. For some reason, the address always reads as 0. Here's my code:
void process(string info)
{
int address = 0; // set to 0 in case info contains no digits
sscanf(info.c_str(), "%d", address);
.
.
.
// remainder of code makes other function calls using the address, etc
}
Any ideas as to why the sscanf fails to find the integer in between parentheses? Thanks!

why the sscanf fails to find the integer in between parentheses
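The "%d" in sscanf(info.c_str(), "%d", address) tells sscanf() to skip leading whitespace and then read a number; it stops as soon as it meets a character that cannot start one. With "hello y(5)" it fails on the very first letter, and text like "(5)" would stop at the "(". (Also note that sscanf needs the address of the variable, &address; passing address itself is undefined behavior.)
Instead, the code needs to skip over non-numeric text first.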
Pseudo-code
in a loop
search for any of "-+0123456789"
if not found return 0
convert from that point using sscanf() or strtol()
if that succeeds, return number
else advance to next character
Sample code
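#include <cstdio>
#include <cstring>
#include <string>
// Returns the first integer found in info, or 0 if there is none.
int extract_address(const std::string &info)
{
int address;
const char *p = info.c_str();
for (;;) {
p += std::strcspn(p, "0123456789+-"); // skip to the next candidate character
if (*p == 0) return 0; // no candidates left: default to 0
if (std::sscanf(p, "%d", &address) == 1) {
return address; // conversion succeeded
}
p++; // lone '+' or '-': try the next character
}
}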
Notes:
The strcspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters not from the string pointed to by s2. (C11 7.24.5.3, paragraph 2)
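The pseudo-code allows strtol() as well as sscanf(). A minimal sketch of the same loop with strtol, assuming the same rule (first integer wins, otherwise 0); the name first_int is hypothetical. The end pointer that strtol fills in shows whether any digits were consumed, so a lone "+" or "-" is simply skipped. Out-of-range values are not checked here.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: returns the first integer in info, or 0 if none exists.
// strtol's end pointer tells us whether a conversion actually happened.
int first_int(const std::string &info)
{
    const char *p = info.c_str();
    for (;;) {
        p += std::strcspn(p, "0123456789+-");   // jump to the next candidate
        if (*p == '\0')
            return 0;                           // no candidates left
        char *end;
        long value = std::strtol(p, &end, 10);
        if (end != p)
            return static_cast<int>(value);     // digits were consumed
        ++p;                                    // lone sign: keep searching
    }
}
```

For example, first_int("a+b(-12)") skips the "+" that is not followed by a digit and returns -12.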
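If the code can rely on "it will always be in between parentheses", and input like "abc()def(123)", where an earlier pair of parentheses holds no number, does not occur, then the following works. Note that %*[^(] must match at least one character, so a string that starts with "(" (such as "(5)") will not match: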
const char *p = info.c_str();
int address;
if (sscanf(p, "%*[^(](%d", &address)==1) {
return address;
}
return 0;
or simply
int address = 0;
sscanf(info.c_str(), "%*[^(](%d", &address);
return address;
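A small sketch that packages the "%*[^(](%d" idea as a function and guards against the one input it cannot handle, a string that begins with "(". The name address_in_parens is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: the "%*[^(](%d" approach plus a guard for a leading '('.
// "%*[^(]" must match at least one character, so "(5)" would otherwise fail.
int address_in_parens(const std::string &info)
{
    int address = 0;                                 // default when nothing matches
    const char *p = info.c_str();
    if (*p == '(')
        std::sscanf(p, "(%d", &address);             // number right at the start
    else
        std::sscanf(p, "%*[^(](%d", &address);       // skip text up to '(' first
    return address;
}
```

As described above, "abc()def(123)" still yields 0, because only the first "(" is examined.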

You could use something as simple as this: strchr finds the first occurrence of '(', and atoi then converts the integer after it, stopping at the first non-digit. (atoi also returns 0 when no digits follow, so "()" and a missing number look the same.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char s1[] = "hello y(5)";
char s2[] = "data [x(3)]";
char s3[] = "hello";
int a1 = 0;
int a2 = 0;
int a3 = 0;
char* tok = strchr( s1, '(');
if (tok != NULL)
a1 = atoi(tok+1);
tok = strchr( s2, '(');
if (tok != NULL)
a2 = atoi(tok+1);
tok = strchr(s3,'(');
if (tok != NULL)
a3 = atoi(tok+1);
printf( "a1=%d, a2=%d, a3=%d\n", a1,a2,a3); /* prints a1=5, a2=3, a3=0 */
return 0;
}

When a string contains an integer, it will always be in between parentheses
To strictly conform with this requirement you can try:
void process(string info)
{
int address;
char c = '5'; //any value other than ) should work
sscanf(info.c_str(), "%*[^(](%d%c", &address, &c);
if(c != ')') address = 0;
.
.
.
}
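For comparison, a sketch of the same strict check without sscanf, using std::string::find and strtol; the name parse_strict is hypothetical. It accepts the value only when the digits after the first "(" are followed directly by ")".

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the integer in the first "(...)" group, but only
// if that group contains nothing except the integer; otherwise returns 0.
int parse_strict(const std::string &info)
{
    std::string::size_type open = info.find('(');
    if (open == std::string::npos)
        return 0;                                   // no parentheses at all
    const char *start = info.c_str() + open + 1;    // just after '('
    char *end;
    long value = std::strtol(start, &end, 10);
    if (end == start || *end != ')')
        return 0;                                   // no digits, or junk before ')'
    return static_cast<int>(value);
}
```

Because strtol skips leading whitespace, "( 5)" is also accepted.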

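int address = 0; // stays 0 if nothing is extracted
sscanf(info.c_str(), "%*[^0-9]%d", &address);
printf("%d", address);
This should extract the integer between the parentheses. Note, however, that the minus sign of a negative number is skipped along with the other non-digits, and that %*[^0-9] must match at least one character, so nothing is extracted when the string starts with a digit.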
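The "%*[^0-9]%d" pattern drops a leading minus sign and fails when the string starts with a digit. Assuming C++11 is available, a std::regex that insists on the parentheses avoids both problems; this is a swapped-in technique rather than the answer's, and address_by_regex is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extracts an optionally signed integer written as "(N)".
// Returns 0 when no such group exists.
int address_by_regex(const std::string &info)
{
    static const std::regex paren_int("\\((-?\\d+)\\)");  // "(" digits ")"
    std::smatch m;
    if (std::regex_search(info, m, paren_int))
        return std::stoi(m[1].str());                     // capture group 1
    return 0;
}
```

std::stoi throws std::out_of_range for values that do not fit in an int.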

Related

Recognize string formatting Debug Assertion

I have a runtime problem with code below.
The purpose is to "recognize" the formats (%s %d etc) within the input string.
To do this, it returns an integer that matches the data type.
Then the extracted types are manipulated/handled in other functions.
I want to clarify that my purpose isn't to write formatted types in a string (snprintf etc.) but only to recognize/extract them.
The problem is that my application crashes with this error:
Debug Assertion Failed!
Program:
...ers\Alex\source\repos\TestProgram\Debug\test.exe
File: minkernel\crts\ucrt\appcrt\convert\isctype.cpp
Line: 36
Expression: c >= -1 && c <= 255
My code:
#include <iostream>
#include <cstring>
enum Formats
{
TYPE_INT,
TYPE_FLOAT,
TYPE_STRING,
TYPE_NUM
};
typedef struct Format
{
Formats Type;
char Name[5 + 1];
} SFormat;
SFormat FormatsInfo[TYPE_NUM] =
{
{TYPE_INT, "d"},
{TYPE_FLOAT, "f"},
{TYPE_STRING, "s"},
};
int GetFormatType(const char* formatName)
{
for (const auto& format : FormatsInfo)
{
if (strcmp(format.Name, formatName) == 0)
return format.Type;
}
return -1;
}
bool isValidFormat(const char* formatName)
{
for (const auto& format : FormatsInfo)
{
if (strcmp(format.Name, formatName) == 0)
return true;
}
return false;
}
bool isFindFormat(const char* strBufFormat, size_t stringSize, int& typeFormat)
{
bool foundFormat = false;
std::string stringFormat = "";
for (size_t pos = 0; pos < stringSize; pos++)
{
if (!isalpha(strBufFormat[pos]))
continue;
if (!isdigit(strBufFormat[pos]))
{
stringFormat += strBufFormat[pos];
if (isValidFormat(stringFormat.c_str()))
{
typeFormat = GetFormatType(stringFormat.c_str());
foundFormat = true;
}
}
}
return foundFormat;
}
int main()
{
std::string testString = "some test string with %d arguments"; // crash application
// std::string testString = "%d some test string with arguments"; // not crash application
size_t stringSize = testString.size();
char buf[1024 + 1];
memcpy(buf, testString.c_str(), stringSize);
buf[stringSize] = '\0';
for (size_t pos = 0; pos < stringSize; pos++)
{
if (buf[pos] == '%')
{
if (buf[pos + 1] == '%')
{
pos++;
continue;
}
else
{
char bufFormat[1024 + 1];
memcpy(bufFormat, buf + pos, stringSize);
bufFormat[stringSize] = '\0';
int typeFormat;
if (isFindFormat(bufFormat, stringSize, typeFormat))
{
std::cout << "type = " << typeFormat << "\n";
// ...
}
}
}
}
}
As I commented in the code, with the first string the application crashes, while with the second everything works.
I also wanted to ask: is there a better or more efficient way to recognize types (%d, %s, etc.) within a string (not necessarily returning an int to identify them)?
Thanks.
Let's take a look at this else clause:
char bufFormat[1024 + 1];
memcpy(bufFormat, buf + pos, stringSize);
bufFormat[stringSize] = '\0';
The variable stringSize was initialized with the size of the original format string. Let's say it's 30 in this case.
Let's say you found the %d code at offset 20. You're going to copy 30 characters, starting at offset 20, into bufFormat. That means you're copying 20 characters past the end of the original string. You could possibly read off the end of the original buf, but that doesn't happen here because buf is large. The third line sets a NUL into the buffer at position 30, again past the end of the data, but your memcpy copied the NUL from buf into bufFormat, so that's where the string in bufFormat will end.
Now bufFormat contains the string "%d arguments". Inside isFindFormat you search for the first isalpha character. Possibly you meant isalnum here: the isdigit line is only reached when the isalpha check passes, and a character that is alphabetic is never a digit.
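In any case, once the isalpha check passes, isdigit will always return false, so we enter that if block. Your code will find the right type here. But the loop doesn't terminate. Instead, it keeps scanning up to stringSize characters, which is the stringSize from main, that is, the size of the original format string. The string you're passing to isFindFormat, however, only contains the part starting at '%'. So you scan past its terminating NUL into bytes that were never initialized. Those bytes can hold values above 127, which become negative when char is signed; passing a negative value other than EOF to isalpha is undefined behavior, and the debug CRT checks exactly that (c >= -1 && c <= 255), hence the assertion. When %d is at the very start, pos is 0 and the scan stays within the copied string, which is why the other test string doesn't crash. (If you do need the ctype functions on char data, cast the argument to unsigned char first.)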
There's a lot more going on here. You're mixing and matching std::string and C strings; see if you can use std::string::substr instead of copying. You can use std::string::find to find characters in a string. If you have to use C strings, use strcpy instead of memcpy followed by the addition of a NUL.
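Following the suggestion to use std::string::find, here is a sketch of a scanner that avoids the separate buffer and the isalpha loop entirely. It assumes the conversion character follows the '%' directly (no flags, width or precision); the helper names are hypothetical, and the enumerators are copied from the question.

```cpp
#include <string>
#include <vector>

// Same enumerators as in the question.
enum Formats { TYPE_INT, TYPE_FLOAT, TYPE_STRING, TYPE_NUM };

// Hypothetical helper: maps a conversion character to a Formats value, or -1.
int typeFromConversion(char c)
{
    switch (c) {
    case 'd': return TYPE_INT;
    case 'f': return TYPE_FLOAT;
    case 's': return TYPE_STRING;
    default:  return -1;
    }
}

// Hypothetical helper: returns the types of all "%x" specifiers in s, in order.
std::vector<int> findFormats(const std::string &s)
{
    std::vector<int> types;
    std::string::size_type pos = s.find('%');
    while (pos != std::string::npos && pos + 1 < s.size()) {
        char next = s[pos + 1];
        if (next == '%') {                      // "%%" is a literal percent sign
            pos = s.find('%', pos + 2);
            continue;
        }
        int type = typeFromConversion(next);
        if (type != -1)
            types.push_back(type);
        pos = s.find('%', pos + 1);
    }
    return types;
}
```

Since the scan never leaves the std::string, there is no fixed-size buffer to overrun and no need to track stringSize separately.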
You could just delegate this to a regex engine, which is built for searching through strings.
Since C++11 there is direct support in the standard library; all you have to do is
#include <regex>
then you can match against strings in several ways. Note that std::regex_match succeeds only if the entire string matches the pattern; to find a pattern inside a larger string, use std::regex_search, which together with a std::smatch finds your target in just a few lines of standard library code:
std::smatch sm;
std::regex_search(testString.cbegin(), testString.cend(), sm, str_exp);
where str_exp is the regex describing exactly what you want to find.
sm now holds the whole match (sm[0]) followed by any capture groups, which you can print like this:
for (std::size_t i = 0; i < sm.size(); ++i)
{
std::cout << "Match:" << sm[i] << std::endl;
}
EDIT:
To better illustrate the result you would get, here is a simple example:
#include <iostream>
#include <regex>
#include <string>
int main()
{
// target string to be searched against
std::string target_string = "simple example no.%d is: %s";
// pattern to look for
std::regex str_exp("(%[sd])");
// match object
std::smatch sm;
// iteratively search for the pattern, skipping the parts already matched
std::cout << "My format strings extracted:" << std::endl;
while (std::regex_search(target_string, sm, str_exp))
{
std::cout << sm[0] << std::endl;
target_string = sm.suffix().str();
}
}
You can easily support more format specifiers by modifying the str_exp regular expression; for example, "(%[sdf])" also finds %f.
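If flags, width or precision also need to be recognized, the regex can be extended and walked with std::sregex_iterator instead of repeatedly copying the suffix. A sketch; the function name conversions and the exact pattern are assumptions covering only %d, %f and %s.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the conversion characters of all %d/%f/%s
// specifiers in text, in order. Optional flags, width and precision (e.g.
// "%-5.2f") are accepted, and "%%" is matched separately so a literal percent
// sign is not reported.
std::string conversions(const std::string &text)
{
    static const std::regex spec("%%|%[-+ #0]*\\d*(?:\\.\\d+)?([dfs])");
    std::string found;
    for (std::sregex_iterator it(text.begin(), text.end(), spec), end; it != end; ++it) {
        if ((*it)[1].matched)                 // group 1 is unmatched for "%%"
            found += (*it)[1].str();
    }
    return found;
}
```

Listing "%%" as the first alternative means a literal percent sign is consumed before the second alternative can misread what follows it.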

Can't seem to get a char array to leave a function and be usable in main

The problem lies with printf(stringOut): it prints an empty string. The function halfstring appears to work correctly, but the string it builds never makes it to main.
int main(int argc, char *argv[])
{
char stringIn[30] = "There is no cow level.\0";
char stringOut[sizeof(stringIn)];
halfstring(stringIn, stringOut);
printf(stringOut);
return 0;
}
halfstring is supposed to take every odd character in a char array and put it into a new char array without using ANY system-defined string functions (i.e. those found in the string.h library including strlen, strcat, strcpy, etc).
void halfstring(char stringIn [], char stringOut [])
{
int i = 0;
int modi;
while(stringIn[i] != '\0')
{
if(i % 2 != 0)
{
stringOut[i] = stringIn[i];
}
i++;
}
}
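Inside halfstring you use the same index i for both arrays, so stringOut[0] (and every other even position) is never written, and no terminating '\0' is added. stringOut is uninitialized, so its first byte is whatever happened to be in memory; if that is a null character, printf sees an empty string, which is why you got nothing.
You can solve that by adding a separate index k for stringOut: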
void halfstring(char stringIn [], char stringOut [])
{
int i = 0, k = 0; // separate index for stringOut
while(stringIn[i] != '\0')
{
if(i % 2 != 0)
{
stringOut[k] = stringIn[i];
k++; // advance only when a character is stored
}
i++;
}
stringOut[k]='\0';
}
1) You don't need to NUL terminate a string literal:
char stringIn[30] = "There is no cow level.\0";
^^
2) Your second array (stringOut) ends up as something like this, with no terminating zero:
{garbage, 'h', garbage, 'r', garbage, ' ', garbage, 's', ...};
You need to count the number of chars stored in the 2nd array:
void halfstring(char stringIn [], char stringOut [])
{
int i = 0;
int n = 0;
while(stringIn[i] != '\0')
{
if(i % 2 != 0)
{
stringOut[n++] = stringIn[i];
}
i++;
}
stringOut[n] = '\0';
}
There are several drawbacks in the program.
For starters there is no need to include the terminating zero in the string literal
char stringIn[30] = "There is no cow level.\0";
^^^^
because string literals already have the terminating zero.
Secondly, standard string functions usually return a pointer to the first character of the target string. This allows calls to be chained in a single statement.
The first parameter usually declares the target string, while the second declares the source string.
As the source string is not changed in the function, it should be declared with the qualifier const.
And lastly, the function uses the wrong index for the target string and does not append the terminating zero.
Taking this into account, the function can be written as shown in the demonstrative program below:
#include <stdio.h>
char * halfstring( char s1[], const char s2[] )
{
char *p = s1;
while ( *s2 && *++s2 ) *p++ = *s2++;
*p = *s2;
return s1;
}
int main(void)
{
char s1[30] = "There is no cow level.";
char s2[sizeof( s1 )];
puts( halfstring( s2, s1 ) );
return 0;
}
Its output is
hr sn o ee.

Increment numbers in char array separated by different delimiters

I have a string like this: 1-2,4^,14-56
I expect the output: 2-3,5^,15-57
char input[48];
int temp;
char *pch;
pch = strtok(input, "-,^");
while(pch != NULL)
{
char tempch[10];
temp = atoi(pch);
temp++;
itoa(temp, tempch, 10);
memcpy(pch, tempch, strlen(tempch));
pch = strtok(NULL, "-,^");
}
After running this, if I print input, it prints only 2, which is the first character of the updated string; the rest of the string is missing. What is the problem with my code?
For plain C, use the library function strtol. Unlike atoi, it can update a pointer to the next unparsed character:
long strtol (const char *restrict str, char **restrict endptr, int base);
...
The strtol() function converts the string in str to a long value. [...] If endptr is not NULL, strtol() stores the address of the first invalid character in *endptr.
Since there may be more than one non-digit character between the numbers, skip them first, testing each with the library function isdigit. I placed this at the start of the loop so it would not accidentally convert a string such as -2,3 to -1,4: otherwise strtol would pick up the initial -2 as a negative number! (Note that strtoul is no way around this, because it also accepts a leading minus sign.)
Since it appears you want the result in a char string, I use sprintf to copy the output into a buffer, which must be large enough for your possible input plus extra characters caused by a decimal overflow.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
int main (void)
{
char *inputString = "1-2,4^,14-56";
char *next_code_at = inputString;
long result;
char dest[100], *dest_ptr;
printf ("%s\n", inputString);
dest[0] = 0;
dest_ptr = dest;
while (next_code_at && *next_code_at)
{
while (*next_code_at && !isdigit((unsigned char)*next_code_at))
{
dest_ptr += sprintf (dest_ptr, "%c", *next_code_at);
next_code_at++;
}
if (*next_code_at)
{
errno = 0; /* strtol sets errno only on error, so clear it first */
result = strtol (next_code_at, &next_code_at, 10);
if (errno)
{
perror ("strtol failed");
return EXIT_FAILURE;
} else
{
if (result < LONG_MAX)
dest_ptr += sprintf (dest_ptr, "%ld", result+1);
else
{
fprintf (stderr, "number too large!\n");
return EXIT_FAILURE;
}
}
}
}
printf ("%s\n", dest);
return EXIT_SUCCESS;
}
Sample run:
Input: 1-2,4^,14-56
Output: 2-3,5^,15-57
There are two major problems with this code:
First of all,
pch = strtok(input, ",");
When applied to the string 1-2,4^,14-56 will return the token 1-2.
When you call atoi("1-2") you'll get 1, which gets converted to 2.
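You can fix this by passing all the delimiters to the first call as well: pch = strtok(input, "-,^");
Second, strtok modifies the string: it overwrites each delimiter it finds with '\0', so you lose the original delimiters, and printing input afterwards stops at the first overwritten one. That is why you only see 2. As this looks like a homework exercise, I'll leave you to figure out how to get around this.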
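One possible way around the lost delimiters, sketched in C++ under the assumption that the input is a plain C string: run strtok on a writable copy and read the overwritten delimiters back from the original, since each token has the same offset in both strings. increment_numbers is a hypothetical name; like any strtok-based code, it is not safe to call from several threads at once.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a writable copy and recovers the delimiters
// from the untouched original at the same offsets.
std::string increment_numbers(const char *input)
{
    std::vector<char> copy(input, input + std::strlen(input) + 1);  // includes '\0'
    std::string out;
    std::size_t prev_end = 0;          // end of the previous token in input
    for (char *tok = std::strtok(copy.data(), "-,^"); tok != nullptr;
         tok = std::strtok(nullptr, "-,^")) {
        std::size_t start = static_cast<std::size_t>(tok - copy.data());
        out.append(input + prev_end, input + start);  // delimiters strtok overwrote
        out += std::to_string(std::atoi(tok) + 1);     // incremented number
        prev_end = start + std::strlen(tok);
    }
    out += input + prev_end;           // any trailing delimiters
    return out;
}
```

Because the result is built in a new string, a number that gains a digit (9 becoming 10) cannot overwrite the text after it, which the memcpy in the question would do.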
I think this could be easier using regular expressions (and C++ instead of C, of course).
Complete example:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
int main()
{
// Your test string.
std::string input("1-2,4^,14-56");
// Regex representing a number.
std::regex number("\\d+");
// Iterators for traversing the test string using the number regex.
auto ri_begin = std::sregex_iterator(input.begin(), input.end(), number);
auto ri_end = std::sregex_iterator();
// Build the result in a separate string: modifying input while the
// iterators still refer to it would invalidate them.
std::string output;
std::size_t last = 0; // end of the previous match in input
for (auto i = ri_begin; i != ri_end; ++i)
{
const std::smatch &match = *i; // Match a number.
std::size_t pos = static_cast<std::size_t>(match.position());
output += input.substr(last, pos - last); // Copy the delimiters before it.
int value = std::stoi(match.str()); // Convert that number to integer.
output += std::to_string(value + 1); // Increment and convert back to string.
last = pos + static_cast<std::size_t>(match.length());
}
output += input.substr(last); // Copy anything after the last number.
std::cout << output << std::endl;
return 0;
}
Output:
2-3,5^,15-57
strtok modifies the string you pass to it. Either use strchr or something like that to find the delimiters or make a copy of the string to work on.
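Following the strchr suggestion, a sketch that leaves the input untouched by locating delimiters with std::string::find_first_of (the C++ counterpart of strcspn/strpbrk). It assumes every non-empty field is a decimal number; increment_fields is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: increments every number in s, keeping the delimiters.
// Delimiters are located with find_first_of, so s itself is never modified.
std::string increment_fields(const std::string &s, const std::string &delims)
{
    std::string out;
    std::string::size_type pos = 0;
    while (pos < s.size()) {
        std::string::size_type end = s.find_first_of(delims, pos);
        if (end == std::string::npos)
            end = s.size();                                 // last field
        if (end > pos)                                      // non-empty field
            out += std::to_string(std::stol(s.substr(pos, end - pos)) + 1);
        if (end < s.size())
            out += s[end];                                  // keep the delimiter
        pos = end + 1;
    }
    return out;
}
```

Called as increment_fields("1-2,4^,14-56", "-,^"), it produces 2-3,5^,15-57.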

How to return the position of a pointer in a function that searches for substrings in strings?

Please help me finish this function.
I have an exercise to develop a function that searches for a substring in a string and returns the position of its first occurrence.
This is the code I wrote:
int strstr(const char *str, const char *pattern) {
const char *st = str; // assign adress of string to pointer
const char *pa = pattern; //assign adress of pattern what we must find in string to pointer
while (*st){ // starting sort out string
++st;
if( *st == *pa){ //when first symbol of pattern equal to symbol of string starting the loop
int i = 0; //counter of iteration for possibility to return first enter of substring
for(i;*st == *pa;i++){ //that loop sort out every next symbol of string and pattern for equality
++st;
++pa;
} //loop finish when pattern or string was ended, or any next symbol was not equal
if(*pa == 0){ //if patter was ended return position of first enter
return st-i; //there not compiling((
}
pa-i; //reset pattern
st-i; //reset string
}
}
return -1; //return -1, if substring was not find
}
Unfortunately, the code doesn't compile. The error is: invalid conversion from ‘const char*’ to ‘int’
What type should the variable i be for that? And please check my logic :)
return st-i; //there not compiling((
You are returning a pointer to a constant char, whereas your function is declared to return an int. My best guess is you need to change it to:
return *(st-i);
Use * to dereference the pointer; the resulting const char converts implicitly to int. Note, however, that this returns the first character of the match, not its position.
The problem is that your function is currently defined to return an int.
If you desire to return an int, such as the relative position from the beginning of your string, then you have to return a difference between pointers
return (st-i)-str; // st-i = begin of the pattern found, - str for the relative position
If you want to return a pointer, then your function signature must change, and you should return nullptr instead of -1 when you don't find the pattern.
Several other minor issues:
incrementing st before starting the comparison risks missing the pattern if the string starts with it.
pa-i and st-i have no effect: they are just expressions, and no change is stored. Maybe you wanted to write pa-=i?
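Following the two options above, a sketch that returns a pointer (nullptr when the pattern is absent) and derives the position from a pointer difference. The names find_substring and find_position are hypothetical; they deliberately avoid the name strstr, which belongs to the C library.

```cpp
#include <cstddef>

// Hypothetical helper: pointer to the first occurrence of pattern in str,
// or nullptr. Comparison starts at str itself, so a match at index 0 is found.
const char *find_substring(const char *str, const char *pattern)
{
    for (const char *st = str; ; ++st) {
        std::size_t i = 0;
        while (pattern[i] != '\0' && st[i] == pattern[i])
            ++i;                          // stops before either string ends
        if (pattern[i] == '\0')
            return st;                    // whole pattern matched at st
        if (*st == '\0')
            return nullptr;               // reached the end of str
    }
}

// Hypothetical wrapper: relative position of the match, or -1 if absent.
long find_position(const char *str, const char *pattern)
{
    const char *p = find_substring(str, pattern);
    return p ? static_cast<long>(p - str) : -1;
}
```

Because st is never modified inside the comparison, there is nothing to "reset" afterwards.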
Try the following; at least it looks simpler :)
#include <cstddef>
#include <iostream>
// Named my_strstr because the name strstr is reserved for the C library function.
int my_strstr( const char *str, const char *pattern )
{
bool found = false;
const char *p = str;
for ( ; *p && !found; ++p )
{
std::size_t i = 0;
while ( p[i] == pattern[i] && pattern[i] != '\0' ) ++i;
found = pattern[i] == '\0';
}
return found ? --p - str : -1;
}
int main()
{
std::cout << my_strstr( "Hello evsign", "evsign" ) << std::endl;
return 0;
}
The output is
6
As for your code, even the first statement in the loop is wrong:
while (*st){ // starting sort out string
++st;
Why is st incremented before the first comparison? It skips the first character of the string.
Also this loop
for(i;*st == *pa;i++){
shall be written as
for( ;*st == *pa && *pa; i++){

How can I tokenize a c-style string from a file without making a copy?

Let's say I have a constant C-style string, say
const char* msg = "fred,jim,345,7665";
I'd like to tokenize this and read out the individual fields but for performance reasons I don't want to make a copy. How can I do this?
Obviously strtok takes a non-constant pointer, and boost::tokenizer is an option, but I am unsure what it is doing behind the scenes.
Inevitably you will require some copy of the string, even if it is a substring being copied.
If you have a strtok_r function, you can use that, but it will still require a mutable string to do its work. Beware, however, as not all systems provide the function (e.g. Windows), which is why I've provided an implementation here. It works by requiring an additional parameter: a pointer to a C string to save the address of the next match. This allows for it to be more reentrant (thread-safe) in theory. However, you'll still be mutating the value. You could modify it to suit your needs if you like, perhaps copying N bytes into a destination buffer and null-terminating that buffer to avoid the need to modify the source string.
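/*
Usage:
char *tok;
char *savep;
tok = mystrtok_r (somestr, ",", &savep);
while (NULL != tok)
{
// Do something with `tok'.
tok = mystrtok_r (NULL, ",", &savep);
}
*/
#include <string.h>
char *
mystrtok_r (char *str, const char *delims, char **nextp)
{
if (str == NULL)
str = *nextp;
str += strspn (str, delims); /* skip leading delimiters */
if (*str == 0)
{
*nextp = str; /* no more tokens: stay at the terminator */
return NULL;
}
*nextp = str + strcspn (str, delims); /* end of this token */
if (**nextp != 0)
{
**nextp = 0; /* terminate the token in place */
++*nextp; /* resume after the delimiter, never past the final NUL */
}
return str;
}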
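The answer suggests copying into a destination buffer instead of writing into the source. A sketch of that idea, assuming a caller-supplied buffer; next_token_copy is a hypothetical name, and like strtok it skips runs of delimiters rather than returning empty tokens.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the next token of *cursor into buf (truncated to
// bufsize - 1 characters) without modifying the source string.
// Returns buf, or nullptr when no tokens remain.
char *next_token_copy(const char **cursor, const char *delims,
                      char *buf, std::size_t bufsize)
{
    const char *s = *cursor + std::strspn(*cursor, delims);  // skip delimiters
    if (*s == '\0' || bufsize == 0)
        return nullptr;                                      // nothing left
    std::size_t len = std::strcspn(s, delims);               // token length
    std::size_t n = len < bufsize - 1 ? len : bufsize - 1;   // fit into buf
    std::memcpy(buf, s, n);
    buf[n] = '\0';
    *cursor = s + len;      // resume after the token; the source is untouched
    return buf;
}
```

Because the source is only read, this works directly on a const char* such as msg, and each caller keeps its own cursor.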
It depends on how you're going to use it.
If you want to get the next token, and then the next (like an iteration over the string), then you only really need to copy the current token into memory.
long strtok2( char *strDest, const char *strSrc, const char cTok, long lOffset, long lMax)
{
if (lMax <= 0)
return 0;
strSrc += lOffset;
long len = 0;
// scan to the delimiter or the end of the string, copying at most lMax - 1 chars
while (strSrc[len] != cTok && strSrc[len] != '\0')
{
if (len < lMax - 1)
strDest[len] = strSrc[len];
++len;
}
strDest[len < lMax - 1 ? len : lMax - 1] = 0;
return len; // the length of the token in the source string
}
I snagged a simple strcpy from http://vijayinterviewquestions.blogspot.com.au/2007/07/implement-strcpy-function.html
const char* msg = "fred,jim,345,7665";
char buffer[20];
long msgLength = strlen(msg);
long offset = 0;
while (offset <= msgLength)
{
long length = strtok2(buffer, msg, ',', offset, 20);
cout << buffer << '\n';
offset += length + 1; // skip the token and its delimiter
}
Well, without a little more detail it's hard to know exactly what you want. I'll guess you are parsing delimited items where consecutive delimiters should be treated as zero length tokens (which is usually correct for comma separated elements). I'm also assuming a blank line counts as a single zero length token. This is how I'd approach it:
const char *token_begin = msg;
int length;
for(;;)
{
length = 0;
while(!isDelimiter(token_begin[length])) //< must include \0 as delimiter
++length;
//..do something here with token. token is at: token_begin[0..length)
if ( token_begin[length] != 0 )
token_begin = &token_begin[length+1]; //skip beyond non-null delimiter
else
break; //token null terminated. exit
}
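The pointer-plus-length idea above maps directly onto std::string_view, assuming C++17 is available. A sketch with a hypothetical split_views helper: each view points into the original string, so no characters are copied, and consecutive delimiters yield empty tokens as in the loop above.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (C++17): splits msg on delim without copying any
// characters; every view refers to a range inside msg.
std::vector<std::string_view> split_views(std::string_view msg, char delim)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    for (;;) {
        std::size_t end = msg.find(delim, start);
        if (end == std::string_view::npos) {
            tokens.push_back(msg.substr(start));          // last token
            return tokens;
        }
        tokens.push_back(msg.substr(start, end - start)); // may be empty
        start = end + 1;
    }
}
```

The views stay valid only as long as the original string does; a view is not null-terminated, so pass it with its length (for example via std::string(view)) where a C string is required.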
If you are going to store the tokens somewhere, then a copy will be necessary in any case, and strtok does this nicely by placing null terminating characters inside the string itself.
The only other option I see to avoid copying is a lexer, which reads the string and, through a state machine, produces tokens by scanning the string and storing the partial results in a buffer. But every token would still have to be stored in at least a null terminated string, so you are not really saving anything.
Here is my proposal. The code is structured and uses a global variable, position (I know global variables are bad practice, but this is only to give you the idea); you can replace it with a data member if you need OOP.
#include <cstdlib>
#include <cstring>
#include <iostream>
const int MAX = 1000; // value greater than the maximum length of the tokens
int position, messageLength;
char token[MAX];
bool hasNext()
{
return position < messageLength;
}
char* next(const char* message)
{
int i = 0;
while (position < messageLength && message[position] != ',') {
token[i++] = message[position];
position++;
}
position++; // ',' found
token[i] = '\0';
return token;
}
int main(int argc, char **argv)
{
const char* msg = "fred,jim,345,7665";
position = 0;
messageLength = std::strlen(msg);
while (hasNext())
std::cout << next(msg) << std::endl;
return EXIT_SUCCESS;
}
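The answer mentions replacing the globals with data members. A sketch of that refactoring with a hypothetical Tokenizer class; it still copies each token (into a std::string rather than a fixed MAX buffer), and it keeps only a pointer to the message, which must therefore outlive the object.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical class: the global position/messageLength become data members.
class Tokenizer {
public:
    Tokenizer(const char *message, char delim)
        : message_(message), length_(std::strlen(message)), delim_(delim) {}

    bool hasNext() const { return position_ < length_; }

    // Returns the next token; like next() above, the token itself is copied.
    std::string next()
    {
        std::size_t start = position_;
        while (position_ < length_ && message_[position_] != delim_)
            ++position_;
        std::string token(message_ + start, position_ - start);
        ++position_;                       // skip the delimiter
        return token;
    }

private:
    const char *message_;
    std::size_t length_;
    char delim_;
    std::size_t position_ = 0;
};
```

Each Tokenizer carries its own position, so two messages can be tokenized at the same time, which the global version cannot do.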