Parsing string to get comma-separated integer character pairs


I'm working on a project where I'm given a file that begins with a header in this format: a1,b3,t11, 2,,5,\3,*4,344,00,. It is always going to be a sequence of pairs, each a single ASCII character followed by an integer, separated by commas, with the sequence always ending with 00,.
Basically, what I have to do is go through this, put each character/integer pair into a data type I have that takes both of these as parameters, and make a vector of these. For example, the header above would give a vector with ('a',1), ('b',3), ('t',11), (' ',2), (',',5), ('\',3), ('*',4), ('3',44) as elements.
I'm just having trouble parsing it. So far I've:
Extracted the header from my text file, from the first character up to just before the ',00,' where the header ends. I can get the header as a string or as a vector of characters (whichever is easier to parse).
Tried using sscanf to parse the next character and the next int, then adding those into my vector before using substrings to remove the part of the string I've already analyzed (this was messy and did not get me the right result).
Tried going through the string as a vector and checking each element to see if it is an integer, a character, or a comma and acting accordingly, but this doesn't work for multiple-digit integers or when the character itself is a digit.
I know I can fairly easily split my string on the commas, but I'm not sure how to do that while still separating the integers from the characters, keeping both, and handling digits that I need to treat as characters.
Any advice or useful standard library or string functions would be greatly appreciated.

One possibility, of many, would be to store the data in a structure. This uses an array of structures, but the structures could instead be allocated as needed with malloc and realloc.
Parsing the string can be accomplished using pointers and strtol which will parse the integer and give a pointer to the character following the integer. That pointer can be advanced to use in the next iteration to get the ASCII character and integer.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define SIZE 100
struct pair {
    char ascii;
    int integer;
};
int main( void) {
    char input[] = "a1,b3,!0,t11, 2,,5,\\3,*4,344,00,";
    char *pt = input;//start with pt pointing to first character of input
    char *end = input;
    int each = 0;
    int loop = 0;
    size_t length = 0;
    struct pair pairs[SIZE] = { { '\0', 0}};
    //the header always ends in ",00,": remove those four characters
    length = strlen ( input);
    if ( length >= 4 && strcmp ( &input[length - 4], ",00,") == 0) {
        input[length - 4] = '\0';
    }
    //stop when the array is full or the string is used up
    while ( each < SIZE && *pt != '\0') {
        //get the ASCII character and advance one character
        pairs[each].ascii = *pt;
        pt++;
        //get the integer
        pairs[each].integer = (int)strtol ( pt, &end, 10);
        //end==pt indicates the expected integer is missing
        if ( end == pt) {
            printf ( "expected an integer\n");
            break;
        }
        //this pair is complete, so one more element was used
        each++;
        //at the end of the string?
        if ( *end == '\0') {
            break;
        }
        //the character following the integer should be a comma
        if ( *end != ',') {
            printf ( "format problem\n");
            break;
        }
        //for the next iteration, advance pt by one character past end
        pt = end + 1;
    }
    //loop through and print the used structures
    for ( loop = 0; loop < each; loop++) {
        printf ( "ascii[%d] = %c ", loop, pairs[loop].ascii);
        printf ( "integer[%d] = %d\n", loop, pairs[loop].integer);
    }
    return 0;
}
Another option is to use dynamic allocation.
This also uses sscanf to parse the input. The %n conversion captures the number of characters consumed so far by the scan, and the offset and add variables use that count to step through the input. The last scan captures only the ASCII character and the integer, so sscanf returns 2 and the loop stops there.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct pair {
    char ascii;
    int integer;
};
int main( void) {
    char input[] = "a1,b3,!0,t11, 2,,5,\\3,*4,344,00,";
    char comma = '\0';
    char ascii = '\0';
    int integer = 0;
    int result = 0;
    int loop = 0;
    size_t length = 0;
    int used = 0;
    int add = 0;
    int offset = 0;
    struct pair *pairs = NULL;//so realloc will work on first call
    struct pair *temp = NULL;
    //the header always ends in ",00,": remove those four characters
    length = strlen ( input);
    if ( length >= 4 && strcmp ( &input[length - 4], ",00,") == 0) {
        input[length - 4] = '\0';
    }
    while ( ( result = sscanf ( &input[offset], "%c%d%c%n"
        , &ascii, &integer, &comma, &add)) >= 2) {//the last scan will only get two items
        if ( ( temp = realloc ( pairs, ( used + 1) * sizeof ( *pairs))) == NULL) {
            fprintf ( stderr, "problem allocating\n");
            break;
        }
        pairs = temp;
        pairs[used].ascii = ascii;
        pairs[used].integer = integer;
        //one more element was used
        used++;
        //two items means the input ended right after the integer;
        //%n was never reached, so add is stale: stop here
        if ( result == 2) {
            break;
        }
        //the character following the integer should be a comma
        if ( comma != ',') {
            printf ( "format problem\n");
            break;
        }
        //for the next iteration, add to offset
        offset += add;
    }
    for ( loop = 0; loop < used; loop++) {
        printf ( "ascii[%d] = %c ", loop, pairs[loop].ascii);
        printf ( "value[%d] = %d\n", loop, pairs[loop].integer);
    }
    free ( pairs);
    return 0;
}

Since you have already figured out that you can ignore the trailing 00,, using sscanf will be sufficient.
You can use sscanf to read one character (or a function such as getc), use sscanf to read an integer, and finally skip one character (the comma).
Comment if you are having problems understanding how to do so.

Related

Checking if the first character of all the strings are same or not in a array of strings

I have an array of strings, I want to check whether the first characters of all the strings are the same or not.
I know how to retrieve the first character of a string, by this method
char first_letter;
first_letter = (*str)[0];
Initially, I thought of going the brute-force way, checking the first letter of every string using a nested for loop.
int flag = 0;
char f1,f2;
for(int i = 0;i < size_arr - 1;i++){
f1 = (*str[i])[0];
for(int j = i + 1;j < size_arr;j++){
f2 = (*str[j])[0];
if(f1 != f2)
flag += 1;
}
}
if(!(flag))
cout<<"All first characters same";
else
cout<<"Different";
But I need an approach to find whether the first letters of all the strings present in an array are the same or not. Is there any efficient way?

You needn't use a nested for loop. Rather, compare each string's first character with the next string's, modifying your code this way:
for(int i = 0;i < size_arr - 1;i++){
    f1 = (*str[i])[0];
    f2 = (*str[i+1])[0];
    if( f1!=f2 ){
        printf("not same characters at first position");
        flag=1; //set the flag before leaving the loop
        break;
    }
}
if(flag==0)printf("same characters at first position");

I made this C approach for you (it's because you have used character arrays here, not std::string of C++ – so it's convenient to describe using C code):
#include <stdio.h>
#define MAX_LENGTH 128
int main(void) {
    char string[][MAX_LENGTH] = {"This is string ONE.", "This one is TWO.",
                                 "This is the third one."};
    char first_letter = string[0][0];
    int total_strs = sizeof(string) / sizeof(string[0]);
    int FLAG = 1;
    // Compare the first letter of each string with first_letter
    for (int i = 0; i < total_strs; i++)
        if (string[i][0] != first_letter) {
            FLAG = 0; // a string with a different initial letter was found,
            break;    // so there is no need to check the rest
        }
    if (FLAG)
        fprintf(stdout, "The strings have the same initial letters.\n");
    else
        fprintf(stdout, "Not all strings have the same initial letters.\n");
    return 0;
}
If you want to convert it to C++ code, no big issue: just replace stdio.h with iostream, int FLAG = 1 with bool FLAG = true, and the fprintf() calls with std::cout statements, that's it.
In case you need to do the same job with std::string, the steps are the same: take the array of strings, set the flag to true by default, iterate through the strings comparing each one's initial letter with the first string's initial letter, and set the flag to false as soon as a mismatching string is found.
The program will display (if same initial vs. if not):
The strings have the same initial letters.
Not all strings have the same initial letters.

How to parse out integers from a line with characters and integers

For a C/C++ assignment, I need to take an input line, starting with the character 's', followed by UP TO 3 separate integers. My issue is that, without vectors, I don't know how to account for an unknown number of integers (1-20).
For example, a test input would look like:
s 1 12 20
It was suggested to me to use cin.getline and take the whole line as a string, but how would I know where each integer would lie in a character array because of the possibility of single or double digits, let alone the number of integers in said string?

Construct a std::istringstream from the contents of the line, then keep using operator>> into an int, until it fail()s, stuffing each integer into a std::vector (after using the operator>> initially, once, to take care of the leading character).

You can mimic vectors using dynamic memory allocation. Initially create an array of size 2, using int *a = new int[2];
When this array fills up, make a new array of double the size, copy the old array in the new one and reassign a to the new array. Keep doing this until you have met the requirement.
EDIT
So getting the numbers through the string stream, if the array fills up, you could do:
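// a is passed by reference so the caller sees the new, larger array
int changeArr(int *&a, int size){
    int *b = new int[size*2];
    for(int i=0;i<size;i++){
        b[i] = a[i];
    }
    delete[] a;   // free the old array
    a = b;
    return size*2;
}
// Reads every integer left in ss into a growing array, hands the array
// back through out (caller must delete[] it) and returns the count.
int getNos(istringstream &ss, int *&out){
    int *a = new int[2];
    int cap = 2, i = 0, number;
    while(ss >> number){   // stop as soon as a read fails
        if(i>=cap){
            cap = changeArr(a, cap);
        }
        a[i++] = number;
    }
    out = a;
    return i;
}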
I have skipped the part about the first character, but I guess you can handle that.

Without vectors, you have a couple of approaches. (1) read an entire line at a time and tokenize the line with strtok or strsep, or (2) use the standard features built into strtol to walk down the string separating values with the pointer and end-pointer parameters to the function.
Since you know the format, you can easily use either. Both 1 & 2 above do the same thing; with 2 you are just using the tools in strtol to both tokenize and convert to a number in a single step. Here is a short example for handling a string followed by an unknown number of integers on each line:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
enum { BASE = 10, MAXC = 512 };
long xstrtol (char *p, char **ep, int base);
int main (void) {
char buf[MAXC] = "";
while (fgets (buf, MAXC, stdin)) { /* for each line of input */
char *p, *ep; /* declare pointers */
p = buf; /* reset values */
errno = 0;
printf ("\n%s\n", p); /* print the original full buffer */
/* locate 1st digit in string */
for (; *p && (*p < '0' || '9' < *p); p++) {}
if (!*p) { /* validate digit found */
fprintf (stderr, "warning: no digits in '%s'\n", buf);
continue;
}
/* separate integer values */
int idx = 0; /* index of the int within this line */
for (;;) {
long val;
/* parse/convert each number in line into long value */
val = xstrtol (p, &ep, BASE);
if (val < INT_MIN || INT_MAX < val) /* validate int value, skip if too big */
fprintf (stderr, "warning: value exceeds range of integer.\n");
else
printf (" int[%2d]: %d\n", idx++, (int) val); /* output int */
/* skip delimiters/move pointer to next digit */
while (*ep && *ep != '-' && (*ep < '0' || *ep > '9')) ep++;
if (*ep)
p = ep;
else
break;
}
}
return 0;
}
/** a simple strtol implementation with error checking.
* any failed conversion will cause program exit. Adjust
* response to failed conversion as required.
*/
long xstrtol (char *p, char **ep, int base)
{
errno = 0;
long val = strtol (p, ep, base);
/* Check for various possible errors */
if ((errno == ERANGE && (val == LONG_MIN || val == LONG_MAX)) ||
(errno != 0 && val == 0)) {
perror ("strtol");
exit (EXIT_FAILURE);
}
if (*ep == p) {
fprintf (stderr, "No digits were found\n");
exit (EXIT_FAILURE);
}
return val;
}
(the xstrtol function just moves the normal error checking to a function to unclutter the main body of the code)
Example Input
$ cat dat/varyint.txt
some string 1, 2, 3
another 4 5
one more string 6 7 8 9
finally 10
Example Use/Output
$ ./bin/strtolex <dat/varyint.txt
some string 1, 2, 3
int[ 0]: 1
int[ 1]: 2
int[ 2]: 3
another 4 5
int[ 0]: 4
int[ 1]: 5
one more string 6 7 8 9
int[ 0]: 6
int[ 1]: 7
int[ 2]: 8
int[ 3]: 9
finally 10
int[ 0]: 10
The code could use a bit of tidying up, but this method can be used to parse an unknown number of values reliably. Look it over and let me know if you have any questions.

Since vectors aren't allowed, you'll need to find out how many numbers are in the line before you can make an array to hold them.
I won't just give you the entire code, since this is homework, but I'll show you what I would do to solve your problem.
If your lines will always look like this: "s number" or "s number number" or "s number number number", then you can easily find the number of numbers in the line by counting the spaces!
There will be one space in any string with one number (between the s and that number), and one more space for each number that follows the first.
So let's count the spaces!
int countSpaces(string s) {
int count = 0;
for (int i = 0; i < s.size(); i++) {
if (s[i] == ' ') {
count++;
}
}
return count;
}
Passing these strings:
string test1 = "s 123 4 99999";
string test2 = "s 1";
string test3 = "s 555 1337";
to the countSpaces function will give us:
3
1
2
And with that information, we can make an array with the correct size to hold each value!
EDIT
Now I realize that you're having trouble grabbing the numbers from the string.
What I would do is use the above method to find the number of numbers in the line. Then, I would use the std::string::find() function to determine where, and if, any spaces are in the string.
So let's say we had the line: s 123 45 678
countSpaces would tell us we have 3 numbers.
Then we make an array to hold our three numbers. I would also cut off the s part so you don't have to worry about it anymore. Note that you can use std::stoi to turn a string into a number!
Now we can loop while find(' ') doesn't return std::string::npos (its "not found" value).
In our loop, I would take the substring from 0 up to the first space, like so:
num = std::stoi( myLine.substr(0, myLine.find(' ')) );
Then you can cut off the part you just used, including the space:
myLine = myLine.substr( myLine.find(' ') + 1 );
This will grab a number off the front of your string, then chop that number off the string, and repeat the process while there is still a space in the string. After the loop, only the last number (which has no space after it) is left in myLine, so convert it with one more std::stoi call.
EDIT:
If you aren't guaranteed to have one space between each number, then you can delete excess spaces before doing this method or you can do it during the countSpaces loop. At that point, it would make more sense to call the function countNums or such.
An example function to remove stretches of spaces and replace them with one space:
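// Collapses each run of spaces in s into a single space.
// s is taken by reference so the caller's string is modified.
void removeExtraSpaces(string &s) {
    bool inSpaces = false;
    for (size_t i = 0; i < s.size(); ) {
        if (s[i] == ' ') {
            if (inSpaces) {
                s.erase(i, 1);   // drop this one extra space; don't advance i
                continue;
            }
            inSpaces = true;
        } else {
            inSpaces = false;
        }
        i++;
    }
}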

Input C-style string and get the length

The string input format is like this
str1 str2
I DON'T know the number of characters to be input beforehand, so I need to store the 2 strings and get their lengths.
Using C-style strings, I tried to make use of the scanf library function but was unsuccessful in getting the length. This is what I have:
// M W are arrays of char with size 25000
while (T--)
{
memset(M,'0',25000);memset(W,'0',25000);
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
i = 0;
while (W[i] != '0')
{
++w; ++i;
}
cout << m << w;
}
Not efficient mainly because of the memset calls.
Note:
I'd be better off using std::string, but because of the 25000-character input and the memory constraints of cin I switched to this. If there is an efficient way to get a string, that would be good.

Aside from the answers already given, I think your code is slightly wrong:
memset(M,'0',25000);memset(W,'0',25000);
Do you really mean to fill the string with the character zero (value 48 or 0x30 [assuming ASCII before some pedant downvotes my answer and points out that there are other encodings]), or with a NUL (a character with the value zero)? The latter is 0, not '0'.
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
If you are looking for the end of the string, you should be using 0, not '0' (as per above).
Of course, scanf will put a 0 at the end of the string for you, so there's no need to fill the whole string with 0 [or '0'].
And strlen is an existing function that gives the length of a C-style string, and will most likely use a cleverer algorithm than checking each character and incrementing two variables, making it faster [for long strings at least].

You do not need memset when using scanf; scanf adds the terminating '\0' to the string.
Also, strlen is more simple way to determine string's length:
scanf("%s %s", M, W); // provided that M and W contain enough space to store the string
m = strlen(M); // don't forget #include <string.h>
w = strlen(W);

A C-style strlen without memset may look like this (named my_strlen here so it can't clash with the standard library's strlen):
#include <iostream>
using namespace std;
unsigned my_strlen(const char *str) {
    const char *p = str;
    unsigned len = 0;
    while (*p != '\0') {
        len++;
        p++;   // advance to the next character
    }
    return len;
}
int main() {
    cout << my_strlen("C-style string");
    return 0;
}
It returns 14.

intToStr recursively

This is a task from school: I am supposed to write a recursive function that converts a given int to a string. I know I'm close, but I can't pinpoint what's missing in my code; hints are welcome.
void intToStr(unsigned int num, char s[])
{
if (num < 10)
{
s[0] = '0' + num;
}
else
{
intToStr(num/10, s);
s[strlen(s)] = '0' + num%10;
}
}
Edit: my problem is that the function only works for pre-initialized arrays; if I let the function work on an uninitialized array, it does not work.

Unless your array is zero-initialized, you are forgetting to append a null terminator when you modify it.
Just add it right after the last character:
void intToStr(unsigned int num, char s[])
{
if (num < 10)
{
s[0] = '0' + num;
s[1] = 0;
}
else
{
intToStr(num/10, s);
s[strlen(s)+1] = 0; //you have to do this operation here, before you overwrite the null terminator
s[strlen(s)] = '0' + num%10;
}
}
Also, your function assumes that s has enough space to hold all the digits, so make sure it does (UINT_MAX is 10 digits long with a 32-bit unsigned int, so you need at least 11 characters including the null terminator).

Andrei Tita already showed you the problem you had with the null terminators. I will show you an alternative, so you can compare and contrast different approaches:
int intToStr(unsigned int num, char *s)
{
// We use this index to keep track of where, in the buffer, we
// need to output the current character. By default, we write
// at the first character.
int idx = 0;
// If the number we're printing has more than one digit (num > 9),
// we recurse and use the returned index when we continue.
if(num > 9)
idx = intToStr(num / 10, s);
// Write our digit at the right position, and increment the
// position by one.
s[idx++] = '0' + (num %10);
// Write a terminating NULL character at the current position
// to ensure the string is always NULL-terminated.
s[idx] = 0;
// And return the current position in the string to whomever
// called us.
return idx;
}
You will notice that my alternative also returns the final length of the string that it output into the buffer.
Good luck with your coursework going forward!

How to find string in a string

I need to find the longest run of one string repeated inside another: if string1 is "Alibaba" and string2 is "ba", the longest such string is "baba". I have the lengths of the strings, but what next?
char* fun(char* a, char* b)
{
int length1=0;
int length2=0;
int longer;
int shorter;
char end='\0';
int i=0;
while(a[i] != end)
{
i++;
length1++;
}
i=0;
while(b[i] != end)
{
i++;
length2++;
}
if(length1 > length2){
longer = length1;
shorter = length2;
}
else{
longer = length2;
shorter = length1;
}
//logics here
}
int main()
{
char name1[] = "Alibaba";
char name2[] = "ba";
cout << fun(name1, name2) << endl;
system("PAUSE");
return 0;
}

Wow, lots of bad answers to this question. Here's what your code should do:
Find the first instance of "ba" using the standard string searching functions.
In a loop look past this "ba" to see how many of the next N characters are also "ba".
If this sequence is longer than the previously recorded longest sequence, save its length and position.
Find the next instance of "ba" after the last one.
Here's the code (not tested):
string FindLongestRepeatedSubstring(string longString, string shortString)
{
    // The number of repetitions in our longest string.
    size_t maxRepetitions = 0;
    size_t n = shortString.length(); // For brevity.
    if (n == 0)
        return ""; // an empty pattern would make find() loop forever
    // Where we are currently looking.
    size_t pos = 0;
    while ((pos = longString.find(shortString, pos)) != string::npos)
    {
        // Ok we found the start of a repeated substring. See how many repetitions there are.
        size_t repetitions = 1;
        // This is a little bit complicated.
        // First go past the "ba" we have already found (pos += n)
        // Then see if there is still enough space in the string for there to be another "ba"
        // Finally see if it *is* "ba"
        for (pos += n; pos + n <= longString.length() && longString.compare(pos, n, shortString) == 0; pos += n)
            ++repetitions;
        // See if this sequence is longer than our previous best.
        if (repetitions > maxRepetitions)
            maxRepetitions = repetitions;
    }
    // Construct the string to return. You really probably want to return its position, or maybe
    // just maxRepetitions.
    string ret;
    while (maxRepetitions--)
        ret += shortString;
    return ret;
}

What you want should look like this pseudo-code:
i = j = count = max = 0
while (i < length1 && c = name1[i++]) do
if (j < length2 && name2[j] == c) then
j++
else
max = (count > max) ? count : max
count = 0
j = 0
end
if (j == length2) then
count++
j = 0
end
done
max = (count > max) ? count : max
for i = 0 to max-1 do
print name2
done
The idea is here but I feel that there could be some cases in which this algorithm won't work (cases with complicated overlap that would require going back in name1). You may want to have a look at the Boyer-Moore algorithm and mix the two to have what you want.

The Algorithms Implementation Wikibook has an implementation of what you want in C++.

http://www.cplusplus.com/reference/string/string/find/
Maybe you did it on purpose, but you should use the std::string class and forget archaic things like the char* string representation.
It will let you use lots of optimized methods, such as string searching.

Why don't you use the strstr function provided by C?
const char * strstr ( const char * str1, const char * str2 );
char * strstr ( char * str1, const char * str2 );
Locate substring
Returns a pointer to the first occurrence of str2 in str1,
or a null pointer if str2 is not part of str1.
The matching process does not include the terminating null-characters.
Now use the lengths: create a loop, work through the original string with strstr, and find the longest run inside it.
