How can I split adjacent numbers and letters in C++?

I've got a large text document that includes adjacent numbers and letters, like this:
JACK1940383DAVID30284HAROLD68372TROY4392 etc.
How can I split it in C++ into a list like the one below?
List: Jack / 1940383, David / 30284, ...

You can use std::string::find_first_of() and std::string::find_first_not_of() in a loop, using std::string::substr() to extract each piece, e.g.:
#include <iostream>
#include <string>
int main()
{
    std::string s = "JACK1940383DAVID30284HAROLD68372TROY4392";
    std::string::size_type start = 0, end;
    // the next digit marks the end of a name
    while ((end = s.find_first_of("0123456789", start)) != std::string::npos) {
        std::string name = s.substr(start, end - start);
        start = end;
        int number;
        // the next non-digit (if any) marks the end of the number
        if ((end = s.find_first_not_of("0123456789", start)) != std::string::npos) {
            number = std::stoi(s.substr(start, end - start));
        }
        else {
            number = std::stoi(s.substr(start));
        }
        start = end; // npos after the last number, which ends the loop
        // use name and number as needed...
        std::cout << name << " / " << number << '\n';
    }
}

You can use a regex like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
// create a struct to group your data
// this makes it easy to store it in a vector.
struct person_t
{
std::string name;
std::string number;
};
// overloaded output operator for printing one person's details
std::ostream& operator<<(std::ostream& os, const person_t& person)
{
// write to the stream that was passed in, not always to std::cout
os << person.name << ": " << person.number << '\n';
return os;
}
// get a vector of person_t based on the input
auto get_persons(const std::string& input)
{
// make a regex; in this case one that matches one or more capital letters
// and groups them using ()
// followed by one or more digits, which are grouped too.
static const std::regex rx{ "([A-Z]+)([0-9]+)" };
std::smatch match;
// a vector to hold all the persons
std::vector<person_t> persons;
// start at begin of string and look for first part of the string
// that matches the regex.
auto cbegin = input.cbegin();
while (std::regex_search(cbegin, input.cend(), match, rx))
{
// match[0] will contain the whole match,
// match[1]-match[n] will contain the groups from the regular expressions
// match[1] will contain the match with characters and thus the name
// match[2] will contain the match with the numbers and thus the number.
// create a person_t struct with this info
person_t person{ match[1], match[2] };
// and add it to the vector
persons.push_back(person);
cbegin = match.suffix().first;
}
return persons;
}
int main()
{
// parse and split the string
auto persons = get_persons("JACK1940383DAVID30284HAROLD68372TROY4392");
// show the output
for (const auto& person : persons)
{
std::cout << person;
}
}

As pointed out in other good answers, you can use find_first_of(), find_first_not_of() and substr() from std::string in a loop, or a regex.
But that may be more than you need, so I will add 3 more examples that you may find simpler.
The first 2 programs expect the file name on the command line (for my convenience here); the test file is in.txt, with the same contents as posted:
JACK1940383DAVID30284HAROLD68372TROY4392
The last example just parses the string data declared as a char[].
1. Using fscanf()
Since the goal is to consume formatted data, fscanf() is an option. As the data structure is very simple, the program is just a one-line loop:
char mask[] = "%49[^0-9]%49[0-9]";
while ( 2 == fscanf(F, mask, tk_key, tk_value))
std::cout << tk_key << "/" << tk_value << "\n";
Program output (the same for all examples):
JACK/1940383
DAVID/30284
HAROLD/68372
TROY/4392
Code for example 1:
#include <stdio.h>
#include <iostream>
int main(int argc,char** argv)
{
    if (argc < 2)
    {
        std::cerr << "Use: pgm FileName\n";
        return -1;
    }
    FILE* F = fopen(argv[1], "r");
    if (F == NULL)
    {
        perror("Could not open file");
        return -1;
    }
    std::cerr << "File: \"" << argv[1] << "\"\n";
    char tk_key[50], tk_value[50];
    char mask[] = "%49[^0-9]%49[0-9]"; // at most 49 chars, leaving room for the '\0'
    while ( 2 == fscanf(F, mask, tk_key, tk_value))
        std::cout << tk_key << "/" << tk_value << "\n";
    fclose(F);
    return 0;
}
2. Using a state machine
There are just 2 states, so it is not a fancy FSA ;) State machines are good for representing this kind of parsing, although here it may be overkill.
#define S_LETTER 0
#define S_DIGIT 1
#include <algorithm>
#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
using iich = std::istream_iterator<char>;
int main(int argc,char** argv)
{
    if (argc < 2) { std::cerr << "Use: pgm FileName\n"; return -1; }
    std::ifstream in_file{argv[1]};
    if ( not in_file.good()) return -1;
    iich p {in_file}, eofile{}; // istream_iterator<char> skips whitespace, e.g. the final newline
    std::string token{}; // string to build values
    char st = S_LETTER; // state value for FSA
    std::for_each(p, eofile,
        [&token,&st](char ch)
        {
            switch (st)
            {
            case S_LETTER:
                if ((ch >= '0') && (ch <= '9'))
                {
                    std::cout << token << "/";
                    token = ch;
                    st = S_DIGIT; // now in number
                }
                else token += ch; // concat in string
                break;
            case S_DIGIT:
            default:
                if ((ch < '0') || (ch > '9'))
                { // is a letter
                    std::cout << token << "\n";
                    token = ch;
                    st = S_LETTER; // now in name
                }
                else token += ch; // concat in string
                break;
            } // switch()
        });
    std::cout << token << "\n"; // print last token
}
Here there is no explicit loop: std::for_each takes the characters from the istream_iterator and passes each one to a lambda that builds the name and the value as strings and prints them.
The output is the same.
3. A simple FSA to consume the data
#define S_LETTER 0
#define S_DIGIT 1
#include <iostream>
int main(void)
{
char one[] = "JACK1940383DAVID30284HAROLD68372TROY4392";
char* p = one; // the array decays to a pointer to its first char, no cast needed
char* token = p;
char st = S_LETTER;
char temp = 0;
while (*p != 0)
{
switch (st)
{
case S_LETTER:
if ((*p >= '0') && (*p <= '9'))
{
temp = *p;
*p = 0;
std::cout << token << "/";
*p = temp;
token = p;
st = S_DIGIT; // now in number
}
break;
case S_DIGIT:
default:
if ( (*p < '0') || (*p > '9'))
{ // letter
temp = *p;
*p = 0;
std::cout << token << "\n";
*p = temp;
token = p;
st = S_LETTER; // now in name
}
break;
}; // switch()
p += 1; // next symbol
}; // while()
std::cout << token << "\n"; // print last token
}
This code just uses a C-style loop to parse the input data.

Related

Print name of the function that the variables belong to in C++

I am having so much trouble trying to figure this one out. I have to read a .c file that has three functions (add, sub and main), and I want to print to the console the names of their variables with the name of the function in brackets. I tried adding a string function_name to my struct to store the name of the function, but I don't know how to print it next to my variables until I hit another function. Any help or advice will be much appreciated.
For example:
From this .c text
int add ( int a , int b )
{
return a + b ;
}
I want to get this:
add, line 1, function, int, referenced 2
a (add), line 1, variable, int, referenced 1
b (add), line 1, variable, int, referenced 1
But I get this:
add(add), line 1, function, int, referenced 16
a, line 1, variable, int, referenced 15
b, line 1, variable, int, referenced 15
My code so far looks like this.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
using namespace std;
struct identifier
{
string id_name;
string function_name;
int id_count;
string id_function;
string id_type;
int id_ref;
};
int main(int argc, char** argv)
{
if (argc < 2)
{
cout << "ERROR: There is no file selected." << endl;
}
ifstream file(argv[1]);
string line;
string token;
vector<identifier> id_list;
int line_counter = 0;
int num_functions = 0;
int num_variables = 0;
int num_if = 0;
int num_for = 0;
int num_while = 0;
while (getline(file, line))
{
stringstream stream(line);
line_counter++;
while (stream >> token)
{
bool found = false;
for (auto& v : id_list)
{
if (v.id_name == token)
{
//We have seen the word so add one to its count
v.id_ref++;
found = true;
break;
}
}
if (token == "int" || token == "int*")
{
string star = token;
identifier intI;
stream >> token;
string name = token;
intI.id_name = name;
intI.id_count = line_counter;
intI.id_type = "int";
stream >> token; //Get the next token
if (token == "(")
{
//We have a function
intI.id_function = "function";
if (intI.id_name != "main")
{
intI.function_name = "(" + name + ")";
}
num_functions++;
}
else
{
//We have a variable
intI.id_function = "variable";
if (star == "int*")
{
intI.id_type = "int*";
}
num_variables++;
}
id_list.push_back(intI);
}
}
}
file.close();
//Print the words and their counts
for (auto& v : id_list)
{
cout << v.id_name << v.function_name << ", line " << v.id_count << ", " << v.id_function << ", " << v.id_type << ", referenced " << v.id_ref << endl;
}
return 0;
}

I can see you're incrementing id_ref now, but it's still not initialized, so you have undefined behaviour. The easiest fix is to add = 0 where it's declared in the struct.
As for the function name: assuming there are no nested functions, you can just use a variable to keep track of the current function.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
struct identifier {
std::string id_name;
std::string function_name;
int id_count;
std::string id_function;
std::string id_type;
int id_ref = 0; // if not initialized, then you will get seemingly random numbers
};
int main( int argc, char **argv ) {
if ( argc < 2 ) {
std::cout << "ERROR: There is no file selected." << std::endl;
return 1; // quit early
}
std::ifstream file( argv[1] );
std::string line;
std::string token;
std::vector<identifier> id_list;
int line_counter = 0;
int num_functions = 0;
int num_variables = 0;
int num_if = 0;
int num_for = 0;
int num_while = 0;
std::string current_function; // keep track of the function
while ( std::getline( file, line ) ) {
std::stringstream stream( line );
line_counter++;
while ( stream >> token ) {
bool found = false;
for ( auto &v : id_list ) {
if ( v.id_name == token ) {
//We have seen the word so add one to its count
v.id_ref++;
found = true;
break;
}
}
if ( token == "int" || token == "int*" ) {
std::string star = token;
identifier intI;
stream >> token;
std::string name = token;
intI.id_name = name;
intI.id_count = line_counter;
intI.id_type = "int";
stream >> token; //Get the next token
if ( token == "(" ) {
//We have a function
intI.id_function = "function";
if ( intI.id_name != "main" ) {
current_function = name; // update the current function name
}
num_functions++;
} else {
intI.function_name = "(" + current_function + ")"; // add the function name to the variable name
//We have a variable
intI.id_function = "variable";
if ( star == "int*" ) {
intI.id_type = "int*";
}
num_variables++;
}
id_list.push_back( intI );
}
}
}
//file.close();
//Print the words and their counts
for ( const auto &v : id_list ) {
std::cout << v.id_name << v.function_name << ", line " << v.id_count << ", " << v.id_function << ", " << v.id_type << ", referenced " << v.id_ref << std::endl;
}
return 0;
}
Also, some recommended reading on why using namespace std is considered bad practice.
Working example, modified to read from a string instead of a command-line parameter: https://godbolt.org/z/jKqqrhce6

How do I remove repeated words from a string and only show it once with their wordcount

Basically, I have to show each word with its count, but repeated words show up again in my program.
How do I remove them using loops, or should I use 2D arrays to store both the word and its count?
#include <iostream>
#include <stdio.h>
#include <iomanip>
#include <cstring>
#include <conio.h>
#include <time.h>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
_getch();
}
char* getstring()
{
static char pa[100];
cout << "Enter a paragraph: " << endl;
cin.getline(pa, sizeof pa, '#'); // never read more than the 100-char buffer can hold
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token != NULL) {
int wordcount = 0;
if (token != NULL) {
int sizex = strlen(token);
//char** fin;
int j;
for (int i = 0; i <= size; i++) {
for (j = 0; j < sizex; j++) {
if (pa[i + j] != token[j]) {
break;
}
}
if (j == sizex) {
wordcount++;
}
}
//for (int w = 0; w < size; w++)
//fin[w] = token;
//cout << fin[w];
cout << token;
cout << " " << wordcount << "\n";
}
token = strtok_s(NULL, sepa, &nexttoken);
}
}
This is the output I get:
I want to show, for example, the word "i" once with its count of 5, and then not show it again.

First of all, since you are using C++, I would recommend splitting the text the C++ way (some examples are here) and storing every word in a map or unordered_map. You can find an example of my implementation here.
But if you don't want to rewrite your code, you can simply add a variable that indicates whether a copy of the word was already found before the current word's position. If no copy was found in front of it, print the word and its count.

This answer shows how to save each word returned by strtok into a vector of strings. Then string.compare is used to compare each word with word[0]. The indexes of the words that match word[0] are recorded in an int array, used, and the number of matches equals the number of entries recorded in that array (nused). The marked words are then removed from the vector, and the remaining words go through the next round of comparisons. The program ends when no words remain.
You can write your own word-comparison function to replace str.compare(str2) if you prefer not to use std::vector and std::string.
#include <iostream>
#include <string>
#include <vector>
#include<iomanip>
#include<cstring>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
}
char* getstring()
{
static char pa[100] = "this is a test and is a test and is test.";
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
std::vector<std::string> word;
int used[64];
std::string tok;
int nword = 0, nsize, nused;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token)
{
word.push_back(token);
++nword;
token = strtok_s(NULL, sepa, &nexttoken);
}
for (int i = 0; i<nword; i++) std::cout << word[i] << std::endl;
std::cout << "total " << nword << " words.\n" << std::endl;
nsize = nword;
while (nsize > 0)
{
nused = 0;
tok = word[0] ;
used[nused++] = 0;
for (int i=1; i<nsize; i++)
{
if ( tok.compare(word[i]) == 0 )
{
used[nused++] = i; }
}
std::cout << tok << " : " << nused << std::endl;
for (int i=nused-1; i>=0; --i)
{
for (int j=used[i]; j<(nsize+i-nused); j++) word[j] = word[j+1];
}
nsize -= nused;
}
}
Notice that the removal of the used words has to be done in backward order. If you do it in forward order, the indexes stored in the used array would need to be adjusted after each removal. A test run:
$ ./a.out
this is a test and is a test and is test.
this
is
a
test
and
is
a
test
and
is
test
total 11 words.
this : 1
is : 3
a : 2
test : 3
and : 2

I read your last comment.
I am sorry, but I do not know C, so I will answer in C++ with the standard C++ approach. That is usually only about 10 lines of code:
#include <iostream>
#include <algorithm>
#include <map>
#include <string>
#include <regex>
// Regex Helpers
// Regex to find a word
static const std::regex reWord{ R"(\w+)" };
// Result of search for one word in the string
static std::smatch smWord;
int main() {
std::cout << "\nPlease enter text: \n";
if (std::string line; std::getline(std::cin, line)) {
// Words and its appearance count
std::map<std::string, int> words{};
// Count the words
for (std::string s{ line }; std::regex_search(s, smWord, reWord); s = smWord.suffix())
words[smWord[0]]++;
// Show result
for (const auto& [word, count] : words) std::cout << word << "\t\t--> " << count << '\n';
}
return 0;
}

Difficulties with string declaration/reference parameters (c++)

Last week I got a homework assignment to write a function that gets a string and a char value and divides the string into two parts: before and after the first occurrence of that char.
The code worked, but my teacher told me to do it again because it is not well-written code, and I don't understand how to make it better. So far I understand that initializing the two strings with spaces is not good, but otherwise I get out-of-bounds exceptions. Since the input string changes, its size is different every time.
#include <iostream>
#include <string>
using namespace std;
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
int counter = 0;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.at(i) = text.at(i);
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.at(counter) = text.at(i);
}
else {
sec_part.at(counter) = text.at(i);
counter++;
}
}
}
int main() {
string text;
string part1=" ";
string part2=" ";
char search_char;
cout << "Please enter text? ";
getline(cin, text);
cout << "Please enter a char: ? ";
cin >> search_char;
divide(search_char,text,part1,part2);
cout << "First string: " << part1 <<endl;
cout << "Second string: " << part2 << endl;
system("PAUSE");
return 0;
}

I would suggest learning to use the C++ standard library functions; there are plenty of utility functions that can help you.
#include <algorithm>
#include <string>
void divide(const std::string& text, char search, std::string& first_part, std::string& sec_part)
{
std::string::const_iterator pos = std::find(text.begin(), text.end(), search);
first_part.append(text, 0, pos - text.begin());
sec_part.append(text, pos - text.begin());
}
int main()
{
std::string text = "thisisfirst";
char search = 'f';
std::string first;
std::string second;
divide(text, search, first, second);
}
Here I used std::find (you can read about it here) and iterators.
You have some other mistakes. You are passing text by value, which makes a copy every time you call the function. Pass it by reference instead, and qualify it with const to indicate that it is an input parameter, not an output.

Why is your teacher right?
The fact that you need to initialize your destination strings with spaces is terrible:
If the input string is longer, you'll get out-of-bounds errors.
If it's shorter, you get a wrong answer, because in IT and programming, "It works " is not the same as "It works".
In addition, your code does not meet the specification. It should work every time, independently of whatever value is currently stored in your output strings.
Alternative 1: your code but working
Just clear the destination strings at the beginning. Then iterate as you did, but use += or push_back() to add chars at the end of the string.
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
first_part.clear(); // make destinations strings empty
sec_part.clear();
for (int i = 0; i < text.size(); i++) {
char c = text.at(i);
if (firstc && c != search) {
first_part += c;
}
else if (firstc && c == search) {
firstc = false;
sec_part += c;
}
else {
sec_part += c;
}
}
}
I used a temporary c instead of text.at(i) or text[i] in order to avoid repeated indexing. But this is not really required: nowadays, optimizing compilers should produce equivalent code whichever variant you use here.
Alternative 2: use string member functions
This alternative uses the find() function, and then constructs a string from the start until that position, and another from that position. There is a special case when the character was not found.
void divide(char search, string text, string& first_part, string& sec_part)
{
auto pos = text.find(search);
first_part = string(text, 0, pos);
if (pos== string::npos)
sec_part.clear();
else sec_part = string(text, pos, string::npos);
}

As you already understand, these declarations
string part1=" ";
string part2=" ";
do not make sense, because the string entered into text can easily be longer than both initialized strings. In that case, using the string method at throws an exception, and when text is shorter, the strings keep trailing spaces.
From the description of the assignment it is not clear whether the searched character should be included in one of the strings. You assume that it should be included in the second string.
Note that the parameter text should be declared as a constant reference.
Also, instead of using loops, it is better to use methods of the class std::string, such as find.
The function can look like this:
#include <iostream>
#include <string>
void divide(const std::string &text, char search, std::string &first_part, std::string &sec_part)
{
std::string::size_type pos = text.find(search);
first_part = text.substr(0, pos);
if (pos == std::string::npos)
{
sec_part.clear();
}
else
{
sec_part = text.substr(pos);
}
}
int main()
{
std::string text("Hello World");
std::string first_part;
std::string sec_part;
divide(text, ' ', first_part, sec_part);
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << first_part << "\"\n";
std::cout << "\"" << sec_part << "\"\n";
}
The program output is
"Hello World"
"Hello"
" World"
As you can see, the separating character is included in the second string, though I think it might be better to exclude it from both strings.
An alternative and, in my opinion, clearer approach can look like this:
#include <iostream>
#include <string>
#include <utility>
std::pair<std::string, std::string> divide(const std::string &s, char c)
{
std::string::size_type pos = s.find(c);
return { s.substr(0, pos), pos == std::string::npos ? "" : s.substr(pos) };
}
int main()
{
std::string text("Hello World");
auto p = divide(text, ' ');
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << p.first << "\"\n";
std::cout << "\"" << p.second << "\"\n";
}

Your code will only work as long as the character is found within part1.length(). You need something similar to this:
void string_split_once(const char s, const string & text, string & first, string & second) {
first.clear();
second.clear();
std::size_t pos = text.find(s);
if (pos != string::npos) {
first = text.substr(0, pos);
second = text.substr(pos);
}
}

The biggest problem I see is that you are using at where you should be using push_back. See std::basic_string::push_back. at is designed to access an existing character to read or modify it. push_back appends a new character to the string.
divide could look like this:
void divide(char search, string text, string& first_part,
string& sec_part)
{
bool firstc = true;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.push_back(text.at(i));
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.push_back(text.at(i));
}
else {
sec_part.push_back(text.at(i));
}
}
}
Since you aren't handling exceptions, consider using text[i] rather than text.at(i).

C++ remove punctuation marks and spaces from a string

How can I remove punctuation marks and spaces from a string in a simple way without using any library functions?

#include <iostream>
#include <string>
using namespace std;
int main()
{
string s = "abc de.fghi..jkl,m no";
for (int i = 0; i < s.size(); i++)
{
if (s[i] == ' ' || s[i] == '.' || s[i] == ',')
{
s.erase(i, 1); // remove ith char from string
i--; // reduce i with one so you don't miss any char
}
}
cout << s << endl;
}

This assumes you can use library I/O like <iostream> and types like std::string, and that you just don't want to use the <cctype> functions like ispunct().
#include <iostream>
#include <string>
int main()
{
const std::string myString = "This. is a string with ,.] stuff in, it.";
const std::string puncts = " [];',./{}:\"?><`~!-_";
std::string output;
for (const auto& ch : myString)
{
bool found = false;
for (const auto& p : puncts)
{
if (ch == p)
{
found = true;
break;
}
}
if (!found)
output += ch;
}
std::cout << output << '\n';
return 0;
}
I have no idea about the performance; I'm sure it can be done in several better ways.

Getting the text of the last directory of a given string with the delimiters of /

Given a URL (which is a string) such as this:
www.testsite.com/pictures/banners/whatever/
I want to be able to get the characters of the last directory in the URL (in this case it's "whatever", I want to also remove the forward slashes). What would be the most efficient way to do this?
Thanks for any help

#include <iostream>
#include <string>
std::string getlastcomponent(std::string s) {
if (s.size() > 0 && s[s.size()-1] == '/')
s.resize(s.size() - 1);
size_t i = s.find_last_of('/');
return (i != s.npos) ? s.substr(i+1) : s;
}
int main() {
std::string s1 = "www.testsite.com/pictures/banners/whatever/";
std::string s2 = "www.testsite.com/pictures/banners/whatever";
std::string s3 = "whatever/";
std::string s4 = "whatever";
std::cout << getlastcomponent(s1) << '\n';
std::cout << getlastcomponent(s2) << '\n';
std::cout << getlastcomponent(s3) << '\n';
std::cout << getlastcomponent(s4) << '\n';
return 0;
}

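Get the length and push every character, starting from the end, until you reach a '/'. For example (assuming the URL is stored in a std::string named url):
std::string reversed;
std::string::size_type x = url.length();
if (x > 0 && url[x - 1] == '/')
    --x;                            // skip the trailing '/'
while (x > 0 && url[x - 1] != '/')
{
    reversed.push_back(url[x - 1]); // collect characters from back to front
    --x;
}
Then you have "revetahw" instead of "whatever".
Then just reverse it with this function:
std::string ReverseString( const std::string& word )
{
    std::string l_bla;
    l_bla.reserve(word.size());
    for ( std::string::size_type x = word.length(); x > 0; x-- )
    {
        l_bla += word.at(x - 1);
    }
    return l_bla;
}
and you get "whatever".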
