string remove and update - c++

I am trying to remove one character from a string and replace the character that follows it, using a single loop.
For example:
Input: "the-new-year"
My code:
std::string convert(std::string text) {
std::string val;
auto index{ 0 };
for (auto x : text)
{
if (x != '-')
{
val.push_back(x);
index++;
}
else
{
val.push_back(std::toupper(text[index+1]));
index++;
}
}
return val;
}
Call: convert("the-new-year");
Expected output: "theNewYear"
Actual result: "theNnewYyear" // Error: the extra character is still there
Any suggestions using an STL algorithm?

I suggest you do it in two steps.
First, capitalize the letter after each dash. A plain index-based (or iterator-based) for loop is enough for this:
std::string val = text;
for (std::size_t i = 1; i < val.length(); ++i)
{
    if (val[i - 1] == '-')
    {
        // cast to unsigned char: std::toupper is undefined for negative values
        val[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(val[i])));
    }
}
Then remove the dashes with the erase-remove idiom, that is, std::remove followed by the string's erase member function:
val.erase(std::remove(begin(val), end(val), '-'), end(val));
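Put together, the two steps might look like the sketch below. The function name toCamelCase is only illustrative, and the cast to unsigned char and the closing assert are added precautions rather than part of the snippets above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper combining both steps described above.
std::string toCamelCase(std::string val) {
    // Step 1: upper-case every character that directly follows a dash.
    for (std::size_t i = 1; i < val.length(); ++i) {
        if (val[i - 1] == '-') {
            val[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(val[i])));
        }
    }
    // Step 2: the erase-remove idiom drops all dashes.
    val.erase(std::remove(val.begin(), val.end(), '-'), val.end());
    assert(val.find('-') == std::string::npos);  // no dash may survive
    return val;
}
```

With this sketch, toCamelCase("the-new-year") gives "theNewYear".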

Related

C++ separate string by selected commas

I was reading the following question Parsing a comma-delimited std::string on how to split a string by a comma (Someone gave me the link from my previous question) and one of the answers was:
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
But what if my string was like the following, and I wanted to separate values only at the bold commas (the ones outside the <>), ignoring the commas that appear inside <>?
<a,b>,<c,d>,,<d,l>,
I want to get:
<a,b>
<c,d>
"" //Empty string
<d,l>
""
Given:<a,b>,,<c,d> It should return: <a,b> and "" and <c,d>
Given:<a,b>,<c,d> It should return:<a,b> and <c,d>
Given:<a,b>, It should return:<a,b> and ""
Given:<a,b>,,,<c,d> It should return:<a,b> and "" and "" and <c,d>
In other words, my program should behave just like the solution above, splitting on , (supposing there were no , other than the bold ones).
Here are some suggested solutions and their problems:
Delete all bold commas: This will result in treating the following 2 inputs the same way while they shouldn't
<a,b>,<c,d>
<a,b>,,<c,d>
Replace all bold commas with some char and use the above algorithm: I can't select some char to replace the commas with since any value could appear in the rest of my string
Adding to @Carlos' answer: apart from regex (take a look at my comment), you can implement the substitution like the following (here, I actually build a new string):
#include <algorithm>
#include <iostream>
#include <string>
int main() {
std::string str;
getline(std::cin,str);
std::string str_builder;
for (auto it = str.begin(); it != str.end(); it++) {
static bool flag = false;
if (*it == '<') {
flag = true;
}
else if (*it == '>') {
flag = false;
str_builder += *it;
}
if (flag) {
str_builder += *it;
}
}
}
Why not replace one set of commas with some known-to-not-clash character, then split it by the other commas, then reverse the replacement?
So replace the commas that are inside the <> with something, do the string split, replace again.
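A minimal sketch of that mask, split, unmask idea follows. The name splitMasked and the placeholder '\x01' are assumptions made for illustration; the approach only works if the placeholder really never occurs in the input, which is exactly the guarantee the question says it cannot give, so the assert makes that precondition explicit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumption: kMask never occurs in the input; '\x01' is an arbitrary pick.
const char kMask = '\x01';

// Hypothetical helper: splits on commas that are outside <...>.
std::vector<std::string> splitMasked(std::string text) {
    assert(text.find(kMask) == std::string::npos);
    // 1. Replace the commas inside <...> with the placeholder.
    bool inside = false;
    for (char& c : text) {
        if (c == '<') inside = true;
        else if (c == '>') inside = false;
        else if (c == ',' && inside) c = kMask;
    }
    // 2. Split on the commas that are left.
    std::vector<std::string> result;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = text.find(',', start);
        result.push_back(text.substr(start, comma - start));
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    // 3. Put the original commas back.
    for (std::string& item : result)
        std::replace(item.begin(), item.end(), kMask, ',');
    return result;
}
```

For the input <a,b>,<c,d>,,<d,l>, this yields the five items <a,b>, <c,d>, an empty string, <d,l> and another empty string.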
I think what you want is something like this:
string result;   // items separated by '\n'; an empty line is an empty item
string s = "<a,b>,,<c,d>";
bool in_brackets = false;
for (size_t i = 0; i < s.size(); i++) {
    if (s[i] == '<') {
        result.push_back(s[i]);
        in_brackets = true;
    }
    else if (s[i] == '>') {
        result.push_back(s[i]);
        in_brackets = false;
    }
    else if (!in_brackets && s[i] == ',')
        result.push_back('\n');   // a comma outside <> ends the current item
    else
        result.push_back(s[i]);
}
Here is one possible implementation that scans a string one char at a time and splits it on commas (',') unless they are masked between brackets ('<' and '>').
Algo:
assume starting outside brackets
loop for each character:
    if not a comma, or if inside brackets
        store the character in the current item
        if a < bracket: note that we are inside brackets
        if a > bracket: note that we are outside brackets
    else (an unmasked comma)
        store the current item as a string into the resulting vector
        clear the current item
store the last item into the resulting vector
Only 10 lines and my rubber duck agreed that it should work...
C++ implementation: I will use a vector to handle the current item because it is easier to build it one character at a time
std::vector<std::string> parse(const std::string& str) {
std::vector<std::string> result;
bool masked = false;
std::vector<char> current; // stores chars of the current item
for (const char c : str) {
if (masked || (c != ',')) {
current.push_back(c);
switch (c) {
case '<': masked = true; break;
case '>': masked = false;
}
}
else { // unmasked comma: store item and prepare next
current.push_back('\0'); // a terminating null for the vector data
result.push_back(std::string(&current[0]));
current.clear();
}
}
// do not forget the last item...
current.push_back('\0');
result.push_back(std::string(&current[0]));
return result;
}
I tested it with all your example strings and it gives the expected results.
Seems quite straightforward to me.
vector<string> customSplit(string s)
{
vector<string> results;
int level = 0;
std::stringstream ss;
for (char c : s)
{
switch (c)
{
case ',':
if (level == 0)
{
results.push_back(ss.str());
stringstream temp;
ss.swap(temp); // Clear ss for the new string.
}
else
{
ss << c;
}
break;
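case '<':
level += 2; // net effect is +1 after falling through into the '>' case
// deliberate fall through
case '>':
level -= 1;
// deliberate fall through: brackets are part of the item too
default:
ss << c;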
}
}
results.push_back(ss.str());
return results;
}

Retrieve each token from a file according to specific criteria

I'm trying to create a lexer for a functional language. One of its methods should return the next token of a file on each call.
For example :
func main() {
var MyVar : integer = 3+2;
}
So I would like every time the next method is called, the next token in that sequence is returned; in that case, it would look like this :
func
main
(
)
{
var
MyVar
:
integer
=
3
+
2
;
}
Except that the result I get is not what I expected:
func
main(
)
{
var
MyVar
:
integer
=
3+
2
}
Here is my method:
token_t Lexer::next() {
token_t ret;
std::string token_tmp;
bool IsSimpleQuote = false; // check char --> '...'
bool IsDoubleQuote = false; // check string --> "..."
bool IsComment = false; // check comments --> `...`
bool IterWhile = true;
while (IterWhile) {
bool IsInStc = (IsDoubleQuote || IsSimpleQuote || IsComment);
std::ifstream file_tmp(this->CurrentFilename);
if (this->eof) break;
char chr = this->File.get();
char next = file_tmp.seekg(this->CurrentCharIndex + 1).get();
++this->CurrentCharInCurrentLineIndex;
++this->CurrentCharIndex;
{
if (!IsInStc && !IsComment && chr == '`') IsComment = true; else if (!IsInStc && IsComment && chr == '`') { IsComment = false; continue; }
if (IsComment) continue;
if (!IsInStc && chr == '"') IsDoubleQuote = true;
else if (!IsInStc && chr == '\'') IsSimpleQuote = true;
else if (IsDoubleQuote && chr == '"') IsDoubleQuote = false;
else if (IsSimpleQuote && chr == '\'') IsSimpleQuote = false;
}
if (chr == '\n') {
++this->CurrentLineIndex;
this->CurrentCharInCurrentLineIndex = -1;
}
token_tmp += chr;
if (!IsInStc && IsLangDelim(chr)) IterWhile = false;
}
if (token_tmp.size() > 1 && System::Text::EndsWith(token_tmp, ";") || System::Text::EndsWith(token_tmp, " ")) token_tmp.pop_back();
++this->NbrOfTokens;
location_t pos;
pos.char_pos = this->CurrentCharInCurrentLineIndex;
pos.filename = this->CurrentFilename;
pos.line = this->CurrentLineIndex;
SetToken_t(&ret, token_tmp, TokenList::ToToken(token_tmp), pos);
return ret;
}
Here is the function IsLangDelim :
bool IsLangDelim(char chr) {
return (chr == ' ' || chr == '\t' || TokenList::IsSymbol(CharToString(chr)));
}
TokenList is a namespace that contains the list of tokens, as well as some functions (like IsSymbol in this case).
I have already tried other versions of this method, but the result is almost always the same.
Do you have any idea how to improve this method?
One solution to your problem is std::regex. The syntax is a little difficult at first, but it becomes a very handy tool once you understand it, and it is designed for finding tokens.
The specific criteria can be expressed in the regex string.
For your case I will use: std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
This means:
Look for one or more word characters (that is a word)
Look for one or more digits (that is an integer number)
Or look for any of the meaningful operators (like '+', '-', '{' and so on)
You can extend the regex to cover everything else you are searching for. You can also apply a regex to a regex result.
Please see the example below. It creates the output you showed from the input you provided.
And your described task is only one statement in main.
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
// Our test data (raw string).
std::string testData(
R"#(func main() {
var MyVar : integer = 3+2;
}
)#");
std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
int main(void)
{
std::copy(
std::sregex_token_iterator(testData.begin(), testData.end(), re, 1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n")
);
return 0;
}
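Since the question asks for a method that returns one token per call, the same regex could also be wrapped in a small class instead of being copied straight to std::cout. The sketch below is only one way to do that; the class name RegexLexer and its next() signature are made up for illustration, and the pattern is the one from this answer, unchanged.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical wrapper: hands out one token per call to next().
class RegexLexer {
public:
    explicit RegexLexer(std::string source)
        : source_(std::move(source)),
          it_(source_.begin(), source_.end(), pattern()) {}

    // The iterator points into source_, so copying is disabled.
    RegexLexer(const RegexLexer&) = delete;
    RegexLexer& operator=(const RegexLexer&) = delete;

    // Stores the next token and returns true, or returns false at the end.
    bool next(std::string& token) {
        if (it_ == std::sregex_iterator()) return false;
        assert(it_->length() > 0);  // the pattern cannot match an empty string
        token = it_->str();
        ++it_;
        return true;
    }

private:
    static const std::regex& pattern() {
        static const std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
        return re;
    }

    std::string source_;       // declared first, so it is initialized first
    std::sregex_iterator it_;  // walks over the matches in source_
};
```

A caller would construct the object from the file contents and call next() in a loop until it returns false.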
You are trying to parse using a single loop, which makes the code very complicated. Instead I suggest something like this:
struct token { ... };
struct lexer {
vector<token> tokens;
string source;
unsigned int pos;
bool parse_ident() {
if (!is_alpha(source[pos])) return false;
auto start = pos;
while(pos < source.size() && is_alnum(source[pos])) ++pos;
tokens.push_back({ token_type::ident, source.substr(start, pos - start) });
return true;
}
bool parse_num() { ... }
bool parse_comment() { ... }
...
bool parse_whitespace() { ... }
void parse() {
while(pos < source.size()) {
if (!parse_comment() && !parse_ident() && !parse_num() && ... && !parse_whitespace()) {
throw error{ "unexpected character at position " + std::to_string(pos) };
}
}
}
};
This is the standard structure I use when lexing files in any scripting language I have written. Lexing is usually greedy, so you don't need to bother with regex (which is effective, but slower, unless you use some crazy template-based implementation). Just define your parse_* functions, make sure they return false if they didn't parse a token, and make sure they are called in the correct order.
The order itself usually doesn't matter, but:
operators need to be checked from longest to shortest
a number in the style .123 might be incorrectly recognized as the . operator (so you need to make sure that there is no digit after the .)
numbers and identifiers look very much alike, except that identifiers start with a non-digit
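The three ordering rules can be sketched in a compact greedy lexer. Everything here is an assumption made for illustration: the function name lex, the helper isDigitAt, the tiny operator table kOperators, and the choice to return plain strings instead of a token struct.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical operator table, longest entries first so "==" wins over "=".
const std::vector<std::string> kOperators = {
    "==", "=", "+", ".", ";", ":", "(", ")", "{", "}"};

bool isDigitAt(const std::string& s, std::size_t i) {
    return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
}

std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        const std::size_t start = pos;
        const unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c)) { ++pos; continue; }

        if (std::isalpha(c) || c == '_') {
            // Identifier: must start with a non-digit.
            while (pos < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[pos])) ||
                    src[pos] == '_'))
                ++pos;
        } else if (std::isdigit(c) || (c == '.' && isDigitAt(src, pos + 1))) {
            // Number, including the ".123" form; tried before the "." operator.
            while (isDigitAt(src, pos)) ++pos;
            if (pos < src.size() && src[pos] == '.' && isDigitAt(src, pos + 1)) {
                ++pos;
                while (isDigitAt(src, pos)) ++pos;
            }
        } else {
            // Operators: the table order makes this a longest-first match.
            for (const std::string& op : kOperators) {
                if (src.compare(pos, op.size(), op) == 0) {
                    pos += op.size();
                    break;
                }
            }
            if (pos == start)
                throw std::runtime_error("unexpected character at position " +
                                         std::to_string(pos));
        }
        assert(pos > start);  // every branch must consume something
        tokens.push_back(src.substr(start, pos - start));
    }
    return tokens;
}
```

With this sketch, "a==.5" lexes as a, == and .5, while "x.y" lexes as x, . and y, because the dot is only treated as part of a number when a digit follows it.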

Parse integers in string in C++

I have a string that looks like this:
"{{2,3},{10,1},9}"
and I want to convert it to an array (or vector) of strings:
["{", "{", "2", ",", "3", "}", ",", "{", "10", ",", "1", "}", ",", "9", "}"]
I can't just pull out each character separately because some of the integers may be double-digit, and I can't figure out how to use stringstream because there are multiple delimiters on the integers (could be followed by , or })
Just walk through the string. If we're on a digit, walk 'til we're not on a digit:
std::vector<std::string> split(std::string const& s)
{
std::vector<std::string> results;
std::locale loc{};
for (auto it = s.begin(); it != s.end(); )
{
if (std::isdigit(*it, loc)) {
auto next = std::find_if(it+1, s.end(), [&](char c){
return !std::isdigit(c, loc);
});
results.emplace_back(it, next);
it = next;
}
else {
results.emplace_back(1, *it);
++it;
}
}
return results;
}
The logic is straightforward:
Iterate over the string.
push_back() each character as a string of its own into the output vector, unless it's a digit and the last character was a digit, in which case append the digit to the last string in the output vector:
That's it.
std::string s="{{2,3},{10,1},9}";
std::vector<std::string> v;
bool last_character_was_a_digit=false;
for (auto c:s)
{
if ( c >= '0' && c <= '9')
{
if (last_character_was_a_digit)
{
v.back().push_back(c);
continue;
}
last_character_was_a_digit=true;
}
else
{
last_character_was_a_digit=false;
}
v.push_back(std::string(&c, 1));
}
Thank you Sam and Barry for answering! I actually came up with my own solution (sometimes posting a question here helps me see things more clearly) but yours are much more elegant and delimiter-independent, I'll study them further!
std::vector<std::string> myVector;
std::string nextSubStr;
for (int i=0; i<myString.size(); i++)
{
nextSubStr = myString[i];
// if we pulled a single-digit integer out of myString
if (nextSubStr != "{" && nextSubStr != "}" && nextSubStr != ",")
{
// let's make sure we get the whole integer!
std::size_t j = i + 1;
// stop at the end of the string or at the next delimiter
while (j < myString.size() && myString[j] != '{' && myString[j] != '}' && myString[j] != ',')
{
// another digit on the integer
nextSubStr += myString[j];
++j;
i++;
}
}
myVector.push_back(nextSubStr);
}

Separator character in string c++

This is the requirement: Read a string and loop it, whenever a new word is encountered insert it into std::list. If the . character has a space, tab, newline or digit on the left and a digit on the right then it is treated as a decimal point and thus part of a word. Otherwise it is treated as a full stop and a word separator.
And this is the result I get when I run the template program:
foo.bar -> 2 words (foo, bar)
f5.5f -> 1 word
.4.5.6.5 -> 1 word
d.4.5f -> 3 words (d, 4, 5f)
.5.6..6.... -> 2 words (.5.6, 6)
This seems very complex to me, as it is my first time dealing with strings in C++. I'm really stuck on implementing the code. Could anyone give me a hint? Thanks.
I just sketched out some rough ideas:
bool isDecimal(std::string &word) {
bool ok = false;
for (unsigned int i = 0; i < word.size(); i++) {
if (word[i] == '.') {
if ((std::isdigit(word[(int)i - 1]) ||
std::isspace(word[(int)i -1]) ||
(int)(i - 1) == (int)(word.size() - 1)) && std::isdigit(word[i + 1]))
ok = true;
else {
ok = false;
break;
}
}
}
return ok;
}
void checkDecimal(std::string &word) {
if (!isDecimal(word)) {
std::string temp = word;
word.clear();
for (unsigned int i = 0; i < temp.size(); i++) {
if (temp[i] != '.')
word += temp[i];
else {
if (std::isalpha(temp[i + 1]) || std::isdigit(temp[i + 1]))
word += ' ';
}
}
}
trimLeft(word);
}
I think you may be approaching the problem from the wrong direction. It seems much easier if you turn the condition upside down. To give you some pointers in a pseudocode skeleton:
bool isSeparator(const std::string& string, size_t position)
{
// Determine whether the character at <position> in <string> is a word separator
}
void tokenizeString(const std::string& string, std::list<std::string>& wordList)
{
// for every position in string
// if(isSeparator(string, position) || end of string)
// wordList.push_back(substring from last separator to this one)
}
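One way to fill in that skeleton is sketched below. It rests on two assumptions that the requirement does not state outright: whitespace also separates words, and the start of the string counts like whitespace on the left of a . (inferred from ".4.5.6.5" being one word). Under this literal reading "d.4.5f" splits into d and 4.5f, that is two words rather than the three listed in the question, so the template program apparently applies a further condition that is not spelled out.

```cpp
#include <cassert>
#include <cctype>
#include <list>
#include <string>

namespace {
bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
}  // namespace

// True if s[pos] separates words. A '.' counts as a decimal point (not a
// separator) when it has whitespace, a digit, or the start of the string on
// its left AND a digit on its right.
bool isSeparator(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const char c = s[pos];
    if (isSpace(c)) return true;  // assumption: whitespace splits words
    if (c != '.') return false;
    const bool leftOk = pos == 0 || isSpace(s[pos - 1]) || isDigit(s[pos - 1]);
    const bool rightOk = pos + 1 < s.size() && isDigit(s[pos + 1]);
    return !(leftOk && rightOk);
}

void tokenizeString(const std::string& s, std::list<std::string>& wordList) {
    std::string word;  // characters collected since the last separator
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isSeparator(s, i)) {
            if (!word.empty()) wordList.push_back(word);
            word.clear();
        } else {
            word += s[i];
        }
    }
    if (!word.empty()) wordList.push_back(word);  // the last word
}
```

Empty words are skipped, which is why a run of dots such as the tail of ".5.6..6...." adds nothing to the list.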
I suggest implementing it with flex and bison in C++.

How to reverse a string in blocks of 2 in C++?

What I want to do is convert a string such as
"a4b2f0" into "f0b2a4"
or in more simple terms:
turning "12345678" into "78563412"
The string will always have an even number of characters so it will always divide by 2. I'm not really sure where to start.
One simple way to do that is this:
std::string input = "12345678";
std::string output = input;
std::reverse(output.begin(), output.end());
for(size_t i = 1 ; i < output.size(); i+=2)
std::swap(output[i-1], output[i]);
std::cout << output << std::endl;
This one is a bit better in terms of speed: the previous version swaps every element twice, while this one swaps each pair only once:
std::string input = "12345678";
std::string output = input;
for(size_t i = 0, middle = output.size()/2, size = output.size(); i < middle ; i+=2 )
{
std::swap(output[i], output[size - i- 2]);
std::swap(output[i+1], output[size -i - 1]);
}
std::cout << output << std::endl;
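The single-pass swap can also be packaged as a function. This is a sketch only: the name reversePairs is made up, and it walks two indices towards each other instead of computing the middle, which is a small variation on the loop above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: reverses the order of the 2-character blocks.
std::string reversePairs(std::string s) {
    assert(s.size() % 2 == 0);   // the question guarantees an even length
    if (s.size() < 4) return s;  // zero or one block: nothing to swap
    // lo walks forward and hi walks backward, one block at a time.
    for (std::size_t lo = 0, hi = s.size() - 2; lo < hi; lo += 2, hi -= 2) {
        std::swap(s[lo], s[hi]);
        std::swap(s[lo + 1], s[hi + 1]);
    }
    return s;
}
```

With this sketch, reversePairs("12345678") gives "78563412" and reversePairs("a4b2f0") gives "f0b2a4".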
Let's get esoteric... (not tested! :( And definitely not built to handle odd-length sequences.)
template <typename I>
struct backward_pair_iterator {
typedef I base_t;
base_t base;
bool parity;
backward_pair_iterator(base_t base, bool parity = false):
base(base), parity(parity) {
++base;
}
backward_pair_iterator operator++() {
backward_pair_iterator result(base, !parity);
if (parity) { result.base++; result.base++; }
else { result.base--; }
return result;
}
};
template <typename I>
backward_pair_iterator<I> make_bpi(I base) {
return backward_pair_iterator<I>(base);
}
std::string output(make_bpi(input.rbegin()), make_bpi(input.rend()));
static string reverse(string entry) {
    if (entry.size() < 2) {
        // fewer than two characters left: nothing more to move
        return entry;
    } else {
        // last block first, then the reversed remainder
        return entry.substr(entry.size() - 2) + reverse(entry.substr(0, entry.size() - 2));
    }
}
My method uses the power of recursive programming.
A simple solution is this:
string input = "12345678";
string output = "";
// step back two characters at a time (i -= 2, not i-2, which never changes i)
for(int i = input.length() - 1; i >= 0; i -= 2)
{
    if(i - 1 >= 0){
        output += input[i - 1];
        output += input[i];
    }
}
Note: You should check that the length of the string mod 2 is 0, because otherwise the loop would run off the start of the string. The i - 1 >= 0 test above guards against that.