How to generate godbolt like clean assembly locally? - c++

I want to generate clean assembly like Compiler Explorer locally. Note that I read How to remove “noise” from GCC/clang assembly output? before attempting this. The output from that method isn't as clean or dense as godbolt's and still contains a lot of asm directives and unused labels.
How can I get clean assembly output without any unused labels or directives?

For the record, it is possible (and apparently not too hard) to set up a local install of Matt Godbolt's Compiler Explorer stuff, so you can use that to explore asm output for files that are part of existing large projects with their #include dependencies and everything.
If you already have some asm output, @Waqar's answer looks useful. Or maybe that functionality can be used on its own from the Compiler Explorer repo via node.js, IDK.
According to the install info in the readme in https://github.com/compiler-explorer/compiler-explorer (Matt's repo), you can simply run make after cloning it on a machine that has node.js installed.
I also found https://isocpp.org/blog/2017/10/cpp-weekly-episode-83-installing-compiler-explorerjason-turner which might have more details (or be obsolete at this point, IDK).
I think Matt also mentions using a local clone of Compiler Explorer in his CppCon 2017 talk about Compiler Explorer (maybe replying to a question at the end), “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and recommends it for playing with code that uses lots of #include that would be hard to get onto https://godbolt.org/. (Or for closed-source code).

A while ago, I needed something like this locally so I wrote a small tool to make the asm readable.
It attempts to 'clean' and make the 'asm' output from 'gcc' readable using C++ itself. It does something similar to Compiler Explorer and tries to remove all the directives and unused labels, making the asm clean. Only standard library is used for this.
Some things I should mention:
Will only work with gcc and clang
Only tested with C++ code
Compile with -S -fno-asynchronous-unwind-tables -fno-dwarf2-cfi-asm -masm=intel (remove -masm=intel if you want AT&T syntax)
AT&T syntax will probably work but I didn't test it much. The other two options are to remove the .cfi directives. It can be handled using the code below but the compiler itself does a much better job of this. See the answer by Peter Cordes above.
This program can work as standalone, but I would highly recommend reading this SO answer to tune your asm output and then process it using this program to remove unused labels / directives etc.
abi::__cxa_demangle() is used for demangling
Disclaimer: This isn't a perfect solution, and hasn't been tested extensively.
The strategy used for cleaning the asm (there are probably better, faster, more efficient ways to do this):
Collect all the labels
Go through the asm line by line and check if the labels are used/unused
If the labels are unused, they get deleted
Every line beginning with '.' gets deleted, unless it is a label that is used somewhere
Update 1: Not all static data gets removed now.
#include <algorithm>
#include <cstdlib>
#include <cxxabi.h>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
#include <string_view>
#include <sstream>
#include <unordered_map>
#include <vector>
// trim from both ends (in place)
std::string_view trim(std::string_view s)
{
s.remove_prefix(std::min(s.find_first_not_of(" \t\r\v\n"), s.size()));
s.remove_suffix(std::min(s.size() - s.find_last_not_of(" \t\r\v\n") - 1, s.size()));
return s;
}
static inline bool startsWith(const std::string_view s, const std::string_view searchString)
{
return (s.rfind(searchString, 0) == 0);
}
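std::string demangle(std::string &&asmText)
{
std::string::size_type last = 0;
while (true) {
// find the next candidate for a mangled name
const auto next = asmText.find("_Z", last);
if (next == std::string::npos)
break;
// the token ends at the first delimiter, or at the end of the text
auto tokenEnd = asmText.find_first_of(":,.#[]() \n", next + 1);
if (tokenEnd == std::string::npos)
tokenEnd = asmText.size();
const auto len = tokenEnd - next;
const std::string tok = asmText.substr(next, len);
int status = 0;
char* name = abi::__cxa_demangle(tok.c_str(), nullptr, nullptr, &status);
if (status != 0 || name == nullptr) {
std::cout << "Demangling of: " << tok << " failed, status: " << status << '\n';
last = tokenEnd; // skip this token, otherwise the loop never terminates
continue;
}
std::string demangledName{name};
free(name);
demangledName.insert(demangledName.begin(), ' ');
asmText.replace(next, len, demangledName);
last = next + demangledName.size(); // resume after the inserted name
}
return std::move(asmText);
}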
std::string clean_asm(const std::string& asmText)
{
std::string output;
output.reserve(asmText.length());
std::stringstream s{asmText};
//1. collect all the labels
//2. go through the asm line by line and check if the labels are used/unused
//3. if the labels are unused, they get deleted
//4. every line beginning with '.' gets deleted, unless it is a used label
std::regex exp {"^\\s*[_|a-zA-Z]"};
std::regex directiveRe { "^\\s*\\..*$" };
std::regex labelRe { "^\\.*[a-zA-Z]+[0-9]+:$" };
std::regex hasOpcodeRe { "^\\s*[a-zA-Z]" };
std::regex numericLabelsRe { "\\s*[0-9]:" };
const std::vector<std::string> allowedDirectives =
{
".string", ".zero", ".byte", ".value", ".long", ".quad", ".ascii"
};
//<label, used>
std::unordered_map<std::string, bool> labels;
//1
std::string line;
while (std::getline(s, line)) {
if (std::regex_match(line, labelRe)) {
line = std::string(trim(line)); // trim() returns a view, it does not modify its argument
// remove ':'
line = line.substr(0, line.size() - 1);
labels[line] = false;
}
}
s.clear();
s.str(asmText);
line = "";
//2
while (std::getline(s, line)) {
if (std::regex_match(line, hasOpcodeRe)) {
auto it = labels.begin();
for (; it != labels.end(); ++it) {
if (line.find(it->first) != std::string::npos) { // find() returns npos when the label is absent
labels[it->first] = true;
}
}
}
}
//remove false labels from labels hash-map
for (auto it = labels.begin(); it != labels.end();) {
if (it->second == false)
it = labels.erase(it);
else
++it;
}
s.clear();
s.str(asmText);
line = "";
std::string currentLabel;
//3
while (std::getline(s, line)) {
line = std::string(trim(line)); // trim() returns a view, it does not modify its argument
if (std::regex_match(line, labelRe)) {
auto l = line;
l = l.substr(0, l.size() - 1);
currentLabel = "";
if (labels.find(l) != labels.end()) {
currentLabel = line;
output += line + "\n";
}
continue;
}
if (std::regex_match(line, directiveRe)) {
//if we are in a label
if (!currentLabel.empty()) {
auto trimmedLine = trim(line);
for (const auto& allowedDir : allowedDirectives) {
if (startsWith(trimmedLine, allowedDir)) {
output += line;
output += '\n';
}
}
}
continue;
}
if (std::regex_match(line, numericLabelsRe)) {
continue;
}
if (line == "endbr64") {
continue;
}
if (line.find(':') != std::string::npos) { // also safe for empty lines
currentLabel = line;
output += line + '\n';
continue;
}
line.insert(line.begin(), '\t');
output += line + '\n';
}
return output;
}
int main(int argc, char* argv[])
{
if (argc < 2) {
std::cerr << "Please provide the asm filename you want to process.\n";
return 1;
}
std::ifstream file(argv[1]);
std::string output;
if (file.is_open()) {
std::cout << "File '" << argv[1] << "' is opened\n";
std::string line;
while (std::getline(file, line)) {
output += line + '\n';
}
}
output = demangle(std::move(output));
output = clean_asm(output);
std::string fileName = argv[1];
auto dotPos = fileName.rfind('.');
if (dotPos != std::string::npos)
fileName.erase(fileName.begin() + dotPos, fileName.end());
std::cout << "Asm processed. Saving as '"<< fileName <<".asm'";
std::ofstream out;
out.open(fileName + ".asm");
out << output;
return 0;
}

I checked Compiler Explorer to see if they had a specific set of compiler options to get their output. But they don't. Instead they filter the assembly listing with this function. There's also an additional processing step that merges the debug info into source and assembly highlighting.
To answer your question, I don't think it's possible with GCC itself right now (Sep 2021).

Related

"Cleaning up" nested if statements

In a console program I am creating, I have a bit of code that parses through a file. After parsing each line, it is checked for syntax errors. If there is a syntax error, the program stops reading the file and goes on to the next part of the program. The problem is that it is very messy, as my only solution so far is either a series of nested if statements or a line of if statements. Nested ifs get very messy very fast, and a series of if statements has the program testing for several things that don't need to be tested. Here's some pseudocode for my problem (note that I am NOT using a return statement)
Pseudo code shown instead of real code, as it is very large
Nested if:
open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
{
read line;
if (line is valid)
{
read line;
if (line is valid)
{
do stuff;
}
else
error;
}
else
error;
}
else
error;
code that must be reached, even if there was an error;
Non-nested ifs:
bool fail = false;
open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
read line;
else
{
error;
fail = true;
}
if (!fail && line is valid)
read line;
else
{
error;
fail = true;
}
if (!fail && line is valid)
do stuff;
else
error;
//Note how fail is constantly evaluated, even after it has already been set to true
code that must be reached, even if there was an error;
I have looked at many different sites, but their verdicts differed from my problem. This code does work at runtime, but as you can see it is not very elegant. Is there anyone who has a more readable/efficient approach on my problem? Any help is appreciated :)
Two options come to mind:
Option 1: chain reads and validations
This is similar to how std::istream extraction operators work. You could do something like this:
void your_function() {
std::ifstream file("some_file");
std::string line1, line2, line3;
if (std::getline(file, line1) &&
std::getline(file, line2) &&
std::getline(file, line3)) {
// do stuff
} else {
// error
}
// code that must be reached, even if there was an error;
}
Option 2: split into different functions
This can get a little long, but if you split things out right (and give everything a sane name), it can actually be very readable and debuggable.
bool step3(const std::string& line1,
const std::string& line2,
const std::string& line3) {
// do stuff
return true;
}
bool step2(std::ifstream& file,
const std::string& line1,
const std::string& line2) {
std::string line3;
return std::getline(file, line3) && step3(line1, line2, line3);
}
bool step1(std::ifstream& file,
const std::string& line1) {
std::string line2;
return std::getline(file, line2) && step2(file, line1, line2);
}
bool step0(std::ifstream& file) {
std::string line1;
return std::getline(file, line1) && step1(file, line1);
}
void your_function() {
std::ifstream file("some_file");
if (!step0(file)) {
// error
}
// code that must be reached, even if there was an error;
}
This example code is a little too trivial. If the line validation that occurs in each step is more complicated than std::getline's return value (which is often the case when doing real input validation), then this approach has the benefit of making that more readable. But if the input validation is as simple as checking std::getline, then the first option should be preferred.
Is there [...] a more readable/efficient approach on my problem
Step 1. Look around for a classical example of text parser
Answer: a compiler, which parses text files and produces different kind of results.
Step 2. Read some theory on how compilers work
There are lots of approaches and techniques: books, online and open-source examples, simple and complicated.
Sure, you might just skip this step if you are not that interested.
Step 3. Apply the theory to your problem
Looking through the theory, you will not miss terms such as "state machine", "automata" etc. Here is a brief explanation on Wikipedia:
https://en.wikipedia.org/wiki/Automata-based_programming
There is basically a ready to use example on the Wiki page:
#include <stdio.h>
enum states { before, inside, after };
void step(enum states *state, int c)
{
if(c == '\n') {
putchar('\n');
*state = before;
} else
switch(*state) {
case before:
if(c != ' ') {
putchar(c);
*state = inside;
}
break;
case inside:
if(c == ' ') {
*state = after;
} else {
putchar(c);
}
break;
case after:
break;
}
}
int main(void)
{
int c;
enum states state = before;
while((c = getchar()) != EOF) {
step(&state, c);
}
if(state != before)
putchar('\n');
return 0;
}
Or a C++ example with state machine:
#include <stdio.h>
class StateMachine {
enum states { before = 0, inside = 1, after = 2 } state;
struct branch {
unsigned char new_state:2;
unsigned char should_putchar:1;
};
static struct branch the_table[3][3];
public:
StateMachine() : state(before) {}
void FeedChar(int c) {
int idx2 = (c == ' ') ? 0 : (c == '\n') ? 1 : 2;
struct branch *b = & the_table[state][idx2];
state = (enum states)(b->new_state);
if(b->should_putchar) putchar(c);
}
};
struct StateMachine::branch StateMachine::the_table[3][3] = {
/* ' ' '\n' others */
/* before */ { {before,0}, {before,1}, {inside,1} },
/* inside */ { {after, 0}, {before,1}, {inside,1} },
/* after */ { {after, 0}, {before,1}, {after, 0} }
};
int main(void)
{
int c;
StateMachine machine;
while((c = getchar()) != EOF)
machine.FeedChar(c);
return 0;
}
Sure, instead of chars you should feed lines.
This technique scales up to complicated compilers and is proven by tons of implementations. So if you are looking for a "right" approach, here it is.
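Feeding lines instead of chars could look roughly like the sketch below. The states and the line format (BEGIN, then key=value lines, then END) are invented for illustration, since the question does not show its real syntax:

```cpp
#include <string>

// Hypothetical line-driven state machine: expects "BEGIN", then any number
// of "key=value" lines, then "END". Anything else is a syntax error.
class LineMachine {
public:
    enum class State { ExpectBegin, InBody, Done, Error };

    // Feed one line; returns false once the machine is in the Error state.
    bool feed(const std::string& line)
    {
        switch (state_) {
        case State::ExpectBegin:
            state_ = (line == "BEGIN") ? State::InBody : State::Error;
            break;
        case State::InBody:
            if (line == "END")
                state_ = State::Done;
            else if (line.find('=') == std::string::npos)
                state_ = State::Error;
            break;
        case State::Done:
            state_ = State::Error;  // trailing input after END
            break;
        case State::Error:
            break;                  // stay in Error
        }
        return state_ != State::Error;
    }

    State state() const { return state_; }

private:
    State state_ = State::ExpectBegin;
};
```

The reading loop then shrinks to something like while (std::getline(file, line) && machine.feed(line)) {}, and the code that must always run simply follows the loop.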
A common modern practice is an early return with RAII. Basically it means that the code that must happen should be in a destructor of a class, and your function will have a local object of that class. Now when you have error you exit early from the function (either with Exception or just plain return) and the destructor of that local object will handle the code that must happen.
The code will look something like this:
class Guard
{
...
public:
Guard() {}
~Guard() { /* code that must happen */ }
...
};
void someFunction()
{
Guard localGuard;
...
open file;
read line;
//Each if is testing something different
//Every error is different
if (!line)
{
return;
}
read line;
if (!line)
{
return;
}
...
}

how to check program is writing to terminal

This is a follow-up to my question posted on Code Review, Colorful output on terminal, where I was trying to output coloured strings on the terminal and detect it via an isatty() call. However, as @Jerry Coffin pointed out:
You use isatty to check whether standard output is connected to a terminal, regardless of what stream you're writing to. This means the rest of the functions only work correctly if you pass std::cout as the stream to which they're going to write. Otherwise, you may allow formatting when writing to something that's not a TTY, and you may prohibit formatting when writing to something that is a TTY.
This was something that I wasn't aware of (read as had no experience in) and I wasn't even aware of the fact that cin/cout can be redirected elsewhere. So I tried to read more about it and found some existing questions on SO too. Here's what I've hacked together :
// initialize them at start of program - mandatory
std::streambuf const *coutbuf = std::cout.rdbuf();
std::streambuf const *cerrbuf = std::cerr.rdbuf();
std::streambuf const *clogbuf = std::clog.rdbuf();
// ignore this, just checks for TERM env var
inline bool supportsColor()
{
if(const char *env_p = std::getenv("TERM")) {
const char *const term[8] = {
"xterm", "xterm-256", "xterm-256color", "vt100",
"color", "ansi", "cygwin", "linux"};
for(unsigned int i = 0; i < 8; ++i) {
if(std::strcmp(env_p, term[i]) == 0) return true;
}
}
return false;
}
bool rightTerm = supportsColor();
// would make necessary checks to ensure in terminal
inline bool isTerminal(const std::streambuf *osbuf)
{
FILE *currentStream = nullptr;
if(osbuf == coutbuf) {
currentStream = stdout;
}
else if(osbuf == cerrbuf || osbuf == clogbuf) {
currentStream = stderr;
}
else {
return false;
}
return isatty(fileno(currentStream));
}
// this would print checking rightTerm && isTerminal calls
inline std::ostream &operator<<(std::ostream &os, rang::style v)
{
std::streambuf const *osbuf = os.rdbuf();
return rightTerm && isTerminal(osbuf)
? os << "\033[" << static_cast<int>(v) << "m" // "\e" is a compiler extension; "\033" is the portable spelling of ESC
: os;
}
My main issue is, although I've tested this manually, I'm not aware of the cases this might fail or bugs it might contain. Is this the right way to do this thing? Is there anything I might be missing?
Here's a minimal example to get running (you'll also need a in.txt with random data):
#include <iostream>
#include <fstream>
#include <string>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <cstdio> // FILE, fileno
void f();
bool supportsColor();
// sample enum for foreground colors
enum class fg : unsigned char {
def = 39,
black = 30,
red = 31,
green = 32,
yellow = 33,
blue = 34,
magenta = 35,
cyan = 36,
gray = 37
};
// initialize them at start of program - mandatory
// so that even if user redirects, we've a copy
std::streambuf const *coutbuf = std::cout.rdbuf();
std::streambuf const *cerrbuf = std::cerr.rdbuf();
std::streambuf const *clogbuf = std::clog.rdbuf();
// check if TERM supports color
bool rightTerm = supportsColor();
// Here is the implementation of isTerminal
// which checks if program is writing to Terminal or not
bool isTerminal(const std::streambuf *osbuf)
{
FILE *currentStream = nullptr;
if(osbuf == coutbuf) {
currentStream = stdout;
}
else if(osbuf == cerrbuf || osbuf == clogbuf) {
currentStream = stderr;
}
else {
return false;
}
return isatty(fileno(currentStream));
}
// will check if TERM supports color and isTerminal()
inline std::ostream &operator<<(std::ostream &os, fg v)
{
std::streambuf const *osbuf = os.rdbuf();
return rightTerm && isTerminal(osbuf)
? os << "\033[" << static_cast<int>(v) << "m" // "\e" is a compiler extension; "\033" is the portable spelling of ESC
: os;
}
int main()
{
std::cout << fg::red << "ERROR HERE! " << std::endl
<< fg::blue << "ERROR INVERSE?" << std::endl;
std::ifstream in("in.txt");
std::streambuf *Orig_cinbuf = std::cin.rdbuf(); // save old buf
std::cin.rdbuf(in.rdbuf()); // redirect std::cin to in.txt!
std::ofstream out("out.txt");
std::streambuf *Orig_coutbuf = std::cout.rdbuf(); // save old buf
std::cout.rdbuf(out.rdbuf()); // redirect std::cout to out.txt!
std::string word;
std::cin >> word; // input from the file in.txt
std::cout << fg::blue << word << " "; // output to the file out.txt
f(); // call function
std::cin.rdbuf(Orig_cinbuf); // reset to standard input again
std::cout.rdbuf(Orig_coutbuf); // reset to standard output again
std::cin >> word; // input from the standard input
std::cout << word; // output to the standard output
return 0;
}
void f()
{
std::string line;
while(std::getline(std::cin, line)) // input from the file in.txt
{
std::cout << fg::green << line << "\n"; // output to the file out.txt
}
}
bool supportsColor()
{
if(const char *env_p = std::getenv("TERM")) {
const char *const term[8] = {"xterm", "xterm-256", "xterm-256color",
"vt100", "color", "ansi",
"cygwin", "linux"};
for(unsigned int i = 0; i < 8; ++i) {
if(std::strcmp(env_p, term[i]) == 0) return true;
}
}
return false;
}
I've also tagged the c language, although this is C++ code, because the relevant code is shared between the two and I don't want to miss any suggestions
OP's question:
My main issue is, although I've tested this manually, I'm not aware of the cases this might fail or bugs it might contain. Is this the right way to do this thing? Is there anything I might be missing?
Not all terminals support all features; in addition, the TERM variable is used most often to select a particular terminal description.
The usual approach to this is to use the terminal database rather than hard-coding things. Doing that, your methods
inline bool supportsColor()
inline std::ostream &operator<<(std::ostream &os, rang::style v)
would check the terminal capabilities, e.g., using tigetnum (for the number of colors), tigetstr (for the actual escape sequences which the terminal is supposed to support). You could just as easily wrap those as the isatty function.
Further reading:
interface to terminal database
terminal database
My terminal doesn't recognize color (ncurses FAQ)
To check on POSIX that the standard output is a terminal, just use isatty(3)
if (isatty(STDOUT_FILENO)) {
/// handle the stdout is terminal case
}
You might also use /dev/tty, see tty(4); e.g. if your program myprog is started in a command pipeline like ./myprog some arguments | less you could still fopen("/dev/tty","w") to output to the controlling terminal (even if stdout is then a pipe).
Sometimes, a program is run without any controlling terminal, e.g. through crontab(5) or at(1).
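As a rough sketch, the same isatty check can be applied to an arbitrary C stream rather than only to stdout. The helper name streamIsTerminal is made up for illustration; fileno and isatty are the POSIX calls:

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: true if the given C stream is attached to a terminal.
bool streamIsTerminal(FILE* stream)
{
    if (stream == nullptr)
        return false;
    const int fd = fileno(stream);      // POSIX: descriptor behind the FILE*
    return fd >= 0 && isatty(fd) == 1;  // isatty() returns 1 only for a terminal
}
```

Calling streamIsTerminal(stdout) and streamIsTerminal(stderr) separately covers the case where only one of the two has been redirected by the shell.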

Reading the information in string format from a file

I am trying to read the logic gates names and their inputs from a file. I have been given a .bench file which gives the information about the gate name and its inputs.
I have written a code below which gives me perfect results if the information is given in the following format:
firstGate = NAND(inpA, inpB, inpC)
secGate = NAND(1, 2)
30 = NAND(A, B)
PROBLEM: If the whitespace changes before the = sign, after a comma, or anywhere else, my code doesn't work. For example, if the file is given in the following format, I am not able to read it correctly:
first=NAND(inpA, inpB, inpC) //no space before and after "="
sec = NAND(1,2) //no space after ","
My code which is working for the first case is below:
int main(int argc, char* argv[])
{
//Reading the .bench file
ifstream input_file;
input_file.open("circuit.bench");
if(input_file.fail())
{
cout << "Failed to open Bench file.\n";
return 1;
}
///////
string line;
while (getline( input_file, line ))
{
///For NAND
size_t first_index_nand, second_index_nand;
string gate_name;
const string nand_str = "NAND(";
if ((first_index_nand = line.find(nand_str)) != string::npos)
{
gate_name = line.substr(0, first_index_nand - 3);
cout<<"\nGate name: "<<gate_name;
first_index_nand += nand_str.length() - 1;
cout<<"\nInput to this gate: ";
for (; first_index_nand != string::npos; first_index_nand = second_index_nand)
{
if ((second_index_nand = line.find_first_of(",)", first_index_nand)) != string::npos)
{
string input_name = line.substr(first_index_nand + 1, second_index_nand++ - first_index_nand - 1);
cout<<" "<<input_name;
}
}
}
cout<<"\n";
}
return 0;
}
Query: How should I modify my code so that it can read the name of the gate and its inputs regardless of where the whitespace is?
Note: I have to deal with this problem using C++ code and its libraries only.
First answer: never write a handcrafted parser yourself :-)
1) use code generators for parsers like lex, yacc, bison ( a lot more ... )
2) you can get support for parsing from expect or regexp
3) look for serialization e.g. boost::serialize. If you modify the writer/reader it is possible to serialize into more complex formats which contains something like your configuration files.
If you really want to write your own parser, the usual recommendation is to write a more or less complex state machine. But this can be done much more easily with tools than by hand.
Sorry that I will not dig through your code, but my personal experience is that a hand-written parser ends up as tons of lines of code before it really works, and mostly that code is no longer maintainable. So I want to advise you to use one of the three options I provided (or any other) :-)
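For the .bench lines in the question, the regular-expression route can be sketched with std::regex from the standard library. This assumes every line has the shape name = TYPE(in1, in2, ...); the Gate struct and the parseGate helper are hypothetical names introduced for the example:

```cpp
#include <regex>
#include <string>
#include <vector>

struct Gate {
    std::string name;
    std::string type;
    std::vector<std::string> inputs;
};

// Hypothetical helper: parses "name = TYPE(in1, in2, ...)" with arbitrary
// whitespace around '=', ',' and the parentheses. Returns false on mismatch.
bool parseGate(const std::string& line, Gate& gate)
{
    static const std::regex lineRe(
        R"(^\s*(\w+)\s*=\s*(\w+)\s*\(([^)]*)\)\s*$)");
    static const std::regex nameRe(R"(\w+)");

    std::smatch m;
    if (!std::regex_match(line, m, lineRe))
        return false;

    gate.name = m[1].str();
    gate.type = m[2].str();
    gate.inputs.clear();

    // Pull every identifier out of the argument list.
    const std::string args = m[3].str();
    for (std::sregex_iterator it(args.begin(), args.end(), nameRe), end;
         it != end; ++it)
        gate.inputs.push_back(it->str());
    return true;
}
```

Because the whitespace is absorbed by \s* in the pattern, the same call handles first=NAND(inpA, inpB, inpC) and sec = NAND(1,2) without any fixed offsets.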
You should do as @Rook and @Klaus suggested, maybe using a simple XML file without a DTD and a library like Xerces http://xerces.apache.org/xerces-c/.
If you want to keep your file format, you should first remove all the whitespace yourself; you can find out how here, for example: What's the best way to trim std::string? or here: remove whitespace in std::string.
Only after that you can extract the data with your algorithm.
Anyway try this it should work.
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <cctype>
#include <cstdlib>
using namespace std;
string trimWhiteSpaces(const string& line)
{
string l = line;
l.erase(std::remove_if( l.begin(), l.end(), ::isspace ), l.end());
return l;
}
int main(int argc, char* argv[])
{
cout << "starting... \n";
ifstream _ifile;
string fname = "gates.bench";
_ifile.open(fname.c_str());
if(!_ifile.is_open())
{
cerr << "Failed to open Bench file" << endl;
exit(1);
}
string line;
while(getline(_ifile, line))
{
line = trimWhiteSpaces(line);
size_t first_index_nand, second_index_nand;
string gate_name;
const string nand_str = "NAND(";
if ((first_index_nand = line.find(nand_str)) != string::npos)
{
gate_name = line.substr(0, first_index_nand - 1); // whitespace is gone, so only '=' sits between the name and NAND(
cout<<"\nGate name: "<<gate_name;
first_index_nand += nand_str.length() - 1;
cout<<"\nInput to this gate: ";
for (; first_index_nand != string::npos; first_index_nand = second_index_nand)
{
if ((second_index_nand = line.find_first_of(",)", first_index_nand)) != string::npos)
{
string input_name = line.substr(first_index_nand + 1, second_index_nand++ - first_index_nand - 1);
cout<<" "<<input_name;
}
}
}
cout<<"\n";
}
}
With a more OO approach:
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <cctype>
#include <cstdlib>
using namespace std;
class FileParser
{
public:
FileParser (const string fname)
{
ifile.open(fname.c_str());
if(!ifile.is_open())
{
exit(1);
}
}
~FileParser()
{
ifile.close();
}
void Parse()
{
string line;
while(getline(ifile, line)){
line = trimWhiteSpaces(line);
size_t first_index_nand, second_index_nand;
string gate_name;
const string nand_str = "NAND(";
if ((first_index_nand = line.find(nand_str)) != string::npos)
{
gate_name = line.substr(0, first_index_nand - 1); // whitespace is gone, so only '=' sits between the name and NAND(
cout<<"\nGate name: "<<gate_name;
first_index_nand += nand_str.length() - 1;
cout<<"\nInput to this gate: ";
for (; first_index_nand != string::npos; first_index_nand = second_index_nand)
{
if ((second_index_nand = line.find_first_of(",)", first_index_nand)) != string::npos)
{
string input_name = line.substr(first_index_nand + 1, second_index_nand++ - first_index_nand - 1);
cout<<" "<<input_name;
}
}
}
cout<<"\n";
}
}
private:
string trimWhiteSpaces(const string& line)
{
string l = line;
l.erase(std::remove_if( l.begin(), l.end(), ::isspace ), l.end());
return l;
}
ifstream ifile;
};
int main(int argc, char* argv[])
{
FileParser fP("gates.bench");
fP.Parse();
}

Read a string line by line using c++

I have a std::string with multiple lines and I need to read it line by line.
Please show me how to do it with a small example.
Ex: I have a string string h;
h will be:
Hello there.
How are you today?
I am fine, thank you.
I need to extract Hello there., How are you today?, and I am fine, thank you. somehow.
#include <sstream>
#include <iostream>
#include <string>
int main() {
std::istringstream f("line1\nline2\nline3");
std::string line;
while (std::getline(f, line)) {
std::cout << line << std::endl;
}
}
There are several ways to do that.
You can use std::string::find in a loop for '\n' characters and substr() between the positions.
You can use std::istringstream and std::getline( istr, line ) (Probably the easiest)
You can use boost::tokenize
this would help you :
http://www.cplusplus.com/reference/iostream/istream/getline/
If you'd rather not use streams:
#include <cstdio>
#include <string>
using std::string;
int main() {
string out = "line1\nline2\nline3";
size_t start = 0;
size_t end;
while (1) {
string this_line;
if ((end = out.find("\n", start)) == string::npos) {
if (!(this_line = out.substr(start)).empty()) {
printf("%s\n", this_line.c_str());
}
break;
}
this_line = out.substr(start, end - start);
printf("%s\n", this_line.c_str());
start = end + 1;
}
}
I was looking for some standard implementation for a function which can return a particular line from a string. I came across this question and the accepted answer is very useful. I also have my own implementation which I would like to share:
// CODE: A
std::string getLine(const std::string& str, int line)
{
size_t pos = 0;
if (line < 0)
return std::string();
// skip `line` newlines; bail out if the string has fewer lines than requested
while (line-- > 0) {
pos = str.find('\n', pos);
if (pos == std::string::npos)
return std::string();
++pos;
}
if (pos >= str.length())
return std::string();
size_t end = str.find('\n', pos);
return str.substr(pos, (end == std::string::npos ? std::string::npos : (end - pos + 1)));
}
But I have replaced my own implementation with the one shown in the accepted answer, as it uses a standard function and should be less bug-prone.
// CODE: B
std::string getLine(const std::string& str, int lineNo)
{
std::string line;
std::istringstream stream(str);
while (lineNo-- >= 0)
std::getline(stream, line);
return line;
}
There is behavioral difference between the two implementations. CODE: B removes the newline from each line it returns. CODE: A doesn't remove newline.
My intention of posting my answer to this not-active question is to make others see possible implementations.
NOTE:
I didn't want any kind of optimization and wanted to perform a task given to me in a Hackathon!

How to replace all occurrences of a character in string?

What is the effective way to replace all occurrences of a character with another character in std::string?
std::string doesn't contain such a function, but you could use the stand-alone replace function from the algorithm header.
#include <algorithm>
#include <string>
void some_func() {
std::string s = "example string";
std::replace( s.begin(), s.end(), 'x', 'y'); // replace all 'x' to 'y'
}
The question is centered on character replacement, but, as I found this page very useful (especially Konrad's remark), I'd like to share this more generalized implementation, which allows to deal with substrings as well:
std::string ReplaceAll(std::string str, const std::string& from, const std::string& to) {
size_t start_pos = 0;
while((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length(); // Skip the replacement, so the loop terminates even if 'to' contains 'from'
}
return str;
}
Usage:
std::cout << ReplaceAll(std::string("Number Of Beans"), std::string(" "), std::string("_")) << std::endl;
std::cout << ReplaceAll(std::string("ghghjghugtghty"), std::string("gh"), std::string("X")) << std::endl;
std::cout << ReplaceAll(std::string("ghghjghugtghty"), std::string("gh"), std::string("h")) << std::endl;
Outputs:
Number_Of_Beans
XXjXugtXty
hhjhugthty
EDIT:
The above can be implemented more efficiently, if performance is a concern, by returning nothing (void) and making the changes in place; that is, by directly modifying the string argument str, passed by reference instead of by value. This avoids an extra copy of the original string.
Code :
static inline void ReplaceAll2(std::string &str, const std::string& from, const std::string& to)
{
// Same inner code...
// No return statement
}
Hope this will be helpful for some others...
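Spelled out, the in-place variant could look like this. The loop body is the same as in ReplaceAll above; the guard against an empty from is an addition for the sketch, since an empty pattern would otherwise never stop matching:

```cpp
#include <string>

// In-place variant: modifies `str` directly and returns nothing.
static inline void ReplaceAll2(std::string& str, const std::string& from,
                               const std::string& to)
{
    if (from.empty())
        return;  // added guard: an empty pattern matches everywhere
    std::string::size_type start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length();  // skip the replacement so it is not rescanned
    }
}
```

Callers pass a named std::string and read the result from the same variable afterwards.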
I thought I'd toss in the boost solution as well:
#include <boost/algorithm/string/replace.hpp>
// in place
std::string in_place = "blah#blah";
boost::replace_all(in_place, "#", "@");
// copy
const std::string input = "blah#blah";
std::string output = boost::replace_all_copy(input, "#", "@");
Imagine a large binary blob where all 0x00 bytes shall be replaced by "\1\x30" and all 0x01 bytes by "\1\x31" because the transport protocol allows no \0-bytes.
In cases where:
the replacing and the to-replaced string have different lengths,
there are many occurrences of the to-replaced string within the source string and
the source string is large,
the provided solutions cannot be applied (because they replace only single characters) or have a performance problem, because they would call string::replace several times which generates copies of the size of the blob over and over.
(I do not know the boost solution, maybe it is OK from that perspective)
This one walks along all occurrences in the source string and builds the new string piece by piece once:
void replaceAll(std::string& source, const std::string& from, const std::string& to)
{
std::string newString;
newString.reserve(source.length()); // avoids a few memory allocations
std::string::size_type lastPos = 0;
std::string::size_type findPos;
while(std::string::npos != (findPos = source.find(from, lastPos)))
{
newString.append(source, lastPos, findPos - lastPos);
newString += to;
lastPos = findPos + from.length();
}
// Care for the rest after last occurrence
newString += source.substr(lastPos);
source.swap(newString);
}
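For the byte-escaping case described above, where two different bytes are mapped at once, a single pass avoids the ordering problem of running two replace-all calls back to back (replacing 0x00 first would introduce new 0x01 bytes that the second call would then escape again). A minimal sketch, assuming a hypothetical helper named escapeBlob that is not part of the original answer:

```cpp
#include <string>

// Hypothetical helper (not from the original answer): escapes 0x00 as "\1\x30"
// and 0x01 as "\1\x31" in one pass, building the result piece by piece.
std::string escapeBlob(const std::string& blob)
{
    std::string out;
    out.reserve(blob.size()); // lower bound; grows if bytes need escaping
    for (char c : blob) {
        if (c == '\0') {
            out += "\1\x30";
        } else if (c == '\1') {
            out += "\1\x31";
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Note that a blob containing 0x00 bytes has to be constructed with an explicit length, e.g. std::string(data, size), because the const char* constructor stops at the first zero byte.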
A simple find and replace for a single character would go something like:
s.replace(s.find("x"), 1, "y")
To do this for the whole string, the easy thing to do would be to loop until your s.find starts returning npos. Note that passing npos to s.replace throws std::out_of_range; I suppose you could also catch that to exit the loop, but that's kinda ugly.
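The loop described above might look like the following sketch. The function name replace_char_loop is made up for illustration; the search resumes just past each hit so that find is never asked to rescan the part already processed.

```cpp
#include <string>

// Hypothetical helper: replaces every `from` character in `s` with `to`,
// looping until find() reports npos.
void replace_char_loop(std::string& s, char from, char to)
{
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, 1, 1, to); // replace 1 char at pos with 1 copy of `to`
        ++pos;                    // continue after the replaced character
    }
}
```

Checking the result of find before calling replace means the npos case never reaches replace, so no exception handling is needed.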
For completeness, here's how to do it with std::regex.
#include <regex>
#include <string>
int main()
{
const std::string s = "example string";
const std::string r = std::regex_replace(s, std::regex("x"), "y");
}
If you're looking to replace more than a single character, and are dealing only with std::string, then this snippet works: it replaces sNeedle in sHaystack with sReplace, and sNeedle and sReplace do not need to be the same size. The while loop replaces all occurrences from left to right, rather than just the first one found. The search resumes after each replacement, so the loop also terminates when sReplace itself contains sNeedle; sNeedle must not be empty.
std::string::size_type pos = 0;
while((pos = sHaystack.find(sNeedle, pos)) != std::string::npos) {
sHaystack.replace(pos, sNeedle.size(), sReplace);
pos += sReplace.size(); // continue after the inserted text
}
As Kirill suggested, either use the replace method or iterate along the string replacing each char independently.
Alternatively you can use the find method or find_first_of depending on what you need to do. None of these solutions will do the job in one go, but with a few extra lines of code you ought to make them work for you. :-)
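As one illustration of the "few extra lines" around find_first_of, here is a sketch that replaces any character from a set with a single replacement character. The name replace_any_of is an assumption made for this example, not a standard function.

```cpp
#include <string>

// Hypothetical helper: every character of `s` that appears in `chars`
// is overwritten with `to`.
void replace_any_of(std::string& s, const std::string& chars, char to)
{
    std::string::size_type pos = 0;
    while ((pos = s.find_first_of(chars, pos)) != std::string::npos) {
        s[pos] = to;
        ++pos; // move on so the loop ends even if `to` is itself in `chars`
    }
}
```

For example, calling replace_any_of(path, "/\\:", '_') would turn all three kinds of separators in path into underscores.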
What about Abseil StrReplaceAll? From the header file:
// This file defines `absl::StrReplaceAll()`, a general-purpose string
// replacement function designed for large, arbitrary text substitutions,
// especially on strings which you are receiving from some other system for
// further processing (e.g. processing regular expressions, escaping HTML
// entities, etc.). `StrReplaceAll` is designed to be efficient even when only
// one substitution is being performed, or when substitution is rare.
//
// If the string being modified is known at compile-time, and the substitutions
// vary, `absl::Substitute()` may be a better choice.
//
// Example:
//
// std::string html_escaped = absl::StrReplaceAll(user_input, {
// {"&", "&amp;"},
// {"<", "&lt;"},
// {">", "&gt;"},
// {"\"", "&quot;"},
// {"'", "&#39;"}});
#include <iostream>
#include <string>
using namespace std;
// Replace function: compares one character at a time, so target must be a single character.
string replace(string word, string target, string replacement){
int len, loop=0;
string nword="", let;
len=word.length();
len--;
while(loop<=len){
let=word.substr(loop, 1);
if(let==target){
nword=nword+replacement;
}else{
nword=nword+let;
}
loop++;
}
return nword;
}
//Main..
int main() {
string word;
cout<<"Enter Word: ";
cin>>word;
cout<<replace(word, "x", "y")<<endl;
return 0;
}
Old School :-)
std::string str = "H:/recursos/audio/youtube/libre/falta/";
for (int i = 0; i < str.size(); i++) {
if (str[i] == '/') {
str[i] = '\\';
}
}
std::cout << str;
Result:
H:\recursos\audio\youtube\libre\falta\
For simple situations this works pretty well without using any library other than std::string (which is already in use).
Replace all occurrences of character a with character b in some_string:
for (size_t i = 0; i < some_string.size(); ++i) {
if (some_string[i] == 'a') {
some_string.replace(i, 1, "b");
}
}
If the string is large or the repeated calls to replace are an issue, you can apply the technique mentioned in this answer: https://stackoverflow.com/a/29752943/3622300
Here's a solution I rolled, in a maximal DRI spirit.
It searches for sNeedle in sHaystack and replaces it with sReplace,
nTimes times if nTimes is non-zero, otherwise for all occurrences of sNeedle.
It does not search again within the replaced text.
std::string str_replace(
std::string sHaystack, std::string sNeedle, std::string sReplace,
size_t nTimes=0)
{
size_t found = 0, pos = 0, c = 0;
size_t len = sNeedle.size();
size_t replen = sReplace.size();
std::string input(sHaystack);
do {
found = input.find(sNeedle, pos);
if (found == std::string::npos) {
break;
}
input.replace(found, len, sReplace);
pos = found + replen;
++c;
} while(!nTimes || c < nTimes);
return input;
}
I think I'd use std::replace_if()
A simple character-replacer (requested by OP) can be written by using standard library functions.
For an in-place version:
#include <string>
#include <algorithm>
void replace_char(std::string& in,
std::string::value_type srch,
std::string::value_type repl)
{
std::replace_if(std::begin(in), std::end(in),
[&srch](std::string::value_type v) { return v==srch; },
repl);
return;
}
and an overload that returns a copy if the input is a const string:
std::string replace_char(std::string const& in,
std::string::value_type srch,
std::string::value_type repl)
{
std::string result{ in };
replace_char(result, srch, repl);
return result;
}
This works! I used something similar to this for a bookstore app, where the inventory was stored in a CSV (like a .dat file). Note that when the replacement is only a single character, e.g. '|', it must still be written in double quotes ("|"): the replace overload used here expects a string, and passing a char literal fails to compile with an invalid conversion from char to const char*.
#include <iostream>
#include <string>
using namespace std;
int main()
{
int count = 0; // number of occurrences replaced so far
// final hold variable of corrected word up to the npos=j
string holdWord = "";
// a temp var in order to replace 0 to new npos
string holdTemp = "";
// a CSV line for an entry in a book store
string holdLetter = "Big Java 7th Ed,Horstman,978-1118431115,99.85";
// j = npos
for (int j = 0; j < holdLetter.length(); j++) {
if (holdLetter[j] == ',') {
if ( count == 0 )
{
holdWord = holdLetter.replace(j, 1, " | ");
}
else {
string holdTemp1 = holdLetter.replace(j, 1, " | ");
// since replacement is three positions in length,
// must replace new replacement's 0 to npos-3, with
// the 0 to npos - 3 of the old replacement
holdTemp = holdTemp1.replace(0, j-3, holdWord, 0, j-3);
holdWord = "";
holdWord = holdTemp;
}
holdTemp = "";
count++;
}
}
cout << holdWord << endl;
return 0;
}
// result:
Big Java 7th Ed | Horstman | 978-1118431115 | 99.85
I'm currently on CentOS, which is unusual for me, so my compiler version is shown below. This is g++ with its default C++ version, C++98:
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This is not the only method missing from the standard library, which was intended to be low level.
This use case and many others are covered by general-purpose libraries such as:
POCO
Abseil
Boost
QtCore
QtCore with QString is my preference: it supports UTF8 and uses fewer templates, which means understandable errors and faster compilation. It uses the "q" prefix, which makes namespaces unnecessary and simplifies headers.
Boost often generates hideous error messages and slow compile time.
POCO seems to be a reasonable compromise.
How about replacing a character with any character string using only good old C string functions?
char original[256]="First Line\nNext Line\n", dest[256]="";
const char* replace_this = "\n"; // strtok treats this as a set of single-character delimiters, not as one substring
const char* with_this = "\r\n"; // this is 2 characters but could be of any length
/* get the first token */
char* token = strtok(original, replace_this);
/* walk through other tokens */
while (token != NULL) {
strcat(dest, token);
strcat(dest, with_this);
token = strtok(NULL, replace_this);
}
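dest should now have what we are looking for. Note that strtok modifies original and skips empty tokens, so runs of consecutive delimiters collapse into a single replacement, and with_this is appended after the last token even when original does not end with a delimiter.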
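Because strtok treats its second argument as a set of delimiter characters, a search string of more than one character needs a different primitive. The following is a sketch built on std::strstr instead; the function name replace_cstr and its bounds-checking convention are assumptions made for this example, not part of the answer above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dest (capacity destSize), replacing
// every occurrence of needle with repl. Returns false if needle is empty
// or dest is too small; dest contents are unspecified in that case.
bool replace_cstr(const char* src, const char* needle, const char* repl,
                  char* dest, std::size_t destSize)
{
    const std::size_t needleLen = std::strlen(needle);
    const std::size_t replLen = std::strlen(repl);
    if (needleLen == 0 || destSize == 0) return false;

    std::size_t used = 0;
    const char* hit;
    while ((hit = std::strstr(src, needle)) != nullptr) {
        const std::size_t chunk = static_cast<std::size_t>(hit - src);
        if (used + chunk + replLen >= destSize) return false; // keep room for '\0'
        std::memcpy(dest + used, src, chunk);      // text before the match
        used += chunk;
        std::memcpy(dest + used, repl, replLen);   // the replacement
        used += replLen;
        src = hit + needleLen;                     // resume after the match
    }
    const std::size_t rest = std::strlen(src);
    if (used + rest >= destSize) return false;
    std::memcpy(dest + used, src, rest + 1);       // tail plus terminator
    return true;
}
```

Unlike the strtok loop, this leaves the input untouched, keeps empty fields between consecutive matches, and adds no replacement after the final piece unless the input really ends with the search string.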