Extract value preceded by a certain word in c++ output - c++

If you have a file containing text of a known structure, how would you extract a value preceded by a certain identifying word? Specifically, how do you extract the value from the piece of text below?
CDM-nucleon micrOMEGAs amplitudes:
proton: SI -3.443E-10
Here is how far I got with the script:
#include <string>
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
string identifier;
double value;
ifstream file("output.txt");
// Commands to extract value
file.close();
return 0;
}
Thank you very much.

It depends on the complexity and size of the output file. By 'complexity' I mean, for example, whether the identifying word can appear without a number after it, and how many different keywords you want to parse.
For a big file it may be unreasonable to load it all into memory, in which case you would process it as a stream.
In your particular case, you may want to look into this:
http://www.cplusplus.com/reference/regex/
or one of the parsers suitable for streamed data:
http://en.wikipedia.org/wiki/Flex_lexical_analyser
You might also look into other languages, such as Perl, to parse that.
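For the small, regular file shown in the question, plain stream extraction may already be enough. The following is a minimal sketch, not a definitive solution: find_value is a hypothetical helper name, and it assumes the identifier and the number sit on one line as whitespace-separated fields ("proton:", "SI", then the value).

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan `in` line by line and return the number that
// follows the two tokens `label` and `kind` (e.g. "proton:" and "SI").
std::optional<double> find_value(std::istream& in,
                                 const std::string& label,
                                 const std::string& kind)
{
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first, second;
        double value = 0.0;
        // Extraction fails, and the line is skipped, if a token is missing
        // or the third field is not a number.
        if (fields >> first >> second >> value &&
            first == label && second == kind) {
            return value;
        }
    }
    return std::nullopt;  // identifier not found
}
```

With the question's code, this could be called as find_value(file, "proton:", "SI") on the opened ifstream. Because it reads one line at a time, it does not load the whole file into memory.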

Related

how to get list of all files in a directory and sub directories(full file path)

How do I get a list of all files in a directory and its subdirectories (with full file paths), and also show it if the file name is in Russian or Arabic?
I did a lot of searching on Google but did not find anything that would solve my problem. Any help is appreciated.
It is better to include what you have tried so far to solve your problem. That way we can help you debug your own code which can be very good for you.
However, C++17 made iterating over directories very easy through the directory iterator. Read more about the directory iterator here and see the code below:
#include <string>
#include <iostream>
#include <filesystem>
namespace fs = std::filesystem;
int main()
{
std::string path = "/path/to/directory";
for (const auto & entry : fs::directory_iterator(path))
std::cout << entry.path() << std::endl;
}
Now for the second part of your question: you have the entries' paths, so you can extract the file name from each one. Then, suppose you have two dictionaries, one of Russian characters and one of Arabic characters. Iterate over the file name character by character and check whether each character is in the Russian dictionary or the Arabic one.
What you need to know is that every character, whether Arabic, Russian or anything else, has a Unicode code point (What's unicode?), meaning there is a unique value for it. Computers are good with 0s and 1s but not with human-readable characters (ASCII was made to solve this specific problem). If you are familiar with ASCII, you can consider Unicode a more inclusive character encoding standard. For instance, see this page for the encoding of Arabic characters.
PS: Use a hash table (instead of an array) for each dictionary, since it has O(1) average lookup.
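As a simpler alternative to explicit dictionaries, the check can be done on code point ranges, because the basic Cyrillic block is U+0400..U+04FF and the basic Arabic block is U+0600..U+06FF. The sketch below swaps a range test in for the dictionary lookup. The names Script, script_of and script_of_name are hypothetical, and the supplementary blocks of both scripts are deliberately ignored.

```cpp
#include <string>

enum class Script { Other, Cyrillic, Arabic };

// Classify one code point by its Unicode block. Only the basic blocks are
// checked: Cyrillic U+0400..U+04FF and Arabic U+0600..U+06FF.
Script script_of(char32_t cp)
{
    if (cp >= 0x0400 && cp <= 0x04FF) return Script::Cyrillic;
    if (cp >= 0x0600 && cp <= 0x06FF) return Script::Arabic;
    return Script::Other;
}

// Return the first non-Other script found in `name`, or Other if none.
Script script_of_name(const std::u32string& name)
{
    for (char32_t cp : name) {
        Script s = script_of(cp);
        if (s != Script::Other) return s;
    }
    return Script::Other;
}
```

A file name can be passed in as entry.path().filename().u32string(), which converts the native name to UTF-32 code points. That conversion assumes the file system names are in an encoding the library can decode.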

Fixing text in a .txt file

I've got a homework assignment that I have no idea how to even start. The instructions are to have an input .txt file with text that contains some mistakes. I have to fix that in the output .txt file, meaning, only 1 space between words, no space before a comma/punctuation and exactly 1 space after those. Capital letters at the beginning of a sentence. It also says that I don't have to use the ASCII table, because of the fact the capital letters are coded before lower case letters?
Input text example:
jaMEs , mY neIgHBor , Is A dOcTor . he SPoke eaSIlY , CLEarly And eloQuENtly.
Output:
James, my neighbor, is a doctor. He spoke easily, clearly and eloquently.
All we did in class was go over ifstream/ofstream and inputting/changing data in a .txt file, so I have no idea where to even begin. Is there a way to solve it, so it fixes any incorrect input text, or do I have to manually change every mistake in this particular text? No need to solve it for me. An example or some tips to get me started would be greatly appreciated!
Break the problem into pieces. First, read in the data from a file. Store it however you want, probably a string, then move on to the next part. Check each character and see if it is correct. If it is, move on. If not, make it correct and then move on. When you hit the end of the input, you are done.
To check whether a character is correct, compare what it is with what it should be at that position: upper or lower case, and whether a space or punctuation mark belongs there at all. If it is wrong, fix it; otherwise move on.
Inspect each character as you read it. If it's a full-stop, then remember to upcase the next alphanumeric, otherwise to downcase it. If it's a space, then just remember that you've seen a space - don't print it until you see a word character.
Something like:
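#include <cctype>
#include <iostream>
#include <iterator>
int main()
{
    bool upper_next = true;    // capitalize the next letter or digit
    char const* pending = "";  // space to print before the next word
    std::istreambuf_iterator<char> it{std::cin}, end{};
    for (; it != end; ++it) {
        // Cast first: the <cctype> functions need a non-negative value.
        unsigned char const c = static_cast<unsigned char>(*it);
        if (std::isspace(c)) {
            pending = " ";     // remember the space, print it later
        } else if (std::isalnum(c)) {
            std::cout << pending
                      << static_cast<char>(upper_next ? std::toupper(c)
                                                      : std::tolower(c));
            pending = "";
            upper_next = false;
        } else if (c == ',' || c == '.') {
            std::cout << static_cast<char>(c);  // no space before punctuation
            pending = " ";                      // exactly one space after it
            if (c == '.') upper_next = true;    // next word starts a sentence
        }
    }
}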

Trouble with special characters

First of all, I'm a newbie in programming so you might have to be patient with me. The thing is, I'm writing a program that basically gets input and uses it to output some information plus the inputs in a .doc.
My problem is that I have some constant strings that output in a screwed up way when I use special characters like é í ó ã õ º ª.
I was able to fix it by adding setlocale(LC_ALL, ("portuguese")), but then I broke the output of my inputs (that is, the variable strings), which no longer print special characters. Any clues how I can solve this? I've already tried wstrings and looked everywhere but couldn't find a single solution.
I can show my code here if it helps.
Here is an example of my problem:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string a;
wcout << "Enter special characters like éíó: ";
getline (cin, a);
cout << a;
}
I can't make the constant string and the variable string output correctly in the console at the same time.
You are probably using Windows. The Windows Command Prompt's default code page depends on the system locale (CP850 on many Western European systems); these OEM code pages are rarely used anywhere else and will display most special symbols differently from what you usually see in your favorite text editor. You can try the Windows API calls SetConsoleOutputCP(1252); and SetConsoleCP(1252); to change to CP1252, an encoding that is somewhat more compatible and should display those symbols the same way you see them in the editor. You will need #include <windows.h>, if it's available.
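A minimal sketch of how those two calls might be wrapped, assuming a Windows build and a source file saved as Windows-1252 so that string literals match the console code page. The helper name use_cp1252_console is hypothetical; on other platforms it does nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch console input and output to code page 1252.
// Returns false if either Windows call fails. On non-Windows platforms it
// is a no-op that returns true.
bool use_cp1252_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(1252) != 0 && SetConsoleCP(1252) != 0;
#else
    return true;
#endif
}
```

It would be called once at the start of main, before the first read or write, so that both the constant strings and the strings typed by the user go through the same code page.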

replace string through regex using boost C++

I have a string in which tags like this occur (there are multiple such tags):
|{{nts|-2605.2348}}
I want to use boost regex to remove |{{nts| and }} and replace the whole string that I have typed above with
-2605.2348
in original string
To make it more clear:
Suppose string is:
number is |{{nts|-2605.2348}}
I want string as:
number is -2605.2348
I am quite new to boost regex and have read many things online, but I am not able to find an answer to this. Any help would be appreciated.
It really depends on how specific you want to be. Do you want to always remove exactly |{{nts|, or do you want to remove a pipe, followed by {{, followed by any number of letters, followed by a pipe? Or do you want to remove everything that isn't whitespace between the last space and the first part of the number?
One of the many ways to do this would be something like:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
std::string str = "number is |{{nts|-2605.2348}}";
boost::regex re("\\|[^-\\d.]*(-?[\\d.]*)\\}\\}");
std::cout << regex_replace(str, re, "$1") << '\n';
}
online demo: http://liveworkspace.org/code/2B290X
However, since you're using boost, consider the much simpler and faster parsers generated by boost.spirit.
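For the strictest reading of the question (remove exactly |{{nts| and }} around a number), the pattern can be spelled out literally. The sketch below uses std::regex from the standard library in place of boost::regex, so it is a swapped-in variant rather than the code above. The helper name strip_nts_tags is hypothetical, and it assumes the number is an optionally signed decimal.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every |{{nts|NUMBER}} tag with NUMBER.
std::string strip_nts_tags(const std::string& text)
{
    // Literal "|{{nts|", then a captured optionally signed decimal number,
    // then literal "}}".
    static const std::regex tag(R"(\|\{\{nts\|(-?\d+(?:\.\d+)?)\}\})");
    return std::regex_replace(text, tag, "$1");
}
```

Tags with a different name, such as |{{foo|3}}, are left untouched by this stricter pattern.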

C++ Screen Scraping from HTML

I'm trying to extract the data "Lady Gaga Fame Monster" from the HTML below using substr and find, but I wasn't able to retrieve the data.
<div class="album-name"><strong>Album</strong> > Lady Gaga Fame Monster</div>
I tried to extract the whole string first, but I can only extract up to Album when I print line_found with cout << line_found, as there is spacing that prevents it from going further.
When I try cout << extract_line, I see no spaces in the extracted HTML code.
I tried the tutorial at http://www.cplusplus.com/reference/string/string/substr/ and it works, even with spaces. I'm following it closely, but my code stops extracting once it hits a space. Please help, it would be really appreciated. Thanks. I have been trying to figure this out for 2 days without any solution.
here's the source code:
#include "parser.h"
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
using namespace std;
int main() {
string line_found, extract_line, result, finalResult="";
int firstPosition, secondPosition, input, location;
ifstream sourceFile ("cd1.htm"); // extracts from sourcefile
while(!sourceFile.eof())
{
sourceFile >> extract_line;
location = extract_line.find("album-name");
// cout << extract_line;
if (location >=0)
{
line_found = extract_line.substr(location);
cout << line_found << endl;
firstPosition= line_found.find_first_of(">");
result = line_found.substr(firstPosition);
}
}
return 0;
}
The >> operator doesn't fetch lines. It fetches whitespace-separated tokens. Use std::getline (see here) instead.
Better still, don't use string searching tools to parse HTML. It's a disaster waiting to happen. In fact, it's happening to you right now. Note that there is more than one instance of > in your line, so you will probably find the wrong one and get yourself in a complete muddle trying to skip all the ones that don't matter (you could try looking for " > ", but what if you encounter this: ...class="album-name" > <strong>..., which is perfectly valid HTML?).
If the HTML is proper XHTML, use an XML parser instead. Expat, for instance, is small, fast and (relatively) simple to use. You can find a nice, easy intro here.
If the HTML is messy, you're going to struggle with C++. There's a related SO question here. Alternatively, use a language with a good HTML library such as Python (Beautiful Soup), which you can call from C++.
Another lightweight and simple option could be to use a regex. VS2010 and VS2008 (SP1 IIRC) come with the <regex> header, which should allow much more control and flexibility than your approach.
It wouldn't be as robust as Marcelo's approach but would be quicker to get started with.
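A minimal sketch of the regex route with std::regex. The pattern and the helper name match_album are hypothetical and fitted to the one line in the question; optional whitespace around the > characters is allowed, but any other change in the markup would break the match.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: capture the text after "</strong> >" inside a
// div whose class is "album-name".
std::optional<std::string> match_album(const std::string& line)
{
    static const std::regex pattern(
        R"re(class="album-name"\s*>\s*<strong>[^<]*</strong>\s*>\s*([^<]*?)\s*</div>)re");
    std::smatch m;
    if (std::regex_search(line, m, pattern)) {
        return m[1].str();  // first capture group: the album text
    }
    return std::nullopt;
}
```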