The application's purpose is to translate the lemmas of the words in a sentence from Russian to English. I do this with an sdict-formatted dictionary, which is queried by a Python script that is called from a C++ program.
My goal is to get the following output:
Выставка/exhibition::1 конгресс/congress::2 организаторами/organizer::3 которой/ which::4 являются/appear::5 РАО/NONE::6 ЕЭС/NONE::7 России/NONE::8 EESR/NONE::9 нефтяная/oil::10 компания/company::11 ЮКОС/NONE::12 YUKOS/NONE::13 и/and::14 администрация/administration::15 Томской/NONE::16 области/region::17 продлится/last::18 четыре/four::19 дня/day::20
The code below works for the first sentence, but for the second and all later sentences I get wrong output:
Егор/NONE::1 Гайдар/NONE::2 возглавлял/NONE::3 первое/head::4 российское/first::5 правительство/NONE::6 которое/government::7 называли/which::8 правительством/call::9 камикадзе/government::10
Note: NONE is used for words lacking translation.
I'm running the following C++ code excerpt which actually calls PyRun_SimpleString:
for (unsigned int i = 0; i < theSentenceRows->size(); i++){
stringstream ss;
ss << (i + 1);
parsedFormattedOutput << theSentenceRows->at(i)[FORMINDEX] << "/";
getline(lemmaOutFileForTranslation, lemma);
PyObject *main_module, *main_dict;
PyObject *toTranslate_obj, *translation, *emptyString;
/* Setup the __main__ module for us to use */
main_module = PyImport_ImportModule("__main__");
main_dict = PyModule_GetDict(main_module);
/* Inject a variable into __main__, in this case toTranslate */
toTranslate_obj = PyString_FromString(lemma.c_str());
PyDict_SetItemString(main_dict, "start_word", toTranslate_obj);
/* Run the code snippet above in the current environment */
PyRun_SimpleString(pycode);
usleep(2);
translation = PyDict_GetItemString(main_dict, "translation");
Py_XDECREF(toTranslate_obj);
/* writing results */
parsedFormattedOutput << PyString_AsString(translation) << "::" << ss.str() << " ";
}
Where pycode is defined as:
const char *pycode =
"import sys\n"
"import re\n"
"import sdictviewer.formats.dct.sdict as sdict\n"
"import sdictviewer.dictutil\n"
"dictionary = sdict.SDictionary( 'rus_eng_full2.dct' )\n"
"dictionary.load()\n"
"translation = \"*NONE*\"\n"
"p = re.compile('( )([a-z]+)(.*?)( )')\n"
"for item in dictionary.get_word_list_iter(start_word):\n"
"    try:\n"
"        if start_word == str(item):\n"
"            instance, definition = item.read_articles()[0]\n"
"            translation = p.findall(definition)[0][1]\n"
"    except:\n"
"        continue\n";
I noticed some delay in the second sentence's output, so I added usleep(2); to the C++ code, thinking that the call to PyRun_SimpleString might not be synchronous. It didn't help, however, and I'm not sure that this is the cause. The delay appears in every sentence after the first and keeps growing.
So, is the call to PyRun_SimpleString synchronous? Or is the way I share variable values between C++ and Python wrong?
Thank you in advance.
According to the docs, it is synchronous.
I would advise you to test the Python code separately from the C++ code; that would make debugging much easier. One way of doing that is to paste the code into the interactive interpreter and execute it line by line. And when debugging, I would second Winston Ewert's comment not to discard exceptions.
I use Manjaro Linux, DISTRIB_RELEASE=22.0.0, GNOME 43.1, Kernel 5.19.17-2, and I used zsh.
I decided to learn C++, but I ran into a problem. If I don't add std::endl when writing to the console, a "%" symbol is appended to the output.
See the screenshots attached.
Code1:
#include <iostream>
int main()
{
    int age;
    age = 28;
    std::cout << "Age = " << age;
    return 0;
}
Code2:
#include <iostream>
int main()
{
    int age;
    age = 28;
    std::cout << "Age = " << age << std::endl;
    return 0;
}
Why is this happening? All I tried was just adding std::endl. I want to know why the "%" symbol is being added.
Ah, you're omitting the final line break.
If your shell were completely faithful to what your program actually wrote, it would therefore display the prompt on the same line as your output.
Now, that would look terrible and be confusing. So, instead your shell inserts a special character with a special background color to mark "hey, this isn't the program's output, but I'm still inserting a line break here, because I don't hate you, dear user".
That percentage symbol is not from your program. It's your shell trying to be sensible.
From the man page for zsh:
When a partial line is preserved, by default you will see an inverse+bold character at the end of the partial line: a % for a normal user or a # for root. If set, the shell parameter PROMPT_EOL_MARK can be used to customize how the end of partial lines are shown.
I've noticed that a lot of command line tools, wget for example, will show progress as a number or progress bar that advances as a process is completed. While the question isn't really language-specific, out of the languages I use most often for command line tools (C++, Node.js, Haskell) I haven't seen a way to do this.
Here's an example, three snapshots of a single line of Terminal as wget downloads a file:
Along with other information, wget shows a progress bar (<=>) that advances as it downloads a file. The amount of data downloaded so far (6363, 179561, 316053) and the current download speed (10.7KB/s, 65.8KB/s, 63.0KB/s) update as well. How is this done?
Ideally, please include a code sample from one or more of the three languages mentioned above.
Just print a CR (without a newline) to overwrite a line. Here is an example program in perl:
#!/usr/bin/env perl
$| = 1;
for (1..10) {
    print "the count is: $_\r";
    sleep(1)
}
I've also disabled output buffering ($| = 1) so that the print command sends its output to the console immediately instead of buffering it.
Haskell example:
import System.IO
import Control.Monad
import Control.Concurrent
main = do
  hSetBuffering stdout NoBuffering
  forM_ [1..10] $ \i -> do
    putStr $ "the count is: " ++ show i ++ "\r"
    threadDelay 1000000
Looking at progress.c in the GNU wget repo on GitHub, it seems they do it the same way, i.e. print a \r and then overwrite the line.
/* Print the contents of the buffer as a one-line ASCII "image" so
   that it can be overwritten next time. */
static void
display_image (char *buf)
{
  bool old = log_set_save_context (false);
  logputs (LOG_VERBOSE, "\r");
  logputs (LOG_VERBOSE, buf);
  log_set_save_context (old);
}
I can only speak about Node.js, but its built-in readline module has some very basic screen handling functionality. For example:
var readline = require('readline');
var c = 0;
var intvl = setInterval(function() {
  // Clear entirety of current line
  readline.clearLine(process.stdout, 0);
  readline.cursorTo(process.stdout, 0);
  process.stdout.write('Progress: ' + (++c) + '%');
  if (c === 100)
    clearInterval(intvl);
}, 500);
There are also third party modules if you want to get fancier, such as multimeter/meterbox and blessed/blessed-contrib.
Generally speaking, though, some programs use ncurses, while others simply output the ANSI escape codes themselves to clear and redraw the current line.
They probably use the fancy ncurses library but on my Linux for my personal command-line tools I simply send '\r' to move the cursor back to the start of the line to overwrite it with new progress information.
#include <thread>
#include <chrono>
#include <iostream>
int main()
{
    for(auto i = 0; i < 100; ++i)
    {
        std::cout << "\rprogress: " << i << "% " << std::flush;
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    std::cout << "\rprogress: DONE " << std::flush;
}
As part of a bigger application I am working on a class for reading input from a text file for use in the initialization of the program. Now I am myself fairly new to programming, and I only started to learn C++ in December, so I would be very grateful for some hints and ideas on how to get started! I apologise in advance for a rather long wall of text.
The text file format is "keyword-driven" in the following way:
There are a rather small number of main/section keywords (currently 8) that need to be written in a given order. Some of them are optional, but if they are included they should adhere to the given ordering.
Example:
Suppose there are 3 potential keywords, ordered as follows:
"KEY1" (required)
"KEY2" (optional)
"KEY3" (required)
If the input file only includes the required ones, the ordering should be:
"KEY1"
"KEY3"
Otherwise it should be:
"KEY1"
"KEY2"
"KEY3"
If all the required keywords are present, and the total ordering is ok, the program should proceed by reading each section in the sequence given by the ordering.
Each section will include a (possibly large) amount of subkeywords, some of which are optional and some of which are not, but here the order does NOT matter.
Lines starting with characters '*' or '--' signify commented lines, and they should be ignored (as well as empty lines).
A line containing a keyword should (preferably) include nothing else than the keyword. At the very least, the keyword must be the first word appearing there.
I have already implemented parts of the framework, but I feel my approach so far has been rather ad hoc. Currently I have manually created one method per section/main keyword, and the first task of the program is to scan the file to locate these keywords and pass the necessary information on to the methods.
I first scan through the file using an std::ifstream object, removing empty and/or commented lines and storing the remaining lines in an object of type std::vector<std::string>.
Do you think this is an ok approach?
Moreover, I store the indices where each of the keywords start and stop (in two integer arrays) in this vector. This is the input to the above-mentioned methods, and it would look something like this:
bool readMAINKEY(int start, int stop);
Now I have already done this, and even though I do not find it very elegant, I guess I can keep it for the time being.
However, I feel that I need a better approach for handling the reading inside of each section, and my main issue is how should I store the keywords here? Should they be stored as arrays within a local namespace in the input class or maybe as static variables in the class? Or should they be defined locally inside relevant functions? Should I use enums? The questions are many!
Now I've started by defining the sub-keywords locally inside each readMAINKEY() method, but I found this to be less than optimal. Ideally I want to reuse as much code as possible inside each of these methods, calling upon a common readSECTION() method, and my current approach seems to lead to much code duplication and potential for error in programming. I guess the smartest thing to do would simply be to remove all the (currently 8) different readMAINKEY() methods, and use the same function for handling all kinds of keywords. There is also the possibility for having sub-sub-keywords etc. as well (i.e. a more general nested approach), so I think maybe this is the way to go, but I am unsure on how it would be best to implement it?
Once I've processed a keyword at the "bottom level", the program will expect a particular format of the following lines depending on the actual keyword. In principle each keyword will be handled differently, but here there is also potential for some code reuse by defining different "types" of keywords depending on what the program expects to do after triggering the reading of it. Common tasks include e.g. parsing an integer or a double array, but in principle it could be anything!
If a keyword for some reason cannot be correctly processed, the program should attempt as far as possible to use default values instead of terminating the program (if reasonable), but an error message should be written to a logfile. For optional keywords, default values will of course also be used.
In order to summarise, therefore, my main questions are the following:
1. Do you think my approach of storing the relevant lines in a std::vector<std::string> is reasonable?
This will of course require me to do a lot of "indexing work" to keep track of where in the vector the different keywords are located. Or should I work more "directly" with the original std::ifstream object? Or something else?
2. Given such a vector storing the lines of the text file, how can I best go about detecting the keywords and reading the information that follows them?
Here I will need to take account of possible ordering and whether a keyword is required or not. Also, I need to check whether the lines following each "bottom level" keyword are in the format expected in each case.
One idea I've had is to store the keywords in different containers depending on whether they are optional or not (or maybe use object(s) of type std::map<std::string,bool>), and then remove them from the container(s) once correctly processed, but I am not sure exactly how I should go about it.
I guess there is really a thousand different ways one could answer these questions, but I would be grateful if someone more experienced could share some ideas on how to proceed. Is there e.g. a "standard" way of doing such things? Of course, a lot of details will also depend on the concrete application, but I think the general format indicated here can be used in a lot of different applications without a lot of tinkering if programmed in a good way!
UPDATE
Ok, so let me try to be more concrete. My current application is supposed to be a reservoir simulator, so as part of the input I need information about the grid/mesh, about rock and fluid properties, about wells/boundary conditions throughout the simulation and so on. At the moment I've been thinking about using (almost) the same set-up as the commercial Eclipse simulator when it comes to input, for details see
http://petrofaq.org/wiki/Eclipse_Input_Data.
However, I will probably change things a bit, so nothing is set in stone. Also, I am interested in making a more general "KeywordReader" class that with slight modifications can be adapted for use in other applications as well, at least if it can be done in a reasonable amount of time.
As an example, I can post the current code that does the initial scan of the text file and locates the positions of the main keywords. As I said, I don't really like my solution very much, but it seems to work for what it needs to do.
At the top of the .cpp file I have the following namespace:
//Keywords used for reading input:
namespace KEYWORDS{
/*
* Main keywords and corresponding boolean values to signify whether or not they are required as input.
*/
enum MKEY{RUNSPEC = 0, GRID = 1, EDIT = 2, PROPS = 3, REGIONS = 4, SOLUTION = 5, SUMMARY =6, SCHEDULE = 7};
std::string mainKeywords[] = {std::string("RUNSPEC"), std::string("GRID"), std::string("EDIT"), std::string("PROPS"),
std::string("REGIONS"), std::string("SOLUTION"), std::string("SUMMARY"), std::string("SCHEDULE")};
bool required[] = {true,true,false,true,false,true,false,true};
const int n_key = 8;
}//end KEYWORDS namespace
Then further down I have the following function. I am not sure how understandable it is, though.
bool InputReader::scanForMainKeywords(){
logfile << "Opening file.." << std::endl;
std::ifstream infile(filename);
//Test if file was opened. If not, write error message:
if(!infile.is_open()){
logfile << "ERROR: Could not open file! Unable to proceed!" << std::endl;
std::cout << "ERROR: Could not open file! Unable to proceed!" << std::endl;
return false;
}
else{
logfile << "Scanning for main keywords..." << std::endl;
int nkey = KEYWORDS::n_key;
//Initially no keywords have been found:
startIndex = std::vector<int>(nkey, -1);
stopIndex = std::vector<int>(nkey, -1);
//Variable used to control that the keywords are written in the correct order:
int foundIndex = -1;
//STATISTICS:
int lineCount = 0;//number of non-comment lines in text file
int commentCount = 0;//number of commented lines in text file
int emptyCount = 0;//number of empty lines in text file
//Create lines vector:
lines = std::vector<std::string>();
//Remove comments and empty lines from text file and store the result in the member variable lines:
std::string str;
while(std::getline(infile,str)){
if(str.size()>=1 && str.at(0)=='*'){
commentCount++;
}
else if(str.size()>=2 && str.at(0)=='-' && str.at(1)=='-'){
commentCount++;
}
else if(str.size()==0){
emptyCount++;
}
else{
//Found a non-empty, non-comment line.
lines.push_back(str);//store in std::vector
//Start by checking if the first word of the line is one of the main keywords. If so, store the location of the keyword:
std::string fw = IO::getFirstWord(str);
for(int i=0;i<nkey;i++){
if(fw.compare(KEYWORDS::mainKeywords[i])==0){
if(i > foundIndex){
//Found a valid keyword!
foundIndex = i;
startIndex[i] = lineCount;//store where the keyword was found!
//logfile << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
//std::cout << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
break;//fw cannot equal several different keywords at the same time!
}
else{
//we have found a keyword, but in the wrong order... Terminate program:
std::cout << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
logfile << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
return false;
}
}
}//end for loop
lineCount++;
}//end else (found non-comment, non-empty line)
}//end while (reading ifstream)
logfile << "\n";
logfile << "FILE STATISTICS:" << std::endl;
logfile << "Number of commented lines: " << commentCount << std::endl;
logfile << "Number of non-commented lines: " << lineCount << std::endl;
logfile << "Number of empty lines: " << emptyCount << std::endl;
logfile << "\n";
/*
Print lines vector to screen:
for(int i=0;i<lines.size();i++){
std:: cout << "Line nr. " << i << " : " << lines[i] << std::endl;
}*/
/*
* So far, no keywords have been entered in the wrong order, but have all the necessary ones been found?
* Otherwise return false.
*/
for(int i=0;i<nkey;i++){
if(KEYWORDS::required[i] && startIndex[i] == -1){
logfile << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;
logfile << "Cannot proceed with initialisation!" << std::endl;
std::cout << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;
std::cout << "Cannot proceed with initialisation!" << std::endl;
return false;
}
}
//If everything is in order, we also initialise the stopIndex array correctly:
int counter = 0;
//Find first existing keyword:
while(counter < nkey && startIndex[counter] == -1){
//Keyword doesn't exist. Leave stopindex at -1!
counter++;
}
//Store stop index of each keyword:
while(counter<nkey){
int offset = 1;
//Find next existing keyword:
while(counter+offset < nkey && startIndex[counter+offset] == -1){
offset++;
}
if(counter+offset < nkey){
stopIndex[counter] = startIndex[counter+offset]-1;
}
else{
//reached the end of array!
stopIndex[counter] = lines.size()-1;
}
counter += offset;
}//end while
/*
//Print out start/stop-index arrays to screen:
for(int i=0;i<nkey;i++){
std::cout << "Start index of " << KEYWORDS::mainKeywords[i] << " is : " << startIndex[i] << std::endl;
std::cout << "Stop index of " << KEYWORDS::mainKeywords[i] << " is : " << stopIndex[i] << std::endl;
}
*/
return true;
}//end else (file opened properly)
}//end scanForMainKeywords()
You say your purpose is to read initialization data from a text file.
It seems you need to parse (syntax-analyze) this file and store the data under the right keys.
If the syntax is fixed and each construct starts with a keyword, you could write a recursive descent (LL(1)) parser that builds a tree (each node holding an STL vector of sub-branches) to store your data.
If the syntax is free, you might pick JSON or XML and use an existing parsing library.
I'm a bit confused. I'm trying to do some C++ and Python integration, but it's less than straightforward. I'm not using Boost, because I couldn't get Boost::Python to compile properly. But that's another story.
Currently, here's what I'm doing in C++:
//set everything up
PyObject* py_main_module = PyImport_AddModule("__main__");
PyObject* py_global_dict = PyModule_GetDict(py_main_module);
PyObject* py_local_dict = PyDict_New();
PyObject* py_return_value;
PyRun_SimpleString(data.c_str()); //runs Python code, which defines functions
//call a function defined by the python code
py_return_value = PyRun_String("test()", Py_single_input, py_global_dict, py_local_dict);
//attempt to check the type of the returned value
if(py_return_value != NULL) {
    //this is the problem: all of these print 0
    cout << PyList_Check(py_return_value) << endl;
    cout << PySet_Check(py_return_value) << endl;
    cout << PyFloat_Check(py_return_value) << endl;
} else {
    cout << "IT WAS NULL?!" << endl;
}
The Python program (input to the C++ program as the string named "data"):
def test():
    derp = 1.234
    #derp = [1, 2, 3, 4]
    #derp = set([1, 2, 3, 4])
    return derp
Now, the problem is that the type checks aren't working. They all return 0, regardless of whether the Python function is returning a float, a list, or a set. What am I doing wrong?
Bonus points if anyone can tell me why the call to PyRun_String prints the returned value in the console. It's really annoying.
From the docs:
int Py_eval_input
The start symbol from the Python grammar for isolated expressions; for use with Py_CompileString().
int Py_file_input
The start symbol from the Python grammar for sequences of statements as read from a file or other source; for use with
Py_CompileString(). This is the symbol to use when compiling
arbitrarily long Python source code.
int Py_single_input
The start symbol from the Python grammar for a single statement; for use with Py_CompileString(). This is the symbol used for the
interactive interpreter loop.
Py_single_input evaluates the string as a statement. Statements don't inherently return anything, so you'll get None back from PyRun_String. Use Py_eval_input instead to evaluate the string as an expression and get a result.
Changing Py_single_input to Py_eval_input seems to resolve both issues.
The former treats the string as part of the interpreter loop, while the latter evaluates a single expression and gives you an object back. (I'm not sure what the return value means in the former case, but it's not the value of the expression.)
EDIT: Just tested it, and as per nneonneo's answer below, the result with Py_single_input is indeed Py_None.
I am doing some scientific work on a system with a queue. The cout output is written to a log file whose name is specified via command-line options when submitting to the queue. However, I also want a separate output to a file, which I implement like this:
ofstream vout("potential.txt"); ...
vout<<printf("%.3f %.5f\n",Rf*BohrToA,eval(0)*hatocm);
However it gets mixed in with the output going to cout and I only get some cryptic repeating numbers in my potential.txt. Is this a buffer problem? Other instances of outputting to other files work... maybe I should move this one away from an area that is cout heavy?
You are sending the value returned by printf to vout, not the formatted string.
You should simply do:
vout << Rf*BohrToA << " " << eval(0)*hatocm << "\n";
You are getting your C and C++ mixed together.
printf is a function from the C library that prints a formatted string to standard output. ofstream and its << operator are how you write to a file in C++ style.
You have two options here: you can either print it out the C way or the C++ way.
C style:
#include <cstdio>
FILE* vout = fopen("potential.txt", "w");
fprintf(vout, "%.3f %.5f\n",Rf*BohrToA,eval(0)*hatocm);
fclose(vout); // close the file once all rows are written
C++ style:
#include <iomanip>
//...
ofstream vout("potential.txt");
vout << fixed << setprecision(3) << (Rf*BohrToA) << " ";
vout << setprecision(5) << (eval(0)*hatocm) << endl;
If this is on a *nix system, you can simply write your program to send its output to stdout and then use a pipe and the tee command to direct the output to one or more files as well. e.g.
$ command parameters | tee outfile
will cause the output of command to be written to outfile as well as the console.
You can also do this on Windows if you have the appropriate tools installed (such as GnuWin32).