PX Transform Routine compile issues - c++

I have a transformer routine written in C++ that is meant to strip all whitespace and map to a replacement value if the input string is either null or empty. The C++ code compiles and tests correctly on its own, but I am having trouble getting the routine to work in DataStage.
As per instructions, I have copied the exact compiler options that I have in my DS Environment as below.
g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small BlankToValue.cpp
g++ -shared -m64 -o BlankToValue.so BlankToValue.o
When testing the routine in a job however I get the following error.
Sequential_File_36,0: Internal Error: (shbuf): iomgr/iomgr.C: 2649
Is there a different set of options I should be using for compilation?
For reference, the c++ code.
#include <stdlib.h>
#include <stdio.h>
#include <algorithm>
#include <locale.h>
#include <locale>
char * BlankToValue(char *InStr, char *RepStr)
{
    if (InStr[0] == '\0') // Check for null pointer at first character of input string.
    {
        return RepStr; // Return replacement string if true. This is to prevent unnecessary processing.
    } else
    {
        const char* checkstr = InStr; // Establish copy of inputstring stored in checkstring.
        do { // Start outer loop.
            while (isspace(*checkstr)) { // Inner loop while current checkstring byte is whitespace.
                ++checkstr; // Increment to next checkstring byte.
            }
        } while ((*InStr++ = *checkstr++)); // Set inputstring byte to current checkstring and iterate both. Breaks when either string evaluates to null.
        *InStr = '\0'; // Set null terminator for input string at current byte location.
        if (InStr[0] == '\0') // Checks first character of cleaned input string for null pointer.
        {
            return RepStr; // Return replacement string if true.
        } else
        {
            return InStr; // Return new input string if false.
        }
    }
}

William,
in your DataStage routine definition that points to this custom function, did you select the routine type as object (a .o file that is compiled into the transformer stage at job run time) or as library (a lib.so file that is loaded at job run time, which must follow the library naming convention and be located in the library path)? Your commands above suggest you are creating a *.so file, but its name is not prefixed with lib. Here is an example:
https://www.ibm.com/support/pages/node/403041
Additionally, if the first error in job log was not a library load error but rather was the internal error (shbuf) error, I found a couple of cases where that has occurred in the past with custom routines:
In one case, the custom routine involved null handling, as yours does, and began to fail after upgrading to Information Server 8.5, when the null handling rules in our product changed. The changes are explained here:
https://www.ibm.com/support/pages/node/433863
You could test if this is the issue by running the job with new job level environment variable: APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING=1
In another case, the shbuf error in a custom routine was the result of the transformer stage receiving a large record (larger than could be handled by the datatype defined in the custom routine). Does the job still fail when using only a single sample input record with small values in all string type fields?
Thanks.

Also, I noticed that the error is coming from the sequential file stage rather than from the transformer stage that uses the custom routine. You may therefore also need to consider the output datatype of your custom routine and ensure that it exits with a valid value that is not too large for the datatype and also not larger than the default transport buffer size used between stages (which defaults to 128k).

After a day or two of trying different compile and code approaches, I found the solution to my problem. The code below was throwing a segmentation fault when passed a null column, which makes sense in retrospect.
if (InStr[0] == '\0')
It has been corrected to the below and now everything works properly.
if ((InStr == NULL) || (InStr[0] == '\0'))
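For reference, the corrected guard can be combined with the whitespace stripping in one routine. This is only a sketch, not the code that was deployed: the name BlankToValueSafe and the separate read and write pointers are assumptions made for illustration. It also keeps a pointer to the start of the buffer, because the loop in the question advances InStr itself before returning it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical null-safe variant of the routine; the name is illustrative only.
char* BlankToValueSafe(char* inStr, char* repStr)
{
    // Guard against a null pointer before touching the first character.
    if (inStr == NULL || inStr[0] == '\0') {
        return repStr;
    }

    char* out = inStr;  // write position; inStr keeps pointing at the start
    for (const char* in = inStr; *in != '\0'; ++in) {
        // isspace() needs a value representable as unsigned char.
        if (!std::isspace(static_cast<unsigned char>(*in))) {
            *out++ = *in;
        }
    }
    *out = '\0';  // terminate the compacted string

    // If only whitespace was present, fall back to the replacement value.
    return (inStr[0] == '\0') ? repStr : inStr;
}
```

Called with a null pointer, an empty string, or a string of only whitespace, the sketch returns the replacement string; otherwise it returns the input buffer with whitespace removed in place.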

Turbo C++ system function running an executable

How can I run an exe file from Turbo C++? I know that I should stop using Turbo C++ and move on to Dev or Code::Blocks, but my school doesn't agree, so I gotta wing it.
I just want to know how to run a file with or without the system() function.
Any kind of advice is welcome.
Here's what I have tried so far:
1
#include<process.h>
int main()
{
system("tnfsv13.exe"); //tnfsv being a 16-bit application(The need for slowness v 13)
return 0;
}
2
#include<process.h>
int main()
{
system("tnfsv13.bat");
return 0;
}
tnfsv13.bat:
start "c:\TurboC3\BIN\" tnfsv13.exe
NOTE: Just a doubt, you guys: system() is not working in windows XP. I tried it using dosbox in windows 7 and it works well, but in XP it does absolutely nothing. Not even the system("dir") command seems to work but system(NULL) returns 1. Any guesses why?
Thanks.
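You can also use Turbo C++'s execl() function. The call below loads and runs C:\TC\BIN\tnfsv13.exe. The second argument is arg0, which must not be NULL and is usually set to the program name; the final NULL terminates the argument list, so no further arguments are sent to tnfsv13.exe. If an error occurs, execl() returns -1 into int c.
#include<stdio.h>
#include<process.h>
int main()
{
    /* arg0 is the program name; the final NULL ends the argument list */
    int c = execl("C:\\TC\\BIN\\tnfsv13.exe", "tnfsv13.exe", NULL);
    /* only reached if execl() failed */
    if (c == -1)
        perror("execl");
    return 0;
}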
Explanation:
execl() loads and executes a new child process. Because the child
process is placed in the memory currently occupied by the calling
process, there must be sufficient memory to load and execute it.
'pathname' specifies the file name of the child process. If
'pathname' has a file name extension, then only that file is searched
for. If 'pathname' ends with a period (.), then 'pathname' without an
extension is searched for. If that filename is not found, then
".EXE" is appended and execl() searches again. If 'pathname' has no
extension and does not end with a period, then execl() searches for
'pathname' and, if it is not found, appends ".COM" and searches
again. If that is not found, it appends ".EXE" and searches again.
'arg0', 'arg1',...'argn' are passed to the child process as command-
line parameters. A NULL pointer must follow 'argn' to terminate the
list of arguments. 'arg0' must not be NULL, and is usually set to
'pathname'.
The combined length of all the strings forming the argument list
passed to the child process must not exceed 128 bytes. This includes
"n" (for 0-n arguments) space characters (required to separate the
arguments) but does not include the null ('\0') terminating
character.
Returns: If execl() is successful, it does not return to the
calling process. (See the spawn...() routines for a
similar function that can return to the calling
process). If an error occurs, execl() returns -1 to
the calling process. On error, 'errno' (defined in
<errno.h>) is set to one of the following values
(defined in <errno.h>):
E2BIG Argument list or environment list too big.
(List > 128 bytes, or environment > 32k)
EACCES Locking or sharing violation on file.
(MS-DOS 3.0 and later)
EMFILE Too many files open.
ENOENT File or path not found.
ENOEXEC File not executable.
ENOMEM Not enough memory.
Notes: Any file open when an exec call is made remains open
in the child process. This includes
'stdin','stdout', 'stderr', 'stdaux', and 'stdprn'.
The child process acquires the environment of the
calling process.
execl() does not preserve the translation modes of
open files. Use setmode() in the child process to
set the desired translation modes.
See the spawn...() routines for similar though more
flexible functions that can return to the calling
program.
Caution: The file pointers to open buffered files are not
always preserved correctly. The information in the
buffer may be lost.
Signal settings are not preserved. They are reset to
the default in the child process.
-------------------------------- Example ---------------------------------
The following statements transfer execution to the child process
"child.exe" and pass it the three arguments "child", "arg1",
and "arg2":
#include <process.h> /* for 'execl' */
#include <stdio.h> /* for 'printf' and 'NULL' */
#include <errno.h> /* for 'errno', 'ENOENT' and 'ENOMEM' */
main()
{
execl("child.exe", "child", "arg1", "arg2", NULL);
/* only get here on an exec error */
if (errno == ENOENT)
printf("child.exe not found in current directory\n");
else if (errno == ENOMEM)
printf("not enough memory to execute child.exe\n");
else
printf(" error #%d trying to exec child.exe\n", errno);
}
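The 128-byte limit described in the help text above can be checked before calling execl(). The sketch below is written in standard C++ rather than the Turbo C++ dialect, and the names combinedArgLength and fitsExecLimit are hypothetical helpers introduced only for illustration. It counts the argument strings plus one separating space between adjacent arguments, and leaves out the terminating '\0', as the help text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Limit quoted in the help text above.
const std::size_t kMaxArgListBytes = 128;

// Combined length of arg0..argn plus the n separating spaces,
// not counting any terminating '\0'.
std::size_t combinedArgLength(const std::vector<std::string>& args)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        total += args[i].size();
    }
    if (args.size() > 1) {
        total += args.size() - 1;  // one space between adjacent arguments
    }
    return total;
}

// True when the argument list stays within the documented limit.
bool fitsExecLimit(const std::vector<std::string>& args)
{
    return combinedArgLength(args) <= kMaxArgListBytes;
}
```

For the arguments "child", "arg1" and "arg2" from the example, the combined length is 5 + 4 + 4 characters plus 2 spaces.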
system() works fine, though it may not work exactly the way you expect: it does the same thing as typing a command at an MS-DOS (or Win32) command prompt, including input and output being connected to the console.
If you just want to run a program, pass parameters, and not return from it, use a convenient form from the exec() family of functions. See this for one example.

How to do proper error handling in BNFC? (C++, Flex, Bison)

I'm making a compiler with BNFC, and it has reached a stage where it already compiles some programs and the generated code works on my device. But before shipping it, I want my compiler to return proper error messages when the user tries to compile an invalid program.
I found out how bison can write errors to the stderr stream, and I'm able to catch those. Now suppose the user's code has no syntax error but references an undefined variable. I'm able to catch this in my visitor, but I don't know what the line number was. How can I find the line number?
In bison you can access the starting and ending position of the current expression using the variable @$, which contains a struct with the members first_column, first_line, last_column and last_line. Similarly, @1 etc. contain the same information for the sub-expressions $1 etc., respectively.
In order to have access to the same information later, you need to write it into your ast. So add a field to your AST node types to store the location and then set that field when creating the node in your bison file.
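As a rough illustration of storing the location in the AST, here is a minimal sketch. The types SourceLoc and VarRef and the function undefinedVariableMessage are hypothetical names, not part of BNFC or bison; SourceLoc simply mirrors the four members listed above. How the node classes in a real BNFC project are declared is not shown in this thread, so treat the layout as an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical mirror of the four location members described above.
struct SourceLoc {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
};

// Hypothetical AST node for a variable reference that carries its location.
struct VarRef {
    std::string name;
    SourceLoc loc;
};

// Build the message a visitor could emit when it meets an undefined variable.
std::string undefinedVariableMessage(const VarRef& ref)
{
    std::ostringstream msg;
    msg << "line " << ref.loc.first_line
        << ", column " << ref.loc.first_column
        << ": undefined variable '" << ref.name << "'";
    return msg.str();
}
```

In the grammar action that creates the node, the location of the relevant symbol (for example @1) would be copied into loc, so the visitor can report it long after parsing has finished.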
(The previous answer is more complete.) In some simple parsers, though, it can help to declare
%option yylineno
in flex and print it in yyerror:
void yyerror(const char *s) {
    fprintf(stderr, "ERROR (line %d): before '%s'\n-%s", yylineno, yytext, s);
}

C++ output to windows terminal using cout<<term_cc<color, default, attrib> outputs colors and attributes properly on Windows but not on Linux

Thought I was done and ready to submit this little project until I got this unexpected curveball.
The objective is to make a parser using a token lexer.
Essentially
<underline><red> R <green> G </green> <blue> B </blue> and back to red </red></underline>
will output as: "RGB and back to red" in their respective colors and attributes.
Everything works fine on windows but when I moved it over to the Linux systems it outputs the color codes with nothing happening.
#include <iostream>
#include <sstream>
#include <stack>
#include <map>
#include <cstdlib>
#include <vector>
#include "cmd.h"
#include "Lexer.h" // you should make use of the provided Lexer
#include "term_control.h"
#include "error_handling.h"
using namespace std;
map<string, term_colors_t> colorMap;
map<string, term_attrib_t> attribMap;
string display(const string& expression)
{
    if(validate(expression) == "VALID") {
        Lexer lex;
        Token tok;
        vector<term_colors_t> colorVect;
        vector<term_attrib_t> attribVect;
        lex.set_input(expression);
        while(lex.has_more_token()){
            tok = lex.next_token();
            string sTok = tok.value;
            if(tok.type == TAG && tok.value.at(0) != '/'){
                cout<<term_cc(colorMap[tok.value], DEFAULT_COLOR, attribMap[tok.value]);
                colorVect.push_back(colorMap[tok.value]);
                attribVect.push_back(attribMap[tok.value]);
            }
            if(tok.type == TAG && tok.value.at(0) == '/'){
                colorVect.pop_back();
                cout<<term_cc(colorVect.back(), DEFAULT_COLOR, attribVect.back());
            }
            if(tok.type != TAG){
                cout<<tok.value;
            }
        }
    }
    else if(validate(expression) != "VALID") return validate(expression);
    return "";
}
cout<<term_cc(Color, DEFAULT_COLOR, Attribute)
is the specific call where the problem is hiding. I have been searching around and can't seem to find the proper method.
cout<<term_fg(color)
That call properly displays color on the Linux system, but I cannot have attributes with it.
Everything I've been reading pertained only to color, not color and attributes. Those examples also used the echo command and hard-coded colors for specific terminals. They would require serious changes in all my code and cause it to work only on Linux and not on Windows, so I'm trying to avoid this.
Thanks in advance for any advice on this problem, everyone. I appreciate it; hopefully I'll be able to get this in before 12!
It's not clear to me where colorMap and attribMap are initialized and to what values, and I'm just going on instinct here, but it seems likely that the keys for colorMap are colours and the keys for attribMap are attributes. In that case underline is not a key in colorMap and red is not a key in attribMap.
In your program, you do the following:
if(tok.type == TAG && tok.value.at(0) != '/'){
cout<<term_cc(colorMap[tok.value], DEFAULT_COLOR, attribMap[tok.value]);
which assumes that every TAG is present in both colorMap and attribMap. But if the tag is a colour like "red", it is (probably) only in colorMap and if it is an attribute like "underline", it is (probably) only in attribMap.
Now, what happens when you execute colorMap["underline"]? Here, the convenience of C++'s standard library can be a bit of a disadvantage because it silently hides an error. The answer is that a mapping from "underline" to the default value of a term_colors_t is added to the map so that the lookup will always return something. term_colors_t is an enum, so its default value is 0 (not '0').
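The silent insertion is easy to reproduce in isolation. In the sketch below, the enum TermColor and the helper lookupColor are hypothetical stand-ins (the real term_colors_t is not shown in the question); the point is only that find() leaves the map untouched while operator[] adds a zero-valued entry.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the real term_colors_t enum.
enum TermColor { TERM_NONE = 0, TERM_RED = 31 };

// Look up a tag without inserting; returns fallback when the tag is absent.
TermColor lookupColor(const std::map<std::string, TermColor>& colors,
                      const std::string& tag, TermColor fallback)
{
    std::map<std::string, TermColor>::const_iterator it = colors.find(tag);
    return (it == colors.end()) ? fallback : it->second;
}
```

With a map holding only "red", lookupColor(colors, "underline", TERM_NONE) returns the fallback and leaves the map at one entry, whereas colors["underline"] yields 0 and grows the map to two entries.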
Now, term_cc -- if it is the same term_cc that @MikePetch dug up -- does not check its arguments for validity; it just assumes that they are valid ANSI digits ('0' through '9', or in other words a number between 48 and 57, inclusive). Since it doesn't check them, it just includes them as they are in its output, and since you're (probably) calling term_cc with an attributes argument of 0 -- that is, a NUL character -- it outputs the NUL as part of the supposed console code.
I checked xterm, konsole and the Linux console, and all of them ignore the NUL character. (I believe that is the expected behaviour; DEC terminals like the VT-100 ignored NULs, although in some circumstances you needed to insert them because the terminal would also ignore any character if the previous control took too long.) I don't know what terminal emulator you are using, and it is quite possible that it has different behaviour, such as terminating the control code sequence. term_cc outputs the attribute first, even though it is the third argument, so it could well be that a NUL attribute would cause the terminal emulator to simply print something like ;31;49m instead of setting the foreground colour to red.
Some other bugs:
You never pop attribVect; only colorVect. So I don't see how the attributes will be properly restored.
You don't initialize colorVect to a DEFAULT_COLOR. So after the first tag is popped, you'll pop the (only) element off of colorVect, leaving it empty, and then call colorVect.back(), which is undefined if colorVect is empty.
Those were just the things I noticed on a quick skim through the code. There might be other problems.
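One way to avoid both stack problems is to keep colour and attribute together in a single stack that is seeded with the defaults. This is a sketch under assumptions: TagColor, TagAttrib, Style and StyleStack are hypothetical names standing in for term_colors_t, term_attrib_t and the two vectors in the question, and letting a nested tag inherit the enclosing style is a design choice of this sketch, not something the original code does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for term_colors_t and term_attrib_t.
enum TagColor { COLOR_DEFAULT, COLOR_RED, COLOR_GREEN, COLOR_BLUE };
enum TagAttrib { ATTRIB_DEFAULT, ATTRIB_UNDERLINE };

struct Style {
    TagColor color;
    TagAttrib attrib;
};

class StyleStack {
public:
    StyleStack() {
        Style base = { COLOR_DEFAULT, ATTRIB_DEFAULT };
        stack_.push_back(base);  // seeded so back() is always valid
    }
    // Opening colour tag: keep the enclosing attribute, change the colour.
    void pushColor(TagColor c) {
        Style s = stack_.back();
        s.color = c;
        stack_.push_back(s);
    }
    // Opening attribute tag: keep the enclosing colour, change the attribute.
    void pushAttrib(TagAttrib a) {
        Style s = stack_.back();
        s.attrib = a;
        stack_.push_back(s);
    }
    // Closing tag: drop the innermost style, but never the seeded default.
    void pop() {
        if (stack_.size() > 1) {
            stack_.pop_back();
        }
    }
    // Style to restore after a push or pop.
    const Style& current() const { return stack_.back(); }
private:
    std::vector<Style> stack_;
};
```

After each push or pop, current() gives the colour and attribute pair that would be passed to the output call, so a closing tag restores both values together.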

C++: Rename instead of Delete & Copy when using Sync

Currently I have the following piece of code in my Sync:
...
std::string::size_type index = file.find(remoteDir);
if(index != std::string::npos){
file.erase(index, remoteDir.size());
file.insert(index, localDir);
}
...
// Uses PUT command on the file
Now I want to do the following instead:
If a file is the same as before, except for a rename, don't use the PUT command, but use the Rename command instead
TL;DR: Is there a way to check whether a file is the same as before except for a rename that occurred? So a way to compare both files (with different names) to see if they are the same?
Check the md5sum; if it is different, the file has been modified.
The MD5 checksum of a renamed file stays the same, while any change in the content of the file gives a different value.
I first tried to use Renjith's method with md5, but I couldn't get it working (maybe it's because my C++ is for Windows instead of Linux, I dunno).
So instead I wrote my own function that does the following:
First check if the file is the exact same size (if this isn't the case we can just return false for the function instead of continuing).
If the sizes do match, continue checking the file-buffer per BUFFER_SIZE (in my case this is 1024). If the entire buffer of the file matches, return true.
PS: Make sure to close any open streams before returning. My mistake here was that I had the code to close one stream after the return statement (so it was never called), and therefore I got errno 13 (permission denied) when trying to rename the file.
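The approach described above might look roughly like the following. This is a reconstruction under assumptions, not the original function: the name sameContents and the use of std::ifstream are illustrative choices, and only the size check and the 1024-byte buffer come from the description. Because the streams are local objects, they are closed automatically on every return path, which sidesteps the problem mentioned in the PS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Returns true when both files can be opened and have identical bytes.
bool sameContents(const std::string& pathA, const std::string& pathB)
{
    const std::size_t BUFFER_SIZE = 1024;

    // Open at the end so tellg() reports the file size.
    std::ifstream a(pathA.c_str(), std::ios::binary | std::ios::ate);
    std::ifstream b(pathB.c_str(), std::ios::binary | std::ios::ate);
    if (!a || !b) {
        return false;
    }

    // Different sizes mean different files; no need to read anything.
    if (a.tellg() != b.tellg()) {
        return false;
    }
    a.seekg(0);
    b.seekg(0);

    char bufA[BUFFER_SIZE];
    char bufB[BUFFER_SIZE];
    while (a && b) {
        a.read(bufA, BUFFER_SIZE);
        b.read(bufB, BUFFER_SIZE);
        std::streamsize n = a.gcount();
        if (n != b.gcount()) {
            return false;
        }
        if (std::memcmp(bufA, bufB, static_cast<std::size_t>(n)) != 0) {
            return false;
        }
    }
    return true;  // streams are closed by their destructors
}
```

If this returns true for a newly seen remote name and a file that has disappeared, the sync could issue a rename instead of a PUT.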

Sporadic segfault in c++ python extension

I have a Python object which accesses and downloads some text via HTTP.
I'm running this Python object, and processing that text, from C++ code.
I.e.
/* CPPCode.cxx */
int main(...) {
    for(int i = 0; i < numURLs; i++) {
        // Python method returns a string
        PyObject *pyValue = PyObject_CallMethod(pyObjectInstance, pyFunctionName, par1, par2....);
        string valString = PyString_AsString(pyValue);
        // ... process string ...
    }
}
/* PyObject.py */
class PyClass:
    def PyFunction(...):
        try: urlSock = urllib.urlopen(urlName)
        except ...
        while(...) :
            dataStr = urlSock.readline()
            # do some basic string processing....
        return dataStr
Most URLs work fine---the c++ code gets the proper string, I can process it, all is happy and well. A few particular URLs which look (basically) the same as the others on a browser, lead to a segfault in the PyString_AsString() method:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000000000000b2
0x000000010007716d in PyString_AsString ()
If I print out the string that should be returned by the Python method ('dataStr' in the pseudo-code above), it looks fine! I have no idea what could be causing this problem---any tips on how to proceed would be appreciated!
Thanks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SOLUTION:
The template code I was using had a call to
Py_DECREF(pyValue)
before I called
PyString_AsString(pyValue)
Why it was being deallocated for certain particular function calls, I have no idea. As 'Gecco' says in the comments below,
'PyString_AsString documentation says: "The pointer refers to the internal buffer of string, not a copy. The data must not be modified in any way, unless the string was just created using PyString_FromStringAndSize(NULL, size). It must not be deallocated." '
PyString_AsString documentation says: "The pointer refers to the internal buffer of string, not a copy. The data must not be modified in any way, unless the string was just created using PyString_FromStringAndSize(NULL, size). It must not be deallocated."
Please ensure you do not deallocate this buffer
If you compile your C code with the -g debug flag (in GCC at least) then you can run your python code using the gnu debugger gdb:
$ gdb /path/to/python/compiled/against
... blah ...
(gdb) run PyObject.py
and you should catch your segfault.
My guess is the Py_DECREF is somehow getting a NULL value.