Error when using LinkedIn's datafu package - tuples

I'm working on a project that uses the TransposeTupleToBag UDF from LinkedIn's datafu UDF collection, found here: https://github.com/linkedin/datafu/tree/master/src/java/datafu/pig/util. I execute the following commands in the grunt shell:
REGISTER jar-file;
DEFINE Transpose datafu.pig.util.TransposeTupleToBag();
a = load 'file' using PigStorage(',') as (schema);
b = foreach a generate select_columns_from_schema;
c = foreach b generate col1, col2, Transpose(col3, col4...coln);
When I execute the last line, I get this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Instance name is null.
This should not happen unless UDFContextSignature was not set.
What am I doing wrong, and how can I avoid it? I have not changed any of their code. I'm only using TransposeTupleToBag, FieldNotFound and AliasableEvalFunc, as those were the classes required to run Transpose successfully. I also tried with all classes loaded and got the same error. What's going on? Please help. Thanks!

TransposeTupleToBag requires a feature in Pig 0.11 where setUDFContextSignature is called. This is used to distinguish each invocation of the UDF. This method doesn't exist in Pig 0.10.

Turns out LinkedIn's datafu is tested on Pig 0.11.1 and nothing else. I was running Pig 0.10, so it wouldn't work: some property that the UDF relies on is apparently not set in Pig 0.10 but is set in Pig 0.11.1.

WSO2-ML Error create model

When I try to create a model with the "decision-tree" example dataset, the error below is generated. The WSO2 Machine Learner version is 1.2.2.
[2017-01-11 18:21:02,284] ERROR {org.wso2.carbon.ml.core.impl.MLModelHandler} - Failed to build the model [id] 9
org.wso2.carbon.ml.core.exceptions.MLModelBuilderException: An error occurred while building logistic regression model: For input string: ",06"
at org.wso2.carbon.ml.core.spark.algorithms.SupervisedSparkModelBuilder.buildLogisticRegressionModel(SupervisedSparkModelBuilder.java:322)
Any suggestions?
Thanks,
Emanuele
Resolved: the problem was caused by the operating system's Italian regional settings, which use a comma rather than a dot as the decimal separator. I resolved it by changing those settings.
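To illustrate why a value such as ",06" trips a numeric parser, here is a minimal Python sketch. It is not WSO2 code; `parse_decimal` is a hypothetical helper, and it assumes the only locale difference is the decimal separator (thousands separators are not handled).

```python
def parse_decimal(text, decimal_sep=","):
    """Parse a number written with a decimal comma (hypothetical helper)."""
    # Only the decimal separator is swapped; grouping separators are ignored.
    return float(text.strip().replace(decimal_sep, "."))


# float() accepts only a dot as the decimal separator, so the raw
# string is rejected in the same spirit as the error in the log above.
try:
    float(",06")
except ValueError as exc:
    print("rejected:", exc)

# After normalising the separator the value parses as expected.
print(parse_decimal(",06"))
```

Changing the system settings, as described above, avoids the need for any such conversion.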

Running SAS script using %include

I'm a newbie in SAS, coming from SQL, so I'm still getting used to the differences.
I have a SAS program "Master.sas" that runs, among other things, something like this:
%include "c:\script1.sas";
%include "c:\script2.sas";
%include "c:\script3.sas";
The question is: if I select all of them and run them, do they run sequentially or in parallel?
For example, if script2 uses a table that is loaded in script1, will it fail to run successfully?
That example may sound obvious, since I tested it, but what happens if script1 calculates a variable? Will script2 see the calculated value, or will it use whatever it finds at run time (because, for example, script2 has run before script1)?
Just to clarify, I need SAS to run them sequentially, one after the other.
In SQL exists "GO" to separate batch processing, i.e.:
CREATE TABLE XXXXX
GO
SELECT * FROM XXXXX
GO
If someone tries to run that script without GO, SQL runs them in parallel, producing an error on the second statement saying that "table XXXXX doesn't exist".
Do I need something similar in SAS, or does SAS just process the next statement once the previous one has finished?
Thanks in advance!
%include will run things in sequence. SAS will run the first %include as if it were just lines in the code, then hit the second and do the same, etc.
SAS's equivalent of GO is RUN, by the by, though in most cases RUN doesn't actually have to be included (though it's considered a good practice). SAS will not run in parallel mode just because you leave out RUN, but it is what tells SAS to go ahead and run the code that was given it. This does not apply in PROC SQL, however; that does not support run-group processing, and instantly executes each statement terminated by ;.
There are ways to make it run in parallel; for example, this hands-on workshop from SUGI 29 on Parallel Processing shows how to use RSUBMIT to do so. Enterprise Guide allows for parallel processing of programs (but not %includes in one program) if you tell it to (but not by default).
%include will run things in sequence. If the code in the first %include hits an error, your program will stop and won't process the remaining lines.
%include will always run things in sequence.
If a variable is created in script1, you can use it in script2, but if script1 depends on a variable that is created in script2, it will error out.

How to resolve this error: Read: Data overflow/conversion error

How do I resolve this error: "Read: Data overflow/conversion error for [some field]"? I am getting it after running a mapping in Informatica Data Quality 9.1.0.
Please try the below steps:
1) Check the columns that may contain date values. If the datatypes are not compatible in any of the transformations, this error can occur.
2) Always debug or run the data viewer on each transformation before you run the IDQ mapping. It gives you an overview of the data and of any issues.

SAS code behaves differently in interactive and batch modes

I have the following code that runs inside a macro. When it is run in interactive mode, it runs absolutely fine, with no errors or warnings. That has been the case for the last two years.
The same code has now been deployed in batch mode, where it generates the warning "WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved." and no value is assigned to the macro variable.
My question is: does anyone have any idea why batch mode and interactive mode would behave differently?
Here is some more information:
The dataset is being created and it is in work library.
The dataset does get opened by data step.
'firstreccount' doesn't get initialised anywhere else in the program.
I have searched the SAS community. There is a topic here, but I don't have the same errors in batch initialisation as described in the answer.
There is detailed information on the warning, but it doesn't explain why it would work in interactive mode but not in batch mode.
%LET FIRSTSET = work.dataset1;
DATA _NULL_;
  IF 0 THEN
    SET &FIRSTSET NOBS=X;
  CALL SYMPUT('FIRSTRECCOUNT' ,X);
  STOP;
RUN;
DATA _NULL_;
  IF 0 THEN
    SET &SECONDSET NOBS=X;
  CALL SYMPUT('SECONDRECOUNT' ,X);
  STOP;
RUN;
WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved.
Update:
I attempted to replicate the error by copying the code that raises the warning into a separate scheduled flow, but it didn't cause any errors at all.
By the way, the original job was deployed from SAS DI Studio. I checked all lines in the user-written code nodes and made sure that each was within 80 characters, as recommended by #RawFocus, #RobertPentridge, but it didn't solve the issue.
As recommended by #data_null_, I checked VALIDVARNAME. It differed between interactive mode (value "any") and batch mode (value "V7"), but changing it made no difference.
I rewrote the logic to get the number of observations by calling attr on an open dataset. This eliminated the warning, but the program would still fail, with the warning popping up in different places. That made me think Robert Partridge is correct. At the same time, I got an error about a macro not being resolved. The macro had been inserted by DI Studio to collect performance MI, even though the job wasn't meant to collect MI. This made me think that SAS DI Studio does not generate code correctly when deploying it, so I manually edited the deployed code to remove the offending macro call. I also spotted one line with an MD5 function call that was too long because of the number of parameters being passed to it, so I inserted some white space. And finally the problem was fixed!
I still need to do something about the job, because when it gets redeployed from SAS DI it will generate the same errors again. I don't have time to look into this further at the moment.
Conclusion: what you write in SAS DI and what gets deployed can differ slightly, which can cause the syntax parser to throw errors in random places. So I will mark Robert's answer as correct, because it got me closer to solving the problem than any other answer.
The problem could be happening above the code snippet you pasted. The parser got into a funk earlier, and ended up issuing warning about code that is perfectly fine.
Check to make sure that no code within a macro is longer than ~160 chars on a single line. I try to keep my code well below that, but long lines of code can run fine interactively and fail in batch, particularly when inside a macro.
I expect your program has some small error above that does not cause SAS to go into syntax check mode when run interactively but does cause SAS to set obs to 0 and enter syntax check mode when run in batch.
One possibility is the limit (in batch mode) of the length of a line in your submitted SAS program:
See: http://support.sas.com/kb/15/883.html
Which version of SAS are you running?
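Since several answers point at over-long lines, a quick way to find them in the deployed program is a small script like the Python sketch below. It is only an illustration: `find_long_lines` and `scan_file` are hypothetical names, and the default limit of 160 is the rough figure quoted above, not a documented SAS constant.

```python
def find_long_lines(lines, limit=160):
    """Return (line_number, length) for every line longer than `limit`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Ignore the trailing newline when measuring the line.
        length = len(line.rstrip("\r\n"))
        if length > limit:
            hits.append((number, length))
    return hits


def scan_file(path, limit=160):
    """Scan a deployed .sas file; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_long_lines(handle, limit)


# Small in-memory demonstration: only the second line exceeds the limit.
sample = ["%LET FIRSTSET = work.dataset1;\n", "x" * 200 + "\n"]
print(find_long_lines(sample))
```

Running `scan_file` with a lower limit (for example 80) on the deployed code would also flag lines that fit the looser threshold but exceed the stricter recommendation mentioned in the question.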

SubSonic Bug with TOP keyword?

The TOP keyword in the generated SQL wraps the number in brackets (I presume for SQL Compact support); however, this errors on my SQL 2000 server, which doesn't expect the brackets.
Example C# Code:
var doc = Logic.Document.All().FirstOrDefault(d=> d.Guid == Request.QueryString["guid"]);
Produces the following SQL error:
Line 1: Incorrect syntax near '('.
as it generates the following SQL:
exec sp_executesql N'SELECT TOP (1) .....'
If I execute the same SQL manually without the brackets the SQL executes just fine.
Is this a bug?
After further digging around the SubSonic source code, I posted a resolution here:
SubSonic3: Method "FirstOrDefault" throws exception with SQL Server 2000
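For illustration only, the difference between the two forms of the statement can be sketched as a string rewrite in Python. This is not the SubSonic fix referred to above; `strip_top_parens` is a hypothetical helper, the table name `Documents` is made up, and the regex is naive (it would also touch matching text inside string literals).

```python
import re

# Matches TOP followed by a parenthesised integer, e.g. "TOP (1)".
_TOP_PAREN = re.compile(r"\bTOP\s*\(\s*(\d+)\s*\)", re.IGNORECASE)


def strip_top_parens(sql):
    """Rewrite 'TOP (n)' as 'TOP n' (hypothetical helper, naive regex)."""
    return _TOP_PAREN.sub(r"TOP \1", sql)


# 'Documents' is an invented table name used only for this example.
print(strip_top_parens("SELECT TOP (1) * FROM Documents"))
```

The output is the bracket-free form that, as noted in the question, executes fine when run manually.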