How to read data from a PDF using a SAS program

Problem Statement:
I am unable to read data from a PDF file using SAS.
What worked well:
I am able to download the PDF from the website and save it.
Not working (Need Help):
I am not able to read the data from the PDF file using SAS. The structure of the source content is expected to always remain the same. The expected output is attached as a jpg image.
I would appreciate it if someone could show me how to handle this scenario in a SAS program.
I tried something like this:
/*Proxy address*/
%let proxy_host=xxx.com;
%let port=123;
/*Output location*/
filename output "/desktop/Response.pdf";
/*Download the source file and save it in the desired location*/
proc http
url="https://cdn.nar.realtor/sites/default/files/documents/ehs-10-2020-overview-2020-11-19_0.pdf"
method="get"
proxyhost="&proxy_host."
proxyport=&port
out=output;
run;
%let lineSize = 2000;
data base;
format text_line $&lineSize..;
infile output lrecl=&lineSize;
input text_line $;
run;
DATA _NULL_ ;
X "PS2ASCII /desktop/Response.pdf
/desktop/flatfile.txt";
RUN;

You can use the Apache PDFBox® library, an open source Java tool for working with PDF documents. The library can be called from within SAS Proc GROOVY, using Java code that strips the text and its position on the page from a PDF document.
Example:
You will have to write more code to make a data set from the stripped text.
filename overview "overview.pdf";
filename ov_text "overview.txt";
* download a pdf document;
proc http
url="https://cdn.nar.realtor/sites/default/files/documents/ehs-10-2020-overview-2020-11-19_0.pdf"
method="get"
/*proxyhost="&proxy_host." */
/*proxyport=&port */
out=overview;
run;
* download the Apache PDFBox library (a .jar file);
filename jar 'pdfbox.jar';
%if %sysfunc(FEXIST(jar)) ne 1 %then %do;
proc http
url='https://www.apache.org/dyn/closer.lua?filename=pdfbox/2.0.21/pdfbox-app-2.0.21.jar&action=download'
out=jar;
run;
%end;
* Use GROOVY to read the PDF, strip out the text and position, and write that
* parse to a text file which SAS can read;
proc groovy classpath="pdfbox.jar";
submit
"%sysfunc(pathname(overview))" /* the input, a pdf file */
"%sysfunc(pathname(ov_text))" /* the output, a text file */
;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.io.FileWriter;
import java.io.PrintWriter;
public class GetLinesFromPDF extends PDFTextStripper {
static List<String> lines = new ArrayList<String>();
public GetLinesFromPDF() throws IOException {
}
/**
* @throws IOException If there is an error parsing the document.
*/
public static void main( String[] args ) throws IOException {
PDDocument document = null;
PrintWriter out = null;
String inPdf = args[0];
String outTxt = args[1];
try {
document = PDDocument.load( new File(inPdf) );
PDFTextStripper stripper = new GetLinesFromPDF();
stripper.setSortByPosition( true );
stripper.setStartPage( 0 );
stripper.setEndPage( document.getNumberOfPages() );
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
out = new PrintWriter(new FileWriter(outTxt));
// print lines to text file
for(String line:lines){
out.println(line);
}
}
finally {
if( document != null ) {
document.close();
}
if( out != null ) {
out.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
@Override
protected void writeString(String str, List<TextPosition> textPositions) throws IOException {
String places = "";
for(TextPosition tp:textPositions){
places += "(" + tp.getX() + "," + tp.getY() + ") ";
}
lines.add(str + " found # " + places);
}
}
endsubmit;
quit;
* preview the stripped text that was saved;
data _null_;
infile ov_text;
input;
putlog _infile_;
run;
/*
* additional SAS code will be needed to input the text as data
* and construct a data set that matches the original tabular content layout
*/
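As a starting point for that extra step, here is a minimal Java sketch that splits one line of the stripped output back into its text and its coordinates. It assumes the lines have exactly the shape produced by writeString() above (the text, then " found # ", then one "(x,y) " pair per character). The class name StrippedLine and its parse() method are hypothetical helpers for illustration; they are not part of PDFBox or SAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): parses one line written by GetLinesFromPDF.
class StrippedLine {
    // Separator used by writeString() in the answer above.
    private static final String MARKER = " found # ";
    // One "(x,y)" pair; the numbers are what Float.toString() prints.
    private static final Pattern COORD =
            Pattern.compile("\\(([-0-9.Ee]+),([-0-9.Ee]+)\\)");

    final String text;             // the stripped text
    final List<float[]> positions; // one {x, y} pair per character

    StrippedLine(String text, List<float[]> positions) {
        this.text = text;
        this.positions = positions;
    }

    static StrippedLine parse(String line) {
        List<float[]> positions = new ArrayList<>();
        // Use the last marker so that text containing the marker itself still parses.
        int cut = line.lastIndexOf(MARKER);
        if (cut < 0) {
            return new StrippedLine(line, positions); // not in the expected shape
        }
        Matcher m = COORD.matcher(line.substring(cut + MARKER.length()));
        while (m.find()) {
            positions.add(new float[] {
                Float.parseFloat(m.group(1)), Float.parseFloat(m.group(2)) });
        }
        return new StrippedLine(line.substring(0, cut), positions);
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        StrippedLine s = parse("Total found # (72.0,90.5) (80.25,90.5) ");
        float[] first = s.positions.get(0);
        System.out.println(s.text + " starts at x=" + first[0] + ", y=" + first[1]);
    }
}
```

Rows of the original table tend to share a y value and columns a similar x value, so writing the text plus its first coordinate pair as delimited fields would give SAS something it can read with a regular INFILE/INPUT step and then group by position.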

Using macro variable in an IF statement within a loop is not working

My code works when I hard-code the value in the IF statement (see the commented-out line), but when I insert the macro variable, the copy and delete functions do nothing and no errors are generated. Below is the code being used:
%let pathscr = //files/FEB_P000/Reporting_FS;
%let pathdes = //files/FEB_P000/Reporting_FS/Accounting log/2021;
%let fn = LFNPAccounting;
%let dt = %sysfunc(inputn(&acc_date, yymmddn8.),yymmddn8.); /* 20211209 */
%let Var = &fn&dt;/* LFNPAccounting20211209 */
data _null_;
length fref $8 fname $256;
did = filename(fref,'\\files\FEB_P000\Reporting_FS');
did = dopen(fref);
do i = 1 to dnum(did);
fname = dread(did,i);
newfn = SUBSTR(fname,1,22);
if newfn = &Var then do;
/*if newfn = 'LFNPAccounting20211209' then do;*/
rc1=filename('src',catx('/',"&pathscr",fname));
rc2=filename('des',catx('/',"&pathdes",fname));
rc3=fcopy('src','des');
rc4= fdelete('src');
end;
end;
run;
Could anyone help please?
Thanks
Hans
I am guessing you want to look in the folder pathscr and, if a file name matches a certain string (SUBSTR(fname,1,22)), copy that file to the Logs folder pathdes and delete it from the source.
libname report "/home/kermit/temp/Reporting/";
data report.have20211210
report.have20211209
report.have20211208;
id = 1;
output;
run;
%let pathscr = /home/kermit/temp/Reporting/;
%let pathdes = /home/kermit/temp/Logs/;
%let fn = have; /* Name of the file */
%let type = .sas7bdat; /* File extension */
%let dt = %sysfunc(today(), yymmddn8.); /* today's date formatted as yyyymmdd */
%let file = &fn&dt&type.;
%put &=file;
data _null_;
drop rc did;
rc=filename("mydir", "&pathscr.");
did=dopen("mydir");
if did > 0 then do; /* check that the directory can be opened */
do i=1 to dnum(did); /* use dnum() to determine the highest possible member number */
fname=dread(did, i); /* get the name of the file */
if fname = "&file." then do; /* if the name of the file match: */
rc=filename('src', "&pathscr&file.");
rc=filename('des', "&pathdes&file.");
rc=fcopy('src', 'des'); /* copy from source to destination */
rc=fdelete('src'); /* delete from source */
end;
end;
end;
else do; /* if the directory cannot be opened, put the error message to the log */
msg=sysmsg();
put msg;
end;
run;
Logs:
FILE=have20211210.sas7bdat
DOPEN opens a directory and returns a directory identifier value (a number greater than 0) that is used to identify the open directory in other SAS external file access functions. If the directory cannot be opened, DOPEN returns 0, and you can obtain the error message by calling the SYSMSG function.
I used today() for the dt macro-variable for convenience sake, but you will have to change it to whatever date you are searching for.
Consider that with the code above, if the file is already in the Logs folder, it will not be overwritten. Note that you do not have to use the CATX function if you put another / at the very end of your specified path.
Macro variables are not resolved when bounded by single quotes. They are resolved when within double quotes.
Try
did = filename(fref,"&pathscr");
You set VAR to a value like:
%let Var = LFNPAccounting20211209 ;
Then you use it to generate a SAS statement:
if newfn = &Var then do;
Which will resolve to
if newfn = LFNPAccounting20211209 then do;
Since I did not see you creating any variable named LFNPAccounting20211209 it is most likely that you want to use this statement instead:
if newfn = "&Var" then do;
So that the SAS code you generate will compare the value of NEWFN to a string literal instead of another variable.
Note: Since it looks like you are using a Windows filesystem, you should make the comparison case-insensitive.
if upcase(newfn) = %upcase("&Var") then do;

Creating Internal Accounts in SAS Metadata Server programmatically from Base SAS

I'm trying to create Internal Accounts programmatically by using proc metadata.
The code section below creates a person with an External Login.
put"<Person Name=%str(%')&&PersonName&i.%str(%')>";
put"<Logins>";
put"<Login Name=%str(%')Login.&&PersonName&i.%str(%') Password=%str(%')&&word&i.%str(%')/>";
put"</Logins>";
put"</Person>";
To create an ExternalLogin we can set the Password attribute, and it will be encrypted automatically in SAS Metadata.
But to create an InternalLogin object it is necessary to supply the hash value of the password and the salt. I know the standard sas002 encryption method, but when using proc pwencode, how do I obtain the value of the salt?
Is it possible to create an InternalLogin using Base SAS?
Thanx.
I found an article that describes how to create a stored process for this problem. My answer is an addition to that article.
The approach is based on executing Java methods from a SAS program.
1. Prepare the setPasswd.java class
I modified the class from the article, separating the code that connects to the metadata server from the code that creates the InternalLogin.
import java.rmi.RemoteException;
import com.sas.metadata.remote.AssociationList;
import com.sas.metadata.remote.CMetadata;
import com.sas.metadata.remote.Person;
import com.sas.metadata.remote.MdException;
import com.sas.metadata.remote.MdFactory;
import com.sas.metadata.remote.MdFactoryImpl;
import com.sas.metadata.remote.MdOMIUtil;
import com.sas.metadata.remote.MdOMRConnection;
import com.sas.metadata.remote.MdObjectStore;
import com.sas.metadata.remote.MetadataObjects;
import com.sas.metadata.remote.PrimaryType;
import com.sas.metadata.remote.Tree;
import com.sas.meta.SASOMI.ISecurity_1_1;
import com.sas.iom.SASIOMDefs.VariableArray2dOfStringHolder;
public class setPasswd {
String serverName = null;
String serverPort = null;
String serverUser = null;
String serverPass = null;
MdOMRConnection connection = null;
MdFactoryImpl _factory = null;
ISecurity_1_1 iSecurity = null;
MdObjectStore objectStore = null;
Person person = null;
public int connectToMetadata(String name, String port, String user, String pass){
try {
serverName = name;
serverPort = port;
serverUser = user;
serverPass = pass;
_factory = new MdFactoryImpl(false);
connection = _factory.getConnection();
connection.makeOMRConnection(serverName, serverPort, serverUser, serverPass);
iSecurity = connection.MakeISecurityConnection();
return 0;
}catch(Exception e){
return 1;
}
}
public setPasswd(){};
public int changePasswd(String IdentityName, String IdentityPassword) {
try
{
//
// This block obtains the person metadata ID that is needed to change the password
//
// Defines the GetIdentityInfo 'ReturnUnrestrictedSource' option.
final String[][] options ={{"ReturnUnrestrictedSource",""}};
// Defines a stringholder for the info output parameter.
VariableArray2dOfStringHolder info = new VariableArray2dOfStringHolder();
// Issues the GetInfo method for the provided iSecurity connection user.
iSecurity.GetInfo("GetIdentityInfo","Person:"+IdentityName, options, info);
String[][] returnArray = info.value;
String personMetaID = new String();
for (int i=0; i< returnArray.length; i++ )
{
System.out.println(returnArray[i][0] + "=" + returnArray[i][1]);
if (returnArray[i][0].compareTo("IdentityObjectID") == 0) {
personMetaID = returnArray[i][1];
}
}
objectStore = _factory.createObjectStore();
person = (Person) _factory.createComplexMetadataObject(objectStore, IdentityName, MetadataObjects.PERSON, personMetaID);
iSecurity.SetInternalPassword(IdentityName, IdentityPassword);
person.updateMetadataAll();
System.out.println("Password has been changed.");
return 0; // success
}
catch (MdException e)
{
Throwable t = e.getCause();
if (t != null)
{
String ErrorType = e.getSASMessageSeverity();
String ErrorMsg = e.getSASMessage();
if (ErrorType == null)
{
// If there is no SAS server message, write a Java/CORBA message.
}
else
{
// If there is a message from the server:
System.out.println(ErrorType + ": " + ErrorMsg);
}
if (t instanceof org.omg.CORBA.COMM_FAILURE)
{
// If there is an invalid port number or host name:
System.out.println(e.getLocalizedMessage());
}
else if (t instanceof org.omg.CORBA.NO_PERMISSION)
{
// If there is an invalid user ID or password:
System.out.println(e.getLocalizedMessage());
}
}
else
{
// If we cannot find a nested exception, get message and print.
System.out.println(e.getLocalizedMessage());
}
// If there is an error, print the entire stack trace.
e.printStackTrace();
}
catch (RemoteException e)
{
// Unknown exception.
e.printStackTrace();
}
catch (Exception e)
{
// Unknown exception.
e.printStackTrace();
}
System.out.println("Failure: Password has NOT been changed.");
return 1; // failure
}
}
2. Resolve dependencies
Pay attention to the imports in the class. To run the code below, you need to set the CLASSPATH environment variable.
On Linux, you can add the following command to %SASConfig%/Lev1/level_env_usermods.sh:
export CLASSPATH=$CLASSPATH:%pathToJar%
On Windows, you can add or change the environment variable under Advanced system settings.
So where should you look for the jar files? They are in this folder:
%SASHome%/SASVersionedJarRepository/eclipse/plugins/
Which files should be included in the path?
I included all the jars used by OMI (Open Metadata Interface). I also added log4j.jar (it does not work without this jar; any hints on why would be helpful):
sas.oma.joma.jar
sas.oma.joma.rmt.jar
sas.oma.omi.jar
sas.svc.connection.jar
sas.core.jar
sas.entities.jar
sas.security.sspi.jar
log4j.jar
setPasswd.jar (YOUR JAR FROM THE NEXT STEP!)
Choose the files from the nearest release. Example:
Here I used the files from v940m3f (fix release).
Other ways are described here.
3. Compile setPasswd.jar
I tried to use the javac.exe bundled with SAS, but it did not work properly, so you need to download a JDK to compile the jar. I created a batch file:
"C:\Program Files\Java\jdk1.8.0_121\bin\javac.exe" -source 1.7 -target 1.7 setPasswd.java
"C:\Program Files\Java\jdk1.8.0_121\bin\jar" -cf setPasswd.jar setPasswd.class
The -source and -target parameters help if your JDK version is newer than the one used by SAS. You can see the version of the Java runtime used by SAS with:
PROC javainfo all;
run;
Look for the following line in the log:
java.vm.specification.version = 1.7
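To see the other side of that comparison, a small Java class can print the same property for the JDK you compile with. This is only an illustrative sketch: the class name JavaVersionCheck is hypothetical, and the property names are standard JVM system properties. Compile and run it with the JDK from the batch file above.

```java
// Hypothetical helper: prints the JVM properties that determine which
// -source/-target values are needed when compiling for the SAS Java runtime.
class JavaVersionCheck {
    // The same property that PROC JAVAINFO lists in the SAS log.
    static String specVersion() {
        return System.getProperty("java.vm.specification.version");
    }

    public static void main(String[] args) {
        System.out.println("java.vm.specification.version = " + specVersion());
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```

If the value printed here is higher than the one in the SAS log, pass the SAS value to -source and -target, as in the batch file above.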
4. Finally, the Base SAS call
Now we can call the Java code with this method (all methods are available here):
data test;
dcl javaobj j ("setPasswd");
j.callIntMethod("connectToMetadata", "%SERVER%", "%PORT%", "%ADMIN%", "%{SAS002}HASHPASSORPASS%", rc1);
j.callIntMethod("changePasswd", "testPassLogin", "pass1", rc2);
j.delete();
run;
In log:
UserClass=Normal
AuthenticatedUserid=Unknown
IdentityName=testPass
IdentityType=Person
IdentityObjectID=A56RQPC2.AP00000I
Password has been changed.
Now it is time to test. Create a new user with no passwords.
Execute code:
data test;
dcl javaobj j ("setPasswd");
j.callIntMethod("connectToMetadata", "&server.", "&port.", "&adm", "&pass", rc1);
j.callIntMethod("changePasswd", "TestUserForStack", "Overflow", rc2);
j.delete();
run;
Now our user has an InternalLogin object.
Thanx.

Insert contents of a text file into the Oracle CLOB

I'm trying to insert whole text contents of file.txt into a CLOB column!
Connection^ DB = gcnew Connection();
OracleConnection^ Ocnn=DB->getOracleConnectionObject();
int number = 0;
try {
// here >>
OracleCommand^ c = gcnew OracleCommand("INSERT INTO PANDA.PAGE(SITE_ID, URL, SOURCE) VALUES('40', 'www.site.com', Read_Whole_File('C://Users/farmehr/Desktop/', 'file.txt'))", Ocnn);
number = c->ExecuteNonQuery();
}
catch (Exception^ eOra) {
Console::WriteLine(eOra->Message + "Exception Caught");
throw eOra;
}
I want to know whether there is any way to insert a file directly into the database (a function like Read_Whole_File() in the code).
In order to insert a file into a CLOB, I first had to create a procedure in SQL*Plus. SOURCE is my CLOB column and TEMP_CLOB is a predefined directory.
Next in my code I had to run this procedure:
Using code:
Result:
Keep in mind that to create and run procedures you have to log in AS SYSDBA. (Change oracleClient.dll to Oracle.ManagedDataAccess.dll if you're using C# or .NET.)

Update constant in if block and extract it as a field in the same GENERATE statement

I have a relation as below loaded into "calls".
(Header India)
(Call1)
(Call2)
(END)
(Header NZ)
(Call1)
(Call2)
(END)
I am trying to update the relation so that it becomes as below and I can group by the 2nd field to get country wise call counts.
(Header India, Header India)
(Call1, Header India)
(Call2, Header India)
(END, Header India)
(Header NZ, Header NZ)
(Call1, Header NZ )
(Call2, Header NZ)
(END, Header NZ)
The first tuple will always be (Header ). I am using the below code where I want to update the constant and then extract that constant as 2nd field. But it is not working. Any suggestions?
%declare HeaderText 'Header '
calls = LOAD 'Data File';
extrctd = FOREACH calls GENERATE $0 as (country:chararray), (SUBSTRING($0,1,7)=='Header '?'$HeaderText'=$0:'$HeaderText') as (txt:chararray);
One option is to write your own UDF to solve this problem. Sample code is below.
input.txt
Header India
Call1
Call2
END
Header NZ
Call1
Call2
END
PigScript:
REGISTER mycountry.jar;
calls = LOAD 'input.txt' AS (line:chararray);
extrctd = FOREACH calls GENERATE $0 AS country, mypackage.COUNTRY(line,'Header') as txt;
DUMP extrctd;
Output:
(Header India,Header India)
(Call1,Header India)
(Call2,Header India)
(END,Header India)
(Header NZ,Header NZ)
(Call1,Header NZ)
(Call2,Header NZ)
(END,Header NZ)
Sample UDF code: the Java classes below (COUNTRY and MyGlobal) are compiled and packaged as mycountry.jar
COUNTRY.java
package mypackage;
import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
class MyGlobal {
public static String myCountry;
}
public class COUNTRY extends EvalFunc<String> {
@Override
public String exec(Tuple arg0) throws IOException {
try
{
String input = ((String) arg0.get(0));
String header = ((String) arg0.get(1));
String output;
if(input.startsWith(header))
{
output = input;
MyGlobal.myCountry = output;
}
else
{
output = MyGlobal.myCountry;
}
return output;
}
catch(Exception e)
{
throw new IOException("Caught exception while processing the input row ", e);
}
}
}
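The carry-forward logic inside the UDF can be tried out without Pig. The sketch below is plain Java with a hypothetical class name, HeaderCarryForward. It remembers the most recent line that starts with the header prefix and pairs every line with it, which is what COUNTRY.exec() does through the static field MyGlobal.myCountry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the UDF's logic, with no Pig dependency.
class HeaderCarryForward {
    // Returns one {line, lastHeaderSeen} pair per input line, in input order.
    static List<String[]> tag(List<String> lines, String headerPrefix) {
        List<String[]> out = new ArrayList<>();
        String current = null; // stays null until the first header line is seen
        for (String line : lines) {
            if (line.startsWith(headerPrefix)) {
                current = line; // a new header replaces the remembered one
            }
            out.add(new String[] { line, current });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "Header India", "Call1", "Call2", "END",
            "Header NZ", "Call1", "Call2", "END");
        for (String[] pair : tag(input, "Header")) {
            System.out.println("(" + pair[0] + "," + pair[1] + ")");
        }
    }
}
```

Both this sketch and the UDF depend on rows arriving in file order. That holds when a single task reads the whole file, but as far as I know Pig does not guarantee it once the input is split across several tasks, so treat the static-variable approach as suitable for small, single-split inputs.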

T4 Templates : Reading resulting columns of a stored procedure table

I am learning T4 templates right now, and all the examples I found on the internet use tables for code generation. I want to use the result columns of a stored procedure to generate UI automatically. Is that possible, or do I have to create a view for the same query? In that case, how do I read from the view?
Thanks in advance.
I found a solution. Here is how you can generate a RadGrid directly from the stored procedure name:
<#
'requires: <#@ assembly name="System.Data" #>
dim Server as new Server(".\sqlexpress")
dim database as new Database(server, "xxxx")
dim strSpName as String= "sp_xxxx"
Dim dt as System.Data.DataTable= database.ExecuteWithResults("exec sp_GetEquipment").Tables(0)
dim ctlName as String = "grdEqp"
#>
<telerik:RadGrid ID="grd" runat="server" Skin="Web20" AutoGenerateColumns="false">
<MasterTableView>
<Columns>
<#
For Each column As System.Data.DataColumn In dt.Columns
#><telerik:GridBoundColumn DataField="<#=column.ColumnName #>" HeaderText="<#=column.ColumnName #>"/>
<#Next#>
</Columns>
</MasterTableView>
</telerik:RadGrid>
If you don't want to execute the stored procedure, since different stored procedures take different parameters, you could use the sp_describe_first_result_set system stored procedure to return the columns of the result set, assuming there is just one.
/// <summary>
/// Returns table for which stored procedures need to be generated.
/// </summary>
string TableName = "usp_getNominalCode";
string SchemaName = "Financial";
DataTable DataTable
{
get
{
if (_table == null)
{
Server server = new Server(new ServerConnection(new SqlConnection(this.ConnectionString)));
SqlConnectionStringBuilder connectionStringBuilder = new SqlConnectionStringBuilder(this.ConnectionString);
Database database = new Database(server, connectionStringBuilder.InitialCatalog);
DataSet storedProcedureColumns = database.ExecuteWithResults("sp_describe_first_result_set @tsql= " + "'[" + SchemaName + "]" + ".[" + TableName + "]'");
_table = storedProcedureColumns.Tables[0];
}
return _table;
}
}
DataTable _table;
You can then query this table for its structure as in the other answer, but it will be a little more generic.