I am very new to SAS and would like to import a CSV file. I have tried PROC IMPORT and also the Import Data tool, but both fail to import the data.
I get the following errors:
ERROR: The name Page Name (prop31) is not a valid SAS name
When importing the data, SAS doesn't accept the delimiter and imports everything as one column.
Any suggestions?
Thank you!
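For reference, a minimal sketch of a PROC IMPORT that addresses both symptoms, assuming a semicolon-delimited file at a hypothetical path: options validvarname=any keeps names like Page Name (prop31) as SAS name literals, and dbms=dlm with a delimiter statement handles a non-comma delimiter.

options validvarname=any;                    /* allow 'Page Name (prop31)'n as a name literal */

proc import datafile = '/path/to/file.csv'   /* hypothetical path */
    out = work.mydata
    dbms = dlm
    replace;
    delimiter = ';';                         /* match the file's actual delimiter */
    getnames = yes;
    guessingrows = max;
run;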
What is the correct way to import a specific sheet of an Excel workbook using the Django-Import-Export module?
Or, if possible, all the sheets in a workbook, one by one.
I referred to their documentation, but it was not much help:
https://django-import-export.readthedocs.io/en/latest/
In the same way, I would also like to export data from multiple sheets into one workbook.
How do I achieve this?
Here is the complete answer to my question.
from tablib import Databook

databook = Databook()
imported_data = databook.load(file.read(), format='xlsx')
for dataset in imported_data.sheets():
    print(dataset.title)  # the name of each sheet
    print(dataset)        # the data in each sheet
This is how you can export multiple datasets into one Excel file:
import tablib

book = tablib.Databook((data1, data2, data3))
with open('students.xls', 'wb') as f:
    f.write(book.export('xls'))
See the tablib documentation for details.
You can also try the pyexcel_xls package with Django; it's fairly easy to use.
from pyexcel_xls import get_data as xls_get
from pyexcel_xlsx import get_data as xlsx_get

def import_excel(request):
    # the uploaded Excel file
    excel_file = request.FILES['file']
    if str(excel_file).split('.')[-1] == "xls":
        data = xls_get(excel_file, column_limit=15)
    elif str(excel_file).split('.')[-1] == "xlsx":
        data = xlsx_get(excel_file, column_limit=15)
    if len(data['sheet1']) > 2:        # reading of records begins from the second row
        name = data['sheet1']          # the sheet whose data you wish to get
        for l in range(2, len(name)):  # loop through the rows in the sheet
            value = name[l][0]         # extract the first column's data
I've got a PROC IMPORT from an XLSX file with column names in Polish.
My simple PROC IMPORT looks like this:
proc import datafile = '/directory/file_name.XLSX'
    out = libname.tablename
    dbms = xlsx
    replace;
run;
I would like to add ENCODING="LATIN2" somewhere so the column names don't come out garbled.
Is it possible? And how?
I could do it in a second step by renaming all the columns from some predefined list, but I don't want to do it that way yet. Maybe there is a better solution.
You need to specify the encoding of the file you are reading/importing.
Per SAS support, this can be specified in the FILENAME statement.
I've tested it with SAS University Edition and CSV files, and it worked pretty well:
filename temp '/folders/myfolders/Raw data/iso8859.csv' encoding="utf-8";

proc import datafile = temp
    out = utf8
    dbms = csv
    replace;
run;
Your code should then look like:
filename temp '/directory/file_name.XLSX' encoding="LATIN2";

proc import datafile = temp
    out = libname.tablename
    dbms = xlsx
    replace;
run;
There are a few things going on here:
You can't control the encoding of an XLSX file this way; it's (more or less) a binary file, and SAS doesn't treat it like a text file. You can do this for CSVs, which are read in as text, but not for XLSX.
If you're importing a file in another encoding into SAS, your session encoding also matters. You will have to run your SAS session in the right encoding for everything to look right. See the documentation for details on how to change your SAS session encoding.
But there's a third option, if you just want to get rid of the extra characters: options validvarname=v7 (or even v6). I believe this will prevent SAS's import engine from using any character other than A-Z, 0-9, and underscore. It won't necessarily look pretty, though; I suspect it will replace all of the other characters with underscores.
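A minimal sketch of that third option, assuming the same file and libref as above:

options validvarname=v7;   /* restrict imported variable names to A-Z, 0-9, underscore */

proc import datafile = '/directory/file_name.XLSX'
    out = libname.tablename
    dbms = xlsx
    replace;
run;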
I am trying to load a CSV file containing a date-time variable. The variable looks like the following:
datetime
2008-10-08T07:06:08.248635000Z
2008-10-08T07:06:09.613897000Z
2008-10-08T07:06:28.217422000Z
2008-10-08T07:07:53.461926000Z
2008-10-27T16:10:49.189132000Z
I tried the format time18.3, but because there is a character T after the date and a character Z after the time, the import is not successful. Could anyone show me how to load this data, please?
That's the B8601DZw.d format, so you can use the informat B8601DZ30., I believe:
data _null_;
    dt_char = '2008-10-08T07:06:08.248635000Z';
    dt_num = input(dt_char, B8601DZ30.);
    put dt_num= datetime.;
run;
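And a minimal sketch of reading the CSV directly with that informat, assuming a hypothetical path and a single datetime column with a header row:

data want;
    infile '/path/to/file.csv' dsd firstobs=2;  /* hypothetical path; skip the header row */
    input datetime :B8601DZ30.;
    format datetime datetime26.6;               /* display with fractional seconds */
run;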
After reading in a .csv file using pandas and converting it into an R dataframe with the rpy2 package, I created a model using some R functions (also via rpy2). Now I want to take the summary of the model and convert it into a pandas DataFrame, so that I can either save it as a .csv file or use it for other purposes.
I have followed the instructions on the pandas site (source: https://pandas.pydata.org/pandas-docs/stable/r_interface.html) to try to figure this out:
import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import r, pandas2ri

pandas2ri.activate()

base = rpackages.importr('base')
stats = rpackages.importr('stats')
caret = rpackages.importr('caret')
broom = rpackages.importr('broom')

my_data = pd.read_csv("my_data.csv")
r_dataframe = pandas2ri.py2ri(my_data)

preprocessing = ["center", "scale"]
center_scale = StrVector(preprocessing)

# these are the columns in my data frame that consist of my predictors in the model
predictors = ['predictor1', 'predictor2', 'predictor3']
predictors_vector = StrVector(predictors)

# this column from the dataframe consists of the outcome of the model
outcome = ['fluorescence']
outcome_vector = StrVector(outcome)

# this line extracts the columns of the predictors from the dataframe
columns_predictors = r_dataframe.rx(True, predictors_vector)

# this line extracts the column of the outcome from the dataframe
column_response = r_dataframe.rx(True, outcome_vector)

cvCtrl = caret.trainControl(method="repeatedcv", number=20, repeats=100)
model_R = caret.train(columns_predictors, column_response, method="glmStepAIC",
                      preProc=center_scale, trControl=cvCtrl)

summary_model = base.summary(model_R)
coefficients = stats.coef(summary_model)
pd_dataframe = pandas2ri.ri2py(coefficients)
pd_dataframe.to_csv("coefficients.csv")
Although this workflow is ostensibly correct, the output .csv file did not meet my needs: the names of the columns and rows were removed. When I ran the command type(pd_dataframe), I found that it was a <type 'numpy.ndarray'>. Although the information in the table is still present, the conversion has removed the names of the columns and rows.
So I ran the command type(coefficients) and found that it was a <class 'rpy2.robjects.vectors.Matrix'>. Since this Matrix object still retained the names of my columns and rows, I tried to convert it into an R objects DataFrame, but my efforts proved futile. Furthermore, I don't know why the line pd_dataframe = pandas2ri.ri2py(coefficients) did not yield a pandas DataFrame object, nor why it did not retain the names of my columns and rows.
Can anybody recommend an approach so I can get some kind of pandas DataFrame that retains the names of my columns and rows?
UPDATE
A new method is mentioned in the documentation of a slightly older version of the package, called pandas2ri.ri2py_dataframe (source: https://rpy2.readthedocs.io/en/version_2.7.x/changes.html), and now I have a proper data frame instead of the numpy array. However, I still can't get the names of the rows and columns transferred properly. Any suggestions?
Maybe it should happen automatically during conversion, but in the meantime the row and column names can easily be obtained from the R object and added to the pandas DataFrame. For example, the column names for the R matrix are documented at: https://rpy2.github.io/doc/v2.9.x/html/vector.html#rpy2.robjects.vectors.Matrix.colnames
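A minimal sketch, assuming coefficients is the rpy2 Matrix from the question:

import pandas as pd
from rpy2.robjects import pandas2ri

values = pandas2ri.ri2py(coefficients)  # the bare values as a numpy array
rows = list(coefficients.rownames)      # R rownames -> Python list
cols = list(coefficients.colnames)      # R colnames -> Python list

pd_dataframe = pd.DataFrame(values, index=rows, columns=cols)
pd_dataframe.to_csv("coefficients.csv")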
I am trying to import a CSV file into a MySQL server. Everything is working fine with the MySQL connector: CREATE DATABASE, SHOW TABLES, etc. But my program crashes when I execute a statement like:
stmt->execute("LOAD DATA INFILE 'c:\Programming\SQL.csv' REPLACE INTO TABLE sqltable FIELDS TERMINATED BY '\;' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (Company, Address, City)")
Can anyone help me out?
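For what it's worth, the backslashes in that C++ string literal are a likely culprit: \P, \S, and \; are not valid C++ escape sequences, and MySQL itself also treats backslash as an escape character inside string literals. A minimal sketch with the escaping fixed, assuming stmt is a connected sql::Statement* from MySQL Connector/C++ and the table and columns exist as shown:

// Use forward slashes in the path (MySQL accepts them on Windows),
// double the backslash in '\n' so MySQL receives the escape sequence,
// and escape the double quote for the C++ compiler.
stmt->execute(
    "LOAD DATA INFILE 'c:/Programming/SQL.csv' "
    "REPLACE INTO TABLE sqltable "
    "FIELDS TERMINATED BY ';' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 ROWS (Company, Address, City)");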