Error during Dataset to Pandas conversion - ray

I am getting this error while converting a Ray Dataset to pandas with ds.to_pandas(). Is there any way to overcome it?
The dataset has more than the given limit of 100000 records. Use ds.limit(N).to_pandas().
Thanks

Related

Dataprep - missing rows after processing

I have a CSV containing 1.5 million rows. I prepared a Dataprep job that parses the data and stores it in BQ (or CSV). But after processing, nearly half of the rows (around 700k) are missing. When I run this Dataprep job without any recipe steps, I get the same wrong number of rows.
I analyzed the input CSV and the data looks correct. I filtered a small subset of the rows that are missing from the output, and that subset on its own is imported correctly.
Is there some kind of sampling applied to the output data? What could cause my rows to be lost?
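One thing worth ruling out before blaming the tool (a diagnostic sketch, not a confirmed cause): if the CSV has quoted fields containing embedded newlines, a naive physical line count overstates the true record count, so the "missing" rows may be a counting artifact rather than lost data. A quick check with Python's csv module:

```python
import csv
import io

# A 2-record CSV whose first data field contains an embedded newline:
raw = 'id,comment\n1,"first\nrow"\n2,"second row"\n'

# Counting physical lines sees 4 "rows"...
naive_lines = raw.count("\n")

# ...but a real CSV parser sees header + 2 records.
with io.StringIO(raw) as f:
    records = sum(1 for _ in csv.reader(f)) - 1  # subtract the header

print(naive_lines, records)
```

Comparing a parsed record count of the input against the output row count tells you whether rows were actually dropped or just miscounted.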

Error while converting continuous data to categorical data in Logistic Regression

I am using logistic regression on my dataset, whose target variable is coded as 0s and 1s. I used the .replace() function to relabel them:
> data['target']=data['target'].replace({0:"No",1:"yes"})
The code ran fine. But when I model the data,
model_log=sm.Logit(data['target'],data.iloc[:,2:]).fit()
it is showing the below error:
ValueError: Pandas data cast to numpy dtype of object. Check input
data with np.asarray(data).
When you select the X data using iloc, it returns a pandas DataFrame. According to the statsmodels documentation, Logit expects X and y to be array_like, so you need to cast the DataFrame to the required type; the to_numpy() method converts a DataFrame to a NumPy array. Also note that after the replace() above, the target column holds the strings "No"/"yes", which cannot be cast straight to float, so map it back to 0/1 first:
model_log = sm.Logit((data['target'] == "yes").astype(float), data.iloc[:, 2:].to_numpy()).fit()

WARNING: Failed to scan text length or time type for column

I'm running SAS 9.4 TS Level 1M5 x64_7PRO platform on Windows 6.1.7601.
I'm attempting to import an Access table with over 30,000 records that has 7 columns. One of the columns, "Results", contains data exceeding 4,000 characters (both numeric and text) for certain records. When using the code below,
PROC IMPORT OUT= ED_Notes_July2019
DATATABLE= "ED_Notes_Import"
DBMS=ACCESS REPLACE;
DATABASE="J:\EMTC\JMC\PECARN Registry\ED Documents Reports\2019\Month\Docs_Jul.accdb";
SCANMEMO=YES;
USEDATE=NO;
SCANTIME=YES;
RUN;
I get the following warning: "WARNING: Failed to scan text length or time type for column RESULT." The only potential solutions I have found online involve Excel (http://support.sas.com/kb/33/257.html). Is anyone aware of a solution applicable to Access?
I also have the data stored within SQL table (index space 836 MB; data space 50,000 MB; row count 8,948,138) but it takes hours to import that data from there using the code below:
LIBNAME SQL ODBC DSN='SQL Server' schema=dbo;
data ED_Notes_Master;
set sql.ED_Notes_Master;
if datepart(RESULT_DT_TM) > '01JUL2019'd;
run;
The subsetting IF is most likely not being pushed to the server by the implicit pass-through of the ODBC engine, so every row is transferred to SAS before the filter is applied.
Try replacing the IF with a WHERE statement and a datetime literal, which the engine can pass to the server:
where RESULT_DT_TM > '01JUL2019:00:00:00'dt;
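The push-down principle at work here can be illustrated outside SAS (an analogue in Python with sqlite3; the table and column names are made up): filtering inside the database query beats pulling every row over the connection and discarding most of them client-side, which is exactly what the subsetting IF forces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ed_notes (result_dt_tm TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO ed_notes VALUES (?, ?)",
    [("2019-06-30 23:59:59", "old"), ("2019-07-01 08:00:00", "new")],
)

# Client-side filter: every row crosses the wire, then most get thrown away.
all_rows = conn.execute("SELECT * FROM ed_notes").fetchall()
kept = [r for r in all_rows if r[0] > "2019-07-01 00:00:00"]

# Server-side filter: the WHERE clause runs inside the database,
# which is what a SAS WHERE statement lets the ODBC engine pass through.
pushed = conn.execute(
    "SELECT * FROM ed_notes WHERE result_dt_tm > ?", ("2019-07-01 00:00:00",)
).fetchall()

print(kept == pushed)
```

On a 9-million-row table the difference is the hours the questioner is seeing: the WHERE variant transfers only the matching rows.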

"Invalid Datetime" for JSON when uploading to GCP BigQuery

I'm trying to get some records into BigQuery, but I've got stuck on a date error. I've tried formatting the date the way BQ wants, but that hasn't helped. Here is a (fake, obviously) record I am trying to work with.
{"ID":"1","lastName":"Doe","firstName":"John","middleName":"C","DOB":"1901-01-01 00:00:00","gender":"Male","MRN":"1","diagnosis":["something"],"phone":"888-555-5555","fax":"888-555-5555","email":"j#doe.org"}
And here is the error I get when I try to upload the file:
Provided Schema does not match Table x:y.z. Field dob has changed type from DATETIME to TIMESTAMP
I'm just not sure what BQ is unhappy about in my format. The date is formatted properly and so is the time (I even tried 00:00:00.0), but I just can't seem to get this data into the table. I've also not specified any time zone, which makes it even odder that BQ thinks I'm supplying a TIMESTAMP.
Any help would be appreciated
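One thing to try (a hedged sketch, not a confirmed fix): if the load job is autodetecting the schema, a bare "YYYY-MM-DD HH:MM:SS" string may be inferred as TIMESTAMP, which then conflicts with the table's existing DATETIME column. Supplying an explicit schema on the load job avoids the inference entirely; alternatively, normalizing the field to the T-separated DATETIME form before upload can help make the intent unambiguous:

```python
import json
from datetime import datetime

record = {"ID": "1", "lastName": "Doe", "DOB": "1901-01-01 00:00:00"}

# Parse the incoming string and re-emit it with a T separator and
# no time-zone suffix, i.e. clearly a civil DATETIME, not a TIMESTAMP.
dob = datetime.strptime(record["DOB"], "%Y-%m-%d %H:%M:%S")
record["DOB"] = dob.strftime("%Y-%m-%dT%H:%M:%S")

print(json.dumps(record))
```

The helper above is illustrative; the real fix is usually passing the table's schema explicitly to the load job instead of relying on autodetect.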

Date format error while merging files using Stata

I'm completely new to Stata. I'm trying to merge 3 different datasets which contain dates in the format (d-mmm-yy). While trying to merge, I'm encountering an error saying
date is str 9 in using data stata
r(106)
I have no clue what this error means and need some help. I can provide any additional info if required.
Thanks
This probably means that in some datasets the date is stored as a number (Stata's date format is Unix-like: the number of days elapsed since 1 Jan 1960), while in others it is a string (which is exactly what Stata tells you: str9). You need to convert them all to the same type before merging, e.g. with
generate long n_date = date(date, "DMY", 2050)
See help date() or help date functions.
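For intuition about what that date() call computes, here is a Python analogue (illustrative only, not Stata): parse the d-mmm-yy string, apply a two-digit-year pivot like the 2050 topyear argument, and count days since 1 Jan 1960.

```python
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)

def stata_date(s: str, topyear: int = 2050) -> int:
    """Mimic Stata's date(s, "DMY", 2050) for 'd-mmm-yy' strings."""
    d = datetime.strptime(s, "%d-%b-%y").date()
    # Python's %y pivots at 69 (00-68 -> 2000s); re-pivot so no
    # interpreted year exceeds topyear, as Stata's third argument does.
    if d.year > topyear:
        d = d.replace(year=d.year - 100)
    return (d - STATA_EPOCH).days

print(stata_date("1-Jan-60"))   # 0: the Stata epoch itself
print(stata_date("2-Jan-60"))   # 1
```

Once every dataset carries the same numeric n_date variable, the merge key types match and the r(106) error goes away.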