Extract data from large Excel files - Kettle

I'm using Pentaho Data Integration to create a transformation from xlsx files to MySQL, but I can't import data from large Excel 2007 xlsx files (Apache POI streaming). It gives me out-of-memory errors.

Did you try this option?
Advanced settings -> Generation mode -> Less memory consumed for large excel (Event mode)
(You need to check "Read excel2007 file format" first.)

I would recommend increasing the JVM memory allocation before running the transformation. By default, Pentaho Data Integration (aka Kettle) ships with a low memory allocation, which can cause issues when running ETLs that involve large files. You need to raise the -Xmx value, which sets the JVM's upper memory limit, in spoon.bat.
If you are using Spoon on Windows, edit spoon.bat at the line shown below:
if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xmx512m" "-XX:MaxPermSize=256m"
If you are using Kitchen or Pan, edit pan.bat or kitchen.bat accordingly. On Linux, make the same change in the corresponding .sh files.

Related

Load a large .obj file to WASM OpenGL

I am loading .obj models using the Emscripten preload flag so that I can use them in WASM/WebGL from C++/OpenGL ES. Memory consumption goes over the limit when loading a 64 MB .obj; I am able to load smaller models, but from that size onward I crash. What is the correct way of loading large files so that I can access them in C++? I also tried the embed flag, but that doesn't work either.
For a large file, place it on the server as a static resource and use the Fetch API. You can let the browser cache it by using the EMSCRIPTEN_FETCH_PERSIST_FILE flag. It uses HTML5 IndexedDB, which is designed to store large data, on the order of 1 GB. See this question for the size limit.
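As a minimal sketch of that approach in C++ (the URL models/model.obj and the callback names are placeholders, and error handling is trimmed):

#include <emscripten/fetch.h>
#include <cstdio>
#include <cstring>

// Called when the download (or a hit in the IndexedDB cache) completes.
void onSuccess(emscripten_fetch_t *fetch) {
    printf("Loaded %llu bytes\n", fetch->numBytes);
    // fetch->data holds the raw bytes; parse the .obj from memory here.
    emscripten_fetch_close(fetch);
}

void onError(emscripten_fetch_t *fetch) {
    printf("Fetch failed, HTTP status %d\n", fetch->status);
    emscripten_fetch_close(fetch);
}

int main() {
    emscripten_fetch_attr_t attr;
    emscripten_fetch_attr_init(&attr);
    strcpy(attr.requestMethod, "GET");
    // Load into memory, and persist a copy in IndexedDB for later runs.
    attr.attributes = EMSCRIPTEN_FETCH_LOAD_TO_MEMORY | EMSCRIPTEN_FETCH_PERSIST_FILE;
    attr.onsuccess = onSuccess;
    attr.onerror = onError;
    emscripten_fetch(&attr, "models/model.obj"); // asynchronous
    return 0; // the Emscripten runtime stays alive by default, so the callbacks still fire
}

Build with the fetch backend enabled, e.g. emcc main.cpp -o index.html -s FETCH=1. On later visits, the PERSIST_FILE attribute lets the model be served from IndexedDB instead of going over the network again.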

Out of space - SAS Studio engine V9 on Mac

I was trying to practice with a dataset of more than 100K rows, and my SAS UE shows an "out of space" error when I try statistical analysis. After some Google searching I found solutions such as extending the disk space in the VM and cleaning the work libraries (I did clean the work library using "proc datasets library=WORK kill; run; quit;" but the issue remains the same). However, I am not sure how to increase the disk space or redirect the work library to local storage on my Mac. I have not seen or understood any proper guidelines from my searches. Please help.
You can set the VM to use 2 cores and increase the RAM in the Oracle VirtualBox settings. You cannot increase the disk size of the VM, and 100K rows should not be problematic unless you're not cleaning up processes.
Yes, SAS UE does have a tendency not to clean up after crashes, so if you've crashed it multiple times you'll eventually have to reinstall to clean up. You can get around this by reassigning the work library. A quick way to do this, in projects that will be affected, is to set the USER library to your myfolders or another space on your computer.
libname user '/folders/myfolders/tempWSpace';
Make sure you first create the folder under myfolders. Then any single-level data set (no libname prefix) will automatically be stored in the USER library, and you should be OK to run your code.

How to access PSSE working case variables from a Python code

I am a transmission planning engineer trying to automate the execution of PSSE 100 times or more in one go through a Python script. I can already run a case, change loads, rerun PSSE, and write a bus-based summary report to a *.csv file. What I really want to do is select the first active-power load variable of a PSSE case and increase it by 1 MW, then run PSSE and write the results to a csv file, change the selected load back to its original value, and move on to the next active load to do the same, again and again, until I have covered all load buses.
This will help me calculate transmission loss factors for the entire network in one go.
Thanks
@dsmtlk, if you're experienced in Python, you can readily find the information you need in the PSSE API Manual located in your PSSE program folder (mine is in C:\Program Files (x86)\PTI\PSSE33\DOCS). The API routines for getting bus data are in section 8.6. The routine for changing load data, viz. psspy.load_data_4(), is in section 2.21.
If you're new to Python, here are a couple of links I found helpful when I first started:
https://docs.python.org/2/tutorial/
http://www.tutorialspoint.com/python/

Low level page manager in C/C++

I need a library (C/C++) that handles the basic operations of a low-level pager layer on disk. The library must include methods to get a disk page, add data, write the page back to disk, etc.
To give a bit of background: Assume there's a particular dataset which runs query q. I have a particular grouping of the data on disk so that q can be processed faster. This library will help me write the data in pages, according to the grouping.
Appreciate any suggestions.
You don't specify the OS, but that doesn't really matter: all common OSes bar this kind of raw page access outright. Instead, use a memory-mapped file. To read a page from disk, just read from memory and the OS will (if needed) perform a disk read. Writing back to disk is mostly handled lazily by the OS, but it can usually also be forced explicitly on demand.
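As a minimal sketch of the memory-mapped approach on a POSIX system (the file name, page size, and page count are placeholders; on Windows the equivalents are CreateFileMapping/MapViewOfFile, and error handling is omitted):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main() {
    const size_t kPageSize = 4096;   // placeholder page size
    const size_t kNumPages = 100;    // placeholder file size, in pages
    int fd = open("pages.dat", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, kPageSize * kNumPages);   // reserve the space on disk
    char *base = static_cast<char *>(mmap(nullptr, kPageSize * kNumPages,
                                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    // "Get" page 7: just compute its address; the OS faults it in on first touch.
    char *page = base + 7 * kPageSize;
    std::memcpy(page, "hello", 5);          // add data to the page
    msync(page, kPageSize, MS_SYNC);        // explicitly write that page back to disk
    munmap(base, kPageSize * kNumPages);
    close(fd);
    return 0;
}

Each "page" is then just an offset into the mapping, so your grouping of the data reduces to deciding which offsets records go to, and msync gives you the explicit write-to-disk your library would otherwise have to implement.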
As for your "dataset running query q", that just doesn't make sense. Queries are code, not data. Do you perhaps have a query Q that runs over a dataset D?

How to optimize loading of a large XAP file

We have inherited a Silverlight application used in banks. It is a huge Silverlight application with a single XAP file of 6.5 MB.
Recently, one of the core banking applications updated its services to delete the entire browser cache from users' machines on a daily basis.
This impacts the Silverlight application directly. We cannot afford to download the 6.5 MB file every day. In the long term, I know we need to break this monolith into smaller, manageable pieces and load them dynamically.
I wanted to check if there are any short-term alternatives.
Can we have the Silverlight runtime load the XAP into a different directory?
Will making the application an Out-of-Browser application give us any additional flexibility in terms of where we load the XAP from?
Any other suggestions that could give us a short-term solution would be helpful.
What is inside your XAP file? (Change the extension to .zip and check what is inside.)
Are you including images or sounds inside your XAP file?
Are all the DLLs used necessary?
Short-term alternatives are:
do some cleanup of your application (remove unused DLLs, images, code, ...)
rezip your XAP file using a more powerful compression tool to save some space
Also, some tools exist to "minify" the size of your XAP file (but I have never tried them).
Here is a link that has helped me to reduce my xap size :
http://www.silverlightshow.net/items/My-XAP-file-is-5-Mb-size-is-that-bad.aspx
Edit to answer your comment :
I would suggest using Isolated Storage.
Quote from http://www.silverlightshow.net/items/My-XAP-file-is-5-Mb-size-is-that-bad.aspx :
Use Isolated Storage: keep in cache XAP files, DLLs, resources, and application data. This won't enhance the first load of the application, but on subsequent downloads you can check whether the resource is already available in the client's local storage and avoid downloading it from the server. Three things to bear in mind: it's up to you to build a timestamp mechanism to invalidate the cached information; by default isolated storage is limited to 1 MB in size (by user request it can grow); and users can disable isolated storage for a given app. More info: Basics about Isolated Storage and Caching and sample.
Related links :
http://msdn.microsoft.com/en-us/magazine/dd434650.aspx
http://timheuer.com/blog/archive/2008/09/24/silverlight-isolated-storage-caching.aspx