Read HDF files into MATLAB from a list.dat file containing the names of these files

I have a list.dat file that contains the names, in order, of about 1000 HDF files. I need to read these into MATLAB one by one, in order, and put the data they contain into a matrix. How do I make MATLAB read in the HDF files? I know how to make MATLAB read a single file, but when all I have is the filenames in a list (in the same directory as the actual files), I don't know how to loop over them.
Here's what I have so far:
% Read in sea ice concentrations
% AMSR-E data format: 'asi-s6250-20110101-v5.hdf';
% AMSR2 data format: 'asi-AMSR2-s6250-20120724-v5.hdf';
% SSMI data format: 'asi-SSMIS17-s6250-20111001-v5.hdf';
fname = 'list.dat';
data = double(hdfread(fname, 'ASI Ice Concentration'));
This currently does not work. It throws an error saying,
??? Error using ==> hdfquickinfo>findInsideVgroup at 156
HDF file '/home/AMSR_SeaIceData_Antarctic/list.dat' may be invalid or corrupt.
Error in ==> hdfquickinfo at 34
[found, hinfo] = findInsideVgroup ( filename, dataname );
Error in ==> hdfread>dataSetInfo at 363
hinfo = hdfquickinfo(filename,dataname);
Error in ==> hdfread at 210
[hinfo,subsets] = dataSetInfo(varargin{:});
The code works when I just put the actual filename of an HDF file in for fname.
Thanks.
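For what it's worth, a minimal sketch of one way to do this, assuming list.dat holds one filename per line and the HDF files sit in the current directory (the variable names here are illustrative, and the grids are assumed to all have the same size):
% Read the list of filenames (one per line) from list.dat
fid = fopen('list.dat', 'r');
fnames = textscan(fid, '%s');
fclose(fid);
fnames = fnames{1};            % cell array of filename strings

ice = [];                      % preallocate if the grid dimensions are known
for k = 1:numel(fnames)
    d = double(hdfread(fnames{k}, 'ASI Ice Concentration'));
    ice(:, :, k) = d;          % stack each file's grid along the 3rd dimension
end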

Related

Error with CRS argument while reprojecting

I'm trying to iterate over multiple rasters (500+) in a for loop but I'm facing some problems.
First I want to reproject them from CRS EPSG:4326 to CRS EPSG:32614, then resample them to match a mask raster that has a smaller resolution and extent, and finally write a result raster for each input to the working directory, but I've been getting the following error message about the CRS argument:
Error in CRS(x) : PROJ4 argument-value pairs must begin with +: E:\Proyecto PM2.5\2_PM_2.5_Processing\Test/AOD_MOD_CDTDB_April_2016.tif
I took a look at multiple posts here, but I couldn't get past this problem. Below is my code; any help would be really appreciated by this R beginner.
#find all tifs in your directory
dir <- "E:\\Proyecto PM2.5\\2_PM_2.5_Processing\\Test"
#get a list of all files with .tif in the name in the directory
files <- list.files(path=dir, pattern='.tif', full.names = TRUE)
#raster with the expected characteristics: extension, cellsize, number of pixels
r_ref <- raster("E:\\Proyecto PM2.5\\3_PM_2.5_Entrega\\temporal\\Raster_C.tif")
for (file in files){
  name <- file
  projectRaster(name, crs="+init=epsg:32614")
  resample(file, r_ref, method="ngb")
  savename <- sub("ZMVM", name, basename(file))
  writeRaster(r, file=savename,)
}
You do
for (file in files){
  name <- file
  projectRaster(name, crs="+init=epsg:32614")
So name is the same as file (why make a copy?): a filename.
You are asking projectRaster to project a character string (a file name). What you intended is surely something like this:
for (file in files){
  r <- raster(file)
  projectRaster(r, crs="+init=epsg:32614")
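A fuller sketch of the whole loop, with the remaining steps made explicit (assuming the raster package is loaded and that writing the results to the working directory with an illustrative "ZMVM_" prefix is acceptable), might look like this:
library(raster)

for (file in files){
  r <- raster(file)                              # read the file into a RasterLayer
  r <- projectRaster(r, crs="+init=epsg:32614")  # reproject (and assign the result)
  r <- resample(r, r_ref, method="ngb")          # match the reference grid
  savename <- paste0("ZMVM_", basename(file))    # illustrative output name in the working directory
  writeRaster(r, filename=savename, overwrite=TRUE)
}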

How do I concatenate two different files into one file using Python?

Input: I have more than 100 samples. Each sample has two files: one with a *.column extension containing the column names, and one with a *.datatypes extension containing the datatype descriptions.
What I need is one output file per sample.
The output file should have the column names along with their datatypes.
Currently I am getting the data from all 100 samples merged and saved into one file.
Eg: file_1:
column names datatypes
id int
name string
Eg: file_2:
column names datatypes
id int
name string
Right now I get the column names and datatypes of all files merged into one single file.
What I need is a separate merged file for each sample.
for name in os.listdir("C:\Python27"):
    if name.endswith(".column"):
        for file in name:
            file = os.path.join(name)
            joined = file + ".joined"
            with open(joined, "w") as fout:
                filenames = glob.glob('*.column')
                for filename in filenames:
                    with open(filename) as f1:
                        file_names = glob.glob('*.datatypes')
                        for filename in file_names:
                            with open(filename) as f2:
                                for line1, line2 in zip(f1, f2):
                                    x = ("{0} {1} \n".format(line1.rstrip(), line2.rstrip()))
                                    y = x.strip()
                                    fout.write(y.strip() + ',\n')
Please assist me.
Hopefully the code below will work. It assumes that each *.column file has a corresponding *.datatypes file; if not, the code will throw a file-not-found error (a guard for that case is sketched at the end of this answer).
import os

for colname in os.listdir("C:\Python27"):
    if colname.endswith(".column"):
        print('Processing:' + colname)
        file = os.path.splitext(colname)[0]
        joined = file + ".joined"
        with open(joined, "w") as fout:
            with open(colname) as f1:
                datname = file + '.datatypes'
                with open(datname) as f2:
                    for line1, line2 in zip(f1, f2):
                        x = ("{0} {1}".format(line1.rstrip(), line2.rstrip()))
                        y = x.strip()
                        fout.write(y.strip() + ',\n')
        print('Finished writing to :' + joined)
I test-ran this with a few sample input files, shown below.
file1.column
date_sev
pos
file1.datatypes
timestamp
date
file2.column
id
name
file2.datatypes
int
string
file3.column
id
name
file3.datatypes
int
string
When I run the file I get the below output in the console
Processing:file1.column
Finished writing to :file1.joined
Processing:file2.column
Finished writing to :file2.joined
Processing:file3.column
Finished writing to :file3.joined
And the output files I get are:
file1.joined
date_sev timestamp,
pos date,
file2.joined
id int,
name string,
file3.joined
id int,
name string,
Also, if you want to improve the output formatting of the files, I would make the changes below...
From
x = ("{0} {1}".format(line1.rstrip(),line2.rstrip()))
To
x = ("{0},{1}".format(line1.rstrip(),line2.rstrip()))
From
fout.write(y.strip() + ',\n')
To
fout.write(y.strip() + '\n')
I left the formatting as-is from your initial version in the original solution posted at the beginning.
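As mentioned above, if some samples turn out to be missing their *.datatypes file, a small guard keeps the loop going instead of crashing. This is only a sketch along the same lines as the code above; the folder path and the skip message are illustrative:
import os

folder = "C:\\Python27"                          # same folder as in the question
for colname in os.listdir(folder):
    if colname.endswith(".column"):
        base = os.path.splitext(colname)[0]
        colpath = os.path.join(folder, colname)
        datpath = os.path.join(folder, base + ".datatypes")
        if not os.path.exists(datpath):          # skip samples without a datatypes file
            print("Skipping " + colname + ": no matching .datatypes file")
            continue
        with open(colpath) as f1, open(datpath) as f2, \
                open(os.path.join(folder, base + ".joined"), "w") as fout:
            for line1, line2 in zip(f1, f2):
                fout.write("{0} {1}\n".format(line1.rstrip(), line2.rstrip()))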

Python 2.7 - Append processed file contents from multiple files to one large CSV file, with the original filename as a header separating each file's output

I have not done any programming in about 12 years and have been asked by one of my colleagues to help with what is apparently a basic Python 2.7 script. My question is very similar to what this person asked (though it has not been answered):
Python - Batch combine Multiple large CSV, filter data, skip header, appending vertically into a single CSV
I need to prompt the user for the folder path, read in each file from that folder (there are hundreds of CSV files), do some processing, and then write the processed output of every file into a single CSV file, with each file's output separated by a blank line and preceded by the filename it was read from.
It would result in something like this:
CHEM_0_5
etc etc
etc etc
etc etc
LAW_4_1
etc etc
etc etc
LAW_7_3
etc etc
etc etc
Currently the script has to be edited with the name of the file it should read, saved, and then run. Then the contents of the output file have to be manually copied into a new CSV file. It is very tedious and time consuming.
This is what I currently have. Please note I have removed some of the processing from the example.
import time
import datetime
x = 0
stamp = 0
compare = 1
values = []
## INSERT NAME OF FILE YOU WANT TO CLEAN
g = open('CHEM_0_5.csv','r')
for line in g:
    lis = [line.split() for line in g]
lis.pop(0)
lis.pop(0)
timestamps = []
results = []
x = 0
for i in cl:
    ## INSERT WHAT YOU WANT TO SAVE THE FILE AS
    fd = open('new.csv','a')
    fd.write(str(ts[x]) + "," + str(i) + "\n")
    fd.close()
    x = x + 1
g.close()
I have been trying to re-learn Python in the process of searching for answers, but given that I don't really know what I'm doing, I feel that might be something to do after I've completed this task for my colleague.
Thank you for taking the time to read my submission!
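For what it's worth, here is a rough sketch of the overall structure under some assumptions: the existing per-file processing goes inside the placeholder process_file function (which is made up here, as are combined_output.csv and the other names), and the folder is taken from a prompt as described:
import os
import glob

# Python 2.7: raw_input returns the typed string
folder = raw_input("Enter the folder path: ")

def process_file(path):
    # Placeholder for the existing cleaning/processing code;
    # here it just splits each line into fields.
    rows = []
    with open(path, 'r') as g:
        for line in g:
            rows.append(line.split())
    return rows

with open('combined_output.csv', 'w') as out:
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        name = os.path.splitext(os.path.basename(path))[0]
        out.write(name + '\n')                     # filename header, e.g. CHEM_0_5
        for row in process_file(path):
            out.write(','.join(str(v) for v in row) + '\n')
        out.write('\n')                            # blank line between files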

BadDataError when editing a .dbf file using dbf package

I have recently produced several thousand shapefile outputs and accompanying .dbf files from an atmospheric model (HYSPLIT) on a unix system. The converter txt2dbf is used to convert shapefile attribute tables (text file) to a .dbf.
Unfortunately, something has gone wrong (probably a separator/field length error) because there are 2 problems with the output .dbf files, as follows:
Some fields of the dbf contain data that should not be there. This data has "spilled over" from neighbouring fields.
An additional field has been added that should not be there (it actually comes from a section of the first record of the text file, "1000 201").
This is an example of the first record in the output dbf (retrieved using dbview unix package):
Trajnum : 1001 2
Yyyymmdd : 0111231 2
Time : 300
Level : 0.
1000 201:
Here's what I expected:
Trajnum : 1000
Yyyymmdd : 20111231
Time : 2300
Level : 0.
Separately, I'm looking at how to prevent this from happening again, but ideally I'd like to be able to repair the existing .dbf files. Unfortunately the text files are removed for each model run, so "fixing" the .dbf files is the only option.
My approaches to the above problems are:
Extract the information from the fields that do exist to a new variable using dbf.add_fields and dbf.write (python package dbf), then delete the old incorrect fields using dbf.delete_fields.
Delete the unwanted additional field.
This is what I've tried:
with dbf.Table(db) as db:
    db.add_fields("TRAJNUMc C(4)") #create new fields
    db.add_fields("YYYYMMDDc C(8)")
    db.add_fields("TIMEc C(4)")
    for record in db: #extract data from fields
        dbf.write(TRAJNUMc=int(str(record.Trajnum)[:4]))
        dbf.write(YYYYMMDDc=int(str(record.Trajnum)[-1:] + str(record.Yyyymmdd)[:7]))
        dbf.write(TIMEc=record.Yyyymmdd[-1:] + record.Time[:])
    db.delete_fields('Trajnum') # delete the incorrect fields
    db.delete_fields('Yyyymmdd')
    db.delete_fields('Time')
    db.delete_fields('1000 201') #delete the unwanted field
    db.pack()
But this produces the following error:
dbf.ver_2.BadDataError: record data is not the correct length (should be 31, not 30)
Given the apparent problem that there has been with the txt2dbf conversion, I'm not surprised to find an error in the record data length. However, does this mean that the file is completely corrupted and that I can't extract the information that I need (frustrating because I can see that it exists)?
EDIT:
Rather than attempting to edit the 'bad' .dbf files, it seems a better approach is to 1. extract the required data from the bad files to a text file and then 2. write it to a new dbf. (See Ethan Furman's comments/answer below.)
EDIT:
An example of a faulty .dbf file that I need to fix/recover data from can be found here:
https://www.dropbox.com/s/9y92f7m88a8g5y4/p0001120110.dbf?dl=0
An example .txt file from which the faulty dbf files were created can be found here:
https://www.dropbox.com/s/d0f2c0zehsyy8ab/attTEST.txt?dl=0
To fix the data and recreate the original text file, this snippet should help:
import dbf

table = dbf.Table('/path/to/scramble/table.dbf')
with table:
    fixed_data = []
    for record in table:
        # convert to str/bytes while skipping delete flag
        data = record._data[1:].tostring()
        trajnum = data[:4]
        ymd = data[4:12]
        time = data[12:16]
        level = data[16:].strip()
        fixed_data.append([trajnum, ymd, time, level])
new_file = open('repaired_data.txt', 'w')
for line in fixed_data:
    new_file.write(','.join(line) + '\n')
Assuming all your data files look like your sample (the big IF being that the data has no embedded commas), then this rough code should help translate your text files into dbfs:
raw_data = open('some_text_file.txt').read().split('\n')
final_table = dbf.Table(
    'dest_table.dbf',
    'trajnum C(4); yyyymmdd C(8); time C(4); level C(9)',
    )
with final_table:
    for line in raw_data:
        fields = line.split(',')
        final_table.append(tuple(fields))
# table has been populated and closed
Of course, you could get fancier and use actual date and number fields if you want to:
# dbf string becomes
'trajnum N; yyyymmdd D; time C(4); level N'
# appending data loop becomes
for line in raw_data:
    trajnum, ymd, time, level = line.split(',')
    trajnum = int(trajnum)
    ymd = dbf.Date(ymd[:4], ymd[4:6], ymd[6:])
    level = int(level)
    final_table.append((trajnum, ymd, time, level))
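If the intermediate text file isn't actually needed, the two snippets above can be combined into a single pass. This is only a sketch built from the same calls used above; the input name comes from the sample file linked in the question and the output name is made up:
import dbf

bad = dbf.Table('p0001120110.dbf')                    # one of the faulty files
fixed = dbf.Table(
    'p0001120110_fixed.dbf',                          # hypothetical output name
    'trajnum C(4); yyyymmdd C(8); time C(4); level C(9)',
    )
with bad, fixed:
    for record in bad:
        data = record._data[1:].tostring()            # raw record bytes, minus delete flag
        fixed.append((data[:4], data[4:12], data[12:16], data[16:].strip()))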

Count the total number of rows in a csv/.txt file and write it to a new csv file in Python

I want to count the total number of rows in a .csv/.txt file and write that count to a new csv file, then clean the file and write a second column to the new file with the total number of rows after cleaning. (I already have the code for cleaning; I only need help with accepting a file and writing the row totals, before and after cleaning, to a new file.) I have attached my code below, which only writes the column name to the new csv file and doesn't print the result.
import csv

data = open('/anusha.csv','r')
#numline = len(file.readlines(data))
#print(numline)
before_clean = []
with open('out_anusha.csv', 'w') as f1:
    for row in data:
        f1 = len(file.readlines(data))
        before_clean.append(f1)
    writer = csv.writer(f1)
    f1.write("Before_clean")
Any help is appreciated!
One way to count the number of lines in a file, without reading the whole thing in Python, is to use the wc utility, if this program is supposed to run on a *nix system.
You can refer to Running "wc -l <filename>" within Python Code
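If you would rather stay in pure Python instead of shelling out to wc, a minimal sketch of the before/after counting might look like this. The clean_file function is only a stand-in for your existing cleaning code, and the file names are illustrative:
import csv

def count_rows(path):
    # Count the number of rows (lines) in a text/CSV file.
    with open(path, 'r') as f:
        return sum(1 for _ in f)

def clean_file(in_path, out_path):
    # Stand-in for the existing cleaning code: here it just drops blank lines.
    with open(in_path, 'r') as src, open(out_path, 'w') as dst:
        for line in src:
            if line.strip():
                dst.write(line)

before = count_rows('anusha.csv')
clean_file('anusha.csv', 'anusha_clean.csv')
after = count_rows('anusha_clean.csv')

with open('out_anusha.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['Before_clean', 'After_clean'])
    writer.writerow([before, after])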