Rocket Universe hung deleting multipart file - universe

I'm having a process hang on trying to delete a part of a multipart file. There's a lock on the file, but the process trying to do the delete is the one that holds the lock. What could be causing it to hang?
Our product uses a multipart file called MR.WORK. A new part gets created for each process, with a part name consisting of the letter U and the userno, so here it's MR.WORK,U-3. Let's say I'm logged in as foo, and the product is also logged in as foo, running in a phantom.
>PORT.STATUS USER foo
There are currently 2 uniVerse sessions; 1 interactive, 1 phantom
Pid.... User name. Who. Port name..... Last command processed............
23144 foo 2 /dev/pts/2 PORT.STATUS USER foo
Pid.... User name. Who. Last command processed............................
2086 foo -3 DELETE-FILE DATA MR.WORK,U-3
It gets there and just hangs forever. Something else has an IN-type group lock on the same inode, but I gather that IN is just informational and not really locking.
>LIST.READU EVERY
Active Group Locks:                                        Record Group Group Group
Device.... Inode.... Netnode Userno Lmode G-Address. Locks ...RD  ...SH ...EX
2068 21630372 0 2 9 IN 400 1 0 0 0
2068 21502283 0 -1 57 RD 400 0 1 0 0
Active Record Locks:
Device.... Inode.... Netnode Userno Lmode Pid Login Id Item-ID.............
2068 21630372 0 -3 9 RU 2086 foo MR.WORK,U-3
I'm stumped. The MR.WORK,U-3 part is not particularly large. I've tried deleting and recreating the file, and we'll see if that helps, but I'm not hopeful. Ideas?

Extracting Multiple Blocks of Similar Text

I am trying to parse a report. The following is a sample of the text that I need to parse:
7605625112 DELIVERED N 1 GORDON CONTRACTORS I SIPLAST INC Freight Priority 2000037933 $216.67 1,131 ROOFING MATERIALS
04/23/2021 02:57 PM K WRISHT N 4 CAPITOL HEIGHTS, MD ARKADELPHIA, AR Prepaid 2000037933 -$124.23 170160-00
04/27/2021 12:41 PM 2 40 20743-3706 71923 $.00 055 $.00
2 WBA HOT $62.00 0
$12.92 $92.44
$167.36
7605625123 DELIVERED N 1 SECHRIST HALL CO SIPLAST INC Freight Priority 2000037919 $476.75 871 PAIL,UN1263,PAINT,3,
04/23/2021 02:57 PM S CHAVEZ N 39 HARLINGEN, TX ARKADELPHIA, AR Prepaid 2000037919 -$378.54
04/27/2021 01:09 PM 2 479 78550 71923 $.00 085 $95.35
2 HRL HOT $62.00 21
$13.55 $98.21
$173.76
This consists of two or more blocks, each starting with "[0-9]{10}\sDELIVERED" and ending with the last currency string before the next block.
If I test with "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$167.36\n)" I successfully get the first block, but if I use "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$\d\d\d.\d\d\n)" it grabs everything.
If someone can show me the changes that I need to make to return two or more blocks I would greatly appreciate it.
* is a greedy quantifier, so it will try to match as many characters as possible. See also Repetition with Star and Plus.
For fixing it, you can use this regex:
(?s)(\d{10}\sDELIVERED)((.(?!\d{10}\sDELIVERED))*)(?<=\$\d\d\d.\d\d)
in which I basically replaced .* with (.(?!\d{10}\sDELIVERED))* so that, for every character, it checks whether it is followed by \d{10}\sDELIVERED.
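As a quick sketch, the tempered-dot pattern can be checked in Python; the sample text below is a trimmed, hypothetical stand-in for the real report:

```python
import re

# Trimmed stand-in for the report: two blocks, each ending in a currency total
text = (
    "7605625112 DELIVERED first block\n"
    "$12.92 $92.44\n"
    "$167.36\n"
    "7605625123 DELIVERED second block\n"
    "$13.55 $98.21\n"
    "$173.76\n"
)

# .* replaced by a tempered dot that refuses to step over the next block marker
pattern = r"(?s)(\d{10}\sDELIVERED)((?:.(?!\d{10}\sDELIVERED))*)(?<=\$\d\d\d.\d\d)"
blocks = [m.group(0) for m in re.finditer(pattern, text)]

print(len(blocks))                    # 2
print(blocks[0].endswith("$167.36"))  # True
print(blocks[1].endswith("$173.76"))  # True
```

Each match now stops at the last currency string before the following block instead of swallowing the whole report.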

Python: Match a special character with regular expression

Hi everyone, I'm using the re.match function to extract pieces of a string from each row of a file.
My code is as follows:
## fp_tmp => file pointer
for x in fp_tmp:
    try:
        cpuOverall = re.match(r"(Overall CPU load average)\s+(\S+)(%)", x)
        cpuUsed = re.match(r"(Total)\s+(\d+)(%)", x)
        ramUsed = re.match(r"(RAM Utilization)\s+(\d+\%)", x)
        #### Not working ####
        if cpuUsed is not None: cpuused_new = cpuUsed.group(2)
        if ramUsed is not None: ramused_new = ramUsed.group(2)
        if cpuOverall is not None: cpuoverall_new = cpuOverall.group(2)
    except:
        searchbox_result = None
Each field is extracted from the corresponding line shown next to it:
ramUsed => RAM Utilization 2%
cpuUsed => Total 4%
cpuOverall => Overall CPU load average 12%
ramUsed, cpuUsed, cpuOverall are the variables where I want to store the results.
The actual lines look like this (the amount of leading whitespace varies):
(undefined space) RAM Utilization 2%
(undefined space) Total 4%
(undefined space) Overall CPU load average 12%
When I execute the script, every variable comes back as None.
With other variables the script works correctly.
Why doesn't the code work in this case? I am using Python 3.
I think the problem is that the % character is not being read.
Do you have any suggestions?
PROBLEM 2:
## fp_tmp => file pointer
for x in fp_tmp:
    try:
        emailReceived = re.match(r".*(Messages Received)\s+\S+\s+\S+\s+(\S+)", x)
        #### Not working ####
        if emailReceived is not None: emailreceived_new = emailReceived.group(2)
    except:
        searchbox_result = None
Each field is extracted from corresponding lines like these, which appear in two places in the file:
[....]
Counters: Reset Uptime Lifetime
Receiving
Messages Received 3,406 1,558 3,406
[....]
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0
Recipients Received 0 0 0
[....]
I want to extract only the second occurrence, i.e.:
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0 <-this
Do you have any suggestions?
cpuOverall line: you forgot that there is more information at the start of the line. Change to
'.*(Overall CPU load average)\s+(\S+%)'
cpuUsed line: you forgot that there is more information at the start of the line. Change to
'.*(Total)\s+(\d+%)'
ramUsed line: you forgot that there is more information at the start of the line... Change to
'.*(RAM Utilization)\s+(\d+%)'
Remember that re.match looks for an exact match from the start:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. [..]
With these changes, your three variables are set to the percentages:
>>> print (cpuused_new,ramused_new,cpuoverall_new)
4% 2% 12%
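Putting the corrected patterns together, a minimal runnable sketch (with hypothetical sample lines standing in for the real file) looks like this:

```python
import re

# Hypothetical sample lines standing in for the device report
lines = [
    "  Overall CPU load average   12%",
    "  Total   4%",
    "  RAM Utilization   2%",
]

cpuoverall_new = cpuused_new = ramused_new = None
for x in lines:
    # Leading .* skips whatever precedes the label; the % sign is kept in group 2
    cpuOverall = re.match(r".*(Overall CPU load average)\s+(\S+%)", x)
    cpuUsed = re.match(r".*(Total)\s+(\d+%)", x)
    ramUsed = re.match(r".*(RAM Utilization)\s+(\d+%)", x)
    if cpuUsed is not None: cpuused_new = cpuUsed.group(2)
    if ramUsed is not None: ramused_new = ramUsed.group(2)
    if cpuOverall is not None: cpuoverall_new = cpuOverall.group(2)

print(cpuused_new, ramused_new, cpuoverall_new)  # 4% 2% 12%
```

Alternatively, re.search would avoid needing the leading .* at all, since it is not anchored to the start of the string.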

How do I set the column width of a pexpect ssh session?

I am writing a simple python script to connect to a SAN via SSH and run a set of commands. Ultimately each command's output will be logged to a separate log along with a timestamp, and then the script will exit. This is because the device we are connecting to doesn't support certificate ssh connections, and doesn't have decent logging capabilities on its current firmware revision.
The issue that I seem to be running into is that the SSH session that is created seems to be limited to 78 characters wide. The results generated from each command are significantly wider - 155 characters. This is causing a bunch of funkiness.
First, the results in their current state are significantly more difficult to parse. Second, because the buffer is significantly smaller, the final volume command won't execute properly because the pexpect launched SSH session actually gets prompted to "press any key to continue".
How do I change the column width of the pexpect session?
Here is the current code (it works but is incomplete):
#!/usr/bin/python
import pexpect
import os
PASS='mypassword'
HOST='1.2.3.4'
LOGIN_COMMAND='ssh manage@'+HOST
CTL_COMMAND='show controller-statistics'
VDISK_COMMAND='show vdisk-statistics'
VOL_COMMAND='show volume-statistics'
VDISK_LOG='vdisk.log'
VOLUME_LOG='volume.log'
CONTROLLER_LOG='volume.log'
DATE=os.system('date +%Y%m%d%H%M%S')
child=pexpect.spawn(LOGIN_COMMAND)
child.setecho(True)
child.logfile = open('FetchSan.log','w+')
child.expect('Password: ')
child.sendline(PASS)
child.expect('# ')
child.sendline(CTL_COMMAND)
print child.before
child.expect('# ')
child.sendline(VDISK_COMMAND)
print child.before
child.expect('# ')
print "Sending "+VOL_COMMAND
child.sendline(VOL_COMMAND)
print child.before
child.expect('# ')
child.sendline('exit')
child.expect(pexpect.EOF)
print child.before
The output expected:
# show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
---------------------------------------------------------------------------------------------------------------------------------------------------------
controller_A 0 45963169 1573.3KB 67 386769785 514179976 6687.8GB 5750.6GB
controller_B 20 45963088 4627.4KB 421 3208370173 587661282 63.9TB 5211.2GB
---------------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
# show vdisk-statistics
Name Serial Number Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
------------------------------------------------------------------------------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0 45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2282.4KB 164 23229435 76509765 5506.7GB 1605.3GB
DATA1 00c0ff1311f3000087d8c44f00000000 2286.5KB 167 23490851 78314374 5519.0GB 1603.8GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0 26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 654.8KB 5 3049980 15317236 1187.3GB 1942.1GB
FRA1 00c0ff13349e000007d9c44f00000000 778.7KB 6 3016569 15234734 1179.3GB 1940.4GB
------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
# show volume-statistics
Name Serial Number Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
-----------------------------------------------------------------------------------------------------------------------------------------------------
CRS_v001 00c0ff13349e0000fdd6c44f01000000 14.8KB 5 239611146 107147564 1321.1GB 110.5GB
DATA1_v001 00c0ff1311f30000d0d8c44f01000000 2402.8KB 218 1701488316 336678620 33.9TB 3184.6GB
DATA2_v001 00c0ff1311f3000040f9ce5701000000 0B 0 921 15 2273.7KB 2114.0KB
DATA_v001 00c0ff1311f30000bdd7c44f01000000 2303.4KB 209 1506883611 250984824 30.0TB 2026.6GB
FRA1_v001 00c0ff13349e00001ed9c44f01000000 709.1KB 28 25123082 161710495 1891.0GB 2230.0GB
FRA_v001 00c0ff13349e00001fd8c44f01000000 793.0KB 34 122052720 245322281 3475.7GB 3410.0GB
-----------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
The output as printed to the terminal (as mentioned, the 3rd command won't execute in its current state):
show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
controller_A 3 45962495 3803.1KB
73 386765821 514137947 6687.8GB
5748.9GB
controller_B 20 45962413 5000.7KB
415 3208317860 587434274 63.9TB
5208.8GB
----------------------------------------------------------------------
Success: Command completed successfully.
Sending show volume-statistics
show vdisk-statistics
Name Serial Number Bytes per second IOPS
Number of Reads Number of Writes Data Read Data Written
----------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0
45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2187.2KB 152
23220764 76411017 5506.3GB 1604.1GB
DATA1 00c0ff1311f3000087d8c44f00000000 2295.2KB 154
23481442 78215540 5518.5GB 1602.6GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0
26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 1829.3KB 14
3049951 15310681 1187.3GB 1941.2GB
FRA1 00c0ff13349e000007d9c44f00000000 1872.8KB 14
3016521 15228157 1179.3GB 1939.5GB
----------------------------------------------------------------------------
Success: Command completed successfully.
Traceback (most recent call last):
File "./fetchSAN.py", line 34, in <module>
child.expect('# ')
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/spawnbase.py", line 321, in expect
timeout, searchwindowsize, async)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/spawnbase.py", line 345, in expect_list
return exp.expect_loop(timeout)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x105333910>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', 'manage@10.254.27.49']
buffer (last 100 chars): '-------------------------------------------------------------\r\nPress any key to continue (Q to quit)'
before (last 100 chars): '-------------------------------------------------------------\r\nPress any key to continue (Q to quit)'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 19519
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file 'FetchSan.log', mode 'w+' at 0x1053321e0>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile("# ")
And here is what is captured in the log:
Password: mypassword
HP StorageWorks MSA Storage P2000 G3 FC
System Name: Uninitialized Name
System Location:Uninitialized Location
Version:TS230P008
# show controller-statistics
show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
controller_A 3 45962495 3803.1KB
73 386765821 514137947 6687.8GB
5748.9GB
controller_B 20 45962413 5000.7KB
415 3208317860 587434274 63.9TB
5208.8GB
----------------------------------------------------------------------
Success: Command completed successfully.
# show vdisk-statistics
show vdisk-statistics
Name Serial Number Bytes per second IOPS
Number of Reads Number of Writes Data Read Data Written
----------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0
45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2187.2KB 152
23220764 76411017 5506.3GB 1604.1GB
DATA1 00c0ff1311f3000087d8c44f00000000 2295.2KB 154
23481442 78215540 5518.5GB 1602.6GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0
26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 1829.3KB 14
3049951 15310681 1187.3GB 1941.2GB
FRA1 00c0ff13349e000007d9c44f00000000 1872.8KB 14
3016521 15228157 1179.3GB 1939.5GB
----------------------------------------------------------------------------
Success: Command completed successfully.
# show volume-statistics
show volume-statistics
Name Serial Number Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
CRS_v001 00c0ff13349e0000fdd6c44f01000000 11.7KB
5 239609039 107145979 1321.0GB
110.5GB
DATA1_v001 00c0ff1311f30000d0d8c44f01000000 2604.5KB
209 1701459941 336563041 33.9TB
3183.3GB
DATA2_v001 00c0ff1311f3000040f9ce5701000000 0B
0 921 15 2273.7KB
2114.0KB
DATA_v001 00c0ff1311f30000bdd7c44f01000000 2382.8KB
194 1506859273 250871273 30.0TB
2025.4GB
FRA1_v001 00c0ff13349e00001ed9c44f01000000 1923.5KB
31 25123006 161690520 1891.0GB
2229.1GB
FRA_v001 00c0ff13349e00001fd8c44f01000000 2008.5KB
37 122050872 245301514 3475.7GB
3409.1GB
----------------------------------------------------------------------
Press any key to continue (Q to quit)%
As a starting point: According to the manual, that SAN has a command to disable the pager. See the documentation for set cli-parameters pager off. It may be sufficient to execute that command. It may also have a command to set the terminal rows and columns that it uses for formatting output, although I wasn't able to find one.
Getting to your question: When an ssh client connects to a server and requests an interactive session, it can optionally request a PTY (pseudo-tty) for the server side of the session. When it does that, it informs the server of the lines, columns, and terminal type which the server should use for the TTY. Your SAN may honor PTY requests and use the lines and columns values to format its output. Or it may not.
The ssh client gets the rows and columns for the PTY request from the TTY for its standard input. This is the PTY which pexpect is using to communicate with ssh.
This question discusses how to set the terminal size for a pexpect session. ssh doesn't honor the LINES or COLUMNS environment variables as far as I can tell, so I doubt that would work. However, calling child.setwinsize() after spawning ssh ought to work:
child = pexpect.spawn(cmd)
child.setwinsize(400,400)
If you have trouble with this, you could try setting the terminal size by invoking stty locally before ssh:
child=pexpect.spawn('stty rows x cols y; ssh user@host')
Finally, you need to make sure that ssh actually requests a PTY for the session. It does this by default in some cases, which should include the way you are running it. But it has a command-line option -tt to force it to allocate a PTY. You could add that option to the ssh command line to make sure:
child=pexpect.spawn('ssh -tt user@host')
or
child=pexpect.spawn('stty rows x cols y; ssh -tt user@host')
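Here is a self-contained sketch of the setwinsize() call. It spawns a local shell instead of the SAN so it can run anywhere; for the real device you would spawn 'ssh -tt manage@host' instead:

```python
import pexpect

# Spawn a throwaway local shell; for the SAN, spawn 'ssh -tt manage@host' instead
child = pexpect.spawn('/bin/sh')
child.setwinsize(400, 200)        # rows, cols -- resizes the PTY the child sees
child.sendline('stty size; exit')  # the child's own view of its terminal size
child.expect(pexpect.EOF)
print(child.before)               # output includes b'400 200'
```

If the SAN honors the PTY size from the ssh request, its pager and line wrapping should follow these dimensions.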

Parse ASCII Output of a Device-File in C++

I wrote a kernel-space driver for a USB device. When it is connected it shows up as /dev/myusbdev0, for example.
From the command line I can send commands to the device with echo -en "command" > /dev/myusbdev0 and read results with cat /dev/myusbdev0.
OK, now I have to write a C++ program. First I would open the device file for read/write with:
int fd = open("/dev/myusbdev0", O_RDWR);
After that, a command is sent to get the device working:
char cmd[] = { "\x02sEN LMDscandata 1\x03" };
write(fd, cmd, sizeof(cmd));
Now I get to the part I don't know how to handle yet. I need to read from the device, as it keeps sending me data continuously. This data I need to read and parse ...
char buf[512];
read(fd, buf, sizeof(buf));
The data looks like the following; each message starts with \x02 and ends with \x03, and they are not always the same size:
sRA LMDscandata 1 1 89A27F 0 0 343 347 27477BA9 2747813B 0 0 7 0 0
1388 168 0 1 DIST1 3F800000 00000000 186A0 1388 15 8A1 8A5 8AB 8AC 8A6
8AC 8B6 8C8 8C2 8C9 8CB 8C4 8E4 8E1 8EB 8E0 8F5 908 8FC 907 906 0 0 0
0 0 0 All Values are separated with a 20hex {SPC}
I think I need some kind of while loop to continuously read the data from an \x02 until I read a \x03.
Once I have a complete scan, I need to parse this ASCII message into its separate parts (some variables uint16, uint8, enum16, ...).
Any idea how I can read a complete scan into a buf[] and then parse its components out?
Since, as you say, the device sends continuously, I would recommend adding a queue to hold the chunks coming in, plus some dispatching that takes complete parts (i.e. \x02 to \x03) out of the queue, decoupling the parsing work from receiving chunks.
Furthermore, you can then have single objects each handling one complete block from \x02 to \x03, perhaps threaded (which makes sense with the information given).
device => chunk reader => input queue => input reader => data handling
Hope this helps.
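The chunk-reader half of that pipeline can be sketched as follows (shown in Python for brevity; the same buffering logic translates directly to C++ with a std::string or std::deque). The sample bytes are a made-up fragment in the device's framing:

```python
# Accumulate raw bytes and emit complete \x02...\x03 frames, keeping any
# trailing partial frame buffered for the next read().
STX, ETX = 0x02, 0x03

def extract_frames(buf: bytearray):
    """Remove and return all complete STX..ETX payloads from buf."""
    frames = []
    while True:
        try:
            start = buf.index(STX)
            end = buf.index(ETX, start + 1)
        except ValueError:
            break  # no complete frame yet; keep the partial data buffered
        frames.append(bytes(buf[start + 1:end]))
        del buf[:end + 1]
    return frames

buf = bytearray()
# Pretend read() returned one full frame plus the start of the next
buf += b'\x02sRA LMDscandata 1 1 89A27F\x03\x02partial'
frames = extract_frames(buf)
print(frames)  # [b'sRA LMDscandata 1 1 89A27F']; b'\x02partial' stays buffered
```

Each returned payload can then be split on the 0x20 separator and handed to a parser object, independently of the read loop.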

Perl RegEx for Matching 11 column File

I'm trying to write a perl regex to match the 5th column of files that contain 11 columns. There's also a preamble and footer which are not data. Any good thoughts on how to do this? Here's what I have so far:
if($line =~ m/\A.*\s(\b\w{9}\b)\s+(\b[\d,.]+\b)\s+(\b[\d,.sh]+\b)\s+.*/i) {
And this is what the forms look like:
No. Form 13F File Number Name
____ 28-________________ None
[Repeat as necessary.]
FORM 13F INFORMATION TABLE
TITLE OF VALUE SHRS OR SH /PUT/ INVESTMENT OTHER VOTING AUTHORITY
NAME OF INSURER CLASS CUSSIP (X$1000) PRN AMT PRNCALL DISCRETION MANAGERS SOLE SHARED NONE
Abbott Laboratories com 2824100 4,570 97,705 SH sole 97,705 0 0
Allstate Corp com 20002101 12,882 448,398 SH sole 448,398 0 0
American Express Co com 25816109 11,669 293,909 SH sole 293,909 0 0
Apollo Group Inc com 37604105 8,286 195,106 SH sole 195,106 0 0
Bank of America com 60505104 174 12,100 SH sole 12,100 0 0
Baxter Internat'l Inc com 71813109 2,122 52,210 SH sole 52,210 0 0
Becton Dickinson & Co com 75887109 8,216 121,506 SH sole 121,506 0 0
Citigroup Inc com 172967101 13,514 3,594,141 SH sole 3,594,141 0 0
Coca-Cola Co. com 191216100 318 6,345 SH sole 6,345 0 0
Colgate Palmolive Co com 194162103 523 6,644 SH sole 6,644 0 0
If you ever do write a regex this long, you should at least use the /x flag, which ignores literal whitespace in the pattern and so allows indentation and comments:
m/
whatever
something else # actually trying to do this
blah # for fringe case X
/xi
If you find it hard to read your own regex, others will find it impossible.
I think a regular expression is overkill for this.
What I'd do is clean up the input and use Text::CSV_XS on the file, specifying the record separator (sep_char).
Like Ether said, another tool would be appropriate for this job.
@fields = split /\t/, $line;
if (@fields == 11) { # fewer than 11 fields is probably header/footer
    $the_5th_column = $fields[4];
    ...
}
My first thought is that the sample data is horribly mangled in your example. It'd be great to see it embedded inside some <pre>...</pre> tags so columns will be preserved.
If you are dealing with columnar data, you can go after it using substr() or unpack() more easily than with a regex. You can use regex to parse out the data, but most of us who've been programming Perl a while have also learned that regex is not the first tool to grab much of the time. That's why you got the other comments. Regex is a powerful weapon, but it's also easy to shoot yourself in the foot.
http://perldoc.perl.org/functions/substr.html
http://perldoc.perl.org/functions/unpack.html
Update:
After a bit of nosing around on the SEC edgar site, I've found that the 13F files are nicely formatted. And, you should have no problem figuring out how to process them using substr and/or unpack.
FORM 13F INFORMATION TABLE
VALUE SHARES/ SH/ PUT/ INVSTMT OTHER VOTING AUTHORITY
NAME OF ISSUER TITLE OF CLASS CUSIP (x$1000) PRN AMT PRN CALL DSCRETN MANAGERS SOLE SHARED NONE
- ------------------------------ ---------------- --------- -------- -------- --- ---- ------- ------------ -------- -------- --------
3M CO COM 88579Y101 478 6051 SH SOLE 6051 0 0
ABBOTT LABS COM 002824100 402 8596 SH SOLE 8596 0 0
AFLAC INC COM 001055102 291 6815 SH SOLE 6815 0 0
ALCATEL-LUCENT SPONSORED ADR 013904305 172 67524 SH SOLE 67524 0 0
If you are seeing the 13F files unformatted, as in your example, then you are not viewing correctly because there are tabs between columns in some of the files.
I looked through 68 files to get an idea of what's out there, then wrote a quick unpack-based routine and got this:
3M CO, COM, 88579Y101, 478, 6051, SH, , SOLE, , 6051, 0, 0
ABBOTT LABS, COM, 002824100, 402, 8596, SH, , SOLE, , 8596, 0, 0
AFLAC INC, COM, 001055102, 291, 6815, SH, , SOLE, , 6815, 0, 0
ALCATEL-LUCENT, SPONSORED ADR, 013904305, 172, 67524, SH, , SOLE, , 67524, 0, 0
Based on some of the other files here's some thoughts on how to process them:
Some of the files use tabs to separate the columns. Those are trivial to parse and you do not need regex to split the columns. 0001031972-10-000004.txt appears to be that way and looks very similar to your example.
Some of the files use tabs to align the columns, not separate them. You'll need to figure out how to compress multiple tab runs into a single tab, then probably split on tabs to get your columns.
Others use a blank line to separate the rows vertically so you'll need to skip blank lines.
Others wrap columns to the next line (like a spreadsheet would in a column that is not wide enough). It's not too hard to figure out how to deal with that, but how to do it is left as an exercise for you.
Some use centered column alignment, resulting in leading and trailing whitespace in your data. s/^\s+//; and s/\s+$//; will become your friends.
The most interesting one I saw appeared to have been created correctly, then word-wrapped at column 78, leading me to think some moron loaded their spreadsheet or report into their word processor then saved it. Reading that is a two step process of getting rid of the wrapping carriage-returns, then re-processing the data to parse out the columns. As an added task they also have column headings in the data for page breaks.
You should be able to get 100% of the files parsed, however you'll probably want to do it with a couple different parsing methods because of the use of tabs and blank lines and embedded column headers.
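For the files where tabs align rather than separate the columns, a quick sketch of the compress-then-split idea, shown in Python with a made-up row:

```python
import re

# Made-up tab-aligned row: runs of tabs pad the columns out
line = "3M CO\tCOM\t\t88579Y101\t478\t6051\tSH\tPUT\tSOLE\t\t6051\t0\t0\n"

# Collapse each run of tabs into a single separator, then split
fields = re.sub(r"\t+", "\t", line.strip()).split("\t")

if len(fields) == 11:              # fewer fields => probably header/footer
    the_5th_column = fields[4]

print(the_5th_column)  # 6051
```

Note the caveat: collapsing tab runs also erases genuinely empty columns (e.g. an empty PUT/CALL field), which is exactly why these files need per-file inspection before choosing a parsing method.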
Ah, the fun of processing data from the wilderness.