Pyomo Gurobi solver error- "ERROR: Solver (gurobi) returned non-zero return code (137)" - pyomo

I have a solution for solving an MIP problem of graph which works fine and gives the following output when I run it for smaller graphs. I'm using Gurobi solver with Pyomo.
Problem:
- Name: x73
Lower bound: 192.0
Upper bound: 192.0
Number of objectives: 1
Number of constraints: 10
Number of variables: 37
Number of binary variables: 36
Number of integer variables: 36
Number of continuous variables: 1
Number of nonzeros: 37
Sense: minimize
Solver:
- Status: ok
Return code: 0
Message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Termination condition: optimal
Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Wall time: 0.03206682205200195
Error rc: 0
Time: 0.09361410140991211
Solution:
- number of solutions: 0
number of solutions displayed: 0
But I am getting the following error while running the code with larger graphs.
ERROR: Solver (gurobi) returned non-zero return code (137)
ERROR: Solver log: Using license file /opt/shared/gurobi/gurobi.lic Set
parameter TokenServer to value gurobi.lm.udel.edu Set parameter TSPort to
value 40100 Read LP format model from file /tmp/tmpaud9ogrn.pyomo.lp
Reading time = 0.01 seconds x1101: 56 rows, 551 columns, 551 nonzeros
Changed value of parameter TimeLimit to 600.0
Prev: inf Min: 0.0 Max: inf Default: inf
Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (linux64) Optimize a model
with 56 rows, 551 columns and 551 nonzeros Model fingerprint: 0xafe0319a
Model has 15400 quadratic objective terms Variable types: 1 continuous,
550 integer (550 binary) Coefficient statistics:
Matrix range [1e+00, 1e+00] Objective range [0e+00, 0e+00]
QObjective range [4e+00, 8e+01] Bounds range [1e+00, 1e+00] RHS
range [1e+00, 1e+00]
Found heuristic solution: objective 22880.000000 Presolve removed 1 rows
and 1 columns Presolve time: 0.01s Presolved: 55 rows, 550 columns, 550
nonzeros Presolved model has 15400 quadratic objective terms Variable
types: 0 continuous, 550 integer (550 binary)
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
130920 9.8490000e+02 1.610955e+03 0.000000e+00 5s 263917
1.0999000e+03 1.710649e+03 0.000000e+00 10s 397157
1.0999000e+03 2.243077e+03 0.000000e+00 15s 529512
1.0999000e+03 1.910603e+03 0.000000e+00 20s 662404
1.0999000e+03 1.584650e+03 0.000000e+00 25s 791296
1.0999000e+03 1.812443e+03 0.000000e+00 30s 906473
1.3475000e+03 0.000000e+00 0.000000e+00 34s
Root relaxation: objective 1.347500e+03, 906473 iterations, 34.32 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node
Time
H 0 0 1730.0000000 0.00000 100% - 52s
H 0 0 1654.0000000 0.00000 100% - 52s
H 0 0 1578.0000000 0.00000 100% - 52s
0 0 1347.50000 0 137 1578.00000 1347.50000 14.6% - 52s
0 0 1347.50000 0 137 1578.00000 1347.50000 14.6% - 53s
H 0 0 1540.0000000 1347.50000 12.5% - 53s
0 2 1347.50000 0 145 1540.00000 1347.50000 12.5% - 55s
101 118 1396.92351 10 140 1540.00000 1347.50000 12.5% 157 61s
490 591 1416.40484 18 128 1540.00000 1347.50000 12.5% 63.6 65s
2136 2347 1440.09938 42 100 1540.00000 1347.50000 12.5% 42.9 70s
3847 3402 1461.55736 81 80 1540.00000 1347.50000 12.5% 37.0 82s
/opt/shared/gurobi/9.0.1/bin/gurobi.sh: line 17: 23890 Killed
$PYTHONHOME/bin/python3.7 "$#"
Traceback (most recent call last):
File "/home/2925/EdgeColoring/main.py", line 91, in <module>
qubo_coloring, qubo_time = qubo(G, colors, edge_list, solver)
File "/home/2925/EdgeColoring/qubo.py", line 59, in qubo
result = solver.solve(model)
File "/home/2925/.conda/envs/qubo/lib/python3.9/site-packages/pyomo/opt/base/solvers.py", line 596, in solve
raise ApplicationError(
pyomo.common.errors.ApplicationError: Solver (gurobi) did not exit normally
Using TimeLimit upto 2 minutes breaks the model early without any error but doesn't always give any optimal solution for larger graphs. Memory or processing power is not an issue here. I need to run the code without any interruption for at least 10 minutes if not for hours.

Related

KDB moving percentile using Swin function

I am trying to create a list of the 99th and 1st percentiles. Rather than a single percentile for today. I wanted percentiles for 500 days each using the prior 500 days. The functions I was using for this are the following
swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
percentile:{[x;y] y (100 xrank y:asc y) bin x}
swin[percentile[99;];500;List].
The issue I come across is that the 99th percentile calculates perfectly, but the 1st percentile makes the entire list = 0. a bit lost as to why it would do that. suggestions appreciated!
What's causing the zeros is two-fold:
What behaviour do you want for the earliest 500 days when there isn't 500 days of history to work with? On day 1 there's only 1 datapoint, on day 2 only 2 etc. Only on the 500th day is there 500 days of actual data to work with. By default that swin function fills the gaps with some seed value
You're using zero as that seed value, aka w#0
For example a 5 day lookback on each date looks something like:
q)swin[::;5;1 2 3 4 5]
0 0 0 0 1
0 0 0 1 2
0 0 1 2 3
0 1 2 3 4
1 2 3 4 5
You have zeros until you have data, so naturally the 1st percentile will pick up the zeros for the first roughly 500 dates.
So then you can decide to seed with a different value, or else possibly exclude zeros from your percentile function:
q)List:1000?1000
q)percentile:{[x;y] y (100 xrank y:asc y except 0) bin x}
q)swin[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90...
If zeros are a legitimate value in your list and can't be excluded then maybe seed the swin with some other value that you know won't be in the list (negatives? infinity? null?) and then exclude that seed from the percentile function.
EDIT: A final alternative is to use a different sliding window function which doesn't fill gaps with a seed value, e.g.
q)swin2:{[f;w;s] f each(),/:{neg[x]sublist y,z}[w]\[s]}
q)swin2[::;5;1 2 3 4 5]
,1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
q)swin2[percentile[99;];500;List]
908 908 908 908 908 908 908 908 908 908 908 959 959..
q)swin2[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90 90..

gruobi: used model.write but cannot find the file

I am using Gurobi with C++ and want to save the Lp as a file.lp. Therefore, I used
model.write("model.lp");
model.optimize();
This is my output and nor error occurs:
Optimize a model with 105 rows, 58 columns and 186 nonzeros
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e+00, 1e+00]
Bounds range [0e+00, 0e+00]
RHS range [1e+00, 6e+00]
Presolve removed 105 rows and 58 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Iteration Objective Primal Inf. Dual Inf. Time
0 -0.0000000e+00 0.000000e+00 0.000000e+00 0s
Solved in 0 iterations and 0.00 seconds
Optimal objective -0.000000000e+00
obj: -0
Status2
Process finished with exit code 0
So there is probably a mistake in my LP, since the optimal solution should not be 0. This is why I want to have a look at the model.lp file. However, I cannot find it. I searched my whole computer. Am I missing anything?

Counting how many times a condition succeeds in SAS

I have a table A with real time values in it.
Amount Count Pct1 Pct2
300 2 0.000 100.000
1,891 2 0.001 100.000
500 2 0.000 100.000
100 2 0.000 100.000
1,350 2 0.001 100.000
2,648 2 0.001 100.000
2,255 2 0.001 100.000
500 2 0.000 100.000
200 2 0.000 30.441
10 2 0.000 100.000
1,928 2 0.001 100.000
40 2 0.000 100.000
200 2 0.000 100.000
256 2 0.000 100.000
254 2 0.000 100.000
100 2 0.001 100.000
50 1 0.000 33.333
1,512 2 0.001 100.000
I have a table B with a set of conditions. I want to generate the Condition success count in SAS. i.e. If I pass the row 1 in the below table as a condition to the table A it succeeds 2 times. I am using a join to generate a cartesin product and its not efficient. I want an efficient way to solve this problem (similar to what countifs function does in excel). Thanks a lot for your help.
Amount Count Pct1 Pct2 Condion Success Count
1,576 2 0 100 4
1,537 2 0 100 4
1,484 2 0 100 5
1,405 2 0 100 5
1,290 2 0 100 6
1,095 2 0 100 6
948 2 0 100 6
932 2 0 100 6
914 2 0 100 6
887 2 0 100 6
850 2 0 100 6
774 2 0 100 6
707 2 0 100 6
704 2 0 100 6
695 2 0 100 6
646 2 0 100 6
50 1 0 5.42 16
50 1 0 5.42 16
You said that you have tried join to make to make a cartesian product. However, since you didn't post any code I am not sure if you tried to make full product and then calculate the rows. Doing the counting in one SQL statement is much faster since actually full cartesian product is not written anywhere. Like this:
proc sql;
create table tableC as
select c.*, coalesce(s,0) as SuccessCount from TableB c
left join (
select id, count(*) as s from TableA a,TableB b
where
a.amount >= b.amount and
a.count >= b.count and
a.pct1 >= b.pct1 and
a.pct2 >= b.pct2
group by id
) as d
on c.id = d.id
;
quit;
Note that tableB needs to have some unique id column. You should always have some column to use as id but if you don't have it already simple create it like this for example:
data tableB;
set tableB;
id = _N_;
run;

Pandas quantile failing with NaN's present

I've encountered an interesting situation while calculating the inter-quartile range. Assuming we have a dataframe such as:
import pandas as pd
index=pd.date_range('2014 01 01',periods=10,freq='D')
data=pd.np.random.randint(0,100,(10,5))
data = pd.DataFrame(index=index,data=data)
data
Out[90]:
0 1 2 3 4
2014-01-01 33 31 82 3 26
2014-01-02 46 59 0 34 48
2014-01-03 71 2 56 67 54
2014-01-04 90 18 71 12 2
2014-01-05 71 53 5 56 65
2014-01-06 42 78 34 54 40
2014-01-07 80 5 76 12 90
2014-01-08 60 90 84 55 78
2014-01-09 33 11 66 90 8
2014-01-10 40 8 35 36 98
# test for q1 values (this works)
data.quantile(0.25)
Out[111]:
0 40.50
1 8.75
2 34.25
3 17.50
4 29.50
# break it by inserting row of nans
data.iloc[-1] = pd.np.NaN
data.quantile(0.25)
Out[115]:
0 42
1 11
2 34
3 12
4 26
The first quartile can be calculated by taking the median of values in the dataframe that fall below the overall median, so we can see what data.quantile(0.25) should have yielded. e.g.
med = data.median()
q1 = data[data<med].median()
q1
Out[119]:
0 37.5
1 8.0
2 19.5
3 12.0
4 17.0
It seems that quantile is failing to provide an appropriate representation of q1 etc. since it is not doing a good job of handling the NaN values (i.e. it works without NaNs, but not with NaNs).
I thought this may not be a "NaN" issue, rather it might be quantile failing to handle even-numbered data sets (i.e. where the median must be calculated as the mean of the two central numbers). However, after testing with dataframes with both even and odd-numbers of rows I saw that quantile handled these situations properly. The problem seems to arise only when NaN values are present in the dataframe.
I would like to use quntile to calculate the rolling q1/q3 values in my dataframe, however, this will not work with NaN's present. Can anyone provide a solution to this issue?
Internally, quantile uses numpy.percentile over the non-null values. When you change the last row of data to NaNs you're essentially left with an array array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) in the first column
Calculating np.percentile(array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) gives 42.
From the docstring:
Given a vector V of length N, the qth percentile of V is the qth ranked
value in a sorted copy of V. A weighted average of the two nearest
neighbors is used if the normalized ranking does not match q exactly.
The same as the median if q=50, the same as the minimum if q=0
and the same as the maximum if q=100.

Approach guidance: grepping a string between multiple, slightly different delimiters

Assume a 3Kb file that looks like this:
PdId1 Unit 1
Model 3244
Status: OK
Advanced Status OK
-----------------------
No errors found
Statistics...
...<arbitrary length values here>...
PdId2 Unit 1
Model 3222
Status: OK
Advanced Status OK
-----------------------
Error Log is as follows <arbitrary values here>
PdId3 Unit 1
Model 3243
Status: OK
Advanced Status OK
-----------------------
No errors found
So we can be certain that PdIdn can reliably used as a delimiter, that it's always at the start of a line and that it's always trailing a numebr. I want to parse the text between the delimiter for "No errors found" and if the string is missing, grab the delimiter and the next four lines (grep -A4), glue on an error message and echo the result.
I've been wracking my brain about how to approach this. I'm most comfortable in Bash with grep, but I don't think grep's going to cut it here. I've looked at using split to break the file into pieces, but this seems messy and hard to clean up after processing is done. I started to try to write something in awk / sed, but I don't understand how to split on the delimiters, then go back and parse each result, then break off the next piece and parse that.
I apologise for the general nature of this question, but I'm stumped and could use some guidance.
Edit: Technically, PdId isn't a delimiter as much as it's the start of the next record. The number of records is arbitrary.
Edit: We've now got real world data to work with:
-------------------------------------------------------------------------------
PdId: 1
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/41 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 71
3 Spin_Up_Time 0x0007 169 169 024 Pre-fail Always
- 245 (Average 204)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 2
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 246) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 171 171 024 Pre-fail Always
- 243 (Average 201)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 749
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 749
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 3
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 241) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 67
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 213)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1188
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 15/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 254) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
ATA Error Count: 165 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 50 b0 ee 81 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 a8 80 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 a0 00 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 98 80 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 90 00 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 88 80 ec 81 40 00 18:38:48.275 WRITE FPDMA QUEUED
Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 f0 ad 6b 0d Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 ad 6b 40 00 18:36:07.145 WRITE DMA EXT
35 00 80 00 ae 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 00 ad 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 80 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
35 00 80 00 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 f0 10 5e 5d 0d Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 5b 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 80 5a 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 00 59 5d 40 00 18:35:47.981 WRITE DMA EXT
35 00 00 00 58 5d 40 00 18:35:47.979 WRITE DMA EXT
35 00 30 00 36 5d 40 00 18:35:47.960 WRITE DMA EXT
Error 162 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 20 e0 33 19 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 30 00 33 19 40 00 18:34:50.672 WRITE FPDMA QUEUED
61 80 28 80 33 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 20 00 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 00 18 80 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 10 80 36 19 40 00 18:34:50.670 WRITE FPDMA QUEUED
Error 161 occurred at disk power-on lifetime: 1133 hours (47 days + 5 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 d0 30 dd 3b 0a
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 38 80 dc 3b 40 00 06:26:51.414 WRITE FPDMA QUEUED
61 80 30 00 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 28 80 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 20 00 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
61 80 18 80 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 73
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 212)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 747
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1187
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 748
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 748
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 15/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 165
-------------------------------------------------------------------------------
PdId: 5
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 68
3 Spin_Up_Time 0x0007 133 133 024 Pre-fail Always
- 289 (Average 282)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 6
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 27/30 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 243) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 130 130 024 Pre-fail Always
- 294 (Average 287)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
Read line by line, collect into an accumulator the lines you want, print the accumulated lines when you see the trigger message (otherwise just start over and overwrite the accumulator when you see the start of the next record).
We use a as the accumulator and the helper variable n to keep track of how many lines to accumulate:
awk '/^PdId: [1-9][0-9]*/ { a=$0; n=4; next }
n { --n; a=a "\n" $ 0; next }
/No Errors Logged/ { print a }' file
Put the following into an executable awk file:
#!/usr/bin/awk -f
BEGIN {no_errs=1}
c > 0 {a[c++]=$0}
/^----------/ {
logAnyErrors()
ata_err=""
no_errs=0
c=1
delete a
}
/^No Errors Logged/ {no_errs=1}
/^ATA Error Count:/ {ata_err=$0}
function logAnyErrors() {
if( ata_err!="" || !no_errs) {
for(i=1;i<=5;i++) print a[i]
if( ata_err!="" ) print ata_err
print "--" # separator
}
}
END { logAnyErrors() }
Your data actually has a delimiter of "^------------------"... before each PdId.
The breakdown:
Start off assuming no errors in the BEGIN block
add each record line to an array called a with a line counter c
Whenever a new record section occurs call logAnyErrors() and reset counters
In logAnyErrors(), if there are ATA or other errors, print the first 5 lines of the record and a delimiter similar to what I think grep -A4 would output.
At the end, log any errors in the final record.
When I put this into an executable file called awko and run like awko data I get the following output:
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
ATA Error Count: 165 (device log contains only the most recent five errors)
----------------
It's possible that the delete a line is non-conforming for some awks. Works on my mac. It's not necessary unless you want to print out more information in each block when errors occur(since the first 5 lines will always be overwritten).