Replacing a specific column with a specific value using gawk

I am trying to find everywhere my data has a 90 in column 2 and, two lines above, change the value of column 2. For example, in my data below, if I see 90 at line 11 I want to change my column 2 value at line 9 from 11 to 5. I have a predetermined set of values I want to change the number to; the values will always be 10,11,12,30,31,32 to 1,2,3,4,5,6 respectively.
My data
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
What I want
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 5 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
I have been trying to store the previous line and use that as a reference but I can only go back one line, and I need to go back two. Thank you for your help.

This should work:
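# Print the n fields of a split line in order, separated by single spaces.
# (for (e in a) does not guarantee any order, so index the array explicitly.)
function pra(a, n,    i) {
    for (i = 1; i <= n; i++) printf "%s%s", a[i], (i < n ? " " : "");
    print "";
}
BEGIN {
    # Translation table for column 2.
    vals[10] = 1;
    vals[11] = 2;
    vals[12] = 3;
    vals[30] = 4;
    vals[31] = 5;
    vals[32] = 6;
}
NR == 1 { na = split($0, a, " ") }
NR == 2 { nb = split($0, b, " ") }
NR > 2 {
    # a holds the line two above the current one; translate its column 2
    # only if the value is in the table.
    if ($2 == "90" && (a[2] in vals)) {
        a[2] = vals[a[2]];
    }
    pra(a, na);
    # Shift b into a, then split the current line into b.
    for (i = 1; i <= nb; i++) {
        a[i] = b[i];
    }
    na = nb;
    nb = split($0, b, " ");
}
END {
    pra(a, na);
    pra(b, nb);
}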
The rundown of how this works:
* BEGIN block - assign the translation values to vals
* NR == 1 and NR == 2 - remember the first two lines in the split arrays a and b
* NR > 2 - for all lines after the first two
* If the second column of the current line is 90, translate the second column of a (the line two above) using the translation array
* Print a, move the elements of array b to a, and split the current line into b
* END block - print a and b, which at that point hold the last two lines
Sample run:
$ cat inp && awk -f mkt.awk inp
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 2 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
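For readers who find the two-array bookkeeping hard to follow, here is the same idea as a minimal Python sketch: hold the last two rows in a small buffer and, when the current row has 90 in column 2, translate column 2 of the oldest buffered row. The function name `fix_two_back` and its parameters are hypothetical, and like the awk script above it joins fields with single spaces rather than preserving the original spacing.

```python
from collections import deque

# Translation table from the question: 10,11,12,30,31,32 -> 1..6.
VALS = {"10": "1", "11": "2", "12": "3", "30": "4", "31": "5", "32": "6"}

def fix_two_back(lines, trigger="90", lookback=2):
    """Yield rows, translating column 2 of the row `lookback` lines above a trigger."""
    window = deque()  # pending rows, each stored as a list of fields
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] == trigger and len(window) == lookback:
            target = window[0]  # the row two lines above the current one
            target[1] = VALS.get(target[1], target[1])
        window.append(fields)
        if len(window) > lookback:
            yield " ".join(window.popleft())
    while window:  # flush the rows still buffered at end of input
        yield " ".join(window.popleft())

sample = [
    "8 70 0 0 0.0000 65106",
    "9 11 0 0 0.0000 66627",
    "10 21 0 0 0.0000 68165",
    "11 90 0 0 0.0000 68700",
    "12 31 0 0 0.0000 70221",
]
result = list(fix_two_back(sample))
print("\n".join(result))
```

A row is only emitted once it can no longer be changed, which is why the last two rows are flushed at the end, just as the END block does in awk.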
To preserve the column formatting for this specific case, you can do something like this:
# Print a split line using the fixed column widths of the data file.
function pra(a) {
    printf "%4d%8d%3d%5d%9.4f%6d\n", a[1], a[2], a[3], a[4], a[5], a[6]
}
BEGIN {
    vals[10] = 1;
    vals[11] = 2;
    vals[12] = 3;
    vals[30] = 4;
    vals[31] = 5;
    vals[32] = 6;
}
NR == 1 { print }   # the header line passes through unchanged
NR == 2 { split($0, a, " ") }
NR == 3 { split($0, b, " ") }
# NR > 3, not NR > 4: otherwise the fourth input line is never stored or printed.
NR > 3 {
    if ($2 == "90" && (a[2] in vals)) {
        a[2] = vals[a[2]];
    }
    pra(a);
    for (i = 1; i <= 6; i++) {
        a[i] = b[i];
    }
    split($0, b, " ");
}
END {
    pra(a);
    pra(b);
}
Sample run:
$ cat inp && awk -f mkt.awk inp
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 2 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221

This version maintains your original formatting:
awk 'BEGIN { new["10"]=" 1"; new["11"]=" 2"; new["12"]=" 3"
             new["30"]=" 4"; new["31"]=" 5"; new["32"]=" 6" }
     # Keep a sliding window of the last three lines.
     { line[-2]=line[-1]; line[-1]=line[0]; line[0]=$0 }
     # On a 90, rewrite column 2 of the line two back with a same-width value.
     $2==90 { if( match( line[-2], /^ *[0-9]+ +(1[0-2]|3[0-2]) / ) ) {
                old=substr( line[-2], RLENGTH-2, 2 )
                line[-2]=substr( line[-2], 1, RLENGTH-3 ) new[old] \
                         substr( line[-2], RLENGTH ) } }
     NR>2 { printf("%s\n",line[-2]) }
     END { printf("%s\n%s\n",line[-1],line[0]) }' file.in
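The width-preserving substitution is the subtle part of that one-liner, so here it is in isolation as a small Python sketch. It only shows the replacement step, not the two-line lookback. The spacing of the sample row is an assumption modelled on the printf widths used in the earlier answer, and `replace_keep_width` is a hypothetical helper name.

```python
import re

# Translation table from the question: 10,11,12,30,31,32 -> 1..6.
VALS = {"10": "1", "11": "2", "12": "3", "30": "4", "31": "5", "32": "6"}

# Leading index column, then a two-digit code from the table in column 2.
COL2 = re.compile(r"^(\s*\d+\s+)(1[0-2]|3[0-2])(?=\s)")

def replace_keep_width(line):
    """Replace the column-2 code, right-justified to the width of the old value."""
    def sub(m):
        old = m.group(2)
        return m.group(1) + VALS[old].rjust(len(old))
    return COL2.sub(sub, line)

# Assumed fixed-width layout (widths 4, 8, 3, 5, 9, 6).
row = "   9      11  0    0   0.0000 66627"
fixed = replace_keep_width(row)
print(fixed)
```

Because the new value is padded to the old value's width, every later column stays at the same character offset.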


PMS5003 with ESP8266 - many checksum errors

I have an ESP8266 connected to PMS5003 particulate matter sensor through (hardware) UART.
I'm getting many checksum errors while reading from PMS5003.
Here's the library that I'm using to communicate with PMS5003:
PMS5003.cpp
#include "PMS5003.h"
void PMS5003::processDataOn(HardwareSerial &serial) {
unsigned long timeout = millis();
int count = 0;
byte incomeByte[NUM_INCOME_BYTE];
boolean startcount = false;
byte data;
int timeoutHops = 0;
while (1){
if (((millis() - timeout) > 1000) && (timeoutHops == 0)) {
timeoutHops = 1;
yield();
ESP.wdtFeed();
}
if (((millis() - timeout) > 2000) && (timeoutHops == 1)) {
timeoutHops = 2;
yield();
ESP.wdtFeed();
}
if ((millis() - timeout) > 3000){
Serial.println("SENSOR-ERROR-TIMEOUT");
break;
}
if (serial.available()){
data = serial.read();
if (data == CHAR_PRELIM && !startcount) {
startcount = true;
count++;
incomeByte[0] = data;
} else if (startcount) {
count++;
incomeByte[count - 1] = data;
if (count >= NUM_INCOME_BYTE){
break;
}
}
}
}
unsigned int calcsum = 0;
unsigned int exptsum;
for (int i = 0; i < NUM_DATA_BYTE; i++) {
calcsum += (unsigned int)incomeByte[i];
}
exptsum = ((unsigned int)incomeByte[CHECK_BYTE] << 8) + (unsigned int)incomeByte[CHECK_BYTE + 1];
if (calcsum == exptsum) {
pm1 = ((unsigned int)incomeByte[PM1_BYTE] << 8) + (unsigned int)incomeByte[PM1_BYTE + 1];
pm25 = ((unsigned int)incomeByte[PM25_BYTE] << 8) + (unsigned int)incomeByte[PM25_BYTE + 1];
pm10 = ((unsigned int)incomeByte[PM10_BYTE] << 8) + (unsigned int)incomeByte[PM10_BYTE + 1];
} else {
Serial.println("#[exception] PM2.5 Sensor CHECKSUM ERROR!");
pm1 = -1;
pm25 = -1;
pm10 = -1;
}
return;
}
int PMS5003::getPM1() {
return pm1;
}
int PMS5003::getPM25() {
return pm25;
}
int PMS5003::getPM10() {
return pm10;
}
PMS5003.h
#ifndef _PMS_5003_H
#define _PMS_5003_H
#include <Wire.h>
#include <Arduino.h>
#define VERSION 0.2
#define Sense_PM 6
#define NUM_INCOME_BYTE 32
#define CHAR_PRELIM 0x42
#define NUM_DATA_BYTE 29
#define CHECK_BYTE 30
#define PM1_BYTE 10
#define PM25_BYTE 12
#define PM10_BYTE 14
class PMS5003 {
public:
//void processData(int *PM1, int *PM25, int *PM10);
void processDataOn(HardwareSerial &serial);
int getPM1();
int getPM25();
int getPM10();
private:
int pm1;
int pm25;
int pm10;
};
#endif
Here's how I'm using it:
struct ParticulateMatterMeasurements {
private:
bool _areValid = false;
int PM01Value = 0;
int PM25Value = 0;
int PM10Value = 0;
public:
void setAreValid(bool _areValid) {
ParticulateMatterMeasurements::_areValid = _areValid;
}
bool getAreValid() const {
return _areValid;
}
int getPM01Value() const {
return PM01Value;
}
void setPM01Value(int PM01Value) {
ParticulateMatterMeasurements::PM01Value = PM01Value;
}
int getPM25Value() const {
return PM25Value;
}
void setPM25Value(int PM25Value) {
ParticulateMatterMeasurements::PM25Value = PM25Value;
}
int getPM10Value() const {
return PM10Value;
}
void setPM10Value(int PM10Value) {
ParticulateMatterMeasurements::PM10Value = PM10Value;
}
};
ParticulateMatterMeasurements getMeasurements() {
ParticulateMatterMeasurements measurements;
measurements.setAreValid(false);
pms5003.processDataOn(Serial);
measurements.setPM01Value(pms5003.getPM1());
measurements.setPM25Value(pms5003.getPM25());
measurements.setPM10Value(pms5003.getPM10());
if (measurements.getPM01Value() != -1 && measurements.getPM25Value() != -1 && measurements.getPM10Value() != -1) {
measurements.setAreValid(true);
}
return measurements;
}
The problem is that I get many checksum errors. During 60 measurements I get about 100 of: #[exception] PM2.5 Sensor CHECKSUM ERROR!.
What could be the problem here?
Edit:
I ran a test where I print what the PMS5003 sends to my ESP8266. It looks like the last byte of the checksum is sometimes not received. Instead, the frame usually ends in 66, and sometimes I see 66 77 in place of the last two bytes as well.
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 201 2 216 0 86 0 8 0 3 0 1 145 0 3 172
16
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 201 2 216 0 86 0 8 0 3 0 1 145 0 3 172
16
66 77 0 28 0 14 0 18 0 22 0 14 0 18 0 22 10 32 2 239 0 93 0 9 0 3 0 1 145 0 3 45
18
66 77 0 28 0 14 0 18 0 22 0 14 0 18 0 22 10 32 2 239 0 93 0 9 0 3 0 1 145 0 3 66
18
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 10 50 2 231 0 88 0 9 0 3 0 1 145 0 3 42
17
66 77 0 28 0 12 0 16 0 20 0 12 0 16 0 20 9 225 2 208 0 90 0 8 0 3 0 1 145 0 3 190
16
66 77 0 28 0 13 0 17 0 21 0 13 0 17 0 21 9 225 2 208 0 90 0 8 0 3 0 1 145 0 3 196
17
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 249 2 211 0 76 0 5 0 0 0 0 145 0 3 188
15
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 249 2 211 0 76 0 5 0 0 0 0 145 0 3 188
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 210 2 188 0 70 0 5 0 0 0 0 145 0 3 118
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 210 2 188 0 70 0 5 0 0 0 0 145 0 3 118
15
66 77 0 28 0 13 0 16 0 17 0 13 0 16 0 17 9 198 2 183 0 78 0 5 0 0 0 0 145 0 3 115
16
66 77 0 28 0 13 0 16 0 17 0 13 0 16 0 17 9 198 2 183 0 78 0 5 0 0 0 0 145 0 3 66
16
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 16 0 18 0 12 0 16 0 18 9 234 2 195 0 87 0 8 0 1 0 0 145 0 3 176
16
66 77 0 28 0 12 0 16 0 18 0 12 0 16 0 18 9 234 2 195 0 87 0 8 0 1 0 0 145 0 3 176
16
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 186 2 184 0 77 0 6 0 1 0 0 145 0 3 101
15
66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 186 2 184 0 77 0 6 0 1 0 0 145 0 3 107
16
66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 186 2 184 0 77 0 6 0 1 0 0 145 0 3 107
16
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 165 2 180 0 76 0 6 0 1 0 0 145 0 3 83
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 165 2 180 0 76 0 6 0 1 0 0 145 0 3 83
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 156 2 186 0 76 0 6 0 1 0 0 145 0 3 80
17
66 77 0 28 0 12 0 16 0 17 0 12 0 16 0 17 9 156 2 186 0 76 0 6 0 1 0 0 145 0 3 66
16
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 16 0 17 0 12 0 16 0 17 9 249 2 200 0 67 0 6 0 1 0 0 145 0 3 172
16
66 77 0 28 0 12 0 16 0 17 0 12 0 16 0 17 9 249 2 200 0 67 0 6 0 1 0 0 145 0 3 172
16
66 77 0 28 0 12 0 16 0 17 0 12 0 16 0 17 9 231 2 197 0 73 0 7 0 1 0 0 145 0 3 158
16
66 77 0 28 0 12 0 16 0 17 0 12 0 16 0 17 9 231 2 197 0 73 0 7 0 1 0 0 145 0 3 158
16
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 243 2 194 0 73 0 7 0 1 0 0 145 0 3 173
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 243 2 194 0 73 0 7 0 1 0 0 145 0 3 173
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 231 2 190 0 70 0 8 0 1 0 0 145 0 3 155
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 231 2 190 0 70 0 8 0 1 0 0 145 0 3 155
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 65 2 219 0 68 0 7 0 1 0 0 145 0 3 66
17
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 65 2 219 0 68 0 7 0 1 0 0 145 0 3 18
17
66 77 0 28 0 13 0 18 0 20 0 13 0 18 0 20 10 86 2 218 0 79 0 11 0 2 0 0 145 0 3 58
18
66 77 0 28 0 13 0 18 0 20 0 13 0 18 0 20 10 86 2 218 0 79 0 11 0 2 0 0 145 0 3 58
18
66 77 0 28 0 13 0 18 0 19 0 13 0 18 0 19 10 86 2 216 0 76 0 8 0 1 0 0 145 0 3 47
18
66 77 0 28 0 13 0 18 0 19 0 13 0 18 0 19 10 86 2 216 0 76 0 8 0 1 0 0 145 0 3 47
18
66 77 0 28 0 14 0 18 0 20 0 14 0 18 0 20 10 212 2 250 0 75 0 8 0 1 0 0 145 0 3 210
18
66 77 0 28 0 14 0 18 0 20 0 14 0 18 0 20 10 212 2 250 0 75 0 8 0 1 0 0 145 0 3 210
18
66 77 0 28 0 12 0 17 0 20 0 12 0 17 0 20 10 137 2 234 0 86 0 10 0 1 0 0 145 0 3 126
17
66 77 0 28 0 12 0 17 0 20 0 12 0 17 0 20 10 137 2 234 0 86 0 10 0 1 0 0 145 0 3 126
17
66 77 0 28 0 12 0 17 0 20 0 12 0 17 0 20 10 137 2 234 0 86 0 10 0 1 0 0 145 0 3 66
17
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 164 2 240 0 95 0 10 0 1 0 0 145 0 3 178
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 164 2 240 0 95 0 10 0 1 0 0 145 0 3 178
19
66 77 0 28 0 13 0 19 0 21 0 13 0 19 0 21 10 110 2 223 0 95 0 10 0 1 0 0 145 0 3 105
19
66 77 0 28 0 13 0 19 0 21 0 13 0 19 0 21 10 110 2 223 0 95 0 10 0 1 0 0 145 0 3 105
19
66 77 0 28 0 13 0 19 0 21 0 13 0 19 0 21 10 128 2 227 0 101 0 10 0 1 0 0 145 0 3 133
19
66 77 0 28 0 13 0 19 0 21 0 13 0 19 0 21 10 128 2 227 0 101 0 10 0 1 0 0 145 0 3 133
19
66 77 0 28 0 14 0 20 0 24 0 14 0 20 0 24 10 158 2 254 0 106 0 13 0 4 0 0 145 0 3 211
20
66 77 0 28 0 13 0 19 0 23 0 13 0 19 0 23 10 158 2 254 0 106 0 13 0 4 0 0 145 0 3 205
19
66 77 0 28 0 13 0 19 0 23 0 13 0 19 0 23 10 212 3 10 0 107 0 12 0 4 0 0 145 0 3 66
19
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 14 0 20 0 24 0 14 0 20 0 24 10 212 3 10 0 107 0 12 0 4 0 0 145 0 3 22
20
66 77 0 28 0 14 0 20 0 24 0 14 0 20 0 24 10 236 3 11 0 109 0 13 0 4 0 0 145 0 3 50
20
66 77 0 28 0 14 0 20 0 24 0 14 0 20 0 24 10 236 3 11 0 109 0 13 0 4 0 0 145 0 3 50
20
66 77 0 28 0 15 0 20 0 23 0 15 0 20 0 23 10 254 3 29 0 105 0 9 0 3 0 0 145 0 3 77
20
66 77 0 28 0 15 0 20 0 23 0 15 0 20 0 23 10 254 3 29 0 105 0 9 0 3 0 0 145 0 3 77
20
66 77 0 28 0 14 0 19 0 22 0 14 0 19 0 22 11 22 3 40 0 99 0 9 0 3 0 0 145 0 2 101
19
66 77 0 28 0 14 0 19 0 22 0 14 0 19 0 22 11 22 3 40 0 99 0 9 0 3 0 0 145 0 2 101
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 149 3 6 0 93 0 8 0 3 0 0 145 0 2 184
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 149 3 6 0 93 0 8 0 3 0 0 145 0 2 66
19
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 14 0 18 0 20 0 14 0 18 0 20 10 140 3 2 0 77 0 5 0 3 0 0 145 0 2 148
18
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 140 3 2 0 77 0 5 0 3 0 0 145 0 2 142
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 104 2 251 0 77 0 5 0 3 0 0 145 0 3 98
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 104 2 251 0 77 0 5 0 3 0 0 145 0 3 98
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 10 104 2 251 0 77 0 5 0 3 0 0 145 0 3 98
17
66 77 0 28 0 13 0 18 0 20 0 13 0 18 0 20 10 116 3 11 0 77 0 8 0 3 0 0 145 0 2 134
18
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 116 3 11 0 77 0 8 0 3 0 0 145 0 2 140
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 137 3 18 0 76 0 7 0 3 0 0 145 0 2 166
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 10 137 3 18 0 76 0 7 0 3 0 0 145 0 2 166
19
66 77 0 28 0 12 0 17 0 18 0 12 0 17 0 18 10 77 2 241 0 77 0 5 0 0 0 0 145 0 3 66
17
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 17 0 18 0 12 0 17 0 18 10 77 2 241 0 77 0 5 0 0 0 0 145 0 3 54
17
66 77 0 28 0 13 0 18 0 19 0 13 0 18 0 19 10 32 2 224 0 80 0 6 0 0 0 0 145 0 3 2
18
66 77 0 28 0 13 0 18 0 19 0 13 0 18 0 19 10 32 2 224 0 80 0 6 0 0 0 0 145 0 3 2
18
66 77 0 28 0 13 0 19 0 19 0 13 0 19 0 19 10 47 2 240 0 81 0 5 0 0 0 0 145 0 3 35
19
66 77 0 28 0 12 0 18 0 18 0 12 0 18 0 18 10 47 2 240 0 81 0 5 0 0 0 0 145 0 3 29
18
66 77 0 28 0 14 0 20 0 20 0 14 0 20 0 20 10 173 3 6 0 90 0 6 0 1 0 1 145 0 2 202
20
66 77 0 28 0 14 0 20 0 20 0 14 0 20 0 20 10 173 3 6 0 90 0 6 0 1 0 1 145 0 2 202
20
66 77 0 28 0 14 0 21 0 21 0 14 0 21 0 21 10 233 3 24 0 90 0 9 0 1 0 1 145 0 3 31
21
66 77 0 28 0 13 0 19 0 22 0 13 0 19 0 22 10 242 3 25 0 84 0 12 0 4 0 2 145 0 3 38
19
66 77 0 28 0 13 0 19 0 22 0 13 0 19 0 22 10 242 3 25 0 84 0 12 0 4 0 2 145 0 3 38
19
66 77 0 28 0 13 0 19 0 22 0 13 0 19 0 22 10 242 3 25 0 84 0 12 0 4 0 2 145 0 3 66
19
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 15 0 20 0 23 0 15 0 20 0 23 11 79 3 40 0 88 0 9 0 4 0 2 145 0 2 156
20
66 77 0 28 0 15 0 20 0 23 0 15 0 20 0 23 11 79 3 40 0 88 0 9 0 4 0 2 145 0 2 156
20
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 11 67 3 38 0 83 0 9 0 4 0 2 145 0 2 129
19
66 77 0 28 0 14 0 19 0 21 0 14 0 19 0 21 11 67 3 38 0 83 0 9 0 4 0 2 145 0 2 129
19
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 11 58 3 43 0 82 0 8 0 4 0 2 145 0 2 121
18
66 77 0 28 0 15 0 19 0 22 0 15 0 19 0 22 11 58 3 43 0 82 0 8 0 4 0 2 145 0 2 127
19
66 77 0 28 0 15 0 19 0 21 0 15 0 19 0 21 11 31 3 37 0 79 0 7 0 4 0 2 145 0 2 88
19
66 77 0 28 0 15 0 19 0 21 0 15 0 19 0 21 11 31 3 37 0 79 0 7 0 4 0 2 145 0 2 88
19
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 203 3 11 0 84 0 8 0 4 0 2 145 0 2 66
18
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 203 3 11 0 84 0 8 0 4 0 2 145 0 2 235
18
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 197 3 5 0 84 0 9 0 4 0 2 145 0 2 224
18
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 197 3 5 0 84 0 9 0 4 0 2 145 0 2 224
18
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 179 3 8 0 81 0 8 0 3 0 1 145 0 2 203
18
66 77 0 28 0 14 0 18 0 21 0 14 0 18 0 21 10 179 3 8 0 81 0 8 0 3 0 1 145 0 2 203
18
66 77 0 28 0 14 0 17 0 20 0 14 0 17 0 20 10 98 2 243 0 75 0 8 0 3 0 1 145 0 3 90
17
66 77 0 28 0 14 0 17 0 20 0 14 0 17 0 20 10 98 2 243 0 75 0 8 0 3 0 1 145 0 3 90
17
66 77 0 28 0 14 0 17 0 19 0 14 0 17 0 19 10 116 2 246 0 77 0 5 0 3 0 1 145 0 3 108
17
66 77 0 28 0 14 0 17 0 19 0 14 0 17 0 19 10 116 2 246 0 77 0 5 0 3 0 1 145 0 3 66
17
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 14 0 17 0 19 0 14 0 17 0 19 10 74 2 232 0 80 0 3 0 1 0 1 145 0 3 51
17
66 77 0 28 0 14 0 17 0 19 0 14 0 17 0 19 10 74 2 232 0 80 0 3 0 1 0 1 145 0 3 51
17
66 77 0 28 0 14 0 17 0 19 0 14 0 17 0 19 10 74 2 232 0 80 0 3 0 1 0 1 145 0 3 51
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 246 2 220 0 79 0 3 0 1 0 1 145 0 3 205
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 246 2 220 0 79 0 3 0 1 0 1 145 0 3 205
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 9 255 2 222 0 82 0 6 0 1 0 1 145 0 3 224
17
66 77 0 28 0 13 0 17 0 19 0 13 0 17 0 19 9 255 2 222 0 82 0 6 0 1 0 1 145 0 3 224
17
66 77 0 28 0 14 0 18 0 20 0 14 0 18 0 20 10 89 2 242 0 85 0 6 0 1 0 1 145 0 3 88
18
66 77 0 28 0 14 0 18 0 20 0 14 0 18 0 20 10 89 2 242 0 85 0 6 0 1 0 1 145 0 3 66
18
#[exception] PM2.5 Sensor CHECKSUM ERROR!
After some time I get more errors:
66 77 0 28 0 13 0 17 0 66 77 0 28 0 13 0 17 0 17 0 13 0 17 0 17 9 129 2 177 0 65 0
7168
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 17 0 17 0 66 77 0 28 0 13 0 17 0 17 0 13 0 17 0 17 9 129 2 177 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 17 0 17 0 66 77 0 28 0 12 0 16 0 16 0 12 0 16 0 16 9 186 2 186 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 16 0 16 0 12 0 16 0 16 9 186 2 186 0 67 0 6 0 0 0 0 145 0 3 92
16
66 77 0 28 0 12 0 16 0 16 0 12 0 16 0 16 9 174 2 193 0 62 0 5 0 0 0 0 145 0 3 81
16
66 77 0 28 0 13 0 17 0 17 0 13 0 17 0 17 9 174 2 193 0 62 0 5 0 0 0 0 145 0 3 87
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 180 2 197 0 62 0 6 0 1 0 1 145 0 3 102
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 180 2 197 0 62 0 6 0 1 0 1 145 0 3 102
17
66 77 0 28 0 13 0 17 0 18 0 13 0 17 0 18 9 165 2 190 0 62 0 6 0 1 0 1 145 0 3 80
17
66 77 0 28 0 13 0 17 0 66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 135 2 182 0 65 0
7168
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 16 0 18 0 66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 135 2 182 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 117 2 180 0 64 0 3 0 2 0 1 145 0 3 20
16
66 77 0 28 0 13 0 16 0 18 0 66 77 0 28 0 13 0 16 0 18 0 13 0 16 0 18 9 117 2 180 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 153 2 196 0 64 0 3 0 2 0 1 145 0 3 66
15
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 153 2 196 0 64 0 3 0 2 0 1 145 0 3 66
15
66 77 0 28 0 12 0 16 0 19 0 12 0 16 0 19 9 162 2 206 0 71 0 6 0 5 0 2 145 0 3 105
16
66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 9 162 2 206 0 71 0 6 0 5 0 2 145 0 3 111
17
66 77 0 28 0 13 0 16 0 19 0 13 0 16 0 19 9 123 2 189 0 74 0 6 0 5 0 2 145 0 3 54
16
66 77 0 28 0 13 0 16 0 66 77 0 28 0 12 0 16 0 20 0 12 0 16 0 20 9 54 2 166 0 76 0
7168
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 12 0 16 0 20 0 12 0 16 0 20 9 54 2 166 0 76 0 7 0 6 0 2 145 0 2 222
16
66 77 0 28 0 11 0 15 0 19 0 66 77 0 28 0 11 0 15 0 19 0 11 0 15 0 19 9 21 2 166 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 11 0 15 0 19 0 66 77 0 28 0 12 0 16 0 20 0 12 0 16 0 20 9 21 2 166 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 120 2 189 0 83 0 9 0 6 0 1 145 0 3 65
16
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 120 2 189 0 83 0 9 0 6 0 1 145 0 3 65
16
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 186 2 198 0 74 0 9 0 5 0 1 145 0 3 130
16
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 186 2 198 0 74 0 9 0 5 0 1 145 0 3 130
16
66 77 0 28 0 13 0 15 0 19 0 13 0 15 0 19 9 171 2 184 0 74 0 9 0 5 0 1 145 0 3 97
15
66 77 0 28 0 13 0 15 0 66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 198 2 190 0 77 0
7168
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 16 0 20 0 13 0 16 0 20 9 198 2 190 0 77 0 8 0 5 0 1 145 0 3 136
16
66 77 0 28 0 13 0 16 0 19 0 66 77 0 28 0 13 0 16 0 19 0 13 0 16 0 19 9 225 2 190 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 10 23 2 214 0 81 0 9 0 3 0 0 145 0 2 246
17
66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 10 23 2 214 0 81 0 9 0 3 0 0 145 0 2 246
17
66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 10 23 2 214 0 81 0 9 0 3 0 0 145 0 2 246
17
66 77 0 28 0 13 0 17 0 20 0 66 77 0 28 0 13 0 17 0 20 0 13 0 17 0 20 10 98 2 234 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
The lone number on the line after each 32-byte frame is the PM2.5 value, which shouldn't get high. When the checksum is incorrect, however, it is sometimes very high and sometimes not.
I wonder why the situation changes over time... Maybe I could keep resetting the UART somehow?
For each of the packets that fail the checksum test, you find a CHAR_PRELIM (66) either in the middle or at the end. This means bytes are occasionally being dropped, so a new packet starts before the previous one has been fully read and the reader falls out of alignment.
One solution is to restart packet reading each time a 66 is read. This code should do it:
UPDATE: as per @sawdust's comment, the presence of both 66 and 77 should be used as the start condition, because 66 may appear by itself in the data. The other consideration is to use the packet length provided by the 3rd and 4th bytes instead of assuming the length to be 32. Hopefully these improvements make the code more robust.
size_t length;
incomingByte[0] = 66; // the first two bytes are always known
incomingByte[1] = 77;
...
if (serial.available()) {
    // note: read() returns -1 when nothing is buffered, so a robust
    // version would wait for each byte before reading it
    if (serial.read() == 66 && serial.read() == 77) {
        incomingByte[2] = serial.read(); // length high byte
        incomingByte[3] = serial.read(); // length low byte
        length = ((size_t)incomingByte[2] << 8) + incomingByte[3];
        // starting at index 4, read `length` bytes
        // (make sure length + 4 does not exceed the buffer size)
        serial.readBytes(incomingByte + 4, length);
        break;
    }
}
// when the code breaks out of the while(1) loop, you still need to evaluate the checksum.
According to the protocol defined by this source, the packet length is fixed at 32 bytes, so the encoded frame length (bytes 3 and 4) should always equal 0 28 (32 bytes - 2 start bytes - 2 frame length bytes = 28).
However, this code should work even for variable length packets (thanks @sawdust).
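To make the frame layout concrete, here is a minimal Python sketch that validates one 32-byte frame: start bytes 66 77, a length field of 28, and a 16-bit checksum in the last two bytes. It assumes the checksum is the sum of the first 30 bytes, which agrees with the frames logged in the question, and it takes the PM offsets (10, 12, 14) from the question's PMS5003.h. The name `parse_frame` is hypothetical.

```python
START = (66, 77)   # 0x42 0x4D
FRAME_LEN = 32

def parse_frame(frame):
    """Return (pm1, pm25, pm10), or None if the frame fails validation."""
    if len(frame) != FRAME_LEN or tuple(frame[:2]) != START:
        return None
    declared = (frame[2] << 8) | frame[3]      # bytes that follow the length field
    if declared != FRAME_LEN - 4:
        return None
    expected = (frame[30] << 8) | frame[31]    # checksum sent by the sensor
    if sum(frame[:30]) != expected:            # assumed: sum over the first 30 bytes
        return None
    return tuple((frame[i] << 8) | frame[i + 1] for i in (10, 12, 14))

# First frame from the log in the question.
good = [66, 77, 0, 28, 0, 13, 0, 16, 0, 20, 0, 13, 0, 16, 0, 20,
        9, 201, 2, 216, 0, 86, 0, 8, 0, 3, 0, 1, 145, 0, 3, 172]
# Same frame with the last byte replaced by the next frame's 66.
bad = good[:31] + [66]

print(parse_frame(good))
print(parse_frame(bad))
```

The second call mimics the failing frames in the log, where the final checksum byte is missing and the start byte of the next frame takes its place.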
Fair warning: I do not have one of these sensors, so obviously I didn't test this, but the concept remains.
I recognize that this code won't solve the issue of characters being dropped, since it just ignores incomplete packets and you still rely on the validity of the checksum.
Finally, I find it interesting that the checksum is failing because the checksum bytes are not even being received in those cases!
Hope this helps!
UPDATE #2: This is more or less a revised answer in and of itself.
Using this code to read packets, the following criteria (which are defined by the protocol) are guaranteed:
The packet begins with [66 77]
The packet contains 32 bytes
The start condition [66 77] will never occur in the body of the packet.
Here's the code. I managed to reduce it down to a few if statements:
void PMS5003::processDataOn(HardwareSerial &serial) {
    bool possibleStart = false;
    incomeByte[0] = 66;
    incomeByte[1] = 77;
    uint8_t count = 0;
    ...
    while (1) {
        ...
        if (serial.available()) {
            uint8_t c = serial.read();
            if (possibleStart && c == 77) {
                // [66 77] seen: (re)start the packet; bytes 0 and 1 are
                // already filled in, so the 77 itself must not be stored again
                possibleStart = false;
                count = 2;
                continue;
            }
            possibleStart = (c == 66);
            if (count >= 2) incomeByte[count++] = c;
            // stop after the full 32-byte packet, not after NUM_DATA_BYTE (29)
            if (count == NUM_INCOME_BYTE) break;
        }
    }
    // at this point, incomeByte must:
    // > begin with [66 77]
    // > contain 32 bytes
    // > not contain [66 77] anywhere after the first two bytes
    // > therefore, it is guaranteed to contain a checksum
    // now is the right time to evaluate the checksum.
    // I expect all of the checksums to match, but you might as well check
}
At the time of posting, the OP has already coded a solution which fulfills the requirements. I am posting this because I believe this code improves upon the OP's by being more concise, more readable/declarative, and hopefully more easily manageable.
This code can also serve as a general solution for any case in which two characters define a start condition, provided the packet length is known or can be determined.
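The same two-byte start condition can be tried out off-device. Below is a minimal Python sketch of the resynchronising reader, fed with a truncated packet followed by a complete one, as seen in the question's log. The generator name `read_frames` is hypothetical, and the sketch still leaves checksum validation to the caller.

```python
def read_frames(stream, frame_len=32):
    """Yield frames that start with 66 77, restarting whenever 66 77 reappears."""
    frame = []
    prev_was_66 = False
    for c in stream:
        if prev_was_66 and c == 77:
            frame = [66, 77]          # (re)start and drop any partial frame
            prev_was_66 = False
            continue
        prev_was_66 = (c == 66)
        if frame:                     # only collect once a start has been seen
            frame.append(c)
            if len(frame) == frame_len:
                yield frame
                frame = []

# A complete frame from the question's log.
good = [66, 77, 0, 28, 0, 13, 0, 16, 0, 20, 0, 13, 0, 16, 0, 20,
        9, 201, 2, 216, 0, 86, 0, 8, 0, 3, 0, 1, 145, 0, 3, 172]
# A packet cut off after 9 bytes, immediately followed by the complete one.
stream = [66, 77, 0, 28, 0, 13, 0, 17, 0] + good

frames = list(read_frames(stream))
print(frames)
```

The truncated packet is discarded as soon as the next 66 77 arrives, so only the complete frame is yielded.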
While the question of why the data arrives corrupted still remains, here is a workaround I managed to get working:
I check the checksum. If it's incorrect, then:
I look for 66 77 anywhere in the data. When I find it:
I check whether there's another 66 77 in the next 16 bytes. If there isn't:
I presume the values at offsets 10-15 from that 66 77 are the ones I'm looking for (PM1, PM2.5, PM10).
Here's the code:
void PMS5003::processDataOn(HardwareSerial &serial) {
unsigned long timeout = millis();
int count = 0;
byte incomeByte[NUM_INCOME_BYTE];
boolean startcount = false;
byte data;
int timeoutHops = 0;
while (1){
if (((millis() - timeout) > 1000) && (timeoutHops == 0)) {
timeoutHops = 1;
yield();
ESP.wdtFeed();
}
if (((millis() - timeout) > 2000) && (timeoutHops == 1)) {
timeoutHops = 2;
yield();
ESP.wdtFeed();
}
if ((millis() - timeout) > 3000) {
yield();
ESP.wdtFeed();
Serial.println("SENSOR-ERROR-TIMEOUT");
break;
}
if (serial.available()) {
data = serial.read();
if (data == CHAR_PRELIM && !startcount) {
startcount = true;
count++;
incomeByte[0] = data;
} else if (startcount) {
count++;
incomeByte[count - 1] = data;
if (count >= NUM_INCOME_BYTE){
break;
}
}
}
}
unsigned int calcsum = 0;
unsigned int exptsum;
for (int a = 0; a < NUM_INCOME_BYTE; a++) {
Serial.print((unsigned int)incomeByte[a]);
Serial.print(" ");
}
Serial.println();
Serial.println(((unsigned int)incomeByte[PM25_BYTE] << 8) + (unsigned int)incomeByte[PM25_BYTE + 1]);
for (int i = 0; i < NUM_DATA_BYTE; i++) {
calcsum += (unsigned int)incomeByte[i];
}
exptsum = ((unsigned int)incomeByte[CHECK_BYTE] << 8) + (unsigned int)incomeByte[CHECK_BYTE + 1];
if (calcsum == exptsum) {
pm1 = ((unsigned int)incomeByte[PM1_BYTE] << 8) + (unsigned int)incomeByte[PM1_BYTE + 1];
pm25 = ((unsigned int)incomeByte[PM25_BYTE] << 8) + (unsigned int)incomeByte[PM25_BYTE + 1];
pm10 = ((unsigned int)incomeByte[PM10_BYTE] << 8) + (unsigned int)incomeByte[PM10_BYTE + 1];
} else {
Serial.println("#[exception] PM2.5 Sensor CHECKSUM ERROR!");
pm1 = -1;
pm25 = -1;
pm10 = -1;
for (int a = 0; a + 1 < NUM_INCOME_BYTE; a++) { // incomeByte[a+1] is read below, so stop one byte early
bool valid = true;
if (((unsigned int)incomeByte[a] == 66) && ((unsigned int)incomeByte[a+1] == 77)) {
if (a+16 < NUM_INCOME_BYTE) {
for (int b = a+1; b < a+15; b++) {
if (((unsigned int)incomeByte[b] == 66) && ((unsigned int)incomeByte[b+1] == 77)) {
valid = false;
break;
}
}
if (valid) {
pm1 = ((unsigned int)incomeByte[a+10] << 8) + (unsigned int)incomeByte[a+11];
pm25 = ((unsigned int)incomeByte[a+12] << 8) + (unsigned int)incomeByte[a+13];
pm10 = ((unsigned int)incomeByte[a+14] << 8) + (unsigned int)incomeByte[a+15];
Serial.println("valid: ");
Serial.print(pm1);
Serial.print(" ");
Serial.print(pm25);
Serial.print(" ");
Serial.print(pm10);
Serial.println();
break;
}
}
}
}
}
return;
}
Theoretically, it may produce false positives or negatives, but in practice it just works.
66 77 0 28 0 12 0 15 0 17 0 12 0 15 0 17 9 102 2 176 66 77 0 28 0 12 0 15 0 16 0 12
15
#[exception] PM2.5 Sensor CHECKSUM ERROR!
valid:
12 15 17
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 114 2 175 0 73 0 4 0 1 0 0 145 0 3 12
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 114 2 175 0 73 0 4 0 1 0 0 145 0 3 12
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 141 2 190 0 72 0 3 0 1 0 0 145 0 3 52
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 141 2 190 0 72 0 3 0 1 0 0 145 0 3 52
15
66 77 0 28 0 12 0 16 0 16 0 12 0 16 0 16 9 198 2 202 0 75 0 3 0 0 0 0 145 0 3 125
16
66 77 0 28 0 12 0 16 0 16 0 66 77 0 28 0 12 0 16 0 16 0 12 0 16 0 16 9 198 2 202 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
valid:
12 16 16
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 174 2 199 0 71 0 3 0 0 0 0 145 0 3 92
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 174 2 199 0 71 0 3 0 0 0 0 145 0 3 92
15
66 77 0 28 0 12 0 15 0 16 0 12 0 15 0 16 9 174 2 199 66 77 0 28 0 13 0 16 0 16 0 13
15
#[exception] PM2.5 Sensor CHECKSUM ERROR!
valid:
12 15 16
66 77 0 28 0 13 0 16 0 16 0 13 0 16 0 16 9 213 2 205 0 72 0 3 0 0 0 0 145 0 3 142
16
66 77 0 28 0 13 0 16 0 16 0 13 0 16 0 16 9 213 2 205 0 72 0 3 0 0 0 0 145 0 3 142
16
66 77 0 28 0 13 0 16 0 17 0 13 0 16 0 17 9 207 2 208 0 83 0 6 0 1 0 0 145 0 3 156
16
66 77 0 28 0 13 0 16 0 17 0 13 0 16 0 17 9 207 2 208 0 83 0 6 0 1 0 0 145 0 3 156
16
66 77 0 28 0 13 0 17 0 17 0 13 0 17 0 17 9 159 2 202 0 87 0 5 0 1 0 0 145 0 3 107
17
66 77 0 28 0 13 0 17 0 17 0 66 77 0 28 0 13 0 17 0 17 0 13 0 17 0 17 9 159 2 202 0
19712
#[exception] PM2.5 Sensor CHECKSUM ERROR!
valid:
13 17 17
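The salvage rule described above can be checked against the logged buffers with a short Python sketch: find a 66 77 header that is not followed by another 66 77 within the next bytes, then read PM1, PM2.5 and PM10 at offsets 10 to 15 from it. The function name `salvage` is hypothetical, and the bounds mirror the workaround code.

```python
def salvage(buf):
    """Return (pm1, pm25, pm10) from the first clean 66 77 header, else None."""
    n = len(buf)
    for a in range(n - 16):  # same bound as `a + 16 < NUM_INCOME_BYTE`
        if buf[a] == 66 and buf[a + 1] == 77:
            clean = not any(
                buf[b] == 66 and buf[b + 1] == 77 for b in range(a + 1, a + 15)
            )
            if clean:
                return tuple((buf[a + i] << 8) | buf[a + i + 1] for i in (10, 12, 14))
    return None

# Two misaligned buffers taken from the sample output above.
late_restart = [66, 77, 0, 28, 0, 12, 0, 15, 0, 17, 0, 12, 0, 15, 0, 17,
                9, 102, 2, 176, 66, 77, 0, 28, 0, 12, 0, 15, 0, 16, 0, 12]
early_restart = [66, 77, 0, 28, 0, 12, 0, 16, 0, 16, 0, 66, 77, 0, 28, 0,
                 12, 0, 16, 0, 16, 0, 12, 0, 16, 0, 16, 9, 198, 2, 202, 0]

print(salvage(late_restart))
print(salvage(early_restart))
```

Both results agree with the "valid:" lines printed by the device for those buffers. The values are still taken from a frame whose checksum could not be verified, so they are a best guess.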

Extracting a table from a text file using PowerShell

I have a table that I want to extract from a batch of text files. The problem is that the table does not begin at the same line in every text file. Also, the presentation, format, and reuse of keywords make it really difficult to write a regex (for me at least). I've figured out how to extract information from specific lines, but this table is just a no-go for me. I've researched regexes and splits but have come up empty.
The top of the file looks like this:
Summary Call Volume Statistics:
Total Calls = 1000
Total Hours = 486.7
Average Call Frequency = 2.05
Summary Reliability Statistics:
Total Queued Calls = 152
Total Calls = 1000
Total On Time Calls = 710
Total Reliability = 0.7100
Total Raw Demand = 640.00
Total Covered Demand = 437.79
Summary Business Statistics:
Total Servers = 4
Total Sim Time (secs) = 1752079
Total Server Time (secs) = 7008316
Total Server Busy Time (secs) = 0
Total Business = 0.0000
Detail Node Sim Reliability:
Node Calls On Time Percent Queued UnderTm OverTm
-------- -------- -------- -------- -------- -------- --------
0 97 81 0.8351 17 1637404 0
1 115 92 0.8000 25 1637404 0
2 103 90 0.8738 16 1637404 0
3 68 53 0.7794 17 1637404 0
4 63 57 0.9048 6 1637404 0
5 35 29 0.8286 7 1637404 0
6 31 27 0.8710 4 1637404 0
7 40 36 0.9000 6 1637404 0
8 22 17 0.7727 5 1637404 0
9 26 24 0.9231 1 1637404 0
10 24 21 0.8750 3 1637404 0
11 23 0 0.0000 5 1637404 0
12 23 20 0.8696 2 1637404 0
13 15 0 0.0000 2 1637404 0
14 20 19 0.9500 1 1637404 0
15 19 0 0.0000 1 1637404 0
16 23 18 0.7826 4 1637404 0
17 12 9 0.7500 4 1637404 0
18 10 10 1.0000 0 1637404 0
19 11 0 0.0000 1 1637404 0
20 13 0 0.0000 2 1637404 0
21 9 7 0.7778 1 1637404 0
22 11 9 0.8182 1 1637404 0
23 11 0 0.0000 2 1637404 0
24 14 6 0.4286 3 1637404 0
25 6 6 1.0000 0 1637404 0
26 6 0 0.0000 0 1637404 0
27 4 0 0.0000 1 1637404 0
28 5 5 1.0000 0 1637404 0
29 12 10 0.8333 1 1637404 0
30 12 11 0.9167 1 1637404 0
31 4 2 0.5000 2 1637404 0
32 8 8 1.0000 0 1637404 0
33 4 4 1.0000 0 1637404 0
34 6 0 0.0000 0 1637404 0
35 11 10 0.9091 1 1637404 0
36 7 0 0.0000 1 1637404 0
37 5 0 0.0000 2 1637404 0
38 5 0 0.0000 0 1637404 0
39 8 0 0.0000 2 1637404 0
40 6 6 1.0000 0 1637404 0
41 9 7 0.7778 2 1637404 0
42 4 1 0.2500 1 1637404 0
43 8 5 0.6250 1 1637404 0
44 1 1 1.0000 0 1637404 0
45 2 0 0.0000 0 1637404 0
46 5 4 0.8000 0 1637404 0
47 6 5 0.8333 0 1637404 0
48 3 0 0.0000 0 1637404 0
49 3 0 0.0000 0 1637404 0
50 2 0 0.0000 0 1637404 0
51 3 0 0.0000 1 1637404 0
52 2 0 0.0000 0 1637404 0
53 3 0 0.0000 0 1637404 0
54 2 0 0.0000 0 1637404 0
-------- -------- -------- -------- -------- -------- --------
Total: 1000 710 0.7100 152 1637404 0
Later in the file there is this table:
Comparable Node Alpha Reliability:
Node Raw Dem Sim Rely Wtd Cov
-------- -------- -------- --------
0 71.0000 0.8351 59.2887
1 62.0000 0.8000 49.6000
2 56.0000 0.8738 48.9320
3 39.0000 0.7794 30.3971
4 35.0000 0.9048 31.6667
5 21.0000 0.8286 17.4000
6 20.0000 0.8710 17.4194
7 19.0000 0.9000 17.1000
8 17.0000 0.7727 13.1364
9 17.0000 0.9231 15.6923
10 16.0000 0.8750 14.0000
11 15.0000 0.0000 0.0000
12 14.0000 0.8696 12.1739
13 12.0000 0.0000 0.0000
14 12.0000 0.9500 11.4000
15 11.0000 0.0000 0.0000
16 10.0000 0.7826 7.8261
17 10.0000 0.7500 7.5000
18 9.0000 1.0000 9.0000
19 9.0000 0.0000 0.0000
20 9.0000 0.0000 0.0000
21 8.0000 0.7778 6.2222
22 8.0000 0.8182 6.5455
23 8.0000 0.0000 0.0000
24 8.0000 0.4286 3.4286
25 7.0000 1.0000 7.0000
26 6.0000 0.0000 0.0000
27 6.0000 0.0000 0.0000
28 6.0000 1.0000 6.0000
29 6.0000 0.8333 5.0000
30 6.0000 0.9167 5.5000
31 5.0000 0.5000 2.5000
32 5.0000 1.0000 5.0000
33 5.0000 1.0000 5.0000
34 5.0000 0.0000 0.0000
35 5.0000 0.9091 4.5455
36 5.0000 0.0000 0.0000
37 4.0000 0.0000 0.0000
38 4.0000 0.0000 0.0000
39 4.0000 0.0000 0.0000
40 4.0000 1.0000 4.0000
41 4.0000 0.7778 3.1111
42 4.0000 0.2500 1.0000
43 4.0000 0.6250 2.5000
44 3.0000 1.0000 3.0000
45 3.0000 0.0000 0.0000
46 3.0000 0.8000 2.4000
47 3.0000 0.8333 2.5000
48 3.0000 0.0000 0.0000
49 3.0000 0.0000 0.0000
50 3.0000 0.0000 0.0000
51 2.0000 0.0000 0.0000
52 2.0000 0.0000 0.0000
53 2.0000 0.0000 0.0000
54 2.0000 0.0000 0.0000
-------- -------- -------- --------
Total: 437.7852
I need to store the two middle columns as arrays so that I can do some calculations with them.
How do I go about doing this in PowerShell? I already have the following code, which works (with generic name changes):
foreach ($file in $files) {
$fullName = [IO.Path]::GetFileNameWithoutExtension($file)
$CR = $fullName.Split("CRAPTFV")[-2]
$CT = $fullName.Split("CRAPTFV")[-3]
$P = $fullName.Split("CRAPTFV")[-4]
$A = $fullName.Split("CRAPTFV")[-5]
$S = $fullName.Split("CRAPTFV")[-6]
$CV = $fullName.Split("CRAPTFV")[-7]
$DEM = Select-String -Path $file -Pattern("Total Covered Demand = (\d*.?\d*)")
$REL = Select-String -Path $file -Pattern("\d+\t+\s+(\d+\.{1}\d+)\t+\s+(\d\.{1}\d+)\t+\s+(\d+.{1}\d+)") -AllMatches
Write-Output "$CT,$CR,$CV,$S,$A,$P,$DEM.Matches.groups[1]" | Out-File "fileadress" -Append
}
The goal is to use the table from each file to calculate some measurement and then append it to an output file. I seem to have pulled the values out with $REL, and I can see all of them with this code:
$REL = Select-String -Path $file -Pattern("\d+\t+\s+(\d+\.{1}\d+)\t+\s+(\d\.{1}\d+)\t+\s+(\d+.{1}\d+)") -AllMatches
Write-Host $REL.Matches
But when I type the following I can only see the first value for each file. This
Write-Host $REL.Matches.Groups[1]
produces this:
71.0000
71.0000
71.0000
71.0000
71.0000
71.0000
for all files.
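If I assume that each run of 4 spaces here stands for a tab, this is how to use $REL:
$REL.Matches[0].Groups[2].Value gives 0.8351
$REL.Matches[1].Groups[3].Value gives 49.6000
In general, $REL.Matches[X].Groups[Y].Value gives, for one file, the cell in column Y of table row X. X starts from 0. Groups[0] is the whole matched row, so the three captured columns are Groups[1] to Groups[3]. $REL.Matches.Groups[1] instead indexes into the groups of all matches taken together, which is why it only ever shows the first value, 71.0000.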
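The row-and-group indexing described above can also be sketched outside PowerShell. The following is a minimal Python illustration, not the asker's script: the SAMPLE text, the ROW pattern and the middle_columns helper are all hypothetical names, and the sketch assumes each table row is an integer node id followed by three decimal numbers separated by spaces or tabs.

```python
import re

# Hypothetical excerpt of one report file (assumption: columns are
# separated by spaces and/or tabs, as in the table shown in the question).
SAMPLE = """\
Comparable Node Alpha Reliability:
Node Raw Dem Sim Rely Wtd Cov
-------- -------- -------- --------
0 71.0000 0.8351 59.2887
1 62.0000 0.8000 49.6000
2 56.0000 0.8738 48.9320
-------- -------- -------- --------
Total: 437.7852
"""

# Group 1 = node id, group 2 = Raw Dem, group 3 = Sim Rely, group 4 = Wtd Cov.
ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+(\d+\.\d+)[ \t]+(\d+\.\d+)[ \t]+(\d+\.\d+)[ \t]*$",
    re.MULTILINE,
)

def middle_columns(text):
    """Return the two middle columns (Raw Dem, Sim Rely) as lists of floats."""
    raw_dem, sim_rely = [], []
    for match in ROW.finditer(text):      # one match per table row
        raw_dem.append(float(match.group(2)))
        sim_rely.append(float(match.group(3)))
    return raw_dem, sim_rely

raw_dem, sim_rely = middle_columns(SAMPLE)
print(raw_dem)
print(sim_rely)
```

Because the pattern requires exactly four fields with three decimals, rows of the earlier seven-column table and the Total line are skipped.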

How to loop rows and columns in pandas while replacing values with a constant increment

I am trying to replace values in a DataFrame with 0. In the first column I need to replace the first 3 values, in the next column the first 6 values, and so on, increasing by 3 each time.
a=np.array([133,124,156,189,132,176,189,192,100,120,130,140,150,50,70,133,124,156,189,132])
b = pd.DataFrame(a.reshape(10,2), columns= ['s','t'])
for columns in b:
    yy = 3
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
the outcome is the following
s t
0 0 0
1 0 0
2 0 0
3 189 189
4 132 132
5 176 176
6 189 189
7 192 192
8 100 100
9 120 120
I am clearly missing something really simple, to make the loop replace 6 values instead of only 3 in column t, any ideas?
I would do it this way (.loc slices by label and includes the end point, hence the -1):
i = 1
for c in b.columns:
    b.loc[0 : 3*i-1, c] = 0
    i += 1
Demo:
In [86]: b = pd.DataFrame(np.random.randint(0, 100, size=(20, 4)), columns=list('abcd'))
In [87]: %paste
i = 1
for c in b.columns:
    b.loc[0 : 3*i-1, c] = 0
    i += 1
## -- End pasted text --
In [88]: b
Out[88]:
a b c d
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 10 0 0 0
4 8 0 0 0
5 49 0 0 0
6 55 48 0 0
7 99 43 0 0
8 63 29 0 0
9 61 65 74 0
10 15 29 41 0
11 79 88 3 0
12 91 74 11 4
13 56 71 6 79
14 15 65 46 81
15 81 42 60 24
16 71 57 95 18
17 53 4 80 15
18 42 55 84 11
19 26 80 67 59
You need to initialize yy = 3 before the loop; otherwise it is reset to 3 for every column:
yy = 3
for columns in b:
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
Python 3 solution:
yy = 3
for columns in b:
    for i in range(yy):
        b[columns][i] = 0
    yy += 3
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
Another solution:
yy = 3
for i, col in enumerate(b.columns):
    b.loc[:i*yy+yy-1, col] = 0
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
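The counter logic in these answers can be checked without pandas. Below is a dependency-free sketch that uses a plain dict of lists as a stand-in for the DataFrame; zero_leading and its step parameter are hypothetical names, and the column data is what a.reshape(10,2) from the question produces. With a real DataFrame, positional slicing such as b.iloc[:n, j] = 0 would play the role of the list slice.

```python
def zero_leading(columns, step=3):
    """Zero the first `step` values of the first column, 2*step of the next, and so on."""
    count = step                       # initialised once, before the loop
    for name in columns:               # dicts keep insertion order (Python 3.7+)
        values = columns[name]
        n = min(count, len(values))    # never run past the end of a column
        values[:n] = [0] * n
        count += step                  # grows per column instead of being reset
    return columns

data = {
    "s": [133, 156, 132, 189, 100, 130, 150, 70, 124, 189],
    "t": [124, 189, 176, 192, 120, 140, 50, 133, 156, 132],
}
zero_leading(data)
print(data["s"])
print(data["t"])
```

Moving the `count = step` line inside the loop reproduces the original problem: every column then gets only its first 3 values zeroed.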

awk if-greater-than and replacement under condition

I have following data
......
6 4 4 17 154 93 309 0 11930
7 3 2 233 311 0 11936 11932 111874
8 3 1 15 0 11938 11943 211004 11449
9 3 2 55 102 0 11932 11941 111883
10 3 2 197 231 0 11925 11921 111849
11 3 2 160 777 0 11934 11928 111875
......
I want to replace every value greater than 5000 with 0, in columns 4 through 9. How can I do this with awk?
To print with lots of spaces like the input, something like this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0; for(i=1;i<=NF;i++)printf "%7d",$i;printf"\n"}' file
Output
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0
For scrunched up together (TM) output, you can use this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0}1' file
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0
An alternative approach (requires gawk 4+):
{
    # a[] holds the numbers, s[] the separators around them (s[0] precedes a[1])
    n = patsplit($0, a, "[0-9]+", s)
    printf "%s", s[0]
    for (i = 1; i <= n; i++) {
        if (i >= 4 && a[i] > 5000) {
            l = length(a[i])    # keep the width of the number being replaced
            a[i] = 0
        }
        else l = 0
        printf "%" l "s%s", a[i], s[i]
    }
    printf "\n"
}
It is more flexible when the spacing varies, unlike in the example data, because the original separators are printed back unchanged. It might also be faster than the accepted answer when the number of fields is much larger than 9.
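The width-preserving idea (replace the number, pad the 0 to the old width, leave the separators alone) can be sketched in Python as well. This is an illustration under assumptions, not a translation of the gawk script: zero_large_fields, first_col and limit are hypothetical names, and fields are assumed to be unsigned integers.

```python
import re

def zero_large_fields(line, first_col=4, limit=5000):
    """Replace integer fields above `limit` with 0, from `first_col` onward,
    right-aligning the 0 in the width of the number it replaces."""
    seen = 0

    def replace(match):
        nonlocal seen
        seen += 1                              # 1-based field counter
        token = match.group(0)
        if seen >= first_col and int(token) > limit:
            return "0".rjust(len(token))       # same width as the original number
        return token

    # Only the digit runs are rewritten; all whitespace is left untouched.
    return re.sub(r"\d+", replace, line)

line = "7 3 2 233 311 0 11936 11932 111874"
print(zero_large_fields(line))
```

Since only the matched digit runs are substituted, every line keeps its original length and column alignment, whatever the spacing is.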

Find specific columns and replace the following column with specific value with gawk

I am trying to find every place where my data has a repeated line and delete the repeat. I also want to find where the 2nd column has the value 90 and replace the 2nd column of the following line with a specific number that I designate.
My data looks like this:
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
7 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
I want my data to look like:
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 5 0 0 0.0000 70221
My code:
BEGIN {
    priorline = "";
    ERROROFFSET = 50;
    ERRORVALUE[10] = 1;
    ERRORVALUE[11] = 2;
    ERRORVALUE[12] = 3;
    ERRORVALUE[30] = 4;
    ERRORVALUE[31] = 5;
    ERRORVALUE[32] = 6;
    ORS = "\n";
}
NR == 1 {
    print;
    getline;
    priorline = $0;
}
NF == 6 {
    brandnewline = $0
    mytype = $2
    $0 = priorline
    priorField2 = $2;
    if (mytype !~ priorField2) {
        print;
        priorline = brandnewline;
    }
    if (priorField2 == "90") {
        mytype = ERRORVALUE[mytype];
    }
}
END {print brandnewline}
##Here brandnewline is set to the current line, and priorline is set to the line
##we just worked on; brandnewline then becomes the next new line. (i.e. line 1 =
##brandnewline, now we set priorline = brandnewline, thus priorline is line 1 and
##brandnewline takes on line 2.) The same is done for column 2: mytype is the
##current column 2 value, and priorField2 holds the previous value as mytype moves
##to the next column 2 value. Finally, we wrote an if statement where, if the value
##in column 2 of the current line !~ (does not match) the value in column 2 of the
##previous line, the line is printed; otherwise it is just skipped over. The
##second if statement recognizes the lines in which the value 90 appeared and
##replaces the value in column 2 with a previously defined ERRORVALUE set for
##each specific type (type 10=1, 11=2, 12=3, 30=4, 31=5, 32=6).
I have been able to delete the repeated lines successfully. However, I cannot get the next part of my code to work: replacing the 2nd-column value on the line after a 90 with the matching ERRORVALUE that I set in BEGIN (10=1, 11=2, 12=3, 30=4, 31=5, 32=6). Essentially, I just want to replace that value in the line with my ERRORVALUE.
If anyone can help me with this I would be very grateful.
One challenge is that you can't just compare one line with the previous because the ID number will be different.
awk '
    BEGIN {
        ERRORVALUE[10] = 1
        # ... etc
    }
    # print the header
    NR == 1 {print; next}
    NR == 2 || $0 !~ prev_regex {
        prev_regex = sprintf("^\\s+\\w+\\s+%s\\s+%s\\s+%s\\s+%s\\s+%s",$2,$3,$4,$5,$6)
        if (was90) $2 = ERRORVALUE[$2]
        print
        was90 = ($2 == 90)
    }
'
For lines where the 2nd column is altered, this ruins the line formatting:
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 5 0 0 0.0000 70221
If that's a problem, you could pipe the output of gawk into column -t, or if you know the line format is fixed, use printf() in the awk program.
This might work for you:
v=99999
sed ':a;$!N;s/^\(\s*\S*\s*\)\(.*\)\s*\n.*\2/\1\2/;ta;s/^\(\s*\S*\s*\) 90 /\1'"$(printf "%5d" $v)"' /;P;D' file
# Type Response Acc RT Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 99999 0 0 0.0000 68700
12 31 0 0 0.0000 70221
This might work for you:
awk 'BEGIN {
ERROROFFSET = 50;
ERRORVALUE[10] = 1;
ERRORVALUE[11] = 2;
ERRORVALUE[12] = 3;
ERRORVALUE[30] = 4;
ERRORVALUE[31] = 5;
ERRORVALUE[32] = 6;
}
NR == 1 { print ; next }
{ if (a[$2 $6]) { next } else { a[$2 $6]++ }
if ( $2 == 90) { print ; n++ ; next }
if (n>0) { $2 = ERRORVALUE[$2] ; n=0 }
printf("% 4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6)
}' INPUTFILE
See it in action here at ideone.com.
IMO the BEGIN block is obvious. Then the following happens:
The NR == 1 rule prints the very first line and switches to the next line (this rule applies only to the very first line).
Then it checks whether we have already seen a line with the same 2nd and 6th columns. If so, it switches to the next line; otherwise it marks the line as seen in an array, using the concatenated column values as the index. Do note that this might fail you if you have large values in the 2nd column and small ones in the 6th (e.g. 2 and 0020 concatenate to 20020, and so do 20 and 020), so you might want to add a separator to the index, like a[$2 "-" $6]. You can also use more columns to make the check stricter.
If the line has 90 in the second column, it prints the line, sets a flag to swap on the next line, then switches to the next line (in the input file).
On the next line it replaces the 2nd column with its entry in ERRORVALUE (the code does not check that an entry exists).
Then it prints the formatted line.
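The steps above, including the index-collision caveat, can be sketched in Python. This is a hedged illustration rather than a port of the awk script: clean is a hypothetical helper, rows are assumed to be lists of [id, type, response, acc, rt, offset], and leaving unlisted types unchanged is a choice made for this sketch.

```python
ERRORVALUE = {10: 1, 11: 2, 12: 3, 30: 4, 31: 5, 32: 6}

# Plain concatenation cannot tell these two (type, offset) pairs apart.
assert "2" + "0020" == "20" + "020"

def clean(rows):
    """Drop rows whose (type, offset) pair was already seen, then remap the
    type of the first kept row after a type-90 row."""
    seen = set()
    after_90 = False
    out = []
    for row in rows:
        key = (row[1], row[5])         # a tuple key keeps the two fields separate
        if key in seen:
            continue                   # duplicate: skip it
        seen.add(key)
        row = list(row)                # copy, so the input is not modified
        if row[1] == 90:
            after_90 = True            # flag the swap for the next kept row
        elif after_90:
            row[1] = ERRORVALUE.get(row[1], row[1])   # unlisted types stay as they are
            after_90 = False
        out.append(row)
    return out

rows = [
    [6, 41, 0, 0, "0.0000", 64537],
    [7, 41, 0, 0, "0.0000", 64537],    # repeats type and offset of row 6
    [8, 70, 0, 0, "0.0000", 65106],
    [11, 90, 0, 0, "0.0000", 68700],
    [12, 31, 0, 0, "0.0000", 70221],
]
result = clean(rows)
for row in result:
    print(row)
```

In awk itself, a[$2,$6] gives a similar effect to the tuple key, because the comma joins the subscripts with SUBSEP.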
I agree with Glenn that two passes over the file is nicer. You can remove your duplicate, perhaps nonconsecutive, lines using a hash like this:
awk '!a[$2,$3,$4,$5,$6]++' file.txt
You should then edit your values as desired. If you wish to change the value 90 in the second column to 5000, try something like this:
awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }' file.txt
You can see that I stole Zsolt's printf statement (thanks Zsolt!) for the formatting, but you can edit this if necessary. You can also pipe the output from the first statement into the second for a nice one-liner:
cat file.txt | awk '!a[$2,$3,$4,$5,$6]++' | awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }'
The previous options work for the most part; however, here is the way I would do it, simple and sweet. After reviewing the other posts, I believe this is the most efficient. It also covers the extra request the OP added in the comments: the line after a 90 gets the value from 2 lines prior. It does all of this in a single pass.
BEGIN {
    PC2 = PC6 = 1337
    replacement = 5
}
{
    if ($6 == PC6) next                # same offset as the previous line: duplicate
    if (PC2 == 90) $2 = replacement    # previous line was a 90
    replacement = PC2                  # becomes the value from 2 lines prior
    PC2 = $2
    PC6 = $6
    printf "%4s%8s%3s%5s%9s%6s\n", $1, $2, $3, $4, $5, $6
}
Example Input
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
7 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221
Example Output
1 70 0 0 0.000000 57850
2 31 0 0 0.000000 59371
3 41 0 0 0.000000 60909
4 70 0 0 0.000000 61478
5 31 0 0 0.000000 62999
6 41 0 0 0.000000 64537
8 70 0 0 0.000000 65106
9 11 0 0 0.000000 66627
10 21 0 0 0.000000 68165
11 90 0 0 0.000000 68700
12 21 0 0 0.000000 70221
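The single-pass bookkeeping in this answer (remember the previous type and the one before it) can be traced with a short Python sketch. It is an illustration under assumptions: replace_after_90 is a hypothetical name, rows are lists of [id, type, response, acc, rt, offset], and unlike the awk script there is no default replacement when a 90 appears on the second line.

```python
def replace_after_90(rows):
    """Skip a row that repeats the previous offset; give the row after a
    type-90 row the type seen two kept rows earlier."""
    prev_type = prev_offset = None
    two_back = None                    # type of the row before the previous one
    out = []
    for row in rows:
        row = list(row)                # copy, so the input is not modified
        if row[5] == prev_offset:
            continue                   # consecutive duplicate
        if prev_type == 90 and two_back is not None:
            row[1] = two_back
        two_back = prev_type           # shift the window forward by one row
        prev_type = row[1]
        prev_offset = row[5]
        out.append(row)
    return out

rows = [
    [6, 41, 0, 0, "0.0000", 64537],
    [7, 41, 0, 0, "0.0000", 64537],
    [10, 21, 0, 0, "0.0000", 68165],
    [11, 90, 0, 0, "0.0000", 68700],
    [12, 31, 0, 0, "0.0000", 70221],
]
result = replace_after_90(rows)
for row in result:
    print(row)
```

Row 12 ends up with type 21, the type of row 10, which agrees with the last line of the example output.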