Spoon Kettle doesn't manage NULL values correctly

I'm using Spoon Kettle (PDI) to insert data from a CSV file into a MariaDB database.
I'm doing something very simple, but apparently when there is a NULL value in the CSV, PDI interprets it as a String and this creates problems in the final query:
2018/04/25 14:31:23 - Workstation.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Because of an error, this step can't continue:
2018/04/25 14:31:23 - Workstation.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleValueException:
2018/04/25 14:31:23 - Workstation.0 - Unexpected conversion error while converting value [checkPoint_id String] to an Integer
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to Integer
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:2081)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertData(ValueMetaBase.java:3785)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertBinaryStringToNativeType(ValueMetaBase.java:1579)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getString(ValueMetaBase.java:1799)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:319)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:828)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:385)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2018/04/25 14:31:23 - Workstation.0 - at java.lang.Thread.run(Thread.java:748)
2018/04/25 14:31:23 - Workstation.0 - Caused by: org.pentaho.di.core.exception.KettleValueException:
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to Integer
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1323)
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:2019)
2018/04/25 14:31:23 - Workstation.0 - ... 9 more
2018/04/25 14:31:23 - Workstation.0 - Caused by: org.pentaho.di.core.exception.KettleValueException:
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 -
2018/04/25 14:31:23 - Workstation.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1317)
2018/04/25 14:31:23 - Workstation.0 - ... 10 more
In the image you can see the import from the CSV; I have to specify the type of each column. The exception relates to the column checkPoint_id, which is a number but can be null.
Is there a way to overcome this problem? It seems a quite basic operation, but I don't see any option I could turn on to fix this behaviour.

Uncheck the Lazy conversion option.
If the problem persists: there is no standard for nulls in CSV, and it may well be that in your case it uses "null" (a String).
If it's a one-off, open the CSV file in an editor and do a global search & replace of "null" with nothing.
If you have to automate it or have a lot of CSV files, read all the fields as String; then use the Null if... step to convert "null" into a real null; then change the data type with a Select values step on the Metadata tab.
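If you ever need the same logic outside PDI, here is a minimal sketch of the idea in Python with pandas; data.csv is a hypothetical file name, checkPoint_id is the column from the question:
import pandas as pd

# Read every field as a string, treat the literal "NULL" as a missing value,
# then convert the numeric column afterwards (same idea as the PDI steps above).
df = pd.read_csv('data.csv', dtype=str, na_values=['NULL'], keep_default_na=False)
df['checkPoint_id'] = pd.to_numeric(df['checkPoint_id'], errors='coerce').astype('Int64')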

Related

Unit test for z-score with Prometheus

I have been experimenting a lot with writing unit tests for alerts as per this: https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/#alerts-yml
I have some simple cases worked out, but now I am tackling rules that are less trivial, for example this one:
abs(
avg_over_time(my_metrics{service_name="aService"}[1m])
-
avg_over_time(my_metrics{service_name="aService"}[3m])
)
/ stddev_over_time(my_metrics{service_name="aService"}[3m])
> 3
I have one file with the above rule and then this is in my test:
- interval: 1m
  # Series data.
  input_series:
    - series: 'my_metrics{service_name="aService"}'
      values: '0 0 0 0 1 0 0 0 0'
  alert_rule_test:
    - eval_time: 3m
      alertname: myalert
      exp_alerts:
        - exp_labels:
            severity: warning
            service_name: aService
          exp_annotations:
            summary: "some text"
            description: "some other text"
I am not sure what my series should look like in order to test deviation from the mean. Is it even possible to test such a rule?
Thank you
EDIT
I can have a successful test if I set it to > 0 as opposed to > 3. I have tried to set a series of this sort:
'10+10x2 30+1000x1000'
but I cannot understand what the correct setup would be to have it triggered.
This isn't a direct answer, rather a tip from someone who spent quite some time on these tests. Did you know that apart from testing alert expressions, you can unittest PromQL expressions as well? See how it can be useful:
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: test_metric
        values: 1 1 1 10 1 1 1
    promql_expr_test:
      - expr: avg_over_time(test_metric[1m])
        eval_time: 4m
        exp_samples:
          - value: #5.5
      - expr: avg_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: #3.25
      - expr: stddev_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: #3.897114317029974
I've split your alert expression into three separate, simple parts. If you run this unittest, you will see the commented-out values in the error message. From here it is not difficult to join pieces together and see why the alert is not happening. You can use that to build a working sequence of values.
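For example, a quick back-of-the-envelope check of this series outside promtool (a Python sketch; it assumes the [1m] window at eval_time 4m holds two samples and the [3m] window holds four, which is what the commented-out values above imply, and that stddev_over_time is a population standard deviation):
from statistics import mean, pstdev

# test_metric scraped every 1m: values at t = 0m .. 6m
samples = [1, 1, 1, 10, 1, 1, 1]

avg_1m = mean(samples[3:5])       # samples at 3m and 4m -> 5.5
avg_3m = mean(samples[1:5])       # samples at 1m .. 4m  -> 3.25
stddev_3m = pstdev(samples[1:5])  # population stddev    -> 3.897114317029974

# The alert expression: abs(avg_1m - avg_3m) / stddev_3m > 3
score = abs(avg_1m - avg_3m) / stddev_3m
print(score)  # ~0.577, far below 3, so the alert never fires for this series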

Remove any lines whose level is < 30

I have a request please: I am using regular expressions in Notepad++ and I have my database, which contains lines like
test1 - Level : 12 - Role : Healer
test2 - Level : 30 - Role : Healer
test3 - Level : 35 - Role : Healer
test3 - Level : 162 - Role : Healer
I want it to remove any lines whose level is < 30, so the output should be
test2 - Level : 30 - Role : Healer
test3 - Level : 35 - Role : Healer
test3 - Level : 162 - Role : Healer
Thanks in advance
You may try the following find and replace in regex mode:
Find: ^.*Level : [12]?[0-9]\b.*\R?
Replace: (empty)
Here is a demo showing that the logic is working.
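If you want to check the logic outside Notepad++, here is a small Python sketch using the same pattern (Python's re has no \R, so a plain \n is used; the sample lines are the ones from the question):
import re

text = """test1 - Level : 12 - Role : Healer
test2 - Level : 30 - Role : Healer
test3 - Level : 35 - Role : Healer
test3 - Level : 162 - Role : Healer
"""

# [12]?[0-9]\b matches a one- or two-digit number from 0 to 29: an optional
# leading 1 or 2, one more digit, then a word boundary so that 30, 35 or 162
# are not matched partially.
pattern = re.compile(r'^.*Level : [12]?[0-9]\b.*\n?', re.MULTILINE)
print(pattern.sub('', text))  # only the lines with Level >= 30 remain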

Python 2.7: read a txt file, split and group a few columns counting from the right

Because the .txt file has some flaws, it needs to be split from the right. Below is part of the file. Notice that the first row has only 4 columns and the other rows have 5 columns. I want the data from the 2nd, 3rd and 4th columns from the right.
5123 - SENTRAL REIT - SENTA.KL - [$SENT]
KIPT - 5280 - KIP REAL EST - KIPRA.KL - [$KIPR]
ALIT - 5269 - AL-SALAM REAL - ALSAA.KL - [$ALSA]
KLCC - 5235SS - KLCC PROP - KLCCA.KL - [$KLCC]
IGBgggREIT - 5227 - IGB RT - IGREA.KL - [$IGRE]
SUNEIT - 5176 - SUNWAY RT - SUNWA.KL - [$SUNW]
ALA78QAR - 5116 - AL-AQAR HEA RT - ALQAA.KL - [$ALQA]
I want the result to be saved as a .csv file that can be read by pandas later.
The desired output is:
Code,Company,RIC
5123,SENTRAL REIT,SENTA.KL
5280,KIP REAL EST,KIPRA.KL
5269,AL-SALAM REAL,ALSAA.KL
5235SS,KLCC PROP,KLCCA.KL
5227,IGB RT,IGREA.KL
5176,SUNWAY RT,SUNWA.KL
5116,AL-AQAR HEA RT,ALQAA.KL
My code is below:
with open('abc.txt', 'r') as reader:
    [x for x in reader.read().strip().split(' - ') if x]
It returns a list, but I am unable to group it into the right columns because of the flaw in the data (an unequal number of columns in some rows when counted from the left).
Please advise how to get the desired output.
This should do the trick :)
import pandas as pd

with open('abc.txt', 'r') as reader:
    data = [line.split(' - ')[-4:-1] for line in reader.readlines()]

df = pd.DataFrame(columns=['Code', 'Company', 'RIC'], data=data)
df.to_csv('abc.csv', sep=',', index=False)
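As a quick sanity check (assuming the same abc.csv written above), the file can then be read back with pandas:
import pandas as pd

df = pd.read_csv('abc.csv')
print(df)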

Split string, extract and add to another column with regex in BigQuery

I have a table with an Equipment column containing strings. I want to split the string, take a part of it and add this part to a new column (SerialNumber_Asset). The part of the string I want to extract always has the same pattern: A + 7 digits. Example:
   Equipment                               SerialNumber_Asset
1  AXION 920 - A2302888 - BG-ADM-82 -NK    A2302888
2  Case IH Puma T4B 220 - BG-AEH-87 - NK   null
3  ARION 650 - A7702047 - BG-ADZ-74 - MU   A7702047
4  ARION 650 - A7702039 - BG-ADZ-72 - NK   A7702039
My code:
select x, y, z,
regexp_extract(Equipment, r'([\A][\d]{7})') as SerialNumber_Asset
FROM `aa.bb.cc`
The message I got:
Cannot parse regular expression: invalid escape sequence: \A
Any suggestions on what could be wrong? Thanks
Just use A instead of [\A]; check the example below:
select regexp_extract('AXION 920 - A2302888 - BG-ADM-82 -NK', r'(A[\d]{7})') as SerialNumber_Asset

How do I extract only the IPs from the data?

Trying to pull some logs and break them down. The following regex gives me a correct match for all 4 IPs: ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}+) but I'm not sure how to either delete the rest of the data or "extract" the IPs. I only need the IPs, as shown below.
June 3rd 2020, 21:18:02.193 [2020-06-03T21:18:02.781503+00:00,192.168.5.134,0,172.16.139.61,514,rslog1,imtcp,]<183>Jun 3 21:18:02 005-1attt01 atas_ssl: 1591219073.296175 CAspjq31LV8F0b 146.233.244.131 38530 104.16.148.244 443 - - - www.yahoo.com F - - F - - - - - - -
June 3rd 2020, 21:18:02.193 [2020-06-03T21:18:02.781503+00:00,192.168.5.134,0,172.16.139.61,514,rslog1,imtcp,]<183>Jun 3 21:18:02 005-1attt01 atas_ssl: 1591219073.296175 CAspjq31LV8F0b 146.233.244.131 38530 104.16.148.244 443 - - - www.yahoo.com F - - F - - - - - - -
Need this:
192.168.5.134 172.16.139.61 146.233.244.131 104.16.148.244
192.168.5.134 172.16.139.61 146.233.244.131 104.16.148.244
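One way to get exactly that output is to find all matches per line and join them, for instance with this Python sketch; the pattern is the one from the question (minus the redundant trailing +), and logs.txt is a hypothetical file holding the log lines above:
import re

# Four dot-separated groups of 1-3 digits, i.e. an IPv4-shaped token
ip_re = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

with open('logs.txt') as f:
    for line in f:
        ips = ip_re.findall(line)
        if ips:
            # e.g. 192.168.5.134 172.16.139.61 146.233.244.131 104.16.148.244
            print(' '.join(ips))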