regexReplace in String Manipulation KNIME - regex

I'm trying to remove the content of all cells that start with a character that is not a number using KNIME (v3.2.1). I have different ideas but nothing works.
1) String Manipulation Node: regexReplace($column$,"^[^0-9].*","")
The cells contain multiple lines, however only the first line is removed by this approach.
2) String Manipulation Node: regexMatcher($casrn_new$,"^[^0-9].*") followed by Rule Engine Node to remove all columns that are "TRUE".
The regexMatcher gives me "False" even for columns that should be "True" though.
3) String Replacer Node: I inserted the expression ^[^0-9].* into the Pattern column and selected "Replace whole String" but the regex is not recognised by that node so nothing gets replaced.
Does anyone have a solution for any of those approaches or knows another Node that might do the job? Help is much appreciated!

I would go with your first solution, since it already works for the first line; you just have to expand your regex to include newlines. I would try something like this:
regexReplace($column$,"^[^0-9](.|\n)*","")
This matches any text starting with a character that is not a digit, followed by any number of occurrences of any character or a newline. Depending on the line endings, you might need (.|\n|\r) instead of (.|\n).

You should use the following expression:
"(?s)^\D.*$"
The (?s) flag turns on DOTALL mode, so the dot also matches newlines. (Based on this: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#DOTALL)
In case you need to only change the content of the cells that do not start with a number, I do not think you need to filter any columns or rows. (BTW in case you want to remove rows, there are the Rule-based Row Filter/Splitter nodes which also support regular expressions with the MATCHES predicate.)
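A minimal sketch of the difference between the two patterns, using Python's re module as a stand-in. This is an assumption: KNIME uses Java's regex engine, but the (?s) inline flag has the same meaning there for this pattern. The function name clear_non_numeric is hypothetical.

```python
import re

def clear_non_numeric(cell: str) -> str:
    """Blank out a cell whose first character is not a digit."""
    # (?s) turns on DOTALL, so '.' also matches newline characters.
    return re.sub(r"(?s)^\D.*$", "", cell)

multi_line = "abc\nsecond line\nthird line"

# Without DOTALL, '.*' stops at the first newline, so only line one is removed.
first_line_only = re.sub(r"^[^0-9].*", "", multi_line)

# With DOTALL, the whole multi-line cell is emptied.
emptied = clear_non_numeric(multi_line)

# A cell that starts with a digit is left untouched.
untouched = clear_non_numeric("123\nabc")
```

The first substitution reproduces the behaviour described in the question, where only the first line disappears.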

Related

KNIME regex expression to return 6th line

I have a column with string values present in several lines. I would like to only have the values in the 6th line, all the lines have varying lengths, but all the cells in the column have the information I need in the 6th line.
I am honestly absolutely new and have no background in Java nor KNIME - I have scoured this forum and other internet sources, and none seem to tackle what I need in KNIME specifically - I found something similar but it doesn't work in KNIME:
Regex for nth line in a text file
Your answer will probably need to be broken into two parts:
1) How to do a regex search in KNIME
2) How to do a regex search for the 6th line
I can help with the regex search, but I don't know KNIME.
To start with, you want to know how to search for a single line, which is:
([^\n]*\n)
This looks for:
[^\n]: anything that isn't a newline
*: 0 or more of those
\n: followed by a newline
(): groups them together into a single match
We can then expand this into: ([^\n]*\n){5}([^\n]*\n){1} This has 2 capture groups: the first matches the first 5 lines (a repeated group keeps only its last repetition, so it ends up holding the 5th line), and the second holds the 6th line.
If KNIME supports Non-Capturing groups you can then expand that into the following so that you only have one matching capture group. You can decide for yourself which you like best.
(?:[^\n]*\n){5}([^\n]*\n){1}
I've created an example you can test on RegExr
Regardless of which way you go, make sure to document the regex with comments or stick it into a variable with a very clear name, since regexes aren't particularly human-readable.
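The non-capturing variant can be tried outside KNIME with a short Python sketch. The names SIXTH_LINE and sixth_line are hypothetical, and the trailing \n is dropped from the capture here (an adjustment to the pattern above) so that a 6th line at the very end of the cell still matches.

```python
import re

# Skip five complete lines without capturing them, then capture the sixth.
SIXTH_LINE = re.compile(r"(?:[^\n]*\n){5}([^\n]*)")

def sixth_line(text: str):
    """Return the text of line six, or None if there are fewer than five newlines."""
    match = SIXTH_LINE.match(text)
    return match.group(1) if match else None

# Eight lines of varying content, joined by newlines.
sample = "\n".join(f"line {i}" for i in range(1, 9))
result = sixth_line(sample)
too_short = sixth_line("only\ntwo lines")
```

Giving the compiled pattern a descriptive name follows the advice above about keeping the regex readable.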

Regular expression for rest of line after first x characters

I have a bunch of lines with IDs as the first six characters, and data I don't need after. Is there a way to identify everything after the ID section so Find and Replace can replace it with whitespace?
/.{6}\K.*//
If you want something more specific, please be more specific in your question.
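\K is a PCRE/Perl feature and is not available in every engine; Python's built-in re module, for example, does not support it. A minimal sketch of the same idea with a capture group instead, where the function name keep_ids is hypothetical and the remainder is simply dropped rather than replaced with whitespace:

```python
import re

def keep_ids(text: str, id_length: int = 6) -> str:
    """Keep the first id_length characters of every line and drop the rest."""
    # Capture the ID, match the remainder of the line, and write back only the ID.
    pattern = re.compile(r"^(.{%d}).*" % id_length, flags=re.MULTILINE)
    return pattern.sub(r"\1", text)

lines = "ABC123 some data\nDEF456 more data"
ids_only = keep_ids(lines)
```

Lines shorter than the ID length do not match and are left as they are.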

Extract values from this string?

I have the following string of text.
LOCATION: -20.443 122.951TEMPERATURE: 54.5CCONFIDENCE:
50%SATELLITE: aquaOBS TIME: 2014-05-06T05:30:30ZGRID:
1km
This is being pulled from a feed, and the fieldnames stay the same, but the values differ.
I have been trying to get my head around regular expressions and find a way to pull:
54.5 (temperature)
50 (confidence)
So I need two separate regular expressions that can pull the above from the original string. Any clues or pointers would be great.
I am doing this within a product that allows me to point to strings and can apply regular expressions to the strings so that values can be extracted and written to new fields.
ArcGIS appears to be using a very limited regex engine. It looks like it doesn't even support capturing groups, let alone lookaround. So I guess you need to try the following:
TEMPERATURE: ([0-9.]+)C
will match the TEMPERATURE entry and
CONFIDENCE: ([0-9]+)%
will match the CONFIDENCE entry.
If you're lucky, you can then access the relevant part of the match via the special variable \1 or $1 (which would then contain "54.5" and "50", respectively).
If that's not possible, you'll have to "manually" trim the first 13/12 characters from the left side of the string as well as the rightmost character.
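For comparison, this is how the two patterns behave in an engine that does expose capture groups. The sketch uses Python's re module, not the ArcGIS engine, and assumes the line breaks in the sample string are only display wrapping, so the feed is written as one line.

```python
import re

# The sample from the question, joined into a single line (assumption).
feed = ("LOCATION: -20.443 122.951TEMPERATURE: 54.5CCONFIDENCE: 50%"
        "SATELLITE: aquaOBS TIME: 2014-05-06T05:30:30ZGRID: 1km")

temperature = re.search(r"TEMPERATURE: ([0-9.]+)C", feed)
confidence = re.search(r"CONFIDENCE: ([0-9]+)%", feed)

# group(1) is the equivalent of \1 or $1: only the part inside the parentheses.
temperature_value = float(temperature.group(1)) if temperature else None
confidence_value = int(confidence.group(1)) if confidence else None
```

As written, the temperature pattern does not accept a leading minus sign, so negative readings would need [0-9.-]+ or similar.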
You can split this text using the newline as a delimiter. As a result you get an array. Then you can split the elements of the array with the delimiter ':'.

issue in a regexp

I'm using the following expression:
/^[alopinme]{5}$/
This regexp gives me words made up of the letters contained within the brackets.
Now I need to add some more functionality to the expression, because the fetched words may contain ONLY one more letter from another set of letters. Let's say that I want to get words formed with letters from set A that could (if it exists) contain one more letter from set B.
I'm trying to work out how to complete my regular expression, but I can't find the right way.
Could anyone help me?
Thanks.
EDIT:
Here i post an example:
SELECT sin_acentos FROM Finder.palabras_esp WHERE sin_acentos REGEXP '^[tehsolm]{5}$'
This expression selects words like: helms, moths, meths, homes and so on.
But I need to add a set B of letters and get words that may contain ONLY one letter from that set. Let's say I have another set of letters [xzk], so the expression could match more words, but with at most one letter chosen from set B.
The result could include words like: mozes, hoxes, tozes, and so on. If you check these words, you can see that most letters in every word are from set A and only one is from set B.
If one of the other characters may appear at most once, you can use:
^(?=.{5}$)[alopinme]*(?:[XYZ][alopinme]*)?$
(?=.{5}$) - Check the string is 5 characters long, even before matching. (This might not work on MySQL.)
[alopinme]* - Characters from A
(?:[XYZ][alopinme]*)? - Optional - one character from B, and some more from A.
Working example: http://rubular.com/r/aw6l561Int
Or, if you want to allow them up to 3 times, for example:
^(?=.{5}$)[alopinme]*(?:[XYZ][alopinme]*){0,3}$
Since the words that you are looking for are all five characters long, I can think of a rather ugly expression that would do the trick: let's say [alopinme] is your base set, and [xyz] is your optional set. Then the expression
/^([alopinmexyz][alopinme]{4}|[alopinme][alopinmexyz][alopinme]{3}|[alopinme]{2}[alopinmexyz][alopinme]{2}|[alopinme]{3}[alopinmexyz][alopinme]|[alopinme]{4}[alopinmexyz])$/
should allow five-letter words of the structure that you are looking for.
In general, a need to count anything makes your regex unreadable. Problems like this one illustrate the point well: it is much easier to write the expression /^[alopinmexyz]{5}$/ and add an extra step in code to check that [xyz] appears in the text no more than once. You can even use a regexp to do the additional check:
/^[^xyz]*[xyz]?[^xyz]*$/
The result in SQL would look as follows:
SELECT sin_acentos
FROM Finder.palabras_esp
WHERE sin_acentos REGEXP '^[tehsolmxyz]{5}$' -- Length == 5, all from tehsolm+xyz
AND sin_acentos REGEXP '^[^xyz]*[xyz]?[^xyz]*$' -- No more than one character from xyz
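The same two-step check can be sketched outside SQL. This is an illustration in Python under the same letter sets as the query above; the function name is_candidate is hypothetical.

```python
import re

# Step 1: exactly five letters, all from set A (tehsolm) plus set B (xyz).
ALLOWED_LETTERS = re.compile(r"[tehsolmxyz]{5}")
# Step 2: at most one letter from set B anywhere in the word.
AT_MOST_ONE_FROM_B = re.compile(r"[^xyz]*[xyz]?[^xyz]*")

def is_candidate(word: str) -> bool:
    """True if the word passes both checks."""
    # fullmatch anchors both ends, so no explicit ^ and $ are needed.
    return (ALLOWED_LETTERS.fullmatch(word) is not None
            and AT_MOST_ONE_FROM_B.fullmatch(word) is not None)

words = ["homes", "mozes", "zozes", "homer", "home"]
accepted = [w for w in words if is_candidate(w)]
```

Keeping the two conditions as separate named patterns mirrors the two REGEXP clauses and keeps each one easy to read.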

Replace all characters in a regex match with the same character in Vim

I have a regex to replace a certain pattern with a certain string, where the string is built dynamically by repeating a certain character as many times as there are characters in the match.
For example, say I have the following substitution command:
%s/hello/-----/g
However, I would like to do something like this instead:
%s/hello/-{5}/g
where the non-existing notation -{5} would stand for the dash character repeated five times.
Is there a way to do this?
Ultimately, I'd like to achieve something like this:
%s/(hello)*/-{\=strlen(\0)}/g
which would replace any instance of a string of only hellos with the string consisting of the dash character repeated the number of times equal to the length of the matched string.
%s/\v(hello)*/\=repeat('-',strlen(submatch(0)))/g
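The idea of computing the replacement from the match is not specific to Vim. As a rough analogue only (not Vim script), here is a Python sketch in which a function builds the replacement from the length of each match. The name mask_runs is hypothetical, and + is used instead of * so that empty matches are not considered.

```python
import re

def mask_runs(text: str, word: str = "hello", fill: str = "-") -> str:
    """Overwrite every run of consecutive copies of word with fill characters."""
    # re.escape guards against regex metacharacters inside word.
    pattern = re.compile(f"(?:{re.escape(word)})+")
    # The replacement is computed per match, like \= with submatch(0) in Vim.
    return pattern.sub(lambda m: fill * len(m.group(0)), text)

masked = mask_runs("say hellohello to hello")
```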
As an alternative to using the :substitute command (the usage of
which is already covered in @Peter's answer), I can suggest automating
the editing commands for performing the replacement by means of
a self-referring macro.
A straightforward way of overwriting occurrences of the search pattern
with a certain character by hand would be the following sequence of
Normal-mode commands.
1) Search for the start of the next occurrence.
/\(hello\)\+
2) Select matching text till the end.
v//e
3) Replace selected text.
r-
4) Repeat from step 1.
Thus, to automate this routine, one can run the command
:let[@/,@s]=['\(hello\)\+',"//\rv//e\rr-@s"]
and execute the contents of that s register starting from the
beginning of the buffer (or another appropriate location) by
gg@s