Remove character from string in Informatica

I have the following string values:
string 1 = test123
string 2 = stri567
I need to remove 123 and 567 from the strings, which means I need only the first four characters of each string (test, stri).

Have you tried REG_REPLACE to remove all digits from the string?
REG_REPLACE( inp_col, '[0-9]','')

If you need the first four characters of the column, use SUBSTR:
SUBSTR(column_name, 0, 4)
Check whether you need the first four characters or only the non-numeric part of the value.
If you need only the non-numeric part, use Koushik Roy's solution.
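To compare the two suggestions outside Informatica, here is a minimal Python sketch of the same two ideas. It is only an illustration, not Informatica code, and the function names strip_digits and first_four are made up for this example.

```python
import re

def strip_digits(value: str) -> str:
    # Same idea as REG_REPLACE(inp_col, '[0-9]', ''): drop every digit.
    return re.sub(r"[0-9]", "", value)

def first_four(value: str) -> str:
    # Same idea as taking a four-character substring from the start.
    return value[:4]

for s in ("test123", "stri567"):
    print(strip_digits(s), first_four(s))
```

The two approaches agree on the sample values but differ on a value such as "ab12cd": removing digits gives "abcd", while taking the first four characters gives "ab12".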

Related

regular expression replace for SQL

I have to replace a string pattern in SQL with an empty string. Could anyone suggest how?
Input String 'AC001,AD001,AE001,SA001,AE002,SD001'
Output String 'AE001,AE002'
Each code has five characters: the first two are letters and the last three are digits. I have to remove all codes except those starting with "AE".
The string can contain zero or more "AE" codes. The final output should be a comma-separated string of the "AE" codes, as shown above.
Here is one option that calls regexp_replace multiple times, eliminating the unwanted strings step by step to arrive at the required output.
SELECT regexp_replace(
regexp_replace(
regexp_replace(
'AC001,AD001,AE001,SA001,AE002,SD001', '(?<!AE)\d{3},{0,1}', 'X','g'
),'..X','','g'
),',$','','g'
)
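To see what each of the three passes does, here is a rough Python sketch of the same chain. It assumes Python's re module handles these patterns the same way on this input; the function name keep_ae_codes is hypothetical.

```python
import re

def keep_ae_codes(codes: str) -> str:
    # Pass 1: replace the three digits (plus an optional comma) of every
    # code that is NOT preceded by "AE" with the marker "X".
    step1 = re.sub(r"(?<!AE)\d{3},?", "X", codes)
    # Pass 2: remove each marked code (two letters followed by "X").
    step2 = re.sub(r"..X", "", step1)
    # Pass 3: drop a trailing comma, if one is left over.
    return re.sub(r",$", "", step2)

print(keep_ae_codes("AC001,AD001,AE001,SA001,AE002,SD001"))
```

After pass 1 the sample string becomes "ACXADXAE001,SAXAE002,SDX", after pass 2 it is "AE001,AE002,", and pass 3 removes the final comma.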
I would convert the list to an array, unnest it to rows, keep only the rows you want, and aggregate them back into a string:
select string_agg(t, ',')
from unnest(string_to_array('AC001,AD001,AE001,SA001,AE002,SD001',',')) as x(t)
where x.t like 'AE%'; --<< only keep those
This is independent of the number of elements in the string and can easily be extended to support more complex conditions.
This is a good example of why storing comma-separated values in a single column is not such a good idea to begin with.
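The split, filter, and aggregate idea is not specific to SQL. A minimal Python sketch of the same steps follows; the function name and the prefix parameter are assumptions made for illustration.

```python
def keep_codes_with_prefix(codes: str, prefix: str = "AE") -> str:
    # Split the list into individual codes (like string_to_array + unnest).
    parts = codes.split(",")
    # Keep only the wanted codes and join them back (like string_agg).
    return ",".join(p for p in parts if p.startswith(prefix))

print(keep_codes_with_prefix("AC001,AD001,AE001,SA001,AE002,SD001"))
```

As with the SQL version, the filter condition can be swapped for something more complex without touching the split and join steps.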

How can I tell if there are three or more characters between matches in a regex?

I'm using Ruby 2.1. I have this logic that looks for consecutive pairs of strings in a bigger string
results = line.scan(/\b((\S+?)\b.*?\b(\S+?))\b/)
My question is, how do I iterate over the list of results and print out whether there are three or more characters between the two strings? For instance if my string were
"abc def"
The above would produce
[["abc def", "abc", "def"]]
and I'd like to know whether there are three or more characters between "abc" and "def."
Use a quantifier for the spaces in between: \b((\S+?)\b\s{3,}\b(\S+?))\b
Also, the inner boundaries are not really needed:
\b((\S+?)\s{3,}(\S+?))\b
A straightforward way to check this is by running a separate regex:
results.each { |x| p x[0][/\S+?\b(.*?)\b\S+?/, 1].size }
This prints the size of the gap for each match.
Another way is to take the size of the captured groups and subtract them:
results = []
line.scan(/\b((\S+?)\b.*?\b(\S+?))\b/) do |s, group1, group2|
results << $~ if s.size - group1.size - group2.size >= 3
end
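For comparison, the same gap check can be sketched in Python. This version assumes that "characters between" means the run of whitespace separating two neighbouring tokens, which is narrower than the original Ruby pattern; the function name wide_gaps and the min_gap parameter are hypothetical.

```python
import re

def wide_gaps(line: str, min_gap: int = 3):
    """Return (left, right, gap_length) for neighbouring tokens
    separated by at least min_gap whitespace characters."""
    found = []
    # The lookahead captures the right-hand token without consuming it,
    # so that token can also serve as the left-hand token of the next pair.
    for m in re.finditer(r"(\S+)(\s+)(?=(\S+))", line):
        left, gap, right = m.group(1), m.group(2), m.group(3)
        if len(gap) >= min_gap:
            found.append((left, right, len(gap)))
    return found

print(wide_gaps("abc def"))
print(wide_gaps("abc   def g"))
```

With "abc def" there is only one character between the tokens, so nothing is reported.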

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to use regular expressions to extract the groups of characters that lie between the character : and the character /.
Here is a typical example of the string I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
So I want to retrieve 45.72643,4.91203 and also hereanotherdata,
as they are both between the characters : and /.
I tried this syntax on a simpler string where the pattern occurs only once:
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern occurs once. If I use it on a string containing the pattern multiple times, I get everything between the first : and the last /.
How can I indicate that the pattern occurs multiple times, and how can I retrieve each match?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
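The difference between the greedy pattern from the question and the lazy lookaround pattern can also be reproduced outside Matlab. Here is a small Python sketch, assuming Python's re module treats both patterns the same way on this input.

```python
import re

text = "abcd:45.72643,4.91203/Rou:hereanotherdata/defgh"

# Greedy: .* runs to the last "/" in the string, so only one long match is found.
greedy = re.findall(r":(\w.*)/", text)

# Lazy with lookarounds: .+? stops at the first "/" after each ":".
lazy = re.findall(r"(?<=:).+?(?=/)", text)

print(greedy)
print(lazy)
```

The greedy version returns a single element spanning from the first : to the last /, while the lazy version returns the two separate pieces.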
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that Matlab has a function called strsplit, for example:
C = strsplit(data,':')
That should break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array element (as it contains the text before the first ":"), loop over the rest of the array, and use regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then ignore 1, and extract everything before the "/" in 2 and 3.
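The split-based steps described above can be sketched in Python as follows. This is an illustration of the idea rather than Matlab code, and the function name between_colon_and_slash is made up.

```python
def between_colon_and_slash(text: str):
    # Split on ":" and ignore the first piece (the text before the first ":").
    pieces = text.split(":")[1:]
    # For each remaining piece, keep what comes before the first "/".
    # Pieces without any "/" are skipped because nothing closes them.
    return [p.split("/", 1)[0] for p in pieces if "/" in p]

print(between_colon_and_slash("abcd:45.72643,4.91203/Rou:hereanotherdata/defgh"))
```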
As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(@(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now every cell of B after the first corresponds to a piece that followed a ':', and each piece that contained a '/' has itself been split into more than one cell. You can extract the result by checking where B has more than one cell and taking the first element of each:
C=cellfun(@(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in R2016b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

Spotfire: count the number of a certain character in a string

I am trying to add a new calculated column that counts the number of semicolons in a string and adds one to it. The column I have contains a list of aliases, and I need to know how many there are in each row.
For example,
A; B; C; D
So basically this means there are 4 aliases (3 semi colons + 1)
Need to do this for over 2 million rows. Help please!
The basic idea is to subtract the length of your string without the ; characters from its original length:
len([columnName])-len(Substitute([columnName],";",""))+1
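The arithmetic is easy to check on the sample row. Here is a minimal Python sketch of the same length-difference trick; it is not Spotfire expression syntax, and the function name alias_count is hypothetical.

```python
def alias_count(aliases: str) -> int:
    # Length with semicolons minus length without them = number of semicolons.
    semicolons = len(aliases) - len(aliases.replace(";", ""))
    # One more alias than there are separators.
    return semicolons + 1

print(alias_count("A; B; C; D"))
```

For "A; B; C; D" the full length is 10 and the length without semicolons is 7, so the result is 3 + 1 = 4.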
Here it is with a regular expression:
Len(RXReplace([Column 1], "(?!;).", "", "gis"))+1
RXReplace takes as arguments:
The string you want to work on (in this case, Column 1)
The regular expression you want to use (here it is (?!;). )
What you want to replace matches with (blank in this situation, so that everything that matches the regex is removed)
Finally, a parameter saying how you want it to work (we pass in gis, which means replace all matches rather than just the first, ignore case, and treat newlines like any other character so they are replaced too)
We wrap this in a Len, which gives us the number of semicolons since they are all that is left, and finally we add 1 to get the final result.
You can read more about the regular expression here: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx but in a nutshell it says match everything that isn't a semi colon.
You can read more about RXReplace and Len here: https://docs.tibco.com/pub/spotfire/6.0.0-november-2013/userguide-webhelp/ncfe/ncfe_text_functions.htm
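The negative lookahead can be tried outside Spotfire as well. Here is a rough Python sketch of the same expression, assuming re.IGNORECASE and re.DOTALL are reasonable stand-ins for the i and s options (Python's re.sub already replaces all matches, which covers g); the function name alias_count_regex is made up.

```python
import re

def alias_count_regex(aliases: str) -> int:
    # (?!;). matches any single character that is not a semicolon,
    # so removing every match leaves only the semicolons behind.
    only_semicolons = re.sub(r"(?!;).", "", aliases,
                             flags=re.IGNORECASE | re.DOTALL)
    return len(only_semicolons) + 1

print(alias_count_regex("A; B; C; D"))
```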

Use REPLACE and LIKE together in postgres

I am trying to replace all the occurrences of '-' in a column of a table.
I also need to replace the string that follows the dash, which is an arbitrary number.
To be more specific, this is one of my values:
"ANDRIU 5-9, CHAL 152 34, SOMETHING"
What I want is to replace this part:
-9
with an empty space.
The problem is that 9 can be any number, not necessarily a single digit.
So I need something like finding the position of the dash and the position of the first comma in the whole string, and then replacing the part in between based on those index values.
Is this possible?
Postgres provides the function regexp_replace(), which does what you want directly:
select regexp_replace(col, '-[0-9]+', ' ')
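To check what the pattern does to the sample value, here is a small Python sketch of the same replacement. In Postgres, regexp_replace replaces only the first match unless 'g' is passed as a fourth argument, so the sketch exposes that choice through a hypothetical first_only parameter; the function name is also made up.

```python
import re

def drop_dash_number(value: str, first_only: bool = True) -> str:
    # "-[0-9]+" matches a dash followed by one or more digits.
    # count=1 mimics the default first-match behaviour; count=0 replaces
    # every match, like passing the 'g' flag.
    count = 1 if first_only else 0
    return re.sub(r"-[0-9]+", " ", value, count=count)

print(drop_dash_number("ANDRIU 5-9, CHAL 152 34, SOMETHING"))
```

On the sample value there is only one dash, so both modes give the same result. They differ on a value with several dashes, such as "A 1-22, B 3-444".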