I am editing a large YAML file, and this appears in the text many times (I have replaced the actual text with placeholders):
options:
- placeholder1!
- this is a placeholder!
- %placeholderhere%!
- placeholder
- you get the point by now
- more placeholders
- one last placeholder...
this is just some text don't replace
options:
- placeholder1!
- this is a placeholder!
- %placeholderhere%!
- placeholder
- you get the point by now
- more placeholders
- one last placeholder...
I want to turn that into this:
options:
- placeholder1!
this is just some text don't replace
options:
- placeholder1!
Thanks for all your help :)
Input
options:
- placeholder1!
- this is a placeholder!
- %placeholderhere%!
- placeholder
- you get the point by now
- more placeholders
- one last placeholder...
this is just some text don't replace
options:
- placeholder1!
- this is a placeholder!
- %placeholderhere%!
- placeholder
- you get the point by now
- more placeholders
- one last placeholder...
Find ^(options:.*?- placeholder1!).*?placeholder\.\.\.$
Replace with \1
Note: don't forget to check the ". matches newline" checkbox (v6.0 onwards only)
Result:
options:
- placeholder1!
this is just some text don't replace
options:
- placeholder1!
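For anyone who would rather script this than use the editor's Find/Replace, here is a minimal Python sketch of the same idea. It is not the editor regex from this answer: it assumes every list item starts with "- " and keeps only the first item after each options: line. The sample text is shortened from the question.

```python
import re

text = (
    "options:\n"
    "- placeholder1!\n"
    "- this is a placeholder!\n"
    "- one last placeholder...\n"
    "this is just some text don't replace\n"
    "options:\n"
    "- placeholder1!\n"
    "- more placeholders\n"
    "- one last placeholder...\n"
)

# Group 1 keeps "options:" plus its first "- " item;
# the non-capturing group swallows every following "- " item.
pattern = re.compile(r"^(options:\n- [^\n]*)(?:\n- [^\n]*)*", re.MULTILINE)

result = pattern.sub(r"\1", text)
print(result)
```

Lines that do not start with "- " (such as the plain text line in the middle) end the match, so they are left alone.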
Related
I need to combine this:
"1381733226.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381733226.6811","Form1","your-subject","foo1","2",NULL
"1381733868.4487","Form1","your-email","example2#gmail.com","1",NULL
"1381733868.4487","Form1","your-subject","foo2","2",NULL
"1381734307.5494","Form1","your-email","example3#gmail.com","1",NULL
"1381734307.5494","Form1","your-subject","foo3","2",NULL
"1381735753.0189","Form1","your-email","example4#gmail.com","1",NULL
"1381735753.0189","Form1","your-subject","foo4","2",NULL
into this:
example1#gmail.com - foo1
example2#gmail.com - foo2
example3#gmail.com - foo3
example4#gmail.com - foo4
Some of the lines are "bad" and should be skipped. For example:
"1387658626.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381124126.1211","Form1","your-subject","foo1","2",NULL
or:
"1381733226.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381733226.6811","Form1","your-email","foo1","2",NULL
I already tried to replace this:
"\d+?\.\d+?","Form1","your-email","([^\r\n])*","1",NULL\r?\n"\d+?\.\d+?","Form1","your-subject","([^\r\n])*","2",NULL)
to this:
$1 - $2
But it's not working :/. Do you have any ideas?
You can use this regex, which is a corrected form of yours:
"(?<id>\d+?\.\d+?)","Form1","your-email","([^"]*)","1",NULL\r?\n"\k<id>","Form1","your-subject","([^"]*)","2",NULL
and replace with
$2 - $3
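I changed the content group to ([^"]*) so that it matches only the content of the quoted field, and moved the * inside the group (with the * outside, the group captures only the last character).
I also modified the regex to verify that the ID is the same on both lines, using the named group and its backreference. Working sample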
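If it helps to test the idea outside the editor, here is a rough Python sketch of the same technique. Python spells the named group and backreference as (?P<id>...) and (?P=id); the group names email and subject are my own additions. The sample uses two good pairs and the first "bad" pair from the question.

```python
import re

data = (
    '"1381733868.4487","Form1","your-email","example2#gmail.com","1",NULL\n'
    '"1381733868.4487","Form1","your-subject","foo2","2",NULL\n'
    '"1387658626.6811","Form1","your-email","example1#gmail.com","1",NULL\n'
    '"1381124126.1211","Form1","your-subject","foo1","2",NULL\n'
    '"1381734307.5494","Form1","your-email","example3#gmail.com","1",NULL\n'
    '"1381734307.5494","Form1","your-subject","foo3","2",NULL\n'
)

# The backreference (?P=id) forces the second line to carry the same ID
# as the first, so mismatched ("bad") pairs are skipped.
pattern = re.compile(
    r'"(?P<id>\d+\.\d+)","Form1","your-email","(?P<email>[^"]*)","1",NULL\r?\n'
    r'"(?P=id)","Form1","your-subject","(?P<subject>[^"]*)","2",NULL'
)

pairs = [f"{m['email']} - {m['subject']}" for m in pattern.finditer(data)]
print("\n".join(pairs))
```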
This might be a silly question but I can't seem to overcome this myself -
I have a field with strings, which sometimes end with 3 numbers separated by dots, for example
- 2353535.123213.124
- data.2354234.1324.1314
- data.old-24234.2341.4325
and sometimes not
- aaaa.53535
- data.old-3521
- data.AFG34fsaf34
Whenever the first case occurs, I need to extract the 3-numbers pattern from the end of the string. Meaning:
- 2353535.123213.124 -> 2353535.123213.124
- data.2354234.1324.1314 -> 2354234.1324.1314
- data.old-24234.2341.4325 -> 24234.2341.4325
- aaaa.53535 -> Do nothing
Is that possible?
If not through HiveQL (although that is preferable), even a Java regular expression would be helpful (to use in a custom UDF).
\\d+(?:\\.\\d+){2}$
You can use this Java regular expression (the backslashes are doubled as in a Java string literal). See demo
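To see what the expression does on the sample values, here is a small Python sketch (Python rather than Java, so the backslashes are not doubled). The helper name extract_triple is made up for illustration.

```python
import re

# Three dot-separated runs of digits anchored at the end of the string.
TRIPLE = re.compile(r"\d+(?:\.\d+){2}$")

def extract_triple(value):
    """Return the trailing number.number.number part, or None if absent."""
    match = TRIPLE.search(value)
    return match.group(0) if match else None

for sample in ["2353535.123213.124", "data.old-24234.2341.4325", "aaaa.53535"]:
    print(sample, "->", extract_triple(sample))
```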
I am using Talend Data Integration.
I have a log file like this:
I - Fab - 392 - 2014/12/20 22:09:15:200 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Begin :
I - Fab - 392 - 2014/12/20 22:12:15:438 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Bus / Before :
500|00104|002PL|0036364043 |005PL
809|001BBG|00365 |005-0200|006+0000|007000|0080000|0240|0250|0260|0270|0280|0290|033STK|034063100 |0441
830|0093100 |0441
I - Fab - 392 - 2014/12/20 22:12:19:766 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Bus / After :
500|00104|002PL|0036364043 |005PL
510|001BBG|00365 |005-0200|006+0000|007000|0080000|0240|0250|0260|0270|0280|0290|033STK|034063100 |0441
I want to extract lines 2 & 3 and 6 & 7 (the line pairs do not always fall on even and odd line numbers). Anyway, I used a regular expression:
"I - (Fab|Opt) - \\d+ - (\\d{4}/\\d{2}/\\d{2}) (\\d{2}:\\d{2}:\\d{2}:\\d{3}) - .+ Bus / (.+) : \\n500|.+|003(\\d{7}).+"
in a tFileInputRegex component; however, I don't know what to use as the row separator (the default is "\n").
I want my output to be a CSV file in which there are data extracted from the first and second lines.
I used a tMap to generate a CSV file, but the problem is I cannot extract the data I want.
If I extract the data I want I will be able to generate the file. So, I need help in the regex part. I wonder if there's a way in Talend DI to extract multiple rows (in my case TWO) using tFileInputRegex.
EDIT :
I have specified I - as the row separator so that I can use \n inside the regex (without any confusion), but the regex still doesn't seem to work.
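The \n delimiter for multi-line rows should work, so this is more an issue with your overall regex. In particular, an unescaped | is alternation, so 500|.+|003... splits the pattern into separate alternatives instead of matching literal pipe characters. Try a pattern such as this one, which should capture the groups correctly: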
I.+(Fab|Opt).+(\\d{4}\\/\\d{2}\\/\\d{2}).+(\\d{2}:\\d{2}:\\d{2}:\\d{3}).+Bus\\s\\/\\s(\\w+)\\s:\\W+\\n500.+003(\\d{7}).+
Example:
https://www.regex101.com/r/nL2xT7/
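Outside Talend, the same two-line match can be checked with a short Python sketch. This is an adaptation under assumptions, not a tFileInputRegex configuration: the pattern is a simplified variant of the ones above with the | characters escaped, and the XXXX prefix and the log itself are shortened from the sample.

```python
import re

log = "\n".join([
    "I - Fab - 392 - 2014/12/20 22:09:15:200 - XXXX Begin :",
    "I - Fab - 392 - 2014/12/20 22:12:15:438 - XXXX Bus / Before :",
    "500|00104|002PL|0036364043 |005PL",
    "830|0093100 |0441",
    "I - Fab - 392 - 2014/12/20 22:12:19:766 - XXXX Bus / After :",
    "500|00104|002PL|0036364043 |005PL",
])

# Header line, a newline, then the "500" line. The pipes are escaped so
# they match literally instead of acting as alternation.
pattern = re.compile(
    r"I - (Fab|Opt) - \d+ - (\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}:\d{3})"
    r" - .+ Bus / (\w+) :[ \t]*\n"
    r"500\|.+?\|003(\d{7})"
)

rows = pattern.findall(log)
for row in rows:
    print(",".join(row))  # one CSV line per header + "500" pair
```

The "Begin" line is skipped because it has no "Bus /" part, and each remaining tuple holds the fields taken from both lines of a pair.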
I need to get all - characters between ###.
Input string: ### qwerty-qwerty-qwerty-qwerty - - - ###
(?<=###)\s?([\-]*)\s?(?=###)
Thanks in advance.
http://regex101.com/r/jL9lZ9/1
You could try the regex below to match the - symbols that lie between the ### markers:
(?:^(?:(?!###).)*(?=###.*?###)|(?<=###)(?:(?!###).)*$)(*SKIP)(*F)|-
DEMO
(?!.*?###.*?###.*?)(?=.*?###)-
This works as well.
See Demo:
http://regex101.com/r/jL9lZ9/4
This one is a two-step approach, slightly different but easier to understand. You can use it when you are trying to extract one pattern (-) from between occurrences of another pattern (###).
text="- qwe--### -- - qwerty- --## -qwerty --- ###- qwerty-- qw- - ### - rty--"
Note that there is a double hash (##) here; we assume that we only want the -s between triple hashes (###).
Step 1: use this to extract the required text: (?<=###)[^#]*(?=###)
Step 2: after that, simply match - within the extracted text.
You can replace the boundary patterns and search patterns as required.
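The two steps can be sketched in Python like this, using the sample text from this answer (the variable names segments and dashes are only illustrative):

```python
import re

text = "- qwe--### -- - qwerty- --## -qwerty --- ###- qwerty-- qw- - ### - rty--"

# Step 1: text that sits directly between two ### markers and contains no #.
segments = re.findall(r"(?<=###)[^#]*(?=###)", text)

# Step 2: collect the dashes inside each extracted segment.
dashes = [d for segment in segments for d in re.findall(r"-", segment)]

print(segments)
print(len(dashes))
```

Because [^#]* cannot cross a #, the stretch that contains the double hash is not returned as a segment; only the text between the second and third ### is.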
I have a CSV file that looks like so:
productSku-1,attribute1,2,3
productSku-1,attribute4,5
productSku-1,attribute4,5
productSku-2,attribute1,2,3
productSku-2,attribute4,5
productSku-3,attribute1,1
I am trying to “collapse” the same product attributes into one line, while getting rid of the extra instances of productSku. So, I match product to the next line and then remove the next productSku lines as well as the line break to compress it into a single line. In the above example, the result should look like so:
productSku-1,attribute1,2,3,attribute4,5,attribute4,5
productSku-2,attribute1,2,3,attribute4,5
productSku-3,attribute1,1
I thought the following substitution command would work, but I have never used the \% sign.
:%s/(^[A-Za-z0-9-]+)(.)((\%(\n\1)(.))+)/\1\2\3
I thought it would exclude the match from match \3… but it isn’t working.
One way to solve this problem is like this:
:sort
qqjkdt,:g/<C-r>"/norm dt,-gJ<Enter>0P+q
10000@q
Step by step
:sort - sort all of the lines by productSku
qq - start a recording, see :help qq
jk - this is a little macro hack... if we get to the last line it will throw an error and the rest of the command won't execute.
dt, - delete the productSku
:g/ - start a :global command, see :help :global
<C-r>" - insert the contents of the " register (the productSku we deleted)
norm - short for :help :normal
dt, - delete the productSku since we don't want them
- - go to the previous line
gJ - join the lines without any spaces
<Enter> - finish the :global command
0P - put the productSku back at the beginning of the line
+ - move down one line
q - finish the recording
10000@q - repeat the recording 10k times (or more if you need)
I would use the following efficient command.
:g/\%(^\1,.*\n\)\@<=\([^,]*\)/s/$/,/|-j!
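If doing this outside Vim is an option, the collapse can also be sketched in a few lines of Python. This is an alternative under assumptions, not a translation of either Vim command: it assumes the SKU is everything before the first comma, and the function name collapse is hypothetical.

```python
def collapse(lines):
    """Merge the attributes of lines that share the same leading SKU."""
    grouped = {}  # dicts keep insertion order, so SKUs stay in file order
    for line in lines:
        sku, _, attributes = line.partition(",")
        grouped.setdefault(sku, []).append(attributes)
    return [",".join([sku] + attrs) for sku, attrs in grouped.items()]


rows = [
    "productSku-1,attribute1,2,3",
    "productSku-1,attribute4,5",
    "productSku-1,attribute4,5",
    "productSku-2,attribute1,2,3",
    "productSku-2,attribute4,5",
    "productSku-3,attribute1,1",
]
collapsed = collapse(rows)
print("\n".join(collapsed))
```

Unlike the :sort based approach, this does not need the lines for one SKU to be adjacent.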