Combine neighbouring lines - regex

I need to combine this:
"1381733226.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381733226.6811","Form1","your-subject","foo1","2",NULL
"1381733868.4487","Form1","your-email","example2#gmail.com","1",NULL
"1381733868.4487","Form1","your-subject","foo2","2",NULL
"1381734307.5494","Form1","your-email","example3#gmail.com","1",NULL
"1381734307.5494","Form1","your-subject","foo3","2",NULL
"1381735753.0189","Form1","your-email","example4#gmail.com","1",NULL
"1381735753.0189","Form1","your-subject","foo4","2",NULL
into this:
example1#gmail.com - foo1
example2#gmail.com - foo2
example3#gmail.com - foo3
example4#gmail.com - foo4
Some of the lines are "bad" and should be skipped. For example:
"1387658626.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381124126.1211","Form1","your-subject","foo1","2",NULL
or:
"1381733226.6811","Form1","your-email","example1#gmail.com","1",NULL
"1381733226.6811","Form1","your-email","foo1","2",NULL
I already tried to replace this:
"\d+?\.\d+?","Form1","your-email","([^\r\n])*","1",NULL\r?\n"\d+?\.\d+?","Form1","your-subject","([^\r\n])*","2",NULL)
to this:
$1 - $2
But it's not working :/. Do you have any ideas?

You can use this regex, which is a corrected form of yours:
"(?<id>\d+?\.\d+?)","Form1","your-email","([^"]*)","1",NULL\r?\n"\k<id>","Form1","your-subject","([^"]*)","2",NULL
and replace with
$2 - $3
I changed the content groups to ([^"]*) so that they match only the quoted value, and moved the * inside the group so that the whole value is captured rather than just its last character.
I also changed the regex to verify that the ID is the same in both lines, using the named group (?<id>...) and the backreference \k<id>.
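For anyone doing this from a script rather than an editor, here is a minimal sketch of the same idea in Python. It is not part of the original answer: the function name combine and the group names email and subject are made up for illustration, and Python spells the named group (?P<id>...) and its backreference (?P=id).

```python
import re

# Same idea as the regex above: the second line must repeat the ID of the first.
# Group names "email" and "subject" are illustrative, not from the original answer.
PAIR = re.compile(
    r'"(?P<id>\d+\.\d+)","Form1","your-email","(?P<email>[^"]*)","1",NULL\r?\n'
    r'"(?P=id)","Form1","your-subject","(?P<subject>[^"]*)","2",NULL'
)

def combine(text):
    """Return 'email - subject' for every well-formed pair of neighbouring lines."""
    return [f"{m['email']} - {m['subject']}" for m in PAIR.finditer(text)]

if __name__ == "__main__":
    sample = (
        '"1381733226.6811","Form1","your-email","example1#gmail.com","1",NULL\n'
        '"1381733226.6811","Form1","your-subject","foo1","2",NULL\n'
        '"1387658626.6811","Form1","your-email","bad#gmail.com","1",NULL\n'
        '"1381124126.1211","Form1","your-subject","bad","2",NULL\n'
    )
    for line in combine(sample):
        print(line)
```

Pairs whose IDs differ, or whose second line is not a your-subject line, simply do not match and are skipped.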

Related

Mass regex search-and-replace BETWEEN patterns

I have a directory with a bunch of text files, all of which follow this structure:
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
- Again, some list items of random text
- Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
And I need to run a replace operation (let's say, I need to prepend CCC at the beginning of the line, just after the dash) on only those "list items", which are between PATTERN_A and PATTERN_B. The problem is they aren't really much different from the text above PATTERN_A, or below PATTERN_B, so an ordinary regex can't really catch them without also affecting the remaining text.
So, my question would be, what tool and what regex should I use to perform that replacement?
(Just in case, I'm fine with Vim, and I can collect those files in a QuickFix for a further :cdo, for example. I'm not that good with awk, unfortunately, and absolutely bad with Perl :))
Thanks!
If I have understood your question, you can do this quite easily with sed (the stream editor), combining a pattern-range address with an ordinary substitution. For example, in your case:
$ sed '/PATTERN_A/,/PATTERN_B/s/^\([ ]*-\)/\1CCC/' file
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
(note: to substitute in place within the file add the -i option, and to create a backup of the original add -i.bak which will save the original file as file.bak)
Explanation
/PATTERN_A/,/PATTERN_B/ - select lines between PATTERN_A and PATTERN_B
s/^\([ ]*-\)/\1CCC/ - substitute (general form 's/find/replace/'). The find part matches from the beginning of the line (^) and captures, between \(...\), any number of spaces followed by a hyphen ([ ]*-). The replace part emits \1 (a backreference holding the text captured by \(...\)) followed by CCC.
Look things over and let me know if you have questions or if I misinterpreted your question.
Perl can produce the same result, using the range operator (..) to limit the substitution to the lines from PATTERN_A through PATTERN_B:
> perl -pe 's/^(\s*-)/${1}CCC/ if /PATTERN_A/../PATTERN_B/' mass_replace.txt
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
>
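If neither sed nor Perl is an option, the range logic in the answers above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name prepend_in_range and its parameters are invented here, and the markers are matched as plain substrings rather than regexes.

```python
import re

def prepend_in_range(lines, start="PATTERN_A", end="PATTERN_B", tag="CCC"):
    """Insert `tag` after the leading dash, only from the start marker line
    through the end marker line (inclusive, like a sed range)."""
    def mark(line):
        # Keep the captured "spaces + dash" and append the tag right after it.
        return re.sub(r"^( *-)", lambda m: m.group(1) + tag, line)

    out = []
    inside = False
    for line in lines:
        if inside:
            out.append(mark(line))
            if end in line:
                inside = False      # range closes after the end marker line
        elif start in line:
            inside = True           # range opens on the start marker line
            out.append(mark(line))
        else:
            out.append(line)
    return out

if __name__ == "__main__":
    text = ["- before", "PATTERN_A", "- inside", "PATTERN_B", "- after"]
    print("\n".join(prepend_in_range(text)))
```

The marker lines themselves pass through the substitution as well, but since they do not start with a dash they come out unchanged.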

How to get negative lookahead in regex to accept more words

I am trying to get some data for Splunk.
From this:
this my line - Fine (R/S)
more date - I like this (not)
date - output (yes)
I would like to get all data from the - to the end of the line, but without the trailing parentheses if they contain not or yes, so group 1 should be:
Fine (R/S)
I like this
output
I have tried something like this:
- (.+) (?!(not|yes))
But this gives:
Fine
I like this
output
Or This:
- (.+)(?!not)
Gives:
Fine (R/S)
I like this (not)
output (yes)
You may try this,
- ((?:(?!\((?:not|yes)\)).)*)(?=\s|$)
or
- (.*?)(?=\s+\((?:not|yes)\)|$)
This captures all the characters up to, but not including, whitespace followed by (not) or (yes), or up to the end of the line if neither is present.
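To check both patterns against the sample lines outside Splunk, here is a small Python sketch. It assumes multiline mode (re.M) so that $ matches at each line end; the names TEMPERED, LAZY and group1 are made up for this example.

```python
import re

SAMPLE = """this my line - Fine (R/S)
more date - I like this (not)
date - output (yes)"""

# First suggestion: consume characters one at a time, refusing to step onto "(not)" or "(yes)".
TEMPERED = re.compile(r"- ((?:(?!\((?:not|yes)\)).)*)(?=\s|$)", re.M)

# Second suggestion: lazy match that stops before " (not)" / " (yes)" or at end of line.
LAZY = re.compile(r"- (.*?)(?=\s+\((?:not|yes)\)|$)", re.M)

def group1(pattern, text):
    """Return what group 1 captures on each matching line."""
    return pattern.findall(text)

if __name__ == "__main__":
    print(group1(TEMPERED, SAMPLE))
    print(group1(LAZY, SAMPLE))
```

Both patterns keep (R/S) because only the literal words not and yes are excluded.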

VB.NET: Get strings from lines between words

I am trying to find a way to grab the strings that are included between two words but I cannot figure out how to do it. I need each line to be added to a listbox.
For example:
First:
http://google.com
http://yahoo.com
default
Second:
http://facebook.com
http://123.com
http://test.com
default
Using this as an example, the first listbox needs to include the following items:
http://google.com
http://yahoo.com
default
And the second listbox should include those items:
http://facebook.com
http://123.com
http://test.com
default
How is this possible? I only know how to get a string between two words using split but it doesn't work in this case.
Thanks in advance.
Based off your data, you may consider using a Negative Lookahead to match the lines you want only.
' Requires: Imports System.Text.RegularExpressions
For Each m As Match In Regex.Matches(input, "(?m)^(?!(?:First|Second):).+$")
    ListBox1.Items.Add(m.Value)
Next
I think you want something like this,
(?<=\n|^)First:(?:(?!\n\n).)*?(http://google\.com)(?:(?!\n\n|$).)*?(http://yahoo\.com)(?:(?!\n\n).)*?default(?=\n\n)|(?<=\n|^)Second:(?:(?!\n\n).)*?(http://facebook\.com)(?:(?!\n\n).)*?(http://123\.com)(?:(?!\n\n).)*?(http://test\.com)(?:(?!\n\n).)*?default(?=\n\n|$)
How about something like this:
(?<=First:)(.*)
With this code:
Dim options = RegexOptions.Singleline
Dim sampleInput="First:" + Environment.NewLine + "http://google.com" + Environment.NewLine + "http://yahoo.com" + Environment.NewLine + "default"
Dim results = Regex.Match(sampleInput,"(?<=First:)(.*)",options).Value
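None of the snippets above shows how the lines end up in two separate lists, so here is a language-neutral sketch of that step, written in Python rather than VB.NET. It assumes a header is any line consisting of one word followed by a colon; the function name split_sections is hypothetical. Each resulting list would then be added to its own listbox.

```python
import re

def split_sections(text):
    """Group non-empty lines under the most recent 'Name:' header line."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(\w+):", line)   # e.g. "First:"; URLs do not match
        if header:
            current = header.group(1)
            sections[current] = []
        elif line and current is not None:
            sections[current].append(line)
    return sections

if __name__ == "__main__":
    sample = (
        "First:\nhttp://google.com\nhttp://yahoo.com\ndefault\n\n"
        "Second:\nhttp://facebook.com\nhttp://123.com\nhttp://test.com\ndefault\n"
    )
    for name, items in split_sections(sample).items():
        print(name, items)
```

Using re.fullmatch on the whole line is what keeps http://google.com from being mistaken for a header, even though it contains a colon.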

Using OpenOffice sCalc - How to use IF function with REGEX capture and if true print capture to cell

I have a worksheet (OpenOffice sCalc) with many rows of data. MOST of them have a year enclosed in ().
One of the cells has this content: Mary had a little lamb, Sarah Josepha Hale (1830)
I would like to capture the year and save it in the cell to the right.
This formula will tell me if a year is present:
=IF(COUNTIF(L115; ".*[(][0-9]{4,4}[)].*");"hooray"; "boo")
When I try to replace "hooray" with $1 in this formula, I get an error:
=IF(COUNTIF(L115; ".*([(][0-9]{4,4}[)]).*");$1; "boo")
I get this: #REF!
What is the correct syntax? Thank you in advance!
Regex capturing is possible in Search/replace (must be enabled under "More Options"), but I don't know if you can use capturing in formulae.
An alternative way:
=VALUE(MID(L115;FIND("(";L115)+1;4))
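Note that the MID/FIND formula takes the four characters after the first "(" in the cell, so it relies on the year being the first parenthesised text. If the data can be exported and processed outside Calc, the capture the question was after looks roughly like this in Python. This is only a sketch; extract_year is a made-up helper name.

```python
import re

# Four digits enclosed in parentheses; group 1 is the year itself.
YEAR = re.compile(r"\((\d{4})\)")

def extract_year(cell):
    """Return the first parenthesised 4-digit year as an int, or None if absent."""
    m = YEAR.search(cell)
    return int(m.group(1)) if m else None

if __name__ == "__main__":
    print(extract_year("Mary had a little lamb, Sarah Josepha Hale (1830)"))
    print(extract_year("A row without a year"))
```

Because the pattern requires exactly four digits between the parentheses, other parenthesised text earlier in the cell is skipped.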

Replace multiple words in pig

I am new to Pig. In the script that I am writing I want to perform an operation similar to this:
foreach X GENERATE REPLACE(word,'.*abc.*','abc') OR REPLACE(word,'.*def.*','def').
If the first pattern matches then abc is replaced else if second pattern is matched then def is replaced. But I suppose the syntax is incorrect. Can someone help me with the syntax?
There are a few ways to do this. REPLACE returns the string unchanged when the regex doesn't match, so nesting the two calls is pretty compact:
Y = FOREACH X GENERATE REPLACE(REPLACE(word, '.*abc.*', 'abc'), '.*def.*', 'def');
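To see what the nested call does for each kind of input, here is the same logic sketched in Python with re.sub standing in for Pig's REPLACE. The function name normalize is made up, and the comparison to Pig is an approximation: both replace every regex match and leave non-matching strings untouched.

```python
import re

def normalize(word):
    """Apply the abc rule first, then the def rule to its result."""
    after_abc = re.sub(r".*abc.*", "abc", word)   # unchanged if "abc" is absent
    return re.sub(r".*def.*", "def", after_abc)   # unchanged if "def" is absent

if __name__ == "__main__":
    for w in ["xxabcxx", "xxdefxx", "neither", "abc and def"]:
        print(w, "->", normalize(w))
```

When a word contains both abc and def, the inner replacement collapses it to abc first, so the outer pattern no longer finds def. That gives the "first pattern wins" behaviour the question asks for.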