Please help me with this. I have a file with the following pattern:
ABC x
bla bla bla
bla bla
bla
XYZ
ABC y
bla bla bla
bla bla
bla bla bla
XYZ
ABC z
bla bla bla
XYZ
I need the output in file x.txt:
ABC x
bla bla bla
bla bla
bla
XYZ
and
ABC y
bla bla bla
bla bla
bla bla bla
XYZ
in y.txt, and so on for the rest of the blocks.
Something like this:
awk '/^ABC/ {close(f".txt"); f=$2} {print > (f".txt")}' file
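This tests whether the line starts with ABC and, if so, sets the output file name to the value in $2. Every line is then printed to the current file. The parentheses around f".txt" make the redirection target unambiguous across awk implementations.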
Example:
cat y.txt
ABC y
bla bla bla
bla bla
bla bla bla
XYZ
EDIT: added close() so that each output file is closed before the next one is opened; otherwise awk can run out of open file descriptors when it creates many files.
Well, put all the input into one array and walk through it with a regex until you find "ABC something". Open something.txt for output, keep printing lines from the array to that output until you reach XYZ, then move on to the next block.
I assume you know how to write Perl.
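The array-walking approach above can be sketched in Python. This is a minimal sketch, not the answerer's code. The function name split_blocks and its out_dir parameter are hypothetical. It assumes headers look like "ABC name" at the start of a line and that a block ends at the first line containing XYZ. Lines between an XYZ and the next ABC header are skipped.

```python
import re
from pathlib import Path

# Assumed header format: "ABC <name>" at the start of a line.
HEADER = re.compile(r"^ABC (\w+)")


def split_blocks(lines, out_dir="."):
    """Write each "ABC name" ... "XYZ" block to name.txt in out_dir.

    Returns the list of file names written, in order.
    """
    written = []
    out = None  # currently open output file, or None between blocks
    for line in lines:
        if out is None:
            m = HEADER.match(line)
            if m:
                name = m.group(1) + ".txt"
                out = open(Path(out_dir) / name, "w")
                written.append(name)
        if out is not None:
            out.write(line + "\n")
            if "XYZ" in line:  # end of the current block
                out.close()
                out = None
    if out is not None:  # input ended without a closing XYZ
        out.close()
    return written
```

To process a real file, read it into a list first, for example split_blocks(Path("file").read_text().splitlines()), where "file" stands in for your input file name.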
How about:
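use strict;
use warnings;

my $fh;    # handle for the current output file
open my $fh_in, '<', 'input_file' or die "unable to open 'input_file': $!";
while (<$fh_in>) {
    chomp;
    # The flip-flop (..) is true from an "ABC name" line through the next XYZ line
    if (/^ABC \w+/ .. /XYZ/) {
        if (/^ABC (\w+)/) {
            close $fh if $fh;    # finish the previous block's file
            open $fh, '>', "$1.txt" or die "unable to open '$1.txt': $!";
        }
        print $fh $_, "\n";
    }
}
close $fh if $fh;
close $fh_in;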
You can do it using sed as shown below:
sed -n '/ABC x/,/XYZ/p' file > x.txt
If you know all the patterns, you can use a loop:
for i in x y z
do
sed -n "/ABC $i/,/XYZ/p" file > "$i".txt
done
awk '/ABC/{name="file "$2".txt";}{print >name}' your_file
Related
It is easier to explain with an example. I have a LaTeX source file (an ordinary text file) containing many $ characters that enclose inline equations, something like this:
bla bla bla $E = mc^2$ bla blah
I would like to replace each occurrence of a matching pair of $ characters in the file with \( ... \), like this:
bla bla bla \(E = mc^2\) bla blah
Any idea how to do this as simply as possible? I am not sure grep can handle this.
Assume that the file has an even number of occurrences of $. In that case, all we have to do is replace the $ at odd positions with \( and the $ at even positions with \).
Like this?
spacewrench$ cat foo
bla bla bla $E = mc^2$ bla blah
spacewrench$ sed -e 's/\$\(.*\)\$/\\(\1\\)/g' < foo
bla bla bla \(E = mc^2\) bla blah
sed can do it. You may need to adjust the number of backslashes, and you will need extra handling for line endings if an expression spans multiple lines.
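The .* expression is greedy, so on a line with several $...$ pairs it puts a single pair of parentheses around everything from the first $ to the last one. You can fix that by replacing .* with [^$]*. (Write [^$]* rather than [^\$]*: inside a sed bracket expression the backslash is literal, so [^\$]* would also refuse to match the backslashes in equations such as $\alpha$.)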
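The odd/even rule from the question can also be applied directly, without a regex. Here is a minimal Python sketch, assuming every $ in the file delimits inline math (no escaped \$ and no $$ display math); both function names are hypothetical. The second function applies the negated-class fix suggested in the answer, using Python's re.sub instead of sed.

```python
import re


def dollars_to_parens(text):
    # Toggle between opening and closing: odd-numbered $ -> \(, even-numbered $ -> \)
    out = []
    opening = True
    for ch in text:
        if ch == "$":
            out.append("\\(" if opening else "\\)")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


# Regex alternative: [^$]* cannot run past a closing $, so each pair is handled
# separately. In Python a negated class also matches newlines.
PAIR = re.compile(r"\$([^$]*)\$")


def dollars_to_parens_re(text):
    return PAIR.sub(r"\\(\1\\)", text)
```

Both versions work on the whole file contents at once, so an equation that spans a line break is converted as well, which is the case the sed one-liner struggles with.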
Thanks in advance for any help you can provide
I have text like this:
Bla bla bla bl[CR][LF]
a bla bla bla[CR][LF]
bla bla.[CR][LF]
Bla bla bla bla bl[CR][LF]
...and so on
I'd like to replace every line break with a space, except where the line ends with a dot.
This is what I want to get:
Bla bla bla bla bla bla bla bla bla.[CR][LF]
Bla bla bla bla bla.[CR][LF]
...and so on
I tried Notepad++, which supports regex, using Search & Replace (Ctrl+H). This is what I used:
Search: [^.\r\n]\r\n
The Replace field contained a single space.
It worked, but it removes the last character of every joined line:
Bla bla bla bla bla bla bla bla bl.[CR][LF]
Bla bla bla bla bl.[CR][LF]
As I am a regex novice, what is the best way to do this?
Use this regex in the search field:
(?<!\.)\r\n
It matches any \r\n that isn't preceded by a period.
Your regex matches three characters: the first is anything other than ., \r or \n, and the last two are \r\n. When you replace, that first character is replaced too. The regex I posted checks for the non-period with a zero-width lookbehind, so that character is not part of the match and is not replaced.
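Python's re module also supports fixed-width lookbehind, so the difference between the two patterns can be checked outside Notepad++. This is a minimal sketch: the function name join_lines is hypothetical, and the question's [CR][LF] endings are written as \r\n.

```python
import re

# Zero-width lookbehind: match a CRLF only when the character before it is not a period.
# That character is inspected but not consumed, so it survives the replacement.
JOIN_BREAK = re.compile(r"(?<!\.)\r\n")


def join_lines(text):
    return JOIN_BREAK.sub(" ", text)


sample = "Bla bla bla bl\r\na bla bla bla\r\nbla bla.\r\nBla bla bla bla bl\r\n"
print(repr(join_lines(sample)))
```

For comparison, re.sub(r"[^.\r\n]\r\n", " ", text) reproduces the truncation from the question, because the character before the line break is part of the match and is replaced along with it.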
Data
Bla bla usr/bin/rcp bla bla
Bla bla usr/bin/awl bla bla
Bla bla usr/bin/cp bla bla
Bla bla usr/bin/ftp bla bla
Bla bla usr/bin/cut bla bla
Ignore list
cp
ftp
rcp
Problem
I need a regular expression (Java-style) that processes the data lines (there will be many more) and reports usr/bin/ as a match, but only if it is not followed by a word on the ignore list.
usr\/bin\/(?!cp|ftp|rcp)
You need a negative lookahead here. Try this regex:
usr/bin/(?!(cp|ftp|rcp))
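Here is a small Python sketch of the same negative lookahead, run against the data lines from the question. The names IGNORED, PATTERN and flagged are hypothetical. The \b after the alternation is my own addition and does not appear in the answers above. It makes the check whole-word: without it, (?!cp|ftp|rcp) would also reject a command such as cpio that merely starts with cp. java.util.regex accepts the same pattern, but in a Java string literal the backslash must be doubled ("\\b").

```python
import re

IGNORED = ["cp", "ftp", "rcp"]

# Negative lookahead: "usr/bin/" matches only if the following word is not in IGNORED.
# The trailing \b makes the comparison whole-word ("cpio" is not treated as "cp").
PATTERN = re.compile(r"usr/bin/(?!(?:" + "|".join(map(re.escape, IGNORED)) + r")\b)")


def flagged(lines):
    """Return the lines where usr/bin/ is followed by a command not on the ignore list."""
    return [line for line in lines if PATTERN.search(line)]


data = [
    "Bla bla usr/bin/rcp bla bla",
    "Bla bla usr/bin/awl bla bla",
    "Bla bla usr/bin/cp bla bla",
    "Bla bla usr/bin/ftp bla bla",
    "Bla bla usr/bin/cut bla bla",
]
print(flagged(data))
```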
I've been using grep -e and sort -nr on the bash command line to filter and analyze lines from a bunch of "data" files. So far I have come up with an output file like this:
25 The X value is: bla bla bla done
19 The X value is: foo done
19 The X value is: bar done
19 The X value is: bbb done
19 The X value is: xxx yyy zzz done
where you can see the frequency and the "data" part I am interested in.
I can't find a regex for grep that "cleans" those lines. I can match the "data" lines with a regex like is:.*done (I know this pattern is unique in the files I am analyzing), but how can I clean those lines so that only the text between "is:" and "done" is extracted?
Try sed instead:
$ sed -r 's/^.*: (.*) done$/\1/' outputfile.txt
bla bla bla
foo
bar
bbb
xxx yyy zzz
If you wanted to return:
bla bla bla
foo
bar
bbb
xxx yyy zzz
you can use
(?<=: )(.*)(?= done)
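Here is a variant of the lookaround pattern, sketched in Python. It is anchored on "is: " and " done", so the surrounding spaces are not part of the match. The names VALUE and extract_values are hypothetical.

```python
import re

# The lookbehind and lookahead must match, but they are not included in the result.
VALUE = re.compile(r"(?<=is: ).*(?= done)")


def extract_values(lines):
    values = []
    for line in lines:
        m = VALUE.search(line)
        if m:
            values.append(m.group(0))
    return values
```

With GNU grep built with PCRE support, the same pattern can be used directly, which answers the original grep question: grep -oP '(?<=is: ).*(?= done)' outputfile.txt. Here -P selects Perl-compatible regexes and -o prints only the matched part of each line.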
I have text like this:
11 bla gulp bla 22
11 bla bla bla 2211 bla
ble
bli 22
I need a regex that finds the text between each pair of "11" and "22", but only when that text does NOT contain "gulp".
If I search for (?s)11.*?22 in TextCrawler, I find all three strings:
bla gulp bla
bla bla bla
bla ble bli
Wrong! I'd like to obtain only:
bla bla bla
bla ble bli
because "bla gulp bla" contains "gulp", and I don't want it!
Any idea? :-)
Use a negative lookahead assertion:
11(?!.*?gulp.*?)(.*?)22
Word boundaries might be a good idea in the middle (around gulp), because they distinguish gulp from gulping, gulped or ungulp(?):
11(?!.*?\bgulp\b.*?)(.*?)22
but putting them around everything:
\b11\b(?!.*?\bgulp\b.*?)(.*?)\b22\b
would exclude your other two results (in "2211" there is no word boundary between 22 and 11), which is not what you want.
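One caveat: the lookahead in 11(?!.*?gulp.*?)(.*?)22 scans forward to the end of the input (or of the line), not just up to the next 22. A pair is therefore rejected whenever gulp appears anywhere after it, for example inside a later pair. It happens to give the right result for the sample text. A different technique, usually called a tempered token, checks each character between 11 and 22 instead. The Python sketch below uses it; the names are hypothetical, and I am assuming TextCrawler's regex engine handles the same lookahead syntax.

```python
import re

# Tempered token: every character between 11 and 22 is checked with (?!gulp),
# so the match cannot run past a "gulp". (?s) lets "." cross line breaks,
# as in the question's TextCrawler search.
PAIR_WITHOUT_GULP = re.compile(r"(?s)11((?:(?!gulp).)*?)22")


def pairs_without_gulp(text):
    return [m.group(1) for m in PAIR_WITHOUT_GULP.finditer(text)]


text = "11 bla gulp bla 22\n11 bla bla bla 2211 bla\nble\nbli 22"
print(pairs_without_gulp(text))
```

To also reject gulping or gulped, replace (?!gulp) with (?!\bgulp\b), as suggested above.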