Bash: cutting a delimited fragment of each string [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a file containing lines that look like this:
GTTCAGAGTTCTACAGTCCGACGATCGGATGAGNNNNNN
GTTCAGAGTTCTACAGTCCGACGATCTCCGAGTNNNNNN
GTTCAGAGTTCTACAGTCCGACGATCCTTATATNNNNNN
GTTCAGAGTTCTACAGTCCGACGATCGAAGTGCNNNNNN
GTTCAGAGTTCTACAGTCCGACGATCAAGTTTTNNNNNN
GTTCAGAGTTCTACAGTCCGACGATCCGACGAANNNNNN
I want to remove the first 26 and final 6 characters from each line. I haven't been able to write a good regular expression to accomplish that using vi, but I'm not sure what else to do.
Any suggestions?
Thanks!

Try with grep.
This keeps the last 13 characters of each line and then the first 7 of those, printing only the matched text (-o) and using Perl-compatible regular expressions (-P):
grep -oP ".{13}$" foo.txt | grep -oP ".{7}"

If your file name is foo you can use cut to grab out the range of chars you want:
$ cut -c27-33 foo
This produces:
GGATGAG
TCCGAGT
CTTATAT
GAAGTGC
AAGTTTT
CGACGAA

cut can take a character range if the lines have a fixed length (each appears to be 39 characters):
cut -c27-33 file.txt
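
If the lines were not all the same length, counting from both ends with sed might be more robust than a fixed column range. A minimal sketch, assuming a sed that supports -E; strip_ends is a hypothetical helper name, not something from the answers above:

```shell
# strip_ends (hypothetical helper): remove $1 leading and $2 trailing
# characters from every line read on stdin.
strip_ends() {
  sed -E "s/^.{$1}//; s/.{$2}\$//"
}

# Two of the sample lines from the question.
printf '%s\n' \
  'GTTCAGAGTTCTACAGTCCGACGATCGGATGAGNNNNNN' \
  'GTTCAGAGTTCTACAGTCCGACGATCTCCGAGTNNNNNN' | strip_ends 26 6
```

For the two sample lines this prints GGATGAG and TCCGAGT, the same result as cut -c27-33.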


grep nth string from a very large file in constant time(file size independent)? [closed]

Is there a grep-like (or sed/awk-like) tool on Linux to find the nth occurrence of a string (regex) in a very large file? I would also like to find the number of occurrences of the search string within the file. Keep in mind that the file is really large (> 2 GB).
Grep solution:
grep -on regexp < file.txt
Example test.txt:
one two one
two
one
two two
two one
Occurrences of the regexp one, each prefixed with its line number:
grep -on one < test.txt
1:one
1:one
3:one
5:one
How many occurrences:
grep -on one < test.txt | wc -l
4
Nth matching line (here N = 1; note that -m counts matching lines, not individual occurrences):
grep -m1 one < test.txt | tail -n1
one two one
Update: The solutions no longer use cat. Thanks to @tripleee for the hint.
"I would like to find the number of occurrences of the search string within the file"
If the search string can't contain spaces, the command below might suffice:
awk -v RS=" " '/string/{i++}END{print "string count : " i}' file
How fast it runs depends on the available RAM on the system.
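
Counting individual occurrences and picking out the Nth one can both be done by piping grep -o into another tool. The sketch below is one way to do it under stated assumptions: count_matches and nth_match are hypothetical helper names, and neither runs in constant time, since grep still has to scan the file (the second helper stops reading once the Nth match has been printed).

```shell
# count_matches (hypothetical): number of occurrences of regex $1 in file $2.
count_matches() {
  grep -o -- "$1" "$2" | wc -l
}

# nth_match (hypothetical): print "line:match" for the Nth occurrence.
# $1 = N, $2 = regex, $3 = file. sed quits right after printing line N.
nth_match() {
  grep -on -- "$2" "$3" | sed -n "${1}{p;q;}"
}

# Demo on the sample data from the answer above.
sample=$(mktemp)
printf '%s\n' 'one two one' 'two' 'one' 'two two' 'two one' > "$sample"
count_matches one "$sample"
nth_match 3 one "$sample"
rm -f "$sample"
```

Unlike grep -m, which stops after a number of matching lines, this counts each occurrence separately, so two matches on one line count as two.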

unix change number's value on given line number in shell script for loop [closed]

How can I change a number on a line in a file using a Unix tool like awk or sed?
I want to change line 3 in my example file to each of the numbers 1-10 using a shell script. I think I need a regex to recognize the digit, but I'm not sure how to do this or how to allow multiple digits (like 10).
Example file:
/examples/are/hard so/hard/1
Shell script so far:
for i in {1..3};
do
sed 's|/examples/are/hard so/hard/7 | /examples/are/hard so/hard/'"$i" ex_file
cat ex_file
done
Desired output:
/examples/are/hard so/hard/1
/examples/are/hard so/hard/2
/examples/are/hard so/hard/3
What you've run isn't a valid sed command. If you're trying to do a substitution, that's s/search/replace/flags.
I imagine you meant:
sed 's/here\/is the number\/to\/change 3/here\/is the number\/to\/change '"$i"'/' ex_file
Note that we temporarily break out of the single quotes. Inside single quotes, variables aren't interpolated. We switch to double quotes, bring in $i, then return to single quotes to finish the command.
P.S. You also don't have to use / as your delimiter.
sed 's|here/is the number/to/change 3|here/is the number/to/change '"$i"'|' ex_file
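
To avoid hard-coding the old number, and to allow multi-digit values, the pattern can match any run of digits at the end of the line. This is only a sketch: set_trailing_number is a hypothetical helper, the temporary file stands in for ex_file, and the command prints the result to stdout rather than editing the file in place.

```shell
# set_trailing_number (hypothetical): replace the digits at the end of each
# line of file $2 with $1, writing the result to stdout.
set_trailing_number() {
  sed -E 's|[0-9]+$|'"$1"'|' "$2"
}

# Demo file standing in for ex_file from the question.
demo=$(mktemp)
printf '%s\n' '/examples/are/hard so/hard/1' > "$demo"

for i in 1 2 3; do
  set_trailing_number "$i" "$demo"
done
rm -f "$demo"
```

The quoting follows the same idea as above: the sed script stays in single quotes, and only the shell variable is placed in double quotes.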

Filter specific lines from directory tree listing [closed]

I have the following directory listing:
/home/a/b/c/d/5089/294265
/home/a/b/c/d/5089/79783
/home/a/b/c/d/41630
/home/a/b/c/d/41630/293520
/home/a/b/c/d/41630/293520/293520
...
I want to filter only the lines that go 7 directories deep. In this example I would need only the line: /home/a/b/c/d/41630/293520/293520
Please suggest.
Thanks
You could use grep. Saying:
grep -P '(/[^/]*){8}' inputfile
would return
/home/a/b/c/d/41630/293520/293520
Not sure how you are generating this listing, but if you were using find you could control the depth by specifying -mindepth and -maxdepth options.
You can try:
find /home/x/y/z/ -print | awk -F/ 'NF>8'
or you could try
find /home/x/y/z/ -mindepth 7 -print
YourInput | sed 's|/.|&|8;t
d'
This deletes every line with fewer than 8 occurrences of "/" followed by a character.
echo /home/a/b/c/d/*/*/*
should do the trick.
Using awk:
find /home | awk -F/ 'NF==9'
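
The field count can be checked against the sample listing without running find. In this sketch deep_paths is a hypothetical helper; it assumes absolute paths, where the leading "/" produces an empty first field, so a path with n components has n + 1 fields.

```shell
# deep_paths (hypothetical): print only absolute paths on stdin that have
# exactly $1 components. The leading "/" adds one empty field, hence n + 1.
deep_paths() {
  awk -F/ -v n="$1" 'NF == n + 1'
}

# The listing from the question.
printf '%s\n' \
  /home/a/b/c/d/5089/294265 \
  /home/a/b/c/d/5089/79783 \
  /home/a/b/c/d/41630 \
  /home/a/b/c/d/41630/293520 \
  /home/a/b/c/d/41630/293520/293520 | deep_paths 8
```

Only /home/a/b/c/d/41630/293520/293520 has eight components, so it is the only line printed.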

Replace Specific Instance [closed]

I'm completely new to RegEx and could really use some help with my dilemma. I have a large text file of IP addresses and corresponding hosts.
eg.
157.55.33.47 msnbot-157-55-33-47.search.msn.com
157.56.93.62 msnbot-157-56-93-62.search.msn.com
etc...
I need a find-and-replace expression that adds text to the beginning and end of each line and replaces the delimiter, which in this case is just a space.
eg. the output after running the regex should be
'text1' 157.55.33.47 'text2' msnbot-157-55-33-47.search.msn.com 'text3'
'text1' 157.56.93.62 'text2' msnbot-157-56-93-62.search.msn.com 'text3'
Any guidance would be greatly appreciated!
Find what:
^([\S]+)\s([\S]+)$
Replace with:
'text1' $1 'text2' $2 'text3'
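
The same find-and-replace could also be run from a command line with sed. This is a sketch, assuming a sed that supports -E; wrap_fields is a hypothetical helper name, and sed writes back-references as \1 and \2 rather than $1 and $2.

```shell
# wrap_fields (hypothetical): wrap a two-field "IP host" line on stdin as
# 'text1' IP 'text2' host 'text3'. Lines that do not have exactly two
# whitespace-separated fields are passed through unchanged.
wrap_fields() {
  sed -E "s/^([^[:space:]]+)[[:space:]]+([^[:space:]]+)\$/'text1' \1 'text2' \2 'text3'/"
}

printf '%s\n' \
  '157.55.33.47 msnbot-157-55-33-47.search.msn.com' \
  '157.56.93.62 msnbot-157-56-93-62.search.msn.com' | wrap_fields
```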
You can use a Macro for this instead of a regex. Record keystrokes on the first line. I'm on a Mac right now, so I can't be sure this is right, but it should be close to:
Home, [type 'text1'], CTRL+RightArrow [repeat 7 times], [type 'text2'],
space, End, space, [type 'text3'], DownArrow
Once your Macro is recorded, re-run your Macro for the entire file. Again, I can't see the exact options, but it will be something like the following:
Go to Macros>Run a Macro Multiple Times..., select Current recorded macro, and Run until the end of file.

select columns from a line starting from specific word to last column in unix? [closed]

Input file
1. blue is color
2. His shirt color is blue and it is good
3. deep blue see is a movie
output:
1. blue is color
2. blue and it is good
3. blue see is a movie
I need output that starts at a specific word and runs to the last column of each line, using awk or cut on Unix.
awk 'BEGIN{FS="\\<blue\\>"; OFS="blue"}{$1=""}7' file
the above line will output:
kent$ awk 'BEGIN{FS="\\<blue\\>"; OFS="blue"}{$1=""}7' file
blue is color
blue and it is good
blue see is a movie
Note that "\\<blue\\>" matches only the whole word blue, not bluesky or darkblue (the \< and \> word-boundary operators are GNU awk extensions). I hope this is what you want.
Code for sed:
$ sed -r 's/^([0-9]+\. ).* (blue)\b/\1\2/' file
1. blue is color
2. blue and it is good
3. blue see is a movie
This is a pretty common use case:
awk '{ if (x = index($0, "blue")) print substr($0, x) }' test.txt
I recommend taking a look at "Awk - A Tutorial and Introduction".
In Perl (-l already appends a newline to each print, so no explicit "\n" is needed):
perl -nle '/(blue.*)/ && print $1' file
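
For completeness, grep -o can do much the same thing, since it prints only the matched part of each line. A minimal sketch with a hypothetical helper name, from_word; it does not enforce word boundaries, and lines that do not contain the word are dropped.

```shell
# from_word (hypothetical): for each line on stdin, print the text from the
# first occurrence of $1 to the end of the line.
from_word() {
  grep -o -- "$1.*"
}

printf '%s\n' \
  'blue is color' \
  'His shirt color is blue and it is good' \
  'deep blue see is a movie' | from_word blue
```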