git grep: no padding option? Line numbers shift the output when one- and two-digit results are mixed

My config contains
[grep]
lineNumber = true
My command:
git grep --heading -C9 xxxxxxxx
I am getting results like:
7- yyyyyyyy
8- yyyyyyyy
9- yyyyyyyy
10: xxxxxxxx
11- yyyyyyyy
12- yyyyyyyy
But I want, for example:
 7- yyyyyyyy
 8- yyyyyyyy
 9- yyyyyyyy
10: xxxxxxxx
11- yyyyyyyy
12- yyyyyyyy
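As far as I know, git grep has no padding option for line numbers, but its output can be post-processed. A sketch that right-aligns the numeric prefix with awk (the width of 4 is an arbitrary choice; the filter is shown on canned input, since the real input would come from git grep):

```shell
pad='
/^[0-9]+[-:]/ {
    num = $0
    sub(/[-:].*/, "", num)               # the leading line number
    rest = substr($0, length(num) + 1)   # separator plus line content
    printf "%4s%s\n", num, rest          # right-align the number in 4 columns
    next
}
{ print }                                # headings etc. pass through unchanged
'
# intended usage: git grep --heading -C9 xxxxxxxx | awk "$pad"
printf '9- yyyyyyyy\n10: xxxxxxxx\n' | awk "$pad"
```

With the canned input this prints `   9- yyyyyyyy` and `  10: xxxxxxxx`, with the numbers aligned.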


Identify & replace 2nd instance of search term in string... VBA RegEx doesn't have lookbehind functionality

I have a list of strings in format as below :
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxx 100PS xxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 200PS xxxxxxxxxxxxxxxx 200PS xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
In Excel/VBA, I am trying to remove the duplicated values from each string, i.e. 100PS and 200PS where they are printed twice. Using VBA and regex I've come up with:
(?<=\d\d\dPS\s.*)(\d\d\dPS\s)
And this seems to work when testing it online and on other languages, but in VBA, lookbehind is not supported, and this is absolutely wrecking my brain.
The value always consists of \d\d\d (3 digits) and PS, ends with \s but all the xxxxxx text around it can differ every time and have different lengths etc.
How would I possibly choose the duplicate PS value with regex?
I have looked through stackoverflow and found a couple of reg-ex examples, but they don't seem to be working in VBA..
Any help is greatly appreciated,
Thanks
Have you considered a worksheet formula?
=SUBSTITUTE(A1,MID(A1,SEARCH("???PS",A1),6),"",2)
See regex in use here
(\s(\d{3}PS)\s.*\s)\2\s
(\s(\d{3}PS)\s.*\s) Capture the following into capture group 1
\s Matches a single whitespace character
(\d{3}PS) Capture the following into capture group 2
\d{3} Matches any 3 digits
PS Match this literally
\s Matches a single whitespace character
.* Matches any character (except \n) any number of times
\s Matches a single whitespace character
\2 Matches the text that was most recently captured by capture group 2
\s Matches a single whitespace character
Replacement: $1 (puts capture group 1 back into the string)
Result:
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxx xxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 200PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
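For what it's worth, this pattern relies only on a backreference, which VBScript's RegExp object (the engine VBA uses) does support, unlike lookbehind. The same idea can be sanity-checked with GNU sed, whose -E mode also accepts backreferences in the pattern (a GNU extension); the sample line is taken from the question:

```shell
line='xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxx 100PS xxxxxxx xxxxxx'
echo "$line" | sed -E 's/( ([0-9]{3}PS) .* )\2 /\1/'
```

In VBA itself, set .Pattern to the expression above and call .Replace with "$1" as the replacement string.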

Replace White Spaces with sed

I have a large file (100M rows) in the following format:
Week |ID |Product |Count |Price
---------- ------------- -------- ---------- -----
2016-01-01|00056001 |172 |23 |3.50
2016-01-01|1 |125 |15 |2.75
I am trying to use sed to add Xs to the missing digits on the second customer ID, but maintain the number of spaces after the full ID. So, the table would look like:
Week |ID |Product |Count |Price
---------- ------------- -------- ---------- -----
2016-01-01|00056001 |172 |23 |3.50
2016-01-01|1XXXXXXX |125 |15 |2.75
I have tried
sed -i "s/\s\{29,\}/XXXXXXX /g" *.csv
and
sed -i -- "s/1 /1XXXXXXX /g" *.csv
Neither with any change to the file. What am I missing?
Thanks.
EDIT for clarification: There are 29 spaces after the 1 in the actual data. I used less on the example table for readability sake. I assume whatever solution works will apply no matter the number of spaces.
This works for me (using a plain space instead of \s, and dropping the unnecessary g flag, since the substitution is needed only once per line):
sed -i "s/[ ]\{29,\}/XXXXXXX /" *.csv
Although, for safety, I would rather use a more restrictive script that performs the substitution only when |1 is encountered (| is a literal character here; \| would mean alternation in GNU sed):
sed -i "s/\(|1\)[ ]\{29,\}/\1XXXXXXX /" *.csv
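A quick check of the restrictive variant on a fabricated line (printf '%29s' '' produces the 29 spaces mentioned in the question, and the trailing fields are shortened; note the | in the pattern is written without a backslash, since \| means alternation in GNU sed):

```shell
printf '2016-01-01|1%29s|125\n' '' | sed 's/\(|1\)[ ]\{29,\}/\1XXXXXXX /'
```

This prints `2016-01-01|1XXXXXXX |125`.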

grep: how to find ALL the lines between two expressions

We have a HUGE file (numbers), we want to get ALL the lines between two expressions, e.g.,
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
.
.
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
and the expressions are -9998.01 and -8000.0, so tried:
$ grep -A100000 '[0-9] -9998.[0-9]' mf.in | grep -B100000 '[0-9] -8000.[0-9]' > mfile.out
And this is OK: all the lines in between are captured, as long as 100000 is big enough to keep them all. But what if we are wrong, i.e. there are more than 100000 lines in between? How can we take everything in between without a numeric argument after -A and -B?
PS: I was unable to get sed working with similar "[ ...]" expressions
PS2: the columns have more digits (only 4 columns are shown here)
-1931076.0 -9998.96235 1.0002741998076021 0.0191476198569163
-1931075.0 -9998.95962 1.0000742544770280 0.0192495084654059
-1931074.0 -9998.95688 0.9998778097258081 0.0193725608470694
With awk:
awk '$2 ~ /^-9998.01$/{p=1} p{print} $2 ~ /^-8000.0$/{p=0}' file
Test:
$ cat file
232445 -9998.00 xxxxxxxxxx
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
234566 -9998.03 xxxxxxxxx
234566 -9998.05 xxxxxxxxx
....
....
324444 -8000.011 xxxxxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
344444 -8000.1 xxxx
$ awk '$2 ~ /^-9998.01$/{p=1} p{print} $2 ~ /^-8000.0$/{p=0}' file
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
234566 -9998.03 xxxxxxxxx
234566 -9998.05 xxxxxxxxx
....
....
324444 -8000.011 xxxxxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
sed already has this functionality built in, using an address range:
/regex1/,/regex2/ p
The p command prints every line from the first line matching regex1 through the next line matching regex2, both inclusive.
Here is an example wrt your file format:
$ cat file
124235 -69768.77 xxx
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
12345 -124.66 xxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
344444 -7000.0 xxxx
$ sed -nr '/^[0-9]+\s-9998.[0-9]+\s/,/^[0-9]+\s-8000.[0-9]+\s/ p' file
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
12345 -124.66 xxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
$
Well, it might not be the best answer, but the easy fix for your command would be to use the file's own line count as the argument to -A and -B, so you are sure you cannot miss any lines:
NB_LINES=$(wc -l mf.in | awk '{print $1}')
grep -A$NB_LINES '[0-9] -9998.[0-9]' mf.in | grep -B$NB_LINES '[0-9] -8000.[0-9]' > mfile.out
Though, tbh, in pure shell it's very likely I'd do something similar. Or I'd write a small python script, that would look like:
import re

LINE_RE = re.compile(r'[^ ]+ (-[0-9]+\.[0-9]+) .*')

with open('mf.in', 'r') as fin:
    with open('mf.out', 'w') as fout:
        for line in fin:
            match = LINE_RE.match(line)
            if match:
                value = float(match.group(1))
                if -9998.01 <= value <= -8000.0:  # inside the range: keep the line
                    fout.write(line)
                elif value > -8000.0:             # past the end marker: stop
                    break
N.B.: this script is just to expose the algorithmic idea, and being blindly coded and untested, it might need some tweaking to actually work.
HTH

match the first two bits (only) in a digital stream line (byte) using grep

How should I match the first two bits (first occurrence only) in a digital stream line (one byte per line) using grep, in one direction only, i.e. matching 01 but not 10 in 01051015?
This is what I have tried:
grep -E '^[0-9]\{2\}$' | grep -Po --color '01' <<< 01051015
> 010-10-- (current output)
$ cat -n test.txt
1 0001021113
2 0202031011
3 0103031113
4 ..........
$ grep -oE '^[0-1][0-9]\{2,2\}' | grep -E '(10)' ./test.txt > matchedList.txt
$ cat -n matchedList.txt
1 0001021113
2 0202031011
3 0103031113
But I need to parse and match the first pair occurrence (in this case '10'), in that specific order and direction (as in line 2); so the correct output should be: 2 0202031011
Thanks in advance,
L.
grep -m 1 -e '^01' YourFile
where:
01 is the two-digit pair you want to find at the start of the line
-m 1 limits the output to the first occurrence
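If the requirement is instead that the pair match at an even (pair-aligned) offset anywhere in the line, one sketch is to anchor it behind a run of complete pairs. With the sample data from the question, only line 2 contains 10 as a whole pair (02 02 03 10 11):

```shell
printf '0001021113\n0202031011\n0103031113\n' > test.txt   # sample data from the question
grep -nE '^([0-9]{2})*10' test.txt                         # '10' at an even offset only
```

This prints `2:0202031011`: line 1 contains "10" only straddling a pair boundary (00 01 02 11 13), so it is not matched.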

How to sort output of "s3cmd ls"

Amazon's "s3cmd ls" gives output like this:
2010-02-20 21:01 1458414588 s3://file1.tgz.00
2010-02-20 21:10 1458414527 s3://file1.tgz.01
2010-02-20 22:01 1458414588 s3://file2.tgz.00
2010-02-20 23:10 1458414527 s3://file2.tgz.01
2010-02-20 23:20 1458414588 s3://file2.tgz.02
How can I select all files of an archive, ending in 00 ... XX, with the latest date in the file set?
The date and time are not sorted.
Bash, regexp?
Thanks!
s3cmd ls s3://bucket/path/ | sort -k1,2
This will sort by date ascending.
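A quick check with a canned listing (lines deliberately out of order; -k1,2 sorts on the first two fields, date and time):

```shell
printf '2010-02-20 23:10 1458414527 s3://file2.tgz.01\n2010-02-20 21:01 1458414588 s3://file1.tgz.00\n2010-02-20 22:01 1458414588 s3://file2.tgz.00\n' | sort -k1,2
```

The ISO date format (YYYY-MM-DD) and 24-hour times sort correctly even as plain text, which is why a lexical sort on those two fields is enough.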
DATE=$(s3cmd ls | sort -n | tail -n 1 | awk '{print $1}')
s3cmd ls | grep $DATE
Sorting as numbers should put the most recent dates last. tail -n 1 takes the last line, and awk cuts out the first field, which is the date. Use that to get all entries for that date.
But maybe I didn't understand the question, so you may have to rephrase it. You tell us 'date and time is not sorted' but provide an example where they are sorted; you ask for the latest date, but all the entries have the same date.
Try:
The command syntax is s3cmd ls s3://path/of/the/bucket/files | sort
s3cmd ls s3://file1.tgz.00 | sort
It will sort ascending by date, with the latest date at the end.
Simply append | sort to your s3cmd ls command to sort by date. The most recent file will appear right above your command line.
s3cmd ls s3://path/to/file | sort