Replace wrong lines in csv

Replace wrong lines in csv - regex

I have a CSV file like this:
0;test1;description;toto
1;test2;description;tata
2;test3;desc
ription;tutu
3;test4;description;tete
In the shell, I would like to fix every line that doesn't start with a number by joining it to the previous line.
In this example I want to replace \nription with ription.
I can't find the correct expression with sed, grep, etc. :(
I want this result:
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete
Thanks a lot
EDIT 1:
I have tried something like this:
LC_ALL=C tr '(\n)[0-9]' ' ' < hotels.csv > test.csv
Or this:
sed ':a;N;$!ba;s/\r\n?![0-9]/ /g' hotels.csv
But I think my regex is wrong, and it doesn't work :(

With awk this seems feasible:
awk -F ';' '{if (NR>1 && match($1,/^[0-9]+$/)) printf("\n"); printf("%s",$0);} END{printf("\n")}' infile.csv
What it does:
From the second line on: if the first field is a number, print a newline first.
In every line: print the entire line ($0) without a trailing newline.
At the end: print one final newline.
Output is sent to STDOUT; input comes from infile.csv.
EDIT: Sorry, I forgot to copy the match(...) part.
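For comparison, here is a minimal sketch of the same idea that keeps the current record in a variable instead of juggling newlines with printf. It assumes, as the question does, that every real record starts with a numeric first field; the function name join_broken_lines is just an illustrative choice.

```shell
# join_broken_lines (hypothetical name): rejoin ';'-separated records that
# were split across several physical lines. Assumption: a line whose first
# field is all digits starts a new record; any other line continues it.
join_broken_lines() {
  awk -F ';' '
    # New record: flush the one collected so far, then start over.
    $1 ~ /^[0-9]+$/ {
      if (rec != "") print rec
      rec = $0
      next
    }
    # Continuation line: append it to the current record as-is.
    { rec = rec $0 }
    # Flush the last record when the input ends.
    END { if (rec != "") print rec }
  ' "$@"
}
```

Usage: join_broken_lines hotels.csv > test.csv. A record broken into more than two pieces is also rejoined, because every non-numeric line is appended until the next numbered one.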

Using grep -P
grep -P "^\d" file.csv
Use grep to match lines that begin with a digit.
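Keep in mind that this only filters lines; it does not rejoin them. A quick check with the POSIX equivalent grep '^[0-9]' (no -P needed) shows what happens on the question's data; keep_numbered_lines is a hypothetical helper name.

```shell
# keep_numbered_lines (hypothetical name): POSIX version of grep -P "^\d".
# It only selects lines starting with a digit; it never joins lines.
keep_numbered_lines() {
  grep '^[0-9]' "$@"
}
```

On the sample file this prints 2;test3;desc and drops ription;tutu, so the third record stays truncated.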

Due to the way sed processes its pattern space (one line at a time), you will have to use something like this:
Note: ~ must be a character that does not occur in your text.
$ cat file
0;test1;description;toto
1;test2;description;tata
2;test3;desc
ription;tutu
3;test4;description;tete
$ sed 'N;s/\n/~/' file | sed -r 's/~([0-9])/\n\1/g;s/~//g'
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete
PS: If your input file has Windows line endings, you will have to use \r\n instead of \n.
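A caveat, as far as I can tell: a single N joins lines in fixed pairs (1+2, 3+4, ...), so a continuation line that falls on an odd line number is not glued to its record. Reading the whole file into the pattern space first avoids that. The sketch below assumes GNU sed (the :a;N;$!ba loop and \n in the replacement are GNU behaviour) and, like the answer, that ~ never appears in the data; join_with_sed is a hypothetical name.

```shell
# join_with_sed (hypothetical name), GNU sed assumed:
#  1st sed: loop (:a;N;$!ba) until the whole input is in the pattern
#           space, then turn every newline into '~'.
#  2nd sed: '~' followed by a digit marks a real record boundary, so turn
#           it back into a newline; any other '~' was a broken line, drop it.
join_with_sed() {
  sed ':a;N;$!ba;s/\n/~/g' "$@" | sed -E 's/~([0-9])/\n\1/g;s/~//g'
}
```

For example, with the lines 0;a, 1;b, c, 2;d the broken line c sits on line 3; the pairwise N version leaves it on its own, while this version produces 1;bc.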

awk '{sub(/3;desc/,"3;description;tutu")}NR == 4 {next}1' file
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete

Related

Deleting everything between two string matches in a file

I have this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only occurrence) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you

Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: replace (s), in each line of file.txt, all the characters (.*) starting with the pattern :tt_ with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which matches your request more literally but returns the same output.
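If you want to try the expression before editing anything, the same substitution can be wrapped so it writes to stdout. Note that -i without a suffix argument is GNU sed behaviour; BSD/macOS sed expects sed -i '' instead. The wrapper name strip_from_tt is only illustrative.

```shell
# strip_from_tt (hypothetical name): print every line with the first ":tt_"
# and everything after it removed. Output goes to stdout, so the original
# file is untouched; redirect it if you want to keep the result.
strip_from_tt() {
  sed 's/:tt_.*//' "$@"
}
```

Usage: strip_from_tt file.txt > file.new && mv file.new file.txt, which behaves the same with GNU and BSD sed.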

Use pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
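Because i holds the whole file, a file with several lines would lose everything from the first :tt to the last _ZA, across line boundaries. A per-line variant of the same parameter-expansion idea, sketched below with a hypothetical function name, uses the POSIX ${line%%:tt_*} form, which removes the longest suffix starting with :tt_.

```shell
# strip_tt_per_line (hypothetical name): read stdin line by line and drop
# the longest suffix that starts with ":tt_", i.e. everything from the
# first ":tt_" onward. Lines without ":tt_" are printed unchanged.
strip_tt_per_line() {
  # IFS= and -r keep leading blanks and backslashes intact; the
  # [ -n "$line" ] test also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%%:tt_*}"
  done
}
```

Usage: strip_tt_per_line < file.txt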

Assuming the general requirement is to remove the second : and everything after it:
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff

Using awk
awk 'BEGIN{FS=OFS=":"}{print $1,$2}' file.txt
With : as both the input and output field separator, printing $1 and $2 keeps only the two columns before :tt.

This deletes all chars from ":tt_" to the last "_ZA", inclusive, in file.txt
Mac_3.2.57$ cat file.txt | sed 's/\(\)[:]tt.*_ZA\(.*\)/\1\2/'
Osmun.Prez#mail.com:c7lB2m6b#3.a.a

Or, if it is always the first 2 values separated by colons (as in your example):
cut -d ':' -f1,2 file.txt
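A quick way to check the cut variant, written with plain ASCII quotes (typographic quotes such as ’:’ are not shell quoting and would be passed to cut as part of the delimiter). Lines that contain no : at all are printed unchanged by cut unless -s is given. first_two_fields is a hypothetical name.

```shell
# first_two_fields (hypothetical name): keep fields 1 and 2 of each
# ':'-separated line. Lines with no ':' at all are passed through as-is
# (add -s to suppress them instead).
first_two_fields() {
  cut -d ':' -f 1,2 "$@"
}
```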

Removing rows that contain a "(null)" value from a text file

I would like to remove any row within a .txt file that contains "(null)". The (null) value is always in the 3rd column. I would like to add this to a script that I already have.
Txt file example:
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
In this example I would like to remove the third row.
I'm guessing it's awk -F, but I'm not sure where to go from there.

You are on the right track with using -F.
$ awk -F '|' '$3 != "(null)"' file.txt
39|1411|XXYZ
40|1416|XXX
You set the field separator to |, then print all lines where the third field is not equal to (null). This uses awk's default of "print the line" if there's no action associated with a pattern.
If you relax the requirement to specifically test the third field, and there is no other place for the "(null)" substring to occur, you can get the same result with
grep -vF '(null)' file.txt
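To see the difference between the two commands, here is a small sketch (function names are hypothetical) with a row that has (null) in the second column: the awk field test keeps it, while the grep substring test drops it.

```shell
# drop_null_third (hypothetical): field test, only column 3 is checked.
drop_null_third() {
  awk -F '|' '$3 != "(null)"' "$@"
}

# drop_null_anywhere (hypothetical): substring test, any "(null)" on the
# line causes it to be dropped, whatever column it is in.
drop_null_anywhere() {
  grep -vF '(null)' "$@"
}
```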

With awk:
awk '-F|' '$3 != "(null)"' < input-file

Here is a sed:
$ sed '/(null)$/d' file
39|1411|XXYZ
40|1416|XXX
The $ ensures that (null) is at the end of the line. If you want to ensure that (null) is the final column:
$ sed '/|(null)$/d' file
And if you want to be extra sure that it is the third column:
$ sed '/^[^|]*|[^|]*|(null)$/d' file
Or with grep:
$ grep -v '^[^|]*|[^|]*|(null)$' file
(But instead of this last one, just use awk...)
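One detail worth knowing about these patterns: in a POSIX basic regular expression | is an ordinary character and needs no backslash, while GNU sed reads \| as alternation, so an escaped version can end up matching (and deleting) every line. A minimal sketch of the third-column check with an unescaped |, wrapped in a hypothetical function:

```shell
# drop_null_third_sed (hypothetical): delete a line when it has exactly
# three '|'-separated columns and the third one is "(null)".
# In a POSIX BRE, '|', '(' and ')' are all literal characters here.
drop_null_third_sed() {
  sed '/^[^|]*|[^|]*|(null)$/d' "$@"
}
```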

Use grep:
grep -v '|.*|(null)' in_file
Here, grep uses option -v : print lines that do not match.
Or use Perl:
perl -F'[|]' -lane 'print if $F[2] ne "(null)";' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace, or on the regex specified in the -F option.
-F'[|]' : Split into @F on a literal |, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

I would like to remove any row within a .txt file that contains "(null)"
If you wish to do that using AWK, let the content of file.txt be
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
then
awk '!index($0,"(null)")' file.txt
will output
39|1411|XXYZ
40|1416|XXX
Explanation: index returns the position of the first occurrence of the substring ((null) in this case), or 0 if it is not found. I negate the result, which gives true for 0 and false for anything else, and AWK prints the lines for which the result is true.

unix - pattern matching in file

So I have a file with the following:
username=jsmith
api=3434kjklj23j4l3kj4l34j3l4j
I would like to extract "jsmith" and "3434kjklj23j4l3kj4l34j3l4j" using a regular expression.
I know the regular expression for it is:
(username=)(.*) > \2
(api=)(.*) > \2
However, with grep, sed or awk I can't figure out how to do it without returning the entire line.
How would you go about doing that with a commandline command?

awk is made for this task:
awk -F= '{print$2}' file
If the file has other entries, you can limit the output with a condition:
awk -F= '$1=="username"||$1=="api"{print$2}' file
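If a value itself may contain =, print$2 stops at the next =. A sketch that strips only the text up to the first = and takes the key as a parameter via awk -v (the helper name get_value is hypothetical):

```shell
# get_value KEY [FILE...] (hypothetical helper): print the value of KEY from
# key=value lines. Only the text up to the FIRST '=' is treated as the key,
# so values that themselves contain '=' are printed whole.
get_value() {
  _key=$1
  shift
  awk -v key="$_key" '
    {
      eq = index($0, "=")          # position of the first "=", 0 if none
      if (eq > 0 && substr($0, 1, eq - 1) == key)
        print substr($0, eq + 1)
    }
  ' "$@"
}
```

Usage: get_value username file and get_value api file.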

Here is one using grep -P (PCRE) and a positive lookbehind, where supported:
$ grep -Po "((?<=^username=)|(?<=^api=)).*" file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. output everything that is preceded by username= or api= at the start of a line.
And one in awk:
$ awk 'sub(/^(username|api)=/,""){print}' file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. print the lines from which a leading username= or api= was removed.

Since you want to see chess with the input game=chess, here are some solutions that don't match username= or api= at all:
cut -d"=" -f2- file
# or
sed -n 's/[^=]*=//p' file

Here's the answer that worked on macOS and RHEL 7:
awk -F= '$1=="username"{print$2}' testfile.txt
awk -F= '$1=="api"{print$2}' testfile.txt
testfile.txt
username=user1
api=pass1
username=user2
api =pass2

Delete all lines in file with specific regex using sed

We'd like to delete all lines that match the following "regex input" and put the result in a new file:
Hi|thisisatest|11
What we have:
check='([^[:space:]]+)|([^[:space:]]+)|([^[:space:]]+)'
sed '/$check/d' test.txt > test_new.txt
It currently does not work.
Edit:
We got the following test.txt:
Jack|Miles|44
Carl|13
Robert|Whittaker|87
John|2
Frank|65
We want to delete Jack|Miles|44 and Robert|Whittaker|87, which matches the regex (if the regex is correct).

The correct BRE regex is:
check='[^[:space:]]*|[^[:space:]]*|[^[:space:]]*'
Then use it as:
sed "/$check/d" file
Carl|13
John|2
Frank|65
By the way, awk can handle this even better without a regex: just use | as the delimiter and keep only the lines that have exactly 2 fields:
awk -F '|' 'NF==2' file
Carl|13
John|2
Frank|65
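A quick check of the NF==2 idea on the sample data: with | as the field separator, Carl|13 has two fields and Jack|Miles|44 has three. keep_two_field_rows is a hypothetical name.

```shell
# keep_two_field_rows (hypothetical): print only lines that split into
# exactly two '|'-separated fields; three-column rows are dropped.
keep_two_field_rows() {
  awk -F '|' 'NF == 2' "$@"
}
```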

It is much simpler with awk; just do:
awk -F'|' 'NF<=2' file
Carl|13
John|2
Frank|65
To write the updates back to the same file, just do:
awk -F'|' 'NF<=2' file > tmp && mv tmp file

With GNU sed:
sed -r '/\S+\|\S+\|\S+/d' file

Also with grep:
grep -P '^\w+\|\d+$' file >tmp
selects the "correct" entries from the file, e.g. word|digits
or
grep -P '^[^|]+\|[^|]+$' file >tmp
and then rename tmp back to file.

Perl, sed, or awk one-liner to change the format of the file

I need advice on how to change a file formatted the following way
file1:
A 504688
B jobnameA
A 504690
B jobnameB
A 504691
B jobnameC
...
into file2:
A B
504688 jobnameA
504690 jobnameB
504691 jobnameC
...
One solution I could think of is:
cat file1 | perl -0777 -p -e 's/\s+B/\t/' | awk '{print $2"\t"$3}'.
But I am wondering if there is a more efficient way, or an established practice, for this job.

perl -nawe 'print "@F[1 .. $#F]", $F[0] eq "A" ? "\t" : "\n"' < /tmp/ab
Look up the options in perlrun.
Another useful one to add is -l (append newline to print), but not in this case.

Assuming your input file is tab separated:
echo $'A\tB'
cut -f2 filename | paste - -
Should be pretty quick because this is exactly what cut and paste were written to do.
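$'A\tB' is bash/ksh/zsh syntax; printf gives the same header in any POSIX shell. A sketch that writes header and body together, assuming tab-separated input as in the answer (reshape_ab is a hypothetical name):

```shell
# reshape_ab (hypothetical): print an "A<TAB>B" header, then take column 2
# of the tab-separated input and let paste - - join every two lines
# (the A value and the following B value) with a tab.
reshape_ab() {
  printf 'A\tB\n'
  cut -f2 "$@" | paste - -
}
```

Usage: reshape_ab file1 > file2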

awk '/^A/{num=$2}/^B/{print num,$2}' file
Or, alternately,
awk '{num=$2;getline;print num,$2}' file

Here is an sed solution:
sed -e 'N' -e 's/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
This version will also print the header at the top:
sed '1{h;s/.*/A\tB/p;g};N;s/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
Or an alternative:
sed -n '/^A\s*/{s///;h};/^B\s*/{s///;H;g;s/\n/\t/p}' file
If your sed does not support semicolons as a command separator for the alternative:
sed -n '
/^A\s*/{ # if the line starts with "A"
s/// # remove the "A" and the whitespace
h # copy the remainder into the hold space
} # end if
/^B\s*/{ # if the line starts with "B"
s/// # remove the "B" and the whitespace
H # append pattern space to hold space
g # copy hold space to pattern space
s/\n/\t/p # replace newline with tab and print
}' file
This version will also print the header at the top:
sed -n '/^A\s*/{s///;h;1s/.*/A\tB/p};/^B\s*/{s///;H;g;s/\n/\t/p}' file

This will work with any header text, not just fixed A and B:
awk '{a=$1;b=$2;getline;if(c!=1){print a,$1;c=1};print b,$2}' file1 >file2
...and it will also print the header row.
If you need a tab separator, use:
awk '{a=$1;b=$2;getline;if(c!=1){print a"\t"$1;c=1};print b"\t"$2}' file1 >file2

This might work for you:
sed -e '1i\A\tB' -e 'N;s/A\s*\(\S*\).*\nB\s*\(\S*\).*/\1\t\2/' file
