I have a testfile.txt as below
/path/ 345 firstline
/path2/ 346 second line
/path3/ 347 third line having spaces
/path4/ 3456 fourthline
Now I want to make it as below
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
According to the above output, the data should form 3 columns, all separated by commas. There may be one or more spaces between the columns in the input file. For example,
/path/ and 3456
can also be separated by two tabs. The third column keeps its spaces as they are.
Can anyone help me with this?
You can use the sed command to achieve this:
I tried this and it worked for me:
sed 's/[ \t]\+/,/' testfile.txt | sed 's/[ \t]\+/,/'
Each sed call replaces the first remaining run of spaces or tabs on a line with a comma, so the pipeline converts only the first two separators and leaves the spaces in the third column intact.
Update
sed (stream editor) uses commands to perform editing. The first parameter 's/[ \t]\+/,/' is the command that is used for replacing one expression with another. Splitting this command further:
Syntax: s/expression1/expression2/
s - substitute
/ - expression separator
[ \t]\+ - (expression1) this is the expression to find and replace (This regular expression matches one or more space or tab characters)
, - (expression2) this is what you want to replace the first expression with
When the above command is run, on each line the first instance of your expression is replaced. Because you need the second delimiter also to be replaced, I piped the output of the first sed command to another similar command. This replaced the second delimiter with a ,.
Hope this helps!
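The two piped sed calls can also be combined into a single invocation by separating the two s commands with a semicolon (a small sketch; note that \t and \+ are GNU sed extensions):

```shell
# Sample input like the question's testfile.txt
printf '/path/ 345 firstline\n/path2/ 346 second line\n' > testfile.txt

# Each 's' command replaces the first remaining run of spaces/tabs
# on the line with a comma, so two of them handle both separators.
sed 's/[ \t]\+/,/;s/[ \t]\+/,/' testfile.txt
```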
tr -s ' ' <data |sed -e 's/\s/,/' -e 's/\s/,/'
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
Not the best way to do this, but it will generate the desired output.
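A quick check of the pipeline on one of the sample lines (note that tr -s ' ' only squeezes runs of spaces; to squeeze tabs as well you would need tr -s ' \t', and \s is a GNU sed extension):

```shell
printf '/path3/  347 third line having spaces\n' > data

# Squeeze repeated spaces to one, then turn the first two remaining
# whitespace characters into commas; spaces in column 3 survive.
tr -s ' ' <data | sed -e 's/\s/,/' -e 's/\s/,/'
```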
I want to replace all occurrences of " that appear between ," and ", with ''' (three single quotes). This will be done on a CSV file, and on all possible nested quotes, so as not to mess up the formatting.
E.g.
"test","""","test" becomes
"test","''''''","test".
Another example:
"test","quotes "inside" quotes","test"
becomes
"test","quotes '''inside''' quotes","test".
I use https://sed.js.org/ to test the replacement.
What I currently have is
sed "s/\([^,]\)\(\"\)\(.\)/\\1'\\''\\3/g"
but it seems incomplete and doesn't cover all the cases that I want.
e.g.
works:
"anything","inside "quotes"","anything" ->
"anything","inside '''quotes'''","anything"
doesn't work for:
"anything","inside "test" quotes","anything" ->
"anything''',"inside '''test''' quotes''',"anything"
expected ->
"anything","inside '''test''' quotes","anything"
Maybe somebody is good with regex expressions and could help?
Using sed
$ cat input_file
"test","""","test"
"test","quotes "inside" quotes","test"
"anything","inside "quotes"","anything"
"anything","inside "test" quotes","anything"
$ sed -E ':a;s/(,"[^,]*('"'"'+)?)"([^,]*"(,|$))/\1'"'''"'\3/;ta' input_file
"test","''''''","test"
"test","quotes '''inside''' quotes","test"
"anything","inside '''quotes'''","anything"
"anything","inside '''test''' quotes","anything"
Escaping the triple single quotes is avoided with a variable ${qs}.
Start by replacing all quotes with ${qs}.
Then undo the replacement at the start of the line, at the end of the line, and around each ,.
qs="'''"
sed "s/\"/${qs}/g; s/^${qs}/\"/; s/${qs}$/\"/; s/${qs},${qs}/\",\"/g" csvfile
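A self-contained run of this on one of the sample lines (the filename csvfile is from the answer):

```shell
qs="'''"
printf '%s\n' '"test","quotes "inside" quotes","test"' > csvfile

# 1) replace every " with '''  2) restore " at start and end of line
# 3) restore "," between fields
sed "s/\"/${qs}/g; s/^${qs}/\"/; s/${qs}\$/\"/; s/${qs},${qs}/\",\"/g" csvfile
```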
I have a large CSV file with many columns, and multiple columns that have timestamps. I want to filter the data for a specific year based on only 1 of those columns.
Example of what some of my input CSV looks like: (there are no headers)
17263847,11/20/2018 3:00:13 PM,11/23/2018 6:45:00 AM,Approved
19483742,12/22/2019 4:00:12 PM,1/10/2020 4:50:11 AM,Approved
38274938,10/10/2018 2:02:19 PM,02/07/2019 1:04:15 PM,Approved
I want to extract all the rows that have 2019 in the second column; so for the example here, I would want to extract the 2nd row but not the 3rd row. Then, I want all of those rows to be put into a new CSV file.
Is there a simple way to do this using grep in command line? I used this but it's not working:
awk -F, '$1=="2019"' file1.csv > file2.csv
Any help would be appreciated!
First of all, in awk the second column is not $1 but $2 (remember that $0 refers to the whole line/record).
Second: instead of the literal comparison ==, use the regex match operator ~.
The command you need is:
awk -F, '$2 ~ /2019/' file1.csv > file2.csv
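With the question's sample rows in a comma-separated file, the filter looks like this (a sketch; /2019/ matches the string 2019 anywhere in the second field):

```shell
cat > file1.csv <<'EOF'
17263847,11/20/2018 3:00:13 PM,11/23/2018 6:45:00 AM,Approved
19483742,12/22/2019 4:00:12 PM,1/10/2020 4:50:11 AM,Approved
38274938,10/10/2018 2:02:19 PM,02/07/2019 1:04:15 PM,Approved
EOF

# Keep only rows whose second comma-separated field contains 2019
awk -F, '$2 ~ /2019/' file1.csv > file2.csv
cat file2.csv
```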
I have a CSV file like this:
Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,"LM, MC, ST",Brazil
Luke,21,"CMD, CD",England
And I need to get this:
Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England
With this expression I can extract the field but I don't know how to update it in the dataset
grep -o '\(".*"\)' file.csv | cut -d "," -f | sed 's/"//'
$ sed -E 's/"([^,]+)[^"]*"/\1/' ip.txt
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England
-E to enable ERE
" match double quote
([^,]+) match non-comma characters and capture it for reuse in replacement section
[^"]*" any other remaining characters
\1 will refer to the text that was captured with ([^,]+)
Note that this works only for a single double-quoted field per line, and it won't handle other valid CSV constructs such as escaped double quotes, newline characters inside a field, etc.
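To see it end to end on the question's data (ip.txt is the answer's input filename):

```shell
cat > ip.txt <<'EOF'
Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,"LM, MC, ST",Brazil
Luke,21,"CMD, CD",England
EOF

# Capture the leading comma-free run inside the quotes ([^,]+),
# discard the rest of the quoted field, and keep only the capture.
sed -E 's/"([^,]+)[^"]*"/\1/' ip.txt
```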
Could you please try the following; it should cover the case where you have more than one occurrence of "....." in your Input_file. Written and tested with GNU awk.
awk -v FPAT='[^"]*|"[^"]+"' '
BEGIN{
OFS=""
}
{
for(i=1;i<=NF;i++){
if($i~/^".*"$/){
gsub(/^"|"$|[, ].*/,"",$i)
}
}
}
1
' Input_file
This question already has answers here:
Replace All Lines That Do Not Contain Matched String
(4 answers)
Closed 3 years ago.
I have a problem writing a sed command that should change lines where =sometext= occurs to another pattern, but should not do it when https occurs in that line. I have no idea how I should change this command: sed -i 's/=\([^=]*\)=/{{\1}}/g'
You'll want to read the sed manual about matching lines: https://www.gnu.org/software/sed/manual/sed.html chapter 4:
The following command replaces the word ‘hello’ with ‘world’ only in lines not containing the word ‘apple’:
sed '/apple/!s/hello/world/' input.txt > output.txt
Use multiple blocks, e.g.:
sed '/=sometext=/ { /https/b; s/.../.../; }'
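Combining the block with the question's own substitution (a sketch; =sometext= and the {{...}} replacement come from the question):

```shell
cat > input.txt <<'EOF'
this line has =sometext= in it
see https://example.com with =sometext= here
EOF

# Inside the /=sometext=/ block: branch (b) past the substitution
# when the line also contains https.
sed '/=sometext=/{ /https/b; s/=\([^=]*\)=/{{\1}}/g; }' input.txt
```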
This question already has answers here:
Grep for literal strings
(6 answers)
Closed 4 years ago.
I want to copy over all lines in a file (file1.txt) containing Cmd:[41] over to another file (file2.txt)
awk '/Cmd:[41]/' file1.txt > file2.txt
This command doesn't seem to work: file2.txt ends up empty (size 0), even though there are lines in file1.txt that contain Cmd:[41].
Is there some specific awk escape character that I should be using? The problem is with the [41] part; the rest of the search string works fine.
You can just change your command in the following way and it will work:
awk '/Cmd:\[41\]/' file1.txt > file2.txt
Explanations:
'/Cmd:[41]/' will match lines that contain Cmd:4 or Cmd:1, but it will not match lines that literally contain Cmd:[41], because [...] in a regex defines a character class (a set of characters, any one of which may match at that position). You therefore need to escape the brackets by putting a \ before each of them.
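Since the pattern is a fixed string, an alternative to escaping is grep -F, which treats the pattern literally (no regex interpretation at all):

```shell
cat > file1.txt <<'EOF'
Cmd:[41] backup started
Cmd:4 something else
EOF

# Both commands match only the literal Cmd:[41] line
awk '/Cmd:\[41\]/' file1.txt > file2.txt
grep -F 'Cmd:[41]' file1.txt
```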