Replace text in quotes with Regex or AWK [closed] - regex

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a CSV file like this:
Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,"LM, MC, ST",Brazil
Luke,21,"CMD, CD",England
And I need to get this:
Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England
With this expression I can extract the field, but I don't know how to update it in the file:
grep -o '\(".*"\)' file.csv | cut -d "," -f1 | sed 's/"//'

$ sed -E 's/"([^,]+)[^"]*"/\1/' ip.txt
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England
-E enables extended regular expressions (ERE)
" match double quote
([^,]+) match non-comma characters and capture them for reuse in the replacement section
[^"]*" match the rest of the field, up to and including the closing double quote
\1 will refer to the text that was captured with ([^,]+)
Note that this works for only one double-quoted field per line. It won't work with other valid CSV constructs such as escaped double quotes, newline characters inside a field, etc.
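Here is a hedged sketch of how the sed answer above could be extended to lines with several quoted fields. It adds the `g` flag and excludes `"` from the captured item, so that each match stops at its own closing quote. Without that change, a quoted field with no comma inside (such as "GK") could swallow its closing quote and run on to a later quoted field. The function name `first_item` is hypothetical. Like the original, it does not handle escaped quotes or embedded newlines.

```shell
# Hypothetical helper: reduce every double-quoted field on each line to its
# first comma-separated item.
# [^,"]+ captures the first item without crossing a quote; [^"]*" consumes the
# rest of the field up to its closing quote; g repeats this for every quoted field.
first_item() {
  sed -E 's/"([^,"]+)[^"]*"/\1/g' "$@"
}

printf '%s\n' 'Jack,30,"LM, MC, ST",Brazil' '"GK",23,"CMD, CD",Spain' | first_item
```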

Could you please try the following? It should also cover the case where there is more than one occurrence of "....." in your Input_file. Written and tested with GNU awk.
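awk -v FPAT='[^"]*|"[^"]+"' '
BEGIN{
  OFS=""                              # fields are glued back together without a separator
}
{
  for(i=1;i<=NF;i++){
    if($i~/^".*"$/){                  # a quoted field, e.g. "LM, MC, ST"
      gsub(/^"|"$|[, ].*/,"",$i)      # drop the quotes and everything from the first comma or space
    }
  }
}
1                                     # print every (possibly modified) line
' Input_file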

Related

Search and replace (escape) double quotes within double quotes in CSV values [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last month.
I want to replace all occurrences of " between ," and ", with ''' (three single quotes). This will be done on a CSV file, and it must handle all possible nested quotes so that the formatting is not broken.
E.g.
"test","""","test" becomes
"test","''''''","test".
Another example:
"test","quotes "inside" quotes","test"
becomes
"test","quotes '''inside''' quotes","test".
I use https://sed.js.org/ to test the replacement.
What I currently have is
sed "s/\([^,]\)\(\"\)\(.\)/\\1'\\''\\3/g"
but it seems incomplete, and it doesn't cover all the cases I want.
e.g.
works:
"anything","inside "quotes"","anything" ->
"anything","inside '''quotes'''","anything"
doesn't work for:
"anything","inside "test" quotes","anything" ->
"anything''',"inside '''test''' quotes''',"anything"
expected ->
"anything","inside '''test''' quotes","anything"
Could someone who is good with regular expressions help?
Using sed
$ cat input_file
"test","""","test"
"test","quotes "inside" quotes","test"
"anything","inside "quotes"","anything"
"anything","inside "test" quotes","anything"
$ sed -E ':a;s/(,"[^,]*('"'"'+)?)"([^,]*"(,|$))/\1'"'''"'\3/;ta' input_file
"test","''''''","test"
"test","quotes '''inside''' quotes","test"
"anything","inside '''quotes'''","anything"
"anything","inside '''test''' quotes","anything"
Escaping the triple single quotes is avoided with a variable ${qs}.
Start by replacing all double quotes with ${qs}.
Then restore the double quotes at the start of the line, at the end of the line, and around each comma between fields.
qs="'''"
sed "s/\"/${qs}/g; s/^${qs}/\"/; s/${qs}$/\"/; s/${qs},${qs}/\",\"/g" csvfile
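The three steps described above can be wrapped in a small function. The name `escape_inner_quotes` is hypothetical. Like the one-liner, this sketch assumes that every field is quoted and that no field contains the sequence "," inside its value.

```shell
# Hypothetical helper: turn inner double quotes into ''' while keeping field delimiters.
escape_inner_quotes() {
  qs="'''"
  # 1) every " becomes '''
  # 2) restore the opening quote at line start and the closing quote at line end
  # 3) restore the "," delimiters between fields
  sed "s/\"/${qs}/g; s/^${qs}/\"/; s/${qs}\$/\"/; s/${qs},${qs}/\",\"/g" "$@"
}

printf '%s\n' '"test","""","test"' '"test","quotes "inside" quotes","test"' | escape_inner_quotes
```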

More elegant way to extract substring in shell [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I wrote a regex to get the chart name (auth-token-service), but it seems very crude. Can someone suggest a more precise way?
chartname=`echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | cut -d= -f1 | sed -e "s/^.*-//"`
Gets text between '=' and '/':
sed "s/.*=\(.*\)\/.*/\1/"    # outputs xxx.azurecr.io
Gets text between '/' and ':':
sed "s/.*\/\(.*\):.*/\1/"    # outputs auth-token-service
Gets text after ':':
sed "s/.*:\(.*\)/\1/"        # outputs latest
I'm not familiar with the token format, but if I understand correctly, you just want the part after the slash and before the colon.
echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | sed -e 's/^.\+\/\([^\/]\+\):[^:]\+$/\1/'
Since you asked for a regex solution:
string=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
[[ $string =~ /([^:]*) ]] && chartname=${BASH_REMATCH[1]}
This assumes that the chart name is always between the / and the :. Note that chartname is left unassigned if the regex does not match.
The Unix shell has parameter expansion built in. You can't nest these, so it takes multiple steps, but you avoid the overhead of starting multiple external processes.
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
chartname=${var%%=*}
chartname=${chartname#*-}
The suffix operator ${var%pattern} returns the value of $var with any suffix matching pattern removed; the ${var#pattern} operator does the same for a prefix match. Doubling the operator changes it to trim the longest possible pattern match instead of the shortest. (These are shell glob patterns, not regular expressions, though.)
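Here is a sketch of the shortest/longest distinction using only POSIX parameter expansion on the sample string. The extra variable names `greedy`, `registry`, `image` and `tag` are made up for this illustration.

```shell
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'

chartname=${var%%=*}        # longest suffix '=*' removed   -> my-auth-token-service
chartname=${chartname#*-}   # shortest prefix '*-' removed  -> auth-token-service
greedy=${var%%=*}
greedy=${greedy##*-}        # longest prefix '*-' removed   -> service

registry=${var#*=}          # drop everything up to the first '='
registry=${registry%%/*}    # drop everything from the first '/'  -> xxx.azurecr.io
image=${var##*/}            # drop everything up to the last '/'
image=${image%:*}           # drop the last ':' and what follows  -> auth-token-service
tag=${var##*:}              # drop everything up to the last ':'  -> latest

printf '%s\n' "$chartname" "$greedy" "$registry" "$image" "$tag"
```

The `greedy` line shows that a greedy `.*-` prefix behaves like `##`. For the same reason, a `sed "s/^.*-//"` step removes everything up to the last hyphen rather than the first.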
If you require a one-liner, you can fold the cut into the sed script:
chartname=$(sed 's/[^-]*-\([^=]*\)=.*/\1/' <<< 'my-auth-token-service=xxx.azurecr.io/auth-token-service:latest')
Note the modern syntax $(cmd ...) instead of the obsolescent `cmd ...`, and the Bash "here string" <<< (which is not POSIX-compatible).
With awk (tested only on the GNU variant):
var=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
echo "$var" | awk -F'[=:/]' -vOFS='\n' '{print $1, $2, $3, $NF}'
Output
my-auth-token-service
xxx.azurecr.io
auth-token-service
latest

how to make the data in a file in order? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a testfile.txt like this:
/path/ 345 firstline
/path2/ 346 second line
/path3/ 347 third line having spaces
/path4/ 3456 fourthline
Now I want to turn it into this:
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
As the output above shows, the data should form 3 columns, all separated by commas. In the input file there may be one or more spaces between the columns; for example, /path/ and 3456 may also be separated by 2 tabs. The third column keeps its spaces as they are.
Can anyone help me with this?
You can use the sed command to achieve this:
I tried this and it worked for me:
sed 's/[ \t]\+/,/' testfile.txt | sed 's/[ \t]\+/,/'
It replaces only the first and second runs of spaces or tabs on each line.
Update
sed (stream editor) uses commands to perform editing. The first parameter 's/[ \t]\+/,/' is the command that is used for replacing one expression with another. Splitting this command further:
Syntax: s/expression1/expression2/
s - substitute
/ - expression separator
[ \t]\+ - (expression1) this is the expression to find and replace (This regular expression matches one or more space or tab characters)
, - (expression2) this is what you want to replace the first expression with
When the above command is run, on each line the first instance of your expression is replaced. Because you need the second delimiter also to be replaced, I piped the output of the first sed command to another similar command. This replaced the second delimiter with a ,.
Hope this helps!
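Instead of piping through a second sed, you can give both substitutions to a single sed invocation, separated by `;`. This sketch uses the POSIX class `[[:blank:]]` (space or tab) and the BRE interval `\{1,\}` in place of the GNU-specific `\t` and `\+`. The function name `to_csv` is hypothetical.

```shell
# Hypothetical helper: replace the first two runs of blanks on each line with commas.
to_csv() {
  # An s command without the g flag changes only the first match on the line,
  # so running it twice converts the 1st and then the 2nd column separator.
  sed 's/[[:blank:]]\{1,\}/,/; s/[[:blank:]]\{1,\}/,/' "$@"
}

printf '/path/\t\t3456 fourthline\n/path3/ 347 third line having spaces\n' | to_csv
```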
tr '\t' ' ' <data | tr -s ' ' | sed -e 's/\s/,/' -e 's/\s/,/'
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
Not the best way to do this, but it will generate the desired output.

cURL and Bash, capture variable and value from string [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
Using bash, grep, split, awk, or sed, I would like to capture
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
from
Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/
'ASPSESSIONID' always stays the same and is followed by 8 random characters (SUSTQBQS).
Also, this variable may not always be in the second column or right after 'Set-Cookie: '.
Can anyone please help?
With awk:
awk -F"[ ;]" '{print $2}' FileName
Set the field separator to space and ;, then print the 2nd field.
The basic regex structure is the same in various programs.
In words, it is: the text between the colon and space (: ) and the semicolon (;). In regex parlance, that is:
: ([^;]*);
And could be assigned to a var:
RE=': ([^;]*);'
Then, we could use it in
bash
while IFS= read -r l; do
  [[ $l =~ $RE ]] && echo "${BASH_REMATCH[1]}"
done <file
gawk
gawk -v RE="$RE" '$0 ~ RE { print gensub(".*"RE".*","\\1",1); }' file
sed
sed -rn 's/^.*'"$RE"'.*$/\1/p' file # -r (ERE) avoids having to escape the parentheses with \
Try this sed command
sed 's/[^:]\+..\([^;]\+\).*/\1/' FileName
Explanation:
[^:]\+ -- match (and so remove) the characters before the :
.. -- match two more characters (the colon and the space)
\([^;]\+\) -- capture everything up to the ;
.* -- match all remaining characters after the captured group
\1 -- replace the whole line with the captured group
Output:
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
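The question notes that the cookie may not be in the second column or right after 'Set-Cookie: '. A sketch that anchors on the cookie name itself could look like this. It assumes that the 8 random characters are letters or digits (`[[:alnum:]]`) and that the value ends at the next ; or at the end of the line. The function name `get_aspsession` is hypothetical.

```shell
# Hypothetical helper: print ASPSESSIONIDxxxxxxxx=value wherever it appears on a line.
get_aspsession() {
  # -n suppresses default output; the p flag prints only lines where the cookie was found.
  # The capture group holds ASPSESSIONID + 8 alphanumerics + '=' + the value up to ';'.
  sed -n 's/.*\(ASPSESSIONID[[:alnum:]]\{8\}=[^;]*\).*/\1/p' "$@"
}

printf '%s\n' 'Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/' | get_aspsession
```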

design a regular expression to print out a list of words that start and end with the same 3 letters unix [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Design a regular expression to print a list of words that start and end with the same 3 letters, for example microcosmic, entrancement, etc. I need it in Unix.
You can use grep (GNU grep 2.16):
grep -E "^(.{3}).*\1$" file.txt
Input file.txt:
microcosmic
hello
entrancement
world
you get:
microcosmic
entrancement
Explanation:
^ : beginning of line
(...) : capturing group, referenced later by \1
.{3} : the first three characters
.* : anything in between
\1 : backreference to the three captured characters
$ : end of line
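Backreferences are part of POSIX basic regular expressions, so the same search should also work without -E if the group is written as \(...\). Here is a minimal sketch; the function name `same_ends` is hypothetical.

```shell
# Hypothetical helper: print lines whose first three characters equal their last three.
same_ends() {
  # ^\(...\)  capture the first three characters
  # .*        anything in between
  # \1$       the same three characters again at the end of the line
  grep '^\(...\).*\1$' "$@"
}

printf '%s\n' microcosmic hello entrancement world | same_ends
```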
EDIT
If you are looking for every word in a text that starts and ends with the same 3 letters:
echo "microcosmic gshgshi entrancement hello world" |
grep -E -o "\b(.{3})\S*\1\b"
you get:
microcosmic
entrancement
\b : word boundary
\S : any non-whitespace character
-o option : print only the matched part of the line
IMPORTANT NOTE
Words like abc or ababa are not matched, because their first and last three letters overlap. In that case you can use awk without regular expressions:
echo "microcosmic gshgshi entrancement hello world abc ababa" |
awk 'length($0)<3{next;}
substr($0,1,3) == substr($0,length($0)-2,3)' RS="[ \n\t]+"
you get:
microcosmic
entrancement
abc
ababa
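The RS="[ \n\t]+" trick relies on a regular-expression record separator. GNU awk supports that, but POSIX leaves a multi-character RS unspecified. A more portable sketch loops over the whitespace-separated fields of each line instead. The name `same_ends_words` is hypothetical.

```shell
# Hypothetical helper: print each word whose first three characters equal its last three.
same_ends_words() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = $i
      n = length(w)
      # Words shorter than 3 characters are skipped; for abc or ababa the
      # first and last three characters overlap, which substr() allows.
      if (n >= 3 && substr(w, 1, 3) == substr(w, n - 2, 3))
        print w
    }
  }' "$@"
}

echo "microcosmic gshgshi entrancement hello world abc ababa" | same_ends_words
```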