I have .csv files in which many rows have a field value like one of these:
scl[0]
scl[1]
scl[2]
sda[1]
sda[2]
sda[3]
I store each value in a variable while reading the CSV files line by line, like this:
set string [$m get cell 0 1]
Now, when I use regexp to check whether the cell contains scl[0], I am unable to pass the square brackets to the regular expression.
I used this syntax:
if {[regexp "scl\[0\]" $string]} {
...
}
But the if condition doesn't get executed.
For scl(0), i.e. with () instead of [] in the CSV file, I used {[regexp "scl\[(\]0\[)\]" $string]}, which worked. I tried to apply the same format to square brackets, but it still doesn't match.
Am I missing something?
Please help.
Thanks
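Note that \ has special meaning inside double quotes: Tcl replaces \[ with a plain [ before regexp sees the pattern, so the regular expression becomes scl[0], in which [0] is a bracket expression that matches the single character 0. So just do: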
regexp "scl\\[0\\]" $string
or:
regexp {scl\[0\]} $string
You could also use string equal: then you only need to worry about one level of quoting:
string equal {scl[0]} $string
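The same pitfall can be reproduced outside Tcl. The following is a minimal Python sketch (not Tcl, and the variable names are only illustrative) showing that an unescaped [0] is read as a bracket expression, while an escaped pattern or a plain equality test matches the literal text:

```python
import re

cell = "scl[0]"

# Unescaped: [0] is a bracket expression matching the single character "0",
# so this pattern matches the text "scl0", not the literal "scl[0]".
unescaped = re.search("scl[0]", cell)

# Escaped: \[ and \] match literal square brackets.
escaped = re.search(r"scl\[0\]", cell)

# re.escape builds the escaped pattern from the literal text.
auto_escaped = re.search(re.escape(cell), cell)

# Plain equality needs no escaping at all (the analogue of `string equal`).
is_equal = cell == "scl[0]"

print(unescaped is None, escaped is not None, auto_escaped is not None, is_equal)
```

When the goal is an exact comparison rather than a pattern match, the equality test is the simplest of these options.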
Documentation:
string
I am working within a VB.Net application and am trying to protect numbers with leading zeros (zip codes, for example) in a bunch of CSV files by declaring them as text, using double quotes (") as text delimiters.
The files are existing files, so I can't go back to the source and regenerate the files.
What would be the proper Regex syntax to find every occurrence of
,0,
and replace it with
,"0
For example, make ,01234, into ,"01234", or ,0011112222, into ,"0011112222",
I know this should be 'brain dead' simple, but I just can't get it to work.
Maybe...
(?<!\d)(0\d+)
Replace with...
"$1"
If you need to protect any single zeros then perhaps...
(?<!\d)(0\d{0,})
Demo: https://regex101.com/r/CFFpu8/1
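As a rough illustration of the two patterns above, here is a small Python sketch (Python rather than VB.Net; the function name and the include_single_zero flag are hypothetical). It assumes the leading-zero numbers are not already quoted:

```python
import re

# A run of digits starting with 0 that is not preceded by another digit.
LEADING_ZERO = re.compile(r'(?<!\d)(0\d+)')
# Variant that also wraps a lone 0.
LEADING_ZERO_OR_SINGLE = re.compile(r'(?<!\d)(0\d*)')


def quote_leading_zeros(line: str, include_single_zero: bool = False) -> str:
    """Wrap numbers with a leading zero in double quotes."""
    pattern = LEADING_ZERO_OR_SINGLE if include_single_zero else LEADING_ZERO
    return pattern.sub(r'"\1"', line)


print(quote_leading_zeros(",01234,"))
print(quote_leading_zeros(",0,", include_single_zero=True))
```

Because the lookbehind only checks for a preceding digit, values such as 100 are left alone.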
In case you also want to avoid decorating numbers that are inside longer strings or are already enclosed by double quotes, you could try the following expression:
/(?<!")\b0\d+\b(?!")/g
In VB.net double quotes would probably have to be escaped:
Dim text As String = "01234, ""01234"", 0011112222, ""0011112222"", 100, 0, 11a00bc00123, 00foo."
Dim output As String = Regex.Replace(text, "(?<!"")(\b0\d+\b)(?!"")", """$1""")
Console.WriteLine(output)
Output:
"01234", "01234", "0011112222", "0011112222", 100, 0, 11a00bc00123, 00foo.
I'm trying to clean a CSV file which has a column with contents like this:
Sometexthere1", "code"=>"47.51-2-01"}]
And I would like to remove everything before the first quote (") in order to keep just this:
Sometexthere1
I know that I can use $` to get everything before some match in regex, but I am not understanding how to keep just the string before the first double quote.
Parameter expansion does this well enough:
# Define a variable
s='Sometexthere1", "code"=>"47.51-2-01"}]'
# expand it, removing the longest possible match (from the end) for '"'*
result=${s%%'"'*}
# demonstrate that result by printing it
printf '%s\n' "$result"
...properly returns Sometexthere1.
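For comparison, the same "keep everything before the first double quote" idea can be sketched in Python without any regex. The function name is only illustrative:

```python
def before_first_quote(s: str) -> str:
    """Return the part of s that precedes the first double quote."""
    # partition splits at the first occurrence of the separator;
    # element 0 is everything before it (the whole string if there is none).
    return s.partition('"')[0]


s = 'Sometexthere1", "code"=>"47.51-2-01"}]'
print(before_first_quote(s))
```

If the string contains no double quote at all, it is returned unchanged.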
You probably mean "delete everything after a double quote"? In Open Refine, you can use this GREL formula :
value.replace(/".+/, "")
> Result : Sometexthere1
I'm trying to clean up a column in my data frame where the rows look like this:
1234, text ()
and I need to keep just the number in all the rows. I used:
df$column = gsub(", text ()", "", df$column)
and got this:
1234()
I repeated the operation with only the parentheses, but they won't go away. I wasn't able to find an example that deals specifically with parentheses being eliminated as unwanted text. sub doesn't work either.
Does anyone know why this isn't working?
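Parentheses are special metacharacters in regex: an unescaped () is an empty group, which is why the literal parentheses were left behind. You should escape them, either with \\ or by wrapping each one in [], or pass fixed = TRUE to match the pattern literally. But in your case you just want to keep the number, so simply remove everything that is not a digit using \\D: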
gsub("\\D", "", "1234, text ()")
## [1] "1234"
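The behaviour is the same in most regex engines. As a minimal sketch in Python (not R; the variable names are only illustrative), the unescaped pattern reproduces the leftover parentheses, and the alternatives all give the bare number:

```python
import re

value = "1234, text ()"

# Unescaped "()" is an empty capture group, so only ", text " is matched
# and the literal parentheses stay in the string.
unescaped = re.sub(r", text ()", "", value)

# Escaped parentheses are matched literally.
escaped = re.sub(r", text \(\)", "", value)

# Literal (non-regex) replacement, comparable to fixed = TRUE in R.
literal = value.replace(", text ()", "")

# Remove every non-digit character.
digits_only = re.sub(r"\D", "", value)

print(unescaped, escaped, literal, digits_only)
```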
If your column always follows the format described above:
1234, text ()
Something like the following should work:
string extractedNumber = Regex.Match(INPUT_COLUMN, @"^\d{4,}").Value;
Reads like: From the start of the string find four or more digits.
I'm fetching node coordinates from a file. Unfortunately for small numbers the following format is used:
-3.014-5
without an "e"; it should read -3.014e-5.
I can't use format because all the functions I found require a floating point number, which the above is not.
So I wanted to use regular expressions to find the "-5" part and replace it by "e-5".
([+-]?[0-9]+)?$ would do that, but how can I use that expression in TCL?
set num -3.014-5
set Enum [ regexp -all { ([+-]?[0-9]+)?$ } $num ]
I get invalid command name "+-", so I replaced the square brackets with ", but then I get 1 as the result. What am I doing wrong?
I don't understand why you get the error message "invalid command name "+-". As long as you have your regular expression inside curly braces {} the expression should not be evaluated by the interpreter.
For me this worked to achieve the desired result:
set Enum [regsub {^([+-]?[.0-9]+)([+-]?[0-9]+)?$} $num {\1e\2}]
Edit:
If you want "normal" numbers (those without an exponent) to remain unchanged you could simply remove the ? from the tail part of the regular expression. In this case the expression will not match and the number remains unchanged:
set Enum [regsub {^([+-]?[.0-9]+)([+-][0-9]+)$} $num {\1e\2}]
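The substitution above carries over to other regex engines. Here is a minimal Python sketch (not Tcl; the function names are hypothetical) that inserts the missing "e" and then parses the result, leaving numbers without an exponent untouched:

```python
import re

# Mantissa (optional sign, digits and dot) followed by a signed exponent
# that is missing its "e".
MISSING_E = re.compile(r'^([+-]?[.0-9]+)([+-][0-9]+)$')


def add_exponent_marker(token: str) -> str:
    """Turn '-3.014-5' into '-3.014e-5'; leave other tokens unchanged."""
    return MISSING_E.sub(r'\1e\2', token)


def parse_coordinate(token: str) -> float:
    """Parse a coordinate that may be written without the 'e'."""
    return float(add_exponent_marker(token))


print(add_exponent_marker("-3.014-5"))
print(parse_coordinate("1.5"))
```

Requiring an explicit sign on the exponent part is what keeps ordinary numbers such as 1.5 from being rewritten.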
I don't know Tcl, but I would guess you need to escape the + and probably the - too.
Try this: set Enum [ regexp -all ([\+-]?[0-9]+)?$ $num ]
or this: set Enum [ regexp -all ([\+\-]?[0-9]+)?$ $num ]
You might need to use \\ instead of \ (I don't know tcl sorry)
I am trying to write a common regular expression for the below 3 cases:
Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv
the.listener.313.480p.hdtv.x264-2hd.mkv
How.I.met.your.mother.s02e07.hdtv.x264-xor.avi
My regular expression should remove the series name from the original string, i.e. the output for the strings above should be:
S07E23_720p_HDTV_X264-DIMENSION.mkv
313.480p.hdtv.x264-2hd.mkv
s02e07.hdtv.x264-xor.avi
For the basic case of the Supernatural string I wrote the regex below, and it worked fine, but as soon as the series name has multiple words it fails.
$string =~ s/^(.*?)[\.\_\- ]//i; #delimiter can be (. - _ )
So I have no idea how to proceed for the cases above. I was thinking along the lines of \w+{1,6}, but that also failed to do what is required.
PS: Explanation of what the regular expression is doing will be appreciated.
You can check whether the token after each . contains a digit; if it does not, consider it part of the name.
However, I personally think there is no perfect solution for this. It would still run into problems with something like:
24.313.480p.hdtv.x264-2hd.mkv // 24
Warehouse.13.s02e07.hdtv.x264-xor.avi // warehouse 13
As StanleyZ said, you'll always get into trouble with names containing numbers.
But if you set these special cases apart, you can try:
#perl
$\=$/;
map {
    if (/^([\w\.]+)[\.\_]([SE\d]+[\.\_].*)$/i) {
        print "Match : Name='$1' Suffix='$2'";
    } else {
        print "Did not match $_";
    }
}
qw!
    Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv
    the.listener.313.480p.hdtv.x264-2hd.mkv
    How.I.met.your.mother.s02e07.hdtv.x264-xor.avi
!;
which outputs :
Match : Name='Supernatural' Suffix='S07E23_720p_HDTV_X264-DIMENSION.mkv'
Match : Name='the.listener' Suffix='313.480p.hdtv.x264-2hd.mkv'
Match : Name='How.I.met.your.mother' Suffix='s02e07.hdtv.x264-xor.avi'
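Since the question asked for an explanation, here is the same pattern as a commented Python sketch (Python rather than Perl; the function name is hypothetical). It relies on greedy matching with backtracking: the name group grabs as much as it can, then gives characters back until what follows a separator is a run of S, E, or digits ending at another separator.

```python
import re

EPISODE_SPLIT = re.compile(
    r'^([\w.]+)'          # series name: word characters (including _) and dots
    r'[._]'               # separator between name and episode token
    r'([SE\d]+[._].*)$',  # episode token (S07E23, 313, ...), separator, rest
    re.IGNORECASE,
)


def split_series_name(filename: str):
    """Return (name, suffix), or None if the pattern does not match."""
    m = EPISODE_SPLIT.match(filename)
    return (m.group(1), m.group(2)) if m else None


for f in (
    "Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv",
    "the.listener.313.480p.hdtv.x264-2hd.mkv",
    "How.I.met.your.mother.s02e07.hdtv.x264-xor.avi",
):
    print(split_series_name(f))
```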
note : aren't you doing something illegal ? ;)