Matching multiple quoted strings in a single line with regex - regex

I want to match quoted strings of the form 'a string' within a line. My issue comes with the fact that I may have multiple strings like this in a single line. Something like
result = functionCall('Hello', 5, 'World')
I can search for phrases bounded by strings with ['].*['], and that picks up quoted strings just fine if there is a single one in a line. But with the above example it would find 'Hello', ', 5, ' and 'World', when I only actually want 'Hello' and 'World'. Obviously I need some way of knowing how many ' precede the currently found ' and not try to match when there is an odd amount.
Just to note, in my case strings are only defined using ', never ".

You should use [^']+ between the quotes, so that a match cannot run past the next quote:
var myString = "result = functionCall('Hello', 5, 'World')";
var parts = myString.match(/'[^']+'/g);
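For comparison, here is a minimal sketch of the same idea in Python (the variable names are illustrative and not part of the question). Note that [^']+ requires at least one character between the quotes, so an empty string '' would be skipped; [^']* would allow it.

```python
import re

line = "result = functionCall('Hello', 5, 'World')"

# A negated character class stops each match at the next quote,
# so the text between two separate strings is never swallowed.
quoted = re.findall(r"'[^']+'", line)
print(quoted)
```

Each element of the result still includes its surrounding quotes, just like the JavaScript match above.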

Related

How to determine if variable contains a specified string using RegEx

How can I write a condition that compares Recipient.AddressEntry with, for example, the string "I351" using RegEx?
Here is my If condition which works but is hardcoded to every known email address.
For Each recip In recips
If recip.AddressEntry = "Dov John, I351" Then
objMsg.To = "example@mail.domain"
objMsg.CC = recip.Address
objMsg.Subject = Msg.Subject
objMsg.Body = Msg.Body
objMsg.Send
End If
Next
The reason I need this condition is that an email may include one of several colleagues from my team and one or more people from another team. The AddressEntry of each of my colleagues ends with I351, so I want to check whether the email contains one of my teammates.
For Each recip In recips
If (recip.AddressEntry = "Dov John, I351" _
Or recip.AddressEntry = "Vod Nohj, I351") Then
objMsg.To = "example@mail.domain"
objMsg.CC = recip.Address
objMsg.Subject = Msg.Subject
objMsg.Body = Msg.Body
objMsg.Send
End If
Next
You still didn't clarify exactly what the condition you want to use for matching is, so I'll do my best:
If you simply want to check if the string ends with "I351", you don't need regex, you can use something like the following:
If recip.AddressEntry Like "*I351" Then
' ...
End If
If you want to check if the string follows this format "LastName FirstName, I351", you can achieve that using Regex by using something like the following:
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
Dim regEx As New RegExp
regEx.Pattern = "^\w+\s\w+,\sI351$"
If regEx.Test(recip.AddressEntry) Then
' ...
End If
Explanation of the regex pattern:
' ^ Asserts position at the start of the string.
' \w Matches any word character.
' + Matches between one and unlimited times.
' \s Matches a whitespace character.
' \w+ Same as above.
' , Matches the character `,` literally.
' \s Matches a whitespace character.
' I351 Matches the string `I351` literally.
' $ Asserts position at the end of the string.
Hope that helps.
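The two checks can also be sketched outside VBA. The following Python version is only an illustration; the sample entries are hypothetical stand-ins for recip.AddressEntry.

```python
import re

# Hypothetical sample values standing in for recip.AddressEntry.
entries = ["Dov John, I351", "Vod Nohj, I351", "Someone Else, X999"]

# Plain suffix check, analogous to: Like "*I351"
by_suffix = [e for e in entries if e.endswith("I351")]

# Stricter format check, analogous to the RegExp pattern above.
name_pattern = re.compile(r"^\w+\s\w+,\sI351$")
by_format = [e for e in entries if name_pattern.search(e)]

print(by_suffix)
print(by_format)
```

The suffix check is enough when only the team code matters; the regex adds a check on the shape of the whole entry.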

Extract table key-values from Lua code

I have multiple strings from Lua code, each one containing a Lua table item, something like:
atable['akeyofthetable'] = { 'name' = 'a name', 'thevalue' = 34, 'anotherkey' = 'something' }
The string might span multiple lines, meaning it might be:
atable['akeyofthetable'] = { 'name' = 'a name',
'thevalue' = 34,
"anotherkey" = 'something' }
How can I get some of the keys (e.g. only name and anotherkey in the above example) with their values as "re.match" objects in Python 3 from that string? Because this is taken from code, the existence of keys is not guaranteed, the quoting of keys and values (double or single quotes) may vary, even from key to key, and there may be empty values ('name' = '') or unquoted strings as values ('thevalue' = anonquotedstringasvalue). Even the order of the keys is not guaranteed. Splitting on commas (,) does not work because some string values contain commas (e.g. 'anotherkey' = 'my beloved, strange, value' or even 'anotherkey' = "my beloved, 'strange' = 34, value"). Keys also may or may not be quoted (if the names are ASCII, they will probably not be quoted).
Is it possible to do this using one regex, or must I do a separate search for every key needed?
Code
If there is a possibility of escaped quotes \' or \" within the string, you can substitute the respective capture groups for '((?:[^'\\]|\\.)*)' as seen here.
See regex in use here
['\"](?:name|anotherkey)['\"]\s*=\s*(?:'([^']*)'|\"([^\"]*)\")
Usage
See code in use here
import re
keys = [
"name",
"anotherkey"
]
r = r"['\"](" + "|".join([re.escape(key) for key in keys]) + r")['\"]\s*=\s*(?:'([^']*)'|\"([^\"]*)\")"
s = "atable['akeyofthetable'] = { 'name' = 'a name',\n\t 'thevalue' = 34, \n\t \"anotherkey\" = 'something' }"
print(re.findall(r, s))
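The escaped-quote variant mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than part of the original answer: the helper name extract and the sample string are hypothetical, and keys are still assumed to be quoted.

```python
import re

keys = ["name", "anotherkey"]  # hypothetical list of wanted keys
key_alt = "|".join(re.escape(k) for k in keys)

# Quoted values that may contain backslash-escaped characters.
SINGLE = r"'((?:[^'\\]|\\.)*)'"
DOUBLE = r'"((?:[^"\\]|\\.)*)"'
pattern = re.compile(r"""['"](%s)['"]\s*=\s*(?:%s|%s)""" % (key_alt, SINGLE, DOUBLE))

def extract(text):
    """Return {key: raw value}; escape sequences are left undecoded."""
    found = {}
    for m in pattern.finditer(text):
        # Group 2 is set for single-quoted values, group 3 for double-quoted.
        found[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)
    return found

sample = r"""atable['k'] = { 'name' = 'O\'Brien', "anotherkey" = "say \"hi\", ok" }"""
result = extract(sample)
print(result)
```

Using finditer instead of findall gives access to the match objects themselves, which is what the question asks for.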
Explanation
The second point below is replaced by a join of the keys array.
['\"] Match any character in the set '"
(name|anotherkey) Capture the key into capture group 1
['\"] Match any character in the set '"
\s* Match any number of whitespace characters
= Match this literally
\s* Match any number of whitespace characters
(?:'([^']*)'|\"([^\"]*)\") Match either of the following
'([^']*)' Match ', followed by any character except ' any number of times, followed by '
\"([^\"]*)\" Match ", followed by any character except " any number of times, followed by "

Avoid repeating regex substitution

I have lines of code (making up a Ruby hash) with the form of:
"some text with spaces" => "some other text",
I wrote the following vim style regex pattern to achieve my goal, which is to replace any spaces in the string to the left of the => with +:
:%s/\(.*\".*\)\ (.*\"\ =>.*\,)/\1+\2
Expected output:
"some+text+with+spaces" => "some other text",
Unfortunately, this only replaces the space nearest to the =>. Is there another pattern that will replace all the spaces in one run?
Rather than writing one large, complex regex, a couple of smaller ones would be easier:
:%s/".\{-}"/\=substitute(submatch(0), ' ', '+', 'g')
For instance, this captures everything in quotes (escaped quotes break it) and then replaces all spaces inside the matched string with pluses.
If you want it to work with escaped quotes inside the string, you just need to replace ".\{-}" with a slightly more complex regex "\(\\.\|[^\"]\)*"
:%s/"\(\\.\|[^\"]\)*"/\=substitute(submatch(0), ' ', '+', 'g')
If you want to restrict the lines that this substitute runs on use a global command.
:g/=>/s/"\(\\.\|[^\"]\)*"/\=substitute(submatch(0), ' ', '+', 'g')
So this will only run on lines with =>.
Relevant help topic :h sub-replace-expression
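The match-then-substitute idea is not specific to Vim. As a rough sketch, Python's re.sub accepts a function as the replacement, which plays the role of the \= expression. The function name is hypothetical, and count=1 is an added assumption that limits the change to the first quoted string on the line (the key), which the Vim commands above do not do.

```python
import re

# A double-quoted string that may contain backslash-escaped characters.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')

def plus_for_spaces_in_key(line: str) -> str:
    """Replace spaces with '+' inside the first quoted string on the line."""
    return QUOTED.sub(lambda m: m.group(0).replace(" ", "+"), line, count=1)

print(plus_for_spaces_in_key('"some text with spaces" => "some other text",'))
```

Dropping count=1 would make it behave like the Vim substitutions, which rewrite every quoted string on the line.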
It's far from perfect, but it nearly does the job:
:%s/\s\ze[^"]*"\s*=>\s*".*"/+/g
But it doesn't handle escaped quotes, so the following line won't be replaced correctly:
"some \"big text\" with many spaces" => "some other text",

Splitting Two Characters In a String - Perl

I'm trying to split this string. Here's the code:
my $string = "585|487|314|1|1,651|365|302|1|1,585|487|314|1|1,651|365|302|1|1,656|432|289|1|1,136|206|327|1|1,585|487|314|1|1,651|365|302|1|1,585|487|314|1|1,651|365|302|1|1%656|432|289|1|1%136|206|327|1|1%654|404|411|1|1";
my @ids = split(",", $string);
What I want is to split only on % and , in the string. I was told that I could use a pattern, something like this? /[^a-zA-Z0-9_]/
Character classes can be used to represent a group of possible single characters that can match. And the ^ symbol at the beginning of a character class negates the class, saying "Anything matches except for ...." In the context of split, whatever matches is considered the delimiter.
That being the case, [^a-zA-Z0-9_] would match any character except for the ASCII letters 'a' through 'z', 'A' through 'Z', and the numeric digits '0' through '9', plus underscore. In your case, while this would correctly split on "," and "%" (since they're not included in a-z, A-Z, 0-9, or _), it would mistakenly also split on "|", as well as any other character not included in the character class you attempted.
In your case it makes a lot more sense to be specific as to what delimiters to use, and to not use a negated class; you want to specify the exact delimiters rather than the entire set of characters that delimiters cannot be. So as mpapec stated in his comment, a better choice would be [%,].
So your solution would look like this:
my @ids = split /[%,]/, $string;
Once you split on '%' and ',', you'll be left with a bunch of substrings that look like this: 585|487|314|1|1 (or some variation on those numbers). In each case, it's five positive integers separated by '|' characters. It seems possible to me that you'll end up wanting to break those down as well by splitting on '|'.
You could build a single data structure represented by a list of lists, where each top-level element represents a [,%]-delimited field and consists of a reference to an anonymous array holding the pipe-delimited fields. The following code will build that structure:
my @ids = map { [ split /\|/, $_ ] } split /[%,]/, $string;
When that is run, you will end up with something like this:
@ids = (
[ '585', '487', '314', '1', '1' ],
[ '651', '365', '302', '1', '1' ],
# ...
);
Now each field within an ID can be inspected and manipulated individually.
To understand more about how character classes work, you could check perlrequick, which has a nice introduction to character classes. And for more information on split, there's always perldoc -f split (as mentioned by mpapec). split is also discussed in chapter nine of the O'Reilly book, Learning Perl, 6th Edition.
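The same two-level split can be sketched in Python for comparison. The input below is a shortened sample of the string from the question, and the variable names are illustrative.

```python
import re

data = "585|487|314|1|1,651|365|302|1|1%656|432|289|1|1"  # shortened sample

# Split on either delimiter, then break each record on the pipe.
records = [chunk.split("|") for chunk in re.split(r"[%,]", data)]

# The negated class also treats "|" as a delimiter, flattening everything.
flattened = re.split(r"[^a-zA-Z0-9_]", data)

print(records)
print(len(flattened))
```

The second split shows why the negated class is the wrong tool here: the record boundaries are lost because every non-word character acts as a delimiter.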

Extract text between single quotes in MATLAB

I have multiple lines in some text files such as
.model sdata1 s tstonefile='../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p' passive=2
I want to extract the text between the single quotes in MATLAB.
Much help would be appreciated.
To get all of the text inside multiple '' blocks, regexp can be used as follows:
regexp(txt,'''(.[^'']*)''','tokens')
This says to get text surrounded by ' characters, which does not include a ' in the captured text. For example, consider this file with two lines (I made up a different file name for the second one),
txt = ['.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2 ', char(10), ...
'.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2']
>> stringCell = regexp(txt,'''(.[^'']*)''','tokens');
>> stringCell{:}
ans =
'../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'
ans =
'../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'
>>
Trivia:
char(10) gives a newline character because 10 is the ASCII code for newline.
The . character in a regexp pattern (regex in the rest of the coding world) usually does not match a newline, which would make this a safer pattern. In MATLAB, a dot in regexp does match a newline, so to disable this, we could add 'dotexceptnewline' as the last input argument to regexp. This is convenient to ensure we don't get the text outside of the quotes instead, but it is not needed since the first match sets the precedent.
Instead of excluding a ' from the match with [^''], the match can be made non-greedy with ? as follows, regexp(txt,'''(.*?)''','tokens').
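Both variants, the negated class and the non-greedy quantifier, can be compared in a short Python sketch. The file names are shortened, hypothetical stand-ins for the ones in the question.

```python
import re

# Two lines of sample text; the file names are made up for illustration.
txt = (
    ".model sdata1 s tstonefile='../data/a.s50p' passive=2\n"
    ".model sdata1 s tstonefile='../data/b.s00p' passive=2"
)

# Negated class: the capture cannot contain a quote.
excluded = re.findall(r"'([^']*)'", txt)

# Non-greedy quantifier: the capture stops at the first closing quote.
lazy = re.findall(r"'(.*?)'", txt)

print(excluded)
print(lazy)
```

In Python, unlike MATLAB, the dot does not match a newline unless re.DOTALL is passed, so the non-greedy form cannot cross lines here.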
If you plan to use textscan:
fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','''');
fclose(fid);
output = rawdata{:}(2)
As also used in other answers, the single apostrophe ' is represented by a double one: '', e.g. for delimiters.
Considering the comment:
fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','\n');
fclose(fid);
lines = rawdata{1,1};
L = size(lines,1);
output = cell(L,1);
for ii=1:L
temp = textscan(lines{ii},'%s','delimiter','''');
output{ii,1} = temp{:}(2);
end
One easy way is to split the string with single quote delimiter and take the even-numbered strings in the output:
str = fileread('test.txt');
out = regexp(str, '''', 'split');
out = out(2:2:end);
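The split-and-pick approach translates directly to other languages. A minimal Python sketch, using a made-up sample line, looks like this; because Python indexes from 0, the even-numbered pieces in MATLAB terms are the ones at indices 1, 3, 5 and so on.

```python
# Hypothetical sample line with two quoted values.
txt = "tstonefile='first.s50p' other='second.s00p' passive=2"

# Splitting on the quote alternates between outside and inside text.
pieces = txt.split("'")
inside = pieces[1::2]

print(inside)
```

This relies on the quotes being balanced; an unmatched quote would shift which pieces count as inside.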
You can do this using regular expressions. Assuming that there is only one occurrence of text between quotation marks:
% select all chars between single quotation marks.
out = regexp(inputString,'''(.*)''','tokens','once');
After identifying which lines you want to extract info from, you could tokenize them or, if they all have the same form, do something like this:
test='.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2';
a=strfind(test,'''')
test=test(a(1):a(2))