Extracting the rest of the current line after the first substring match only? - regex

I have a string like this
str = ["asap subject ssfs sfdsf sdfsdfsdefs sdfssdf","nsubject qwerty
swqt","dsfsdf sdfsdf sdfsfs sfsdf er:subject adsdsd dsdfs
sdfsdfsdfsds"]
What I want:
str = ["ssfs sfdsf sdfsdfsdefs sdfssdf","qwerty
swqt","adsdsd dsdfs sdfsdfsdfsds"]
I am using:
for i in range(0, len(str)):
    list_i.append(str[i].strip("subject")[1])
But the problem is that when there is long text after "subject", I want only the value from the current line.

Seems like you should be using the str.split function.
Instead of
str[i].strip("subject")[1]
replace that with
str[i].split("subject ",1)[-1]
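This splits the string once, at the first occurrence of "subject " (note the trailing space), and takes the last piece, i.e. everything after it. strip, by contrast, only removes the given characters from both ends of the string; it does not split anything.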

How can I extract a file name based on number string?

I have a list of filenames in a struct array, example:
4x1 struct array with fields:
name
date
bytes
isdir
datenum
where files.name
ans =
ts.01094000.crest.csv
ans =
ts.01100600.crest.csv
etc.
I have another list of numbers (say, 1094000), and I want to find the corresponding file name in the struct.
Please note that 1094000 doesn't have the leading 0. Often there might be other numbers. So I want to search for '1094000' and find that name.
I know I can do it using a regex, but I have never used one before, and I find it difficult to write one for numbers instead of text using strfind. Any suggestion or other method is welcome.
What I have tried:
regexp(files.name,'ts.(\d*)1094000.crest.csv','match');
I think the regular expression you'd want is more like
filenames = {'ts.01100600.crest.csv','ts.01094000.crest.csv'};
matches = regexp(filenames, ['ts\.0*' num2str(1094000) '\.crest\.csv']);
matches = ~cellfun('isempty', matches);
filenames(matches)
For a solution with strfind...
Before R2016b:
match = ~cellfun('isempty', strfind({files.name}, num2str(1094000)),'UniformOutput',true)
files(match)
R2016b and later:
match = contains({files.name}, string(1094000))
files(match)
However, the strfind way might have issues if the number you are looking for exists in unexpected places such as looking for 10 in ["01000" "00101"].
If your filenames match the pattern ts.NUMBER.crest.csv, then in R2016b and later you could do:
str = {files.name};
str = extractBetween(str,4,'.');
str = strip(str,'left','0');
matches = str == string(1094000);
files(matches)

C++: Cut off the end of a string without knowing the length of the result

I have a string like this: 001,"John Marvin","doctor", "full time"
I want to delete everything after the first field (001) with substr, but the length of that field is not always 3, so I cannot write something like this:
string chain = "001,\"John Marvin\",\"doctor\", \"full time\"";
std::string partial = chain.substr(0,3);
How can I proceed in this case?
You could find the index of the first comma and use that to determine where to cut off the string.
Something like:
string chain = "001,\"John Marvin\",\"doctor\", \"full time\"";
size_t cutoff = chain.find(','); // index of the first comma (string::npos if there is none)
string newString = chain.substr(0, cutoff);
Tested here.
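A slightly more defensive variant of the same idea, as a minimal sketch: it wraps the find/substr pair in a function and states explicitly what happens when the string contains no comma. The helper name first_field and its delimiter parameter are illustrative assumptions, not part of the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name chosen for illustration): returns the text before
// the first occurrence of `delimiter`, or the whole string if none is found.
std::string first_field(const std::string& line, char delimiter = ',') {
    const std::size_t cutoff = line.find(delimiter);
    if (cutoff == std::string::npos) {
        return line;  // no delimiter: there is nothing to cut off
    }
    return line.substr(0, cutoff);
}
```

Called on the chain string from the question, first_field(chain) should give 001 regardless of how many digits the first field has.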

Splitting string into a list of substrings

I have a string id <- "Hello these are words N12345678 hooray how fun".
I would like to extract just N12345678 from this string.
So far I have used strsplit(id, " "). Now I have
>id
>[[1]]
>[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
>[8] "fun"
Which is of type list and of length 1 (despite apparently having 8 elements?)
If I then use id <- id[grep("^[N][0-9]",id)],
id is an empty list.
I think what I need to do is split the string into a list of length 8 with each element as a substring and then grep should be able to pick out the pattern, but I'm not sure how to go about that.
Use regmatches
> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
If you insist on using strsplit, I think this can solve the problem:
id <- "Hello these are words N12345678 hooray how fun"
id = strsplit(id, " ")
id[[1]][grep("^N[0-9]", id[[1]])]
Notice that I haven't changed your regex. A more precise expression would be, for example, ^N\\d+$.
Do you know about strtok? It will parse your input line on certain characters. For the purpose of my example, I am breaking off a piece of my string every time I hit a space.
tempVar = strtok(string, " ");
// tempVar has "id" or everything up to the first space
while (tempVar != NULL)
{
tempVar = strtok(NULL, " ");
//now tempVar picked up the next word, and will loop picking up the next word until the end of string
}
Using this, your "Hello these are words N12345678 hooray" would be processed like this:
tempVar would be "Hello", then "these", and so on.
Each time through the loop, tempVar gets a new value. So I would suggest checking tempVar inside the loop (before grabbing the next word) so that you can stop once you have N12345678.
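The strtok loop described above could be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the function name find_id_token is made up, and "the word we want" is taken to mean an N followed only by digits. Since strtok overwrites delimiters in its input, the sketch works on a copy of the line.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the first space-separated word that is 'N'
// followed only by digits, or an empty string if there is no such word.
std::string find_id_token(const std::string& line) {
    // strtok writes '\0' into its input, so tokenize a mutable copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    for (char* word = std::strtok(buffer.data(), " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        const std::size_t length = std::strlen(word);
        if (length < 2 || word[0] != 'N') {
            continue;  // not a candidate, move on to the next word
        }
        bool all_digits = true;
        for (std::size_t i = 1; i < length; ++i) {
            if (!std::isdigit(static_cast<unsigned char>(word[i]))) {
                all_digits = false;
                break;
            }
        }
        if (all_digits) {
            return word;  // stop as soon as the wanted word is found
        }
    }
    return "";
}
```

Note that strtok keeps its position in hidden static state, so it is not safe to interleave two tokenizations or to call it from several threads at once.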
Try:
gsub('\\b[a-zA-Z]+\\b','',id)

Verify and cut a string using regexp in matlab

I have the following string:
{'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921}
I would like to verify the format of the string, namely that the word 'variable' is the second word, and I would like to retrieve the string after the last '/' in the 3rd string (in this example 'D_foo').
How could I verify this and retrieve the string I am searching for?
I tried the following:
regexp(str,'{''\w+'',{''variable'',''([(a-z)|(A-Z)|/|_])+')
without success
REMARK
The string to analyze is not split after the comma; the line break is only due to the length of the string.
EDIT
my string is:
'{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
and not a cell, as could be understood. I added the symbol ' at the start and end of the string to indicate that it is a string.
I realise that you mention using regexp in the question, but I'm not sure if this is a requirement? If other solutions are acceptable you could try this:
str='{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
parts1=textscan( str, '%s','delimiter',{',','{','}'},'MultipleDelimsAsOne',1);
parts2=textscan( parts1{1}{3}, '%s','delimiter',{'/',''''},'MultipleDelimsAsOne',1);
string=parts2{1}{end}
match=strcmp(parts1{1}{2},'variable')
To answer the first part of your question, you can write this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %this holds the cell containing the two strings
if strcmp(temp{1}(1), 'variable')
%do stuff
end
For the second part you can do this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %like before, this contains the cell
temp = temp{1}(2); %this picks out the second string in the cell
temp = char(temp); %turns the item from a cell to a string
res = strsplit(temp, '/'); %splits the string where '/' are found, res is a cell array of strings
string = res{end}; %takes the part after the last '/', however many '/'s there are

How to get last element in tokenized string in C++ separated by "::"?

I'm working in C++ and have a string as follows:
string str = "rake::may.chipola::ninbn::myFuntion";
How can I get the last element of the above string, which is always after the last occurrence of "::"?
Use std::string::rfind() to locate the last occurrence of :: and use std::string::substr() to extract the token:
// Example without confirming that a '::' exists.
std::string last_element(str.substr(str.rfind("::") + 2));
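As the comment above says, that one-liner does not check that "::" is present. If it is missing, rfind returns std::string::npos, and npos + 2 wraps around to 1, so the first character would be silently dropped (or substr would throw for an empty string). A minimal sketch with the check added might look like this; the helper name last_token and the choice to return the whole string when no separator is found are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text after the last occurrence of
// `separator`, or the whole string if the separator does not occur at all.
std::string last_token(const std::string& text,
                       const std::string& separator = "::") {
    const std::size_t pos = text.rfind(separator);
    if (pos == std::string::npos) {
        return text;  // assumption: no separator means a single token
    }
    return text.substr(pos + separator.size());
}
```

With the string from the question, last_token(str) should give myFuntion; using separator.size() instead of a hard-coded 2 keeps the helper usable with other separators.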