I can't figure out this exercise at RegexOne.com: http://regexone.com/example/4?
This was my solution: (^.*\.(jpg|png|gif)$)
The capture text shows up green, but my result is all green checks except for the 3 that I need.
I tried this too: (^.*(\.jpg|\.png|\.gif)$)
Try this:
(^.*)\.(jpg|png|gif)$
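A likely reason the first attempts fail is that their only outer group wraps the whole match, so the captured text includes the extension, while the exercise presumably expects just the file name. A minimal Python sketch of the suggested pattern follows; the function name image_stem and the sample file names are assumptions made for illustration, not part of the exercise.

```python
import re

# Group 1 is the name before the final dot, group 2 the image extension.
IMAGE = re.compile(r"(^.*)\.(jpg|png|gif)$")


def image_stem(filename):
    """Return the name without extension for image files, else None."""
    match = IMAGE.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ["img0912.jpg", "favicon.gif", "access.lock", ".bash_profile"]:
        print(name, "->", image_stem(name))
```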
I have text in cell A1 like this:
âđđ
đ
âď¸đđđŞđ§đ§
I want to extract the unique emojis from this cell into separate cells:
âđđ
âď¸đđđŞđ§
Is this possible?
You want to put each character of the text above into its own cell by splitting it with the built-in functions of Google Sheets.
Sample formula:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
The sample text is in cell A1.
REGEXREPLACE inserts # after each character.
SPLIT then splits the value on #.
Note:
Your text includes \ufe0f, a character that cannot be displayed. In this example it ends up in cell G1, so that cell looks empty even though it holds a value. Please be careful with this. If you don't want that value, use the text with \ufe0f removed.
References:
REGEXREPLACE
SPLIT
Added:
From marikamitsos's comment, I noticed that my understanding was not correct. The final result is as follows; this formula is from marikamitsos.
=TRANSPOSE(UNIQUE(TRANSPOSE(SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#"))))
or try:
=TRANSPOSE(UNIQUE(TRANSPOSE(REGEXEXTRACT(A1, REPT("(.)", LEN(A1))))))
Formula
It appears that one of the best formula solutions is:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
You can also account for skin tone modifiers and joining characters (zero-width joiner, variation selector):
=TRANSPOSE(SPLIT(REGEXREPLACE(A2,"(.[🏻🏼🏽🏾🏿"&CHAR(8205)&CHAR(65039)&"]*)","#$1"),"#"))
This helps keep some multi-character emojis together as a single emoji.
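To make the idea behind the formula concrete, here is a rough Python sketch of the same approach: match one base character plus any trailing skin tone modifiers, zero-width joiners, or variation selectors, then drop duplicates while keeping order. The names split_emojis and unique_emojis are made up for this sketch, and like the formula it is only an approximation of real grapheme splitting.

```python
import re

# One non-space character followed by any run of skin tone modifiers
# (U+1F3FB..U+1F3FF), zero-width joiners (U+200D) or variation
# selectors (U+FE0F), mirroring the character class in the formula.
CLUSTER = re.compile(r"\S[\U0001F3FB-\U0001F3FF\u200D\uFE0F]*")


def split_emojis(text):
    """Split text into approximate emoji clusters, skipping whitespace."""
    return CLUSTER.findall(text)


def unique_emojis(text):
    """Return the clusters with duplicates removed, first occurrence kept."""
    return list(dict.fromkeys(split_emojis(text)))


if __name__ == "__main__":
    # thumbs up + skin tone (twice, on two lines), heart + U+FE0F, grinning face
    sample = "\U0001F44D\U0001F3FD\n\U0001F44D\U0001F3FD\u2764\uFE0F\U0001F600"
    print(unique_emojis(sample))
```

Sequences joined by a zero-width joiner are still split after each joiner here, which is one reason a dedicated grapheme splitter is more precise.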
Script
A more precise way is to use a script:
https://github.com/orling/grapheme-splitter/blob/master/index.js
Add the code from that file to the Script editor.
Add this code for sample usage:
function splitEmojis(string) {
  var splitter = new GraphemeSplitter();
  // split the string to an array of grapheme clusters (one string each)
  var graphemes = splitter.splitGraphemes(string);
  return graphemes;
}
Tests
Not 100% precise
1
Please note: some emojis are not shown correctly in Sheets.
🏴󠁧󠁢󠁷󠁬󠁳󠁿🏴󠁧󠁢󠁳󠁣󠁴󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴
These emojis:
flag: England
flag: Scotland
flag: Wales
black flag
are the same for Google Sheets.
2
The VLOOKUP function in #GoogleSheets and in #Excel thinks the characters #️⃣ and *️⃣ are the same!
So, for example, I have this list:
10text
11text
12text
13text
14text
15text
16text
17text
18text
19text
Now I need to copy this list and turn it into the 20-29 and 30-39 ranges with a regex.
So only the very first 1 needs to be changed into a 2, then a 3, and so on.
I can't figure out what the regex should be.
I tried: 1.text 1*text
I have been doing some reading, but probably because it isn't in my native language, it is still magic to me.
http://www.ntu.edu.sg/home/ehchua/programming/howto/regexe.html
https://www.regular-expressions.info/tutorial.html
http://2017.compciv.org/guide/topics/end-user-software/atom/how-to-use-regex-atom.html
Reference - What does this regex mean?
https://stackoverflow.com/tags/regex/info
What I was hoping for
When I fill in this in the search field:
1*text
and this in the replace field:
2*text
then on pressing "change all" the list becomes
20text
21text
22text
23text
24text
25text
26text
27text
28text
29text
I don't know exactly what you are trying to do here, but if you want to convert 1xtext to 2xtext, then try searching for this pattern:
^1
and then replace with just 2. Or, more generally, to match something like (1xtext) you could try using the pattern \b1, and again replace with 2.
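The attempt 1*text does not work because * is a quantifier ("zero or more 1s"), not a wildcard, and the replace field is inserted literally. As a small illustration outside the editor, the anchored pattern ^1 can be tried in Python; the helper name shift_decade is an assumption made up for this sketch.

```python
import re


def shift_decade(lines, digit):
    """Replace only a leading 1 on each line with the given digit."""
    return [re.sub(r"^1", digit, line) for line in lines]


if __name__ == "__main__":
    items = [f"{n}text" for n in range(10, 20)]  # 10text .. 19text
    print(shift_decade(items, "2"))
    print(shift_decade(items, "3"))
```

Because the pattern is anchored with ^, the second 1 in 11text is left alone.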
I want to keep the words tagged NA. If several such words appear consecutively, I want to combine them into a single entry.
Example:
%if i have
a='[The/D, handle/NA, of/NS, the/NaAq, hair/NA, brush/NA, is/NaAZ broken/A]'
% the output I want:
output={'handle', 'hair brush'}
I tried searching for /NA, but the problem is that there are false positives: the and is.
Currently my code is:
g=split(a(2:end-1));
b= strfind(g,'/NA');
g(~cellfun(@isempty, b))
Any ideas how to proceed? Any one-line regular expression will be very helpful if possible.
Looks like a nice NLP problem. Maybe this gets you started:
a='[The/D, handle/NA, of/NS, the/NaAq, hair/NA, brush/NA, is/NaAZ broken/A]';
output={'handle', 'hair brush'};
expr = '(\S+/NA, )+'; % look for words followed by '/NA, '
match = regexp(a,expr,'match');
output = strtrim(strrep(match,'/NA,','')) % strrep: get rid of tag - strtrim: get rid of trailing blank
Note that this approach will fail if the last word is tagged with /NA. You can catch that case independently though.
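One way to cover the last-word case is to stop requiring the comma after the tag and instead check that /NA is not followed by another letter. The sketch below shows that idea in Python rather than MATLAB; the function name na_phrases and both patterns are assumptions for illustration, not a drop-in replacement for the code above.

```python
import re

# A word tagged exactly /NA (not /NaAq, /NAx, ...).
WORD = re.compile(r"([^\s/\[\]]+)/NA(?![A-Za-z])")
# A run of consecutive /NA words, separated by commas and/or spaces.
RUN = re.compile(r"(?:[^\s/\[\]]+/NA(?![A-Za-z])[,\s]*)+")


def na_phrases(tagged):
    """Join each run of adjacent /NA words into one phrase."""
    return [" ".join(WORD.findall(m.group(0))) for m in RUN.finditer(tagged)]


if __name__ == "__main__":
    a = "[The/D, handle/NA, of/NS, the/NaAq, hair/NA, brush/NA, is/NaAZ broken/A]"
    print(na_phrases(a))
```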
I'm trying to use regex to find all variable initializations or assignments in code.
Currently I have
(\w+|\w[_])\s*=\s*(\d+\.\d+|.*)
which works but also finds commented out code like
// a = 100; which I don't want it to do. I've tried
([^/]\w+|\w[_])\s*=\s*(\d+\.\d+|.*)
which I thought should ignore strings that start with / but that doesn't work.
Edit:
For example I'd like it to find lines like
b = 200;
but not // c = 3;
Try this, and adapt it if necessary.
^(?:(?!\/\/).)*[a-z][a-z0-9\_]*\s*=\s*[0-9]+;
SEE DEMO: http://regex101.com/r/jE4vM0/3
Use this regex and check whether the first sub-match is "//"; if so, the assignment is inside a comment.
(//)*\s*(\w+|\w[_])\s*=\s*(\d+\.\d+|.*)
For example, "var=5;" gives three sub-matches: an empty one, "var", and "5;" (the .* alternative also takes the trailing semicolon), while "//var=5;" gives "//", "var", and "5;".
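The reason [^/]\w+ does not help is that it only forbids a slash immediately before the name, and the regex can still start matching further along the line. As a hedged sketch of the first answer's idea (refuse to step over "//" before reaching the name), here is a Python version; the pattern is a simplified variant written for this example and the name find_assignments is hypothetical. It ignores strings and block comments.

```python
import re

# From the start of a line, skip characters as long as no "//" begins,
# then capture an identifier, a single "=", and the value up to ";".
ASSIGN = re.compile(
    r"^(?:(?!//).)*?\b([A-Za-z_]\w*)[ \t]*=(?!=)[ \t]*([^;\n]+);",
    re.MULTILINE,
)


def find_assignments(source):
    """Return (name, value) for the first uncommented assignment per line."""
    return [(m.group(1), m.group(2).strip()) for m in ASSIGN.finditer(source)]


if __name__ == "__main__":
    code = "b = 200;\n// c = 3;\nint d = 4; // e = 5;\n"
    print(find_assignments(code))
```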
I'm trying to parse a log file that looks like this:
%%%% 09-May-2009 04:10:29
% Starting foo
this is stuff
to ignore
%%%% 09-May-2009 04:10:50
% Starting bar
more stuff
to ignore
%%%% 09-May-2009 04:11:29
...
This excerpt contains two time periods I'd like to extract, from the first delimiter to the second, and from the second to the third. I'd like to use a regular expression to extract the start and stop times for each of these intervals. This mostly works:
p = '%{4} (?<start>.*?)\n% Starting (?<name>.*?)\n.*?%{4} (?<stop>.*?)\n';
times = regexp(c,p,'names');
Returning:
times =
1x16 struct array with fields:
start
name
stop
The problem is that this only captures every other period, since the second delimiter is consumed as part of the first match.
In other languages, you can use lookaround operators (lookahead, lookbehind) to solve this problem. The documentation on regular expressions explains how these work in MATLAB, but I haven't been able to get these to work while still capturing the matches. That is, I not only need to be able to match every delimiter, but also I need to extract part of that match (the timestamp).
Is this possible?
P.S. I realize I can solve this problem by writing a simple state machine or by matching on the delimiters and post-processing, if there's no way to get this to work.
Update: Thanks for the workaround ideas, everyone. I heard from the developer and there's currently no way to do this with the regular expression engine in MATLAB.
MATLAB seems unable to capture characters as a token without removing them from the string (or, I should say, I was unable to do so using MATLAB REGEXP). However, by noting that the stop time for one block of text is equal to the start time of the next, I was able to capture just the start times and the names using REGEXP, then do some simple processing to get the stop times from the start times. I used the following sample text:
c =
%%%% 09-May-2009 04:10:29
% Starting foo
this is stuff
to ignore
%%%% 09-May-2009 04:10:50
% Starting bar
more stuff
to ignore
%%%% 09-May-2009 04:11:29
some more junk
...and applied the following expression:
p = '%{4} (?<start>[^\n]*)\n% Starting (?<name>[^\n]*)[^%]*|%{4} (?<start>[^\n]*).*';
The processing can then be done with the following code:
names = regexp(c,p,'names');
[names.stop] = deal(names(2:end).start,[]);
names = names(1:end-1);
...which gives us these results for the above sample text:
>> names(1)
ans =
start: '09-May-2009 04:10:29'
name: 'foo'
stop: '09-May-2009 04:10:50'
>> names(2)
ans =
start: '09-May-2009 04:10:50'
name: 'bar'
stop: '09-May-2009 04:11:29'
If you are doing a lot of parsing and similar work, you might consider using Perl from within MATLAB. It gives you access to Perl's powerful regex engine and might also make many other problems easier to solve.
All you should have to do is to wrap a lookahead around the part of the regex that matches the second timestamp:
'%{4} (?<start>.*?)\n% Starting (?<name>.*?)\n.*?(?=%{4} (?<stop>.*?)\n)'
EDIT: Here it is without named groups:
'%{4} (.*?)\n% Starting (.*?)\n.*?(?=%{4} (.*?)\n)'
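For comparison, in an engine that keeps captures made inside a lookahead, this approach does return every period. The following Python sketch uses the sample log from the question; it says nothing about how MATLAB behaves, and the names LOG, PERIOD and periods are assumptions for this example.

```python
import re

LOG = """%%%% 09-May-2009 04:10:29
% Starting foo
this is stuff
to ignore
%%%% 09-May-2009 04:10:50
% Starting bar
more stuff
to ignore
%%%% 09-May-2009 04:11:29
...
"""

# The stop timestamp is captured inside a lookahead, so the next
# delimiter is not consumed and can start the following match.
PERIOD = re.compile(
    r"%{4} (?P<start>[^\n]*)\n"
    r"% Starting (?P<name>[^\n]*)\n"
    r".*?"
    r"(?=%{4} (?P<stop>[^\n]*)\n)",
    re.DOTALL,
)


def periods(text):
    """Return one dict with start, name and stop per logged period."""
    return [m.groupdict() for m in PERIOD.finditer(text)]


if __name__ == "__main__":
    for period in periods(LOG):
        print(period)
```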