Google Sheets: Conditional Formatting Based On a Cell - If Contains String - regex

In Google Sheets, is it possible to apply conditional formatting to cell A1 so that its colour changes if B1 contains the string "Hello World!", without B1 having to match it exactly?

Try this as a custom formula in the conditional formatting rule:
=REGEXMATCH(LOWER(B1), "hello world")

Related

Extract a list of unique text characters/ emojis from a cell

I have text in a cell (A1) like this:
✌😋👅👅☝️😉🍌🍪💧💧
I want to extract the unique emojis from this cell into separate cells:
✌😋👅☝️😉🍌🍪💧
Is this possible?
You want to put each character of ✌😋👅👅☝️😉🍌🍪💧💧 into its own cell by splitting the text with the built-in functions of Google Sheets.
Sample formula:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
✌😋👅👅☝️😉🍌🍪💧💧 is in cell A1.
REGEXREPLACE puts a # after each character, giving ✌#😋#👅#👅#☝#️#😉#🍌#🍪#💧#💧#.
SPLIT then splits the value at each #.
Note:
The text in your question includes a character that cannot be displayed on its own: \ufe0f. Cell G1 therefore looks empty even though it contains a value, so please be careful with it. If you want to remove that character, you can use ✌😋👅👅☝😉🍌🍪💧💧 instead.
References:
REGEXREPLACE
SPLIT
Added:
marikamitsos's comment showed me that I had misunderstood the question. The final result is as follows; this formula is from marikamitsos.
=TRANSPOSE(UNIQUE(TRANSPOSE(SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#"))))
or try:
=TRANSPOSE(UNIQUE(TRANSPOSE(REGEXEXTRACT(A1, REPT("(.)", LEN(A1))))))
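For comparison, the same split-then-deduplicate idea can be sketched in JavaScript, for example as an Apps Script custom function. This is only an illustration under stated assumptions: the name uniqueChars is made up here and does not come from the answers. Like the "(.)" pattern, it works on Unicode code points, so an invisible variation selector such as \ufe0f still counts as a character of its own.

```javascript
// Hypothetical helper, not part of the answers above.
// Array.from() iterates a string by Unicode code point, so an emoji stored as a
// surrogate pair stays in one piece; a Set keeps the first occurrence of each.
function uniqueChars(text) {
  return Array.from(new Set(Array.from(text)));
}
```

Returning the array from a custom function should spill the values into neighbouring cells, much like TRANSPOSE(UNIQUE(...)) does.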
Formula
It appears that one of the best formula-based solutions is:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
You can also add extra handling for skin tones and joining characters:
=TRANSPOSE(SPLIT(REGEXREPLACE(A2,"(.[🏻🏼🏽🏾🏿"&CHAR(8205)&CHAR(65039)&"]*)","#$1"),"#"))
This keeps some multi-part emojis together as a single emoji.
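A rough JavaScript equivalent of that pattern, as a sketch only (the function name is hypothetical): each piece is one code point followed by any run of skin-tone modifiers, zero-width joiners or variation selectors. As with the formula, the emoji that follows a zero-width joiner still starts a new piece, so this is not full grapheme splitting.

```javascript
// Hypothetical helper mirroring the formula's pattern: one code point followed by
// any run of skin-tone modifiers (U+1F3FB..U+1F3FF), zero-width joiners
// (U+200D, i.e. CHAR(8205)) or variation selectors (U+FE0F, i.e. CHAR(65039)).
function splitKeepingModifiers(text) {
  var pattern = /.[\u{1F3FB}-\u{1F3FF}\u200D\uFE0F]*/gu;
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}
```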
Script
A more precise way is to use a script:
https://github.com/orling/grapheme-splitter/blob/master/index.js
↑
Add this code to the Script editor.
Then add code for sample usage:
function splitEmojis(string) {
  var splitter = new GraphemeSplitter();
  // split the string to an array of grapheme clusters (one string each)
  var graphemes = splitter.splitGraphemes(string);
  return graphemes;
}
Tests
Not 100% precise
1
Please note: some emojis are not shown correctly in Sheets
🏴󠁧󠁢󠁷󠁬󠁳󠁿🏴󠁧󠁢󠁳󠁣󠁴󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴
↑ emojis:
flag: England
flag: Scotland
flag: Wales
black flag
are the same for Google Sheets.
2
The VLOOKUP function in #GoogleSheets and in #Excel thinks the chars
#️⃣ and
*️⃣
are the same!

String excerpts

I would like to copy a certain string (out of several strings in one cell) and show it in a different cell in Google Sheets. This is what the source cells in A1:A contain:
"String 1","String 2","String 3"
In B1:B I'd like ONLY String 3, so without the "" and the other strings.
Is this possible with spreadsheets?
Or is there any other way of doing so?
Update
So the task is to get the text inside double quotes, where the matching string is at the end of the text.
You may use regular expressions to deal with that, the basic formula is:
=REGEXEXTRACT(A1,"([^""]+)""$")
This returns the text inside the last pair of double quotes, provided the text in cell A1 ends with that quoted string.
For example:
some text...,"Thisthat","https://www.url.com/de/Thisthat"
gives https://www.url.com/de/Thisthat
You may also use arrayformula:
=ArrayFormula(REGEXEXTRACT(A1:A3,"([^""]+)""$"))
You can read more about these functions in the Google Sheets documentation for REGEXEXTRACT and ARRAYFORMULA.
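To see what the pattern ([^"]+)"$ does outside of Sheets, here is a small JavaScript sketch. The function name lastQuoted is made up for this illustration; in the Sheets formula the quote is doubled ("") only because it sits inside a string literal.

```javascript
// Hypothetical helper: returns the text inside the last pair of double quotes,
// or null when the text does not end with a quoted string.
function lastQuoted(text) {
  // [^"]+ grabs a run of non-quote characters; "$ requires a closing quote
  // at the very end of the text.
  var match = /([^"]+)"$/.exec(text);
  return match ? match[1] : null;
}
```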
Old answer
If you want the strings to stay on their own rows, use this formula in B1:
=ArrayFormula(if(A1:A = "String 3";A1:A;""))
If some cells in A1:A contain 'String 3' as part of a longer text and you want to match them too, use this:
=ArrayFormula(if(REGEXMATCH(A1:A , "String 3"),"String 3",""))

Google Sheets Trim after either of characters

I have a column of cells in Google Sheets with text in the following formats:
PLAYBILL59; Code Description Here
BROADWAYBOX59: Code Description Here
TICKETCODE: Code Description Here
I want to create a formula that deletes everything after and including either a colon or semi-colon, that would leave:
PLAYBILL59
BROADWAYBOX59
TICKETCODE
I've been trying for hours with no luck.
Any suggestions are much appreciated.
Let's say that your column is A; then you can use REGEXEXTRACT in your formula like this:
=REGEXEXTRACT(A1; "[A-Z0-9-a-z]+")
Assuming your text string is in A1, try:
=SUBSTITUTE(SUBSTITUTE(A1, "; Code Description Here",""), ": Code Description Here", "")
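A more general way to express "keep everything before the first colon or semicolon" can be sketched in JavaScript as follows. It assumes the code itself never contains a colon or semicolon, and the function name codeOnly is hypothetical.

```javascript
// Hypothetical helper: keeps only the part before the first ':' or ';'.
function codeOnly(text) {
  // split() on a character class cuts at either delimiter; element 0 is the code.
  return text.split(/[:;]/)[0].trim();
}
```

Unlike the SUBSTITUTE formula, this does not depend on the description text being the same in every row.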

How can I normalize / asciify Unicode characters in Google Sheets?

I'm trying to write a formula for Google Sheets which will convert Unicode characters with diacritics to their plain ASCII equivalents.
I see that Google uses RE2 in its "REGEXREPLACE" function. And I see that RE2 offers Unicode character classes.
I tried to write a formula (similar to this one):
REGEXREPLACE("público","(\pL)\pM*","$1")
But Sheets produces the following error:
Function REGEXREPLACE parameter 2 value "\pL" is not a valid regular expression.
I suppose I could write a formula consisting of a long set of nested SUBSTITUTE functions (Like this one), but that seems pretty awful.
Can anyone offer a suggestion for a better way to normalize Unicode letters with diacritical/accent marks in a Google Sheets formula?
[[:^alpha:]] (a negated ASCII character class) works fine in a REGEXEXTRACT formula.
But =REGEXREPLACE("público","([[:alpha:]])[[:^alpha:]]","$1") gives "pblic" as a result. So, I guess, the formula doesn't know which ASCII character should replace "ú".
Workaround
Let's take the word públicē; we need to replace two symbols in it. Put this word in cell A1, and this formula in cell B1:
=JOIN("",ArrayFormula(IFERROR(VLOOKUP(SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"),D:E,2,0),SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"))))
And then make a directory of replacements in range D:E:
     D     E
1    ú     u
2    ē     e
3    ...   ...
This formula is still ugly, but more useful because you can control your directory by adding more characters to the table.
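The lookup-with-fallback logic of this workaround can be sketched in JavaScript. The table contents and the names REPLACEMENTS and transliterate are assumptions made for illustration; the object plays the role of range D:E.

```javascript
// Hypothetical replacement table (the equivalent of range D:E); extend as needed.
var REPLACEMENTS = { "\u00fa": "u", "\u0113": "e" }; // ú -> u, ē -> e

function transliterate(text) {
  return Array.from(text).map(function (ch) {
    // Use the table entry when there is one; otherwise keep the original
    // character, like IFERROR(VLOOKUP(...), original) in the formula.
    return Object.prototype.hasOwnProperty.call(REPLACEMENTS, ch) ? REPLACEMENTS[ch] : ch;
  }).join("");
}
```

Note that object keys are matched case-sensitively, whereas VLOOKUP is not case-sensitive.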
Or use JavaScript
I also found a good solution that works in Google Sheets as a Google Apps Script (GAS) function:
function normalizetext(text) {
  var weird = 'öüóőúéáàűíÖÜÓŐÚÉÁÀŰÍçÇ!#£$%^&*()_+?/*."';
  // One replacement per character of `weird`: 22 letters, then one space per symbol.
  var normalized = 'ouooueaauiOUOOUEAAUIcC                 ';
  var source = text.toString();
  var new_text = '';
  for (var i = 0; i < source.length; i++) {
    // indexOf() looks the character up literally; search() would treat
    // '.', '*', '(' and similar characters as regex syntax.
    var idoff = weird.indexOf(source.charAt(i));
    if (idoff == -1) {
      new_text = new_text + source.charAt(i);
    } else {
      new_text = new_text + normalized.charAt(idoff);
    }
  }
  return new_text;
}
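Since the question was really after the (\pL)\pM* idea, here is a sketch of that approach in JavaScript. It assumes a runtime that supports String.prototype.normalize and Unicode property escapes (modern JavaScript engines do; whether a given Apps Script project does depends on its runtime). The function name stripDiacritics is made up for this example.

```javascript
// Hypothetical helper: removes accents by decomposing and dropping combining marks.
function stripDiacritics(text) {
  // NFD splits "ú" into "u" plus a combining acute accent (U+0301);
  // \p{M} then matches the combining marks so they can be removed.
  return String(text).normalize("NFD").replace(/\p{M}/gu, "");
}
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through unchanged, so a small lookup table may still be needed for those.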
This answer doesn't require a Google App Script, and it's still fast, and relatively simple. It builds on Max's answer by providing a full lookup table, and it also allows for case-sensitive transliteration (normally VLOOKUP is NOT case-sensitive).
Here is a link to the Google Spreadsheet if you want to jump right into it. If you want to use your own sheet, you'll need to copy the TRANS_TABLE sheet into your Spreadsheet.
In the code snippet below, the source cell is A2, so you'd place this formula in any column on row 2. Using REGEXREPLACE AND SPLIT, we split apart the string in A2 into an array of characters, then USING ARRAYFORMULA, we do the following to EACH character in the array: First, the character is converted to its 'decimal' CODE equivalent, then matched against a table on the TRANS_TABLE sheet by that number, then using VLOOKUP, a character X number of columns over (the index value provided) on the TRANS_TABLE sheet (in this case, the 3rd column over) is returned. When all characters in the array have been transliterated, we finally JOIN the array of characters back into a single string. I provided examples with named ranges as well.
=iferror(
  join(
    "",
    ARRAYFORMULA(
      vlookup(
        code(split(REGEXREPLACE($A2,"(.)", "$1;"),";",TRUE)),
        TRANS_TABLE!$A$5:$F,3
      )
    )
  )
,)
You'll note on the TRANS_TABLE sheet I made, I created 4 different transliteration columns, which makes it easy to have a column for each of your transliteration needs. To reference the column, just use a different index number in the VLOOKUP. Each column is simply a replacement character column. In some cases, you don't want any conversion made (A -> A or 3 -> 3), so you just copy the same character from the source Glyph column. Where you DO want to convert characters, you type in whatever character you want replaced (รฑ -> n etc). If you want a character removed altogether, you leave the cell blank (? -> ''). You can see examples of the transliteration output on the data sheet in which I created 4 different transliteration columns (A-D) referencing each of the Transliteration tables from the TRANS_TABLE sheet for different use case scenarios.
I hope this finally answers your question in a fashion that isn't so "ugly." Cheers.

How to format given string using regex?

So I have defined variables in such a way in my file:
public static final String hello_world = "hello world"
public static final String awesome_world = "awesome world"
public static final String bye_world= "bye world"
I have many declarations like that.
Is it possible to format them like this (with all the '=' signs in one column)?
public static final String hello_world   = "hello world"
public static final String awesome_world = "awesome world"
public static final String bye_world     = "bye world"
I can't even think of a way to do it. Any kind of help is appreciated.
P.S. If it matters, I use Sublime Text 2.
If it is a one-time task you might try the following:
Import the text file into, e.g., Excel using the 'Text to Columns' functionality (delimiter: space) so that column A contains "public" in each row, column B "static", ..., column E the variable names, column F the "=" signs, and column G the variable values (strings).
Then put the following formula into cell H1 (and copy it down to the other rows):
="public static final String "&E1&REPT(" ";50-LEN(E1))&" = "&""""&G1&""""
Afterwards, column H contains the following outputs:
public static final String hello_world                                        = "hello world"
public static final String awesome_world                                      = "awesome world"
public static final String bye_world                                          = "bye world"
Please note that the Excel functions REPT and LEN are named differently if your Excel language is not English.
If you're careful with your original layout (so that = signs are separated from the variable name, for example, unlike the third line of data in the example), then this will do the job:
awk '{ if (length($5) > max) max = length($5);
name[NR] = $5; value[NR] = $0; sub(/^[^"]*"/, "\"", value[NR]); }
END { format = sprintf("public static final String %%-%ds = %%s\n", max);
for (i = 1; i <= NR; i++) printf(format, name[i], value[i]); }'
It assumes you are dealing with 'public static final String' throughout (but doesn't verify that). It keeps track of the length of the longest name it reads (line 1), and also the variable name and the material from the open double quote to the end of line (line 2). At the end, it generates a format string which will print the variable names left justified in a field as long as the longest (line 3). It then applies that to the saved data (line 4), generating:
public static final String hello_world   = "hello world"
public static final String awesome_world = "awesome world"
public static final String bye_world     = "bye world"
To make it bomb-proof (e.g. the original data), you have to work a bit harder, though it shouldn't be insuperable. The simplest fix for the sloppy original format would be to pre-filter the data with:
sed 's/=/ = /'
Extra spaces around properly spaced input won't affect the output, and the missing spaces in the third sample line of data are fixed. It would be fiddly to do that inside awk because you'd want it to resplit the line after editing it. You could do something very similar in Perl.
Given that the volumes of data to be processed are unlikely to be in the megabyte range, let alone larger, the two-command solution is perfectly reasonable; you're unlikely to be able to measure the cost of the sed process.
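The same pad-to-the-longest-name idea can be sketched in JavaScript. This is an illustration rather than a translation of the awk script: the function name alignAssignments is hypothetical, and it assumes every line contains an '=' and that the first '=' on the line is the assignment.

```javascript
// Hypothetical helper: aligns the first '=' of every line to the same column.
function alignAssignments(lines) {
  // Split each line at its first '=' and trim both halves, so sloppy input
  // such as "bye_world= ..." is handled without a separate pre-filter.
  var parts = lines.map(function (line) {
    var i = line.indexOf("=");
    return { left: line.slice(0, i).trim(), right: line.slice(i + 1).trim() };
  });
  // Width of the longest left-hand side.
  var width = Math.max.apply(null, parts.map(function (p) { return p.left.length; }));
  // Pad every left-hand side to that width, then re-attach " = " and the value.
  return parts.map(function (p) {
    return p.left.padEnd(width) + " = " + p.right;
  });
}
```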
There is no single regex that can solve your problem. Your only option would be to run a series of regexes, one to handle each line length:
s/^(.{40})=/\1 =/
s/^(.{39})=/\1  =/
s/^(.{38})=/\1   =/
And even then, that's probably not what you want and it's probably much, much easier to do it by hand.
The problem is that the only way a regex substitution can insert different strings at different times is if what it's inserting is a backref, and there's no backref to give you your 5 - N space characters. Your other option would be to try to capture a variable number of characters, but in this case there's no way to make that do it for you either.
Regexes were not made to do things like that (they don't support arithmetic), but some text editors are, so just find a fancy text editor or do it by hand.
Since you're using Sublime Text 2, there's a much easier way to do that.
There's a great package for Sublime Text 2 which will do exactly what you want:
Sublime Alignment
Dead-simple alignment of multi-line selections and multiple selections for Sublime Text 2.
Features:
Align multiple selections to the same column by inserting spaces (or tabs)
Align all lines in a multi-line selection to the same indent level
Align the first = on each line of a multi-line selection to the same column