Hope you're doing well.
Imagine I have the following Sheet:
5:20:58 xxxx: entro con el mismo xxxx
5:21:08 xxxx: xxxx
5:21:58 xxxxx: Perfecto, te pido de 5 a 10 minutos mientras
reviso la configuración de las etiquetas. ¿De acuerdo?
5:22:04 xxxxx: ok
I need to delete the timestamp at the start of each of those rows. The result should be:
xxxx: entro con el mismo xxxx
xxxx: xxxx
xxxxx: Perfecto, te pido de 5 a 10 minutos mientras
reviso la configuración de las etiquetas. ¿De acuerdo?
xxxxx: ok
Is there a formula in Google Sheets to do this?
I tried REPLACE and SPLIT, but neither works for all the rows in the sheet.
(The real sheet has many more rows; I extracted a few of them as an example.)
EDIT
(following OP's comment)
...there are sometimes that the data not starts with a timestamp. ... How can I adjust the formula to make it work?
Please use the following altered formula
=INDEX(IFERROR(REGEXEXTRACT(B1:B;" (.*)");B1:B))
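OR (a more robust formula: it only strips a timestamp at the very start of the cell, so rows that do not begin with one are left unchanged)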
=INDEX(IFERROR(REGEXEXTRACT(B1:B;"^\d+:\d+:\d+ (.+)");B1:B))
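For readers who want to check the logic outside Sheets, here is a minimal Python sketch of the same "extract, else fall back to the original" idea. The function name and the sample rows are only illustrative; the pattern is the one from the more robust formula. It also shows why the looser " (.*)" pattern is risky: on a row without a timestamp it would drop the first word.

```python
import re

# An H:MM:SS timestamp anchored at the start of the text, then one space.
LEADING_TIMESTAMP = re.compile(r"^\d+:\d+:\d+ (.+)")

def strip_leading_timestamp(cell: str) -> str:
    """Return the text after a leading timestamp, or the cell unchanged."""
    match = LEADING_TIMESTAMP.match(cell)
    # Falling back to the original cell plays the role of IFERROR(...; B1:B).
    return match.group(1) if match else cell

rows = [
    "5:20:58 xxxx: entro con el mismo xxxx",
    "reviso la configuración de las etiquetas. ¿De acuerdo?",
    "5:22:04 xxxxx: ok",
]
cleaned = [strip_leading_timestamp(row) for row in rows]

# The looser pattern cuts at the first space, timestamp or not.
loose = re.search(r" (.*)", rows[1]).group(1)
```

Here `cleaned` keeps the second row intact, while `loose` has lost the word "reviso".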
Original answer
Please use the following formula (adjust range to your needs)
=INDEX(IFERROR(REGEXEXTRACT(B1:B;" (.*)")))
OR (depending on your locale)
=INDEX(IFERROR(REGEXEXTRACT(B1:B," (.*)")))
Functions used:
INDEX
IFERROR
REGEXEXTRACT
Let's say your raw data is in A2:A. Place this in the second cell (e.g., B2) of an otherwise empty column:
=ArrayFormula(IF(A2:A="",,TRIM(REGEXREPLACE(A2:A,"\d+:\d+:\d+",""))))
ADDENDUM:
Version for some international locales (where semicolon is used in place of a comma within formulas):
=ArrayFormula(IF(A2:A="";;TRIM(REGEXREPLACE(A2:A;"\d+:\d+:\d+";""))))
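The replace-then-trim approach behaves a little differently from the extract approach, which a small Python sketch can make visible. This is only an approximation of the Sheets functions (the function name and the second sample row are hypothetical): because the pattern is not anchored, a timestamp in the middle of a message is removed as well.

```python
import re

TIMESTAMP = re.compile(r"\d+:\d+:\d+")

def remove_timestamps(cell: str) -> str:
    """Delete every H:MM:SS run, then tidy spaces roughly as TRIM does."""
    without_times = TIMESTAMP.sub("", cell)
    # TRIM drops leading/trailing spaces and collapses repeated ones.
    return re.sub(r" {2,}", " ", without_times).strip(" ")

print(remove_timestamps("5:21:08 xxxx: xxxx"))
print(remove_timestamps("llego a las 5:30:00 hoy"))  # hypothetical row
```

If messages can legitimately contain times, anchoring the pattern with `^` would limit the replacement to the leading timestamp.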
Could anyone please advise on a way to extract surnames that have spaces in them, as a single block of names?
I have names in a dataset that look like this
clear
input str40 name
"R. P. de la Espriella Guerrero"
"J. de Carvalho Ponce"
"E. De Freitas Drumond"
"R. de la Fuente and M. E. Medina-Mora"
"C. Van Heyningen and I. D. Watson"
"A. Z. van de Wiel and D. W. de Lange"
end
I only want the first surname (so only the first author and excluding other authors) but I want those names that have spaces to be extracted 'en bloc'. So, ultimately resulting in a dataset as follows, for instance:
clear
input str40 name
"de la Espriella Guerrero"
"de Carvalho Ponce"
"De Freitas Drumond"
"de la Fuente"
"Van Heyningen"
"van de Wiel"
end
I'd be grateful for any help.
Here is code that implements the two rules given in my comment above. It assumes the version of Stata used supports the unicode character string functions.
clear
input str40 name
"R. P. de la Espriella Guerrero"
"J. de Carvalho Ponce"
"E. De Freitas Drumond"
"R. de la Fuente and M. E. Medina-Mora"
"C. Van Heyningen and I. D. Watson"
"A. Z. van de Wiel and D. W. de Lange"
end
generate surname = name
replace surname = usubstr(surname,1,ustrpos(surname+" and "," and ")-1)
list, clean noobs
replace surname = usubstr(surname,ustrrpos(surname,". ")+2,.)
list, clean noobs
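For comparison, the same two rules can be sketched in Python. This is not part of the Stata answer; the function name is made up for illustration. It assumes, as the Stata code does, that co-authors are separated by " and " and that every initial is followed by ". ".

```python
def first_surname(name: str) -> str:
    """Return the surname block of the first author."""
    # Rule 1: keep only the first author (text before the first " and ").
    first_author = name.split(" and ", 1)[0]
    # Rule 2: drop the initials, i.e. everything up to the last ". ".
    # rpartition returns the whole string as the last item if ". " is absent.
    return first_author.rpartition(". ")[2]

names = [
    "R. P. de la Espriella Guerrero",
    "J. de Carvalho Ponce",
    "E. De Freitas Drumond",
    "R. de la Fuente and M. E. Medina-Mora",
    "C. Van Heyningen and I. D. Watson",
    "A. Z. van de Wiel and D. W. de Lange",
]
surnames = [first_surname(n) for n in names]
```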
I have two kinds of text that I need to match with a regular expression. For the first one, I can already capture the name with this regex:
referente[,;]*\s\S\s(.+)\.\sOnde
O COORDENADOR-GERAL DE GESTÃO DE PESSOAS DO MINISTÉRIO DOS TRANSPORTES, PORTOS E AVIAÇÃO CIVIL, no uso das atribuições que lhe foram subdelegadas pela Portaria/SAAD nº. 202, art. 1°, inciso VII, de 08 de outubro de 2010, publicada no Diário Oficial da União, de 11 de outubro de 2010, resolve:
Retificar a Portaria COGEP-MT nº 3394, de 30 de novembro de 2016, publicada no Diário Oficial da União, Seção 2, página 55, de 13 de dezembro de 2016, referente à MARIA ALIXANDRINA COSTA REIS. Onde se lê "MARIA AUXILIADORA COSTA REIS"; Leia-se "MARIA ALIXANDRINA COSTA REIS.(Processo SEI: 50000.124582/2016-62) BA.
I also need to capture the name in a second kind of text:
O COORDENADOR-GERAL DE GESTÃO DE PESSOAS DO MINISTÉRIO DOS
TRANSPORTES, PORTOS E AVIAÇÃO CIVIL, no uso das atribuições que lhe
foram subdelegadas pela Portaria/SAAD nº. 202, art.1º, inciso VII, de
08 de outubro de 2010, publicada no Diário Oficial da União, de 11 de
outubro de 2010, resolve: Conceder Pensão Temporária, nos termos do
artigo 215 e 217, inciso II, alínea "a" da Lei nº 8.112/1990, à
ELIANE RIBEIRO MENESES, filha inválida do ex-servidor ASTOLFO
MENEZES, matrícula SIAPE nº. 0783182, do Quadro Permanente deste
Ministério, falecido na inatividade em 05 de agosto de 1997, cuja cota
parte equivale a 100% (cem por cento) do valor correspondente à
remuneração decorrente do cargo de Artífice de Mecânica (NI), Classe
"A", Padrão "III", com vigência partir do momento da Publicação da
Portaria de Concessão e efeitos financeiros a partir de 30 de maio de
2015, data do falecimento da viúva. (Processo SEI nº
50000.019342/2016-47) - MG.
I need to capture the bold words in this text too, with the same regex. How can I modify it?
I am not sure why you want to match/embolden the trailing punctuation and Onde/Where substring.
I would recommend this pattern, which optionally matches referente, then the à, then the all-caps words that follow. There are no capture groups; just replace the full match with the emboldened full match.
(I don't use NSRegularExpression, so let me know if something is simply not right.)
/(?:referente )?à [A-Z]+(?: [A-Z]+)*/u
The unicode flag is to accommodate the accented letters that will be encountered.
p.s. In your "solution" you are incorporating [,;]* but that doesn't get represented in your sample strings, so I left it out. Reducing the total number of parenthetical groups delivers improved pattern efficiency -- that is why I use just two non-capturing groups.
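To try the pattern quickly, here is a small sketch using Python's re module rather than NSRegularExpression (an assumption made purely for illustration; the helper name is hypothetical). Python 3 string patterns are Unicode-aware by default, so no extra flag is needed for the à. Note that [A-Z] would not match accented capitals, so names containing them would be cut short.

```python
import re

# Optional "referente ", then "à ", then one or more all-caps words.
NAME_AFTER_A = re.compile(r"(?:referente )?à [A-Z]+(?: [A-Z]+)*")

def find_marked_names(text: str) -> list[str]:
    """Return every full match of the pattern in the text."""
    # With no capture groups, findall returns the full matches.
    return NAME_AFTER_A.findall(text)

first = ("página 55, de 13 de dezembro de 2016, "
         "referente à MARIA ALIXANDRINA COSTA REIS. Onde se lê")
second = ("da Lei nº 8.112/1990, à ELIANE RIBEIRO MENESES, "
          "filha inválida do ex-servidor ASTOLFO MENEZES")

print(find_marked_names(first))
print(find_marked_names(second))
```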
You can use the following regex to match the two bold parts of your examples:
(à\sELIANE\s\w+\s\w+SES)|(referente[,;]*\s\S\s.+\.\sOnde)
Good luck!
My solution is ([,;]*)*\sà\s((\w+\s)+\w+)[\.,]
I have a CSV file like this:
"","LESCHELLES","","LESCHELLES"
"","SAINTE CROIX DE VERDON","","SAINTE CROIX DE VERDON"
"","SERRE CHEVALIER","","SERRE CHEVALIER"
"","SAINT JUST D'ARDECHE","","SAINT JUST D'ARDECHE"
"","NEUVILLE SUR VANNES","","NEUVILLE SUR VANNES"
"","ESCUEILLENS ET SAINT JUST","","ESCUEILLENS ET SAINT JUST"
"","PAS DES LANCIERS","","PAS DES LANCIERS"
"","PLAN DE CAMPAGNE","","PLAN DE CAMPAGNE"
And I'd like to convert it this way:
"","Leschelles","","LESCHELLES"
"","Sainte Croix De Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just D'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville Sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens Et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas Des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan De Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
This would be nice. Even better would be to lowercase all "whole" words like de, d', et, sur and des. This would give:
"","Leschelles","","LESCHELLES"
"","Sainte Croix de Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just d'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan de Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
Python has title():
Return a titlecased version of the string where words start with an
uppercase character and the remaining characters are lowercase.
The algorithm uses a simple language-independent definition of a word
as groups of consecutive letters. The definition works in many
contexts but it means that apostrophes in contractions and possessives
form word boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed
using regular expressions:
>>> import re
>>> def titlecase(s):
...     return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0)[0].upper() +
...                              mo.group(0)[1:].lower(),
...                   s)
...
>>> titlecase("they're bill's friends.")
"They're Bill's Friends."
Update: here's the solution for the French problem:
import re, sys

def titlecase(s):
    # Capitalise each word, treating an apostrophe as part of the word.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0)[0].upper() +
                             mo.group(0)[1:].lower(),
                  s)

def french_parse(s):
    # Lowercase linking words (" de ", " sur ", ...) and elided articles (" d'", " l'").
    p = re.compile(
        r"( de la | sur | sous | la | de | les | du | le | au | aux | en | des | et )|(( d'| l')([a-z]+))",
        re.IGNORECASE)
    return p.sub(
        lambda mo: mo.group().find("'")>0
            and mo.group()[:mo.group().find("'")+1].lower() +
                titlecase(mo.group()[mo.group().find("'")+1:])
            or (mo.group(0)[0].upper() + mo.group(0)[1:].lower()),
        s)

for line in sys.stdin:
    # Hard-coded offset: the name field is assumed to start at character 20 of each line.
    s = line[20:len(line)-1]
    p = s.find('"')
    t = s[:p]
    # Just output to show which names have been modified:
    if french_parse( titlecase(t) ) != titlecase(t):
        print('"' + french_parse( titlecase(t) ) + '"')
Just launch it like this:
python thepythonscript.py < file.csv
Then the output will be:
"Grenand les Sombernon"
"Touville sur Montfort"
"Fontenay en Vexin"
"Durfort Saint Martin de Sossenac"
"Monclar d'Armagnac"
"Ports sur Vienne"
"Saint Barthelemy de Beaurepaire"
"Saint Bernard du Touvet"
"Rosoy le Vieil"
While you may be able to pull this off with some vim regex magic, I think it'll be easier if you solve the problem in your favorite scripting language, and pipe selected text through that from vim using the ! command. Here's an (untested) example in PHP:
#!/usr/bin/env php
<?php
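$specialWords = array('de', 'd\'', 'et', 'du', /* etc. */ );
foreach (file('php://stdin') as $line) {
    // Lowercase first (the input is all caps), then capitalise each word;
    // quotes, commas and apostrophes also count as word separators here.
    $line = ucwords(strtolower($line), " \",'");
    foreach ($specialWords as $w) {
        // Put the special words back in lowercase.
        $line = preg_replace("/\\b$w\\b/i", $w, $line);
    }
    echo $line;
}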
Make that script executable and store it somewhere on your PATH; then from vim, select some text and use :'<,'>! yourscript.php to convert (or just :%! yourscript.php for the whole buffer).
The csv.vim ftplugin helps with working in CSV files. Though it does not offer a "substitute in column N" function directly, it may get you near that. At least you can arrange the columns into neat blocks, and then apply a simple regexp or visual blockwise selection to it.
But I second that using a different toolchain that is better suited to manipulating CSV files may be preferable to doing this completely in Vim. It also depends on whether it's a one-off task or something you do frequently.
Here is a one-liner Vim command.
%s/"[^"]*",\zs\("[^"]*"\)/\=substitute(substitute(submatch(0), '\<\(\a\)\(\a*\)\>', '\u\1\L\2', 'g'), '\c\<\(de\|d\|l\|sur\|le\|la\|en\|et\)\>', '\L&', 'g')
This assumes that the first two fields contain no embedded double quotes.
The idea behind this solution is to rely on :h :s\= to execute a series of functions on the second field once found. The series of functions being: first change each word to TitleCase, then put all the linking words (de, sur, et, ...) in lowercase.
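The same two-step idea, restricted to the second field, can also be sketched in Python with the standard csv module, which takes care of the quoting. This is only an illustration under stated assumptions: the list of linking words is a guess to be extended, the helper names are hypothetical, names are assumed to be plain ASCII as in the sample, and a linking word at the very start of a name keeps its capital.

```python
import csv
import io
import re

# Linking words kept in lowercase; an illustrative list, not an exhaustive one.
PARTICLES = {"de", "du", "des", "la", "le", "les", "sur", "sous", "en", "et", "au", "aux"}
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def french_title(name: str) -> str:
    """Title-case a place name, keeping linking words in lowercase."""
    def fix_word(mo: re.Match) -> str:
        word = mo.group(0).lower()
        at_start = mo.start() == 0
        head, apostrophe, tail = word.partition("'")
        if apostrophe and head in ("d", "l"):
            # Elided article: d'Ardeche inside a name, D'Ardeche at its start.
            return (head.upper() if at_start else head) + "'" + tail.capitalize()
        if word in PARTICLES and not at_start:
            return word
        return word.capitalize()
    return WORD.sub(fix_word, name)

def convert(csv_text: str) -> str:
    """Rewrite only the second column of every row, quoting all fields."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > 1:
            row[1] = french_title(row[1])
        writer.writerow(row)
    return out.getvalue()

sample = '"","SAINT JUST D\'ARDECHE","","SAINT JUST D\'ARDECHE"\n'
print(convert(sample), end="")
```

Working on parsed fields rather than on raw lines means the fourth column is left untouched without any positional tricks.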