Auto serial number and text in Google Sheets: "Number - Text" by dragging - if-statement

In my data project, column A contains names in the form "number - text".
Google Sheets automatically increments the number when I drag down values in the form "text - number":
ABC - 01
ABC - 02
ABC - 03
But it doesn't work for values in the form "number - text".
Example:
01 - ABC
02 - ABC
03 - ABC
How can I create a serial number plus text in Google Sheets by dragging, or with a better solution such as ARRAYFORMULA?

If you really want a dragging solution, you can try:
=TEXT(ROW(A1), "00")&" - ABC"
However, the more robust approach is an ARRAYFORMULA driven by another column, for example:
=ARRAYFORMULA(IF(B2:B="",,TEXT(COUNTIFS(B2:B, "<>", ROW(B2:B), "<="&ROW(B2:B)), "00")&" - ABC"))
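To make the ARRAYFORMULA logic concrete, here is a small Python sketch of what it computes per row: blank if column B is empty, otherwise the number of non-empty B cells up to and including that row, zero-padded like TEXT(..., "00"), followed by " - ABC". The function name serial_labels and its suffix parameter are hypothetical, for illustration only.

```python
def serial_labels(values, suffix=" - ABC"):
    """Mimic IF(B="",,TEXT(COUNTIFS(B,"<>",ROW(B),"<="&ROW(B)),"00")&suffix).

    values: the cells of column B from top to bottom ("" means an empty cell).
    Returns one label per row; empty rows get "".
    """
    labels = []
    count = 0  # running number of non-empty cells seen so far
    for value in values:
        if value == "":
            labels.append("")
        else:
            count += 1
            # "{:02d}" pads to at least two digits, like TEXT(n, "00")
            labels.append(f"{count:02d}{suffix}")
    return labels


print(serial_labels(["x", "y", "", "z"]))
```

Because the counter only advances on non-empty rows, an empty cell in column B does not leave a gap in the numbering, unlike the ROW(A1) dragging formula, which always follows the row number.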

Related

Extract two words regardless of order in Google Sheets

I have a Google Sheets table with input in column A, and I'd like to achieve this result using REGEXEXTRACT.
Desired result (input → output):
Stock OutNew21554 - Shirt - Red → New | Stock Out
NewStock Out54872 - Shirt - Green → New | Stock Out
This is what I attempted.
Attempt 1:
=ArrayFormula(REGEXEXTRACT(A1:A2, "[(Stock Out)|(New)]+"))
Result (input → output):
Stock OutNew21554 - Shirt - Red → Stock OutNew
NewStock Out54872 - Shirt - Green → NewStock Out
Attempt 2:
=ArrayFormula(REGEXEXTRACT(A2:A3, "(Stock Out)|(New)+"))
Result (input → output):
Stock OutNew21554 - Shirt - Red → Stock Out
NewStock Out54872 - Shirt - Green
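The behavior of both attempts can be reproduced with Python's re module, which treats these two patterns the same way as RE2, the engine behind REGEXEXTRACT. Square brackets define a character class, so the first pattern matches a run of the individual characters (, S, t, o, c, k, space, O, u, ), |, N, e, w rather than the words; an alternation reports a single match, so the second pattern returns only whichever word comes first. This is an illustration in Python, not Sheets itself.

```python
import re

inputs = [
    "Stock OutNew21554 - Shirt - Red",
    "NewStock Out54872 - Shirt - Green",
]

for text in inputs:
    # Attempt 1: [...] is a character class, so this matches the first run of
    # characters drawn from "(Stock Out)|(New)", not the two words.
    print(re.search(r"[(Stock Out)|(New)]+", text).group(0))
    # Attempt 2: the alternation stops at the first word it finds;
    # the group that did not participate is None.
    print(re.search(r"(Stock Out)|(New)+", text).groups())
```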
Use two instances of regexextract() in an { array expression }, wrapped in iferror():
=arrayformula( iferror(
{
regexextract(A2:A3, "New"),
regexextract(A2:A3, "Stock Out")
}
) )
There's no way to do this with a single regex without either generating all possible permutations or having lookaround support. However, we can call regexextract repeatedly using REDUCE. For example, to extract New, Stock Out and the color:
=BYROW(A2:A3,LAMBDA(row,TEXTJOIN(" | ",1,REDUCE(,{"New","Stock Out","Red|Green"},LAMBDA(a,c,{a;IFNA(REGEXEXTRACT(row,c))})))))
This supports an unlimited number of words to extract.
Output
New | Stock Out | Red
New | Stock Out | Green
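The BYROW/REDUCE formula can be sketched in Python the same way: run one independent search per pattern, keep the matches that were found, and join them with " | ". Because each word is searched separately, their order in the input does not matter. The function name extract_keywords is hypothetical.

```python
import re


def extract_keywords(text, patterns, sep=" | "):
    """Search text once per pattern and join the matches that were found.

    Mirrors TEXTJOIN(sep, TRUE, ...) over IFNA(REGEXEXTRACT(text, pattern)):
    a pattern with no match is skipped instead of producing an error.
    """
    found = []
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            found.append(match.group(0))
    return sep.join(found)


patterns = ["New", "Stock Out", "Red|Green"]
for row in ["Stock OutNew21554 - Shirt - Red", "NewStock Out54872 - Shirt - Green"]:
    print(extract_keywords(row, patterns))
```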

Getting value between '-' in Google Sheets

I'm trying to get the number between '-' and '-' in Google Sheets, but after trying many things I still haven't been able to find a solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out.
Try either of the following:
Option 1:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
Option 2:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Do adjust the formula according to your ranges and locale)
Use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:\.\d+)?) -")*1))
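The same extraction as a Python sketch: the decimal point is escaped so it matches only a literal ".", and the IFNA(...)*1 part of the formula corresponds to returning None when nothing matches and converting the capture to a number. middle_value is a hypothetical helper name, and the sample lines are taken from the question.

```python
import re

# A number (optionally with decimals) with " - " on both sides.
MIDDLE_NUMBER = re.compile(r"- (\d+(?:\.\d+)?) -")


def middle_value(line):
    """Return the number between the two dashes as a float, or None."""
    match = MIDDLE_NUMBER.search(line)
    return float(match.group(1)) if match else None


records = [
    "England Premier League",
    "West Ham vs Crystal Palace",
    "2.090 - 3.47 - 3.770",
    "2.830 - 3.15 - 2.820",
]
print([middle_value(r) for r in records])
```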

How to combine the Output of Regex Findall in Pandas

I'm exploring regex with pandas in a Jupyter notebook.
My goal is to extract house-number additions from an address line, using a set of regex patterns.
I'm building upon this post: https://gist.github.com/christiaanwesterbeek/c574beaf73adcfd74997
and I use this input from a .csv file:
Afleveradres
Dorpstraat 2
Dorpstr. 2
Dorpstraat 2
Laan 1933 2
18 Septemberplein 12
Kerkstraat 42-f3
Kerk straat 2b
42nd street, 1337a
1e Constantijn Huigensstraat 9b
Maas-Waalweg 15
De Dompelaar 1 B
Kümmersbrucker Straße 2
Friedrichstädter Straße 42-46
Höhenstraße 5A
Saturnusstraat 60-75
Saturnusstraat 60 - 75
Plein \'40-\'45 10
Plein 1945 1
Steenkade t/o 56
Steenkade a/b Twee Gezusters
1, rue de l\'eglise
Herestraat 49 BOX1043
Maas-Waalweg 15 15
My goal is to extract the street names, house numbers and house-number additions.
So far I basically use:
import pandas as pd
# get data
file_base_name = 'examples'
dfa = pd.read_csv(file_base_name + '.csv', sep=';')
# get the house number: the first number preceded by a comma or whitespace
dfa['num'] = dfa['Afleveradres'].str.extract(r"([,\s]+\d+)\s*", expand=False)
dfa['num'] = dfa['num'].str.strip()
# split leftover values into street & addition
# (regex=True is required in pandas >= 2.0, where the default became False)
dfa['tmp'] = dfa['Afleveradres'].str.replace(r"([,\s]+\d+)\s*", ';', regex=True)
# new data frame with split value columns
new = dfa['tmp'].str.split(';', n=1, expand=True)
# street name column from the new data frame
dfa['str'] = new[0]
# house-number addition column from the new data frame
dfa['add'] = new[1]
dfa.drop(['tmp'], axis=1, inplace=True)
which results in:
listing street names, numbers & additions:
;Afleveradres;str;add;num
0;Dorpstraat 2;Dorpstraat;;2
1;Dorpstr. 2;Dorpstr.;;2
2;Dorpstraat 2;Dorpstraat;;2
3;Laan 1933 2;Laan;2;1933
4;18 Septemberplein 12;18 Septemberplein;;12
5;Kerkstraat 42-f3;Kerkstraat;-f3;42
6;Kerk straat 2b;Kerk straat;b;2
7;42nd street, 1337a;42nd street;a;, 1337
8;1e Constantijn Huigensstraat 9b;1e Constantijn Huigensstraat;b;9
9;Maas-Waalweg 15;Maas-Waalweg;;15
10;De Dompelaar 1 B;De Dompelaar;B;1
So far so good, for now.
Next, I'd like to handle house-number ranges, like '42-46' and '60 - 75'.
re.findall returns the expected values:
import re


def rem(text):
    # 'Y' if the string starts with one of the listed punctuation characters
    pattern = r'[,#\'?\.$%_]'
    if re.match(pattern, text):
        tmp = 'Y'
    else:
        tmp = 'N'
    return tmp


def extract_numrange(row):
    r = str(row['Afleveradres'])
    num_range1 = re.findall(r'([,\s]+\d+\-+\d+)\s*|([,\s]+\d+\s+\-+\s+\d+)\s*', r)
    return num_range1
    # return rem(num_range1)


dfa['excep'] = dfa.apply(extract_numrange, axis=1)
dfa
Output of re.findall:
15 Friedrichstädter Straße 42-46 Friedrichstädter Straße -46 42 [( 42-46, )]
16 Höhenstraße 5A Höhenstraße A 5 []
17 Saturnusstraat 60-75 Saturnusstraat -75 60 [( 60-75, )]
18 Saturnusstraat 60 - 75 Saturnusstraat -; 60 [(, 60 - 75)]
But how do I clean this output, turning [( 42-46, )] and [(, 60 - 75)] into something like 42-46 and 60 - 75 in a new column?
Or are there better approaches for my question?
The problem comes from the fact there are two capturing groups. You need to re-vamp the pattern to use only a single capturing group, or get rid of the group altogether.
Your pattern is of the (Group1)\s*|(Group2)\s* type. As you can see, all you need is to regroup the parts into (Group1|Group2)\s*.
So, the quickest fix is
([,\s]+\d+\-+\d+|[,\s]+\d+\s+\-+\s+\d+)\s*
However, I think you do not need the whitespace on either end, so move the parts you do not want to capture out of the group:
[,\s]+(\d+\-+\d+|\d+\s+\-+\s+\d+)\s*
^^^^^^
Probably, you may reduce this even further to
[,\s](\d+(?:-+|\s+-+\s+)\d+)
The (?:-+|\s+-+\s+) part is a non-capturing group, so it won't result in an additional tuple item.
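A short Python check of the difference, using addresses from the question: with two capturing groups re.findall returns 2-tuples (one slot per group, empty for the group that did not match), while the single-group pattern returns plain strings.

```python
import re

addresses = [
    "Friedrichstädter Straße 42-46",
    "Höhenstraße 5A",
    "Saturnusstraat 60-75",
    "Saturnusstraat 60 - 75",
]

# Original pattern: two capturing groups, so findall returns 2-tuples.
two_groups = r"([,\s]+\d+\-+\d+)\s*|([,\s]+\d+\s+\-+\s+\d+)\s*"
# Reduced pattern: one capturing group plus a non-capturing (?:...) group,
# so findall returns plain strings.
one_group = r"[,\s](\d+(?:-+|\s+-+\s+)\d+)"

for address in addresses:
    print(re.findall(two_groups, address), re.findall(one_group, address))
```

In pandas, the single-group pattern can also be passed to Series.str.extract with expand=False, which returns the first match per row as a Series (NaN where nothing matched), e.g. dfa['excep'] = dfa['Afleveradres'].str.extract(one_group, expand=False).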

Rename file names with accents in shell

I ran into an error while migrating my files from another server.
Now all my WordPress files are named like this:
Tendência-Moda-Feminina-2014
I need to have this name: Tendência-Moda-Feminina-2014
I'd like to know if someone has a regex to replace those "codes" with the proper Portuguese accented characters.
There are other codes too that I need to convert to accented letters.
My folders are structured like this:
2014
- 01
- 02
...
- 12
2015
- 01
- 02
...
- 12
It's the WordPress uploads folder structure.

AWStats multiple columns in extra section

I have AWStats running, and the reports are built from IIS log files.
I have an extra section that shows all the actions of the Perl scripts executed on the site.
The config looks like this:
ExtraSectionName1="Actions"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,\/cgi\-bin\/.+\.pl"
ExtraSectionFirstColumnTitle1="Action"
ExtraSectionFirstColumnValues1="QUERY_STRING,action=([a-zA-Z0-9]+)"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=HPB
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=20
MinHitExtra1=1
The output looks like this:
Action Pages Hits
foo 1234 1234
bar 5678 5678
But some actions have the same name in different Perl scripts.
I need this:
Script Action Pages Hits
foo.pl foo 1234 1234
bar.pl foo 1234 1234
foo.pl bar 5678 5678
bar.pl bar 5678 5678
Does anyone know how to create such a report?
EDIT:
I did some more research, and all the forum posts I've found say that it is not possible to have two columns in an extra section without hacking awstats.pl.
Now I am trying to put everything into one column using URLWITHQUERY, to output something like this:
Action Pages Hits
foo.pl?action=foo 1234 1234
foo.pl?action=bar 1234 1234
bar.pl?action=foo 5678 5678
...
The new problem is that the query string has other parameters besides action, and they appear in no particular order.
I tried this:
ExtraSectionFirstColumnValues1="URLWITHQUERY,([a-zA-Z0-9]+\.pl\?).*(action=[a-zA-Z0-9]+)"
but AWStats only takes the value of the first capture group and ignores the rest. I think it internally uses only the $1 set by the Perl regex match.
Any ideas?
maybe?
ExtraSectionFirstColumnTitle1="Script"
ExtraSectionFirstColumnValues1="URL,\/cgi\-bin\/(.+\.pl)"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionFirstColumnTitle2="Action"
ExtraSectionFirstColumnValues2="QUERY_STRING,action=([a-zA-Z0-9]+)"
ExtraSectionFirstColumnFormat2="%s"
I've found a solution.
awstats.pl fetches the data for the configured extra sections in lines 19664-19750.
This is my modification:
# Lines 19693-19701 in awstats.pl (AWStats version 7, revision 1.971)
elsif ( $rowkeytype eq 'URLWITHQUERY' ) {
    if ( "$urlwithnoquery$tokenquery$standalonequery" =~
        /$rowkeytypeval/ )
    {
        $rowkeyval = "$1$2"; # I simply added a $2 for the second capture group
        $rowkeyok = 1;
        last;
    }
}
This returns the first and second capture groups of the ExtraSectionFirstColumnValuesX regex, concatenated.
Example:
ExtraSectionFirstColumnValues1="URLWITHQUERY,([a-zA-Z0-9]+\.pl\?).*(action=[a-zA-Z0-9]+)"
Needless to say, you need to add $3, $4, $5 ... if you need more groups.
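The effect of the patched line can be sketched in Python: apply the ExtraSectionFirstColumnValues1 regex to the URL with its query string and concatenate the capture groups, as "$1$2" does in Perl. Joining all groups covers the "$3, $4, $5" case automatically. The helper name row_key and the sample URLs are hypothetical; this only illustrates the regex behavior and is not AWStats code.

```python
import re

# Regex from ExtraSectionFirstColumnValues1 (the part after "URLWITHQUERY,").
PATTERN = re.compile(r"([a-zA-Z0-9]+\.pl\?).*(action=[a-zA-Z0-9]+)")


def row_key(url_with_query):
    """Concatenate every capture group, like "$1$2" in the patched awstats.pl.

    Returns None when the regex does not match (no row key is produced).
    Groups that did not participate are skipped; Perl would treat them as "".
    """
    match = PATTERN.search(url_with_query)
    if not match:
        return None
    return "".join(group for group in match.groups() if group is not None)


for url in [
    "/cgi-bin/foo.pl?id=3&action=bar",
    "/cgi-bin/bar.pl?action=foo&id=3",
    "/index.html",
]:
    print(url, "->", row_key(url))
```

Because .* is greedy, the last "action=" in the query wins, so a later parameter whose name merely ends in "action" would be picked up instead; anchoring that part as [?&](action=[a-zA-Z0-9]+) avoids this, at the cost of also capturing the "&" in the key.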