Best way to use spreadsheet RegEx to extract text and numbers and replace with the formatting? - regex

I'm currently working on a non profit project where I need to reformat the way the data in the rows displays.
At the moment, this is how the row data looks:
Save The Children (Donation)|10.00{0}{2}
And I need it to output like this instead:
donation_id:save_children|quantity:1|total:10.00
The first problem is sometimes there's multiple items within the row:
Save The Children (Donation)|10.00{0}{2} / Save The Forrest|15.50{0}{2}
In which case it would need to be separated by a semicolon:
donation_id:save_children|quantity:1|total:10.00;donation_id:save_forrest|quantity:1|total:15.50
The second problem is, we have 9 donation variables/causes, each needing to convert the output to a different "donation_id".
So every time it finds:
Save the Children, it needs to convert to: donation_id:save_children
Save the Forrest, to, donation_id:save_forrest
Save the Animals, to, donation_id:save_animals
And so forth.
And the third problem is that the donation amounts are variable (as people donate whatever they wish), so the "total:" dollar value that we output will often be different.
How would I go about doing this with the regex?
Thank you

You can use the regex below:
(Save) The (Children|Forrest|Animals).*?\|([0-9]+\.[0-9]+)\{0\}\{2\}([\s\/]+)?
Substitute/replace with:
donation_id:$1_$2|quantity:1|total:$3;
When I test for
Save The Children (Donation)|10.00{0}{2} / Save The Forrest|15.50{0}{2}
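Output is (note that the cause name keeps its original capitalisation and a trailing semicolon is left, so lowercase the result and trim the final ";" to get the exact format requested):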
donation_id:Save_Children|quantity:1|total:10.00;donation_id:Save_Forrest|quantity:1|total:15.50;
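For comparison, here is a minimal Python sketch of the same transformation that also lowercases the id and joins items with ";" instead of appending one. The lookup table `DONATION_IDS` and the function name `convert_row` are hypothetical, and only the three causes named in the question are listed; it assumes items are always separated by " / ".

```python
import re

# Hypothetical lookup table; only the three causes named in the question are listed.
DONATION_IDS = {
    "save the children": "save_children",
    "save the forrest": "save_forrest",
    "save the animals": "save_animals",
}

# Cause name, an optional "(...)" suffix, then "|amount{0}{2}".
ITEM = re.compile(r"\s*(.+?)\s*(?:\([^)]*\))?\s*\|(\d+\.\d+)\{0\}\{2\}\s*")


def convert_row(row):
    """Convert one spreadsheet cell into the 'donation_id:...' format."""
    parts = []
    for chunk in row.split(" / "):  # assumption: items are separated by " / "
        m = ITEM.fullmatch(chunk)
        if m is None:
            raise ValueError(f"unrecognised item: {chunk!r}")
        donation_id = DONATION_IDS[m.group(1).lower()]
        parts.append(f"donation_id:{donation_id}|quantity:1|total:{m.group(2)}")
    return ";".join(parts)
```

Calling `convert_row` on the two-item test row from the question yields the two entries joined by a single semicolon. A cause that is missing from the table raises a `KeyError`, which makes unmapped names easy to spot.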

Related

Conditional Formatting Depending Upon Multiple Numbers

I have a column of values that are a number out of 10. So, it could be 2/10, 3/10, 4/10 and so on, all the way up to 10/10. To be clear, these are not dates, but simply showing how many questions the student answered correctly out of 10.
I'm trying to use conditional formatting to highlight them a certain color depending upon the score they got. For 9/10 and 10/10, I'm wanting to use a certain color, but it doesn't seem to be working with REGEXMATCH or with OR. Also wanting to highlight all scores that are 6/10 or lower. I know that I could make this work by applying conditional formatting for each and every score with text contains but the problem I'm finding is that it thinks it's a date.
Is there a way to match multiple scores out of 10 using REGEXMATCH?
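Select the column and change its formatting to Plain text, so the scores are no longer read as dates.
Now you can use a formula like this for 9/10 and 10/10 (and =REGEXMATCH(A1; "^[0-6]/") for 6/10 or lower):
=REGEXMATCH(A1; "^(9|10)/")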
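To check which scores each pattern catches, here is a small Python sketch of the same idea outside the spreadsheet. The names `HIGH`, `LOW` and `band` are illustrative only, and the patterns are anchored to the full "n/10" text, which is slightly stricter than a formula that only anchors the start.

```python
import re

HIGH = re.compile(r"^(9|10)/10$")  # 9/10 and 10/10
LOW = re.compile(r"^[0-6]/10$")    # 0/10 through 6/10


def band(score):
    """Classify a plain-text score such as '7/10'."""
    if HIGH.match(score):
        return "high"
    if LOW.match(score):
        return "low"
    return "middle"
```

Grouping the alternatives as `(9|10)` keeps the `^` anchor applied to both of them, which an ungrouped `^9|10` does not.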

Automating cell references

I have a standard sheet formatted like this, with columns going linearly
First Name Last Name Contact Type Address Address 2 City State
My goal is to take the data in this sheet and format it differently in another sheet, like this, vertically
First Name Last Name
Contact Type
Address
Address 2
City State Zip
For the users, they can then use this to print out mailing labels (plus for some of my users, reading info in block form is just easier)
So I can do something like this in the new sheet
={NameDirectory!B4&" "&NameDirectory!C4}
={NameDirectory!D4}
={NameDirectory!E4&" "&NameDirectory!F4}
etc, so each line grabs the right data, so the address appears in a block
Of course, the next block will be similar
={NameDirectory!B5&" "&NameDirectory!C5}
={NameDirectory!D5}
={NameDirectory!E5&" "&NameDirectory!F5}
The only thing that changes is the number as I have merely shifted down a row.
I keep thinking there must be an easier way to do this, other than copying the formula, pasting it in and manually changing B5 to E5. Likewise, I can drag the formula down multiple rows, but it changes the number not the letter as I want.
Is there any way I can achieve what I want without a lot of copying and pasting?
Here is an example (not sharing the original as it has real contact information, so this has two entries, but the original has around 50+)
https://docs.google.com/spreadsheets/d/1Ob4JXRT0CqqOb94-3Mqt1wu2LSsm3oeodc0eTPtT_3U/edit#gid=0
try:
=ARRAYFORMULA(FLATTEN(QUERY(SPLIT(FLATTEN(QUERY(TRANSPOSE(SUBSTITUTE(
{Sheet1!A2:A&" "&Sheet1!B2:B, Sheet1!C2:E, Sheet1!F2:F&", "&Sheet1!G2:G&" "&Sheet1!H2:H, Sheet1!I2:K,
IF(Sheet1!A2:A="",," ")}, " ", CHAR(13))),,9^9)), " "), "where not Col2 contains ','", 0)))
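The reshaping that this formula performs (one row per contact turned into a vertical block per contact) can be sketched in Python as follows. This is only an illustration of the idea, not a translation of the formula: the function name `to_label_lines` and the eight-field tuple layout are assumptions based on the headers listed in the question.

```python
def to_label_lines(rows):
    """Turn one-row-per-contact records into five-line address blocks.

    Each row is assumed to be a tuple of
    (first, last, contact_type, address, address2, city, state, zip_code).
    """
    lines = []
    for first, last, contact_type, address, address2, city, state, zip_code in rows:
        lines.extend([
            f"{first} {last}",             # First Name Last Name
            contact_type,                  # Contact Type
            address,                       # Address
            address2,                      # Address 2
            f"{city} {state} {zip_code}",  # City State Zip
        ])
    return lines
```

Each contact contributes exactly five output lines, so contact number n (counting from 0) starts at line 5 * n of the result.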

spotfire plot list of elements

I have a data table that has this format:
and I want to plot temperature against time. Any idea how to do that?
This can be done in a TERR data function. I don't know how comfortable you are integrating Spotfire with TERR; there is an intro video here, for instance (the demo starts at about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia
because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
with this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
you can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
the third argument "\\1" determines which number in the list to extract. backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
note that this example assumes the numbers are all whole numbers. if you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this part out, the result will be a String type which could cause confusion later on when working with the data).
once you have the four columns, you'll be able to unpivot to get the data easily.
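Both approaches aim at the same end result: one row per object and list position, with time and temperature side by side. A minimal Python sketch of that target shape follows, assuming rows of (object, Description, Values) where Values is a bracketed, comma-separated list as in the TERR script above. The function name `to_long` and the tuple layout are hypothetical.

```python
def to_long(rows):
    """rows: iterable of (object, description, values) tuples,
    e.g. ("A", "time", "[0,1,2,3]"). Returns (object, time, temperature) tuples."""
    series = {}
    for obj, description, values in rows:
        # Strip the surrounding brackets, then split the list on commas.
        numbers = [float(v) for v in values.strip("[]").split(",")]
        series.setdefault(obj, {})[description] = numbers

    out = []
    for obj, cols in series.items():
        # Pair the i-th time with the i-th temperature of the same object.
        for t, temp in zip(cols["time"], cols["temperature"]):
            out.append((obj, t, temp))
    return out
```

Unlike the four-column expression, this does not depend on every list having exactly four items, although it does assume each object has both a "time" and a "temperature" row of equal length.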

Horizontal stretching in ListRenderer

I have a list that should display 7 items that each look like this:
Date Weekday Distance Time
Long text that may span many lines
two column text Distance Time
two column text Distance Time
two column text Distance Time
The last lines repeat a number of times depending on the data, i.e. there may be a different number of such lines for each list item.
I have tried implementing this with a ListCellRenderer that creates a table according to the requirements above, but I have a few problems with it:
The long text that may span many lines is implemented in a SpanLabel, but this text will not display more than one line.
Each item in the list gets space for the same number of lines below the first two.
So it seems that items in a list must be of the same size.
Later I also want to be able to detect selection on the entire list item, not just individual fields of it.
Is there a better way to do this?
How do I ensure that the SpanLabel actually gets as much space as it needs?
How do I ensure that the unknown number of lines gets the space they need, depending on how many they are?
Don't use a list: https://www.codenameone.com/blog/deeper-in-the-renderer.html
Lists in Codename One assume every entry is exactly the same height and provide no flexibility here.
I suggest doing something like the property cross demo: https://www.udemy.com/learn-mobile-programming-by-example-with-codename-one/
Where we use a Container with components within to provide a list like behavior with the full flexibility that arbitrary components allow.

Comparing two documents

I have two very large lists. They both were originally in Excel. The larger one is a list of emails (about 160,000 of them) with other information like name and address, and the smaller one is a list of just 18,000 emails.
My question is what would be the easiest way to get rid of all 18,000 rows from the first document that contain the email addresses from the second?
I was thinking regex or maybe there is another application I can use? I have tried searching online but it seems like there isn't much specific to this. I also tried notepad++ but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is to write a C++ program (you could carry the idea over to the language of your choice; you never mentioned which languages you are proficient in) that reads each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which comma-separates the values so you can work with them more easily. For the larger list, "Save As" a copy containing just the email addresses, deleting the other columns for now.
Then, you could open the larger list and use a nested loop to check if you should output to an output file. Something like:
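// Keep each entry of the large list only if it does not appear in the small list.
// Both lists and OutputVector are assumed to be std::vector<std::string>.
for (std::size_t y = 0; y < LargeListVector.size(); y++) {
    bool foundMatch = false;
    for (std::size_t x = 0; x < SmallListVector.size(); x++) {
        if (SmallListVector[x] == LargeListVector[y]) { foundMatch = true; break; }
    }
    if (!foundMatch) OutputVector.push_back(LargeListVector[y]);
}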
That might be partially pseudo-code, but do you get the idea?
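With 160,000 and 18,000 entries, the nested loop above can make up to roughly 2.9 billion comparisons. A different technique, a hash-set lookup, avoids that. Here is a minimal Python sketch of it; the function name `remove_listed` and the `email_index` parameter are hypothetical, and comparing addresses case-insensitively after trimming whitespace is an assumption, not something the question specifies.

```python
def remove_listed(large_rows, small_emails, email_index=0):
    """Return the rows of large_rows whose email is not in small_emails.

    large_rows: list of rows (lists or tuples) from the big CSV.
    small_emails: iterable of email strings to remove.
    email_index: position of the email column in each row (assumed 0).
    """
    # Assumption: addresses are compared case-insensitively, ignoring outer spaces.
    blocked = {email.strip().lower() for email in small_emails}
    return [
        row for row in large_rows
        if row[email_index].strip().lower() not in blocked
    ]
```

Because set membership is a constant-time check on average, the work grows with the sum of the two list sizes rather than their product. The rows can keep their name and address columns, since only the email column is inspected.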
So I read a forum post that suggested this formula:
=MATCH(B1,$A$1:$A$3,0)>0
Column B would be the large list, with the 160,000 inputs and column A was my list of things I needed to delete of 18,000.
I used this to match everything, and in a separate column pasted this formula. It would print out either an error or TRUE. If the data was in both columns it printed out true.
Then because I suck with excel, I threw this text into Notepad++ and searched for all lines that contained TRUE (match case, because in my case some of the data had the word true in it without caps.) I marked those lines, then under search, bookmarks, I removed all lines with bookmarks. Pasted that back into excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)