Regular expression for a time format - regex

I am trying to search for cells in a column containing a time in the format of 00:00 using the MATCH function. I have tried things similar to MATCH("??:??",A:A) and MATCH("?*:?*",A:A) with no luck. How can I form a regular expression of 2 digits, then a colon, then 2 digits?

Often times what is displayed to us in excel isn't the actual value. Time and Dates are one such time. Generally if you enter a time into excel 3:55 it will convert that automatically to excel's time format, which is a decimal number: 0.163194444444444, but it formats it automatically to "mm:ss" just like you entered it.
So... when you try to =MATCH() using a wildcard, you aren't going to find a hit since the value in the cell is actually that decimal number.
Instead you have to convert the value of the cells you are searching into a text format. You can do that with the =TEXT() formula. Assuming your data starts in A1 you can put in B1:
=Text(A1, "mm:ss")
Now the returned value from that formula still looks like the mm:ss format, but it's now text. The underlying value is actually 3:55. No funny business. You can now base your wild card search of "*:*" off of this column:
=Match("*:*", B1:B10, 0)
If you want to do this all in one formula you can use an Array formula (or CSE):
=Match("*:*", Text(A1:A10,"mm:ss"),0)
Using Ctrl+Shift+Enter (instead of Enter) when you enter the formula.

Related

How to get all cells that appear more than 5 times?

enter image description here
I have a table in OpenOffice that contains a column with region's codes (column J). Using table functions, how to get all codes that appear more than 5 times and write them in one cell?
Normally I would recommend breaking this problem down into smaller parts using helper columns. Or better yet, move the data into LibreOffice Base which can easily work with distinct values.
However, I managed to come up with a rather large formula that seems to do what you asked. Enter it as an array formula.
=TEXTJOIN(",";1;IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5;IF(ROW(исходник.J2:J552)=MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1;исходник.J2:J552;"")))
I can't test this on your actual data since your example is only an image, but let's say that there are six of both 77 and 37. Then this would show 77,37 as the result.
Here is a breakdown. Look up the functions in LibreOffice Online Help for more information.
=TEXTJOIN(",";1; — Join all results into a single cell, separated by commas.
IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5; — Find codes that occur more than 5 times. This is the same as what you wrote.
IF(ROW(исходник.J2:J552)= — Compare the next result to the row number that we are currently looking at.
MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1; — Determine the first row that has this code. We do this to get unique results instead of 6 or more of each code in the result.
исходник.J2:J552;""))) — Return the code. (Your formula simply returns 1 here, which doesn't seem to be what you want.) If it doesn't match, return an empty string rather than 0, because TEXTJOIN ignores empty strings.

How to filter by Regex in LibreOffice?

I've got this string:
{"success":true,"lowest_price":"1,49€","volume":"1,132","median_price":"1,49€"}
Now I want the value for median_price being displayed in a cell. HHow can I achive this with Regex?
With regex101.com I've came to this solution:
(?<=median_price":")\d{0,4},\d{2}€
But this one does not seem to be working in LibreOffice calc.
I'd advise to discard the Euro-symbol at first since you'd probably want to retrieve a value to calculate with, a numeric value. Therefor try:
Formula in B1:
=--REGEX(A1;".*median_price"":""(\d+(?:,\d+)?)€.*";"$1")
The double unary will transform the result from the 1st capture group into a number. I then went ahead and formatted the cell to display currency (Ctrl+Shift+4).
Note: I went with a slightly different regular pattern. But go with whatever works for your data I supppose.

Convert text to numeral text

I have a google spreadsheet that I am using as a quiz. The quiz takers select an option from a data validation drop down.
The cell below strips all but the first character =left(A2,1) which is a numeral value. Further below in the sheet is a cell that sums certain cells, For instance,
=Sum(A3,D3)
For some reason, the sum function does not recognize the cell as a purely numeral value, even with changing the format to number. Any ideas?
you can use SUMPRODUCT instead of SUM which is able to recognize numeric values even if they are disguised as a text string:
=SUMPRODUCT(A3:D3)
another way would be use regex like:
=ARRAYFORMULA(REGEXEXTRAXT(A3:D3, "(\d+)-")*1)
Given what is immediately above the values in Row3, that row might be redundant and the total calculated as:
=ArrayFormula(sum(0+left(A2:D2)))

Meaning of 3F7.1 in Fortran data format

I am trying to create an MDM file using HLM 7 Student version, but since I don't have access to SPSS I am trying to import my data using ASCII input. As part of this process I am required to input the data format Fortran style. Try as I might I have not been able to understand this step. Could someone familiar with Fortran (or even better HLM itself) explain to me how this works? Here is my current understanding
From the example EG3.DAT they give
(A4,1X,3F7.1)
I think
A4 signifies that the ID is 4 characters long.
1X means skip a space.
F.1 means that it should read 1 decimal places.
I am very confused about what 3F7 might mean.
EG3.DAT
2020 380.0 40.3 12.5
2040 502.0 83.1 18.6
2180 777.0 96.6 44.4
Below are examples from the help documents.
Rules for format statement
Format statement example
EG1 data format
EG2 data format
EG3 data format
One similar question is Explaining Fortran Write Format. Unfortunately it does not explicitly treat the F descriptor.
3F7.1 means 3 floating point numbers, each printed over 7 characters, each with one decimal number behind the decimal point. Leading characters are blanks.
For reading you don't need the .1 info at all, just read a floating point number from those 7 characters.
You guessed the meaning of A4 (string of four characters) and 1X (one blank) correctly.
In Fortran, so-called data edit descriptors (which format the input or output of data) may have repeat specifications.
In the format (A4,1X,3F7.1) the data edit descriptors are A4 and F7.1. Only F7.1 has a repeat specification (the number before the F). This simply means that the format is as though the descriptor appeared repeated: like F7.1, F7.1, F7.1. With a repeat specification of 1, or not given, there is just the single appearance.
The format of the question, then, is like
(A4,1X,F7.1,F7.1,F7.1)
This format is one that is covered by the rules provided in one of the images of the question. In particular, the aspect of repeat specification is given in rule 2 with the corresponding example of rule 3.
Further, in Fortran proper, a repeat count specifier may also be * as special case: that's like an exceptionally large repeat count. *(F7.1) would be like F7.1, F7.1, F7.1, .... I see no indication that this is supported by HLM but if this is needed a very large repeat count may be given instead.
In 1X the 1 isn't a repeat specification but an integral, and necessary, part of the position edit descriptor.
Procedure for making MDM file from excel for HLM:
-Make sure ALL the characters in ALL the columns line up
Select a column, then right click and select Format Cells
Then click on 'Custom' and go to the 'Type' box and enter the number
of 0s you need to line everything up
-Remove all the tabs from the document and replace them with spaces.
Open the document in word and use find and replace
-To save the document as .dat
First save it as .txt
Then open it in Notepad and save it as .dat
To enter the data format (FORTRAN-Style)
The program wants to read the data file space by space, so you have to specify it perfectly so that it reads the whole set properly.
If something is off, even by a single space, then your descriptive stats will be wonky compared to if you check them in another program.
Enclose the code with brackets ()
Divide the entries with commas ,
-Need ID column for all levels
ID column needs to be sorted so that it is in order from smallest to
largest
Use A# with # being the number of characters in the ID
Use an X1 to
move from the ID to the next column
-Need to say how many characters are needed in each column
Use F
After F is the number of characters needed for that column -Use F# (#= number)
There need to be enough character spaces to provide one 'gap' space
between each column
There need to be enough to character spaces to allow for the decimal
As part of the F you need to specify the number of decimal places
You do this by adding a decimal point after the F number and then a
number to represent the spaces you need -F#.#
You can use a number in front of the F so as to 'repeat' it. Not
necessary though. -#F#.#
All in all, it should look something like this:
(A4,X1,F4.0,F5.1)
Helpful links:
https://books.google.de/books?id=VdmVtz6Wtc0C&pg=PA78&lpg=PA78&dq=data+format+fortran+style+hlm&source=bl&ots=kURJ6USN5e&sig=fdtsmTGSKFxn04wkxvRc2Vw1l5Q&hl=en&sa=X&ved=0ahUKEwi_yPurjYrYAhWIJuwKHa0uCuAQ6AEIPzAC#v=onepage&q&f=false
http://www.ssicentral.com/hlm/help6/error/Problems_creating_MDM_files.pdf
http://www.ssicentral.com/hlm/help7/faq/FAQ_Format_specifications_for_ASCII_data.pdf

Convert Scientific notation to text or integer using regex in Notepad++

I am looking to convert a scientific notation number into the full integer number.
E.g:
8.18234E+11 => 818234011668
Excel reformatted all my upc codes within a csv and this solution is not working for me.
I have my csv open in Notepad++ and would love to do this using a regex find and replace.
Thanks.
The damage is already done and cannot be recovered from the CSV file. 8.18234E+11 could be anything* from 818233500000 to 818234499999.
To prevent Excel from rounding large numbers, you need to store them as text. If you set the cell format to text, any value inserted from then on should be automatically interpreted as text. In OpenOffice Calc (I don't have MS Excel), you can also prefix a numeric value with ' to get it interpreted as text no matter the cell format.
There is a chance that the correct value is stored in the original XLS (or XSLX or ODS or the live Excel session or ...) file. If not, then you'll have to enter the data again. If the data is there, you need to store it as text or increase the number of significant digits in the exported CSV. If you only have the exported data, then you're out of luck.
*UPC has a single check digit, so only 100 000 out of the 1 000 000 codes are actually valid UPC codes.