Cannot delete non-space whitespace character in excel - regex

When bringing data into Excel by whatever method (import, paste, ...), I sometimes get the following issue: there is an extra space in front of the text at the beginning of the cell. I know the usual procedures to handle this, namely:
=TRIM(cell number)
and, if it's not a space character:
=TRIM(SUBSTITUTE(cell number,CHAR(160),CHAR(32)))
But this time neither of these worked. I also tried substituting other CHAR codes.
The character at the beginning is also just plain weird. When I go to the very beginning of the cell and try to delete it, I must hit the Delete key twice to remove one space! But when I go to the first character in the cell and hit Backspace instead, I only need to press it once.
What else can I do to eliminate this weird non-space whitespace character?

If cell A1 contains non-visible junk characters, you must identify them before you can remove them.
Pick some cell and enter:
=IFERROR(CODE(MID($A$1,ROWS($1:1),1)),"")
and copy down. This gives the character code for each character in A1, one per row.
Then you can use SUBSTITUTE() to remove the offender.
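For comparison, the same inspection can be sketched outside Excel in Python. This is only an illustration, assuming the cell text has been copied into a string; describe_chars is a hypothetical helper name and the sample value is made up. It reports Unicode code points, which is what Excel's UNICODE() returns; CODE() may report 63 (a question mark) for characters outside the ANSI code page, so an unexpected 63 is worth a second look.

```python
import unicodedata


def describe_chars(text):
    """Return (position, code point, Unicode name) for every character in text."""
    return [
        (i, ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text, start=1)
    ]


# Hypothetical cell value: a no-break space (code 160) in front of the text.
sample = "\u00a0Hello"
for pos, code, name in describe_chars(sample):
    print(pos, code, name)
```

Once the offending code is known, it can be passed to SUBSTITUTE() as described above.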

Let's assume column A has text where some cells are good and some have text with the weird space-like character at the front. So there are some cells we want to change and some we don't.
1) Create a one column table with one letter in each cell. I decided to go over to the right to column H for the table. So for example cell H1 has A, cell H2 has B and so on.
2) Get the length of the cell we want to edit. I've put this formula in cell B1.
=LEN(A1)
3) Test the cell for the first letter. This gives us which cell to change and which not. I've put this formula in cell C1.
=ISNA(VLOOKUP(LEFT(A1),$H$1:$H$26,1,0))
4) Change (or not, depending on the test in step 3) using RIGHT and the result from LEN.
=IF(C1,RIGHT(A1,B1-2),A1)
Notice that I have to subtract 2 characters and not one? Like I said, it was a strange character.
5) Repeat down the column.

If the first legitimate character in your string will be in the set [A-Za-z0-9] then you could use this formula:
=MID(A1,MIN(SEARCH({"a";"b";"c";"d";"e";"f";"g";"h";"i";"j";"k";"l";"m";"n";"o";"p";"q";"r";"s";"t";"u";"v";"w";"x";"y";"z";0;1;2;3;4;5;6;7;8;9},A1&"abcdefghijklmnopqrstuvwxyz1234567890")),99)
where 99 is longer than the longest string might be. If there are other legitimate starting characters, then add them to both the array constant and the string at the end.
If you might need to remove trailing spaces (char(32)), you can enclose the above in a TRIM function.
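The idea behind this formula (find the first legitimate character and keep everything from there) can be sketched in Python. This is a minimal illustration, assuming the legitimate set is ASCII letters and digits as in the formula; strip_leading_junk is a hypothetical name. Unlike TRIM, it only strips spaces at the ends and does not collapse repeated spaces inside the text.

```python
import re


def strip_leading_junk(text):
    """Drop everything before the first ASCII letter or digit, then trim spaces."""
    match = re.search(r"[A-Za-z0-9]", text)
    if match is None:
        # No legitimate character at all: the formula would return an empty string too.
        return ""
    return text[match.start():].strip(" ")


# Hypothetical cell value with a zero-width space and a no-break space in front.
print(repr(strip_leading_junk("\u200b\u00a0Hello ")))
```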

How to use Data Validation, Number must start with 7

I want to find a formula that can validate the order number. If a future order number entered does not start with 7****, it should show a warning. Thank you so much for your help.
https://docs.google.com/spreadsheets/d/1piS3GQ5TzrGAr4VSoSbABkMa-fRm6n_wP16RimegO6E/edit?usp=sharing
Select the range of cells in which you want to allow only text that starts or ends with certain characters.
Click Data > Data Validation > Data Validation.
In the Data Validation dialog box, please configure as follows.
1 Select Custom from the Allow drop-down list;
2 For allowing texts that start with certain characters, please copy the below formula into the Formula box;
=EXACT(LEFT(A2,3),"KTE")
And for allowing texts that end with certain characters, please copy the below formula into the Formula box;
=EXACT(RIGHT(A2,3),"KTE")
3 Click the OK button.
Notes:
In the formulas, A2 is the first cell of the selected range; 3 is the number of characters you specified, KTE is the start or end text.
These two formulas are case-sensitive.
If you don’t need case sensitivity, use the COUNTIF formulas below:
Only allow texts that start with KTE in a range of cells
=COUNTIF(A2,"KTE*")
Only allow texts that end with KTE in a range of cells
=COUNTIF(A2,"*KTE")
Then click the OK button. From now on, only text strings that begin or end with the characters you specified can be entered into the selected cells.
In your case, replace KTE with 7 and check a single character, since you want your numbers to start with 7.
Yours should be:
=EXACT(LEFT(A2,1),"7")
LEFT returns text, so the comparison is made against the text "7". If the quoted form fails, you can also try it without the quotation marks:
=EXACT(LEFT(A2,1),7)
use:
=REGEXMATCH(""&A2, "^7.+")
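As a rough check of what the pattern accepts, here is a small Python sketch of the same test. It is an illustration only: is_valid_order_number is a hypothetical name, and str(value) stands in for the ""&A2 coercion to text. Note that "^7.+" requires at least one more character after the leading 7.

```python
import re


def is_valid_order_number(value):
    """True when the value, read as text, starts with 7 and has at least one more character."""
    text = str(value)  # mirrors ""&A2, which turns a number into text
    return re.match(r"^7.+", text) is not None


# Hypothetical order numbers.
for candidate in (71234, "81234", "7"):
    print(candidate, is_valid_order_number(candidate))
```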

Replace trailing ".1" to ".2"

I am assuming you would need a regex for this. The best I could come up with is
=REGEXREPLACE(C2, "\.(?=[^.]*$)", ".2")
but it only detects the period at the end, and Google Sheets returns #REF!
Other ways, such as directly changing the cells C2:C5, are also welcome.
You can just check whether the last two characters are equal to .1. Get the two characters from the right and test for equality:
RIGHT(A1,2)=".1"
Then, to convert matching values, you can slice off the last two chars (length-2) and append the .2
LEFT(A1,LEN(A1)-2)&".2"
All together
=IF(RIGHT(A1,2)=".1",LEFT(A1,LEN(A1)-2)&".2",A1)
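If you actually want to increment arbitrary values (and not just .1), you can skip the equality check and add 0.1 to the suffix. The sum is a number such as 0.2, so drop its leading zero with MID before appending it (a suffix of .9 rolls over to 1 and needs separate handling):
=LEFT(C3,LEN(C3)-2)&MID((RIGHT(C3,2)+0.1)&"",2,2)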
If your values have more than a single digit after the period, extract the suffix into an intermediate column so you can use its length to add the right power of ten (.5+0.1, .993+0.001, etc.) and to exclude the right number of characters when appending.
If you want a full version parser, consider VBA or passing the column to a more practical language.
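If the column is passed to another language, the two operations might look like this in Python. This is a sketch under assumptions, not a full version parser: both function names are hypothetical, and increment_last_component treats the digits after the last period as an integer counter (so ".9" becomes ".10"), which differs from adding a power of ten.

```python
def bump_trailing_one(value):
    """Replace a trailing ".1" with ".2"; leave every other value unchanged."""
    if value.endswith(".1"):
        return value[:-2] + ".2"
    return value


def increment_last_component(value):
    """Add 1 to the digits after the last period, keeping any zero padding."""
    head, sep, tail = value.rpartition(".")
    if not sep or not (tail.isascii() and tail.isdigit()):
        return value  # no period, or the suffix is not purely digits
    bumped = str(int(tail) + 1).zfill(len(tail))
    return f"{head}.{bumped}"


# Hypothetical cell values.
for cell in ("A.1", "A.3", "v3.9", "1.01"):
    print(cell, bump_trailing_one(cell), increment_last_component(cell))
```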

Excel formulae: need way to determine if 3rd text character from right is "-"

I have a column of hospital names. In most of them, the last three characters are "-" and the two-letter abbreviation for the state, e.g. "-CA". But some (out of hundreds) have the state name somewhere in the hospital name instead, e.g. "Texas Tech U Affil-Lubbock" or "Community Health of South Florida".
I'm trying to find a way to make Excel give the last two characters only if the 3rd character from the right is a dash ("-"), but trying to specify that character position seems impossible.
I tried:
=IF(RIGHT(H4,-3)="-",RIGHT(H4,2),"noabbrev") and get #VALUE
=IF(RIGHT(H4,3)="-??",(RIGHT(H4,2)),"noabbrev") and always get noabbrev for all cells
At this point, I fear I need to use =RIGHT(H4,2) in order to get the bulk of the cells correctly and eyeball/correct the errors by hand.
Am I missing the obvious again?
You can use this formula if H4 is text:
=IF(MID(H4,LEN(H4)-2,1)="-",RIGHT(H4,2),"noabbrev")
If A1 contains some text, then:
=Left(Right(A1,3),1)
should isolate the character you want.
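The same test reads naturally with negative indexing in Python. This is just an illustrative sketch: state_abbrev is a hypothetical name, and "Mercy General-CA" is a made-up example value. It also guards against names shorter than three characters.

```python
def state_abbrev(name, default="noabbrev"):
    """Return the last two characters only when the third from the right is a dash."""
    if len(name) >= 3 and name[-3] == "-":
        return name[-2:]
    return default


for hospital in ("Mercy General-CA", "Texas Tech U Affil-Lubbock"):
    print(hospital, "->", state_abbrev(hospital))
```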

(VBS) Make a list of multiple selection entries

I have a text in Word and I select some words from it using Ctrl+Click, then after the paragraph I make a numbered list of those selected words. It looks like this
I was wondering if it is possible to use VBS to create a macro that would put a number after each selected word and make a list of them afterwards. So far I have only managed to do something like this:
Sub MakeAList()
    With Selection
        .Copy
        .MoveDown Unit:=wdParagraph, Count:=1
        .TypeParagraph
        .Paste
    End With
End Sub
It copies the selected words and makes each one a paragraph right after the main one. I still have to make the list and put (*) after each word in the original text myself.
To clear up any confusion, the language is Chinese, and the words are not separated by spaces, but I'll try to use "Split". What's important is that when I select several words, let's say four 2-character words, the program shows that I have selected 8 words (4*2) but 4 lines, even if those words are on the same line. So I think I should experiment with the line counter, if there is one.

Find rows that contain three digit number

I need to subset rows that contain <three digit number>
I wrote
foo <- grepl("<^[0-9]{3}$>", log1[,2])
others <- log1[!foo,]
but I'm not really sure how to use regex...just been using cheat sheets and Google. I think the < and > characters are throwing it off.
You almost had it. Try
^<[0-9]{3}>$
It might behoove you to read about anchors (^ and $).
The ^ and $ signs refer to the beginning and end of the string, respectively. You shouldn't be matching anything before or after them.
If you want rows that contain that pattern, you shouldn't use the anchors at all. You should just use this: <[0-9]{3}> (or shorten it to <\\d{3}>)
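The difference between the anchored and unanchored patterns is easy to see on a few made-up values. The sketch below uses Python's re module rather than R, purely for illustration; for these simple patterns the two behave the same way, and the sample values are hypothetical.

```python
import re

# Hypothetical column values.
values = ["<123>", "code <456> seen", "<12>", "1234", "<7890>"]

# Anchored: the whole string must be exactly "<", three digits, ">".
exact = [bool(re.search(r"^<[0-9]{3}>$", v)) for v in values]

# Unanchored: the pattern may appear anywhere inside the string.
contains = [bool(re.search(r"<[0-9]{3}>", v)) for v in values]

print(exact)
print(contains)
```

Only the second list marks "code <456> seen" as a match, which is the behaviour wanted when rows merely contain the pattern.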
Just for posterity, I thought I would contribute what I think is the implied answer to the OP's stated question.
It seems the OP wants to exclude rows of a data frame where the second column contains a 3-digit integer. Assuming that column holds only plain integers, this can be done quite easily using the 'nchar' function to count the number of characters in each value, like so:
others <- log1[nchar(log1[,2])!=3,]
We are simply creating a vector with the number of characters in each value of column 2 and keeping a row if that number does not equal 3.