create glyphs with diacritic marks - diacritics

I would like to add a letter h with a dot below (U+1E25) to a free font from Google. Is it possible to just duplicate the letter h, paste it to another position, and draw a dot under it?
If so, what are the exact steps?
Thanks
Unfortunately I can't find the time to learn FontForge. I have attempted it but have not been successful.

Related

How to find and replace box character in text file?

I have a large text file that I'm going to be working with programmatically, but I have run into problems with a special character strewn throughout the file. The file is way too large to scan by eye for specific characters. Most of the other unwanted special characters I've been able to get rid of using some regex pattern. But there is a box character, similar to "□". When I tried to copy the character from the actual text file and paste it here I got "�", so the example of the box is from the Windows character map, which includes the code 'U+25A1'. I'm not sure how to interpret that code or whether it's something I could use for a regex search.
Would anyone know how I could search for the box symbol similar to "□" in a UTF-8 encoded file?
EDIT:
Here is an example from the text file:
"� Prune palms when flower spathes show, or delay pruning until after the palm has finished flowering, to prevent infestation of palm flower caterpillars. Leave the top five rows."
The only problem is that, as mentioned in the original post, the square gets converted into a diamond question mark.
It's unclear where and how you are searching, although you could use the hex equivalent:
\x{25A1}
Example:
https://regex101.com/r/b84oBs/1
The black diamond with a question mark is not a character, per se. It is what a browser spits out at you when you give it unrecognizable bytes.
Find out where that data is coming from.
Determine its encoding. (Usually UTF-8, but might be something else.)
Be sure the browser is configured to display that encoding. This is likely to suffice: <meta charset=UTF-8> in the header of the page.
I found a workaround using Notepad++ and this website. It's still not clear what encoding system the square is originally from, but when I post it into the query field in the website above or into the Notepad++ Conversion Table (Plugins > Converter > Conversion Table) it gives the hex-character code for the "Replacement Character" which is the diamond with the question mark.
Using this code in a regex expression, \x{FFFD}, within Notepad++ search gave me all the squares, although recognizing them as the Replacement Character.
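For anyone doing the same search programmatically rather than in Notepad++, here is a minimal Python sketch. It decodes the raw bytes as UTF-8 with errors="replace", so any byte sequence that is not valid UTF-8 becomes U+FFFD, and then looks for either U+25A1 or U+FFFD. The function name and the sample bytes are made up for illustration; in particular the 0x95 byte (a bullet in Windows-1252) is only a guess at what kind of byte could produce this symptom, not something confirmed for the file in the question.

```python
import re

# Matches the white square (U+25A1) or the replacement character (U+FFFD).
SUSPECT = re.compile("[\u25A1\uFFFD]")

def find_suspect_chars(raw: bytes):
    """Return (index, code point) pairs for every suspect character.

    Bytes that are not valid UTF-8 are turned into U+FFFD by the decoder,
    so they show up in the result as well.
    """
    text = raw.decode("utf-8", errors="replace")
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in SUSPECT.finditer(text)]

def strip_suspect_chars(raw: bytes) -> str:
    """Decode the bytes and drop every suspect character."""
    text = raw.decode("utf-8", errors="replace")
    return SUSPECT.sub("", text)

# Hypothetical sample: 0x95 is not a valid UTF-8 start byte.
sample = b"\x95 Prune palms when flower spathes show"
print(find_suspect_chars(sample))
print(strip_suspect_chars(sample).strip())
```

If the search reports U+FFFD rather than U+25A1, that suggests the file is not really UTF-8 at those positions, which matches the advice above to find out the actual encoding.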

Regex expression for searching spaced/broken words in OCR PDFs (goo d ni g ht)

I need to search lots of OCR PDFs. I realized the words and sentences look perfect visually, but if I copy and paste the content, there are spaces which shouldn't be there!
I can see in the text: good night
If I copy and paste somewhere: goo d ni g ht
I would appreciate advice on handling this situation with a regex, considering:
a) The simple example for short words as \bgood night\b for goo d ni g ht
b) When there is a line break in the sentence. The regex isn't able to search from one line to the next in the PDF, even though the paragraph is the same. I am looking for
\bthe sun set and the night comes\b, but the PDF content looks like this when pasted:
line 1: t he sun set an d th e
line 2: nig ht co m es
Many thanks,
Cadu
This random occurrence of spaces in the middle of words can happen in PDF.
The reason behind it is the complex format that PDF actually is.
You see, a PDF document is actually a container of instructions for rendering the text in a viewer.
Imagine instructions like:
go to position 50, 50.
draw the character 'G'
go to position 56, 50.
draw the character 'O'
etc
Whenever you select something in a viewer (for instance Adobe), the program has to figure out what content overlaps with your selection (already this is not an easy problem). If it's text, it then needs to decide where to add spaces and line breaks. Different viewers (or software) might use different metrics for this. A typical one, for instance, is "insert a space if two characters are further apart than the width of the space character in the same font".
The point is, getting text out of a PDF document is always kind of guesswork. And if you add the fact that it's an OCR PDF, you are adding a further layer of difficulties.
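Given that the spacing is guesswork, one possible workaround on the search side is to build a pattern that ignores whitespace altogether. The Python sketch below is only an illustration (the function name is made up): it drops the spaces from the search phrase and allows any amount of whitespace, including line breaks, between every pair of characters.

```python
import re

def spaced_pattern(phrase: str) -> re.Pattern:
    """Build a regex that matches `phrase` even if extra whitespace
    (spaces or line breaks) was inserted between any of its characters."""
    chars = [re.escape(c) for c in phrase if not c.isspace()]
    # \s* between every character tolerates "goo d ni g ht" and line breaks.
    return re.compile(r"\s*".join(chars), re.IGNORECASE)

pasted = "t he sun set an d th e\nnig ht co m es"
match = spaced_pattern("the sun set and the night comes").search(pasted)
print(match is not None)
```

The trade-off is that real word boundaries are ignored too, so "good night" would also match "goodnight", or the tail of one word running into the next. For a full-text search over many PDFs that is often acceptable, but it will produce some false positives.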

Find and replace specific characters in Word headings

I have a huge Word document (32MB). All I want to do is find the following characters and replace them with something (say a space, or nothing, that is, remove them), but only in the headings (h1 to h9, or every heading level being used) and not anywhere else:
,-_;~%&*()?/.
Can anyone help? I want to do it using a macro. I know the manual way using the Find dialog but it is too cumbersome because of the huge size of the document. Thank you for help.

Separating out a list with regex?

I have a CSV file which has been generated by a system. The problem is with one of the fields which used to be a list of items. An example of the original list is below....
The serial number of the desk is 45TYTU
This is the second item in the list
The colour of the apple is green
The ID code is 489RUI
This is the fourth item in the list.
And unfortunately the system spits out the code below.....
The serial number of the desk is 45TYTUThis is the second item in the listThe colour of the apple is greenThe ID code is 489RUIThis is the fourth item in the list.
As you can see, it ignores the line breaks and just bunches everything up. I am unable to modify the system that generates this output so what I am trying to do is come up with some sort of regex find and replace expression that will separate them out.
My original thought was to try to detect when an upper case letter is in the middle of a lower case word, but as in one of the items in the example, when a serial number is used it throws this out.
Anyone any suggestions? Is regex the way to go?
--- EDIT ---
I think I need to simplify things for myself and, for now, ignore the fact that lines that end in a serial number will break things. I just need to create an expression that will insert a line break if it detects that an upper case letter is being used after a lower case one.
--- EDIT 2 ---
Using the example given by fardjad, everything works for the sample data given. The string was...
(.(?=[A-Z][a-z]))
Now as I test with more data I can see an issue appearing, certain lines begin with numbers so it is seeing these as serial numbers, you can see an example of this at http://regexr.com?2vfi5
There are only about 10 known numbers it uses at the start of the lines such as 240v, 120v etc...
Is there a way to exclude these?
This won't be a robust solution, but it is what you asked for. It matches the character before an uppercase letter that is followed by a lowercase one. You can simply use regex replace and append a newline character:
(.(?=[A-Z][a-z]))
see this demo.
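As a rough illustration of that replace in Python (a sketch, not the only way to do it), the function below applies the same pattern and appends a newline after each match. The optional known_prefixes argument is an assumption added here to address the follow-up about lines starting with 240v, 120v and so on: it extends the lookahead so a break is also inserted before any of those literal prefixes. The function name and the second sample string are made up.

```python
import re

SAMPLE = (
    "The serial number of the desk is 45TYTU"
    "This is the second item in the list"
    "The colour of the apple is green"
    "The ID code is 489RUI"
    "This is the fourth item in the list."
)

def split_items(text, known_prefixes=()):
    """Insert a line break before every uppercase+lowercase pair and,
    optionally, before any of the literal prefixes in known_prefixes."""
    lookahead = r"[A-Z][a-z]"
    if known_prefixes:
        alternatives = "|".join(re.escape(p) for p in known_prefixes)
        lookahead += "|(?:" + alternatives + ")"
    # Capture the character before the break and put a newline after it.
    pattern = re.compile(r"(.)(?=" + lookahead + ")")
    return pattern.sub(r"\1\n", text).splitlines()

for line in split_items(SAMPLE):
    print(line)

# Hypothetical data with one of the known numeric prefixes.
print(split_items("Plug in the heater240v supply requiredCheck the fuse",
                  known_prefixes=("240v", "120v")))
```

Note that the prefix alternative fires wherever "240v" appears, including in the middle of a sentence such as "uses a 240v supply", so it only helps if those values really occur only at the start of items.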
You could search for this
(?<=\p{Ll})(?=\p{Lu})
and replace with a linebreak. The regex matches the empty space between a lowercase letter \p{Ll} and an uppercase letter \p{Lu}.
This assumes you're using a Unicode-aware regex engine (.NET, PCRE, Perl for example). If not, you might also get away with
(?<=[a-z])(?=[A-Z])
but this of course only detects lower-/uppercase changes in ASCII words.
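For reference, here is how the ASCII variant behaves with Python's built-in re module, which does not support \p{Ll} and \p{Lu} (the third-party regex module does). This is a small sketch with a made-up function name, using the sample string from the question.

```python
import re

SAMPLE_TEXT = (
    "The serial number of the desk is 45TYTU"
    "This is the second item in the list"
    "The colour of the apple is green"
    "The ID code is 489RUI"
    "This is the fourth item in the list."
)

def split_on_case_change(text):
    """Insert a line break at every lowercase-to-uppercase boundary."""
    # Zero-width match: nothing is consumed, a newline is simply inserted.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "\n", text).splitlines()

for line in split_on_case_change(SAMPLE_TEXT):
    print(line)
```

On the sample this yields three lines rather than five, because the items ending in a serial number (45TYTU, 489RUI) end in an uppercase letter and so there is no lowercase-to-uppercase boundary at those joins.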

c++ create text fits edit box

Well, I know the title is not that clear; I couldn't think of a better one.
I want to know how to do this:
Say you have an edit box that can only show 10 characters.
Something like this
ssssssssss
Let's just say I have more than 10 characters. Some of them will scroll out of view.
For example, if we have the string "123456789010" it will show just "3456789010".
My problem is that some characters are narrow and don't take up much space, and some are wide.
So I can't find a way to break the string and work out which characters are scrolled out of view.
Any ideas?
Try adding ES_MULTILINE to the edit box style so that it can use multiple lines.
edit1=CreateWindowA("edit","edit box",WS_CHILD|WS_VISIBLE|WS_BORDER|ES_MULTILINE,120,160,200,200,hWnd,(HMENU)IDI_EDIT,hInstance,0);
You can calculate the display length of the string in your control (there are several functions for that) and adjust the size of the control accordingly.
Do you only want to see the far left or the far right?
Here is your string: "0123456789"
Say you can only display 5 characters due to the pixel width of the box.
Do you want it to be "...56789" (or just "56789"),
or "01234..." (or just "01234")?
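The idea of measuring the display width and keeping only what fits can be sketched independently of the GUI toolkit. The Python sketch below is purely illustrative: the function names and the width table are made up, and in a real Win32 program the per-character widths would come from a text measurement call such as GetTextExtentPoint32 for the control's font.

```python
def fit_tail(text, max_width, char_width):
    """Return the longest suffix of `text` whose total width fits in max_width."""
    used = 0
    start = len(text)
    # Walk backwards from the last character, adding widths until full.
    for i in range(len(text) - 1, -1, -1):
        w = char_width(text[i])
        if used + w > max_width:
            break
        used += w
        start = i
    return text[start:]

def fit_head(text, max_width, char_width):
    """Return the longest prefix of `text` whose total width fits in max_width."""
    used = 0
    end = 0
    for i, ch in enumerate(text):
        w = char_width(ch)
        if used + w > max_width:
            break
        used += w
        end = i + 1
    return text[:end]

def demo_width(ch):
    """Hypothetical pixel widths: narrow, wide, and everything else."""
    if ch in "il1.,":
        return 3
    if ch in "mwMW":
        return 9
    return 6

print(fit_tail("0123456789", 30, lambda ch: 6))
print(fit_head("0123456789", 30, lambda ch: 6))
print(fit_tail("WWWWiiii", 30, demo_width))
```

With a fixed width of 6 and a 30-pixel box, the first two calls keep five characters each ("56789" and "01234"); with the variable-width table the number of characters that fit depends on which ones they are, which is the problem described in the question.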