Force SkParagraph layout to account for ghost whitespace - c++

SkParagraph automatically compensates for "ghost whitespace" when shaping a paragraph. I'd like to disable this behaviour and allow the line to be pushed out when whitespace is introduced.
Center alignment with current behaviour:
The quick brown fox
🦊 ate a zesty
hamburgerfons 🍔.
The 👩‍👩‍👧‍👧 laughed.
Now adding loads of spaces after "zesty" (desired behaviour):
The quick brown fox
🦊 ate a zesty
hamburgerfons 🍔.
The 👩‍👩‍👧‍👧 laughed.
Notice that the second line is pushed to the left by all the extra whitespace.
I've modified this CanvasKit fiddle to illustrate. See line 40.
I've also found this flutter issue that illustrates the issue.
I've gone through the Skia / SkParagraph source code many times over and can't find a way to introduce the behaviour I need.

Related

Evenly distribute wrapped text between lines in SwiftUI

How can I tell SwiftUI that when text wraps, I'd like all the lines to be as close to equal length as possible?
For example, I don't want this:
The quick brown fox jumps over the
lazy dog
Even if there is enough horizontal space to fit everything except "lazy dog" on the first line, I want this instead (or whatever gives the most equal line lengths for the font in use):
The quick brown fox
jumps over the lazy dog

stop short of multiple strings and characters using '^'

I'm doing a regex operation that needs to stop short of either of two character sequences: { or \t\t{.
The first is OK, but the second cannot be achieved using the ^ symbol the way I have been using it.
My current regex is [\t+]?{\d+}[^\{]*
As you can see, I've used ^ effectively with a single character, but I cannot apply it to a string of characters like \t\t\{.
How can the current regex be applied to consider both of these possibilities?
Example text:
{1} The words of the blessing of Enoch, wherewith he blessed the elect and righteous, who will be living in the day of tribulation, when all the wicked and godless are to be removed. {2} And he took up his parable and said--Enoch a righteous man, whose eyes were opened by God, saw the vision of the Holy One in the heavens, which the angels showed me, and from them I heard everything, and from them I understood as I saw, but not for this generation, but for a remote one which is for to come. {3} Concerning the elect I said, and took up my parable concerning them:
The Holy Great One will come forth from His dwelling,
{4} And the eternal God will tread upon the earth, [even] on Mount Sinai,
And appear from His camp
And appear in the strength of His might from the heaven of heavens.
{5} And all shall be smitten with fear
And the Watchers shall quake,
And great fear and trembling shall seize them unto the ends of the earth.
{6} And the high mountains shall be shaken,
And the high hills shall be made low,
And shall melt like wax before the flame
When I do this as a multi-line extract, the indentation is not maintained for the first line of each block. Ideally the extract should stop short of the \t\t{, allowing it to be picked up properly in the next extract and creating perfectly indented blocks. The reason is that when the blocks are taken from the database, the \t\t should be detected at the first line to allow dynamic formatting.
[\t+]?{\d+}[\s\S]*?(?=\s*{|$)
You can use this. See the demo:
https://regex101.com/r/nNUHJ8/1
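For what it's worth, here is a minimal sketch of the same lookahead idea in Python, assuming Python's re module rather than whatever engine the question uses. Two things differ from the pattern above and are my own assumptions: the braces are escaped, and [\t+]? is swapped for \t* so that both leading tabs stay attached to the block that follows them. The function name and the sample text are made up for illustration.

```python
import re

# Assumption: a block starts with optional tabs followed by a {number} marker.
# The lazy [\s\S]*? stops as soon as the lookahead sees optional whitespace
# followed by '{' (the next block) or the end of the text.
BLOCK = re.compile(r"\t*\{\d+\}[\s\S]*?(?=\s*\{|$)")

def split_blocks(text):
    """Return each {n} block, stopping short of the whitespace before the next '{'."""
    return BLOCK.findall(text)

sample = "{1} First. {2} Second:\nline two,\n\t\t{3} Third\n\t\tmore"
for block in split_blocks(sample):
    print(repr(block))
```

Because the whitespace in front of the next marker is left unconsumed, the following match is free to pick up the \t\t itself, which is what keeps the first line of each indented block indented.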

How to join several consecutive lines using one pattern for the first line and the other pattern for all following lines, preferably with sed?

I want to remove the whole "Derived words" section from this example, both occurrences. So far my idea is to join the lines that follow "Derived words:" with that line and then remove it, but I can't just join two consecutive lines, because the number of lines differs from article to article. So my plan is: check whether a line matches the pattern '^Derived words:', then check whether the next line matches the pattern '^[a-z] '; if it does, join them and check the next line, and so on. This sounds like a job perfectly tailored for Bash's if-then-else, but I'd prefer a pure sed solution if possible.
A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision...
The police were swift to act.
Syn:
quick
Derived words:
swiftly The French have acted swiftly and decisively to protect their industries.
swiftness The secrecy and swiftness of the invasion shocked and amazed army officers.
Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright.
Syn:
quick
Derived words:
swiftly ^[[0;37m...a swiftly flowing stream.
swiftness With incredible swiftness she ran down the passage.
A swift is a small bird with long curved wings.
Expected results
A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision...
The police were swift to act.
Syn:
quick
Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright.
Syn:
quick
A swift is a small bird with long curved wings.
Thanks in advance
This might work for you (GNU sed):
sed -n '/^Derived words:/{:a;n;/^\w/ba};p' file
Use sed's grep-like -n flag and, on encountering "Derived words:", keep reading lines until one does not start with a word character.
I find that when you want to work on blocks of many lines, the best tool tends to be awk, for example:
awk '/^Derived words/{skip=1} /^ /{skip=0} 1{if(!skip)print}' input
A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision...
The police were swift to act.
Syn:
quick
Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright.
Syn:
quick
A swift is a small bird with long curved wings.
This should work in regular (non-GNU) sed. There may be a way to eliminate the redundant pattern, but I haven't come up with it yet.
sed -e :a -e '/^Derived words:/N;s/\n[a-z]//;ta' -e 's/^Derived words:.*\n//'
Here's how it works:
You said that you want to remove "Derived words:" and any lines that follow it if they start with a letter (let's call those continuation lines).
So sed reads the input and echoes it to stdout, line by line, as usual.
But when it encounters "Derived words:" at the start of a line, before echoing it, it reads the next line into the pattern space and appends it to "Derived words:", with a newline separating them (the N command), still echoing nothing since it saw "Derived words:". It then tries to delete that newline and the alphabetical character immediately following it (the s command).
If it can, then it must have found a continuation line, so it tries to do that again, by jumping to the start of the script (the t command, which conditionally jumps to the label "a" defined up front with the colon command) where it will append the next line and so on.
If it can't, it's left with the "Derived words:" line plus any continuation lines appended (without their newlines, which were removed) plus the next non-continuation line, which is separated from the rest by a newline.
If it then sees that it has a line that starts with "Derived words:", it deletes it up to and including the newline (the second s command) -- leaving the part that follows the newline, the next non-continuation line -- which it echoes. Then it resumes processing the input with the next line.
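The same logic can be written out longhand as a small state machine. This is only a rough Python sketch, assuming (as the explanation above does) that continuation lines are exactly the ones that begin with a lowercase letter; the function name is hypothetical and not part of any tool mentioned here.

```python
import re

def drop_derived_words(lines):
    """Drop each 'Derived words:' line and the continuation lines after it."""
    kept = []
    skipping = False
    for line in lines:
        if line.startswith("Derived words:"):
            skipping = True          # start of a section to remove
            continue
        if skipping and re.match(r"[a-z]", line):
            continue                 # continuation line: starts with a lowercase letter
        skipping = False             # first non-continuation line ends the section
        kept.append(line)
    return kept

sample = [
    "The police were swift to act.",
    "Syn:",
    "quick",
    "Derived words:",
    "swiftly The French have acted swiftly and decisively to protect their industries.",
    "swiftness The secrecy and swiftness of the invasion shocked and amazed army officers.",
    "Something that is swift moves very quickly.",
]
print("\n".join(drop_derived_words(sample)))
```

Note that "quick" survives even though it starts with a lowercase letter, because it appears before the "Derived words:" line and the skipping flag is not yet set.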

Regular Expression - Matching and extracting complicated conditions

I'm trying to write a regular expression that will match these conditions:
Maximum of 8000 characters (any characters, including "\r\n")
Maximum of 10 lines (separated by \r\n).
Then extract only the first 4 lines from the matched text.
I can't find a good way to do it.
Thanks!!
Regular expressions are not what you need. They are used to match a certain pattern, not a certain length. If you are holding the data in a string, myString.length <= 8000 is all you need for the character count (using the correct syntax for your language, of course). For the number of lines, you will have to count the number of \r\n sequences in your string (can be done iteratively). To get the first four lines, just find the 4th \r\n and get everything before that with a substring method.
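As a rough illustration of that approach, here is a minimal Python sketch (the question did not name a language, so Python is an assumption, as are the function name and its parameters). It checks the length, counts lines by splitting on \r\n, and returns the first four lines.

```python
def first_lines(text, max_chars=8000, max_lines=10, keep=4):
    """Return the first `keep` lines, or None if the text breaks either limit."""
    if len(text) > max_chars:
        return None
    # Note: a trailing "\r\n" produces a final empty string, which counts as a line here.
    lines = text.split("\r\n")
    if len(lines) > max_lines:
        return None
    return "\r\n".join(lines[:keep])

print(repr(first_lines("a\r\nb\r\nc\r\nd\r\ne")))
```

Whether a trailing \r\n should count as an extra line is a design decision the question leaves open; adjust the split accordingly.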
Description
This expression does the following:
validates the input string is between zero and 8,000 characters
validates there are at most 10 lines of newline-delimited text
then captures the first 4 new line delimited lines of text
\A(?=.{0,8000}\Z)(?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z)(?:^.*?[\r\n\Z]+){0,4}
This requires the options m (multiline) and s (dot matches all characters).
Expanded
\A anchor to the beginning of the string; this anchor allows the use of the s option, which allows the . to match carriage return and line feed characters
(?=.{0,8000}\Z) look ahead and validate there are between zero and 8000 characters
(?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z) look ahead and validate there are no more than 10 newline-delimited lines
(?:^.*?[\r\n\Z]+){0,4} match the first 4 lines of text
PHP Code Example:
You didn't specify a language so I'm including this PHP example to show how it works and the sample output.
Input Text
This input test is 8 lines of new line delimited strings. There are only 1779 characters here.
Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean. A small
river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about
the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to leave for the far World of Grammar. The Big Oxmox advised her not to do so, because there were
thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen. She packed her seven versalia, put her initial into the belt and made herself on the way. When she reached the first hills of
the Italic Mountains, she had a last view back on the skyline of her hometown Bookmarksgrove, the headline of Alphabet Village and the subline of her own road, the Line Lane. Pityful a rethoric question ran over her cheek, then
she continued her way. On her way she met a copy. The copy warned the Little Blind Text, that where it came from it would have been rewritten a thousand times and everything that was left from its origin would be the word "and"
and the Little Blind Text should turn around and return to its own, safe country. But nothing the copy said could convince her and so it didn’t take long until a few insidious Copy Writers ambushed her, made her drunk with Longe
and Parole and dragged her into their agency, where they abused her for their projects again and again. And if she hasn’t been rewritten, then they are still using her.
Code
<?php
$sourcestring="your source string";
preg_match('/\A(?=.{0,8000}\Z)(?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z)(?:^.*?[\r\n\Z]+){0,4}/ims',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
Matches
$matches Array:
(
[0] => Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean. A small
river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about
the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to leave for the far World of Grammar. The Big Oxmox advised her not to do so, because there were
thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen. She packed her seven versalia, put her initial into the belt and made herself on the way. When she reached the first hills of
)

What is a permuted index?

I am reading Accelerated C++. I don't understand Exercise 5-1:
Design and implement a program to produce a permuted index from the following input. A permuted index is one in which each phrase is indexed by every word in the phrase.
The quick brown fox
jumped over the fence
The quick brown fox
jumped over the fence
jumped over the fence
The quick brown fox
jumped over the fence
The quick brown fox
That explanation isn't clear to me. What exactly is a permuted index?
The term permuted index is another name for a KWIC index, referring to the fact that it indexes all cyclic permutations of the headings. Books composed of many short sections with their own descriptive headings, most notably collections of manual pages, often ended with a permuted index section, allowing the reader to easily find a section by any word from its heading. This practice is no longer common.
From: http://en.wikipedia.org/wiki/Key_Word_in_Context
ps: you can access wikipedia via http://www.proxify.com
You can find a 'live' example of a permuted index in the 7th Edition UNIX™ Programmer's Reference Manual, Vol 1 (dating back to 1979). A fragment of it (from the PDF files) is:
If you look for 'account', you can find a number of related entries together. You probably wouldn't think to look for sa(1) as well as ac(1), not to mention acct(2) or acct(5) unless they were grouped together. This is the benefit of a permuted index; you can look up the key word and see it in a bigger context.
You could also look at the man page entry for the ptx(1) command in the same 7th Edition manual.
A permuted index is an alphabetical list of keywords, each shown in its surrounding context. In the output below, look at the word that begins the right-hand column of each line (the keyword, shown in bold in the original layout). These keywords are sorted alphabetically and each keeps its context, so you can look up a word and directly infer its usage from the words around it.
      The quick    brown fox
jumped over the    fence
The quick brown    fox
                   jumped over the fence
         jumped    over the fence
            The    quick brown fox
    jumped over    the fence
                   The quick brown fox
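To make the idea concrete, here is a minimal sketch of how such an index could be generated. It is written in Python purely for illustration (the exercise itself is meant to be solved in C++), and the function names, the case-insensitive sort, and the column gap are my own assumptions rather than anything the book prescribes. Each phrase is split at every word, giving a left context and a right part that begins with the keyword; the entries are then sorted by the right part.

```python
def permuted_index(phrases):
    """Return (left_context, keyword_and_rest) pairs, sorted by keyword."""
    entries = []
    for phrase in phrases:
        words = phrase.split()
        for i in range(len(words)):
            left = " ".join(words[:i])    # words before the keyword
            right = " ".join(words[i:])   # keyword plus the words after it
            entries.append((left, right))
    # Assumption: sort case-insensitively on the text starting at the keyword.
    entries.sort(key=lambda entry: entry[1].lower())
    return entries

def format_index(entries, gap=4):
    """Right-align the left contexts so every keyword starts in the same column."""
    width = max((len(left) for left, _ in entries), default=0)
    return [f"{left:>{width}}{' ' * gap}{right}" for left, right in entries]

index = permuted_index(["The quick brown fox", "jumped over the fence"])
print("\n".join(format_index(index)))
```

With the two phrases from the exercise this yields eight entries, one per word, ordered by the keywords brown, fence, fox, jumped, over, quick, the, The.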