Monday, 11 February 2013

Re: font_split change, action required

In article
<OUT-51193FEB.MD-1.4.17.chris.young@unsatisfactorysoftware.co.uk>,
Chris Young <chris.young@unsatisfactorysoftware.co.uk> wrote:
> On Mon, 11 Feb 2013 00:10:08 +0000 (GMT), Michael Drake wrote:

> > For example, see:
> > http://www.netsurf-browser.org/welcome/index.ja
> > If you only split at spaces, that paragraph will only ever be one long
> > line. (There are no spaces to split on.)

> Which it is here.

Indeed. It always has been on all platforms. Until now, the GTK front
end was paying the performance price of proper Unicode handling, but then
it had to search back through the string from the break point it found to
reach a space, just to satisfy the API constraint. (Then it had to measure
the width of the text up to the point it actually ended up splitting at.)
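
As a rough illustration of that wasted step, here is a minimal sketch of
backing up from a break offset to the previous space. The function name and
its fallback behaviour are hypothetical and are not the GTK front end's code.

```c
#include <stddef.h>

/* Hypothetical helper: given a break offset chosen by a Unicode-aware
 * line breaker, walk back to the nearest space at or before it.
 * Returns break_offset unchanged when there is no space to fall back to
 * (an assumption made for this sketch). Requires break_offset to be at
 * most the length of the NUL-terminated string s. */
size_t toy_back_up_to_space(const char *s, size_t break_offset)
{
    size_t i = break_offset;

    while (i > 0 && s[i] != ' ')
        i--;

    return (s[i] == ' ') ? i : break_offset;
}
```

With "foo bar baz" and a break offset of 9 this backs up to the space at
offset 7, and the text up to that offset then has to be measured again.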

> I don't see why all this needs to be in the frontend.

It stops the core from getting in the way of any perfectly good Unicode
line breaking that is already available in the front end.

> The core can do all that.

Sure. But the HTML layout code needed fixing to handle splits on
non-space characters anyway. Now that that is done, it was trivial to add
full Unicode line breaking support on platforms that provide it, so I did.

In the long term we'll get it working everywhere, but it's not going to
happen right now.

> Advantages:
> * nsfont_split is no longer needed (core can use
> nsfont_position_in_string and nsfont_width)

Position in string should find the offset nearest the passed coordinate.
So if you click on the left hand side of a letter 'm', the caret is placed
to its left, and if you click on the right hand side, to its right.
nsfont_split, by contrast, looks first for a split point that fits within
the available width.
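
To make the difference concrete, here is a minimal sketch using a toy font
in which every byte is 10 units wide. The names, signatures and the
space-only splitting are assumptions for illustration and are not the real
front end API.

```c
#include <stddef.h>

/* Toy font model (hypothetical): every byte is one 10-unit-wide glyph. */
enum { TOY_CHAR_WIDTH = 10 };

/* Caret placement: offset whose boundary is nearest to x. */
size_t toy_position_in_string(size_t length, int x)
{
    size_t offset;

    if (x <= 0)
        return 0;

    /* Round to the nearest glyph boundary. */
    offset = (size_t)(x + TOY_CHAR_WIDTH / 2) / TOY_CHAR_WIDTH;
    return (offset > length) ? length : offset;
}

/* Line splitting: last space whose preceding text fits in `available`.
 * If no space fits, the first space is used; if there is no space at all,
 * or the whole string fits, the split is at `length`.
 * *actual_x receives the width of the text before the split. */
size_t toy_split(const char *s, size_t length, int available, int *actual_x)
{
    size_t split = length;

    if ((int)length * TOY_CHAR_WIDTH > available) {
        int found = 0;
        size_t i;

        for (i = 0; i < length; i++) {
            if (s[i] != ' ')
                continue;
            if (found && (int)i * TOY_CHAR_WIDTH > available)
                break;
            split = i;
            found = 1;
        }
    }

    *actual_x = (int)split * TOY_CHAR_WIDTH;
    return split;
}
```

For "hello world" and a coordinate of 57, the nearest boundary is offset 6
(at x = 60), which overshoots the available width, whereas the split lands
on the space at offset 5 (x = 50), which fits.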

> * Consistent splitting behaviour across all platforms - splitting on
> characters other than space only needs to be added once

That once will be a lot of developer effort.

> * More complex splitting could be added, for example splitting
> mid-word and hyphenating depending on language rules

We need to implement the Unicode line breaking algorithm:

http://www.unicode.org/reports/tr14/

> Disadvantage:
> nsfont_width might need to be called in addition to
> nsfont_position_in_string. There's a potential speed penalty if the
> actual width post-split is needed by the core, depending on the
> current implementations of nsfont_split, although I can't see why it
> would need that information very frequently (only if it reaches #6?)

It always needs the actual width of the text post-split. In the HTML
handler, the text box is actually split into two boxes. (See how
underlines on e.g. hyperlinks break at spaces after the space has been
used for a line break.) The width of the split box is needed because when
the page reflows, it uses the cached box width for layout decisions and
positioning.
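
A minimal sketch of why the width comes along with the split, using a toy
fixed-width font and a hypothetical box structure (not the real layout
code): both halves need a cached width as soon as the box is divided.

```c
#include <stddef.h>

/* Toy font model (hypothetical): every byte is one 10-unit-wide glyph. */
enum { TOY_CHAR_WIDTH = 10 };

/* Hypothetical text box: a run of text plus its cached width. */
struct toy_text_box {
    const char *text;
    size_t length;
    int width;      /* cached; reused when the page reflows */
};

/* Split `box` at the space at `offset` (offset < box->length).
 * The space is consumed by the line break, so it belongs to neither
 * half. Returns the box for the remaining text. */
struct toy_text_box toy_split_box(struct toy_text_box *box, size_t offset)
{
    struct toy_text_box rest;

    rest.text = box->text + offset + 1;
    rest.length = box->length - offset - 1;
    rest.width = (int)rest.length * TOY_CHAR_WIDTH;

    /* The first half keeps the text before the space; its cached width
     * is the post-split width that the split call reports. */
    box->length = offset;
    box->width = (int)offset * TOY_CHAR_WIDTH;

    return rest;
}
```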

--

Michael Drake (tlsa) http://www.netsurf-browser.org/
