Monday, 17 March 2014

Re: [PATCH] Encode non-ASCII hosts according to IDNA2008

On Sat, Mar 15, 2014 at 03:42:34PM +0100, Chris Young wrote:
> This is my attempt at adding IDNA2008 support. It is in branch
> chris/idn-punycode (this is a condensed version but same as HEAD
> there).

Hi Chris, thanks for taking a look at this. Just FYI, there is a
holding bug for this:

http://bugs.netsurf-browser.org/mantis/view.php?id=1905

Overall I like the feature, and there do not seem to be many problems
with your approach of simply leaving it up to the frontends to
decide whether to display the raw punycode or convert it back to
their own display format.
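A frontend that wants to show the Unicode form first has to spot which labels in the host it was given are punycode-encoded. The sketch below assumes a NUL-terminated, dot-separated host string. It only checks each label for the "xn--" ACE prefix, compared case-insensitively. host_has_ace_label is a hypothetical helper, not part of NetSurf's API. Actually decoding the label for display is a separate step.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `host` (NUL-terminated,
 * dot-separated) contains at least one label starting with the ACE
 * prefix "xn--", compared case-insensitively. Such labels are
 * punycode-encoded and a frontend may choose to decode them for display. */
static bool host_has_ace_label(const char *host)
{
    const char *label = host;

    while (*label != '\0') {
        const char *end = strchr(label, '.');
        size_t len = end ? (size_t)(end - label) : strlen(label);

        if (len >= 4 &&
            tolower((unsigned char)label[0]) == 'x' &&
            tolower((unsigned char)label[1]) == 'n' &&
            label[2] == '-' && label[3] == '-')
            return true;

        if (end == NULL)
            break;          /* last label checked */
        label = end + 1;    /* move past the dot */
    }
    return false;
}
```

A frontend that is happy to show the punycode form can skip this check entirely, which matches the "leave it up to the frontends" approach.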

>
> I've used libidn2 because it fully conforms to the spec, although I
> can see there may be a benefit to not using an external library for
> this. libidn2 seems to build cleanly with no patches on OS4 so I
> don't envisage any major problems elsewhere, however I've made it
> optional for now.
>

This is where it becomes a bit awkward. Using an external library gets
us a working conformant implementation but at the cost of another
build dependency.

Looking at the libidn2 code, there are only a handful of actual
source files (I'm not sure you even use all of the API), and they
are vastly outnumbered by the autoconf machinery and an embedded copy
of gnulib, which is the real problem.

gnulib duplicates a lot of what we already have in NetSurf and
seems a large burden for such a small implementation.
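For a sense of scale, the Punycode part of the job (RFC 3492) is small enough to carry in-tree. The sketch below encodes one label, given as Unicode code points, into its ASCII form without the "xn--" prefix. It is an illustration under stated assumptions, not a drop-in replacement for libidn2. Full IDNA2008 also needs NFC normalisation, the RFC 5892 code point validity rules and the bidi checks, and this does not attempt any of them. punycode_encode_label is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Map a digit value 0..35 to 'a'..'z' (0..25) or '0'..'9' (26..35). */
static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: Punycode-encode one label (code points in `in`)
 * into `out`, NUL-terminated, without the "xn--" prefix.
 * Returns the output length, or -1 if `out` is too small or on overflow. */
static int punycode_encode_label(const uint32_t *in, size_t len,
                                 char *out, size_t outsize)
{
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, m, q, k, t;
    size_t o = 0, h, b, j;

    if (outsize == 0)
        return -1;

    /* Copy the basic (ASCII) code points through unchanged. */
    for (j = 0; j < len; j++) {
        if (in[j] < 0x80) {
            if (o + 1 >= outsize)
                return -1;
            out[o++] = (char)in[j];
        }
    }
    h = b = o;
    if (b > 0) {
        if (o + 1 >= outsize)
            return -1;
        out[o++] = '-';
    }

    /* Encode the non-basic code points in ascending order as deltas. */
    while (h < len) {
        m = UINT32_MAX;
        for (j = 0; j < len; j++)
            if (in[j] >= n && in[j] < m)
                m = in[j];
        if (m - n > (UINT32_MAX - delta) / (h + 1))
            return -1;
        delta += (uint32_t)((m - n) * (h + 1));
        n = m;

        for (j = 0; j < len; j++) {
            if (in[j] < n && ++delta == 0)
                return -1;
            if (in[j] == n) {
                /* Emit delta as a generalized variable-length integer. */
                q = delta;
                for (k = BASE; ; k += BASE) {
                    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t)
                        break;
                    if (o + 1 >= outsize)
                        return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 1 >= outsize)
                    return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return (int)o;
}
```

Using the example host from Chris's mail, the label {'b', 0xFC, 'c', 'h', 'e', 'r'} encodes to "bcher-kva", which becomes "xn--bcher-kva" once the prefix is added. The algorithm itself is compact. The bulk of a conformant IDNA2008 implementation lies in the mapping and validation tables, and that is where an external library earns its keep.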

Also, there is no pkg-config file, which makes build-time detection a
pain, and the author seems disinclined to add one: autoconf obviously
does not require it, and as a GNU project libidn2 follows the one true
GNU way.

> As I've put it in nsurl_create the non-ASCII hosts are automatically
> handled everywhere in NetSurf, and are passed back to the frontend as
> the punycode version so there should be no weird problems with
> frontend string gadgets not supporting UTF-8.

That seemed sensible.

>
> I've tested the Amiga and GTK frontends and things like www.bücher.de
> are handled correctly when typed in or clicked. Other frontends may
> need to ensure any URLs typed in are passed as UTF-8.

It does indeed work; currently all URLs are converted to the punycode
form for display.
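Chris's note that other frontends need to pass typed-in URLs as UTF-8 matters because the punycode step works on Unicode code points. The host therefore has to be decoded first, and rejected if it is malformed. The sketch below shows that decode, assuming a NUL-terminated input string. utf8_decode is a hypothetical name, not an existing NetSurf function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 `s` into `out`
 * (capacity `outcap` code points). Returns the number of code points,
 * or -1 on malformed input or if `out` is too small. Overlong forms,
 * UTF-16 surrogates and values above U+10FFFF are rejected. */
static long utf8_decode(const char *s, uint32_t *out, size_t outcap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != '\0') {
        uint32_t cp;
        int extra;

        /* The lead byte gives the sequence length and the top bits. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else
            return -1; /* stray continuation byte or invalid lead byte */
        p++;

        /* Each continuation byte must be 10xxxxxx; hitting the NUL fails. */
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (n >= outcap)
            return -1;
        out[n++] = cp;
    }
    return (long)n;
}
```

Rejecting malformed input here, rather than passing it on, means a frontend that forgets to convert from its local encoding gets a clean failure instead of a wrongly encoded host.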

Overall I like the feature; hell, if it were not for libidn2 I would
already have merged it. Perhaps someone else has a better
thought-through opinion than mine.

--
Regards Vincent
