Thursday, 8 March 2012

Re: [evaluation] python bindings to netsurf for pyjamas-desktop

On Thu, Mar 8, 2012 at 3:44 PM, m0n0 <ole@monochrom.net> wrote:
> On Thursday, 08.03.2012, at 05:24 +0100, lkcl luke
> <luke.leighton@gmail.com> wrote:
>
> Hello,
>
> I believe the guy who is pushing netsurf the most is currently
> on vacation(?), so maybe it would be good if we wait for what
> John Mark Bell has to say about it.

ahh ok. well there's plenty of time.


>> (observation: potentially, the reduction in code size could well
>> result in increased speed on slower CPUs especially embedded low-end
>> processors with smaller or zero sized caches.  bottom line: it's not
>> clear-cut!)
>
>
> I still don't get why you want to have such a component model.

answer 1) is ... tactical / strategic.

you know i did samba's nt domains implementation, right? it was ..
eek, 1996. i didn't get DCE/RPC. i'd looked at it, and went "this is
useless! there's nooo waaayyy i have to know annnnything about all
this. *iii* can handle it" :)

threeeee years later i was seriously wishing i'd taken the time and
patience to properly study DCE/RPC because by the time i was done ...

so, i'm not saying you _should_ put in such a beast, but i'm saying
that from experience, you're at an early stage where, if i was in your
(collective) position, i would consider very very carefully evaluating
and studying the options, and taking the time to do that _before_
going ahead.

the primary reason is that because netsurf has the goal of being
fast, small and running even on low-speed processors such as 200mhz
ARMs, any decision that you make now is going to take you at least 2
man-years of effort and could potentially jeopardise the project if
you pick the wrong approach.

answer 2) is detail-orientated.

language bindings involve the following:

1) mapping one-for-one with each and every single function (DOM in
this case: 2,000+ of them)
2) mapping one-for-one with each and every single property (DOM:
20,000 properties yes really twenty THOUSAND)
3) converting between the argument types on the function calls from
one language to the other
4) converting the _return_ result of the function into the target language
5) taking care of ref-counts on objects so that when an object goes
out of scope in the language-bindings its corresponding object gets
de-ref'd too
6) taking care of returning the *same* object if it has already been
referenced once before (so that == works in the target language).
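
to make points 3 and 4 concrete, here's a minimal sketch in c of what *one* hand-written binding looks like. everything in it is made up for illustration: `toy_attr`, `toy_attr_get_name`, `script_val` and `bind_attr_get_name` are hypothetical names, not libdom's or any real script engine's api. it only shows the shape of the work: check the incoming arguments, convert them, call the c function, convert the result back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical "C side": a toy attribute object, NOT libdom's real types */
typedef struct toy_attr {
    const char *name;
    bool specified;
} toy_attr;

typedef enum { TOY_OK = 0, TOY_ERR = 1 } toy_exception;

static toy_exception toy_attr_get_name(toy_attr *attr, const char **result)
{
    if (attr == NULL || result == NULL)
        return TOY_ERR;
    *result = attr->name;
    return TOY_OK;
}

/* hypothetical "script side": every value is a tagged union */
typedef enum { VAL_NULL, VAL_BOOL, VAL_STRING, VAL_OBJECT } val_tag;

typedef struct script_val {
    val_tag tag;
    union {
        bool b;
        char *s;   /* owned copy, freed by script_val_clear() */
        void *obj; /* borrowed pointer to a C-side object */
    } u;
} script_val;

static void script_val_clear(script_val *v)
{
    if (v->tag == VAL_STRING)
        free(v->u.s);
    v->tag = VAL_NULL;
}

/* one hand-written binding: points 3 and 4 for a single function.
 * returns false (think: a script-level exception) on bad arguments. */
static bool bind_attr_get_name(const script_val *args, size_t nargs,
                               script_val *ret)
{
    const char *name;
    size_t len;

    assert(ret != NULL);

    /* point 3: check and convert the argument */
    if (nargs != 1 || args[0].tag != VAL_OBJECT)
        return false;

    if (toy_attr_get_name(args[0].u.obj, &name) != TOY_OK)
        return false;

    /* point 4: convert the result into a script-side string */
    len = strlen(name) + 1;
    ret->u.s = malloc(len);
    if (ret->u.s == NULL)
        return false;
    memcpy(ret->u.s, name, len);
    ret->tag = VAL_STRING;
    return true;
}
```

that is one function. the sketch doesn't even touch points 5 and 6, and it would have to be repeated for every function and property, per target language.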

this is complicated as hell, m0n0, and it's something that has been
done "to death", by several teams, across decades of computing
science.

you _can_ choose to ignore the experience and knowledge of those
people, but i can pretty much predict with 99% accuracy that if you
do, you will soon wish that you hadn't :)

> I'm
> not saying that I know enough about it, but anyway - there's always
> another way. You talk like there is no alternative.

no i'm not. i'd love to hear of alternatives. however... having
dealt with all the major browser engines and a couple of the obscure
ones as well, i've been around the block shall we say, and they
genuinely fall into just those two categories [a COM of some
description or a massive half-way-house auto-generator]

if you can think of or have heard of an alternative to any of the
technology i've mentioned, i'd love to know what it is because it could
well turn out to make *my* life easier in the future.


> But on the other
> extreme, there can be one single dispatcher function to access ALL
> DOM properties.

that's right. that's what a Component Object Model (COM, CORBA, DCOM,
XPCOM, GObject-Introspection) does. but to do that, you have to have
some sort of "representation".

take the libdom vtables for example (they look damn good btw!)

but think: how the heck is the "single dispatcher function" supposed
to *know* about the layout of those vtables? how does it "know" that
the first pointer is to.. ok lemme grab an example...

/* DOM Attr vtable */
typedef struct dom_attr_vtable {
        struct dom_node_vtable base;

        dom_exception (*dom_attr_get_name)(struct dom_attr *attr,
                        dom_string **result);
        dom_exception (*dom_attr_get_specified)(struct dom_attr *attr,
                        bool *result);
        /* ... */
} dom_attr_vtable;

right.

how is the proposed "single dispatcher function" supposed to "know"
that the first pointer is to dom_attr_get_name ?

how is it supposed to "know" that the first argument is a struct
dom_attr and the second argument is a dom_string?

do you see what the problem is?

so that's what these "IDL" files are all about. the IDL files allow
you to write (or use) a program which either auto-generates the
"understanding" or it auto-generates a binary-formatted "type library"
which the middleware-of-your-choice can use to go "oh look, function
ABC is number 5 in the array in my type-library, which tells me that
argument 1 is of type dom_attr, argument 2 is a dom_string and the
return result is an xyz".
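
here's a rough sketch in c of that idea: a tiny hand-written "type library" table, and a single dispatcher that only knows what the table tells it. all of the names (`toy_attr`, `typelib_entry`, `toy_dispatch` and so on) are hypothetical and heavily simplified. this is not how libdom, COM or GObject-Introspection actually lay things out, and in real middleware the table would be generated from the IDL rather than typed in by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* toy object with a vtable, loosely shaped like the excerpt above */
struct toy_attr;

typedef struct toy_attr_vtable {
    int (*get_name)(struct toy_attr *attr, const char **result);
    int (*get_specified)(struct toy_attr *attr, bool *result);
} toy_attr_vtable;

typedef struct toy_attr {
    const toy_attr_vtable *vtable;
    const char *name;
    bool specified;
} toy_attr;

static int attr_get_name(toy_attr *attr, const char **result)
{
    *result = attr->name;
    return 0;
}

static int attr_get_specified(toy_attr *attr, bool *result)
{
    *result = attr->specified;
    return 0;
}

static const toy_attr_vtable attr_vtable = { attr_get_name, attr_get_specified };

/* the "type library": what an IDL compiler would emit for us */
typedef enum { T_STRING, T_BOOL } toy_type;

typedef struct typelib_entry {
    const char *name;     /* method name as the script sees it */
    size_t slot_offset;   /* where the pointer lives in the vtable */
    toy_type result_type; /* type of the out-parameter */
} typelib_entry;

static const typelib_entry attr_typelib[] = {
    { "name",      offsetof(toy_attr_vtable, get_name),      T_STRING },
    { "specified", offsetof(toy_attr_vtable, get_specified), T_BOOL   },
};

typedef struct toy_value {
    toy_type type;
    union { const char *s; bool b; } u;
} toy_value;

/* the single dispatcher: everything it "knows" comes from the table */
static int toy_dispatch(toy_attr *obj, const char *method, toy_value *out)
{
    size_t i, n = sizeof attr_typelib / sizeof attr_typelib[0];

    assert(obj != NULL && method != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        const typelib_entry *e = &attr_typelib[i];
        const char *slot;

        if (strcmp(e->name, method) != 0)
            continue;

        slot = (const char *)obj->vtable + e->slot_offset;
        out->type = e->result_type;

        /* the recorded type picks the correct function signature */
        switch (e->result_type) {
        case T_STRING: {
            int (*fn)(toy_attr *, const char **);
            memcpy(&fn, slot, sizeof fn);
            return fn(obj, &out->u.s);
        }
        case T_BOOL: {
            int (*fn)(toy_attr *, bool *);
            memcpy(&fn, slot, sizeof fn);
            return fn(obj, &out->u.b);
        }
        }
    }
    return -1; /* no such method in the type library */
}
```

calling `toy_dispatch(&attr, "name", &v)` on an attr whose `vtable` points at `attr_vtable` fills `v` with a string. take the table away and the dispatcher has no way to tell which slot to call or what the arguments mean.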

middleware such as COM, XPCOM, CORBA, GObject-Introspection, it's
*that* f*****g sophisticated.


> That would mean you have to implement one interface
> to ONE C function. That doesn't sound evil to me.

it's not that easy :)

> Also, it sounds to me, like you do not keep in mind that:
>
>  - NetSurf is C, not C++

i'm aware of that. GObject-introspection, COM and XPCOM all deal
with c. it's a little awkward: you have to follow some very specific
design rules related to refcounting, but you've implemented
refcounting in libdom so i know you're intimately familiar with that.

>  - Other Languages can call C functions, why should netsurf
>   be SUPER-NICE and offer the possibility for other Languages
>   to call DOM stuff?

the choice of language is itself irrelevant. it's just that if
you're going to do it even for one language, it's much much easier to
add more once that's done... *if* you think about it carefully in
advance.

.... or, if you use COM or GObject-Introspection you get *all*
programming languages "for free". that's the whole point.


> Especially Python has a nice interface to C,
>   AFAIK.

yes. again - it's not that easy: dynamic languages have refcounting,
garbage collection etc. the refcounting basically has to map
one-to-one with the DOM object being "represented".
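
a minimal sketch in c of what that one-to-one mapping means in practice (points 5 and 6 from the list earlier). the names are hypothetical: `toy_node` stands in for a refcounted DOM object and `wrapper` for whatever the dynamic language hands to the script. a real binding would use a hash table rather than a linked list, and the release would be driven by the language's garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* hypothetical refcounted C-side node (not libdom's real struct) */
typedef struct toy_node {
    unsigned refcnt;
} toy_node;

static void toy_node_ref(toy_node *n)
{
    n->refcnt++;
}

static void toy_node_unref(toy_node *n)
{
    assert(n->refcnt > 0);
    n->refcnt--;
}

/* hypothetical script-side wrapper: holds exactly one ref on its node */
typedef struct wrapper {
    toy_node *node;
    unsigned refcnt;      /* references held by the script side */
    struct wrapper *next; /* linked list of live wrappers */
} wrapper;

static wrapper *live_wrappers = NULL;

/* return the existing wrapper for a node if there is one (point 6),
 * otherwise create one and take a single ref on the node (point 5) */
static wrapper *wrapper_get(toy_node *node)
{
    wrapper *w;

    for (w = live_wrappers; w != NULL; w = w->next) {
        if (w->node == node) {
            w->refcnt++;
            return w;
        }
    }

    w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    toy_node_ref(node);
    w->node = node;
    w->refcnt = 1;
    w->next = live_wrappers;
    live_wrappers = w;
    return w;
}

/* called when the script side drops a reference; the node is only
 * de-ref'd once the last script-side reference has gone */
static void wrapper_release(wrapper *w)
{
    wrapper **p;

    if (--w->refcnt > 0)
        return;

    for (p = &live_wrappers; *p != NULL; p = &(*p)->next) {
        if (*p == w) {
            *p = w->next;
            break;
        }
    }
    toy_node_unref(w->node);
    free(w);
}
```

two calls to `wrapper_get()` on the same node give back the same pointer, so == works on the script side, and the node keeps exactly one extra ref for as long as any script-side reference is alive.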


>  - For my system, there exists no clean way for Shared Libs,
>   everything is linked statically, sometimes I'm thinking
>   that your approach may assume that shared libs are available
>   everywhere.

GObject-Introspection is statically linkable, i believe. the
"webkit-style" language bindings (the pythonwebkit ones and the
gobject ones that i did) were c, despite binding to c++.

that made things a bit... interesting, on the ref-count side, i can tell you :)


>  - When linked statically, type safety is ensured by C, at least
>   when you prototyped everything fine. I believe that's also the
>   case for shared libs - so I still see no reason for a COM.

it's because of the points 1-6 above. you either have to implement
that *entirely* yourself... or you can use an auto-generator - a
compiler with corresponding well-defined "Interface Definition
Language" files.

>   ( + the Components still need to stick to CDECL rules anyway )

yes - that's not a problem.

>>  b.iii) GObject with Introspection.  this one is interesting, because
>> it turns out that gobject-introspection now has *multiple* language
>> support... including *two* separate javascript bindings!  in other
>> words if you were to adopt gobject as the netsurf middleware, should
>> you choose this route, you'd *automatically* get javascript
>> bindings... *for free*... as well as several other programming
>> languages.
>
>
>> this is why middleware is so powerful - as you no doubt, as free
>> software programming experts, already know.  but please bear with me:
>> i am writing to a wider audience with the above, so for *their*
>> benefit it has to be spelled out.
>
>
> Sorry, I still don't get why it is so powerful, + I don't get how it
> is related to JavaScript within NetSurf.

it's because i know how much work is involved in doing 1-6 (above)...
regardless of the language. javascript, python, c++, java, it doesn't
matter: it's a *lot* of work.


> Please be patient with me ;)

:)

> 1. We need a wrapper to javascript engines, at least I think that's
>   nice to have. So people can compile with the JavaScript implementation
>   available to their system ( my system is limited to spidermonkey, maybe
> KDE-js ).

ok. this is of course assuming that you _want_ javascript - at all.
you would not have implemented libdom if you didn't: there's no need.

so, yes, it would be nice to support multiple javascript engines...
but wait... each one is different! so if you choose the
"webkit-esque" approach, you must have two, three, four, five
*different* sets of 7,000+ hand-coded "support" code which takes care
of the 1-6 stuff above, only a relatively small amount of which can be
auto-generated!

... ooorrrrr.... you can use "middleware" such as COM, XPCOM,
GObject-Introspection etc. as an "intermediary", such that you save
yourself vast amounts of implementation time.

for example: if you chose GObject-Introspection, you would *only*
need to implement a libdom-to-gobject system, and then you would
*automatically* gain those two javascript bindings that are being
developed by the gnome team: the addition of others would be a small
task probably about 3,000 lines of code or so.


> 2. We need a glue to implement DOM objects within JavaScript.
>   ( your glue is named COM, right? I do not mean COM in Microsoft slang,
>     but a Component Object Model in general )

yes. confusing, but yes.

> 3. The glue routes function calls from javascript to libdom and vice versa,
>   which will trigger events (right?).

absolutely.

the middleware - whatever it is - even takes care of "wrapping" the
callback function(s) for you. such that the event within libdom will
call the callback function, the arguments will *automatically* get
type-converted into the target language etc. etc.
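
roughly, the "wrapping" looks like this in c. it's a sketch under made-up names: `toy_event`, `script_event`, `trampoline` and `toy_fire` are all hypothetical, not libdom's event api. the middleware registers one c-side listener on the script's behalf, and that listener converts the arguments before calling the script function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical C-side event and listener signature (not libdom's) */
typedef struct toy_event {
    const char *type; /* e.g. "click" */
    int x, y;
} toy_event;

typedef void (*toy_listener)(const toy_event *ev, void *user_data);

/* hypothetical script-side view of an event and of a script callback */
typedef struct script_event {
    char type[16];
    double x, y; /* pretend the script language only has doubles */
} script_event;

typedef void (*script_fn)(const script_event *ev, void *closure);

/* what the middleware stores per registered script callback */
typedef struct trampoline {
    script_fn fn;
    void *closure;
} trampoline;

/* the C-side listener registered on the script's behalf:
 * converts the arguments, then calls into the script function */
static void trampoline_listener(const toy_event *ev, void *user_data)
{
    const trampoline *t = user_data;
    script_event sev;

    assert(t != NULL && t->fn != NULL);

    strncpy(sev.type, ev->type, sizeof sev.type - 1);
    sev.type[sizeof sev.type - 1] = '\0';
    sev.x = (double)ev->x;
    sev.y = (double)ev->y;

    t->fn(&sev, t->closure);
}

/* stand-in for the DOM firing an event at a registered listener */
static void toy_fire(toy_listener l, void *user_data, const toy_event *ev)
{
    l(ev, user_data);
}

/* example "script" callback: just records what it was given */
typedef struct seen {
    char type[16];
    double x, y;
    int calls;
} seen;

static void record_event(const script_event *ev, void *closure)
{
    seen *s = closure;

    memcpy(s->type, ev->type, sizeof s->type);
    s->x = ev->x;
    s->y = ev->y;
    s->calls++;
}
```

firing a `toy_event` through `toy_fire(trampoline_listener, &t, &ev)` ends up in `record_event` with the fields already converted. with middleware that conversion is driven by type information; without it, somebody writes one of these per event type.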

if you don't use some sort of middleware you will have to do this
*manually*. and how many different event types are there? there's
over 50, aren't there? that's a hell of a lot of manual coding when
middleware will get you that code *for free*.

> 4. I still do not see the need for COM. Please tell me why and why it is so
>   important.

1-6. can we come back to this question once you've read those thingies?

> 5. The glue must be C, because it must be fast.

that's perfectly possible. most of these middleware libraries - the
ones that are any good - are written in c. but by their very nature,
they allow translation between other programming languages
automatically. some of those languages *use* that middleware easier
than others. python and other dynamic programming languages are
easiest; object-orientated statically-typed languages are next, and
the most awkward is non-object-orientated statically-typed languages
like c... but it *can* be done. the rules cannot be taken care of by
the programming language, though, so you have to do it yourself.
macros etc. ugh :)

>   my wish, but if someone wants to replace the C glue with something
>   like python, I guess that's still possible, no need for
>   COM.

it's not that simple.

>
> Do you have a graphic which shows at a single glance why COM is so
> important? :)

naah :)

might be able to find one though....

hmmm....
http://en.wikipedia.org/wiki/Component_Object_Model#Interfaces

nope.

http://www.polberger.se/components/read/com.html

"Sun's OpenOffice.org productivity suite uses the COM-inspired UNO
component model to enable OLE-like features,"

ooooo! there's another one to investigate!

" In the embedded realm, many component models are heavily influenced
by COM. Philips's and Samsung's Universal Home API (UHAPI), for
electronics appliances in the home, borrows heavily from COM with its
uhCOM technology. The Symbian operating system for mobile handsets
uses a COM-inspired component model called ECom (Symbian Foundation
2008). ABB uses COM-like technology for their programmable controllers
to increase the modularity of their codebase (Lüders et al. 2005)."

oo fantastic! there's a _ton_ of these things out there.

... but you notice something, right? these are *BIG* named
companies. they all wrote their own "COM-inspired" implementations.
you have to think: why did they do that? why did they not just go and
expect their programmers to just... manage *without* Component Object
Model Technology?

you look at the two most successful computer companies in the world.
microsoft - whose technology was founded on COM, and apple, whose
technology was founded on Cocoa, using Objective-C, where the "Component
Object Model" concept is *built in to the language*.

this isn't a coincidence :)

l.
