On Wed, Oct 24, 2012 at 02:54:49AM -0700, Dean Mao wrote:
> Here's a more compact test:
>
> <script>for(var i=0;i<n;i++);</script>
>
> Outputs:
>
> START TAG: 'script'
> CHARACTERS: 'for(var i=0;i'
> START TAG: 'n;i++);<' attributes:
> 'script' = ''
>
> Essentially everything inside a <script> tag should be treated as
> characters until a </script> tag is seen.

Yes, the behaviour you're seeing is expected. The HTML5 tokeniser has a
number of modes, which are selected by the token handler callback
provided by the client. The trivial token handler in test/tokeniser.c
does not manipulate the tokeniser mode, so it does not handle the
contents of script (and other, similar) elements in the expected fashion.

The treebuilder implementation in Hubbub does manipulate the tokeniser
mode in the correct way. In most cases, you'll want to use the built-in
treebuilder, as it handles all the complexity of coping with junk input
for you. See examples/libxml.c for a demonstration of how to use the
built-in treebuilder.

If you only want to use the tokeniser, then you need to ensure that
your token handler changes the tokeniser mode in the same way that an
HTML5 treebuilder would.
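
As a rough illustration of what "changing the mode" involves, here is a
minimal sketch of the decision a token handler has to make on each start
tag, following the element lists in the HTML5 tokenisation rules. It is
not Hubbub code: tok_mode, mode_for_start_tag and text_run_length are
names made up for this example, and the sketch ignores insertion modes,
the scripting flag, and the escaping rules inside script data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical mode names modelled on the HTML5 tokeniser states.
 * These are NOT Hubbub identifiers. */
typedef enum {
	MODE_DATA,        /* normal markup */
	MODE_RCDATA,      /* text plus character references */
	MODE_RAWTEXT,     /* text only, until the matching end tag */
	MODE_SCRIPT_DATA, /* like raw text, with script-specific rules */
	MODE_PLAINTEXT    /* text until end of input */
} tok_mode;

/* Pick the mode a treebuilder would select after seeing <name>. */
tok_mode mode_for_start_tag(const char *name)
{
	static const char *const rcdata[] = { "title", "textarea" };
	static const char *const rawtext[] = {
		"style", "xmp", "iframe", "noembed", "noframes"
	};
	size_t i;

	assert(name != NULL);

	if (strcasecmp(name, "script") == 0)
		return MODE_SCRIPT_DATA;
	if (strcasecmp(name, "plaintext") == 0)
		return MODE_PLAINTEXT;
	for (i = 0; i < sizeof rcdata / sizeof rcdata[0]; i++)
		if (strcasecmp(name, rcdata[i]) == 0)
			return MODE_RCDATA;
	for (i = 0; i < sizeof rawtext / sizeof rawtext[0]; i++)
		if (strcasecmp(name, rawtext[i]) == 0)
			return MODE_RAWTEXT;
	return MODE_DATA;
}

/* In a text-only mode, everything up to the matching end tag is
 * character data. Return the length of that run, or the whole input
 * length if no matching end tag is present. */
size_t text_run_length(const char *input, const char *name)
{
	size_t nlen = strlen(name);
	const char *p = input;

	while ((p = strstr(p, "</")) != NULL) {
		if (strncasecmp(p + 2, name, nlen) == 0) {
			/* The tag name must be followed by '>', '/' or whitespace. */
			char after = p[2 + nlen];
			if (after == '>' || after == '/' || after == ' ' ||
			    after == '\t' || after == '\n' || after == '\f')
				return (size_t)(p - input);
		}
		p += 2;
	}
	return strlen(input);
}
```

With the input from the test case above, mode_for_start_tag("script")
selects the script mode, and text_run_length() then treats all of
"for(var i=0;i<n;i++);" as characters, so the '<' in "i<n" never starts
a tag.
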
J.