Tuesday, 24 June 2014

Re: Disc cache worth it?

Although this reply is to Peter it applies to all the subsequent
discussion.

Firstly, as I did mention in my original mail, this feature is new and
not tuned yet, so it may have adverse behaviour on some systems. Please do
not draw any conclusions about the usefulness or otherwise of this
feature from *development* snapshots.

If you want stable behaviour the 3.1 release should be used. It may
well turn out that this feature is simply unsuitable for RISC OS but
we are at the beginning of a long road.

As we have seen with the issues around !Cache and now with general
performance, the challenges of building a cache suitable for use across
many systems are not inconsiderable.

On Mon, Jun 23, 2014 at 05:58:52PM +0100, Peter Young wrote:
> I've been using the disc cache on RISC OS 2.19, ARMini, and I seem to
> have found some downsides to it, and I wonder if (a) I'm doing it
> correctly and (b) if it's worth the occasional faster opening of some
> sites.

Subsequent mails indicate you were using this correctly. As to the
"worth it", that is what we need to evaluate. I previously mentioned
it, but I shall reiterate:

The persistent cache is purely a trade: in this case it is trading
disc resource for network resource.

I use the term disc resource carefully: it includes both storage space
*and* the time to store and retrieve the files. Similarly, network
resource is the data downloaded *and* the latency getting that
data.

>
> If I load, for instance, http://www.bbc.co.uk/news/ as the first site
> of a session, it loads maybe a little faster, but then I get
> intermittent hourglass activity for sometimes up to thirty seconds,
> during which I can't do anything else. There are several other sites,
> for instance Wikipedia home page, which do the same. And the next day
> the same happens.

What is happening here is that when your browser has gone idle after
retrieving the website you have numerous objects (images, HTML files,
etc.) which are eligible to be written out to the persistent cache.

Given the front page of the BBC news site is (right now) 1.1 Megabytes
in 187 files, which can rise to in the region of 2 Megabytes or more if
there are image-heavy news stories, that represents a great deal of
data to be written out.

The write out task then starts writing these to disc limited by a
bandwidth cap. In the current implementation this is hard wired to a
maximum write bandwidth of 512K/second. It had not occurred to me that
such a rate would not be achievable.
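
As a rough illustration of what a bandwidth-capped write-out implies
(this is a sketch, not the NetSurf code; the function names are
hypothetical and "512K" is assumed to mean 512 * 1024 bytes), the cap
only sets a ceiling. If the disc sustains less than the cap, the disc
rate decides how long the flush takes:

```python
WRITE_CAP = 512 * 1024  # bytes per second; assumed meaning of "512K/second"


def writeout_schedule(object_sizes, cap=WRITE_CAP):
    """Earliest start offset, in seconds, for each object under the cap."""
    offsets = []
    written = 0
    for size in object_sizes:
        # An object may start once the bytes before it fit within the cap.
        offsets.append(written / cap)
        written += size
    return offsets


def writeout_seconds(object_sizes, disc_rate, cap=WRITE_CAP):
    """Time to flush everything when the disc sustains disc_rate bytes/second."""
    effective = min(cap, disc_rate)  # the cap is a ceiling, not a guarantee
    return sum(object_sizes) / effective
```

With a fast disc, 2 Megabytes takes 4 seconds at the cap. With a disc
that only manages 128K/second, the same data takes 16 seconds, and the
cap never comes into play.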

Because the code does not moderate its write rate you have to watch
the hourglass as it saves those files to disc. This is made much
worse, as I understand it from knowledgeable RISC OS people, because
disc writes are not performed in the background.

In your example, if we assume you managed to get four megabytes of
cached data to be written and it took 30 seconds to achieve that, we
get a write rate of around 130K/second or roughly a Megabit/second.
Your network connection does not have to be very good at all to outrun
the disc and hence the disc cache is making a horrible trade in your
case.
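
The arithmetic above can be checked with a few lines. This is only a
sketch: the 8 Mbit/s network figure is a hypothetical value chosen for
comparison, a megabyte is assumed to be 1024 * 1024 bytes, and treating
the measured write rate as the disc's overall rate is a simplification.

```python
def write_rate(bytes_written, seconds):
    """Average write rate in bytes per second."""
    return bytes_written / seconds


def cache_wins(disc_rate, network_rate):
    """True only if the disc moves bytes faster than the network does."""
    return disc_rate > network_rate


rate = write_rate(4 * 1024 * 1024, 30)   # bytes per second
kib_per_sec = rate / 1024                # roughly 136
mbit_per_sec = rate * 8 / 1_000_000      # roughly 1.1

# Hypothetical 8 Mbit/s connection, expressed in bytes per second.
network_rate = 8_000_000 / 8
print(round(kib_per_sec, 1), round(mbit_per_sec, 2),
      cache_wins(rate, network_rate))
```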

>
> Looking in !Cache, which is in !Boot.!Resources, I find that in the
> Caches.Default.NetSurf directory there are currently 1933 files,
> totalling 22449384 bytes. Is this to be expected, as I don't use
> NetSurf a huge amount? I've already excluded this directory from my
> daily backup, which has been taking a lot longer since I started using
> !Cache.

The web is huge these days: 22 Megabytes is literally only 20 pages on
most sites, even if the site shares those objects between multiple
pages. Just visiting all the heading sections of BBC news requires
10 Megabytes of persistent storage and over 1000 files.

For example, Google Chrome by default uses 16 Megabyte blocks to store
sub-16K sized objects and creates four of those to start with.

It is not uncommon for a well used browser to have several gigabytes
of disc used for web caches. In most current systems using a gigabyte
of RAM, let alone disc, is not uncommon.

In summary, the cache:

- Is still in development.

- Is not a panacea and will not benefit everyone.

- Is a compromise trade, and it seems for some systems with slow discs
it is literally faster to retrieve from the network than from local
storage.

- Is likely to be very large to be effective as the source web pages
are large.

- Can be disabled by setting its size to zero.

I will look into adding a heuristic to disable or at least tune the
cache writeout if it detects it is exceeding the available disc
bandwidth.
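
One possible shape for such a heuristic, purely as a sketch under
assumptions: the class name, the 64K/second floor and the "half the
measured rate" back-off are all invented for illustration and are not
taken from NetSurf. The idea is to measure the rate each write-out
actually achieved, lower the target when the disc cannot keep up, and
give up entirely if the disc is too slow to be useful.

```python
class WriteoutTuner:
    """Adjusts the write-out target rate from measured disc throughput."""

    def __init__(self, cap=512 * 1024, floor=64 * 1024):
        self.cap = cap        # hard upper limit, bytes per second
        self.floor = floor    # hypothetical: below this, caching is not worth it
        self.target = cap     # current target rate, bytes per second
        self.enabled = True

    def record(self, bytes_written, seconds):
        """Feed in one completed write-out and update the target rate."""
        if seconds <= 0:
            return
        achieved = bytes_written / seconds
        if achieved < self.floor:
            # Disc is too slow: stop writing to the cache at all.
            self.enabled = False
        elif achieved < self.target:
            # Disc could not keep up: aim for half of what it managed,
            # so writing takes at most about half of the idle time.
            self.target = achieved / 2
        else:
            # Disc kept up: creep back towards the cap.
            self.target = min(self.cap, self.target * 2)
```

Fed the figures from the example above (four megabytes in 30 seconds),
this would drop the target from 512K/second to roughly 68K/second
rather than blocking the desktop for the whole write.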


--
Regards Vincent
http://www.kyllikki.org/
