On Tue, Apr 12, 2016 at 02:49:39PM +0100, netsurf@avisoft.f9.co.uk wrote:
> There was much discussion about a year ago about the cache performance on
> RISC OS, and there were some code changes, but I would like to add the
> results of some investigations of the Netsurf v3.4 cache on my Iyonix,
> running RISC OS 5.23 (11 Oct 2015).
I am beginning to think this feature should never have been enabled
for hopelessly legacy operating systems such as RISC OS.
I will go over how this feature works once again. This is in response
to this message but is, as usual, aimed at all users.
To be clear, a cache in any computer program trades one resource for
another.
Generally a web browser will have numerous caches for different
uses. In NetSurf we have three main caches:
1. one held in RAM for decoded images
This cache spends memory (to hold the decoded images) to save
processor time (otherwise used to decode images from compressed
source formats like JPEG).
Without this cache scrolling a page would be glacial, as every
time an image needs to be plotted, even if it is only a single
pixel of it, we would need to decode the entire source image.
2. a cache for source objects (the stuff downloaded from web servers) held in memory
This cache trades memory for network bandwidth used downloading
source objects.
Without this, every time a page navigation happens within a
website all the CSS, images, JavaScript etc. that did not change
would have to be downloaded again, which would quickly make
browsing unusable.
3. a cache of source objects held on disc.
This cache trades disc space and disc bandwidth for memory.
Although not immediately obvious, the memory saved is that of the
previous cache, so indirectly it could be seen as saving network
bandwidth. This is known as a cache hierarchy, where one cache
backs another.
The memory cache size setting deals with the first two of these caches
and the disc cache settings the third.
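To illustrate the hierarchy for source objects, here is a minimal sketch in Python. It is not NetSurf's code: the class name, the dictionaries standing in for RAM and disc, and the fetch callback standing in for a network download are all made up for the example.

```python
from typing import Callable, Dict


class TwoLevelCache:
    """Toy source-object cache: a memory level backed by a disc level."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.memory: Dict[str, bytes] = {}  # small and fast
        self.disc: Dict[str, bytes] = {}    # stand-in for files on disc
        self.fetch = fetch                  # stand-in for a network download
        self.network_fetches = 0

    def get(self, url: str) -> bytes:
        # 1. Cheapest: the object is still held in memory.
        data = self.memory.get(url)
        if data is not None:
            return data
        # 2. Next: the disc cache backs the memory cache.
        data = self.disc.get(url)
        if data is None:
            # 3. Most expensive: go back to the network.
            data = self.fetch(url)
            self.network_fetches += 1
            self.disc[url] = data
        self.memory[url] = data
        return data

    def evict_from_memory(self, url: str) -> None:
        # Memory pressure drops the object from RAM but not from disc.
        self.memory.pop(url, None)
```

An object evicted from memory is served from disc on the next request, so the network is only used once per object for as long as the disc copy survives.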
>
> In the past I had problems with the cache taking large amounts of disc
> space, and the resulting long backup times for !Boot, so my current
> settings are 10MB space, expiring after 2 days.
The average web page is now well over 2 megabytes [1] and is growing
rapidly all the time. You would be much better served having no
persistent storage (disc) cache enabled at all, by setting its size to
zero, than by a small one like this.
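As a rough worked example of why 10MB is too small, the arithmetic can be written out in Python. The function name is invented for illustration, and using exactly 2 megabytes per page is an optimistic assumption given that the average is "well over" that.

```python
MB = 1024 * 1024


def pages_that_fit(cache_bytes: int, average_page_bytes: int) -> int:
    """Upper bound on how many whole average-sized pages a cache can hold."""
    return cache_bytes // average_page_bytes


# A 10MB disc cache against 2MB pages holds at most five pages' worth
# of source objects before pruning has to start.
print(pages_that_fit(10 * MB, 2 * MB))  # prints 5
```

With so few pages retained, most visits will find their objects already pruned, so the cache pays its overheads without returning much.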
As I keep emphasising, the cache is a trade of one resource for
another in an attempt to reduce the overall time to perform the
action of visiting a web page. If you are not able to make that
trade a net profitable transaction you are better off not doing
it at all.
The RAM, CPU and disc overhead of enabling the cache greatly exceeds
anything your settings can recover. It probably requires a minimum of
a few hundred megabytes and an expiry of several weeks to make the
trade worthwhile on RISC OS. In fact I will add a feature request to
the tracker to have a minimum viable size for the cache size options.
I fear your expectations around sizes of resources are a little out of
date. The default cache sizes on PC platforms are 128 megabytes of
memory and a gigabyte of disc. Even these are pretty restrained. For
example, my desktop has a recently started copy of Chrome with a
handful of tabs open and that is reporting well over a gigabyte of
memory used and several gigabytes of disc.
It is not uncommon for standard PCs to have 8 gigabytes of memory and
a terabyte of hard drive space accessed at rates measured in
hundreds of megabytes a second. I know RISC OS has no hope of getting
anywhere near such resources but it must be understood that the
modern web is orientated around systems of this magnitude of capability.
[1] http://www.soasta.com/blog/page-bloat-average-web-page-2-mb/
>
> However, the actual space usage was 45MB (as measured by Filer Count),
> and it contained 210 files. What was more difficult to find was that
> there were 7,298 directories with 8 levels, which occupied another 14MB,
> of which 6,412 contained no files at any lower level. So only 886
> directories actually contained the 210 files of cached data. Enumeration
> of the cache took about 2 minutes.
>
It is possible you had a cache left over from an earlier version of
NetSurf where small files were stored in separate files. Cache
improvements merge all smaller files into a few large index files and
only use the directories for larger files.
Regardless, the directories are not accounted for, as on most
operating systems they are a very low cost resource and are never
enumerated. There is a well known "cache" indicator file created which
on most systems is used to indicate to other software that the
contents of the directory are not at all "valuable", may be discarded
at will and should not be enumerated.
> I decided to delete all 6,412 directories that contained no data, saving
> about 12MB of disc space. More importantly, counting or enumerating all
> the cache now takes about 7 seconds. There are still the same number of
> files and cached bytes.
You can always delete the entire cache (without NetSurf running) at
any time without any impact at all, except that all source files
will have to be retrieved from the network again.
>
> Netsurf itself still seems to work, but I have not noticed any change in
> performance.
>
As already stated: with those settings any possible long term benefit
you might gain is being lost to overhead, as the RISC OS disc system is
generally poor.
> So, some questions:
>
> - When are cached files deleted to meet the configured size & expiry?
The cache is pruned only when adding a new entry causes the overall
cache usage to exceed the set level. At that point the least
"valuable" objects are discarded until the size drops below the
desired size. This process is subject to 10% hysteresis to avoid
excessive thrashing.
No account is taken of overheads like directory size or block
sizes in the usage calculations, on the assumption that they will be
small compared to the cached data and that overheads are
computationally expensive to determine for little gain.
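The pruning behaviour can be sketched roughly as follows in Python. This is an illustration and not the NetSurf implementation: the names are invented, "least valuable" is approximated here by least recently used, and the hysteresis is assumed to mean pruning down to 10% below the configured limit. As described above, only the cached data sizes are counted.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entry:
    size: int       # bytes of cached data; directory and block overheads ignored
    last_used: int  # assumed stand-in for "value": larger is more recent


@dataclass
class DiscCache:
    limit: int                 # configured cache size in bytes
    hysteresis: float = 0.10   # assumed: prune down to 10% below the limit
    entries: Dict[str, Entry] = field(default_factory=dict)
    clock: int = 0

    def usage(self) -> int:
        return sum(entry.size for entry in self.entries.values())

    def add(self, key: str, size: int) -> None:
        self.clock += 1
        self.entries[key] = Entry(size, self.clock)
        # Pruning is only ever triggered by an addition that exceeds the limit.
        if self.usage() > self.limit:
            self._prune()

    def _prune(self) -> None:
        target = self.limit * (1.0 - self.hysteresis)
        # Discard the least valuable entries first until usage is at or
        # below the target.
        by_value = sorted(self.entries, key=lambda k: self.entries[k].last_used)
        for key in by_value:
            if self.usage() <= target:
                break
            del self.entries[key]
```

Pruning to a level below the limit, rather than exactly to it, means the next few additions do not each trigger another round of discards.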
> - Are directories included in the space used?
no
> - Are directories ever deleted? If so, when?
no
> - Will deletion of empty directories cause any problems for Netsurf?
no
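For anyone wanting to repeat the clean-up described above, a sketch of finding directories with no files at any lower level might look like this in Python. The function name is made up for the example, nothing here is part of NetSurf, and it should only be pointed at the cache directory while NetSurf is not running.

```python
import os
from typing import Dict, List


def empty_directories(root: str) -> List[str]:
    """Return directories under root holding no files at any lower level.

    Results are ordered children before parents, so they can be removed
    one by one in the order returned. The root itself is never listed.
    """
    populated: Dict[str, bool] = {}
    empty: List[str] = []
    # Bottom-up walk: every subdirectory is seen before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # A subdirectory that was not walked (e.g. a symlink) is
        # conservatively treated as populated.
        has_files = bool(filenames) or any(
            populated.get(os.path.join(dirpath, name), True)
            for name in dirnames
        )
        populated[dirpath] = has_files
        if not has_files and dirpath != root:
            empty.append(dirpath)
    return empty
```

Each returned path can then be passed to `os.rmdir`, which refuses to remove a directory that still has anything in it, so a mistake in the list cannot delete cached data.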
>
> I have looked at the help ... but that says that files are not deleted by
> Netsurf, and makes no mention of directories. It also refers to a
> 'Perform maintenance' button which can be used to delete redundant files
> ... but this is nowhere to be seen!
The help and manual are out of date and refer to long since removed
functionality. It might be useful to include a "purge" cache
function for security reasons and perhaps to ensure integrity when
the cache size values are changed. Again, a feature request to cover
this will be created.
>
> Martin
>
>
>
--
Regards Vincent
http://www.kyllikki.org/