Sunday, 22 March 2015

[gccsdk] VFP revisited

Hi,

Attached is a patch which fixes all the VFP FIXMEs and deals with a few of the
other issues preventing VFP/NEON from being used with trunk GCC. There are
still a couple of bits which may need some more work (more on that later), but
it brings us a lot closer to having production-ready VFP/NEON support than my
original patch from a few years ago.

Here's a rundown of the changes:

----------------------------

pthread code:

Each thread is given its own VFP context, managed by VFPSupport.

Signal handling:

Updated to produce a VFP/NEON register dump if a VFP exception has been
caught. Note that this is just a hex dump, since we don't know what data
types are in the registers. There was also a bug in the detection of
serious errors which meant that they were never being detected (see the
change to the LDR at line ~410 of _signal.s).

SharedUnixLibrary:

Now ensures any active VFP context is disabled when a client exits.

fenv:

When I produced my initial patch it looks like I missed out a change that was
required to get fclrexcpt working correctly with VFP. So I've merged in the
relevant bit from the VFP version of glibc.
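
For illustration, clearing exceptions on VFP comes down to clearing the
cumulative flag bits in the low bits of the FPSCR. The sketch below is not the
glibc or UnixLib code: it models the FPSCR as a plain integer so it can run
anywhere, and the VFP_* names, vfp_clear_flags() and vfp_clear_flags_demo()
are hypothetical. Real code would read and write the register with VMRS/VMSR.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative exception flags in the VFP FPSCR (bits 0-4). */
#define VFP_FLAG_INVALID   (1u << 0)  /* IOC */
#define VFP_FLAG_DIVBYZERO (1u << 1)  /* DZC */
#define VFP_FLAG_OVERFLOW  (1u << 2)  /* OFC */
#define VFP_FLAG_UNDERFLOW (1u << 3)  /* UFC */
#define VFP_FLAG_INEXACT   (1u << 4)  /* IXC */
#define VFP_FLAGS_ALL      0x1Fu

/* Return the FPSCR value with the requested cumulative flags cleared.
 * Bits outside VFP_FLAGS_ALL in 'excepts' are ignored, so control bits
 * such as the rounding mode are never disturbed. */
static uint32_t vfp_clear_flags(uint32_t fpscr, uint32_t excepts)
{
    return fpscr & ~(excepts & VFP_FLAGS_ALL);
}

/* Hypothetical usage: a simulated FPSCR with flush-to-zero (bit 24) set
 * and the divide-by-zero and inexact flags raised. */
void vfp_clear_flags_demo(void)
{
    uint32_t fpscr = (1u << 24) | VFP_FLAG_DIVBYZERO | VFP_FLAG_INEXACT;

    fpscr = vfp_clear_flags(fpscr, VFP_FLAG_DIVBYZERO);
    assert(fpscr == ((1u << 24) | VFP_FLAG_INEXACT));

    /* A request with stray high bits still only touches the flags. */
    fpscr = vfp_clear_flags(fpscr, 0xFFFFFFFFu);
    assert(fpscr == (1u << 24));
}
```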

_getcpuarch:

Now uses the cache type register to work out whether a CPU with architecture
field of 15 is ARMv6 or ARMv7 (e.g. Raspberry Pi 1 was previously being
detected as ARMv7)
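
As a rough sketch of the idea (not the actual _getcpuarch code, which is
assembler): when the architecture field in the Main ID register is 15, the
top three bits of the Cache Type register can be used to tell the two apart,
since they read as 0b100 in the ARMv7 register format and 0b000 in the ARMv6
one. The function and enum names below are hypothetical, and the register
values in the demo are illustrative values of the kind reported by an
ARM1176, a Cortex-A8 and an ARM926.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_arch {
    ARCH_ENCODED_IN_MIDR,  /* architecture field is not 15; read it directly */
    ARCH_ARMV6,
    ARCH_ARMV7_OR_LATER
};

/* Classify a CPU from raw Main ID (midr) and Cache Type (ctr) values.
 * On real hardware both would be read with MRC from CP15 in a
 * privileged mode; here they are passed in so the logic can run anywhere. */
static enum cpu_arch classify_arch(uint32_t midr, uint32_t ctr)
{
    uint32_t arch_field = (midr >> 16) & 0xFu;
    if (arch_field != 0xFu)
        return ARCH_ENCODED_IN_MIDR;

    /* Bits [31:29] of the Cache Type register: 0b100 for the ARMv7
     * format, 0b000 for the ARMv6 format. */
    uint32_t ctr_format = (ctr >> 29) & 0x7u;
    return (ctr_format == 0x4u) ? ARCH_ARMV7_OR_LATER : ARCH_ARMV6;
}

/* Hypothetical usage with illustrative register values. */
void classify_arch_demo(void)
{
    /* ARM1176-style values (Raspberry Pi 1 class CPU). */
    assert(classify_arch(0x410FB767u, 0x1D152152u) == ARCH_ARMV6);
    /* Cortex-A8-style values. */
    assert(classify_arch(0x413FC082u, 0x82048004u) == ARCH_ARMV7_OR_LATER);
    /* ARM926-style Main ID: architecture field is 6, so no check needed. */
    assert(classify_arch(0x41069265u, 0x1D152152u) == ARCH_ENCODED_IN_MIDR);
}
```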

_vfork:

Deactivate & reactivate VFP context over the fork to make sure the state is
duplicated correctly

_syslib:

Updated SUL version checks to require SUL 1.13 if the code was built for
VFP/NEON. Added some logic to check for required VFP/NEON features - or at
least as many features as we can assume the program might require (GCC doesn't
seem to be very good at letting code know what VFP/NEON variant is being
targeted). Also added logic to set up the initial VFP context and to make a
note of how many registers are available (for creating future contexts)

riscos-elf.h:

Add -mno-unaligned-access to the default compiler flags to stop GCC
automatically using unaligned loads/stores for ARMv6+

gcc.config.arm.c.p:

When building some (non-NEON) code while targeting a NEON FPU I ran into an
internal compiler error. The cause seems to be that unless GCC is built with
HOST_WIDE_INT as a 64-bit type, it's incapable of converting some constant
values to NEON immediate constants. My fix (the second hunk of the patch; the
rest is just updates to the offsets for the preexisting hunks) is to avoid
attempting to generate the troublesome encodings if GCC has been built with a
32-bit HOST_WIDE_INT. I'm not entirely happy with the patch (it will lead to
different binaries being produced by different builds of GCC), but the only
alternative (short of rewriting large chunks of GCC) would be to force
HOST_WIDE_INT to be a 64-bit type, which will presumably have a significant
performance/memory impact for the native RISC OS version, and may even
introduce bugs if we've got RISC OS-specific code somewhere which assumes a
certain HOST_WIDE_INT size. So I'll leave it to you to decide which approach
you'd prefer (I haven't actually tried a native build yet to see how much of
an impact it would really make).
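
To illustrate why a 32-bit integer type is a problem here, consider one family
of NEON immediates that genuinely needs 64 bits: VMOV.I64, where every byte of
the 64-bit constant must be 0x00 or 0xFF. This is only an example of the
general issue, not a claim about which encodings the patch avoids, and it is
not GCC's code; vmov_i64_byte_mask() and its demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* If 'value' is a valid VMOV.I64 immediate (each byte 0x00 or 0xFF),
 * store a mask with one bit per 0xFF byte in *mask_out and return 1.
 * Otherwise return 0 and leave *mask_out untouched. */
static int vmov_i64_byte_mask(uint64_t value, uint8_t *mask_out)
{
    uint8_t mask = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(value >> (8 * i));
        if (byte == 0xFF)
            mask |= (uint8_t)(1u << i);
        else if (byte != 0x00)
            return 0;
    }
    *mask_out = mask;
    return 1;
}

/* Hypothetical usage: the same constant seen through a 64-bit and a
 * 32-bit integer type gives two different answers. */
void vmov_i64_demo(void)
{
    uint64_t constant = 0xFF00FF0000FF00FFull;
    uint8_t mask = 0;

    assert(vmov_i64_byte_mask(constant, &mask) == 1);
    assert(mask == 0xA5);

    /* Squeezing the constant through 32 bits silently drops the top half,
     * so the analysis sees a different value. */
    uint32_t truncated = (uint32_t)constant;
    assert(vmov_i64_byte_mask(truncated, &mask) == 1);
    assert(mask == 0x05);

    /* A constant with a byte that is neither 0x00 nor 0xFF is rejected. */
    assert(vmov_i64_byte_mask(0x0123456789ABCDEFull, &mask) == 0);
}
```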

-------------------

Apart from the HOST_WIDE_INT issue, the only other bit which I think might
need some more work is the handling of VFP exceptions in the signal handling
code. There are two issues at present:

* The error base for the VFP exceptions *might* change. I'm currently waiting
to hear back from ROOL whether this will happen or not (it looks like there
might have been a SNAFU with the error base allocation, causing it to clash
with one of ROL's allocations)

* At the moment when VFPSupport raises an exception it leaves the context in a
somewhat inconsistent state - the FPSCR will show that the exception has
occurred, but other registers may still be in the state they were in before
the exception occurred. In particular FPEXC will generally show that the instruction that
generated the exception still needs to be processed. I've had a look through
the IEEE and C specs and can't see any obvious indications of what state things
should be left in following an exception, so I'm not sure what the correct
approach is meant to be. However I suspect things would be a lot better if
VFPSupport adopted one of the following approaches:

1. Leave things in the state they were prior to the exception being raised. I
believe this is the approach FPEmulator takes. However for VFP I don't think
this would be possible - if the exception occurs part-way through a vector
operation then the hardware may have already updated some of the
registers.
2. Complete the iteration that causes the exception (substituting an
appropriate value for the result) and then raise the error. Leave the context
in a state such that when it is restored any remaining iterations/operations
will be performed. However depending on how much stuff is allowed to happen
between the first exception being raised and the exception processing resuming
we might find that the results of the remaining operation have become invalid
(e.g. imagine if one of the source registers has been updated to a different
value)
3. Complete the entire operation (substituting appropriate values for the
results of any exceptional calculation) and only raise the error at the end.
Note that sometimes the hardware can require the support code to process two
instructions, so we have an extra choice here for whether we should process
just the first instruction or both of them. Processing just the first may fall
into the same pitfalls as option 2.
4. Make exception handling a more interactive process - basically implement
support for the IEEE 'alternate exception handling', where the application can
decide whether to raise an exception or to substitute the result of an
exceptional operation with a given value. This would then give UnixLib the
freedom to control exactly what state the context is left in after the
exception.

Note that in terms of current RISC OS targets it's only the Raspberry Pi 1
that's capable of generating VFP exceptions, and UnixLib disables all FP
exceptions on startup by default. So any changes to the exception handling
(whether in RISC OS or UnixLib) are only going to affect the small number of
users who are running code on a Pi 1 which explicitly enables trapped FP
exceptions.


If you have any other comments/questions about the code then let me know.
VFP/NEON support in GCC has now floated to near the top of my list of
priorities, so until something knocks it off that spot I should be able to
dedicate a fair amount of my time to making whatever changes are necessary to
get things to a state where you're happy with them.

Cheers,

- Jeffrey
