
Re: Bug#1036158: gcc-13: Please raise baseline for alpha to EV56



On Wed, May 17, 2023 at 11:27:43AM +0200, John Paul Adrian Glaubitz wrote:
> Hi Michael!
> 
> On Tue, 2023-05-16 at 20:25 +1200, Michael Cree wrote:
> > On Tue, May 16, 2023 at 09:38:56AM +0200, John Paul Adrian Glaubitz wrote:
> > > After a long discussion on IRC and the mailing list, we have agreed to raise the
> > > baseline for the alpha architecture to EV56 to improve the generated code and fix
> > > a number of issues. The change is already being implemented in the glibc packages
> > > which switches to EV56 [1] since hwcaps are no longer available with glibc 2.37 [2].
> > > 
> > > Could you raise the baseline for gcc on alpha to EV56?
> > > 
> > > I assume, it should be "--with-cpu=ev56" or "--with-arch=ev56".
> > 
> > Yes, please!
> > 
> > I suggest the following in debian/rules2:
> > 
> > ifneq (,$(findstring alpha,$(DEB_TARGET_ARCH)))
> >   CONFARGS += --with-cpu=ev56 --with-tune=ev6
> > endif
> > 
> > (the --with-tune only affects instruction scheduling and better tunes
> > code for ev6 and more recent machines, but allows execution down to
> > ev56.)  I have tested this in the past with a rebuild of most packages
> > that are in the base essential chroot in the past and it works well.
> 
> Doesn't that come with a speed penalty for EV56 machines? I'm asking because EV56 is
> currently the baseline for QEMU when emulating Alpha.

I was under the impression that qemu emulates ev6/ev67, its machine
type being clipper, which emulates an ES40.  Am I mistaken?

With regard to instruction scheduling: EV56 is in-order, executing two
instructions [1] per cycle.  EV6 and EV67 are out-of-order [2],
executing four instructions per cycle.  Hence, for ev6/ev67 it can be
advantageous to bring forward instructions whose operands are already
available, and to delay by four cpu instructions those that depend on
the result of a previous instruction, instead of placing them
immediately after that instruction.  This guarantees they don't waste
an instruction slot in the same cpu cycle. [3]

The deleterious impact on ev56 of doing this will be very small to
utterly negligible.  It is not worth worrying about.

Regards,
Michael.

[1] Here I am talking about most integer register/register operate
instructions.  Memory, integer multiply and floating-point instructions
have longer latencies.

[2] Note that out-of-order does not mean the cpu can bring forward
data ready instructions that have not yet been seen in the instruction
pipeline.  That is why we ask the compiler to place them earlier.

[3] Even more advantageous on ev6/ev67 is to unroll the loop and
evaluate two iterations in parallel, i.e., intertwine the two
computational pathways.  When I did tests some time ago with gcc (4.6
and earlier versions) the compiler did not do this well, whereas my
manually optimised machine code was getting better than three
instructions executed per cpu cycle on certain code.
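
To illustrate the idea in [3], here is a minimal C sketch of a
reduction unrolled by two with the two pathways intertwined.  It is
not the hand-optimised machine code mentioned above; the function
names sum_serial and sum_interleaved are made up for this example.
Whether it actually runs faster depends on the compiler version, the
flags and the cpu.

```c
#include <stddef.h>

/* Baseline: a single accumulator, so every addition depends on the
 * result of the previous one and the additions form one serial chain. */
long sum_serial(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by two with independent accumulators s0 and s1.  The two
 * additions in the loop body do not depend on each other, so a
 * multi-issue cpu is free to execute them in the same cycle. */
long sum_interleaved(const long *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];       /* pathway 0 */
        s1 += a[i + 1];   /* pathway 1 */
    }
    if (i < n)            /* leftover element when n is odd */
        s0 += a[i];

    return s0 + s1;
}
```

Both functions return the same sum for integer data; only the
dependency structure seen by the instruction scheduler differs.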

