Re: rocm-device-libs for clang-16?



Hi Cory,

I've been pondering this mail for a while, as it indeed seems that there
is no easy solution.

On 2023-06-19 06:41, Cordell Bloor wrote:
> I've been thinking about how to transition HIP from clang-15 to clang-16
> (which will be necessary to enable support for RDNA 3 GPUs).
> 
> In retrospect, I think that omitting the LLVM version number from the
> rocm-device-libs install path was a mistake. The amdgpu bitcode files
> are only compatible within a single major version of LLVM. For example,
> when the rocm-device-libs are built with clang 15.0.6, the resulting
> bitcode files will be compatible with clang 15.0.7. However, they are not
> guaranteed to be compatible with clang 16.0.0.
> 
> If we update the rocm-device-libs to be built with clang-16, then it
> might not be possible to use them with clang-15. Nevertheless, they will
> still be installed in a path where clang-15 will find them. This may
> result in confusing behaviour for any users that continue using clang-15
> to build their projects.

In no particular order, and including some of your suggestions below, I
think these are our options:

  (1) We could add a Breaks: clang-15 to rocm-device-libs (>= x.y.z) so
      that it cannot be co-installed with clang-15, but that would be
      the sledgehammer approach.

  (2) We could just go ahead without special handling, and add a NEWS
      file to rocm-device-libs warning users against using it together
      with clang-15, suggesting they use clang-16 for ROCm work instead.
      These messages are presented to users in interactive contexts (and
      also sent by mail to root, if I'm not mistaken).

  (3) We could ask the stable RT to approve an update to clang-15
      (support major-specific resource directory, symlink 15.0.7 to it),
      assuming #1038991 you filed is resolved, and fix this in some
      bookworm point release.

  (4) We could add symlinks from the resource directory to a stable
      rocm-device-libs path (though how would those symlinks look?)

  (5) We could fix this by shipping an LLVM config file.

Anyone, please amend if this is incorrect.

I'm mostly worried about bookworm, because whatever solution we pick
needs a safe upgrade path from it. Not just for trixie, but also for the
case where we start providing bookworm-backports.

> At the moment, we're also relying on a Debian-specific patch to
> clang-15 to find the rocm-device-libs in 
> /usr/lib/$(multiarch)/amdgcn/bitcode and I'd rather not bring that
> patch forward into clang-16.

Agreed.

> Ideally, the rocm-device-libs would be installed to the clang resource
> directory. This is where they are installed on Fedora [1] and it is
> where they are planned to be placed in future versions of the upstream
> ROCm project. It is also one of the default search locations for clang,
> so we will be able to drop our patch. The problem is that if you run
> `clang++-15 -print-resource-dir`, you'll find that on Debian the path is
> `/usr/lib/llvm-15/lib/clang/15.0.7`. That path includes the LLVM patch
> version and is therefore not stable. In Fedora, installing to that
> location is probably easier because the LLVM resource directory is
> /usr/lib64/clang/16.

Do you happen to know from LLVM's "official" POV over which versions the
resource directory is supposed to be stable?

Because if it is indeed the major version, then Debian using 15.0.7
would probably be a bug, warranting a higher severity for the #1038991
you filed, and that could easily be rectified with approach (3) above.
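
To make approach (3) a bit more concrete, here is a minimal Python
sketch of the path change I have in mind: the resource directory moves
to a major-only name, and the old full-version name becomes a relative
symlink to it. The helper names are made up for illustration, and the
sketch assumes the last path component is always a dotted version
number; it is not how the clang packaging actually computes anything.

```python
from pathlib import PurePosixPath


def major_resource_dir(resource_dir):
    """Return the resource directory named after the LLVM major version only.

    Assumption: the last path component is a dotted version such as 15.0.7.
    """
    path = PurePosixPath(resource_dir)
    major = path.name.split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unexpected version component: {path.name!r}")
    return str(path.with_name(major))


def compat_symlink(resource_dir):
    """Return (link, target) for a relative compatibility symlink.

    The link keeps the old full-version path working; the target is the
    major-only directory name sitting next to it.
    """
    target = PurePosixPath(major_resource_dir(resource_dir)).name
    return resource_dir, target


if __name__ == "__main__":
    # The path reported by `clang++-15 -print-resource-dir` on Debian.
    print(major_resource_dir("/usr/lib/llvm-15/lib/clang/15.0.7"))
    print(compat_symlink("/usr/lib/llvm-15/lib/clang/15.0.7"))
```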

> Does anybody have thoughts on how we might move the rocm-device-libs
> into the clang resource directory? Or, maybe on how to create a symlink
> in the resource directory that points to a more stable rocm-device-libs
> install path?

As long as a path in one package doesn't conflict with a path in another
package, I think one is free to install what one wants, so the symlink
approach should be fine.

But what would that symlink look like? I guess we'd need to introduce a
split in rocm-device-libs, have clang-15 link to one split, and clang-16
to the other (or both)?
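
Purely as a strawman, this is roughly the shape I imagine, written as a
small Python sketch that creates the link in a staging tree. It assumes
that clang looks for the bitcode under lib/amdgcn/bitcode inside its
resource directory (that is my understanding of the default search
location, but please correct me), and the per-major install path
amdgcn-16 is entirely hypothetical.

```python
import os
from pathlib import Path


def link_device_libs(root, resource_dir, bitcode_dir):
    """Create <resource_dir>/lib/amdgcn/bitcode as a symlink to bitcode_dir.

    root is a staging directory (like a package build tree). The other two
    arguments are absolute paths as they would appear on the installed
    system. The lib/amdgcn/bitcode subpath is an assumption about where
    clang searches inside its resource directory.
    """
    link = Path(root) / resource_dir.lstrip("/") / "lib" / "amdgcn" / "bitcode"
    link.parent.mkdir(parents=True, exist_ok=True)
    # The target may not exist in the staging tree; a dangling link is fine.
    os.symlink(bitcode_dir, link)
    return link


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as staging:
        created = link_device_libs(
            staging,
            "/usr/lib/llvm-16/lib/clang/16",
            # Hypothetical per-major install path for one "split".
            "/usr/lib/x86_64-linux-gnu/amdgcn-16/bitcode",
        )
        print(created, "->", os.readlink(created))
```

Each split would ship its bitcode in its own directory, and each clang
major version would get one such link.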

> Or, perhaps some way to create a system LLVM config file
> [2] to do the same?

I'm sure we could do this by installing it in the system directory, but
here, too, I'm not sure how exactly this would look. Would it be
possible to set something up like this without this config having side
effects on anything else?
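
For the sake of discussion, the smallest config I can think of would
contain a single --rocm-device-lib-path option. The Python sketch below
only renders such a file; the function name and the example path are
placeholders of mine, and I have not checked where the file would need
to be installed or whether the option is harmless for non-AMDGPU
compiles, which is exactly the side-effect question above.

```python
def render_clang_config(device_lib_path):
    """Render the text of a hypothetical system-wide clang config file.

    Lines starting with '#' are comments in clang config files; the only
    real content is the --rocm-device-lib-path driver option.
    """
    lines = [
        "# Hypothetical: point clang at the packaged ROCm device libraries.",
        f"--rocm-device-lib-path={device_lib_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder path; the multiarch directory is only an example.
    print(render_clang_config("/usr/lib/x86_64-linux-gnu/amdgcn/bitcode"), end="")
```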

> I'm not sure of the best way to handle this, but I think we're going
> to need to figure out a solution before we can move on from clang-15.

Agreed.

Best,
Christian

