A workaround for a problem

My primary test systems are Debian-based. One of them hosts an AMD Instinct MI50 GPU that I use for dev/test/learning.

I install various ROCm packages on this system. One of them is rocm-gdb, the GPU code debugger. This package has a hardwired dependency on libpython3.8 or libpython3.10. Debian 12 ships with libpython3.11. Technically there is no reason why the dependency shouldn't be on libpython (>= 3.8), though this appears to be a Python build/packaging issue: Python likes to have multiple versions installed side by side, so an unversioned libpython.so wouldn't make much sense in a multi-version installation.
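
If you want to see which libpython versions a package will accept before you try to install it, something like the following sketch can help. The helper name list_python_deps and the sample Depends value are made up for illustration; the real rocm-gdb dependency list will be longer and may differ.

```shell
#!/bin/bash
# list_python_deps (hypothetical helper): read a Depends value on stdin
# and print each libpython alternative on its own line.
list_python_deps() {
  tr ',|' '\n\n' | sed 's/^ *//; s/ *$//' | grep '^libpython'
}

# Made-up Depends value, for illustration only.
sample='libc6 (>= 2.34), libpython3.10 | libpython3.8, libtinfo6'
printf '%s\n' "$sample" | list_python_deps

# Against the real package (the path is a placeholder):
#   dpkg-deb -f /path/to/rocm-gdb_*.deb Depends | list_python_deps
```

If libpython3.11 is not in that list, apt on Debian 12 will refuse the package.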

That (the Python issue) is frustrating, and one of my many critiques of the language.

But the dependency is worse, as it prevents one from installing this on Debian 12, where I do my work.

Ok.

So I've argued here and at work that we need to target ubiquity: make the tools available to the widest possible audience. I argue that failing to do this was one of the (many) reasons that SGI failed. Having been inside that company at the time, it was disheartening to see it have a great product and manage to restrict its TAM by erecting silly barriers that wouldn't add to the bottom line.

My current job isn't SGI. There are some insanely smart and driven/dedicated people there. The products are amazing (I'm not just being a cheerleader; this is experience from having played with them). Some things, though, need work.

So I need a way to use the debugger. And the .deb is built wrong, with an incorrect dependency. How do I fix this, if I can't access the source and file a PR?

Well, the magic of automation scripting. The following script takes the fully qualified path to the rocm-gdb deb, unpacks it, fixes the control file, adds a soft link you will need, reruns ldconfig, and repackages the deb with the right dependencies. The soft link and ldconfig steps write under /usr/lib, so run it as root.

#!/bin/bash -x

RGDB=$1 # full path to rocm-gdb_*.deb

# create a temp path, named after this shell's PID, and work in it
T=$$
mkdir "$T"
cd "$T" || exit 1

# unpack the package contents and its control files
dpkg-deb -x "$RGDB" rocm-gdb
dpkg-deb --control "$RGDB" rocm-gdb/DEBIAN
# accept libpython3.11 as an alternative wherever libpython3.10 is listed
sed -i.bak 's/libpython3\.10/libpython3.11 | libpython3.10/g' rocm-gdb/DEBIAN/control

# now link the libpython3.11 to libpython3.10 shared object if it is not already linked
if [[ ! -e /usr/lib/x86_64-linux-gnu/libpython3.10.so ]]; then
    ln -s /usr/lib/python3.11/config-3.11-x86_64-linux-gnu/libpython3.11.so /usr/lib/x86_64-linux-gnu/libpython3.10.so
    ldconfig
fi

# backup the .deb, and build a new one
TS=$(date +%s)
mv -fv "$RGDB" "$RGDB.${TS}"
dpkg -b rocm-gdb "$RGDB"
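
One thing to note: the script leaves its PID-named scratch directory behind in whatever directory you ran it from. If that bothers you, a possible alternative for the temp path step is mktemp plus a cleanup trap. This is only a sketch; the make_workdir helper and the rocm-gdb-repack prefix are names I made up for illustration.

```shell
#!/bin/bash
# make_workdir (hypothetical helper): create a private scratch directory
# under $TMPDIR (or /tmp) and print its path.
make_workdir() {
  mktemp -d "${TMPDIR:-/tmp}/rocm-gdb-repack.XXXXXX"
}

workdir=$(make_workdir)
# remove the scratch directory when the script exits
trap 'rm -rf "$workdir"' EXIT
echo "working in $workdir"
```

The unpack, sed, and dpkg -b steps would then run inside "$workdir" instead of "$T".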

Then the new deb installs without a problem:

joe@scruffy:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

joe@scruffy:~$ rocgdb
GNU gdb (XXXXXXX) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/ROCm-Developer-Tools/ROCgdb/issues>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)

Ta-da!

So ... I'll ask around and see if I can get access to this repo soon, so I can file that PR. The fix is that single sed line in the script.
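
To see what that sed substitution does without touching a real package, here is a small sketch that runs essentially the same expression over a made-up Depends line. The patch_depends name and the sample line are assumptions for illustration; only the libpython3.10 part reflects what the real control file contains.

```shell
#!/bin/bash
# patch_depends (hypothetical helper): apply the control-file fix to stdin,
# adding libpython3.11 as an alternative wherever libpython3.10 appears.
patch_depends() {
  sed 's/libpython3\.10/libpython3.11 | libpython3.10/g'
}

# Made-up Depends line, for illustration only.
before='Depends: libc6 (>= 2.34), libpython3.10 | libpython3.8, libtinfo6'
after=$(printf '%s\n' "$before" | patch_depends)
printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

The existing alternatives stay in place, so the patched package should still install on systems that have libpython3.8 or libpython3.10.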

And for the record, the MI50 does work with ROCm 6.0. It's just not supported. And there are many of them for cheap on eBay, though you will need a blower fan to help cool one.

I ran the Radeon Validation Suite (rvs) with the 3 hour test, and got the GPU to thermal trip a few days ago. I need a better cooling solution, so I might buy a second, and do my water block hack on it. I have an MI25 I did that with, and it worked great. Sadly, I don't have a machine for that card now.

And I need more PCIe slots. But that's another story.
