Local installation of DeepTMHMM

For one of our ongoing metagenomic projects I needed to split predicted proteins into cytoplasmic and transmembrane groups. After looking at a couple of different options I opted to use DeepTMHMM, part of the biolib framework. DeepTMHMM has a wonderful wrapper that submits jobs to a remote server for analysis. This is great for relatively small numbers of sequences. However, even after clustering by similarity I have well over 10^5 predicted proteins. The wrapper would be terribly inefficient for this many sequences, and while I couldn’t find any explicit limits on the number of submissions, I presume that it would be considered rude to bomb the server with thousands of fasta files. Thus began my odyssey to run DeepTMHMM locally, and Homeric the journey was. Unfortunately, there was no way to document the process step by step, as it involved serious and increasingly frantic troubleshooting of both hardware and software. Instead I present some cautionary notes on things that didn’t work and the solutions that ultimately did.
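For a sense of what submitting via the wrapper would entail, here is a minimal sketch of the batching step: splitting one large fasta file into fixed-size chunks, one per submission. It uses only the standard library; the batch size and file naming are arbitrary choices for illustration, not anything DeepTMHMM requires.

```python
# Sketch: split a large FASTA file into fixed-size batches, one file per
# batch. Batch size and output naming are arbitrary illustrative choices.
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def split_fasta(path, batch_size=500, prefix="batch"):
    """Write records into numbered FASTA files of at most batch_size records.

    Returns the number of batch files written.
    """
    out, n_files, count = None, 0, 0
    for header, seq in read_fasta(path):
        if count % batch_size == 0:
            if out:
                out.close()
            n_files += 1
            out = open(f"{prefix}_{n_files:04d}.fasta", "w")
        out.write(f"{header}\n{seq}\n")
        count += 1
    if out:
        out.close()
    return n_files
```

With 10^5-plus sequences even generous batches of 500 would mean hundreds of separate submissions, which is what made the remote route unattractive.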

DeepTMHMM makes use of a GPU via PyTorch. None of our servers were equipped with a GPU so step 1 was to acquire one. I have a pretty good, though elderly, Linux box in my office for development work. Some discussion with the manufacturer (Puget Systems) assured me that the motherboard would be compatible with a relatively modern GPU. I selected an NVIDIA GeForce RTX 4060 as a good compromise between cost and performance and a couple days later was ready for surgery. Here’s where some initial mistakes were made. I admit them here so that others may find humor.

Mistake 1 – I borrowed some power cables for the GPU, not appreciating that these are specific to the power supply unit. The resulting (mildly traumatic) incident somehow corrupted the boot sector of the boot drive, though the drive itself was fine. While recovering from that I made Mistake 2:

Mistake 2 – Updating the operating system. The development workstation was (is) running Ubuntu 20.04 LTS, which is nearing end of support. Since I was replacing the boot drive anyway it seemed like a good idea to modernize things to 24.04 LTS.

After a bit of work I had everything up and running (with the original power cables, which I had thankfully been toting around for 10 years), and the nvidia-smi command showed the GPU alive and talking to the system. Time to install DeepTMHMM.

The “preferred” way of running DeepTMHMM locally is via a docker container. Let the record show that I do not recommend this option. I tried this on the workstation and via WSL on my GPU-equipped laptop. Buried deep deep deep in the stack are some dependency conflicts that prevent a modern implementation of docker from reading the output of the code running inside the outdated container. After a few days of troubleshooting I abandoned this in favor of a local install of DeepTMHMM.

You can obtain an academic license and copy of the software by emailing licensing@biolib.com (thank you ChatGPT for this solution… I don’t think I would have found it otherwise). On the surface it looks straightforward enough. There’s a helpful README file with install instructions and a reasonable number of dependencies. The trick is with the dependency versions.

Following standard best practice I isolated everything in a conda environment using a Python version that matched the docker container (3.8.20). I felt my way through the rest of the dependencies and it all looked good, but regardless of what I tried the specified version of PyTorch couldn’t run code on the GPU. I can’t recall all or even most of the things I tried, but at some point I reached the end of the line and decided to recreate a historically accurate place for DeepTMHMM to call home. This meant reinstalling LTS 20.04 and an older NVIDIA driver (570.133.07). I did need a more recent version of CUDA (12.8) than was bundled with PyTorch, and PyTorch 2.0.1 instead of the one indicated in the README. Magically, and most unexpectedly at this point, everything worked and DeepTMHMM is happily chewing on my data. Should take about 90 hours to run 200,000 or so predictions.
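For scale, the back-of-envelope throughput implied by those numbers (roughly 200,000 predictions in about 90 hours) works out as follows:

```python
# Rough throughput implied by the run described above:
# ~200,000 predictions in ~90 hours on a single RTX 4060.
n_seqs = 200_000
hours = 90
per_hour = n_seqs / hours        # about 2,200 sequences per hour
per_second = per_hour / 3600     # a bit over 0.6 sequences per second
```

Worth keeping in mind when estimating whether your own dataset is a weekend job or a week-long one, though actual throughput will depend on sequence lengths and hardware.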

Here’s the final recipe that worked and a modified version of the original README:
OS: Ubuntu LTS 20.04.6
GPU: NVIDIA GeForce RTX 4060
CUDA: v12.8
NVIDIA driver: 570.133.07

### DeepTMHMM 1.0 - Academic Version ###

### Installation ###

# Install system-wide dependencies

sudo apt-get install libhdf5-dev

# Setup a virtual environment

conda create -n deeptmhmm python=3.8
conda activate deeptmhmm

# Install build dependencies (inside environment but not with conda)
python3 -m pip install wheel Cython==0.29.37 pkgconfig==1.5.5

# Install PyTorch (2.0.1, not the version given in the original README)
pip install torch==2.0.1

# Install other dependencies (inside environment but not with conda)
python3 -m pip install -r requirements.txt

# Run tool on sample file
python3 predict.py --fasta sample.fasta --output-dir result1

# The result is now available in result1/
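Once predictions are in hand, the original goal was to split proteins into membrane and non-membrane groups. Below is a sketch of that sorting step, assuming DeepTMHMM's "3-line" output convention (a header carrying the predicted type, the sequence, and a per-residue topology string). The exact header format, the type labels (TM, GLOB, SP, BETA), and the output file name vary between versions, so check them against your own result directory before relying on this.

```python
# Sketch: sort DeepTMHMM predictions into membrane vs. non-membrane groups
# from 3-line-style output (header / sequence / topology string per record).
# The ">id | TYPE" header convention and the type labels used below are
# assumptions — verify them against the files in your own result directory.
def parse_3line(text):
    """Yield (seq_id, pred_type, sequence, topology) records."""
    lines = [l for l in text.splitlines() if l.strip()]
    for i in range(0, len(lines), 3):
        header = lines[i].lstrip(">")
        seq_id, _, pred_type = header.partition(" | ")
        yield seq_id.strip(), pred_type.strip(), lines[i + 1], lines[i + 2]

def split_by_membrane(text):
    """Return (membrane_ids, other_ids) based on the predicted type label."""
    membrane, other = set(), set()
    for seq_id, pred_type, _, _ in parse_3line(text):
        # Treat anything flagged as containing TM segments (or a beta
        # barrel) as membrane; everything else as globular/cytoplasmic.
        if "TM" in pred_type or pred_type == "BETA":
            membrane.add(seq_id)
        else:
            other.add(seq_id)
    return membrane, other
```

From there it is straightforward to write the two ID sets back out and pull the corresponding records from the original fasta.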

The output of nvidia-smi for good measure. Note the GPU memory allocated to python3 for DeepTMHMM:

[nvidia-smi output screenshot]

This entry was posted in Computer tutorials.
