PyTorch has identified a malicious dependency with the same name as the framework’s ‘torchtriton’ library. This has led to a successful compromise via the dependency confusion attack vector.
PyTorch admins are warning users who installed PyTorch-nightly over the holidays to uninstall the framework and the counterfeit ‘torchtriton’ dependency.
From computer vision to natural language processing, the open source machine learning framework PyTorch has gained prominence in both commercial and academic realms.
Malicious library targets PyTorch-nightly users
Users who installed PyTorch-nightly between December 25th and December 30th, 2022, should ensure their systems were not compromised, the PyTorch team has warned.
The warning follows a ‘torchtriton’ dependency that appeared over the holidays on the Python Package Index (PyPI) registry, the official third-party software repository for Python.
“Please uninstall it and torchtriton immediately, and use the latest nightly binaries (newer than Dec 30th 2022),” advises the PyTorch team.
The malicious ‘torchtriton’ dependency on PyPI shares its name with the official library published on the PyTorch-nightly repository. But when dependencies are fetched in the Python ecosystem, PyPI normally takes precedence, causing the malicious package to be pulled onto your machine instead of PyTorch’s legitimate one.
“Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository. This design enables somebody to register a package by the same name as one that exists in a third party index, and pip will install their version by default,” writes the PyTorch team in a disclosure published yesterday.
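The precedence problem PyTorch describes can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not pip’s actual resolver (whose candidate selection is more involved): indexes are consulted in order with the public index first, and the first index that lists a name wins. The index contents and version numbers below are hypothetical.

```python
from __future__ import annotations


def resolve(name: str, indexes: list[tuple[str, dict[str, str]]]) -> tuple[str, str]:
    """Return (index_name, version) from the first index that lists `name`.

    `indexes` is ordered by precedence; putting the public index first
    models the "PyPI takes precedence" behavior. NOT pip's real resolver.
    """
    for index_name, packages in indexes:
        if name in packages:
            return index_name, packages[name]
    raise LookupError(f"no index provides {name!r}")


# Hypothetical index contents before and after an attacker upload.
nightly = ("pytorch-nightly", {"torch": "2.0.0.dev", "torchtriton": "2.0.0"})
pypi_clean = ("pypi", {"numpy": "1.24.1"})
pypi_poisoned = ("pypi", {"numpy": "1.24.1", "torchtriton": "9.9.9"})

print(resolve("torchtriton", [pypi_clean, nightly]))     # ('pytorch-nightly', '2.0.0')
print(resolve("torchtriton", [pypi_poisoned, nightly]))  # ('pypi', '9.9.9')
```

Nothing changes on the user’s side between the two calls: once someone registers the same name on the public index, the lookup silently switches sources, which is the essence of dependency confusion.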
At the time of writing, BleepingComputer observed the malicious ‘torchtriton’ dependency had exceeded 2,300 downloads in the past week.
This type of supply chain attack, known as “dependency confusion,” was first reported by BleepingComputer in 2021, when the attack vector was popularized by ethical hacker Alex Birsan.
According to PyTorch, users of the stable PyTorch packages are not affected by this issue.
Hacker steals sensitive files, claims ethical research
Not only does the malicious ‘torchtriton’ survey your system for basic fingerprinting info (such as IP address, username, and current working directory), but it also steals sensitive data:
- Gets system information
  - nameservers from /etc/resolv.conf
  - hostname from gethostname()
  - current username from getlogin()
  - current working directory name from getcwd()
  - environment variables
- Reads the following files
  - /etc/hosts
  - /etc/passwd
  - The first 1,000 files in $HOME/*
  - $HOME/.gitconfig
  - $HOME/.ssh/*
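If you were affected, the paths listed above suggest which credentials to review and rotate first. A minimal sketch, assuming a Unix-like home directory layout: it reports $HOME/.gitconfig and every regular file directly inside $HOME/.ssh. The function name `exposed_secret_files` is hypothetical, and the sketch deliberately does not try to reproduce which “first 1,000 files” the payload read, since that ordering is not documented.

```python
from __future__ import annotations

from pathlib import Path


def exposed_secret_files(home: Path) -> list[Path]:
    """List high-value files under `home` matching paths named in the analysis.

    Covers $HOME/.gitconfig and every regular file directly inside $HOME/.ssh
    (mirroring the $HOME/.ssh/* pattern). Subdirectories of .ssh are skipped.
    """
    found: list[Path] = []
    gitconfig = home / ".gitconfig"
    if gitconfig.is_file():
        found.append(gitconfig)
    ssh_dir = home / ".ssh"
    if ssh_dir.is_dir():
        found.extend(sorted(p for p in ssh_dir.iterdir() if p.is_file()))
    return found


if __name__ == "__main__":
    for path in exposed_secret_files(Path.home()):
        print(f"review and rotate if affected: {path}")
```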
It then uploads all of this data, including file contents, to the h4ck.cfd domain via encrypted DNS queries using the wheezy.io DNS server.
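Defenders with DNS query logs can look for lookups under the exfiltration domain. A minimal sketch, assuming the log has been exported as one queried hostname per line (a hypothetical format; adapt the parsing to your own resolver or capture tool). Because the payload used the wheezy.io DNS server, such lookups may only show up in network-level captures rather than in your local resolver’s logs. Matching is done on label boundaries, so a name like noth4ck.cfd is not flagged.

```python
from __future__ import annotations

from typing import Iterable

# Exfiltration domain named in the analysis above.
SUSPICIOUS_ZONE = "h4ck.cfd"


def is_in_zone(hostname: str, zone: str = SUSPICIOUS_ZONE) -> bool:
    """True if `hostname` is `zone` itself or any subdomain of it.

    Case-insensitive, ignores a trailing root dot, and only matches on
    label boundaries so 'noth4ck.cfd' is not a hit.
    """
    name = hostname.strip().rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def flag_queries(lines: Iterable[str]) -> list[str]:
    """Return the stripped log lines whose hostname falls under the zone."""
    return [line.strip() for line in lines if is_in_zone(line)]


sample_log = ["pypi.org", "a1b2c3.h4ck.cfd.", "H4CK.CFD", "noth4ck.cfd", ""]
print(flag_queries(sample_log))  # ['a1b2c3.h4ck.cfd.', 'H4CK.CFD']
```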
PyTorch explains that the malicious ‘triton’ binary contained within the counterfeit ‘torchtriton’ is executed only when the user imports the ‘triton’ package in their build. This would require explicit code and is not PyTorch’s default behavior.
The notice on the h4ck.cfd domain implies the whole operation is ethical research, but the analysis strongly indicates otherwise.
“Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I’ve identified who is vulnerable and [reported] the finding all of the metadata about your server will be deleted.”
Contrary to the wording of the notice, the binary does not merely collect “metadata”; it steals the aforementioned secrets, including your SSH keys, .gitconfig, hosts and passwd files, and the contents of the first 1,000 files in your HOME directory.
BleepingComputer obtained a copy of the malicious binary which, according to VirusTotal, had a clean reputation at the time of writing. But don’t be fooled.
We observed that, unlike several research packages and PoC exploits that are upfront about their intent and behavior, ‘torchtriton’ employs known anti-VM techniques to evade detection. More importantly, the malicious payload is obfuscated and contained entirely in binary form, i.e. Linux ELF files. All of this makes the library an outlier when compared with past ethical dependency confusion exploits, which were shipped in plaintext.
We also noticed that the sample reads .bash_history, the list of commands and inputs the user has typed into the terminal, which is yet another trait exhibited by malware.
This isn’t the first time a hacker caught exfiltrating secrets has claimed that their actions constituted ethical research.
In mid-2022, the hugely popular Python and PHP libraries ‘ctx’ and ‘PHPass’, respectively, were hijacked and altered to steal AWS keys. The researcher behind the attack later claimed that this was ethical research.
For the avoidance of doubt, we approached the owner of h4ck.cfd for comment. Public records show the domain was registered with Namecheap on December 21st, just days prior to this incident.
Given below is the complete statement we received from the domain owner, who also appears to be behind the wheezy.io domain.
Note that the mention of “Facebook” below is relevant given that PyTorch originated at Meta AI.
“Hey, I am the one who claimed torchtriton package on PyPi. Note that this was not intended to be malicious!
I understand that I could have done a better job to not send all of the user’s data. The reason I sent more metadata is that in the past when investigating dependency confusion issues, in many cases it was not possible to identify the victims by their hostname, username and CWD. That is the reason this time I decided to send more data, but looking back this was wrong decision and I should have been more careful.
I accept the blame for it and apologize. At the same time I want to assure that it was not my intention to steal someone’s secrets. I already reported this vulnerability to Facebook on December 29 (almost three days before the announcement) after having verified that the vulnerability is indeed there. I also made numerous reports to other companies who were affected via their HackerOne programs. Had my intents been malicious, I would never have filled any bug bounty reports, and would have just sold the data to the highest bidder.
I once again apologize for causing any disruptions, I assure that all of the data I received has been deleted.
By the way in my bug report to Facebook I already offered to transfer the PyPi package to them, but so far I haven’t received any replies from them.”
Mitigations
The PyTorch team has renamed the ‘torchtriton’ dependency to ‘pytorch-triton’ and reserved a dummy package on PyPI to prevent similar attacks. The team is also seeking to claim ownership of the existing ‘torchtriton’ package on PyPI to defuse the current attack.
To uninstall the malicious dependency chain, users should run the following commands:
$ pip3 uninstall -y torch torchvision torchaudio torchtriton
$ pip3 cache purge
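After running the commands above, you can confirm that the counterfeit distribution is gone from the environment. A minimal sketch using the standard library’s `importlib.metadata` (Python 3.8+), which reads installed package metadata without importing the packages themselves; the helper name `installed_version` is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if absent.

    Reads package metadata only; the package itself is never imported.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("torchtriton", "torch"):
    found = installed_version(dist)
    print(f"{dist}: {found if found is not None else 'not installed'}")
```

If `torchtriton` still reports a version, the uninstall likely did not take effect in the interpreter you are checking (for example, a different virtualenv).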
Running the following command will check for the presence of the malicious binary and reveal whether you are affected:
python3 -c "import pathlib;import importlib.util;s=importlib.util.find_spec('triton');
affected=any(x.name == 'triton' for x in (pathlib.Path(s.submodule_search_locations[0]
if s is not None else '/' ) / 'runtime').glob('*'));
print('You are {}affected'.format('' if affected else 'not '))"
The SHA256 hash of the ‘triton’ ELF binary is: 2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e.
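The one-liner above and the published hash can be combined into a slightly more thorough check. A minimal sketch: like the one-liner, it uses `importlib.util.find_spec`, which locates a top-level package without importing (and therefore without running) it, then compares the SHA256 of `triton/runtime/triton` against the hash above. The helper names are hypothetical. A mismatch does not prove a file is safe, only that it is not this particular sample.

```python
from __future__ import annotations

import hashlib
import importlib.util
from pathlib import Path

# SHA256 of the malicious 'triton' ELF binary, as published above.
MALICIOUS_SHA256 = "2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e"


def find_triton_binary() -> Path | None:
    """Return <triton package dir>/runtime/triton if it exists, else None.

    find_spec on a top-level name locates the package without importing it.
    """
    spec = importlib.util.find_spec("triton")
    if spec is None or not spec.submodule_search_locations:
        return None
    candidate = Path(list(spec.submodule_search_locations)[0]) / "runtime" / "triton"
    return candidate if candidate.is_file() else None


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malicious(path: Path) -> bool:
    return sha256_of(path) == MALICIOUS_SHA256


if __name__ == "__main__":
    binary = find_triton_binary()
    if binary is None:
        print("No triton/runtime/triton binary found")
    elif is_known_malicious(binary):
        print(f"MALICIOUS binary found: {binary}")
    else:
        print(f"{binary} exists but does not match the published hash")
```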
Update Jan 1st, 11:13 AM ET: Added statement from the creator of the dependency confusion package received hours after publishing.
PyTorch discloses malicious dependency chain compromise over holidays