Security of External Dependencies in CI/CD Workflows

Use of external dependencies in build processes brings common security risks related to code integrity. But have you considered all of them?

Jakub Kramarz 2024.10.09   –   6 MIN read

A decade-old debate on “curl | sudo bash”

The question of whether it is safe to use such “pipe installers” – where a script is downloaded and directly executed – has been a contentious topic for over a decade. This practice has long been popular due to its simplicity (allowing developers to install tools with a single command) and portability (at least between Linux distributions). However, its security implications make it controversial, especially in the context of Continuous Integration/Continuous Deployment (CI/CD) pipelines.

Yet, despite the security risks, this technique remains common and comes in many unexpected flavors. This article explores the dangers of dependency poisoning in this and similar practices, and hints at possible ways to securely manage external dependencies in automated workflows.

The red flags of external dependencies

As discussed by Mark Stemm of Sysdig in “Friends Don’t Let Friends Curl | Bash” and Jordan Eldredge in “One way ‘curl pipe sh’ install scripts can be dangerous”, this method can expose systems to malicious code and man-in-the-middle (MITM) attacks due to a complete lack of verification steps and visibility into what the script actually does.

Without verification mechanisms like checksums or digital signatures, there is no way to ensure that the script you download is exactly the same as in the previous build and does what the developer intended. Furthermore, if the script is not saved anywhere, it becomes hard to track which actions were performed once it has run, and therefore hard to audit or revert unintended changes. Malicious actions may go unnoticed until it’s too late. Despite these risks, curl | bash remains widely used due to its convenience. However, in CI/CD pipelines, where reproducible builds are paramount and security needs to be airtight, this practice is especially ill-advised.
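
One way to address both problems is to save the script to disk, compare it against a checksum recorded when the script was last reviewed, and only then execute it. The sketch below is a minimal illustration, assuming the sha256sum tool from GNU coreutils is available; the helper name verify_sha256, the example.com URL and the pinned checksum are hypothetical placeholders, not values from any real project.

```shell
#!/bin/sh

# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper)
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX
# (lowercase hexadecimal, as printed by sha256sum).
verify_sha256() {
  file="$1"
  expected="$2"
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}

# Usage sketch (placeholder URL and checksum, shown as comments only):
#   curl -fsSL -o install.sh https://example.com/install.sh
#   verify_sha256 install.sh "<sha256 recorded at review time>" || exit 1
#   sh install.sh
```

Because the script is kept on disk, it can also be archived with the build artifacts, which leaves a record of what was actually executed.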

Dangerous comfort of seemingly unchanging Git tags

A common alternative to curl | bash is downloading tools directly from Git repositories. While this might feel more secure, it introduces its own set of problems, particularly when specifying a version of a tool or dependency. As explained in the Git documentation on re-tagging, Git tags can be redefined or “mutated” after they’ve been pushed. The tag you’re referencing in your CI/CD pipeline today could therefore point to an entirely different commit tomorrow, opening the door to potential attacks: an attacker with access to the repository can introduce malicious changes while maintaining the appearance of stability and trust. Without proper verification mechanisms, pipelines relying on a tag would be vulnerable to such changes. This doesn’t necessarily require malicious intent from the repository owner; perhaps they were simply an easier target for attackers than your organization.

Moreover, downloading a tool directly from a Git repository doesn’t guarantee its integrity. Object GPG signatures – basically the only built-in Git verification that would make this attack harder – are an excellent additional security measure, but they are not widely used.
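
A pipeline can at least detect a moved tag by recording the full commit hash the tag pointed to when it was reviewed and comparing the two on every build. The following is a minimal sketch of that check, assuming the repository has already been cloned locally; the helper name tag_points_to and its arguments are hypothetical. Where the upstream project signs its tags, git verify-tag can be run in addition.

```shell
#!/bin/sh

# tag_points_to REPO_DIR TAG EXPECTED_COMMIT   (hypothetical helper)
# Succeeds only if TAG currently resolves to EXPECTED_COMMIT (a full hash).
tag_points_to() {
  repo="$1"
  tag="$2"
  expected="$3"
  # ^{commit} peels an annotated tag down to the commit it references.
  actual="$(git -C "$repo" rev-parse --verify --quiet "refs/tags/${tag}^{commit}")" || {
    echo "tag $tag not found in $repo" >&2
    return 1
  }
  if [ "$actual" != "$expected" ]; then
    echo "tag $tag moved: expected $expected, now $actual" >&2
    return 1
  fi
}

# Usage sketch (placeholder values, shown as comments only):
#   tag_points_to ./some-tool v1.2.3 "<full commit hash recorded at review time>" || exit 1
```

This does not prevent re-tagging upstream, but it turns a silent change into a failed build.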

Abbreviated commit hashes and chosen-prefix attacks

For CI/CD pipelines, fetching a tool straight from a Git repository means trusting that the code in that repository is always safe. To counter the risk of redefined tags or changed branches, and thereby ensure a file’s integrity, developers often turn to specifying the commit hash. However, even this approach is not foolproof.

Hash collisions and the weakness of SHA-1

Cryptographic attacks such as hash collisions exploit the ability to create two different inputs that produce the same hash. In the case of SHA-1, vulnerabilities are well-documented. The SHAttered attack, unveiled in 2017, demonstrated a full collision on SHA-1, proving the algorithm’s fragility. SHA-1 has been deprecated for secure uses, such as SSL certificates, for nearly a decade due to these weaknesses. Computing a full SHA-1 collision now requires just about an hour of supercomputer time, making it increasingly feasible for malicious actors.

If only a partial collision is needed, that is, a hash that merely starts with a fixed, chosen prefix, the perceived security of hash-based verification can be undermined with an ordinary laptop in about an hour.
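
The arithmetic behind this is straightforward: each hexadecimal character of a hash carries 4 bits, so a random hash matches a chosen n-character prefix with probability 1 in 16^n, and about 16^n attempts are needed on average. The sketch below only prints that figure for a few prefix lengths; the function name expected_attempts is hypothetical and serves purely as an illustration of the calculation.

```shell
#!/bin/sh

# expected_attempts LENGTH   (hypothetical helper)
# Prints 16^LENGTH, the average number of hashes to try before a random
# hash starts with a chosen LENGTH-character hexadecimal prefix.
expected_attempts() {
  remaining="$1"
  result=1
  while [ "$remaining" -gt 0 ]; do
    result=$(( result * 16 ))
    remaining=$(( remaining - 1 ))
  done
  echo "$result"
}

for len in 6 7 8; do
  printf '%s hex characters: about %s attempts\n' "$len" "$(expected_attempts "$len")"
done
```

An 8-character prefix therefore needs roughly 4.3 billion attempts, whereas a full 40-character SHA-1 hash corresponds to 2^160 possible values.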

The case of abbreviated commit hashes

Since childhood, we’ve been taught that using long, hardcoded magic constants in code is a bad practice — a classic code smell. So, when you encounter a 40-character Git commit hash in a deployment script, it’s tempting to think that a shorter, 8-character prefix would look much cleaner. While this may seem like a secure and efficient way to reference a specific version of a tool, it actually introduces serious risks.

During one of our recent assignments, we encountered code very similar to the following in one of the CI/CD scripts:

wget https://raw.githubusercontent.com/drwetter/testssl.sh/bdeda3/testssl.sh
chmod +x testssl.sh
./testssl.sh

One might assume that this URL not only references a specific commit but also explicitly indicates that it should come from the drwetter/testssl.sh repository and there’s no chance that Dirk Wetter will do something malicious.

Let’s now discuss a different version of the same file: raw.githubusercontent.com/drwetter/testssl.sh/c1857ee/testssl.sh

Can’t we say exactly the same thing about it? Actually, no: this version was never a part of the repository, it is not now, and – considering that it comes from a rejected pull request – never will be: c1857ee7753a42da2b3e6abe8cf81c28c977d8b0

Because of the way GitHub’s fork network and pull requests function, a malicious commit may become visible in the context of the original repository after being pushed to a forked repository.

What will happen to this distribution URL if two commits with the same hash prefix clash, making the reference from the script ambiguous? This is quite common; Git even has built-in tools for dealing with such cases. Which version will be available? Maybe the first committed? Will it be defined by the self-declared commit time? Or specifically the one from the original repository? Why not the first accessed? What about the last one committed? The behavior is actually undocumented, but fortunately, GitHub fails safely by serving none of them.

The client’s system seemed stable until a chosen-prefix attack was successfully executed on the SHA-1 hash using one of the common “vanity commit hash” tools, making the abbreviated hash ambiguous. From that point on the download simply fails, so one can only hope that your CI/CD script does not silently skip an important verification step because of this. Had the CI/CD pipeline downloaded a compromised file instead, it would have effectively broken the integrity of the build process.
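
A pipeline can guard against both outcomes by refusing abbreviated references outright and by stopping as soon as a download fails. The following is a minimal sketch under those assumptions; the helper name require_full_sha1 is hypothetical, and the 40-character check applies to SHA-1 object names only (a SHA-256 repository would use 64 characters).

```shell
#!/bin/sh
set -e   # abort the job on the first failing command, including a failed download

# require_full_sha1 REF   (hypothetical helper)
# Succeeds only if REF is a full 40-character hexadecimal SHA-1 hash.
require_full_sha1() {
  ref="$1"
  case "$ref" in
    *[!0-9a-f]*)
      echo "refusing malformed commit reference: $ref" >&2
      return 1
      ;;
  esac
  if [ "${#ref}" -ne 40 ]; then
    echo "refusing abbreviated commit reference: $ref" >&2
    return 1
  fi
}

# Usage sketch (commented out; COMMIT is a placeholder for a hash you have reviewed):
#   COMMIT="<full 40-character commit hash>"
#   require_full_sha1 "$COMMIT"
#   wget "https://raw.githubusercontent.com/drwetter/testssl.sh/${COMMIT}/testssl.sh"
```

A full hash still has to come from a commit that someone has reviewed: as shown above, the URL alone does not prove that the commit belongs to the upstream repository.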

Git’s transition to SHA-256: a secure future or cryptographic dystopia?

Since 2017, Git has been working on migrating from SHA-1 to a stronger hash function, with SHA-256 chosen as its cryptographically secure successor.

As a thought-provoking conclusion, consider this: Bitcoin, the world’s largest decentralized financial network, bases its proof of work on the computational difficulty of finding an input whose SHA-256-based hash falls below a target value, which is essentially a search for a hash that starts with a long run of zeros. Yet, there is now a global network of Bitcoin miners computing an enormous number of SHA-256 hashes every second in order to solve these puzzles. While such an attack may be economically infeasible for most players, is it possible that Satoshi Nakamoto’s creation has inadvertently made a lot of specialized, efficient hardware available that could be repurposed for attacks on SHA-256?

Why is CI/CD the best target for attackers?

At their core, build pipelines are intricate systems designed to handle a wide range of tasks with wide network access. They download source code from both public and private repositories, integrate it with sensitive credentials and secrets, run arbitrary code, and ultimately deploy it into business-critical environments. This process opens up potential vulnerabilities, particularly through software dependencies that are outside the company’s direct control. 

Such dependencies can provide an easy (or at least unexpected) pathway to bypass security measures, as evidenced by incidents like the xz-utils backdoor outbreak in 2024 and the SolarWinds supply chain attack in 2020.

Continuous integration systems face similar threats, with breaches becoming almost routine. For example, CircleCI leaked secrets in 2023, GitLab dealt with critical vulnerabilities in 2024, and Codecov was compromised in 2021. Any breach or manipulation of these systems offers valuable opportunities for malicious actors.

A novel approach to dependency confusion was explored by Bar Lanyado of Lasso Security in “Diving Deeper into AI Package Hallucinations”. He targeted developers directly by publishing (originally non-existent) software dependencies that were widely recommended by generative AI models in code samples.

In light of these trends, it seems increasingly apparent that the system is skewed in the attacker’s favor.

Secure practices for managing external dependencies

In modern software engineering and computing, using externally maintained software is unavoidable.

https://xkcd.com/2347/

However, by addressing key areas, you can greatly enhance the security of your CI/CD pipelines and ensure that external dependencies don’t suddenly become weak links in your development lifecycle:

  • Avoid pipe installers, or at least verify the source and content of any downloaded script before executing it.
  • Identify possible threats and attack methods: gather distributed knowledge, ask your team to identify tentative processes and find the most important dangers and most effective defense methods. You all know your systems, right?
  • Mirror and verify your dependencies: ensure that libraries you’re using are the same as the ones you intend to use. At the same time, make sure you will be able to access the specific version used in the build in the future.
  • Verify code integrity: use digital signatures to verify that external tools and dependencies haven’t been tampered with. If you must download “a file from the Internet” as a part of your build process, at least check that its checksum hasn’t changed (and ensure that the build fails immediately if it has).
  • Be wary of Git tags and abbreviated hashes: don’t assume that tags are immutable. Consider using signed tags or full commit hashes for more reliable version pinning.
  • Monitor for vulnerabilities: regularly audit dependencies and use safe, trusted artifact repositories whenever possible.
  • Educate your team: make developers (who are often playful by nature) aware of novel attack vectors to help them predict and prevent future ones.

Jakub Kramarz, Senior IT Security Consultant