Modern software development practices focus heavily on improving efficiency and effectiveness through automation. Agile and DevOps strategies both stress the use of practices like continuous deployment and automated testing to ensure that every piece of code written is correct before it is accepted into the code repository, and that it remains correct throughout its lifecycle.
A crucial part of this is a good version control system that enables developers to manage a codebase in a single location and to synchronize and track edits to the code. While a few different version control systems exist, the most widely used is Git, and the best-known service for hosting Git repositories is GitHub.
GitHub is an extremely useful resource for managing a team of developers working on a single product. However, GitHub also has its dark side. The recent breach of the Canonical GitHub account and other research on sensitive data leaks reveal that the service can pose a serious threat to software and API security.
A Quick Intro to GitHub
GitHub is built on Git, which is an example of a distributed version control system (DVCS). In these systems, every developer working on a particular project has a copy of the code repository in addition to the master repository, which is stored elsewhere (in this case, on GitHub). This system allows developers to work on modifications to the code in parallel without having the code constantly being modified under them (like when multiple people work on the same Google Docs file).
Once a developer has completed and tested a set of modifications, they can either push their modifications to the master copy or submit a pull request (which allows someone in charge to review the modifications before accepting them). Git automatically merges their changes with those made by other developers when the changes are compatible, and asks the developer to choose which version to keep when they are not (like two different changes to the same line of code).
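The merge behavior described above can be illustrated with a minimal sketch. It assumes a heavily simplified model in which all three versions of a file have the same number of lines; real Git merges also handle inserted and deleted lines, so this is not Git's actual algorithm. The function name merge_lines and its return shape are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def merge_lines(
    base: List[str], ours: List[str], theirs: List[str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Simplified three-way merge of files with identical line counts.

    base   -- the common ancestor version
    ours   -- the version edited by one developer
    theirs -- the version edited by another developer

    Returns the merged lines (None where a conflict occurred) and the
    indices of the conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles files of equal length")

    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)      # both sides agree (or neither changed it)
        elif o == b:
            merged.append(t)      # only the other developer changed this line
        elif t == b:
            merged.append(o)      # only we changed this line
        else:
            merged.append(None)   # both changed it differently: a conflict
            conflicts.append(index)
    return merged, conflicts


if __name__ == "__main__":
    base = ["timeout = 10", "retries = 3", "debug = False"]
    ours = ["timeout = 30", "retries = 3", "debug = False"]
    theirs = ["timeout = 10", "retries = 5", "debug = False"]
    print(merge_lines(base, ours, theirs))
```

In this model, edits to different lines combine cleanly, while two different edits to the same line are reported back so that a person can decide which one to keep.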
GitHub offers both private and public repositories to its users. Private repositories allow an organization to maintain control of its intellectual property, while public repositories are good for open source code projects. However, these public repositories can become a serious security risk if improperly used or protected.
The Canonical Breach
Canonical is the organization behind Ubuntu, one of the most popular distributions of the Linux operating system. Since Ubuntu is open source, the source code of the operating system is hosted publicly on Canonical’s GitHub account.
On July 6, 2019, it was discovered that an attacker had gained unauthorized access to the Canonical GitHub account using compromised credentials. The attacker created several new, empty repositories and issues (requests for new features, fixes, etc.) within the account. However, there is no indication that the Ubuntu source code was modified.
While it is worrying that an attacker gained access to the GitHub account behind such a major piece of software, it appears that there was no malicious intent. The creation of the new issues and repositories made it very obvious that the attacker was there.
If the attacker had chosen to be stealthy, they could have modified the source code to create backdoors into the operating system, which has happened to other Linux distributions in the past (Linux Mint and Gentoo Linux). If such a modification occurred and went undetected, the attacker could potentially gain access to hundreds of millions of desktop computers and servers.
GitHub and API Security
The hack of the Canonical GitHub account also coincided with another GitHub-related attack. Cybersecurity firms have detected and reported mass scanning for dotfiles in GitHub repositories. These files contain configuration information for software stored in the code repositories and are often used to store credentials, cryptographic keys, etc. Ideally, files containing secrets like these should never be stored on GitHub’s servers. Git can be configured (through a .gitignore file) to ignore these files when syncing a developer’s local copy of the repository with the master copy stored in the cloud.
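As a rough illustration of how a team might check whether such files have slipped into a repository, the sketch below flags tracked paths whose file names match a short list of dotfiles that commonly hold credentials. The list in SENSITIVE_NAMES and the function name risky_paths are assumptions made for this example, not an exhaustive or official catalogue. The input is assumed to be a list of tracked paths, such as the output of the git ls-files command.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: a small, illustrative set of file names that often contain
# credentials. A real project would tailor this list to its own tooling.
SENSITIVE_NAMES = {
    ".env",
    ".netrc",
    ".npmrc",
    ".pypirc",
    ".git-credentials",
    ".htpasswd",
}


def risky_paths(tracked_paths: Iterable[str]) -> List[str]:
    """Return the tracked paths whose file name is in SENSITIVE_NAMES."""
    return [
        path
        for path in tracked_paths
        if PurePosixPath(path).name in SENSITIVE_NAMES
    ]


if __name__ == "__main__":
    # Hypothetical listing of files tracked in a repository.
    tracked = ["src/app.py", "config/.env", ".npmrc", "README.md"]
    for path in risky_paths(tracked):
        print(f"consider removing and ignoring: {path}")
```

Any path reported by a check like this is a candidate for an entry in .gitignore. If the file has already been pushed, the credentials it contains should be treated as exposed and replaced.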
However, research has demonstrated that users commonly fail to take advantage of this security feature. A study that examined only 13% of all of GitHub’s publicly accessible code repositories found a wealth of exposed sensitive data. Included in this were 201,642 unique cryptographic and API keys.
These keys were included in the code projects to provide automated access to the owner’s accounts with different organizations (Google, Amazon, Facebook, etc.) or to computers (in the case of SSH keys). However, anyone who has the key has access to the account or machine as well. The fact that these credentials were publicly exposed on GitHub represents a serious threat to software, network, and API security.
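To make the risk concrete, here is a minimal sketch of how exposed keys can be found by pattern matching, which is also how a team could scan its own code before pushing it. The two patterns are commonly cited heuristics (an AWS-style access key ID and the header line of a PEM private key block) and are assumptions of this example. They are not the method used in the study mentioned above, and they will miss many other key formats. The name find_secrets is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: two widely cited heuristic patterns, for illustration only.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for each suspicious line."""
    hits: List[Tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits


if __name__ == "__main__":
    # The key below is a documentation-style placeholder, not a real key.
    sample = (
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
        'print("hello")\n'
        "-----BEGIN RSA PRIVATE KEY-----\n"
    )
    for line_number, name in find_secrets(sample):
        print(f"line {line_number}: possible {name}")
```

The same simplicity works in an attacker's favor: anyone who can read a public repository can run a scan like this against it.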
Improving API Security
The Canonical GitHub account breach and the research regarding exposed credentials on GitHub demonstrate how the service can be exploited by attackers. An attacker with access to an organization’s GitHub account could modify the hosted API code to include backdoors or other vulnerabilities. Alternatively, public GitHub repositories could leak API keys that grant unauthorized access to an organization’s accounts with other services or to accounts accessed via the organization’s own API.
These threats demonstrate the importance of going the extra mile to secure access to an organization’s web API. Tools like a web application firewall (WAF) and runtime application self-protection (RASP) can help with identifying anomalous logins to the API or unusual behavior by the API software. When an attacker has authenticated access to a system (using breached credentials) or the software itself cannot be trusted (due to modification of the source code on a breached GitHub account), this kind of monitoring may be the only way to detect and prevent a potential attack.