Mass Running Vulnerability Scan Tools on Github Repositories
vuln scanning github repos
I decided to run code vulnerability scanning tools en masse on all of the PHP github repositories that have a certain number of stars/followers. It helped that I already had a list of github repositories to start from; without it, this would have been a major limiting factor for the project, since it would be difficult to scrape a list of PHP repositories from github.
I sorted the repositories by number of stars so the most interesting ones got scanned first, and I scanned the top 30k. I had intended to scan more, but I ended up aborting the scans early because snyk contacted me and complained that I was overloading their servers.
why i did it
I did not expect the tools to point me right to bugs immediately. It is more that the outputs of the tools serve as an entry point for browsing potential bugs.
browsing the outputs is fun
I created some simple terminal tools to browse the script outputs, clone repositories, and investigate anything interesting. I used fzf with a custom preview program to create interactive text user interfaces. When I select a repository, the scripts automatically clone it and pull up the report for that repository in vim.
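A minimal sketch of that kind of fzf wrapper (not my exact scripts) could look like the following. It assumes the combined scan output has already been split into one report file per repository under a reports/ directory, with filenames like user__project; the layout and names are just placeholders.

#!/bin/bash
# rough sketch of an fzf-based report browser; assumes one report file per
# repository under ./reports/ named user__project (hypothetical layout)
repo=$(ls reports | fzf --preview 'cat reports/{}') || exit 0

# rebuild a clone URL from the report filename
url="https://github.com/${repo//__//}"

# clone the repo if it isn't already there, then open its report next to the code
mkdir -p clones
[ -d "clones/$repo" ] || git clone "$url" "clones/$repo"
cd "clones/$repo" && vim "../../reports/$repo"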
When I’m busy watching my kids or waiting for something, I can pull out a laptop and just poke around at random sections of other people’s code and write down anything interesting. It is kind of an engaging hobby. It requires some amount of focus to actually try out any bugs, but it is very easy to browse around while other things are happening. I have been keeping track of a list of bugs that I have found. So far, I have only found a few things worth reporting or investigating in some small repositories. I will talk a little more about what kinds of bugs I have found later in this post.
analyzing quantities of repositories
- checking the data to see how many repos there are with a given number of stars.
- there are 780 repositories that have over a thousand stars!
$ zcat repos-with-interest.gz | awk '$4 == "PHP" && $2 > 1000' | wc -l
780
- figuring out how many are reasonable to scan
- i intended to scan every repository with more than 5 stars.
$ zcat repos-with-interest.gz | awk '$4 == "PHP" && $2 > 10' | wc -l
41335
$ zcat repos-with-interest.gz | awk '$4 == "PHP" && $2 > 5' | wc -l
68411
procedure for scanning
- start a swarm of docker containers. start new ones once they quit so there’s a constant number. can use docker-compose or a simple bash script for this (see the sketch after this list).
- entrypoint script connects to rabbitmq to get a request.
- in retrospect, i would not use rabbitmq. i would just make a plain queue API, because rabbitmq might consider requests as failed if they take too long. i have since created a queue API to use for this in the future.
- the results of all scans and tools get printed to standard output within the docker
- docker runs a few requests from rabbitmq and then quits
- standard output of docker is uploaded as a file to AWS. docker container ID is used as a unique ID for this filename.
- now i’ve got all the data!
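The swarm can be as simple as a bash loop per worker slot. The sketch below is an illustration rather than my exact setup: the image name scan-worker, the slot count, and the S3 bucket scan-output are all made up, and it assumes the aws CLI is configured.

#!/bin/bash
# rough sketch of the container swarm: keep a fixed number of worker containers
# running, uploading each finished container's stdout to S3 under its container ID
# (image name, slot count, and bucket name are placeholders)
SLOTS=20

run_slot() {
    while true; do
        id=$(docker run -d -e RABBIT="$RABBIT" -e SNYK_API_KEY="$SNYK_API_KEY" scan-worker)
        docker wait "$id" >/dev/null        # block until the worker exits
        docker logs "$id" | aws s3 cp - "s3://scan-output/$id"
        docker rm "$id" >/dev/null
    done
}

for ((i = 0; i < SLOTS; i++)); do
    run_slot &
done
wait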
what tools i decided to run
- snyk
- this code scanner is very good.
- it has a few false positives, but it can also analyze and explain the origin of vulnerable user data
- php code sniffer with security audit vulnerability scanning extensions
- ended up not using this output from phpcs very much. it duplicates the snyk output, doesn’t ignore tests, and returns more false positives.
building a docker for the scan worker
- dockerfile to build the docker
# kali-rolling has git, php-codesniffer, amqp-tools, and npm in its package repos
FROM kalilinux/kali-rolling
WORKDIR /secaudit

RUN apt-get update && apt-get install -y \
    git \
    php-codesniffer \
    amqp-tools \
    npm

# the snyk CLI is installed through npm
RUN npm install snyk -g

# extra security audit rules for phpcs
RUN git clone https://github.com/FloeDesignTechnologies/phpcs-security-audit

ADD entrypoint .
ADD scan-worker .
ENTRYPOINT [ "/secaudit/entrypoint" ]
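Building the image and test-driving a single worker by hand looks something like this; the image tag and the rabbitmq host are placeholders.

# build the worker image and run one container manually to test it
docker build -t scan-worker .
docker run --rm -e RABBIT="rabbitmq.example.internal" -e SNYK_API_KEY="$SNYK_API_KEY" scan-worker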
- entrypoint
- needs to configure environment for snyk to find its api key first.
- runs CLI script to connect to rabbitmq. limit to 5 URLs to scan before stopping docker.
#!/bin/bash
# write the snyk API key from the SNYK_API_KEY env var into snyk's config file
if [ ! -z "$SNYK_API_KEY" ]; then
    mkdir -p "$HOME/.config/configstore"
    printf "{\n\"api\":\"$SNYK_API_KEY\"\n}\n" > "$HOME/.config/configstore/snyk.json"
fi
# consume up to 5 scan requests from the rabbitmq queue, then exit
amqp-consume -s "$RABBIT" -q githubscan -c 5 /secaudit/scan-worker
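The queue itself can be seeded from the same repository list used earlier with amqp-publish (also part of amqp-tools). This is only a sketch: it assumes the clone URL is in the first column of repos-with-interest.gz, which may not match the real column layout.

# rough sketch of filling the githubscan queue; the awk field for the URL is a guess
zcat repos-with-interest.gz |
    awk '$4 == "PHP" && $2 > 5 { print $1 }' |
    while read -r url; do
        amqp-publish -s "$RABBIT" -r githubscan -b "$url"
    done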
- scan-worker
- script to process each github URL from RabbitMQ
- read URL from stdin - this is how amqp-consume passes messages
- clone repo
- run phpcs and snyk
- separate the outputs with standard formatting that can be easily parsed with AWK
#!/bin/bash
# the URL arrives on stdin - this is how amqp-consume hands over the message body
read url
export url
# the >>>>>>>>>> markers delimit sections so the combined output is easy to split with awk
echo ">>>>>>>>>> REPO $url"

# clone into an empty directory; the only entry in /repo is then the fresh checkout
mkdir /repo
cd /repo
git clone "$url" >/dev/null
dirname="$(ls)"

# run phpcs with the security audit extension; run from the phpcs-security-audit
# checkout so phpcs can find the Security standard
echo ">>>>>>>>>> phpcs"
cd /secaudit/phpcs-security-audit
timeout 600 phpcs --extensions=php,inc,lib,module,info --standard=Security "/repo/$dirname"

# run snyk: "snyk code test" scans the code itself, "snyk test" checks the
# dependencies and needs a composer.lock to read
echo ">>>>>>>>>> snyk"
cd "/repo/$dirname"
timeout 600 snyk code test
if [ -e "composer.lock" ]; then
    timeout 600 snyk test
fi

# cleanup
rm -rf /repo
echo ">>>>>>>>>> END $url"
exit 0
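Since every section is bracketed by those >>>>>>>>>> markers, pulling a single repository's report back out of the combined output is easy with awk. A sketch, assuming the uploaded files have been downloaded into a scan-output/ directory and using a placeholder URL:

# print everything between the REPO and END markers for one repository
awk -v url="https://github.com/example/project" \
    '$2 == "REPO" && $3 == url { on = 1 }
     on
     $2 == "END" && $3 == url { on = 0 }' scan-output/*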
what kinds of bugs have been found so far
- there are a lot of repositories that include vulnerable dependencies in a /vendor directory. if these are included in the final webroot, and they can be accessed directly, this is a serious vulnerability
- for example, a testing script that evaluates standard input as code, which would allow you to post PHP code and run it on the server
- there are a lot of things that almost look like LFI vulnerabilities, but there is some string appended before or after the include that prevents you from exploiting php filters for code execution
- in some of these cases though, the include functionality might still be abused to include admin functionality scripts. it depends on how the admin scripts are included and verified: if they only guard against direct inclusion, you might still be able to get at some protected functionality.
- there are a lot of places where zip files are read which could be vulnerable to zip slips
- there are still a few smaller web apps with no SQLi prevention on the login page: a pure authentication bypass via SQL injection. not very many though. any app with a large enough following should have these issues fixed already.