"The attack surface is the vulnerability. Finding a bug there is just a detail."
-- Mark Dowd
"Some details are more important than others."
-- Fedor G. Pikus
A collection of my Semgrep rules to facilitate vulnerability research.
- https://hnsecurity.it/blog/semgrep-ruleset-for-c-c-vulnerability-research
- https://hnsecurity.it/blog/automating-binary-vulnerability-discovery-with-ghidra-and-semgrep
- https://hnsecurity.it/blog/big-update-to-my-semgrep-c-cpp-ruleset
- https://hnsecurity.it/blog/streamlining-vulnerability-research-with-the-idalib-rust-bindings-for-ida-9-2/
- Install Semgrep.
- Clone this GitHub repo.
- To use these rules, run:
# high priority scan (quick wins)
semgrep --severity ERROR --config semgrep-rules/rules /path/to/source
# high and medium priority scan (recommended)
semgrep --severity ERROR --severity WARNING --config semgrep-rules/rules /path/to/source
# full scan (might include marginal findings and more false positives)
semgrep --config semgrep-rules/rules /path/to/source
Note: Specify the --no-git-ignore switch to scan files regardless of git tracking status or .gitignore rules.
For a more streamlined experience, I recommend saving Semgrep scan output in SARIF format and using SARIF Explorer in VS Code:
semgrep --sarif --sarif-output=/path/to/source/SEMGREP.sarif --config semgrep-rules/rules /path/to/source
code /path/to/source # then open the SEMGREP.sarif file in VS Code with SARIF Explorer
See also the included SARIF output example.
- Tested with Semgrep CLI 1.142.0
- insecure-api-gets. Use of the insecure API function gets.
- insecure-api-strcpy-stpcpy-strcat. Use of potentially insecure API functions strcpy, stpcpy, strcat.
- insecure-api-sprintf-vsprintf. Use of potentially insecure API functions sprintf and vsprintf.
- insecure-api-scanf-etc. Use of potentially insecure API functions in the scanf family.
- incorrect-use-of-strncat. Wrong size argument passed to strncat.
- incorrect-use-of-strncpy-memcpy-etc. Wrong size argument passed to strncpy, memcpy, and variants.
- incorrect-use-of-sizeof. Accidental use of the sizeof operator on a pointer instead of its target.
- unterminated-string-strncpy-stpncpy. Lack of explicit NUL-termination after strncpy and stpncpy.
- off-by-one. Potential off-by-one error.
- pointer-subtraction. Potential use of pointer subtraction to determine size.
- unsafe-ret-snprintf-vsnprintf. Potentially unsafe use of the return value of snprintf and vsnprintf.
- unsafe-ret-strlcpy-strlcat. Potentially unsafe use of the return value of strlcpy and strlcat.
- write-into-stack-buffer. Direct write into buffer allocated on the stack.
- incorrect-unsigned-comparison. Checking if an unsigned variable is negative.
- unsafe-strlen. Casting the return value of strlen to short might be dangerous.
- integer-wraparound. Potential integer wraparound errors.
- integer-truncation. Potential integer truncation errors.
- signed-unsigned-conversion. Potential signed/unsigned conversion errors.
- format-string-bugs. Potential format string bugs.
- insecure-api-alloca. Use of the potentially insecure API function alloca.
- use-after-free. Potential use after free.
- double-free. Potential double free.
- incorrect-use-of-free. Calling free on memory not in the heap.
- unchecked-ret-malloc-calloc-realloc. Unchecked return code of malloc, calloc, realloc.
- ret-stack-address. Potential return of the address of a stack-allocated variable.
- putenv-stack-var. Call to putenv with a stack-allocated variable.
- memory-address-exposure. Potential exposure of underlying memory addresses.
- mismatched-memory-management. Potentially mismatched C memory management routines.
- mismatched-memory-management-cpp. Potentially mismatched C++ memory management routines.
- command-injection. Potential OS command injection via system or popen.
- insecure-api-access-stat-lstat. Use of insecure API functions access, stat, lstat.
- insecure-api-mktemp-tmpnam-tempnam. Use of insecure API functions mktemp, tmpnam, tempnam.
- insecure-api-signal. Use of insecure API function signal.
- incorrect-order-setuid-setgid-etc. Privilege management functions called in the wrong order.
- unchecked-ret-setuid-seteuid. Unchecked return code of setuid and seteuid.
- regex-dos. Regular expression that may exhibit exponential runtime and lead to ReDoS.
- incorrect-use-of-memset. Wrong order of arguments to memset.
- insecure-api-rand-srand. Use of potentially insecure API functions rand and srand.
- overlapping-src-dst. Source and destination overlap in copy functions.
- suspicious-assert. Potentially invalid size check due to use of assertion macros.
- interesting-api-calls. Calls to interesting and potentially insecure API functions.
- unchecked-ret-scanf-etc. Unchecked return code of functions in the scanf family.
- insecure-api-atoi-atol-atof. Use of potentially insecure API functions atoi, atol, atof.
- high-entropy-assignment. Assignment of a high-entropy value that might be a secret.
- argv-envp-access. Command-line argument or environment variable access.
- missing-default-in-switch. Missing default case in a switch statement.
- missing-break-in-switch. Missing break or equivalent in a switch statement.
- missing-return. Missing return statement in non-void function.
- typos. Potential typos with security implications.
- bad-words. Keywords and comments that suggest the presence of bugs.
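
To make a couple of the descriptions above more concrete, here is a minimal sketch in C of the kind of code that incorrect-use-of-sizeof and unterminated-string-strncpy-stpncpy describe, next to one possible corrected form. The function name copy_name is hypothetical and not part of this repo, and whether a given rule matches this exact snippet depends on the rule's patterns, so treat it as an illustration rather than a test case for the ruleset.

```c
#include <stddef.h>
#include <string.h>

/*
 * Problematic pattern (shown as a comment only):
 *
 *     void copy_name_buggy(char *dst, const char *src)
 *     {
 *         strncpy(dst, src, sizeof(dst));
 *     }
 *
 * sizeof(dst) is the size of a pointer, not of the destination buffer,
 * and strncpy() leaves dst without a terminating NUL when src is at
 * least as long as the count.
 */

/*
 * One possible corrected form (hypothetical helper): the caller passes
 * the real buffer size and the result is always NUL-terminated,
 * truncating src if it does not fit.
 */
void copy_name(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;                      /* no room even for the terminator */
    strncpy(dst, src, dst_size - 1); /* copy at most dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';        /* explicit NUL-termination */
}
```

A caller would typically pass sizeof on the array itself, for example copy_name(buf, sizeof(buf), input) where buf is declared as char buf[64] in the same scope.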
- Extensive testing of the current rules compared to v1.0.0, then release of v1.1.0 (ongoing).
- Additional --time benchmarking against real-world code to spot slow rules in need of optimization.
- Improve overall accuracy and reduce false positives, without missing potential hot spots in code.
- Add new checks to the existing rules and add new rules where needed.
- Add scripts to clean up pseudocode generated by common decompilers to improve Semgrep parsing.
- Implement dedicated kernel rules (Linux, BSD, macOS, etc.).
- Implement dedicated C++ rules and move them into a folder separate from the one for C rules.
- Port the rules to the Semgrep pro engine, which allows for inter-file and inter-function analysis.
- Implement taint mode where suitable to improve rules with dataflow analysis.
- Investigate symbolic propagation that might be useful to reduce some false positives.
- Implement a Semgrep wrapper and post-processor as described in this research.
