CVE VULNERABILITY CLASSIFICATION IN SOURCE CODE BASED ON TOKEN ANALYSIS AND LSTM NETWORKS
DOI: https://doi.org/10.56651/lqdtu.jst.v13.n02.928.ict
Keywords: Tokens, source code, deep learning, vulnerability detection, natural language processing, PHP source code
Abstract
As web applications become increasingly widespread, source code security is growing in importance. Exposed vulnerabilities pose serious risks to both service providers and customers. Various models have been proposed to address this issue; however, most approaches rely on complex graph structures generated from source code or on expert-driven regular expression patterns. This paper introduces a model that combines token-based mechanisms with deep learning techniques for efficient vulnerability detection in PHP (Hypertext Preprocessor) web applications. Building on the PHP tokenization process, we develop a custom token scheme that merges tokens, supports key PHP features, and optimizes parsing. Using datasets such as the Software Assurance Reference Dataset (SARD) and SQL Injection Labs (SQLI-LABS), we show that a deep learning model trained on the enhanced tokens can effectively detect vulnerabilities in source code.
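The idea of merging tokens before feeding them to a sequence model can be illustrated with a minimal sketch. It is not the paper's implementation: the merge rules, the labels `USER_INPUT`, `VAR`, `STR`, and `ID:`, the `CHAR` marker for single-character tokens, and the helper names `merge_tokens` and `encode` are all assumptions made for illustration. The input is a hand-written list of `(token_name, text)` pairs shaped like the output of PHP's `token_get_all()`.

```python
# Illustrative sketch only: merge rules and label names are assumptions,
# not the custom token scheme described in the paper.

SUPERGLOBALS = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST"}
SKIPPED = {"T_WHITESPACE", "T_COMMENT", "T_OPEN_TAG"}


def merge_tokens(tokens):
    """Collapse raw PHP tokens into a smaller, hypothetical custom vocabulary."""
    merged = []
    for name, text in tokens:
        if name in SKIPPED:
            continue  # drop tokens that carry no security-relevant signal
        if name == "T_VARIABLE":
            # Distinguish attacker-controlled superglobals from ordinary variables.
            merged.append("USER_INPUT" if text in SUPERGLOBALS else "VAR")
        elif name == "T_CONSTANT_ENCAPSED_STRING":
            merged.append("STR")  # all string literals share one token
        elif name == "T_STRING":
            merged.append("ID:" + text.lower())  # keep identifier names
        elif name == "CHAR":
            merged.append(text)  # single-character tokens such as '=' or ';'
        else:
            merged.append(name)
    return merged


def encode(sequence, vocab, max_len):
    """Map tokens to integer ids (0 is reserved for padding) and pad/truncate."""
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in sequence]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


# Hand-written tokens for the PHP statement: $id = $_GET['id'];
sample = [
    ("T_OPEN_TAG", "<?php "),
    ("T_VARIABLE", "$id"),
    ("T_WHITESPACE", " "),
    ("CHAR", "="),
    ("T_WHITESPACE", " "),
    ("T_VARIABLE", "$_GET"),
    ("CHAR", "["),
    ("T_CONSTANT_ENCAPSED_STRING", "'id'"),
    ("CHAR", "]"),
    ("CHAR", ";"),
]

merged = merge_tokens(sample)
vocab = {}
encoded = encode(merged, vocab, max_len=10)
print(merged)
print(encoded)
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer followed by an LSTM classifier; the architecture and hyperparameters actually used in the paper are not stated in the abstract and are not assumed here.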