[PhD] - Detection of Vulnerabilities and Automatic Protection for Web Applications
Sep 2015 - Sep 2016
In less than three decades of existence, the Web has evolved from a platform for accessing hypermedia into a framework for running complex web applications. These applications appear in many forms, from small home-made ones to large-scale commercial services such as Gmail, Office 365, and Facebook. Although significant research effort on web application security has been ongoing for a while, these applications have been a major source of problems and their security continues to be challenged. An important part of the problem derives from vulnerable source code, often written in unsafe languages like PHP and programmed by people without appropriate knowledge of secure coding, who leave flaws in the applications. Today the most exploited vulnerability category is input validation, which is directly related to the user inputs entered in web application forms.
The thesis proposes methodologies and tools for the detection of input validation vulnerabilities in source code and for the protection of web applications written in PHP, using source code static analysis, machine learning, and runtime protection techniques.
An approach based on source code static analysis is used to identify vulnerabilities in applications written in PHP. User inputs are tracked with taint analysis to determine whether they reach a PHP function that can be exploited. Machine learning is then applied to determine whether the identified flaws are actual vulnerabilities. If they are, the results of the static analysis are used to remove the flaws, correcting the source code automatically and thus protecting the web application.
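The taint-tracking step described above can be illustrated with a minimal sketch. It assumes a toy straight-line program model rather than real PHP parsing; the `Assign` and `Call` types, the function `find_candidate_flaws`, and the short lists of entry points, sinks, and sanitizers are all hypothetical choices for illustration, not the thesis's actual tool or configuration.

```python
from dataclasses import dataclass

# Illustrative lists only; a real analyzer would use far more complete ones.
ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}   # PHP superglobals holding user input
SENSITIVE_SINKS = {"mysqli_query", "echo", "include"}
SANITIZERS = {"mysqli_real_escape_string", "htmlentities"}


@dataclass(frozen=True)
class Assign:
    """target = via(sources...); `via` is empty for a plain copy or concatenation."""
    target: str
    sources: tuple
    via: str = ""


@dataclass(frozen=True)
class Call:
    """A call to `function` with variable arguments `args` at source line `line`."""
    function: str
    args: tuple
    line: int


def find_candidate_flaws(statements):
    """Propagate taint forward and report tainted arguments reaching a sink."""
    tainted = set(ENTRY_POINTS)
    findings = []
    for stmt in statements:
        if isinstance(stmt, Assign):
            carries_taint = any(src in tainted for src in stmt.sources)
            if carries_taint and stmt.via not in SANITIZERS:
                tainted.add(stmt.target)
            else:
                # Sanitized or untainted value: the target is considered clean.
                tainted.discard(stmt.target)
        elif isinstance(stmt, Call) and stmt.function in SENSITIVE_SINKS:
            bad_args = [arg for arg in stmt.args if arg in tainted]
            if bad_args:
                findings.append((stmt.line, stmt.function, bad_args))
    return findings


program = [
    Assign("$id", ("$_GET",)),
    Assign("$q", ("$id",)),
    Call("mysqli_query", ("$q",), line=3),
    Assign("$safe", ("$id",), via="mysqli_real_escape_string"),
    Call("mysqli_query", ("$safe",), line=5),
]
print(find_candidate_flaws(program))
```

In this sketch only the call at line 3 is reported, because the value reaching the call at line 5 has passed through a sanitizer. Each finding is a candidate flaw; deciding whether it is a real vulnerability is the job of the machine learning step.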
A new technique for source code static analysis is proposed that automatically learns about vulnerabilities and then detects them. Machine learning applied to natural language processing is used first to learn characteristics of flaws in the source code, classifying the code as vulnerable or not, and then to discover and identify the vulnerabilities.
A runtime protection technique is also proposed to flag and block injection attacks against databases. The technique is implemented inside the database management system to improve the effectiveness of attack detection, avoiding a semantic mismatch. Source code identifiers are employed so that, when an attack is flagged, the vulnerability is localized in the source code.
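The idea of tying each query to a source code identifier can be sketched as follows. The abstract does not describe the detection mechanism, so this example assumes one plausible approach: learn the structure of the queries issued from each call site, then flag any later query from that site whose structure differs. The names `skeleton` and `QueryGuard` and the identifier format `login.php:42` are hypothetical. The sketch also tokenizes query text itself, whereas the proposed technique runs inside the database management system, so it does not reproduce the benefit of avoiding the semantic mismatch.

```python
import re

# Simplified SQL tokenizer, for illustration only.
_TOKEN = re.compile(r"""
    '(?:[^']|'')*'      # single-quoted string literal ('' is an escaped quote)
  | \d+(?:\.\d+)?       # numeric literal
  | \w+                 # keyword or identifier
  | \S                  # any other single character (operator, punctuation)
""", re.VERBOSE)


def skeleton(query):
    """Return the query's structure with literal values abstracted away."""
    shape = []
    for tok in _TOKEN.findall(query):
        if len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'":
            shape.append("STR")
        elif tok[0].isdigit():
            shape.append("NUM")
        else:
            shape.append(tok.upper())
    return tuple(shape)


class QueryGuard:
    """Maps a source code identifier to the query structures seen in training."""

    def __init__(self):
        self.models = {}

    def learn(self, site_id, query):
        self.models.setdefault(site_id, set()).add(skeleton(query))

    def is_allowed(self, site_id, query):
        # Unknown call sites and unseen structures are both rejected.
        return skeleton(query) in self.models.get(site_id, set())


guard = QueryGuard()
guard.learn("login.php:42", "SELECT * FROM users WHERE name = 'alice'")

benign = "SELECT * FROM users WHERE name = 'bob'"
attack = "SELECT * FROM users WHERE name = '' OR '1'='1'"
print(guard.is_allowed("login.php:42", benign))
print(guard.is_allowed("login.php:42", attack))
```

Because every rejected query carries its call-site identifier, a flagged attack points directly at the place in the application where the vulnerable query is built.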
Overall, this work allowed the identification of about 1200 vulnerabilities in open source web applications available on the Internet, 560 of which were previously unknown. The unknown vulnerabilities were reported to the corresponding software developers and most of them have already been removed.