
Predicting content manipulations by open web proxies








Degree Level



The need for anonymity and privacy has given rise to open web proxies, which act as gateways relaying traffic between web servers and their clients and allow users to reach content that would otherwise be inaccessible. As the open web proxy ecosystem continues to grow, an increasing number of studies document the extent of content alteration on the Internet. The alterations applied by proxies include both benign and malicious modifications, such as adding crypto-mining scripts or injecting ads. While some modifications, such as ad injection, can be prevented with blocker tools, scripts added to JavaScript files cannot be detected by any antivirus or blocker tool. The widespread use of proxies and their malicious behaviour motivated us to study the feasibility of predicting these manipulations, so that a proxy for daily use can be chosen carefully. Whereas previous studies focused on detecting and analysing content manipulation by proxies, we present a novel approach for predicting the types of content alteration that open proxies might silently apply. The approach also allows us to predict whether an open proxy will inject any extra file. These predictions are made without first fetching the data through a proxy. The dataset used in this work was created by collecting the content of 1028 domains fetched through 1293 proxies. Through a detailed analysis of the manipulations in the collected content, we derive 13 types of content modification, which are then used to form our dataset for prediction analysis. This research allows us to accurately predict proxy behaviour for a particular website, making it possible to distinguish malicious from benign proxies and to select a proxy cautiously. The type of content modification was predicted with 92% accuracy, and the injection of extra files with 99% accuracy.
Our study also reveals that the majority of proxies manipulate website content based on technical information about the website and its web server.



Proxy, Machine learning, modification, manipulation, injection, open proxy, Web



Master of Science (M.Sc.)


Computer Science



