Efficient High Performance Protocols For Long Distance Big Data File Transfer
Large data sets (Big Data) are collected daily, and their volume is growing rapidly with the variety of use cases and the number of devices in use. Researchers require easy access to Big Data in order to analyze and process it. At some point this data may need to be transferred over the network to distant locations for further processing and analysis by researchers around the globe. Such transfers require data transfer protocols that ensure efficient and fast delivery on high-speed networks. Several new data transfer protocols, either TCP-based or UDP-based, have been introduced, and the literature contains some comparative studies of such protocols, but no side-by-side comparison of the protocols used in this work. I considered several data transfer protocols and congestion control mechanisms, namely GridFTP, FASP, QUIC, BBR, and LEDBAT, as candidates for comparison in various scenarios. These protocols aim to share the available bandwidth fairly among competing flows and to provide reduced packet loss, reduced latency, and fast delivery of data. In this thesis, I investigated the behaviour and performance of these data transfer protocols in various scenarios, which included transfers with various file sizes, multiple flows, and background and competing traffic. The results show that FASP and GridFTP had the best performance among all the protocols in most of the scenarios, especially for long-distance transfers with a large bandwidth-delay product (BDP). QUIC had the lowest performance because its current implementation limits the size of the transferred data and the bandwidth used. TCP BBR performed well in short-distance scenarios, but its performance degraded as the distance increased. The performance of LEDBAT was unpredictable, so a complete evaluation was not possible.
Comparing the performance of the protocols in the presence of background traffic and competing traffic showed that most of the protocols were fair, except for FASP, which was aggressive. Resource utilization was also measured for each protocol on the sender and receiver sides; QUIC and FASP had the highest CPU utilization.
network measurements, case study, data transfer protocols, congestion control
Master of Science (M.Sc.)