EX1. Number of lines in large files
EX2. Top ten arrival airports by passenger number
EX3. Plot the monthly number of searches for selected airports
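
For EX1, a minimal sketch of a line count that never holds the whole file in memory, using only the Python standard library. The function name count_lines and the block size are illustrative choices, not taken from the original solution.

```python
def count_lines(path, block_size=1024 * 1024):
    """Count newline characters in a file by reading fixed-size binary blocks.

    Note: a last line that does not end with a newline is not counted.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file reached
                break
            count += block.count(b"\n")
    return count
```

Reading in binary mode avoids decoding the text, which is usually the slow part when only the number of lines is needed.
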
The solutions are given in Python, written after some internet searching on how to create an HDF5 store and use pandas.
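
As an illustration for EX2, here is a hedged sketch of one way to keep memory use bounded with pandas. It reads the CSV in chunks rather than going through an HDF5 store, so it is a different technique from the one used in the actual solution. The column names arr_port and pax, the separator, and the function name are assumptions made for the example.

```python
import pandas as pd


def top_arrival_airports(source, n=10, chunksize=100_000,
                         airport_col="arr_port", pax_col="pax", sep=","):
    """Sum passengers per arrival airport chunk by chunk, return the top n.

    airport_col and pax_col are hypothetical column names; adapt them
    (and sep) to the real file.
    """
    partials = []
    reader = pd.read_csv(source, sep=sep, usecols=[airport_col, pax_col],
                         chunksize=chunksize)
    for chunk in reader:
        # Each partial result has at most one row per airport, so it stays small.
        partials.append(chunk.groupby(airport_col)[pax_col].sum())
    if not partials:
        return pd.Series(dtype="float64")
    # Merge the per-chunk sums into one total per airport.
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals.sort_values(ascending=False).head(n)
```

Only the two needed columns are parsed, and only the per-chunk sums are kept, so peak memory depends on the chunk size rather than on the file size.
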
I also tried R, which makes data manipulation easier.
The drawback with R is that loading the large file into memory takes more time; once the data is loaded, the computations themselves are trivial.
In the end both solutions take almost the same time, and I would give the advantage to R, especially since I did not use its libraries for large data files.