Abstract: Finding a good representation of raw data is a key issue in machine learning. Most traditional approaches rely on relationships among the data or on simple linear combinations, whereas deep learning algorithms perform well on a variety of machine learning tasks and learn good representations. However, most existing algorithms are implemented serially and therefore cannot handle large-scale data. This paper proposes an effective parallel auto-encoder (PAE) based on Spark. The proposed PAE not only learns satisfactory representations but also reduces execution time by running on Spark. The paper then adapts PAE to handle sparse data. Experiments on two tasks, classification and collaborative filtering, demonstrate the effectiveness and efficiency of the proposed PAE.