
Yu Qiao, Alan Wilson, & Zhaoyan Zhang. (2023). A Lightweight Ensemble Model Based on Knowledge Distillation and Distributed Data Parallelism for Predicting User Advertising Return on Investment. Journal of Information, Technology and Policy, 1–15. https://doi.org/10.62836/jitp.2023.159

A Lightweight Ensemble Model Based on Knowledge Distillation and Distributed Data Parallelism for Predicting User Advertising Return on Investment

Advertising plays a pivotal role in enabling businesses to connect with potential customers and promote their offerings. In today’s digital age, advertising channels such as online display ads, social media promotions, and targeted email campaigns dominate the marketing landscape. Given the substantial investments companies make in these channels, evaluating advertising effectiveness through Return on Investment (ROI), the ratio of net profit to advertising expenditure, becomes crucial. Accurately predicting user advertising ROI helps optimize campaign strategies and ensures resources are allocated effectively. Traditional heuristic and rule-based methods often fail to capture the complex relationships in user data, limiting their predictive accuracy. Recent advances in machine learning, particularly deep learning, have significantly improved ROI prediction by uncovering intricate, non-linear patterns in large datasets. However, deep learning models can be computationally intensive and difficult to deploy in resource-constrained environments. To address these limitations, this study proposes a novel lightweight distributed ensemble model that combines distributed data parallelism (DDP), knowledge distillation, and ensemble learning. The framework trains a large teacher network with DDP, distills its knowledge into a smaller student network, and integrates the student’s high-level representations with other machine learning models. The results demonstrate improved prediction accuracy and computational efficiency, making the model suitable for real-time advertising ROI forecasting.
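The distillation step described above can be illustrated with a minimal NumPy sketch. This is not the paper’s implementation: the synthetic data, the linear teacher and student, and the loss weight α are all illustrative assumptions. The idea it demonstrates is the standard one for regression distillation, where the student minimizes a mix of the ground-truth error and the error against the (already trained) teacher’s predictions.

```python
import numpy as np

# Illustrative sketch of response-based knowledge distillation for a
# regression task; dimensions and hyperparameters are arbitrary choices.
rng = np.random.default_rng(0)

# Synthetic "user features" and ROI-like targets.
X = rng.normal(size=(200, 5))
true_w = np.array([0.5, -1.2, 0.8, 0.3, -0.7])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Stand-in "teacher": assumed pre-trained (here, a least-squares fit
# plays that role); only its predictions are used for distillation.
teacher_w, *_ = np.linalg.lstsq(X, y, rcond=None)
teacher_pred = X @ teacher_w

# Smaller "student" trained on a distillation loss mixing the hard
# (ground-truth) error with the soft (teacher-matching) error.
alpha = 0.5   # weight on the hard loss; illustrative value
lr = 0.05
student_w = np.zeros(5)

def distill_loss(w):
    pred = X @ w
    hard = np.mean((pred - y) ** 2)             # fit the labels
    soft = np.mean((pred - teacher_pred) ** 2)  # mimic the teacher
    return alpha * hard + (1 - alpha) * soft

for _ in range(500):
    pred = X @ student_w
    grad = (2 / len(X)) * X.T @ (alpha * (pred - y)
                                 + (1 - alpha) * (pred - teacher_pred))
    student_w -= lr * grad

print(round(distill_loss(student_w), 4))  # small after training
```

In the paper’s setting the teacher would be a large network trained with DDP and the student a compact network whose representations feed the ensemble; the loss structure, however, follows this same hard/soft mixture.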

ROI prediction; ensemble model; knowledge distillation; distributed training


Supporting Agencies

  1. Funding: This research was supported by the National Natural Science Foundation of China under Grant Nos. 61872364 and 71974036.