Jiarui Rao, Qian Zhang, Shaoyu Liu, & Xinqi Liu. (2023). Integrating Textual Analytics with Time Series Forecasting Models: Enhancing Predictive Accuracy in Global Energy and Commodity Markets. Innovations in Applied Engineering and Technology, 2(1), 1–7. https://doi.org/10.62836/iaet.v2i1.265

Integrating Textual Analytics with Time Series Forecasting Models: Enhancing Predictive Accuracy in Global Energy and Commodity Markets

This study presents a framework for predicting crude oil prices by integrating textual features extracted from news headlines into a time series forecasting model. Headlines are used instead of full articles for two reasons: they capture the essence of the news, and the choice is consistent with previous research by Li et al. Futures news is preferred over gold news because the dataset is larger and because futures prices, including those of gold, natural gas, and crude oil, are closely interrelated. The methodology extracts thematic and sentiment information from headlines using text mining techniques, constructs daily topic strength indices, and builds an emotional strength index that accounts for the decay of a news item's influence over time. A vector autoregression model is used to determine the optimal lags of the exogenous sequences, including the topic and sentiment indices, relative to crude oil prices. The forecasting models are Random Forest Regression (RF), Support Vector Regression (SVR), and Autoregressive Integrated Moving Average (ARIMA), along with versions extended with exogenous variables (such as ARIMAX). Performance is evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The results indicate that incorporating textual features significantly improves the prediction accuracy of the RF, SVR, and AdaBoost models, while the traditional ARIMA model performs well without textual features. The study also introduces an approach that combines Ensemble Empirical Mode Decomposition (EEMD) with Independent Component Analysis to analyze non-linear and non-stationary time series, applied specifically to gold prices. The EEMD-BPNN-ADD model is identified as the most accurate for forecasting, and interval predictions are provided for gold prices. This research contributes to the field by demonstrating the effectiveness of integrating textual analysis with traditional financial models for improved market forecasting.
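As a rough illustration of two ingredients named in the abstract, the sketch below computes the three evaluation metrics (RMSE, MAE, MAPE) and one possible form of a decayed sentiment index. The abstract does not state the decay function, so the exponential decay, the `decay_rate` parameter, and all function names here are assumptions made for illustration, not the authors' method.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero (division by zero).
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def decayed_sentiment_index(daily_sentiment, decay_rate=0.5):
    """Hypothetical emotional strength index with exponential decay.

    Assumption: the influence of day t-k's sentiment on day t is weighted
    by exp(-decay_rate * k), so
        index[t] = sum_{k=0..t} daily_sentiment[t-k] * exp(-decay_rate * k).
    This is computed recursively as index[t] = index[t-1] * factor + s[t].
    """
    factor = math.exp(-decay_rate)
    index = []
    carry = 0.0
    for s in daily_sentiment:
        carry = carry * factor + s
        index.append(carry)
    return index


if __name__ == "__main__":
    # Made-up numbers purely for demonstration; not data from the study.
    actual = [80.0, 82.0, 81.0, 85.0]
    predicted = [79.0, 83.0, 83.0, 84.0]
    print("RMSE:", rmse(actual, predicted))
    print("MAE:", mae(actual, predicted))
    print("MAPE (%):", mape(actual, predicted))
    print("Index:", decayed_sentiment_index([1.0, 0.0, -0.5, 0.0]))
```

In a pipeline of the kind the abstract describes, an index like this would be one of the exogenous series whose lag relative to crude oil prices is then selected before model training.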

financial time series; decomposition techniques; trend analysis; seasonality; forecasting; risk management

References

  1. Andrienko N, Andrienko G, Fuchs G, et al. Visual Analytics for Understanding Texts. In: Visual Analytics for Data Scientists; Springer: Cham, Switzerland, 2020.
  2. Chen M, Chen Y, Zhang Q. A Review of Energy Consumption in the Acquisition of Bio-Feedstock for Microalgae Biofuel Production. Sustainability 2021; 13(16): 8873.
  3. Keim DA, Krstajic M, Rohrdantz C, Schreck T. Real-Time Visual Analytics for Text Streams. Computer 2013; 46(7): 22–26.
  4. Shimizu S, Nakai K, Li Y, et al. Boron Neutron Capture Therapy for Recurrent Glioblastoma Multiforme: Imaging Evaluation of a Case with Long-Term Local Control and Survival. Cureus 2023; 15(1): e33898. https://doi.org/10.7759/cureus.33898.
  5. Kamola M, Arabas P. Improving Time-Series Demand Modeling in Hospitality Business by Analytics of Public Event Datasets. IEEE Access 2020; 8: 53666–53677.
  6. Li Y, Shimizu S, Mizumoto M, Iizumi T, Numajiri H, Makishima H, Sakurai H. Proton Beam Therapy for Multifocal Hepatocellular Carcinoma (HCC) Showing Complete Response in Pathological Anatomy After Liver Transplantation. Cureus 2022; 14: e25744. https://doi.org/10.7759/cureus.25744.
  7. Li Z, et al. Stock Market Analysis and Prediction Using LSTM: A Case Study on Technology Stocks. Innovations in Applied Engineering and Technology 2023; 2(1): 1–6. https://doi.org/10.62836/iaet.v2i1.162.
  8. Sarkar D. Text Analytics with Python, 2nd ed.; Apress: New York, NY, USA, 2016.
  9. Peng Z, Jian J, Wang M, Wang Q, Boyer T, Wen H, Liu H, Mao Z-H, Chen KP. Big Data Analytics on Fiber-Optical Distributed Acoustic Sensing with Rayleigh Enhancements. In Proceedings of the 2019 IEEE Photonics Conference (IPC), San Antonio, TX, USA, 29 September–3 October 2019; pp. 1–3. https://doi.org/10.1109/IPCon.2019.8908496.
  10. Wang R, Behandish M. Surrogate Modeling for Physical Systems with Preserved Properties and Adjustable Tradeoffs. arXiv 2022, arXiv:2202.01139.
  11. Wang Q, Zhao K, Badar M, Yi X, Lu P, Buric M, Mao Z-H, Chen KP. Improving OFDR Distributed Fiber Sensing by Fibers with Enhanced Rayleigh Backscattering and Image Processing. IEEE Sensors Journal 2022; 22: 18471–18478. https://doi.org/10.1109/JSEN.2022.3197730.
  12. Badar M, Lu P, Wang M, Wang Q, Chen KP, Buric M, Ohodnicki PR. Integrated Auxiliary Interferometer to Correct Non-Linear Tuning Errors in OFDR. In Proceedings of the SPIE, Optical Waveguide and Laser Sensors, 114050G, Online, 8 May 2020; Volume 11405. https://doi.org/10.1117/12.2558910.
  13. Wang Q, Jian J, Wang M, Wu J, Mao Z-H, Gribok AV, Chen KP. Pipeline Defects Detection and Classification Based on Distributed Fiber Sensors and Neural Networks. In Proceedings of the Optical Fiber Sensors Conference 2020 Special Edition, OSA Technical Digest, Washington, DC, USA, 8–12 June 2020. https://doi.org/10.1364/OFS.2020.W2B.3.
  14. Kumada H, Li Y, Yasuoka K, Naito F, Kurihara T, Sugimura T, Sakae T. Current Development Status of iBNCT001, Demonstrator of a LINAC-based Neutron Source for BNCT. Journal of Neutron Research 2022; 24(3–4): 347–358. https://doi.org/10.3233/JNR-220029.
  15. Li S, Mo Y, Li Z. Automated Pneumonia Detection in Chest X-Ray Images Using Deep Learning Model. Innovations in Applied Engineering and Technology 2022; 1(1): 1–6. https://doi.org/10.62836/iaet.vli1.002.
  16. Wang J, Tong J, Tan K, Vorobeychik Y, Kantaros Y. Conformal Temporal Logic Planning Using Large Language Models: Knowing When to Do What and When to Ask for Help. arXiv 2023, arXiv:2309.10092.
  17. Li Y, Matsumoto Y, Chen L, Sugawara Y, Oe E, Fujisawa N, Sakurai H. Smart Nanofiber Mesh with Locally Sustained Drug Release Enabled Synergistic Combination Therapy for Glioblastoma. Nanomaterials 2023; 13: 414. https://doi.org/10.3390/nano13030414.
  18. Chen M. Investigating the Influence of Interannual Precipitation Variability on Terrestrial Ecosystem Productivity. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, 2023.
  19. Iezzi DF, Celardo L. Text analytics: Present, past and future. In Text Analytics: Advances and Challenges; Springer International Publishing: New York, NY, USA, 2020; pp. 3–15.
  20. Dong S, Xu T, Chen M. Solar Radiation Characteristics in Shanghai. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2351, pp. 012016.
  21. Wang R, Shapiro V. Topological Semantics for Lumped Parameter Systems Modeling. Advanced Engineering Informatics 2019; 42: 100958.
  22. Kanungsukkasem N, Leelanupab T. Financial Latent Dirichlet Allocation (FinLDA): Feature Extraction in Text and Data Mining for Financial Time Series Prediction. IEEE Access 2019; 7: 71645–71664.
  23. Li Y, Mizumoto M, Oshiro Y, Nitta H, Saito T, Iizumi T, Sakurai H. A Retrospective Study of Renal Growth Changes after Proton Beam Therapy for Pediatric Malignant Tumor. Current Oncology 2023; 30: 1560–1570. https://doi.org/10.3390/curroncol30020120.
  24. Steed CA, Drouhard M, Beaver J, Pyle J, Bogen PL. Matisse: A Visual Analytics System for Exploring Emotion Trends in Social Media Text Streams. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015.
  25. Wu Z, Wang Q, Gribok AV, Chen KP. Pipeline Degradation Evaluation Based on Distributed Fiber Sensors and Convolutional Neural Networks (CNNs). In Proceedings of the 27th International Conference on Optical Fiber Sensors, Alexandria, VA, USA, 29 August–2 September 2022. https://doi.org/10.1364/OFS.2022.W4.41.
  26. Carnot ML, Bernardino J, Laranjeiro N, Gonçalo Oliveira H. Applying Text Analytics for Studying Research Trends in Dependability. Entropy 2020; 22(11): 1303.
  27. Shimizu S, Mizumoto M, Okumura T, Li Y, Baba K, Murakami M, Sakurai H. Proton Beam Therapy for a Giant Hepatic Hemangioma: A Case Report and Literature Review. Clinical and Translational Radiation Oncology 2021; 27: 152–156. https://doi.org/10.1016/j.ctro.2021.01.014.
  28. Rezaee Z, Dorestani A, Aliabad S. Application of Time Series Analyses in Big Data: Practical, Research, and Education Implications. Journal of Emerging Technologies in Accounting 2018; 15(1): 183–197.
  29. Chen M. Annual Precipitation Forecast of Guangzhou Based on Genetic Algorithm and Backpropagation Neural Network (GA-BP). In Proceedings of the International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2021), 19–21 November 2021; Volume 12156, pp. 182–186.
  30. Castellanos M, Kim HD, Hsu M, Zhai C, Rietz T, Diermeier D. Mining Causal Topics in Text Data: Iterative Topic Modeling with Time Series Feedback. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013.

Supporting Agencies

  1. Funding: Not applicable.