The Study of Machine Learning Inference Tasks Based on Serverless Computing Platforms

Authors

  • Chang Liu

DOI:

https://doi.org/10.61173/7mzsgf91

Keywords:

Serverless, ResNet50, Model Partitioning, Parallel Execution, Inference Efficiency

Abstract

This study proposes a distributed inference method based on the ResNet50 model, aiming to improve inference efficiency and resource utilization by dividing the model into multiple sub-models. Specifically, the model is divided into the initial convolutional layer, four stages of residual blocks, and the subsequent global average pooling and fully connected layers. Each sub-model independently handles a specific task, allowing parallel execution on different computing devices and thereby accelerating the overall inference process. This partitioning strategy effectively handles high-concurrency requests, improving the system's response speed. It also enables dynamic scaling of resources based on workload demands, which is crucial for real-time applications. Distributed inference additionally makes the model more flexible, adapting to various computing resources and application scenarios. Experimental results indicate that this method significantly enhances inference efficiency while maintaining model performance, offering new ideas and solutions for the practical deployment of deep learning models. These findings underscore the potential of distributed architectures for deploying complex neural networks across diverse environments.

Published

2024-12-31

Issue

Section

Articles