AWS SageMaker is a platform designed to support the full lifecycle of the data science process, from data preparation to model training to deployment. One of its greatest strengths is the clean separation, yet easy pipelining, between model training and deployment. A model can be developed on a training instance and saved as artifact files. The deployment process retrieves the model artifacts from S3 and deploys a runtime environment as an HTTP endpoint. Finally, any application can send REST queries to the deployed endpoint and receive prediction results back.
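As a sketch of that final query step, a prediction request could look like the following. This assumes an endpoint already in service; the feature values, CSV content type, and JSON response format are illustrative assumptions (the actual formats depend on the model container).

```python
import json

def build_csv_payload(features):
    """Serialize one feature row as the CSV request body that many
    built-in SageMaker algorithms accept for inference."""
    return ",".join(str(v) for v in features)

def query_endpoint(endpoint_name, features):
    """Send a prediction request to a deployed SageMaker endpoint.
    boto3 is imported lazily so this sketch can be read and run
    without AWS credentials configured."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(features),
    )
    # The response body format depends on the model container;
    # JSON is common for built-in algorithms.
    return json.loads(response["Body"].read())

# Example payload for a model trained on four numeric features:
payload = build_csv_payload([5.1, 3.5, 1.4, 0.2])
```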
While simple in concept, practical information on SageMaker model deployment and prediction queries is currently scarce and scattered. The process is easiest to grasp as the simple three-step workflow contained in a notebook:
1. create deployment model
2. configure deployment instances
3. deploy to service endpoints
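The three steps above map directly onto three boto3 SageMaker API calls. A minimal sketch, assuming the model name, container image URI, S3 artifact path, and IAM role ARN are supplied by the caller (all placeholders here):

```python
def variant_config(model_name, instance_type="ml.m5.large", count=1):
    """Production variant for step 2; the instance type and count
    shown are placeholder choices."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": count,
    }

def deploy_model(sm, model_name, image_uri, model_data_url, role_arn):
    """Run the three deployment steps against a boto3 SageMaker
    client `sm` (i.e. boto3.client("sagemaker"))."""
    # 1. create deployment model: ties the inference container
    #    image to the trained model artifacts saved in S3
    sm.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri,
                          "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # 2. configure deployment instances: instance type and count
    #    for each production variant
    sm.create_endpoint_config(
        EndpointConfigName=model_name + "-config",
        ProductionVariants=[variant_config(model_name)],
    )
    # 3. deploy to a service endpoint; this call returns immediately
    #    while endpoint creation proceeds asynchronously
    sm.create_endpoint(
        EndpointName=model_name + "-endpoint",
        EndpointConfigName=model_name + "-config",
    )
```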
Finally, create the service endpoint, wait for creation to complete, and model deployment is finished; the endpoint is ready to serve prediction requests.
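Because endpoint creation is asynchronous, the wait step can be handled with boto3's built-in `endpoint_in_service` waiter. A short sketch (the endpoint name is whatever was passed to `create_endpoint`):

```python
def wait_until_in_service(endpoint_name):
    """Block until the endpoint reaches the InService state.
    boto3 is imported lazily so the sketch runs without AWS set up."""
    import boto3
    sm = boto3.client("sagemaker")
    # Polls DescribeEndpoint until EndpointStatus becomes "InService"
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```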
The complete deployment process can be visualized as follows:
The complete sample notebook can be seen here: