Choosing a Cloud Computing Platform for your enterprise

Being an ML Leader

Rahul S


We cannot do away with the cloud. For an enterprise, cloud computing services require no upfront investment: we get access to advanced hardware without having to purchase it, and we pay for it on a per-second basis.

They give us on-demand access to large-scale computing capacity, which makes it possible to distribute model training across multiple machines. We can also access specialized hardware such as GPUs, FPGAs, and massively parallel HPC (high-performance computing) systems.

Cloud services also provide advanced features for managing datasets and algorithms, training models and deploying them efficiently to production.

(Side note) Infrastructure as a Service (IaaS) is the model in which vendors deliver virtualized computing resources from the cloud. The Platform as a Service (PaaS) model adds a layer of managed services on top of IaaS resources. These PaaS offerings provide the hardware needed for deep learning workloads, as well as software services for managing deep learning pipelines, from data ingestion to production deployment and real-world inference.

To choose a vendor, we have to anticipate which steps in a typical ML workflow matter most to us.

Data Preparation: This is generally the heaviest and most sensitive part of a deep learning project. In this step, we prepare deep learning datasets from raw data using one of the following two approaches:

1) Extract, transform, load (ETL): transforms data as it is pulled from the source and produces a ready-made dataset that can be used for analytics.
2) Extract, load, transform (ELT): offers greater flexibility by letting you store raw data in a data lake and then transform it into the required format on demand.
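
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the records, the `transform` function, and the in-memory lists standing in for a warehouse and a data lake are all hypothetical, not any vendor's API.

```python
# Hypothetical raw records, standing in for rows pulled from a source system.
RAW_RECORDS = [
    {"user": " Alice ", "amount": "10.50"},
    {"user": "BOB", "amount": "3"},
]


def transform(record):
    """Normalize one raw record: trim and lowercase the name, parse the amount."""
    return {
        "user": record["user"].strip().lower(),
        "amount": float(record["amount"]),
    }


def etl(source):
    """ETL: transform on the way in, so only the cleaned dataset is stored."""
    warehouse = [transform(r) for r in source]
    return warehouse


def elt(source):
    """ELT: store raw data unchanged, and transform it only when it is read."""
    data_lake = list(source)  # raw copy, kept as-is (assumed in-memory "lake")

    def view():
        # The transformation runs on demand, each time the view is requested.
        return [transform(r) for r in data_lake]

    return data_lake, view


warehouse = etl(RAW_RECORDS)
data_lake, view = elt(RAW_RECORDS)
```

In this sketch both paths yield the same cleaned rows, but only the ELT path still holds the raw records, which is why it can serve a different transformation later without going back to the source.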

We need to be sure of our approach and choose the vendor accordingly. We should also know which data storage, database or data warehouse services we will use, and how they can make data preparation easier.

Training: Pushing training jobs to cloud-based compute instances should be seamless, so we should check what hardware our vendor offers, such as GPUs, TPUs, and FPGAs. We should also…
