Introduction to Machine Learning for Developers

323

By Stephanie Kim, software developer

Today’s developers often hear about leveraging machine learning algorithms in order to build more intelligent applications, but many don’t know where to start.

One of the most important aspects of developing smart applications is to understand the underlying machine learning models, even if you aren’t the person building them. Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.

While this is only a brief definition, machine learning means we can use statistical models and probabilistic algorithms to answer questions so we can make informative decisions based on our data.

An excerpt from Rob Schapire’s Theoretical Machine Learning lecture in 2008 sums up machine learning very nicely:

“Machine learning studies computer algorithms for learning to do stuff. We might, for instance, be interested in learning to complete a task, or to make accurate predictions, or to behave intelligently. The learning that is being done is always based on some sort of observations or data, such as examples…direct experience, or instruction. So in general, machine learning is about learning to do better in the future based on what was experienced in the past.”

The two main types of machine learning algorithms are supervised and unsupervised learning. Unsupervised algorithms are great for exploring your dataset and are used for pattern detection, object recognition in images and other classification problems like recommendations based on similar items.

The k-means algorithm is a popular unsupervised algorithm that makes no assumptions about the data meaning it uses random seeds and an iterative process that eventually converges. This unsupervised clustering algorithm uses a distance metric with the goal of minimizing the Euclidean distance from the data points to a centroid, remeasuring and reassigning each data point to a centroid on each iteration.

This algorithm takes n observations into k clusters with each observation belonging to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. K-means is used in market segmentation, computer vision, geostatistics, astronomy and agriculture.

Read the source article at Algorithmia.com.