Machine learning with Python: A guide to getting started
“Machine learning” has an almost cinematic quality to it, doesn’t it? It evokes the work of Isaac Asimov and Arthur C. Clarke.
Science fiction has often been the predecessor to true scientific advancement, and with regard to artificial intelligence this is definitely the case, though not in the ways that authors and filmmakers have predicted. At least so far.
Machine learning is very real, and not as impenetrable as you might think. If you’ve used a search engine, tagged a friend in a Facebook photo, or noticed a lack of spam in your email inbox, then you’ve used technology that utilizes machine learning.
The field is growing every day, and nearly any industry can make use of it. If you’re interested in machine learning, you are far from alone.
Machine learning: A quick overview
In Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David define it as “the automated detection of meaningful patterns in data.”
In other words, machine learning is a way for a computer program to comprehend data independently of a programmer. Over time, it improves its ability to understand information, creating more accurate predictive models.
If this sounds like artificial intelligence, that’s because it is, of a sort. The ability to learn based on past experiences is the very definition of intelligence. The difference is that machine learning is far more specialized than AI.
As Shalev-Shwartz and Ben-David put it, “...machine learning is not trying to build automated imitation of intelligent behavior, but rather to use the strengths and special abilities of computers to complement human intelligence, often performing tasks that fall way beyond human capabilities.”
A machine learning program can be thought of as a specialized subset of artificial intelligence, one that can perform a limited number of tasks extremely well.
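To make “the automated detection of meaningful patterns in data” a little more concrete, here is a minimal sketch in plain Python. The data and the hidden rule behind it (y = 2x + 1) are made up for illustration, and the `fit_line` helper is our own name, not part of any library. The program is never told the rule; it recovers it from the examples and then uses it to predict a value it has not seen.

```python
# Toy observations generated from a hidden rule, y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single input feature (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": estimate the pattern from the examples.
slope, intercept = fit_line(xs, ys)

# "Prediction": apply the learned pattern to an input that was not in the data.
prediction = slope * 10.0 + intercept
print(slope, intercept, prediction)
```

Real machine learning problems involve far more data and far messier patterns, but the shape of the process is the same: examples go in, a model comes out, and the model is used to predict.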
Implementing machine learning
One advantage of this field is that there is no restriction on programming language when it comes to implementation.
Because machine learning is typically used to process large volumes of data, you may want to choose a powerful low-level language. However, if you’re only just beginning to explore this field, it might be better to start with Python.
Python is beginner-friendly, and can do the same things that other programming languages can, often in fewer lines of code.
If you are interested in exploring machine learning with Python, this article will serve as your guide. This is not a tutorial on using machine learning, but an introduction to the field and a quick overview of resources you might use to get started programming machine learning in Python.
Machine learning with Python
Python is one of the most popular coding languages in use today, and is especially popular with web developers.
It is probably one of the most widely used object-oriented programming languages out there, and was designed so that programmers of any skill level can read it. It is also known, somewhat unfairly, as the “beginner’s” language because of its easy syntax.
While it is true that Python is accessible to first-time coders, its simplicity can be a strength; a small development team can accomplish tasks more quickly and in fewer lines of code than with lower-level languages. However, the trade-off for this ease of use is speed, where it lags behind many of its competitors.
Is Python a good language for machine learning? The pros and cons.
The answer is more complicated than a simple yes or no.
In many ways, Python is not the ideal tool for the job. In nearly every instance, the data that machine learning is used for is massive. Python’s lower speed means that, on its own, it can struggle to process enormous volumes of data fast enough for a professional setting.
Machine learning is a subset of data science, and Python was not designed with data science in mind.
However, Python’s greatest strength is its versatility. There are hundreds of libraries available with a simple download, each of which allows developers to adapt their code to nearly any problem.
While vanilla Python is not especially adapted to machine learning, it can be very easily modified to make writing machine learning algorithms much simpler.
When should you use Python for machine learning?
If you’re reading this article because you’re a beginner in machine learning, then right now would be a great time!
Whether Python is a “beginner’s language” or not, it is an ideal language for learning new concepts. Cutting your teeth with machine learning problems, allowing specialized libraries to handle fine details, is a great way to utilize Python while you come to grips with larger machine learning concepts.
Once you understand machine learning better as a larger field, you might want to try moving on to other, more powerful languages, such as C++ and R, both of which are especially well-suited to large-scale machine learning problems.
Getting started with Python machine learning
The first step should be to familiarize yourself with Python if you haven’t done so already. W3Schools has an excellent reference for beginners to the language.
Another great resource is Python.org, which contains a wealth of tutorials, downloads, documentation resources and a thriving community for both beginning and advanced users.
If you are comfortable with Python, then the next step is to decide which libraries you want to use for your machine learning model.
The sheer number of available Python libraries for machine learning can be paralyzing. There are dozens of options for solving even simple problems, and many developers have found themselves including more libraries than they need while trying to find the best ones.
We’ll help you narrow that list down. There are some libraries that specialize in machine learning itself, while others are more focused on specific aspects of machine learning, such as data analysis or visualization.
Which Python library should I choose for machine learning?
The following are general-purpose libraries for anything involving advanced data manipulation. This means they can all be used in implementing machine learning, and many of the higher-level machine learning libraries make use of some or all of them.
Getting acquainted with them is highly recommended if you plan on getting anywhere with scientific Python programming.
This list is by no means exhaustive; it is meant to be a starting point for you as you explore machine learning through Python!
NumPy
Overview: As stated previously, Python does not specialize in scientific computing, but certain libraries exist to help change that. Of these libraries, NumPy (or Numerical Python) is by far the most common and influential.
If you ever want to explore data science using Python, a working knowledge of NumPy is indispensable. It’s so useful that many of the other libraries in this article use NumPy internally.
How it works: NumPy’s most important feature, at least for our purposes, is the ndarray object, which lets users create an array of N dimensions.
These objects are many times more efficient than Python’s built-in data structures for numerical work, and are incredibly versatile. If you ever need to express images, sound waves, or other binary data as an array of numbers, use an ndarray.
NumPy’s data structures go a long way toward making up for Python’s weakness in speed, which is why so many other computing libraries make use of it.
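As a rough illustration of the N-dimensional array described above, the sketch below stores a tiny, made-up grayscale “image” as a two-dimensional array and rescales every pixel in one vectorized expression. The pixel values are invented for the example; only the NumPy calls themselves are real.

```python
import numpy as np

# A hypothetical 2x3 grayscale "image": each entry is a pixel brightness from 0 to 255.
image = np.array([[0, 64, 128],
                  [192, 255, 32]], dtype=np.uint8)

print(image.ndim)   # number of dimensions
print(image.shape)  # size along each dimension

# Vectorized arithmetic: rescale every pixel to the range 0.0-1.0 without a Python loop.
normalized = image / 255.0

# The same calculation with built-in lists needs an explicit nested loop.
normalized_loop = [[pixel / 255.0 for pixel in row] for row in image.tolist()]

# Both approaches give the same numbers; the array version does the work in compiled code.
print(np.allclose(normalized, normalized_loop))
```

On an array this small the difference is invisible, but on millions of elements the vectorized form is where NumPy earns its reputation for speed.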
Getting started: Because it is the foundation for so many other libraries, we cannot recommend mastering NumPy strongly enough. Luckily, it is well-documented, and many tutorials are available online. There’s even a useful cheat sheet, which we’ve included here.
SciPy
Overview: SciPy is one of the many libraries built on NumPy, and is often considered to be part of the same stack.
How it works: Making use of those handy N-dimensional arrays, it takes things a step further by adding advanced algorithms for tasks such as optimization, integration, interpolation, signal processing, and statistics.
SciPy is an extremely powerful tool for advanced scientific computing. In their own words, “with SciPy an interactive Python session becomes a data-processing and system-prototyping environment rivaling systems such as MATLAB, IDL, Octave, R-Lab, and SciLab.”
Getting started: Learning the ins and outs of SciPy will make your machine learning programming that much easier, since it can handle most of the complex data manipulation for you.
Learning algorithms and when to use them can be intimidating for rookie developers, but SciPy, like NumPy, is extremely well documented and supported. Compared to some other data science libraries it’s actually quite intuitive.
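Here is a small sketch of what “handling the complex data manipulation for you” can look like in practice. It uses `scipy.stats.linregress` to fit a straight line to a handful of noisy points. The numbers are invented for illustration and sit close to the line y = 2x + 1.

```python
import numpy as np
from scipy import stats

# Made-up measurements that roughly follow y = 2x + 1, with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# One call performs the least-squares fit and returns several statistics at once.
result = stats.linregress(x, y)

print(result.slope)      # estimated slope of the line
print(result.intercept)  # estimated intercept
print(result.rvalue)     # correlation coefficient; values near 1 mean a close linear fit
```

The fitted slope and intercept land near 2 and 1, which is what you would hope for given how the data was constructed.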
Matplotlib
Overview: Together with SciPy and NumPy, Matplotlib helps make up the holy trinity of Python scientific computing.
How it works: Where NumPy provides foundational data structures, and SciPy provides algorithms for data manipulation, Matplotlib specializes in data visualization.
Good visualization is an essential part of any machine learning enterprise. After all, training your machine learning algorithms to recognize patterns isn’t too useful if you can’t read the results.
Getting started: The field of libraries that visualize data is crowded, but Matplotlib remains at the top because of its versatility. There is almost no type of chart or plot that it cannot create, and you can customize every detail down to every label.
Matplotlib is also supported by nearly every popular Python IDE. That versatility does come at the cost of ease-of-use, however; it’s not quite as user-friendly as competing data visualization libraries.
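To give a feel for the level of control described above, the following sketch plots some toy data next to the line that generated it and saves the chart as an image. The data, the random seed, and the output file name `toy_fit.png` are all assumptions made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data: points scattered around the line y = 2x + 1 (seeded so the plot is repeatable).
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

fig, ax = plt.subplots()
ax.scatter(x, y, label="observations")
ax.plot(x, 2.0 * x + 1.0, color="red", label="y = 2x + 1")

# Titles, axis labels, and the legend are all set explicitly.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Toy data and the line that generated it")
ax.legend()

# Write the figure to the system's temporary directory.
output_path = os.path.join(tempfile.gettempdir(), "toy_fit.png")
fig.savefig(output_path)
plt.close(fig)
```

Even this short script touches the title, both axis labels, the legend, and the line color, which hints at both the flexibility and the verbosity mentioned above.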
The next group of Python libraries are more complete packages, as it were. Rather than being used for general scientific purposes, these libraries specialize in deep learning and machine learning.
These packages are a great place to start if you want to build your machine learning program out into a working application, and do a great deal to ease the entire process.
Theano
Overview: Theano is a library that specializes in creating multi-dimensional arrays and making advanced mathematical operations more efficient. If this sounds a lot like NumPy, that’s because it is tightly integrated with NumPy, and uses it at the lowest level.
How it works: In many ways, Theano can be thought of as NumPy’s more advanced and specialized form, one that can make Python nearly as efficient as C or R.
Since Theano was developed specifically for machine learning at the Université de Montréal, it is an excellent tool for that application, even if it doesn’t handle the machine learning algorithms itself.
Getting started: Check out the Theano site for tutorials, documentation, FAQs and information on installation.
TensorFlow
Overview: TensorFlow is almost certainly the most well-known open source machine learning library available for Python, and for good reason. It was developed by Google, and is used in nearly every Google application that utilizes machine learning.
How it works: If you’ve used Google Photos or voice search, then you’ve been using TensorFlow.
TensorFlow is extremely well documented and supported, and is optimized for speed. It is more difficult to learn, however, because it is actually a Python front-end built on top of a C++ core.
Getting started: The TensorFlow website contains extensive tutorials on how to use the library for any number of machine learning applications.
Keras
Overview: Built on top of Theano and TensorFlow is Keras, a high-level library for building and training neural networks.
How it works: Keras is best known for being one of the easiest machine learning libraries out there because it is coded entirely in Python, while using either Theano or TensorFlow as a back-end.
It is the most beginner-friendly library for machine learning, and includes functions for creating training datasets and more.
Keras' neural networks API was developed for fast experimentation and is a good choice for any deep learning project that requires fast prototyping.
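As a hedged illustration of that fast prototyping, the sketch below builds and trains about the smallest neural network possible: a single neuron that learns a straight line. It assumes a TensorFlow 2 installation, where Keras ships as `tensorflow.keras`. The toy data, learning rate, and epoch count are arbitrary choices for the example, not recommendations.

```python
import numpy as np
from tensorflow import keras

# Toy training data generated from y = 2x + 1 (an assumption for this sketch).
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1).astype("float32")
y = 2.0 * x + 1.0

# One input feature feeding a single dense neuron: effectively a linear model.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(1),
])

# Plain stochastic gradient descent on mean squared error.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# Train quietly; `history` records the loss after each epoch.
history = model.fit(x, y, epochs=50, verbose=0)

# Ask the trained model about three inputs.
new_inputs = np.array([[0.0], [0.5], [1.0]], dtype="float32")
predictions = model.predict(new_inputs, verbose=0)
print(predictions)
```

Defining, compiling, training, and querying a model takes only a handful of lines, which is a large part of why Keras is so often recommended to beginners. Because the starting weights are random, the exact predictions vary slightly from run to run.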
Again, this is far from a comprehensive list of machine learning libraries. Python is a versatile language, and there is a library available for any preference.
The highlighted libraries are, however, a great place to begin your journey toward understanding the complexities of machine learning before delving into uncharted waters.
Using machine learning intelligently
Machine learning is an extremely powerful tool, but you have to be careful in how you apply it.
It’s easy to think of the technology as a means to create perfect statistical models that are free from the yoke of human bias. The results are reached mathematically, so surely anything the program finds can be trusted implicitly. After all, numbers don’t lie, do they?
Computer programs are not swayed by emotion or preconceptions, but the people who program them are.
In remarks made at the SASE conference in Berkeley, Maciej Cegłowski described machine learning as “money laundering for bias,” a means of disavowing responsibility for the results that your computer program has found. This shouldn’t put you off machine learning; just keep human error in mind.
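The point about bias can be made concrete with a deliberately crude sketch. Everything here is hypothetical: the records are invented, the groups “A” and “B” stand in for any attribute, and the `train` helper is our own toy “model” that simply learns the majority outcome per group. The arithmetic is flawless, and the result still mirrors whatever skew was present in the past decisions it learned from.

```python
from collections import defaultdict

# Entirely made-up historical decisions: (group, approved) pairs.
# Group A was approved 8 times out of 10, group B only 3 times out of 10.
past_decisions = (
    [("A", True)] * 8 + [("A", False)] * 2 +
    [("B", True)] * 3 + [("B", False)] * 7
)


def train(records):
    """Toy 'model': learn each group's approval rate and predict the majority outcome."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved count, total count]
    for group, approved in records:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {
        group: approved / total >= 0.5
        for group, (approved, total) in counts.items()
    }


model = train(past_decisions)

# The model now approves everyone in group A and rejects everyone in group B.
print(model)
```

Nothing in the code is prejudiced, yet its predictions reproduce the pattern in the training data exactly. Real models are subtler, but the mechanism is the same.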
Machine learning is not a perfect solution to every problem. However, the right application of these powerful algorithms is taking computer science to new heights every day. Treated with the same caution and care that you would give any other technology, learning Python for machine learning can be a useful and marketable skill that will put you at the forefront of an incredible new wave of technological advancement.