Applying Data Science to Data Science: a deep dive into the best-loved technologies in the world of Data Science in 2020.

Photo by Christopher Burns on Unsplash

Digging in

This article is broken into two parts:

  1. Technology — This section takes a deep-dive into the technologies that the Data Science world has been writing about and responding to this year. …


Getting Started

Photo by Taylor Vick on Unsplash

Getting abstract

Arguably at the top of this pyramid of abstraction is the concept of serverless computing. It is built on the idea that, as the name suggests, developers need not spend time configuring servers and writing boilerplate app code, and should instead dive straight into writing and deploying the code that ‘really’ drives business value. This can also make it super easy for developers, Data Scientists and others to deploy simple applications and services with little-to-no experience of configuring the infrastructure needed to deploy ‘classic’ web apps. If that sounds like it may be useful to you, then great! …
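As a rough illustration of the idea, a serverless HTTP function can be little more than a single handler. The sketch below assumes a Google Cloud Functions-style Python runtime in which the handler receives a Flask-like request object. The function name `predict`, the payload shape and the toy scoring rule are all hypothetical and only stand in for real application code.

```python
import json


def predict(request):
    """HTTP-style handler: read a JSON payload and return a JSON response.

    Assumes `request` exposes a Flask-like `get_json(silent=True)` method.
    """
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")

    # Reject requests that do not carry a usable list of feature values.
    if not isinstance(features, list) or not features:
        body = json.dumps({"error": "expected a non-empty 'features' list"})
        return body, 400, {"Content-Type": "application/json"}

    # Hypothetical stand-in for a real model: average the feature values.
    score = sum(features) / len(features)
    body = json.dumps({"score": score})
    return body, 200, {"Content-Type": "application/json"}
```

Note that there is no server setup, routing or process management in the sketch: under this assumption the platform supplies all of that, and the handler is the whole deployable unit.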


Getting Started

Erlend Ekseth on Unsplash

Deployment is hard

That’s where automation can come in very handy: careful automation of ML pipelines can massively boost your productivity by allowing you to rapidly iterate on a pipeline in order to account for new business logic or modelling changes, while also ensuring those changes meet key performance criteria before going into service with your stakeholders/customers. …
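One way to picture the "meets key performance criteria" step is as a simple gate that an automated pipeline runs before releasing a model. The following is a minimal sketch only: the function names, metric names and threshold values are hypothetical, and a real pipeline would compute the metrics from a held-out evaluation set.

```python
def meets_criteria(metrics: dict, thresholds: dict) -> bool:
    """Return True only if every tracked metric reaches its minimum value."""
    return all(
        # A metric that is missing entirely counts as a failure.
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


def release_decision(metrics: dict, thresholds: dict) -> str:
    """Map the gate result onto the action the pipeline should take."""
    return "deploy" if meets_criteria(metrics, thresholds) else "reject"
```

For example, with illustrative thresholds of `{"accuracy": 0.9, "recall": 0.8}`, a candidate scoring `{"accuracy": 0.93, "recall": 0.75}` would be rejected because recall falls short, even though accuracy passes.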


Kelli McClintock on Unsplash

In Part 1 of this series, you saw a few practical examples of how Object-Oriented Programming (OOP) can be used to help you resolve some code design problems. If you missed it, it’s over here:

Right, let’s dig in.

Getting technical

  • Classes — The definition of the data and procedures available to a given structure. In other words, a class defines what data it refers to, and what procedures (methods) can be used on this data. …
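To make the definition of a class concrete, here is a small sketch in Python. The `Temperature` class is a hypothetical example chosen for illustration: it holds one piece of data (`celsius`) and defines two methods that operate on that data.

```python
class Temperature:
    """A class bundles data (attributes) with procedures (methods)."""

    def __init__(self, celsius: float) -> None:
        # The data this class refers to.
        self.celsius = celsius

    def to_fahrenheit(self) -> float:
        # A method: a procedure that can be used on the class's data.
        return self.celsius * 9 / 5 + 32

    def is_freezing(self) -> bool:
        return self.celsius <= 0
```

Creating `Temperature(100)` produces an object carrying its own `celsius` value, and calling `to_fahrenheit()` on it uses that value without it being passed in again.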

Creating CLIs can help improve accessibility and reuse of your ML pipelines, but they can be a pain to set up. Enter Fire.

Photo by Cullan Smith from Unsplash

What is Fire?

So, why write a CLI? Practically, a simple CLI can make configuring a script as simple as changing a couple of command line arguments. Let’s say you’ve got a script set up on an orchestration service (maybe something like Jenkins) that regularly retrains your latest and greatest Tweet sentiment classifier. Let’s say it’s a Scikit-Learn Random Forest. …
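As a sketch of what this can look like, the script below exposes a retraining function as a CLI using `fire.Fire`. The function name `train`, its parameters and the default file name are hypothetical, the body only echoes its configuration rather than fitting a Random Forest, and the example assumes the third-party `fire` package is installed.

```python
def train(n_estimators: int = 100, max_depth: int = 10,
          data_path: str = "tweets.csv") -> dict:
    """Hypothetical retraining entry point; a real script would fit a model here."""
    config = {
        "n_estimators": n_estimators,
        "max_depth": max_depth,
        "data_path": data_path,
    }
    print(f"Retraining with {config}")
    return config


if __name__ == "__main__":
    import fire  # third-party: pip install fire

    # Fire turns the function's arguments into command line flags.
    fire.Fire(train)
```

Saved as, say, `train.py`, the script could then be reconfigured from an orchestration job with something like `python train.py --n_estimators=200 --max_depth=5`, with no changes to the code itself.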


Photo by Rene Böhmer from Unsplash.

What is Streamlit?

With this relatively mature ecosystem in place, you may question the need for yet another framework to join the pack. But that’s exactly what the team over at Streamlit are doing: introducing a brand new framework for building data applications. What’s more, they’ve created quite a bit of buzz around their project too, so much so that they recently closed a $21M Series A funding round to allow them to continue developing their framework. …


Kelli McClintock on Unsplash

The power of OOP

However, it isn’t uncommon for programmers to swirl around concepts like OOP for many years, perhaps gaining the odd bit of insight here and there, without consolidating that understanding into a clear set of ideas. For beginners too, the concepts of OOP can be a little bewildering: some guides use language-specific OOP implementations to illustrate ideas, and many use subtly distinct or overloaded language, all of which can sometimes obfuscate OOP concepts in the more generic sense. …


Photo by JJ Ying on Unsplash

What you’ll learn

  • An overview of the origins and aims of the MLOps movement;
  • An introduction to a couple of key MLOps concepts;
  • A tutorial for setting up a Continuous Training/Continuous Delivery (CT/CD) ML pipeline with GitHub Actions and Google Cloud Functions.

The tutorial section is designed to make use of free (or nearly free) services, so following along should cost you a few pennies at most. If you’re working on an MVP and need some ML infrastructure in place sharpish but want to avoid the price tag and technical overhead of AWS SageMaker or Azure ML deployments, you might find the example useful too. Finally, if you’re interested in understanding how the tutorial fits together to run it end-to-end for yourself, you should check out the previous post in this series on deploying lightweight ML models as serverless functions. …


Making Sense of Big Data

Photo by Zbynek Burival on Unsplash

A deployment conundrum

However, if you don’t have access to ongoing technical support in the form of DevOps or MLOps teams, then wading through cloud services to set up load balancers, API gateways, continuous integration and delivery pipelines, security settings etc. can be quite a lot of overhead. Moreover, unless you’re pretty confident with these concepts, delivering (and monitoring) an ML API for which you can guarantee security and performance at scale and thereby engender the trust of your users can be challenging. …


TQDM is an easy-to-use, extensible progress bar package for Python. This short post gives you a few useful examples to get you started.

Volodymyr Hryshchenko on Unsplash

Tracking your progress

It’s not uncommon for folks to opt to simply print status messages to console, or in some slightly more sophisticated cases use the (excellent and recommended) built-in logging module. In a lot of cases this is fine. However, if you're running a task with many hundreds of steps (e.g. training epochs), or over a data structure with many millions of elements, these approaches are sometimes a little unclear and verbose, and frankly kind of ugly. Plus, adding little ‘developer experience’ touches to your code (such as progress bars!) …
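For comparison with printing status messages, here is a minimal sketch of a loop wrapped in a TQDM progress bar. The function `sum_of_squares` is a hypothetical stand-in for a long-running task, and the example assumes the third-party `tqdm` package is installed.

```python
from tqdm import tqdm


def sum_of_squares(values):
    """Toy long-running task that reports its progress as it iterates."""
    total = 0
    # Wrapping the iterable in tqdm() draws a progress bar as the loop runs.
    for value in tqdm(values, desc="Processing"):
        total += value * value
    return total
```

The loop body is unchanged from a plain `for` loop; only the iterable is wrapped, which is what keeps this approach less verbose than emitting a message per step.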

About

Mark Douthwaite

Applied AI specialist, computer scientist, software engineer. Read more at https://mark.douthwaite.io/
