In late November, I gave a guest lecture at the NHL Stenden University of Applied Sciences, presenting an introduction to text classification using scikit-learn to third-year Data Analytics students. Life has been busy, so I've only just gotten around to writing about the experience.
Getting the opportunity to teach in the classroom again, something I've seriously missed since moving on from my lecturer post at Teesside University, was fantastic. The students were curious and engaged, and I was lucky enough to return to NHL Stenden's beautiful campus in Leeuwarden, capital of the Dutch province of Friesland, to deliver a follow-up lab session in early December as students worked on their assessments.
The learning outcomes included understanding in depth how naive Bayes text classification models work, and employing them alongside support vector machines and random forests to solve real-world text classification problems. We started with the classic spam vs. non-spam classification problem, touching on Laplace smoothing, tf-idf normalization, rendering confusion matrices using matplotlib, and pickling our trained models to disk for later use. By the time we reached the end of the lecture and moved into the lab session, it was clear that everyone in the room, myself included, had taken away something valuable.
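For readers who weren't in the room, here is a minimal sketch of the idea behind naive Bayes with Laplace smoothing applied to the spam problem. It is not the code from the session: the `NaiveBayes` class, the four toy messages and the whitespace tokenizer are all assumptions made for illustration, and it sticks to the Python standard library rather than scikit-learn so the arithmetic stays visible.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Illustrative multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is classic Laplace smoothing

    def fit(self, docs, labels):
        # Count documents per class (for the priors) and words per class
        # (for the likelihoods). Tokenization is a naive lowercase split.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, doc):
        # log P(class) + sum of log P(word | class), one score per class.
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Adding alpha per vocabulary word keeps unseen (word, class)
            # pairs from getting probability zero.
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy training data, invented for this example.
train_docs = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch tomorrow at noon",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict("free money"))     # spam
print(model.predict("lunch at noon"))  # ham
```

The word "money" never appears in a ham message, so without smoothing its likelihood under ham would be zero and the log would be undefined. With `alpha=1.0` it becomes a small but non-zero 1/19, which is the whole point of Laplace smoothing.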
All in all, a fantastic experience. I very much hope to return in future!
The slides and the worksheet from the session are available for download from my website, and the source code for the session is up on my GitHub.