Data Science @ Nissan Digital: More than Scikit-Learn
Author: Anant Agarwal (Data Science Manager)
March 2024
Candidates applying to the Data Science CoE at Nissan Digital frequently ask, “What kind of technologies do you work with?” Simply put, it’s “Data Science that’s more than just Scikit-Learn”!
If you are a Data Scientist, whether experienced or aspiring, you’ve likely built your expertise around Pandas for data manipulation, Seaborn for data visualization and Scikit-Learn for machine learning. While these Python packages are ubiquitous and powerful, they fall short in several respects. For example, working with large datasets in Pandas can become excruciatingly slow, and Scikit-Learn doesn’t include models for some niche, largely untapped supervised learning problems on structured data that carry immense business value. There are many more such examples.
So, how do you tackle these issues? What skill sets do you need?
Data Science is a vast and rapidly evolving field, all the more so with the advent of Large Language Models in the last year. It is therefore key to keep learning continuously when it comes to Data Science and Machine Learning; there’s no one size that fits all.
Each problem statement comes with its own unique asks. The Data Science team at Nissan Digital works on cutting-edge problems across sales & marketing, supply chain and sales finance, among other functions, across geographies.
Developing efficient and scalable Data Science solutions, while keeping up to date with the latest technologies, is at the core of the Data Science CoE. This could mean using Polars for lightning-fast data manipulation, Nixtla for a comprehensive set of time series forecasting algorithms, Pyro for probabilistic modelling, DoWhy for causal inference, and LangChain for building applications on large language models, among many other tools. The team frequently evaluates out-of-the-box solutions as benchmarks when developing custom, flexible, high-performing solutions internally from scratch. These range from Google’s LightweightMMM framework for marketing mix modelling to Amazon Textract for OCR applications.
Development is just one piece of the puzzle. Once a solution has been developed, it needs to pass through a productionization phase. This includes, but is not limited to, rounds of user acceptance testing (UAT), implementation of automated data and machine learning pipelines, and CI/CD deployment. According to a widely cited figure attributed to Gartner, about 85% of machine learning projects fail to go past the proof-of-concept (PoC) phase into deployment. The Data Science team breaks through this frontier of Machine Learning Operations, or simply MLOps, to unlock business value. The team works alongside data engineers, software engineers and solution architects on cloud platforms like AWS, with technologies such as Docker.
Circling back to the initial question of what Data Science @ Nissan Digital is: it is a blend of carefully honed automotive business expertise and a mindset of continuously learning niche, powerful data science skills. Nissan Digital provides an environment that encourages the brightest Data Scientists to shine on pressing business problems that interest them.
While this is a challenging proposition, it is indispensable for a team to unlock any real business value. After all, as a quote famously attributed to Albert Einstein goes, “In the middle of difficulty lies opportunity”, and as Data Scientists, we couldn’t agree more.