Data Science Depot

Len Greski's articles and associated reference content for data science

Getting Started in Data Science

Students studying data science ask lots of questions about how to break into the field from another line of work.

My primary advice to someone starting out in data science is actually general job seeking advice. Since 75% of jobs are found through people’s contact networks (reference: 2015 personal conversation with a consultant at the Lee Hecht Harrison career coaching firm), a beginning data scientist needs to develop a contact network that includes data scientists.

The toughest part of any job search is getting past the keyword-based resume filters employed by many HR departments. Building your network is the best way to get past the automated resume filters.

Second, consider jobs that are related to data science, but called by other names: market research, analytics, web analytics, etc.

Third, don’t be afraid to take a job that is “entry level,” even if it requires that you take a salary cut relative to your last job. Why? After a year or two of real-world experience, you’ll be a lot more valuable.

Question: What are the key skill areas I must develop to be a viable candidate for data science roles?

A well-rounded data scientist in a corporate environment will have enough breadth of skill to contribute to a team in at least three of the following eight areas, including a credible story in the first two skill areas.

Area: Description
1. Collection & Analysis: The ability to acquire data from a variety of sources, manipulate it to remove or mitigate impurities (e.g. missing data), and transform it into a format suitable for model building.
2. Modeling: The ability to develop hypotheses, select algorithms based on the characteristics of the data available, and build highly accurate predictive models.
3. Applications: Embed predictive models into systems that are used by customers / end users on an operational basis (e.g. recommending cross-sell or up-sell products on an e-commerce website), including the ability to generate predictions at high volume in less than 500 milliseconds.
4. Operations: Manage, update, and support models in production software applications at an acceptable cost structure with no downtime.
5. DevOps: Manage versions of code, algorithms, externally sourced components, and test cases. Automate the build and deployment of models and supporting components.
6. Solution Architecture for Data Science: Assign responsibilities to components in a logical software architecture in a way that enables high performance, manageable cost, fault tolerance, security, and ease of scaling with large volumes of data.
7. Software Selection and Supplier Management: Evaluate purchased components ranging from cloud-based infrastructure to machine learning capabilities (e.g. h2o.ai) based on objective evaluation criteria. Define and negotiate contracts with suppliers of purchased components so the costs of applications are manageable as end user usage and data volumes grow.
8. Business Value Management: Define a market opportunity for a data science powered application, including one-time costs, ongoing costs, and benefits over a 3 - 5 year period. Manage the implementation of the data science powered application to a production deployment, manage its operation, and track benefits to ensure they meet or exceed originally estimated values. Add or modify deployed capabilities to increase generation of benefits relative to costs over the lifespan of the application.
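As a small illustration of the first skill area (mitigating missing data before model building), here is a minimal Python sketch. The function name `impute_missing`, the sample values, and the choice of mean imputation are assumptions made for illustration only; the right treatment of missing data depends on the data set and the model.

```python
from statistics import mean
from typing import Optional


def impute_missing(values: list[Optional[float]]) -> list[float]:
    """Replace None entries with the mean of the observed values.

    Mean imputation is only one simple option; it is used here as an
    assumed example, not as a recommended default.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column of customer ages with two missing entries.
ages = [34.0, None, 29.0, 41.0, None]
cleaned = impute_missing(ages)
print(cleaned)
```

In an interview, being able to explain why one would or would not use a simple approach like this is part of a credible story in this skill area.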

Many of the data science curricula in universities are focused on the first two areas.

The Johns Hopkins University Data Science Specialization offered via Coursera covers Applications in addition to the first two areas.

Generally speaking, the last five categories aren’t taught in universities because many of the PhDs teaching data science don’t have sufficient industry experience to teach in these areas.

However, experienced IT professionals have many of these skills, and these are the things one can leverage in an interview to gain access to data science jobs when one is at an entry level in the first two skill areas.

Question: Referring to the prior question, what is a “credible” level of skill?

For an entry level data scientist role, “credible” means being able to provide relevant answers to questions that are appropriate for people who have completed a data science curriculum or bootcamp.

For example, students who have completed the Johns Hopkins Data Science Specialization should be able to provide concrete but entry level answers to the following types of questions.

“Credible” also means knowing one’s limitations, and relating experiences where one has quickly learned new things.

Question: What are data science career options for an experienced IT professional?

The biggest question is whether one is willing to take an entry level data science job at an entry level salary. Depending on one’s current salary and financial flexibility, it may be worthwhile to take an entry level data science job instead of a technical project or engineering manager role.

Two to three years of significant work as a data scientist will be more valuable to a person’s long term career prospects than taking a job that is related to data science but where a person isn’t developing a portfolio of completed data science projects.

The most important thing a person can do to enhance her/his career prospects is to develop relationships in the field where one wants to work. In the U.S., as noted above, 75% of jobs are found by networking, so the market places a premium on developing relationships before you make a career move.

© 2017 - 2018 Leonard M. Greski - copying with attribution permitted