What is it to be a Data Scientist - An Interview

A professional data scientist answers the 12 most-asked questions, clarifying the field and debunking the myths you may have built up around it!

This article is a summary of an interview with Kashyap Barua [1], an experienced data scientist. The questions are the ones most often asked in the Learn AI Together community on Discord [2].

Short Introduction

Kashyap Barua is a professional data scientist working at MiQ [3]. His background is mostly in computer science and engineering, which he studied at the Kalinga Institute of Industrial Technology [4]. He has also completed many Coursera [5] certifications to improve his skills in both machine learning and data science. But as he will explain, there are many different ways to get into the data science field. Being an engineer is definitely not a requirement!

Data science is a very large field, and the way into it is quite a mystery for many people. This is why I asked our community: what is the #1 thing you would ask a professional in the field of data science if you had the chance to speak with one? Of course, the answers are all Kashyap Barua's own opinions, but they are very interesting and will definitely clear up many of your questions, too!

They may even help you decide whether or not to choose the data science path! But enough talking, let’s get right to it!

The interview

Here, Kashyap answers the 12 questions about the field of data science that our community asked most.

1 — What is Data Science?

At a high level, Data Science is an interdisciplinary field that uses math, business acumen, and algorithms to solve problems with structured and unstructured data.

2 — Who is a Data Scientist?

A Data Scientist is a person who is responsible for collecting, analyzing, and making sense of data, usually in large amounts. A Data Scientist is expected to know statistical techniques, programming languages, and visualization tools in order to make sense of the data and solve business problems.

3 — Is a Ph.D. or Master’s degree required to get placed in a big company or is the skill enough to get in? e.g. Kaggle wins, personal projects, etc. (By Sowjanya)

To be honest, companies do not generally mandate a Ph.D. or Master’s degree. Some companies do ask for these advanced degrees, but the majority don’t require them. I would recommend using online platforms like Datacamp, Coursera, and Udacity to get a grasp of this domain. You can also build your profile through Kaggle submissions and personal projects, which would give you an edge over other candidates.

Note from the author: Kaggle is an awesome platform. It is full of free courses, tutorials, and competitions. You can join competitions for free and create a team to work with amazing people. Each competition gives you a problem to solve and the data to solve it with. You only need to download the data, read about the problem, and start coding right away! You can even earn money from these competitions, and they are a really great thing to have on your resume. This may be the best way to gain experience while learning a lot for free.

4 — What are the best projects to have in your portfolio to get your first job in data science? (By Rephawl Roriz)

There are tons of projects out there that could help you build your profile as a Data Scientist. But again, Data Science is a superset of a large number of tasks, such as Data Collection, Data Cleaning, Data Visualization, and Modeling. Based on these categories, you could pick projects like Exploring Bitcoin Cryptocurrency Market Data, Predicting Credit Card Approvals, or Text Analysis of a famous person’s Twitter profile. Many other project topics can be found at https://www.datacamp.com/projects/.

5 — What is the best programming language to start off to become a data scientist? (By Deep)

My personal favorite is R. The market does not favor R over Python or the other way around: there is always one company looking for R and another looking for Python, depending on their use case. But I can recommend Python, as a lot of packages are being updated for this language and its visualization packages are pretty awesome as well. R is more inclined towards statistics and research-oriented work, while Python lets you productize your work and scale it to other tools in your organization.
But the most important takeaway from my career is that you should learn SQL, and this should be prioritized at the very beginning. All companies expect Data Scientists and Data Analysts to know how to use SQL to shape data. R and Python come second.

6 — What are the selection criteria for recruiters? Which skills do they look for? (By Sowjanya)

A majority of recruiters look for SQL. Every organization stores its data somewhere, and you need to be able to extract the data from these sources before doing any sort of data manipulation or modeling. Companies stream terabytes of data daily, and data at that scale cannot be loaded directly into R or Python. Hence, you need to aggregate the data into a more convenient form first, which is where you need to know SQL.
Apart from SQL, they’ll expect you to know either R or Python and a dashboarding tool like Power BI, Tableau, or Metabase.
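Note from the author: the "aggregate in SQL first, then analyze" workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database, and the `orders` table, its columns, and the sample rows are all hypothetical, not taken from Kashyap's work.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database standing in for a real data store.

    The table name, columns, and rows are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (event_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2020-12-01", 10.0), ("2020-12-01", 5.5), ("2020-12-02", 7.0)],
    )
    conn.commit()
    return conn


def daily_revenue(conn):
    """Let the database do the aggregation and return one row per day."""
    query = """
        SELECT event_date, COUNT(*) AS n_orders, SUM(amount) AS revenue
        FROM orders
        GROUP BY event_date
        ORDER BY event_date
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Only the small, aggregated result reaches Python.
    for row in daily_revenue(conn):
        print(row)
```

The point of the design is that the raw rows never leave the database. Python (or R) only receives the summary, which is what makes the approach workable when the raw data is far too large to load.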

7 — How to get started reading research papers, and find the best ones, when there is such a humongous amount of them relating to the field? (By Avhijit)

Good question, Avhijit. Having worked on and published 7 research papers under my name so far, I think I have a good answer to this question. I started publishing papers back in 2016, and one thing I learned is that you need to have a solid topic before you start writing a paper. You need to have an idea of what you want to do or research. I know there are millions of papers out there, but once you have a topic that you want to work on, those millions become thousands. Once you narrow it down to a sub-topic, those thousands become hundreds, and so on. Now you have a manageable set of papers to choose from and understand before you start writing your own. I, for instance, wanted to understand and write a paper on the Retail domain. I started by going through 30 different papers on innovation and research in the Retail sector. This is when I got the idea of proposing my own framework, and I started writing my first paper.

8 — What is usually the first task assigned to a data scientist right after you’re hired? (By Anab Akhtar)

An Analyst or Data Scientist who joins a company is not asked to start analyzing data or modeling and predicting right away. The first thing the new hire needs to do is connect with all the relevant points of contact within the company to understand the business. A Data Scientist needs to understand how the business functions, otherwise the data alone wouldn’t make sense. They need to be aligned with the business outcomes and goals of the team or company. Once they understand the lay of the land, they start going through all the data sources to understand what the data looks like and which databases store which type of data. As soon as they are production-ready, they start writing their own scripts to analyze huge amounts of data and make sense of it, though modeling and prediction come at a later stage of the work.

9 — Can someone coming from a different background than computer science enter the data science stream? If yes, what does he need to learn to achieve that? (By Salman)

A cool thing about this domain is that you can become a Data Scientist regardless of what degree you did and what subjects you specialized in. A Data Scientist is expected to know some tools and techniques before being hired: basic to intermediate statistics, SQL, and R or Python. These are the basics that you need to know, and the rest will go a bit more smoothly once you have them. I have had colleagues who majored in diverse fields like Economics and Philosophy, started working as Data Analysts, and worked their way up to become Data Scientists.

10 — What makes you different from other data scientists? (By Haswanth)

I am currently a Product Analyst working for a Data Science Team. While the Data Scientists work on analyzing data and then building their production-ready models and tools, I need to understand their work well enough to track the performance metrics of their tools. I also connect with a lot of stakeholders and clients to understand their requirements and convert them into an easily interpretable form that the Data Scientists can use to build their products.

11 — How important is the role of statistics in your day-to-day work? (By Normalized Nerd)

Statistics is really important for the role. I once had to perform some A/B tests on a feature of a product that we had released to our audience. To perform an A/B test, you need to know the difference between the Frequentist approach and the Bayesian approach. If you want to use the Frequentist approach, you need to understand the nuances of t-tests and p-values to be able to correctly reject, or fail to reject, a null hypothesis. On the other hand, if you want to go with the Bayesian approach, you need to know the theory of prior and posterior probabilities, as well as Bayes' theorem, to draw conclusions from your A/B test results. This is just one instance of how statistical techniques were required for my use case. There could be numerous other cases and requirements. Hence, statistics is very important for the role.
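Note from the author: to make the two approaches a bit more concrete, here is a minimal sketch using only Python's standard library. It is not the analysis Kashyap ran. It assumes the A/B test compares conversion counts, so the Frequentist side uses a two-proportion z-test (rather than the t-test mentioned above), and the Bayesian side assumes uniform Beta(1, 1) priors. The function names and all numbers are hypothetical.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Frequentist view: two-sided z-test for a difference in conversion rates.

    Null hypothesis: variants A and B share the same conversion rate.
    Returns the z statistic and the two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Bayesian view: estimate P(rate_B > rate_A) from the posteriors.

    With an assumed Beta(1, 1) prior, the posterior of each rate is
    Beta(1 + conversions, 1 + non-conversions). The probability is
    estimated by sampling from both posteriors (Monte Carlo).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


if __name__ == "__main__":
    # Hypothetical data: 100/1000 conversions for A, 130/1000 for B.
    z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.3f}, p-value = {p_value:.4f}")
    print(f"P(B > A) = {prob_b_beats_a(100, 1000, 130, 1000):.3f}")
```

The two functions answer different questions. The p-value says how surprising the observed difference would be if the variants were truly identical, while the posterior probability directly estimates how likely it is that B is better than A, given the data and the chosen prior.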

12 — What is the hardest part of a Data Science job?

One of the hardest parts of the job is understanding the business and its requirements well before you start working with the data. If a stakeholder comes to you with requirements, you need to understand exactly what they want out of the data, as you are going to spend the next couple of days (or sprints) trying to deliver it. If you interpret the requirements incorrectly, an entire week’s worth of work goes to waste, and companies are bound by strict time frames for completing tasks.


Conclusion

There it is! I hope these answers helped you understand what a data scientist is and maybe dispelled some myths you had in mind! Thank you again, Kashyap, for your time and the great answers. Feel free to connect with him on LinkedIn!

Join the Discord community, Learn AI Together. These 12 most-asked questions were taken from our 5,800 current AI enthusiast members! It is the best place to share your projects, papers, and best courses, find Kaggle teammates, ask questions, and much more!

References

[1] Kashyap Barua, Professional Data Scientist, https://www.linkedin.com/in/kashyap-barua-4ab640b6/

[2] “Learn AI Together” community, Discord, https://discord.gg/learnaitogether

[3] MiQ, Consulted on December 14th, 2020, https://www.wearemiq.com/

[4] Kalinga Institute of Industrial Technology, Consulted on December 14th, 2020, https://kiit.ac.in/

[5] Coursera, Consulted on December 14th, 2020, https://www.coursera.org/