Data science is
not about making complicated models. It's not about making awesome
visualizations. It's not about writing code. Data science is about using data
to create as much impact as possible for your company. Now, the impact can take
multiple forms. It could be in the form of insights, in the form of
data products, or in the form of product recommendations for a company. Now, to
do those things, you need tools like complicated models, data
visualizations, or code. But essentially, as a data scientist, your job is
to solve real company problems using data, and what kind of tools you use, we
don't care about.
Now there are a
lot of misconceptions about data science, especially on YouTube, and I think the
reason is that there's a huge misalignment between what's popular
to talk about and what's needed in the industry. So because of that, I want to
make things clear.
I am a data
scientist working for a GAFA company, and those companies really emphasize
using data to improve their products. So this is my take on what
data science is. Before data science, the term data mining was popularized in a 1996
article called "From Data Mining to Knowledge Discovery in Databases," in
which it referred to the overall process of discovering useful information from
data. In 2001, William S. Cleveland wanted to bring data mining to another
level. He did that by combining computer science with data mining. Basically, he
made statistics a lot more technical, which he believed would expand the
possibilities of data mining and produce a powerful force for innovation.
Now
you could take advantage of computing power for statistics, and he called this combo
data science. Around this time, Web 2.0 also emerged, where
websites were no longer just digital pamphlets but a medium for a shared
experience amongst millions and millions of users. These are websites like
MySpace in 2003, Facebook in 2004, and YouTube in 2005. We can now interact with
these websites, meaning we can contribute, post, comment, like, upload, and share,
leaving our footprint in the digital landscape we call the Internet and helping create
and shape the ecosystem we now know and love today. And guess what? That's a lot of data. So much data that it became
too much to handle using traditional technologies. So we call this Big
Data.
That opened a
world of possibilities in finding insights using data. However, it also meant
that the simplest questions required sophisticated data infrastructure just to
support the handling of the data. We needed parallel computing technology like
MapReduce, Hadoop, and Spark. So the rise of big data in 2010 sparked the rise
of data science to support the needs of businesses to draw insights from
their massive unstructured data sets. So then, the Journal of Data Science
described data science as almost everything that has something to do with data:
collecting, analyzing, and modeling. Yet the most important part is its
applications. All sorts of applications.
Yes, all sorts
of applications, like machine learning. So in 2010, the new abundance of
data made it possible to train machines with a data-driven approach rather
than a knowledge-driven approach. All the theoretical papers about recurrent
neural networks and support vector machines became feasible. Something that can
change the way we live and how we experience things in the world. Deep learning
is no longer an academic concept in a thesis paper. It became a tangible,
useful class of machine learning that would affect our everyday lives. So machine learning and AI dominated the media,
overshadowing every other aspect of data science, like exploratory analysis,
experimentation, and skills we traditionally called business intelligence.
So now the general public thinks of data scientists as researchers focused on machine learning
and AI, but the industry is hiring data scientists as analysts. So there's a
misalignment there. The reason for the misalignment is that, yes, most of these data
scientists can probably work on more technical problems, but big companies like
Google, Facebook, and Netflix have so much low-hanging fruit to improve their
products that they don't require any advanced machine learning or statistical
knowledge to find these impacts in their analysis. Being a good data scientist
isn't about how advanced your models are. It's about how much impact you can
have with your work. You're not a data cruncher. You're a problem solver.
You're a strategist. Companies will give you the most ambiguous and hard
problems.
Now the thing
that's less known is the stuff in between, which is everything right here.
Surprisingly, this is actually one of the most important things for companies,
because you are trying to tell the company what to do with their product. So
what do I mean by that? I mean analytics, which tells you, using the
data, what kind of insights there are and what is happening to my users. And then metrics. This
is important because what is going on with my product? You know, these metrics
will tell you if you are successful or not. Then, you know, A/B testing, of
course. Experimentation, which allows you to know which product versions are the
best. So these things are actually important, but they are not covered
much in the media. What's covered in the media is this part: AI, deep learning. We
have heard about it on and on, you know.
But when you
think about it, for a company, for the industry, it's actually not the highest
priority, or at least it's not the thing that yields the most results for the
lowest amount of effort. That's why AI and deep learning are at the top of the
hierarchy of needs, and these things, A/B testing and analytics, are actually
way more important for the industry. So that's why we're hiring a lot of data scientists
that do that. So what do data scientists actually do?
Well, that
depends on the company and its size. So a start-up
kind of lacks resources, so it can only kind of have one data scientist. So
that one data scientist has to do everything. So you might see all of
this being done by the data scientist. Maybe you won't be doing artificial intelligence or
deep learning because that's not a priority right now. But you might be doing
all of these. You have to set up the whole data infrastructure. You might even
have to write some software code to add logging, and then you have to do the
analytics yourself, then you have to build the metrics yourself, and you have
to do A/B testing yourself.
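To make the A/B testing part a bit more concrete, here's a minimal sketch of how you might compare two product versions on a conversion metric. It assumes a simple two-proportion z-test, which is just one common way to do this and not necessarily what any particular company uses. The function name `ab_test` and the numbers in the usage example are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test (illustrative helper, not a library API).

    conv_a, conv_b: number of users who converted in versions A and B
    n_a, n_b:       number of users who saw versions A and B
    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B are equal
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference between the two rates
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 1,000 users per version
    rate_a, rate_b, z, p_value = ab_test(200, 1000, 250, 1000)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the difference between versions is unlikely to be noise, but in practice you would also decide the sample size and the metric before running the experiment.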
That's why for
startups, if they need a data scientist, this whole thing is data science, so
that means you have to do everything. But let's look at medium-sized companies.
Now, finally, they have a lot more resources. They can separate data
engineers and data scientists. So usually, the collection part is probably
software engineering. And then here, you're going to have data engineers doing
this. And if your medium-sized company does a lot of recommendation
models or stuff that requires artificial intelligence, then data scientists
will do all these. So as a data scientist, you have to be a lot more technical.
That's why they only hire people with PhDs or master's degrees, because they want you to
be able to do more complicated things.
So let's talk about a large company now. Because
you're getting a lot bigger, you probably have a lot more money, and then you can
spend more of it on employees. So you can have a lot of different employees
working on different things. That way, employees don't need to think about
the stuff that they don't want to do, and they can focus on the things that
they're best at. For example, me, at my unnamed large company, I would be in
analytics, so I could just focus my work on analytics and metrics and stuff like
that. So I don't need to worry about data engineering or artificial intelligence,
deep learning stuff. So here's how it looks for a large company. Instrumentation,
logging, sensors: this is all handled by software engineers, right? And then
here, cleaning and building data pipelines: this is for data engineers. Now here,
between these two things, we have data science analytics. That's what it's
called.
But then once we
go to AI and deep learning, this is where we have research scientists, or what we
call data science core, and they are backed by ML engineers, which are machine
learning engineers. Anyways, so in summary, as you can see, data science can be
all of this, and it depends on what company you are in, so the definition will
vary. So please let me know what you would like to learn more about: AI, deep
learning, or A/B testing and experimentation. Depending on what you want to learn
about, leave a comment down below so I can talk about it, or I can find
someone who knows about it and share the insights with you.
Hope you have a
wonderful day. Hope this was helpful. Thanks for reading.