15 Data Science Facts to Dig Through
Updated · Mar 06, 2023
As of now, each of us produces 1.7 megabytes of data per second, which equals over a hundred gigabytes per day. As the world’s online population continues to increase, the amount of existing data seems to grow larger at an even faster pace.
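The per-day figure is simple arithmetic. Here is a minimal sketch of the conversion, assuming decimal units (1 GB = 1,000 MB) and a constant rate around the clock; the variable names are our own:

```python
# Convert a per-second data rate into a per-day volume.
# Assumptions: decimal units (1 GB = 1,000 MB) and a constant rate all day.
MB_PER_SECOND = 1.7
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

mb_per_day = MB_PER_SECOND * SECONDS_PER_DAY
gb_per_day = mb_per_day / 1000

print(f"{gb_per_day:.1f} GB per day")
```

That works out to roughly 147 GB per person per day.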
But data isn’t particularly useful simply by existing—somebody has to work it. And that somebody is called a data scientist.
Today, we’ll talk about the essence of this relatively new profession and share many of our favorite data science facts with you.
No worries—we’ve dug up, processed, and analyzed all the data already; you need only read it.
Ingenious Data Science Stats (Editor’s Choice):
- Demand for data scientists will see a 28% increase by 2026.
- Nearly half (48%) of data scientists have a PhD.
- The US currently employs 105,980 data scientists.
- Data science helped find the creepiest word in Shakespeare’s Macbeth.
- Swarm AI predicts Oscar winners with 90% accuracy.
- Companies using data science for customer analytics can outperform competitors by up to 23 times in certain indicators.
- The demand for data scientists is three times higher than the supply.
- Data scientists spend 80% of their time organizing data.
- San Francisco (CA) has approximately seven times more data science job postings than the US average.
What Is Data Science?
It’s a lot of things.
The simple explanation is that it’s a more sophisticated version of data analysis that includes developing all new processes for dealing with data to bring it to a usable state. Then, the analysis happens.
Let’s dive deep into data science: what it is, how it came to be, and some of the benefits it has brought to enterprises.
1. Data science follows a five-stage life cycle.
In order to better define data science, we ought to look at its main attributes. There are five of them, often referred to as “stages” within a life cycle: capture (acquire data), maintain (warehouse data), process, analyze, and communicate (report the data).
In short, data science is all about organizing raw data and extracting crucial insights from it.
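To make the five stages concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function names simply mirror the stage names, the "warehouse" is a plain dictionary, and the survey ratings are made up. Real pipelines rely on databases and dedicated tooling.

```python
from statistics import mean

def capture():
    """Capture: acquire raw data (here, hard-coded hypothetical survey answers)."""
    return ["4", " 5", "n/a", "3 ", "", "5"]

def maintain(raw_records, warehouse):
    """Maintain: store the raw records so later stages can reuse them."""
    warehouse["ratings_raw"] = list(raw_records)
    return warehouse

def process(warehouse):
    """Process: trim whitespace and drop entries that are not whole numbers."""
    cleaned = [record.strip() for record in warehouse["ratings_raw"]]
    return [int(record) for record in cleaned if record.isdigit()]

def analyze(ratings):
    """Analyze: summarize the cleaned data."""
    return {"count": len(ratings), "mean": mean(ratings)}

def communicate(summary):
    """Communicate: report the result in plain language."""
    return f"{summary['count']} valid ratings, average {summary['mean']:.2f}"

warehouse = maintain(capture(), {})
ratings = process(warehouse)
report = communicate(analyze(ratings))
print(report)
```

Note how two of the six raw answers are discarded during processing, which hints at why cleaning takes up so much of the job.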
2. The first official “data scientist” appeared in 2008.
The history of data science dates back to the 1960s, but the term “data scientist” first appeared in 2008 when large corporations saw the need for dedicated data specialists.
In 2009, Hal Varian, Google’s Chief Economist, wrote an article on the importance of data in the future. No wonder he’s now held his post at Google for two decades—his prediction was very much correct.
Fun fact: Just a few years later, the Harvard Business Review dubbed data science “the sexiest job of the 21st century.” We’re sure sexier jobs will crop up before long—a century is quite long—but we’ll concede the title for this decade.
3. There are 105,980 data scientists employed in the US.
(Source: U.S. Bureau of Labor Statistics)
The latest data science stats reveal that, as of May 2022, 105,980 people in the US work in the field. These professionals are spread across various industries, such as Computer Systems Design (16,620), Management of Companies and Enterprises (12,570), and Management, Scientific, and Technical Consulting (7,270).
The top-paying industries for data scientists, on the other hand, are “Computer and Peripheral Equipment Manufacturing” (an average of $71.29/hour), “Electronic Component Manufacturing” ($68.34/hour), and Information Services ($67/hour).
4. Companies using data science for customer analytics can outperform competitors by up to 23 times in certain indicators.
(Source: McKinsey & Company)
Basically, what data science does is allow companies to gather important information on how best to act by processing vast amounts of raw data for hidden insights.
A recent study found that businesses successfully applying customer analytics were able to acquire up to 23 times more new customers. All in all, a company’s chances of succeeding in sales and marketing double when it uses customer analytics.
5. Software trained to understand the sentiment behind emails can lead to up to 95% faster handling of customer requests.
So, what is data science used for specifically?
How does it help companies grow?
Here’s one example: Contextor is a robotic process automation tool that understands the sentiment behind emails. Companies using it can identify which customers are unhappy and prioritize their responses, thus improving “incident handling times” by anywhere from 15% to 95%.
Happier customers typically mean better business, so it’d be silly to pass on this opportunity.
Fun fact: Sentiment analysis is a useful tool for reputation management, too. Solutions like Brand24 scan what people are saying about your brand online as well as the sentiment behind their comments. That way, if you have a negative review, you can be proactive and do some damage control before it escalates.
What Does a Data Scientist Do?
They waste a lot of time, like most of us—but they also do tons of fun stuff.
Among said fun stuff, there’s predicting who’ll win the Oscars (with almost perfect accuracy) or analyzing classic literature (and the creepiest vocabulary ever).
Read along to find out more data science facts.
6. Data scientists spend 80% of their time organizing data.
A good data scientist knows how to analyze data statistically—and they also know that’s but a tiny part of the job.
Approximately 80% of their time is spent looking for data, cleaning it up, and organizing it before it’s ready for analysis. Only the final 20% goes into extracting the information that informs business decisions.
7. The average data scientist makes $108,660 per year.
(Source: U.S. Bureau of Labor Statistics)
All data science areas tend to be well-paid. The mean annual wage in the field stands at $108,660, but naturally, some make more than that.
For instance, the mean wage for a data scientist surpasses $130,000 a year in Washington and California. Salaries in New Jersey, New York, and Delaware don’t fall far behind, paying an average of $120,000+ a year for data scientists’ expertise.
Fun fact: Only 17.8% of the US workforce makes over $100,000 a year—and half of them are still living paycheck-to-paycheck.
8. Swarm AI predicts Oscar winners with 90% accuracy.
(Source: Unanimous AI)
An example of data science at work is Unanimous’ Swarm AI, software designed to “amplify human intelligence” by optimizing knowledge.
In the past four years, it has been able to predict Oscar winners with 90% accuracy, significantly outdoing professional critics.
Fun fact: Swarm software can predict more than just Academy Award winners. It’s good for predicting the outcome of professional sports matches and great for doing financial projections, too.
9. Data science helped find the creepiest word in Shakespeare’s Macbeth.
Here’s a funny data analysis story. Researchers trying to gain a deeper understanding of Macbeth ran a log-likelihood analysis of the play to figure out what words Shakespeare used more frequently than normal.
Many, such as “thane,” “hail,” and “cauldron,” were to be expected due to the nature of the play, but others weren’t. “The” ended up with a log-likelihood of 41, which means that this particular word is used far more often in Macbeth than in your average Shakespearean masterpiece (about 30% more frequently, to be exact).
Upon further analysis and some more data science work, researchers realized that Shakespeare had a tendency to write verses like “the eye wink at the hand.” And that using “the” over “my” evokes dissociation from the character’s own body, which contributes to the creepiness that pervades the play. So, there you go—mystery solved.
It is precisely these sorts of data science facts that make us appreciate the discipline—it’s useful for more than just optimizing profits!
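For the curious, here is a minimal sketch of how a log-likelihood (G2) keyness score can be computed for a single word. It uses the common two-corpus form of the statistic; we don’t know the exact setup the Macbeth researchers used, and the word counts below are hypothetical placeholders, not their data.

```python
import math

def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood (G2) score for one word across two corpora.

    count_a, size_a: occurrences of the word in the study text and the
    total number of words in that text.
    count_b, size_b: the same two numbers for the reference corpus.
    """
    total = count_a + count_b
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts for illustration only (not the researchers' figures).
play_count, play_size = 730, 17_000
reference_count, reference_size = 26_000, 800_000

g2 = log_likelihood(play_count, play_size, reference_count, reference_size)
print(f"G2 = {g2:.1f}")
```

The score is zero when a word is equally frequent in both texts and grows as the frequencies diverge, which is why an unremarkable word like “the” can still stand out.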
10. 69% of data scientists use Python.
(Source: InData Labs)
Data science and programming go hand in hand. After all, you’ll definitely need one programming language or another to analyze large quantities of data.
At the moment, Python is by far the most popular—69% of professionals in the industry use it. For comparison, just 24% use R.
Fun fact: 75% of data science job postings ask for experience with Python, a larger share than the 69% of professionals who actually use it.
Data Science Trends
If there’s one person who can make a half-decent prediction of future trends, that has to be a data scientist.
Unfortunately, we aren’t data scientists, so we’ll just give you the current trends and leave it to you to speculate about what the future might bring.
11. Data science is the third-best job in the US this year.
In light of the ever-growing importance of big data, it’s only logical to assume the future of data science is nothing but bright. And perhaps it is, but it used to be even brighter.
From 2016 to 2019, data science was considered the best job in the US, based on criteria such as average pay, job satisfaction, and job openings. In 2021, it was the second-best. Now, it's the third.
That said, there are over five times as many job openings in data science today as there were in 2016, and pay certainly hasn’t gone down, so it remains one of the foremost choices for a career, if you’ve got the skills.
12. The average number of data scientists working at large organizations worldwide rose from 28 in 2020 to 50 in 2021.
As the data science industry grows, so does the number of people it employs. As recently as 2020, 40% of large businesses employed no more than 10 data scientists. A year later, 81% of those big companies employed at least 50 each.
In other words, the corporate world saw the average number of employed data scientists go from 28 to 50 in just a year.
Fun fact: As of 2022, just over 50 out of the 3,216 postsecondary schools in the US offer data science majors. Five years ago, practically none of them did.
13. Demand for data scientists is three times higher than supply.
(Source: Analytics Insight)
Considering the sheer number of data science applications in multiple industries, it’s self-evident why the demand for specialists in the field is high.
And considering the substantial barrier to entry due to the vast range of skills necessary, it’s also clear why supply isn’t up to par.
Even now, there are three times as many job openings for data scientists as there are people looking for such working opportunities—which is at least partly why data science is such a well-paid career path.
Furthermore, the demand for these professionals is unlikely to abate any time soon. On the contrary, experts suggest it will rise another 28% by 2026.
Fun fact: In general terms, about 1%-2% of the world’s population has a PhD. Yet, specifically, 48% of data scientists have one. See what we mean by “substantial barrier to entry”?
14. San Francisco has approximately seven times more data science job postings than the US average.
The concept of data science first arose in connection with computer science, and so, to this day, it remains associated with technology. This likely explains why tech hotspots such as San Francisco and San Jose have 7.2 and 6.3 times more job postings for data scientists, respectively, compared to the average across the US.
California as a whole boasts a location quotient (local demand relative to national) of 2.1x. Interestingly enough, it’s far from being the most data-science-saturated place in the country: Washington, DC, sits at 5.6x and Massachusetts at 2.6x.
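A location quotient is just a ratio of two shares. Here is a minimal sketch, assuming the usual definition (an occupation’s share of the local total divided by its share of the national total); the counts are hypothetical numbers chosen to land on 2.1x, not figures from the source.

```python
def location_quotient(local_count, local_total, national_count, national_total):
    """Ratio of an occupation's local share to its national share."""
    local_share = local_count / local_total
    national_share = national_count / national_total
    return local_share / national_share

# Hypothetical figures for illustration only.
lq = location_quotient(
    local_count=420,        # data science postings in the region
    local_total=100_000,    # all postings in the region
    national_count=2_000,   # data science postings nationwide
    national_total=1_000_000,  # all postings nationwide
)
print(f"Location quotient: {lq:.1f}x")
```

A value above 1 means the occupation is more concentrated locally than nationally; a value below 1 means the opposite.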
15. Only half a percent of all data is ever analyzed.
Various data science stats suggest that we generate multiple petabytes of data every day. Yet, much of it remains unused, as professionals only ever analyze about 0.5% of the total.
Think about it this way—pre-election surveys, for instance, can be fairly accurate, but they don’t always correctly estimate a candidate’s final results (2016 was notorious for this).
Similarly, 0.5% of the immense volume of data in existence is undoubtedly fairly representative of reality, but it’s neither complete nor perfect. And since the role of a data scientist is to extract useful information from data, we can only imagine the magnitude of knowledge hidden within the remaining 99.5%.
Fun fact: There will be 175 zettabytes of data by 2025 (one zettabyte = 1,000 exabytes = 1,000,000 petabytes = 1,000,000,000 terabytes).
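If the unit ladder is hard to picture, this short sketch walks through it using decimal (SI) units. Applying the 0.5% share to the 175-zettabyte projection is our own back-of-the-envelope illustration, not a figure from the sources.

```python
# Decimal (SI) storage units: each step up is 1,000 times the previous one.
TB_PER_PB = 1_000
PB_PER_EB = 1_000
EB_PER_ZB = 1_000

total_zb = 175          # projected data volume cited above
analyzed_share = 0.005  # the 0.5% share cited above (applied here as an assumption)

total_eb = total_zb * EB_PER_ZB
total_pb = total_eb * PB_PER_EB
total_tb = total_pb * TB_PER_PB
analyzed_zb = total_zb * analyzed_share

print(f"{total_zb} ZB = {total_eb:,} EB = {total_pb:,} PB = {total_tb:,} TB")
print(f"Analyzed at 0.5%: {analyzed_zb:.3f} ZB")
```

Under that assumption, less than one zettabyte out of 175 would ever be analyzed.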
Every second you spend online generates data. Then, a data scientist somewhere goes through that data in search of something useful. Of course, it’s not your data in particular that interests them—it’s the data of all the relevant individuals in a given case.
This article provides but a glimpse into the work that these professionals undertake.
There are certainly many more data science facts to discuss—but they’ll require some more rummaging in the vast internet to uncover.
For now, we’ll say today’s lot will suffice.
A wayfarer by heart, Jordan fancies journeying into foreign lands with a camera in hand almost as much as he enjoys roving the online world. He spends his time poking at letters and pixels, trying to transmogrify them into something cool.