Blog - The Big Data Landscape

Interview With Wim De Waele, CEO of iMinds

I recently interviewed Wim De Waele, CEO of iMinds, at the iMinds Conference in Ghent, Belgium. Caveat: this post isn’t specifically related to Big Data, but I wanted to share it with you because it’s the kind of interview I enjoy doing. Wim and his team put on an event sponsored by major media companies and European governments. In fact, the conference was made up of over 17 events, with more than 4,000 attendees in total.

Top 8 Laws Of Big Data

These are the Top 8 Laws of Big Data, based on hundreds of discussions with Big Data insiders.

1. The faster you analyze your data, the greater its predictive value. Companies are moving away from batch processing toward real-time processing to gain a competitive advantage.

2. Maintain one copy of your data, not dozens. The more you copy and move your data, the less reliable it becomes (the banking crisis is one example).

3. Use more diverse data, not just more data. More diverse data leads to greater insights. Combining multiple data sources can lead to the most interesting insights of all.

4. Data has value far beyond what you originally anticipate. Don’t throw it away.

5. Plan for exponential growth. The number of photos, emails, and IMs, while large, is limited by the number of people. Networked “sensor” data from mobile phones, GPS, and other devices isn’t bound by that limit, so it is much larger.

6. Solve a real pain point. Don’t think of Big Data as a stand-alone, shiny technology. Think about your core business problems and how to solve them by analyzing Big Data.

7. Put data and humans together to get the most insight. More data alone isn’t sufficient. Look for ways to broaden the use of data across your organization.

8. The focus in IT has shifted from Technology to Information. Companies that fail to leverage the numerous internal and external data sources available will be leapfrogged by new entrants.

6 Insights From Facebook’s Former Head Of Big Data

Ashish Thusoo knows a lot about Big Data. Thusoo joined Facebook in 2007 when the company had 50 million users. He left when it had some 800 million. During that time he managed Facebook’s internal data analytics team.

Facebook’s analytics team managed the data and analytics for ad targeting, user growth, and user engagement. Now Thusoo has a new company, Qubole, which is building a Big Data platform in the cloud.

Thusoo’s insights have a single overarching theme: the democratization of data. By this he means opening up data analytics to all users in an organization, from data scientists to product engineers and business analysts.

Here’s what Thusoo learned while scaling the data analytics engine at Facebook:

1. New technologies have shifted the conversation from “what data to store” to “what can we do with more data.” The lower comparative cost of open source technologies like Hadoop and Hive makes it possible to gather more key measurements. In the case of Facebook and other Internet properties, that means gathering a lot more data on user activity and behavior.

This reduction in cost also enables more historical data to be online. “The result,” says Thusoo, “is better data driven applications. At least in the data world, simple algorithms on more data seems to yield better results than complex algorithms on a smaller data sample, notwithstanding some exceptions.”

2. Simplify data analytics for end users. Put another way, what Thusoo learned at Facebook was that there “was a lot of power in democratizing data for data users” such as scientists, analysts, and engineers.

His goal was to make all capabilities related to data easy, from instrumenting applications and collecting data, to understanding and analyzing it, to creating data driven applications.

Building “familiar interfaces” and tools for working with data was key to increasing the adoption of underlying technologies like Hadoop and Hive within Facebook.

3. More users means data analytics systems have to be more robust. Democratizing data among Facebook’s “data scientists, analysts and data engineers made things harder,” says Thusoo.

To realize that vision, Thusoo’s team had to design in the ability to handle poorly written queries so they wouldn’t crash the system. They had to build mechanisms for sharing resources fairly, including usage monitoring and limits.

“We had many different kinds of users ranging from business analysts to product engineers with varying levels of understanding of the infrastructure or the best practices of using it.”
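Thusoo doesn’t spell out how Facebook implemented this, but to make “sharing resources fairly, including usage monitoring and limits” a bit more concrete, here’s a minimal Python sketch of a simple admission controller. It assumes three hypothetical knobs: a cluster-wide cap on concurrent queries, a per-user cap, and a maximum estimated scan size used to turn away runaway queries before they start. The class and parameter names are illustrative only; this is not Facebook’s, Hadoop’s, or Hive’s actual mechanism.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FairShareAdmission:
    """Hypothetical admission controller: caps concurrent queries per user
    and records per-user usage so it can be reported back to users."""
    total_slots: int      # cluster-wide concurrent query slots (assumed knob)
    per_user_limit: int   # max concurrent queries for any one user (assumed knob)
    max_scan_bytes: int   # reject queries estimated to scan more than this (assumed knob)
    running: dict = field(default_factory=lambda: defaultdict(int))
    usage_seconds: dict = field(default_factory=lambda: defaultdict(float))

    def try_admit(self, user: str, estimated_scan_bytes: int) -> bool:
        # Turn away likely runaway or poorly written queries up front.
        if estimated_scan_bytes > self.max_scan_bytes:
            return False
        # No free slots anywhere in the cluster.
        if sum(self.running.values()) >= self.total_slots:
            return False
        # This user already holds their share of slots.
        if self.running[user] >= self.per_user_limit:
            return False
        self.running[user] += 1
        return True

    def finish(self, user: str, elapsed_seconds: float) -> None:
        # Release the user's slot and record how long the query ran.
        if self.running[user] <= 0:
            raise ValueError(f"{user} has no running queries")
        self.running[user] -= 1
        self.usage_seconds[user] += elapsed_seconds

    def usage_report(self) -> dict:
        # Per-user totals, largest consumers first.
        return dict(sorted(self.usage_seconds.items(),
                           key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    adm = FairShareAdmission(total_slots=3, per_user_limit=2, max_scan_bytes=10**12)
    print(adm.try_admit("analyst", 10**9))    # admitted
    print(adm.try_admit("analyst", 10**9))    # admitted
    print(adm.try_admit("analyst", 10**9))    # rejected: per-user limit
    print(adm.try_admit("engineer", 10**13))  # rejected: scan too large
    print(adm.try_admit("engineer", 10**9))   # admitted: last free slot
    print(adm.try_admit("scientist", 10**9))  # rejected: cluster full
    adm.finish("analyst", 42.0)
    print(adm.usage_report())
```

The per-user cap keeps any one heavy user from crowding everyone else out, and the usage report is the kind of feedback that tells users how much they’re actually consuming. A real system would also need queuing, priorities, and better cost estimates than a single scan-size number.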

4. Social networking works for Big Data. “We invested in making our tools more and more collaborative so that users could share analysis with each other and discover data by getting connected to expert users of a data set.”

With Facebook’s hyper-growth and data that was changing all the time, a collaboration approach “worked better than creating knowledge bases around metadata.”

5. No single infrastructure can solve all Big Data problems. When it came to real-time reports, Thusoo’s team made “a lot of investment as we discovered use cases… better solved through systems other than Hadoop. In the case of real time reports our team invested in building out Puma. There were many other examples around graph analysis as well as low latency data inspection on large data sets,” where they had to build or invest in new technologies.

6. Building software is hard, but running a service is even harder. Thusoo’s team had to do a lot of work to make the service usable. They invested a lot of time and energy in building “systems that would measure usage, point out bottlenecks and really quantify for our users how much they were using” the system. They also had to build capabilities to monitor and deliver on agreed-upon service levels.

The Big Data Landscape Video

David Feinleib presenting The Big Data Landscape as part of the Big Data Trends presentation at Perfect Storm 2012.


This is the Big Data Landscape. Other than the fact that there are lots of logos on the screen, the interesting thing here is that there’s been tremendous investment in the infrastructure area. Think about companies like Cloudera, Hortonworks, and so on. These companies have raised hundreds of millions of dollars in venture capital.

So I think this is really where the venture interest has been for the last few years. The next few years are all about Big Data applications, especially in areas like operational intelligence (think of companies like Splunk). There are also lots of opportunities in sales and marketing. Much of the investment going forward will go into Big Data apps and tools that enable people to access Big Data.

Big Data Demystified: The Book Is Coming Soon

If you’ve enjoyed The Big Data Landscape and Big Data Trends, you’ll love Big Data Demystified. My new book will be out in the spring and will cover many aspects of Big Data. Look for it soon on Amazon!

Unlike other books in the space, this will not be a primarily technical book. My goal is to take the difficult-to-approach topic of Big Data and open it up to everyone. Big Data Demystified is a book about how data is impacting our daily lives.