A Big Data TV interview with Director of Data Sciences Ofer Mendelevitch from Hortonworks. Ofer talks Big Data, Yahoo!, and the origins of Hadoop.
Dave Feinleib: Dave Feinleib here with another episode of Big Data TV. I’m speaking with Ofer Mendelevitch. Ofer, great to have you on the show.
Ofer: Thanks for having me.
Dave: We’re here at Big Data Date Night at the Microsoft campus in Mountain View, California. Ofer, tell us a little about Hortonworks: what the company does and, for those who don’t know, what Hadoop is.
Ofer: Hortonworks is one of the providers of Hadoop. Hadoop is an open source platform for data management and large scale computation that scales horizontally, and can process big data, lots of big data, relatively inexpensively and easily.
Dave: A little about the history of the company. You started out at Yahoo, and you sort of knew this technology really early on. Talk a little about that.
Ofer: What happened was, in about 2005 what is now known as Hadoop started in Yahoo. Yahoo hired the first person who started this project and built a team around Hadoop, essentially moving it over into an Apache open source project. Then Yahoo was really a big user, and still is one of the largest users, of Hadoop, and they started investing in it more and more, progressing the technology. Hortonworks spun out of Yahoo around 2011, took the core team of developers in the Hadoop group, and built essentially a company around it. Today Hortonworks provides HDP, which is the Hortonworks Data Platform. We are completely one hundred percent open source. Everything we do goes back into open source. Our business model really is about support for Hadoop. For enterprises, we do that really, really well.
Dave: Great. You have a partnership with Microsoft where you work very closely with them on Azure. Talk a little about that.
Ofer: We have a very strong partnership with Microsoft. The partnership is really an engineering partnership, meaning that a big team of engineers on both sides worked really, really well together to make sure Hadoop can run in the Windows environment, can be managed from the Windows management systems, etc. It’s a way to help Hadoop as well as Azure run on these platforms.
Dave: What are some of the key challenges you’re seeing? You’re out there talking to enterprise customers. What are some of the big challenges they have, and how are you helping them overcome them?
Ofer: One of the things I’m seeing really strongly, having had the experience of Yahoo using Hadoop really early on in a web company, is that there’s a lot of value that gets generated for an enterprise that uses it. I think Hadoop was not really ready until maybe a year ago or so to be used in an enterprise environment. Now it’s almost like enterprises can take advantage of all that value. The community solved a lot of the problems that enterprises needed to have solved, like security, snapshots, a lot of different management tools, and things like that. It’s almost ready for enterprise, and they can really leverage big data using this platform.
Dave: That’s great. Security, snapshots. What are some of the other key insights or innovations that you’ve seen that enterprises really have to have to adopt this?
Ofer: A lot of it is the management tools. Hortonworks has launched Ambari, which is an open source project, part of the Apache Hadoop ecosystem, to help take care of the dev-ops, the people who manage the cluster. You’ve got to take care of the developers. There are a lot of tools that we keep innovating on, Pig and Hive too. There’s Hadoop 2.0, which is coming out soon. A lot of these tools are really what the enterprises need. In addition, security was done last year, and it’s pretty much solved.
Dave: Ofer, thanks so much for being on the show.
On this episode of Big Data TV, David Feinleib interviews Chris Pouliot, the Director of Analytics and Algorithms at Netflix. They talk about the tools, technology, and business of making great recommendations.
Dave Feinleib: Hi, I’m Dave Feinleib here with another episode of Big Data TV. We’re here at Big Data Date Night. My guest is Chris Pouliot, Director of Algorithms and Analytics at Netflix. Chris, welcome to the show.
Chris Pouliot: Oh, thanks so much.
Dave: Chris, tell us a little bit about what you do at Netflix.
Chris: I run a team of data scientists. It’s a horizontal data science team that spans across all of the business verticals. We’re trying to derive insights from Big Data, and help to make better business decisions, or help create a better user experience for our subscribers.
Dave: We’re all familiar of course with Netflix recommendations on the movies. What are some of the hard challenges you’ve faced in working with movies and data, and large numbers of users at Netflix?
Chris: Pretty much it’s just how do we get the data. Working with big data technology: our data’s in the cloud and in Hive. Also figuring out how do we perform analytics on the Big Data. Exploring ways to use distributed machine learning algorithms in the cloud. What are the technologies that are out there, and how do we best utilize those to make our data science as productive as it can be?
Dave: What are some of the challenges that you’ve had to overcome? Is it scale issues, is it diversity of data? What are some of the issues that you grapple with?
Chris: It’s a little bit of everything. Diversity of data: my team does not only personalization for movies, but we also deal with content demand prediction, helping our buyer down in Beverly Hills figure out how much we pay for a piece of content. The personalization recommendations for helping users find good movies and TV shows. Marketing analytics: how do we optimize our marketing spend? Streaming platform: how do we optimize the user experience once I press play? There’s a wide range of data, so there’s a lot of diversity. We have a lot of scale, a lot of challenging problems. The question then is, how do we attract great data scientists that can just see this as a playground, a sandbox of really exciting things? Challenging problems, challenging data, great tools, and then just the ability to have fun and create great products.
Dave: That’s terrific. Are you a big movie buff yourself?
Dave: Any favorite movies we should know about from your viewing preferences?
Chris: Sure. Netflix is getting into creating original programming. Our first TV series is House of Cards starring Kevin Spacey with David Fincher producing it. Excellent show, I highly recommend it.
Dave: Excellent. Chris, thanks so much for being on the show, great to have you.
Chris: Thanks so much.
On this episode of Big Data TV, SurveyMonkey Director of Analytics Fedor Dzegilenko talks about Big Data, relational databases, and how to make a great survey.