Best minds in ‘big data’ work to harness its power

Jeff Hammerbacher, a math genius who worked at Facebook in its early days, is famous in data science circles for this quote: “The best minds of my generation are thinking about how to make people click ads. That sucks.” That’s not the case at the UW. Here, the best minds are collaborating to ask questions and harness the power of “Big Data” to find answers and seek solutions that advance the common good.

What does “Big Data” mean? Off the coasts of Washington and Oregon, the UW is deploying a sensor network with 750 miles of fiber-optic cable on the seabed to continuously stream massive amounts of data to ocean scientists. They will use it to study, for example, the processes that regulate global climate, store human-caused carbon, support major fish stocks and threaten coastlines.

In space science, a powerful telescope is helping to create an extraordinarily detailed, time-varying 4D map of the universe that astronomers can study and the public can access. In the social sciences, UW researchers are using huge data streams disgorged by Twitter to study communication trends and how social media is used during disasters. In the life sciences, interdisciplinary teams of neuroscientists, biologists and computer scientists are analyzing massive data streams from insects to understand the brain-muscle interface.

These are just a few examples of the “Big Data” trend that is changing how discovery takes place across the widest imaginable range of disciplines. The rapid advance of technology is giving researchers more and better data than ever before. The challenge now is to analyze these huge data sets and extract understanding that can be turned into knowledge, insights and solutions to problems.

Three universities—the University of California at Berkeley, the University of Washington and New York University—are partnering with each other, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation to learn how best to harness the full potential of all this data. The universities have received a five-year, $37.8 million award to speed the growth of data-intensive discovery in a range of fields.

Why this particular academic troika? The foundations considered universities across the country and concluded that these three are national leaders both in advancing the methods of data-intensive discovery and in building the partnerships that put these methods to work for researchers. The UW’s eScience Institute was doing “Big Data” even before the term was coined. Founded in 2008 to support faculty research, the eScience Institute laid much of the groundwork for the grant award.

At the UW, the grant will be used primarily to fund salaries for new research positions, including five data scientists who specialize in software and will work with researchers across campus, four postdoctoral data science fellows pursuing interdisciplinary research agendas, and four partially funded research scientists stationed in other departments and centers.

A dedicated “data science studio” on campus will have meeting areas and drop-in workspaces to encourage collaboration across the UW’s colleges and schools. The new endeavor means people facing data analysis problems will have access to new tools, new techniques and hands-on help in solving them.

One reason getting these data scientists in place at the UW is so important is that many researchers still lack the knowledge needed to make full use of these big data streams. The project aims not only to foster even greater collaboration with researchers across campus but also to shift the culture of the university.

“We refer to data-intensive discovery as ‘the fourth paradigm,’” says Ed Lazowska, Bill & Melinda Gates Chair in Computer Science & Engineering and Director of the UW eScience Institute. “For centuries, discovery was driven by observation and experimentation. Then theory—a second paradigm—was added. For the past 50 years we’ve had a third paradigm: computational science. And now a fourth paradigm has been added, which we can already see is going to have enormous impact.”

Lazowska describes several challenges. Researchers are confronted with a tsunami of data that is rapidly growing in volume, velocity and variety. Even the best researchers often lack the expertise to effectively move “from data to knowledge to action.” The tools and techniques are evolving rapidly. New partnerships are required. New approaches to educating the next generation of researchers are needed as well, and these will be supported by a major new Integrative Graduate Education and Research Traineeship (IGERT) award to the UW from the National Science Foundation. Finally, new career paths must be created for the individuals who build the tools that enable this new approach to discovery. The grant and partnership with Cal, NYU and the foundations, plus the NSF IGERT award, are allowing the UW to take a big step toward enabling this “fourth paradigm” and ensuring that the UW expands its role as one of the world’s powerhouse research universities.

Big data before it was cool

The recent $37.8 million grant, the academic partnerships and the NSF IGERT award are just the latest milestones in the UW’s rich history of data-driven discovery. A few highlights include:

1970 > Department of Biostatistics established • This pioneering program was developed as part of the creation of the School of Public Health. It is now ranked third nationally among biostatistics departments.

1979 > Department of Statistics formed • This broader-scoped program is ranked sixth among statistics departments.

1999 > Center for Statistics and the Social Sciences launched • The first center of its kind in the country, it fosters collaboration between statisticians and social scientists and offers innovative case-based curricula for students.

2008 > eScience Institute founded • These experts in data mining, machine learning and sensor networks serve as matchmakers, helping researchers apply the most appropriate technology to their research. The eScience team consists of individuals with backgrounds in physics, astronomy, bioengineering, bioinformatics, data management and computer science.

Big data and you

Interested in joining the data revolution? More than 7,000 people have already completed Introduction to Data Science, an online class taught by Bill Howe, Director of Research for Scalable Data Analytics at the UW eScience Institute. For those with some background in programming and databases, the course provides the basics for applying big-data techniques in the workplace. If that’s not enough, the UW also offers a three-course Certificate in Data Science through the University of Washington Professional and Continuing Education program. The certificate includes instruction by data scientists from Microsoft and other local tech companies, networking opportunities with peers and case studies from the ‘front lines.’