From the Office of the Provost
Episode 12: Responsible Data Science at the University of Pittsburgh
[INTRO MUSIC]
Joe McCarthy: Hello and welcome to “From the Office of the Provost,” the podcast that highlights exciting activities and initiatives in the Office of the Provost or University-wide that bolster and enhance our collective vision for growth and transformation.
I'm your host, Provost Joe McCarthy, and today I'm joined by Mike Colaresi, Associate Vice Provost for Data Science.
Mike joined the Office of the Provost in 2023 and leads the Responsible Data Science Initiative, commonly known as RDS @Pitt. He also serves as a William S. Dietrich II Chair of Political Science, a role he's held since 2017 when he came to the University.
In leading RDS @Pitt, he heads our efforts to expand human-driven data science across the University of Pittsburgh.
Welcome, Mike.
Mike Colaresi: Thank you so much. I'm really happy to be here.
McCarthy: So Mike, as you know, RDS @Pitt was created to accelerate, expand, and deepen the practice of responsible, human-driven, interdisciplinary data science across the University and the Pittsburgh community at large. Could you tell us a bit more about the goals of this initiative?
Colaresi: Absolutely. It's been such an honor to help lead this initiative, and we really do have a lot of momentum going on from the Data Science Task Force and, you know, other initiatives that predated me, so the Year of Data and Society and other parts.
So, the first thing we wanted to do is make sure people understood what is responsible data science and why is Pitt a special place for this to lead both in the country, the region, and the world.
And so, in particular, we focus on the alignment of the planning, the development, the application, and improvement of data-based and computational tools with the values of communities and organizations, but all of that is in the service of empowering positive decisions and mitigating harms.
So, we absolutely focus on applications. Our heads aren't in the clouds. Responsible data science is about people, it's about applications. It's about decisions and the data and modeling is all part of that.
And really our goal is to share that with everybody and make sure that people at Pitt understand that this is something we're great at and society needs from us, and that we absolutely have to organize in interdisciplinary ways to deliver on this promise.
That means knitting together our expertise across the University. We have great disciplines, but we also have great interdisciplinary collaborations that RDS @Pitt helps elevate and bring to the next level, but then we also have external partnerships.
And that's another dimension of interdisciplinarity that's very important to responsible data science is that we're taking into account what our students, future employers need; what society needs; what people are going to need for citizenship in the future, and we're listening all the time to how technology is changing, how needs are changing. And so we have this shared understanding that we're trying to grow and build this interdisciplinary innovation and collaboration that we're driving and we're enhancing student success throughout this.
So, it's really important for us to make sure students are included in each one of these decisions and projects that we build out so that they're working with external partners, that they're working in interdisciplinary ways, and that they're staying connected with their major and disciplinary focus that they have.
So, it's in-class learning that we're helping build new curriculum for, right, as well as out-of-class experiences that really broaden and expand how people use and know how to share responsible data science practices.
McCarthy: Excellent. Thank you, Mike. That's a great definition, I appreciate it. And I know you and your colleagues are putting RDS and what it means on the map.
Something that we hear about a lot in this space that I'd like to probe a little bit is generative AI. Can you explain how the efforts of you and your colleagues relate to what's going on with genAI?
Colaresi: Yeah, absolutely. This has been something that's really come about, like, since I started this job in 2023 that we had a lot of momentum and we were defining responsible data science and then everyone wants to talk about generative AI and how does that relate to data science? How do you be responsible with these new AI tools?
And we've thought a lot about it and one of the conclusions we came to is we're just incredibly well situated to help lead in this space because responsible data science is part of dealing with the anxiety of generative AI and AI tools that we're very comfortable with.
Like when we worry about AI tools and emerging technologies that are digital in particular, we're worried about jobs. We're worried about what value we can add, right, to particular white-collar or blue-collar tasks and what's going on.
That's what we've been focused on the whole time—the human side of better decisions and how people can work with technology to do better than people could do alone, right? Or that technology can do sort of naively running on its own.
And so, it's always about the value add, right, between those things.
Now there is some thought leadership we've done in this space and there are a couple of messages I really want to sort of get through.
Number one is: AI literacy is not going to be enough for our students. One of the things we've been able to do is partner with industries and I’m happy to talk more about our External Advisory Board that's just been crucial for what we're able to do. But what we hear from them over and over again is that things are changing at a much more rapid pace. So, what we need when we're hiring is we need people that can be entrepreneurs, they can be leaders on their own and it doesn't really matter their role.
Data and models, they're everywhere, right? AI models and their uses, or potential, they need people that can have the agency to choose, right, over different tools and their uses and their applications. They need people with confidence that are willing to experiment, like, with new technologies and see what works and what doesn't responsibly.
And they absolutely need to have accountability and not just say it's the algorithm's fault, or the dashboard told me to do this, but learn how to build those systems and engineer them, communicate with them, bring teams together, and you know, actually field new stakeholder groups, right, that can help us do better with these tools and technologies.
We call this digital leadership. That's what we're going for. That's what we're organizing in kind of the AI and education space around responsible data science, because it's about the use of the tools, the alignment of these tools to the decisions you want to make, the principles, and the values.
And I just, we're uniquely positioned, I think, to do this both at Pitt, right, and through the Office of the Provost.
McCarthy: So, as we discussed, RDS is focused on use-based, responsibly applied data science, so it's critical that we engage with industry leaders and other external stakeholders.
And in advancing this mission, you just alluded to the Advisory Board. I'd like you to tell us a little bit more about them and how they are connected with the project.
Colaresi: Absolutely. One of the first things we did as we read over the Data Science Task Force—we saw the successes that Nora Mattern was able to create for the Year of Data and Society—was Pitt has this unique place of being practical.
It has the unique place of having partnerships with industries, with governments, with nonprofits both in the region, right, and more general. And we knew we needed their expertise.
We wanted to be the opposite of an ivory tower. We absolutely wanted to be a welcoming and listening place, and we wanted to serve their interests because I think that serves our students’ interest.
So, with the help and co-leadership of Andrew Hannah, who's an adjunct faculty member in business but also a very successful local entrepreneur, we got sort of down to brass tacks, like, who do we partner with, right?
Absolutely, UPMC. But they're very generous. They're like, “you need to bring Highmark along too.” We need the health industry involved because our students, you know, are getting jobs. We have research partnerships with both organizations.
So, we partner also with Dick's Sporting Goods, Sheetz, right, in both the operations and retail space. Data and modeling is everywhere. It's not just technology companies. And they've been amazing partners to us.
FedEx, right, also is a partner and on our External Advisory Board.
First National Bank, PNC, PPG, City of Pittsburgh and Hillman Foundation also to give us that broader perspective, right, both in sort of civic engagement on data and models, but as well, the Hillman Foundation has been crucial for us in having a sort of macro picture of the region, how they see the workforce moving forward.
So, we get all of these different voices, right, telling us how they see technology impacting their business needs. And we meet quarterly; they've been an incredibly active and engaged board, right?
So, we just have this deep alignment of interests that intersects with students. It intersects with learning, but at the base of it, right, is they see us as providing the most valuable resource that they have, right, which is the workforce of tomorrow and it's just been a real joy to work with them on that.
McCarthy: That's, that's really fantastic to have that degree of collaboration, that degree of exposure, both for our faculty and our students with external stakeholders and partners. Probably should serve as a model, frankly, for a lot of what we're doing at Pitt.
Could I turn the tide a little bit and ask you to focus a little bit on internal partners? What are some examples of how RDS has impacted those actually within Pitt's walls?
Colaresi: Yeah, another mantra for us is, don't reinvent the wheel because Pitt has so many amazing engines of innovation and training. And so we kind of operate in two modes.
One is a connector, and in particular we know the Department of Biomedical Informatics. We have an amazing Center for Computational Drug Abuse Research, right, for example. And so, when they have needs, like, we work with them, like, absolutely.
And in particular, for these very high-functioning, high-performing places that are absolutely doing world-class innovative work, some of the things we help them do is leverage multi-scale collaborations where if they're doing, let's say, cellular or molecular work, right, we help them think about like what are the societal consequences of the drug they're discovering.
Will people use it? What are their concerns going to be? What are the policy implications of getting that drug to market as quickly as possible? How do we simulate the larger macro public health effects, right, of these drugs in, you know, like a specific community and context? And it's just been amazing working with these groups, right, to be able to do that.
But computational drug abuse research in particular, we know that addiction is a very, very important problem in our region, right, and in the country. And so being able to work with people on drug discovery, with social work, right, and adoption and dealing with these problems and people showing the value of a University.
And bringing these conversations to the fore and actually helping people on something that is salient and important for them, it's just amazing. And I don't know that it's possible anywhere else. I don't go to researchers at Pitt and I never hear, “well, I'm just doing basic research. The payoff will be in 20 or 30 years.” I don’t hear that.
What I hear is the passion behind their work of the everyday real-world problems and the people they know in the community. And it's inspiring. It's absolutely amazing.
And then another great set of partners is the Mascaro Center for Sustainable Innovation under the leadership of Melissa Bilec and Gena Kovalcik. They have been amazing partners on sustainability.
Again, like multi-scale, they're doing material science on exactly like what type of steel, you know, what type of concrete and infrastructure a building should be made out of to be sustainable. But then they also think about use. They think about how people are actually going to be using infrastructure and buildings.
They help inform what we do at Pitt in infrastructure and they help inform, you know, what industries do for their business. And we can be talking about saving billions of dollars potentially on energy use for sustainability.
But what they do, it kind of goes from quantum to community, and it includes every part of that stack. And again, it's incredibly inspiring to work with them on that.
And then I'd also like to highlight the Western Pennsylvania Regional Data Center, which really takes it to a community focus.
So, they have years and years of experience working with community organizations, the City of Pittsburgh, Allegheny County. This is actually a partnership that sort of came from Allegheny County and the city saying, “Pitt, we know you're a great partner. Can you help build this infrastructure so we can have better visibility into data and models?”
But decision makers can actually access that in context, so they were on the forefront of building interactive dashboards and visualizations that sort of showed you the data, but in context. So, if you were seeking out affordable housing, right, you could get not just, you know, the industry real estate perspective, right. You could get, like, where is the city policy and affordable housing? And where is the affordable housing stock in Pittsburgh, for example, in sort of geospatial terms that let you as the user navigate in ways that empowered you. Another thing that the city thought is important: viewing traffic patterns, right, to help save lives.
So, working with the city, the Department of Mobility and Infrastructure, right, to say, like, where are we having problems and how can we fix them? Right? And do better.
And they help build data and modeling tools to do that. But always they start with the stakeholders and that's actually been a model for us across these teams is that they start with the real-world problems.
And then they design, discover, and engineer to help solve those problems. And that's quite frankly, what we try to do now in all of our work at RDS @Pitt is take inspiration, right, from that in how we work and solve problems.
McCarthy: So it sounds like we have no lack of really interesting interdisciplinary problems to solve. How can students get engaged in this work?
Colaresi: So, I think there's a couple ways. So, first, we started the RDS Scholars program, and so this was where students from across the University applied and we really wanted to make sure they understood that data science wasn't just technical. That this was absolutely about anyone that, in their career and that's just about everybody, is going to interact with data, models, algorithms, dashboards and need to understand it.
And so, we wanted students on the ground floor of understanding frameworks for not just how to think about the ethics of data and modeling in the abstract, but practically. If we bring you a real-world problem, how do you show your employer that you're analyzing things responsibly? How do you illustrate your critical thinking with and beyond data? How do you show them you've collaborated with the right stakeholders?
They not only know the technical ins and outs, they know the people that are going to be influenced by this positively and negatively. What have you done to mitigate risks and communicate those risks and uncertainties in downstream decisions?
So, this RDS Scholars program, Nora Mattern led it with Kendra Oliver and Lisa Parker, also. Seventeen students working for the last two semesters presenting their work on applied problems at Data Science Day. I think almost all of them have internships that, you know, came from their work with this and other things they've been doing, and it's just been an inspiration. We're going to do that again.
So, look for a call for RDS Scholars.
We've also tried to empower out-of-class research opportunities for students. So, we have communities of practice in sustainability, addiction, and opportunities in data science that we're working at. And so people can get in touch with us to actually help find them out-of-class research opportunities related to responsible data science.
We have our own projects. So, for example, we're building curriculum in responsible data science. And so, we have several students that are actually working on next-generation user interfaces for us on how do we teach responsible data science in the age of AI when these AI tools are changing all the time? So, these students are like on the ground floor of like, what sort of the next generation of technology is going to be delivering things. And they’re amazing.
Another set of students is actually working on, so we like to call them, like, crash videos. So cautionary tales of naive data analysis. So, the use of AI without responsibility.
So we're sort of collating, right, these stories, both from industry partners, right, and other people and actually putting some like meat on those bones, actually how could we have avoided the crash? And then building curriculum around that related to the Master of Data Science and other curriculum.
So, we have these needs both internally where we're partnering with students right on how to do this work and what's been valuable. Three of them just gave amazing presentations yesterday on their work this semester and it was— our students are incredible. What they can do is incredible, and so another thing we think about is, don't think they can't do it. The things that our students can accomplish both within a project and with external partners, it really is, it's limitless particularly because they have connections to their resources.
I would say the RDS Scholars, it was 17 students. We wanted to reach more. We got over 100 applications, right, for the first annual RDS Scholars program. And we were like, we can't turn these people away. They're engaged and excited about responsible data science and their place in it, and, you know, part of our mission goal of having shared understanding of RDS is that everyone belongs, and it seemed like the wrong message to be like, “thanks, just apply next year.”
So we created a membership program and we created what we're calling "server-side chats," where we're bringing in industry partners mostly from our External Advisory Board, but we'll expand that next year too.
And we invite everybody to be members and then we've been having these server-side chats where we both have a formal question and answer, like, how do I get a job in industry? If you're interested, you know, what are the things you would want me to do as a student, you know, right now to be prepared to be part of your organization?
And then informal connections, right? So there's actually, the students get to meet these people that we're bringing in and they're amazing.
What was surprising about that, also, is graduate students started showing up. So even though we were targeting undergrads, there was this need in the graduate space also for this connection to responsible data science. So we're going to expand the Scholars program to graduate students next year.
McCarthy: With 100 applications for 17 spots, I know you don't need any free advertising, but I love to hear about our students. So can you tell me some highlights of the RDS Scholars' work at Data Science Day this past March?
Colaresi: Yeah, absolutely. You know, one of the things that has been very exciting to be part of is Data Science Day, and RDS Scholars was this new project but we always knew we wanted to intersect it with Data Science Day.
So Data Science Day was something we did for, 2024 was the first time. We did it as a partnership with the School of Computing and Information where we just wanted to celebrate data science in person, you know, with people, right?
With all of these amazing centers and bringing them all together because one thing we did learn is that there's not necessarily the visibility across Pitt. We're doing so many cool things.
Doing cool things is hard. It takes a lot of time. And so, we needed to be intentional about how we brought people together, and Data Science Day was just convening and it was amazing. Like it was great last year and so RDS really leaned into it this year of like, what if we had a year of planning, right, to do this, can we do even more and in particular, how can we increase student engagement with this?
So RDS Scholars brought projects and had posters, and then we had people discuss it. We had an award, right, for best poster celebrating, you know, AI-driven innovation but how to do it responsibly.
And I think, you know, we had 150 people show up at this convening at Data Science Day, which is amazing. 28 posters.
So you wanted some highlights of these. Like so, there was an amazing one on AI in the workforce, and in particular they were working with a computational social science scholar, Morgan Frank, who's in the School of Computing and Information.
And our Advisory Board, like, loved this because they were focused on, like, what do we do to prepare specifically? How do we use data to inform these trends in industry? Should I be worried as a software engineer, right? Like as a computational social scientist, am I good because I know how to balance the people with the technical? Or is something coming for that also and really like, what's the data, right, behind these trends? So that was an amazing one.
There was a number of students that were interested in the business use cases in particular. And so, like, how to create value for operations research, right? Let's say in new algorithms and things and how that can add value.
Another one I loved was actually a humanities poster basically about what's the role of the humanities in the age of AI, right, and how it can be a really strong force, right, to like how we add value to that? It was practical, but it was inspirational too.
And these students, sort of A-Z, did amazing work, and it even went beyond the posters.
We ended the day with sort of what we called "solution sprints," where we let our Advisory Board say, like, what problems do you have? And the scholars came to that, right.
And some of them, you know, it wasn't even on their poster, but they were proposing solutions to the head data scientist, right, at Highmark, right, like on how to solve problems and really impressing with like what they could contribute and what they could do. So that was really, really exciting for us.
McCarthy: That sounds like a lot of fun. I'll be sure to get that on my calendar for next year.
And honestly, the poster on training and curriculum—we might have to share with the Provost Advisory Committee on Undergraduate Programs.
So I do like to close by asking, you know, we obviously just scratched the surface here on what's going on with RDS @Pitt, so how can people learn more and stay connected with the project?
Colaresi: Yeah, we do have a website that's a welcoming front door to Responsible Data Science at Pitt. So that's datascience.pitt.edu.
So, we have a LinkedIn profile and so you can follow us on LinkedIn where we keep up to date on events and other convenings.
There's a newsletter that people can subscribe to on the website. We call it RDS Connects, where again, it's events, but also like job opportunities, like, we're posting in there.
And then, like, we have communities of practice. And so, they're on the website.
There's leaders that you can also connect to that might be in your discipline or connected disciplines, so it doesn't just have to be connecting through the Office of the Provost. So we have faculty members listed in engineering, in pharmacy, in the Dietrich School, in the School of Computing and Information, in Business, right. So, all of these places you can connect with.
And we're building, like, working groups. And so, if you're faculty or staff and listening also, we're always looking for new opportunities to partner and grow our network, really help make previous innovation sustainable.
The other thing is people bring us ideas, like, “we don't have these connections yet, but I know we could do this cool thing.” I think, Joe, you and I have talked about, like, academia runs on coffee and jealousy, right? And so people have seen this thing that they, they know we could do better here, they know. And after a cup of coffee, they e-mail me, right, and say, like, “how do we do this? How do we bring these people together?” So e-mail me those things, e-mail me the things that are, you know, going to be the next exciting thing we talk about the next time I'm on the podcast.
McCarthy: We strategically scheduled this taping, of course, after we had our coffee, and hopefully our listeners are enjoying theirs and getting ready to e-mail you as we speak.
But thank you, Mike, for joining me today and sharing the great work that you and your team and colleagues, both within and outside of Pittsburgh, have been doing.
And as usual, thank you listeners for tuning in. Again, I'm Provost Joe McCarthy and this has been “From the Office of the Provost.”
[OUTRO MUSIC]