Hey Siri: How Much Does This Galaxy Cluster Weigh?

Aug 25, 2022

It's been nearly a century since astronomer Fritz Zwicky first calculated the mass of the Coma Cluster, a dense collection of almost 1,000 galaxies located in the nearby universe. But estimating the mass of something so huge and dense, not to mention 320 million light-years away, has its share of problems — then and now. Zwicky's initial measurements, and the many made since, are plagued by sources of error that bias the mass higher or lower.


Now, using tools from machine learning, a team led by Carnegie Mellon University physicists has developed a deep-learning method that accurately estimates the mass of the Coma Cluster and effectively mitigates the sources of error.


"People have made mass estimates of the Coma Cluster for many, many years. But by showing that our machine-learning methods are consistent with these previous mass estimates, we are building trust in these new, very powerful methods that are hot in the field of cosmology right now," said Matthew Ho, a fifth-year graduate student in the Department of Physics' McWilliams Center for Cosmology and a member of Carnegie Mellon's NSF AI Planning Institute for Physics of the Future.


Machine-learning methods are used successfully in a variety of fields to find patterns in complex data, but they have only gained a foothold in cosmology research in the last decade. For some researchers in the field, these methods come with a major concern: Since it is difficult to understand the inner workings of a complex machine-learning model, can they be trusted to do what they are designed to do? Ho and his colleagues set out to address these reservations with their latest research, published in Nature Astronomy.


To calculate the mass of the Coma Cluster, Zwicky and others used a dynamical mass measurement, in which they studied the motion or velocity of objects orbiting in and around the cluster and then used their understanding of gravity to infer the cluster's mass. But this measurement is susceptible to a variety of errors. Galaxy clusters exist as nodes in a huge web of matter distributed throughout the universe, and they are constantly colliding and merging with each other, which distorts the velocity profile of the constituent galaxies. And because astronomers are observing the cluster from a great distance, there are a lot of other things in between that can look and act like they are part of the galaxy cluster, which can bias the mass measurement. Recent research has made progress toward quantifying and accounting for the effect of these errors, but machine-learning-based methods offer an innovative data-driven approach, according to Ho.
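To see why the motions of member galaxies encode the cluster's mass, consider the virial theorem, which relates a bound system's velocity dispersion and size to the gravitating mass, roughly M ~ σ²R/G. The sketch below is a back-of-the-envelope illustration of this classical dynamical estimate, not the authors' pipeline; the Coma-like numbers (σ ≈ 1000 km/s, R ≈ 2 Mpc) are assumed round values for illustration.

```python
# Illustrative virial mass estimate (not the authors' method):
# M ~ sigma^2 * R / G, where sigma is the line-of-sight velocity
# dispersion of member galaxies and R is a characteristic radius.
# Order-unity geometric prefactors are dropped.
G = 4.301e-9  # gravitational constant in Mpc * (km/s)^2 / M_sun

def virial_mass(sigma_km_s, radius_mpc):
    """Order-of-magnitude dynamical mass in solar masses."""
    return sigma_km_s**2 * radius_mpc / G

# Rough Coma-like inputs (assumed for illustration):
mass = virial_mass(1000.0, 2.0)
print(f"{mass:.2e} solar masses")  # of order 1e14 to 1e15
```

Interloping foreground and background galaxies inflate the measured dispersion σ, and mergers distort it, which is exactly why this simple estimate is biased in practice.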


"Our deep-learning method learns from real data what are useful measurements and what are not," Ho said, adding that their method eliminates errors from interloping galaxies (selection effects) and accounts for various galaxy shapes (physical effects). "The usage of these data-driven methods makes our predictions better and automated."


"One of the major shortcomings with standard machine learning approaches is that they usually yield results without any uncertainties," added Associate Professor of Physics Hy Trac, Ho's adviser. "Our method includes robust Bayesian statistics, which allow us to quantify the uncertainty in our results."


Ho and his colleagues developed their novel method by customizing a well-known machine-learning tool called a convolutional neural network, which is a type of deep-learning algorithm used in image recognition. The researchers trained their model by feeding it data from cosmological simulations of the universe. The model learned by looking at the observable characteristics of thousands of galaxy clusters, whose mass is already known. After in-depth analysis of the model's handling of the simulation data, Ho applied it to a real system — the Coma Cluster — whose true mass is not known. Ho's method calculated a mass estimate that is consistent with most of the mass estimates made since the 1980s. This marks the first time this specific machine-learning methodology has been applied to an observational system.
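As a heavily simplified sketch of the idea, not the published model: one natural input for such a network is the binned line-of-sight velocity distribution of a cluster's galaxies, over which convolutional filters extract local features that a readout layer maps to a mass estimate. The bin count, filter width, and random (untrained) weights below are all assumptions for illustration.

```python
import numpy as np

# Toy sketch of a CNN-style mass estimator (assumed architecture,
# not the authors' network): histogram member-galaxy velocities,
# apply a 1D convolution, then a linear readout to predicted log-mass.
rng = np.random.default_rng(0)

def velocity_histogram(velocities, bins=32, vmax=3000.0):
    """Bin line-of-sight velocities (km/s) into a fixed-size input."""
    hist, _ = np.histogram(velocities, bins=bins, range=(-vmax, vmax))
    return hist / max(hist.sum(), 1)  # normalize to unit sum

def conv1d(x, kernel):
    """Valid-mode 1D convolution, the core CNN operation."""
    return np.convolve(x, kernel, mode="valid")

# Toy "cluster": the velocity dispersion sets the histogram width,
# which correlates with mass through the virial relation.
velocities = rng.normal(0.0, 1000.0, size=500)
x = velocity_histogram(velocities)

kernel = rng.normal(size=5)        # one convolutional filter (random here)
features = conv1d(x, kernel)       # feature map over velocity bins
weights = rng.normal(size=features.shape)
log_mass_prediction = features @ weights  # untrained linear readout
```

In training, the filter and readout weights would be fit against thousands of simulated clusters whose true masses are known, which is what lets the network learn which features of the velocity distribution are informative and which are contamination.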


"To build reliability of machine-learning models, it's important to validate the model's predictions on well-studied systems, like Coma," Ho said. "We are currently undertaking a more rigorous, extensive check of our method. The promising results are a strong step toward applying our method on new, unstudied data."

Models such as these are going to be critical moving forward, especially when large-scale spectroscopic surveys, such as the Dark Energy Spectroscopic Instrument, the Vera C. Rubin Observatory and Euclid, start releasing the vast amounts of sky data they are collecting.


"Soon we're going to have a petabyte-scale data flow," Ho explained. "That's huge. It's impossible for humans to parse that by hand. As we work on building models that can be robust estimators of things like mass while mitigating sources of error, another important aspect is that they need to be computationally efficient if we're going to process this huge data flow from these new surveys. And that is exactly what we are trying to address — using machine learning to improve our analyses and make them faster."


This work is supported by NSF AI Institute: Physics of the Future, NSF PHY-2020295, and the McWilliams-PSC Seed Grant Program. The computing resources necessary to complete this analysis were provided by the Pittsburgh Supercomputing Center. The CosmoSim database used in this paper is a service by the Leibniz-Institute for Astrophysics Potsdam (AIP).


The study's authors include: Trac; Michelle Ntampaka, who graduated from CMU with a doctorate in physics in 2017 and is now deputy head of Data Science at the Space Telescope Science Institute; Markus Michael Rau, a McWilliams postdoctoral fellow who is now a postdoctoral fellow at Argonne National Lab; Minghan Chen, who graduated with a bachelor's degree in physics in 2018 and is a Ph.D. student at the University of California, Santa Barbara; Alexa Lansberry, who graduated with a bachelor's degree in physics in 2020; and Faith Ruehle, who graduated with a bachelor's degree in physics in 2021.

