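Big Data Is Not Scientific Law

The key lesson from the history of astronomy is that big data cannot explain itself. Building simplified mathematical models, connecting them to the real physical world, and refining them is the reliable way to extract the rare gem of "meaning" from the raw ore of data.

Source | WeChat official account 蔻享學術

Author | Frank Wilczek (Professor at MIT, 2004 Nobel laureate)

Translation | 梁丁當, 胡風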

The history of astronomy shows that observations can only explain so much without the interpretive frame of theories and models.

Big data and machine learning are powering new approaches to many scientific questions. But the history of astronomy offers an interesting perspective on how data informs science, and perhaps a cautionary tale.

Early Babylonian astronomers took what today we'd call a pure "big data" or "pattern recognition" approach. They accumulated observations of solar, lunar and planetary motion and eclipses for many centuries and identified various cycles that had repeated many times. Simply by assuming that those cycles would continue, they were able to give good advice for planting, irrigation and harvest times, to cast credible horoscopes and to predict in advance when lunar eclipses would occur.
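The Babylonian method described above is pure extrapolation: identify a period, then assume it repeats. A minimal sketch of that idea, assuming the roughly 18-year eclipse cycle later named the saros (223 synodic months, about 6,585.32 days) as the repeating pattern, projects future eclipse dates forward from one recorded eclipse. The function `project_eclipses` and the day numbering are hypothetical; this illustrates cycle-based prediction in general and does not reconstruct any actual Babylonian procedure.

```python
# Sketch of purely pattern-based prediction: no physics, only "the cycle repeats".
# Assumption: the saros period of 223 synodic months is used as the repeating cycle.

SYNODIC_MONTH_DAYS = 29.530589   # mean interval between successive full moons, in days
SAROS_MONTHS = 223               # synodic months in one saros cycle
SAROS_DAYS = SAROS_MONTHS * SYNODIC_MONTH_DAYS  # about 6585.32 days (~18 years 11 days)


def project_eclipses(observed_day: float, cycles: int) -> list[float]:
    """Return day numbers of the eclipses that recur after each whole saros cycle.

    `observed_day` is a hypothetical day count of one recorded eclipse. The result
    lists only the recurrences of that eclipse, not every eclipse in between.
    """
    return [observed_day + k * SAROS_DAYS for k in range(1, cycles + 1)]


if __name__ == "__main__":
    start = 0.0  # hypothetical: day 0 is the recorded eclipse
    for k, day in enumerate(project_eclipses(start, 3), start=1):
        print(f"cycle {k}: day {day:.2f} (~{day / 365.25:.2f} years later)")
```

Nothing in the sketch says why the cycle exists; that missing "why" is exactly the limitation the article attributes to a pure big-data approach.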

The ancient Greek astronomers used two distinct methods to understand the same data set. The first was to make geometric models that treated the sun, moon, planets and stars as mathematical abstractions: shiny points carried upon uniformly rotating celestial spheres.

At first, the Greeks' predictions were no better than those of the Babylonians; in fact, they were significantly worse. But they patched things up by postulating additional circular motions carried along by the spheres, called epicycles. These models, which were perfected by the 2nd-century A.D. astronomer Ptolemy, seem ugly in retrospect, but they did package the astronomical data in a relatively compact form, and they gave useful practical results.
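The deferent-and-epicycle construction can be made concrete with complex numbers: each uniform circular motion is a rotating vector, and the apparent position is their sum. The sketch below is a simplified, hypothetical planar model (made-up radii and periods, not Ptolemy's parameters, and without his equant) that checks when the planet's apparent direction, as seen from the central Earth, runs backwards. That retrograde motion is what epicycles were introduced to reproduce.

```python
import cmath
import math

# Hypothetical parameters chosen only for illustration (not Ptolemy's values).
DEFERENT_RADIUS = 10.0   # radius of the large circle carrying the epicycle's center
DEFERENT_PERIOD = 12.0   # time units for one trip around the deferent
EPICYCLE_RADIUS = 4.0    # radius of the small circle the planet rides on
EPICYCLE_PERIOD = 1.0    # time units for one trip around the epicycle


def position(t: float) -> complex:
    """Apparent planet position (Earth at the origin) as a sum of two uniform rotations."""
    deferent = DEFERENT_RADIUS * cmath.exp(2j * math.pi * t / DEFERENT_PERIOD)
    epicycle = EPICYCLE_RADIUS * cmath.exp(2j * math.pi * t / EPICYCLE_PERIOD)
    return deferent + epicycle


def is_retrograde(t: float, dt: float = 1e-4) -> bool:
    """True if the apparent direction seen from the origin is turning backwards at time t."""
    before = cmath.phase(position(t - dt))
    after = cmath.phase(position(t + dt))
    # Wrap the angle change into [-pi, pi) so crossing the +/-pi branch cut is handled.
    delta = (after - before + math.pi) % (2 * math.pi) - math.pi
    return delta < 0


if __name__ == "__main__":
    # Sample one deferent period and count the sample times in retrograde motion.
    samples = [k * DEFERENT_PERIOD / 1000 for k in range(1000)]
    retro = [t for t in samples if is_retrograde(t)]
    print(f"retrograde at {len(retro)} of {len(samples)} sample times")
```

With these numbers the planet moves forward when it is on the outer side of its epicycle and backward when it swings to the inner side, nearest the Earth, because there the epicycle's motion outruns the deferent's.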

The second method used by Greek astronomers was to consider astronomical bodies as real objects with physical properties. Perhaps the high point of this effort was the brilliant determination by Aristarchus, in the 3rd century B.C., of the ratio of the distances from the Earth to the sun and the moon. Assuming that the moon shines by reflected sunlight, and measuring the angle between the sun and the half-moon when both are visible in the sky, he calculated the ratio using simple trigonometry.
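The "simple trigonometry" can be spelled out. At half moon the angle at the moon between the Earth and the sun is a right angle, so if θ is the angle between sun and moon as seen from Earth, then cos θ = d_moon / d_sun, i.e. d_sun / d_moon = 1 / cos θ. Aristarchus took θ to be 87°, which gives a ratio of about 19; the true angle falls short of 90° by only about 0.15°, and the true ratio is close to 390. The sketch below simply evaluates this formula; the function names are illustrative.

```python
import math


def distance_ratio(angle_deg: float) -> float:
    """Sun-Earth distance divided by Moon-Earth distance, from the half-moon angle.

    At half moon the Earth-Moon-Sun angle is 90 degrees, so with theta the angle
    between Sun and Moon as seen from Earth, cos(theta) = d_moon / d_sun.
    """
    return 1.0 / math.cos(math.radians(angle_deg))


def angle_for_ratio(ratio: float) -> float:
    """Inverse of distance_ratio: the Sun-Moon angle (in degrees) implied by a ratio."""
    return math.degrees(math.acos(1.0 / ratio))


if __name__ == "__main__":
    # Aristarchus's measured angle.
    print(f"87 degrees -> ratio {distance_ratio(87.0):.1f}")
    # Modern mean distances in km: 1 AU and the mean Earth-Moon distance.
    modern_ratio = 149_597_871 / 384_400
    print(f"modern ratio {modern_ratio:.0f} -> angle {angle_for_ratio(modern_ratio):.2f} degrees")
```

The method was sound; the difficulty was measurement. Nearly the whole difference between a ratio of about 19 and one of about 390 lies in the last three degrees before a right angle, which is why a small error in judging the moment of half moon was so costly.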

Yet a proper synthesis of the mathematical and physical approaches to astronomy wasn't achieved for many centuries. That's because the available "big data" (the easily observable patterns of the sun, moon and stars) are cryptic, superficial signs of the deep structure beneath.

Copernicus, in the 16th century, discovered that he could get more beautiful versions of Ptolemy-style models if he put the sun, rather than the Earth, at the center of the celestial spheres. Ptolemy's work typically gets rough treatment in the history of science, but it was absolutely essential to Copernicus's breakthrough, which offered a physical explanation of "coincidences" among the parameters of Ptolemy's models.
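One way to see such a "coincidence" concretely: if both Earth and an outer planet move on circles around the sun, the planet's position as seen from Earth is its sun-centered position minus Earth's. That difference is itself a deferent plus an epicycle, and the epicycle term is just Earth's own orbit, so it has a one-year period and always points from Earth toward the sun, for every outer planet. The sketch below checks this equivalence numerically. The circular, coplanar orbits are a simplifying assumption, and the radii and periods are rounded values for Earth and Mars used only for illustration.

```python
import cmath
import math

# Circular, coplanar orbits (a simplifying assumption). Units: AU and years.
EARTH_RADIUS, EARTH_PERIOD = 1.0, 1.0
MARS_RADIUS, MARS_PERIOD = 1.52, 1.88   # rounded values, for illustration only


def orbit(radius: float, period: float, t: float) -> complex:
    """Position on a circular orbit centered at the sun (the origin) at time t."""
    return radius * cmath.exp(2j * math.pi * t / period)


def geocentric_from_heliocentric(t: float) -> complex:
    """Mars as seen from Earth in a sun-centered model: Mars's position minus Earth's."""
    return orbit(MARS_RADIUS, MARS_PERIOD, t) - orbit(EARTH_RADIUS, EARTH_PERIOD, t)


def geocentric_epicycle(t: float) -> complex:
    """The same motion written Ptolemy-style: a deferent plus a one-year epicycle.

    Shifting Earth's orbit by half a period negates it, so the epicycle vector
    always points from Earth toward the sun.
    """
    deferent = orbit(MARS_RADIUS, MARS_PERIOD, t)
    epicycle = orbit(EARTH_RADIUS, EARTH_PERIOD, t + EARTH_PERIOD / 2)
    return deferent + epicycle


if __name__ == "__main__":
    for t in (0.0, 0.3, 1.7, 5.25):
        a, b = geocentric_from_heliocentric(t), geocentric_epicycle(t)
        print(f"t={t}: difference {abs(a - b):.2e}")
```

In a geocentric model the one-year epicycle has to be put in by hand, separately for each planet; in the sun-centered picture it appears automatically because it is the observer's own motion.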

In the early 17th century, Galileo's homemade telescope revealed the phases of Venus, Jupiter's attendant satellites (a "solar system" in miniature) and the topography of the moon. The night sky came to life as a showcase of tangible, physical bodies rather than an exercise in idealized points and imaginary spheres. When Isaac Newton distilled the universal laws of motion and gravity, he reunited the "big data" approach of the Babylonians and Ptolemy with the physics of Aristarchus and Galileo, launching truly modern science.

The big lesson is that big data doesn't interpret itself. Making mathematical models, trying to keep them simple, connecting them to the fullness of reality and aspiring to perfection: these are proven ways to refine the raw ore of data into precious jewels of meaning.

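About the Author

Frank Wilczek is a professor of physics at the Massachusetts Institute of Technology and one of the founders of quantum chromodynamics. He shared the 2004 Nobel Prize in Physics for his work on the theory of quarks and the strong interaction.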
