AI4S: Accelerated AI Models Drive Scientific Discovery
The advent of artificial intelligence (AI) has ushered in a seismic shift in the way scientific research is conducted around the globe. Often referred to as AI for Science (AI4S), this innovative paradigm employs AI technologies to tackle complex scientific challenges, ultimately yielding significant discoveries and technological advancements. Scholars and researchers are touting AI4S as the "fourth paradigm" of scientific inquiry, a term that emphasizes its foundational role in contemporary research alongside empirical, theoretical, and computational methodologies.
The impact of AI on various scientific fields has been profound and cannot be overstated. AI4S integrates machine learning, data analytics, and high-performance computing, enabling scientists to delve deeper into their investigations and explore areas previously thought impenetrable. Specialists may differ on the definition and implications of AI4S, but a consensus has emerged: AI is dramatically changing the face of scientific research.
A striking sign of AI's transformative role was highlighted at a recent conference, where it was noted that the 2024 Nobel Prizes in Physics and Chemistry both honored achievements in AI-related fields. The Physics Prize recognized foundational discoveries and inventions that enable machine learning with artificial neural networks. The Chemistry Prize was awarded for computational protein design and protein structure prediction. Such accolades underscore the increasing prominence of AI in scientific inquiry and mark its entry into the prestigious realm of Nobel recognition.
Consider, for instance, the 2024 Nobel Prize in Chemistry, which recognized the development of the AlphaFold AI model. This innovation addresses a challenge dating back five decades: predicting a protein's complex three-dimensional structure. AlphaFold has since been used to predict the structures of approximately 200 million known proteins. The implications of this advancement are staggering: it has propelled research and development in the biomedical arena and is used by over two million people worldwide. This is a clear illustration of how AI is not just a tool but a catalyst for accelerated scientific progress.
The intertwined relationship between AI and the scientific community is becoming increasingly apparent. As pointed out by academic leaders such as Academician E Weinan of the Chinese Academy of Sciences, contemporary scientific research can generally be classified into two paradigms: the data-driven Keplerian paradigm and the principle-driven Newtonian paradigm, each facing its own challenges in modern contexts. The root of these challenges often converges on a single point: the lack of effective methods for high-dimensional mathematical problems impedes scientific progress. Herein lies a fundamental opportunity for deep learning and AI to facilitate breakthroughs.
Traditionally, the focus of AI4S has been algorithm-driven, harnessing algorithmic advancements to fuel research innovation. However, as large models continue to emerge and develop, a discernible shift from an “algorithm-driven” paradigm to a “computation-driven” approach is taking place. This shift is particularly evident in data-intensive fields of research where vast computational, networking, and storage requirements are paramount.

Academician Wang Jian echoed similar sentiments during his address, emphasizing the crucial role the internet plays in the realm of open science. He argued that AI4S will foster greater inclusiveness, allowing more individuals to contribute to the pool of scientific innovation. Open science is not merely about making scientific findings accessible; it's fundamentally about rethinking how research is conducted and communicated.
In an era when data, computation, and AI are inextricably linked to the internet, the internet has become an essential infrastructural backbone that propels scientific inquiry forward. Its inherent scale effects amplify the potential of AI applications, because it enables the seamless integration of data, models, and computational resources, much as it connects users and information across the globe.
Wang Jian also provided insights into the transformative potential of open-source platforms within the AI domain. He discussed how projects like DeepSeek are expanding the concept of open source and highlighting the immense value of open resources in advancing scientific and technological fields. DeepSeek's release under the MIT License helped it gain significant visibility, and a wave of scholarly articles on the project's development and implications appeared shortly after its launch.
AI is increasingly recognized as a driver of research efficiency and innovation. Data from Google Scholar indicates that the number of research papers using AI has more than tripled over the past three years. The emergence of large models has only accelerated this trend, placing AI4S at the forefront of innovation and producing major breakthroughs in fields such as chip design, biomedicine, materials science, astronomy, meteorology, and autonomous vehicles.
The rapid adoption of AI4S technologies is evident in the current trajectory of large models. The success of DeepSeek is a testament to the effectiveness of open-source large models. Meta's chief AI scientist, Yann LeCun, noted that DeepSeek has introduced new ideas while building upon previous work. Because it is open source, anyone can benefit from the outcomes of this research, which exemplifies the power of open research.
In essence, once an open-source large model reaches excellent performance and is supported by robust documentation, comprehensive guidelines, and an evolving toolchain, a snowball effect occurs: developers and researchers are drawn into its ecosystem. This leads to an extensive family of derivative models that significantly improve overall performance and quality, rivaling even the finest closed-source models.
Furthermore, the open-source approach reduces the cost of deploying large models, overcoming the exorbitant inference costs that previously limited adoption. Combining open-source large models with public cloud services and APIs can accelerate innovation at every phase, from validating minimum viable products (MVPs) to reaching clients and refining operations.
From an industry perspective, private deployment of large AI models can require up to ten times the capital and time of deployment through public clouds and APIs. Public clouds offer a vast array of scalable, elastic, and cost-effective computational resources alongside established toolchains, sharply reducing the barrier to entry for innovation. For example, Google's cloud platform has enabled startups like Midjourney and Pika to rapidly launch new products.
From a customer engagement standpoint, public clouds provide access to an expansive pool of digitally savvy customers, making outreach quick and economical. Mistral, for instance, reported attracting around 1,000 quality clients immediately after its model was deployed on the Azure cloud platform.
These dynamics suggest that the combination of public clouds and APIs is set to become the mainstay for enterprises using large models. Research institutions across China have increasingly turned to Alibaba's cloud services for their scientific work, resulting in promising advances in biology, agriculture, and astronomy.
By promoting powerful computational capabilities, shared data, and accessible models, Alibaba's AI for Science initiative has explored several cooperative frameworks, including infrastructure service models, specialized platform models, and joint research collaborations. For instance, the collaboration between Alibaba AI and Sun Yat-sen University on "How to Use AI to Mine RNA Viruses" led to substantial discoveries, including the identification of over 510,000 virus genomes, and the work was featured on the cover of the journal "Cell."
Additionally, long before the emergence of ChatGPT, Alibaba Cloud began developing a model community, ModelScope (Modao), which now hosts over 40,000 models and has more than 10 million users. To date, downloads of Alibaba's open-source Tongyi Qianwen (Qwen) models have surpassed 200 million, and more than 90,000 derivative models have been built on them.
Through its sustained commitment to open-source principles, Alibaba has continuously enhanced the capabilities of its Tongyi Qianwen large models while open-sourcing them across a broad range of sizes and modalities. A recent edition of the Open LLM Leaderboard published by Hugging Face, the largest global AI open-source community, reflects this progress: all of the top ten models are derivatives trained on Alibaba's open-source Tongyi Qianwen models.
During the 2024 GTC conference (NVIDIA GPU Technology Conference), NVIDIA's CEO Jensen Huang reaffirmed that AI4S represents one of three critical directions within the AI landscape. However, the path forward is not without obstacles—issues such as a shortage of cross-disciplinary talent, challenges in reusing technical solutions, and subpar data quality across vertical disciplines have begun to surface.
In light of these challenges, Tang Chen put forward three recommendations for advancing AI4S: achieving inclusive and equitable growth, fostering integrated innovation, and ensuring safe and orderly development. Addressing these themes is imperative as AI propels scientific research into its next stage.