1. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 26. https://papers.nips.cc/paper_files/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf.
2. Le, Q., and Mikolov, T. (2014). Distributed representations of sentences
and documents. PMLR 32, 1188–1196.
3. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017).
Supervised learning of universal sentence representations from natural
language inference data. Preprint at arXiv. https://doi.org/10.48550/arXiv.1705.02364.
4. McCann, B., Bradbury, J., Xiong, C., and Socher, R. (2017). Learned in
translation: Contextualized word vectors. Adv. Neural Inf. Process. Syst. https://dl.acm.org/doi/10.5555/3295222.3295377.
5. Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation
of word representations in vector space. Preprint at arXiv. https://doi.org/10.48550/arXiv.1301.3781.
6. Pennington, J., Socher, R., and Manning, C.D. (2014). GloVe: Global vectors for word representation. https://nlp.stanford.edu/pubs/glove.pdf.
7. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar,
E., Lee, P., Lee, Y.T., Li, Y., and Lundberg, S. (2023). Sparks of artificial
general intelligence: Early experiments with GPT-4. Preprint at arXiv. https://doi.org/10.48550/arXiv.2303.12712.
8. Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S.A., Feder, A., Emanuel, D., Cohen, A., et al. (2022). Shared computational principles for language processing in humans and deep language
models. Nat. Neurosci. 25, 369–380. https://doi.org/10.1038/s41593-022-01026-4.
9. Caucheteux, C., Gramfort, A., and King, J.-R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat. Hum. Behav. 7, 430–441. https://doi.org/10.1038/s41562-022-01516-2.
10. Schrimpf, M., Blank, I.A., Tuckute, G., Kauf, C., Hosseini, E.A., Kanwisher,
N., Tenenbaum, J.B., and Fedorenko, E. (2021). The neural architecture of
language: Integrative modeling converges on predictive processing. Proc.
Natl. Acad. Sci. USA 118, e2105646118. https://doi.org/10.1073/pnas.2105646118.
11. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
A.N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Adv.
Neural Inf. Process. Syst. 30.
12. Hassid, M., Peng, H., Rotem, D., Kasai, J., Montero, I., Smith, N.A., and
Schwartz, R. (2022). How much does attention actually attend? Questioning the importance of attention in pretrained transformers. Preprint at arXiv. https://doi.org/10.48550/arXiv.2211.03495.
13. Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., Rao, J.,
Yang, L., Ruder, S., and Metzler, D. (2020). Long range arena: A benchmark for efficient transformers. Preprint at arXiv. https://doi.org/10.48550/arXiv.2011.04006.
14. Bzdok, D., and Yeo, B.T.T. (2017). Inference in the age of big data:
Future perspectives on neuroscience. Neuroimage 155, 549–564.
15. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., and Metzler, D. (2022). Emergent abilities
of large language models. Preprint at arXiv. https://doi.org/10.48550/arXiv.2206.07682.
16. OpenAI. (2023). GPT-4 Technical Report. Preprint at arXiv. https://doi.org/10.48550/arXiv.2303.08774.
17. Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R.,
Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. Preprint at arXiv. https://doi.org/10.48550/arXiv.2001.08361.
18. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix,
T., Rozière, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA:
Open and efficient foundation language models. Preprint at arXiv.
https://doi.org/10.48550/arXiv.2302.13971.
19. Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D.d.L., Hendricks, L.A., Welbl, J., and Clark, A. (2022).
Training compute-optimal large language models. Preprint at arXiv.
https://doi.org/10.48550/arXiv.2203.15556.
20. Schaeffer, R., Miranda, B., and Koyejo, S. (2023). Are emergent abilities of
Large Language Models a mirage? Preprint at arXiv. https://doi.org/10.48550/arXiv.2304.15004.
21. Caballero, E., Gupta, K., Rish, I., and Krueger, D. (2022). Broken neural
scaling laws. Preprint at arXiv. https://doi.org/10.48550/arXiv.2210.14891.
22. Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q.,
Gesmundo, A., Attariyan, M., and Gelly, S. (2019). Parameter-efficient
transfer learning for NLP. PMLR 97, 2790–2799. https://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf.
23. Pfeiffer, J., Rückle, A., Poth, C., Kamath, A., Vulić, I., Ruder, S., Cho, K., and Gurevych, I. (2020). AdapterHub: A framework for adapting transformers. Preprint at arXiv. https://doi.org/10.48550/arXiv.2007.07779.
24. Bapna, A., Arivazhagan, N., and Firat, O. (2019). Simple, scalable adaptation for neural machine translation. Preprint at arXiv. https://doi.org/10.48550/arXiv.1909.08478.
25. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I.
(2019). Language models are unsupervised multitask learners. OpenAI
blog 1, 9.
26. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P.,
Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language
models are few-shot learners. Adv. Neural Inf. Process. Syst. 33,
1877–1901.
27. Xiang, J., Tao, T., Gu, Y., Shu, T., Wang, Z., Yang, Z., and Hu, Z. (2023).
Language Models Meet World Models: Embodied Experiences Enhance
Language Models. Preprint at arXiv. https://doi.org/10.48550/arXiv.2305.10626.
28. Berglund, L., Tong, M., Kaufmann, M., Balesni, M., Stickland, A.C., Korbak, T., and Evans, O. (2023). The Reversal Curse: LLMs trained on 'A is
B' fail to learn 'B is A'. Preprint at arXiv. https://doi.org/10.48550/arXiv.2309.12288.
29. Brandes, N., Goldman, G., Wang, C.H., Ye, C.J., and Ntranos, V. (2023).
Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522. https://doi.org/10.1038/s41588-023-01465-0.
30. Cui, H., Wang, C., Maan, H., and Wang, B. (2023). scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.
Preprint at bioRxiv. https://doi.org/10.1101/2023.04.30.538439.
31. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O.,
Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2.
32. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M.,
Zitnick, C.L., Ma, J., and Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 118, e2016239118. https://doi.org/10.1073/pnas.2016239118.
33. Yang, E., Milisav, F., Kopal, J., Holmes, A.J., Mitsis, G.D., Misic, B., Finn,
E.S., and Bzdok, D. (2023). The default network dominates neural responses to evolving movie stories. Nat. Commun. 14, 4197. https://doi.org/10.1038/s41467-023-39862-y.
34. Ye, Z., Liu, Y., and Li, Q. (2021). Recent Progress in Smart Electronic Nose
Technologies Enabled with Machine Learning Methods. Sensors 21, 7620.
https://doi.org/10.3390/s21227620.
35. Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc,
K., Mensch, A., Millican, K., and Reynolds, M. (2022). Flamingo: a visual
language model for few-shot learning. Adv. Neural Inf. Process. Syst.
35, 23716–23736.
36. Sharma, P., Ding, N., Goodman, S., and Soricut, R. (2018). Conceptual
captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. Proceedings of the 56th Annual Meeting of the Association
for Computational Linguistics. https://aclanthology.org/P18-1238/.
37. Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D.,
Borth, D., and Li, L.-J. (2016). YFCC100M: The new data in multimedia
research. Commun. ACM 59, 64–73.
38. Zhou, Y., Chia, M.A., Wagner, S.K., Ayhan, M.S., Williamson, D.J.,
Struyven, R.R., Liu, T., Xu, M., Lozano, M.G., Woodward-Court, P., et al.
(2023). A foundation model for generalizable disease detection from retinal
images. Nature 622, 156–163.
39. Wagner, S.K., Hughes, F., Cortina-Borja, M., Pontikos, N., Struyven, R.,
Liu, X., Montgomery, H., Alexander, D.C., Topol, E., Petersen, S.E., et al.
(2022). AlzEye: longitudinal record-level linkage of ophthalmic imaging
and hospital admissions of 353 157 patients in London, UK. BMJ Open 12, e058552.
40. Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf.
Comput. Sci. 28, 31–36.
41. Bzdok, D., and Ioannidis, J.P. (2019). Exploration, inference, and prediction in neuroscience and biomedicine. Trends Neurosci. 42, 251–262.
42. Bzdok, D., Engemann, D., and Thirion, B. (2020). Inference and prediction
diverge in biomedicine. Patterns 1, 100119.
43. Shanahan, M., McDonell, K., and Reynolds, L. (2023). Role play with large
language models. Nature 623, 493–498. https://doi.org/10.1038/s41586-023-06647-8.
44. Sharma, A., Kumar, R., Ranjta, S., and Varadwaj, P.K. (2021). SMILES to
smell: decoding the structure–odor relationship of chemical compounds
using the deep neural network approach. J. Chem. Inf. Model. 61,
676–688.
45. Ballentine, G., Friedman, S.F., and Bzdok, D. (2022). Trips and neurotransmitters: Discovering principled patterns across 6850 hallucinogenic experiences. Sci. Adv. 8, eabl6989.
46. Wu, C., Zhang, X., Zhang, Y., Wang, Y., and Xie, W. (2023). PMC-LLaMA: Further finetuning LLaMA on medical papers. Preprint at arXiv. https://doi.org/10.48550/arXiv.2304.14454.
47. Rodziewicz, T.L., Houseman, B., and Hipskind, J.E. (2023). Medical Error
Reduction and Prevention. In StatPearls (StatPearls Publishing LLC.).
48. Hipp, R., Abel, E., and Weber, R.J. (2016). A Primer on Clinical Pathways.
Hosp. Pharm. 51, 416–421. https://doi.org/10.1310/hpj5105-416.
49. Acosta, J.N., Falcone, G.J., Rajpurkar, P., and Topol, E.J. (2022). Multimodal biomedical AI. Nat. Med. 28, 1773–1784. https://doi.org/10.1038/s41591-022-01981-2.
62. Poldrack, R.A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends Cogn. Sci. 10, 59–63. https://doi.org/10.1016/j.tics.2005.12.004.
63. Laird, A.R., Fox, P.M., Eickhoff, S.B., Turner, J.A., Ray, K.L., McKay, D.R.,
Glahn, D.C., Beckmann, C.F., Smith, S.M., and Fox, P.T. (2011). Behavioral interpretations of intrinsic connectivity networks. J. Cogn. Neurosci.
23, 4022–4037. https://doi.org/10.1162/jocn_a_00077.
64. Mesulam, M.M. (1998). From sensation to cognition. Brain 121 (Pt 6),
1013–1052.
65. Voytek, B. (2022). The data science future of neuroscience theory. Nat.
Methods 19, 1349–1350. https://doi.org/10.1038/s41592-022-01630-z.
66. Brainstorm Consortium, Anttila, V., Bulik-Sullivan, B., Finucane, H.K., Walters, R.K., Bras, J., Duncan, L., Escott-Price, V., Falcone, G.J., Gormley,
P., et al. (2018). Analysis of shared heritability in common disorders of
the brain. Science 360, eaap8757. https://doi.org/10.1126/science.aap8757.
67. Beam, E., Potts, C., Poldrack, R.A., and Etkin, A. (2021). A data-driven
framework for mapping domains of human neurobiology. Nat. Neurosci.
24, 1733–1744. https://doi.org/10.1038/s41593-021-00948-9.
68. Wittgenstein, L. (1958). Philosophical Investigations (Basil Blackwell).
69. Naisbitt, J. (1988). Megatrends: Ten New Directions Transforming Our Lives (Warner Books).
70. Dziri, N., Milton, S., Yu, M., Zaiane, O., and Reddy, S. (2022). On the origin
of hallucinations in conversational models: Is it the datasets or the
models? Preprint at arXiv. https://doi.org/10.48550/arXiv.2204.07931.
71. Strubell, E., Ganesh, A., and McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Preprint at arXiv. https://doi.org/10.48550/arXiv.1906.02243.
72. Nadeem, M., Bethke, A., and Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. Preprint at arXiv. https://doi.org/10.48550/arXiv.2004.09456.
73. Liu, F., Bugliarello, E., Ponti, E.M., Reddy, S., Collier, N., and Elliott, D.
(2021). Visually grounded reasoning across languages and cultures. Preprint at arXiv. https://doi.org/10.48550/arXiv.2109.13238.