Historical Language Imitation Challenges for Artificial Intelligence

Researchers have explored the capabilities of large language models (LLMs) in recreating historical idioms, a task that often requires extensive pre-training. However, many initiatives lack the resources for such costly and time-consuming processes. As a result, projects such as completing Charles Dickens' final, unfinished novel through AI are still unlikely.

The researchers tested various techniques for generating text that sounded historically accurate, including simple prompting with early twentieth-century prose and fine-tuning commercial models using books from that period. While fine-tuning improved the output, traces of modern language or ideas were still evident, suggesting that even carefully adjusted models continue to reflect their contemporary training data.

The researchers concluded that there are no economical shortcuts to generating idiomatically correct historical text or dialogue by machine. Moreover, they proposed that some anachronism may be unavoidable, as interpretation always involves a negotiation between present and past.

A new study, conducted by researchers from the University of Illinois, the University of British Columbia, and Cornell University, investigates this challenge further. Their findings suggest that prompting with historical texts alone is not a reliable way to produce text that convincingly simulates a historical style. Fine-tuning commercial models, which changes the model's weights rather than only its input, may offer a better solution, but this process also has its limits.

The researchers also acknowledge that their current evaluation methods, such as using a statistical classifier to estimate the likely publication date of each output, may capture only surface-level features of historical style. Human readers or larger models may still be able to detect deeper conceptual or factual anachronisms.
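To make the idea of date estimation by classifier concrete, here is a minimal sketch. It is not the study's classifier, whose details are not given here; it swaps in a simple multinomial Naive Bayes model over word counts, and the period labels, the training snippets, and the function names `train` and `estimate_period` are all hypothetical. It also illustrates the limitation noted above: a model like this sees only vocabulary, so a passage written in period words but expressing a modern idea would still be labeled as old.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-written snippets for illustration only (not study data).
TRAINING = [
    ("1900s", "the motor-car halted before the drawing-room window and the telegram was delivered"),
    ("1900s", "she remarked that the omnibus was late and the gramophone quite tiresome"),
    ("2000s", "he checked his smartphone and sent an email about the podcast"),
    ("2000s", "the website went offline so she posted an update on her laptop"),
]


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train(samples):
    """Count words per period label and collect the shared vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def estimate_period(text, model):
    """Return the period label with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(estimate_period("the telegram arrived by omnibus", model))
print(estimate_period("she sent an email from her smartphone", model))
```

In a realistic setting the training data would be a large corpus of dated texts and the output a year or decade rather than two coarse labels, but the surface-level nature of the signal would be the same.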

The study's authors emphasize that while fine-tuning a commercial model on historical passages can generate stylistically convincing output at minimal cost, it does not fully eliminate traces of a modern perspective. Pre-training a model entirely on period material avoids anachronism, but it demands far greater resources and yields less fluent output.

In short, generating idiomatically correct historical text or dialogue with LLMs requires a careful balance between authenticity and coherence. Further research is needed to clarify the best approach for navigating this tension.


Fine-tuned on historical texts, AI models can thus generate stylistically convincing output, yet they still reflect their contemporary training data. Eliminating anachronism entirely requires pre-training on period material, which is costly and produces less fluent text, so historical text generation remains a promising but unresolved application.