ANALI 72–3, ARTICLES | ARTIFICIAL REASON AND ARTIFICIAL INTELLIGENCE: THE LEGAL REASONING CAPABILITIES OF GPT-4

Bojan SPAIĆ, Miodrag JOVANOVIĆ

10.51204/Anali_PFBU_24302A

ABSTRACT /

Despite the widespread adoption of generative transformer large language models (LLMs) and the interest of the global legal community, discussions of these models in the philosophy of law have mainly focused on what LLMs cannot do. In taking the first steps towards a philosophical analysis of the capabilities of AI models in the field of law, we follow the basic idea of Turing’s “imitation game”. Proceeding from the frequent characterization of legal reasoning as “artificial”, the paper identifies the undisputed minimum core of the “artificiality” thesis and asks to what extent it can be imitated by artificial intelligence. To answer this question, we test the legal reasoning capabilities of ChatGPT, the most advanced and up-to-date LLM-based form of artificial intelligence. The conclusion is that in all the relevant types of activity usually associated with legal reasoning – fact-finding, interpretation, qualification, and decision-making – ChatGPT can generate outcomes as if it were reasoning legally.

ARTICLE AVAILABLE IN /

English

REFERENCES /

Alexander, Larry. 1998. The Banality of Legal Reasoning. Notre Dame Law Review 73: 517–533.
Alexander, Larry, Emily Sherwin. 2021. Advanced Introduction to Legal Reasoning. Cheltenham: Edward Elgar Publishing Limited.
Altman, Sam. 2023. Planning for AGI and beyond. https://openai.com/blog/planning-for-agi-and-beyond, last visited August 14, 2024.
Araszkiewicz, Michal. 2021. Critical Questions to Argumentation. Journal of Applied Logics 8: 291–320.
Arredondo, Pablo. 2023. GPT-4 Passes the Bar Exam: What That Means for Artificial Intelligence Tools in the Legal Profession. Stanford Law School Blogs, April 19. https://law.stanford.edu/2023/04/19/gpt-4-passes-the-bar-exam-what-that-means-for-artificial-intelligence-tools-in-the-legal-industry/, last visited August 14, 2024.
Bickenbach, Jerome. 1990. The ‘Artificial Reason’ of the Law. Informal Logic 12: 23–32.
Bohn, Dieter. 2016. Elon Musk: negative media coverage of autonomous vehicles could be ‘killing people’. October 20. https://www.theverge.com/2016/10/19/13341306/elon-musk-negative-media-autonomous-vehicles-killing-people, last visited August 14, 2024.
Browning, Jacob, Yann LeCun. 2022. AI and the Limits of Language. https://www.noemamag.com/ai-and-the-limits-of-language/, last visited August 14, 2024.
Brożek, Bartosz, Michał Furman, Marek Jakubiec, Bartłomiej Kucharzyk. 2/2024. The black box problem revisited. Real and imaginary challenges for automated legal decision making. Artificial Intelligence and Law 32: 427–440.
Buhrmester, Vanessa, David Münch, Michael Arens. 4/2021. Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey. Machine Learning and Knowledge Extraction 3: 966–989.
Chandra, Abel, Laura Tünnermann, Tommy Löfstedt, Regina Gratz. 2023. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 12. https://elifesciences.org/articles/82819, last visited August 14, 2024.
Cheng, Kunming, Qiang Guo, Yongbin He, Yanqiu Lu, Shuqin Gu, Haiyang Wu. 8/2023. Exploring the Potential of GPT-4 in Biomedical Engineering: The Dawn of a New Era. Annals of Biomedical Engineering 51: 1645–1653.
Chin, Felix. 2023. Twitter post, July 19. https://twitter.com/felixchin1/status/1681582623208927233?s=20, last visited August 14, 2024.
Dahan, Samuel, Rohan Bhambhoria, David Liang, Xiaodan Zhu. 2023. Lawyers Should Not Trust AI: A call for an Open-source Legal Language Model. Queen’s University Legal Research Paper. https://dx.doi.org/10.2139/ssrn.4587092.
Dahl, Matthew, Varun Magesh, Mirac Suzgun, Daniel E. Ho. 1/2024. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. Journal of Legal Analysis 16: 64–93.
Davis, Wes. 2023. A lawyer used ChatGPT and now has to answer for its ‘bogus’ citations. The Verge, May 27. https://www.theverge.com/2023/5/27/23739913/chatgpt-ai-lawsuit-avianca-airlines-chatbot-research, last visited August 14, 2024.
Dwayne. 2023. Twitter post, July 19. https://twitter.com/DwayneCodes/status/1681516290224300033?s=20, last visited August 14, 2024.
Fan, Jim. 2023. Twitter post, July 19. https://twitter.com/DrJimFan/status/1681716564335394817?s=20, last visited August 14, 2024.
Floridi, Luciano. 15/2023. AI as Agency Without Intelligence: on ChatGPT, Large Language Models, and Other Generative Models. Philosophy & Technology 36.
Future of Life Institute (FLI). 2023. Pause Giant AI Experiments: An Open Letter. https://futureoflife.org/open-letter/pause-giant-ai-experiments/, last visited August 14, 2024.
Gemini Team, Google. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. https://www.kapler.cz/wp-content/uploads/gemini_v1_5_report.pdf, last visited August 14, 2024.
Goertzel, Ben. 1/2014. Artificial General Intelligence: Concept, State of the Art, and Future Prospects. Journal of Artificial General Intelligence 5: 1–48.
Guha, Neel, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li. 2023. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2308.11462.
Hage, Jaap. 2020. Moeten we computers laten rechtspreken? 71–77. Rechtstheorie en praktijk – een liber amicorum, edited by Bald de Vries, Elaine Mak, Lukas van den Berge, Thomas Riesthuis, Jet Tighelaar, Jeroen Kiewiet, Susanne Burri and Thijs de Sterke. Netherlands: Boom Juridisch.
Herik, Jaap van den. 1991. Kunnen computers rechtspreken. Arnhem: Gouda Quint.
Hildebrandt, Mireille. 1/2018. Law as Computation in the Era of Artificial Legal Intelligence. The University of Toronto Law Journal 68: 12–35.
Hinton, Geoffrey. 1992. How Neural Networks Learn from Experience. Scientific American 3: 144–151.
Hinton, Geoffrey, Yoshua Bengio. 2023. Statement on AI Risk. https://www.safe.ai/statement-on-ai-risk, last visited August 14, 2024.
Huang, Jie, Kevin Chen-Chuan Chang. 2022. Towards Reasoning in Large Language Models: A Survey. arXiv. https://doi.org/10.48550/arXiv.2212.10403.
Kojima, Takeshi, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. arXiv. https://doi.org/10.48550/arXiv.2205.11916.
Jiang, Cong, Xiaolei Yang. 2023. Legal Syllogism Prompting: Teaching Large Language Models for Legal Judgment Prediction. arXiv. https://doi.org/10.48550/arXiv.2307.08321.
Katz, Daniel Martin, Michael James Bommarito, Shang Gao, Pablo Arredondo. 2023. GPT-4 Passes the Bar Exam. Philosophical Transactions of the Royal Society 382. http://dx.doi.org/10.2139/ssrn.4389233.
Kosinski, Michal. 2023. Theory of mind may have spontaneously emerged in large language models. arXiv. https://arxiv.org/vc/arxiv/papers/2302/2302.02083v1.pdf.
Lanham, Tamera, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez. 2023. Measuring faithfulness in chain-of-thought reasoning. arXiv. https://doi.org/10.48550/arXiv.2307.13702.
LawGeex. 2018. Comparing the Performance of Artificial Intelligence to Human Lawyers in the Review of Standard Business Contracts. https://images.law.com/contrib/content/uploads/documents/397/5408/lawgeex.pdf, last visited August 14, 2024.
Legg, Michael, Felicity Bell. 2019. Artificial Intelligence and the Legal Profession: A Primer. https://allenshub.unsw.edu.au/sites/default/files/inline-files/FLIPStream%20Primer_0.pdf, last visited August 14, 2024.
Law School Admission Council (LSAC). 2024. LSAT official website. https://www.lsac.org/lsat, last visited August 14, 2024.
Chen, Lingjiao, Matei Zaharia, James Zou. 2023. How is ChatGPT’s Behavior Changing over Time? arXiv. https://doi.org/10.48550/arXiv.2307.09009.
Li, Shiyang, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan. 2022. Explanations from Large Language Models Make Small Reasoners Better. arXiv. https://doi.org/10.48550/arXiv.2210.06726.
Lightman, Hunter, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe. 2023. Let’s Verify Step by Step. arXiv. https://doi.org/10.48550/arXiv.2305.20050.
Liu, Hanmeng, Ruoxi Ning, Zhiyang Teng, Jian Liu, Qiji Zhou, Yue Zhang. 2023. Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4. arXiv. https://doi.org/10.48550/arXiv.2304.03439.
Martin, Lauren, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, Rivindu Perera. 2024. Better Call GPT, Comparing Large Language Models against Lawyers. arXiv. https://doi.org/10.48550/arXiv.2401.16212.
National Conference of Bar Examiners. 2024. Uniform Bar Examination. https://www.ncbex.org/exams/ube, last visited August 14, 2024.
Ngo, Richard. 2023. Twitter post, March 28. https://x.com/RichardMCNgo/status/1640568775018975232?s=20, last visited August 14, 2024.
OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt, last visited August 14, 2024.
OpenAI. 2023. GPT-4 Technical Report. arXiv. https://doi.org/10.48550/arXiv.2303.08774.
OpenAI Development Forum. 2024. GPT-4 is getting worse and worse every single update. OpenAI, March 19. https://community.openai.com/t/gpt-4-is-getting-worse-and-worse-every-single-update/508470, last visited August 14, 2024.
Perlman, Andrew. 2023. The implications of ChatGPT for legal services and society. Suffolk University Legal Studies Research Paper Series 22–14. http://dx.doi.org/10.2139/ssrn.4294197.
Petrović, Đorđe, Radomir Mijailović, Dalibor Pešić. 2020. Traffic Accidents with Autonomous Vehicles: Type of Collisions, Manoeuvres and Errors of Conventional Vehicles’ Drivers. Transportation Research Procedia 45: 161–168.
Price, Emily. 2023. OpenAI Acknowledges GPT-4 Is Getting ‘Lazy’. PCMag, December 9. https://www.pcmag.com/news/openai-acknowledges-gpt-4-is-getting-lazy, last visited August 14, 2024.
Coke, Edward. [1658]1977. Prohibitions del Roy. Coke Reports 12/64. https://www.uniset.ca/other/cs4/77ER1342.html, last visited August 14, 2024.
Rabello, Alfredo Mordechai. 1/1974. Non Liquet: From Modern Law to Roman Law. Israel Law Review 9: 63–84.
Razeghi, Yasaman, Robert L. Logan IV, Matt Gardner, Sameer Singh. 2022. Impact of pretraining term frequencies on few-shot numerical reasoning. 840–854 in Findings of the Association for Computational Linguistics: EMNLP 2022, edited by Yoav Goldberg, Zornitsa Kozareva, Yue Zhang. Kerrville: Association for Computational Linguistics.
Rose, Janus. 2023. A Judge Just Used ChatGPT to Make a Court Decision. Vice, February 3. https://www.vice.com/en/article/k7bdmv/judge-used-chatgpt-to-make-court-decision, last visited August 14, 2024.
Schafer, Burkhard. 2022. Legal Tech and Computational Legal Theory. 305–337 in Law and Technology in a Global Digital Society: Autonomous Systems, Big Data, IT Security and Legal Tech, edited by Georg Borges and Christoph Sorge. Cham: Springer International Publishing.
Schauer, Frederick. 2009. Thinking Like a Lawyer: A New Introduction to Legal Reasoning. Cambridge, Massachusetts, London: Harvard University Press.
Schauer, Frederick, Barbara Spellman. 1/2017. Analogy, Expertise, and Experience. The University of Chicago Law Review 84: 249–268.
Shanahan, Murray, Kyle McDonell, Laria Reynolds. 2023. Role-Play with Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2305.16367.
Shao, Yunfan, Linyang Li, Junqi Dai, Xipeng Qiu. 2023. Character-LLM: A Trainable Agent for Role-Playing. arXiv. https://doi.org/10.48550/arXiv.2310.10158.
Spellman, Barbara, Frederick Schauer. 2012. Legal Reasoning. 719–735. The Oxford Handbook of Thinking and Reasoning, edited by Keith Holyoak and Robert Morrison. Oxford: Oxford University Press.
Stokel-Walker, Chris. 2023. Generative AI is coming for the Lawyers. WIRED, February 21. https://www.wired.co.uk/article/generative-ai-is-coming-for-the-lawyers, last visited August 14, 2024.
Turing, Alan. 1950. Computing Machinery and Intelligence. Mind 59: 433–460.
Turpin, Miles, Julian Michael, Ethan Perez, Samuel R. Bowman. 2023. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. arXiv. https://doi.org/10.48550/arXiv.2305.04388.
U.S. Centers for Disease Control and Prevention (CDC). 2024. Global Road Safety, May 16. https://www.cdc.gov/transportation-safety/global/index.html#:~:text=Each%20year%2C%201.35%20million%20people,Global%20Status%20Report%20on%20Safety, last visited August 14, 2024.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems. arXiv. https://doi.org/10.48550/arXiv.1706.03762.
Verheij, Bart. 2007. A Coffeehouse Conversation on the Van den Herik Test. 155–163 in Liber Amicorum ter Gelegenheid van de 60e Verjaardag van prof. dr. H. Jaap van den Herik. Maastricht: Maastricht ICT Competence Center.
Verheij, Bart. 2021. A Second Coffeehouse Conversation on the Van den Herik Test. 101–114. Liber amicorum ter gelegenheid van het emeritaat van prof. dr. Jaap van den Herik. Amsterdam: Ipskamp Publishing.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. 2022a. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2201.11903.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. 2022b. Emergent Abilities of Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2206.07682.
Wolfram, Stephen. 2022. What Is ChatGPT Doing … and Why Does It Work? https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/, last visited August 14, 2024.
Wu, Zhaofeng, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim. 2023. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. arXiv. https://doi.org/10.48550/arXiv.2307.02477.
Yu, Zihan, Liang He, Zhen Wu, Xinyu Dai, Jiajun Chen. 2023. Towards Better Chain-of-Thought Prompting Strategies: A Survey. arXiv. https://doi.org/10.48550/arXiv.2310.04959.
Zhou, Yongchao, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba. 2022. Large language models are human-level prompt engineers. arXiv. https://doi.org/10.48550/arXiv.2211.01910.
