Skip to main content

· 19 min read
Jeff Yang, FSA

In this article we examine AI's ability to answer actuarial exam questions, using OpenAI's recently released API for its gpt-3.5-turbo model. At the time of writing, gpt-3.5-turbo is the model that powers ChatGPT, a popular chatbot interface.

We utilize publicly available exam questions and solutions to:

  1. generate alternative solutions to existing problems, and
  2. create new problems from existing questions.

We utilize AI Actuarial Assistant, a recently built user interface that integrates gpt-3.5-turbo with The Actuarial Nexus's database of actuarial exam questions.

Can ChatGPT Pass an Actuarial Exam?


Actuaries must pass a series of professional exams to become certified. In total, these exams require thousands of hours of studying and take candidates many years to complete. Each exam has a pass rate of roughly 50%, so it is not uncommon for candidates to fail 1, 2, or 5 along the way.

There has been ample press coverage touting ChatGPT's ability to pass professional exams in medicine, law, business, and other more well-known fields.1,2 Similar articles have also been published about its lackluster ability to solve simple math problems.1

Actuarial exams are notoriously difficult for the level of mathematical rigor involved. Since OpenAI's models are built on top of large language models (LLMs) centered around completions, it may be reasonable to assume that actuarial exam questions, which often require a certain level of quantitative reasoning and problem-solving ability, are not well-suited for ChatGPT's more linguistic-based approach.

In this article, we test gpt-3.5-turbo's ability to answer actuarial exam questions, and apply prompting techniques to improve gpt-3.5-turbo's reliability.

Selecting a Model

ChatGPT is a popular chatbot interface built on top of a family of large language models known as GPT-3. Both ChatGPT and GPT-3 are developed by OpenAI, an artificial intelligence research company based in San Francisco.

On March 1, 2023, OpenAI released the API for its latest version of the model, gpt-3.5-turbo. This is the same model that powers ChatGPT. Before gpt-3.5-turbo, the most capable model was text-davinci-003. OpenAI's documentation states that gpt-3.5-turbo does equally well on text completions when compared with the Davinci model family, and it does so at 10% of the cost to developers.

The scenarios presented in this article are generated using gpt-3.5-turbo. In the future, we may consider using a different model.

Comparing gpt-3.5-turbo and text-davinci-003

At first glance, and in terms of cost and performance, gpt-3.5-turbo seems like the ideal model to use going forward. Unfortunately, gpt-3.5-turbo currently does not allow for fine-tuning, which means the model has no memory of any information it was previously prompted with. To keep a conversation going, the entire chat history must be included in each successive prompt. Since the OpenAI API charges based on token usage, this could amplify the cost significantly compared to a pre-trained model. Furthermore, the combined prompt input and output are limited to 4,096 tokens. Assuming the average question & solution pair consumes 250 tokens, this means that the model can retain ~12 Q&A pairs at any one time, allotting ~1,000 tokens for the response output and prompt guidance.

On a per-token basis, text-davinci-003 is the more costly model by a factor of 10x. However, it allows for fine-tuning, which means that models can retain information based on past questions and solutions. If text-davinci-003 is used, we won't have to pass in the history of training data for each new prompt. A downside of text-davinci-003 is that is significantly worse at solving math questions out-the-box.1

For questions that just need a little bit of guidance, gpt-3.5-turbo could be the way to go. For other questions that require a more refined model, text-davinci-003 could be the solution in the long run.

Sourcing Questions

We perform our analyses using questions from Exam P (Probability), Exam FM (Financial Mathematics), Exam SRM (Statistics for Risk Modeling), and Exam FAM (Fundamentals of Actuarial Mathematics).

To perform this study, we sample from publicly available sample questions released by the Society of Actuaries. These sample questions are a good proxy for the questions on the actual exam, both in terms of difficulty and scope.

In its raw form, the SOA sample questions are only available in PDF format. To limit the amount of legwork needed to prepare the data, we query from The Actuarial Nexus's organized database of questions, which includes the SOA sample questions and solutions reformatted in Markdown and LaTeX. At the time of writing, this includes ~800 questions across Exam P, Exam FM, Exam SRM, and Exam FAM.


We perform a series of scenario tests and document the methodology and results for each scenario.

Increasing reliability

For each scenario, we also apply the following adjustments to increase reliability:

  1. Per Techniques to Improving Reliability, we affix each prompt with, "Let's think step by step," a technique known as zero-shot chain of thought (zero-shot-CoT) prompting. Adding this one sentence yields significant improvements in the accuracy of the results, particularly for math problems.

  2. Since we are comparing results from different scenarios and value accuracy over creativity, we set the temperature to 0 to reduce the level of randomness in the responses. Unfortunately, even with a setting of 0, "a small amount of variability may remain." In a more formal study, we would increase the sampling size to further increase reliability, but this setting suffices for demonstration purposes.

  3. Aside from the base case scenario, we also prompt the model with either a full solution write-up or the correct answer. This significantly increases the chance that the output arrives at the correct answer. Because the AI has a high chance of yielding the correct numeric answer, it is even more essential that the human prompting the output carefully review the explanation for logical consistency.



Model: gpt-3.5-turbo
Prompt: Question Only
Additional Prompting: None

This scenario serves as our base case for gpt-3.5-turbo. We prompt the model with the sample question and compare the model's result to the official SOA solution. The solution is not provided in this scenario.

Since the model doesn't know the answer beforehand, we expect the output to be the least reliable compared to Scenario 1 and Scenario 2. However, out of the three cases, the prompting is the least restrictive, so the output may include creative solutions.


All results presented in this article were generated using AI Actuarial Assistant.

AI Actuarial Assistant integrates gpt-3.5-turbo's capabilities with The Actuarial Nexus's database of existing actuarial exam questions (~800 questions at the time of writing). By automating all the back-end logistics and allowing users to save output to a central database, AI Actuarial Assistant is prepped to store and process a large network of user-prompted solutions and questions.

AI Actuarial Assistant breaks down a significant barrier in using AI to answer actuarial exam questions. Specifically, any user can generate a response from an existing question with the click of a few buttons (no text input required). From there, the user can easily post the solution to the integrated forum with the click of a button, and then edit the post if needed. Other users can then upvote or downvote the resulting output.

In the near future, through a process known as Self-taught Reasoner (STaR), higher-quality output (posts) can be used to train future models. The output would then be integrated into AI Actuarial Assistant to (1) create a sustainable ecosystem for generating new questions, and (2) increase the reliability of future model responses.


gpt-3.5-turbo includes a level of randomness in the output, even when the temperature is set to 0, so there is no guarantee that the exact results presented in this article can be replicated.


Below are several examples documenting sample output from AI Actuarial Assistant. All relevant links are included, including links to the original question and the output from each scenarios. The observations are solely based on my own judgment. Any reader who has a strong opinion is welcome to reply to the corresponding forum post.

Exam P - SOA Practice Question #1
  • Original Question
  • Subject(s): General Probability, Set Theory
  • Difficulty: Easy
  • Observations
    • Base Case - The model is able to calculate the correct solution on its own. Unfortunately, gpt-3.5-turbo decided to get a bit too fancy and included an unnecessary link to a non-existent image.
    • Scenario 1 - Prompting the engine with the correct solution produced basically the same logic as in the Base Case. This is a reasonable outcome given the engine was able to derive the correct solution on its own in the Base Case. Unfortunately, it decided to include a link to the same non-existent image again.
    • Scenario 2 - The provided solution is similar to the explanation in the Base Case, Scenario 1 and the official SOA solution. This time, the model did not attempt to include an accompanying image.
    • Scenario 3 - The model performed as intended and generated an error-free question, solution, (A)-(E) answer choices, title, and associated keywords.
  • Overall
    • The model performs quite well on this problem, and arrives at the correct solution in all cases.
Exam P - SOA Practice Question #51
  • Original Question
  • Subject(s): Deductibles, Continuous Random Variables, Expected Value
  • Difficulty: Hard
  • Observations
    • Base Case - The model does a fairly good job setting up the problem, but fails to correctly evaluate an integral, and ultimately arrives at the wrong answer. The arithmetic is also incorrect.
    • Scenario 1 - The model struggles with calculating the expected payment for partial damage. The expected value for total loss is also incorrectly calculated.
    • Scenario 2 - The model does a good job connecting the types of loss with the corresponding mathematical representations, and arrives at the correct final answer. However it makes math errors when calculating the partial damage expected payment.
    • Scenario 3 - The model does a good job creating a new question, solution, (A)-(E) answer choices, a title and relevant tags. The question itself is highly relevant to the original question. In the solution, it again appears to incorrectly calculate the partial damage expected payment by not factoring in the deductible. The arithmetic is also incorrect.
  • Overall
    • The model does a relatively good job understanding the problem and setting up the solution. It struggles with more complicated integrals and simplifying equations into numbers.
Exam P - SOA Practice Question #308
  • Original Question
  • Subject(s): Univariate Random Variables, Variance
  • Difficulty: Medium
  • Observations
    • Base Case - The model failed to understand that the probabilities provided in the table are cumulative. The rest of the logic to calculate the standard deviation seems correct. It again makes an error in applying simple arithmetic to calculate the variance. There are also issues rendering LaTeX.
    • Scenario 1 - Scenario 1 is fraught with the same shortcomings as in the Base Case. In providing the correct answer as part of the prompt, the model forces its solution to match the correct answer by erroneously equating its solution to the correct solution.
    • Scenario 2 - Again, the model misses the fact that the probabilities in the table are cumulative, so the first sentence in the output is incorrect. The rest of the output looks okay.
    • Scenario 3 - The model was able to generate a very relevant question. It did not provide (A)-(E) answer choices, and the probabilities in the solution don't match the probabilities in the question.
  • Overall
    • The main stumbling block in this problem is that the AI was not able to recognize that the given probabilities were cumulative. There were also some simple arithmetic errors. Both of these issues seem like minor hiccups that could be addressed in future models.
Exam FM - SOA Practice Question #1
  • Original Question
  • Subject(s): Time Value of Money, Force of Interest, Rate of Interest
  • Difficulty: Medium
  • Observations
    • Base Case - The question asks us to calculate the continuous force of interest, given the semiannual rate of interest. In this case, the number of years (7.25 years), is not necessary to solve this problem. Understandably, the model's solution attempts to factor in the 7.25 years, but makes an arithmetic error in evaluating an exponent (it calculates 1.02^14.5 = 157.10). Aside from arithmetic errors, the rest of the logic seems correct.
    • Scenario 1 - Again there is an error in evaluating the same exponent (it calculates 1.02^14.5 = 153.96), which is closer, but not close enough. The solution is also overly complicated, and the logic seems to break towards the end.
    • Scenario 2 - The model doesn't quite interpret the problem correctly. It tries to correlate the annual force of interest with the value of the account in 7.25 years. However the 7.25 years is irrelevant to the problem. This type of problem seems simple for a human, but potentially difficult for AI, without additional prompting. Nonetheless, it provides an acceptable solution to the problem, since it was prompted with the original solution.
    • Scenario 3 - The model does a good job creating a useable question, solution, (A)-(E) answer choices, a title and relevant tags.
  • Overall
    • This problem is tricky for GPT-3.5 because the question itself includes seemingly relevant information that is, in fact, not necessarily to solve the problem (i.e. a red herring). It also makes careless errors when evaluating equations.
Exam SRM - SOA Practice Question #1
  • Original Question
  • Subject(s): Clustering Algorithms, Unsupervised Learning Techniques
  • Difficulty: Easy
  • Observations
    • Base Case - The model does a good job setting up the problem and creating a nice looking table. Unfortunately, it doesn't show its work in calculating the numbers in the table, and it also calculates the numbers incorrectly. The error seems to stem from faulty arithmetic.
    • Scenario 1 - The Scenario 1 solution is similar to the Base Case solution, with some nuanced additions. In providing the correct answer as part of the prompt, the model forces its solution to match the correct answer by erroneously equating its solution to the correct solution.
    • Scenario 2 - The model regurgitates the original solution, so not much value is added here.
    • Scenario 3 - The model does a good job creating a new question, solution, (A)-(E) answer choices, a title and relevant tags. Unfortunately, there are some minor mistakes in the solution.
  • Overall - The model does a relatively good job understanding the problem and setting up the solution. It struggles with arithmetic.
Exam FAM-S - SOA Practice Question #1
  • Original Question
  • Subject(s): Credibility, Poisson Distribution, Pareto Distribution
  • Difficulty: Hard
  • Observations
    • Base Case - The output includes a lot of background information (formulas, notation, etc.) that might be unnecessary for students who prefer a more succinct response. For others who prefer a more comprehensive explanation, the background information could be helpful. Unfortunately, the model is not able to piece the fundamentals together to form a correct solution.
    • Scenario 1 - The approach is similar to the Base Case approach. The model regurgitates fundamental concepts, but does not apply them correctly.
    • Scenario 2 - The output is able to provide helpful context to the SOA solution. It is able to correctly identify how different pieces of an equation fit together.
    • Scenario 3 - The generated question and solution are similar to the original question and solution. Again, the model appears to make a simple numeric error by equating 18,408 to 30,000.
  • Overall - This problem requires knowledge of different formulas related to random variable distributions and credibility theory. The model does a fairly good job gathering the pieces, but struggles to put these pieces together.


Our findings in this article show that gpt-3.5-turbo, and by extension AI Actuarial Assistant, still has several challenges to overcome before it can reliably answer actuarial exam questions without human supervision.

One of the main limitations is the model's inability to simplify arithmetic, algebraic, and calculus equations into numeric answers. This seems reasonable given that the model is trained to recognize language patterns rather than perform computations. However, as OpenAI notes, "[GPT-3] is actually attempting to perform the relevant [arithmetic] computation rather than memorizing a table." The fact that GPT-3's has the potential to "compute" rather than regurgitate patterns seems promising for future versions of the model, at least as it applies to solving actuarial exam questions.

Despite its current limitations, AI can still offer several benefits to students preparing for actuarial exams:

  1. Given gpt-3.5-turbo's impressive ability to clearly explain concepts, the generated explanations could benefit students who are looking for additional guidance to existing solutions. If a student just needs help with one part of the problem, there's a relatively high chance an AI-generated solution can help the student see things from a different perspective. From there, it is up to the student to fill in the gaps and determine whether the rest of the generated solution is accurate or not.
  2. Every few years, new exams are introduced and old exams are removed from the actuarial credentialing process. This can leave students with little material to draw from, especially for the first sitting of a new exam. By turning up the temperature parameter, AI-generated questions could be used as inspiration for the types of problems that could be asked in a new exam.

Next Steps

The results in this article just scratch the surface of AI's potential. We presented an initial infrastructure and use case for integrating LLMs into the actuarial exam study process. As the world begins to better understand the capabilities and limitations of LLMs, better models will be implemented and results will improve.

The Actuarial Nexus was built to intake and store a large amount of community-sourced exam questions and solutions. The infrastructure is uniquely set-up to integrate with AI Actuarial Assistant. Rather than write a question and solution from scratch, a contributor can simply press a few buttons and generate a (relatively) unique solution or question using AI. The aforementioned STaR process can then be utilized to select high quality responses to train future models. Of course, steps will need to be taken to ensure that this initial quantity over quality approach provides more benefit than harm to students, which will come with usage and time.

The current version of AI Actuarial Assistant (beta) was built in a short amount of time to lay the groundwork for this article. Given the novelty of the idea and the relatively quick development time, there is a lot of unexplored potential in using AI to generate actuarial exam study material, particularly in the topics of prompt design and fine-tuning. I look forward to sharing progress on these two topics in a future blog post.


AI has the potential to enhance the studying experience for actuarial exam candidates. We presented examples in which the model explains existing solutions, writes new practice problems, and stores responses in a central database for future training. The technology is still new, so it will take time to figure out how to best produce reliable and useful results.

The vast majority of actuarial exam questions are in print/PDF form or behind paywalls, so they are not easily accessible to OpenAI's core training model. Since actuarial exams test candidates on a relatively narrow and niche range of topics, I believe the real value in AI integration in the next few years will be derived from the ability to procure high-quality data, process that data, and train a curated model for solving actuarial exam questions. Given the heavy focus on math, an added challenge will be working with LaTeX, for both supplying input and processing output. AI Actuarial Assistant is just the first step in integrating AI with preparing for actuarial exams.

AI is here to stay, and can only improve with time. OpenAI's gpt-3.5-turbo is the latest installment in a rapidly evolving series of LLMs, with each model being a massive improvement upon the previous model. On January 30, 2023, ChatGPT was upgraded with improved factuality and mathematical capabilities. As recently as today, OpenAI announced GPT-4, its latest LLM. GPT-4 potentially addresses some of the limitations outlined in this article, particularly with math calculations.

There is also speculation of Wolfram|Alpha integrating their facts-based answer engine with an OpenAI LLM. This would considerably increase the useability of AI for solving actuarial exam questions, since a main limitation with the current model is its inability to compute basic math equations.

With the right supervision and ecosystem, AI models and its implementations can evolve to provide tremendous value to students preparing for actuarial exams. We have barely begun to scratch the surface in this article. In the next part, we will expand on the topics covered here and discuss further developments with AI Actuarial Assistant.

· 4 min read

Actuaries must pass a series of professional exams in order to become certified. The exams are administered by the Society of Actuaries (SOA) and the Casualty Actuarial Society (CAS) in the United States. The exams cover a wide range of topics including probability, statistics, mathematical finance, economics, financial reporting and regulation, and the specific practices and principles of the actuarial profession. For purposes of this article, we will focus on the exams administered by the SOA.

What are Actuarial Exams

The preliminary exams are the first step in the process of becoming a fully credentialed actuary. They are a set of professional exams and other requirements that cover the fundamental concepts and principles of actuarial science. Once you have passed the preliminary exams, you will be eligible to apply for the Associate of the Society of Actuaries (ASA) designation. A full list of requirements for the ASA designation can be found here.

Is the actuarial profession for me?

Although exams are a big part of the actuarial profession, they are not the only thing that you will be doing as an actuary. Many actuaries work in a variety of roles that require a wide range of skills that go beyond the exams.

Actuaries use mathematical and statistical techniques to assess and manage risks in areas such as insurance, finance, and investment. The field requires strong analytical and problem-solving skills, as well as the ability to work well with numbers and data. If you enjoy these types of challenges and are interested in a career in a business-related field, then becoming an actuary may be a good fit for you. It's also important to research the field and talk to people in the industry to gain a better understanding of the day-to-day work and career prospects.

The preliminary exams are designed to test the knowledge and understanding of the candidate on a wide range of topics, with a heavy focus on calculations. On the other hand, the day-to-day work of an actuary involves applying the knowledge and skills learned through the credentialing process to real-world problems and projects. The overlap between the exam material and the job responsibilities of an actuary may not always be clear at first.

Check out our newcomers forum to learn more about what others are saying about breaking into the actuarial profession.

How do I start?

Regardless of whether you're a career changer, a recent graduate, or a seasoned professional, most employers will only seriously consider you for an entry-level position if you have passed at least one actuarial exam. Passing an exam demonstrates a commitment to the profession and a certain level of aptitude to perform the job duties of an entry-level actuary. In general, the more exams you pass, the more marketable you will be to potential employers.

If this is your first time learning about actuarial exams, we recommend you check out our database of practice questions to get a feel for the types of questions that you will be seeing on the exam. You can also browse our exam prep forum to ask any questions that you may have about the exams.

Although you can start with any exam, the majority of candidates start with either Exam P or Exam FM.

How long do I need to prepare?

The SOA and CAS recommend candidates study 100 hours for every hour of the exam. For Exam P, which is a 3 hour long exam, this means that you should plan to study for 300 hours. For Exam FM, which is a 2.5 hour long exam, this means that you should plan to study for 250 hours.

The actual study time will vary depending on your background and experience. If you have a strong background in the subject material, you may be able to study for less time. Conversely, if you have a weaker background in the subject material, you may need to study for more time.

Regardless of your background, there are many tactics that you can learn to study more efficiently. For some tips and tricks, check out our blog post on Better Study Habits to learn more.


Becoming an actuary can be a rewarding career choice, but it ultimately depends on your interests and skills. For most candidates, it takes several years to complete all the required exams and become an ASA. Once candidates achieve the ASA designation, they can continue to work towards the Fellowship designation.

· 7 min read

Improving your study habits can be a daunting task, especially if you've struggled with managing your time and staying focused in the past. With a little bit of effort and help from The Actuarial Nexus's automated tools, you can learn how to study more effectively and pass your next exam. Here are a few tips to help you get started.

Better Study Habits

There are many tools and resources available to prepare actuarial exams. Flashcards, study guides, and review sheets can all be helpful in organizing and reviewing material. In this article, we'll discuss how you can study more efficiently by making the most of The Actuarial Nexus's study tools and resources.

1. Engage with others

The Actuarial Nexus is built around the idea of growth through collaboration - that we all have strengths and weaknesses when it comes to studying. By engaging with other students, you can learn from their experiences and perspectives.

For each practice problem, you have the option to view the top solutions, which can be a great way to learn from different approaches and perspectives. You can also upvote problems that you find particularly helpful or bookmark problems to come back to later. This can be a valuable tool for organizing and prioritizing your study material.

In addition to posting solutions and upvoting problems, you can also see who else is working on the same problem. This can be a great way to connect with other students and collaborate on solving problems together.

Posting general questions in the dedicated forum can also be a great way to connect with other students. The forums are designed to integrate with the rest of the platform, so you can easily search for topics or questions that have already been answered.

2. Write your own problems and/or solutions

In addition to working through practice problems provided by The Actuarial Nexus, try writing your own problems or solutions. This can help you test your understanding of the material and identify any areas where you need further clarification. You also help us by contributing to our database of practice problems, which can be a great resource for other students.

Your question or solution does not have to be 100% fleshed out. It's fine to post a draft and let others help finetune your question or solution. Your contribution will only be added to the platform's question bank once it's ready and reviewed.

To create your own problem, click the "Create draft question" button in your Feed.

3. Increase your level

An ELO rating system is a method for evaluating the skill levels of players in a competitive game or sport. It was originally developed for chess, but it has since been adapted for use in a variety of other contexts, including online gaming and educational platforms like The Actuarial Nexus.

In our rating system, each user and question is assigned a level ranging from 0 to 10. A user's rating and the question's rating are adjusted based on the outcome of the user's results on the problem, with points being added or subtracted based on whether the problem is correctly or incorrectly answered. The amount of points that a user gains or loses is determined by the relative level of the problem. For example, a level 6 user who correctly answers a difficulty 10 problem will earn more points than a level 6 user who correctly answers a difficulty 1 problem.

The rating system is designed to be self-correcting, so that a player's level will tend to converge towards the user's expected exam score over time. Factors that play into the leveling system include the difficulty of the problem, the user's current level, the user's number of attempts on the same problem, the differential between the problem's average solve time and the user's solve time, and the user's total attempts on all problems. This can help ensure that the ratings are accurate and fair, and that players are not abusing the platform to increase their level. For example, a level 1 user who only solves level 1 questions will not be able to significantly increase their level without attempting harder problems.

While we place constraints on the system to prevent abuse, our ultimate goal is to provide users with additional tools and metrics to help pass actuarial exams. At the end of the day, we rely on the community to help us maintain the integrity of the system. Typical to ELO rating systems, we expect to calibrate the system as we learn more about how it works in practice.

Ultimately, the goal of the leveling system is to accurately predict your expected score on the actual exam. In the initial months following the platform's launch, your level may not necessarily reflect your actual score. However, the more data that is collected, the more accurate the predictions will become.

4. Try spaced repetition

Spaced repetition is an evidence-based study technique that involves reviewing material at increasing intervals over time to help improve retention. The Actuarial Nexus implements a variation of the Leitner system to help you study more efficiently.

The system works by spacing out recommended practice problems based on your previous attempts. Questions that were solved correctly will be shown to you less frequently, while problems that you struggle with will be shown more frequently. This can make it easier to remember the material in the long term.

5. Track your progress

The Actuarial Nexus will automatically save snapshots metrics for you as you work through practice problems and mock exams. This will allow you to see how you are doing overall and identify any trends or patterns. You can also compare your performance against the average performance of other students who have attempted the same practice problem to gauge your relative progress.

Additionally, if you notice that you are consistently struggling with a particular topic, you may want to spend more time reviewing that material or seeking additional help. On the other hand, if you are doing well in a particular subject, you can try to challenge yourself by attempting more difficult practice problems in that area.

The Actuarial Nexus also records the amount of time you spend on each practice problem, and the average time your peers spend. If you notice that it takes you significantly longer to complete certain types of problems, you may want to try different approaches or seek additional help to improve your speed. You also have the ability to view all your past attempts on a particular problem, which can be a great way to see how you've improved over time.


The Actuarial Nexus is a comprehensive study platform that offers a range of tools and resources to help you prepare for actuarial exams. By making the most of all of the available resources, you can increase your retention of new material and boost your performance on exams.

To access all our features, check out our subscription models here.