Use Case Utility
- XAI and LLMs are often not ends in themselves, but tools for accomplishing some other goal (e.g., writing code).
- Very limited work has explored the utility of LLMs in use-case–specified user studies, but one user study of Microsoft/GitHub’s Copilot [1], an LLM-based code-generation tool, found that it “did not necessarily improve the task completion time or success rate” [52].
- LLM outputs often sound highly confident, even when their content is hallucinated [50].
- When a user questions an incorrect output, LLMs also have a documented tendency to argue that the user is wrong and that the response is correct; indeed, some have called LLMs “mansplaining as a service” [34].
- This confident, argumentative behavior can make it more difficult for humans to apply cognitive checks to LLM outputs.
- While some recent work has outlined categories of LLM failure modes based on the types of cognitive biases they exhibit [29], we push for greater work in this field.