A hundredth the price and a quarter the quality means this is here to stay. It might be a little early on the accuracy curve to start riding AI-written briefs into court unchecked, but then I've never met a lawyer who didn't try to make their billing more efficient.
But logically, since all that's needed is improved accuracy, the fix is more likely to come from the models improving than from any change in human behavior.
> A hundredth the price and a quarter the quality means that this is here to stay
No, it's simply that those noobs don't know how to use LLMs. They'll eventually learn.
Basically, you don't use them to dig up new information, unless you're extremely careful about triple-checking that information. Google Scholar's legal database search is better for that. You use LLMs to write boilerplate, paraphrase, edit, and synthesize information from your own sources. Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.
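Concretely, the workflow is something like the sketch below. The SDK call is the real OpenAI Python API; the model name, prompt wording, and sources are purely illustrative assumptions, not anyone's production setup:

    # A minimal sketch of "synthesize from your own sources": the model
    # only sees excerpts you've already vetted, and is told to refuse
    # rather than supply authority from memory. Model name, prompt, and
    # sources are toy placeholders.
    import re
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Excerpts you already verified, e.g. via Google Scholar.
    sources = {
        "S1": "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997): held that ...",
        "S2": "Doe v. Roe, 789 P.2d 101 (Cal. 1990): established that ...",
    }
    source_block = "\n".join(f"[{k}] {v}" for k, v in sources.items())

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use whatever you've validated
        messages=[
            {"role": "system", "content": (
                "Draft legal prose using ONLY the numbered sources below, "
                "cited as [S1], [S2], etc. If the sources don't support a "
                "point, say so; never supply authority from memory."
            )},
            {"role": "user", "content": (
                f"Sources:\n{source_block}\n\n"
                "Draft one paragraph summarizing what these cases held."
            )},
        ],
    )
    draft = response.choices[0].message.content

    # Cheap mechanical check: every citation must map to a real source.
    for tag in re.findall(r"\[(S\d+)\]", draft):
        assert tag in sources, f"model cited unknown source {tag}"

The check at the end is the whole trick: a hallucinated case can't survive a filter that only accepts citations you put in yourself.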
> Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.
All fun and games for those who can, and good for them, but I'm betting the majority can't or won't. The result being that society pays for the latter's incompetence.
I've led a team building an LLM agent for customer service.
Our finding is that it runs between 10% and 50% of the operational cost of a human per case. That range is based on the spread between offshore and nearshore worker costs, and it doesn't account for a lot of the overhead of a human-powered service organisation (in the jargon, it isn't a fully loaded cost).
I believe the real cost is about 20% once dev expenses are included - but that's just my view of where, in between those bounds, things will come to rest.
Now, that's not a hundredth. In terms of quality, there are things it can't do, and despite our architecture (which is aimed at managing the deficiencies of LLMs) we still see some hallucinations creeping through. For example, our encoder has problems with directionality, as in it will write text like "average transaction value declined from $150 to $154 in October." We can catch (in our tests, anyway) all the mistakes about the values themselves, but the textual phrasing is hard to check - hard enough that I don't think the value of the system justifies the effort.
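For what it's worth, the value-level checks can be mechanical; directionality is where it gets ugly, because the claim only checks out if the verb and the numbers land in a pattern you anticipated. A toy sketch (the verb lists and the single regex are illustrative assumptions, not our actual pipeline):

    import re

    # Verbs that assert a direction of change; illustrative lists only.
    UP_VERBS = {"increased", "rose", "grew", "climbed"}
    DOWN_VERBS = {"declined", "decreased", "fell", "dropped"}

    # Matches phrases like "declined from $150 to $154".
    PATTERN = re.compile(
        r"(?P<verb>\w+)\s+from\s+\$(?P<a>[\d,.]+)\s+to\s+\$(?P<b>[\d,.]+)"
    )

    def check_directionality(text: str) -> list[str]:
        """Flag phrases whose verb contradicts the numeric change."""
        problems = []
        for m in PATTERN.finditer(text):
            a = float(m.group("a").replace(",", ""))
            b = float(m.group("b").replace(",", ""))
            verb = m.group("verb").lower()
            if verb in DOWN_VERBS and b > a:
                problems.append(f"'{m.group(0)}' says down, value went up")
            elif verb in UP_VERBS and b < a:
                problems.append(f"'{m.group(0)}' says up, value went down")
        return problems

    print(check_directionality(
        "Average transaction value declined from $150 to $154 in October."
    ))
    # -> ["'declined from $150 to $154' says down, value went up"]

And that's the problem in miniature: the numbers are checkable, but a model can phrase a directional claim a dozen ways that no single pattern catches, which is why the phrasing-level check doesn't pay for itself.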
I think, from customer feedback, that this sort of thing will be OK for the apps we're building, but it's a real problem with this generation of models, and it's not clear to me that it will be solved (although, like everyone else, I was blindsided by the jump from GPT-3 to GPT-4, so who knows).
Really interesting insights, and a great comment.
I expect the technology to accelerate, with dramatic leaps in accuracy and geometric improvements in LLMs (larger models and better hardware alone will improve them substantially, and that's already coming to market in 2024-2025).
But logically, since all that's needed is improved accuracy, the fix is more likely to come from the models improving than from any change in human behavior.