OpenAI researchers found that even the most advanced AI models struggle to solve most coding tasks, lagging behind human coders.
OpenAI used a benchmark called SWE-Lancer, based on 1,400+ real-world coding tasks, to test AI’s problem-solving abilities.
While AI can fix surface-level bugs, it often fails to identify the root cause or to handle larger, more complex tasks (the sketch below illustrates the difference between patching a symptom and fixing its cause).
Anthropic’s Claude 3.5 Sonnet outperformed OpenAI’s own models on the benchmark but still failed the majority of the coding tasks.
Despite rapid advancements, AI isn’t reliable enough to replace human software engineers in real-world projects.
Even so, some companies are already replacing human coders with AI, despite its inability to handle complex tasks reliably.
AI works faster than human engineers but often delivers incorrect or incomplete solutions that lack their depth of understanding.
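To make the "surface-level fix versus root cause" distinction concrete, here is a minimal, hypothetical Python sketch. The scenario, and the names parse_price and order_total, are invented for illustration and are not taken from the SWE-Lancer benchmark: a parser silently swallows malformed input, and the crash it eventually causes can be patched at the symptom or fixed at the source.

```python
# Hypothetical example: the same bug handled two ways.
# The defect: malformed price strings are silently turned into None,
# so later calculations crash or go quietly wrong.

def parse_price(raw: str) -> float | None:
    """Original parser: swallows malformed input (the real defect)."""
    try:
        return float(raw)
    except ValueError:
        return None  # root cause: bad data is hidden here


def order_total_surface_fix(prices: list[str]) -> float:
    """Surface-level fix: skip None values at the point of the crash.
    The program no longer errors, but the total is silently wrong."""
    return sum(p for p in (parse_price(r) for r in prices) if p is not None)


def parse_price_root_fix(raw: str) -> float:
    """Root-cause fix: normalize or reject bad input where it first appears,
    so invalid data cannot propagate into later calculations."""
    value = float(raw.strip().lstrip("$"))
    if value < 0:
        raise ValueError(f"negative price: {raw!r}")
    return value


def order_total(prices: list[str]) -> float:
    return sum(parse_price_root_fix(r) for r in prices)


if __name__ == "__main__":
    samples = ["10.50", "$4.25", "3.00"]
    print(order_total_surface_fix(samples))  # 13.5  -- quietly drops "$4.25"
    print(order_total(samples))              # 17.75 -- handles every entry
```

The researchers' observation maps onto this pattern: a model can often produce the first kind of patch, which makes the visible error disappear, while a human engineer is more likely to trace the failure back to where the bad data originates and fix it there.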