We are extremely delighted to release Multi-SWE-bench! Multi-SWE-bench addresses the lack of multilingual benchmarks for evaluating LLMs in real-world code issue resolution. Unlike existing ...