Benchmarking

New Qiskit HumanEval Release: Qiskit 1.4 Compatibility and Benchmark Improvements

Released a new version of Qiskit HumanEval compatible with Qiskit 1.4. The release brings significant improvements to the benchmark, including more robust and rigorous code-execution tests for more accurate evaluation of LLM-generated quantum code.
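Execution-based tests of this kind typically run a model's completion in a scratch namespace and check the circuit or result it produces. The sketch below is only an illustration of that idea under stated assumptions: the entry-point name bell_state, the expected state, and the pass/fail logic are hypothetical and are not taken from the benchmark itself.

```python
import numpy as np
from qiskit.quantum_info import Statevector

# Stand-in for a model-generated completion returned as a string (hypothetical task).
generated_code = """
from qiskit import QuantumCircuit

def bell_state() -> QuantumCircuit:
    qc = QuantumCircuit(2)
    qc.h(0)
    qc.cx(0, 1)
    return qc
"""

def run_execution_test(candidate_src: str) -> bool:
    """Execute the candidate source and verify the circuit it builds."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)          # run the generated code
        circuit = namespace["bell_state"]()     # call the required entry point
        state = Statevector.from_instruction(circuit)
        expected = Statevector(np.array([1, 0, 0, 1]) / np.sqrt(2))  # Bell state
        return state.equiv(expected)            # equivalent up to global phase
    except Exception:
        # Any runtime error in the generated code counts as a failed task.
        return False

print(run_execution_test(generated_code))  # True for this correct completion
```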

Presenting Qiskit HumanEval at IEEE Quantum Week 2024

Presenting the Qiskit HumanEval benchmark for LLMs at IEEE Quantum Week 2024 in the SYS-BNCH Benchmarking session. Available afterwards at the IBM Quantum booth to discuss AI and quantum computing initiatives.

Qiskit HumanEval: Evaluation Benchmark for Quantum Code Generation Published

Published a research paper introducing the Qiskit HumanEval dataset for evaluating the capability of large language models to generate quantum computing code. The dataset comprises more than 100 quantum computing tasks, each with a prompt, solution, test cases, and a difficulty rating, establishing a benchmark for generative AI tools in quantum code development.
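As a rough illustration of how a task of that shape could be represented, the sketch below groups the fields named above into a single record. The field names, class name, and the example task are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class QuantumCodeTask:
    """Hypothetical container for one benchmark task (illustrative schema only)."""
    task_id: str
    prompt: str              # natural-language description plus function signature
    canonical_solution: str  # reference Qiskit implementation
    test: str                # executable check applied to a model's completion
    difficulty: str          # e.g. "basic", "intermediate", "difficult"

example_task = QuantumCodeTask(
    task_id="example/0",
    prompt='def ghz_circuit(n: int):\n    """Return an n-qubit GHZ circuit."""',
    canonical_solution=(
        "from qiskit import QuantumCircuit\n"
        "def ghz_circuit(n):\n"
        "    qc = QuantumCircuit(n)\n"
        "    qc.h(0)\n"
        "    for i in range(n - 1):\n"
        "        qc.cx(i, i + 1)\n"
        "    return qc\n"
    ),
    test="assert ghz_circuit(3).num_qubits == 3",
    difficulty="basic",
)
```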