
Updated 9/27/25 to add NVIDIA RTX PRO Blackwell, NVIDIA GeForce RTX 50-Series, and NVIDIA B200 SXM
AMBER 24 GPU Benchmarks on NVIDIA GeForce, RTX, and Data Center GPUs
At Exxact, we have benchmarked AMBER for quite some years now to provide GPU performance for the molecular dynamics simulation suite and to educate both our team and customers on the most optimal hardware. We want to equip our readers with information on what GPUs best fit their budget and workload, whether that's ten thousand, hundred thousand, or million-plus atom count models.
Before we get into it, here's some information on the numbers we achieved:
- All benchmarks test a single GPU (even if multiple GPUs are present). AMBER does not accelerate a single calculation across multiple GPUs; instead, each GPU can run its own independent calculation in parallel.
- All simulations were run using AMBER 24 and AmberTools 25. From version to version, there is little change to the code calculation speed. These numbers can be compared to AMBER 22.
- Newly tested GPUs are on CUDA version 12.8, whereas the previous GPUs were tested on CUDA 12.3. This should not affect performance in a significant manner.
- NVIDIA GH200 (Grace Hopper Superchip) configuration tests only the on-board Hopper GPU.
- Once again, AMBER computations are only performed by GPUs via CUDA. Platform variations have an insignificant effect on performance.
We test on a breadth of classic molecular dynamics problems, including PME and generalized born simulations. Simulations are ordered from largest to smallest to reflect relative performance at all model size levels. Your results may vary based on storage speed, computing environment, and other factors.
- STMV NPT 4fs = 1,067,095 atoms
- Cellulose NVE 2fs = 408,609 atoms
- Cellulose NPT 2fs = 408,609 atoms
- FactorIX NVE 2fs = 90,906 atoms
- FactorIX NPT 2fs = 90,906 atoms
- DHFR (JAC Prod.) NVE 4fs = 23,558 atoms
- DHFR (JAC Prod.) NPT 4fs = 23,558 atoms
- Nucleosome GB 2fs = 25,095 atoms
- Myoglobin GB 2fs = 2,492 atoms
Quick AMBER GPU Benchmark Takeaways
- In larger systems, we see the new NVIDIA Blackwell architecture GPUs outperform the previous generation GPUs.
- The NVIDIA GeForce RTX 5090 offers the best performance for its cost, featuring high clock speeds and 32GB of memory. However, its physical size rules out multi-GPU configurations, making it suitable only for single-GPU workstations.
- The NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition is an amazing card for large simulation sizes. In smaller simulations, however, its lower clock speed holds it back: small systems can't keep the GPU busy enough to show its full advantage over the consumer RTX 5090 or the last-generation NVIDIA RTX 6000 Ada.
- For those working with smaller simulation sizes under 100,000 atoms, the NVIDIA RTX PRO 4500 Blackwell is a great option, matching the popular NVIDIA RTX 5000 Ada at a lower price.
- The NVIDIA B200 SXM delivers outstanding performance, but it is expensive for molecular dynamics. The B200 SXM, GH200, and H100 PCIe are all geared toward AI workloads; their high price tags make them a poor price-to-performance choice for MD simulation alone.

We're Here to Deliver the Tools to Power Your Research
With access to the highest-performing hardware, at Exxact, we offer customizable platforms for AMBER optimized for your deployment, budget, and desired performance so you can make an impact with your research!
Configure your Ideal GPU System for AMBER
GPUs Benchmarked
The following AMBER 24 benchmarks were performed on an Exxact AMBER Certified MD System using the AMBER 24 Benchmark Suite with the following GPUs:
GeForce | RTX Professional | Data Center |
---|---|---|
RTX 5090 | RTX PRO 6000 Blackwell Max-Q | B200 SXM |
RTX 5080 | RTX PRO 4500 Blackwell | GH200 Superchip |
RTX 5070 Ti | RTX 6000 Ada | H100 PCIe |
 | RTX 5000 Ada | |
 | RTX 4500 Ada | |
 | RTX A6000 | |
 | RTX A5500 | |
 | RTX A5000 | |
 | RTX A4500 | |
 | RTX A4000 | |
Single GPU Benchmark in ns/day Overview
GPU | STMV NPT 4fs | Cellulose NVE 2fs | Cellulose NPT 2fs | FactorIX NVE 2fs | FactorIX NPT 2fs | DHFR (JAC) NVE 4fs | DHFR (JAC) NPT 4fs | Nucleosome GB 2fs | Myoglobin GB 2fs |
---|---|---|---|---|---|---|---|---|---|
RTX 5090 | 109.75 | 169.45 | 153.30 | 529.22 | 494.45 | 1655.19 | 1632.97 | 58.61 | 1151.95 |
RTX 5080 | 63.17 | 105.96 | 99.07 | 394.81 | 365.36 | 1513.55 | 1468.55 | 30.20 | 871.89 |
RTX 5070 Ti | 53.99 | 87.60 | 82.95 | 345.91 | 329.96 | 1442.07 | 1414.75 | 28.48 | 899.81 |
RTX PRO 6000 Max-Q | 97.44 | 149.84 | 137.29 | 475.04 | 445.08 | 1464.14 | 1439.53 | 47.74 | 940.57 |
RTX PRO 4500 Blackwell | 54.17 | 88.41 | 82.16 | 389.54 | 363.86 | 1481.61 | 1450.13 | 27.02 | 924.67 |
GH200 Superchip | 101.31 | 167.20 | 152.40 | 191.85 | 206.06 | 1323.31 | 1322.17 | 37.24 | 1159.35 |
B200 SXM | 114.16 | 182.32 | 161.19 | 473.74 | 427.26 | 1513.28 | 1447.75 | 46.07 | 1020.24 |
H100 PCIe | 74.50 | 125.82 | 113.81 | 410.77 | 385.12 | 1532.08 | 1500.37 | 37.83 | 1094.57 |
RTX 6000 Ada | 70.97 | 123.98 | 114.99 | 489.93 | 442.91 | 1697.34 | 1666.84 | 31.59 | 1016.00 |
RTX 5000 Ada | 55.30 | 95.91 | 92.32 | 406.98 | 376.67 | 1562.48 | 1550.32 | 26.11 | 841.93 |
RTX 4500 Ada | 37.58 | 67.63 | 63.78 | 306.57 | 288.13 | 1297.88 | 1278.02 | 18.80 | 740.65 |
RTX A6000 | 39.08 | 63.15 | 58.00 | 273.64 | 253.98 | 1132.86 | 1117.95 | 19.70 | 648.58 |
RTX A5500 | 35.12 | 55.07 | 52.03 | 242.84 | 233.43 | 1116.01 | 1126.87 | 15.32 | 592.84 |
RTX A5000 | 32.29 | 49.63 | 47.86 | 225.58 | 216.11 | 1029.89 | 1025.84 | 15.18 | 580.02 |
RTX A4500 | 27.66 | 42.14 | 40.33 | 201.79 | 193.83 | 963.52 | 951.60 | 11.58 | 536.57 |
RTX A4000 | 21.87 | 33.57 | 31.89 | 161.63 | 158.24 | 841.32 | 829.49 | 10.98 | 491.05 |
STMV Production NPT 4fs - 1,067,095 Atoms
Cellulose Production NVE 2fs - 408,609 Atoms
Cellulose Production NPT 2fs - 408,609 Atoms
FactorIX Production NVE 2fs - 90,906 Atoms
FactorIX Production NPT 2fs - 90,906 Atoms
JAC Production NVE 4fs - 23,558 Atoms
JAC Production NPT 4fs - 23,558 Atoms
Myoglobin Production GB - 2,492 Atoms [Implicit]
Nucleosome Production GB - 25,095 Atoms [Implicit]
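Throughput in ns/day translates directly into wall-clock time for a target amount of sampling. A minimal sketch of that arithmetic, using STMV throughputs taken from the table above:

```python
def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns of simulated time."""
    return target_ns / ns_per_day

# STMV (1,067,095 atoms) ns/day figures from the benchmark table above.
stmv_ns_per_day = {"RTX 5090": 109.75, "RTX PRO 6000 Max-Q": 97.44, "RTX A4000": 21.87}

for gpu, rate in stmv_ns_per_day.items():
    # e.g. the RTX 5090 needs roughly 9 days per microsecond of STMV
    print(f"{gpu}: {days_to_simulate(1000, rate):.1f} days per microsecond")
```

The same conversion works for any row in the table, which is why the ns/day metric is the standard way to compare MD hardware.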
AMBER 24 Background Architecture & Hardware Recommendations
AMBER consists of a number of software packages, with the molecular dynamics engine PMEMD being the most compute-intensive and the one we most want to optimize. PMEMD ships in single-CPU (pmemd), multi-CPU (pmemd.MPI), single-GPU (pmemd.cuda), and multi-GPU (pmemd.cuda.MPI) versions. Traditionally, MD simulations were executed on CPUs, but AMBER's CUDA implementation has made GPUs the most logical choice for speed and cost efficiency.
Most AMBER simulations fit on a single GPU and run entirely within GPU memory; CPU performance and CPU memory have little influence on simulation throughput, while storage speed only affects the time it takes to load the model into GPU memory. AMBER also does not accelerate a single calculation across multiple GPUs. To maximize AMBER throughput on multiple GPUs, run multiple independent AMBER simulations simultaneously, one per GPU, in your system, server, or computing infrastructure. This lets you test different molecular interactions simultaneously and answer questions faster.
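Since pmemd.cuda uses one GPU per calculation, throughput scaling comes from launching independent runs pinned to different GPUs via CUDA_VISIBLE_DEVICES. A minimal sketch of that pattern (the input file names here are placeholders, not from this article):

```python
import os
import subprocess

# Placeholder input files; substitute your own mdin/prmtop/inpcrd.
BASE_CMD = ["pmemd.cuda", "-O", "-i", "md.in",
            "-p", "sys.prmtop", "-c", "sys.inpcrd"]

def launch_on_gpu(gpu_id: int) -> subprocess.Popen:
    """Start one independent pmemd.cuda run that sees only GPU gpu_id."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    outputs = ["-o", f"md_gpu{gpu_id}.out",   # per-run output files so
               "-r", f"md_gpu{gpu_id}.rst",   # parallel jobs don't collide
               "-x", f"md_gpu{gpu_id}.nc"]
    return subprocess.Popen(BASE_CMD + outputs, env=env)

# One simulation per GPU, e.g. on a 4-GPU server:
# procs = [launch_on_gpu(g) for g in range(4)]
# for p in procs:
#     p.wait()
```

A job scheduler like Slurm with GPU binding accomplishes the same thing on shared infrastructure; the key point is one independent simulation per GPU.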
Hardware Recommendation
Here are our top 3 GPU recommendations for running AMBER and our reasoning:
- Cost-effective & Low Atom Count: The NVIDIA RTX PRO 4500 Blackwell offers great performance with excellent price-to-performance and scalability. In simulations with low atom counts, the NVIDIA RTX PRO 4500 Blackwell matches the NVIDIA RTX PRO 6000 Blackwell Max-Q at a fraction of the cost. For the price of a system with 2-3x NVIDIA RTX PRO 6000 Blackwell GPUs, you can configure a system with 8x RTX PRO 4500 Blackwell GPUs and run more simulations in parallel. That said, we expect the untested NVIDIA RTX PRO 5000 Blackwell to be the best price-to-performance GPU.
- Peak Single GPU Throughput: The NVIDIA RTX 5090 in a workstation is the best performer at a lower price. If you don't need to run multiple simulations simultaneously, the RTX 5090 delivers the fastest results. Its only disadvantage is its lack of scalability to multi-GPU deployments.
- Best for Performance and Scalability: The NVIDIA RTX PRO 6000 Blackwell Max-Q offers the most well-rounded price to performance for simulations of all sizes. It also features great scalability with options to deploy in multi-GPU servers and workstations. While not benchmarked here, we think the NVIDIA RTX PRO 6000 Blackwell Server edition will showcase comparable performance to the NVIDIA GeForce RTX 5090 for those looking at 2U and 4U server-only deployments with up to 10 GPU options.
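The parallel-throughput argument above can be made concrete with the STMV numbers from the benchmark table (this compares aggregate throughput only; actual pricing varies):

```python
# STMV NPT ns/day per GPU, from the benchmark table above.
STMV_NS_PER_DAY = {"RTX PRO 6000 Max-Q": 97.44, "RTX PRO 4500 Blackwell": 54.17}

def aggregate_ns_per_day(gpu: str, count: int) -> float:
    """Total ns/day from `count` independent simulations, one per GPU."""
    return STMV_NS_PER_DAY[gpu] * count

two_6000 = aggregate_ns_per_day("RTX PRO 6000 Max-Q", 2)        # 194.88 ns/day
eight_4500 = aggregate_ns_per_day("RTX PRO 4500 Blackwell", 8)  # 433.36 ns/day
```

Eight RTX PRO 4500 Blackwell cards deliver more than double the aggregate STMV throughput of two RTX PRO 6000 Max-Q cards, which is the core of the cost-effectiveness recommendation; a single long simulation, by contrast, only benefits from the fastest single GPU.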
Conclusion
Not all use cases are the same, and AMBER is likely just one of several applications in your research toolkit. At Exxact Corp., we're committed to providing resources that help you configure the optimal custom system for your specific needs.
Since AMBER's performance is largely insensitive to non-GPU hardware, you may want to optimize the rest of your system for other applications with more specific requirements. Other simulation suites, such as GROMACS and NAMD, are sensitive to CPU performance. Balancing all these factors against your budget can make or break your overall workflow efficiency.
If you have any questions about configuring a system for multiple Life Sciences applications, Exxact can help you identify the right system for your desired performance.

We're Here to Deliver the Tools to Power Your Research
With access to the highest performing hardware, at Exxact, we can offer the platform optimized for your deployment, budget, and desired performance so you can make an impact with your research!
Configure your Life Science Solution Today