Linux Benchmarking with FIO

This is a quick primer to get you started with Linux storage benchmarking.

The fio man page offers an extensive array of parameters and can really help you fine-tune your testing.

Before running tests, let’s talk about what we will be measuring:

  • IOPS = Input/Output Operations Per Second - how many individual read/write operations the storage can complete each second
  • Throughput = how many MB/s you can read or write continuously

What About Block and File Sizes?
The goal when benchmarking is really to see whether the storage system has been optimized for your intended use case. For example, a system optimized for reading/writing lots of small files (e.g. documents, logs) will benefit from a smaller block size, while reading/writing large files (e.g. videos, large backups) benefits from a larger block size. Another example is a database server: the database itself may be large, but because of the way transactions are committed it may be better served by a smaller block size.
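If you want to see the effect directly, one option is to sweep the --bs value over a few sizes and compare the results. This is just a sketch; the block sizes and the test file name are arbitrary:

for bs in 4k 64k 1M; do
    sync
    fio --name=bs-sweep --filename=test --ioengine=libaio --direct=1 --randrepeat=1 \
        --bs=$bs --size=4G --readwrite=randwrite --ramp_time=4
done
rm test   # remove the 4G test file when done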

Before running these tests

  • Check you’re in a directory with enough free disk space (a quick pre-flight check is sketched below).
  • Check for / pause any other workloads that may interfere with the results.
  • Understand your workload and what you intend to use the storage for - i.e. what actually matters to you?
  • Tune anything you might want to tune, as above, such as the block size or test file size.
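A minimal pre-flight check might look like this (the directory is an assumption - use whichever filesystem you are testing):

cd /mnt/tank/benchmark    # assumed path to the storage under test
df -h .                   # make sure free space comfortably exceeds --size (4G in the tests below)
fio --version             # confirm fio is installed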

Random write test for IOPS

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=4G --readwrite=randwrite --ramp_time=4

Random read test for IOPS
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=4G --readwrite=randread --ramp_time=4

Mixed Random Workload
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=4G --readwrite=randrw --ramp_time=4
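fio splits a mixed workload 50/50 between reads and writes by default; if your real workload leans one way you can skew it with --rwmixread (75 here is just an example value):

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=4G --readwrite=randrw --rwmixread=75 --ramp_time=4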

Sequential write test for throughput
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4M --size=4G --readwrite=write --ramp_time=4

Sequential Read test for throughput
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4M --size=4G --readwrite=read --ramp_time=4
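If you would rather keep the parameters in one place, the same five tests can be written as a single fio job file. This is only a sketch of the commands above (the file name jobs.fio and the section names are arbitrary):

# jobs.fio -- same parameters as the one-liner commands above
[global]
ioengine=libaio
direct=1
randrepeat=1
filename=test
size=4G
ramp_time=4

[randwrite-iops]
bs=4k
rw=randwrite

[randread-iops]
bs=4k
rw=randread

[randrw-mixed]
bs=4k
rw=randrw

[seq-write-throughput]
bs=4M
rw=write

[seq-read-throughput]
bs=4M
rw=read

Run one test at a time with e.g. sync;fio jobs.fio --section=randwrite-iops, and remove the test file (rm test) when you are finished.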

A few other tools that can help you watch what is happening while you run the tests:

  • iotop does for I/O usage what top does for CPU usage. It watches I/O usage information output by the Linux kernel and displays a table of current I/O usage by processes on the system. It is handy for answering the question “Why is the disk churning so much?”.
  • ioping monitors disk I/O latency in real time. The main idea behind ioping is to have a utility similar to ping, which shows disk I/O latency the same way ping shows network latency (example invocations for both tools are shown below).
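Typical invocations for both (the flags shown are just the common ones):

sudo iotop -o     # only list processes that are currently doing I/O
ioping -c 10 .    # ten latency probes against the current directory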

For more in-depth automated testing I use the Phoronix Test Suite: https://www.phoronix-test-suite.com/
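If you go that route, tests are listed and run from the command line; pts/fio is one storage-related profile, though exact profile names may vary by suite version:

phoronix-test-suite list-available-tests
phoronix-test-suite benchmark pts/fio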


@LTS_Tom, great info on benchmarking storage, and I really liked the NFS/iSCSI comparison.

Can you provide what the TrueNAS hardware was for your NFS/iSCSI testing? With no tuning, I wanted to compare based on hardware; it could be vastly different, but it may be a good first comparison.

Thanks

It is a TrueNAS-MINI-3.0-X+: Intel Atom C3758 CPU @ 2.20GHz, 64GB RAM, and seven Micron 5210 1.92TB SATA SSDs.

@LTS_Tom, thanks for the specs. I feel that I’m underperforming on a bunch of the individual tests. I ran your TrueNAS test and compared our results; my system only outperforms in about 30% of the tests.

TrueNAS SCALE 22.04 Beta 1
EPYC 7282 16-core
256GB 3200MHz RAM
10Gbit network, upgrading soon to 25Gbit (Mellanox ConnectX-5)
4x 2TB PCIe Gen 4 NVMe Sabrent drives (2x striped mirrors)

I know the Sabrents aren’t DC (datacenter) drives, but I was still hoping they would do well for a TrueNAS setup test.

I have a couple of Intel Optane 905P drives but thought they might slow the pool down if I added them as a SLOG.

Thanks


Can you please help me with using the GPU-related I/O engines in fio, e.g. rdma and libcufile? I was trying to find the steps for installing the GPU-related I/O engines and how to use them, but I didn’t find much info. Can you please help with writing a command that uses the GPU for writing I/O to the disk?

I have no idea how to do that and I am not exactly sure what the goal is.

Have you tried the fio example below?

# Example libcufile job, using cufile I/O
#
# Required environment variables:
# GPU_DEV_IDS : refer to option 'gpu_dev_ids'
# FIO_DIR     : 'directory'. This job uses cuda_io=cufile, so path(s) must
#               point to GPUDirect Storage filesystem(s)

[global]
ioengine=libcufile
directory=${FIO_DIR}
gpu_dev_ids=${GPU_DEV_IDS}
cuda_io=cufile
# 'direct' must be 1 when using cuda_io=cufile
direct=1
# Performance is negatively affected if 'bs' is not a multiple of 4k.
# Refer to GDS cuFile documentation.
bs=1m
size=1m
numjobs=16
# cudaMalloc fails if too many processes attach to the GPU, use threads.
thread

[read]
rw=read

[write]
rw=write

[randread]
rw=randread

[randwrite]
rw=randwrite

[verify]
rw=write
verify=md5

[randverify]
rw=randwrite
verify=md5

Link: https://github.com/axboe/fio/blob/master/examples/libcufile-cufile.fio
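To actually run it (assuming your fio build includes the libcufile engine), point the two environment variables at your GPU ID(s) and a GPUDirect-Storage-capable mount - the values below are placeholders - and pick a section:

GPU_DEV_IDS=0 FIO_DIR=/mnt/gds fio libcufile-cufile.fio --section=randwrite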