Freelancer

An Innocent Mount Issue

I have been working as a freelance contractor for about a year now. I quit a regular CTO job of 10+ years to dive into more technical subjects: troubleshooting, coding, building infrastructure. A-Team style, I help companies with complex matters that require experience and rigor.

One of my latest missions was really, really fun to deal with. A rather big company handling scientific, secret-level data had an issue with its storage system.
They use InfiniBand as the communication layer in an HPC environment. This was not a problem while the underlying operating system was CentOS 7.1 with kernel 3.10.0-229, but once new machines were installed with CentOS 7.7 and up, with kernels 3.10.0-1062 and up, any file they wrote that was less than 701 bytes long would be corrupted.
For the record, and to make the following debugging session easier to follow: the company uses NFS over RDMA, RDMA (Remote Direct Memory Access) being the technique InfiniBand uses to achieve low latency and high throughput.
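
To make the symptom concrete, here is a minimal sketch of how one could probe for a size-dependent corruption like this one. It is not the procedure used during the mission: the function names, the probe file names and the mount point are all hypothetical. It simply writes files of various sizes around the 701-byte boundary, reads them back and reports the sizes whose content differs. Keep in mind that a read from the same client may be served from the page cache, so on a real NFS mount the read-back would probably have to happen from another client or after a remount.

```python
import os


def make_payload(size: int) -> bytes:
    """Return a deterministic payload of exactly `size` bytes."""
    pattern = bytes(range(256))
    return (pattern * (size // 256 + 1))[:size]


def find_corrupt_sizes(directory: str, sizes) -> list:
    """Write one probe file per size, read it back, return the sizes that differ.

    `directory` is assumed to sit on the mount under test (hypothetical path).
    """
    bad = []
    for size in sizes:
        path = os.path.join(directory, f"probe_{size}.bin")
        expected = make_payload(size)
        with open(path, "wb") as f:
            f.write(expected)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the server
        with open(path, "rb") as f:
            actual = f.read()  # caveat: may come from the local page cache
        if actual != expected:
            bad.append(size)
        os.remove(path)
    return bad
```

With a hypothetical mount at `/mnt/nfs`, calling `find_corrupt_sizes("/mnt/nfs", range(690, 711))` would return the list of sizes in that range whose read-back content did not match what was written. An empty list means no mismatch was detected.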