SAN system for 3D production

A medium-sized project that required the design, supply and construction of a small high-performance SAN system and the rebuilding of the networking infrastructure. Other parts required a VM hypervisor, a reliable and instantly available backup solution, a long-term archival system (also instantly available) and minor storage solutions.

Design Needs

The client in question is a medium-sized 3D and film production studio, with 25 employees specialised in graphic design, modeling, rigging and rendering, artistic direction and photography.

The production has a heterogeneous mix of machines: some PCs and some Macs with upgraded video cards.

Over the years it became necessary to have shared storage for all the active jobs and to perform distributed rendering on a 22-node farm made of custom-built PCs. The solution in place before the new system was a low-end HP server running the ExtremeZ-IP software to share volumes via AFP (as SMB still has performance issues on Macs as of 2018).

The solution required high transfer speeds for multiple simultaneous users over Ethernet with file-level access protocols, plus an increased link speed for a limited set of very fast machines.

iSCSI and Fibre Channel systems were unfortunately dropped due to the requirement of not having to manually lock the volume that hosts the media files. The studio did do compositing with videos, but it wasn’t the main requirement, so the better compromise was to lose some speed in favour of unrestricted read/write file-level access for every workstation.

High reliability was also a crucial feature, because the studio had already suffered some downtime due to hardware or software failures.

The main repository needed a daily backup with weekly rotation, instantly available through a network share, and a long-term archive.

Space requirements indicated that 25TB in the main repository was sufficient, and that 40TB in the archive was appropriate. Redundancy is included.

Solution Architecture

A Studio Network Solutions EVO server was the crucial choice for providing a very reliable and stable solution, with the best macOS compatibility possible. I personally managed the research and choice of the vendor, the specification study and the supply through a reseller in the UK.

The unit was half loaded with disks and fitted with 10GbE fiber cards, each directly connected to one of the high-end workstations. An expansion was planned for the near future (and was later actually purchased). The cards for the clients were ATTO ThunderLink and Solarflare PCI cards.

Providing a redundant and available backup was the task of a 16-bay QNAP system, connected through iSCSI with LACP-bonded interfaces. The automation was managed by a custom Linux instance virtualised under Xen, with a script that uses hard links to minimise space usage. The iSCSI volume used ZFS for snapshot management and data scrubbing, in order to guard against bit rot.
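The original backup script is not reproduced here, but the hard-link idea it relied on can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script that ran in production: the function names snapshot and run_backup, the date-style labels and the keep=7 retention (one week of dailies) are all hypothetical. Each run creates a full-looking directory tree, but files that are unchanged since the previous run are hard-linked instead of copied, so they take no extra space.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Mirror `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = target / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=True):
                # Unchanged file: share the data blocks of the previous snapshot.
                os.link(old, dst)
            else:
                # New or modified file: real copy (copy2 keeps the timestamps,
                # so the next run can recognise the file as unchanged).
                shutil.copy2(src, dst)


def run_backup(source: Path, backup_root: Path, label: str, keep: int = 7) -> Path:
    """Create snapshot `label` under `backup_root`, then drop the oldest ones.

    Assumes labels are unique and sort chronologically (e.g. ISO dates),
    and that keep is at least 1.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    existing = sorted(p for p in backup_root.iterdir() if p.is_dir())
    previous = existing[-1] if existing else None
    target = backup_root / label
    snapshot(source, previous, target)
    # Rotation: keep only the newest `keep` snapshots.
    for stale in sorted(p for p in backup_root.iterdir() if p.is_dir())[:-keep]:
        shutil.rmtree(stale)
    return target
```

Deleting an old snapshot is safe with this layout: a hard-linked file only disappears from disk when the last snapshot that references it is removed. Hard links require the snapshots to live on the same filesystem, which is the case when they all sit on one backup volume.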

HP Enterprise switches were chosen as the replacement for the old 3Com hardware in place.

Setup and Deployment

The EVO took a moderate amount of time to arrive. Two (awesome) guys from England came to assist with the first installation; I took care of the organisation and most of the installation, while they taught me some more in-depth configuration.

Networking was completely redone and configured with two VLANs, one for production and the other for the render farm. High-capacity links were established between the two 24-port switches and the 48-port switch to avoid bottlenecks.

Fiber wiring was performed and certified by a specialist, and all Ethernet cables were checked for performance losses.

Two rack cabinets were purchased to specific requirements: the first is a sound-insulated APC cabinet that houses the main servers in one of the offices, while the other is an industrial-style enclosure for the switching equipment, designed ad hoc.

The archival solution was purchased later: a highly reliable 16-bay QSAN server with 40TB of redundant storage. Neither the EVO nor the QSAN lost a disk to failure in 6 years.

Testing and Tuning

The system started functioning shortly after the installation, and the workstations were tuned to use jumbo frames and advanced IP settings to maximise performance. Tests showed that on macOS the best protocol to use was AFP rather than SMB (even with newer releases such as 10.13 it’s still the leading choice). Speeds reached and surpassed 1GB/s (gigabyte per second) for a single workstation connected over 10GbE fiber. With multiple simultaneous connections, per-client speeds dropped to values that still outperformed local SSD drives (IOPS were not considered).
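To put the 1GB/s figure and the jumbo frame tuning in context, here is a small back-of-the-envelope calculation in Python. It is only a sketch of the theoretical ceiling, assuming plain TCP over IPv4 with no header options and no VLAN tag, and it ignores everything above TCP (AFP or SMB overhead, disk speed, CPU load); the function names are illustrative.

```python
# Theoretical TCP payload rate on a 10GbE link for a given MTU.
LINE_RATE_BYTES = 10e9 / 8          # 10 Gbit/s expressed in bytes per second
WIRE_OVERHEAD = 8 + 14 + 4 + 12     # preamble/SFD + Ethernet header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20            # IPv4 header + TCP header, no options


def tcp_goodput(mtu: int) -> float:
    """Bytes per second of TCP payload at line rate for the given MTU."""
    return LINE_RATE_BYTES * (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)


def frames_per_second(mtu: int) -> float:
    """Full-size frames per second needed to saturate the link."""
    return LINE_RATE_BYTES / (mtu + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput(mtu) / 1e9:.3f} GB/s, "
          f"{frames_per_second(mtu):,.0f} frames/s")
```

Under these assumptions the ceiling moves from roughly 1.19GB/s with a standard 1500-byte MTU to roughly 1.24GB/s with a 9000-byte MTU, so the raw bandwidth gain is modest. The larger effect is the drop in frames per second (about six times fewer), which reduces the per-packet work on both workstation and server.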

XenServer proved particularly difficult when it came to setting up the networking correctly, and the ZFS-based backup system was a real hassle to tune, yet it started working perfectly after some more head-banging on the issue.

Final Considerations

The project made me discover and learn about SAN, requirements for media storage, advanced networking configurations and complex workflows. I can confidently say that the project has been a success: incidents have been extremely rare (one or two a year over five years) and have had very little impact. The machine is sturdy and reliable, support is excellent and the features cover every possible use case we had and more. The backups work tirelessly and without maintenance, while the network always provides the maximum bandwidth to the clients even under high load. Archival works perfectly, and the cabinet is silent enough to be placed in a room where people work (as it currently is).

It has been an incredible growth opportunity for the client and for myself; as of 2018 we still collaborate on other projects that involve their computing infrastructure.