DragonFlyBSD 3.6 Brings AMD/Intel Graphics Drivers & Better SMP Scaling

Posted by timothy
from the not-just-hovering-there dept.
An anonymous reader writes "DragonFlyBSD 3.6 was released [Monday] with the big new features being dports, Intel and AMD Radeon KMS kernel graphics drivers, major SMP improvements, and improved language support. Dports is the new package management system based upon the FreeBSD Ports collection and replaces pkgsrc as the default; over 20k packages are available via dports. Major SMP scaling improvements come via reducing lock contention within the kernel and other multi-core enhancements. The Intel and Radeon graphics drivers on DragonFlyBSD were ported from the FreeBSD kernel, which in turn were ported from the upstream Linux kernel."

  • by bluefoxlucid (723572) on Tuesday November 26, 2013 @01:22PM (#45528019) Journal

    DragonFlyBSD is the only interesting BSD. Minix is the other interesting Unix-like. DragonFly has a whole hell of a lot of novel concepts or novel implementations of concepts: HAMMER (a versioning file system, semi-useful, that keeps 30-second snapshots for 24 hours, then 1-day snapshots, then 1-week, then 1-month), checkpointing with freeze/thaw (you can actually freeze an application and reboot, to the point that you can even move the application to another machine running DragonFly with the same files at the same paths and thaw the application, continuing to run it as if you had just sent SIGSTOP/SIGCONT), and extremely good scheduling on SMP that scales to thousands of processors and threads way better than FreeBSD (unsure how it compares to Linux).

    Minix of course is obvious: It's a fully fault-tolerant self-healing microkernel.

    I would enjoy seeing the Linux extensions (events, iptables, filesystems, etc.) rolled into Minix so that udev, dbus, systemd, and basically a straight Linux distro stack work without modification; and then seeing the DragonFlyBSD scheduling concepts and freeze/thaw checkpointing implemented in Minix, with the tools to call checkpointing ported from DragonFlyBSD. Then we could drop the whole thing straight into a Linux distribution like Ubuntu or Fedora and play with it in a VM, and it would work as expected. Obviously you'd need to port tons of drivers to get a usable desktop system; but to get "you can run Linux on this now", you just need to port some Linux-specific subsystems onto Minix and run it in VMware or KVM or VirtualBox. Straight direct comparisons can then be made.

  • by m.dillon (147925) on Tuesday November 26, 2013 @01:25PM (#45528079) Homepage

    This release removes almost all the remaining SMP contention from both critical paths and most common paths. The work continued past the release in the master branch (some additional things which were too complex to test in time for the release). For all intents and purposes the master branch no longer has any SMP contention for anything other than modifying filesystem operations (such as concurrent writing to the same file). And even those sorts of operations are mostly contention free due to the buffer cache and namecache layers.

    Generally speaking what this means is that for smaller 8-core systems what contention there was mostly disappeared one or two releases ago, but larger (e.g. 48-core) systems still had significant contention when many cores were heavily resource-shared. This release and the work now in the master branch basically removes the remaining contention on the larger multi-core systems, greatly improving their scaling and efficiency.

    A full bulk build on our 48-core opteron box took several days a year ago. Today it takes only 14.5 hours to build the almost 21000 packages in the FreeBSD ports tree. These weren't minor improvements.
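    To put those figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (rounded to 21000 packages, 14.5 hours, 48 cores); the variable names are illustrative, and the per-core figure assumes an even spread of work, which a real bulk build will not have.

```python
# Back-of-the-envelope throughput from the figures quoted in the comment.
PACKAGES = 21_000     # "almost 21000 packages", rounded up
BUILD_HOURS = 14.5    # full bulk build time quoted for the release
CORES = 48            # the 48-core Opteron box

packages_per_hour = PACKAGES / BUILD_HOURS
# Assumes work is spread evenly over all cores (a simplification).
packages_per_core_hour = packages_per_hour / CORES
seconds_per_package = BUILD_HOURS * 3600 / PACKAGES

print(f"{packages_per_hour:.0f} packages/hour overall")
print(f"{packages_per_core_hour:.1f} packages/hour per core")
print(f"{seconds_per_package:.2f} wall-clock seconds per package")
```

    That works out to roughly 1448 packages per hour, or one finished package about every 2.5 seconds of wall-clock time.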

    Where it matters the most is with heavily shared resources: for example, when doing a bulk build on a large multi-core system which is constantly fork/exec'ing and running dozens of copies of the same process concurrently, such as /bin/sh, make, cc1[plus], and so on (a common scenario for any bulk building system), and when accessing heavily shared cached filesystem data (a very common scenario for web servers). Under these conditions there can be hundreds of thousands of path lookups per second and over a million VM faults per second. Even a single exclusive lock in these paths can destroy performance on systems with more than 8 cores. Both the simpler case where a program such as /bin/sh or cc1 is concurrently fork/exec'd thousands to tens of thousands of times per second and the more complex case where sub-shells are used for parallelism (fork without exec)... these cases no longer contend at all.

    Other paths also had to be cleaned up. Process forking requires significant pid-handling interactions to allocate PIDs concurrently, and execs pretty much require that locks be fine-grained all the way down to the page level (and then shared at the page level) to handle the concurrent VM faults. The process table, buffer cache, and several other major subsystems were rewritten to convert global tables into effectively per-cpu tables. One lock would get 'fixed' and reveal three others that still needed work. Eventually everything was fixed.
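    The general idea of replacing one global table (and its one global lock) with several independent ones can be sketched in user space. This is not DragonFly's kernel code: the class `ShardedTable`, the shard count, and the choice to pick a shard by hashing the key (rather than by the current CPU, as a kernel per-cpu table would) are all assumptions made for illustration.

```python
import threading


class ShardedTable:
    """Hypothetical table split into shards, each guarded by its own lock.

    Threads touching different shards never wait on each other, unlike a
    single global table protected by one exclusive lock.
    """

    def __init__(self, nshards=8):
        self._shards = [({}, threading.Lock()) for _ in range(nshards)]

    def _shard(self, key):
        # Shard selection by key hash; a kernel would index by CPU instead.
        return self._shards[hash(key) % len(self._shards)]

    def insert(self, key, value):
        table, lock = self._shard(key)
        with lock:
            table[key] = value

    def lookup(self, key):
        table, lock = self._shard(key)
        with lock:
            return table.get(key)

    def __len__(self):
        # Whole-table operations still have to visit every shard.
        total = 0
        for table, lock in self._shards:
            with lock:
                total += len(table)
        return total


def worker(table, base, count):
    """Insert `count` fake process entries starting at pid `base`."""
    for pid in range(base, base + count):
        table.insert(pid, f"proc-{pid}")


table = ShardedTable(nshards=8)
threads = [
    threading.Thread(target=worker, args=(table, i * 1000, 1000))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(table), table.lookup(2500))
```

    The trade-off shows up in `__len__`: operations that need a global view become more expensive, which is usually acceptable when the hot paths are per-entry lookups and inserts.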

    Similarly, network paths have been optimized to the point where a server configuration can process hundreds of thousands of TCP connections per second and we can get full utilization of 10GigE NICs.

    And filesystem paths have been optimized greatly as well, though we'll have to wait for HAMMER2 before modifying filesystem calls reap the real rewards.

    There are still a few network paths, primarily related to filtering (PF) that are serialized and need to be rewritten, but that and the next gen filesystem are the only big ticket items left in the entire system insofar as SMP goes.

    Well, those are the last problems, at least until we tackle the next big issue. There's still cache coherency bus traffic which occurs even when e.g. a shared lock is non-contended. The code-base is now at the point where we could probably drop in the new Intel transactional instructions and prefixes and get real gains (again, only applicable to multi-chip/multi-core solutions, not simple 8-thread systems). It should be possible to bump fork/exec and VM fault performance on shared resources from their current very high levels right on through the stratosphere and into deep space. Maybe I'll make a GSoC project out of it next year.

    The filesystem work on HAMMER2 (the filesystem successor to HAMMER1) continues to progress but it wasn't ready for even an early alpha
