Friday, 12 May 2017

linux-4.11-ck1, MuQSS version 0.155 for linux-4.11

Announcing a new -ck release, 4.11-ck1, with the latest version of the Multiple Queue Skiplist Scheduler, version 0.155. These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but configurable for any workload.

linux-4.11-ck1

-ck1 patches:
http://ck.kolivas.org/patches/4.0/4.11/4.11-ck1/

Git tree:
https://github.com/ckolivas/linux/tree/4.11-ck


MuQSS

Download:
4.11-sched-MuQSS_155.patch

Git tree:
4.11-muqss


MuQSS 0.155 updates

Fixed syscall convention in the code for sched_setscheduler.
Fixed copy to user error with max CPUs enabled.
Changed skip lists to a 25% probability of an increased level (down from 50%), so each CPU's runqueue now scales up to loads of 64k.
Minor preemption code tweak.
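
The 64k figure in the skip list change is simple arithmetic. As a rough sketch, a skip list whose nodes are promoted with probability p stays efficient up to about (1/p)^levels entries. The level count of 8 used here is an assumption: it is not stated in these notes, but it reproduces both the 256 and the 65,536 figures quoted in the comments below. The function names are illustrative only and are not taken from the MuQSS source.

```python
import random

MAX_LEVELS = 8  # assumption: not stated in the release notes


def expected_capacity(promote_probability, levels=MAX_LEVELS):
    """Approximate number of entries before the top level gets crowded."""
    return round((1 / promote_probability) ** levels)


def random_level(promote_probability, levels=MAX_LEVELS, rng=random):
    """Pick a node height: keep promoting while the coin flip succeeds."""
    level = 1
    while level < levels and rng.random() < promote_probability:
        level += 1
    return level


if __name__ == "__main__":
    print(expected_capacity(0.5))   # old setting, 50% promotion
    print(expected_capacity(0.25))  # new setting, 25% promotion
```

With 50% promotion the estimate is 256 entries; with 25% it is 65,536, at the cost of slightly longer walks along each level.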

4.11-ck1 updates

Cleaned up the patchset.

This was the most massive resync I think I can ever recall with the mainline kernel thanks to the complete restructuring of the scheduler code by Mingo. Fortunately there weren't that many actual changes to the scheduler itself that I needed to port but it was still a protracted effort. Probably the most amusing part of the resync was seeing my name mysteriously disappear from the credits of sched/core.c from mainline, although everyone's names were removed along with it. Troll begone?

Given the size of this merge, it is possible there are build configurations that fail, so bear with me and post your config if that's the case.

For those of you stuck with the evil nvidia driver you will find some difficulty getting it to build now thanks to licensing issues. I believe it should work if you use the disable drm option.

Enjoy!
Please enjoy
-ck

EDIT: I should add that even though the patches say "make BFQ default", -ck1 does NOT automatically set BFQ as your default I/O scheduler, so you'll have to either choose that configuration option when compiling your kernel or set your I/O scheduler at runtime.
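
For the runtime route, the kernel exposes the scheduler choice per block device in /sys/block/<device>/queue/scheduler, listing the available schedulers with the active one in square brackets. The following is a minimal sketch of reading and changing it; the device name "sda" and the helper names are placeholders, and whether "bfq" appears in the list depends on how your kernel was configured.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def scheduler_file(device, sysfs_root="/sys"):
    """Path of the per-device scheduler selector."""
    return Path(sysfs_root) / "block" / device / "queue" / "scheduler"


def set_scheduler(device, name, sysfs_root="/sys"):
    """Select an I/O scheduler at runtime (needs root on a real system)."""
    path = scheduler_file(device, sysfs_root)
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not offered for {device}: {available}")
    path.write_text(name)


if __name__ == "__main__":
    path = scheduler_file("sda")  # placeholder device name
    if path.exists():
        print(parse_schedulers(path.read_text()))
```

The change made this way lasts until reboot, so it would need to be reapplied from a boot script or a udev rule.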

58 comments:

  1. Thanks so much.

  2. A multiple queue skiplist scheduler has to go with a multiple queue io block queuing mechanism. It just sounds too good not to.

    Replies
    1. This comment has been removed by the author.

    2. Selected patches from 4.11-ck1 plus bfq-mq are just phenomenally fast. However, when very close to idle load, it will hard lock up on a context switch or a single write. On moderate to crazy loads, it works perfectly fine. Not much attention is needed on this given how new the bfq-mq framework is.

      I will switch to the full 4.11-ck1 patchset to see whether those issues persist and are specific to the kernel.

    3. Pretty stable with up to 3d uptime. I've been finding SCHED_IDLEPRIO to be a lot more useful in removing jitter and stutters from tmux and clamav than SCHED_BATCH nice 19 on CFS.

      thestinger/linux-hardened patches also build nicely with some rebase edits to kernel/sysctl.

  3. @ck It seems that v4.11.1 should include a fix. Temporarily one could use this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d557d1b58b3546bab2c5bc2d624c5709840e6b10

    > Changed skip lists to 25% probability of increased level (from 50%) scaling each CPU up to loads of 64k as a result.

    Does this have any effect on the general performance of MuQSS?

    Replies
    1. No effect on performance.

    2. Wouldn't it scale up to 65,536 tasks per runqueue?

    3. Yes it would. Have you ever seen loads more than 256 per CPU? If so, then you'll notice a difference now...

  4. @ck I've noticed something strange with muqss. For example when I run my windows and ubuntu vm with the default cfs scheduler, htop shows cpu usage in the range of 2-5% when the vms are idle. But with MuQSS it jumps from 2% to 20% and back to 10% and continues. Do you have any idea how I can reduce the cpu usage here?

    Replies
    1. I am currently using qemu/kvm for virtualization:

      cfs + windows + idle: 2-5%
      cfs + windows + firefox: 4-6%

      muqss + windows + idle: 10-30%
      muqss + windows + firefox: 30-40%

      Do you have any idea what is going on here? What can I try to reduce the cpu load?

    2. Is the CPU frequency the same in both cases? Is this using 'performance' governor for max frequency?

    3. Under muqss cpu accounting is more precise which is why it shows load a bit higher.
      Efficiency-wise I would check either temps or battery usage to determine whether there is a difference between muqss and cfs. Maybe there are better ideas, but these are the two I typically use to determine the usefulness of a kernel on my laptop :)

    4. cpupower frequency-info shows:

      driver: intel_pstate
      hardware limits: 800 MHz - 4.40 GHz
      available cpufreq governors: performance powersave
      current policy: frequency should be within 800 MHz and 4.40 GHz. The governor "powersave" ...

      cfs: the current CPU frequency stays in the range of +/- 900MHz (mostly constant while running the vm)

      muqss: the current CPU frequency stays in the range of +/- 800MHz (mostly constant while running the vm)

      I am not sure if this is relevant, but it still worries me somehow. Under cfs it shows 2-4% cpu when the vm is idle, which is actually reasonable; the vm is literally idle. But MuQSS shows constant changes from 24% to 50% cpu? I don't see how this is "more precise"?

    5. Also, I've noticed the following: When an application uses like >= 20% cpu, the qemu process has the same cpu usage behavior as in cfs, it runs in the range between 3% and 6%.

    6. It's almost certainly just a sampling error; something to do with CPU reporting has changed yet again in mainline kernel and affects the value reported by MuQSS. It can't physically make it use more CPU.
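
To illustrate how a reporting change can move the displayed percentage without any extra work being done, here is a hedged sketch of the usual way tools derive CPU usage from two samples of the aggregate "cpu" line in /proc/stat. It does not reproduce what htop or MuQSS do internally; the sample numbers are invented for the illustration.

```python
def parse_cpu_line(line):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def busy_percent(before, after):
    """Busy share between two samples; idle and iowait count as not busy."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])        # user, nice, system, idle, iowait, irq, softirq, steal
    idle = deltas[3] + deltas[4]   # idle + iowait
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total


if __name__ == "__main__":
    first = parse_cpu_line("cpu 100 0 50 800 50 0 0 0 0 0")
    correct = parse_cpu_line("cpu 110 0 55 980 55 0 0 0 0 0")
    # Same interval, but 40 idle ticks are booked as system time instead.
    skewed = parse_cpu_line("cpu 110 0 95 940 55 0 0 0 0 0")
    print(busy_percent(first, correct))
    print(busy_percent(first, skewed))
```

In this made-up interval the first pair reports 7.5% busy and the second 27.5%, although the total number of ticks is identical; only the bucket the ticks were booked into differs.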

    7. I saw this a lot on intel_pstate with Cadence and VMware, but had lower cpu % when I tested with acpi_cpufreq and intel_cpufreq schedutil. There was a post about it in the liquorix forums that suggests it might be an accounting artifact.

      http://techpatterns.com/forums/about2586.html

    8. @monotykamary, ck: According to the discussion, the OP noticed this behavior when he upgraded to 4.10. I can't remember if this also happened with 4.9, but I can certainly say that it happened with 4.10. So, if there has been a change, it must have been included in 4.10.

    9. I tried using acpi_cpufreq (schedutil) and it seems better than before, but I am still seeing these constant cpu spikes (4% -> 7% -> 14% -> 28% -> 4%), which is still better than with intel_pstate. I am going to try ondemand next.

    10. You are trying to change the cosmetic reporting of CPU, which won't affect the function at all. Choose the best governor based on performance and power usage, not on cosmetic reporting of CPU usage.

    11. Would MuQSS be affected by the various time accounting options?

      There's:
      TICK_CPU_ACCOUNTING - "Simple tick based cputime accounting" (Liquorix default)
      VIRT_CPU_ACCOUNTING_GEN - "Full dynticks CPU time accounting"
      IRQ_TIME_ACCOUNTING - "Fine granularity task level IRQ time accounting"

      Or maybe the tick rate? I know you recommend 100hz with MuQSS, but I've seen in the comments that there's some weird software interaction where systems boot slower or bad software runs slower / differently (firefox?). Liquorix has all the timeout patches merged as of recently, so maybe that's worth testing.

      But otherwise, I can confirm that the load doesn't seem to make sense. My desktop has 16 cores, but after boot, with nearly 0% cpu usage, the load sits at a nice even 1.00. It's almost like the load is offset, rather than calculated differently.

      A good test that I haven't tried is simply turning off all cores except one. If the load really is 1.00 or higher, the individual core should be completely maxed out at 100%.

    12. None of those accounting options should make a difference as muqss uses its own idea for low level accounting. There's clearly an accounting bug to do with attributing idle time that has happened recently but it doesn't mean the CPU is using any more energy; it's purely cosmetic.

      As for the slowdown at 100Hz, that has been fixed with the remainder of the patches in -ck, so only -ck has 100Hz + the other timer patches. In fact it should actually be faster in the scenarios where previously 100Hz was slower due to 2000Hz timer resolution effectively.

    13. @ck Is it possible for you to find this cpu time accounting bug? Well, just if you're not that busy :)
      This is somewhat annoying with muqss because my system monitoring script keeps signaling that something is using a lot of cpu. I am currently on cfs, but muqss seems somehow faster and snappier than cfs.

    14. @ck this bug is affecting me as well. In my case, the excess CPU usage (cosmetic or not) is pushing my CPU into higher power states. It is using more power at idle. I am using cpufreq. Thanks for all your work!

  5. Is the CPU frequency the same in both cases? Is this using 'performance' governor for max frequency?

  6. I get an error when compiling: https://bpaste.net/show/53174289cfc9

  7. @CK
    I'm not sure if "writeback throttling by default" is a good idea. At first sight many people found it useful; the Ubuntu kernel team even combined it with CFQ by default. On the other hand, Paolo Valente says it's "stealing control from BFQ" & "Luca is preparing a patch that automatically disables wbt when bfq is selected as an I/O scheduler, as it already happens with cfq." (he refers to patch 9661671 (cfq: Disable writeback throttling by default)).
    So maybe it would be better not to enable it for now. Besides, unlike other risky features (like threaded-IRQs, yield_type) we can't even disable it (please correct me if I'm wrong).

    Replies
    1. https://github.com/zen-kernel/zen-kernel/commit/12bba378b6191d8c4084b152dee526613caf8400

      damentz had different results with hierarchical scheduling with BFQ + writeback throttling when he increased BLKDEV_MAX_RQ from 128 to 512.

  8. For me "0002-Make-preemptible-kernel-default.patch" didn't apply fully. It changed description of "PREEMPT_VOLUNTARY" but didn't set "PREEMPT". Used "Debian 4.11-1~exp2" as a base config. However "0013-Reinstate-default-Hz-of-100" changed default HZ.

    Replies
    1. Also patch 0003 is duplicated at patch 0004

    2. Patch 0004 reverts changes made in patch 0003 for reasons unknown.

  9. UP i686 fails:
    ============
    In file included from ./include/linux/rculist.h:10:0,
    from ./include/linux/pid.h:4,
    from ./include/linux/sched.h:13,
    from kernel/sched/MuQSS.c:33:
    kernel/sched/MuQSS.h: In function ‘highest_flag_domain’:
    kernel/sched/MuQSS.h:347:60: error: ‘struct rq’ has no member named ‘sd’; did you mean ‘sl’?
    for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
    ============
    The '-' in '->sd' is highlighted.

    Replies
    1. This comment has been removed by the author.

    2. I have a similar issue: https://bpaste.net/show/53174289cfc9

  10. As some others have reported, I also seem to be noticing CPU spikes with MuQSS. This seems to go back at least to the 4.10 or even 4.9 releases. The most notable effect is that the temperature monitor on my laptop shows about a 15 degree Celsius higher temperature at idle with MuQSS (entire CK patchset) compared to CFS with an otherwise identical kernel version and kernel config. I have not tested with MuQSS alone but only the CK patchset.

    It does seem to be a little better with 4.11. With 4.10 and CK, my laptop would stay at around 90 degrees Celsius at idle. With 4.10 and CFS, it would stay around 68-70, but now that I have recompiled 4.11 with the CK patches, it is hovering at about 83 degrees.

    Replies
    1. I hope you've done your tests at the same ambient temperature, each time you've tested. You know, fake news and alternative facts are likely to go into the Trump's Dungeon. ;-)

      BR, Manuel Krause

    2. I keep my house at about 72 degrees Fahrenheit during the day and 69 at night, so the ambient temperature in my house is rather stable. At this moment, on a CK 4.11 kernel, my laptop is running at 94 degrees Celsius even though Chrome just has an open Gmail and Facebook tab in the background, and this page in the foreground. KDE's system monitor shows no significant CPU usage from any app. It appears that MuQSS significantly increases the amount of activity in kernel space.

    3. I would try to post "perf top" and i7z results
      If there is really more time spent in the kernel you should see it there.
      i7z shows which C-states your CPU is in; powertop also works for that.

    4. General recommendation: use intel_pstate or ondemand with acpi_cpufreq, don't use schedutil. If ondemand is still bad for you - lower top limit or maybe try conservative.

    5. i7z wasn't very helpful because this laptop has a Centrino Duo CPU that has 1 GHz, 1.3 GHz, and 1.6 GHz speeds. I ran perf top, and the two main kernel functions using CPU were read_hpet and acpi_idle_do_entry, each using a little under 5% CPU at idle with no other applications running. There were several other kernel functions using about 0.5% CPU. The banner at the top said that the kernel was using about 20% CPU total at idle. I'd have to recompile my kernel with my previous non-MuQSS config to see what the stats are there. This laptop is prone to overheating when it remains active or if the CPU clock is at 1.3 or 1.6 GHz. I use Intel's thermald and laptop-mode tools to help manage the CPU speed and temperature, so it normally stays at 1 GHz at idle. But checking /proc/cpuinfo shows that, when I have my kernel compiled with the CK patchset, my CPU stays at 1.6 GHz even at idle. So something is obviously keeping the CPU busy enough to prevent it from throttling down to 1 GHz.

    6. hpet has high overhead. Is tsc not working on your laptop?

    7. This comment has been removed by the author.

    8. [ 0.000000] tsc: Fast TSC calibration using PIT
      [ 0.001000] tsc: Detected 1662.554 MHz processor
      [ 0.102000] TSC synchronization [CPU#0 -> CPU#1]:
      [ 0.102000] Measured 3782661680 cycles TSC warp between CPUs, turning off TSC clock.
      [ 0.102000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      [ 1.028861] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x17f6f95e426, max_idle_ns: 440795238383 ns

      Apparently it is not working on my Dell Latitude D620.

    9. Try with "clocksource=tsc" kernel parameter.

      Or as root:
      echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

      ^ can put that in rc.local to run it on every boot if it is working.
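
Before forcing the switch it may help to check what the kernel currently offers. A minimal sketch, assuming the standard sysfs files current_clocksource and available_clocksource in the directory used above; the helper names are made up for this example.

```python
from pathlib import Path

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def tsc_status(current, available):
    """Summarise whether switching to tsc is worth trying."""
    if current == "tsc":
        return "tsc is already the active clocksource"
    if "tsc" in available:
        return f"tsc is available but {current} is active; switching may help"
    return f"tsc is not offered by the kernel; staying on {current}"


if __name__ == "__main__":
    if Path(CLOCKSOURCE_DIR).is_dir():
        print(tsc_status(*read_clocksources()))
```

If tsc is missing from the available list, as may be the case once the kernel has marked it unstable, the echo above will likely be rejected and a boot parameter is the remaining thing to try.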

    10. ^ "tsc=reliable" is another interesting kernel parameter.

  11. https://docs.google.com/spreadsheets/d/14nHLMeJXOqxj-mlMk_vdb7yLhLOPRdNS4JkYMU0ArHI/edit?usp=sharing

    Default run interbench results, currently with zen against ck1 (no-wbt). No deadlines were missed on ck1 (no-wbt). I will add vanilla ck1 along with a few more kernels later.

    Annoyingly enough, linux-zen and stock arch like to kill make jobs down to 1 when compiling linux-ck, forcing me to rerun it.

    ---

    schedtool -R -p 5 -e wine *
    (schedtoold) wineserver -R -p 1 -n 19

    I found this is the most stable setup so far for wine that also survives with make -j4 at any policy. wine priority can still be changed with no issues so long as it does not equal that of wineserver's.

    - monotykamary

    Replies
    1. added ck1, bfq, and bfq-mq

    2. Nice. Thanks very much for doing those. They look consistent with expectations :)

    3. Moar green for MUQSS :)

  12. What happened to the Ubuntu kernel downloads integrating MuQSS? Are you no longer offering those?

    Replies
    1. Very time poor at the moment and decided to prioritise and not do them. Are they still desired?

    2. There is some interest in them, no doubt about that. At the moment, Liquorix is the primary source to acquire a MuQSS enabled kernel outside of compiling the thing yourself. Which comes with its own set of issues.

      And Liquorix took its sweet damned time to update to 4.11. And 4.12 is going to be a monster release so for a long while MuQSS would probably remain unavailable to those of us that do not build their own kernels. Once 4.12 and a 4.12 version of MuQSS land, that is.

    3. Okay, uploaded them.

    4. Thanks very much.

  13. Hey Con, just wondering if you saw my UP failure post above? No hurry, I know you're busy. :-)

    Replies
    1. Meaning, you want to see the kernel config?
      ===============
      [jwh7@pc linux-ck]$ cat config.last |grep ^CONFIG|grep -ie hz -e threading -e accounting -e smt -e smp -e _up_ -e x64
      CONFIG_BROKEN_ON_SMP=y
      CONFIG_IRQ_FORCED_THREADING=y
      CONFIG_FORCE_IRQ_THREADING=y
      CONFIG_NO_HZ_COMMON=y
      CONFIG_NO_HZ_IDLE=y
      CONFIG_TICK_CPU_ACCOUNTING=y
      CONFIG_IRQ_TIME_ACCOUNTING=y
      CONFIG_TASK_IO_ACCOUNTING=y
      CONFIG_GENERIC_SMP_IDLE_THREAD=y
      CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
      CONFIG_PARAVIRT_TIME_ACCOUNTING=y
      CONFIG_UP_LATE_INIT=y
      CONFIG_X86_UP_APIC=y
      CONFIG_X86_UP_IOAPIC=y
      CONFIG_HZ_100_MUQSS=y
      CONFIG_HZ=100

  14. I have been using the "00xy-Swap-sucks.patch" from your -ck patchsets for a while now, also separately applied on top of VRQ from Alfred Chen, with 4.10 and now 4.11 kernels.
    My setup has 14G swap on a second internal HDD, 8G RAM and a 3G /dev/shm "ramdisk", and I very much want fast resuming from suspend-to-disk (hibernate).
    With the patch applied, the desktop takes a long time to become responsive again after a S2DISK. The main verified reason for it is my "tab-overloaded" Firefox.
    Without this patch, interactivity recovers much faster after a S2DISK.
    I had done a lot of swappiness testing with my 8G RAM just before you published "00xy-Swap-sucks.patch" and absolutely agree with *vm_swappiness = 33*, but the other change, *#define vm_swap_full() 1*, isn't a good choice for a usage pattern like mine.

    Best regards,
    ManuelKrause
