Linux - Kernel: This forum is for all discussion relating to the Linux kernel.
Hi everyone! I've recently come across an intriguing aspect of interrupts: it appears that the kernel can be interrupted even when local_irq_disable() is called.
I'd like to share some code snippets to illustrate this:
Code:
/* kernel module */

static int device_state[100];

/* Dummy function to add a controllable latency. */
static void noop_loop(unsigned long num)
{
        int i, j;
        unsigned long num_c = num;

        for (j = 0; j < num; j++) {
                for (i = 0; i < (sizeof(device_state) / sizeof(device_state[0])); i++) {
                        device_state[i] = num_c;
                        num_c *= 3;
                }
        }
}

static long tw_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
        unsigned long long tw_enter, tw_leave;

        switch (cmd) {
        case TW_DEMO:
                local_irq_disable();
                tw_enter = rdtsc_ordered();
                noop_loop(arg);
                tw_leave = rdtsc_ordered();
                local_irq_enable();
                return tw_leave - tw_enter;
        }
        return -1;
}
/* user-space */

void demo_tw_ext(void)
{
        int ret, i;
        unsigned long counter = 0, cnt = 0;

        pin_task_to(0);

        puts("[*] normal time window");
        for (i = 1; i <= 10; i++) {
                printf("[*] tw=>%d\n", ioctl(fd, TW_DEMO, 1));
        }

        while (1) {
                counter++;
                ret = ioctl(fd, TW_DEMO, 1);
                if (ret > 20000) {
                        cnt++;
                        printf("[+] [%lu] tw=>%d, catch time window extension in %lu attempts\n",
                               cnt, ret, counter);
                        if (cnt >= 10)
                                break;
                }
        }
        puts("[*] time window measure demo done\n");
}
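(pin_task_to() is a small helper that is not shown above; it just pins the calling task to the given CPU. Roughly, it looks like the sketch below, assuming it wraps sched_setaffinity(); the real helper may differ.)
Code:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling task to one CPU so every ioctl() above runs on that core. */
static void pin_task_to(int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
                perror("sched_setaffinity");
}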
The resulting output is as follows:
Quote:
$ sudo ./interface
[*] normal time window
[*] tw=>1266
[*] tw=>770
[*] tw=>782
[*] tw=>778
[*] tw=>775
[*] tw=>783
[*] tw=>783
[*] tw=>778
[*] tw=>770
[*] tw=>762
[+] [1] tw=>39248, catch time window extension in 39305 attempts
[+] [2] tw=>75431, catch time window extension in 347607 attempts
[+] [3] tw=>37828, catch time window extension in 710853 attempts
[+] [4] tw=>52159, catch time window extension in 715386 attempts
[+] [5] tw=>61467, catch time window extension in 716299 attempts
[+] [6] tw=>37778, catch time window extension in 732457 attempts
[+] [7] tw=>49331, catch time window extension in 738861 attempts
[+] [8] tw=>67860, catch time window extension in 750552 attempts
[+] [9] tw=>38113, catch time window extension in 786872 attempts
[+] [10] tw=>57824, catch time window extension in 789542 attempts
[*] time window measure demo done
Based on my understanding, the kernel should not be interrupted while executing noop_loop() because local IRQs are disabled. Consequently, the time window measured by the two rdtsc_ordered() calls should remain stable.
This behavior can be reproduced on both virtual machines (with hardware-assisted virtualization) and physical machines (both AMD and Intel CPUs with the constant_tsc flag). I've also attempted to use the event tracing framework, but the trace log did not indicate any activity within the time window.
If anyone has insights into how the CPU can be interrupted with IRQ disabled or suggestions on how to debug this further, I would greatly appreciate it!
You are attributing the variations in time to an unexpected IRQ. But the variations might also come from variations in how long noop_loop itself takes to run. One thing that could cause the noop_loop times to vary is contention for memory access between the CPUs in a multicore system.
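If you want to test that idea, a rough and untested sketch would be to run something like the program below on a different core (for example with taskset -c 1) while your demo stays pinned to CPU 0, and see whether the normal window gets longer:
Code:
#include <stdlib.h>
#include <string.h>

/* Keep the memory subsystem busy from another core: repeatedly write a
 * buffer that is much larger than the last-level cache. */
int main(void)
{
        size_t sz = 64 * 1024 * 1024;
        char *buf = malloc(sz);

        if (!buf)
                return 1;
        for (;;)
                memset(buf, 0xa5, sz);
}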
Thank you for your reply!
Yes, the demo was tested on a multicore system. In the user program, I pinned the task to a specific CPU. Is it possible for contention to occur even under this setting? I'm curious if the contention for memory access between the CPUs in a multicore system could still affect the timing.
Additionally, it's interesting to note that the syscall itself only takes around 3,000 TSC. I'm trying to understand if contention could potentially extend the time window to around 40,000 TSC. Would you happen to have any insights on this?
Quote:
Yes, the demo was tested on a multicore system. In the user program, I pinned the task to a specific CPU. Is it possible for contention to occur even under this setting? I'm curious if the contention for memory access between the CPUs in a multicore system could still affect the timing.
Yes, but the timings that you posted below indicate that I am probably barking up the wrong tree.
Quote:
Additionally, it's interesting to note that the syscall itself only takes around 3,000 TSC. I'm trying to understand if contention could potentially extend the time window to around 40,000 TSC. Would you happen to have any insights on this?
Another thought is the CPU cache. The CPU caches instructions recently read from memory, and executing a cached instruction is faster than executing one fetched from memory. It is conceivable that the first time noop_loop runs, all of its instructions are fetched from memory, while in subsequent runs some or all of them are found in the CPU cache and executed without any memory fetches. The percentage of instructions found in the cache could vary depending on what happens between the time you enable interrupts and then disable them again for the next run of the loop.
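One way to check whether the long windows line up with cache misses is to wrap a single ioctl() in a perf hardware counter. The following is only a sketch and is untested; fd and TW_DEMO are the ones from your demo program, and you may need to run it as root (or lower /proc/sys/kernel/perf_event_paranoid) for the kernel-side misses to be counted:
Code:
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

/* fd and TW_DEMO come from the demo program above. */
extern int fd;

int tw_with_cache_misses(void)
{
        struct perf_event_attr attr;
        uint64_t misses = 0;
        int pfd, ret;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;
        attr.disabled = 1;

        /* Count for this thread only, on whatever CPU it runs on. */
        pfd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (pfd < 0)
                return -1;

        ioctl(pfd, PERF_EVENT_IOC_RESET, 0);
        ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0);
        ret = ioctl(fd, TW_DEMO, 1);
        ioctl(pfd, PERF_EVENT_IOC_DISABLE, 0);
        if (read(pfd, &misses, sizeof(misses)) != sizeof(misses))
                misses = 0;
        close(pfd);

        printf("tw=>%d, cache misses=>%llu\n", ret, (unsigned long long)misses);
        return ret;
}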
Thank you once again for your insights!
Regarding the CPU cache, I understand that a cache miss can potentially extend the time window, typically by a few hundred TSC (under my experiment setting). However, I'm curious whether cache misses could cause the CPU to experience delays of tens of thousands of TSC. Is such a significant impact plausible from cache misses alone?
The trace that you ran eliminates the possibility that the timing variations are due to extraneous interrupts. The two possibilities I have raised could be logical explanations for variations in how fast a section of code runs. But to answer your specific questions, I couldn't say with the available information. How long is the string of instructions executed in each pass of noop_loop? How large is the cache? What happens to the cache between executions of the loop? What is the cycle speed of the CPU?
I have run such tests on IBM mainframes but never on a microcomputer or PC. To do so I used specialized hardware attached to the RAM of the machine under test. The testing machine measured the start time when one specific RAM address was accessed and the stop time when another RAM address was accessed. But since I don't have such testing gear available, and I don't even know what model CPU or RAM you are using, I can only point you in the general direction of how to study the problem.