Linux memory management questions with regard to hardware

Julius Novachrono · 01-18-2020, 11:13 AM

Hi there!
In about 10 days or so I should have a systems interview and I was directed to this site for studying by the interviewer: https://www.tldp.org/LDP/tlk/tlk.html

In the memory management segment I have a few things I haven't managed to understand:

1. Where exactly do the virtual addresses of every process reside? In the kernel part of physical memory?

2. Is the swap partition a separate area on the HDD where Linux keeps pages that were in physical memory but were selected for eviction AND have been modified, so that the system has to store them somewhere before they are ultimately saved to the HDD?

3. What exactly is a multi-level page table? Does it mean that each process keeps only the few page tables it actually needs in physical memory and leaves the others out, which reduces the memory footprint?
For example, instead of one table with 1M entries, use a 1K-entry page table where every entry points to another page table with 1K entries, and so on.

4. Is the free_area vector like an array where the i-th cell holds the collection of all free blocks of size 2^i?

5. Is the general idea of demand paging to bring only the most relevant part (well, page) into physical memory, so that when the process touches another virtual address whose page is not in physical memory, a page fault occurs, Linux brings in that page, and the process retries the access to that virtual address?

6. The mm_struct contains information about an image that is being executed. It also contains pointers to vm_area_struct structures. What are they exactly? The text says that they describe the start and end of an area of virtual memory, but the way I understood it, the virtual memory of every process is pretty much the same, and it is the mapping that differs and allows each process to do its own thing.
Is the goal to divide the virtual address space of each process into areas in order to give each area different instructions or a different set of operations?

7. Caches: what exactly are they? The text lists the Buffer, Page, Swap and Hardware caches (the last of which is basically the TLB?).
What is the actual mechanism that describes how Linux uses them?
When pages get loaded into the page cache, can Linux then remove those pages from physical memory? Or is it just a way to save the translation from the page table?
What does it mean for a page to be present in a cache?
How does Linux treat the page accordingly?

8. The last part of this page (3.7 and on) explains that, in order to free memory, Linux will try:

(a) Reducing the size of the buffer and page caches
(b) Swapping out System V shared memory pages
(c) Swapping out and discarding pages

I get (c), but can you please elaborate on the other two?
Why does reducing the size of the buffer/page cache and swapping out System V shared pages (what are they anyway?) help to free memory?

I guess this is quite the quiz, but I hope someone can contribute their knowledge and help me prepare as best I can.

Thanks!

berndbausch · 01-19-2020, 04:53 AM

Quote:

Originally Posted by Julius Novachrono

Hi there!
In about 10 days or so I should have a systems interview and I was directed to this site for studying by the interviewer: https://www.tldp.org/LDP/tlk/tlk.html

This document is over 20 years old. The general points are probably still correct, but the more specific it gets, the likelier it is that the Linux kernel has since had a major reorganisation in that area. To learn about the Linux kernel, I suggest going to the documentation at kernel.org.

Your questions also reveal that you should learn how memory management and paging work in general. This is not Linux-specific; all operating systems that implement virtual memory use more or less the same principles. Get a text on operating system design and work through it.

Having said that, partial answers:

Quote:

2. Is the swap partition a separate area on the HDD where Linux keeps pages that were in physical memory but were selected for eviction AND have been modified, so that the system has to store them somewhere before they are ultimately saved to the HDD?

When the kernel needs physical memory and all physical pages are in use, it selects pages that it can replace. Pages that have not been modified since they were copied to memory, as well as pages that have not been touched for a while, are prime candidates. However, a page that was written to can't just be reused; its content must be saved somewhere first. For pages that are not backed by a file (heap and stack data, for example), that somewhere is swap space.
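To make the clean-versus-modified distinction concrete, here is a toy sketch in Python. It is not how the kernel is implemented: the class `ToyMemory`, its strict least-recently-used policy, and the `swap` set are assumptions made up for illustration (the real kernel only approximates LRU).

```python
from collections import OrderedDict


class ToyMemory:
    """Toy model: a fixed number of frames, LRU eviction, dirty pages go to swap."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = OrderedDict()  # page name -> dirty flag, least recently used first
        self.swap = set()            # pages whose contents were saved to swap
        self.swap_writes = 0

    def access(self, page, write=False):
        if page in self.frames:
            # Already resident: refresh its LRU position, remember any write.
            dirty = self.frames.pop(page) or write
            self.frames[page] = dirty
            return
        if len(self.frames) >= self.num_frames:
            self._evict()
        self.frames[page] = write

    def _evict(self):
        victim, dirty = self.frames.popitem(last=False)
        if dirty:
            # A modified page cannot simply be dropped: save it first.
            self.swap.add(victim)
            self.swap_writes += 1
        # A clean page is discarded without any write.


mem = ToyMemory(num_frames=2)
mem.access("A", write=True)  # A becomes dirty
mem.access("B")              # B stays clean
mem.access("C")              # evicts A, which must be written to swap
mem.access("D")              # evicts B, which is simply dropped
```

After these four accesses only page A has been written to swap; evicting the clean page B cost nothing.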

Quote:

5. Is the general idea of demand paging to bring only the most relevant part (well, page) into physical memory, so that when the process touches another virtual address whose page is not in physical memory, a page fault occurs, Linux brings in that page, and the process retries the access to that virtual address?

Yes, pretty much. I would simply say that demand paging means that only the pages a process currently requires are copied to or created in RAM. Techniques like read-ahead soften that statement somewhat.
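The fault-then-retry cycle can be sketched as a small simulation. Everything here is hypothetical (the names `ToyProcess`, `PageFault`, and the dict used as backing store); real page faults are raised by the MMU and handled inside the kernel, not with Python exceptions.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when a virtual page is not resident (stands in for the hardware fault)."""

    def __init__(self, vpn):
        super().__init__(f"page {vpn} not present")
        self.vpn = vpn


class ToyProcess:
    def __init__(self, backing_store):
        self.backing = backing_store  # virtual page number -> bytes "on disk"
        self.page_table = {}          # only resident pages appear here
        self.faults = 0

    def _translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise PageFault(vpn)
        return self.page_table[vpn], offset

    def _handle_fault(self, vpn):
        # Bring the missing page into "RAM" on demand.
        self.faults += 1
        self.page_table[vpn] = bytearray(self.backing[vpn])

    def read(self, vaddr):
        while True:
            try:
                page, offset = self._translate(vaddr)
                return page[offset]
            except PageFault as fault:
                self._handle_fault(fault.vpn)
                # Loop around: the same access is retried and now succeeds.


image = {n: bytes([n + 1]) * PAGE_SIZE for n in range(3)}
proc = ToyProcess(image)
first = proc.read(0)            # faults, loads page 0
again = proc.read(10)           # same page, no fault
second = proc.read(PAGE_SIZE)   # faults, loads page 1
```

Page 2 of the image is never touched, so it is never loaded. That is the whole point of demand paging.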

Quote:

7. Caches: what exactly are they? The text lists the Buffer, Page, Swap and Hardware caches (the last of which is basically the TLB?).

Generally speaking, a cache is fast storage that holds copies of data originating from some slower storage medium.

The file system buffer cache is in RAM and keeps copies of pages that originate from disks. The cache of a proxy server keeps web pages that originate somewhere far away on the internet. Disks themselves have onboard caches.

The term "hardware cache" is very non-specific; it might refer to the several levels of cache that CPUs tend to have (RAM is comparatively slow).

The TLB is a hardware structure that keeps the most recently used translations between virtual and physical addresses. As such it can be seen as a cache of the page table, but it's more than just a cache. I would say it's an example of content-addressable memory.
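As a rough illustration of the "cache of the page table" view, here is a toy model. The class name `ToyTLB`, the LRU replacement and the tiny capacity are assumptions for the sketch; real TLBs are hardware and their replacement policies vary by CPU.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class ToyTLB:
    """Small cache of virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # the full (slow) mapping: vpn -> pfn
        self.capacity = capacity
        self.entries = OrderedDict()  # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            self.misses += 1
            # Miss: walk the page table, then remember the translation.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = ToyTLB({0: 7, 1: 3, 2: 9}, capacity=2)
a = tlb.translate(0x10)               # miss, page table consulted
b = tlb.translate(0x20)               # hit, same page
c = tlb.translate(PAGE_SIZE)          # miss
d = tlb.translate(2 * PAGE_SIZE + 5)  # miss, evicts the entry for page 0
```

Only the translations are cached here, never the page contents, which is what distinguishes the TLB from the page cache discussed below.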

Quote:

What is the actual mechanism that describes how Linux uses them?

Since there are many types of caches, there are many mechanisms.

Quote:

When pages get loaded into the page cache, can Linux then remove those pages from physical memory? Or is it just a way to save the translation from the page table?
What does it mean for a page to be present in a cache?
How does Linux treat the page accordingly?

The document you refer to says that the page cache "is used to cache the logical contents of a file a page at a time". That is, the page cache is an area in RAM that holds file data. It has nothing to do with the page table. I think "page cache" and "buffer cache" are more or less synonymous nowadays.
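A minimal user-space sketch of "caching a file a page at a time", to show what it means for a page to be present in the cache. The class `ToyPageCache` and its `disk_reads` counter are invented for this example; the kernel's page cache is far more elaborate.

```python
PAGE_SIZE = 4096


class ToyPageCache:
    """Keeps page-sized chunks of files in memory, keyed by (path, page index)."""

    def __init__(self):
        self.pages = {}      # (path, page_index) -> bytes held in RAM
        self.disk_reads = 0  # how often we really had to go to the file

    def read_byte(self, path, offset):
        index, within = divmod(offset, PAGE_SIZE)
        key = (path, index)
        if key not in self.pages:
            # Page is not present in the cache: fetch the whole page from disk.
            with open(path, "rb") as f:
                f.seek(index * PAGE_SIZE)
                self.pages[key] = f.read(PAGE_SIZE)
            self.disk_reads += 1
        # Page is present: serve the byte straight from RAM.
        return self.pages[key][within]
```

Reading offsets 0 and 100 of the same file costs one disk read, because both fall into page 0; reading offset 5000 costs a second one for page 1.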

Quote:

8. The last part of this page (3.7 and on) explains that, in order to free memory, Linux will try:

(a) Reducing the size of the buffer and page caches
(b) Swapping out System V shared memory pages
(c) Swapping out and discarding pages

I get (c), but can you please elaborate on the other two?
Why does reducing the size of the buffer/page cache and swapping out System V shared pages (what are they anyway?) help to free memory?

When your application accesses a lot of files, the filesystem buffer cache will grow, as file blocks are cached in it. Releasing file blocks that are stored in the buffer cache makes the RAM they occupy available for other purposes.
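On a running Linux system you can see how much RAM these caches hold in /proc/meminfo, which reports `Buffers` and `Cached` among other fields. The sketch below parses text in that format; the helper names and the sample numbers are made up, and the sum is only a rough indication because not everything counted as `Cached` can be released.

```python
def parse_meminfo(text):
    """Turn '/proc/meminfo'-style lines such as 'Cached:  123 kB' into a dict."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])  # value, in kB for most fields
    return info


def cache_kb(info):
    """Rough amount of RAM currently used for file caching, in kB."""
    return info.get("Buffers", 0) + info.get("Cached", 0)


# Invented sample in the same format as /proc/meminfo.
SAMPLE = """MemTotal:       16315616 kB
MemFree:         1203456 kB
Buffers:          204800 kB
Cached:          6291456 kB
"""

info = parse_meminfo(SAMPLE)
total_cache = cache_kb(info)
```

To look at a real machine, pass the contents of /proc/meminfo to `parse_meminfo` instead of `SAMPLE`.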

Shared memory is virtual memory that can be shared between two or more processes (normally, processes don't share memory). It's called System V because it was designed for an ancient UNIX version named System V. Again, moving shared memory pages out of RAM increases the amount of RAM available for other purposes.
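To show what "shared between processes" means in practice, here is a small sketch for Linux/Unix. Note the assumption: it uses an anonymous shared mmap plus fork rather than the actual System V calls (shmget/shmat), because Python's standard library has no System V interface. The function name is made up.

```python
import mmap
import os
import struct


def share_value_with_child(value):
    """Child process writes `value` into shared memory; parent reads it back."""
    # Anonymous mapping; on Unix, Python maps it MAP_SHARED by default,
    # so parent and child see the same physical pages after fork().
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    struct.pack_into("q", buf, 0, 0)

    pid = os.fork()
    if pid == 0:
        # Child: write into the shared page and exit immediately.
        struct.pack_into("q", buf, 0, value)
        os._exit(0)

    os.waitpid(pid, 0)
    (seen,) = struct.unpack_from("q", buf, 0)
    buf.close()
    return seen
```

With ordinary (private) memory the parent would still read 0 after the child exits; here it reads whatever the child wrote.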