Definition: a system call is a controlled entry point into the kernel, i.e. a way by which user programs can execute functions that require greater privilege than they have.
A few words about privileges... In general, each CPU or microcontroller has a set of operating modes. Some of these modes concern security; without going too deep into details, the security-related mode the CPU is in determines:
- What memory areas can be read or written (depending on the memory map, some ranges of the address space might point to I/O devices; this is the case for memory-mapped I/O)
- What instructions can be executed (privileged machine instructions, for example those that halt the CPU or change its control registers, are reserved for the most privileged mode)
The current IA-32 architecture has 4 so-called privilege or protection rings. Ring0 is the level with the most privileges, ring1 and ring2 come next, and ring3 has the fewest. [Terminology: the ring the CPU is currently executing in is called the current privilege level (CPL), so sometimes we see the terms ring0 to ring3, sometimes CPL 0 to CPL 3.] Software runs in one of these 4 rings. The operating system manages the switching from one ring to another, and the CPU only allows such transitions through controlled mechanisms (interrupt and call gates, or instructions such as sysenter/sysexit). The original design intended rings 1 and 2 for device drivers and OS services, but mainstream operating systems such as Linux and Windows run both the kernel and device drivers in ring0, while user applications run in ring3. Ring3 code is denied access to certain resources (the memory map, for instance) whose misuse would impact the correct behavior of other applications. Since only kernel code runs in ring0, the kernel controls which application runs in which ring. An application that runs in a less privileged ring cannot force the CPU to switch to a more privileged ring, because it is not allowed to execute the instructions that change the CPU state. It can only enter a more privileged ring through entry points the kernel has set up, which is exactly what a system call is.
Now let's see what happens when a user application makes a system call. Each OS has an API that user applications can access. Among the provided functions are I/O functions, process management, IPC, etc. I will describe what happens when making a Linux system call on 32-bit x86 (IA-32):
- the application program makes a system call by calling a wrapper function in the C library (glibc, etc.), for instance open(). Higher-level functions such as fopen() end up calling such a wrapper.
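- the wrapper function must make all of the system call parameters available to the kernel. It receives the parameters on the stack (the user process' stack), but must copy them into registers in order to pass them to the kernel: the system call number goes in %eax and the arguments, in order, in %ebx, %ecx, %edx, %esi, %edi and %ebp.
- the wrapper then calls the __kernel_vsyscall function. This function lives in a page that the kernel maps into every process, and its address is not fixed: the kernel passes it to user processes through the AT_SYSINFO entry of the ELF auxiliary vector. On CPUs that support sysenter, __kernel_vsyscall saves %ecx (counter register), %edx (data register) and %ebp (base pointer) on the user stack, because sysexit overwrites %ecx and %edx when returning to user space. It then copies %esp (stack pointer) into %ebp, which lets the kernel locate and later restore the user stack, and executes sysenter. On CPUs without sysenter, it falls back to int $0x80.
- after the sysenter instruction executes, the processor starts execution at sysenter_entry, which is defined in /usr/src/linux/arch/i386/kernel/entry.S (in pre-2.6.24 kernels, before i386 and x86_64 were merged into arch/x86).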
Of course, user programs can also make a system call directly, without going through a wrapper, but wrapper functions provide a more user-friendly way of making these calls.