.. SPDX-License-Identifier: GPL-2.0

pstore block oops/panic logger
==============================

Introduction
------------

pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
block device or a non-block device (such as an MTD device) before the system
crashes. You can retrieve these log files by mounting the pstore filesystem::

    mount -t pstore pstore /sys/fs/pstore


pstore block concepts
---------------------

pstore/blk splits its configuration into two parts: configurations for the
user and configurations for the driver.

Configurations for the user determine how pstore/blk works, such as pmsg_size,
kmsg_size and so on. All of them support both Kconfig and module parameters,
but module parameters take priority over Kconfig.

Configurations for the driver are all about the block device or non-block
device, such as the total_size of the block device and its read/write
operations.

Configurations for user
-----------------------

All of these configurations support both Kconfig and module parameters, but
module parameters take priority over Kconfig.

Here is an example of module parameters::

    pstore_blk.blkdev=/dev/mmcblk0p7 pstore_blk.kmsg_size=64 pstore_blk.best_effort=y

Each configuration is described in detail below.

blkdev
~~~~~~

The block device to use. Most of the time, it is a partition of a block
device. It is required for pstore/blk. It is also used for MTD devices.

When pstore/blk is built as a module, "blkdev" accepts the following variants:

1. /dev/<disk_name> represents the device number of the disk
#. /dev/<disk_name><decimal> represents the device number of the partition:
   the device number of the disk plus the partition number
#. /dev/<disk_name>p<decimal> is the same as the above; this form is used when
   the disk name of the partitioned disk ends with a digit.

When pstore/blk is built into the kernel, "blkdev" accepts the following variants:

#. <hex_major><hex_minor> device number in hexadecimal representation,
   with no leading 0x, for example b302.
#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
   a partition if the partition table provides it. The UUID may be either an
   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
   where SSSSSSSS is a zero-filled hex representation of the 32-bit
   "NT disk signature", and PP is a zero-filled hex representation of the
   1-based partition number.
#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
   partition with a known unique id.
#. <major>:<minor> major and minor number of the device separated by a colon.
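
When the device is given in the hexadecimal form, the value can be derived
from the major and minor numbers. The helper below is only a sketch:
``dev_to_hex`` is a made-up name, not part of pstore/blk, and it assumes a
minor number below 256 so that the minor fits in two hex digits::

    # Hypothetical helper: print <hex_major><hex_minor> for <major>:<minor>.
    # Assumption of this sketch: minor < 256 (two hex digits).
    dev_to_hex() {
        major=${1%%:*}
        minor=${1##*:}
        if [ "$minor" -ge 256 ]; then
            echo "minor $minor is too large for this sketch" >&2
            return 1
        fi
        printf '%x%02x\n' "$major" "$minor"
    }

    dev_to_hex 179:2    # prints b302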

It accepts the following variants for MTD device:

1. <device name> MTD device name. "pstore" is recommended.
#. <device number> MTD device number.

kmsg_size
~~~~~~~~~

The chunk size in KB for the oops/panic front-end. It **MUST** be a multiple
of 4. It is optional if you do not care about the oops/panic log.

There are multiple chunks for the oops/panic front-end; their number depends
on the space remaining after the other pstore front-ends are accounted for.

pstore/blk logs to the oops/panic chunks one by one, and always overwrites the
oldest chunk if there is no free chunk left.
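
As a rough illustration of how the chunk count follows from the sizes, the
sketch below subtracts the other front-ends from the device size and divides
the remainder by ``kmsg_size``. The function name ``kmsg_chunks`` and the
example sizes are made up, and the calculation ignores any rounding or
bookkeeping that pstore/blk may apply internally::

    # Hypothetical helper: estimate the number of oops/panic chunks.
    # Arguments, all in KB: total kmsg_size pmsg_size console_size ftrace_size
    kmsg_chunks() {
        total=$1
        kmsg=$2
        pmsg=$3
        console=$4
        ftrace=$5
        # Every front-end size must be a multiple of 4.
        for size in "$kmsg" "$pmsg" "$console" "$ftrace"; do
            if [ $((size % 4)) -ne 0 ]; then
                echo "size $size is not a multiple of 4" >&2
                return 1
            fi
        done
        # A kmsg_size of 0 means the oops/panic front-end is not used.
        if [ "$kmsg" -eq 0 ]; then
            echo 0
            return 0
        fi
        echo $(( (total - pmsg - console - ftrace) / kmsg ))
    }

    kmsg_chunks 1024 64 64 64 64    # prints 13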

pmsg_size
~~~~~~~~~

The chunk size in KB for the pmsg front-end. It **MUST** be a multiple of 4.
It is optional if you do not care about the pmsg log.

Unlike the oops/panic front-end, there is only one chunk for the pmsg
front-end.

Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
appended to the chunk. On reboot the contents are available in
*/sys/fs/pstore/pmsg-pstore-blk-0*.
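
A user space script could wrap the write in a small helper such as the one
below. This is only a sketch: ``pmsg_log`` and the ``PMSG_DEV`` override are
made-up names for illustration; only */dev/pmsg0* comes from pstore itself::

    # Hypothetical helper: append one line to the pmsg chunk.
    # PMSG_DEV is an assumption of this sketch; it defaults to /dev/pmsg0.
    pmsg_log() {
        printf '%s\n' "$*" >> "${PMSG_DEV:-/dev/pmsg0}"
    }

Calling ``pmsg_log "update started"`` before a crash would leave that line in
*/sys/fs/pstore/pmsg-pstore-blk-0* after the next boot.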

console_size
~~~~~~~~~~~~

The chunk size in KB for the console front-end. It **MUST** be a multiple of 4.
It is optional if you do not care about the console log.

Similar to the pmsg front-end, there is only one chunk for the console
front-end.

All console logs are appended to the chunk. On reboot the contents are
available in */sys/fs/pstore/console-pstore-blk-0*.

ftrace_size
~~~~~~~~~~~

The chunk size in KB for the ftrace front-end. It **MUST** be a multiple of 4.
It is optional if you do not care about the ftrace log.

Similar to the oops front-end, there are multiple chunks for the ftrace
front-end; their number depends on the number of CPUs. Each chunk size is
equal to ftrace_size / processors_count.

All ftrace logs are appended to these chunks. On reboot the contents are
combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.
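
The per-CPU chunk size can be worked out with simple integer arithmetic. The
helper below is a sketch only; ``ftrace_chunk_kb`` is a made-up name, and
falling back to ``nproc`` for the processor count is an assumption of this
example::

    # Hypothetical helper: per-CPU ftrace chunk size in KB.
    # Arguments: ftrace_size in KB, CPU count (defaults to the nproc output).
    ftrace_chunk_kb() {
        cpus=${2:-$(nproc)}
        echo $(( $1 / cpus ))
    }

    ftrace_chunk_kb 64 4    # prints 16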

Persistent function tracing might be useful for debugging software or hardware
related hangs. Here is an example of usage::

    # mount -t pstore pstore /sys/fs/pstore
    # mount -t debugfs debugfs /sys/kernel/debug/
    # echo 1 > /sys/kernel/debug/pstore/record_ftrace
    # reboot -f
    [...]
    # mount -t pstore pstore /sys/fs/pstore
    # tail /sys/fs/pstore/ftrace-pstore-blk-0
    CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
    CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
    CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
    CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
    CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
    CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
    CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
    CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358

max_reason
~~~~~~~~~~

Which kinds of kmsg dumps are stored can be limited via the ``max_reason``
value, as defined in include/linux/kmsg_dump.h's ``enum kmsg_dump_reason``.
For example, to store both Oopses and Panics, ``max_reason`` should be set to
2 (KMSG_DUMP_OOPS); to store only Panics, ``max_reason`` should be set to 1
(KMSG_DUMP_PANIC). Setting this to 0 (KMSG_DUMP_UNDEF) means the reason
filtering will be controlled by the ``printk.always_kmsg_dump`` boot param:
if unset, it'll be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.
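
The filtering rule amounts to a numeric comparison: a dump is kept when its
reason does not exceed ``max_reason``. The sketch below models that for a
non-zero ``max_reason`` only; ``reason_is_stored`` is a made-up name used for
illustration::

    # Hypothetical helper: would a dump with this reason be stored?
    # Values follow the examples above: 1 = KMSG_DUMP_PANIC, 2 = KMSG_DUMP_OOPS.
    # Assumes max_reason is non-zero; 0 defers to printk.always_kmsg_dump.
    reason_is_stored() {
        reason=$1
        max_reason=$2
        [ "$reason" -ge 1 ] && [ "$reason" -le "$max_reason" ]
    }

    reason_is_stored 2 1 || echo "oops is dropped when max_reason=1"
    reason_is_stored 1 2 && echo "panic is stored when max_reason=2"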

Configurations for driver
-------------------------

A device driver uses ``register_pstore_device`` with
``struct pstore_device_info`` to register to pstore/blk.

.. kernel-doc:: fs/pstore/blk.c
   :export:

Compression and header
----------------------

A block device is large enough for uncompressed oops data. In fact, we do not
recommend data compression because pstore/blk inserts some information into
the first line of the oops/panic data. For example::

    Panic: Total 16 times

It means that this is the 16th oops or panic since the first boot. The number
of oopses and panics since the first boot is sometimes important for judging
whether the system is stable.

The following line is inserted by the pstore filesystem. For example::

    Oops#2 Part1

It means that this is the 2nd oops on the last boot.

Reading the data
----------------

The dump data can be read from the pstore filesystem. The files are named
``dmesg-pstore-blk-[N]`` for the oops/panic front-end,
``pmsg-pstore-blk-0`` for the pmsg front-end and so on. The timestamp of the
dump file records the trigger time. To delete a stored record from the block
device, simply unlink the respective pstore file.
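
A boot-time script might archive the oops/panic records and then unlink them,
along the lines of the sketch below. ``save_dmesg_records`` and its arguments
are made-up names for illustration; ``cp -p`` is used so that the copy keeps
the timestamp of the dump file::

    # Hypothetical helper: copy every oops/panic record to a directory, then
    # unlink it from pstore to delete the stored record.
    # Arguments: pstore mount point, destination directory.
    save_dmesg_records() {
        src=$1
        dst=$2
        mkdir -p "$dst" || return 1
        for f in "$src"/dmesg-pstore-blk-*; do
            [ -e "$f" ] || continue    # the glob did not match any record
            cp -p "$f" "$dst"/ && rm -- "$f"
        done
    }

For example, ``save_dmesg_records /sys/fs/pstore /var/log/pstore`` would move
the records into */var/log/pstore*, a destination chosen here only as an
example.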

Attentions in panic read/write APIs
-----------------------------------

On panic, the kernel is not going to run for much longer: tasks will not be
scheduled and most kernel resources will be out of service. The system behaves
like a single-threaded program running on a single-core computer.

The following points require special attention for panic read/write APIs:

1. Can **NOT** allocate any memory.
   If you need memory, allocate it while the block driver is initializing
   rather than waiting until the panic.
#. Must be polled, **NOT** interrupt driven.
   Tasks are no longer scheduled. The block driver should delay to ensure the
   write succeeds, but must NOT sleep.
#. Can **NOT** take any lock.
   There is no other task, nor any shared resource; you are safe to break all
   locks.
#. Just use CPU to transfer.
   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
#. Control registers directly.
   Please control registers directly rather than use Linux kernel resources.
   Do I/O map while initializing rather than wait until a panic occurs.
#. Reset your block device and controller if necessary.
   If you are not sure of the state of your block device and controller when
   a panic occurs, you are safe to stop and reset them.

pstore/blk supports psblk_blkdev_info(), which is defined in
*linux/pstore_blk.h*, to get information about the block device in use, such
as the device number, sector count and start sector of the whole disk.

pstore block internals
----------------------

For developer reference, here are all the important structures and APIs:

.. kernel-doc:: fs/pstore/zone.c
   :internal:

.. kernel-doc:: include/linux/pstore_zone.h
   :internal:

.. kernel-doc:: include/linux/pstore_blk.h
   :internal: