Skip to content

Commit

Permalink
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a variety of bugs, many of which were found by folks using fuzzing
  or error injection.

  Also fix up how test_dummy_encryption mount option is handled for the
  new mount API.

  Finally, fix/cleanup a number of comments and ext4 Documentation
  files"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix a doubled word "need" in a comment
  ext4: add reserved GDT blocks check
  ext4: make variable "count" signed
  ext4: correct the judgment of BUG in ext4_mb_normalize_request
  ext4: fix bug_on ext4_mb_use_inode_pa
  ext4: fix up test_dummy_encryption handling for new mount API
  ext4: use kmemdup() to replace kmalloc + memcpy
  ext4: fix super block checksum incorrect after mount
  ext4: improve write performance with disabled delalloc
  ext4: fix warning when submitting superblock in ext4_commit_super()
  ext4, doc: remove unnecessary escaping
  ext4: fix incorrect comment in ext4_bio_write_page()
  fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment
  • Loading branch information
torvalds committed Jun 19, 2022
2 parents ace2045 + 1f3ddff commit 354c6e0
Show file tree
Hide file tree
Showing 26 changed files with 947 additions and 895 deletions.
68 changes: 34 additions & 34 deletions Documentation/filesystems/ext4/attributes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ disappeared as of Linux 3.0.

There are two places where extended attributes can be found. The first
place is between the end of each inode entry and the beginning of the
next inode entry. For example, if inode.i\_extra\_isize = 28 and
sb.inode\_size = 256, then there are 256 - (128 + 28) = 100 bytes
next inode entry. For example, if inode.i_extra_isize = 28 and
sb.inode_size = 256, then there are 256 - (128 + 28) = 100 bytes
available for in-inode extended attribute storage. The second place
where extended attributes can be found is in the block pointed to by
``inode.i_file_acl``. As of Linux 3.11, it is not possible for this
Expand All @@ -38,8 +38,8 @@ Extended attributes, when stored after the inode, have a header
- Name
- Description
* - 0x0
- \_\_le32
- h\_magic
- __le32
- h_magic
- Magic number for identification, 0xEA020000. This value is set by the
Linux driver, though e2fsprogs doesn't seem to check it(?)

Expand All @@ -55,28 +55,28 @@ The beginning of an extended attribute block is in
- Name
- Description
* - 0x0
- \_\_le32
- h\_magic
- __le32
- h_magic
- Magic number for identification, 0xEA020000.
* - 0x4
- \_\_le32
- h\_refcount
- __le32
- h_refcount
- Reference count.
* - 0x8
- \_\_le32
- h\_blocks
- __le32
- h_blocks
- Number of disk blocks used.
* - 0xC
- \_\_le32
- h\_hash
- __le32
- h_hash
- Hash value of all attributes.
* - 0x10
- \_\_le32
- h\_checksum
- __le32
- h_checksum
- Checksum of the extended attribute block.
* - 0x14
- \_\_u32
- h\_reserved[3]
- __u32
- h_reserved[3]
- Zero.

The checksum is calculated against the FS UUID, the 64-bit block number
Expand All @@ -100,46 +100,46 @@ Attributes stored inside an inode do not need be stored in sorted order.
- Name
- Description
* - 0x0
- \_\_u8
- e\_name\_len
- __u8
- e_name_len
- Length of name.
* - 0x1
- \_\_u8
- e\_name\_index
- __u8
- e_name_index
- Attribute name index. There is a discussion of this below.
* - 0x2
- \_\_le16
- e\_value\_offs
- __le16
- e_value_offs
- Location of this attribute's value on the disk block where it is stored.
Multiple attributes can share the same value. For an inode attribute
this value is relative to the start of the first entry; for a block this
value is relative to the start of the block (i.e. the header).
* - 0x4
- \_\_le32
- e\_value\_inum
- __le32
- e_value_inum
- The inode where the value is stored. Zero indicates the value is in the
same block as this entry. This field is only used if the
INCOMPAT\_EA\_INODE feature is enabled.
INCOMPAT_EA_INODE feature is enabled.
* - 0x8
- \_\_le32
- e\_value\_size
- __le32
- e_value_size
- Length of attribute value.
* - 0xC
- \_\_le32
- e\_hash
- __le32
- e_hash
- Hash value of attribute name and attribute value. The kernel doesn't
update the hash for in-inode attributes, so for that case this value
must be zero, because e2fsck validates any non-zero hash regardless of
where the xattr lives.
* - 0x10
- char
- e\_name[e\_name\_len]
- e_name[e_name_len]
- Attribute name. Does not include trailing NULL.

Attribute values can follow the end of the entry table. There appears to
be a requirement that they be aligned to 4-byte boundaries. The values
are stored starting at the end of the block and grow towards the
xattr\_header/xattr\_entry table. When the two collide, the overflow is
xattr_header/xattr_entry table. When the two collide, the overflow is
put into a separate disk block. If the disk block fills up, the
filesystem returns -ENOSPC.

Expand Down Expand Up @@ -167,15 +167,15 @@ the key name. Here is a map of name index values to key prefixes:
* - 1
- “user.”
* - 2
- “system.posix\_acl\_access
- “system.posix_acl_access
* - 3
- “system.posix\_acl\_default
- “system.posix_acl_default
* - 4
- “trusted.”
* - 6
- “security.”
* - 7
- “system.” (inline\_data only?)
- “system.” (inline_data only?)
* - 8
- “system.richacl” (SuSE kernels only?)

Expand Down
2 changes: 1 addition & 1 deletion Documentation/filesystems/ext4/bigalloc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ means that a block group addresses 32 gigabytes instead of 128 megabytes,
also shrinking the amount of file system overhead for metadata.

The administrator can set a block cluster size at mkfs time (which is
stored in the s\_log\_cluster\_size field in the superblock); from then
stored in the s_log_cluster_size field in the superblock); from then
on, the block bitmaps track clusters, not individual blocks. This means
that block groups can be several gigabytes in size (instead of just
128MiB); however, the minimum allocation unit becomes a cluster, not a
Expand Down
6 changes: 3 additions & 3 deletions Documentation/filesystems/ext4/bitmaps.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,15 @@ group.
The inode bitmap records which entries in the inode table are in use.

As with most bitmaps, one bit represents the usage status of one data
block or inode table entry. This implies a block group size of 8 \*
number\_of\_bytes\_in\_a\_logical\_block.
block or inode table entry. This implies a block group size of 8 *
number_of_bytes_in_a_logical_block.

NOTE: If ``BLOCK_UNINIT`` is set for a given block group, various parts
of the kernel and e2fsprogs code pretends that the block bitmap contains
zeros (i.e. all blocks in the group are free). However, it is not
necessarily the case that no blocks are in use -- if ``meta_bg`` is set,
the bitmaps and group descriptor live inside the group. Unfortunately,
ext2fs\_test\_block\_bitmap2() will return '0' for those locations,
ext2fs_test_block_bitmap2() will return '0' for those locations,
which produces confusing debugfs output.

Inode Table
Expand Down
30 changes: 15 additions & 15 deletions Documentation/filesystems/ext4/blockgroup.rst
Original file line number Diff line number Diff line change
Expand Up @@ -56,39 +56,39 @@ established that the super block and the group descriptor table, if
present, will be at the beginning of the block group. The bitmaps and
the inode table can be anywhere, and it is quite possible for the
bitmaps to come after the inode table, or for both to be in different
groups (flex\_bg). Leftover space is used for file data blocks, indirect
groups (flex_bg). Leftover space is used for file data blocks, indirect
block maps, extent tree blocks, and extended attributes.

Flexible Block Groups
---------------------

Starting in ext4, there is a new feature called flexible block groups
(flex\_bg). In a flex\_bg, several block groups are tied together as one
(flex_bg). In a flex_bg, several block groups are tied together as one
logical block group; the bitmap spaces and the inode table space in the
first block group of the flex\_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex\_bg. For example,
if the flex\_bg size is 4, then group 0 will contain (in order) the
first block group of the flex_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex_bg. For example,
if the flex_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block group metadata close together for faster loading, and to enable
large files to be continuous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex\_bg is enabled. The number of block groups that make up a
flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
if flex_bg is enabled. The number of block groups that make up a
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.

Meta Block Groups
-----------------

Without the option META\_BG, for safety concerns, all block group
Without the option META_BG, for safety concerns, all block group
descriptors copies are kept in the first block group. Given the default
128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4
can have at most 2^27/64 = 2^21 block groups. This limits the entire
filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB.

The solution to this problem is to use the metablock group feature
(META\_BG), which is already in ext3 for all 2.6 releases. With the
META\_BG feature, ext4 filesystems are partitioned into many metablock
(META_BG), which is already in ext3 for all 2.6 releases. With the
META_BG feature, ext4 filesystems are partitioned into many metablock
groups. Each metablock group is a cluster of block groups whose group
descriptor structures can be stored in a single disk block. For ext4
filesystems with 4 KB block size, a single metablock group partition
Expand All @@ -110,7 +110,7 @@ bytes, a meta-block group contains 32 block groups for filesystems with
a 1KB block size, and 128 block groups for filesystems with a 4KB
blocksize. Filesystems can either be created using this new block group
descriptor layout, or existing filesystems can be resized on-line, and
the field s\_first\_meta\_bg in the superblock will indicate the first
the field s_first_meta_bg in the superblock will indicate the first
block group using this new layout.

Please see an important note about ``BLOCK_UNINIT`` in the section about
Expand All @@ -121,15 +121,15 @@ Lazy Block Group Initialization

A new feature for ext4 are three block group descriptor flags that
enable mkfs to skip initializing other parts of the block group
metadata. Specifically, the INODE\_UNINIT and BLOCK\_UNINIT flags mean
metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean
that the inode and block bitmaps for that group can be calculated and
therefore the on-disk bitmap blocks are not initialized. This is
generally the case for an empty block group or a block group containing
only fixed-location block group metadata. The INODE\_ZEROED flag means
only fixed-location block group metadata. The INODE_ZEROED flag means
that the inode table has been initialized; mkfs will unset this flag and
rely on the kernel to initialize the inode tables in the background.

By not writing zeroes to the bitmaps and inode table, mkfs time is
reduced considerably. Note the feature flag is RO\_COMPAT\_GDT\_CSUM,
but the dumpe2fs output prints this as “uninit\_bg”. They are the same
reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM,
but the dumpe2fs output prints this as “uninit_bg”. They are the same
thing.
2 changes: 1 addition & 1 deletion Documentation/filesystems/ext4/blockmap.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| i.i\_block Offset | Where It Points |
| i.i_block Offset | Where It Points |
+=====================+==============================================================================================================================================================================================================================+
| 0 to 11 | Direct map to file blocks 0 to 11. |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Expand Down
26 changes: 13 additions & 13 deletions Documentation/filesystems/ext4/checksums.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Checksums
---------

Starting in early 2012, metadata checksums were added to all major ext4
and jbd2 data structures. The associated feature flag is metadata\_csum.
and jbd2 data structures. The associated feature flag is metadata_csum.
The desired checksum algorithm is indicated in the superblock, though as
of October 2012 the only supported algorithm is crc32c. Some data
structures did not have space to fit a full 32-bit checksum, so only the
Expand All @@ -20,7 +20,7 @@ encounters directory blocks that lack sufficient empty space to add a
checksum, it will request that you run ``e2fsck -D`` to have the
directories rebuilt with checksums. This has the added benefit of
removing slack space from the directory files and rebalancing the htree
indexes. If you \_ignore\_ this step, your directories will not be
indexes. If you _ignore_ this step, your directories will not be
protected by a checksum!

The following table describes the data elements that go into each type
Expand All @@ -35,39 +35,39 @@ of checksum. The checksum function is whatever the superblock describes
- Length
- Ingredients
* - Superblock
- \_\_le32
- __le32
- The entire superblock up to the checksum field. The UUID lives inside
the superblock.
* - MMP
- \_\_le32
- __le32
- UUID + the entire MMP block up to the checksum field.
* - Extended Attributes
- \_\_le32
- __le32
- UUID + the entire extended attribute block. The checksum field is set to
zero.
* - Directory Entries
- \_\_le32
- __le32
- UUID + inode number + inode generation + the directory block up to the
fake entry enclosing the checksum field.
* - HTREE Nodes
- \_\_le32
- __le32
- UUID + inode number + inode generation + all valid extents + HTREE tail.
The checksum field is set to zero.
* - Extents
- \_\_le32
- __le32
- UUID + inode number + inode generation + the entire extent block up to
the checksum field.
* - Bitmaps
- \_\_le32 or \_\_le16
- __le32 or __le16
- UUID + the entire bitmap. Checksums are stored in the group descriptor,
and truncated if the group descriptor size is 32 bytes (i.e. ^64bit)
* - Inodes
- \_\_le32
- __le32
- UUID + inode number + inode generation + the entire inode. The checksum
field is set to zero. Each inode has its own checksum.
* - Group Descriptors
- \_\_le16
- If metadata\_csum, then UUID + group number + the entire descriptor;
else if gdt\_csum, then crc16(UUID + group number + the entire
- __le16
- If metadata_csum, then UUID + group number + the entire descriptor;
else if gdt_csum, then crc16(UUID + group number + the entire
descriptor). In all cases, only the lower 16 bits are stored.

Loading

0 comments on commit 354c6e0

Please sign in to comment.