rfc9766v2.txt   rfc9766.txt 
Internet Engineering Task Force (IETF) T. Haynes Internet Engineering Task Force (IETF) T. Haynes
Request for Comments: 9766 T. Myklebust Request for Comments: 9766 T. Myklebust
Category: Standards Track Hammerspace Category: Standards Track Hammerspace
ISSN: 2070-1721 April 2025 ISSN: 2070-1721 April 2025
Extensions for Weak Cache Consistency in NFSv4.2's Flexible File Layout Extensions for Weak Cache Consistency in NFSv4.2's Flexible File Layout
Abstract Abstract
rfc9766v2.txt:
This document specifies extensions to Parallel NFS (pNFS) for
improving Weak Cache Consistency (WCC). These extensions introduce
mechanisms that ensure partial writes performed under a pNFS layout
rfc9766.txt:
This document specifies extensions to NFSv4.2 for improving Weak
Cache Consistency (WCC). These extensions introduce mechanisms that
ensure partial writes performed under a Parallel NFS (pNFS) layout
remain coherent and correctly tracked. The solution addresses remain coherent and correctly tracked. The solution addresses
concurrency and data integrity concerns that may arise when multiple concurrency and data integrity concerns that may arise when multiple
clients write to the same file through separate data servers. By clients write to the same file through separate data servers. By
defining additional interactions among clients, metadata servers, and defining additional interactions among clients, metadata servers, and
data servers, this specification enhances the reliability of NFSv4 in data servers, this specification enhances the reliability of NFSv4 in
parallel-access environments and ensures consistency across diverse parallel-access environments and ensures consistency across diverse
deployment scenarios. deployment scenarios.
Status of This Memo Status of This Memo
skipping to change at line 140 skipping to change at line 140
capitals, as shown here. capitals, as shown here.
2. Weak Cache Consistency (WCC) 2. Weak Cache Consistency (WCC)
A pNFS layout type enables the metadata server to inform the client A pNFS layout type enables the metadata server to inform the client
of both the storage protocol and the locations of the data that the of both the storage protocol and the locations of the data that the
client should use when communicating with the storage devices. The client should use when communicating with the storage devices. The
flexible file layout type, as specified in [RFC8435], describes how flexible file layout type, as specified in [RFC8435], describes how
data servers using NFSv3 can be accessed. The client is restricted data servers using NFSv3 can be accessed. The client is restricted
to performing the following NFSv3 operations on the filehandles to performing the following NFSv3 operations on the filehandles
rfc9766v2.txt:
provided in the layout: READ (Section 3.3.6 of [RFC1813]), WRITE
(Section 3.3.7 of [RFC1813]), and COMMIT (Section 3.3.21 of
[RFC1813]). In other words, the client may only use NFSv3 operations
that act directly on the data portion of the file.
rfc9766.txt:
provided in the layout: READ, WRITE, and COMMIT (see Sections 3.3.6,
3.3.7, and 3.3.21 of [RFC1813], respectively). In other words, the
client may only use NFSv3 operations that act directly on the data
portion of the file.
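As a rough illustration of the restriction described above, the
following Python sketch models the set of NFSv3 procedures a client
may issue on the filehandles provided in a layout. The enum values
are the NFSv3 procedure numbers from [RFC1813]; the names Nfs3Proc,
CLIENT_LAYOUT_PROCS, and client_may_issue are hypothetical and are
not taken from any specification or implementation.

```python
from enum import IntEnum


class Nfs3Proc(IntEnum):
    """Subset of NFSv3 procedure numbers (RFC 1813, Section 3.3)."""
    GETATTR = 1
    SETATTR = 2
    READ = 6
    WRITE = 7
    CREATE = 8
    COMMIT = 21


# Hypothetical allow-list: procedures that act only on the data
# portion of the file, per the paragraph above.
CLIENT_LAYOUT_PROCS = frozenset(
    {Nfs3Proc.READ, Nfs3Proc.WRITE, Nfs3Proc.COMMIT}
)


def client_may_issue(proc: Nfs3Proc) -> bool:
    """Return True if a client may send `proc` on a layout filehandle."""
    return proc in CLIENT_LAYOUT_PROCS
```

With this sketch, client_may_issue(Nfs3Proc.WRITE) is true, while
client_may_issue(Nfs3Proc.SETATTR) is false.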
Because there is no control protocol (see [RFC8434]) possible with Because there is no control protocol (see [RFC8434]) possible with
all data servers, NFSv3 is used as the control protocol. As such, all data servers, NFSv3 is used as the control protocol. As such,
the following NFSv3 operations are commonly used by the metadata the following NFSv3 operations are commonly used by the metadata
rfc9766v2.txt:
server: CREATE (see Section 3.3.8 of [RFC1813]), GETATTR (see
Section 3.3.1 of [RFC1813]), and SETATTR (see Section 3.3.2 of
[RFC1813]). That is, the metadata server is only allowed to use
NFSv3 operations that directly act on the metadata portion of the
data file. GETATTR allows the metadata server to mainly retrieve the
mtime (modify time), ctime (change time), and atime (access time).
The metadata server can use this information to determine if the
client modified the file whilst it held an iomode of LAYOUTIOMODE4_RW
(see Section 3.3.20 of [RFC8881]). Then it can determine the
following for the metadata file: time_modify (see Section 5.8.2.43 of
[RFC8881]), time_metadata (see Section 5.8.2.42 of [RFC8881]), and
time_access (see Section 5.8.2.37 of [RFC8881]). That is, it can
rfc9766.txt:
server: CREATE, GETATTR, and SETATTR (see Sections 3.3.8, 3.3.1, and
3.3.2 of [RFC1813], respectively). That is, the metadata server is
only allowed to use NFSv3 operations that directly act on the
metadata portion of the data file. GETATTR allows the metadata
server to mainly retrieve the mtime (modify time), ctime (change
time), and atime (access time). The metadata server can use this
information to determine if the client modified the file whilst it
held an iomode of LAYOUTIOMODE4_RW (see Section 3.3.20 of [RFC8881]).
Then it can determine the following for the metadata file:
time_modify, time_metadata, and time_access (see Sections 5.8.2.43,
5.8.2.42, and 5.8.2.37 of [RFC8881], respectively). That is, it can
determine the information to return to clients in an NFSv4.2 GETATTR determine the information to return to clients in an NFSv4.2 GETATTR
response. response.
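The attribute handling described above can be sketched as follows.
This is a minimal Python illustration, not the behavior of any
particular metadata server. It assumes that the metadata server
keeps a snapshot of the data file's NFSv3 times from when the
LAYOUTIOMODE4_RW layout was granted, that a differing mtime or ctime
means the client modified the file, and that mtime, ctime, and atime
map to time_modify, time_metadata, and time_access, respectively.
All type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nfs3Times:
    """Times returned by an NFSv3 GETATTR on the data file."""
    atime: float
    mtime: float
    ctime: float


@dataclass(frozen=True)
class Nfs4Times:
    """Times the metadata server reports in an NFSv4.2 GETATTR."""
    time_access: float
    time_modify: float
    time_metadata: float


def modified_while_rw_layout(at_grant: Nfs3Times, now: Nfs3Times) -> bool:
    # Assumption: any change in mtime or ctime since the layout was
    # granted indicates that the client modified the data file.
    return now.mtime != at_grant.mtime or now.ctime != at_grant.ctime


def refresh_metadata_times(current: Nfs4Times,
                           at_grant: Nfs3Times,
                           now: Nfs3Times) -> Nfs4Times:
    """Derive the metadata file's times from the data file's times."""
    if not modified_while_rw_layout(at_grant, now):
        return current  # nothing changed; keep the existing values
    # Assumed one-to-one mapping of NFSv3 times to NFSv4.2 attributes.
    return Nfs4Times(time_access=now.atime,
                     time_modify=now.mtime,
                     time_metadata=now.ctime)
```

A real metadata server would likely have to reconcile times across
several data files (for example, with mirrored or striped layouts);
this sketch deliberately covers only a single data file.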
For example, the metadata server might issue an NFSv3 GETATTR For example, the metadata server might issue an NFSv3 GETATTR
operation to the data server, which is typically triggered by a operation to the data server, which is typically triggered by a
client's NFSv4 GETATTR request to the metadata server. In addition client's NFSv4 GETATTR request to the metadata server. In addition
to the cost of each individual GETATTR operation, the data server can to the cost of each individual GETATTR operation, the data server can
be overwhelmed by a large volume of such requests. NFSv3 addressed a be overwhelmed by a large volume of such requests. NFSv3 addressed a
similar challenge by including a post-operation attribute in the READ similar challenge by including a post-operation attribute in the READ
and WRITE operations to report WCC data (see Section 2.6 of and WRITE operations to report WCC data (see Section 2.6 of
 End of changes. 3 change blocks. 
19 lines changed or deleted 18 lines changed or added
