rfc9766.original | rfc9766.txt | |||
---|---|---|---|---|
Network File System Version 4 T. Haynes | Internet Engineering Task Force (IETF) T. Haynes | |||
Internet-Draft T. Myklebust | Request for Comments: 9766 T. Myklebust | |||
Intended status: Standards Track Hammerspace | Category: Standards Track Hammerspace | |||
Expires: 11 August 2025 7 February 2025 | ISSN: 2070-1721 April 2025 | |||
Add LAYOUT_WCC to NFSv4.2's Flex File Layout Type | Extensions for Weak Cache Consistency in NFSv4.2's Flexible File Layout | |||
draft-ietf-nfsv4-layoutwcc-07 | ||||
Abstract | Abstract | |||
This document specifies extensions to the parallel Network File | This document specifies extensions to NFSv4.2 for improving Weak | |||
System (NFS) version 4 (pNFS) for improving write cache consistency. | Cache Consistency (WCC). These extensions introduce mechanisms that | |||
These extensions introduce mechanisms that ensure partial writes | ensure partial writes performed under a Parallel NFS (pNFS) layout | |||
performed under a pNFS layout remain coherent and correctly tracked. | remain coherent and correctly tracked. The solution addresses | |||
The solution addresses concurrency and data integrity concerns that | concurrency and data integrity concerns that may arise when multiple | |||
may arise when multiple clients write to the same file through | clients write to the same file through separate data servers. By | |||
separate data servers. By defining additional interactions among | defining additional interactions among clients, metadata servers, and | |||
clients, metadata servers, and data servers, this specification | data servers, this specification enhances the reliability of NFSv4 in | |||
enhances the reliability of NFSv4 in parallel-access environments and | parallel-access environments and ensures consistency across diverse | |||
ensures consistency across diverse deployment scenarios. | deployment scenarios. | |||
Note | ||||
This note is to be removed before publishing as an RFC. | ||||
Discussion of this draft takes place on the NFSv4 working group | ||||
mailing list (nfsv4@ietf.org), which is archived at | ||||
https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group | ||||
information can be found at https://datatracker.ietf.org/wg/nfsv4/ | ||||
about/. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 11 August 2025. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9766. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Definitions | |||
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.2. Requirements Language | |||
2. Weak Cache Consistency (WCC) . . . . . . . . . . . . . . . . 4 | 2. Weak Cache Consistency (WCC) | |||
3. Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency . . 5 | 3. Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency | |||
3.4. Implementation . . . . . . . . . . . . . . . . . . . . . 6 | 3.1. ARGUMENT | |||
3.4.1. Examples of when to use LAYOUT_WCC . . . . . . . . . 6 | 3.2. RESULT | |||
3.4.2. Examples of what to send in the LAYOUT_WCC . . . . . 7 | 3.3. DESCRIPTION | |||
3.5. Allowed Errors . . . . . . . . . . . . . . . . . . . . . 8 | 3.4. Implementation | |||
3.6. Extension of Existing Implementations . . . . . . . . . . 9 | 3.4.1. Examples of When to Use LAYOUT_WCC | |||
3.7. Flex Files Layout Type . . . . . . . . . . . . . . . . . 9 | 3.4.2. Examples of What to Send in LAYOUT_WCC | |||
4. Extraction of XDR . . . . . . . . . . . . . . . . . . . . . . 10 | 3.5. Allowed Errors | |||
4.1. Code Components Licensing Notice . . . . . . . . . . . . 11 | 3.6. Extension of Existing Implementations | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | 3.7. Flexible File Layout Type | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 4. Extraction of XDR | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 5. Security Considerations | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 11 | 6. IANA Considerations | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 12 | 7. References | |||
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 13 | 7.1. Normative References | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | 7.2. Informative References | |||
Acknowledgments | ||||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
In the Network File System version 4 (NFSv4) with a Parallel NFS | In the Parallel NFS (pNFS) flexible file layout (see [RFC8435]), | |||
(pNFS) Flexible File Layout (see Section 12 of [RFC8435]) server, | ||||
there is no mechanism for the data servers to update the metadata | there is no mechanism for the data servers to update the metadata | |||
servers for when the data portion of the file is modified. The | servers when the data portion of the file is modified. The metadata | |||
metadata server needs this knowledge to correspondingly update the | server needs this knowledge to correspondingly update the metadata | |||
metadata portion of the file. If the client is using NFSv3 as the | portion of the file. If the client is using NFSv3 as the protocol | |||
protocol with the data server, it can leverage weak cache consistency | with the data server, it can leverage Weak Cache Consistency (WCC) to | |||
(WCC) to update the metadata server of the attribute changes. In | update the metadata server of the attribute changes. In this | |||
this document, we introduce a new operation called LAYOUT_WCC to | document, we introduce a new operation called LAYOUT_WCC to NFSv4.2, | |||
NFSv4.2 which allows the client to periodically report the attributes | which allows the client to periodically report the attributes of the | |||
of the data files to the metadata server. | data files to the metadata server. | |||
Using the process detailed in [RFC8178], the revisions in this | Using the process detailed in [RFC8178], the revisions in this | |||
document become an extension of NFSv4.2 [RFC7862]. They are built on | document become an extension of NFSv4.2 [RFC7862]. They are built on | |||
top of the external data representation (XDR) [RFC4506] generated | top of the External Data Representation (XDR) [RFC4506] generated | |||
from [RFC7863]. | from [RFC7863]. | |||
1.1. Definitions | 1.1. Definitions | |||
For a more comprehensive set of definitions, see Section 1.1 of | For a more comprehensive set of definitions, see Section 1.1 of | |||
[RFC8435]. | [RFC8435]. | |||
(file) data: that part of the file system object that contains the | (file) data: that part of the file system object that contains the | |||
data to be read or written. It is the contents of the object | data to be read or written. It is the contents of the object | |||
rather than the attributes of the object. | rather than the attributes of the object. | |||
skipping to change at page 3, line 38 ¶ | skipping to change at line 120 ¶ | |||
metadata server (MDS): the pNFS server that provides metadata | metadata server (MDS): the pNFS server that provides metadata | |||
information for a file system object. | information for a file system object. | |||
storage device: the target to which clients may direct I/O requests | storage device: the target to which clients may direct I/O requests | |||
when they hold an appropriate layout. Note that each data server | when they hold an appropriate layout. Note that each data server | |||
is a storage device but that some storage device are not data | is a storage device but that some storage device are not data | |||
servers. (See Section 2.1 of [RFC8434] for a discussion on the | servers. (See Section 2.1 of [RFC8434] for a discussion on the | |||
difference between a data server and a storage device.) | difference between a data server and a storage device.) | |||
weak cache consistency (WCC): In NFSv3, WCC allows the client to | weak cache consistency (WCC): the mechanism in NFSv3 that allows the | |||
check for file attribute changes before and after an operation | client to check for file attribute changes before and after an | |||
(See Section 2.6 of [RFC1813]). | operation (see Section 2.6 of [RFC1813]). | |||
1.2. Requirements Language | 1.2. Requirements Language | |||
The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
'OPTIONAL' in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
2. Weak Cache Consistency (WCC) | 2. Weak Cache Consistency (WCC) | |||
A pNFS layout type enables the metadata server to inform the client | A pNFS layout type enables the metadata server to inform the client | |||
of both the storage protocol and the locations of the data that the | of both the storage protocol and the locations of the data that the | |||
client should use when communicating with the storage devices. The | client should use when communicating with the storage devices. The | |||
Flex Files Layout Type, as specified in [RFC8435], describes how data | flexible file layout type, as specified in [RFC8435], describes how | |||
servers using NFSv3 can be accessed. The client is restricted to | data servers using NFSv3 can be accessed. The client is restricted | |||
performing NFSv3 READ (Section 3.3.6 of [RFC1813]), WRITE | to performing the following NFSv3 operations on the filehandles | |||
(Section 3.3.6 of [RFC1813]), and COMMIT (Section 3.3.21 of | provided in the layout: READ, WRITE, and COMMIT (see Sections 3.3.6, | |||
[RFC1813]) operations on the file handles provided in the layout. In | 3.3.7, and 3.3.21 of [RFC1813], respectively). In other words, the | |||
other words, the client may only use NFSv3 operations that act | client may only use NFSv3 operations that act directly on the data | |||
directly on the data portion of the file. | portion of the file. | |||
Because there is no contol protocol (see [RFC8434]) possible with all | Because there is no control protocol (see [RFC8434]) possible with | |||
data servers, NFSv3 is used as the control protocol. As such, the | all data servers, NFSv3 is used as the control protocol. As such, | |||
NFSv3 CREATE (see Section 3.3.8 of [RFC1813]), GETATTR (see | the following NFSv3 operations are commonly used by the metadata | |||
Section 3.3.1 of [RFC1813]), and SETATTR (see Section 3.3.2 of | server: CREATE, GETATTR, and SETATTR (see Sections 3.3.8, 3.3.1, and | |||
[RFC1813]) are operations commonly used by the metadata server. | 3.3.2 of [RFC1813], respectively). That is, the metadata server is | |||
I.e., the metadata server is only allowed to use NFSv3 operations | only allowed to use NFSv3 operations that directly act on the | |||
which directly act on the metadata portion of the data file. GETATTR | metadata portion of the data file. GETATTR allows the metadata | |||
allows the metadata server to mainly retrieve the mtime (modify | server to mainly retrieve the mtime (modify time), ctime (change | |||
time), ctime (change time), and atime (access time). The metadata | time), and atime (access time). The metadata server can use this | |||
server can use this information to determine if the client modified | information to determine if the client modified the file whilst it | |||
the file whilst it held an iomode of LAYOUTIOMODE4_RW (see | held an iomode of LAYOUTIOMODE4_RW (see Section 3.3.20 of [RFC8881]). | |||
Section 3.3.20 of [RFC8881]). Then it can determine the time_modify | Then it can determine the following for the metadata file: | |||
(see Section 5.8.2.43 of [RFC8881]), time_metadata (see | time_modify, time_metadata, and time_access (see Sections 5.8.2.43, | |||
Section 5.8.2.42 of [RFC8881]), and time_access (see Section 5.8.2.37 | 5.8.2.42, and 5.8.2.37 of [RFC8881], respectively). That is, it can | |||
of [RFC8881]) for the metadata file. I.e., the information to return | determine the information to return to clients in an NFSv4.2 GETATTR | |||
to clients in a NFSv4.2 GETATTR response. | response. | |||
For example, the metadata server might issue an NFSv3 GETATTR | For example, the metadata server might issue an NFSv3 GETATTR | |||
operation to the data server, which is typically triggered by a | operation to the data server, which is typically triggered by a | |||
client's NFSv4 GETATTR request to the metadata server. In addition | client's NFSv4 GETATTR request to the metadata server. In addition | |||
to the cost of each individual GETATTR operation, the data server can | to the cost of each individual GETATTR operation, the data server can | |||
be overwhelmed by a large volume of such requests. NFSv3 addressed a | be overwhelmed by a large volume of such requests. NFSv3 addressed a | |||
similar challenge by including a post-operation attribute in the READ | similar challenge by including a post-operation attribute in the READ | |||
and WRITE operations to report weak cache consistency (WCC) data (see | and WRITE operations to report WCC data (see Section 2.6 of | |||
Section 2.6 of [RFC1813]). | [RFC1813]). | |||
Each NFSv3 operation entails a single round trip between the client | Each NFSv3 operation entails a single round trip between the client | |||
and server. Consequently, issuing a WRITE followed by a GETATTR | and server. Consequently, issuing a WRITE followed by a GETATTR | |||
would require two round trips. In that situation, the retrieved | would require two round trips. In that situation, the retrieved | |||
attribute information is regarded as strict server-client | attribute information is regarded as having strict server-client | |||
consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be | consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be | |||
combined within a compound operation, which requires only one round | combined within a compound operation, which requires only one round | |||
trip. This combined approach is likewise considered strict server- | trip. This combined approach is likewise considered to have strict | |||
client consistency. Essentially, NFSv4 READ and WRITE operations | server-client consistency. Essentially, NFSv4 READ and WRITE | |||
omit post-operation attributes, allowing the client to determine | operations omit post-operation attributes, allowing the client to | |||
whether it requires that information. | determine whether it requires that information. | |||
Whilst NFSv4 got rid of the requirement for WCC information to be | Whilst NFSv4 got rid of the requirement for WCC information to be | |||
supplied by the WRITE or READ operations, the introduction of pNFS | supplied by the WRITE or READ operations, the introduction of pNFS | |||
re-introduces the same problem. The metadata server has to | reintroduces the same problem. The metadata server has to | |||
communicate with the data server in order to get at the data which | communicate with the data server in order to get the data that could | |||
could be provided by a WCC model. | be provided by a WCC model. | |||
With the flexible file layout type, the client can leverage the NFSv3 | With the flexible file layout type, the client can leverage the NFSv3 | |||
WCC to service the proxying of times (See Section 4 of | WCC to service the proxying of times (see Section 5 of [RFC9754]), | |||
[I-D.ietf-nfsv4-delstid]). But the granularity of this data is | but the granularity of this data is limited. With client-side | |||
limited. With client side mirroring (See Section 8 of [RFC8435]), | mirroring (see Section 8 of [RFC8435]), the client has to aggregate | |||
the client has to aggregate the N mirrored files in order to send one | the N mirrored files in order to send one piece of information | |||
piece of information instead of N pieces of information. Also, the | instead of N pieces of information. Also, the client is limited to | |||
client is limited to sending that information only when it returns | sending that information only when it returns the delegation. | |||
the delegation. | ||||
This document introduces a new NFSv4.2 operation, LAYOUT_WCC, which | This document introduces a new NFSv4.2 operation, LAYOUT_WCC, which | |||
enables the client to provide the metadata server with information | enables the client to provide the metadata server with information | |||
obtained from the data server. The client is responsible for | obtained from the data server. The client is responsible for | |||
gathering the NFSv3 WCC data, returned by the three permissible NFSv3 | gathering the NFSv3 WCC data, returned by the three permissible NFSv3 | |||
operations, and conveying it back to the metadata server as part of | operations, and conveying it back to the metadata server as part of | |||
NFSv4.2 attributes. The metadata server MAY therefore avoid issuing | NFSv4.2 attributes. The metadata server MAY therefore avoid issuing | |||
costly NFSv3 GETATTR calls to the data servers. Because this | costly NFSv3 GETATTR calls to the data servers. Because this | |||
approach relies on a weak model, the metadata server MAY still | approach relies on a weak model, the metadata server MAY still | |||
perform these calls if it chooses to strengthen the model. | perform these calls if it chooses to strengthen the model. | |||
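A minimal client-side sketch of the gathering step described above. The class name WccCollector, its methods, and the dictionary representation of attributes are hypothetical and appear in neither version of the document. The sketch assumes the client keeps only the most recently seen post-operation attributes for each data file and skips replies that carry none, since NFSv3 post-operation attributes are optional.

```python
class WccCollector:
    """Hypothetical helper: remembers the latest NFSv3 post-op attributes."""

    def __init__(self):
        # Keyed by an opaque data-file identity chosen by the caller,
        # for example a (deviceid, filehandle) tuple.
        self._latest = {}

    def record(self, data_file, post_op_attrs):
        """Store attributes from a READ, WRITE, or COMMIT reply, if present."""
        if post_op_attrs is not None:
            self._latest[data_file] = dict(post_op_attrs)

    def drain(self):
        """Return everything gathered so far and start a fresh batch."""
        pending, self._latest = self._latest, {}
        return pending
```

A client could call drain() just before it builds a LAYOUT_WCC request, so that each request carries one entry per data file touched since the previous report.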
skipping to change at page 6, line 4 ¶ | skipping to change at line 217 ¶ | |||
3.1. ARGUMENT | 3.1. ARGUMENT | |||
<CODE BEGINS> | <CODE BEGINS> | |||
/// struct LAYOUT_WCC4args { | /// struct LAYOUT_WCC4args { | |||
/// stateid4 lowa_stateid; | /// stateid4 lowa_stateid; | |||
/// layouttype4 lowa_type; | /// layouttype4 lowa_type; | |||
/// opaque lowa_body<>; | /// opaque lowa_body<>; | |||
/// }; | /// }; | |||
<CODE ENDS> | <CODE ENDS> | |||
stateid4 is defined in Section 3.3.12 of [RFC8881]. layouttype4 is | stateid4 is defined in Section 3.3.12 of [RFC8881]. layouttype4 is | |||
defined in Section 3.3.13 of [RFC8881]. | defined in Section 3.3.13 of [RFC8881]. | |||
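As an illustration of how the argument structure could be serialized, the following sketch applies the XDR rules of [RFC4506]: big-endian 32-bit integers, fixed-length opaque data written as is, and variable-length opaque data prefixed with its length and padded to a multiple of four bytes. The function names are hypothetical. The sketch assumes the stateid4 layout from [RFC8881] (a 32-bit seqid followed by 12 opaque bytes) and the LAYOUT4_FLEX_FILES value of 0x4 registered by [RFC8435].

```python
import struct

LAYOUT4_FLEX_FILES = 0x4  # layouttype4 value registered by RFC 8435


def xdr_opaque_var(data: bytes) -> bytes:
    """Encode variable-length opaque data: length, bytes, zero padding."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_stateid4(seqid: int, other: bytes) -> bytes:
    """Encode a stateid4: uint32 seqid followed by 12 fixed opaque bytes."""
    if len(other) != 12:
        raise ValueError("stateid4 'other' field must be exactly 12 bytes")
    return struct.pack(">I", seqid) + other


def encode_layout_wcc4args(seqid: int, other: bytes,
                           layout_type: int, body: bytes) -> bytes:
    """Hypothetical encoder for LAYOUT_WCC4args (stateid, type, body)."""
    return (encode_stateid4(seqid, other)
            + struct.pack(">I", layout_type)
            + xdr_opaque_var(body))
```

The lowa_body bytes are passed through untouched here; producing them is the job of the layout-type-specific encoder.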
3.2. RESULT | 3.2. RESULT | |||
<CODE BEGINS> | <CODE BEGINS> | |||
/// struct LAYOUT_WCC4res { | /// struct LAYOUT_WCC4res { | |||
/// nfsstat4 lowr_status; | /// nfsstat4 lowr_status; | |||
/// }; | /// }; | |||
<CODE ENDS> | <CODE ENDS> | |||
nfsstat4 is defined in Section 3.2 of [RFC8881]. | nfsstat4 is defined in Section 3.2 of [RFC8881]. | |||
3.3. DESCRIPTION | 3.3. DESCRIPTION | |||
The current filehandle and the lowa_stateid identify the specific | The current filehandle and the lowa_stateid identify the specific | |||
layout for the LAYOUT_WCC operation. The lowa_type indicates how to | layout for the LAYOUT_WCC operation. The lowa_type indicates how to | |||
interpret the layout-type-specific payload contained in the lowa_body | interpret the layout-type-specific payload contained in the lowa_body | |||
field. The lowa_type is the corresponding value from the IANA | field. The lowa_type is the corresponding value from the "pNFS | |||
registry for 'pNFS Layout Types' for the layout type being used. | Layout Types" IANA registry for the layout type being used. | |||
The lowa_body contains the data file attributes. The client is | The lowa_body contains the data file attributes. The client is | |||
responsible for mapping NFSv3 post-operation attributes to the fattr4 | responsible for mapping NFSv3 post-operation attributes to the fattr4 | |||
representation. Similar to the behavior of post-operation | representation. Similar to the behavior of post-operation | |||
attributes, the client may ignore these attributes, and the server | attributes, the client may ignore these attributes, and the server | |||
may also choose to ignore any attributes included in LAYOUT_WCC. | may also choose to ignore any attributes included in LAYOUT_WCC. | |||
However, the server can use these attributes to avoid querying the | However, the server can use these attributes to avoid querying the | |||
data server for data file attributes. Because these attributes are | data server for data file attributes. Because these attributes are | |||
optional and the client has no recourse if the server opts to | optional and the client has no recourse if the server opts to | |||
disregard them, there is no requirement to return a bitmap4 | disregard them, there is no requirement to return a bitmap4 | |||
indicating which attributes have been accepted in the LAYOUT_WCC | indicating which attributes have been accepted in the LAYOUT_WCC | |||
result. | result. | |||
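The optional nature of the reported attributes can be sketched from the server side as follows. The MetadataServer class, its accept_wcc flag, and the dictionary-based attribute cache are hypothetical. Only the status codes are taken from [RFC8881] and the layout type value from [RFC8435]. The point of the sketch is that the reply carries a status and nothing else, so the client cannot tell whether the attributes were used.

```python
from dataclasses import dataclass, field

NFS4_OK = 0
NFS4ERR_UNKNOWN_LAYOUTTYPE = 10062
LAYOUT4_FLEX_FILES = 4


@dataclass
class MetadataServer:
    """Hypothetical metadata server that may or may not trust client WCC."""
    accept_wcc: bool = True
    cached_attrs: dict = field(default_factory=dict)

    def layout_wcc(self, layout_type, data_file, attrs):
        """Handle one reported data file and return only an nfsstat4 value."""
        if layout_type != LAYOUT4_FLEX_FILES:
            return NFS4ERR_UNKNOWN_LAYOUTTYPE
        if self.accept_wcc:
            # Refresh the cached view, which may save a later NFSv3 GETATTR.
            self.cached_attrs.setdefault(data_file, {}).update(attrs)
        # Whether the attributes were used is not reported to the client.
        return NFS4_OK
```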
3.4. Implementation | 3.4. Implementation | |||
3.4.1. Examples of when to use LAYOUT_WCC | 3.4.1. Examples of When to Use LAYOUT_WCC | |||
The only way for the metadata server to detect modifications to the | The only way for the metadata server to detect modifications to the | |||
data file is to probe the data servers via a GETATTR. It can compare | data file is to probe the data servers via a GETATTR. It can compare | |||
the mtime results across multiple calls to detect a NFSv3 WRITE | the mtime results across multiple calls to detect an NFSv3 WRITE | |||
operation by the client. Likewise, the atime results indicate the | operation by the client. Likewise, the atime results indicate the | |||
client having issued a NFSv3 READ operation. As such, the client can | client having issued an NFSv3 READ operation. As such, the client | |||
leverage the LAYOUT_WCC operation whenever it has the belief that the | can leverage the LAYOUT_WCC operation whenever it has the belief that | |||
metadata server would need to refresh the attributes of the data | the metadata server would need to refresh the attributes of the data | |||
files. While the client can send a LAYOUT_WCC at any time, there are | files. While the client can send a LAYOUT_WCC at any time, there are | |||
times it will want to do this operation in order to avoid having the | times it will want to do this operation in order to avoid having the | |||
metadata server issue NFSv3 GETATTR requests to the data servers: | metadata server issue NFSv3 GETATTR requests to the data servers: | |||
* Whenever it sends a GETATTR for any of the following attributes: | * Whenever it sends a GETATTR for any of the following attributes: | |||
size (see Section 5.8.1.5 of [RFC8881]), space_used (see | ||||
Section 5.8.2.25 of [RFC8881]), change (see Section 5.8.1.4 of | - size (see Section 5.8.1.5 of [RFC8881]) | |||
[RFC8881]), time_access (see Section 5.8.2.37 of [RFC8881]), | ||||
time_metadata (see Section 5.8.2.42 of [RFC8881]), and time_modify | - space_used (see Section 5.8.2.35 of [RFC8881]) | |||
(see Section 5.8.2.43 of [RFC8881]). | ||||
- change (see Section 5.8.1.4 of [RFC8881]) | ||||
- time_access (see Section 5.8.2.37 of [RFC8881]) | ||||
- time_metadata (see Section 5.8.2.42 of [RFC8881]) | ||||
- time_modify (see Section 5.8.2.43 of [RFC8881]) | ||||
* Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or | * Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or | |||
LAYOUTERROR - it could have already gotten the NFSv3 uid and gid | LAYOUTERROR. It could have already gotten the NFSv3 uid and gid | |||
values back in the WCC of the WRITE, READ, or COMMIT operation | values back in the WCC of the WRITE, READ, or COMMIT operation | |||
which got the error. Thus it could report that information back | that got the error. Thus, it could report that information back | |||
to the metadata server, saving it from querying that information | to the metadata server, saving it from querying that information | |||
via a NFSv3 GETATTR. | via an NFSv3 GETATTR. | |||
* Whenever it sends a SETATTR to refresh the proxied times (See | * Whenever it sends a SETATTR to refresh the proxied times (see | |||
Section 4 of [I-D.ietf-nfsv4-delstid]) - the metadata server is | Section 5 of [RFC9754]). The metadata server will correlate these | |||
going to want to correlate these times in order to detect later | times in order to detect later modification to the data file. | |||
modification to the data file. | ||||
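The three situations listed above could be expressed as a single client-side predicate. This is a sketch only: the function name, its parameters, and the use of strings for operation and error names are assumptions made for illustration. A client remains free to send LAYOUT_WCC at other times.

```python
# Attributes whose values the metadata server derives from the data files.
WCC_SENSITIVE_ATTRS = frozenset({
    "size", "space_used", "change",
    "time_access", "time_metadata", "time_modify",
})


def should_send_layout_wcc(op, attrs=(), error=None,
                           refreshes_proxied_times=False):
    """Return True when one of the three listed situations applies."""
    if op == "GETATTR":
        # Case 1: the GETATTR asks for at least one sensitive attribute.
        return bool(WCC_SENSITIVE_ATTRS & set(attrs))
    if op in ("LAYOUTRETURN", "LAYOUTERROR"):
        # Case 2: the client is reporting NFS4ERR_ACCESS from a data server.
        return error == "NFS4ERR_ACCESS"
    if op == "SETATTR":
        # Case 3: the SETATTR refreshes the proxied times.
        return refreshes_proxied_times
    return False
```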
3.4.2. Examples of what to send in the LAYOUT_WCC | 3.4.2. Examples of What to Send in LAYOUT_WCC | |||
The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT | The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT | |||
are a smaller subset of what can be transmitted as a NFSv4 attribute. | operations are a smaller subset of what can be transmitted as an | |||
The mapping of NFSv3 to NFSv4 attributes is shown in Table 1. The | NFSv4 attribute. The mapping of NFSv3 to NFSv4 attributes is shown | |||
LAYOUT_WCC MUST provide all of these attributes to the metadata | in Table 1. The LAYOUT_WCC MUST provide all of these attributes to | |||
server. Both the uid and gid are stringified into their respective | the metadata server. Both the uid and gid are stringified into their | |||
attributes of owner and owner_group. The reason to provide these two | respective attributes of owner and owner_group. In the case of | |||
attributes is in case of NFS4ERR_ACCESS, the metadata server can | NFS4ERR_ACCESS, the reason to provide these two attributes is that | |||
compare what it expects the values of the uid and gid of the data | the metadata server can compare what it expects the values of the uid | |||
file to be versus the actual values. It can then repair the | and gid of the data file to be versus the actual values. It can then | |||
permissions as needed or modify the expected values it has cached. | repair the permissions as needed or modify the expected values it has | |||
cached. | ||||
+=================+===================+ | +=================+===================+ | |||
| NFSv3 Attribute | NFSv4.2 Attribute | | | NFSv3 Attribute | NFSv4.2 Attribute | | |||
+=================+===================+ | +=================+===================+ | |||
| size | size | | | size | size | | |||
+-----------------+-------------------+ | +-----------------+-------------------+ | |||
| used | space_used | | | used | space_used | | |||
+-----------------+-------------------+ | +-----------------+-------------------+ | |||
| mode | mode | | | mode | mode | | |||
+-----------------+-------------------+ | +-----------------+-------------------+ | |||
skipping to change at page 8, line 30 ¶ | skipping to change at line 330 ¶ | |||
| mtime | time_modify | | | mtime | time_modify | | |||
+-----------------+-------------------+ | +-----------------+-------------------+ | |||
| ctime | time_metadata | | | ctime | time_metadata | | |||
+-----------------+-------------------+ | +-----------------+-------------------+ | |||
Table 1: NFSv3 to NFSv4.2 Attribute | Table 1: NFSv3 to NFSv4.2 Attribute | |||
Mappings | Mappings | |||
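The mapping can be sketched as a lookup table plus a conversion function. The names NFS3_TO_NFS4 and map_wcc_attrs are hypothetical. The uid and gid entries follow the prose above. The atime entry is an assumption inferred from the pairing of atime with time_access in Section 2, because the corresponding table rows are not shown in this comparison. Stringifying the identifiers with str() is also an assumption: a real client may need to produce a name@domain form instead.

```python
# NFSv3 attribute name -> NFSv4.2 attribute name.
NFS3_TO_NFS4 = {
    "size": "size",
    "used": "space_used",
    "mode": "mode",
    "uid": "owner",            # stringified, per the prose above
    "gid": "owner_group",      # stringified, per the prose above
    "atime": "time_access",    # assumption: inferred from Section 2
    "mtime": "time_modify",
    "ctime": "time_metadata",
}


def map_wcc_attrs(fattr3: dict) -> dict:
    """Convert NFSv3 post-op attributes into NFSv4.2 attribute names."""
    missing = [name for name in NFS3_TO_NFS4 if name not in fattr3]
    if missing:
        # All mapped attributes are expected to be provided together.
        raise ValueError("missing NFSv3 attributes: " + ", ".join(missing))
    result = {}
    for v3_name, v4_name in NFS3_TO_NFS4.items():
        value = fattr3[v3_name]
        result[v4_name] = str(value) if v3_name in ("uid", "gid") else value
    return result
```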
3.5. Allowed Errors | 3.5. Allowed Errors | |||
The LAYOUT_WCC operation can raise the errors in Table 2. When an | The LAYOUT_WCC operation can raise the errors listed in Table 2. | |||
error is encountered, the metadata server can decide to ignore the | When an error is encountered, the metadata server can decide to | |||
entire operation or depending on the layout type specific payload, it | ignore the entire operation, or depending on the layout-type-specific | |||
could decide to apply a portion of the payload. Note that there are | payload, it could decide to apply a portion of the payload. Note | |||
no new errors introduced for the LAYOUT_WCC operation and the errors | that there are no new errors introduced for the LAYOUT_WCC operation | |||
in Table 2 are each defined in Section 15.1 of [RFC8881]. Table 2 | and the errors in Table 2 are each defined in Section 15.1 of | |||
can be considered as an extension of Section 15.2 of [RFC8881]. | [RFC8881]. Table 2 can be considered as an extension of Section 15.2 | |||
of [RFC8881]. | ||||
+============+====================================================+ | +============+====================================================+ | |||
| Operation | Errors | | | Operation | Errors | | |||
+============+====================================================+ | +============+====================================================+ | |||
| LAYOUT_WCC | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | | LAYOUT_WCC | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | |||
| | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | |||
| | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | |||
| | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | |||
| | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | | | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | |||
| | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | |||
skipping to change at page 9, line 27 ¶ | skipping to change at line 361 ¶ | |||
| | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | | | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | | |||
| | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | | | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | | |||
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | |||
| | NFS4ERR_WRONG_TYPE | | | | NFS4ERR_WRONG_TYPE | | |||
+------------+----------------------------------------------------+ | +------------+----------------------------------------------------+ | |||
Table 2: Operations and Their Valid Errors | Table 2: Operations and Their Valid Errors | |||
3.6. Extension of Existing Implementations | 3.6. Extension of Existing Implementations | |||
The new LAYOUT_WCC operation is OPTIONAL for both NFSv4.2 ([RFC7863]) | The new LAYOUT_WCC operation is OPTIONAL for both NFSv4.2 [RFC7863] | |||
and the flexible file layout type ([RFC8435]). | and the flexible file layout type [RFC8435]. | |||
3.7. Flex Files Layout Type | 3.7. Flexible File Layout Type | |||
<CODE BEGINS> | <CODE BEGINS> | |||
/// struct ff_data_server_wcc4 { | /// struct ff_data_server_wcc4 { | |||
/// deviceid4 ffdsw_deviceid; | /// deviceid4 ffdsw_deviceid; | |||
/// stateid4 ffdsw_stateid; | /// stateid4 ffdsw_stateid; | |||
/// nfs_fh4 ffdsw_fh_vers<>; | /// nfs_fh4 ffdsw_fh_vers<>; | |||
/// fattr4 ffdsw_attributes; | /// fattr4 ffdsw_attributes; | |||
/// }; | /// }; | |||
/// | /// | |||
/// struct ff_mirror_wcc4 { | /// struct ff_mirror_wcc4 { | |||
/// ff_data_server_wcc4 ffmw_data_servers<>; | /// ff_data_server_wcc4 ffmw_data_servers<>; | |||
/// }; | /// }; | |||
/// | /// | |||
/// struct ff_layout_wcc4 { | /// struct ff_layout_wcc4 { | |||
/// ff_mirror_wcc4 fflw_mirrors<>; | /// ff_mirror_wcc4 fflw_mirrors<>; | |||
/// }; | /// }; | |||
<CODE ENDS> | <CODE ENDS> | |||
The flex file layout type specific results MUST correspond to the | The results specific to the flexible file layout type MUST correspond | |||
ff_layout4 data structure as defined in Section 5.1 of [RFC8435]. | to the ff_layout4 data structure as defined in Section 5.1 of | |||
There MUST be a one-to-one correspondence between: | [RFC8435]. There MUST be a one-to-one correspondence between the | |||
| following: | |||
* ff_data_server4 -> ff_data_server_wcc4 | * ff_data_server4 -> ff_data_server_wcc4 | |||
* ff_mirror4 -> ff_mirror_wcc4 | * ff_mirror4 -> ff_mirror_wcc4 | |||
* ff_layout4 -> ff_layout_wcc4 | * ff_layout4 -> ff_layout_wcc4 | |||
Each ff_layout4 has an array of ff_mirror4, which have an array of | Each ff_layout4 has an array of ff_mirror4, which has an array of | |||
ff_data_server4. Based on the current filehandle and the | ff_data_server4. Based on the current filehandle and the | |||
lowa_stateid, the server can match the reported attributes. | lowa_stateid, the server can match the reported attributes. | |||
But the positional correspondence between the elements is not | But the positional correspondence between the elements is not | |||
sufficient to determine the attributes to update. Consider the case | sufficient to determine the attributes to update. Consider the case | |||
where a layout had three mirrors and two of them had updated | where a layout has three mirrors and two of them have updated | |||
attributes, but the third did not. A client could decide to present | attributes but the third does not. A client could decide to present | |||
all three mirrors, with one mirror having an attribute mask with no | all three mirrors, with one mirror having an attribute mask with no | |||
attributes present. Or it could decide to present only the two | attributes present. Or it could decide to present only the two | |||
mirrors which had been changed. | mirrors that had been changed. | |||
In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and | In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and | |||
ffdsw_fh_vers will uniquely identify the attributes to be updated. | ffdsw_fh_vers will uniquely identify the attributes to be updated. | |||
All three arguments are required. A layout might have multiple data | All three arguments are required. A layout might have multiple data | |||
files on the same storage device, in which case the ffdsw_deviceid | files on the same storage device, in which case the ffdsw_deviceid | |||
and ffdsw_stateid would match, but the ffdsw_fh_vers would not. | and ffdsw_stateid would match, but the ffdsw_fh_vers would not. | |||
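The lookup described above can be sketched in Python. This is an illustration only, not part of either document: the WccKey type, the apply_wcc function, the dictionary-based attribute store, and the sample identifiers are all hypothetical, and skipping unmatched entries while applying the rest is just one of the choices a metadata server could make.

```python
from typing import Dict, List, NamedTuple, Tuple


class WccKey(NamedTuple):
    """Hypothetical lookup key built from the three identifying fields."""
    deviceid: bytes             # ffdsw_deviceid
    stateid: bytes              # ffdsw_stateid
    fh_vers: Tuple[bytes, ...]  # ffdsw_fh_vers (array of filehandles)


def apply_wcc(tracked: Dict[WccKey, dict],
              reported: List[Tuple[WccKey, dict]]) -> List[WccKey]:
    """Merge reported attributes into the tracked data files.

    Returns the keys that matched no tracked data file. Skipping them
    while applying the rest is a policy assumed for this sketch.
    """
    unmatched = []
    for key, attrs in reported:
        entry = tracked.get(key)
        if entry is None:
            unmatched.append(key)
            continue
        entry.update(attrs)  # an empty attrs dict leaves the entry unchanged
    return unmatched


# Two data files on the same storage device: deviceid and stateid match,
# only the filehandle differs.
dev, sid = b"dev-1", b"stateid-1"
file_a = WccKey(dev, sid, (b"fh-A",))
file_b = WccKey(dev, sid, (b"fh-B",))
tracked = {file_a: {"size": 0}, file_b: {"size": 0}}

# The client reports both data files; the second has no attributes set.
unmatched = apply_wcc(tracked, [(file_a, {"size": 4096}), (file_b, {})])
```

Because the key includes the filehandle, the update for file_a does not touch file_b even though both share a device ID and stateid. The same lookup works whether the client presents every mirror or only the changed ones, since position in the array is never consulted.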
The ffdsw_attributes are processed similar to the obj_attributes in | The ffdsw_attributes are processed similar to the obj_attributes in | |||
the SETATTR arguments (See Section 18.34 of [RFC8881]). | the SETATTR arguments (see Section 18.30 of [RFC8881]). | |||
4. Extraction of XDR | 4. Extraction of XDR | |||
This document contains the external data representation (XDR) | This document contains the XDR [RFC4506] description of the new | |||
[RFC4506] description of the new open flags for delegating the file | NFSv4.2 operation LAYOUT_WCC. The XDR description is embedded in | |||
to the client. The XDR description is embedded in this document in a | this document in a way that makes it simple for the reader to extract | |||
way that makes it simple for the reader to extract into a ready-to- | into a ready-to-compile form. The reader can feed this document into | |||
compile form. The reader can feed this document into the following | the following shell script to produce the machine-readable XDR | |||
shell script to produce the machine-readable XDR description of the | description of the new NFSv4.2 operation LAYOUT_WCC. | |||
new flags: | ||||
<CODE BEGINS> | <CODE BEGINS> | |||
#!/bin/sh | #!/bin/sh | |||
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' | grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' | |||
<CODE ENDS> | <CODE ENDS> | |||
That is, if the above script is stored in a file called 'extract.sh', | That is, if the above script is stored in a file called 'extract.sh', | |||
and this document is in a file called 'spec.txt', then the reader can | and this document is in a file called 'spec.txt', then the reader can | |||
do: | do: | |||
<CODE BEGINS> | <CODE BEGINS> | |||
sh extract.sh < spec.txt > layout_wcc.x | sh extract.sh < spec.txt > layout_wcc.x | |||
<CODE ENDS> | <CODE ENDS> | |||
The effect of the script is to remove leading white space from each | The effect of the script is to remove leading blank space from each | |||
line, plus a sentinel sequence of '///'. XDR descriptions with the | line, plus a sentinel sequence of '///'. XDR descriptions with the | |||
sentinel sequence are embedded throughout the document. | sentinel sequence are embedded throughout the document. | |||
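For readers without a POSIX shell, the same three steps can be sketched in Python. This is an assumed equivalent of the grep/sed pipeline, not part of either document; the extract_xdr name and the sample input are hypothetical, and the function expects a plain-text copy of the specification rather than this side-by-side comparison.

```python
import re


def extract_xdr(text: str) -> str:
    """Keep only sentinel lines and strip the '///' prefix from each."""
    out = []
    for line in text.splitlines():
        # grep '^ *///' : keep lines that start with optional spaces and '///'
        if not re.match(r"^ *///", line):
            continue
        # sed 's?^ */// ??' : drop leading spaces, the sentinel, and one space
        line = re.sub(r"^ */// ", "", line, count=1)
        # sed 's?^ *///$??' : a bare sentinel becomes an empty line
        line = re.sub(r"^ *///$", "", line)
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")


sample = (
    "   /// struct ff_mirror_wcc4 {\n"
    "   ///     ff_data_server_wcc4    ffmw_data_servers<>;\n"
    "   /// };\n"
    "   ///\n"
    "This prose line is dropped.\n"
)
xdr = extract_xdr(sample)
```

Only the single space after the sentinel is removed, so the indentation inside the XDR structures is preserved in the output.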
Note that the XDR code contained in this document depends on types | Note that the XDR code contained in this document depends on types | |||
from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This | from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This | |||
includes both nfs types that end with a 4, such as offset4, length4, | includes both nfs types that end with a 4 (such as offset4 and | |||
etc., as well as more generic types such as uint32_t and uint64_t. | length4) as well as more generic types (such as uint32_t and | |||
| uint64_t). | |||
While the XDR can be appended to that from [RFC7863], the various | While the XDR can be appended to that from [RFC7863], the various | |||
code snippets belong in their respective areas of that XDR. | code snippets belong in their respective areas of that XDR. | |||
4.1. Code Components Licensing Notice | ||||
Both the XDR description and the scripts used for extracting the XDR | ||||
description are Code Components as described in Section 4 of 'Legal | ||||
Provisions Relating to IETF Documents' [LEGAL]. These Code | ||||
Components are licensed according to the terms of that document. | ||||
5. Security Considerations | 5. Security Considerations | |||
There are no new security considerations beyond those in [RFC8435]. | There are no new security considerations beyond those in [RFC8435]. | |||
6. IANA Considerations | 6. IANA Considerations | |||
This section is to be removed before publishing as an RFC. | This document has no IANA actions. | |||
There are no IANA considerations for this document. | ||||
7. References | 7. References | |||
7.1. Normative References | 7.1. Normative References | |||
[I-D.ietf-nfsv4-delstid] | ||||
Haynes, T. and T. Myklebust, "Extending the Opening of | ||||
Files in NFSv4.2", Work in Progress, Internet-Draft, | ||||
draft-ietf-nfsv4-delstid-08, 2 October 2024, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-nfsv4- | ||||
delstid-08>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC4506] Eisler, M., Ed., "XDR: External Data Representation | [RFC4506] Eisler, M., Ed., "XDR: External Data Representation | |||
Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May | Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May | |||
2006, <https://www.rfc-editor.org/info/rfc4506>. | 2006, <https://www.rfc-editor.org/info/rfc4506>. | |||
[RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | |||
skipping to change at page 12, line 39 ¶ | skipping to change at line 501 ¶ | |||
[RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | |||
File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | |||
<https://www.rfc-editor.org/info/rfc8435>. | <https://www.rfc-editor.org/info/rfc8435>. | |||
[RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | |||
Version 4 Minor Version 1 Protocol", RFC 8881, | Version 4 Minor Version 1 Protocol", RFC 8881, | |||
DOI 10.17487/RFC8881, August 2020, | DOI 10.17487/RFC8881, August 2020, | |||
<https://www.rfc-editor.org/info/rfc8881>. | <https://www.rfc-editor.org/info/rfc8881>. | |||
7.2. Informative References | [RFC9754] Haynes, T. and T. Myklebust, "Extensions for Opening and | |||
| Delegating Files in NFSv4.2", RFC 9754, | |||
| DOI 10.17487/RFC9754, March 2025, | |||
| <https://www.rfc-editor.org/info/rfc9754>. | |||
[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents", | 7.2. Informative References | |||
November 2008, <http://trustee.ietf.org/docs/IETF-Trust- | ||||
License-Policy.pdf>. | ||||
[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | |||
Version 3 Protocol Specification", RFC 1813, | Version 3 Protocol Specification", RFC 1813, | |||
DOI 10.17487/RFC1813, June 1995, | DOI 10.17487/RFC1813, June 1995, | |||
<https://www.rfc-editor.org/info/rfc1813>. | <https://www.rfc-editor.org/info/rfc1813>. | |||
Appendix A. Acknowledgments | Acknowledgments | |||
Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of | Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of | |||
the document. | the document. | |||
Authors' Addresses | Authors' Addresses | |||
Thomas Haynes | Thomas Haynes | |||
Hammerspace | Hammerspace | |||
Email: loghyr@gmail.com | Email: loghyr@gmail.com | |||
End of changes. 49 change blocks. | ||||
202 lines changed or deleted | 184 lines changed or added | |||