
Commit

Docs maintenance (#3999)
* ✨ doc maintenance

* πŸ–οΈ apply feedback

* πŸ–οΈ use the shorthand syntax
stevhliu committed Mar 30, 2022
1 parent d7a3a76 commit bfb3d09
Showing 11 changed files with 14 additions and 11 deletions.
4 changes: 2 additions & 2 deletions docs/source/access.mdx
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ A [`Dataset`] object is returned when you load an instance of a dataset. This ob

## Metadata

The [`Dataset`] object contains a lot of useful information about your dataset. For example, call [`dataset.info`] to return a short description of the dataset, the authors, and even the dataset size. This will give you a quick snapshot of the dataset's most important attributes.
The [`Dataset`] object contains a lot of useful information about your dataset. For example, access [`DatasetInfo`] to return a short description of the dataset, the authors, and even the dataset size. This will give you a quick snapshot of the dataset's most important attributes.

```py
>>> dataset.info
```

@@ -73,7 +73,7 @@ List the column names with [`Dataset.column_names`]:

```py
['idx', 'label', 'sentence1', 'sentence2']
```

Get detailed information about the columns with [`Dataset.features`]:
Get detailed information about the columns with [`~datasets.Features`]:

```py
>>> dataset.features
```
2 changes: 1 addition & 1 deletion docs/source/dataset_script.mdx
@@ -157,7 +157,7 @@ class SuperGlue(datasets.GeneratorBasedBuilder):

### Default configurations

Users must specify a configuration name when they load a dataset with multiple configurations. Otherwise, πŸ€— Datasets will raise a `ValueError`, and prompt the user to select a configuration name. You can avoid this by setting a default dataset configuration with the ['datasets.DatasetBuilder.DEFAULT_CONFIG_NAME'] attribute:
Users must specify a configuration name when they load a dataset with multiple configurations. Otherwise, πŸ€— Datasets will raise a `ValueError`, and prompt the user to select a configuration name. You can avoid this by setting a default dataset configuration with the `DEFAULT_CONFIG_NAME` attribute:

```py
class NewDataset(datasets.GeneratorBasedBuilder):
```
2 changes: 1 addition & 1 deletion docs/source/filesystems.mdx
@@ -99,7 +99,7 @@ Save your dataset with `botocore.session.Session` and a custom AWS profile:

## Loading datasets

When you are ready to use your dataset again, reload it with `datasets.load_from_disk`:
When you are ready to use your dataset again, reload it with [`Dataset.load_from_disk`]:

```py
>>> from datasets import load_from_disk
```
2 changes: 1 addition & 1 deletion docs/source/load_hub.mdx
@@ -34,7 +34,7 @@ Once you are happy with the dataset you want, load it in a single line with [`lo

Some datasets, like the [General Language Understanding Evaluation (GLUE)](https://huggingface.co/datasets/glue) benchmark, are actually made up of several datasets. These sub-datasets are called **configurations**, and you must explicitly select one when you load the dataset. If you don't provide a configuration name, πŸ€— Datasets will raise a `ValueError` and remind you to select a configuration.

Use `get_dataset_config_names` to retrieve a list of all the possible configurations available to your dataset:
Use the [`get_dataset_config_names`] function to retrieve a list of all the possible configurations available to your dataset:

```py
from datasets import get_dataset_config_names
```
2 changes: 1 addition & 1 deletion docs/source/loading.mdx
@@ -383,7 +383,7 @@ See the [Metrics](./how_to_metrics#custom-metric-loading-script) guide for more

### Load configurations

It is possible for a metric to have different configurations. The configurations are stored in the ['datasets.Metric.config_name'] attribute. When you load a metric, provide the configuration name as shown in the following:
It is possible for a metric to have different configurations. The configurations are stored in the `config_name` attribute of [`MetricInfo`]. When you load a metric, provide the configuration name as shown in the following:

```
>>> from datasets import load_metric
```
4 changes: 2 additions & 2 deletions docs/source/metrics.mdx
@@ -34,7 +34,7 @@ If you are using a benchmark dataset, you need to select a metric that is associ

## Metrics object

Before you begin using a [`Metric`] object, you should get to know it a little better. As with a dataset, you can return some basic information about a metric. For example, use `Metric.inputs_descriptions` to get more information about a metric's expected input format and some usage examples:
Before you begin using a [`Metric`] object, you should get to know it a little better. As with a dataset, you can return some basic information about a metric. For example, access the `inputs_description` parameter in [`datasets.MetricInfo`] to get more information about a metric's expected input format and some usage examples:

```py
>>> print(metric.inputs_description)
```

@@ -71,7 +71,7 @@ Notice for the MRPC configuration, the metric expects the input format to be zer

## Compute metric

Once you have loaded a metric, you are ready to use it to evaluate a model's predictions. Provide the model predictions and references to `Metric.compute`:
Once you have loaded a metric, you are ready to use it to evaluate a model's predictions. Provide the model predictions and references to [`~datasets.Metric.compute`]:

```py
>>> model_predictions = model(model_inputs)
```
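The snippet above is truncated. For an accuracy-style metric, `compute` boils down to this arithmetic (plain Python with toy stand-ins for real model outputs, shown here so no model call is needed):

```python
# Stand-ins for real model predictions and gold labels.
model_predictions = [0, 1, 1, 0]
references = [0, 1, 0, 0]

# Accuracy: fraction of predictions that match the references.
accuracy = sum(p == r for p, r in zip(model_predictions, references)) / len(references)
print(accuracy)  # 0.75
```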
2 changes: 2 additions & 0 deletions docs/source/package_reference/main_classes.mdx
@@ -150,8 +150,10 @@ The base class [`IterableDataset`] implements an iterable Dataset backed by pyth
[[autodoc]] datasets.IterableDataset
- remove_columns
- cast_column
- cast
- __iter__
- map
- rename_column
- filter
- shuffle
- skip
2 changes: 1 addition & 1 deletion docs/source/process.mdx
@@ -322,7 +322,7 @@ You can also use [`Dataset.map`] with indices if you set `with_indices=True`. Th
]
```

You can also use [`Dataset.map`] with the rank of the process if you set `with_rank=True`. This is analogous to `with_indices`. The `rank` argument in the mapped function goes after the `index` one if it is already present. The main use-case for it is to parallelize your computation across several GPUs. This requires setting *multiprocess.set_start_method("spawn")*, without which you will receive a CUDA error: *RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method*.
You can also use [`Dataset.map`] with the rank of the process if you set `with_rank=True`. This is analogous to `with_indices`. The `rank` argument in the mapped function goes after the `index` one if it is already present. The main use-case for it is to parallelize your computation across several GPUs. This requires setting `multiprocess.set_start_method("spawn")`, without which you will receive a CUDA error: `RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method`.


```py
```
2 changes: 1 addition & 1 deletion docs/source/repository_structure.mdx
@@ -3,7 +3,7 @@
To host and share your dataset, you can create a dataset repository on the Hugging Face Dataset Hub and upload your data files.

This guide will show you how to structure your dataset repository when you upload it.
A dataset with a supported structure can be loaded automatically with `load_dataset`, and it will have a preview on its dataset page on the Hub.
A dataset with a supported structure can be loaded automatically with [`~datasets.load_dataset`], and it will have a preview on its dataset page on the Hub.

<Tip>

2 changes: 1 addition & 1 deletion docs/source/stream.mdx
@@ -165,7 +165,7 @@ Casting only works if the original feature type and new feature type are compati

</Tip>

Use [`Dataset.cast_column`] to change the feature type of just one column. Pass the column name and its new feature type as arguments:
Use [`IterableDataset.cast_column`] to change the feature type of just one column. Pass the column name and its new feature type as arguments:

```py
>>> dataset.features
```
1 change: 1 addition & 0 deletions docs/source/use_dataset.mdx
@@ -84,6 +84,7 @@ After you set the format, wrap the dataset with `torch.utils.data.DataLoader`. Y

If you are using TensorFlow, you can use [`Dataset.to_tf_dataset`] to wrap the dataset with a **tf.data.Dataset**, which is natively understood by Keras.
This means a **tf.data.Dataset** object can be iterated over to yield batches of data, and can be passed directly to methods like **model.fit()**.

[`Dataset.to_tf_dataset`] accepts several arguments:

1. `columns` specifies which columns should be formatted (includes the inputs and labels).

1 comment on commit bfb3d09

@github-actions



PyArrow==5.0.0


Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.012457 / 0.011353 (0.001104) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004979 / 0.011008 (-0.006029) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.038140 / 0.038508 (-0.000368) |
| read_batch_unformated after write_array2d | 0.036441 / 0.023109 (0.013332) |
| read_batch_unformated after write_flattened_sequence | 0.381759 / 0.275898 (0.105861) |
| read_batch_unformated after write_nested_sequence | 0.384204 / 0.323480 (0.060724) |
| read_col_formatted_as_numpy after write_array2d | 0.010549 / 0.007986 (0.002564) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005230 / 0.004328 (0.000901) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010160 / 0.004250 (0.005910) |
| read_col_unformated after write_array2d | 0.042544 / 0.037052 (0.005491) |
| read_col_unformated after write_flattened_sequence | 0.354605 / 0.258489 (0.096116) |
| read_col_unformated after write_nested_sequence | 0.404544 / 0.293841 (0.110703) |
| read_formatted_as_numpy after write_array2d | 0.046306 / 0.128546 (-0.082240) |
| read_formatted_as_numpy after write_flattened_sequence | 0.015473 / 0.075646 (-0.060174) |
| read_formatted_as_numpy after write_nested_sequence | 0.310613 / 0.419271 (-0.108658) |
| read_unformated after write_array2d | 0.063740 / 0.043533 (0.020208) |
| read_unformated after write_flattened_sequence | 0.364119 / 0.255139 (0.108980) |
| read_unformated after write_nested_sequence | 0.405363 / 0.283200 (0.122163) |
| write_array2d | 0.110746 / 0.141683 (-0.030937) |
| write_flattened_sequence | 2.120290 / 1.452155 (0.668135) |
| write_nested_sequence | 2.222590 / 1.492716 (0.729874) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
| --- | --- |
| get_batch_of_1024_random_rows | 0.310647 / 0.018006 (0.292641) |
| get_batch_of_1024_rows | 0.499198 / 0.000490 (0.498708) |
| get_first_row | 0.040046 / 0.000200 (0.039846) |
| get_last_row | 0.000856 / 0.000054 (0.000802) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.028820 / 0.037411 (-0.008592) |
| shard | 0.116194 / 0.014526 (0.101668) |
| shuffle | 0.117521 / 0.176557 (-0.059035) |
| sort | 0.177797 / 0.737135 (-0.559338) |
| train_test_split | 0.115116 / 0.296338 (-0.181223) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.619095 / 0.215209 (0.403886) |
| read 50000 | 6.069413 / 2.077655 (3.991758) |
| read_batch 50000 10 | 2.295033 / 1.504120 (0.790913) |
| read_batch 50000 100 | 1.979819 / 1.541195 (0.438624) |
| read_batch 50000 1000 | 2.025178 / 1.468490 (0.556687) |
| read_formatted numpy 5000 | 0.775846 / 4.584777 (-3.808931) |
| read_formatted pandas 5000 | 6.321369 / 3.745712 (2.575657) |
| read_formatted tensorflow 5000 | 3.202163 / 5.269862 (-2.067699) |
| read_formatted torch 5000 | 1.539173 / 4.565676 (-3.026504) |
| read_formatted_batch numpy 5000 10 | 0.091141 / 0.424275 (-0.333134) |
| read_formatted_batch numpy 5000 1000 | 0.015016 / 0.007607 (0.007409) |
| shuffled read 5000 | 0.775332 / 0.226044 (0.549288) |
| shuffled read 50000 | 7.866400 / 2.268929 (5.597472) |
| shuffled read_batch 50000 10 | 3.090199 / 55.444624 (-52.354425) |
| shuffled read_batch 50000 100 | 2.344451 / 6.876477 (-4.532026) |
| shuffled read_batch 50000 1000 | 2.408549 / 2.142072 (0.266477) |
| shuffled read_formatted numpy 5000 | 0.952851 / 4.805227 (-3.852377) |
| shuffled read_formatted_batch numpy 5000 10 | 0.188744 / 6.500664 (-6.311920) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.074212 / 0.075469 (-0.001257) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 1.993420 / 1.841788 (0.151632) |
| map fast-tokenizer batched | 16.292291 / 8.074308 (8.217983) |
| map identity | 44.430821 / 10.191392 (34.239429) |
| map identity batched | 1.089381 / 0.680424 (0.408957) |
| map no-op batched | 0.640221 / 0.534201 (0.106020) |
| map no-op batched numpy | 0.622160 / 0.579283 (0.042877) |
| map no-op batched pandas | 0.688170 / 0.434364 (0.253807) |
| map no-op batched pytorch | 0.399084 / 0.540337 (-0.141254) |
| map no-op batched tensorflow | 0.417387 / 1.386936 (-0.969549) |
PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.011510 / 0.011353 (0.000158) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005099 / 0.011008 (-0.005909) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.039140 / 0.038508 (0.000632) |
| read_batch_unformated after write_array2d | 0.038551 / 0.023109 (0.015442) |
| read_batch_unformated after write_flattened_sequence | 0.375323 / 0.275898 (0.099425) |
| read_batch_unformated after write_nested_sequence | 0.413009 / 0.323480 (0.089529) |
| read_col_formatted_as_numpy after write_array2d | 0.007600 / 0.007986 (-0.000386) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004204 / 0.004328 (-0.000125) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007726 / 0.004250 (0.003475) |
| read_col_unformated after write_array2d | 0.042863 / 0.037052 (0.005810) |
| read_col_unformated after write_flattened_sequence | 0.401445 / 0.258489 (0.142956) |
| read_col_unformated after write_nested_sequence | 0.420656 / 0.293841 (0.126815) |
| read_formatted_as_numpy after write_array2d | 0.052176 / 0.128546 (-0.076371) |
| read_formatted_as_numpy after write_flattened_sequence | 0.014797 / 0.075646 (-0.060849) |
| read_formatted_as_numpy after write_nested_sequence | 0.323009 / 0.419271 (-0.096263) |
| read_unformated after write_array2d | 0.070745 / 0.043533 (0.027212) |
| read_unformated after write_flattened_sequence | 0.394936 / 0.255139 (0.139797) |
| read_unformated after write_nested_sequence | 0.435609 / 0.283200 (0.152409) |
| write_array2d | 0.103147 / 0.141683 (-0.038536) |
| write_flattened_sequence | 2.231597 / 1.452155 (0.779442) |
| write_nested_sequence | 2.271154 / 1.492716 (0.778438) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
| --- | --- |
| get_batch_of_1024_random_rows | 0.282262 / 0.018006 (0.264256) |
| get_batch_of_1024_rows | 0.490697 / 0.000490 (0.490207) |
| get_first_row | 0.016967 / 0.000200 (0.016767) |
| get_last_row | 0.000296 / 0.000054 (0.000242) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.027440 / 0.037411 (-0.009971) |
| shard | 0.121945 / 0.014526 (0.107419) |
| shuffle | 0.128258 / 0.176557 (-0.048298) |
| sort | 0.167555 / 0.737135 (-0.569581) |
| train_test_split | 0.127547 / 0.296338 (-0.168791) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.581800 / 0.215209 (0.366591) |
| read 50000 | 5.900716 / 2.077655 (3.823061) |
| read_batch 50000 10 | 2.215627 / 1.504120 (0.711507) |
| read_batch 50000 100 | 1.883143 / 1.541195 (0.341948) |
| read_batch 50000 1000 | 1.934968 / 1.468490 (0.466478) |
| read_formatted numpy 5000 | 0.729405 / 4.584777 (-3.855372) |
| read_formatted pandas 5000 | 6.400845 / 3.745712 (2.655133) |
| read_formatted tensorflow 5000 | 4.914529 / 5.269862 (-0.355332) |
| read_formatted torch 5000 | 1.598092 / 4.565676 (-2.967584) |
| read_formatted_batch numpy 5000 10 | 0.085621 / 0.424275 (-0.338654) |
| read_formatted_batch numpy 5000 1000 | 0.013807 / 0.007607 (0.006200) |
| shuffled read 5000 | 0.758021 / 0.226044 (0.531977) |
| shuffled read 50000 | 7.539040 / 2.268929 (5.270111) |
| shuffled read_batch 50000 10 | 2.915067 / 55.444624 (-52.529557) |
| shuffled read_batch 50000 100 | 2.186172 / 6.876477 (-4.690305) |
| shuffled read_batch 50000 1000 | 2.349870 / 2.142072 (0.207797) |
| shuffled read_formatted numpy 5000 | 0.953444 / 4.805227 (-3.851783) |
| shuffled read_formatted_batch numpy 5000 10 | 0.191821 / 6.500664 (-6.308843) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.073366 / 0.075469 (-0.002103) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 2.051319 / 1.841788 (0.209531) |
| map fast-tokenizer batched | 16.289497 / 8.074308 (8.215189) |
| map identity | 43.079665 / 10.191392 (32.888273) |
| map identity batched | 1.085309 / 0.680424 (0.404885) |
| map no-op batched | 0.651766 / 0.534201 (0.117565) |
| map no-op batched numpy | 0.604061 / 0.579283 (0.024778) |
| map no-op batched pandas | 0.684142 / 0.434364 (0.249778) |
| map no-op batched pytorch | 0.397893 / 0.540337 (-0.142444) |
| map no-op batched tensorflow | 0.424764 / 1.386936 (-0.962172) |

