TensorFlow Datasets

A common way to flow data into your network for training is TensorFlow Datasets.

The goal behind TensorFlow Datasets (TFDS) is to expose datasets in a way that’s easy to consume, where all the preprocessing steps of acquiring the data and getting it into TensorFlow-friendly APIs are done for you.

TensorFlow Datasets builds on this idea, but greatly expands not only the number of datasets available but also the diversity of dataset types. The list of available datasets is growing all the time and includes pictures, text, audio, video and more. Check out https://www.tensorflow.org/datasets to see some of the datasets.

TensorFlow Datasets is a separate install from TensorFlow, so be sure to install it before trying out any samples! If you are using Google Colab, it’s preinstalled.

If you need to install it, you can do so with a pip command:

pip install tensorflow-datasets
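If you are not sure whether the package is already present, a quick check like the following can help. This is a minimal sketch using only the Python standard library; the helper name is_installed is made up for illustration. Note that the pip package is named tensorflow-datasets, while the module you import is tensorflow_datasets.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# The pip package uses a hyphen; the importable module uses an underscore.
if is_installed("tensorflow_datasets"):
    print("tensorflow_datasets is available")
else:
    print("tensorflow_datasets not found; run: pip install tensorflow-datasets")
```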

Once it’s installed, you can use it to get access to a dataset with tfds.load, passing it the name of the desired dataset. For example, if you want to use Fashion MNIST, you can use code like this:

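import tensorflow as tf
import tensorflow_datasets as tfds

# With no split argument, tfds.load returns a dictionary keyed by split name
mnist_data = tfds.load("fashion_mnist")

# Iterating over the dictionary prints the names of the available splits
for item in mnist_data:
    print(item)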

Two very important concepts to learn with TensorFlow Datasets are the Splits API, which gives you a flexible way of splitting data into training, testing, and validation sets, and mapping functions, which allow you to do things like augmentation. For example, you saw how to do image augmentation on a generator, but if you’re no longer using generators you’ll need an alternative. TFDS makes this simple with mapping functions.
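As a rough illustration of the Splits API, the sketch below carves a validation set out of the ‘train’ split using percentage slice strings. The helper names percent_slices and load_train_and_validation, the 80/20 boundary, and the choice of Fashion MNIST are assumptions made for this example; only tfds.load and its split and as_supervised arguments come from TFDS itself.

```python
def percent_slices(split, boundary):
    """Return two TFDS split strings that cut `split` at `boundary` percent."""
    if not 0 < boundary < 100:
        raise ValueError("boundary must be between 0 and 100 (exclusive)")
    return f"{split}[:{boundary}%]", f"{split}[{boundary}%:]"


def load_train_and_validation(name="fashion_mnist", boundary=80):
    """Load two non-overlapping slices of the 'train' split (hypothetical helper)."""
    # Imported here so percent_slices can be used without TFDS installed.
    import tensorflow_datasets as tfds

    train_spec, val_spec = percent_slices("train", boundary)
    # Passing a list of split strings returns a list of datasets in the same order.
    train_ds, val_ds = tfds.load(
        name, split=[train_spec, val_spec], as_supervised=True
    )
    return train_ds, val_ds


print(percent_slices("train", 80))  # ('train[:80%]', 'train[80%:]')
```

Calling load_train_and_validation() downloads the dataset on first use, so it is left uncalled here. Split strings can also be combined with a plus sign, for example 'train+test', to merge splits into one dataset.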

Here’s a mapping function as an example:

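def augmentimages(image, label):
    # Convert pixel values to floats and scale them to the [0, 1] range
    image = tf.cast(image, tf.float32)
    image = image / 255
    # Randomly flip each image horizontally
    image = tf.image.random_flip_left_right(image)
    return image, label

# as_supervised=True yields (image, label) tuples, matching the function's arguments
data = tfds.load('horses_or_humans', split='train', as_supervised=True)
train = data.map(augmentimages)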

In this case, tfds.load is used to get the Horses or Humans dataset. It’s pre-split into a ‘train’ subset with the training data, so you can request that. Then, once you have the data, you can call its ‘map’ method, passing it a function like ‘augmentimages’ as shown, and from within there you can do your image augmentation.
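To see what a mapped dataset yields without downloading anything, the same function can be tried on synthetic data. This is a minimal sketch, not part of the original example: the random 32x32 images, the labels, and the batch and prefetch calls are assumptions added for illustration, with tf.data.Dataset.from_tensor_slices standing in for the dataset that tfds.load would return.

```python
import tensorflow as tf


def augmentimages(image, label):
    # Same steps as the mapping function above: cast, scale, random flip.
    image = tf.cast(image, tf.float32)
    image = image / 255
    image = tf.image.random_flip_left_right(image)
    return image, label


# Synthetic stand-in for a TFDS split: 8 random uint8 "images" with labels.
images = tf.cast(
    tf.random.uniform((8, 32, 32, 3), maxval=256, dtype=tf.int32), tf.uint8
)
labels = tf.constant([0, 1, 0, 1, 0, 1, 0, 1], dtype=tf.int64)
data = tf.data.Dataset.from_tensor_slices((images, labels))

# map applies the function to each (image, label) element lazily, so the
# random flip is re-drawn every time the dataset is iterated.
train = (
    data.map(augmentimages, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for image_batch, label_batch in train.take(1):
    print(image_batch.shape, image_batch.dtype, label_batch.numpy())
```

Because the mapping runs as part of the input pipeline, the augmented images are never stored; each epoch sees a freshly flipped variant of the data.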

There’s a lot to learn with TFDS, and it’s a really powerful API. Visit https://www.tensorflow.org/datasets to go deeper!