Prepare DLHUB training dataset

DLHUB currently supports three file formats for the training dataset: Classified Image Folder, Feature vs Category (csv or txt), and Image Map File (csv or txt).


To find this information, launch DLHUB and select the "?" button next to the Detected Data Type (1) to open the Training Data File Help dialog.

Click the Up/Down arrow buttons next to File Format to see details about each supported format and how to prepare a correct training dataset.


Click Download Example to get some example training datasets.


Click Generate Sample File/Folders to download the dataset folder structure for the selected file format.

File Type 1: Classified Image Folder

You can organize your images into categorized folders, where each folder defines the output for the images it contains.


Create a parent folder (any name) containing classified sub-folders, each named after an output class. Each classified folder contains the images of that class.


For example, to do image classification for Avengers, you would have a parent folder called Train that contains the sub-folders Spiderman, Superman, and Wonderwoman.

In each sub-folder, place the images that belong to that character.

In other words, the inputs of the training dataset are these images, and the output for each image is the name of its folder.


To successfully load this type of data into DLHUB, browse to and open the parent folder, then click Current Folder.
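Before loading the parent folder, it can help to check that every image sits in the sub-folder you expect. The following is a minimal sketch, not part of DLHUB: the function names and the set of image extensions are assumptions made for illustration, and the image types DLHUB actually accepts may differ.

```python
from pathlib import Path

# Assumed set of image extensions for this sketch; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_classified_images(parent_folder):
    """Return sorted (image_path, label) pairs; the label is the sub-folder name."""
    pairs = []
    for class_dir in sorted(Path(parent_folder).iterdir()):
        if not class_dir.is_dir():
            continue  # files placed directly in the parent folder have no label
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((str(image), class_dir.name))
    return pairs


def count_per_class(pairs):
    """Count how many images were found for each label."""
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling count_per_class(list_classified_images("Train")) on the example above would report one entry per character folder, which makes empty or misnamed sub-folders easy to spot.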

File Type 2: FEATURE vs CATEGORY (csv or txt)

This is the standard format where you list your data as columns, including:

  • Column of labels (output)
  • Column of features (input)


Here is a simple example that has 4 outputs and 8 inputs. Each output is labeled with a one-hot vector:

Output 1 is labeled as 1 0 0 0

Output 2 is labeled as 0 1 0 0

Output 3 is labeled as 0 0 1 0

Output 4 is labeled as 0 0 0 1


Each row defines the classified output with its corresponding inputs (features). 


This example shows only one training sample for each output; in a real-world application you would have many more samples per output.

Below is a screenshot of the MNIST training dataset, which contains 60,000 training samples.

This dataset is stored as a .txt file. To load it into DLHUB, browse to the file location and open it.
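The one-hot labeling described above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and the column order (labels first, then features) and the Tab delimiter are assumptions, so compare the result with the file produced by Generate Sample File/Folders before using it.

```python
def one_hot(class_index, num_classes):
    """Return a one-hot label, e.g. class 0 of 4 becomes [1, 0, 0, 0]."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in range(num_classes)")
    return [1 if i == class_index else 0 for i in range(num_classes)]


def format_row(class_index, num_classes, features, delimiter="\t"):
    """Build one row: label columns followed by feature columns (assumed order)."""
    values = one_hot(class_index, num_classes) + list(features)
    return delimiter.join(str(v) for v in values)


def write_dataset(path, samples, num_classes, delimiter="\t"):
    """Write one row per (class_index, features) sample and return the row count."""
    count = 0
    with open(path, "w", newline="") as handle:
        for class_index, features in samples:
            handle.write(format_row(class_index, num_classes, features, delimiter) + "\n")
            count += 1
    return count
```

For instance, format_row(1, 4, [5, 6], " ") yields the row "0 1 0 0 5 6", which corresponds to Output 2 followed by two feature values.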

File Type 3: IMAGE MAP FILE (csv or txt)

This is a text file (map file) that lists image paths and their classified outputs, separated by a Tab.


Make sure each image path in the map file points to an actual image file.


The first column lists the image paths, and the second column lists the classified output for each image.

To load this type of data into DLHUB, simply load the map file.
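If your images are already organized as in File Type 1, a map file can be generated from that folder layout. The sketch below is an illustration under stated assumptions rather than a DLHUB utility: the function names and extension list are hypothetical, absolute paths are written, and the folder name is used as the classified output. If DLHUB expects a different output encoding, adjust the second column accordingly.

```python
from pathlib import Path

# Assumed set of image extensions for this sketch; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def build_map_lines(parent_folder):
    """Return 'image_path<Tab>label' lines, one per image in each class sub-folder."""
    lines = []
    for class_dir in sorted(Path(parent_folder).iterdir()):
        if not class_dir.is_dir():
            continue
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                lines.append(f"{image.resolve()}\t{class_dir.name}")
    return lines


def write_map_file(parent_folder, map_path):
    """Write the map file and return the number of rows written."""
    lines = build_map_lines(parent_folder)
    Path(map_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def find_missing_images(map_path):
    """Return the image paths in the map file that do not point to an existing file."""
    missing = []
    for line in Path(map_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        image_path, _, _label = line.partition("\t")
        if not Path(image_path).is_file():
            missing.append(image_path)
    return missing
```

Running find_missing_images on the map file before loading it is a quick way to confirm that every listed path points to an actual image file.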