Dataset
There are many different interpretations of what a dataset is. Some people call the feature set a dataset, others call the result set a dataset, and still others use the term for the feature set and result set together. So this application needs to declare what a dataset is.
Dluid defines a dataset as a bundle of a feature set and a result set. The feature set and the result set are each a record set.
DataSet : FeatureSet + ResultSet
FeatureSet is RecordSet
ResultSet is RecordSet
RecordSet is a set of Records
Record is one row in a table
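
The definitions above could be modeled roughly as follows. This is a minimal Python sketch for illustration only, not Dluid's actual code; the class names mirror the terms above, while the field names (`columns`, `records`, `features`, `results`) and the row-count check are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A Record is one row in a table (assumed here to be a tuple of numbers).
Record = Tuple[float, ...]


@dataclass
class RecordSet:
    """A set of records that share the same columns."""
    columns: List[str]
    records: List[Record] = field(default_factory=list)


class FeatureSet(RecordSet):
    """FeatureSet is a RecordSet holding the input columns."""


class ResultSet(RecordSet):
    """ResultSet is a RecordSet holding the output columns."""


@dataclass
class DataSet:
    """DataSet : FeatureSet + ResultSet."""
    features: FeatureSet
    results: ResultSet

    def __post_init__(self) -> None:
        # Assumption: both sets must describe the same rows.
        if len(self.features.records) != len(self.results.records):
            raise ValueError("feature set and result set need the same number of records")


# Hypothetical usage with a two-row dataset.
example = DataSet(
    features=FeatureSet(columns=["a", "b"], records=[(0, 0), (0, 1)]),
    results=ResultSet(columns=["a xor b"], records=[(0,), (1,)]),
)
```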
For example, consider the XOR data set below.
a | b | a xor b |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
In this case, the feature set is:
a | b |
---|---|
0 | 0 |
0 | 1 |
1 | 0 |
1 | 1 |
And the result set is:
a xor b |
---|
0 |
1 |
1 |
0 |
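
The split shown above can be sketched as a simple column selection. This is an illustrative Python sketch, not Dluid's implementation; the helper name `split_columns` and the tuple-based row layout are assumptions.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def split_columns(
    header: Sequence[str],
    rows: Sequence[Row],
    result_columns: Sequence[str],
) -> Tuple[List[Row], List[Row]]:
    """Split each row into (feature part, result part) by column name."""
    result_idx = [i for i, name in enumerate(header) if name in result_columns]
    feature_idx = [i for i, name in enumerate(header) if name not in result_columns]
    features = [tuple(row[i] for i in feature_idx) for row in rows]
    results = [tuple(row[i] for i in result_idx) for row in rows]
    return features, results


# The XOR data set from the table above.
xor_header = ["a", "b", "a xor b"]
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

xor_features, xor_results = split_columns(xor_header, xor_rows, ["a xor b"])
```

Both lists keep the original row order, so the i-th feature record still lines up with the i-th result record.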
And the records are:
a | b | a xor b |
---|---|---|
0 | 0 | 0 |
a | b | a xor b |
---|---|---|
0 | 1 | 1 |
a | b | a xor b |
---|---|---|
1 | 0 | 1 |
a | b | a xor b |
---|---|---|
1 | 1 | 0 |
A record set is a collection of records, so the data set, the feature set and the result set are all subtypes of record set.
* Origin of the sample data sources

|dataset|link|
|:---:|:---|
|iris|https://github.com/deeplearning4j/dl4j-examples/tree/master/dl4j-examples/src/main/resources|
|housing|https://github.com/chendaniely/pandas_for_everyone/tree/master/data|
|wine|https://github.com/chendaniely/pandas_for_everyone/tree/master/data|
|stock|https://github.com/chendaniely/pandas_for_everyone/tree/master/data|
|mnist|dataset in dl4j|
|Stock train data|extracted date range: 2014-07-24 to 2016-08-03|
|Stock test data|extracted date range: 2016-08-04 to 2016-08-25|
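
The stock train/test date ranges in the table could be applied with a filter like the one below. This is only a sketch under assumptions: the row layout (a date followed by one value), the helper name `split_by_date`, and the inclusive boundaries are hypothetical, and the sample values are placeholders rather than real stock data.

```python
from datetime import date
from typing import List, Sequence, Tuple

StockRow = Tuple[date, float]  # assumed layout: (trading date, placeholder value)

# Date ranges taken from the table above (treated as inclusive here).
TRAIN_START, TRAIN_END = date(2014, 7, 24), date(2016, 8, 3)
TEST_START, TEST_END = date(2016, 8, 4), date(2016, 8, 25)


def split_by_date(rows: Sequence[StockRow]) -> Tuple[List[StockRow], List[StockRow]]:
    """Return (train rows, test rows); rows outside both ranges are dropped."""
    train = [r for r in rows if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[0] <= TEST_END]
    return train, test


# Placeholder rows, not real prices.
sample_rows = [
    (date(2014, 7, 23), 1.0),  # before the train range, dropped
    (date(2014, 7, 24), 2.0),  # first train day
    (date(2016, 8, 3), 3.0),   # last train day
    (date(2016, 8, 4), 4.0),   # first test day
    (date(2016, 8, 26), 5.0),  # after the test range, dropped
]

train_rows, test_rows = split_by_date(sample_rows)
```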