public class StreamingSampleStats extends Object
Two options are offered for calculating the sample median. Where it is known a priori that the data stream can be accommodated in memory, the exact median can be requested with Statistic.MEDIAN. Where the length of the data stream is unknown, or known to be too large to be held in memory, an approximate median can be calculated using the 'remedian' estimator as described in:
PJ Rousseeuw and GW Bassett (1990) The remedian: a robust averaging method for large data sets. Journal of the American Statistical Association 85:97-104
This is requested with Statistic.APPROX_MEDIAN.
Note: the 'remedian' estimator performs badly with non-stationary data, e.g. a data stream that is monotonically increasing will result in an estimate for the median that is too high. If possible, it is best to de-trend or randomly order the data prior to streaming it.
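The following is a minimal sketch of the remedian idea, included only to illustrate how a median can be approximated with a few small buffers. It is not the JAITools implementation. The class name RemedianSketch, the buffer size parameter base, the on-demand growth of buffer levels and the weighted-median handling of partially filled buffers are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative remedian estimator (hypothetical class, not part of JAITools). */
final class RemedianSketch {
    private final int base;                                 // buffer size b, must be odd
    private final List<double[]> buffers = new ArrayList<>();
    private final List<Integer> counts = new ArrayList<>(); // values currently held per level

    RemedianSketch(int base) {
        if (base < 3 || base % 2 == 0) {
            throw new IllegalArgumentException("base must be an odd number >= 3");
        }
        this.base = base;
    }

    /** Adds one value to the lowest-level buffer. */
    void offer(double value) {
        add(0, value);
    }

    private void add(int level, double value) {
        if (level == buffers.size()) {      // assumption: grow a new level on demand
            buffers.add(new double[base]);
            counts.add(0);
        }
        double[] buf = buffers.get(level);
        int n = counts.get(level);
        buf[n++] = value;
        if (n < base) {
            counts.set(level, n);
            return;
        }
        // Buffer is full: pass its median up one level and start refilling it.
        double[] sorted = buf.clone();
        Arrays.sort(sorted);
        counts.set(level, 0);
        add(level + 1, sorted[base / 2]);
    }

    /** Weighted median of all buffered values; a value at level k has weight base^k. */
    double estimate() {
        List<double[]> pairs = new ArrayList<>();   // each entry is {value, weight}
        double total = 0;
        double weight = 1;
        for (int level = 0; level < buffers.size(); level++) {
            for (int i = 0; i < counts.get(level); i++) {
                pairs.add(new double[] {buffers.get(level)[i], weight});
                total += weight;
            }
            weight *= base;
        }
        if (pairs.isEmpty()) {
            throw new IllegalStateException("no samples offered");
        }
        pairs.sort((a, b) -> Double.compare(a[0], b[0]));
        double cumulative = 0;
        for (double[] p : pairs) {
            cumulative += p[1];
            if (cumulative >= total / 2) {
                return p[0];
            }
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        RemedianSketch r = new RemedianSketch(3);
        for (double v : new double[] {3, 1, 2, 9, 7, 8, 5, 4, 6}) {
            r.offer(v);
        }
        System.out.println(r.estimate());   // prints 5.0
    }
}
```

With a buffer size of b and k levels, the sketch holds at most b × k values at any time while summarising up to b^k samples, which is why the approach suits streams that are too long to store.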
Example of use:
    StreamingSampleStats strmStats = new StreamingSampleStats();

    // set the statistics that will be calculated
    Statistic[] stats = {
        Statistic.MEAN,
        Statistic.SDEV,
        Statistic.RANGE,
        Statistic.APPROX_MEDIAN
    };
    strmStats.setStatistics(stats);

    // some process that generates a long stream of data
    while (somethingBigIsRunning) {
        double value = ...
        strmStats.offer(value);
    }

    // report the results
    for (Statistic s : stats) {
        System.out.println(String.format("%s: %.4f", s, strmStats.getStatisticValue(s)));
    }
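The example above requests MEAN, SDEV and RANGE from a stream that is never stored. One common way to maintain such statistics in a single pass is Welford's update for the mean and variance, together with a running minimum and maximum. The sketch below shows that technique under stated assumptions: the class RunningStatsSketch and its method names are hypothetical, the variance uses the n - 1 (sample) divisor, and nothing here is confirmed to match what StreamingSampleStats does internally.

```java
/** Illustrative one-pass accumulator (hypothetical class, not part of JAITools). */
final class RunningStatsSketch {
    private long n;
    private double mean;
    private double m2;                               // sum of squared deviations from the running mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Updates all running quantities with one new value (Welford's method). */
    void offer(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() {
        return n;
    }

    double mean() {
        return n > 0 ? mean : Double.NaN;
    }

    /** Sample variance (assumption: n - 1 divisor). */
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    double sdev() {
        return Math.sqrt(variance());
    }

    double range() {
        return n > 0 ? max - min : Double.NaN;
    }

    public static void main(String[] args) {
        RunningStatsSketch s = new RunningStatsSketch();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            s.offer(v);
        }
        System.out.printf("mean=%.4f sdev=%.4f range=%.4f%n", s.mean(), s.sdev(), s.range());
    }
}
```

Because the mean, variance and standard deviation all come from the same two running quantities, and the range comes from the running minimum and maximum, an accumulator of this kind naturally produces each group of statistics together.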
| Constructor and Description |
|---|
| StreamingSampleStats(): Creates a new sampler and sets the default range type to Range.Type.EXCLUDE. |
| StreamingSampleStats(Range.Type rangesType): Creates a new sampler with specified use of Ranges. |
| Modifier and Type | Method and Description |
|---|---|
| void | addNoDataRange(Range<Double> noData): Adds a range of values to be considered as NoData and then to be excluded from the calculation of all statistics. |
| void | addNoDataValue(Double noData): Adds a single value to be considered as NoData. |
| void | addRange(Range<Double> range): Adds a range of values to include in or exclude from the calculation of all statistics. |
| void | addRange(Range<Double> range, Range.Type rangesType): Adds a range of values to include in or exclude from the calculation of all statistics. |
| long | getNumAccepted(Statistic stat): Gets the number of sample values that have been accepted for the specified Statistic. |
| long | getNumNaN(Statistic stat): Gets the number of NaN values that have been offered. |
| long | getNumNoData(Statistic stat): Gets the number of NoData values (including NaN) that have been offered. |
| long | getNumOffered(Statistic stat): Gets the number of sample values that have been offered for the specified Statistic. |
| Set<Statistic> | getStatistics(): Gets the statistics that are currently set. |
| Double | getStatisticValue(Statistic stat): Gets the current value of a running statistic. |
| Map<Statistic,Double> | getStatisticValues(): Gets the values of all statistics calculated by this sampler. |
| boolean | isSet(Statistic stat): Tests whether the specified statistic is currently set. |
| void | offer(Double sample): Offers a sample value. |
| void | offer(Double[] samples): Offers an array of sample values. |
| void | setStatistic(Statistic stat): Adds a statistic to those calculated by this sampler. |
| void | setStatistics(Statistic[] stats): Adds the given statistics to those that will be calculated by this sampler. |
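The counter methods listed above (getNumOffered, getNumAccepted, getNumNaN and getNumNoData) distinguish between values that were offered and values that actually contributed to a statistic. The sketch below illustrates one plausible bookkeeping scheme. It is a simplified, hypothetical stand-in: the class SampleCounterSketch keeps a single set of counters rather than one per Statistic, it handles single NoData values only (no ranges), and the choice to count a null as offered but not as NoData is an assumption, not documented behaviour.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative offered/accepted bookkeeping (hypothetical class, not part of JAITools). */
final class SampleCounterSketch {
    private final Set<Double> noDataValues = new HashSet<>();
    private long offered;
    private long accepted;
    private long nan;
    private long noData;

    /** Registers a single value to be treated as NoData. */
    void addNoDataValue(Double value) {
        noDataValues.add(value);
    }

    /** Counts the sample and returns true if it would be used in a calculation. */
    boolean offer(Double sample) {
        offered++;
        if (sample == null) {
            return false;               // assumption: nulls are offered but never accepted
        }
        if (sample.isNaN()) {
            nan++;
            noData++;                   // NaN also counts as NoData
            return false;
        }
        if (noDataValues.contains(sample)) {
            noData++;
            return false;
        }
        accepted++;
        return true;
    }

    long getNumOffered() { return offered; }
    long getNumAccepted() { return accepted; }
    long getNumNaN() { return nan; }
    long getNumNoData() { return noData; }

    public static void main(String[] args) {
        SampleCounterSketch c = new SampleCounterSketch();
        c.addNoDataValue(-9999.0);
        Double[] samples = {1.0, Double.NaN, null, -9999.0, 2.0};
        for (Double s : samples) {
            c.offer(s);
        }
        System.out.printf("offered=%d accepted=%d NaN=%d NoData=%d%n",
                c.getNumOffered(), c.getNumAccepted(), c.getNumNaN(), c.getNumNoData());
    }
}
```

In this sketch the offered count can exceed the accepted count whenever the stream contains nulls, NaNs or registered NoData values.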
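public StreamingSampleStats()
Creates a new sampler and sets the default range type to Range.Type.EXCLUDE.

public StreamingSampleStats(Range.Type rangesType)
Creates a new sampler with specified use of Ranges.
Parameters: rangesType - either Range.Type.INCLUDE or Range.Type.EXCLUDE

public void setStatistic(Statistic stat)
Adds a statistic to those calculated by this sampler.
Parameters: stat - the statistic
See Also: Statistic

public void setStatistics(Statistic[] stats)
Adds the given statistics to those that will be calculated by this sampler.
Parameters: stats - the statistics
See Also: setStatistic(Statistic)

public boolean isSet(Statistic stat)
Tests whether the specified statistic is currently set. If Statistic.MEAN is set then SDEV and VARIANCE will also be set as these three are calculated together. The same is true for MIN, MAX and RANGE.
Parameters: stat - the statistic
Returns: true if the statistic has been set; false otherwise.

public void addNoDataRange(Range<Double> noData)
Adds a range of values to be considered as NoData and then to be excluded from the calculation of all statistics.
Parameters: noData - the range defining NoData values

public void addNoDataValue(Double noData)
Adds a single value to be considered as NoData.
Parameters: noData - the value to be treated as NoData
See Also: addNoDataRange(Range)

public void addRange(Range<Double> range)
Adds a range of values to include in or exclude from the calculation of all statistics.
Parameters: range - the range to include/exclude

public void addRange(Range<Double> range, Range.Type rangesType)
Adds a range of values to include in or exclude from the calculation of all statistics.
Parameters: range - the range to include/exclude
rangesType - one of Range.Type.INCLUDE or Range.Type.EXCLUDE

public Set<Statistic> getStatistics()
Gets the statistics that are currently set.

public Double getStatisticValue(Statistic stat)
Gets the current value of a running statistic.
Parameters: stat - the statistic
Throws: IllegalStateException - if stat was not previously set

public long getNumAccepted(Statistic stat)
Gets the number of sample values that have been accepted for the specified Statistic.
Note that different statistics might have been set at different times in the sampling process.
Parameters: stat - the statistic
Throws: IllegalArgumentException - if the statistic hasn't been set

public long getNumOffered(Statistic stat)
Gets the number of sample values that have been offered for the specified Statistic. This might be higher than the value returned by getNumAccepted(org.jaitools.numeric.Statistic) due to nulls, Double.NaNs and excluded values in the data stream.
Note that different statistics might have been set at different times in the sampling process.
Parameters: stat - the statistic
Throws: IllegalArgumentException - if the statistic hasn't been set

public long getNumNaN(Statistic stat)
Gets the number of NaN values that have been offered.
Parameters: stat - the statistic
Throws: IllegalArgumentException - if the statistic hasn't been set

public long getNumNoData(Statistic stat)
Gets the number of NoData values (including NaN) that have been offered.
Parameters: stat - the statistic
Throws: IllegalArgumentException - if the statistic hasn't been set

public void offer(Double sample)
Offers a sample value. Double.NaNs and nulls are excluded by default.
Parameters: sample - the sample value

public void offer(Double[] samples)
Offers an array of sample values.
Parameters: samples - the sample values

Copyright © 2009–2018. All rights reserved.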