Flux data gap-filling and flux-partitioning page (beta-version)

The service does not work with Internet Explorer! Use recent versions of Netscape, Mozilla, Opera or Firefox instead!

I want to go directly to the data input form now

I want to go directly to the HOW-TO-USE section now.


Background

Problem 1: The eddy covariance method delivers continuous data sets of mass and energy exchange between ecosystem and atmosphere. However, gaps due to unfavorable micro-meteorological conditions and due to instrument failure are inherent in the data stream. Thus a standardized filling of those gaps is necessary (gap-filling), e.g. to obtain daily, monthly or annually integrated balances.

Problem 2: The eddy covariance method measures the net ecosystem exchange. However, particularly for CO2 exchange, much more understanding of the ecosystem is gained when the net flux is partitioned into its main components, gross carbon uptake (GPP) and ecosystem respiration (Reco) (flux-partitioning).

Problem 3: During stable stratification and low turbulent mixing the eddy covariance method faces several problems that introduce bias and uncertainty. These problems occur primarily at night and lead to an underestimation of the night-time flux, i.e. the ecosystem respiration. They can be detected via a micro-meteorological quality control that tests whether the assumptions of the eddy covariance method are not too strongly violated for a particular half hour (e.g. Foken and Wichura, 1996; http://www.bitoek.uni-bayreuth.de/qaqc/en/forschung/21826/Task122.php). Where the information necessary for those tests is not available, a heuristic class of methods is widely accepted. It assumes that a site- and season-specific threshold of friction velocity (u*) can be established above which night-time fluxes are considered valid. This threshold is usually established by relating the night-time flux to friction velocity while accounting for temperature as a covariate (u*-filtering).

Of course there is a need for well-defined and standardized methods. The idea here is to provide a number of methods online to the community. Currently only one method for each problem is implemented in this framework, but any algorithm that can be called from the DOS command line can be included later. Please contact me (reichstein [at] unitus.it) if you want to include your method once this framework has left the experimental stage. The neural network (ANN) methods by Papale & Valentini will be included here soon.

Methods

Here, methods are only briefly described:

Gap-filling

The gap-filling of the eddy covariance and meteorological data is performed with methods that are similar to Falge et al. (2001), but that consider both the co-variation of fluxes with meteorological variables and the temporal auto-correlation of the fluxes (Reichstein et al., in prep.). In this algorithm, three different conditions are identified: 1) only the data of direct interest are missing, but all meteorological data are available; 2) air temperature or VPD is also missing, but radiation is available; 3) radiation data are also missing.

In case 1), the missing value is replaced by the average value under similar meteorological conditions within a time window of ± 7 days. Similar meteorological conditions are present when Rg, Tair and VPD do not deviate by more than 50 W m-2, 2.5 °C, and 5.0 hPa, respectively. If no similar meteorological conditions are present within the time window, the averaging window is increased to ± 14 days. In case 2) the same approach is taken, but similar meteorological conditions can only be defined via an Rg deviation of less than 50 W m-2, and the window size is not increased. In case 3) the missing value is replaced by the average value at the same time of day (± 1 hour), i.e. by the mean diurnal course. In this case the window size starts with ± 0.5 days (i.e. similar to a linear interpolation from available data at adjacent hours). If the value could not be filled after these steps, the procedure is repeated with increased window sizes until the value can be filled.

The method, the window size, and the number and standard deviation of the values averaged are recorded, so that appropriate data can be selected for individual purposes and, e.g., uncertainties can be estimated. For convenience, the filled data are further classified into three categories (A, B, C) based on the method (1, 2, or 3) and the window size used (Table 1). The classification is based on the notion that the estimation of the missing data improves with knowledge of the meteorological conditions and with the use of the temporal auto-correlation of the variable, which favours smaller time windows.
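To make case 1) more concrete, here is a minimal sketch in Python. It is not the code running on the server. The record layout (a list of dicts with the hypothetical keys "time" in decimal days, "Rg", "Tair", "VPD" and "NEE") and the function names are assumptions made only for this illustration; the tolerances and window sizes are the ones quoted above.

```python
from statistics import mean, stdev

MISSING = -9999.0  # missing-value code used in the input files

def fill_case1(records, i, window_days=7.0,
               rg_tol=50.0, tair_tol=2.5, vpd_tol=5.0):
    """Average NEE over records with similar meteorology near records[i].

    Returns (mean, n, sd), or None if no similar record lies in the window.
    """
    target = records[i]
    similar = []
    for j, rec in enumerate(records):
        if j == i or rec["NEE"] == MISSING:
            continue
        if abs(rec["time"] - target["time"]) > window_days:
            continue  # outside the +/- window
        if (abs(rec["Rg"] - target["Rg"]) <= rg_tol
                and abs(rec["Tair"] - target["Tair"]) <= tair_tol
                and abs(rec["VPD"] - target["VPD"]) <= vpd_tol):
            similar.append(rec["NEE"])
    if not similar:
        return None
    sd = stdev(similar) if len(similar) > 1 else 0.0
    return mean(similar), len(similar), sd

def fill_with_widening(records, i, windows=(7.0, 14.0)):
    """Try each window half-width in turn; return (window, (mean, n, sd))."""
    for window in windows:
        result = fill_case1(records, i, window_days=window)
        if result is not None:
            return window, result
    return None
```

The window that finally succeeded is returned together with the number and standard deviation of the averaged values, since these are the quantities the text says are recorded for later quality classification.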

Table 1: Quality classification scheme for gap-filled values, according to the method used and the averaging time window. A: best; B: acceptable; C: dubious. The output file contains a variable 'fqc', with values 0: not gap-filled; 1-3: gap-filled, category A-C.

Averaging-time window (days)    Method 1    Method 2    Method 3
± 0.5                           n.a.        n.a.        A
± 1.5-2.5                       n.a.        n.a.        B
>± 2.5                          n.a.        n.a.        C
± 7                             A           A           n.a.
± 14                            A           B           n.a.
>± 28                           B           C           n.a.
>± 56                           C           C           n.a.
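The lookup in Table 1 could be sketched in Python roughly as follows. The table only lists a few window sizes, so how windows between the listed rows are treated is an assumption here: each window is assigned to the next-larger row. The function names are hypothetical; only the 'fqc' coding (0 for original data, 1-3 for A-C) is taken from the table caption.

```python
INF = float("inf")

# For each method: (upper bound of the window half-width in days, category).
# Boundaries between the rows of Table 1 are an assumption of this sketch.
TABLE_1 = {
    1: [(14.0, "A"), (56.0, "B"), (INF, "C")],
    2: [(7.0, "A"), (14.0, "B"), (INF, "C")],
    3: [(0.5, "A"), (2.5, "B"), (INF, "C")],
}

def gap_fill_category(method, window_days):
    """Return 'A', 'B' or 'C' for a value filled with the given method/window."""
    for upper, category in TABLE_1[method]:
        if window_days <= upper:
            return category

def fqc(method=None, window_days=None):
    """Return 0 for an original value, 1-3 for gap-filled categories A-C."""
    if method is None:
        return 0
    return "ABC".index(gap_fill_category(method, window_days)) + 1
```

For example, a value filled with method 2 and a ± 14 day window would get category B, i.e. fqc = 2.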

 

Flux-partitioning

Only original data (not gap-filled) are used for the flux-partitioning, and all original data flagged with a quality indicator >1 (e.g., with non-turbulent conditions) are dismissed. Night-time data are selected according to a global radiation threshold of 20 W m-2 (night below that threshold), cross-checked against sunrise and sunset data derived from the local time and standard sun-geometrical routines, and defined as ecosystem respiration (Reco). Then the data set is split into consecutive periods of length x (days), and for each period it is checked whether more than six data points are available and whether the temperature range is more than 5°C, since only under these conditions can reasonable regressions of Reco versus temperature be expected (x is a parameter of the algorithm and currently set to 10 days). For each of those periods where the criteria are met, the Lloyd-and-Taylor (1994) regression model

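R_{eco}(T) = R_{eco,ref} \cdot \exp\left[ E_0 \left( \frac{1}{T_{ref} - T_0} - \frac{1}{T - T_0} \right) \right]    (Eq. 1)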

is fitted to the scatter of ecosystem respiration (Reco) versus either soil or air temperature (T). While the regression parameter T0 is kept constant at -46.02°C as in Lloyd and Taylor (1994), the activation-energy-like parameter (E0), which essentially determines the temperature sensitivity, is allowed to vary. The reference temperature (Tref) is set to 10°C as in the original model.
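As an illustration, the selection of periods and the model of Eq. 1 could look like this in Python. This is a sketch under stated assumptions and not the server code: Eq. 1 is written in the usual Lloyd-and-Taylor form with temperatures in °C, the record keys ("day", "Rg", "T", "qc", "NEE") and function names are hypothetical, and the cross-check against sunrise and sunset is left out.

```python
import math

T0 = -46.02       # degC, kept constant as in the text
T_REF = 10.0      # degC, reference temperature
MISSING = -9999.0

def lloyd_taylor(temp_c, r_ref, e0):
    """Reco at temperature temp_c (degC) for given Reco,ref and E0 (Eq. 1)."""
    return r_ref * math.exp(e0 * (1.0 / (T_REF - T0) - 1.0 / (temp_c - T0)))

def usable_periods(records, period_days=10, rg_night=20.0,
                   min_points=6, min_range=5.0):
    """Return {period index: night-time records} for periods fit for regression."""
    periods = {}
    for rec in records:
        if rec["qc"] > 1 or rec["NEE"] == MISSING:
            continue  # dismissed by quality flag, or not an original value
        if rec["Rg"] >= rg_night:
            continue  # daytime: radiation at or above the 20 W m-2 threshold
        index = int((rec["day"] - 1) // period_days)
        periods.setdefault(index, []).append(rec)
    usable = {}
    for index, recs in periods.items():
        temps = [r["T"] for r in recs]
        # more than six points and a temperature range of more than 5 degC
        if len(recs) > min_points and max(temps) - min(temps) > min_range:
            usable[index] = recs
    return usable
```

At T = Tref the exponent vanishes, so lloyd_taylor returns Reco,ref itself; for positive E0 the modelled respiration increases with temperature.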

For each period, the regression parameters and statistics are kept in memory and evaluated after the regressions for all periods have been performed. Only those periods where the relative standard error of the estimate of the parameter E0 is less than 50% and where the estimates are within realistic bounds are accepted. The three estimates of E0 with the smallest standard error are then assumed to best represent the short-term temperature response of Reco and are averaged, resulting in an E0,avg value for the data set. Subsequently, the respiration at reference temperature (Reco,ref) is estimated from the night-time data for consecutive intervals of y days using non-linear regression of the Reco data versus temperature according to Eq. 1, where E0 is fixed to the E0,avg value (y is a parameter of the algorithm and currently set to 4 days). The estimated value of Reco,ref is then assigned to the central time point of the period and linearly interpolated between periods. Thus, for each half hour the parameters E0 and Reco,ref are available and are used to estimate Reco as a function of the same temperature (soil or air) that was used to derive the parameters.
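The evaluation step can be sketched in Python as follows. Several things here are assumptions made for the illustration and are not taken from the server code: the "realistic bounds" for E0 are not stated on this page, so the defaults below are placeholders; the standard-error criterion is read as relative to the estimate; Reco,ref is obtained in closed form (with E0 fixed, Eq. 1 is linear in Reco,ref, so ordinary least squares needs no iterative routine); and Reco,ref is held constant before the first and after the last period centre. All function names are hypothetical.

```python
import math

T0, T_REF = -46.02, 10.0  # degC, as in the text

def temp_factor(temp_c, e0):
    """Temperature term of Eq. 1, i.e. Reco / Reco,ref."""
    return math.exp(e0 * (1.0 / (T_REF - T0) - 1.0 / (temp_c - T0)))

def average_e0(fits, max_rel_se=0.5, bounds=(0.0, 450.0), n_best=3):
    """fits: (e0, standard_error) per period; bounds are placeholder values."""
    low, high = bounds
    accepted = [(se, e0) for e0, se in fits
                if low < e0 < high and se / e0 < max_rel_se]
    best = sorted(accepted)[:n_best]  # smallest standard errors first
    if not best:
        raise ValueError("no acceptable E0 estimate")
    return sum(e0 for _, e0 in best) / len(best)

def fit_r_ref(temps_c, recos, e0):
    """Least-squares estimate of Reco,ref with E0 fixed."""
    f = [temp_factor(t, e0) for t in temps_c]
    return sum(r * fi for r, fi in zip(recos, f)) / sum(fi * fi for fi in f)

def interpolate_r_ref(t, centers, values):
    """Linear interpolation of Reco,ref between period centres."""
    if t <= centers[0]:
        return values[0]
    for k in range(len(centers) - 1):
        if t <= centers[k + 1]:
            frac = (t - centers[k]) / (centers[k + 1] - centers[k])
            return values[k] + (values[k + 1] - values[k]) * frac
    return values[-1]
```

Reco for a given half hour would then be interpolate_r_ref(t, ...) multiplied by temp_factor(T, e0_avg), using the same temperature variable (soil or air) as in the fits.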

u*-filtering

For the u*-filtering the data set is split into 6 temperature classes of equal sample size (according to quantiles), and for each temperature class the set is split into 20 u*-classes. The threshold is defined as the u*-class where the night-time flux reaches more than 95% of the average flux at the higher u*-classes. The threshold is only accepted for a temperature class if temperature and u* are not or only weakly correlated (|r| < 0.3). The final threshold is defined as the median of the thresholds of the (up to) six temperature classes. This procedure is applied to four three-month subsets of the data to account for seasonal variation of vegetation structure. For each period the u*-threshold is reported, but the whole data set is filtered according to the highest threshold found (conservative approach). In cases where no u*-threshold can be found it is set to 0.4 m s-1. The minimum threshold is set to 0.1 m s-1.
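A minimal Python sketch of the threshold search for a single temperature class, and of the final median step, is given below. It is an illustration under assumptions, not the server code: the split into temperature classes and the correlation check are omitted, the "average flux at the higher u*-classes" is taken as the mean of the class means, the threshold is reported as the mean u* of the class found, and all function names are hypothetical.

```python
from statistics import mean, median

def split_equal(items, n):
    """Split a sorted list into up to n consecutive chunks of similar size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return [c for c in chunks if c]

def ustar_threshold_one_class(pairs, n_ustar_classes=20, fraction=0.95):
    """pairs: (ustar, night-time flux) tuples of ONE temperature class."""
    classes = split_equal(sorted(pairs), n_ustar_classes)
    ustar_means = [mean(u for u, _ in c) for c in classes]
    flux_means = [mean(f for _, f in c) for c in classes]
    for k in range(len(classes) - 1):
        higher = mean(flux_means[k + 1:])
        if flux_means[k] > fraction * higher:
            return ustar_means[k]  # first class reaching 95% of the plateau
    return None                    # no threshold found in this class

def final_threshold(class_thresholds, default=0.4, minimum=0.1):
    """Median over the temperature classes, with the fallbacks from the text."""
    found = [t for t in class_thresholds if t is not None]
    if not found:
        return default
    return max(median(found), minimum)
```

Applied per three-month period, the thresholds returned by final_threshold would then be compared and the highest one used to filter the whole data set.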

How to use

Use of this service is quite simple. It consists of three components (the data input form, the gap-filling results page and the flux-partitioning results page), which you pass through in the following steps.

  1. The data input form: Here you provide the file containing the data (as an ASCII file) and some basic information that is necessary to read and process the data. As you can imagine with automatic processing, it is absolutely essential that the data format and the information you provide are exact.
  2. After you press the SUBMIT button the file is uploaded and the gap-filling is performed first. After a certain time, depending on the size of the data set, the number of gaps, and server load and performance (2-15 min, maybe enough for getting a coffee), you are directed to
  3. the page with the gap-filling results. This page contains a graphical 'fingerprint' representation of the data, and links to further information and to the data itself (the page should look like this). If you follow the link (view whole output directory), you will find additional graphs. At the bottom you can proceed with the flux-partitioning after entering further necessary information. [If the gap-filling was not successful you are redirected to another page. You can inspect the last lines of the log to see whether there was a problem with your data.]
  4. After SUBMIT the flux-partitioning starts, and again after a few minutes you should arrive at
  5. the page with the flux-partitioning results (should look like this), which is structured like the gap-filling results page.

 

THE DATA FORMAT (IMPORTANT)

The data format is quite simple, but it is the key to successful processing. In testing, 99% of errors were related to this issue… The data should be provided as a column-oriented ASCII file, e.g. TAB-delimited, which you can simply produce by saving in EXCEL as text (TAB delimited). For a test you can download this file, which contains the Tharandt 1998 data from the CARBODATA CD. You then have to specify in the input form where you saved the file locally, the year="1998", the time format "Julian Day, decimal hour", and Delimiter="<TAB>".

Please refer to the following minimum template.

Day  Hour  qcNEE  NEE         LE    H       Rg    Tair  Tsoil  rH     precip  Ustar
--   --    --     umolm-2s-1  Wm-2  Wm-2    Wm-2  degC  degC   %      mm      ms-1
1    0.5   1      -1.21       1.49  -11.77  0     7.4   4.19   55.27  0.0     0.72
1    1     1      1.72        3.8   -13.5   0     7.5   4.2    55.95  0.0     0.52
1    1.5   2      -9999       1.52  -18.3   0     7.1   4.22   57.75  0.2     0.22
1    2     2      -9999       3.94  -17.47  0     6.6   4.23   60.2   0.1     0.2
1    2.5   1      2.55        8.3   -21.42  0     6.6   4.22   59.94  0.0     0.33
1    3     2      -9999       1.33  -20.55  0     6.5   4.21   59.25  0.0     0.15

Most importantly the file should have the following properties:

  1. The first line contains the variable names, which must be exactly the same as in the template (but are not case-sensitive), since columns are identified by the variable names. The same order of columns is preferred but not necessary. You may also include additional columns without any problems.
  2. The qc variables indicate whether a value is valid. If qc>1 the value is considered faulty and internally replaced by 'missing'. Currently this is only important for NEE.
  3. The second line contains the variable units. Use e.g. "umol m-2 s-1" or "m s-1".
  4. The date/time components are separated into columns, e.g. first column: Julian day, second column: decimal hour. Possible date formats are indicated in the input form. Never use an hour of 24 with the time format 'year', 'month', 'day', 'hour', 'minute' (use 0 instead). Hour '0' is interpreted as the first hour of the day, i.e. at the transition from one day to the next the sequence must be (day, 23 --> day+1, 0), not (day, 23 --> day, 0), because otherwise the data set is not chronological (this misunderstanding has happened before).
  5. Missing values are indicated by -9999.
  6. Since VPD is used for gap-filling NEE, you should provide it. If you provide Tair and rH, the processing calculates VPD for you from those, using standard assumptions/equations. If neither VPD nor Tair and rH are provided, gap-filling category A will be reached more rarely.
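If you want to check your file before uploading, the properties listed above can be tested locally with a few lines of Python. This is only a sketch of how such a file might be read, assuming a TAB-delimited file with purely numeric data rows; the function name and the returned structure are hypothetical, and the server may read the file differently.

```python
import csv
import io

MISSING = -9999.0

def read_flux_table(text, delimiter="\t"):
    """Parse the template format: names line, units line, then data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    names = [n.strip().lower() for n in rows[0]]  # names are not case-sensitive
    units = dict(zip(names, (u.strip() for u in rows[1])))
    records = []
    for row in rows[2:]:
        if not row:
            continue  # skip empty lines, e.g. at the end of the file
        rec = {}
        for name, cell in zip(names, row):
            value = float(cell)
            rec[name] = None if value == MISSING else value
        # qc > 1 marks the NEE value as faulty, i.e. missing
        if rec.get("qcnee") is not None and rec["qcnee"] > 1:
            rec["nee"] = None
        records.append(rec)
    return units, records
```

Because columns are looked up by name, additional columns or a different column order do not matter in this sketch, just as described in the list above.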

Output and security

The output is written into a formatted text file and stored in a directory that has an ID as name. Since directories are non-readable, and only you know the ID, nobody will be able to retrieve the data without your permission. The same is true of course for the input you provide.

I hope I have considered everything in the explanation, so that you can go to the input form now.


Valuable discussions with Eva Falge and Ralf Geyer are acknowledged.

DISCLAIMER

This is a free service. Usage is at your own risk. Any liability for damage arising from the use of this service is disclaimed.

reichstein@unitus.it