** Example of processing a complex set of multiyear observations **
# Last edited 170620

# Data set: Three years of surveys in multiple parts of the eastern Mediterranean/Caucasus

# Strategy:
- combine daily sGPS GAMIT processing with cGPS h-files
- aggregate the processing by survey length, up to about 20 days
- use survey-combined H-files to get long-term repeatabilities and velocities

# Assumed directory structure:
/EMed
   /2004
   /2006
   /2008
   /tables
   /vsoln
   /emed_rinex

Under each year (experiment) directory is the usual structure for sh_gamit and sh_glred (e.g. /tables, /[day-number], /rinex, /gsoln, etc.). Under /emed_rinex are the survey-mode RINEX files organized by country and year, e.g., /azerbaijan/2004.

# Step 1: Determine the temporal and spatial distribution of the data

For example, if the data are stored in directories by country and year, within each RINEX directory you can use sh_get_times to obtain a list. In emed_rinex/azerbaijan/2004, type

  sh_get_times -sort -f *.04o > times.azer04.sorted

where the file times.azer04.sorted will have

shik2672.04o  2004 267 13:31  269 13:42  48:11   5784  3
khid2672.04o  2004 267 13:39  269 13:41  48:2    5766  3
kurd2731.04o  2004 273 05:06  275 05:13  48:7   11553  3
yevl2735.04o  2004 273 07:58  275 08:01  48:3   11536  3
ayaq2730.04o  2004 273 09:44  275 08:38  46:54   5630  3
kebe2821.04o  2004 282 09:36  284 07:59  46:23  11137  3
qurd2821.04o  2004 282 11:32  284 08:02  44:30  10681  3
shek2841.04o  2004 284 10:13  286 07:05  44:52  10771  3
kate2840.04o  2004 284 13:23  286 11:32  46:9   11046  3

and in emed_rinex/georgia/2004, type

  sh_get_times -sort -f *.04o > times.georgia.sorted

zinv2730.04o  2004 273 13:15  274 13:14  23:59  2878  2
zinv2740.04o  2004 274 14:23  275 14:19  23:56  2873  2
pasn2750.04o  2004 275 18:01  276 14:24  20:23  2447  2
pasn2760.04o  2004 276 14:25  277 14:19  23:54  2869  2
pasn2770.04o  2004 277 14:51  277 19:01   4:10   502  1
kazb2780.04o  2004 278 06:31  279 06:29  23:58  2876  2
kazb2790.04o  2004 279 06:30  279 18:04  11:34  1389  1
kazb2791.04o  2004 279 18:05  280 06:50  12:45  1531  2
kudi2810.04o  2004 281 10:11  282 06:29  20:18  2436  2
kudi2820.04o  2004 282 06:30  282 18:04  11:34  1389  1
kudi2830.04o  2004 282 18:05  283 10:22  16:17  1955  2
ingu2840.04o  2004 284 14:46  285 14:24  23:38  2836  2
ingu2850.04o  2004 285 14:25  286 14:19  23:54  2869  2

So we have data on days 267, 273-279, and 281-286. Note that all of the Azeri data span parts of three days, so the default setting 'rx_doy_minus 1' in process.defaults will not be sufficient to pick up the data beginning two days earlier. We can change this in process.defaults or make it explicit on the sh_gamit command line.

# Step 2: Set up the control files for the first year

Run sh_setup to put templates or actual files you'll need into EMed/2004/tables. From the /2004 level, type

  sh_setup -yr 2004

Edit process.defaults and sites.defaults for this year. Make the appropriate changes to the template process.defaults:

# Directory path for RINEX archives (search all levels)
set rnxfnd = "/emed_rinex"
# H-files archive
set hfnd = /net/everest/raid6/ftp/pub/MIT_GLL/H04
# Set compress (copts), delete (dopts) and archive (aopts) options.
# Possible d-, c-, and a- opts: D, H, ao, ac, as, b, c, d, e, g, h, i, j, k, l, m, o, p, q, t, x, ps, all
set copts = ( o q mkx )
set dopts = ( D ao b c g i j k m p ps t x y )

The copts/dopts settings shown here represent a space-saving approach, retaining only the files you are likely to view unless something goes wrong. We rarely use the option to put some files in an archive directory, preferring to save only the minimal files in the day directories themselves.
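As a cross-check on the session spans found in Step 1, the day-span arithmetic can be sketched in a few lines (a sketch only; the column layout assumed below is the one shown above: file name, year, start day-of-year and time, end day-of-year and time):

```python
# Sketch: from an sh_get_times listing, find files whose observations span
# more than two day-numbers, i.e. files that the default 'rx_doy_minus 1'
# would fail to associate with the earliest processing day.
def day_span(line):
    fields = line.split()
    name, doy_start, doy_end = fields[0], int(fields[2]), int(fields[4])
    return name, doy_end - doy_start + 1

listing = [
    "shik2672.04o 2004 267 13:31 269 13:42 48:11 5784 3",
    "zinv2730.04o 2004 273 13:15 274 13:14 23:59 2878 2",
]
for line in listing:
    name, ndays = day_span(line)
    if ndays > 2:
        # e.g. shik2672.04o spans 3 days, so rx_doy_minus must be >= 2
        print(name, "needs rx_doy_minus", ndays - 1)
```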
Make the appropriate changes to the template sites.defaults to automatically download and include in the processing (by being present in 2004/rinex) the ITRF08 sites you wish to use to tie the local h-files to the MIT global files (case-insensitive, 4- or 8-character names):

# Tie stations for MIT global h-files, used also for GAMIT constraints
# North
 bucu emed ftprnx
# West
 ankr emed ftprnx
# Central
 nssp emed ftprnx
# South
 tela emed ftprnx
# East
 kit3 emed ftprnx

Make the appropriate changes to the template sittbl. to provide moderate constraints in the GAMIT processing to help in resolving ambiguities:

SITE              FIX    --COORD.CONSTR.--
   << default for regional stations >>
ALL               NNN    100.  100.  100.
   << IGS core stations >>
BUCU BUCU_GPS     NNN    0.050 0.050 0.05
TELA TELA_GPS     NNN    0.050 0.050 0.05
KIT3 KIT3_GPS     NNN    0.050 0.050 0.05

Copy into 2004/gsoln the globk_comb.cmd and glorg_comb.cmd templates from gg/tables.

# Step 3: Create station.info

In 2004/tables, create the SOPAC/MIT version of station.info:

  sh_upd_stnfo -ref station.info -w station.info.mit -l sd

which will result in the output file having only the entries for sites listed with ftprnx in sites.defaults. Using the log sheets as a guide, in emed_rinex/azerbaijan/2004 and emed_rinex/georgia/2004, create survey-specific station.info entries, either manually (with an editor or using 'make_stnfo') or using 'sh_upd_stnfo -f *.04o' and then editing any entries that need to be changed. The starting file must be named station.info since it is continually concatenated with the information from each new RINEX file, and it must have a legitimate set of headers, though these do not need to match those in the files you wish to merge, such as the MIT or SOPAC station.info. Create this starting template from an existing station.info file by removing with an editor all entries except the headers. Rename these survey-specific files to, e.g., station.info.azer04, and copy them to 2004/tables.
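The headers-only starting template described above can also be produced programmatically; this sketch assumes, as in standard station.info files, that header and comment lines begin with '*' in column 1 (the excerpt below is invented for illustration):

```python
# Sketch: strip a station.info file down to a headers-only template.
# Assumes header/comment lines begin with '*' (the station.info convention)
# and data entries begin with a blank first column.
def headers_only(lines):
    return [line for line in lines if line.startswith("*")]

example = [
    "* Station information file (hypothetical excerpt)",
    "*SITE  Station Name      Session Start      Session Stop",
    " shik  Shikhlar          2004 267 13 31  0  2004 269 13 42  0",
]
template = headers_only(example)   # keeps only the two '*' lines
```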
Then merge the files:

  sh_upd_stnfo -ref station.info.mit -merge station.info.azer04 station.info.georgia04

which will create a merged file, station.info.new, which you should rename to station.info for processing.

# Step 4: Do the GAMIT processing

  sh_gamit -expt emed -d 2004 267 -pres ELEV -rx_doy_minus 2 >&! sh_gamit.log
  sh_gamit -expt emed -s 2004 273 279 -pres ELEV -rx_doy_minus 2 >&! sh_gamit.log
  sh_gamit -expt emed -s 2004 281 286 -pres ELEV -rx_doy_minus 2 >&! sh_gamit.log

Check the emailed summary files for sites included, nrms, ambiguities resolved, and failed editing (rms 0.). Fix any problems and rerun those days.

# Step 5: Generate the daily time series for editing

If it is not there already, edit /2004/gsoln/globk_comb.cmd to add a contingent ("option") line for saving an output h-file:

 COMB out_glb H------_EMED.GLX

This command will be executed only if you include '-globk_cmd_prefix COMB' on the sh_glred command line. Edit glorg_comb.cmd to specify the stabilization you want for the time series (apr_file, stab_site list, parameters to be estimated). Run sh_glred from the experiment (year) level (as with sh_gamit, and different from running glred directly):

  sh_glred -s 2004 267 2004 286 -expt emed -local -net MIT -opt H LC G T >&! sh_glred.log

Setting '-local' instructs sh_glred to use MIT global h-files only for days on which a day directory (i.e. field data) exists. The output of this command in the /gsoln directory will be a gdl file, log file, and org file for each day; a summary of time-series residuals (tsfit_2004-267_2004-286.sum); and two new directories, /res_2004-267_2004-286 and /plots_2004-267_2004-286, containing respectively the residuals and time-series plots for each site on each day.
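The day-selection logic implied by '-local' can be pictured with a short sketch (directory names below are invented for illustration; sh_glred itself does this internally):

```python
# Sketch of what '-local' implies: use an MIT global h-file only for days
# on which a local day directory (i.e. field data) exists.
day_dirs = {"267", "273", "274", "275", "276", "277", "278", "279"}
global_days = [str(d) for d in range(267, 287)]

used = [d for d in global_days if d in day_dirs]
skipped = [d for d in global_days if d not in day_dirs]
print("combine MIT h-files for days:", " ".join(used))
```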
Check the stabilization by typing in /gsoln

  grep 'POS STAT' *.org

The number of sites in the stabilization should be what you expected (4 or more for translation-only, 8 or more well-distributed sites for translation+rotation), the wrms values should be 1-3 mm, and the nrms values less than ~2. View all the plots ('gs *.ps') and note any outliers or noisy segments. Notes:

ardt 267-269 downweight (DW) by adding 6 mm; check long-term plots later
arga 282 U-30, DW 100
atht 269 N-5 E-10  308 E+5
boyl 270-272 DW 4
bucu 286 N+3  287 U-10
cagl 265-268 DW 5  291 E+7  293 U+15
madr 282-283 E+5  290 E+5
ozdm 267-269 N/E slope, DW 15 or DW 267 15; LT E favors 267, N favors 268-269
reun 257 U+20
torl 273-275 DW 5
triz 269 U+40
zeck 286 N+5  287 U-15

Experience has shown that with the standard data weighting in GAMIT, using elevation-dependent weighting based on the autcln statistics, the median nrms is about 0.7. Hence we might treat any single nrms value greater than 1.5 (twice the median) as a two-sigma (95% level) outlier and apply a downweight by adding white noise with sig_neu, either enough to make the nrms ~1 or enough to increase the error bar to a value greater than the value of the outlier. In the case of an outlier, adding only enough white noise to equal the residual is not as objective as adding enough noise to make the outlier of negligible weight, so you should pay attention to how much the reweighted outlier is biasing the position estimate. If your primary interest is horizontal deformation, you might be very generous in the noise added to vertical outliers so as to prevent their having any effect on the horizontals. Generate the sig_neu reweights conveniently using 'grw', which converts yr/doy to yy/mm/dd and mm to meters to match the globk convention. 'grw' will write the converted lines into a file 'tempout', from which you can paste them directly into globk_comb.cmd or include them by reference using the 'source' feature, e.g. 'source ../../tables/emed_comb.reweights'.
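The conversion that 'grw' performs can be pictured as follows (a hypothetical helper, not the actual grw code; check the GLOBK documentation for the exact sig_neu field order before using lines like this):

```python
from datetime import date, timedelta

# Sketch of the conversion 'grw' performs: yr/doy and mm -> calendar date
# and meters for a globk sig_neu line.  The output format below is
# illustrative only.
def doy_to_ymd(year, doy):
    d = date(year, 1, 1) + timedelta(days=doy - 1)
    return d.year, d.month, d.day

def sig_neu_line(site, year, doy1, doy2, n_mm, e_mm, u_mm):
    y1, m1, d1 = doy_to_ymd(year, doy1)
    y2, m2, d2 = doy_to_ymd(year, doy2)
    n, e, u = (v / 1000.0 for v in (n_mm, e_mm, u_mm))
    return (f" sig_neu {site} {n:.3f} {e:.3f} {u:.3f}"
            f" {y1} {m1:2d} {d1:2d} 0 0 {y2} {m2:2d} {d2:2d} 24 0")

# e.g. downweight ardt by 6 mm over days 267-269 of 2004
print(sig_neu_line("ardt", 2004, 267, 269, 6, 6, 6))
```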
# Step 6: Generate the combined h-file for the survey

  sh_glred -s 2004 267 2004 286 -ncomb 20 -globk_cmd_prefix COMB -expt emed -opt G >&! sh_glred.log

which will create H040924_EMED.GLX and globk_emed_04267-286.org, the latter to be checked for appropriate chi2 increments in the stacking. With the usual GAMIT weighting plus the downweighting (sig_neu) of outliers, the MIT GLX files will typically produce chi2 increments of 1 to 3, and regional networks < 1. We do not need to specify -net or -opt LC in the sh_glred command since the script will automatically use all the h-files present or linked in /glbf.

*** Steps 1-6 are to be repeated for each survey. ***

# Step 7: Generate the multi-year time series

Working in a directory parallel to the survey/year directories, e.g. EMed/vsoln, create the multi-year gdl file by, e.g.,

  ls ../????/gsoln/H*GLX > emed.gdl

Copy into /vsoln the template files globk_long.cmd and glorg_long.cmd from gg/tables and edit them for your analysis, generally needing to change only the use_site list (or 'source'd file) and the list of edits in the globk command file, and the stab_site list in the glorg command file. As a check on severe conflicts due to misnamed sites, and on the application of renames for instrumental changes and earthquakes, it's a good idea at this point to run 'glist', which reads the h-files, applies the renames, and compares the coordinates with those in the apr_file:

  glist emed.gdl emed.glist +1 ../tables/itrf08_comb.eq:../tables/emed_renames.eq ' ' ../tables/itrf08_comb.apr >! glist.out

The glist.out file will echo coordinate differences, and the emed.glist file will list all of the sites and their epochs. If you want to generate an explicit use_site list at this point, based on proximity and adequate epochs and time span, you can run 'glist2cmd' (run it without arguments to see the help). Now to generate the time series:

  rm globk_rep.prt globk_rep.org globk_rep.log
  glred 6 globk_rep.prt globk_rep.log emed.gdl globk_long.cmd >&! glred.out
  mv globk_rep.org globk_rep_130708a.org

where we've used the current date and a letter extent to identify the solution. The 'rm' step is to remove any leftover files, since the new ones would be concatenated onto the end of any old ones of the same name. Again check the stabilization quality with

  grep 'POS STAT' globk_rep_130708a.org

Then generate the plots:

  mkdir plots
  cd plots
  tssum . mit.final.itrf08 -R ../globk_rep_130708a.org
  sh_plot_pos -f *.pos -r -t tsfit.cmd -t1 2004_001 -t2 2009_001 -u w

Make note of any problems:

bgnt 04 E+8
blot 04(1) N-50 E-35 typo?
bucu 04 systematic, cf. later, DW 5
cnca N nrms 1.6, DW N only, 5
garb 06 N-10
gr02 N nrms 1.8, DW 5 both
hoss DW N 4

As noted in the lecture, the small number of epochs (even for the cGPS, after aggregation) means that you have to be careful to remove or down-weight any outliers or noisy segments. As with the daily time series, the typical long-term nrms median is about 0.7, so you can use the same "1.5 threshold" to indicate a 2-sigma outlier. The sig_neu commands applied to the long-span solution are best created manually and put into a separate 'source'd file, e.g. emed.vel.reweight. Be careful that the date range given in the sig_neu commands fully encompasses the dates of the H-file to which you wish to apply the downweights.

# Step 8: Perform a velocity solution

As with continuous data, we must consider the effect of temporally correlated noise on the estimates and uncertainties of the velocities. So although the small number of observations precludes using the First-Order Gauss-Markov Extrapolation (FOGMEx, "realistic sigma") algorithm to determine the appropriate level of random-walk noise for each site, we must nevertheless include some in the velocity solution. The value to use could be determined from prior experience, from the median of the values estimated for cGPS stations in your region, or iteratively from a validation of the velocities against a model.
A reasonable value to start with would be 1 mm/sqrt(yr) for the horizontals and 3 mm/sqrt(yr) for the verticals, e.g.

  mar_neu all .000001 .000001 .000009 0 0 0

where the units are m**2/yr. Note that whereas sig_neu noise is additive (site-specific added to "all"), a site-specific mar_neu value supersedes "mar_neu all". Do the solution:

  rm globk_vel.prt globk_vel.org globk_vel.log
  globk 6 globk_vel.prt globk_vel.log emed.gdl globk_long.cmd VEL >&! globk.out
  mv globk_vel.org globk_vel_130708a.org

where the VEL option refers to the VEL keywords in globk_long.cmd and glorg_long.cmd, which turn on velocity estimation. This scheme allows you to use the same command files for repeatabilities as for velocities without any explicit editing. With the org_opt setting in the template (CMDS GDLF PSUM VSUM FIXA RNRP), all of the information you need to evaluate both the stacking (globk) and the stabilized solution (glorg) will appear in the glorg print file, globk_vel.org. Things to check are the chi2 increments from the stacking (all should be of order ~1.0), the stabilization (number of sites retained in the iteration and the N E U rms and nrms values), the chi2 increments for equated parameters, and the uncertainties of the velocity estimates (1-2 mm/yr or less horizontal for a 3-year span of data using at least 12 hrs of data each year and a random walk of 1 mm/sqrt(yr)). To repeat the solution with changes to the equates, stabilization, or 'plate' definitions, run glorg separately:

  glorg globk_vel_130708b.org PSUM:VSUM:GDLF:CMDS:FIXA:RNRP glorg_vel.cmd vel.com

To get velocity fields (vel-files) in the reference frames of the plates listed in the glorg command file, use, e.g.:

  sh_org2vel -file globk_vel_130708b.org -plate eurasia nubia arabia

# Step 9: Generate the time series for the velocity solution

In principle, the statistics relevant to the velocity solution should flow from time series generated with a stabilization that includes all of the well-determined sites in the solution.
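As a unit check on the mar_neu values in Step 8 (a sketch; the arithmetic simply restates the definition of a random walk quoted in mm/sqrt(yr)):

```python
import math

# mar_neu values are variance rates in m**2/yr.  A random walk quoted as
# s mm/sqrt(yr) corresponds to (s/1000)**2 m**2/yr, and the position
# scatter it contributes after t years is s*sqrt(t) mm.
def mar_value(mm_per_sqrt_yr):
    return (mm_per_sqrt_yr / 1000.0) ** 2      # m**2/yr

def added_scatter_mm(mm_per_sqrt_yr, years):
    return mm_per_sqrt_yr * math.sqrt(years)   # mm

horizontal = mar_value(1)   # 1e-06, as in 'mar_neu all .000001 ...'
vertical = mar_value(3)     # ~9e-06, the .000009 above
# Over a 3-yr span, 1 mm/sqrt(yr) contributes ~1.7 mm of horizontal scatter
```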
If your solution has a strong IGS cGPS component over the full span, then the improvement from using the regional stations will be minor. On the other hand, if some or all of your epochs are weak in IGS coverage, using the regional stations in the stabilization is quite important to evaluate the actual scatter of each station's epoch-by-epoch positions without the confusion added by a weak stabilization at some epochs. To get the new time series, create an apr file from your velocity solution:

  sh_exglk -f globk_vel_130708a.org -apr vel_130708a.apr

Create a stab_site list with all of the sites that have reasonably good position and velocity uncertainties. A utility to help with this is 'vel2stab'. Then repeat Step 7 with the new apr-file (only) and the new stabilization list.
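The selection that 'vel2stab' automates can be pictured as a simple filter on velocity uncertainties (site names, sigmas, and the 1.5 mm/yr cutoff below are all invented for illustration; consult vel2stab itself for its actual criteria):

```python
# Sketch: choose stabilization candidates by horizontal velocity
# uncertainty.  The sigmas and threshold are hypothetical.
sites = {            # site: (sigma_vn, sigma_ve) in mm/yr
    "bucu": (0.4, 0.5),
    "ankr": (0.6, 0.7),
    "ayaq": (2.8, 3.1),   # short-span survey site: excluded
}
stab = sorted(s for s, (sn, se) in sites.items() if max(sn, se) <= 1.5)
print("stab_site", *stab)
```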