
Add support to SeaExplorer gliders #3

Merged · 12 commits · Jun 6, 2016

Conversation

@cyrf0006 (Contributor) commented Mar 3, 2016

Please find here a version of the toolbox that now deals with SeaExplorer glider data. This is the first time I have done a pull request; hopefully I am doing it okay.

The toolbox works at least on my local machine, but I had to modify 2 existing scripts that may affect the behavior with other gliders. Please verify these as a priority:

  • posixtime (L.55)
  • gridGliderData (L.320)

Otherwise, all the changes I have made should not impact other gliders' processing. These changes are tagged with a '-FC' comment. In order to test the toolbox, I have created the script main_glider_data_processing_dtsx.

joanpau commented Mar 7, 2016

Thanks for the PR. Let's go ahead with the review.
I will add some comments to the code fragments that need some discussion and/or a fix.
You can commit the required modifications, and then we can merge them in and close the PR.

error('glider_toolbox:posixtime:MissingMexFile', ...
'Missing required mex file.');
else % -FC
t = (now()-datenum(1970,1,1))*86400;
joanpau (Contributor) commented:

These changes should be reset (git reset v1.1.1 m/mex_tools/posixtime.m).
The required mex file can be built with setupMexPosixtime
(it simply exposes the time function from the C standard library).
Even though the conversion here is correct, and you can use it
if you do not want to compile the mex file, it should not be part of this PR.
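For reference, the quantity being computed here is plain POSIX time. Below is a Python sketch of the same conversion, not the toolbox's code; note that MATLAB's now() returns local time, so the fallback expression only matches true POSIX time on a machine whose clock is configured to UTC:

```python
import time
from datetime import datetime, timezone

def posixtime():
    """Seconds since 1970-01-01T00:00:00Z, i.e. the quantity the MATLAB
    fallback (now() - datenum(1970,1,1)) * 86400 tries to compute."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - epoch).total_seconds()

# Agrees with the C standard library time() exposed by the mex file.
print(abs(posixtime() - time.time()) < 1.0)
```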

cyrf0006 (Contributor Author) commented:

OK. I did not know I needed to compile the C mex file first. Now it works.

@joanpau joanpau changed the title Working preliminary version that include SeaExplorer glider Add support to SeaExplorer gliders Mar 8, 2016
joanpau commented Mar 8, 2016

Please feel free to commit and push your fixes to your forked repository when they are ready.
The pull request will be automatically updated, and we will have a cleaner context
to solve the remaining issues (if any) and merge in the changes.

preprocessing_options.depth_list(1).depth = 'GPCTD_PRESSURE';
preprocessing_options.depth_list(2).depth = 'SBD_PRESSURE';
%preprocessing_options.depth_list(3).depth = 'NAV_DEPTH';

joanpau (Contributor) commented:

Is there any reason to not use the navigation depth Depth here?
Note that, in addition to this depth, you can also enable the depth derivation from CTD pressure,
(which derives the depth using the pressure and the latitude).
You will end up with two variables: depth and depth_ctd.
They might or might not coincide, depending on how the glider works and the sampling configuration options.
Does the SeaExplorer have a dedicated pressure sensor for navigation, or does it use the one from the CTD?
Can/Does the navigation depth have a different sampling configuration than the CTD pressure?
Sometimes it might be useful to have both of them, for example if the CTD sampling is disabled
while other sensors are enabled.

Of course this is just the default configuration, users can adjust it to their need in their runtime environment.

cyrf0006 (Contributor Author) commented:

Indeed, I was a bit mixed up with the depth and depth_ctd variables.

There are 2 pressure sensors on the SeaExplorer. One is for navigation and returns Depth and NAV_DEPTH. They are the same, except that one is sub-sampled and found in the navigation files (Depth), while the other is found in the higher-sampling-rate science files (NAV_DEPTH). These are not always well calibrated.

There is also the pressure sensor of the CTD, which is the only reliable one here: SBD_PRESSURE or GPCTD_PRESSURE.

Should I keep both variables under depth and depth_ctd?

cyrf0006 (Contributor Author) commented:

As an addition to this comment: is the pressure converted to depth using the hydrostatic relationship somewhere? Or is depth_ctd actually pressure?

joanpau (Contributor) commented:

In that case I would suggest keeping the navigation depth
(ordered list of preferred sources: NAV_DEPTH, Depth) as depth,
and allowing the depth-from-pressure derivation, resulting in the depth_ctd variable.
The rationale is that the GPCTD sampling might be disabled in some stage of a deployment while other sensors are left on, and there would be no vertical coordinate for those readings
(or a very poorly estimated one, interpolated from the previous and next
pressure measurements, which could be too far away).
Actually, this is a common situation if the CTD is configured to sample only on downcasts.
Having both depth sequences in the final processed data allows the end user
to choose the best vertical reference coordinate for the data they want to consume.

The depth_ctd is derived from pressure and latitude using CSIRO's sw_dpth
in processGliderData.m.
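For reference, sw_dpth implements the standard UNESCO pressure-to-depth formula. Below is a Python sketch of that formula (sea pressure in dbar, latitude in degrees); the coefficients are from the published formula and the function name is illustrative, not the toolbox's code:

```python
import math

def depth_from_pressure(p_dbar, lat_deg):
    """Approximate depth (m) from sea pressure (dbar) and latitude,
    following the UNESCO formula used by CSIRO's sw_dpth."""
    x = math.sin(math.radians(lat_deg)) ** 2
    # Gravity varies with latitude and, slightly, with pressure.
    g = 9.780318 * (1.0 + (5.2788e-3 + 2.36e-5 * x) * x) + 1.092e-6 * p_dbar
    num = ((((-1.82e-15 * p_dbar + 2.279e-10) * p_dbar - 2.2512e-5) * p_dbar
            + 9.72659) * p_dbar)
    return num / g

# 100 dbar at the equator corresponds to roughly 99.4 m.
print(round(depth_from_pressure(100, 0), 1))
```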

joanpau commented Apr 4, 2016

I merged your PR into the seaexplorer branch, and made some improvements:
mainly style and documentation fixes, and further modifications to sx2mat
to make it Octave compatible and preserve the names of timestamp variables.
Check the details in commit 51b085f. Before merging into master and closing the PR,
I would like to discuss two further points:

  1. Regarding sxmerge, is there any good reason to prefer the current algorithm
    over the more general one in dbacat and/or dbamerge? In the latter, no data
    would be lost in the odd case of lines from gli and dat files sharing the same timestamp,
    so I feel tempted to switch to it. The implementation is quite trivial: build a sorted vector
    of the unique timestamps in variables Timestamp and PLD_REALTIMECLOCK,
    and merge the rows of the gli and dat datasets in the order of that reference timestamp vector.
    Thus the rows of the final matrix will fall into one of these 3 cases:

    • non-NaN values only in some entries corresponding to gli variables
      (if the row comes from gli data)
    • non-NaN values only in some entries corresponding to dat variables
      (if the row comes from dat data)
    • non-NaN values only in some entries corresponding to gli and dat variables
      (if there is a row in gli data and a row in dat data with that same timestamp)

    This also raises the question of duplicated variable names in the gli and dat datasets.
    The current code mimics what is done for Slocum gliders. In that case, there might be
    variables with the same name in the navigation and science datasets, and the duplicated ones
    are renamed to follow the behavior of the data merging program provided by the manufacturer.
    Since with the last modifications the names of the variables in SeaExplorer files are preserved,
    the variable names in gli and dat datasets are all different, so that there are no duplicated
    variables and this might be totally irrelevant. In any case, to guard against inadvertent name
    collisions in the future, entries from variables with the same name would be merged into
    the same column according to their timestamp, raising an error if they overlapped with
    inconsistent values (entries from rows with the same timestamp but different non-NaN values).
    Does all this seem reasonable? Or can there be duplicated variables in gli and dat datasets
    that we want to distinguish?

  2. For completeness, it would be nice to provide support for SeaExplorer gliders
    in real time mode too. The processing stages are the same, so there is no need for new code
    except for the retrieval of the raw files transmitted by the glider.
    How do SeaExplorer gliders transmit real time data? Do they send data files to a remote server?
    If so, how can one access those files? FTP or SFTP, maybe? To a fixed directory?
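The merging strategy described in point 1 can be sketched as follows. This is a hypothetical merge_datasets helper illustrating the idea, not the toolbox's actual sxmerge implementation, and it represents each dataset as a dict from timestamp to row:

```python
NAN = float('nan')

def merge_datasets(gli, dat):
    """Merge two datasets keyed by timestamp into rows ordered by the
    sorted union of timestamps, padding missing entries with NaN."""
    gli_width = len(next(iter(gli.values())))
    dat_width = len(next(iter(dat.values())))
    merged = []
    for t in sorted(set(gli) | set(dat)):
        # A row gets gli values, dat values, or both when the timestamp
        # appears in both datasets; the missing side is NaN-padded.
        row = (list(gli.get(t, [NAN] * gli_width))
               + list(dat.get(t, [NAN] * dat_width)))
        merged.append((t, row))
    return merged

gli = {1.0: [10.0], 2.0: [20.0]}                   # e.g. one navigation variable
dat = {2.0: [200.0, 201.0], 3.0: [300.0, 301.0]}   # e.g. two science variables
for t, row in merge_datasets(gli, dat):
    print(t, row)
```

Rows sharing a timestamp (here t = 2.0) keep the values from both sides instead of discarding one.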

cyrf0006 commented Apr 4, 2016

Wow, awesome, many thanks!
Before I answer your questions, how can I (elegantly) test your branch? Should I clone the new branch seaexplorer into my local repo (make a new branch)?

joanpau commented Apr 4, 2016

With git fetch ORIGIN_REPO you can fetch the changes in the original repository,
synchronizing your (local read-only) copy of the original repository, i.e. your remote branches.

Then you can git checkout --detach ORIGIN_REPO/seaexplorer
for quick tests in 'detached head mode' (look at the DETACHED HEAD section).

Alternatively you can pull the changes into your seaexpplorer branch with
git checkout seaexpplorer and git merge ORIGIN_REPO/seaexplorer
(or its shortcut git pull ...)

cyrf0006 commented Apr 5, 2016

I tested the new version; it looks like it works very well on the data. Here are the answers to your questions:

  • 1.1 No good reason, except that I was trying to obtain a working version before attacking fancier problems (and for some reason I couldn't figure out the dbamerge algorithm for SeaExplorer data...). Here, timestamp duplication only occurs because timestamps in nav (gli) files are precise to the second while science (dat) files are precise to 1/100 of a second. Sometimes nav timestamps coincide with timestamps in the science data, if the latter fall exactly on a full second (e.g. XX sec vs. XX.000 sec). Apart from this, the result should be quite similar to the dbamerge algorithm. But I am also okay with using the algorithm you suggest. You can implement it if it is straightforward for you, or I can do it on the merged version.
  • 1.2 There should be no variable name duplication between navigation files (CamelCased) and science files (ALL_CAPITALS). The only thing is that until now I was computing the time variable (Posixtime) from the time string in sx2mat. With the new naming convention in this commit, there should be no duplication.
  • 2.0 For the moment, real time data are not accessible via a server. But this will come shortly (I hope). At that point I will implement the real time processing.

Other points to discuss about the merge:

  • 3.0 I saw that the payload configuration information (which the version I implemented here stored in the global attributes) was removed. Was that on purpose? Because the information in this file may be useful for post-processing, we should keep it at least in NetCDF_L0 (to group all the deployment information and avoid carrying the extra file seapayload.cfg on the side);
  • 4.0 I just received some feedback from the manufacturer, and the science files have been renamed (pld instead of dat). So I guess the code now needs to be adapted (changing dat to pld). Actually, the new naming convention is as follows:
    ".gli.sub" for navigation files (subsampled)
    ".pld1.sub" for realtime science files (subsampled); pld2 if 2 payloads are planned
    ".pld1.raw" for full science files;

Please instruct me if I should go forward with the changes or if you prefer handling them.

joanpau commented Apr 7, 2016

  • For the merging function sxmerge, I will switch to the more general method then.
  • For the payload configuration: yes, it is not in the default NetCDF L0 configuration.
    However, the final user is expected to modify this default configuration according to their needs.
    In this case you just need to add the empty attribute 'configuration_file' to the global attributes
    in the L0 output configuration. You can even commit that in your runtime branch.
    (By the way, I would say that 'payload_configuration' or 'seaexplorer_payload_configuration'
    would be a more explicit name for an attribute holding the contents of a file named seapayload.cfg.)
  • For the dat-to-pld renaming: can we reasonably assume that the latter is the definitive nomenclature,
    or do we foresee more changes in the near future (next days/weeks)? If it is stable,
    I will do the cosmetic fixes to the code to reflect the change;
    otherwise I would prefer to wait for the final names.
  • For the real time processing, do you mean that right now there is no communication
    while the glider is at sea, or that data files are not transferred to a computer on-shore?
    In that case, do you have any hint on how this will work?
    And when will this be enabled, at least for development or testing purposes?

cyrf0006 commented Apr 7, 2016

Answers in order:

  • okay;
  • okay. But note that the file seapayload.cfg is inherent to SeaExplorer gliders, and it would not make sense to rename it unless the manufacturer does. This file is found on the glider's computer and generally accompanies the data after recovery.
    [update]: Thinking about this comment, maybe the user could rename it during post-processing with a deployment-specific name (seaexplorer_payload_dplXXX.cfg)...
  • We can reasonably think that these are final...
  • I am speaking solely about the glider I am using. For the moment our real time data are received on a dedicated Windows machine at the manufacturer. These are accessible only via a TeamViewer interface (for navigation and/or data download). It is thus not a proper server. It is planned that our institute will set up its own control panel soon, and I will make sure that the real time data are stored on a Linux server. My plan is also to transfer these data to the EGO network for real time visualization. I hope this will happen before the end of the summer.
    However, for testing purposes, I have a server here where I can store the data manually after deployment and to which we can connect remotely. So I guess we can test it easily.

joanpau commented Apr 7, 2016

Regarding the payload configuration, I was talking about the name of the attribute
in the NetCDF L0 file, not the name of the payload configuration file.

I will update code and references to .dat files to use .pld instead, and the name patterns
in the default configuration of file options. We have 2 choices here:

  • seaexplorer_options.pld_name_pattern = '^.*pld.*$'; (by default only recognize .pld files)
  • seaexplorer_options.pld_name_pattern = '^.*(pld|gli).*$'; (recognize .pld or .gli files)
    Any preference?
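To make the two options concrete, here is a quick sketch of what each pattern matches, using file names modeled on those mentioned in this thread:

```python
import re

pld_only = re.compile(r'^.*pld.*$')        # option 1: payload files only
pld_or_gli = re.compile(r'^.*(pld|gli).*$')  # option 2: payload or glider files

files = [
    'sea007.78.gli.sub.001',   # navigation (glider) file
    'sea007.78.pld1.sub.001',  # subsampled science (payload) file
    'sea007.78.pld1.raw.001',  # full science (payload) file
]

print([f for f in files if pld_only.match(f)])
# Option 2 also picks up the navigation files:
print([f for f in files if pld_or_gli.match(f)])
```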

Regarding the real time file access, I see. If that machine at the manufacturer can be accessed
using either FTP or SSH (SFTP, indeed), and the file location in that system follows some convention,
we can try to write something; it should be easy. Otherwise, I would wait until the real time file reception
options provided by the manufacturer are consolidated, or at least less vague.

cyrf0006 commented Apr 7, 2016

OK, sorry about the attribute name. Your suggestion is better indeed!

I would prefer option 2 for pld name. But you should read '^.*(pld|dat).*$'; (dat, not gli).

Better wait then. It is not accessible via FTP at the moment.

Joan Pau Beltran added 3 commits April 19, 2016 09:42
Change the merging algorithm of `sxmerge` so that data entries
from gli and dat datasets with the same timestamp are properly merged,
preserving all the values instead of discarding those from the gli dataset.

Even though no variable duplication is expected between gli and dat data,
guard against future issues by checking the consistency of entries of
duplicated variables at the same time instant, if any. Report detailed
information about the inconsistencies in the error message.

Also update `sxcat` and `sx2mat` to match the coding style
and the error messages, and pass the proper time options
to the low level functions in `loadSeaExplorerData`.
According to the manufacturer, the `.dat` suffix of the SeaExplorer science
data files will be changed to `.pld` (payload).
Update code and documentation to reflect the change in nomenclature.
joanpau commented Apr 19, 2016

The nomenclature of SeaExplorer files is updated now in the seaexplorer branch
to reflect the change from navigation and science to glider and payload.
The default file name patterns are also updated and made stricter.
sxmerge is also updated to use the more general algorithm described in a previous commit.
Please note that it preserves some entries from glider files that were lost in the previous version
(those from a row with the same timestamp in glider and payload files).

The only thing left is the real time support. In order to close the PR, if we cannot figure out
how the data retrieval will work, I will add the relevant code to the real time script,
raising a 'not implemented' error when reaching the data retrieval stage.

@cyrf0006 (Contributor Author) commented:

I know how the realtime data will be retrieved, but it is not implemented yet (server not installed). The idea is that realtime data coming from Iridium will be synchronized to a Linux server accessible via SSH. For the moment it is also okay with me if an error is raised when an attempt is made.

Concerning the PR, there is one little change I would like to add in configDTOutputNetCDFL* and in
configDataPreprocessingSeaExplorer to take into account 2 versions of the MiniFluo sensor (M1FL and M2FL, which use different wavelengths), resulting in 8 possible variable names: M1FL_PHD1, M1FL_PHD2, M1FL_MON1, M1FL_MON2, M2FL_PHD1, M2FL_PHD2, M2FL_MON1, M2FL_MON2.

I would like to implement it on the latest version, but for some reason I cannot check out the last commits:

$ git checkout remotes/origin/seaexpplorer
   HEAD is now at 51b085f... Improve functions for SeaExplorer support.

Here commit 51b085f... is not the most recent one. Am I doing something wrong?

joanpau commented Apr 25, 2016

That approach for RT data retrieval seems sound to me.
The communications are secured over SSH and you do not need to store user credentials
in the toolbox configuration files (provided you can use public-private key authentication method).
In addition, if you can use the SFTP extension (provided by most SSH servers),
the situation is quite similar to the Seaglider and Slocum case.
The only thing left is to state the path to the files from a specific glider and/or mission
(in the common case of several gliders sending files to the same remote server).

And for the last changes: yes, you are doing something wrong:
you are checking out (updating the files in your working directory)
the last commit of the remote branch seaexpplorer of your remote origin
(a local read-only branch that tracks the changes in that branch on that remote),
leading you to a detached head state (not on a local writable branch you can commit to).
You may find reference links for these three concepts in a previous comment,
which I really recommend reading, because it is invaluable for understanding how to work with git.

What you want to do is to fetch the last changes in the branch seaexplorer of your SOCIB remote,
and merge them into your local seaexpplorer. Hence git fetch <SOCIB_REMOTE_NAME>
will synchronize your copy of the remote repository, fetching the last changes
but without modifying any of your local branches or files.
Then, from your seaexpplorer branch (so git checkout seaexpplorer if you are not working on it),
and provided your working directory is clean, git merge <SOCIB_REMOTE_NAME>/seaexplorer
will integrate those commits into your branch, from where you can continue the development.
You can perform the two steps at once with git pull <SOCIB_REMOTE_NAME> seaexplorer,
fetching the latest commits and merging them into the current branch
(effectively git pull = git fetch + git merge). Again, please read the link about remote branches,
where all this is explained in detail and with examples.

@cyrf0006 (Contributor Author) commented:

Yes, I remember your previous comment (and I tried to read the doc), but my problem is fairly simple: I don't know what I should replace <SOCIB_REMOTE_NAME> with (!)

I tried git fetch git@github.com:socib/glider_toolbox.git/seaexplorer and git fetch remotes/origin/seaexpplorer without success...

$ git branch -a
  master
  run_local
* seaexpplorer
  remotes/origin/HEAD -> origin/master
  remotes/origin/develop
  remotes/origin/experimental_time_mismatch
  remotes/origin/master
  remotes/origin/run_local
  remotes/origin/seaexpplorer

joanpau commented Apr 25, 2016

Then the reference is this: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
You may need to add the original repository from which you forked the toolbox.
You can do this with git remote add <SOCIB_REMOTE_NAME> https://github.com/socib/glider_toolbox.git.
git remote -v will show a summary of your remotes, and git remote show <REMOTE> will provide the details of the specified remote. The -a option in git branch -a lists both the local and the remote branches (without it, only local branches are listed).

Use the new variable names from the manufacturer to distinguish the two Minifluo-UV sensors. Output variables in NetCDF files are now named by excitation/emission pair, both for detection and monitoring channels:
'fluorescence_255_360'
'fluorescence_260_315'
'fluorescence_270_340'
'fluorescence_270_376'
'fluorescence_monitoring_255_360'
'fluorescence_monitoring_260_315'
'fluorescence_monitoring_270_340'
'fluorescence_monitoring_270_376'
@cyrf0006 (Contributor Author) commented:

See above a commit where the variable names for Minifluo were updated. This version of the toolbox works fine on my SeaExplorer data. On my side, I do not expect to be able to perform real-time processing of the data for a few months, so I let you decide whether to close the PR now and open a new one later.

joanpau commented Apr 28, 2016

No problem with the new changes. I'll merge them, add the missing parts
for the real time processing for completeness (leaving out the retrieval part),
and merge everything into the main branch to close the PR.
The file retrieval can be added later on
(it will be identical to getBasestationFiles for Seagliders, I guess).

joanpau pushed a commit that referenced this pull request May 4, 2016
Integrate support for SeaExplorer gliders, with minimal adaptations:

  - Move oxygen frequency option from CTD to oxygen sensor in data
    preprocessing, and adapt the default preprocessing configuration.

  - Insignificant cosmetic style fixes.

  Close: #3
joanpau commented May 4, 2016

Well, I merged the PR into the develop branch. I think everything is OK, but you may want to test it.
If there are no further modifications, all this will be in the next version of the toolbox,
to be released in the coming days.

@cyrf0006 (Contributor Author) commented:

I think I have found an annoying mistake in the implementation of the SeaExplorer case concerning the geolocation of profiles.

The Lat/Lon variables written in the raw files are the last GPS coordinates recorded. In the case of multiple yos between surfacings, this means that all yos will be stamped with the same lat/lon. This causes the toolbox to impose the same location on different profiles that are clearly not at the same position. Before digging into the code to correct this, what would be the cleverest way to do it in the philosophy of the toolbox?

joanpau commented May 16, 2016

If the glider records the position coordinates only after the GPS fixes at the surface,
then there is no problem, because the missing position coordinates
(all along the down- and up-casts)
are filled by interpolation of the previous and next valid position readings.

If the glider is repeating the last GPS fix when subsurfacing,
then IMHO that is a bug in the glider implementation,
and I would suggest you contact the developers to ask them
to fix the glider firmware so it does not record such nonsense positions.

If, however, you need to handle the latter case, you need a way to distinguish
the valid position readings (generated from a GPS fix at the surface) from the repetitions.
If there is some variable telling you whether the glider is at the surface or underwater,
you can use it as the position status in the preprocessing,
with your custom function or value to discard the invalid readings.
Look at the position_list option of the preprocessing (preprocessGliderData):

      POSITION_LIST: longitude and latitude sensor choices.
        Struct array selecting longitude and latitude sensor sets in order
        of preference, with optional mask of valid position readings.
        It should have the following fields:
          LONGITUDE: longitude sequence name.
          LATITUDE: latitude sequence name.
        It may have the following optional fields (empty or missing):
          STATUS: position status sequence name.
          STATUS_GOOD: position status good values or filter.
          STATUS_BAD:  position status bad values or filter.
          CONVERSION: position coordinate conversion.
            Handle or name of the position coordinate conversion function.
            If present and not empty, the selected longitude and latitude 
            sequences are converted through this function.
          TIME: time component of position reading timestamp.
          DATE: date component of position reading timestamp.
          TIME_CONVERSION: position timestamp conversion.
            Handle or name of the position timestamp conversion function.
            If present and not empty, the selected position time stamp and
            position date stamp sequences are converted through this function
            to a sequence of absolute timestamps.
        Default value: struct('longitude', {'m_gps_lon' 'm_lon'}, ...
                              'latitude', {'m_gps_lat' 'm_lat'}, ...
                              'position_status', {'m_gps_status' []}, ...
                              'position_status_good', {0 []}, ...
                              'position_status_bad', {[] []}, ...
                              'conversion', {@nmea2deg @nmea2deg})
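The nmea2deg conversion referenced in the default above maps NMEA-style DDDMM.mmm values to decimal degrees. A minimal Python sketch of the same arithmetic (illustrative only; the toolbox's nmea2deg is the reference implementation):

```python
import math

def nmea2deg(nmea):
    """Convert an NMEA latitude/longitude (DDDMM.mmm) to decimal degrees,
    mirroring MATLAB's fix(nmea/100) + rem(nmea, 100)/60."""
    whole_degrees = math.trunc(nmea / 100)  # fix(): truncate toward zero
    minutes = math.fmod(nmea, 100)          # rem(): keeps the sign of nmea
    return whole_degrees + minutes / 60

# 3930.5 (39 degrees 30.5 minutes) -> 39.5083...
print(round(nmea2deg(3930.5), 4))
```

Because both fix and rem preserve the sign of the input, southern/western coordinates come out negative without special casing.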

cyrf0006 commented May 18, 2016

Here I re-copy a discussion we had in private about this topic:

Please also note that it would be possible to use the variables NAV_RESOURCE or NavState as position status, with a good value of 116,
to discard both last-GPS-fix repetitions and null-coordinate positions,
and hence sxnmea2deg would not be needed.
The raw data you shared seems to confirm this. Try:

$ awk -F "; " '$5 != "" && $6 != "" {printf "%16s %16s %16s\n", $2, $10, $11}' sea007.78.gli.sub.00* | less
$ awk -F "; " '$5 != "" && $6 != "" {printf "%16s %16s %16s\n", $2, $5, $6}' 78.dat.* | less

If you want to give it a try, you can use:

preprocessing_options.position_list(1).longitude = 'NAV_LONGITUDE';
preprocessing_options.position_list(1).latitude = 'NAV_LATITUDE';
preprocessing_options.position_list(1).position_status = 'NAV_RESOURCE';
preprocessing_options.position_list(1).position_good = 116;
preprocessing_options.position_list(1).conversion = @nmea2deg;
preprocessing_options.position_list(2).longitude = 'Lon';
preprocessing_options.position_list(2).latitude = 'Lat';
preprocessing_options.position_list(2).position_status = 'NavState';
preprocessing_options.position_list(2).position_good = 116;
preprocessing_options.position_list(2).conversion = @nmea2deg;

If you can confirm that it works properly, I will commit the changes before doing the release.

I have done some checks, and unfortunately it seems that right after the
glider switches into State 116, there might be a few seconds where the old GPS
position is left, or (lat, lon) = (0, 0) if a reboot was made for some reason.

Then how about something like this?

preprocessing_options.position_list(1).longitude = 'NAV_LONGITUDE';
preprocessing_options.position_list(1).latitude = 'NAV_LATITUDE';
preprocessing_options.position_list(1).position_status = 'NAV_RESOURCE';
preprocessing_options.position_list(1).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y == 0));
preprocessing_options.position_list(1).conversion = @nmea2deg;
preprocessing_options.position_list(2).longitude = 'Lon';
preprocessing_options.position_list(2).latitude = 'Lat';
preprocessing_options.position_list(2).position_status = 'NavState';
preprocessing_options.position_list(2).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y == 0));
preprocessing_options.position_list(2).conversion = @nmea2deg;

This would certainly solve the problem of (lat,lon) = (0,0). Great.

But some wrong lat/lon from the previous surfacing would remain in the
early stages of state 116. How would the toolbox behave? I guess it would
be problematic, because the data will look like this:

(lat1, lon1) ; state_116 ; no_GPS (wrong data)
(lat1, lon1) ; state_116 ; no_GPS (wrong data)
(lat2, lon2) ; state_116 ; GPS (good data)
(lat2, lon2) ; state_116 ; GPS (good data)
(lat2, lon2) ; state_116 ; GPS (good data)
...
(empty, empty); other_state; no_GPS  (vehicle under water)
...
(lat2, lon2) ; state_116 ; no_GPS (wrong data)
(lat2, lon2) ; state_116 ; no_GPS (wrong data)
(lat3, lon3) ; state_116 ; GPS (good data)
(lat3, lon3) ; state_116 ; GPS (good data)
(lat3, lon3) ; state_116 ; GPS (good data)

So "empty" coordinates will be filled with exactly (lat2,lon2) while they
should lie in between (lat2, lon2) and (lat3,lon3).

cyrf0006 commented May 18, 2016

I also made this test to remove repeated data:

  preprocessing_options.position_list(1).longitude = 'NAV_LONGITUDE';
  preprocessing_options.position_list(1).latitude = 'NAV_LATITUDE';
  preprocessing_options.position_list(1).position_status = 'NAV_RESOURCE';
  preprocessing_options.position_list(1).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y == 0) & ~(diff([x(1); x]) == 0 & diff([y(1); y]) == 0));
  preprocessing_options.position_list(1).conversion = @nmea2deg;

  preprocessing_options.position_list(2).longitude = 'Lon';
  preprocessing_options.position_list(2).latitude = 'Lat';
  preprocessing_options.position_list(2).position_status = 'NavState';
  preprocessing_options.position_list(2).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y == 0) & ~(diff([x(1); x]) == 0 & diff([y(1); y]) == 0));
  preprocessing_options.position_list(2).conversion = @nmea2deg;

This works okay for removing repetitions within the same profile, but it cannot remove repetitions when they are separated by invalid values. The result is still that some different casts are attributed to the same geographic location. The (sloppy) workaround I found is the following:

  preprocessing_options.position_list(1).longitude = 'NAV_LONGITUDE';
  preprocessing_options.position_list(1).latitude = 'NAV_LATITUDE';
  preprocessing_options.position_list(1).position_status = 'NAV_RESOURCE';
  preprocessing_options.position_list(1).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y ==0));
  preprocessing_options.position_list(1).conversion = @sxnmea2deg;

  preprocessing_options.position_list(2).longitude = 'Lon';
  preprocessing_options.position_list(2).latitude = 'Lat';
  preprocessing_options.position_list(2).position_status = 'NavState';
  preprocessing_options.position_list(2).position_good = @(x, y, s)(s == 116 & ~(x == 0 & y ==0));
  preprocessing_options.position_list(2).conversion = @sxnmea2deg;

Here sxnmea2deg has a different functionality than before: it removes repetitions in lat/lon even when they are separated by invalid positions.

function varargout = sxnmea2deg(varargin)
  error(nargchk(1, 2, nargin, 'struct'));

  for varargidx = 1:numel(varargin)
    nmea = varargin{varargidx};
    deg = fix(nmea/100) + rem(nmea,100)/60;
    varargout{varargidx} = deg;
  end

  % Remove repetition in coordinates (no GPS = repeat last position)
  if numel(varargin) > 1
      index_valid = find(~isnan(varargin{1}));
      index_repetition = find(diff(varargin{1}(index_valid)) == 0 & diff(varargin{2}(index_valid)) == 0);
      if ~isempty(index_repetition)
          varargout{1}(index_valid(index_repetition+1)) = NaN;
          varargout{2}(index_valid(index_repetition+1)) = NaN;
          warning('glider_toolbox:sxnmea2deg:RepeatedCoordinates', ...
                  '%d repeated positions replaced by [NaN, NaN].', ...
                  numel(index_repetition));
      end
end

This seems to correct the problem.

@joanpau
Contributor

joanpau commented May 18, 2016

Let me add here for reference some of the final points in the private discussion:

  • Navigation state == 116 is not a sufficient condition to distinguish fix repetitions from real fixes.
  • It is unclear if there is some other variable or condition that signals whether a fix is good or not.
  • If there is no such condition, it is possible to filter out all bad fixes using navigation state first,
    and then remove all repeated fix coordinates.
  • It implies the formal risk of discarding actual fixes that happen to be at the same location,
    or so close that, given the number of decimal digits used in the data file,
    they yield the same NMEA coordinates (although with 3 decimal digits in the NMEA minutes
    field, points more than ~3 meters apart should have distinguishable coordinates at all locations).
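As a quick sanity check on that resolution figure (the constants below are just the NMEA and nautical-mile conventions, not values from the data files):

  % Resolution of the last digit in NMEA ddmm.mmm coordinates:
  % one minute of latitude is one nautical mile (1852 m).
  nmea_step = 0.001;           % smallest representable change, in minutes
  meters_per_minute = 1852;    % one nautical mile
  resolution = nmea_step * meters_per_minute
  % resolution = 1.852, i.e. ~1.85 m per step, so fixes more than
  % ~2-3 m apart in latitude have distinguishable NMEA coordinates.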

From the above observations, and to separate concerns and avoid code repetition,
instead of the proposed modifications to sxnmea2deg I would suggest removing it,
leaving the conversion to the standard nmea2deg and moving the fix check to a specific function:

function valid = sxgoodfix(latitude, longitude, state)
  valid = (state == 116) ...
        & ~isnan(latitude) & ~isnan(longitude) ...
        & ~(latitude == 0 & longitude == 0);
  valid_indices = find(valid);
  valid(valid_indices(2:end)) = ...
    diff(latitude(valid)) ~= 0 | diff(longitude(valid)) ~= 0;  % repeated only if both match
end
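As an illustration, here is how sxgoodfix behaves on a toy fix sequence (hypothetical values, navigation state 116 throughout): the repeated fix and the (0, 0) filler are both rejected.

  % Toy example (hypothetical values): the 3rd sample repeats the
  % 2nd fix exactly, and the 4th is a (0, 0) filler.
  lat   = [3930.420; 3930.480; 3930.480; 0; 3930.550];
  lon   = [ 230.100;  230.150;  230.150; 0;  230.200];
  state = [     116;      116;      116;    116; 116];
  valid = sxgoodfix(lat, lon, state)
  % valid = [1; 1; 0; 0; 1]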

The configuration to perform that good fix filtering would be something like this:

  preprocessing_options.position_list(1).longitude = 'NAV_LONGITUDE';
  preprocessing_options.position_list(1).latitude = 'NAV_LATITUDE';
  preprocessing_options.position_list(1).position_status = 'NAV_RESOURCE';
  preprocessing_options.position_list(1).position_good = @sxgoodfix;
  preprocessing_options.position_list(1).conversion = @nmea2deg;
  preprocessing_options.position_list(2).longitude = 'Lon';
  preprocessing_options.position_list(2).latitude = 'Lat';
  preprocessing_options.position_list(2).position_status = 'NavState';
  preprocessing_options.position_list(2).position_good = @sxgoodfix;
  preprocessing_options.position_list(2).conversion = @nmea2deg;

I am not sure whether that should be in the default configuration or not.
I guess that if the problem is common right now (say present in almost every deployment), then yes.
Otherwise it can be left as a suggestion in a comment in the default configuration.

Trials and opinions welcome.

@cyrf0006
Contributor Author

cyrf0006 commented May 18, 2016

I confirm that the last proposition works on several deployments that I have: the problems encountered in the earlier version, where multiple casts were associated with the same geographic location, are gone. I suggest keeping this as the default until new firmware is implemented in the SeaExplorer.

Do you prefer implementing the changes yourself (e.g. writing the header for sxgoodfix), or should I do it?

@joanpau
Contributor

joanpau commented May 19, 2016

It is already in develop (091b906).

@joanpau joanpau merged commit 0e3a3e8 into socib:master Jun 6, 2016