
AMDA get_data never returns on long requests #40

Closed
Dolgalad opened this issue Apr 15, 2022 · 7 comments · Fixed by #41
Labels: enhancement (New feature or request)
@Dolgalad (Contributor)

  • Space Physics WebServices Client version: 0.10.1
  • Python version: 3.8.10
  • Operating System: Ubuntu

Description

AMDA moves requests that exceed its timeout into background jobs. Speasy is not notified of this, so when retrieving a large dataset the get_data method may never return: it waits indefinitely.

What I Did

import speasy as spz
param_id = "amda/solo_b_rtn_hr"
start = "2020/01/01T00:00:00"
stop = "2021/01/01T00:00:00"
p = spz.get_data(param_id, start, stop)

Solution

I modified the dl_parameter function in the speasy.webservices.amda._impl module:

import numpy as np
from datetime import timedelta
....
    def parameter_concat(self, param1, param2):
        """Concatenate two parameters along the time axis.
        """
        if param1 is None and param2 is None:
            return None
        if param1 is None:
            return param2
        if param2 is None:
            return param1
        param1.time = np.hstack((param1.time, param2.time))
        # concatenate along axis 0 so multi-component (2-D) data stays aligned
        # with the time axis; np.hstack would join 2-D arrays column-wise
        param1.data = np.concatenate((param1.data, param2.data), axis=0)
        return param1

    def dl_parameter(self, start_time: datetime, stop_time: datetime, parameter_id: str, **kwargs) -> Optional[
        SpeasyVariable]:
        if isinstance(start_time, datetime):
            start_time = start_time.timestamp()
        if isinstance(stop_time, datetime):
            stop_time = stop_time.timestamp()
        dt = timedelta(days=1).total_seconds()
        if stop_time - start_time > dt:
            # split long requests into one-day blocks so each stays below the
            # AMDA timeout, then concatenate the results
            var = None
            curr_t = start_time
            while curr_t < stop_time:
                block_stop = min(curr_t + dt, stop_time)
                var = self.parameter_concat(var, self.dl_parameter(curr_t, block_stop, parameter_id, **kwargs))
                curr_t += dt
            return var

        url = rest_client.get_parameter(
            startTime=start_time, stopTime=stop_time, parameterID=parameter_id, timeFormat='UNIXTIME',
            server_url=self.server_url, **kwargs)
        if url is not None:
            var = load_csv(url)
            if len(var):
                log.debug(
                    f'Loaded var: data shape = {var.values.shape}, data start time = {datetime.utcfromtimestamp(var.time[0])}, data stop time = {datetime.utcfromtimestamp(var.time[-1])}')
            else:
                log.debug('Loaded var: Empty var')
            return var
        return None
@jeandet (Member) commented Apr 16, 2022

Reading this issue makes me think about introducing a max_request_duration parameter, defaulting to 1 day. Ideally we would cap the data size per request, but that is really hard to estimate in advance. Letting the user override this value allows increasing it for really slow datasets, where a request spanning several days would be OK and likely more efficient.
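A minimal sketch of what such a parameter could look like, assuming a hypothetical split_request helper (the name and signature are illustrative, not Speasy's actual API): it cuts a time range into blocks no longer than max_request_duration, which a caller could then fetch one at a time.

```python
from datetime import datetime, timedelta

def split_request(start: datetime, stop: datetime,
                  max_request_duration: timedelta = timedelta(days=1)):
    """Split [start, stop) into consecutive blocks, each no longer
    than max_request_duration (illustrative helper)."""
    blocks = []
    curr = start
    while curr < stop:
        block_stop = min(curr + max_request_duration, stop)
        blocks.append((curr, block_stop))
        curr = block_stop
    return blocks
```

With the default of one day, a 2.5-day range would yield three blocks, the last one shorter than the others.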

@brenard-irap

In AMDA, when a "getParameter" request takes longer than 4 minutes, execution enters batch mode.

In that case, the result will look like:

{
    "success": true,
    "status": "in progress",
    "id": "process_ucuGXR_1650348560_252656"
}

In that case, the getStatus API can be used to retrieve the status.
It can be polled until the request completes. When it is done, the result should be:

{
    "success": true,
    "status": "done",
    "dataFileURLs": "http://amda.irap.omp.eu/AMDA//data/WSRESULT/getparameter_mms1_dce_qual_brst_35b6786739efcdc5a74ab1dca29d3b6b_20210101T000000_20210102T000000.txt"
}

It seems that Speasy does not implement this scenario.

For information, our AMDA backend sits behind a proxy with a 5-minute timeout. This is why we switch to "batch mode" when the execution of a request takes too long.
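The getStatus workflow described above could be handled client-side with a polling loop along these lines. This is a hedged sketch, not Speasy's implementation: the get_status argument stands in for whatever callable actually hits AMDA's getStatus endpoint, and only the reply shapes shown in the comment above are assumed.

```python
import time

def poll_until_done(get_status, poll_interval=1.0, max_wait=600.0):
    """Poll a getStatus-style callable until the batch job reports 'done',
    then return the data file URL (sketch; endpoint wiring not included).

    get_status must return a dict shaped like AMDA's reply, e.g.
    {"success": True, "status": "in progress"} or
    {"success": True, "status": "done", "dataFileURLs": "..."}.
    """
    waited = 0.0
    while waited < max_wait:
        reply = get_status()
        if not reply.get("success"):
            raise RuntimeError(f"getStatus reported failure: {reply}")
        if reply.get("status") == "done":
            return reply["dataFileURLs"]
        # job still "in progress": wait before asking again
        time.sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"batch job still in progress after {max_wait}s")
```

The poll_interval/max_wait trade-off is the usual one: short intervals give faster completion detection at the cost of more requests to the server.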

@jeandet (Member) commented Apr 19, 2022

@brenard-irap can we use this from the REST API now?

@brenard-irap

@jeandet Yes

@jeandet (Member) commented Apr 19, 2022

Ok, I propose to work on that during next week's workshop.

@jeandet jeandet added the enhancement New feature or request label Apr 19, 2022
@Dolgalad (Contributor, Author)

Keep in mind that when the timeout is reached, the "batch mode" task is created on the server. This means that if a user interrupts Speasy while it is fetching data, the task keeps running on the server; this is why I don't like the timeout solution.
Splitting the time range into intervals means that if the user interrupts the process, only a single block of data has been requested from the server.
Splitting the data also provides a natural way to report the progress of the request to the user (functionality I find useful when dealing with long time periods).

Another simple way of dealing with this problem is to raise an exception when a timeout is reached. The timeout value needs to be smaller than the 4 minutes used by AMDA.
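A client-side timeout along those lines could be sketched as below, assuming a hypothetical fetch_with_timeout wrapper (not part of Speasy) around the blocking download call. Note the caveat from the comment above still applies: timing out on the client does not stop work already started on the server.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# AMDA switches a request to batch mode after 4 minutes, so the client-side
# timeout must stay below this threshold (per the discussion above)
AMDA_BATCH_THRESHOLD = 240.0  # seconds

def fetch_with_timeout(fetch, timeout=200.0):
    """Run the blocking fetch() callable in a worker thread and raise
    TimeoutError if it takes longer than `timeout` seconds (sketch)."""
    if timeout >= AMDA_BATCH_THRESHOLD:
        raise ValueError("timeout must stay below AMDA's 4-minute threshold")
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(
            f"request exceeded {timeout}s; consider splitting the time range")
    finally:
        # do not block on the still-running worker; the underlying request
        # may keep running, mirroring the server-side behaviour noted above
        pool.shutdown(wait=False)
```

This only bounds how long the caller waits; combining it with interval splitting, as the patch above does, is what actually keeps each server-side request small.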

@Dolgalad (Contributor, Author) commented Apr 19, 2022

PR #41

@jeandet jeandet linked a pull request Jun 27, 2022 that will close this issue