
AMDA get_data never returns on long requests #40

Closed
Dolgalad opened this issue Apr 15, 2022 · 7 comments · Fixed by #41
Labels: enhancement (New feature or request)
@Dolgalad (Contributor)

  • Space Physics WebServices Client version: 0.10.1
  • Python version: 3.8.10
  • Operating System: Ubuntu

Description

AMDA moves requests that exceed its timeout into background jobs. Speasy is not notified of this, so when retrieving a large dataset the get_data method may never return: it waits indefinitely.

What I Did

import speasy as spz
param_id = "amda/solo_b_rtn_hr"
start = "2020/01/01T00:00:00"
stop = "2021/01/01T00:00:00"
p = spz.get_data(param_id, start, stop)

Solution

I modified the dl_parameter function in the speasy.webservices.amda._impl module:

import numpy as np
from datetime import timedelta
....
    def parameter_concat(self, param1, param2):
        """Concatenate two parameters along the time axis.
        """
        if param1 is None and param2 is None:
            return None
        if param1 is None:
            return param2
        if param2 is None:
            return param1
        param1.time = np.hstack((param1.time, param2.time))
        # concatenate along axis 0 so multi-component (2-D) data stays aligned
        # with the time axis; np.hstack would join 2-D arrays column-wise
        param1.data = np.concatenate((param1.data, param2.data), axis=0)
        return param1

    def dl_parameter(self, start_time: datetime, stop_time: datetime, parameter_id: str, **kwargs) -> Optional[
        SpeasyVariable]:
        if isinstance(start_time, datetime):
            start_time = start_time.timestamp()
        if isinstance(stop_time, datetime):
            stop_time = stop_time.timestamp()
        dt = timedelta(days=1).total_seconds()
        if stop_time - start_time > dt:
            # split long requests into one-day blocks so each stays below the
            # AMDA timeout, then concatenate the results
            var = None
            curr_t = start_time
            while curr_t < stop_time:
                block_stop = min(curr_t + dt, stop_time)
                var = self.parameter_concat(var, self.dl_parameter(curr_t, block_stop, parameter_id, **kwargs))
                curr_t += dt
            return var

        url = rest_client.get_parameter(
            startTime=start_time, stopTime=stop_time, parameterID=parameter_id, timeFormat='UNIXTIME',
            server_url=self.server_url, **kwargs)
        if url is not None:
            var = load_csv(url)
            if len(var):
                log.debug(
                    f'Loaded var: data shape = {var.values.shape}, data start time = {datetime.utcfromtimestamp(var.time[0])}, data stop time = {datetime.utcfromtimestamp(var.time[-1])}')
            else:
                log.debug('Loaded var: Empty var')
            return var
        return None
@jeandet (Member) commented Apr 16, 2022

Reading this issue makes me think about introducing a max_request_duration parameter, defaulting to 1 day. Ideally we would cap the data size per request, but that is really hard to estimate in advance. Letting the user override this value allows increasing it for really slow datasets, where a request spanning several days would be OK and likely more efficient.
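A minimal sketch of what such a parameter could look like, assuming a hypothetical split_request helper (the name and signature are illustrative, not Speasy's actual API): it cuts a time range into blocks no longer than max_request_duration, which a caller could then fetch one at a time.

```python
from datetime import datetime, timedelta

def split_request(start: datetime, stop: datetime,
                  max_request_duration: timedelta = timedelta(days=1)):
    """Split [start, stop) into consecutive blocks, each no longer
    than max_request_duration (illustrative helper)."""
    blocks = []
    curr = start
    while curr < stop:
        block_stop = min(curr + max_request_duration, stop)
        blocks.append((curr, block_stop))
        curr = block_stop
    return blocks
```

With the default of one day, a 2.5-day range would yield three blocks, the last one shorter than the others.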

@brenard-irap

In AMDA, when a "getParameter" request takes longer than 4 minutes, execution enters batch mode.

In that case, the result will look like:

{
    "success": true,
    "status": "in progress",
    "id": "process_ucuGXR_1650348560_252656"
}

In that case, the getStatus API can be used to retrieve the status.
It can be polled until the request completes. When it is done, the result should be:

{
    "success": true,
    "status": "done",
    "dataFileURLs": "http://amda.irap.omp.eu/AMDA//data/WSRESULT/getparameter_mms1_dce_qual_brst_35b6786739efcdc5a74ab1dca29d3b6b_20210101T000000_20210102T000000.txt"
}

It seems that Speasy does not implement this scenario.

For information, our AMDA backend sits behind a proxy with a 5-minute timeout. This is why we switch to "batch mode" when the execution of a request takes too long.
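The getStatus workflow described above could be handled client-side with a polling loop along these lines. This is a hedged sketch, not Speasy's implementation: the get_status argument stands in for whatever callable actually hits AMDA's getStatus endpoint, and only the reply shapes shown in the comment above are assumed.

```python
import time

def poll_until_done(get_status, poll_interval=1.0, max_wait=600.0):
    """Poll a getStatus-style callable until the batch job reports 'done',
    then return the data file URL (sketch; endpoint wiring not included).

    get_status must return a dict shaped like AMDA's reply, e.g.
    {"success": True, "status": "in progress"} or
    {"success": True, "status": "done", "dataFileURLs": "..."}.
    """
    waited = 0.0
    while waited < max_wait:
        reply = get_status()
        if not reply.get("success"):
            raise RuntimeError(f"getStatus reported failure: {reply}")
        if reply.get("status") == "done":
            return reply["dataFileURLs"]
        # job still "in progress": wait before asking again
        time.sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"batch job still in progress after {max_wait}s")
```

The poll_interval/max_wait trade-off is the usual one: short intervals give faster completion detection at the cost of more requests to the server.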

@jeandet (Member) commented Apr 19, 2022

@brenard-irap can we use this from the REST API now?

@brenard-irap

@jeandet Yes

@jeandet (Member) commented Apr 19, 2022

Ok, I propose to work on that during next week's workshop.

@jeandet jeandet added the enhancement New feature or request label Apr 19, 2022
@Dolgalad (Contributor, Author)

Keep in mind that when the timeout is reached, the "batch mode" task is created on the server. This means that if a user interrupts Speasy while it is fetching data, the task keeps running on the server; this is why I don't like the timeout solution.
Splitting the time range into intervals means that if the user interrupts the process, only a single block of data has been requested from the server.
Splitting the data also provides a natural way to report the progress of the request to the user (functionality I find useful when dealing with long time periods).

Another simple way of dealing with this problem is to raise an exception when a timeout is reached. The timeout value needs to be smaller than the 4 minutes used by AMDA.
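A client-side timeout along those lines could be sketched as below, assuming a hypothetical fetch_with_timeout wrapper (not part of Speasy) around the blocking download call. Note the caveat from the comment above still applies: timing out on the client does not stop work already started on the server.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# AMDA switches a request to batch mode after 4 minutes, so the client-side
# timeout must stay below this threshold (per the discussion above)
AMDA_BATCH_THRESHOLD = 240.0  # seconds

def fetch_with_timeout(fetch, timeout=200.0):
    """Run the blocking fetch() callable in a worker thread and raise
    TimeoutError if it takes longer than `timeout` seconds (sketch)."""
    if timeout >= AMDA_BATCH_THRESHOLD:
        raise ValueError("timeout must stay below AMDA's 4-minute threshold")
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(
            f"request exceeded {timeout}s; consider splitting the time range")
    finally:
        # do not block on the still-running worker; the underlying request
        # may keep running, mirroring the server-side behaviour noted above
        pool.shutdown(wait=False)
```

This only bounds how long the caller waits; combining it with interval splitting, as the patch above does, is what actually keeps each server-side request small.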

@Dolgalad (Contributor, Author) commented Apr 19, 2022

PR #41

@jeandet jeandet linked a pull request Jun 27, 2022 that will close this issue