diff --git a/README.md b/README.md index 85700e0..9e241cf 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,10 @@ # LOBFrame -We release `LOBFrame' (see the [paper](https://arxiv.org/abs/2403.09267v1)), a novel, open-source code base which presents a renewed way to process large-scale Limit Order Book (LOB) data. This framework integrates all the latest cutting-edge insights from scientific research (see [Lucchese et al.](https://www.sciencedirect.com/science/article/pii/S0169207024000062), [Prata et al.](https://arxiv.org/pdf/2308.01915.pdf)) into a cohesive system. Its strength lies in the comprehensive nature of the implemented pipeline, which includes the data transformation and processing stage, an ultra-fast implementation of the training, validation, and testing steps, as well as the evaluation of the quality of a model's outputs through trading simulations. Moreover, it offers flexibility by accommodating the integration of new models, ensuring adaptability to future advancements in the field. +We release `LOBFrame' (see the two papers [`Deep Limit Order Book Forecasting'](https://arxiv.org/abs/2403.09267) and [`HLOB - Information Persistence and Structure in Limit Order Books'](https://arxiv.org/abs/2405.18938)), a novel, open-source code base which presents a renewed way to process large-scale Limit Order Book (LOB) data. This framework integrates all the latest cutting-edge insights from scientific research (see [Lucchese et al.](https://www.sciencedirect.com/science/article/pii/S0169207024000062), [Prata et al.](https://arxiv.org/pdf/2308.01915.pdf)) into a cohesive system. Its strength lies in the comprehensive nature of the implemented pipeline, which includes the data transformation and processing stage, an ultra-fast implementation of the training, validation, and testing steps, as well as the evaluation of the quality of a model's outputs through trading simulations. Moreover, it offers flexibility by accommodating the integration of new models, ensuring adaptability to future advancements in the field. ## Introduction -In this tutorial, we show how to replicate the experiments presented in the paper titled __"Deep Limit Order Book Forecasting: A microstructural guide"__. +In this tutorial, we show how to replicate the experiments presented in the two papers titled __"Deep Limit Order Book Forecasting: A microstructural guide"__ and __"HLOB - Information Persistence and Structure in Limit Order Books"__. Before starting, please remember to **ALWAYS CITE OUR WORK** as follows: @@ -17,6 +17,17 @@ Before starting, please remember to **ALWAYS CITE OUR WORK** as follows: } ``` +``` +@misc{briola2024hlob, + title={HLOB -- Information Persistence and Structure in Limit Order Books}, + author={Antonio Briola and Silvia Bartolucci and Tomaso Aste}, + year={2024}, + eprint={2405.18938}, + archivePrefix={arXiv}, + primaryClass={q-fin.TR} +} +``` + ## Pre-requisites Install the required packages: @@ -25,6 +36,12 @@ Install the required packages: pip3 install -r requirements.txt ``` +If you are using macOS, please proceed as follows: + +```bash +pip3 install -r requirements_mac_os.txt +``` + ## Data All the code in this repository exploits [LOBSTER](https://lobsterdata.com) data. To have an overview on their structure, please refer to the official documentation available at the following [link](https://lobsterdata.com/info/DataStructure.php).
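For readers unfamiliar with the LOBSTER format, the following is a minimal, illustrative sketch (not part of the repository) of how a 10-level orderbook file could be loaded and labelled. The column ordering follows the LOBSTER documentation; the file name and the number of levels are placeholders.

```python
import pandas as pd

levels = 10
# LOBSTER orderbook files have no header; columns alternate ask/bid price and size per level.
columns = []
for lvl in range(1, levels + 1):
    columns += [f"ask_price_{lvl}", f"ask_size_{lvl}", f"bid_price_{lvl}", f"bid_size_{lvl}"]

orderbook = pd.read_csv("CSCO_orderbook_10.csv", header=None, names=columns)
print(orderbook.head())
```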
@@ -49,10 +66,26 @@ To start an experiment from scratch, you need to follow these steps: ```bash python3 main --training_stocks "CSCO" --target_stocks "CSCO" --stages "torch_dataset_preparation,torch_dataset_preparation_backtest" --prediction_horizon 10 ``` +- If you are planning to use the HLOB model (see the paper titled [`HLOB - Information Persistence and Structure in Limit Order Books'](https://arxiv.org/abs/2405.18938)), it is mandatory to execute the following command: + ```bash + python3 main --training_stocks "CSCO" --target_stocks "CSCO" --stages "complete_homological_structures_preparation" + ``` - Run the following command to train the model: ```bash python3 main --training_stocks "CSCO" --target_stocks "CSCO" --stages "training" ``` + Please note that the currently available models are: + - deeplob + - transformer + - itransformer + - lobtransformer + - dla + - cnn1 + - cnn2 + - binbtabl + - binctabl + - axiallob + - hlob - Run the following command to evaluate the model: ```bash python3 main --training_stocks "CSCO" --target_stocks "CSCO" --experiment_id "" --stages "evaluation" @@ -90,6 +123,7 @@ We now provide the typical structure of a folder before an experiment's run: ├── data_processing │   ├── data_process.py │   └── data_process_utils.py +│   └── complete_homological_utils.py ├── loaders │   └── custom_dataset.py ├── loggers @@ -118,6 +152,8 @@ We now provide the typical structure of a folder before an experiment's run: │   └── tabl_layer.py │   ├── Transformer │   └── transformer.py +│   ├── CompleteHCNN +│   └── complete_hcnn.py ├── optimizers │   ├── executor.py │   └── lightning_batch_gd.py diff --git a/data_processing/complete_homological_utils.py b/data_processing/complete_homological_utils.py new file mode 100644 index 0000000..ba09695 --- /dev/null +++ b/data_processing/complete_homological_utils.py @@ -0,0 +1,321 @@ +import glob +import concurrent.futures +from itertools import chain + +import pandas as pd +import numpy as np +import polars as pl +from typing import * + +import networkx as nx +from fast_tmfg import * +from sklearn.metrics import mutual_info_score + +from utils import get_training_test_stocks_as_string +import matplotlib.pyplot as plt +import seaborn as sns + +import torch + + +def compute_pairwise_mi(df: pd.DataFrame, n_bins: int = 3000) -> pd.DataFrame: + """ + Compute the pairwise mutual information between the columns of a dataframe. + + Parameters + ---------- + df : pandas.DataFrame + The pandas dataframe to compute the pairwise mutual information for. + n_bins: int + The number of bins to use for discretization. + + Returns + ---------- + mi_matrix: pandas.DataFrame + The pairwise mutual information matrix. + + """ + + shuffled_df = df.sample(frac=1, random_state=1).reset_index(drop=True) # Shuffle the dataset. + sampled_df = shuffled_df.sample(n=len(df), replace=True) # Perform bootstrapping. + df = sampled_df.copy() # Copy the dataset into a variable called 'df'. + df.reset_index(drop=True, inplace=True) # Reset the indices. + del sampled_df # Delete an unused variable. + + flat_series = df.values.flatten() # Flatten the df to perform a binning on all the values (not feature-by-feature). + bins = pd.cut(flat_series, bins=n_bins, labels=False, retbins=True) # Perform the binning. + # Apply the binning to each feature of the original dataset. + for column in df.columns: + df[column] = pd.cut(df[column], bins=bins[1], labels=False, include_lowest=True) + del flat_series # Delete an unused variable.
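+    # Clarifying note: the bin edges are estimated once on the flattened matrix (bins[1]), so every
+    # volume column is discretised on the same global grid rather than feature-by-feature before the
+    # pairwise mutual information is computed.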
+ + discretized_df = df.copy() # Copy the dataset into a variable called 'discretized_df'. + del df # Delete an unused variable. + + # Initialize an empty Mutual Information (MI) matrix and fill it with 0s. + n_features = discretized_df.shape[1] + mi_matrix = np.zeros((n_features, n_features)) + + # Compute the pairwise MI and fill the MI matrix consequently. + for i in range(n_features): + for j in range(i, n_features): + mi_value = mutual_info_score( + discretized_df.iloc[:, i], discretized_df.iloc[:, j] + ) + mi_matrix[i, j] = mi_value + mi_matrix[j, i] = mi_value + + mi_matrix = pd.DataFrame(mi_matrix) # Transform the MI matrix into a Pandas dataframe. + return mi_matrix # Return the MI matrix in the form of a Pandas dataframe. + + +def process_file( + file: str, +) -> Tuple[pd.DataFrame, nx.Graph, str]: + """ + Compute the TMFG for volumes of a given orderbook file. + + Parameters + ---------- + file : str + The path to the file to compute the TMFG for. + + Returns + ---------- + sim_all : pandas.DataFrame + The pairwise mutual information matrix for the ask and bid volumes. + net_all : networkx.Graph + The TMFG for the ask and bid volumes. + file : str + The path of the processed file. + """ + + print(f"Computing structure for file: {file}...") + # Read the file using polars to accelerate the process. + df = pl.read_csv(file) + df = df.to_pandas() + + # Extract the volumes for the ask and bid sides. + volumes_all = df.iloc[:, 1:41].iloc[:, 1::2] + + # Compute the pairwise mutual information matrix. + sim_all = compute_pairwise_mi(volumes_all) + + # Compute the TMFG. + model_all = TMFG() + cliques_all, seps_all, adj_matrix_all = model_all.fit_transform( + sim_all, output="weighted_sparse_W_matrix" + ) + + # Convert the adjacency matrix to a networkx graph. + net_all = nx.from_numpy_array(adj_matrix_all) + + return sim_all, net_all, file + + +def mean_tmfg(sm_list: List[pd.DataFrame]) -> pd.DataFrame: + """ + Compute the average similarity matrix for a list of similarity matrices. + + Parameters + ---------- + sm_list : List[pandas.DataFrame] + The list of similarity matrices to compute the average for. + + Returns + ---------- + average_matrix : pandas.DataFrame + The average similarity matrix. + """ + + # Stack the matrices along a new axis (axis=0) + stacked_matrices = np.stack(sm_list, axis=0) + + # Calculate the entry-wise average along the new axis + average_matrix = np.mean(stacked_matrices, axis=0) + np.fill_diagonal(average_matrix, 0) + + average_matrix = pd.DataFrame(average_matrix) + + ''' + plt.figure(figsize=(10, 8)) # Optional: Adjusts the size of the figure + sns.heatmap(average_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True, linewidths=.5) + plt.title("Correlation Matrix Heatmap") + plt.show() + ''' + + return average_matrix + + +def extract_components( + cliques: List[List[int]], separators: List[List[int]], adjacency_matrix: np.ndarray +) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]: + """ + Given the cliques, separators and adjacency matrix of a TMFG, extract the b-cliques of size 2 (edges), 3 (triangles) and 4 (tetrahedra). + + Parameters + ---------- + cliques : List[List[int]] + The list of cliques of the TMFG.
+ separators : List[int] + The list of separators of the TMFG. + adjacency_matrix : numpy.ndarray + The adjacency matrix of the TMFG. + + Returns + ---------- + final_b_cliques_4 : List[List[int]] + The final list of tetrahera. + final_b_cliques_3 : List[List[int]] + The final list of triangles. + final_b_cliques_2 : List[List[int]] + The final list of edges. + """ + + # Extract edges. + edges = [] + adjacency_matrix = nx.from_numpy_array(adjacency_matrix) + + for i in nx.enumerate_all_cliques(adjacency_matrix): + if len(i) == 2: + edges.append(sorted(i)) + + b_cliques_4 = [] + b_cliques_3 = [] + b_cliques_2 = [] + + b_cliques_all = nx.enumerate_all_cliques(adjacency_matrix) + + for i in b_cliques_all: + if len(i) == 2: + b_cliques_2.append(sorted(i)) + if len(i) == 3: + b_cliques_3.append(sorted(i)) + if len(i) == 4: + b_cliques_4.append(sorted(i)) + + final_b_cliques_4 = b_cliques_4 + + final_b_cliques_3 = b_cliques_3 + + final_b_cliques_2 = edges + + final_b_cliques_4 = [[(x * 2) + 1 for x in sublist] for sublist in final_b_cliques_4] + final_b_cliques_4 = [[x, x - 1] for sublist in final_b_cliques_4 for x in sublist] + final_b_cliques_4 = list(chain.from_iterable(final_b_cliques_4)) + final_b_cliques_4 = [final_b_cliques_4[i:i + 8] for i in range(0, len(final_b_cliques_4), 8)] + final_b_cliques_4 = [sorted(sublist) for sublist in final_b_cliques_4] + + final_b_cliques_3 = [[(x * 2) + 1 for x in sublist] for sublist in final_b_cliques_3] + final_b_cliques_3 = [[x, x - 1] for sublist in final_b_cliques_3 for x in sublist] + final_b_cliques_3 = list(chain.from_iterable(final_b_cliques_3)) + final_b_cliques_3 = [final_b_cliques_3[i:i + 6] for i in range(0, len(final_b_cliques_3), 6)] + final_b_cliques_3 = [sorted(sublist) for sublist in final_b_cliques_3] + + final_b_cliques_2 = [[(x * 2) + 1 for x in sublist] for sublist in final_b_cliques_2] + final_b_cliques_2 = [[x, x - 1] for sublist in final_b_cliques_2 for x in sublist] + final_b_cliques_2 = list(chain.from_iterable(final_b_cliques_2)) + final_b_cliques_2 = [final_b_cliques_2[i:i + 4] for i in range(0, len(final_b_cliques_2), 4)] + final_b_cliques_2 = [sorted(sublist) for sublist in final_b_cliques_2] + + return final_b_cliques_4, final_b_cliques_3, final_b_cliques_2 + + +def execute_pipeline(file_patterns, general_hyperparameters): + files = [] + for pattern in file_patterns: + files.extend(glob.glob(pattern.format(dataset={general_hyperparameters['dataset']}))) + + max_threads = 5 + with concurrent.futures.ThreadPoolExecutor(max_threads) as executor: + results = list(executor.map(process_file, files)) + + nets_all = [] + sm_all = [] + files_all = [] + + for result in results: + sim_all, net_all, file = result + nets_all.append(net_all) + sm_all.append(sim_all) + files_all.append(file) + + del results + + model_all = TMFG() + cliques_all, seps_all, adj_matrix_all = model_all.fit_transform( + mean_tmfg(sm_all), output="weighted_sparse_W_matrix" + ) + + c4, c3, c2 = extract_components(cliques_all, seps_all, adj_matrix_all) + c4 = list(chain.from_iterable(c4)) + c3 = list(chain.from_iterable(c3)) + c2 = list(chain.from_iterable(c2)) + + original_cliques_all = list(chain.from_iterable(cliques_all)) + original_seps_all = list(chain.from_iterable(seps_all)) + + return c4, c3, c2, original_cliques_all, original_seps_all, adj_matrix_all, sm_all, files_all + + +def get_complete_homology( + general_hyperparameters: Dict[str, Any], + model_hyperparameters: Dict[str, Any], +) -> Dict[str, List[List[int]]]: + """ + Compute the homological 
structures to be used in the HCNN building process. + + Parameters + ---------- + general_hyperparameters : Dict[str, Any] + The general hyperparameters of the experiment. + + Returns + ---------- + homological_structures : Dict[str, List[List[int]]] + """ + + file_patterns_training = [f"./data/{general_hyperparameters['dataset']}/unscaled_data/training/*{element}*.csv" for element in + general_hyperparameters['training_stocks']] + c4_training, c3_training, c2_training, original_cliques_all_training, original_seps_all_training, adj_matrix_all_training, sm_all_training, files_all_training = execute_pipeline( + file_patterns_training, general_hyperparameters) + + file_patterns_validation = [f"./data/{general_hyperparameters['dataset']}/unscaled_data/validation/*{element}*.csv" for element in + general_hyperparameters['training_stocks']] + _, _, _, _, _, adj_matrix_all_validation, sm_all_validation, files_all_validation = execute_pipeline(file_patterns_validation, general_hyperparameters) + + file_patterns_test = [f"./data/{general_hyperparameters['dataset']}/unscaled_data/test/*{element}*.csv" for element in + general_hyperparameters['target_stocks']] + _, _, _, _, _, adj_matrix_all_test, sm_all_test, files_all_test = execute_pipeline(file_patterns_test, general_hyperparameters) + + homological_structures = {"tetrahedra": c4_training, + "triangles": c3_training, + "edges": c2_training, + "original_cliques": original_cliques_all_training, + "original_separators": original_seps_all_training, + "adj_matrix_training": adj_matrix_all_training, + "similarity_matrices_training": sm_all_training, + "files_training": files_all_training, + "adj_matrix_validation": adj_matrix_all_validation, + "similarity_matrices_validation": sm_all_validation, + "files_validation": files_all_validation, + "adj_matrix_test": adj_matrix_all_test, + "similarity_matrices_test": sm_all_test, + "files_test": files_all_test + } + + training_stocks_string, test_stocks_string = get_training_test_stocks_as_string(general_hyperparameters) + print(training_stocks_string, test_stocks_string) + torch.save(homological_structures, + f"./torch_datasets/threshold_{model_hyperparameters['threshold']}/batch_size_{model_hyperparameters['batch_size']}/training_{training_stocks_string}_test_{test_stocks_string}/complete_homological_structures.pt") + # torch.save(homological_structures, + # f"./torch_datasets/threshold_{model_hyperparameters['threshold']}/batch_size_{model_hyperparameters['batch_size']}/homological_structures_large_tick_stocks.pt") + print('Homological structures have been saved.') + +# get_homology({'dataset': 'nasdaq'}) diff --git a/main.py b/main.py index ffb3b4e..c2d80d9 100644 --- a/main.py +++ b/main.py @@ -9,6 +9,7 @@ parse_args, create_hyperparameters_yaml, ) +from data_processing.complete_homological_utils import get_complete_homology if __name__ == "__main__": # Parse input arguments. @@ -85,6 +86,9 @@ experiment_id, general_hyperparameters, model_hyperparameters, torch_dataset_preparation=False, torch_dataset_preparation_backtest=True ) + if "complete_homological_structures_preparation" in general_hyperparameters["stages"]: + get_complete_homology(general_hyperparameters=general_hyperparameters, model_hyperparameters=model_hyperparameters) + # For the 'training' and 'evaluation' stages, instantiate the executor with proper arguments. 
if ( "training" in general_hyperparameters["stages"] diff --git a/models/CompleteHCNN/complete_hcnn.py b/models/CompleteHCNN/complete_hcnn.py new file mode 100644 index 0000000..5448ece --- /dev/null +++ b/models/CompleteHCNN/complete_hcnn.py @@ -0,0 +1,135 @@ +import pytorch_lightning as pl +import torch +import torch.nn as nn + + +class Complete_HCNN(pl.LightningModule): + def __init__(self, lighten, homological_structures): + super().__init__() + self.name = "hcnn" + if lighten: + self.name += "-lighten" + + self.homological_structures = homological_structures + self.tetrahedra = self.homological_structures['tetrahedra'] + self.triangles = self.homological_structures['triangles'] + self.edges = self.homological_structures['edges'] + + # ------------ # + + self.conv1_tetrahedra = nn.Sequential( + nn.Conv2d( + in_channels=1, out_channels=32, kernel_size=(1, 2), stride=(1, 2) + ), + nn.ReLU(), + ) + + self.conv1_triangles = nn.Sequential( + nn.Conv2d( + in_channels=1, out_channels=32, kernel_size=(1, 2), stride=(1, 2) + ), + nn.ReLU(), + ) + + self.conv1_edges = nn.Sequential( + nn.Conv2d( + in_channels=1, out_channels=32, kernel_size=(1, 2), stride=(1, 2) + ), + nn.ReLU(), + ) + + # ------------ # + + self.conv2_tetrahedra = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, 4), stride=(1, 4) + ), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + ) + + self.conv2_triangles = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, 3), stride=(1, 3) + ), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + ) + + self.conv2_edges = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, 2), stride=(1, 2) + ), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(4, 1)), + nn.ReLU(), + ) + + # ------------ # + + self.conv3_tetrahedra = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, int(len(self.tetrahedra) / 8)) + ), + nn.Dropout(0.35), + nn.ReLU(), + ) + + self.conv3_triangles = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, int(len(self.triangles) / 6)) + ), + nn.Dropout(0.35), + nn.ReLU(), + ) + + self.conv3_edges = nn.Sequential( + nn.Conv2d( + in_channels=32, out_channels=32, kernel_size=(1, int(len(self.edges) / 4)) + ), + nn.Dropout(0.35), + nn.ReLU(), + ) + + # ------------ # + + self.lstm = nn.LSTM( + input_size=96, hidden_size=32, num_layers=1, batch_first=True + ) + self.fc1 = nn.Linear(32, 3) + + def forward(self, x): + x_tetrahedra = x[:, :, :, self.tetrahedra] + x_triangles = x[:, :, :, self.triangles] + x_edges = x[:, :, :, self.edges] + + x_tetrahedra = self.conv1_tetrahedra(x_tetrahedra) + x_triangles = self.conv1_triangles(x_triangles) + x_edges = self.conv1_edges(x_edges) + + x_tetrahedra = self.conv2_tetrahedra(x_tetrahedra) + x_triangles = self.conv2_triangles(x_triangles) + x_edges = self.conv2_edges(x_edges) + + x_tetrahedra = self.conv3_tetrahedra(x_tetrahedra) + x_triangles = self.conv3_triangles(x_triangles) + x_edges = self.conv3_edges(x_edges) + + x = torch.cat((x_tetrahedra, x_triangles, x_edges), dim=1) + + x = x.permute(0, 2, 1, 3) + x = torch.reshape(x, (-1, x.shape[1], x.shape[2])) + + x, _ = 
self.lstm(x) + x = x[:, -1, :] + logits = self.fc1(x) + + return logits diff --git a/optimizers/executor.py b/optimizers/executor.py index 4645512..1be2d1e 100644 --- a/optimizers/executor.py +++ b/optimizers/executor.py @@ -12,6 +12,7 @@ from models.CNN2.cnn2 import CNN2 from models.AxialLob.axiallob import AxialLOB from models.TABL.bin_tabl import BiN_BTABL, BiN_CTABL +from models.CompleteHCNN.complete_hcnn import Complete_HCNN from optimizers.lightning_batch_gd import BatchGDManager from loggers import logger from utils import create_tree, get_training_test_stocks_as_string @@ -50,6 +51,9 @@ def __init__(self, experiment_id, general_hyperparameters, model_hyperparameters self.model = BiN_CTABL(120, 40, 100, 5, 120, 5, 3, 1) elif general_hyperparameters["model"] == "axiallob": self.model = AxialLOB() + elif general_hyperparameters["model"] == "hlob": + homological_structures = torch.load(f"./torch_datasets/threshold_{model_hyperparameters['threshold']}/batch_size_{model_hyperparameters['batch_size']}/training_{self.training_stocks_string}_test_{self.test_stocks_string}/complete_homological_structures.pt") + self.model = Complete_HCNN(lighten=model_hyperparameters["lighten"], homological_structures=homological_structures) if self.torch_dataset_preparation: # Prepare the training dataloader. diff --git a/requirements_mac_os.txt b/requirements_mac_os.txt new file mode 100644 index 0000000..49d5546 --- /dev/null +++ b/requirements_mac_os.txt @@ -0,0 +1,101 @@ +aiohttp==3.8.5 +aiosignal==1.3.1 +annotated-types==0.5.0 +anyio==3.7.1 +appdirs==1.4.4 +arrow==1.2.3 +async-timeout==4.0.3 +attrs==23.1.0 +backoff==2.2.1 +beautifulsoup4==4.12.2 +blessed==1.20.0 +certifi==2023.7.22 +charset-normalizer==3.2.0 +click==8.1.7 +cmake==3.25.0 +contourpy==1.1.1 +croniter==1.4.1 +cycler==0.11.0 +dateutils==0.6.12 +deepdiff==6.5.0 +docker-pycreds==0.4.0 +exceptiongroup==1.1.3 +fast-tmfg==0.0.8 +fastapi==0.103.1 +filelock==3.9.0 +fonttools==4.42.1 +frozenlist==1.4.0 +fsspec==2023.9.2 +gitdb==4.0.10 +GitPython==3.1.37 +h11==0.14.0 +idna==3.4 +importlib-resources==6.1.0 +inquirer==3.1.3 +itsdangerous==2.1.2 +Jinja2==3.1.2 +joblib==1.3.2 +kiwisolver==1.4.5 +lightning==2.0.9 +lightning-cloud==0.5.38 +lightning-utilities==0.9.0 +lit==15.0.7 +markdown-it-py==3.0.0 +MarkupSafe==2.1.2 +matplotlib==3.8.0 +mdurl==0.1.2 +mpmath==1.3.0 +multidict==6.0.4 +networkx==3.0 +numpy==1.26.0 +ordered-set==4.1.0 +packaging==23.1 +pandas==2.1.1 +pathtools==0.1.2 +Pillow==10.0.1 +polars==0.19.5 +protobuf==4.24.3 +psutil==5.9.5 +pyarrow==13.0.0 +pydantic==2.1.1 +pydantic_core==2.4.0 +Pygments==2.16.1 +PyJWT==2.8.0 +pyparsing==3.1.1 +python-dateutil==2.8.2 +python-editor==1.0.4 +python-multipart==0.0.6 +pytorch-lightning==2.0.9 +pytz==2023.3.post1 +PyYAML==6.0.1 +readchar==4.0.5 +requests==2.31.0 +rich==13.5.3 +scikit-learn==1.3.1 +scipy==1.11.3 +seaborn==0.13.2 +sentry-sdk==1.31.0 +setproctitle==1.3.2 +six==1.16.0 +smmap==5.0.1 +sniffio==1.3.0 +soupsieve==2.5 +starlette==0.27.0 +starsessions==1.3.0 +sympy==1.12 +threadpoolctl==3.2.0 +torch==2.0.0 +torchinfo==1.8.0 +torchmetrics==1.2.0 +tqdm==4.66.1 +traitlets==5.10.1 +typing_extensions==4.8.0 +tzdata==2023.3 +urllib3==1.26.16 +uvicorn==0.23.2 +wandb==0.15.11 +wcwidth==0.2.6 +websocket-client==1.6.3 +websockets==11.0.3 +yarl==1.9.2 +zipp==3.17.0 diff --git a/utils.py b/utils.py index 0b78b6c..d046692 100644 --- a/utils.py +++ b/utils.py @@ -470,7 +470,7 @@ def parse_args() -> Any: type=str, default="data_processing", help="Stage(s) to be run (to be expressed in this format: 
'training,evaluation').", - ) # data_processing | torch_dataset_preparation | torch_dataset_preparation_backtest | training,evaluation | backtest,post_trading_analysis + ) # data_processing | torch_dataset_preparation | torch_dataset_preparation_backtest | complete_homological_structures_preparation | training,evaluation | backtest,post_trading_analysis parser.add_argument( "--include_target_stock_in_training", type=str2bool,
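Finally, a minimal sketch (not part of the patch) of how the artefact produced by the `complete_homological_structures_preparation` stage can be loaded and fed to `Complete_HCNN`. The dictionary keys and the constructor follow the code above; the `threshold`/`batch_size` folder names, the stock string, and the history length of 100 snapshots are placeholder assumptions.

```python
import torch

from models.CompleteHCNN.complete_hcnn import Complete_HCNN

# Hypothetical path: the threshold, batch size and stock identifiers depend on your configuration.
structures = torch.load(
    "./torch_datasets/threshold_5/batch_size_32/training_CSCO_test_CSCO/complete_homological_structures.pt"
)
# Flat lists of column indices: 8 per tetrahedron, 6 per triangle, 4 per edge (see extract_components).
print(len(structures["tetrahedra"]), len(structures["triangles"]), len(structures["edges"]))

model = Complete_HCNN(lighten=False, homological_structures=structures)

# Dummy batch of shape (batch, channel, history length, 40 LOB features); 100 snapshots is an assumption.
dummy = torch.randn(8, 1, 100, 40)
logits = model(dummy)
print(logits.shape)  # expected: torch.Size([8, 3]), one logit per class
```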