DataFrame / Panel price handler #58

femtotrader · 2016-06-25T06:26:43Z

According discussion #53

Maybe a more general price handler should be done
This price handler could have

a DataFrame as input (when a backtest for only one ticker is necessary)
a Panel as input (when a backtest with several tickers is given)

Index will be Time

Depending of columns names it could be either

a tick price handler (need at least a column Price or Bid/Ask and optionally Volume(s))
a bar price handler (need at least columns named Open High Low Close)

import pandas as pd
import pandas_datareader.data as web
import datetime
import requests_cache


expire_after = datetime.timedelta(days=3)
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=expire_after)
panel = web.DataReader(["GOOG", "IBM"], "yahoo", session=session)

Panel looks like:

<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 1630 (major_axis) x 2 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2016-06-23 00:00:00
Minor_axis axis: GOOG to IBM

def data_from_dataframe(df):
    for (dt, row) in df.iterrows():
        yield(dt, row)

def data_from_panel(panel):
    for dt, data in panel.transpose(1,0,2).iteritems():
        for (ticker, bar) in data.iteritems():
            yield(dt, ticker, bar)

ticker = "GOOG"
df = panel[:,:,ticker]
for (dt, bar) in data_from_dataframe(df):
    print(dt, ticker, bar)

for (dt, ticker, bar) in data_from_panel(panel):
    print(dt, ticker, bar)

The text was updated successfully, but these errors were encountered:

mhallsmoore · 2016-07-02T07:19:59Z

Is the idea here to separate the loading of the data from the handling of the data? I can see that this makes sense. At the moment it would be necessary to clutter PriceHandlers with a lot of data connection logic for DBs, or CSV loading for files.

femtotrader · 2016-07-02T07:39:33Z

Yes you can load from database a chunk of data using Pandas quite easily and have this data into memory (a DataFrame or a Panel).

See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html

But it's really important to avoid to load all data into memory if you process tick data (because it can be very long before backtest starts)

An other approach (for DB) could be to use Blaze http://blaze.readthedocs.io/en/latest/index.html

table_uri = 'dialect+driver://user:password@host:port/database::table'
for i, chunk in enumerate(odo(table_uri, chunks(pd.DataFrame), chunksize=chunksize)):
    print("="*5 + " chunk %d " % (i + 1) + "="*5)
    for j, row in chunk.iterrows():
        print(row)

Anyway it's important to process by chunk.

mhallsmoore · 2016-07-02T08:02:54Z

Yeah, I had this problem with QSForex when I wanted to load tick data in day-by-day. I wrote a method that would basically load each day's data and process that, then repeat for the next day. That was manageable.

However we definitely need to move to a streaming model, not only from the point of view of memory, but also as this will ease the transition to live data.

femtotrader · 2016-07-02T09:49:04Z

The problem with this Pandas approach is that if your tick data come in a monthly CSV file (an not daily!!) like TrueFX http://www.truefx.com/ does it take a long to load all tick into memory.
So that's the reason why I did https://github.com/femtotrader/femtotrading/blob/master/femtotrading/data_iterator/monthly_csv_tick.py
So I think we need both approach depending of data when want to process.

femtotrader · 2016-07-06T08:45:48Z

Some code (just for loader... not for handler) that might help:

#!/usr/bin/env python

from abc import ABCMeta, abstractmethod
import six
from collections import OrderedDict
import pandas as pd


class AbstractPandasPriceLoader(object):

    __metaclass__ = ABCMeta

    def __init__(self, filenames):
        if isinstance(filenames, tuple):
            filenames = OrderedDict(filenames)
        self.filenames = filenames

    @abstractmethod
    def _get_df_from_filename(self, filename):
        raise NotImplementedError("Should implement _get_df_from_filename()")

    @property
    def panel(self):
        # if len(self.filenames) <= 1:
        #     raise NotImplementedError("filenames must contain several filenames")
        tickers_data = OrderedDict()
        for ticker, filename in self.filenames.items():
            tickers_data[ticker] = self._get_df_from_filename(filename)
        panel = pd.Panel.from_dict(tickers_data).transpose(2, 1, 0)
        return panel

    @property
    def dataframe(self):
        if len(self.filenames) == 1:
            df = self._get_df_from_filename(list(self.filenames.values())[0])
            return df
        else:
            raise NotImplementedError("filenames must contain only one filename")


class CsvTickPandas(AbstractPandasPriceLoader):
    def _get_df_from_filename(self, filename):
        df = pd.io.parsers.read_csv(
            filename, header=0, parse_dates=True,
            dayfirst=True, index_col=1,
            names=("Ticker", "Time", "Bid", "Ask")
        )
        df.index.name = "Date"
        del df["Ticker"]
        return df


class CsvOHLCVPandas(AbstractPandasPriceLoader):
    def _get_df_from_filename(self, filename):
        df = pd.io.parsers.read_csv(
            filename, header=0, parse_dates=True,
            index_col=0, names=(
                "Date", "Open", "High", "Low",
                "Close", "Volume", "Adj Close"
            )
        )
        return df

pd.set_option("max_rows", 10)

# filenames = OrderedDict([
#     ("GOOG", "data/GOOG.csv"),
# ])
# filenames = ("GOOG", "data/GOOG.csv"),
# csv_tick_pandas = CsvTickPandas(filenames)
# df = csv_tick_pandas.dataframe
# print(df)

# filenames = OrderedDict([
#     ("GOOG", "data/GOOG.csv"),
#     ("MSFT", "data/MSFT.csv")
# ])
# csv_tick_pandas = CsvTickPandas(filenames)
# panel = csv_tick_pandas.panel
# print(panel)

filenames = OrderedDict([
    ("SP500TR", "data/SP500TR.csv"),
])
csv_ohlcv_pandas = CsvOHLCVPandas(filenames)
df = csv_ohlcv_pandas.dataframe
print(df)
# panel = csv_ohlcv_pandas.panel
# print(panel)

mhallsmoore added the enhancement label Jul 2, 2016

femtotrader mentioned this issue Jul 23, 2016

Price Iterator, Pandas BarEvent TickEvent Iterator (for DataFrame and Panel) #103

Merged

mhallsmoore closed this as completed in #103 Aug 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataFrame / Panel price handler #58

DataFrame / Panel price handler #58

femtotrader commented Jun 25, 2016 •

edited

Loading

mhallsmoore commented Jul 2, 2016

femtotrader commented Jul 2, 2016 •

edited

Loading

mhallsmoore commented Jul 2, 2016

femtotrader commented Jul 2, 2016 •

edited

Loading

femtotrader commented Jul 6, 2016

DataFrame / Panel price handler #58

DataFrame / Panel price handler #58

Comments

femtotrader commented Jun 25, 2016 • edited Loading

mhallsmoore commented Jul 2, 2016

femtotrader commented Jul 2, 2016 • edited Loading

mhallsmoore commented Jul 2, 2016

femtotrader commented Jul 2, 2016 • edited Loading

femtotrader commented Jul 6, 2016

femtotrader commented Jun 25, 2016 •

edited

Loading

femtotrader commented Jul 2, 2016 •

edited

Loading

femtotrader commented Jul 2, 2016 •

edited

Loading