Content module #36

Open
g12mcgov opened this issue May 10, 2019 · 4 comments
@g12mcgov

g12mcgov commented May 10, 2019

Describe the problem

Currently there is no programmatic way of accessing Marquee Content through gs_quant. This is a feature proposal to add a content module for interacting with the new Marquee Content API (/v1/content).

Describe the solution you'd like

State of the world:

At the time of writing, there are two primary means of retrieving content via the Marquee Content API:

Description              Method  Endpoint          Developer Site URL
Get a content piece      GET     /v1/content/{id}  Link
Get many content pieces  GET     /v1/content       Link

Eventually, the full suite of endpoints will be implemented, allowing content to be queried, searched, updated, and created.

Proposed Solution:

Get Many Contents:

gs_quant should expose a Content module for supporting the above endpoints.

from gs_quant.content import Content

content = Content()
content.get_many(**kwargs)

# Kwargs correspond to supported query params on the API, e.g.:
#
# content.get_many(authorId=<some_author_id>)
# content.get_many(tag=<some_tag>)
# content.get_many(assetId=<some_asset_id>)
# etc...
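
As a rough illustration of how such kwargs could be turned into a request URL, here is a minimal sketch using only the standard library. The helper name `build_content_url` is hypothetical, and treating list values as repeated query parameters is an assumption; the Content API may expect a different encoding.

```python
from urllib.parse import urlencode

BASE_PATH = "/v1/content"  # endpoint listed in the table above


def build_content_url(**kwargs):
    """Build a request path from keyword arguments (hypothetical helper).

    Arguments set to None are dropped so callers can pass optional filters.
    """
    params = {key: value for key, value in kwargs.items() if value is not None}
    if not params:
        return BASE_PATH
    # doseq=True expands list values into repeated parameters (tag=a&tag=b).
    # Whether the API accepts repeated parameters is an assumption.
    return f"{BASE_PATH}?{urlencode(params, doseq=True)}"


print(build_content_url(authorId="some_author_id"))
print(build_content_url(tag=["rates", "fx"], limit=10))
```

Keeping the mapping a plain pass-through means new query parameters on the API would not require changes to the client.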

Get a Single Content:

The same Content module should support retrieving a single content piece by its ID.

from gs_quant.content import Content

content = Content()
content.get('<some_content_id>')

All returned content will be of type ContentResponse, which is documented on the Marquee Developer Site.

By default, all content bodies are Base64-encoded and returned together with their MIME type. This allows the content to be transported via JSON, given that we support many different content types (HTML, text, image, PDF, etc.).

A client of gs_quant using that content module might then do:

import base64
from gs_quant.content import Content

content = Content()
response = content.get('<some_content_id>')

# b64decode returns bytes; decode to str (assuming UTF-8 text content)
text = base64.b64decode(response.content.body).decode('utf-8')
# <html><h1>blah blah blah</h1></html>
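
Since the body can hold text or binary data, a client would likely want to branch on the MIME type after decoding. The sketch below assumes the MIME type is available as a plain string and that textual content is UTF-8; `decode_body` is a hypothetical helper, not part of gs_quant.

```python
import base64


def decode_body(body_b64, mime_type, encoding="utf-8"):
    """Decode a Base64 body; return str for text/* types, bytes otherwise."""
    raw = base64.b64decode(body_b64)
    if mime_type.startswith("text/"):
        # Assumption: textual content is encoded as UTF-8 unless told otherwise.
        return raw.decode(encoding)
    # Images, PDFs, etc. are left as raw bytes for the caller to handle.
    return raw


# Stand-in for a Base64 body as it might arrive in a JSON response.
encoded = base64.b64encode(b"<html><h1>blah blah blah</h1></html>").decode("ascii")
print(decode_body(encoded, "text/html"))
```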

Describe alternatives you've considered

Currently bouncing between the following two implementation styles:

  1. Declare a Content() object (as examples show above) that creates an instance of the class, for doing things like:
content = Content()
content.get()
content.get_many()
...
  2. Go the route of the Dataset model, where the code would look like this:
content = Content('<some_content_id>')
contents = Content('<some_content_id_1>', '<some_content_id_2>', ...)

Not really a fan of the second approach for content, as I think it's a little awkward and doesn't really provide a fluent API for querying/searching.

Are you willing to contribute
Yes!

Additional context

N/A

@andyphillipsgs @francisg77 @bobbyalex83 @ScottWeinstein

@andrewphillipsn
Contributor

The Dataset model is intended to provide an abstraction over multiple data sources, i.e. to allow gs_quant to source data from places other than the Marquee API. For content, we should start with access to the underlying Marquee APIs, e.g.

  • create gs_quant.api.gs.content which wraps the existing APIs and allows access to content items
  • determine if we need an abstraction on top of this (gs_quant.content) which can access other content sources. If so, it should be suitably abstracted from the GS internals

Note that in the Dataset example, the ID refers to a data source, not to an individual row. So in your example, it would probably map to a channel, not to a content item. Let's get a couple more opinions as well.

@francisg77
Contributor

API class definitely the best for starting. Agree on the stream comments with datasets, although a Content() item just has to map to an actual piece of content with metadata; perhaps ContentChannel() becomes a first-class object. The question is then also where functions like 'Get many content pieces' should go, with similar questions, I would imagine, for many other APIs:

Two key options as I see it:

  1. Content.get_many() - the typical API model. Convenient as stored on Content piece, blurs the boundary between data items and querying, similar to datasets.
  2. Separate ContentQueryEngine - similar to general server-side development, and assets with SecurityMaster. Clearer item/query distinction, but extra classes and level of indirection

Let's discuss

@g12mcgov
Author

g12mcgov commented May 13, 2019

@francisg77 @andyphillipsgs

Already added the abstraction layer you mentioned, gs_quant.content, which takes in a provider (in this case GsContentAPI).

As for your other points, I prefer the first option, since it's also consistent with how datasets work currently in the API. Something like:

content = Content(channel='<some_channel>')
content = Content(assetId='<some_asset_id>')
...
# Default with no kwargs will just rely on the content-api doing a default lookup based on who you are.
content = Content()

Then, expose out a method(s) like:

content.get_many(offset=0, limit=10)

This way is nice too because helper methods could be added as well, like content.get_text(), to extract the raw text and abstract away the need for base64 decoding etc.
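
A minimal sketch of that shape, assuming the constructor kwargs act as default filters and that each returned item exposes its Base64 body under a `body` key. The `fetch_page` callable and `fake_fetch_page` are hypothetical stand-ins for the real provider, and `get_text` is written here as a static helper that takes an item.

```python
import base64


class Content:
    def __init__(self, fetch_page, **filters):
        # fetch_page is a hypothetical callable standing in for the API provider.
        self._fetch_page = fetch_page
        # Filters given at construction time apply to every later query.
        self._filters = filters

    def get_many(self, offset=0, limit=10):
        """Fetch one page of items matching the constructor filters."""
        return self._fetch_page(offset=offset, limit=limit, **self._filters)

    @staticmethod
    def get_text(item, encoding="utf-8"):
        """Hide the Base64 step; assumes the item body is encoded text."""
        return base64.b64decode(item["body"]).decode(encoding)


def fake_fetch_page(offset, limit, **filters):
    """In-memory provider: 25 items, all in the 'macro' channel."""
    items = [
        {"channel": "macro",
         "body": base64.b64encode(f"note {i}".encode()).decode("ascii")}
        for i in range(25)
    ]
    matching = [it for it in items
                if all(it.get(key) == value for key, value in filters.items())]
    return matching[offset:offset + limit]


content = Content(fake_fetch_page, channel="macro")
page = content.get_many(offset=20, limit=10)
print(len(page), Content.get_text(page[0]))
```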

Only issue I see with this is that there's no need for a get-single-content method now, right? (i.e. content.get('<some_id>')), but maybe that's not really an issue.

@g12mcgov
Author

@francisg77 @andyphillipsgs

Made an MR with the changes described above in the GitLab repo. You can find that MR here: https://gitlab.gs.com/marquee/analytics/gs_quant/merge_requests/254

Had to do it internally since I needed to generate the new Content types.

3 participants