Checklist to complete before beginning development
No development should be done on a Provider API Script until the following info is gathered:
Verify there is a way to retrieve the entire relevant portion of the provider's collection in a systematic way via their API.
The key data the API returns is the "img_id" (foreign_identifier) from which we can get title, author, foreign_landing_url, and more by scraping the landing page of each image.
Verify the API provides license info (license type and version; license URL provides both, and is preferred)
The license is the same for all images: CC0
Verify the API provides stable direct links to individual works.
No link to the full-size image was found. An alternative is to use the CDN link with a width of 960px.
Verify the API provides a stable landing page URL to individual works.
Not directly from API but obtained from: https://stocksnap.io/photo/{img_id}
Note other info the API provides, such as thumbnails, dimensions, attribution info (required if non-CC0 licenses will be kept), title, description, other meta data, tags, etc.
Provides: width, height, downloads, page_views, and tags/keywords (these two seem to hold the same content; the first is a string and the second is an array).
Attach example responses to API queries that have the relevant info.
General Recommendations for implementation
The script should be in the src/cc_catalog_airflow/dags/provider_api_scripts/ directory.
The script should have a test suite in the same directory.
The script must use the ImageStore class (Import this from src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/image.py).
The script should use the DelayedRequester class (Import this from src/cc_catalog_airflow/dags/provider_api_scripts/common/requester.py).
The script must not use anything from src/cc_catalog_airflow/dags/provider_api_scripts/modules/etlMods.py, since
that module is deprecated.
If the provider API can be queried by 'upload date' or something similar,
the script should take a --date parameter when run as a script, giving the
date for which we should collect images. The form should be YYYY-MM-DD (so
the script can be run via python my_favorite_provider.py --date 2018-01-01).
The script must provide a main function that takes the same parameters as
the CLI. In our example from above, we'd then have a main function my_favorite_provider.main(date). Calling main should do the same thing as running the script
from the CLI.
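The --date convention above can be sketched as follows. This is only a minimal illustration, not code from an existing provider script: the names parse_args and cli are hypothetical, and main here only validates the date instead of collecting images.

```python
import argparse
from datetime import datetime


def main(date):
    """Hypothetical entry point; a real script would collect images here."""
    # Fail fast if the date is not in the YYYY-MM-DD form.
    datetime.strptime(date, "%Y-%m-%d")
    return date


def parse_args(argv=None):
    """Parse the CLI arguments; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Provider API script sketch")
    parser.add_argument(
        "--date",
        required=True,
        help="Date for which to collect images, as YYYY-MM-DD",
    )
    return parser.parse_args(argv)


def cli(argv=None):
    """Run main with the same parameter the CLI receives."""
    args = parse_args(argv)
    return main(args.date)
```

Calling cli() under an `if __name__ == "__main__":` guard would make `python my_favorite_provider.py --date 2018-01-01` equivalent to `my_favorite_provider.main("2018-01-01")`.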
The script must conform to PEP8. Please use pycodestyle (available via pip install pycodestyle) to check for compliance.
The script should use small, testable functions.
The test suite for the script may break PEP8 rules regarding long lines where
appropriate (e.g., long strings for testing).
Examples of other Provider API Scripts
For example Provider API Scripts and accompanying test suites, please see
src/cc_catalog_airflow/dags/provider_api_scripts/flickr.py and
src/cc_catalog_airflow/dags/provider_api_scripts/test_flickr.py, or
src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py and
src/cc_catalog_airflow/dags/provider_api_scripts/test_wikimedia_commons.py.
Provider API Endpoint / Documentation
The generic form of the GET URL for the internal API is the following:
http[s]://stocksnap.io/api/load-photos/date/{asc|desc}/{page}
Documentation not disclosed.
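Since the API is undocumented, a small helper that builds the URL from the pattern above may make the endpoint easier to test. The function name build_page_url and the choice of "desc" as the default order are assumptions made for illustration.

```python
BASE_URL = "https://stocksnap.io/api/load-photos/date"
VALID_ORDERS = ("asc", "desc")


def build_page_url(page, order="desc"):
    """Return the load-photos URL for one page (hypothetical helper)."""
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {VALID_ORDERS}")
    # The page number is the last path segment of the URL.
    return f"{BASE_URL}/{order}/{int(page)}"
```

For instance, `build_page_url(30)` yields the same URL as the sample query in this issue.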
Provider description
StockSnap offers around 33k high-quality stock photos for just about any use, and allows them to be downloaded and edited from its site.
Licenses Provided
All images are under the Creative Commons CC0 license. Ref.
Provider API Technical info
Photos appear to be sorted by date and each page returns 40 records maximum. No authentication required.
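Given the 40-record page size, one way to walk the whole collection is to request pages until one comes back short or empty. The sketch below assumes that pages start at 1 and that a short page marks the end; neither is confirmed by the API. fetch_page is a hypothetical callable that returns the list of records for a page number.

```python
PAGE_SIZE = 40  # maximum number of records observed per page


def iter_records(fetch_page, start_page=1, max_pages=None):
    """Yield records page by page until the collection is exhausted."""
    page = start_page
    pages_seen = 0
    while max_pages is None or pages_seen < max_pages:
        records = fetch_page(page)
        if not records:
            break  # an empty page is taken to mean there is no more data
        yield from records
        if len(records) < PAGE_SIZE:
            break  # a short page is assumed to be the last one
        page += 1
        pages_seen += 1
```

Passing the fetch function in as a parameter keeps the loop testable without network access; in a real script it would wrap a request made through DelayedRequester.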
CDN:
https://cdn.stocksnap.io/img-thumbs/{960w|280h}/{img_id}.jpg
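The CDN pattern above and the landing page pattern noted in the checklist can both be derived from the img_id alone. A minimal sketch follows; the helper names are hypothetical and the id used in the usage note is a made-up placeholder.

```python
CDN_URL = "https://cdn.stocksnap.io/img-thumbs/{size}/{img_id}.jpg"
LANDING_URL = "https://stocksnap.io/photo/{img_id}"
ALLOWED_SIZES = ("960w", "280h")


def build_image_url(img_id, size="960w"):
    """Return the CDN URL for an image at one of the known sizes."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}")
    return CDN_URL.format(size=size, img_id=img_id)


def build_landing_url(img_id):
    """Return the landing page URL for an image."""
    return LANDING_URL.format(img_id=img_id)
```

With a placeholder id such as "ABC123", `build_image_url("ABC123")` points at the 960px-wide rendition and `build_image_url("ABC123", "280h")` at the 280px-high thumbnail.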
Sample response
https://stocksnap.io/api/load-photos/date/desc/30
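As a rough illustration of how one record from such a response might be mapped to the fields discussed in this issue, consider the sketch below. The key names (img_id, width, height, downloads, page_views, tags, keywords) are taken from the field list above and are assumed to be the literal JSON keys; the comma delimiter for the tags string is also an assumption, and extract_metadata is a hypothetical helper.

```python
LICENSE = "cc0"
LICENSE_VERSION = "1.0"


def extract_metadata(record):
    """Map one API record (a dict) to the values the script would store."""
    img_id = record.get("img_id")
    if not img_id:
        return None  # without an identifier the record cannot be used
    keywords = record.get("keywords")
    if not isinstance(keywords, list):
        # Fall back to the string form; a comma delimiter is assumed.
        tags = record.get("tags") or ""
        keywords = [t.strip() for t in tags.split(",") if t.strip()]
    return {
        "foreign_identifier": img_id,
        "foreign_landing_url": f"https://stocksnap.io/photo/{img_id}",
        "license_": LICENSE,
        "license_version": LICENSE_VERSION,
        "width": record.get("width"),
        "height": record.get("height"),
        "raw_tags": keywords,
        "meta_data": {
            "downloads": record.get("downloads"),
            "page_views": record.get("page_views"),
        },
    }
```

Title and author are not covered here, since they would come from scraping the landing page.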