Merge branch '2.2.0' into release
jayvarner committed Oct 21, 2021
2 parents 02bfbf4 + 71a8143 commit 4eac8b4
Showing 128 changed files with 29,754 additions and 866 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -133,6 +133,7 @@ venv
cert*
*.bk
.ruby-version
+.vscode

# Sphinx documentation
docs/_build/
2 changes: 1 addition & 1 deletion .pylintrc
@@ -3,7 +3,7 @@
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code
-extension-pkg-whitelist=
+extension-pkg-whitelist=lxml

# Add files or directories to the blacklist. They should be base names, not
# paths.
8 changes: 8 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,14 @@

CHANGELOG
=========

Release 2.2.0
---------------------
* Stream ingest uploads to S3
* Adds status records for ingest tasks
* Adds bulk ingest
* Adds email notifications for ingest success and failure

Release 2.1.1
---------------------
* Fixes migration conflicts
2 changes: 1 addition & 1 deletion apps/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "2.0.0"
+__version__ = "2.2.0"
__version_info__ = tuple(
[
int(num) if num.isdigit() else num
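The hunk above is cut off right after the list comprehension opens, so the expression that feeds it is not visible here. As an illustration only, and assuming the version string is simply split on dots (the helper name `version_info` is hypothetical and not part of the commit), the visible `int(num) if num.isdigit() else num` pattern would behave like this:

```python
def version_info(version):
    """Turn a dotted version string into a tuple.

    Assumption: parts are separated by "."; the real expression in
    apps/__init__.py is truncated in the diff and may differ.
    """
    # Numeric parts become ints; anything else stays a string.
    return tuple(
        int(num) if num.isdigit() else num
        for num in version.split(".")
    )


if __name__ == "__main__":
    print(version_info("2.2.0"))
```

Keeping numeric parts as integers lets tuples compare in version order, for example `(2, 2, 0) > (2, 1, 1)`.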
2 changes: 1 addition & 1 deletion apps/cms/wagtail_hooks.py
@@ -1,5 +1,5 @@
"""Add custom .css hook"""
-from django.contrib.staticfiles.templatetags.staticfiles import static
+from django.templatetags.static import static
from django.utils.html import format_html

from wagtail.core import hooks
2 changes: 1 addition & 1 deletion apps/iiif/annotations/models.py
@@ -5,7 +5,7 @@
from django.db.models import signals
from django.core.exceptions import ValidationError
from django.dispatch import receiver
-from django.utils.translation import ugettext_lazy as _
+from django.utils.translation import gettext as _
from django.contrib.auth import get_user_model
from abc import abstractmethod
from bs4 import BeautifulSoup
173 changes: 41 additions & 132 deletions apps/iiif/canvases/fixtures/alto.xml
100644 → 100755
@@ -1,132 +1,41 @@
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Af-beeldinghe van d'eerste eeuwe der Societeyt Iesu voor ooghen ghestelt door de Duyts-Nederlantsche provincie der seluer societeyt., p. 10</title>
</titleStmt>
<publicationStmt>
<distributor>Emory University Library and Information Technology Services</distributor>
</publicationStmt>
<sourceDesc>
<p>Abbyy file derived from OCR of <bibl>Bolland, Johannes, 1596-1665, Henschenius, Godefridus, 1601-1681, Tollenaere, Jean de, 1582-1643, Poirters, Adrien, 1605-1674, Galle, Cornelis, 1576-1650,, Natalis, Michel, 1610-1668,, Diepenbeeck, Abraham van, 1596-1675,, Plantijnsche Drukkerij. Af-beeldinghe van d'eerste eeuwe der Societeyt Iesu voor ooghen ghestelt door de Duyts-Nederlantsche provincie der seluer societeyt., ['1640'].</bibl></p>
</sourceDesc>
</fileDesc>
</teiHeader>
<facsimile>
<surface xml:id="rdx_b70fm.p.idp330004480" type="page" ulx="0" uly="0" lrx="1674" lry="2096">
<graphic url="http://readux.library.emory.edu/books/emory:b70fm/pages/emory:gz6dp/fullsize/"/>
<zone xml:id="rdx_b70fm.b.idm22336128" type="Text" ulx="916" uly="0" lrx="1006" lry="30">
<zone xml:id="rdx_b70fm.ln.idp333906288" type="line" ulx="916" uly="0" lrx="1006" lry="26">
<line>mm</line>
</zone>
</zone>
<zone xml:id="rdx_b70fm.b.idp123641136" type="Text" ulx="498" uly="174" lrx="1582" lry="1810">
<zone xml:id="rdx_b70fm.ln.idm1228400" type="line" ulx="814" uly="185" lrx="1275" lry="213">
<line>AEN DEN LESIIU</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm1225888" type="line" ulx="520" uly="234" lrx="1554" lry="285">
<line>tnaken, om te fchijnen bouen alle andere te kracycn, en die te mer-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp333911104" type="line" ulx="523" uly="282" lrx="669" lry="318">
<line>drucken ?</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp124740720" type="line" ulx="574" uly="325" lrx="1556" lry="376">
<line>Of ty dit ergbens in'tbeleydt man dit heel ftuck^, met de minfee</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp333420704" type="line" ulx="520" uly="373" lrx="1557" lry="421">
<line>merfmacdehjckheyt man eenighe andere Orden oft Religte, ghedaen</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp333423392" type="line" ulx="519" uly="420" lrx="1554" lry="469">
<line>hebben, datftellen ley ten oordeele manden onpartijdighen Lepr;</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm15742768" type="line" ulx="524" uly="465" lrx="1556" lry="513">
<line>den Tvelcken bier minden fal d'af-beeldingbe mande eerfie eeulve on-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp187920144" type="line" ulx="512" uly="512" lrx="1556" lry="561">
<line>fer S octet eyt, die "toy met on fen H. Vader gbeerne kennen de laetfle</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm5771488" type="line" ulx="524" uly="559" lrx="1556" lry="606">
<line>en de minfee te %jjn, onderfoo mele oude ende treffelijcke Or dens Van</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm5768768" type="line" ulx="522" uly="605" lrx="1557" lry="655">
<line>S.Augufinus, Beneditlus, Bernardus, Norbertus ,Domimcwi, Fran-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm21876352" type="line" ulx="524" uly="653" lrx="1557" lry="701">
<line>cifciis, ende meer andere , die met mcerdere mrucht en glorie inde</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp187860496" type="line" ulx="524" uly="698" lrx="1557" lry="747">
<line>H.Kercke merkeert hebben. 'tis defen gheoorloft ghelveeil 'tvocdt</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp132247568" type="line" ulx="517" uly="747" lrx="1557" lry="795">
<line>gberucbt,dathen naeghingb,en noch heden-fdaeghs molght, als eenen</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp132250288" type="line" ulx="521" uly="792" lrx="1566" lry="843">
<line>toet-feen man bun innerhjcl^ yvefen, aende "Svereldt, nu mondeltjck^</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm22906784" type="line" ulx="525" uly="837" lrx="1557" lry="889">
<line>inde predtkatien, nu fchrtftelijck_ inde gbedruckfe boecken, moor oo-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp188702896" type="line" ulx="519" uly="887" lrx="1556" lry="935">
<line>ghen te ftellen, om daer aen het goudt manden ijuer en liefde te keu-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm24969840" type="line" ulx="527" uly="933" lrx="1558" lry="984">
<line>ren, met de loelckefy de glorie Godts en des naefeenfaligbeyt, neffens</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm24967088" type="line" ulx="522" uly="978" lrx="1558" lry="1029">
<line>hunne eygbene molmaecktheyt ghetracht hebben te moorderen. Soa</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp123170448" type="line" ulx="523" uly="1027" lrx="1556" lry="1076">
<line>*n magb het ons dan oock_noch tot blaeme noch tot phande ghedijen,</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp132002336" type="line" ulx="523" uly="1074" lrx="1557" lry="1122">
<line>dat Ivy onfe meeder de Socteteyt, die ons iuffchen feo meel drucks</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm19871616" type="line" ulx="524" uly="1120" lrx="1558" lry="1169">
<line>■en lijdens, foo mele opmallen ende ouerlafeen , feo mele merVolghin-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm19868944" type="line" ulx="522" uly="1167" lrx="1557" lry="1217">
<line>gben en martehenfihterals eene nae-vrucht op'teynde der Tvereldt,</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp14069088" type="line" ulx="528" uly="1212" lrx="1559" lry="1264">
<line>aen de H.Kercke ghebaert heeft, met eene lof-rijeke danckbaerheyt</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm20114720" type="line" ulx="528" uly="1260" lrx="1559" lry="1309">
<line>oppellen: te mm0ds "dry d'eere ende de glorie man alle haere daden aen</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm20112000" type="line" ulx="531" uly="1308" lrx="1440" lry="1354">
<line>Godt den Hecre a/leen, en met aen oris feluen, toe en eyghenen.</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm20521904" type="line" ulx="578" uly="1354" lrx="1558" lry="1401">
<line>Daerotn pet ghy de Socteteyt m't moor- bladt man dit Boeck. in</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp187991840" type="line" ulx="528" uly="1398" lrx="1559" lry="1449">
<line>pnnte gbeflelt met d'ooghen opTvaerts ten bemel gbeflaghen, tvaerfy</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp187994528" type="line" ulx="532" uly="1446" lrx="1558" lry="1495">
<line>met een' oprechte meymngbe Tvederom benen phickt, al datfe man</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idm15754864" type="line" ulx="534" uly="1493" lrx="1558" lry="1541">
<line>daer ontfangben heeft, als ofse op alles loaer medefy biergheprefen en</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp188444912" type="line" ulx="533" uly="1539" lrx="1557" lry="1587">
<line>*verciert "ioordt, met een' ingbekeertbeyt en Tveer-flagh des herten,</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp132019840" type="line" ulx="533" uly="1585" lrx="1558" lry="1635">
<line>(lommelingb andnvoordde, datfe allefftns moor heeft, Tot meerdere</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp132022528" type="line" ulx="534" uly="1632" lrx="1556" lry="1680">
<line>eere ende glorie Godts. Inde rechte handt houdtfe onfe Conftitu-</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp185187232" type="line" ulx="536" uly="1677" lrx="1556" lry="1726">
<line>tien ende Regbelen; indeflmcke op eenen dry-meet bet kruya met de</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp191130368" type="line" ulx="535" uly="1723" lrx="1554" lry="1774">
<line>bernende ~totrcldt} in de Tpelcke den mierighen ijuer Van S.Ignatius,</line>
</zone>
<zone xml:id="rdx_b70fm.ln.idp191133120" type="line" ulx="1458" uly="1776" lrx="1555" lry="1805">
<line>Xaue-</line>
</zone>
</zone>
</surface>
</facsimile>
</TEI>
<?xml version="1.0" encoding="utf-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v2# http://www.loc.gov/standards/alto/alto.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>./P100.tif</fileName>
</sourceImageInformation>
<OCRProcessing ID="IdOcr">
<ocrProcessingStep>
<processingSoftware>
<softwareName>tesseract 4.0.0</softwareName>
</processingSoftware>
</ocrProcessingStep>
</OCRProcessing>
</Description>
<Layout>
<Page ID="page_1" PHYSICAL_IMG_NR="1" HEIGHT="8403" WIDTH="6194">
<PrintSpace HEIGHT="8403" WIDTH="6194" VPOS="0" HPOS="0">
<ComposedBlock ID="block_1_1" HEIGHT="12" WIDTH="901" VPOS="275" HPOS="3126">
<TextBlock ID="par_1_1" HEIGHT="12" WIDTH="901" VPOS="275" HPOS="3126" LANG="ita">
<TextLine ID="line_1_1" HEIGHT="12" WIDTH="901" VPOS="275" HPOS="3126">
<String ID="word_1_29" CONTENT="MAGNA" HEIGHT="164" WIDTH="758" VPOS="1787" HPOS="1894" WC="0.95"/>
<String ID="word_1_30" CONTENT="CAMPI" HEIGHT="140" WIDTH="637" VPOS="1820" HPOS="2763" WC="0.92"/>
<String ID="word_1_31" CONTENT="MARTII" HEIGHT="147" WIDTH="730" VPOS="1834" HPOS="3504" WC="0.87"/>
<String ID="word_1_32" CONTENT="" HEIGHT="3" WIDTH="3" VPOS="1983" HPOS="4355" WC="0.85"/>
</TextLine>
<TextLine ID="line_1_2" HEIGHT="141" WIDTH="2744" VPOS="2006" HPOS="1239">
<String ID="word_1_33" CONTENT="ICHNOGRAPHIA" HEIGHT="119" WIDTH="1228" VPOS="2006" HPOS="1239" WC="0.85"/>
<String ID="word_1_34" CONTENT="DESCRIPTA" HEIGHT="103" WIDTH="858" VPOS="2032" HPOS="2574" WC="0.87"/>
<String ID="word_1_35" CONTENT="SV" HEIGHT="98" WIDTH="185" VPOS="2047" HPOS="3545" WC="0.5"/>
<String ID="word_1_36" CONTENT="NT" HEIGHT="99" WIDTH="227" VPOS="2048" HPOS="3756" WC="0.5"/>
</TextLine>
</TextBlock>
</ComposedBlock>
</PrintSpace>
</Page>
<Page ID="page_2" PHYSICAL_IMG_NR="1" HEIGHT="160" WIDTH="118">
<PrintSpace HEIGHT="160" WIDTH="118" VPOS="0" HPOS="0"/>
</Page>
</Layout>
</alto>
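The new ALTO fixture stores each word as a `String` element with `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. A minimal sketch of reading such a file with the standard library follows. The function name `alto_words`, the trimmed `SAMPLE_ALTO` excerpt, and the choice to skip strings with empty `CONTENT` are assumptions made for illustration; this is not the parser used in the repository.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the fixture's root element.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Trimmed excerpt of the fixture (most attributes and elements omitted).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="page_1" PHYSICAL_IMG_NR="1" HEIGHT="8403" WIDTH="6194">
      <PrintSpace HEIGHT="8403" WIDTH="6194" VPOS="0" HPOS="0">
        <TextBlock ID="par_1_1" LANG="ita">
          <TextLine ID="line_1_1">
            <String ID="word_1_29" CONTENT="MAGNA" HEIGHT="164" WIDTH="758" VPOS="1787" HPOS="1894" WC="0.95"/>
            <String ID="word_1_30" CONTENT="CAMPI" HEIGHT="140" WIDTH="637" VPOS="1820" HPOS="2763" WC="0.92"/>
            <String ID="word_1_32" CONTENT="" HEIGHT="3" WIDTH="3" VPOS="1983" HPOS="4355" WC="0.85"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def alto_words(xml_text):
    """Return a list of word dicts with (x0, y0, x1, y1) bounding boxes."""
    root = ET.fromstring(xml_text)
    words = []
    for string in root.iterfind(".//alto:String", ALTO_NS):
        content = string.get("CONTENT", "")
        if not content:
            # Assumption: empty strings carry no useful text and are skipped.
            continue
        x, y = int(string.get("HPOS")), int(string.get("VPOS"))
        width, height = int(string.get("WIDTH")), int(string.get("HEIGHT"))
        words.append(
            {
                "id": string.get("ID"),
                "content": content,
                # ALTO gives origin plus size; convert to corner coordinates.
                "bbox": (x, y, x + width, y + height),
                "confidence": float(string.get("WC")),
            }
        )
    return words


if __name__ == "__main__":
    for word in alto_words(SAMPLE_ALTO):
        print(word["content"], word["bbox"])
```

Converting origin plus size into corner coordinates yields the same box that the hOCR fixtures record for `MAGNA` (`bbox 1894 1787 2652 1951`), which suggests the fixtures describe the same page.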
33 changes: 33 additions & 0 deletions apps/iiif/canvases/fixtures/bad_hocr.hocr
@@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 4.0.0' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word ocrp_wconf'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "./P100.tif"; bbox 0 0 6194 8403; ppageno 0'>
<div class='ocr_carea' id='block_1_1' title="bbox 996 1787 4358 2147">
<p class='ocr_par' id='par_1_1' lang='ita' title="bbox 996 1787 4358 2147">
<span class='ocr_line' id='line_1_1' title="bbox 996 1787 4358 1986; unsupported_thing 1; baseline 0.014 -47; x_size 211; x_descenders 48; x_ascenders 33">
<span class='ocrx_word' id='word_1_1' title='bbox 1894 1787 2652 1951; x_wconf 95'>MAGNA</span>
<span class='ocrx_word' id='word_1_2' title='bbox 2763 1820 3400 1960; x_wconf 92'>CAMPI</span>
<span class='ocrx_word' id='word_1_3' title='bbox 3504 1834 4234 1981; x_wconf 87'>MARTII</span>
<span class='ocrx_word' id='word_1_4' title='bbox 4355 1983 4358 1986; x_wconf 85'>—</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 1239 2006 3983 2147; unsupported_thing 2; baseline 0.012 -31; x_size 138.68248; x_descenders 20.682476; x_ascenders 24">
<span class='ocrx_word' id='word_1_5' title='bbox 1239 2006 2467 2125; x_wconf 85'>ICHNOGRAPHIA</span>
<span class='ocrx_word' id='word_1_6' title='bbox 2574 2032 3432 2135; x_wconf 87'>DESCRIPTA</span>
<span class='ocrx_word' id='word_1_7' title='bbox 3545 2047 3730 2145; x_wconf 50'>SV</span>
<span class='ocrx_word' id='word_1_8' title='bbox 3756 2048 3983 2147; x_wconf 50'>NT</span>
</span>
</p>
</div>
</div>
<div class='ocr_page' id='page_2' title='image "./P100.tif"; bbox 0 0 118 160; ppageno 1'>
</div>
</body>
</html>
File renamed without changes.
33 changes: 33 additions & 0 deletions apps/iiif/canvases/fixtures/hocr.hocr
@@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 4.0.0' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word ocrp_wconf'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "./P100.tif"; bbox 0 0 6194 8403; ppageno 0'>
<div class='ocr_carea' id='block_1_1' title="bbox 996 1787 4358 2147">
<p class='ocr_par' id='par_1_1' lang='ita' title="bbox 996 1787 4358 2147">
<span class='ocr_line' id='line_1_1' title="bbox 996 1787 4358 1986; baseline 0.014 -47; x_size 211; x_descenders 48; x_ascenders 33">
<span class='ocrx_word' id='word_1_1' title='bbox 1894 1787 2652 1951; x_wconf 95'>MAGNA</span>
<span class='ocrx_word' id='word_1_2' title='bbox 2763 1820 3400 1960; x_wconf 92'>CAMPI</span>
<span class='ocrx_word' id='word_1_3' title='bbox 3504 1834 4234 1981; x_wconf 87'>MARTII</span>
<span class='ocrx_word' id='word_1_4' title='bbox 4355 1983 4358 1986; x_wconf 85'>—</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 1239 2006 3983 2147; baseline 0.012 -31; x_size 138.68248; x_descenders 20.682476; x_ascenders 24">
<span class='ocrx_word' id='word_1_5' title='bbox 1239 2006 2467 2125; x_wconf 85'>ICHNOGRAPHIA</span>
<span class='ocrx_word' id='word_1_6' title='bbox 2574 2032 3432 2135; x_wconf 87'>DESCRIPTA</span>
<span class='ocrx_word' id='word_1_7' title='bbox 3545 2047 3730 2145; x_wconf 50'>SV</span>
<span class='ocrx_word' id='word_1_8' title='bbox 3756 2048 3983 2147; x_wconf 50'>NT</span>
</span>
</p>
</div>
</div>
<div class='ocr_page' id='page_2' title='image "./P100.tif"; bbox 0 0 118 160; ppageno 1'>
</div>
</body>
</html>
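The two hOCR fixtures differ only in the `ocr_line` titles: `bad_hocr.hocr` adds an `unsupported_thing` property. How the application reacts to that property is not shown in this diff. The sketch below is one hypothetical way to parse a `title` attribute and reject unknown properties; the name `parse_hocr_title` and the `KNOWN_PROPERTIES` set are assumptions, not code from the repository.

```python
# Properties that appear in the valid fixture; treated here as the allow-list.
KNOWN_PROPERTIES = {
    "image", "bbox", "ppageno", "baseline",
    "x_size", "x_descenders", "x_ascenders", "x_wconf",
}


def parse_hocr_title(title, known=KNOWN_PROPERTIES):
    """Split an hOCR title attribute into {property: [values]}.

    Raises ValueError for a property outside `known`. Values are split on
    whitespace, so quoted file names containing spaces or semicolons are
    not handled by this sketch.
    """
    props = {}
    for chunk in title.split(";"):
        parts = chunk.split()
        if not parts:
            continue
        name, values = parts[0], parts[1:]
        if name not in known:
            raise ValueError(f"unsupported hOCR property: {name}")
        props[name] = values
    if "bbox" in props:
        # Bounding boxes are four integers: x0 y0 x1 y1.
        props["bbox"] = tuple(int(value) for value in props["bbox"])
    return props


if __name__ == "__main__":
    good = "bbox 996 1787 4358 1986; baseline 0.014 -47; x_size 211"
    bad = "bbox 996 1787 4358 1986; unsupported_thing 1; baseline 0.014 -47"
    print(parse_hocr_title(good)["bbox"])
    try:
        parse_hocr_title(bad)
    except ValueError as error:
        print(error)
```

A stricter or more lenient policy (for example, ignoring unknown properties instead of raising) would be an equally plausible design; the fixtures alone do not settle which one is intended.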