Merge pull request #464 from davidshq-contribute/doc-updates
Reformatted docs README and updated grammar/spelling where appropriate.
opensemanticsearch committed Mar 17, 2023
2 parents 4185528 + 14f5948 commit e8554ea
Showing 9 changed files with 233 additions and 191 deletions.
86 changes: 52 additions & 34 deletions docs/README.md
@@ -1,50 +1,48 @@
---
title: Open-Source Search Engine with Apache Lucene / Solr
authors:
title: Open-Source Search Engine with Apache Lucene / Solr
authors:
- Markus Mandalka
---

# Open-Source Search Engine with Apache Lucene / Solr

*Provides integrated research tools for easier searching, monitoring, analytics, discovery & text mining (of heterogeneous & large document sets & news) with free software on your own server.*


## Integrated research tools for easier searching, monitoring, analytics, discovery & text mining of heterogenous & large document sets & news with free software on your own server

### Search engine (Fulltext search)


### Search engine(Fulltext search)
[Easy full text search](../doc/search) across multiple data sources and many different file formats. Just enter a search query (which can include [powerful search operators](../doc/search/operators)) and navigate through the results.


[Easy full text search](../doc/search) in multiple data sources and many different file formats: Just enter a search query (which can include [powerful search operators](../doc/search/operators)) and navigate through the results.


### Thesaurus & Grammar (Semantic search)


### Thesaurus & Grammar(Semantic search)
Based on a [thesaurus](../doc/datamanagement/thesaurus) the multilingual semantic search engine will find [synonyms, hyponyms and aliases](../doc/search/fuzzy#synonyms), too. Using heuristics for [grammar rules like stemming](../doc/search/fuzzy#stemming) it can find other word forms, too.


Based on a [thesaurus](../doc/datamanagement/thesaurus) the multilingual semantic search engine will find [synonyms, hyponyms and aliases](../doc/search/fuzzy#synonyms), too. Using heuristics for [grammar rules like stemming](../doc/search/fuzzy#stemming) it finds other word forms, too.


### Interactive filters (Faceted search)


### Interactive filters(Faceted search)

Navigate easily through many results with [interactive filters](../doc/search#faceted_search) (faceted search), which aggregate an overview of (meta)data like authors, organizations, persons, places, dates, products, tags or document types and let you filter by any of them.


Easy navigation through many results with [interactive filters](../doc/search#faceted_search) (faceted search) which aggregates an overview over and interactive filters for (meta) data like authors, organizations, persons, places, dates, products, tags or document types.
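
As a rough illustration of how a full text query with search operators and faceted filters could be sent to the underlying Solr index, here is a minimal sketch. The `q`, `fq`, `facet` and `facet.field` parameters are standard Solr request parameters; the core name `example-core` and the field names `author_ss` and `content_type_ss` are assumptions made for this example and may not match the project's schema.

```python
from urllib.parse import urlencode


def build_faceted_query_url(base_url, query, facet_fields, filters=None):
    """Return a Solr /select URL for a full text query with facet counts."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # One facet.field parameter per field that should get an aggregated overview.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected filter becomes a filter query (fq) that narrows the results.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


url = build_faceted_query_url(
    "http://localhost:8983/solr/example-core",      # hypothetical core
    "report AND (budget OR finance)",               # query with boolean operators
    facet_fields=["author_ss", "content_type_ss"],  # hypothetical field names
    filters={"author_ss": "Jane Doe"},
)
print(url)
```

The function only builds the URL, so it can be tried without a running server; fetching it would require a Solr instance with matching fields.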




### Exploration, browsing & preview (Exploratory search)

### Exploration, browsing & preview(Exploratory search)



Explore your data or search results with an [overview of aggregated search results](../doc/search#faceted_search) by different facets with [named entities (i.e. file paths, tags, persons, locations, organisations or products)](../doc/datamanagement/thesaurus), while browsing with comfortable navigation through search results or document sets.
View previews (i.e. PDF, extracted Text, Table rows or Images).
Analyze or review document sets by preview, extracted text or [wordlists for textmining](../doc/analytics/textmining).
Explore your data or search results with an [overview of aggregated search results](../doc/search#faceted_search) by different facets with [named entities (i.e. file paths, tags, persons, locations, organisations or products)](../doc/datamanagement/thesaurus), while browsing with comfortable navigation through search results or document sets.
View previews (i.e. PDF, extracted Text, Table rows or Images).
Analyze or review document sets by preview, extracted text or [wordlists for textmining](../doc/analytics/textmining).



@@ -54,19 +52,23 @@ Analyze or review document sets by preview, extracted text or [wordlists for tex



[Tag your documents with keywords, categories, names or text notes](../doc/datamanagement/annotation "Tagging and annotation") that are not included in the original content to find them better later (document management & knowledge management) or in other research or search contexts or to be able to filter annotated or tagged documents by interactive filters (faceted search).

Or evaluate, value or assess or filter documents (i.e. for validation or collaborative filtering).
[Tag your documents with keywords, categories, names or text notes](../doc/datamanagement/annotation "Tagging and annotation") that are not included in the original content to find them better later (document management & knowledge management) or in other research or search contexts or to be able to filter annotated or tagged documents by interactive filters (faceted search).

Or evaluate, value or assess or filter documents (i.e. for validation or collaborative filtering).




### Datavisualization (Dataviz)

### Data Visualization (Dataviz)


Visualizing data like document dates as [trend charts](../doc/analyze/trend) or [text analysis](../doc/analyze/textmining) for example as [word clouds](../doc/analyze/words), [connections and networks in visual graph view](../doc/analytics/graph) or view results with [geodata as interactive maps](../doc/analytics/map).


Visualizing data such as:
- document dates as [trend charts](../doc/analyze/trend)
- [text analysis](../doc/analyze/textmining) as [word clouds](../doc/analyze/words)
- [connections and networks in visual graph view](../doc/analytics/graph)
- view results with [geodata as interactive maps](../doc/analytics/map).



@@ -75,8 +77,12 @@ Visualizing data like document dates as [trend charts](../doc/analyze/trend) or
### Monitoring: Alerts & Watchlists (Newsfeeds)



Stay informed via watchlists for news alerts from media monitoring or activity streams of new or changed documents on file shares: Subscribe searches and filters as RSS-Newsfeed and get notifications when there are changed or new documents, news or search results for your keywords, search context or filter.

Stay informed via watchlists for:
- news alerts from media monitoring
- activity streams of new or changed documents on file shares

You can subscribe to searches and filters as RSS-Newsfeed and get notifications when there are changed or new documents, news or search results for your keywords, search context or filter.
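
A subscribed search can be consumed by any RSS client. The following minimal sketch shows one way a script could detect new entries in such a feed. The feed content, item titles and URLs are made-up sample data, and the feed structure is assumed to be plain RSS 2.0; the actual feed produced by the search UI may contain other elements.

```python
import xml.etree.ElementTree as ET

# Made-up sample of what a subscribed search feed might contain (RSS 2.0).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search: budget</title>
    <item><title>Budget 2023.pdf</title><link>http://example.org/a</link><guid>a</guid></item>
    <item><title>Finance memo.docx</title><link>http://example.org/b</link><guid>b</guid></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for feed items whose guid was not seen before."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when an item carries no guid element.
        guid = item.findtext("guid") or item.findtext("link")
        if guid in seen_guids:
            continue
        seen_guids.add(guid)
        fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


seen = {"a"}  # guids already reported in an earlier run
for title, link in new_items(SAMPLE_FEED, seen):
    print(f"New result: {title} <{link}>")
```

Keeping the set of seen guids between runs is what turns a plain feed into a notification: only entries that appeared since the last check are reported.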



@@ -87,7 +93,19 @@ Stay informed via watchlists for news alerts from media monitoring or activity s
### Supports different file formats


No matter if [structured data like databases, tables or spreadsheets](../doc/search/table) or [unstructured data like text documents](../doc/analytics/textmining), E-Mails or even scanned legacy documents: Search in many different formats and content types (text files, Word and other Microsoft Office documents or OpenOffice documents, Excel or LibreOffice Calc tables, PDF, E-Mail, CSV, doc, images, photos, pictures, JPG, TIFF, videos and [many other file formats](http://tika.apache.org/1.13/formats.html)).
Open Semantic Search can help you index and search your data whether you are working with:
- [structured data like databases, tables or spreadsheets](../doc/search/table)
- [unstructured data like text documents](../doc/analytics/textmining)
- E-Mails
- even scanned legacy documents
- text files
- Microsoft Office, OpenOffice, and LibreOffice documents including Excel and Calc
- PDF
- CSV
- Images (photos, pictures, JPG, TIFF)
- Videos

And that isn't all: see the full list of [supported file formats](http://tika.apache.org/1.13/formats.html).



@@ -96,9 +114,9 @@ No matter if [structured data like databases, tables or spreadsheets](../doc/sea
### Supports multiple data sources


Find all your data at one place: Search in many different [data sources](../doc/admin/connectors) like [files and folders, file server, file shares](../connector/files), [databases](../connector/db), websites, Content Management Systems, [RSS-Feeds](../doc/datamanagement/rss) and many more.

The Connectors and Importers of the [Extract Transform Load (ETL) framework for Data Integration](../etl) connects and combines multiple data sources and as integrated [document analysis and data enrichment](../doc/data_enrichment) framework it enhances the data with the analysis results of diverse analytics tools.
You can find all your data in one place. Search many different [data sources](../doc/admin/connectors) like [files and folders, file server, file shares](../connector/files), [databases](../connector/db), websites, Content Management Systems, [RSS-Feeds](../doc/datamanagement/rss) and more.

The Connectors and Importers of the [Extract Transform Load (ETL) framework for Data Integration](../etl) connect and combine multiple data sources and, as an integrated [document analysis and data enrichment](../doc/data_enrichment) framework, it enhances the data with the analysis results of diverse analytics tools.



@@ -108,7 +126,7 @@ The Connectors and Importers of the [Extract Transform Load (ETL) framework for



[Optical character recognition (OCR) or automatic text recognition for images](../doc/admin/config/ocr) and text content stored in graphical format like scanned legacy documents, screenshots or photographed documents in the form of image files or embedded in PDF files.
[Optical character recognition (OCR) or automatic text recognition for images](../doc/admin/config/ocr) and text content stored in graphical format like scanned legacy documents, screenshots or photographed documents in the form of image files or embedded in PDF files.



@@ -127,8 +145,8 @@ The Connectors and Importers of the [Extract Transform Load (ETL) framework for
### Mobile (Responsive Design)



Open Semantic Search can not only be used with every desktop (Linux, Windows or Mac) or web browser. With its [responsive design](http://foundation.zurb.com "Powerded by Zurb Foundation") and open standards like HTML5 it is possible to search with tablets, smartphones and other mobiles.

Open Semantic Search can be used with every desktop (Linux, Windows or Mac) and web browser. With its [responsive design](http://foundation.zurb.com "Powered by Zurb Foundation") and open standards like HTML5 it is possible to search with tablets, smartphones and other mobile devices as well.



@@ -137,7 +155,7 @@ Open Semantic Search can not only be used with every desktop (Linux, Windows or
### Metadata management (RDF)


Structure your research, investigation, navigation, document sets, collections, metadata forms or notes in a Semantic Wiki, Drupal or another content management system (CMS) or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping and structured notes. So you integrate powerful and flexible metadata management or annotation tools using interoperable open standards like Resource Description Framework (RDF) and Simple Knowledge Organization System ([SKOS](https://www.w3.org/TR/skos-primer)).
Structure your research, investigation, navigation, document sets, collections, metadata forms or notes in a Semantic Wiki, Drupal or another content management system (CMS) or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping and structured notes. You can integrate powerful and flexible metadata management or annotation tools using interoperable open standards like the Resource Description Framework (RDF) and the Simple Knowledge Organization System ([SKOS](https://www.w3.org/TR/skos-primer)).



@@ -146,9 +164,9 @@ Structure your research, investigation, navigation, document sets, collections,
### Filesystem monitoring



Using [file monitoring](../trigger/filemonitoring), new or changed files are indexed within seconds without frequent recrawls (which is not possible often if many files).
Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS) or filling out a data registration form for each new or changed document or dataset in a data management system, data registry or digital asset management (DAM) system.

Using [file monitoring](../trigger/filemonitoring), new or changed files are indexed within seconds without requiring frequent recrawls (which are often not feasible when there are many files).
Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS) or filling out a data registration form for each new or changed document or dataset in a data management system, data registry or digital asset management (DAM) system.



51 changes: 26 additions & 25 deletions docs/solr-ontology-tagger/README.md
@@ -1,6 +1,6 @@
---
title: Ontology tagger for Solr (Automatic tagging by RDF ontologies & SKOS thesaurus)
authors:
title: Ontology tagger for Solr (Automatic tagging by RDF ontologies & SKOS thesaurus)
authors:
- Markus Mandalka
---

@@ -9,45 +9,46 @@ authors:

## Annotator for Apache Solr by Resource Description Framework (RDF) ontology & Simple Knowledge Organization System (SKOS) thesaurus


The auto-tagger **Ontology Tagger** for **Apache Solr** is the preconfigured search engine component for **automatic tagging** or auto-classification of **documents** in an Apache Solr index for faceted search by labels in data structures like **ontologies** in the open standard **RDF** & **thesauruses** in open standard **SKOS** or [linked open data sources and databases](../doc/datamanagement/opendata) like [Wikidata](../doc/datamanagement/opendata#wikidata).


The auto-tagger **Ontology Tagger** for **Apache Solr** is the preconfigured search engine component for **automatic tagging** or auto-classification of **documents** in an Apache Solr index for faceted search by labels in data structures like **ontologies** in the open standard **RDF** & **thesauruses** in the open standard **SKOS** or [linked open data sources and databases](../doc/datamanagement/opendata) like [Wikidata](../doc/datamanagement/opendata#wikidata).

## Automatic tagger for faceted search with Solr


So you can [structure, filter and navigate your indexed documents or datasets by faceted search](../doc/search#faceted_search) based on structures like [thesauri, knowledge bases, lists of entities, ontologies or taxonomies](../doc/datamanagement/thesaurus) available in open standards for semantic web or linked data formats like Resource Description Format (RDF) or [Simple knowledge organization system (SKOS)](../doc/datamanagement/thesaurus#skos).


You can [structure, filter and navigate your indexed documents or datasets by faceted search](../doc/search#faceted_search) based on structures like [thesauri, knowledge bases, lists of entities, ontologies or taxonomies](../doc/datamanagement/thesaurus) available in open standards for the semantic web or linked data formats like Resource Description Framework (RDF) or [Simple Knowledge Organization System (SKOS)](../doc/datamanagement/thesaurus#skos).

## Free Open Source Software (FOSS)


Since the Ontology based auto-tagging tool and library is free Open Source Software based on Python & rdflib, the full source code is included inside the downloadable packages and hosted on [Github](https://github.com/opensemanticsearch/solr-ontology-tagger).


Since the Ontology based auto-tagging tool and library is free Open Source Software based on Python & rdflib, the full source code is included inside the downloadable packages and hosted on [Github](https://github.com/opensemanticsearch/solr-ontology-tagger).

## User interface (UI) for managing ontologies and thesauri for automatic tagging


A simple web app based user interface (UI) for easy configuring Solr with ontologies or thesauri for faceded search is provided by the [Python Django App Ontologies Manager](../doc/datamanagement/ontologies), which code is available inside our distribution packages and on [Github](https://github.com/opensemanticsearch/open-semantic-search-apps), too.


A simple web app based user interface (UI) for easily configuring Solr with ontologies or thesauri for faceted search is provided by the [Python Django App Ontologies Manager](../doc/datamanagement/ontologies), and its code is available inside our distribution packages and on [Github](https://github.com/opensemanticsearch/open-semantic-search-apps) too.

## Poor man's entity linking without disambiguation


Since for most usecases not so important if you work mainly with your own datasets and domain specific knowledge instead of universal databases with many ambigous concepts or names, at the moment there is no disambiguation integrated for automatic tagging or poor mans entity linking. Please donate so we can integrate methods and UIs to disambiguate homonyms and different entities with same names or same labels.


Disambiguation is not so important for most use cases if you work mainly with your own datasets and domain-specific knowledge instead of universal databases with many ambiguous concepts or names, so at the moment there is no disambiguation integrated for automatic tagging or poor man's entity linking. *Please donate so we can integrate methods and UIs to disambiguate homonyms and different entities with the same names or labels.*
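
To make the idea of label-based tagging without disambiguation concrete, here is a minimal sketch using only the Python standard library. It is not the Ontology Tagger's actual implementation (which is based on rdflib and Solr); the thesaurus, concept URIs and function names are invented for illustration. A label shared by two concepts tags the document with both of them.

```python
import re
from collections import defaultdict

# Hypothetical mini thesaurus: concept URI -> preferred and alternative labels.
THESAURUS = {
    "http://example.org/concept/jaguar-animal": ["Jaguar", "Panthera onca"],
    "http://example.org/concept/jaguar-cars": ["Jaguar", "Jaguar Cars"],
    "http://example.org/concept/berlin": ["Berlin"],
}


def build_label_index(thesaurus):
    """Map each lowercased label to every concept that uses it."""
    index = defaultdict(set)
    for concept, labels in thesaurus.items():
        for label in labels:
            index[label.lower()].add(concept)
    return index


def tag_text(text, label_index):
    """Return {concept: sorted matched labels} for every label found in the text."""
    tags = defaultdict(set)
    lowered = text.lower()
    for label, concepts in label_index.items():
        # Match whole words or phrases only, so "berlin" does not match "berliner".
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            for concept in concepts:
                tags[concept].add(label)
    return {concept: sorted(labels) for concept, labels in tags.items()}


tags = tag_text("A jaguar was seen near Berlin.", build_label_index(THESAURUS))
for concept, labels in sorted(tags.items()):
    print(concept, labels)
```

Because the label "Jaguar" belongs to two concepts, the sample sentence is tagged with both the animal and the car maker. With a domain-specific thesaurus such collisions are rare, which is why simple label matching is often good enough.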

## Automatic ontology tagger and annotator for Elastic search


Our search engine distribution is based on Apache Solr. Please donate with the subject "Elasticsearch ontology tagger" if you want to use these integrated tools for Elastic Search, too, since a generalization of this relative small parts of the search engine specific code would cost only few hours of effort or configure an alternate Ontology Annotator with [Elastic search plugin](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator/elasticsearch-ontology-annotator).


Our search engine distribution is based on Apache Solr. *Please donate with the subject "Elasticsearch ontology tagger" if you want to use these integrated tools for Elasticsearch*, since generalizing the relatively small part of the code that is search engine specific would take only a few hours of effort. Alternatively, configure a different Ontology Annotator with an [Elastic search plugin](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator/elasticsearch-ontology-annotator).

## Open Source tools for entity linking, dictionary based entity extraction or dictionary based annotation


Other methods, open source frameworks and free tools for automatic tagging, entity linking, entity extraction or disambiguation by machine learning:


Other methods, open source frameworks and free tools for automatic tagging, entity linking, entity extraction or disambiguation by machine learning:

* [Apache Stanbol Entity Linking Engine](https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking)
* [BioSolr ontology annotator](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator) - Ontology tagger for Solr or Elastic Search
* [Fast entity linker](https://github.com/yahoo/FEL) - Entity linking with disambiguation by machine learning
* [Fast entity linker](https://github.com/yahoo/FEL) - Entity linking with disambiguation by machine learning
* [Dexter](http://dexter.isti.cnr.it/)
* [NEL](https://github.com/wikilinks/nel)
* [SolrTextTagger](https://github.com/OpenSextant/SolrTextTagger)
* [Solr Dictionary Annotator](https://github.com/elsevierlabs-os/soda) - Microservice for Spark
* [Datafari OntologyUpdateProcessor](https://datafari.atlassian.net/wiki/display/DATAFARI/Link+an+ontology) - Solr update processor plugin
* [Apache UIMA dictionary annotator](https://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html)* [Apache UIMA concept mapper](https://uima.apache.org/d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html#dict)
* [Apache UIMA dictionary annotator](https://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html)
* [Apache UIMA concept mapper](https://uima.apache.org/d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html#dict)