From 1f980249877937518fd73350a2c9796847291ec6 Mon Sep 17 00:00:00 2001 From: Dave Mackey Date: Thu, 16 Mar 2023 23:53:38 -0400 Subject: [PATCH 1/4] Reformatted docs README and updated grammar/spelling where appropriate. --- docs/README.md | 86 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 52 insertions(+), 34 deletions(-) diff --git a/docs/README.md b/docs/README.md index dfdccdf..e55a025 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,50 +1,48 @@ --- -title: Open-Source Search Engine with Apache Lucene / Solr -authors: +title: Open-Source Search Engine with Apache Lucene / Solr +authors: - Markus Mandalka --- # Open-Source Search Engine with Apache Lucene / Solr +*Provides integrated research tools for easier searching, monitoring, analytics, discovery & text mining (of heterogenous & large document sets & news) with free software on your own server.* -## Integrated research tools for easier searching, monitoring, analytics, discovery & text mining of heterogenous & large document sets & news with free software on your own server +### Search engine (Fulltext search) -### Search engine(Fulltext search) +[Easy full text search](../doc/search) across multiple data sources and many different file formats. Just enter a search query (which can include [powerful search operators](../doc/search/operators)) and navigate through the results. -[Easy full text search](../doc/search) in multiple data sources and many different file formats: Just enter a search query (which can include [powerful search operators](../doc/search/operators)) and navigate through the results. +### Thesaurus & Grammar (Semantic search) -### Thesaurus & Grammar(Semantic search) +Based on a [thesaurus](../doc/datamanagement/thesaurus) the multilingual semantic search engine will find [synonyms, hyponyms and aliases](../doc/search/fuzzy#synonyms), too. Using heuristics for [grammar rules like stemming](../doc/search/fuzzy#stemming) it can find other word forms, too. 
-Based on a [thesaurus](../doc/datamanagement/thesaurus) the multilingual semantic search engine will find [synonyms, hyponyms and aliases](../doc/search/fuzzy#synonyms), too. Using heuristics for [grammar rules like stemming](../doc/search/fuzzy#stemming) it finds other word forms, too. +### Interactive filters (Faceted search) -### Interactive filters(Faceted search) +Easy navigation through many results with [interactive filters](../doc/search#faceted_search) (faceted search) which aggregate an overview over and interactive filters for (meta) data like authors, organizations, persons, places, dates, products, tags or document types. - -Easy navigation through many results with [interactive filters](../doc/search#faceted_search) (faceted search) which aggregates an overview over and interactive filters for (meta) data like authors, organizations, persons, places, dates, products, tags or document types. +### Exploration, browsing & preview (Exploratory search) -### Exploration, browsing & preview(Exploratory search) - -Explore your data or search results with an [overview of aggregated search results](../doc/search#faceted_search) by different facets with [named entities (i.e. file paths, tags, persons, locations, organisations or products)](../doc/datamanagement/thesaurus), while browsing with comfortable navigation through search results or document sets. -View previews (i.e. PDF, extracted Text, Table rows or Images). -Analyze or review document sets by preview, extracted text or [wordlists for textmining](../doc/analytics/textmining). +Explore your data or search results with an [overview of aggregated search results](../doc/search#faceted_search) by different facets with [named entities (i.e. file paths, tags, persons, locations, organisations or products)](../doc/datamanagement/thesaurus), while browsing with comfortable navigation through search results or document sets. +View previews (i.e. PDF, extracted Text, Table rows or Images). 
+Analyze or review document sets by preview, extracted text or [wordlists for textmining](../doc/analytics/textmining). @@ -54,19 +52,23 @@ Analyze or review document sets by preview, extracted text or [wordlists for tex -[Tag your documents with keywords, categories, names or text notes](../doc/datamanagement/annotation "Tagging and annotation") that are not included in the original content to find them better later (document management & knowledge management) or in other research or search contexts or to be able to filter annotated or tagged documents by interactive filters (faceted search). - -Or evaluate, value or assess or filter documents (i.e. for validation or collaborative filtering). +[Tag your documents with keywords, categories, names or text notes](../doc/datamanagement/annotation "Tagging and annotation") that are not included in the original content to find them better later (document management & knowledge management) or in other research or search contexts or to be able to filter annotated or tagged documents by interactive filters (faceted search). +Or evaluate, value or assess or filter documents (i.e. for validation or collaborative filtering). -### Datavisualization (Dataviz) +### Data Visualization (Dataviz) - -Visualizing data like document dates as [trend charts](../doc/analyze/trend) or [text analysis](../doc/analyze/textmining) for example as [word clouds](../doc/analyze/words), [connections and networks in visual graph view](../doc/analytics/graph) or view results with [geodata as interactive maps](../doc/analytics/map). + + +Visualizing data such as: +- document dates as [trend charts](../doc/analyze/trend) +- [text analysis](../doc/analyze/textmining) as [word clouds](../doc/analyze/words) +- [connections and networks in visual graph view](../doc/analytics/graph) +- view results with [geodata as interactive maps](../doc/analytics/map). 
@@ -75,8 +77,12 @@ Visualizing data like document dates as [trend charts](../doc/analyze/trend) or ### Monitoring: Alerts & Watchlists (Newsfeeds) - -Stay informed via watchlists for news alerts from media monitoring or activity streams of new or changed documents on file shares: Subscribe searches and filters as RSS-Newsfeed and get notifications when there are changed or new documents, news or search results for your keywords, search context or filter. + +Stay informed via watchlists for: +- news alerts from media monitoring +- activity streams of new or changed documents on file shares + +You can subscribe to searches and filters as an RSS-Newsfeed and get notifications when there are changed or new documents, news or search results for your keywords, search context or filter. @@ -87,7 +93,19 @@ Stay informed via watchlists for news alerts from media monitoring or activity s ### Supports different file formats -No matter if [structured data like databases, tables or spreadsheets](../doc/search/table) or [unstructured data like text documents](../doc/analytics/textmining), E-Mails or even scanned legacy documents: Search in many different formats and content types (text files, Word and other Microsoft Office documents or OpenOffice documents, Excel or LibreOffice Calc tables, PDF, E-Mail, CSV, doc, images, photos, pictures, JPG, TIFF, videos and [many other file formats](http://tika.apache.org/1.13/formats.html)). +Open Semantic Search can help you index and search your data whether you are working with: +- [structured data like databases, tables or spreadsheets](../doc/search/table) +- [unstructured data like text documents](../doc/analytics/textmining) +- E-Mails +- even scanned legacy documents +- text files +- Microsoft Office, OpenOffice, and LibreOffice documents including Excel and Calc +- PDF +- CSV +- Images (photos, pictures, JPG, TIFF) +- Videos + +And that isn't all: see the full list of [supported file formats](http://tika.apache.org/1.13/formats.html). 
@@ -96,9 +114,9 @@ No matter if [structured data like databases, tables or spreadsheets](../doc/sea ### Supports multiple data sources -Find all your data at one place: Search in many different [data sources](../doc/admin/connectors) like [files and folders, file server, file shares](../connector/files), [databases](../connector/db), websites, Content Management Systems, [RSS-Feeds](../doc/datamanagement/rss) and many more. - -The Connectors and Importers of the [Extract Transform Load (ETL) framework for Data Integration](../etl) connects and combines multiple data sources and as integrated [document analysis and data enrichment](../doc/data_enrichment) framework it enhances the data with the analysis results of diverse analytics tools. +You can find all your data in one place. Search many different [data sources](../doc/admin/connectors) like [files and folders, file server, file shares](../connector/files), [databases](../connector/db), websites, Content Management Systems, [RSS-Feeds](../doc/datamanagement/rss) and more. + +The Connectors and Importers of the [Extract Transform Load (ETL) framework for Data Integration](../etl) connect and combine multiple data sources and, as an integrated [document analysis and data enrichment](../doc/data_enrichment) framework, it enhances the data with the analysis results of diverse analytics tools. @@ -108,7 +126,7 @@ The Connectors and Importers of the [Extract Transform Load (ETL) framework for -[Optical character recognition (OCR) or automatic text recognition for images](../doc/admin/config/ocr) and text content stored in graphical format like scanned legacy documents, screenshots or photographed documents in the form of image files or embedded in PDF files. 
+[Optical character recognition (OCR) or automatic text recognition for images](../doc/admin/config/ocr) and text content stored in graphical format like scanned legacy documents, screenshots or photographed documents in the form of image files or embedded in PDF files. @@ -127,8 +145,8 @@ The Connectors and Importers of the [Extract Transform Load (ETL) framework for ### Mobile (Responsive Design) - -Open Semantic Search can not only be used with every desktop (Linux, Windows or Mac) or web browser. With its [responsive design](http://foundation.zurb.com "Powerded by Zurb Foundation") and open standards like HTML5 it is possible to search with tablets, smartphones and other mobiles. + +Open Semantic Search can be used with every desktop (Linux, Windows or Mac) and web browser. With its [responsive design](http://foundation.zurb.com "Powered by Zurb Foundation") and open standards like HTML5 it is possible to search with tablets, smartphones and other mobile devices as well. @@ -137,7 +155,7 @@ Open Semantic Search can not only be used with every desktop (Linux, Windows or ### Metadata management (RDF) -Structure your research, investigation, navigation, document sets, collections, metadata forms or notes in a Semantic Wiki, Drupal or another content management system (CMS) or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping and structured notes. So you integrate powerful and flexible metadata management or annotation tools using interoperable open standards like Resource Description Framework (RDF) and Simple Knowledge Organization System ([SKOS](https://www.w3.org/TR/skos-primer)). 
+Structure your research, investigation, navigation, document sets, collections, metadata forms or notes in a Semantic Wiki, Drupal or another content management system (CMS) or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping and structured notes. You can integrate powerful and flexible metadata management or annotation tools using interoperable open standards like the Resource Description Framework (RDF) and the Simple Knowledge Organization System ([SKOS](https://www.w3.org/TR/skos-primer)). @@ -146,9 +164,9 @@ Structure your research, investigation, navigation, document sets, collections, ### Filesystem monitoring - -Using [file monitoring](../trigger/filemonitoring), new or changed files are indexed within seconds without frequent recrawls (which is not possible often if many files). -Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS) or filling out a data registration form for each new or changed document or dataset in a data management system, data registry or digital asset management (DAM) system. + +Using [file monitoring](../trigger/filemonitoring), new or changed files are indexed within seconds without requiring frequent recrawls (which is not possible often if there are many files). +Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS) or filling out a data registration form for each new or changed document or dataset in a data management system, data registry or digital asset management (DAM) system. 
From 4b9dc02fb2fdad5a5eca7d568b687887dacaa505 Mon Sep 17 00:00:00 2001 From: Dave Mackey Date: Fri, 17 Mar 2023 00:24:57 -0400 Subject: [PATCH 2/4] Add section to mkdocs that includes sub-apps --- mkdocs.yml | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/mkdocs.yml b/mkdocs.yml index 22b0254..c9908b3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -23,6 +23,19 @@ nav: - 'Configuration': 'doc/admin/config/README.md' - 'Task queue': 'doc/admin/queue/README.md' - 'Logs': 'doc/admin/config/log/README.md' + - 'Additional Applications': + - 'CSV Manager': 'docs/solr-search-csv-python-django/README.md' + - 'Graph Explorer': 'docs/graph-explorer/README.md' + - 'Lexememes': 'docs/lexememes/README.md' + - 'Ontology Tagger': 'docs/solr-ontology-tagger/README.md' + - 'Relevance Ranking Analysis': 'docs/solr-relevance-ranking-analysis/README.md' + - 'RDF or SKOS to OCR': 'docs/rdf2ocr/README.md' + - 'RSS-Feed Manager': 'docs/rss-feed-manager-python-django/README.md' + - 'Search Lists': 'docs/search-list/README.md' + - 'SKOS to Solr': 'docs/skos2solr/README.md' + - 'Solr Server': 'docs/solr/README.md' + - 'Tagger': 'docs/tagger/README.md' + - 'QueryTagger': 'docs/solr-search-querytagger-python-django/README.md' - 'Download': 'download/README.md' - 'Development': 'https://github.com/opensemanticsearch/open-semantic-search/' - 'Donate': 'donate/README.md' From edafad30b64a3aa2678f7abece96ec113198879b Mon Sep 17 00:00:00 2001 From: Dave Mackey Date: Fri, 17 Mar 2023 00:25:46 -0400 Subject: [PATCH 3/4] Grammar, formatting and spelling improvements to several docs --- docs/solr-ontology-tagger/README.md | 51 +++++++------- docs/solr-ranking-analysis/README.md | 68 +++++++++--------- .../solr-relevance-ranking-analysis/README.md | 70 +++++++++---------- docs/solr-search-csv-python-django/README.md | 36 +++++----- .../README.md | 37 +++++----- docs/solr/README.md | 55 ++++++++------- docs/tagger/README.md | 8 +-- 7 files changed, 168 insertions(+), 157 deletions(-) diff 
--git a/docs/solr-ontology-tagger/README.md b/docs/solr-ontology-tagger/README.md index e746b25..fa45ccc 100644 --- a/docs/solr-ontology-tagger/README.md +++ b/docs/solr-ontology-tagger/README.md @@ -1,6 +1,6 @@ --- -title: Ontology tagger for Solr (Automatic tagging by RDF ontologies & SKOS thesaurus) -authors: +title: Ontology tagger for Solr (Automatic tagging by RDF ontologies & SKOS thesaurus) +authors: - Markus Mandalka --- @@ -9,45 +9,46 @@ authors: ## Annotator for Apache Solr by Resource Description Framework (RDF) ontology & Simple Knowledge Organization System (SKOS) thesaurus - -The auto-tagger **Ontology Tagger** for **Apache Solr** is the preconfigured search engine component for **automatic tagging** or auto-classification of **documents** in an Apache Solr index for faceted search by labels in data structures like **ontologies** in the open standard **RDF** & **thesauruses** in open standard **SKOS** or [linked open data sources and databases](../doc/datamanagement/opendata) like [Wikidata](../doc/datamanagement/opendata#wikidata). - + +The auto-tagger **Ontology Tagger** for **Apache Solr** is the preconfigured search engine component for **automatic tagging** or auto-classification of **documents** in an Apache Solr index for faceted search by labels in data structures like **ontologies** in the open standard **RDF** & **thesauruses** in the open standard **SKOS** or [linked open data sources and databases](../doc/datamanagement/opendata) like [Wikidata](../doc/datamanagement/opendata#wikidata). 
+ ## Automatic tagger for faceted search with Solr - -So you can [structure, filter and navigate your indexed documents or datasets by faceted search](../doc/search#faceted_search) based on structures like [thesauri, knowledge bases, lists of entities, ontologies or taxonomies](../doc/datamanagement/thesaurus) available in open standards for semantic web or linked data formats like Resource Description Format (RDF) or [Simple knowledge organization system (SKOS)](../doc/datamanagement/thesaurus#skos). - + +You can [structure, filter and navigate your indexed documents or datasets by faceted search](../doc/search#faceted_search) based on structures like [thesauri, knowledge bases, lists of entities, ontologies or taxonomies](../doc/datamanagement/thesaurus) available in open standards for the semantic web or linked data formats like Resource Description Format (RDF) or [Simple knowledge organization system (SKOS)](../doc/datamanagement/thesaurus#skos). + ## Free Open Source Software (FOSS) - -Since the Ontology based auto-tagging tool and library is free Open Source Software based on Python & rdflib, the full source code is included inside the downloadable packages and hosted on [Github](https://github.com/opensemanticsearch/solr-ontology-tagger). - + +Since the Ontology based auto-tagging tool and library is free Open Source Software based on Python & rdflib, the full source code is included inside the downloadable packages and hosted on [Github](https://github.com/opensemanticsearch/solr-ontology-tagger). + ## User interface (UI) for managing ontologies and thesauri for automatic tagging - -A simple web app based user interface (UI) for easy configuring Solr with ontologies or thesauri for faceded search is provided by the [Python Django App Ontologies Manager](../doc/datamanagement/ontologies), which code is available inside our distribution packages and on [Github](https://github.com/opensemanticsearch/open-semantic-search-apps), too. 
- + +A simple web app based user interface (UI) for easily configuring Solr with ontologies or thesauri for faceted search is provided by the [Python Django App Ontologies Manager](../doc/datamanagement/ontologies), and its code is available inside our distribution packages and on [Github](https://github.com/opensemanticsearch/open-semantic-search-apps) too. + ## Poor mans entity linking without disambiguation - -Since for most usecases not so important if you work mainly with your own datasets and domain specific knowledge instead of universal databases with many ambigous concepts or names, at the moment there is no disambiguation integrated for automatic tagging or poor mans entity linking. Please donate so we can integrate methods and UIs to disambiguate homonyms and different entities with same names or same labels. - + +Since disambiguation is not so important for most use cases if you work mainly with your own datasets and domain specific knowledge instead of universal databases with many ambiguous concepts or names, at the moment there is no disambiguation integrated for automatic tagging or poor mans entity linking. *Please donate so we can integrate methods and UIs to disambiguate homonyms and different entities with the same names or labels.* + ## Automatic ontology tagger and annotator for Elastic search - -Our search engine distribution is based on Apache Solr. Please donate with the subject "Elasticsearch ontology tagger" if you want to use these integrated tools for Elastic Search, too, since a generalization of this relative small parts of the search engine specific code would cost only few hours of effort or configure an alternate Ontology Annotator with [Elastic search plugin](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator/elasticsearch-ontology-annotator). - + +Our search engine distribution is based on Apache Solr. 
*Please donate with the subject "Elasticsearch ontology tagger" if you want to use these integrated tools for Elasticsearch*, since generalizing this relatively small part of the search engine specific code would cost only a few hours of effort. Alternatively, configure an alternate Ontology Annotator with an [Elastic search plugin](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator/elasticsearch-ontology-annotator). + ## Open Source tools for entity linking, dictionary based entity extraction or dictionary based annotation - -Other methods, open source frameworks and free tools for automatic tagging, entity linking, entity extraction or disambiguation by machine learning: - + +Other methods, open source frameworks and free tools for automatic tagging, entity linking, entity extraction or disambiguation by machine learning: + * [Apache Stanbol Entity Linking Engine](https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking) * [BioSolr ontology annotator](https://github.com/flaxsearch/BioSolr/tree/master/ontology/ontology-annotator) - Ontology tagger for Solr or Elastic Search -* [Fast entity linker](https://github.com/yahoo/FEL) - Entity linking with disambiguation by machine learning +* [Fast entity linker](https://github.com/yahoo/FEL) - Entity linking with disambiguation by machine learning * [Dexter](http://dexter.isti.cnr.it/) * [NEL](https://github.com/wikilinks/nel) * [SolrTextTagger](https://github.com/OpenSextant/SolrTextTagger) * [Solr Dictionary Annotator](https://github.com/elsevierlabs-os/soda) - Microservice for Spark * [Datafari OntologyUpdateProcessor](https://datafari.atlassian.net/wiki/display/DATAFARI/Link+an+ontology) - Solr update processor plugin -* [Apache UIMA dictionary annotator](https://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html)* [Apache UIMA concept 
mapper](https://uima.apache.org/d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html#dict) +* [Apache UIMA dictionary annotator](https://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html) +* [Apache UIMA concept mapper](https://uima.apache.org/d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html#dict) diff --git a/docs/solr-ranking-analysis/README.md b/docs/solr-ranking-analysis/README.md index eab35d4..72bd00c 100644 --- a/docs/solr-ranking-analysis/README.md +++ b/docs/solr-ranking-analysis/README.md @@ -1,6 +1,6 @@ --- -title: Solr Relevance Ranking Analysis and Visualization Tool -authors: +title: Solr Relevance Ranking Analysis and Visualization Tool +authors: - Markus Mandalka --- @@ -9,57 +9,57 @@ authors: ## Relevance Ranking Analysis and Visualization for easier Solr relevancy tuning - - -This Python Django based Open Source tool and web user interface (UI) for easier [Solr Relevancy](https://wiki.apache.org/solr/SolrRelevancyFAQ) analysis helps while search relevance tuning and relevancy ranking debugging. - -Therefore the tool summarize and visualize the relevance ranking and scoring by [field boosts](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#qf-query-fields-parameter) (qf), term weights (TF/IDF) and [boost function](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#bf-boost-functions-parameter) (bf) score of documents found by an Apache Solr search query. - + + +This Python Django based Open Source tool and web user interface (UI) for easier [Solr Relevancy](https://wiki.apache.org/solr/SolrRelevancyFAQ) analysis helps while one is performing search relevance tuning and relevancy ranking debugging. 
+ +Therefore the tool summarizes and visualizes the relevance ranking and scoring by [field boosts](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#qf-query-fields-parameter) (qf), term weights (TF/IDF) and [boost function](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#bf-boost-functions-parameter) (bf) score of documents found by an Apache Solr search query. + ## Usage - -Open the web user interface (UI) on the server/port/path you run this Django web app (see section "Installation"). - -Copy the full Solr query (URL) to the field "Query" of the form in the web user interface (UI) - -Click the button "Analyze relevance ranking" - - + +Open the web user interface (UI) on the server/port/path you run this Django web app (see section "Installation"). + +Copy the full Solr query (URL) to the field "Query" of the form in the web user interface (UI) + +Click the button "Analyze relevance ranking" + + ## Visual summary - - -So you get an visual summary of the relevance ranking of the found documents: - + + +So you get a visual summary of the relevance ranking of the found documents: + ![](../screenshots/solr-relevance-ranking-analysis.png) ## Ranking details - - -By clicking the button "Show details" you get the full details of the scoring calculation for each document: - + + +By clicking the button "Show details" you get the full details of the scoring calculation for each document: + ![](../screenshots/solr-relevance-ranking-analysis-details.png) ## Visualization - - -The button "Chart" in the top bar shows a more compact visualization: - + + +The button "Chart" in the top bar shows a more compact visualization: + ![](../screenshots/solr-relevance-ranking-analysis-visualization.png) ## Installation and configuration - -The tool can be used with other [Apache Solr](http://lucene.apache.org/solr/) environments than Open Semantic Search. 
- -You find the documentation of the installation and configuration in the [README.md](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis). - + +The tool can be used with other [Apache Solr](http://lucene.apache.org/solr/) environments than Open Semantic Search. + +You find the documentation of the installation and configuration in the [README.md](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis). + ## Free Open Source Software - + The tool is Free Software. You find the full [Source Code on GitHub](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis) diff --git a/docs/solr-relevance-ranking-analysis/README.md b/docs/solr-relevance-ranking-analysis/README.md index eab35d4..a97ea81 100644 --- a/docs/solr-relevance-ranking-analysis/README.md +++ b/docs/solr-relevance-ranking-analysis/README.md @@ -1,6 +1,6 @@ --- -title: Solr Relevance Ranking Analysis and Visualization Tool -authors: +title: Solr Relevance Ranking Analysis and Visualization Tool +authors: - Markus Mandalka --- @@ -9,57 +9,57 @@ authors: ## Relevance Ranking Analysis and Visualization for easier Solr relevancy tuning - - -This Python Django based Open Source tool and web user interface (UI) for easier [Solr Relevancy](https://wiki.apache.org/solr/SolrRelevancyFAQ) analysis helps while search relevance tuning and relevancy ranking debugging. - -Therefore the tool summarize and visualize the relevance ranking and scoring by [field boosts](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#qf-query-fields-parameter) (qf), term weights (TF/IDF) and [boost function](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#bf-boost-functions-parameter) (bf) score of documents found by an Apache Solr search query. 
- + + +This Python Django based Open Source tool and web user interface (UI) allows for easier [Solr Relevancy](https://wiki.apache.org/solr/SolrRelevancyFAQ) analysis and is helpful while one is performing search relevance tuning and relevancy ranking debugging. + +The tool summarizes and visualizes the relevance ranking and scoring by [field boosts](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#qf-query-fields-parameter) (qf), term weights (TF/IDF) and the [boost function](https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html#bf-boost-functions-parameter) (bf) score of documents found by an Apache Solr search query. + ## Usage - -Open the web user interface (UI) on the server/port/path you run this Django web app (see section "Installation"). - -Copy the full Solr query (URL) to the field "Query" of the form in the web user interface (UI) - -Click the button "Analyze relevance ranking" - - + +Open the web user interface (UI) on the server/port/path you run this Django web app (see section "Installation"). 
+ +Copy the full Solr query (URL) to the field "Query" of the form in the web user interface (UI) + +Click the button "Analyze relevance ranking" + + ## Visual summary - - -So you get an visual summary of the relevance ranking of the found documents: - + + +So you get a visual summary of the relevance ranking of the found documents: + ![](../screenshots/solr-relevance-ranking-analysis.png) ## Ranking details - - -By clicking the button "Show details" you get the full details of the scoring calculation for each document: - + + +By clicking the button "Show details" you get the full details of the scoring calculation for each document: + ![](../screenshots/solr-relevance-ranking-analysis-details.png) ## Visualization - - -The button "Chart" in the top bar shows a more compact visualization: - + + +The button "Chart" in the top bar shows a more compact visualization: + ![](../screenshots/solr-relevance-ranking-analysis-visualization.png) ## Installation and configuration - -The tool can be used with other [Apache Solr](http://lucene.apache.org/solr/) environments than Open Semantic Search. - -You find the documentation of the installation and configuration in the [README.md](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis). - + +The tool can be used with other [Apache Solr](http://lucene.apache.org/solr/) environments than Open Semantic Search. + +You can find the documentation on installation and configuration in the [README.md](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis). + ## Free Open Source Software - -The tool is Free Software. You find the full [Source Code on GitHub](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis) + +The tool is Free Software. 
You can find the full [Source Code on GitHub](https://github.com/opensemanticsearch/solr-relevance-ranking-analysis) diff --git a/docs/solr-search-csv-python-django/README.md b/docs/solr-search-csv-python-django/README.md index 92e239a..40b2b10 100644 --- a/docs/solr-search-csv-python-django/README.md +++ b/docs/solr-search-csv-python-django/README.md @@ -1,31 +1,35 @@ --- -title: CSV Manager +title: CSV Manager authors: - - Markus Mandalka + - Markus Mandalka --- # CSV Manager -User interface (webapp) for structured import of CSV spreadsheets - +User interface (webapp) for structured import of CSV spreadsheets. + ## Usage - - + + See the [user documentation](../doc/search/csv) + ## Installation -* download the module *opensemanticsearch-search-csv-python-django* +* Download the module *opensemanticsearch-search-csv-python-django* * Copy the directory *csvmanager* from the zip file into your Django apps directory -* Enable the new app: - -Add "*csvmanager*" to your INSTALLED\_APPS setting like this: - - -`INSTALLED_APPS = ( - ... - 'csvmanager', - )`* Include the search\_list URLconf in your project urls.py like this: +* Enable the new app: + +Add "*csvmanager*" to your `INSTALLED_APPS` setting like this: + + +```python +INSTALLED_APPS = ( + ... + 'csvmanager', + ) +``` +* Include the `csvmanager` URLconf in your project `urls.py` like this: `url(r'^csvmanager/', include('csvmanager.urls')),` diff --git a/docs/solr-search-querytagger-python-django/README.md b/docs/solr-search-querytagger-python-django/README.md index 7718256..b4b40ee 100644 --- a/docs/solr-search-querytagger-python-django/README.md +++ b/docs/solr-search-querytagger-python-django/README.md @@ -1,31 +1,36 @@ --- -title: Web interface for tagging all results of a Solr search query -authors: +title: Web interface for tagging all results of a Solr search query +authors: - Markus Mandalka --- # Web interface for tagging all results of a Solr search query -Tagging all results of a search query. 
- +Tags all results of a search query. + ## Usage - - + + See the [user documentation](../doc/search/tagging_results_of_search_query) + ## Installation * download the module *solr-search-querytagger-python-django* * Copy the directory *querytagger* from the zip file into your Django apps directory -* Enable the new app: - -Add "*querytagger*" to your INSTALLED\_APPS setting like this: - - -`INSTALLED_APPS = ( - ... - 'querytagger', - )`* Include the querytagger URLconf in your project urls.py like this: -`url(r'^querytagger/', include('querytagger.urls')),` +* Enable the new app: + +Add "*querytagger*" to your `INSTALLED_APPS` setting like this: + + +```python +INSTALLED_APPS = ( + ... + 'querytagger', +) +``` + +Include the querytagger URLconf in your project `urls.py` like this: +`url(r'^querytagger/', include('querytagger.urls')),` \ No newline at end of file diff --git a/docs/solr/README.md b/docs/solr/README.md index f00a4a1..69d79dc 100644 --- a/docs/solr/README.md +++ b/docs/solr/README.md @@ -1,7 +1,7 @@ --- -title: Solr Server (Daemon) +title: Solr Server (Daemon) authors: - - Markus Mandalka + - Markus Mandalka --- # Solr Server (Daemon) @@ -9,50 +9,51 @@ authors: ## Solr package for Debian and Ubuntu - -This Debian package and Ubuntu package is a preconfigurated [Apache Solr](http://lucene.apache.org/solr) server running as a daemon providing important settings like integration of the [thesaurus editor](../doc/datamanagement/thesaurus) and ontologies manager, settings for more performance, disabled logging and security settings and a more current Solr version than the [packages of the Debian](https://packages.debian.org/search?suite=stable&section=all&arch=any&searchon=sourcenames&keywords=lucene-solr) or Ubuntu standard repositories.
- +This Debian and Ubuntu package is a preconfigured [Apache Solr](http://lucene.apache.org/solr) server running as a daemon. It provides important settings like the integration of the [thesaurus editor](../doc/datamanagement/thesaurus) and ontologies manager, settings for better performance, disabled logging and security settings, and a more current Solr version than the [packages of the Debian](https://packages.debian.org/search?suite=stable&section=all&arch=any&searchon=sourcenames&keywords=lucene-solr) or Ubuntu standard repositories. + # Settings of preconfigured Solr package ## Disabled Logfiles -**Disabled logfiles**: we don't want to write each search query to Solr logs. If you want to switch on logging for debugging purposes, switch on `file` and `console` the config file `/var/solr/log4j.properties` +**Disabled logfiles**: we don't want to write each search query to Solr logs. If you want to switch on logging for debugging purposes, switch on `file` and `console` in the config file `/var/solr/log4j.properties` + ## Autocommits -**Automatic commits** to the index after 15 seconds after adding or update of documents (autocommit=15000) - +**Automatic commits** to the index 15 seconds after documents are added or updated (autocommit=15000) + ## Running as daemon -**Automatic start on booting** since running as daemon in Debian GNU/Linux or Ubuntu Linux. - - +**Automatic start on booting**, since Solr runs as a daemon in Debian GNU/Linux or Ubuntu Linux. + + ## Increase maximum RAM settings of the Java Virtual Machine (JVM) -**Automatic memory settings**: In most cases no manual setting of Java virtual machine options needed anymore. Allows the Java VM to use as much RAM as possible at this server, so you wont get problems because of default Java Virtual Machine (JVM) maximal RAM settings (option *-Xmx*) if indexing very much data or large documents.
- +**Automatic memory settings**: In most cases no manual setting of Java virtual machine options is needed anymore. Allows the Java VM to use as much RAM as possible on this server, so you won't have problems because of default Java Virtual Machine (JVM) maximum RAM settings (option *-Xmx*) if indexing large amounts of data or large documents. + ## Swappiness -**Disabled swappiness**, so the system will only swap if necessary. So it doesn't to optimize RAM for running software swapping the Solr index and search caches automatically after some time because they are not used for some time. Why? Even if some parts of the Solr index and caches in RAM are not used for long time (f.e. if search isn't used for the night or some days) and that RAM could be used by other software meanwhile to: To read hundrets of MB or some GB from Swap on slower harddisks to RAM again because of using again while the first search after long time would lead to timeouts and errors on maybe important searches, which than could take tens of seconds more time. - +**Disabled swappiness**, so the system will only swap if necessary. It will not swap out the Solr index and search caches just to free RAM for other running software because they have not been used for some time. Why? Parts of the Solr index and caches in RAM may go unused for a long time (e.g. if search isn't used overnight or for some days), and other software could use that RAM in the meantime. But reading hundreds of MB or some GB back from swap on slower hard disks into RAM during the first search after a long time would lead to timeouts and errors on possibly important searches, which could then take tens of seconds longer. + ## Access only from localhost - -For security reasons access to the Solr search server is only possible from the same computer.
-So **[access is only possible from localhost](#localhost)**, so that if you set a password to the User Interfaces module *solr-php-ui* and the search apps nobody without an account on your computer or an account for a service on your computer can read all data from unprotected Solr instead - -To enable Solr remote admin access from other computers than localhost you have to edit jetty-http.xml and delete the default="127.0.0.1" from the config option "host". Then restart Solr by `service solr restart`. - -Warning: You don't want to enable access to unprotected Solr server with the possibility to read, add, change or delete all indexed data for everybody on the net or internet! So if the computers are part of a network you can not fully trust, you have to protect the IP of the Solr server or the Solr port for example by a firewall. - + +For security reasons access to the Solr search server is only possible from the same computer. +So **[access is only possible from localhost](#localhost)**: if you set a password for the user interface module *solr-php-ui* and the search apps, nobody without an account on your computer or an account for a service on your computer can read all the data directly from the unprotected Solr instead. + +To enable Solr remote admin access from other computers than localhost you have to edit `jetty-http.xml` and delete the `default="127.0.0.1"` from the config option `"host"`. Then restart Solr with `service solr restart`. + +Warning: You don't want to enable access to an unprotected Solr server with the possibility to read, add, change or delete all indexed data for everybody on the intranet or internet! So if the computers are part of a network you cannot fully trust, you have to protect the IP of the Solr server or the Solr port, for example with a firewall. + ## Solr schema - -There are additional fields and stemming configured in the Solr schema.
You can read the XML schema config in `/var/solr/data/core1/conf/managed-schema` which is based on the Solr example config set `/opt/solr/server/solr/configsets`, so you can use a diff tool to compare and see the config additions. - -Additionally the ETL and search tool add & use some additional fields with are created automatically by Solr dynamic fields feature configured for the schema because of type endings like \_b \_s or \_tt. You can see such additional fields by the table view. \ No newline at end of file + +There are additional fields and stemming configured in the Solr schema. You can read the XML schema config in `/var/solr/data/core1/conf/managed-schema` which is based on the Solr example config set `/opt/solr/server/solr/configsets`, so you can use a diff tool to compare and see the config additions. + +Additionally, the ETL and search tools add & use some additional fields which are created automatically by the Solr dynamic fields feature, configured in the schema for type endings like \_b, \_s or \_tt. You can see such additional fields in the table view. \ No newline at end of file diff --git a/docs/tagger/README.md b/docs/tagger/README.md index 0d64720..ab1b2b7 100644 --- a/docs/tagger/README.md +++ b/docs/tagger/README.md @@ -1,12 +1,12 @@ --- -title: Open Semantic Tagger -authors: +title: Open Semantic Tagger +authors: - Markus Mandalka --- # Open Semantic Tagger -Tagger is a light weight responsive web app for tagging web pages and documents. - +Tagger is a lightweight responsive web app for tagging web pages and documents. + It stores the tags for the documents, files or web pages in the Django database and makes them available in RDF.
\ No newline at end of file From 14f5948e228f7090119498a45cab546174e1fae8 Mon Sep 17 00:00:00 2001 From: Dave Mackey Date: Fri, 17 Mar 2023 00:38:17 -0400 Subject: [PATCH 4/4] Fix bad links for additional apps entries --- mkdocs.yml | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/mkdocs.yml b/mkdocs.yml index c9908b3..c68bd85 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -24,18 +24,18 @@ nav: - 'Task queue': 'doc/admin/queue/README.md' - 'Logs': 'doc/admin/config/log/README.md' - 'Additional Applications': - - 'CSV Manager': 'docs/solr-search-csv-python-django/README.md' - - 'Graph Explorer': 'docs/graph-explorer/README.md' - - 'Lexememes': 'docs/lexememes/README.md' - - 'Ontology Tagger': 'docs/solr-ontology-tagger/README.md' - - 'Relevance Ranking Analysis': 'docs/solr-relevance-ranking-analysis/README.md' - - 'RDF or SKOS to OCR': 'docs/rdf2ocr/README.md' - - 'RSS-Feed Manager': 'docs/rss-feed-manager-python-django/README.md' - - 'Search Lists': 'docs/search-list/README.md' - - 'SKOS to Solr': 'docs/skos2solr/README.md' - - 'Solr Server': 'docs/solr/README.md' - - 'Tagger': 'docs/tagger/README.md' - - 'QueryTagger': 'docs/solr-search-querytagger-python-django/README.md' + - 'CSV Manager': '/solr-search-csv-python-django/README.md' + - 'Graph Explorer': '/graph-explorer/README.md' + - 'Lexememes': '/lexememes/README.md' + - 'Ontology Tagger': '/solr-ontology-tagger/README.md' + - 'Relevance Ranking Analysis': '/solr-relevance-ranking-analysis/README.md' + - 'RDF or SKOS to OCR': '/rdf2ocr/README.md' + - 'RSS-Feed Manager': '/rss-feed-manager-python-django/README.md' + - 'Search Lists': '/search-list/README.md' + - 'SKOS to Solr': '/skos2solr/README.md' + - 'Solr Server': '/solr/README.md' + - 'Tagger': '/tagger/README.md' + - 'QueryTagger': '/solr-search-querytagger-python-django/README.md' - 'Download': 'download/README.md' - 'Development': 'https://github.com/opensemanticsearch/open-semantic-search/' - 'Donate': 
'donate/README.md'