Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible Performance Issues at index.php?option=com_content&view=articles SQL __workflow_associations #43701

Open
SmikeSix2 opened this issue Jun 25, 2024 · 5 comments

Comments

@SmikeSix2
Copy link

We have a few joomla new portals with lots of articles, in the 20k+ count.
The Article index.php?option=com_content&view=articles felt really slow and a lot of editors were feeling bad about the new version.
I had time to look into it and the SQL produced for selecting the articles (at least in my case), with the inner joins takes 3s on a rly big site.
TLDR go into debug mode to the articles view and look at the sql query, then see how long it takes or do an explain, it may be only in our systems with some error while migrating or some indexes missing. MariaDB latest version

I fixed it by changing two inner joins to left joins and forcing an index of the __workflow_associations table. Without the forcing of the index, it does some weird shit ->
idx_extension,202,const,21516,Using index condition; Using temporary; Using filesort

My final querying looks like that:

->join('LEFT', $db->quoteName('#__languages', 'l'), $db->quoteName('l.lang_code') . ' = ' . $db->quoteName('a.language'))
->join('LEFT', $db->quoteName('#__content_frontpage', 'fp'), $db->quoteName('fp.content_id') . ' = ' . $db->quoteName('a.id'))
->join('LEFT', $db->quoteName('#__users', 'uc'), $db->quoteName('uc.id') . ' = ' . $db->quoteName('a.checked_out'))
->join('LEFT', $db->quoteName('#__viewlevels', 'ag'), $db->quoteName('ag.id') . ' = ' . $db->quoteName('a.access'))
->join('LEFT', $db->quoteName('#__categories', 'c'), $db->quoteName('c.id') . ' = ' . $db->quoteName('a.catid'))
->join('LEFT', $db->quoteName('#__categories', 'parent'), $db->quoteName('parent.id') . ' = ' . $db->quoteName('c.parent_id'))
->join('LEFT', $db->quoteName('#__users', 'ua'), $db->quoteName('ua.id') . ' = ' . $db->quoteName('a.created_by'))
->join('INNER', $db->quoteName('#__workflow_associations', 'wa') . ' FORCE INDEX (PRIMARY)', $db->quoteName('wa.item_id') . ' = ' . $db->quoteName('a.id'))
->join('LEFT', $db->quoteName('#__workflow_stages', 'ws'), $db->quoteName('ws.id') . ' = ' . $db->quoteName('wa.stage_id'))
->join('LEFT', $db->quoteName('#__workflows', 'w'), $db->quoteName('w.id') . ' = ' . $db->quoteName('ws.workflow_id'));

(copy from forum post)
INNER JOIN leads to a temporary table with 22k entries for our smallest portal.

With INNER JOIN 500ms, LEFT JOIN its 1.5ms. In bigger portals that up to 5s and makes it quite unusable. Not sure if the DB is missing an index, i tried to optimize, but for now ill change that behaviour

EXPLAIN SELECT a.id,a.asset_id,a.title,a.alias,a.checked_out,a.checked_out_time,a.catid,a.state,a.access,a.created,a.created_by,a.created_by_alias,a.modified,a.ordering,a.featured,a.language,a.hits,a.publish_up,a.publish_down,a.introtext,a.fulltext,a.note,a.images,a.metakey,a.metadesc,a.metadata,a.version,fp.featured_up,fp.featured_down,l.title AS language_title,l.image AS language_image,uc.name AS editor,ag.title AS access_level,c.title AS category_title,c.created_user_id AS category_uid,c.level AS category_level,c.published AS category_published,parent.title AS parent_category_title,parent.id AS parent_category_id,parent.created_user_id AS parent_category_uid,parent.level AS parent_category_level,ua.name AS author_name,wa.stage_id AS stage_id,ws.title AS stage_title,ws.workflow_id AS workflow_id,w.title AS workflow_title,COALESCE(NULLIF(ROUND(v.rating_sum / v.rating_count, 0), 0), 0) AS rating,COALESCE(NULLIF(v.rating_count, 0), 0) AS rating_count
FROM hycwr_content AS a
LEFT JOIN hycwr_languages AS l ON l.lang_code = a.language
LEFT JOIN hycwr_content_frontpage AS fp ON fp.content_id = a.id
LEFT JOIN hycwr_users AS uc ON uc.id = a.checked_out
LEFT JOIN hycwr_viewlevels AS ag ON ag.id = a.access
LEFT JOIN hycwr_categories AS c ON c.id = a.catid
LEFT JOIN hycwr_categories AS parent ON parent.id = c.parent_id
LEFT JOIN hycwr_users AS ua ON ua.id = a.created_by
INNER JOIN hycwr_workflow_associations AS wa ON wa.item_id = a.id
INNER JOIN hycwr_workflow_stages AS ws ON ws.id = wa.stage_id
INNER JOIN hycwr_workflows AS w ON w.id = ws.workflow_id
LEFT JOIN hycwr_content_rating AS v ON a.id = v.content_id
WHERE wa.extension = 'com_content.article' AND a.state IN (1,0)
ORDER BY a.id desc LIMIT 20;

EXPLAIN SELECT a.id,a.asset_id,a.title,a.alias,a.checked_out,a.checked_out_time,a.catid,a.state,a.access,a.created,a.created_by,a.created_by_alias,a.modified,a.ordering,a.featured,a.language,a.hits,a.publish_up,a.publish_down,a.introtext,a.fulltext,a.note,a.images,a.metakey,a.metadesc,a.metadata,a.version,fp.featured_up,fp.featured_down,l.title AS language_title,l.image AS language_image,uc.name AS editor,ag.title AS access_level,c.title AS category_title,c.created_user_id AS category_uid,c.level AS category_level,c.published AS category_published,parent.title AS parent_category_title,parent.id AS parent_category_id,parent.created_user_id AS parent_category_uid,parent.level AS parent_category_level,ua.name AS author_name,COALESCE(NULLIF(ROUND(v.rating_sum / v.rating_count, 0), 0), 0) AS rating,COALESCE(NULLIF(v.rating_count, 0), 0) AS rating_count
FROM hycwr_content AS a
LEFT JOIN hycwr_languages AS l ON l.lang_code = a.language
LEFT JOIN hycwr_content_frontpage AS fp ON fp.content_id = a.id
LEFT JOIN hycwr_users AS uc ON uc.id = a.checked_out
LEFT JOIN hycwr_viewlevels AS ag ON ag.id = a.access
LEFT JOIN hycwr_categories AS c ON c.id = a.catid
LEFT JOIN hycwr_categories AS parent ON parent.id = c.parent_id
LEFT JOIN hycwr_users AS ua ON ua.id = a.created_by
LEFT JOIN hycwr_content_rating AS v ON a.id = v.content_id
WHERE (a.state = 0 OR a.state = 1)
ORDER BY a.id desc LIMIT 20;

EXPLAIN SELECT a.id, a.title, a.alias, a.checked_out, a.checked_out_time, a.catid, a.state, a.access, a.created, a.created_by, a.created_by_alias, a.modified, a.ordering, a.featured, a.language, a.hits, a.publish_up, a.publish_down, a.note,l.title AS language_title, l.image AS language_image,uc.name AS editor,ag.title AS access_level,c.title AS category_title, c.created_user_id AS category_uid, c.level AS category_level,parent.title AS parent_category_title, parent.id AS parent_category_id,
parent.created_user_id AS parent_category_uid, parent.level AS parent_category_level,ua.name AS author_name,COALESCE(NULLIF(ROUND(v.rating_sum / v.rating_count, 0), 0), 0) AS rating,
COALESCE(NULLIF(v.rating_count, 0), 0) as rating_count

FROM hycwr_content AS a

LEFT JOIN hycwr_languages AS l
ON l.lang_code = a.language

LEFT JOIN hycwr_users AS uc
ON uc.id=a.checked_out

LEFT JOIN hycwr_viewlevels AS ag
ON ag.id = a.access

LEFT JOIN hycwr_categories AS c
ON c.id = a.catid

LEFT JOIN hycwr_categories AS parent
ON parent.id = c.parent_id

LEFT JOIN hycwr_users AS ua
ON ua.id = a.created_by

LEFT JOIN hycwr_content_rating AS v
ON a.id = v.content_id

WHERE (a.state = 0 OR a.state = 1)

ORDER BY a.id desc
LIMIT 20;
Top
Smike
Joomla! Apprentice
Joomla! Apprentice
Posts: 10
Joined: Wed Mar 06, 2019 3:45 pm
Re: We have a news backend with few thousand articles, the workflow leads to slow loading times...
Post by Smike » Tue Jun 25, 2024 3:31 pm

Also in the live system the index primary needed to be enforced, or it did use also a slow query. im not sure why though, it had the same db.

->join('LEFT', $db->quoteName('#__workflow_associations', 'wa'), $db->quoteName('wa.item_id') . ' = ' . $db->quoteName('a.id'))

LEFT JOIN hycwr_workflow_associations AS wa FORCE INDEX (PRIMARY) ON wa.item_id = a.id

System information (as much as possible)

Debian.

Additional comments

@richard67
Copy link
Member

Not sure if we can use a „FORCE INDEX“ clause in the join in our code which produces the SQL, as it also has to work with PostgreSQL, and the database framework classes as far as I know don’t provide an API method for that, so we would have to use plain SQL for that clause.

@richard67
Copy link
Member

Regarding the joins of workflow tables this PR might help and solve parts of this issue: #40176

@carlitorweb
Copy link
Member

carlitorweb commented Jun 25, 2024

About performance when you list all the articles, that query select the fulltext of each article and that increase a lot the load time.
A workaround I found when the site is big, is use somehow the archive state, in this way at leats the query skip a good amount of articles who do not need be listed

@SmikeSix2
Copy link
Author

Not sure if we can use a „FORCE INDEX“ clause in the join in our code which produces the SQL, as it also has to work with PostgreSQL, and the database framework classes as far as I know don’t provide an API method for that, so we would have to use plain SQL for that clause.

yeah its odd, I actually first changed them to left joins, which worked locally.
on the servers, for some reason, it used some other __workflow_associations index and created a temporarily sorted 22k table.
so in both cases, LEFT or INNER join there were some performance issues. the #40176 probably would fix this

(all LEFT JOINS)
It takes workflow_associations as first index
LEFT JOIN hycwr_workflow_associations AS wa ON wa.item_id = a.id

1,SIMPLE,wa,ref,"PRIMARY,idx_item_stage_extension,idx_item_id,idx_extension",idx_extension,202,const,21516,Using index condition; Using temporary; Using filesort

VS
LEFT JOIN hycwr_workflow_associations AS wa FORCE INDEX (PRIMARY) ON wa.item_id = a.id

1,SIMPLE,a,index,"PRIMARY,idx_state,idx_id_desc,idx_content_main,idx_hycwr_content_state,idx_state_id",PRIMARY,4,,39,Using where

Oh yeah it uses that index because the WHERE has:

wa.extension = 'com_content.article' ...

(we are on joomla 4 and have 16 portals with 1million+ articles, i just upgraded them to 4 from 3, i just cant do that agian)

@SmikeSix2
Copy link
Author

SmikeSix2 commented Jun 26, 2024

Also in the bigger Websites you make a COUNT(*) on all articles, with all JOINS. Thats really not good, takes 2s on a large website.

I worked myself around and just got rid of workflows conditions and i added a $info to DatabaseQuery and in BaseDatabaseModel i added:

if($query->info == "articles_com_content"){ $query->clear('select')->clear('order')->clear('limit')->clear('offset')->clear('join')->select('COUNT(*)'); }else{ $query->clear('select')->clear('order')->clear('limit')->clear('offset')->select('COUNT(*)'); }
Very hackish, but it just needs to work.

You should extend the DatabaseQuery and look if it has an optimized COUNT(*) for the paginations.

SELECT COUNT(*)
FROM hycwr_content AS a
WHERE a.state IN (0,1)

2.4ms
vs
2s
SELECT COUNT(*)
FROM kog80_content AS a
LEFT JOIN kog80_languages AS l ON l.lang_code = a.language
LEFT JOIN kog80_content_frontpage AS fp ON fp.content_id = a.id
LEFT JOIN kog80_users AS uc ON uc.id = a.checked_out
LEFT JOIN kog80_viewlevels AS ag ON ag.id = a.access
LEFT JOIN kog80_categories AS c ON c.id = a.catid
LEFT JOIN kog80_categories AS parent ON parent.id = c.parent_id
LEFT JOIN kog80_users AS ua ON ua.id = a.created_by
LEFT JOIN kog80_workflow_associations AS wa FORCE INDEX (PRIMARY) ON wa.item_id = a.id
LEFT JOIN kog80_workflow_stages AS ws ON ws.id = wa.stage_id
LEFT JOIN kog80_workflows AS w ON w.id = ws.workflow_id
LEFT JOIN kog80_content_rating AS v ON a.id = v.content_id
WHERE a.state IN (0,1)

Edit: Sorry for the bad English and unformatted text. I was in a rush.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants