0. Survey and Tutorial (12)
1. Database Configuration
2. Query Optimization
3. Workload Scheduling (2)
4. Database Design
5. Database Monitoring (9)
6. Database Diagnosis
- 6.1 System Diagnosis (3)
- 6.2 Query Diagnosis (1)
7. Training Data Generation
- 7.1 Query Generation (2)
- 7.2 Data Generation (3)
8. AI Techniques
- 8.1 Feature Encoding (6)
- 8.2 Model Transfer (1)
9. Database Frameworks (14)
10. Demonstrations
11. Talks

0. Survey and Tutorial

[Survey | AIDB] Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun. Database Meets Artificial Intelligence: A Survey. TKDE, 2020. [paper]

[Survey | ML4DB] Wei Wang, Meihui Zhang, Gang Chen, et al. Database meets deep learning: Challenges and opportunities. SIGMOD Record, 2016. [paper]

[Survey | RL4DB] Qingpeng Cai, Can Cui, Yiyuan Xiong, et al. A Survey on Deep Reinforcement Learning for Data Processing and Analytics. arXiv, 2021. [paper]

[Tutorial | AI4DB] Stratos Idreos, Tim Kraska. From auto-tuning one size fits all to self-designed and learned data-intensive systems. SIGMOD, 2019. [paper]

[Tutorial | AI4DB] Guoliang Li, Xuanhe Zhou, Lei Cao. AI Meets Database: AI4DB and DB4AI. SIGMOD 2021. [paper][slides]

[Tutorial | AI4DB] Guoliang Li, Xuanhe Zhou, Lei Cao. Machine Learning for Databases. VLDB 2021. [paper][slides]

[Tutorial | AI4Tuning] Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu. Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems. VLDB, 2019. [paper][slides]

[Tutorial | AI4CloudDB] Alekh Jindal, Matteo Interlandi. Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward. VLDB, 2021. [paper]

[Tutorial | AI4Tuning] Zhengtong Yan, Jiaheng Lu, Naresh Chainani, Chunbin Lin. Workload-Aware Performance Tuning for Autonomous DBMSs. ICDE, 2021. [paper]

[Tutorial | AI4DBCluster] Brad Glasbergen, Michael Abebe, Khuzaima Daudjee. Tutorial: Adaptive Replication and Partitioning in Data Systems. Middleware, 2018. [paper]

[Tutorial | LearnedIndex] Abdullah Al-Mamun, Hao Wu, Walid G. Aref. A Tutorial on Learned Multi-dimensional Indexes. SIGSPATIAL, 2020. [paper]

[Tutorial | NLP4DB] Immanuel Trummer. From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management. VLDB, 2022. [paper]

1. Database Configuration

Knob Tuner

Heuristic

[Rule-based] PGTune: https://pgtune.leopard.in.ua.

[Search-based] OpenTuner: An Extensible Framework for Program Autotuning (PACT, 2014) [paper]

[Search-based] BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning (SoCC, 2017) [paper]

BO-based

[Gaussian Process] Tuning Database Configuration Parameters with iTuned. (VLDB, 2009) [paper]

[Gaussian Process] Automatic database management system tuning through large-scale machine learning. (SIGMOD, 2017) [paper]

[Gaussian Process, Featurization] Black or White? How to Develop an AutoTuner for Memory-based Analytics (SIGMOD, 2020) [paper]

[Gaussian Process, Model Transferring] ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases (SIGMOD, 2021) [paper]

[Contextual Gaussian Process] CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions (VLDB, 2021) [paper]

[Bounded Gaussian Process] Towards Dynamic and Safe Configuration Tuning for Cloud Databases (SIGMOD, 2022) [paper]

[Gaussian Process] LlamaTune: Sample-Efficient DBMS Configuration Tuning (VLDB, 2022) [paper]

DL-based

[DL] iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases (VLDB, 2019) [paper]

RL-based

[RL] An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning (SIGMOD, 2019) [paper]

[RL, Query Encoding] QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning (VLDB, 2019) [paper]

[Light-weight RL] Universal Database Optimization using Reinforcement Learning (VLDB, 2021) [paper]

[RL, Pre-trained model] WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning. (JCST, 2021) [paper]

[RL, NLP model] The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual" (VLDB, 2021) [paper]

[RL, NLP model] DB-BERT: a Database Tuning Tool that “Reads the Manual” (SIGMOD, 2022) [paper]

[RL, Genetic algorithm] HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements (SIGMOD, 2022) [paper]

Experiments

An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems (VLDB, 2021) [paper]

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation (VLDB, 2021) [paper]

Knob Selection

SARD: A statistical approach for ranking database tuning parameters (ICDEW, 2008) [paper]

Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs (HotStorage 2020) [paper]

Peer-reviewed papers and codes at https://github.com/evolveDB/tuning-survey/blob/main/README.md

View Advisor

A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800–812, 2018. [paper]

Ahmed, R., Bello, R., Witkowski, A., & Kumar, P. (2020). Automated generation of materialized views in Oracle. VLDB, 2020. [paper]

Yuan, H., Sun, J., & Li, G. (2020). Automatic View Generation for Equivalent Subqueries with Deep Learning and Reinforcement Learning. ICDE, 2020. [paper]

Han, Y., Li, G., Yuan, H., & Sun, J. (2021). An Autonomous Materialized View Management System with Deep Reinforcement Learning. ICDE, 2021. [paper]

Yue Han, Chengliang Chai, Jiabin Liu, Guoliang Li, Chuangxian Wei, Chaoqun Zhan. Dynamic Materialized View Management using Graph Neural Network. ICDE 2023. [paper]

Index Advisor

[Experimental Evaluation] Jan Kossmann, Stefan Halfpap, Marcel Jankrift, Rainer Schlosser: Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proc. VLDB Endow. 13(11): 2382-2395 (2020) [paper]

[Heuristic-based, AutoAdmin] Surajit Chaudhuri, Vivek R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. VLDB 1997: 146-155 [paper]

[Heuristic-based, DB2Advis] Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, Alan Skelley: DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. ICDE 2000: 101-110 [paper]

[Heuristic-based, Relaxation] Nicolas Bruno, Surajit Chaudhuri: Automatic Physical Database Tuning: A Relaxation-based Approach. SIGMOD Conference 2005: 227-238 [paper]

[Heuristic-based, COLT] Karl Schnaitter, Serge Abiteboul, Tova Milo, Neoklis Polyzotis: On-Line Index Selection for Shifting Workloads. ICDE Workshops 2007: 459-468 [paper]

[Heuristic-based, Extend] Rainer Schlosser, Jan Kossmann, Martin Boissier: Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies. ICDE 2019: 1238-1249 [paper]

[Learning-based, DQN] Hai Lan, Zhifeng Bao, Yuwei Peng: An Index Advisor Using Deep Reinforcement Learning. CIKM 2020: 2105-2108 [paper]

[Learning-based, DQN] Zahra Sadri, Le Gruenwald, Eleazar Leal: Online Index Selection Using Deep Reinforcement Learning for a Cluster Database. ICDE Workshops 2020: 158-161 [paper]

[Learning-based, DQN] Gabriel Paludo Licks, Júlia Mara Colleoni Couto, Priscilla de Fátima Miehe, Renata De Paris, Duncan Dubugras A. Ruiz, Felipe Meneguzzi: SmartIX: A Database Indexing Agent based on Reinforcement Learning. Appl. Intell. 50(8): 2575-2588 (2020) [paper]

[Learning-based, DQN] Vishal Sharma, Curtis E. Dyreson, Nicholas Flann: MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning. IDEAS 2021: 56-64 [paper]

[Learning-based, DQN] Yu Yan, Shun Yao, Hongzhi Wang, Meng Gao: Index selection for NoSQL database with deep reinforcement learning. Inf. Sci. 561: 20-30 (2021) [paper]

[Learning-based, DQN] Vishal Sharma, Curtis E. Dyreson: Indexer++: Workload-aware Online Index Tuning with Transformers and Reinforcement Learning. SAC 2022: 372-380 [paper]

[Learning-based, MAB] R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic: DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. ICDE 2021: 600-611 [paper]

[Learning-based, MAB] R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic: HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning. Proc. VLDB Endow. 16(2): 216-229 (2022) [paper]

[Learning-based, MCTS] Xuanhe Zhou, Luyang Liu, Wenbo Li, Lianyuan Jin, Shifu Li, Tianqing Wang, Jianhua Feng: AutoIndex: An Incremental Index Management System for Dynamic Workloads. ICDE 2022: 2196-2208 [paper]

[Learning-based, MCTS] Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek R. Narasayya, Surajit Chaudhuri, Philip A. Bernstein: Budget-aware Index Tuning with Reinforcement Learning. SIGMOD Conference 2022: 1528-1541 [paper]

[Optimization, Learned Cost] Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, Vivek R. Narasayya: AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. SIGMOD Conference 2019: 1241-1258 [paper]

[Optimization, Learned Cost] Jianling Gao, Nan Zhao, Ning Wang, Shuang Hao: SmartIndex: An Index Advisor with Learned Cost Estimator. CIKM 2022: 4853-4856 [paper]

[Optimization, Workload Summarization] Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek R. Narasayya, Surajit Chaudhuri: ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. SIGMOD Conference 2022: 660-673 [paper]

[Optimization, What-if Call] Tarique Siddiqui, Wentao Wu, Vivek R. Narasayya, Surajit Chaudhuri: DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning. Proc. VLDB Endow. 15(10): 2019-2031 (2022) [paper]

Partition Advisor

[horizontal, DRL] Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. Learning a Partitioning Advisor for Cloud Databases. SIGMOD, 2020. [paper]

[horizontal, DRL] Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. Towards learning a partitioning advisor with deep reinforcement learning. aiDM@SIGMOD, 2019. [paper]

[horizontal, HybridAlgorithms] Panos Parchas, Yonatan Naamad, Peter Van Bouwel, et al. Fast and effective distribution-key recommendation for amazon redshift. PVLDB, 2020. [paper]

[horizontal, DataSkip] Martin Boissier, Daniel Kurzynski. Workload-driven horizontal partitioning and pruning for large HTAP systems. ICDE Workshop, 2018. [paper]

[horizontal, GraphPartition] Carlo Curino, Yang Zhang, Evan P. C. Jones, Samuel Madden. Schism: a Workload-Driven Approach to Database Replication and Partitioning. PVLDB, 2010. [paper]

[horizontal, Heuristic] Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman. Automating physical database design in a parallel database. SIGMOD, 2002. [paper]

[vertical, DRL] Campero Durand G, Piriyev R, Pinnecke M, et al. Automated vertical partitioning with deep reinforcement learning. ADBIS, 2019. [paper]

[co-partition] Zamanian, E., Binnig, C., & Salama, A. (2015). Locality-aware partitioning in parallel database systems. SIGMOD. [paper]

[co-partition] Rabl, T., & Jacobsen, H. A. (2017). Query centric partitioning and allocation for partially replicated database systems. SIGMOD. [paper]

[situ] Olma, M., Karpathiotakis, M., Alagiannis, I., Athanassoulis, M., & Ailamaki, A. (2020). Adaptive partitioning and indexing for in situ query processing. VLDB Journal. [paper]

2. Query Optimization

Query Rewriter

(Note: other interesting problems, such as text2SQL, are outside the scope of this list.)

Traditional

[rewrite rules] Béatrice Finance, Georges Gardarin. A Rule-Based Query Rewriter in an Extensible DBMS. ICDE 1991. [paper]

[rewrite rules] Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan. Extensible/Rule Based Query Rewrite Optimization in Starburst. SIGMOD Conference 1992. [paper]

[cost/heuristic rewrite] Rafi Ahmed, Allison W. Lee, Andrew Witkowski, et al. Cost-Based Query Transformation in Oracle. VLDB 2006: 1026-1036. [paper]

[heuristic rewrite] De Araújo, A. H. M., Monteiro, J. M., Antônio, J., De Macêdo, F., Tavares, J. A., Brayner, A., & Lifschitz, S. (2014). ARe-SQL: An Online, Automatic and Non-Intrusive Approach for Rewriting SQL Queries. JIDM, 2014. [paper]

[equivalence] Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu. HoTTSQL: proving query rewrites with univalent SQL semantics. PLDI 2017: 510-524. [paper]

[optimization engine] Begoli, E., Camacho-Rodríguez, J., Hyde, J., Mior, M. J., & Lemire, D. (2018). Apache Calcite: A foundational framework for optimized query processing over heterogeneous data sources. SIGMOD, 2018. [paper]

[map-reduce] Partho Sarthi, Kaushik Rajan, Akash Lal, Abhishek Modi, et al. Generalized Sub-Query Fusion for Eliminating Redundant I/O from Big-Data Queries. OSDI 2020: 209-224. [paper]

[streaming] Wentao Wu, Philip A. Bernstein, Alex Raizman, Christina Pavlopoulou. Cost-based Query Rewriting Techniques for Optimizing Aggregates Over Correlated Windows. CoRR abs/2008.12379 (2020) [paper]

[rewrite rules] Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, Jinyang Li. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. SIGMOD Conference 2022: 94-107. [paper]

Learning-based

[predicate rewrite] Qi Zhou, Joy Arulraj, Shamkant B. Navathe, William Harris, Jinpeng Wu. Sia: Optimizing Queries using Learned Predicates. SIGMOD, 2021. [paper]

[rewrite strategy] Xuanhe Zhou, Guoliang Li, Chengliang Chai, Jianhua Feng. A Learned Query Rewrite System using Monte Carlo Tree Search. VLDB, 2022. [paper]

Cost Estimation

[Card, Query-based] Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj K. Agarwal, Debmalya Panigrahi, Sudeepa Roy, Jun Yang. Selectivity Functions of Range Queries are Learnable. SIGMOD, 2022. [paper]

[Card, Query-based] Kipf A, Kipf T, Radke B, et al. Learned cardinalities: Estimating correlated joins with deep learning. CIDR, 2019. [paper]

[Card, Query-based] Woltmann L, Hartmann C, Thiele M, et al. Cardinality estimation with local deep learning models. aiDM, 2019. [paper]

[Card, Query-based] Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 4(11): 852-863, 2011. [paper]

[Card, Query-based] Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., & Chaudhuri, S. (2018). Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment, 12(9), 1044–1057, 2018. [paper]

[Card, Query-based] Hayek, R., & Shmueli, O. (2020). NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT. arXiv, 2020. [paper]

[Card, Query-based, Adaptability] Beibin Li, Yao Lu, Srikanth Kandula: Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts. SIGMOD Conference 2022: 1920-1933 [paper]

[Card, Data-based] Lu Y, Kandula S, König A C, et al. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 2021. [paper]

[Card, Data-based] Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., … Stoica, I. (2019). Deep Unsupervised Cardinality Estimation. VLDB, 2019. [paper]

[Card, Data-based] Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. (2020). NeuroCard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 14(1), 61–73, 2020. [paper]

[Card, Data-based] Zhu R, Wu Z, Han Y, et al. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. arXiv preprint arXiv:2011.09022, 2020. [paper]

[Card, Data-based] Wu Z, Shaikhha A, Zhu R, et al. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv preprint arXiv: 2012.14743, 2020. [paper]

[Card, Data-based] Leis, V., Radke, B., Gubichev, A., Kemper, A., & Neumann, T. (2017). Cardinality estimation done right: Index-based join sampling. CIDR, 2017. [paper]

[Card, Data-based] Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! Proceedings of the VLDB Endowment, 13(7), 992–1005, 2020. [paper]

[Card, Data-based] Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., … Cui, B. (2020). FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. VLDB, 2021. [paper]

[Card, Data-based] Hasan S, Thirumuruganathan S, Augustine J, et al. Deep learning models for selectivity estimation of multi-attribute queries. SIGMOD, 2020. [paper]

[Card, Data-based] Heimel M, Kiefer M, Markl V. Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. Proceedings of the ACM SIGMOD, 2015. [paper]

[Card, Data-based] Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. QuickSel: Quick selectivity learning with mixture models. SIGMOD 2020. [paper]

[Card, Data-based] Jiayi Wang, Chengliang Chai, Jiabin Liu, Guoliang Li. FACE: A Normalizing Flow based Cardinality Estimator. VLDB 2022. [paper]

[Card, Data-based] Yao Lu, Srikanth Kandula, Arnd Christian König, Surajit Chaudhuri. Pre-training summarization models of structured datasets for cardinality estimation. VLDB 2022. [paper]

[Card, Query&Data-based] Wu P, Cong G. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. Proceedings of the 2021 International Conference on Management of Data (SIGMOD), 2021: 2009-2022. [paper]

[Card] Parimarjan Negi, Ryan C. Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow. 14(11): 2019-2032, 2021. [paper]

[Cost] Marcus, R., & Papaemmanouil, O. (2019). Plan-Structured Deep Neural Network Models for Query Performance Prediction. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [paper]

[Cost] Sun, J., & Li, G. An End-to-End Learning-based Cost Estimator. VLDB, 2020. [paper]

[Cost] Benjamin Hilprecht, Carsten Binnig. Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction. VLDB, 2022. [paper]

[EA&B] Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are We Ready For Learned Cardinality Estimation? Proc. VLDB Endow. 14(9): 1640-1654 (2021). [paper]

[EA&B] Sun, J., Zhang, J., Sun, Z., Li, G., & Tang, N. Learned Cardinality Estimation: A Design Space Exploration and a Comparative Evaluation. VLDB, 2022. [paper]

[EA&B] Yuxing Han, Ziniu Wu, Peizhi Wu, et al. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB, 2022. [paper]

[EA&B] Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, Jaehyok Chong: Learned Cardinality Estimation: An In-depth Study. SIGMOD Conference 2022: 1214-1227 [paper]

[EA&B] Harmouch, H., & Naumann, F. (2017). Cardinality Estimation: An Experimental Survey. PVLDB, 11(4), 499–512, 2017. [paper]

Plan Optimization

[Parallel MCTS] Ziyun Wei, Immanuel Trummer. SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms. PVLDB, 2022. [paper]

[OptimizedRL] Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica. Balsa: Learning a Query Optimizer Without Expert Demonstrations. SIGMOD, 2022. [paper]

Jan Kossmann. Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization. CIDR, 2022 [paper]

Ron Avnur, Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD, 2000. [paper]

Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., … Tatbul, N. (2018). Neo: A Learned query optimizer. Proceedings of the VLDB Endowment, 12(11), 1705–1718, 2018. [paper]

Marcus, R., & Papaemmanouil, O. (2018). Deep reinforcement learning for join order enumeration. Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, AiDM 2018, 0–3. [paper]

Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2016). How Good Are Query Optimizers, Really? Proceedings of the VLDB Endowment, 9(3), 204–215. [paper]

Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., & Antonakakis, J. (2019). SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. SIGMOD, 2019. [paper]

Ding, M., Chen, S., & Manegold, S. (2021). Progressive Join Algorithms Considering User Preference. CIDR, 2021. [paper]

Yu, X., Li, G., Tang, N. (2020). Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE, 2020. [paper]

Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222, 2018. [paper]

Plan Hinter

Pasupuleti, K., Park, M., & Valluri, S. (n.d.). SQL Plan Observability through Hints in Oracle Autonomous Database.

Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2020). Bao: Making Learned Query Optimization Practical. SIGMOD, 2021. [paper]

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal. Steering Query Optimizers: A Practical Take on Big Data Workloads. SIGMOD, 2021. [paper]

3. Workload Scheduling

Ibrahim Sabek, Tenzin Samten Ukyab, Tim Kraska. LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems. SIGMOD, 2022. [paper]

Chi Zhang, Ryan Marcus, et al. Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. VLDB, 2020. [paper]

4. Database Design

Index

One-dimensional Index

[1-D, Immutable] Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). The case for learned index structures. SIGMOD, 2018. [paper] [code]

[1-D, Mutable] Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., & Kraska, T. (2019). FITing-Tree: A data-aware index structure. SIGMOD, 2019. [paper]

[1-D, Mutable, Secondary] Wu, Y., Yu, J., Tian, Y., Sidle, R., Barber, R. (2019). Designing succinct secondary indexing mechanism by exploiting column correlations. SIGMOD 2019. [paper]

[1-D, Mutable] Ferragina, P., & Vinciguerra, G. (2020). The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB, 2020. [paper]

[1-D, Mutable] Ding, J., Minhas, U. F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D., & Kraska, T. (2020). ALEX: An Updatable Adaptive Learned Index. SIGMOD, 2020. [paper] [code]

[1-D, Mutable, Persistent] Lu, B., Ding, J., Lo, E., Minhas, U. F., & Wang, T. (2021). APEX: A High-Performance Learned Index on Persistent Memory. VLDB, 2021. [paper]

[1-D, Immutable, Auto-generated] Dittrich, J., Nix, J., & Schön, C. (2021). The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures. VLDB, 2021. [paper] [code]

[1-D, Mutable, Concurrency] Li, P., Hua, Y., Jia, J., Zuo, P. (2021). FINEdex: A Fine-grained Learned Index Scheme for Scalable and Concurrent Memory Systems. VLDB, 2021. [paper]

[1-D, Mutable] Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C. (2021). Updatable learned index with precise positions. VLDB, 2021. [paper]

[1-D, Mutable] Ma, C., Yu, X., Li, Y., Meng, X., & Maoliniyazi, A. (2022). FILM: A Fully Learned Index for Larger-Than-Memory Databases. VLDB, 2022. [paper]

[1-D, Mutable, Concurrency] Wang, Z., Chen, H., Wang, Y., & Tang, C. (2022). The Concurrent Learned Indexes for Multicore Data Storage. ACM Transactions on Storage, 18(1), 1-35. [paper] [code]

[1-D, Mutable] Jiaoyi Zhang, Yihan Gao. (2022). CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm. VLDB, 2022. [paper]

[1-D, Mutable] Shangyu Wu. (2022). NFL: Robust Learned Index via Distribution Transformation. VLDB, 2022. [paper]

[1-D, Mutable, Persistent] Zhang, Z., Chu, Z., Jin, P., Luo, Y., Xie, X., Wan, S., Luo, Y., Wu, X., Zou, P., Zheng, C., Wu, G., Rudoff. A. (2022). PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery. VLDB, 2022. [paper]

Multi-dimensional Index

[Multi-D, Immutable] Nathan, V., Ding, J., Alizadeh, M., & Kraska, T. (2020). Learning multi-dimensional indexes. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Li, P., Lu, H., Zheng, Q., Yang, L., & Pan, G. (2020). LISA: A Learned Index Structure for Spatial Data. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Qi, J., Liu, G., Jensen, C.S., Kulik, L. (2020). Effectively learning spatial indices. VLDB, 2020. [paper]

[Multi-D, Immutable] Ding, J., Nathan, V., Alizadeh, M., & Kraska, T. (2020). Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB, 2020. [paper]

[Multi-D, Mutable] Dong, H., Chai, C., Luo, Y., Liu, J., Feng, J., Zhan, C. (2022). RW-Tree: A Learned Workload-aware Framework for R-tree Construction. ICDE, 2022. [paper]

Experiment and Analysis

[1-D, Immutable, Analysis] Ferragina, P., Lillo, F., & Vinciguerra, G. (2020). Why are learned indexes so effective?. ICML, 2020. [paper]

[1-D, Immutable, Experiment] Marcus, R., Stoian, M., Kipf, A., Misra, S., van Renen, A., Kemper, A., Neumann, T., & Kraska, T. (2020). Benchmarking learned indexes. VLDB, 2020. [paper] [code]

[1-D, Poisoning Attack] Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia. (2022). The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures. SIGMOD, 2022. [paper]

[1-D, Mutable, Experiment] Wongkham, C., Lu, B., Liu, C., Zhong, Z., Lo, E., Wang, T. (2022). Are Updatable Learned Indexes Ready?. VLDB, 2022. [paper]

[1-D, Immutable, Experiment] Maltry, M., Dittrich, J. (2022). A critical analysis of recursive model indexes. VLDB, 2022. [paper]

[1-D, Hash Index, Experiment] Sabek, I., Vaidya, K., Horn, D., Kipf, A., Mitzenmacher, M., Kraska, T. (2022). Can Learned Models Replace Hash Functions?. VLDB, 2022. [paper]

Layout

[Learned Layout] Liwen Sun, Michael J. Franklin, Sanjay Krishnan, et al. Fine-grained partitioning for aggressive data skipping. SIGMOD, 2014. [paper]

[Learned Layout] Yang, Z., Chandramouli, B., Wang, C., Gehrke, J., Li, Y., Minhas, U. F., … Acharya, R. (2020). Qd-tree: Learning Data Layouts for Big Data Analytics. SIGMOD, 2020. [paper]

[Learned Layout] Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, et al. Instance-Optimized Data Layouts for Cloud Analytics Workloads. SIGMOD, 2021. [paper]

[Learned Layout] Bandle, M., Giceva, J., & Neumann, T. (2021). To Partition, or Not to Partition, That is the Join Question in a Real System. SIGMOD, 2021. [paper]

[Data Container] Madden S, Ding J, Kraska T, Sudhir S, Cohen D, Mattson T, Tatbul N. Self-Organizing Data Containers. CIDR, 2022. [paper]

[Learned Layout] Teng Zhang, Jian Tan, Xin Cai, Jianying Wang, Feifei Li, Jianling Sun. SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis. VLDB, 2022. [paper]

[Learned Layout] Michael Abebe. Tiresias: Enabling Predictive Autonomous Storage and Indexing. VLDB, 2022. [paper]

Query Execution

[CodeGen] Immanuel Trummer. CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex. VLDB, 2022. [paper]

Zhang, C., Marcus, R., Kleiman, A., & Papaemmanouil, O. (2020). Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. AIDB@VLDB, 2020. [paper]

5. Database Monitoring

[Trend Prediction] L. Ma, D. V. Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based Workload Forecasting for Self-driving Database Management Systems,” in SIGMOD, 2018. [paper]

[Performance Prediction] Dorn, J., Apel, S., & Siegmund, N. (n.d.). Mastering Uncertainty in Performance Estimations of Configurable Software Systems. (3).

[Performance Prediction] Marcus, R., & Papaemmanouil, O. (2019). Plan-structured deep neural network models for query performance prediction. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [paper]

[Performance Prediction] Wu, W., Chi, Y., Hacigümüş, H., & Naughton, J. F. (2013). Towards predicting query execution time for concurrent and dynamic database workloads. Proceedings of the VLDB Endowment, 6(10), 925–936. [paper]

[Performance Prediction] Duggan, J., Papaemmanouil, O., Cetintemel, U., & Upfal, E. (2014). Contender: A resource modeling approach for concurrent query performance prediction. Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings, 109–120. [paper]

[Performance Prediction] Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüş, H., & Naughton, J. F. (2013). Predicting query execution time: Are optimizer cost models really unusable? Proceedings - International Conference on Data Engineering, (1), 1081–1092. [paper]

[Performance Prediction] Higginson, A. S., Dediu, M., Arsene, O., Paton, N. W., & Embury, S. M. (2020). Database Workload Capacity Planning using Time Series Analysis and Machine Learning. Proceedings of the ACM SIGMOD International Conference on Management of Data, 769–783. [paper]

[Performance Prediction] Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., & Kossmann, D. (2009). Predictable performance for unpredictable workloads. Proceedings of the VLDB Endowment, 2(1), 706–717. [paper]

[Performance Prediction] Xuanhe Zhou, Ji Sun, Guoliang Li, Jianhua Feng. Query Performance Prediction for Concurrent Queries using Graph Embedding. [paper]

6. Database Diagnosis

System Diagnosis

Yoon, D. Y., Niu, N., & Mozafari, B. (2016). DBSherlock: A performance diagnostic tool for transactional databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1599–1614. [paper]

Kalmegh, P., Babu, S., & Roy, S. (2019). iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks. SIGMOD. [paper]

Ma, M., Yin, Z., Zhang, S., Wang, S., Zheng, C., & Jiang, X. (2020). Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases. PVLDB. [paper]

Query Diagnosis

Xiaoze Liu, Zheng Yin, Chao Zhao, et al. PinSQL: Pinpoint Root Cause SQLs to Resolve Performance Issues in Cloud Databases. ICDE 2022. [paper]

7. Training Data Generation

Query Generation

L. Zhang, C. Chai, X. Zhou, and G. Li. LearnedSQLGen: Constraint-aware SQL generation using reinforcement learning. SIGMOD, 2022. [paper]

Liu X, Kong X, Liu L, et al. TreeGAN: syntax-aware sequence generation with generative adversarial networks. In ICDM, 2018. [paper]

Data Generation

[DeepAR] Jingyi Yang, Peizhi Wu, Gao Cong, Tieying Zhang, Xiao He. SAM: Database Generation from Query Workloads with Supervised Autoregressive Models. SIGMOD, 2022. [paper]

Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl. Expand your training limits! Generating training data for ML-based data management. SIGMOD, 2021 [paper]

Ju Fan, Tongyu Liu, Guoliang Li, Yuwei Shen, Xiaoyong Du. Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration. VLDB 2020. [paper]

8. AI Techniques

Feature Encoding

[PlanEncoding] Yue Zhao, Gao Cong, Jiachen Shi, Chunyan Miao. QueryFormer: A Tree Transformer Model for Query Plan Representation. VLDB, 2022. [paper]

[Plan2Feature] Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar. Database Workload Characterization with Query Plan Encoders. VLDB, 2022. [paper]

[Pretrained Representation] Xiu Tang, Sai Wu, Mingli Song, Shanshan Ying, Feifei Li, Gang Chen: PreQR: Pre-training Representation for SQL Understanding. SIGMOD Conference 2022: 204-216 [paper]

[WorkloadAsGraph] Sanjay Agrawal, Eric Chu, Vivek R. Narasayya. Automatic physical design tuning: workload as a sequence. SIGMOD, 2006. [paper]

[DataSummary] Brit Youngmann et al. Guided Exploration of Data Summaries. VLDB, 2022. [paper]

Jiang H, Liu C, Paparrizos J, et al. Good to the Last Bit: Data-Driven Encoding with CodecDB. SIGMOD 2021. [paper]

Model Transfer

Meghdad Kurmanji, Peter Triantafillou. Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data. SIGMOD, 2023. [paper]

9. Database Frameworks

[MLTrain] Lim WS, Butrovich M, Zhang W, Crotty A, Ma L, Xu P, Gehrke J, Pavlo A. Database Gyms. CIDR, 2023. [paper]

[AcademicDB] Immanuel L Haffner, Jens Dittrich. mutable: A Modern DBMS for Research and Fast Prototyping. CIDR, 2023. [paper]

[ModelValid] Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, et al. PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!. VLDB, 2022. [paper]

[Transferable] Ziniu Wu, et al. A Unified Transferable Model for ML-Enhanced DBMS. CIDR, 2022. [paper]

[Transferable] Benjamin Hilprecht, Carsten Binnig. One Model to Rule them All: Towards Zero-Shot Learning for Databases. CIDR, 2022. [paper]

[AutoDB] Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., … Zhang, T. (2017). Self-Driving Database Management Systems. CIDR, 2017. [paper]

[AutoDB] Li, F. (2018). Cloud native database systems at Alibaba: Opportunities and challenges. Proceedings of the VLDB Endowment, 2018. [paper]

[AutoDB] Kraska, T., Alizadeh, M., Beutel, A., Chi, E. H., Ding, J., Kristo, A., … Nathan, V. (2019). SageDB: A learned database system. CIDR, 2019. [paper]

[AutoDB] Li, G., Zhou, X., Li, S. (2019). XuanYuan: An AI-Native Database. Data Eng., 2019. [paper]

[AutoDB] Hilprecht, B., Bang, T., El-Hindi, M., Hättasch, B., Khanna, A., Rehrmann, R., … Binnig, C. (2020). DBMS Fitting: Why should we learn what we already know? CIDR, 2020. [paper]

[AutoDB] Ma, L., Zhang, W., Jiao, J., Wang, W., Butrovich, M., Lim, W. S., … Pavlo, A. (2021). MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems. SIGMOD, 2021. [paper]

[AutoDB] Guoliang Li, Xuanhe Zhou, Ji Sun, Xiang Yu, Yue Han, Lianyuan Jin, Wenbo Li, Tianqing Wang, Shifu Li. openGauss: An Autonomous Database System. VLDB, 2021. [paper]

[NLP] James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Y. Levy. From Natural Language Processing to Neural Databases. VLDB, 2021. [paper]

[Embedding] Raasveldt, M.. MonetDBLite: An embedded analytical database. SIGMOD, 2018. [paper]

10. Demonstrations

[DB Tuning] Immanuel Trummer. Demonstrating DB-BERT: A Database Tuning Tool that "Reads" the Manual. SIGMOD, 2022. [paper]

[DB Tuning] Luming Sun, Tao Ji, Cuiping Li, Hong Chen. DeepO: A Learned Query Optimizer. SIGMOD, 2022. [paper]

[DB Tuning] Junxiong Wang, Immanuel Trummer, Debabrota Basu. Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning. SIGMOD, 2021. [paper]

[O&M Platform] Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. DBMind: A Self-Driving Platform in openGauss. VLDB, 2021. [paper] [website]

[DB Tuning] Bohan Zhang, Dana Van Aken, Justin Wang, Tao Dai, Shuli Jiang, Jacky Lao, Siyuan Sheng, Andrew Pavlo, Geoffrey J. Gordon. A Demonstration of the OtterTune Automatic Database Management System Tuning Service. VLDB, 2018. [paper]

11. Talks

[AutoDB] Andy Pavlo, Matthew Butrovich, Lin Ma, Prashanth Menon, Wan Shen Lim, Dana Van Aken, William Zhang. Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation. VLDB, 2021. [paper]

[AutoDB] Tim Kraska. Towards instance-optimized data systems. VLDB, 2021. [paper]

[AutoDB] Guoliang Li. AI-Native Database. VLDB, 2021. [slides]

Researches in Autonomous Databases

Table of Contents