Feng0w0/AIDB

 
 

Researches in Autonomous Databases

We continuously update this list of autonomous database research, based on our past tutorials.

Kindly let us know if we have missed any great papers. Thank you!

0. Survey and Tutorial

[Survey | AIDB] Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun. Database Meets Artificial Intelligence: A Survey. TKDE, 2020. [paper]

[Survey | ML4DB] Wei Wang, Meihui Zhang, Gang Chen, et al. Database meets deep learning: Challenges and opportunities. SIGMOD Record, 2016. [paper]

[Survey | RL4DB] Qingpeng Cai, Can Cui, Yiyuan Xiong, et al. A Survey on Deep Reinforcement Learning for Data Processing and Analytics. arXiv, 2021. [paper]

[Tutorial | AI4DB] Stratos Idreos, Tim Kraska. From auto-tuning one size fits all to self-designed and learned data-intensive systems. SIGMOD, 2019. [paper]

[Tutorial | AI4DB] Guoliang Li, Xuanhe Zhou, Lei Cao. AI Meets Database: AI4DB and DB4AI. SIGMOD 2021. [paper][slides]

[Tutorial | AI4DB] Guoliang Li, Xuanhe Zhou, Lei Cao. Machine Learning for Databases. VLDB 2021. [paper][slides]

[Tutorial | AI4Tuning] Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu. Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems, VLDB, 2019. [paper][slides]

[Tutorial | AI4CloudDB] Alekh Jindal, Matteo Interlandi. Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward. VLDB, 2021. [paper]

[Tutorial | AI4Tuning] Zhengtong Yan, Jiaheng Lu, Naresh Chainani, Chunbin Lin. Workload-Aware Performance Tuning for Autonomous DBMSs. ICDE, 2021. [paper]

[Tutorial | AI4DBCluster] Brad Glasbergen, Michael Abebe, Khuzaima Daudjee. Tutorial: Adaptive Replication and Partitioning in Data Systems. Middleware, 2018. [paper]

[Tutorial | LearnedIndex] Abdullah Al-Mamun, Hao Wu, Walid G. Aref. A Tutorial on Learned Multi-dimensional Indexes. SIGSPATIAL, 2020. [paper]

[Tutorial | NLP4DB] Immanuel Trummer. From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management. VLDB, 2022. [paper]

1. Database Configuration

Knob Tuner

Heuristic

[Rule-based] PGTune: https://pgtune.leopard.in.ua.

[Search-based] OpenTuner: An Extensible Framework for Program Autotuning (PACT, 2014) [paper]

[Search-based] BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning (SoCC, 2017) [paper]

BO-based

[Gaussian Process] Tuning Database Configuration Parameters with iTuned. (VLDB, 2009) [paper]

[Gaussian Process] Automatic database management system tuning through large-scale machine learning. (SIGMOD, 2017) [paper]

[Gaussian Process, Featurization] Black or White? How to Develop an AutoTuner for Memory-based Analytics (SIGMOD, 2020) [paper]

[Gaussian Process, Model Transferring] ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases (SIGMOD, 2021) [paper]

[Contextual Gaussian Process] CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions (VLDB, 2021) [paper]

[Bounded Gaussian Process] Towards Dynamic and Safe Configuration Tuning for Cloud Databases (SIGMOD, 2022) [paper]

[Gaussian Process] LlamaTune: Sample-Efficient DBMS Configuration Tuning (VLDB, 2022) [paper]

DL-based

[DL] iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases (VLDB, 2019) [paper]

RL-based

[RL] An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning (SIGMOD, 2019) [paper]

[RL, Query Encoding] QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning (VLDB, 2019) [paper]

[Light-weight RL] Universal Database Optimization using Reinforcement Learning (VLDB, 2021) [paper]

[RL, Pre-trained model] WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning (JCST, 2021) [paper]

[RL, NLP model] The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual" (VLDB, 2021) [paper]

[RL, NLP model] DB-BERT: a Database Tuning Tool that “Reads the Manual” (SIGMOD, 2022) [paper]

[RL, Genetic algorithm] HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements (SIGMOD, 2022) [paper]

Experiments

An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems (VLDB, 2021) [paper]

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation (VLDB, 2021) [paper]

Knob Selection

SARD: A statistical approach for ranking database tuning parameters (ICDEW, 2008) [paper]

Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs (HotStorage 2020) [paper]

Peer-reviewed papers and code are listed at https://github.com/evolveDB/tuning-survey/blob/main/README.md

View Advisor

A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800–812, 2018. [paper]

Ahmed, R., Bello, R., Witkowski, A., & Kumar, P. (2020). Automated generation of materialized views in Oracle. VLDB, 2020. [paper]

Yuan, H., Sun, J., & Li, G. (2020). Automatic View Generation for Equivalent Subqueries with Deep Learning and Reinforcement Learning. ICDE, 2020. [paper]

Han, Y., Li, G., Yuan, H., & Sun, J. An Autonomous Materialized View Management System with Deep Reinforcement Learning. ICDE, 2021. [paper]

Yue Han, Chengliang Chai, Jiabin Liu, Guoliang Li, Chuangxian Wei, Chaoqun Zhan. Dynamic Materialized View Management using Graph Neural Network. ICDE 2023. [paper]

Index Advisor

[Experimental Evaluation] Jan Kossmann, Stefan Halfpap, Marcel Jankrift, Rainer Schlosser: Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proc. VLDB Endow. 13(11): 2382-2395 (2020) [paper]

[Heuristic-based, AutoAdmin] Surajit Chaudhuri, Vivek R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. VLDB 1997: 146-155 [paper]

[Heuristic-based, DB2Advis] Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, Alan Skelley: DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. ICDE 2000: 101-110 [paper]

[Heuristic-based, Relaxation] Nicolas Bruno, Surajit Chaudhuri: Automatic Physical Database Tuning: A Relaxation-based Approach. SIGMOD Conference 2005: 227-238 [paper]

[Heuristic-based, COLT] Karl Schnaitter, Serge Abiteboul, Tova Milo, Neoklis Polyzotis: On-Line Index Selection for Shifting Workloads. ICDE Workshops 2007: 459-468 [paper]

[Heuristic-based, Extend] Rainer Schlosser, Jan Kossmann, Martin Boissier: Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies. ICDE 2019: 1238-1249 [paper]

[Learning-based, DQN] Hai Lan, Zhifeng Bao, Yuwei Peng: An Index Advisor Using Deep Reinforcement Learning. CIKM 2020: 2105-2108 [paper]

[Learning-based, DQN] Zahra Sadri, Le Gruenwald, Eleazar Leal: Online Index Selection Using Deep Reinforcement Learning for a Cluster Database. ICDE Workshops 2020: 158-161 [paper]

[Learning-based, DQN] Gabriel Paludo Licks, Júlia Mara Colleoni Couto, Priscilla de Fátima Miehe, Renata De Paris, Duncan Dubugras A. Ruiz, Felipe Meneguzzi: SmartIX: A Database Indexing Agent based on Reinforcement Learning. Appl. Intell. 50(8): 2575-2588 (2020) [paper]

[Learning-based, DQN] Vishal Sharma, Curtis E. Dyreson, Nicholas Flann: MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning. IDEAS 2021: 56-64 [paper]

[Learning-based, DQN] Yu Yan, Shun Yao, Hongzhi Wang, Meng Gao: Index selection for NoSQL database with deep reinforcement learning. Inf. Sci. 561: 20-30 (2021) [paper]

[Learning-based, DQN] Vishal Sharma, Curtis E. Dyreson: Indexer++: Workload-aware Online Index Tuning with Transformers and Reinforcement Learning. SAC 2022: 372-380 [paper]

[Learning-based, MAB] R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic: DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. ICDE 2021: 600-611 [paper]

[Learning-based, MAB] R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic: HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning. Proc. VLDB Endow. 16(2): 216-229 (2022) [paper]

[Learning-based, MCTS] Xuanhe Zhou, Luyang Liu, Wenbo Li, Lianyuan Jin, Shifu Li, Tianqing Wang, Jianhua Feng: AutoIndex: An Incremental Index Management System for Dynamic Workloads. ICDE 2022: 2196-2208 [paper]

[Learning-based, MCTS] Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek R. Narasayya, Surajit Chaudhuri, Philip A. Bernstein: Budget-aware Index Tuning with Reinforcement Learning. SIGMOD Conference 2022: 1528-1541 [paper]

[Optimization, Learned Cost] Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, Vivek R. Narasayya: AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. SIGMOD Conference 2019: 1241-1258 [paper]

[Optimization, Learned Cost] Jianling Gao, Nan Zhao, Ning Wang, Shuang Hao: SmartIndex: An Index Advisor with Learned Cost Estimator. CIKM 2022: 4853-4856 [paper]

[Optimization, Workload Summarization] Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek R. Narasayya, Surajit Chaudhuri: ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. SIGMOD Conference 2022: 660-673 [paper]

[Optimization, What-if Call] Tarique Siddiqui, Wentao Wu, Vivek R. Narasayya, Surajit Chaudhuri: DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning. Proc. VLDB Endow. 15(10): 2019-2031 (2022) [paper]

Partition Advisor

[horizontal, DRL] Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. Learning a Partitioning Advisor for Cloud Databases. SIGMOD, 2020. [paper]

[horizontal, DRL] Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. Towards learning a partitioning advisor with deep reinforcement learning. aiDM@SIGMOD, 2019. [paper]

[horizontal, HybridAlgorithms] Panos Parchas, Yonatan Naamad, Peter Van Bouwel, et al. Fast and effective distribution-key recommendation for amazon redshift. PVLDB, 2020. [paper]

[horizontal, DataSkip] Martin Boissier, Daniel Kurzynski. Workload-driven horizontal partitioning and pruning for large HTAP systems. ICDE Workshop, 2018. [paper]

[horizontal, GraphPartition] Carlo Curino, Yang Zhang, Evan P. C. Jones, Samuel Madden. Schism: a Workload-Driven Approach to Database Replication and Partitioning. PVLDB, 2010. [paper]

[horizontal, Heuristic] Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman. Automating physical database design in a parallel database. SIGMOD, 2002. [paper]

[vertical, DRL] Campero Durand G, Piriyev R, Pinnecke M, et al. Automated vertical partitioning with deep reinforcement learning. ADBIS, 2019. [paper]

[co-partition] Zamanian, E., Binnig, C., & Salama, A. (2015). Locality-aware partitioning in parallel database systems. SIGMOD. [paper]

[co-partition] Rabl, T., & Jacobsen, H. A. (2017). Query centric partitioning and allocation for partially replicated database systems. SIGMOD. [paper]

[in situ] Olma, M., Karpathiotakis, M., Alagiannis, I., Athanassoulis, M., & Ailamaki, A. (2020). Adaptive partitioning and indexing for in situ query processing. VLDB Journal. [paper]

2. Query Optimization

Query Rewriter

(Note: other interesting problems, such as text2SQL, are out of scope here.)

Traditional

[rewrite rules] Béatrice Finance, Georges Gardarin. A Rule-Based Query Rewriter in an Extensible DBMS. ICDE 1991. [paper]

[rewrite rules] Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan. Extensible/Rule Based Query Rewrite Optimization in Starburst. SIGMOD Conference 1992. [paper]

[cost/heuristic rewrite] Rafi Ahmed, Allison W. Lee, Andrew Witkowski, et al. Cost-Based Query Transformation in Oracle. VLDB 2006: 1026-1036. [paper]

[heuristic rewrite] De Araújo, A. H. M., Monteiro, J. M., Antônio, J., De Macêdo, F., Tavares, J. A., Brayner, A., & Lifschitz, S. (2014). ARe-SQL: An Online, Automatic and Non-Intrusive Approach for Rewriting SQL Queries. JIDM, 2014. [paper]

[equivalence] Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu. HoTTSQL: proving query rewrites with univalent SQL semantics. PLDI 2017: 510-524. [paper]

[optimization engine] Begoli, E., Camacho-Rodríguez, J., Hyde, J., Mior, M. J., & Lemire, D. (2018). Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources. SIGMOD, 2018. [paper]

[map-reduce] Partho Sarthi, Kaushik Rajan, Akash Lal, Abhishek Modi, et al. Generalized Sub-Query Fusion for Eliminating Redundant I/O from Big-Data Queries. OSDI 2020: 209-224. [paper]

[streaming] Wentao Wu, Philip A. Bernstein, Alex Raizman, Christina Pavlopoulou. Cost-based Query Rewriting Techniques for Optimizing Aggregates Over Correlated Windows. CoRR abs/2008.12379 (2020) [paper]

[rewrite rules] Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, Jinyang Li. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. SIGMOD Conference 2022: 94-107. [paper]

Learning-based

[predicate rewrite] Qi Zhou, Joy Arulraj, Shamkant B. Navathe, William Harris, Jinpeng Wu. Sia: Optimizing Queries using Learned Predicates. SIGMOD, 2021. [paper]

[rewrite strategy] Xuanhe Zhou, Guoliang Li, Chengliang Chai, Jianhua Feng. A Learned Query Rewrite System using Monte Carlo Tree Search. VLDB, 2022. [paper]

Cost Estimation

[Card, Query-based] Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj K. Agarwal, Debmalya Panigrahi, Sudeepa Roy, Jun Yang. Selectivity Functions of Range Queries are Learnable. SIGMOD, 2022. [paper]

[Card, Query-based] Kipf A, Kipf T, Radke B, et al. Learned cardinalities: Estimating correlated joins with deep learning. CIDR, 2019. [paper]

[Card, Query-based] Woltmann L, Hartmann C, Thiele M, et al. Cardinality estimation with local deep learning models. aiDM, 2019. [paper]

[Card, Query-based] Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 4(11): 852-863, 2011. [paper]

[Card, Query-based] Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., & Chaudhuri, S. (2018). Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment, 12(9), 1044–1057, 2018. [paper]

[Card, Query-based] Hayek, R., & Shmueli, O. (2020). NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT. arXiv, 2020. [paper]

[Card, Query-based, Adaptability] Beibin Li, Yao Lu, Srikanth Kandula: Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts. SIGMOD Conference 2022: 1920-1933 [paper]

[Card, Data-based] Lu Y, Kandula S, König A C, et al. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 2021. [paper]

[Card, Data-based] Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., … Stoica, I. (2019). Deep Unsupervised Cardinality Estimation. VLDB, 2019. [paper]

[Card, Data-based] Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. (2020). Neurocard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 14(1), 61–73, 2020. [paper]

[Card, Data-based] Zhu R, Wu Z, Han Y, et al. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. arXiv preprint arXiv:2011.09022, 2020. [paper]

[Card, Data-based] Wu Z, Shaikhha A, Zhu R, et al. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv preprint arXiv: 2012.14743, 2020. [paper]

[Card, Data-based] Leis, V., Radke, B., Gubichev, A., Kemper, A., & Neumann, T. (2017). Cardinality estimation done right: Index-based join sampling. CIDR, 2017. [paper]

[Card, Data-based] Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! Proceedings of the VLDB Endowment, 13(7), 992–1005, 2020. [paper]

[Card, Data-based] Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., … Cui, B. (2020). FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. VLDB, 2021. [paper]

[Card, Data-based] Hasan S, Thirumuruganathan S, Augustine J, et al. Deep learning models for selectivity estimation of multi-attribute queries. SIGMOD, 2020. [paper]

[Card, Data-based] Heimel M, Kiefer M, Markl V. Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. Proceedings of the ACM SIGMOD, 2015. [paper]

[Card, Data-based] Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. Quicksel: Quick selectivity learning with mixture models. SIGMOD 2020. [paper]

[Card, Data-based] Jiayi Wang, Chengliang Chai, Jiabin Liu, Guoliang Li. FACE: A Normalizing Flow based Cardinality Estimator. VLDB 2022. [paper]

[Card, Data-based] Yao Lu, Srikanth Kandula, Arnd Christian König, Surajit Chaudhuri. Pre-training summarization models of structured datasets for cardinality estimation. VLDB 2022. [paper]

[Card, Query&Data-based] Wu P, Cong G. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. Proceedings of the 2021 International Conference on Management of Data (SIGMOD), 2021: 2009-2022. [paper]

[Card] Parimarjan Negi, Ryan C. Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow. 14(11): 2019-2032, 2021. [paper]

[Cost] Marcus, R., & Papaemmanouil, O. (2019). Plan-Structured Deep Neural Network Models for Query Performance Prediction. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [paper]

[Cost] Sun, J., & Li, G. An End-to-End Learning-based Cost Estimator. VLDB, 2020. [paper]

[Cost] Benjamin Hilprecht, Carsten Binnig. Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction. VLDB, 2022. [paper]

[EA&B] Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are We Ready For Learned Cardinality Estimation? Proc. VLDB Endow. 14(9): 1640-1654 (2021). [paper]

[EA&B] Sun, J., Zhang, J., Sun, Z., Li, G., & Tang, N. Learned Cardinality Estimation: A Design Space Exploration and a Comparative Evaluation. VLDB, 2022. [paper]

[EA&B] Yuxing Han, Ziniu Wu, Peizhi Wu, et al. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB, 2022. [paper]

[EA&B] Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, Jaehyok Chong: Learned Cardinality Estimation: An In-depth Study. SIGMOD Conference 2022: 1214-1227 [paper]

[EA&B] Harmouch, H., & Naumann, F. (2017). Cardinality Estimation: An Experimental Survey. PVLDB, 11(4), 499–512, 2017. [paper]

Plan Optimization

[Parallel MCTS] Ziyun Wei, Immanuel Trummer. SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms. PVLDB, 2022. [paper]

[OptimizedRL] Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica. Balsa: Learning a Query Optimizer Without Expert Demonstrations. SIGMOD, 2022. [paper]

Jan Kossmann. Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization. CIDR, 2022 [paper]

Ron Avnur, Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD, 2000. [paper]

Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., … Tatbul, N. (2018). Neo: A Learned query optimizer. Proceedings of the VLDB Endowment, 12(11), 1705–1718, 2018. [paper]

Marcus, R., & Papaemmanouil, O. (2018). Deep reinforcement learning for join order enumeration. Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, AiDM 2018, 0–3. [paper]

Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2016). How Good Are Query Optimizers, Really? Proceedings of the VLDB Endowment, 9(3), 204–215. [paper]

Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., & Antonakakis, J. SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. SIGMOD, 2019. [paper]

Ding, M., Chen, S., & Manegold, S. (2021). Progressive Join Algorithms Considering User Preference. CIDR, 2021. [paper]

Yu, X., Li, G., Tang, N. Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE, 2020. [paper]

Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222, 2018. [paper]

Plan Hinter

Pasupuleti, K., Park, M., & Valluri, S. (n.d.). SQL Plan Observability through Hints in Oracle Autonomous Database.

Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2020). Bao: Making Learned Query Optimization Practical. SIGMOD, 2021. [paper]

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal. Steering Query Optimizers: A Practical Take on Big Data Workloads. SIGMOD, 2021. [paper]

3. Workload Scheduling

Ibrahim Sabek, Tenzin Samten Ukyab, Tim Kraska. LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems. SIGMOD, 2022. [paper]

Chi Zhang, Ryan Marcus, et al. Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. VLDB, 2020. [paper]

4. Database Design

Index

One-dimensional Index

[1-D, Immutable] Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). The case for learned index structures. SIGMOD, 2018. [paper] [code]

[1-D, Mutable] Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., & Kraska, T. (2019). FITing-Tree: A data-aware index structure. SIGMOD, 2019. [paper]

[1-D, Mutable, Secondary] Wu, Y., Yu, J., Tian, Y., Sidle, R., Barber, R. (2019). Designing succinct secondary indexing mechanism by exploiting column correlations. SIGMOD 2019. [paper]

[1-D, Mutable] Ferragina, P., & Vinciguerra, G. (2020). The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB, 2020. [paper]

[1-D, Mutable] Ding, J., Minhas, U. F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D., & Kraska, T. (2020). ALEX: An Updatable Adaptive Learned Index. SIGMOD, 2020. [paper] [code]

[1-D, Mutable, Persistent] Lu, B., Ding, J., Lo, E., Minhas, U. F., & Wang, T. (2021). APEX: A High-Performance Learned Index on Persistent Memory. VLDB, 2021. [paper]

[1-D, Immutable, Auto-generated] Dittrich, J., Nix, J., & Schön, C. (2021). The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures. VLDB, 2021. [paper] [code]

[1-D, Mutable, Concurrency] Li, P., Hua, Y., Jia, J., Zuo, P. (2021). FINEdex: A Fine-grained Learned Index Scheme for Scalable and Concurrent Memory Systems. VLDB, 2021. [paper]

[1-D, Mutable] Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C. (2021). Updatable learned index with precise positions. VLDB, 2021. [paper]

[1-D, Mutable] Ma, C., Yu, X., Li, Y., Meng, X., & Maoliniyazi, A. (2022). FILM: A Fully Learned Index for Larger-Than-Memory Databases. VLDB, 2022. [paper]

[1-D, Mutable, Concurrency] Wang, Z., Chen, H., Wang, Y., & Tang, C. (2022). The Concurrent Learned Indexes for Multicore Data Storage. ACM Transactions on Storage, 18(1), 1-35. [paper] [code]

[1-D, Mutable] Jiaoyi Zhang, Yihan Gao. (2022). CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm. VLDB, 2022. [paper]

[1-D, Mutable] Shangyu Wu. (2022). NFL: Robust Learned Index via Distribution Transformation. VLDB, 2022. [paper]

[1-D, Mutable, Persistent] Zhang, Z., Chu, Z., Jin, P., Luo, Y., Xie, X., Wan, S., Luo, Y., Wu, X., Zou, P., Zheng, C., Wu, G., Rudoff, A. (2022). PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery. VLDB, 2022. [paper]

Multi-dimensional Index

[Multi-D, Immutable] Nathan, V., Ding, J., Alizadeh, M., & Kraska, T. (2020). Learning multi-dimensional indexes. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Li, P., Lu, H., Zheng, Q., Yang, L., & Pan, G. (2020). LISA: A Learned Index Structure for Spatial Data. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Qi, J., Liu, G., Jensen, C.S., Kulik, L. (2020). Effectively learning spatial indices. VLDB, 2020. [paper]

[Multi-D, Immutable] Ding, J., Nathan, V., Alizadeh, M., & Kraska, T. (2020). Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB, 2020. [paper]

[Multi-D, Mutable] Dong, H., Chai, C., Luo, Y., Liu, J., Feng, J., Zhan, C. (2022). RW-Tree: A Learned Workload-aware Framework for R-tree Construction. ICDE, 2022. [paper]

Experiment and Analysis

[1-D, Immutable, Analysis] Ferragina, P., Lillo, F., & Vinciguerra, G. (2020). Why are learned indexes so effective?. ICML, 2020. [paper]

[1-D, Immutable, Experiment] Marcus, R., Stoian, M., Kipf, A., Misra, S., van Renen, A., Kemper, A., Neumann, T., & Kraska, T. (2020). Benchmarking learned indexes. VLDB, 2020. [paper] [code]

[1-D, Poisoning Attack] Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia. (2022). The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures. SIGMOD, 2022. [paper]

[1-D, Mutable, Experiment] Wongkham, C., Lu, B., Liu, C., Zhong, Z., Lo, E., Wang, T. (2022). Are Updatable Learned Indexes Ready?. VLDB, 2022. [paper]

[1-D, Immutable, Experiment] Maltry, M., Dittrich, J. (2022). A critical analysis of recursive model indexes. VLDB, 2022. [paper]

[1-D, Hash Index, Experiment] Sabek, I., Vaidya, K., Horn, D., Kipf, A., Mitzenmacher, M., Kraska, T. (2022). Can Learned Models Replace Hash Functions?. VLDB, 2022. [paper]

Layout

[Learned Layout] Liwen Sun, Michael J. Franklin, Sanjay Krishnan, et al. Fine-grained partitioning for aggressive data skipping. SIGMOD, 2014. [paper]

[Learned Layout] Yang, Z., Chandramouli, B., Wang, C., Gehrke, J., Li, Y., Minhas, U. F., … Acharya, R. Qd-tree: Learning Data Layouts for Big Data Analytics. SIGMOD, 2020. [paper]

[Learned Layout] Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, et al. Instance-Optimized Data Layouts for Cloud Analytics Workloads. SIGMOD, 2021. [paper]

[Learned Layout] Bandle, M., Giceva, J., & Neumann, T. (2021). To Partition, or Not to Partition, That is the Join Question in a Real System. SIGMOD, 2021. [paper]

[Data Container] Madden S, Ding J, Kraska T, Sudhir S, Cohen D, Mattson T, Tatbul N. Self-Organizing Data Containers. CIDR, 2022. [paper]

[Learned Layout] Teng Zhang, Jian Tan, Xin Cai, Jianying Wang, Feifei Li, Jianling Sun. SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis. VLDB, 2022. [paper]

[Learned Layout] Michael Abebe. Tiresias: Enabling Predictive Autonomous Storage and Indexing. VLDB, 2022. [paper]

Query Execution

[CodeGen] Immanuel Trummer. CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex. VLDB, 2022. [paper]

Zhang, C., Marcus, R., Kleiman, A., & Papaemmanouil, O. (2020). Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. AIDB@VLDB, 2020. [paper]

5. Database Monitoring

[Trend Prediction] L. Ma, D. V. Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based Workload Forecasting for Self-driving Database Management Systems,” in SIGMOD, 2018. [paper]

[Performance Prediction] Dorn, J., Apel, S., & Siegmund, N. (n.d.). Mastering Uncertainty in Performance Estimations of Configurable Software Systems. (3).

[Performance Prediction] Marcus, R., & Papaemmanouil, O. (2019). Plan-structured deep neural network models for query performance prediction. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [paper]

[Performance Prediction] Wu, W., Chi, Y., Hacigümüş, H., & Naughton, J. F. (2013). Towards predicting query execution time for concurrent and dynamic database workloads. Proceedings of the VLDB Endowment, 6(10), 925–936. [paper]

[Performance Prediction] Duggan, J., Papaemmanouil, O., Cetintemel, U., & Upfal, E. (2014). Contender: A resource modeling approach for concurrent query performance prediction. Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings, 109–120. [paper]

[Performance Prediction] Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüş, H., & Naughton, J. F. (2013). Predicting query execution time: Are optimizer cost models really unusable? Proceedings - International Conference on Data Engineering, (1), 1081–1092. [paper]

[Performance Prediction] Higginson, A. S., Dediu, M., Arsene, O., Paton, N. W., & Embury, S. M. (2020). Database Workload Capacity Planning using Time Series Analysis and Machine Learning. Proceedings of the ACM SIGMOD International Conference on Management of Data, 769–783. [paper]

[Performance Prediction] Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., & Kossmann, D. (2009). Predictable performance for unpredictable workloads. Proceedings of the VLDB Endowment, 2(1), 706–717. [paper]

[Performance Prediction] Xuanhe Zhou, Ji Sun, Guoliang Li, Jianhua Feng. Query Performance Prediction for Concurrent Queries using Graph Embedding. [paper]

6. Database Diagnosis

System Diagnosis

Yoon, D. Y., Niu, N., & Mozafari, B. (2016). DBSherlock: A performance diagnostic tool for transactional databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1599–1614. [paper]

Kalmegh, P., Babu, S., & Roy, S. (2019). iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks. SIGMOD. [paper]

Ma, M., Yin, Z., Zhang, S., Wang, S., Zheng, C., & Jiang, X. (2020). Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases. Proceedings of the VLDB Endowment. [paper]

Query Diagnosis

Xiaoze Liu, Zheng Yin, Chao Zhao, et al. PinSQL: Pinpoint Root Cause SQLs to Resolve Performance Issues in Cloud Databases. ICDE 2022. [paper]

7. Training Data Generation

Query Generation

L. Zhang, C. Chai, X. Zhou, and G. Li. LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. SIGMOD, 2022. [paper]

Liu X, Kong X, Liu L, et al. TreeGAN: syntax-aware sequence generation with generative adversarial networks. In ICDM, 2018. [paper]

Data Generation

[DeepAR] Jingyi Yang, Peizhi Wu, Gao Cong, Tieying Zhang, Xiao He. SAM: Database Generation from Query Workloads with Supervised Autoregressive Models. SIGMOD, 2022. [paper]

Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl. Expand your training limits! Generating training data for ML-based data management. SIGMOD, 2021 [paper]

Ju Fan, Tongyu Liu, Guoliang Li, Yuwei Shen, Xiaoyong Du. Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration. VLDB 2020. [paper]

8. AI Techniques

Feature Encoding

[PlanEncoding] Yue Zhao, Gao Cong, Jiachen Shi, Chunyan Miao. QueryFormer: A Tree Transformer Model for Query Plan Representation. VLDB, 2022. [paper]

[Plan2Feature] Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar. Database Workload Characterization with Query Plan Encoders. VLDB, 2022. [paper]

[Pretrained Representation] Xiu Tang, Sai Wu, Mingli Song, Shanshan Ying, Feifei Li, Gang Chen: PreQR: Pre-training Representation for SQL Understanding. SIGMOD Conference 2022: 204-216 [paper]

[WorkloadAsGraph] Sanjay Agrawal, Eric Chu, Vivek R. Narasayya. Automatic physical design tuning: workload as a sequence. SIGMOD, 2006. [paper]

[DataSummary] Brit Youngmann et al. Guided Exploration of Data Summaries. VLDB, 2022. [paper]

Jiang H, Liu C, Paparrizos J, et al. Good to the Last Bit: Data-Driven Encoding with CodecDB. SIGMOD 2021. [paper]

Model Transfer

Meghdad Kurmanji, Peter Triantafillou. Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data. SIGMOD, 2023. [paper]

9. Database Frameworks

[MLTrain] Lim WS, Butrovich M, Zhang W, Crotty A, Ma L, Xu P, Gehrke J, Pavlo A. Database Gyms. CIDR, 2023. [paper]

[AcademicDB] Immanuel L Haffner, Jens Dittrich. mutable: A Modern DBMS for Research and Fast Prototyping. CIDR, 2023. [paper]

[ModelValid] Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, et al. PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!. VLDB, 2022. [paper]

[Transferable] Ziniu Wu, et al. A Unified Transferable Model for ML-Enhanced DBMS. CIDR, 2022. [paper]

[Transferable] Benjamin Hilprecht, Carsten Binnig. One Model to Rule them All: Towards Zero-Shot Learning for Databases. CIDR, 2022. [paper]

[AutoDB] Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., … Zhang, T. (2017). Self-Driving Database Management Systems. CIDR, 2017. [paper]

[AutoDB] Li, F. (2018). Cloud native database systems at Alibaba: Opportunities and challenges. Proceedings of the VLDB Endowment, 2018. [paper]

[AutoDB] Kraska, T., Alizadeh, M., Beutel, A., Chi, E. H., Ding, J., Kristo, A., … Nathan, V. (2019). SageDB: A learned database system. CIDR, 2019. [paper]

[AutoDB] Li, G., Zhou, X., Li, S. (2019). XuanYuan: An AI-Native Database. Data Eng., 2019. [paper]

[AutoDB] Hilprecht, B., Bang, T., El-Hindi, M., Hättasch, B., Khanna, A., Rehrmann, R., … Binnig, C. (2020). DBMS Fitting: Why should we learn what we already know? CIDR, 2020. [paper]

[AutoDB] Ma, L., Zhang, W., Jiao, J., Wang, W., Butrovich, M., Lim, W. S., … Pavlo, A. (2021). MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems. SIGMOD, 2021. [paper]

[AutoDB] Guoliang Li, Xuanhe Zhou, Ji Sun, Xiang Yu, Yue Han, Lianyuan Jin, Wenbo Li, Tianqing Wang, Shifu Li. openGauss: An Autonomous Database System. VLDB, 2021. [paper]

[NLP] James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Y. Levy. From Natural Language Processing to Neural Databases. VLDB, 2021. [paper]

[Embedding] Raasveldt, M. MonetDBLite: An embedded analytical database. SIGMOD, 2018. [paper]

10. Demonstrations

[DB Tuning] Immanuel Trummer. Demonstrating DB-BERT: A Database Tuning Tool that "Reads" the Manual. SIGMOD, 2022. [paper]

[DB Tuning] Luming Sun, Tao Ji, Cuiping Li, Hong Chen. DeepO: A Learned Query Optimizer. SIGMOD, 2022. [paper]

[DB Tuning] Junxiong Wang, Immanuel Trummer, Debabrota Basu. Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning. SIGMOD, 2021. [paper]

[O&M Platform] Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. DBMind: A Self-Driving Platform in openGauss. VLDB, 2021. [paper] [website]

[DB Tuning] Bohan Zhang, Dana Van Aken, Justin Wang, Tao Dai, Shuli Jiang, Jacky Lao, Siyuan Sheng, Andrew Pavlo, Geoffrey J. Gordon. A Demonstration of the OtterTune Automatic Database Management System Tuning Service. VLDB, 2018. [paper]

11. Talks

[AutoDB] Andy Pavlo, Matthew Butrovich, Lin Ma, Prashanth Menon, Wan Shen Lim, Dana Van Aken, William Zhang. Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation. VLDB, 2021. [paper]

[AutoDB] Tim Kraska. Towards instance-optimized data systems. VLDB, 2021. [paper]

[AutoDB] Guoliang Li. AI-Native Database. VLDB, 2021. [slides]
