An Overview of RookieDB

2023-01-13 18:05 Author: CodeSnake

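Tips: This article walks through the RookieDB project and describes its code framework. For anyone studying databases, RookieDB is a rare, foundational low-level project. Although we do not build a complete relational database from the ground up, it offers a good perspective and path for learning, and gives a clearer understanding of key database techniques such as data types, indexes, caching, backup, and recovery.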


Course resources

Resource summary

  • https://csdiy.wiki/%E6%95%B0%E6%8D%AE%E5%BA%93%E7%B3%BB%E7%BB%9F/CS186/

All the resources and assignment implementations used while studying this course are collected in PKUFlyingPig/CS186 - GitHub.

For a better reading experience: https://blog.codesnake.space/tags/rookiedb


RookieDB Overview

RookieDB is a bare-bones database implementation which supports executing simple transactions in series. In the assignments of this class you will be adding support for B+ tree indices, efficient join algorithms, query optimization, multigranularity locking to allow concurrent execution of transactions, and database recovery.

For convenience, the staff will be maintaining a read-only public repo here containing the project skeleton. When starting projects, remember to work off of the private repos provided to you through GitHub Classroom rather than the public one.

As you will be working with this codebase for the rest of the semester, it is a good idea to get familiar with it. The code is located in the src/main/java/edu/berkeley/cs186/database directory, while the tests are located in the src/test/java/edu/berkeley/cs186/database directory. The following is a brief overview of each of the major sections of the codebase.

cli

The cli directory contains all the logic for the database's command line interface. Running the main method of CommandLineInterface.java will create an instance of the database and a simple text interface through which you can send queries and review their results. The inner workings of this section are beyond the scope of the class (although you're free to look around); you just need to know how to run the command line interface.

parser

The subdirectory cli/parser contains a lot of scary-looking code! Don't be intimidated: it is all generated automatically from the file RookieParser.jjt in the root directory of the repo. The code here handles the logic to convert user-entered queries (strings) into a tree of nodes representing the query (a parse tree).

visitor

The subdirectory cli/visitor contains classes that help traverse the trees created by the parser and create objects that the database can work with directly.
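
To make the parse tree and visitor idea concrete, here is a minimal sketch of the visitor pattern over a tiny tree. The node and visitor types (Node, SelectNode, TableNode, NodeVisitor, TableNameCollector) are hypothetical names invented for illustration; the generated parser classes in the repo will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node types; the real parser generates its own node classes.
interface Node {
    void accept(NodeVisitor visitor);
}

interface NodeVisitor {
    void visitSelect(SelectNode node);
    void visitTable(TableNode node);
}

final class TableNode implements Node {
    final String name;

    TableNode(String name) { this.name = name; }

    @Override
    public void accept(NodeVisitor visitor) { visitor.visitTable(this); }
}

final class SelectNode implements Node {
    final List<Node> children;

    SelectNode(List<Node> children) { this.children = children; }

    @Override
    public void accept(NodeVisitor visitor) {
        visitor.visitSelect(this);
        // Visit the children in order so the visitor sees the whole subtree.
        for (Node child : children) {
            child.accept(visitor);
        }
    }
}

// Walks the tree and collects the table names it encounters.
final class TableNameCollector implements NodeVisitor {
    final List<String> tables = new ArrayList<>();

    @Override
    public void visitSelect(SelectNode node) { /* nothing to record here */ }

    @Override
    public void visitTable(TableNode node) { tables.add(node.name); }
}

final class VisitorSketch {
    static List<String> tablesIn(Node root) {
        TableNameCollector collector = new TableNameCollector();
        root.accept(collector);
        return collector.tables;
    }

    public static void main(String[] args) {
        Node tree = new SelectNode(Arrays.<Node>asList(
                new TableNode("students"), new TableNode("courses")));
        System.out.println(tablesIn(tree));
    }
}
```

The point of the pattern is that the tree classes stay fixed while each new traversal becomes a new visitor class.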


common

The common directory contains bits of useful code and general interfaces that are not limited to any one part of the codebase.


concurrency

The concurrency directory contains a skeleton for adding multigranularity locking to the database. You will be implementing this in Project 4.
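
As background for multigranularity locking, the sketch below encodes the standard textbook compatibility matrix for the lock modes NL, IS, IX, S, SIX and X. The enum name LockMode and the method compatible are assumptions made for this illustration, not names taken from the project skeleton.

```java
// Hypothetical enum; the project skeleton may name and structure this differently.
enum LockMode {
    NL, IS, IX, S, SIX, X;

    // Rows and columns follow declaration order: NL, IS, IX, S, SIX, X.
    private static final boolean[][] COMPATIBLE = {
        /* NL  */ {true, true,  true,  true,  true,  true},
        /* IS  */ {true, true,  true,  true,  true,  false},
        /* IX  */ {true, true,  true,  false, false, false},
        /* S   */ {true, true,  false, true,  false, false},
        /* SIX */ {true, true,  false, false, false, false},
        /* X   */ {true, false, false, false, false, false},
    };

    // True if two different transactions may hold modes a and b on the same resource.
    static boolean compatible(LockMode a, LockMode b) {
        return COMPATIBLE[a.ordinal()][b.ordinal()];
    }
}

final class LockSketch {
    public static void main(String[] args) {
        System.out.println("IS with SIX: " + LockMode.compatible(LockMode.IS, LockMode.SIX));
        System.out.println("S with IX:   " + LockMode.compatible(LockMode.S, LockMode.IX));
    }
}
```

The matrix is symmetric, so the order of the two arguments does not matter.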


databox

Like most DBMSs, our database has a type system distinct from that of the programming language used to implement it. (Our DBMS doesn't quite provide SQL types either, but it is modeled on a simplified version of them.)

The databox directory contains classes which represent values stored in a database, as well as their types. The various DataBox classes represent values of certain types, whereas the Type class represents types used in the database.

An example:

DataBox x = new IntDataBox(42); // The integer value '42'.
Type t = Type.intType();        // The type 'int'.
Type xsType = x.type();         // Get x's type, which is Type.intType().
int y = x.getInt();             // Get x's value: 42.
String s = x.getString();       // An exception is thrown, since x is not a string.
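
The snippet above depends on the project's classes. As a self-contained illustration of the same idea (a base class whose typed getters fail unless a subclass overrides them), here is a toy version. Box, IntBox and StringBox are hypothetical stand-ins, and the exception type is an assumption; the real DataBox classes may throw something else.

```java
// Toy stand-ins for the DataBox idea; these are not the project's classes.
abstract class Box {
    // Every getter fails by default; each subclass overrides only its own type.
    int getInt() { throw new IllegalStateException("not an int"); }

    String getString() { throw new IllegalStateException("not a string"); }
}

final class IntBox extends Box {
    private final int value;

    IntBox(int value) { this.value = value; }

    @Override
    int getInt() { return value; }
}

final class StringBox extends Box {
    private final String value;

    StringBox(String value) { this.value = value; }

    @Override
    String getString() { return value; }
}

final class BoxSketch {
    // Returns true if asking this box for a string is rejected.
    static boolean rejectsGetString(Box box) {
        try {
            box.getString();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Box x = new IntBox(42);
        System.out.println(x.getInt());
        System.out.println("getString rejected: " + rejectsGetString(x));
    }
}
```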


index

The index directory contains a skeleton for implementing B+ tree indices. You will be implementing this in Project 2.
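
One small building block of a B+ tree lookup is choosing which child of an inner node to descend into. The sketch below does this with a binary search over the node's sorted separator keys. It assumes the convention that a key equal to a separator goes to the right child; the class and method names are invented for illustration and are not part of the project skeleton.

```java
import java.util.Arrays;
import java.util.List;

final class InnerNodeSketch {
    // Returns the index of the child to follow for `key`, given sorted separator keys.
    // An inner node with n keys has n + 1 children, so the result is in [0, n].
    // Assumed convention: keys equal to a separator are routed to the right child.
    static int childIndex(List<Integer> keys, int key) {
        int lo = 0;
        int hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.get(mid) <= key) {
                lo = mid + 1; // key belongs to the right of this separator
            } else {
                hi = mid;     // key belongs to the left of this separator
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(10, 20, 30);
        for (int key : new int[] {5, 10, 25, 30}) {
            System.out.println("key " + key + " -> child " + childIndex(keys, key));
        }
    }
}
```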

memory

The memory directory contains classes for managing the loading of data into and out of memory (in other words, buffer management).

The BufferFrame class represents a single buffer frame (page in the buffer pool) and supports pinning/unpinning and reading/writing to the buffer frame. All reads and writes require that the frame be pinned (which is often done via the requireValidFrame method, which reloads data from disk if necessary, and then returns a pinned frame for the page).
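
The pinning rule can be sketched with a pin counter that guards every read and write. FrameSketch below is a hypothetical toy, not the BufferFrame class itself; it only illustrates the rule that access requires a pinned frame.

```java
// Toy frame with a pin count; a hypothetical stand-in, not the real BufferFrame.
final class FrameSketch {
    private final byte[] contents;
    private int pinCount = 0;

    FrameSketch(int size) { this.contents = new byte[size]; }

    void pin() { pinCount++; }

    void unpin() {
        if (pinCount == 0) {
            throw new IllegalStateException("frame is not pinned");
        }
        pinCount--;
    }

    boolean isPinned() { return pinCount > 0; }

    byte read(int offset) {
        requirePinned();
        return contents[offset];
    }

    void write(int offset, byte value) {
        requirePinned();
        contents[offset] = value;
    }

    // A pinned frame cannot be evicted, so its contents stay valid during access.
    private void requirePinned() {
        if (!isPinned()) {
            throw new IllegalStateException("reads and writes require a pinned frame");
        }
    }

    public static void main(String[] args) {
        FrameSketch frame = new FrameSketch(16);
        frame.pin();
        try {
            frame.write(0, (byte) 42);
            System.out.println(frame.read(0));
        } finally {
            frame.unpin(); // always release the pin, even if the access fails
        }
    }
}
```

Pairing every pin with an unpin in a finally block is the usual way to avoid frames that stay pinned forever.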

The BufferManager interface is the public interface for the buffer manager of our DBMS.

The BufferManagerImpl class implements a buffer manager using a write-back buffer cache with a configurable eviction policy. It is responsible for fetching pages (via the disk space manager) into buffer frames, and returns Page objects to allow for manipulation of data in memory.

The Page class represents a single page. When data in the page is accessed or modified, it delegates reads/writes to the underlying buffer frame containing the page.

The EvictionPolicy interface defines a few methods that determine how the buffer manager evicts pages from memory when necessary. Implementations of these include the LRUEvictionPolicy (for LRU) and ClockEvictionPolicy (for clock).
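
For intuition about LRU eviction, the sketch below tracks access order in a LinkedHashSet and evicts the least recently used frame that is not pinned. The class LruSketch and its methods hit and evict are assumptions for illustration only; they do not mirror the methods of the EvictionPolicy interface.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy LRU bookkeeping over frame numbers; not the project's LRUEvictionPolicy.
final class LruSketch {
    // A LinkedHashSet iterates in insertion order, so removing and re-adding a
    // frame on every access keeps the least recently used frame at the front.
    private final LinkedHashSet<Integer> order = new LinkedHashSet<>();

    // Records an access to the given frame.
    void hit(int frame) {
        order.remove(frame);
        order.add(frame);
    }

    // Returns the least recently used frame that is not pinned, or -1 if none exists.
    int evict(Set<Integer> pinned) {
        for (int frame : order) {
            if (!pinned.contains(frame)) {
                order.remove(frame); // safe: we return before iterating further
                return frame;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LruSketch lru = new LruSketch();
        lru.hit(1);
        lru.hit(2);
        lru.hit(3);
        lru.hit(1); // frame 1 becomes the most recently used
        System.out.println(lru.evict(Collections.singleton(2)));
    }
}
```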


io

The io directory contains classes for managing data on-disk (in other words, disk space management).

The DiskSpaceManager interface is the public interface for the disk space manager of our DBMS.

The DiskSpaceManagerImpl class is the implementation of the disk space manager, which maps groups of pages (partitions) to OS-level files, assigns each page a virtual page number, and loads/writes these pages from/to disk.
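
One way to picture a virtual page number is as a single long that packs a partition number together with a page index inside that partition. The sketch below assumes a decimal encoding in which the low ten digits hold the page index; this encoding, the constant and all method names are assumptions for illustration, so check the DiskSpaceManager source for the actual scheme.

```java
// Hypothetical encoding of (partition, page index) into one long.
final class PageNumSketch {
    // Assumption: the low 10 decimal digits hold the page index within a partition.
    static final long PAGES_PER_PARTITION = 10_000_000_000L;

    static long virtualPageNum(int partNum, int pageNum) {
        return partNum * PAGES_PER_PARTITION + pageNum;
    }

    static int partNum(long virtualPageNum) {
        return (int) (virtualPageNum / PAGES_PER_PARTITION);
    }

    static int pageNum(long virtualPageNum) {
        return (int) (virtualPageNum % PAGES_PER_PARTITION);
    }

    public static void main(String[] args) {
        long vpn = virtualPageNum(3, 7);
        System.out.println(vpn + " -> partition " + partNum(vpn) + ", page " + pageNum(vpn));
    }
}
```

A decimal split like this keeps page numbers easy to read in logs, at the cost of wasting some of the long's range.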


query

The query directory contains classes for managing and manipulating queries.

The various operator classes are query operators (pieces of a query), some of which you will be implementing in Project 3.

The QueryPlan class represents a plan for executing a query (which we will be covering in more detail later in the semester). It currently executes the query as given (runs things in logical order, and performs joins in the order given), but you will be implementing a query optimizer in Project 3 to run the query in a more efficient manner.
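
To see why join order matters, consider the textbook I/O cost of a simple nested loop join: the pages of the outer relation, plus one full scan of the inner relation per outer record. The sketch below compares the two orders for made-up table sizes; the class name and the numbers are illustrative assumptions, and this is not how QueryPlan computes costs.

```java
final class JoinCostSketch {
    // I/O cost of a simple nested loop join: scan the outer relation once,
    // then scan the whole inner relation once for every outer record.
    static long snljCost(long outerPages, long outerRecords, long innerPages) {
        return outerPages + outerRecords * innerPages;
    }

    public static void main(String[] args) {
        // Made-up sizes: R has 100 pages and 1000 records, S has 10 pages and 50 records.
        long rThenS = snljCost(100, 1000, 10);
        long sThenR = snljCost(10, 50, 100);
        System.out.println("R join S: " + rThenS + " I/Os");
        System.out.println("S join R: " + sThenR + " I/Os");
    }
}
```

With these sizes, putting the smaller relation on the outside roughly halves the cost (5010 I/Os versus 10100), which is the kind of choice an optimizer makes instead of running joins in the order written.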


recovery

The recovery directory contains a skeleton for implementing database recovery a la ARIES. You will be implementing this in Project 5.

table

The table directory contains classes representing entire tables and records.

The Table class is, as the name suggests, a table in our database. See the comments at the top of this class for information on how table data is laid out on pages.

The Schema class represents the schema of a table (a list of column names and their types).

The Record class represents a record of a table (a single row). Records are made up of multiple DataBoxes (one for each column of the table it belongs to).

The RecordId class identifies a single record in a table.

The HeapFile interface is the interface for a heap file that the Table class uses to find pages to write data to.

The PageDirectory class is an implementation of HeapFile that uses a page directory.
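
The job of a page directory, finding a page with enough free space for a new record, can be sketched with a list of free-byte counts. PageDirectorySketch below is a hypothetical, in-memory toy using a first-fit search; the real PageDirectory stores its bookkeeping on pages and may choose pages differently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy page directory: tracks free bytes per page, first-fit allocation.
final class PageDirectorySketch {
    private final int pageSize;
    private final List<Integer> freeBytes = new ArrayList<>(); // index = page number

    PageDirectorySketch(int pageSize) { this.pageSize = pageSize; }

    // Returns the number of a page with at least `needed` free bytes and reserves
    // that space, allocating a fresh page when no existing page has room.
    int pageWithSpace(int needed) {
        if (needed > pageSize) {
            throw new IllegalArgumentException("record is larger than a page");
        }
        for (int page = 0; page < freeBytes.size(); page++) {
            if (freeBytes.get(page) >= needed) {
                freeBytes.set(page, freeBytes.get(page) - needed);
                return page;
            }
        }
        freeBytes.add(pageSize - needed);
        return freeBytes.size() - 1;
    }

    public static void main(String[] args) {
        PageDirectorySketch directory = new PageDirectorySketch(100);
        System.out.println(directory.pageWithSpace(60)); // first page is allocated
        System.out.println(directory.pageWithSpace(60)); // does not fit, new page
        System.out.println(directory.pageWithSpace(40)); // fits in the first page
    }
}
```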


stats

The table/stats directory contains classes for keeping track of statistics of a table. These are used to compare the costs of different query plans when you implement query optimization in Project 3.

Key classes

Transaction.java

The Transaction interface is the public interface of a transaction - it contains methods that users of the database use to query and manipulate data.

This interface is partially implemented by the AbstractTransaction abstract class, and fully implemented in the Database.Transaction inner class.


TransactionContext.java

The TransactionContext interface is the internal interface of a transaction - it contains methods tied to the current transaction that internal methods (such as a table record fetch) may utilize.

The current running transaction's transaction context is set at the beginning of a Database.Transaction call (and available through the static getCurrentTransaction method) and unset at the end of the call.
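
The set-at-start, unset-at-end behaviour can be sketched with a ThreadLocal and a try/finally block. Whether the project stores its context in a thread-local is an assumption here, and TxContextSketch, getCurrent and runIn are invented names; the sketch only shows the general pattern.

```java
import java.util.function.Supplier;

// Toy "current context" holder; assumes a per-thread slot, which may differ
// from how the project actually tracks the current transaction context.
final class TxContextSketch {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Returns the context of the call in progress on this thread, or null if none.
    static String getCurrent() {
        return CURRENT.get();
    }

    // Sets the context for the duration of `body`, then always unsets it.
    static <T> T runIn(String context, Supplier<T> body) {
        CURRENT.set(context);
        try {
            return body.get();
        } finally {
            CURRENT.remove(); // unset even if the body throws
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + getCurrent());
        String seenInside = runIn("T1", TxContextSketch::getCurrent);
        System.out.println("inside: " + seenInside);
        System.out.println("after: " + getCurrent());
    }
}
```

Internal code called from inside the body can reach the context through the static getter without having it passed down as a parameter.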

This interface is partially implemented by the AbstractTransactionContext abstract class, and fully implemented in the Database.TransactionContext inner class.


Database.java

The Database class represents the entire database. It is the public interface of our database - users of our database can use it like a Java library.

All work is done in transactions, so to use the database, a user would start a transaction with Database#beginTransaction, then call some of Transaction's numerous methods to perform selects, inserts, and updates.

For example:
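
The sketch below is a stand-in that mimics the begin, work, commit shape with a toy in-memory class. ToyDatabase, ToyTransaction and their methods are hypothetical and deliberately simplified; they are not the RookieDB API, so consult Database.java and the tests for the real calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in; only the begin/work/commit shape mirrors the text.
final class ToyDatabase {
    private final Map<String, List<String>> tables = new HashMap<>();

    ToyTransaction beginTransaction() {
        return new ToyTransaction();
    }

    // Returns the committed rows of a table (empty if the table does not exist).
    List<String> select(String table) {
        return tables.getOrDefault(table, Collections.<String>emptyList());
    }

    final class ToyTransaction implements AutoCloseable {
        private final Map<String, List<String>> pending = new HashMap<>();

        void insert(String table, String row) {
            pending.computeIfAbsent(table, k -> new ArrayList<>()).add(row);
        }

        // Makes the buffered inserts visible to later reads.
        void commit() {
            pending.forEach((name, rows) ->
                    tables.computeIfAbsent(name, k -> new ArrayList<>()).addAll(rows));
            pending.clear();
        }

        // Closing without committing discards the buffered inserts.
        @Override
        public void close() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase();
        try (ToyTransaction t1 = db.beginTransaction()) {
            t1.insert("table1", "1, Jane, Doe");
            t1.insert("table1", "2, John, Doe");
            t1.commit();
        }
        System.out.println(db.select("table1"));
    }
}
```

The try-with-resources block guarantees the transaction is closed even when the work inside throws.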

More complex queries can be found in src/test/java/edu/berkeley/cs186/database/TestDatabase.java.


