Databases tutorial for true beginners
This will be the repository for my notes I found today taken during my computer science class. I thought of making this private but figured it might help someone out there. Just so that there is no mistake, these class notes were quickly written during class so assume huge errors and are transcribed here verbatim for my personal reminiscing.
- Class overview
- What is a database?
- What is a dbms?
- SQL + Relation overview
Overview of class
There will be three homeworks, one project and 2 exams. The project we'll have to make is an online stock trade similar to Etrade and will be done in teams of two. The professor mentioned using JDBC. (Not too familiar with this) In previous semesters, the project were like Ebay. I really wanted to do that. Oh well
What is a Database?
A database is a collection of data central to some enterprise. It's basically a record of information.
Say when you buy an item in an online store, (ebay) that online store will record the item you purchased, when it was ordered, how many, etc so that it can compile a history list for you. So the database stores that information and nothing else.
It certainly does not record how that information will be presented in the site. Just pure raw data.
What is a Database Management System?
A program that manages the database. In other words, it controls how information is stored and accessed. It supports a high-level access language such as SQL to retrieve the data. Most DBMS are based on the relational data model although there is the Object Oriented model to consider. However, the professor said we'll focus on the former.
What is A.C.I.D.
- Atomic: whether it commit or doesn't.
- Consistency: obeys the rules of organization. (Integrity Constraints)
- Isolation: like the synchronize method in Java.
- Durability: - examples given was RAID.
Don't remember exactly what ACID was for but believe it had to do with four key points to consider in a DBMS.
What is a Transaction Processing System?
A transaction is a request that changes the database state. Using the online store example, when a customer orders a can of beans, the database that holds what you bought will have to be updated. (ie the state of the db must change) TPS consists of TP monitor which controls the execution of transactions, databases, and transactions.
In the above diagram, dictionary defines the system. That is to say how each database is organized. He also mentioned that serializable execution is when you get the same result no matter the order that two or more requests are processed. Not sure how that fits in though.
What is the difference between OLTP vs OLAP?
stands for Online Transitional Processing.
OLTP is your average maintenance between what happens in the real world and keeping the database up to date. Using previous example, when a person orders one box of cookies and two bottles of soda online, this data must be inserted into the order history database.
means Online Analytic Processing.
On the other hand, OLAP is more for looking at patterns (data mining) The real life example given was this supermarket who had accumulated data of what their customers had been purchasing over the years and had it analyzed to see if there were any shopping patterns. Turns out that when people bought diapers, they did so along with beer. Go figure.
Overview for relation & SQL
Dealing with Database security
- Authorization: make sure user is who he says he is.
- Authentication: only allowed access to what he's owed.
- Encryption: data sent can't be easily seen.
Database: In detail
The information in a DB is stored in tables. Just visualize how Excel looks like where there are columns and rows. There is a domain for each column. What that means is there is a limit on what can go under a column. Say an online store only sold food and nothing else. If you had a table with an "items ordered" column, then its domain would be restricted to food. You can't have numbers in there just because it wouldn't make sense.
What is a relation?
A relation is a "mathematical entity corresponding to a table" (basically a table) and a relational instance is a set of tuples. There are different types of operations on relations such as unary (deletion of a row) and binary (Cartesian product).
- Each row a different entity. No duplicates allowed.
- Column states a particular fact about each entity.
- A tuple is a row.
- An attribute is a column.
What is SQL?
A language used to manipulate tables and stands for Structured Query Language. It has the basic form of:
SELECT "list of columns you want"
FROM "name of table"
WHERE "any conditions/filters"
A things to note is that using * after
SELECT says you want all the columns in that table
What is ACID?
A transaction is a program that accesses the database in response to real-world events but requirements are placed which are known as the ACID properties. All is the responsibility of the transaction monitor except for consistency which is for the transaction designer to worry about. Limiting the occurrence of an event, like the overbooking of a flight, is expressed as integrity constraints which are "just assertions that must be satisfied by the database state"
- Atomicity: A real-world event occurs (commits) or doesn't (aborts). In other words, the transaction runs to completion or has not effect at all.
- Consistency: database is in a state that satisfies all integrity constraints when the execution of transaction starts and after it finishes. New database state must also satisfy specifications of transaction.
- Isolation: not impacted by execution of multiple transactions. If I deposit money in my bank and so do you, my transaction shouldn't be affected by yours.
- Durability: Once data is committed (info in database is changed), system must ensure that it's not lost in case of failures like a power outage or network problems.
One thing the professor really stressed was that "No data is better than bad data." During this part of the lecture, he also mentioned that if the database is being used just to access and not update data, you must optimize the system for reading. For some reason, around this point I wrote that load is equivalent to read and store is equivalent to write.
What is a Relation Data Model
All data is stored in 1's and 0's but dealing with bits is not something humans can readily understand. That is why there must be some level of abstraction. This is where schema comes into play being the abstract description of data. With all this, one can focus on the application.
Physical Data Level
The physical schema goes into the nitty gritty details of how data is stored. (tracks, cylinder, indices, etc) Early application used to deal in this level. That is how we got into problems like the Y2K bug as any changes to data structure becomes difficult to make. These details became intertwined with application code which made it more complex. So when the change needed to be made in the two digit representation of the year, it was quite a task.
Conceptual Data Level
Encapsulates information about the physical layer in a way like the Object Oriented way. Essentially hides the details and provides physical data separation. No need to ask questions like how many bytes in an int? Who cares! That's why there's a mapping of the conceptual and the physical data level.
External Data Level
The external schema specifies a view of the data in terms of the conceptual level (known as the View as well). In other words, it limits what a certain class of users can see sort of like a filter. It provides conceptual data independence and similar to the previously mentioned, there is a mapping of the external data with the conceptual schema. Applications are written in terms of the external schema.