CS 5200: Database Management Systems

MODULES

NoSQL Databases and MongoDB

Another type of database, one that is not based on the relational model, has been growing in popularity over the last decade. Non-relational, or NoSQL, databases such as MongoDB are document-oriented database management systems whose entities are composed of attribute/value pairs. The entities are not governed by a schema. Instead, documents are related only in that they share the same set of attributes.

In this module we will explore this type of DBMS, understand the programming model, and see how to create applications using the APIs provided by the Java connector for the database. We will also look at some aspects of NoSQL databases that do not provide commonly accepted database properties, such as guaranteed consistency and ACID transactions. We will see how applications deal with these issues.

Lecture 10
Lecture 11
Assignment 7
Lecture 12
Assignment 8
Lecture 13

NoSQL Databases and MongoDB

This lecture will introduce a different database model, often referred to as non-relational or NoSQL, that is not based on SQL. We will examine the nature of and motivations behind NoSQL databases, and learn how they represent data and what embedded languages they use for data definition, data modeling, and queries.

We will also introduce MongoDB, a NoSQL database that has become increasingly popular for dynamic web and big-data applications. We will review how to install MongoDB and the Java connector that provides a Java API for interacting with MongoDB, and how to use the Eclipse IDE to create MongoDB applications. We will also see how to use the Mongo Shell interactive command interpreter to perform operations in MongoDB’s native JavaScript query language.

Finally, we will look at how data is represented within MongoDB using JavaScript Object Notation (JSON) and its binary form, BSON.

Readings

Class presentation

Lecture 10 presentation

Tutorial readings

MongoDB - Java from TutorialsPoint (last accessed 2017-11-01).
JSON Tutorial from TutorialsPoint (last accessed 2017-11-01).
JSON from Wikipedia (last accessed 2017-11-01).

Additional readings

MongoDB Manual v.3.4 (last accessed 2017-11-01)

MongoDB Data Modeling and Queries

In this lecture we will present techniques for data modeling in MongoDB, including document structure, atomicity of write operations, document growth, and data use and performance. Next, we will look at how to perform queries on documents in both the Mongo shell and the Java APIs, and how to control the order of the results and which fields are returned. Finally, we will discuss several graphical tools for developing MongoDB databases and applications.

Readings

Class presentation

Lecture 11 presentation

Tutorial readings

MongoDB - Java from TutorialsPoint (last accessed 2017-11-01).
MongoDB Java Tutorial (last accessed 2017-11-10)

Additional readings

MongoDB Data Modeling (last accessed 2017-11-01)
Mongo Shell (last accessed 2017-11-01)
Query Documents (last accessed 2017-11-01)

MongoDB Indexing and Text Searching

To speed up access to data in a document, MongoDB provides a facility for creating indexes on specified fields. Indexes provide more efficient access to the content than searching through all fields of every document in a collection. MongoDB provides several kinds of indexes for the different types of data that it can store.

In this lecture, we will learn about the different kinds of indexes available in MongoDB, and how to specify and create them. We will also see how to control the indexing to optimize the operations for specific types of data.

Finally, we will learn about the extensive set of facilities provided by MongoDB for indexing and searching text. As we will see, MongoDB can perform sophisticated text tokenization and indexing using stemming, stop lists, and token delimiters that are language specific. Text in a number of languages can be managed within a single document, and the indexing and search functions recognize the different languages within a document.

Readings

Class presentation

Lecture 12 presentation

Tutorial readings

Indexes MongoDB manual (last accessed 2017-11-20).
Indexing from TutorialsPoint (last accessed 2017-11-20).
Text Search MongoDB manual (last accessed 2017-11-20).
Text Search Tutorial from TutorialsPoint (last accessed 2017-11-20).

MongoDB Geospatial Indexing and Queries

Representing and querying geospatial information has become an important requirement for a variety of applications. MongoDB is a popular choice for them because of its geospatial capabilities.

This lecture will briefly introduce the concepts of geospatial indexes, and then look at the two types of geospatial indexing provided by MongoDB, flat (2d) and spherical (2dsphere), and the uses for each of them.

Next, it will cover how geospatial data is represented by the widely used GeoJSON extension to JSON, and the specific types of geometrical constructs for single and composite shapes. This lecture will also present the MongoDB operators that allow querying geospatial data and look at how to combine the operators.

Finally, we will look at a simple geospatial application to see how the pieces fit together.

Readings

Class presentation

Lecture 13 presentation

MongoDB Tutorial readings

Geospatial Queries MongoDB manual (last accessed 2017-11-20).
2dsphere Indexes MongoDB manual (last accessed 2017-11-20).
Finding Restaurants with Geospatial Queries MongoDB manual (last accessed 2017-11-20).
GeoJSON Objects MongoDB manual (last accessed 2017-11-20).
GeoJSON from Wikipedia (last accessed 2017-11-20).

DUE DATE: Wed. June 19 by 11:59:59pm

This set of problems will give you some practice using MongoDB queries. You may create and submit your solution as a Java program.

Here is a sample JSON record that defines a restaurant. The definition includes an embedded document for the address of the restaurant, and an array of documents representing grades, consisting of a date the restaurant was graded, a letter grade, and a numeric score.

{
  "address": {
     "building": "1007",
     "coord": [ -73.856077, 40.848447 ],
     "street": "Morris Park Ave",
     "zipcode": "10462"
  },
  "borough": "Bronx",
  "cuisine": "Bakery",
  "grades": [
     { "date": { "$date": 1393804800000 }, "grade": "A", "score": 2 },
     { "date": { "$date": 1378857600000 }, "grade": "A", "score": 6 },
     { "date": { "$date": 1358985600000 }, "grade": "A", "score": 10 },
     { "date": { "$date": 1322006400000 }, "grade": "A", "score": 9 },
     { "date": { "$date": 1299715200000 }, "grade": "B", "score": 14 }
  ],
  "name": "Morris Park Bake Shop",
  "restaurant_id": "30075445"
}
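
The "$date" values in the grades array follow MongoDB's Extended JSON convention, in which the number is a count of milliseconds since the Unix epoch (UTC). As a minimal sketch of how such a value maps to a calendar date, the following uses only the standard java.time classes. The class name GradeDates is hypothetical and is not part of the assignment or of the MongoDB Java driver.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
class GradeDates {
    // Interprets an Extended JSON "$date" number as milliseconds since the Unix epoch (UTC).
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Date of the first grade in the sample record above.
        System.out.println(toUtcDate(1393804800000L)); // prints 2014-03-03
    }
}
```

When documents are read through the Java driver, date fields are normally delivered as date objects already, so a conversion like this is mostly useful when inspecting the raw JSON file.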

Your individual GitHub repositories "2019S1CS5200SV/assignment-7-ccsid" contain a file "restaurants.zip" with several thousand records for restaurants. Download and unzip this file as "restaurants.json".

Create a Mongo database "restaurantDB" and a collection "restaurants". Use the mongoimport tool to load these records into the "restaurants" collection. Here is the command:

mongoimport -d restaurantDB -c restaurants restaurants.json

Be sure to include the path to the file if it is not in your current directory. You should see output similar to the following.

connected to: localhost
imported 3772 documents

You should inspect the collection to ensure that the data was properly loaded. Once you have done that, create queries that answer the following questions. You may use Java, Node.js/JavaScript, or the MongoDB shell. Turn in a Java file "Assignment_7.java" or a JavaScript file "assignment_7.js".

Write a MongoDB query to display the fields restaurant_id, name, borough and cuisine for all the documents in the collection restaurants.
Write a MongoDB query to display the first 5 restaurants in the borough of the Bronx.
Write a MongoDB query to find the restaurant ID, name, borough and cuisine for those restaurants that contain 'Reg' as three letters somewhere in their names.
Write a MongoDB query to find the restaurants that are located at a longitude west of -74 degrees (that is, a longitude value less than -74).
Write a MongoDB query to find the restaurants that do not prepare any 'American' cuisine, achieved a grade of 'A' (at least one 'A'), and do not belong to the borough of Brooklyn. The documents must be displayed according to cuisine in descending order.
Write a MongoDB query to find the restaurant ID, name, borough and cuisine for those restaurants which belong to the borough of Staten Island or Queens or Bronx or Brooklyn.
Write a MongoDB query to find the restaurant ID, name, borough and cuisine for those restaurants which achieved a score which is not more than 10 (some score of 10 or less).

DUE DATE: TBD by 11:59:59pm.

Chambers Twentieth Century Dictionary

Create a Java program that reads dictionary entries from a volume of Chambers's Twentieth Century Dictionary, and adds documents for each entry from "SAB" to "SYZYGY" to a MongoDB collection named "chambers_20th_c_dictionary". The dictionary file is located at: http://www.gutenberg.org/cache/epub/38700/pg38700.txt.

Dictionary entries are separated by empty lines, and may span multiple lines. An entry begins with the word, followed by an optional pronunciation, followed by definitions, followed by an optional derivation.
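
As a rough sketch of the first step, the lines of the file can be grouped into entries by treating each empty line as a separator and joining the lines of a multi-line entry with single spaces. The class and method names here (EntrySplitter, splitEntries) are hypothetical, and the sketch assumes the lines have already been read into a list, for example with java.nio.file.Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the "entries are separated by empty lines" rule.
class EntrySplitter {
    // Groups consecutive non-empty lines into one entry, joined by single spaces.
    static List<String> splitEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // An empty line closes the entry being collected, if any.
                if (current.length() > 0) {
                    entries.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) current.append(' ');
                current.append(trimmed);
            }
        }
        // The last entry may not be followed by an empty line.
        if (current.length() > 0) entries.add(current.toString());
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "SACQUE, sak. See SACK (1).",
            "",
            "SACODES, s[=a]-k[=o]'d[=e]z, _n._ a genus of beetles of the family",
            "_Cyphonid\u00e6_. [Gr. _sakos_, a shield, _eidos_, form.]");
        for (String entry : splitEntries(lines)) {
            System.out.println(entry);
        }
    }
}
```

The sketch does not skip the Project Gutenberg header and footer or restrict the range to "SAB" through "SYZYGY"; a real reader would need to handle both.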

Documents have the following fields:

word: the word being defined
pronunciation: the phonetic pronunciation of the word (optional).
derivation: the derivation of the word (optional).
definitions: an array of definition strings
note: definition note; usually a "see" reference in lieu of a definition.

If the word is terminated by a '.', the remaining text becomes a note and no further parsing is required for the entry:

SACRE. Same as SAKER.

Otherwise, the word is terminated by a ', ' and a pronunciation follows.

If the pronunciation is terminated by a '.', the remaining text becomes a note and no further parsing is required for the entry:

SACQUE, sak. See SACK (1).

Otherwise, the pronunciation is terminated by a ',' and one or more definitions and an optional derivation follow.
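
The word, pronunciation, and note rules above can be sketched as follows. This is only one possible reading of the rules: it assumes that neither the word nor the pronunciation contains a '.' or ',' of its own, which may not hold for every entry in the file. The class EntryHead and its field names are hypothetical.

```java
// Hypothetical sketch of the head-parsing rules; assumes the word and the
// pronunciation contain no '.' or ',' characters of their own.
class EntryHead {
    final String word;
    final String pronunciation; // null when the entry has none
    final String note;          // null unless the entry is a "see" style note
    final String body;          // unparsed definitions and derivation, or null

    EntryHead(String word, String pronunciation, String note, String body) {
        this.word = word;
        this.pronunciation = pronunciation;
        this.note = note;
        this.body = body;
    }

    // Index of the first '.' or ',' in text, or -1 if neither occurs.
    private static int firstTerminator(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == ',') return i;
        }
        return -1;
    }

    static EntryHead parse(String entry) {
        int wordEnd = firstTerminator(entry);
        if (wordEnd < 0) return new EntryHead(entry.trim(), null, null, null);
        String word = entry.substring(0, wordEnd).trim();
        String rest = entry.substring(wordEnd + 1).trim();
        if (entry.charAt(wordEnd) == '.') {
            // Word terminated by '.': everything after it is a note.
            return new EntryHead(word, null, rest, null);
        }
        int pronEnd = firstTerminator(rest);
        if (pronEnd < 0) return new EntryHead(word, rest, null, null);
        String pronunciation = rest.substring(0, pronEnd).trim();
        String tail = rest.substring(pronEnd + 1).trim();
        if (rest.charAt(pronEnd) == '.') {
            // Pronunciation terminated by '.': everything after it is a note.
            return new EntryHead(word, pronunciation, tail, null);
        }
        // Pronunciation terminated by ',': definitions and derivation follow.
        return new EntryHead(word, pronunciation, null, tail);
    }

    public static void main(String[] args) {
        EntryHead head = parse("SACQUE, sak. See SACK (1).");
        System.out.println(head.word + " | " + head.pronunciation + " | " + head.note);
    }
}
```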

The derivation always appears at the end of the definition within square brackets:

SACODES, s[=a]-k[=o]'d[=e]z, _n._ a genus of beetles of the family
_Cyphonidæ_. [Gr. _sakos_, a shield, _eidos_, form.]

Remove the brackets around the derivation.
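
One way to separate the derivation is to check whether the text ends with ']' and then scan backwards for the matching '['. Matching brackets by depth, rather than taking the last '[', is a precaution in case a derivation itself contains bracketed accent marks such as "[=a]". The sketch below is an assumption about how this could be done, with a hypothetical class name, and is not the method used by the provided reader class.

```java
// Hypothetical sketch: splits a trailing "[...]" derivation off the definitions text.
class Derivation {
    // Returns { definitionsText, derivation }; derivation is null when the
    // text does not end with a bracketed group.
    static String[] split(String body) {
        String text = body.trim();
        if (!text.endsWith("]")) return new String[] { text, null };
        int depth = 0;
        // Scan backwards to find the '[' that matches the final ']'.
        for (int i = text.length() - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (c == ']') {
                depth++;
            } else if (c == '[' && --depth == 0) {
                String derivation = text.substring(i + 1, text.length() - 1).trim();
                return new String[] { text.substring(0, i).trim(), derivation };
            }
        }
        // Unbalanced brackets: leave the text untouched.
        return new String[] { text, null };
    }

    public static void main(String[] args) {
        String[] parts = split("_n._ a genus of beetles. [Gr. _sakos_, a shield, _eidos_, form.]");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

A definition that happens to end with a bracketed accent mark and has no derivation would be misread by this sketch, so it is worth covering that case in the tests.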

Definitions are separated by two dashes ("--"), provided they are preceded by a period ('.') terminating the previous definition. Break these into separate definitions, and add them to the document as an array. For example, the following entry has two such separators, and so three definitions, following the pronunciation:

SABURRA, s[=a]-bur'ä, _n._ a foulness of the stomach.--_adj._
SABURR'AL.--_n._ SABURR[=A]'TION, sand-baking: the application of a hot
sand-bath.
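
The separator rule can be expressed as a regular expression with a lookbehind, so that "--" splits only when it directly follows a period and the period stays with the preceding definition. This is a minimal sketch with a hypothetical class name, and it assumes the entry's lines have already been joined and the pronunciation and derivation removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the definition-splitting rule.
class DefinitionSplitter {
    static List<String> split(String definitionsText) {
        List<String> definitions = new ArrayList<>();
        // (?<=\.) is a lookbehind: "--" is a separator only when a '.' precedes it.
        for (String part : definitionsText.split("(?<=\\.)--")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) definitions.add(trimmed);
        }
        return definitions;
    }

    public static void main(String[] args) {
        String text = "_n._ a foulness of the stomach.--_adj._ SABURR'AL.--_n._ "
            + "SABURR[=A]'TION, sand-baking: the application of a hot sand-bath.";
        for (String definition : split(text)) {
            System.out.println(definition);
        }
    }
}
```

Applied to the SABURRA entry above, this splits the text at both ".--" separators and leaves any "--" that does not follow a period inside its definition.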

Add the word, the definitions, and the derivation fields to the text index for the collection. Also create a separate index for the word.

Develop a reasonable set of tests that query the definitions to demonstrate that the database is properly constructed. This might involve querying for a list of words and comparing the results to expected values or definition counts. It might also involve querying for certain alternate forms of words contained in the definitions (e.g. "SABURR'AL" in the definition for 'SABURRA' shown above). Please come up with other tests of your own as well.

In the class GitHub repository 2018FACS5200SV/assignment-8, you will find a Java class NewWorldDictionaryReader whose main function parses dictionary entries and sets variables for the fields. In the interest of time, you are welcome to use this as a starting point, but you may also write your own if you wish. If you find errors in the code, please feel free to make fixes, and to share your fixes with others in the class via Piazza.