- Flume is a tool for converting data back and forth between a relational database and the HDFS. False
- ______ minimizes the number of disk reads necessary to retrieve a row of data. Row-centric storage
- A(n) ______ is a process or set of operations in a calculation. Algorithm
- Scaling out is keeping the same number of systems, but migrating each system to a larger one. False
- When using MapReduce, a ______ function takes a collection of data and sorts and filters it into a set of key-value pairs. Map
- ______ focuses on filtering data as it enters the system to determine which data to keep and which to discard. Stream processing
- ______ is a tool for converting data back and forth between a relational database and the HDFS. Sqoop
- ______ is keeping the same number of systems, but migrating each system to a larger system. Scaling up
- ______ is NOT one of the “3 Vs” of Big Data. Validation
- ______ processing occurs when a program runs from beginning to end without any user interaction. Batch
- ______ was the first SQL on Hadoop application. Impala
- A ______ is a programmed function within an object used to manipulate the data in that same object. Method
- A block report is used to let the name node know that the data node is still available. False
- A column family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component. True
- A query in a graph database is called a ______. Traversal
- A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result. True
- A(n) ______ is a tag that is used to associate a collection of nodes as being of the same type or belonging to the same group. Label
- Big Data ______. captures data in whatever format it naturally exists
- Big Data processing imposes a structure on the data as needed for applications as a part of retrieval and processing. True
- By default, Hadoop uses a replication factor of ______. Three
- Characteristics that are important in working with data in the relational database model also apply to Big Data. True
- Data collected or aggregated around a central topic or entity is said to be ______ aware. Aggregate
- Document databases group documents into logical groups called ______. Collections
- For a data set to be considered Big Data, it must display only one of the 3 Vs (volume, velocity and variety). False
- Graph theory is a mathematical and computer science field that models relationships, or edges, between objects called ______. Nodes
- Hadoop is a database that has become the de facto standard for most Big Data storage and processing. False (Hadoop is a framework for distributed storage and processing, not a database)
- Hive is a good choice for jobs that require a small subset of data to be returned very quickly. False
- In many ways, the issues associated with volume and velocity are the same. True
- In MongoDB, the ______ method retrieves objects from a collection that match the restrictions provided. find()
- In MongoDB, the ______ method is used to improve the readability of retrieved documents through the use of line breaks and indentation. pretty()
- In the context of Big Data, ______ refers to the trustworthiness of a set of data. Veracity
- In the context of Big Data, ______ relates to changes in meaning. Variability
- Interest in graph databases can be tied to the area of social networks. True
- Key-value and document databases are structurally similar. True
- Lack of specificity is what leads to ambiguity in defining Big Data. True
- Modeling and storing data about relationships is the focus of ______ databases. Graph
- Most NoSQL products run only in a Linux or Unix environment. True
- Neo4j is a ______ database. Graph
- Relational databases rely on unstructured data. False
- The ability to graphically present data in a way that makes it understandable is the concept of value. False
- The analysis of data to produce actionable results is feedback loop processing. True
- The name, MongoDB, comes from the word humongous as its developers intended their new product to support extremely large data sets. True
- To query the value component of the pair when using a key-value database, use get or ______. Fetch
- Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and ______. Pig
- Under the HDFS system, using a write-once, read-many model simplifies concurrency issues. True
- When using a HDFS, a heartbeat is sent every ______ to notify the name node that the data node is still available. 3 seconds
- When using a HDFS, the ______ node creates new files by communicating with the ______ node. client; name
- When using MapReduce, best practices suggest that the number of mappers on a given node should be ______. 100 or less
- Which of the following is NOT a key assumption of the Hadoop Distributed File System? Write many, read-once
- Which of the following is NOT one of the standard NoSQL categories? Chart databases
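
The map and reduce items above can be illustrated with a small word count. This is a minimal single-machine sketch in plain Python, not Hadoop code: the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are made up for illustration, and the `shuffle` step stands in for the grouping that a MapReduce framework would do between the two phases.

```python
from collections import defaultdict


def map_words(line):
    """Map step: turn one line of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key so all pairs with the same key are reduced together.

    Assumption: a real framework does this grouping across nodes; here it is
    a plain in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: summarize all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data node"]))
```

The map function only emits key-value pairs, and the reduce function only sees values that share one key, which matches the two definitions in the quiz.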