INF503 QUIZ

  1. Flume is a tool for converting data back and forth between a relational database and the HDFS.  False
  2. ______ minimizes the number of disk reads necessary to retrieve a row of data.  Row-centric storage
  3. A(n) ______ is a process or set of operations in a calculation.  Algorithm
  4. Scaling out is keeping the same number of systems, but migrating each system to a larger one.  False
  5. When using MapReduce, a _______ function takes a collection of data and sorts and filters it into a set of key-value pairs.  Map
  6. ______ focuses on filtering data as it enters the system to determine which data to keep and which to discard.  Stream processing
  7. ______ is a tool for converting data back and forth between a relational database and the HDFS.  Sqoop
  8. ______ is keeping the same number of systems, but migrating each system to a larger system.  Scaling up
  9. ______ is NOT one of the “3 Vs” of Big Data.   Validation
  10. ______ processing occurs when a program runs from beginning to end without any user interaction.  Batch
  11. ______ was the first SQL on Hadoop application.   Impala
  12. A ______ is a programmed function within an object used to manipulate the data in that same object.  Method
  13. A block report is used to let the name node know that the data node is still available.  False
  14. A column family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component.   True
  15. A query in a graph database is called a ______.  Traversal
  16. A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result.  True
  17. A(n) ______ is a tag that is used to associate a collection of nodes as being of the same type or belonging to the same group.  Label
  18. Big Data ______.   captures data in whatever format it naturally exists
  19. Big Data processing imposes a structure on the data as needed for applications as a part of retrieval and processing.  True
  20. By default, Hadoop uses a replication factor of ______.  Three
  21. Characteristics that are important in working with data in the relational database model also apply to Big Data.   True
  22. Data collected or aggregated around a central topic or entity is said to be ______ aware.   Aggregate
  23. Document databases group documents into logical groups called ______.  Collections
  24. For a data set to be considered Big Data, it must display only one of the 3 Vs (volume, velocity and variety).   False
  25. Graph theory is a mathematical and computer science field that models relationships, or edges, between objects called ______.   Nodes
  26. Hadoop is a database that has become the de facto standard for most Big Data storage and processing.  False
  27. Hive is a good choice for jobs that require a small subset of data to be returned very quickly.  False
  28. In many ways, the issues associated with volume and velocity are the same.  True
  29. In MongoDB, the ______ method retrieves objects from a collection that match the restrictions provided.  find()
  30. In MongoDB, the ______ method is used to improve the readability of retrieved documents through the use of line breaks and indentation.  pretty()
  31. In the context of Big Data, ______ refers to the trustworthiness of a set of data.  Veracity
  32. In the context of Big Data, ______ relates to changes in meaning.  Variability
  33. Interest in graph databases can be tied to the area of social networks.  True
  34. Key-value and document databases are structurally similar.  True
  35. Lack of specificity is what leads to ambiguity in defining Big Data.  True
  36. Modeling and storing data about relationships is the focus of ______ databases.   Graph
  37. Most NoSQL products run only in a Linux or Unix environment.  True
  38. Neo4j is a ______ database.   Graph
  39. Relational databases rely on unstructured data.  False
  40. The ability to graphically present data in a way that makes it understandable is the concept of value.  False
  41. The analysis of data to produce actionable results is feedback loop processing.  True
  42. The name, MongoDB, comes from the word humongous as its developers intended their new product to support extremely large data sets.  True
  43. To query the value component of the pair when using a key-value database, use get or ______.  Fetch
  44. Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and ______.   Pig
  45. Under the HDFS system, using a write-once, read-many model simplifies concurrency issues.  True
  46. When using HDFS, a heartbeat is sent every ______ to notify the name node that the data node is still available.  3 seconds
  47. When using HDFS, the ______ node creates new files by communicating with the ______ node.  client; name
  48. When using MapReduce, best practices suggest that the number of mappers on a given node should be ______.  100 or less
  49. Which of the following is NOT a key assumption of the Hadoop Distributed File System?  Write many, read-once
  50. Which of the following is NOT one of the standard NoSQL categories?  Chart databases
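
Question 2 turns on how storage layout affects disk reads. The sketch below is a toy model, not how any particular DBMS lays out pages: it assumes each disk block holds exactly one whole row in row-centric storage and exactly one whole column in column-centric storage, and it simply counts block reads. The table, its values, and all function names are hypothetical.

```python
"""Toy model of row-centric vs column-centric storage.

Assumption (illustrative only): one row per block in the row store,
one column per block in the column store.
"""

# Hypothetical customer table: (id, name, city, balance)
ROWS = [
    (1, "Alice", "Phoenix", 120.0),
    (2, "Bob", "Tucson", 80.5),
    (3, "Carol", "Flagstaff", 42.0),
]

# Row-centric layout: block i holds row i.
ROW_STORE = [list(row) for row in ROWS]
# Column-centric layout: block j holds every value of column j.
COL_STORE = [[row[j] for row in ROWS] for j in range(len(ROWS[0]))]


def fetch_row_row_store(i):
    """Retrieve row i from the row store; returns (row, blocks_read)."""
    return tuple(ROW_STORE[i]), 1  # the whole row sits in a single block


def fetch_row_col_store(i):
    """Retrieve row i from the column store; one block read per column."""
    values = [block[i] for block in COL_STORE]
    return tuple(values), len(COL_STORE)


def sum_column_row_store(j):
    """Aggregate column j from the row store; every row block must be read."""
    return sum(block[j] for block in ROW_STORE), len(ROW_STORE)


def sum_column_col_store(j):
    """Aggregate column j from the column store; only one block is read."""
    return sum(COL_STORE[j]), 1


if __name__ == "__main__":
    print(fetch_row_row_store(1))   # ((2, 'Bob', 'Tucson', 80.5), 1)
    print(fetch_row_col_store(1))   # ((2, 'Bob', 'Tucson', 80.5), 4)
    print(sum_column_row_store(3))  # (242.5, 3)
    print(sum_column_col_store(3))  # (242.5, 1)
```

Reading one row costs one block in the row layout but one block per column in the column layout, while a single-column aggregate reverses the trade-off. That reversal is why row-centric storage suits retrieving whole records and column-centric storage suits analytic scans.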
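
Questions 5 and 16 describe the two halves of a MapReduce job. The sketch below simulates them in a single Python process on a word-count task; it is not Hadoop's Java API, and the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical. The grouping step between map and reduce stands in for the shuffle-and-sort phase that the framework performs across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map: normalize the raw text and emit one (word, 1) pair per word."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort stand-in: group all values that share the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: summarize every value for one key into a single result."""
    return (key, sum(values))


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map over every document, group by key, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data lakes and big clusters"]
    print(word_count(docs))
    # {'and': 1, 'big': 3, 'clusters': 1, 'data': 2, 'ideas': 1, 'lakes': 1}
```

Because each reduce call sees only the values for one key, reducers for different keys can run independently, which is what lets a real cluster spread that phase across machines.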
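
Questions 15, 17, and 25 cover the graph vocabulary: nodes, the edges between them, labels that group nodes by type, and traversals as queries. The sketch below is a plain Python breadth-first search over a tiny hypothetical social graph; it is not Neo4j's Cypher language or driver API, and the graph data and the traverse function are illustrative assumptions.

```python
from collections import deque

# Hypothetical graph: each node carries a label naming its type/group.
LABELS = {
    "alice": "Person",
    "bob": "Person",
    "carol": "Person",
    "acme": "Company",
}

# Undirected edges (relationships) stored as adjacency lists.
EDGES = {
    "alice": ["bob", "acme"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "acme": ["alice"],
}


def traverse(start, label, max_hops):
    """Breadth-first traversal from `start`, returning every other node that
    carries `label` and is reachable within `max_hops` edges, in visit order."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if node != start and LABELS[node] == label:
            found.append(node)
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in EDGES[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


if __name__ == "__main__":
    print(traverse("alice", "Person", 1))   # ['bob']
    print(traverse("alice", "Person", 2))   # ['bob', 'carol']
    print(traverse("carol", "Company", 3))  # ['acme']
```

A friends-of-friends lookup in a social network (question 33) is this same pattern with the label set to Person and max_hops set to 2.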
