- A newly popular unit of data in the Big Data era is the petabyte (PB), which is 10^15 bytes.
- All of the following statements about MapReduce are true EXCEPT: MapReduce runs without fault tolerance.
- Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called? in-memory analytics
- Big Data is being driven by the exponential growth, availability, and use of information. True
- Big Data simplifies data governance issues, especially for global firms. False
- Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application. False
- Companies with the largest revenues from Big Data tend to be the largest computer and IT services firms.
- Current total storage capacity lags behind the digital information being generated in the world. True
- Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called? Variability
- Despite their potential, many current NoSQL tools lack mature management and monitoring tools. True
- For low latency, interactive reports, a data warehouse is preferable to Hadoop. True
- Hadoop and MapReduce require each other to work. False
- Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel. True
- How does Hadoop work? It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
- If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse. True
- In a Hadoop “stack,” what is a slave node? a node where data is stored and processed
- In a Hadoop “stack,” what node periodically replicates and stores data from the Name Node should it fail? secondary node
- In a network analysis, what connects nodes? Edges
- In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions. True
- In most cases, Hadoop is used to replace data warehouses. False
- In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for monitoring individual customer patterns.
- In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal? determine differences in rates of disease in urban and rural populations
- In the financial services industry, Big Data can be used to improve both A & B.
- In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service. True
- In the Salesforce case study, streaming data is used to identify services that customers use most. False
- In the Twitter case study, how did influential users support their tweets? objective data
- It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics. True
- MapReduce can be easily understood by skilled programmers due to its procedural nature. True
- Satellite data can be used to evaluate the activity at retail locations as a source of alternative data. True
- Social media mentions can be used to chart and predict flu outbreaks. True
- The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users. True
- The term “Big Data” is relative as it depends on the size of the using organization. True
- There is a clear difference between the type of information support provided by influential users versus the others on Twitter. True
- Traditional data warehouses have not been able to keep up with the variety and complexity of data.
- Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse? unrestricted, ungoverned sandbox explorations
- Using data to understand customers/clients and business operations to sustain and foster growth and profitability is an increasingly challenging task for today’s enterprises.
- What is Big Data’s relationship to the cloud? Amazon and Google have working Hadoop cloud offerings.
- What is the Hadoop Distributed File System (HDFS) designed to handle? unstructured and semistructured non-relational data
- Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources? grid computing
- Which of the following sources is likely to produce Big Data the fastest? RFID tags