Hadoop

Partitioner controls the partitioning of what data?

1.

final keys

2.

final values

3.

intermediate keys

4.

intermediate values

Q 1 / 33

Hadoop

SQL Windowing functions are implemented in Hive using which keywords?

1.

UNION DISTINCT, RANK

2.

OVER, RANK

3.

OVER, EXCEPT

4.

UNION DISTINCT, EXCEPT

Q 2 / 33
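
As context for the OVER and RANK keywords above, here is a minimal sketch of a Hive windowing query issued through the HiveServer2 JDBC driver. The endpoint, table, and column names (`localhost:10000`, `employees`, `dept`, `salary`) are illustrative assumptions, not part of the quiz.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveRankExample {
    public static void main(String[] args) throws Exception {
        // org.apache.hive.jdbc.HiveDriver is the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // RANK() is evaluated over the window defined by the OVER clause:
            // here, per-department salary ranking. Table/columns are hypothetical.
            ResultSet rs = stmt.executeQuery(
                "SELECT name, dept, salary, "
                + "RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk "
                + "FROM employees");
            while (rs.next()) {
                System.out.printf("%s\t%s\t%d\t%d%n",
                    rs.getString(1), rs.getString(2), rs.getLong(3), rs.getLong(4));
            }
        }
    }
}
```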

Hadoop

Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?

1.

Add a partitioned shuffle to the Map job.

2.

Add a partitioned shuffle to the Reduce job.

3.

Break the Reduce job into multiple, chained Reduce jobs.

4.

Break the Reduce job into multiple, chained Map jobs.

Q 3 / 33
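
To make the chained-jobs idea concrete, below is a minimal driver sketch in which the first job's output directory feeds the second job; this is the common way multi-stage MapReduce pipelines are wired. The `/in`, `/tmp/stage1`, and `/out` paths are hypothetical, and mapper/reducer classes are omitted (the identity classes run by default).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Stage 1: a real pipeline would set its own Mapper/Reducer classes here.
        Job first = Job.getInstance(conf, "stage-1");
        first.setJarByClass(TwoStageDriver.class);
        FileInputFormat.addInputPath(first, new Path("/in"));            // hypothetical input
        FileOutputFormat.setOutputPath(first, new Path("/tmp/stage1"));  // intermediate output
        if (!first.waitForCompletion(true)) System.exit(1);

        // Stage 2: consumes stage 1's output directory as its input.
        Job second = Job.getInstance(conf, "stage-2");
        second.setJarByClass(TwoStageDriver.class);
        FileInputFormat.addInputPath(second, new Path("/tmp/stage1"));
        FileOutputFormat.setOutputPath(second, new Path("/out"));        // hypothetical output
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```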

Hadoop

Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authentication cookie?

1.

encrypted HTTP

2.

unsigned HTTP

3.

compressed HTTP

4.

signed HTTP

Q 4 / 33

Hadoop

MapReduce jobs can be written in which language?

1.

Java or Python

2.

SQL only

3.

SQL or Java

4.

Python or SQL

Q 5 / 33

Hadoop

To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?

1.

Reducer

2.

Combiner

3.

Mapper

4.

Counter

Q 6 / 33
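
To make the Combiner concrete: a minimal word-count-style sketch. Reusing the Reducer as the Combiner is safe here only because summation is associative and commutative.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerDemo {
    // Sums the counts for a key; usable as a Combiner because the
    // operation is associative and commutative.
    public static class Sum extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void configure(Job job) {
        job.setCombinerClass(Sum.class); // optional: local aggregation of intermediate map output
        job.setReducerClass(Sum.class);  // final aggregation on the reducers
    }
}
```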

Hadoop

To verify job status, look for the value `___` in the `___`.

1.

SUCCEEDED; syslog

2.

SUCCEEDED; stdout

3.

DONE; syslog

4.

DONE; stdout

Q 7 / 33

Hadoop

Which line of code implements a Reducer method in MapReduce 2.0?

1.

public void reduce(Text key, Iterator<IntWritable> values, Context context){…}

2.

public static void reduce(Text key, IntWritable[] values, Context context){…}

3.

public static void reduce(Text key, Iterator<IntWritable> values, Context context){…}

4.

public void reduce(Text key, IntWritable[] values, Context context){…}

Q 8 / 33
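
For reference, here is a minimal, compilable reducer in the MapReduce 2.0 (`org.apache.hadoop.mapreduce`) API. Note that in the shipped API the values parameter is an `Iterable` rather than an `Iterator`, and the method is an instance method, not static, because the framework instantiates the Reducer.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum)); // emits one <key, value> pair
    }
}
```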

Hadoop

To get the total number of mapped input records in a map job task, you should review the value of which counter?

1.

FileInputFormatCounter

2.

FileSystemCounter

3.

JobCounter

4.

TaskCounter (NOT SURE)

Q 9 / 33
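
A short sketch of reading that counter after a job completes, using the built-in `TaskCounter` enum:

```java
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterCheck {
    // After a job finishes, its built-in task counters are available from the
    // Job object; MAP_INPUT_RECORDS totals the records read by all map tasks.
    public static long mapInputRecords(Job job) throws Exception {
        Counter c = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS);
        return c.getValue();
    }
}
```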

Hadoop

Hadoop Core supports which CAP capabilities?

1.

A, P

2.

C, A

3.

C, P

4.

C, A, P

Q 10 / 33

Hadoop

What are the primary phases of a Reducer?

1.

combine, map, and reduce

2.

shuffle, sort, and reduce

3.

reduce, sort, and combine

4.

map, sort, and combine

Q 11 / 33

Hadoop

To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the `___` service, which is `___`.

1.

Oozie; open source

2.

Oozie; commercial software

3.

Zookeeper; commercial software

4.

Zookeeper; open source

Q 12 / 33

Hadoop

For high availability, use multiple nodes of which type?

1.

data

2.

name

3.

memory

4.

worker

Q 13 / 33

Hadoop

DataNode supports which type of drives?

1.

hot swappable

2.

cold swappable

3.

warm swappable

4.

non-swappable

Q 14 / 33

Hadoop

Where are Spark jobs processed?

1.

on disk of all workers

2.

on disk of the master node

3.

in memory of the master node

4.

in memory of all workers

Q 15 / 33

Hadoop

In a MapReduce job, where does the map() function run?

1.

on the reducer nodes of the cluster

2.

on the data nodes of the cluster (NOT SURE)

3.

on the master node of the cluster

4.

on every node of the cluster

Q 16 / 33

Hadoop

To reference a master file for lookups during Mapping, what type of cache should be used?

1.

distributed cache

2.

local cache

3.

partitioned cache

4.

cluster cache

Q 17 / 33
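
A minimal sketch of distributed-cache lookups, assuming a hypothetical `/user/hue/master.csv` key-value file; the driver ships the file to every node, and each map task loads it locally in `setup()` before `map()` runs.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> lookup = new HashMap<>();

    // Driver side: ship the master file via the distributed cache.
    // The '#master' fragment names the local symlink in the task directory.
    public static void attach(Job job) throws Exception {
        job.addCacheFile(new URI("/user/hue/master.csv#master")); // hypothetical path
    }

    // Task side: the cached file is available locally before map() runs.
    @Override
    protected void setup(Context context) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("master"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) lookup.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String enriched = lookup.getOrDefault(value.toString(), "UNKNOWN");
        context.write(value, new Text(enriched));
    }
}
```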

Hadoop

The skip-bad-records facility provides an option to skip a certain set of bad input records when processing what type of data?

1.

cache inputs

2.

reducer inputs

3.

intermediate values

4.

map inputs

Q 18 / 33
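
A sketch of enabling record skipping through the `org.apache.hadoop.mapred.SkipBadRecords` helper; the thresholds shown (100 records, 2 attempts) are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipSetup {
    // After a task fails repeatedly on the same input range, the framework
    // enters skip mode, narrows down the offending records, and skips them
    // (up to the configured maximum).
    public static void enable(Configuration conf) {
        SkipBadRecords.setMapperMaxSkipRecords(conf, 100); // tolerate up to 100 skipped map records
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2); // enter skip mode after 2 failed attempts
    }
}
```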

Hadoop

Which command imports data to Hadoop from a MySQL database?

1.

spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --warehouse-dir user/hue/oozie/deployments/spark

2.

sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --warehouse-dir user/hue/oozie/deployments/sqoop

3.

sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop

4.

spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --password spark --warehouse-dir user/hue/oozie/deployments/spark

Q 19 / 33

Hadoop

In what form is Reducer output presented?

1.

compressed (NOT SURE)

2.

sorted

3.

not sorted

4.

encrypted

Q 20 / 33

Hadoop

Which library should be used to unit test MapReduce code?

1.

JUnit

2.

XUnit

3.

MRUnit

4.

HadoopUnit

Q 21 / 33
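
A minimal MRUnit sketch for a hypothetical `WordCountMapper`: `MapDriver` runs the Mapper in isolation with one input record and a declared expected output, and `runTest()` asserts that they match.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
    @Test
    public void emitsWordWithCountOne() throws Exception {
        MapDriver.newMapDriver(new WordCountMapper()) // hypothetical Mapper under test
                .withInput(new LongWritable(0L), new Text("hadoop"))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .runTest(); // fails the test if actual output differs
    }
}
```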

Hadoop

If you started the NameNode, then which kind of user must you be?

1.

hadoop-user

2.

super-user

3.

node-user

4.

admin-user

Q 22 / 33

Hadoop

State `___` between the JVMs in a MapReduce job.

1.

can be configured to be shared

2.

is partially shared

3.

is shared

4.

is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)

Q 23 / 33

Hadoop

To create a MapReduce job, what should be coded first?

1.

a static job() method

2.

a Job class and instance (NOT SURE)

3.

a job() method

4.

a static Job class

Q 24 / 33
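
A minimal driver sketch showing the Job instance being created first and configured afterwards; `WordCountMapper` and `WordCountReducer` are hypothetical placeholder classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // The Job instance describing the whole MapReduce job comes first;
        // everything else is configured on it.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);   // hypothetical mapper
        job.setReducerClass(WordCountReducer.class); // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```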

Hadoop

To connect Hadoop to AWS S3, which client should you use?

1.

S3A

2.

S3N

3.

S3

4.

the EMR S3

Q 25 / 33
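
A sketch of reaching S3 through the S3A client, assuming the `hadoop-aws` module is on the classpath; the credentials and bucket name are placeholders, and in practice IAM roles or credential providers are preferable to hard-coded keys.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials for the S3A connector (placeholders).
        conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");
        // s3a:// URLs route through the S3A client.
        FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
        for (FileStatus s : fs.listStatus(new Path("s3a://my-bucket/"))) {
            System.out.println(s.getPath());
        }
    }
}
```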

Hadoop

HBase works with which type of schema enforcement?

1.

schema on write

2.

no schema

3.

external schema

4.

schema on read

Q 26 / 33

Hadoop

HDFS files are of what type?

1.

read-write

2.

read-only

3.

write-only

4.

append-only

Q 27 / 33

Hadoop

A distributed cache file path can originate from what location?

1.

hdfs or ftp

2.

http

3.

hdfs or http

4.

hdfs

Q 28 / 33

Hadoop

Which library should you use to perform ETL-type MapReduce jobs?

1.

Hive

2.

Pig

3.

Impala

4.

Mahout

Q 29 / 33

Hadoop

What is the output of the Reducer?

`The map function processes a certain key/value pair and emits a certain number of key/value pairs; the reduce function processes values grouped by the same key and emits another set of key/value pairs as output.`

1.

a relational table

2.

an update to the input file

3.

a single, combined list

4.

a set of <key, value> pairs

Q 30 / 33

Hadoop

To optimize a Mapper, what should you perform first?

1.

Override the default Partitioner.

2.

Skip bad records.

3.

Break up Mappers that do more than one task into multiple Mappers.

4.

Combine Mappers that do one task into large Mappers.

Q 31 / 33

Hadoop

When implemented on a public cloud, with what does Hadoop processing interact?

1.

files in object storage

2.

graph data in graph databases

3.

relational data in managed RDBMS systems

4.

JSON data in NoSQL databases

Q 32 / 33

Hadoop

In the Hadoop system, what administrative mode is used for maintenance?

1.

data mode

2.

safe mode

3.

single-user mode

4.

pseudo-distributed mode

Q 33 / 33

### Q34. In what format does RecordWriter write an output file?

1. <key, value> pairs
2. keys
3. values
4. <value, key> pairs

### Q35. To what does the Mapper map input key/value pairs?

1. an average of keys for values
2. a sum of keys for values
3. a set of intermediate key/value pairs
4. a set of final key/value pairs

### Q36. Which Hive query returns the first 1,000 values?

1. SELECT…WHERE value = 1000
2. SELECT … LIMIT 1000
3. SELECT TOP 1000 …
4. SELECT MAX 1000…

### Q37. To implement high availability, how many instances of the master node should you configure?

1. one
2. zero
3. shared
4. two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)

### Q38. Hadoop 2.x and later implement which service as the resource coordinator?

1. kubernetes
2. JobManager
3. JobTracker
4. YARN

### Q39. In MapReduce, `___` have `___`.

1. tasks; jobs
2. jobs; activities
3. jobs; tasks
4. activities; tasks

### Q40. What type of software is Hadoop Common?

1. database
2. distributed computing framework
3. operating system
4. productivity tool

### Q41. If no reduction is desired, you should set the number of `___` tasks to zero.

1. combiner
2. reduce
3. mapper
4. intermediate

### Q42. MapReduce applications use which of these classes to report their statistics?

1. mapper
2. reducer
3. combiner
4. counter

### Q43. `___` is the query language, and `___` is the storage, for NoSQL on Hadoop.

1. HDFS; HQL
2. HQL; HBase
3. HDFS; SQL
4. SQL; HBase

### Q44. MapReduce 1.0 `___` YARN.

1. does not include
2. is the same thing as
3. includes
4. replaces

### Q45. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?

1. ControllerNode
2. DataNode
3. MetadataNode
4. NameNode

### Q46. HQL queries produce which job types?

1. Impala
2. MapReduce
3. Spark
4. Pig

### Q47. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?

    1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
    2

1. as (text:CHAR[]); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
2. as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
3. as (text:CHAR[]); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
4. as (text:CHARARRAY); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);

### Q48. In a MapReduce job, which phase runs after the Map phase completes?

1. Combiner
2. Reducer
3. Map2
4. Shuffle and Sort

### Q49. Where would you configure the size of a block in a Hadoop environment?

1. dfs.block.size in hdfs-site.xml
2. orc.write.variable.length.blocks in hive-default.xml
3. mapreduce.job.ubertask.maxbytes in mapred-site.xml
4. hdfs.block.size in hdfs-site.xml

### Q50. Hadoop systems are `___` RDBMS systems.

1. replacements for
2. not used with
3. substitutes for
4. additions for

### Q51. Which object can be used to distribute jars or libraries for use in MapReduce tasks?

1. distributed cache
2. library manager
3. lookup store
4. registry

### Q52. To view the execution details of an Impala query plan, which function would you use?

1. explain
2. query action
3. detail
4. query plan

### Q53. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?

1. partitioning
2. snapshot
3. replication
4. high availability

### Q54. Hadoop Common is written in which language?

1. C++
2. C
3. Haskell
4. Java

### Q55. Which file system does Hadoop use for storage?

1. NAS
2. FAT
3. HDFS
4. NFS

### Q56. What kind of storage and processing does Hadoop support?

1. encrypted
2. verified
3. distributed
4. remote

### Q57. Hadoop Common consists of which components?

1. Spark and YARN
2. HDFS and MapReduce
3. HDFS and S3
4. Spark and MapReduce

### Q58. Most Apache Hadoop committers' work is done at which commercial company?

1. Cloudera
2. Microsoft
3. Google
4. Amazon

### Q59. To get information about Reducer job runs, which object should be added?

1. Reporter
2. IntReadable
3. IntWritable
4. Writer

### Q60. After changing the default block size and restarting the cluster, to which data does the new size apply?

1. all data
2. no data
3. existing data
4. new data

### Q61. Which statement should you add to improve the performance of the following query?

    SELECT c.id, c.name, c.email_preferences.categories.surveys FROM customers c;

1. GROUP BY
2. FILTER
3. SUB-SELECT
4. SORT

### Q62. What custom object should you implement to reduce IO in MapReduce?

1. Comparator
2. Mapper
3. Combiner
4. Reducer

### Q63. You can optimize Hive queries using which method?

1. secondary indices
2. summary statistics
3. column-based statistics
4. a primary key index

### Q64. If you are processing a single action on each input, what type of job should you create?

1. partition-only
2. map-only
3. reduce-only
4. combine-only

### Q65. The simplest possible MapReduce job optimization is to perform which of these actions?

1. Add more master nodes.
2. Implement optimized InputSplits.
3. Add more DataNodes.
4. Implement a custom Mapper.

### Q66. When you implement a custom Writable, you must also define which of these objects?

1. a sort policy
2. a combiner policy
3. a compression policy
4. a filter policy
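
On the custom Writable question (Q66): below is a minimal sketch of a custom key type with illustrative field names. A plain Writable only needs `write()`/`readFields()`, but when the type is used as a key, the framework must be able to sort it between the map and reduce phases, which is why a sort policy (here via `WritableComparable.compareTo`) must also be defined.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class PairKey implements WritableComparable<PairKey> {
    private long id;
    private String name = "";

    public PairKey() {} // no-arg constructor required for framework instantiation

    @Override
    public void write(DataOutput out) throws IOException { // serialization
        out.writeLong(id);
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization
        id = in.readLong();
        name = in.readUTF();
    }

    @Override
    public int compareTo(PairKey other) { // the sort policy for this key
        int byId = Long.compare(id, other.id);
        return byId != 0 ? byId : name.compareTo(other.name);
    }
}
```

Since the default HashPartitioner routes keys by `hashCode()`, custom keys should generally also override `hashCode()` and `equals()` consistently with `compareTo()`.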