1.
final keys
2.
final values
3.
intermediate keys
4.
intermediate values
Q 1 / 33
1.
UNION DISTINCT, RANK
2.
OVER, RANK
3.
OVER, EXCEPT
4.
UNION DISTINCT, RANK
Q 2 / 33
1.
Add a partitioned shuffle to the Map job.
2.
Add a partitioned shuffle to the Reduce job.
3.
Break the Reduce job into multiple, chained Reduce jobs.
4.
Break the Reduce job into multiple, chained Map jobs.
Q 3 / 33
1.
encrypted HTTP
2.
unsigned HTTP
3.
compressed HTTP
4.
signed HTTP
Q 4 / 33
1.
Java or Python
2.
SQL only
3.
SQL or Java
4.
Python or SQL
Q 5 / 33
1.
Reducer
2.
Combiner
3.
Mapper
4.
Counter
Q 6 / 33
1.
SUCCEEDED; syslog
2.
SUCCEEDED; stdout
3.
DONE; syslog
4.
DONE; stdout
Q 7 / 33
1.
public void reduce(Text key, Iterator<IntWritable> values, Context context){…}
2.
public static void reduce(Text key, IntWritable[] values, Context context){…}
3.
public static void reduce(Text key, Iterator<IntWritable> values, Context context){…}
4.
public void reduce(Text key, IntWritable[] values, Context context){…}
Q 8 / 33
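The signatures above differ on whether reduce() is static and on how the grouped values arrive. For orientation only: in the newer org.apache.hadoop.mapreduce API the framework passes the values as an Iterable, so a working Reducer looks roughly like the sketch below (class and field names are illustrative, not taken from the quiz).

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal word-count style Reducer: sums the counts emitted for each key.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();          // accumulate the partial counts for this key
        }
        result.set(sum);
        context.write(key, result);      // emit the final <word, total> pair
    }
}
```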
1.
FileInputFormatCounter
2.
FileSystemCounter
3.
JobCounter
4.
TaskCounter (NOT SURE)
Q 9 / 33
1.
A, P
2.
C, A
3.
C, P
4.
C, A, P
Q 10 / 33
1.
combine, map, and reduce
2.
shuffle, sort, and reduce
3.
reduce, sort, and combine
4.
map, sort, and combine
Q 11 / 33
1.
Oozie; open source
2.
Oozie; commercial software
3.
Zookeeper; commercial software
4.
Zookeeper; open source
Q 12 / 33
1.
data
2.
name
3.
memory
4.
worker
Q 13 / 33
1.
hot swappable
2.
cold swappable
3.
warm swappable
4.
non-swappable
Q 14 / 33
1.
on disk of all workers
2.
on disk of the master node
3.
in memory of the master node
4.
in memory of all workers
Q 15 / 33
1.
on the reducer nodes of the cluster
2.
on the data nodes of the cluster (NOT SURE)
3.
on the master node of the cluster
4.
on every node of the cluster
Q 16 / 33
1.
distributed cache
2.
local cache
3.
partitioned cache
4.
cluster cache
Q 17 / 33
1.
cache inputs
2.
reducer inputs
3.
intermediate values
4.
map inputs
Q 18 / 33
1.
spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --warehouse-dir user/hue/oozie/deployments/spark
2.
sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
3.
sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
4.
spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --password spark --warehouse-dir user/hue/oozie/deployments/spark
Q 19 / 33
1.
compressed (NOT SURE)
2.
sorted
3.
not sorted
4.
encrypted
Q 20 / 33
1.
JUnit
2.
XUnit
3.
MRUnit
4.
HadoopUnit
Q 21 / 33
1.
hadoop-user
2.
super-user
3.
node-user
4.
admin-user
Q 22 / 33
1.
can be configured to be shared
2.
is partially shared
3.
is shared
4.
is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)
Q 23 / 33
1.
a static job() method
2.
a Job class and instance (NOT SURE)
3.
a job() method
4.
a static Job class
Q 24 / 33
1.
S3A
2.
S3N
3.
S3
4.
the EMR S3
Q 25 / 33
1.
schema on write
2.
no schema
3.
external schema
4.
schema on read
Q 26 / 33
1.
read-write
2.
read-only
3.
write-only
4.
append-only
Q 27 / 33
1.
hdfs or top
2.
http
3.
hdfs or http
4.
hdfs
Q 28 / 33
1.
Hive
2.
Pig
3.
Impala
4.
Mahout
Q 29 / 33
The map function processes a key-value pair and emits a set of intermediate key-value pairs; the reduce function then processes the values grouped under each key and emits another set of key-value pairs as output (see the Mapper sketch following this question).
1.
a relational table
2.
an update to the input file
3.
a single, combined list
4.
a set of <key, value> pairs
Q 30 / 33
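To make the note above concrete, here is a minimal Mapper sketch that emits intermediate <word, 1> pairs for a word count; the grouped values are then summed by a Reducer such as the one sketched after Q 8 / 33. Class names are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal word-count Mapper: turns each input line into <word, 1> pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);    // emit one intermediate pair per token
        }
    }
}
```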
1.
Override the default Partitioner.
2.
Skip bad records.
3.
Break up Mappers that do more than one task into multiple Mappers.
4.
Combine Mappers that do one task into large Mappers.
Q 31 / 33
1.
files in object storage
2.
graph data in graph databases
3.
relational data in managed RDBMS systems
4.
JSON data in NoSQL databases
Q 32 / 33
1.
data mode
2.
safe mode
3.
single-user mode
4.
pseudo-distributed mode
Q 33 / 33
### Q34. In what format does RecordWriter write an output file?
1.
<key, value> pairs
2.
keys
3.
values
4.
<value, key> pairs
### Q35. To what does the Mapper map input key/value pairs?
1.
an average of keys for values
2.
a sum of keys for values
3.
a set of intermediate key/value pairs
4.
a set of final key/value pairs
### Q36. Which Hive query returns the first 1,000 values?
1.
SELECT…WHERE value = 1000
2.
SELECT … LIMIT 1000
3.
SELECT TOP 1000 …
4.
SELECT MAX 1000…
### Q37. To implement high availability, how many instances of the master node should you configure?
1.
one
2.
zero
3.
shared
4.
two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)
### Q38. Hadoop 2.x and later implement which service as the resource coordinator?
1.
kubernetes
2.
JobManager
3.
JobTracker
4.
YARN
### Q39. In MapReduce, _ have _.
1.
tasks; jobs
2.
jobs; activities
3.
jobs; tasks
4.
activities; tasks
### Q40. What type of software is Hadoop Common?
1.
database
2.
distributed computing framework
3.
operating system
4.
productivity tool
### Q41. If no reduction is desired, you should set the number of _ tasks to zero.
1.
combiner
2.
reduce
3.
mapper
4.
intermediate
### Q42. MapReduce applications use which of these classes to report their statistics?
1.
mapper
2.
reducer
3.
combiner
4.
counter
### Q43. _ is the query language, and _ is the storage for NoSQL on Hadoop.
1.
HDFS; HQL
2.
HQL; HBase
3.
HDFS; SQL
4.
SQL; HBase
### Q44. MapReduce 1.0 _ YARN.
1.
does not include
2.
is the same thing as
3.
includes
4.
replaces
### Q45. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?
1.
ControllerNode
2.
DataNode
3.
MetadataNode
4.
NameNode
### Q46. HQL queries produce which job types?
1.
Impala
2.
MapReduce
3.
Spark
4.
Pig
### Q47. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?
1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2
1.
as (text:CHAR[]); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
2.
as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
3.
as (text:CHAR[]); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
4.
as (text:CHARARRAY); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
### Q48. In a MapReduce job, which phase runs after the Map phase completes?
1.
Combiner
2.
Reducer
3.
Map2
4.
Shuffle and Sort
### Q49. Where would you configure the size of a block in a Hadoop environment?
1.
dfs.block.size in hdfs-site.xml
2.
orc.write.variable.length.blocks in hive-default.xml
3.
mapreduce.job.ubertask.maxbytes in mapred-site.xml
4.
hdfs.block.size in hdfs-site.xml
### Q50. Hadoop systems are _ RDBMS systems.
1.
replacements for
2.
not used with
3.
substitutes for
4.
additions for
### Q51. Which object can be used to distribute jars or libraries for use in MapReduce tasks?
1.
distributed cache
2.
library manager
3.
lookup store
4.
registry
### Q52. To view the execution details of an Impala query plan, which function would you use?
1.
explain
2.
query action
3.
detail
4.
query plan
### Q53. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?
1.
partitioning
2.
snapshot
3.
replication
4.
high availability
### Q54. Hadoop Common is written in which language?
1.
C++
2.
C
3.
Haskell
4.
Java
### Q55. Which file system does Hadoop use for storage?
1.
NAS
2.
FAT
3.
HDFS
4.
NFS
### Q56. What kind of storage and processing does Hadoop support?
1.
encrypted
2.
verified
3.
distributed
4.
remote
### Q57. Hadoop Common consists of which components?
1.
Spark and YARN
2.
HDFS and MapReduce
3.
HDFS and S3
4.
Spark and MapReduce
### Q58. Most Apache Hadoop committers' work is done at which commercial company?
1.
Cloudera
2.
Microsoft
3.
4.
Amazon
### Q59. To get information about Reducer job runs, which object should be added?
1.
Reporter
2.
IntReadable
3.
IntWritable
4.
Writer
### Q60. After changing the default block size and restarting the cluster, to which data does the new size apply?
1.
all data
2.
no data
3.
existing data
4.
new data
### Q61. Which statement should you add to improve the performance of the following query? SELECT c.id, c.name, c.email_preferences.categories.surveys FROM customers c;
1.
GROUP BY
2.
FILTER
3.
SUB-SELECT
4.
SORT
### Q62. What custom object should you implement to reduce IO in MapReduce?
1.
Comparator
2.
Mapper
3.
Combiner
4.
Reducer
### Q63. You can optimize Hive queries using which method?
1.
secondary indices
2.
summary statistics
3.
column-based statistics
4.
a primary key index
### Q64. If you are processing a single action on each input, what type of job should you create?
1.
partition-only
2.
map-only
3.
reduce-only
4.
combine-only
### Q65. The simplest possible MapReduce job optimization is to perform which of these actions?
1.
Add more master nodes.
2.
Implement optimized InputSplits.
3.
Add more DataNodes.
4.
Implement a custom Mapper.
### Q66. When you implement a custom Writable, you must also define which of these objects?
1.
a sort policy
2.
a combiner policy
3.
a compression policy
4.
a filter policy