Notes

https://www.latestinterviewquestions.com/hadoop-multiple-choice-questions-answers
1,6,8,12,15,28,29,30,31,32,37,38,39,40,43,45,52,53,56,62,66,68

1. What is a SequenceFile?
A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.

B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.

C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.

D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

Answer: D

6. You need to import a portion of a relational database every day as files to HDFS, and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
A. Pig

B. Hue

C. Hive

D. Flume

E. Sqoop

F. Oozie

G. fuse-dfs

Answer: E

8. Workflows expressed in Oozie can contain:
A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.

B. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.

C. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.

D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.

Answer: D

12. Which of the following scenarios makes HDFS unavailable?
A. JobTracker failure

B. TaskTracker failure

C. DataNode failure

D. NameNode failure

E. Secondary NameNode failure

Answer: D

15. The Combine stage, if present, must perform the same aggregation operation as Reduce.
A. True

B. False

Answer: B
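
A minimal sketch of why the combiner does not have to repeat the reducer's operation, using a per-key mean. This is plain Python simulating the phases, not Hadoop API code; the function names and the two input splits are made up for illustration. The combiner only merges partial (sum, count) pairs, while the reducer also does the final division.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, (value, 1)) so partial sums and counts can be merged later.
    for key, value in records:
        yield key, (value, 1)

def combine(pairs):
    # Combiner: merge partial (sum, count) pairs per key. It does NOT divide.
    partial = defaultdict(lambda: (0, 0))
    for key, (s, c) in pairs:
        ps, pc = partial[key]
        partial[key] = (ps + s, pc + c)
    return list(partial.items())

def reduce_phase(pairs):
    # Reducer: merge whatever partials arrive, then compute the mean per key.
    totals = defaultdict(lambda: (0, 0))
    for key, (s, c) in pairs:
        ts, tc = totals[key]
        totals[key] = (ts + s, tc + c)
    return {key: s / c for key, (s, c) in totals.items()}

# Two hypothetical map tasks, each followed by its own combiner run.
split_1 = [("a", 1), ("a", 2), ("b", 10)]
split_2 = [("a", 6), ("b", 20)]
shuffled = combine(map_phase(split_1)) + combine(map_phase(split_2))
means = reduce_phase(shuffled)
```

Because the framework may run the combiner zero or more times, the reducer here accepts the same (sum, count) pairs whether or not they were combined first.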

28. What are sequence files and why are they important?
A. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance map-reduce jobs.

B. Sequence files are a type of file in the Hadoop framework that allow data to be sorted.

C. Sequence files are intermediate files that are created by Hadoop after the map step

D. Both B and C are correct

Answer: A

29. What are map files and why are they important?
A. Map files are stored on the namenode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware".

B. Map files are the files that show how the data is distributed in the Hadoop cluster.

C. Map files are generated by Map-Reduce after the reduce step. They show the task distribution during job execution

D. Map files are sorted sequence files that also have an index. The index allows fast data look up.

Answer: D
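
A rough sketch of the idea behind answer D: sorted entries plus a sparse index of every N-th key, so a lookup jumps close to the target and scans only a short run. This is plain Python, not the Hadoop MapFile API; `build_index`, `lookup`, and the `interval` value are assumptions for illustration.

```python
import bisect

def build_index(sorted_entries, interval=2):
    # Keep every `interval`-th key together with its position in the data.
    return [(sorted_entries[i][0], i)
            for i in range(0, len(sorted_entries), interval)]

def lookup(sorted_entries, index, key):
    index_keys = [k for k, _ in index]
    # Find the last indexed key that is <= the target key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # target sorts before the first entry
    start = index[slot][1]
    # Scan forward from the indexed position until we find or pass the key.
    for k, v in sorted_entries[start:]:
        if k == key:
            return v
        if k > key:
            break
    return None

data = [("apple", 1), ("banana", 2), ("cherry", 3), ("date", 4), ("fig", 5)]
index = build_index(data)
```

The index stays small because it holds only a fraction of the keys; the trade-off is a short sequential scan after each jump.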

30. How can you use binary data in MapReduce?
A. Binary data can be used directly by a map-reduce job. Often binary data is added to a sequence file.

B. Binary data cannot be used by the Hadoop framework. Binary data should be converted to a Hadoop-compatible format prior to loading.

C. Binary data can be used in map-reduce only with very limited functionality. It cannot be used as a key, for example.

D. Hadoop can freely use binary files with map-reduce jobs so long as the files have headers.

Answer: A

31. What is map-side join?
A. Map-side join is done in the map phase and done in memory.

B. Map-side join is a technique in which data is eliminated at the map step.

C. Map-side join is a form of map-reduce API which joins data from different locations.

D. None of these answers are correct.

Answer: A
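
A minimal sketch of a map-side join, assuming the smaller dataset fits in memory as a dictionary. This is plain Python, not Hadoop API code; `map_side_join` and the sample tables are hypothetical. Each record of the large input is joined as it is mapped, so no shuffle or reduce step is needed.

```python
def map_side_join(large_records, small_table):
    # small_table is held fully in memory; every mapper would need its own copy.
    for key, value in large_records:
        if key in small_table:
            # Inner join: emit only records whose key exists in the small table.
            yield key, value, small_table[key]

users = {1: "alice", 2: "bob"}  # small side, loaded into memory
orders = [(1, "book"), (2, "pen"), (3, "lamp"), (1, "ink")]  # large side
joined = list(map_side_join(orders, users))
```

The in-memory lookup is what makes this approach fast, and it is also why it fails with out-of-memory errors when the "small" side is not actually small.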

32. What is reduce-side join?
A. Reduce-side join is a technique to eliminate data from initial data set at reduce step

B. Reduce-side join is a technique for merging data from different sources based on a specific key.

C. Reduce-side join is a set of API to merge data from different sources.

D. None of these answers are correct

Answer: B
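
A sketch of a reduce-side join under simple assumptions: each mapper tags its records with the source they came from, the shuffle groups them by key, and the reducer pairs up records from the two sources. This is plain Python, not Hadoop API code; the function names, source tags, and sample data are invented for illustration.

```python
from collections import defaultdict

def tag(records, source):
    # Map step: attach the source name so the reducer can tell records apart.
    for key, value in records:
        yield key, (source, value)

def shuffle(tagged):
    # Stand-in for the shuffle: group all tagged values by join key.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    return groups

def reduce_join(groups):
    # Reduce step: for each key, pair every "users" value with every "orders" value.
    out = []
    for key in sorted(groups):
        left = [v for s, v in groups[key] if s == "users"]
        right = [v for s, v in groups[key] if s == "orders"]
        for l in left:
            for r in right:
                out.append((key, l, r))
    return out

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (3, "lamp"), (1, "ink")]
tagged = list(tag(users, "users")) + list(tag(orders, "orders"))
result = reduce_join(shuffle(tagged))
```

Neither input has to fit in memory as a whole, but every record crosses the shuffle, which is the main cost compared with a map-side join.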

37. What is the default input format?
A. The default input format is xml. Developer can specify other input formats as appropriate if xml is not the correct input.

B. There is no default input format. The input format always should be specified.

C. The default input format is a sequence file format. The data needs to be preprocessed before using the default input format.

D. The default input format is TextInputFormat with byte offset as a key and entire line as a value.

Answer: D
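
To make the "byte offset as key, line as value" idea concrete, here is a small plain-Python imitation of the records a mapper would see. It is not the Hadoop implementation; `text_input_records` and the sample bytes are assumptions for illustration.

```python
def text_input_records(data: bytes):
    # Yield (byte offset of the line start, line text without its terminator).
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n").decode("utf-8")
        offset += len(line)  # advance by the full line, terminator included

sample = b"hello world\nfoo\nbar\n"
records = list(text_input_records(sample))
```

The key is a position in the file rather than a line number, so it is unique within a file but rarely useful to the map logic itself.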

38. How can you overwrite the default input format?
A. In order to overwrite default input format, the Hadoop administrator has to change default settings in config file.

B. In order to overwrite default input format, a developer has to set new input format on job config before submitting the job to a cluster.

C. The default input format is controlled by each individual mapper and each line needs to be parsed individually.

D. None of these answers are correct.

Answer: B

39. What are the common problems with map-side join?
A. The most common problem with map-side joins is introducing a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to rarely use map-side joins.

B. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.

C. The most common problems with map-side joins are out-of-memory exceptions on slave nodes.

D. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.

Answer: C

40. Which is faster: Map-side join or Reduce-side join? Why?
A. Both techniques have about the same performance expectations.

B. Reduce-side join because join operation is done on HDFS.

C. Map-side join is faster because join operation is done in memory.

D. Reduce-side join because it is executed on the namenode, which will have a faster CPU and more memory.

Answer: C

43. Can you run Map-Reduce jobs directly on Avro data?
A. Yes, Avro was specifically designed for data processing via Map-Reduce

B. Yes, but additional extensive coding is required

C. No, Avro was specifically designed for data storage only

D. Avro specifies metadata that allows easier data access. This data cannot be used as part of mapreduce execution, rather input specification only.

Answer: A

45. What is the best performance one can expect from a Hadoop cluster?
A. The best performance expectation one can have is measured in seconds. This is because Hadoop can only be used for batch processing

B. The best performance expectation one can have is measured in milliseconds. This is because Hadoop executes in parallel across so many machines

C. The best performance expectation one can have is measured in minutes. This is because Hadoop can only be used for batch processing

D. It depends on the design of the map-reduce program, how many machines are in the cluster, and the amount of data being retrieved.

Answer: A

52. Which process describes the lifecycle of a Mapper?
A. The JobTracker calls the TaskTracker’s configure() method, then its map() method and finally its close() method.

B. The TaskTracker spawns a new Mapper to process all records in a single input split.

C. The TaskTracker spawns a new Mapper to process each key-value pair.

D. The JobTracker spawns a new Mapper to process all records in a single file.

Answer: B

53. Determine which best describes when the reduce method is first called in a MapReduce job?
A. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.

B. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.

C. Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.

D. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.

Answer: B
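
A small sketch of why reduce cannot be called before all map outputs have been copied and merged: values for one key can come from any mapper, so the full group is known only after the merge. This is plain Python, not Hadoop API code; `shuffle_and_reduce` and the sample map outputs are hypothetical, and each map output is assumed to be already sorted by key.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def shuffle_and_reduce(map_outputs, reducer):
    # Merge the per-mapper sorted runs into one stream sorted by key.
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    # Call the reducer once per key, with all values for that key.
    return [(key, reducer(key, [v for _, v in group]))
            for key, group in groupby(merged, key=itemgetter(0))]

map_outputs = [
    [("a", 1), ("b", 2)],  # sorted output of a first mapper
    [("a", 3), ("c", 4)],  # sorted output of a second mapper
]
totals = shuffle_and_reduce(map_outputs, lambda key, values: sum(values))
```

If the reducer ran on the first mapper's output alone, key "a" would be reduced with only part of its values.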

56. In a MapReduce job, the reducer receives all values associated with same key. Which statement best describes the ordering of these values?
A. The values are in sorted order.

B. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.

C. The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.

D. Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.

Answer: B

62. Given a directory of files with the following structure: line number, tab character, string:
Example:
abialkjfjkaoasdfjksdlkjhqweroij
kadf jhuwqounahagtnbvaswslmnbfgy
kjfteiomndscxeqalkzhtopedkfslkj
You want to send each line as one record to your Mapper. Which InputFormat would you use to complete the line: setInputFormat(________.class);
A. BDBInputFormat

B. KeyValueTextInputFormat

C. SequenceFileInputFormat

D. SequenceFileAsTextInputFormat

Answer: B
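
For lines shaped like "line number, tab, string", KeyValueTextInputFormat splits each line at the first tab into a key and a value. The sketch below imitates that behaviour in plain Python; it is not Hadoop code, and `key_value_records` and the sample lines are made up for illustration.

```python
def key_value_records(lines, separator="\t"):
    for line in lines:
        # Split at the first separator only; later tabs stay in the value.
        key, _, value = line.partition(separator)
        # With no separator, the whole line becomes the key and the value is empty.
        yield key, value

sample_lines = ["1\tabc", "2\tdef\tghi", "nosep"]
pairs = list(key_value_records(sample_lines))
```

Each line still arrives at the mapper as exactly one record, with the leading line number as the key and the rest of the line as the value.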

68. Workflows expressed in Oozie can contain:
A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.

B. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.

C. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.

D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.

Answer: D
