mapreduce - Hadoop: searching for words from one file in another file
I want to create a Hadoop application that reads words from one file and searches for them in another file.
If a word is present, it must be written to one output file; if it is not present, it must be written to a different output file.
I have tried a few examples and have two questions:
1. Both files are about 200 MB. Checking every word against the other file might run out of memory. Is there an alternative way of doing this?
2. How can the data be written to different files, given that the reduce phase of Hadoop writes to only one file? Is it possible to have a filter in the reduce phase that writes data to different output files?
Thank you.
How I would do it:
- In 'map', split the value into words and emit (<word>, <source>) (*1)
- In 'reduce' you will get: (<word>, <list of sources>)
- Check the source list (it may be long for both/all sources)
- If not all sources are in the list, emit (<missingsource>, <word>) for each missing source
- job2: job.setNumReduceTasks(<numberofsources>)
- job2: in 'map', emit (<missingsource>, <word>)
- job2: in 'reduce', for each <missingsource>, emit all (null, <word>)
You will end up with as many reduce outputs as there are different <missingsource> values, each containing the words missing from that document. To mark the files, you could write the <missingsource> once at the beginning of 'reduce'.
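To make the data flow of the steps above easier to follow, here is a rough sketch in plain Java that simulates both jobs in memory. It uses no Hadoop classes at all. The class name `MissingWordsSketch`, the method names, and the sample file names are made up for illustration, and splitting words on whitespace is an assumption. On a real cluster the grouping would be done by the shuffle, not by in-memory maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of the two-job plan; hypothetical names, no Hadoop API.
class MissingWordsSketch {

    // Job 1 'map': split every line into words and emit (word, source).
    // Grouping by word stands in for the shuffle that feeds 'reduce'.
    static Map<String, Set<String>> mapAndShuffle(Map<String, List<String>> linesBySource) {
        Map<String, Set<String>> sourcesByWord = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : linesBySource.entrySet()) {
            for (String line : entry.getValue()) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        sourcesByWord.computeIfAbsent(word, w -> new TreeSet<>())
                                     .add(entry.getKey());
                    }
                }
            }
        }
        return sourcesByWord;
    }

    // Job 1 'reduce' plus job 2: for each word, emit (missingSource, word) for
    // every source that does not contain it, then group by missingSource.
    static Map<String, List<String>> missingWordsBySource(
            Map<String, Set<String>> sourcesByWord, Set<String> allSources) {
        Map<String, List<String>> missing = new TreeMap<>();
        for (String source : allSources) {
            missing.put(source, new ArrayList<>());
        }
        for (Map.Entry<String, Set<String>> entry : sourcesByWord.entrySet()) {
            for (String source : allSources) {
                if (!entry.getValue().contains(source)) {
                    missing.get(source).add(entry.getKey());
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new TreeMap<>();
        input.put("a.txt", Arrays.asList("apple banana", "cherry"));
        input.put("b.txt", Arrays.asList("banana date"));
        // One entry per source, listing the words that source is missing.
        System.out.println(missingWordsBySource(mapAndShuffle(input), input.keySet()));
    }
}
```

Each entry of the resulting map corresponds to one reduce output of job2: the key is the <missingsource> and the list holds the words missing from that document.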
(*1) How to find out the source in 'map' (0.20):

    // Imports needed by this fragment
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    private String localname;
    private Text outkey = new Text();
    private Text outvalue = new Text();
    ...
    // Called once per input split: remember which file this mapper is reading
    public void setup(Context context) throws InterruptedException, IOException {
        super.setup(context);
        localname = ((FileSplit) context.getInputSplit()).getPath().toString();
    }

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        ...
        outkey.set(...);
        outvalue.set(localname);  // tag the emitted word with its source file
        context.write(outkey, outvalue);
    }