Rails design doubt: Should/could I load the whole dictionary/table into memory?
I'm a newbie working on a simple Rails application that translates a document (a long string) from one language to another. The dictionary is a table of terms (a string regexp to find and replace, and a block that outputs the replacement string). The table is 1 million records long.
Each request is a document that needs to be translated. In a first brute-force approach, I need to run the whole dictionary against every request/document.
Since the dictionary will always be run in full (from the first record to the last), instead of loading the table of records with each document, I think it would be best to keep the whole dictionary in memory as an array.
I know this is not the most efficient approach, but the dictionary has to be run in full at this point.
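To make the brute-force idea concrete, here is a minimal sketch, assuming the dictionary is already in memory as an array of [regexp, block] pairs. The two entries and the `translate` name are hypothetical and only illustrate the shape of the data; the real table would hold about a million pairs.

```ruby
# Hypothetical in-memory dictionary: [pattern, replacement block] pairs.
# A real one would be built once from the table, not written by hand.
DICTIONARY = [
  [/\bhello\b/i, ->(match) { "hola" }],
  [/\bworld\b/i, ->(match) { "mundo" }],
].freeze

# Runs every dictionary entry, first to last, against the whole document.
def translate(document, dictionary = DICTIONARY)
  dictionary.reduce(document) do |text, (pattern, block)|
    text.gsub(pattern) { |match| block.call(match) }
  end
end

puts translate("Hello, world!") # prints "hola, mundo!"
```

In a Rails app, one place to build such a constant once per process would be a file under config/initializers, keeping in mind that every worker process then holds its own copy of the array.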
1.- If no efficiency can be gained by restructuring the document and the dictionary (meaning it is not possible to create smaller subsets of the dictionary), what is the best design approach?
2.- Do you know of similar projects that I can learn from?
3.- Where should I look to learn how to load such a large table into memory (cache?) at Rails startup?
An answer to any of these questions will be greatly appreciated. Thank you very much!
I don't think your web host will be happy with a solution like this. This script

    dict = {}
    (0..1000_000).each do |num|
      dict[/#{num}/] = "#{num}_subst"
    end

consumes a gigabyte of RAM on my MBP just to store the hash table.
Another approach is to store your replacements, marshaled, in memcached so that you can at least spread them across machines.

    require 'rubygems'
    require 'memcached'

    @table = Memcached.new("localhost:11211")

    # Store each [regexp, replacement] pair as a marshaled blob under its own key.
    retained_keys = (0..1000_000).each do |num|
      stored_blob = Marshal.dump([/#{num}/, "#{num}_subst"])
      @table.set("p#{num}", stored_blob)
    end
You will have to worry about keeping the keys "hot", because memcached will expire them if they are not being requested.
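The expiry concern can be sketched without a running server. In the example below a plain Hash stands in for the memcached client (an assumption made purely for illustration; a real client reports a miss in its own way, as nil or as an exception depending on the library), and `fetch_pair` is a hypothetical helper that regenerates an entry when its key has gone missing.

```ruby
# A plain Hash stands in for the memcached client (illustration only).
store = {}
store["p42"] = Marshal.dump([/42/, "42_subst"])

# Reads a pair back; on a miss it rebuilds the entry and stores it again.
def fetch_pair(store, num)
  blob = store["p#{num}"]
  if blob.nil?
    # Cache miss: the key expired or was evicted, so regenerate it.
    blob = Marshal.dump([/#{num}/, "#{num}_subst"])
    store["p#{num}"] = blob
  end
  Marshal.load(blob)
end

pattern, replacement = fetch_pair(store, 42)  # hit
puts "a42b".gsub(pattern, replacement)        # prints "a42_substb"

store.delete("p42")                           # simulate an expired key
pattern, replacement = fetch_pair(store, 42)  # miss, entry is rebuilt
```

With a real dictionary the rebuild step would read the record from the table rather than compute it, so every miss costs a database round trip.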
The best approach for your case, however, would be very simple: write your replacements to a file (one line per replacement) and build a stream filter that reads the file line by line and applies the replacements from it. You can also parallelize this by mapping the work, say, by replacement letter and replacement marker.
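But this should get you started:

    require "base64"

    # Write one Base64-encoded, marshaled [regexp, replacement] pair per line.
    # Note: encode64 wraps its output every 60 characters, so entries longer
    # than these would need Base64.strict_encode64 to stay on one line.
    File.open("./dict.marshal", "wb") do |file|
      (0..1000_000).each do |num|
        stored_blob = Base64.encode64(Marshal.dump([/#{num}/, "#{num}_subst"]))
        file.puts(stored_blob)
      end
    end

    puts "Table populated (should be a 35 meg file), now let's run the replacements"

    # Read the pairs back one line at a time, so only one is in memory at once.
    File.open("./dict.marshal", "r") do |f|
      until f.eof?
        pattern, replacement = Marshal.load(Base64.decode64(f.gets))
      end
    end

    puts "All replacements out"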
To populate the file and load each replacement, this takes me:

    real 0m21.262s
    user 0m19.100s
    sys  0m0.502s

To just load the regexp and the string from the file (all the million, piece by piece):

    real 0m7.855s
    user 0m7.645s
    sys  0m0.105s
So this is 7 seconds of IO overhead, but you don't lose any memory (and there is huge room for improvement): the RSIZE is about 3 megs. If you do the IO in bulk, or make one file per 10-50 replacements and load each as a whole, you should easily be able to make it run faster. Put the files on an SSD or a RAID and you have a winner, and you get to keep your RAM.
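As a rough sketch of the bulk idea, the variant below stores one marshaled batch of pairs per line instead of one file per batch (an adaptation made here to keep the example short) and applies each batch to the document as it streams past. `BATCH_SIZE`, `write_batches` and `apply_dictionary` are hypothetical names, and the 120 generated pairs are made-up sample data.

```ruby
require "base64"
require "tempfile"

BATCH_SIZE = 50 # hypothetical; the text suggests 10-50 replacements per unit

# Writes the pairs as one Base64-encoded Marshal blob per batch, one per line.
# strict_encode64 never inserts line breaks, so each batch stays on one line.
def write_batches(io, pairs, batch_size = BATCH_SIZE)
  pairs.each_slice(batch_size) do |batch|
    io.puts(Base64.strict_encode64(Marshal.dump(batch)))
  end
end

# Streams the dictionary one batch at a time and applies every pair in order.
def apply_dictionary(io, document)
  io.each_line.reduce(document) do |text, line|
    batch = Marshal.load(Base64.strict_decode64(line.chomp))
    batch.reduce(text) { |t, (pattern, replacement)| t.gsub(pattern, replacement) }
  end
end

# Made-up sample data: w0..w119 are replaced by t0..t119.
pairs = (0...120).map { |num| [/\bw#{num}\b/, "t#{num}"] }

Tempfile.create("dict") do |file|
  write_batches(file, pairs)
  file.rewind
  puts apply_dictionary(file, "w3 w57 w119 w120") # prints "t3 t57 t119 w120"
end
```

Only one batch is unmarshaled at a time, so memory use stays flat while the number of read and decode calls drops by roughly the batch size.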