c# - n-grams using regex

April 15, 2013

I am working on an Augmentative and Alternative Communication (AAC) program. My current goal is to store a history of input/spoken text and search it for common phrase fragments, or word n-grams. I am currently using an implementation based on the LZW compression algorithm, as discussed. However, this approach does not produce n-grams the way I want.
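For readers unfamiliar with the approach: LZW applied to words instead of characters learns a new phrase by taking the longest phrase it already knows and appending one more word. The class below is a minimal sketch under that assumption, not the asker's implementation. `WordLzw` and `Learn` are hypothetical names, and entries are simply lowercased and split on spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical word-level LZW phrase learner (a sketch, not the asker's code).
public sealed class WordLzw
{
    // Every multi-word phrase learned so far, stored as space-joined lowercase words.
    // Single words are treated as implicitly known, like the initial LZW alphabet.
    private readonly HashSet<string> _phrases = new HashSet<string>();

    public IReadOnlyCollection<string> Phrases => _phrases;

    // Processes one entry and returns the phrases it added to the dictionary.
    public List<string> Learn(string entry)
    {
        string[] words = entry.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var learned = new List<string>();
        int i = 0;
        while (i < words.Length)
        {
            // Greedily extend the current phrase while the longer phrase is already known.
            string current = words[i];
            int j = i + 1;
            while (j < words.Length && _phrases.Contains(current + " " + words[j]))
            {
                current += " " + words[j];
                j++;
            }
            // As in LZW, record the known phrase plus exactly one more word.
            if (j < words.Length)
            {
                string next = current + " " + words[j];
                _phrases.Add(next);
                learned.Add(next);
            }
            // The word that broke the match starts the next phrase.
            i = j;
        }
        return learned;
    }
}
```

Feeding the sentence "on the mountain and through the jungle" in three times, the first phrase learned on each pass is "on the", then "on the mountain", then "on the mountain and". Each repetition lengthens a phrase by only one word, so the whole sentence needs many repetitions before it is stored as a single phrase.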

For example, say I enter "on the mountain and through the jungle" several times. My desired output would be the whole phrase, "on the mountain and through the jungle". With my current implementation, the phrase is broken up into trigrams, and one word is added on each repeated entry. So on the first entry I get "on the mountain".
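If the goal is the whole repeated phrase rather than a phrase that grows by one word per repetition, one alternative (not LZW, and not the asker's code) is to count every word n-gram in the history and keep only the repeated n-grams that are not part of a longer repeated one. A sketch, assuming space-separated words and a hypothetical cap `maxN` on phrase length; `RepeatedNGrams` and `Maximal` are also hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedNGrams
{
    // Counts every word n-gram of 2..maxN words across all entries, then keeps the
    // repeated ones that are not contained in a longer repeated n-gram.
    public static List<string> Maximal(IEnumerable<string> entries, int maxN = 8)
    {
        var counts = new Dictionary<string, int>();
        foreach (string entry in entries)
        {
            // Sketch assumption: split on spaces only, so punctuation stays attached to words.
            string[] w = entry.ToLowerInvariant()
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int start = 0; start < w.Length; start++)
            {
                for (int n = 2; n <= maxN && start + n <= w.Length; n++)
                {
                    string gram = string.Join(" ", w, start, n);
                    counts[gram] = counts.TryGetValue(gram, out int c) ? c + 1 : 1;
                }
            }
        }

        List<string> repeated = counts.Where(kv => kv.Value >= 2).Select(kv => kv.Key).ToList();

        // Padding with spaces makes the containment test match whole words only.
        return repeated
            .Where(g => !repeated.Any(o => o.Length > g.Length
                                           && (" " + o + " ").Contains(" " + g + " ")))
            .OrderByDescending(g => g.Split(' ').Length)
            .ToList();
    }
}
```

With "on the mountain and through the jungle" entered twice, `Maximal` returns only the full seven-word phrase, because every shorter repeated n-gram is contained in it. The containment filter compares every pair of repeated n-grams, which is fine for a sketch but grows quadratically with the number of distinct repeated phrases.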

Here is another example.

Let's assume we have the following text:

This is a test - The emergency broadcast system test interrupted my favorite song

My goal would be that, if I next entered "this is a test of the emergency broadcast system", I could return the subphrases "this is a test" and "the emergency broadcast system" with a regex. Is this possible with regular expressions, or am I going down the wrong path? I appreciate any help.
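A regular expression can express "a phrase of two or more words that occurs again later" with a backreference inside a lookahead. The sketch below is one way to do that with .NET's `Regex`, assuming the entries are lowercased and joined with " | " (an arbitrary separator chosen here so that a phrase cannot span two entries). `RepeatedPhraseRegex` and `Find` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RepeatedPhraseRegex
{
    // Group 1: two or more whole words separated by whitespace.
    // The lookahead requires another whole-word copy of group 1 later in the text.
    // The greedy group is tried longest first, so each match is the longest repeated
    // phrase starting at that word.
    private static readonly Regex Repeated = new Regex(
        @"\b(\w+(?:\s+\w+)+)\b(?=.*?\b\1\b)",
        RegexOptions.CultureInvariant);

    public static List<string> Find(IEnumerable<string> entries)
    {
        // Lowercase so the backreference comparison ignores case;
        // " | " stops \s+\w+ from joining words across two entries.
        string text = string.Join(" | ", entries).ToLowerInvariant();
        return Repeated.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

With the entries "This is a test - The emergency broadcast system test interrupted my favorite song" and "this is a test of the emergency broadcast system", `Find` returns "this is a test" and "the emergency broadcast system". Because the group is retried at every word and the lookahead rescans the rest of the text for each candidate length, the work grows much faster than the text length, so this suits short histories better than a long transcript.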

For what I need, regular expressions alone come close.

I ended up using a combination of my initial system and some regular expressions.
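The asker's own code is not available, so this is only one hedged reading of "a combination of my initial system and regular expressions": let the existing phrase dictionary propose candidate phrases, then use one regular expression per candidate to count whole-word, case-insensitive occurrences in the full history. `PhraseCounter`, `Frequent`, and `minCount` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseCounter
{
    // Counts whole-word, case-insensitive occurrences of each candidate phrase in the
    // history and keeps those seen at least minCount times, longest phrases first.
    public static List<(string Phrase, int Count)> Frequent(
        IEnumerable<string> candidates, string history, int minCount = 2)
    {
        var result = new List<(string Phrase, int Count)>();
        foreach (string phrase in candidates.Distinct(StringComparer.OrdinalIgnoreCase))
        {
            // Escape each word, then allow any run of whitespace between words.
            string body = string.Join(@"\s+",
                phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                      .Select(Regex.Escape));
            int count = Regex.Matches(history, @"\b" + body + @"\b",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant).Count;
            if (count >= minCount)
                result.Add((phrase, count));
        }
        return result
            .OrderByDescending(r => r.Phrase.Split(' ').Length)
            .ThenByDescending(r => r.Count)
            .ToList();
    }
}
```

The `\b` anchors mean that "the jungle" is not counted inside "the jungles", and joining the words with `\s+` still matches a phrase split by extra spaces or a line break.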

This parses the transcript of the first presidential debate (approximately 16,500 words) in about 30 seconds, which is quite fast for my purposes.
