Python - tricky string matching


I want to find the first index of a substring in a larger string. I only want it to match whole words, and I'd like it to be case-insensitive, except that I want it to treat CamelCase as separate words.

The code below does the trick, but it is slow. Any suggestions for speeding it up? I tried some regex approaches, but couldn't find one that handled all the edge cases.

    def word_start_index(text, seek_word):
        start_index = 0
        curr_word = ""

        def case_change():
            # A lower-to-upper transition starts a new CamelCase word
            return curr_word and ch.isupper() and curr_word[-1].islower()

        def is_match():
            return curr_word.lower() == seek_word.lower()

        for i, ch in enumerate(text):
            # A word ends at a case change or at a non-alphanumeric character
            if case_change() or not ch.isalnum():
                if is_match():
                    return start_index
                curr_word = ""
                start_index = None
            if ch.isalnum():
                if start_index is None:
                    start_index = i
                curr_word += ch
        # Check the word that runs to the end of the text
        if is_match():
            return start_index
        return None

    if __name__ == "__main__":
        #            01234567890123456789012345
        test_text = "a_foobar_FooBar baz golf_CART"
        test_words = ["a", "foo", "bar", "baz", "golf", "cart", "fred"]

        for word in test_words:
            match_start = word_start_index(test_text, word)
            print(match_start, word)

Output:

    0 a
    9 foo
    12 bar
    16 baz
    20 golf
    25 cart
    None fred

If I were doing this with regular expressions, I would probably do something like this:

    import re

    def word_start_index2(text, seek_word):
        # CamelCase form of the word, e.g. "foo" -> "Foo"
        camel_case = seek_word[0].upper() + seek_word[1:].lower()
        # Case-insensitive form built from character classes, e.g. "[fF][oO][oO]"
        seek_word_i = ''.join('[' + c.lower() + c.upper() + ']'
                              for c in seek_word)
        # Whole word in any case, bounded by non-letters or the ends of the text
        regex1 = r'(?:(?<=[^a-zA-Z])|^)' + seek_word_i + r'(?=$|[^a-zA-Z])'
        # Capitalized word as one component of a CamelCase run
        regex2 = r'(?:(?<=[a-z]|[^A-Z])|^)' + camel_case + r'(?=$|[A-Z]|[^a-z])'
        regex = '%s|%s' % (regex1, regex2)
        m = re.search(regex, text)
        if not m:
            return None
        else:
            return m.start()

I haven't tested the performance against your version, but you can try it to see whether it is better or worse and let us know.
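One way to run that comparison is with the standard `timeit` module. The sketch below repeats both functions so that it runs on its own; `time_both`, `TEXT` and `WORDS` are names made up for this illustration, and the short sample text is unlikely to be representative of a large input, so treat any numbers as a rough guide only.

```python
import re
import timeit


def word_start_index(text, seek_word):
    """Character-by-character scan, as in the question."""
    start_index = 0
    curr_word = ""

    def case_change():
        return curr_word and ch.isupper() and curr_word[-1].islower()

    def is_match():
        return curr_word.lower() == seek_word.lower()

    for i, ch in enumerate(text):
        if case_change() or not ch.isalnum():
            if is_match():
                return start_index
            curr_word = ""
            start_index = None
        if ch.isalnum():
            if start_index is None:
                start_index = i
            curr_word += ch
    return start_index if is_match() else None


def word_start_index2(text, seek_word):
    """Regex version, as in the answer."""
    camel_case = seek_word[0].upper() + seek_word[1:].lower()
    seek_word_i = ''.join('[' + c.lower() + c.upper() + ']' for c in seek_word)
    regex1 = r'(?:(?<=[^a-zA-Z])|^)' + seek_word_i + r'(?=$|[^a-zA-Z])'
    regex2 = r'(?:(?<=[a-z]|[^A-Z])|^)' + camel_case + r'(?=$|[A-Z]|[^a-z])'
    m = re.search('%s|%s' % (regex1, regex2), text)
    return m.start() if m else None


# Sample data from the question (names are hypothetical)
TEXT = "a_foobar_FooBar baz golf_CART"
WORDS = ["a", "foo", "bar", "baz", "golf", "cart", "fred"]


def time_both(text, words, number=1000):
    """Return total seconds per implementation (hypothetical helper)."""
    def run(func):
        return timeit.timeit(lambda: [func(text, w) for w in words],
                             number=number)
    return {func.__name__: run(func)
            for func in (word_start_index, word_start_index2)}


if __name__ == "__main__":
    for name, seconds in time_both(TEXT, WORDS).items():
        print(name, round(seconds, 4))
```

Note that `re` caches compiled patterns, so repeated searches for the same word mostly measure matching rather than compilation.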

My answer may give different output in some edge cases, but in your comments you said that you don't care about those cases.

In addition, I tried using the (?i) notation to mark parts of the regex as case-insensitive, but for some reason it fails to work properly. I can't explain why.
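One plausible cause, assuming the standard `re` module: a bare `(?i)` is a flag for the whole pattern, not just for the part that follows it, so it cannot limit case-insensitivity to one piece of the regex. Python 3.6 and later accept a scoped form, `(?i:...)`, which applies only inside the group. The pattern below is a small illustration and not the regex from this answer.

```python
import re

# Scoped flag, assuming Python 3.6+: only the group ignores case.
scoped = re.compile(r'(?i:foo)Bar')

# Flag at the start of the pattern: the whole pattern ignores case.
whole = re.compile(r'(?i)fooBar')

for candidate in ('FOOBar', 'fooBar', 'foobar'):
    print(candidate,
          bool(scoped.search(candidate)),
          bool(whole.search(candidate)))
# prints:
# FOOBar True True
# fooBar True True
# foobar False True
```

With the scoped form, the character-class trick (`[fF][oO][oO]`) would no longer be needed for the case-insensitive part.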

A final nitpick: the function needs to validate its arguments. I left that out for code clarity. You should add at least checks for:

  • text must be a string
  • seek_word must be a string matching '[a-zA-Z]+'
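The two checks above could look roughly like this. It is a minimal sketch: `check_args` is a hypothetical helper name, and the choice of `TypeError` versus `ValueError` is an assumption rather than something the answer specifies.

```python
import re

# seek_word must consist of ASCII letters only
_WORD_RE = re.compile(r'[a-zA-Z]+')


def check_args(text, seek_word):
    """Raise if the arguments break the two rules above (hypothetical helper)."""
    if not isinstance(text, str):
        raise TypeError('text must be a string')
    if not isinstance(seek_word, str):
        raise TypeError('seek_word must be a string')
    # fullmatch requires the whole word to match, not just a prefix
    if not _WORD_RE.fullmatch(seek_word):
        raise ValueError("seek_word must match '[a-zA-Z]+'")
```

Calling `check_args(text, seek_word)` as the first line of `word_start_index2` would keep the regex-building code free of special cases such as an empty `seek_word`.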
