VB.NET - How to find all occurrences of a specific string in a long text
I have a long text (e.g. information about many books) in one string, on one line.
I want to find every ISBN (only the number; each number is preceded by the characters "isbn"). I found code that extracts the number at the first position. My problem is how to create a loop over the whole text. Can I use, for example, a StreamReader? Thank you for your answers.
Example:
    Sub Main()
        Dim getLiteratura As String = "'author 1. name of book 1. isbn 978-80-251-2025-5.', 'author 2. name of book 2. isbn 80-01-01346.', 'author 3. name of book. isbn 80-85849-83.'"
        ' Position of the first "isbn" marker
        Dim test As Integer = getLiteratura.IndexOf("isbn")
        ' Skip "isbn " (5 characters) and read up to the next period
        Dim getISBN As String = getLiteratura.Substring(test + 5, getLiteratura.IndexOf(".", test + 1) - test - 5)
        Console.Write(getISBN)
        Console.ReadKey()
    End Sub
Since you can pass a start position to the IndexOf method, you can loop through the string, starting each search where the last iteration left off. For instance:
    Dim getLiteratura As String = "'author 1. name of book 1. isbn 978-80-251-2025-5.', 'author 2. name of book 2. isbn 80-01-01346.', 'author 3. name of book. isbn 80-85849-83.'"
    Dim isbns As New List(Of String)()
    Dim position As Integer = 0
    While position <> -1
        ' Find the next "isbn" marker at or after the current position
        position = getLiteratura.IndexOf("isbn", position)
        If position <> -1 Then
            Dim endPosition As Integer = getLiteratura.IndexOf(".", position + 1)
            If endPosition <> -1 Then
                isbns.Add(getLiteratura.Substring(position + 5, endPosition - position - 5))
            End If
            ' Continue after this record; -1 ends the loop
            position = endPosition
        End If
    End While
That is about as efficient as any method you will find, if the data is already loaded into a string. However, this method is not very readable or flexible. If those things concern you more than mere efficiency, you may want to consider using Regex:
    For Each i As Match In Regex.Matches(getLiteratura, "isbn (?<isbn>.*?)\.")
        isbns.Add(i.Groups("isbn").Value)
    Next
As you can see, it is not only easier to read, it is also configurable. You could store the pattern externally in a resource, configuration file, database, etc.
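To make the "configurable" point concrete, here is a minimal sketch in which the pattern is passed in as a parameter instead of being hard-coded. The names ExtractGroup and storedPattern are hypothetical, and the alternative pattern (digits and hyphens only) is just an assumption about what an externally stored pattern might look like:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ConfigurablePatternSketch
    ' Hypothetical helper: the pattern is a parameter, so the caller can load it
    ' from wherever it is stored instead of hard-coding it.
    Public Function ExtractGroup(input As String, pattern As String, groupName As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(input, pattern)
            results.Add(m.Groups(groupName).Value)
        Next
        Return results
    End Function

    Public Sub Main()
        Dim data As String = "'author 1. name of book 1. isbn 978-80-251-2025-5.', 'author 2. name of book 2. isbn 80-01-01346.'"
        ' Stand-in for a value read from a resource, configuration file or database.
        ' This variant accepts only digits and hyphens after "isbn ".
        Dim storedPattern As String = "isbn (?<isbn>[\d-]+)"
        For Each isbn As String In ExtractGroup(data, storedPattern, "isbn")
            Console.WriteLine(isbn)
        Next
    End Sub
End Module
```

Because the pattern only reaches the code as a string, changing what counts as an ISBN would not require recompiling, provided the named group keeps the same name.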
If the data isn't already loaded into a string, and efficiency is of the utmost concern, you may want to look into using a stream reader so that only a small subset of the data is loaded into memory at once. The logic is a bit more complicated, but it is still not overly difficult.
Here's a simple example of how you could do it via StreamReader:
    ' Needs Imports System.IO, System.Text and System.Text.RegularExpressions.
    ' "stream" is assumed to be an already opened Stream.
    Dim isbns As New List(Of String)()
    Using reader As StreamReader = New StreamReader(stream)
        Dim builder As New StringBuilder()
        Dim isbnRegex As New Regex("isbn (?<isbn>.*?)\.")
        While Not reader.EndOfStream
            Dim charValue As Integer = reader.Read()
            If charValue <> -1 Then
                builder.Append(Convert.ToChar(charValue))
                Dim matches As MatchCollection = isbnRegex.Matches(builder.ToString())
                If matches.Count <> 0 Then
                    For Each i As Match In matches
                        isbns.Add(i.Groups("isbn").Value)
                    Next
                    ' A complete record was found, so start a fresh buffer
                    builder.Clear()
                End If
            End If
        End While
    End Using
As you can see, in this example, as soon as a match is found, it adds it to the list and then clears out the builder that is being used as a buffer. That way, the amount of data being held in memory at one time is never more than the size of one "record".
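The StreamReader example re-runs the regex over the whole buffer after every single character, which is simple but repeats a lot of work. One possible variation, sketched here under assumptions and not taken from the original answer, reads fixed-size blocks and keeps only the text that follows the last match. The names ReadIsbns, chunkSize and pending are made up for illustration:

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module ChunkedIsbnReaderSketch
    ' Hypothetical helper: reads blocks of chunkSize characters and extracts
    ' every "isbn <number>." record, keeping unmatched text for the next block.
    Public Function ReadIsbns(reader As TextReader, Optional chunkSize As Integer = 4096) As List(Of String)
        Dim isbns As New List(Of String)()
        Dim isbnRegex As New Regex("isbn (?<isbn>.*?)\.")
        Dim pending As New StringBuilder()
        Dim buffer(chunkSize - 1) As Char

        Dim count As Integer = reader.Read(buffer, 0, buffer.Length)
        While count > 0
            pending.Append(buffer, 0, count)

            ' Collect all complete matches and remember where the last one ended.
            Dim consumed As Integer = 0
            For Each m As Match In isbnRegex.Matches(pending.ToString())
                isbns.Add(m.Groups("isbn").Value)
                consumed = m.Index + m.Length
            Next

            ' Drop everything up to the end of the last match; a record that is
            ' split across two blocks stays in "pending" until it is complete.
            pending.Remove(0, consumed)
            count = reader.Read(buffer, 0, buffer.Length)
        End While
        Return isbns
    End Function
End Module
```

It could be called as ReadIsbns(New StreamReader(stream)), or with a StringReader for testing. The buffer holds whatever has been read since the end of the last match, so memory use stays near one record plus one block only as long as matches keep occurring regularly in the input.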
Update
Since, based on your comments, you're having trouble getting it to work properly, here is a full working sample that outputs just the ISBN numbers, without any of the surrounding characters. Just create a new VB.NET console application and paste in the following code:
    Imports System.Text.RegularExpressions

    Module Module1
        Public Sub Main()
            Dim data As String = "'author 1. name of book 1. isbn 978-80-251-2025-5.', 'author 2. name of book 2. isbn 80-01-01346.', 'author 3. name of book. isbn 80-85849-83.'"
            For Each i As String In GetIsbns(data)
                Console.WriteLine(i)
            Next
            Console.ReadKey()
        End Sub

        ' Returns the text captured between "isbn " and the next period for every match
        Public Function GetIsbns(data As String) As List(Of String)
            Dim isbns As New List(Of String)()
            For Each i As Match In Regex.Matches(data, "isbn (?<isbn>.*?)\.")
                isbns.Add(i.Groups("isbn").Value)
            Next
            Return isbns
        End Function
    End Module