Posted June 12, 2007 at 08:06pm in
Computers
I’m sure some of you know that Apple’s browser, Safari, was released for Windows a few days ago. I gave it a try and I like it, although I wish you could change the blue Apple-style scroll bars to a different color. Oh well. It throws another browser into the mix, and as I was typing this, Safari crashed. I should mention that it is a beta, and it can be downloaded from Apple’s website.
Posted June 12, 2007 at 06:06pm in
Programming
My latest project has been the document management site for the FTA, and one of the underlying features was to index all of the PDF, HTML, TXT, DOC, XLS, and RTF files in Solr. Solr currently does not provide a means of extracting text from files, other than CSV files. To provide this functionality, Eric Pugh, one of the principals of OpenSource Connections and a Java expert, integrated code from our previous Java developer and fixed some bugs. This brought us to a point where we can call Solr via a GET request to one of the binary file handlers, and it will pull the text from that file and index it directly into Solr. Highlighting words in the text is an additional feature of Solr, so when a user searches for a word, let’s say “bus”, the results are displayed in a layout very similar to Google’s, but with the search words highlighted in yellow. Each result is a simple link to the document and a snippet of text from the document with the keywords highlighted. This weekend I will get the opportunity to meet a friend of Eric Pugh’s, Erik Hatcher, who is a contributor to the Solr project. I am excited.
A little more background on Solr: it is a search server written in Java that uses the Lucene search library. A number of large companies currently use Solr to power their enterprise applications; CNET uses it for its Reviews and Shopper.com search. A few more examples are Discogs, Search.com, and News.com, and the powerful archive.org uses it exclusively.
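Solr operates as a REST-style web service over HTTP, which means it is very simple to work with and you can use any language you want. You query via GET; updates and deletes are normally sent as POST requests (our binary file handlers are the exception, since they accept a GET). This is an example I will be using.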
http://localhost:8983/solr/select/?tr=example.xsl&wt=xslt&indent=on&version=2.1&q=consistent&start=0&rows=10&fl=id&qt=instock&hl.fl=features,sku&hl.snippets=2&hl=true
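This sends a query for “consistent” (q=consistent), asks for indented output (indent=on), and returns 10 rows starting at record 0 (start=0, rows=10). The qt=instock parameter picks the request handler of that name. The response is transformed by the XSLT response writer (wt=xslt) using the example.xsl stylesheet (tr=example.xsl), and the search words are highlighted (hl=true) in the features and sku fields (hl.fl=features,sku). The hl.snippets value sets the maximum number of highlighted snippets returned per field, 2 here. I don’t want a lot of other information in my example, so fl=id limits the returned fields to just the id.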
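Since you can use any language, here is a rough sketch of how the same query URL could be put together in Python. The parameter names and values are taken from the URL above; the function name build_select_url and the SOLR_SELECT constant are just names I made up for illustration, and the sketch assumes Solr is running locally on port 8983 as in the example. It only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance, as in the example URL above.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def build_select_url(query, start=0, rows=10):
    """Build a select URL with the same parameters as the example query.

    build_select_url is a hypothetical helper, not part of Solr.
    """
    params = [
        ("q", query),                # the search terms
        ("start", start),            # offset of the first result
        ("rows", rows),              # number of results to return
        ("fl", "id"),                # only return the id field
        ("qt", "instock"),           # request handler name from the example
        ("wt", "xslt"),              # use the XSLT response writer
        ("tr", "example.xsl"),       # stylesheet applied to the response
        ("indent", "on"),            # indent the output
        ("version", "2.1"),
        ("hl", "true"),              # turn highlighting on
        ("hl.fl", "features,sku"),   # fields to highlight
        ("hl.snippets", 2),          # max highlighted snippets per field
    ]
    # safe="," keeps the comma in hl.fl readable instead of encoding it as %2C
    return SOLR_SELECT + "?" + urlencode(params, safe=",")


url = build_select_url("consistent")
print(url)
```

Passing a different word, or different start and rows values, gives you the URL for another search or another page of results. The parameter order differs from the URL above, which should not matter for a query string.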
I recommend checking Solr out.