Friday, February 24, 2012

[Full Text Search] automatic document classification

Hi everybody,
I want to use the Full-Text Search of SQL Server 2005 to automatically classify
my documents; by classifying I mean giving each document a type based on
the words it contains.
To give a type to a document, I think one solution is to store the
document in the database, then run the full-text indexing, and finally
compare the indexed keywords with the words that each type must contain.
Then I will return the type that matches the greatest number of words.
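The matching step described above can be sketched in a few lines of Python. This is a minimal illustration outside SQL Server, assuming hypothetical types and keyword lists and a crude lowercase tokenizer in place of the full-text word breaker (no stemming, no noise-word list). It counts how many of each type's keywords occur in the document and returns the type with the most matches.

```python
import re
from typing import Dict, Optional, Set


def tokenize(text: str) -> Set[str]:
    # Crude stand-in for the full-text word breaker (an assumption):
    # lowercase runs of letters and digits, no stemming or noise words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(text: str, categories: Dict[str, Set[str]]) -> Optional[str]:
    """Return the type whose keyword list shares the most words with the text,
    or None if no keyword of any type occurs in it."""
    words = tokenize(text)
    best_type, best_hits = None, 0
    for doc_type, keywords in categories.items():
        hits = len(words & keywords)  # number of this type's keywords present
        if hits > best_hits:  # strict '>' means ties keep the type seen first
            best_type, best_hits = doc_type, hits
    return best_type


# Hypothetical types and keyword lists, for illustration only.
CATEGORIES = {
    "invoice": {"invoice", "amount", "due", "payment"},
    "contract": {"agreement", "party", "clause", "term"},
}

print(classify("Payment of the invoice amount is due in 30 days.", CATEGORIES))
# prints: invoice
```

In this sketch ties go to the type listed first, and a document with no matching keywords gets no type; both are choices made for the illustration, not part of the approach described above.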
Do you think this solution will solve my problem?
Please give me your remarks, and ask me if you need more details.
Thanks

This is a classic way of doing categorization. Basically you create a vector
with a must-have term or terms, and then weight the rest using ISABOUT and
order by RANK desc. The documents with the highest rank are closest to your
vector.
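The idea can be illustrated outside SQL Server with a minimal Python analogue. In SQL Server itself, ISABOUT is used inside a CONTAINSTABLE search condition to assign WEIGHT values to terms, and results are sorted by the RANK column that CONTAINSTABLE returns. The scoring below is only a plain weighted sum of term counts, not the engine's actual rank formula, and the documents, terms, and weights are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_documents(
    docs: Dict[str, str],
    must_have: Set[str],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each document against one category 'vector'.

    Documents missing any must-have term are dropped; the rest get a
    weighted sum of term frequencies (a simple stand-in for ISABOUT
    weighting). Results are sorted by score, highest first, as an
    analogue of ORDER BY RANK DESC.
    """
    results = []
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        if not all(counts[t] > 0 for t in must_have):
            continue  # must-have filter: skip documents lacking a required term
        score = sum(w * counts[t] for t, w in weights.items())
        results.append((doc_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical documents and category vector, for illustration only.
DOCS = {
    "d1": "Invoice: payment due, payment overdue.",
    "d2": "Invoice attached for your records.",
    "d3": "Payment reminder without the document.",
}
WEIGHTS = {"payment": 0.8, "due": 0.5, "overdue": 0.3}

ranking = rank_documents(DOCS, {"invoice"}, WEIGHTS)
print([doc_id for doc_id, _ in ranking])  # prints ['d1', 'd2']
```

Here "d3" is excluded because it lacks the must-have term, and "d1" outranks "d2" because it contains more of the weighted terms.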
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

Hi Hilary
Can you give me a link that illustrates an example of categorization?
Thanks