In Count Vectorizer Which Axis To Use?

I want to create a document-term matrix. In my case it is not documents x words but sentences x words, so the sentences act as the documents. I am using 'l2' normali

Solution 1:

By L2 normalization, do you mean division by the total count? If you normalize along axis=0, then the value of x_{i,j} is the probability of word j across all sentences i (its count divided by that word's global count). This depends on the length of the sentence: longer sentences can repeat a word over and over, contribute more to the global word count, and so get a much higher probability for that word. If you normalize along axis=1, then you're asking whether sentences have the same composition of words, because you normalize by the length of each sentence.

