kp1197 19 hours ago
Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why?
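To make the setup concrete, here is a minimal sketch of what "gradient descent on token input embeddings" could look like. Everything in it is a hypothetical toy and not any real model: the embedding table `E`, the single linear layer `W`, the sizes, and the helper names `loss_and_grad`, `optimize_embedding`, and `nearest_token` are all assumptions made for illustration. The model weights stay frozen and only the input vector is updated.

```python
import math
import random

random.seed(0)
VOCAB, DIM, CLASSES = 5, 3, 4

# Hypothetical toy "model": a frozen embedding table and one frozen linear layer.
E = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLASSES)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def loss_and_grad(x, target):
    """Cross-entropy loss for `target` and its gradient w.r.t. the input vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[target])
    # d(loss)/d(logits) = softmax(logits) - onehot(target)
    dlogits = [p[c] - (1.0 if c == target else 0.0) for c in range(CLASSES)]
    # Chain rule through the linear layer: grad_x = W^T @ dlogits
    grad = [sum(W[c][j] * dlogits[c] for c in range(CLASSES)) for j in range(DIM)]
    return loss, grad


def optimize_embedding(start_token, target, steps=200, lr=0.1):
    """Start from a real token's embedding and descend on the input vector only."""
    x = list(E[start_token])
    initial_loss, _ = loss_and_grad(x, target)
    for _ in range(steps):
        _, g = loss_and_grad(x, target)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    final_loss, _ = loss_and_grad(x, target)
    return x, initial_loss, final_loss


def nearest_token(x):
    """Project a continuous vector back to the closest row of the embedding table."""
    dists = [math.dist(x, e) for e in E]
    idx = min(range(VOCAB), key=dists.__getitem__)
    return idx, dists[idx]


x_opt, loss_before, loss_after = optimize_embedding(start_token=0, target=2)
token, distance = nearest_token(x_opt)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}; "
      f"nearest token {token} at distance {distance:.3f}")
```

The optimized vector `x_opt` is free to move anywhere in the continuous embedding space, so reading it as a token requires a projection such as `nearest_token`. That projection step is where the question of interpretability comes up.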