Using machine learning techniques, we have classified the top 1 million Alexa sites into categories. We trained our learning algorithm on data from various sources, such as DMOZ (the Open Directory Project), and refined it with manual feedback from over 100K users on the quality of the classification.
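The text does not say which learning algorithm is used. Below is a minimal sketch that assumes a multinomial Naive Bayes text classifier, a common baseline for this task that is swapped in here only for illustration. It trains on labeled category examples and folds user corrections back in as extra examples. The class, its methods, the category names, and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training docs per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def train(self, labeled_docs):
        """labeled_docs: iterable of (text, category) pairs."""
        for text, category in labeled_docs:
            tokens = tokenize(text)
            self.class_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)

    def add_feedback(self, text, correct_category):
        """Fold a user's correction back in as one more labeled example."""
        self.train([(text, correct_category)])

    def predict(self, text):
        """Return the category with the highest log posterior for `text`."""
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for category, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[category].values())
            for tok in tokens:
                count = self.word_counts[category][tok]  # 0 for unseen tokens
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training snippets standing in for directory category listings.
clf = NaiveBayesClassifier()
clf.train([
    ("python programming tutorial code", "Computers"),
    ("javascript web developer code", "Computers"),
    ("football match score league", "Sports"),
    ("tennis tournament match player", "Sports"),
])
print(clf.predict("learn python code"))  # Computers
clf.add_feedback("chess opening strategy", "Games")
print(clf.predict("chess strategy"))     # Games
```

Treating feedback as additional training data keeps the model simple. A production system would probably weight or review corrections before trusting them.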
2. Identify Tags : Identify common tags associated with the URL or document.
Tags let users quickly see what a web page or document is about.
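One simple way to pick candidate tags is to score each term in a document by TF-IDF against a reference corpus and keep the top few. This sketch works under that assumption and is not necessarily the method the product uses. `suggest_tags`, the stopword list, and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def suggest_tags(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest TF-IDF weight relative to `corpus`."""
    tokens = tokenize(doc)
    if not tokens:
        return []
    tf = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(doc_sets)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)   # corpus documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed inverse document frequency
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


corpus = [
    "The startup raised funding from investors",
    "Investors discuss the stock market",
    "A recipe for chocolate cake",
]
doc = "Startup funding: the startup pitched investors for seed funding"
print(suggest_tags(doc, corpus, k=2))  # ['funding', 'startup']
```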
3. Search : Index the data for efficient searching, including full-text search over documents.
This is currently done using Lucene / Solr.
In progress: the ability to index the full content of each added page, and full-text search over PDF and Word documents.
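Lucene is built around an inverted index, which maps each term to the documents that contain it. The toy version below only illustrates that idea with boolean AND queries. It does not use the Lucene or Solr APIs and does no relevance ranking. The `InvertedIndex` class is hypothetical; text extracted from PDF or Word files would simply be passed to `add` like any other document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Map each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Boolean AND query: sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add(1, "Introduction to Lucene scoring")
index.add(2, "Solr is built on top of Lucene")
index.add(3, "Cooking with cast iron")
print(index.search("lucene"))       # [1, 2]
print(index.search("solr lucene"))  # [2]
```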
The autocomplete feature provides suggestions for search terms.
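Below is a minimal sketch of prefix-based autocomplete. It assumes suggestions are drawn from a list of known terms, for example past queries or existing tags (this source of terms is an assumption). Keeping the terms sorted lets a binary search jump straight to the first match. This is not how Solr implements suggestions, and `Autocomplete` is a hypothetical name.

```python
import bisect


class Autocomplete:
    """Prefix suggestions over a fixed, sorted vocabulary of search terms."""

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` terms that start with `prefix`, in alphabetical order."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self.terms, prefix)  # index of the first term >= prefix
        results = []
        # All terms sharing the prefix are contiguous in sorted order.
        while i < len(self.terms) and len(results) < limit and self.terms[i].startswith(prefix):
            results.append(self.terms[i])
            i += 1
        return results


ac = Autocomplete(["Machine learning", "machine translation", "macro economics", "marketing"])
print(ac.suggest("mach"))  # ['machine learning', 'machine translation']
```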
4. Recommendations
We primarily use two approaches to compute recommended articles:
1) Content-based systems examine properties of the items being recommended. For instance, if a user reads many articles on technology startups, we recommend other articles on technology startups from our database.
2) Collaborative filtering systems recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users.
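Both approaches can be sketched under a few stated assumptions. Articles are represented by hypothetical topic weights, and similarity is cosine similarity, since the text does not name a measure. The collaborative variant is user-based: it scores each unread article by the summed similarity of the users who read it. All function names and data are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_based(history, articles, k=2):
    """Rank unread articles by similarity to a profile built from the user's reading history."""
    profile = {}
    for art in history:
        for topic, w in articles[art].items():
            profile[topic] = profile.get(topic, 0.0) + w
    unread = [a for a in articles if a not in history]
    unread.sort(key=lambda a: (-cosine(profile, articles[a]), a))
    return unread[:k]


def collaborative(user, reads, k=2):
    """User-based CF: score unread articles by the similarity of the users who read them."""
    mine = {a: 1.0 for a in reads[user]}
    scores = {}
    for other, items in reads.items():
        if other == user:
            continue
        sim = cosine(mine, {a: 1.0 for a in items})
        if sim == 0.0:
            continue  # no overlap in reading history, so no evidence
        for a in items - reads[user]:
            scores[a] = scores.get(a, 0.0) + sim
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


# Hypothetical topic weights per article and reading histories per user.
articles = {
    "a1": {"startups": 1.0, "funding": 0.5},
    "a2": {"startups": 0.8, "technology": 0.6},
    "a3": {"cooking": 1.0},
    "a4": {"funding": 1.0, "startups": 0.4},
}
reads = {
    "alice": {"a1", "a2"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a3"},
}
print(content_based({"a1"}, articles))  # ['a4', 'a2']
print(collaborative("alice", reads))    # ['a4']
```

Content-based ranking works even for a brand-new article as soon as its topics are known. Collaborative filtering needs other users' reading history but can surface articles outside a user's established topics.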
5. Extract Named Entities
Named-entity recognition (NER) is an information extraction procedure that seeks to locate elements in text and classify them into predefined categories, such as the names of persons, organizations, and locations. NER helps us understand text better and also improves other tasks, such as search and text analysis.
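Production NER systems usually rely on trained statistical models. For illustration only, the sketch below swaps in a much simpler gazetteer (dictionary lookup) approach. It prefers the longest match, so an organization name that contains a city is not split into two entities. The gazetteer entries and `extract_entities` are hypothetical.

```python
import re

# Hypothetical gazetteer mapping entity names to categories.
GAZETTEER = {
    "marie curie": "PERSON",
    "pierre curie": "PERSON",
    "university of paris": "ORGANIZATION",
    "paris": "LOCATION",
    "warsaw": "LOCATION",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (surface text, category) pairs in order of appearance.

    Longer names are matched first, so "University of Paris" is tagged as one
    organization instead of also yielding the location "Paris".
    """
    taken = [False] * len(text)
    found = []
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.span()
            if any(taken[start:end]):
                continue  # overlaps an entity that was already matched
            taken[start:end] = [True] * (end - start)
            found.append((start, m.group(0), gazetteer[name]))
    return [(surface, label) for _, surface, label in sorted(found)]


text = "Marie Curie moved from Warsaw to study at the University of Paris."
print(extract_entities(text))
# [('Marie Curie', 'PERSON'), ('Warsaw', 'LOCATION'), ('University of Paris', 'ORGANIZATION')]
```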
6. Relations / Ontology : Identify the relations between entities.
The Knowledge Graph also helps us understand the relationships between things. Marie Curie is a person in the Knowledge Graph; she had two children, one of whom also won a Nobel Prize, as well as a husband, Pierre Curie, who claimed a third Nobel Prize for the family. All of these are linked in our graph. It’s not just a catalog of objects; it also models all of these inter-relationships. The key is the intelligence captured in the connections between these entities.
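One common way to represent such a graph is as (subject, relation, object) triples. The sketch below assumes that representation and encodes the Curie example from the paragraph. It then follows the `spouse` and `child` links to find family members linked to a Nobel Prize. The relation names, the class, and the helper are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Facts stored as (subject, relation, object) triples, indexed by subject."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object), ...]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def objects(self, subject, relation):
        """All objects linked to `subject` through `relation`."""
        return [o for r, o in self.edges.get(subject, []) if r == relation]


def family_laureates(kg, person):
    """Spouses and children of `person` who are linked to a Nobel Prize."""
    family = kg.objects(person, "spouse") + kg.objects(person, "child")
    return [m for m in family if "Nobel Prize" in kg.objects(m, "award")]


kg = KnowledgeGraph()
kg.add("Marie Curie", "award", "Nobel Prize")
kg.add("Marie Curie", "spouse", "Pierre Curie")
kg.add("Marie Curie", "child", "Irène Joliot-Curie")
kg.add("Marie Curie", "child", "Ève Curie")
kg.add("Pierre Curie", "award", "Nobel Prize")
kg.add("Irène Joliot-Curie", "award", "Nobel Prize")
print(family_laureates(kg, "Marie Curie"))  # ['Pierre Curie', 'Irène Joliot-Curie']
```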
7. Interest Graph
There are a number of uses for interest graphs, from both a personal and a business standpoint. Interest graphs can be used together with social graphs to meet or connect with people in a social network or community who share common interests but may not otherwise know each other.
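Assume the interest graph is stored as a mapping from each user to their set of interests, and the social graph as a mapping from each user to their existing connections. Under those assumptions, finding people who share interests but are not yet connected reduces to comparing interest sets pairwise. The function name and the `min_shared` cutoff are hypothetical.

```python
from itertools import combinations


def suggest_connections(interests, friends, min_shared=2):
    """Suggest pairs of users who share >= min_shared interests but are not yet connected.

    interests: user -> set of interests (the user side of a bipartite interest graph)
    friends:   user -> set of users already connected in the social graph
    """
    suggestions = []
    for u, v in combinations(sorted(interests), 2):
        shared = interests[u] & interests[v]
        connected = v in friends.get(u, set()) or u in friends.get(v, set())
        if len(shared) >= min_shared and not connected:
            suggestions.append((u, v, sorted(shared)))
    return suggestions


interests = {
    "ana": {"startups", "ai", "climbing"},
    "ben": {"startups", "ai", "cooking"},
    "cai": {"cooking", "gardening"},
    "dev": {"ai", "startups"},
}
friends = {"ana": {"dev"}, "dev": {"ana"}}
print(suggest_connections(interests, friends))
# [('ana', 'ben', ['ai', 'startups']), ('ben', 'dev', ['ai', 'startups'])]
```

The pairwise loop is quadratic in the number of users. A large network would more likely look up candidates by shared interest instead of comparing every pair.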
Interest graphs can also be applied to marketing: for audience analytics and audience-based buying, for sentiment analysis, and for advertising, as another form of behavioral profiling and targeting based on interests. For example, companies like Twitter use interest graphs to target ads more precisely based on their users’ individual interests.
Interest graphs may be applied to product development by using customer interests to help determine which new features or capabilities to provide in future versions of a product.
Interest graphs have many other uses as well, including research and other content discovery and filtering, input to recommendation engines for films, books, music, etc., and learning and education.
8. Identify Communities / Groups / Networks
Segment users into groups based on their shared interests. This is useful for identifying influencers and domain experts, and for targeted marketing.
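The text does not specify a segmentation method. One simple possibility is sketched below. It links users whose interest sets have a Jaccard similarity at or above a threshold and treats each connected component as a community. Within each community, the member with the most links serves as a rough proxy for an influencer. The threshold, the degree-based influencer heuristic, and all names and data are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two interest sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_communities(interests, threshold=0.5):
    """Group users whose interests overlap, and pick a candidate influencer per group.

    Users are linked when their Jaccard similarity is >= threshold; communities are the
    connected components of that graph (found with union-find), and the influencer is
    the member with the most links (ties broken alphabetically).
    """
    parent = {u: u for u in interests}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    degree = dict.fromkeys(interests, 0)
    for u, v in combinations(sorted(interests), 2):
        if jaccard(interests[u], interests[v]) >= threshold:
            degree[u] += 1
            degree[v] += 1
            parent[find(u)] = find(v)  # merge the two components

    groups = {}
    for u in sorted(interests):
        groups.setdefault(find(u), []).append(u)
    return [
        (members, min(members, key=lambda m: (-degree[m], m)))
        for members in groups.values()
    ]


# Hypothetical users and interests.
interests = {
    "ana": {"ai", "startups"},
    "ben": {"ai", "startups"},
    "cai": {"ai", "startups", "vc"},
    "dia": {"baking", "gardening"},
    "eli": {"baking", "gardening", "hiking"},
    "fay": {"ai", "vc", "hiking"},
}
for members, influencer in find_communities(interests):
    print(members, "influencer:", influencer)
```

Degree is only a crude stand-in for influence. Engagement signals or centrality measures on the real social graph would likely be more informative.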