September 2008 Spanish newspapers analysis

The aim of this experiment is to develop a process that summarizes, in a map, the centers of interest of a set of documents (in this case, news items extracted from the web). We have developed a system that extracts web documents, suggests entities, and analyzes the co-occurrence relations between the selected entities. The process is described below:
- Extract news from newspapers (news items are classified into newspaper categories: national, international, business, etc.)
- Select a set of news (economy news, international news, Google News, etc.)
- Select a set of entities extracted from the selected news
- Calculate the co-occurrence between the extracted entities
- Filter the co-occurrence network
- Visualize the co-occurrence network
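
The co-occurrence and filtering steps can be sketched as follows. This is a minimal illustration, not the BITUS implementation: it assumes that two entities co-occur when they are mentioned in the same news item, and that filtering means dropping edges below a count threshold. The function names, the `min_weight` parameter, and the toy entity labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_entities):
    """Count, for each unordered entity pair, how many documents mention both."""
    counts = Counter()
    for entities in docs_entities:
        # Deduplicate within a document and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def filter_network(counts, min_weight=2):
    """Keep only the edges whose co-occurrence count reaches the threshold (assumed rule)."""
    return {pair: w for pair, w in counts.items() if w >= min_weight}


# Toy input (hypothetical): each inner list holds the entities selected from one news item.
news_entities = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
    ["A", "C", "B"],
]

counts = cooccurrence_counts(news_entities)
network = filter_network(counts, min_weight=2)
```

The resulting `network` dictionary maps entity pairs to edge weights, a shape that can be exported as an edge list to a graph visualization tool.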
Maps collection: To interpret the maps, the following must be taken into account:
- Only Spanish news from September 2008 was analyzed
- The extracted networks are not social networks; they represent the most relevant actors in the centers of interest and the co-occurrence relations between them
- The networks can be analyzed using social network analysis (SNA; ARS in Spanish) techniques. For example, we can determine the most influential national or international leaders using the PageRank algorithm.
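
As an illustration of the PageRank idea mentioned above, here is a minimal sketch of weighted PageRank computed by power iteration on an undirected co-occurrence network. It is not the code used in the experiment: the edge format, the damping factor of 0.85, the stopping rule, and the toy entity labels are assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration on an undirected network.

    `edges` maps (entity_a, entity_b) pairs to positive co-occurrence weights.
    """
    # Build a symmetric weighted adjacency map.
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    nodes = sorted(neighbors)
    n = len(nodes)
    # Total edge weight attached to each node, used to normalize outgoing rank.
    strength = {v: sum(neighbors[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Each neighbor passes on rank in proportion to the shared edge weight.
            incoming = sum(rank[u] * w / strength[u] for u, w in neighbors[v].items())
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy network (hypothetical labels and weights).
edges = {("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 2, ("C", "D"): 1}
ranks = pagerank(edges)
most_influential = max(ranks, key=ranks.get)
```

Sorting the entities by their score in `ranks` gives a ranking of the actors in the network; on real data the result depends heavily on how the network was filtered.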
Note: TREDAR and BITUS were used for this experiment. TREDAR is a wiki programming environment for web applications, and BITUS (Bits under surveillance) is a software package that integrates different information sources and extracts knowledge from them (both tools are at the development stage). Guess was used to visualize the maps.