
Information Aggregation in an Entrepreneurship Portal

Rui Figueiredo
ISCTE University Institute of Lisbon
Lisbon - Portugal
[email protected]

Carlos J. Costa
Adetti-IUL/ISCTE University Institute of Lisbon
Lisbon – Portugal
[email protected]

Manuela Aparicio
Adetti-IUL/ISCTE University Institute of Lisbon
Lisbon – Portugal

[email protected]

ABSTRACT
This study can be applied in multiple contexts. Its main focus is to demonstrate the ability to gather information from multiple sources, using open source platforms and technologies for publishing information. In this paper, we propose a conceptual model for an entrepreneurship portal based on a search engine paradigm. The proposed model is derived from the literature and validated using open source technology. We propose an open source platform that aggregates information from various sources and organizes the information needed. On this platform, the main task is to gather information from several sources using publishing technologies. The paper describes multiple scenarios applied to an entrepreneurship portal.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features - abstract data types, polymorphism, control structures.

General Terms
Management, Economics, Human Factors, Standardization, Theory.

Keywords
Open source, Search Engine, Portal, Entrepreneurship, Information Retrieval, Design of Communication

1. INTRODUCTION
On the Internet we find a vast amount of information, which is the main reason it is such a rich source for data mining. Manual information retrieval can be a tedious process and can become quite time consuming, especially when a large part of the information does not match what users seek. Unfortunately, characterizing the user's information need is not a simple problem. The user can narrow the search criteria to obtain more concrete results, but may still receive irrelevant information.

Many publishing technologies deal with different types of unstructured content collections. It is quite common for users not to get the desired results when they search for information using natural language.

Some authors study information seeking behavior, focusing on the modes of scanning when searching on the World Wide Web [4]. Information seeking can be performed through various modes: sweeping, discriminating, satisfying and optimizing. This approach is based on user behavior: whether users perform an informal or a formal search, or whether they engage in undirected or conditioned viewing. In undirected viewing, the user has no specific information need in mind. In conditioned viewing, the user directs attention to known sources, which are browsed or bookmarked [3]. "Individuals actively create the meaning of information through their thoughts, feelings and actions" [4].

The Semantic Web is a concept promoted to improve the capabilities of the current World Wide Web [10]; semantics is the study of the meaning of words and linguistic expressions. As mentioned before, finding information on a particular subject by searching the web is inexact, since there is no standard way to describe the data content of web pages. The Semantic Web describes the use of metadata modeled by standards (XML, RDF and Dublin Core) that help users find information more readily. The Semantic Web also refers to reaching information through software agents, that is, programs that can perform tasks to assist humans. These agents automatically gather information from the Internet according to the terms used by the user. Berners-Lee et al. (2001) [11] say that "The Semantic Web, in contrast, is more flexible. The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion."

The Semantic Web can return more exact results for the search made. Including more terms in the search can therefore be a solution, and some authors refer to it as a good practice for deepening and refining user searches; however, it can also lead to false results. This paper presents an attempt to solve the problem of erratic search results by gathering information from publishing technologies and ordering it.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OSDOC'12, June 11, 2012, Lisboa, Portugal.
Copyright 2012 ACM 978-1-4503-1284-4/12/0006...$10.00.

A simple search term can be misleading, causing a shift in the search. Data retrieval is a solution for the user of a database system: it solves the problem of searching for a particular topic or subject. To go further, an IR (Information Retrieval) system is needed, one that interprets the contents of the collection and assigns a score (rank) according to the level of relevance to the query. Interpreting content involves extracting its syntactic and semantic content and finding the intersection that matches what the user needs. The difficulty lies not only in extracting this information, but also in knowing how to use it to determine relevance; this notion of relevance is at the center of an information retrieval system. The key goal of an IR system is to retrieve all documents that are relevant to the user while retrieving as few irrelevant documents as possible. In the context of information retrieval, systems use the documents in the collection that contain the words the user searches for. An information retrieval system is used to search for more information on a particular topic and to extract information from the query results, using a language with well-defined objects and support for regular expressions or relational algebra expressions.
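The idea of assigning each document a score according to its relevance to the query can be illustrated with a minimal sketch. It assumes a simple TF-IDF weighting, which is one common choice and not necessarily the scheme used by the search engine in this study; the document collection, the identifiers and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return "".join(ch.lower() if ch.isalpha() else " " for ch in text).split()


def rank(query, documents):
    """Return (score, doc_id) pairs sorted by decreasing TF-IDF score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = []
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Sum term frequency times a smoothed inverse document frequency.
        score = sum(
            tf[term] * math.log(1 + n_docs / df[term])
            for term in tokenize(query)
            if term in tf
        )
        scores.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical three-document collection (assumption, not from the study).
DOCS = {
    "d1": "Funding options for a new company",
    "d2": "Mobile operators in Portugal and mobile market regulation",
    "d3": "Weather report for Lisbon",
}

ranking = rank("mobile market", DOCS)
for score, doc_id in ranking:
    print(doc_id, round(score, 3))
```

Only the document that contains the query terms receives a positive score; the others score zero and fall to the bottom of the ranking.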

In our study we defined the following general objectives:

• Propose a conceptual model of an entrepreneurship portal

• Develop a technological medium appropriate to the needs of the various stakeholders of an entrepreneurship portal

• Gather information in a single location, using an open source platform

In this paper we present the literature review used to define a conceptual model for an entrepreneurship portal, as well as the development of a prototype.

2. LITERATURE REVIEW
Entrepreneurship is a concept that originated in France in 1755. The role of the entrepreneur differs from that of the investor: the former bears the risk and the latter provides the capital [2]. The term entrepreneurship was first used as a concept by the economist Joseph Schumpeter in 1950, as a synonym for a creative person who is able to make successful innovations. Entrepreneurship is a term used to describe, in particular, an individual who carries out innovative activities in a special way and who organizes and runs a business that fits his or her interests and capabilities [6]. An entrepreneur thinks up, creates and shapes a new company or project.

Several authors have studied entrepreneurship [6]; [5]. In early studies the entrepreneur took a central role [8], and these individuals were identified as having a high propensity for risk. Based on economic theory, and in the context of a market economy, some studies concluded that the greatest growth of entrepreneurship coincided with times of market imbalance between supply and demand [15]; [14]. Economic agents seek a balance in order to optimize their resources and achieve their goals. According to these theories, economic agents (entrepreneurs with certain characteristics) found business opportunities, made plans to minimize risk and engaged in their entrepreneurial projects. This allowed the deployment of entrepreneurship projects to PvM and PvP and, therefore, economic growth. Entrepreneurship achieved the optimization of various types of resources, whether know-how, materials or work capacity. The creation of new businesses led to investment in local economies, creating jobs, improving business competitiveness and promoting innovative methods, techniques and designs. It is common for entrepreneurs to leverage positive economic growth. The national conditions that facilitate or promote entrepreneurship in well-ranked economies concern financial markets, education, the labor market, technology, infrastructure, management and government. These factors condition economic growth and innovation [17].

From a microeconomic perspective, personal needs lead markets to new capabilities. The economic postulates, namely those of rationality and equilibrium [16], are all based on economic models, and this study assumes that entrepreneurs seek to maximize their know-how by seeking the resources necessary to undertake projects that lead them to business opportunities. In this search, the market is a place where entrepreneurs and investors can find answers. The market serves as a channel to achieve a balance between those who are willing to undertake new projects and those who are willing to support these projects financially. In this context, and given the reality of the digital economy, markets can be transposed to the Web, and economic agents benefit from a place where supply and demand can meet. The portal can be such a place.

The portal concept comes from the allegory of a "gateway", which denotes a place where technology is applied to access information that is organized, consistent, accurate and understandable by users. A web portal, usually referred to simply as a portal, is a website that offers several features and services (email, forums, wikis, search engines, online stores) organized for the same purpose. The most popular portals had their strong start in mid-1996 with the AOL portal, which led in visitors and offered a large range of services. That portal had its birth in 1983 in a nearly bankrupt company, Quantum Computer Services, led by Steve Case. Case, known as a businessman, began with a business selling computer games for the Atari 2600, a service that built customer loyalty through the sale of modems in order to create a dedicated channel for transferring its own titles. In 1989 the company launched a new service for Apple II and Macintosh computers, and in 1991 a service for DOS machines. As the project evolved it moved to a different segment, and in October 1991 Quantum changed its name to America Online, marking a strong beginning of the era of portals together with Yahoo. The "boom" of online services came in 1999, when attention centered on the qualities of each portal and on introducing unique selling points [9]. Portals gained greater importance by combining search engines with content gathering, attracting a wider audience [18].

According to the Library of Congress (2003 & 2011), a portal is a tool or organized set of tools that assists in the identification, selection and discovery of knowledge and provides essential search engines, organized by metadata and taxonomies, over thematic content from different sources, some freely available and some commercial. The portal has become a very broad concept and can contain anything from a simple online catalog to complex intranet solutions. The common feature is that it is a starting point, providing an entry to the use of Internet services [20]. The first classification came from Schumacher and Schwickert (1999), taking into account the interests of users and the services offered. The year 2001 was the year in which most portals were focused on consumers. Zirpins et al. (2001) classified portals into two distinct classes, vertical portals and horizontal portals. Horizontal portals initially took on the role of a directory of the web, unlike vertical portals, which target a specific purpose and theme [1]. The most common vertical portals are enterprise information portals (EIP), also known as "corporate portals" [3], which serve as an Internet platform for customers, employees and suppliers. Another subclass of vertical portals is the intranet portal [20]. This type of portal brings together a set of applications that aim to serve all the functional units of a business enterprise. Finally, the authors refer to the recent emergence of so-called B2B (business-to-business) portals, which provide a virtual environment for exchanging business information between organizations. Nowadays portals are a key element in the process of change and in promoting economic growth and innovation [19].

3. PROPOSAL
Figure 1 presents the conceptual proposal. The user can be either an investor or an entrepreneur, who searches for the information needed using keywords. The search engine returns the "business concept". Each business concept is decomposed into a set of subjects.

Figure 1 - Conceptual Proposal

4. IMPLEMENTATION
As represented in Figure 2, a user who wants to know more about a specific industry searches within the portal search engine. The engine takes the keyword and formulates a query to multiple sources, such as databases, web information and what was learned from the searches of other users. The search engine then displays the output organized according to a taxonomy.

Figure 2 shows a possible scenario, in which the user searches for information about mobile communication in Portugal. The result is organized according to several subjects (or facets), such as the operators that supply communication services in the market, policy and law information, competition analysis, technical issues, and the stock market value of the main players.
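The flow described above (one keyword, several sources, output grouped by subject) can be sketched as follows. This is only an illustration under stated assumptions: the two source functions, their canned results and the facet names are hypothetical stand-ins for real database or web queries.

```python
from collections import defaultdict


# Hypothetical source adapters: each returns (facet, title) pairs for a keyword.
def search_database(keyword):
    return [
        ("Operators", f"{keyword}: operator list"),
        ("Stock market", f"{keyword}: share prices"),
    ]


def search_web(keyword):
    return [
        ("Policy and law", f"{keyword}: regulation news"),
        ("Operators", f"{keyword}: coverage maps"),
    ]


SOURCES = [search_database, search_web]


def aggregate(keyword, sources=SOURCES):
    """Query every source and group the results by facet (subject)."""
    by_facet = defaultdict(list)
    for source in sources:
        for facet, title in source(keyword):
            by_facet[facet].append(title)
    return dict(by_facet)


results = aggregate("mobile communication")
for facet in sorted(results):
    print(facet, "->", results[facet])
```

Keeping each source behind the same small interface means a new source can be added to the list without changing the grouping step.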

Figure 2 – Usage scenario for mobile communication market

Figure 3 gives an overview of the concept to be applied and relates user interaction to our conceptual model. The figure is divided into three layers: the usage layer, the processing layer, and the layer in which the elements of information aggregation are assembled.

The top layer of the figure represents the user's interaction with our platform. Here one question can be put simply: "A loyal user? How?" In this model we adopted a concept that applies to platforms supporting a large density of content: these platforms provide a simple search box on the homepage, which enhances the platform. Users appreciate simple search; several studies show that users want freedom of choice and control over their search, and that leading the user through a more detailed form limits the search. A common thought of a frequent user is "I do not want to search according to the solutions presented ...". In our conceptual model the user, in this case an investor or an entrepreneur, has a simple search box available that gives the freedom to search according to his or her ideas and needs. A simple text box is an escape route for when the items and links displayed are not sufficient. In short, giving the user the freedom to search with a simple text box allows users to focus freely on their interests, obtain the necessary information and increase their interaction with the platform. The intermediate layer of Figure 3 shows, in a simple way, the flow of information that starts from the user's use of the platform.

Figure 3 – Process Information Diagram

Figure 3 is a diagram that shows schematically the use and processing of information aggregated from different sources, such as web services, RSS feeds and RDF, among others. After indexing, the information returns to the user as output organized according to certain taxonomies, which afterwards is


Entity metadata gives meaning to a file that will be available to other users and can be used as a search source. Metadata items have a well-defined meaning and value. In this sense, all documents, data and nodes are easily identified and cataloged, since they contain landmarks or reference points that delimit the information in all its forms; these can be summaries of the form or the content of a source.

The Search API is a module that integrates search with the Views module, allowing any index to be created and displayed through views. All properties of an entity, as well as those of related entities (e.g., the author of a node), are available as fields, filters and arguments for all indexed fields, and types can be created with any indexed value in a single field. Implementing the Solr indexer increases performance and improves integration with Views.

Combining the applied technologies with faceted search is one of the most interesting aspects of our proposal, since it enables the search to evolve in line with our conceptual model, relating each business concept to its decomposed subjects. It extends the existing search module in the platform core and enables users to navigate its content in a natural way, making search easier without feeling "lost" in the data obtained; it is a feature that belongs to the area of information retrieval and supports discovery search. It exposes metadata in such a way that users can build, refine and extend their current query. The module automatically reflects the query, combines free-text search, avoids complex forms and never gives empty results, returning instead sets of results organized and cataloged by the taxonomies appropriate to the desired scenario. To keep the user's interest and focus, our approach implements auto-suggestion, also known as "auto-complete", in Lucene/Solr, either via facets or via NGramFilterFactory.

Auto-suggestion builds on global searches using EdgeNGrams: it displays suggestions in real time as the user types in the input box, immediately returning a refined list of suggestions. Each additional character further refines the suggested list. This feature involves the use of EdgeNGrams as part of the analysis.

As an example, computing the character-level and word-level unigrams, bigrams, trigrams and 4-grams of the word "entrepreneurship" gives the following n-grams:

• Character Level unigrams e | n | t | r | e | p | r | e | n | e | u | r | s | h | i | p

• Character Level bigrams en | nt | tr | re | ep | pr | re | en | ne | eu | ur | rs | sh | hi | ip

• Character Level trigrams ent | ntr | tre | rep | epr | pre | ren | ene | neu | eur | urs | rsh | shi | hip

• Character Level 4-grams entr | ntre | trep | repr | epre | pren | rene | eneu | neur | eurs | ursh | rshi | ship

• Word Level unigrams entrepreneurship

• Word Level bigrams (none, since the example is a single word)

• Word Level trigrams (none)

• Word Level 4-grams (none)
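The lists above can be reproduced with a short sketch. It assumes strict n-grams, meaning every gram has exactly n characters (or n words), so a word of length L yields L - n + 1 character n-grams; the function names are illustrative.

```python
def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def word_ngrams(phrase, n):
    """Return all contiguous word n-grams of `phrase`, in order."""
    tokens = phrase.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


WORD = "entrepreneurship"
for n in range(1, 5):
    print(n, " | ".join(char_ngrams(WORD, n)))

# A single word has one word-level unigram and no longer word n-grams.
print(word_ngrams(WORD, 1), word_ngrams(WORD, 2))
```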

The letters that compose the word are subject to a frequency analysis, giving the number of times each letter is used in the word: e → 4, h → 1, i → 1, n → 2, p → 2, r → 3, s → 1, t → 1, u → 1
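The same letter counts can be obtained with a few lines, shown here as a minimal sketch using the standard library counter.

```python
from collections import Counter

# Count how many times each letter occurs in the example word.
frequencies = Counter("entrepreneurship")
for letter in sorted(frequencies):
    print(letter, "->", frequencies[letter])
```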

N-grams are useful when you need to search for substrings of terms. An edge n-gram is an n-gram anchored at one edge of the term, either its beginning or its end.
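To illustrate how front-edge n-grams can support auto-suggestion, the sketch below indexes every prefix of each term and looks up what the user has typed so far. It is a plain in-memory illustration and does not call Solr; the vocabulary, the function names and the `min_gram` and `max_gram` parameters are assumptions made for the example.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the n-grams anchored at the front edge of `term` (its prefixes)."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_index(terms, min_gram=1, max_gram=20):
    """Map each front-edge n-gram to the set of terms that start with it."""
    index = defaultdict(set)
    for term in terms:
        for gram in edge_ngrams(term.lower(), min_gram, max_gram):
            index[gram].add(term)
    return index


def suggest(index, typed):
    """Return the indexed terms that start with what the user has typed."""
    return sorted(index.get(typed.lower(), ()))


# Hypothetical vocabulary for the portal (assumption for illustration).
TERMS = ["entrepreneur", "entrepreneurship", "enterprise", "investor"]
index = build_index(TERMS)

print(suggest(index, "ent"))
print(suggest(index, "entrepreneurs"))
```

Because the prefixes are computed once at indexing time, each keystroke needs only a single dictionary lookup, which is what makes suggestions feel immediate.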


5. CONCLUSIONS
In this paper we propose a model for information aggregation in an entrepreneurship portal. This model is supported by the concept of information facets, which implements a display of subjects.

This paper also presents the use of an open source search engine that can be integrated into a content management system. In this study it was possible to analyze the performance of the search engine.

Regarding future work, it is important to identify the main subjects and the links between search concepts. Searching algorithms may also be useful. We also intend to prototype an entrepreneurship portal using Drupal.

6. ACKNOWLEDGMENTS
This study was partially supported by FCT.


7. REFERENCES
[1] Boettcher, J. & Strauss, H. (2000) "What is a Portal, Anyway?", Syllabus, January 2000.

[2] Cantillon, R. (1755). Essai sur la nature du commerce en général.

[3] Choo, C. W., Detlor, B., & Turnbull, D. (2000). Information Seeking on the Web. Available at: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/729/638

[4] Choo, C. W. (2001) "Environmental scanning as information seeking and organizational learning." Information Research, 7(1). Available at: http://InformationR.net/ir/7-1/paper112.html

[5] Dejardin, M. (2000) "Entrepreneurship and Economic Growth: An Obvious Conjunction?" CREW, Faculty of Economics and Social Sciences, University of Namur. Available at: http://www.spea.indiana.edu/ids/pdfholder/IDSissn00-8.pdf Accessed on: 29/06/2011

[6] Eckhardt, J. & Shane, S. (2003) "Opportunities and Entrepreneurship", Journal of Management, 29 (3), 333-349, Elsevier Science Inc. Available at: http://faculty.weatherhead.case.edu/shane/or/OR2.pdf Accessed on: 29/06/2011

[7] Firestone, J. (2003) Enterprise Information Portals and Knowledge Management, Elsevier Science, Butterworth Heinemann.

[8] Kihlstrom, R., & Laffont, J. (1979). "A general equilibrium entrepreneurial theory of firm formation based on risk aversion." Journal of Political Economy, 87 (4): 719-748.

[9] Klein, A. (2003) "Stealing Time: Steve Case, Jerry Levin and the Collapse of AOL Time Warner".

[10] Berners-Lee, T. (1999) Weaving the Web: The Past, Present and Future of the World Wide Web by its Inventor, Orion, London.

[11] Berners-Lee, T., Hendler, J., Lassila, O. (2001) "The Semantic Web". Available at:

[12] Library of Congress (2003) "List of Portal Application Functionalities for the Library of Congress". Available at: http://www.loc.gov/catdir/lcpaig/portalfunctionalitieslist4publiccomment1st7-22-03revcomp.pdf Accessed on: 30/06/2011

[13] Library of Congress (2011) Glossary. Available at: http://www.loc.gov/acq/conser/glossary.html Accessed on: 29/06/2011

[14] Mas-Colell, A., Whinston, M., & Green, J. (1995). Microeconomic Theory. New York: Oxford University Press.

[15] Pearce, W. (1992). The MIT Dictionary of Modern Economics. Cambridge: MIT Press.

[16] Samuelson, P., & Nordhaus, W. (2004). Microeconomics (18th ed.). McGraw-Hill/Irwin.

[17] Schumpeter, J. A. (1942) Capitalism, Socialism and Democracy, New York: Harper and Row.

[18] Webopedia (2004), www.webopedia.com

[19] Wong, P., Ho, Y. & Autio, E. (2005) "Entrepreneurship, Innovation and Economic Growth: Evidence from GEM data", Small Business Economics, 24, 335-350, Springer. Available at: , accessed on: 29/06/2011

[20] Zirpins, C., Weinreich, H., Bartelt, A. & Lamersdorf, W. (2001) Advanced Concepts for Next Generation Portals. University of Hamburg, Department of Informatics, Distributed Systems Group (VSYS).

[21] Apache Software Foundation (2012) "Solr Tutorial". Available at: http://lucene.apache.org/solr/api/doc-files/tutorial.html

