Monday, October 18, 2010

Business intelligence is commonly abbreviated as BI.

The concept of business intelligence was first proposed in 1996. At that time, business intelligence was defined as a class of applications made up of data warehouse (or data mart), reporting, data analysis, data mining, and data backup and recovery components, whose purpose is to help businesses make decisions. Today, business intelligence is usually understood as a set of tools that turn existing enterprise data into knowledge and help companies make informed business decisions. The data in question includes orders, inventory, transaction accounts, and customer and supplier data from the enterprise's own business systems, data about the company's industry and competitors, and other data from the external business environment. The decisions that business intelligence supports can be at the operational level as well as at the tactical and strategic levels. Turning data into knowledge requires data warehouse, online analytical processing (OLAP), and data mining technologies.
Therefore, from a technical perspective, business intelligence is not a new technology; it is the integrated use of data warehousing, OLAP, and data mining technologies. It is more appropriate to regard business intelligence as a solution. The key to business intelligence is to extract useful data from the enterprise's many different operational systems, cleanse it to ensure its accuracy, and then, through the extraction, transformation, and load (ETL) process, merge it into an enterprise-level data warehouse to obtain a global view of enterprise data. On this basis, appropriate query and analysis tools, data mining tools, and OLAP tools are used to analyze and process the data (at which point information becomes knowledge that supports decisions), and finally the knowledge is presented to managers to support their decision-making.

Currently, BI products and solutions can be divided into data warehouse products, data extraction products, OLAP products, presentation products, and overall solutions for a particular application that integrate several of these products. Many companies are active in the BI field. To meet user needs, BI products and solutions must be built on a stable, integrated platform that provides user management, security control, connection and access to data sources, and functions for analyzing and sharing information. Standardization of the BI platform is also important because it bears on compatibility with the various applications and systems of the enterprise; if compatibility problems are not solved, a BI system cannot deliver its intended effect. Below, we introduce BI systems by dissecting the functions of a laboratory model of a BI system, which we call the D system.
The D system gives end users direct access to business data. It lets managers use business data from every angle, grasp the operational status of the organization in time, and make sound management decisions. From simple browsing of standard reports to advanced data analysis, the D system meets the needs of staff throughout the organization. It covers the functions of a BI system in the conventional sense; its main framework includes the following.

Reading data: the D system can read files in multiple formats (such as Excel, Access, tab-separated txt, and fixed-length txt), and can also read data from relational databases (through ODBC). On the basis of the data it has read, the D system can do the following.

Joining text files: using an item common to two CSV files as the key, the required data can be merged into one file. This is as easy as a database operation and requires no programming by the user.

Setting item types: besides button (text) items and value items, an item can be set as a date item for date data, as a multimedia item, or as a display-only item that does not generate buttons but can be viewed in the list.

Generating date items: from a date item, new items such as the year or the quarter can be generated, and a date can be combined with other items to generate a new item.

Setting levels: a value item can be divided into levels, and a button is generated for each level. For example, from an age item the system can generate buttons for the 20s age group, the 30s age group, and so on.

Association analysis: association analysis is mainly used to find correlations between different events, that is, cases where one event occurs and another event frequently occurs at the same time. Its focus is to quickly find the associated events that have practical value.
The main criterion is that the probability of the events and their conditional probability should reach a certain level of statistical significance. The D system presents association analysis in the form of buttons, and the user chooses between association and no association, and between positive and opposite association. For structured data such as customer buying habits, association analysis in the D system can reveal related customer purchase needs. For example, a customer who opens a savings account is likely to trade bonds and stocks as well. With this knowledge, a company can adopt a proactive marketing strategy, widen the range of products that customers buy, and attract more customers.

Ratio display: the D system can show the numerical proportions among data items through the size of the buttons, and can also reorder the display according to the values of other items. As buttons are selected, the display changes dynamically. This makes the data visually comparable and highlights differences, which helps in analyzing what lies behind the figures.

Monitoring: conditions are set in advance, and buttons that meet them show an alarm (red) or attention (yellow) signal, so that problems can be seen at a glance. For example, stores whose turnover last quarter was below 100 million are marked as a warning (yellow), and those below 50 million are marked as an alarm (red). After execution, the D system shows the store name buttons in the corresponding colors.

Combining buttons: several buttons can be combined into a new button. For example, the three buttons [April], [May], and [June] can be combined into a new button [2nd quarter].

Selection: from a large number of data records, the user selects buttons and extracts only the necessary data. The extracted data can be rebuilt into the same operating environment, so the user can concentrate on the data of interest.
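The probability and conditional probability criteria behind association analysis are usually called support and confidence. The following is a minimal sketch of that idea, not the D system's actual implementation; the function name `association_rules`, the thresholds, and the customer data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Find pairwise rules "antecedent -> consequent".

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair occurs too rarely to be of practical value
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical data: the products held by each of five customers.
customers = [
    {"savings", "bonds", "stocks"},
    {"savings", "bonds"},
    {"savings", "stocks"},
    {"savings"},
    {"bonds", "stocks"},
]

rules = association_rules(customers, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Only item pairs are considered here to keep the sketch short; practical association mining also handles larger item sets and usually applies further significance measures.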
Multimedia information: photos or video files from a digital camera, images input through a scanner, other multimedia files, reports made with word processing or spreadsheet software, and HTML and other files are stored in standard form and can be found through buttons.

Split function: data can be split by button class as the situation requires, and the processing already registered in the settings continues to be carried out for each split part.

Program call function: the data extracted through buttons can be passed to other software or to the user's own programs, which are then executed.

Finding buttons by name: buttons can be searched by name, using either exact or fuzzy matching. The search result can also be used to narrow down the related data in other button classes.

List screen: on the list screen, search conditions can be combined with AND/OR and changed, and the data can be aggregated and sorted. Statistics apply only to value items, with three methods: total, count, and average. The values can be shown in 12 display formats.

View screen: the view screen provides switching between views and, through condition settings, changes the color of the matching values (cells) for emphasis. After switching views, a wide range of data analysis can be carried out. View statistics also apply only to value items. There are 12 statistical methods, including total, average, proportion (vertical and horizontal), cumulative (vertical and horizontal), weighted average, maximum, minimum, and latest.

Drilling: by switching the button classes of the view level by level (rows and columns can each be set up to 8 levels deep), the user can drill down from the whole to the parts while analyzing the data, and so explore a problem more clearly.

Charts: the D system uses its own graphics library to provide five main categories of chart (column charts, line charts, pie charts, area charts, and combined column and line charts) in 35 varieties.
On the chart screen, as on the view screen, the user can freely drill down through the levels, return, and carry out other operations.

Data output: the list screen and charts can be printed, and the analyzed data can be output to other applications or saved in HTML format.

Routine processing: once the required output is displayed, the steps taken can be recorded and a routine processing button generated automatically. Afterwards, pressing this button is enough to display the desired list, view, or chart, however complex the operation.

As a business intelligence application, the D system can help establish an information center, for example by generating reports and analyses of all kinds. It supports the following kinds of analysis.

Sales analysis: this mainly analyzes the major sales indicators, such as gross profit, gross margin, cross ratio, sales-to-purchase ratio, profitability, turnover rate, and year-over-year and period-over-period comparisons. The analysis can be made along dimensions such as management structure, category and brand, date, and time of day; multi-dimensional analysis and multi-level drill-down yield very thorough analytical insight. The mass of data can also be used to produce forecast information, alarm information, and other analytical data, and a variety of new pivot tables can be generated from the sales indicators.

Commodity analysis: the data comes mainly from commodity sales data and basic commodity data, and the analysis is organized around structure. It covers the category structure, brand structure, price structure, gross margin structure, settlement structure, and origin structure of the commodities, and produces indicators such as product breadth, product depth, elimination rate, introduction rate, replacement rate, key commodities, best-selling commodities, unsalable commodities, and seasonal commodities.
Through the analysis of these indicators, the D system guides the adjustment of the company's product structure, strengthening the competitiveness of its commodities and making their allocation more reasonable.

Personnel analysis: the D system analyzes the company's personnel indicators, especially sales-related indicators (mainly sales and gross margin indicators, supplemented by the number of goods purchased and sold outright, the number of consignment goods, funds used, cash flow, and so on), in order to assess employee performance, improve staff motivation, and provide a sound basis for the rational use of human resources. The main themes are the composition of staff, per capita sales of sales staff, individual sales performance, per capita sales by management structure, and gross profit contribution; for procurement staff, the themes are the quantity of purchases each is responsible for, the proportion of outright purchase to consignment, and how well the goods they introduced are selling.
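Several of the sales indicators mentioned above are simple ratios. As a rough sketch (the function names and all figures are hypothetical and are not taken from the D system), gross margin and year-over-year or period-over-period growth could be computed like this:

```python
def gross_profit(revenue, cost):
    """Gross profit is revenue minus the cost of the goods sold."""
    return revenue - cost


def gross_margin(revenue, cost):
    """Gross margin is gross profit as a fraction of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return gross_profit(revenue, cost) / revenue


def growth_rate(current, previous):
    """Relative change of `current` with respect to `previous`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


# Hypothetical monthly sales figures, keyed by "YYYY-MM".
sales = {"2009-10": 100.0, "2010-09": 110.0, "2010-10": 120.0}

# Year-over-year: the same month of the previous year.
year_over_year = growth_rate(sales["2010-10"], sales["2009-10"])
# Period-over-period: the immediately preceding month.
month_over_month = growth_rate(sales["2010-10"], sales["2010-09"])

print(f"gross margin: {gross_margin(200.0, 150.0):.1%}")
print(f"year-over-year growth: {year_over_year:.1%}")
print(f"month-over-month growth: {month_over_month:.1%}")
```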

Business intelligence tools can be grouped into the following categories.

End-user query and reporting tools. These are designed to give ordinary users access to raw data; they do not include report generation tools that produce finished reports for professionals.

OLAP tools. These provide a multidimensional data management environment. Typical applications are modeling business problems and analyzing business data. OLAP is also known as multidimensional analysis.

Data mining software. This uses technologies such as neural networks and rule induction to discover relationships in the data and make inferences based on the data.

Data warehouse and data mart products. These include pre-configured software for data conversion, management, access, and so on, and usually include some business models, such as financial analysis models.

The concept of online analytical processing (OLAP) was first put forward in 1993 by E. F. Codd, the father of the relational database, who proposed 12 criteria for OLAP. The proposal drew a great response, and OLAP became a class of products distinct from online transaction processing (OLTP). Data processing today can be divided into two broad categories: online transaction processing (OLTP, On-Line Transaction Processing) and online analytical processing (OLAP, On-Line Analytical Processing). OLTP is the main application of traditional relational databases and handles basic, routine transactions, such as bank transactions. OLAP is the main application of data warehouse systems; it supports complex analytical operations, focuses on decision support, and provides intuitive query results. OLAP is a class of software technology that gives analysts, managers, and executives fast, consistent, interactive access to information from multiple perspectives, so that they gain a better understanding of the data.
The goal of OLAP is to meet the needs of decision support, or of specific multidimensional query and reporting, and its core is the concept of the dimension. By defining several important attributes of an entity as dimensions, OLAP lets users compare data across different dimensions. OLAP can therefore be described as a collection of multidimensional data analysis tools.

The basic operations of OLAP multidimensional analysis are drilling (roll up and drill down), slice, dice, and pivot, as well as drill across, drill through, and so on. Drilling changes the level of a dimension, that is, the granularity of the analysis. It includes roll up and drill down. Roll up summarizes low-level detail data along a dimension into high-level summary data, or reduces the number of dimensions; drill down does the opposite, going from summary data into the detail data or adding new dimensions. Slice and dice select values on some of the dimensions and look at how the measures are distributed over the remaining dimensions. If two dimensions remain, the result is a slice; if three remain, it is a dice. Pivot changes the orientation of the dimensions, that is, rearranges the placement of the dimensions in the table (for example, swapping rows and columns).

OLAP can be implemented in many ways. According to how the data is stored, implementations are divided into ROLAP, MOLAP, and HOLAP. ROLAP (Relational OLAP) is an OLAP implementation based on a relational database. With the relational database at its core, it uses relational structures to represent and store multidimensional data. ROLAP divides the multidimensional structure into two kinds of table: fact tables, which store the data and the dimension keys, and dimension tables, with at least one table per dimension, which store descriptive information such as the levels of the dimension and the categories of its members.
Dimension tables and fact tables are linked through primary keys and foreign keys to form a star schema. For dimensions with complex hierarchies, several tables can be used to describe one dimension in order to avoid the large storage space taken up by redundant data; this extension of the star schema is known as the snowflake schema.

MOLAP (Multidimensional OLAP) is an OLAP implementation based on multidimensional data organization. With multidimensional data organization at its core, MOLAP uses multidimensional arrays to store data, and the stored multidimensional data forms a cube structure. HOLAP is an OLAP implementation based on hybrid data organization; this approach is more flexible. There are other ways to implement OLAP as well, such as providing a dedicated SQL server that gives special support to SQL queries for certain storage schemas (such as the star schema and the snowflake schema).

OLAP tools provide online data access and analysis for specific problems, through multidimensional analysis, query, and reporting. A dimension is a particular angle from which people observe data. For example, a business considering its product sales usually looks at them from the perspectives of time, region, and product. Here time, region, and product are the dimensions. A combination of these dimensions, together with the measures being studied, forms a multidimensional array, which is the basis of OLAP analysis. It can be formalized as (dimension 1, dimension 2, ..., dimension n, measure), for example (region, time, product, sales). Multidimensional analysis applies operations such as slice, dice, drill (drill down and roll up), and pivot to data organized in multidimensional form, so that users can observe the data in the database from many angles and sides and gain a deep understanding of the information it contains. Mainstream business intelligence tools include BO, COGNOS, and BRIO.
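The multidimensional operations described above can be sketched on a tiny in-memory fact table. This is only an illustration in plain Python, not how any particular OLAP product works; the data, the dimension names, and the helper functions are all hypothetical. Note that `slice_` and `dice` follow the common convention in which a slice fixes one dimension to a single value and a dice restricts several dimensions at once.

```python
from collections import defaultdict

# Hypothetical fact rows in the form (region, month, product, sales).
DIMS = ("region", "month", "product")
FACTS = [
    ("North", "2003-04", "Coke", 10),
    ("North", "2003-04", "Juice", 4),
    ("North", "2003-05", "Coke", 12),
    ("South", "2003-04", "Coke", 7),
    ("South", "2003-05", "Juice", 5),
]


def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Restrict several dimensions to sets of allowed values."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]


def pivot(table):
    """Swap the two keys of a two-dimensional summary (rows <-> columns)."""
    return {(b, a): v for (a, b), v in table.items()}


# Roll up to one dimension, then drill down by adding a second one.
by_region = roll_up(FACTS, ["region"])
by_region_product = roll_up(FACTS, ["region", "product"])

april = slice_(FACTS, "month", "2003-04")
north_coke = dice(FACTS, region={"North"}, product={"Coke"})
by_product_region = pivot(by_region_product)

print(by_region)
print(by_region_product)
print(by_product_region)
```

Drill down is expressed here simply as calling `roll_up` with one more dimension in `keep`; a real OLAP engine would precompute or index these aggregates rather than rescan the fact rows each time.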
Some domestic software platforms, such as KCOM, also incorporate some basic business intelligence tools. According to how the summarized data is organized, the two common kinds of OLAP today are MOLAP, based on multidimensional databases, and ROLAP, based on relational databases. MOLAP organizes and stores data in multidimensional form, while ROLAP uses existing relational database technology to simulate multidimensional data. In data warehouse applications, OLAP tools generally serve as the front end of the data warehouse, and they can also be used together with data mining tools and statistical analysis tools to strengthen decision analysis.

After several years of accumulation, most medium and large enterprises and organizations have established fairly complete basic information systems such as CRM, ERP, and OA. These systems share one characteristic: through business operations by staff or users, they ultimately add, modify, or delete records in a database. Such systems can be collectively referred to as OLTP (Online Transaction Processing) systems. After running for some time, they inevitably help the enterprise collect a large amount of historical data. However, the data scattered across isolated databases is like an undecipherable book to business people. What business people need is information: abstracted information that they can read, understand, and benefit from. How to turn data into information, so that business people (including managers) can fully grasp and use it to support decisions, is the main problem that business intelligence solves.

How can the data in the databases be turned into the information that business people need? The most common answer is a reporting system. Put simply, a reporting system can already be called BI; it is a low-end implementation of BI.
Most foreign companies have now entered mid-range BI, known as data analysis. Some enterprises have begun to enter high-end BI, known as data mining. Most enterprises in China, however, still remain at the reporting stage.

Data reports cannot be replaced

Traditional reporting system technology is mature, and familiar tools such as Excel, Crystal Reports, and Reporting Services are widely used. However, as the volume of data grows and demands increase, traditional reporting systems face more and more challenges.

1. Too much data, too little information. Faced with a dense report piled with data, how many business people actually look closely at every figure? What information and what trends do these figures represent? The higher the level of leadership, the more concise the information needs to be. If I were chairman of the board, I might need only one sentence: is our current situation good, average, or bad?

2. Difficult to analyze interactively and to explore combinations. Custom reports are too rigid. For example, one table can list sales of different products in different regions, and another can list sales to customers of different ages in different regions. But these two tables cannot answer a question that combines the dimensions of both.

3. Difficult to mine underlying rules. Reports show the data on the surface, but what rules lie hidden deep in the massive amounts of data? Which customers are worth the most to us? Which products are related to each other, and to what extent? The deeper the rule, the greater its value for decision support, but the harder it is to dig out.

4. Difficult to trace history; data forms isolated islands. There are many business systems, and their data is kept in different places. Data that is too old (for example, more than a year old) is often backed up and removed from the business system, which makes macro-level analysis and long-term historical analysis very difficult.

As times change, therefore, traditional reporting systems can no longer meet growing business needs, and companies are looking forward to new technology.
The era of data analysis and data mining is coming. It should be noted that data analysis and data mining systems are meant to bring more valuable decision support, not to replace data reports. Reporting systems still have advantages that cannot be replaced, and will coexist with data analysis and data mining systems for a long time.

Eight-dimensional data analysis

If OLTP focuses on routine operations such as adding, modifying, and deleting data in a database, OLAP (Online Analytical Processing) focuses on comprehensive analysis of the data for macro-level questions, in order to obtain valuable information. To achieve the purpose of OLAP, the traditional relational database is no longer enough, and a new technology called the multidimensional database is needed.

The concept of a multidimensional database is not complicated. As an example, suppose we want to describe the fact that sales of Coke in the northern region in April 2003 were 10 million. This involves several aspects: time, product, and region. These are called dimensions. Sales is called a measure; cost and profit are measures too. Besides time, product, and region, there can be many other dimensions, such as customer gender, occupation, sales staff, and promotions. In practice, a multidimensional database may hold an 8-dimensional or 15-dimensional cube. Although the structure of a 15-dimensional cube is very complex, it is conceptually very simple.

The overall structure of a data analysis system is divided into four parts: source systems, data warehouse, multidimensional database, and client.

* Source systems: these include all existing OLTP systems. Building a BI system does not require changing the existing systems.



* Data warehouse: this is where data is centralized. Through data extraction, data is pulled from the source systems on a regular basis, perhaps once a day or once every 3 hours, and of course automatically. The data warehouse is still built on a relational database, and is often organized by subject, such as sales, inventory, or finance.

* Client: good client software can present the information in the multidimensional cube to the user in a rich variety of ways.

A data analysis case:



In the actual case, we used Oracle 9i to build the data warehouse, Microsoft Analysis Services 2000 to build the multidimensional database, and ProClarity 6.0 as the client analysis software.

A decomposition tree looks like an organization chart. The decomposition tree answers questions such as the following:

* Where are sales highest?
* Within a specific product category, how are sales distributed among the various products?
* Which sales staff account for the highest percentage of sales?

In Figure 1, PC sales in the various regions and their percentages can be seen at a glance. Any level of the decomposition tree can be expanded freely along different dimensions. In this decomposition tree, the region level is expanded by country, and the country level is expanded by product.

The projection chart (Figure 3) uses a scatter plot format to show the relationship between two or three measures. A concentration of data points indicates a strong correlation between two variables, while sparsely distributed data points may indicate no obvious relationship. The projection chart is well suited to analyzing large amounts of data and is effective at showing causal relationships; exceptional data points, for example, can be singled out for further study.

Data mining: seeing through your data

In a broad sense, any process of mining information from a database is called data mining. From this point of view, data mining is BI. As a technical term, however, data mining has a specific meaning: the source data is cleansed and transformed into a data set suitable for mining; data mining extracts knowledge from this data set of fixed form; and finally the knowledge is expressed in a suitable model for use in further analysis and decision-making. From this narrow point of view, data mining can be defined as the process of refining knowledge from a data set of a particular form.
Data mining usually targets specific data and a specific problem, and selects one or more mining algorithms to find the rules hidden beneath the data. These rules are often used for prediction and decision support.

Compared with DSS and EIS systems, business intelligence has better prospects for development. In recent years the business intelligence market has continued to grow. IDC forecasts that by 2005 the BI market will reach $11.8 billion, with an average annual growth rate of 27% (Information Access Tools Market Forecast and Analysis: 2001-2005, IDC #24779, June 2001). With the introduction of enterprise CRM, ERP, SCM, and similar systems, companies no longer stop at transaction processing; their need to make effective use of enterprise data to support accurate and faster decisions is becoming more and more intense, and the resulting demand for business intelligence will be enormous.

The development trends of business intelligence can be summarized as follows.

Functionality is becoming configurable, flexible, and changeable. The scope of BI systems is extending from serving the particular users of one department to serving all users throughout the enterprise. At the same time, because business users differ in their demands, BI systems provide a wide range of targeted functions, moving from simple data acquisition to rich interaction over the Web, LAN, and WAN, and to the analysis and use of decision information and knowledge.
Solutions are becoming more open, extensible, and customizable by the user. While keeping the core technology intact, they provide customizable interfaces. For the unique needs of different enterprises, the BI system provides the core technology and lets each system have its own character: customers can add their own code and solutions on top of the original program, enhance the customer interface, and extend the functions. In this way vendors can provide a customizable business intelligence platform.

Moving from standalone business intelligence to embedded business intelligence is another major trend. Business intelligence components are now being embedded in existing enterprise applications, such as finance, human resources, and sales systems, so that ordinary transaction processing systems gain business intelligence features. Building a BI component into a system, rather than a complete BI system, is not a simple matter. Even when a technology such as OLAP is applied to a single application, a fairly complete business intelligence development process is indispensable, including business problem analysis, solution design, prototype development, and system deployment.

Functions are moving from traditional to enhanced. Enhanced business intelligence functions are defined relative to earlier business intelligence functions implemented with SQL query tools. Besides the functions of a traditional BI system, most current BI applications have implemented the functions of the data analysis layer shown in Figure 2. Data mining and business modeling are the areas that BI applications should strengthen in order to further improve system performance.

What is PDI (Kettle)? What can PDI (Kettle) do?

What is PDI (Kettle)?

PDI (Kettle) is an open source, metadata-driven ETL (extract, transform, load) tool, and one of the more powerful open source ETL tools.

PDI stands for Pentaho Data Integration. Kettle is the former name of PDI. The name literally means a kettle, and is meant to suggest the flow of data.

Kettle's main author is Matt, who started the project in 2003; the earliest dates that can be seen in the PDI code are from around April 2003. Starting from version 2.2, the Kettle project entered the open source domain under the LGPL license.

In 2006 Kettle joined the open source BI organization Pentaho and was officially named PDI. Since joining Pentaho, Kettle has developed faster and faster, and more and more people have taken an interest in it.

What can PDI (Kettle) do?

PDI can be used wherever there is a need for data integration, conversion, or migration. It takes the place of hand-written code for data conversion tasks and reduces the development effort.
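To give a sense of the hand-written conversion code that an ETL tool takes over, here is a minimal extract, transform, and load sketch in plain Python. It does not use PDI or any PDI API; the CSV content, the `orders` table, and the cleansing rule are all hypothetical, and only the standard `csv` and `sqlite3` modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical source data, standing in for an export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,100.50
2,BOB,not-a-number
3,carol,75
"""


def extract(text):
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse the rows and convert the values to proper types."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing rule (an assumption): drop non-numeric amounts
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)

loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(loaded)
print(total)
```

In a tool such as PDI, each of these three steps would be configured rather than coded, which is where the reduction in development effort comes from.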
