65 articles in this selection
|
|
| 2011/09/02 Hadoops Everywhere
We don’t pay enough attention to Hadoop. By “we” I mean DBAs; the rest of the world is paying plenty of attention to Hadoop. Recently, I started asking my customers and fellow DBAs about Hadoop adoption in their companies. It turns out that many of them have Hadoop. Hadoop shows up in large companies and small ones, in established industries and in startups. It’s everywhere. The way Hadoop shows up in all companies, and the way DBAs don’t pay Hadoop much attention, reminds me a lot of how MySQL started showing up in the enterprise. It didn’t start by DBAs showing up one morning and telling their managers: “There’s this new open source database. It’s not as stable as Oracle and it doesn’t have all the features we need, but man – it’s going to save us tons of money, and it’s pretty simple to manage.”...
| |
|
|
|
|
| 2010/11/13 Announcing Google Refine 2.0, a power tool for data wranglers
Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions....
| |
|
|
| 2010/10/17 Extending Business Intelligence with Graph Analytics
A wide range of tools is available with which users can analyze their data stored in data warehouses and production databases. These tools range from straightforward reporting tools via interactive online analytical processing tools to advanced statistical tools. All these tools help users in some way to improve their business operations and business decisions. They help by presenting data in a textual or graphical way by summarizing data, by grouping data, or by making predictions. But there are things most of these tools can’t do, and that is analyze data when it’s structured as a graph or network and when that data must be analyzed by traversing the graph....
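As a rough illustration of what "analyzing by traversing the graph" means, here is a minimal sketch in Python. It is not taken from the article: the graph, the node names, and the helper `reachable_within` are all hypothetical, and the traversal shown is a plain breadth-first search over an in-memory adjacency list.

```python
from collections import deque


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: map each node reachable from `start`
    in at most `max_hops` steps to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Hypothetical "who is connected to whom" network as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}

print(reachable_within(graph, "alice", 2))
```

A question like "who is within two hops of alice" is awkward to express with grouping and summarizing alone, because each extra hop would need another self-join in a relational query.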
| |
|
|
| 2010/09/29 A Truly ID-iotic Design
There are those who believe that each and every table in a relational database must have a "uniquifier" Primary Key (IDENTITY, SEQUENCE, GUID, and so on) and there are those who actually know how to design a relational database. As Dave points out, the trouble comes when the former group is actually allowed to design databases....
| |
|
|
| 2010/07/30 The sense and nonsense of NoSQL | Webwereld
SQL is the de facto standard when it comes to application databases. But the usefulness of relational databases is increasingly being questioned, in favor of so-called 'NoSQL'.
| |
|
|
|
|
| 2010/06/03 Nederlandse Voornamenbank
The Nederlandse Voornamen Databank (Dutch first names database) provides information on 500,000 different first names that occur in the Netherlands. The file is based on the first names that were registered as a first and/or subsequent given name in the Gemeentelijke Basisadministratie (the municipal personal records database) around 1 July 2006, supplemented with more recent first names obtained from the Sociale Verzekeringsbank. In addition, a number of first names are included that are mentioned in Prisma Voornamen (compiled by J. van der Schaar, edited by Doreen Gerritzen) but are no longer borne in the Netherlands....
| |
|
|
|
|
| 2009/09/29 Dialects in the Netherlands
Examples of dialects in the Netherlands, accessible via markers on a map. Interesting.
| |
|
|
| 2009/06/16 Data warehouse appliance
In computing, a data warehouse appliance consists of an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing (DW). Alternatively, the term can also apply to similar software-only systems[1] — purportedly very easy to install on specific recommended hardware configurations.[2] DW appliances provide solutions for the mid-to-large volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range....
| |
|
|
|
|
|
|
| 2009/06/16 ScaleDB
ScaleDB is a storage engine that plugs into MySQL, enabling it to deliver enterprise-class scalability that rivals far more expensive commercial databases. By leveraging clusters of low-cost commodity servers and eliminating manpower-intensive data partitioning (or sharding), ScaleDB delivers a far superior total cost of ownership (TCO)....
| |
|
|
| 2009/06/16 Google Fusion Tables (Beta)
Upload small or large data sets from spreadsheets or CSV files. Visualize your data on maps, timelines and charts. Pick who can access your data; hide parts of your data if needed. Merge data from multiple tables. Discuss your data with others. Track changes and discussions....
| |
|
|
| 2009/06/16 Google sends database into the clouds
Google has released an experimental version of Fusion Tables, a radically different approach to database management, according to analysts.
| |
|
|
| 2009/06/09 The future of data marts
Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC).
| |
|
|
| 2009/06/07 Inverted index
In information technology, an inverted index (also referred to as postings file or inverted file) is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents, in this case allowing full text search. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems.[1] Several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204. There are two main variants of inverted indexes: A record level inverted index (or inverted file index or just inverted file) contains a list of references to documents for each word. A word level inverted index (or full inverted index or inverted list) additionally contains the positions of each word within a document.[2] The latter form offers more functionality (like phrase sear...
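The two variants described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the named systems implement it: the tokenizer, the sample documents, and the function names (`build_record_level_index`, `build_word_level_index`, `phrase_search`) are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_record_level_index(docs):
    """Record-level index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def build_word_level_index(docs):
    """Word-level index: word -> {doc id -> list of word positions}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def phrase_search(word_index, phrase):
    """Return ids of documents in which the words of `phrase` occur consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    matches = set()
    for doc_id, positions in word_index.get(words[0], {}).items():
        for start in positions:
            # Every following word must sit exactly `offset` positions later.
            if all(
                start + offset in word_index.get(word, {}).get(doc_id, [])
                for offset, word in enumerate(words[1:], start=1)
            ):
                matches.add(doc_id)
                break
    return matches


# Hypothetical sample documents.
docs = {1: "it is what it is", 2: "what is it", 3: "it is a banana"}
record_index = build_record_level_index(docs)
word_index = build_word_level_index(docs)

print(sorted(record_index["is"]))
print(sorted(phrase_search(word_index, "what is it")))
```

The record-level index can answer "which documents contain this word", while the stored positions in the word-level index are what make the phrase query possible, at the cost of a larger index.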
| |
|
|
| 2009/06/07 illuminate Systems' iLuminate May Be the Most Flexible Analytical Database Ever
OK, I freely admit I’m fascinated by offbeat database engines. Maybe there is a support group for this. In any event, the highlight of my brief visit to the DAMA International Symposium and Wilshire Meta-Data Conference last month was a presentation by Joe Foley of illuminate Solutions, which marked the U.S. launch of his company’s iLuminate analytical database....
| |
|
|
| 2009/06/05 Apache CouchDB: The CouchDB Project
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language....
| |
|
|
|
|
| 2009/04/30 2 Petabyte PostgreSQL
A rather unexpected company helped us get pgCon started off with a bang: Yahoo! Last night, they threw a night-before party for the conference, gave away a Wii, and announced a 2-petabyte database built with a modified version of PostgreSQL.
| |
|
|
| 2009/04/30 Aster nCluster Builds on Open Source PostgreSQL
NCluster starts with PostgreSQL. According to Mayank Bawa, CEO and co-founder of Aster Data Systems, nCluster uses PostgreSQL as a data store on each node of a hardware cluster. Aster-built distributed database technology coordinates the nodes to deliver shared-nothing, parallelized database processing (MPP). According to Bawa, nCluster relies on "a series of patent-pending algorithms and processes that optimize the placement, partitioning, balancing, replication, and querying across a cluster of intelligent nodes." Bawa calls PostgreSQL "a very stable foundation/abstraction on which we build our algorithms."...
| |
|
|
| 2009/04/30 Web Analytics Databases Get Even Larger
Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum (user data) to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous: 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all, Yahoo has over a petabyte (on a homegrown system), and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff....
| |
|
|
| 2009/04/28 Getting started with ADAPT
In this white paper we discuss logical vs. physical data modeling, review currently available modeling techniques and why they are not appropriate for OLAP, and introduce the basic elements of ADAPT™ (Application Design for Analytical Processing Technologies). We explain each of the nine ADAPT database objects and their symbols and illustrate how to use the symbols with simple examples....
| |
|
|
|
|
|
|
|
|
| 2009/03/10 The magic of the cube is fading
The success of the cube has prompted many relational database vendors to buy in cube technology. Whereas the cube was initially a separate product in the database vendors' portfolios, cube technology is now being integrated ever more deeply into the relational database. In addition, functionality for high-performance analysis of relational data is increasingly being implemented in the database itself. The boundaries between relational and dimensional technology are blurring as a result, and choosing between the two is becoming ever harder. This article summarizes the main developments around the cube and offers guidance for choosing between a relational and a multidimensional solution....
| |
|
|
| 2009/03/03 Data Warehouse Appliance from Dataupia
Dataupia runs on Linux (which you don't touch) and plugs into the top layer of the Oracle (or DB2 or SQL Server) execution engine and then Dataupia takes over execution of database primitives (parse, joins and so forth) and storage management (which you will no longer need to administer) from the database....
| |
|
|
| 2009/03/03 Vehicle data
The RDW registers APK (periodic vehicle inspection) data and technical and other data on vehicles. On this page you can look up the data for vehicles with a Dutch license plate.
| |
|
|
| 2009/03/01 Oracle SQL Developer
Oracle SQL Developer is a free and fully supported graphical tool for database development. With SQL Developer, you can browse database objects, run SQL statements and SQL scripts, and edit and debug PL/SQL statements. You can also run any number of provided reports, as well as create and save your own. SQL Developer enhances productivity and simplifies your database development tasks....
| |
|
|
| 2009/02/17 Open Source Data Warehousing? :: Tholis Consulting
The economic downturn is causing an increasing interest in using open source (OS) solutions for BI. One of my previous blogposts already raised the issue of the missing pieces in open source analytical databases, but nevertheless more and more companies are using OS databases for data warehouse purposes....
| |
|
|
| 2009/02/16 The OLAP Report: Project Gemini — Microsoft’s Brilliant OLAP Trojan Horse
Project Gemini was announced at the second annual Microsoft BI Conference in Seattle on October 6, 2008, and there is no doubt that it was the highlight of the three-day conference. The Gemini code-name is meant to imply that Excel and Analysis Services, and end-users and IT, will be twinned using the new product, but our pun (‘BOTH’) reflects the view that Gemini appears to be both a brilliant way of introducing Analysis Services to Excel users and some rather smart technology....
| |
|
|
| 2009/02/13 neo4j open source graph database »
Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables. A graph (mathematical lingo for a network) is a flexible data structure that allows a more agile and rapid style of development....
| |
|
|
| 2009/02/13 Is the Relational Database Doomed? - ReadWriteWeb
Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, "if you want vast, on-demand scalability, you need a non-relational database". If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database....
| |
|
|
|
|
| 2009/02/06 Infochimps.org: Free Redistributable Data Sets of Every Kind
The infochimps.org community is assembling and interconnecting the world's best repository for raw data -- a sort of giant free almanac, with tables on everything you can put in a table. Built by data nerds, used by data nerds, it's a central source for the information you need to power the projects the world needs....
| |
|
|
|
|
| 2009/02/06 EXASOL AG: EXASolution - taking business intelligence to a new level
The EXASOL concept: intelligent software instead of expensive hardware! Right from the start, EXASolution was developed to meet the needs of data warehousing environments. It is based on standard servers that operate in parallel and offer good value for money. Compared with conventional relational database systems, the revolutionary in-memory technology improves the performance of ad hoc queries by up to a factor of 100....
| |
|
|
|
|
|
|
| 2009/02/06 TPC-H
The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size....
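The relationship between these metrics can be sketched as a small calculation. To my understanding of the TPC-H specification, the composite QphH@Size is the geometric mean of the single-stream power metric and the multi-stream throughput metric, and the price/performance figure divides the total system price by that composite. The function names and all input numbers below are made up for illustration and are not published results.

```python
import math


def composite_qphh(power_at_size, throughput_at_size):
    """Composite metric: geometric mean of the power and throughput metrics
    (assumed from the TPC-H specification, see the note above)."""
    return math.sqrt(power_at_size * throughput_at_size)


def price_per_qphh(total_system_price, qphh_at_size):
    """Price/performance metric: system price per composite query-per-hour."""
    return total_system_price / qphh_at_size


# Hypothetical inputs, not taken from any published benchmark result.
qphh = composite_qphh(power_at_size=400.0, throughput_at_size=100.0)
print(qphh)
print(price_per_qphh(total_system_price=50_000.0, qphh_at_size=qphh))
```

Because the composite is a geometric mean, a system cannot score well by excelling only at single-stream queries or only at concurrent throughput.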
| |
|
|
| 2009/02/06 Kickfire: Data Analytics for the Masses - ReadWriteWeb
You may not realize it, but the data analytics market is buzzing. There are new vendors emerging, new products popping up, new deals being done, and several new strategies being pursued. Vendors are predominantly chasing big data, with battle lines being drawn by solution providers that cater to between roughly 100 TB and 10 PB data sets. The battle was inevitable because the world is producing data at a phenomenal rate, and we have an increasing need to analyze them within shorter time frames. In this post we analyze one of these vendors, Kickfire....
| |
|
|
| 2009/02/06 Infobright
Designed for analytics, ICE is easy to use, simple to manage and ideal for data volumes up to 30 TB and more. ICE combines a column-oriented database with a unique Knowledge Grid architecture to eliminate the complexity of data warehousing.
| |
|
|
| 2009/02/06 Magic Quadrant for Data Warehouse Database Management Systems
The data warehouse DBMS market is expanding at a record pace with new vendors, new offerings and high growth. We discuss this, the growth of appliance offerings and how data warehouse DBMS software-only vendors are responding with enhanced functionality and low-cost, market-entry solutions....
| |
|
|
| 2009/02/06 BeyeNETWORK: A Brief Review of Indicator and Flag Columns, Part 2
In Part 1 of this article, we looked at the basic characteristics of what are known as indicator or flag columns in databases. These are columns that, at least ideally, should record a value of true, false, or indeterminate for an attribute whose definition can be stated as a proposition that can be affirmed or denied. Let us now look a little further into how these columns relate to business rules and to metadata....
| |
|
|
| 2009/02/06 BeyeNETWORK: A Brief Review of Indicator and Flag Columns
In many entities, but particularly in master data entities, it is not uncommon to see attributes that are generally known as “indicators” or “flags.” These become columns in physical databases, where they are also known as “indicators” or “flags.” They seem to be often passed over as rather simple, but they are worth thinking about. Unfortunately, not a lot is written about indicator columns, so what follows is my personal view of them based on my particular experience, and I freely admit there may be other perspectives that are worth considering....
| |
|
|
| 2009/02/06 Kimball Forum
Discussion Group for the Dimensional Data Warehouse and Business Intelligence Community
| |
|
|
| 2009/01/04 Data Modeling: Reality Requires Supertypes and Subtypes (PDF)
In this paper, we’ll examine the underutilized technique of entity supertyping and subtyping (also known as generalization hierarchies or inheritance). Our goal is to answer two key questions: 1. Where, when, and why should we supertype and subtype our entities? 2. How can we generate DDL that truly implements what has been modeled?...
| |
|
|
| 2009/01/04 Understanding Hidden Subtypes
Subtypes can be found in many data models. They occur when an entity (the supertype) consists of other entities (subtypes) that exist at lower levels of abstraction and have their own particular attributes.
| |
|
|
| 2008/11/30 Rory - Neopoleon : Excel as a database
As a developer, you've probably, at some unfortunate point in your life (possibly several points, actually), been handed an Excel file that has been crammed full of "data" by someone in marketing and told to "do something with it."
| |
|
|
| 2008/11/18 SQL Injection Cheat Sheet
Currently only for MySQL and Microsoft SQL Server, with some Oracle and some PostgreSQL. Most of the samples are not correct for every single situation. Most real-world environments may differ because of parentheses, different code bases, and unexpected, strange SQL statements....
| |
|
|
|
|
| 2008/11/11 Freebase: an open, shared database of the world's knowledge
Freebase, created by Metaweb Technologies, is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites.
| |
|
|
| 2008/10/22 The Difference Between Data Mart and Data Warehouse
The biggest decision facing most IT managers today is whether or not they should construct the data mart before the data warehouse. Many vendors will tell you that data warehouses are hard to build, as well as expensive.
| |
|
|
| 2008/10/12 SierraVault Archives
SierraVault is an archival project related to the games and publicity materials of Sierra On-Line, an influential and industry leading computer game company of the 1980s and 90s.
| |
|
|
| 2008/09/29 TPC-H
The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions....
| |
|
|
| 2008/09/25 Slashdot | Oracle To Sell Database Hardware
"In a move the company is billing as its first foray into the hardware business, Oracle Corp. said Wednesday it will begin selling server computers that come with its database software pre-installed."
| |
|
|
| 2008/09/20 D B Cooper's Loot
Found an old $20 bill, and want to easily check to see if it was part of the loot from the most storied skyjacking in American history? Then look no further!
| |
|
|
|
|
|
|
| 2008/08/25 The 1-petabyte Barrier Is Crumbling
"I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."...
| |
|
|
| 2008/05/26 phpMyAdmin
MySQL Database Administration Tool
| |
|
|