Planet MySQL

Syndicate content
Planet MySQL - http://www.planetmysql.org/
Updated: 5 min 14 sec ago

A tale of a bug…

3 hours 14 min ago

So I sometimes get asked if we funnel back bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer (and indeed compatible with) to MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB is gearing up for a release, it’s a relatively low priority bug (but it does have a working, correct and obvious patch), within 2 months, Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.2


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL and Open Source at Oracle Open World

Wed, 07/21/2010 - 15:39
Oracle Open World is fast approaching.   I am definitely looking forward to the MySQL and open source activity at the upcoming Oracle Open World conference in San Francisco.   I'm scheduled to deliver a presentation on implementing MySQL on Windows.   I am also looking forward to Oracle's first Open World conference with MySQL under the Oracle family umbrella.
PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Idiosyncrasies that BITE Webinar

Wed, 07/21/2010 - 14:18

If you have not looked at my recent presentation that I presented at ODTUG Kaleidoscope 2010, then feel free to join me tomorrow at 5pm EDT when I will giving a webinar on this talk. You can register online at https://www1.gotomeeting.com/register/730452824


PlanetMySQL Voting: Vote UP / Vote DOWN

3 webinars on Upgrading MySQL

Wed, 07/21/2010 - 14:15

The IOUG Online Education Series: Get Real with Upgrades will include next week 3 different MySQL webinars. These are:

  • MySQL 5.1: Why and How to Upgrade by Sheeri Cabral on Tuesday, July 27, 12:00 p.m. – 1:00 p.m. CT
  • MySQL Upgrades With No Downtime by Sean Hull on Wednesday, July 28, 12:00 p.m. – 1:00 p.m. CT
  • MySQL Upgrade Best Practices by Matt Yonkovit on Thursday, July 29, 12:00 p.m. – 1:00 p.m. CT

Each speaker will be covering different areas and reasons for considering and implementing a successful MySQL Upgrade.


PlanetMySQL Voting: Vote UP / Vote DOWN

The open core issue (part two)

Wed, 07/21/2010 - 13:07

In the first part of this post I discussed the underlying division that drives the debate about open core, and the futility of arguing about what constitutes an “open source company” without any relevant definition.

Since then Monty Widenius has proposed a definition that would exclude any company that does not produce open source software (including open source support providers) and any company that does not provide access to 100% of its code (which would often exclude Red Hat as it moves to open source acquired code).

In the meantime others have declared that there is no such thing as an open source company and decided instead to discourage use of the term altogether. This is the logical conclusion of the argument that Open Source is Not a Business Model, and while this seems like a nuclear option, it does at least mean that we can hopefully avoid repeated arguments about whether company X is an open source company or not.

Since we seem to be able to move on from the theoretical argument about whether open core vendors are open source or not, it is an opportune moment to turn the debate towards a more practical assessment of the open core strategy and its strengths and weaknesses.

This second blog post turns attention to the open core strategy itself and examines some of the common criticisms. Some of them are valid, some exaggerated, and some are misunderstandings. If the debate is to progress it is important to stop fixating on issues that fit in the latter two categories and focus on those that fit in the first.

There are plenty of criticisms to choose from, and this is a complicated subject, so this is a long post. I have tried to cover all the major issues in one go in order to give a thorough representation of my views on the subject.

For an overview of the open core model itself and how it compares to other strategies for generating revenue see this post. With that in mind (deep breath) here goes:

Crippleware
Since open core relies on generating revenue from proprietary extensions to an open source core it is often asserted that the open source core will be crippled in some way to ensure that users opt for the proprietary version.

That is like claiming that open source support providers deliberately make open source projects difficult to work with in order to sell more support contracts.

Any strategy that worked like that would be flawed. That is why the open core strategy does not work that way.

Like the open source support strategy, open core relies on having a ubiquitous, fully functional, open source project.

Instead of selling support, open core vendors sell value-added features that are designed to be of value to paying customers. Of course the strategy relies on segmenting the audience for the product and delivering features that would be appropriate to each.

As Simon Phipps wrote:

“The community edition is used by a group of people who have the time and skills to deploy by themselves and who have no need of the many differences of the commercial versions. The commercial versions are feature-rich and effectively lock their users into a traditional commercial ISV relationship with the vendor.”

I’m sure open core vendors would dispute the reference to lock-in, but Simon’s comment makes it clear that there are two audiences for two separate products.

This distinction is important in understanding how Likewise Software can claim that customers drive open core: “The added functionality in Enterprise benefits a very specific segment of our community, and we work closely with our enterprise customers to ensure we provide value here.”

The point is that paying customers, as opposed to open source users, see value in the proprietary features and are prepared to buy the product. The strategy will fail if open source users also see value in those features but are denied them, or are forced to pay to adopt them.

The phrase “bait and switch” is often used to criticize the open core approach (indeed the term open core was promoted specifically to provide an alternative vocabulary to bait and switch) and suggests that users are either tricked or forced into taking the proprietary features. Clearly any strategy that relies on misleading potential customers is going to be short-lived.

It is true that some vendors are not great at communicating the differences between the open source core and the proprietary version in the past, but our previous transparency test indicated that they have got a lot better in that regard.

To some extent this is as a result of pressure from open source advocates. More than that though, I believe, it is the result of vendors realizing that a successfully executed open core strategy relies on transparency and in not attempting to sell anything to community users (and vastly improving the quality of their marketing communication).

While open core vendors have in the past been guilty of treating the community a sales pipeline, we have observed that the next generation of start-ups has learned that the best way to encourage a frictionless relationship between a vendor and its community is not to attempt to “convert” users at all.

It goes without saying that forcing users to use the proprietary extensions is going to be flawed – the open core strategy depends on keeping both users and paying customers happy, independently.

Managing that is not easy, as our research has confirmed. As previously noted, our CAOS report into how open source changes approaches to sales and marketing included a few choice quotes from open core vendors on its challenge:

“Number one [challenge] is differentiation between core and commercial.”

Clearly, the difficulty with open core is in deciding what features to put in which version, and what proportion of a company’s engineering effort should be focused on the open source project.

“We can compete with ourselves; i.e., our commercial product may not be purchased because our open source/core product contains sufficient functionality to solve customer problems.”

It is a significant challenge and some vendors have been better at it than others but it does not follow that because this challenge exists then the core must be crippleware. Indeed the success of the strategy depends on it not being crippleware.

“Continuing to maintain the right balance of functionality between the freely downloadable open core and the commercial extensions is both art and science. It’s critical to get that right so the model continues to grow and advance.”

Are there some open core open source projects that are lacking in quality? Of course, but that doesn’t mean that all open core open source projects are crippleware. There are some pretty crappy “fully functional” community-developed open source projects – that does not mean the community development model is flawed.

Half a product

In response to Larry Augustin’s statement that “Well over half of our engineering effort produces code that is released under an OSI approved license”, Tarus Balog commented:

“Well over half? Well, that’s pretty good, but is open source code something that can be divided? Can I say “here is the product, but you only get to use half of it under an open source license”. Who decides which half? If I look at it in binary, do I just get to use the ones or just the zeroes?”

I am assuming Tarus is deliberately misunderstanding Larry’s statement in order to be facetious/mischievous (the last line certainly suggests so) but the statement highlights another misconception of the open core strategy – that the vendor starts with a fully-featured product and then divides it into the basic core (open source) functionality and the value-added (proprietary) extensions.

As we have already stated, however, in order for the strategy to work the core project must be widely adopted. For that to happen it needs to be a complete project. As Jack Repenning notes:

“if the open parts accomplish their goal fully and well… then the open-source product deserves to be assessed on its own terms. If there are also commercially licensed, or even proprietary/closed things associated with it that together accomplish some larger goal, that’s a different product, not a betrayal of the open one.”

The biggest issue that open core has, in my opinion, is that it attempts to bypass Clayton Christensen’s law of Conservation of Attractive Profits, which states that “When attractive profits disappear at one stage in the value chain because a product becomes modular and commoditized, the opportunity to earn attractive profits with proprietary products will usually emerge at an adjacent stage.”

It is this law that explains why it is difficult for open source support vendors to generate significant profits (since they have commoditized their own stage of the value chain). The same is true of open core vendors, except that that are attempting to commoditize only a portion of their value chain – the core functionality offered by the open source version, while betting that their value added extensions are sufficiently differentiated to retain the ability to generate profits.

Christensen’s law dictates that it will always be easier to generate profit at an adjacent stage. Or, as Matt Asay explains in the context of the OpenStacks project: “The reason OpenStack may be a big winner is that Rackspace doesn’t need OpenStack to make money. At least, not directly.”

Community matters
Cloudera recently told The 451 Group that 50% of its engineering effort is focused on open source projects (specifically Hadoop and its related projects) with the other half focused on the proprietary capabilities that are delivered in Cloudera Enterprise.

No one would suggest that Apache Hadoop is a limited or crippled project, simply because Cloudera (and IBM, Karmasphere and others) are offering closed-source complementary products.

Of course the difference between Hadoop and SugarCRM community (or many other open source core projects) is that Hadoop is a community-developed project, while in vendor-led open core projects are dominated by a single vendor.

This enables accusations of lock-in and a lack of community contributions to the development process. I’ll address the first issue shortly, but with reference to community development it is undoubtedly true that the majority of vendor-led open core projects do not enjoy the benefits of a collaborative development process.

Carlo Daffara explains how the strategy is a tradeoff between monetization and contributions, noting that “it is simply not possible to get something like Linux or Apache with open core”.

That is true, but then that is also not the aim of open core. Vendor-led open source projects are invariably more focused on creating ubiquitous platform and lowering barriers to adoption than they are on creating ubiquitous platforms for collaborative development.

Not all open source software projects are collaboratively developed. Whether this is a concern is very much a matter of personal opinion – is it enough that software is under and open source license, or does it also have to be developed collaboratively?

It is a problem, of course, if a company actively avoids contributions from elsewhere on the grounds that that doing so would impact their proprietary extensions. Simon Phipps notes that one of the reasons NASA involved itself in the OpenStacks cloud projects was due to frustration with Eucalyptus Systems’ reluctance to accept contributions that competed with its closed extensions.

Clearly this highlights a potential problem for open core vendors, but it is one that actually contradicts the accusation that open core users have no choice but to accept the proprietary extensions. While there is lock-in associated with any software choice, one of the weaknesses of the open core approach is users could decide to fork and/or develop open source versions of the proprietary features.

Similarly, Dana Blankenhorn reports that SplendidCRM has already replicated the user interface delivered in SugarCRM’s paid-for versions and made it available in its own community edition.

It is important to note, however, that while issues related to the community contributions are a symptom of the open core licensing strategy, they are by no means exclusive to open core.

The recent debate about open core was kicked off my this post by former Compiere CEO Jorg Janke about the apparent failure of Compiere’s strategy in putting too much emphasis on the closed extensions (as well as mistakes related to the partnership model).

In a follow-up post Jorg turned his attention to the project’s lack of external contribution. While it is clear that this was an issue that was exacerbated by the open core strategy it is important to note that the development model was dictated by a decision that was taken prior to the open core strategy being adopted.

Similarly the vast majority of the developers of the MySQL database have always been employees of its owner (first MySQL, then Sun and now Oracle). The shift towards open core (and it hasn’t got there yet) came much later than the decisions that prompted the development model.

Development costs
One area in which the lack of community does matter, of course, is in the R&D costs of open core vendors. The greater a proportion of employees that you have focused on development (of open source or proprietary code) the greater your development costs are going to be. This is undoubtedly a valid criticism of the open core model as the company is failing to benefit from R&D cost savings in terms of both the open source core and closed source extensions.

Arguably, the company is also impacted by higher development and testing costs since the closed source extensions do not benefit from exposure to the open source user community. How significant this additional cost might be depends on the significance of the extensions and the relative size of the community (since the vendor will still go through the traditional alpha/beta testing with its paying customers).

Another cost, arguably, is the loss of quality in the proprietary extensions resulting from the smaller testing group and the lack of open source code review. Again this is a valid criticism, but it is one that belongs in a much larger debate about the relative benefits of open source and proprietary development strategies.

Venture-capitalist tool
Jorg’s initial post also discussed how VC investors had pushed Compiere towards the open core approach, and another criticism is that it is the chosen OSS-related business strategy of VCs. Again this is a valid argument. There is no doubt that VCs are attracted to the open core strategy and have encouraged its wider adoption.

However, it is also worth noting that there are exceptions to the rule. OpenLogic is a VC-backed company that has been vocally critical of open core, while xTuple is a self-funded open core vendor (there are other examples).

I would also point out that VCs are also fully aware that for the strategy to be successful it depends on a ubiquitous, full-functional open source core, and that attempts at crippleware will fail.

Lock-in and other problems
Perhaps the most obvious criticism of open core, from an open source perspective, is that it perpetuates the use of proprietary software. Again this is valid, and I have previously covered why I think open core vendors are limiting their opportunities by focusing on product-led strategies and leaving themselves open to accusations of lock-in.

Again these are really issues for a larger debate about the relative merits of proprietary and open source licensing.

Finally (one hopes) the other major criticism of open core vendors is that they are misusing the term open source to describe themselves (or as Henrik Ingo put it “So if I don’t call myself ‘open source vendor’, then everything is fine? (yes)”

Assuming the decision to avoid using terms like “open source company” are maintained, this becomes less of an issue, but it is worth noting that the attempts at policing the term have been counter-productive.

The point is this: if you want open core vendors to refer to themselves as “open core companies” rather than “open source companies” then demonizing the open core strategy is not the way to go about it. Is it any wonder that Larry Augustin does not want SugarCRM to be seen as open core when accusations of crippleware are being thrown around?

Previously, Redmonk’s Stephen O’Grady noted that there is the potential for serious collateral damage in the way the debate about open core licensing is progressing.

Henrik Ingo notes that companies like CollabNet (which is not open core) is “concerned about the negative image now attached to open core and worried that his company would then be suffering from the negative image too.“

There are many ways in which an open core strategy could fail, but that does not mean that all open core strategies will fail. Open core is just the strategy -how you execute that strategy determines whether you succeed or fail.

I agree with CollabNet’s Jack Repenning that the conversation needs to move “a bit towards how to do it right, and away from confrontation”.

That was my aim with these two posts. I am sure there are plenty of people who will disagree with plenty of the things that have been written above. My intention is not to be confrontational but to take a balanced view of the potential problems related to the open core strategy.

We will be writing more about other strategies for generating revenue from open source software, in a follow-up to our Open Source is Not a Business Model report, which is due to be published latter this year. It will provide more context for the economic motivators and issues involved in the various models, as well as updated research on which vendors are following which strategies, and why, as well as a survey to uncover what software users make of it all. The report will be freely available to CAOS subscribers. For more details of the CAOS research practice, and to apply for trial access, click here.


PlanetMySQL Voting: Vote UP / Vote DOWN

On “Replace Into”, “Insert Ignore”, and Secondary Keys

Wed, 07/21/2010 - 12:51

In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted at that there are some caveats that complicate the story a little. In this post, I explain one of the complications: secondary indexes.

Secondary indexes act the same way in TokuDB as they do in InnoDB. They store the defined secondary key, and the primary key as a pointer to the rest of the row. So, say the table foo has the following schema:

create table (a int, b int, c int, primary key (a), key(b));

And we did:

insert into foo values (1,10,100),(2,20,200);

Logically, there is one dictionary that stores all the data (this is the clustered primary key). Let us call it the main dictionary:

key value 1 10,100 2 20,200

And there is another dictionary for the secondary key that stores the column ‘b’ and the primary key, ‘a’:

key value 10 1 20 2

For secondary indexes to work properly, there must be a one to one correspondence between elements in the secondary index and in the primary index. If this correspondence is broken, then the table is corrupt.

Now suppose we were to execute:

replace into foo values (1,1000,1000);

This does:


  • in main dictionary, overwrite the value of key ’1′ and value ’10,100′ with key ’1′ and value ’1000,1000′.
  • in secondary dictionary, remove the key ’10′ with value ’1′.
  • in secondary dictionary, insert the key ’1000′ and key ’1′.

Notice that we cannot perform the second step unless we know the content of the existing row that is being replaced. Learning the content of the existing row requires a lookup in the main dictionary, which incurs a disk seek.

So, when executing “replace into” or “insert ignore” on tables with secondary keys, all engines must still incur a disk seek on the primary dictionary to learn where associated elements are in a secondary index, whereas if no secondary keys exist, then TokuDB’s fractal trees can avoid this disk seek.

Even with secondary indexes, fractal tree indexes are preferred. B-trees still incur additional disk seeks on insertions into secondary indexes that fractal trees do not. However, with no secondary indexes, fractal trees can do away with the mandatory disk seek whereas B-trees do not.


PlanetMySQL Voting: Vote UP / Vote DOWN

Conversation starters for OSCON

Wed, 07/21/2010 - 09:01
I will be at OSCON in a few hours. Mohan and I have a talk on FlashCache on Thursday. The talk will have lots of details on the FlashCache implementation. I expect to be quiet except for a few slides on performance. Mohan and Paul did an amazing job getting FlashCache running on our servers. This is an opportunity to learn from Mohan.

As Percona has been doing a lot of work with it, I hope they will be at OSCON to discuss their experience with it.

There are other interesting things to talk about if your area is data management:

  • MySQL is doing great for their customers this year. I began using MySQL in 2005. This has been the best year for me. The 5.1 release is excellent and the 5.5 beta looks great. The MySQL development team has been fixing bugs fast. Merging InnoDB and MySQL into one company means that their development teams work better, fix bugs faster and talk more about new features.
  • Your SQL, not MySQL, is frequently the problem. Sometimes we make MySQL deployments better by hacking on MySQL. More often improvement comes from changing application SQL. While some of the application changes compensate for less than perfect behavior in MySQL, more are done to fix things that would be a problem for any database. The biggest thing that MySQL needs to fix in this area is monitoring. It must make it easier to identify performance problems. Until then tcpdump, Poor Mans Profiler and mk-query-digest are excellent options.
  • Where did Java go wrong? MySQL has a wonderful JDBC driver. I don't blame the implementation. But Java clients continue to cause too many database problems for me. I recently logged all SQL on a server and noticed that the JDBC client connected, ran 13 statements to prepare the connection (including 5 set autocommit statements) and then ran 1 query. That is an amazing amount of bloat. I have seen many other cases where preparing/returning a connection from/to the pool required 5 to 10 statements. Given that number of round trips between the client and server it isn't likely that the connection pool saves any overhead on the database. I am currently dealing with Java applications that set tx_isolation to read-committed for InnoDB. With MySQL 5.1 all binlog events written for such a connection must use row-based replication. For now I will assume that most of the Java apps use read-committed because they want to rather than because they need to.
  • Just because your database is sharded doesn't mean you lose joins. You lose the ability to do joins or enforce foreign keys across all of your data. But lots of interesting queries can be run within one shard. I prefer that long running queries using something other than MySQL as they will run much faster elsewhere.

PlanetMySQL Voting: Vote UP / Vote DOWN

Speaking at MySQL Meetup in Northern Virginia

Wed, 07/21/2010 - 05:34

The closest thing I know of to a “Northern Virginia MySQL Meetup” is the Sterling Database Data Solutions Group. I got in touch with the organizer and we scheduled a meeting next Wednesday July 28th. I’ll be presenting, and so will someone from Fusion-IO, a solid-state storage vendor. This is on short notice, so tell your friends about it! It would be great to grow a strong monthly meetup presence in this area.

Here’s the abstract I sent: “This talk covers best practices to help you get the most out of MySQL performance. It assumes you know a database well, though it need not be MySQL. We’ll cover several angles of the topic. Configuration is usually the first thing people ask about. Although it’s possible to misconfigure MySQL and get bad performance, the configuration options you need for good performance are few and rather simple. We’ll see how to inspect MySQL’s performance and status, also a fairly simple subject. Next is query tuning. There are a few surprises in MySQL due to its simpler query execution engine than Oracle or SQL Server. We’ll see how to avoid those surprises and work with the query optimizer. Finally, we’ll focus on what you should know if you are considering migrating part or all of your application from Oracle. There will be plenty of time for questions, so bring yours!”

Related posts:

  1. I’ll be speaking at the O’Reilly MySQL Conference 2010
  2. Speaking at Surge 2010
  3. Speaking at EdUI Conference 2009
  4. Speaking at CPOSC 2009
  5. Speaking about Maatkit at CPOSC


PlanetMySQL Voting: Vote UP / Vote DOWN

SQL trick: overcoming GROUP_CONCAT limitation in special cases

Wed, 07/21/2010 - 05:14

In Verifying GROUP_CONCAT limit without using variables, I have presented a test to verify if group_concat_max_len is sufficient for known limitations. I will follow the path where I assume I cannot control group_concat_max_len, not even in session scope, and show an SQL solution, dirty as it is, to overcome the GROUP_CONCAT limitation, under certain conditions.

Sheeri rightfully asks why I wouldn’t just set group_concat_max_len in session scope. The particular case I have is that I’m providing a VIEW definition. I’d like users to “install” that view, i.e. to CREATE it on their database. The VIEW does some logic, and uses GROUP_CONCAT to implement that logic.

Now, I have no control on the DBA or developer who created the view. The creation of the view has nothing to do with the group_concat_max_len setting on her database instance.

An example

OK, apologies aside. Using the sakila database, I execute:

mysql> SELECT GROUP_CONCAT(last_name) FROM actor \G *************************** 1. row *************************** GROUP_CONCAT(last_name): AKROYD,AKROYD,AKROYD,ALLEN,ALLEN,ALLEN,ASTAIRE,BACALL,BAILEY,BAILEY,BALE,BALL,BARRYMORE,BASINGER,BENING,BENING,BERGEN,BERGMAN,BERRY,BERRY,BERRY,BIRCH,BLOOM,BOLGER,BOLGER,BRIDGES,BRODY,BRODY,BULLOCK,CAGE,CAGE,CARREY,CHAPLIN,CHASE,CHASE,CLOSE,COSTNER,CRAWFORD,CRAWFORD,CRONYN,CRONYN,CROWE,CRUISE,CRUZ,DAMON,DAVIS,DAVIS,DAVIS,DAY-LEWIS,DEAN,DEAN,DEE,DEE,DEGENERES,DEGENERES,DEGENERES,DENCH,DENCH,DEPP,DEPP,DERN,DREYFUSS,DUKAKIS,DUKAKIS,DUNST,FAWCETT,FAWCETT,GABLE,GARLAND,GARLAND,GARLAND,GIBSON,GOLDBERG,GOODING,GOODING,GRANT,GUINESS,GUINESS,GUINESS,HACKMAN,HACKMAN,HARRIS,HARRIS,HARRIS,HAWKE,HESTON,HOFFMAN,HOFFMAN,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPKINS,HOPPER,HOPPER,HUDSON,HUNT,HURT,JACKMAN,JACKMAN,JOHANSSON,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KEITEL,KEITEL,KILMER,KILMER,KILMER,KILMER,KILMER,LEIGH,LOLLOBRIGIDA,MALDEN,MANSFIELD,MARX,MCCONAUGHEY,MCCONAUGHEY,MCDORMAND,MCKELLEN,MCKELLEN,MCQUEEN,MCQUEEN,MIRANDA,MONROE,MONROE,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,NOLTE,NOLTE,NOLTE,NOLTE,OLIVIER,OLIVIER,PALTROW,PALTROW,P 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+--------------------------------------+ | Level   | Code | Message                              | +---------+------+--------------------------------------+ | Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() | +---------+------+--------------------------------------+ 1 row in set (0.00 sec)

So, my GROUP_CONCAT has been truncated. How much did I lose?

mysql> SELECT SUM(LENGTH(last_name) + 1) - 1 FROM actor; +--------------------------------+ | SUM(LENGTH(last_name) + 1) - 1 | +--------------------------------+ |                           1445 | +--------------------------------+

(In the above query I counted the separating commas; they are part of the GROUP_CONCAT limit).

The special case at hand

The proposed SQL trick assumes the following:

  • The length of the GROUP_CONCAT result is known to be under a certain value.
  • A GROUP_CONCAT of any set of n rows is known to be shorter than (or equal to) 1024 characters.

In our above example, I happen to know that the length of the GROUP_CONCAT result is below 2048. I also happen to know that any 100 rows will yield in a GROUP_CONCAT length of less than 1024.

How can I know this? Well, the length of my VARCHAR, or the fact I’m handling INT values can give me upper bounds on total lengths.

Steps towards the solution

Returning to our example, my intention becomes clearer: I want to work it out in two phases (later on I’ll show how this can be done in more phases). Any of the following is good:

mysql> SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 1 and 100 \G *************************** 1. row *************************** GROUP_CONCAT(last_name): GUINESS,WAHLBERG,CHASE,DAVIS,LOLLOBRIGIDA,NICHOLSON,MOSTEL,JOHANSSON,SWANK,GABLE,CAGE,BERRY,WOOD,BERGEN,OLIVIER,COSTNER,VOIGHT,TORN,FAWCETT,TRACY,PALTROW,MARX,KILMER,STREEP,BLOOM,CRAWFORD,MCQUEEN,HOFFMAN,WAYNE,PECK,SOBIESKI,HACKMAN,PECK,OLIVIER,DEAN,DUKAKIS,BOLGER,MCKELLEN,BRODY,CAGE,DEGENERES,MIRANDA,JOVOVICH,STALLONE,KILMER,GOLDBERG,BARRYMORE,DAY-LEWIS,CRONYN,HOPKINS,PHOENIX,HUNT,TEMPLE,PINKETT,KILMER,HARRIS,CRUISE,AKROYD,TAUTOU,BERRY,NEESON,NEESON,WRAY,JOHANSSON,HUDSON,TANDY,BAILEY,WINSLET,PALTROW,MCCONAUGHEY,GRANT,WILLIAMS,PENN,KEITEL,POSEY,ASTAIRE,MCCONAUGHEY,SINATRA,HOFFMAN,CRUZ,DAMON,JOLIE,WILLIS,PITT,ZELLWEGER,CHAPLIN,PECK,PESCI,DENCH,GUINESS,BERRY,AKROYD,PRESLEY,TORN,WAHLBERG,WILLIS,HAWKE,BRIDGES,MOSTEL,DEPP 1 row in set (0.00 sec) mysql> SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 101 and 200 \G *************************** 1. row *************************** GROUP_CONCAT(last_name): DAVIS,TORN,LEIGH,CRONYN,CROWE,DUNST,DEGENERES,NOLTE,DERN,DAVIS,ZELLWEGER,BACALL,HOPKINS,MCDORMAND,BALE,STREEP,TRACY,ALLEN,JACKMAN,MONROE,BERGMAN,NOLTE,DENCH,BENING,NOLTE,TOMEI,GARLAND,MCQUEEN,CRAWFORD,KEITEL,JACKMAN,HOPPER,PENN,HOPKINS,REYNOLDS,MANSFIELD,WILLIAMS,DEE,GOODING,HURT,HARRIS,RYDER,DEAN,WITHERSPOON,ALLEN,JOHANSSON,WINSLET,DEE,TEMPLE,NOLTE,HESTON,HARRIS,KILMER,GIBSON,TANDY,WOOD,MALDEN,BASINGER,BRODY,DEPP,HOPE,KILMER,WEST,WILLIS,GARLAND,DEGENERES,BULLOCK,WILSON,HOFFMAN,HOPPER,PFEIFFER,WILLIAMS,DREYFUSS,BENING,HACKMAN,CHASE,MCKELLEN,MONROE,GUINESS,SILVERSTONE,CARREY,AKROYD,CLOSE,GARLAND,BOLGER,ZELLWEGER,BALL,DUKAKIS,BIRCH,BAILEY,GOODING,SUVARI,TEMPLE,ALLEN,SILVERSTONE,WALKEN,WEST,KEITEL,FAWCETT,TEMPLE 1 row in set (0.00 sec)

It’s somewhat tempting to try the following trick based on IF, but see what happens:

mysql> SELECT GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')) FROM actor\G *************************** 1. row *************************** GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')): AKROYD,AKROYD,,,,,ASTAIRE,,BAILEY,,,,BARRYMORE,,,,BERGEN,,BERRY,BERRY,BERRY,,BLOOM,BOLGER,,BRIDGES,BRODY,,,CAGE,CAGE,,CHAPLIN,CHASE,,,COSTNER,CRAWFORD,,CRONYN,,,CRUISE,CRUZ,DAMON,DAVIS,,,DAY-LEWIS,DEAN,,,,DEGENERES,,,DENCH,,DEPP,,,,DUKAKIS,,,FAWCETT,,GABLE,,,,,GOLDBERG,,,GRANT,GUINESS,GUINESS,,HACKMAN,,HARRIS,,,HAWKE,,HOFFMAN,HOFFMAN,,,HOPKINS,,,,,HUDSON,HUNT,,,,JOHANSSON,JOHANSSON,,JOLIE,JOVOVICH,KEITEL,,,KILMER,KILMER,KILMER,,,,LOLLOBRIGIDA,,,MARX,MCCONAUGHEY,MCCONAUGHEY,,MCKELLEN,,MCQUEEN,,MIRANDA,,,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,,,,,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,,PESCI,,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,,,,,SINATRA,SOBIESKI,STALLONE,STREEP,,,SWANK,TANDY,,TAUTOU,TEMPLE,,,,,TORN,TORN,,TRACY,,VOIGHT,WAHLBERG,WAHLBERG,,WAYNE,,,WILLIAMS,,,WILLIS,WILLIS,,,WINSLET,,,WOOD,,WRAY,ZELLWEGER,, 1 row in set (0.00 sec)

We’re getting there, though. We will mimic GROUP_CONCAT‘s separator by using CONCAT, and remove the default separator:

SELECT GROUP_CONCAT( IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '') SEPARATOR '' ) AS result FROM actor \G *************************** 1. row *************************** result: ,AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER 1 row in set (0.00 sec) Solution

Let’s combine all we had so far to get the final result:

SELECT SUBSTRING( CONCAT( GROUP_CONCAT( IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '') SEPARATOR '' ), GROUP_CONCAT( IF(actor_id BETWEEN 101 AND 200, CONCAT(',', last_name), '') SEPARATOR '' ) ), 2 ) AS result FROM actor \G *************************** 1. row *************************** result: AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER,AKROYD,ALLEN,ALLEN,ALLEN,BACALL,BAILEY,BALE,BALL,BASINGER,BENING,BENING,BERGMAN,BIRCH,BOLGER,BRODY,BULLOCK,CARREY,CHASE,CLOSE,CRAWFORD,CRONYN,CROWE,DAVIS,DAVIS,DEAN,DEE,DEE,DEGENERES,DEGENERES,DENCH,DEPP,DERN,DREYFUSS,DUKAKIS,DUNST,FAWCETT,GARLAND,GARLAND,GARLAND,GIBSON,GOODING,GOODING,GUINESS,HACKMAN,HARRIS,HARRIS,HESTON,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPPER,HOPPER,HURT,JACKMAN,JACKMAN,JOHANSSON,KEITEL,KEITEL,KILMER,KILMER,LEIGH,MALDEN,MANSFIELD,MCDORMAND,MCKELLEN,MCQUEEN,MONROE,MONROE,NOLTE,NOLTE,NOLTE,NOLTE,PENN,PFEIFFER,REYNOLDS,RYDER,SILVERSTONE,SILVERSTONE,STREEP,SUVARI,TANDY,TEMPLE,TEMPLE,TEMPLE,TOMEI,TORN,TRACY,WALKEN,WEST,WEST,WILLIAMS,WILLIAMS,WILLIS,WILSON,WINSLET,WITHERSPOON,WOOD,ZELLWEGER,ZELLWEGER 1 row in set (0.00 sec) More than 2048 characters?

As far as the upper limit is known, we can work this trick in the same manner. Assume the length is expected to be 3000 characters. We can then CONCAT three, or four, or five GROUP_CONCAT results, each of fewer number of rows as required. Just copy+paste the above GROUP_CONCAT(…) clause a couple more times, and edit the actor_id BETWEEN n AND m clauses.

Moreover, further using MIN(actor_id), MAX(actor_id) can minimize dependencies on specific values.

Dirty? ugly? Not arguing. But it’s working! In some ways it is not such a dirty solution: I’m avoiding using stored routines (easily setting the group_concat_max_len session variable from within a stored function’s body, see Justin’s suggestion), so I’m only relying on SQL, not on “external” technology, if I may call it that way.


PlanetMySQL Voting: Vote UP / Vote DOWN

Why software startups decide to patent ... or not

Wed, 07/21/2010 - 05:00

This guest post was co-written by Pamela Samuelson, Distinguished Professor of Law and Information at the University of California, Berkeley, and Stuart J.H. Graham, Chief Economist for the U.S. Patent & Trademark Office. This piece will also appear in the November 2010 issue of Communications of the ACM.

Two-thirds of the approximately 700 software entrepreneurs who participated in the 2008 Berkeley Patent Survey report that they neither have nor are seeking patents for innovations embodied in their products and services. These entrepreneurs rate patents as the least important mechanism among seven options for attaining competitive advantage in the marketplace. Even software startups that hold patents regard them as providing only a slight incentive to invest in innovation.

These are three of the most striking findings from our recently published article, "High Technology Entrepreneurs and the Patent System: Results of the 2008 Berkeley Patent Survey."

After providing some background about the survey, this column will discuss some key findings about how software startup firms perceive, use and are affected by the patent system.

While the three findings highlighted above might seem to support a software patent abolitionist position, it is significant that a third of the software entrepreneurs reported having or seeking patents, and that they perceive patents to be important to persons or firms from whom they hope to obtain financing.

Survey background

More than 1,300 high technology entrepreneurs in the software, biotechnology, medical devices, and computer hardware fields filled out the Berkeley Patent Survey. All of these firms had been started no more than ten years before the survey was conducted. We drew our sample from a general population of software firms registered with Dun & Bradstreet (D&B) and from the VentureXpert (VX) database that has a rich data set on venture-backed startups. (Just over 500 of the survey respondents were D&B firms; just under 200 were VX firms.)

Eighty percent of the software respondents were either the CEOs or CTOs of their firms, and most had experience in previous startups. The average software firm had 58 employees, half of whom were engineers. Between 10 and 15 percent of the software startup respondents among the D&B respondents were venture-backed firms. Among the software respondents, only 2 percent had experienced an initial public offering (IPO), while 9 percent had been acquired by another firm.

Our interest in conducting this survey arose because high technology entrepreneurs have contributed significantly to economic growth in recent decades. They build firms that create new products, services, organizations, and opportunities for complementary economic activities. We were curious to know the extent to which high tech startups were utilizing the patent system, as well as to learn their reasons for choosing to avail themselves of the patent system -- or not.

The basic economic principle underlying the patent system is that technology innovations are often expensive, time-consuming, and risky to develop, although once developed, these innovations are often cheap and easy to copy. In the absence of intellectual property rights (IPRs), innovative high tech firms may have insufficient incentives to invest in innovation insofar as they cannot recoup their research and development (R&D) expenses and justify further investments in innovation because of cheap copies that undermine the firms' recoupment strategy.

Although this economic principle applies to all companies, early-stage technology firms might, we conjectured, be more sensitive to IPRs than more mature firms. The former often lack various kinds of complementary assets (such as well-defined marketing channels and access to cheap credit) that the latter are more likely to enjoy. We decided it would be worthwhile to test this conjecture empirically. With generous funding from the Ewing Marion Kauffman Foundation, we and two other colleagues designed and carried out the survey and analyzed the results.

Why startups decide to patent -- or not to

The most important reasons for seeking patents, as reported by the software executives who responded to the Berkeley Patent Survey, were these:

  1. to prevent competitors from copying the innovation (2.3 on a 4 point scale, where 2 was moderately important)
  2. to enhance the firms’ reputation (2.2)
  3. and to secure investment and improve the likelihood of an IPO (1.96 and 1.97 respectively)

The importance of patents to investors was also evident from survey data showing striking differences in the rate of patenting among the VX and the D&B software companies.

Three-quarters of the D&B firms had no patents and were not seeking them. Because the D&B firms are, we believe, typical of the population of software startup firms in the U.S., their responses may be representative of patenting rates among software startups generally. It is, in fact, possible that the overall percentage of software startup patenting is lower than this, insofar as patent holders may have been more likely than other software entrepreneurs to take time to fill out a Berkeley Patent Survey.

In striking contrast to the D&B respondents, over two-thirds of the VX software startup respondents in the sample, all venture-backed, had or were seeking patents. We cannot say why these VC-backed firms were more likely to seek patents than other firms. Perhaps VCs are urging the firms they fund to seek patents; or VCs may be choosing to fund the development of software technologies that VCs think are more amenable to patenting.

Interestingly, the rate of patenting did not vary by the age of the firm (that is, older firms did not patent at rates statistically significant from younger firms).

Why forgo patenting?

The survey asked two sets of questions about decisions to forego patenting: For the last innovation for which the firm chose not to seek a patent, what factors influenced this decision, and then what was the most important factor in the decision?

The costs of obtaining and of enforcing patents emerged as the first and second most frequent explanation. Twenty-eight percent of the software startups reported that the costs of obtaining patents had been the most important factor in this decision, and 12 percent said that the costs of enforcing patents was the most important factor. (They reported that average cost of getting a software patent was just under $30,000.)

Ease of inventing around the innovation and satisfaction with trade secrecy also influenced software startup decisions not to seek patents, although only rarely were these factors considered the most important.

Intriguingly, more than 40 percent of the software executive respondents cited the unpatentability of the invention as a factor in decisions to forego patenting, and almost a quarter of them rated this as the most important factor. Indeed, unpatentability ranked just behind costs of obtaining patents as the most frequently cited "most important factor" for not seeking patents.

It is difficult to know what to make of the unpatentability finding. One explanation might be that the software entrepreneur respondents believed that patent standards of novelty, non-obviousness, and the like are so rigorous that their innovation might not have satisfied patent requirements. Yet, because the patentability of software innovations has been contentious for decades, it may also be that a significant number of these entrepreneurs have philosophical or practical objections to patents in their field.



How important are patents to competitive advantage?

One of the most striking findings of our study is that software firms ranked patents dead last among seven strategies for attaining competitive advantage identified by the survey, as Figure 1 below shows. (The relative unimportance of patents for competitive advantage in the software field contrasts sharply with the perceived importance of patents in the biotech industry, where patents are ranked the most important means of attaining such advantage.)



Figure 1: Measures of Capturing "Competitive Advantage" from Inventions



As Figure 1 shows, software startups regard first-mover advantage as the single most important strategy for attaining competitive advantage. Next most important was complementary assets (e.g., providing services for licensed software or offering a proprietary complement to an open source program).

Interestingly, these two strategies for getting ahead in the market outstrip the IPRs about which we inquired for software firms. Among IPRs, though, copyrights and trademarks, closely followed by secrecy and difficulties of reverse engineering, outranked patents as means of attaining competitive advantage among software respondents by a statistically significant margin.



What incentive effects do patents have?


The Berkeley Patent survey asked startup executives to rate the incentive effects of patents on a scale, where 0 = no incentive, 1 = weak incentive, 2 = moderate incentive, and 3 = strong incentive, for engaging in four types of innovation: (1) inventing new products, processes, or services, (2) conducting initial R&D, (3) creating internal tools or processes, and (4) undertaking the risks and costs of commercializing the innovation.

We were surprised to discover that the software respondents reported that patents provide only weak incentives for engaging in core activities, such as invention of new products (.96) and commercialization (.93). By contrast, biotech and medical device firms reported just above 2 (moderate incentives) for these same questions.

Interestingly, the results did not change significantly even when focusing only on responses from software entrepreneurs whose firms hold at least one patent or application. Even patent-holding software entrepreneurs reported that patents provide just above a weak incentive for engaging in these innovation-related activities.



Resolving a paradox

If patents provide only weak incentives for investing in innovation among software startups, why are two-thirds of the VX firms and at least one-quarter of the D&B firms seeking patents?

The answer may lie in the perception among software entrepreneurs that patents may be important to potential funders, such as venture capitalists (VCs), angel investors, other firms, commercial banks, and friends and family. Sixty percent of software startups that had negotiated with VCs reported that that they perceived patents to be an important factor in VC decisions about whether to make the investments. Between 40 and 50 percent of the software respondents reported that patents were important to other types of investors, such as angels, investment banks, and other companies.



How well is the patent system working?

While most of the Berkeley Patent Survey questions focused on what firms had actually been doing vis-à-vis patents, we decided to ask a few questions to gauge the perception of high tech entrepreneurs about the patent system. We asked, for example, how well the entrepreneurs perceive the patent system to be working for them and for their industry. The scale for responses ranged from 0 = very poorly to 4 = very well, and 2 = neither poorly or well.

The software entrepreneurs' for-my-industry rating was 1.6 and their for-my-firm rating was 1.7. Both results tend toward the poorly end of the scale (in contrast to the biotech and medical device firms that reported above 2 ratings on both questions).

It is interesting is that the VX firms were slightly less positive about the patent system than the D&B firms, although the difference was not statistically significant. We also tested to see if the responses were bipolar (that is, did some software firms rate the patent system very poorly and their ratings canceled out by some positive responses?), but discovered that the ratings fell into a normal distribution, suggesting that we had drawn a sample from a cross-section of the population.



Conclusion

Over the next several years, we expect to engage in further analysis of the results of the 2008 Berkeley Patent Survey and to report new findings about the roles that patents play in the software industry. The initial findings reported here and in the larger article suggest that software entrepreneurs do not find persuasive the canonical story that patents provide strong incentives to invest in technology innovation. These executives regard first-mover advantage and complementary assets as more important than IPRs in conferring competitive advantage upon their firms. Moreover, among IPRs, copyrights and trademarks are perceived to be more important than patents. Still, about one-third of our software entrepreneur respondents reported having or seeking patents, and their perception that their investors care about patents seems to be a key factor in decisions to obtain patents.


Related:



References:

Stuart J.H. Graham, Robert P. Merges, Pam Samuelson, & Ted Sichelman, High Technology Entrepreneurs and the Patent System: Results of the 2008 Berkeley Patent Survey, Berkeley Technology Law Journal, 25:4, pp. 1255-1327 (2010), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1429049.


About the Authors:

Pamela Samuelson is the Richard M. Sherman Distinguished Professor of Law & Information, University of California, Berkeley.

Stuart J.H. Graham is on leave from his position as an Assistant Professor at the Georgia Institute of Technology, College of Management, to serve as the Chief Economist for the U.S. Patent & Trademark Office (USPTO). The views expressed in this article are his own, and are not the views of the USPTO.


PlanetMySQL Voting: Vote UP / Vote DOWN

Transaltion of "Chapter 8. Large amount of data." of "Methods for searching errors in SQL application" just published

Wed, 07/21/2010 - 01:11

I started translation of Part 3 "Other cases". In this part I mostly describe how wrong settings can lead to misterious errors. First chapter in this part about large amount of data.



Part 3. Other cases.


Chapter 8. Large amount of data.


There are cases when query is just symptom of wrong behavior, but true reason is wrong settings.



One of the cases is too small max_allowed_packet for data sent. MySQL
server variable max_allowed_packet defines maximum possible amount of
data which MySQL server can receive or send. Amount of
max_allowed_packet is specified in bytes.



Error usually looks like:





$mysql51 test <phpconf2009_1.sql

ERROR 1153 (08S01) at line 33: Got a packet bigger than 'max_allowed_packet' bytes


...


Rest of the chapter is here



PlanetMySQL Voting: Vote UP / Vote DOWN

Speaking at OSCON 2010

Tue, 07/20/2010 - 18:59

I am speaking in a few days at OSCON 2010 in Portland, Oregon. My talk, “MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x” is about how Schooner optimized MySQL to run on modern hardware with flash memory. Come on by if you’re at OSCON!

Here’s the abstract:

MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x — Thursday, July 22, 2010 at 11:30am

MySQL users have an insatiable need for speed, capacity, and availability, all at a reasonable cost. This session will provide technical overview of the approach that Schooner engineering took to optimize MySQL Enterprise and InnoDB with flash memory, multi-core processors, and DRAM to achieve an 8x improvement in performance relative to existing systems.

This session will provide a crash course in the performance tuning tools and techniques that Schooner used to radically improve MySQL performance on commodity hardware. It will also offer an overview of the results that were achieved, and compare these results to other commonly deployed MySQL server architectures, including stock MySQL on SSDs and Fusion-io, using the industry-standard DBT2 benchmark.


PlanetMySQL Voting: Vote UP / Vote DOWN

Estimating Replication Capacity

Tue, 07/20/2010 - 18:51

It is easy for MySQL replication to become bottleneck when Master server is not seriously loaded and the more cores and hard drives the get the larger the difference becomes, as long as replication
remains single thread process. At the same time it is a lot easier to optimize your system when your replication runs normally - if you need to add/remove indexes and do other schema changes you probably would be looking at some methods involving replication if you can't take your system down. So here comes the catch in many systems - we find system is in need for optimization when replication can't catch up but yet optimization process we're going to use relays on replication being functional and being able to catch up quickly.

So the question becomes how can we estimate replication capacity, so we can deal with replication load before slave is unable to catch up.

Need to replication capacity is not only needed in case you're planning to use replication to perform system optimization, it is also needed on other cases. For example in sharded environment you may need to schedule downtime or set object read only to move it to another shard. It is much nicer if it can be planned in advance rather than done on emergency basics when slave(s) are unable to catch up and application is suffering because of stale data. This especially applies to Software as Service providers which often have very strict SLA agreements with their customers and which can have a lot of data per customer so move can take considerable amount of time.

So what is replication capacity I call replication capacity the ability to replicate the master load. If replication is able to replicate 3 times the write load from the master without falling behind I will call it replication capacity of 3. When used with context of applying binary logs (for example point in time recovery from backup) replication capacity of 1 will mean you can reply 1 hour worth of binary logs within 1 hour. I will call "replication load" the inverse of replication capacity - this is basically what percentage of time the replication thread was busy replicating events vs staying idle.

Note you can speak about idle replication capacity, when box does not do anything else as well as loaded replication capacity when the box serves the normal load. Both are important. You care about idle replication capacity when you have no load on the slave and need it to catch up or when restoring from backup, the loaded replication capacity matters during normal operation.

So we defined what replication capacity is. There is however no tools which can tell us straight what replication capacity is for the given system. It also tends to float depending on the load similar as loadavg metrics. Here are some of the ways to measure it:

1) Use "UserStats" functionality from Google patches, which is now available in Percona Server and MariaDB. This is the probably the easiest and most accurate approach but it
does not work in Oracle MySQL Server. set userstat_running=1 and run following query:

PLAIN TEXT SQL:
  1. mysql> SELECT * FROM information_schema.user_statistics WHERE user="#mysql_system#" \G
  2. *************************** 1. row ***************************
  3. USER: #mysql_system#
  4. TOTAL_CONNECTIONS: 1
  5. CONCURRENT_CONNECTIONS: 0
  6. CONNECTED_TIME: 446
  7. BUSY_TIME: 74
  8. CPU_TIME: 0
  9. BYTES_RECEIVED: 0
  10. BYTES_SENT: 63
  11. BINLOG_BYTES_WRITTEN: 0
  12. ROWS_FETCHED: 0
  13. ROWS_UPDATED: 127576
  14. TABLE_ROWS_READ: 4085689
  15. SELECT_COMMANDS: 0
  16. UPDATE_COMMANDS: 119127
  17. OTHER_COMMANDS: 89557
  18. COMMIT_TRANSACTIONS: 90259
  19. ROLLBACK_TRANSACTIONS: 0
  20. DENIED_CONNECTIONS: 1
  21. LOST_CONNECTIONS: 0
  22. ACCESS_DENIED: 0
  23. EMPTY_QUERIES: 0
  24. 1 row IN SET (0.00 sec)

In this case CONNECTED_TIME is 446 second, out of this replication thread was busy (BUSY_TIME) 74 seconds which means replication capacity is 446/74 = 6
You normally would not like to measure it from the start but rather take the difference in these counters every 5 minutes or other interval of your choice.

2) Use full slow query log and mk-query-digest. This method is great for one time execution especially as it comes together with giving you the list of queries which load replication
the most. It however works only with statement level replication. You need to set long_query_time=0 and log_slave_slow_statements=1 for this method to work.
Get the log file which will include all queries MySQL server ran with their times and run mk-query-digest with filter to only check queries from replication thread:

mk-query-digest slow-log --filter '($event->{user} || "") =~ m/[SLAVE_THREAD]/' > /tmp/report-slave.txt

In the report you will see something like this as a header:

PLAIN TEXT SQL:
  1. # 475s user time, 1.2s system time, 80.41M rss, 170.38M vsz
  2. # Current date: Mon Jul 19 15:12:24 2010
  3. # Files: slow-log
  4. # Overall: 1.22M total, 1.27k unique, 558.56 QPS, 0.37x concurrency ______
  5. # total min max avg 95% stddev median
  6. # Exec time 819s 1us 92s 669us 260us 120ms 93us
  7. # Lock time 28s 0 166ms 23us 49us 192us 25us
  8. # Rows sent 4.27k 0 325 0.00 0 1.04 0
  9. # Rows exam 30.88M 0 1.28M 26.48 0 3.07k 0
  10. # Time range 2010-07-19 14:35:53 to 2010-07-19 15:12:22
  11. # bytes 350.99M 5 1022.34k 301.01 719.66 5.75k 124.25
  12. # Bytes sen 1.94M 0 9.42k 1.67 0 110.38 0
  13. # Killed 0 0 0 0 0 0 0
  14. # Last errn 34.11M 0 1.55k 29.26 0 185.83 0
  15. # Merge pas 0 0 0 0 0 0 0
  16. # Rows affe 875.19k 0 17.55k 0.73 0.99 25.61 0.99
  17. # Rows read 2.20M 0 14.83k 1.88 1.96 24.68 1.96
  18. # Tmp disk 4.15k 0 1 0.00 0 0.06 0
  19. # Tmp table 14.19k 0 2 0.01 0 0.14 0
  20. # Tmp table 8.30G 0 2.01M 7.12k 0 117.75k 0
  21. # 0% (5k) Filesort
  22. # 0% (5k) Full_join
  23. # 0% (7k) Full_scan
  24. # 0% (10k) Tmp_table
  25. # 0% (4k) Tmp_table_on_disk

There is a lot of interesting you can find out from this header but in relation to replication capacity - you can get replication load, which is same as "concurrency" figure (0.37x) The concurrency as reported by mk-query-digest is sum of query execution time vs time range the log file covers. In this case as we know there is only one replication thread it will be same as replication load. This gives us replication capacity of 1/0.37 = 2.70

This method should work with original MySQL Server in theory, though I have not tested it. Some versions had log_slave_slow_statements unreliable and also you may need to adjust regular expression for finding users replication thread uses.

3) Processlist Polling This method is simple - the Slave thread has different status in Show Processlist depending on if it processes query or simply waiting. By pooling processlist frequently (for example 10 times a second) we can compute the approximate percentage the thread was busy vs idle. Of course running processlist very aggressively can be an overhead especially if it is busy system with a lot of connections

PLAIN TEXT SQL:
  1. mysql> SHOW processlist;
  2. +--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
  3. | Id | User | Host | db | Command | Time | State | Info |
  4. +--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
  5. | 801812 | system user | | NULL | Connect | 2665 | Waiting FOR master TO send event | NULL |
  6. | 801813 | system user | | NULL | Connect | 0 | Has READ ALL relay log; waiting FOR the slave I/O thread TO UPDATE it | NULL |
  7. | 802354 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |
  8. +--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
  9. 3 rows IN SET (0.00 sec)

4) Slave Catchup/Binlog Application method. We can just get the spare server with backups restored on it and apply binary log to it. If 1 hour worth of binary logs applies for 10 minutes we have replication capacity of 6. The challenge of course having spare server around and it is quite labor intensive. At the same time it can be good measurement to take during backup recovery trials when you're doing this activity anyway. Using this way you can also measure "cold" vs "hot" replication capacity as well as how long replication warmup takes. It is very typical for servers with cold cache to perform a lot slower then they are warmed up. Measuring times for each binary log separately should give you these numbers.

The less intrusive process which can be done in production (especially if you have slave which is used for backups/reporting etc) is to stop the replication for some time and when see how long it takes to catch up. If you paused replication for 10 minutes and it took 5 minutes to catch up your replication capacity will be 3 (not 2) because you not only had to process the events for outstanding 10 minutes but also for these 5 minutes it took to catch up. The formula is (Time_Replication_Paused+Time_Took_To_Catchup)/Time_Took_To_Catchup.

So how much of replication capacity do you need in the healthy system ? It depends a lot on many things including how fast do you need to be able to recover from backups and how much your load variance is. A lot of systems have special requirements on the time it takes to warmup too (there are different things you can do about it too). First I would measure replication capacity on 5 minute intervals (or something similar) because it tends to vary a lot. When I would suggest to ensure the loaded replication capacity is at least 3 during the peak load and 5 during the normal load. This applies to normal operational load - if you push heavy ALTER TABLE through replication they will surely get your replication capacity down for their duration.

One more thing about these methods - methods 1,2,3 work well only if replication capacity is above 1, so system is caught up. If it is less than 1, so the master writes more binary logs than slave can process they will show number close to 1. the method 4 however with work even if replication can't ever catch up - If 1 hour worth of binary logs takes 2 hours to apply, your replication capacity is 0.5.

Entry posted by peter | No comment

Add to: | | | |


PlanetMySQL Voting: Vote UP / Vote DOWN

Three editions of MySQL are available

Tue, 07/20/2010 - 15:19

Yes, you read the title correctly — there are three editions of MySQL available, according to http://www.mysql.com/products/enterprise/server.html. Well, that page names two, and then of course there is the community edition….

From the manual page:

MySQL Enterprise Server is available in the following editions:

* MySQL Enterprise Server – Pro is the world’s most popular open source database that enables you to rapidly deliver high performance and scalable Online Transaction Processing (OLTP) applications.
* MySQL Enterprise Server – Advanced is the most comprehensive edition of MySQL. It provides all the benefits of MySQL Enterprise Server Pro and adds horizontal table and index partitioning for improving the performance and management of VLDBs (Very Large Databases).

How is “horizontal table and index partitioning” different from the regular partitioning available in MySQL 5.1?

Those of us that have been around for the past 3 or so years know that there was a point in time where there were two different editions of MySQL available, back when MySQL Enterprise and MySQL Community were actually different. But that experiment was a complete failure, and the code is now the same. MySQL Enterprise does package the software in a way that is not available to the community, specifically the quarterly service pack (QSP) releases. But the actual code….the same.

The pricing page at http://globalspecials.sun.com/store/mysql/ContentTheme/pbPage.categoryEnterprise shows that the Advanced server can be acquired for $3k (Gold) or $5k (Platinum) per year. The fee is worth it for the support MySQL will give you, but why is MySQL muddying the waters by having more “editions”, which very likely are not even different code?

(Special thanks to Aaron Macks for pointing out the existence of mysql-advanced, which was the impetus for this blog post.)


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL HA with DRDB and Heartbeat on CentOS 5.5

Tue, 07/20/2010 - 12:26

This is one of a few MySQL High Availability strategies.  I have used this for years and found it work great.  If you don’t know about DRBD and MySQL you should read Peter’s comments.

These are step by step instructions for Redhat 5 or CentOS.

If you need more details please refer to:
http://www.drbd.org/users-guide/

Configuring MySQL for DRBD
http://dev.mysql.com/doc/refman/5.1/en/ha-drbd-install-mysql.html

Getting started:

The OS in this example is CentOS 5.5.  I added a new disk (/dev/sde) to the four disk RAID-5 and RAID-1 I was already using.   I’m only creating an 8 gig disk (vmware). You should start with a partition (LVM and or RAID) partition big enough for your data.

# uname -a
Linux db1.grennan.com 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1              24065660   2826564  19996896  13% /
/dev/md0                101018     20988     74814  22% /boot
tmpfs                   513476         0    513476   0% /dev/shm

# fdisk -l /dev/sde

Disk /dev/sde: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1044     8385898+  83  Linux

DRBD:

Installation:
On machine1 and machine2 install DRBD and its kernel module.  You may need to review the packages you have available using ‘yum list | grep drbd’.  These are for CentOS 5.5.  You may also need to reboot after this step.

 # yum -y install drbd  # yum –y install kmod-drbd82.x86_64 # modprobe drbd  

Configuration:
On both machines edit this configuration file.  I have highlighted parts you will need to edit in red.

# vi /etc/drbd.conf # # please have a a look at the example configuration file in # /usr/share/doc/drbd82/drbd.conf # # Our MySQL share resource db { protocol C; startup { wfc-timeout 0; degr-wfc-timeout 120; } disk { on-io-error detach; } # or panic, ... syncer {  rate 6M; } on db1.grennan.com { device /dev/drbd1;} disk /dev/sde1;  address 192.168.2.13:7789;  meta-disk internal; } on db2.grennan.com { device /dev/drbd1; disk /dev/sde1; address 192.168.2.14:7789; meta-disk internal; } }

Manage DRDB processes:

On both machines run

# drbdadm adjust db

On machine1

# drbdsetup /dev/drbd1 primary –o # service drbd start

On machine2

# service drbd start

On both machines(see status):

# service drbd status

On machine1

# mkfs -j /dev/drbd1
# tune2fs -c -1 -i 0 /dev/drbd1
# mkdir /data
# mount -o rw /dev/drbd1 /data

On machine2

# mkdir /data

Test failover:
This is how you perform a manual fail over. You will use HA to do this for you in the next sections.

On primary (server1)

# umount /data
# drbdadm secondary db

On secondary (server2)

# drbdadm primary db
# service drbd status
# mount -o rw /dev/drbd1 /data
# df

Filesystem           1K-blocks      Used Available Use% Mounted on /dev/md1              24065660   1898696  20924764   9% / /dev/md0                101018     14886     80916  16% /boot tmpfs                   513472         0    513472   0% /dev/shm /dev/drbd1             8253948    149628   7685040   2% /data


Note we never formatted (mkfs) the disk on machine2! Here it is, ready to go, DRDB has copied all the data.

MySQL:

Here are a few notes for you to think about.

  • The default location for MySQL data is /var/lib/mysql.  You will be moving this to /data/mysql.
  • MySQL configuration is in /etc/my.cnf.  So that changes to the configuration move with failover, you should put my.cnf in /data/mysql and create a sym-link of /etc/my.cnf to this file.

Now comes the hurdle.

  • Install MySQL as you wish.
  • Move your data directory to a /data/mysql

On machine1

# mkdir /data/mysql
# chown  mysql.mysql /data/mysql
# cp –prv /var/lib/mysql/* /data/mysql
Start MySQL on machine1.
Create some sample database and table. Stop MySQL. Do a manual switchover of DRBD. Start MySQL on machine2 and query for that table. It should work. But, this is of no use if you have to switchover manually every time. When you have this working you are ready to move to Heartbeat.

Here are a couple of scripts to make this easy.

drdb-secondary

# service mysql stop
# umount /data
# drbdadm secondary db
# drdb-primary:
# drbdadm primary db
# mount -o rw /dev/drbd1 /data
# service mysql start


HA:

  • IMPORTANT: Heartbeat uses either Linux Services (LSB) Resource Agents or Heartbeat Resource Agents (HRA) to start and stop heartbeat resources. You will be adding MySQL (LSB), drbddisk (HRA) and IPaddr2 (HRA) are our heartbeat resources.
  • Refer this page on Resource Agent
  • As you are aware of it many *nix services are started using LSB Resource Agents. They are found in /etc/init.d

Installation:

On machine1 and machine2 install Heartbeat and needed utilities.  You may need to review the packages you have available using ‘yum list | grep drbd’.  These are for CentOS 5.5.  You may also need to reboot after this step.

# yum -y install gnutls*
# yum -y install ipvsadm*
# yum -y install heartbeat*
# yum -y install heartbeat.x86_64

Configuration:

Edit /etc/sysctl.conf and set net.ipv4.ip_forward = 1

# vi /etc/sysctl.conf

Controls IP packet forwarding net.ipv4.ip_forward = 1

# /sbin/chkconfig –level 2345 heartbeat on

# /sbin/chkconfig –del ldirectord


Configure HA:

You need to setup the following configuration files on both machines:

# vi /etc/ha.d/ha.cf

#/etc/ha.d/ha.cf content debugfile /var/log/ha-debug logfile /var/log/ha-log logfacility local0 keepalive 2 deadtime 30 warntime 10 initdead 120 udpport 694 # If you have multiple HA setup in same network.. use different ports bcast eth0 # Linux auto_failback on # This will failback to machine1 after it comes back ping 192.168.2.1 # The gateway apiauth ipfail gid=haclient uid=hacluster node db1.grennan.com node db2.grennan.com  

On both machines

NOTE: Assuming 192.168.2.15 is virtual IP for your MySQL resource and mysqld is the LSB resource agent. The host name (db2) should be the secondary server’s name.
# vi /etc/ha.d haresources

# /etc/ha.d/haresources content
db2.grennan.com LVSSyncDaemonSwap::master Paddr2::192.168.2.15/24/eth0  rbddisk::db Filesystem::/dev/drbd1::/data::ext3 mysqld

# vi /etc/ha.d/authkeys

#/etc/ha.d/authkeys content
auth 2
2 sha1 BigSecretKeyks9wjwlf9gskg905snvl

Now, make your authkeys secure:

# chmod 600 /etc/ha.d/authkeys


Check your work:

On both machines, one at a time, stop MySQL and make sure MySQL does not start when the system reboots (init 6).

If it does, you may need to remove it from the init process with:

# /sbin/chkconfig –level 2345 MySQL off

Start Heartbeat.

# service heartbeat start

These commands will give you status about this LVS setup:
# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status
# ip addr sh
# service heartbeat status
# df
# service mysqld status

Access your HA-MySQL server like:
# mysql –h 192.168.2.15

Shutdown machine1 to see MySQL up on machine2. ‘shutdown now’

Start machine1 to see MySQL back on machine1.


PlanetMySQL Voting: Vote UP / Vote DOWN

Database Architectures &amp; Performance

Tue, 07/20/2010 - 11:12
For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point.

First, let’s look at the performance issue. RAM (average access time of 200 nanoseconds) is considerably faster than disk (average access time of 12,000,000 nanoseconds). Let me put this 200:12,000,000 ratio into perspective. A task that takes a single minute in RAM would take 41 days in disk. So why do I bring this up?

Shared-Nothing: Since the shared-nothing database has sole ownership of its data—it doesn’t share the data with other nodes—it can operate in the machine’s local RAM, only writing infrequently to disk (flushing the data to disk). This makes shared-nothing databases very fast.

Shared-Disk: Cannot rely on the machine’s local RAM, because every write by one node must be instantly available to the other nodes, to ensure that they don’t use stale data and corrupt the database. So instead of relying on local RAM, all write transactions must be written to disk. This is where the 1 minute to 41 days ratio above comes into play and kills performance of shared-disk databases.

Let’s look at some of the ways databases can utilize RAM instead of disk to improve performance:

Read Cache: Databases typically use the RAM as a fast read cache. Upon reading data from the disk, this data is stored in the read cache so that subsequent use of that data is satisfied from RAM instead of the disk. For example, upon reading a person’s name from disk, that name is stored in the cache for fast access. The database wouldn’t need to read that name from disk again until that person’s name is changed (rare), or that RAM space is reused for a piece of data that is used more frequently. Read cache can significantly improve database performance.

BOTH shared-disk and shared-nothing databases can exploit read cache. The shared-disk database just needs a system to either invalidate or update the data in read cache when one of the nodes has made a change. This is pretty standard in shared-disk databases.

Background Writing: Writing data to the disk is by far the most time consuming process in a write transaction. During the transaction, that portion of the data is locked, meaning it is unavailable for other functions. So, if you can move the writing of the data outside of the transaction—write the data in the background—you get faster transactions, which means less locking contention, which means faster throughput.

SHARED-NOTHING can exploit this performance enhancement, since each server owns the data in its RAM. However, shared-disk databases cannot do this because they need to share that updated data with the other database nodes in the cluster. Since the local node’s cache is not shared, in a shared-disk database, the only option is to use the shared disk to share that data across the nodes.

Transactional Cache: The next step in utilizing RAM instead of disk is to use it in a transactional manner. This means that the database can make multiple changes to data in RAM prior to writing the final results to disk. For example, if you have 100 widgets, you can store that inventory count in RAM, and then decrement it with each sale. If you sell 23 widgets, then instead of writing each transaction to disk, you update it in RAM. When you flush this data to disk, it results in a single disk write, writing the inventory number 77, instead of writing each of the 23 transactions individually to disk.

SHARED-NOTHING can perform transactions on data while it is in RAM. Once again, shared-disk databases cannot do this because you might have multiple nodes updating the inventory. Since they cannot look into each others local RAM, they must once again write each transaction to disk.

As you can see, shared-nothing databases have an inherent performance advantage. The next blog post will address how modern shared-disk databases address these performance challenges.
PlanetMySQL Voting: Vote UP / Vote DOWN

DBJ – Mult-master MySQL Improves Manageability

Tue, 07/20/2010 - 05:00

Multi-master MySQL, with the MMM management software brings a whole host of new features, and manageability to your MySQL deployments.   Run backups, alter tables, perform upgrades all without slowing down your production users.

Read more at Database Journal – Using Multi-master MySQL To Get A Leg Up On Database Performance


PlanetMySQL Voting: Vote UP / Vote DOWN

INFORMATION_SCHEMA tables are case sensitive

Tue, 07/20/2010 - 03:11

I wanted to get examples of some of the extra information that the Percona server has in its INFORMATION_SCHEMA metadata, and in doing so, I stumbled across an interesting MySQL bug/feature/point — INFORMATION_SCHEMA tables (which are actually system views) are case sensitive when used in comparisons:

mysql> use information_schema; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select @@version; +--------------------+ | @@version | +--------------------+ | 5.1.36-xtradb6-log | +--------------------+ 1 row in set (0.00 sec) mysql> use information_schema; Database changed mysql> show tables like 'innodb%'; Empty set (0.00 sec) mysql> show tables like 'INNODB%'; +----------------------------------------+ | Tables_in_information_schema (INNODB%) | +----------------------------------------+ | INNODB_BUFFER_POOL_PAGES_INDEX | | INNODB_RSEG | | INNODB_LOCKS | | INNODB_BUFFER_POOL_PAGES | | INNODB_TRX | | INNODB_INDEX_STATS | | INNODB_LOCK_WAITS | | INNODB_CMP_RESET | | INNODB_CMP | | INNODB_CMPMEM_RESET | | INNODB_BUFFER_POOL_PAGES_BLOB | | INNODB_CMPMEM | | INNODB_TABLE_STATS | +----------------------------------------+ 13 rows in set (0.00 sec)

It is not just for the new tables Percona has added:

mysql> show tables like 'table%'; Empty set (0.00 sec) mysql> show tables like 'TABLE%'; +---------------------------------------+ | Tables_in_information_schema (TABLE%) | +---------------------------------------+ | TABLES | | TABLE_CONSTRAINTS | | TABLE_PRIVILEGES | +---------------------------------------+ 3 rows in set (0.00 sec)

And it is not due to the collation:

mysql> show global variables like '%collat%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec) mysql> show session variables like '%collat%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | utf8_general_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec)
PlanetMySQL Voting: Vote UP / Vote DOWN

MapReduce – DBInputFormat – Serialization on readers

Mon, 07/19/2010 - 21:46
Last week I was working on EC2 MySQL server where one of the slave is taking lot of time to catch-up; and only job that is running on that server is mapreduce job to access InnoDB tables for read-only meta data. And debugging it further, noticed that every access to database server is serialized with [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

Rename Maria Contest Winner

Mon, 07/19/2010 - 19:52

After two months of submissions, Monty Program employee review, community voting and Monty’s final decision, we are happy to announce that the Maria storage engine will henceforth be known as …

Aria!

Congratulations to Chris Tooley who suggested the name. Chris said about Aria in his submission, “Maria without the ‘M’, plus aria is a pleasant musical term.” Chris is now the proud new owner of a System 76 Meerkat net-top computer. Thanks to our good friends at System76 for providing this nifty prize.

Hopefully, in time, “Aria” will also be a pleasing database engine term. And now we will not have the confusion between MariaDB and Maria.


PlanetMySQL Voting: Vote UP / Vote DOWN

Creative Commons License
This work by Mark Schoonover is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Based on a work at blog.thetajoin.com. All comments copyright their respective owners.

Drupal SEO