RSS News
Planet MySQL
Planet MySQL - https://planet.mysql.com

  • What is MySQL partitioning?
    MySQL partitioning distributes the data of individual tables across multiple files based on a partitioning strategy or set of rules (we typically recommend partitioning large, I/O-heavy tables for performance, scalability and manageability). In very simple terms, different portions of a table are stored as separate tables in different locations so that I/O is distributed optimally. The user-defined rule by which data is divided is known as the partitioning function. In MySQL you can partition data by a RANGE of values, a LIST of values, an internal hashing function or a linear hashing function. By restricting query examination to only those partitions that can contain matching rows, a query can run many times faster than the same query on a non-partitioned table. This mechanism is called partition pruning (trimming of unwanted partitions). Here is an example of partition pruning:

CREATE TABLE tab1 (
  col1 VARCHAR(30) NOT NULL,
  col2 VARCHAR(30) NOT NULL,
  col3 TINYINT UNSIGNED NOT NULL,
  col4 DATE NOT NULL
)
PARTITION BY RANGE( col3 ) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

A SELECT query that benefits from partition pruning:

SELECT col1, col2, col3, col4 FROM tab1 WHERE col3 > 200 AND col3 < 250;

What is explicit partitioning in MySQL, and how does it differ from partition pruning? In MySQL we can explicitly select partitions and subpartitions when executing a statement with a given WHERE condition. This sounds very similar to partition pruning, but there are differences: the partitions to be checked are explicitly named in the statement, whereas partition pruning is automatic; and explicit partition selection is supported for both queries and DML statements, while partition pruning applies only to queries.
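Conceptually, the pruning step for the tab1 example above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not MySQL's actual implementation; the partition names and bounds come from the CREATE TABLE above:

```python
# Illustrative sketch of RANGE partition pruning (not MySQL internals).
# Upper bounds of tab1's partitions; None stands for MAXVALUE.
PARTITIONS = [("p0", 100), ("p1", 200), ("p2", 300), ("p3", None)]

def prune_range(lo, hi):
    """Return partitions that can hold values v with lo < v < hi."""
    matched = []
    lower = float("-inf")
    for name, upper in PARTITIONS:
        upper_val = float("inf") if upper is None else upper
        # This partition holds values in [lower, upper_val); keep it only
        # if that interval overlaps the open query interval (lo, hi).
        if lower < hi and upper_val > lo:
            matched.append(name)
        lower = upper_val
    return matched

# WHERE col3 > 200 AND col3 < 250 needs to examine only partition p2.
print(prune_range(200, 250))  # → ['p2']
```

You can confirm which partitions the server actually prunes with EXPLAIN PARTITIONS SELECT … (MySQL 5.6) or plain EXPLAIN (MySQL 5.7 and later), which lists the partitions examined.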
SQL statements that support explicit partition selection: SELECT, INSERT, UPDATE, DELETE, LOAD DATA, LOAD XML and REPLACE. Explicit partition example:

CREATE TABLE customer (
  cust_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  cust_fname VARCHAR(25) NOT NULL,
  cust_lname VARCHAR(25) NOT NULL,
  cust_phone INT NOT NULL,
  cust_fax INT NOT NULL
)
PARTITION BY RANGE(cust_id) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

A query explicitly naming a partition:

mysql> SELECT * FROM customer PARTITION (p1);

RANGE partitioning
In RANGE partitioning you partition values within a given range. Ranges must be contiguous but not overlapping, and are usually defined with the VALUES LESS THAN operator. The following examples show how to create and use RANGE partitioning:

CREATE TABLE customer_contract(
  cust_id INT NOT NULL,
  cust_fname VARCHAR(30),
  cust_lname VARCHAR(30),
  st_dt DATE NOT NULL DEFAULT '1970-01-01',
  end_dt DATE NOT NULL DEFAULT '9999-12-31',
  contract_code INT NOT NULL,
  contract_id INT NOT NULL
)
PARTITION BY RANGE (contract_id) (
  PARTITION p0 VALUES LESS THAN (50),
  PARTITION p1 VALUES LESS THAN (100),
  PARTITION p2 VALUES LESS THAN (150),
  PARTITION p3 VALUES LESS THAN (200)
);

Suppose instead that you wish to partition based on the year the contract ended:

CREATE TABLE customer_contract(
  cust_id INT NOT NULL,
  cust_fname VARCHAR(30),
  cust_lname VARCHAR(30),
  st_dt DATE NOT NULL DEFAULT '1970-01-01',
  end_dt DATE NOT NULL DEFAULT '9999-12-31',
  contract_code INT NOT NULL,
  contract_id INT NOT NULL
)
PARTITION BY RANGE (YEAR(end_dt)) (
  PARTITION p0 VALUES LESS THAN (2001),
  PARTITION p1 VALUES LESS THAN (2002),
  PARTITION p2 VALUES LESS THAN (2003),
  PARTITION p3 VALUES LESS THAN (2004)
);

It is also possible to partition a table by RANGE based on the value of a TIMESTAMP column, using the UNIX_TIMESTAMP() function, as shown in this example:

CREATE TABLE sales_forecast (
  sales_forecast_id INT NOT NULL,
  sales_forecast_status VARCHAR(20) NOT NULL,
  sales_forecast_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(sales_forecast_updated) ) (
  PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
  PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
  PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
  PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
  PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
  PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
  PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
  PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
  PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
  PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

LIST partitioning
The difference between RANGE and LIST partitioning is that in LIST partitioning each partition is defined by a list of discrete values of a specific column. You declare it with PARTITION BY LIST(expr), where expr is the column used for the list partitioning. LIST partitioning is illustrated below:

CREATE TABLE students (
  student_id INT NOT NULL,
  student_fname VARCHAR(30),
  student_lname VARCHAR(30),
  student_joined DATE NOT NULL DEFAULT '1970-01-01',
  student_separated DATE NOT NULL DEFAULT '9999-12-31',
  student_house INT,
  student_grade_id INT
)
PARTITION BY LIST(student_grade_id) (
  PARTITION P1 VALUES IN (1,2,3,4),
  PARTITION P2 VALUES IN (5,6,7),
  PARTITION P3 VALUES IN (8,9,10),
  PARTITION P4 VALUES IN (11,12)
);

HASH partitioning
HASH partitioning distributes data evenly among a predetermined number of partitions. In RANGE and LIST partitioning you must explicitly define the partitioning logic and in which partition a given column value, or set of column values, is stored.
In HASH partitioning, MySQL takes care of this. The following example illustrates HASH partitioning:

CREATE TABLE store (
  store_id INT NOT NULL,
  store_name VARCHAR(30),
  store_location VARCHAR(30),
  store_started DATE NOT NULL DEFAULT '1997-01-01',
  store_code INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

P.S.: If you do not include a PARTITIONS clause, the number of partitions defaults to 1.

LINEAR HASH partitioning
LINEAR HASH partitioning uses a linear powers-of-two algorithm, whereas plain HASH partitioning uses the modulus of the hashing function's value. Here is a LINEAR HASH partitioning example:

CREATE TABLE store (
  store_id INT NOT NULL,
  store_name VARCHAR(30),
  store_location VARCHAR(30),
  store_started DATE NOT NULL DEFAULT '1997-01-01',
  store_code INT
)
PARTITION BY LINEAR HASH( YEAR(store_started) )
PARTITIONS 4;

KEY partitioning
KEY partitioning is very similar to HASH partitioning; the only difference is that the hashing function for KEY partitioning is supplied by MySQL itself. MySQL NDB Cluster uses MD5(); for tables using other storage engines, the server uses its own hashing function, which is based on the same algorithm as PASSWORD().

CREATE TABLE contact(
  id INT NOT NULL,
  name VARCHAR(20),
  contact_number INT,
  email VARCHAR(50),
  UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 5;

P.S.: If the unique key column were not defined as NOT NULL, the previous statement would fail.
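The difference between the two hashing schemes can be illustrated with a short Python sketch. The linear function below follows the powers-of-two algorithm as described in the MySQL manual; this is an illustration of the mapping logic, not MySQL source code:

```python
def hash_partition(value, num_parts):
    """Plain HASH: partition number is MOD(expr, num_parts)."""
    return value % num_parts

def linear_hash_partition(value, num_parts):
    """LINEAR HASH: linear powers-of-two algorithm."""
    v = 1
    while v < num_parts:      # smallest power of two >= num_parts
        v *= 2
    n = value & (v - 1)
    while n >= num_parts:     # fold back until a valid partition is hit
        v //= 2
        n = n & (v - 1)
    return n

# With 6 partitions the two schemes can place the same value differently:
print(hash_partition(7, 6))         # → 1
print(linear_hash_partition(7, 6))  # → 3
```

The payoff of the linear variant is that adding, dropping, merging and splitting partitions is much faster, at the cost of data being less evenly distributed than with plain HASH.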
Subpartitioning
Subpartitioning, also known as composite partitioning, partitions a table by combining, for example, RANGE and HASH for better results. The example below illustrates subpartitioning:

CREATE TABLE purchase (id INT, item VARCHAR(30), purchase_date DATE)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) )
SUBPARTITIONS 2 (
  PARTITION p0 VALUES LESS THAN (2000),
  PARTITION p1 VALUES LESS THAN (2010),
  PARTITION p2 VALUES LESS THAN MAXVALUE
);

It is also possible to define subpartitions explicitly, using SUBPARTITION clauses to specify options for individual subpartitions:

CREATE TABLE purchase (id INT, item VARCHAR(30), purchase_date DATE)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) ) (
  PARTITION p0 VALUES LESS THAN (2000) ( SUBPARTITION s0, SUBPARTITION s1 ),
  PARTITION p1 VALUES LESS THAN (2010) ( SUBPARTITION s2, SUBPARTITION s3 ),
  PARTITION p2 VALUES LESS THAN MAXVALUE ( SUBPARTITION s4, SUBPARTITION s5 )
);

Things to remember: Each partition must have the same number of subpartitions. Each SUBPARTITION clause must include (at a minimum) a name for the subpartition; beyond that, you may set any desired option for the subpartition or allow it to assume its default setting for that option. Subpartition names must be unique across the entire table.
For example, the following CREATE TABLE statement is valid in MySQL 5.7:

CREATE TABLE purchase (id INT, item VARCHAR(30), purchase_date DATE)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) ) (
  PARTITION p0 VALUES LESS THAN (1990) ( SUBPARTITION s0, SUBPARTITION s1 ),
  PARTITION p1 VALUES LESS THAN (2000) ( SUBPARTITION s2, SUBPARTITION s3 ),
  PARTITION p2 VALUES LESS THAN MAXVALUE ( SUBPARTITION s4, SUBPARTITION s5 )
);

MySQL partitioning limitations
MySQL partitioning also has limitations, which we list below. Every unique key on the table must include all columns used in the table's partitioning expression:

CREATE TABLE tab3 (
  column1 INT NOT NULL,
  column2 DATE NOT NULL,
  column3 INT NOT NULL,
  column4 INT NOT NULL,
  UNIQUE KEY (column1, column2),
  UNIQUE KEY (column3)
)
PARTITION BY HASH(column1 + column3)
PARTITIONS 4;

Expect this error after running the above statement:

ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function

The right way of doing it:

CREATE TABLE table12 (
  column1 INT NOT NULL,
  column2 DATE NOT NULL,
  column3 INT NOT NULL,
  column4 INT NOT NULL,
  UNIQUE KEY (column1, column2, column3)
)
PARTITION BY HASH(column3)
PARTITIONS 5;

CREATE TABLE table25 (
  column11 INT NOT NULL,
  column12 DATE NOT NULL,
  column13 INT NOT NULL,
  column14 INT NOT NULL,
  UNIQUE KEY (column11, column13)
)
PARTITION BY HASH(column11 + column13)
PARTITIONS 5;

Since a primary key is by definition a unique key, this restriction also applies to the table's primary key, if it has one.
The examples below illustrate this limitation:

CREATE TABLE table55 (
  column11 INT NOT NULL,
  column12 DATE NOT NULL,
  column13 INT NOT NULL,
  column14 INT NOT NULL,
  PRIMARY KEY(column11, column12)
)
PARTITION BY HASH(column13)
PARTITIONS 4;

CREATE TABLE table65 (
  column20 INT NOT NULL,
  column25 DATE NOT NULL,
  column30 INT NOT NULL,
  column35 INT NOT NULL,
  PRIMARY KEY(column20, column30),
  UNIQUE KEY(column25)
)
PARTITION BY HASH( YEAR(column25) )
PARTITIONS 5;

Both of the above statements return this error:

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table’s partitioning function

The right way of doing it:

CREATE TABLE t45 (
  column50 INT NOT NULL,
  column55 DATE NOT NULL,
  column60 INT NOT NULL,
  column65 INT NOT NULL,
  PRIMARY KEY(column50, column55)
)
PARTITION BY HASH(column50 + YEAR(column55))
PARTITIONS 5;

CREATE TABLE table88 (
  column80 INT NOT NULL,
  column81 DATE NOT NULL,
  column82 INT NOT NULL,
  column83 INT NOT NULL,
  PRIMARY KEY(column80, column81, column82),
  UNIQUE KEY(column81, column82)
);

In t45, the primary key includes every column referenced in the partitioning expression, and table88 is not partitioned at all, so both statements are valid.
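The unique-key rule above is mechanical enough to sanity-check before writing the DDL. Here is a small, hypothetical Python helper (the function name is made up for illustration; the key and column lists come from the examples above):

```python
def partitioning_keys_ok(unique_keys, partitioning_columns):
    """Every unique key (including the primary key) must contain
    all columns used in the partitioning expression."""
    needed = set(partitioning_columns)
    return all(needed <= set(key) for key in unique_keys)

# table55: PRIMARY KEY(column11, column12), PARTITION BY HASH(column13)
print(partitioning_keys_ok([("column11", "column12")], ["column13"]))
# → False (the partitioning column is missing from the key)

# t45: PRIMARY KEY(column50, column55), PARTITION BY HASH(column50 + YEAR(column55))
print(partitioning_keys_ok([("column50", "column55")], ["column50", "column55"]))
# → True
```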
You can still successfully partition a MySQL table without unique keys, including a table with no primary key, and you may use any column or columns in the partitioning expression as long as the column type is compatible with the partitioning type. The example below shows partitioning a table with no unique or primary keys:

CREATE TABLE table_has_no_pk (column10 INT, column11 INT, column12 VARCHAR(20))
PARTITION BY RANGE(column10) (
  PARTITION p0 VALUES LESS THAN (500),
  PARTITION p1 VALUES LESS THAN (600),
  PARTITION p2 VALUES LESS THAN (700),
  PARTITION p3 VALUES LESS THAN (800)
);

You cannot later add a unique key to a partitioned table unless the key includes all columns used by the table's partitioning expression. The statements below illustrate this:

ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column10);
ALTER TABLE table_has_no_pk DROP PRIMARY KEY;
ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column10, column11);
ALTER TABLE table_has_no_pk DROP PRIMARY KEY;

However, the next statement fails, because column10 is part of the partitioning key but is not part of the proposed primary key:

mysql> ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column11);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

MySQL partitioning limitations (at the storage engine level)

InnoDB: InnoDB foreign keys and MySQL partitioning are not compatible. Partitioned InnoDB tables cannot have foreign key references, nor can they have columns referenced by foreign keys; in other words, you cannot partition InnoDB tables that have foreign keys or are referenced by them. InnoDB also does not support the use of multiple disks for subpartitions (MyISAM supports this feature). Finally, use ALTER TABLE … REBUILD PARTITION and ALTER TABLE … ANALYZE PARTITION rather than ALTER TABLE … OPTIMIZE PARTITION.

NDB storage engine: You can only partition by KEY (including LINEAR KEY) in the NDB storage engine.

FEDERATED storage engine: Partitioning is not supported.
CSV storage engine: Partitioning is not supported.

MERGE storage engine: Tables using the MERGE storage engine cannot be partitioned, and partitioned tables cannot be merged.

The MySQL functions shown in the following list are allowed in partitioning expressions: ABS(), CEILING(), DATEDIFF(), DAY(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), EXTRACT(), FLOOR(), HOUR(), MICROSECOND(), MINUTE(), MOD(), MONTH(), QUARTER(), SECOND(), TIME_TO_SEC(), TO_DAYS(), TO_SECONDS(), UNIX_TIMESTAMP(), WEEKDAY(), YEAR(), YEARWEEK()

MySQL partitioning and locks

Effect on DML statements: In MySQL 5.7, updating a partitioned MyISAM table causes only the affected partitions to be locked. SELECT statements (including those containing unions or joins) lock only those partitions that actually need to be read; this also applies to SELECT … PARTITION. An UPDATE prunes locks only for tables on which no partitioning columns are updated. REPLACE and INSERT lock only those partitions having rows to be inserted or replaced; however, if an AUTO_INCREMENT value is generated for any partitioning column, then all partitions are locked. INSERT … ON DUPLICATE KEY UPDATE is pruned as long as no partitioning column is updated. INSERT … SELECT locks only those partitions in the source table that need to be read, although all partitions in the target table are locked. Locks imposed by LOAD DATA statements on partitioned tables cannot be pruned.

Effect on DDL statements: CREATE VIEW does not cause any locks. ALTER TABLE … EXCHANGE PARTITION prunes locks; only the exchanged table and the exchanged partition are locked. ALTER TABLE … TRUNCATE PARTITION prunes locks; only the partitions to be emptied are locked. In addition, ALTER TABLE statements take metadata locks at the table level.

Effect on other statements: LOCK TABLES cannot prune partition locks. CALL stored_procedure(expr) supports lock pruning, but evaluating expr does not. DO and SET statements do not support partitioning lock pruning.
The post What is MySQL partitioning? appeared first on MySQL Consulting, Support and Remote DBA Services.

  • When Database Warm Up is Not Really UP
    The common wisdom in database performance management is that a “cold” database server has poor performance. Then, as it “warms up”, performance improves until you finally reach a completely warmed-up state with peak database performance. In other words, to get peak performance from MySQL you need to wait for the database to warm up. This thinking comes from the point of view of database cache warmup. Indeed, from the cache standpoint, you start with an empty cache, and over time the cache is filled with data. Moreover, the longer the database runs, the more statistics it has about data access patterns, and the better it can manage the database cache contents. In recent years, with the rise of SSDs, cache warmup has become less of an issue. High-performance NVMe storage can do more than 1GB/sec of reads, meaning you can warm up a 100GB database cache in less than two minutes. Also, SSD I/O latency tends to be quite good, so you’re not paying as high a penalty for a higher miss rate during the warmup stage. It is not all so rosy with database performance over time, though. Databases tend to delay work when possible, but there is only so much delaying you can do, and when the database can’t delay work any longer, performance tends to be negatively impacted. Here are some examples of delayed work: Checkpointing: depending on the database technology and configuration, checkpointing may be delayed for 30 minutes or more after database start. The Change Buffer (InnoDB) can delay index maintenance work. Pushing messages from buffers to leaves (TokuDB) can be delayed until space in the buffers is exhausted. Compaction for RocksDB and other LSM-tree based systems can take quite a while to reach steady state. In all these cases, database performance can be a lot better almost immediately after start than when the database is completely “warmed up”.
An experiment with database warm up. Let’s illustrate this with a little experiment: running Sysbench against MySQL with the InnoDB storage engine for 15 minutes: sysbench --db-driver=mysql --threads=200 --rand-type=uniform --report-interval=10 --percentile=99 --time=900 --mysql-user=root --mysql-password= /usr/share/sysbench/oltp_update_index.lua --table_size=100000000 run  Let’s look in detail at what happens during the run, using graphs from Percona Monitoring and Management. As you can see, the number of updates/sec we’re doing actually gets worse (and more uneven) after the first three minutes, while the jump to peak performance is almost immediate. The log space usage explains some of this: in the first few minutes we did not need to flush as aggressively as we had to later. On the InnoDB I/O graph we can see a couple of interesting things. First, you can see how quickly warm up happens: in two minutes the I/O is already at half of its peak. You can also see the explanation for the little performance dip after the initial high performance (around 19:13): this is where we got close to using all the log space, so active flushing was required while, at the same time, a lot of I/O was still needed for cache warmup. “Reaching steady state” is another term commonly used to describe the stage after warm up completes. Note, though, that such a steady state is not guaranteed to be steady at all. In fact, the most typical steady state is unsteady; for example, you can see in this blog post that both InnoDB and MyRocks show quite a bit of variance. Summary: While the term database warm up may imply that performance after warm up will be better, this is often not the case. “Reaching steady state” is a better term, as long as you understand that “steady” does not mean uniform performance.   The post When Database Warm Up is Not Really UP appeared first on Percona Database Performance Blog.

  • How to Automate Minor Version Upgrades for MySQL on RDS
    Amazon RDS for MySQL offers the option to automate minor version upgrades using the minor version upgrade policy, a property that lets you decide whether Amazon is allowed to perform upgrades on your behalf. Usually the goal is not to automatically upgrade every RDS instance, but to automatically keep non-production deployments up to date. This helps you address engine issues as soon as possible and improves the automation of the deployment process. If you are using the AWS Command Line Interface (CLI) and you have an instance called test-rds01, it is as simple as setting [--auto-minor-version-upgrade | --no-auto-minor-version-upgrade]. For example: aws rds modify-db-instance --db-instance-identifier test-rds01 --apply-immediately --auto-minor-version-upgrade  And if you use the AWS Management Console, it is just a check box. All sorted? Unfortunately not. The main problem is that Amazon performs those upgrades only in rare circumstances. According to Amazon’s documentation: “Minor version upgrades only occur automatically if a minor upgrade replaces an unsafe version, such as a minor upgrade that contains bug fixes for a previous version. In all other cases, you must modify the DB instance manually to perform a minor version upgrade. If the new version fixes any vulnerabilities that were present in the previous version, then the auto minor version upgrade will automatically take place during the next weekly maintenance window on your DB instance. In all other cases, you should manually perform the minor version upgrade.” So in most scenarios the automatic upgrade is unlikely to happen, and using the auto-minor-version-upgrade attribute is not the way to keep your MySQL running on RDS updated to the latest available minor version.

How to improve automation of minor version upgrades on Amazon RDS for MySQL
Let’s say you want to reduce the time it takes for a newer minor version to reach your development environments, or even your production ones. How can you achieve that on RDS?
First of all, you have to consider the delay it takes for a minor version to reach RDS, which can be anything between a few weeks and a few months. And you might not even notice that a new minor version is available, as it is not obvious how to be notified when it is. What is the best way to be notified of new minor versions available on RDS MySQL? In the past you could (even automatically) monitor the release notes page, but that page is no longer used for RDS. Now you have to monitor the database announcements page, something that you can hardly automate. Any way to speed up the minor version upgrades? You can use the AWS CLI, invoking the describe-db-engine-versions API, or write a simple Lambda function to retrieve the latest available minor version and act accordingly: you can, for example, notify your team of DBAs using Amazon Simple Notification Service (SNS), or you can automatically upgrade the instance. Let’s first see how to achieve that using the command line: aws --profile sandbox rds describe-db-engine-versions --engine 'mysql' --engine-version '5.7' --query "DBEngineVersions[-1].EngineVersion"  where the -1 array index selects the very latest version of the engine available on RDS. Today the result is “5.7.21”, and a simple cron job can monitor this and trigger a notification on changes. Note that the same approach can be used to retrieve the latest available minor version for engines running MySQL 5.5 and MySQL 5.6, and for PostgreSQL engines too. If you want to automatically and immediately upgrade your instance, the logic can easily be done in a few lines of bash with a cron job on an EC2 instance.
For example, the following function requires only the database instance identifier:

rds_minor_upgrade() {
  rds_endpoint=$1
  engine_version="5.7"
  rds_current_minor=$(aws rds describe-db-instances --db-instance-identifier="$rds_endpoint" --query "DBInstances[].EngineVersion" --output text)
  rds_latest_minor=$(aws rds describe-db-engine-versions --engine 'mysql' --engine-version "$engine_version" --query "DBEngineVersions[-1].EngineVersion" --output text)
  if [ "$rds_latest_minor" != "$rds_current_minor" ]; then
    aws rds modify-db-instance --apply-immediately --engine-version "$rds_latest_minor" --db-instance-identifier "$rds_endpoint"
  fi
}

Alternatively, you can write the code as a scheduled Lambda function in your favourite language. For example, using the AWS node.js SDK you can manage RDS and implement the logic above using rds.describeDBEngineVersions and rds.modifyDBInstance to achieve the same:

(...)
rds.describeDBEngineVersions(params, function(err, data) { (...) });
(...)
var params = {
  DBInstanceIdentifier: 'test-rds01',
  ApplyImmediately: true,
  EngineVersion: '<new minor version>',
  (...)
};
rds.modifyDBInstance(params, function(err, data) { (...) });

Speed up your minor upgrades! To summarize, Amazon Web Services does not offer a real way to automatically upgrade an RDS instance to the latest available minor version in the most common scenarios, but it is very easy to achieve this by taking advantage of the AWS CLI or the many SDKs. The goal is not to upgrade every deployment automatically; you would not normally use this for production deployments. However, being able to monitor the latest available minor version on RDS and apply changes automatically to development and staging deployments can significantly reduce the time it takes to have MySQL up to date on RDS, and make your upgrade process more automated.   The post How to Automate Minor Version Upgrades for MySQL on RDS appeared first on Percona Community Blog.
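If you prefer Python for the scheduled Lambda described above, the decision itself reduces to a few lines. This is a hypothetical helper (the surrounding boto3 calls to describe the instance and apply the modification are omitted); it only decides whether an upgrade within the same major series is due:

```python
def needs_minor_upgrade(current, latest):
    """True when `latest` is a newer minor release in the same
    major series (e.g. 5.7.x) as `current`.

    Comparing numeric tuples avoids string-comparison pitfalls
    such as "5.7.9" > "5.7.21"."""
    cur = tuple(int(part) for part in current.split("."))
    new = tuple(int(part) for part in latest.split("."))
    return cur[:2] == new[:2] and new > cur

print(needs_minor_upgrade("5.7.19", "5.7.21"))  # → True
print(needs_minor_upgrade("5.7.21", "5.7.21"))  # → False (already current)
print(needs_minor_upgrade("5.6.39", "5.7.21"))  # → False (major upgrade, skip)
```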

  • Percona Live Europe 2018 Call for Papers is Now Open
    Announcing the opening of the call for papers for the Percona Live Europe Open Source Database Conference 2018 in Frankfurt, Germany. It is open from now until August 10, 2018. Our theme this year is Connect. Accelerate. Innovate. As a speaker at Percona Live Europe, you’ll have the opportunity to CONNECT with your peers: open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE! Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live Europe in showcasing thought-leading practices in the open source database world. With a nod to innovation, for the first time this year we are introducing a business track to benefit business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities
The Percona Live Europe Open Source Database Conference 2018 call for papers is open until August 10, 2018. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited at Foundation (entry level or of general interest to all), Core (intermediate) and Masterclass (advanced) levels. If selected, you will receive a complimentary full conference pass. Breakout session: broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A). Tutorial session: present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations.
Tutorials will be three or six hours in length (including Q&A). Lightning talk: give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Topics and Tracks
We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI? Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies, as well as new and interesting ways to monitor and manage database environments: what’s holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers. In the submission entry you will be asked to indicate which of these tracks your proposal best fits: tutorial; business needs; case studies/use cases; operations; or developer.

A few ideas
The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including: Open source – Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Security – All of us have experienced security challenges. Whether they are initiated by legislation (GDPR), bugs (Meltdown/Spectre), experience (external attacks) or due diligence (planning for the worst), when do you have ‘enough’ security? Are you finding that security requirements are preventing your ability to be agile?
Serverless, Cloud or On-Premise – The technology landscape is no longer a simple one, and mixing infrastructures has almost become the norm. Are you designing data architectures for the new landscape, and eager to share your experience? Have microservices become an important part of your plans? MySQL – Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release get you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment? MongoDB – How has the 3.6 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment? PostgreSQL – Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your tasks? Why and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment? SQL, NewSQL, NoSQL – It’s become a perennial question without an easy answer. How do databases compare, how do you choose the right technology for the job, how do you trade off between features and their benefits in comparing databases? If you have ever tried a hybrid database approach in a single application, how did that work out? How nicely does MongoDB play with MySQL in the real world? Do you have anything to say about using SQL with NoSQL databases? 
High Availability – What choices are you making to ensure high availability? How do you find the balance between redundancy and cost? Are you using hot backups, and if so, what happened when you needed to roll back with them? Scalability – When did you recognize you needed to address data scale? Did your data growth take you by surprise, or were you always in control? Did it take a degradation in performance to get your management to sit up and take notice? How do you plan for scale if you can’t predict demand? What the Future Holds – What do you see as the “next big thing”? What new and exciting features are going to be released? What’s in your next release? What new technologies will affect the database landscape? AI? Machine learning? Blockchain databases? Let us know about innovations you see on the way.

How to respond to the call for papers
For information on how to submit your proposal, visit our call for papers page. The conference web pages will be updated over the next few weeks, and bios, synopses and slides will be published on those pages after the event.

Sponsorship
If you would like to obtain a sponsor pack for the Percona Live Europe Open Source Database Conference 2018, you will find more information, including a prospectus, on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly. The post Percona Live Europe 2018 Call for Papers is Now Open appeared first on Percona Database Performance Blog.

  • MariaDB Galera cluster and GTID
    In MariaDB 10.2.12, these two don’t yet work together. GTID = Global Transaction ID. In the master-slave asynchronous replication realm, this means that you can reconnect a slave to another server (change its master) and it’ll happily continue replicating from the correct point. No more fussing with filenames and offsets (which of course will both differ on different machines). So in concept the GTID is “globally” unique – it’s consistent across an entire infra: a binlogged write transaction will have the same GTID no matter on which machine you look at it.

OK: if you are transitioning from async replication to Galera cluster, and have a cluster as slave of the old infra, then GTID will work fine.

PROBLEM: if you want to run an async slave in a Galera cluster, GTID will currently not work. At least not reliably. The overview issue is MDEV-10715; the specific problem is documented in MDEV-14153, with some comments from me from late last week. MDEV-14153 documents cases where the GTID is not in fact consistent – and the way in which it isn’t is most disturbing.

The issue appears as “drift”. A GTID is made up of R-S-#, where R is the replication domain (0 unless set by an app), S is the server-id where the write was originally done, and # is just a number. The only required rule for the # is that each next event has to have a higher number than the previous one. In principle there could be #s missing; that’s ok. In certain scenarios, the # part of the GTID falls behind on the “other nodes” in the Galera cluster: there was the node where the statement was first issued, and then there are the other nodes which pick up the change through the Galera (wsrep) cluster mechanism. Those other nodes. So at that point, different nodes in the cluster have different GTIDs for the same query. Not so great.
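The R-S-# structure and the “higher than the previous, gaps allowed” rule can be sketched as a small Python model (illustrative only, not MariaDB’s implementation; the names Gtid and is_valid_sequence are mine):

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """A MariaDB-style GTID: replication domain, originating server-id, sequence number."""
    domain: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        # A GTID is written as "R-S-#", e.g. "0-1717-1672559880".
        domain, server_id, seq_no = (int(part) for part in text.split("-"))
        return cls(domain, server_id, seq_no)


def is_valid_sequence(gtids: list) -> bool:
    """The only required rule for the # part: each next event must be higher
    than the previous one. Gaps are fine; going backwards or repeating is not."""
    seqs = [g.seq_no for g in gtids]
    return all(a < b for a, b in zip(seqs, seqs[1:]))


# Gaps are allowed...
assert is_valid_sequence([Gtid.parse("0-1717-10"), Gtid.parse("0-1717-12")])
# ...but a node whose numbering falls behind ("drift") breaks the rule:
assert not is_valid_sequence([Gtid.parse("0-1717-12"), Gtid.parse("0-1717-11")])
```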
To me, this looked like a giant red flag though: if a GTID is assigned on a commit, and then replicated through the cluster as part of that commit, it can’t change. Not drift, nor any other change. So the only possible conclusion must be that it is in fact not passed through the cluster, but “reinvented” by a receiving cluster node, which simply assumes that the current event from a particular server-id is the previous event’s id + 1. That assumption is false, because as I mentioned above it’s ok for gaps to exist. As long as the number keeps going up, it’s fine.

Here is one of the simplest examples of breakage (extract from a binlog, with obfuscated table names):

    # at 12533795
    #180704  5:00:22 server id 1717  end_log_pos 12533837  CRC32 0x878fe96e  GTID 0-1717-1672559880 ddl
    /*!100001 SET @@session.gtid_seq_no=1672559880*//*!*/;
    # at 12533837
    #180704  5:00:22 server id 1717  end_log_pos 12534024  CRC32 0xc6f21314  Query  thread_id=4468  exec_time=0  error_code=0
    SET TIMESTAMP=1530644422/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
    DROP TEMPORARY TABLE IF EXISTS `qqq`.`tmp_foobar` /* generated by server */
    /*!*/;

Fact: temporary tables are not replicated (imagine restarting a slave: it wouldn’t have whatever temporary tables were supposed to exist). So, while this event is stored in the binary log (which it is, to ensure that if you replay the binlog on a machine it correctly drops the temporary table after creating and using it), it won’t go through the cluster. Remember that Galera cluster is essentially a ROW-based replication scheme; if there are changes in non-temporary tables, of course they get replicated just fine. So if an app creates a temporary table, does some calculations, and then inserts the result into a regular table, that last bit of data will get replicated, as it should. In a nutshell, as far as data consistency goes, we’re all fine.
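To make the arithmetic of the drift concrete, here is a toy simulation (hypothetical, my own sketch, not MariaDB or Galera code): the origin node bumps its seq_no for every binlogged event, including temporary-table statements, while a receiver that “reinvents” GTIDs as previous + 1 only counts the events that actually cross the cluster.

```python
def simulate(events):
    """events: list of (description, replicated_through_cluster) pairs.
    Returns (origin_seq, receiver_seq) after applying them all."""
    origin_seq = 0
    receiver_seq = 0
    for _desc, replicated in events:
        origin_seq += 1        # origin binlogs the event and bumps its seq_no
        if replicated:
            receiver_seq += 1  # receiver assumes previous + 1: the flawed logic
    return origin_seq, receiver_seq


workload = [
    ("CREATE TEMPORARY TABLE tmp_foobar ...", False),  # binlogged, not replicated
    ("INSERT INTO regular_table SELECT ...", True),    # row changes replicate fine
    ("DROP TEMPORARY TABLE tmp_foobar", False),        # binlogged, not replicated
]
origin, receiver = simulate(workload)
# The origin ends at seq_no 3 while the receiver reaches only 1:
# identical data on both nodes, yet different GTIDs for the same transaction.
```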
But the fact that we have an event that doesn’t really get replicated creates the “fun” in the “let’s assume the next event is just the previous + 1” logic. This is where the drift comes in. Sigh.

In any case, this issue needs to be fixed by, let’s say, “being re-implemented”: the MariaDB GTID needs to be propagated through the Galera cluster, so it’s the same on every server, as it should be. Doing anything else is always going to go wrong somewhere, so trying to catch more cases like the above example is not really the correct way to go.

If you are affected by this or related problems, please do vote on the relevant MDEV issues. That is important! If you need help tracking down problems, feel free to ask. If you have more information on the matter, please comment too! I’m sure this and related bugs will be fixed; there are very capable developers at MariaDB Corp and Codership Oy (the Galera company). But the more information we can provide, the better. It often helps with tracking down problems and creating reproducible test cases.