XenServer Org / XSO-445

Import/Export speed is a nightmare

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: All
    • Fix Version/s: None
    • Component/s: API, Networking, Storage, XenCenter
    • Labels: None
    • Environment:

      XenServer 6.0-6.5 SP1 (fully patched), Windows Server 2008, XenCenter 6.0.x to 6.6.90.3063
      2x 10 GE (LACP trunk) to 1x 10 GE at the Backup System

      Description

      Since this topic is nearly as old as XenServer itself and no one has cared about it until now, I have filed it as a major bug: its impact ranges from annoying wasted time to broken backups and blown backup windows.
      We have an MS Exchange 2007 VM with a database of about 400 GB plus OS etc., and we usually export the whole VM once a month as a disaster-recovery copy that could be imported on any XenServer if something really bad happens.
      The export process now takes a ridiculous 12+ (twelve!) hours.

      The environment details are as follows:
      XenServer 6.5 SP1 (upgraded since about 6.0)
      2x Xeon E5-2690 (2.9-3.8 GHz octacore)
      128 GB RAM
      7x 600 GB 2.5" 10k SAS via P420i/2 GB as LVM, local storage repo
      Dom0 set to 4 GB (tested with 3 GB before, will see if it makes a difference, which I don't expect)

      Backup server:
      2x Xeon E5450 (3 GHz QuadCore)
      24 GB RAM
      6x 600 GB 3.5" 15k SAS (Seagate Cheetah 15k[6/7]) as RAID 5

      To decrease the total export time (it would take days otherwise), I run two exports in parallel: I start with Exchange and then iterate through the list of VMs. The result is that while Exchange is exporting, all the other VMs get exported alongside it; at the end the Exchange export is still running, and the script waits for it before moving everything to tape.

      I trigger the exports via xe.exe from XenCenter (currently trying Dundee Beta 2, v6.6.90.3063) with a small Windows batch file.
      The main part is just the basics:
      %XenServer% vm-export vm=%1 compress=false filename=%BackupPath2%%1.xva

      That is, for speed reasons I disable compression, since the tape drive compresses on its own and the Exchange DB and the other VMs probably wouldn't compress that well anyway.
      From time to time I clean the VMs with "sdelete -z" to zero out empty space. A sketch of the batch logic follows below.
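
      For clarity, here is a minimal sketch of that two-at-a-time logic as a batch file. The VM names, paths, and the wait loop are placeholders for illustration, not our production script; the xe connection options (-s/-u/-pw) are assumed to be handled the same way as in the one-liner above.

      @echo off
      rem Minimal sketch of the two-at-a-time export scheme (placeholders,
      rem not the production script). xe connection options are assumed to
      rem be baked into %XenServer%, as in the one-liner above.
      set XenServer="C:\Program Files (x86)\Citrix\XenCenter\xe.exe"
      set BackupPath2=E:\Backup\

      rem Start the long-running Exchange export in the background ...
      start "export-exchange" /b %XenServer% vm-export vm=Exchange compress=false filename=%BackupPath2%Exchange.xva

      rem ... and export the remaining VMs one after another in the foreground.
      for %%V in (VM2 VM3 VM4) do (
          %XenServer% vm-export vm=%%V compress=false filename=%BackupPath2%%%V.xva
      )

      rem Wait for the background xe.exe to finish before moving data to tape.
      :wait
      tasklist | find /i "xe.exe" >nul && (timeout /t 60 /nobreak >nul & goto wait)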

      What I can see:
      xe.exe consumes only one core on the backup system, i.e. everything is single-threaded, for whatever reason.
      The xe.exe instance doing the Exchange export usually sits at its one-core maximum, which seems to be the limiting factor.
      Dom0 currently shows a load average of 3.12, 3.04, 2.98.
      In dom0, stunnel consumes the most CPU at about 100-150%, xapi about 40-50%, another process called "fe" (whatever that is) about 10-12%, and the tapdisk/qemu-dm processes get only the scraps.

      Measured with iostat, tps is mostly between 300 and 400 with peaks above 1200, and the read speed is 20-40 MB/s with short peaks of over 350 MB/s.
      Load on the backup system jumps around a bit:
      network from 200-800 Mbit/s, disk at 20-100 MB/s, and a total CPU usage of about 33%.
      CPU usage is mostly constant at around 30%.
      The network currently holds 500-700 Mbit/s; the HDDs' active time jumps between about 9% and 90%, very rarely hitting 100%.
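
      For anyone who wants to log the same counters on the Windows backup server over a whole export run instead of eyeballing the Resource Monitor, something like the following should do. The counter paths are the standard (English-locale) perfmon names, and the sample interval/count and output file are just examples:

      rem Sample CPU, disk, and network counters every 5 s, 720 samples
      rem (one hour), into a CSV that can be lined up with the export log.
      rem Counter names must be adjusted on localized Windows installs.
      typeperf "\Processor(_Total)\% Processor Time" ^
               "\PhysicalDisk(_Total)\Disk Write Bytes/sec" ^
               "\Network Interface(*)\Bytes Received/sec" ^
               -si 5 -sc 720 -f CSV -o backup-perf.csv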

      Attached is a screenshot taken while exporting (backup server on the left).
      The blue line under "Datenträger" (disks) is the saturation of the RAID set; green is the amount of data transferred.
      What we can see from the screenshot:
      1. The RAID sets are not constantly at their limits.
      2. The network is nowhere near being the limiting factor.

      That leaves CPU usage/process handling as the part that needs optimisation.
      Probably xe.exe needs some multi-threading that splits up the various steps required to convert the data from XenServer (the API?) into an .xva file on disk.

      To get an idea of how many people have already tried to fix this, or were affected badly enough to look for help in the forums, there is a thread that has grown over the years:
      https://discussions.citrix.com/topic/237200-vm-importexport-performance/
      (So imagine how many people never posted and only read, but are still affected.)

      I'm willing to run more analyses, provide any details you need, or do export tests with a prepared VM, but please, please, please work on this: it's a major bug for anyone using and maintaining XenServer, and with VMs and datastores growing, the problem will only get worse.

      I hope you understand our pain (see the list of people who contributed to the thread).

      Regards

      Christof
      1. xapi-strace1.tar.bz2
        132 kB
        Christof Giesers
      2. xapi-strace2.tar.bz2
        701 kB
        Christof Giesers
      1. 2016-01-09 16_46_34-load_overview.png
        125 kB
      2. 2016-04-21 14_19_00-XenCenter.png
        3 kB
      3. 2016-04-26 20_41_32-Import XVA.png
        8 kB
      4. BrokenCIFS-sambaShare.png
        15 kB
      5. Capture.JPG
        55 kB
      6. export7.0.1.png
        40 kB
      7. exports-12790c-devSnapshot.png
        14 kB
      8. ExportXEdom0_firstBroken_2ndAsExpected.png
        23 kB
      9. ExportXEdom0.png
        75 kB
      10. ExportXEexe_win.png
        90 kB
      11. ExportXenCenter.png
        89 kB
      12. TaskmanagerCIFSTaget_dom0-XEused.png
        12 kB
      13. TaskmanagerCIFSTaget_Win-XEused.png
        22 kB
      14. TaskmanagerCIFSTaget_Win-XEused2.png
        48 kB
      15. XS701benchmarks.JPG
        121 kB

        Activity

        codedmind added a comment - edited

        @David, I don't want to be offensive in any way, and I'm glad of and grateful for all your work.
        I still have one pool with two 6.5 servers and one other server running XenServer 7, but as I said in a previous post, I have to decide the future of my infrastructure, and I will get three new servers in the next few weeks.

        I just can't wait any longer. This problem is very old, and the XenServer team only acknowledged it a few months ago. Some improvements have indeed been made, but for my production environment it is a risk to keep waiting for this to be solved. Our VMs get bigger every day, and HPE (we bought HPE servers) still doesn't support the MSA with XenServer... so that is my decision.

        Hope you understand and don't get me wrong.

        David Cottingham added a comment -

        codedmind Understood, and no worries – and I know that the issue has been around for a long while.

        One thought on the question of upgrading: if you're purchasing new hardware, then you could take the next release when it comes out, put it on that hardware, upgrade your existing servers to it, and then move your VMs around with the increased performance. At which point it becomes easy to take the old servers, wipe them and install the next release from clean, with the new partition layout.

        Clearly that still means that you'd need to wait for the next release, and also that anyone doing this would need new hardware, but I wanted to put it out there as a possibility.

        Re HPE and MSA arrays: yep, understood. It's an HPE decision; we've talked to them about it in the past, but there's been no real movement, unfortunately.

        Christof Giesers added a comment -

        The HPE thing is simple: they say there's no point for HP because almost no one requests the support. Given that we've already had several voices here from people who would want to use it... I can't follow that argument.
        I managed to get in contact with the European "Category Manager - Storage" through 'Storage Guy' Calvin Zito, and that's what he told me: the certification would cost HPE money, and they don't want to invest it if people don't demand the support.
        As a solution, we will add to our approved HPE project that HP will help solve problems even when XenServer is used, covered by the Care Packs.
        He said it's quite common to add special arrangements like this to projects.

        So the support question is solved, and I'm quite curious to test some platforms combined with the FC-connected MSA (we decided against iSCSI).

        David Cottingham added a comment -

        Yep, Christof Giesers is completely correct: it's basically about HPE seeing demand in order to do the testing. And they are normally happy to add support when they see a customer who wants to buy.

        Andrei added a comment -

        Thank you, it looks like it is fixed in the new version.
        On XenServer 7.0.0 (release): 1-thread export speed from 68 to 78 MB/s
        On XenServer 7.0.0 (release): 3-thread export speed from 150 to 180 MB/s
        On XenServer 7.0.1.4112 (alpha): 1-thread export speed from 140 to 150 MB/s
        On XenServer 7.0.1.4112 (alpha): 3-thread export speed from 150 to 180 MB/s

        The current speed is limited by the test hardware, which has ordinary SATA-2 disks.


          People

          • Assignee:
            philippe gabriel
          • Reporter:
            Christof Giesers
          • Votes:
            12
          • Watchers:
            28
